URL: https://www.progressiverobot.com/realtime-audio-translation-using-openai-api-on-open-webui/
Introduction
With the increasing demand for multilingual communication, real-time audio translation is rapidly gaining attention. In this tutorial, you will learn to deploy a real-time audio translation application using OpenAI APIs on Open WebUI, all hosted on a powerful GPU Droplet from the cloud provider.
The cloud provider's GPU Droplets, powered by NVIDIA H100 GPUs, offer significant performance for AI workloads, making them ideal for fast and efficient real-time audio translation. Let’s get started.
Prerequisites
- An account with the cloud provider.
- A GPU Droplet deployed and running.
- An OpenAI API key set up for accessing the OpenAI models.
- Familiarity with SSH and basic Docker commands.
- An SSH key for logging into your GPU Droplet.
Step 1 - Setting Up the GPU Droplet
1. Create a New Project – Create a new project from the cloud control panel and tie it to a GPU Droplet.
2. Create a GPU Droplet – Log into your cloud account, create a new GPU Droplet, and choose AI/ML Ready as the OS. This OS image installs all the necessary NVIDIA GPU drivers. You can refer to our official documentation on how to create a GPU Droplet.
3. Add an SSH Key for authentication – An SSH key is required to authenticate with the GPU Droplet. Once you have added the key, you can log in to the GPU Droplet from your terminal.
4. Finalize and Create the GPU Droplet – Once all of the above steps are completed, finalize and create the new GPU Droplet.
Step 2 - Installing and Configuring Open WebUI
Open WebUI is a web interface for interacting with large language models (LLMs). It is designed to be user-friendly, extensible, and self-hosted, and it can run offline. Its interface is similar to ChatGPT, and it works with a variety of LLM runners, including Ollama and OpenAI-compatible APIs.
There are three ways you can deploy Open WebUI:
- Docker: Officially supported and recommended for most users.
- Python: Suitable for low-resource environments or those wanting a manual setup.
- Kubernetes: Ideal for enterprise deployments that require scaling and orchestration.
In this tutorial, you will deploy Open WebUI as a Docker container on the GPU Droplet with NVIDIA GPU support. To learn how to deploy Open WebUI using the other methods, see the Open WebUI quick start guide.
Docker Setup
Once the GPU Droplet is deployed and ready, SSH into it from your terminal:
ssh root@<your-droplet-ip>
This Ubuntu AI/ML Ready H100x1 GPU Droplet comes with Docker pre-installed.
You can verify the Docker version using the command below:
docker --version
Output:
Docker version 24.0.7, build 24.0.7-0ubuntu2~22.04.1
Next, run the below command to verify and ensure Docker has access to your GPU:
docker run --rm --gpus all nvidia/cuda:12.2.0-runtime-ubuntu22.04 nvidia-smi
This command pulls the nvidia/cuda:12.2.0-runtime-ubuntu22.04 image if it is not already present locally and starts a container from it.
Inside the container, it runs nvidia-smi to confirm that the container has GPU access and can interact with the underlying GPU hardware. Once nvidia-smi has executed, the --rm flag ensures the container is automatically removed, as it’s no longer needed.
You should observe the following output:
Unable to find image 'nvidia/cuda:12.2.0-runtime-ubuntu22.04' locally
12.2.0-runtime-ubuntu22.04: Pulling from nvidia/cuda
aece8493d397: Pull complete
9fe5ccccae45: Pull complete
8054e9d6e8d6: Pull complete
bdddd5cb92f6: Pull complete
5324914b4472: Pull complete
9a9dd462fc4c: Pull complete
95eef45e00fa: Pull complete
e2554c2d377e: Pull complete
4640d022dbb8: Pull complete
Digest: sha256:739e0bde7bafdb2ed9057865f53085539f51cbf8bd6bf719f2e114bab321e70e
Status: Downloaded newer image for nvidia/cuda:12.2.0-runtime-ubuntu22.04
==========
== CUDA ==
==========
CUDA Version 12.2.0
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
Thu Nov 7 19:32:18 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.06             Driver Version: 535.183.06   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA H100 80GB HBM3          On  | 00000000:00:09.0 Off |                    0 |
| N/A   28C    P0              70W / 700W |      0MiB / 81559MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
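If you would rather script this check than read the table by eye, one option is the -L flag of nvidia-smi, which prints one line per detected GPU. The helper below is a minimal sketch and not part of the original setup: count_gpus is a hypothetical function name, and it only counts the lines on standard input that start with "GPU ".

```shell
#!/bin/sh
# count_gpus (hypothetical helper): reads `nvidia-smi -L` output on stdin
# and prints how many GPU lines it contains.
# A GPU line looks like: "GPU 0: NVIDIA H100 80GB HBM3 (UUID: GPU-...)".
count_gpus() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # the count is 0, so "|| true" keeps the helper usable under `set -e`.
    grep -c '^GPU ' || true
}

# Example (run on the Droplet; requires Docker and the NVIDIA runtime):
#   docker run --rm --gpus all nvidia/cuda:12.2.0-runtime-ubuntu22.04 nvidia-smi -L | count_gpus
```

On a single-GPU Droplet the example should print 1. A result of 0 suggests the container cannot see the GPU.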
Deploy Open WebUI using Docker with GPU Support
Use the following command to run the Open WebUI container:
docker run -d -p 3000:8080 -v open-webui:/app/backend/data --name open-webui --gpus all ghcr.io/open-webui/open-webui:main
The above command runs a Docker container using the open-webui image and sets up specific configurations for network ports, volumes, and GPU access.
- docker run -d:
  - docker run starts a new Docker container.
  - -d runs the container in detached mode, meaning it runs in the background.
- -p 3000:8080:
  - This maps port 8080 inside the container to port 3000 on the host machine.
  - It allows you to access the application in the container by navigating to http://localhost:3000 on the host.
- -v open-webui:/app/backend/data:
  - This mounts a Docker volume named open-webui to the /app/backend/data directory inside the container.
  - Volumes are used to persist data generated or used by the container, ensuring it remains available even if the container is stopped or deleted.
- --name open-webui:
  - Assigns the container a specific name, open-webui, which makes it easier to reference (e.g., docker stop open-webui to stop the container).
- ghcr.io/open-webui/open-webui:main:
  - Specifies the Docker image to use for the container.
  - ghcr.io/open-webui/open-webui is the name of the image, hosted on GitHub's container registry (ghcr.io).
  - main is the image tag, often representing the latest stable version or main branch.
- --gpus all:
  - This option enables GPU support for the container, allowing it to use all available GPUs on the host machine.
  - It’s essential for applications that leverage GPU acceleration, such as machine learning models.
Verify if the Open WebUI docker container is up and running:
docker ps
Output:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4fbe72466797 ghcr.io/open-webui/open-webui:main "bash start.sh" 5 seconds ago Up 4 seconds (health: starting) 0.0.0.0:3000->8080/tcp, :::3000->8080/tcp open-webui
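The status column reports "health: starting" for a short while after launch. If you want to wait for the container to become healthy before opening the browser, the sketch below polls the health status with docker inspect. The function names container_health and wait_for_healthy are hypothetical helpers, and the retry count and delay are arbitrary defaults rather than recommended values.

```shell
#!/bin/sh
# container_health NAME: prints the Docker health status of a container
# ("starting", "healthy", or "unhealthy").
container_health() {
    docker inspect --format '{{.State.Health.Status}}' "$1"
}

# wait_for_healthy NAME [TRIES] [DELAY_SECONDS]
# Polls container_health until it reports "healthy". Returns 1 if the
# container reports "unhealthy" or the attempts run out.
wait_for_healthy() {
    name="$1"
    tries="${2:-30}"
    delay="${3:-2}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        status="$(container_health "$name" 2>/dev/null)"
        case "$status" in
            healthy)   echo "$name is healthy"; return 0 ;;
            unhealthy) echo "$name is unhealthy" >&2; return 1 ;;
        esac
        i=$((i + 1))
        sleep "$delay"
    done
    echo "timed out waiting for $name" >&2
    return 1
}

# Example (run on the Droplet):
#   wait_for_healthy open-webui
```

If the helper times out, docker logs open-webui is usually the quickest way to see what the container is doing.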
Once the Open WebUI container is up and running, access it at http://<your_gpu_droplet_ip>:3000 in your browser.
Step 3 - Add OpenAI API Key to use GPT-4o with Open WebUI
In this step, you will add your OpenAI API key to Open WebUI.
Once you have logged in to the Open WebUI dashboard, you will notice that no models are available yet.
To connect Open WebUI with OpenAI and use all the available OpenAI models, follow the below steps:
- Open Settings:
  - In Open WebUI, click your user icon at the bottom left, then click Settings.
- Go to Admin:
  - Navigate to the Admin tab, then select Connections.
- Add the OpenAI API Key:
  - Add your OpenAI API key in the right textbox under the OpenAI API tab.
- Verify Connection:
  - Click Verify Connection. A green light confirms a successful connection.
Open WebUI will then auto-detect all available OpenAI models. Select GPT-4o from the list.
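As an alternative to entering the key in the browser, the key can be passed to the container as an environment variable at launch. The sketch below assumes that the Open WebUI image reads a variable named OPENAI_API_KEY at startup; verify this against the documentation for the version you run. run_open_webui is a hypothetical helper, and the DOCKER override exists only so you can preview the command without running it.

```shell
#!/bin/sh
# run_open_webui (hypothetical helper): starts the same container as in
# Step 2, additionally forwarding OPENAI_API_KEY from the host environment.
# Assumption: the Open WebUI image reads OPENAI_API_KEY at startup.
# Set DOCKER=echo to print the command instead of running it.
run_open_webui() {
    if [ -z "${OPENAI_API_KEY:-}" ]; then
        echo "OPENAI_API_KEY is not set" >&2
        return 1
    fi
    # "-e OPENAI_API_KEY" with no value copies the exported variable from
    # the calling shell, so the key never appears on the command line.
    ${DOCKER:-docker} run -d -p 3000:8080 \
        -e OPENAI_API_KEY \
        -v open-webui:/app/backend/data \
        --name open-webui --gpus all \
        ghcr.io/open-webui/open-webui:main
}

# Example (run on the Droplet):
#   export OPENAI_API_KEY=<your-openai-api-key>
#   run_open_webui
```

Because a container named open-webui already exists from Step 2, remove it first with docker rm -f open-webui; the open-webui volume keeps your data. Whether a key supplied this way overrides one already saved through the Admin settings may depend on the Open WebUI version, so check the Connections tab after starting the container.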
Next, set the speech-to-text (STT) and text-to-speech (TTS) engines to OpenAI so that speech recognition uses the OpenAI Whisper model.
Navigate to Settings -> Audio, configure the STT and TTS settings, and save them.
You can read more about OpenAI text-to-speech and speech-to-text in the documentation at platform.openai.com.
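To confirm outside Open WebUI that your key can use the audio models, you can call OpenAI's audio translations endpoint directly, which accepts an audio file and returns English text. This is a minimal sketch: translate_audio is a hypothetical helper, and the file name in the example is a placeholder for a recording of your own.

```shell
#!/bin/sh
# translate_audio FILE (hypothetical helper): uploads an audio file to
# OpenAI's audio translations endpoint and prints the English text.
# Requires curl and an exported OPENAI_API_KEY.
translate_audio() {
    file="$1"
    if [ -z "${OPENAI_API_KEY:-}" ]; then
        echo "OPENAI_API_KEY is not set" >&2
        return 1
    fi
    if [ ! -f "$file" ]; then
        echo "audio file not found: $file" >&2
        return 1
    fi
    # Multipart upload; response_format=text returns plain text, not JSON.
    curl -sS https://api.openai.com/v1/audio/translations \
        -H "Authorization: Bearer $OPENAI_API_KEY" \
        -F "file=@$file" \
        -F "model=whisper-1" \
        -F "response_format=text"
}

# Example (sample.m4a is a placeholder for your own recording):
#   translate_audio sample.m4a
```

An authentication error here points to a problem with the key itself rather than with the Open WebUI configuration.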
Step 4 - Set up Audio Tunneling
If you're streaming audio from your local machine to the Droplet, route the audio input through an SSH tunnel.
Since the GPU Droplet has the Open WebUI container running on http://localhost:3000, you can access it on your local machine by navigating to http://localhost:3000 after setting up this SSH tunnel.
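This is required to let Open WebUI access the microphone on your local machine for real-time audio translation and real-time language processing. Browsers generally allow microphone access only in secure contexts, meaning HTTPS pages or localhost, and the Droplet's plain http://<your_gpu_droplet_ip>:3000 address is neither. Without the tunnel, clicking the headphone or microphone icon to use GPT-4o produces an error instead of starting a voice session.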
Open a new terminal on your local machine and use the command below to set up a local SSH tunnel to the GPU Droplet:
ssh -o ServerAliveInterval=60 -o ServerAliveCountMax=5 root@<gpu_droplet_ip> -L 3000:localhost:3000
This command establishes an SSH connection to your GPU Droplet as the root user and establishes a local port forwarding tunnel. It also includes options to keep the SSH session alive. Here’s a detailed breakdown:
- -o ServerAliveInterval=60:
  - This option sets ServerAliveInterval to 60 seconds, meaning that after 60 seconds without data from the server, the client sends an SSH keep-alive message.
  - This helps prevent the SSH connection from timing out due to inactivity.
- -o ServerAliveCountMax=5:
  - This option sets ServerAliveCountMax to 5, which allows up to 5 missed keep-alive messages before the SSH connection is terminated.
  - Together with ServerAliveInterval=60, this setting means the SSH session will stay open for about 5 minutes (5 × 60 seconds) of no response from the server before closing.
- -L 3000:localhost:3000:
  - This part sets up local port forwarding.
  - 3000 (before the first colon) is the local port on your machine, where you will access the forwarded connection.
  - localhost:3000 (after the first colon) refers to the destination as resolved on the GPU Droplet.
  - In this case, it forwards traffic from port 3000 on your local machine to port 3000 on the GPU Droplet.
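The same options can be stored in your SSH client configuration so that you do not have to retype them. The sketch below prints a Host block using the standard ssh_config keywords that correspond to the flags above, along with the keep-alive arithmetic. Both function names and the gpu-droplet alias are hypothetical and only serve as an illustration.

```shell
#!/bin/sh
# keepalive_timeout INTERVAL COUNT: seconds of server silence tolerated
# before ssh gives up (interval x count).
keepalive_timeout() {
    echo $(( $1 * $2 ))
}

# ssh_config_stanza ALIAS HOST: prints a Host block for ~/.ssh/config that
# mirrors the options of the tunnel command.
ssh_config_stanza() {
    cat <<EOF
Host $1
    HostName $2
    User root
    ServerAliveInterval 60
    ServerAliveCountMax 5
    LocalForward 3000 localhost:3000
EOF
}

# Example (run on your local machine):
#   keepalive_timeout 60 5      # prints 300
#   ssh_config_stanza gpu-droplet <gpu_droplet_ip> >> ~/.ssh/config
#   ssh gpu-droplet
```

With the block in place, ssh gpu-droplet opens the session and the tunnel together.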
With the tunnel open, you can access Open WebUI at http://localhost:3000 on your local machine and use the microphone for real-time audio translation.
Step 5 - Implementing Real-time Translation with GPT-4o
Click the headphone or microphone icon to use the Whisper and GPT-4o models for natural language processing tasks.
The Headphone/Call button opens a voice assistant that uses the OpenAI GPT-4o and Whisper models for real-time audio processing and translation.
You can use it to translate and transcribe the audio in real time by talking with the GPT-4o voice assistant.
Conclusion
Deploying real-time audio translation using OpenAI APIs on Open WebUI with the cloud provider’s GPU Droplets allows developers to create high-performance translation systems. With easy setup and monitoring, the cloud provider’s platform provides the resources for scalable, efficient AI applications.