Table of Contents
Introduction
Deploying large language models (LLMs) no longer needs to be complicated or costly. Traditionally, setting up LLMs involves dealing with complex dependencies, specific hardware requirements, and time-consuming configurations. But with cloud servers, you can now run LLMs easily in the cloud—without the headache of intricate setups.
For many developers, the biggest barrier to exploring LLMs is the deployment process itself. Managing libraries, configuring containers, and working with heavy frameworks can quickly turn into a frustrating experience. Thankfully, Ollama and the cloud provider provide a much more developer-friendly solution.
Ollama streamlines the entire process by allowing you to run and manage LLMs like Phi-3 with just a single command. When combined with a cloud servers, deploying models becomes incredibly straightforward—no Docker, no CUDA, and no multi-step configurations required. Everything runs smoothly, right out of the box.
To make the experience even better, you’ll also set up a WebUI that lets you chat with your deployed model directly from your browser. The result? A fast, clean, and interactive interface that gets your LLM up and running in minutes.
In this article, you’ll learn how to:
- Launch a cloud servers suitable for LLM deployment
- Run Ollama and run Phi-3 with a single command
- Set up a lightweight Open WebUI for seamless interaction
- Extend the setup to support LLMs in just a few clicks
Prerequisites
Before you begin, make sure you have the following:
- A cloud account: You’ll need an active cloud account to create and manage Droplets. If you don’t have one, sign up here.
- Basic knowledge of SSH and the command line: You should be comfortable connecting to a remote server using SSH and running a few terminal commands.
- A cloud servers (Ubuntu 22.04 or later): A CPU-optimized or regular Droplet (with at least 2 vCPUs and 4GB RAM) is recommended for running lightweight LLMs like Phi-3.
- SSH key added to your cloud account: Set up an SSH key for secure access to your Droplet. You can follow the guide if you haven't done this before.
What is Open WebUI?
Open WebUI is a user-friendly, web-based interface that lets you talk to AI models (like Qwen, Llama or Phi-3) right from a browser—just like chatting with someone on WhatsApp or Messenger.
Instead of typing commands into a technical terminal or coding environment, Open WebUI gives you a clean, chat-like window where you can simply ask questions, give instructions, or have conversations with the AI. No coding required!
When you first set it up, Open WebUI guides you step-by-step and lets you create an admin account to manage users securely. Also you get full control over your models and there is also a feature to create custom user groups, who can control the chats with the models, and also decide who can manage or upload new models.
What Can You Do With Open WebUI in Real Life?
Here are some real-world uses of Open WebUI: Ask questions and learn new things Type in questions about science, history, cooking, or any topic, and get instant answers from the AI. Generate content Need help writing emails, blog posts, social media captions, or product descriptions? Just describe what you need, and the AI will write it for you. Image Generation Integration Chat with the AI to generate images using text prompts—perfect for creatives and designers looking to prototype visuals on the fly.
Introduction to Ollama
Ollama is a open source tool which makes it super easy to run small or large language model like Llama 3, Mistral, Gemma on your server with just one command. Think of it like Docker, but specifically for LLMs. It handles all the complicated stuff behind the scenes, so you don’t have to worry about installing frameworks, managing dependencies, or setting up GPU support manually. With the cloud provider, you do not have to deal with Ollama installation and setup the CLI as well; users simply need to select the model of choice and pull the model with just one command.
ollama run model_name
Ollama lets you download, run, and interact with language models—all from your own machine or cloud server—providing a private, scalable, and hassle-free environment for experimentation.
Previously, running an LLM meant manually downloading model weights from repositories like Hugging Face or GitHub, setting up dependencies, and configuring everything just to get started with inference. Ollama simplifies this entire process with a single command:
ollama run <model-name>
This one-liner handles everything—from downloading the model to launching a local inference server, much like spinning up a web server.
Ollama supports a wide range of powerful models out of the box, including the LLaMA series, Mistral, Qwen, Phi-3, and many more—making it one of the simplest ways to get started with LLMs in minutes.
Why Developers Love Ollama
Setting Up Your cloud servers
In this section, we will learn how to quickly set up a droplet and get started with ollama and open the Web UI.
First, we will log in to our cloud account
If you don’t have a cloud account yet, you can sign up easily. Once you’re logged in, you’ll land on your project dashboard. From there, click the “Create” button at the top. In the dropdown menu, select “Droplet” to start creating your cloud server.
Choose Region
Select your region, and make sure you select the region that is closest to your location.
Scroll down to select the image
By default, the “OS” tab is selected. Switch to the “Marketplace” tab, where you'll find a search box. Type “ollama with open webui” into the search bar, and the relevant option will appear in the results. Once this option gets highlighted, scroll down further to choose the size.
Choose Size
The default option here is the “Shared CPU”; for this tutorial, we will use it. Scroll further to choose from the “CPU options.” There are three options here, and they are:
- Regular
- Premium Intel
- Premium AMD
Feel free to choose as per your requirements and the model that you wish to deploy. Next, select the plan that you wish to go ahead with. For our case, we will go ahead with the premium AMD setup equipped with 4 AMD CPU cores and 8 GB of RAM, making it suitable for moderate workloads, development tasks, and hosting lightweight applications. It includes 160 GB of high-speed NVMe SSD storage, ensuring fast data access and improved performance. Additionally, the plan offers 5 TB of data transfer, providing ample bandwidth for serving web applications, transferring files, or handling API requests with ease.
Authentication Method
Now, select the authentication plan that you wish to go ahead with. We have two options here: “SSH Key” and “Password.” Select the one that works the best. For our case, we will go ahead with the “SSH Key.”
Hostname
Enter a hostname for your Droplet and click on “Create Droplet.” the cloud provider will then automatically set up Ollama with the Web UI, and your hosted instance will be ready shortly.
Droplet
Once the setup is complete, the Droplet will start running, indicated by a green dot next to the newly created Droplet.
Running Open WebUI
Once our droplet is ready, we will run the open webUI and access the dashboard. We will simply copy the droplet's IP address, which is located right next to its name. We will open up a new tab and type 'http://' along with your IP address. Press Enter. This will open up the sign-in page for Open WebUI.
Sign up to create the account and log in to the Open WebUI console. Please note that this process may take a couple of minutes.
Once this setup is complete, you will be logged in to the console.
Initially, we will have “tiny llama” as the default model. However, in our case, we will install models from Ollama and learn how to use them.
Ollama Setup and Downloading Phi3
Next, let’s add a base model from Ollama to your self-hosted Open WebUI instance.
Start by clicking on your profile icon in the bottom-left corner and choosing “Admin Panel.” In the admin panel, go to “Settings,” then click on “Models” from the menu on the left-hand side.
In this section, you’ll find an option to pull a model from Ollama using its model tag. Just below this panel, there's a link to Ollama’s official documentation where you can browse available models with the model tags.
To obtain the correct model tag, navigate to the Phi3 official documentation of Ollama and search for the model you want to use. In this example, we'll look for phi3:latest, then copy the corresponding model tag for use in your deployment.
ollama run phi3
We will recommend choosing a lightweight model here for faster inference. Next, go back to your admin panel and paste the model tag under the “Pull a model from Ollama.com” panel.
Click on the download button to begin pulling the model from ollama.
Once the model is downloaded, you will receive notifications confirming its completion, and it will be ready for use through the Web UI.
Using Phi-3 in Open WebUI
Once the model is successfully downloaded, we can chat using the model with Open WebUI. Please note here that all the model that has been downloaded will be available here and users are free to choose the model that works the best. Navigate to the top left corner and click on “New Chat” and select the base model.
In our case we will try the Phi3:latest:3.8B, hence allowing you to interact with it directly through the Web UI. Select the model and click on the text box to send a message to the model.
!deploy phi3.gif)
Troubleshooting: Ollama Internal Server Error (500)
If you encounter the following error while interacting with the Ollama Web UI:
Ollama: 500, message='Internal Server Error', url=URL('http://localhost:11434/api/chat') #3554
In this case,
- Re-run the Web UI.
- Verify model availability: Ensure that the model you’re trying to use (e.g.,
phi3:latest:3.8B) is fully downloaded and loaded. - Sometimes this error occurs when the model you have downloaded is resource intensive and you will either need to use a light weight model or a upsize the droplet.
If the problem persists, consider checking Ollama GitHub Issues or filing a new issue with detailed logs.
FAQ Section
1. What is Phi-3? Phi-3 is a lightweight, open-source large language model developed by Microsoft, optimized for efficiency and on-device usage.
2. What is Ollama? Ollama is a tool that simplifies downloading, running, and managing local LLMs like Phi-3 on your machine or server.
3. What is Open WebUI? Open WebUI is an open-source frontend that provides a clean, user-friendly interface for interacting with LLMs like those hosted via Ollama.
4. Can I run Phi-3 on a CPU-only Droplet? Yes, Phi-3 is designed to run on CPUs, though performance will be slower compared to GPU-based droplets.
5. Is this setup secure for production use? While suitable for development, securing the setup for production requires additional steps like HTTPS, authentication, and firewall rules.
6. How do I access the WebUI? Once deployed, you can access Open WebUI in your browser at http://your-droplet-ip.
7. Can I use other models with Ollama and Open WebUI? Yes, Ollama supports multiple models, including LLaMA, Mistral, and Gemma, which are also compatible with Open WebUI.
8. Is hosting on the cloud provider expensive? the cloud provider offers affordable hourly pricing, and you can choose smaller droplets if you're using CPU-only models like Phi-3.
9. How do I update Phi-3 or Ollama in the future? You can update Ollama via their official script and re-pull the latest Phi-3 model with ollama pull phi3.
Conclusion
Running powerful language models like Phi-3 doesn’t have to be hard or expensive. With Ollama, Open WebUI, and a cloud servers, you can set everything up in just a few steps—no complicated tools, no GPU setup, no stress.
Whether you're a developer exploring LLMs or just want your own private AI chatbot, this setup gives you an easy and reliable way to get started. Just launch a Droplet, select the model of your choice from the Ollama library, and start chatting—all within minutes.
It’s the simplest way to run and explore LLMs on your terms.