Simple RAG with Ollama, OpenWebUI, and VectorDB on Ubuntu 22.04

Prerequisites

I have already installed the NVIDIA proprietary drivers and the NVIDIA CUDA Toolkit. I documented the CUDA Toolkit installation in an older post, which can be found here.

Since I have NVIDIA GPUs in my host system and I intend to run some services in containers, I need to install the nvidia-container-toolkit. Instructions on how to set up the repository on Ubuntu 22.04 can be found here. Once you have set up the repository, you can follow the steps below or just follow the instructions in the link above.

$ sudo apt-get install -y nvidia-container-toolkit

I will be using Docker as my container runtime, so I need to configure Docker to use the NVIDIA Container Runtime.

$ sudo nvidia-ctk runtime configure --runtime=docker

Now restart Docker.

$ sudo systemctl restart docker
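Before moving on, it is worth confirming that containers can actually see the GPUs. The NVIDIA Container Toolkit install guide suggests running nvidia-smi in a throwaway Ubuntu container as a sample workload; the sketch below just wraps that command in a helper (check_gpu_in_container is a hypothetical name, not part of any tool).

```shell
#!/usr/bin/env bash
# Sample workload from the NVIDIA Container Toolkit install guide:
# run nvidia-smi inside a disposable Ubuntu container (--rm deletes it afterwards).
# If the toolkit and the Docker runtime configuration are correct,
# the host GPUs are listed in the output.
check_gpu_in_container() {
  sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
}
```

Run check_gpu_in_container after restarting Docker; if the GPU table does not appear, revisit the nvidia-ctk step before going further.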

Install Ollama

Run the command below to install Ollama as a service. You can also run Ollama in a container; however, those steps are not documented here.

$ curl -fsSL https://ollama.com/install.sh | sh

Once it completes, run the command below to confirm the installation and check the version.

$ ollama --version
ollama version is 0.5.12
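Later in this post the custom model is built on tinyllama, so that model has to be available in Ollama before it can be chosen as a base model in OpenWebUI. A minimal sketch that pulls it only when it is missing, using the standard ollama list and ollama pull commands (ensure_model is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Pull a model into the local Ollama store unless it is already present.
# Defaults to tinyllama, the base model used later in this post.
ensure_model() {
  local model="${1:-tinyllama}"
  # `ollama list` prints a header row, then one model per line with the
  # name (e.g. "tinyllama:latest") in the first column.
  if ollama list | awk -v m="$model" \
      'NR > 1 && ($1 == m || $1 == (m ":latest")) { found = 1 } END { exit !found }'; then
    echo "${model} already present"
  else
    ollama pull "$model"
  fi
}
```

Calling ensure_model with no argument checks for tinyllama; pass another name (for example ensure_model llama3.2) to fetch a different base model.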

Add the environment variable below to the Ollama service file so that Ollama listens on all interfaces. This way we can access it remotely if needed.

Environment="OLLAMA_HOST=0.0.0.0"

Edit the Ollama service file and add the line above to the end of the [Service] section.

$ sudo vi /etc/systemd/system/ollama.service

The file should appear as shown below.

[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/usr/local/cuda/bin:/home/cpaquin/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"
Environment="OLLAMA_HOST=0.0.0.0"

[Install]
WantedBy=default.target
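Editing the unit file directly works, but Ollama's FAQ recommends systemctl edit ollama.service, which stores the setting in a drop-in file under /etc/systemd/system/ollama.service.d/. A drop-in keeps the change separate from the main unit file, which the install script may rewrite when Ollama is upgraded. The sketch below writes an equivalent drop-in non-interactively; write_ollama_override and its optional directory argument are hypothetical conveniences.

```shell
#!/usr/bin/env bash
# Write a systemd drop-in that makes Ollama listen on all interfaces.
# systemd merges *.conf files from <unit>.d/ on top of the main unit file.
# The optional argument overrides the target directory (useful for a dry run).
write_ollama_override() {
  local dropin_dir="${1:-/etc/systemd/system/ollama.service.d}"
  mkdir -p "$dropin_dir"
  cat > "${dropin_dir}/override.conf" <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
EOF
}
```

Save it in a script that calls write_ollama_override at the end, run that script with sudo, and then reload systemd and restart Ollama as described next.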

Now reload systemd so that it picks up the edited unit file, and then restart Ollama.

$ sudo systemctl daemon-reload
$ sudo systemctl restart ollama.service

Confirm ollama is listening on all interfaces.

$ netstat -a | grep 11434
tcp6 0 0 [::]:11434 [::]:* LISTEN
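netstat comes from the net-tools package, which may not be installed on a fresh Ubuntu 22.04 system. The ss tool from iproute2 gives the same answer; this sketch (port_listening is a hypothetical helper) prints the listening socket for a port, or fails if nothing is bound to it.

```shell
#!/usr/bin/env bash
# Show listening TCP sockets on a given port using ss (iproute2).
# -l listening, -t TCP, -n numeric ports, -H suppress the header line.
port_listening() {
  local port="${1:-11434}" lines
  lines=$(ss -ltnH "sport = :${port}")
  if [ -n "$lines" ]; then
    printf '%s\n' "$lines"
  else
    echo "nothing listening on port ${port}" >&2
    return 1
  fi
}
```

A local address such as *:11434 or [::]:11434 in the output means Ollama is bound to all interfaces, while 127.0.0.1:11434 means the OLLAMA_HOST change has not taken effect.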

Test connectivity from a remote host. If you are unable to reach the IP/port, you may need to open port 11434 in the firewall on the Ollama host.

$ telnet 10.1.10.14 11434
Trying 10.1.10.14...
Connected to 10.1.10.14.
Escape character is '^]'.
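Telnet only proves that the TCP port is open. An HTTP request to Ollama's /api/version endpoint also confirms that Ollama itself is answering, and it works on hosts without telnet installed. A minimal sketch, assuming curl is available on the remote host (check_ollama_api is a hypothetical helper; 10.1.10.14 is my Ollama host):

```shell
#!/usr/bin/env bash
# Ask a remote Ollama server for its version over HTTP.
# -f fail on HTTP errors, -s quiet, -S still show errors,
# --max-time give up instead of hanging on a filtered port.
check_ollama_api() {
  local host="${1:-10.1.10.14}" port="${2:-11434}"
  if curl -fsS --max-time 5 "http://${host}:${port}/api/version"; then
    echo
    echo "Ollama reachable at ${host}:${port}"
  else
    echo "Ollama NOT reachable at ${host}:${port}" >&2
    return 1
  fi
}
```

On success it prints the JSON version response followed by a confirmation line; pass a different host and port as arguments to test another server.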

Install OpenWebUI

I am using the open-webui:cuda image because I am running dual NVIDIA GPUs. I want OpenWebUI to bind to the primary interface on the host so that I can access it from my workstation.

$ docker run -d -p 10.1.10.14:3000:8080 --gpus all \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui --restart always ghcr.io/open-webui/open-webui:cuda

The -v option in the command above creates a named volume for OpenWebUI. You can verify it with the command below.

$ docker volume ls
DRIVER    VOLUME NAME
local     open-webui

Confirm that the container has started.

$ docker ps

If it failed to start, use "docker logs open-webui" to troubleshoot.

Now, in a web browser, navigate to the IP/port combination you specified in the docker run command above. You should be greeted with the OpenWebUI getting started page.
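Instead of refreshing the browser until the page loads, the wait can be scripted. OpenWebUI serves a /health endpoint, which its Docker image uses for the container health check; this sketch assumes it returns HTTP 200 once the backend is ready. The helper name, attempt count, and 2-second interval are my own arbitrary choices.

```shell
#!/usr/bin/env bash
# Poll OpenWebUI's /health endpoint until it answers or we give up.
# The default URL matches the -p 10.1.10.14:3000:8080 mapping used above.
wait_for_openwebui() {
  local url="${1:-http://10.1.10.14:3000/health}" attempts="${2:-30}" i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
      echo "OpenWebUI is up (attempt ${i})"
      return 0
    fi
    sleep 2
    i=$((i + 1))
  done
  echo "OpenWebUI did not become healthy at ${url}; check: docker logs open-webui" >&2
  return 1
}
```

With the defaults it waits up to about a minute; pass a larger attempt count as the second argument if the first start of the CUDA image is slow on your machine.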


Install ChromaDB

Use the command below to start the ChromaDB container. 10.1.10.14 is the IP of my host; adjust it to fit your environment.

$ docker run -d \
  --name chromadb \
  -p 10.1.10.14:8000:8000 \
  -v chroma_data:/chroma_db \
  --restart unless-stopped \
  chromadb/chroma

Confirm that you can reach the ChromaDB API with curl.

$ curl http://10.1.10.14:8000/api/v1
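Curling the API root only shows that something answers on port 8000. Chroma's heartbeat endpoint is a more direct liveness check, but its path depends on the Chroma release (newer images serve /api/v2, older ones /api/v1). This sketch assumes your image serves one of those two, so it tries v2 first and falls back to v1; check_chroma is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Check that ChromaDB answers its heartbeat endpoint.
# Tries the v2 API first, then falls back to v1 for older Chroma images.
check_chroma() {
  local base="${1:-http://10.1.10.14:8000}" ver
  for ver in v2 v1; do
    if curl -fsS --max-time 5 "${base}/api/${ver}/heartbeat"; then
      echo
      echo "ChromaDB heartbeat OK via /api/${ver}"
      return 0
    fi
  done
  echo "ChromaDB not reachable at ${base}" >&2
  return 1
}
```

The line reporting which API version answered is also a quick way to find out which generation of Chroma the chromadb/chroma tag pulled.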

Reinstantiate OpenWebUI with ChromaDB Connectivity

Stop the current OpenWebUI container.

$ docker stop open-webui

Delete the container. The open-webui volume is not removed, so your OpenWebUI data is kept.

$ docker rm open-webui

Recreate the container, adding the environment variables below (modify the IP to fit your environment). According to OpenWebUI's environment variable reference, CHROMA_HTTP_HOST takes a hostname or IP address rather than a URL, and when it is unset OpenWebUI uses its own local ChromaDB instance instead of the ChromaDB container.

  1. -e VECTOR_DB=chroma
  2. -e CHROMA_HTTP_HOST=10.1.10.14
  3. -e CHROMA_HTTP_PORT=8000

$ docker run -d -p 10.1.10.14:3000:8080 --gpus all \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
-e VECTOR_DB=chroma \
-e CHROMA_HTTP_HOST=10.1.10.14 \
-e CHROMA_HTTP_PORT=8000 \
--name open-webui --restart always ghcr.io/open-webui/open-webui:cuda

Check to ensure that the environment variables were set correctly.

$ docker exec open-webui env | grep -i chroma

The output should include the following lines.

VECTOR_DB=chroma
CHROMA_HTTP_HOST=10.1.10.14
CHROMA_HTTP_PORT=8000
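The environment check only shows that the variables are set, not that the OpenWebUI container can actually reach ChromaDB over the network. A sketch that runs the heartbeat request from inside the container, assuming it is named open-webui and that curl exists in the image (the image's own health check relies on it); the helper name is hypothetical, and v2/v1 are tried because the heartbeat path depends on the Chroma release.

```shell
#!/usr/bin/env bash
# From inside the open-webui container, confirm that ChromaDB is reachable.
# No -it flags: we only need the exit status, not an interactive TTY.
check_chroma_from_openwebui() {
  local container="${1:-open-webui}" base="${2:-http://10.1.10.14:8000}" ver
  for ver in v2 v1; do
    if docker exec "$container" curl -fsS --max-time 5 \
        "${base}/api/${ver}/heartbeat" >/dev/null 2>&1; then
      echo "${container} can reach ChromaDB (/api/${ver})"
      return 0
    fi
  done
  echo "${container} cannot reach ${base}" >&2
  return 1
}
```

If this fails while the same request works from the host, look at the host firewall or the address ChromaDB is bound to rather than at OpenWebUI itself.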


Setup RAG in OpenWebUI

To set up RAG, you will need to do the following.

  1. Create a Knowledge Base
  2. Upload files
  3. Create the Model that will use the Knowledge Base

First, we will create a knowledge base. Navigate to Workspace > Knowledge > + Create a Knowledge Base. Choose a name for your knowledge base and add a description.

We are going to name ours "Gordon Lightfoot".

Now select Create Knowledge.

Look closely for the text "Drag and drop a file to upload or select a file to view". This is where you will drag and drop your documents. We have two documents to add to our collection.

You can also upload entire directories or sync with a directory.

Now navigate to Workspace > Models > + Add New Model. Enter a name for your custom model and choose a base model. I am choosing tinyllama for this test.

Scroll down, attach the Gordon Lightfoot knowledge base in the Knowledge section so the model uses it, and then select Save & Create.

Now let’s chat with our new model. Select Workspace and then select your new model (My Gordon Lightfoot Model).

You are now ready to chat with your model.

Let’s ask it a question
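Besides the chat window, OpenWebUI exposes an OpenAI-compatible endpoint, /api/chat/completions, which is handy for asking questions from a script. A minimal sketch, assuming you have created an API key in OpenWebUI (under Settings > Account) and exported it as OPENWEBUI_API_KEY. The model ID my-gordon-lightfoot-model and the OPENWEBUI_URL variable are hypothetical, so substitute the ID OpenWebUI shows for your custom model; I am also assuming the model's attached knowledge base is applied to API requests the same way it is in the chat UI.

```shell
#!/usr/bin/env bash
# Send one question to an OpenWebUI model via /api/chat/completions.
# Assumptions: OPENWEBUI_API_KEY holds a valid API key; the default model ID
# and the OPENWEBUI_URL override are hypothetical placeholders.
ask_openwebui() {
  local model="${1:-my-gordon-lightfoot-model}" question="$2" escaped
  local url="${OPENWEBUI_URL:-http://10.1.10.14:3000}/api/chat/completions"
  # Escape backslashes and double quotes so the question is a valid JSON string
  # (single-line questions only; newlines are not escaped).
  escaped=$(printf '%s' "$question" | sed 's/\\/\\\\/g; s/"/\\"/g')
  curl -fsS "$url" \
    -H "Authorization: Bearer ${OPENWEBUI_API_KEY:?set OPENWEBUI_API_KEY first}" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"${model}\", \"messages\": [{\"role\": \"user\", \"content\": \"${escaped}\"}]}"
}
```

Usage: ask_openwebui my-gordon-lightfoot-model "Which albums are mentioned in these documents?" prints the raw JSON response, with the answer in the message content of the first choice.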

So there you go: a working RAG implementation with Ollama, OpenWebUI, and ChromaDB. There are many more features to explore here, and I have a lot more tuning to do, but for now I am off to a good start.


References

  1. https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
  2. https://docs.openwebui.com/tutorials/tips/rag-tutorial/
