

Llama.cpp with Vulkan

Running Llama.cpp with the Vulkan backend on my AMD and Intel GPUs turns out to be straightforward and surprisingly fast.
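The build below expects to run from the root of a llama.cpp checkout, since it references the .devops/vulkan.Dockerfile file shipped in the repository:

git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp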

Build your container with:

podman build -t llama-cpp-vulkan --target server -f .devops/vulkan.Dockerfile .
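Before starting the container, you may want to confirm that Vulkan actually sees your GPUs on the host. The vulkaninfo tool, from the vulkan-tools package on most distributions, prints a short device list:

vulkaninfo --summary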

Then run the container with:

podman run -it --rm --security-opt seccomp=unconfined --device /dev/dri:/dev/dri --volume ~/.cache:/root/.cache:z -p 8080:8080 localhost/llama-cpp-vulkan -hf Qwen/Qwen3-Embedding-8B-GGUF --embeddings

In the example above, I mount my ~/.cache directory so the downloaded model is reused across runs, and I use the model to compute embeddings, hence the --embeddings flag.
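Once the server is up, a quick way to check it (assuming the default port 8080 from the run command above) is to query the OpenAI-compatible embeddings endpoint exposed by llama-server:

curl http://localhost:8080/v1/embeddings -H "Content-Type: application/json" -d '{"input": "Hello from Vulkan"}'

The response is a JSON document containing the embedding vector computed for the input string.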