Llama.cpp with Vulkan
Running Llama.cpp with the Vulkan backend on my AMD and Intel GPUs turns out to be straightforward and surprisingly fast.
Build your container with:
podman build -t llama-cpp-vulkan --target server -f .devops/vulkan.Dockerfile .
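To confirm the container actually sees your GPUs before loading a model, you can ask llama-server to enumerate its devices. This is a quick sketch that assumes a recent llama.cpp build providing the --list-devices flag:

podman run --rm --security-opt seccomp=unconfined --device /dev/dri:/dev/dri localhost/llama-cpp-vulkan --list-devices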
and run it with:
podman run -it --rm --security-opt seccomp=unconfined --device /dev/dri:/dev/dri --volume ~/.cache:/root/.cache:z -p 8080:8080 localhost/llama-cpp-vulkan -hf Qwen/Qwen3-Embedding-8B-GGUF --embeddings
In the example above, I reuse my ~/.cache directory so models downloaded with -hf persist across runs, and I use the model to compute embeddings, hence the --embeddings flag.
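Once the server is up, you can request embeddings over HTTP. This is a minimal sketch assuming the server's OpenAI-compatible /v1/embeddings endpoint and the 8080 port mapping from the run command above:

curl http://localhost:8080/v1/embeddings -H "Content-Type: application/json" -d '{"input": "Hello, Vulkan!"}'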