1 July 2026 · 5 min read · Emre Yurtbay

Self-hosting Qdrant: a vector database for RAG via Docker in 15 minutes

A runnable minimal example – start Qdrant via Docker with an API key and a persistent volume, create a collection, upsert vectors, run a similarity query via curl – plus the Gridstore upgrade trap.

Tags: Qdrant, Vector Database, RAG, Embeddings, Self-Hosting, Docker, DevOps, AI

If you want an LLM to answer questions over your own company data, Retrieval-Augmented Generation (RAG) is usually the first approach to reach for: instead of fine-tuning the model, you retrieve the relevant documents for each query and pass them along as context. The heart of that retrieval is a vector database. Qdrant is one of the most popular open-source options and can be self-hosted in minutes. This article stands up a minimal but complete Qdrant service, secured with an API key and backed by persistent storage, and walks the core idea end to end: create a collection, upsert vectors, run a similarity query. It complements the previously published LLM-serving post with the missing retrieval layer.

What Qdrant does – and the core idea

An embedding model turns text (or images, code, …) into a vector of numbers. Semantically similar content ends up close together in the vector space. Qdrant stores those vectors together with metadata (the payload) and answers the question "which stored vectors are closest to a query vector?" – fast, even with millions of entries.

The one decision you must understand is the distance metric: it has to match the embedding model. Models like OpenAI's text-embedding-3 or many sentence transformers are designed for Cosine; others expect Dot or Euclid. Choose wrong and retrieval quality degrades silently: the ranking gets worse, but nothing crashes to tell you. The second decision is the vector dimension, which is fixed by the model (e.g. 1536 for text-embedding-3-small) and must equal the size you set when creating the collection.
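To see why the metric matters, here is a minimal sketch in plain Python. The two candidate vectors are hypothetical and chosen only to make the effect visible; the functions are textbook definitions of the three metrics, not Qdrant's internal implementation.

```python
import math

def dot(a, b):
    """Dot product: sensitive to both direction and length."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity: direction only, length is normalised away."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclid(a, b):
    """Euclidean distance: smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical 2-dimensional vectors, for illustration only.
query = [1.0, 0.0]
candidates = {
    "short": [0.5, 0.0],  # same direction as the query, but short
    "long": [3.0, 3.0],   # 45 degrees off, but much longer
}

# Similarities are maximised, distances are minimised.
winner_cosine = max(candidates, key=lambda k: cosine(query, candidates[k]))
winner_dot = max(candidates, key=lambda k: dot(query, candidates[k]))
winner_euclid = min(candidates, key=lambda k: euclid(query, candidates[k]))

print(winner_cosine, winner_dot, winner_euclid)  # short long short
```

Same data, same query, different winner: Dot rewards the longer vector, while Cosine and Euclid prefer the one pointing the same way. With vectors normalised to length 1, Cosine and Dot produce the same ranking.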

Starting Qdrant via Docker

A single compose.yaml is enough. We set an API key right away – more on that in the pitfalls – and mount a volume so the data survives a restart:

services:
  qdrant:
    image: qdrant/qdrant:v1.18.2
    ports:
      - "6333:6333"
      - "6334:6334"
    environment:
      QDRANT__SERVICE__API_KEY: <YOUR_API_KEY>
    volumes:
      - qdrant_storage:/qdrant/storage
    restart: unless-stopped

volumes:
  qdrant_storage:

Start and check:

docker compose up -d
curl -s http://localhost:6333/healthz

The web dashboard lives at http://localhost:6333/dashboard. The health check above answers without a key, but every other API request now needs the api-key header. We store the key once as a shell variable:

export QDRANT_KEY='<YOUR_API_KEY>'
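If you would rather script the calls than type curl, the same header can be attached with Python's standard library. This is a sketch under assumptions: `qdrant_request` and `BASE_URL` are hypothetical names introduced here, and the helper only builds the request without sending it, so it runs even when no Qdrant instance is up.

```python
import json
import os
import urllib.request

BASE_URL = "http://localhost:6333"  # REST port from the compose file

def qdrant_request(method, path, body=None, api_key=None):
    """Build (but do not send) a request that carries the api-key header."""
    # Fall back to the QDRANT_KEY shell variable exported above.
    key = api_key if api_key is not None else os.environ["QDRANT_KEY"]
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("api-key", key)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req

# "example-key" is a placeholder, not a real key.
req = qdrant_request("GET", "/collections", api_key="example-key")
print(req.get_method(), req.full_url)  # GET http://localhost:6333/collections
```

To actually send it, pass the object to `urllib.request.urlopen(req)` and read the JSON response.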

The minimal example: collection, upsert, query

For illustration we take tiny 4-dimensional vectors by hand – in practice your embedding model produces them. First create the collection with dimension and metric:

curl -s -X PUT http://localhost:6333/collections/docs \
  -H "api-key: $QDRANT_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "vectors": { "size": 4, "distance": "Cosine" }
  }'

Now upsert a few points. Each point has an id, the vector, and a free-form payload (here the original text you will later hand to the LLM):

curl -s -X PUT http://localhost:6333/collections/docs/points \
  -H "api-key: $QDRANT_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "points": [
      { "id": 1, "vector": [0.10, 0.20, 0.30, 0.90],
        "payload": { "text": "cancel invoice" } },
      { "id": 2, "vector": [0.90, 0.10, 0.10, 0.20],
        "payload": { "text": "reset password" } },
      { "id": 3, "vector": [0.15, 0.25, 0.35, 0.85],
        "payload": { "text": "request a credit note" } }
    ]
  }'
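Qdrant rejects a point whose vector length differs from the collection's size, so it can be worth checking a batch locally before sending it. The following is a minimal sketch; `validate_points` is a hypothetical helper, not part of any Qdrant client.

```python
import math

def validate_points(points, size):
    """Return a list of problems found in a batch; an empty list means OK."""
    problems = []
    seen_ids = set()
    for point in points:
        pid = point.get("id")
        vector = point.get("vector", [])
        if pid in seen_ids:
            problems.append(f"id {pid}: appears more than once in this batch")
        seen_ids.add(pid)
        if len(vector) != size:
            problems.append(
                f"id {pid}: vector has {len(vector)} dimensions, "
                f"collection expects {size}"
            )
        elif not all(math.isfinite(x) for x in vector):
            problems.append(f"id {pid}: vector contains NaN or infinity")
    return problems

# The three points from the upsert call above.
points = [
    {"id": 1, "vector": [0.10, 0.20, 0.30, 0.90], "payload": {"text": "cancel invoice"}},
    {"id": 2, "vector": [0.90, 0.10, 0.10, 0.20], "payload": {"text": "reset password"}},
    {"id": 3, "vector": [0.15, 0.25, 0.35, 0.85], "payload": {"text": "request a credit note"}},
]

print(validate_points(points, size=4))  # []
```

In a real pipeline the `size` argument would come from your embedding model's output dimension rather than a hard-coded 4.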

And finally the similarity search: the three nearest hits for a query vector, including payload:

curl -s -X POST http://localhost:6333/collections/docs/points/query \
  -H "api-key: $QDRANT_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": [0.12, 0.22, 0.32, 0.88],
    "limit": 3,
    "with_payload": true
  }'

Qdrant returns the points sorted by score, best match first. The query vector sits close to points 1 and 3, so those come out on top (point 1 first, then point 3), while point 2 scores far lower. That is the entire RAG retrieval step in one call: the returned text payloads are what you would now pass as context to your LLM.
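You can reproduce that ranking without a running server. The sketch below recomputes cosine similarity for the three example points in plain Python and then assembles the two best payloads into a prompt. The scores Qdrant reports may differ in the last digits, and the prompt wording is a hypothetical placeholder, not a recommended template.

```python
import math

# id -> (vector, payload text), copied from the upsert example.
points = {
    1: ([0.10, 0.20, 0.30, 0.90], "cancel invoice"),
    2: ([0.90, 0.10, 0.10, 0.20], "reset password"),
    3: ([0.15, 0.25, 0.35, 0.85], "request a credit note"),
}
query = [0.12, 0.22, 0.32, 0.88]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Highest similarity first, as in the query response.
ranked = sorted(points, key=lambda pid: cosine(query, points[pid][0]), reverse=True)
for pid in ranked:
    print(pid, round(cosine(query, points[pid][0]), 3), points[pid][1])
# 1 0.999 cancel invoice
# 3 0.998 request a credit note
# 2 0.374 reset password

# Keep the two best hits as context for the LLM.
context = "\n".join(f"- {points[pid][1]}" for pid in ranked[:2])
prompt = f"Answer using only this context:\n{context}\n\nQuestion: <user question>"
```

Points 1 and 3 score almost identically, while point 2 lands far behind. That gap is what a score threshold or a smaller `limit` would exploit to keep irrelevant text out of the prompt.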

The data flow at a glance:

   +-----------+   embedding   +----------------+   upsert    +------------+
   |  Document | ------------> |  vector + text | ----------> |   Qdrant   |
   +-----------+               +----------------+             | collection |
                                                              |   "docs"   |
   +-----------+   embedding   +----------------+   query     +-----+------+
   |   Query   | ------------> |  query vector  | ----------------> |
   +-----------+               +----------------+   top-k      +----v-------+
                                                               |  hits +    |
                                                               |  payload   |
                                                               +------------+

Three common pitfalls

1. No API key = open full access. If Qdrant starts without QDRANT__SERVICE__API_KEY, it runs with no authentication at all – anyone with network access can read, write, and delete collections. Always set the key (the env format is the prefix QDRANT__, with nested keys separated by double underscores) and never expose port 6333 unprotected to the internet. For production put TLS in front, otherwise the key travels in plain text.

2. Port 6333 (REST) vs. 6334 (gRPC). Qdrant offers two interfaces: REST/HTTP on 6333 and gRPC on 6334. The curl examples above use REST. When a client SDK reports "connection refused", a common cause is that the gRPC port 6334 is not mapped, since several of the official clients speak gRPC. When in doubt, map both.

3. The upgrade landmine v1.15 → v1.17. With v1.17.0 the old storage engine RocksDB was fully replaced by Gridstore. A direct jump from v1.15.x to v1.17.x (or the current v1.18.2) fails with an unsupported storage error. You have to upgrade one minor version per step: first v1.15 → v1.16, start the service cleanly once and let the migration run, then v1.16 → v1.17, and so on. Pin a concrete tag instead of :latest in your Compose file so that a docker compose pull does not accidentally carry you across this boundary.
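The three pitfalls above lend themselves to a small pre-flight script. This is a sketch under assumptions: the helper names `env_name`, `port_open`, and `upgrade_steps` are hypothetical, and `upgrade_steps` only encodes the one-minor-version-per-step rule described here. It does not know which patch releases exist.

```python
import socket

def env_name(config_path):
    """Pitfall 1: map a config path like 'service.api_key' to its env variable."""
    return "QDRANT__" + "__".join(part.upper() for part in config_path.split("."))

def port_open(host, port, timeout=1.0):
    """Pitfall 2: return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def upgrade_steps(current, target):
    """Pitfall 3: list the minor versions to pass through, one per step."""
    cur_major, cur_minor = (int(x) for x in current.lstrip("v").split(".")[:2])
    tgt_major, tgt_minor = (int(x) for x in target.lstrip("v").split(".")[:2])
    if cur_major != tgt_major:
        raise ValueError("this sketch only handles upgrades within one major version")
    if tgt_minor < cur_minor:
        raise ValueError("target is older than the current version")
    return [f"v{cur_major}.{m}" for m in range(cur_minor + 1, tgt_minor + 1)]

print(env_name("service.api_key"))         # QDRANT__SERVICE__API_KEY
print(upgrade_steps("v1.15", "v1.18.2"))   # ['v1.16', 'v1.17', 'v1.18']
```

With the service running, `port_open("localhost", 6333)` and `port_open("localhost", 6334)` show at a glance whether both interfaces are mapped. For each entry that `upgrade_steps` returns, pin the matching image tag, start the service once, and only then move on to the next.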

Where to go from here

You now have the foundation: a Qdrant service, secured and persistent, that ingests vectors and answers similarity searches. The obvious next steps are a real embedding model instead of hand-written vectors, filtered searches over payload fields, and one of the official clients instead of curl. Go deeper in the Qdrant quickstart, the security guide, and the v1.17.0 release notes before you attempt an upgrade.

Note: The articles on this blog are produced with the help of AI and are editorially reviewed before publication. Editorial responsibility lies with Emre Yurtbay (see the Impressum).