Qdrant's Vector Space Hackathon: Pushing Vector Search Boundaries
By Wren · June 23, 2026 · 3 min read
Qdrant's 2026 hackathon banned the two most common things people build with vector databases: RAG pipelines and chatbots. The result is a useful catalog of what else a vector store can do—from early mental-health detection to pre-deploy infrastructure simulation.
The constraint that shaped everything
The "Think Outside the Bot" hackathon ran for five weeks and awarded $10k in prizes, with winners announced at Vector Space Day 2026. The rules forced participants away from retrieval-augmented generation and simple chat interfaces, judging instead on innovation, creativity, and technical depth.
What's interesting for engineers is how often the winning projects treat embeddings as the primary compute layer—not a retrieval bolt-on—and push the language model to the edges of the system, if they use one at all.
Multi-vector designs led the field
The top submissions lean heavily on Qdrant's named vectors feature, storing several distinct embeddings under a single point ID.
MemoryAtlas (1st place, Aritra Mazumder) stores six vectors per memory: semantic meaning, emotion distribution, audio prosody, linguistic structure, cross-modal dissonance, and raw audio. It uses TwelveLabs Marengo 3.0 for 512-dimensional audio embeddings independent of the transcript, plus a custom PyTorch transformer that encodes 14-day emotional sequences to catch recurring mental-health spirals. When a spiral is confirmed, it uses the Recommend API with current entries as negative examples and a datetime payload filter to retrieve the user's prior recovery window. No query is ever typed.
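The recovery lookup can be pictured without the client library. What follows is a plain-Python sketch, not the Qdrant API and not MemoryAtlas's code. The point layout, the field names (`vectors`, `recorded_at`), the toy two-dimensional vectors, and the scoring rule are all assumptions made for illustration. The rule used here is a simplification of negative-example recommendation: inside the date window, rank entries by how little they resemble the current ones.

```python
from datetime import datetime
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recall_recovery_window(points, negatives, vector_name, start, end, limit=3):
    """Return ids of in-window points least similar to the negative examples."""
    # Payload-style filter: keep only entries recorded inside the window.
    in_window = [p for p in points if start <= p["recorded_at"] < end]

    def max_similarity_to_negatives(point):
        vec = point["vectors"][vector_name]
        return max(cosine(vec, neg) for neg in negatives)

    # Entries that look least like the current (negative) ones come first.
    ranked = sorted(in_window, key=max_similarity_to_negatives)
    return [p["id"] for p in ranked[:limit]]


# Hypothetical memories, each carrying several named vectors under one id.
memories = [
    {"id": "m1", "recorded_at": datetime(2026, 3, 2),
     "vectors": {"emotion": [0.9, 0.1], "semantic": [0.2, 0.8]}},
    {"id": "m2", "recorded_at": datetime(2026, 3, 9),
     "vectors": {"emotion": [0.1, 0.9], "semantic": [0.7, 0.3]}},
    {"id": "m3", "recorded_at": datetime(2026, 5, 20),
     "vectors": {"emotion": [0.0, 1.0], "semantic": [0.5, 0.5]}},
]
current_spiral = [[1.0, 0.0], [0.8, 0.2]]  # emotion vectors of current entries

recovery_ids = recall_recovery_window(
    memories, current_spiral, "emotion",
    start=datetime(2026, 3, 1), end=datetime(2026, 4, 1),
)
print(recovery_ids)
```

In this toy data `m3` is the least similar entry overall, but the date filter excludes it. That is the point of combining negative examples with a datetime payload filter: the search is steered away from the current state and confined to a known earlier period.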
Cardinal (honorable mention, Nicholas Zhu) takes the same six-named-vector approach for DeFi yield products—narrative, risk, yield source, correlation, tax treatment, composability—and makes a sharper architectural argument. It touches a language model only twice per session: once to parse English into a structured query, once to narrate results. Everything in between is Qdrant doing Recommend with positive/negative anchors, multi-vector prefetch with RRF fusion, payload-filtered HNSW traversal, and the Discovery API. The pitch is that deterministic, inspectable vectors don't hallucinate the way an LLM in the decision seat does.
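Reciprocal Rank Fusion, the step that merges the per-vector result lists, is simple enough to sketch in a few lines. The version below is a generic plain-Python illustration, not Qdrant's implementation or Cardinal's code. The constant `k=60` is the value commonly used in the RRF literature and is an assumption here, and the vault ids are made up.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60):
    """Fuse several ranked id lists with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, point_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every id it contains.
            scores[point_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first; break ties by id for determinism.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


# Hypothetical per-vector rankings, one list per named vector searched.
by_risk = ["vault_a", "vault_b", "vault_c"]
by_yield_source = ["vault_b", "vault_d", "vault_a"]
by_narrative = ["vault_b", "vault_a", "vault_e"]

fused = rrf_fuse([by_risk, by_yield_source, by_narrative])
print(fused)
```

Because only ranks enter the formula, lists scored on different scales (say, a risk embedding and a narrative embedding) can be merged without normalizing their raw similarity scores. The result is also reproducible for a given input, which fits the project's emphasis on inspectable behavior.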
Vectors as simulation and memory
A few projects use the database to model state over time rather than just retrieve documents.
Crowd Whisperer (2nd place, Latent DJs) represents each listener as a 532-dimensional hybrid vector (LAION-CLAP semantic plus acoustic features). Every 15 seconds it scores each persona with a reward formula—taste affinity, transition surprise, novelty, energy coherence, fatigue—then drifts the preference vector to model how exposure reshapes taste during a set. The output is a 2D time series showing whether a crowd converges or polarizes.
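A minimal sketch of one scoring tick might look like the following. The source names the five reward terms but not their weights or the drift rule, so the weights, the linear blend, and the reward-scaled drift rate below are all hypothetical. The vectors are also shortened to two dimensions for readability.

```python
from math import sqrt


def normalize(vec):
    """Scale a vector to unit length."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def reward(terms, weights):
    """Weighted sum of reward terms; fatigue carries a negative weight."""
    return sum(weights[name] * value for name, value in terms.items())


def drift(preference, track, rate):
    """Move a preference vector a fraction `rate` toward a track vector."""
    blended = [(1 - rate) * p + rate * t for p, t in zip(preference, track)]
    return normalize(blended)


# Hypothetical weights: the terms come from the writeup, the values do not.
WEIGHTS = {"taste_affinity": 0.4, "transition_surprise": 0.15, "novelty": 0.15,
           "energy_coherence": 0.2, "fatigue": -0.1}

listener = normalize([1.0, 0.0])
track = normalize([0.0, 1.0])
terms = {"taste_affinity": 0.2, "transition_surprise": 0.6, "novelty": 0.8,
         "energy_coherence": 0.5, "fatigue": 0.3}

score = reward(terms, WEIGHTS)
# Assumption: a better-received track pulls taste harder; a negative
# reward causes no drift at all.
listener = drift(listener, track, rate=0.1 * max(score, 0.0))
print(round(score, 2), [round(x, 3) for x in listener])
```

Repeating this tick for every persona and logging the vectors over time would produce the kind of trajectory the project plots, with personas either clustering or spreading apart as the set progresses.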
Synthara (3rd place, Mohammed Roqa) uses Qdrant Cloud as persistent game memory. Every player decision becomes an embedding forming a searchable "soul history," and gemini-2.5-flash synthesizes NPC dialogue that references past deeds. Progression gates stay locked until the player accumulates enough episodic memories—three distinct entries before The Black Citadel opens, per the writeup.
Anomaly detection without labels
Two honorable mentions share one insight: derive "normal" from the data itself, then flag whatever falls far from it.
Afterimage (Karan Singh Bisht) embeds watched regions of video frames with CLIP into a single collection, then runs three Qdrant queries per sampled frame: a filtered nearest-neighbor search whose result is checked against a derived floor of mean minus three sigma, a distance map across regions built with search_matrix_pairs, and a best_score Recommend query to surface the strongest outlier. No training and no per-object rules are needed, and because the embeddings come from CLIP, the same space can also be searched in plain language.
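The mean-minus-three-sigma floor is plain arithmetic and can be sketched with the standard library. This is one reading of the technique, not Afterimage's code. It assumes the floor is computed over nearest-neighbor similarity scores from footage treated as normal, and that a new frame is flagged when its best match scores below that floor. The baseline numbers are invented.

```python
from statistics import mean, pstdev


def similarity_floor(baseline_scores, sigmas=3.0):
    """Lower bound on 'normal' similarity: mean minus `sigmas` std deviations."""
    return mean(baseline_scores) - sigmas * pstdev(baseline_scores)


def is_anomalous(nearest_score, floor):
    """A region is flagged when even its closest match is below the floor."""
    return nearest_score < floor


# Hypothetical nearest-neighbor similarities observed on normal frames.
baseline = [0.95, 0.96, 0.94, 0.97, 0.95, 0.96, 0.93, 0.96]
floor = similarity_floor(baseline)

print(round(floor, 3))
print(is_anomalous(0.95, floor), is_anomalous(0.80, floor))
```

No labels are involved. The threshold moves with whatever the baseline footage looks like, which is why the approach needs neither training nor per-object rules.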
Black Swan (Yusra) shifts that idea earlier in the lifecycle, converting plain-text infrastructure descriptions into embeddings, retrieving similar production patterns, and simulating cascading failures—database outages, broker failures, DNS issues, regional disruptions—to estimate blast radius before deployment.
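The writeup does not say how the cascade simulation works internally. One simple way to estimate a blast radius, shown here purely as an assumption, is a breadth-first walk over a dependency graph, starting from the failed component. The topology and service names below are hypothetical, and the embedding retrieval step is left out.

```python
from collections import deque


def blast_radius(dependents, failed):
    """Return every service reachable from `failed` through dependency edges."""
    affected = set()
    queue = deque([failed])
    while queue:
        service = queue.popleft()
        for downstream in dependents.get(service, []):
            if downstream not in affected and downstream != failed:
                affected.add(downstream)
                queue.append(downstream)
    return affected


# Hypothetical topology: if the key fails, the listed services are impacted.
DEPENDENTS = {
    "dns": ["api-gateway"],
    "postgres": ["orders", "billing"],
    "kafka": ["orders", "notifications"],
    "orders": ["api-gateway"],
}

impacted = sorted(blast_radius(DEPENDENTS, "postgres"))
print(impacted)
```

In a setup like the one described, the graph itself could plausibly be assembled from the retrieved production patterns rather than written by hand, but that step is outside this sketch.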
Rounding out the list: DejaPlay indexes StatsBomb football possessions for tactical similarity search with Mistral-powered natural-language queries, and Sprouty turns 90 seconds of voice input into a 12-week gardening plan via hybrid retrieval.
What to take from it
The throughline is that several teams deliberately demoted the LLM and let vector operations—Recommend with negative anchors, datetime payload filters, matrix-pair distance maps, RRF fusion—carry the logic. If you've been treating your vector store as a passive retrieval layer behind a model, these projects are a concrete argument for inverting that. The repositories are linked in Qdrant's writeup if you want to see the implementations.
This summary is based on Qdrant's writeup; no first-hand testing of the projects is implied.