Poly-DB, also called the Polymarket Vector Derivatives system, is a tool by Roman Slack that scrapes prediction markets from Polymarket and vectorizes them using semantic embeddings. By converting each market's question and text into a vector, Poly-DB makes it possible to reason about relationships between markets mathematically rather than by keyword matching, enabling fast similarity search across hundreds of markets.
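To make the idea of comparing markets as vectors concrete, here is a minimal sketch of cosine-similarity search. It is illustrative only: the three-dimensional vectors and market titles are made up and stand in for real embeddings, and `cosine_similarity` and `top_k` are hypothetical helper names, not part of Poly-DB's API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)

def top_k(query_vec, market_vecs, k=2):
    """Return the k (market, score) pairs most similar to the query vector."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in market_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Made-up toy vectors; a real embedding model would produce much longer ones.
MARKETS = {
    "Will the Fed cut rates in March?": [0.9, 0.1, 0.0],
    "Will the Fed lower interest rates by Q1?": [0.8, 0.2, 0.1],
    "Will it snow in Miami this year?": [0.0, 0.1, 0.9],
}

# A query vector pointing in roughly the same direction as the two Fed markets.
results = top_k([1.0, 0.1, 0.0], MARKETS, k=2)
```

The two Fed-related markets rank first because their vectors point in nearly the same direction as the query, even though their wording differs.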
The system supports natural-language search, derivative detection, market clustering, and arbitrage discovery. Users can query in plain English to find relevant markets, identify near-duplicate or closely related markets through semantic similarity scores, and surface pricing inefficiencies between markets that describe substantially the same outcome. Similarity scores are read on a graded scale, from near-duplicates at the top of the range down to unrelated markets at the bottom, and each pair's position on that scale indicates how strongly the two markets are connected.
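The graded scale and the arbitrage check can be sketched as follows. This is a rough illustration under stated assumptions: the tier cut-offs, the 5-cent minimum gap, the market ids, and the prices are all hypothetical, since the thresholds Poly-DB actually uses are not given here.

```python
from itertools import combinations

# Hypothetical cut-offs for the graded scale (assumed, not Poly-DB's values).
TIERS = [
    (0.90, "near-duplicate"),
    (0.75, "closely related"),
    (0.50, "loosely related"),
]

def classify(score):
    """Map a similarity score to a label, checking tiers from highest to lowest."""
    for threshold, label in TIERS:
        if score >= threshold:
            return label
    return "unrelated"

def find_price_gaps(markets, similarity, min_score=0.90, min_gap=0.05):
    """Flag highly similar market pairs whose YES prices differ by at least min_gap.

    markets: dict of market id -> YES price in [0, 1]
    similarity: dict of frozenset({id_a, id_b}) -> similarity score
    """
    gaps = []
    for a, b in combinations(sorted(markets), 2):
        score = similarity.get(frozenset((a, b)), 0.0)  # missing pair: unrelated
        gap = abs(markets[a] - markets[b])
        if score >= min_score and gap >= min_gap:
            gaps.append((a, b, round(gap, 4)))
    return gaps

# Made-up prices and scores for illustration.
prices = {"fed-cut-march": 0.62, "fed-lower-q1": 0.55, "miami-snow": 0.02}
scores = {
    frozenset(("fed-cut-march", "fed-lower-q1")): 0.93,
    frozenset(("fed-cut-march", "miami-snow")): 0.08,
}
candidates = find_price_gaps(prices, scores)
```

A flagged pair is only a candidate: two markets can read alike yet resolve under different rules, so the gap still needs a manual check.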
Roman Slack built Poly-DB in Python with a production-oriented architecture: market data is pulled from the Polymarket Gamma API, text is embedded using Sentence Transformers (all-MiniLM-L6-v2, 384 dimensions), and vectors are persisted locally in ChromaDB, so no external database service is required. A FastAPI backend exposes REST endpoints for stats, search, derivatives, and arbitrage; a minimal Flask web UI allows interactive exploration; and a CLI tool handles batch scraping and analysis. The entire stack is containerized with Docker Compose for straightforward deployment.
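One step of that pipeline, turning a scraped market into something a vector store can hold, might look like the sketch below. It is an assumption-laden illustration: the field names (`id`, `question`, `description`, `slug`, `outcomePrices`) and the JSON-encoded price list are assumed about the API response, the sample market is made up, and `to_record` is a hypothetical helper rather than Poly-DB's code.

```python
import json

def to_record(market):
    """Turn one market dict into an (id, document, metadata) triple."""
    question = market.get("question", "").strip()
    description = market.get("description", "").strip()
    # Join question and description so both contribute to the embedded text.
    document = f"{question}\n{description}".strip()

    # Assumed format: prices arrive as a JSON-encoded list of strings, YES first.
    raw_prices = market.get("outcomePrices") or "[]"
    prices = [float(p) for p in json.loads(raw_prices)]

    metadata = {"slug": market.get("slug", "")}
    if prices:
        metadata["yes_price"] = prices[0]  # omitted entirely when no price is present
    return str(market["id"]), document, metadata

# Made-up sample in the assumed response shape.
sample_market = {
    "id": 101,
    "question": " Will the Fed cut rates in March? ",
    "description": "Resolves YES if the target rate is lowered.",
    "slug": "fed-cut-march",
    "outcomePrices": "[\"0.62\", \"0.38\"]",
}

record = to_record(sample_market)
```

The three parts of the triple line up with the `ids`, `documents`, and `metadatas` arguments that ChromaDB's `collection.add` accepts, which is why the record is shaped this way.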
Key Features
- Semantic natural-language search across Polymarket markets
- Derivative detection via semantic similarity scoring
- Arbitrage detection between closely related markets
- Automatic topic-based market clustering
- FastAPI REST backend with search, derivatives, and arbitrage endpoints
- CLI tool and Dockerized deployment
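As an illustration of the clustering feature listed above, the sketch below groups markets with a simple greedy pass over a cosine-similarity threshold. The clustering algorithm Poly-DB uses is not stated here, so this is a stand-in technique; the 0.8 threshold, the market ids, and the toy vectors are all assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def greedy_clusters(vectors, threshold=0.8):
    """Put each item in the first cluster whose seed is similar enough,
    otherwise start a new cluster with that item as its seed."""
    clusters = []  # list of (seed_vector, member_names)
    for name, vec in vectors.items():
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(name)
                break
        else:
            # No existing cluster matched, so this item seeds a new one.
            clusters.append((vec, [name]))
    return [members for _, members in clusters]

# Made-up toy vectors standing in for real embeddings.
VECTORS = {
    "fed-cut-march": [0.9, 0.1, 0.0],
    "fed-lower-q1": [0.8, 0.2, 0.1],
    "miami-snow": [0.0, 0.1, 0.9],
    "nyc-snow": [0.1, 0.0, 0.9],
}

groups = greedy_clusters(VECTORS)
```

A greedy pass depends on input order and on the chosen threshold, so it is best read as a demonstration of the idea rather than a tuned method.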
Tech Stack
- Python
- Polymarket Gamma API
- Sentence Transformers (all-MiniLM-L6-v2)
- ChromaDB
- FastAPI
- Flask
- Docker Compose
Designed and built by Roman Slack, Lead AI Platform Engineer. See more of Roman Slack's work on the projects page or get in touch via the contact page.