Building a Semantic Memory System with pgvector
How we use OpenAI embeddings and pgvector to build a cross-meeting memory graph that lets you search across all your conversations with natural language.
One of the most powerful features of Huddix is the ability to search across all your meetings using natural language. Ask "What did Sarah say about the Q4 roadmap?" or "When did we discuss the new pricing model?" and you get instant, accurate answers. This post explains how we built this semantic memory system.
The Challenge
Traditional search relies on keyword matching. If you search for "Q4 roadmap" but the meeting transcript says "product plan for Q4," a keyword search can miss the relevant result. Semantic search addresses this by comparing the meaning of your query with the meaning of the content, rather than their exact words.
Our Architecture
1. Embedding Generation
For each meeting, we generate embeddings for:
- The full transcript
- The AI summary
- Extracted key points and decisions
- Action items
We use OpenAI's text-embedding-3-large model, which produces 3072-dimensional vectors by default and performs strongly on semantic similarity benchmarks.
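As a rough sketch of this step, the snippet below shows one way to turn a meeting record into the four kinds of embedding inputs listed above and send them to the OpenAI embeddings endpoint. The `Meeting` fields, the `kind` labels, and both function names are illustrative assumptions rather than Huddix's actual code; only the model name comes from this post. A full transcript will usually exceed the model's input token limit, so a real pipeline would split it into chunks first, which is omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class Meeting:
    # Hypothetical meeting record; the field names are assumptions.
    meeting_id: str
    transcript: str
    summary: str
    key_points: list[str] = field(default_factory=list)
    action_items: list[str] = field(default_factory=list)


def build_embedding_inputs(meeting: Meeting) -> list[tuple[str, str]]:
    """Return (kind, text) pairs, one per item to embed; empty texts are skipped."""
    candidates = [
        ("transcript", meeting.transcript),
        ("summary", meeting.summary),
        ("key_points", "\n".join(meeting.key_points)),
        ("action_items", "\n".join(meeting.action_items)),
    ]
    return [(kind, text) for kind, text in candidates if text.strip()]


def embed_texts(texts: list[str], model: str = "text-embedding-3-large") -> list[list[float]]:
    """Embed a batch of texts (requires the `openai` package and OPENAI_API_KEY)."""
    from openai import OpenAI  # imported lazily so the rest of the sketch runs offline

    client = OpenAI()
    response = client.embeddings.create(model=model, input=texts)
    # The API returns one embedding per input text.
    return [item.embedding for item in response.data]
```

With these helpers, `embed_texts([text for _, text in build_embedding_inputs(meeting)])` would return one vector per non-empty input.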
2. Vector Storage with pgvector
We store embeddings in PostgreSQL using the pgvector extension. Each meeting's embeddings are stored alongside metadata (date, participants, topics) in a single table with a vector index for efficient similarity search.
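A minimal sketch of what such a table could look like is shown below, written as SQL held in Python strings plus a small helper for pgvector's text format. The table, column, and index names are assumptions for illustration. One detail worth flagging: pgvector's HNSW and IVFFlat indexes support at most 2,000 dimensions for the `vector` type, so this sketch stores the 3072-dimensional embeddings as `halfvec` (half precision, indexable up to 4,000 dimensions, available from pgvector 0.7.0). Requesting shorter vectors through the embeddings API's `dimensions` parameter would be another option. This post does not say which approach is used in production.

```python
# All identifiers below (table, columns, index) are hypothetical.
CREATE_EXTENSION = "CREATE EXTENSION IF NOT EXISTS vector;"

CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS meeting_embeddings (
    id            bigserial     PRIMARY KEY,
    workspace_id  uuid          NOT NULL,
    meeting_id    uuid          NOT NULL,
    kind          text          NOT NULL,  -- transcript | summary | key_points | action_items
    meeting_date  timestamptz   NOT NULL,
    participants  text[]        NOT NULL DEFAULT '{}',
    topics        text[]        NOT NULL DEFAULT '{}',
    content       text          NOT NULL,
    embedding     halfvec(3072) NOT NULL
);
"""

# HNSW index using cosine distance, matching the <=> operator at query time.
CREATE_INDEX = """
CREATE INDEX IF NOT EXISTS meeting_embeddings_embedding_idx
    ON meeting_embeddings USING hnsw (embedding halfvec_cosine_ops);
"""


def to_pgvector_literal(embedding):
    """Format a sequence of numbers as pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"
```

The literal can be passed as an ordinary query parameter and cast on the SQL side, for example with `%s::halfvec` in a driver that uses `%s` placeholders.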
3. Hybrid Search
When you search "What did Sarah say about Q4?", we:
- Generate an embedding for your query
- Find similar embeddings using vector similarity (cosine distance)
- Filter by participants if you mention a name
- Apply keyword boosting for exact matches
- Re-rank results using our relevance model
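The steps above can be sketched in plain Python over an in-memory list, which makes the flow easy to follow without a database. Everything here is an assumption made for illustration: the `Chunk` type, the `boost` weight, and especially the final sort, which is only a stand-in for the relevance model mentioned above. In PostgreSQL, the distance computation would be done by pgvector's `<=>` (cosine distance) operator instead of the `cosine_distance` function.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    # Hypothetical stored row: text, who took part, and the text's embedding.
    text: str
    participants: set[str]
    embedding: list[float]


def tokenize(text: str) -> set[str]:
    """Lowercased words with surrounding punctuation removed."""
    words = (w.strip("?.,!:;\"'").lower() for w in text.split())
    return {w for w in words if w}


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def hybrid_search(query, query_embedding, chunks, known_people, top_k=3, boost=0.1):
    # Step 1 (embedding the query) happens upstream; its result is query_embedding.
    query_terms = tokenize(query)

    # Step 3: if the query names a known participant, keep only their meetings.
    mentioned = {p for p in known_people if p.lower() in query_terms}
    if mentioned:
        chunks = [c for c in chunks if mentioned & c.participants]

    # Names are handled by the filter, so they are excluded from keyword boosting.
    boost_terms = query_terms - {p.lower() for p in mentioned}

    scored = []
    for chunk in chunks:
        # Step 2: vector similarity (higher is better).
        similarity = 1.0 - cosine_distance(query_embedding, chunk.embedding)
        # Step 4: small bonus per query word found verbatim in the text.
        # A real system would ignore stop words such as "what" or "did".
        hits = len(boost_terms & tokenize(chunk.text))
        scored.append((similarity + boost * hits, chunk))

    # Step 5: placeholder re-ranking, sorting by the combined score.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]
```

Filtering before scoring keeps the candidate set small. In a database-backed version the same filter would typically become a `WHERE` clause on the participants column.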
4. Memory Graph
We go beyond simple vector search by building a memory graph. Entities (people, projects, companies) are extracted and linked across meetings. This enables multi-hop reasoning: "What projects did the engineering team discuss that were related to the Q4 launch?"
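To make the multi-hop idea concrete, here is a toy sketch of a memory graph as two dictionaries linking meetings and entities. The class, its method names, and the entity types are hypothetical, and entity extraction itself is assumed to have happened upstream. The sketch only shows how a question like the one above can be answered by hopping from entities to the meetings that mention them and back to other entities.

```python
from collections import defaultdict


class MemoryGraph:
    """Toy bipartite graph linking meetings to the entities mentioned in them."""

    def __init__(self):
        self.entities_by_meeting = defaultdict(set)  # meeting_id -> {(type, name)}
        self.meetings_by_entity = defaultdict(set)   # (type, name) -> {meeting_id}

    def add_mention(self, meeting_id, entity_type, name):
        entity = (entity_type, name)
        self.entities_by_meeting[meeting_id].add(entity)
        self.meetings_by_entity[entity].add(meeting_id)

    def related(self, anchors, target_type):
        """Hop 1: meetings mentioning every anchor. Hop 2: their other entities of target_type."""
        meeting_sets = [self.meetings_by_entity.get(a, set()) for a in anchors]
        shared = set.intersection(*meeting_sets) if meeting_sets else set()
        found = set()
        for meeting_id in shared:
            for entity in self.entities_by_meeting[meeting_id]:
                if entity[0] == target_type and entity not in anchors:
                    found.add(entity[1])
        return sorted(found)


graph = MemoryGraph()
graph.add_mention("m1", "team", "Engineering")
graph.add_mention("m1", "project", "Q4 launch")
graph.add_mention("m1", "project", "Billing revamp")
graph.add_mention("m2", "team", "Engineering")
graph.add_mention("m2", "project", "Mobile app")

# "What projects did the engineering team discuss that were related to the Q4 launch?"
projects = graph.related([("team", "Engineering"), ("project", "Q4 launch")], "project")
```

Here "related" simply means "mentioned in the same meeting". A production graph would likely carry typed, weighted edges rather than plain co-occurrence.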
Performance
Our semantic search system handles 500K+ meetings with:
- ~50 ms query latency at p95
- 94% relevance score (human eval)
- 99.9% uptime
Privacy and Security
All embeddings are encrypted at rest and in transit. User workspaces are isolated with row-level security, so you can only search your own meetings. We never use your data to train models.
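For readers unfamiliar with PostgreSQL row-level security, the sketch below shows the general shape of such a setup. The table, policy, and setting names (`meeting_embeddings`, `workspace_isolation`, `app.workspace_id`) are assumptions, not the actual Huddix configuration. The idea is that each request sets a per-transaction setting and the policy compares every row against it.

```python
import uuid

# Table, policy, and setting names are hypothetical.
ENABLE_RLS = "ALTER TABLE meeting_embeddings ENABLE ROW LEVEL SECURITY;"

# Rows are visible only when their workspace matches the per-transaction setting.
CREATE_POLICY = """
CREATE POLICY workspace_isolation ON meeting_embeddings
    USING (workspace_id = current_setting('app.workspace_id')::uuid);
"""


def workspace_scope_statement(workspace_id: str) -> tuple[str, tuple[str, str]]:
    """Return a parameterized statement that scopes the current transaction to one workspace."""
    uuid.UUID(workspace_id)  # raises ValueError if the id is not a valid UUID
    # set_config(name, value, true) behaves like SET LOCAL but accepts bind parameters.
    return "SELECT set_config(%s, %s, true);", ("app.workspace_id", workspace_id)
```

Note that PostgreSQL does not apply policies to the table owner unless `FORCE ROW LEVEL SECURITY` is set, and superusers bypass them, so the application would need to connect as a separate, unprivileged role.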
Future Directions
We're exploring temporal reasoning ("how has the team's opinion on X evolved over time?"), multi-modal search (finding meetings based on slides or shared screens), and proactive memory suggestions ("you might want to revisit this discussion from 3 months ago").