Engineering Apr 5, 2026 7 min read

Building a Semantic Memory System with pgvector

How we use OpenAI embeddings and pgvector to build a cross-meeting memory graph that lets you search across all your conversations with natural language.

Huddix Team

One of the most powerful features of Huddix is the ability to search across all your meetings using natural language. Ask "What did Sarah say about the Q4 roadmap?" or "When did we discuss the new pricing model?" and you get instant, accurate answers. This post explains how we built this semantic memory system.

The Challenge

Traditional search relies on keyword matching. If you search for "Q4 roadmap" but the meeting transcript says "product plan for Q4," you might miss relevant results. Semantic search solves this by understanding the meaning behind your query and the content you're searching.

Our Architecture

1. Embedding Generation

For each meeting, we generate embeddings for:

  • The full transcript
  • The AI summary
  • Extracted key points and decisions
  • Action items

We use OpenAI's text-embedding-3-large model, which produces 3072-dimensional vectors by default and performs strongly on semantic similarity benchmarks.
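As a rough illustration of this step, the sketch below embeds each of the four content types for one meeting. The model name and vector size come from the text above; everything else (the `EmbeddingRecord` type, the `CONTENT_KINDS` labels, the `embed_meeting` helper) is hypothetical and not taken from the Huddix codebase. It assumes the official `openai` Python package with `OPENAI_API_KEY` set, and it skips the chunking a long transcript would need to fit the model's input limit.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

EMBEDDING_MODEL = "text-embedding-3-large"  # model named in the post
EMBEDDING_DIM = 3072                        # default output size of that model

# Hypothetical labels for the four content types listed above.
CONTENT_KINDS = ("transcript", "summary", "key_points", "action_items")

# Any function that maps a batch of texts to a batch of vectors.
Embedder = Callable[[Sequence[str]], list]


@dataclass
class EmbeddingRecord:
    meeting_id: str
    kind: str
    text: str
    vector: list


def openai_embedder(texts: Sequence[str]) -> list:
    """Embed texts through the OpenAI API (network call, needs an API key)."""
    from openai import OpenAI  # imported lazily so the rest runs offline

    client = OpenAI()
    response = client.embeddings.create(model=EMBEDDING_MODEL, input=list(texts))
    return [item.embedding for item in response.data]


def embed_meeting(meeting_id: str, content: dict, embed: Embedder) -> list:
    """Return one EmbeddingRecord per non-empty content type, in a single batch."""
    kinds = [kind for kind in CONTENT_KINDS if content.get(kind)]
    vectors = embed([content[kind] for kind in kinds])
    return [
        EmbeddingRecord(meeting_id, kind, content[kind], vector)
        for kind, vector in zip(kinds, vectors)
    ]
```

Passing the embedder in as a function keeps the batching logic testable without network access; in production one would call `embed_meeting(meeting_id, content, openai_embedder)`.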

2. Vector Storage with pgvector

We store embeddings in PostgreSQL using the pgvector extension. Each meeting's embeddings are stored alongside metadata (date, participants, topics) in a single table with a vector index for efficient similarity search.
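A minimal sketch of what such a table and index could look like follows. The table name, column names, and helper functions are assumptions for illustration, not the actual Huddix schema. One detail worth knowing: pgvector's HNSW and IVFFlat indexes on the `vector` type are limited to 2,000 dimensions, so this sketch indexes a `halfvec` cast (supported up to 4,000 dimensions in pgvector 0.7.0 and later). The post does not say whether the production system does this or shortens the embeddings instead. The query helper accepts any psycopg-style connection.

```python
# Hypothetical schema: one row per embedded piece of meeting content.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS meeting_embeddings (
    id            bigserial PRIMARY KEY,
    workspace_id  uuid    NOT NULL,
    meeting_id    uuid    NOT NULL,
    kind          text    NOT NULL,             -- transcript, summary, ...
    meeting_date  date    NOT NULL,
    participants  text[]  NOT NULL DEFAULT '{}',
    topics        text[]  NOT NULL DEFAULT '{}',
    embedding     vector(3072) NOT NULL
);

-- Expression index on a half-precision cast, using cosine distance.
CREATE INDEX IF NOT EXISTS meeting_embeddings_hnsw
    ON meeting_embeddings
    USING hnsw ((embedding::halfvec(3072)) halfvec_cosine_ops);
"""

# The ORDER BY expression must match the indexed expression to use the index.
SEARCH_SQL = """
SELECT meeting_id, kind,
       embedding::halfvec(3072) <=> %(q)s::halfvec(3072) AS distance
FROM meeting_embeddings
WHERE workspace_id = %(workspace_id)s
ORDER BY embedding::halfvec(3072) <=> %(q)s::halfvec(3072)
LIMIT %(k)s
"""


def to_vector_literal(vector):
    """Format floats as pgvector's text input form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in vector) + "]"


def nearest_embeddings(conn, workspace_id, query_vector, limit=10):
    """Return the rows closest to query_vector by cosine distance."""
    params = {
        "q": to_vector_literal(query_vector),
        "workspace_id": workspace_id,
        "k": limit,
    }
    with conn.cursor() as cur:
        cur.execute(SEARCH_SQL, params)
        return cur.fetchall()
```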

3. Hybrid Search

When you search "What did Sarah say about Q4?", we:

  1. Generate an embedding for your query
  2. Find similar embeddings using vector similarity (cosine distance)
  3. Filter by participants if you mention a name
  4. Apply keyword boosting for exact matches
  5. Re-rank results using our relevance model
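The five steps above can be sketched in plain Python over an in-memory list. This is an illustration under several assumptions: the query embedding from step 1 is passed in as `query_vector`, the participant filter is a simple name match, the keyword boost is a fixed bonus per matching term, and the final sort on the combined score stands in for the relevance model, which the post does not describe. All names here are hypothetical.

```python
import math
from dataclasses import dataclass

# Tiny illustrative stopword list; a real system would use a proper tokenizer.
STOPWORDS = frozenset({"a", "about", "an", "did", "say", "the", "we", "what", "when"})
PUNCTUATION = "?.,!\"'"


@dataclass(frozen=True)
class Chunk:
    meeting_id: str
    text: str
    participants: frozenset
    vector: tuple


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation removed."""
    tokens = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [token for token in tokens if token]


def cosine_distance(a, b):
    """1 - cosine similarity; 1.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def hybrid_search(query, query_vector, chunks, known_names, top_k=5, boost=0.1):
    tokens = tokenize(query)
    # Step 3: participant names mentioned in the query.
    names = {name for name in known_names if name.lower() in tokens}
    # Step 4: terms eligible for the exact-match boost.
    keywords = {token for token in tokens if token not in STOPWORDS}

    scored = []
    for chunk in chunks:
        if names and not names & chunk.participants:
            continue  # a name was mentioned and this chunk has none of them
        similarity = 1.0 - cosine_distance(query_vector, chunk.vector)  # step 2
        hits = len(keywords & set(tokenize(chunk.text)))
        scored.append((chunk, similarity + boost * hits))

    # Step 5: placeholder re-ranking, highest combined score first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

In a database-backed system, steps 2 and 3 would more likely run inside the SQL query, with only the boosting and re-ranking done in application code.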

4. Memory Graph

We go beyond simple vector search by building a memory graph. Entities (people, projects, companies) are extracted and linked across meetings. This enables multi-hop reasoning: "What projects did the engineering team discuss that were related to the Q4 launch?"
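To make the idea concrete, here is a small sketch of an entity-to-meeting graph and a two-hop query over it. Entity extraction is out of scope, so mentions are added by hand. The `MemoryGraph` class, its methods, and the sample entities are all hypothetical and only illustrate the linking idea, not the production data model.

```python
from collections import defaultdict


class MemoryGraph:
    """Bipartite graph linking entities to the meetings that mention them."""

    def __init__(self):
        self.meetings_by_entity = defaultdict(set)
        self.entities_by_meeting = defaultdict(set)
        self.entity_type = {}

    def add_mention(self, meeting_id, entity, entity_type):
        self.meetings_by_entity[entity].add(meeting_id)
        self.entities_by_meeting[meeting_id].add(entity)
        self.entity_type[entity] = entity_type

    def co_mentioned(self, entity, entity_type=None):
        """Entities that share at least one meeting with `entity`."""
        found = set()
        for meeting_id in self.meetings_by_entity.get(entity, ()):
            for other in self.entities_by_meeting[meeting_id]:
                if other == entity:
                    continue
                if entity_type is None or self.entity_type[other] == entity_type:
                    found.add(other)
        return found

    def related_via(self, start, via, target_type):
        """Entities of target_type discussed with `start` and linked to `via`."""
        return self.co_mentioned(start, target_type) & self.co_mentioned(via)


graph = MemoryGraph()
graph.add_mention("m1", "Engineering", "team")
graph.add_mention("m1", "Search Revamp", "project")
graph.add_mention("m2", "Search Revamp", "project")
graph.add_mention("m2", "Q4 launch", "project")
graph.add_mention("m3", "Engineering", "team")
graph.add_mention("m3", "Billing Migration", "project")

# Projects the engineering team discussed that are also linked to the Q4 launch.
print(graph.related_via("Engineering", "Q4 launch", "project"))  # {'Search Revamp'}
```

The answer spans two meetings: "Search Revamp" is tied to Engineering in one and to the Q4 launch in another, which a single vector lookup over one transcript would not surface.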

Performance

Our semantic search system handles 500K+ meetings with:

  • ~50ms p95 query latency
  • 94% relevance score (human eval)
  • 99.9% uptime

Privacy and Security

All embeddings are encrypted at rest and in transit. User workspaces are isolated with row-level security, so you can only search your own meetings. We never use your data to train models.
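Workspace isolation of this kind can be expressed directly in PostgreSQL. The sketch below assumes a hypothetical `meeting_embeddings` table with a `workspace_id` column and a custom `app.workspace_id` setting. None of these names come from the post, and the real policy may differ. The helper accepts any psycopg-style connection that is not in autocommit mode.

```python
# Hypothetical policy: rows are visible only when their workspace_id matches
# the app.workspace_id setting of the current transaction.
RLS_SQL = """
ALTER TABLE meeting_embeddings ENABLE ROW LEVEL SECURITY;
-- Table owners bypass policies unless FORCE is also set.
ALTER TABLE meeting_embeddings FORCE ROW LEVEL SECURITY;

CREATE POLICY workspace_isolation ON meeting_embeddings
    USING (workspace_id = current_setting('app.workspace_id')::uuid);
"""

# Third argument true = setting lasts only until the transaction ends.
SET_WORKSPACE_SQL = "SELECT set_config('app.workspace_id', %s, true)"


def run_as_workspace(conn, workspace_id, sql, params=()):
    """Run a query with the workspace setting scoped to this transaction."""
    with conn.cursor() as cur:
        cur.execute(SET_WORKSPACE_SQL, (str(workspace_id),))
        cur.execute(sql, params)
        return cur.fetchall()
```

Because `current_setting` raises an error when the setting is missing, a query that forgets to set the workspace fails instead of returning another workspace's rows.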

Future Directions

We're exploring temporal reasoning ("how has the team's opinion on X evolved over time?"), multi-modal search (finding meetings based on slides or shared screens), and proactive memory suggestions ("you might want to revisit this discussion from 3 months ago").
