Simplest solution for low scale is Postgres + pgvector:

  1. Create a content (text) column with a unique key to prevent dupes
  2. Create an embedding column to store the embedding
  3. Write a scheduled background job that runs periodically, finds docs whose content has changed since their embeddings were last computed, re-embeds them, and updates a “last processed at” column for each doc it processes.
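
The loop above can be sketched in Python. This is a hedged, in-memory stand-in for the table and job described: `docs`, `upsert_doc`, `sync_embeddings`, and `fake_embed` are all hypothetical names, and `fake_embed` stands in for a real embedding model call; in practice these would be a pgvector-enabled table and a cron/scheduler task.

```python
import hashlib
from datetime import datetime, timezone

# In-memory stand-in for the Postgres table described above;
# in practice these are columns on a pgvector-enabled table.
docs = {}  # content hash -> row dict

def fake_embed(text):
    # Stand-in for a real embedding model call.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]

def upsert_doc(content):
    # A unique key on a content hash prevents duplicate rows (step 1).
    key = hashlib.sha256(content.encode()).hexdigest()
    if key not in docs:
        docs[key] = {
            "content": content,
            "embedding": None,            # step 2: embedding column
            "updated_at": datetime.now(timezone.utc),
            "last_processed_at": None,    # step 3: "last processed at"
        }
    return key

def sync_embeddings():
    # The scheduled job: re-embed any doc whose content changed after
    # its embedding was last computed, then stamp last_processed_at.
    processed = 0
    for row in docs.values():
        stale = (row["last_processed_at"] is None
                 or row["updated_at"] > row["last_processed_at"])
        if stale:
            row["embedding"] = fake_embed(row["content"])
            row["last_processed_at"] = datetime.now(timezone.utc)
            processed += 1
    return processed
```

Because the job keys off timestamps, re-running it when nothing has changed is a cheap no-op, which is what makes a dumb periodic schedule workable.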

For high scale, consider an Elasticsearch cluster: have the scheduled job write messages to a queue, with a horizontally scalable pool of workers processing the queue.
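The producer/worker split can be sketched with Python's standard library. This is only an illustration of the shape, assuming a single process: `producer`, `worker`, and `run` are hypothetical names, `queue.Queue` stands in for a real broker (Kafka, SQS, etc.), and the embedding-and-index step is stubbed out.

```python
import queue
import threading

# Stand-in for a real message broker; workers here are threads, but the
# same shape scales horizontally with separate consumer processes.
jobs = queue.Queue()
results = {}
lock = threading.Lock()

def producer(stale_doc_ids):
    # The scheduled job only finds stale docs and enqueues their IDs.
    for doc_id in stale_doc_ids:
        jobs.put(doc_id)

def worker():
    while True:
        doc_id = jobs.get()
        if doc_id is None:  # poison pill shuts the worker down
            jobs.task_done()
            break
        # Stand-in for: fetch doc, embed, write to Elasticsearch.
        with lock:
            results[doc_id] = f"embedded:{doc_id}"
        jobs.task_done()

def run(stale_doc_ids, n_workers=4):
    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    producer(stale_doc_ids)
    for _ in threads:        # one poison pill per worker
        jobs.put(None)
    for t in threads:
        t.join()
    return results
```

The point of the split is that the scheduler stays cheap (it only enqueues IDs) while the expensive embed-and-index work is absorbed by however many workers the queue depth demands.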

Thanks, but I think this strategy is for low scale and infrequently changing docs. I'm looking for something at a decent scale, with fast-changing documents.

Define “fast changing” and “decent scale”
