Simplest solution for low scale is Postgres + pgvector:
Create a content (text) column with a unique key to prevent dupes
Create an embedding column (pgvector's vector type) to store the embedding
Write a scheduled background job that runs every so often, selects docs whose content has changed since their embeddings were last computed, re-embeds them, and stamps a "last processed at" timestamp column on each doc it updates.
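The selection logic of that job can be sketched in Python. This is a minimal sketch, not a full implementation: the column names (updated_at, last_processed_at) and the min_age debounce window are assumptions, and in practice the filter would just be a SQL WHERE clause (last_processed_at IS NULL OR updated_at > last_processed_at).

```python
from datetime import datetime, timedelta

def docs_needing_embeddings(docs, min_age=timedelta(minutes=5), now=None):
    """Return IDs of docs whose content is newer than their embeddings.

    docs: dicts with keys id, updated_at, last_processed_at (None if the
    doc has never been embedded). min_age debounces docs that are still
    being actively edited, so we don't re-embed on every keystroke.
    """
    now = now or datetime.utcnow()
    stale = []
    for doc in docs:
        never_processed = doc["last_processed_at"] is None
        changed_since = (
            not never_processed
            and doc["updated_at"] > doc["last_processed_at"]
        )
        settled = now - doc["updated_at"] >= min_age
        if (never_processed or changed_since) and settled:
            stale.append(doc["id"])
    return stale
```

The unique key on content mentioned above keeps the job from embedding the same text twice under different doc rows.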
For high scale, consider an Elasticsearch cluster: have the scheduled job write messages to a queue, and run a horizontally scalable worker pool to consume the queue and do the embedding + index writes.
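The queue + worker shape can be sketched in-process with Python's stdlib. This is an illustration only: in production the queue would be something like Kafka or SQS and the workers separate processes, and embed_and_index is a hypothetical callback standing in for "embed the doc and write it to the search index".

```python
import queue
import threading

def run_pipeline(doc_ids, embed_and_index, num_workers=4):
    """Producer (the scheduled job) enqueues doc IDs; workers drain the queue."""
    work = queue.Queue()
    for doc_id in doc_ids:
        work.put(doc_id)

    def worker():
        while True:
            try:
                doc_id = work.get_nowait()
            except queue.Empty:
                return  # queue drained, worker exits
            embed_and_index(doc_id)  # embed + write to the index
            work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

The point of the split is that embedding is the expensive step, so you scale it by adding workers without touching the scheduler.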
thanks, but i think this strategy is for low scale and less frequently changing docs. looking for something at a decent scale, and fast changing documents
Define “fast changing” and “decent scale”