I find you need to create your own benchmarking tool to focus on what you want. I built one focused on RAG pipelines, with a framework for creating, managing, and running benchmarks with various question types and validation methods. Basically a test+eval tool, and that's similarly applied to a testing workflow. You can use an LLM (VS Code + Copilot, Cursor) to help you do this in Python, with a bit of work.
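To give a feel for what "benchmarks with question types and validation methods" can look like, here's a minimal sketch. All the names (`Question`, `run_benchmark`, the validators) are hypothetical, not from the tool described above; the "pipeline" is just any callable mapping a question string to an answer string.

```python
# Minimal sketch of a RAG benchmark harness (hypothetical names throughout).
from dataclasses import dataclass
from typing import Callable

# Each validation method is a predicate over (answer, expected).
VALIDATORS: dict[str, Callable[[str, str], bool]] = {
    "exact": lambda answer, expected: answer.strip() == expected.strip(),
    "contains": lambda answer, expected: expected.lower() in answer.lower(),
}

@dataclass
class Question:
    text: str
    expected: str
    qtype: str = "factual"        # question type, e.g. factual / multi-hop
    validator: str = "contains"   # which validation method to apply

def run_benchmark(pipeline: Callable[[str], str],
                  questions: list[Question]) -> dict:
    """Run every question through the pipeline and score the answers."""
    results = []
    for q in questions:
        answer = pipeline(q.text)
        passed = VALIDATORS[q.validator](answer, q.expected)
        results.append({"question": q.text, "type": q.qtype, "passed": passed})
    score = sum(r["passed"] for r in results) / len(results)
    return {"score": score, "results": results}

if __name__ == "__main__":
    # Stub pipeline standing in for a real RAG chain.
    def fake_pipeline(question: str) -> str:
        return "Paris is the capital of France."

    report = run_benchmark(fake_pipeline, [
        Question("What is the capital of France?", "Paris"),
        Question("What is the capital of Spain?", "Madrid", validator="exact"),
    ])
    print(report["score"])  # 0.5: one pass, one fail
```

In practice you'd swap the stub for your actual pipeline and add an LLM-as-judge validator for open-ended answers, but the shape stays the same: a question set, a validator per question, and an aggregate score.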
Interesting. So, no framework really worked for you?
@Nomadsteve did you have one in mind, particularly? I ended up writing my own; might open-source it down the line.
trigger.dev looks very promising.
I think a hybrid approach might be the fastest/easiest: use n8n.io/ai, with real code where you need it.
Honestly, I need to build out more agents to understand fully. I have a lot of gaps in my knowledge.
Ah, so this wasn't only about the "testing them" part ;) I focused on tests only! I don't use either of those solutions, fwiw.