
Running Llama 3?

Following up on my VPS question earlier: what services do you use to run Llama 3 yourself? If you do that kind of stuff?

Cloudflare Workers with Llama 3 bindings?
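For context, here's a minimal sketch of what that looks like with a Workers AI binding (the binding name is whatever you configure in wrangler.toml; the model ID is from Cloudflare's Workers AI catalog, so double-check it against current docs):

```ts
// Assumes an [ai] binding named "AI" in wrangler.toml.
export interface Env {
  AI: Ai; // Workers AI binding type, from @cloudflare/workers-types
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Workers AI catalog name for Llama 3 8B Instruct (verify against docs).
    const result = await env.AI.run("@cf/meta/llama-3-8b-instruct", {
      messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "Why is the sky blue?" },
      ],
    });
    return Response.json(result);
  },
};
```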


If you really want to do it yourself, I think it's going to be cost-prohibitive.

You would need to fork over several hundred dollars every single month for hardware with a GPU on it (assuming you need it online 24/7), then spend time configuring dependencies and so on.

Better to pay a provider that already has the necessary hardware: artificialanalysis.ai/models/…

Right now Groq and DeepInfra are the cheapest. Or if you already have AWS/Azure/whatever credits, obviously use those first; all the major cloud platforms have an AI service for running inference jobs against LLMs these days.

Hmm, you're raising fair points. I always knew it would be hard; I hadn't thought about keeping up with new model upgrades, though.

Why would you want to run it yourself?

Unless you have a VERY good reason, I'd use Groq, as it's fast and affordable. In the future you can probably run it on-device, once browsers and operating systems have built-in support for LLMs.

BTW, I suggest implementing it in such a way that you're not tied down to any specific API provider. Many open source libraries and hosted services expose OpenAI-compatible API endpoints (Groq included). If you code against that interface, it will be relatively easy to switch service providers in the future. A sketch of what that buys you is below.
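For example, the same OpenAI SDK call works against Groq just by swapping the base URL (the base URL and model ID below are what Groq documents for its OpenAI-compatible endpoint, so verify them against current docs):

```ts
import OpenAI from "openai";

// Same client, different provider: only baseURL, apiKey, and model change.
const client = new OpenAI({
  baseURL: "https://api.groq.com/openai/v1", // Groq's OpenAI-compatible endpoint
  apiKey: process.env.GROQ_API_KEY,
});

const completion = await client.chat.completions.create({
  model: "llama3-70b-8192", // provider-specific model name
  messages: [{ role: "user", content: "Hello, Llama 3!" }],
});

console.log(completion.choices[0].message.content);
```

Switching to another OpenAI-compatible provider is then just a config change, not a rewrite.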

@smitmartijn pointed this out on X just now, but I'd also highly recommend looking at Cloudflare AI Gateway: it gives you automatic fallbacks between different providers, real-time logs, response caching, and a bunch of other useful things that will improve your DX, save you money, or both.
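To give a flavour of how that plugs in: routing the earlier Groq call through AI Gateway is again just a base-URL change (the account and gateway IDs are placeholders you get from the Cloudflare dashboard; the URL shape is from Cloudflare's AI Gateway docs):

```ts
import OpenAI from "openai";

// The gateway proxies to Groq's OpenAI-compatible API and layers
// caching, logging, and fallbacks on top. Replace the placeholders
// with your own account ID and gateway name.
const client = new OpenAI({
  baseURL:
    "https://gateway.ai.cloudflare.com/v1/<ACCOUNT_ID>/<GATEWAY_NAME>/groq",
  apiKey: process.env.GROQ_API_KEY,
});

const completion = await client.chat.completions.create({
  model: "llama3-70b-8192",
  messages: [{ role: "user", content: "Hello through the gateway!" }],
});
```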