Similar todos
spend 1h debugging some k8s complexity-induced issue after a node failure at cloud provider for #crisp mirage gpus
fix broken nvidia a100 gpu server at vultr which put #crisp mirage down for the whole night, as the gpu was out of stock and no replacement physical node could be allocated
deploy #crisp mirage ai improvements to production all morning
finish some #crisp chatbot ai inference code going to mirage api which manages all our gpus
Upgrade k8s cluster at work
fix bugs on #crisp mirage dashboard
wake up from mayhem with my NVIDIA A16s, 70% of them were down all night and only fixed this morning by Vultr. A40 for large LLM still up fortunately #crisp
fix recurring api errors with #crisp mirage ai services
finalizing #crisp mirage ai backends, getting ready for launch!
speeding up #crisp mirage rust builds in docker
experience 6 hours of total GPU downtime at Vultr, knocking down all #mirage AI services, will migrate to Scaleway cause Vultr are 🤡
benchmarking NVIDIA A16 vs A40 vs A100 GPUs economics to scale #crisp mirage
working on #crisp mirage
improve #crisp mirage dashboard error messages
finish migrating #mirage kubernetes intel and nvidia gpu instances to scaleway, getting last-generation NVIDIA L40S + L4 GPUs, running much smoother now! (previously: old A40 and A16)
fixing rust async-await issues on #crisp mirage ai
update k8s cluster #badal
implement #crisp mirage dashboard
Trying to upgrade python 3.10.6 to run A1111 on Nvidia is horrible
preparing #crisp mirage api production docker image