Similar todos
spend 1h debugging a k8s complexity-induced issue after a node failure at the cloud provider for #crisp mirage GPUs
wake up to mayhem with my NVIDIA A16s: 70% of them were down all night and only fixed by Vultr this morning. The A40 for the large LLM is fortunately still up #crisp
try to upgrade the #crisp Mirage AI Kubernetes cluster version, fail miserably because broken NVIDIA drivers in the cloud provider's GPU image stalled the upgrade process, destroy the cluster and rebuild all infrastructure from scratch all evening 🥲
experience 6 hours of total GPU downtime at Vultr, knocking down all #mirage AI services, will migrate to Scaleway cause Vultr are 🤡
benchmark the economics of NVIDIA A16 vs A40 vs A100 GPUs to scale #crisp mirage
terraform #crisp mirage Cloudflare & Vultr configurations
fix broken DigitalOcean server
take down the separate VM for data processing because it needs CUDA & Azure doesn't let me use their credits to access NVIDIA GPUs 👎 #mused
Set up Vultr server #sketchfordesignrs
get more issues on my Nomad server: Nomad destroyed all my running services overnight because it failed a heartbeat to the server for 10s, because my cloud provider's VPS I/O froze for a few seconds. Found a way to prevent that behavior. Nomad is definitely made for bare metal, not cloud #life
Experience serious issues with our cloud GPU provider that we're still waiting for them to fix for about 3 days 😭 #fashn
Pay for Vultr servers
fix gated AI model this morning on #mirage, which caused downtime after a k8s node restart since the model could no longer be pulled from Hugging Face. I had to manually accept the model ToS wtf
fix server being down because an optical fibre cable broke #shipr
move bots to Vultr server
provision batch of L4 and L40S GPUs at Scaleway for #mirage since our account got validated and quotas lifted