The futile future of the gigawatt datacenter — by Nicholas Weaver

David Gerard@awful.systems · 2 days ago

Architeuthis@awful.systems · 2 days ago

So if a company does want to use LLM, it is best done using local servers, such as Mac Studios or Nvidia DGX Sparks: relatively low-cost systems with lots of memory and accelerators optimized for processing ML tasks.

Eh, local LLMs don’t really scale. You can’t do much better than one person per computer unless usage is really sparse, and buying everyone a top-of-the-line GPU only works if they aren’t currently on work laptops and VMs.

Spark-type machines will do better eventually, but for now they’re supposedly geared more towards training than inference. It says here that running a 70B model on one returns around one word per second (three tokens), which is a snail’s pace.
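
To put that rate in perspective, here is a rough back-of-the-envelope sketch. The three tokens per second is the figure quoted above; the 500-token reply length is purely an assumption for illustration, and `seconds_to_generate` is a hypothetical helper, not anything from a real library.

```python
def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Return how long it takes to emit num_tokens at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


QUOTED_RATE = 3.0    # tokens per second, the figure quoted in the comment above
REPLY_TOKENS = 500   # assumed length of one medium-sized reply (illustrative only)

wait = seconds_to_generate(REPLY_TOKENS, QUOTED_RATE)
print(f"{REPLY_TOKENS} tokens at {QUOTED_RATE} tok/s takes about {wait / 60:.1f} minutes")
```

Under those assumptions a single reply takes a couple of minutes, and that is with one user on the box, which is roughly why the one-person-per-computer ceiling bites.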

David Gerard@awful.systems · 2 days ago

yeah. LLMs are fat. Lesser ML works great tho.

Wu-Tang Yutani Corporation@mastodon.me.uk · 2 days ago

@dgerard @Architeuthis

Lard Language Model