Software Ate Hardware Until Hardware Fought Back

For decades, software abstraction layers made hardware differences irrelevant. The AI era reversed this: hardware architecture now dictates what software is possible, and the companies that design both are winning.

"Comparing on paper FLOP/s and HBM Bandwidth/Capacity is akin to comparing cameras by merely examining megapixel count. The only way to tell the actual performance is to run benchmarking." (SemiAnalysis, MI300X vs H100 Benchmark)

The Marc Andreessen thesis that "software is eating the world" held true for an era when CPUs were general-purpose enough that software could abstract away hardware differences. Database, web server, application server: it did not much matter what silicon sat underneath. But the AI revolution has inverted this relationship. Nvidia's Tensor Cores deliver 16x the throughput of its general-purpose CUDA cores for matrix multiplication. If you are not doing matmuls, you get 19.5 teraflops instead of 312. The hardware architecture now determines what is computationally feasible.
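The 16x gap above falls straight out of the peak-rate arithmetic. A minimal sketch, using the 19.5 and 312 TFLOP/s figures from the text; the matrix size is an arbitrary assumption, and these are compute-bound lower limits, not benchmark results:

```python
# Illustrative arithmetic, not a benchmark: peak throughput puts a floor
# on matmul runtime. 19.5 and 312 TFLOP/s are the general-purpose CUDA
# core vs Tensor Core figures cited in the text; 8192 is an arbitrary size.

def matmul_flops(m, n, k):
    """A dense (m x k) @ (k x n) matmul performs roughly 2*m*n*k FLOPs."""
    return 2 * m * n * k

def lower_bound_seconds(flops, peak_tflops):
    """Compute-bound lower limit on runtime at a given peak rate."""
    return flops / (peak_tflops * 1e12)

flops = matmul_flops(8192, 8192, 8192)  # ~1.1e12 FLOPs
cuda_core_time = lower_bound_seconds(flops, 19.5)
tensor_core_time = lower_bound_seconds(flops, 312.0)

print(f"CUDA-core floor:   {cuda_core_time * 1e3:.1f} ms")
print(f"Tensor-core floor: {tensor_core_time * 1e3:.1f} ms")
print(f"Speedup ceiling:   {cuda_core_time / tensor_core_time:.0f}x")
```

The speedup ceiling is just the ratio of the two peak rates (312 / 19.5 = 16), which is why no amount of clever scheduling recovers the gap for non-matmul workloads.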

Microsoft's Maia 100 chip illustrates both the opportunity and the peril. It was designed before the LLM explosion, so it loaded up on SRAM (which helps for some model architectures) but under-provisioned HBM bandwidth. The result: GPT-4 inference performance at roughly one-third of the H100. No amount of software optimization can close that gap, because it is baked into the silicon. Microsoft, Amazon, Google, and Meta are all now designing custom chips precisely because hyperscalers can reduce capital cost in ways that GPU cloud startups cannot.
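Why software cannot close a bandwidth gap can be seen with a roofline model: attainable throughput is the lesser of the compute roof and the memory roof, and autoregressive LLM decoding has low arithmetic intensity, so it sits under the memory roof. A minimal sketch with entirely hypothetical numbers (these are not Maia or H100 specs; only the shape of the bound matters):

```python
# A minimal roofline sketch of why memory bandwidth caps LLM inference.
# All numbers are hypothetical placeholders, NOT Maia or H100 specs:
# the point is the shape of the bound, not the specific values.

def attainable_tflops(peak_tflops, bandwidth_tb_s, arithmetic_intensity):
    """Roofline model: attainable rate is min(compute roof, memory roof).

    arithmetic_intensity is FLOPs performed per byte moved from memory;
    autoregressive decoding is typically low-intensity (memory-bound).
    """
    memory_roof = bandwidth_tb_s * arithmetic_intensity  # TB/s * FLOP/B = TFLOP/s
    return min(peak_tflops, memory_roof)

# Two hypothetical chips with identical compute but a 3:1 bandwidth gap.
decode_intensity = 2.0  # FLOPs/byte: low, as in small-batch decoding
chip_a = attainable_tflops(peak_tflops=800, bandwidth_tb_s=3.0,
                           arithmetic_intensity=decode_intensity)
chip_b = attainable_tflops(peak_tflops=800, bandwidth_tb_s=1.0,
                           arithmetic_intensity=decode_intensity)

print(f"chip A attainable: {chip_a} TFLOP/s")
print(f"chip B attainable: {chip_b} TFLOP/s")
```

In the memory-bound regime both chips leave their compute roof idle, and the throughput ratio collapses to the bandwidth ratio: a 3:1 bandwidth deficit yields one-third the inference performance regardless of the software stack.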

Amazon's Graviton chips attack from a different angle: commoditizing the CPU by optimizing for total cost of ownership at the system and rack level rather than peak single-core performance. The common thread is vertical integration: the companies that control their silicon can make holistic tradeoffs across power, cooling, networking, and compute that companies buying off-the-shelf Nvidia GPUs simply cannot. Software still matters enormously: AMD's MI300X has more raw FLOPS than the H100 but achieves less actual throughput because its software stack is immature. But the direction of the industry is clear: hardware is no longer fungible.
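The system-level optimization the Graviton point describes can be made concrete with a toy total-cost-of-ownership calculation. Every number below is a made-up placeholder, not an AWS figure; the sketch only shows why perf per TCO dollar, not peak per-core performance, drives the rack-level design choice:

```python
# A toy total-cost-of-ownership (TCO) comparison. All inputs are
# hypothetical placeholders, not real server or AWS figures.

def tco_per_year(capex, lifetime_years, power_kw, pue, dollars_per_kwh):
    """Annualized cost: amortized hardware plus electricity at facility PUE."""
    hours_per_year = 24 * 365
    energy_cost = power_kw * pue * hours_per_year * dollars_per_kwh
    return capex / lifetime_years + energy_cost

def perf_per_dollar(perf, **tco_inputs):
    return perf / tco_per_year(**tco_inputs)

# Hypothetical: a lower-power custom Arm server that gives up 10% of
# single-node performance versus an off-the-shelf x86 server.
custom = perf_per_dollar(perf=0.9, capex=8_000, lifetime_years=4,
                         power_kw=0.25, pue=1.2, dollars_per_kwh=0.08)
off_shelf = perf_per_dollar(perf=1.0, capex=12_000, lifetime_years=4,
                            power_kw=0.40, pue=1.2, dollars_per_kwh=0.08)

print(f"custom silicon delivers {custom / off_shelf:.2f}x perf per TCO dollar")
```

Under these assumptions the slower chip wins on the metric that matters at rack scale, which is the holistic tradeoff across capex, power, and cooling that off-the-shelf buyers cannot make.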

Takeaway: In the AI era, hardware architecture constrains what software can achieve, and the companies that design their own silicon will have structural cost advantages that cannot be competed away with code alone.


See also: Custom Silicon Will Eat General Purpose Computing | CUDA Is a Moat Not Just a Library | Dennard Scaling Ended and Everything Changed