blog: blackwell --final edit
All checks were successful
Deploy Website / build-and-deploy (push) Successful in 27s
@@ -1,7 +1,7 @@
---
title: 'Blackwell: Datacenter vs GeForce GPUs'
date: '2026-02-27'
-description: 'Jensen scammed me.'
+description: 'Jensen scammed us.'
tags: ['Nvidia', 'GPU', 'GPU Kernel']
---

@@ -33,7 +33,7 @@ I needed to confirm this myself. <SideNote>NVFP4 is Nvidia's new low precision f
Over a PETAFLOP of NVFP4 compute! ggs. This is already insane, and I'm very happy with it. I didn't get `wgmma` from Hopper, nor the `tcgen05` instructions and TMEM, but I did get a petaFLOP of NVFP4 compute.
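For intuition, here is how a "petaFLOP" number falls out of timing a single matmul. This is a minimal sketch: the matrix sizes and the timing below are made-up placeholders for illustration, not measurements from this post.

```python
# Sketch: converting one timed matmul into an achieved-FLOPS figure.
# The sizes and the 1.1 ms timing are hypothetical placeholders,
# not numbers measured on a Blackwell card.

def matmul_flops(m: int, n: int, k: int) -> int:
    """An MxK @ KxN matmul does m*n*k multiply-adds = 2*m*n*k FLOPs."""
    return 2 * m * n * k

def achieved_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved throughput in TFLOPS for one matmul taking `seconds`."""
    return matmul_flops(m, n, k) / seconds / 1e12

# Hypothetical: an 8192^3 NVFP4 matmul finishing in 1.1 ms lands
# right around 1000 TFLOPS, i.e. a petaFLOP.
print(achieved_tflops(8192, 8192, 8192, 1.1e-3))
```

Timing enough back-to-back matmuls and dividing total FLOPs by total seconds is essentially what every "X TFLOPS achieved" claim boils down to.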
Nsight Compute tells us exactly what we would expect:
*(Nsight Compute screenshots)*
Tensor cores are so fast that memory is bottlenecking them: all of the shared memory is filling up. Huh, I guess Nvidia realised this and created `tcgen05`, but we don't get to see any of that.
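The "memory is bottlenecking the tensor cores" claim is exactly what a roofline model predicts. A quick sketch, with round placeholder peaks (assumed ~1 PFLOP of tensor-core compute and ~1.8 TB/s of memory bandwidth, not real card specs):

```python
# Back-of-the-envelope roofline: a kernel is memory-bound when its
# arithmetic intensity (FLOPs per byte moved) sits below the ridge
# point peak_flops / peak_bandwidth. Peaks below are placeholders.

def attainable_tflops(intensity: float, peak_tflops: float, peak_bw_tbps: float) -> float:
    """Roofline: you get min(compute roof, bandwidth * intensity)."""
    return min(peak_tflops, peak_bw_tbps * intensity)

PEAK_TFLOPS = 1000.0  # assumed ~1 PFLOP tensor-core peak
PEAK_BW = 1.8         # assumed ~1.8 TB/s memory bandwidth, in TB/s

# Ridge point: the intensity needed to keep the tensor cores fed.
ridge = PEAK_TFLOPS / PEAK_BW  # roughly 556 FLOPs per byte

# A kernel doing only 100 FLOPs per byte is capped by bandwidth:
print(attainable_tflops(100.0, PEAK_TFLOPS, PEAK_BW))
```

With a ridge point in the hundreds of FLOPs per byte, anything short of a very large, well-tiled matmul sits on the bandwidth slope of the roofline, which matches what the profiler shows here.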
@@ -46,5 +46,6 @@ To see how the GPU folk in datacenters live, I booted up a vast ai instance and
We're getting over 2 petaFLOPS, and I'm sure these things can go even faster with better code. Not having `tcgen05` really holds back the GeForce cards.
This is amazing. I wish I could get a taste of this locally.
Why, Jensen, why.