blog: blackwell updated images
All checks were successful
Deploy Website / build-and-deploy (push) Successful in 28s
@@ -28,7 +28,7 @@ any kind of work load. Why did I have to dig so hard to find this information? T
I needed to confirm this myself. <SideNote>NVFP4 is NVIDIA's new low-precision 4-bit floating-point format.</SideNote> I downloaded the CUTLASS repo and ran the NVFP4 matrix multiply example. Here's what I got:
Over a PETAFLOP of NVFP4 compute! ggs. This is already insane, and I'm very happy with it. I didn't get `wgmma` from Hopper, nor the `tcgen05` instructions and `TMEM`, but I did get a petaflop of NVFP4 compute.
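For anyone who wants to sanity-check a number like this, here is a minimal sketch of the usual arithmetic: a dense matmul of shape M x N x K is counted as 2·M·N·K floating-point operations (one multiply plus one add per term), divided by the elapsed time. The function name `matmul_tflops` is made up for illustration, the problem size of 65536 is only a guess based on the image filenames, and the 0.5 s runtime is a hypothetical value, not a measurement from this post.

```python
def matmul_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Dense GEMM throughput in TFLOP/s, counting each multiply-add as 2 FLOPs."""
    flops = 2 * m * n * k
    return flops / seconds / 1e12


if __name__ == "__main__":
    # Assumed problem size (guessed from the image filenames) and a
    # hypothetical elapsed time; substitute the values from your own run.
    size = 65536
    elapsed_s = 0.5
    tflops = matmul_tflops(size, size, size, elapsed_s)
    print(f"{tflops:.1f} TFLOP/s ({tflops / 1000:.2f} PFLOP/s)")
```

Under those assumed inputs the result lands just above 1 PFLOP/s, which is the ballpark being described here.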
Nsight Compute tells us exactly what we would expect
@@ -42,7 +42,7 @@ Tensor cores are so fast that the memory is bottlenecking them. All of the share
To see how the GPU folk in datacenters live, I booted up a Vast.ai instance and ran the same matmul, but with CUTLASS kernels for `sm_100a`.
We're getting over 2 petaflops, and I'm sure these things can go even faster with better code. Not having `tcgen05` really holds back the GeForce cards.
This is amazing. I wish I could get a taste of this locally.
BIN public/images/1_blackwell_dc_vs_gf/5090_65536_cropped.png (Normal file)
Binary file not shown. After: Size 8.2 KiB
Binary file not shown. Before: Size 23 KiB, After: Size 12 KiB
BIN public/images/1_blackwell_dc_vs_gf/b200_65536_cropped.png (Normal file)
Binary file not shown. After: Size 12 KiB
Binary file not shown. Before: Size 114 KiB, After: Size 20 KiB