diff --git a/content/posts/blackwell_datacenter_vs_geforce.mdx b/content/posts/blackwell_datacenter_vs_geforce.mdx
index 44467fb..d42d3ae 100644
--- a/content/posts/blackwell_datacenter_vs_geforce.mdx
+++ b/content/posts/blackwell_datacenter_vs_geforce.mdx
@@ -11,4 +11,13 @@ I jumped on the 50-series especially for the fp4 support on their 5th generation
 Imagine my surprise when I was perusing the GPU mode discord and find people calling the GeForce blackwell cards "Fake blackwell"?!!
 Looking online, I found next to no resources on the difference. I foolishly assumed that my GeForce card (arch=sm_120) would contain all the features from the datacenter cards (arch=sm_100), as it seemed to be a later arch. No, Nvidia just made it more confusing, and obscured the technical details extremely well.
 Going through the [cuda documentation](https://docs.nvidia.com/cuda/parallel-thread-execution/),
-you'll see that the new tensor core gen 5 instructions are only compatible with `sm_100[a-f]` (Datacenter Blackwell) and `sm_101` (Jetson Thor). What does this mean? That involved a lot more digging.
\ No newline at end of file
+you'll see that the new tensor core gen 5 instructions are only compatible with `sm_100[a-f]` (Datacenter Blackwell) and `sm_101` (Jetson Thor). What does this mean? That involved a lot more digging.
+
+
+### What's in the new tensor cores?
+
+The Blackwell tensor cores now support lower precisions, namely FP6 and FP4, which the previous Hopper generation didn't. This enables extremely fast low-precision matrix multiplications.
+To test out support for nvfp4 (Nvidia's low-precision format), I downloaded the cutlass repo and ran the nvfp4 matrix multiply example. Here's what I got:
+
+[A screenshot of a cutlass nvfp4 matmul benchmark](public/images/1_blackwell_dc_vs_gf/5090_65536.png)
+
diff --git a/public/images/1_blackwell_dc_vs_gf/5090_65536.png b/public/images/1_blackwell_dc_vs_gf/5090_65536.png
new file mode 100644
index 0000000..2820176
Binary files /dev/null and b/public/images/1_blackwell_dc_vs_gf/5090_65536.png differ
diff --git a/public/images/1_blackwell_dc_vs_gf/Pasted Image (Copy 1).png b/public/images/1_blackwell_dc_vs_gf/b200_32768.png
similarity index 100%
rename from public/images/1_blackwell_dc_vs_gf/Pasted Image (Copy 1).png
rename to public/images/1_blackwell_dc_vs_gf/b200_32768.png
diff --git a/public/images/1_blackwell_dc_vs_gf/Pasted Image (Copy 2).png b/public/images/1_blackwell_dc_vs_gf/b200_65536.png
similarity index 100%
rename from public/images/1_blackwell_dc_vs_gf/Pasted Image (Copy 2).png
rename to public/images/1_blackwell_dc_vs_gf/b200_65536.png
diff --git a/public/images/1_blackwell_dc_vs_gf/Pasted Image.png b/public/images/1_blackwell_dc_vs_gf/b200_unable_to_profile.png
similarity index 100%
rename from public/images/1_blackwell_dc_vs_gf/Pasted Image.png
rename to public/images/1_blackwell_dc_vs_gf/b200_unable_to_profile.png