AVX512 (3 of 3): Deep Dive into AVX512 Mechanisms


In this video we will explore the AVX 512 mechanisms.

Links:

CPUID Wikipedia page (contains a list of all the instruction set feature flags):

Software used to make this vid:


Comments

How come you were using vmovupd instead of vmovups in the rounding example to load the single-precision floats? Is there any difference between the two, like vmovups requiring 4-byte alignment? Or do the vmovu* instructions remove all alignment requirements and expand to the same micro-ops?

nayjames
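
On the question above: neither vmovups nor vmovupd imposes an alignment requirement (that is what the "u" stands for), and an unmasked 512-bit load copies the same 64 bytes either way. One place where the suffix does matter is with a k mask, because each mask bit then covers a 32-bit element (ps) or a 64-bit element (pd). The Python sketch below is only a scalar model of that behaviour; `load_masked` and its parameters are hypothetical names, not a real API, and the sketch says nothing about how the instructions decode into micro-ops.

```python
import struct

def load_masked(memory: bytes, dst: bytes, mask: int, elem_size: int) -> bytes:
    """Model a merge-masked 64-byte load: element i is taken from memory
    when mask bit i is set, otherwise the old destination bytes are kept."""
    out = bytearray(dst)
    for i in range(64 // elem_size):
        if (mask >> i) & 1:
            lo, hi = i * elem_size, (i + 1) * elem_size
            out[lo:hi] = memory[lo:hi]
    return bytes(out)

# Sixteen single-precision floats: 0.5, 1.5, 2.5, ...
floats = [float(i) + 0.5 for i in range(16)]
memory = struct.pack("<16f", *floats)
zeros = bytes(64)

# All mask bits set: the "ps" and "pd" views copy identical bytes.
as_ps = load_masked(memory, zeros, 0xFFFF, 4)   # 16 x 32-bit elements
as_pd = load_masked(memory, zeros, 0xFF, 8)     # 8 x 64-bit elements
same_bits = as_ps == as_pd

# Only mask bit 0 set: the "ps" view loads one float,
# while the "pd" view loads 8 bytes, which is two floats.
one_ps = struct.unpack("<16f", load_masked(memory, zeros, 0b1, 4))
one_pd = struct.unpack("<16f", load_masked(memory, zeros, 0b1, 8))
print(same_bits, one_ps[:3], one_pd[:3])
```

So for a plain unmasked load of floats the choice is mostly about stating the intended data type, while a masked load needs the suffix that matches the element width.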

Oh man, I can't wait to see some AVX1024 registers 😆

PunmasterSTP

Watching a video like this makes me understand how CPUs keep gaining millions upon millions of transistors. The muxing, control lines, registers and logic in general to implement all of these instructions, things like broadcasting etc., would just keep piling on the transistors..!

And the detail in that koala drawing ... 🤯

TomStorey

Oh, wasn't expecting the art montage at the end. Appreciate it all the same with the series 🤭

_lapys

Your dad is an absolute legend indeed!

stijnkuipers

Interesting stuff. Those masks are quite fascinating. Also love your dad's artwork. Very talented.

NeilRoy

8:21 - The only thing special about k0 is that you can't use it in a lot of instructions. The *encoding* "000" is used to mean "no mask". It doesn't *read* k0 when you do that, it just is hardwired to "no mask". That's why changing the contents of k0 doesn't change that behaviour - it never actually reads the register. But because the encoding is reserved, it also means you can't use k0 for most instructions. It's a perfectly good normal register, it's just the *encoding* for most vector instructions is reserved, so you can only really use it in the other mask-register instructions - as a temporary or things like that.

Annoyingly, some assemblers allow you to use the "{k0}" syntax, which is technically illegal. Because again - the instruction doesn't read k0! They should produce an error, but they don't.

tom_forsyth
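
The point about the encoding can be illustrated with a small scalar model. This is a sketch under stated assumptions, not a description of the hardware: `masked_add`, `kregs`, and `mask_field` are hypothetical names, and the model assumes merge masking on 16 lanes.

```python
def masked_add(dst, a, b, kregs, mask_field):
    """Model a 16-lane merge-masked add.

    mask_field stands for the 3-bit mask selector in the instruction encoding.
    The value 0 means "no mask": k0 is never read in that case.
    """
    if mask_field == 0:
        mask = 0xFFFF                 # hardwired: every lane is written
    else:
        mask = kregs[mask_field]      # k1..k7 are read normally
    return [a[i] + b[i] if (mask >> i) & 1 else dst[i] for i in range(16)]

a = list(range(16))
b = [10] * 16
dst = [-1] * 16

kregs = [0] * 8
kregs[0] = 0x0000      # the contents of k0 are irrelevant to the add
kregs[1] = 0x0003      # k1 enables lanes 0 and 1 only

unmasked = masked_add(dst, a, b, kregs, 0)   # selector 000: all lanes written
with_k1 = masked_add(dst, a, b, kregs, 1)    # selector 001: lanes 0 and 1 only
```

Changing `kregs[0]` and calling `masked_add` with selector 0 again gives the same result, which mirrors the observation that writing to k0 does not change the unmasked behaviour.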

We ain't in Kansas anymore, Toto. Loved this trilogy, thank you. The K masks, the compressed displacement, broadcasting, the register files, all of it is exciting. I have experimented with SIMD since you first introduced us to SSE. I suspect that the power of these instructions will only really be experienced after a paradigm shift in the way we structure data. The classic vision of data in records (structs, classes etc.) has served us well with the classic architectures. These revolved around pointers and pointer arithmetic (shock! horror! bare naked pointers are at the foundation of it ALL). The new architecture is less friendly to the mixing of numerical, textual and bit-field data. It thrives on sequential lists of data all being the same type. So, data currently stored thus: Name: Michael, Age: 69, Salary: beyond your wildest dreams; Name: Creel, Age: ... etc. will need to be stored as Name: Michael, Creel; Age: 69... etc. I, perhaps, need a lot more examples of data for clarity, but the idea is that each numerical field can be accessed as one long array. Why? When one isn't number crunching enterprise amounts of data, the overhead to gather the numerical data from classic records can erase the advantage of these powerful instruction sets. It is not an obstacle, just a different view of your data.

I really like your Dad's pictures. I can see why you are so proud of him. Stay healthy.

willofirony
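
The layout change described above is usually called going from an array of structures to a structure of arrays. A minimal sketch follows, assuming made-up records with generic names and values; `to_soa` is a hypothetical helper, and Python's `array` module stands in for the contiguous, same-typed buffers that SIMD loads prefer.

```python
from array import array

# Array of structures: one record per person, fields interleaved.
# The names and numbers are invented for illustration only.
records = [
    {"name": "A", "age": 30, "salary": 1000.0},
    {"name": "B", "age": 40, "salary": 2000.0},
    {"name": "C", "age": 50, "salary": 3000.0},
]

def to_soa(records):
    """Convert a list of records into one contiguous array per numeric field."""
    return {
        "name": [r["name"] for r in records],
        "age": array("i", (r["age"] for r in records)),
        "salary": array("d", (r["salary"] for r in records)),
    }

soa = to_soa(records)

total_aos = sum(r["salary"] for r in records)  # hops from record to record
total_soa = sum(soa["salary"])                 # walks one run of doubles
```

Both totals are the same; the difference is that `soa["salary"]` is a single block of 8-byte values, which is the shape a vector load can consume without a gather step.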

This CPU is capabale of ZVX521 Fondation instruction set!

AlFasGD

A really good picture of the koala. By the way, great tutorial series on Intel CPU AVX512, all three parts.

imrank

Great intro to this instruction set! I'm a database guy, so not quite sure when I'll ever write my first assembly code, but your teaching style is so good that I can't help watching!

oresteszoupanos

Great video - it's interesting as a developer to learn some asm/intrinsics.

matsedv

Awesome, I received 11 points! The compressed displacement explanation and example were brilliant. And thank you for sharing your Dad's artwork.

robertzavala

Great video, great lesson about AVX512 mechanisms

KristianDjukic

man I love your videos so much :'D

lordadamson

Thanks! I was capabale to understand most of the stuff.

mikkoyliharsila

Really cool and awesome videos about AVX512! Can you explore more of AVX512 in the future, especially the FMA instructions? CUDA with tensor cores boosts GEMM throughput by such a big degree that the Ampere A100 now basically has more silicon for tensor cores than for the common CUDA cores. The GA100 actually has far fewer FP32 CUDA cores than the GA102 gaming/content creation lineup (like the RTX A6000 or 3090). It would be interesting to see how an AVX512 FMA implementation of BLAS boosts speed/throughput in comparison to AVX2 and no AVX at all.

fdc

@Creel
20:24
Did you notice that it rounded myFloats[0] = 1.5 to 2, but myFloats[4] = 0.5 to 0?
I would consider that strange. If a value is x.5, I would always expect it to be rounded upwards.

OpenGLever
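
The result described above matches round-to-nearest-even, the default IEEE 754 rounding mode, in which exact ties go to the even neighbour rather than always upwards. Assuming the example in the video used that mode, Python's built-in `round` follows the same tie rule and can serve as a quick check; `round_half_up` is a hypothetical helper added only for contrast.

```python
import math

def round_half_up(x: float) -> int:
    """Schoolbook rounding: ties always go towards +infinity."""
    return math.floor(x + 0.5)

values = [0.5, 1.5, 2.5, 3.5]

nearest_even = [round(v) for v in values]       # ties go to the even neighbour
half_up = [round_half_up(v) for v in values]    # ties always go up

print(nearest_even)  # [0, 2, 2, 4]
print(half_up)       # [1, 2, 3, 4]
```

Rounding ties to even avoids the systematic upward drift that "always round .5 up" introduces when many values are summed, which is the usual reason given for it being the default.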

AVX2 also has automatic broadcasting, a cool instruction.

lukehanscom

Hello mate. Will you make some videos on OpenCL and GPU programming? It would be a nice, interesting addition to your high-performance software computing guide.

kadiyamsrikar
