The Hard Tradeoffs of Edge AI Hardware

Errata:

I said in this video that "CPUs and GPUs are not seen as acceptable hardware choices for edge AI solutions". This is not true, as CPUs are commonly used for small, sub-$100 items. And GPUs are frequently used in lieu of FPGAs due to their ease of programming. Thanks to Patron Gavin for his input.

Comments

Yet another interesting video, which condenses a lot of information into a manageable chunk. Keep it up! <3

philippepanayotov

I write inference pipelines on Jetsons for my work. The latest generation has some very attractive performance characteristics. They do pull 60W, which definitely isn't nothing, but for our use case it's manageable.

Something that wasn't stressed is that those devices are heavily optimized around fixed-point 8-bit (INT8) throughput. A high-spec Orin can put out 275 TOPS, which is more than triple a 4090's 83 TFLOPS. Even if the models are much weaker, the increase in throughput opens up a lot of flexibility in system design.
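
For readers who haven't touched INT8 before, here is a minimal post-training quantization sketch in PyTorch showing the calibrate-then-convert flow those TOPS figures rely on. The tiny model, the random calibration data, and the "fbgemm" CPU backend are placeholders; on a Jetson the production INT8 path would be TensorRT rather than this.

    import torch
    import torch.nn as nn
    from torch.ao.quantization import QuantStub, DeQuantStub, get_default_qconfig, prepare, convert

    class TinyNet(nn.Module):
        """Placeholder CNN standing in for a real edge model."""
        def __init__(self):
            super().__init__()
            self.quant = QuantStub()      # float -> int8 at the model boundary
            self.conv = nn.Conv2d(3, 16, 3)
            self.relu = nn.ReLU()
            self.dequant = DeQuantStub()  # int8 -> float on the way out
        def forward(self, x):
            return self.dequant(self.relu(self.conv(self.quant(x))))

    model = TinyNet().eval()
    model.qconfig = get_default_qconfig("fbgemm")  # x86 CPU backend; a Jetson would go through TensorRT
    prepared = prepare(model)

    # Calibration: observers record activation ranges from representative inputs.
    with torch.no_grad():
        for _ in range(8):
            prepared(torch.randn(1, 3, 32, 32))

    int8_model = convert(prepared)  # weights and activations are now int8
    print(int8_model(torch.randn(1, 3, 32, 32)).shape)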

fennewald

Dude, your content is as outstanding as it is random. From Soviet oil exports to chip manufacturing to edge AI models xX

petermuller

As someone who actually deploys edge AI, I heavily disagree with how "easy" you make FPGAs and ASICs seem. For the vast majority of "smaller" projects the Nvidia Jetson series is a far better choice, since it supports newer algorithms and functions, especially given the speed at which the field of AI is progressing. Furthermore, fp16 GPU tensor cores are basically optimized for ML inference and provide good performance if you're willing to spend a little extra time converting the model, though even that often runs into compilation issues with newer ML models.
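
For what it's worth, the fp16 conversion mentioned above can be as small as a dtype cast when the model tolerates it. A minimal PyTorch sketch with a placeholder network (nothing Jetson- or TensorRT-specific):

    import torch
    import torch.nn as nn

    # Placeholder network standing in for a trained model.
    model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).eval()
    x = torch.randn(8, 512)

    if torch.cuda.is_available():
        # Cast weights and inputs to fp16 so the matmuls can hit the GPU's tensor cores.
        model = model.half().cuda()
        with torch.no_grad():
            out = model(x.half().cuda())
    else:
        # fp16 kernels are not broadly available on CPU, so fall back to fp32 there.
        with torch.no_grad():
            out = model(x)

    print(out.dtype, out.shape)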

tempacc

Post-training pruning is very much how human neural networks learn: massive connection growth anywhere and everywhere over the first few years, during the initial training phase, then massive pruning to get rid of the unnecessary connections.
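
As a loose software analogue of that pruning phase, here is a minimal post-training magnitude-pruning sketch using PyTorch's pruning utilities (placeholder layer, 50% sparsity chosen arbitrarily). Note that the pruned weights are only zeroed, so real speedups on edge hardware still need sparse-aware kernels or accelerators.

    import torch
    import torch.nn as nn
    import torch.nn.utils.prune as prune

    # Placeholder layer; in practice you would prune a trained network.
    layer = nn.Linear(256, 256)

    # Post-training magnitude pruning: zero out the 50% of weights with the smallest |w|.
    prune.l1_unstructured(layer, name="weight", amount=0.5)

    # Make the pruning permanent (drops the mask and the saved original weights).
    prune.remove(layer, "weight")

    sparsity = (layer.weight == 0).float().mean().item()
    print(f"fraction of zeroed weights: {sparsity:.2f}")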

StephenGillie

You're missing the latest development in edge AI: simplified models running on a physical quirk of flash memory. Mythic AI and Syntiant are two companies taking advantage of this to do simple inference. This tech is in its earliest days and has a lot of future potential.
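
A back-of-the-envelope sketch of the idea, not a model of any specific chip: weights are stored as cell conductances, inputs are applied as voltages, and the matrix-vector product falls out of Ohm's and Kirchhoff's laws, with device noise and an ADC on the way back to digital. All numbers below are made up for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    weights = rng.uniform(-1, 1, size=(64, 128))   # stored as cell conductances
    x = rng.uniform(0, 1, size=128)                # applied as word-line voltages

    # Each output is a sum of currents, so the matrix-vector product
    # happens "for free" inside the memory array.
    ideal = weights @ x

    # Analog reality: device variation and read noise perturb the result,
    # then an ADC quantizes it back to the digital domain.
    noisy = ideal * rng.normal(1.0, 0.02, size=ideal.shape) + rng.normal(0, 0.05, size=ideal.shape)
    adc_levels = 256
    lo, hi = noisy.min(), noisy.max()
    digital = np.round((noisy - lo) / (hi - lo) * (adc_levels - 1))

    print("relative error vs ideal:", np.abs(noisy - ideal).mean() / np.abs(ideal).mean())
    print("ADC codes used:", int(digital.min()), "to", int(digital.max()))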

ChrisSudlik

Amazing video (as always). I am an ML engineer, and it's not (always) true that the fewer weights a model has, the less memory it uses or the faster it runs. I know it's a little counterintuitive. It all comes down to how many GFLOPs the model needs on a given piece of hardware.
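
A toy comparison makes the point: a small convolution swept over a large feature map can cost far more compute than a much bigger fully connected layer (illustrative sizes only).

    import torch.nn as nn

    # Two layers showing why parameter count and compute cost can diverge.
    conv = nn.Conv2d(64, 64, kernel_size=3, padding=1)  # few weights, applied at every pixel
    fc = nn.Linear(4096, 4096)                           # many weights, used exactly once

    conv_params = sum(p.numel() for p in conv.parameters())
    fc_params = sum(p.numel() for p in fc.parameters())

    # Rough multiply-accumulate counts for one forward pass (bias terms ignored).
    H = W = 112                                          # feature-map size the conv sweeps over
    conv_macs = 64 * 64 * 3 * 3 * H * W                  # out_ch * in_ch * k*k * H * W
    fc_macs = 4096 * 4096

    print(f"conv: {conv_params:,} params, {conv_macs / 1e9:.2f} GMACs")
    print(f"fc:   {fc_params:,} params, {fc_macs / 1e9:.2f} GMACs")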

frazuppi

I would like to add a few personal thoughts, since this is something I have been working on recently.

There are already ASICs around, and they have a huge benefit compared to FPGAs: area. It's nice that we can reprogram FPGAs anytime, but for commercial edge-AI devices (e.g., VR headsets) the form factor for the chip is very tight. So it is reasonable to say ASICs will dominate edge-AI products. Even more, we may find edge-AI accelerators sitting next to the processing units, with more caches (everything on a single package, inside our mobile devices in the future?).

However, we need to consider some issues with AI accelerators. They have to stay quite busy, which makes their thermal profile a bit annoying, and we need many buffers next to the cache memories (too much memory on-chip = too much area). We already have nice cooling solutions, but we definitely need more, or we need new methods that exploit the sparsity of neural network computation. Maybe you have heard of spiking neural networks (SNNs): they give the network a nice event-based structure, which lets you create "idle" states in your computation.
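
A minimal leaky integrate-and-fire sketch illustrates that event-driven point: when no spike arrives, the per-timestep update collapses to a cheap decay that hardware can skip or idle through (toy parameters, not any particular neuromorphic design).

    import numpy as np

    rng = np.random.default_rng(1)

    T, threshold, leak, weight = 100, 1.0, 0.95, 0.6
    in_spikes = rng.random(T) < 0.05      # sparse input events (~5% of timesteps)

    v = 0.0
    out_spikes = []
    for t in range(T):
        if in_spikes[t]:
            v = v * leak + weight         # integrate only when an event arrives
        else:
            v = v * leak                  # otherwise just decay -- hardware can idle or skip
        if v >= threshold:
            out_spikes.append(t)          # emit a spike and reset the membrane potential
            v = 0.0

    print("input events:", int(in_spikes.sum()), "output spikes:", out_spikes)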


That is already a nice way to get an edge-AI chip with a low-power option! Next, what if we make this chip in 3D? Considering how much memory dominates AI chips, what about stacking the memory die vertically on top of the logic die?

We are trying to answer this question at imec.

refikbilgic

Some of the Nvidia Jetson boards, like the AGX Xavier and Orin, have separate Nvidia Deep Learning Accelerators (DLAs) built in as well as the GPUs. There is the Hailo-8 M.2 accelerator too.

jjhw

I thought you would have included the mobile processors from Apple, Samsung, Qualcomm, etcetera.

They all include a neural processor these days, but I never hear much discussed about them: how powerful they are, what they're actually able to do, and so on.

Given that many millions of people own phones powered by these processors, surely the potential to bring edge AI to everyone is already here, if they are used effectively.

PaulGreeve

Natural language processing is what will end me. There is literally nothing else I can do as a paraplegic with dyscalculia on the ass-end of Europe besides translation, and Deepl is already near perfect between every language pair I can speak.

I've got every recommendation under the sun, from becoming a musician (I've been playing the guitar daily for 6 years, but can't memorize the fretboard because of dyscalculia), to programming (can't memorize the syntax because of dyscalculia), to 3D modeling... (you get the gist, nothing even remotely related to manipulating quantities or memorizing arithmetic relations is viable), to becoming a novelist (sure, because we all know how well those people earn).

Anyway, that was my comment for the algorithm.

dominic.h.

for a second, I thought the cow photo at the beginning said "dont eat ass"

nekomakhea

You should take a look at Perceive's ERGO chip; it seems to be a game changer in this field.

CrazyEngineer

That solder bridging in the stock video at 2:56 and 3:00. Yikes! Great video BTW!

MarcdeVinck

Thank you and much Love from the Philippines.

kennethtan

2:35 The AI-generated PCB image is a nice touch

rngQ

My fave video of yours recently! Thanks for making it!

TheTyno

Love the AI-generated potato PCB image at 2:44

hypercube

Qualcomm AI100 edge AI accelerators in the DM.2e form factor scale from 70 to 200 TOPS at 15-25 W, and various Qualcomm Snapdragon mobile SoCs fill the role of less powerful edge AI accelerators at 30 TOPS and under. Qualcomm is pushing pretty hard in that direction with SoCs for edge AI, vision AI, drones/robotics, ADAS, and level 4-5 autonomous driving, including the software stacks. They even have an AI research division dedicated to solving problems like edge AI model optimization.

LogioTek

Almost 20 years ago I led a project to port the neural models we ran on a Beowulf cluster to a more mobile platform. Our goal wasn't to create a processor to run solved networks like the current crop of AI processors - we built it with full plasticity so that the learning and processing could be performed on the same piece of hardware. I am disappointed that what is available today is a shadow of what we did in 2004. None of the current processors are constructed to model the neural structures required for real intelligence. They just keep modeling the same vacuous networks I used as an undergrad in the cog sci program at UCSD in the late '80s. Most of the people using the technology don't understand intelligence and sadly don't care what is required. One example: lately I've seen numerous job postings for AI engineers who can do prediction - what they don't understand is that prediction isn't the missing component of these networks; expectation is, facilitated by prospective memory.

jamessnook