E00: Software Drag Racing: C++ vs C# vs Python - Which Will Win?

Retired Microsoft Engineer Davepl writes the same 'Primes' benchmark in Python, C#, and C++ and then compares and explains the differences in the code before racing them head to head to see what the performance difference is like between the languages.

It appears the upload process does some volume leveling or loudness normalization, so my apologies if you get startled during the intro. It was mixed down in the master, honest :-)

Thanks to the Simpsons for the inevitable reference or two that I throw in now and then! See if you can spot both in this episode!

0:00 Start
2:08 The Primes Assignment
6:25 How a Sieve Works
7:32 Coding Begins
9:00 Python Version
12:50 C# Version
18:25 C++ Version
22:22 Charts and Graphs
23:17 Outtakes

I've placed the code up on GitHub for your reference without any warranty for any purpose!

I get a lot of questions about which keyboard I'm using as well as various other camera and studio equipment questions, so here are the highlights:

CORSAIR K70 RGB MK.2 Mechanical Gaming Keyboard (Cherry MX Blue Switches)

Sony FX3 or A7SIII Cameras

Aputure 120D Mark II Light and Light Dome II Mini

Glide Gear TMP100 Prompter
Comments
Author

Best video I have watched in a long time.

stuartogrady
Author

A cool project would be to set up a git repo for everyone to check in their implementations of this algorithm in their favourite language. Then set up a CI pipeline to run them every time someone commits an optimisation. Chart the results.

ShALLaX
Author

I think we can all predict which will be the slowest, but is it 1/5th, or 1/10th, or 1/100th the speed? Find out! And I'm NOT trying to make any language look silly or slow, just to find out what the differences are! And remember speed is not the only factor that matters...

DavesGarage
Author

When writing tight loops in Python, you have to remember two things about the language:

1. Attribute lookups and variable lookups not in the local namespace are slow.
2. Calling functions is slow.

Thus I was able to speed up the Python version in the repo from 39 iterations for limit 1_000_000 to ~150 iterations just by inlining the code from GetBits/ClearBits and creating a reference for this.rawbits and this.sieveSize in local variables (and by eliminating the superfluous check for index%2 in the inner loop).

This speedup is achieved without any optimizations to the algorithm.

spotlight-kyd
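
The two tips in the comment above can be sketched roughly as follows. This is not the code from the repo: the class `PrimeSieve` and the method names `run_methods`, `run_inlined`, `get_bit`, and `clear_bit` are hypothetical, chosen only to show the same sieve written once with method calls and attribute lookups inside the loop, and once with the attributes bound to local names and the helper bodies inlined. No timings are claimed here; measure on your own machine.

```python
import math


class PrimeSieve:
    """Hypothetical odd-only sieve; all names are illustrative, not the repo's."""

    def __init__(self, limit):
        self.sieve_size = limit
        # One flag per number; only odd indexes are ever consulted.
        self.raw_bits = [True] * (limit + 1)

    def get_bit(self, index):
        return self.raw_bits[index]

    def clear_bit(self, index):
        self.raw_bits[index] = False

    def run_methods(self):
        """Straightforward version: attribute lookups and calls inside the loop."""
        factor = 3
        q = math.isqrt(self.sieve_size)
        while factor <= q:
            if self.get_bit(factor):
                for num in range(factor * 3, self.sieve_size + 1, factor * 2):
                    self.clear_bit(num)
            factor += 2

    def run_inlined(self):
        """Same algorithm with attributes bound to locals and helpers inlined."""
        size = self.sieve_size  # local name: no attribute lookup per iteration
        bits = self.raw_bits    # local reference to the same list object
        factor = 3
        q = math.isqrt(size)
        while factor <= q:
            if bits[factor]:                # inlined get_bit
                for num in range(factor * 3, size + 1, factor * 2):
                    bits[num] = False       # inlined clear_bit
            factor += 2

    def primes(self):
        if self.sieve_size < 2:
            return []
        return [2] + [n for n in range(3, self.sieve_size + 1, 2) if self.raw_bits[n]]
```

Both methods mark exactly the same entries, so any speed difference comes from how the loop body is expressed rather than from the algorithm.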
Author

Reminds me of a similar comparison that Google did a decade ago. It got kind of ridiculous when the Java engineers went "We can do better than 30% of C performance, we just need to hand-tune the VM and allocation settings!"

totallynotabot
Author

The C++ STL does actually have a bit array! It is just unfortunately called std::vector<bool>. Seriously, the standard says this specialization of vector should be implemented as an array of bits!

juvenal
Author

Incredible comparison!

I'd also like to add how -- this man has successfully managed to write the most C++ looking script in Python 😂

luischinchilla-garcia
Author

His HS CS class: Algorithm optimization competition with classmates
My HS CS class: Creating seemingly never ending popup dialogs that ultimately climax with "You Suck"

richskater
Author

I noticed in his comments that Dave's reasoning for writing to stdout directly was to avoid the newline that Python's built-in print function adds, but there's actually an easier way to do it.
When calling print, you can customize the line ending by passing the end parameter, as in end="whatever you want as the line ending".
That way, the line ending could be a comma followed by a space, or whatever else you need.

Other than that small tidbit which I came across by chance, awesome job as always Dave.

srth
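
A small sketch of the `end` parameter mentioned in the comment above. The helper name `print_primes` is hypothetical and exists only to make the example self-contained.

```python
def print_primes(primes):
    """Print the values on one line, separated by a comma and a space."""
    for p in primes:
        # end replaces the default "\n" that print appends after each call.
        print(p, end=", ")
    print()  # finish the line with a single newline


print_primes([2, 3, 5, 7, 11])
```

Note that this simple loop leaves a trailing ", " after the last value; `print(*primes, sep=", ")` avoids that if it matters.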
Author

We are about the same age, and I have enjoyed many of your videos because the products were so important in my career. It's good to put a human face on the digital world and see a programmer who worked on the product.

charlesbaldo
Author

Although there were no surprises, it is a great video. Many Python programmers know that the best way to achieve performance in Python is to not use Python. This means doing most of the computation by calling C-optimized libraries such as NumPy, TensorFlow, or scikit-learn.

JuanManuelCuchilloRodriguez
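
The comment above names third-party libraries; to keep the example runnable without installing anything, the sketch below swaps in a standard-library stand-in for the same idea. It assumes CPython, where assigning to an extended slice of a `bytearray` clears every multiple inside the interpreter's compiled code instead of in a Python-level loop. The function name `sieve_with_slices` is hypothetical, and this is not how the video's code is written.

```python
import math


def sieve_with_slices(limit):
    """Return all primes <= limit; the inner loop is one slice assignment."""
    if limit < 2:
        return []
    flags = bytearray([1]) * (limit + 1)  # 1 means "possibly prime"
    flags[0] = flags[1] = 0
    for factor in range(2, math.isqrt(limit) + 1):
        if flags[factor]:
            start = factor * factor
            # Number of multiples of factor in [start, limit].
            count = len(range(start, limit + 1, factor))
            # Zero every multiple in one step; bytes(count) is count zero bytes.
            flags[start::factor] = bytes(count)
    return [n for n, is_prime in enumerate(flags) if is_prime]
```

A NumPy version would follow the same shape, with a boolean array in place of the `bytearray`.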
Author

As a retired CPU designer, I am constantly surprised by the "discovery" that interpreted languages (even those that use a JIT) are so much slower than optimized C or even assembly. There is little appreciation for the massive overhead of many of these script-like languages. As a demonstration to convince a software developer that we could run their massive program on a $35 compute module I recoded their most critical routine in assembly (60 instructions long) and showed that their entire system ran with less than 10% of a very cheap machine rather than 40% of a Mac.

The real nightmare, however, is the strato-layering of "packages" one on top of another for minimal additional functionality but a perceived decrease in design time. These chew up CPU cycles in massive overhead, damaging the responsiveness and size of the code generated. As CS schools have stopped teaching even the rudiments of computer architecture, this is not likely to change. Great for CPU producers, but a massive waste in time, power, and cost.

randyscorner
Author

There are optimizations you can do even at the assembly level, like using fancy vector instructions and such. I once optimized a piece of C++ code with some embedded assembly instructions to gain 10x performance just using MMX on a Pentium chip. Usually the hard part is identifying which 0.1% of the code really needs to be optimized.

pihi
Author

Best original content on YouTube right now. Killing the game, Dave!!!

ian.e.mccormick
Author

Beyond the "Hello World" program in C64 basic 30+ years ago, I'm not a coder. So it's a testament to your presentation style that I can more or less follow what you're doing, and enjoy watching the show. Keep it up!

AirZeee
Author

I mean, C++ technically has vector<bool> for dynamic bit arrays. I believe it uses size_t instead of 8-bit chunks, because it's made to dynamically change size even after being created. I know a lot of people don't like it in bigger codebases for various reasons, but in isolation it works fine just to access and change bits.

marvinabt
Author

Massive respect for the systematic and clear approach to this comparison (the experience you gathered over the years is very clearly showing in the methodology and explanation).
Instant subscribe! Thanks for this and keep up the great work!

ViorelMocanu
Author

I rarely see well-executed language comparisons. I love these performance/comparison types of videos.
I really enjoyed it, thank you!

malgailany
Author

This line here: `for (int num = factor * 3; num <= this.sieveSize; num += factor * 2)`
You can start at `int num = factor * factor`. Everything below that has already been taken care of by the factors that came before. Like factor*3 has already been taken care of when factor was 3, and factor*5 when factor was 5 ...

thomasersosi
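
The suggestion above can be checked with a short sketch. The function `odd_sieve` and its `start_at_square` flag are hypothetical names used only for this comparison; the loop quoted in the comment is C#, and this is a Python rendering of the same idea. For factor 7, for example, 21 and 35 were already cleared when the factor was 3 and 5, so 49 is the first multiple that still needs clearing.

```python
import math


def odd_sieve(limit, start_at_square):
    """Return primes <= limit using an odd-only sieve.

    start_at_square=False mirrors the quoted loop (start at factor * 3);
    start_at_square=True applies the suggestion (start at factor * factor).
    """
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    for factor in range(3, math.isqrt(limit) + 1, 2):
        if is_prime[factor]:
            first = factor * factor if start_at_square else factor * 3
            # Step by factor * 2 so only odd multiples are visited.
            for num in range(first, limit + 1, factor * 2):
                is_prime[num] = False
    return [2] + [n for n in range(3, limit + 1, 2) if is_prime[n]]


# Both starting points clear the same composites, so the results match.
assert odd_sieve(1000, True) == odd_sieve(1000, False)
```

Starting at the square only skips work that earlier factors already did, so the saving grows as the factor gets larger.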
Author

As someone who gets pumped following along to a free code academy tutorial video for Python, I am awestruck by this person's career and his ability to explain it to someone like myself. Keep rocking it!

pismith