Making an Algorithm Faster



#neetcode #leetcode #python
Comments

"from this little youtuber primeOgen"

sahilverma_dev

A similar video by Matt Parker (Stand-up Maths): "Someone improved my code by 40,832,277,770%." I was part of the team that optimized his 1-month solution down to 300 microseconds. We submitted ours kind of late, so he wasn't able to cover our big algorithmic changes, but many of the techniques you mention here applied there as well.

landonkryger

18:00 No coincidence: the range 97 to 122 ('a' to 'z') falls between 32*3 = 96 and 32*4 = 128, so within that interval the remainders mod 32 run from 1 to 31, and the 26 letters map to distinct values 1 to 26.
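The mod-32 observation above can be sketched in Python (`letter_index` is a made-up helper name, not from the video):

```python
# 'a'..'z' are codes 97..122, inside the 32-aligned block [96, 128),
# so code % 32 gives each lowercase letter a distinct value 1..26.
# (Uppercase 'A'..'Z', codes 65..90, land on the same 1..26 range.)
def letter_index(c: str) -> int:
    return ord(c) % 32
```

Because `% 32` on an unsigned value is just masking off the low 5 bits, this also happens to be case-insensitive for ASCII letters.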

haronxbelghit

To preface: you explained everything very nicely, even the tricky bits :).

The stack is handled by the CPU under the direction of the OS; there is still overhead when you cross a page boundary. The heap is not handled by the OS but by whatever allocator you use (the allocator mmap()s pages when needed). Allocators usually use "bins/buckets" for various allocation sizes, so it's pretty fast, unless the allocator has to mmap() some more memory.


Anyway, what I'm trying to say is that it's complicated. If your language let you, you could even use the stack as a dynamic array. Or you could mmap() a big piece of memory and just use it as a dynamic array, since memory doesn't actually get allocated until you touch it, which gets you the same result as the stack. If the array size is fixed, the compiler could even just reserve a piece of memory from when the program is loaded (like it does for const strings).

Cache locality is also a bit complicated. CPUs cache memory in "cache lines", which are usually 64 bytes. And yes, if your resize moves the data, then all those cache lines become useless. Then again, the memcpy puts the new data into cache, so it's not ~that~ bad; it just means something else gets thrown out. And there are more levels of cache: L1 is closest to the core but way smaller than L3, of course.
And yes, the CPU prefetches data as you read it in. It even figures out the direction, so going 0..n is the same as n..0.

In short, you always want to use as little memory as possible, and keep the memory that is accessed at the same time as close together as you can. If you can keep it in registers, like the bitfield solution, then you're golden. And you ~might~ want to align/pad to some power of 2 (especially to 16 bytes for SIMD, which you even had to do on older CPUs).

PS: Oh, and your solution of subtracting 'a' would also be faster than modulo (modulo is a divide; of course a subtraction is faster. Btw, bitwise operations usually take about a third of a CPU cycle, the fastest operations there are, except maybe mov between registers).
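For comparison with the mod-32 version, the subtract-'a' mapping the comment refers to looks like this in Python (hypothetical helper name):

```python
def letter_index_sub(c: str) -> int:
    # ord(c) - ord('a') maps 'a'..'z' to 0..25. A subtraction is one
    # cheap ALU op, whereas % in general compiles down to a division
    # (compilers only turn it into a mask when the divisor is a known
    # power of two and the operand is unsigned).
    return ord(c) - ord('a')
```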

TestTost-jd

"the actual runtime is what matters" - tell that to the average react developer

TheOnlyJura

These random topic videos have been really insightful, great content!

juanmacias

I think you can still get a cache-locality boost using an array, because the array's memory sits next to the other stack variables, so it's more likely to share a cache line with them.

eblocha

That is a BRILLIANT video, loved watching it.

howto.

ASCII values weren't technically chosen to be used mathematically like that, but there is a similar idea there: they wanted to be able to check whether a letter was upper- or lowercase simply by checking a single bit. Or maybe it was for converting between the two.
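That single bit is bit 5 (0x20): 'A' is 65 (0b1000001) and 'a' is 97 (0b1100001), differing only there. A sketch, valid only when the input is already known to be an ASCII letter; the helper names are made up:

```python
CASE_BIT = 0x20  # bit 5 separates ASCII upper- and lowercase letters

def is_lower(c: str) -> bool:
    # Only meaningful for 'A'-'Z' / 'a'-'z'.
    return bool(ord(c) & CASE_BIT)

def to_upper(c: str) -> str:
    return chr(ord(c) & ~CASE_BIT)  # clear the case bit

def to_lower(c: str) -> str:
    return chr(ord(c) | CASE_BIT)   # set the case bit
```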

CrapE_DM

Ignoring the constant *when you are learning Big O* is important so that you don't get distracted. However, when building something, it's only relevant once you're already at the "simplest form", the smallest big O you can achieve, and then the constant matters.

valentinrafael

At 14:20 he is reallocating the array every time the window changes, but with a well-constructed loop it is possible to reuse the same array and clear out only the entries we know still hold values from the previous substring. In C#, doing this makes this algo 4x faster.
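The reuse idea can be sketched in Python, assuming the classic longest-substring-without-repeats problem over lowercase letters (the comment doesn't restate the video's exact problem): one count array lives for the whole scan, with entries decremented as the left edge advances instead of a fresh array per window.

```python
def longest_unique(s: str) -> int:
    counts = [0] * 26          # reused across every window position
    left = best = 0
    for right, ch in enumerate(s):
        i = ord(ch) - ord('a')
        counts[i] += 1
        while counts[i] > 1:   # shrink from the left until no duplicate
            counts[ord(s[left]) - ord('a')] -= 1
            left += 1
        best = max(best, right - left + 1)
    return best
```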

zangdaarrmortpartout

You're right about cache locality not being involved. It's the same thing with strings and small string optimizations

xmichael

25:44 It also instantly gives you the place to jump to. Going forward, when you reach the 2nd J you'd have to go back to see where the first J was in order to continue; going backwards, your pointer is already at the second J (or actually the first J, but we are going backwards) and you instantly know where to start the next window: (index of J) + 1.
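The same "jump straight past the duplicate" effect can be sketched in a forward scan with a last-seen map; this is a related technique for illustration, not necessarily the video's exact code:

```python
def longest_unique_jump(s: str) -> int:
    last_seen = {}             # char -> most recent index
    start = best = 0
    for i, ch in enumerate(s):
        if ch in last_seen and last_seen[ch] >= start:
            # Jump the window start past the previous occurrence
            # instead of shrinking one step at a time.
            start = last_seen[ch] + 1
        last_seen[ch] = i
        best = max(best, i - start + 1)
    return best
```

The `last_seen[ch] >= start` guard matters: an occurrence from before the current window must not pull the start backwards.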

freya

Feels like Boyer-Moore, but without the pain of preprocessing bad-character/good-suffix tables. Very nice.

MrSonny

You can create static arrays in Python with the "array" module. Still not sure if that qualifies it as a "real" language, though.
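A minimal sketch of the stdlib `array` module the comment mentions; the letter-counter use here is just an illustration:

```python
from array import array

# 'B' = unsigned byte: 26 one-byte counters stored contiguously,
# like a C array, instead of a list of boxed Python int objects.
counts = array('B', [0] * 26)
counts[ord('c') - ord('a')] += 1
```

Note it's fixed element size rather than fixed length (it still supports `append`), but the packed, typed storage is what makes it C-like.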

criptych

What I like about your explanation is that you don't assume the audience knows a thing; you dive into the tiniest detail, like what an AND operation even is. Whereas college professors always have that assumption: oh, you guys must already know about the stack, the heap, memory allocation, let me talk about this scheduling algorithm, …

akialter

I am a lead web developer and have never done any leetcode except in university. I recently started leetcode to get into a big-name company that pays 10%-20% more than my current company, and videos like this are very eye-opening!

GuRuGeorge

Maybe I misunderstood what you wanted to say in the beginning, but CPUs are totally able to add two numbers together.
Does that in the end boil down to binary operations? Yes. But except in some very esoteric CPUs, it doesn't run those as separate binary operations; there is dedicated circuitry to do the addition, in many cases in 1 cycle (e.g. on x86 it's a single uop, as on most embedded CPUs).

thargork

14:00 I believe the cache locality mattered because he was allocating one vector for each sliding window, so they ended up at random positions in the heap each time; with stack allocation it's always the same spot.

rodrigosantoszz

20:20 I often store stuff as a bitset. It's more comfortable than working with arrays, IMO.
Recently I also turned a struct of boolean flags into a bitset (or rather told some AI to do it for me, since it's pretty repetitive).
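The flags-as-bitset idea can be sketched in Python; `seen` and `has_letter` are made-up names for illustration:

```python
# Bit i set  =>  letter chr(ord('a') + i) has been seen.
seen = 0
for ch in "leetcode":
    seen |= 1 << (ord(ch) - ord('a'))

def has_letter(letter: str) -> bool:
    # Shift the wanted bit down to position 0 and mask it out.
    return bool((seen >> (ord(letter) - ord('a'))) & 1)
```

The whole flag set lives in one integer, so membership tests, unions (`|`), and intersections (`&`) are single operations instead of per-element loops.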
