Making an Algorithm Faster



#neetcode #leetcode #python
Comments

"from this little youtuber primeOgen"

sahilverma_dev

A similar video by Matt Parker (Stand-up Maths): "Someone improved my code by 40,832,277,770%." I was part of the team that optimized his 1-month solution down to 300 microseconds. We submitted ours kind of late, so he wasn't able to cover our big algorithmic changes, but many of the techniques you mention here applied there as well.

landonkryger

18:00 No coincidence: the range 97 to 122 ('a' to 'z') falls between 32*3 = 96 and 32*4 = 128, so within that interval the remainders mod 32 run from 1 to 31, and the 26 letters map to distinct values 1 to 26.
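The mod-32 observation above can be sketched in Python (`letter_index` is a made-up helper name, not from the video):

```python
# 'a'..'z' are codes 97..122, inside the 32-aligned block [96, 128),
# so code % 32 gives each lowercase letter a distinct value 1..26.
# (Uppercase 'A'..'Z', codes 65..90, land on the same 1..26 range.)
def letter_index(c: str) -> int:
    return ord(c) % 32
```

Because `% 32` on an unsigned value is just masking off the low 5 bits, this also happens to be case-insensitive for ASCII letters.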

haronxbelghit

To preface: you explained everything very nicely, even the tricky bits :).

The stack is handled by the CPU under the direction of the OS; there is still overhead when you cross a page boundary. The heap is not handled by the OS but by whatever allocator you use (the allocator mmap()s pages when needed). Allocators usually use "bins/buckets" for various allocation sizes, so it's pretty fast, unless the allocator has to mmap() some more memory.


Anyway, what I'm trying to say is that it's complicated. If your language let you, you could even use the stack as a dynamic array. Or you could mmap() a big piece of memory and just use it as a dynamic array, since memory doesn't actually get allocated until you touch it, which gets you the same result as the stack. If the array size is fixed, the compiler could even just reserve a piece of memory from when the program is loaded (like it does for const strings).

Cache locality is also a bit complicated. CPUs cache memory in "cache lines", which are usually 64 bytes. And yes, if your resize moves the data, then all those cache lines become useless. Then again, the memcpy puts the new data into cache, so it's not ~that~ bad; it just means something else gets thrown out. And there are more levels of cache: L1 is closest to the core but way smaller than L3, of course.
And yes, the CPU prefetches data as you read it in. It even figures out the direction, so going 0..n is the same as n..0.

In short, you always want to use as little memory as possible, and keep the memory that is accessed at the same time as close together as you can. If you can keep it in registers, like the bitfield solution, then you're golden. And you ~might~ want to align/pad to some power of 2 (especially to 16 bytes for SIMD, which you even had to do on older CPUs).

PS: Oh, and your solution of subtracting 'a' would also be faster than modulo (modulo is a divide; of course a subtraction is faster. Btw, bitwise operations usually take about a third of a CPU cycle, the fastest operations there are, except maybe mov between registers).
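For comparison with the mod-32 version, the subtract-'a' mapping the comment refers to looks like this in Python (hypothetical helper name):

```python
def letter_index_sub(c: str) -> int:
    # ord(c) - ord('a') maps 'a'..'z' to 0..25. A subtraction is one
    # cheap ALU op, whereas % in general compiles down to a division
    # (compilers only turn it into a mask when the divisor is a known
    # power of two and the operand is unsigned).
    return ord(c) - ord('a')
```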

TestTost-jd

"the actual runtime is what matters" - tell that to the average react developer

TheOnlyJura

These random topic videos have been really insightful, great content!

juanmacias

I think you can still get a cache-locality boost using an array, because the array's memory sits next to the other stack variables, so it's more likely to share a cache line with them.

eblocha

That is a BRILLIANT video, loved watching it.

howto.

ASCII values weren't technically chosen to be used mathematically like that, but there is a similar idea there: they wanted to be able to check whether a letter was upper- or lowercase simply by checking a single bit. Or maybe it was for converting between the two.
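That single bit is bit 5 (0x20): 'A' is 65 (0b1000001) and 'a' is 97 (0b1100001), differing only there. A sketch, valid only when the input is already known to be an ASCII letter; the helper names are made up:

```python
CASE_BIT = 0x20  # bit 5 separates ASCII upper- and lowercase letters

def is_lower(c: str) -> bool:
    # Only meaningful for 'A'-'Z' / 'a'-'z'.
    return bool(ord(c) & CASE_BIT)

def to_upper(c: str) -> str:
    return chr(ord(c) & ~CASE_BIT)  # clear the case bit

def to_lower(c: str) -> str:
    return chr(ord(c) | CASE_BIT)   # set the case bit
```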

CrapE_DM

Ignoring the constant *when you are learning Big O* is important so that you don't get distracted. However, when building something, it's only relevant once you're already at the "simplest form", the smallest big O you can achieve, and then the constant matters.

valentinrafael

At 14:20 he is reallocating the array every time the window changes, but with a well-constructed loop it is possible to reuse the same array and clear out only the entries we know still hold values from the previous substring. In C#, doing this makes this algo 4x faster.
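The reuse idea can be sketched in Python, assuming the classic longest-substring-without-repeats problem over lowercase letters (the comment doesn't restate the video's exact problem): one count array lives for the whole scan, with entries decremented as the left edge advances instead of a fresh array per window.

```python
def longest_unique(s: str) -> int:
    counts = [0] * 26          # reused across every window position
    left = best = 0
    for right, ch in enumerate(s):
        i = ord(ch) - ord('a')
        counts[i] += 1
        while counts[i] > 1:   # shrink from the left until no duplicate
            counts[ord(s[left]) - ord('a')] -= 1
            left += 1
        best = max(best, right - left + 1)
    return best
```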

zangdaarrmortpartout

You're right about cache locality not being involved. It's the same thing with strings and small string optimizations

xmichael

25:44 It also instantly gives you the place to jump to. Going forward, when you reach the 2nd J you'd have to go back to see where the first J was in order to continue; going backwards, your pointer is already at the second J (or actually the first J, but we are going backwards) and you instantly know where to start the next window: (index of J) + 1.
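The same "jump straight past the duplicate" effect can be sketched in a forward scan with a last-seen map; this is a related technique for illustration, not necessarily the video's exact code:

```python
def longest_unique_jump(s: str) -> int:
    last_seen = {}             # char -> most recent index
    start = best = 0
    for i, ch in enumerate(s):
        if ch in last_seen and last_seen[ch] >= start:
            # Jump the window start past the previous occurrence
            # instead of shrinking one step at a time.
            start = last_seen[ch] + 1
        last_seen[ch] = i
        best = max(best, i - start + 1)
    return best
```

The `last_seen[ch] >= start` guard matters: an occurrence from before the current window must not pull the start backwards.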

freya

Feels like Boyer-Moore, but without the pain of preprocessing bad-character/good-suffix tables. Very nice.

MrSonny

You can create static arrays in Python with the "array" module. Still not sure if that qualifies it as a "real" language, though.
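A minimal sketch of the stdlib `array` module the comment mentions; the letter-counter use here is just an illustration:

```python
from array import array

# 'B' = unsigned byte: 26 one-byte counters stored contiguously,
# like a C array, instead of a list of boxed Python int objects.
counts = array('B', [0] * 26)
counts[ord('c') - ord('a')] += 1
```

Note it's fixed element size rather than fixed length (it still supports `append`), but the packed, typed storage is what makes it C-like.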

criptych

What I like about your explanation is that you don't assume the audience knows a thing; you dive into the tiniest detail, like what an AND operation even is. Whereas college professors always have that assumption: oh, you guys must already know about the stack, the heap, memory allocation, let me talk about this scheduling algorithm, …

akialter

I am a lead web developer and have never done any leetcode except in university. I recently started leetcode to get into a big-name company that pays 10%-20% more than my current company, and videos like this are very eye-opening!

GuRuGeorge

Maybe I misunderstood what you wanted to say in the beginning, but CPUs are totally able to add two numbers together.
Does that in the end boil down to binary operations? Yes. But except in some very esoteric CPUs, it doesn't run those as separate binary operations; there is dedicated circuitry to do the addition, in many cases in 1 cycle (e.g. on x86 it's a single uop, as on most embedded CPUs).

thargork

14:00 I believe the cache locality mattered because he was allocating one vector for each sliding window, so they ended up at random positions in the heap each time; with stack allocation it's always the same spot.

rodrigosantoszz

20:20 I often store stuff as a bitset. It's more comfortable than working with arrays, IMO.
Recently I also turned a struct of boolean flags into a bitset (or rather told some AI to do it for me, since it's pretty repetitive).
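The flags-as-bitset idea can be sketched in Python; `seen` and `has_letter` are made-up names for illustration:

```python
# Bit i set  =>  letter chr(ord('a') + i) has been seen.
seen = 0
for ch in "leetcode":
    seen |= 1 << (ord(ch) - ord('a'))

def has_letter(letter: str) -> bool:
    # Shift the wanted bit down to position 0 and mask it out.
    return bool((seen >> (ord(letter) - ord('a'))) & 1)
```

The whole flag set lives in one integer, so membership tests, unions (`|`), and intersections (`&`) are single operations instead of per-element loops.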
