Browser hacking: JIT chores, refactoring, and some code size optimization


Ladybird is a cross-platform web browser, also part of the SerenityOS project. :^)
Comments

Hello friends! :^) Some known mistakes today, let me know if you spot any others:

- Broke the patchable mov instructions in push_unwind_context() by turning them into xor reg, reg (thanks skyrising for spotting!)
- REX prefix missing a bit for xor reg, reg when reg is one of r8-r15

awesomekling

The yak shaving reference is simply genius.

WalterStucco

I love the JIT series. I need more JIT videos! It's also incredibly important that the backgrounds keep getting progressively more Yakky.

dbarenholz

Great video, as always. About jump compression: in many cases you know that a jump is less than 127 bytes away (in the fast paths, for instance), so you could add Assembler.short_jump etc. For the increment fast path, why can't you just compare the lower 32-bit register directly? Then there is no AND needed. This is quite x86-specific, so perhaps it should go into an Assembler function. And a third thing: why don't you extend the JS test suite with all the JIT-specific corner cases, so that you don't run into regressions? Also, iirc the js-test suite ran (with one error, which seemed expected) in earlier videos. 😊 Perhaps just run it before each commit? Looking very much forward to the next video!

perotubinger

Nice indeed. I would not worry about jump encoding yet. It complicates things, and in most assemblers this is done with 2 (or sometimes even 3) passes. But in a web browser you want jitting to be fast too; the performance of the generated code is not always critical (unless you run non-real-world benchmarks).

When benchmarking CPU-bound code (no IO, no syscalls, etc.) with hyperfine or otherwise, I would not look at the mean (average), but rather at the min only. Everything above the min is noise (scheduler, hardware interrupts, RCU callbacks), so the mean is biased; the min is a better estimator.

movaxh
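
The min-of-N measurement described above is easy to sketch by hand; here is a minimal, illustrative helper (the name and run count are invented, not from the video):

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Minimal "best of N" timer: noise (scheduler, interrupts, RCU callbacks)
// only ever adds time, so the minimum over several runs is the least
// biased estimate of the true cost.
template<typename Work>
double best_of_seconds(int runs, Work&& work)
{
    double best = 1e300;
    for (int i = 0; i < runs; ++i) {
        auto start = std::chrono::steady_clock::now();
        work();
        auto stop = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double>(stop - start).count());
    }
    return best;
}
```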

Backwards jump optimization is pretty easy to do with the current structure: if you're emitting a jump to a label with a known offset (it was already linked), emit the shortest possible jump :) Otherwise, fall back to the current behavior.

GabrielSoldani
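
A minimal sketch of that backwards-jump idea, assuming a toy byte-vector assembler rather than Ladybird's actual Assembler class: when the target is already known, measure the displacement and pick the 2-byte JMP rel8 form if it fits in a signed byte, otherwise the 5-byte JMP rel32 form.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct TinyAssembler {
    std::vector<uint8_t> code;

    void jump_to(ptrdiff_t target)
    {
        // Displacements are relative to the end of the jump instruction.
        ptrdiff_t rel8 = target - (ptrdiff_t)(code.size() + 2);
        if (rel8 >= -128 && rel8 <= 127) {
            code.push_back(0xEB);                  // JMP rel8 (2 bytes total)
            code.push_back((uint8_t)(int8_t)rel8);
            return;
        }
        ptrdiff_t rel32 = target - (ptrdiff_t)(code.size() + 5);
        code.push_back(0xE9);                      // JMP rel32 (5 bytes total)
        for (int i = 0; i < 4; ++i)
            code.push_back((uint8_t)(rel32 >> (8 * i)));
    }
};
```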

41:30 At 3E you're storing rax, and at 42 you're loading from that same location. It would be nice if you could prevent this pattern. You could keep some info about the previously emitted instruction, and if the current one is basically a nop, not emit it.
Of course, now you're getting into non-local optimizations. But I would think this is fun stuff to do.

xbzq
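
A sketch of that peephole, using an invented emitter type rather than the real JIT: track the last store, and elide a load that would read the same slot straight back into the same register.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

struct Slot { int reg; int slot; };

struct PeepholeEmitter {
    struct Instr { std::string op; int reg; int slot; };
    std::vector<Instr> out;
    std::optional<Slot> last_store;

    void store(int reg, int slot)
    {
        out.push_back({ "store", reg, slot });
        last_store = Slot { reg, slot };
    }

    void load(int reg, int slot)
    {
        // "store rax, [slot]" immediately followed by "load rax, [slot]"
        // is a no-op pair, so skip emitting the load.
        if (last_store && last_store->reg == reg && last_store->slot == slot)
            return;
        out.push_back({ "load", reg, slot });
        last_store.reset(); // conservatively forget across other instructions
    }
};
```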

Implementing to_numeric in the JIT would allow you to implement a lot of other instructions in the JIT as well, since they all call into C++ just to do a to_numeric before doing a really simple operation.

urielsalis

I think the increment fast path can be done without masking off the top bits. You already know that the four upper bytes are the shifted int32 tag, so you don't need to mask them off again; you can include them in the comparison instead (so instead of comparing against the bare value, you compare against SHIFTED_INT32_TAG | the value).
Too bad there doesn't seem to be a comparison with a 64-bit immediate argument.

Jodmangel
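
To illustrate the folded comparison, here is a sketch with an invented tag value (not LibJS's actual NaN-boxing constants):

```cpp
#include <cassert>
#include <cstdint>

// Illustrative NaN-boxing layout; the tag value is made up.
constexpr uint64_t SHIFTED_INT32_TAG = 0x7FFCULL << 48;

constexpr uint64_t box_int32(int32_t v)
{
    return SHIFTED_INT32_TAG | (uint32_t)v;
}

// Masking approach: AND off the tag, then compare the payload.
bool is_boxed_max_masked(uint64_t boxed)
{
    return (boxed & 0xFFFFFFFFULL) == 0x7FFFFFFFULL;
}

// Suggested approach: fold the tag into the expected constant, so one
// full-width compare does the type check and the value check together.
bool is_boxed_max_folded(uint64_t boxed)
{
    return boxed == (SHIFTED_INT32_TAG | 0x7FFFFFFFULL);
}
```

As the comment notes, x86-64 CMP has no 64-bit immediate form, so the folded constant would have to be materialized into a scratch register first (MOV r64, imm64, then CMP r64, r64).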

You speculated whether there was some "fancy technique" to encode short forward jumps: it's not fancy, quite the opposite. You can encode each basic block independently and then stitch them together at the end, but you need to do that with the *machine code* basic blocks, which means individual bytecode instructions would emit several blocks!

You could fancy that up quite a bit by detecting when you have the target location as you emit, probably in Label::link, so you don't end up with thousands of blocks per function.

SimonBuchanNz
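
The stitch-at-the-end scheme could look roughly like this (a toy two-pass layout with invented types, not Ladybird's code): encode each block with placeholder jump displacements, then lay the blocks out and patch the rel32 fields once every block's final offset is known.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Block {
    std::vector<uint8_t> code;
    // Position of a rel32 displacement within `code`, and the block it targets.
    struct Fixup { size_t pos; size_t target_block; };
    std::vector<Fixup> fixups;
};

std::vector<uint8_t> stitch(std::vector<Block> const& blocks)
{
    // Pass 1: assign final offsets to every block.
    std::vector<size_t> offsets;
    size_t total = 0;
    for (auto const& b : blocks) {
        offsets.push_back(total);
        total += b.code.size();
    }
    // Pass 2: concatenate, then patch the displacements.
    std::vector<uint8_t> out;
    out.reserve(total);
    for (auto const& b : blocks)
        out.insert(out.end(), b.code.begin(), b.code.end());
    for (size_t i = 0; i < blocks.size(); ++i) {
        for (auto const& f : blocks[i].fixups) {
            size_t pos = offsets[i] + f.pos;
            // rel32 is relative to the end of the 4-byte displacement field.
            int32_t rel = (int32_t)(offsets[f.target_block] - (pos + 4));
            for (int k = 0; k < 4; ++k)
                out[pos + k] = (uint8_t)(rel >> (8 * k));
        }
    }
    return out;
}
```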

These Midjourney wallpapers are amazing xD would love to see the prompts in the description or something

yyny

I think some sort of micro/macro assembler approach would be ideal wrt. porting to multiple architectures: a micro assembler (per architecture) that deals with encoding instructions on that specific architecture (and can choose between encodings when they are semantically equivalent), and then a macro assembler per application/library (i.e. LibJS) per architecture with a common interface across architectures. The compilation of the various bytecodes would then use the macro assembler and never worry about specific machine instructions. The macro assembler could also deal with everything that emits multiple instructions, e.g. the function prologue/epilogue.

kastermester
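
A rough sketch of that layering, with invented names (this is not an existing Ladybird interface): the micro layer only knows how to encode instructions for one architecture, and the macro layer is the only thing the bytecode compiler talks to.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Per-architecture "micro" assembler: raw instruction encoding only.
struct MicroAssemblerX64 {
    std::vector<uint8_t> code;

    void mov_reg_imm32(uint8_t reg, uint32_t imm)
    {
        code.push_back(0xB8 + reg); // MOV r32, imm32 (reg encoded in opcode)
        for (int i = 0; i < 4; ++i)
            code.push_back((uint8_t)(imm >> (8 * i)));
    }

    void ret() { code.push_back(0xC3); }
};

// Architecture-neutral "macro" layer: operations the bytecode compiler
// needs, expressed without reference to specific machine instructions.
template<typename Micro>
struct MacroAssembler {
    Micro micro;

    void load_immediate(uint8_t reg, uint32_t value) { micro.mov_reg_imm32(reg, value); }
    void function_epilogue() { micro.ret(); }
};
```

Porting then means writing a new micro assembler plus a specialization of the macro layer, while the bytecode-to-machine-code logic stays shared.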

WHF! That background wallpaper is so on point! :^)

rajiv.kushwaha

Hiho Andreas,
For the number test in the fast path of lessthan, you can move the 2nd shift after the first int32 test. If the first test is false (i.e. not an int32), there is no need to shift the 2nd number, since you don't need to test whether it is an int32 then 🙂

nils-kopal
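
In C++ terms the reordering is just an early-out; a sketch with an invented tag value:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative tag, not LibJS's real constant.
constexpr uint64_t INT32_TAG = 0x7FFC;

bool both_int32(uint64_t lhs, uint64_t rhs)
{
    if ((lhs >> 48) != INT32_TAG)
        return false;                 // lhs failed: the rhs shift never runs
    return (rhs >> 48) == INT32_TAG;  // only shifted when it can still matter
}
```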

This might not apply to your project but if I was doing this I'd consider using LLVM and it's IR. That might make porting easier but you may be more interested in doing stuff by hand...

green.holden

arg6 is simply pushed onto the stack :) (followed by realigning the stack)

after the call returns, you can pop it and realign the stack with a single add to rsp.

GabrielSoldani
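
For reference, the sequence being described, sketched as System V x86-64 assembly (register choice and helper name are illustrative, and rsp is assumed to be 16-byte aligned beforehand):

```nasm
sub  rsp, 8             ; padding so rsp is 16-byte aligned at the call
push rax                ; rax holds the 7th integer argument (first stack arg)
call some_native_helper
add  rsp, 16            ; drop the argument and the padding in one instruction
```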

Would it be possible to make a call to a specific function when doing all the pushes and pops? That could potentially save a lot of repeated opcodes.

cocusar

I'm not sure if I got this right, but I'm curious: in some cases, aren't you just moving load from the resulting program to the JIT compiler? If the JIT process runs the code it generates in real time, checks for 0 (for example) done in C++ also generate opcodes that need to be run, so in the end, more instructions might be needed for the same operation (the check from C++ plus the actual instruction), which could also make the whole process slower. In the typical use case of a JIT compiler, the speed of the compiler itself counts as well as that of the generated program, I think. So you'd have to include the opcodes needed during the JIT compile process when counting the total bytes of the resulting program. Regarding speed, having more bytes in the final binary might be less important than having more to do in the JIT process. To me it seems to be one of the cases where you have to find a sweet spot to get max performance. I see the benefit of real shortcuts, though.
I don't know anything about JIT compilers and I'm happy to learn more. Thanks!

movAXh

Have you decided to come back to YouTube and make more hacking videos, or are you just temporarily back? We miss these vids.

relakin

2:00 how come this comparison wasn’t implemented as a JL or similar jump intended for signed comparisons? Not that I’ve done any x86 assembly but I’ve done a fair amount on other ISAs.

lawrencemanning