98% Cloud Cost Saved By Writing Our Own Database

Recorded live on Twitch, GET IN

### Article

### My Stream

### Best Way To Support Me
Become a backend engineer. It's my favorite site.

The best way to support me is to support yourself by becoming a better backend engineer.

MY MAIN YT CHANNEL: Has well-edited engineering videos

Discord

Hey, I am sponsored by Turso, an edge database. I think they are pretty neat. Give them a try for free, and if you want you can get a decent amount off (the free tier is the best, better than PlanetScale or any other).
### Comments

The best thing about saving 98% of your cloud costs is that developer hours are free and that this will be super easy to maintain when the original devs quit.

Fikn

I saved 99% of my cloud costs by connecting my frontend to an excel spreadsheet. Such a great idea!

TomNook.

TLDR: they wrote their own log file. No ACID = not a DB.

ivanjermakov

Chat is misunderstanding RTK. RTK is literally just correcting GPS data using a known point in real time. It is not better than GPS; it just enhances the way the measurements are interpreted.

michaelcohen

5:35 Prepping for scale can be worthwhile if they manage to get a contract with a very large company.

A company I work for recently contracted with a company that provides a similar service. The small-scale test with 2,000 GPS trackers was straining their infrastructure. The full rollout of 200,000 trackers broke their service for a week or two while they had to rush to scale up their service by about 20x.

jsax

You mentioned at the beginning of the video that making your own language makes sense if it's designed to actually solve a problem in a better way.

This is that. They did not attempt to write a general-purpose DB.

They wrote a really fast log file that is queryable in real time for their domain. This wins them points on cost (margins matter) but, more importantly, it gives them a marked advantage against competitors. Note that they're storing and querying way more efficiently. Quality of product is improving while the cost of competing is increasing. Seems like a no-brainer on the business side.

dv_xl

I signed in and made a YouTube account just now to say THANK YOU!

15:00 I DIDN'T THINK ABOUT VERSIONING MY DATA! Sometimes the things you don't know when self-taught are just jaw-dropping. This has been very humbling.

GrizikYugno-kuzs

I do GIS at work and have several handheld units on my desk right now that connect over Bluetooth to iOS and Android and can get sub-meter accuracy. I have even played with centimeter accuracy. I have Trimble and Juniper Geode units on hand. I built the mobile apps we use for marking assets in the field and syncing back to our servers, and I am currently working on an offline mode for that. So yeah, GPS has come a long way since you last looked. Internal hardware is like 10-20 meters on a phone, but dedicated hardware that can pass itself off as a mock location on Android or whatever can get much, much more accurate results.

PeterSteele

Isn't streaming data like this what Kafka was made for?

christ.

Great solution! If they want to optimize further they should use fixed point instead of floating point and do variable-length difference encoding. Most numbers would fit in 8 or 16 bits; with that, the memory requirement could easily be cut in half or better. The size of each entry should be stored in a uint16, or even a uint8. If sizes above 65535 are possible, use variable-length encoding for the size too.

The whole data stream should be stored as what it is: a stream. 30,000 34-byte entries a second is about 1 MB/s, which is a joke. Write all logs into a stream and, in parallel, collect them per data source in RAM until a disk block's worth of data has accumulated; only flush whole blocks to disk. This optimizes storage access and lets you reach the bandwidth limit of the hardware. In case of power failure, the logs have to be reprocessed the way a database reprocesses its transaction log. We once optimized such a logger to use raw access to a spinning HDD with no filesystem, and we could sustain very good write bandwidth on cheap hardware.

andrasschmidthu
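A minimal Go sketch of the fixed-point plus delta-varint encoding described above; the record layout and the 1e7 degree scale factor are assumptions for illustration, not the article's actual format:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// One GPS fix in fixed point: degrees * 1e7 gives roughly centimeter
// resolution and fits easily in an integer.
type Fix struct {
	TimestampMs int64
	LatE7       int64 // latitude  * 1e7
	LonE7       int64 // longitude * 1e7
}

// encodeDeltas stores each fix as the signed difference from the previous
// one, zigzag-varint encoded, so small movements cost only 1-2 bytes per field.
func encodeDeltas(fixes []Fix) []byte {
	var out []byte
	var prev Fix
	buf := make([]byte, binary.MaxVarintLen64)
	for _, f := range fixes {
		for _, d := range []int64{
			f.TimestampMs - prev.TimestampMs,
			f.LatE7 - prev.LatE7,
			f.LonE7 - prev.LonE7,
		} {
			n := binary.PutVarint(buf, d) // tiny deltas -> single bytes
			out = append(out, buf[:n]...)
		}
		prev = f
	}
	return out
}

func main() {
	fixes := []Fix{
		{TimestampMs: 1700000000000, LatE7: 474123456, LonE7: 190123456},
		{TimestampMs: 1700000001000, LatE7: 474123460, LonE7: 190123470}, // barely moved
	}
	enc := encodeDeltas(fixes)
	fmt.Printf("%d fixes -> %d bytes (vs %d raw)\n", len(fixes), len(enc), len(fixes)*24)
}
```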

8:58 KeyboardG: "high write and buffered is Kafka"

Yeah, I'm with this comment. I still don't understand why they couldn't use Kafka instead of some custom DB.

Michaeltje

The thing about the version in the header is spot on, but it's unlikely to help them here, since they want to directly access specific bytes for fast indexing, so those bytes can't ever change meaning. Assuming they haven't already used all of the available flags, the last flag value could be used to indicate another flag-prefixed data section.

mikeshardmind
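For illustration, a hypothetical Go sketch of that flag-extension trick: reserve the top bit of each flags byte to mean "another flags byte follows," so the fixed byte offsets of the original header never change:

```go
package main

import "fmt"

// flagExtended is the reserved "more flags follow" bit.
const flagExtended = 0x80

// readFlags consumes one or more flag bytes, concatenating the 7 useful
// bits of each, and reports how many header bytes were used.
func readFlags(header []byte) (flags uint64, n int) {
	for n < len(header) {
		b := header[n]
		flags |= uint64(b&0x7F) << (7 * uint(n))
		n++
		if b&flagExtended == 0 {
			break
		}
	}
	return flags, n
}

func main() {
	// First byte: bits 0 and 2 set, extension bit set -> a second flags byte follows.
	header := []byte{0x05 | flagExtended, 0x01, 0xAA /* rest of record */}
	flags, n := readFlags(header)
	fmt.Printf("flags=%#x, consumed %d header bytes\n", flags, n)
}
```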

In their case I would use Kafka to collect the data, and materialize to a database for queries.

michaellatta
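A rough Go sketch of that shape, assuming the segmentio/kafka-go client; the topic, table, and connection string are made up, and payload parsing is elided:

```go
package main

import (
	"context"
	"database/sql"
	"log"

	"github.com/segmentio/kafka-go"
	_ "github.com/lib/pq"
)

func main() {
	// Hypothetical Postgres sink for querying.
	db, err := sql.Open("postgres", "postgres://user:pass@localhost/gps?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	// Devices produce raw fixes to a topic; this consumer materializes them.
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"},
		Topic:   "gps-fixes",
		GroupID: "materializer",
	})
	defer r.Close()
	for {
		msg, err := r.ReadMessage(context.Background())
		if err != nil {
			log.Fatal(err)
		}
		// Key = device ID, Value = encoded fix.
		if _, err := db.Exec(
			`INSERT INTO fixes (device_id, payload) VALUES ($1, $2)`,
			string(msg.Key), msg.Value,
		); err != nil {
			log.Fatal(err)
		}
	}
}
```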

There's a workaround for the version-in-header thing: you can run V2 on a different port. But that's less safe than getting your minimum header right out of the gate. Error checking is also a good idea, so that if something gets munged up in transit (or you send the wrong version to the wrong destination), a simple math function will mostly guarantee the bits won't math and the message can be rejected. You can also then check which version the bits do math for and give a nice error message.

bkucenski
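A small Go sketch of that error-checking idea: frame each message with a CRC32 over the version byte and payload, so corrupted or mismatched messages fail a cheap math check instead of being misparsed (the layout here is illustrative):

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// frame prepends a 4-byte CRC32 of version+payload.
func frame(version byte, payload []byte) []byte {
	body := append([]byte{version}, payload...)
	out := make([]byte, 4, 4+len(body))
	binary.BigEndian.PutUint32(out, crc32.ChecksumIEEE(body))
	return append(out, body...)
}

// parse rejects any message whose bits don't math.
func parse(msg []byte) (version byte, payload []byte, err error) {
	if len(msg) < 5 {
		return 0, nil, fmt.Errorf("message too short")
	}
	want := binary.BigEndian.Uint32(msg[:4])
	if crc32.ChecksumIEEE(msg[4:]) != want {
		return 0, nil, fmt.Errorf("checksum mismatch: corrupt or wrong version")
	}
	return msg[4], msg[5:], nil
}

func main() {
	msg := frame(2, []byte("position record"))
	msg[7] ^= 0xFF // simulate corruption in transit
	if _, _, err := parse(msg); err != nil {
		fmt.Println("rejected:", err)
	}
}
```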

I had a hand in a similar database that's a bit more complex than this one; it has lived with small changes for over 25 years and runs in thousands of instances. This was one of the few features that allowed my former employer to outcompete several competitors who based their solutions on general databases. The difference in performance, scalability, and hardware requirements is astronomical. The investment in R&D has paid off many, many times.
If this company expects to grow substantially, then this DB can give them an edge against their competitors in pricing and flexibility, assuming they are able to incorporate their future needs into its design.

atlasz

They saved cloud costs; now they have maintenance costs and a self-induced vendor lock-in. Well done.
Tens of thousands of vehicles and people isn't special and isn't big at all.

Somehow everybody thinks their problem is a unique one. But it isn't. Looking at their proposition, it looks like something FlightTracker has been doing for years.

Writing a blog post about something you _just_ built is always easy, because everything appears to work like it should. Now fast-forward five years and see how you handled all the incoming business-requirement changes in your bespoke binary format.

avwie

The title is a play on [not knowing] the difference between a "database" and a "database engine."
Databases are just files that store content.
A database engine is a CPU process that manages connections (and users) that read/write specific blocks of a data file.
It was still an interesting article.
Good video.

complexity

A multi-tenant PostGIS will do the trick: it can be horizontally scaled, and you can have a tenant per customer if you want. I still don't get reinventing the wheel.
Now they'll have the extra headache of maintaining their new DB and needing new developers to learn it :"

mohamednaser

15:59 They could just make one of the available flags mean "has extended flags" or "has version".

zxuiji

This could be a rare case where having no version field in the header, or even no header at all, makes sense. They've got the database itself: if they have to change anything, stick the new format in a new database. One extra byte per entry, with data coming in as fast as it sounds like it is, might be too expensive for something that effectively never changes.

x