2014 SouthEast LinuxFest - Richard Hipp - SQLite as an Application File Format

Comments
Author

tl;dw: SQLite is a semantic filesystem for small "files" (table cells)
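To make the "small files in table cells" idea concrete, here is a minimal sketch using Python's standard-library `sqlite3` module. The table name `files`, its two-column schema, and the helper names (`open_archive`, `write_file`, `read_file`, `list_files`) are hypothetical choices for illustration, not anything taken from the talk.

```python
import sqlite3
from typing import Optional


def open_archive(path: str) -> sqlite3.Connection:
    """Open (or create) a single-file archive with one row per stored 'file'."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " name TEXT PRIMARY KEY,"  # logical path inside the archive
        " content BLOB NOT NULL)"  # raw bytes of the stored file
    )
    return conn


def write_file(conn: sqlite3.Connection, name: str, data: bytes) -> None:
    """Insert or overwrite one file inside a transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT OR REPLACE INTO files (name, content) VALUES (?, ?)",
            (name, data),
        )


def read_file(conn: sqlite3.Connection, name: str) -> Optional[bytes]:
    """Return the stored bytes, or None if no such file exists."""
    row = conn.execute(
        "SELECT content FROM files WHERE name = ?", (name,)
    ).fetchone()
    return None if row is None else row[0]


def list_files(conn: sqlite3.Connection, pattern: str = "*") -> list[str]:
    """List stored names matching an SQLite GLOB pattern, in sorted order."""
    rows = conn.execute(
        "SELECT name FROM files WHERE name GLOB ? ORDER BY name", (pattern,)
    )
    return [name for (name,) in rows]
```

Because each write runs inside a transaction, an error partway through a save leaves the previously committed contents intact, which is one of the commonly cited arguments for SQLite as an application file format.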

milahu
Author

Some Git FUD starts around 38:00, about how pack files are in 'some binary format that is only known to Git'. It's true that Git has its own delta format, used when it chooses to delta-compress objects that are similar to each other. So the questions are: is this algorithm standardized and used in other tools as well? Is there documentation describing the format?

Well, first of all, Git was built to be fast. I will admit that the Git source code is heavily optimized, and one might even complain that it is hard to read or poorly commented. But there is a point to this approach. If you didn’t have version control software but wanted to be able to reference previous versions of your work, you might simply create a ZIP archive of your directory every day as makeshift “version control.” Although Git uses a different compression algorithm, this is almost exactly what it does! It stores a complete snapshot of each version and compresses it with zlib. Zlib compression is quite powerful, and disk space is cheap enough that Git is happy to rely on nothing more than these two techniques (full snapshots plus compression) for quite a while.

But there’s one thing that isn’t quite so cheap: bandwidth. Git is meant to be fast, and compressing lots of data that is similar but not quite identical, then transmitting all of it over the wire, is slow. Packfiles were really created to reduce network usage (and so improve network performance); their design was driven by the goal of making network transfer efficient.
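The "snapshot plus zlib" storage described above fits in a few lines of Python. This is a minimal sketch, assuming the default SHA-1 object format: Git hashes a header (`blob <size>` followed by a NUL byte) plus the file contents, names the object by that hash, and stores the zlib-compressed result as a loose object under `.git/objects/`. The function names are hypothetical. The object ID should match `git hash-object`, but the compressed bytes may differ from what Git writes, because Git's zlib compression level is configurable.

```python
import hashlib
import zlib


def hash_object(content: bytes, obj_type: str = "blob") -> tuple[str, bytes]:
    """Return (object ID, zlib-compressed loose-object bytes) for one object.

    Assumes the default SHA-1 object format (not SHA-256 repositories).
    """
    header = f"{obj_type} {len(content)}\0".encode("ascii")
    store = header + content               # header followed by the raw contents
    oid = hashlib.sha1(store).hexdigest()  # the object's name is its hash
    return oid, zlib.compress(store)       # on disk, header and contents are deflated together


def loose_object_path(oid: str) -> str:
    """Loose objects live in .git/objects/<first 2 hex chars>/<remaining 38>."""
    return f".git/objects/{oid[:2]}/{oid[2:]}"
```

Every version of a file becomes a separate, independently compressed object this way, which is exactly why near-identical versions waste space and bandwidth until they are packed.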

Secondly, I left hanging the question of whether there is documentation describing the format. If that is the question you are asking, the basic algorithm is not hard to guess. After all, libxdiff used Eugene W. Myers's algorithm, and Git delta encoding is copy/insert based. There is the paper 'File System Support for Delta Compression'. There is even a Node.js library that implements both diff and patch functions using delta encoding, specifically Git's delta encoding. The transfer protocols used for push and fetch operations are implemented in at least two other major Git implementations besides the reference one: JGit and libgit2. So I feel misled if I am supposed to have gathered that there is no simple object-oriented API for working with Git data at a lower level. I also recall that there is an implementation of Git in pure Go.
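To show how small the copy/insert delta format is, here is a minimal Python sketch of a delta decoder, assuming the instruction layout described in Git's own pack-format documentation (gitformat-pack): two little-endian base-128 size headers (base size, then result size), followed by opcodes that either copy a byte range out of the base object or insert literal bytes. The function names are hypothetical, and this only applies a delta; it does not reproduce how Git chooses and generates deltas.

```python
def read_size(delta: bytes, pos: int) -> tuple[int, int]:
    """Read one base-128 size header (7 bits per byte, high bit = more bytes)."""
    value = 0
    shift = 0
    while True:
        byte = delta[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def apply_delta(base: bytes, delta: bytes) -> bytes:
    """Rebuild a target object from its base object and a copy/insert delta."""
    src_size, pos = read_size(delta, 0)
    if src_size != len(base):
        raise ValueError("base size does not match delta header")
    tgt_size, pos = read_size(delta, pos)

    out = bytearray()
    while pos < len(delta):
        cmd = delta[pos]
        pos += 1
        if cmd & 0x80:
            # Copy: bits 0-3 say which offset bytes follow, bits 4-6 which size bytes.
            offset = 0
            for i in range(4):
                if cmd & (1 << i):
                    offset |= delta[pos] << (8 * i)
                    pos += 1
            size = 0
            for i in range(3):
                if cmd & (1 << (4 + i)):
                    size |= delta[pos] << (8 * i)
                    pos += 1
            if size == 0:
                size = 0x10000  # a copy size of zero stands for 64 KiB
            out += base[offset:offset + size]
        elif cmd:
            # Insert: the low 7 bits give the number of literal bytes that follow.
            out += delta[pos:pos + cmd]
            pos += cmd
        else:
            raise ValueError("opcode 0 is reserved")

    if len(out) != tgt_size:
        raise ValueError("result size does not match delta header")
    return bytes(out)
```

For example, turning `b"hello world"` into `b"hello brave new world"` takes a copy of the first six bytes, an insert of ten literal bytes, and a copy of the last five, so a long, mostly unchanged file shrinks to a handful of instructions. The final length check is a cheap guard against applying a delta to the wrong base.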

daniellanglois