Binary Object Serialization with Data Structure Traversal & Reconstruction in C++ - Chris Ryan

---

Binary Object Serialization with Data Structure Traversal & Reconstruction in C++ - Chris Ryan - CppCon 2022

This talk describes a minimally intrusive technique for adding serialization to a set of classes: traversing hierarchical and non-hierarchical data, persisting it in a binary format, and dynamically reconstructing it.

When storing, it deduces the data types using Template Argument Deduction (TAD) and safely protects against recursive reentrancy.
When loading, it uses a reflection-like technique for dynamic object creation, based on an automatic registration mechanism for serializable types.
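The registration idea can be sketched roughly as follows. This is not the code from the talk: `Object`, `Registry`, the `name()` convention, and the members of `Serializable` are hypothetical names chosen for illustration, and keying the factory map by an explicitly supplied string is an assumption of this sketch.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical root type for everything that can be recreated on load.
struct Object {
    virtual ~Object() = default;
    virtual std::string type_name() const = 0;
};

// Maps a type name to a function that creates a default instance.
class Registry {
public:
    using Factory = std::function<std::unique_ptr<Object>()>;

    static Registry& instance() {
        static Registry r;  // function-local static: safe to use during static init
        return r;
    }
    bool add(const std::string& name, Factory f) {
        return factories_.emplace(name, std::move(f)).second;
    }
    std::unique_ptr<Object> create(const std::string& name) const {
        auto it = factories_.find(name);
        if (it == factories_.end()) return nullptr;
        return it->second();
    }

private:
    std::map<std::string, Factory> factories_;
};

// CRTP helper: deriving from Serializable<D, B> registers a factory for D.
template <typename D, typename B = Object>
struct Serializable : B {
    // Touching registered_ here forces its initializer to be instantiated.
    Serializable() { (void)registered_; }
    std::string type_name() const override { return D::name(); }

private:
    static const bool registered_;
};

template <typename D, typename B>
const bool Serializable<D, B>::registered_ = Registry::instance().add(
    D::name(),
    []() -> std::unique_ptr<Object> { return std::unique_ptr<Object>(new D); });

// Example types. The constructors are user-provided on purpose: in practice a
// defaulted constructor may never be instantiated unless the type is
// constructed somewhere, in which case no registration would happen.
struct Circle : Serializable<Circle> {
    Circle() {}
    static const char* name() { return "Circle"; }
};
struct Square : Serializable<Square> {
    Square() {}
    static const char* name() { return "Square"; }
};
```

On load, a reader would pull the stored type name from the stream and call `Registry::instance().create(name)` to obtain an empty object to fill in.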

Reflection as a language feature will be unavailable until at least C++26. Until then, this serialization technique can dynamically recreate a persisted complex data structure or network of structures, and it is platform-agnostic.

Not everybody can migrate to C++20 yet, so the technique uses a C++14-compliant SFINAE/std::enable_if<> mechanism. We will also explore optimizations and what it takes to convert the technique to C++20 concepts.
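As a rough illustration of that conversion, here is the same constrained overload written both ways. The names `Buffer`, `append_raw`, `store`, and `Arithmetic` are hypothetical and not taken from the talk; the preprocessor switch is only there so the sketch compiles in either language mode.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

using Buffer = std::vector<std::uint8_t>;

// Shared helper: append the raw bytes of a trivially copyable value.
// This copies the host representation; a cross-platform format would
// also have to fix the byte order.
template <typename T>
void append_raw(Buffer& out, const T& value) {
    std::uint8_t bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof(T));
    out.insert(out.end(), bytes, bytes + sizeof(T));
}

#if defined(__cpp_concepts) && __cpp_concepts >= 201907L
// C++20: the constraint is a named concept on the template parameter.
template <typename T>
concept Arithmetic = std::is_arithmetic<T>::value;

template <Arithmetic T>
void store(Buffer& out, T value) { append_raw(out, value); }
#else
// C++14: SFINAE removes this overload for non-arithmetic types.
template <typename T,
          typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0>
void store(Buffer& out, T value) { append_raw(out, value); }
#endif

// Non-template overload for strings: length prefix followed by the characters.
inline void store(Buffer& out, const std::string& s) {
    store(out, static_cast<std::uint32_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
}
```

In both variants a call such as `store(buf, 42)` selects the arithmetic overload, while `store(buf, std::string("hi"))` falls through to the string overload because the constrained template is not viable for it.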

The serialized data can be persisted to a disk file, placed in shared memory for IPC, or sent over network streams for live remote data sharing, as in HPC or gaming.

This talk is not trying to sell you on a library; it shares metaprogramming techniques you can add to your toolbox.
---

Chris Ryan

Chris Ryan was classically trained in both software and hardware engineering. He is well experienced in Modern C++ on extremely large/complex problem spaces and Classic ‘C’ on Embedded/Firmware devices (large & small). Chris has no interest in C#/.NET, Java, JS or any web-ish tech.
---

#cppcon #programming #cpp
Comments

Interesting run-through of the major parts and tricks for a basic serialization framework. I especially liked the trick of using Serializable<D, B> to construct the class hierarchy and the necessary factory functions for the reconstruction (nicer than macros). However, there are some important caveats and limitations to point out. First, the type info mechanism presented cannot be used as is: (1) typeid(T).name() is not portable or stable; there is no guarantee that it will be the same across compilers, platforms, or even repeated invocations of the same program, so it cannot be used (there are other ways to construct a stable type name string), and (2) the factory function map does not handle hash collisions and, in fact, does not retain the type name, so it can't distinguish at all between two types whose names hash to the same value (obviously, there are ways to handle that too). This just means that type info and factory function mapping have to be implemented in a stable and portable way, which is a little tricky but not so hard. The other important limitation is the loss of the data member names. In many practical serialization schemes, it is very useful to be able to associate the serialized elements with a name (with the side benefit of being able to test that save and load have matching ordering). And finally, having different stream types (files, network, etc.) is nice, but it is even nicer to also support different serialization formats (binary, XML, JSON, protobuf, schemas, etc.), which I would highly recommend (retaining data member names becomes very useful here too, since human-readable formats can have named fields). In short, the improvements could be:
1) Create a stable, portable and non-hash-colliding type-info mechanism.
2) Serialize elements as named fields (e.g., pair<string_view, T>).
3) Have polymorphic "archive" types that can map fields to their byte-stream representation in different ways (binary, JSON, YAML, XML, protobuf, etc.).
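Points 2 and 3 might look something like the sketch below. Everything here is hypothetical (`Archive`, `BinaryArchive`, `JsonArchive`, `Player`, and the `field` method are illustrative names, not part of the talk or of any library), and the JSON writer does no string escaping.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical archive interface: each format decides how a named field is encoded.
struct Archive {
    virtual ~Archive() = default;
    virtual void field(const std::string& name, std::int32_t value) = 0;
    virtual void field(const std::string& name, const std::string& value) = 0;
};

// Binary format: names are dropped, ints are written as 4 little-endian bytes,
// strings as a 4-byte length followed by the characters.
struct BinaryArchive : Archive {
    std::vector<std::uint8_t> bytes;

    void field(const std::string&, std::int32_t value) override {
        const auto u = static_cast<std::uint32_t>(value);
        for (int shift = 0; shift < 32; shift += 8)
            bytes.push_back(static_cast<std::uint8_t>(u >> shift));
    }
    void field(const std::string& name, const std::string& value) override {
        field(name, static_cast<std::int32_t>(value.size()));
        bytes.insert(bytes.end(), value.begin(), value.end());
    }
};

// Text format: names are kept, so the output is human-readable.
struct JsonArchive : Archive {
    std::string text() const { return "{" + body_ + "}"; }

    void field(const std::string& name, std::int32_t value) override {
        add(name, std::to_string(value));
    }
    void field(const std::string& name, const std::string& value) override {
        add(name, "\"" + value + "\"");  // assumption: no characters need escaping
    }

private:
    void add(const std::string& name, const std::string& encoded) {
        if (!body_.empty()) body_ += ",";
        body_ += "\"" + name + "\":" + encoded;
    }
    std::string body_;
};

// The class describes its fields once; the archive picks the representation.
struct Player {
    std::string name;
    std::int32_t score;

    void serialize(Archive& ar) const {
        ar.field("name", name);
        ar.field("score", score);
    }
};
```

The same `serialize` call then produces either 11 bytes of binary data or the text `{"name":"ann","score":7}` for a player named "ann" with a score of 7, depending on which archive is passed in.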

mike

37:42 That dynamic uint compression also works for uint64_t. We would likely want to use it for ints as well, since most of the time ints are small positive numbers. The problem is negative numbers: an int64 holding -1 might take as many as 10 bytes to store (instead of just 8). We could offset the number by a constant (e.g., +1) so that -1 is stored in 1 byte, since -1 is such a common int flag, but that still doesn't help other negative numbers like -10. Perhaps the solution is not to use the operator<< overload for streaming, but a dedicated method instead, so we can specify whether or not we want the uint compression (based on the purpose of the integer). 👍
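One common way around the negative-number problem is ZigZag mapping, which Protocol Buffers uses for its sint32/sint64 types: signed values are interleaved onto unsigned ones (0, -1, 1, -2, ... become 0, 1, 2, 3, ...) before the variable-length encoding is applied. The sketch below is not the encoding from the talk; it assumes a base-128 varint with 7 payload bits per byte, and the function names are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Base-128 varint: 7 payload bits per byte, high bit set on all but the last byte.
inline void put_varint(std::vector<std::uint8_t>& out, std::uint64_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<std::uint8_t>(v | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(v));
}

// Reads one varint starting at pos and advances pos past it.
inline std::uint64_t get_varint(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    std::uint64_t v = 0;
    for (int shift = 0; shift < 64 && pos < in.size(); shift += 7) {
        const std::uint8_t b = in[pos++];
        v |= static_cast<std::uint64_t>(b & 0x7F) << shift;
        if ((b & 0x80) == 0) break;
    }
    return v;
}

// ZigZag: small magnitudes of either sign map to small unsigned values.
inline std::uint64_t zigzag(std::int64_t n) {
    const std::uint64_t sign_mask = n < 0 ? ~std::uint64_t{0} : std::uint64_t{0};
    return (static_cast<std::uint64_t>(n) << 1) ^ sign_mask;
}

// Inverse mapping; the final cast assumes two's complement (guaranteed from C++20).
inline std::int64_t unzigzag(std::uint64_t z) {
    const std::uint64_t sign_mask = std::uint64_t{0} - (z & 1);
    return static_cast<std::int64_t>((z >> 1) ^ sign_mask);
}
```

With this mapping both -1 and -10 fit in a single byte, while a plain varint of -1 reinterpreted as uint64_t needs 10 bytes. Whether to apply it could still be decided per field, as the comment suggests, by calling a dedicated method rather than a generic operator<<.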

RA-NAF