C++Now 2018: Z. Laine “Boost.Text: Fixing std::string, and Adding Unicode to Standard C++ (part 1)”



std::string has serious design flaws: its fat interface; its uselessness for editing of very long sequences of characters; and its complete lack of support for text encoding. This talk presents a proposed library, Boost.Text, a library of interoperating types and algorithms.

Boost.Text tries to do two things. First, it seeks to address the deficiencies of std::string. Second, it adds an additional layer of full Unicode support for those users that need it (without encumbering users of strings that do not). Both of these are done in a consistent and modern way. This library is intended for eventual standardization.

Zach Laine has been using C++ in industry for 15 years, focusing on data visualization, numeric computing, games, generic programming, and good library design. He finds the process of writing bio blurbs to be a little uncomfortable.

Comments

His argument for why the operation is constant time is kind of bunk. It might be technically correct, but by that reasoning you can prove that any operation completable on a specific computer is in AC0. With arguments like that, every algorithm in the STL is constant time on most containers, because the number of elements is bounded from above by the largest value of size_t. It's true, but it is a practically useless thing to say.

timseguine

The claim around 1:28:30 that “the longest sequence of things that is considered one grapheme is 18” is wrong. That may be the longest sequence of things that can be composed into one predefined code point (I haven't checked), but U+0041 followed by 100 copies of U+0308 is still one grapheme, by TR 29. (Still, the 32 code point limit is entirely reasonable in practice.)

ccreutzig
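
The point about U+0041 followed by 100 copies of U+0308 can be illustrated with a small sketch. It is not the Boost.Text algorithm and not a full UAX #29 implementation: `is_extend_simplified`, `count_clusters_simplified`, and `a_with_diaereses` are hypothetical names, and the only rule modelled is GB9 ("do not break before Extend"), with the Extend property approximated by the Combining Diacritical Marks block.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified stand-in for the Unicode Extend property: only the
// Combining Diacritical Marks block (U+0300..U+036F) is recognised.
// The real property covers many more code points.
bool is_extend_simplified(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Counts clusters using only rule GB9 ("do not break before Extend")
// on top of the default "break between any two code points".
std::size_t count_clusters_simplified(const std::u32string& text) {
    std::size_t clusters = 0;
    for (char32_t cp : text) {
        // A code point starts a new cluster unless it is an Extend
        // character that attaches to the cluster before it.
        if (clusters == 0 || !is_extend_simplified(cp)) ++clusters;
    }
    return clusters;
}

// U+0041 (LATIN CAPITAL LETTER A) followed by `marks` copies of
// U+0308 (COMBINING DIAERESIS).
std::u32string a_with_diaereses(std::size_t marks) {
    return std::u32string(1, U'A') + std::u32string(marks, char32_t{0x0308});
}
```

With `marks` set to 100 the string holds 101 code points, yet the simplified rule still groups them into a single cluster, which is why a fixed code point limit per grapheme is a practical choice rather than something the rules guarantee.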

I really, really don't like the slicing operation hidden behind operator(). A string is not a callable object. Put a name on what you do: "slice" or "substr" but certainly not "( )".

sephirostoy

Is there any interoperability with std::string_view?
std::string_view is supposed to be the "glue" between string types, allowing for use without making every consumer boost::text aware.

farway-
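
The "glue" role described in this question can be sketched as follows. `my_text` is a hypothetical type standing in for any library string; whether Boost.Text offers such a conversion is not stated here, so the conversion operator is an assumption made purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical text type; it stands in for any library string.
class my_text {
public:
    explicit my_text(std::string s) : storage_(std::move(s)) {}

    // The single piece of glue: an implicit, non-owning view of the bytes.
    operator std::string_view() const noexcept { return storage_; }

private:
    std::string storage_;
};

// A consumer that knows nothing about my_text.
std::size_t count_spaces(std::string_view sv) {
    std::size_t n = 0;
    for (char c : sv) {
        if (c == ' ') ++n;
    }
    return n;
}
```

`count_spaces` accepts a `my_text`, a `std::string`, or a string literal without any overloads, because each of them converts to `std::string_view`.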

Writing portable C++ code with non-ASCII characters in string literals was always forbidden, mystic black magic :D

christo

Need this. Text processing in C++ is pretty much impossible outside of a single legacy code page.

erikitter

There is a difference between a function that accepts a string and a function that works on a string. Member functions are a clear indication of the latter.

voltairespuppet

If something like this doesn't get included in the standard by ~C++23, I will be very sad.

Sopel

Is there a reason why this library is not yet in the Boost review queue?

mjKlaim

IMHO if someone wants COW then there should be text and cow_text (don't pay for what you don't want). Also, defining a type by negating a property of another type is a code smell: (unencoded) rope and encoded_rope is better than (encoded) rope and unencoded_rope.

LordNezghul

24:04 Why not make only `template<int N> string(const char (&)[N])` non-explicit (instead of `const char*`)?

cHrtzbrg
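
What this question proposes can be sketched with a hypothetical class (`lit_string` and `length_of` are made-up names, not part of Boost.Text, and `std::size_t` is used for `N` instead of `int`). The array constructor is implicit and the pointer constructor is explicit, so only arrays of `const char`, such as string literals, convert silently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical string class, used only to illustrate the question.
class lit_string {
public:
    // Implicit: only arrays of const char (such as string literals) bind here.
    // N includes the terminating NUL, hence N - 1 characters are copied.
    template <std::size_t N>
    lit_string(const char (&literal)[N]) : data_(literal, N - 1) {}

    // Explicit: a bare pointer carries no length and has to be scanned.
    explicit lit_string(const char* c_str) : data_(c_str) {}

    const std::string& str() const { return data_; }

private:
    std::string data_;
};

// Takes its argument by value, so callers rely on implicit conversion.
std::size_t length_of(lit_string s) { return s.str().size(); }

static_assert(std::is_convertible_v<const char (&)[4], lit_string>,
              "a literal-sized array converts implicitly");
static_assert(!std::is_convertible_v<const char*, lit_string>,
              "a bare pointer does not convert implicitly");
static_assert(std::is_constructible_v<lit_string, const char*>,
              "a bare pointer can still be used explicitly");
```

One known cost of this design is that any `char` array binds to the template, not just literals, so a partially filled buffer such as `char buf[100]` would be treated as 99 characters long.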

Great-looking string replacements, but I'm not sure I agree on returning by value from operator[] const. It breaks code that does &str[0] to get a const char* when str is const, which isn't obvious. &str[0] works on just about everything, but begin() works on almost nothing depending on your compiler (I'm looking at you, MSVC) unless you do the unintuitive &*str.begin().

I also agree about not using operator() for substr because a string is not a function-like object. Just 2 minor points.

ErroneousI
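
The breakage described in this comment can be shown at compile time. `value_index_string` is a hypothetical class that mimics a by-value `operator[] const`; it is not the Boost.Text type, and `can_take_address_of_element` is a made-up detection trait.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical string whose const operator[] returns by value.
class value_index_string {
public:
    explicit value_index_string(std::string s) : data_(std::move(s)) {}

    char operator[](std::size_t i) const { return data_[i]; }   // yields a prvalue
    const char* data() const noexcept { return data_.data(); }  // named alternative

private:
    std::string data_;
};

// Detects whether the expression `&s[0]` compiles for a const S.
template <class S, class = void>
struct can_take_address_of_element : std::false_type {};

template <class S>
struct can_take_address_of_element<
    S, std::void_t<decltype(&std::declval<const S&>()[0])>> : std::true_type {};

static_assert(can_take_address_of_element<std::string>::value,
              "std::string returns a reference, so &s[0] is valid");
static_assert(!can_take_address_of_element<value_index_string>::value,
              "a by-value operator[] yields a prvalue, so &s[0] is ill-formed");
```

Code that needs a pointer to the characters would have to call a named accessor such as `data()` instead of `&str[0]`.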

Thread-safe programming is tricky. I don't think people are understanding why the rope is safe if you always copy it. I built something like this (it didn't use a b-tree, but was in other ways very similar) for StreamModule, and I thought it through carefully and came to the exact same conclusion. If you always make a copy when passing your data to another thread, that change in reference count will be seen in your thread, and since the other thread couldn't have had a reference before you gave it your reference, it will see that count too. Incrementing and decrementing the reference count do need to be handled atomically. But merely looking at it does not, if you always copy when handing something to another thread.

erichopper
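
The copy-at-hand-off discipline described in this comment can be sketched with `std::shared_ptr`, which is swapped in here as a stand-in for the rope's internal reference counting (the rope's actual implementation is not shown in this thread). `segment` and `count_seen_by_receiver` are hypothetical names, and no real thread is started: the by-value parameter plays the role of the copy owned by the receiving thread.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical immutable segment shared between copies of a rope.
using segment = std::shared_ptr<const std::string>;

// Stand-in for the receiving thread: it owns its own copy of the handle
// (taken by value) and reports the reference count it observes.
long count_seen_by_receiver(segment own_copy) {
    return own_copy.use_count();
}
```

A caller that holds the only handle sees a count of 1, the receiver sees 2 for as long as its copy is alive, and the caller sees 1 again once that copy is destroyed. The increment and decrement that produce those values are the operations the comment says must be atomic.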

There needs to be a sentinel value to represent leaving a particular part of your range blank, à la Python. And it should be trivial to spell. I suppose doing s(2, s.size()) is an option, but I will say that a lot of the pleasantness of using Python-style slicing is being able to leave parts of the slice specification blank and have it mean 'the right thing'. s[2:] is very nice to write, and it was obvious to me what was going on.

I agree that having it be operator () is kind of evil and wrong for a wide variety of reasons. Operator overloading is a tool to be used very carefully, and there should be an obvious semantic mapping between your overloaded operator and what the regular operator does. This violates that. Additionally, it allows a string to be used in a callable context that would expect something function-like (like ::std::function) with surprising and bizarre results, which is really just a special case of the principle being violated here.

I obviously like Python, but stretching too far to reach that aesthetic isn't wise. C++ is C++, not Python.

erichopper
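
A named slice with a "left blank" sentinel, as asked for in this comment and in the earlier ones about operator(), might look like the sketch below. `slice` and `open_end` are hypothetical names, not Boost.Text API; `std::optional` is one possible way to spell the sentinel, and clamping out-of-range bounds is an assumption borrowed from Python's behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical spelling of "left blank", playing the role of the empty
// side in Python's s[2:] or s[:5].
inline constexpr std::nullopt_t open_end = std::nullopt;

// Hypothetical named slice over the half-open range [first, last).
// An omitted bound defaults to the start or the end of the view;
// out-of-range bounds are clamped rather than reported as errors.
std::string_view slice(std::string_view s,
                       std::optional<std::size_t> first,
                       std::optional<std::size_t> last) {
    std::size_t lo = first.value_or(0);
    std::size_t hi = last.value_or(s.size());
    if (hi > s.size()) hi = s.size();
    if (lo > hi) lo = hi;
    return s.substr(lo, hi - lo);
}
```

`slice(s, 2, open_end)` then reads much like `s[2:]`, and because `slice` is an ordinary function, the string type itself never becomes callable.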

I know I'm in a small minority of people who care, but I can see your standard proposal tanking for not supporting allocators ("I hate them" is a valid personal reason though lol).

I reduced the CPU cost of using an important data structure in my current project by 40% just by substituting a custom (and not very fancy) allocator for the usual new/delete. The Lakos talk is interesting in that respect, too:


Allocators are not only about performance though, there are many things you can do with them, e.g. improving memory locality, putting objects in shared memory, for debug logging, for detecting buffer overruns (yes, that's right), and you can even use them in the creation and update of database files by allocating from mmap()ed files.

While the data structure from the above-mentioned project does not (yet?) contain strings, I'd hesitate to commit to a library that doesn't allow me to change allocators _if I ever need to_. As for the particular _allocator model_ in the standard, it does in fact have many flaws, some of which have been fixed and others not.

Fixed ones:

- Standard containers were not obliged to support "stateful" (non-equal) allocators.
- Tons of boilerplate code, fixed by mandating use of std::allocator_traits in container implementations.
- Some minor issues.

Not fixed ones:

- Possibly the worst syntax ever mandated by the standard: typename Alloc::template rebind<U>::other – say again? [Since "normal" allocator templates no longer need a nested "rebind" class template (which is indeed a good thing), container implementers now actually have to write rebind_alloc<U>.]
- propagate_on_container…, select_on_container… – a nice idea to simplify using copy/move/swap operations and non-equal allocators, but in the end half-baked and inconsistent.
- certainly others…

TruthNerds
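
Two of the points in this comment, stateful allocators and `rebind_alloc`, can be sketched with a minimal allocator plugged into `std::basic_string`. `counting_allocator` and `counted_string` are hypothetical names, the allocator only counts bytes (it is not the one from the commenter's project), and Boost.Text is not involved.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>
#include <type_traits>

// Hypothetical minimal stateful allocator that counts requested bytes.
// Note that it has no nested rebind template.
template <class T>
struct counting_allocator {
    using value_type = T;

    std::size_t* bytes;  // shared counter, owned by the caller

    explicit counting_allocator(std::size_t* counter) noexcept : bytes(counter) {}

    // Converting constructor used when a container rebinds the allocator.
    template <class U>
    counting_allocator(const counting_allocator<U>& other) noexcept
        : bytes(other.bytes) {}

    T* allocate(std::size_t n) {
        *bytes += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

// Two allocators are interchangeable only if they share a counter.
template <class T, class U>
bool operator==(const counting_allocator<T>& a, const counting_allocator<U>& b) noexcept {
    return a.bytes == b.bytes;
}
template <class T, class U>
bool operator!=(const counting_allocator<T>& a, const counting_allocator<U>& b) noexcept {
    return !(a == b);
}

using counted_string =
    std::basic_string<char, std::char_traits<char>, counting_allocator<char>>;

// What container code writes today in place of Alloc::template rebind<U>::other.
using rebound = std::allocator_traits<counting_allocator<char>>::rebind_alloc<int>;
static_assert(std::is_same_v<rebound, counting_allocator<int>>,
              "allocator_traits derives the rebound type without a nested rebind");
```

Constructing a `counted_string` that is longer than the small-string buffer, for example `counted_string s(64, 'x', counting_allocator<char>(&used))`, routes the heap request through `allocate`, so `used` ends up at 64 or more.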