CppCon 2014: James McNellis 'Unicode in C++'

In some programming languages, text processing is easy. Unfortunately, C++ is not one of those languages. C++ lacks good, built-in support for Unicode, though the situation is starting to improve.

This session will begin with a brief overview of text encodings, and an introduction to Unicode and the various Unicode encodings. We'll look at the woeful state of Unicode support in C++98 (or, really, lack thereof), then take a look at the improvements that were made in C++11 and other improvements that have recently been proposed for standardization. We'll finish up with a discussion of several libraries designed to make it easier to work with Unicode in C++, including the widely-used, open-source International Components for Unicode (ICU) library.

Comments

Since 2017, German orthography has officially had an uppercase ß: ẞ (its own letter).
Before that, there was no official uppercase form of ß.
SS seems like a workaround to make a program produce a desirable result in most cases.

YG

Writing portable C++ code with non-ASCII characters in string literals was always forbidden, mystic black magic :D

christo

"Any time there are numbers on the slide they're hexadecimal numbers"

next slide: "32 control characters, 95 printable characters"

I literally laughed out loud! No harm done but you have to admit it's funny.

JimBalter

Nice "Easter egg" at 0:11, I see what you did there :D

oder

If you already know what Unicode is and are only interested in how all of this is implemented in C++, skip to 30:07.

budokan

There is an error at 17:25. U+0065 is e. Lovely talk regardless!

tymscar

I'd be surprised if there were anyone that didn't notice it...

xarcaz

Actually, what was explained here is not quite accurate. The C++ standard is messier than that.
The video says that char16_t and char32_t are UTF-16 and UTF-32. But that is not what the standard says. That is only true if the __STDC_UTF_16__ (respectively __STDC_UTF_32__) macros are defined.

mihainita

Gosh... everything in computer science is getting so complicated. Even storing a piece of text is already quite involved. On the other hand, C++ seems to abstract away some of the burden quite nicely.

jonskunator

Visual C++ fucks up existing code with every new release. GCC doesn't have this sort of problem at all.

nwodon

And Russian "Hello" -> "Привет" ;)

Dereline

In the year 2020 there shouldn't be a programming language without full Unicode support. At least UTF-8. I think Unicode is stupid anyway. Unicode should have been a specialist format. They should have had 2-byte/4-byte code points for all human languages without all the normalization/grapheme cluster nonsense. Oh, two symbols look identical but are different? Who cares. Make them the same. I just don't get it... I'm just pissed off at how annoying Unicode is in C++.

nexusclarum

65,536 characters won't be enough if you include useless symbols like that rabbit. It should be just alphabets and signs like - , . ; : *. For the rest, use images or dedicated fonts like Wingdings.

avtem