CppCon 2017: Barbara Geller & Ansel Sermersheim “Unicode Strings: Why the Implementation Matters”

We will provide a brief overview, including an explanation of what Unicode is, string terminology, and how Unicode supports non-US languages.

We will cover the pros and cons of various string formats and encodings, including UTF-8, UTF-16, UCS-4, etc. We will show a timeline of Unicode development and how other languages have handled string processing over the last twenty years.

We will provide a brief overview of where strings are used, what can go wrong with strings, why string encoding is important, and how the CsString library solves a major problem with string handling.

We will explain how the CsString library has changed our CopperSpice GUI libraries and improved string processing in DoxyPress.

No prior knowledge of Unicode, CopperSpice, or DoxyPress is required.

Barbara Geller: CopperSpice, Co-Founder

I am an independent consultant with over twenty-five years of experience as a programmer and software developer. I have worked with numerous smaller companies developing in-house applications. I have also designed and developed Windows applications for several vertical markets including medical billing, transportation, and construction.

My degree is in Electrical Engineering from Cal Poly Pomona with additional studies in Computer Science.

I am a co-founder of CopperSpice, a C++ library derived from the existing Qt framework. I designed the Diamond Editor, a cross-platform programmer's editor using the CopperSpice libraries. I have programmed in C++, Qt, Visual Objects, Clipper, PHP, and Java.

Ansel Sermersheim: CopperSpice, Co-Founder

I have been working as a programmer for nearly twenty years. My degree is in Computer Science from Cal Poly San Luis Obispo. I have transitioned to independent consulting and I am currently working on a project for RealtyShares in San Francisco.

Co-founder of CopperSpice, a C++ GUI library.
Co-founder of DoxyPress, a C++ application for generating documentation.
Developer of the open source libraries: libGuarded, CsSignal and CsString.

I have programmed in C++, C, Lisp, Java, and Perl, with extensive knowledge of TCP/IP and multithreaded design. I am an avid follower of the C++ standard. Speaker at CppCon 2015, CppNow 2016, CppNow 2017, and several ACCU Bay Area meetings.


Comments
Author

I like the collaborative system that Barbara and Ansel employ in their presentations.

mmbrshp
Author

I've never cared for the dual-presenter style, but the information is interesting and I can understand the speakers (clear voices, decent pacing, and good audio recording).

TheDuckofDoom.
Author

To get to the CsString part, go to 33:14. Before that, it covers what Unicode is, its history, and why other string library implementations have problems.

DPGrupa
Author

I really wonder if they have ever seen grapheme clusters and what their library would do when walking over a string containing an emoji with a skin color modifier, like this one: "👩🏽". Will it print "👩" and "🏽"? Because characters on the screen are not the same as code points.

OSSMaxB
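
The distinction raised in the comment above can be checked with a small C++ sketch. It is not taken from the talk and does not use CsString; `decode_utf8` is a hypothetical helper written for illustration, and it assumes well-formed UTF-8 input with no validation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of CsString): decode UTF-8 into code points.
// Assumes the input is well-formed UTF-8; malformed input is not detected.
std::vector<char32_t> decode_utf8(const std::string& utf8) {
    std::vector<char32_t> code_points;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::size_t length = 1;
        char32_t value = lead;  // single-byte (ASCII) case
        if (lead >= 0xF0) {
            length = 4;
            value = lead & 0x07;
        } else if (lead >= 0xE0) {
            length = 3;
            value = lead & 0x0F;
        } else if (lead >= 0xC0) {
            length = 2;
            value = lead & 0x1F;
        }
        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k < length && i + k < utf8.size(); ++k) {
            value = (value << 6) |
                    (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        }
        code_points.push_back(value);
        i += length;
    }
    return code_points;
}
```

Passing the eight bytes `"\xF0\x9F\x91\xA9\xF0\x9F\x8F\xBD"` (the emoji quoted above) yields two code points, U+1F469 (woman) and U+1F3FD (medium skin tone modifier), although a renderer that supports the sequence draws a single glyph. Code that iterates code point by code point therefore sees two items; grouping them into one user-perceived character requires grapheme cluster segmentation as described in Unicode Standard Annex #29.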
Author

Code points are not enough for proper string operations like substring, unfortunately. And iterating code point by code point is not very useful either. The C# crash example could be extended: some characters consist of multiple code points. Please test how your library "walks backward" (Unit Test 7 in your presentation) with some combined characters. Burmese, anyone? German text with umlauts in NFD, which is used on macOS? Unicode is ugly.

blacklion
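
The NFD concern in the comment above can be illustrated with a minimal C++ sketch. It is independent of CsString and of the talk's Unit Test 7; `drop_last_code_point` is a hypothetical function that steps back exactly one code point in a UTF-8 string, assuming well-formed input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of CsString): remove the last code point
// from a UTF-8 string. Assumes the input is well-formed UTF-8.
std::string drop_last_code_point(std::string utf8) {
    if (utf8.empty()) {
        return utf8;
    }
    std::size_t pos = utf8.size() - 1;
    // Continuation bytes have the bit pattern 10xxxxxx; skip back over them
    // until the lead byte of the final code point is reached.
    while (pos > 0 &&
           (static_cast<unsigned char>(utf8[pos]) & 0xC0) == 0x80) {
        --pos;
    }
    utf8.erase(pos);
    return utf8;
}
```

With the precomposed (NFC) form `"T\xC3\xBC"` ("Tü", where ü is U+00FC), the function returns `"T"`. With the decomposed (NFD) form `"Tu\xCC\x88"` ("u" followed by the combining diaeresis U+0308), it returns `"Tu"`: the base letter stays and only the accent is removed, so walking backward by code point lands in the middle of what a reader sees as one character.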
Author

Can the Rust fanatics please stop rubbing their privates in the talks and comments of C++ videos?
We get that Rust has had the benefit of 30+ years of hindsight. We get that Rust hasn't had to support large legacy code bases whilst adding and updating new language and library features. We get that we should all switch to Rust; the objectively better programming language for all use cases.
We will throw our code out tomorrow and start fresh, we promise.

jaredmulconry
Author

"I am a Rust programmer, and *we* obviously have..." - fanboy alert

broken_abi
Author

This talk is BS. Read the Unicode Standard while asking yourself what the atoms really are (the answer includes an "it depends" and is certainly not what is presented here) and how wrong the "21 bits now, but we need 32 because of the future" claim is. That is very basic stuff they are getting pretty much wrong. There are better talks about Unicode, including, if I am not completely wrong, at CppCon (probably later ones, though), and at a very brief glance it seems Boost.Text is about to get pretty good, though it is not yet through the process. There is at least one informative talk by its author about his work on it. This is a sad example of how many people in the C++ community still get Unicode wrong even if they think they are trying (or, in this case, think they are doing great), based on the purely historical picture of a character array properly representing text, which it just doesn't.

Short version: at best this is basic level Unicode support (which excludes all the hard stuff that is sadly necessary for text processing).

erikitter