Unicode - going down the rabbit hole - Peter Bindels

Have you ever looked at a Unicode emoji of a happy family and wondered how that actually works? Have you been to websites where � was a common sight? Is your name in a computer system not O'Neill but O’Neill ? Ever wondered why some texts crashed iOS and Android? Are you involved in a computer system that deals with text that may be more complicated than ASCII - such as receiving résumés - and would you like to understand how this all works beneath the surface?

In this talk I'll be taking you through a deep dive into history, starting with the Sumerians and Egyptians and working quickly toward recent history, including Irish Ogham and Devanagari, through to the most recent combining emoji with Fitzpatrick modifiers and variation selectors. We'll then take a look through Unicode itself, finding out along the way how and why things are as complicated as they are. Finally, we'll take a deep dive into past, current and future C++ support, slowly iterating our way out of C and arriving at the SG16 plans for C++23.

Save the date for NDC TechTown 2020 (31st of August - 3rd of September)

Check out more of our talks at:
Comments

39:21 small mistake, ツ is actually a Katakana character, つ would be the corresponding Hiragana character

tezerd

Unfortunately, MS will never switch the internal Windows encoding from UTF-16 to UTF-8.

warrenbuckley

There _is_ a standard for switching 8859 code pages. I don't recall its number, but I've had to support it.

JohnDlugosz

10:25 I think Japanese can be scrapped from the list, because it also uses Hiragana and Katakana, both of which are absent from the text.

dr.c

"AD" goes before the number.

MrWaldo

This subject is both fascinating and amazingly annoying at the same time. I've been working on internationalization of my app and learned that people CAN'T agree on ANYTHING! Seriously, there's not a single universally accepted concept that humans came up with -- not in our language, not in our names, not in our dates, not in our time, not in our calendars, not in our money, and not even in our numbers. We can't even agree on how to write a floating-point number. There's nothing out there that's universal. For god's sake, people!

sentdc

2:45 uni students downloading PDFs and bypassing the trial numerous times to get into Scribd

cykx

Taiwanese? Seriously? The standard language in Taiwan is Mandarin Chinese. Some old people still speak "Taiwanese", but it's just one of many Chinese languages (or dialects), and none of the others were mentioned. And it's never written. Well, almost never. Because when it is written in Taiwan, it's typically one syllable or word within a longer Mandarin text, and that bit will be written in Zhuyin Fuhao (Bopomofo). The one other Chinese language (or dialect) which is sometimes written is Cantonese, not Taiwanese. "Mojibake" is from Japanese, is pronounced "mo-ji-ba-ke", and has nothing to do with "baking".

andrewdunbar

The wheel was not invented yet in ancient Egypt? WAT? That's a gross mistake... The Egyptians did indeed have wheels!

akithered

Unicode in C++ is in such a sad state right now. And ICU has a terribly unstable API. The problem is that Unicode, even 30 years after its invention, is still changing too much.

llothar

was there a section where the speaker didn't get something major wrong? i'm 20 minutes in and i've seen an error in basically every section.


"the old words are called katakana" no... he couldn't have said that could he? *rewind*... *confirmation* *rewind, set video speed at 1.0* *double confirmation*
...


'taiwanese': that isn't the language. you are talking about traditional han characters, and the opposite comment should have been made. instead of talking about how taiwan had 'han' characters, it should have been about how the red revolution in china led to a decision to change the calligraphy set into a (keyword) simplified form in an effort to increase literacy.


if you want to talk about chinese languages either stick to the simple bits, or go to a tenable level of fluency. for example if you want to talk about taiwanese as a language, then you should use all the different chinese dialects and the various ideas of grammatically correct character order as per each province. if you don't want to refer to these territory based dialect changes, you should stick to the macro variations of simplified and traditional characters.


there is a taiwanese language, or rather there was. and it was more available and relevant when the bird-and-worm script was used in china. if you aren't going to include bird-and-worm or seal scripts, you shouldn't be like "this one place had a script/language 200 years ago, but it is topical. but this other place which also had such a script back then, that one doesn't matter." you are correct that taiwan had a non-chinese script. but it is kinda old. and as for korean not using hangul... yeah.... that is like the most famous linguistics story.


just like china decided to simplify characters by taking 'heart' out of 'love', back when korea had an emperor he also was bothered by the lack of literacy of his people. so he tasked some intellectuals to create something which would "only take a dull man a week to learn". and they created a system similar to an alphabet. it was right between a syllabary and an alphabet. every sound had a shape like an alphabet, but you shaped a triad of sounds together into a single character.


think of it like using american shorthand and american judicial shorthand and the APA scripts as necessary alternatives for american english.


it is like you skimmed wikipedia.

morthim

That hieroglyph is the sun? I call BS, the Egyptians invented the donut.

smoothbeak