Unicode in Rust - Illustrated by Kanji - Jenny Manning

Have you ever wondered why you can’t look up a character in a string by its index? Or why the length of a string can be wildly different from the number of characters in the string? In this talk, we’ll dive into Unicode by looking at how Kanji is represented in Rust. You’ll learn about things like Han unification, the origins of CJK languages from oracle bone script, and why Rust handles strings differently than we expect.
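
As a rough illustration of the two questions in the description, here is a minimal sketch using only the Rust standard library. The helper names (`byte_len_and_char_count`, `nth_char`, `byte_slice`) are made up for this example and are not taken from the talk.

```rust
/// Returns (UTF-8 byte length, number of `char`s) for a string.
/// `len()` counts bytes, while `chars().count()` counts Unicode scalar values.
fn byte_len_and_char_count(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Returns the n-th `char` (zero-based) by walking the string from the start.
/// `s[n]` does not compile for `str`, because a byte offset is not a character index.
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

/// Returns the byte range `start..end` as a `&str`, or `None` when either end
/// falls inside a multi-byte character (or is out of bounds).
fn byte_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}
```

With these helpers, `byte_len_and_char_count("漢字")` gives `(6, 2)`, since each of the two kanji takes three bytes in UTF-8. `byte_slice("漢字", 0, 1)` returns `None` because byte 1 sits in the middle of the first character. Note that a `char` is a Unicode scalar value, which is not always what a reader would call one character.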

Related recommendations

Comments

I feel like this talk painted a fairly unfavourable picture of Han unification. Although the limited space in Unicode could perhaps be seen as one of the driving forces behind it, in reality what was unified and what wasn't was decided mostly on the basis of what East Asian encodings already did. And those East Asian encodings based their unification rules on fairly sound principles. You can't encode every minor glyph variation separately, or you end up with a mess of a system where text is encoded incredibly inconsistently because of the large number of duplicates. Even if Unicode were designed from scratch today without any space limits, Han unification would end up very similar to how it is now.

noname

TBH, the most pragmatically useful content is at the beginning and end. IMO, the content in the middle, while nice background information, is a little light on utility.

@ ~ 1:20 Ms. Manning points out the difference between string.len() & string.chars().count()

@ ~ 15:40 Ms. Manning goes over UTF-8, UTF-16, & UTF-32

Chris-onbt
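
For readers curious how the three encodings named in the comment above differ in size, here is a small sketch using the standard `char::len_utf8` and `char::len_utf16` methods. The function name `encoded_sizes` is hypothetical and only serves this example.

```rust
/// Returns the number of bytes one `char` needs in (UTF-8, UTF-16, UTF-32).
fn encoded_sizes(c: char) -> (usize, usize, usize) {
    // UTF-8 is variable width: 1 to 4 bytes per scalar value.
    let utf8 = c.len_utf8();
    // len_utf16() counts 16-bit code units (1 or 2), so multiply by 2 for bytes.
    let utf16 = c.len_utf16() * 2;
    // UTF-32 is fixed width: one 32-bit unit per scalar value.
    let utf32 = 4;
    (utf8, utf16, utf32)
}
```

For example, `encoded_sizes('a')` is `(1, 2, 4)` and `encoded_sizes('字')` is `(3, 2, 4)`. A character outside the Basic Multilingual Plane, such as `'𠮷'` (U+20BB7), needs four bytes in all three encodings, because UTF-16 has to use a surrogate pair for it.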

Introduction to Unicode with Rust

ASCII, Unicode, UTF-8: Explained Simply

Character data type in Rust || char || Rust Programming

Rust 101 - Char data type in Rust

Unicode at gigabytes per second

Rust Zürisee, Dec 2022: Next Generation i18n with Rust Using ICU4X

RustFest Zürich 2017 - Type-safe & high-perf distributed actor systems with Rust by Anselm Eickh...

Understanding Text Encoding in Go: ASCII, Unicode, and utf-8 Explained

Gping: Rust Clones For Everyone And Everything

Lets Learn Rust - 8 - Understanding Booleans and Characters

Introduction to Rust - Part 11: Real-World Interfaces and Error-Handling

Rust Basics 2024: Lesson 1 | Primitive Data Types

Common Collections in Rust

RustFest Zürich 2017 - Impractical Macros by Alex Burka

Monotron - a 1980s style home computer written in Rust - Jonathan Pallant [ACCU 2019]

RustFest Zürich 2017 - Mistakes to avoid when writing a wrapper around a C library by Pierre Krieger...

Rust general types overview and examples

Rust Program: Pig Latin

RustConf 2016 - A Modern Editor Built in Rust by Raph Levien

Lets Learn Rust - 39 - Understanding Strings

Python vs Rust for Simulation

Syntax conveniences afforded by the compiler — Tshepang Lekhonkhobe

ICU4X: Supercharging i18n :: IMUG 2023.06.21