All Rust string types explained

Today we are diving into the surprisingly complicated world of strings. Not only will we learn about the fundamental data structures behind strings, but we'll also discuss how strings are implemented in C and, more importantly, how they are implemented in Rust.

Chapters:
0:00 Overview
0:32 String fundamentals
3:26 Strings in C
4:39 Strings in Rust - safety
5:30 Strings in Rust - Strings and &str
7:37 Strings in Rust - &'static str
8:47 Strings in Rust - Box<str>
10:03 Strings in Rust - Rc<str>
10:36 Strings in Rust - Arc<str>
11:04 Strings in Rust - Byte representations
12:05 Strings in Rust - String literals
13:13 Strings in Rust - Specialized strings
15:13 Strings in Rust - Interoperability strings
18:23 Summary
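As a rough map of the chapters above, here is a minimal sketch that builds each owned and borrowed string type from the same text. The helper names `greeting` and `string_type_lengths` are hypothetical, chosen for illustration; the idea is that every type holds the same UTF-8 bytes and differs only in ownership, sharing, and mutability.

```rust
use std::rc::Rc;
use std::sync::Arc;

/// A string literal: a `&'static str` pointing at data baked into the binary.
fn greeting() -> &'static str {
    "hello"
}

/// Builds each string type from the same text and returns their byte lengths.
fn string_type_lengths(text: &str) -> [usize; 5] {
    let owned: String = text.to_string(); // owned, growable heap buffer
    let borrowed: &str = owned.as_str(); // borrowed view into `owned`
    let boxed: Box<str> = owned.clone().into_boxed_str(); // owned, fixed size, no spare capacity
    let shared: Rc<str> = Rc::from(borrowed); // shared ownership, single thread
    let shared_sync: Arc<str> = Arc::from(borrowed); // shared ownership across threads
    [owned.len(), borrowed.len(), boxed.len(), shared.len(), shared_sync.len()]
}

fn main() {
    // "é" takes two bytes in UTF-8, so "héllo" is 6 bytes but 5 chars.
    let text = "héllo";
    println!("{:?}", string_type_lengths(text));
    println!("{} bytes, {} chars", text.len(), text.chars().count());
    println!("{:?}", text.as_bytes()); // the raw UTF-8 byte representation
    println!("{}", greeting());
}
```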
Comments

Note: While ASCII characters are stored in bytes, ASCII only uses 7 bits, meaning it only supports 128 distinct values (not 256). There are encodings that extend ASCII to use the 8th bit, though, e.g. ISO-8859-1 (aka Latin-1). But that's not ASCII.

blenderpanzi
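To make the 7-bit point concrete, a small sketch: std already provides `is_ascii` on `u8`, `char`, `[u8]`, and `str`; the hypothetical `is_seven_bit` helper just spells out the rule. It also shows that a Latin-1 character such as 'é' (U+00E9) is not stored as a single 8th-bit byte in a Rust string, because Rust strings are UTF-8.

```rust
/// Returns true if every byte fits in 7 bits, i.e. the data is plain ASCII.
fn is_seven_bit(bytes: &[u8]) -> bool {
    bytes.iter().all(|&b| b < 0x80)
}

fn main() {
    // 0x7F is the highest ASCII value: 128 values in total (0x00..=0x7F).
    assert!(0x7Fu8.is_ascii());
    assert!(!0x80u8.is_ascii());

    // Latin-1 would store 'é' as the single byte 0xE9, which is outside ASCII.
    // A Rust &str holds UTF-8 instead, where 'é' takes two bytes.
    assert_eq!("é".as_bytes(), b"\xC3\xA9");
    println!("{}", is_seven_bit(b"plain ascii"));
}
```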

A (mostly retired) person who has used (and still uses) C.
C is concentrated, very, very smart for its age, and still pretty important.
If you want to understand the cleverness of Rust, write your experiments in C, fall down the holes, and you will understand why it is clever.
(In general, try to understand one level down)
(Oh, and get close to the metal. There is the fun!)

chrissaltmarsh

The Rc and Arc examples are actually a little misleading. The Rc::from() call clones the str slice it is given. Just like Box<str>, Rc<str> owns its data. The Rc::clone calls, though, do not clone the string data any further :)

chrraz
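A minimal sketch of the behaviour described above, using a hypothetical `share` helper: `Rc::from` copies the bytes into a new allocation (so the address differs from the source `String`), while every `Rc::clone` only bumps the reference count and points at that same allocation.

```rust
use std::rc::Rc;

/// Copies `text` into one reference-counted allocation and returns
/// `copies + 1` handles that all share it.
fn share(text: &str, copies: usize) -> Vec<Rc<str>> {
    let first: Rc<str> = Rc::from(text); // allocates and copies the bytes once
    let mut handles = Vec::with_capacity(copies + 1);
    for _ in 0..copies {
        handles.push(Rc::clone(&first)); // bumps the count, no byte copy
    }
    handles.push(first);
    handles
}

fn main() {
    let source = String::from("shared text");
    let handles = share(&source, 2);
    // Rc::from copied the data, so it lives at a different address than `source`.
    println!("{}", handles[0].as_ptr() != source.as_ptr());
    // All handles point at the same allocation.
    println!("{}", Rc::ptr_eq(&handles[0], &handles[2]));
    println!("{}", Rc::strong_count(&handles[0]));
}
```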

I would like to thank you for creating such well-explained videos. As a Rust novice, I learned so much from your videos.

andrei

Great content. Those types may seem messy, but they feel so natural to a C programmer.

ishi_nomi

I don't get why Rc<str> isn't thread-safe compared to Arc<str>. Since it's immutable, the threads won't be able to modify its value, so why does it need to be atomically accessible? There is no threat in only reading shared data, or am I missing something?

maltamo
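One detail that bears on this question: even when the string bytes are never modified, every clone and drop of an `Rc` or `Arc` writes to the shared reference count. `Rc` updates that count with plain, non-atomic operations, so `Rc<str>` is not `Send` and the compiler refuses to move it to another thread; `Arc` uses atomic operations for the count. A minimal sketch with a hypothetical `count_in_threads` helper:

```rust
use std::sync::Arc;
use std::thread;

/// Spawns one thread per word; each thread gets its own Arc handle to the
/// same immutable text and counts how often its word appears.
fn count_in_threads(text: &str, words: &[&str]) -> Vec<usize> {
    let shared: Arc<str> = Arc::from(text);
    let handles: Vec<_> = words
        .iter()
        .map(|&word| {
            let text = Arc::clone(&shared); // atomic increment of the count
            let word = word.to_string(); // owned copy so the thread owns its data
            // Swapping Arc for Rc here fails to compile: Rc<str> is not Send.
            thread::spawn(move || text.matches(word.as_str()).count())
        })
        .collect();
    // Each thread drops its handle when it finishes (atomic decrement).
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    println!("{:?}", count_in_threads("a b a c a", &["a", "b", "z"]));
}
```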

I want an array of characters!

YOU CAN'T HANDLE AN ARRAY OF CHARACTERS!!!

Alguem
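For what it's worth, Rust will hand you an array of characters: a `Vec<char>` stores each `char` as a 4-byte Unicode scalar value, so indexing by character position is direct, at the cost of more memory than the UTF-8 bytes of a `String`. A small sketch with a hypothetical `to_char_array` helper:

```rust
/// Collects a string into a vector of chars (one 4-byte Unicode scalar value each).
fn to_char_array(text: &str) -> Vec<char> {
    text.chars().collect()
}

fn main() {
    let text = "héllo";
    let chars = to_char_array(text);
    println!("{}", chars[1]); // direct indexing by character position: 'é'
    println!("{} bytes as UTF-8", text.len()); // 6
    println!("{} bytes as Vec<char>", chars.len() * std::mem::size_of::<char>()); // 20
    // Turning it back into a String re-encodes it as UTF-8.
    let round_trip: String = chars.iter().collect();
    println!("{}", round_trip);
}
```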

I feel that Rust fights me when I am working with internationally standardized protocols that assume ASCII formats, and with tools that only care about ASCII and use that encoding as an optimization. Rust needs an ANSI C-compatible, ASCII-only type to interoperate with C serializations where high performance is a fundamental requirement... There are two scenarios I work with on a day-to-day basis where this is important:

1. Processing simple data that maps ASCII tags to ASCII data, where assuming ASCII in the code reduces parsing overhead.
2. Processing low-level serialization standards that not only assume ASCII but depend on it, where Unicode checks are unnecessary and degrade application performance.

Rust needs string types that specify their encoding. Instead of "String", it needs "String<ENCODING>" (or some similar syntax) types. Then String<C>, String<ASCII>, String<UTF8>, and any other interesting encoding could be clearly communicated and converted through standard means. Stop fighting us and provide good abstractions that don't force a one-world policy on devs... let the world be open, and instead establish standards for conversion.

mback
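Within current Rust, one way to avoid UTF-8 checks in this kind of code is to keep the data as `&[u8]` (byte string literals like `b"..."` produce these) and use the ASCII helpers on `u8` and `[u8]`, converting to `str` only at the edges. A minimal sketch assuming a hypothetical `TAG=VALUE` line format; `parse_tag_value` is illustrative, not a standard API.

```rust
/// Splits a hypothetical ASCII `TAG=VALUE` line at the first '=' without any
/// UTF-8 validation. Returns None if there is no '=' separator.
fn parse_tag_value(line: &[u8]) -> Option<(&[u8], &[u8])> {
    let eq = line.iter().position(|&b| b == b'=')?;
    Some((&line[..eq], &line[eq + 1..]))
}

fn main() {
    let line: &[u8] = b"NAME=alice";
    if let Some((tag, value)) = parse_tag_value(line) {
        // Case-insensitive comparison that only folds ASCII letters.
        if tag.eq_ignore_ascii_case(b"name") && value.is_ascii() {
            // Only at the edge do we view the bytes as &str (ASCII is valid UTF-8).
            let text = std::str::from_utf8(value).expect("ASCII is valid UTF-8");
            println!("name = {}", text);
        }
    }
}
```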

Great video. I do feel like these videos need a bit more of the "behind the scenes" animations of how the bits are stored and handled, because they do a great job of driving the point home.

giladkay

My first CS course was in C, and I never quite understood many of these concepts, but your video explained them so well. I would love to understand better how some of these C characteristics lead to security and runtime vulnerabilities, and how Rust prevents them.

MrKaNuke
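One concrete way Rust closes off the buffer over-read bugs common with C strings: string slicing is bounds-checked, and the non-panicking `str::get` returns `None` both for out-of-range indices and for ranges that would cut a multi-byte UTF-8 character in half, instead of reading past the end of the buffer. A small sketch with a hypothetical `safe_prefix` helper:

```rust
/// Returns the first `n` bytes of `text` if that is a valid, in-bounds
/// &str slice; otherwise None instead of reading out of bounds.
fn safe_prefix(text: &str, n: usize) -> Option<&str> {
    text.get(..n)
}

fn main() {
    let text = "héllo"; // 'é' occupies bytes 1 and 2
    println!("{:?}", safe_prefix(text, 1)); // Some("h")
    println!("{:?}", safe_prefix(text, 2)); // None: would split 'é'
    println!("{:?}", safe_prefix(text, 64)); // None: past the end
    // Plain indexing like &text[..64] is also checked, but panics instead.
}
```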

A few questions:
1. You initially said strings in Rust are immutable, but later presented "&mut str", saying that it is a mutable string that allows in-place transformations.
What did you mean by the former statement, then?
2. You said "Box<str>" doesn't have the capacity information. By capacity, do you mean the length of the string? (Usually capacity != used size.)
If you meant capacity as in a vector, then what is the difference between "Box<str>" and "&str", as neither has a capacity?
If you meant capacity = size, then how can this string be used in practice? Having neither a size nor null termination would mean that we can't know where the string ends.

avinashthakur
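A sketch that may help with both questions. A `&mut str` can be modified in place but cannot change its length, so only length-preserving edits such as `make_ascii_uppercase` are possible; growing the text needs a `String`. And `Box<str>`, like `&str`, is a pointer plus a length with no capacity field, so it knows where the string ends without a NUL terminator; the difference is that `Box<str>` owns and frees its heap buffer, while `&str` only borrows. The hypothetical `shout` helper is for illustration, and the size checks reflect the layout on current compilers (pointer-sized fields).

```rust
use std::mem::size_of;

/// Uppercases ASCII letters in place through a `&mut str`. The length cannot
/// change, which is why only this kind of same-size edit is allowed.
fn shout(text: &mut str) {
    text.make_ascii_uppercase();
}

fn main() {
    let mut owned = String::from("hello");
    shout(owned.as_mut_str());
    println!("{}", owned); // HELLO

    // Converting to Box<str> drops any spare capacity but keeps the length.
    let mut grown = String::with_capacity(64);
    grown.push_str("hi");
    let boxed: Box<str> = grown.into_boxed_str();
    println!("{} {}", boxed.len(), &*boxed); // 2 hi

    // &str and Box<str>: pointer + length. String: pointer + capacity + length.
    let word = size_of::<usize>();
    println!("{}", size_of::<&str>() / word); // 2
    println!("{}", size_of::<Box<str>>() / word); // 2
    println!("{}", size_of::<String>() / word); // 3
}
```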

4:30 So don't be lazy: either provide a big enough buffer if you're going to use magic hard-coded numbers, or use one of the tiny handful of functions C has to query at run time what size is needed.
It's seriously not that bad.

KyleHarrisonRedacted

To add to this already great video, 'static doesn't really mean that the data is going to be there for the entire lifetime of the program. 'static applies anywhere a type is guaranteed not to be bound by any lifetime restrictions. So you may find 'static in trait bounds that apply to types such as Rc<str>. You may also find &'static applied to lazily instantiated variables using the lazy_static crate.

MRL
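To illustrate the distinction: a `T: 'static` bound only says that `T` holds no borrowed data with a shorter lifetime, so an owned `Rc<str>` satisfies it even though it is freed long before the program ends. A `&'static str`, by contrast, is a reference whose target must stay valid until the program ends, which is what string literals and deliberately leaked allocations provide. The helpers `describe` and `leak_string` are hypothetical.

```rust
use std::fmt::Display;
use std::rc::Rc;

/// Accepts any value that contains no non-'static borrows.
fn describe<T: Display + 'static>(value: T) -> String {
    format!("<{}>", value)
}

/// Turns a runtime String into a &'static str by leaking its buffer on purpose.
fn leak_string(text: String) -> &'static str {
    Box::leak(text.into_boxed_str())
}

fn main() {
    let temporary: Rc<str> = Rc::from("short-lived");
    // Rc<str> meets the 'static bound even though it is freed when `describe` returns.
    println!("{}", describe(temporary));

    let local = String::from("owned");
    // describe(local.as_str()) would not compile: that &str borrows `local`.
    println!("{}", describe(local)); // passing the owned String itself is fine

    let forever: &'static str = leak_string(String::from("built at runtime"));
    println!("{}", forever);
}
```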

Loved your video❤
An easy, simple, and precise explanation of the reasoning behind each string type.

It makes us realise how complex very simple but fundamental things like strings can be at the low level.

Keep up the good work 👍🏻

exhilex

This is the free bible of strings! Thank you so much for the effort; this is a string guide for everyone, new and advanced alike!

Aucacoyan

Small erratum I noticed: at 18:15 the null check seems to be inverted.

OmnipotentEntity

Thanks for this video! If we compare the data types of Rust and C, we need to at least distinguish between char * and char[] on the C side. If we want to be super correct, we also need `const char *` and `const char[]`. Also, it's PathBuf, not PathBuff :)

gottox
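For reference, `PathBuf` is the owned, growable counterpart of the borrowed `Path`, much as `String` relates to `&str`; both are built on `OsStr`/`OsString` rather than UTF-8 text. A small sketch with a hypothetical `config_file` helper and an illustrative Unix-style directory:

```rust
use std::ffi::OsStr;
use std::path::{Path, PathBuf};

/// Builds an owned path by appending a file name to a directory.
fn config_file(dir: &Path, name: &str) -> PathBuf {
    let mut path = dir.to_path_buf(); // borrowed Path -> owned PathBuf
    path.push(name);
    path
}

fn main() {
    let path = config_file(Path::new("/etc/myapp"), "settings.toml");
    // Path components are OsStr, not str, because OS paths need not be UTF-8.
    println!("{:?}", path.file_name());
    println!("{:?}", path.extension());
    // Converting to &str is fallible for the same reason.
    println!("{:?}", path.to_str());
    let _ = OsStr::new("unused"); // OsStr is the borrowed OS-string type
}
```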

I think there's an error in the CString example (18:00).

The branch for a successful getenv call should execute when `!val.is_null()` is true. The negation operator `!` is missing.

edwin
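A sketch of the corrected check the two comments describe: read through the pointer only when it is not null. This assumes the pointer comes from a C function such as `getenv`; the helper `c_string_to_owned` is hypothetical, and to stay self-contained the demo feeds it a `CString` and a null pointer instead of calling into libc. For environment variables specifically, safe Rust can use `std::env::var_os` and skip raw pointers entirely.

```rust
use std::ffi::{c_char, CStr, CString};
use std::ptr;

/// Converts a possibly-null C string pointer (e.g. a `getenv` result) into an owned String.
///
/// # Safety
/// `ptr` must be null or point to a valid NUL-terminated string.
unsafe fn c_string_to_owned(ptr: *const c_char) -> Option<String> {
    if !ptr.is_null() {
        // Success branch: the pointer is non-null, so it is valid to read.
        let text = unsafe { CStr::from_ptr(ptr) };
        Some(text.to_string_lossy().into_owned())
    } else {
        // getenv returns NULL when the variable is not set.
        None
    }
}

fn main() {
    let value = CString::new("/home/user").expect("no interior NUL bytes");
    println!("{:?}", unsafe { c_string_to_owned(value.as_ptr()) });
    println!("{:?}", unsafe { c_string_to_owned(ptr::null()) });
    // Safe alternative for environment variables:
    println!("{:?}", std::env::var_os("HOME"));
}
```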

C also has char16_t and Unicode library strings. Other libraries, like GLib, probably have something similar for dynamic-length strings, and programs often implement their own strings without null termination, which libc already partially supports, etc.

kreuner
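On the Rust side, UTF-16 data of the kind `char16_t` arrays usually carry can be produced with `str::encode_utf16` and decoded with `String::from_utf16`. A minimal sketch; `to_utf16` and `from_utf16` are hypothetical wrappers around those std calls.

```rust
/// Encodes a Rust string as UTF-16 code units, e.g. for an API expecting char16_t.
fn to_utf16(text: &str) -> Vec<u16> {
    text.encode_utf16().collect()
}

/// Decodes UTF-16 code units, returning None on unpaired surrogates.
fn from_utf16(units: &[u16]) -> Option<String> {
    String::from_utf16(units).ok()
}

fn main() {
    let units = to_utf16("hi 😀");
    // The emoji is outside the BMP, so it needs a surrogate pair (2 code units).
    println!("{}", units.len()); // 5
    println!("{:?}", from_utf16(&units));
    println!("{:?}", from_utf16(&[0xD800])); // lone surrogate: None
}
```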

I don't get Arc<str> for immutable strings. If it's immutable, why is any synchronization necessary?

EbonySeraphim