Strings in Rust FINALLY EXPLAINED!

The ultimate Rust lang tutorial. Follow along as we go through strings in Rust. We will be talking about UTF-8, the &str and String types, indexing into strings, and more!

Chapters:
0:00 Intro
1:09 What is a string?!
6:53 &str and String
10:21 Creating strings
12:19 Manipulating strings
13:31 Concatenating strings
15:29 Indexing into a string
19:44 Strings and functions
20:32 Outro

#letsgetrusty #rustlang #tutorial
Comments

This has become my favorite RUST channel on YouTube.

antonioquintero-felizzola

Little note about function parameters:
Taking &str instead of String is good, but only if you don't need an owned String. If you do need an owned String, make sure to take a String, so the caller can decide how the owned string is generated. (For example, the caller might already have a String. If you take a &str, you need an unnecessary clone)

Now to contradict myself: If you do need an owned String, it might be best to use an impl Into<String> instead. This way, the caller can pass in a &str as well. Improves ergonomics.
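
A minimal sketch of that idea, assuming a hypothetical `User` type and constructor (neither is from the video):

```rust
// Hypothetical type, used only to illustrate `impl Into<String>`.
struct User {
    name: String,
}

impl User {
    // Accepts anything that converts into a String, e.g. &str or String.
    fn new(name: impl Into<String>) -> Self {
        User { name: name.into() }
    }
}

fn main() {
    // A string literal is copied into a fresh String by `into()`.
    let from_literal = User::new("alice");

    // An existing String is moved in; no extra clone is needed.
    let owned = String::from("bob");
    let from_owned = User::new(owned);

    println!("{} {}", from_literal.name, from_owned.name);
}
```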

In the same vein, taking impl AsRef<str> instead of &str also allows your function to take an owned string. It is trivial to put an & in front, so it's not as important as Into, but it also slightly improves ergonomics.
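
Roughly what that looks like, with a made-up `shout` function as the example:

```rust
// Hypothetical helper: works on anything that can be viewed as a &str.
fn shout(text: impl AsRef<str>) -> String {
    text.as_ref().to_uppercase()
}

fn main() {
    let owned = String::from("hello");

    println!("{}", shout("hello")); // &str
    println!("{}", shout(&owned));  // &String
    println!("{}", shout(owned));   // String, moved into the call
}
```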

Depending on what you actually do with it, you might even want your string input to be IntoIterator<Item=char> or something. This does decrease ergonomics, since the caller now needs to call chars on the string, but it does mean your function will also work with, for example, Vec<char>.
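
A small sketch of the trade-off, using a hypothetical `count_vowels` function:

```rust
// Hypothetical helper: accepts any source of chars, not just strings.
fn count_vowels(chars: impl IntoIterator<Item = char>) -> usize {
    chars
        .into_iter()
        .filter(|c| "aeiouAEIOU".contains(*c))
        .count()
}

fn main() {
    let text = "Hello, Rust";
    let as_vec: Vec<char> = vec!['a', 'b', 'e'];

    // The caller has to call `.chars()` on a string...
    println!("{}", count_vowels(text.chars()));
    // ...but a Vec<char> can be passed directly.
    println!("{}", count_vowels(as_vec));
}
```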

If you sometimes return an owned string and sometimes a &str, you can use Cow. For example, if you sometimes return a string literal, Cow<'static, str> will be your best friend.
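
For instance, a hypothetical `describe` function that only allocates when it has to:

```rust
use std::borrow::Cow;

// Hypothetical helper: returns a literal when possible, an owned String otherwise.
fn describe(count: u32) -> Cow<'static, str> {
    match count {
        0 => Cow::Borrowed("no items"),
        1 => Cow::Borrowed("one item"),
        n => Cow::Owned(format!("{n} items")),
    }
}

fn main() {
    for n in [0, 1, 5] {
        // Cow<str> derefs to str, so it prints like any other string.
        println!("{}", describe(n));
    }
}
```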

Granted, all this is more API design than strings, but it's good to know regardless. You don't even need to remember all of this, just know that this exists, so if you run into any issues with your functions being too specific, you know roughly what to do and can look it up.

Yotanido

I think this is probably your best video yet. It's great that you've gone a little bit deeper. Programmers really need to know this stuff. Thanks for your effort.

jonathanmoore

As a note on your comment at the end: The type that you said that Rust does not have would be represented by Vec<char> in Rust applications. It is not equivalent to rune slices in Go ([]rune), but intended for the same usage.
In general, Go slices are similar to Rust vectors. However, there is a difference between char and rune: In Go, rune is an alias for int32. In Rust, char is its own type. With Rust's emphasis on memory safety, safe Rust code cannot generate invalid chars. This means that its behavior is different from u32, the type that it would otherwise be equivalent to. In Go, it is perfectly possible to create meaningless runes. The same is true for strings, by the way: safe Rust code cannot put invalid UTF-8 into a &str or String, but no such limitation exists in Go.

As I understand it, these are equivalent types between Go and Rust:
- string -> &[u8]
- rune -> i32
- byte -> u8
- []byte -> Vec<u8>
- []rune -> Vec<i32>
- [7]byte -> [u8; 7]
- [7]rune -> [i32; 7]

The other Rust types that we mentioned (char, &str, String, Vec<char>, etc) have additional memory safety guarantees that Go does not provide.
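
A quick sketch of what those guarantees look like in practice, using only standard-library conversions (the sample values are my own):

```rust
fn main() {
    // 0xD800 is a UTF-16 surrogate, not a Unicode scalar value,
    // so the checked conversion refuses to produce a char.
    assert_eq!(char::from_u32(0xD800), None);
    assert_eq!(char::from_u32(0xE9), Some('\u{e9}')); // U+00E9, 'é'

    // 0xFF can never appear in valid UTF-8, so no String is produced.
    assert!(String::from_utf8(vec![0xFF, 0xFE]).is_err());

    // 0xC3 0xA9 is the UTF-8 encoding of U+00E9.
    assert_eq!(String::from_utf8(vec![0xC3, 0xA9]).unwrap(), "\u{e9}");

    println!("all conversions behaved as expected");
}
```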

upgradeplans

At 20:50 you talk about fixed-length encoding using four bytes. This is what UTF-32 does. AFAIK none of the major languages uses it by default, but Python (CPython 3.3 and later) has an interesting take on it: when a string is created, the interpreter chooses the narrowest fixed-width representation that fits every character, at 1 byte (Latin-1), 2 bytes (UCS-2), or 4 bytes (UCS-4, i.e. UTF-32) per code point, so that constant-time indexing is always possible but not too much memory is wasted. This of course only works because Python strings are immutable.
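
The "best fit" idea can be sketched in Rust like this. It only illustrates how a width could be picked; the `Width` enum and `narrowest_width` function are hypothetical and this is not how CPython is actually implemented:

```rust
// Hypothetical enum: bytes needed per code point in a fixed-width layout.
#[derive(Debug, PartialEq)]
enum Width {
    One,
    Two,
    Four,
}

// Pick the narrowest fixed width that can hold the largest code point.
fn narrowest_width(s: &str) -> Width {
    let max = s.chars().map(|c| c as u32).max().unwrap_or(0);
    if max <= 0xFF {
        Width::One
    } else if max <= 0xFFFF {
        Width::Two
    } else {
        Width::Four
    }
}

fn main() {
    for s in ["hello", "h\u{e9}llo", "\u{43f}\u{440}", "\u{1F980}"] {
        println!("{:?} -> {:?}", s, narrowest_width(s));
    }
}
```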

JaycenGiga

This might be the best string and UTF-8 encoding video I’ve ever seen. So many experienced, professional programmers really do not understand how strings actually work, even in their own language of choice. And it truly did demystify for me Rust's behavior around the different string types. Much appreciated.

billhurt

Def one of the best lightweight (and logic-dense) videos I've seen regarding Rust string types from a practical standpoint (concerning aspiring Rustaceans).
Excellent vid.

NOPerative

Keep it up!! As Rust grows, you will one day be remembered as one of the O.G. Rust youtubers!

mistakenmeme

Probably the best explanation of string, utf-8, ascii types I’ve encountered in my 15 year career! Keep up the good work!

abhishekdas

This earns you the title of professor. Hardly seen anything better explained than this!

xedb

This was driving me crazy last week.

Really glad to see a good, thorough showcase of this concept in Rust.

Appreciate your content! Keep at it, this will be big when Rust blows up.

dabzilla

Liked the little explanation of UTF-8 encoding. Thanks for making the videos. It's helpful to refresh what I’ve read and get a few tips too.

cramhead

Didn't expect to see a fellow countryman here, thanks for the tutorial :)

ko-ko-ko-la

Finally! A great explanation of what's going on between string slices and Strings -- thank you! I also appreciated your delving into Unicode encoding -- I was worried you were going to rabbit hole on binary representations of characters, but you did exactly the right thing in terms of explaining how UTF-8 encoding works and how it solves the "where am I" problem when you have a pointer to an arbitrary byte in a UTF-8 string (i.e. "am I at the start?" "where is the next char boundary?") -- I love that you explicitly mentioned that the first byte of a multibyte sequence is distinguishable by its high-order bits, and that it encodes the length of the sequence. You mentioned that indexing into a UTF-8 string is a linear operation, which is true, but it is linear in the number of characters skipped, not in the number of bytes -- if you have 4 4-byte characters in a string, traversal takes only 4 steps, not 16, thanks to this clever encoding of the first byte.
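
Here is a rough sketch of that lead-byte trick. The function names are my own, and it relies on `&str` already being valid UTF-8, so every jump lands on the first byte of a sequence:

```rust
// Length of a UTF-8 sequence, read from the high-order bits of its first byte.
fn seq_len(lead: u8) -> usize {
    match lead {
        0x00..=0x7F => 1, // 0xxxxxxx
        0xC0..=0xDF => 2, // 110xxxxx
        0xE0..=0xEF => 3, // 1110xxxx
        _ => 4,           // 11110xxx (only lead bytes are ever passed in)
    }
}

// Byte offset of the n-th character (0-based), visiting one byte per character.
fn nth_char_offset(s: &str, n: usize) -> Option<usize> {
    let bytes = s.as_bytes();
    let mut offset = 0;
    for _ in 0..n {
        if offset >= bytes.len() {
            return None;
        }
        offset += seq_len(bytes[offset]);
    }
    if offset < bytes.len() { Some(offset) } else { None }
}

fn main() {
    // Four 4-byte characters: reaching the last one takes 3 jumps, not 12 byte reads.
    let crabs = "\u{1F980}\u{1F980}\u{1F980}\u{1F980}";
    println!("{:?}", nth_char_offset(crabs, 3));
}
```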

jamesbond_

Awesome video, just a small thing. Newer programmers may be confused by the lookup vs search times for a character. For UTF-8 (or any variable-length encoding), if you want to look up the nth scalar you need to do a linear walk through the string, counting off every time you reach the end of a byte sequence representing a scalar. For a fixed-length encoding (runes, UTF-32) you can rely on each scalar being a fixed size, so you can just skip (n - 1) * 4 bytes (UTF-32 uses 4 bytes per scalar) to land in the right place. Note that this only holds for scalars: a grapheme can span several scalars, so finding the nth grapheme needs a linear walk in either encoding. Not sure if I just confused a bunch of people or added clarity for some.
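
To make the difference concrete, here is a small sketch with two hypothetical helpers. `Vec<char>` stands in for a fixed-width (4 bytes per scalar) encoding, and indices are 0-based here:

```rust
// UTF-8: has to walk the string from the start to find the n-th scalar.
fn nth_scalar_utf8(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

// Fixed width: each char is 4 bytes, so the n-th one is a direct index.
fn nth_scalar_fixed(chars: &[char], n: usize) -> Option<char> {
    chars.get(n).copied()
}

fn main() {
    let text = "na\u{ef}ve \u{1F980}";
    let fixed: Vec<char> = text.chars().collect();

    assert_eq!(std::mem::size_of::<char>(), 4);
    println!("{:?}", nth_scalar_utf8(text, 6));
    println!("{:?}", nth_scalar_fixed(&fixed, 6));
}
```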

primingdotdev

Wow, this is extremely useful! I come from the Java world and Rust strings really confused me, especially when I needed to deal with ideographic characters. Thank you!

chanhhua

I learned something new about how UTF-8 works! Thank you!

WizardOfArc

The thing I like about Rust is that you can take a buffer out of one type and transfer it to another. For instance, you can convert between String, Vec<u8> and Box<str> while keeping the same underlying buffer, without reallocating or copying. I wish C++ had this. It has had node transfer in a limited form for lists, and the newer standard added it for maps.
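
A minimal sketch of such a round trip, with a hypothetical `round_trip` helper that reports whether the heap pointer survived the String -> Vec<u8> -> String steps:

```rust
// Hypothetical helper: moves one buffer through several owning types.
// Returns the final String and whether the pointer was unchanged
// across the String <-> Vec<u8> conversions.
fn round_trip(s: String) -> (String, bool) {
    let original_ptr = s.as_ptr();

    // String -> Vec<u8>: hands over the buffer.
    let bytes: Vec<u8> = s.into_bytes();
    let same_after_vec = bytes.as_ptr() == original_ptr;

    // Vec<u8> -> String: validates UTF-8 but reuses the buffer.
    let s = String::from_utf8(bytes).expect("bytes came from a String");
    let same_after_string = s.as_ptr() == original_ptr;

    // String -> Box<str> -> String: drops and restores the spare capacity field.
    let boxed: Box<str> = s.into_boxed_str();
    let back: String = boxed.into_string();

    (back, same_after_vec && same_after_string)
}

fn main() {
    let (s, same_buffer) = round_trip(String::from("hello"));
    println!("{s}, same buffer: {same_buffer}");
}
```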

skyeplus

2:04 you could create an array of chars or Vec<char> since chars are 4 bytes long.

JakobKenda

Namaste brother. Your videos are too good. Keep this format going, tackling each topic standalone or mixed if it’s contextually relevant.

johnyepthomi