str vs bytes in Python

strings vs. bytes, what's the diff?

Top patrons and donors: Jameson, Laura M, Dragos C, Vahnekie, Neel R, Matt R, Johan A, Casey G, Mark M, Mutual Information

CHAPTERS
---------------------------------------------------
0:00 Intro
0:20 str and bytes syntax
0:50 str and bytes functions
1:29 they don't mix
2:17 amazing sponsor
2:40 smiley
3:33 the meaning of bytes
4:53 encodings
6:07 dangers of not specifying encoding
7:21 warn default encoding
7:35 utf-8 mode
8:06 Outro and thanks

Comments

1:34 Fun fact: separating bytes from strings was the most important major breaking change between Python 2 and Python 3. Trying to keep strings as byte-encoded led to all kinds of unfortunate trouble in Python 2, which could not be fixed without sacrificing backward compatibility.

And they thought, while they were breaking things anyway, they might as well fix a few other things in a cleaner, non-backward-compatible way while they were at it.

lawrencedoliveiro
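A minimal sketch of the separation described in the comment above, using only built-in Python 3 behaviour. The sample text is an arbitrary illustration, not taken from the video.

```python
# str holds Unicode code points; bytes holds raw 8-bit values.
text = "na\u00efve \u263a"      # 7 code points, two of them non-ASCII
data = text.encode("utf-8")     # explicit conversion str -> bytes

# Python 3 refuses to mix the two types implicitly.
try:
    text + data
except TypeError as exc:
    mix_error = type(exc).__name__
else:
    mix_error = None

# Going back requires naming the encoding again.
round_trip = data.decode("utf-8")

print(type(text).__name__, len(text))   # str 7
print(type(data).__name__, len(data))   # bytes 10
print(mix_error)                        # TypeError
print(round_trip == text)               # True
```

The length difference (7 code points versus 10 bytes) comes from UTF-8 using two bytes for "ï" and three for "☺".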

When I started learning Rust, this was something that actually came up quite a bit, since you can't iterate over a string object (you don't necessarily know its encoding at compile time). It was the first time I realized that the difference between ASCII, UTF-8, and other encodings is actually really important!

BenjaminWheeler

Windows encodings are a real nightmare.
There are the OEM/MS-DOS codepages used by the console, which make it almost impossible to consistently write non-English characters from a .bat script.
Then there are the "ANSI" codepages, which are used by the Win32 functions that accept strings as char pointers (e.g. MessageBoxA). In western countries this is usually Windows-1252, a slightly incompatible variant of ISO 8859-1 (also known as "Latin1").
Then there are the "Unicode" strings/MBCS/wchar_t pointers, which are actually UTF-16 (even the MS documentation wrongly states that "Unicode is a 16-bit character encoding"), meaning that emojis will probably work in some places and not in others (try calling MessageBoxW with an emoji...). Except not really, because in some cases it is UCS-2 instead of UTF-16 (another slightly incompatible variant). BTW, at least until recently you needed to add the BOM character to make programs like Notepad recognize a UTF-16 file.

Note that NONE of those encodings are UTF-8.

japedr
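To make the codepage differences in the comment above concrete, here is a small sketch that decodes the same byte with three codecs from Python's standard library and looks at how an emoji is stored in UTF-16. The chosen byte (0x80) and emoji are illustrative assumptions; the codec names are Python's, not Windows API names.

```python
import codecs

# One byte, three different characters depending on the codepage.
raw = bytes([0x80])
by_codec = {name: raw.decode(name) for name in ("cp1252", "latin-1", "cp437")}
print(by_codec)

# An emoji outside the Basic Multilingual Plane needs a surrogate pair
# in UTF-16, so it is 4 bytes rather than 2. UCS-2 cannot represent it.
emoji = "\U0001F600"
emoji_utf16 = emoji.encode("utf-16-le")
emoji_utf8 = emoji.encode("utf-8")
print(len(emoji_utf16), len(emoji_utf8))

# A UTF-16 file with an explicit byte order mark (BOM) in front.
with_bom = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
print(with_bom.decode("utf-16"))
```

Under cp1252 the byte is the euro sign, under latin-1 it is a C1 control character, and under the OEM codepage cp437 it is "Ç", which is the kind of mismatch the comment describes.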

Wow, really informative. I wrote most of a project on Windows, started using it in a Linux Google Cloud VM, but then realized some of my data in a CSV file was invalid.

In the interest of getting a proof of concept out fast, I quickly wrote a script in the VM that opens the file as a pandas DataFrame, removes the invalid rows, and stores it as a CSV file again. Except when I went to open this new file before giving it to my ML algorithm, it kept telling me the file was corrupted. I couldn't understand it, I was at a total loss, and I ended up just writing another hacky solution in which, if I encountered an error loading one of the rows during the training process, I would just default to loading the first row instead.

It makes total sense that this could have been the problem. Thanks James!

timogden
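The situation in the comment above can be reproduced in miniature. This sketch assumes the file was written as Windows-1252 and later read as UTF-8; the commenter does not say which encodings were actually involved, and the file name and contents here are hypothetical. It uses plain `pathlib` rather than pandas to stay self-contained.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.csv"   # hypothetical file name

    # Simulate a file written on a machine whose default is cp1252.
    path.write_text("name,city\nZo\u00eb,K\u00f6ln\n", encoding="cp1252")

    # Reading it back as UTF-8 fails on the first non-ASCII byte.
    try:
        path.read_text(encoding="utf-8")
        read_error = None
    except UnicodeDecodeError as exc:
        read_error = exc.reason

    # Naming the encoding that was really used recovers the data.
    rows = path.read_text(encoding="cp1252").splitlines()

print(read_error)
print(rows)
```

Passing `encoding=` explicitly on both the writing and the reading side removes the dependency on whichever machine the script happens to run on.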

When you're completely unsure what the encoding of any file that you're processing is, the chardet package is really helpful.

WalterVos

It's crazy how one day I am wondering about something and a week later you have a great video on it. Thanks for another great one!

kyleaustin

Looking forward to "from __future__ import default_encoding".

mrtnsnp

It's crazy how much you come across decoding/encoding issues in the wild. I sometimes work with large text datasets with mixed encodings, sometimes even in the same line! The worst is that if you try and decode with the wrong encoding it can raise a runtime error, so I ended up writing a short program with a bunch of try/excepts for the different possibilities (utf-8 first of course). I did the same thing when I worked in C and Tcl. Gotta be a better way...

cleverclover
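The try/except approach from the comment above might look roughly like this. It is a sketch under assumptions: the helper name `decode_with_fallbacks`, the candidate list, and the sample lines are all hypothetical, and the final latin-1 fallback is one possible choice because it maps every byte to a character and so never fails.

```python
def decode_with_fallbacks(data: bytes, encodings=("utf-8", "cp1252")):
    """Return (text, encoding_used), trying each candidate in order."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # latin-1 accepts any byte sequence, so this always succeeds,
    # although the result may not be what the author intended.
    return data.decode("latin-1"), "latin-1"


# Hypothetical mixed-encoding input, processed one line at a time.
lines = [b"caf\xc3\xa9", b"caf\xe9", b"plain"]
decoded = [decode_with_fallbacks(line) for line in lines]
for text, enc in decoded:
    print(enc, text)
```

Trying UTF-8 first is reasonable because text in a legacy single-byte encoding rarely happens to be valid UTF-8, whereas the reverse direction almost always "succeeds" and silently produces mojibake.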

As I was searching the interwebs as to what a type 'byte' was and how to convert it to a string, my YT refreshed and there was this video at the top of my subscription, 4 minutes old. This timing was apropos.

SeanCrites

Thank you! I've always struggled to understand the difference between str and bytes, and what encoding is. And now I finally understand! Thank you 😊

che_kavo

Decoded the mystery in a few minutes, thank you! ☺

finnthirud

Yet another informative and well put video. Thanks!

nitishvirtual

5:25 Not just the most popular, but some languages, including Python, have embraced Unicode to the point that identifiers can contain any Unicode characters that are classed as “letters”. So for example while “in” is a reserved word, “ın” is not, and can be used as an identifier.

lawrencedoliveiro
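The identifier claim in the comment above can be checked with the standard library. This sketch uses the dotless i (U+0131) from the comment; the `namespace` dictionary is just a scratch scope for the demonstration.

```python
import keyword

dotless_in = "\u0131n"   # "ın": dotless i followed by n

print(keyword.iskeyword("in"))         # True: reserved word
print(keyword.iskeyword(dotless_in))   # False: not reserved
print(dotless_in.isidentifier())       # True: valid identifier

# Assign to it in a scratch namespace to show it really works as a name.
namespace = {}
exec(f"{dotless_in} = 42", namespace)
print(namespace[dotless_in])           # 42
```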

Hey James, I’d love to see a video showcasing how to use the Textual package. It’s really neat, and fits your style.

AngryArmadillo

That was the best message from a sponsor I've ever seen.

tiagomacedo

New subscriber here. Just wanted to say that I love your videos. Very informative and fun to watch! Keep up the good work.

eternlyytc

I absolutely love your videos. Regardless of my familiarity with a topic, every video seems to have some piece of information that I would not have discovered on my own. I never knew that files were encoded with the system encoding unless specified. It has never been an issue, but I know that one day it will be and without this knowledge, I would have really struggled to identify the issue. Future me really appreciates your hard work.

lethalantidote
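For the point in the comment above about files using the system encoding unless one is specified, this sketch inspects what the current interpreter would pick. The temporary file name is hypothetical, and the printed defaults depend on the platform and Python version, so no particular output is assumed.

```python
import locale
import os
import sys
import tempfile

# What open() falls back to when no encoding is given (outside UTF-8 mode).
default_encoding = locale.getpreferredencoding(False)

# True when Python runs in UTF-8 mode (-X utf8 or PYTHONUTF8=1).
utf8_mode = bool(sys.flags.utf8_mode)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "note.txt")   # hypothetical file name

    with open(path, "w") as f:             # encoding omitted on purpose
        implicit = f.encoding

    with open(path, "w", encoding="utf-8") as f:
        explicit = f.encoding

print("locale default:", default_encoding)
print("UTF-8 mode:", utf8_mode)
print("implicit:", implicit, "| explicit:", explicit)
```

On Python 3.10 and later, running a script with `-X warn_default_encoding` (or setting `PYTHONWARNDEFAULTENCODING=1`) emits an `EncodingWarning` for calls like the first `open()` here, which is one way to find the places where an encoding was left out.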

Interesting topic. Waiting for your next video :)

denyspisotskiy

Very good explanation, cleared my head :)

Excellent video, thank you.
I've had a couple of issues where I've had to use the IO and locale libraries to "fix" encoding shenanigans, but I think if I revisited those lines I'd now have an actual understanding of what was happening, how the changes worked and, most importantly, how /to do it better/.

jasonhenson
