Python Tutorial: String operations


---

In this video, we'll learn more about string manipulation, specifically, about basic string operations.

Most data science projects involve string manipulation. Python has many built-in methods that allow us to handle strings. Let's check some of them.
Suppose we have a string like the one in the example code.

Sometimes, the analysis requires the string to be entirely lowercase.
We could use the dot lower method to convert all alphabetic characters to lowercase as shown in the output.

On the contrary, we might want the string to be uppercase.

We could use the dot upper method to convert all alphabetic characters to uppercase as displayed.

Lastly, we could use dot capitalize to return a copy of the string with the first character in uppercase while keeping all other characters in lowercase as displayed.
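The three case methods described above can be sketched as follows. The example string is hypothetical, since the one shown in the video is not part of this transcript.

```python
# Hypothetical example string (the video's own string is not in the transcript).
my_string = "tHis Is a niCe StriNg"

# Each method returns a new string; the original is left unchanged.
lowered = my_string.lower()
uppered = my_string.upper()
capitalized = my_string.capitalize()

print(lowered)      # this is a nice string
print(uppered)      # THIS IS A NICE STRING
print(capitalized)  # This is a nice string
```

Strings are immutable in Python, so `my_string` still holds its original mixed-case value after these calls.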

There are methods that can convert between a string and other types of data, such as lists, by breaking a string into pieces.

Let's work with the following example.

We want to split the string into a list of substrings.

Python provides us with two methods: dot split and dot rsplit. Both of them return a list. They both take a separating element by which we are splitting the string, and a maxsplit that tells us the maximum number of splits to perform, so the result contains at most maxsplit plus one substrings.

As we can see in the code, the difference is that split starts splitting at the left.

rsplit begins at the right of the string. If maxsplit is not specified, both methods behave in the same way: they give as many substrings as possible. If you want to split on whitespace, you don't have to specify the sep argument.
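A minimal sketch of the difference between the two methods, using a hypothetical example string (the one from the video is not in this transcript):

```python
# Hypothetical example string.
my_string = "This string will be split"

# maxsplit=2 performs at most two splits, so we get at most three pieces.
left_split = my_string.split(sep=" ", maxsplit=2)
right_split = my_string.rsplit(sep=" ", maxsplit=2)

# With no arguments, split() separates on runs of whitespace, with no limit.
no_limit = my_string.split()

print(left_split)   # ['This', 'string', 'will be split']
print(right_split)  # ['This string will', 'be', 'split']
print(no_limit)     # ['This', 'string', 'will', 'be', 'split']
```

Note that the unsplit remainder ends up at the end of the list for split and at the beginning for rsplit.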

Consider the following string.

If we print it out, we can see that it contains two lines. Why is that?

There are some escape sequences, such as backslash n or backslash r, that indicate a line boundary.

Sometimes, we want to split a string into lines. So in the case of our string, we want to split it at the backslash n.

For this aim, Python has the method splitlines().

As we can see in the code, the string is split at the backslash n sequence, returning a list of two elements.
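As an illustration, here is a short sketch with a hypothetical two-line string (an assumption, since the video's string is not in this transcript):

```python
# Hypothetical string containing one newline escape sequence.
my_string = "This string will be split\nin two"

print(my_string)  # prints two lines

# splitlines() breaks the string at line boundaries and drops them.
lines = my_string.splitlines()

print(lines)  # ['This string will be split', 'in two']
```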

Some methods can paste or concatenate together the objects in a list or other iterable data.

This is the case for the dot join method. The syntax is simple: we call it on the separating element and, inside the call, we specify the list or other iterable, whose elements must all be strings.
We can observe in the example that whitespace is specified as the separator and the data type is a list.

The result is a single string containing all the objects in the list separated by whitespace.
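A small sketch of joining, assuming a hypothetical list of words (the video's list is not in this transcript):

```python
# Hypothetical list of strings.
my_list = ["this", "would", "be", "a", "string"]

# join is called on the separator and receives the iterable.
joined = " ".join(my_list)
print(joined)  # this would be a string

# Non-string items have to be converted first, otherwise join raises TypeError.
numbers = [1, 2, 3]
joined_numbers = "-".join(str(n) for n in numbers)
print(joined_numbers)  # 1-2-3
```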

Lastly, we'll talk about methods that trim characters from a string. The dot strip method removes both leading and trailing characters. Inside the call, we can specify the characters to remove. If we don't, whitespace, including line breaks, will be removed.

Let's say we have the following string.

And we apply the dot strip method as shown.

We get a string where both the leading space and the trailing escape sequence were removed.

We can apply the dot rstrip method, and it will return a string where the trailing backslash n was removed.

If we apply the dot lstrip method, we'll get a string with the leading whitespace eliminated.
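The three trimming methods can be compared side by side. The strings below are hypothetical, chosen to have a leading space and a trailing newline as described:

```python
# Hypothetical string with a leading space and a trailing newline.
my_string = " This string will be stripped\n"

both = my_string.strip()    # trims both ends
right = my_string.rstrip()  # trims only the end
left = my_string.lstrip()   # trims only the start

print(repr(both))   # 'This string will be stripped'
print(repr(right))  # ' This string will be stripped'
print(repr(left))   # 'This string will be stripped\n'

# Passing an argument trims those characters instead of whitespace.
custom = "xxhixx".strip("x")
print(custom)  # hi
```

Using `repr` here makes the remaining whitespace and escape sequences visible in the output.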

Now that you know many built-in methods for string manipulation, you can start to put them into practice!
