R Tutorial: Regular expression basics

---
Hello and welcome to this introductory course on natural language processing in R.

My name is Kasey Jones, and I will be helping you along this journey to master the fundamental elements of NLP in R.

So what is natural language processing? Basically, NLP focuses on using computers to analyze and understand text.

In this course, we will be ambitious and cover topics such as classification, topic modeling, named entity recognition, sentiment analysis, and others. Each topic we cover will prepare you for real text analysis and help you better understand how to apply NLP to learn from your data. Let's jump right into our first examples by exploring regular expressions.

Regular expressions are sequences, or patterns, of characters used to search text. Analysts use regular expressions for all kinds of tasks, including searching files in a directory from the command line, finding articles that contain a specific pattern of text, replacing specific strings, and many other use cases. In general, using a regular expression means specifying two things: the text you want to search, and the pattern you want to find in it.

Let me give two concrete examples in R. You could search a set of words for every mention of a number and return the indices of the words that contain one, or you could look for all words that include an apostrophe and return those words themselves. If you are new to regular expressions, seeing \d probably doesn't make a whole lot of sense. So let's look at what \d is, along with some other basic regular expressions.
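The two searches just described can be sketched in base R with grep(). This is a minimal illustration, not the course's own code: the vector `words` and its contents are hypothetical. Note that inside an R string the backslash must be doubled, so the regex \d is written "\\d".

```r
# Hypothetical example vector; these strings are illustrative, not the course data
words <- c("apple", "R2-D2", "don't", "banana", "42nd street", "it's")

# grep() returns the indices of the elements that contain at least one digit.
# The regex \d is written "\\d" because R strings need an escaped backslash.
digit_idx <- grep("\\d", words)
# -> c(2L, 5L)

# value = TRUE returns the matching elements themselves instead of their indices
apostrophe_words <- grep("'", words, value = TRUE)
# -> c("don't", "it's")
```

The first call answers "where are the numbers?", while the second answers "which words have an apostrophe?"; the only difference in the call is the `value` argument.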

To search text for a pattern, you need to use the correct syntax. A pattern can be a single token, such as \w, representing a simple search, or a large group of characters, representing a complex search. First up, we look for word characters with \w, which matches a single letter, digit, or underscore.
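A minimal base R sketch of what \w matches, assuming R's default regex engine (which supports \w as an extension). The strings are hypothetical; gregexpr() locates every match and regmatches() extracts the matched text.

```r
# Hypothetical input string for illustration
x <- "R_is fun!"

# \w matches exactly one word character: a letter, a digit, or the underscore.
# The space and the "!" are skipped because they are not word characters.
word_chars <- regmatches(x, gregexpr("\\w", x))[[1]]
# -> c("R", "_", "i", "s", "f", "u", "n")

# grepl() reports whether each element contains at least one word character
has_word_char <- grepl("\\w", c("abc", "!?", "_"))
# -> c(TRUE, FALSE, TRUE)
```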

Next, we can find any single digit with \d. To expand these searches past a single letter or number, we use a quantifier.

Adding + after \w or \d means "one or more in a row," which lets us find a whole word, or a number of any length.
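To see the effect of +, here is a short base R sketch comparing \d, \d+, and \w+ on the same hypothetical sentence. It assumes R's default regex engine; the string and variable names are illustrative only.

```r
# Hypothetical input string for illustration
x <- "Order 66 shipped 3 boxes to unit_7B"

# \d matches one digit at a time, so "66" yields two separate matches
single_digits <- regmatches(x, gregexpr("\\d", x))[[1]]
# -> c("6", "6", "3", "7")

# \d+ matches one or more consecutive digits, i.e. whole numbers
numbers <- regmatches(x, gregexpr("\\d+", x))[[1]]
# -> c("66", "3", "7")

# \w+ matches runs of word characters, i.e. whole words (digits and "_" included)
tokens <- regmatches(x, gregexpr("\\w+", x))[[1]]
# -> c("Order", "66", "shipped", "3", "boxes", "to", "unit_7B")
```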

Next, we can look for whitespace with \s, which matches spaces as well as tabs and line breaks. This lets us find the breaks in long sequences of characters.
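A small base R sketch of \s, using a hypothetical string that mixes a double space, a tab, and a newline. Counting single matches shows that \s covers all three kinds of whitespace, and splitting on \s+ treats each run of whitespace as one break.

```r
# Hypothetical input: two spaces, one tab (\t), and one newline (\n)
x <- "one  two\tthree\nfour"

# \s matches a single whitespace character; count every match
n_whitespace <- lengths(regmatches(x, gregexpr("\\s", x)))
# -> 4L

# \s+ matches a whole run of whitespace, so strsplit() returns clean words
pieces <- strsplit(x, "\\s+")[[1]]
# -> c("one", "two", "three", "four")
```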

Finally, we can negate each of these shorthand classes by capitalizing its letter. For example, \S matches any non-space character; likewise, \D matches non-digits and \W matches non-word characters.
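The capitalized classes can be sketched in base R as follows, assuming R's default regex engine. The phone-number string is hypothetical; \S+ pulls out space-free chunks, while gsub() with \D or \W deletes everything that is not a digit or not a word character.

```r
# Hypothetical input string for illustration
x <- "Call 555-0199, ext. 12!"

# \S+ matches runs of non-whitespace characters (punctuation stays attached)
non_space <- regmatches(x, gregexpr("\\S+", x))[[1]]
# -> c("Call", "555-0199,", "ext.", "12!")

# Replacing every non-digit (\D) with "" keeps only the digits
digits_only <- gsub("\\D", "", x)
# -> "555019912"

# Replacing every non-word character (\W) removes spaces and punctuation
word_only <- gsub("\\W", "", x)
# -> "Call5550199ext12"
```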

In R, writing the expression is only half the battle. We also need to use the right function. Both base R and the stringr package have great functions for searching text.

In base R, two very common functions are grep, which finds the elements of a character vector that match a pattern and, by default, returns their indices,

and gsub, which replaces every match of the regular expression in a string or vector. There are many other functions we can use, but once we master these two, the others work in much the same way.
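A minimal sketch of the two base R functions side by side, using a hypothetical vector of product reviews. grep() identifies which elements match; gsub() rewrites every match, including several matches inside one element. The "<NUM>" placeholder is an arbitrary choice for illustration.

```r
# Hypothetical example vector
reviews <- c("Great phone, 5 stars", "Died after 2 days, fixed in 10", "Love it")

# grep(): indices of the elements that contain a digit (the default output)
idx <- grep("\\d", reviews)
# -> c(1L, 2L)

# grep(value = TRUE): the matching elements themselves
matches <- grep("\\d", reviews, value = TRUE)
# -> c("Great phone, 5 stars", "Died after 2 days, fixed in 10")

# gsub(): replace every run of digits in every element
cleaned <- gsub("\\d+", "<NUM>", reviews)
# -> c("Great phone, <NUM> stars", "Died after <NUM> days, fixed in <NUM>", "Love it")
```

Because gsub() replaces all matches, the second review gets two placeholders; elements with no match, such as "Love it", are returned unchanged.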

There are several great resources out there for practicing regular expressions and learning about the complex patterns that can be created. I have provided a link for one such example here. If you want to learn more about combining expressions, or including certain letters while excluding others, I would suggest exploring this resource.

Let's explore a few examples of using regular expressions.

Comments

The matching characters exercises were very useful. Thanks!

Jaffizy

Would you mind fixing the CC sometime? It is way off. Thanks!

SylvanBat

It is not like the usual regular expressions in Unix. In R we have to add an extra \ for \d. I am still not used to this change, or to how to make a pattern compatible with R.

bhargavapothakamuri