Python Tutorial: Text Munging with regular expressions

---
Regular expressions are a powerful tool for processing text.
We will use them for matching messages against known patterns,
for extracting key phrases, and for transforming sentences grammatically. These are the core pieces we need to create our ELIZA-style bot.
Much of the magic of the ELIZA system relied on giving the _impression_ that the bot
had understood you, even though the underlying logic was extremely simple.
For example, if you asked ELIZA "do you remember when you ate strawberries in the garden?",
she would respond: "How could I forget when I ate strawberries in the garden?".
Part of what makes this example so compelling is the subject. We are asking about _memories_,
which we associate with our conscious minds and our sense of self. The memory itself,
of eating strawberries in the garden, evokes powerful emotions. But if we pick apart how the
response is generated, we see that it's actually quite simple.
To build an ELIZA-like system you need a few key components. The first is a simple
pattern matcher. This consists of a set of rules for matching user messages, like
"do you remember x".
To match patterns we use a technology called *regular expressions*. To use these in Python we `import re`.
Regular expressions are a way to define patterns of characters and then check whether those patterns occur in a string.
In regular expressions, the dot character is special and matches *any* character (except a newline, by default). The asterisk means "match 0 or more occurrences of the preceding pattern", so "dot star" (`.*`) is basically a catch-all: it matches any string of characters.
We can check whether a message matches a pattern by calling `re.search(pattern, message)`. If the pattern occurs in the message, this returns a match object.
If the string doesn't match the pattern, `re.search` returns `None` instead, so we can check whether the string matches using a simple if statement.
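As a minimal sketch of this check, the snippet below searches a message for the "do you remember" rule followed by a catch-all. The variable names and the second, non-matching message are illustrative choices, not taken from the original material.

```python
import re

# ".*" after the fixed text matches whatever the user says next.
pattern = "do you remember .*"
message = "do you remember when you ate strawberries in the garden"

match = re.search(pattern, message)
if match:
    print("string matches!")
else:
    print("string does not match")

# A message that does not contain the pattern yields None.
no_match = re.search(pattern, "what is your favourite colour")
print(no_match is None)
```

Because a match object is truthy and `None` is falsy, `if match:` is enough to tell the two cases apart.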
Adding parentheses in the pattern string defines a `group`. A group is just a substring that we can retrieve after matching the string against the pattern.
We use the match object's `group` method to retrieve the parts of the string that matched. The default group, with index 0, is the entire portion of the string that matched the pattern. The group with index 1 is the group we defined by including the parentheses in the pattern.
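The following sketch shows the difference between group 0 and group 1. The pattern `"if (.*)"` and the example message are assumptions chosen for illustration.

```python
import re

# The parentheses define group 1: everything after "if ".
pattern = "if (.*)"
message = "what would happen if bots took over the world"

match = re.search(pattern, message)

whole = match.group(0)   # the full text matched by the pattern
phrase = match.group(1)  # only the part inside the parentheses

print(whole)   # if bots took over the world
print(phrase)  # bots took over the world
```

Note that group 0 starts at "if", not at the beginning of the message, because `re.search` only reports the part of the string that the pattern actually matched.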
To make responses grammatically coherent, we will want to transform the extracted phrases from
first to second person and vice versa. In English, conjugating verbs is easy, and simply swapping
"I" and "you", "my" and "your" works in most cases.
For example, take the sentence "I walk my dog".
Swapping the pronouns gives "You walk your dog".
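One way to sketch this swap is a single `re.sub` pass over a small lookup table. The function name `swap_pronouns` and the table `PRONOUN_SWAPS` are hypothetical, and the table is deliberately tiny: it does not handle capitalisation at the start of a sentence or object pronouns such as "me".

```python
import re

# Hypothetical lookup table: first person <-> second person.
PRONOUN_SWAPS = {"i": "you", "my": "your", "you": "I", "your": "my"}

# \b keeps us from touching letters inside other words (e.g. the "i" in "in").
SWAP_RE = re.compile(r"\b(" + "|".join(PRONOUN_SWAPS) + r")\b", re.IGNORECASE)

def swap_pronouns(phrase):
    """Swap first- and second-person pronouns in a single pass."""
    return SWAP_RE.sub(lambda m: PRONOUN_SWAPS[m.group(0).lower()], phrase)

print(swap_pronouns("I walk my dog"))
print(swap_pronouns("when you ate strawberries in the garden"))
```

Doing all replacements in one pass matters: replacing "I" with "you" and then "you" with "I" in two separate steps would undo the first swap.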
The final step is to combine these logical pieces together.
We start with a pattern and a message. We extract the key phrase by creating a match object using `pattern.search` on a compiled pattern (or, equivalently, `re.search(pattern, message)`), and then use the `group` method to extract the string captured by the parentheses.
We then choose a response appropriate to this pattern, and swap the pronouns so that the phrase makes sense when the bot says it.
We then insert the extracted phrase into the response, to partially echo back what the user talked about, giving the illusion that the bot has understood the question and remembers this experience.
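Putting the pieces together might look like the sketch below. It assumes a single rule, a compiled pattern, and hypothetical helper names (`swap_pronouns`, `respond`, and the fallback reply); a real bot would hold many rules and responses.

```python
import re

SWAPS = {"i": "you", "my": "your", "you": "I", "your": "my"}
SWAP_RE = re.compile(r"\b(" + "|".join(SWAPS) + r")\b", re.IGNORECASE)

def swap_pronouns(phrase):
    """Swap first- and second-person pronouns in a single pass."""
    return SWAP_RE.sub(lambda m: SWAPS[m.group(0).lower()], phrase)

# One rule: a pattern with a capture group and a response template.
pattern = re.compile("do you remember (.*)")
template = "How could I forget {}?"

def respond(message):
    """Echo the captured phrase back inside the template, if the rule matches."""
    match = pattern.search(message)
    if match is None:
        return "Tell me more."  # hypothetical fallback, not from the source
    phrase = swap_pronouns(match.group(1))
    return template.format(phrase)

print(respond("do you remember when you ate strawberries in the garden"))
```

The bot never interprets the phrase; it only captures it, flips the pronouns, and drops it into a canned template, which is what creates the impression of understanding.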
Now it's your turn to build your own ELIZA-style chatbot.
#Python #PythonTutorial #DataCamp #Chatbots #Python #TextMunging