PROBLEM SET 7: REGULAR, UM, EXPRESSIONS | SOLUTION (CS50 PYTHON)


––– DISCLAIMER –––

The following videos are for educational purposes only. Cheating and similar activities are strongly discouraged. Using another person’s code breaks the academic honesty guidelines. This solution is for those who have finished the problem sets and want to watch for educational purposes, for the learning experience, and to explore alternative ways to approach problems; it is NOT meant for those actively working on the problem sets. All problem sets presented in this video are owned by Harvard University.

–––
Comments

This passes all tests and is much simpler:

import re

def count(text):
    length = 0
    if m_list := re.findall(r"\bum\b", text, re.IGNORECASE):
        length = len(m_list)
    return length

matissjansons

I had a play with this and I got it working. I used the following code:

import re
import sys

def main():
    print(count(input("Text: ")))

def count(input_text):
    regex_to_match = r'\bum\b'
    matches = re.findall(regex_to_match, input_text, re.IGNORECASE)
    return len(matches)

if __name__ == "__main__":
    main()

The one thing to be aware of is how \b decides where a word ends. \b matches at the boundary between a word character (a letter, a digit or an underscore) and any other character, so an "um" wrapped in punctuation, such as "-um-" or "um...", counts as a standalone instance of the word. The underscore, however, is itself a word character, so "__um" and "um__" are not counted. Without a list of allowed non-alphanumeric characters such as , . : etc., I don't think we can do it any other way that allows for that punctuation: the pattern necessarily accepts an "um" surrounded by any non-word characters.

chrism
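
A quick way to check how \b treats underscores and punctuation is to run the pattern over a few probe strings. This is only a sketch: the sample strings are made up for illustration, and it relies on the fact that Python's re module includes the underscore in \w.

```python
import re

# Hypothetical probe strings, chosen only to show where \b sees a boundary.
samples = ["um", "um...", "-um-", "__um", "um__", "yummy"]


def count(text):
    """Count standalone occurrences of "um", ignoring case."""
    return len(re.findall(r"\bum\b", text, re.IGNORECASE))


for sample in samples:
    print(f"{sample!r}: {count(sample)}")
```

The first three strings each give 1; "__um", "um__" and "yummy" give 0, because the character next to "um" is a word character in each case.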

There is still a bug in your code: it fails on "umm". I think "\bum\b" is a good idea, since \b already covers all the requirements.

beibeiliu

This worked for me:

import re


# Ask the user for input and print the count
def main():
    print(count(input("Text: ")))


# Find every standalone "um", ignoring case, and return how many there are
def count(s):
    find = re.findall(r'\bum\b', s, re.IGNORECASE)
    return len(find)


if __name__ == "__main__":
    main()

benymush

re.findall(r"\bum\b", s, re.IGNORECASE)

Reza_Ghamsari

Quick addition: they don't seem to acknowledge apostrophe-separated "um" words, so, generally, don't include an exclusion for apostrophes.

Venormous

Hi Giovanna. I am stuck on the "working 9-5" problem. Could you please help me with how to convert 12-hour to 24-hour format?

aigerimabseit

If the regex r"\b\W*um\W*" is used, wouldn't it match something like "umbrella" too?

stanfpv
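
The question above can be checked directly. A minimal sketch, comparing the quoted pattern with a variant that adds a closing \b; the variable names are illustrative only.

```python
import re

# Pattern quoted in the comment above: nothing anchors the end of "um".
open_ended = re.compile(r"\b\W*um\W*", re.IGNORECASE)
# The same pattern with a closing word boundary added.
closed = re.compile(r"\b\W*um\W*\b", re.IGNORECASE)

text = "umbrella"
print(open_ended.findall(text))  # the leading "um" of "umbrella" is matched
print(closed.findall(text))      # no match: "m" and "b" are both word characters
```

So yes, without a boundary after "um" the pattern does pick up the start of "umbrella".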

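My version of the code:

import re

text = input("Text: ")
# re.IGNORECASE so that "Um" and "UM" are found as well
total = re.findall(r'\bum\b', text, re.IGNORECASE)
# every match is an "um" in some casing, so the list length is the count
count = len(total)
print(count)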

jayz

Hello, I have a question: I can't open the CS50 codespace, and when it does open I can't use the terminal. Can you help me?

radinfarmani

I was stuck for an hour and didn't notice that \b exists, so I decided to remove the unwanted "um"s and count only those that are left. Here is my function:

import re

def count(s):
    """Count the occurrences of "um" that are not inside a word."""
    # Drop any "um" that has a word character directly before or after it
    s = re.sub(r"(?:\wum\w)?(?:\wum)?(?:um\w)?", "", s.lower())
    # Whatever "um" remains stands alone
    return len(re.findall(r"um", s))

hahahaha-bmxs

Hi, I did all this, but when I try to check with Pytest, it does nothing. What's the problem?

javadmoh

As shown in my code below, the regex pattern I used is r'(^um|\Wum\W|\bum\b)':

import re

def main():
    print(count(input("Text: ")))

def count(s):
    return len(re.findall(r'(^um|\Wum\W|\bum\b)', string=s, flags=re.IGNORECASE))

if __name__ == "__main__":
    main()

NomadLovesUs
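
It may be worth comparing that alternation with the plain \bum\b pattern used elsewhere in this thread. The sketch below uses two made-up inputs and a hypothetical helper, count_with; note that the ^um branch has no boundary after it.

```python
import re

ALTERNATION = r"(^um|\Wum\W|\bum\b)"  # pattern from the comment above
BOUNDARY = r"\bum\b"                  # plain word-boundary pattern


def count_with(pattern, text):
    """Count matches of `pattern` in `text`, ignoring case."""
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


# Hypothetical inputs: one ordinary sentence, one that starts with "um..."
for text in ["Hello, um, world", "umbrella weather"]:
    print(text, count_with(ALTERNATION, text), count_with(BOUNDARY, text))
```

Both patterns give 1 for the first sentence. For "umbrella weather" the alternation gives 1, because ^um matches the first two letters, while \bum\b gives 0.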

I find this method to be easier to read

um_count = re.findall(r"\bum\b", s, re.IGNORECASE)

jincao

import re

def main():
    print(count(input("Input: ")))

def count(s):
    # lower-case the text first so that "Um" and "UM" are matched as well
    result = len(re.findall(r"\bum\b", s.lower()))
    return result

if __name__ == "__main__":
    main()

WatchThis_

Seemed easier to do without regular expressions:

def main():
    print(int(count(input("Text: "))))

def count(s):
    words = s.split(' ')
    count = 0
    for word in words:
        if word[0:2] == 'um':
            count += 1
    return count

if __name__ == "__main__":
    main()

Manveer_Dhindsa
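
The regex-free idea above can be tightened so that only whole words count, since comparing word[0:2] also accepts "umbrella" and skips "Um". A minimal sketch, assuming words are separated by whitespace and that stripping string.punctuation from both ends of each word is acceptable:

```python
import string


def count(s):
    """Count words that are exactly "um" once punctuation and case are ignored."""
    total = 0
    for word in s.split():
        # Strip leading/trailing punctuation such as "," or "..." and lower-case.
        if word.strip(string.punctuation).lower() == "um":
            total += 1
    return total


print(count("Um, thanks for the um album."))
```

This prints 2. Unlike \bum\b, it will not find an "um" that is glued to its neighbours without whitespace, as in "Hello,um,world".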

I spend more time refreshing the VS Code page to see that damn "$" sign so I can start writing code than I spend writing and testing the actual code. That's too boring.

zahramanafi

Your solution fails on this:

"Um, thanks for the um album."

culturedgaming

@DorsCodingSchool Hi Giovanna, Harvard changed check50, and you didn't end the raw string with \b. I failed check50 with a frowning face:
:( um.py yields 2 for "Um? Mum? Is this that album where, um, umm, the clumsy alums play drums?"
expected "2", not "3\n"
I fixed it. Line 8 would be um_list = re.findall(r"\b\W*um\W*\b", s, re.IGNORECASE)
After that, check50 was green:
:) um.py and test_um.py exist
:) um.py yields 1 for "um"
:) um.py yields 1 for "Hello, um, world"
:) um.py yields 1 for "This is, um... CS50."
:) um.py yields 1 for "Um... what are regular expressions?"
:) um.py yields 2 for "Um, thanks, um, regular expressions make sense now."
:) um.py yields 2 for "Um? Mum? Is this that album where, um, umm, the clumsy alums play drums?"
:) correct um.py passes all test_um.py checks
:) test_um.py catches um.py matching "um" in words
:) test_um.py catches um.py with regular expression requiring spaces around "um"
:) test_um.py catches um.py without case-insensitive matching of "um"

Thanks for your videos. I am a beginner in coding! 😃

michaelnunezmd
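
The last three check50 lines above suggest what test_um.py is expected to catch. The following is only a sketch of such tests, not the official ones: the test names and inputs are made up, and count is defined inline (using \bum\b) so the example runs on its own, whereas in the problem set it would presumably be imported from um.py.

```python
import re


# Assumption: stands in for `from um import count` in a real test_um.py.
def count(s):
    return len(re.findall(r"\bum\b", s, re.IGNORECASE))


def test_um_inside_words():
    # "um" embedded in longer words must not be counted
    assert count("yummy album drums") == 0


def test_no_spaces_required():
    # punctuation, not just spaces, may surround "um"
    assert count("um?") == 1
    assert count("Hello,um,world") == 1


def test_case_insensitive():
    assert count("Um, UM, um") == 3


def test_check50_sentence():
    text = "Um? Mum? Is this that album where, um, umm, the clumsy alums play drums?"
    assert count(text) == 2
```

Saved as a file whose name starts with test_, these functions would be collected by pytest automatically.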

I think there is a bug in your code: using r"\b\W*um\W" will not accept the bare word "um", because the trailing \W requires one more character after it. Instead we need r"\b\W*um\W*".

nandanmaiya
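
The difference between a trailing \W and a trailing \W* is easy to see on a string that ends right after "um". A small sketch with illustrative variable names:

```python
import re

required = r"\b\W*um\W"   # one non-word character is mandatory after "um"
optional = r"\b\W*um\W*"  # zero or more non-word characters after "um"

for text in ["um", "um."]:
    print(
        repr(text),
        bool(re.search(required, text)),
        bool(re.search(optional, text)),
    )
```

For "um" the first pattern finds nothing and the second one matches; for "um." both match.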