Python interview with a Google engineer: Edit distance string comparison

preview_player
Показать описание


Рекомендации по теме
Комментарии
Автор

"let me think this through"

*goes on stackoverflow for 5 minutes*

bags
Автор

bruh why does my heart rate go up when watching these

bloodphantom
Автор

For anyone learning - there is actually a bug in the final solution of ins_del_compare() (with tolereance = 1). For strings with edit distance > 1, but length difference = 1 (e.g. "abdeX" and "abCdeF" function will throw StopIteration instead of returning false.

GrzesiuG
Автор

Really good interview overall. Easy problem in the beginning but quite challenging under a set time limit by the end.
The ending is awkward though and I love it! "Good luck with all your interviewing!" "Thanks! You.. you as well." sounds like something I would do haha

SimonsTechShow
Автор

13:30 Minor nitpick. The space complexity of that code is O(n) just because range is used rather than xrange. In python2 range generates a list.

alcesmir
Автор

When they introduced the tolerance ... the solution i had was just a improvement of his first solution
Something like :
do you see any edge case i didnt think about ?

def case_insensitive_compare(s1, s2):
err_margin = 3
l_s1, l_s2 = len(s1), len(s2)
index, err_counter = 0, 0
while index < max(l_s1, l_s2) and err_counter <= err_margin:
c1 = s1[index] if index < l_s1 else c1
c2 = s2[index] if index < l_s2 else c2
if c1.upper() != c2.upper():
err_counter += 1
index += 1

return err_counter <= err_margin

tarik
Автор

I'm majoring in Comp Sci. and seeing interview questions like these get me scared cuz I like have zero confidence in my abilities.

porlearktuy
Автор

i'm pretty sure the interviewiee was searching the internet, you can hear clicks and typing during the long silences

govindrai
Автор

#clickbait I wanted to see a Mighty Eel vs. an Intergalactic Avenger. This isn't even a SciFi channel movie.

chrisconley
Автор

A different approach to the second part with the tolerance!

def str_comp_dist2(str1, str2, tolerance):

list1 = max(list(str1.lower()), list(str2.lower()))
list2 = min(list(str1.lower()), list(str2.lower()))
len_missing = len([True for i in list1 if i not in list2]) + len([True for i in list2 if i not in list1])
if len_missing <= tolerance:
return True
else:
return False

karthikmysoreprakash
Автор

i did this for the second part of the problem:
def compare(str1, str2):
err=0
order = sorted([str1, str2], key=lambda x: len(x), reverse=True)
if len(order[0]) - len(order[1]) > 1:
return False
for i in order[0]:
if i.upper() not in order[1].upper():
err +=1
if err > 1:
return False
return True

garciajero
Автор

I dislike how the interviewer went RIGHT into the question. No greetings, no introduction... The guy said “it’s my first one of these” and the dude literally didn’t answer lol

Philgob
Автор

This is painful for me because I approach so many problems with future-proofing in mind. I started mapping a solve out in my head and I immediately assumed the difference tolerance should be optional and settable to any integer. That would make this problem particularly difficult for me because I'd be distracted the whole trying to make such a narrow solution.

davemittner
Автор

The guy knows some python (2 though) for sure, but he's never taken an algorithm course, Levenshtein distance and dynamic programming are so standard he would have at least remembered when he heard "edit distance". It's pretty clear he's reading about it at some point. From my point of view that's a long way from the technical level a Google engineer should have!

artemisheller
Автор

Even in the best case, final code around 12.25 will have time complexity of O(n) since he is checking length of both the strings. And len() function has linear complexity.

dalchemistt
Автор

Tried to follow along, this was my java solution. Tips appreciated!


public static boolean caseInsensitiveCompare(String a, String b){

int editDistance = 0;
int editLimit = 1;
int smallLength = a.length() > b.length() ? b.length() : a.length();
int diff = Math.abs(a.length() - b.length());

if(diff > editLimit){
return false;
} else {
editDistance += diff;
}

for(int i = 0; i < smallLength; i++){
if(editDistance > editLimit){return false;}

!=
editDistance++;
}
}

return true;
}


I could also stick the tolerance as a method parameter and replace editLimit with the tolerance value

Chuchfala
Автор

Well since he did the first function really well, I think a simple counter would do just fine. When they have the lengths different by 1 for instance, you can book a key, value pair for each character and their counts. First string will create the hash and second will add on to it. In the case 'ABC' and 'ABCD' for example the hash would look like: <A, 2>, <B, 2>, <C, 2>, <D, 1>. If there are more than one '1' in the value list, it will return false. When user is asked for the edit distance, you do the same with one difference and that is you will not check the false statement for 'more than one 1 in the value list' but more than <user_input> 1 in the value list'. I haven't tried this just typing while watching but I think this works and the code will look a lot cleaner and easier.

sinanyaman
Автор

There is an error with the code at 30:00. Let's take the input "AA" "BAB", in that case the length check passes, but the checks will go like
"(A)A" "(B)AB" not a match, skip in longer to get "(A)A" "B(A)B" which matches
"A(A)" "BA(B)" not a match, skipping the longer string raises a StopIteration (and the program crashes)

alcesmir
Автор

def compare(s1, s2, tolerance):

its1 = iter(s1)
its2 = iter(s2)
miss = 0

for _ in range(max(len(s1), len(s2))):
if next(its1, 'none').upper() != next(its2, 'none').upper():
miss = miss + 1

if miss > tolerance:
return False
return True

print(compare('AB', 'abcde', 2))

ijazahmad
Автор

For 8:31, if you're trying this stuff, please, please, use Python 3. There's no point in using Python 2 nowadays. It's soon ending support (January of 2020 if I remember correctly) and Python 3 provides quite a few nice abstractions to deal with the pain of strings and encoding. Also not to hate, but that solution is atrocious, a much better solution, which nice Python 3 abstractions would have been:


def case_insensitive_compare(s1, s2):
if len(s1) != len(s2):
return False

if s1 is s2:
return True

return all(a.casefold() == b.casefold() for a, b in zip(s1, s2))

---

This makes use of a few tricks. First it checks if s1 and s2 are the same object in memory with is. Python (or CPython anyway) is quite smart and will try to optimise by storing a string in memory, and if another variable with the same string is created it will simply point to the same memory address (you can test this by creating two variables with the same string characters, and doing id(string) on each)

Second, all() with the generator comprehensions and zip(). zip() returns a tuple of the iterators you pass it, in this case s1 and s2. It stops when one of the iterators is consumed. all() takes an iterator of True and False values, and if all are True it returns True (or False at the first False value). By using a generator comprehension, we can make it short circuit (so end early) at the first False value.

Third, casefold(). This is Python's 3 attempt, with Unicode, to do case insensitive comparisons. Plenty of languages have special characters, with special case rules. casefold() provides the base lower string to use for comparison.

Sighery