Google Machine Learning Engineer Python Interview

Preparing for an interview as a Machine Learning Engineer can be challenging. Watch how Aly, a Senior Research Data Scientist at Facebook, breezed through a Python coding interview question from Google that requires a dictionary representing a histogram of the dataset.

Full video on Interview Query question page:

Quick Links:
00:00 Question
00:25 Clarifying Questions
01:35 Solution Walkthrough

More from Jay:

#DataScientists #DataScience #InterviewPrep #MockInterview

Comments

Nicely done!

Just to note that none of the tests cover cases where a bin would have zero values.

At a quick glance, it could be remedied by initializing a `count` variable to zero (instead of `histogram[bucket_str] = 0`) before the last for-loop.
Once you break out of the loop, you can check if the count is non-zero before adding it to the histogram.

Snorrason
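
The zero-count idea in the comment above might look roughly like the sketch below. The function name `count_bins` and the `edges` argument are hypothetical (they do not come from the video's solution); the point is only that a local `count` is accumulated per bin and a key is written only when that count is non-zero.

```python
def count_bins(dataset, edges):
    """Count values per inclusive integer bin, omitting empty bins.

    `edges` is a hypothetical list of inclusive (lo, hi) pairs.
    """
    histogram = {}
    for lo, hi in edges:
        count = 0  # local counter instead of histogram[key] = 0
        for value in dataset:
            if lo <= value <= hi:
                count += 1
        if count:  # record only bins that received at least one value
            key = str(lo) if lo == hi else f"{lo}-{hi}"
            histogram[key] = count
    return histogram


result = count_bins([1, 2, 2, 9], [(1, 3), (4, 6), (7, 9)])
print(result)  # the empty 4-6 bin does not appear
```

Whether empty bins should be dropped or kept with a count of 0 is not settled by the question, so this is one possible reading.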

The number of bins should be (max - min)/x

yemanbrhane
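
As a worked check of the arithmetic in the comment above: if x is taken to be the number of bins, then (max - min)/x is the width of each bin, and for integer data the result depends on whether the span is counted inclusively. The helper name `bin_width` is hypothetical, and rounding up with `ceil` is an assumption, since the question leaves inexact division open.

```python
import math


def bin_width(dataset, x, inclusive=True):
    """Width of each of x equal bins over the dataset's integer range."""
    span = max(dataset) - min(dataset)
    if inclusive:
        span += 1  # count both endpoints, e.g. 1..9 covers 9 integers
    return math.ceil(span / x)


data = [1, 2, 4, 4, 5, 6, 6, 8, 9]
print(bin_width(data, 3))                     # inclusive span 9, 3 bins
print(bin_width(data, 3, inclusive=False))    # span 8, 3 bins
print(bin_width([7, 8, 9], 2))                # inclusive span 3, 2 bins
print(bin_width([7, 8, 9], 2, inclusive=False))
```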

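The solution is wrong, and it's quite easy to show how:
dataset = [7, 8, 9]
x = 2
returned: {7-9 : 3}

Here's a pretty much 'coverall' solution using no libraries:

def automatic_histogram(dataset, x):
    histogram = {}

    if len(dataset) == 0:
        return histogram

    # Fewer distinct values than bins: x bins cannot be formed.
    # (The posted comparison read `> x`, which rejects most inputs; `<` is assumed.)
    if len(set(dataset)) < x:
        return histogram

    min_number = dataset[0]
    max_number = dataset[0]

    for i in dataset:
        if i < min_number:
            min_number = i
        if i > max_number:
            max_number = i

    # Degenerate case: every value is identical (returns None).
    if min_number == max_number:
        return

    width = (max_number - min_number) // x

    # Lower edge of each of the x bins.
    edges = [min_number, *[min_number + (i + 1) * (width + 1) for i in range(x - 1)]]
    # The 'lo-hi' label format is assumed; that part of the posted line is missing.
    bins = [f"{edges[i]}-{edges[i+1]-1}" if edges[i] != edges[i+1] - 1 else str(edges[i]) for i in range(x - 1)]

    if edges[-1] == max_number:
        bins.append(str(max_number))
    else:
        # Assumed: the last bin runs from the final edge to the maximum.
        bins.append(f"{edges[-1]}-{max_number}")

    for b in bins:
        histogram[b] = 0

    for i in range(x - 1):
        for d in dataset:
            if edges[i] <= d < edges[i + 1]:
                histogram[bins[i]] += 1

    # Assumed: the last bin counts everything from the final edge upward.
    histogram[bins[-1]] = sum(1 for d in dataset if d >= edges[-1])

    return histogram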

tokpot

I started by not using sort, to bring the complexity down from O(n log n), but then added a for loop inside a while loop, which made it O(n²).

Btw very nice solution, wasn't able to solve myself.

piyushagarwal
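
On the complexity point above: one way to avoid both the sort and the nested loop is to compute each value's bin index arithmetically, which needs only linear passes over the data. This is a sketch under assumptions, not the video's solution: the name `linear_histogram` is hypothetical, bins are assumed to be equal-width integer ranges with the last one clipped at the maximum, and empty bins are omitted.

```python
import math


def linear_histogram(dataset, x):
    """Build a histogram with up to x bins without sorting the data."""
    if not dataset or x < 1:
        return {}
    lo, hi = min(dataset), max(dataset)    # two O(n) scans
    width = math.ceil((hi - lo + 1) / x)   # inclusive span, rounded up
    counts = {}
    for value in dataset:                  # one O(n) pass
        idx = (value - lo) // width        # bin index, always in 0..x-1
        counts[idx] = counts.get(idx, 0) + 1
    histogram = {}
    for idx in sorted(counts):             # sorts at most x bin indices
        start = lo + idx * width
        end = min(start + width - 1, hi)   # clip the final bin at the max
        key = str(start) if start == end else f"{start}-{end}"
        histogram[key] = counts[idx]
    return histogram


print(linear_histogram([1, 2, 4, 4, 5, 6, 6, 8, 9], 3))
print(linear_histogram([7, 8, 9], 2))
```

The only sort is over bin indices, so the cost in the dataset size n stays linear.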

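Instead of using loops to calculate min and max, we could simply use min(dataset) and max(dataset), and use Counter to count the elements in the dataset.
Here is my solution:

import math
from collections import Counter

def histogram(data, x):
    res = {}
    min_value = min(data)
    max_value = max(data)

    # `bins` is the width of each bin. The posted line divided max_value alone
    # by x, which only works when the minimum is 1; the inclusive span is used here.
    bins = math.ceil((max_value - min_value + 1) / x)

    counts = Counter(data)
    # Degenerate case: every value is identical (returns None).
    if max_value == min_value:
        return
    i = min_value
    while i <= max_value:
        if (i + bins - 1) <= max_value:
            rng = str(i) + '-' + str(i + bins - 1)
        elif i == max_value:
            rng = str(i)
        else:
            rng = str(i) + "-" + str(max_value)
        # Counter returns 0 for values that never occur.
        for j in range(i, i + bins):
            if rng not in res:
                res[rng] = counts[j]
            else:
                res[rng] += counts[j]
        i += bins

    return res

print(histogram(data, x))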

nedaebrahimi

My solution (using a map for easy value counting, without importing Counter). I believe it is fairly robust, though the question has some ambiguity, in particular around how to handle inexact division (though we can infer from the example that we most likely want to drop the extra range from the final bucket). Additionally, the following solution will allow for the 'wrong' number of buckets in cases where some buckets have a count of 0. That behavior is unclear from the directions.

from math import ceil
from typing import List

def automatic_histogram(dataset: List[int],
                        x: int) -> dict:
    """
    @param dataset(List[int]) the numbers to populate the histogram
    @param x(int): the number of bins for the histogram

    @return dict: dictionary containing keys that are string ranges spread uniformly
    over the dataset (except possibly a smaller final range in the case of odd #
    of input values (as a set))
    """

    # dataset = sorted(dataset) # make it quick to extract min, max, and iterate
    #                           # using O(nlogn) complexity once
    if x < 1:
        raise ValueError("Must have at least 1 bin")
    if not dataset:
        return {}
    # O(2n)
    min_val = min(dataset)
    max_val = max(dataset)

    val_range = max_val + 1 - min_val  # e.g. (5+1-1) = 5

    bin_size = ceil(val_range / x)
    print(bin_size)  # debug output
    bin_range = x * bin_size
    extra_vals = bin_range - val_range  # e.g. 5-3 = 2

    histogram = {}
    value_map = {}  # maps integers to their corresponding key in the histogram
    upper_val = None
    lower_val = min_val
    while upper_val != max_val:
        upper_val = lower_val + bin_size - 1
        if upper_val > max_val:  # bin range may be larger than value range
            upper_val = max_val
        if lower_val == upper_val:
            key = f"{lower_val}"
            value_map[lower_val] = key
            histogram[key] = 0  # single int value range
        else:
            key = f"{lower_val}-{upper_val}"
            for val in range(lower_val, upper_val + 1, 1):
                value_map[val] = key
            histogram[key] = 0
        lower_val += bin_size

    for val in dataset:
        key = value_map[val]
        histogram[key] += 1  # add count for this val to corresponding bin

    # removal of zero values from histogram
    final_histogram = {}
    for key, val in histogram.items():
        if val == 0:
            continue
        final_histogram[key] = val

    return final_histogram

DanielMichaelFrees

I liked both the question and the solution! But I don't think it would pass a test case with a dataset containing only negative values, and the question doesn't state there can't be negatives, does it?

jp_magal
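
One way to probe the negative-value concern raised above is to label a few all-negative values directly. The sketch below uses a hypothetical helper `bin_label` and an assumed bin width of 3 (neither is from the video). The bin index is computed from `value - lo`, which is never negative, so the arithmetic is unaffected by the sign of the data; only the labels look unusual, since a range from -9 to -7 renders as '-9--7'.

```python
def bin_label(value, lo, hi, width):
    """Return the range label of the bin that `value` falls into."""
    idx = (value - lo) // width        # offset from the minimum, so idx >= 0
    start = lo + idx * width
    end = min(start + width - 1, hi)   # clip the final bin at the maximum
    return str(start) if start == end else f"{start}-{end}"


data = [-9, -8, -7, -4, -1]
lo, hi = min(data), max(data)
width = 3  # assumed bin width, for illustration only
labels = [bin_label(v, lo, hi, width) for v in data]
print(labels)
```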

The solution provided is not fully correct. I understand the stress of a timed interview. Here's my solution, written with ample time. Thank you to both the interviewer and the interviewee for the mock interview. It's quite helpful to me.

from collections import defaultdict
import math

def automatic_histogram(dataset, x):
    # Single pass: track min, max, and the frequency of each value.
    min_num = dataset[0]
    max_num = dataset[0]
    freq_count = defaultdict(int)
    for d in dataset:
        if d > max_num:
            max_num = d
        if d < min_num:
            min_num = d
        freq_count[d] += 1

    # max(1, ...) guards against division by zero when all values are equal.
    bin_size = max(1, math.ceil((max_num - min_num) / x))

    histogram = defaultdict(int)
    for n, f in freq_count.items():
        bin_idx = (n - min_num) // bin_size  # renamed from `bin` to avoid shadowing the builtin
        if n == max_num and (max_num - min_num) % bin_size == 0:
            # The maximum falls into a bin of its own.
            bin_str = str(max_num)
        else:
            bin_str = str(min_num + bin_idx*bin_size) + '-' + str(min_num + bin_idx*bin_size + bin_size-1)
        histogram[bin_str] += f
    return dict(histogram)

### Tests:

x = 4
dataset = [-4, -2, 0, 1, 2, 2, 5]
res = automatic_histogram(dataset, x)
print(res)
# Answer: {'-4--2': 2, '-1-1': 2, '2-4': 2, '5': 1}

x = 5
dataset = [1, 2, 3, 4, 5, 6, 7, 8, 9]
res = automatic_histogram(dataset, x)
print(res)
# Answer: {'1-2': 2, '3-4': 2, '5-6': 2, '7-8': 2, '9': 1}

x = 3
dataset = [1, 2, 4, 4, 5, 6, 6, 8, 9]
res = automatic_histogram(dataset, x)
print(res)
# Answer: {'1-3': 2, '4-6': 5, '7-9': 2}

teslarocks

Sir, where can we find questions of this type?
I can hardly find them on the internet.
And nowadays many companies have started asking questions like this in their first coding round.

amanmiglani

Salary comparison between Machine Learning Engineer and Software Engineer?

saurabhshrivastava
