Amazon Data Scientist Mock Interview - Fraud Model

====== ✅ Details ======

🤔 Try these machine learning questions asked in Amazon's data science interviews:

"Q1 - What is the variance and bias trade-off?"

"Q2 - What's the difference between boosting and bagging?"

"Q3 - How would you detect seller fraud on Amazon?"
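
For reference on Q1 and Q2, here are two standard identities. They are not taken from the video. The first is the usual decomposition of expected squared error. It assumes y = f(x) + ε with zero-mean noise of variance σ_ε², with expectations taken over training sets and noise. The second gives the variance of an average of B identically distributed predictors, each with variance σ_f² and pairwise correlation ρ. It is the usual argument for why bagging mainly reduces variance.

```latex
% Bias-variance decomposition at a point x
\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma_\varepsilon^2}_{\text{irreducible noise}}

% Variance of a bagged average of B predictors
\operatorname{Var}\Big(\frac{1}{B}\sum_{b=1}^{B} \hat f_b(x)\Big)
  = \rho\,\sigma_f^2 + \frac{1-\rho}{B}\,\sigma_f^2
```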

This is a mock interview session covering machine learning questions asked in Amazon's data science interviews. The interviewer was a data scientist at Google and PayPal. The interviewee is a candidate preparing for data science interviews at FAANG companies.

====== ⏱️ Timestamps ======

00:00 Intro
00:55 Variance & Bias Trade-Off
03:47 Boosting vs Bagging
06:33 Seller Fraud Modeling
26:24 Assessment

====== 📚 Other Useful Contents ======

1. Principles and Frameworks of Product Metrics | YouTube Case Study

2. How to Crack the Data Scientist Case Interview

3. How to Crack the Amazon Data Scientist Interview

Comments

Great video, Dan, it was eye-opening! Thank you so much from NYC! Just one note: boosting and bagging are not limited to tree-based models and can be used with any ML method. However, they are much more popular with tree-based methods because of their fast training time and relatively straightforward application.

hsoley
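
The following minimal NumPy sketch illustrates the point that bagging is not tied to trees. It assumes a toy two-class problem. The generic `bagging` helper is a hypothetical name, not a library API. It trains any base learner on bootstrap resamples and takes a majority vote. The base learner here is a simple nearest-centroid classifier rather than a tree.

```python
import numpy as np

def fit_nearest_centroid(X, y):
    """Base learner (deliberately not a tree): predict the class whose mean is closer."""
    c0, c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)

    def predict(Xq):
        d0 = np.linalg.norm(Xq - c0, axis=1)
        d1 = np.linalg.norm(Xq - c1, axis=1)
        return (d1 < d0).astype(int)

    return predict

def bagging(fit_fn, X, y, n_estimators=25, seed=0):
    """Hypothetical helper: fit `fit_fn` on bootstrap resamples, combine by majority vote."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X), size=len(X))  # sample rows with replacement
        models.append(fit_fn(X[idx], y[idx]))

    def predict(Xq):
        votes = np.mean([m(Xq) for m in models], axis=0)  # fraction of models voting class 1
        return (votes > 0.5).astype(int)  # with an odd n_estimators there are no ties

    return predict

# Toy usage: two Gaussian blobs (illustrative data only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
predict = bagging(fit_nearest_centroid, X, y)
print("training accuracy:", np.mean(predict(X) == y))
```

Any function that takes `(X, y)` and returns a predictor can be passed as `fit_fn`, which is what makes the procedure model-agnostic.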

PCA is a feature extraction technique. Feature selection techniques choose from the existing list of features, whereas extraction techniques create new features that capture the majority of the variance. I feel that whatever the interviewee chose for feature selection was good.

gpprudhvi
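
The following minimal NumPy sketch shows the selection-versus-extraction distinction. `select_top_k_by_variance` keeps a subset of the original columns. Ranking by variance is only an assumed example criterion. `pca_extract` builds new features from an SVD of the centered data. Both function names are hypothetical.

```python
import numpy as np

def select_top_k_by_variance(X, k):
    """Feature selection: keep k ORIGINAL columns (variance is just an example criterion)."""
    idx = np.argsort(X.var(axis=0))[::-1][:k]
    return X[:, idx], idx

def pca_extract(X, k):
    """Feature extraction: k NEW features, each a linear combination of all columns."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_ratio = S**2 / np.sum(S**2)  # share of total variance per component
    return Xc @ Vt[:k].T, Vt[:k], explained_ratio[:k]

# Toy data with very different column scales (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) * np.array([5.0, 1.0, 3.0, 0.5, 2.0])
X_sel, kept = select_top_k_by_variance(X, 2)
Z, components, ratio = pca_extract(X, 2)
print("selected original columns:", kept)
print("variance captured by 2 components:", ratio.sum())
```

The selected matrix is literally a slice of `X`. Each PCA feature, by contrast, mixes all five columns through the rows of `components`.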

In bagging, we don't call the models "weak learners". We use the term weak learners only in boosting, and specifically in AdaBoost, because AdaBoost uses only stumps for prediction rather than full trees; that is why we call AdaBoost's models weak learners.

shilashm
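
For reference on the AdaBoost point, here is a minimal NumPy sketch of discrete AdaBoost with depth-1 decision stumps. It assumes labels in {-1, +1}. The function names are hypothetical, and the exhaustive threshold search favors clarity over speed.

```python
import numpy as np

def stump_predict(X, feature, threshold, polarity):
    """Depth-1 tree: +1 on one side of the threshold, -1 on the other."""
    return np.where(polarity * (X[:, feature] - threshold) >= 0, 1, -1)

def fit_stump(X, y, w):
    """Exhaustive search for the stump with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)  # (weighted error, feature, threshold, polarity)
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            for polarity in (1, -1):
                pred = stump_predict(X, feature, threshold, polarity)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best

def adaboost(X, y, n_rounds=50):
    """Discrete AdaBoost; y must be in {-1, +1}."""
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(n_rounds):
        err, feature, threshold, polarity = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * np.log((1 - err) / err)  # vote weight of this stump
        pred = stump_predict(X, feature, threshold, polarity)
        w = w * np.exp(-alpha * y * pred)  # up-weight misclassified points
        w /= w.sum()
        ensemble.append((alpha, feature, threshold, polarity))
    return ensemble

def adaboost_predict(ensemble, X):
    score = sum(a * stump_predict(X, f, t, p) for a, f, t, p in ensemble)
    return np.where(score >= 0, 1, -1)

# Toy usage: a diagonal boundary that no single stump can represent.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (150, 2))
y = np.where(X[:, 0] + X[:, 1] >= 0, 1, -1)
ensemble = adaboost(X, y)
print("training accuracy:", np.mean(adaboost_predict(ensemble, X) == y))
```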

Concerning the '# of positive reviews' feature: I have to assume that a subset of fraudulent sellers uses bots or review farms to boost the number/ratio of positive reviews. If positive reviews are locally important for non-fraudulent true positives, I imagine this could lead to a recall problem in our model. Thoughts?

rr
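
One way to check this concern is to compare recall inside and outside a slice. This assumes labeled data and a flag for sellers with unusually many positive reviews. The following minimal sketch uses toy arrays. The names and data are hypothetical.

```python
import numpy as np

def recall(y_true, y_pred):
    positives = y_true == 1
    return float(np.sum(y_pred[positives] == 1) / max(int(positives.sum()), 1))

def sliced_recall(y_true, y_pred, in_slice):
    """Recall inside and outside a boolean slice of the data."""
    return (recall(y_true[in_slice], y_pred[in_slice]),
            recall(y_true[~in_slice], y_pred[~in_slice]))

# Toy example (hypothetical): the model misses the fraudsters with inflated reviews.
y_true = np.array([1, 1, 1, 1, 0, 0])
y_pred = np.array([0, 0, 1, 1, 0, 0])
many_positive_reviews = np.array([True, True, False, False, False, False])
print(sliced_recall(y_true, y_pred, many_positive_reviews))  # (0.0, 1.0)
```

A large gap between the two numbers would be the recall problem described above, localized to the review-inflated slice.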

In classification, we usually have a precision-recall trade-off, right?

shilashm
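
For a scored classifier, the trade-off comes from the decision threshold. Lowering it can only keep or raise recall, and it usually lowers precision. The following minimal NumPy sketch uses toy scores and labels (hypothetical data). It treats precision as 1.0 when nothing is predicted positive. That is an assumed convention, and libraries handle this case differently.

```python
import numpy as np

def precision_recall_at(scores, y_true, threshold):
    """Precision and recall when predicting positive for scores >= threshold."""
    pred = scores >= threshold
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp > 0 else 1.0  # assumed convention for no positives
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return float(precision), float(recall)

# Toy scores and labels (illustrative only).
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1])
y_true = np.array([1, 1, 0, 1, 0, 1, 0, 0])
for t in (0.75, 0.5, 0.25):
    p, r = precision_recall_at(scores, y_true, t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
```

With these toy values, lowering the threshold from 0.75 to 0.25 moves recall from 0.50 to 1.00 while precision drops from 1.00 to 0.67.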

Great mock interview and I believe it is pretty representative! Thanks for providing this!

danielxing

Is this a typical interview for an L4 or L5 role?

aaronrasquinha

I would have asked about the provenance of the data, i.e., on what grounds the sellers and transactions were classified as fraud. If they were simply reported as fraud by other users, a fair share of those reports could come from bad-faith competitors. In that case, I would think of alternative ways to gather data and propose a more lenient decision boundary for fraud.

The interviewee did not have time for a deep dive into the mechanisms for determining image/title misrepresentation. A subset of misrepresentation (image/title mismatch) could be captured by computer vision (CV) models.

qingyangzhang
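
The following minimal sketch shows one way to act on label provenance. It assumes each fraud label carries a source tag and down-weights labels from less reliable sources in a weighted logistic regression. The source names and weights in `SOURCE_WEIGHT` are hypothetical. The model is fit with plain gradient descent in NumPy for illustration.

```python
import numpy as np

# Hypothetical reliability weight per label source (illustrative values only).
SOURCE_WEIGHT = {"chargeback_confirmed": 1.0, "manual_review": 0.9, "user_report": 0.3}

def label_weights(sources):
    return np.array([SOURCE_WEIGHT[s] for s in sources], dtype=float)

def fit_weighted_logreg(X, y, w, lr=0.5, n_iter=5000):
    """Logistic regression by gradient descent on the sample-weighted mean log loss."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append an intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        beta -= lr * Xb.T @ (w * (p - y)) / w.sum()  # weighted gradient step
    return beta

def predict_proba(beta, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ beta))

# Toy data: confirmed labels follow x > 0, while a block of user reports
# flags sellers around x = -1 (standing in for bad-faith reports).
rng = np.random.default_rng(0)
x_confirmed = rng.normal(0, 1, 200)
x_reported = rng.normal(-1, 0.5, 100)
X = np.concatenate([x_confirmed, x_reported])[:, None]
y = np.concatenate([(x_confirmed > 0).astype(float), np.ones(100)])
sources = ["chargeback_confirmed"] * 200 + ["user_report"] * 100

beta_weighted = fit_weighted_logreg(X, y, label_weights(sources))
beta_flat = fit_weighted_logreg(X, y, np.ones(len(y)))
x_probe = np.array([[-1.0]])
print("P(fraud | x=-1), weighted:  ", predict_proba(beta_weighted, x_probe)[0])
print("P(fraud | x=-1), unweighted:", predict_proba(beta_flat, x_probe)[0])
```

Down-weighting the user reports lowers the predicted fraud probability in the region where those reports cluster. This is one concrete form of a more lenient boundary.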

Hi sir, I am Zakiyah Fathima M. I am 12 years old. I used to watch your videos and Sundas ma'am's channel. My dream is to become a data scientist. I know the programming language Python.

zakiyahfathimam.

Higher variance means more flexibility? In general, can't you look at variance the same way you look at overfitting? That is, a model with very high variance will capture outliers and tend to overfit data that doesn't accurately represent the underlying phenomena that produced it. In this case, wouldn't it make sense to say it does NOT correspond to more flexibility, since higher variance means it is suited ONLY to the training data? Just curious where my logic diverges from the interviewer's. Thank you for posting this; it has been very informative!

Drewbie_T
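
A small simulation can make both views concrete. It assumes a sine target with Gaussian noise and polynomial fits of increasing degree. All settings are illustrative choices, not from the video. Each degree is refit on many fresh training sets. The more flexible, higher-degree fit reaches a lower training error. Its predictions also change more from one training set to the next, which is what "variance" measures here.

```python
import numpy as np

def true_fn(x):
    return np.sin(2 * np.pi * x)

def simulate(degree, n_datasets=200, n_points=30, noise=0.3, seed=0):
    """Refit a polynomial of `degree` on many fresh training sets.

    Returns (mean training MSE, bias^2, variance), averaged over a fixed test grid.
    """
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.05, 0.95, 50)
    preds = np.empty((n_datasets, x_test.size))
    train_mse = np.empty(n_datasets)
    for i in range(n_datasets):
        x = rng.uniform(0, 1, n_points)
        y = true_fn(x) + rng.normal(0, noise, n_points)
        model = np.polynomial.Polynomial.fit(x, y, degree)
        train_mse[i] = np.mean((model(x) - y) ** 2)
        preds[i] = model(x_test)
    bias_sq = np.mean((preds.mean(axis=0) - true_fn(x_test)) ** 2)
    variance = np.mean(preds.var(axis=0))
    return train_mse.mean(), bias_sq, variance

for d in (1, 3, 9):
    tr, b, v = simulate(d)
    print(f"degree={d}: train MSE={tr:.3f}  bias^2={b:.3f}  variance={v:.3f}")
```

In this framing, "flexible" means the model can bend to whatever training set it sees. Overfitting is the consequence when that bending follows noise.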

Is it just me, or would you rather do clustering to find labels, then classify...?

xEl_ence
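
The following minimal NumPy sketch shows the cluster-first idea, assuming unlabeled seller feature vectors. Plain k-means (Lloyd's algorithm) assigns each point to a cluster, and those cluster IDs could then serve as pseudo-labels for a classifier. Clusters are not fraud labels by themselves, so they would still need to be checked against known cases. The function name and data are illustrative.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: alternate nearest-center assignment and mean update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old center if a cluster ends up empty.
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Toy usage: two unlabeled groups of seller feature vectors (illustrative data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-5, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
centers, pseudo_labels = kmeans(X, k=2)
print(np.bincount(pseudo_labels))
```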

I feel like the dude got lost in the sauce with the seller-based vs. listing-based type shit.

yoyo-uepf
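
The following minimal sketch makes the seller-based versus listing-based distinction concrete. It assumes hypothetical transaction records with `seller_id`, `listing_id`, and `refunded` fields. It computes the same refund rate per listing and per seller, plus a seller-level roll-up of the worst listing. All field names and data are illustrative.

```python
from collections import defaultdict

def refund_rates(transactions, key):
    """Refund rate grouped by `key` ("listing_id" or "seller_id")."""
    totals, refunds = defaultdict(int), defaultdict(int)
    for t in transactions:
        totals[t[key]] += 1
        refunds[t[key]] += t["refunded"]
    return {k: refunds[k] / totals[k] for k in totals}

def worst_listing_rate_per_seller(transactions, listing_rates):
    """Roll listing-level rates up to the seller by taking the maximum."""
    worst = {}
    for t in transactions:
        s = t["seller_id"]
        worst[s] = max(worst.get(s, 0.0), listing_rates[t["listing_id"]])
    return worst

# Hypothetical toy records: seller s1 has one bad listing (l1) and one clean one (l2).
transactions = [
    {"seller_id": "s1", "listing_id": "l1", "refunded": 1},
    {"seller_id": "s1", "listing_id": "l1", "refunded": 1},
    {"seller_id": "s1", "listing_id": "l2", "refunded": 0},
    {"seller_id": "s1", "listing_id": "l2", "refunded": 0},
    {"seller_id": "s2", "listing_id": "l3", "refunded": 0},
]
listing_rates = refund_rates(transactions, "listing_id")
print(listing_rates)                                               # {'l1': 1.0, 'l2': 0.0, 'l3': 0.0}
print(refund_rates(transactions, "seller_id"))                     # {'s1': 0.5, 's2': 0.0}
print(worst_listing_rate_per_seller(transactions, listing_rates))  # {'s1': 1.0, 's2': 0.0}
```

The seller-level average dilutes the one bad listing to 0.5. The listing-level view, or a max roll-up, keeps that signal sharp. This is one reason to compute features at both granularities.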

Where does a hyperparameter come into the decision boundary? What kind of intangible things are they cooking up on their own? God, please save us.

MrMandarpriya
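
One concrete place where a tunable parameter enters the decision boundary is the classification threshold. Consider a 1-D logistic model with assumed coefficients w and b. The boundary is the x where the predicted probability equals the chosen threshold t, i.e. x = (log(t / (1 - t)) - b) / w. Regularization strength and similar hyperparameters act more indirectly, by changing the fitted w and b. The following is a minimal sketch; `boundary_x` is a hypothetical helper name.

```python
import math

def boundary_x(w, b, threshold):
    """x at which a 1-D logistic model p(x) = 1 / (1 + exp(-(w*x + b))) equals `threshold`.

    Assumes w > 0, so points with x above the returned value are classified as positive.
    """
    return (math.log(threshold / (1 - threshold)) - b) / w

# Illustrative, assumed coefficients: w = 2.0, b = -1.0.
for t in (0.3, 0.5, 0.9):
    print(f"threshold={t}: boundary at x={boundary_x(2.0, -1.0, t):.3f}")
```

With these assumed coefficients, raising the threshold from 0.3 to 0.9 moves the boundary from about x = 0.076 to about x = 1.599. The fitted model is the same; only the decision rule changes.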