Probability Interview Questions In Data Analysts’ Real Life
Connecting probability interview questions to data analysts’ everyday tasks
If you apply for a data analyst or data scientist role, you’ll often come across probability questions in your interviews. But here’s the thing: many candidates are convinced that these questions have little to do with the real job. You’ll often hear complaints like, “Why should we bother calculating the chance of rolling a 6 five times with a die?” In this article, I’m going to share some real-life examples to explain why understanding probability matters more than you might think. To do that, let’s take a few typical interview tasks and look at how they apply in the real world.
Q1. You flip a coin 10 times in a row. What is the probability that all 10 flips come up heads?
Imagine you’re a data analyst at a food delivery service. After every order, customers can rate the quality of the food. The team’s primary goal is to deliver top-tier service, so if a restaurant receives bad reviews, you need to look into it. Here’s the big question: how many bad reviews should trigger a restaurant check-up?
Sometimes a restaurant gets poor feedback that isn’t its fault. A restaurant that has handled 1,000 orders will almost certainly collect some bad reviews purely by chance.
Think of it like this: assume that about 5% of orders end up with a negative review purely by chance, independently of one another. Then the number of bad reviews per restaurant follows a binomial distribution Bin(n, p), where n is the number of orders and p is the probability of a bad review (5% in our case).
So, if a restaurant has 100 orders, there’s about a 23.4% chance it will get at least 7 bad reviews and a much smaller 2.8% chance it will get at least 10. You can check this using the calculator here: set n=100, x=10, p=0.05, and don’t forget to choose the option x>=X.
Image by the author.
Here’s the takeaway: if you set the threshold at 7 bad reviews for a restaurant with 100 orders, you would flag nearly a quarter (about 23%) of restaurants that did nothing wrong. That means extra costs for you and more pressure on the restaurants. A threshold of 10 would flag only about 3% of them.
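If you prefer code to an online calculator, the tail probabilities quoted above can be reproduced in a few lines of Python. This is a minimal sketch that assumes, as in the text, that each of 100 orders independently gets a bad review with probability 0.05; `binom_tail` is just an illustrative helper name, not a library function.

```python
from math import comb


def binom_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Bin(n, p), summed directly from the binomial pmf."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n, p = 100, 0.05  # 100 orders, 5% chance of a bad review per order
for threshold in (7, 10):
    # Share of "innocent" restaurants that would still cross this threshold
    share = binom_tail(n, threshold, p)
    print(f"P(at least {threshold} bad reviews) = {share:.3f}")
```

Running it prints 0.234 and 0.028, matching the 23.4% and 2.8% above. The same helper also answers Q1: `binom_tail(10, 10, 0.5)` is 0.5¹⁰ ≈ 0.001.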
Q2. You draw a card from a standard 52-card deck 10 times, with replacement. What is the probability of not getting a single red card?
Now, picture yourself in the world of e-commerce. You and your team have just introduced a new payment method, and you’re curious how often customers use it. But here’s the catch: due to a small bug, about 2% of requests to the new payment method fail. In other words, customers see the new payment option in only 98% of their sessions. To figure out how often a customer chooses this payment method, you decide to focus on customers who had it available in every session. But here’s where it gets tricky.
Think about a user with just one session: you exclude them from your analysis with a 2% probability. Now consider a user with 25 sessions. For them, the chance of missing the feature in at least one session is 1 − 0.98²⁵ ≈ 39.7%. So the more active a customer is, the more likely you are to exclude them. You might unintentionally leave out some of your most loyal customers, and that could skew your analysis.
Image by the author.
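The exclusion probability grows quickly with the number of sessions. Here is a minimal sketch, assuming each session independently fails to show the payment option with probability 0.02; `exclusion_probability` is a hypothetical helper name used only for illustration.

```python
def exclusion_probability(sessions: int, failure_rate: float = 0.02) -> float:
    """Chance that at least one of `sessions` independent sessions lacked the new payment option."""
    # Complement of "the option was shown in every session"
    return 1 - (1 - failure_rate) ** sessions


for sessions in (1, 5, 10, 25, 50):
    print(f"{sessions:>2} sessions: {exclusion_probability(sessions):.1%} chance of being excluded")
```

A single-session user is excluded 2.0% of the time, while a 25-session user is excluded 39.7% of the time, which is exactly the bias toward dropping heavy users described above.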
Q3. If you roll a die three times, what is the probability of getting two consecutive threes?
Imagine you’re working at a ride-hailing company like Uber. In some countries, people still pay for rides with cash, which can be a hassle for drivers. They have to carry change, deal with cash transactions, and so on.
Your team is worried that if a driver gets three cash orders in a row, they might get frustrated and run out of change. So, you’re thinking about limiting cash orders in such situations. But before doing that, you want to figure out how often this really happens.
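Let’s say each driver makes 10 trips per day on average, 10% of trips are paid in cash, and whether a trip is paid in cash doesn’t depend on the previous trips.
So, the probability that three specific consecutive trips are all paid in cash is 0.1 × 0.1 × 0.1 = 0.001. But the run can start at any of 8 positions: trips 1 to 3, trips 2 to 4, and so on up to trips 8 to 10. Adding these up gives 8 × 0.001 = 0.008, or 0.8%. This is a slight overestimate, because overlapping runs (for example, four cash trips in a row) are counted more than once; the exact value is about 0.73%. Either way, it’s pretty low, so you might want to hold off on implementing this feature for now.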
Image by the author.
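Multiplying 0.001 by the 8 possible starting positions gives a quick upper bound, since overlapping windows are counted more than once. To get an exact figure, you can track the length of the current cash streak trip by trip. This is a minimal dynamic-programming sketch, assuming exactly 10 independent trips per day with a 10% cash share; `prob_run` is a hypothetical helper name.

```python
def prob_run(n_trips: int, p_cash: float, run_length: int = 3) -> float:
    """Exact P(at least `run_length` consecutive cash trips among `n_trips` independent trips)."""
    # state[j] = P(no qualifying run so far AND the last j trips were paid in cash)
    state = [1.0] + [0.0] * (run_length - 1)
    for _ in range(n_trips):
        no_run = sum(state)
        new_state = [0.0] * run_length
        new_state[0] = (1 - p_cash) * no_run  # a non-cash trip resets the streak
        for j in range(1, run_length):
            new_state[j] = p_cash * state[j - 1]  # a cash trip extends the streak
        # Streaks that reach run_length are dropped: the run has happened
        state = new_state
    return 1 - sum(state)


union_bound = (10 - 3 + 1) * 0.1**3
print(f"union bound: {union_bound:.4f}, exact: {prob_run(10, 0.1):.4f}")
```

This prints `union bound: 0.0080, exact: 0.0073`. The same function covers the interview version of the question: `prob_run(3, 1 / 6, run_length=2)` gives the probability of at least two consecutive threes in three die rolls, 11/216 ≈ 0.051.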
Q4. An HIV test is 99% accurate (both ways). Only 0.3% of the population is HIV-positive. What is the probability that a random person is HIV-positive given that they test positive?
Original article for the question here.
You’re in the banking or credit industry, building models to predict whether a customer will repay their loan. Overall, 85% of loans get repaid. Your latest model correctly predicts “will repay” for 92% of customers who actually repay. It is much weaker on customers who won’t repay: it wrongly predicts “will repay” for 40% of them, so it flags only 60% of them as “won’t repay”. Now you have a concern: if your model says a customer won’t repay, what’s the real chance they’ll actually repay?
First, let’s calculate the probability that the model predicts “the customer won’t repay”. This involves two components:
The probability of getting this prediction from customers who won’t repay: (1 − 0.4) × (1 − 0.85) = 0.09
The probability of getting this prediction from customers who will repay: (1 − 0.92) × 0.85 = 0.068
Then, by Bayes’ rule, the probability that the customer will repay even though our model predicts they won’t is: 0.068 / (0.068 + 0.09) ≈ 0.43
Image by the author.
So, even when your model predicts that a customer won’t repay the loan, there’s about a 43% chance that they will.
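The credit example and the HIV test in Q4 are the same Bayes’ rule calculation with different numbers. Here is a minimal sketch; `posterior` and its parameter names are illustrative, and the credit-model rates are plugged in exactly as in the calculation above.

```python
def posterior(prior: float, p_signal_if_true: float, p_signal_if_false: float) -> float:
    """Bayes' rule: P(hypothesis | signal) from the prior and the two likelihoods."""
    joint_true = prior * p_signal_if_true  # hypothesis holds and we observe the signal
    joint_false = (1 - prior) * p_signal_if_false  # hypothesis fails but we still observe it
    return joint_true / (joint_true + joint_false)


# Credit model: hypothesis = "customer repays", signal = model predicts "won't repay".
# Rates as in the calculation above: 1 - 0.92 for repayers, 1 - 0.4 for non-repayers.
p_repay = posterior(prior=0.85, p_signal_if_true=1 - 0.92, p_signal_if_false=1 - 0.4)

# Q4: hypothesis = "HIV-positive", signal = positive test, 99% accurate both ways.
p_hiv = posterior(prior=0.003, p_signal_if_true=0.99, p_signal_if_false=0.01)

print(f"P(repays | predicted won't repay) = {p_repay:.2f}")
print(f"P(HIV+ | positive test) = {p_hiv:.2f}")
```

This prints 0.43 for the credit model and 0.23 for the HIV test: in both cases a low base rate for the hypothesis being tested makes the "alarming" signal far less conclusive than its headline accuracy suggests.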
What’s the point of this article? It shows that understanding probability and combinatorics is essential for data scientists and analysts. In your daily work, you’ll run into situations where you need a solid grasp of probability; without it, you might draw incorrect conclusions. At the same time, employers could make interview questions more practical, so that future analysts can see how this knowledge applies at work.
Thank you for taking the time to read this article. I would love to hear your thoughts, so please feel free to share any comments or questions you may have.
Probability Interview Questions In Real Life of Data Analysts was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.