A/B Testing for Data Analysts — Complete Guide with Python Examples (2026)
A/B testing knowledge separates junior analysts from senior ones. At product companies like Swiggy, Flipkart, Razorpay and Google, almost every product decision goes through experimentation — and data analysts own the design, execution and analysis of these tests.
Interview questions about A/B testing appear in 65% of product company data analyst interviews. This guide gives you the complete framework — with Python code for statistical analysis and the exact interview questions you'll face.
The 5-Step A/B Testing Framework
Every A/B testing interview question can be answered with this framework. Interviewers award marks for each step — missing any step costs points even if your statistics are correct.
Define Hypothesis
State H0 (null) and H1 (alternative) clearly. H0: The new checkout button has no effect on conversion rate. H1: The new checkout button increases conversion rate. Define: primary metric, secondary metrics, and guardrail metrics. Guardrail metrics are things you must NOT hurt — e.g., page load time, support tickets.
Calculate Sample Size
Before running any test, calculate how many users you need. Variables: baseline rate (current conversion, e.g., 5%), minimum detectable effect (the smallest lift worth detecting, e.g., 10% relative = 0.5% absolute), statistical power (80% standard), significance level (5% = alpha 0.05). Underpowered tests produce unreliable results — this is the #1 A/B testing mistake.
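A minimal sketch of the sample-size calculation using statsmodels, plugging in the example numbers above (5% baseline, 10% relative MDE, 80% power, alpha 0.05); these figures are illustrative, not real traffic data.

```python
# Sample size per variant for a 5.0% -> 5.5% conversion test
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

baseline = 0.05                           # current conversion rate
mde_relative = 0.10                       # smallest lift worth detecting (10% relative)
target = baseline * (1 + mde_relative)    # 0.055 absolute

effect_size = proportion_effectsize(target, baseline)   # Cohen's h
n_per_variant = NormalIndPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.80, alternative="two-sided"
)
print(f"Users needed per variant: {n_per_variant:,.0f}")  # ~31k for these numbers
```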
Run the Experiment
Randomise at the correct unit (usually user_id, not session_id — same user should always see the same variant). Collect data for the pre-calculated duration (minimum 1 full week, ideally 2 business cycles). Do NOT peek at results mid-test and stop early — this dramatically increases false positive rate (p-hacking).
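A sketch of deterministic assignment by user_id, so the same user always lands in the same bucket across sessions and devices; the experiment key "checkout_button_v2" is just an illustrative name.

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "checkout_button_v2") -> str:
    """Hash user_id + experiment key into a stable 50/50 split."""
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100        # stable bucket in 0-99
    return "treatment" if bucket < 50 else "control"

print(assign_variant("user_12345"))       # same output on every call
```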
Analyse Results
Calculate the test statistic and p-value. Check: primary metric, secondary metrics, guardrail metrics, and segment analysis (does the effect hold across different user groups? if only one segment benefits, that's important context). Check for novelty effects — early engagement often inflates results.
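A sketch of the segment check, assuming a pandas DataFrame with columns variant ('control'/'treatment'), converted (0/1) and platform; the column names are illustrative, not a fixed schema.

```python
import pandas as pd
from statsmodels.stats.proportion import proportions_ztest

def segment_results(df: pd.DataFrame, segment_col: str = "platform") -> None:
    """Run a two-proportion z-test separately within each segment."""
    for segment, grp in df.groupby(segment_col):
        counts = grp.groupby("variant")["converted"].agg(["sum", "count"])
        _, p = proportions_ztest(counts["sum"], counts["count"])
        rates = counts["sum"] / counts["count"]
        print(f"{segment}: control={rates['control']:.3%} "
              f"treatment={rates['treatment']:.3%} p={p:.4f}")
```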
Make Decision and Communicate
If p < 0.05 AND no guardrail metrics harmed AND effect size is practically meaningful → ship the variant. Calculate business impact: absolute lift × daily users × revenue per conversion × 365 = annual revenue impact. Communicate findings in plain language with a recommendation, not just statistics.
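A back-of-envelope impact calculation with purely illustrative numbers (0.5 percentage-point absolute lift, 100k daily users, Rs 200 per conversion):

```python
absolute_lift = 0.005          # 5.0% -> 5.5% conversion
daily_users = 100_000
revenue_per_conversion = 200   # assumed, in rupees

daily_impact = absolute_lift * daily_users * revenue_per_conversion
annual_impact = daily_impact * 365
print(f"Daily: Rs {daily_impact:,.0f} | Annual: Rs {annual_impact:,.0f}")
# Daily: Rs 100,000 | Annual: Rs 36,500,000
```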
Python Code — A/B Test Analysis
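A worked sketch of the core significance test on aggregated conversion counts. The counts below are made-up numbers sized roughly to the sample-size calculation above; a real analysis would pull them from your experiment tables. The confidence-interval helper confint_proportions_2indep is available in recent statsmodels versions.

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest, confint_proportions_2indep

control_conversions, control_users = 1_550, 31_000        # ~5.0% conversion
treatment_conversions, treatment_users = 1_750, 31_000     # ~5.6% conversion

counts = np.array([treatment_conversions, control_conversions])
nobs = np.array([treatment_users, control_users])

# Two-proportion z-test for the primary metric
z_stat, p_value = proportions_ztest(counts, nobs)

# 95% CI for the absolute difference in conversion rates
ci_low, ci_high = confint_proportions_2indep(
    treatment_conversions, treatment_users,
    control_conversions, control_users, compare="diff"
)

print(f"Control rate:   {control_conversions / control_users:.3%}")
print(f"Treatment rate: {treatment_conversions / treatment_users:.3%}")
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
print(f"95% CI for absolute difference: [{ci_low:.3%}, {ci_high:.3%}]")
```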
5 A/B Testing Mistakes That Kill Experiments
- Peeking and stopping early. Checking results daily and stopping when p < 0.05 inflates false positives dramatically. Run until pre-calculated sample size is reached.
- Not calculating sample size before starting. Running for "a week and seeing what happens" produces underpowered results you can't trust.
- Randomising at session level instead of user level. The same user may see both variants — contaminating results. Always randomise by user_id.
- Testing too many metrics without correction. Testing 20 metrics at 5% significance means, on average, one will appear significant purely by chance. Use Bonferroni correction or pre-register your primary metric (see the sketch after this list).
- Ignoring novelty effects. New features often get a short-term boost from users trying something new. Run tests for at least 2 business cycles to see through novelty.
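A sketch of the multiple-testing correction mentioned in mistake #4, assuming you have one p-value per secondary metric; the p-values here are illustrative.

```python
from statsmodels.stats.multitest import multipletests

p_values = [0.012, 0.034, 0.049, 0.21, 0.64]   # one per metric
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="bonferroni")

for raw, adj, sig in zip(p_values, p_adjusted, reject):
    print(f"raw p={raw:.3f}  adjusted p={adj:.3f}  significant={sig}")
```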
Real A/B Testing Interview Questions
| Question | Company | What They're Testing |
|---|---|---|
| Design an A/B test for a new Swiggy search ranking algorithm | Swiggy | Hypothesis, metrics, randomisation unit, duration |
| DAU increased 8% in the test — should we ship? What else do you check? | Flipkart | Guardrails, segment analysis, novelty effects |
| How would you detect if our A/B test was corrupted by a bug in variant assignment? | | SRM (Sample Ratio Mismatch) detection |
| Our test ran for 3 days. The results look great. Can we stop early? | Amazon | Understanding of optional stopping problem |
| How do you test a feature that only affects 1% of users? | Razorpay | Power analysis with rare events |
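For the SRM question above, a common check is a chi-square goodness-of-fit test comparing observed assignment counts against the expected 50/50 split. A minimal sketch with illustrative counts:

```python
from scipy.stats import chisquare

observed = [50_420, 49_105]                 # users assigned to control, treatment
expected = [sum(observed) / 2] * 2          # expected under a 50/50 split

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
if p_value < 0.001:                         # SRM checks typically use a strict threshold
    print(f"Possible SRM: p = {p_value:.2e} - investigate assignment logic")
else:
    print(f"No SRM detected (p = {p_value:.3f})")
```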
⭐ Key Takeaways
- A/B testing framework: hypothesis → sample size → run experiment → analyse → decide and communicate
- Always calculate sample size BEFORE running — underpowered tests produce unreliable results
- Randomise at user_id level, not session level — same user must always see the same variant
- Statistical significance (p < 0.05) is necessary but not sufficient — also check practical significance and guardrails
- Never peek and stop early — run until pre-calculated sample size or use sequential testing methods
- Python: statsmodels.stats.proportion.proportions_ztest for significance, statsmodels.stats.power for sample size calculation
Practice A/B testing questions with a mentor
Our data analyst mock sessions include A/B test design questions from Swiggy, Flipkart and Google — with live feedback.
Book Free Mock Session →