NumPy: The Foundation of Data Science | Data Analyst Interview
Numerical Computing

NumPy: The Foundation of Data Science in Python

Arrays, broadcasting, and vectorised operations. Pandas is built on NumPy, and interviewers love testing it for speed and statistical reasoning.

Core Concepts
What interviewers test in NumPy

NumPy is the engine under Pandas. These six topics cover nearly every NumPy question in data analyst interviews.

1. ndarray Basics

The N-dimensional array: faster and more memory-efficient than Python lists. Knowing shape, dtype, and ndim is essential.

  • np.array() from lists
  • shape, dtype, ndim, size
  • zeros, ones, arange, linspace
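
A minimal sketch of the attributes and constructors listed above. The variable names (grid, steps, points) are illustrative only.

```python
import numpy as np

# Build a 2D array from a nested Python list.
grid = np.array([[1, 2, 3], [4, 5, 6]])

print(grid.shape)  # (2, 3): 2 rows, 3 columns
print(grid.ndim)   # 2 dimensions
print(grid.size)   # 6 elements in total
print(grid.dtype)  # platform default integer, int64 on most systems

# Constant-filled and evenly spaced arrays.
zeros = np.zeros((2, 3))            # float64 by default
ones = np.ones(4, dtype=np.int32)   # dtype can be set explicitly
steps = np.arange(0, 10, 2)         # stop is excluded: 0, 2, 4, 6, 8
points = np.linspace(0.0, 1.0, 5)   # stop is included: 0, 0.25, 0.5, 0.75, 1
```

The arange versus linspace distinction (step size versus number of points, stop excluded versus included) is a common follow-up question.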

2. Indexing & Slicing

Element, slice, boolean, and fancy indexing. 2D slicing syntax trips up many candidates.

  • arr[row, col] for 2D arrays
  • Boolean masks: arr[arr > 0]
  • Fancy indexing with index arrays
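
The four indexing styles above can be sketched on one small array. The array and variable names are made up for illustration.

```python
import numpy as np

table = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

element = table[1, 2]            # row 1, column 2 -> 6
first_two_cols = table[:, :2]    # every row, columns 0 and 1 -> shape (3, 2)
last_row = table[-1]             # negative index counts from the end

# Boolean mask: keeps elements where the condition is True, result is 1D.
evens = table[table % 2 == 0]

# Fancy indexing: an index list selects whole rows...
picked_rows = table[[0, 2]]      # rows 0 and 2 -> shape (2, 4)
# ...and paired index lists select individual (row, col) positions.
diagonal = table[[0, 1, 2], [0, 1, 2]]  # positions (0,0), (1,1), (2,2)
```

Note that table[1, 2] and table[1][2] return the same element, but the first form does it in a single indexing step.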

3. Broadcasting

How NumPy handles operations on arrays of different shapes. One of the most frequently asked NumPy concepts in interviews.

  • Scalar + array operations
  • Shape compatibility rules
  • Common shape errors explained
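
As a rough illustration of the compatibility rules, the sketch below centres each column of a small matrix and uses np.broadcast_shapes to preview a result shape. The data values are invented for the example.

```python
import numpy as np

scores = np.array([[10.0, 20.0, 30.0],
                   [40.0, 50.0, 60.0]])   # shape (2, 3)

# Scalar + array: the scalar is applied to every element.
doubled = scores * 2

# (2, 3) minus (3,): trailing dimensions match, so the row of
# column means is reused for each of the 2 rows.
col_means = scores.mean(axis=0)           # shape (3,)
centred = scores - col_means              # shape (2, 3)

# np.broadcast_shapes reports the result shape without doing any arithmetic.
same_trailing = np.broadcast_shapes((2, 3), (3,))    # (2, 3)
outer_like = np.broadcast_shapes((4, 1), (1, 5))     # (4, 5)
```

Shapes are aligned from the right, and a dimension of size 1 stretches to match its partner, which is why (4, 1) and (1, 5) combine into (4, 5).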

4. Math & Statistics

Mean, median, std, percentile, correlation: the statistical toolkit every data analyst must know.

  • np.mean, median, std, var
  • np.percentile and quantile
  • np.corrcoef for correlations
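
A short sketch of these functions on invented data (ad_spend and revenue are hypothetical names, with revenue constructed to be perfectly linear in spend).

```python
import numpy as np

ad_spend = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
revenue = 2 * ad_spend + 5            # perfectly linear, for illustration

mean = np.mean(ad_spend)              # 30.0
median = np.median(ad_spend)          # 30.0

# np.std defaults to the population formula (ddof=0);
# pass ddof=1 for the sample standard deviation.
pop_std = np.std(ad_spend)            # sqrt(200)
sample_std = np.std(ad_spend, ddof=1) # sqrt(250)

# percentile takes values on a 0-100 scale, quantile on a 0-1 scale.
q1, q3 = np.percentile(ad_spend, [25, 75])
q1_again = np.quantile(ad_spend, 0.25)

# corrcoef returns a correlation matrix; the off-diagonal entry is r.
r = np.corrcoef(ad_spend, revenue)[0, 1]
```

The ddof default is worth mentioning in an interview because pandas uses ddof=1 for std, so the two libraries give different numbers on the same data unless ddof is set.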

5. Reshaping & Stacking

reshape, ravel, transpose, hstack, vstack: manipulate dimensions for ML pipelines and matrix operations.

  • reshape(-1, 1) for column vectors
  • concatenate vs stack
  • flatten vs ravel difference
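
The three bullets above can be sketched as follows. Array names are illustrative.

```python
import numpy as np

# reshape(-1, 1): NumPy infers the row count, giving a column vector.
feature = np.array([1, 2, 3, 4])
column = feature.reshape(-1, 1)       # shape (4, 1)

# concatenate joins along an existing axis; stack creates a new one.
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
joined = np.concatenate([a, b])       # shape (6,)
stacked = np.stack([a, b])            # shape (2, 3)

# flatten always copies; ravel returns a view when the memory layout allows.
matrix = np.arange(6).reshape(2, 3)
flat_copy = matrix.flatten()
flat_view = matrix.ravel()

flat_view[0] = 99    # writes through to matrix
flat_copy[1] = -1    # leaves matrix untouched
print(matrix[0])     # first row now starts with 99, second element still 1
```

Because ravel may or may not return a view depending on layout, code that must never modify the source array is safer with flatten or an explicit copy().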

6. Random & Sampling

np.random for simulations, A/B test mocks, and bootstrap sampling: common in analytical case rounds.

  • np.random.seed for reproducibility
  • choice, normal, uniform
  • Sampling without replacement
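
A minimal sketch of seeded sampling and a bootstrap confidence interval. It assumes the Generator API (np.random.default_rng), which the NumPy documentation recommends for new code; np.random.seed with np.random.choice is the legacy equivalent. The data and variable names are invented.

```python
import numpy as np

rng = np.random.default_rng(42)   # seeded generator for reproducible draws

# Mock metric for one arm of an A/B test (hypothetical numbers).
control = rng.normal(loc=100.0, scale=15.0, size=500)
uniform_draws = rng.uniform(0.0, 1.0, size=5)

# Sampling without replacement: every selected id is unique.
user_ids = np.arange(1000)
sample_ids = rng.choice(user_ids, size=50, replace=False)

# Bootstrap: resample with replacement many times and collect the mean.
boot_means = np.array([
    rng.choice(control, size=control.size, replace=True).mean()
    for _ in range(1000)
])
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(f"95% bootstrap CI for the mean: [{ci_low:.2f}, {ci_high:.2f}]")
```
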
Interview Example 1
Vectorised vs Python loop: performance

"Why is NumPy faster?" Demonstrate this and explain that vectorised operations run in compiled C.

import numpy as np
import time

arr = np.arange(1_000_000)

# Slow: Python-level loop, one interpreter step per element
t = time.perf_counter()
result = [x**2 for x in arr]
print(f"Loop: {time.perf_counter() - t:.3f}s")

# Fast: vectorised, the loop runs in compiled C under the hood
t = time.perf_counter()
result = arr ** 2
print(f"NumPy: {time.perf_counter() - t:.3f}s")  # often ~50-100x faster; varies by machine
Interview Example 2
IQR outlier detection

Tests percentile knowledge and boolean masking, both common in EDA and data cleaning rounds.

import numpy as np

prices = np.array([120, 150, 130, 800, 145, 160, 155])

# Quartiles and interquartile range
q1, q3 = np.percentile(prices, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5*iqr, q3 + 1.5*iqr

# Boolean mask: keep values outside the 1.5*IQR fences
outliers = prices[(prices < lower) | (prices > upper)]
print(f"Outliers: {outliers}")  # [800]
Interview Questions
Real NumPy questions asked in 2026

These NumPy questions come up in analytics and data engineering interviews alike.

Why is NumPy faster than a Python list?
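NumPy arrays store data in contiguous memory with a single fixed dtype, so arithmetic runs as a tight loop in pre-compiled C. A Python list stores pointers to separate objects, so every element needs a pointer lookup and a dynamic type check. For element-wise numerical work NumPy is often 10 to 100 times faster, though the exact ratio depends on the operation, the array size, and the hardware.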
Explain broadcasting with an example.
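Broadcasting lets NumPy operate on arrays of different shapes by virtually expanding the smaller one, without copying data. Shapes are compared from the trailing dimension backwards, and each pair of dimensions must be equal or one of them must be 1. A (3,4) matrix plus a (4,) row works: the row is broadcast across all 3 rows. A (3,4) matrix plus a (3,) array fails because 3 does not match the trailing 4; reshape it to (3,1) first so it acts as a column.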
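
The three cases in this answer can be checked directly. This is a small sketch with made-up values.

```python
import numpy as np

matrix = np.ones((3, 4))
row = np.array([1, 2, 3, 4])      # shape (4,)
col = np.array([10, 20, 30])      # shape (3,)

# (3, 4) + (4,): trailing dimensions match, so the row repeats down the rows.
row_sum = matrix + row

# (3, 4) + (3,): trailing dimensions are 4 and 3, so NumPy raises ValueError.
failed = False
try:
    matrix + col
except ValueError:
    failed = True

# Reshaping to (3, 1) makes it a column that repeats across the columns.
col_sum = matrix + col.reshape(3, 1)
```

col[:, np.newaxis] is an equivalent way to write the reshape.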
What is the difference between np.array and np.asarray?
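np.array() copies its input by default (copy=True). np.asarray() returns the input unchanged, with no copy, when it is already an ndarray of the requested dtype; it only allocates when a conversion is needed. That makes it useful inside functions that accept either lists or arrays and should avoid unnecessary memory use.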
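
One way to see the difference is to check object identity and memory sharing. A sketch, with illustrative names:

```python
import numpy as np

original = np.array([1.0, 2.0, 3.0])

copied = np.array(original)       # new array with its own memory
aliased = np.asarray(original)    # same object handed back, no copy

print(aliased is original)                  # True
print(np.shares_memory(copied, original))   # False

# asarray still allocates when it has to convert something.
converted = np.asarray(original, dtype=np.int64)   # dtype differs
from_list = np.asarray([1, 2, 3])                  # input is not an ndarray
```

The flip side is that modifying the result of np.asarray can modify the caller's array, so functions that mutate their input should copy explicitly.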
What does axis=0 vs axis=1 mean?
axis=0 collapses the row dimension, giving one result per column. axis=1 collapses the column dimension, giving one result per row. For a (5,3) array arr, np.sum(arr, axis=0) returns 3 numbers; np.sum(arr, axis=1) returns 5 numbers.
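
A quick sketch on a (5, 3) array makes the shapes concrete. The name sales is only illustrative.

```python
import numpy as np

sales = np.arange(15).reshape(5, 3)   # 5 rows, 3 columns

per_column = sales.sum(axis=0)   # rows collapsed -> shape (3,)
per_row = sales.sum(axis=1)      # columns collapsed -> shape (5,)

print(per_column)   # [30 35 40]
print(per_row)      # [ 3 12 21 30 39]

# keepdims=True keeps the collapsed axis with length 1,
# which is convenient for broadcasting the result back.
row_totals = sales.sum(axis=1, keepdims=True)   # shape (5, 1)
```

A useful rule of thumb: the axis you name is the one that disappears from the shape.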
When would you use np.where()?
np.where(condition, x, y) is a vectorised if-else: it returns x where the condition is True and y where it is False. Example: np.where(scores > 60, 'pass', 'fail'). Called with only the condition, it returns a tuple of index arrays locating the True values, which is useful for finding row positions.
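
Both forms can be sketched on a small, invented scores array:

```python
import numpy as np

scores = np.array([45, 72, 60, 88, 59])

# Three-argument form: element-wise if-else.
labels = np.where(scores > 60, "pass", "fail")
print(labels)   # ['fail' 'pass' 'fail' 'pass' 'fail']

# One-argument form: a tuple with one index array per dimension.
(passing_idx,) = np.where(scores > 60)
print(passing_idx)   # [1 3]
```

The score of exactly 60 is labelled 'fail' because the comparison is strict; use >= if the threshold should count as a pass.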
Quick Reference
NumPy cheat sheet

The functions that show up in every NumPy interview question.

Function                            Purpose
np.array(list)                      Create ndarray from a Python list
np.zeros / ones / full              Constant-filled arrays
np.arange(start, stop, step)        Range as ndarray (stop excluded)
np.linspace(start, stop, n)         n evenly spaced values (stop included)
arr.reshape(rows, cols)             Change array shape
arr.T / arr.transpose()             Swap rows and columns
np.mean / median / std              Statistical aggregates
np.percentile(arr, q)               q-th percentile (q from 0 to 100)
np.where(cond, x, y)                Vectorised conditional
np.unique(arr, return_counts=True)  Unique values and frequencies
np.concatenate / hstack / vstack    Combine arrays
np.random.choice / normal           Random sampling
