NumPy — The Foundation of Data Science | Data Analyst Interview

🔢 Numerical Computing

NumPy — The Foundation of Data Science in Python

Arrays, broadcasting, and vectorised operations. Pandas is built on NumPy, and interviewers love testing it for speed and statistical reasoning.

Core Concepts

What interviewers test in NumPy

NumPy is the engine under Pandas. These six topics cover nearly every NumPy question in data analyst interviews.

1. ndarray Basics

The N-dimensional array — faster and more memory-efficient than Python lists. Knowing shape, dtype, and ndim is essential.

np.array() from lists
shape, dtype, ndim, size
zeros, ones, arange, linspace
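
A minimal sketch of the attributes and constructors listed above. The variable names are illustrative, and the default integer dtype depends on the platform.

```python
import numpy as np

# Build a 2-D array from a nested Python list.
arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr.shape)  # (2, 3): 2 rows, 3 columns
print(arr.ndim)   # 2 dimensions
print(arr.size)   # 6 elements in total
print(arr.dtype)  # platform-dependent integer type, commonly int64

zeros = np.zeros((2, 3))            # float64 zeros with shape (2, 3)
ones = np.ones(4, dtype=np.int32)   # explicit dtype
steps = np.arange(0, 10, 2)         # [0 2 4 6 8], stop is excluded
grid = np.linspace(0.0, 1.0, 5)     # [0. 0.25 0.5 0.75 1.], stop is included
```

The arange/linspace contrast is a common follow-up: arange takes a step and excludes the stop value, while linspace takes a count and includes it.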

2. Indexing & Slicing

Element, slice, boolean, and fancy indexing. 2D slicing syntax trips up many candidates.

arr[row, col] for 2D arrays
Boolean masks: arr[arr > 0]
Fancy indexing with index arrays
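
The four indexing styles can be sketched on one small array. The array and variable names here are made up for illustration.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

element = grid[1, 2]          # 6: row 1, column 2
second_row = grid[1, :]       # [4 5 6 7]
last_col = grid[:, -1]        # [ 3  7 11]
block = grid[:2, 1:3]         # [[1 2], [5 6]]: rows 0-1, columns 1-2

evens = grid[grid % 2 == 0]   # boolean mask, result is 1-D: [0 2 4 6 8 10]
picked_rows = grid[[0, 2]]    # fancy indexing: rows 0 and 2
pairs = grid[[0, 2], [1, 3]]  # elements (0, 1) and (2, 3): [ 1 11]
```

Note that grid[[0, 2], [1, 3]] pairs the two index lists element by element; it does not select a 2x2 block.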

3. Broadcasting

How NumPy handles operations on arrays of different shapes. The single most-asked NumPy concept in interviews.

Scalar + array operations
Shape compatibility rules
Common shape errors explained
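
As a rough illustration of the three bullets above, the sketch below scales by a scalar, centres columns and rows, and triggers a typical shape error. The data values are invented for the example.

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])             # shape (2, 3)

scaled = data * 10                             # the scalar is broadcast to every element

col_means = data.mean(axis=0)                  # shape (3,): [2.5 3.5 4.5]
centred = data - col_means                     # (2, 3) - (3,) -> (2, 3)

row_means = data.mean(axis=1, keepdims=True)   # shape (2, 1), kept 2-D on purpose
row_centred = data - row_means                 # (2, 3) - (2, 1) -> (2, 3)

# Shapes are compared from the trailing dimension: 3 vs 2 is incompatible.
try:
    data + np.array([1.0, 2.0])                # (2, 3) + (2,)
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

Passing keepdims=True is one way to keep the reduced axis as length 1, so the result lines up for broadcasting without a manual reshape.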

4. Math & Statistics

Mean, median, std, percentile, correlation — the statistical toolkit every data analyst must know.

np.mean, median, std, var
np.percentile and quantile
np.corrcoef for correlations
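
A short sketch of the statistical functions listed above, using made-up numbers. One detail worth knowing: np.std and np.var compute the population version (ddof=0) unless told otherwise.

```python
import numpy as np

scores = np.array([10, 20, 30, 40, 100])

mean = np.mean(scores)                     # 40.0, pulled up by the 100
median = np.median(scores)                 # 30.0, robust to the extreme value
pop_var = np.var(scores)                   # 1000.0, population variance (ddof=0)
pop_std = np.std(scores)                   # population standard deviation
sample_std = np.std(scores, ddof=1)        # sample standard deviation (n - 1)

q1, q3 = np.percentile(scores, [25, 75])   # 20.0 and 40.0 (scale 0-100)
q2 = np.quantile(scores, 0.5)              # 30.0 (scale 0-1)

# Pearson correlation: corrcoef returns a 2x2 matrix, the off-diagonal is r.
x = np.array([1, 2, 3, 4, 5])
y = 2 * x + 1
r = np.corrcoef(x, y)[0, 1]                # approximately 1.0 for a perfect linear relation
```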

5. Reshaping & Stacking

reshape, ravel, transpose, hstack, vstack — manipulate dimensions for ML pipelines and matrix operations.

reshape(-1, 1) for column vectors
concatenate vs stack
flatten vs ravel difference
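
The distinctions in the list above can be sketched as follows. The arrays are illustrative; the key behaviours are that stack adds a new axis, and that flatten always copies while ravel returns a view when it can.

```python
import numpy as np

a = np.arange(6)                        # shape (6,)
column = a.reshape(-1, 1)               # shape (6, 1); -1 means "infer this size"

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
joined = np.concatenate([x, y])         # shape (6,): joins along an existing axis
stacked = np.stack([x, y])              # shape (2, 3): creates a new axis
as_rows = np.vstack([x, y])             # shape (2, 3): same result as stack here
side_by_side = np.hstack([x, y])        # shape (6,)

matrix = np.arange(6).reshape(2, 3)
flat_copy = matrix.flatten()            # always a copy
flat_view = matrix.ravel()              # a view here, because matrix is contiguous

flat_copy[1] = -1                       # does not affect matrix
flat_view[0] = 99                       # writes through to matrix
print(matrix[0])                        # [99  1  2]
```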

6. Random & Sampling

np.random for simulations, A/B test mocks, and bootstrap sampling — common in analytical case rounds.

np.random.seed for reproducibility
choice, normal, uniform
Sampling without replacement
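
A minimal sketch of a mock A/B test with a bootstrap confidence interval, assuming invented group sizes and effect sizes. It uses np.random.seed as listed above; NumPy also offers np.random.default_rng as its newer generator interface.

```python
import numpy as np

np.random.seed(42)  # fix the global seed so the run is reproducible

# Sampling without replacement: 100 distinct user ids out of 1000.
users = np.arange(1000)
sample = np.random.choice(users, size=100, replace=False)

# Mock A/B metric: the means and spread below are assumptions for illustration.
control = np.random.normal(loc=10.0, scale=2.0, size=500)
treatment = np.random.normal(loc=10.5, scale=2.0, size=500)

# Bootstrap: resample WITH replacement and recompute the mean each time.
boot_means = np.array([
    np.random.choice(treatment, size=treatment.size, replace=True).mean()
    for _ in range(1000)
])
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(f"Treatment mean 95% CI: [{ci_low:.2f}, {ci_high:.2f}]")
```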

Interview Example 1

Vectorised vs Python loop — performance

“Why is NumPy faster?” — demonstrate this and explain vectorisation in compiled C.

import time

import numpy as np

arr = np.arange(1_000_000)

# Slow: Python-level loop, one interpreted iteration per element
t = time.perf_counter()
result = [x**2 for x in arr]
print(f"Loop: {time.perf_counter() - t:.3f}s")

# Fast: vectorised, the loop runs in compiled C
t = time.perf_counter()
result = arr ** 2
print(f"NumPy: {time.perf_counter() - t:.3f}s")  # often ~50-100x faster; varies by machine

Interview Example 2

IQR outlier detection

Tests percentile knowledge and boolean masking — common in EDA and data cleaning rounds.

import numpy as np

prices = np.array([120, 150, 130, 800, 145, 160, 155])
q1, q3 = np.percentile(prices, [25, 75])        # 137.5, 157.5
iqr = q3 - q1                                   # 20.0
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # 107.5, 187.5
outliers = prices[(prices < lower) | (prices > upper)]
print(f"Outliers: {outliers}")  # Outliers: [800]

Interview Questions

Real NumPy questions asked in 2026

These NumPy questions come up in analytics and data engineering interviews alike.

Why is NumPy faster than a Python list?

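NumPy arrays store elements of a single fixed dtype in one contiguous block of memory, so element-wise operations run as loops in pre-compiled C. A Python list stores pointers to separately allocated objects, so every iteration pays for pointer dereferencing and dynamic type checks in the interpreter. For element-wise numerical work on large arrays the speed-up is often in the 50–100x range, though it depends on the operation, the array size, and the machine.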
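
To make the memory-layout point concrete, here is a small sketch that inspects an array's storage. The list size reported by sys.getsizeof covers only the list's pointer table, not the integer objects it points to, so the true list footprint is larger.

```python
import sys

import numpy as np

n = 1_000_000
arr = np.arange(n, dtype=np.int64)
lst = list(range(n))

print(arr.itemsize)               # 8 bytes per element, fixed by the dtype
print(arr.nbytes)                 # 8000000 bytes of raw data in one buffer
print(arr.flags["C_CONTIGUOUS"])  # True: elements sit side by side in memory

# Only the list's own pointer table; each int is a separate object on top of this.
print(sys.getsizeof(lst))
```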

Explain broadcasting with an example.

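Broadcasting lets NumPy operate on arrays of different shapes by virtually expanding the smaller one, without copying data. Shapes are compared from the trailing dimension: each pair of sizes must be equal or one of them must be 1. A (3,4) matrix plus a (4,) row works, because the row is broadcast across all 3 rows. A (3,4) matrix plus a (3,) array fails, because the trailing sizes 4 and 3 do not match; reshape it to (3,1) first so it broadcasts across the 4 columns.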
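
The example in this answer can be reproduced directly. The values are arbitrary; only the shapes matter.

```python
import numpy as np

matrix = np.ones((3, 4))
row = np.array([10, 20, 30, 40])     # shape (4,)
col = np.array([1, 2, 3])            # shape (3,)

ok = matrix + row                    # (3, 4) + (4,) -> (3, 4)

try:
    matrix + col                     # (3, 4) + (3,): trailing sizes 4 and 3 clash
    failed = False
except ValueError:
    failed = True

fixed = matrix + col.reshape(3, 1)   # (3, 4) + (3, 1) -> (3, 4)
print(fixed[:, 0])                   # [2. 3. 4.]
```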

What is the difference between np.array and np.asarray?

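np.array() copies its input by default (copy=True). np.asarray() returns the input as-is, with no copy, when it is already an ndarray and no dtype change is requested; it converts, and therefore copies, only when it has to. That makes it useful inside functions that accept either lists or arrays, since it avoids unnecessary memory use.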
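
A quick way to demonstrate the difference is to check object identity and then mutate through the result. The names below are illustrative.

```python
import numpy as np

original = np.array([1.0, 2.0, 3.0])

copied = np.array(original)      # new buffer: changes do not propagate
aliased = np.asarray(original)   # already an ndarray: the same object comes back

aliased[0] = 99.0
print(original[0])               # 99.0, because aliased IS original
print(copied[0])                 # 1.0, the copy is unaffected

# A dtype change forces a conversion, so asarray has to copy here.
converted = np.asarray(original, dtype=np.int64)
```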

What does axis=0 vs axis=1 mean?

axis=0 collapses the row dimension, aggregating down each column (a column-wise result). axis=1 collapses the column dimension, aggregating across each row (a row-wise result). For a (5,3) array arr, arr.sum(axis=0) returns 3 numbers; arr.sum(axis=1) returns 5 numbers.
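
A small sketch that checks this on a (5,3) array. A useful rule of thumb: the axis you pass is the one that disappears from the shape.

```python
import numpy as np

table = np.arange(15).reshape(5, 3)   # 5 rows, 3 columns

col_totals = table.sum(axis=0)        # shape (3,): [30 35 40]
row_totals = table.sum(axis=1)        # shape (5,): [ 3 12 21 30 39]
grand_total = table.sum()             # no axis: a single number, 105
```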

When would you use np.where()?

np.where(condition, x, y) is a vectorised if-else: it returns x where the condition is True and y where it is False. Example: np.where(scores > 60, 'pass', 'fail'). Called with only the condition, it returns a tuple of index arrays (one per dimension) for the True values, which is useful for finding row positions.
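
Both forms can be sketched with a made-up scores array:

```python
import numpy as np

scores = np.array([45, 72, 60, 88, 39])

# Three-argument form: element-wise if-else.
labels = np.where(scores > 60, "pass", "fail")
print(labels)                                # ['fail' 'pass' 'fail' 'pass' 'fail']

# x and y may also be arrays: cap values above 80, keep the rest.
capped = np.where(scores > 80, 80, scores)   # [45 72 60 80 39]

# One-argument form: a tuple of index arrays, one per dimension.
positions = np.where(scores > 60)[0]         # [1 3]
```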

Quick Reference

NumPy cheat sheet

The functions that show up in every NumPy interview question.

Function	Purpose
np.array(list)	Create ndarray from a Python list
np.zeros / ones / full	Constant-filled arrays
np.arange(start, stop, step)	Range as ndarray
np.linspace(start, stop, n)	n evenly-spaced values
arr.reshape(rows, cols)	Change array shape
arr.T / arr.transpose()	Swap rows and columns
np.mean / median / std	Statistical aggregates
np.percentile(arr, q)	q-th percentile
np.where(cond, x, y)	Vectorised conditional
np.unique(arr, return_counts=True)	Unique values and frequencies
np.concatenate / hstack / vstack	Combine arrays
np.random.choice / normal	Random sampling
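
As one worked entry from the table, here is a sketch of np.unique used as a frequency count. The channel labels are invented for illustration.

```python
import numpy as np

channels = np.array(["email", "ads", "email", "organic", "ads", "email"])

# Unique values come back sorted, with a matching array of counts.
values, counts = np.unique(channels, return_counts=True)
print(values)   # ['ads' 'email' 'organic']
print(counts)   # [2 3 1]

most_common = values[np.argmax(counts)]   # 'email'
```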
