Pandas for Data Analysts — Top 20 Functions You Must Know (2026 Interview Guide)
📊 Data Analyst


Quick Answer
The most tested Pandas functions in data analyst interviews: groupby() + agg(), merge(), pivot_table(), apply() + lambda, and data cleaning (dropna, fillna, astype). Master these 5 groups and you will handle 85% of Python interview questions at any company.

Python interviews for data analyst roles are really Pandas interviews. Out of every 10 Python questions you will face, 8 will use Pandas. But Pandas has 200+ functions — knowing which ones to focus on is what separates prepared candidates from those who studied the wrong things.

This guide gives you the exact 20 functions that appear most in interviews, with real code examples you can run today, plus the interview questions each one powers.

1. groupby() — The Most Important Function

Definition: groupby() splits a DataFrame into groups based on one or more columns, then applies an aggregation function to each group. It is the Pandas equivalent of SQL’s GROUP BY clause.

# Revenue and order count per category
result = df.groupby('category').agg(
    total_revenue=('amount', 'sum'),
    order_count=('order_id', 'count'),
    avg_order=('amount', 'mean')
).reset_index().sort_values('total_revenue', ascending=False)

# Multiple groupby columns
df.groupby(['region', 'category'])['amount'].sum().reset_index()
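A frequent follow-up to groupby() + agg() is groupby().transform(), which returns a result aligned to the original rows, so each row can be compared against its group's aggregate. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical sales data for illustration
df = pd.DataFrame({
    'category': ['A', 'A', 'B', 'B'],
    'amount': [100, 300, 150, 50],
})

# transform('sum') broadcasts each group's total back to every row,
# unlike agg(), which collapses each group to one row
df['category_total'] = df.groupby('category')['amount'].transform('sum')
df['share_of_category'] = df['amount'] / df['category_total']
```

This is the Pandas analogue of a SQL window aggregate (SUM(amount) OVER (PARTITION BY category)).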
💡
Interview Tip: Always use .agg() with named aggregations (keyword=('column', 'func') pairs) rather than .agg({'col': 'sum'}) — the output column names are cleaner and the code is more readable. Interviewers notice this.

2. merge() — Joining DataFrames

Definition: merge() combines two DataFrames based on a common column or index. It is the Pandas equivalent of SQL JOIN. Supports inner, left, right, outer, and cross joins.

# Inner join (only matching rows)
merged = customers.merge(orders, on='customer_id', how='inner')

# Left join (all customers, even those with no orders)
merged = customers.merge(orders, on='customer_id', how='left')

# Anti-join: customers with NO orders
merged = customers.merge(orders, on='customer_id', how='left', indicator=True)
no_orders = merged[merged['_merge'] == 'left_only']

3. pivot_table() — Cross Tabulation

Definition: pivot_table() creates a spreadsheet-style pivot table from a DataFrame. You define rows, columns, values and the aggregation function. Perfect for revenue-by-region-by-category type analyses.

# Revenue by region (rows) and category (columns)
pivot = df.pivot_table(
    values='amount',
    index='region',
    columns='category',
    aggfunc='sum',
    fill_value=0,
    margins=True  # adds row/column totals
)
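It can help to know that pivot_table() is essentially groupby() followed by unstack(). A sketch of that relationship, using hypothetical data:

```python
import pandas as pd

# Hypothetical data to show the relationship
df = pd.DataFrame({
    'region': ['North', 'North', 'South'],
    'category': ['X', 'Y', 'X'],
    'amount': [10, 20, 30],
})

pivot = df.pivot_table(values='amount', index='region',
                       columns='category', aggfunc='sum', fill_value=0)

# The same 2D result via groupby + unstack:
# group on both keys, aggregate, then move 'category' into the columns
via_groupby = (df.groupby(['region', 'category'])['amount']
                 .sum()
                 .unstack(fill_value=0))
```

If an interviewer asks "what's the difference between groupby and pivot_table", being able to show this equivalence is a strong answer.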

4. apply() + lambda — Custom Transformations

Definition: apply() lets you apply any function element-wise to a column or row-wise across a DataFrame. Combined with lambda, it's the most flexible transformation tool in Pandas.

# Segment customers by spend
df['segment'] = df['total_spend'].apply(
    lambda x: 'High' if x > 50000 else ('Medium' if x > 10000 else 'Low')
)

# Apply custom function to multiple columns
def clean_name(name):
    return name.strip().title() if isinstance(name, str) else name

df['clean_name'] = df['customer_name'].apply(clean_name)
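One caveat worth mentioning in an interview: apply() with a lambda runs row by row in Python, which is slow on large data. A vectorised alternative for the same segmentation, sketched with NumPy's np.select (spend thresholds reused from the example above):

```python
import numpy as np
import pandas as pd

# Hypothetical spend values for illustration
df = pd.DataFrame({'total_spend': [60000, 20000, 5000]})

# Conditions are checked in order, like an if/elif chain,
# but evaluated on whole arrays at once instead of row by row
conditions = [df['total_spend'] > 50000, df['total_spend'] > 10000]
df['segment'] = np.select(conditions, ['High', 'Medium'], default='Low')
```

Mentioning this trade-off (apply is flexible, np.select or pd.cut is fast) signals production experience.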

5. Data Cleaning Functions

Real-world data is messy. Data cleaning questions are asked in virtually every data analyst interview. Master these functions:

# Check for missing values
df.isnull().sum()
df.isnull().sum() / len(df) * 100  # percentage missing per column

# Remove rows with missing values
df.dropna(subset=['email', 'phone'], inplace=True)

# Fill missing values (assign back — inplace fills on a column slice are
# unreliable under modern pandas copy-on-write behaviour)
df['age'] = df['age'].fillna(df['age'].median())
df['category'] = df['category'].fillna('Unknown')

# Fix data types
df['order_date'] = pd.to_datetime(df['order_date'])
df['amount'] = df['amount'].astype(float)

# Remove duplicates (returns a new DataFrame — assign it back)
df = df.drop_duplicates(subset=['customer_id', 'order_date'], keep='last')
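One cleaning scenario that comes up often in interviews is a single column containing mixed date formats. A sketch of one way to handle it, assuming pandas 2.0+ (which added format='mixed') and a hypothetical column:

```python
import pandas as pd

# Hypothetical column with inconsistent date strings
s = pd.Series(['2026-01-15', 'Jan 15, 2026', 'not a date'])

# format='mixed' infers the format per element;
# errors='coerce' turns unparseable values into NaT instead of raising
dates = pd.to_datetime(s, format='mixed', errors='coerce')
```

After converting, check dates.isna().sum() to see how many values failed to parse before deciding whether to drop or investigate them.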

More Essential Pandas Functions

Function        | What it does                          | Interview use case
value_counts()  | Frequency count of unique values      | Distribution of categories, top values
sort_values()   | Sort DataFrame by column(s)           | Top N customers, ranking
query()         | Filter rows using a string expression | Cleaner filtering than boolean indexing
rolling()       | Rolling window calculations           | 7-day moving average, rolling sum
shift()         | Shift values by N periods             | Previous period comparison (like SQL LAG)
str methods     | String operations on text columns     | Data cleaning, extraction
cut() / qcut()  | Bin continuous data into categories   | Age groups, spend buckets
melt()          | Wide to long format                   | Reshape for visualisation
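A quick sketch of four of the functions from the table, run on a small hypothetical series of daily sales:

```python
import pandas as pd

# Hypothetical daily sales for illustration
df = pd.DataFrame({'day': pd.date_range('2026-01-01', periods=5),
                   'sales': [10, 20, 30, 40, 50]})

# rolling(): 3-day moving average (first two rows are NaN — not enough history)
df['ma3'] = df['sales'].rolling(window=3).mean()

# shift(): previous day's sales (the SQL LAG pattern)
df['prev'] = df['sales'].shift(1)

# cut(): bin continuous values into labelled buckets
df['bucket'] = pd.cut(df['sales'], bins=[0, 25, 100], labels=['Low', 'High'])

# melt(): reshape wide columns into long format for plotting
long_df = df.melt(id_vars='day', value_vars=['sales', 'ma3'])
```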

Top Pandas Interview Questions 2026

  1. Find the top 3 customers by revenue in each city using groupby
  2. Merge two DataFrames and find customers who have never ordered (anti-join)
  3. Create a pivot table of revenue by region and product category with totals
  4. Clean a dataset with mixed date formats, missing values and duplicate rows
  5. Calculate month-over-month growth using shift() or diff()
  6. Find users who were active for 3+ consecutive days using date arithmetic
  7. Write a function that segments customers into High/Medium/Low using apply()
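Question 1 is the classic "top N per group" pattern. One common solution, sketched on hypothetical data:

```python
import pandas as pd

# Hypothetical revenue per customer per city
df = pd.DataFrame({
    'city': ['Delhi'] * 4 + ['Mumbai'] * 2,
    'customer': list('ABCDEF'),
    'revenue': [500, 300, 400, 100, 250, 150],
})

# Sort by revenue first, then take the first 3 rows of each city group
top3 = (df.sort_values('revenue', ascending=False)
          .groupby('city')
          .head(3)
          .sort_values(['city', 'revenue'], ascending=[True, False]))
```

An equivalent answer uses groupby().rank() and filters on rank <= 3, which maps directly onto the SQL ROW_NUMBER() approach interviewers often ask you to compare against.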
⚠️
Common Pandas Mistake: Forgetting reset_index() after groupby. The grouping keys become the index of the result (a MultiIndex when you group by several columns) — call .reset_index() to get a clean, flat DataFrame. Skipping this causes confusing errors downstream.

⭐ Key Takeaways

  • groupby() + agg() is the most tested Pandas function — master it first with multiple aggregations
  • merge() covers all SQL JOIN types — know inner, left, and the anti-join pattern (left + indicator)
  • pivot_table() is for 2D cross-tabulation — use it when you need rows AND columns to be categories
  • apply() + lambda handles any custom transformation that built-in functions can’t
  • Data cleaning: isnull(), dropna(), fillna(), astype(), drop_duplicates() — memorise all 5
  • Always reset_index() after groupby, use .copy() to avoid SettingWithCopyWarning
❓ Frequently Asked Questions
What Pandas functions are most tested in data analyst interviews?
The most tested: groupby() with named aggregations, merge() for all join types, pivot_table() for cross-tabulation, apply() + lambda for transformations, and data cleaning functions (dropna, fillna, astype, drop_duplicates). groupby is asked in virtually every Python data analyst interview — master it first.
What is the difference between groupby and pivot_table in Pandas?
groupby() splits data by one or more columns and aggregates each group — output is a single column of values per group. pivot_table() creates a 2D matrix where rows are one categorical variable and columns are another, with aggregated values in the cells. Use groupby for single-dimension aggregation. Use pivot_table when you want to compare across two dimensions simultaneously.
How do I handle missing values in Pandas for an interview?
The correct approach: (1) First diagnose — df.isnull().sum() to see which columns have missing values and how many. (2) Understand why values are missing — random missingness vs systematic. (3) Choose strategy: drop rows (dropna) if < 5% missing and random; fill with median for numerical data with outliers; fill with mode for categorical; fill with 0 for count-like fields. Always document your choice and reasoning — interviewers want to hear your decision logic.
What is the equivalent of SQL LAG function in Pandas?
The Pandas equivalent of SQL LAG() is the shift() function. df['prev_month_revenue'] = df['revenue'].shift(1) gives the previous row's value. For LAG within groups (PARTITION BY in SQL), use df.groupby('category')['revenue'].shift(1). The LEAD equivalent is shift(-1), which gives the next row's value.
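A runnable sketch of the grouped LAG pattern described above, on hypothetical monthly revenue:

```python
import pandas as pd

# Hypothetical monthly revenue per category
df = pd.DataFrame({
    'category': ['A', 'A', 'A', 'B', 'B'],
    'month': ['2026-01', '2026-02', '2026-03', '2026-01', '2026-02'],
    'revenue': [100, 120, 150, 200, 180],
})

# Sort first so shift(1) really means "previous month within the category"
df = df.sort_values(['category', 'month'])

# Equivalent to: LAG(revenue) OVER (PARTITION BY category ORDER BY month)
df['prev_revenue'] = df.groupby('category')['revenue'].shift(1)
df['mom_growth'] = (df['revenue'] - df['prev_revenue']) / df['prev_revenue']
```

The first row of each category gets NaN for prev_revenue, exactly as LAG() returns NULL for the first row of each partition.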

Practice Pandas with real interview questions

Our Python Hub covers all 20 Pandas functions with exercises, real company questions and complete solutions.

Pandas Complete Guide →
Prakhar Shrivastava
Founder · 10+ years in analytics · 800+ candidates mentored
Former analytics lead at top product companies. Helping India’s data analysts crack interviews with practical, job-ready preparation.
