Pandas for Data Analysts — Top 20 Functions You Must Know (2026 Interview Guide)
Python interviews for data analyst roles are really Pandas interviews. Out of every 10 Python questions you will face, 8 will use Pandas. But Pandas has 200+ functions — knowing which ones to focus on is what separates prepared candidates from those who studied the wrong things.
This guide gives you the exact 20 functions that appear most in interviews, with real code examples you can run today, plus the interview questions each one powers.
1. groupby() — The Most Important Function
Definition: groupby() splits a DataFrame into groups based on one or more columns, then applies an aggregation function to each group. It is the Pandas equivalent of SQL’s GROUP BY clause.
# Revenue and order count per category
result = df.groupby('category').agg(
total_revenue=('amount', 'sum'),
order_count=('order_id', 'count'),
avg_order=('amount', 'mean')
).reset_index().sort_values('total_revenue', ascending=False)
# Multiple groupby columns
df.groupby(['region', 'category'])['amount'].sum().reset_index()
2. merge() — Joining DataFrames
Definition: merge() combines two DataFrames based on a common column or index. It is the Pandas equivalent of SQL JOIN. Supports inner, left, right, outer, and cross joins.
# Inner join (only matching rows)
merged = customers.merge(orders, on='customer_id', how='inner')
# Left join (all customers, even those with no orders)
merged = customers.merge(orders, on='customer_id', how='left')
# Anti-join: customers with NO orders
merged = customers.merge(orders, on='customer_id', how='left', indicator=True)
no_orders = merged[merged['_merge'] == 'left_only']
3. pivot_table() — Cross Tabulation
Definition: pivot_table() creates a spreadsheet-style pivot table from a DataFrame. You define rows, columns, values and the aggregation function. Perfect for revenue-by-region-by-category type analyses.
# Revenue by region (rows) and category (columns)
pivot = df.pivot_table(
values='amount',
index='region',
columns='category',
aggfunc='sum',
fill_value=0,
margins=True # adds row/column totals
)
4. apply() + lambda — Custom Transformations
apply() lets you apply any function to a column or row. Combined with lambda, it’s the most flexible transformation tool in Pandas.
# Segment customers by spend
df['segment'] = df['total_spend'].apply(
lambda x: 'High' if x > 50000 else ('Medium' if x > 10000 else 'Low')
)
# Apply custom function to multiple columns
def clean_name(name):
return name.strip().title() if isinstance(name, str) else name
df['clean_name'] = df['customer_name'].apply(clean_name)
5. Data Cleaning Functions
Real-world data is messy. Data cleaning questions are asked in virtually every data analyst interview. Master these functions:
# Check for missing values
df.isnull().sum()
df.isnull().sum() / len(df) * 100 # percentage missing per column
# Remove rows with missing values
df.dropna(subset=['email', 'phone'], inplace=True)
# Fill missing values
df['age'].fillna(df['age'].median(), inplace=True)
df['category'].fillna('Unknown', inplace=True)
# Fix data types
df['order_date'] = pd.to_datetime(df['order_date'])
df['amount'] = df['amount'].astype(float)
# Remove duplicates
df.drop_duplicates(subset=['customer_id', 'order_date'], keep='last')
More Essential Pandas Functions
| Function | What it Does | Interview Use Case |
|---|---|---|
| value_counts() | Frequency count of unique values | Distribution of categories, top values |
| sort_values() | Sort DataFrame by column(s) | Top N customers, ranking |
| query() | Filter rows using string expression | Cleaner filtering than boolean indexing |
| rolling() | Rolling window calculations | 7-day moving average, rolling sum |
| shift() | Shift values by N periods | Previous period comparison (like SQL LAG) |
| str methods | String operations on text columns | Data cleaning, extraction |
| cut() / qcut() | Bin continuous data into categories | Age groups, spend buckets |
| melt() | Wide to long format | Reshape for visualisation |
Top Pandas Interview Questions 2026
- Find the top 3 customers by revenue in each city using groupby
- Merge two DataFrames and find customers who have never ordered (anti-join)
- Create a pivot table of revenue by region and product category with totals
- Clean a dataset with mixed date formats, missing values and duplicate rows
- Calculate month-over-month growth using shift() or diff()
- Find users who were active for 3+ consecutive days using date arithmetic
- Write a function that segments customers into High/Medium/Low using apply()
⭐ Key Takeaways
- groupby() + agg() is the most tested Pandas function — master it first with multiple aggregations
- merge() covers all SQL JOIN types — know inner, left, and the anti-join pattern (left + indicator)
- pivot_table() is for 2D cross-tabulation — use it when you need rows AND columns to be categories
- apply() + lambda handles any custom transformation that built-in functions can’t
- Data cleaning: isnull(), dropna(), fillna(), astype(), drop_duplicates() — memorise all 5
- Always reset_index() after groupby, use .copy() to avoid SettingWithCopyWarning
Practice Pandas with real interview questions
Our Python Hub covers all 20 Pandas functions with exercises, real company questions and complete solutions.
Pandas Complete Guide →