Foundations

What is SQL? β€” Complete Beginner Guide

SQL (Structured Query Language) is the standard language for managing and querying data stored in relational databases. It is the single most important technical skill for data analysts β€” required in over 90% of analyst job listings in India and globally.

Why SQL matters: SQL lets you pull, filter, group, join and transform data stored in databases β€” exactly what data analysts do every single day. Without SQL, you cannot do the job. With SQL, you can answer almost any business question from data.

πŸ—„οΈ

Relational Databases

SQL works with relational databases like MySQL, PostgreSQL, BigQuery, Snowflake, SQL Server β€” all widely used at companies hiring analysts.

πŸ“‹

Tables, Rows & Columns

Data is stored in tables. Each row is a record (e.g. one order). Each column is an attribute (e.g. order_amount, customer_id, date).

πŸ”

Querying Data

You write SQL queries to ask questions: “Which customers spent the most last month?” SQL retrieves exactly the data you need from millions of rows.

⚑

Why It’s Tested

Interviews test SQL because it directly mirrors real analyst work. Every analyst query you write on the job is SQL β€” interviewers want to see you can do it.

SQL vs other tools β€” where it fits

ToolWhat it doesWhen analysts use it
SQLQuery & transform data in databasesDaily β€” pulling any data from any database
Python (Pandas)Analyse & visualise data in-memoryComplex analysis, ML, automation
ExcelAd-hoc analysis, small datasetsQuick calculations, sharing with non-tech teams
Power BI / TableauInteractive dashboards & chartsReporting, presenting to stakeholders
Topic 1 β€” Beginner

SELECT, WHERE, ORDER BY β€” SQL Basics

Every SQL query starts with SELECT. These fundamentals are the building blocks of all SQL β€” even complex window function queries use SELECT at their core.

1The basic SELECT statement

SELECT retrieves data from a table. You specify which columns you want and which table to get them from. The asterisk (*) means “all columns.”

SQL
— Select all columns from orders table
SELECT *
FROM orders;
 
— Select only specific columns
SELECT order_id, customer_name, order_amount
FROM orders;
 
— Give columns a readable alias
SELECT
  customer_name AS “Customer”,
  order_amount AS “Order Value (β‚Ή)”
FROM orders;
Key rules: Column names go after SELECT separated by commas. The table name goes after FROM. AS creates a readable alias for any column. Semicolons end the query.
2WHERE β€” filtering rows

WHERE filters which rows are returned. It works like a condition β€” only rows where the condition is TRUE are included in the result.

SQL
— Single condition
SELECT * FROM orders
WHERE order_amount > 5000;
 
— Multiple conditions with AND / OR
SELECT * FROM orders
WHERE order_amount > 5000
  AND region = ‘North’;
 
— IN β€” match any value in a list
SELECT * FROM customers
WHERE city IN (‘Mumbai’, ‘Delhi’, ‘Bengaluru’);
 
— LIKE β€” pattern matching for text
SELECT * FROM products
WHERE product_name LIKE ‘%phone%’;
 
— BETWEEN β€” range filter
SELECT * FROM orders
WHERE order_date BETWEEN ‘2024-01-01’ AND ‘2024-03-31’;
Interview tip: LIKE with % means “anything.” '%phone%' matches “smartphone”, “phone case”, “headphones”. IS NULL checks for missing values β€” very common in data cleaning questions.
3ORDER BY and LIMIT

ORDER BY sorts results. LIMIT controls how many rows are returned. Together they answer “show me the top N…” questions.

SQL
— Top 10 orders by value
SELECT customer_name, order_amount
FROM orders
ORDER BY order_amount DESC
LIMIT 10;
 
— Sort by multiple columns
SELECT region, product_name, revenue
FROM sales
ORDER BY region ASC, revenue DESC;
ASC = smallest to largest (default). DESC = largest to smallest. Use DESC when you want “top N” β€” highest revenue, most orders, latest dates.
Topic 3 β€” Intermediate

SQL JOINs β€” Complete Guide

JOINs combine data from two or more tables based on a related column. This is the most commonly tested SQL topic at all levels. Understanding all 4 join types is essential.

βœ“

INNER JOIN

Returns only rows that have a match in both tables. Most common join. Use when you only want records that exist in both.

ON a.id = b.id

LEFT JOIN

Returns all rows from the left table, plus matching rows from the right. Non-matching right rows become NULL. Most used in analytics.

LEFT JOIN b ON a.id = b.id

RIGHT JOIN

Returns all rows from the right table, plus matching rows from the left. Rare in practice β€” analysts usually rewrite as a LEFT JOIN.

RIGHT JOIN b ON a.id = b.id

FULL OUTER JOIN

Returns all rows from both tables. Non-matching rows from either side become NULL. Use to find records missing from either table.

FULL OUTER JOIN b ON a.id = b.id
SQL β€” JOIN Examples
— INNER JOIN: customers who placed orders
SELECT c.customer_name, o.order_date, o.order_amount
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;
 
— LEFT JOIN: ALL customers, even those with no orders
SELECT c.customer_name, COUNT(o.order_id) AS total_orders
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_name;
 
— Interview classic: customers with NO orders (NULL trick)
SELECT c.customer_name
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL;
Interview golden rule: When asked to find records that DON’T exist in another table (e.g. “customers with no orders”), always use LEFT JOIN + WHERE right_table.id IS NULL. This pattern comes up constantly.
Topic 2 β€” Beginner/Intermediate

GROUP BY & Aggregate Functions

Aggregate functions summarise many rows into a single value. GROUP BY groups rows with the same value so you can aggregate per group β€” like “total revenue per region.”

FunctionWhat it doesExample
COUNT(*)Count all rows (including NULLs)COUNT(*) AS total_orders
COUNT(col)Count non-NULL values in a columnCOUNT(order_id)
SUM(col)Add up all values in a columnSUM(revenue) AS total_revenue
AVG(col)Average of all valuesAVG(order_amount) AS avg_order
MIN(col)Smallest valueMIN(order_date) AS first_order
MAX(col)Largest valueMAX(order_amount) AS biggest_order
SQL β€” GROUP BY Examples
— Revenue by region
SELECT region, SUM(order_amount) AS total_revenue
FROM orders
GROUP BY region
ORDER BY total_revenue DESC;
 
— HAVING filters AFTER grouping (WHERE filters BEFORE)
SELECT customer_id, COUNT(*) AS total_orders
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 5;
 
— Multi-level grouping: revenue by region AND product category
SELECT region, category, SUM(revenue) AS total
FROM sales
GROUP BY region, category
ORDER BY region, total DESC;
HAVING vs WHERE: WHERE filters individual rows before grouping. HAVING filters groups after grouping. A common interview trick is asking you to find groups where an aggregate condition is true β€” always use HAVING for that.
Topic 4 β€” Intermediate

Subqueries & CTEs (WITH Clauses)

When a single SELECT isn’t enough, you nest queries inside queries (subqueries) or break them into named steps (CTEs). CTEs are the cleaner, modern approach that interviewers love to see.

1Subqueries β€” queries inside queries

A subquery is a SELECT statement nested inside another. The inner query runs first, then the outer query uses its result.

SQL β€” Subquery
— Find customers with above-average order value
SELECT customer_name, order_amount
FROM orders
WHERE order_amount > (
  SELECT AVG(order_amount) FROM orders
);
 
— Classic interview Q: 2nd highest salary
SELECT MAX(salary) AS second_highest
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
2CTEs with WITH β€” cleaner and preferred

CTEs (Common Table Expressions) use the WITH keyword to define named temporary tables. They make complex queries readable and are strongly preferred in interviews over deeply nested subqueries.

SQL β€” CTEs
— CTE: find top customers per region
WITH regional_totals AS (
  SELECT region, customer_id,
    SUM(order_amount) AS total_spend
  FROM orders
  GROUP BY region, customer_id
),
ranked AS (
  SELECT *,
    RANK() OVER (
      PARTITION BY region
      ORDER BY total_spend DESC
    ) AS rnk
  FROM regional_totals
)
SELECT * FROM ranked WHERE rnk = 1;
Best practice: Always use CTEs over deeply nested subqueries in interviews. It shows you write maintainable, readable SQL β€” exactly what companies want in a working analyst.
Topic 5 β€” Advanced (Most Tested)

Window Functions β€” RANK, LAG, LEAD & More

Window functions are the single most tested SQL topic in senior data analyst interviews. They perform calculations across a set of rows related to the current row β€” without collapsing the result like GROUP BY does.

Key concept: Window functions do NOT reduce rows like GROUP BY. You get one result row per input row β€” the window function just adds extra calculated columns alongside the original data.

PARTITION BY dept β€” RANK() by salary within each department
emp_namedeptsalaryRANK()DENSE_RANK()ROW_NUMBER()
RahulAnalyticsβ‚Ή90,000111
PriyaAnalyticsβ‚Ή90,000112
AmitAnalyticsβ‚Ή75,000323
NehaEngineeringβ‚Ή1,20,000111
VikramEngineeringβ‚Ή95,000222
FunctionWhat it doesKey difference
RANK()Rank rows β€” ties get same rank, next rank skips1, 1, 3 (skips 2)
DENSE_RANK()Rank rows β€” ties get same rank, no skipping1, 1, 2 (no skip)
ROW_NUMBER()Unique sequential number β€” no ties ever1, 2, 3 always
LAG(col, n)Value from n rows before current rowPrevious month’s revenue
LEAD(col, n)Value from n rows after current rowNext month’s target
SUM() OVER()Running total or sum per partitionCumulative revenue
AVG() OVER()Moving average across a window7-day rolling average
NTILE(n)Divide rows into n equal bucketsQuartiles, deciles
SQL β€” Window Functions
— Month-over-month revenue change with LAG
SELECT
  month,
  revenue,
  LAG(revenue, 1) OVER (ORDER BY month) AS prev_month,
  revenue LAG(revenue, 1) OVER (ORDER BY month) AS mom_change
FROM monthly_revenue;
 
— Running total (cumulative revenue)
SELECT
  order_date,
  daily_revenue,
  SUM(daily_revenue) OVER (ORDER BY order_date) AS cumulative_revenue
FROM daily_sales;
 
— Top 1 customer per region (classic interview question)
WITH ranked AS (
  SELECT region, customer_name, total_spend,
    ROW_NUMBER() OVER (
      PARTITION BY region ORDER BY total_spend DESC
    ) AS rn
  FROM customer_totals
)
SELECT * FROM ranked WHERE rn = 1;
Interview tip: Use ROW_NUMBER() (not RANK) when you need exactly one result per partition β€” e.g. “the top customer per region.” RANK() would return multiple rows if there’s a tie at the top.
Practice Tool

Live SQL Compiler β€” Practice Right Here

Write and run real SQL queries in your browser. Pre-loaded with sample datasets matching real interview scenarios. No account or setup needed.

SQL Practice Compiler ● Live
SQL Editor
Tables: customers Β· orders Β· products Β· employees
Try:
Query Results
Click “Run Query” to see results
Database Schema β€” click to inspect
customers(id, customer_name, city, email, total_orders, join_date)
Interview Prep

Real SQL Interview Questions with Answers

These exact questions have been asked at Google, Amazon, Flipkart, Swiggy, Paytm and other top companies. Click any question to see the full answer and SQL solution.

Easy
Find the second highest salary from the Employee table without using LIMIT or TOP.
TCSInfosysWipro
+
Use a subquery to exclude the maximum salary, then find the MAX of what’s left. This is a classic interview pattern for finding Nth highest values.
SELECT MAX(salary) AS second_highest_salary
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
Alternative using DENSE_RANK (better for Nth):
WITH ranked AS (
  SELECT salary, DENSE_RANK() OVER(ORDER BY salary DESC) AS rnk
  FROM employees
)
SELECT DISTINCT salary FROM ranked WHERE rnk = 2;
Easy
Find all customers who placed orders in 2024 but NOT in 2023.
FlipkartMeesho
+
Use NOT IN with a subquery, or LEFT JOIN with IS NULL. The LEFT JOIN approach is generally preferred for performance on large datasets.
— Method 1: NOT IN
SELECT DISTINCT customer_id FROM orders
WHERE YEAR(order_date) = 2024
  AND customer_id NOT IN (
    SELECT DISTINCT customer_id FROM orders
    WHERE YEAR(order_date) = 2023
  );
Medium
For each department, find the employee with the highest salary. Return department, employee name, and salary.
AmazonPaytmRazorpay
+
The classic window function interview question. Use ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) and filter where rn = 1.
WITH ranked AS (
  SELECT
    department,
    employee_name,
    salary,
    ROW_NUMBER() OVER(
      PARTITION BY department
      ORDER BY salary DESC
    ) AS rn
  FROM employees
)
SELECT department, employee_name, salary
FROM ranked
WHERE rn = 1;
Medium
Calculate the month-over-month revenue growth rate for each month in 2024.
SwiggyZomatoPhonePe
+
Use LAG() to get the previous month’s revenue, then calculate the percentage change. This tests both window functions and arithmetic in SQL.
SELECT
  month,
  revenue,
  LAG(revenue) OVER(ORDER BY month) AS prev_revenue,
  ROUND(
    (revenue LAG(revenue) OVER(ORDER BY month))
    * 100.0
    / LAG(revenue) OVER(ORDER BY month),
    2
  ) AS mom_growth_pct
FROM monthly_revenue
WHERE year = 2024;
Hard
Find users who made purchases on 3 or more consecutive days within the last 30 days.
GoogleMetaUber
+
This is a classic consecutive days problem. The trick is to subtract the ROW_NUMBER from the date β€” rows that are consecutive will have the same “group date.” Count groups with 3+ rows.
WITH daily_activity AS (
  SELECT DISTINCT user_id, DATE(purchase_date) AS pdate
  FROM purchases
  WHERE purchase_date >= CURRENT_DATE 30
),
grouped AS (
  SELECT user_id, pdate,
    pdate ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY pdate) AS grp
  FROM daily_activity
)
SELECT DISTINCT user_id
FROM grouped
GROUP BY user_id, grp
HAVING COUNT(*) >= 3;
Hard
Calculate the 7-day rolling average of daily active users for each product.
GoogleAirbnbWalmart
+
Use AVG() as a window function with ROWS BETWEEN 6 PRECEDING AND CURRENT ROW to define a 7-day window. PARTITION BY product_id ensures separate rolling averages per product.
SELECT
  product_id,
  activity_date,
  dau,
  AVG(dau) OVER (
    PARTITION BY product_id
    ORDER BY activity_date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS rolling_7d_avg
FROM daily_active_users
ORDER BY product_id, activity_date;
Quick Reference

SQL Cheat Sheet β€” Interview Quick Reference

The most important SQL syntax, functions and patterns in one place. Save this page and review before every interview.

Query execution order

SQL executes in this order β€” NOT the order you write it: FROM β†’ JOIN β†’ WHERE β†’ GROUP BY β†’ HAVING β†’ SELECT β†’ ORDER BY β†’ LIMIT. This is why you can’t use a SELECT alias in a WHERE clause β€” SELECT hasn’t run yet when WHERE executes.

SELECTFROMWHEREGROUP BY HAVINGORDER BYLIMITJOIN LEFT JOININNER JOINUNIONDISTINCT IS NULLINBETWEENLIKE COUNT()SUM()AVG()RANK() LAG()LEAD()OVER()PARTITION BY WITH (CTE)CASE WHENCOALESCE()CAST()
PatternSQLUse case
Top N per groupROW_NUMBER() OVER(PARTITION BY grp ORDER BY val DESC)Top customer per region
Running totalSUM(val) OVER(ORDER BY date)Cumulative revenue
% of totalval * 100.0 / SUM(val) OVER()Product share of revenue
Previous valueLAG(val, 1) OVER(ORDER BY date)MoM growth calculation
Anti-join (NOT IN)LEFT JOIN ... WHERE b.id IS NULLCustomers with no orders
Conditional countCOUNT(CASE WHEN status='paid' THEN 1 END)Count by status in one query
DeduplicateROW_NUMBER() OVER(PARTITION BY id ORDER BY date DESC) = 1Keep latest record per id
Pivot (manual)SUM(CASE WHEN cat='A' THEN val END) AS cat_aRows to columns

Ready to practise with a real mentor?

Book a free 30-min SQL mock interview. We’ll run through real questions, give you live feedback, and build your personalised prep plan.

Book Free SQL Session