Core Python Built-in Libraries | Data Analyst Interview


🔧 Built-in Libraries

Core Python Built-in Libraries for Data Analysts

datetime, collections, itertools, os: the standard library modules that come up in real data engineering and analytics interviews, with no pip install needed.


Core Concepts

Built-in libraries that appear in interviews

These standard library modules come up in data engineering and analytics interviews — no pip install required.

1. datetime

Parse, format, and calculate dates and times. Essential for any time-series or business analytics question.

datetime.now() and date.today()
strptime() to parse strings
timedelta for date arithmetic

2. collections

Counter, defaultdict, OrderedDict, namedtuple — data structures that replace verbose manual code.

Counter for frequency counts
defaultdict to avoid KeyError
deque for efficient queues
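As a minimal sketch of the deque bullet, a fixed-length deque can hold a rolling window over a stream of values. The rolling_mean helper and the sample numbers are made up for illustration.

```python
from collections import deque


def rolling_mean(values, window):
    """Yield the mean of the last `window` values once the window is full."""
    buf = deque(maxlen=window)  # the oldest item is dropped automatically
    for v in values:
        buf.append(v)
        if len(buf) == window:
            yield sum(buf) / window


daily_orders = [10, 20, 30, 40, 50]  # hypothetical sample data
means = list(rolling_mean(daily_orders, 3))
print(means)  # [20.0, 30.0, 40.0]
```

Appending to and popping from either end of a deque is O(1), whereas list.pop(0) shifts every remaining element.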

3. itertools

chain, combinations, product, groupby — functional tools for combinatorics and data pipelines.

chain() to flatten iterables
combinations and permutations
groupby for run-length encoding
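The groupby and combinations bullets can be sketched as follows. The run_length_encode helper and the sample data are illustrative assumptions, not part of itertools.

```python
from itertools import combinations, groupby


def run_length_encode(seq):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(key, sum(1 for _ in group)) for key, group in groupby(seq)]


statuses = ["up", "up", "down", "up", "up", "up"]  # hypothetical sample data
encoded = run_length_encode(statuses)
print(encoded)  # [('up', 2), ('down', 1), ('up', 3)]

# Every unordered pair from three items
pairs = list(combinations(["A", "B", "C"], 2))
print(pairs)  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

groupby only merges adjacent equal items, which is exactly what run-length encoding needs. For SQL-style grouping, sort by the same key first.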

4. os & pathlib

Navigate file systems, list directories, and build portable file paths — essential for ETL scripts.

os.path.join for safe paths
pathlib.Path — modern alternative
os.listdir / glob for file discovery

5. json & csv

Read and write the two most common data interchange formats without external libraries.

json.load / json.dumps
csv.DictReader for CSV parsing
Handling encoding and delimiters
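A small round-trip sketch of the json bullet: dumps and loads work on strings, while dump and load work on file-like objects. The record is invented sample data, and io.StringIO stands in for a real file here.

```python
import io
import json

record = {"user_id": 42, "tags": ["new", "trial"], "active": True}  # hypothetical

# Serialise to a string; sort_keys gives a stable key order
text = json.dumps(record, sort_keys=True)
print(text)  # {"active": true, "tags": ["new", "trial"], "user_id": 42}

# json.loads parses a string; json.load parses a file-like object
from_string = json.loads(text)
from_file = json.load(io.StringIO(text))
print(from_string == record and from_file == record)  # True
```

With a real file, open it with an explicit encoding such as encoding="utf-8" so the result does not depend on the platform default.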

6. functools & operator

reduce, partial, lru_cache — functional programming tools that make pipelines cleaner and faster.

functools.reduce for aggregation
functools.lru_cache for memoisation
operator.itemgetter for sorting
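The reduce, itemgetter, and partial tools named above might look like this in practice. All data values and variable names are hypothetical.

```python
from functools import partial, reduce
from operator import itemgetter, mul

# reduce folds a sequence into one value: ((1.0 * 1.5) * 2.0) * 4.0
growth_factors = [1.5, 2.0, 4.0]
compound = reduce(mul, growth_factors, 1.0)
print(compound)  # 12.0

# itemgetter(1) builds a key function that picks the second field
sales = [("north", 120), ("south", 340), ("east", 90)]
ranked = sorted(sales, key=itemgetter(1), reverse=True)
print(ranked)  # [('south', 340), ('north', 120), ('east', 90)]

# partial pre-fills arguments of an existing callable
parse_binary = partial(int, base=2)
print(parse_binary("101"))  # 5
```

For plain sums or maxima, the built-ins sum() and max() are clearer than reduce. reduce earns its place when the combining step is custom.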

Interview Example 1

Word frequency with Counter

Counter is the most concise way to count hashable items in Python, and it is a common warm-up question.

from collections import Counter

reviews = ['good', 'great', 'good', 'bad', 'great', 'good']
counts = Counter(reviews)
print(counts)  # Counter({'good': 3, 'great': 2, 'bad': 1})
print(counts.most_common(2))  # [('good', 3), ('great', 2)]

# Works on any iterable of hashable items: strings, lists, tuples, generators
char_freq = Counter("data analyst")

Interview Example 2

Date arithmetic with datetime

Calculate days between dates, find day-of-week, and parse non-standard formats — all come up in time-series questions.

from datetime import datetime, timedelta

signup = datetime.strptime('2025-11-01', '%Y-%m-%d')
purchase = datetime.strptime('2026-01-15', '%Y-%m-%d')
days_to_convert = (purchase - signup).days
print(f"Converted after {days_to_convert} days")  # Converted after 75 days

# Day of week by name; purchase.weekday() would return 3 (Monday is 0)
print(purchase.strftime('%A'))  # Thursday

# Rolling 30-day window end date
window_end = purchase + timedelta(days=30)  # 2026-02-14
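The introduction to this example mentions parsing non-standard formats. A hedged sketch of that follows; the day-first input string, the parse_any helper, and its list of candidate formats are all assumptions for illustration.

```python
from datetime import datetime

raw = "15/01/2026 14:30"  # hypothetical day-first timestamp
parsed = datetime.strptime(raw, "%d/%m/%Y %H:%M")
print(parsed.isoformat())  # 2026-01-15T14:30:00
print(parsed.weekday())    # 3 (Monday is 0, so 3 is Thursday)


def parse_any(text, formats=("%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y")):
    """Try each candidate format in order and return the first match."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"no known format matches {text!r}")


print(parse_any("15/01/2026").date())  # 2026-01-15
```

The order of the candidate formats matters when inputs are ambiguous, so list the format your data source actually uses first.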

Interview Questions

Real built-in library questions asked in 2026

These questions test Python fundamentals beyond the data science libraries.

What is a defaultdict and when would you use it?

defaultdict from collections creates a dictionary that returns a default value instead of raising KeyError when a key is missing. Example: defaultdict(list) starts every new key with an empty list — perfect for grouping data without an explicit check.
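The grouping pattern described above can be sketched like this, using invented order data.

```python
from collections import defaultdict

orders = [("alice", 30), ("bob", 15), ("alice", 45)]  # hypothetical sample data

by_customer = defaultdict(list)
for customer, amount in orders:
    # No "if customer not in by_customer" check is needed
    by_customer[customer].append(amount)

print(dict(by_customer))  # {'alice': [30, 45], 'bob': [15]}

# Caveat: merely reading a missing key inserts it with the default value
print("carol" in by_customer)  # False
_ = by_customer["carol"]
print("carol" in by_customer)  # True
```

Use by_customer.get("carol") or a membership test when you want to look up a key without creating it.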

How would you read a CSV without Pandas?

Use csv.DictReader. It yields each row as a dict keyed by the header names (a plain dict since Python 3.8; an OrderedDict in 3.6 and 3.7), with every value as a string. It is lightweight, handles quoting and custom delimiters, and streams rows one at a time, so the file never has to fit in memory. Pandas is still the usual choice in a real project, but knowing the built-in signals depth.
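A minimal sketch of that answer, assuming a semicolon-delimited file with region and revenue columns. io.StringIO stands in for a real file so the snippet runs anywhere.

```python
import csv
import io

raw = "region;revenue\nnorth;120\nsouth;340\n"  # hypothetical file contents

total = 0
regions = []
# With a real file: open(path, newline="", encoding="utf-8")
with io.StringIO(raw) as f:
    reader = csv.DictReader(f, delimiter=";")
    for row in reader:  # rows are read one at a time
        regions.append(row["region"])
        total += int(row["revenue"])  # values arrive as strings

print(regions, total)  # ['north', 'south'] 460
```

Passing newline="" when opening a real file lets the csv module handle line endings inside quoted fields itself.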

What is functools.lru_cache and when is it useful?

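lru_cache memoises a function: it caches the result for each distinct set of arguments and returns the cached value on repeat calls. The arguments must be hashable. It suits expensive deterministic functions such as recursive computations or repeated date parsing inside loops. It can also cache API lookups, but only when a stale response is acceptable, because entries never expire on their own; they are evicted only when the cache exceeds maxsize.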
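To illustrate, here is a sketch using the classic recursive Fibonacci function, chosen only because it makes the effect of caching easy to see.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # unbounded cache; set maxsize to cap memory use
def fib(n):
    """Naive recursion, made linear-time by the cache."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(fib(30))  # 832040

info = fib.cache_info()
print(info.misses)  # 31, one real computation per n from 0 to 30

fib.cache_clear()  # reset when the underlying data may have changed
```

Without the decorator, fib(30) makes over a million recursive calls. With it, each value of n is computed once.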

How do you flatten a list of lists in Python?

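Three ways: list(itertools.chain.from_iterable(nested)) is lazy and scales well; [x for sub in nested for x in sub] is the equivalent list comprehension; sum(nested, []) works but copies the accumulated list at every step, so it is quadratic and slow for large data. All three flatten only one level of nesting. Leading with itertools shows that you know the standard library.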
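The three approaches side by side, on a small made-up list:

```python
from itertools import chain

nested = [[1, 2], [3], [], [4, 5]]  # hypothetical sample data

flat_chain = list(chain.from_iterable(nested))   # lazy until list() consumes it
flat_comp = [x for sub in nested for x in sub]   # outer loop first, then inner
flat_sum = sum(nested, [])                       # fine for tiny inputs only

print(flat_chain)  # [1, 2, 3, 4, 5]
print(flat_chain == flat_comp == flat_sum)  # True
```

None of these recurses: a value like [[1, [2, 3]]] would come out as [1, [2, 3]].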

What is pathlib and why prefer it over os.path?

pathlib.Path provides an object-oriented API for file paths. Operations like path / 'subdir' / 'file.csv', path.suffix, path.stem, and path.glob('*.csv') are far more readable than their os.path equivalents. It has been part of the standard library since Python 3.4.
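A short sketch of those operations, run inside a temporary directory so it leaves nothing behind. The directory layout and file names are invented for the example.

```python
import os.path
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    raw_dir = Path(tmp) / "raw"  # the / operator joins path segments
    raw_dir.mkdir()
    (raw_dir / "sales_2026.csv").write_text("region,revenue\n")
    (raw_dir / "notes.txt").write_text("ignore me\n")

    # File discovery by pattern
    csv_files = sorted(p.name for p in raw_dir.glob("*.csv"))
    print(csv_files)  # ['sales_2026.csv']

    target = raw_dir / "sales_2026.csv"
    print(target.stem, target.suffix)  # sales_2026 .csv

    # The os.path equivalent builds the same location as a plain string
    same_path = Path(os.path.join(tmp, "raw", "sales_2026.csv")) == target
    print(same_path)  # True
```

Path objects are accepted by open(), csv, and most other standard library functions that take a file name, so mixing the two styles in one script is safe.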

Quick Reference

Core Python built-ins cheat sheet

The standard library tools every data analyst should know without googling.

Module / Function	Purpose
datetime.strptime(str, fmt)	Parse string to datetime object
timedelta(days=n)	Add or subtract days from a date
collections.Counter(iterable)	Frequency count of any iterable
collections.defaultdict(type)	Dict with automatic default values
collections.namedtuple	Lightweight immutable record types with named fields
itertools.chain.from_iterable	Flatten a list of lists
itertools.combinations(iter, r)	All r-length combinations
functools.lru_cache	Memoisation decorator
functools.reduce(func, iter)	Cumulative aggregation
json.load / json.dumps	Parse and serialise JSON
csv.DictReader(file)	Read CSV rows as dicts
pathlib.Path()	Object-oriented file path handling
