Core Python Built-in Libraries for Data Analysts

datetime, collections, itertools, os: the standard library modules that come up in real data engineering and analytics interviews, all available without a pip install.

Core Concepts
Built-in libraries that appear in interviews

Each entry below names a module group and the calls worth knowing from it.

1. datetime

Parse, format, and calculate dates and times. Essential for any time-series or business analytics question.

  • datetime.now() and date.today()
  • strptime() to parse strings
  • timedelta for date arithmetic

2. collections

Counter, defaultdict, OrderedDict, namedtuple — data structures that replace verbose manual code.

  • Counter for frequency counts
  • defaultdict to avoid KeyError
  • deque for efficient queues
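
As a small illustration of the deque and namedtuple entries above, the sketch below keeps a rolling window of the three most recent order amounts. The Order record, its field names, and the sample values are made up for the example.

```python
from collections import deque, namedtuple

# Hypothetical record type; the field names are illustrative only.
Order = namedtuple("Order", ["order_id", "amount"])

orders = [Order(1, 20.0), Order(2, 35.0), Order(3, 50.0), Order(4, 15.0)]

# A deque with maxlen silently drops the oldest item once it is full,
# which gives a simple rolling window.
window = deque(maxlen=3)
rolling_totals = []
for order in orders:
    window.append(order.amount)
    rolling_totals.append(sum(window))

print(rolling_totals)  # [20.0, 55.0, 105.0, 100.0]
```

Appending and popping at either end of a deque takes constant time, whereas list.pop(0) has to shift every remaining element.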

3. itertools

chain, combinations, product, groupby — functional tools for combinatorics and data pipelines.

  • chain() to flatten iterables
  • combinations and permutations
  • groupby for run-length encoding
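
The groupby and combinations entries above can be sketched as follows. The helper name run_length_encode and the sample data are assumptions made for illustration. Note that groupby only groups consecutive equal items, which is exactly what run-length encoding needs; for a global grouping the input would have to be sorted first.

```python
from itertools import combinations, groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(key, len(list(group))) for key, group in groupby(values)]


statuses = ["up", "up", "down", "up", "up", "up"]
encoded = run_length_encode(statuses)
print(encoded)  # [('up', 2), ('down', 1), ('up', 3)]

# Every unordered pair drawn from a few made-up column names
pairs = list(combinations(["price", "qty", "discount"], 2))
print(pairs)  # [('price', 'qty'), ('price', 'discount'), ('qty', 'discount')]
```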

4. os & pathlib

Navigate file systems, list directories, and build portable file paths — essential for ETL scripts.

  • os.path.join for safe paths
  • pathlib.Path — modern alternative
  • os.listdir / glob for file discovery
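
A minimal sketch of file discovery with os and glob, as listed above. It writes a few placeholder files into a temporary directory so that it runs anywhere; the file names are invented for the example.

```python
import glob
import os
import tempfile

# Work inside a throwaway directory so the sketch is self-contained.
with tempfile.TemporaryDirectory() as data_dir:
    for name in ["sales_jan.csv", "sales_feb.csv", "notes.txt"]:
        # os.path.join inserts the correct separator for the current OS
        with open(os.path.join(data_dir, name), "w") as fh:
            fh.write("placeholder\n")

    # os.listdir returns bare names in arbitrary order, so sort them
    all_files = sorted(os.listdir(data_dir))

    # glob returns full paths that match the pattern
    csv_files = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(data_dir, "*.csv"))
    )

print(all_files)  # ['notes.txt', 'sales_feb.csv', 'sales_jan.csv']
print(csv_files)  # ['sales_feb.csv', 'sales_jan.csv']
```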

5. json & csv

Read and write the two most common data interchange formats without external libraries.

  • json.load / json.dumps
  • csv.DictReader for CSV parsing
  • Handling encoding and delimiters
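
The json calls above can be sketched as a round trip. The record below is made up, and io.StringIO stands in for an open file so the example needs nothing on disk.

```python
import io
import json

record = {"user_id": 42, "city": "Zürich", "tags": ["new", "trial"]}

# dumps returns a str; ensure_ascii=False keeps non-ASCII characters readable
text = json.dumps(record, ensure_ascii=False)

# loads parses a str, while load reads from a file-like object
from_string = json.loads(text)
from_file = json.load(io.StringIO(text))

print(from_string == record, from_file == record)  # True True
```

With a real file, passing encoding="utf-8" to open() avoids depending on the platform's default encoding.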

6. functools & operator

reduce, partial, lru_cache — functional programming tools that make pipelines cleaner and faster.

  • functools.reduce for aggregation
  • functools.lru_cache for memoisation
  • operator.itemgetter for sorting
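
A short sketch of reduce, partial, and itemgetter working together. The rows and growth factors are invented sample data, and round2 is a hypothetical helper name.

```python
from functools import partial, reduce
from operator import itemgetter, mul

rows = [
    {"region": "EMEA", "revenue": 120},
    {"region": "APAC", "revenue": 340},
    {"region": "AMER", "revenue": 210},
]

# itemgetter builds the key function for sorting by a dict field
by_revenue = sorted(rows, key=itemgetter("revenue"), reverse=True)
top_region = by_revenue[0]["region"]

# reduce folds a sequence into a single value; here, a compound product
growth_factors = [1.10, 0.95, 1.20]
compound = reduce(mul, growth_factors, 1.0)

# partial pre-fills arguments of an existing function
round2 = partial(round, ndigits=2)
print(top_region, round2(compound))
```

For plain sums, the built-in sum() is clearer than reduce; reduce is worth reaching for when the combining step is something else, such as a product or a merge.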
Interview Example 1
Word frequency with Counter

Counter is the most concise way to count hashable items in Python, which makes it a common warm-up question.

from collections import Counter

reviews = ['good', 'great', 'good', 'bad', 'great', 'good']

counts = Counter(reviews)
print(counts)                 # Counter({'good': 3, 'great': 2, 'bad': 1})
print(counts.most_common(2))  # [('good', 3), ('great', 2)]

# Works on any iterable of hashable items, such as a string or a list
char_freq = Counter("data analyst")
print(char_freq.most_common(1))  # [('a', 4)]
Interview Example 2
Date arithmetic with datetime

Calculate days between dates, find day-of-week, and parse non-standard formats — all come up in time-series questions.

from datetime import datetime, timedelta

signup = datetime.strptime('2025-11-01', '%Y-%m-%d')
purchase = datetime.strptime('2026-01-15', '%Y-%m-%d')

days_to_convert = (purchase - signup).days
print(f"Converted after {days_to_convert} days")  # Converted after 75 days

# Day of week: weekday() returns an int (0 = Monday), strftime('%A') the name
print(purchase.weekday())       # 3
print(purchase.strftime('%A'))  # Thursday

# Rolling 30-day window end date
window_end = purchase + timedelta(days=30)
print(window_end.date())  # 2026-02-14
Interview Questions
Real built-in library questions asked in 2026

These questions test Python fundamentals beyond the data science libraries.

What is a defaultdict and when would you use it?
defaultdict, from collections, is a dict subclass that calls a factory function to create and store a value whenever a missing key is accessed, instead of raising KeyError. For example, defaultdict(list) starts every new key with an empty list, which is ideal for grouping data without an explicit membership check.
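
The grouping pattern described above can be sketched as follows, using made-up (city, order value) pairs.

```python
from collections import defaultdict

# Made-up (city, order_value) pairs
orders = [("Pune", 250), ("Delhi", 400), ("Pune", 150), ("Delhi", 100)]

by_city = defaultdict(list)
for city, value in orders:
    # No "if city not in by_city" check is needed:
    # a missing key is created with an empty list on first access.
    by_city[city].append(value)

totals = {city: sum(values) for city, values in by_city.items()}
print(totals)  # {'Pune': 400, 'Delhi': 500}
```

One caveat: merely reading a missing key with by_city[key] also inserts it, so use `key in by_city` or by_city.get(key) when a lookup should not modify the dictionary.
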
How would you read a CSV without Pandas?
Use csv.DictReader, which yields each row as a dict keyed by the header names (a plain dict since Python 3.8; an OrderedDict in 3.6 and 3.7). It is lightweight, handles quoting and custom delimiters, and reads one row at a time, so the whole file is never held in memory. Every value comes back as a string. In a real project Pandas is usually still the better tool, but knowing the built-in signals depth.
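
A minimal sketch of that approach, assuming a semicolon-delimited file with hypothetical region and revenue columns. io.StringIO stands in for an open file so the example runs without touching disk.

```python
import csv
import io

# Sample content; in practice this would come from open(path, newline="")
raw = "region;revenue\nEMEA;120\nAPAC;340\nEMEA;80\n"

total_by_region = {}
reader = csv.DictReader(io.StringIO(raw), delimiter=";")
for row in reader:  # rows are produced one at a time
    # Every value arrives as a string, so convert explicitly
    region = row["region"]
    total_by_region[region] = total_by_region.get(region, 0) + int(row["revenue"])

print(total_by_region)  # {'EMEA': 200, 'APAC': 340}
```

When reading a real file, the csv documentation recommends opening it with newline="" so that embedded line breaks inside quoted fields are handled correctly.
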
What is functools.lru_cache and when is it useful?
lru_cache memoises a function: it caches the result for each set of arguments already seen and returns it on repeat calls without re-running the function. It suits expensive, deterministic functions such as recursive computations or repeated date parsing inside loops. Arguments must be hashable, and caching API calls is only safe when a stale response is acceptable.
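
As a sketch of the date-parsing case mentioned above, the hypothetical parse_day function below is cached so that repeated strings are parsed only once. cache_info() reports how often the cache was used.

```python
from datetime import datetime
from functools import lru_cache


@lru_cache(maxsize=None)
def parse_day(text):
    """Parse an ISO date string; repeated strings are served from the cache."""
    return datetime.strptime(text, "%Y-%m-%d").date()


raw_dates = ["2026-01-15", "2026-01-15", "2026-01-16", "2026-01-15"]
parsed = [parse_day(d) for d in raw_dates]

info = parse_day.cache_info()
print(info.hits, info.misses)  # 2 2
```

maxsize=None lets the cache grow without bound; a finite maxsize evicts the least recently used entries and is the safer choice when the set of inputs is large.
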
How do you flatten a list of lists in Python?
Three ways: itertools.chain.from_iterable(nested) is the most Pythonic; [x for sub in nested for x in sub] is the list-comprehension equivalent; sum(nested, []) works but is slow for large data because it builds a new list at every step. All three flatten exactly one level of nesting. Interviewers generally want to see itertools as the first answer.
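
The three approaches side by side, on a small made-up nested list:

```python
from itertools import chain

nested = [[1, 2], [3], [], [4, 5]]

flat_chain = list(chain.from_iterable(nested))  # lazy until list() consumes it
flat_comp = [x for sub in nested for x in sub]  # reads like nested for-loops
flat_sum = sum(nested, [])                      # copies the list at every step

print(flat_chain)  # [1, 2, 3, 4, 5]
print(flat_chain == flat_comp == flat_sum)  # True
```
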
What is pathlib and why prefer it over os.path?
pathlib.Path provides an object-oriented API for file paths. Operations like path / 'subdir' / 'file.csv', path.suffix, path.stem, and path.glob('*.csv') are far more readable than their os.path equivalents. It has been in the standard library since Python 3.4 and is the usual choice for new code.
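
A small sketch of the path operations named above. It uses PurePosixPath, which never touches the disk, so the printed output is the same on every operating system; the directory and file names are made up. In a real script, Path offers the same attributes plus file system methods such as glob().

```python
from pathlib import PurePosixPath

base = PurePosixPath("data")
report = base / "2026" / "sales.csv"  # the / operator joins path segments

print(report)                            # data/2026/sales.csv
print(report.suffix)                     # .csv
print(report.stem)                       # sales
print(report.parent)                     # data/2026
print(report.with_suffix(".json").name)  # sales.json
```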
Quick Reference
Core Python built-ins cheat sheet

The standard library tools every data analyst should know without googling.

Module / Function                  Purpose
datetime.strptime(str, fmt)        Parse string to datetime object
timedelta(days=n)                  Add or subtract days from a date
collections.Counter(iterable)      Frequency count of any iterable
collections.defaultdict(type)      Dict with automatic default values
collections.namedtuple             Lightweight data classes
itertools.chain.from_iterable      Flatten a list of lists
itertools.combinations(iter, r)    All r-length combinations
functools.lru_cache                Memoisation decorator
functools.reduce(func, iter)       Cumulative aggregation
json.load / json.dumps             Parse and serialise JSON
csv.DictReader(file)               Read CSV rows as dicts
pathlib.Path()                     Object-oriented file path handling
