Core Python Built-in Libraries for Data Analysts
datetime, collections, itertools, os — the standard library modules that come up in real data engineering and analytics interviews without any pip install.
datetime
Parse, format, and calculate dates and times. Essential for any time-series or business analytics question.
- datetime.now() and date.today()
- strptime() to parse strings
- timedelta for date arithmetic
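As a minimal sketch of the three calls listed above (the sample timestamp and its format string are made up for illustration):

```python
from datetime import date, datetime, timedelta

# Current timestamp and current calendar date (values depend on when this runs).
now = datetime.now()
today = date.today()

# Parse a string into a datetime; the format codes must match the input exactly.
order_ts = datetime.strptime("2024-03-15 09:30", "%Y-%m-%d %H:%M")

# timedelta handles date arithmetic, including month and year rollovers.
due_ts = order_ts + timedelta(days=30)
week_before = order_ts - timedelta(weeks=1)

print(due_ts.strftime("%Y-%m-%d"))     # 2024-04-14
print(week_before.date().isoformat())  # 2024-03-08
```

Note that timedelta has no months or years argument, so "one month later" needs a different approach than plain day arithmetic.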
collections
Counter, defaultdict, OrderedDict, namedtuple — data structures that replace verbose manual code.
- Counter for frequency counts
- defaultdict to avoid KeyError
- deque for efficient queues
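The sketch below shows one way defaultdict and deque might be used together on tabular data. The `sales` rows are hypothetical sample data, not taken from any real dataset.

```python
from collections import defaultdict, deque

# Hypothetical (region, amount) rows for illustration.
sales = [("north", 120), ("south", 80), ("north", 50), ("east", 200)]

# defaultdict(list) creates an empty list the first time a key is seen,
# so there is no need to check for the key or catch KeyError.
by_region = defaultdict(list)
for region, amount in sales:
    by_region[region].append(amount)

print(dict(by_region))  # {'north': [120, 50], 'south': [80], 'east': [200]}

# deque with maxlen keeps only the most recent items, which gives a
# simple fixed-size window; appends and pops at either end are O(1).
window = deque(maxlen=3)
rolling_sums = []
for _, amount in sales:
    window.append(amount)
    rolling_sums.append(sum(window))

print(rolling_sums)  # [120, 200, 250, 330]
```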
itertools
chain, combinations, product, groupby — functional tools for combinatorics and data pipelines.
- chain() to flatten iterables
- combinations and permutations
- groupby for run-length encoding
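A short sketch of the itertools functions named above, using small made-up inputs:

```python
from itertools import chain, combinations, groupby, permutations

# chain.from_iterable flattens one level of nesting.
batches = [[1, 2], [3], [4, 5]]
flat = list(chain.from_iterable(batches))
print(flat)  # [1, 2, 3, 4, 5]

# combinations ignores order; permutations treats each ordering as distinct.
pairs = list(combinations("ABC", 2))
ordered_pairs = list(permutations("ABC", 2))
print(pairs)               # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(len(ordered_pairs))  # 6

# groupby groups only consecutive equal items, which is exactly what
# run-length encoding needs. Sort first if you want global groups.
encoded = [(char, len(list(run))) for char, run in groupby("aaabccdd")]
print(encoded)  # [('a', 3), ('b', 1), ('c', 2), ('d', 2)]
```

A common interview pitfall is expecting groupby to behave like SQL GROUP BY; on unsorted input it produces a separate group for every run.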
os & pathlib
Navigate file systems, list directories, and build portable file paths — essential for ETL scripts.
- os.path.join for safe paths
- pathlib.Path — modern alternative
- os.listdir / glob for file discovery
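The following sketch compares the os and pathlib styles side by side. It runs inside a temporary directory, and the file names are placeholders chosen for illustration.

```python
import glob
import os
import tempfile
from pathlib import Path

# Work inside a throwaway directory so the sketch has no side effects.
with tempfile.TemporaryDirectory() as tmp:
    # os.path.join inserts the correct separator for the current OS.
    raw_dir = os.path.join(tmp, "raw")
    os.makedirs(raw_dir)

    # pathlib expresses the same idea with the / operator.
    for name in ("jan.csv", "feb.csv", "notes.txt"):
        (Path(raw_dir) / name).write_text("placeholder\n", encoding="utf-8")

    # os.listdir returns every entry name, in arbitrary order.
    all_names = sorted(os.listdir(raw_dir))

    # glob filters by pattern; Path.glob is the pathlib equivalent.
    csv_via_glob = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(raw_dir, "*.csv"))
    )
    csv_via_pathlib = sorted(p.name for p in Path(raw_dir).glob("*.csv"))

print(all_names)        # ['feb.csv', 'jan.csv', 'notes.txt']
print(csv_via_glob)     # ['feb.csv', 'jan.csv']
print(csv_via_pathlib)  # ['feb.csv', 'jan.csv']
```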
json & csv
Read and write the two most common data interchange formats without external libraries.
- json.load / json.dumps
- csv.DictReader for CSV parsing
- Handling encoding and delimiters
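A minimal sketch of both formats follows. The JSON payload and the semicolon-delimited CSV text are invented sample data, and io.StringIO stands in for a real file.

```python
import csv
import io
import json

# json.loads / json.dumps work on strings; json.load / json.dump take file objects.
payload = '{"user": "ana", "scores": [90, 85]}'
record = json.loads(payload)
record["average"] = sum(record["scores"]) / len(record["scores"])
serialised = json.dumps(record, sort_keys=True)
print(serialised)  # {"average": 87.5, "scores": [90, 85], "user": "ana"}

# With a real file, open it with newline="" and an explicit encoding,
# e.g. open(path, newline="", encoding="utf-8"), then pass it to DictReader.
semicolon_csv = io.StringIO("name;city\nAna;Lisbon\nRaj;Pune\n")
rows = list(csv.DictReader(semicolon_csv, delimiter=";"))
print(rows[0]["city"])  # Lisbon
print(len(rows))        # 2
```

DictReader returns every value as a string, so numeric columns need an explicit conversion.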
functools & operator
reduce, partial, lru_cache — functional programming tools that make pipelines cleaner and faster.
- functools.reduce for aggregation
- functools.lru_cache for memoisation
- operator.itemgetter for sorting
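The tools above can be sketched as follows. The helper name `to_int_from_binary`, the `fib` function, and the sample rows are illustrative choices.

```python
from functools import lru_cache, partial, reduce
from operator import itemgetter, mul

# reduce folds a sequence into one value; the third argument is the start value.
product = reduce(mul, [1, 2, 3, 4], 1)
print(product)  # 24

# partial pre-fills arguments to build a more specific function.
to_int_from_binary = partial(int, base=2)
print(to_int_from_binary("1010"))  # 10

# lru_cache memoises results, so each fib(n) is computed only once.
@lru_cache(maxsize=None)
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(50))  # 12586269025

# itemgetter builds a key function; here it sorts by score, highest first.
rows = [("ana", 90), ("raj", 75), ("li", 82)]
ranked = sorted(rows, key=itemgetter(1), reverse=True)
print(ranked)  # [('ana', 90), ('li', 82), ('raj', 75)]
```

lru_cache only works when the function's arguments are hashable, so lists and dicts must be converted (for example to tuples) before being passed in.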
Counter is the most concise way to count hashable items in Python, and interviewers often use it as a warm-up question.
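A typical warm-up might look like the sketch below, which counts words in a made-up sentence:

```python
from collections import Counter

words = "the cat sat on the mat the end".split()
counts = Counter(words)

print(counts["the"])          # 3
print(counts["dog"])          # 0 (missing keys return 0 instead of raising KeyError)
print(counts.most_common(1))  # [('the', 3)]

# update() adds to existing counts, which is handy for merging partial results.
counts.update(["cat", "cat"])
print(counts["cat"])          # 3
```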
Calculate days between dates, find day-of-week, and parse non-standard formats — all come up in time-series questions.
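The three tasks mentioned here can be sketched with made-up dates. The month and AM/PM names parsed by `%b` and `%p` depend on the locale; the expected outputs assume Python's default English (C) locale.

```python
from datetime import date, datetime

# Subtracting two dates yields a timedelta; .days is the whole-day gap.
start = date(2024, 1, 1)
end = date(2024, 3, 1)
print((end - start).days)  # 60 (2024 is a leap year)

# weekday() returns 0 for Monday through 6 for Sunday; strftime("%A") gives the name.
print(start.weekday())       # 0
print(start.strftime("%A"))  # Monday

# Non-standard formats just need matching format codes.
parsed = datetime.strptime("15/Mar/2024 07:45 PM", "%d/%b/%Y %I:%M %p")
print(parsed.isoformat())  # 2024-03-15T19:45:00
```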
These questions test Python fundamentals beyond the data science libraries.
The standard library tools every data analyst should know without looking them up:
| Module / Function | Purpose |
|---|---|
| datetime.strptime(str, fmt) | Parse string to datetime object |
| timedelta(days=n) | Add or subtract days from a date |
| collections.Counter(iterable) | Frequency count of any iterable |
| collections.defaultdict(default_factory) | Dict that creates a default value for missing keys |
| collections.namedtuple | Lightweight immutable records with named fields |
| itertools.chain.from_iterable | Flatten one level of nesting (e.g. a list of lists) |
| itertools.combinations(iter, r) | All r-length combinations |
| functools.lru_cache | Memoisation decorator |
| functools.reduce(func, iter) | Cumulative aggregation |
| json.load / json.dumps | Parse JSON from a file / serialise an object to a JSON string |
| csv.DictReader(file) | Read CSV rows as dicts |
| pathlib.Path() | Object-oriented file path handling |