CEH vs OSCP vs CISSP: Which Cybersecurity Certifications Are Worth It for Data Analysts in 2026 — And Why Ignoring This Could Cost You Your Next Job Offer
By dataanalystinterview.com Team · May 1, 2026 · 12 min read
India reported over 1.3 million cybersecurity incidents in 2023 alone, according to CERT-In, and that number has only climbed since. Companies like PhonePe, Paytm, and HDFC process hundreds of millions of transactions daily, and every single one of those transactions generates data that a data analyst touches. Here is what nobody tells freshers entering data analytics: the moment you start working with production data at a fintech or e-commerce company, cybersecurity is no longer someone else’s problem. It becomes yours. Data governance failures, access control gaps, and compliance breaches are increasingly landing on analyst teams — not just security teams. If you walk into a 2026 interview at Razorpay or Juspay without understanding the difference between CEH, OSCP, and CISSP, you are leaving money and credibility on the table.
What CEH, OSCP, and CISSP Actually Mean — And Why 2026 Is the Year Data Analysts Can No Longer Ignore Them
Let us be precise about what these three certifications actually are, because the internet is full of vague comparisons that do not help you make a real decision. CEH stands for Certified Ethical Hacker, offered by EC-Council. It teaches you how attackers think: how systems get breached, what vulnerabilities look like, and how to test defenses before a malicious actor does. OSCP stands for Offensive Security Certified Professional, offered by OffSec (formerly Offensive Security). It is hands-on, brutal, and respected: you spend roughly 24 hours trying to hack into a series of machines in a controlled lab environment and then write a professional report. CISSP stands for Certified Information Systems Security Professional, offered by ISC2. It is a management-level certification that covers eight domains of security knowledge, including risk management, security architecture, software development security, and identity and access management. Full certification requires five years of relevant work experience; you can pass the exam earlier, but you hold Associate of ISC2 status until you accrue it.
In 2026, a Gartner report placed data security as the top investment priority for technology leaders across APAC. What this signals for India specifically is clear: companies are not just hiring security specialists anymore. They are expecting their data teams — analysts, engineers, scientists — to arrive with at least a foundational fluency in how data gets exposed, why it matters, and what the compliance framework around it looks like. The trend is not theoretical. It is showing up in job descriptions at Swiggy, Zepto, and CRED right now, where roles titled “Data Analyst — Risk and Compliance” or “Analytics Engineer — Data Security” are posting at 18 to 25 lakhs per annum.
Note
Most candidates assume cybersecurity certifications are only relevant for security engineers. The shift happening in 2026 is that data governance, DPDP Act compliance (India’s data protection law), and internal audit readiness are now sitting inside analytics team charters at mid-to-large companies. An analyst who can speak the language of CEH or CISSP will always outcompete one who cannot, even in a pure analytics role.
How the Cybersecurity Certification Wave Is Changing Data Analyst Hiring Patterns in India Right Now
Here is something concrete. When Zepto scaled from 10 to 40+ dark stores in 2023 and 2024, their data infrastructure sprawled rapidly. Analysts were suddenly working with customer location data, payment instrument data, and order behaviour data across dozens of microservices. The question their hiring managers started asking in interviews was not just “can you write a good SQL query” — it was “do you understand what happens when customer PII leaks from a poorly constructed data pipeline?” That question comes directly from the cybersecurity domain, and knowing the answer requires at least a surface understanding of what CEH-level ethical hacking knowledge covers.
At Paytm and PhonePe, the RBI’s data localisation mandates have forced analytics teams to build dashboards that track where data lives, how long it is retained, and who accessed it. That is a CISSP-flavoured skillset — it is about governance, risk, and compliance, not just querying. At CRED, which handles premium credit card data, the internal data team works closely with the security team to ensure that any new analytical model or report does not inadvertently expose cardholder information. An analyst who has read the CISSP syllabus — even without the certification — will immediately understand why that audit trail matters and how to build one. HDFC’s internal analytics divisions have started requiring new hires to complete a cybersecurity awareness module before they get production database access. That is not a security team requirement. That is an analytics team requirement.
The salary differential is also becoming real. A data analyst at a fintech with a CEH or CISSP certification is negotiating from a stronger position than one without. Entry-level analysts in pure analytics roles are earning 6 to 10 lakhs per annum in 2026. The moment a cybersecurity dimension enters the role — data governance analyst, compliance analytics, fraud analytics — that band shifts to 14 to 22 lakhs. That gap is not going to close. It is widening.
Interview Questions This Topic Is Generating at Top Companies — And What Interviewers Are Really Testing
Companies like Flipkart, PhonePe, and Swiggy are not asking candidates to recite the CEH syllabus in interviews. What they are doing is weaving cybersecurity-adjacent thinking into their analytics rounds — asking questions that separate candidates who understand data as a business asset with risk attached from candidates who see data as just rows and columns to query. The interviewer is testing whether you think about downstream consequences of the data you work with, whether you have any instinct for access control and data minimisation, and whether you can hold a credible conversation with a security team. These five questions are showing up across companies right now.
How would you handle a situation where you realise your analytics pipeline is pulling in more customer data than the business requirement actually needs?
This is a data minimisation question dressed as a process question. The interviewer wants to hear that you would flag it immediately, raise it with the data engineering team, and document the change, not just quietly fix it. Strong candidates mention the concept of purpose limitation, which is core to both CISSP and India's DPDP Act. Make it clear that over-collection is a compliance risk, not just a hygiene issue.
Write a SQL query to identify users who have accessed a sensitive data table more than 50 times in the last 7 days, and flag those who are not listed in the approved access roster.
This is a security audit query in disguise. The trap here is joining on user IDs without accounting for service accounts or system processes that will inflate counts. Handle this by filtering on user_type = 'human' or an equivalent flag before aggregating, as in the sketch below. The interviewer is testing whether you think about data quality and edge cases in an access log, not just whether you can write a JOIN.
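A minimal sketch of the expected shape, assuming an access_log table, an approved_users roster with one row per user, and Postgres-style syntax (all table and column names here are illustrative):

-- Minimal sketch: heavy accessors of a sensitive table, flagged against a roster
-- (access_log, approved_users, and all columns are illustrative names)
SELECT
    l.user_id,
    COUNT(*) AS accesses_7d,
    CASE WHEN MAX(a.user_id) IS NULL THEN 'NOT APPROVED' ELSE 'APPROVED' END AS approval_status
FROM access_log l
LEFT JOIN approved_users a ON a.user_id = l.user_id
WHERE l.table_name = 'sensitive_table'
  AND l.user_type = 'human'  -- exclude service accounts before aggregating
  AND l.accessed_at >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY l.user_id
HAVING COUNT(*) > 50;

A fuller version of this audit, with a rolling per-user baseline, appears in the SQL section later in this article.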
Razorpay is launching a new feature that requires analysts to access raw payment instrument data. As the lead analyst, how would you advise the team on structuring data access for this project?
Use the principle of least privilege as your framework anchor. Walk through: what is the minimum data needed, who needs access, for how long, what is the audit trail, and what is the de-identification strategy for non-production environments. A strong answer names specific controls — role-based access, row-level security in the data warehouse, and a time-bound access window — rather than staying abstract.
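If the interviewer pushes for specifics, those controls map to concrete statements in a Postgres-style warehouse. This is a sketch with hypothetical role, table, and column names, not a production setup:

-- Least-privilege sketch (hypothetical role, table, and column names)
CREATE ROLE payments_feature_readonly NOLOGIN;

-- Column-level grant: only the fields the analysis actually needs
GRANT SELECT (txn_id, amount, masked_instrument)
ON payments_raw TO payments_feature_readonly;

-- Row-level security: analysts see only the feature's pilot cohort
ALTER TABLE payments_raw ENABLE ROW LEVEL SECURITY;
CREATE POLICY pilot_cohort_only ON payments_raw
    FOR SELECT TO payments_feature_readonly
    USING (cohort = 'feature_pilot');

-- Time-bound access: grant membership now, revoke at project close
GRANT payments_feature_readonly TO analyst_priya;
-- REVOKE payments_feature_readonly FROM analyst_priya;  -- run at project end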
How would you explain a data breach risk assessment to a product manager who has no security background?
The interviewer is testing communication and translation skills. The answer should anchor on business impact first — what does a breach cost in rupees, in user trust, in regulatory fines under the DPDP Act — before moving to technical controls. Avoid leading with jargon. Strong candidates use an analogy: treating sensitive data like cash in a vault, where you need to know who has a key, when they used it, and why.
You are auditing a dataset and you notice that a column labeled “user_id” actually contains email addresses for 30 percent of rows. How do you approach this?
This is a data quality and PII governance question combined. The right answer is: document it immediately, do not use the column for any analysis until it is cleaned, alert the data engineering and security teams, and check whether that PII was ever exported to a non-secure environment. Candidates who say “I would just clean it and move on” are flagging a gap in their security thinking that interviewers will notice.
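One credible first step, sketched in Postgres-style SQL with a hypothetical table name, is to quantify the contamination before touching anything:

-- Triage sketch: how much of the column is email-shaped PII?
-- (events_table is a hypothetical name; the regex is a simple email pattern)
SELECT
    COUNT(*) AS total_rows,
    COUNT(*) FILTER (WHERE user_id ~ '^[^@\s]+@[^@\s]+\.[^@\s]+$') AS email_like_rows,
    ROUND(
        100.0 * COUNT(*) FILTER (WHERE user_id ~ '^[^@\s]+@[^@\s]+\.[^@\s]+$')
        / NULLIF(COUNT(*), 0), 2
    ) AS pct_email_like
FROM events_table;

That percentage, not an anecdote, is what you escalate to the security team.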
Interview Tip
When cybersecurity-flavoured questions come up in analytics interviews, lead with business impact before technical detail. Interviewers at companies like PhonePe and Juspay are not checking whether you can recite CISSP domains — they are checking whether you think like someone who understands that data has risk attached to it. Use a metrics-first framing: “The risk here is X, which could cost the business Y, so the control I would put in place is Z.” That structure signals maturity. Avoid getting lost in technical jargon before you have established why the business should care.
SQL You Need to Know: Auditing Data Access Logs Like a Security-Aware Analyst
Here is a scenario that plays out constantly at companies like Juspay and Razorpay. The security team flags that a sensitive table, say one containing masked card numbers and transaction values, has been accessed unusually frequently by a set of internal users. The analytics team is asked to pull an access audit report from the data warehouse logs. The table is called data_access_logs and it has columns: log_id, user_id, user_type, table_name, access_timestamp, query_type, and rows_returned. The security team wants to know which human users accessed the payments_sensitive table more than 50 times in the past 7 days, how that compares to their 30-day average, and which ones are not on the approved access list stored in the approved_data_access table.
-- Identifying anomalous access patterns to sensitive tables
-- Compares last 7 days access volume against 30-day rolling average
-- Flags users not on the approved access roster
WITH base_access AS (
SELECT
user_id,
user_type,
table_name,
access_timestamp,
query_type,
rows_returned
FROM data_access_logs
WHERE
table_name = 'payments_sensitive'
AND user_type = 'human'
AND access_timestamp >= CURRENT_DATE - INTERVAL '30 days'
),
seven_day_counts AS (
SELECT
user_id,
COUNT(*) AS access_count_7d,
SUM(rows_returned) AS total_rows_returned_7d
FROM base_access
WHERE access_timestamp >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY user_id
),
thirty_day_avg AS (
SELECT
user_id,
COUNT(*) / 4.0 AS avg_weekly_access_30d
FROM base_access
GROUP BY user_id
),
combined AS (
SELECT
s.user_id,
s.access_count_7d,
s.total_rows_returned_7d,
ROUND(t.avg_weekly_access_30d, 2) AS avg_weekly_access_30d,
ROUND(
(s.access_count_7d - t.avg_weekly_access_30d) /
NULLIF(t.avg_weekly_access_30d, 0) * 100, 2
) AS pct_deviation_from_avg
FROM seven_day_counts s
LEFT JOIN thirty_day_avg t ON s.user_id = t.user_id
WHERE s.access_count_7d > 50
)
SELECT
c.user_id,
c.access_count_7d,
c.total_rows_returned_7d,
c.avg_weekly_access_30d,
c.pct_deviation_from_avg,
CASE
WHEN a.user_id IS NULL THEN 'NOT APPROVED'
ELSE 'APPROVED'
END AS approval_status
FROM combined c
LEFT JOIN approved_data_access a ON c.user_id = a.user_id
ORDER BY c.pct_deviation_from_avg DESC;
What this query tells you is which users are not just accessing the table frequently, but accessing it significantly more than their own historical baseline — that percentage deviation column is what security teams actually care about. In an interview, presenting this query and explaining that you used a user’s own 30-day baseline rather than a fixed threshold shows that you understand context-aware anomaly detection. A follow-up question an interviewer might ask is: “What if a user was on leave for three of the four weeks? How would your 30-day average be misleading?” That is your cue to discuss adjusting the baseline to active days only.
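A sketch of that adjustment, written as a drop-in replacement for the thirty_day_avg CTE above: normalise by the days a user actually appeared, then scale to a weekly rate.

-- Leave-aware baseline: replaces the thirty_day_avg CTE in the query above
thirty_day_avg AS (
    SELECT
        user_id,
        COUNT(*)::numeric
            / NULLIF(COUNT(DISTINCT access_timestamp::date), 0) * 7
            AS avg_weekly_access_30d  -- per-active-day rate scaled to a week
    FROM base_access
    GROUP BY user_id
)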
Common Mistake
Candidates frequently forget to use NULLIF when calculating percentage deviations, which causes a division-by-zero error for users who had zero historical access. In a security audit context, a new user accessing a sensitive table 60 times in their first week is arguably the most suspicious pattern of all — but a query that crashes on their row will never surface them. Always wrap your denominator in NULLIF(column, 0) when computing ratios or percentage changes.
Python for This Topic: Detecting Anomalous Data Access Patterns With Code Analysts Actually Write
Imagine you are an analyst at a company like CRED, where the security team has handed you 90 days of data access logs as a CSV because their SIEM tool does not have a good analytics front end. They want to know whether there are users whose access patterns look statistically unusual compared to peers in the same role. This is a classic outlier detection problem, and it maps directly to what an analyst with even basic Python skills can solve using z-score normalisation on access frequency grouped by user role. The dataset has columns: user_id, role, date, table_accessed, access_count.
# Scenario: Detecting anomalous data access behaviour at a fintech company
# Dataset: 90 days of internal data access logs from CRED's warehouse
# Goal: Flag users whose weekly access frequency is statistically unusual for their role
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
# Load the access log data
df = pd.read_csv('data_access_logs_90d.csv', parse_dates=['date'])
# Create a week column for weekly aggregation
df['week'] = df['date'].dt.isocalendar().week
# Aggregate total accesses per user per week
weekly_access = (
    df.groupby(['user_id', 'role', 'week'])['access_count']
    .sum()
    .reset_index(name='weekly_accesses')
)

# Z-score each user-week against peers in the same role; a constant or
# single-member role yields NaN rather than a false positive
weekly_access['z_score'] = (
    weekly_access.groupby('role')['weekly_accesses']
    .transform(lambda x: stats.zscore(x, ddof=1))
)

# Flag user-weeks more than 3 standard deviations above the role norm
anomalies = weekly_access[weekly_access['z_score'] > 3]
print(anomalies.sort_values('z_score', ascending=False).head(20))

# Quick visual check of the distribution before presenting to the security team
weekly_access['z_score'].hist(bins=50)
plt.xlabel('z-score of weekly accesses within role')
plt.show()