Back to cheat sheets

API & Data

SQL

A to-the-point review of the SQL an SDET gets asked about: SELECT execution order, the join types, GROUP BY vs HAVING vs WHERE, subqueries vs CTEs, window functions, indexes, NULL handling, transactions & ACID — plus how you use SQL to validate data in tests. Short explanations, short examples.

01How a SELECT Actually Runs

You write SELECT first, but the database evaluates clauses in a different order — knowing this explains most “why can’t I use that alias here” questions.

#ClauseWhat it does
1FROM / JOINpick & combine the source rows
2WHEREfilter rows (before grouping)
3GROUP BYcollapse into groups
4HAVINGfilter groups
5SELECTpick columns / compute expressions
6ORDER BYsort the result
7LIMITcut the rows returned
Why it matters: WHERE runs before SELECT, so a column alias defined in SELECT isn’t visible in WHERE — but it is visible in ORDER BY (which runs last).

02Joins

A join combines rows from two tables on a matching condition. The type decides what happens to non-matching rows.

  • INNER — only rows that match on both sides.
  • LEFT — all left rows; right columns are NULL where there’s no match.
  • RIGHT — mirror of LEFT (all right rows).
  • FULL OUTER — all rows from both; NULLs where either side is missing.
  • CROSS — every combination (Cartesian product); usually accidental.
  • SELF — a table joined to itself (e.g. employee → manager).
-- customers with no orders (the classic LEFT-JOIN/NULL pattern)
SELECT c.id, c.name
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE o.id IS NULL;
Common trap: filtering the right table in WHERE turns a LEFT JOIN back into an INNER JOIN (the NULL rows get filtered out). Put that condition in the ON clause to keep them.

03GROUP BY, HAVING & WHERE

Aggregate functions — COUNT, SUM, AVG, MIN, MAX — collapse many rows into one value per group.

  • WHERE filters rows before grouping (can’t use aggregates).
  • HAVING filters groups after grouping (can use aggregates).
  • Every non-aggregated SELECT column must appear in GROUP BY.
  • COUNT(*) counts rows; COUNT(col) skips NULLs; COUNT(DISTINCT col) counts unique non-null values.
SELECT customer_id, COUNT(*) AS orders, SUM(amount) AS total
FROM orders
WHERE status = 'paid'        -- filter rows first
GROUP BY customer_id
HAVING SUM(amount) > 1000     -- then filter groups
ORDER BY total DESC;

04Subqueries & CTEs

  • Subquery — a query nested inside another (in WHERE, FROM, or SELECT).
  • Correlated subquery — references the outer row, so it re-runs per row; powerful but can be slow.
  • CTE (WITH) — a named, readable temporary result; great for breaking a complex query into steps and for recursion (hierarchies).
  • EXISTS is usually faster than IN for existence checks and handles NULLs more predictably.
WITH ranked AS (
  SELECT *, RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS r
  FROM employees
)
SELECT dept, name, salary
FROM ranked
WHERE r <= 3;          -- top 3 earners per department
IN vs EXISTS with NULLs: NOT IN against a subquery that returns a NULL yields no rows at all — a notorious bug. Prefer NOT EXISTS.

05Window Functions

Window functions compute across a set of related rows without collapsing them (unlike GROUP BY). The go-to for ranking, running totals, and row-to-row comparisons.

  • ROW_NUMBER() — unique sequential number; RANK() leaves gaps on ties, DENSE_RANK() doesn’t.
  • LAG() / LEAD() — value from the previous / next row (deltas, trends).
  • Aggregates as windows — SUM(...) OVER (...) for running totals.
  • PARTITION BY = the group; ORDER BY = the sequence inside it.
-- de-duplicate: keep the latest row per id
SELECT * FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
  FROM events
) t
WHERE rn = 1;
Classic ask: “2nd highest salary” or “latest record per key” — both are ROW_NUMBER()/DENSE_RANK() over a partition, then filter the rank.

06Indexes & Performance

An index is a sorted lookup structure (usually a B-tree) that lets the database find rows without scanning the whole table — the trade is slower writes and extra storage.

  • Clustered index = the table’s physical order (one per table, often the primary key); non-clustered = a separate structure pointing back to rows.
  • Composite index on (a, b) helps filters on a or a, bleft-prefix rule — but not on b alone.
  • Indexes help WHERE, JOIN, ORDER BY; they’re ignored if you wrap the column in a function (WHERE YEAR(d) = 2024) — that kills index use.
  • Use EXPLAIN / EXPLAIN ANALYZE to see whether a query does an index seek or a full scan.
Say this: indexes speed up reads and slow down writes — index the columns you filter/join/sort on, not every column.

07NULL Handling

NULL means unknown, not zero or empty string — and it spreads through expressions, which trips people up.

  • Compare with IS NULL / IS NOT NULL — never = NULL (always unknown).
  • Any arithmetic with NULL is NULL (5 + NULL = NULL).
  • Aggregates skip NULLs — AVG(col) ignores them, which can surprise you.
  • COALESCE(a, b, ...) returns the first non-null; NULLIF(a, b) returns NULL if they’re equal.
SELECT id, COALESCE(nickname, name, 'anonymous') AS display
FROM users;

-- guard a divide-by-zero:
SELECT total / NULLIF(count, 0) AS avg_value FROM stats;

08Transactions & ACID

A transaction groups statements so they succeed or fail as a unit. ACID is what makes that reliable.

PropertyMeaning
Atomicityall statements commit, or none do (ROLLBACK).
Consistencyconstraints hold before and after.
Isolationconcurrent transactions don’t corrupt each other.
Durabilityonce committed, it survives a crash.

Isolation levels trade consistency for concurrency, each allowing more anomalies: Read Uncommitted (dirty reads) → Read CommittedRepeatable Read (stops non-repeatable reads) → Serializable (stops phantoms; strictest).

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;   -- both apply or, on error, ROLLBACK leaves nothing changed

09SQL for Test Automation

For an SDET, SQL is a verification tool: you assert against the database, not just the UI/API.

  • Back-end validation — after an action, query the DB to confirm the row was written with the right values.
  • Data-quality checks — duplicates, orphaned rows, null required fields, row-count reconciliation between source and target.
  • Test setup/teardown — seed fixtures and clean up state with SQL inside your test framework.
  • DDL vs DML vs DCL/TCL — know the families: DDL (CREATE/ALTER/DROP), DML (SELECT/INSERT/UPDATE/DELETE), TCL (COMMIT/ROLLBACK).
-- duplicate-key audit
SELECT email, COUNT(*) AS n
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

-- orphaned rows (referential integrity)
SELECT o.id FROM orders o
LEFT JOIN customers c ON c.id = o.customer_id
WHERE c.id IS NULL;

10Rapid-Fire Q&A

Reveal each answer to self-check, then test yourself with the quiz.

WHERE vs HAVING?

WHERE filters rows before grouping and can’t use aggregates; HAVING filters groups after GROUP BY and can.

INNER vs LEFT JOIN?

INNER keeps only matching rows; LEFT keeps all left rows, filling right columns with NULL where there’s no match.

How do you find rows with no match?

LEFT JOIN the other table and filter WHERE other.key IS NULL (or use NOT EXISTS).

DELETE vs TRUNCATE vs DROP?

DELETE removes rows (logged, can WHERE, can rollback); TRUNCATE empties the table fast (minimal logging); DROP removes the table itself.

RANK vs DENSE_RANK vs ROW_NUMBER?

ROW_NUMBER is always unique; RANK leaves gaps after ties (1,1,3); DENSE_RANK doesn’t (1,1,2).

What does an index cost?

Faster reads on indexed columns, but slower INSERT/UPDATE/DELETE and extra storage — index what you filter/join/sort on.

Why doesn’t my index get used?

Often because the column is wrapped in a function or implicitly cast (e.g. WHERE YEAR(d)=2024) — rewrite as a range so it’s sargable.

= NULL vs IS NULL?

= NULL is always unknown (never true); use IS NULL / IS NOT NULL to test for nulls.

UNION vs UNION ALL?

UNION removes duplicates (extra sort/dedupe cost); UNION ALL keeps them and is faster.

Correlated subquery?

A subquery that references the outer query’s row, so it’s evaluated once per outer row — can be slow; often rewritable as a join.

What is ACID?

Atomicity, Consistency, Isolation, Durability — the guarantees that make transactions reliable.

How do you get the 2nd highest salary?

DENSE_RANK() over salary DESC and filter rank = 2, or a correlated subquery / OFFSET 1 LIMIT 1 on distinct salaries.