Interactive tour

The data stack, hands-on

Try SQL, Power BI, Python and Databricks side-by-side — same UK retail scenario, same numbers, four different tools doing four different jobs. Everything runs in your browser; no signup, no setup.

New to data? Start with the beginner's guide →

From till scan to boardroom decision

Imagine you're a national retailer with 400 stores, a website, and a loyalty app. Here's how SQL, Power BI, Python and Databricks fit together to turn every transaction into a business decision.

Step 1

Data collection

Every till scan, online checkout and loyalty-card tap lands in a central SQL database — one big set of related tables: customers, products, orders, returns.
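
To make "one big set of related tables" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column names and the sample rows are assumptions for illustration; the tour names only the four tables, and a real retailer's schema would be far larger.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical, heavily simplified schema: every column name is illustrative.
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE products (
    product_id       INTEGER PRIMARY KEY,
    product_name     TEXT NOT NULL,
    product_category TEXT NOT NULL,
    price_gbp        REAL NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    product_id  INTEGER NOT NULL REFERENCES products(product_id),
    order_date  TEXT NOT NULL,
    region      TEXT NOT NULL
);
CREATE TABLE returns (
    return_id   INTEGER PRIMARY KEY,
    order_id    INTEGER NOT NULL REFERENCES orders(order_id),
    return_date TEXT NOT NULL
);
""")

# One till scan becomes one row in orders, linked to a customer and a product.
conn.execute("INSERT INTO customers VALUES (1, 'A. Shopper')")
conn.execute("INSERT INTO products VALUES (1, 'Headphones', 'Electronics', 199.99)")
conn.execute("INSERT INTO orders VALUES (1, 1, 1, '2026-04-01', 'London')")
conn.commit()

# List the tables that now exist in the database.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

The "related" part is the REFERENCES clauses: an order points at a customer and a product, and a return points at an order, so no detail has to be stored twice.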

Step 2 · SQL

Ask questions of the database

A data analyst writes SQL (Structured Query Language): short, English-like queries that ask the database questions. SQL is the language databases understand, used both to retrieve data and to change it, and it is the first skill to learn. Examples follow.

SQL query · read-only
SELECT
  product_category,
  COUNT(*)       AS transactions,
  SUM(price_gbp) AS total_revenue
FROM sales
WHERE region = 'London'
  AND order_date >= '2026-04-01'
  AND order_date <  '2026-05-01'
GROUP BY product_category
ORDER BY total_revenue DESC;
Query explained

Aggregates every London sale in April 2026 by product category, returning total revenue and transaction count per category, sorted highest revenue first.

  • SELECT: Choose which columns to return. Here: the category name, a count of transactions, and the total revenue.
  • COUNT(*): Aggregate function: counts the number of rows in each group. With GROUP BY product_category, you get one count per category.
  • SUM: Aggregate function: adds up all values in a column. SUM(price_gbp) gives the total revenue per category.
  • AS: Renames a column in the output. AS transactions / AS total_revenue makes the result table headings readable.
  • FROM: Read from the sales table.
  • WHERE: Filter rows: London only, dates between 1st April (inclusive) and 1st May (exclusive).
  • GROUP BY: Roll up rows that share a category so COUNT and SUM apply per category, not per row.
  • ORDER BY: Sort the result, largest revenue first (descending).
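
To try the query outside the browser, the sketch below runs it unchanged against an in-memory SQLite table from Python. The first five rows are the London rows shown in this tour's pandas sample output; the last two (a Manchester sale and a May sale) are hypothetical additions so the WHERE filter has something to exclude.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales (
        order_date TEXT, product_category TEXT, product_name TEXT,
        price_gbp REAL, region TEXT
    )
""")

rows = [
    ("2026-04-01", "Electronics", "Headphones", 199.99, "London"),
    ("2026-04-02", "Sports", "Running shoes", 79.99, "London"),
    ("2026-04-02", "Home & Garden", "Lamp", 34.99, "London"),
    ("2026-04-03", "Sports", "Yoga mat", 24.99, "London"),
    ("2026-04-03", "Clothing", "Jeans", 39.99, "London"),
    # Hypothetical rows that the WHERE clause should filter out:
    ("2026-04-05", "Books", "Novel", 8.99, "Manchester"),    # wrong region
    ("2026-05-02", "Electronics", "Speaker", 59.99, "London"),  # wrong month
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)

query = """
SELECT
  product_category,
  COUNT(*)       AS transactions,
  SUM(price_gbp) AS total_revenue
FROM sales
WHERE region = 'London'
  AND order_date >= '2026-04-01'
  AND order_date <  '2026-05-01'
GROUP BY product_category
ORDER BY total_revenue DESC;
"""

result = conn.execute(query).fetchall()
for category, transactions, total_revenue in result:
    print(f"{category:<15} {transactions:>3} {total_revenue:>10.2f}")
```

The date filter works on plain text here because ISO dates (YYYY-MM-DD) sort alphabetically in the same order as chronologically. Other date formats would need a real date type.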


Step 3 · Power BI

Visualise the answer

A BI developer wires those queries into a Power BI dashboard. The marketing director opens one tab on Monday morning and sees bar charts, regional maps and league tables — never the SQL behind them.

Sales Dashboard
Slicers: Region, Month
Chart: Revenue by category (£M) · click a bar to cross-filter
Channel mix
  • In-store: 45%
  • Web: 37%
  • Mobile app: 14%
  • Click & collect: 4%
Total revenue: £59.8M (+35.2% vs. same period last year)
Synthetic data: click a slicer or a bar to see every visual recompute, the same way a real Power BI report would.
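
What "every visual recomputes" means can be sketched in a few lines of plain Python. This is an analogy, not how Power BI works internally: the rows, the figures and the channel_mix helper are all hypothetical, and a real report would express the same idea through its data model and measures.

```python
from collections import defaultdict

# Hypothetical synthetic sales rows: (region, channel, revenue in £M).
SALES = [
    ("London", "In-store", 9.0),
    ("London", "Web", 8.0),
    ("London", "Mobile app", 3.0),
    ("North West", "In-store", 6.0),
    ("North West", "Web", 3.0),
    ("North West", "Click & collect", 1.0),
]

def channel_mix(rows, region=None):
    """Return (total revenue, {channel: share in %}) for the chosen region.

    region=None means no slicer is applied, so every row counts.
    """
    totals = defaultdict(float)
    for row_region, channel, revenue in rows:
        if region is None or row_region == region:
            totals[channel] += revenue
    total = sum(totals.values())
    if total == 0:
        return 0.0, {}
    shares = {ch: round(100 * rev / total, 1) for ch, rev in totals.items()}
    return total, shares

# No slicer: the mix is computed over all rows.
print(channel_mix(SALES))
# Slicer set to London: the same visual, recomputed from the filtered rows.
print(channel_mix(SALES, region="London"))
```

Each visual is a function of the currently filtered rows, so changing one slicer changes the input to all of them at once.
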
Liked that?

Build dashboards like that from scratch

Power BI Core walks you through connecting data, modelling it, and shipping interactive reports, using the same techniques behind the dashboard above. 24 hours · 113 lessons · £99 (was £129).

Start Power BI Core
Step 4 · Python

Move and transform data automatically

When SQL alone can't do the job (joining weather data to ice-cream sales, cleaning a messy supplier feed, emailing 400 store managers a personalised PDF), a data engineer writes Python. It is the glue between systems.
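
As a rough illustration of the weather example, the sketch below cleans a messy feed and joins it to daily sales with pandas. Both tables, their column names and their values are hypothetical; a real pipeline would read them from the database and a supplier file or API.

```python
import pandas as pd

# Hypothetical daily ice-cream sales, as they might come out of a SQL query.
sales = pd.DataFrame({
    "date": ["2026-04-01", "2026-04-02", "2026-04-03"],
    "ice_cream_revenue_gbp": [1200.0, 1850.0, 900.0],
})

# Hypothetical weather feed with typical mess: stray spaces and a repeated row.
weather = pd.DataFrame({
    "date": [" 2026-04-01", "2026-04-02 ", "2026-04-02 ", "2026-04-03"],
    "max_temp_c": [14.0, 21.0, 21.0, 11.0],
})

# Clean: trim whitespace, parse the dates, then drop exact duplicate rows.
weather["date"] = pd.to_datetime(weather["date"].str.strip())
weather = weather.drop_duplicates()
sales["date"] = pd.to_datetime(sales["date"])

# Join on date. validate raises an error if either side still has
# more than one row per date, which would silently inflate revenue.
combined = sales.merge(weather, on="date", how="left", validate="one_to_one")
print(combined)
```

The validate argument is the design choice worth copying: it turns a quiet data-quality problem into a loud failure at the point where it can still be fixed.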

Python pipeline
import pandas as pd

sales = pd.read_csv('sales.csv')
print(sales.head())
Pipeline explained

Loads a CSV into a pandas DataFrame and shows the top of the table — the quickest way to sanity-check the shape of any new data.

  • import pandas as pd: Load the pandas library, Python's standard data-handling toolkit. Aliased as pd by near-universal convention.
  • read_csv: Read a CSV file into a DataFrame (a table with rows and columns).
  • head: Return the first rows of the DataFrame. The default is 5; pass a number for more or fewer.
  • print: Write the DataFrame's text representation to the terminal or notebook output.

💡 Most pandas pipelines start with a reader (read_csv, read_sql, read_parquet or read_excel) followed by a quick .head() to confirm the data loaded correctly.

Pre-computed output
order_date  product_category  product_name   price_gbp  region
2026-04-01  Electronics       Headphones     199.99     London
2026-04-02  Sports            Running shoes  79.99      London
2026-04-02  Home & Garden     Lamp           34.99      London
2026-04-03  Sports            Yoga mat       24.99      London
2026-04-03  Clothing          Jeans          39.99      London
head() returns the first 5 rows by default — pass a number to change it (e.g. head(10)).
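
A slightly fuller sanity check might look like the sketch below. It assumes sales.csv contains exactly the five rows shown in the output above and reads them from an in-memory string so the example runs without a file on disk.

```python
import io

import pandas as pd

# Stand-in for sales.csv: the five rows from the sample output.
csv_text = """order_date,product_category,product_name,price_gbp,region
2026-04-01,Electronics,Headphones,199.99,London
2026-04-02,Sports,Running shoes,79.99,London
2026-04-02,Home & Garden,Lamp,34.99,London
2026-04-03,Sports,Yoga mat,24.99,London
2026-04-03,Clothing,Jeans,39.99,London
"""

# parse_dates turns order_date into real dates instead of plain text.
sales = pd.read_csv(io.StringIO(csv_text), parse_dates=["order_date"])

print(sales.head(3))  # first three rows instead of the default five
print(sales.shape)    # (number of rows, number of columns)
print(sales.dtypes)   # check that dates and prices were parsed as such
```

Checking shape and dtypes alongside head() catches the two most common loading problems: missing rows and numbers or dates read in as text.
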
Step 5 · Databricks

When the data won't fit on one computer

Some answers need to scan every row of years of history, far too much for one machine. Databricks splits the data into chunks, runs the analysis across many computers in parallel, then stitches their answers back together. That's distributed parallel processing: a job that would take hours on one machine can finish in a fraction of the time. It is widely used by banks, NHS trusts, energy companies and large retailers.
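
The split, process, recombine pattern can be sketched on a single laptop. This is an illustration of the idea only, not Databricks code: threads stand in for worker machines, and the tiny dataset and the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical miniature sales_history: (category, revenue in GBP).
SALES_HISTORY = [
    ("Electronics", 200), ("Sports", 80), ("Clothing", 40), ("Sports", 25),
    ("Books", 10), ("Electronics", 60), ("Clothing", 35), ("Books", 15),
]

def split(rows, n_workers):
    """Cut the rows into at most n_workers contiguous chunks of similar size."""
    size = max(1, -(-len(rows) // n_workers))  # ceiling division, at least 1
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def partial_revenue(chunk):
    """Work one worker does on its own: revenue per category in its chunk."""
    totals = Counter()
    for category, revenue in chunk:
        totals[category] += revenue
    return totals

def total_revenue(rows, n_workers=4):
    chunks = split(rows, n_workers)
    # Each chunk is processed independently; threads stand in for machines.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_revenue, chunks))
    # Recombine: add the partial totals from every worker together.
    combined = Counter()
    for partial in partials:
        combined.update(partial)  # Counter.update adds to existing counts
    return dict(combined)

result = total_revenue(SALES_HISTORY)
print(result)
```

The approach works because a sum can be computed in pieces and then added up. Calculations that cannot be decomposed this way are harder to spread across a cluster.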

Databricks cluster · 200M rows · 4 workers
sales_history · 200M rows
split across the cluster ↓
Worker 1: rows 0 – 50M
Worker 2: rows 50M – 100M
Worker 3: rows 100M – 150M
Worker 4: rows 150M – 200M
recombine results ↓
Aggregated answer: ~1m 56s across the cluster · 3.4× faster than a single machine (6m 40s)
Aggregated result · top product categories by all-time revenue
Category       Revenue  Transactions
Electronics    £227M    36.0M
Sports         £185M    42.0M
Clothing       £160M    48.0M
Home & Garden  £101M    22.0M
Books          £42.0M   12.0M
5 rows · aggregated from 200M source rows in ~1m 56s
Rows scanned: 200M · Workers: 4 · Estimated runtime: ~1m 56s (3.4× vs single machine)
Synthetic timing model — pick a bigger data size or smaller cluster and watch the runtime climb (or the cluster crash). That trade-off is exactly the call data engineers make when they pick a cluster.
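
The headline figure can be checked by hand from the two runtimes quoted above. The sketch below also derives parallel efficiency, a standard measure that the tour itself does not display; how the page's synthetic timing model produces its runtimes is not stated, so nothing here reproduces that model.

```python
# Runtimes quoted in the demo, converted to seconds.
single_machine_s = 6 * 60 + 40  # 6m 40s
cluster_s = 1 * 60 + 56         # 1m 56s
workers = 4

# Speedup: how many times faster the cluster is than one machine.
speedup = single_machine_s / cluster_s

# Parallel efficiency: speedup as a share of the ideal (one x per worker).
efficiency = speedup / workers

print(f"speedup: {speedup:.1f}x")
print(f"parallel efficiency: {efficiency:.0%}")
```

Four workers deliver less than a 4× speedup because splitting the data and recombining the results takes time of its own, which is part of the trade-off described above.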

That's the stack, end-to-end

Stores collect → Database stores (SQL) → Python cleans → SQL queries → Power BI charts

Databricks replaces “Database stores” + “Python cleans” when the data outgrows a single database — same shape, bigger scale.

Ready to pick a track?

Ten short questions about your background, hours per week, and what kind of work appeals to you. Comes back with a recommended track, starting level, and a realistic time plan. No signup, no email capture — just a plan.

Got questions instead of an answer? Drop into the weekly Q&A — Thursdays 7pm UK, free.

Re-launch members

Re-launch prices on every course, live cohort and portfolio project, until 1 July. Browse offers →