Interactive tour

The data stack, hands-on

Try SQL, Power BI, Python and Databricks side-by-side — same UK retail scenario, same numbers, four different tools doing four different jobs. Everything runs in your browser; no signup, no setup.

New to data? Start with the beginner's guide →

From till scan to boardroom decision

Imagine you're a national retailer with 400 stores, a website, and a loyalty app. Here's how SQL, Power BI, Python and Databricks fit together to turn every transaction into a business decision.

Step 1

Data collection

Every till scan, online checkout and loyalty-card tap lands in a central SQL database — one big set of related tables: customers, products, orders, returns.

Step 2 · SQL

Ask questions of the database

A data analyst writes SQL (Structured Query Language): short, English-like queries that ask the database questions. SQL is the language relational databases understand, used both to read data and to change it, and it is the first thing to learn. An example follows.

SQL query · read-only

SELECT
  product_category,
  COUNT(*)       AS transactions,
  SUM(price_gbp) AS total_revenue
FROM sales
WHERE region = 'London'
  AND order_date >= '2026-04-01'
  AND order_date <  '2026-05-01'
GROUP BY product_category
ORDER BY total_revenue DESC;

Query explained

Aggregates every London sale in April 2026 by product category, returning total revenue and transaction count per category, sorted highest revenue first.

SELECT — Choose which columns to return — here: the category name, a count of transactions, and the total revenue.
COUNT(*) — Aggregate function: counts the number of rows in each group. With GROUP BY product_category, you get one count per category.
SUM — Aggregate function: adds up all values in a column. SUM(price_gbp) gives the total revenue per category.
AS — Renames a column in the output. AS transactions / AS total_revenue makes the result table headings readable.
FROM — Read from the sales table.
WHERE — Filter rows: London only, dates between 1st April (inclusive) and 1st May (exclusive).
GROUP BY — Roll up rows that share a category so COUNT and SUM apply per category, not per row.
ORDER BY — Sort the result — largest revenue first, descending.
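
If you'd like to run the query outside the browser, here is a minimal sketch using Python's built-in sqlite3 module. The five sample rows and the column types are assumptions made up for illustration; the real sales table is much larger and its schema isn't shown on this page.

```python
import sqlite3

# Hypothetical sample rows; the real sales table is not shown in full.
rows = [
    ("2026-04-01", "Electronics", 199.99, "London"),
    ("2026-04-02", "Sports", 79.99, "London"),
    ("2026-04-03", "Sports", 24.99, "London"),
    ("2026-04-03", "Clothing", 39.99, "Manchester"),  # wrong region, filtered out
    ("2026-05-01", "Sports", 59.99, "London"),        # outside April, filtered out
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "order_date TEXT, product_category TEXT, price_gbp REAL, region TEXT)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

# Same query as above, unchanged.
query = """
SELECT
  product_category,
  COUNT(*)       AS transactions,
  SUM(price_gbp) AS total_revenue
FROM sales
WHERE region = 'London'
  AND order_date >= '2026-04-01'
  AND order_date <  '2026-05-01'
GROUP BY product_category
ORDER BY total_revenue DESC;
"""

result = conn.execute(query).fetchall()
for category, transactions, total_revenue in result:
    print(f"{category:<12} {transactions:>3} {total_revenue:>8.2f}")
conn.close()
```

Only the three London rows dated in April survive the WHERE clause, so the result has one row for Electronics and one for Sports, with Electronics first because its revenue is higher.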

Step 3 · Power BI

Visualise the answer

A BI developer wires those queries into a Power BI dashboard. The marketing director opens one tab on Monday morning and sees bar charts, regional maps and league tables — never the SQL behind them.

Sales Dashboard · All regions · All months · All categories

Updated 09:14 · auto-refresh on

Revenue by category (£M) · click a bar to cross-filter

Channel mix

In-store: 45%
Web: 37%
Mobile app: 14%
Click & collect: 4%

Total revenue

£59.8M

vs. same period last year

+35.2%

Synthetic data — click a slicer or a bar to see every visual recompute, the same way a real Power BI report would.
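
Cross-filtering can be pictured as "apply the active filters once, then rebuild every visual from the same subset of rows". The sketch below shows that idea in plain Python. It is not how Power BI is implemented, and the rows, the function name recompute and the figures are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (region, category, revenue in £M). Not the dashboard's data.
ROWS = [
    ("London", "Electronics", 4.0),
    ("London", "Sports", 2.5),
    ("Scotland", "Electronics", 1.5),
    ("Scotland", "Clothing", 2.0),
]

def recompute(rows, region=None, category=None):
    """Apply the active filters once, then rebuild every visual from that subset."""
    subset = [
        r for r in rows
        if (region is None or r[0] == region)
        and (category is None or r[1] == category)
    ]
    by_category = defaultdict(float)
    for _, cat, revenue in subset:
        by_category[cat] += revenue
    return {
        "total_revenue": sum(r[2] for r in subset),
        "revenue_by_category": dict(by_category),
    }

everything = recompute(ROWS)                    # no slicer selected
london_only = recompute(ROWS, region="London")  # region slicer set to London
print(everything)
print(london_only)
```

Setting the region to London changes both the headline total and the per-category bars, because both are derived from the same filtered rows.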

Liked that?

Build dashboards like that from scratch

Power BI Core walks you through connecting data, modelling it, and shipping interactive reports — same techniques behind the dashboard above. 24 hours · 113 lessons · £99 £129.

Start Power BI Core

Step 4 · Python

Move and transform data automatically

When SQL alone can't do the job — joining weather data to ice-cream sales, cleaning a messy supplier feed, emailing 400 store managers a personalised PDF — a data engineer writes Python. The glue between systems.
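
As a rough sketch of the weather-and-ice-cream case, the pandas snippet below joins two small tables on their date column. The column names and figures are invented for illustration; a real pipeline would read both feeds from files or APIs.

```python
import pandas as pd

# Hypothetical daily figures; column names are illustrative only.
sales = pd.DataFrame({
    "date": ["2026-07-01", "2026-07-02", "2026-07-03"],
    "ice_cream_units": [120, 310, 95],
})
weather = pd.DataFrame({
    "date": ["2026-07-01", "2026-07-02"],
    "max_temp_c": [19.5, 27.0],
})

# A left join keeps every sales day, even when the weather feed has a gap.
joined = sales.merge(weather, on="date", how="left")

# Flag days with no weather reading instead of silently dropping them.
joined["weather_missing"] = joined["max_temp_c"].isna()
print(joined)
```

The left join is a deliberate choice here: an inner join would quietly discard 3 July, which hides the gap in the supplier feed rather than surfacing it.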

Python pipeline

import pandas as pd

sales = pd.read_csv('sales.csv')
print(sales.head())

Pipeline explained

Loads a CSV into a pandas DataFrame and shows the top of the table — the quickest way to sanity-check the shape of any new data.

import pandas as pd — Load the pandas library, Python's most widely used data-handling toolkit (a third-party package, not part of the standard library). Aliased as pd by near-universal convention.
read_csv — Read a CSV file into a DataFrame (a table with rows + columns).
head — Return the first 5 rows of the DataFrame. Default count is 5; pass a number for more.
print — Write the result to the terminal or notebook output.

💡 Most pandas pipelines start with read_csv (or read_sql, read_parquet, read_excel) followed by a quick .head() to confirm the data loaded correctly.

Pre-computed output

order_date	product_category	product_name	price_gbp	region
2026-04-01	Electronics	Headphones	199.99	London
2026-04-02	Sports	Running shoes	79.99	London
2026-04-02	Home & Garden	Lamp	34.99	London
2026-04-03	Sports	Yoga mat	24.99	London
2026-04-03	Clothing	Jeans	39.99	London

head() returns the first 5 rows by default — pass a number to change it (e.g. head(10)).

Step 5 · Databricks

When the data won't fit on one computer

Some answers need a scan of every row across years of history, far too much for one machine. Databricks splits the data into chunks, runs the analysis across many computers in parallel, then stitches their answers back together. That's distributed parallel processing: work that would take hours on one machine can finish in minutes. It is widely used by banks, NHS trusts, energy companies and large retailers.
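
The split, process and recombine pattern can be sketched on a single laptop with Python's standard library. This only mimics the shape of the idea: Databricks runs on Apache Spark and spreads the chunks across separate machines, whereas the threads below share one computer and won't show a real speed-up. The rows and the function names split, partial_totals and combine are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rows: (category, price in pence). Integers keep the sums exact.
ROWS = [("Sports", 7999), ("Books", 1299), ("Sports", 2499), ("Clothing", 3999)] * 250

def split(rows, n_workers):
    """Cut the rows into n_workers contiguous chunks of near-equal size."""
    size = -(-len(rows) // n_workers)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def partial_totals(chunk):
    """Work done by one worker: revenue per category for its chunk only."""
    totals = Counter()
    for category, pence in chunk:
        totals[category] += pence
    return totals

def combine(partials):
    """Stitch the per-worker answers back into one result."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts together
    return merged

chunks = split(ROWS, n_workers=4)
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(partial_totals, chunks))
total = combine(partials)
print(total.most_common())
```

The approach works because a sum can be computed per chunk and then added up. The combined answer matches what a single pass over all the rows would give.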

Databricks Cluster · 200M rows · 4 workers

sales_history · 200M rows

split across the cluster ↓

Worker 1

0 – 50M

Worker 2

50M – 100M

Worker 3

100M – 150M

Worker 4

150M – 200M

recombine results ↓

Aggregated answer

~1m 56s across the cluster · 3.4× faster than a single machine (6m 40s)

Aggregated result · top product categories by all-time revenue

Category	Revenue	Transactions
Electronics	£227M	36.0M
Sports	£185M	42.0M
Clothing	£160M	48.0M
Home & Garden	£101M	22.0M
Books	£42.0M	12.0M

5 rows · aggregated from 200M source rows in ~1m 56s

Rows scanned

200M

Workers

4

Estimated runtime

~1m 56s

3.4× vs single machine

Synthetic timing model — pick a bigger data size or smaller cluster and watch the runtime climb (or the cluster crash). That trade-off is exactly the call data engineers make when they pick a cluster.
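
The 3.4× figure can be checked from the two runtimes shown in the demo. The short calculation below also works out parallel efficiency, meaning speed-up divided by worker count. The efficiency measure is an addition for illustration and is not something the demo itself displays.

```python
def to_seconds(minutes, seconds):
    """Convert a 'Xm Ys' runtime to seconds."""
    return minutes * 60 + seconds

single_machine = to_seconds(6, 40)  # 6m 40s, from the demo
cluster = to_seconds(1, 56)         # ~1m 56s, from the demo
workers = 4

speedup = single_machine / cluster
efficiency = speedup / workers      # 1.0 would mean perfect scaling

print(f"speedup: {speedup:.2f}x")
print(f"parallel efficiency: {efficiency:.0%}")
```

Four workers deliver a little under 4× here, which is what you'd expect when some time goes on splitting the data and recombining the partial results.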

That's the stack, end-to-end

Stores collect → Database stores (SQL) → Python cleans → SQL queries → Power BI charts

Databricks replaces “Database stores” + “Python cleans” when the data outgrows a single database — same shape, bigger scale.

Ready to pick a track?

Ten short questions about your background, hours per week, and what kind of work appeals to you. Comes back with a recommended track, starting level, and a realistic time plan. No signup, no email capture — just a plan.

Take the 7-minute assessment → Or browse all courses →

Got questions instead of an answer? Drop into the weekly Q&A — Thursdays 7pm UK, free.