Completely new? Start here.

What is “data” — and is a data career for you?

If you've seen “data analyst”, “data engineer”, or “BI developer” on job ads and wondered what those people actually do, this page is for you. No assumed knowledge, no jargon.

A short read, top to bottom. Click any section to jump straight to it:

  1. What is data?
  2. How to identify data
  3. How data is generated, stored and shared
  4. Data Lifecycle
  5. A real scenario: a UK retailer
  6. Who's on a data team

If you come across a term you don't understand, you'll find it explained here: The data glossary →

01 What is data?

The short version

Every company collects information — customer records, sales, machine readings, NHS appointments, parcel deliveries. Data professionals turn that information into insights that drive business decisions: they clean it, organise it, create charts and reports that managers can analyse, and build the plumbing that moves data between systems.

You don't need a maths PhD. You don't need to be a coder. The day-to-day skills are closer to careful spreadsheet work with better tools, plus enough programming-like skills to ask the data the right questions.

Roles split into analysts (they turn data into charts and reports), engineers (they build the pipelines that move data around), and BI developers (they build the dashboards that executives use to make informed business decisions).

At its simplest: data is just recorded facts about purchases, people, events, sensors, anything. The job of data professionals is to turn those records into something useful: decisions, charts, alerts, automations.

Two flavours: quantitative and qualitative

Most data falls cleanly into one of two camps. Knowing which one you're looking at shapes the questions you can ask and the charts you can draw.

Data
  • Quantitative (numerical)
    • Discrete. Example: 412 stores
    • Continuous. Example: £8.27
  • Qualitative (descriptive)
    • Categories, names, free-text. Example: ‘London’ vs ‘Manchester’

Quantitative
Numerical information

Things you can count or measure. Split into discrete (whole numbers: 412 stores, 3 children) and continuous (decimals: £8.27 basket value, 36.4°C temperature).

Qualitative
Descriptive information

Categories, names, free-text, sentiment. Things you can group but not directly average — ‘Sports’ vs ‘Electronics’, ‘London’ vs ‘Manchester’, the words in a product review.
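
To make the difference concrete, here is a minimal Python sketch. The sample values are invented for illustration: numbers can be averaged or summed, while categories can only be grouped and counted.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample values, invented for illustration.
basket_values = [8.27, 12.50, 3.99]                # quantitative, continuous
children_per_household = [3, 0, 2]                 # quantitative, discrete
store_cities = ["London", "Manchester", "London"]  # qualitative

# Quantitative data supports arithmetic such as averages and totals.
average_basket = round(mean(basket_values), 2)
total_children = sum(children_per_household)

# Qualitative data can be grouped and counted, but not averaged.
stores_per_city = Counter(store_cities)

print(average_basket, total_children, stores_per_city)
```

Asking for the "average city" makes no sense, which is why the type of data decides which questions and charts are available.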

02 How to identify data

You're already surrounded by data

Every app, shop and service you use is collecting it. Once you start looking, you spot it everywhere. Some everyday examples:

Your supermarket loyalty card

Every scan tells the supermarket what you bought, when, where, and at what price. That's why your Clubcard discounts feel uncannily relevant — the data says you buy nappies every two weeks, so they email you a nappy voucher on week three.

Netflix's homepage

The order of rows you see, the thumbnails picked, the autoplay teaser — every choice is driven by data on what people like you have watched and abandoned. Two viewers see two different homepages.

NHS appointment letters

Behind every letter is a database of patient records, GP referrals, hospital capacity, and waiting-list rules. The decision ‘who gets seen Tuesday at 10am’ is a data question.

Your bank's fraud alerts

If a £40 transaction in Manchester is followed by a £400 one in Bangkok ten minutes later, the bank's fraud-detection system flags it instantly. That decision is a data model running in the background.
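
As a rough illustration of the fraud example, the sketch below flags a card used in two different cities within a short time window. It is a toy rule written for this page, not how any particular bank works; the `Transaction` class, the one-hour threshold and the timestamps are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Transaction:
    amount_gbp: float
    city: str
    made_at: datetime


# Assumed threshold, chosen for illustration only.
MIN_TRAVEL_TIME = timedelta(hours=1)


def looks_suspicious(previous: Transaction, current: Transaction) -> bool:
    """Flag a card used in two different cities too close together in time."""
    changed_city = previous.city != current.city
    gap = current.made_at - previous.made_at
    return changed_city and gap < MIN_TRAVEL_TIME


first = Transaction(40.0, "Manchester", datetime(2026, 5, 22, 14, 0))
second = Transaction(400.0, "Bangkok", datetime(2026, 5, 22, 14, 10))
flagged = looks_suspicious(first, second)
print(flagged)
```

Real fraud systems weigh many more signals than location and time, but the principle is the same: a rule or model reads the data and makes the call automatically.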

03 How data is generated, stored and shared

Where does the data come from in the first place?

Every data point starts from somewhere. Below are three popular sources you'll meet in a data role.

Machine-generated

Data from sensors, cameras, GPS, satellites, server logs, IoT trackers. The volume is high and no human interaction is involved.

Human-generated

Examples are tweets, photos, status updates, blog posts, product reviews, support emails. They are often messy, written as free text, and full of nuance.

Business-generated

Sales transactions, user registrations, process events, audit logs. Most companies' core data assets live here.

Where does the data actually live?

Before you can probe data, somebody had to put it somewhere. The three places you'll encounter most often:

Database

The default storage for most companies. Data is stored in tables made up of rows and columns, like sheets in an Excel file, and the tables are related to each other. You explore the data using SQL. Examples are Microsoft SQL Server, PostgreSQL, MySQL.

product
id    name     price
1     Bread    £2.10
2     Milk     £1.45
3     Eggs     £3.20

sales
id     product_id    qty
101    1             12
102    3             4
103    2             8
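
To show what "related to each other" means in practice, here is a minimal sketch that loads the product and sales tables into an in-memory SQLite database using Python's built-in `sqlite3` module and joins them with SQL. Storing prices as whole pence in a `price_pence` column is an assumption made here to keep the arithmetic exact; SQLite itself is just a convenient stand-in for the servers named above.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price_pence INTEGER);
    CREATE TABLE sales (
        id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES product(id),
        qty INTEGER
    );
    INSERT INTO product VALUES (1, 'Bread', 210), (2, 'Milk', 145), (3, 'Eggs', 320);
    INSERT INTO sales VALUES (101, 1, 12), (102, 3, 4), (103, 2, 8);
""")

# sales.product_id points at product.id, so a JOIN can combine the two tables.
rows = conn.execute("""
    SELECT p.name, s.qty, s.qty * p.price_pence AS revenue_pence
    FROM sales AS s
    JOIN product AS p ON p.id = s.product_id
    ORDER BY s.id
""").fetchall()

for name, qty, revenue_pence in rows:
    print(f"{name}: {qty} sold, £{revenue_pence / 100:.2f}")
```

The `product_id` column is what links each sale back to its product, so the product name and price are stored once rather than repeated on every sale.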

Data Lake

A centralised, low-cost repository that holds raw, semi-structured and structured data at any scale. It is used when a single database isn't big enough to hold the data. Examples are Azure Data Lake, AWS S3.

Azure Data Lake
  • 📦 raw
    • 📁 2026
      • 📁 05
        • 📁 22
          • 📄 orders.json
          • 📄 events.parquet
        • 📁 21
          • 📄 orders.json
  • 📦 curated
    • 📁 2026
      • 📁 05
        • 📁 22
          • 📄 daily_sales.csv
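
The folder listing above follows a zone/year/month/day pattern. A minimal sketch of how a pipeline might build such paths is shown below; the `lake_path` helper is hypothetical and only uses Python's standard library.

```python
from datetime import date
from pathlib import PurePosixPath


def lake_path(zone: str, day: date, filename: str) -> str:
    """Build a zone/year/month/day path (hypothetical helper for illustration)."""
    return str(PurePosixPath(zone, f"{day:%Y}", f"{day:%m}", f"{day:%d}", filename))


raw_orders = lake_path("raw", date(2026, 5, 22), "orders.json")
curated_sales = lake_path("curated", date(2026, 5, 22), "daily_sales.csv")
print(raw_orders)
print(curated_sales)
```

Splitting files by date means a job that only needs one day's data can read a single folder instead of scanning the whole lake.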

Data Warehouse

A structured store optimised for analysing historical data and designed for big queries over years of records. It is the storage behind most reporting systems. Examples are Databricks, Snowflake, BigQuery, Synapse, Redshift.

fact_sales
date          product_id    total
2026-05-22    1             £25
2026-05-22    3             £13
2026-05-21    2             £12

dim_product
id    name     category
1     Bread    Bakery
2     Milk     Dairy
3     Eggs     Dairy
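
A typical warehouse query combines a fact table with a dimension table and summarises the result. The sketch below reproduces the fact_sales and dim_product tables above in an in-memory SQLite database and totals sales by category. SQLite stands in for a real warehouse here, and the `total_gbp` column name (whole pounds) is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE fact_sales (date TEXT, product_id INTEGER, total_gbp INTEGER);
    INSERT INTO dim_product VALUES
        (1, 'Bread', 'Bakery'), (2, 'Milk', 'Dairy'), (3, 'Eggs', 'Dairy');
    INSERT INTO fact_sales VALUES
        ('2026-05-22', 1, 25), ('2026-05-22', 3, 13), ('2026-05-21', 2, 12);
""")

# Join each sale to its product, then add up the totals per category.
sales_by_category = conn.execute("""
    SELECT d.category, SUM(f.total_gbp) AS total_gbp
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.id = f.product_id
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()

print(sales_by_category)
```

The fact table holds the numbers and the dimension table holds the descriptions used to slice them, which is the shape most reporting tools expect.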

In short: a database runs your day-to-day app, a data warehouse powers your reporting on years of history, and a data lake holds the raw firehose feeding both. Many companies use all three.

How is data shared with the people who need it?

Raw tables of data are rarely useful on their own. By the time data reaches a decision-maker, it has been shaped into a form that's easy to read and act on. Common ways of sharing it include:

  • Dashboards: Live charts updated automatically. Popular dashboard tools are Power BI, Tableau, Looker.
  • Reports: One-off summary exports, often as PDF or PowerPoint.
  • Alerts: Automatic emails / Slack pings when a threshold is breached.
  • APIs: Programmatic access that lets other systems extract the data they need.

One important thing about access

Before any of the above happens, somebody has to grant you permission to see the data. Real companies don't let everyone see everything: a customer-service agent sees one customer's record; an analyst sees aggregated regional sales; a finance director sees revenue but not individual postcodes.

That permission layer (“who can see what”) is part of every data job. On CareerSwerve, when you enrol on a course, we grant you a real SQL login to a training database so you can practise on the same kind of data NHS, energy and logistics companies actually use — but scoped just to your training schemas.
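
The "who can see what" idea can be sketched in a few lines of Python. The roles, column names and record below are invented for illustration. Real databases enforce this with their own permission features rather than application code like this.

```python
# Hypothetical roles and the columns each may see, invented for illustration.
VISIBLE_COLUMNS = {
    "analyst": {"region", "revenue"},
    "finance_director": {"revenue"},
}


def visible_view(role: str, record: dict) -> dict:
    """Return only the fields this role is allowed to see."""
    allowed = VISIBLE_COLUMNS.get(role, set())  # unknown roles see nothing
    return {key: value for key, value in record.items() if key in allowed}


record = {"region": "North West", "postcode": "M1 1AE", "revenue": 1250}
analyst_view = visible_view("analyst", record)
director_view = visible_view("finance_director", record)
print(analyst_view)
print(director_view)
```

Defaulting unknown roles to an empty set is a deliberate choice: access is denied unless it has been explicitly granted.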

04 Data Lifecycle

How data gets to a data person

Data goes on a journey before anyone analyses it. Understanding this lifecycle is half the battle — it's also where you'll spend most of your career, regardless of which tool you specialise in.

  1. Collected

    Apps log clicks. Tills record sales. Forms capture sign-ups. Sensors stream temperatures. APIs pull in weather data. Anything happening, anywhere, can become a row in a table.

  2. Stored

    The most common home is a database: a structured set of tables, like spreadsheets that can talk to each other. Smaller jobs use files (CSV, Excel, JSON). Very large jobs use cloud data lakes (Azure Data Lake, AWS S3), often with a platform such as Databricks on top.

  3. Cleaned

    Real-world data is messy. Duplicate customers, typo'd postcodes, missing fields, dates in five different formats. Most of a data engineer's day is fixing this so the analysis built on it can be trusted.

  4. Analysed

    An analyst writes queries to answer business questions: ‘Which products did we sell more of last quarter than this one?’ The answer comes back as a small table of numbers.

  5. Shared

    That small table of numbers becomes a chart, a dashboard, a slide, or an email alert. Done well, a stakeholder can act on it in seconds without ever seeing the underlying data.

The big idea: By the time anyone makes a decision from data insights, the data has been collected from many sources, cleaned, organised, and presented in a way humans can read. Each data role specialises in a different part of this chain.
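
The "Cleaned" step is the easiest one to picture in code. Below is a minimal sketch that handles three of the problems mentioned above: duplicate customers, inconsistent postcodes, and dates in different formats. The field names, sample rows and list of date formats are assumptions made for illustration.

```python
from datetime import date, datetime
from typing import Optional

# Assumed date formats; a real feed may need others.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y")


def parse_date(text: str) -> Optional[date]:
    """Try each known format; return None if none of them match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def clean(rows: list) -> list:
    """Normalise fields, then drop rows with unreadable dates or duplicate emails."""
    seen_emails = set()
    cleaned = []
    for row in rows:
        email = row["email"].strip().lower()
        joined = parse_date(row["joined"])
        if joined is None or email in seen_emails:
            continue
        seen_emails.add(email)
        # Upper-case the postcode and collapse repeated spaces.
        postcode = " ".join(row["postcode"].upper().split())
        cleaned.append({"email": email, "postcode": postcode, "joined": joined})
    return cleaned


raw_rows = [
    {"email": "Sam@example.com ", "postcode": "m1  1ae", "joined": "2026-05-22"},
    {"email": "sam@example.com", "postcode": "M1 1AE", "joined": "22/05/2026"},
    {"email": "alex@example.com", "postcode": "sw1a 1aa", "joined": "21-05-2026"},
    {"email": "pat@example.com", "postcode": "LS1 4AP", "joined": "sometime"},
]
customers = clean(raw_rows)
print(customers)
```

Dropping rows with unreadable dates is only one option. Many teams would instead route them to a separate table for someone to review.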

How fresh does the data need to be?

Two big decisions shape every data pipeline: how often it should run, and how much data it should pull in each time. Different businesses sit in different boxes: a stock-trading firm wants every tick the moment it lands; a retailer's morning revenue report can wait until 6am.

Batch
Scheduled intervals

Run hourly, nightly, weekly. The classic 'overnight job' that has yesterday's report on your desk by 9am. Easier to operate, cheaper to run.

Real-time (streaming)
As-it-happens

Each event flows through the pipeline within seconds. Fraud detection, live dashboards, IoT sensors, trading. More complex; more expensive.

Full load
The whole dataset, every time

Simple. Reload everything from scratch on each run. Fine for small tables; impractical once the data grows.

Incremental (delta) load
Only what's new or changed

Track a 'last updated' timestamp; pull only rows that changed since then. Most production pipelines work this way.
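
The difference between a full load and an incremental load can be sketched as follows. The rows, the `updated_at` field and the `watermark` variable are hypothetical. The watermark is simply the latest 'last updated' timestamp the pipeline has already processed.

```python
from datetime import datetime

# Hypothetical source rows, each carrying a 'last updated' timestamp.
source_rows = [
    {"id": 1, "total": 25, "updated_at": datetime(2026, 5, 21, 9, 0)},
    {"id": 2, "total": 13, "updated_at": datetime(2026, 5, 22, 9, 0)},
    {"id": 3, "total": 12, "updated_at": datetime(2026, 5, 22, 17, 30)},
]


def full_load(rows):
    """Full load: copy every row on every run."""
    return list(rows)


def incremental_load(rows, watermark):
    """Incremental load: copy only rows changed after the previous watermark."""
    changed = [row for row in rows if row["updated_at"] > watermark]
    # The new watermark is the latest timestamp seen, or the old one if nothing changed.
    new_watermark = max((row["updated_at"] for row in changed), default=watermark)
    return changed, new_watermark


watermark = datetime(2026, 5, 22, 0, 0)
changed_rows, watermark = incremental_load(source_rows, watermark)
print([row["id"] for row in changed_rows], watermark)
```

Running `incremental_load` again with the updated watermark returns no rows until something in the source changes, which is what keeps each run small.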

05 A real scenario: a UK retailer

End to end: from till scan to boardroom decision

Imagine you're a national retailer with 400 stores, a website, and a loyalty app. Here's how SQL, Power BI, Python and Databricks fit together to turn every transaction into a business decision.

End-to-end, in one line

Stores collect → Database stores (SQL) → Python cleans → SQL queries → Power BI charts

Databricks replaces “Database stores” + “Python cleans” when the data outgrows a single database — same shape, bigger scale.

Want to see it work?

Try the four tools, hands-on

Run a SQL query, drive a Power BI dashboard, follow a Python pipeline, and scale a Databricks cluster — all against the same UK-retailer scenario, all in your browser, no signup.

Start the tour

Got questions?

Drop into the weekly Q&A

Thursdays at 7pm UK · 30 min · Microsoft Teams · free. Bring anything that's holding you back from starting — group format, no agenda.

Register

06 Who's on a data team

How the roles fit together, end-to-end

One person rarely does all of this. Most data teams split the work across roles that hand off as the request travels from a business question to a decision-ready insight. Here's the typical flow.

  • Project oversight: Data Project Manager, who keeps every phase below on track
  • Business need: Question to answer · decision to inform
  • Data Product Owner: Owns the outcome · prioritises the work
  • Data Business Analyst: Decodes the requirement · finds the data sources
  • Data Engineer: Builds pipelines that extract relevant data from the sources, transform it into the form needed to answer business questions, and save it to a data warehouse
  • Data Architect: Designs the data model and warehouse schema; defines how tables relate, what's stored where, and how the platform scales as data volumes grow
  • SQL Developer: Writes queries + transformations on the warehouse
  • Data Warehouse: Single source of truth · curated, governed
  • Data Analyst: Reports historical trends · explains what happened
  • Data Scientist: Forecasts what's likely · models, ML, statistics
  • Insights → Business Decisions: The loop closes, and the next question starts

Roles overlap in real teams — at smaller companies one person may wear two hats; at larger ones each box is a department. The flow is the same.

Reference

Data terminology, defined

Migration, ingestion, transformation, validation, governance, cataloguing… every data role uses these words daily. We've collected them — categorised, plain-English — on a dedicated page you can bookmark.

Open the data glossary →

Still not sure which track? That's what the assessment is for.

Ten short questions about your background, hours per week, and what kind of work appeals to you. Comes back with a recommended track, starting level, and a realistic time plan. No signup, no email capture — just a plan.
