🏗 Data Platform Modernisation

From legacy warehouse to cloud-native, without breaking trust.

Modernising your data platform is not a technical project — it's an organisational one. Here's how to do it without losing data quality, stakeholder trust, or your team's sanity.

Reading time: ~17 min
Level: All levels
Module: 04 of 06
By Ana Rubio Herrera
01

Why modernise your data platform — and why now?

Most organisations are running their analytics on infrastructure that was designed for a different era — on-premise data warehouses, legacy ETL pipelines, and BI tools that were state-of-the-art a decade ago. The problem is not that these systems don't work. It's that they can't support what organisations now need to do with data.

Modern AI and analytics capabilities — real-time processing, machine learning at scale, generative AI integration, self-service analytics for non-technical users — require a different foundation. A legacy warehouse was not designed to serve a large language model. A batch ETL pipeline cannot support real-time decision-making. The platform becomes the constraint.

The real cost of inaction

Staying on legacy infrastructure is not free. The costs are hidden but real: slower time-to-insight, higher maintenance overhead, inability to adopt new AI capabilities, and data quality issues that compound over time. Every year of delay makes the eventual migration harder and more expensive.

The good news: cloud-native data platforms have matured significantly. Tools like Databricks, Snowflake, and the major cloud providers' data stacks offer capabilities that would have been unimaginable — or unaffordable — five years ago. The barrier is no longer technology. It's change management, stakeholder alignment, and migration execution.

Key takeaway
  • Legacy platforms are not just a technical debt problem — they're a strategic constraint that limits your AI and analytics ambitions.
  • The cost of inaction is real and growing. Every year of delay compounds the eventual migration complexity.
  • The main barriers to modernisation are organisational, not technical — plan accordingly.
02

The modern data platform landscape

The terminology in this space can be overwhelming. Here's a practical map of the key components of a modern data platform and what each one does.

| Layer | What it does | Key tools |
| --- | --- | --- |
| Ingestion | Moves data from source systems into the platform — batch or real-time | Fivetran, Airbyte, Kafka, Azure Data Factory |
| Storage | Stores raw and processed data at scale, cost-effectively | Azure Data Lake, S3, Google Cloud Storage, Delta Lake |
| Processing | Transforms, cleans, and enriches data for analysis | Databricks, Spark, dbt, Azure Synapse |
| Serving | Makes processed data available to analytics and AI tools | Snowflake, BigQuery, Redshift, Databricks SQL |
| Analytics & BI | Enables exploration, dashboarding, and self-service reporting | Power BI, Tableau, Looker, Metabase |
| AI & ML | Trains models, runs inference, serves predictions | Databricks MLflow, Azure ML, Vertex AI, SageMaker |
| Governance & catalogue | Tracks data lineage, ownership, quality, and access | Unity Catalog, Purview, Collibra, Alation |

You don't need all of these from day one. The right architecture depends on your organisation's size, maturity, and use cases. The most common mistake is buying a comprehensive platform before you understand what you actually need to build on it.
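As a toy illustration of how data might flow through these layers, here is a minimal pandas sketch. The bronze/silver/gold names follow the common medallion convention, and all data and column names are made up:

```python
import io
import pandas as pd

# Ingestion: land raw data from a hypothetical source system.
raw_csv = "order_id,amount,region\n1,120.5,EMEA\n2,,EMEA\n3,89.0,APAC\n"
bronze = pd.read_csv(io.StringIO(raw_csv))   # storage: raw "bronze" layer

# Processing: clean and enrich into a "silver" layer.
silver = bronze.dropna(subset=["amount"]).assign(
    amount_eur=lambda df: df["amount"].round(2)
)

# Serving: aggregate into a consumer-ready "gold" table for BI.
gold = silver.groupby("region", as_index=False)["amount_eur"].sum()
print(gold)
```

The point is not the five lines of pandas; it is that each layer has a distinct job, and the tools in the table above each specialise in one of them.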

Key takeaway
  • Understand each layer before selecting tools — the layers are more important than the specific products.
  • Start with the layers that are most broken or most constraining. You don't need to modernise everything at once.
  • Governance and catalogue are often left to last — this is a mistake. Build them in from the start or you'll regret it.
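Building the catalogue in from the start does not have to mean a heavyweight tool on day one. As a minimal sketch (all dataset names, owners, and checks here are hypothetical), even a simple record of ownership, lineage, and quality checks answers the questions that matter most:

```python
from dataclasses import dataclass, field

# A minimal, hypothetical catalogue record: enough to answer "who owns
# this dataset, where does it come from, and what checks guard it?"
@dataclass
class CatalogueEntry:
    name: str
    owner: str                                           # accountable team
    upstream: list[str] = field(default_factory=list)    # lineage
    quality_checks: list[str] = field(default_factory=list)

catalogue: dict[str, CatalogueEntry] = {}

def register(entry: CatalogueEntry) -> None:
    catalogue[entry.name] = entry

register(CatalogueEntry(
    name="gold.sales_by_region",
    owner="finance-analytics",
    upstream=["silver.orders"],
    quality_checks=["amount_eur >= 0", "region is not null"],
))

print(catalogue["gold.sales_by_region"].owner)
```

Dedicated tools like Unity Catalog or Purview replace this later; the discipline of registering every dataset with an owner is what must exist from the start.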
03

The migration playbook — phase by phase

No migration goes exactly to plan. But some migrations are well designed and some aren't. Here is the sequence that has worked across the programmes I've led and supported.

Phase 1
Discovery & assessment
4–8 weeks
Map every data source, pipeline, and consumer in the current environment. Understand what exists, who uses it, how critical it is, and what the quality baseline is. Do not skip this phase. Migrations that skip discovery always encounter expensive surprises later.
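A discovery-phase quality baseline can start very simply. The sketch below (table name and data are hypothetical) captures the kind of per-table facts, row counts, null rates, and duplicates, that surface expensive surprises early:

```python
import pandas as pd

def profile(table: pd.DataFrame, name: str) -> dict:
    """Capture a minimal quality baseline for one dataset during discovery."""
    return {
        "table": name,
        "rows": len(table),
        "columns": len(table.columns),
        "null_pct": {c: round(table[c].isna().mean() * 100, 1)
                     for c in table.columns},
        "duplicate_rows": int(table.duplicated().sum()),
    }

# A hypothetical legacy extract with the usual problems: nulls and duplicates.
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 4],
    "customer": ["acme", None, None, "globex"],
})
baseline = profile(orders, "legacy.orders")
print(baseline)
```

Run this across every table found in discovery and you have both a migration-risk ranking and the quality baseline to validate against later.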
Phase 2
Architecture design
3–6 weeks
Design the target state architecture. Define which tools will be used for each layer, how data will flow, what the governance model will be, and what the migration sequence will be. Get buy-in from technical leads and key stakeholders before proceeding.
Phase 3
Foundation build
6–12 weeks
Build the core infrastructure: storage, processing environment, governance framework, security model, and basic ingestion pipelines. This is the foundation everything else will run on — invest in quality here; it pays dividends for years.
Phase 4
Incremental migration
3–12 months
Migrate data assets in priority order — starting with the most valuable and least complex. Run new and old environments in parallel for each asset until the new environment is validated. Only decommission the old system when confidence is high.
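The parallel-running validation can be sketched as a simple reconciliation: compare row counts and an order-insensitive checksum between the legacy and migrated copies of each table. A minimal pandas sketch with hypothetical data:

```python
import hashlib
import pandas as pd

def fingerprint(df: pd.DataFrame, key: str) -> str:
    """Order-insensitive checksum of a table, for old-vs-new comparison."""
    canonical = df.sort_values(key).reset_index(drop=True)
    return hashlib.sha256(canonical.to_csv(index=False).encode()).hexdigest()

def reconcile(old: pd.DataFrame, new: pd.DataFrame, key: str) -> dict:
    return {
        "row_count_match": len(old) == len(new),
        "checksum_match": fingerprint(old, key) == fingerprint(new, key),
    }

legacy = pd.DataFrame({"id": [1, 2, 3], "amount": [10.0, 20.0, 30.0]})
migrated = legacy.sample(frac=1, random_state=7)  # same data, new order
print(reconcile(legacy, migrated, key="id"))      # both checks pass
```

In practice you would reconcile per asset and per load window, but the principle holds: decommission nothing until checks like these pass consistently.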
Phase 5
Adoption & optimisation
Ongoing
Train users, build champions, monitor usage and quality. Optimise costs and performance as usage patterns emerge. A migration is only successful when people are actually using the new platform — adoption is the measure of success, not go-live.
Key takeaway
  • Run old and new in parallel during migration — never cut over before the new environment is validated.
  • Migrate incrementally by priority, not all at once. Each successful migration builds confidence and momentum.
  • Adoption is the true measure of success. A platform nobody uses is a failed migration, regardless of technical quality.
04

Stakeholder alignment — the make-or-break factor

I've seen technically excellent migrations fail because of stakeholder misalignment — and I've seen imperfect migrations succeed because the people side was handled well. The technical work is the easier half.

The stakeholders you need to align — and what each one cares about:

C-level / Board

They care about cost, risk, and strategic value. Frame the migration in terms of business outcomes — what decisions will be faster, what capabilities will be unlocked, what the cost of inaction is. Avoid technical detail at this level.

Data consumers (analysts, business users)

They care about continuity — will their reports still work? Will their data still be there? Involve them early, communicate the migration timeline clearly, and never let their data disappear without warning. This group will make or break adoption.

IT & security

They care about security, compliance, and operational stability. Engage them in architecture decisions early — not as a blocker, but as a partner. Their requirements are legitimate and will surface eventually. Better early than late.

Data engineering team

They are doing the work. They care about technical quality, realistic timelines, and not being set up to fail. Protect them from scope creep, give them clear priorities, and create space for them to do the work properly.

Key takeaway
  • Map your stakeholders before the migration starts and understand what each group cares about.
  • Never let data consumers discover that their reports have broken — proactive communication prevents most adoption problems.
  • The data engineering team is your execution engine. Protect their time and give them clear, stable priorities.
05

The five pitfalls that sink migrations

These are not hypothetical. I have seen every one of them derail a real migration.

💸
Underestimating data quality debt
The legacy system has years of accumulated quality issues that nobody has fully mapped. They surface during migration and cause delays. Always run a data quality assessment before committing to a migration timeline.
🔒
Late security and compliance engagement
Bringing in security and legal after the architecture is designed almost always results in expensive rework. Engage them in Phase 1 — their requirements will shape your design.
📦
Migrating everything at once
Big-bang migrations fail at a high rate. The complexity is unmanageable, the risk is concentrated, and when things go wrong there is no fallback. Always migrate incrementally.
👥
No adoption plan
Technical go-live is not the finish line. If users don't know how to use the new platform, don't trust it, or simply haven't been trained, the migration has failed regardless of technical quality.
💰
Cloud cost shock
Cloud platforms can be dramatically more expensive than expected if usage patterns are not managed. Build cost monitoring and governance into the platform from day one — not after the first bill arrives.
Key takeaway
  • Run a data quality assessment before finalising your timeline — quality debt is the most common source of delays.
  • Incremental migration with parallel running is slower but far more reliable than big-bang cutover.
  • Build cost monitoring in from day one — cloud cost management is a discipline, not an afterthought.
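A cost guardrail does not need to be sophisticated to be useful. A minimal sketch, assuming a daily billing export exists; all workspace names and figures here are hypothetical:

```python
# Compare latest daily spend per workspace against a budget and flag
# anything over, e.g. as input to a Slack or email alert.
daily_spend = {                     # pulled from a cloud billing export
    "analytics-prod": [410.0, 455.0, 620.0],
    "ml-dev": [95.0, 88.0, 90.0],
}
daily_budget = {"analytics-prod": 500.0, "ml-dev": 150.0}

def over_budget(spend: dict, budget: dict) -> list[str]:
    """Return workspaces whose latest daily spend exceeds their budget."""
    return [ws for ws, days in spend.items() if days[-1] > budget[ws]]

alerts = over_budget(daily_spend, daily_budget)
print(alerts)   # ['analytics-prod']
```

The real versions live in tools like Azure Cost Management or Databricks budget policies; what matters is that the check runs daily from day one, not after the first bill arrives.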
06

Real examples from the field

📍 United Nations — Databricks migration, Geneva

Supporting the UN's migration of its data warehouse to Databricks was a masterclass in the importance of patience and stakeholder management in a complex, multi-stakeholder environment. The technical migration itself was straightforward — the challenge was ensuring that dozens of teams across the organisation could continue to access their data during and after the transition, without disruption to critical reporting. The parallel running phase was longer than planned, but it was the right decision. Every team had time to validate their data in the new environment before the old system was decommissioned. No surprises, no broken reports, no lost trust.

📍 SDG Group — Insurance sector modernisation

Across insurance transformation programmes, the data platform modernisation challenge was not just technical but regulatory: insurance data is heavily regulated, and every architectural decision needed to be defensible from a compliance perspective. The approach was to design the governance and security model first, then build the technical architecture around it — not the other way around. This added time upfront but eliminated the expensive rework that typically comes from retrofitting compliance onto a platform that wasn't designed for it.

📍 datalitiks — Building cloud-native from scratch

At datalitiks, the advantage was starting with a blank slate — no legacy to migrate, no existing users to manage. The lesson from this experience is that the decisions you make in the first 90 days of a data platform's life are disproportionately hard to undo later. We invested heavily upfront in data modelling, governance, and quality — choices that paid dividends as the platform scaled. Starting cloud-native is an opportunity to do it right from the beginning. Don't waste it by moving fast and cutting corners on the fundamentals.

Up next
Module 05: MLOps in Practice