Learning Hub · Module 05
⚙️ MLOps in Practice

Getting ML models out of notebooks and into production.

MLOps is not a technical discipline — it's an organisational one. The gap between a working model and a model that delivers value in production is wider than most organisations expect.

Reading time: ~16 min
Level: All levels
Module: 05 of 06
By Ana Rubio Herrera
01

What is MLOps — and why does it matter?

Every year, organisations invest significant resources in building machine learning models. Data scientists spend months developing models that achieve impressive accuracy in controlled conditions. And then — far too often — those models sit in notebooks, never making it into production. Or they get deployed once and never maintained, drifting silently until they become unreliable.

MLOps is the discipline of operationalising machine learning — the practices, tools, and culture required to deploy, monitor, and maintain ML models in production reliably and at scale. It borrows from software engineering (DevOps) and applies those principles to the unique challenges of machine learning.

The core problem MLOps solves

A machine learning model is not like a piece of software. Software behaves consistently as long as the code doesn't change. A model's performance degrades over time as the real world drifts away from the data it was trained on. MLOps provides the systems to detect, respond to, and prevent this degradation.

MLOps is also fundamentally an organisational challenge. The gap between data science teams (who build models) and engineering teams (who deploy and maintain systems) is at the root of most ML production failures. MLOps is as much about bridging that gap as it is about tooling.

Key takeaway
  • A model in a notebook creates no business value. Only models in production — reliably serving predictions — deliver ROI.
  • Models degrade over time. Without monitoring and retraining, a model that worked well at launch will quietly become a liability.
  • MLOps is 50% tooling, 50% culture. The organisational gap between data science and engineering is the hardest part to close.
02

The ML lifecycle — end to end

Understanding the full lifecycle of a machine learning model helps you identify where your organisation's bottlenecks are — and where MLOps practices will have the most impact.

📦
Data preparation
Collecting, cleaning, and transforming data into training-ready format. The most time-consuming phase — typically 60–80% of a data scientist's time.
🧪
Experimentation
Training and evaluating candidate models. Tracking experiments — parameters, metrics, artefacts — so results are reproducible and comparable.
✅
Validation
Testing model performance on held-out data, checking for bias, validating against business acceptance criteria before deployment.
🚀
Deployment
Packaging the model and making it available as a service — via API, batch job, or embedded in an application. This is where most organisations struggle.
📊
Monitoring
Tracking model performance, data drift, and system health in production. The ongoing work that keeps models reliable after launch.
🔄
Retraining
Updating the model with new data when performance degrades. Automated retraining pipelines are the mark of a mature MLOps practice.
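The experiment-tracking idea above can be made concrete with a minimal sketch. This is a hand-rolled, file-based tracker that captures the essentials (run ID, parameters, metrics) that tools like MLflow provide properly; the `runs` directory, class name, and logged values are illustrative assumptions, not a real tracking API:

```python
import json
import time
import uuid
from pathlib import Path

# Minimal file-based experiment tracker: one JSON record per run,
# so any past result can be reproduced and compared.
class RunTracker:
    def __init__(self, root="runs"):
        self.run_id = uuid.uuid4().hex[:8]
        self.root = Path(root)
        self.record = {
            "run_id": self.run_id,
            "started": time.time(),
            "params": {},   # hyperparameters used for this run
            "metrics": {},  # evaluation results for this run
        }

    def log_param(self, key, value):
        self.record["params"][key] = value

    def log_metric(self, key, value):
        self.record["metrics"][key] = value

    def save(self):
        # persist the run record so it can be compared with other runs
        self.root.mkdir(exist_ok=True)
        path = self.root / f"{self.run_id}.json"
        path.write_text(json.dumps(self.record, indent=2))
        return path

# Illustrative usage: log what was tried and what it achieved.
run = RunTracker()
run.log_param("learning_rate", 0.01)
run.log_metric("auc", 0.87)
path = run.save()
```

In practice you would reach for an established tracker rather than rolling your own, but the point stands: if every run records its parameters, metrics, and artefacts, results become reproducible and comparable.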

Most organisations have reasonable practices for the first two or three stages. The bottleneck is almost always at deployment and beyond — getting models into production reliably, and keeping them there.
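To make the deployment step less abstract, here is a minimal sketch of exposing a model as an HTTP prediction API using only Python's standard library. The `predict` function, its feature names, and the scoring rule are hypothetical stand-ins for a real trained model loaded from a registry:

```python
import json
from wsgiref.simple_server import make_server

# Hypothetical stand-in for a trained model. In a real deployment this
# would be an estimator loaded from a model registry, not a hand-written rule.
def predict(features):
    # toy linear scoring rule, purely illustrative
    score = 0.3 * features["tenure_months"] - 0.5 * features["support_tickets"]
    return {"churn_risk": "high" if score < 0 else "low", "score": score}

def app(environ, start_response):
    # read the JSON request body and return a JSON prediction
    size = int(environ.get("CONTENT_LENGTH") or 0)
    payload = json.loads(environ["wsgi.input"].read(size))
    body = json.dumps(predict(payload)).encode()
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body]

# To serve predictions locally (blocks the process):
# make_server("", 8000, app).serve_forever()
```

A real service adds input validation, authentication, logging, and versioning, which is exactly why deployment is where engineering discipline becomes unavoidable.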

Key takeaway
  • Map where your organisation spends its time across the lifecycle — the bottleneck is usually at deployment or monitoring, not experimentation.
  • Experiment tracking is foundational — if you can't reproduce a result, you can't build on it reliably.
  • Retraining is not optional. Build it into the plan from the start, not as an afterthought when performance starts to drop.
03

The data science–engineering gap

The most common root cause of MLOps failure is not tooling — it's the organisational gap between the people who build models and the people who deploy and maintain systems. Understanding this gap is the first step to closing it.

Dimension | Data science mindset | Engineering mindset
Primary goal | Maximise model accuracy | Ensure system reliability
Time horizon | Experiment cycle (days/weeks) | Production lifecycle (months/years)
Success metric | AUC, F1, RMSE | Uptime, latency, cost
Code style | Exploratory notebooks | Production-grade, tested, versioned
Failure mode | Model doesn't converge | System goes down at 2am
Relationship to change | Frequent experimentation | Controlled, tested releases

Neither mindset is wrong — they're both essential. The problem is when they operate in isolation. MLOps is what happens when you force these two worlds to work together, with shared tools, shared standards, and shared accountability for production outcomes.

⚠ The most expensive gap

Data scientists build models in environments that don't reflect production — different data, different infrastructure, different dependencies. When the model is handed to engineering for deployment, it breaks. Closing this gap requires shared environments and standards from the start, not just at handoff.

Key takeaway
  • The data science–engineering gap is organisational, not just technical. Address it with shared goals, shared tools, and joint ownership of production outcomes.
  • Data scientists need to understand production constraints. Engineers need to understand model behaviour. Create forums for both.
  • Shared development environments between data science and engineering eliminate the most common source of deployment failures.
04

MLOps maturity — where is your organisation?

MLOps capability exists on a spectrum. Understanding where your organisation sits today is the starting point for knowing what to invest in next.

Level 0
Manual everything
Models built in notebooks, deployed manually, not monitored. Retraining happens when someone notices things have gone wrong. Most organisations start here.
Level 1
Automated training pipelines
Training is automated and reproducible. Experiment tracking is in place. Deployment is still manual or semi-manual. Basic monitoring exists but is not actionable.
Level 2
Automated deployment
CI/CD pipelines for model deployment. Model registry in use. Monitoring covers key performance metrics. Retraining is triggered manually but the process is defined.
Level 3
Full automation & self-healing
Automated retraining triggered by drift detection. A/B testing and canary deployments in place. Full observability across the ML lifecycle. Few organisations reach this level.

Most organisations operating ML in production sit at Level 1 or Level 2. The goal is not to reach Level 3 as fast as possible — it's to move one level at a time, solving real problems at each stage before adding more complexity.
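A concrete Level 2 artefact is an automated promotion gate: a check the deployment pipeline runs before a candidate model is allowed to replace the current one. The sketch below uses illustrative, assumed thresholds and metric names; the acceptance criteria for a real system come from the business and from engineering:

```python
# Assumed acceptance criteria — illustrative values, not recommendations.
ACCEPTANCE = {
    "min_auc": 0.80,           # absolute floor on held-out performance
    "max_regression": 0.01,    # candidate may not be worse than production by more
    "max_p99_latency_ms": 50,  # serving constraint set by engineering
}

def should_promote(candidate, production, criteria=ACCEPTANCE):
    """Return (promote?, per-check report) for a candidate model."""
    checks = {
        "meets_floor": candidate["auc"] >= criteria["min_auc"],
        "no_regression": candidate["auc"] >= production["auc"] - criteria["max_regression"],
        "fast_enough": candidate["p99_latency_ms"] <= criteria["max_p99_latency_ms"],
    }
    return all(checks.values()), checks

# Illustrative usage inside a CI/CD pipeline step.
ok, report = should_promote(
    candidate={"auc": 0.86, "p99_latency_ms": 42},
    production={"auc": 0.85},
)
```

Encoding the criteria in code means promotion decisions are repeatable and auditable, rather than living in one person's judgement.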

Key takeaway
  • Honestly assess your current level before planning investments. Most organisations overestimate their maturity.
  • Move one level at a time. Each level solves specific problems — don't build Level 3 infrastructure before you've solved Level 1 problems.
  • Level 3 is not the goal for every organisation. The right level depends on the volume and criticality of your ML use cases.
05

Monitoring in production — what to watch

A model in production without monitoring is a liability waiting to materialise. Here are the four dimensions you need to monitor — and what happens when you don't.

📉
Model performance
Are predictions still accurate? Track business metrics (conversion, churn, fraud detection rate) alongside model metrics. Business metrics are the ultimate truth.
🌊
Data drift
Is the input data distribution changing? If the real world looks different from the training data, model performance will degrade — often silently.
🖥️
System performance
Latency, throughput, error rates, infrastructure costs. The model needs to perform reliably as a system, not just produce accurate predictions in isolation.
⚖️
Fairness & bias
Are predictions fair across different demographic groups? Bias can emerge or worsen over time as data distributions shift. Especially critical in regulated industries.
⚠ Silent degradation

The most dangerous failure mode in production ML is silent degradation — the model keeps running, keeps returning predictions, but the predictions are increasingly wrong. Without monitoring, you may not discover this until the business impact is severe. Build alerting from day one.
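Data drift can be quantified cheaply. One common signal is the Population Stability Index (PSI), which compares the binned distribution of a feature at training time against what production traffic looks like now. The sketch below uses pure Python; the bin values and the alert thresholds are conventional rules of thumb, not universal constants — tune them per use case:

```python
import math

def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two binned distributions.

    Inputs are lists of bin proportions (each summing to 1); eps guards
    against empty bins. Higher PSI means a bigger distribution shift.
    """
    return sum(
        (a - e) * math.log((a + eps) / (e + eps))
        for e, a in zip(expected, actual)
    )

# Training-time feature distribution vs. current production traffic
# (illustrative values).
train_bins = [0.25, 0.25, 0.25, 0.25]
prod_bins = [0.40, 0.30, 0.20, 0.10]

drift = psi(train_bins, prod_bins)
# Common rule of thumb (an assumption): < 0.10 stable,
# 0.10–0.25 moderate shift, > 0.25 significant drift.
status = "alert" if drift > 0.25 else "watch" if drift > 0.10 else "ok"
```

Running a check like this per feature on a schedule, and wiring the result into alerting, is what turns silent degradation into a visible, actionable event.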

Key takeaway
  • Monitor all four dimensions — model performance, data drift, system performance, and fairness.
  • Business metrics are the most important signal. If the business outcome is degrading, something is wrong — find it.
  • Build alerting and response playbooks before launch, not after the first incident.
06

Real examples from the field

📍 SDG Group — MLOps in insurance

In insurance transformation programs, operationalising ML models for pricing and risk assessment requires a particularly rigorous approach — these are high-stakes, regulated decisions where model failure has direct financial and compliance consequences. The key lesson from this context: explainability is not optional. Regulators and auditors need to understand why a model made a decision. This shapes the entire MLOps architecture — from model selection (preferring interpretable models) to monitoring (tracking not just accuracy but decision distribution) to documentation. Build for explainability from day one, not as a retrofit.

📍 GO plc — Telco analytics at scale

In a telecommunications context, ML models for churn prediction and customer analytics need to operate at scale — hundreds of thousands of customers, daily predictions, integration with CRM and marketing systems. The operational challenge here was not model accuracy (the models worked well) but pipeline reliability and data freshness. A churn model is only useful if it runs on time and on current data. Investing in robust data pipelines and monitoring proved more valuable than further model optimisation — a reminder that the infrastructure around the model matters as much as the model itself.

📍 datalitiks — ML in a resource-constrained startup

At a startup, you can't afford a dedicated MLOps team. The practical lesson: invest in the foundation, not the superstructure. Experiment tracking (MLflow), a simple model registry, and basic production monitoring gave us 80% of the value of a mature MLOps practice at 20% of the complexity. The temptation at startup scale is to skip MLOps entirely — but the cost of a silent model failure in a customer-facing product is higher than the cost of basic monitoring. Start simple. Start monitored.

Up next
Module 06: ESG & AI