Your Engineering Team Just Met Data. Now What?
Engineering and data science teams meet: what can go wrong?
👋 Hi beloved Optimist Engineer, Marcos here!
I come once again with a special guest, and I will let Jose introduce himself.
Jose is a Senior Data Science Manager at Skyscanner, with 10+ years of experience in Machine Learning (or, as it is called today, AI). He now helps tech leaders scale teams, guides execs on how to add value through ML in production, levels you up in data visualisation, and distills complex topics into simple terms to demystify AI.
I want to recommend Jose’s newsletter to all those Data Engineers seeking growth and looking to learn about real-world ML lessons, career growth, and leadership.
Without taking any more of your time, I'll hand the mic 🎤 over to Jose.
If you are reading this, I assume you have been working as an engineer for some time, and, over the years, you and your team have built things that matter: features that work, infrastructure that scales, and deployment pipelines that feel reliable.
The engineering culture is established and solid: agile/scrum routines, code reviews, automation, CI/CD, and a clear sense of what “good” looks like.
I also assume that, it being 2025, you have worked with data extensively as an engineer, albeit through an engineering lens. You track uptime, cluster costs, and traffic anomalies. Metrics, logs, alerts: that is your world.
But there is a different kind of data work entering the picture now. And it can change how everything fits together.
The scenario that might be new to you
Leadership wants to improve the service you own, guided by more than just engineering metrics.
They want to understand:
How the product is used.
Where users bounce.
What features drive retention.
They need:
Automated reporting and dashboards.
Machine learning integrated within the service.
This is no longer an engineering-only effort.
Of course, this requires new profiles in the team. Data profiles.
Beware… Data analysts and Data Scientists arrive…
These new joiners start sending requests your way:
Can we log these new user actions?
Can you clean up this dataset?
Can you help productionise this model?
Your engineering team begins to wonder how their responsibilities are shifting. Questions surface in daily stand-ups and sprint planning.
Who owns the tracking plan?
Is this dashboard part of our delivery, or someone else’s?
Are we becoming a support team for the data function?
How are these requests being prioritised?
Will this slow us down?
These concerns are real. And they are common.
The arrival of data professionals (analysts, scientists, sometimes analytics engineers) often marks the beginning of a new phase in how a company operates. But it also challenges established ways of working. It introduces new stakeholders, new dependencies, and new definitions of success.
This post is not here to argue for or against that shift. Instead, it is a practical, honest guide for engineers encountering this moment for the first time — when data people join the picture, and the rules of collaboration begin to change.
What will you read about in this post?
In the sections that follow, I will cover 4 areas:
What data analysts and data scientists actually do.
What kind of profiles are worth hiring (and which ones to avoid early on).
3 models of collaboration between engineering and data.
Where things often go wrong — and how to prevent them.
If you are an engineer, a tech lead, or someone navigating this transition for the first time, this article is for you.
(Written by your friendly neighbourhood data scientist. 👨🔬 🕷️)
What data people actually do (and what engineers need to know)
If you have never worked closely with a data analyst or a data scientist, it can be difficult to picture what their day-to-day looks like, or why they are suddenly asking for things that land in your backlog.
So, let’s start with what data people actually do.
Data analysts: decoding what is happening
A good analyst helps the business understand how things are going. They spend their time writing SQL, building dashboards, and answering product questions. The work is often ad hoc:
Where are users dropping off during onboarding?
Did this change in the funnel come from a new feature or from seasonality?
What is the 60-day retention rate for newly acquired cohorts?
If the business wants to understand what happened and make a decision based on it, the analyst is the one pulling the numbers and presenting the story.
They usually live in product, marketing, or commercial teams — not deep in engineering. But their ability to do the job depends almost entirely on what data engineering has made available.
If logs are missing, inconsistent, or changed without notice, their analysis breaks.
If events are not documented, it takes days to untangle the meaning of a single metric.
If a dashboard is suddenly empty, they do not know if it is a bug or a feature.
Therefore, clean, complete, well-described data is a must for their work to have a positive impact. Maybe that backlog ticket is starting to make more sense now, right?
Data scientists: moving from insight to optimisation
If analysts explain what happened, data scientists focus on what should happen next.
They build models, run experiments, and design algorithms that help a product adapt or improve itself.
They might predict churn. They might optimise ranking. They might experiment with different recommendation strategies.
Unlike analysts, their work is not always tightly scoped. It is often exploratory. Messy. Iterative.
They work in notebooks and Python scripts.
They are deep into experimentation frameworks.
They ask a lot of questions — and sometimes those questions lead to weeks of dead ends before something clicks.
What they need from engineering is not just logged data, but reproducibility.
If a model works once but fails in prod, it is useless.
If the feature they trained on disappears or changes without warning, all bets are off.
And if you have ever been asked to “productionise a model”, welcome to one of the messiest handoffs in tech. (PS: I am guilty of having provided really messy handoffs… I tell you, they don’t need to be messy).
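One lightweight way to make that handoff less messy is to ship a reproducibility manifest alongside the model: the seed, the feature list, and a fingerprint of the training data, so a retrain (or a prod incident) can be checked against the original run. This is a hedged sketch; the function names and fields are hypothetical conventions, not a standard.

```python
import hashlib
import json
import random
import sys


def dataset_fingerprint(rows):
    """Hash the training rows so a retrain can prove it saw the same inputs."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def training_manifest(rows, seed, feature_names):
    """Bundle the settings needed to reproduce (or audit) a training run."""
    random.seed(seed)  # fix randomness before any sampling or shuffling
    return {
        "seed": seed,
        "python": sys.version.split()[0],
        "features": sorted(feature_names),
        "data_sha256": dataset_fingerprint(rows),
    }


# Toy rows standing in for a real feature table.
rows = [{"user_id": 1, "sessions_7d": 4}, {"user_id": 2, "sessions_7d": 0}]
manifest = training_manifest(rows, seed=42, feature_names=["sessions_7d"])
print(manifest["data_sha256"][:12])
```

If the fingerprint computed at retrain time does not match the manifest, the feature data changed under the model's feet, and that is a conversation to have before anything ships.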
If the descriptions above are still not clear, here goes a Star Wars analogy for you.
If the product is the Millennium Falcon, the analyst is R2-D2 — plugged into the dashboard, monitoring everything in real time.
The data scientist is probably Luke — closing his eyes and trusting the model to hit the target… sometimes successfully.
(Sorry fellow analysts, I have always wanted to picture myself as Skywalker — but hey, R2-D2 is the coolest robot in the whole galaxy.)
What kind of profiles are worth hiring (and which ones to avoid early on)
There are plenty of job descriptions out there to copy and paste, but little guidance on how to build a data team from scratch.
Therefore, instead of listing what to look for, I will give you the traps to avoid at all costs:
A fresh graduate with no context for working with real, messy product data.
A researcher-type who wants perfect data before starting anything.
A “machine learning engineer” with zero interest in actual product questions.
A junior analyst who cannot yet define a metric, let alone defend it.
In the early days, data people need to be autonomous, communicative, and capable of bridging disciplines. That means knowing both the business and the tools. A pretty dashboard is not helpful if no one knows what question it answers.
Collaboration models: How to work together
Once you have brought data analysts or scientists into the picture, the next challenge is figuring out how to actually work together.
Who joins which meetings?
Where do data requests go?
How are priorities aligned — or not?
And what happens when engineering velocity and data ambiguity collide?
There is no single right answer. But there are 3 common models that companies use to structure collaboration between engineering and data. Each has trade-offs, and some are better suited for getting started than others.
Let me go through them.
1. Separate teams, collaborating as partners
In this setup, engineering and data live in different teams. They work on different backlogs, report to different managers, and collaborate through shared rituals — usually weekly syncs or project checkpoints.
👍 Advantages:
This model allows each discipline to retain its functional identity.
Analysts and scientists can support each other, enforce shared standards, and avoid the isolation that comes from being the lone data person in a sea of engineers.
It is easier to scale headcount, maintain tooling consistency, and build internal best practices.
👎 Disadvantages:
The collaboration can become transactional.
Data requests are treated like tickets.
Engineering changes become blockers for analysis.
Context is often lost in handovers.
Unless the teams have strong, intentional alignment rituals, things quickly slide into “service mode,” with long feedback loops and low mutual understanding.
2. Fully embedded data roles
Here, the data analyst or scientist is a full member of the engineering or product squad. They join standups, contribute to sprint planning, and share the same goals and rituals as their engineering peers.
👍 Advantages:
This is the tightest model of collaboration.
Data becomes part of the build process from day one — not something tacked on at the end.
Logging decisions get made with analysis in mind.
Models are scoped realistically.
Product direction benefits from both qualitative insight and quantitative signals.
Engineers and data people build shared language fast.
👎 Disadvantages:
It can be lonely and unsupported for the data person.
When you are outnumbered 6-to-1 in standup, your own work is easy to deprioritise.
Engineering managers, even the best ones, often struggle to coach or grow data careers.
Over time, technical debt and misalignment build up if the data person is not senior enough to hold their own.
3. Virtual teams (V-teams)
This model creates a temporary, focused team made up of 2-3 selected engineers, 1-2 data people, and a product lead, all pulled together to work on a specific problem space.
👍 Advantages:
You get the best of both worlds.
There is tight day-to-day collaboration without needing a full-blown reorg.
It is a safe way to trial cross-functional work and build empathy between disciplines.
Each team member brings their domain knowledge, and you get to test how data fits into product delivery in real life — not just on slides.
👎 Disadvantages:
It adds some coordination overhead.
Multiple managers need to stay aligned.
Roles and responsibilities can feel fuzzy.
You might have dotted-line reporting, dual backlogs, and the occasional “who is driving this again?” moment.
Assuming this is the first time your engineering team will be working closely with data people, my recommendation is to start with a V-team.
✅ V-teams are the most flexible model, the easiest to spin up, and the best way to test collaboration in the wild without committing to an org-wide change.
⚠️ But — and this is important — V-teams are not a long-term solution.
They are a bridge, not a home. They work best when time-boxed and mission-driven. If they drag on, you start to see accountability blur, managers lose track of priorities, and team members feel stretched between two worlds.
Use the V-team to learn. To find out what rituals actually work. To see where the friction is. And once you know? Disband it.
Let us talk about how to manage a V-team well, because mixing deterministic sprints with exploratory data work is not as easy as it sounds.
Making V-teams work in practice
Most engineering teams work in sprints: story points, velocity, and deterministic planning.
Data work often follows Kanban-style flows: high uncertainty, changing priorities, and an uncomfortable relationship with estimation.
If you put both of these into the same V-team and run Scrum like nothing changed… prepare for chaos.
Instead, treat the V-team as an experiment in collaboration design:
How much work should be committed up front?
How are goals set when outcomes are uncertain?
How are blockers surfaced when there is no clear definition of “done”?
Some teams end up adopting hybrid rituals — sprints with more flexible commitment for data. Others maintain dual tracking boards. Some simply drop points altogether and focus on outcomes.
Unfortunately, there is no universal fix. But if you are open to adjusting ways-of-working and you test in a controlled V-team environment, you will be closer to an effective engineering ⇔ data collaboration model.
Where things often go wrong (and how to prevent them)
Even with the best intentions, collaboration between engineers and data people often runs into familiar traps. Below, I want to share with you 5 friction points I have seen emerge in my day-to-day leading projects with engineers and data scientists.
1. “Done” means different things to different people
One of the first issues tends to show up during delivery.
For an engineer, “done” often means the code is merged, tested, and live.
For an analyst, “done” might mean that stakeholders have signed off on the dashboard and that the metrics actually make sense.
For a data scientist, “done” can be even fuzzier — maybe a model has passed offline evaluation, but still needs validation in production.
This mismatch causes frustration on both sides. Engineers feel like the finish line keeps moving. Data folks feel like things are getting shipped before they are ready, or without enough rigour behind the numbers.
How to prevent it?
Teams need a shared definition of done. Not one-size-fits-all, but something tailored to the type of work — including uncertainty and iteration for data-heavy tasks.
For example, when writing an epic that involves both engineers and data scientists, you might break it into two coordinated work streams. The engineering “done” could be the backend support for a new feature flag, properly logged and deployed. The data “done” might come two weeks later — once enough data has been collected to run an experiment and share results with the product team.
What matters is setting expectations clearly. The story does not end at deployment. Nor should a model experiment block frontend shipping. The goal is mutual clarity: what is being delivered, by whom, and when it can be called “useful.”
2. Nobody really owns the data
Engineers create the systems that generate data, but they often do not think of themselves as responsible for what happens to it afterwards. Analysts, on the other hand, assume the data coming in is trustworthy. Then product teams start asking questions about retention, funnels, or attribution — and suddenly everyone is pointing fingers.
This lack of ownership shows up quickly. Logs are inconsistent. Metrics are defined differently across teams. An event fires three times on iOS but once on Android. Everyone feels the pain, but no one owns the fix.
How to prevent it?
Treat data as a first-class product surface. Data tracking plans need to be reviewed just like API changes. Engineers should own their logs — naming, frequency, structure — the same way they own code. Create data contracts and model deployment specs. Where possible, assign clear metric owners. If something breaks, someone should know it is their job to care.
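To make “data contract” concrete, here is a minimal sketch in Python. The event name and fields are hypothetical (not a real tracking plan), and a real setup would more likely use JSON Schema or a schema registry; the point is only that the contract is explicit and checkable.

```python
# Hypothetical contract for one tracked event: field name -> expected type.
ONBOARDING_COMPLETED_V1 = {
    "event": str,
    "user_id": int,
    "platform": str,
    "duration_ms": int,
}


def validate_event(event, contract):
    """Return a list of violations; an empty list means the event conforms."""
    errors = []
    for field, expected_type in contract.items():
        if field not in event:
            errors.append(f"missing field: {field}")
        elif not isinstance(event[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}, "
                          f"got {type(event[field]).__name__}")
    for field in event:
        if field not in contract:
            errors.append(f"undocumented field: {field}")
    return errors


good = {"event": "onboarding_completed", "user_id": 7,
        "platform": "ios", "duration_ms": 5400}
bad = {"event": "onboarding_completed", "user_id": "7", "platform": "ios"}
print(validate_event(good, ONBOARDING_COMPLETED_V1))  # []
```

Run a check like this in CI or at ingestion time, and the “event fires three times on iOS” class of bug becomes someone's red build instead of someone's broken dashboard.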
3. Work becomes invisible across tool boundaries
Engineers tend to live in GitHub, Jira, and IDEs. Data people tend to live in notebooks, dashboards, and analytics tools. Unless someone bridges the gap, you can go an entire sprint without really knowing what the other side is doing.
From an engineering side, data work can feel like a black box: unclear inputs, delayed outputs, no visibility into progress. From the data side, it often feels like decisions are made without context, or that insights are being ignored because they were never surfaced in the right forum.
How to prevent it?
Fixing this does not require a full tooling overhaul, but it does require some overlap. Like it or not, data people need to upskill in the use of GitHub and code reviews. But engineers also need to understand that data is messy, and notebooks are required as an initial playground. Make sure to include data work in sprint demos — not just code. Celebrate insights from data, not just feature releases.
4. Engineers feel like a support team
Early in the relationship, the data team will likely need help. They need logs added, schemas changed, feature flags exposed, and events instrumented. But all of this lives in engineering land, and over time, the volume of requests adds up.
When this happens without planning, engineers start to feel like a support team — not collaborators. Even well-intentioned requests start to feel like interruptions.
How to prevent it?
The fix is to plan better, not less. Data work should be scoped and prioritised with engineering — not layered on top.
For example, if a team is working on a new onboarding flow, the ticket should already include which events need to be logged, which fields need to be structured, and whether an A/B test is planned.
It is also worth introducing “data readiness” as a shared goal — not just whether the feature works, but whether it can be measured. This creates a shared incentive: engineers are not building for someone else’s roadmap; they are building for a joint outcome.
5. Credit gets lost in translation
This one is less obvious, but no less important.
When an initiative succeeds, credit often flows to the team that shipped the feature, not the team that found the opportunity or validated the impact.
You see this most clearly in product reviews. The deck says, “We launched X and saw a 12% increase in conversions.” It does not say, “The analyst spotted the trend. The scientist built the uplift model. The engineer shipped it.” One team gets the praise. The others fade into the background.
It might seem harmless, but over time it chips away at motivation. Analysts stop raising questions. Data scientists stop pushing ideas. Collaboration becomes quieter, and weaker.
How to prevent it?
Leaders can fix this by modelling full-story storytelling. Recognise the entire chain of work: discovery, design, delivery. Use demos, retros, and updates to call out behind-the-scenes impact. Beyond that, teams should review their competency frameworks and ensure that non-production work, such as data insights, is also recognised.
Final thoughts: start small, build better together
If this is the first time your engineering team is working with data people, welcome.
The questions you are asking — “who owns this?”, “why is this in our sprint?”, “are we a support team now?” — are common. So are the friction points. Collaboration between engineering and data is rarely clean on day one. But it is worth it.
If you take anything from this long post, do the following:
Start with one V-team.
Hire a couple of senior data roles.
Put real effort into your ways of working.
Define what “done” means for everyone.
Make your logs a product, not a side-effect.
Celebrate the insight, not just the commit.
Remember: data is not here to slow you down.
It is here to help you aim better and build things that actually work.
🫂 Thanks Jose!
Marcos back!
I want to send a deep Thank You to Jose for sharing his experience with all of us. To learn more from Jose, subscribe NOW to his newsletter 👇
We are more than ✨1222 Optimist Engineers✨!! 🚀
Thanks for your support and feedback, really appreciate it!
You’re the best! 🖖🏼
𝘐𝘧 𝘺𝘰𝘶 𝘦𝘯𝘫𝘰𝘺𝘦𝘥 𝘵𝘩𝘪𝘴 𝘱𝘰𝘴𝘵, 𝘵𝘩𝘦𝘯 𝘤𝘭𝘪𝘤𝘬 𝘵𝘩𝘦 💜. 𝘐𝘵 𝘩𝘦𝘭𝘱𝘴!
𝘐𝘧 𝘺𝘰𝘶 𝘬𝘯𝘰𝘸 𝘴𝘰𝘮𝘦𝘰𝘯𝘦 𝘦𝘭𝘴𝘦 𝘸𝘪𝘭𝘭 𝘣𝘦𝘯𝘦𝘧𝘪𝘵 𝘧𝘳𝘰𝘮 𝘵𝘩𝘪𝘴, ♻️ 𝘴𝘩𝘢𝘳𝘦 𝘵𝘩𝘪𝘴 𝘱𝘰𝘴𝘵.