How Roche unified global data, enabled AI at scale, and improved operational efficiency

Roche is one of the largest biotechnology and pharmaceutical companies in the world. From pioneering cancer treatments to advancing personalized medicine, Roche’s vision is clear: improve health outcomes and build healthier futures for people worldwide.

It’s a complex undertaking that’s powered by vast amounts of clinical and commercial data. But over the years, Roche’s data had sprawled into a disconnected ecosystem. To scale, Roche needed to unify its data on a global level.

Roche decided to standardize and modernize its commercial analytics platform with dbt Labs. The five-year initiative ultimately covered more than 80 countries, streamlined work for more than 1,000 users, and cut costs by approximately 70%. Here’s how Roche did it.

Slow decision-making from a fragmented data ecosystem

One of Roche’s primary data challenges is understanding its customers: the healthcare practitioners (HCPs) who prescribe its medications.

Doing so requires synthesizing data from internal systems (such as CRM, marketing platforms, and event logs) with external data (such as scientific publications, clinical trials, and market share figures). All of this data was siloed across more than 80 countries.

Adding to the complexity, every country business unit (known as an affiliate) managed its own data pipelines, vendors, and business logic. Maintaining infrastructure, training, and contracts across Informatica, Hadoop, Talend, Oracle, and Microsoft was inefficient, unaligned, and expensive.

“Each country had its own version of truth,” reflects João Antunes, Lead Engineer at Roche. “We were all trying to answer the same questions but kept reinventing the wheel with different technologies.”

[Figure: Roche map]

For example, while one team might use Oracle databases, another would use Informatica pipelines. Some teams relied on drag-and-drop tools, while others built strictly with Hadoop.

This patchwork of tools led to duplicated effort and inconsistent business insights. Because every affiliate bought and ingested data separately, with different definitions and pipelines, even basic questions became nearly impossible to answer.

Implementing global data standards and a modern stack

Solving this problem required a global data transformation strategy. To implement one, Roche focused on three pillars: people, process, and technology.

For its people, Roche introduced a matrix structure. Teams were organized by both engineering capabilities (like analytics and machine learning) and product workstreams (either global or affiliate-specific). Capability teams defined standards and best practices, while product workstreams focused on delivering business value. This structure allowed Roche to deliver insights with both speed and quality.

[Figure: Roche team topology]

To establish consistent processes, Roche made DevOps a cornerstone. By managing everything as code, Roche could enforce global standards and automate deployments. As a result, Roche built robust CI/CD pipelines that deliver predictable releases. Today, every team operates in a two-week sprint cycle.

“Now we have complete visibility into what’s happening across the entire organization at any given moment,” says Antunes. “If someone starts building something that already exists in another region, we can catch it early and avoid duplicating work.”

Finally, Roche implemented a modern, scalable stack built natively on AWS. Key components include:

  • Ingestion: Amazon AppFlow, AWS Lambda, AWS Glue, AWS Transfer Family
  • Staging: Amazon S3, AWS Lake Formation, AWS Glue Data Catalog
  • Transformation: Amazon Redshift and dbt
  • Consumption: Kubeflow for AI/ML; ThoughtSpot and Tableau for BI

“dbt is flexible enough to support both technical and non-technical users,” says Antunes. “We can quickly and easily onboard new affiliates, even if their teams aren’t as technical. dbt is a key part of our ability to scale.”

Enabling AI use cases

The result of Roche’s investment has been profound. Today, Roche’s data platform supports operations across 80+ countries. It serves more than 1,000 users and refreshes over 3,000 datasets daily. By decommissioning four platforms, Roche achieved approximately 70% cost savings while establishing a foundation for innovation.

[Figure: Roche outcomes]

What’s more, Roche can now connect external and internal data at scale. For example, it can combine CRM activity with clinical trial participation, publication history, and social media engagement for its HCPs. As a result, sales teams can more easily identify which physicians are emerging thought leaders and tailor outreach accordingly.

Now that its data is standardized and centralized, Roche is exploring new AI use cases. To cite just one example, sales reps now receive AI-powered recommendations, right in the CRM system, for content to share with a given physician based on prior interactions.

Another standout use case: using Redshift user-defined functions (UDFs) powered by Amazon Bedrock to classify product complaints and adverse event reports. These are critical tasks for regulatory compliance, and Roche can now meet its requirements faster and more effectively. Because SQL is dbt’s native language, Roche embeds the UDF directly into its incremental models and runs it at scale every day.
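The key to running an LLM-backed UDF daily at scale is the incremental pattern: each run classifies only rows that arrived since the last run, instead of reprocessing everything. In dbt this is typically expressed with the `is_incremental()` macro and a filter against `{{ this }}`. The Python sketch below mimics that logic outside the warehouse. It is not Roche’s implementation: `Report`, `classify_stub`, `incremental_run`, and the `received_at` watermark column are hypothetical, and the keyword rule only stands in for the Bedrock-powered UDF so the example runs.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Report:
    # Hypothetical source row: a free-text complaint or safety report.
    report_id: str
    text: str
    received_at: datetime


@dataclass(frozen=True)
class ClassifiedReport:
    # Hypothetical target row written by the incremental model.
    report_id: str
    received_at: datetime
    label: str


def classify_stub(text: str) -> str:
    """Stand-in for the Bedrock-backed Redshift UDF (assumption).

    A real deployment would call an LLM; this keyword rule only keeps
    the sketch self-contained and deterministic.
    """
    lowered = text.lower()
    if any(w in lowered for w in ("rash", "nausea", "dizziness", "hospitalized")):
        return "adverse_event"
    if any(w in lowered for w in ("broken", "leak", "damaged", "defect")):
        return "product_complaint"
    return "other"


def incremental_run(
    source: Iterable[Report],
    target: list[ClassifiedReport],
    classify: Callable[[str], str],
) -> list[ClassifiedReport]:
    """Mimic a dbt incremental model run.

    Only source rows newer than the latest row already in the target
    (the watermark) are classified and appended; existing rows are kept.
    """
    watermark: Optional[datetime] = max(
        (row.received_at for row in target), default=None
    )
    new_rows = [
        r for r in source if watermark is None or r.received_at > watermark
    ]
    appended = [
        ClassifiedReport(r.report_id, r.received_at, classify(r.text))
        for r in new_rows
    ]
    return list(target) + appended
```

On the first run the target is empty, so every row is classified; on later runs only new rows reach the (expensive) classifier. A strict `>` watermark is simple but will skip late-arriving rows whose timestamp is not newer than the watermark, which is why incremental models often add a lookback window or a `unique_key` for merging.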

[Figure: Roche AI-powered applications]

“AI augments what we already do, and it unlocks value that we hadn’t even imagined,” says Antunes. “But none of that would be possible without the foundation of a modern data stack.”

Extending data architecture across the organization

The work, of course, is just beginning. Next, Roche plans to expand its data architecture upstream into early-stage research and product development. It’s a major step toward breaking down silos across the pharmaceutical value chain.

As Roche scales its data platform even further, it plans to expand dbt’s role:

  • Use dbt Labs’ Connections API to consolidate projects. This will reduce the number of data projects Roche manages across affiliates.
  • Move more workloads to dbt. By moving global core projects into dbt, Roche aims to enable a data mesh for the organization.
  • Leverage metadata in dbt. Every day, Roche runs thousands of models. Metadata will help Roche better understand which of these data assets to prioritize and monitor.
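One common way to mine dbt metadata is the `run_results.json` artifact that dbt writes to the `target/` directory after each invocation; each entry in its `results` list records a node’s `unique_id`, `status`, and `execution_time`. The sketch below ranks models by runtime and flags non-successful ones as candidates for attention. It is an illustrative assumption, not Roche’s approach: `summarize_run_results`, `load_run_results`, and the "slowest plus non-success" heuristic are hypothetical.

```python
import json
from pathlib import Path


def load_run_results(path: str = "target/run_results.json") -> dict:
    """Read the run_results.json artifact produced by a dbt invocation."""
    return json.loads(Path(path).read_text())


def summarize_run_results(run_results: dict, top_n: int = 5) -> dict:
    """Hypothetical prioritization heuristic over dbt run metadata.

    Returns the top_n slowest models and every model whose status
    is not "success" (for example "error" or "skipped").
    """
    # Keep only model nodes; tests, seeds, and snapshots use other prefixes.
    models = [
        r for r in run_results.get("results", [])
        if r.get("unique_id", "").startswith("model.")
    ]
    slowest = sorted(
        models, key=lambda r: r.get("execution_time", 0.0), reverse=True
    )[:top_n]
    needs_attention = [
        r["unique_id"] for r in models if r.get("status") != "success"
    ]
    return {
        "slowest": [(r["unique_id"], r.get("execution_time", 0.0)) for r in slowest],
        "needs_attention": needs_attention,
    }
```

Run after a scheduled job, for example `summarize_run_results(load_run_results())`; aggregating these summaries over many days would show which assets are consistently expensive or fragile.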

In the coming years, AI’s impact on health will be transformative. With dbt Labs as a key partner, Roche has built the data foundation to innovate quickly and power the next era of medical breakthroughs.

What does your data strategy look like this year? Whether it’s building a strong foundation for AI, breaking down silos, or empowering teams to make faster decisions, we’re excited to help. Reach out to book a demo, or sign up now to connect your data warehouse and start building.

Published on: Jul 10, 2025