Data volume is exploding. It’s estimated that a single day in 2025 produces roughly one-fourth as much data as existed in all of 2010, and that number is growing exponentially.
Speed is crucial. When the business asks a question at 9:00 a.m., the answer can’t wait until next week.
However, without guardrails in place, empowering analysts to independently model data or self-serve at scale can lead to governance risks: drift in definitions, broken lineage, and compliance gaps. The fix isn’t more tickets; it’s a shared workflow.
Few businesses have the foundations in place to scale at the pace of data proliferation, or the integrated tools to streamline the flow from event telemetry through database maintenance to queries and data visualization. But there is a solution: governed collaboration between data engineers and data analysts that keeps pace with the business.
In this guide, we’ll explore real examples, lessons, and frameworks from dbt Labs data analysts Rachael Gilbert, Paige Berry, Chris Fiore, and Logan Cochran, as well as dbt Labs senior software engineer Zach Brown, and present our guidance on building a collaborative environment that lets data teams keep pace with evolving business needs. The result: fewer tickets, consistent metrics, and analysis that drives decisions.
Why analysts and engineers need each other
As mediators, analysts translate stakeholder concerns into data queries and spot emerging trends in business data. But given the vast amount of often unstructured data to be queried, and the speed at which stakeholders require insights, data analysts cannot work in a vacuum. From establishing critical event telemetry to structuring data infrastructure for analyst self-service, data engineers and central data teams build the channels that guide the flow of information. Their roles ensure that analysts have access to accurate, complete data to answer questions throughout the business.
On the other hand, data analysts give structure to the role of data engineers, setting parameters for business needs that inform how engineers build infrastructures. Paige Berry, lead data analyst, experienced this when she provided end-user insight into the data pipeline for a new event-tracking service, helping to structure the goals of the data engineering team.
Senior software engineer Zach Brown explains, “The data team, the analysts and analytics engineers, are stakeholder number one. They’re the most important people to serve. The data team should, in my opinion, be the ones driving the decisions, saying, ‘This is the data we need collected so we can provide meaningful analysis after the fact.’”
Better feedback loops = better data
In our 2025 State of Analytics Engineering Report, 56% of respondents cited poor data quality as a significant challenge. Incomplete, inconsistent, or outdated data introduces errors throughout the analytics workflow, from dashboards and reports to AI models and operational systems. Poor data quality undermines decisions and erodes trust in data teams—a situation that can be remedied with effective cooperation between analysts and data engineers.
“While it's important for a data analyst to be technical and to understand how to do data modeling and some of the upstream pipeline work, ideally, that is something that's left for someone less interested in the business context and more fascinated by the technical problem of moving data through an organization. Having that division of labor gives the data engineers and analytics engineers the things they're excited to work on and full bandwidth to get to an optimal solution and infrastructure, where the data analysts can focus on the people aspect and understanding what the deeper questions are, and have time to explore that with the stakeholder.” - Chris Fiore, senior data analyst at dbt Labs
Codifying feedback loops into the everyday workflow improves data quality and gives analysts and engineers a solid, shared foundation. With analysts defining KPIs, and shared semantic layers and data catalogs keeping definitions consistent, engineers can focus on scalability, tracking, and compliance. Together, they ensure data is accurate and up to date, enabling scalable growth and effective data-driven decision-making.
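One concrete way to share those definitions is to keep them in version control where both roles can review them. Below is a minimal sketch using dbt Semantic Layer (MetricFlow) YAML; the model, measure, and metric names are hypothetical:

```yaml
semantic_models:
  - name: orders
    model: ref('fct_orders')          # hypothetical fact model
    defaults:
      agg_time_dimension: ordered_at
    entities:
      - name: order_id
        type: primary
    dimensions:
      - name: ordered_at
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: order_total
        description: Gross order value in USD
        agg: sum

metrics:
  - name: revenue
    label: Revenue
    description: One definition of revenue for every downstream tool
    type: simple
    type_params:
      measure: order_total
```

Because the metric lives in the project rather than in a dashboard, an analyst can propose a change to it in a pull request and an engineer can review it like any other code.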
Saving time with self-service
Surface-level insights no longer provide a competitive advantage; deep dives into data now support increasingly complex business needs, and analysts’ technical responsibilities are only likely to grow. With tools like dbt, analysts can create and maintain data pipelines, collaborating with data engineers on analytics code, data tests, documentation, and metrics.
Rachael Gilbert, staff data analyst, explains, “Having a tool like dbt makes it a lot easier, where I can self-serve on tracing. I know this thing is failing, what feeds into it? [It helps with] figuring out why something might be failing and at least putting together a picture before I go ask someone else for help."
Paige agrees, explaining that dbt offers her "the ability to self-serve, to explore on my own, to understand and to be able to look at the column-level lineage, and figure out where this piece of data that is giving me trouble is coming from."
With analysts able to dig into the details of data, engineers are increasingly focused on enabling scale, rather than simply responding to tickets. Chris explains, "Zach's biggest focus right now is less around how we are explicitly enabling data analysts to model data and more on how we set up the proper infrastructure."
Shared ownership and clear workflows allow this relationship to flourish, provided communication channels exist.
Roadblocks to collaboration and how to break them down
Clear, consistent feedback loops and self-serve data exploration—or the lack thereof—are often the deciding factor in collaboration between data analysts and data engineers.
“The ways I interacted as an analyst with the folks who were analytics engineers doing data engineering work would be what I call a data detective,” explains Paige, discussing her career before dbt Labs.
But conveying this information isn’t always simple. She continues, “It was always pretty ad hoc: a lot of conversations in Slack, sometimes getting on a call to show someone live what I'm seeing, taking copious notes and screenshots and putting them into Notion. We use Loom [at dbt Labs], but [I might have sent] a Zoom recording to show what I'm seeing if I couldn't talk to someone live but it would have taken an hour to type out.”
Communications are even more complicated when teams don’t “speak the same language,” which can be particularly difficult in organizations with varying degrees of technical aptitude. Documentation, lineage, and shared definitions turn detective work into a repeatable loop that others can follow and reproduce.
Avoiding bottlenecks without losing governance
When workloads bottleneck, skipping collaboration might feel faster, but it quickly erodes trust and governance. Paige finds that AI tools bridge that gap. “If I need data from a source in a staging model, at least where I can query it in Hex, sometimes I'll try to do that myself if our analytics engineers don't have capacity. When I do that kind of work, I definitely use AI to generate documentation. It saves an incredible amount of time. I always have to go back and double-check and clean up a lot of it, but it's wonderful to have that first pass already done for me.”
AI tools also allow Rachael to access models that she might not otherwise leverage. While her everyday use of SQL means she’s very comfortable with it, she sometimes needs Python for a particular query. She explains, “I'm rusty now, because I'm not using it every day like I have at other jobs. [AI] does exactly what I need it to do in that regard. I don't have Python syntax memorized at this point, so it helps me get there a lot faster.”
Self-service may relieve some of the burden from data engineers, but the importance of collaboration to maintain governance is still a crucial point. Zach explains, “There are a lot of different sources from different vendors. Where does that data come from and where does it end up?
“One of the things that [data analysts] struggle with the most from a technical perspective is having the appropriate tools to be able to get the data they need into the right place, which, on paper, feels like a very simple thing. But our data team at dbt has such a wide-reaching breadth of data they interact with.”
Without clear guardrails in place and easily catalogued data sources, this can become an extremely complicated—and error-prone—endeavor.
Roadblocks teams may face:
- Conflicting dashboards
- Data from unvetted sources outside of governed environments
- Difficult-to-trace errors
- Non-compliance risks with legal and reputational impacts
- Unknown ownership or update history
Governed collaboration principles to break them down:
- Clear documentation
- Lineage tracking
- Metadata management
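In dbt, these three principles can live in a single YAML file alongside the models. A minimal sketch, with hypothetical model and owner names:

```yaml
version: 2

models:
  - name: stg_payments
    description: One row per payment, deduplicated from the raw vendor feed.
    meta:
      owner: data-engineering        # surfaced in the catalog as a trust signal
    columns:
      - name: payment_id
        description: Primary key from the payment processor.
        tests:
          - unique
          - not_null
      - name: amount_usd
        description: Payment amount converted to USD at capture time.
        tests:
          - not_null
```

Because descriptions, ownership, and tests are declared next to the model itself, catalog and lineage tools can surface them automatically instead of relying on tribal knowledge.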
Scaling your team with clearly defined roles
Rapid growth in data needs often outpaces role clarity. As companies scale their use of AI tools and cloud platforms, data accessibility is expanding, resulting in duplicated efforts and gaps in coverage. Data analyst Logan Cochran cites dbt Labs’s data team growth as an example, stating, “Historically, our team has been small, [but] we've [essentially] doubled in size in the last year. And I think we plan to keep growing the team, specifically in the realm of data engineering.”
Growth at this pace brings growing pains, including job titles that aren’t standardized across organizations and inconsistent role documentation. As new hires bring assumptions about what it means to be a data analyst or data engineer based on their previous employers, they may not know which parts of the process are their responsibility in this unfamiliar environment. While some overlap can boost collaboration and ensure accuracy, it can also lead to wasted time—and, on the other hand, unclear boundaries can result in dropped tasks, ill-defined KPIs, and misaligned expectations.
Trying to manage multiple workflows can also be taxing. A 2022 Harvard Business Review study found that context switching across tools can cost employees just under four hours a week, simply reorienting between applications. It also introduces more opportunities for errors—and that assumes these analysts are already skilled at navigating data engineering tasks.
Chris expands on this, saying, “Oftentimes, data analysts are working across the stack, where they're at that downstream conversation, then they're doing the data viz, and sometimes they're having to go all the way upstream and do all the data validations and checking. And it can be really hard because you're wearing several different hats. You're putting on your people hat, you're putting on your data viz and designer hat, you're then having to put on your data engineer hat.”
His suggestion: specialization and collaboration. “Analysts, in my opinion, really should be focused on thought partnership with their stakeholders. I should be dialed in on how my product managers are thinking, how the engineering team is thinking, and how that ties into company strategy—really understanding the business strategy and context.”
He continues, “[The key is] making sure analysts can find the data they need quickly and effectively, using tools like dbt Catalog or Insights for some of the quicker asks, but having more bandwidth to deepen the partnership and thought leadership with their end stakeholder.”
Zach agrees. In his view, data engineers can enable analysts to lean into their own specializations, making collaboration between the teams more about building repeatable structures than resolving one-off questions. He explains, “There needs to be that technical foundation to make it easy to collect and send all this data. There's one part, which is the process of empowering the data team to drive that across all these different projects. And then there's another part, which is that standard, and that story of what that data should look like holistically across all of our environments.”
He takes a long-term view of evolution within the data team, allowing for both specialization and unity. "The analysts are the ones who know what they need, they just don't know how to make it happen," he explains.
Roadblocks teams may face:
- Dropped tasks
- Ill-defined KPIs
- Misaligned expectations
- Duplicated efforts
- Time lost to context-switching
- Devalued data due to long turnaround times
- Lack of bandwidth for business strategy and stakeholder concerns
Governed collaboration principles to break them down:
- Specialization of roles
- Open communication between teams
- Proactive, collaborative development of data architecture
- Well-constructed infrastructures for self-service
- Standardized, repeatable structures
Governed collaboration: A structured partnership between data analysts and data engineers who work within a shared framework to co-own the data lifecycle and maintain governance principles. Governance should be enforced by the system (CI, tests, contracts, RBAC) so analysts can move fast without constant engineer oversight.
Done well, governed collaboration:
- Reduces bottlenecks
- Builds trust within the team and with stakeholders
- Improves iteration speed
- Enables consistency while scaling
Collaboration in practice: 3 lessons from dbt Labs analysts
Lesson #1: Governed self-service saves time and supports scalability
Fully autonomous self-service conducted in a silo can introduce quality and security risks, but locking down the data pipeline is unsustainable and prevents analysts from doing their best work at the speed the business needs. This is where governed collaboration comes into play: analysts ship changes within clear guardrails, and engineers steward the platform and data contracts.
Platforms like dbt enable data analysts to self-serve for everyday operations and troubleshooting, working with trusted data without relying on data engineers. With dbt, analysts propose model changes, tests, and docs in Git; CI runs checks; owners review; and lineage and freshness make the impact of every change visible. This reduces day-to-day dependence on data engineers and frees them up to collaborate on bigger-picture items—such as telemetry frameworks and infrastructure—enabling scalability and increased productivity. As a governed environment, the dbt platform provides version control and freshness checks in the project, lineage tracking and metadata visibility in dbt Catalog, role-based access controls to maintain governance standards, and a semantic layer that centralizes metric definitions and logic to ensure data quality.
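Freshness checks are a good example of a guardrail that is declared once and then self-served against. A minimal sketch of a dbt source with freshness thresholds; the source and table names are hypothetical:

```yaml
version: 2

sources:
  - name: product_events
    database: raw
    schema: events
    loaded_at_field: _loaded_at        # timestamp written by the loader
    freshness:
      warn_after: {count: 6, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: page_views
```

Running `dbt source freshness` compares `_loaded_at` against these thresholds, so an analyst can confirm the data is current without pinging an engineer.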
Lesson #2: Building one shared, governed system breaks down silos
Empowering data analysts to contribute to documentation, testing, and trust signals helps grow the company’s knowledge base and dramatically streamlines workflows. Logan finds that dbt enables him to work more efficiently and to share his efforts in a governed space. "I use it as frequently as I can, because I love being able to solidify some of the things that otherwise would be saved in a notebook somewhere that I'm rerunning over and over again, and that people don't have easy access to,” he explains.
A governed “home” to build models that includes discovery, drag-and-drop functionality, validation, and shipping reduces thrash and tightens loops. dbt enables this with:
- dbt Catalog provides discoverability (definitions, ownership, and trust signals), so analysts know what exists and how to use it.
- CI and version control test every change safely (temporary schemas, dependency checks) before anything touches production.
- dbt Canvas provides a shared, permissioned workspace to compose analyses (text, SQL, charts) alongside models, supporting discovery, drag-and-drop building blocks, validation, and a clear path to ship outcomes the team can reuse.
Chris, too, has found new ways of contributing to his team’s success. "I'm starting to use more dbt Copilot to automate and speed up how I'm doing documentation and setting up some data model metadata."
He continues, “Typically, as you're answering business questions or completing analytics projects, you're either needing to spin up new data models and take it from staging all the way to marts, you’re trying to answer a business question that you realize might be more routine, or you're leveraging existing models that might not have some of the dimensions that you want to add in, like plan tier or customer segment or company name.” By capturing that work in dbt Canvas or the dbt project rather than creating a one-off in a notebook, the rest of the team can contribute and reuse the work for their own projects.
“dbt Catalog absolutely makes it faster to understand what data is available, what pre-existing work has already been done, what kind of logic is captured within a given model,” he says. Catalog’s health signals support this cooperative environment, flagging data freshness and quality so analysts know they’re working with information that is current and accurate, while dbt Canvas keeps the context, queries, and decisions visible to everyone under the same governance and RBAC as the rest of dbt.
Lesson #3: One shared, governed tool speeds up ad-hoc analysis
In addition to facilitating sharing, dbt tools can help analysts better understand their data, even before working with it. Logan explains, "dbt Insights makes it really easy to quickly dig in and see what the data looks like before we move into doing the actual data modeling in dbt."
Through the dbt Insights interface, analysts can explore, query, and visualize data within a single governed workspace—bridging the gap between technical and business users—while allowing technical users to validate shape, quality, and semantics before any deeper modeling. All actions are version-controlled and in line with the organization’s data standards, so data teams balance self-service analytics with strong governance, maintaining compliance and consistency without loss of speed.
Because speed is such a key factor, dbt Insights minimizes context switching: users can query models directly from dbt Catalog and access metadata and documentation without switching tabs or tools. It also supports AI-assisted, natural-language queries through dbt Copilot and pulls in dbt Semantic Layer metrics, reducing time wasted on fine-tuning.
This deep integration, and the ease with which users can preview data in a single governed environment, has been a game-changer for dbt Labs data analysts. Rachael enthuses, “One reason I am so excited for dbt Insights—and I will rave about dbt Insights all day—is because we've always talked about how dbt Catalog (formerly Explorer) was the place to go to understand your data. Catalog is great for documentation to get additional context or notes about a field or a model. But the other half is validating if the data you're working with is accurate, what you think it is, or what you'd expect to see querying it: what values are populating in these models and fields. So now we have dbt Catalog for the documentation of models and fields and dbt Insights for actually looking in those models and fields. That is the whole of what I need to validate and feel confident in my data.”
Paige appreciates how dbt Insights empowers her to conduct her own troubleshooting, explaining, “If someone is saying, ‘I'm not seeing the data I expect to see,’ I can quickly check. Is the job failing? Maybe we don't have fresh data. Is that why, or are the sources stale in the whole transformation? Are pipelines running? Maybe there's something wrong with our connection getting data from the third-party source. That's something I can check on as part of the validation process.”
Crucially, this “one shared system for self-serve analysis” doesn’t stop at the UI. The dbt MCP server exposes governed project context and safe tools—like compile/run, tests, metadata, discovery, and Semantic Layer access—over the Model Context Protocol. That means IDEs, notebooks, and agentic assistants can interact with the same trusted definitions and permissions you see in dbt Insights and dbt Catalog.
The guardrails of governed collaboration enable analyst autonomy rather than restricting it, battling tool sprawl and providing the framework, a common language (models, tests, metrics), and a shared protocol (dbt MCP) to turn ad-hoc answers into reusable, auditable assets that move the business faster.
5 ways to make governed collaboration work for you
Having established the benefits of governed collaboration—improved speed, scalability, autonomy, and trust—here are five ways your company can put it to use.
Create shared development environments
Separate platforms, databases, and one-off development environments are roadblocks to collaboration. Beyond the time wasted on context switching, trying to maintain lineage, consistency, and collaboration across multiple environments for a rapidly growing data team can mean confusion, frustration, and miscommunication.
Paige explains, “Before dbt Insights, I would have to drop raw SQL in Slack, or a link to Snowflake where I've done something. And now I can just drop the bookmark to the Insights query, and it has everything in there the way I want it. And data engineers can immediately see the results of what I was trying to show with the query and then go back to validating.”
By creating a shared development environment, data analysts and data engineers can collaborate on the same repo and propagate updates throughout the environment.
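In dbt Core, one common pattern gives every developer an isolated schema within the same shared project: everyone works from the same repo and models, but work in progress lands in a personal sandbox. A sketch of a `profiles.yml` along those lines, assuming Snowflake; all connection details are hypothetical (dbt's managed platform handles connections in the UI instead):

```yaml
analytics:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: my_account
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
      role: transformer
      database: analytics
      schema: dbt_paige        # per-developer schema isolates work in progress
      warehouse: transforming
      threads: 4
```

Everyone builds from the same codebase, but nothing an analyst is experimenting with can overwrite a teammate's work or production.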
Use Continuous Integration for testing and version control
In shared development environments, speed can introduce risk. Data engineers may have a more far-reaching impact on infrastructure, but even a misplaced change in analyst code or an updated model can lead to errors, broken logic, or failed validations that ripple across the organization, damaging workflows, dashboards, and, if not caught in time, even stakeholder reports.
Continuous Integration (CI) automatically tests and validates code each time it is updated in the shared environment, confirming accuracy and cross-checking dependencies before the impact of a faulty change can spread. dbt Cloud’s built-in CI/CD capabilities run automatically on every PR (GitHub/GitLab/Azure DevOps), building only the impacted models in a temporary, PR-scoped schema and executing tests and dependency checks before anything merges. Engineers review diffs and run results in the PR, and Jobs handle scheduled or triggered deployment, keeping production safe.
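dbt's built-in CI handles this automatically; teams orchestrating dbt Core themselves can wire up the same pattern in a GitHub Actions workflow. A minimal sketch, where the adapter choice and the location of production artifacts are assumptions:

```yaml
# .github/workflows/dbt_ci.yml
name: dbt CI
on: pull_request

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install dbt-snowflake   # adapter is an assumption
      - name: Build only what this PR touches
        env:
          DBT_PROFILES_DIR: .
        run: |
          dbt deps
          # state:modified+ selects changed models plus everything downstream;
          # --defer resolves unchanged refs against production artifacts
          dbt build --select state:modified+ --defer --state prod-artifacts/
```

Comparing against production state keeps the check fast; in practice, teams also point the CI target at a temporary, PR-scoped schema so test builds never touch production tables.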
Make lineage and documentation upkeep a daily task
Collaboration can’t work unless all parties are on the same page—but this doesn’t need to mean endless meetings or daily scrums. Keeping lineage, documentation, owners, and tests consistent in your everyday workflow helps other data team members understand updates, changes, and discoveries, preventing time-wasting duplicated work and errors from misunderstandings or a lack of information on where to find support.
With many inputs and updates to track, Logan relies on this information to structure his analysis, explaining, “I look at documentation that exists and really get a solid grasp. We have such a wide-spanning project that covers so many different things that weekly, I am in dbt Catalog asking, ‘What does this column mean? What is the logic behind it? Where does it live? How does it affect everything else?’”
By codifying lineage and documentation upkeep into a daily task, data teams create a time-saving paper trail, sharing information and providing a point of reference for easier collaboration.
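Lineage doesn't have to stop at the last model, either. dbt exposures record which dashboards and applications depend on which models, extending the paper trail to the assets stakeholders actually see. A brief sketch with hypothetical names and URLs:

```yaml
version: 2

exposures:
  - name: weekly_revenue_dashboard
    type: dashboard
    maturity: high
    url: https://bi.example.com/dashboards/weekly-revenue
    description: Weekly revenue KPIs reviewed by the leadership team.
    owner:
      name: Revenue Analytics
      email: analytics@example.com
    depends_on:
      - ref('fct_orders')
      - ref('dim_customers')
```

When a model in `depends_on` changes or fails, anyone tracing lineage can see which downstream dashboard is affected and who owns it.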
Establish clear handoff points
Part of the collaboration between data analysts and data engineers is actually collaborating, which means knowing when to step back and let someone else take over. But rapidly shifting roles can complicate task ownership or even alignment on KPIs.
Rachael recalls the difficulty resulting from misaligned expectations. “Where I ran into the most friction at past jobs was at larger companies when data engineering sat in a completely different org than me, and we were aligned under different leadership structures with different goals and different missions. They were very back-end data engineers who didn't really know much about analytics, and I was very much more on the data science—applying data to business decision—side of things. At the time, I was very ignorant of data engineering things, and the org structure and knowledge and skill sets just made that gap very hard to bridge.”
Clarifying roles and responsibilities, as well as aligning on KPIs, provides easily understood handoff points. Teams may consider leveraging a RACI (Responsible, Accountable, Consulted, Informed) model to differentiate roles at each stage in the process, preventing redundancies and ensuring that no critical pieces are dropped along the way.
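Some of that ownership can also be encoded directly in the project. dbt's model groups and access modifiers mark which team owns a model and whether others may build on it; a lightweight sketch with hypothetical team and model names:

```yaml
groups:
  - name: platform
    owner:
      name: Data Engineering
      email: data-eng@example.com
  - name: analytics
    owner:
      name: Data Analysts
      email: analysts@example.com

models:
  - name: fct_orders
    group: platform
    access: protected      # other models in the project may ref() it
  - name: int_orders_enriched
    group: platform
    access: private        # implementation detail; no refs from outside the group
```

The handoff point becomes explicit: analysts build on `protected` models, and anything `private` is understood to be the engineers' internal workspace.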
Set guardrails to enable self-service
Governance does not mean restricted collaboration. In fact, establishing clear governance principles and maintaining daily lineage and documentation updates should give data analysts more autonomy, not less. Zach explains, “Here at dbt Labs, dbt is so heavily incorporated that they have that part right. The process of self-service for analysts is super well defined.”
Rather than holding analysts back, this well-defined role offers freedom to explore the data, with safeguards in place to ensure governance is maintained, checks to prevent propagation of errors, and clearly defined handoff points so that all aspects of a project are covered without unnecessary overlap.
Properly deployed governance, with PR reviews, CI, contracts, tests, lineage, and RBAC in place, should leave data teams feeling empowered, not restricted. The result is autonomy for exploration, safeguards that stop bad changes, and clean handoffs.
Your analyst enablement checklist
Before employing governed collaboration for your data team, you should be able to confidently answer the following questions:
1. Does every analyst know how to trace lineage and trust signals (without tribal knowledge)?
- If each analyst is approaching data as if no one has ever touched it before, they could be missing key background or unknowingly leveraging faulty data. It should be easy for analysts to understand the lineage of the data they are using and be able to confirm that it is complete, accurate, and up to date.
- What good looks like: lineage graphs tied to owners, tests, and freshness; health/status surfaced in the same workspace where queries happen (e.g., Catalog/lineage + test results). Analysts don’t need to know orchestration internals, just whether a dataset is current, tested, and who owns it.
2. Are branching strategies and environments structured for collaboration?
- Can your teams easily work together in a shared development environment? Is version control in place to prevent the proliferation of errors or conflicts? Your organization needs a clear staging and deployment strategy before allowing changes to a shared environment, or you risk teams duplicating efforts, or worse, breaking each other’s work.
- What good looks like: PR-based workflow in Git; per-developer isolated dev schemas; consistent dev/stage/prod; CI that builds impacted models in a temporary schema and runs tests/contracts before merge; deployments handled by jobs, not ad-hoc changes in shared prod.
3. Are metrics centrally defined and easy to access?
- It is crucial that all members of the organization “speak the same language,” agreeing on mutually delineated glossaries, categorizations, and other key features. Ill-defined database structures can quickly result in misaligned metrics, inconsistent updates, and inaccurate queries.
- What good looks like: governed metric definitions (names, grains, dimensions, filters) stored in version control; a semantic/metrics layer exposed to BI/AI tools so everyone computes the same result; conformed dimensions and clear SCD handling to avoid drift.
4. Is governance enforced without constant intervention from data engineers?
- The point of enabling self-service is to give data analysts autonomy to work without unnecessary oversight or roadblocks, but failing to adhere to governance standards can bring serious legal and reputational damage and an erosion of trust. Governance should be built into all workflows and firmly ingrained in all team members before removing consistent oversight.
- What good looks like: policy-as-code (contracts, tests, data classifications), RBAC at the warehouse and workspace, column/row masking where needed, CI gates on PRs, audit logs for changes and runs. Engineers set guardrails; analysts ship safely inside them.
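As a concrete example of the policy-as-code mentioned in the last item, a dbt model contract pins a model's shape and constraints in version control, so CI fails any change that would break it. A minimal sketch; the model, column, and classification names are hypothetical:

```yaml
models:
  - name: dim_customers
    config:
      contract:
        enforced: true       # CI fails if the model's declared shape drifts
    columns:
      - name: customer_id
        data_type: int
        constraints:
          - type: not_null
      - name: customer_segment
        data_type: varchar
      - name: email
        data_type: varchar
        meta:
          classification: pii   # hypothetical tag that masking policies can key on
```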
Final thoughts
AI has reignited the value of data, but the scale is unlike anything we’ve seen before, and it’s still growing. Now is the time for data teams to build a solid foundation, securing collaborative relationships between data analysts and data engineers that draw on their respective strengths and create a supportive, well-defined partnership based on trust and shared goals. The future of analytics is shared: governed, collaborative, and fast.
Paige has advice for data analysts hoping to improve their collaboration. “See if there's somebody to shadow who's doing data engineering work. What are their concerns? What does their day-to-day look like? It builds empathy and can help you understand: this is how I could adjust the way I'm describing something that makes more sense to them in their world. And see if that helps make your communication better.”
As a data engineer, Zach agrees. “At any company that wants to be properly data-driven, data analysts should be the ones driving the charge. They are the experts when it comes to data. If I could be of use helping them drive that story and form that narrative, then I think that means I'm doing my job.”
The teams that win make collaboration repeatable: clear roles and handoffs, self-service with guardrails, and one governed system for discovery, validation, and shipping. When analysts trace lineage, validate freshness, and iterate with stakeholders, and engineers provide the telemetry, tests, CI, and access controls, speed stops trading off against trust.
AI raises the stakes but also accelerates the loop—drafting docs, queries, and checks—so long as governance is built into the path of work. Make the governed path the easiest path, and autonomy becomes an asset, not a risk.
Transform your analytics workflow today. See firsthand how dbt empowers analysts with AI-powered, governed workflows. Request a demo or start your free trial now.