Everyone’s talking about “the cloud.” In this post, we’re diving deep into why everyone’s making the move to the cloud, unpacking the primary cloud migration benefits for modern data teams, and discovering how dbt plays a role in all of this.
What is cloud migration?
In general, a cloud migration is the process of moving technical infrastructure (think databases, IT applications, etc.) to the cloud, either from an on-premises environment or from another cloud provider. In the context of data work (and this page), we focus on the transition from on-premises storage to cloud storage (cloud data warehouses or data lakes), and on the transformation tools and effort needed to make that migration happen.
For data teams, cloud migrations unlock unparalleled abilities to scale, secure, and govern their data—all while doing it in a cost-optimized environment. Let’s dive into how a cloud migration paired with dbt enables these benefits and makes for more empowered data teams and resilient data pipelines.
The benefits of cloud migration: Scaling and governing your data with modern tooling at your side
When you move your data and data infrastructure to the cloud, you commit to scaling your data and transformations in secure, collaborative, and cost-efficient environments.
Scale your data with your business
Modern cloud data warehouses and lakehouses are built to scale with your data, your data team, and your business. Cloud data warehouses made storage cheap, so data teams can store a considerable amount of data at a more reasonable cost than with on-prem solutions. Since a good chunk of cloud data warehouse costs come from compute, most cloud data warehouses ship with efficient query engines that optimize for performance and let you easily resize your warehouse to scale up (or down) with your data.
In addition, when you move your data to cloud storage, you can leverage data transformation tools such as dbt, where you can create and implement automated, code-based tests. These tests reduce the number of ad hoc `select count(*)` queries in your SQL worksheets and enable data quality to scale in parallel with your data growth, so Big Data doesn’t become Big Problems.
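To make the idea of automated, code-based tests concrete, here is a minimal sketch in Python, using the standard-library `sqlite3` module as a stand-in for a cloud data warehouse. This is not dbt itself: the `orders` table, the `TESTS` dictionary, and the `run_tests` helper are hypothetical names made up for illustration. The pattern it shows is that each test is a query selecting the rows that violate an expectation, so a test passes when its query returns zero rows.

```python
import sqlite3

# Hypothetical sample data: one duplicated order_id and one missing customer_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table orders (order_id integer, customer_id integer);
    insert into orders values (1, 10), (2, 11), (2, 12), (3, null);
""")

# Each test is a query that returns the rows breaking an expectation.
# Zero rows returned means the test passes.
TESTS = {
    "orders.order_id is unique": """
        select order_id from orders
        group by order_id having count(*) > 1
    """,
    "orders.customer_id is not null": """
        select order_id from orders where customer_id is null
    """,
}


def run_tests(connection, tests):
    """Run every test query and return {test name: number of failing rows}."""
    return {
        name: len(connection.execute(sql).fetchall())
        for name, sql in tests.items()
    }


results = run_tests(conn, TESTS)
for name, failures in results.items():
    status = "PASS" if failures == 0 else "FAIL"
    print(f"{status}  {name} ({failures} failing rows)")
```

Because the checks live in code, they can run on every build instead of whenever someone remembers to open a worksheet. In dbt, comparable checks such as `unique` and `not_null` are declared in YAML rather than hand-written like this.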
Lastly, when you move your data and data stack to the cloud, you open the door for new users to contribute to your data pipelines. Historically, data work was gated behind highly technical data engineers and custom on-prem solutions; when your data is stored in the cloud, anyone who knows SQL can dive in and contribute to data transformations, all while doing it in secure environments managed with fine-grained access control.
As your data grows, so does the need for more data transformations—and these transformations don’t magically happen by themselves. As a result, it’s important to provide solutions that lower the barrier to contribution for analytics work in a secure, sustainable way. Learn more about how dbt Cloud enables anyone who knows SQL to contribute to your data transformations in a version controlled, governable, and secured environment.
Security you (and your CTO) can trust
Let’s say it together: a cloud-based data stack is a secure data stack. One of the greatest misconceptions about a shift away from on-prem storage is that being on the cloud introduces new privacy and security risks. In practice, modern cloud data warehouses offer strong security protections and compliance through encryption, role-based access control (RBAC), single sign-on (SSO), and certifications such as ISO 27001 or SOC 2. To understand how a cloud tool will meet your security needs, review its security policies during your vendor procurement process.
Learn more about how dbt Cloud keeps your data secure (and your VP of security happy 😉).
Govern with modern tooling at your side
A move to the cloud unlocks your data team’s ability to govern efficiently and securely. When you pair your cloud data warehouses with transformation and governance tools, you can more easily govern who has access to your data and how that data is transformed:
- Governing access: Cloud data warehouses allow you to create fine-grained access permissions, roles, and groups, so folks only see what they need to see, and securing PII and ensuring compliance become second nature.
- Governing data: In a cloud-based data stack, you have greater control over not only how your data is accessed, but also how it is structured and tested. Using dbt, which plugs into many cloud data warehouses, you can require changes to transformations to meet documentation, testing, or code requirements through pull requests and peer review. No more questionable business logic slipping into transformations: govern how your data is transformed in your data warehouse by using transformation tools that support data documentation, testing, version control, and lineage.
When you govern how your data is accessed and transformed, you’re not adding barriers to entry. In fact, it’s the opposite: when you create reasonable governance policies, you’re creating safe and secure development environments for folks who work with your data. These guardrails should empower people to work with the data confidently within clear bounds, not act as blockades.
Below, learn more about how dbt is enabling governance in the cloud-based data world:
- dbt Cloud RBAC: Create fine-grained permissions and user groups in dbt Cloud Enterprise, so contributors to your dbt project can do exactly what they need to (nothing less, nothing more) without a heavy lift on your end.
- Model access: Set access levels on your dbt models to control which teams and projects can build on them, so the right folks are using the right models; and more importantly, internal or work-in-progress models aren’t referenced where you don’t want them to be.
- Model versions: Version your dbt models to create quick development cycles without breaking downstream usage or reporting.
- Model contracts: Ensure your data is structured in a consistent and expected way to reduce downstream query issues.
- Pull requests in dbt Cloud: Whenever you change your data transformations and want to push those changes to production, open a pull request and trigger jobs in dbt Cloud to test the changes and ensure they meet your requirements.
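As a rough illustration of what a model contract protects against, the sketch below compares a table’s actual columns with the columns downstream users expect. It uses Python’s standard-library `sqlite3` module in place of a cloud data warehouse. The `customers` table, `EXPECTED_COLUMNS`, and `contract_violations` are hypothetical names chosen for this example, and dbt declares contracts in configuration rather than through a hand-written check like this one.

```python
import sqlite3

# Hypothetical contract: the column names and types downstream users rely on.
EXPECTED_COLUMNS = {"customer_id": "INTEGER", "lifetime_value": "REAL"}


def contract_violations(connection, table, expected):
    """Return a list of differences between a table's columns and the contract."""
    # PRAGMA table_info yields one row per column: (cid, name, declared type, ...).
    actual = {
        row[1]: row[2].upper()
        for row in connection.execute(f"pragma table_info({table})")
    }
    problems = []
    for name, col_type in expected.items():
        if name not in actual:
            problems.append(f"missing column: {name}")
        elif actual[name] != col_type:
            problems.append(f"{name}: expected {col_type}, found {actual[name]}")
    for name in sorted(actual.keys() - expected.keys()):
        problems.append(f"unexpected column: {name}")
    return problems


conn = sqlite3.connect(":memory:")
# The model was built with lifetime_value as text, which breaks the contract.
conn.execute("create table customers (customer_id integer, lifetime_value text)")

violations = contract_violations(conn, "customers", EXPECTED_COLUMNS)
print(violations)
```

Running a check like this as part of a pull request means a change to a column’s name or type is caught before it reaches production, not discovered later in a broken dashboard.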
Optimize cost savings
We hinted at it earlier, but while cloud migrations come with upfront costs during the actual migration, a move to the cloud has significant long-term cost benefits:
- Separation of storage and compute: Modern cloud data warehouses decouple storage and compute costs, allowing your team to focus on creating efficient data transformations and queries (the fun stuff) and not be bogged down by large-volume storage costs (the not-as-fun stuff).
- TCO: With on-prem solutions, you’re managing almost every aspect of your data stack: you’re buying the physical servers, you’re maintaining them, you’re responsible for ensuring updates go well, and as a result, your total cost of ownership (TCO) is likely high. With a cloud data warehouse, you move into a more flexible pricing model where your data storage platform scales with your data as you need it. In addition, on-prem solutions are often maintained by highly technical folks; with a cloud data warehouse or lakehouse, anyone who feels comfortable with basic data storage principles and SQL can be involved in its maintenance.
- Smart materializations: When you pair your cloud storage systems with a transformation tool like dbt, you can start materializing entities in your data warehouse in smarter, more efficient ways. Many teams using dbt leverage incremental models, where tables are not fully rebuilt every time they’re run; instead, incremental models allow you to transform only new or updated data within a specified time window. This type of materialization can cut a table’s rebuild time from hours to minutes, saving considerable time and compute costs.
- Mature scheduling: dbt Cloud’s job scheduler also enables teams to run only the data transformations they need, when they need them. Gone are the days of rebuilding all of your tables once a day (taking the whole day 😬); in are the days of rebuilding tables efficiently or incrementally using smarter materializations and more thoughtful orchestration.
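The incremental idea described above can be sketched in a few lines of Python, with the standard-library `sqlite3` module standing in for a cloud data warehouse. The `raw_events` and `fct_events` tables, the `loaded_at` column, and the `incremental_run` helper are hypothetical names for this example. This is a simplified, append-only version of the pattern, not dbt’s implementation; in a dbt incremental model the equivalent filter is usually wrapped in an `is_incremental()` block.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table raw_events (event_id integer, loaded_at text);
    create table fct_events (event_id integer, loaded_at text);
    insert into raw_events values (1, '2023-05-01'), (2, '2023-05-02');
""")


def row_count(connection, table):
    """Count the rows currently in a table."""
    return connection.execute(f"select count(*) from {table}").fetchone()[0]


def incremental_run(connection):
    """Append only source rows newer than the newest row already in the target.

    Returns the number of rows added by this run.
    """
    before = row_count(connection, "fct_events")
    connection.execute("""
        insert into fct_events
        select event_id, loaded_at
        from raw_events
        where loaded_at > (select coalesce(max(loaded_at), '') from fct_events)
    """)
    return row_count(connection, "fct_events") - before


first = incremental_run(conn)   # target is empty, so every source row is loaded
conn.execute("insert into raw_events values (3, '2023-05-03')")
second = incremental_run(conn)  # only the newly arrived row is loaded
third = incremental_run(conn)   # nothing new, so nothing is loaded
print(first, second, third)
```

The compute saved comes from the `where` clause: each run scans for and writes only the rows that arrived since the last run, instead of rebuilding the whole table.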
Cloud migration with dbt: Do it yourself!
We just threw a lot of information at you: how cloud data storage platforms scale with your data, what security and governance look like in these data stacks, and how to benefit from significant cost savings from a cloud migration.
There’s no perfect way to conduct a cloud migration, and the benefits we listed above will come with time and effort. But these benefits, on whatever timeline they arrive, will allow your organization to have data that is secure and data transformations that are collaborative and efficient.
So, what’s next for you?
- 🧠 If you want to learn more about cloud storage, take a look at this guide on data warehouses.
- 👀 If you’re interested in seeing dbt Cloud in action and learning how it can play a role in your cloud migration, take a tour of the product or see it live in a demo.
- 🏊 If you’re ready to take the deep dive into dbt Cloud, sign up for a free trial.
We can’t wait to see how you make your cloud migration your own with dbt by your side.
Last modified on: May 18, 2023