How to securely deploy dbt Cloud
The dbt Live: Expert Series features Solution Architects from dbt Labs stepping through technical demos and taking audience questions on how to use dbt. Catch up with recaps of past sessions, and sign up to join future events live!
As a lead Solutions Architect at dbt Labs, Ernesto Ongaro supports a range of enterprises, from insurance companies to financial institutions, with varying team sizes and security concerns. In his dbt Live: Expert session, he shared an overview of practices that make dbt Cloud deployments more secure.
Video replay and overview
Note: Ernesto ran his demo with Snowflake and GitHub, but his recommendations are generally applicable and can be adapted to other cloud data platforms and Git providers.
All dbt Cloud users can use a username and password to log into the application, but this may leave their organization open to risk in certain scenarios – such as when an employee re-uses the same credentials in multiple applications, or leaves for another job (taking their user credentials with them).
For additional layers of defense, Ernesto recommends:
- Enabling Single Sign-On (SSO) to centralize user provisioning, deprovisioning, and multi-factor authentication via an Identity Provider (IdP).
- Creating Groups to manage role-based access and assign each user minimum privileges to do their job.
Customers who want an extra layer of insurance may opt for a dedicated virtual private cloud deployment, ensuring only users inside the company VPN can access their dbt instance.
Once users can log into dbt Cloud, they need to connect to their database through the Cloud IDE to start developing models.
While users can enter their Snowflake, BigQuery, Databricks, or Redshift credentials into their dbt Profile, some database administrators might prefer to limit the number of people and apps with direct access to the warehouse.
Alternatives for provisioning access include:
- Enable OAuth to redirect dbt Cloud users to a designated sign-on screen, and prompt them to connect to the database via SSO to move forward.
- Use least-privilege principles, granting users the minimum amount of database permissions they need to execute jobs successfully.
- Allowlist: use firewall rules to restrict warehouse access to dbt Cloud’s IP addresses.
Some dbt Cloud customers may additionally use PrivateLink to ensure data flows between the warehouse and dbt Cloud over a private (VPC-to-VPC) connection.
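In Snowflake terms (the platform Ernesto used in his demo), least-privilege grants and an IP allowlist might look like the sketch below. The role, warehouse, database, and schema names are hypothetical, and the CIDR blocks are documentation placeholders; substitute the IP addresses dbt Cloud publishes for your region.

```sql
-- Sketch only: all names and IPs below are placeholders, not dbt Labs defaults.
-- A narrowly scoped role for dbt developers:
CREATE ROLE IF NOT EXISTS dbt_dev_role;
GRANT USAGE ON WAREHOUSE transforming TO ROLE dbt_dev_role;
GRANT USAGE ON DATABASE analytics TO ROLE dbt_dev_role;
GRANT USAGE ON SCHEMA analytics.dbt_dev TO ROLE dbt_dev_role;
GRANT CREATE TABLE, CREATE VIEW ON SCHEMA analytics.dbt_dev TO ROLE dbt_dev_role;

-- Read-only access to source data:
GRANT USAGE ON DATABASE raw TO ROLE dbt_dev_role;
GRANT USAGE ON ALL SCHEMAS IN DATABASE raw TO ROLE dbt_dev_role;
GRANT SELECT ON ALL TABLES IN DATABASE raw TO ROLE dbt_dev_role;

-- Restrict inbound connections to an allowlist (placeholder CIDRs shown):
CREATE NETWORK POLICY dbt_cloud_policy
  ALLOWED_IP_LIST = ('203.0.113.10/32', '203.0.113.11/32');
ALTER ACCOUNT SET NETWORK_POLICY = dbt_cloud_policy;
```

Note that a network policy applied at the account level affects every user; Snowflake also supports attaching a policy to a single user (`ALTER USER ... SET NETWORK_POLICY`), which can be a better fit when only a service account should be restricted to dbt Cloud’s IPs.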
dbt Cloud users need the ability to run dbt jobs in production in order to build tables and views that their BI tools (and other users in their organization) can query.
By default, they can use warehouse credentials (e.g. Snowflake username and password) in their dbt Profile to authenticate, but Ernesto recommends these measures for additional security:
- Use key-pair authentication in place of a username and password.
- Consider automated key rotation via a tool like Terraform to limit exposure time in case a key is compromised.
- As with personal credentials, use least-privilege principles to grant just enough access to run production jobs successfully, and nothing more.
- Add dbt Cloud’s IP addresses to a network policy allowlist to limit access.
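On Snowflake, key-pair authentication for a production service user might be configured as in this sketch. The user name is hypothetical and the key values are placeholders; the key pair itself is generated outside the warehouse (for example with openssl, per Snowflake’s documentation), and only the public key is stored in Snowflake.

```sql
-- Sketch only: hypothetical user name; key values are placeholders, not real keys.
ALTER USER dbt_cloud_service SET RSA_PUBLIC_KEY = '<PEM public key body>';

-- Snowflake keeps two key slots, which enables rotation without downtime:
-- register the new key in the second slot, update dbt Cloud's credentials,
-- then unset the old key.
ALTER USER dbt_cloud_service SET RSA_PUBLIC_KEY_2 = '<new PEM public key body>';
ALTER USER dbt_cloud_service UNSET RSA_PUBLIC_KEY;
```

An automation tool like Terraform can run this rotation on a schedule, so a compromised key’s useful lifetime is bounded by the rotation interval.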
To secure access to a shared GitHub repo, Ernesto recommends these measures over simple username/password authentication:
- Authenticate through SSO and multi-factor authentication on the GitHub side.
- Allow access to just the relevant repositories in the organization – practicing least privilege once again.
- Add dbt Cloud IPs to GitHub’s allowlist.
Last modified on: Nov 29, 2023