Why we're updating Terms of Use in dbt Cloud

We’re often asked how dbt Cloud stores and processes data related to your projects. This post delves into the details of our Terms of Use, which we have recently updated to provide more clarity on these processes.

The previous version of our Terms of Use for dbt Cloud included sections that were not sufficiently clear. Users have asked how their data is used and stored by dbt Cloud, and how it is accessed or used by dbt Labs. In this post we’ll provide more detail about the updates to our Terms of Use in hopes of making our position clearer.

Defining “Data”

Section 8.1 of the dbt Cloud Terms of Use (ToU) saw the biggest changes to clarify how we (dbt Labs) interact with “the Data(1).” But before getting into those changes, let’s talk definitions. The full definition of “the Data” according to the ToU is in a footnote below, but for now you can think of it as referring to both Platform Data and Client Data.

“Platform Data” refers to data and data elements collected or generated by dbt Cloud regarding configuration, environment, usage, data structure, performance, vulnerabilities and security that may be used to generate logs, statistics and reports regarding performance, availability, integrity and security of dbt Cloud.

“Client Data” is the data in your warehouse that you point dbt Cloud at. dbt Labs employees do not have any access to Client Data unless explicitly specified by the customer in Services and Support scenarios. With one caveat that we’ll return to later, dbt Cloud only accesses Client Data as users perform actions in the product.

How does dbt Labs use data?

To better understand how dbt Labs uses any Platform or Client Data, we can refer to some of the Terms of Use updates mentioned above. Both data types are referred to collectively as “Data” in the Terms of Use. According to Section 8.1: Client Data & Platform Data:

Client is the sole and exclusive owner of all Data, including all proprietary rights therein. Client is solely responsible for the accuracy, quality, and legality of Data. Nothing in the Agreement grants to dbt Labs any rights of ownership or any other proprietary rights in or to Data. Data is Client’s Confidential Information. Client hereby grants to dbt Labs a nonexclusive, non-transferable (except in connection with an assignment permitted under Section 13.2), revocable license, under all proprietary rights, to reproduce, store, process, and use Data solely for the purpose of, and to the extent necessary for, providing the Services and performing its obligations under the Agreement.

Let’s rewind that last line: “solely for the purpose of, and to the extent necessary for, providing the Services…” This is an extremely important point. This portion of the ToU explicitly states that dbt Labs may only use Data to provide the ongoing Services (i.e., Subscriptions, Professional Services, Training Services, and access to dbt Cloud) that customers expect from a well-functioning, well-supported experience.

What exactly does using Data in support of Services look like? In the following sections we’ll share some examples of how dbt Labs uses Data to deliver (and improve delivery of) the dbt Cloud experience.

Data used to provide the dbt Cloud Service

dbt Cloud enables users to write modular SQL to query, test, transform, and analyze data. To perform the functions you request, dbt Cloud needs to be aware of the data you’d like to transform for the duration of the transaction. For example, when a user writes select * from customers limit 100, the data from your customers table passes through the dbt Cloud infrastructure on the way to your browser. However, this data is not persisted (cached or otherwise), and it does not live on our servers beyond the duration of your browser session. According to the Terms of Use, dbt Labs is also not permitted to access this data, except in support of the Services.

Data used to enable dbt Cloud account Support and Services

To provide dbt Cloud customers with any support they may need to access, manage, or engage with their dbt Cloud account, our Support team may require information about the user and their interactions, such as the user’s provided contact information, when they last logged in, from where, for how long, and what actions they took. This enables our team to respond quickly and tailor support so that users get unstuck faster. As with all Platform and Client Data, this Data is only used in support of, and for the duration of, these Support transactions.

dbt Labs also offers hands-on training, solution review, and other project support services that require our team of in-house analytics engineers to access your projects in support of agreed-upon deliverables. In these cases, the Professional Services team at dbt Labs works as consultants to train and supplement the analytics capacity of your company. As such, they are responsible for accessing and modeling data the same way a company’s in-house employees might, with that company’s explicit agreement, and for the extent of the Services agreement only.

Data used to improve delivery of the dbt Cloud service

dbt Cloud Platform Data, which includes data about how users interact with the platform, is continually analyzed to help us build a better user experience and inform future requirements. With this data, we can more quickly identify bugs, spot performance degradation, or even recommend relevant resources for platform features you may not be fully utilizing.

One example of this analysis is ensuring a speedy interface. By studying the time between a user triggering a job, the IDE loading, and models running, we can better understand how dbt Cloud handles the scale of its use. This analysis was recently used to identify scheduling delays that some of our users were experiencing because of a sudden influx of dbt Cloud accounts. We were then able to rearchitect our scheduler, which yielded a 10x improvement in scheduler startup time. We also made extensive improvements to the IDE using the same methodology, resulting in 25x faster IDE startup time.
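To illustrate the kind of timing analysis described above, here is a minimal Python sketch. It is not dbt Labs’ actual pipeline: the event records, the field names (job_id, triggered_at, started_at), and the 30-second threshold are all hypothetical, chosen only to show how a startup delay can be computed from usage timestamps alone.

```python
from datetime import datetime
from statistics import median

# Hypothetical usage events. The field names are illustrative only and
# do not reflect dbt Cloud's real schema.
events = [
    {"job_id": "a", "triggered_at": "2022-11-01T10:00:00", "started_at": "2022-11-01T10:00:40"},
    {"job_id": "b", "triggered_at": "2022-11-01T10:05:00", "started_at": "2022-11-01T10:05:04"},
    {"job_id": "c", "triggered_at": "2022-11-01T10:10:00", "started_at": "2022-11-01T10:10:10"},
]


def startup_delay_seconds(event):
    """Seconds between a job being triggered and the job actually starting."""
    triggered = datetime.fromisoformat(event["triggered_at"])
    started = datetime.fromisoformat(event["started_at"])
    return (started - triggered).total_seconds()


delays = [startup_delay_seconds(e) for e in events]
median_delay = median(delays)

# Flag jobs whose delay exceeds an assumed 30-second threshold.
slow_jobs = [e["job_id"] for e, d in zip(events, delays) if d > 30]
```

Note that the sketch only touches timestamps and identifiers, which is the kind of usage information the post describes as Platform Data; no warehouse rows are involved.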

What types of Data does dbt Cloud store?

As shared on our security page, dbt Cloud stores the following data persistently:

  1. dbt Cloud account information including job definitions, users, database connection information, and credentials. Cloud account information does not include any raw data from your warehouse.
  2. Logs associated with jobs and interactive queries you’ve run.
  3. Your dbt “assets,” which include metadata artifacts like run_results.json and manifest.json.
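To make the third item more concrete, the following Python sketch reads a heavily trimmed, illustrative stand-in for a run_results.json artifact. The project and model names are hypothetical, and a real artifact contains many more fields than shown here; the point is only that the entries in this sketch describe which nodes ran, their status, and their timing, rather than rows from the warehouse.

```python
import json

# Illustrative, trimmed stand-in for a run_results.json artifact.
# "my_project", "customers", and "orders" are hypothetical names.
sample_artifact = """
{
  "metadata": {"dbt_version": "1.3.0"},
  "results": [
    {"unique_id": "model.my_project.customers", "status": "success", "execution_time": 1.25},
    {"unique_id": "model.my_project.orders", "status": "error", "execution_time": 0.4}
  ],
  "elapsed_time": 2.1
}
"""


def summarize(artifact_text):
    """Return (node id, status, execution time) for each result entry."""
    artifact = json.loads(artifact_text)
    return [
        (r["unique_id"], r["status"], r["execution_time"])
        for r in artifact["results"]
    ]


summary = summarize(sample_artifact)
failed = [node for node, status, _ in summary if status != "success"]
```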

What has confused users in the past is whether any of the above information could include “Client Data” or otherwise “Personally Identifiable Data.” As mentioned above, dbt Labs does not have access to this data unless explicitly specified by the customer in Services and Support scenarios, or through interactions with dbt Cloud. So, how might a user’s interactions with dbt Cloud specify access to Client or Personally Identifiable Data?

By default, dbt Cloud logs and assets do not include raw data from the warehouse. The only way for raw warehouse data to appear in dbt Cloud is for code written by a customer to explicitly instruct dbt Cloud to include it. For example, while not advised, it is possible to write dbt code that fetches all customer data from your customer table and writes it to the logs. Again: we do not recommend doing this.

However, in the event any such data appears within the dbt Cloud Platform, the same parameters as above apply: such data may only be used by dbt Labs to provide the Services related to your account(s) and to perform dbt Labs’ obligations under the dbt Cloud service agreement.

In an effort to make this particular point clearer, Section 8.1 of the Terms of Use now states:

To the extent that Platform Data identifies or permits, alone or in conjunction with other data, identification, association, or correlation of or with Client, Client’s customers or Authorized Users (“Identifiable Platform Data”), dbt Labs will only collect and use Identifiable Platform Data internally to provide the Services and to perform its obligations under the Agreement. […]

We hope this post has helped clarify some critical sections of our Terms of Use agreement. For more detail on any of the above, please refer to the newly updated Terms of Use and Security policies on getdbt.com.

Footnotes

  1. According to the Terms of Use, “‘Data’ means all data, records, files, materials, information or content that is: (i) submitted or uploaded by Client or Client’s Authorized Users to or transmitted, processed, or stored by Client or Client’s Authorized Users using the dbt Labs Platforms in connection with the Agreement; and (ii) on the servers that Client or Client’s Authorized Users query, transform, process or otherwise access via the dbt Labs Platforms.”

Last modified on: Dec 14, 2022