“Platform Data” refers to data and data elements collected or generated by dbt Cloud regarding configuration, environment, usage, data structure, performance, vulnerabilities, and security, which may be used to generate logs, statistics, and reports on the performance, availability, integrity, and security of dbt Cloud.
“Client Data” is the data in your warehouse that you point dbt Cloud at. dbt Labs employees do not have any access to Client Data unless explicitly specified by the customer in Services and Support scenarios. With one caveat that we’ll return to later, dbt Cloud only accesses Client Data as users perform actions in the product.
How does dbt Labs use data?
Client is the sole and exclusive owner of all Data, including all proprietary rights therein. Client is solely responsible for the accuracy, quality, and legality of Data. Nothing in the Agreement grants to dbt Labs any rights of ownership or any other proprietary rights in or to Data. Data is Client’s Confidential Information. Client hereby grants to dbt Labs a nonexclusive, non-transferable (except in connection with an assignment permitted under Section 13.2), revocable license, under all proprietary rights, to reproduce, store, process, and use Data solely for the purpose of, and to the extent necessary for, providing the Services and performing its obligations under the Agreement.
Let’s rewind to that last line: “…solely for the purpose of, and to the extent necessary for, providing the Services…” This is an extremely important point. This portion of the ToU explicitly states that dbt Labs may only use Data to provide the ongoing Services (i.e., Subscriptions, Professional Services, Training Services, and access to dbt Cloud) that customers expect from a well-functioning, well-supported experience.
What exactly does using Data in support of Services look like? In the following sections we’ll share some examples of how dbt Labs uses Data to deliver (and improve delivery of) the dbt Cloud experience.
Data used to provide the dbt Cloud Service
dbt Cloud enables users to write modular SQL to query, test, transform, and analyze data. In order to perform requested functions, dbt Cloud needs to be aware of the data you’d like to transform, for the extent of the transaction. For example, when a user writes
Data used to enable dbt Cloud account Support and Services
To provide dbt Cloud customers with any support they may need to access, manage, or engage with their dbt Cloud account, our Support team may require information about the user and their interactions, such as the user’s provided contact information, when they last logged in, from where, for how long, and what actions they took. This enables our team to respond quickly and tailor support so that users get unstuck faster. As with all Platform and Client Data, this Data is only used in support of, and for the duration of, these Support transactions.
dbt Labs also offers hands-on training, solution review, and other project support services that require our team of in-house analytics engineers to access your projects in support of agreed-upon deliverables. In these cases, the Professional Services team at dbt Labs works as consultants to train and supplement the analytics capacity of your company. As such, they are responsible for accessing and modeling data the same way a company’s in-house employees might, with that company’s explicit agreement, and for the extent of the Services agreement only.
Data used to improve delivery of the dbt Cloud service
dbt Cloud Platform Data—which includes data about how users interact with the platform—is continually analyzed to help us build a better user experience and inform future requirements. With this data, we can more quickly identify bugs, spot performance degradation, or even recommend relevant resources for platform features you may not be fully utilizing.
One example of this analysis is ensuring a speedy interface. By studying the time between a user triggering a job, the IDE loading, and models running, we can better understand how dbt Cloud handles the scale of its use. This analysis recently identified scheduling delays that some of our users were experiencing after a sudden influx of dbt Cloud accounts. In response, we rearchitected our scheduler, which improved scheduler startup time 10x. We also made extensive improvements to the IDE using the same methodology, making IDE startup 25x faster.
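As a rough illustration of the kind of timing analysis described above (not dbt Labs’ actual tooling), the sketch below computes the delay between a job being triggered and the job starting. The event fields `job_id`, `triggered_at`, and `started_at`, and the sample timestamps, are invented for this example.

```python
from datetime import datetime
from statistics import median


def scheduling_delays(events):
    """Return the seconds between each job's trigger and its start."""
    return [
        (event["started_at"] - event["triggered_at"]).total_seconds()
        for event in events
    ]


# Made-up sample events; a real analysis would read these from usage logs.
events = [
    {"job_id": 1,
     "triggered_at": datetime(2022, 1, 1, 9, 0, 0),
     "started_at": datetime(2022, 1, 1, 9, 0, 4)},
    {"job_id": 2,
     "triggered_at": datetime(2022, 1, 1, 9, 5, 0),
     "started_at": datetime(2022, 1, 1, 9, 5, 10)},
    {"job_id": 3,
     "triggered_at": datetime(2022, 1, 1, 9, 10, 0),
     "started_at": datetime(2022, 1, 1, 9, 10, 6)},
]

delays = scheduling_delays(events)
typical_delay = median(delays)  # a spike in this value would flag a scheduling problem
```

Note that this kind of analysis needs only timestamps about platform usage (Platform Data); it never touches the contents of a warehouse.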
What types of Data does dbt Cloud store?
As shared on our security page, dbt Cloud stores the following data persistently:
- dbt Cloud account information including job definitions, users, database connection information, and credentials. Cloud account information does not include any raw data from your warehouse.
- Logs associated with jobs and interactive queries you’ve run.
- Your dbt “assets” which includes metadata artifacts like
What has confused users in the past is whether any of the above information could include “Client Data” or other “Personally Identifiable Data.” As mentioned above, dbt Labs does not have access to this data unless explicitly specified by the customer in Services and Support scenarios, or through interactions with dbt Cloud. So, how might a user’s interactions with dbt Cloud specify access to Client or Personally Identifiable Data?
By default, dbt Cloud logs and assets do not include raw data from the warehouse. The only way for raw warehouse data to appear in dbt Cloud is for code written by a customer to explicitly instruct dbt Cloud to include it. For example, while not advised, it is possible to write dbt code that fetches all customer data from your customer table and writes it to the logs. Again: we do not recommend doing this.
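To make the distinction concrete, here is a minimal Python analogy (not dbt code, and not how dbt Cloud is implemented) using an in-memory SQLite database. The `customers` table, the `run_job` function, and the `log_rows` flag are hypothetical names chosen for this sketch. By default only metadata (a row count) is logged; raw rows reach the logs only when the code explicitly writes them there.

```python
import logging
import sqlite3


class ListHandler(logging.Handler):
    """Collects log messages in memory so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def run_job(conn, logger, log_rows=False):
    """Query a hypothetical customers table and log the outcome."""
    rows = conn.execute("SELECT id, email FROM customers").fetchall()
    # Default behavior: only metadata (a row count) reaches the logs.
    logger.info("fetched %d rows from customers", len(rows))
    if log_rows:
        # Not advised: this explicitly copies raw warehouse data into the logs.
        for row in rows:
            logger.info("row: %r", row)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
    conn.execute("INSERT INTO customers VALUES (1, 'ada@example.com')")

    handler = ListHandler()
    logger = logging.getLogger("job_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)

    run_job(conn, logger)                 # default: metadata only
    safe = list(handler.messages)
    handler.messages.clear()

    run_job(conn, logger, log_rows=True)  # explicit opt-in to logging rows
    leaky = list(handler.messages)

    logger.removeHandler(handler)
    conn.close()
    return safe, leaky
```

Calling `demo()` returns two lists of log messages: the first contains only the row count, while the second also contains the customer’s email address, because the code asked for it.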
However, in the event any such data appears within the dbt Cloud Platform, the same parameters as above apply: such data may only be used by dbt Labs to provide the Services related to your account(s) and to perform dbt Labs’ obligations under the dbt Cloud service agreement.
To the extent that Platform Data identifies or permits, alone or in conjunction with other data, identification, association, or correlation of or with Client, Client’s customers or Authorized Users (“Identifiable Platform Data”), dbt Labs will only collect and use Identifiable Platform Data internally to provide the Services and to perform its obligations under the Agreement. […]
Last modified on: Dec 14, 2022