class: title, center, middle # Docs --- class: subtitle, middle, center ## _How is knowledge documented and maintained at your company?_ --- # Why is documentation important? - Empowers the writer to refer back to their thinking - Empowers the reader to find their own answers - Increases velocity of onboarding - Reduces for ad hoc, repeated communication ---
Docs | Focus
Explain how to add descriptions to models and sources
Configure doc blocks
Generate and view documentation in dbt Cloud and the command line
Build intuition for the value of documenting your dbt project
--- # Our favorite metaphor * Analytics engineers are librarians. * Documentation is the card catalog. * The goal is to empower analyst-researchers. -- * _The librarian and the card catalog do not replace each other—they reinforce and enable each other!_ ### Analytics is the organization of an organization's information. --- # Our strong opinions! .denser-text[ * Code is _surface area_. Testing and documentation are _coverage_. * Untested, undocumented code _cannot be trusted_. In some sense, it is worse than having no code at all. * Reliable documentation is _key_ to scaling a data team.¹ * The generation of data documentation should be automated. * Data documentation should include both: * Inferred information from the codebase/database * Manual inputs ("this field means that") * The process to update documentation _must_ be part of the same workflow as the transformations it seeks to describe. If the two are separate, documentation will _always_ lag behind. ] ??? - Slide setup: From dbt Viewpoint: "Bad data can lead to bad analyses, and bad analyses can lead to bad decisions. Any code that generates data or analysis should be reviewed and tested." --- class: subtitle, middle, center ## dbt docs ??? not docs.getdbt.com, though those are also great --- ### Adding descriptions .dense-text[ Descriptions can be added to models, sources, and columns (in your existing `.yml` files) ] .left-column[ .denser-text[ ```yml version: 2 models: - name: stg_orders description: One record per order columns: - name: order_id tests: - unique - not_null - name: status description: "{{ doc('order_status') }}" tests: - accepted_values: values: ['placed', 'shipped', 'completed', 'return_pending', 'returned'] - name: amount description: Amount in USD ``` ] ] .right-column[ .denser-text[ ```yml version: 2 sources: - name: jaffle_shop description: A replica of the postgres database database: raw tables: - name: orders description: One record per order columns: - name: id tests: - unique - not_null - name: status description: "{{ doc('order_status') }}" tests: - accepted_values: values: ['placed', 'shipped', 'completed', 'return_pending', 'returned'] ``` ] ] ??? Demo in dbt cloud. -- .dense-text[ What's the `{{ doc('order_status') }}`? ] --- ### `docs` blocks `docs` blocks allow you to write long-form descriptions. Docs blocks are added to `.md` files in your `models/` directory. .denser-text[ ```md {% docs order_status %} One of the following values: | status | definition | |----------------|------------------------------------------------------------| | placed | Order placed but not yet shipped | | shipped | Order has been shipped but hasn't yet been delivered | | completed | Order has been received by customers | | return_pending | Customer has indicated they would like to return this item | | returned | Item has been returned | {% enddocs %} ``` ] ??? - Docs blocks are great for special formatting but also allow for more modular documentation. A given .md file can be referenced and used in multiple descriptions without having to copy and paste any text. --
demo
??? Demo Time: Regular docs: - "Let's see that in action!" - add regular model and/or column descriptions to your yml file - run `dbt docs generate` - show where those end up in the docs Doc blocks: - add a doc block file `order_status.md` - paste in the above markdown code ensuring to add newline white spaces - add description: `"{{ doc('order_status') }}"` - run `dbt docs generate` - within docs, find the doc block and talk about it - talk through all the other data shown (metadata, columns, types) - talk through DAG / lineage graph and model selection syntax --- # Generating documentation .denser-text[ When dbt generates documentation it takes into account: * Everything it knows about your project: * Descriptions (ofc!) * Model dependencies * Model SQL * Sources * Tests * Additional things about your warehouse (from the information schema) * Column names and types (even if they aren't documented!) * Table stats This information is used to populate a documentation website. For example: [dbt Learn demonstration project](https://cloud.getdbt.com/accounts/1951/jobs/5297/docs/#!/overview), [our MRR playbook](https://www.getdbt.com/mrr-playbook/#!/model/model.acme.mrr), [GitLab's dbt project](https://dbt.gitlabdata.com/). ] --- ## dbt CLI .left-column[ _Generate_ ```bash dbt docs generate ``` ] .right-column[ _Serve_ ```bash dbt docs serve ``` ] --- ## dbt Cloud - Development .left-column[ _Generate_
] .right-column[ _Serve_
] ??? - No need to run the 'dbt serve' command in dbt cloud in order to view dbt docs. --- ## dbt Cloud - Deployment .left-column[ _Generate_
] .right-column[ _Serve_
] --- # What should I document? .dense-text[ Our rule of thumb: anything that is not immediately obvious. For example: * Sources: - How the source data came to be in the warehouse - Known caveats - The definitions of poorly named fields and tables * Final models: - The grain of the model. Potentially a description of how the model was built. - Column definitions, especially for columns that include business logic * Intermediate models - As required ] ??? - Consider the intended audience for the documentation. Are the docs primarily for your team/department or are they intended for a wider, less technical stakeholder group? --- class: subtitle ## Checkpoint - **Share out:** What are some elements of dbt documentation that you are excited to implement and why? --- class: subtitle # Knowledge check You should be able to: * Add descriptions to your dbt project * Generate documentation for your project --- class: subtitle # Zoom Out
Distributed dbt Learn norms
Who is an analytics engineer?
dbt Fundamentals
Working Group 1
dbt Project Design
Working Group 2
Testing
Sources
Docs
--
Deployment
Working Group 3
---