class: title, center, middle # Docs --- class: subtitle, middle, center ## _How is knowledge documented and maintained at your company?_ --- # Why is documentation important? - Empowers the writer to refer back to their thinking - Empowers the reader to find their own answers - Increases velocity of onboarding - Reduces for ad hoc, repeated communication ??? test notes ---
Docs | Focus
Explain how to add descriptions to models and sources
Configure doc blocks
Generate and view documentation in dbt Cloud
Build intuition for the value of documenting your dbt project
--- # Our strong opinions! .denser-text[ * Code is _surface area_. Testing and documentation are _coverage_. * Untested, undocumented code _cannot be trusted_. In some sense, it is worse than having no code at all. * Reliable documentation is _key_ to scaling a data team.ยน * The generation of data documentation should be automated. * Data documentation should include both: * Inferred information from the codebase/database * Manual inputs ("this field means that") * The process to update documentation _must_ be part of the same workflow as the transformations it seeks to describe. If the two are separate, documentation will _always_ lag behind. ] ??? - Slide setup: From dbt Viewpoint: "Bad data can lead to bad analyses, and bad analyses can lead to bad decisions. Any code that generates data or analysis should be reviewed and tested." --- class: subtitle, middle, center ## dbt docs ??? not docs.getdbt.com, though those are also great --- ### Adding descriptions .dense-text[ Descriptions can be added to models, sources, and columns (in your existing `.yml` files) ] .left-column[ .denser-text[ ```yml version: 2 models: - name: stg_jaffle_shop__orders description: One record per order columns: - name: order_id tests: - unique - not_null - name: status description: "{{ doc('order_status') }}" tests: - accepted_values: values: ['placed', 'shipped', 'completed', 'return_pending', 'returned'] - name: amount description: Amount in USD ``` ] ] .right-column[ .denser-text[ ```yml version: 2 sources: - name: jaffle_shop description: A replica of the postgres database database: raw tables: - name: orders description: One record per order columns: - name: id tests: - unique - not_null - name: status description: "{{ doc('order_status') }}" tests: - accepted_values: values: ['placed', 'shipped', 'completed', 'return_pending', 'returned'] ``` ] ] ??? Demo in dbt cloud. -- .dense-text[ What's the `{{ doc('order_status') }}`? ] --- ### `docs` blocks `docs` blocks allow you to write long-form descriptions. Docs blocks are added to `.md` files in your `models/` directory. .denser-text[ ```md {% docs order_status %} | status | definition | |----------------|------------------------------------------------------------| | placed | Order placed but not yet shipped | | shipped | Order has been shipped but hasn't yet been delivered | | completed | Order has been received by customers | | return_pending | Customer has indicated they would like to return this item | | returned | Item has been returned | {% enddocs %} ``` ] ??? - Docs blocks are great for special formatting but also allow for more modular documentation. A given .md file can be referenced and used in multiple descriptions without having to copy and paste any text. --
demo
??? Demo Time: Regular docs: - "Let's see that in action!" - add regular model and/or column descriptions to your yml file - run `dbt docs generate` - show where those end up in the docs Doc blocks: - add a doc block file `order_status.md` - paste in the above markdown code ensuring to add newline white spaces - add description: `"{{ doc('order_status') }}"` - run `dbt docs generate` - within docs, find the doc block and talk about it - talk through all the other data shown (metadata, columns, types) - talk through DAG / lineage graph and model selection syntax --- # Generating documentation .denser-text[ When dbt generates documentation it takes into account: * Everything it knows about your project: * Descriptions (ofc!) * Model dependencies * Model SQL * Sources * Tests * Additional things about your warehouse (from the information schema) * Column names and types (even if they aren't documented!) * Table stats This information is used to populate a documentation website. For example: [dbt Learn demonstration project](https://cloud.getdbt.com/accounts/1951/jobs/5297/docs/#!/overview), [our MRR playbook](https://www.getdbt.com/mrr-playbook/#!/model/model.acme.mrr), [GitLab's dbt project](https://dbt.gitlabdata.com/). ] --- ## How do I generate docs? * Generating documentation in dbt is as simple as running a command. * Run `dbt docs generate` * Docs page will be created and saved in the `view docs` link in the IDE --- ## dbt Cloud - Development .left-column[
] .right-column[
] ??? - When using the CLI, `dbt docs serve` is the command to generate documentation - No need to run the 'dbt serve' command in dbt cloud in order to view dbt docs. --- ## dbt Cloud - Deployment .left-column[
] .right-column[
] --- # What should I document? .dense-text[ Our rule of thumb: anything that is not immediately obvious. For example: * Sources: - How the source data came to be in the warehouse - Known caveats - The definitions of poorly named fields and tables * Final models: - The grain of the model. Potentially a description of how the model was built. - Column definitions, especially for columns that include business logic * Intermediate models - As required ] ??? - Consider the intended audience for the documentation. Are the docs primarily for your team/department or are they intended for a wider, less technical stakeholder group? --- class: subtitle ## Checkpoint - **Share out:** What are some elements of dbt documentation that you are excited to implement and why? --- class: subtitle # Knowledge check You should be able to: * Add descriptions to your dbt project * Generate documentation for your project --- # Resources * [dbt docs generate](https://docs.getdbt.com/reference/commands/cmd-docs) * [Description Property](https://docs.getdbt.com/reference/resource-properties/description) * [Docs Blocks](https://docs.getdbt.com/docs/collaborate/documentation#using-docs-blocks) * [Markdown Guide - GitHub](https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax) * [Build & View docs with Cloud](https://docs.getdbt.com/docs/collaborate/build-and-view-your-docs) * [Custom Documentation Overview](https://docs.getdbt.com/docs/collaborate/documentation#setting-a-custom-overview) --- class: subtitle # Zoom Out
Distributed dbt Learn norms
Who is an analytics engineer?
dbt Fundamentals
Working Group 1
dbt Project Design
Working Group 2
Testing
Sources
Docs
--
Deployment
Working Group 3
---