Implementing JSON schema for effective product analytics from Coalesce 2023
Greg Clunies, senior analytics engineer, explains how Surfline uses JSON Schema for product analytics in their dbt project.
"We're able to define schemas. We're able to enforce just the conventions around them, and we're able to use these schemas or contracts for validation of events as they flow through our systems."
In this Coalesce 2023 session, Greg Clunies, senior analytics engineer at Surfline, explains how the company uses JSON Schema alongside its dbt project to improve event data quality. He describes the struggles the team has faced with event data and introduces Reflect, an open-source project they have been working on that is used in production at Surfline.
Surfline uses JSON Schema for product analytics and to improve event data quality
Surfline, a global surf forecasting platform, uses JSON Schema to manage its product analytics and improve the quality of its event data.
"We collect events about what these users are doing. It's super important for us to capture this data in a reliable and consistent manner so that we can use it," explains Greg. Surfline gets that reliability and consistency by defining a schema, or contract, for its events.
Greg mentions the struggles Surfline had with event data, such as the lack of defined processes or conventions, issues around awareness and maintenance of event data, and the question of who owns product analytics. He introduces an open-source project called Reflect that centers around the concept of enforceable and extendable event contracts to resolve these issues.
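To make the idea of an event contract concrete, here is a minimal sketch of a contract written in JSON Schema vocabulary and a check that an incoming event satisfies it. The event name, its properties, and the `validate_event` helper are hypothetical and are not taken from Reflect or from Surfline's tracking plan. The checker is hand-rolled and covers only the `required` and `type` keywords; a full JSON Schema validator library would normally do this work.

```python
import json

# Hypothetical event contract (illustrative only, not a real Surfline event).
CONTRACT = json.loads("""
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Session Started",
  "type": "object",
  "properties": {
    "user_id": {"type": "string"},
    "platform": {"type": "string"},
    "session_length_seconds": {"type": "integer"}
  },
  "required": ["user_id", "platform"]
}
""")

# Map a subset of JSON Schema type names to Python types.
_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate_event(event, schema):
    """Return a list of violations; an empty list means the event passes.

    Simplified stand-in for a JSON Schema validator: it only checks
    `required` and the `type` of top-level properties.
    """
    errors = []
    for name in schema.get("required", []):
        if name not in event:
            errors.append(f"missing required property: {name}")
    for name, spec in schema.get("properties", {}).items():
        if name not in event or "type" not in spec:
            continue
        value = event[name]
        expected = _TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it for numeric types.
        bool_as_number = isinstance(value, bool) and spec["type"] in ("integer", "number")
        if bool_as_number or not isinstance(value, expected):
            errors.append(f"{name}: expected {spec['type']}")
    return errors
```

An event that omits `user_id` or sends `session_length_seconds` as a string would come back with one message per violation, which is the kind of signal a contract makes possible before bad data reaches the warehouse.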
Reflect projects improve data quality and speed up workflows
Greg introduces Reflect, an open-source project that aims to improve data quality and speed up workflows. It rests on two pillars: enforceability and extendability. Defining schemas and enforcing conventions around them improves data quality, while automating the staging layer speeds up the workflow.
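As an illustration of what enforcing conventions around schemas could look like, the sketch below lints a contract against two made-up rules: property names must be snake_case, and every property must carry a description. Both rules and the `lint_contract` function are assumptions chosen for the example; the talk summary does not list the conventions Reflect actually checks.

```python
import re

# Hypothetical convention: lowercase words separated by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def lint_contract(schema):
    """Return a list of convention violations for a JSON Schema contract."""
    problems = []
    for name, spec in schema.get("properties", {}).items():
        if not SNAKE_CASE.match(name):
            problems.append(f"{name}: property name is not snake_case")
        # Treat a missing or empty description as a violation.
        if not spec.get("description"):
            problems.append(f"{name}: missing description")
    return problems
```

Running a check like this in continuous integration would let a team reject a contract change before it merges, rather than discovering the inconsistency later in the data.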
"Reflect is what we use in production at Surfline. This is not the only solution. It is the solution that works for us and our stack…it’s not a magic bullet. You still have to plan and collaborate with your team members, but what it has done is give us tooling and a process to follow,” Greg says. He affirms that this helps them bring “teams, and their opinions, and what's important to them together.”
Shared responsibility for product analytics
Greg also discusses the importance of shared responsibility and ownership when it comes to product analytics.
"We were really struggling with who owns product analytics. It's called product analytics, so ‘product’ might be a good place to start, but engineering writes the code for it…when there's a problem with the data, the first people that get contacted are the data team," Greg explains.
He suggests assigning a code owner to each event so that responsibility is explicit. "If something goes wrong with this event, we know who to contact right away on the engineering side," he says. This approach has increased efficiency and improved collaboration among teams.
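One way to record a code owner is to store it as metadata next to the event contract and look it up when something breaks. The `metadata` and `code_owner` field names, the team handle, and the `code_owner_for` helper below are hypothetical; the summary does not say how Surfline or Reflect stores this information.

```python
# Hypothetical registry of contracts keyed by event name.
CONTRACTS = {
    "Session Started": {
        "type": "object",
        "properties": {"user_id": {"type": "string"}},
        # Assumed metadata block naming the engineering owner of the event.
        "metadata": {"code_owner": "@example-org/web-team"},
    },
    "Checkout Completed": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
    },
}


def code_owner_for(event_name, contracts):
    """Return the code owner recorded for an event, or None if unknown."""
    contract = contracts.get(event_name)
    if contract is None:
        return None
    return contract.get("metadata", {}).get("code_owner")
```

A lookup that returns `None` is itself useful: it flags events that nobody has claimed yet.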
Greg's key insights
- Surfline uses JSON Schema to set the stage for product analytics and help with event data quality
- Reflect, an open-source project, has been developed to enforce and extend event contracts and improve data quality
- A Reflect project looks like a dbt project: it includes a schemas folder, which holds the event contracts, and an artifacts folder, which holds dbt artifacts
- Reflect is used to define schemas, enforce the conventions around them, and validate events as they flow through the system
- Reflect can automate the creation of sources, staging models, and documentation, saving significant time for the analytics team
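The last point, generating staging models from contracts, can be sketched as a small template function. This is an illustrative assumption about how such automation might work, not Reflect's implementation: the `staging_model_sql` function, the source name, and the table name are hypothetical. The only dbt feature relied on is the `source()` Jinja function.

```python
def staging_model_sql(source_name, table_name, schema):
    """Render a dbt staging model that selects the contract's properties.

    Columns follow the order of the contract's `properties`; if the
    contract declares none, the model falls back to `select *`.
    """
    columns = list(schema.get("properties", {}))
    if columns:
        select_list = ",\n".join(f"        {col}" for col in columns)
    else:
        select_list = "        *"
    return (
        "with source as (\n"
        f"    select * from {{{{ source('{source_name}', '{table_name}') }}}}\n"
        "),\n\n"
        "renamed as (\n"
        "    select\n"
        f"{select_list}\n"
        "    from source\n"
        ")\n\n"
        "select * from renamed\n"
    )


if __name__ == "__main__":
    contract = {"properties": {"user_id": {"type": "string"}, "platform": {"type": "string"}}}
    print(staging_model_sql("segment", "session_started", contract))
```

Because the column list comes straight from the contract, a change to the contract can regenerate the staging model instead of requiring a hand edit, which is where the time savings for the analytics team would come from.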