Unveiling Domain's innovative use of Snowflake for efficient data processing and impactful decision-making from Coalesce 2023
Reuben Francis and Alex Rong discuss Domain's journey from chaotic data management to a more organized mode of operation.
“So, about three years ago, we started a conversation… ‘Is there a better way to do that... to make it more efficient and also make the result more consistent?’”
- Alex Rong, Senior Data Engineer at Domain
Reuben Francis, Data Engineer, and Alex Rong, Senior Data Engineer, discuss the journey of Domain, a real estate data solutions company, from a chaotic state of data management to a more organized and efficient mode of operation. They also share their experiences and lessons from implementing data engineering solutions, particularly dbt.
The power of proper data organization in the real estate industry
Reuben and Alex discuss their journey from "chaos to clarity" in managing and analyzing real estate data. The team recognizes the chaotic nature of the real estate industry, where the landscape is ever-evolving and data comes in varying formats.
"Real estate is an unreal landscape. You're working [with] different qualities of data. You're looking at different formats... each source that we pull our data from has their own understanding of what ‘property’ means," Reuben explains.
Alex stresses the importance of communication in moving toward a more structured and organized approach to data management. "We started a conversation internally saying, ‘Hey, is there a better way to do that, and make it more efficient, and also have the result [be] more consistent?’”
The benefits of using specific software and methodologies to streamline data processing
The data engineers highlight the role of specific tools and methodologies in improving how they process and manage data. They describe adopting tools such as Airflow and dbt, and introducing Continuous Integration (CI) into their workflow.
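The talk does not describe Domain's CI setup in detail. One common way to run dbt in CI is "Slim CI": build and test only the models that changed relative to production, and defer unchanged upstream models to production. The sketch below is a hypothetical illustration, not Domain's configuration. It assembles that command using real dbt flags (`--select state:modified+`, `--defer`, `--state`, `--target`); the artifacts directory and the `ci` target name are assumptions.

```python
import shlex
from typing import List


def slim_ci_command(prod_artifacts_dir: str, target: str = "ci") -> List[str]:
    """Build a dbt command that builds and tests only models changed relative to production.

    Hypothetical helper: prod_artifacts_dir and the "ci" target are illustrative assumptions.
    """
    return [
        "dbt",
        "build",
        # state:modified+ selects changed nodes plus everything downstream of them
        "--select", "state:modified+",
        # resolve references to unchanged upstream models against production
        "--defer",
        # directory holding the production run's manifest.json, used for the comparison
        "--state", prod_artifacts_dir,
        # profile target for the CI environment (assumed name)
        "--target", target,
    ]


if __name__ == "__main__":
    # Print the command a CI job would execute, shell-quoted.
    print(shlex.join(slim_ci_command("prod-artifacts")))
```

Comparing against production state keeps CI runs short as a project grows, because untouched models are neither rebuilt nor retested.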
Reuben explains, "We did have conversations with the team... you don't really have to worry too much about querying your tables, as long as these metrics and things make sense." Alex adds, "We utilize macros to load data from Snowflake to S3. That can be embedded into a dbt pipeline."
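The talk does not show the macro itself. In a dbt project, such a macro would typically be a Jinja macro, called from a post-hook or via `dbt run-operation`, that issues Snowflake's `COPY INTO <location>` statement to unload query results to an external stage backed by S3. The Python sketch below only renders that SQL so its shape is visible. The stage name, path, and table are hypothetical, and the stage is assumed to already exist and point at an S3 bucket.

```python
def build_unload_statement(
    stage: str,
    path: str,
    query: str,
    file_type: str = "PARQUET",
    overwrite: bool = True,
) -> str:
    """Render a Snowflake COPY INTO <location> statement that unloads query results to a stage.

    Hypothetical helper: stage, path and query are illustrative; the named stage is
    assumed to be an external stage already configured to point at S3.
    """
    # Strip stray slashes so the stage path is joined cleanly.
    location = f"@{stage}/{path.strip('/')}/"
    options = [
        f"FILE_FORMAT = (TYPE = {file_type})",
        "HEADER = TRUE",  # keep column names in the unloaded files
        f"OVERWRITE = {'TRUE' if overwrite else 'FALSE'}",  # replace files at the path
    ]
    return f"COPY INTO {location}\nFROM ({query.strip()})\n" + "\n".join(options) + ";"


if __name__ == "__main__":
    print(build_unload_statement(
        "s3_export_stage",                       # hypothetical external stage
        "listings/daily",                        # hypothetical S3 prefix
        "select * from analytics.fct_listings",  # hypothetical dbt model
    ))
```

Keeping the unload logic in one parameterized macro means every pipeline that exports to S3 emits the same statement shape, which fits the consistency goal the team describes.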
The importance of fostering a positive team culture for effective data management
Reuben and Alex emphasize the crucial role of a positive team culture in managing and analyzing data effectively. They share their experiences of working collaboratively at Domain Group and the benefits of maintaining open lines of communication within the team.
"There's so many possibilities...it's really down to what are your inputs, what are your outputs and what do you expect in your outputs, and that's how you design your process," Reuben reflects.
The practical applications of data organization in the real estate industry
Reuben and Alex share how their improved data management processes have led to practical applications in the real estate industry. They explain how their work has enabled them to deliver data products to various internal teams effectively.
"We're powering five internal products from our project," says Alex. He also stresses the value of tagging for managing a large number of data pipelines: "Tagging can be really flexible...that makes us at ease to manage a large number of deliverable pipelines."
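The talk does not list Domain's actual tags, so the sketch below uses a hypothetical project to illustrate the idea. Each model carries tags naming the products it feeds, and a pipeline selects models by tag. The function mimics the semantics of dbt's `tag:` selector: space-separated selectors (`--select tag:a tag:b`) form a union, and comma-separated ones (`--select tag:a,tag:b`) form an intersection. All model and tag names here are assumptions.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical project: each model is tagged with the product(s) or schedule it serves.
MODEL_TAGS: Dict[str, Set[str]] = {
    "stg_listings": {"core"},
    "fct_listings": {"core", "product_search"},
    "fct_suburb_insights": {"product_insights"},
    "fct_agent_leads": {"product_leads", "daily"},
    "fct_price_estimates": {"product_insights", "daily"},
}


def select_by_tags(
    models: Dict[str, Set[str]], tags: Iterable[str], match_all: bool = False
) -> List[str]:
    """Return sorted model names carrying any (union) or all (intersection) of the tags."""
    wanted = set(tags)
    if match_all:
        # Intersection, like --select tag:a,tag:b
        chosen = [name for name, model_tags in models.items() if wanted <= model_tags]
    else:
        # Union, like --select tag:a tag:b
        chosen = [name for name, model_tags in models.items() if wanted & model_tags]
    return sorted(chosen)


if __name__ == "__main__":
    # Everything feeding the insights product
    print(select_by_tags(MODEL_TAGS, ["product_insights"]))
    # Only insights models that also run daily
    print(select_by_tags(MODEL_TAGS, ["product_insights", "daily"], match_all=True))
```

Because a model can carry several tags, one tag per product lets each deliverable pipeline be scheduled independently without duplicating model definitions.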
Reuben and Alex’s key insights
- The real estate data landscape can be chaotic due to the variety of data sources and formats, as well as the dynamic nature of the property market
- Transitioning from chaos to clarity involves various stages: recognizing the chaos, controlling the chaos through preliminary organization, implementing ETL (Extract, Transform, Load) processes, and achieving sustained organization
- Tagging helps the team organize a large number of data pipelines, making data products easier to manage and deliver
- Keeping processes simple and clear is crucial in data engineering. Overcomplicated models can be simplified without losing their functionality
- The culture within a data engineering team plays a significant role in the success of data management processes. Open communication and collaboration are key