The need for a new approach to the semantic layer from Coalesce 2023
Artyom Keydunov, co-founder & CEO of Cube Dev, discusses the future of the semantic layer, uniting BI tools, embedded analytics, and AI agents.
"The promise of the semantic layer, if we think about it, is to deliver metrics to all these downstream tools."
This new approach brings together data, enabling data tools to introspect data model definitions and seamlessly interoperate within the data stack.
The value of a semantic layer in data engineering
Artyom emphasizes the importance of the semantic layer in data engineering, defining it as the application of the ‘don't repeat yourself’ principle to data consumption architecture. A semantic layer sits between one or more data warehouses and the various data consumption tools, acting as a denormalization engine that builds and delivers metrics.
Artyom identifies four pillars of a semantic layer: data modeling, access control, caching, and APIs. The layer essentially takes definitions from data visualization tools and uses them across all data consumption tools.
"Let's take all of these definitions from these visualization tools and put them into semantic layer, and then use them across all of these different data consumption tools," he explains. This, in turn, would reduce repetition and streamline the overall data consumption process.
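The "define once, use everywhere" idea can be illustrated with a minimal sketch. It covers only the data modeling pillar (access control, caching, and APIs are left out), and every name in it (`Measure`, `MEASURES`, `compile_query`, the `orders` table) is hypothetical and not taken from the talk or from any particular product.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Measure:
    """A metric definition that lives in the semantic layer, not in a BI tool."""
    name: str
    sql: str  # aggregate expression, written exactly once


# Hypothetical shared definitions (assumed for illustration only).
MEASURES = {
    "total_revenue": Measure("total_revenue", "SUM(amount)"),
    "order_count": Measure("order_count", "COUNT(*)"),
}


def compile_query(measure_name: str, table: str, group_by: Sequence[str] = ()) -> str:
    """Build SQL for any consumer from the single shared definition."""
    measure = MEASURES[measure_name]
    select_list = ", ".join([*group_by, f"{measure.sql} AS {measure.name}"])
    query = f"SELECT {select_list} FROM {table}"
    if group_by:
        query += " GROUP BY " + ", ".join(group_by)
    return query


# Two different consumers reuse the same definition of total_revenue.
dashboard_sql = compile_query("total_revenue", "orders", ["region"])
notebook_sql = compile_query("total_revenue", "orders")
print(dashboard_sql)
print(notebook_sql)
```

If the definition of `total_revenue` changes, it changes in one place, and both consumers pick up the new expression.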
The need for a standard in semantic layers
Artyom highlights the need for a standard in semantic layers: every time a new tool is deployed, metrics are defined again inside that tool, resulting in repeated work. A standard would help align visualization tools with standalone semantic layers, so that users get an experience similar to the one offered by semantic layers integrated into those tools.
"We need to support the ever-expanding set of different technologies," he says. However, he acknowledges that the fragmented BI market makes this difficult.
"The problem is that the BI market is very fragmented, and there are so many different BI tools…it would be almost impossible to build one-to-one connectors to all of them and maintain them well," he concedes.
Different approaches to semantic layers
Artyom also covers two approaches to describing semantic layers. One is a “metrics first” approach, where metrics are defined as first-class objects and exposed as a list. The other is a “dataset first” approach, where the semantic layer is exposed as a set of tables that can be searched for specific metrics.
"In metric first, its metric is a first-class object, right?… that usually has a time dimension and a set of dimensions connected to that metric," he explains. However, he points out that this approach might be harder for BI tools to generate.
"In the dataset-first approach… semantic layer would look like a set of tables and data consumers would use that set–that list–to start working with metrics," he elaborates. This approach aligns better with how most BI tools work but could make it more difficult for people within organizations to understand.
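The difference between the two approaches can be sketched as two ways of describing the same metrics. This is an illustrative model only: the `Metric` and `Dataset` classes, their fields, and the sample names are assumptions, not a schema from the talk.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Metric:
    """Metrics first: the metric itself is the first-class object."""
    name: str
    time_dimension: str
    dimensions: List[str]


@dataclass
class Dataset:
    """Dataset first: the consumer sees a table with measures and dimensions."""
    name: str
    measures: List[str]
    dimensions: List[str]


# Metrics first: consumers browse a flat list of metrics.
metrics_first = [
    Metric("revenue", "order_date", ["region", "channel"]),
    Metric("order_count", "order_date", ["region"]),
]

# Dataset first: consumers start from a list of tables.
dataset_first = [
    Dataset(
        "orders",
        measures=["revenue", "order_count"],
        dimensions=["order_date", "region", "channel"],
    ),
]


def list_metrics(metrics: List[Metric]) -> List[str]:
    """Metrics-first discovery: read the metric names straight off the list."""
    return sorted(m.name for m in metrics)


def find_metric(datasets: List[Dataset], metric_name: str) -> List[str]:
    """Dataset-first discovery: search the tables for the one holding the metric."""
    return [d.name for d in datasets if metric_name in d.measures]


print(list_metrics(metrics_first))
print(find_metric(dataset_first, "revenue"))
```

The dataset-first shape resembles the tables and columns that most BI tools already expect, while the metrics-first shape puts the business concept up front.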
Using SQL in querying protocols
Artyom stresses the significance of SQL as a querying protocol, since most data tools already use SQL to query data from cloud data warehouses. However, he points out the complexities involved in querying metrics with SQL.
"SQL feels like an obvious choice," he says. He cautions, though, that "SQL is not designed really to query metrics."
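The summary does not say which difficulties Artyom had in mind. One commonly cited example, offered here as an assumption rather than as his argument, is that SQL aggregates do not automatically compose: re-aggregating an already aggregated metric can silently give a different number. The sketch below uses Python's built-in `sqlite3` module and a made-up `orders` table.

```python
import sqlite3

# In-memory database with a tiny, made-up orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("A", 10.0), ("A", 20.0), ("A", 30.0), ("B", 100.0)],
)

# The metric as intended: average order amount over all orders.
true_avg = conn.execute("SELECT AVG(amount) FROM orders").fetchone()[0]

# A tool that queries a per-region result and aggregates it again
# computes an average of averages, which weights each region equally.
avg_of_avgs = conn.execute(
    """
    SELECT AVG(region_avg)
    FROM (
        SELECT region, AVG(amount) AS region_avg
        FROM orders
        GROUP BY region
    ) AS per_region
    """
).fetchone()[0]

print(true_avg)      # 160 / 4
print(avg_of_avgs)   # (20 + 100) / 2
conn.close()
```

Both statements are valid SQL, yet only the first matches the metric's definition. Plain SQL has no notion of a metric that would tell the second query how the value should be re-aggregated.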
The need for a standard metadata API
Artyom highlights the necessity for a standard metadata API, which would enable BI tools and others to build consumers that can retrieve all the necessary data and map it to native objects, providing an integrated experience.
"We need this metadata standard API, so it can be standardized across all the different tools," he states. This would enable the delivery of consistent metrics to all data consumption tools.
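What a consumer of such a metadata API might do can be sketched as follows. No standard is specified in the talk summary, so the JSON shape, the field names, and the `NativeField` class are all hypothetical. The point is only the flow: fetch a description of the model, then map it to the tool's own objects.

```python
import json
from dataclasses import dataclass
from typing import List

# Hypothetical metadata payload; a real standard would define this shape.
METADATA_RESPONSE = """
{
  "datasets": [
    {
      "name": "orders",
      "measures": [{"name": "revenue", "type": "sum"}],
      "dimensions": [
        {"name": "region", "type": "string"},
        {"name": "order_date", "type": "time"}
      ]
    }
  ]
}
"""


@dataclass
class NativeField:
    """Stand-in for a BI tool's own field object (assumed for illustration)."""
    path: str
    role: str       # "measure" or "dimension"
    data_type: str


def to_native_fields(payload: str) -> List[NativeField]:
    """Map the semantic layer's metadata to the consuming tool's native objects."""
    doc = json.loads(payload)
    fields: List[NativeField] = []
    for dataset in doc["datasets"]:
        for measure in dataset["measures"]:
            fields.append(
                NativeField(f'{dataset["name"]}.{measure["name"]}', "measure", measure["type"])
            )
        for dimension in dataset["dimensions"]:
            fields.append(
                NativeField(f'{dataset["name"]}.{dimension["name"]}', "dimension", dimension["type"])
            )
    return fields


fields = to_native_fields(METADATA_RESPONSE)
for f in fields:
    print(f.path, f.role, f.data_type)
```

With an agreed payload shape, each tool would need one such mapping rather than a separate connector for every semantic layer.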
Artyom’s key insights
- The semantic layer should act as a denormalization engine, delivering metrics to the data consumption layer
- Artyom emphasizes the importance of a standard for the semantic layer
- Two approaches to the semantic layer are “metrics first” and “dataset first”. Each approach has its pros and cons
- Artyom also discussed querying protocols, stating that SQL seems to be the obvious choice due to its widespread use. However, there are challenges in querying metrics with SQL