
Data leadership in the age of AI - Part 2

Aug 28, 2024

Insights

Recently, I sat down for a chat with Andrew Foster, Chief Data Officer at M&T Bank, and Karthik Ravindran, who leads Data Governance and Enterprise Data at Microsoft. In the first part of our talk, we focused on the importance of a human-centered approach to AI. In the second part, we dove deep into questions such as how to handle multi-modal data, challenges with AI, and how to leap the hurdles that prevent us from getting AI apps into production.

Handling multi-modal data

Q: We’ve focused on structured data quality for 20 or 30 years. And now GenAI shows up and suddenly there’s a renewed focus on unstructured data. How are your organizations dealing with the new multi-modal data model that AI is bringing to the fore?

A: There are many businesses in the world powered by unstructured data. Think of banks and the paperwork that goes into securing a mortgage. Many other businesses run almost solely off of structured data. And then you have semi-structured data — e.g., a table or a form in the middle of a PDF.

Being a SQL person myself, I tend to think of the world in structured data sets. My implicit desire with unstructured data is to structure it—find features you want to extract from emails or PDFs and put them into a well-understood schema with defined constraints. Once that’s done, you can query, aggregate, represent, and operationalize that data.

This process will likely involve GenAI, but it'll also require the more deterministic, classic machine learning tooling we’ve developed, which still has a ton of utility.
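
To make that pattern concrete, here’s a minimal Python sketch: an extraction step (GenAI or classic ML, stubbed out here) pulls features out of a document, and a deterministic schema with constraints validates the result before it lands in a table. The document type, field names, and constraints are hypothetical, not anything from the conversation.

```python
import json
from dataclasses import dataclass

# Hypothetical extractor: in practice this would call an LLM or a
# classic ML/OCR pipeline to pull fields out of a PDF or email.
def extract_fields(document_text: str) -> str:
    """Return a JSON string with the features we care about."""
    raise NotImplementedError("plug in your GenAI or ML extraction step here")

@dataclass(frozen=True)
class MortgageApplication:
    applicant_name: str
    loan_amount: float
    term_months: int

    def __post_init__(self):
        # Deterministic constraints -- the "well-understood schema" part.
        if self.loan_amount <= 0:
            raise ValueError("loan_amount must be positive")
        if self.term_months not in (180, 240, 360):
            raise ValueError("unexpected loan term")

def to_row(document_text: str) -> MortgageApplication:
    # Fails loudly if the probabilistic extraction step produced something off.
    raw = json.loads(extract_fields(document_text))
    return MortgageApplication(**raw)
```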

Andrew Foster: There are two words here I think about. One is scale. The sheer volume and linear growth of data mean you can’t just scale up with people.

The other is tiering. There’s risk tiering—the risks of the model you’re using, like Karthik discussed, and how you’re mitigating those risks. But there’s also tiering of the data itself. Do you have an intended use for it? What’s your quality bar for it? Are you keeping a human in the loop or fully automating your decision processes?

You can use these two criteria—scaling and tiering—to look at information sets across an organization and size them up based on predicted use cases. That gives you a more flexible approach to processing data than a “one size fits all” rule, which will never work and won’t scale.
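
As an illustration of the tiering idea (not something from the conversation), a simple rubric might capture intended use, risk tier, quality bar, and whether a human stays in the loop, then route each dataset accordingly. The dataset names and thresholds below are made up.

```python
from dataclasses import dataclass
from enum import Enum

class RiskTier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

@dataclass
class DatasetPolicy:
    name: str
    intended_use: str
    risk: RiskTier
    quality_bar: float   # e.g. minimum acceptable completeness score
    human_in_loop: bool  # fully automated decisions or not?

POLICIES = [
    DatasetPolicy("marketing_emails", "campaign analytics", RiskTier.LOW, 0.90, False),
    DatasetPolicy("mortgage_documents", "credit decisions", RiskTier.HIGH, 0.99, True),
]

def processing_route(policy: DatasetPolicy) -> str:
    # Route datasets differently instead of applying one rule to everything.
    if policy.risk is RiskTier.HIGH or policy.human_in_loop:
        return "reviewed pipeline with human sign-off"
    return "fully automated pipeline"
```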

AI challenges (and successes)

Q: Can you share an AI challenge with us? Any success stories?

A: A couple of months ago, at Snowflake Summit, we unveiled Ask dbt. It’s a chatbot that plugs into your data warehouse so you can ask questions about your data using natural language.

The problem we ran into was that business metrics are precisely defined, but people often don’t know the exact wording that will get them the answer they care about.

The solution came out of another project we’ve been working on for the past three years or so: the dbt Semantic Layer. With a Semantic Layer, we can define standardized metrics and tag them with business-centric metadata.

We add things such as, what is the standard definition of the metric? What are the descriptions of its dimensions? With this information, we can create the opportunity for an LLM interface into business metrics and their underlying data.
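
To illustrate the idea, here’s a rough Python sketch of the kind of metadata involved and how it could ground an LLM’s answer. (Actual dbt Semantic Layer metrics are defined in YAML in a dbt project; the metric, definition, and dimensions below are hypothetical.)

```python
from dataclasses import dataclass, field

@dataclass
class Metric:
    name: str
    definition: str  # the standard business definition of the metric
    dimensions: dict[str, str] = field(default_factory=dict)  # dimension -> description

REVENUE = Metric(
    name="monthly_recurring_revenue",
    definition="Sum of active subscription fees, normalized to a monthly amount.",
    dimensions={
        "account_owner": "Salesforce owner of the customer account",
        "region": "Billing region of the customer",
    },
)

def llm_context(metric: Metric) -> str:
    # Assemble metric metadata into grounding text for a natural-language query.
    dims = "\n".join(f"- {d}: {desc}" for d, desc in metric.dimensions.items())
    return f"Metric: {metric.name}\nDefinition: {metric.definition}\nDimensions:\n{dims}"
```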

This is really compelling. It lets users intuitively answer questions like, “Who’s the owner of the Salesforce account for Acme Company?” We think this is the future of chat, where your data experiences flow through something like a Semantic Layer.

Karthik Ravindran: As physical data estates explode, the shape of the data continues to evolve. One really powerful technique is focusing on what Gartner calls Active Metadata.

If we can extract technical, business, and functional metadata, regardless of the modality of our physical data estates, we can elevate data management and governance to this metadata layer. We can then set policies for data management and scale our data governance across the underlying physical data estates.
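
As a rough sketch of what governing at the metadata layer can look like (the names and fields here are illustrative, not Microsoft’s implementation), the example below keeps technical, business, and functional metadata alongside each asset, whatever its modality, and applies one policy check uniformly.

```python
from dataclasses import dataclass

@dataclass
class AssetMetadata:
    asset_id: str
    modality: str   # "table", "pdf", "image", ... -- the policy doesn't care
    technical: dict # e.g. schema, lineage, freshness, age
    business: dict  # e.g. owner, domain, glossary terms
    functional: dict  # e.g. intended use, sensitivity classification

def enforce_retention(asset: AssetMetadata, max_days_by_sensitivity: dict[str, int]) -> bool:
    """One governance rule, applied at the metadata layer across all modalities."""
    sensitivity = asset.functional.get("sensitivity", "internal")
    age_days = asset.technical.get("age_days", 0)
    return age_days <= max_days_by_sensitivity.get(sensitivity, 365)
```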

Getting there was more of a human problem to solve than a technical problem, though. We got good at collecting technical metadata. And we’d generate graphs that were nice to stare at. But we didn’t have a way to connect those to business outcomes.

So what we’ve developed is a split between my team, which provides the systems and solutions, and our stakeholders who take accountability for the logical data estate and business metadata. We can’t do that for them—we don’t understand the details of every single data domain. We need stakeholders to jump in and bring that expertise.

The outcome has been profound, because now there’s a sharp sense of accountability among the domain teams and experts.

Skills and talent needed for AI

Q: What skills and talent are most critical for data teams working with AI? How are you addressing talent gaps?

A: We’re currently seeing software engineers digging in and building these LLM-based systems. I hope this doesn’t come across as offensive or sound like a secret, but not many software engineers are very good at data. Some are actually shockingly bad at data.

Software engineers will need to learn a lot more about how data works. By that I mean, not just the mechanics of where it lives and how to access it. I also mean how to think about data.

I think creativity is an important skill and orientation for people to have when dealing with data. Most of the outsized benefits we’ve discussed today will accrue to those who find a shorter path to existing business outcomes.

Slapping an LLM onto an existing workflow may get you budget for the next quarter. But it won’t result in significant efficiency gains.

Andrew: One thing I generally look for in my team is high emotional intelligence. My team sits in the middle of a large organization with a lot of business engagement. We have to work with traditional teams of modelers—which is highly technical work in nature—as well as with various technology and business partners. And we have to find a way to bring everyone’s expertise into this ever-increasing and important space.

No one operates in a silo. When you hire, it’s less about the individual and more about how you’re structuring your teams. You have to find a good balance between hiring generalists and hiring specialists.

Karthik: I think adaptive leadership and change management are going to be crucial. And not just from executives and senior leadership—it’s something we need from every team member who interacts with data.

Andrew made great points earlier about data literacy—that’s so critical.

Tiankai Feng published a great book called Humanizing Data Strategy. It shows how the crux of our AI data transformation journeys is very human-centered. Tech isn’t the solution—it’s the enabler. Change management, adaptive leadership, the emotional intelligence that Andrew spoke of—those are super important things to get right.

Engineers and data scientists can’t just look at a problem and say, that’s not my problem. Heck no—it is your problem. You can’t just assume you can build something, toss it over the fence, and it’ll work like magic. You need to get in the weeds and lead with the outcome, with the business benefit.

Conclusion: Key takeaways

Q: What are two or three takeaways people should take from this discussion?

A: Like I said at the beginning, five percent of AI projects are in production today. There are a lot of interesting things we can do better and at scale.

Step one is getting started. For example, with data quality governance, it’s important not to do anything outside of the risk profile of your organization. But also, don’t be afraid to start somewhere.

Karthik: Pay attention to keeping humans in the loop. Despite all of the innovation coming out of AI, humans in the loop and “Humans + AI” is still the name of the game. It’s not “humans minus AI.” I encourage people to think about how to institutionalize this, both as a strategy and in practice.

Andrew: Challenge your assumptions. AI isn’t incredibly different from what preceded it. But aspects of it are different enough that they could impact every part of how your organization is run.

Look at the components that interact with AI. Because some of them will need to change—and you need to recognize that early.

Last modified on: Oct 16, 2024
