Data lessons learned from benchmarking the IDE’s performance

Discover more about author and dbt enthusiast Andrew Escay.

Discover more about author and dbt enthusiast Kira Furuichi.

This post is part of the “How a bill becomes a law (data edition)” series where you follow the Data Team at dbt Labs as they take a data project from a request ticket to data ready for production.

When the dbt Cloud IDE launched in 2020, it opened the doors for many data folks to collaborate on dbt projects while relieving some of the stresses of the command line, local environment setup, and git actions. As the number of users and demand on the IDE has grown in the past two years, so have the pains users have felt in some of their interactions with the IDE.

Nate Nunta, the product manager for developer experience at dbt Labs, recently wrote up an awesome blog post about some of the work happening to better the IDE experience for the short-term (improving startup and file retrieval times) and the long-term (adding new functionality to the IDE). With that work underway, Nate and the engineering team wanted to ensure that we had solid benchmarks for the current IDE performance to compare against future versions.

So where does the data team fit in this work? How do we make sure that our product analytics keep up with the re-architecture of the product itself?

As an analytics engineer embedded on the product team, I was responsible for doing the initial data modeling to create benchmark numbers for the IDE’s performance. The more we dug into the data, the more we learned about the complications, edge cases, and areas of improvement.

This blog is partly about the kind of work we typically associate with data teams (the investigative and creative stuff). Beyond that, this piece is about how direct collaboration with engineers on event tracking is the best axe to wield in product analytics.

Below, we’ll unpack what we thought we knew about the IDE performance, why modeling this data was not as straightforward as we expected, and how we collaborated to create better and more accurate performance tracking for the IDE.

Phase 1: What we thought the issue was

One of the biggest priorities for the IDE team is improving the time it takes from when a user hits Develop to loading a workable IDE.

Some context on what’s happening behind the scenes while the spinner is running:

  1. We provision a dedicated server to power your environment.
  2. We start a dbt-rpc server to actually run your dbt commands.
  3. We retrieve all of your dbt project’s files and their contents.

Once all three steps complete, you can finally start development work.

For a while, we believed the rpc server startup was the only step causing significant delay in the IDE startup time. This led us to model the data in a way that showed IDE startup times in the 15-30 second range for the majority of users. But as more features were added to the IDE, such as the in-IDE DAG, more complex work was happening behind the scenes, and the previous startup numbers told only part of the story.
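For illustration, that early modeling amounted to something like the query below. The column names here (`rpc_server_start_ts`, `rpc_server_ready_ts`) are hypothetical stand-ins for whatever rpc startup events we tracked at the time, not our actual schema:

```sql
-- Hypothetical sketch: the early "startup time" only measured the
-- rpc server spin-up, ignoring server provisioning and file retrieval.
select
    session_id,
    datediff('seconds', rpc_server_start_ts, rpc_server_ready_ts)
        as ide_startup_time_in_s
from {{ ref('fct_ide_sessions') }}
```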

Phase 2: Finding the culprit

In Q1 of this year, an engineer on the IDE team identified two other considerable components impacting the IDE startup time that we had not originally been looking at:

  1. A “health check”: The time it takes to get back “signs of life” from the dedicated server that is created as part of step 1 in the context shared above.
  2. File retrieval time: The time it takes to retrieve your dbt project’s files while the rpc server is spinning up.

From this, we learned that the real IDE startup time was a combination of what we already knew (the rpc server startup) and what we didn’t know (the health check and file retrieval time). Once we identified these additional factors, our engineering team implemented new events to hopefully capture more accurate startup times:

  • ide lifecycle start: A frontend event that would fire soon after a user hit the Develop button
  • ide lifecycle complete: A frontend event that would fire once the IDE was ready for use
  • first successful file request: A backend logged timestamp that would mark when a dbt project’s files were successfully retrieved, ultimately indicating the IDE is ready to go

In this world, the two frontend events (ide lifecycle start and ide lifecycle complete) would fire reliably, and we could take the difference between their timestamps to find the IDE startup time for a given session.



```sql
select
    datediff('seconds', ide_lifecycle_start_ts, ide_lifecycle_complete_ts)
        as ide_startup_time_in_s
from {{ ref('fct_ide_sessions') }}
```

But let’s humble ourselves a bit and remember: frontend events are rarely a data team’s friend. Ad blockers, VPNs, page refreshes, and plain luck of the draw can all prevent frontend events from firing or from passing the correct data down. This is only part of the reason that product analytics is hard. We chose to use frontend events here because backend events can’t capture the full user experience as it’s felt in the browser.

We consequently uncovered that one of the biggest issues with this setup was the ide lifecycle complete event not firing consistently. As a result, we had to use backend events to find the best approximate “completion” moment whenever ide lifecycle complete didn’t fire.
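One simple way to express that fallback, assuming session-level timestamp columns with these hypothetical names, is to prefer the frontend event and fall back to the backend timestamp:

```sql
-- Illustrative only: column names are assumed.
-- Prefer the frontend completion event; fall back to the backend
-- file-request timestamp when the frontend event never fired.
select
    session_id,
    coalesce(ide_lifecycle_complete_ts, first_successful_file_request_ts)
        as startup_completion_ts
from {{ ref('fct_ide_sessions') }}
```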

Two separate IDE sessions demonstrating the perfectly imperfect nature of frontend events


```sql
with active_session_summary as (

    select
        session_id,
        min(case when startup_event in ('page_view', 'ide_lifecycle_start')
            then event_at end) as startup_time,
        max(case when startup_event in ('ide_lifecycle_complete',
            'first_successful_file_request_at') then event_at end) as startup_completion_time,
        datediff('s', startup_time, startup_completion_time) as startup_time_in_s
    from {{ ref('fct_ide_sessions') }}
    group by 1

)

select * from active_session_summary
```

Definitely not ideal, but we could work with this. Using these new client-side events in combination with page view data and some additional backend events, the data was showing that IDE startup times were taking longer than the engineering team expected. We knew from our existing data, our own experiences with using the IDE, and user anecdotes that these startup times were likely not telling the full story.

From here, we also unpacked several edge cases, as well as a major factor behind whether a startup felt fast or slow: hot and cold starts.

  • When you first start up the IDE, there are three key things happening behind the scenes: a server starting up, the retrieval of your dbt project’s files, and an rpc server firing up. When all three actions happen from scratch, this qualifies as a cold start and can cause longer IDE startup times.
  • A hot start occurs when your dbt project’s files (and their content) have already been retrieved by an earlier cold start (usually in the same day), such as when you refresh the browser during an IDE session. During a hot start, not all three processes are kicked off from the beginning again, resulting in shorter IDE startup times.

It was important that the team was able to segment IDE sessions into hot and cold starts, so they could better understand which sessions were affected by longer startup times.

We had no clear way of identifying whether an IDE session belonged to a hot or cold start. So we created new fields in our dbt models to separate out hot and cold start IDE sessions:


```sql
select
    lead(startup_event) over (
        partition by page_view_id
        order by event_timestamp
    ) as lead_event,
    case
        when startup_event = 'page_view'
            and lead_event = 'start'
            then 'cold_start'
        when startup_event = 'restart_ide'
            and lead_event = 'start'
            then 'cold_start'
        else 'hot_start'
    end as startup_type

from {{ ref('fct_events') }}
```

We additionally identified some weird edge cases around hot and cold starts that required a little more complex work—because of, again, the finicky nature of frontend events. Given this pretty hacky data setup, we felt about 70% confident in the numbers we were reporting on for the IDE’s performance. We knew that if we wanted to get 90+% confidence in these numbers, we would either have to do more data gymnastics to acknowledge all edge cases or apply better tracking.

Phase 3: Better events 🤝 happier data people

Don’t get me wrong, data work is often a little wonky and solutions are not always what we initially expect, but we knew that we could have more accurate data for a core feature of our product with better event tracking in place.

So when Nate told us about the IDE revamp, we had two options:

  1. Work with the same events and data as-is, or
  2. Work with our engineers to implement newer and better tracking for IDE startup times

The first option is something a lot of data folks are often tasked with: answering 100% of the questions with 80% of the data. In this case, the engineer responsible for adding tracking to the new IDE reached out to me for tracking advice. With the right events in place, I knew we could get 100% of the data we needed.

Working with the product team, our brilliant engineers added additional, more reliable frontend events and fields:

  • A field to denote if an IDE session was a hot or cold start
  • A lifecycle step event with four possible statuses (prework, start, ready, or heartbeat), each of which contains its time duration
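Because each lifecycle step event carries its own duration, we could also break a session’s startup time down by step. A minimal sketch, assuming hypothetical `lifecycle_step` and `duration_in_ms` columns and model name:

```sql
-- Hypothetical sketch: pivot per-step durations into one row per session.
select
    session_id,
    sum(case when lifecycle_step = 'prework' then duration_in_ms end) as prework_ms,
    sum(case when lifecycle_step = 'start' then duration_in_ms end) as start_ms,
    sum(case when lifecycle_step = 'ready' then duration_in_ms end) as ready_ms
from {{ ref('fct_ide_lifecycle_events') }}
group by session_id
```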

Hold up—did we just say more frontend events after we just talked about how potentially unreliable they are? Yes, you read that correctly. Two big points to note here:

  1. Many of the events that happen in the IDE take place in-browser. As a result, there’s no way to fire these events from the server-side because the server isn’t always aware of what’s actually going on and there may be delays between when the server kicks off an action and when a user truly experiences that step. By default, we sort of had to use frontend events here to capture the full user experience.
  2. The initial frontend events we implemented were essentially reactive rather than proactive. These new events were designed together with engineering to ensure they were well-tracked, or at the very least better tracked, given what we had learned from our first events.

All this to say, this is a case where frontend events can be incredibly useful for product analytics; you just have to make sure the correct ones are instrumented.

But what did this mean for capturing IDE startup times? This meant that we could simply take the difference between the minimum lifecycle step (prework) and the maximum lifecycle step (ready) to understand the IDE startup time. The addition of the hot and cold start flag could also help us further segment startup times by hot or cold experiences without having to “guess” what might be a hot or cold session.


```sql
select
    session_id,
    startup_type, -- hot/cold flag
    min(lifecycle_at) as startup_time,
    max(lifecycle_at) as startup_completion_time,
    datediff('s', startup_time, startup_completion_time) as startup_time_in_s
from {{ ref('fct_ide_sessions') }}
group by session_id, startup_type
```

Now, we have more accurate benchmarks for the IDE startup time that we can compare against the future version of the IDE as it moves from development to GA (more on this at our next Staging!).

On collaboration

Backing out a bit, data teams are kind of infamous for being reactive instead of proactive. To be perfectly clear, this isn’t usually by choice; data teams often have to play catch-up when engineers or business stakeholders implement or overwrite fields, launch new campaigns without telling anyone, or change OKR metrics a week before review season.

With our older data models, which required hacky identification of hot/cold starts and leaned on unreliable events, I was retroactively working with not-so-awesome data. The IDE update was a real opportunity to work with the team to get the fields and events that would make the data modeling much more straightforward and accurate. Luckily, amazing engineers reached out to collaborate on what we should be building. I often forget that playing catch-up doesn’t have to be the default, and that there are real opportunities to improve your company’s data before it’s instrumented, not just after.

But this kind of collaboration doesn’t happen overnight. I kind of skimmed over it earlier, but I want to add some context on my collaboration with the engineers and product team.

Before adding these new events, I was meeting with the engineering team on a weekly basis to talk about how their work was impacting the performance numbers the business cares most about. Through these synchronous meetings, I became a resource for the team to bounce ideas off of, and I built the muscle of regularly collaborating with these folks. Had it not been for these relationships, we likely would have re-architected the IDE using the same old events, with the same data quality issues and modeling complexities.

All this to say, there is trust, relationship, and empathy-building that needs to happen in order for data and engineering teams to have full context on complex systems that each team is working on, and to develop a deeper understanding of each team’s needs. Once the gap between engineers and data teams is narrowed, it makes it much easier to collaborate and add value. In this case, because we chat about our product analytics every week, our engineers were familiar with the numbers and aware that we needed to check our tracking before launching a new version of the IDE.

The next generation of the IDE is underway, and the team (pictured above) is excited to show how we used this work to target feature improvements that ultimately reduce the overall IDE startup time. If you’re ready to hear more about the IDE improvements, make sure to register for our next Staging event!
