Git for the rest of us
Originally presented on 2021-12-06
“Oh,” the software engineer said, “git’s easy. All you need to do is follow these ten exact steps in their exact order, just git pull, then git checkout -b, then …” 😑😑😑
If you’re starting out with git, this talk is for you. Together, we’ll demystify the git flow and build a mental model for how git works. You’ll leave the talk feeling ready to use git on every project you work on! (Conversely, if you use git regularly, this talk is not for you! Get outta here and use this 30 mins to recharge!)
Browse this talk’s Slack archives
The day-of-talk conversation is archived here in dbt Community Slack.
Not a member of the dbt Community yet? You can join here to view the Coalesce chat archives.
Full transcript
Kyle Coapman: [00:00:00] Hello everybody. And welcome to Coalesce. My name is Kyle Coapman. I’m the head of training over at dbt Labs, and I will be your host for this session. The title of the session is Git for the Rest of Us. And so I’ll be joined by my former teammate here at dbt Labs, long time dbt community member and founder of the Analytics Engineers Club, Claire Carroll. I’ve learned a ton from Claire. I’ve had the pleasure of teaching with Claire as well. I learned a lot about how to best teach data concepts. And so all chat conversations are taking place in the coalesce-git channel of dbt Slack. If you’re not part of that chat, you have time to join right now: visit community.getdbt.com and search for coalesce-git.
It might be the shortest channel name you use this week. When you enter the space, we encourage you to ask other attendees questions, make comments. I’m looking for a lot of Git memes, everybody, because that is the topic right now. And [00:01:00] after the session Claire’s going to hang out, answer some questions.
We might do some live. We’ll definitely do some async and without further ado, Claire, over to you.
Claire Carroll: Awesome. Thank you so much, Kyle. Kyle says he learned things from me, but I actually learned things from him. Kyle’s an actual teacher and I’m just faking it. Let’s see how we go today.
So my name is Claire Carroll. I use she/her pronouns. And as Kyle mentioned, I am currently the co-founder of Analytics Engineers Club, which is a training course that turns analysts into analytics engineers by teaching them not just software engineering practices and principles, but how to think like a software engineer and how to apply those concepts to analytics.
It’s a course I’m really proud of. And this presentation is actually based on a lesson that we teach in the course. So fingers crossed. This goes well today. Because otherwise I might need to find something else to be doing next year. Now I wasn’t always someone who could [00:02:00] teach people analytics engineering concepts.
In fact, about five years ago, I was more of a newb, with Git in particular. So for context, five years ago, I read about this great tool called dbt. And the link to start using this tool sent me to a GitHub repo and I had no idea what to do there. I thought, do I download this as a zip file? Like, what do I do with this repository called dbt?
Now to use dbt at the time you needed to know Git; it was a prerequisite. And my exposure to Git was through the Looker UI, where I just clicked buttons and tried to create branches and merge things. And I had no idea what I was actually doing. I looked up some cheat sheets at the time and they said things like, here’s how you set up your project.
Just initialize the repo. And I was like, what the heck does this mean? And then if I wanted to make a change to the project, this was the cheat sheet, like 10 steps just [00:03:00] to make a change. And I was really annoyed because I’m used to working in a UI like this, where if I want to make a change,
I just make a change right in the web UI. I don’t have to follow 10 different steps. But I realized I had to learn Git to use this magical dbt tool. And I’m so glad I did, but before I got there, a software engineer sat me down and he said, look, Git is a superpower.
When you learn Git you get all of these benefits: you get to understand the story of your code. You get to be able to undo mistakes. You get to know which version of your code is most up to date. You can collaborate on large projects more easily, which at the time didn’t really matter because I was a one-person data team, but became more important later.
And you get to implement a code review process with your coworkers. These were all things that, based on this 10 step list, I couldn’t see how I would get those benefits, but I trusted the software engineer. I taught myself. And today, here’s how I [00:04:00] think about it. It’s a total match. I love it.
It’s one of the most valuable tactical skills that I’ve learned in the last five years. And I’m passionate about more analysts understanding it and being able to use it really confidently. And so that’s why I’m talking about this today. I’m giving this presentation, Git for the Rest of Us.
Now, if you are someone who gets a message from a coworker that says, Hey, can you open a pull request on this repo, and that sentence doesn’t faze you, get out of here. This isn’t the talk for you. Go to track to take a break, take a stretch, pet your dog, post a photo of your dog in the Slack chat for me to look at later.
This is a presentation for the people who feel overwhelmed by Git. For the people who maybe they’ve been through the steps a few times, but don’t really understand what’s going on at the deeper level. And therefore when things get confusing or something goes wrong, they’re not sure how to get back on the happy path.
[00:04:59] How to Learn Git
Claire Carroll: I think there’s a lot [00:05:00] of bad ways to learn Git. First of all, what I did for a long time, which is avoid using it. You can use a UI, which is what I did next. You could memorize a set of steps, which is what I tried to do next, but instead, we’re going to do this. We’re going to try and build up the underlying concepts, map the commands to these concepts.
And hopefully that provides you with a base so that you can go and learn way more about it. We’re not going to cover everything today. We’re basically covering that pull request flow. But with the concepts that you learned today, things like merge, revert, reset. All of these things will feel easier to understand the next time around. Sound good?
I actually have no response here. So I hope that you’re at home nodding along. Let’s get into it. So first of all, we’re going to wipe these cheat sheets clean, and we’re going to come back to these cheat sheets and build off the steps over time. So this one, how to make changes to your [00:06:00] code. I don’t care about these steps.
[00:06:02] Level 0: Use a Web Editor
Claire Carroll: We’ll come back to them later. And finally, these benefits: wipe them all off and then let’s build them back up as we go. We’re going to be talking about the flow in different levels or layers, starting off with what some of you might be familiar with today. And so level zero, using a web editor, which we just saw.
So I just implemented a new field called median steps. That’s about the way that some of you might be working. If we are just following this flow, though, we aren’t going to have any of those benefits of Git. Of course, that makes sense: we’re not using it yet. Snowflake’s console is pretty reliable, though.
If I close this tab and reopen it, it will have my code there. It auto saves. There are benefits to this workflow and it is worth absolutely acknowledging how easy it is to write code in this interface. So that might be where you’re at today. We’re going to break [00:07:00] out of the web UI and instead start working locally.
[00:07:03] Level 1: Work Locally Without Git
Claire Carroll: And we’re doing that to set the foundation so that we can start applying Git in the next few levels. Okay. So level one, working locally without Git. Essentially what we want to do is take our queries and save them as .sql files on our computer. How would we set up a project that we’re working on?
How do we make a change? And what are the benefits of this approach? These are the same three questions we’ll come back to as we build out our understanding together. So let’s do it. I’m going to copy my query and then head to my terminal. I am using just a little bit of terminal commands today.
So hopefully that’s not too intimidating. Going to make a new project and then cd into that folder. So mkdir for make directory and [00:08:00] cd for change directory, and then open this up in VS Code. I don’t know why I chose to live code, but we’re doing it. You’re going to have fun. So my new project, I’ve made a folder; it’s empty at the moment.
I’ll add a new file to it and I’ll call it my query dot SQL. Actually, this query is for workouts. So paste that in there.
Now my code lives in a file on my computer. How did we set up our project just to reiterate, we created the new directory and we changed into the directory. How do we make changes to our code?
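The level one setup described above can be sketched in a few terminal commands. This is a minimal sketch, not the exact commands from the demo: the temporary directory, the file name workouts.sql, and the query text are placeholder assumptions, and the file is written from the terminal rather than saved from VS Code.

```shell
# Hypothetical location and query text; only mkdir and cd mirror the talk.
project="$(mktemp -d)/my-new-project"
mkdir "$project"    # mkdir: make directory
cd "$project"       # cd: change directory

# Save the query as a .sql file (in the talk this is pasted into VS Code and saved).
cat > workouts.sql <<'SQL'
select
    workout_type,
    median(steps) as median_steps
from workouts
group by 1
SQL

ls    # lists the files in the project folder
```

Nothing here involves Git yet; the point is only that the query now lives in a file on disk.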
We made the changes and then we hit save. This is a big thing that trips people up when they’re changing to a code editor workflow, this idea that you actually have to hit save for the changes to persist. And that’s because we’re all so used to working [00:09:00] with Google Docs, with Notion, with Snowflake consoles, where we don’t have to hit save.
So that’s a really easy mistake to make. So I wanted to call that out as a separate step. So what are the benefits of this approach? Actually we’ve probably made our workflow a little bit worse. We don’t know any of the story of our code. We can’t undo our mistakes very well. And also it just lives on my computer.
[00:09:25] Level 2: Use Git Locally
Claire Carroll: No one else can access it, but importantly, we’ve laid the foundation so that we can move on to the next level. So level two, using Git locally. What is Git? I maybe should have defined this sooner, but we’re doing it at this level because we’re introducing concepts as we go. So Git is a way to track changes to the code in a particular project or repository, also referred to as a repo.
So here I’ve bolded these terms because they are [00:10:00] new terms for us. And repository just means a folder, a directory, a project that we’re working on. Now, importantly, with Git, every time we make changes, we want to group these changes into what’s called a commit. Commits are a really important concept.
So we’re going to dive in just a little bit more. So commits represent the set of changes that you made to your code. For example, remove this line of code, add that line, change this other line. And this is different to, say, a Notion doc, which might actually save the entire doc in its current state (actually I don’t know how Notion does this). Rather than saving the whole document, commits are just saving the changes. This is a really important concept because it allows us to do many other really cool things. So if I know the changes that someone made and in what order, I could reapply those commits in a [00:11:00] particular sequence and get the code in its current state.
This is beneficial because commits represent the story of a repository. They let us see the context of a code change: what files changed and perhaps a message that explains why. And also who changed the code, which can be really useful on larger teams. Because commits are a grouping of changes, they also help us revert the code back to an earlier state, if we need to. I’m not going to do that in this presentation. We might have some time for Q and A at the end, and I might do it then. We’ll see how we go. So what do we need to do here? I need to ask these same questions. How do we set up a project at this stage, how do we make a change, and what are the benefits of this approach?
So first of all, to set up my project, I already have my folder, but I need [00:12:00] some way of designating this folder as a Git repository. The particular command for that is git init. And I’ll use the period to designate this particular folder. And I have my computer set up in such a way that it gives me some information about Git whenever I’m in a Git repository.
So that’s a particular setting on my computer. This is a new file. If I run git status, we can see that, according to Git, workouts.sql is a new file. And that makes sense, because we haven’t tracked that new file up until now. So let’s register this new file as a commit. I’m going to first stage the changes: git add and a period for everything in this folder.
You can also do it one file at a time. I’ll check that was staged correctly. We can see that has changed compared to here. And then I’m going to commit this with a message that says add workouts [00:13:00] period.
Okay. So init, repository, commit: a few different new concepts there. Let’s make sure we got all of those.
So now when we’re setting up our project to use Git locally, we’ve got a new step: initialize the repository with git init. And now when we want to make a change, we have to stage the changes with git add, and commit the changes with git commit. Stage and commit, two separate steps. We might talk about that in the Q and A.
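The init, stage, and commit steps above might look like this in a terminal. It is a sketch under a few assumptions: the throwaway directory, the placeholder query, the demo identity set with git config, and the branch name main passed via `git init -b` (which needs Git 2.28 or newer) are all illustrative choices rather than details from the talk.

```shell
# Throwaway project so the sketch is safe to run anywhere (hypothetical paths).
repo="$(mktemp -d)/my-new-project"
mkdir "$repo" && cd "$repo"
printf 'select 1 as placeholder\n' > workouts.sql

# Designate this folder as a Git repository; the period means "this folder".
git init -b main .

# A commit needs an author; these demo values apply to this repository only.
git config user.name "Demo User"
git config user.email "demo@example.com"

git status --short    # ?? workouts.sql  (Git sees a new, untracked file)
git add .             # stage everything in this folder
git status --short    # A  workouts.sql  (the file is now staged)
git commit -m "Add workouts query"
```

Running `git status` between the two steps is a cheap way to see the difference between a staged change and a committed one.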
[00:13:34] Why Use Git?
Claire Carroll: Let’s see how we go for time. Hopefully that all feels good. So let’s move on. What are the benefits now? We can now start to understand the story of our code. If I type in git log, we can see that I made the commit, the change that I made there, and the time [00:14:00] stamp of it as well.
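To see what "the story of our code" looks like once there is more than one commit, here is a small sketch. The two commits, their messages, and the field names are made up for illustration; the `git log` options shown are standard ones.

```shell
# Throwaway repository with two hypothetical commits.
repo="$(mktemp -d)/my-new-project"
mkdir "$repo" && cd "$repo"
git init -q -b main .
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'select 1 as steps\n' > workouts.sql
git add . && git commit -q -m "Add workouts query"

printf 'select 1 as steps, 2 as cadence\n' > workouts.sql
git add . && git commit -q -m "Add cadence field"

git log              # author, timestamp, and message for each commit
git log --oneline    # one line per commit, newest first
git log -p -1        # the most recent commit together with its diff
```

The `-p` form is the closest terminal equivalent to the idea that a commit stores a set of changes rather than a whole document.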
Okay. So we’ve started to realize some of those benefits of using Git. I know which version of the code is most up to date, but my teammates don’t know. And if dbt Cloud needs to run this code, it doesn’t know which version of the code is the most up to date. So we’re going to introduce a new concept here, which is basically that we want to back up this code on the internet and to do that, we will use Dropbox.
[00:14:33] Using GitHub
Claire Carroll: JK, we are not going to use Dropbox for this. We are going to use GitHub. Now, this is a thing that I see a lot of newer users getting confused with what is Git and what is GitHub. They’ll use the words in the wrong context. This is why I like to teach it in this order. Introduce Git first and then GitHub.
So you can think of GitHub like a Dropbox, but specifically for Git projects. [00:15:00] So GitHub or GitLab or Bitbucket or any of the other ones lets you store a copy of your code online and do much more than that. But most importantly, it understands Git concepts, in particular repos and commits. It is not an automatic sync though, which you might be used to if you’re someone that uses Dropbox or iCloud or anything to back up code.
And so therefore there are more steps involved than if you were just using a normal backup. And again, those steps can feel like extra overhead at first; that’s why those steps to make a change get even longer.
Claire Carroll: To use GitHub, we have a few conceptual steps here. We need to create a repo or a project in GitHub. This is referred to as the remote repo, another new term there. And then we need to create a connection between this GitHub repo up in the cloud, on the internet, [00:16:00] and the copy on my computer, which is actually living on my hard drive. You can also clone an existing repository if you’re working on, say, a project that already exists.
When we want to make our changes, we need to pull the latest changes before starting our work to make sure that we’re up to date. And when we want to publish changes, we need to push the changes to our remote repo. I probably should have added a diagram here. I may have run out of time, but hopefully that makes sense.
So let’s do the steps. We’ll head back to our demo screen. I’m going to type in repo, and that takes me to GitHub. I’ll create it in the Analytics Engineers Club organization, and I will call it my project. Let’s say, live demo Coalesce. You can check this to see if it is working right now because I will set it to public.
Okay. So I have my remote repository created, this version on the internet, [00:17:00] and now I need to link it to the folder on my computer. To do that, I’m going to use this command. I’ll copy and paste it in: git remote add origin, and then the URL of this particular project. So remote, again, that’s the word for the version on the internet. Add just means we’re registering this new remote. The nickname for it is origin, so we don’t have to type out the address every single time. I’m all done. If I type in git remote -v, we’ll see that being reflected back. Now, if I want to make a change, I’m going to first pull my changes. I’ll follow what it says to do: set upstream to origin main.
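Linking a local folder to a remote can be tried without a GitHub account. In the sketch below a local bare repository stands in for the empty GitHub repo, which is an assumption made purely so the commands run offline; with GitHub, the path after `origin` would be the URL that GitHub shows for the new repository.

```shell
work="$(mktemp -d)"

# Stand-in for a brand new, empty repository on GitHub (assumption for offline use).
git init -q --bare -b main "$work/remote.git"

# The local project with one commit, as at the end of level two.
mkdir "$work/my-new-project" && cd "$work/my-new-project"
git init -q -b main .
git config user.name "Demo User"
git config user.email "demo@example.com"
printf 'select 1 as placeholder\n' > workouts.sql
git add . && git commit -q -m "Add workouts query"

# Register the remote under the nickname "origin".
git remote add origin "$work/remote.git"
git remote -v    # lists origin twice: once for fetch, once for push

# First push; -u sets origin/main as the upstream for the local main branch.
git push -u origin main
```

Because the remote starts out empty, the first useful action is a push rather than a pull.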
There’s actually nothing on this. Sorry, I messed that up. There’s nothing in this remote repository, so I shouldn’t be pulling. Just erase the last minute from your minds; we’ll be fine. But normally I would pull [00:18:00] if this repository had new changes. I’ll head over to VS Code.
I’ll make a change in here. I’ll rename this field as average steps work out and this as a workout. Hit save. So command S to save it. Head back to iTerm, and then those same steps again: git add, git commit with rename fields, git push. Head to GitHub and hit refresh. We see that file up here. We can see this rename fields commit show up and it shows us this really good diff view to see the changes that we’ve made.
So just to reiterate that again, as I like doing. Now to set up our project, we’ve got our last two steps unlocked. We created a remote repository on GitHub, and then we linked the remote to our local [00:19:00] folder. And that’s really all that we have to do to set up our project. We’re done with this cheat sheet, unlocking this cheat sheet to make changes to our code.
Now we need to make sure that we pull the latest changes, especially if we have team members collaborating on this code. We’d go through those same steps as before: make the changes, hit save, stage the changes, commit the changes. And then we finally push the changes. And we’re starting to see even more of these benefits to using Git.
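The pull, change, stage, commit, push cycle can be sketched end to end as follows. As an assumption for running offline, a local bare repository again plays the role of GitHub, and the query text is a placeholder; the commit message matches the one used in the demo.

```shell
work="$(mktemp -d)"

# Setup: a local project already linked to a remote (a bare repo stands in for GitHub).
git init -q --bare -b main "$work/remote.git"
mkdir "$work/my-new-project" && cd "$work/my-new-project"
git init -q -b main .
git config user.name "Demo User"
git config user.email "demo@example.com"
printf 'select avg(steps) as avg_steps from workouts\n' > workouts.sql
git add . && git commit -q -m "Add workouts query"
git remote add origin "$work/remote.git"
git push -q -u origin main

# The everyday cycle.
git pull origin main             # 1. pull the latest changes
printf 'select avg(steps) as average_steps from workouts\n' > workouts.sql   # 2. make the change and save
git add .                        # 3. stage the changes
git commit -m "Rename fields"    # 4. commit the changes
git push                         # 5. push the changes to the remote
```

With teammates pushing to the same remote, step 1 is what keeps the local copy from drifting behind.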
We can really easily see which version of our code is most up to date. It’s right there on GitHub. Everyone on the internet can see that this version is most up to date. And we’re starting to get the sense that perhaps we can collaborate on projects more easily because I have an account on GitHub.
[00:19:51] Level 4: Branches and Pull Requests
Claire Carroll: And I created this in my organization, GitHub org. And so therefore I might be able to collaborate with my coworkers, but we’re not there quite yet. And this [00:20:00] will be our final level that we’ll go through today. So level four, the final level we’ll unlock is using branches and pull requests.
So the big idea behind the branch is what if rather than making changes to one copy of our code, we made our changes off to the side and then propose those changes to be accepted. That way we could test the changes work before we update the code. So we would never accidentally commit broken code and we could have a team member review our changes and vice versa.
And this is where the idea of branching comes in. What we need to do here is check out a new branch. So check out, another new term there. We want to make our changes to our code, test the changes work, and propose that they be added to the main branch via a pull request. Now GitHub calls these pull requests; GitLab calls these merge requests, which is an infinitely better [00:21:00] name.
So if you’re struggling with the term pull request, because you just learned git pull and it doesn’t really match up in your head, think of them as merge requests. We can also get a review from a team member to check that our code maybe fits our style guide or that it’s just like the best way that we could possibly write it.
There’s often a lot of different ways that you can write SQL.
Let’s see, that’s the final demo. git checkout -b my-new-branch. Normally you would call it something much nicer than that. Head to VS Code. And what change do I want to make here? Maybe I replace these with refs, although this isn’t a dbt project, so I won’t today. But I will, I dunno, just remove average.
Maybe I don’t like it as a measure of central tendency. So same as before: git add, git commit, remove average. I [00:22:00] don’t like it. And this time I’m going to push it to my-new-branch.
If I head to GitHub, refresh, I don’t even need to hit refresh. GitHub knows. It’s magical. It anticipates that I just pushed something to a branch off to the side.
Hey, maybe I want to compare and pull request to merge that in. If I’m being really good, I would add my teammate to review. Hey, this is for Coalesce. Then I create the pull request, and we can see that commit that I made. We can see the file that I changed. I just removed that one line. And if he gets online, I’d get him to give this a tick.
We won’t do that today. And eventually I will hit merge, pull requests to make sure those changes are live. Then when we head back to my project, [00:23:00] we can see that this change has persisted here. And if on my command line, I go back to the main branch. I don’t yet have these changes. I still got that average showing up in my code editor.
So what do I need to do? I need to pull those latest changes. There we go.
Let’s look at the code editor and that average is now gone. And so that’s why we need to make sure that we keep pulling, because as we merge things back into main, we need to make sure that our code stays in sync. So, final little unlock: the steps to set up our project have not changed, but we’ve just ticked off these last few steps.
Okay. So pull the latest changes, check out a new branch, make changes, hit save, stage changes, commit the changes, push the changes, open a pull [00:24:00] request on GitHub, have a team member review your work, which I didn’t do today. Also a good idea at this stage is to have CI checks run to make sure that your code does what it’s intended to, CI meaning continuous integration. Basically, like, this code works, so I’ll give it a little tick. And then finally merge the changes into your main branch.
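The full branch flow listed above can be sketched offline as well. Two things here are assumptions rather than part of the talk: a local bare repository stands in for GitHub, and the line marked as a stand-in replaces opening, reviewing, and merging the pull request in GitHub’s UI. The branch name remove-average and the query text are also illustrative.

```shell
work="$(mktemp -d)"

# Setup: a bare repo stands in for GitHub; the project starts with one pushed commit.
git init -q --bare -b main "$work/remote.git"
mkdir "$work/my-new-project" && cd "$work/my-new-project"
git init -q -b main .
git config user.name "Demo User"
git config user.email "demo@example.com"
cat > workouts.sql <<'SQL'
select
    workout_type,
    avg(steps) as average_steps,
    median(steps) as median_steps
from workouts
group by 1
SQL
git add . && git commit -q -m "Add workouts query"
git remote add origin "$work/remote.git"
git push -q -u origin main

# 1. Pull the latest changes, then make the change off to the side on a new branch.
git pull origin main
git checkout -b remove-average

# 2. Make the change, stage it, commit it, and push the branch.
grep -v 'average_steps' workouts.sql > workouts.tmp && mv workouts.tmp workouts.sql
git add .
git commit -m "Remove average"
git push -u origin remove-average

# 3. STAND-IN ONLY: on GitHub you would open a pull request, get a review,
#    and click merge. Here the remote main is simply moved to the branch tip.
git push origin remove-average:main

# 4. Back on main the local copy is still the old one until we pull.
git checkout main
git pull origin main
cat workouts.sql    # the average_steps line is gone
```

The final pull is the step that is easiest to forget: merging on the remote does not change the local main branch by itself.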
So we’ve unlocked all of those benefits. We are able to collaborate on large projects way more easily, because we can all make sure that we have the same code. We can work on huge projects with hundreds of files and not overwrite each other’s changes. Just do it in a really nice way. And we can implement a code review process with coworkers to make sure that the quality of our code is really good.
Code review isn’t just about, ticking a box. It is about improving the quality of your code. And sometimes it’s about teaching people. If you are the more senior person, you might add someone as a reviewer so that they can [00:25:00] learn how you approached a particular task. Okay. Those are the things that I wanted to hit on today.
Claire Carroll: I’m just checking the time; we’re doing pretty well. So where to from here? First of all, if you sat through this and there weren’t new concepts to you, you should have listened to me at the top of the presentation and taken a break. But if there were new concepts, I encourage you to go through this tutorial yourself. Do it word for word with my new project, with a file that you have on your computer.
Just try and replicate it yourself. As you go through, you’ll start to come up with new questions, things like: what makes for a good branch name? What is a good commit message? Why are add and commit different commands? And we’ve actually answered all of those in this blog post that I wrote.
So that’s analyticsengineers.club/coalesce. And that has Git the second time around. It’s the same flow, but it goes much deeper on every single step. I also [00:26:00] encourage you to put your new skills to use at Coalesce.bingo. And if someone’s in the chat, they may share the Git repository for this.
So Coalesce Bingo is a fun little thing that my friend Izzy made, if you haven’t seen it yet. Okay. People already ticking things off. So if you head to GitHub.com, izzymiller, "coalesce bingo". Clone this, and maybe try and contribute to someone else’s repository. And the cool thing about trying to contribute here is that you’ll get practice cloning a project rather than creating one from scratch.
You can also start learning about all the other commands like merge, fetch, my favorite, rebase (anyone who’s ever sat with me through a lesson on rebase knows how much I love it), revert, reset. There’s so many other Git commands. A nice little tip here: if you are using just about any command line tool and you type in the command and dash dash help, you [00:27:00] can see all the different commands that are available to you.
So the same for dbt: those are all the dbt commands. But for git help, these are some of the things that you can work on learning next. Finally, as I mentioned, this is one lesson from a 10 week course. This is like one lesson for one week out of full lessons. So if this was interesting to you and you want to go way deeper on these kinds of things, consider joining us for our next course at Analytics Engineers Club. The website is analyticsengineers.club, and there’s a link to the mailing list to sign up and find out when we are opening applications, which will be in January of next year.
Finally, if you knew all of this hopefully one thing that you can take away from this today is considering how you teach new concepts to people. I massage the story a little bit and said I found this cheat sheet. It didn’t make sense to me. The truth is like a software engineer [00:28:00] just handed me that cheat sheet.
And they were like, follow these instructions. And I felt like a real moral failing for not understanding what these instructions meant. And I became very nervous about asking for help on this and then a different person on my team helped me out eventually. But the meta lesson there is that it doesn’t make sense to teach people in linear order.
If you hand someone a sheet and say, oh, just pull the changes, check out a new branch, that’s not going to make any sense to them. You’ve introduced way too many concepts at once. Instead, try implementing this concept of scaffolding, which Kyle can talk much more about as an actual teacher, and consider: what does this person know today?
So for us today, it was writing code in a web editor. Where do you want to get them? And how can you break that into different layers or levels, so that you’re just introducing a few new concepts at a time? That way the person gains a ton of confidence and they feel really empowered to keep working. If you do it well, [00:29:00] you’ll end up with feedback like this.
This was one of my students who, after doing this lesson, she said, she’d been using it for six months and now it feels like she finally understands what she’s doing instead of just copying commands from a sheet. And this is exactly what I wanted to happen. This is what I feel so passionate about.
Like, understanding the concept is a huge unlock. That is it. Thank you. Again, those links: analyticsengineers.club/coalesce. You can find me on the bird site, Twitter, at Claire B Carroll. I will be hanging out in coalesce-git for the next half hour or so to answer any questions, though if we go too deep on Git, I might not be able to answer them. But with that, I think we’re going to bring Kyle back up for some question time.
Kyle Coapman: Yeah. Awesome. Claire, always a pleasure seeing you teach and I loved your part there at the end, teach the right things in bite sized chunks first. You don’t need to know everything just to get started.
Claire Carroll: There’s a thing you taught me, which was, I do, we do, you do. Is [00:30:00] that right?
It is. Yeah. And I think I had picked that up through practice, but it was really cool to learn that’s like a tried and tested teaching paradigm.
Kyle Coapman: Yeah, totally. But I think particularly to call out, like whenever learning code, there’s like this temptation to learn everything at once and be like, I’m going to go take this four week course on Git and understand rebase, which I don’t really fully understand.
But you just need the simple commands that you rolled out here just to get started. And then your curiosity will kind of take you the rest of the way.
[00:30:33] Q&A
Kyle Coapman: There are a few questions in Slack that bubbled up and some comments that I’m gonna like tweak into questions. So here’s one from Anders Swanson.
How do you think about naming your Git commits, your Git commit messages, for beginners?
Claire Carroll: Yeah. So I always, I’m going to, I don’t know if my screen is still shared, because I don’t have power over that. Whenever I’m making a change, I’ll always do it... I’m trying to think of a more substantial change to make. [00:31:00] I’m live again.
Fantastic. So let’s say that I just want to remove this field. I don’t know why I would want to, but that’s the thing that we’re doing. When I make the commit (I’m going to commit on main, which you shouldn’t really do), I always say in my head, "this commit will," and then I start typing: "Remove the cadence field."
So that’s how I think about naming commits. You start the sentence "this commit will," and then finish it. By doing that, you’re using this thing called the imperative tense. The imperative tense is like when your dad says "clean your room." That’s the imperative tense, because it’s a command.
So that’s the trick to get your brain to do it really easily. Other people use this thing, I can’t remember what it’s called, and someone in the chat might help me out. There’s this, Git, semantic commits. We’ll see if we can find it. Semantic commits, Conventional Commits. And no, that’s not it. Never mind. Emojis.
Kyle Coapman: Funny, [00:32:00] that’s what Anders actually originally brought up, this Conventional Commits. So I’m curious what this other one is.
Claire Carroll: Yeah, it was like this one. This was too hard for me, but some people like to put emojis in their commits to mean, oh, this fixes the CI build. And I’m like, no, there’s too many things to remember.
But yeah, I think Conventional Commits was also another thing that I’ve read before. And in general, I really like conventions because they help you reduce your decision-making. If you have a really simple rule, like "this commit will...", it means you aren’t stuck sitting there being like, what do I name this?
What do I name this? What do I name this? You can just do it really quickly. And then of course rules are meant to be broken. If I look at the code for my personal blog, let’s look at one I did the other day: "fix typos," "try to send a tweet," "fix, maybe this." I started off with the good practice once, and then I was like, no one looks at this except me. So I’m fine.
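The "this commit will" rule above can be sketched in code. The following Python is a minimal illustration, not something shown in the talk: the function names `reads_as_imperative` and `as_sentence` and the short list of word endings are assumptions made for this example, and the check is a rough heuristic rather than real grammar analysis (it will misjudge verbs such as "embed" or "focus").

```python
# Hypothetical helpers: the names and the heuristic are assumptions for illustration.
NON_IMPERATIVE_SUFFIXES = ("ed", "ing")  # e.g. "Removed", "Removing"


def reads_as_imperative(subject: str) -> bool:
    """Roughly check that a commit subject could complete 'This commit will ...'."""
    words = subject.strip().split()
    if not words:
        return False
    first = words[0].lower()
    # Past-tense and "-ing" forms do not complete the sentence.
    if first.endswith(NON_IMPERATIVE_SUFFIXES):
        return False
    # Third-person forms such as "Fixes" fail too; keep words like "pass".
    if first.endswith("s") and not first.endswith("ss"):
        return False
    return True


def as_sentence(subject: str) -> str:
    """Build the sentence to say in your head before committing."""
    text = subject.strip()
    return "This commit will " + text[:1].lower() + text[1:]


if __name__ == "__main__":
    for subject in ["Remove the cadence field", "Removed the cadence field"]:
        print(as_sentence(subject), "->", reads_as_imperative(subject))
```

A check like this is only a nudge: the point of the convention is to remove the naming decision, not to police every message.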
Kyle Coapman: Totally. I liked that a lot. I’m going to [00:33:00] hang on to that one. I haven’t heard that one before: "this commit will," whatever it is. Slightly, maybe not spicy question, but this one comes from Joel. Why is it important, in your opinion, to get comfortable with the Git CLI instead of GitKraken or GitHub Desktop and the like?
Claire Carroll: So let’s first talk about what this flow would look like in the UI, if I can do this. Actually, I haven’t done this in so long that I don’t know if I can. So let’s add that field back in, Control-Z, save, and then I’m going to hit the add button here and then write a message: "Add the cadence field back in." And then it’s this tick, maybe, and then sync changes.
Is that going to push it? Whoa, I’ve never done that before. No, I don’t want it to do this thing. The reason that I don’t love that is because every UI is different, right? I’m using VS Code. If I switched to Atom, it’s going to look different. If I switched to Sublime, which I don’t use anymore,
it’s going to look different again, and you don’t [00:34:00] necessarily fully understand the things that you’re doing if you’re clicking buttons. And so there’s a few things to parse out there. One is the consistency of the experience: the commits that I, sorry, the commands that I run from my terminal are going to be the same in my terminal and in your terminal.
Whether I’m using the terminal in iTerm, or using the Terminal application, or using the integrated terminal in VS Code, or using the terminal on a remote machine, it’s very consistent. The second thing is that I think you understand the concepts better if you’re running the commands from the command line. I’ve just seen so many people who use a UI, and then once something goes wrong, they’re really unsure about what went wrong.
And so they get very flustered and overwhelmed, and then it’s not that same "oh, something went wrong, but I can fix it." So I think that’s a downside as well. I think I had a third, but I’ve now forgotten it. So if a third comes back, I’ll drop it in the Slack.
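For reference, the add, commit, and sync steps clicked through in the editor correspond roughly to three Git commands that read the same in any terminal. The sketch below only builds the command lists and prints them; the function name `commit_flow` and the file path are hypothetical, and it assumes the current branch already tracks a remote branch so that a plain `git push` works.

```python
from typing import List


def commit_flow(message: str, paths: List[str]) -> List[List[str]]:
    """Return the Git commands for staging, committing, and pushing changes."""
    return [
        ["git", "add", *paths],            # stage the changed files
        ["git", "commit", "-m", message],  # record them with a message
        ["git", "push"],                   # send the commit to the remote
    ]


if __name__ == "__main__":
    # "models/example.sql" is a made-up path used only for illustration.
    for command in commit_flow("Add the cadence field", ["models/example.sql"]):
        print(" ".join(command))
```

To actually run the commands, each list could be passed to `subprocess.run(command, check=True)` from inside a test repository.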
Kyle Coapman: Cool. I love that. [00:35:00] I mostly got into the CLI after joining dbt Labs. I felt like I couldn’t do enough with the GitHub Desktop UI and eventually got into a state where I was like, what do I do? Yeah, nothing against starting there.
Claire Carroll: But yeah, the other thing is clicking buttons with a UI versus typing, especially if you can autocomplete things. I don’t know, I’m just faster on the keyboard. I think when you have to find where the button is and click it, that’s a slight bit more context switching, as opposed to just staying on your keyboard and typing away.
Kyle Coapman: Totally. Especially if you’re really in a coding flow state, you can Command-Tab over to iTerm, git commit, whatever it might be. Yeah. Get your blue, your blues in there really quickly. Quick, slightly different question.
This came from Dan Hess. Dan essentially said, what if I, like, know Git, but want to introduce this to my coworkers? I kind of changed Dan’s question into: how do you think about empowering a [00:36:00] team or coworkers with Git? How do you make that friendly? How do you make that safe?
Claire Carroll: Send them the video of this. I feel like this was a good starting place. But I do think what’s important when teaching someone a new skill is to get them started on a new project or whatever it is, get them started on a test repo. I think another failure mode of teaching, especially when it comes to dbt and Git, is handing someone a complicated dbt project or just a large Git repository and trying to get them up to speed as their way of onboarding, as opposed to getting them started with a small project that’s manageable, where they can experiment and where they can make mistakes
and it doesn’t matter. So if I were introducing this to a team, I’d get them all to build their own new project. I’d get them to open a pull request, request a review from a team member, and get used to that flow, and then onboard them to your larger project that might have a pull request template, that might have CI checks on the [00:37:00] main branch.
And then that sort of adds that layer of complexity at the end of learning all of these concepts. So again, it’s that same paradigm: how do I break this big, complicated thing up into smaller pieces where I’m just revealing complexity along the way?
Kyle Coapman: You’re almost making the learning modular in a way, like breaking down dbt models that might be super long.
Claire Carroll: And if you do need to onboard a team member to dbt, as I said, don’t onboard them to your very complicated project. Send them to learn.getdbt.com. You get to see Kyle’s friendly face and go through that project.
Kyle Coapman: Yeah. Start small. Also, I want to highlight something you said: starting in a safe place. It’s super scary if you’re on something that is production. You probably don’t have access to break things, but the feeling of you breaking things can hold you back from taking risks. Otherwise, taking a safe... yeah.
Claire Carroll: One thing we actually do with our students [00:38:00] is we get them to connect to a remote instance. It feels like extra overhead at the start, but VS Code makes it really nice to do this. And once you’re connected to a remote instance, we can just tell students, there is nothing you can do to break this too much, even if you break this entire remote instance. So I should explain that: a remote instance is basically a computer that we’ve created on AWS. As opposed to running commands on their own computer, the files live on this external computer that they’re connecting to.
And worst case scenario, they somehow break things. I don’t even know how and we just wipe that clean. We give them a new instance and then they’re back like working again. And we can even connect to that instance and fix things for them. So we take it even one step further about creating this like really safe space for students to experiment.
Kyle Coapman: I hadn’t thought about that. You can effectively brick your MacBook if you do some weird things, especially when you have the command line and [00:39:00] you’re really working in there. Let me just see if there are any more questions over here in chat. I was really getting into the conversation with you there.
This is from Tyler Wood: what are some good short-term Git goals for a small data department running a modern data stack at an early startup, with no Git background in the data group? Where do they get started?
Claire Carroll: Hopefully here. I would put as a short-term goal making sure that all changes are going through pull request flows.
That would be my goal to work towards. For the goal after that, I would be looking to implement continuous integration tests. Maybe the next goal after that is implementing a pull request template. There’s maybe three tactical things that you can cross off.
Kyle Coapman: Awesome. So important to get the pull requests, or the looking at each other’s code, early. Otherwise things can get a bit squirrelly and you’re not aligning on conventions.
Claire, always a pleasure. It’s great hanging out with you, nerding out on teaching, Git, data. Thank you for putting this together and showing us your teaching.