Resources

Abigail Haddad - GitHub: How To Tell Your Professional Story

video
Oct 31, 2024
20:13

Transcript

This transcript was generated automatically and may contain errors.

Hi, my name is Abigail Haddad and I got my first data science job offer seven years ago this week. I won't keep you in suspense, I did in fact take the job offer. So it's been a busy seven years. I've had five different jobs. I've read approximately 500 resumes of people applying to work at the various organizations I've been at.

I've also applied to various talks like this and other ones where I did get to give them. And I've selected speakers for the Data Science DC group that I co-organize. And so throughout all of these, there's kind of a common thread and something I think about a lot, which is professional stories. So how do we communicate the kinds of things we're trying to communicate about ourselves and what we're bringing to the table for a possible future employer or in some other context? And how do we read other people's stories? Sort of what's the best way to do that? And like in the title, a surprising amount of the time for me in these experiences is GitHub.

What we're trying to communicate

So here's the kind of story that I'm trying to communicate frequently about myself. And then I think you're probably trying to communicate about yourself as well and what you bring to the table. So that first part, obviously, look, it's coding skills. I'm a data scientist, I write a lot of code, like way too much code. And that's part of what I do. But like you, it's not the only thing that I do. I also bring a particular problem solving approach. So when you come through with a problem, a set of requirements, there's a way that I approach that.

There's also a set of technical practices and tools that I use, which of course has expanded over the years. I finally learned Docker a couple of weeks ago. That's been great, recommend. And also there's a set of communication skills I bring. So there's a way I write and a way I speak. And this is sort of, for me, this is a large part of the package that I want to communicate about myself.

So first, a digression, junior resumes are really hard, okay? They're a really hard way of communicating the set of skills about yourself, I think even more than more senior resumes. And that's for a couple of reasons. First of all, that's because you're junior, you don't have a lot of job experiences in this field. So you don't have a lot of ways to say like, hey, I built this thing. Hey, here's how I approached this.

The other thing is there are all of these data science and data analytics programs, and we don't know what they are, okay? So if you tell me you studied math undergrad or economics PhD, I broadly know what that means. If you tell me you studied data science, I actually don't. There isn't really a level of standardization, and even the same class across different programs can go into very different levels of depth. And so when I see that on a resume, it's hard for me to tell what that translates to, what you're bringing to the table, okay?

Why GitHub?

So here's what we're going to be talking about today. I'm going to be going through why GitHub as opposed to the blog or the LinkedIn or however else you might communicate your skills. I'm going to talk about selecting projects that really show what you can do, that let you either build or show the kinds of skills that you want that future employer to know that you have. And I'm going to talk about some good development practices, both what I mean by good development practices and also how you can use them regardless of how big or small the repo that you're building is.

First, why GitHub? Okay, so I'm on LinkedIn. I spend a lot of time on LinkedIn. I suggest you spend less time on LinkedIn than I spend on LinkedIn. I also have a blog, and I'm not going to tell you these things aren't useful, okay? But if you're looking for the lowest hanging fruit in terms of spending a hopefully small amount of time in a way that has the greatest returns, both to building your skills and showing off your skills, the answer is GitHub.

And that gets back to these things that we're trying to communicate, our coding skills, our problem-solving approach, the kinds of tools we're using, and how we communicate, right? So you can write a post on LinkedIn, maybe it includes a graph or a finding of yours, and that's part of the story, but it's a very small part of that story, whereas you make a Git repo, and that's a much bigger part of it. So I'm seeing your actual code, right? I'm seeing that the way you got to that graph was some way where you wrote functions, your code was modular, you abstracted it out. But I can't tell that just from looking at the graph, I tell that from looking at the code, right?

I'm also seeing how you communicate and how you approach a problem. So I'm looking at your readme, right? You should have a readme, we'll talk about that. So I'm seeing what problem you are trying to solve and how you came at it. And I'm seeing the broader set of tools that you're using, starting with using Git, right? Which is actually, like, not everybody knows how to use Git, and that's a great skill to bring to the table. So even if you're doing those other things, and again, I'm not going to tell you not to, I think GitHub should really be at the core of what you're doing for your portfolio.

Additionally, it's industry standard, right? Using Git is something you are very likely to be doing at your next job. And if you're not using GitHub, you're probably going to be using some kind of internal Git server, GitLab, or something else. And so being ahead of that and showing you know how to use it, and that you're paying attention to broader industry norms is really useful. It's free, right? That's cool. I used to pay for a website, and I don't anymore, because I can get the functionality that I need from GitHub.

And finally, and these two really go together, you're creating a lasting portfolio, which is actually likely to get looked at. You can post on LinkedIn now, people might look at it, they might look at it tomorrow. A year from now, when you're applying for a job, that possible future employer is very unlikely to be reading your year-old LinkedIn posts, whereas they really might be looking at your year-old repo.

Okay, I did a very unscientific little question on this on, again, LinkedIn, and I asked people whether they looked at GitHub profiles when hiring for data roles. The majority said that they do look if it's listed, and some people said they actively search for it, right? And very few people said that they don't look. And so I can't promise you everybody will, but my sense is anything else we could have asked about, the numbers would have been lower. GitHub really is the thing where if you put a link to it on your resume, many people will go look at it.

And finally, okay, so before I was a data scientist, I was a PhD student, okay? And I was scared, like for a much longer time, even after I finished my PhD, I had this sort of nightmare that somebody was going to ask to look at my code for my dissertation. And had they asked that, we would have had problems, okay? So the first problem was, I was working in Stata, I had so many .do files, I couldn't actually have told you which created that final table or that final chart that was in my dissertation. So problem one. Problem two, the code was really bad. Like it was embarrassingly bad, it was very bad. And so I knew if somebody asked me, like I was going to go through this whole shame thing, it was going to be a mess, okay?

I don't have that anymore, right? And I want you to not have that, because now when I code, I approach it from the perspective from the beginning that somebody is going to be looking at that code, right? I might look at it a year from now, I want it to be useful, somebody else might look at it. And that completely changes how I code. And also as much as possible, I put my stuff on GitHub from the beginning. And that way when somebody comes up to me and says, hey, which does happen periodically, can I see your code for this? I'm not stressing out, I'm not worrying about it, I send them to the repo, I call it a day. And that's been just incredibly powerful for me, and I want you to have that as well.

Selecting projects

So people ask me about how you pick a project. And the way I really think about it is, what are you trying to show that you can do, okay? So one of the things that I really appreciate about this field, and that's been really helpful for me career-wise, is that in most fields, most jobs you can have, the way you get the skill to either move up internally at your organization, or find a new job, is somebody has to tell you, hey, go out and do this, right? And you can certainly ask your boss, hey, can I do this? But you're really limited by what you're allowed to do at your current job.

If you need experience organizing very large conferences, you can't do that on your weekends, right? You need to do it at your job. Whereas as a data scientist, and this has really helped me over the years, I don't need to wait. If there's a skill that I want, because I think it's going to be useful for me either moving up internally or at another organization, I can mostly just go get it. And that's been really amazing.

So when you're choosing your project, I would say the first thing to really think about is what skill am I either trying to get, or just trying to show that I have? And I wouldn't underestimate the current set of skills. So for instance, if you are a junior in this field, and you know how to use Git, and your code is in functions, and you write a readme file, that's actually really huge, right? And so if you're not on GitHub, I would really recommend spending a few hours, it does not need to be a part-time job, just showing that you can do those things, right? So it's both about the skill building, and it's just about showing the skills you have right now.

And in terms of skill building, I would really, again, look at the kinds of jobs you might want to have, and figure out how to get those things, because you really, by and large, can get them on your own. Next, interests. Okay. So if you're trying to show, hey, I can get data from an API, or I can do data viz, or really kind of anything, I can write functions, you can do that for any kind of data or any kind of interest that you have, and so why not pick something that you like?

And finally, scope. So I mentioned this before. I really don't think GitHub should be your part-time job, right? You have jobs already, you have families, you have hobbies and friends, and all kinds of things that probably you would rather be doing. And so I would pick the smallest possible project you can do that either teaches or conveys what you're trying to convey. You can always build something out, but it's harder to shrink something once you've already started. And one of the things you're trying to show is that you can do something from start to finish.

So like, again, if what you're trying to show is functions, modular code, a readme file, if the only output of that is I made a table or I made a chart, that's fine. There's a lot you can show with a very small amount of code. And so scope it narrowly and then build if you want to. Finally, and I do want to mention this, put it on your resume. So personal projects are actually a really good thing to have on your resume. And just describing how you did it, here's the tools I used, again, link to your repo, but that's a really good thing to see.

Good development practices

Finally, good development practices. So I'm afraid like sometimes when people who are more senior say good development practices, it can sound like this. It can sound like, here's a hoop we want you to jump through, right? Or gatekeeping. There's this thing that we spent the time and the pain learning. So we're going to make you do it too. That's not how I want to come across. And I think that's not how most people in this field want to come across.

When I'm talking about good development practices, I'm talking about three things. I'm talking about code that's easy to run, I'm talking about code that's easy to build on either for yourself or other people, I'm talking about code that's understandable. And that's really it, okay? And certainly what that means is different across different projects. If you're writing a package, that's going to mean a whole lot of things that it's not going to mean if you're writing something for yourself or even for your organization. But fundamentally, this is really all we're talking about, okay?

So how do we do that? These are my sort of repo essentials. These are the things that I would recommend for any repo you're doing, no matter how big or how small. So your code is modular, right? It's in functions. Maybe it's in classes if you want, right? It's not just a block of text, but it's these little things that each do one thing. You're not repeating yourself. It's clean. Second, all your work is in code. There's not a part where like, here's the part where I pushed my data to Excel, I made some changes, I read it back in. Everything you're doing from start to finish is in the code.
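As an illustration of the first two essentials, here is a minimal Python sketch; it is not code from the talk. The sample data, the column names, and the function names (`load_rows`, `drop_missing`, `mean_by_group`) are all made up for the example. Each function does one thing, and the row with a missing value is dropped in code rather than by hand in a spreadsheet.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data, inlined so the sketch runs without any files.
RAW_CSV = """city,temperature_f
Boston,50
Boston,54
Austin,80
Austin,
"""


def load_rows(csv_text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def drop_missing(rows, column):
    """Keep only rows that have a non-empty value in `column`."""
    return [row for row in rows if row[column].strip()]


def mean_by_group(rows, group_column, value_column):
    """Return the mean of `value_column` for each value of `group_column`."""
    values = defaultdict(list)
    for row in rows:
        values[row[group_column]].append(float(row[value_column]))
    return {group: sum(v) / len(v) for group, v in values.items()}


def main():
    """Run every step, from raw data to final summary, in code."""
    rows = load_rows(RAW_CSV)
    rows = drop_missing(rows, "temperature_f")
    return mean_by_group(rows, "city", "temperature_f")


if __name__ == "__main__":
    print(main())
```

Because the cleaning step is its own function, anyone reading the repo can see exactly which rows were excluded and why.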

Next, it's organized. Maybe that means you have a subfolder for your results and a subfolder for your data. There's lots of different project structures out there. Maybe it means if your code is really big, there's multiple .py files and each of them kind of does something different and then you bring it all together. And so how this plays out is different for your different projects, but organization. It's documented. You have a readme file, right? Your readme file shows me what you did and why and kind of introduces all of it. I like docstrings because GPT writes me docstrings, right? Much faster. I don't write them on my own, but nobody's expecting me to have docstrings. You know, it's kind of fun.
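To make the organization and documentation points concrete, here is a small Python sketch under stated assumptions: the function name `save_summary`, the `results` folder, and the `summary.csv` file name are hypothetical choices for illustration, not a structure the talk prescribes. The docstring says what the function does, what it takes, and what it returns, and the output goes into its own subfolder.

```python
import tempfile
from pathlib import Path


def save_summary(summary, results_dir):
    """Write a summary table to ``summary.csv`` inside ``results_dir``.

    Args:
        summary: Mapping from group name to a numeric value.
        results_dir: Folder for outputs; created if it does not exist.

    Returns:
        The path of the file that was written.
    """
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    out_path = results_dir / "summary.csv"
    lines = ["group,value"] + [f"{group},{value}" for group, value in summary.items()]
    out_path.write_text("\n".join(lines) + "\n")
    return out_path


def demo():
    """Write a tiny summary into a throwaway folder and return the file's text."""
    with tempfile.TemporaryDirectory() as tmp:
        out_path = save_summary({"Boston": 52.0}, Path(tmp) / "results")
        return out_path.read_text()


if __name__ == "__main__":
    print(demo())
```

In a real repo the results folder would sit next to the code and the data folder; the temporary directory here only keeps the sketch from leaving files behind.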

And finally, it makes sense. And that's harder to sort of describe. That's not like one technical thing, but what I mean is I can understand why you built what you did for the problem you were trying to solve. And maybe you find out midway through, actually, this isn't how I should have solved it. And you can put that in your readme and say, here's what I would do better next time. But really, it makes sense.

Curating your profile

Okay, so when I was going through this, I went to my own GitHub profile and I realized that my pinned repos, which are generally going to be the first thing somebody sees looking at your profile, were not actually the repos that I wanted pinned. Okay? So this is what it looked like last week. It actually looks completely different now. That was a good lesson for me.

So when I say people are going to look at your GitHub profile, I mean it. But generally, they're not going to look at everything. They're going to kind of breeze by, and you want to make it as easy as possible for them to read the story that you want them to read. And that means pinning the repos you want them to see and hiding the repos you don't want them to see. And that's not nefarious. But like, for instance, if there's a repo that's pinned because you worked on it six years ago for a class, and it's a repo you didn't really contribute to, you cloned it, whatever, you added a little stuff, it doesn't really tell your story, then that's not something you want pinned.

What you want pinned, and it can be one repo or up to six, are the things that do the best job of communicating the skills that you're bringing to the table, because that's probably what people are going to click on. I didn't show this, but if you have recent commits, that's the other thing that's going to show up, but I'm less worried about that. Generally, your recent commits are going to be things you want somebody looking at, whereas the things that GitHub decides it wants to pin really may not be. And so you just want to give this a look and make sure that if GitHub is on your resume, that what people are going to see first is what you want them to see. And I tend to ramble a little on that one. So here's the two-point version, which is hide repos you don't want seen, pin repos you do want seen.

Next steps

Okay, so next steps. So, look, if you're not on GitHub, I really recommend signing up. And I think this can be kind of scary, but it really doesn't need to be. You enter your email, you press continue, you begin the adventure. Also, if you're already on GitHub, I recommend looking at your old repos and maybe upgrading a little bit, thinking about what story you're trying to tell about that code, about those communication skills, about everything you're bringing to the table, and really making sure this is a good representation of what you're doing. And again, we're talking about coding skills, we're talking problem-solving approach, we're talking about the tools you're using, and we're talking about your communication. And all of those, you can tell a very good story with a repo.

Thank you. I felt like I would be remiss if I did not include my GitHub profile on this. Also, LinkedIn, Substack, I've written a couple of posts about improving your Jupyter Notebook repositories, if you have them, and also about sort of the larger issue of GitHub as a hiring practice relative to other things. And thank you for listening.

Q&A

We have a couple of questions while Alan sets up. So for some of us, we work on proprietary projects or pipelines, but how do you... Do you have any recommendations on how you can publicly show those skills that you've learned?

Yeah, absolutely. So, like, as quickly as you can. Okay. So if you have a set of skills already, then you want to find a way to sort of represent those skills in a new repo. So if you can't share your own stuff, you make a personal project, you think about what you're trying to communicate, and you just get it out there. Because I've been in that position too, working for the government. So like largely, I can't share the code that I write. But if you already have the skill, like that's where it's generally fairly simple.

And there's some things you're not going to be able to build on your own and share. So if what your skill is, is like, I build robust pipelines for millions of users. That's not a GitHub project, right? Like you can't. But the more junior you are, the more likely it's going to be that there will be a quick personal project that showcases the kinds of skills you're trying to showcase.

How do you make time or prioritize putting personal code and projects on GitHub while coding for your job?

Yeah, look, that's a great question. And one of the things I like about GitHub is that it doesn't need to be, like it's on your schedule, right? And so like job searching can take a lot of time. And like, I don't recommend waiting until you're job searching and then building out your GitHub profile because it does get really time consuming. So I sort of think about two things. It's doing it when you have time, and it's scoping it as narrowly as possible to show the skills you're trying to show.

What's an example of something you've improved on since showing your Ph.D. code and since you started coding publicly?

Guys, it was so bad. My Ph.D. code was so bad. And no one's ever asked, which is great. And if they do, it's been so long that I feel like I can just say no at this point. But yeah, I mean, definitely, this last seven years, for me but also for data science in general, has brought us much closer to software development. And there are areas where like you don't need to be a software developer, really. But being able to write code that doesn't just run on your computer, right? Where I'm thinking about how to write this in a way that's more abstracted.

So like in the last year, I started playing with CI/CD pipelines and unit tests in GitHub. So I have some repos where this is appropriate. So I write tests and every time I push my code and before I merge it in, it runs a series of tests in, I think it's like Python 3, Python 3.9, like different versions, Windows, Linux. And it runs my stuff to make sure that it didn't break. And I guess I don't do that for everything, but it's really nice figuring out whether your code broke like before you merge those changes in and not like, oops, I ran it and it broke. So I like that.
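For a sense of what such a test might look like, here is a minimal Python sketch; it is not taken from the speaker's repos. It uses the standard library's `unittest` module as an assumption (the talk does not say which test framework is used), and the function `normalize_column_name` and the helper `run_tests` are hypothetical.

```python
import unittest


def normalize_column_name(name):
    """Lowercase a column name and replace spaces with underscores."""
    return name.strip().lower().replace(" ", "_")


class TestNormalizeColumnName(unittest.TestCase):
    """Small checks that would catch a breaking change before a merge."""

    def test_replaces_spaces(self):
        self.assertEqual(normalize_column_name("Job Title"), "job_title")

    def test_strips_whitespace(self):
        self.assertEqual(normalize_column_name("  Salary "), "salary")


def run_tests():
    """Run the test case and report whether every test passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNormalizeColumnName)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    print("all tests passed:", run_tests())
```

In a CI setup, a command such as `python -m unittest` would typically run tests like these on every push; which Python versions and operating systems to cover is a configuration choice outside this sketch.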

For humans on the hiring side, can you share your thoughts on how not to introduce bias due to the lack of a GitHub profile?

Totally. So, and this is funny, since I did this talk, I've changed jobs and I'm not allowed to use GitHub in hiring. Okay, so I don't because I'm not allowed. And I can still look at, if you list personal projects, I can still think about that. But yeah, look, I think this is a major issue. And I think we should never penalize people for not having personal projects because there's all different ways you can learn things. So you can learn things at your job. You can learn things a whole range of ways.

For me, when I'm able to use GitHub, it's more about bringing people in who otherwise I would not be able to bring in. Right. So if you haven't gotten that experience at your job because you weren't able to or for whatever reason, but you learned something then in some other context, then I think it's really important that you're still part of the hiring process, that we have a mechanism for saying, okay, you for whatever reason, you didn't get this at your job, but you taught yourself. And I think like I think being able to teach yourself is a really good skill. And I think no matter how you got a skill, we should respect that and use that because you're still bringing that to the table for your next job.

All right, let's thank Abigail for her talk.