Data Science Hangout | Katie Schafer, Beam Dental | Building a Data Science Portfolio


Feb 23, 2022

1:02:14


Transcript

This transcript was generated automatically and may contain errors.

Welcome back to the Data Science Hangout. I hope everyone's having a great week. If you're joining for the first time, welcome. It's great to meet you. I'm Rachel Dempsey, and I'm the host of this Hangout. It's an open space for the whole data science community to connect and chat about data science leadership, questions you're facing, and what's going on in the world of data science.

If you ever want to go back and rewatch prior sessions, or share one with someone who missed it, we have a brand new Data Science Hangout site. I think I can still call it new for a few more weeks. Rob can share that in the Zoom chat right now as well. We really want this to be a space where everybody can participate and we can hear from everyone, so there are three ways you can ask questions. You can jump in live; raising your hand on Zoom might be the best way with a bigger group. You can put questions in the Zoom chat.

Feel free to put a little star after your question if you want me to read it. Maybe you're in a coffee shop or something. We also have a Slido link where you can ask questions anonymously. We really love the dialogue that happens live during these Hangouts, so if you find yourself inspired and have already spoken a few times, consider holding back a bit so others can jump in. We want this to be as inclusive as possible, and making room for others is an important part of that. But again, we love to hear from everyone, no matter your level of experience or area of work.

With that, I'm so happy to be joined by my co-host for today, Katie Schaefer. Katie, I'd love to turn it over to you to introduce yourself and share a bit about the work your team does.

Yeah, absolutely. Well, hi, everyone. It's great to be here. I don't know if anybody else is in the Midwest region, but there's a heavy storm here, so hopefully you're all staying warm if you are. My name is Katie Schaefer, and I work at a company called Beam Dental. We're an insurance company focused on ancillary benefits. I oversee the data science and analytics teams, and we do a huge variety of things across the two.

On the data science front, most of what we do is the more traditional predictive modeling problems and getting those into production, so we handle a lot of the business-facing data science problems. On the analytics team, we do all sorts of things: a lot of analyses, but also a lot of what I like to call data consulting. Folks from the business come to us with key problems that can be answered or illuminated with data, and we work to provide them with additional information and opportunities for process improvement.

From a tool stack perspective, Python's pretty big on the data science team. We also recently moved over to Looker from Tableau, which is exciting. So lots of SQL too.

Open source community and hiring

Nice. I know there'll be a lot of questions coming in, but to open up the discussion: in the Hiring Great Data Science Teams panel that you were part of, we talked a bit about involvement in the open source community and how that impacts hireability. I think it'd be interesting to dive deeper into your perspective on that.

Yeah, I can give a general overview, and then I'm excited to dialogue from there. It's rarer than I might have expected to see heavy involvement in the open source community from applicants, so when I do, I get pretty excited, for a few reasons. From my perspective, it shows a truly self-driven desire for learning, for improvement, and for staying fresh on the tools in the industry, the latest packages, and so on. It speaks to a genuine interest in the job you're applying for, versus just a nine-to-five that somebody's focused on.

One watchout, from my perspective, is that there's obviously a huge range of involvement in open source communities. What's really rich is when I can dialogue with somebody and hear how amazing their experience has been in that open source community and the broader data science community, because that really speaks to their passion for the topic and the job at hand.

Analytics vs. data science roles

Yeah. The delineation between data science and analytics probably looks a little different at every business, so I'll caveat my answer: it's not the one right answer, it's just how I think about it. Data science and analytics are two different but closely related teams, and they work together quite collaboratively.

We have shared lab meetings and shared team meetings for now, though as teams grow, that cohesion can obviously be harder to keep. We also work extremely closely with data management now. In terms of how I think about the differences between analytics and data science, the first thing that comes to mind is the tool stack. Traditionally, analysts are much heavier on SQL and BI skills, so the typical analyst I'm interviewing or hiring is much more likely to have experience with Tableau or Looker than with Shiny.

When I see experience with something like Shiny, I typically also see really strong programming in R or Python, and that often correlates with a deeper statistics background. So our data science team is definitely more focused on Python and R than on traditional BI platforms. SQL is really common across both teams, but it's especially strong among our analysts from a skill set perspective.

In terms of functional problems, I view the analysts as the business's data consultants. They're often tackling vague questions: there might be a trend being observed in the business, and their task is to dig in from a data perspective and find indicators of why that trend is occurring. But they might not get to the level of building a predictive model. It's more extrapolation from visual trends, often with some basic descriptive statistics, and then a heavy amount of data storytelling, which I know can be a general term, but I do think it's a core aspect of what they do.

Then there's also bread-and-butter reporting: how are we doing as a business on our key metrics, like wins and deal sizes, with the staple daily refreshing and tracking of that. Data scientists, on the other hand, will often be working on a very specific problem that the analysts uncovered, usually with a predictive model.
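To make the "bread-and-butter reporting" idea concrete, here is a minimal sketch of a monthly key-metrics rollup (wins, win rate, average won deal size). In practice this would more likely be a SQL query or a Looker explore; the Python below, the record layout, and the sample figures are all hypothetical and only illustrate the calculation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical deal records: (month, outcome, deal_size_usd). Illustrative only.
DEALS = [
    ("2022-01", "won", 12000),
    ("2022-01", "lost", 8000),
    ("2022-01", "won", 20000),
    ("2022-02", "won", 15000),
    ("2022-02", "lost", 5000),
]


def monthly_metrics(deals):
    """Return {month: {"wins", "win_rate", "avg_won_deal"}} in month order."""
    by_month = defaultdict(list)
    for month, outcome, size in deals:
        by_month[month].append((outcome, size))

    report = {}
    for month in sorted(by_month):
        rows = by_month[month]
        won_sizes = [size for outcome, size in rows if outcome == "won"]
        report[month] = {
            "wins": len(won_sizes),
            "win_rate": len(won_sizes) / len(rows),
            # Avoid calling mean() on an empty list when nothing was won.
            "avg_won_deal": mean(won_sizes) if won_sizes else 0.0,
        }
    return report


if __name__ == "__main__":
    for month, metrics in monthly_metrics(DEALS).items():
        print(month, metrics)
```

A daily refresh would simply rerun the same rollup on the latest data; the logic stays descriptive, with no model involved.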

Managing Python and R on the same team

Yeah, sure. Sorry, I was on mute. Katie, you mentioned the data scientists on your team use both Python and R. How do you manage that mix of languages within the team?

That's a good question. Admittedly, I'm the R user, and our two core data scientists are Python users, so I'd say the vast majority of the development and production code is in Python. In terms of how we manage it, it's simpler in practice than I pictured. Both data scientists are really strong in Python and have at least dabbled in R, if not used it in a master's program. So they can read R, and I can read Python.

But if any one of us were going to build something, it would be much quicker to do it in our preferred language. I'm a proponent of the right tool for the job. We haven't tackled most of these yet, but on our roadmap there are certainly problems where I foresee R being the better tool, because there's a more robust set of statistical packages for niche problems.

So I'm a big proponent of the right tool for the job and the right tool for you, but anyone who's fairly strong in one language typically doesn't have problems reading or interpreting the other. I'll often do code reviews or coaching sessions on Python code, and even though I'm not the Python expert, it's pretty quick for us to get on the same page about what I'm trying to convey or what they're trying to convey. Conceptually, most of our working sessions are more about the model we're choosing, the performance metrics, and the features we want to include, and a little less about the specifics of the code.

Choosing the right tool for the project

That's a good question. We actually have an example right now where I'm guiding our senior data analyst and our data scientist on which tool to use. I usually start with the end user of whatever we're outputting. If it's a model, it's a no-brainer: Looker, the BI tool, and SQL are probably not the right tool set.

Right now we have an automation project: at a high level, automated reports that need to get churned out every month with a heavy amount of math and calculations under the hood. So I'm trying to evaluate what the end users' primary needs are. Do they need this to be static, auditable, and traceable? That might lend itself more to Python or R Markdown as the tool set. Or do they need to interact with elements of it, with less need for a static PDF? That might put me more in the Looker camp.


So without going into too much detail, that brings me down the path of asking: wait a second, what tools are really required to meet that end user need? I usually start with the end user and back up from there, and that's the same process the team goes through.
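For the "static, auditable, and traceable" branch of that decision, one common pattern is to stamp each generated report with a fingerprint of the exact inputs that produced it. The sketch below is an assumption about how that could look, not how Beam's reports are built: `render_static_report` and the sample inputs are hypothetical, and it writes Markdown rather than a PDF.

```python
import hashlib
import json
from datetime import datetime, timezone


def render_static_report(title, inputs, results, generated_at=None):
    """Render a Markdown report that records exactly which inputs produced it.

    Returns (report_text, input_sha256). All names here are illustrative.
    """
    # Serialize inputs deterministically so identical data always hashes the same.
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    stamp = (generated_at or datetime.now(timezone.utc)).isoformat()

    lines = [
        f"# {title}",
        "",
        f"- Generated: {stamp}",
        f"- Input SHA-256: {digest}",
        "",
        "| Metric | Value |",
        "| --- | --- |",
    ]
    lines += [f"| {name} | {value} |" for name, value in sorted(results.items())]
    return "\n".join(lines) + "\n", digest


if __name__ == "__main__":
    inputs = {"month": "2022-01", "claims": [120.0, 80.5, 42.25]}
    results = {"total_claims": sum(inputs["claims"]), "n_claims": len(inputs["claims"])}
    report, digest = render_static_report("Monthly report", inputs, results)
    print(report)
```

Because the hash depends only on the inputs, an auditor can rerun the job on archived data and confirm that a given monthly report came from it; an interactive Looker dashboard trades that fixed snapshot for exploration.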

Hiring for data science roles

Yeah, from a data science perspective, the bread-and-butter staple is that you can program, and I primarily focus on R or Python. I would absolutely consider a candidate who is really strong in SAS and had never used R or Python, but the talent pool is already so rich in one or the other. As for which one, I'm agnostic, which might be because I come from being more of an R user, but I think it's hard enough to find good people as it is.

The other thing, based on where we are as a business, is that the data scientist candidates I was looking for definitely needed not just to develop the model, but also to scope it and deliver it back to the business. We certainly lean heavily on myself and the analytics team for that portion, but it was also important to me that the candidate was senior enough to have that exposure.

Yeah, to be honest, it took a while. In terms of where we found them, one ended up being a referral from the other, which is the best situation, and I believe the other just saw the job posting on LinkedIn. But I absolutely got some great candidates, who ended up not being a fit for a variety of reasons, through open source community forums. I posted on R-Ladies and on our local Columbus tech org channel, and I definitely talked to amazing people from those sources. So that's always a first go-to for me: yes, our recruiter puts the job on LinkedIn, but then I take that link and push it out to all the open source communities I'm involved in.

Transitioning industries and selling yourself

Yeah, I have a question. When you're looking at candidates and someone is trying to transition from one industry to another, how do you take that into account when they may not have the most relevant industry experience?

Yeah. Of our whole team, there are probably only one or two people who had insurance experience. Obviously, I favor insurance experience, but again, I find it's pretty hard to find really good people who fit your specific needs at the stage your organization is at. There are a ton of amazing data scientist candidates out there, but starting out fresh, it wasn't the best play for me to hire somebody super junior, because there just wasn't enough coaching time to go around.

That's a great question. Like I said, we've definitely hired people without insurance backgrounds, but what has made them really attractive is some demonstrated ability to learn, and a passion for it. I can teach you the industry. It's much harder for me to teach you foundational statistics principles, or coding and programming.

That's always my approach. What can be a little challenging is if you ask somebody for examples of when they taught themselves a new concept or learned a new process, and they come up blank; that can be a red flag. Coming out of academia myself, I know that can be a tough transition. In reality, if you've got a PhD, you have four to five years of job experience, but unfortunately people often don't look at you that way in the market, and I think that's a big miss.

I think a lot of that is just getting the opportunity to sell yourself. If you can get that first interview and demonstrate how the skills you learned in your academic position will translate (and what better example of grit is there than completing a PhD?), I think you can succeed in the interview process. Where people unfortunately get overlooked is in just getting that first interview.


So if you're coming straight from academia, a lot of it is focusing on the right companies. This is a generalization, and absolutely not true of every major company; I've definitely seen large companies with an appreciation for academic talent. But in general, I think startups and mid-stage growth companies tend to be a bit friendlier to the academic crowd, because there are folks with PhDs there. If you can find a manager you'd be working under who has a PhD, they're going to be more alert to your potential as high talent.

Oh, hello. I'm coming from the mechanical engineering field, where we mostly develop new projects or improve existing ones. Because of that, my work involves gathering data first and then applying it in the engineering work. My problem is that I create and develop these projects, but because of non-disclosure agreements I can't share my work or my data analysis. So I get a lot of work, but exclusively as a freelancer, because without being able to show my projects I can't easily get a job. What would you recommend?

Yeah, I was reading Zach's answer and I love it: is it possible to simulate something akin to the data you would use, and then feature that as a work sample? I'd be curious about that. I do feel like submitting actual prior work as a project portfolio isn't something I've seen a ton in interviews. You need to be able to discuss the work, but I've less often seen that you need to submit it, outside of a work sample they give you as part of the interview process.

But yeah, I really liked Zach's answer about simulating data. If the bigger challenge is, as I suspect, simply describing the projects, that's really tough. Even when you ask me about Beam projects, I want to say something meaningful without being too specific. So honestly, there's probably more help than you realize in having a role-play conversation with yourself about the questions you'd get in an interview, and practicing answers that are more generalized and abstracted.

I would take one of your specific problems. This is also a good tip for academia: nobody really cares about my academic process-outcome therapy research as such, because that's not going to translate well, so I need to translate it into an abstract problem that I tackled with statistics and programming. I wonder if you could do the same here. I don't know your specific problems, and I'd fumble trying to guess, but try to figure out what you're doing in an abstract sense. Am I classifying things? Am I taking an unsupervised approach? Am I optimizing value in some way? Then see if you can verbally describe what you're doing without going into too many details, and practice that answer.

Yeah, I just wanted to mention that I've been in a similar boat my whole career, and it's very tough, because you want to showcase a lot of what you learn on the job. So how can you showcase it? As a follow-up to Zach's comment, there's a package called synthpop (and I'm sure there are plenty of others) that takes your actual data points and simulates them to make a completely synthetic data set. That's one way to take the actual distributions from your real projects and create a clone of the data in a way where it won't be traceable back to the source, and then build a web report, like an R Markdown report or a Shiny app, around that data.
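To illustrate the basic idea of synthesizing data from real distributions, here is a deliberately crude Python sketch. It is not synthpop's method: synthpop (an R package) synthesizes variables sequentially, each conditioned on the ones before it, which is how it preserves relationships between columns. This sketch samples each column independently (numeric columns from a normal distribution with the observed mean and standard deviation, categorical columns by observed frequency), so correlations are lost, a normal draw can produce impossible values such as negative costs, and it offers no formal privacy guarantee. The function name and sample data are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, stdev


def synthesize(rows, n, seed=0):
    """Column-by-column synthesis that preserves only each column's marginal shape.

    rows: list of dicts with identical keys (at least two rows).
    Returns n synthetic rows. Hypothetical helper for illustration.
    """
    rng = random.Random(seed)  # seeded so the synthetic set is reproducible
    columns = list(rows[0])
    samplers = {}
    for col in columns:
        values = [r[col] for r in rows]
        if all(isinstance(v, (int, float)) for v in values):
            mu, sd = mean(values), stdev(values)
            # Default arguments bind this column's parameters (avoids late binding).
            samplers[col] = lambda mu=mu, sd=sd: rng.gauss(mu, sd)
        else:
            counts = Counter(values)
            cats, weights = list(counts), list(counts.values())
            samplers[col] = lambda cats=cats, weights=weights: rng.choices(cats, weights)[0]
    return [{col: samplers[col]() for col in columns} for _ in range(n)]


if __name__ == "__main__":
    real = [
        {"plan": "gold", "cost": 100.0},
        {"plan": "basic", "cost": 40.0},
        {"plan": "gold", "cost": 120.0},
        {"plan": "basic", "cost": 55.0},
    ]
    for row in synthesize(real, 3, seed=42):
        print(row)
```

A tool that models conditional relationships, as synthpop does, is the better choice for anything beyond a quick demo data set.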

At my current job, when I was interviewing, the team had worked around data their whole careers, but not really with programmers, so they were very familiar with tools like Tableau and Looker. They didn't ask me to, but I spent maybe four or five hours on a weekend building a Shiny app that I deployed for free to shinyapps.io. I styled the whole app with elements consistent with the company's logo and branding, so they could see what a really clean-looking Shiny app looks like. I don't attribute getting the job to that alone, but when I showed it to the interviewers, they were all super impressed.

Yeah, mainly it was styling a web app with the interviewing company's entire logo set and color theme. It was all controlled through bslib, so I could drive the styling from a custom Sass/CSS file. If I were actively looking for jobs and interviewing again and again, the ability to iterate and swap out the entire theme is so easy: once you have a template, it can be done in five or ten minutes. So for anyone trying to get a job who's interviewing with a lot of companies, this could be an easy way to have something living on your personal shinyapps.io account. Whenever you interview with a new company, change the link, change the color theme, and showcase it to your interviewers.
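bslib itself is an R package, so the sketch below only illustrates the same "one template, swap the palette" idea in Python. It writes plain CSS custom properties rather than bslib's Sass variables, and the template, selectors, and palette values are all hypothetical; per company, only the palette dictionary changes.

```python
from string import Template

# One template, many brands: only the palette dict changes per interview.
CSS_TEMPLATE = Template(
    """:root {
  --brand-primary: $primary;
  --brand-bg: $background;
  --brand-fg: $foreground;
  --brand-font: $font;
}
body { background: var(--brand-bg); color: var(--brand-fg); font-family: var(--brand-font); }
.navbar, .btn-primary { background-color: var(--brand-primary); }
"""
)


def render_theme(palette):
    """Fill the template; Template.substitute raises KeyError if a key is missing."""
    return CSS_TEMPLATE.substitute(palette)


if __name__ == "__main__":
    # Hypothetical palette for one company; swap these values for the next interview.
    acme = {
        "primary": "#1a73e8",
        "background": "#ffffff",
        "foreground": "#202124",
        "font": "Lato, sans-serif",
    }
    print(render_theme(acme))
```

Using `substitute` rather than `safe_substitute` is a deliberate choice: a forgotten brand color fails loudly instead of leaving a stray placeholder in the stylesheet.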

Speaking at meetups and building your network

Yeah, absolutely. Things have obviously been a little different in the past two years, but some of my richest connections have come from speaking. It's one of those things that's hard to force yourself to do, but you get incrementally better the more you're exposed to it. It helps in your job too, when you present to stakeholders and other non-technical users, and I've found it super accretive for building up my own network.

R-Ladies, for me, is a great starting point for that. Our R-Ladies Columbus chapter is slacking a little: a few of us had kids in the last few months, so we're slower than usual. But R-Ladies has chapters in pretty much every city. And one of the positives of our current situation is that a lot of these meetups have gone virtual, so if there's no R-Ladies chapter or other meetup in your city, it's surprisingly easy to go online and find one you can join.

Yeah, I would say start with a comfortable one. I know people are pretty hesitant to speak at our R-Ladies meetup, so we try to have people who were hesitant become advocates for other hesitant speakers once they've spoken. If you're interested, we're always looking for guest speakers at R-Ladies Columbus, so feel free to hit me up. But start with a small, comfortable group where it's low pressure. Sometimes the presentations are just walking through code; it doesn't need to be super formal.

Resources and recommendations

Oh man, there was a series of videos on R-bloggers called 14 Hours of Machine Learning, with the slides and lectures from Hastie and Tibshirani for their book, something like Intro to ML. I'll try to send it to you afterwards. The links to all the free videos seem to have been taken down somewhere and I can't find them anymore, but I will look for it. I think it's the best intro to classic stats problems like resampling.

I'm a big fan of all the RStudio articles out there. For example, the whole tidymodels suite on their website really lets you get up and running super quickly, and then of course there are meetups; between those two, I think you can learn a lot. Hopefully that was a quick but informative answer, but I just love that Hastie and Tibshirani resource for the fundamentals.
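Resampling, one of the classic topics mentioned above, is easy to demonstrate with the bootstrap. The sketch below computes a percentile bootstrap confidence interval for a sample mean using only the Python standard library. The function name, defaults, and data are illustrative assumptions, not taken from the lectures or from tidymodels.

```python
import random
from statistics import mean


def bootstrap_ci(data, stat=mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data).

    Resamples the data with replacement n_resamples times, recomputes the
    statistic on each resample, and reads off the alpha/2 and 1 - alpha/2
    percentiles of those estimates.
    """
    rng = random.Random(seed)
    n = len(data)
    # Each resample has the same size as the original data, drawn with replacement.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    lo_idx = round((alpha / 2) * n_resamples)
    hi_idx = min(round((1 - alpha / 2) * n_resamples) - 1, n_resamples - 1)
    return estimates[lo_idx], estimates[hi_idx]


if __name__ == "__main__":
    sample = [2.1, 3.4, 1.9, 5.6, 4.4, 3.3, 2.8, 3.9, 4.1, 2.5]
    low, high = bootstrap_ci(sample)
    print(f"mean={mean(sample):.2f}, 95% CI approx ({low:.2f}, {high:.2f})")
```

The appeal of the bootstrap is that the same loop works for statistics without a convenient closed-form standard error (pass `statistics.median` as `stat`, for example).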

Thank you so much, Katie, for jumping on, sharing your insights with us, and answering all these questions. If people have follow-up questions or want to connect with you, is Twitter or LinkedIn the best way?

LinkedIn is probably the best source. I'm on Twitter, but apart from when I go to a conference, I'm not the best Twitter steward.

Featured software