Data-driven people analytics | Josh VanderLeest | Data Science Hangout
Transcript
This transcript was generated automatically and may contain errors.
Welcome back to the Data Science Hangout, everyone. If we haven't met, my name is Libby. I'm a Community Manager with Posit. I help host the Data Science Hangout and foster our Hangout community. And if you're not familiar with Posit, Posit builds enterprise solutions and open source tools for people who do data science with R and Python. And we are also the company formerly called RStudio.
I am joined by the creator of the of the Hangout and my lovely co-host, Rachel. Hi, everybody. I'm Rachel, as Libby said. And I'm actually usually, well, I'm usually in Boston, but I'm in Minneapolis this week for our company-wide workweek. So Libby will be our main host today, but I'll be helping out behind the scenes here.
Yeah. The Hangout is our open space where we hear what's going on in the world of data science across all of our different industries. We chat about data science leadership. We connect with other people that are facing similar things to us. And we get together every Thursday, same time, same place, with very few exceptions.
If you are watching a recording on YouTube and you want to join us in the future, we would really love to have you here live because you get to ask questions that way. Just as a call out to anybody who is adding it to their calendar right now, make sure it adds for the right time. It's 12 p.m. Eastern time.
Alrighty. I wanted to thank everybody for making this a friendly and welcoming place. You have all made it that way, and we are committed to keeping it that way. If you have feedback about your experience today, whether it's good or bad, we want to hear from you, and there's going to be a survey that makes that really easy.
At the Hangout, we love hearing from you. The Hangout is a community discussion. This is not a presentation, a slide deck. This is a community discussion. We want to hear from you, no matter your level of experience, your title, your industry, what language you work in, or whether or not you even use a coding language. And we also really encourage you to connect with each other in the chat.
There are three ways to jump in and ask a question today. Because this is a community-led discussion, we will not have questions to ask our wonderful co-host guest here today unless we all jump in and ask them together. So you can raise your hand on Zoom. You can put questions in the Zoom chat. There's also going to be a Slido link where you can ask a question anonymously, and we would love to have you do that as well.
With that, I am so excited to be joined by our co-host today, Josh VanderLeest, a manager of data analytics and people analytics at Progressive Insurance. Josh, it's so great to have you. We would love to have you introduce yourself and tell us a little bit about what you do for fun.
Josh's background and journey into analytics
Sure, yeah. What I do for fun? Well, yeah. So hi, everyone. I'm Josh. I'm based in Cleveland, Ohio. I work for Progressive Insurance. Been here for seven and a half, coming on eight years now. I'm in the people analytics space, but sometimes that means data engineering, data science, analytics, consulting, all sorts of things that I can get into. Something I do for fun: if you've never been to Cleveland or the surrounding area, you might not know that we have one of the best park systems and a national park nearby. So I like to hike, find new waterfalls and new things to see and do.
So I went to undergrad in Michigan and initially I was focused just on psychology. I wanted to be a therapist or psychologist, something around there, like an internship around that space. And turns out I don't like that at all. So I took a very different route and I was like, well, I think I like stats. I like cool charts and things. And so I got more and more into the statistics, cognitive psychology. I started working for a consulting firm called Center for Social Research. And so we were just doing really any kind of community-based research, understanding effects of the watershed, doing surveying for local communities on effective learning, just kind of all sorts of things.
And I stumbled on something called, and this is so wordy, industrial-organizational psychology. So if you know people analytics, hopefully you know IO psych for short. IO psychology, while kind of small, is growing rapidly. If you look it up, you usually see it ranked as, like, the number one career in the social sciences and things like that. So it's a growing space. If you haven't seen it before, give IO psychology a Google. You'll probably end up on the SIOP website, which is the Society for Industrial-Organizational Psychology.
So I ended up going to graduate school for IO psychology. And while I was there, I continued to really like consulting for companies. That's how I ended up in Ohio. I moved to University of Akron for my PhD in IO psych. And I didn't love the theory. I really liked the practical stuff. I liked writing code. I liked creating impactful reports that people actually used. And I ended up leaving after getting my master's.
And I ended up finding, well, it was a data analyst role at Progressive. Back then we called it human capital analytics, but I thought that was just a terrible name, so we did end up changing it to people analytics. And starting out, it was around just report requests, like, hey, I want to know what turnover is in IT, or, hey, we need this stat for this report.
But slowly kind of grew and developed. When I joined the team, there were four, five of us, and then now there are 18. And I started as an analyst, and now I sort of carved out a niche more in the data science side of things, research and analysis. I lead a team of three, but a majority of my day is individual contributor work, developing, reporting solutions, applications, using R, Python, and all that kind of stuff.
And I'm just so grateful that I am still in this space. I love what I do. I get to do all sorts of things. I think that's what's fun about people analytics is it touches so many different areas of the business. Even at Progressive Insurance, where we have 70,000 employees, I get to do everything, all people data, surveying, recruiting, hiring, employee retention, I get to do it all.
What people analytics teams work on
In a larger company, you know, 70,000 people at Progressive, and at other companies like us, you usually have multiple functions within people analytics. So I say I work in people analytics, but I'm one of four teams. We're all small teams, but we have business intelligence, business intelligence developers. These are the people really focused on creating data solutions, sometimes building apps. Right, we have data coming from all these different source tools. How do we get them into Snowflake or whatever database you're all using?
We have data governance: definitions, making sure people have what they need, making sure the right people get to what they need. And then reporting solutions. This is our largest team, still not me. They are creating reports that any HR-related function might need. So, you know, we have maybe 300 to 400 HR consultants out in the field supporting IT, CRM, contact centers, claims, right?
And then me. I'm that last one, which we call research and analysis. So think data science. We do all employee surveying and then support AI solutions. Trying not to be too wordy, you know, trying to make it practical: research for decision-making. Hey, we want to change the way we give people PTO. How can you help us predict the impact of that? Employee surveying, legal support when it's needed, and data-driven DEI. Is it working? How do we do it? How is it effective? How do we integrate it?
An example of a project: surveying is a fun one. So with the rise of generative AI, suddenly it's become a whole lot easier to summarize huge amounts of comments. So, right, we just had a survey go out to everyone about engagement and culture. How can I summarize, call it a hundred thousand comments, for our CEO to get some kind of understanding of, oh, this is what people are talking about, and here's how it looks different from past years? So summarizing and reporting on surveys using traditional analytics, Quarto, Shiny apps, but then also using the more hip (not sure if it's useful or not yet) generative AI to summarize large swaths of comments.
And then, yeah, the turnover is a good example. We have a project trying to better use hiring resume data to understand where people might fit best at Progressive. And then also try to understand, hey, who's most at risk of being unhappy or turning over earlier in their career and kind of improving the onboarding experience for folks.
From NLP to LLMs for survey analysis
So the survey comments, I mean, great question. So we still do some of it. Before LLMs, we did, we're still doing what we would call NLP, natural language processing, but it was at different levels of sophistication. Simple as word counts. What are the words mentioned in these surveys for each question? We would do bigrams, trigrams, which just means like, what are two words that are being mentioned next to each other the most often? It was a lot of like data wrangling to get the data in some kind of interesting cleaned up way.
And then think about single word, bigram, trigram. That was, I think it's called tidytext, the R package. Still a really useful approach and not something we've just thrown away. And so then you have word counts, and you have sentiment that you can just pull out of sentences or words based on huge data sets of, like, this word tends to be negative or tends to be positive.
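The word-count and bigram approach described here came from the tidytext R package. As a rough illustration of the same counting idea in Python, here is a minimal sketch; the tokenizer, stopword list, and sample comments are all made up for the example:

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase a comment and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def ngram_counts(comments, n=2,
                 stopwords=frozenset({"the", "a", "to", "is", "and", "of"})):
    """Count n-grams (word pairs for n=2, triples for n=3) across all
    comments, skipping n-grams made up entirely of stopwords."""
    counts = Counter()
    for comment in comments:
        words = tokenize(comment)
        for i in range(len(words) - n + 1):
            gram = tuple(words[i:i + n])
            if not all(w in stopwords for w in gram):
                counts[gram] += 1
    return counts

# Illustrative survey comments, not real data
comments = [
    "Work from home flexibility is great",
    "I value the flexibility to work from home",
    "Onboarding was confusing at first",
]
top = ngram_counts(comments, n=2).most_common(3)
```

Passing `n=3` gives trigrams; a real pipeline would also use a proper stopword list and stemming or lemmatization, which tidytext wires up for you in R.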
The one other thing we would do was structural topic modeling. So there's an amazing R package called stm. I still truly think it's one of the best approaches to summarizing big amounts of comments. If you haven't gotten to the LLM space yet, it's still maybe even better than LLMs, but very intimidating to start working with. So it was a combination of tidytext from Julia Silge and team and then stm to do structural topic modeling.
Structural topic modeling is similar to LDA. If you've been in that space before, it's sort of just take all the texts, take all the comments and look at in a vector space, what are the words that tend to be mentioned near each other? So people that are talking about mentioned the word flexible and work remote and pick up my kid from school. Like those are all similar and words that are mentioned in similar spaces. And so STM is really good at picking up those relationships and creating, hey, we think you have 80 topics in this corpus and here are the top words mentioned for each of those 80 topics.
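stm itself is an R package and does far more than this, but the underlying "words mentioned near each other" signal it exploits can be illustrated with a crude co-occurrence count in Python. This is purely a sketch of that signal, not structural topic modeling; the comments and length cutoff are invented:

```python
from collections import Counter
from itertools import combinations
import re

def co_occurrence(comments, min_len=4):
    """Count how often each pair of (longer) words appears in the same
    comment. Words that repeatedly co-occur (flexible/remote/schedule)
    are the raw signal that topic models like STM or LDA build on."""
    pair_counts = Counter()
    for comment in comments:
        # unique longer words per comment, sorted so pairs are canonical
        words = sorted(set(w for w in re.findall(r"[a-z]+", comment.lower())
                           if len(w) >= min_len))
        for a, b in combinations(words, 2):
            pair_counts[(a, b)] += 1
    return pair_counts

# Illustrative comments echoing the flexibility example above
comments = [
    "flexible schedule lets me pick up my kid from school",
    "remote work keeps my schedule flexible",
    "flexible hours and remote options",
    "the cafeteria food could be better",
]
pairs = co_occurrence(comments)
```

A real topic model then clusters these co-occurrence patterns into topics and, in STM's case, relates topic prevalence to document covariates such as department or tenure.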
Yeah. Topic modeling. So we still do that sometimes, but to be honest, I find large language models are pretty good at coming up with those 80 topics and very good at the feature engineering. Hey, I have this comment, please come up with the right topics that are represented in it. So that was the before. And the after isn't to throw everything away and use LLMs for every step, but to use them where it makes sense. There's obviously the challenge of, hey, every API call is costing Progressive money. How do we do this responsibly and efficiently? And most importantly, keep it as accurate as before, if not better.
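One common way to control the per-call cost mentioned here is batching many comments into each prompt and caching results so reruns don't re-bill. This is a hypothetical sketch only: `call_llm` is a placeholder stand-in for whatever approved client your company uses, and the batch size is arbitrary:

```python
import hashlib
import json

def call_llm(prompt):
    """Hypothetical stand-in for a real LLM API call; in practice this
    would wrap your company's approved client and model."""
    return f"[summary of {prompt.count(chr(10)) + 1} lines]"

class CachedSummarizer:
    """Group comments into one prompt per batch and cache by content
    hash, so identical batches never trigger a second billed call."""
    def __init__(self, batch_size=50):
        self.batch_size = batch_size
        self.cache = {}

    def summarize(self, comments):
        summaries = []
        for i in range(0, len(comments), self.batch_size):
            batch = comments[i:i + self.batch_size]
            key = hashlib.sha256(json.dumps(batch).encode()).hexdigest()
            if key not in self.cache:  # only pay for unseen batches
                prompt = ("Summarize the themes in these comments:\n"
                          + "\n".join(batch))
                self.cache[key] = call_llm(prompt)
            summaries.append(self.cache[key])
        return summaries
```

Accuracy still has to be checked separately, for example by spot-comparing LLM topic labels against a sample that was hand-coded or run through the older topic-model pipeline.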
Resources for learning people analytics
My favorite resource that I send to pretty much everyone that asks about people analytics is called peopleanalytics-regression-book.org. The author is Keith McNulty. And if you're a LinkedIn person, he's great on there. Keith McNulty. He's one of my favorites on LinkedIn. And what I appreciate about him is he's pretty agnostic about approach. You know, you don't have to use R, you don't have to use Python. He's got examples everywhere.
The other thing is, if you're really into people analytics, that's the place to start. But honestly, I usually am just sending folks the SIOP website too. So siop.org. I know it's not exactly people analytics, but it's people analytics adjacent and I went to school for it. So, you know, I'm biased. What I like about SIOP is they have meetups, similar to the Posit ones coming up, where it's industry research and consulting. And I love seeing where those do or do not intersect.
Impactful workplace changes from people analytics
When I started years ago, I looked at how Progressive does people development and onboarding. There was work to do. And to me, the biggest piece that was missing is: when we're in person, onboarding is fine, because you know who to talk to. When we all went virtual, we really lost that part of onboarding, making connections, finding people like you.
We had to rethink the onboarding process at Progressive. It no longer worked to just like, here's your computer. Enjoy. Like I'll give you a project in a week. And so, you know, I'm in people analytics. I'm not on the ground, like the HR rep, making sure things are going well. I'm a couple steps removed, but what I saw was we don't have a good grasp of what's going well and what's going poorly.
A more recent people analytics insight that made a difference, and this is one of my proudest accomplishments of last year: we introduced our very first onboarding survey program, where we now check in with you several times through your first year and just ask if you have what you need and how we can help you. And I know it sounds lame, like, really, another survey? But I promise you this one is great. And what we've seen come from it is our leaders seeing, oh, this thing is consistently being mentioned as a pain point. And it makes it really easy to put in interventions and say, let's change the way we do this.
Let's come up with a buddy system for people in their first year. Let's find them a cohort where they can meet every week. And so what I think is really useful about the way we've approached this is it can't be one-size-fits-all at Progressive, because our jobs are so distinct by function. Like I said, we have people going to body shops, people getting their cars and going places, some people taking calls, sales reps, product managers. The onboarding experience is so unique and different.
Attrition modeling and working with stakeholders
We have the same challenge. So we, like, my team, we're in corporate HR. So I think of us as enterprise-wide, and we have that same challenge of, well, I can create a really sophisticated, validated attrition model, but I'm not the one in CRM on the floor doing the interventions. We have counterparts out in the business, right?
I run into this exact issue: oh, these three groups came up with their own turnover model and it doesn't seem like anyone's really validating them. How do we ensure they're of high caliber? There's only so much I can do. So we think of ourselves a little bit as giving them the right tools, because they may not have the technical expertise that we do. So for example, we've created a suite of, here's the toolkit we suggest you use to do attrition modeling.
For now, it's a toolkit of here are the things you should be thinking about, here's the way we recommend modeling attrition. Because I think a lot of people don't currently really understand the right way to do it. And if you don't, you might be doing a really bad job and not realize it. For example, it's easy to use something like logistic regression for something that actually has a big time component to it, and now you've just modeled time rather than actually modeling the event. So things such as: here's how you can use survival analysis, here's how you handle time-dependent and time-independent covariates. And suddenly you have a really predictive, really solid model.
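As a sketch of why survival analysis fits attrition better than plain logistic regression: it handles censoring, meaning employees who simply haven't left yet. In practice you would reach for the survival package in R or lifelines in Python; below is a minimal hand-rolled Kaplan-Meier estimator just to show the mechanics, with invented tenure numbers:

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve estimate.
    durations: tenure (e.g., months) for each employee
    observed:  True if they actually left (the event occurred),
               False if still employed at the study cutoff (censored).
    Returns [(time, survival_probability), ...] at each event time."""
    at_risk = len(durations)
    surv = 1.0
    curve = []
    events = sorted(zip(durations, observed))
    i = 0
    while i < len(events):
        t = events[i][0]
        deaths = 0
        n = at_risk  # number still at risk entering time t
        while i < len(events) and events[i][0] == t:
            if events[i][1]:
                deaths += 1
            at_risk -= 1  # both leavers and censored exit the risk set
            i += 1
        if deaths:
            surv *= (n - deaths) / n
            curve.append((t, surv))
    return curve

# Two leavers at 6 and 12 months, two still employed (censored)
curve = kaplan_meier([6, 12, 12, 18], [True, True, False, False])
```

A logistic model on "left within a year: yes/no" would have to either drop or mislabel the censored employees; the survival formulation keeps them in the risk set for exactly as long as they were observed.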
None of these models are gonna be perfect. There's so much individual difference that you don't have data to back up. And we also don't want to be big brother-ish and like, hey, we saw you weren't sending enough emails or whatever. In our case, even simple things seem to be helping. So use a somewhat simple survival analysis model, predict your high risk folks, and then just have HR set up a check-in with them.
Progressive really focuses on transparent leadership, meaning I feel like I can actually trust the CEO and talk to them. And so even that skip level, we found, using that as an intervention tool, using the model that I collaborated on with claims, for example, has been effective at reducing some of our high-risk folks. But the tricky part there is how do you keep it simple enough to actually validate? Is that actually helping or not?
And that's, to me, the final step: validating a model and validating intervention effectiveness. It's really hard to get buy-in. Like, why should I care? I used it. It's probably working. And I'm like, oh, come on, we don't know if it really is or not. So getting buy-in on that last step of model validation is really hard outside of the data science community. It's just not something people can really resonate with.
Ethical concerns with AI and employee data
The AI space is something newer to the people analytics team. So I'm on the responsible AI cross-functional team at Progressive, which is honestly a ton of fun. So they have really been forward thinking on like making sure folks are doing this responsibly. We have a cross-functional team that even has an HR perspective where we review every single AI model we deploy at Progressive. Every single one we have listed, we rate it for risk and rate it for how it's being used. And is it responsible and ethical? And we better always say yes.
The other thing to get to, again, before I actually answer it, is there's this really important distinction happening between models that a company deploys and models your vendor deploys. More and more vendors are trying to squeeze their way in and do the AI for you. And they'll say, don't worry about the responsible part, we got this, it's responsible. I would say if they can't give you an explainable AI model, don't do business with them. And I know it's not always your decision, but at least push back and say, we need to understand how your model is validated and run, where it's hosted. Is it in the US? Is it following regulations? How does it make decisions?
And even if it's a black box, at least in the employee space, you can still do the very traditional adverse impact analysis on the decisions it makes. You can always do the very simple 80% rule and a quick p-value on the decisions the system is making. So now, to the part that actually answers it: how do we responsibly deploy AI with employee data? I'm not convinced we need AI for everything, which is a weird thing for me to say, because that seems to be all I do lately. But I think a lot of us are jumping way over the good, easy-to-get, well-defined data straight to the AI, and then we're just missing all this stuff.
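The 80% (four-fifths) rule and a quick p-value mentioned here can be computed with nothing beyond the standard library. This sketch uses a pooled two-proportion z-test for the p-value; the group labels and counts are illustrative:

```python
import math

def adverse_impact(selected_a, total_a, selected_b, total_b):
    """Check the four-fifths (80%) rule on selection rates for two
    groups, plus a two-sided pooled two-proportion z-test p-value."""
    rate_a = selected_a / total_a
    rate_b = selected_b / total_b
    # Four-fifths rule: lower selection rate must be >= 80% of the higher
    ratio = min(rate_a, rate_b) / max(rate_a, rate_b)
    # Pooled two-proportion z-test
    p = (selected_a + selected_b) / (total_a + total_b)
    se = math.sqrt(p * (1 - p) * (1 / total_a + 1 / total_b))
    z = (rate_a - rate_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail prob.
    return {"ratio": ratio, "passes_80pct": ratio >= 0.8, "p_value": p_value}

# Hypothetical counts: 30/100 selected in group A vs. 60/100 in group B
result = adverse_impact(30, 100, 60, 100)
```

A selection-rate ratio of 0.5 fails the 80% rule, and with these sample sizes the difference is also statistically significant, which is the kind of flag that would trigger a closer review of a vendor's black-box system.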
And then you end up with a model that's poorly predictive. You can't explain it and it's not being validated. So you take steps back and say, do you have well-documented data of high quality? Are you doing the basic reporting and analytics? And then maybe you have that last step of, okay, let's use AI to drive decisions. And still, I bet you could just use a linear model, get 96% of the way there, have it be much more explainable, and you'll feel better about it.
LLMs in the hiring process
I think you mentioned comparing people's hiring resumes to fit in their jobs. I'm not convinced it's all that useful yet. When you think about the data that you've actually sent in when you apply for a job, it's a resume. I know what you've done, types of experience, how long you were there, et cetera. It doesn't really give me that much information. And, you know, there was this era in IO psych where we were all in on biodata, which is essentially everything you could pull out about a person, and it only gets you so far.
And it really doesn't carve out enough to predict, oh, that person's going to do great in the job. So I'm still actually kind of a, call it, traditionalist on how we find the right candidate, where I still think it's resume screen just for min-qual types of things, and then really having the right assessments throughout the hiring funnel. I still think a bit more traditional assessments that measure the right cognitive abilities, personality fit, and job fit don't need fancy AI.
Don't just give me a personality assessment. Give me a half an hour experience of actually doing that job. And it rates me along the way. Now I know, oh, that job is not what I thought it was going to be. I don't think I want to do that. And now you have some data to know if I'd be good at that job anyways. So I am all in on a valid predictive assessment. That's a pretty good experience for the candidate and gives them a realistic job preview along the way.
For me, where we've looked into that is more like using LLMs for feature engineering. Hey, we want to look at candidates that have this type of experience because now they're at Progressive and we can send them, hey, we think that this might be a good next step for you. So that's where I think they're more useful for now is feature engineering, pull out types of experience, lengths of experience, those kinds of things for more formative and development, less around selection decisions. I would not use LLMs in selection decisions or anything like that. I don't trust those types of things yet.
A/B testing and experimentation in people analytics
I only have so much influence. So I have this challenge where sometimes my HR partners will be like, hey, we implemented a thing. Let us know if that worked. I'm like, oh, if you had told me before, we could have done an A/B test, right? So often there are just process changes, and they're happening so often there's just no way to do an A/B test. So practically, not usually.
Progressive senior leadership highly values data. And I can even tell them they're wrong. If I have data to prove it, they will believe me and change things, which I think is amazing. There are places we do A-B testing. It's usually smaller and maybe less risky. We do these inclusion quarterlies. We call it inclusion quarterly. It's just about different kinds of things. And so the question was, how can we encourage participation? And so we did an A-B test around our approach to encourage people to go to the event and get engaged. So in this case, our experimental condition, we put a block on everyone's calendars. Not that they had to go to the event, but we put a block there so that no one could put meetings on that day.
I think companies in general undervalue the existing research in this space. We already know what works and what doesn't in onboarding, in development, in performance. We know a lot of this stuff; big companies are just stubborn and think we need to do the research ourselves. So I honestly think A/B testing is not always worth it, because I know the answer, they're just not listening. So it's more about giving a compelling story and effectively looking at external research before you bother doing A/B testing in that space.
And the other thing I want to mention is that sometimes, practically speaking, it's just not possible. So you have to really think about the right way to do A/B testing. For example, if I just did a pure random sample, now some people on a team are experiencing the thing and some aren't, and everyone talks to each other. So there's a lot of that bleed if you don't think about A/B testing in the right hierarchical way: making sure you're doing it with groups that are comparable, that probably aren't actually interacting, and still doing it responsibly so you're not affecting some group negatively.
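The "bleed" problem described here is usually addressed by randomizing at the cluster (team) level instead of the individual level, so coworkers who talk every day share a condition. A minimal sketch, with made-up employee and team IDs:

```python
import random

def cluster_randomize(employees, seed=42):
    """Assign the experimental condition at the team level rather than
    per person, reducing contamination ('bleed') between conditions.
    employees: list of (employee_id, team_id) tuples.
    Returns {employee_id: 'treatment' | 'control'}."""
    teams = sorted({team for _, team in employees})
    rng = random.Random(seed)   # fixed seed keeps the assignment reproducible
    rng.shuffle(teams)
    treated_teams = set(teams[: len(teams) // 2])
    return {emp: ("treatment" if team in treated_teams else "control")
            for emp, team in employees}

# Hypothetical roster: 20 employees spread across 4 teams
roster = [(i, i % 4) for i in range(20)]
assignment = cluster_randomize(roster)
```

The trade-off is statistical: because outcomes within a team are correlated, clustered designs need more participants than individual randomization for the same power, which is part of why he notes it is not always worth doing.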
Career advice
I have a boring career. I started at Progressive in one department and never left, because I like it too much. So I have smaller things, not some big here's-how-to-rethink-your-career advice. One is, for your own year-in, year-out performance, keep a brag sheet, a brag log. The concept is basically: throughout the year, you're writing code, you're working with people; document what worked really well, so that when you want that promotion, you have your data-driven documentation of why you deserve it.
I think some of the most brilliant people do such a terrible job advocating for themselves. Brag about yourself. So document every year, set a calendar invite once a month, once a week where you give yourself 15 minutes to write down, here's what I did really well this week. It could be small, it could be big, effective PR or pull requests. You found code that was wrong. You implemented the new onboarding program, whatever. Brag sheet.
I put it in my day, a half-hour recurring, every Friday at 10 a.m., and I just label it. That's what I do. It doesn't have to be fancy. I just write a couple of things in an ugly formatted OneNote, or whatever you use at your company, and then by the end of the year, boom, you've got it ready.
Thank you so much, Josh, for joining us and imparting all of your people analytics wisdom with us. We had so many questions left over that we could not ask them all. Maybe you and I can get together after this, and we can rapid-fire get some answers. Sure, I'd be happy to. This was a lot of fun. Thank you, everyone, for the participation. It was an absolute honor. Thank you.
Absolutely. Thank you, everybody, for spending your lunchtime with us or your breakfast with us. We hope that you enjoyed it. When you close this, there's going to be a survey to kind of give us feedback or ask us questions. Please, please, please do that. Your feedback means so much to us, and we will see you again next week on Thursday. Have a wonderful week.
