Resources

Why regression still matters | Keith McNulty | Data Science Hangout

video
Jun 19, 2025
55:38

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

Hey there, welcome to the Posit Data Science Hangout. I'm Libby Heeren, and this is a recording of our weekly community call that happens every Thursday at 12 p.m. U.S. Eastern Time. If you're not joining us live, you're missing out on the amazing chat that goes on. So find the link in the description where you can add our call to your calendar and come hang out with the most supportive, friendly, and funny data community you'll ever experience.

I'm just really super excited to introduce our featured leader today, Keith McNulty, Analytics Leader at McKinsey & Company. Keith, welcome. Will you tell us a little bit about yourself, what you do, and what you like to do for fun?

Yeah, thanks, Libby, and thank you guys for inviting me. It's a lot of fun to spend an hour speaking with such a diverse and interesting community here. Yeah, my name is Keith McNulty. I navigated a path into the data science world in what's probably a slightly unusual way. I think nowadays people are coming into data science having done some sort of data science-related qualification, whereas I started out as a mathematician. I then went into management consulting, and then moved into psychometrics, which is a fascinating field that I'm still very involved in: the study of employee behavior and how it affects the workplace. And then about maybe seven years ago, I caught the data science bug and decided to retrain myself as a data scientist, and built a team at my employer, which is now a fairly large team that focuses on the applications of analytics to solving questions related to people and talent.

And in terms of what I like to do for fun, I mean, I just love engaging with the open source community. One of the reasons I've managed to have a lot of success in my current career is the contributions of a lot of open source individuals, and the fact that my team and I have been able to take advantage of that. So I believe a lot in giving back, and I spend a lot of time developing materials and publishing open source books and things like that to make sure that the things I've learned are passed on to other people so they can take advantage of them as well. And I think, for me, one of the things that attracts me to working in this space is the amazing open source community and the way people help each other.

Keith's path into data science and LinkedIn presence

I first learned about you via LinkedIn. You're very active on LinkedIn. Were you that active before you were in data science? Were you like already out there spreading knowledge, writing books, doing all the things that you do?

I certainly wasn't very active before I got involved in data science. I think part of that is because my involvement in data science coincided with when LinkedIn became more of a social network. But certainly there was this very exciting period on LinkedIn, probably between 2016 and 2018, where lots of people were discovering a lot of this tooling for the first time. And there was a huge amount of sharing of open source knowledge, and a lot of that happened on Twitter as well during that period. And so one of the things I try to do is keep that going. I think, unfortunately, LinkedIn has become a bit more salesy, and its culture has changed a little bit since that time. But I certainly try personally to keep that spirit going, and to make sure that what I share there is useful to people and is something they can take away with them and potentially use themselves.

Learning resources and self-retraining

So, hi, Keith. Thanks for sharing your information today. I was just wondering if you could maybe talk a little bit about your most favorite resources that you used while retraining yourself as a data scientist. Things that seem to be the most helpful or most enjoyable, engaging.

It's a great question, Tony, and I'm not sure my answer would necessarily align with how people would retrain themselves today. The reason I say that is because I retrained in 2016, and there was a significant limit to the amount of public information that was available at that time in terms of example code, or how you might achieve a particular task with your code. And what I found was that the learning during that period came from a huge amount of trial and error. So I'd write code and I'd just watch it fail. And then I would dig into Stack Overflow and some of those kinds of resources that date back quite a while to try to work out how it failed. And sometimes I wouldn't be able to find the answer, and I would just keep changing my code until eventually it worked. And then I'd realize why it worked.

And that type of process was a huge part of the learning for me, getting underneath why your code works. And of course, the resources that got me started are the resources that a lot of people still rely on today. I started out in R. I'm a big user of both R and Python now, but I started out in R. And of course, Hadley and Garrett's book was huge at that time. The early editions of that were out, and they got me started with the basics of the language. There were a few online resources, but they only took me so far. It was really having my own data set and trying to do my own tasks with my own data that forced me to learn. And I learned almost entirely through trial and error.

And I think there's an element of that that we still need to have today. So I think that if programmers are not learning to program through trial and error, then they miss out on a lot of learning. In particular, one of the things that I see is people are using Copilot to auto-complete their code lines. And then their code lines don't work, and they don't know why they don't work. And so they have to go back and dig into it. And that process is very valuable, right? It's how you actually learn what's going on under the hood.

And then what I found is later on, after I'd gone through that initial hump and worked out how to get my code working, I always say there's this point of success, where you've passed the initial hump. And that point is when you type a line of code and you expect it to work. So there's a period you go through where you type a line of code and say, I kind of know this is going to fail, because I don't understand it well enough yet. But then there comes a point where you type a line of code and you're surprised if it fails, right? When you hit that point, that's kind of your first step up point in learning to code, I think.


And then once I passed that point and got further past it, I started to use a lot more advanced resources. For example, Hadley's more advanced books on R. I started to really get under the hood of that language before I moved on to Python. So it happened in various stages, right? But I think the message I want to get across is that the resources themselves are not helpful if you're not actually applying them to your own context and data sets, because that's how you really learn, I think.

Keith's books on regression and network analysis

Yeah, sure. There are two areas that I learned very quickly were critical to the space I work in, which is the understanding of people and talent and how they operate within organizations, but they obviously have much broader applications. One area is regression, and the other is organizational networks, or network analysis and graph theory in general.

So regression is something that I think people have lost sight of a little. And one of the reasons why I wrote a book on regression was to try to get it back to the forefront for people who are particularly interested in doing data science to explain things. We ended up getting into this situation, I think, in 2015 to 2018, where a lot of the new Python scikit-learn toolkit came on board. There was a huge amount of machine learning algorithms that weren't very explainable coming onto the scene, and Python and scikit-learn made them very easy to use. So we got into this situation where data scientists were jumping on and just running algorithms and getting results from them. But a lot of people were having problems understanding those results, and in particular, understanding that those results were predicting something, but they weren't necessarily explaining something.

Now, in a lot of fields, like my field, and in econometrics, sociology, and a whole bunch of others, the explanation of what's going on is more important than the prediction. And regression is just an incredible Swiss army knife for doing that type of analysis, for being able to look inside a phenomenon and understand what's actually going on. How do I explain how one thing leads to another? And so one of the reasons I wrote the book was to get it back into the consciousness of data scientists and say, there's a whole toolkit here which you're kind of forgetting about, and which has a lot of applicability in explaining the problem you're dealing with.

And the book doesn't just cover the code and how to execute it. A large proportion of the book covers what you do with the results when you get them. What do they mean? How do you explain them to others? How do you make yourself a compelling data scientist by being able to explain the outputs of your models really well, right?
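To make the explanatory framing concrete, here is a minimal sketch, not code from the book, and with entirely hypothetical data, of a one-predictor least-squares fit in pure Python, where the fitted slope is the quantity you interpret and explain:

```python
# Illustrative only: simple one-predictor least-squares regression,
# fit with the closed-form formulas. The tenure/score data is made up.

def fit_simple_ols(x, y):
    """Return (intercept, slope) minimizing squared error of y ~ x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: years of tenure vs. an engagement score.
tenure = [1, 2, 3, 4, 5, 6]
score = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]

b0, b1 = fit_simple_ols(tenure, score)
# b1 is the explanatory payoff: "each extra year of tenure is associated
# with roughly b1 more points of engagement, all else equal".
print(f"intercept={b0:.2f}, slope={b1:.2f}")
```

With real data you would reach for `lm()` in R or statsmodels in Python, which also report standard errors and confidence intervals, the pieces you need in order to say how sure you are about that slope.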

The other one is more niche, which is around network analysis and graph theory. It's a huge area. I'm fascinated by the fact that data can be structured in ways that are different from rows and columns. The idea behind a graph is that your entities are stored in nodes and the relationships between them are stored in edges. It's just a beautiful data structure to work with, and it solves a lot of problems that typical tidy tabular data does not solve. And so the role of that book is to get people who are interested in those types of data structures, and who have applicability for them in their work or their studies, fluent in how to use them in R and Python.
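As a generic illustration of the nodes-and-edges idea, not code from the book and with made-up names, a tiny graph can be held as an adjacency mapping in plain Python:

```python
# Illustrative only: a tiny undirected graph stored as an adjacency dict.
# Nodes are entities (here, hypothetical employees); edges are relationships.
from collections import defaultdict

edges = [("ana", "ben"), ("ben", "carla"), ("carla", "ana"), ("carla", "dev")]

adjacency = defaultdict(set)
for a, b in edges:
    adjacency[a].add(b)
    adjacency[b].add(a)

# A question tabular data answers awkwardly but a graph answers directly:
# who are a node's neighbours?
print(sorted(adjacency["carla"]))  # ['ana', 'ben', 'dev']
```

In practice you would use igraph in R or networkx in Python, which layer algorithms and layouts on top of exactly this kind of structure.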

Extracting personality traits from text

So the question I put in Slido was, given your background with psychometrics and data science, what do you think about the possibility of extracting latent character traits about people from text? The reason I'm asking that is this is something that I'm thinking about. There's a project that I'm thinking about that would be helpful for employers looking at individuals from recommendation letters, all that. So if I gave you 10 letters about me, is it possible to extract character traits rather than just my skills?

I mean, in theory, the answer is yes, because the way people express themselves in text, the words they write, the order in which they write them, and the types of words they use are indicative of their background and personality. But the problem is that you can't really identify that if you don't have an appropriate model of personality that you're trying to map it against. So you can't really say to a model: here's a bunch of text, tell me what this person is like, right? The model is unlikely to give you anything useful or structured from that. But it is absolutely possible to say: here are a bunch of constructs of people's personalities, and here's how we're defining those constructs; can you use the language in this data source to map individuals to these constructs?
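A deliberately naive sketch of that construct-mapping idea, where the construct names and keyword lists are invented purely for illustration (a real psychometric model would be validated and far richer than keyword counting):

```python
# Illustrative only: score text against *defined* constructs rather than
# asking "what is this person like?". Constructs and keywords are made up.
constructs = {
    "collaboration": {"team", "together", "helped", "supported"},
    "initiative": {"launched", "initiated", "proposed", "founded"},
}

def score_text(text, constructs):
    """Count how many of each construct's keywords appear in the text."""
    words = set(text.lower().split())
    return {name: len(words & keywords) for name, keywords in constructs.items()}

letter = "She proposed the redesign and helped the team ship it together"
print(score_text(letter, constructs))
```

The point the sketch carries over is structural: the constructs and their definitions come first, and the text is mapped onto them, rather than asking an open-ended question of the data.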

I'm also very interested in the role of AI in this space. I'm not a massive AI convert, right? I approach AI very carefully because of my under-the-hood knowledge of how it operates. But the areas where I think AI excels are where it can identify patterns that are beyond the capacity of the human brain to identify, because the amount of data is just too large to handle. So, for example, a lot of us know situations where there are large company surveys where tens of thousands of people have written text, and it's just not possible for one person to read through those and synthesize them, right? So this is where AI, and large language models in particular, have a really big role to play.

Balancing self-learning with work

The math problem every morning kind of started by accident. I started to do a little bit of math teaching about three years ago, and I just enjoyed the cut and thrust of working with a student to solve a problem. Some of these problems were very hard, right? And I enjoyed the euphoria of solving them. It brought me back to what it was like when I was a student myself. And then I thought, you know, if I did this every morning, what a confidence boost that would be, to get started every morning having said, look, there's a hard problem which I've already solved. And so I got into that habit. It doesn't work out every morning; sometimes I have to go 10 mornings in a row before I finally solve the problem. But it is a really good confidence boost and exercise for the brain, I think.

Because first of all, if you're in work that really interests you, then work time is learning time, I think. Because you're doing problems and cracking things that require you to stretch your brain and do new stuff. And one of the things that I've always been very conscientious about is: if, as part of solving something in my day-to-day work, I've come across something that's reproducible, you know, a method or even a few lines of code where I thought, I can see how I could use this for a lot of other things, then I'm very deliberate about grasping that and making sure I've recorded it, so it's part of my toolkit.

In the last maybe two or three years, my job became extremely busy, so it became harder and harder to actually do real technical work. Because one of the things you get in more senior positions is you get dragged into a lot of meetings and a lot of things that take you away from hands-on keyboard work. And during that period, one of the things I deliberately did (I tried my best, though it wasn't always successful) was designate my Fridays as coding day. And I would make a lot of people feel bad about interrupting my coding day. And so that created an environment where I did get a lot of time on Fridays to actually do technical work, and I would hold things off until Friday if I wanted to do them. So if any of you on the call are struggling with that balance, because I know it's a really common problem (where do you find time to do technical work versus having to sit in meetings where you can't get hands-on keyboard?), one tip is to designate a block of time for it. And another tip is: don't be afraid to make your colleagues feel guilty about interrupting that time.

Network analysis in people analytics

I guess whether people are using network analysis and that type of approach really, really depends, in my experience, on the data they have available to them. One of the big blockers to people using a network-based approach to their analytics is the fact that they don't have data in any form that you can put in a graph. And I actually dedicate a whole chapter in my book to how you transform rectangular data into graph data for the purposes of solving a problem, because that's the biggest blocker that I've seen. You know, if you go into a random organization and you show them a graph and a way of working with things in graphs, they'll say, that's great, but none of my data looks like that.
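One way that transformation can look, as a hypothetical sketch rather than the chapter's actual method: project a rectangular (person, project) table into a person-to-person edge list via shared membership. The names and projects are invented.

```python
# Illustrative only: turning rectangular (person, project) rows into
# person-to-person edges via shared projects. Data is hypothetical.
from collections import defaultdict
from itertools import combinations

rows = [  # the kind of table most organizations *do* have
    ("ana", "apollo"), ("ben", "apollo"),
    ("ben", "basecamp"), ("carla", "basecamp"), ("dev", "basecamp"),
]

members = defaultdict(set)
for person, project in rows:
    members[project].add(person)

# Project the two-mode table into a one-mode collaboration edge list:
# two people are connected if they sat on a project together.
edges = set()
for people in members.values():
    for a, b in combinations(sorted(people), 2):
        edges.add((a, b))

print(sorted(edges))
```

The resulting edge list is exactly the form that graph tooling like igraph or networkx ingests directly.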

But that said, if you can get your data into that form, there's an immense number of really impactful problems you can solve with graphs. And some of them are even problems that you wouldn't think you would be able to solve with graphs, right? So, an example. A few years ago, I was faced with a really open problem: if we were to reorganize our organization into different organizational units than the ones we have today, what kind of organizational units would be most optimal to ensure that people collaborate with each other better?

And I basically created a massive graph of a bunch of employees, got data on their current collaboration, populated the edges with that, and used various community detection algorithms to try to identify where the hotspots of collaboration were. And very quickly, it was obvious that there was one way of organizing that was particularly conducive to high levels of collaboration. And to be able to put an analytic point of view behind that is really powerful, because most people view organizational network analysis and organizational behavior as a very touchy-feely thing that's not very data driven. So to be able to actually drive a data-based approach to that is huge. And to be able to put good visualizations behind it, and this is along the lines of what Libby was saying earlier around making things intuitive: if you can visualize a bunch of dots, with those dots coming together more in one particular framework than in others, people can actually see the dynamics of what's going on in front of their eyes. And that can be hugely impactful to them believing what you're putting in front of them.
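To hint at the shape of that kind of analysis, here is a toy sketch with invented names, using connected components as the crudest possible stand-in for real community detection (algorithms like Louvain or label propagation, as shipped in igraph and networkx, go much further and split a connected graph into densely linked clusters):

```python
# Illustrative only: find "communities" in a collaboration graph in the
# crudest sense, as connected components via breadth-first search.
from collections import defaultdict, deque

edges = [("ana", "ben"), ("ben", "carla"),   # one cluster of collaborators
         ("dev", "emi"), ("emi", "fay")]     # another, disconnected cluster

adjacency = defaultdict(set)
for a, b in edges:
    adjacency[a].add(b)
    adjacency[b].add(a)

def components(adjacency):
    """Return the connected components of the graph as a list of sets."""
    seen, comps = set(), []
    for start in adjacency:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            if node in comp:
                continue
            comp.add(node)
            queue.extend(adjacency[node] - comp)
        seen |= comp
        comps.append(comp)
    return comps

print([sorted(c) for c in components(adjacency)])
```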

AI skills and trustworthiness

The first thing I would say is that AI is moving so quickly, and is at such an early stage, that anything I say now could be very dated in a year or two, right? And I think that's important to call out. But I think there's a balance here. One part is learning the technology behind AI. There's a lot of very rapidly moving technology there. As these models develop, you've got things like tool calling; you've got to learn things like data validation. There's a whole set of tooling involved in building an AI agent, for example, that you could think about skilling yourself up with, right?

But on the other hand, there's the whole issue of the trustworthiness of AI and the likelihood that the response you get from a large language model is usable without a lot of risk to your organization or to the work that you're doing. And that's much less of a technical, at least it's much less of a coding problem and much more of a kind of how knowledgeable are you about how these models work and what the benefits and risks of using them are.

I would probably say that if you are not in a situation where you have to actively build AI workflows or AI technology right now, I would limit the amount of time you're spending learning how to code agents and things like that. Because if you're not doing it right now, you don't really have a use case for it. And by the time you do have a use case, it could have changed a lot between now and then. So for that side of things, I would say: stay in touch with the latest developments, but you don't necessarily have to have fluent hands-on keyboard skills unless you're directly working on building something related to it, right?

On the other side of things, I think these issues of how trustworthy are the results of large language models, when they're safe, when they're not safe, those issues will persist and they'll persist for a long time, possibly forever. And so being knowledgeable about that and being able to, like, advise appropriately and, you know, to be able to deal with super AI enthusiasts, to be able to, like, pull them back and say, hold on, there's a few things you need to think about here before you go gung-ho on this. That skill is very useful in all contexts, I think. So that's one I would index on heavily. It's like, stay in touch with what the experts are saying. Look for a set of people that you trust in terms of what they're saying about this, because there's a lot of hype out there, which you have to be very careful of.


Measuring talent and white collar productivity

I would break it into two components. There are a set of constructs which organizations impose that relate to what they regard as success for their employees, and I'm specifically referring to white collar employees here. Those are usually designed by psychometricians and talent teams, and they'll often be part of development structures or evaluation forms, all of those things. And those tend to be developed on a theoretical basis using research, usually qualitative research with your employee base. But then you have this question, which is more interesting for me as a quantitative individual: what data can you get to validate those constructs?

So one of your constructs might be that you're successful in a particular job over a particular time period. And there are several measures you could use to validate that construct for individuals. One is performance ratings. But what if your performance rating system is rubbish, and everyone gets the middle rating and nobody gets anything else, which is quite common in many environments? You also have promotion, right? But what if you're in an environment where promotion is automatic after a certain period of time? Then that's not a great differentiator of success. So a lot of my work involves how we take data that could be indicators of a certain construct and use it in some intelligent way to try to answer some of these questions. And often there's a lot of data that we'll throw out because it's just not useful from an analytic perspective, and other data that turns out to be very related. And one of the things, going back to network analysis: being able to understand colleagues' networks and how they interact with other people turns out, in many, many use cases, to be a very valuable indicator of a lot of things with white collar workers, right? If they have large networks, or if they're very central to their networks, those sorts of indicators I've found are often very valuable.
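The "large network" indicator mentioned above can be as simple as degree centrality, the count of distinct colleagues a person is connected to. A hypothetical sketch with made-up names (richer measures such as betweenness or eigenvector centrality are standard in igraph and networkx):

```python
# Illustrative only: degree centrality as a simple "how connected is this
# person" indicator on a hypothetical collaboration edge list.
from collections import defaultdict

edges = [("ana", "ben"), ("ana", "carla"), ("ana", "dev"), ("ben", "carla")]

degree = defaultdict(int)
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

most_central = max(degree, key=degree.get)
print(most_central, degree[most_central])  # ana has degree 3
```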

Moving from academia to industry

It was a real baptism of fire to move from academia into industry. And I think anybody who's done it can probably relate to this, particularly as a mathematician, and a pure mathematician at that. The way I describe it is that academia is a very enclosed and protected environment, and because of the intellectual nature of that environment, there's very little investment in how you communicate as an individual. A lot of the attitude is: if you don't understand what I'm saying, that's because you're not as smart as I am, right? And that's a kind of protective envelope, particularly in mathematics, that prevents you from having to work hard at how you communicate, because you've got that comfort blanket.

But when you move into industry, sometimes more than half of getting to the right outcome is how you communicate your approach and how you solve the problem. Because if people are not bought into the solution, they won't go along with it, and all that work is wasted. They have to believe what you're saying. And I had to learn that skill, because it was not something that was in any way taught to me in my academic career. It took maybe a year or two to really build my confidence there. And it was incredibly enriching, because I don't think I'd be the person I am today if I hadn't gone through that.

And I think I've taken that through with me, because it taught me the value of communication. There's a danger that if I'd gone straight from academia into data science, I might have still had that mindset of: look, you don't understand my code, that's because you're not as smart as I am. But I know now that writing the code is one thing; explaining the outcome of the code, being able to convince people that you know what you're doing and that they can trust you because you have good knowledge and a good foundation, being able to listen to their questions and understand what's concerning them, and being able to give tailored responses to that, those are something else. All of those things I learned in those first three years of my career, I think, and they have served me well in the long run.

Imposter syndrome and career transitions

I mean, imposter syndrome is a really big thing in many, many fields, right? Anytime you change your environment and have to join a bunch of people who you perceive to have substantially greater knowledge than you in the field, that's a massive opportunity to feel imposter syndrome. And it's perfectly normal, I think, to feel it. I've felt it myself on many, many occasions.

I honestly believe that you overcome imposter syndrome through, first of all, you have to be in the right field, right? If you are not feeling comfortable in the field you're working in, if you don't have that passion for it, if you're not interested in it, it's likely that that imposter syndrome will continue because the incentive is not there for you to become more knowledgeable and to become more fluent at it because you don't enjoy it, right?

If you are lucky enough to be working in a field that you love, you get over your imposter syndrome through good hard work and collaboration with your colleagues and learning, right? Where you go through enough experiences that you get that pattern recognition so that you can say, you know, I've seen this before. I know what to do. And over time, you start to have success from that. And that's the confidence building, which helps you get over imposter syndrome. So I mentioned previously, like, this period when I was learning to code and where, like, I got to a point where I realized, hold on. Now, when I'm typing my lines of code, I actually expect them to work. That's, like, a huge step up for me. That was, like, step one of me overcoming my imposter syndrome, right? Because then I could get on calls with my colleague and I could, like, live code and not be worried about embarrassing myself, right?

So there's various ways in which you can overcome it. But the first thing I would say is you're probably not going to overcome it if you're not working in a field that you enjoy. If you are working in a field you enjoy, it's hard work and learning. And, you know, putting yourself out there and making the mistakes that help you learn is what helps you overcome imposter syndrome, I think.

The future of data science vs. software engineering

I think there's just one topic which I want to get across to people, which relates to something that we spoke about earlier. You know, I have a little bit of a concern about what data science is becoming in the world of large language models, right? And I see a lot of people who are data scientists, but what they're actually doing is really software engineering. They're interacting with language models and trying to get them to do stuff for them. But if you look at the code that they're writing, it's software code. It's not data science code.

For me, data science is working with data, understanding the patterns in that data, using those patterns to drive insights, and taking those insights to your business or your organization to have impact. And I just see that gray area coming up now where a lot of people have the data science label, but really everything they're doing is software engineering. And that's fine, right? Some people might prefer the software engineering, and that's cool. They should go and do software engineering. So one of the things for you guys to think about in your career is: as we move towards this idea where people are asking data scientists to interact with large language models, how much of the data science part of your job are you giving up in that? Are you actually moving more towards a software engineering path in the work you're doing? And are you happy with that? And if you're not happy with that, how do you get yourself back onto the data science track? I think you said it, Libby, when we connected earlier: the science word, right? If you're wrangling large language models for software purposes all the time, you're not actually doing any science, right? So I'll leave that as a career thought for you guys to think about, I think.
