Resources

Pure math to data science at YouTube | Mrinal Raghupathi | Data Science Hangout

video
Feb 27, 2025
58:48


Transcript

This transcript was generated automatically and may contain errors.

Welcome back to the Data Science Hangout, everybody. If we haven't met, I'm Libby. I'm a data community manager here at Posit, and I host the Data Science Hangout based in San Antonio. If you're not familiar with Posit, Posit builds enterprise solutions and open source tools for people who do data science with R and Python, like me. We're also the company formerly known as RStudio. If you've used RStudio, you've used Posit. I am joined by the creator of the Hangout and my co-host, Rachel. Rachel, do you want to introduce yourself?

Sure. Hi, everybody. Nice to see you. I'll be hanging out here in the background in the chat and helping with logistics while Libby is the main host today, but so nice to see all of you.

Thank you. Rachel's in Boston, covering the East Coast, if anyone's in Boston. I'm actually at the Posit office today, if anybody's in the Seaport area.

Well, if you have not been to the Hangout before, welcome. We're so glad you're here. The Hangout is our open space to see what's going on in the world of data across all of our different industries, to chat about data science leadership, to connect with other people who are sort of facing the same things that we are. And we get together here on Thursday, same time, same place, every week. Well, mostly every week. Sometimes it's Thanksgiving. If you're watching this on YouTube, by the way, and you want to join us in the future in person, there's going to be details in the description box below to add it to your calendar. Just a note, if you are adding it to your calendar, please make sure that it adds for 12 p.m. Eastern U.S. time.

I want to say thank you to everybody, first of all, who has made this the friendly and welcoming space that it is today. We're all dedicated to keeping it that way. So if you have feedback about your experience today, good or bad, we really want to hear from you. You can always find Rachel or me on LinkedIn or Bluesky as well. At the Hangout, we really love hearing from you live. It doesn't matter what your years of experience are or your title or your industry or what languages you use or don't use. We would love for you to participate today because this is a community-driven discussion. If you don't ask questions, the discussion doesn't happen.

Also, if you're looking for a role, please let people know what you're looking for and where you are. And if you have a role that you are hiring for or that you know about, please share it so we can connect our community members with open opportunities that fit them. All right, with that, let me tell you how to participate. So there are three ways to jump in and ask questions. You can always raise your hand on Zoom. We will call you to jump in.

Registration for PositConf is open, but also the call for speakers for PositConf closes tomorrow at 11:59 p.m. Eastern time. So get your proposals in. It's just a 60-second clip of you talking about what it is. It's really easy to do. I promise it's not that scary. You can do it.

Introducing Mrinal

I'm so excited to be joined by our co-host today, Mrinal Raghupathi, Data Scientist at YouTube. Mrinal, it's so nice to have you. I would love for you to introduce yourself, tell us a little bit about yourself and then also what you do for fun. Sure. Thank you for inviting me. I really love these things where I get to chat with people. I would absolutely encourage people to ask questions. I love talking. That's both a blessing and a curse in my role.

I'm Mrinal. I work at YouTube on the data science team. I've been there for about four years and I'm currently what they call a tech lead, which means I manage data science work and projects and initiatives, but I don't manage people. So I used to actually be a people manager. I used to be a data science manager. I did that for a while also at YouTube. Prior to that, I worked at BlackRock, which is an investment management firm, where I did a mix of quant and data science and actually some sort of software engineering type stuff. And prior to that, actually, maybe I should mention San Antonio. I used to live in San Antonio. I worked for a company called USAA. I was also there for like four years. And there again, I did a mix of like quant and investments and model related stuff. And prior to that, I was an academic. So I also have done that. So I've done like a variety of things. I try to then tell people there's some structure to it. I look back and I make it sound like it was planned that way. That's what I do.

And then for fun, what do I do for fun? I hang out with my kids. That's fun. And I take up hobbies every year. So for example, this year's fun hobby is 3D printing. So that's what I'm doing right now for fun. And then the other like long lasting hobby of mine has actually been, I love building Lego and doing things with Lego. So I do like all things Lego. I do like digital Lego and I do like my own Lego and I do like sets and I buy and sell Lego. Like I do all kinds of Lego stuff. That's like a big thing.

Data challenges at YouTube

I would love it if you could help us out with a little bit of an example of the types of things that your team tackles as far as data challenges. What problems do you solve? What type of data do you use?

Yeah. I would love to talk about that. So I'll actually talk about it in a slightly broader context than just my current role because I feel like there's a lot of different things I messed around with. So for example, right now I think a large amount of what we do, and I think this is true generally of like tech versus finance, is we spend a lot of time looking at user activity and user behavior. A lot of big tech firms, YouTube, Google, you know, Facebook, all of these firms, they have a lot of users. And so I think the broad categories I would break that down into is there's a lot of need to just understand, right? There's just like, let's just do, I call it like EDA on steroids, right? Let's just understand what's happening. And I think one of the things I kind of tell people is there's a data science dilemma that happens all the time, which is you're in a job where you complain that you don't have enough data, and then you're in a job where you have too much data, like, oh my God, there's so much data. So there's always this tension between too much and too little.

And I think one of the patterns I would say is we spend a lot of time, I think, making sense of really, really large data sets in a variety of ways, right? Either through really good EDA, actually making good charts, and like aggregating data is a skill that I think is really valuable. And then we spend a lot of time doing, I think, a blend of sort of experiment analysis, because big tech firms run experiments, and causal inference, right? So you often want to understand whether this is just a correlation or there's actually some causation to it.

And then sort of going back a little bit further, like in finance, I think it's very much about understanding what's going on in markets versus, it's like people versus markets, right? You don't get to run experiments. You have to basically rely on observational data. You have to like sort of wonder like, did this event actually cause the market to move or was it something else? You try to understand risk and you tend to build like risk models and you tend to sort of think about that. But again, the common thing there is you often just have a lot of data, a lot of market data, and you kind of make sense of it. So there's sort of, again, a lot of EDA on steroids that I've seen there.

The thing I do not do is I don't do a lot of like ML engineering type, let's build like a deep neural network kind of data science, which is another flavor of data science. I've dabbled in it. I dabbled in it occasionally, but as part of my role, I don't really do that.

From pure math to data science

Well, speaking of dabbling in things, I love math, but I didn't get to do math in college. You did pure math, a PhD in pure math. I'm curious about how your transition from a PhD in pure math in Texas to data science happened. Can you talk a little bit about that?

This is the part where I'm going to make it sound like there was a lot of deep thought and planning, and it's actually probably all correlations. So I started, as you said, as a very, very, very like pure, the purest of pure mathematicians. In fact, so pure that I even wrote a book in pure math, like that's how extreme it was. And I think it was over time, I think I just sort of happened to run into people, and I think this is a big part of my experience has been like, you just sort of run into people who sort of just help you think about things, help you think differently. They sort of shape your view of the world. And so I was like, down this like very pure math path, and I'm like, okay, that's what I'm going to do. And I happened to run into sort of just a couple of folks, first at Vanderbilt during my postdoc, who were pure math folks who were sort of doing more applied math. And I was like, you know, going to their seminars and just seeing what they're doing, like all this kind of cool stuff, and they're doing sort of signal processing and things, which is, again, a form of data science and a form of data analysis, and you can see applications. And then there was sort of work being done around like compressed sensing. So there was just like these applications of the things I knew, and that sort of opened my eyes like, oh, this stuff, you know, the pure stuff when applied is like really cool.

And then I ended up in an office next to a colleague of mine who was very into topic modeling at that time, and sort of that kind of like machine learning and data science. And we just started talking, right? And I think this is where like, I really emphasize to folks that I meet like, data science is not about data, it's about the people. It's actually not a profession about data, it's a profession about people. And we started talking, and I got interested in like this stuff, and I was like, this clustering stuff is so cool. And then I started learning some tools. And, you know, I was fiddling around with pandas, like early versions of pandas, and early versions of these things, and just like messing around. I just got into it. I was like, could this just be my job, right? Like, could I just do this instead? And that's how I sort of ended up as a data scientist.

And I think this is where like, I really emphasize to folks that I meet like, data science is not about data, it's about the people. It's actually not a profession about data, it's a profession about people.

And then I think you just sort of take the shortest or the most convenient path you can to get to your goal. And so I was working at the Naval Academy in the math department. And so you can sort of see like military, then go to USAA, which is a company, but very focused on the military. And then I was like, what kind of quant do I want to be? Let me like, let me grow up a little bit. And I was like, I kind of wanted to get into investing. And so within USAA, I kind of went to the investment part of USAA. And then I was like, oh, this is really cool. I really like investing and data. And then I went to BlackRock. And at BlackRock, I kind of learned, you know, I really like the technology, and I like the data management, and I like sifting through the data a lot. And I want to do that at a bigger scale, tech company. So that's kind of how I ended up on this journey. I actually had no idea when I started. I was like, let's just go that way.

People management versus individual contributor

Mike, I saw you had a question in the chat, which is about management. Are you available to ask that? So it was interesting to hear that you've said that you've done both people management and now tech lead. One question was kind of, if you could give advice to younger you who's a people manager, or just about to become a people manager, what would you tell them? Or would you say, yeah, go do the people management, because it's a great thing, and you'll learn lots about dealing with people. Or would you say, don't do it, become the technical lead?

If I could go back and give my younger self some advice, I would say, wait as long as you can to be a people manager. I think being a people manager was actually a very good experience. I really enjoyed being a people manager. I tell people I like the people part of it. I don't like the management part of it as much, right, like there's sort of a bureaucratic overhead to being a manager, and after a while you're just like, I just don't want to do this anymore. I hated the managing part. I love working with people. And I think in all the roles I was in, it was sort of a natural thing where people would tell you like, hey, your career progress is very much about moving from, you know, being an individual contributor to being a people manager, and these two things are very disjoint.

And so you say, oh, well, okay, that's what I got to do. That's the step. So I think if I could really go back, I would say like, hey, figure out how you can make this individual role in any form work for as long as you can, and figure out how to do the people part of it more and more without doing the management. So mentoring, collaborating on projects, you know, working across a larger organization with stakeholders, like there's all these ways you can experience that aspect of working with people and sort of interacting with people. And, well, you know, you can do all of that stuff without actually having to do the formal management. That's one piece of advice I'd give myself. I guess the other way of putting it is, don't be in a rush, right? Like, don't just be like, the path forward as a data scientist is up this ladder, which involves write code, then manage, right?

This is actually the first time, sort of a weird thing in the career, but maybe this is actually the only time I have been an individual contributor in that sense, and not on a people-managing path, which is a really nice thing. In every other role, like I went into the role, and it was already sort of in the role description that the role would begin as an IC, but it would expand to be a team. And I knew it was in my future. So I think that's the big shift in my case. And I've actually enjoyed it a lot. I've enjoyed just doing my own thing.

That sounds really, really nice. Mike, any follow ups there?

Well, yeah, I completely echo that sentiment of not rushing into it, because I think if you rush into being a manager too soon, then often you wind up being not a very effective manager. Whereas I think if you can get that experience, just as you're saying, get the experience, you know, learn about a lot of things, including yourself, then you can be a much more effective manager when it comes.

I also feel like, I think there's an aspect of this, which maybe goes a little bit back to this notion I have of the flavors of data science, which is data scientists are a very varied bunch of people. It is actually quite hard to say, go manage 27 data scientists. Like, in many organizations, it works in some places, but I think it sort of puts an emphasis a little bit on the fact that the variety of data science, and actually data science problems themselves, are all sort of unique and special.

I think this is a thing, like in general, like data scientists at lots of places, like if I go back and give myself advice, you know, as a younger me, I'd be like, Hey, like, you know, are you really all the same? Like, is there a way to sort of operate in a way like where, you know, you sort of recognize that maybe people are like actually better off just doing their own thing in some way. And there's different ways to think about this. So I think that variety is definitely, it's a cool thing. It just sometimes makes like scaling management kind of tricky.

Alan, I saw in the chat, you had less of a question and more of just like additional thoughts. And I would love if you could talk a little bit about that.

Yeah, I was just reflecting on not the disagreement at all, but the sort of stay in the technical side, stay in the individual contributor side versus jumping into people leadership. I think a lot of organizations, maybe too many organizations don't know how to design a career path that doesn't lead into people leadership. And so I think I'm just sort of reflecting on that dilemma of, I'm not sure if it's a problem of scale or if it's a problem of like variety within the organization, but I think it's really hard to make a team that doesn't progress in that direction. And I'm thinking about it from the perspective of how do I develop the folks that I lead and what kind of paths do they want to be on? One of the constant questions to me is like, what's my succession plan? Well, maybe I don't want to develop the people I lead into my role. Maybe it's not what they want. So maybe to try to make it into a question, have you seen things that help to make that path possible to become increasingly technical and maybe increasingly technical in a strategic way without needing to inherit all of the things that come along with leading people who are doing that work?

I think it's a really, really great point. So I think one aspect of this is if you're at a very large company, and by the way, if you just go back to my initial ramble of what I said, you'll realize my experiences, I worked at DoD, I worked at USAA, it's like 30,000 people, like you can see it's a cross section of big organizations. And so I think in large organizations, you can sort of put people a little bit on the path of: you as an individual, figure out a way to scale sort of your scope and remit. And I think I've always felt like stakeholders of mine, so portfolio managers or engineers, have much clearer defined paths. They're able to say things like: as a portfolio manager, you just want to manage money, and you want to move up to manage more and more money.

On the engineering side, I think the version I've seen of this is you just manage a larger and larger system or a larger and larger problem or a larger and larger technical team. I feel like data scientists, our equivalent version of that is because going back to like, I feel like we're really about the people and the data, it's not manage more data, or like make your data bigger. It's actually I think, trying to figure out like, can you actually manage a larger organization? Can you actually like work on a more strategic problem, like you said? And I think that's, again, going back to like, sort of the uniqueness of data science.

I feel like every, almost every group or every company has to like work this out for them, almost for themselves, which is like a really bad answer. But I feel like it really often boils down to like, creating that like scope around like, the organization you're willing to like influence or work with, or the sort of the complexity of the technical problem you're willing to work on. The flip side of that is, it actually, it's quite challenging for the person themselves, right? So it's also challenging for the person who says, I don't want to go down this path of manager, to then say like, hey, we're gonna put you on a slightly untrodden path.

And one of the things, I've talked to a couple of people, that you sort of have to give up a little bit is, you know, someone said this to me really well. They were an engineer, and they had a colleague who was not an engineer. And they said, when we started at the bottom of our ladders, or whatever, when we were starting our careers, we did very different things. And they'd basically been on the same teams, but both sort of moved up their ladders in parallel. And they said, the thing we've noticed is the things we look at, and the things we focus on, have converged. And I think this is one of the things, again, for folks who want to go down, I think, a non-management path, or even a management path: there's this notion that you will actually just have your headspace more in business problems and strategic questions, and sort of thinking about things and influencing people, and your headspace will not be in what package am I going to be using, what's the latest, coolest data science technique. And I've found that myself, there are days where I'm like, I just want to sit here and write some code. I just do that.

Causal inference versus correlation

Amrit, you asked a question that has since blown by in the chat. Would you like to unmute and ask that?

Sure, yeah. Thank you. So, Mrinal, I just noticed that you mentioned you have worked on causal use cases. I'm just starting out in my career, I would say. I have around, I would say, four years of experience, but I still kind of don't understand what is the difference between a use case which requires a causal approach versus an ML or an AI approach. And if that is even a valid question or...

It's a very valid question. And it's actually, I find this is one of those things that even I, when I first started thinking about this, I was just like, hey, what is the difference? I think the main thing, I would say, if I had to... Actually, by the way, I would say there's a really nice book called The Book of Why, for anyone who's interested. Judea Pearl's book. And it's a really... I think I can say this in the Data Science Hangout: it's a fun read. And I think The Book of Why just articulates some of the stuff really well.

I think what I would boil this down to is that classic phrase that data scientists use, which is, is it causation or correlation, and how we mix up those things? And at one extreme, you have purely observational correlational data. And at the other extreme, you have randomized controlled trials, like really well-controlled experiments. And then you've got a spectrum. There's a spectrum of stuff between that. And what I typically have seen is ML models, so predictive models, are more on the correlational side than they are on the causal side. So the question is, even if you've built a predictive model, and I have a kind of a silly story about this, which I can tell, which I'm probably going to tell. You can have a predictive model, and then it can do a really good job of prediction. You can plug in the input, and you can actually do a really good job of determining what the value is going to be. But it's still, in some sense, correlational. It doesn't tell you whether that input actually caused the thing to happen. And so causal inference starts to get at the questions of, OK, if we changed the input, if we changed some factor, how much would that impact the result?

And there are situations where you just can't. You want to run an experiment, and you just can't run an experiment. Like in public policy, for example, you can't force people to smoke. You can't force people to go to a bad school. You just can't force people to do certain things. You can't do unethical things. And so that's a common case where you can't really run the experiment you want. You can't be like, hey, we'll have this group smoke for 10 years at random, and we'll have this other group not do that, and we'll see what happens. And so you say, well, what do you do? So you say, well, you could probably build a model to predict mortality. I'm just using a life insurance example, going back to my USAA life days. You sort of predict mortality based on inputs. You could look at the factors, and you could say, OK, this predicts mortality. And you could include smoking as a factor and see what happens. But then you could also wonder, well, maybe people who smoke also tend to do other things. And so you could sort of say, well, if I control in my model for these other things, I get a better estimate of how much that impacts mortality or lung cancer rates.
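To make the "control for these other things" idea concrete, here's a toy sketch in Python. Everything in it is made up for illustration: the variable names, the effect sizes, and the simulated "lifestyle" confounder are assumptions, not anything from the speaker's actual work. It shows that regressing the outcome on smoking alone overstates smoking's effect when a confounder drives both, while including the confounder as a covariate recovers the true coefficient.

```python
import numpy as np

# Hypothetical data: an unmeasured "lifestyle" factor drives both smoking
# and mortality risk. The true causal effect of smoking is set to 0.3.
rng = np.random.default_rng(0)
n = 50_000
lifestyle = rng.normal(size=n)                    # the confounder
smoking = 0.8 * lifestyle + rng.normal(size=n)
risk = 0.3 * smoking + 0.5 * lifestyle + rng.normal(size=n)

def ols(X, y):
    """Least-squares coefficients for y ~ X (mean-zero data, no intercept)."""
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

naive = ols(smoking[:, None], risk)[0]                         # smoking only
adjusted = ols(np.column_stack([smoking, lifestyle]), risk)[0]  # + confounder

print(f"naive smoking effect:    {naive:.2f}")     # inflated, roughly 0.54
print(f"adjusted smoking effect: {adjusted:.2f}")  # close to the true 0.30
```

The naive coefficient absorbs part of the lifestyle effect (good for prediction, misleading for causation); controlling for the covariate separates the two.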

So this causal inference often gets embedded in modeling as controlling for certain covariates or confounding, et cetera. That's sort of like, it's also on a spectrum. It can be a little bit squishy. The sort of random funny story, which I cannot resist telling, is my very first job, I was looking into a model from a third party. So I can tell the story because it wasn't the company I was working at. It was a third party who showed up. And we were evaluating the model. We were checking, our job was to check, does the model we are being given actually work? And we were messing around with the data. We're like, wow, this model is really good. And it turned out one of the factors that was very predictive was, and I'm not making this up, the size of the second person in the household's t-shirt. You're like, what? And it's really highly correlated to some other factor of the household. Like, who is the second person in the household? It's probably, you know, maybe in that sample, like, maybe it's one of the spouses, right? And so you're like, well, is it just the fact that one gender is taller? Is it the fact that the higher income person, like, what's the real driver?

And I've seen other versions of this, right? There's versions of sort of models where you're like, well, zip code. Zip code's a very strong predictor. But zip code, is it causal? Like, can you change someone's zip code? Can you do anything about that? So going back to sort of the other aspect of this, which is, I think, a common theme across, like, a lot of data science is ultimately whether you're data engineering, whether you're building tools, whether you're doing inference, modeling, AI, deep learning, whatever you're doing, you're probably trying to help someone make a better decision with the data they have. Like, wherever you are in that stack, if you're like, hey, I'm cleaning up the data, why are you cleaning up the data? Because you want someone to have a better signal.

So one of the things I think is, in that whole thing, whatever it is you end up with, you want it to be actionable. And zip code is one of my favorite examples, because I'm like, you know, models that have a strong, like, dependence on a zip code, when you go to like, okay, what do we do? And I've actually been in a meeting where this happened, where someone's like, so what do I do? Like, I can't change someone's zip code. I can't get them to, like, move. Like, is that the action here? So I think this is sort of a double-edged thing, right? There's two sides to it. One is the technical side: what is causal inference versus what is modeling, and it's sort of correlation versus causation. But then there's the other side, which is when we actually do it, it tends to be about, like, are we actually able to take action, and do we want to actually make a change in some way? And do we know that change will actually have an impact, or are we just observing something in the data?

Experimentation at YouTube

Rachel, do you want to hop in? Sure. Well, Mrinal, I was thinking about, like, it's rare that we have a company on where so many people have probably interacted with that company. And I was just wondering, would you be able to walk us through an example of something that your teams worked on that we might have been able to interact with, or been the end users of?

Yes, so I will try to tread lightly here when discussing this. But because everyone is interacting with YouTube, it's actually, so I work at YouTube. And the beautiful thing about working at YouTube, when people say, like, can you point to something? I'm like, yes, it's highly visible. So I've actually worked on two products at YouTube. One is YouTube TV, which is a TV streaming service. Think of it as, like, cable on YouTube. So that's something I've worked on. And a lot of the work there, it's a paid product, like you pay a subscription. So a lot of the work has been much more focused on, you know, things like, how do you grow your subscriber base? How do you get more people into the product? How do you target and incentivize users? So in some sense, you know, a lot of that work is around growth, thinking about marketing, things like that.

Currently, I actually work on our, on the generative AI side of YouTube. And we have a couple of things that if you live in the US, you can play with on YouTube Shorts, and you can, like, do some fun stuff with like gen AI. The common thing, though, it's really all about, like, users and user behavior and what they're doing in the product and things like this. A lot of the questions often boil down to questions of, you know, what should we be measuring? And what metrics should we be looking at to sort of, like, drive our product growth?

So that's kind of what I work on right now. A lot of the actual work involves looking at experiment data and trying to figure out like, we ran an experiment, it's an A/B test, we're running lots of A/B tests. And one of the questions about, you know, it's interesting, like, again, sort of the data science aspect there is not so much like, how do you set up an experiment? Or just how do you run an experiment? It's actually helping people make decisions with the experiment data. Right. So, to plug another book, there's a great book called Trustworthy Online Controlled Experiments; it's by a few people. And it's actually all about big tech and how big tech companies use A/B tests. And it's just a great insight. I think that book is a great insight into what people at Google, Facebook, YouTube, Amazon, like, think about and what they do. But I think, again, a lot of the value add is, I think, about helping people just interpret the data that they have.

So we run an experiment, and, you know, okay, what should we do? The experiment said this, what should we do? There's the flip side of that, which is, if you want to do something, what experiment should you run to gain insight into whether or not you should do it? And again, using a sort of silly example, right, there are experiments that are impossible to run, right? Like, you can't run an experiment where you just say, let's just switch off YouTube. Some crazy idea like this, right? Like, you can't run that experiment, just like you can't run the experiment where you force people to smoke. And so there are these interesting problems where, going back to this sort of causal inference aspect, you want to understand what should we do? What change should we make in the product to maybe improve the user experience, make the users happier with the product? And the two questions that come up there are like, well, what does happy mean? How do we measure whether we've actually done that for our users? And the question of, well, what should we do to actually test that out, is a lot of what the work involves.
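For readers curious what "interpreting experiment data" can look like at its simplest, here is a minimal sketch of reading out an A/B test with a two-proportion z-test (pooled standard error, normal approximation). The conversion counts are invented for illustration; this is textbook methodology, not YouTube's actual experimentation stack, which the speaker doesn't describe.

```python
import math

def two_proportion_ztest(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for H0: the two conversion rates are equal."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF, via math.erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Made-up numbers: control converts at 5.0%, treatment at 5.4%, 100k users per arm.
z, p = two_proportion_ztest(5_000, 100_000, 5_400, 100_000)
print(f"z = {z:.2f}, p = {p:.1e}")  # z of roughly 4, p well below 0.05
```

Of course, a significant p-value only says the difference is unlikely to be noise; whether the metric moved for the right reasons, and what to ship, is the decision-making part the speaker is describing.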

Sharing institutional knowledge

One of the Slido questions was about analysis and experiments. So the question is, how does that knowledge get to PMs and executives and other people? So the institutional data science knowledge sharing: we have all of these analyses and these experiments that we do over the years. What are best practices for getting that into the hands of the people who make the decisions?

So the silly story comes from one of the roles I had at BlackRock. One of the projects I ended up starting — and this is going to sound really silly — came from noticing that we were making the same chart in every single slide deck we showed our chief risk officer. Everybody makes this chart, and everyone makes it slightly differently, with their own customized process. So I said, you know what, we should just automate that. We should have one way to make this chart. And two years after I did this, I remember being at some team event, and the chief risk officer walked up to me and said, "Are you the librarian?" And I'm like, "I'm sorry, what?" He said, "Are you the guy who makes sure the charts are the same and controls which charts end up in the library?" And I said, actually, yes, I am.
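The "one way to make this chart" idea can be sketched as a small registry of blessed chart builders that everyone imports, instead of each person maintaining a bespoke copy. This is a hedged illustration of the pattern, not the speaker's actual BlackRock code; the chart name and spec format are invented.

```python
# A minimal sketch of the "librarian" pattern: one shared registry of
# canonical chart builders, so every deck uses the same implementation.
CHART_LIBRARY = {}

def register_chart(name):
    """Decorator that adds one blessed chart builder to the shared library."""
    def wrap(fn):
        if name in CHART_LIBRARY:
            raise ValueError(f"chart {name!r} already registered")
        CHART_LIBRARY[name] = fn
        return fn
    return wrap

@register_chart("risk_exposure")
def risk_exposure_chart(data):
    # In practice this would return a matplotlib or plotly figure; here it
    # returns a standardized spec that everyone renders the same way.
    return {"title": "Risk exposure", "series": sorted(data.items())}

def make_chart(name, data):
    """Everyone calls this instead of hand-rolling their own chart."""
    return CHART_LIBRARY[name](data)
```

The point of the registry is exactly the librarian's job: duplicate registrations fail loudly, so there is only ever one way to make a given chart.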

And the same thing goes for institutional knowledge. Usually it's a library of stuff we know. We know a lot about what's happening in our experiments — we're running lots of them, and we have tons of observational data studies. And if you're working in any large organization, you are going to end up in a silo. You're going to become the expert in your product. You have to be — you have to go deep; you can't know about everything. And then the question becomes: how do you share all of this knowledge in a consumable way?

I think there are two parts to the question, and two things that help — not perfectly, but they're what we're thinking about. One part is: how do you keep that valuable knowledge, the data knowledge of the organization, from leaking out or being lost? And the other part is: how do you crystallize it — make it more succinct and take it up to people?

Two things help prevent what I wouldn't call data leaking so much as insight leaking. One is automation. The more time people spend doing bespoke things — inventing their own way of making a chart or pulling the data — the less time they spend on "what is my insight?" Because those insights are what you want to preserve, along with how to make them presentable and digestible. Automation helps people shift their focus from the parts that could be automated to the parts where a human really should be involved: the insight seeking and the communication. The other piece is actually being a librarian. Does our data science org actually have a library of knowledge? Do we store this somewhere? Do we have a repository of all this stuff? Does everyone actually put their code in the same source control?

When it comes to taking things to stakeholders and executives — I still personally find this one of the most challenging things in my role. I've done this for a while, and I still find it the most challenging part. You have to figure out: what question do they have? What do they care about? And it can be really challenging, because you've been in this data for so long, and you're not in their role. They have all this context that you don't have, and you have to figure it out.

So a lot of it is about taking a step back and trying to think about your organization the other way around. First of all: what does my organization do? Take a step back and restate it — this is the mission statement. Then go talk to a product manager, or a portfolio manager, or whoever your counterpart is, and ask them: what is the vision deck or document you created around this product? Actually go read what they are presenting to their executives. So often we're focused on our own document and how it goes up to executives. But what are the executives seeing? What is the product manager seeing on their side? Actually go look at it, and ask yourself: what words are they using, and how is it being presented?

And the one other thing I would say: there are two complaints I've never heard. No one's ever complained about a deck having too few bullet points, and no one's ever complained about a meeting ending early. I've never heard someone say, "I wish you guys had a third bullet point on this slide — it would have been so nice." It's a heuristic, but ask yourself: what can I take away from my presentation? That's very hard to do as a data scientist, because you have poured your life into this project. You worked hard to get the data, to synthesize it, to clean it up. You battled with Python environments to figure out how to get scikit-learn running with this GPU and that stack. You've done all of that, and then someone says, wait, you just want me to put one bullet point on this slide? But I have so much stuff. And I think that's a really hard thing. The thing we as data scientists have to learn, in our own best interest — and this is just an opinion I have — is to say: look, yes, that work is there, and people will know that work is there; that's part of what we do. But at the end of the day, it's okay to take away a p-value, it's okay to remove extra data points, it's okay to remove some lines from a chart. It's okay to take some of that data away, because you really want your point to come across.

No one's ever complained about a deck having too few bullet points. And no one's ever complained about a meeting ending early. Two complaints I've never heard.

Networking strategies

Sol, are you available to ask? You had two questions, and I'll let you pick whichever one is most important to you. Yeah, please. I can't remember, sorry. Okay, so one was about A/B testing: how do you recommend people get up to speed with causality? And the other one was: how do you network at Google? I love both of these questions — I love the networking one. Yeah, I think Mrinal talked about causality and A/B testing a bit, and gave a good recommendation, so I'll go with networking. Yeah, Mrinal — being at YouTube, sometimes things are confidential and teams are siloed. How do you personally go about networking and connecting with other data analysts for knowledge sharing?

Thank you. I've done a couple of things. One thing I've actually done — and there are some random stories behind this that I won't get into — is force myself, in a systematic way, to talk to people. And I'm not kidding: one of the methods I used was to literally look at the org chart, put people's names in a spreadsheet, and use a random number generator. I said, okay, I'm going to talk to these people this quarter. I'd set up time with them and say, hey, there's no agenda — I just want to get to know you and what you do. It's a coffee chat. At that time it was COVID, so it was a little hard to do in person, but virtual coffee works: I would just randomly pick people and get to know them. It's maybe the ultimate data science approach to networking — you select a random sample.

But I think it's effective, because it allows you to meet people at different levels, on different teams, doing different things. And because there's no selection bias in who you've picked, you go into the conversation more open-minded. It's as if I just ran into you on the street, and now we're talking.
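The spreadsheet-plus-random-number-generator routine described above is easy to reproduce in a few lines. This is a hedged sketch of the idea, not the speaker's actual tooling; the names and the `k=3` quota are invented for illustration.

```python
import random

def pick_coffee_chats(org_chart, k, seed=None):
    """Randomly sample k colleagues from an org-chart list for no-agenda chats.

    Sampling without replacement means no one is picked twice in a quarter,
    and using randomness avoids the selection bias of only meeting people
    you already know.
    """
    rng = random.Random(seed)  # seed is only here to make the example reproducible
    return rng.sample(org_chart, k)

# Hypothetical org chart export
colleagues = ["Ada", "Grace", "Alan", "Edsger", "Barbara", "Donald", "Tony"]
this_quarter = pick_coffee_chats(colleagues, k=3, seed=42)
```

In practice you would re-run this each quarter, removing people you have already met so the sample keeps covering new parts of the organization.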

The other thing I try to do is the same thing at the organizational level. I work in YouTube, but I'll go network with someone in a different part of Google. There you can't really do random, so I'll talk to someone I know and ask, hey, do you know someone I could talk to? And then you use that network to meet more people. I'm also very deliberate about showing up: if there's an off-site or a summit, I will try to be there in person, because I think that's very valuable. And one thing I tend to do — and maybe this is advice I shouldn't give — is that when I go to conferences, I actually don't go to many of the talks. I know that's bad to say in some ways, because the person giving the talk has spent a lot of time preparing it. But I find you have to balance being in the talks against having the hallway conversations with people.

So there's a mix of the systematic approach and the network of "hey, do you know someone?" I find that's been fairly effective. As an aside, I actually write this down at the beginning of the year — I set personal professional goals around networking. I take this more formally than many people do. I don't list it in my official manager objectives; I don't go to my boss and say this is my objective for the year. But on my own personal career goals list, I'll write: networking — what am I doing this year? One year I did a version of this called something like "meet an engineer": I was going to meet one engineer — not a data scientist — every month that year. That was my goal, and I did about 10. So I tend to be very systematic in how I set these things up, because at least for me, if you're not systematic, it just doesn't happen. Just like everything else: if you want to network, you have to set a goal, and then you can work toward it. So that's kind of how I manage it.

And by the way, it's this approach that actually got me into data science. The reason I went to someone else's seminar and not my own is that I asked myself: I'm doing a postdoc — what should I do to get to know more people in my department? Go to their colloquia. Pick a colloquium at random and go to it. So it's something I think you can use in any organization, any community.

Closing thoughts

All right, Mrinal, our hour has gone by so fast — we literally have one minute left. Do you have anything else to leave us with as far as career advice, besides your networking advice?

It's 30 seconds? Yeah, my 30-second pitch on that is: it's called data science, but a lot of it is actually about the people. So as much as you spend time on the data, make sure you carve out some time for the people aspect of your role, whatever it is — networking, communication. Slide deck making has been a big part of every job I've been in. I don't know about anyone else, but we're making slides, and often it's the part that's not data science — but it is about communicating with people.

So the thing I would say is: the common aspect of data science that I've seen across all of it is data and people. The science is actually the part that's very rare. It's the data and the people that stay constant, and those challenges and opportunities — the things that are fun and the things that are not so fun — are all on the data and people side. The science is just the fun part where you get to do all kinds of things.

The common aspect of data science that I've seen across all of it is data and people. The science is the part actually that's very rare. It's the data and the people that sort of stays constant, and those challenges and opportunities and the things that are fun and the things that are not so fun are all kind of in the data and the people side.

Yeah, I think the people are a fun part too. Well, thank you so much, Mrinal, for being with us — this conversation was amazing. Everybody have a great day, and please remember that next week we're going to be joined by John Stanton-Gettys, Senior Manager of Advanced Analytics Marketing at HP, and it's going to be a great conversation. So I will see you next week, same time, same place. Have a good one, everybody. Bye.