Data Science Hangout | Kobi Abayomi at Warner Music Group | Adjusting Metrics Across the Business
Transcript
This transcript was generated automatically and may contain errors.
Thank you all so much for joining today. Welcome back to the Data Science Hangout. And for anybody who's new to the Data Science Hangout, this is really just an open space for current and aspiring data science leaders to connect and chat about some of the more human-centric questions around data science leadership, but really to focus on questions that are most important to you all. So if you haven't been here before, there's three ways to ask questions. You could jump in live, put questions in the chat. And Rob, actually a quick note, if you can message Monty for the Slido link. I realized he didn't send it to me. But also this will be recorded and shared up to YouTube for anybody who missed it.
But I'm so excited to be joined by my co-host for today, Kobi, who is the Senior Vice President of Data Science and Analytics at Warner Music Group. And Kobi, it'd be great to maybe just have you start by introducing yourself and sharing a bit about your team and the work that you do.
My name's Kobi. Right now I'm the Senior Vice President for what's called Research and Analytics at Warner Music. It's a music company. So this is one of the places in the world where people off the street are turned into artists and people who like music are turned into fans. These days, the delivery of music is quite different from what it's been historically. And the vast majority of money that comes into any music company label like this is through electronic listening, right? So the way in which we consume media at large is mediated by the delivery.
So media companies, this one in particular, now with this, once something's digital, it's automatically leaving a trail or record or an experiment or observations. And so it's a natural place for data science work, which is concerned with being able to tell what's going on from reams of data, from observations. So the experiment that we're trying to model here is who listens to what, right? And so what is a big deal, how to categorize and quantify music in an information way, and then who?
The problem that media companies have been wrestling with, as people's behaviors change from retail, going into a store and buying something, to digital, is being able to understand audiences: what they like, how to curate them, how to market to them, how to address them in a way that they're choosing your content to put their eyeballs and ears on in the overall sort of span of competitive media or attention sources.
So that's where we exist. Part of it is understanding music and part of it is understanding people. And for anybody who's in a media company, a lot of the things that I imagine we'll talk about in this hour will be familiar. Our work on audience is familiar to people who work at a Comcast or WarnerMedia, where I used to work before this, or any of the ad tech space.
An example problem: understanding audiences
Kobi, while we're waiting for some questions to come in, I think it'd be really cool to hear about an example of a problem that your team's working on right now. Right. So let's talk about a problem from the perspective of audience.
One of the things that music companies are good at, have been good at, is sort of curating a fan base, right? Like Artist X, these are the people who listen to Artist X and sort of keeping attention and turning a young person into a Dua Lipa, say.
And my team right now is focused on being able to extract from the information we do get from the people who deliver our music a refined and high resolution sense of that fan base and that audience. We receive from the people who distribute our music, these are the Spotify, the Apple Music, the Amazon, information on, and I would say even down to the transaction level.
So really high frequency, really highly resolved information on how people are interacting with our inventory, our property. But masked and blinded and a lot of the information sort of left out, right? And anybody who works in media will be familiar with this conundrum. When I was at WarnerMedia, people would watch our shows, but they would watch it on Comcast. Comcast has a box on your, you know, on your mantle. You have the relationship with Comcast. Comcast knows what you like in a really high sort of frequency sort of way. We get the information back from Comcast.
A lot of that high frequency information is left out, right? You get it in a sort of aggregate way. That's the same thing here. Even when we have information down to the transactional level, like, hey, somebody listened at 9:45 in the morning in the Houston DMA to Dua Lipa's track. We don't know who it was. We don't know anything else about them. We just are able to observe the consumption. So we're really strong on observing consumption.
And so a big part of what we try to do is join consumption patterns to what we can infer about people and their behavior. A lot of this is known in ordinary visual media as fusion, right? Probabilistic joining. How do you take a data set, resolved at an individual level, and join it to other data sets with information about people and populations, strata information? So we spend a lot of time figuring out how to join what we can see, what people listen to, to saying things about who they are.
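As a rough illustration of the probabilistic joining described above, here is a minimal sketch in Python. It is not the team's method: the segment names, the market shares, and the listening rates are all invented, and the only technique used is Bayes' rule to attach segment probabilities to an anonymous listening event.

```python
# Hypothetical illustration: every name and number below is made up.

# P(segment | market): share of each audience segment in a market,
# the kind of strata information a population data set might provide.
SEGMENT_SHARE_BY_MARKET = {
    "Houston": {"teens": 0.20, "young_adults": 0.35, "adults_35_plus": 0.45},
}

# P(genre | segment): how likely a listener in each segment is to stream a genre.
GENRE_RATE_BY_SEGMENT = {
    "pop": {"teens": 0.60, "young_adults": 0.40, "adults_35_plus": 0.15},
    "classical": {"teens": 0.02, "young_adults": 0.05, "adults_35_plus": 0.20},
}


def segment_posterior(market: str, genre: str) -> dict[str, float]:
    """Bayes' rule: P(segment | market, genre) is proportional to
    P(genre | segment) * P(segment | market)."""
    prior = SEGMENT_SHARE_BY_MARKET[market]
    likelihood = GENRE_RATE_BY_SEGMENT[genre]
    unnormalized = {seg: likelihood[seg] * prior[seg] for seg in prior}
    total = sum(unnormalized.values())
    return {seg: weight / total for seg, weight in unnormalized.items()}


if __name__ == "__main__":
    # An anonymous pop stream observed in the Houston market.
    for seg, prob in segment_posterior("Houston", "pop").items():
        print(f"{seg}: {prob:.3f}")
```

The output is a probability for each segment rather than a hard match, which is what makes the join probabilistic: the consumption record is never tied to a person, only to a distribution over the kinds of people who might have produced it.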
From analysis to research
And Kobi, what does your team look like? How many data scientists do you have or are you currently still building that team as well? Yeah, yeah, yeah. So I'll say this. I walked into a team that, so the name of the group is research and analysis. And I walked into a team last year that I would say more analysis than research. And not hesitant in saying this sort of thing. What is data science, right? We have only been calling what we do data science 10, 12, 13 years at the most.
A guy who I used to work for back when I was a professor, Jeff Wu, is credited with coming up with the name data science and incorporating the computational aspect of statistics into the name. And that name's really taken off. A lot of organizations, as they're beset with, I'll just put it so straight, the wolves at the door, the people who have these walled environments and have really good consumer information. These are the Facebooks in the news these days. These are the Amazons. These are the Netflix. Realize that they needed to create their own data science organizations and name them as they will from coming from whatever space they came from.
Most of what went on prior to 2020, I'll say, was more sort of analysis. That's people taking consumption side data, univariate data really, right? How many people listen to this track? How many people listen to this track this week versus last week? And packaging that up into reports. And so what we've been doing this year is going from analysis to actual research.
And the way that I like to say this to the team is it's cause and effect. This is where statistics can be useful. If you do this, this happens. Conditional expectation. This is the fundamental utility of tools like regression. Being able to explain demand side numbers in terms of the levers and buttons that a business can press. And more than that, the exogenous factors that affect things, right?
And so we spent a lot of time this year standing up not just a model to be able to place demand in context of business action and ambient effects, but the internal pipelines necessary to be able to do so, right? Like when you want to do something at scale, it's one thing to do it once in regression. It's another thing to do it every several hours or do it instantaneously and have the data pipeline so that things are moving correctly. And so we spent a lot of time this year standing up a proper data science pipeline.
Data engineering and data science
Kobi, last week we were chatting a bit about data engineering and then data scientists and wondering at what point do you hire data engineers or data scientists? Do you have to have data engineers first to be able to make sure everyone has access to data?
Yeah, that's a great question. Yeah, the answer is yes. Yeah, the answer is yes. You know, excuse me. We were lucky to walk into, at this place, a pretty strong data engineering team that had been dealing with the volume of data that they get from the distributors. Again, this is the Spotify and Apple Music world. So we didn't, you know, we didn't have to construct that and that wasn't within, that wasn't one of the bricks that we needed to lay.
Having said that though, data engineering is more than just data munging and pipelining and stuff like that. It's illustrations, it's display, it's using technology to be able to push things back out. And so we have hired a couple people and rely on people who are strictly data engineers. You know, some organizations have these tasks separate. We're lucky to have here people with those skills in the same group so that the conversation's always happening and going.
And it guides a lot of the modeling, right? Like these days, with the sort of model that we're using, and we're a Bayesian shop, which relies on a fair amount of probabilistic computing, the data engineers, knowing that from the beginning, right, can construct systems and compute methodologies that match the methodology that we're using, sort of the theoretical methodology. I've been places where those are totally separate, but it's good to have them within the same team here.
Becoming a Bayesian shop
Yeah, for sure. That totally caught my ear when you said we are a Bayesian shop. I think I mentioned on this meetup a few weeks ago, maybe months ago, that I'm always trying to become more Bayesian. So I'm kind of curious, like, did your group just start out that way? Did you hire a bunch of people who are really into that way of thinking? Is there any backstory?
Yeah, there is. You know, back when I was a little boy, one of my first advisors in grad school was Andrew Gelman. And when I did a postdoc, I did it at Duke, which is a big Bayesian place. And as data science has matured, and honestly, the computation on that side has also matured, I've always wanted to impose that in production.
You'll see a lot of places throw, I guess, what they call sort of machine learning models at things, which I consider just sort of, you know, high volume data, frequentist type of models: make an estimator once, and then recompute when you need to do it. One of the things that I wanted to do here was have a system where we didn't always have to recompute things, and where we could rely upon, over time, really having a well-specified posterior distribution that we could just draw from for inference, for explanations of the business.
Before this, I worked at WarnerMedia, and we were doing random effects there, right, which is really just a sort of, you know, baby version, the smallest stacking doll inside the Bayesian world. And I was like, well, if we can do that in production, we can really do the full thing.
And what that's used for is, let me give you an example. In the model we use for forecasting, we have parameters that govern the speed of growth in listening demand to something, to a song, right? This business is often questioning, is this becoming a hit? Is this becoming a hit? Is this becoming a hit, right? And so now we have a way of having distributions to draw from for these parameters, you know, for this parameter in particular, right? Is this thing becoming a hit?
And we can compare a song's performance online immediately, right, in terms of what the posterior looks like, to our baseline for is this becoming a hit or not. And that's mediated by the full model that we use, which is, a hit in country is a different thing from a hit in pop, a hit in Nigeria is a different thing from a hit in the UK. And so now the questions that the business asks are automatically baked in to the procedure that we're doing, right? There's nothing extra to do. It's just model interpretation now, right?
And so that's one of the things I preach, is you want to get people out of re-scraping data and re-computing frequentist things every time to answer the experiment again, answer the experiment again. The experiment is a long process. And each one of these iterations, a track, and is it a hit, is merely an instance, right, it's a sample from the overall experiment: what is a hit? And we want to use prior knowledge as much as we can. And so the system is set up to bake in prior knowledge.
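The idea of keeping a posterior for a growth parameter and comparing it to a "hit" baseline can be sketched as follows. This is only an illustration under strong assumptions, not the model described in the talk: it uses a conjugate normal prior with known observation noise, and the priors, the noise level, and the hit threshold are all invented numbers.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical priors on weekly log-growth in streams, by genre: (mean, standard deviation).
# These numbers are invented; in practice they would encode knowledge from past releases.
GROWTH_PRIOR_BY_GENRE = {"pop": (0.05, 0.10), "country": (0.02, 0.05)}
OBS_SD = 0.15         # assumed known week-to-week noise in observed log-growth
HIT_THRESHOLD = 0.10  # assumed growth rate above which a track counts as a hit


def growth_posterior(genre: str, observed_growth: list[float]) -> NormalDist:
    """Conjugate normal-normal update of the growth-rate parameter."""
    prior_mean, prior_sd = GROWTH_PRIOR_BY_GENRE[genre]
    prior_precision = 1.0 / prior_sd**2
    data_precision = len(observed_growth) / OBS_SD**2
    post_precision = prior_precision + data_precision
    post_mean = (
        prior_mean * prior_precision + sum(observed_growth) / OBS_SD**2
    ) / post_precision
    return NormalDist(post_mean, sqrt(1.0 / post_precision))


def prob_hit(genre: str, observed_growth: list[float]) -> float:
    """Posterior probability that the growth rate exceeds the hit threshold."""
    return 1.0 - growth_posterior(genre, observed_growth).cdf(HIT_THRESHOLD)


if __name__ == "__main__":
    weeks = [0.20, 0.25, 0.30]  # made-up observed weekly log-growth for one track
    for genre in GROWTH_PRIOR_BY_GENRE:
        print(f"{genre}: P(hit) = {prob_hit(genre, weeks):.2f}")
```

Because the prior differs by genre, the same three weeks of data give different answers to "is this becoming a hit?" in each genre, and each new week updates the existing posterior instead of starting the analysis over.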
One anonymous question, following up on that, is: what's an example of a hit song you discovered through this model? You know, I don't know if people know, so one of our music artists is Aya Nakamura, and there was a song. One of the tricks in the music business is, you have a song that's doing well, you want to get it to do better in a local market, you remix or partner with an artist who's popular locally, right? And so this song was shopped around different markets, and in a couple of places we could see it hitting.
I can't remember the name of the song. It's in a different language. She sings sometimes in French and sometimes in Spanish, yeah. Aya Nakamura, look, yeah, look her up. She's great. She's fantastic. You'll like it.
David, I see you just asked a question in the chat as well. Do you want to, can I pass the mic to you? Sure, yeah. I mean, it's just a general question. You know, just in addition to the high-level data that you get from your distributors, like Apple Music, Spotify, et cetera, do you guys ever integrate data from other sources, like social media, to learn more about your audience?
Yeah, the answer is yes. For pop music, at least, I think a big part of what the business understands as audience behavior is social media behavior, right? Again, the effect of that is different across genres. You know, I could just say, for instance, we've noticed that the classical audiences have an almost, you know, not no relationship, but perverse relationship to social media, right? Like, if something's doing well on social media in classical, it's not being listened to. So, it has different importance depending upon what the sort of music is, but yeah, that's a big part of one of our inputs, yeah.
Productionalized visualizations and tools
Yeah, you know, so we write a lot of stuff, and we've beaten up Streamlit this year. We've used that over and over and over again for illustrations. I'll say this, and this is more sort of the corporate sort of thing, like what structure should be. It's my opinion that with visualizations and stuff like that, productionized visualization, let's say, you're pushing the remit of a data science team, right? Like, some people are really good at that, that's what they really do. Like, I'm a JavaScript front-end guy, right? And with limited resources, you hire a JavaScript front-end guy, don't expect to get anything else out of him.
So, in a mature organization, right, you want to be able to sort of illustrate and POC and things like Streamlit, but then, like, for mature production, like, you want to get people, especially in a business like the music business, where, you know, the people are, you know, generally artistic sort of people, and so having an app that's really sort of beautiful, you want to be able to rely on the organization for that last sort of bit of productionization, which is make this thing beautiful and have a really beautiful app around it. But for us, for ourselves, to get to that point, we use Streamlit.
Explaining quantitative concepts across the business
How do you partner with that creative team, or what does that process look like?
You know, this is always the question in so-called data science, and from a broad scope, I think you could reframe this question, like, how do you explain quantitative concepts to people, audiences, colleagues, students who aren't quantitative, right? You know, with the zeitgeist we live in, and in businesses, it's just so super important, to be able to do meaningful work, to tie in with people who don't do that work and to whom it's, you know, novel.
At this business, so the music business, and I should say a little bit more about the music business, it's a federated business, right? So there's a central sort of corporate function, and then there's the labels. Everybody's heard this, the label, you've listened to the rap songs, I want to get my own label, and the label signs, and I got screwed by the label. These are the people who interact with the artists directly, and there's many of them in any, you know, large music company. There are manifold labels, different people; their tasks and idiosyncrasies are sort of bespoke and unique, and so a lot of that I can't really give you, like, a bible for.
It's sort of going around the building and working with people, you know, I'll say this, this might be useful, I say this to my team, you know, any company, there's four people who are doing real work. Find those four people, especially in data science, you need to find the people whose jobs count on the actual measurable outcomes, and if you get those people onto your stuff and, you know, drinking from your bottle, what you create will be successful.
So what are those measurable outcomes? Well, in this business, the measurable outcome is whether songs are successful or not. Media in general, this one in particular, one of the things that people in media are very worried about is your overall sort of attention budget. So it's like sort of a stick-breaking problem. You can imagine the Dirichlet distribution on your day, each of the dimensions being what you paid attention to, and so in music, if this is the amount of time you're going to listen to music, we, Warner Music, want you to be listening to our inventory, so that's market share.
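The attention-budget picture can be sketched by drawing one person's day from a Dirichlet distribution. This is a toy illustration only: the activity names and concentration parameters are invented, and the draw uses the standard construction of normalizing independent gamma variates.

```python
import random


def sample_attention_split(
    concentration: dict[str, float], rng: random.Random
) -> dict[str, float]:
    """Draw one Dirichlet sample by normalizing independent Gamma(alpha, 1) draws."""
    draws = {name: rng.gammavariate(alpha, 1.0) for name, alpha in concentration.items()}
    total = sum(draws.values())
    return {name: value / total for name, value in draws.items()}


if __name__ == "__main__":
    # Hypothetical concentration parameters: larger values mean a larger expected share.
    concentration = {"music": 2.0, "video": 5.0, "social": 3.0}
    day = sample_attention_split(concentration, random.Random(0))
    for activity, share in day.items():
        print(f"{activity}: {share:.1%} of the day's attention")
```

Each draw is a set of shares that sum to one, so any extra attention one medium gains has to come out of the others, which is the competitive framing behind market share.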
So market share is a big driver in this business. The way that the music companies are paid is basically off of a fraction of subscription fees, right? You pay Spotify, Spotify takes your $100, whatever, and splits it up, by shares, amongst the rights holders, right? So if there's only $100 to go around, I want as much of that $100 as possible. I need people to listen to my stuff more than they listen to the other person's stuff, right? And you see this especially in pop, right? Very sort of competitive setup. A song comes out, people worry about release schedules because they don't want the amount of available listening to be shared or diluted or diffused, you know, among many different options.
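The $100 example works out as a simple pro-rata split. The sketch below assumes the pool is divided strictly in proportion to stream counts, which is a simplification for illustration and not a description of any distributor's actual payout rules; the rights-holder names and stream counts are invented.

```python
def split_subscription_pool(
    pool: float, streams_by_holder: dict[str, int]
) -> dict[str, float]:
    """Divide a fixed pool of money in proportion to each rights holder's streams."""
    total_streams = sum(streams_by_holder.values())
    return {
        holder: pool * streams / total_streams
        for holder, streams in streams_by_holder.items()
    }


if __name__ == "__main__":
    # Hypothetical rights holders and stream counts.
    payouts = split_subscription_pool(100.0, {"label_a": 30, "label_b": 50, "label_c": 20})
    for holder, amount in payouts.items():
        print(f"{holder}: ${amount:.2f}")
```

Because the pool is fixed, a holder's payout depends on its share of listening, not its absolute stream count: if everyone's streams double, nobody earns more.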
When the metrics used by the business don't match
Daniel, I see you asked a really good question in the chat and would love to have you ask that live. Sure, absolutely. So this just kind of goes back to the similar stream of thoughts that you've been speaking about in terms of kind of like those four people to use that same example, but you kind of put it in the context of like the people that can really kind of knock out those things that are really worth knocking out. They can kind of ace the test, so to speak, or produce the metrics that need to be produced.
Would you have kind of like any strategies or any advice when kind of like the generally accepted metrics that are used by other parts of the business aren't really what you think they should be? So like if you're personally measuring your team based on something and the business is saying like you should actually be measuring something else or this other thing is important to us, how do you kind of bridge that gap?
Yeah, that's a great question. I'll blab for a second. I'll tell you that I hope my answer is satisfying. The reason I say that is because this is a real problem, right? You're in organizations that have been doing things one way. And what you've been hired to do, though, is to bring them the gospel of what's new, but the way that they read the gospel is in the language they already understand.
So the answer is yes. So I'll give you a specific example. So the growth thing that I told you about in the model that I described where we have a parameter that measures sort of rate of growth. So the parameter that we use and the model that we use, the thing we're parameterizing is not just rate of growth, but it's rate of growth per sort of expected total addressable audience, right?
One of the things I noticed when I got here, so we would pass around reports of the previous iteration of the organization on sort of what's growing and what's the top hit. Not baseline, right? Like you would see graphs and people would play around with the y-axis and you would see this thing is really growing, right? And it's logarithmically different from a growth rate for something else.
And I'll tell you for the way that we parameterize it, we try to get the conversion into money as quickly as possible. Demand will be different things in different business, right? Like the top line volume of demand, you're always hitting that with a rate card and I'm using some media jargon sort of stuff. So one of my worries was getting from demand to revenue, and then from revenue to market share real quick, because I knew that those were metrics that the business could understand and that they evaluated themselves on. So it's one thing to say growth, it's one thing to say growth addressed, you know, of total addressable market. And it's another thing to say, if this thing grows like this, this is how much more money and market share you'll get. I've found that to be effective. When you say that, people seem to pay attention, you know, as long as you're back in dollars or yen or pounds.
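The two translations described here, baselining growth against addressable audience and then converting it into money, can be sketched in a few lines. Everything specific is an assumption for illustration: the `TrackWeek` fields, the audience estimates, and the flat per-stream rate standing in for a rate card.

```python
from dataclasses import dataclass

RATE_PER_STREAM = 0.004  # hypothetical rate card, in dollars per stream


@dataclass
class TrackWeek:
    name: str
    streams_last_week: int
    streams_this_week: int
    addressable_audience: int  # assumed estimate of listeners the track could reach


def growth_per_addressable(track: TrackWeek) -> float:
    """Week-over-week stream growth, scaled by the size of the addressable audience."""
    return (track.streams_this_week - track.streams_last_week) / track.addressable_audience


def incremental_revenue(track: TrackWeek) -> float:
    """Extra money implied by this week's growth under the assumed flat rate."""
    return (track.streams_this_week - track.streams_last_week) * RATE_PER_STREAM


if __name__ == "__main__":
    tracks = [
        TrackWeek("big_market_track", 1_000_000, 1_050_000, 10_000_000),
        TrackWeek("small_market_track", 100_000, 120_000, 1_000_000),
    ]
    for t in tracks:
        print(
            f"{t.name}: growth per addressable listener = {growth_per_addressable(t):.3f}, "
            f"incremental revenue = ${incremental_revenue(t):,.2f}"
        )
```

In this made-up data the big-market track adds more streams and more dollars, while the small-market track is growing four times faster relative to its audience. Reporting both numbers avoids the unbaselined comparison, and the dollar figure is the one the business already evaluates itself on.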
One other anonymous question, Kobi, is are songs themselves analyzed to understand what makes a song popular in certain regions? Like NLP for lyric sentiment? No, fantastic question. The answer is yes. And also there's a lot more to do there. This company, other companies, you know, separate companies that exist just to do this, do spend time thinking about the sound content itself, the lyrics themselves.
Again, you know, so the fundamental object in the music business is who is listening to what, right? What, how it sounds. So that would be our way to quantify and categorize the sound, the lyrics, stuff like that. But the response side of it is who. And so what we are spending time here is, well, now that I'm able to quantify this thing in a measurable way, how do audience segments respond to these quanta, right? Like turns out people who listen to kids' music might not be big fans of profanity, you know, things like that, right? Like so that it gets meaning when you cross over from the content itself to the audience that receives it and what you're trying to get as far as change of demand.
Yep. I wonder if, like, would songs themselves start to sound more similar if people were trying to, like, follow this? You know, that's a great question, but I'll tell you, and this is not my area. We've actually been lucky to hire a guy who specializes in music information, but I'll tell you what I understand from it is actually the answer is the opposite, is that people tend to enjoy things that sound novel or orthogonal, right? Those are the things that turn into hits. So if you're trying to get people to listen to stuff, this is what's so beautiful about music. It's almost, you know, like you're dealing with sort of the fundamental sort of questions of existence. No, just playing a tone at 440 hertz isn't going to make a hit, right? And so if people are worried that the algorithms will just be playing stuff like that, that's not what people respond to. In general, for most people, people like to hear novel things and that's what sort of catches their ear.
Back catalog and older music
Bruno, I see you asked a question in the chat if you want to jump in live. Hi, thanks for doing this. I wanted to know, do you do the same kind of work for the back catalog, like the older music? Is it generating enough revenue to warrant the same analysis level?
I'm so glad you asked this question. So I'm going to say several things. First, the answer is yes. Second, this is the perfect question to ask an old fogey like me, because I, you know, so maybe a little preface, and this is just in the, you know, in the course of conversation. When I was interviewing for this job and, you know, and decided to take this job, one of the things I told them was when I used to, when I was in grad school, my part-time job, I was a DJ. And, but I was one of those sort of obtuse, you never heard this song before, and made me proud if you've never heard the song before, DJs, right? Like I, I used to say that I could, I could play a whole set with never playing a record more than 30, more recent than 30 years old.
And there is a wealth of stuff in catalog, right? Like it's my belief you could run a whole radio station off of the diaspora of music from 1980 backwards, never play a new song, and get stuff that sounds topical and fresh and beautiful and competes aurally with any of the stuff that's come out now. So the answer is yes. I'm very concerned about catalog and back catalog at Warner Music. The label for catalog is called Rhino. They have a lot of inventory of really, really good stuff from Aretha Franklin. Last year they had a big run with Fleetwood Mac, that's in the catalog.
Yeah, so the answer is yes. Now the second part of that is, again, how do people listen to things these days, right? How is music distributed? Whereas before, you know, you had to get some kid interested in Led Zeppelin so that they would go down to the store and cough up their $14 to get that Led Zeppelin CD, and you'd hope that this would happen in an evergreen way. Nowadays, a kid can just go onto the DSP and type in Led Zeppelin and, you know, hear stuff from the 70s. So there's a tremendous opportunity to curate and push that stuff.
And then the last thing I'll say there is that the streaming providers are aware of this. They're already doing it, right? They, they have this stuff in-house as part of the inventory that they're paying for. They push it, you know, Sirius Pandora pushes it in a channel sort of way. They do an amazing job at curating, listening within these sort of older catalog-based tranches. And they also do a great job of sort of pushing the boundary, like, hey, if you listen to this new thing, here's this old thing, and then pushing it out.
But for sure, so the answer to the last part that you have, is older music generating enough revenue streams to warrant analyzing: yeah, yes, YES, capital YES, yep.
Shiny, data sources, and predictors
I'm noticing there was a question earlier that I had missed when you were talking about using Streamlit. Someone asked, are there, similar to your discussion of Streamlit, any thoughts on Shiny and its place in production at organizations? Oh, yeah, yeah, I love Shiny, and so for the stuff that I do, the sort of more bespoke, somebody wants a dashboard on something sort of unique, I strictly use Shiny, and you know, and I'll say this, I'm a very hands-on person, you know, I lead the team, but I code, I create apps in Shiny, I was futzing around with it yesterday. Rachel knows that I ping her all the time about, can I get some help to make sure this thing works? So we use that all the time. It's a big part of what we do.
Oh, another anonymous question is, do you have colleagues insisting on giving you data in Google Sheets? How do you work with them if all your data is warehoused? They said, then I am so jealous. Yeah, yeah, look, when I came and when I started this job, the answer was yes, and that's just not an efficient way to do anything useful, right? You're trying to cook for an army, and here's your pot and your can of beans, that just doesn't work. So I feel for you.
So is that to say most of your data is warehoused now? The vast majority of our data is warehoused, and we spend time worrying about making sure stuff is in the pipeline before we use it, right? I can't even think of a time recently where somebody just threw something at us, and we were forced to just download it from the Google Sheet or save it as a CSV.
Being a hands-on manager
One of our, I think it was actually two weeks ago, we did have a pretty big discussion on becoming a manager and taking that jump from being able to do, spend most of your time doing coding and then moving into this manager role. But it sounds like you are still also doing that coding part. So I'd love to hear a little bit more about that.
Yeah, I mean, I think that's important. You know, having a team and having lots of people, beautiful people who are willing to tie in to make what you're trying to do successful, and not just me, but the company and all that. And that's a privilege. And I say that, and I remind people, if the people you're hiring, you don't feel like that about them, you're hiring the wrong people, right? Like, it's a gift that they're giving you, you know, their talent and their time, and you can never pay them for when they were eight years old, and they were taking a toy apart, right? Or they were, you know, studying biology on the train when they were 13. You can't pay people for those things.
And as far as, as the work, you know, the point of having a team is because there's only one, you know, there's only one of you. There's only so much you can do for your, by yourself, for yourself. And part of, you know, being a manager is, is, is, is curating the team, and then providing a vision, and, and, and being a decent person. But part of being a manager is also doing, in my opinion, right? And I, and I think the best organizations demonstrate that.
How do you balance that, and figure out, like, what time you can use on actually coding yourself? You know, so for me, in particular, I'm a morning person. I get up really early. I'm up at, like, 4:30, 4:45 in the morning. Before nine o'clock in the morning, before everybody else is up, that's when I'm doing, you know, the individual work that's on my plate.
What got you into data?
Was there a certain point in time where, like, in your life you said, wow, you really can achieve amazing things with data? Like, what got you into data in the first place? In the first place, you know, so I'll go way back, but, you know, my father was a psychologist, and he finished grad school when I was a kid. I remember we had a little Datsun B210. I remember sitting in the back of the car. I remember him saying he had to go to the statistician to get the statistics done for his dissertation. I'm like, statistics, what's that? And that was sort of my first foray into, there's a world out there.
I liked cars a lot. I was into cars. Things are, you know, I'm old enough to, when I went to high school, there was no AP statistics, right? You know, if you like math, you did calculus, and that's what you got. And so I liked, I liked physics. I liked cars. I figured I'd, I'd do something related to making me a nice car, and I ended up taking a stats course along the way as a prerequisite, and it just, you know, when you find what you want to do, and, you know, God willing, we all have these experiences, and get to see enough to find the things that we really like, it just made sense to me.
Like, I was like, oh, of course, I remember the first time I saw a Z statistic, and I was like, oh, yeah, of course, you subtract off the mean. Yeah, and of course, it has to be divided by the standard deviation. And, you know, that was, you know, 30-something years ago that that happened. Followed that, you know, finished, went to grad school, finished that, did academic thing for a while, environmental statistics for a while.
And this is just a fertile time to be in business and try to make things happen. You know, we're lucky, and I really, this is something I want to emphasize. We are lucky to be in this field at this time where people are willing to pay money for what we've dedicated our lives to, right? Like, I always say, had I stayed in physics, and then needed to find a job in physics, and then needed to find a job after academia, you know, I might have, it would have been just like had I committed a crime, right? Without a high beta tokamak and a particle accelerator. Like, what are you going to do with a physics PhD? It's a crazy thing. It's not fair.
And so, we're lucky that the lingua franca of the world, as the world has moved towards digitalization, as people are on their phones, is data science. And I think we owe it, you know, to the field and to the craft to be good at it, right? And to champion it and stuff. A lot of the questions that people have asked have been about how do you mediate, basically, innumeracy within organizations and stuff like that. And I think whether we're going to be successful with these organizations, whether these organizations are going to be successful, depends upon our ability, you know, to be clean and on top of our stuff in a professional way. Like, to be actually good at it. And to translate what we do into meaningful, impactful things.
To answer more of this, so, when I hire, I try to hire people I respect. I try to hire people who have put the same love into their field that I have. And, you know, when I worked at my last job, we had a conference and there was a panel about the democratization of data science. And this may not be a popular thing to say, but I said on the panel, would you want your surgeon democratized, right? I just had eye surgery last week. I don't want my eye surgeon democratized. I want the eye surgeon who's been studying since he was eight years old. I want the eye surgeon who, you know, who is a professional and good at their craft.
I still teach statistics. I still teach a couple classes for Seton Hall. And I say to the kids, and they have difficulty with it, I'm quoting somebody else now. I'm quoting Dick De Veaux, a professor at Williams College. He says, math is like music, statistics is like literature. You need to have a body of experience to be able to apply statistics in a mature, professional way, right? You can't just jump into it and, ha-ha, t-tests, I'm good at that. I'm going to throw that at everything, right?
And I think that one of the things I hope that we get in statistics, in data science, is more of a sense of community and guild so that we can, and this may not, you know, enforce some standards. It's important for the survival of the field.
Hiring for craft, not keywords
Frank, I just see you unmuted, so I'll let you jump in there with what you were going to say.
Yeah, I don't know if Kobi saw my question in the chat as he was answering, as he was speaking to that idea of standards in the field, and you get a lot of noise, and people just jumping in and using their t-test. But, right, there's a little bit of, right, personal, like looking at myself when I ask this question. But I feel like I've put in a lot of work in trying to understand and make sure when I find insights with data, or put something out there in the world, that I really do my homework, and I kind of put my heart into it, because I'm always afraid that someone's going to say, like, Frank, you're an idiot, right, like that scares the heck out of me. And I would imagine if I feel that way, I have people on my team and throughout my organization who feel the same way. So, right, where's that balance between we need professionals, and we need people who thought a lot about this, and like, how do you know when you're there?
I love that you said that. I love your background. I love the books behind you. I'm going to, you know, one of the people who I really, really like and look forward to, and then look up to, if you can put it like that, Donald Knuth, right, the guy who wrote the Concrete Mathematics book, a professor at Stanford, and he's got a bunch of lectures and conversations on YouTube. And so, somebody asked him a similar question to what you asked, and he just says, like, I was always afraid of not knowing enough. I was always worried that I hadn't done enough, I hadn't learned enough, I hadn't respected what I was doing enough to get better at it. And he says he kept that sense throughout his life, and he feels like that's what allowed him to be successful, is having a higher sort of, you know, thing to answer to, right.
And it doesn't have to be, like, and I think, you know, using words like fear and all that may not be the right way just to describe it. It's your craft, right, and you're doing something with your craft. And then I tell this to my students, you may not remember me, you may not remember the text, but you will remember studying. Your brain will remember the synapses changing, this act of going through learning something and being good at it. And that's a gift you're giving yourself, right. And if we respect that gift, then the organizations, you know, the companies, these business firms that we're in, get the benefit of that self-love, if you want to put it that way.
Alyssa, I see you have a follow-up question, too, if you want to jump in.
I think it's one of those things where, I mean, you know, the anecdote of, like, I would need to go see an eye surgeon for my eye. That's sort of like, yeah, my eye needs eye surgery, but what if I just need a new pair of glasses? Then a technician is what I need, right. So I do think that there is sort of space for everyone along there, that continuum to fit together. But that means we need to actually create cohesive communities, rather than being kind of binary about these definitions, in my humble personal opinion.
Oh, sure. You know, it was the optometrist who discovered that I needed to go to the eye surgeon, right. And for sure, you know, you can't have, in every America's Best Contacts and eyeglasses, the retinal specialist sitting there waiting and flipping lenses back and forth. There's definitely a diversity of tasks, if you want to put it like that, right.
I know we have a few minutes until the top of the hour here. So I was just scrolling through to make sure I didn't miss any questions. And I see there's one more anonymous one that was, how seriously would you consider a candidate whose language of choice was not what your team predominantly uses, but who was otherwise on point?
You mean like, like English, French, Spanish?
I think they mean probably R or Python.
Oh, oh, yeah, yeah. I don't even ask. You know, that's unimportant. I mean, we do. So we work mainly in Python, just because that's what's going down these days. But you know, I'm old enough, who even knew what Python was, Java, right? I'm old enough to know C++. I'm old enough to know Pascal. I think if you know how to program, you know how to program and you can learn new languages.
So I will say that I don't think that the way in which many recruiters search for good people is sufficient. If you're looking for keywords, like "I know SQL" and "I know this," you're not getting a scientist, you're getting somebody who learned a package. If you're good, you're dedicated to your craft, you're going to learn whatever. Python will not be around forever. Let's just say that.
Yeah. So I don't, we're language agnostic.
Quick follow up on that, then what is the best way to then find the right people or recruit the right people if it's not necessarily like those words on the resume?
Like in all things, it takes time. That's like everything. You know, back when I did environmental statistics, I spent a large part of my career worrying about hydrology and wastewater treatment. And if I had to summarize all of that, it's basically: it takes time. Like you can't rush cleaning water. And the same thing: you read somebody's resume, talk to people, have a network of people.
You know, I've been doing this stuff 30 years. So I know people who've passed through different organizations, people in different places, people who have students. And then the interviewing process, I think is super important. And then, and also, and I'll say this for people who are interviewing, which, you know, I'm a person too who interviews. You want to be as selective at the job that you choose to give your time to as the places are being, you know, selective with you.
And I know in this country in particular, because we have no social safety net, you know, there's always this power differential. But it goes both ways. And I think that's, again, one of the things that's nice about this field at this moment, is that if you've devoted a long time to your craft, you know, the need is so high that you're able to sort of be particular. You know, your job is a big part of your life. You've got to spend a lot of time doing it. You know, Marx believed that craft, you know, is the essential sort of meaning of existence and stuff, right? So these things are super important. And being happy at what you do, and being in an organization where you're able to be happy at what you do, it goes both ways.
But I was kind of curious, like, you mentioned network, and it kind of like spurred this idea in my head of like, this giant network graph of artists, and, you know, basically different record companies, and Spotify, and Apple. And like, do you guys ever look at that network at like, or at least like try to visualize like, all of those different connection points?
So there's this, I'm forgetting the name, it's escaping my head right now. There's this interesting graph that this fellow put together. Art of noise, what is the sound of noise? Many noises, a hundred noises. You know, I'll send it, I'll figure out what it is, I'll send it to Rachel to pass it to the group. But somebody has taken the data you're describing and created this beautiful illustration on the internet, which is all of these different artists arranged in these sort of sub groups, and then clicks it by, you know, type and genre, like, and really specific stuff, like house, deep house, deep trance house, deep trance house in Brooklyn. And on this website, you can sort of pass through this illustration, click on something, it'll play you representative sounds on it. It's the sort of thing you're describing. Let me just figure out what the website is, I'll pass it to Rachel, and maybe she can pass it to the group after the call.
Thank you so much, Kobi. And one last final question I'd love to ask you is, are there certain books or podcasts or things that you generally listen to that you'd recommend that we all check out?
I'll send a list.
Okay, perfect. Well, thank you so much for jumping on and sharing all your insights with us. I really appreciate your time.
Thank you, Rachel. Thanks, guys.
Thank you all for joining too. Hopefully see you all next week. Same place, same time. Have a great rest of the day.
