Data Science in CPG and the Zero-to-One journey | Joel Ash @ Kraft Heinz | Data Science Hangout
Transcript
This transcript was generated automatically and may contain errors.
Welcome back to the Data Science Hangout, everybody. If we haven't met before, I'm Rachel, I lead Customer Marketing at Posit. And so I like to let people know here that Posit is the company formerly called RStudio. Just in case we haven't shared that yet with you. We build enterprise solutions and open source tools for people who do data science with both R and Python.
And Libby, I'll have you introduce yourself too. I am a Community Manager here working with Posit for the Data Science Hangout, but I also mentor in Posit Academy. So I help professionals learn with R and Python to help them do better work with data in their everyday jobs.
We're so happy to have you joining us today. The Hangout is our open space to hear what's going on in the world of data across all different industries, to chat about data science leadership, and to connect with others who are facing similar things as you. We get together here every Thursday at the same time, same place, unless it's a holiday. So heads up: with Thanksgiving coming up this month, we won't be here.
But if you're watching this as a recording and want to join us in the future, there will be details below to add it to your own calendar. And thank you so much to those who have helped to make this the friendly and welcoming space that it is today. We're all dedicated to keeping it that way.
Introducing Joel and his role at Kraft Heinz
And I will also say that this feedback form and hearing feedback from attendees is what inspired me initially to reach out to Joel. So Joel and I had been connected for a while, but somebody specifically asked to hear from someone from the consumer packaged goods industry. And that's what got us reconnected again in the past few months.
So with all that, I'm so excited to be joined by our co-host today, Joel Ashe, Senior Data Science Manager at Kraft Heinz. And Joel, to get us started here, would you be able to share a little bit about your role and some of the work that you do, as well as something you like to do for fun?
Oh, I'll start with what I like to do for fun. I'm a hobby collector, and my current one is woodworking. If you end up connecting with me on LinkedIn and ask, I will send you all sorts of fun pictures of things that I like to make. It's a nice break from coding and all the conceptual, technical things we do: getting out there and making things physically.
As mentioned, I work at Kraft in our RGM area, revenue growth management. Any time you go to a Target, a Walmart, any grocery store, and you see a sale for one of our products, let's use the Blue Box Kraft Mac and Cheese, say 50% off, we actually pay for that sale promotion. We pay the company, the customer; our customers are Target, Walmart, and so forth.
So they don't lose out on their revenue and profits, and the product is then sold to consumers. There's a differentiation in terms there, since you'd generally think of the end shopper as the customer. That's not the case for us; the middleman is who we actually sell to.
So as you can imagine, it's expensive to run these, because any time something is purchased on a promotion, we pay for it. We end up paying in excess of a billion dollars a year in promotions domestically. So we are tasked with optimizing that: finding the opportunities for better, more optimal pricing and reducing the costs, since, much like marketing, it's not always a net profit generator; it can be quite expensive out of the gate.
So we want to reduce those costs while also maintaining customer relationships, since companies like Target want to maintain foot traffic, and a lot of these products are loss leaders. It's a balance of finding optimal math solutions while also recognizing the limits of how far we can push them, since promotions are part and parcel of the business.
Cream cheese, promotions, and model interpretability
So one thing we're working on now is trying to expand our understanding of what the actual trends are domestically so that we can minimize the expenses, for example the expense of promoting cream cheese. Philadelphia cream cheese is ours, and it's 80% of the market: 80% of the cream cheese sold in the United States is Kraft's product.
Why would we even bother promoting it when we have that kind of market capture, right? Well, not a lot of people domestically buy cream cheese; it's about 4% of the market. So you can see that split: not a lot of people purchase the category, but of the people who do purchase cream cheese, we are the bulk of that.
So we are looking at ways both with kind of marketing results and with our promotional results to try to identify what the right opportunities are to put those on sale at the right time throughout the year so that we both minimize the expenses that can be associated with it. So making sure we don't discount too highly so that people get it for free. And then also enticing new customers to come into the market so that we can expand our market share, our household captures.
The modeling can be anywhere from quite simple to more complex, although I push my people not to jump to solutions like XGBoost, to be direct about it, because our business stakeholders, our customers as data scientists, still need to understand what's happening in our models. It's largely a recommendation engine, so we get them to understand where a recommendation is coming from and where the risks and benefits of our models are, so that they can negotiate those contracts as well as possible with our customers.
Interpretability is important. It's number one for us, truly number one. There are definitely areas of the business where we don't really care, like flow control in a factory: nobody cares about interpretability as long as the operation gets the right answer. But my space is principally business recommendations, and so interpretability is number one, two, and three, I guess, in terms of importance.
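The transcript doesn't show what these interpretable models look like. Purely as an illustration of the trade-off being described, one common interpretable choice in revenue growth management is a log-log regression, whose slope reads directly as a price elasticity. The data below is made up:

```python
import numpy as np

# Hypothetical weekly observations for one promoted SKU
price = np.array([2.0, 1.8, 1.5, 1.0, 2.0, 1.2])   # shelf price in dollars
units = np.array([100, 115, 140, 220, 98, 180])     # units sold that week

# Log-log regression: log(units) = a + b * log(price).
# The slope b is directly interpretable as the price elasticity
# of demand, something a stakeholder can sanity-check themselves.
A = np.column_stack([np.ones(len(price)), np.log(price)])
(intercept, elasticity), *_ = np.linalg.lstsq(A, np.log(units), rcond=None)

print(f"estimated price elasticity: {elasticity:.2f}")  # negative: cheaper -> more units
```

A tree ensemble might fit the data better, but a single coefficient like this can be carried straight into a contract negotiation, which is the point the speaker is making.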
Zero to one: building data science in legacy environments
So there are some books on this that relate to startup culture: starting from nothing and making something. There are a couple of different areas you can think about with this and how it relates. If you were to work in an insurance company, zero to one happened decades ago. The data tends to be incredibly tightly controlled and, relatively speaking, clean. There are actuaries, there are patterns, there are processes for how you can optimize, and there are walls and constraints for your models with respect to what features you can use, governance, that sort of thing.
At some companies, and Kraft is not the only one, data science is a newly discovered talent, more or less. It has been going on at Kraft for quite some time, about five years, so I'm not claiming it's brand new, but you have to integrate more complicated processes into systems that were never originally designed for them. If you're looking at Goliath monolithic platforms and you're trying to say, hey, we need to inject all these models into this old Java system, you can get a lot of pushback on that.
So it's learning how to be dynamic, how to create value, and how to build new systems in a way that's hospitable and can be ingested into something that was never originally designed for it. This also relates to business processes. You can be working with salespeople, marketing, and other business folks who have always had total control over a process, and then you inject math, which may be seen as witchcraft to some, into their decision processes in a way that might feel a little adversarial or controversial, while helping them along toward a more optimal flow.
So it is very, very tricky. And it comes down largely to, one, interpretability, as we talked about with our models, and two, understanding the needs of the customer, which is largely a product mindset. In this case, the needs of the business are critical. You cannot simply go off, build a great model, show people how amazing it is, and win them over. There's a lot of bureaucracy on top.
Book recommendations and failure as a skill
There's actually one I just finished that I would highly recommend. It's called Ego Is the Enemy by Ryan Holiday. I think it is incredibly valuable to really grasp and understand the process of failure, to be quite honest. Failure is a skill. It is a necessary skill in data science. We do it every day when we're testing models, but also at a very grand scale.
You could do an entire project for a year that gets cut down. There are all sorts of things that can happen. And learning the core tenets of what that book is sharing actually reminds me of a quote from Star Trek, from Picard, because I'm a nerd, I'm going to do this: you can do everything right and still fail. That is just life. That's not to say we always do everything right. It is to accept the reality of the world we work in: failure is happening. What's important is to center yourself, learn, and proceed.
Community discussion: marketing mix, interpretability, and CPG methods
Just a question about the methods you guys are using. Are you looking at wholesale marketing effectiveness across Kraft as a whole, or within individual campaigns? I know there are media mix models and marketing mix models for looking at effectiveness across the board, but I'm wondering if there's something different you would be using to measure individual campaigns.
So you nailed it already; you know what the core methods are. In terms of what we're doing uniquely, one of the limitations we have is that we don't control targeting for promotions and marketing activity. When we have an ad, it's everybody needs to see this. And what we end up finding out from these companies is, at scale, was it good? Did it work? How many units did we sell?
So the best way this ends up happening is we pay for the data from customers who are running these themselves. If Instacart were to do a promotion, they can do it uniquely in one area or a couple of stores, and we can then go and purchase that data and find out what happened. But we can't direct those; there are trade laws that apply. It's an interesting experience that there's a lot of information we're simply not privy to in CPG, but we still have to find ways to divine out what the actual results were, which largely comes down to paying other people to give us the right information to do those analyses.
Currently the expectations are simply that our predictions are better, more accurate, than the current processes. That's it. Doing our retros on whether a promotion was valuable is actually a secondary process. One of my teams is insulated from "did you predict it right?" and focuses instead on "are we assessing value correctly?" That's how we separate this out.
For accuracy, what we do is a WMA calculation: were you able to correctly predict how much a promotion would yield in terms of unit sales a couple of months out? Then our targets are simply about improving upon that over time. So there's no hard and fast rule with respect to how much dollar capture we're aiming for.
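The speaker doesn't spell out the "WMA calculation"; one plausible reading is a weighted MAPE (WMAPE) over promotions, where large promotions count proportionally more toward the score. A minimal sketch with made-up numbers:

```python
def wmape(actual_units, predicted_units):
    """Weighted mean absolute percentage error: total absolute
    forecast error divided by total actual units, so large
    promotions carry proportionally more weight than small ones."""
    total_error = sum(abs(a - p) for a, p in zip(actual_units, predicted_units))
    total_actual = sum(actual_units)
    return total_error / total_actual

# Unit-sales forecasts for three hypothetical promotions
actual = [1000, 250, 4000]
predicted = [900, 300, 4200]
print(round(wmape(actual, predicted), 4))  # 0.0667
```

Improving on the incumbent process then just means driving this number down quarter over quarter, which matches the "targets are improving upon that through time" framing.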
Interpretability and communicating models to stakeholders
When I was in insurance, partial dependence was what we did when I talked to actuaries. When my CFO wanted to know why we're making a drastic change, we had a conversation about golf. It is wholly dependent on us to find the method of communication that results in a person understanding. That is key.
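The mechanics of the partial dependence mentioned here aren't shown in the conversation (scikit-learn ships a version in `sklearn.inspection`); as a from-scratch sketch of what's actually being computed:

```python
import numpy as np

def partial_dependence_1d(model, X, feature_idx, grid):
    """For each candidate value in `grid`, force the chosen feature
    to that value for every row of X and average the model's
    predictions. The resulting curve is the feature's marginal
    effect, which is what gets shown to actuaries or a CFO."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature_idx] = value
        averages.append(model.predict(X_mod).mean())
    return np.array(averages)

# Toy stand-in for a fitted model: predict = 2*x0 + x1
class ToyModel:
    def predict(self, X):
        return 2 * X[:, 0] + X[:, 1]

X = np.array([[1.0, 1.0], [2.0, 3.0]])
print(partial_dependence_1d(ToyModel(), X, 0, [0.0, 1.0]))  # [2. 4.]
```

The appeal for stakeholder conversations is that the curve answers "what happens to the prediction as this one input moves?" without requiring the audience to understand the model internals.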
And if you say, you know what, you're just going to have to trust me on this, and you've built a relationship with the decision makers where that's something they can accept, then you do whatever modeling you need to do to get to the end state, and they'll say, yeah, go for it. Go for it because it's going to work, and you can prove that it's going to work. But there is no singular answer, because it is never the same conversation.
Our communication needs to be very dynamic and flexible, and this is why I prefer the management, and really the technical product leadership, aspect of this. To me, that's the code to crack: getting a person who perhaps never cared about anything past geometry to understand the complexities of what these models are doing and why it really matters to them. A lot of times that means understanding them personally, which can take a tremendous amount of time, but you do what you need to do, and you keep practicing and trying different ways until they hit the aha moment and the message gets across. That's all there is to it.
Community discussion: CPG cannibalization and steal effects
In my experience, whenever I've done projects for consumer brands or HR, the regression assumptions are almost always completely violated, and it's very difficult to implement a regression model and have it make sense. So, unfortunately, a lot of the time we do have to go toward a tree-based approach, whether it be random forest, XGBoost, or LightGBM. That just seems to lend itself better to explaining what effect each variable is having on the response, and that tends to go over better, in my experience at least, with a business unit.
As a quick aside, I had a project a few years ago where I was predicting employee churn, and due to the severe multicollinearity of most of the variables, I wasn't really able to use a regression approach. So we went with an XGBoost model and were able to find some variables that were really driving the response of people quitting. We presented it to the stakeholders, and it helped them reduce turnover.
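Neither speaker shows their diagnostic, but the multicollinearity being described is commonly quantified with variance inflation factors (statsmodels provides `variance_inflation_factor`); a from-scratch sketch of the same check:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X: regress each
    feature on all the others via least squares, then compute
    VIF_j = 1 / (1 - R^2_j). Values above roughly 5-10 usually
    signal multicollinearity severe enough to distort regression
    coefficients, which is when tree-based models get attractive."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # add intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2) if r2 < 1 else np.inf)
    return np.array(out)

# Independent columns give VIFs near 1; a column built from the
# others sends its VIF toward infinity.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
print(np.round(vif(X), 2))
```

Running this on the churn features before modeling would make the "severe multicollinearity" claim concrete rather than anecdotal.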
I'm curious if I'm understanding right, like, there's a focus that we're primarily talking about here about relationships with sellers and how do promotions there have an effect, and I'm curious about the interaction between that kind of work and work at the consumer-facing side of things, stuff that measures brand loyalty and NPS scores and all that kind of stuff. I'm just generally curious if there's a relationship there, if those kinds of marketing teams interact or if their missions are really separate or have some kind of collaboration.
So your patient population is decreasing, but your NPS scores are going up because you're only retaining the people who had good experiences. So it was kind of a weird thing to use as a leading indicator, more so than like a measurement, like a temperature check on its own, at least that we found in that industry.
And it is on Betty Crocker Drive in Minneapolis, if you didn't see that in the chat. She isn't real, but she still has a street named after her; it's the exit off the freeway. And yeah, the term was called steal. There are, say, 30 varieties of Cheerios; if you add another variety, how much of those sales are going to be incremental new customers, and how much is going to be steal, which is basically sales taken off of a core brand? That was closer, I think, to witchcraft and dartboards than a science, so I was always curious how other companies were trying to calculate it.
Experimentation, failure, and business value
So I'm a relatively new data scientist, and I'm diving into the world of experimentation and all the efforts we make as data scientists. I was just wondering, for the experienced folks here: how many of your experiments in a project have led to successful outcomes versus not?
All experiments are successful; the question is just whether you get the answer you were looking for. If you asked the question hoping the outcome would be X and you find out it's not, that's really valuable. And it's really valuable to write that down, because then the next person who comes along doesn't have the same question and doesn't spend their time looking into it.
And I come from a science background and publishing, and there's this hesitance to publish anything that doesn't have a significant p-value. So you end up with all these different research labs doing the same experiments over and over, and nobody's telling each other, because you don't want to publish results that aren't "I discovered this significant correlation." But it is important to do the experiment and to learn what it tells you, regardless of the outcome.
Yeah, there's an apocryphal, or semi-apocryphal, story about Edison and the light bulb. It took him, I can't remember if it's a thousand or ten thousand tries, to find a really stable, workable design for the light bulb. And he was talking to someone at a cocktail party or the equivalent, who said, well, do you feel bad that you wasted, for argument's sake, let's say nine hundred ninety-nine other designs? He said, no, none of those were wasted, because I learned nine hundred ninety-nine things to never do again.
Steve Wozniak, I saw him give a talk at my company, and he said much the same. Someone asked him, tell us about your failures, things you designed in your days before Apple and through Apple. And he looked right at the audience and said, I never failed. Just some of the things I did didn't make money.
But I think the key thing is that if you're in business, be careful what you're experimenting with. Playing with stuff to see if it'll work, pushing some buttons to see what you get: yes, you'll learn stuff. But if you're experimenting just for the sake of it in business terms, sometimes you can pick the wrong experiment at the wrong time. And yes, you learn stuff, but what you learn is that you're going to get parts of your anatomy handed to you, because you didn't do something that was moving the business forward.
So I'll echo what other people have already said. Sometimes the modeling and everything we do goes well, and we find out the answers we want to know, but the business might have come in with some very specific expectations, like that launching a new product is going to lead to a big halo effect on a parent brand, and we might not find that that's the case.
But in order to make the work we did valuable, sometimes it takes more than one discussion or presentation. What can you add to your results for context that allows them to do something with it? It might be: well, we didn't see a halo effect, but we also did not see cannibalization, which is a good thing. And you can bring in other strategic studies and other valuable information, maybe not work that you did, but work that's already been done in that area, just for additional context.
Yeah, I find we sometimes spend just as much time, if not more, on how we present our results and explain things so they make sense and people can do something with them as on doing the analysis itself. Communicating it out can take a lot of work. These days, that's what I spend most of my time doing. I don't get to do as much fun coding anymore, but I do a lot of presentations and discussions, and I think that part of the work is just as important.
The tension I've experienced is when folks are distant from the business perspective. I think this is why analysts and data scientists need to be really rooted and embedded in the business they work with. Folks who are not often have this idea that we'll just explore and an insight will be emergent. You can find interesting things that way, but you can't get direction on what's valuable to your stakeholders, to your customers.
Back to me again. One example of a thing that really irritates me as the lead of a group or an SME is when someone says, hey, I'd like to use Python to recreate all the dplyr functions because I'd like to learn Python. That's valid to a point, but you could do all of that today in R using dplyr. So what are we doing here? Are you learning Python just for the sake of it? Once you're done, what is going to happen with the experiment you've just done? If it's just for you to learn something, that's valid, but if you're doing something to move the business forward, that experiment may not be the best use of your time.
I love this unexpected direction we went with experimentation. I know we have just a few minutes left, so rather than jumping into a new discussion, I just want to say thank you so much to everybody for jumping in on our chaos mode today. I would love to have Joel join us again once we figure out our Wi-Fi issues, so we can give him the session we wanted to have with him and get to learn from his experience too. But thank you all for jumping in here and still making this a great hangout.
