Data Science Hangout | Nate Kratzer, Brown-Forman | Focusing Tools on Adoption, BI Tools & Shiny
Transcript
This transcript was generated automatically and may contain errors.
Welcome, everybody, to the Data Science Hangout. I know that most of you have all been here before, so welcome back. But for anyone who's just joining, this is an open space for current and aspiring data science leaders, everybody across the data science community, to connect and chat about some of the more human-centric questions around data science leadership.
And so we really want to create this space where everybody can participate and we can hear from everyone. So there's a few different ways that you can jump in and ask questions. You could just unmute yourself and ask questions live. You could put questions in the chat. And we also have a Slido link, which Tyler will share in a moment here, where you could ask anonymous questions too.
Also, if you put a question in the Zoom chat and you want me to read it out loud instead of calling on you, you could put a little star at the end of it, and I'll know to just read that out loud. But I just wanted to make a quick note that the session will be recorded and shared to YouTube as well for anybody who missed it. With that, I'm so excited to be joined by my co-host for today, Nate Kratzer, who is a data science manager at Brown-Forman. And Nate, I'd love to just have you introduce yourself and maybe share a little bit about the work that you do on your team today.
Great. Thanks, Rachel. And thanks to everyone for being here. Looking forward to getting to talk to all of you for this lunch hour, or at least lunch hour where I'm located; some of you may be joining from other time zones. So I'm Nate. I'm a data science manager at Brown-Forman.
Brown-Forman is the company that owns Jack Daniel's, Woodford Reserve, and several other liquor and spirits brands. I work with two different teams. One of them works on actually making the liquor, and lots of fun stuff comes out of that: how to produce the barrels, all sorts of actual engineering processes. That team largely uses Python, Tableau, and SQL to get their job done, though they are also heavy users of the RStudio products within their Python workflow. I also work with a team that does pricing, and I came out of that team, doing a lot of pricing work, estimating price elasticities. That team largely builds in R, also uses SQL and Tableau, and is mainly concerned with what happens when we change prices on the shelves: how does that affect sales, profit, and everything else.
Growing data science at Brown-Forman
Awesome. Thanks, Nate. So while we're waiting for some questions to come in from the audience, I'd love to just ask you: maybe what's something that you're most excited about lately with regards to data science? I'm excited about the way data science is slowly growing into our organization, I think. When I started at Brown-Forman, which was about four years ago, we had three data scientists, and it was sort of just this side project that the organization was vaguely considering. Someone said, you should put some money into data science, and there were some analysts already there who were interested, and so they formed this very small data science team. I was maybe the second person hired from the outside, rather than a convert from within. And now it's grown, and we've also grown in a way of having data scientists embedded in other parts of the organization. So it's seeing that there's an actual effect from all of this work. What started as a "well, let's just try it, this is a thing we keep hearing about, there's some hype too" has become: okay, yeah, there are actual decisions being made off of a bunch of the products that the data science team has built.
That's awesome. I'm just curious, what are some of the decisions that have been made, or big impacts to the business from data science? Yeah, so they vary from things that are more visible to you all, like what sizes should we sell on shelves. We've looked at things like how much difference pricing makes and how often we should promote. But there's also a lot of stuff we do on the back end.
One of my favorite things to do is to get to actually run experiments in production. Part of this is normally people asking, okay, if we want to experiment with a different shape of bourbon barrel that will hold a bit more, how many do we need to test to make sure that it doesn't have a negative yield impact, and that we're still getting the same amount? And then what impact might this have on color? How can we track all of those things on the back end?
Cool. So did you end up changing the barrel size or shape? You know, that came to mind because I was recently asked about how many we need. I don't have the results of the experiment yet. I just figured out literally how many of the larger barrels we need to make in order to test it.
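Questions like "how many barrels do we need to test?" are usually answered with a standard power calculation. As a rough illustration (this is not Brown-Forman's actual method, and the numbers are invented), a normal-approximation formula for a two-group comparison looks like:

```python
import math
from statistics import NormalDist

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate per-group sample size for a two-sided two-sample test.

    effect_size is Cohen's d: (smallest difference worth detecting) divided
    by the standard deviation of the outcome. Uses the standard normal
    approximation, which slightly undercounts for very small n.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    n = 2 * ((z_alpha + z_beta) / effect_size) ** 2
    return math.ceil(n)

# Illustrative: to detect a yield change of half a standard deviation
# (d = 0.5) 80% of the time, you need about 63 barrels of each shape.
print(n_per_group(0.5))  # 63
# Smaller effects need far more barrels:
print(n_per_group(0.2))  # 393
```

The practical takeaway matches the conversation: the experiment's cost is set up front by how small an effect you need to rule out, which is why "how many do we need to make?" has to be answered before any barrels are built.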
Data sources and ingestion
That's cool. Seth, I see you just put a question into the chat if you want to ask live. Yeah, sure. As a recurring question, I'm very curious about, obviously we can't do the data science or analysis without the data. So I'm just curious where you guys generally get your data from and what processes you guys have for kind of ingesting it, prepping it so that you can kind of do the magic.
Yeah. So there's a lot of different sources that we could talk through. For the pricing work that I've spent the most time with, the data is honestly bought from Nielsen within the United States. Liquor has a three-tiered system: we sell to a distributor, who sells to a retailer, who sells to a customer. So I don't have any direct customer data, right? I have no idea who's buying what. So we get some aggregated data from Nielsen, which has bought it from those endpoint retailers, in terms of how much they sold and at what price. They get a lot of it from retailer scans and then sell that back to us.
We have a data engineering team. So the process there looks roughly like this: Nielsen sends a data extract in some form of text file, the engineering team ingests it, they do a few joins for us ahead of time and expose it to us on a SQL server, and then we pull it down.
We've got an R package that works on this. So we've got a script that essentially calls the functions we've written for pulling it from SQL, cleaning the data, transforming it, and running the model. And then post-model, we also prep the data a bit for the things we want to do with the model coefficients, so that we're not just saying, oh, your price elasticity is this, but we're also breaking down how it matters. We also get a lot of data internally, more on the production side, and that's also been transformed rapidly. Ten years ago, our source of data would have been a clipboard that someone was writing on by hand. Actually, even five years ago, that would have been the case in some places.
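As a loose illustration of the modeling step, and not the team's actual R package, a constant-elasticity demand model reduces to regressing log volume on log price, with the slope being the price elasticity. A standard-library-only Python sketch on synthetic scan data:

```python
import math
import random

def fit_elasticity(prices, volumes):
    """OLS slope of log(volume) on log(price): the price elasticity
    under a constant-elasticity (log-log) demand model."""
    lp = [math.log(p) for p in prices]
    lq = [math.log(q) for q in volumes]
    mp = sum(lp) / len(lp)
    mq = sum(lq) / len(lq)
    cov = sum((x - mp) * (y - mq) for x, y in zip(lp, lq))
    var = sum((x - mp) ** 2 for x in lp)
    return cov / var

# Two years of synthetic weekly scan data with a true elasticity of -1.2
random.seed(42)
prices = [20 + random.uniform(-3, 3) for _ in range(104)]
volumes = [1000 * (p / 20) ** -1.2 * math.exp(random.gauss(0, 0.05))
           for p in prices]

print(round(fit_elasticity(prices, volumes), 2))  # close to the true -1.2
```

The real model described in this conversation controls for much more (distribution, category trends, the overall spirits market), which is exactly why a single-variable regression like this is only a toy.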
We've had issues that we've tried to deal with. We want to know how big each barrel is, so we had a machine that rotates the barrel and in theory measures it with lasers. But occasionally this gets miscalibrated because, I don't know, someone hits it with a forklift, or machines just go out of calibration over time. And then we have to figure out, what is the offset for this? What should it be? So that end is, depending on your perspective, really fun in that we get to work with everything: what are the sensors in the warehouse, where are they located, temperature, humidity, all sorts of fun, direct data collection. But it also means there's a lot of data engineering for that team.
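One rough way to handle the miscalibration problem described above (purely a sketch, with made-up numbers): measure a few barrels of known size and take the median residual as the offset, which shrugs off the occasional wild reading.

```python
from statistics import median

def calibration_offset(measured, reference):
    """Estimate a constant sensor offset from paired readings of
    reference barrels with known sizes. The median of the residuals
    is robust to occasional glitched readings."""
    residuals = [m - r for m, r in zip(measured, reference)]
    return median(residuals)

# Hypothetical readings after the machine was knocked out of calibration;
# the last reading is a glitch.
known = [200.0, 200.0, 225.0, 200.0, 225.0]  # true barrel sizes (liters)
read = [203.1, 202.9, 228.2, 203.0, 260.0]   # laser measurements

off = calibration_offset(read, known)
corrected = [m - off for m in read]
print(round(off, 1))  # 3.1
```

A mean would have been dragged upward by the 260.0 outlier; the median keeps the offset estimate near the true drift.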
Getting buy-in from management
Thinking about even five years ago, people using clipboards and tracking the data that way. I see Arifath had a question in the chat and said, curious to know, was there any particular project that helped to get buy-in from top management to expand to the data science route?
Yeah. So I think the first one was actually the pricing. And the reason for this is that the two types of projects that have gotten the most buy-in so far have been ones that directly replaced other things, and therefore it's easier to see what they're doing. One case is replacing an Excel process. It used to be that we might have an analyst spend a week figuring out a price elasticity for one model or for one market, and now, once we have the process in place, we're using the same model in about 20 different countries, and each country has multiple markets within it, et cetera. So there's just the massive speedup of doing the same work with more accuracy, but also with more speed. The other case is when we've been able to replace consultants with data science. It used to be that we would pay an outside analysis firm to do the analysis instead of setting it up internally.
As I was talking through that, there's a third case I forgot to mention that's possibly most relevant here: our production team, when they were first introduced to Shiny apps. They'd been trying to manage everything and sort of slowly going digital, largely going from pen and paper to, like, Google Sheets occasionally. But it was very easy for one of our first data scientists in production to just spin up a Shiny app that would show them the things they needed, because it could pull in data from multiple sources, it could do all the formatting within the app, and it could give them a single place to go. So there was a pretty early transition within production, because Shiny proved out its value almost immediately, especially for the R&D team, in the things they could look at. And I think one of the very first Shiny apps we deployed was just something where they wanted to run PCA analysis on some chemical results. And we're just like, OK, here's the thing where you can upload your CSV, and then all of it will be run and you'll get all the charts you want right away. That just instantly saves hours of people's time. So that got a lot of buy-in as well.
Calculating ROI of data science
That's great. Somebody had just asked me yesterday how to calculate ROI on switching over to data science tools, and they were just asking me if I had examples from different customers. But I feel like I never hear things in terms of dollar amounts of this is the exact ROI and that's so hard to calculate. And I was just curious if you had done that at all. Do you think of what the consultant's hourly cost was or what it costs in Excel?
Yeah. So most formally, we have a project, tracked through a Shiny app of course, in production that specifically tracks process improvement. That's one that's easier to demonstrate, like if we run an experiment and it changes the process, or it allows us to increase the yield coming out of our mills. I should say, I said Brown-Forman is a liquor company; we also own the cooperage that makes the barrels, and we also buy the wood directly. So that whole barrel-making process is within our scope. And then you're also starting to look at warehouse conditions. Climate change has reduced bourbon yields: hotter conditions tend to mean we lose more bourbon, because more of it evaporates while it's sitting in the barrels. So we've looked at things like how we get back to that normal state. So we can put it in yield terms, but we don't have an overall ROI. Occasionally we get estimates of what our work would have cost if it went through a consultant, but we've never actually tried to add all of that up.
Tech stack and computing platforms
Yeah, I don't often see it, but they're asking me for it. So I was looking through meetups and other recordings, trying to see if people had mentioned that. I see Steve had asked on Slido, out of interest, what are the main computing platforms that you use?
Yeah, so you're just looking for where we actually do the math. We have RStudio Workbench on a fairly giant server, which is where we do our development. We are transitioning to also using a Connect server for some of the stuff that's in production, because it will allow us to automate and also to separate production from dev. But maybe I should back up a second and just explain our general tool set.
So we have RStudio Workbench where all of us can log in. It's set up on a Linux server with, like, a terabyte of RAM. We have nothing in the cloud; we do everything in house, so we need a fairly large server to accommodate all the data scientists and any of the work we could perform. And generally the process will be: we have a SQL database, a lot of stuff in Cloudera, although production has some of their own SQL servers that I think are still Microsoft SQL Server. We'll wind up connecting through ODBC, pulling in our data, running the calculations in either R or Python, and then pushing the results back to a database. We'll have our own section of the database that we run as an advanced analytics team, and then we build the front-end app in either Shiny or Tableau off of that transformed data.
I see Arifath, you just asked another question that you mentioned I should read live. So, if possible, could you please shed some more light on the kind of post-model work that you perform? Sure. So for our price elasticities, we build a model of the entire market. It's not just price; it's also distribution, how many stores the product is in, how the category is performing, like how is whiskey doing, how is the overall spirits market doing, et cetera. And one of the things we do is go back through the past year and say, okay, if we compare the last 13 weeks of sales to the same 13-week period a year ago, and we apply our price elasticity to the actual pricing difference that we observed historically, what's the impact in actual case volume? So we're presenting our users with something where they can say, oh, if I'm looking at this most recent week, rolled up to account for some noise, if I'm looking at what's happening now, not just what was the price elasticity, but how did that actually impact me based on what was happening with prices in the market.
And we've also rolled out something where, using Shiny, users can upload "this is my pricing plan for next year" and see what the model predicts is going to happen. So the idea is that we're giving them as much of the model as possible. Part of that is because users would try to do these things with the price elasticity coefficients on their own. And not only does that take a lot more time if a bunch of different people are repeatedly doing the same math in different places, but it also was pretty error-prone, and people did not always understand the model. There are good reasons to use the entire model as a whole and not just look at what would happen if I changed the price, but actually enter a plan for price, competitors' prices, whiskey segments, et cetera.
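To make the post-model step concrete: under a constant-elasticity model, the predicted volume effect of an observed price move is just the price ratio raised to the elasticity. A sketch with purely hypothetical numbers (the prices, base volume, and elasticity here are invented for illustration):

```python
def volume_impact(base_cases, old_price, new_price, elasticity):
    """Predicted change in case volume from a price move, under a
    constant-elasticity demand model: Q1/Q0 = (P1/P0) ** elasticity."""
    ratio = (new_price / old_price) ** elasticity
    return base_cases * (ratio - 1)

# Hypothetical market: 10,000 cases sold over 13 weeks, shelf price moved
# from $22.99 to $24.99, and the model estimated an elasticity of -1.2.
delta = volume_impact(10_000, 22.99, 24.99, -1.2)
print(round(delta))  # roughly -950 cases attributable to the price change
```

This is the kind of "how did the price change actually impact me" number the conversation describes surfacing to users, instead of handing them a raw coefficient and letting everyone redo the arithmetic themselves.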
Scaling the team and hub-and-spoke model
Thank you. I was thinking about what you said, how you were one of the first data scientists on the team onboarded externally, but now you have about 14 data scientists. So I'm wondering what that growth looked like and how you scale data science out across the team.
Yeah. So I mentioned how the team came in initially. I think data science and R came to Brown-Forman from a financial analyst and someone in the production team at about the same time, and they were completely separate, working independently. And then we got a bit more interest in that initial pricing work within finance. I was involved in the R user group in Louisville, and I met someone who was doing that work, who had started in finance and was sort of the internal catalyst there. So I wound up on that team, which slowly expanded. Some of the expansion was combining with other teams that were already interested; we had a team that was doing a bit more visualization work and ultimately combined with them to get this core group of six. And then a big part of the expansion was adding spoke roles. So we run on a model where there's a hub of core data scientists, and they are just data scientists; they report into our advanced analytics team. But we also have folks who are, say, part of our production team, part of our U.S. commercial analytics team, part of our Australia pricing team, wherever there are managers who want to have folks doing data science but also directly reporting to their team. In those cases, what they get is to set the agenda for the projects that are worked on, and what they're provided with is, one, access to the tech stack, but also access to all of the data scientists in the hub, so that they're not this isolated person working alone, recreating the wheel every time.
And that is a pretty big source of growth, because it is taking existing spots and making them data science roles. The third source, I'd say, is that we've been fairly successful with year-long internship programs. We have three or four folks who have joined that way, where we've created new positions after they've done a year-long internship.
Marketing data science to colleagues
So I was just wondering, how much of your role is you going outside of your team saying, this is where you can improve, and trying to persuade them to make that improvement? And how much of it is them going, we've got a problem here; Nate and his team have solved a problem for someone else, they can probably help solve our problem, and then coming to you?
Yeah. So initially, when we first formed an advanced analytics data science team, there was a bit of saying, hey, this is stuff we can do, please come to us with problems like that. And that was our director at the time; I was a senior data scientist then, not really as involved with soliciting work from the rest of the organization. And now we're at a point where we have enough work without soliciting it. There are definitely times where I think it would be nice if the rest of the organization would do things in more of the data science way, of course, but we have enough people who want our help that I'm not going to push data science on people who don't want it yet. So we've been pretty much able to just work with the problems that people have.
The one exception to that, perhaps, is that we've been working on adoption. Within pricing, there's a lot of support from corporate and from some of the high-level folks, but then you get into the question of, well, are we reaching out to salespeople on the ground and to some of the smaller countries and smaller markets, doing some trainings there? That's largely our partners who actually do pricing reaching out to those other folks, but we have been trying to focus a lot of our tools on adoption. That comes both in the occasional training meeting, but also in documentation and the usability of tools. Most of our revisions are now focused not on adding new features, but on making existing features clearer and easier for end users to work with.
Shiny vs. Tableau: when to use which
Nate, I see there's two questions that are kind of similar around Shiny and Tableau. So one said, you've mentioned Shiny and Tableau for your team. What goes into deciding when to use which? And then I see that Ian also added, does it have to do with end user or functionality, for example?
Yeah. So it has to do with a lot of things. One of the first criteria, of course, is if it's not something you can do in Tableau, then we're going to use Shiny for it, and then there's no decision. Another one that sort of shortcuts it is that we think about not necessarily who the end user is, but who the end maintainer is. The reason the end user matters less to us is that we have a reasonably consistent design across Shiny and Tableau. Folks generally know we're going to have some filters on the left, we're going to have a nice header telling you what it does, it's going to incorporate a little bit of our branding, and then we're going to have some graphs in the main panel that react to the filters on the side. So we've never really had trouble with users being confused by Shiny versus Tableau.
And then we have a corporate center that links dashboards, and of course you can also just send people links directly. So it hasn't really been a problem for end users. Where it does matter: sometimes we as a team are responsible for initially building dashboards that we want other folks in the business to maintain, and if we want someone else to maintain it, then we're a lot more likely to use Tableau. Of our team of 14 data scientists, we probably have five who are pretty good at building Shiny apps and can build and maintain them, but that's it for the entire organization. Whereas we probably have about 100 folks who can at least build basic Tableau apps, and some are much more advanced and can build much more complex things. So in terms not of the users of a dashboard, but of the people maintaining dashboards, there's a much bigger base for Tableau. And so that can come into consideration.
The functionality has probably been the biggest thing. Using Shiny, you get version control, you get the ability to be a lot more flexible with your data and your setup, and you get the fact that you can really go end to end within a Shiny app. So in many cases it's a lot faster. If it's a small enough data transformation, I can just work within R the whole time, quickly build a Shiny app, and look at my results right away. I don't have to wait for the step of outputting to a CSV or to a database and then running Tableau on top of it.
So there's, I suppose, a bit less of a consideration, but still a relevant one: what does the person building the dashboard want to use, and what are they fastest at? Tableau can make some data transformations much more difficult than if you're working in R or Python; it is set up as a visualization tool. And I think Tableau is in fact an excellent visualization tool. The problems I observe in organizations are when they try to push tools too far outside of what they're designed for.
And so our most prominent Shiny apps, where we just couldn't do it in Tableau, are our forecasting work, where we want users to be able to interact with the forecast and set some of the predictions. That's where Shiny comes in, and then we run the forecast in the background. And while some of our pricing stuff is in Tableau, anything where we want users to be able to upload data and then interact with models and so forth is going to wind up being in Shiny. We just have a lot of additional flexibility that way.
Would you be able to, just for my understanding, give an example of something that would need to be a Shiny app because of the model, an exact example, versus something that would be in Tableau? The two biggest ones I've seen are, first, needing users to upload data. We tried something with some financial gaming in Tableau where people had to enter by hand, into input boxes, everything they wanted, and it just didn't get used, because it's a pain to enter data that way. Whereas in a Shiny app we've been able to say, here, you can download last year's data as a default, change the things you want to change, upload it, and see what our model says. The other one is connecting directly to outside APIs. Tableau probably has some of this, but in our institutional Tableau we can't add Tableau extensions, so we really are just dealing with base Tableau. For example, we wanted to pull Google Trends into a dashboard, and this is easy enough in R: there's a package for it, we pull it in, we make some nice graphs. That's something we just couldn't do in Tableau.
Thank you. Yeah, I think it's helpful to understand, because both are great tools, when you would use one versus the other, and to be able to communicate that out to the team too. Yeah, and in general, the more calculations you try to put into Tableau, the harder the workbook gets to use and maintain. Especially when people have to use other people's Tableau workbooks, going back through and unlayering and figuring out the dependencies, which calculation relies on which other three calculations, and what the actual math is, becomes fairly tricky. It's not the best tool for all of your data manipulations, even though it can do some of them.
Data science in marketing and causal inference
Hi, Nate. I find this talk super interesting. I actually used to work for Irish Distillers, so I was working on all the Irish whiskeys back in Ireland in the marketing area. And I noticed that they were really trying to make a big effort to bring in AI and ML, particularly around our packaging. I was just wondering, do either of your teams do anything in the marketing and advertising space in relation to data science and analytics?
Yeah, so hopefully I won't get myself in too much trouble with my answer here. I have talked to marketing and analytics, and the truth is, usually when they come to me with a question like, can you tell us what was the effect of this marketing campaign, my answer is no, I can't. I don't just say no up front; every time I've tried, I'm like, OK, what data do we have? Let's take a look, et cetera. But my answer does wind up being no, there's not enough. The short-term effects are usually too small to capture, and with the long-term effects, well, there's just too much going on.
But I also think it's interesting that you brought up where they want to go with AI and machine learning, because that's not at all really where I'm pushing them. I'm pushing them to run experiments. I think they should do A-B testing. I do not think adding AI or ML is going to fundamentally solve the problems we have of not having good data, having too many things going on at once. Yes, you're going to get some sort of pattern, but fundamentally, I think in order to actually evaluate marketing, you need to collect better data, and you need to actually be intentional about causal inference. And the easiest way to do this is going to be some sort of A-B testing.
I'll actually say more broadly, while AI and machine learning get a lot of hype, I know the common joke that 80% of the time the business just needs a SQL query with a group by, but the other 20% of the time they need something more advanced. I think what they need is causal inference. Most of the time, and this does vary, but at least in my line of work, most of the time the business wants to know: did my campaign cause something to happen? Did the price change cause something to happen? There are businesses I know where prediction is fine, like, is this part going to wear out in so many years? And obviously, if you're dealing with image data or text and so forth, AI and machine learning is a huge revelation there. But for a lot of your day-to-day business questions, you need data to be better organized, you need research design, and you need some understanding of causal inference.
That's a great question, Laura. I had a follow-up on that, too. If that is what you need, but you don't have it today, how do you communicate that to the team, or teach them maybe what data they should have? Yeah. We've been working on that, actually, with marketing and analytics, and a lot of this is internal education. We'll see how well it works; we're moving a little at a time. Our basic pitch has been, first of all, to just explain what A/B testing is for folks who aren't familiar. I think it tends to be a good entryway to understanding research design, because as research designs go, it's on the simpler end: we're going to have two groups, we're going to compare them, and you're going to need us to help you set up the groups. I'll also say you need to stress that, because there have been occasions in the past where folks did think they were doing some sort of test, and what they did was say, we're going to run a huge media campaign in the markets that are performing best right now and not in the other ones. They were doing this for strategic reasons, and it makes sense perhaps as a strategy, but it also completely invalidates any sort of test if you're shifting your funding based on which markets are performing well, rather than as a way to test the actual effect of expenditures.
Then we've also been trying to get them to just start small and on things that are less threatening. Any sort of test that could result in funding being cut, there's always going to be pushback on that, right? Right now, they're getting the funding. They're doing the marketing. Things are going well. Why do a test when that can pretty much only result in bad news for them? We're trying to introduce this as a way to just at first test two different creatives against each other. You're still going to run something, but why don't we run the more effective of the two creative materials? The hope is that that's going to get some research design involved, but this is stuff we're working on now, so you'll have to ask me in a year or two if this actually worked at all or if it just ran into a dead end.
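For reference, the simplest version of the creative-versus-creative test described above is a two-proportion z-test. A small sketch with invented conversion counts (the numbers are not from this conversation):

```python
import math
from statistics import NormalDist

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between
    two creatives. Returns (z statistic, p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)          # rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical test: creative A converts 200 of 10,000 impressions,
# creative B converts 260 of 10,000.
z, p = two_proportion_z(200, 10_000, 260, 10_000)
print(round(z, 2), round(p, 4))  # B looks better, and p < 0.01
```

The key point from the conversation still holds: the math is the easy part. The test is only valid if the two groups were assigned comparably, which is exactly what goes wrong when funding is steered toward already-strong markets.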
Building solutions that scale across problems
Thanks, Nate. That's really helpful. Frank, I see you just put a question in the chat if you want to ask that one live. I do. Thanks, Rachel. Nate, I'm curious: is all the work that your teams do, as I'd put it, one problem to one solution? Meaning, you get with your stakeholders or your users and say, hey, what's the problem you need to solve? Okay, great, understand the problem, let's build a solution. Maybe you maintain that, maybe it's a one-time analysis. Or is there anything that you and your teams have worked on where you've been able to say, oh, there's a problem here, here, and here; they have very similar characteristics; can we build something that can serve all three of those groups?
So I'm wondering if anything pops to mind. Yeah. So certainly we have the same thing with stakeholders, where they have one problem in mind that they want us to solve. But I do think our price elasticity work falls into this category. One of the things is we initially insisted, well, we're going to need a bunch of data, because if we're going to understand the effect of price, we need to simultaneously understand the effect of other things that are going on. So as it turns out, in our quest to just model price elasticities, we also have a way to think about competitive sets, like who's actually competing with you on price. We have a way to think about distribution. We've also used it for bottle size changes.
And then one other issue that I'm trying to solve with modifications of the model, and this may sound weird since not all of you are in this industry, is that defining what a product is, is a non-trivial task. That's why I said it would sound weird. You think, oh, it's 750 milliliters of Jack Daniels. But if we change the design, is it the same? If we change the color just a little, is it the same or different? If we pair it with a shot glass, is it the same or different? Well, clearly it's different. And clearly it also is going to take away sales. So what do we do about the gift packaging, essentially, that we model all of the time? Is that a separate problem, or can we combine it with our price elasticity models?
So what we have tried to do, I think, in the spirit of this is we really are trying to get to a place where we have a model of the market and it answers a bunch of questions because it's taking a fairly holistic view of the market. So it can help us answer out-of-stock questions. It can help us answer questions about gift, about promotions, about all of this. And it's also one model. So from like zero to a hundred percent, where do you think you are from where you want to be? I'll go with 60 maybe. We actually have, I mean, most of this stuff works. We still have a bit of an issue there, but we are able to use it to look at counterfactual situations, look at actual situations. We have gotten fairly good adoption of let's look at the market this way. Right on. Cool. Thank you.
Team structure and specialization
Thanks, Nate. Arafat asked a question earlier: how do you divide up different kinds of work, for example data collection and processing, model building, and API development, within your team? Do you have dedicated people for certain tasks, or is it more like everyone does some of everything?
Yeah. So this is a great question because I know it's a very ongoing debate in like how to organize data science teams and if we should have specialists. And the answer within our team is some of both. We do want people to be able to go fairly end-to-end on data science products and also to get a chance to do different things. Like people stuck only developing front-end dashboards are people who are going to get tired of developing front-end dashboards eventually almost no matter how much they like it. So we do want people to get to rotate. Probably our biggest restriction is like we've got three people on our team who can actually push stuff to the database. So like in terms of administration, we do have some stuff there. We do have some folks who are a bit more on the data engineering side. And while we encourage people to develop skills across all things, certainly like not everyone is equally proficient across R, Python, Tableau, and SQL. And so what you have will depend on what people's skills are and what they're trying to develop.
But even across the tech stack, and it looks like the question also covers process and collection, model building, experimentation, maintenance, and so on, for the most part we tend to divide that by business area rather than by technology. So someone in production will actually be out at the production site thinking about the sensors and how that data gets into the database, and then they're also going to build the model based on the temperature data collected from those sensors. That doesn't mean they're on their own: within the hub we have some people who specialize more in modeling, and they'll talk to those people about what's an appropriate statistical model here. But they're going to code it, they're going to own it, and they're usually going to be responsible for maintaining what they build.
Marketing experiments and choosing between R and Python
Awesome, thank you. I love thinking about how like your data science team is actually like right there where they're making the whiskey too. So there's a few other anonymous questions too that came in earlier. And one was when you were talking a bit about marketing experiments, someone said could you please share any actual examples of the experiments run? I would love to because I would love to get marketing to run an actual experiment.
So no, this is still something we're trying to get them to do. I think one of the issues here is that I've said, hey, I've worked through this data, I've built out the model, and when I do all the reasonable things you can do to the data, I just don't see this effect. I can't detect it. That doesn't mean it's not there per se. But while I can say that internally, a consultant does not have to say that. They will tell you they can, in fact, see the effect of your marketing campaign. I had a lengthy back and forth with the consultant about their methodology, and essentially they're overfitting the data. They're getting there by pretty blatantly overfitting the data. But then we're in a situation where the consultant is giving a report that says, hey, what you're doing works, and I'm giving a report that says, hey, I don't know, you need better research design. And in that situation, you can see how the consultant's message is a bit more appealing. I think there's a chance we are eventually going to get to some internal research design, but we are just not there yet.
Thank you for the openness there too. Yeah, Zach, I see you're asking a bit about R and Python together. Would you mind asking that question live? Yep, sure. So a bit of background on me: I've only started coding in the last year, just over a year now, so I focus a lot on R. I can build quite efficient Shiny apps and do quite a lot in R, but I've not yet expanded to Python. Since you were just talking about Python and R together on the same team, would it benefit me, as someone looking to go into this industry, to learn Python as well as R, or do I just specialize in R as much as possible and get the skills required there?
Yeah, so very directly, I will say specialize in one or the other, and my personal bias would be R, unless you are very much into machine learning and AI. If you want to do data work, do statistics work, wrangle data, and be able to present it to stakeholders, then Shiny, R Markdown, and dplyr are things I don't think are quite matched in the Python ecosystem yet. And while the Python ecosystem has all sorts of great things for web scraping, machine learning, and so on, mostly you can do both in either language. But I do think there are still enough differences that I'll call those out as perhaps a reason to choose one over the other.
So the main reason we support both R and Python is recruiting: being able to hire data scientists whether what they know is Python or what they know is R. We do occasionally use both. One example is that we wanted to automate the scheduling of the tanks that hold the liquid before it's dumped into bottles, and it turns out Python had a pretty good package for that sort of automated scheduling that R did not have, but we wanted to deliver this as a Shiny app. So we used the reticulate package: we get the data in through the Shiny app in R, then a Python function does the actual scheduling on the back end and returns the results through R. A lot of our Python folks have wound up learning a bit of R, largely to present their final results, because we use R Markdown for reports pretty heavily and we also use Shiny apps, and there is a fluidity to that even for the Python folks. Sure, they could maybe do everything in Python and then present purely in Tableau, but that doesn't mix as well for a formal report if you also want the sort of writing you have in R Markdown. So they'll drop Python chunks in, but here again, this is mostly because that's what they're familiar with. R is really forced by the communication piece, and Python can be forced if there's a package you really want to use, but also just if that's what you're already familiar with.
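As a rough illustration of the pattern Nate describes, here is a hypothetical sketch of the kind of Python scheduling helper that an R Shiny app could call through reticulate. The function name, data shapes, and greedy heuristic are all invented for this example; Nate's team used an existing Python scheduling package rather than hand-rolled logic like this.

```python
# Hypothetical sketch of a Python scheduling helper that an R Shiny app
# could call via reticulate. Greedy earliest-available-tank assignment;
# names and data shapes are invented for illustration.

def schedule_batches(batches, tank_ids):
    """Assign each (batch_id, duration_hours) job to the tank that frees up first.

    batches  -- list of (batch_id, duration_hours) tuples
    tank_ids -- list of tank identifiers
    Returns a list of dicts: batch, tank, start, end (hours from now).
    """
    free_at = {tank: 0.0 for tank in tank_ids}   # when each tank is next free
    plan = []
    # Scheduling longer batches first tends to balance load under greedy assignment.
    for batch_id, hours in sorted(batches, key=lambda b: -b[1]):
        tank = min(free_at, key=free_at.get)     # earliest-available tank
        start = free_at[tank]
        free_at[tank] = start + hours
        plan.append({"batch": batch_id, "tank": tank,
                     "start": start, "end": start + hours})
    return plan

plan = schedule_batches([("A", 4), ("B", 2), ("C", 6), ("D", 3)],
                        ["tank1", "tank2"])
for job in plan:
    print(job)
```

On the R side, something like `reticulate::source_python("schedule.py")` would expose `schedule_batches()` to the Shiny server function, which could then render the returned plan as a table.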
Sysadmin relationships and infrastructure
So how do you cope as a data user? There are presumably also people looking after your big server who are doing updates and major upgrades to RStudio. How do you cope with that? Do you let them change the system, or do you have some way of running things in parallel? What's your relationship with your sysadmins? I'm speaking as a sysadmin. So we've got a guy who is supposed to be running SAP, but he did a lot of work with Linux servers in college and was willing to help us. So we have a part-time sysadmin who runs all of our stuff. We work together pretty closely, he's always very helpful, and we probably do not have everything locked down as well as we potentially should.
A couple weeks ago we ran into an issue with our R versions, between 3.6 and 4.0, where there was a miscommunication about which one we should be running on. I briefly mentioned one reason we want to switch our production work to RStudio Connect instead of RStudio Workbench: Connect is going to, by default, produce an isolated environment, and it's going to give the user a lot more control over what's in that environment. Within Workbench, yes, we can use tools like renv as a user to have our own libraries and our own systems in there, but there is also a base library, and right now that base library is what's used by the automated jobs. We need all of our automated jobs to be able to run on the same set of packages right now, so there's a reason we're going to move away from that as the number of automated jobs increases. But generally, yeah, we just have a really helpful sysadmin who's doing this as part of his job, and we could not do it without him.
So yes, thank you, and on behalf of data scientists everywhere: we appreciate sysadmins.
Love that. So taking that a bit further, if somebody within their team is trying to work out who should be managing that toolset, or communicating with sysadmins, Steve or Nate, do you have any advice for those listening in? So I'll describe what we're trying to do. We have Microsoft Teams channels where we're actually managing that. We'll have a whole group where we post announcement messages about major services and things, but then we'll have a support group where people ask questions. Rather than a ticketing system, which is sort of private between the person raising the ticket and the person handling the issue, what we try to do is get people to ask their questions publicly, usually in the hope that somebody else will answer the question for us and thus offload some of the work. And also, within Teams again, we'll have a little management channel where we discuss things like, do we really want to install this new version of R, or what do we do about library versions that are incompatible with something.
And what we're always trying to do is look at our infrastructure and ask whether we've got the right mix of machines. We're currently looking at migrating over to the cloud and more automation for building RStudio, using Terraform and Ansible, and trying to work out how to do all that. So it's a continuous journey. Yeah, that answer makes me happy, because it's the same for us, Google instead of Microsoft, but we have a server-use channel, and that is how we coordinate with our team and our sysadmin.
Evaluating forecasts and recommended resources
That's great, thank you. I know we're just at the top of the hour here; do you have time for one more question, Nate? Yeah. Okay, awesome. Shannon, I see you asked a question a little bit earlier, can I turn that over to you? I could also read it, or maybe Shannon had to drop, but the question was: do you ever go back to evaluate your predictions or forecasts against actuals? Yes, we do, and we do that pretty consistently. Our most formal process for that launched in May, so that doesn't have a ton of history yet, but when we build out our forecasting in the first place, we go back and look at what this method would have forecasted if we had done it, say, 10 years ago, rolling the window forward a year or so at a time. And if you look at Forecasting: Principles and Practice, I see someone has dropped a link there; there's also a third edition, you just need to change the two at the end of the URL to a three. It talks about how you can essentially cross-validate your forecast by rolling through what it would have been. We've also compared it to the forecasts that internal teams made. We were able to get hold of some of those and compare the model forecast to the human forecast as well.
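The rolling evaluation Nate mentions, also called rolling-origin or time series cross-validation in Forecasting: Principles and Practice, can be sketched like this. The naive mean forecast and the sales numbers below are stand-ins for illustration only; a real backtest would refit an actual forecasting model at each origin.

```python
# Rolling-origin (time series) cross-validation: refit on an expanding
# training window, forecast one step ahead, and score against the actual.
# The "model" here is a naive mean forecast, used only as a placeholder.

def rolling_origin_errors(series, min_train=5):
    """One-step-ahead absolute errors from an expanding training window."""
    errors = []
    for t in range(min_train, len(series)):
        train = series[:t]
        forecast = sum(train) / len(train)   # stand-in for a real model
        errors.append(abs(series[t] - forecast))
    return errors

# Made-up weekly sales figures
sales = [100, 104, 98, 107, 111, 109, 115, 112, 118, 121]
errs = rolling_origin_errors(sales)
mae = sum(errs) / len(errs)
print(f"one-step-ahead MAE: {mae:.1f}")
```

Because every forecast uses only data available at that point in time, this gives an honest estimate of how the method would have performed historically, which is also the basis for comparing model forecasts against the human forecasts Nate mentions.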
Great, thank you. So aside from that forecasting book, are there other resources you'd recommend we take a look at? I like reading R Weekly and Data Elixir as newsletters, and of course the RStudio blog to keep up with some of the new open source stuff coming out there. I think so many of my actual recommendations, though, would be scattered: what would I recommend to a new user, what would I recommend for forecasting, what would I recommend for causal inference or price elasticities, and so on.
So yeah, that makes sense. Well, thank you so much, Nate. I think we've answered all the questions from everyone; unless there's anything I missed, please stop me now. Thank you so much for sharing your insights with us and answering everybody's questions. The last thing I'd like to ask: if anybody has follow-up questions for you, what's the best way to get in touch, LinkedIn or Twitter? Probably Twitter, I guess. I can just put my handle in the chat if that works. Perfect. Well, thank you so much, Nate, really appreciate it, and have a great rest of the day, everyone.
