Eric Leung - R Scripts to Databricks: Lessons in Production Workflow

video
Oct 31, 2024
4:35

Transcript

This transcript was generated automatically and may contain errors.

Hi everyone. As the gentleman said, I am Eric Leung, and I'm a data scientist at the Walt Disney Company working on marketing and content analytics. Last year, we started a project with ESPN, where they wanted to understand the effectiveness of their live TV ads on TV-watching households. More specifically, they were interested in how well these ads converted these households to cross-channel platforms. So, for example, subscribing to the Disney Plus Bundle or engaging with the ESPN website in some manner.

When we started this process, we started with an R script on one of our laptops. But by the end of it, we had scaled up our operations, used Databricks for our workflows, and even threw some Python in there. So today, I want to share three lessons that we learned in doing this process, and hopefully some of these resonate with you.

Lesson 1: Don't reinvent the wheel

The first one I want to share is don't reinvent the wheel. So for some background for us, we've done some marketing campaigns more in the email marketing space, where we could separate groups into a holdout group and a testing group and send emails just to our test group. Now we're thinking, oh, can we use the same methodology for viewership? However, you can't really go into somebody's house and say, "You watch some ads and you don't." So we had to find some other way to do this.

Luckily, some of our colleagues were working with Hulu on a marketing campaign, and were making use of a counterfactual model to do this modeling. And even though they're different platforms, Hulu being streaming and our work being in the live TV space, there were some commonalities that we could use to refactor their code enough to fit our own project, hence not reinventing the wheel there.

Lesson 2: Use the best available tool for the job

The next lesson I want to share is to use the best available tool for the job. So the ads that we started with were, say, for households watching MLB, and we were running this, again, with our script on our own laptop. ESPN was also interested in how these ads would work on the other sports leagues that they have access to. And in my mind, I'm like, okay, now we have an R script running five times on one laptop.

And this is how I kind of felt, where we had this new model that we had to implement and refactor, but we also had to scale up our processes to handle all these other sports leagues. So after going through some brainstorming, we're thinking, oh, okay, maybe we just need some sort of server to offload some of our compute, maybe an Amazon web service or even Posit Connect. Unfortunately, we didn't have access to those. But coincidentally, our group had also recently gotten access to Databricks.

And after talking to some colleagues, we realized it was the best available tool that we had, where we could offload some compute, but also organize some of our data workflows with its Workflows feature.

Lesson 3: Budget time for new tools

And the last lesson I'll leave you with is to budget time to make use of these new tools. So before, we just had things on our laptop, and now we were adding in some Databricks, right? So this will take some time, and we're committed to this. And I was like, how long could this take? Maybe a couple of weeks?

It turns out it took nearly two to three months to even just go through all the quirks, refactor our code, get used to Databricks, et cetera. So I would say, yeah, budget some time to make use of new tools or even existing tools.

So in sum, the three lessons that I want to give you are: don't reinvent the wheel if you don't have to. Maybe you have some open source project that you can look at. Use not only the best tool for the job, but also the one that you have available. And then lastly, budget time in your project plan to get used to these new tools and to deliver your project successfully. So thank you.