Demetri Pananos - Making sense of marginal effects

video
Oct 31, 2024
16:34

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

Okay, great. Hi, my name is Demetri. You may know me from formerly bird-themed social media websites. But if you don't know me, that's okay. We can start with a little bit about me. I'm 11,207 sols old, and I traveled 2 times 10 to the minus 5 astronomical units to get here.

That's probably not very helpful, eh? Maybe it's more helpful if I tell you I'm 32 years old, and I traveled 3,200 kilometers to get here, or, since we're in the States, 2,000 miles.

What was the point of all this? Why am I telling you this? Well, the point I'm trying to make here is that the way in which we talk about comparisons is really important for understanding those comparisons. Now, I get it. Years is a unit of time, not a comparison. But when I tell you my age in years, you can make a comparison: my age against yours, my age against the year. When I tell you my age in sols, a sol being the length of a Martian day, those comparisons become a lot harder to understand. The same is true of the astronomical unit. When I say 2 times 10 to the minus 5, that sounds like a short distance, unless you know that the astronomical unit is the distance from Earth to the sun.

Okay, so the way in which we talk about comparisons really matters. Why does it matter? Well, it matters because good comparisons are crucial for telling compelling stories. And we, as data scientists, do nothing if not make comparisons and tell compelling stories, right? In fact, some of the tools that we use in pursuit of that goal, like regression, are just little machines for creating comparisons. Now, of course, not every comparison which comes out the other side of a regression is immediately understandable. Take, for example, logistic regression. Logistic regression talks about comparisons in terms of odds and odds ratios. I'm going to argue to you for a moment that that is a really bad way to talk about comparisons.

The problem with odds

So consider the following. I ran two polls on that bird website. The first poll gave some hypothetical information about data scientists and engineers and their attendance rate at this conference. I said, the probability an engineer attends posit::conf is 10%, and the probability a data scientist attends is twice as large. Notice I'm making a comparison here, data scientists versus engineers, and I'm doing so using probability. Then I asked participants: what is the probability that a data scientist goes to posit::conf? Give me the numerical value. And because the numbers are nice and I think the comparison is a good one, you can see that most people get the correct answer here, highlighted in green: two times 10% is 20%.

I ran a second poll on that bird website. This time I changed the comparison to use odds. Everything else remained the same. I just said the odds a data scientist attends posit::conf are twice as large. Again, I asked participants: what is the probability a data scientist goes to posit::conf? Give me the numerical value. Now the answers look very different. You can see that the correct answer here, highlighted in green, is not selected the majority of the time. In fact, there's real uncertainty as to what the answer should be; people are spread out across the three options.

So what's going on here? Well, I suspect that there are two kinds of people. There's the group of people who think that probability and odds are interchangeable. Sadly, they aren't. And there's probably another group of people who know that odds and probability are not interchangeable, but lack the intuition for whether the answer should be greater than 20% or less than 20%. With that group of people, I can commiserate. Take a look at the formula for odds in terms of probability: odds = p / (1 − p). It's a nonlinear function. To get the right answer, I have to take the probability, convert it to odds, double the odds, and then back-calculate the probability. It's filled with mental friction, which obfuscates the story here.
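That friction is easy to see in a couple of lines of R. This is just the arithmetic behind the poll, not anything from the slides: start at the 10% baseline, double the odds, and convert back to a probability.

```r
p <- 0.10                        # probability an engineer attends
odds <- p / (1 - p)              # odds = p / (1 - p), about 0.111
doubled_odds <- 2 * odds         # "the odds are twice as large"
p_new <- doubled_odds / (1 + doubled_odds)
p_new                            # about 0.182 -- not 0.20
```

Doubling the odds from a 10% baseline gives roughly 18.2%, a little less than the 20% you would get by doubling the probability.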

Now I know what you're thinking. You're thinking, Demetri, this data is from Twitter. It's from a group of people who self-select into following you. This is not a good comparison in and of itself. And here you are on the stage of posit::conf, trying to tell me what constitutes a good comparison. And to that, I say: yes, correct.

But, or rather, and, I think it highlights something else about comparisons: language is not the only thing that matters. In a room full of data scientists, I really hope that I don't have to belabor the point that bias is omnipresent. The quality of our comparisons depends on from whom and how the data was collected.


Introducing marginaleffects

So we as data scientists have a really hard job. We need to communicate in an easily understood language: to talk about age in years, not in sols. We need to create unbiased comparisons where possible. And we need to do all of this while telling a compelling story. Well, what do we use, and how do we use it? There are lots of tools you can use; I'm going to add one more to your toolbox. It's called marginaleffects. marginaleffects is a package that you can use to generate counterfactual comparisons from over a hundred different kinds of models. Literally a hundred. I'm talking linear regression, logistic regression, generalized linear models, generalized additive models, Bayesian models; even some types of machine learning models are compatible with the marginaleffects ecosystem. You can use marginaleffects to reweight your comparisons so that they better match a target population. And you can use marginaleffects to cut down on the boilerplate that you would typically write in service of telling your story. The best part is that marginaleffects is available for both R and Python.

That's the what; what's the how? Well, I think by following these three steps, we can accomplish our goal. By writing your comparison as a function, by specifying the weights of groups in your population, and then by using marginaleffects to handle the plots and all the math, you can communicate to your audience in the way which is most natural to them, which results in you telling a more compelling story.

A true-ish example: A/B testing lift

And to illustrate these three steps, I want to tell you about a true-ish example. In a former life, I used to work in A/B testing, experimentation basically. And I would have colleagues come to me for help in the design and the analysis of the experiments they would run. One day a colleague comes to me and says, Demetri, I need some help. We ran this experiment. We wanted to improve the conversion rate of this widget, and we changed the corners of our buy now button to have rounded corners instead of square corners. We think that the effect of the treatment is going to be different depending on if you're on desktop or mobile. So, you know, we've got this interaction with treatment here. And I'm pretty happy with this model, but my PM wants me to report something called the lift. I know that logistic regression outputs odds and odds ratios. So what do I do here? Do I need to fit a different model? Should I teach them how to understand odds? I went, whoa, whoa, let's not get too hasty. Let's not teach anybody how to understand odds. We know that's very difficult. And the best part is that we don't have to do either of these things. We can use marginaleffects.
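As a sketch, the model being described here is a logistic regression with a treatment-by-device interaction. The names `converted`, `treatment`, `mode`, and `experiment` are hypothetical stand-ins, not the colleague's actual code:

```r
# Conversion as a function of treatment, with the treatment effect
# allowed to differ between desktop and mobile (the interaction).
fit <- glm(
  converted ~ treatment * mode,
  family = binomial,
  data   = experiment
)
```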

And I walked them through the following steps. I said, can you write down what the lift is? Don't even worry about code right now; just write it down with pen and paper. And they wrote down something that looks like this: the lift is the difference in conversion rates between treatment and control, divided by the conversion rate in control. I said, cool, good start. Can you write that as an R function now? And they wrote down something that looks like this. Again, we take the mean outcome in treatment, subtract the mean outcome in control, and then divide by the mean outcome in control. That's all you need to get started using marginaleffects. You can use the avg_comparisons() function, which takes as its arguments the model you fit, the comparison that you want to make, and the variable in your model for which you would like to compute that comparison. And you can see that marginaleffects is now reporting a lift of seven and a half percent. That means treatment did better. And you get all the typical output that you would want from a statistical analysis: standard errors, p-values, confidence intervals.
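Sketching those steps in R: the lift written as a function, then handed to avg_comparisons(). The model object `fit` and the variable name `treatment` are hypothetical stand-ins, and the two-argument `(hi, lo)` form is how marginaleffects expects custom comparison functions:

```r
library(marginaleffects)

# lift = (mean outcome in treatment - mean outcome in control)
#        / mean outcome in control
lift <- function(hi, lo) (mean(hi) - mean(lo)) / mean(lo)

avg_comparisons(
  fit,                      # the fitted logistic regression
  variables  = "treatment", # the variable whose levels we compare
  comparison = lift         # the comparison we just wrote down
)
```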

I told my colleague, now I know that your PM said that they wanted the lift, but they don't want one number. They want two numbers: the lift for each device type. To do that, you can use the by argument. Simply pass the name of the variable in your model within whose levels you would like to compute this comparison. So now we can see that desktop users have a lift of about 11 percent, with confidence intervals, standard errors, etc. And mobile users have a lift of minus 58 percent. So this experiment was not very good for mobile users.
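In code, the within-group version only adds one argument. Again a sketch: `fit` and `mode` (the device-type column) are hypothetical names:

```r
# One lift estimate per device type, instead of one overall estimate
avg_comparisons(
  fit,
  variables  = "treatment",
  comparison = function(hi, lo) (mean(hi) - mean(lo)) / mean(lo),
  by         = "mode"
)
```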

So what I wanted my colleague to take away from this is that you just need to write your comparison as a function, and then you can use avg_comparisons() to get that comparison as well as all the statistical output that you typically want. And you can use the by argument to get comparisons within groups of a third variable.

Correcting for population weighting bias

So my colleague was like, seven and a half percent? Great. I can take that back to my stakeholder. Thanks, Demetri. Oh yeah, one more thing. There's always one more thing. And my colleague said, well, you know, we decided to add mobile users on the last day, actually. This was originally supposed to be an experiment just for desktop users, but we weren't going to hit our sample size in time. So we added mobile users on the last day. Hey, is that OK? That's not a big deal, right? I was like, actually, that's a very big deal. Right now, your experiment weights don't match your population weights. And I showed them a data frame that looked kind of like this. I said, in your population, 60% of users are on desktop. But in your experiment, 94% were on desktop. Do you know what that means? They shook their head, no. And I said, that means the seven and a half percent that we just estimated is biased, and it's biased towards desktop users. Not to worry, though. We can use marginaleffects to correct for this.

We just have to follow a few steps. The first thing we're going to do is take that comparison function and rewrite it to be a weighted comparison function. You can see here that I'm adding a third argument, w, for the weights. And everywhere that I used mean in my function body, I'm now going to use weighted.mean. See, I'm taking the weighted difference between treatment and control, and I divide by the weighted mean in control. Then we just change a few arguments in our avg_comparisons() call. You can see now that I'm passing the weighted comparison to the comparison argument. I've passed the data frame that had the population weights to the newdata argument. And I'm telling avg_comparisons, via the wts argument, which column in that data frame contains the weights that I want to use. And you can see that marginaleffects is now reporting a different estimate of minus 13%. So that means that treatment did worse overall. And why is that? That's because the treatment did worse for mobile users, and there are more mobile users in our population than there were in our experiment.
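Put together, the steps just described might look like this. Everything here is a hedged sketch with hypothetical names (`fit`, `population`, `wt`, `mode`); the three-argument `(hi, lo, w)` function is the weighted form the talk describes, and in practice the newdata grid has to contain every predictor the model uses, not just the weight column:

```r
library(marginaleffects)

# Population weights: 60% desktop / 40% mobile,
# versus roughly 94% desktop in the experiment.
population <- data.frame(
  mode = c("desktop", "mobile"),
  wt   = c(0.60, 0.40)
)

# Same lift, but every mean() becomes a weighted.mean().
wtd_lift <- function(hi, lo, w) {
  (weighted.mean(hi, w) - weighted.mean(lo, w)) / weighted.mean(lo, w)
}

avg_comparisons(
  fit,
  variables  = "treatment",
  comparison = wtd_lift,
  newdata    = population, # grid carrying the population weights
  wts        = "wt"        # column holding those weights
)
```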

So what I wanted my colleague to take away from this is that bias can result from improper weighting of groups, and you can fix that with marginaleffects. You specify the weights of the groups in your population in a data frame, and then you pass those weights to avg_comparisons() to get your appropriate comparison.


Plotting with marginaleffects

So my colleague was like, whoa, thanks, Demetri, for catching that. Now I need to make some plots. This is such an important thing that I need to convey it to my stakeholder, and the best way to do that is probably visually, not with math. I agreed. I said, but marginaleffects can do that, too. So why don't you tell me what you want to show your stakeholder? Maybe we can partner on this. And my colleague said, well, I think I just want to show them the conversion rates first. And they hemmed and hawed, and they described this ggplot that they wanted to make, where treatment was on the x-axis and conversion rate was on the y. I said, sure. But remember, the experiment data is biased. You need to weight it. And instead of munging and doing all these weighted means in dplyr, you can just use marginaleffects. I'm going to use the plot_predictions() function here, which takes as its argument my model. To the by argument, I'm passing what I want on the x-axis; here, I'm going to plot treatment. And I'm weighting the data in the same way that I did in the avg_comparisons() call. And you can see that I've plotted the groups on the x-axis, with conversion rate on the y, plus confidence intervals. Always good to communicate uncertainty.
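That plot might be produced like this; a sketch assuming the same hypothetical names as elsewhere (`fit`, `population`, `wt`), and assuming the weighting arguments mirror the avg_comparisons() call:

```r
library(marginaleffects)
library(ggplot2)

plot_predictions(
  fit,
  by      = "treatment",  # what goes on the x-axis
  newdata = population,   # data frame carrying the population weights
  wts     = "wt"          # column containing those weights
)
```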

So my colleague was like, whoa, that was so easy. You've saved me so much boilerplate. I said, yeah, and it gets even better. plot_predictions() returns a ggplot object. So if you've got a theme that you really like, or a company theme that you have to follow, you can just tack that on to the end of the plot_predictions() call and treat it like any other ggplot object. You can even map other variables to other aesthetics. For example, if I want to plot device type as well, I can just pass that to the by argument as the second element of a character vector. Now I've got both treatment and control, and I've mapped device type to the color aesthetic.
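Sketching that: `theme_minimal()` stands in for whatever house theme you have to follow, and `mode` is again a hypothetical name for the device-type column:

```r
library(marginaleffects)
library(ggplot2)

# Treatment on the x-axis, device type mapped to color,
# plus whatever ggplot2 theme you normally use.
plot_predictions(fit, by = c("treatment", "mode")) +
  theme_minimal()
```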

My colleague was like, that's amazing. But now I want to visualize the lift. I don't just want the conversion rates. And I said, you should try the plot_comparisons() function. And now we're plotting the lift for desktop and mobile, similarly to how we computed it using avg_comparisons(). You can see that this function's signature is almost identical to avg_comparisons().
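A sketch of that call, redefining the same lift function so the snippet stands alone; `fit` and `mode` are hypothetical names as before:

```r
library(marginaleffects)

lift <- function(hi, lo) (mean(hi) - mean(lo)) / mean(lo)

# Same signature shape as avg_comparisons(), but returns a ggplot
plot_comparisons(
  fit,
  variables  = "treatment",
  comparison = lift,
  by         = "mode"
)
```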

So what I wanted my colleague to take away from this is that you can make the plots you would have otherwise made, with less code, using marginaleffects. These plot functions return ggplot2 objects, so if you've got a specific theme that you like, you can just tack it on in the way you typically would. And the signatures for these plot functions look very similar to avg_comparisons(), so once you master one, you've mastered them all.

The true story behind the example

Now, I told you that this was a true-ish story, and I think I owe it to you to tell you what was true and what was an exaggeration. The truth is, there was no colleague. I was the colleague. This is my story. And this is a DM from that bird website, where I'm DMing somebody called Noah, and you can see that this is exactly what I asked. I said, I ran this experiment. I've got some heterogeneity of treatment effect. I fit this model. But my stakeholders want something called relative risk; it's simple, and it's very comparable to lift. I want to use marginaleffects. How do I do this? And is this even the right tool? So I would not be up here without the help of the authors of marginaleffects and everybody in the stats Twitter community. Thank you to them.

And there is so much that I couldn't fit into a 20-minute talk. So my call to action for all of you is to go to marginaleffects.com, where there are tons of vignettes, docs, and memes, some of which I wrote. And I'm talking about the memes, not the vignettes or the docs; I didn't write any of those. Seriously, this library is like a godsend to me. I bet that when we all open RStudio, we do two things: we load the tidyverse and we load in data. And honestly, the next thing that I do is load in marginaleffects. It's that useful to me, and it does way more than I've said here. So please go to marginaleffects.com and see what this library can do. And that brings me to the end of my talk. Thank you so much for coming and listening to me, and I'll see you next time.

It means a lot to me, and I'll see you all in 355.23 sols. That's about a year. So thanks so much.