Demetri Pananos - Making sense of marginal effects

video
Oct 31, 2024
16:34

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

Okay, great. Hi, my name is Demetri. You may know me from formerly bird-themed social media websites. But if you don't know me, that's okay. We can start with a little bit about me. I'm 11,207 sols old, and I traveled 2 times 10 to the minus 5 astronomical units to get here.

That's probably not very helpful, eh? Maybe it's more helpful if I tell you I'm 32 years old, and I traveled 3,200 kilometers to get here, or, since we're in the States, 2,000 miles.

What was the point of all this? Why am I telling you this? Well, the point I'm trying to make here is that the way in which we talk about comparisons is really important for understanding those comparisons. Now, I get it. Years is a unit of time, not a comparison. But when I tell you my age in years, you can make a comparison: my age against yours, my age against the year. When I tell you my age in sols, a sol being the length of a Martian day, those comparisons become a lot harder to understand. The same is true of the astronomical unit. When I say 2 times 10 to the minus 5, that sounds like a short distance, unless you know that the astronomical unit is the distance from Earth to the sun.

Okay, so the way in which we talk about comparisons really matters. Why does it matter? Well, it matters because good comparisons are crucial for telling compelling stories. And we, as data scientists, do nothing if not make comparisons and tell compelling stories, right? In fact, some of the tools that we use in pursuit of that goal, like regression, are just little machines for creating comparisons. Now, of course, not every comparison which comes out the other side of a regression is immediately understandable. Take, for example, logistic regression. Logistic regression talks about comparisons in terms of odds and odds ratios. I'm going to argue to you for a moment that that is a really bad way to talk about comparisons.

The problem with odds

So consider the following. I ran two polls on that bird website. The first poll gave some hypothetical information about data scientists and engineers and their attendance rate at this conference. I said, the probability an engineer attends posit::conf is 10%, and the probability a data scientist attends is twice as large. Notice I'm making a comparison here, data scientists versus engineers, and I'm doing so using probability. Then I asked participants: what is the probability that a data scientist goes to posit::conf? Give me the numerical value. And because the numbers are nice and I think the comparison is a good one, you can see that most people get the correct answer here, highlighted in green: two times 10% is 20%.

I ran a second poll on that bird website. This time I changed the comparison to use odds. Everything else remained the same. I just said the odds a data scientist attends posit::conf are twice as large. Again, I asked participants: what is the probability a data scientist goes to posit::conf? Give me the numerical value. Now the answers look very different. You can see that the correct answer here, highlighted in green, is not selected the majority of the time. In fact, there's real uncertainty as to what the answer should be; people are spread out across the three options.

So what's going on here? Well, I suspect that there are two kinds of people. There's the group of people who think that probability and odds are interchangeable. Sadly, they aren't. And there's probably another group of people who know that odds and probability are not interchangeable, but lack the intuition for whether the answer should be greater than 20% or less than 20%. With that group of people, I can commiserate. Take a look at the formula for odds in terms of probability: odds = p / (1 − p). It's a nonlinear function. To get the right answer, I have to take the probability, convert it to odds, double the odds, and then back-calculate the probability. It's filled with mental friction, which obfuscates the story here.
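That friction is easy to see in a couple of lines of R. This is just the arithmetic behind the poll, not anything from the slides: start at the 10% baseline, double the odds, and convert back to a probability.

```r
p <- 0.10                        # probability an engineer attends
odds <- p / (1 - p)              # odds = p / (1 - p), about 0.111
doubled_odds <- 2 * odds         # "the odds are twice as large"
p_new <- doubled_odds / (1 + doubled_odds)
p_new                            # about 0.182 -- not 0.20
```

Doubling the odds from a 10% baseline gives roughly 18.2%, a little less than the 20% you would get by doubling the probability.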

Now I know what you're thinking. You're thinking, Demetri, this data is from Twitter. It's from a group of people who self-select into following you. This is not a good comparison in and of itself. And here you are on the stage of posit::conf, trying to tell me what constitutes a good comparison. And to that, I say: yes, correct.

But, or rather, and, I think it highlights something else about comparisons: language is not the only thing that matters. In a room full of data scientists, I really hope that I don't have to belabor the point that bias is omnipresent. The quality of our comparisons depends on from whom and how the data was collected.


Introducing marginaleffects

So we as data scientists have a really hard job. We need to communicate in an easily understood language: to talk about age in years, not in sols. We need to create unbiased comparisons where possible. And we need to do all of this while telling a compelling story. Well, what do we use, and how do we use it? There are lots of tools you can use; I'm going to add one more to your toolbox. It's called marginaleffects. marginaleffects is a package that you can use to generate counterfactual comparisons from over a hundred different kinds of models. Literally a hundred. I'm talking linear regression, logistic regression, generalized linear models, generalized additive models, Bayesian models; even some types of machine learning models are compatible with the marginaleffects ecosystem. You can use marginaleffects to reweight your comparisons so that they better match a target population. And you can use marginaleffects to cut down on the boilerplate that you would typically write in service of telling your story. The best part is that marginaleffects is available for both R and Python.

That's the what; what's the how? Well, I think by following these three steps, we can accomplish our goal. By writing your comparison as a function, by specifying the weights of groups in your population, and then by using marginaleffects to handle the plots and all the math, you can communicate to your audience in the way which is most natural to them, which results in you telling a more compelling story.

A true-ish example: A/B testing lift

And to illustrate these three steps, I want to tell you about a true-ish example. In a former life, I used to work in A/B testing, experimentation basically. And I would have colleagues come to me for help in the design and the analysis of the experiments they would run. One day a colleague comes to me and says, Demetri, I need some help. We ran this experiment. We wanted to improve the conversion rate of this widget, and we changed the corners of our buy now button to have rounded corners instead of square corners. We think that the effect of the treatment is going to be different depending on if you're on desktop or mobile. So, you know, we've got this interaction with treatment here. And I'm pretty happy with this model, but my PM wants me to report something called the lift. I know that logistic regression outputs odds and odds ratios. So what do I do here? Do I need to fit a different model? Should I teach them how to understand odds? I went, whoa, whoa, let's not get too hasty. Let's not teach anybody how to understand odds. We know that's very difficult. And the best part is that we don't have to do either of these things. We can use marginaleffects.
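As a sketch, the model being described here is a logistic regression with a treatment-by-device interaction. The names `converted`, `treatment`, `mode`, and `experiment` are hypothetical stand-ins, not the colleague's actual code:

```r
# Conversion as a function of treatment, with the treatment effect
# allowed to differ between desktop and mobile (the interaction).
fit <- glm(
  converted ~ treatment * mode,
  family = binomial,
  data   = experiment
)
```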

And I walked them through the following steps. I said, can you write down what the lift is? Don't even worry about code right now; just write it down with pen and paper. And they wrote down something that looks like this: the lift is the difference in conversion rates between treatment and control, divided by the conversion rate in control. I said, cool, good start. Can you write that as an R function now? And they wrote down something that looks like this. Again, we take the mean outcome in treatment, subtract the mean outcome in control, and then divide by the mean outcome in control. That's all you need to get started using marginaleffects. You can use the avg_comparisons() function, which takes as its arguments the model you fit, the comparison that you want to make, and the variable in your model for which you would like to compute that comparison. And you can see that marginaleffects is now reporting a lift of seven and a half percent. That means treatment did better. And you get all the typical output that you would want from a statistical analysis: standard errors, p-values, confidence intervals.
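Sketching those steps in R: the lift written as a function, then handed to avg_comparisons(). The model object `fit` and the variable name `treatment` are hypothetical stand-ins, and the two-argument `(hi, lo)` form is how marginaleffects expects custom comparison functions:

```r
library(marginaleffects)

# lift = (mean outcome in treatment - mean outcome in control)
#        / mean outcome in control
lift <- function(hi, lo) (mean(hi) - mean(lo)) / mean(lo)

avg_comparisons(
  fit,                      # the fitted logistic regression
  variables  = "treatment", # the variable whose levels we compare
  comparison = lift         # the comparison we just wrote down
)
```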

I told my colleague, now I know that your PM said that they wanted the lift, but they don't want one number. They want two numbers: the lift for each device type. To do that, you can use the by argument. Simply pass the name of the variable in your model within whose levels you would like to compute this comparison. So now we can see that desktop users have a lift of about 11 percent, with confidence intervals, standard errors, etc. And mobile users have a lift of minus 58 percent. So this experiment was not very good for mobile users.
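In code, the within-group version only adds one argument. Again a sketch: `fit` and `mode` (the device-type column) are hypothetical names:

```r
# One lift estimate per device type, instead of one overall estimate
avg_comparisons(
  fit,
  variables  = "treatment",
  comparison = function(hi, lo) (mean(hi) - mean(lo)) / mean(lo),
  by         = "mode"
)
```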

So what I wanted my colleague to take away from this is that you just need to write your comparison as a function, and then you can use avg_comparisons() to get that comparison as well as all the statistical output that you typically want. And you can use the by argument to get comparisons within groups of a third variable.

Correcting for population weighting bias

So my colleague was like, seven and a half percent? Great. I can take that back to my stakeholder. Thanks, Demetri. Oh yeah, one more thing. There's always one more thing. And my colleague said, well, you know, we decided to add mobile users on the last day, actually. This was originally supposed to be an experiment just for desktop users, but we weren't going to hit our sample size in time. So we added mobile users on the last day. Hey, is that OK? That's not a big deal, right? I was like, actually, that's a very big deal. Right now, your experiment weights don't match your population weights. And I showed them a data frame that looked kind of like this. I said, in your population, 60% of users are on desktop. But in your experiment, 94% were on desktop. Do you know what that means? They shook their head, no. And I said, that means the seven and a half percent that we just estimated is biased, and it's biased towards desktop users. Not to worry, though. We can use marginaleffects to correct for this.

We just have to follow a few steps. The first thing we're going to do is take that comparison function and rewrite it to be a weighted comparison function. You can see here that I'm adding a third argument, w, for the weights. And everywhere that I used mean in my function body, I'm now going to use weighted.mean. See, I'm taking the weighted difference between treatment and control, and I divide by the weighted mean in control. Then we just change a few arguments in our avg_comparisons() call. You can see now that I'm passing the weighted comparison to the comparison argument. I've passed the data frame that had the population weights to the newdata argument. And I'm telling avg_comparisons, via the wts argument, which column in that data frame contains the weights that I want to use. And you can see that marginaleffects is now reporting a different estimate of minus 13%. So that means that treatment did worse overall. And why is that? That's because the treatment did worse for mobile users, and there are more mobile users in our population than there were in our experiment.
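Put together, the steps just described might look like this. Everything here is a hedged sketch with hypothetical names (`fit`, `population`, `wt`, `mode`); the three-argument `(hi, lo, w)` function is the weighted form the talk describes, and in practice the newdata grid has to contain every predictor the model uses, not just the weight column:

```r
library(marginaleffects)

# Population weights: 60% desktop / 40% mobile,
# versus roughly 94% desktop in the experiment.
population <- data.frame(
  mode = c("desktop", "mobile"),
  wt   = c(0.60, 0.40)
)

# Same lift, but every mean() becomes a weighted.mean().
wtd_lift <- function(hi, lo, w) {
  (weighted.mean(hi, w) - weighted.mean(lo, w)) / weighted.mean(lo, w)
}

avg_comparisons(
  fit,
  variables  = "treatment",
  comparison = wtd_lift,
  newdata    = population, # grid carrying the population weights
  wts        = "wt"        # column holding those weights
)
```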

So what I wanted my colleague to take away from this is that bias can result from improper weighting of groups, and you can fix that with marginaleffects. You specify the weights of the groups in your population in a data frame, and then you pass those weights to avg_comparisons() to get your appropriate comparison.


Plotting with marginaleffects

So my colleague was like, whoa, thanks, Demetri, for catching that. Now I need to make some plots. This is such an important thing that I need to convey it to my stakeholder, and the best way to do that is probably visually, not with math. I agreed. I said, but marginaleffects can do that, too. So why don't you tell me what you want to show your stakeholder? Maybe we can partner on this. And my colleague said, well, I think I just want to show them the conversion rates first. And they hemmed and hawed, and they described this ggplot that they wanted to make, where treatment was on the x-axis and conversion rate was on the y. I said, sure. But remember, the experiment data is biased. You need to weight it. And instead of munging and doing all these weighted means in dplyr, you can just use marginaleffects. I'm going to use the plot_predictions() function here, which takes as its argument my model. To the by argument, I'm passing what I want on the x-axis; here, I'm going to plot treatment. And I'm weighting the data in the same way that I did in the avg_comparisons() call. And you can see that I've plotted the groups on the x-axis, with conversion rate on the y, plus confidence intervals. Always good to communicate uncertainty.
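That plot might be produced like this; a sketch assuming the same hypothetical names as elsewhere (`fit`, `population`, `wt`), and assuming the weighting arguments mirror the avg_comparisons() call:

```r
library(marginaleffects)
library(ggplot2)

plot_predictions(
  fit,
  by      = "treatment",  # what goes on the x-axis
  newdata = population,   # data frame carrying the population weights
  wts     = "wt"          # column containing those weights
)
```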

So my colleague was like, whoa, that was so easy. You've saved me so much boilerplate. I said, yeah, and it gets even better. plot_predictions() returns a ggplot object. So if you've got a theme that you really like, or a company theme that you have to follow, you can just tack that on to the end of the plot_predictions() call and treat it like any other ggplot object. You can even map other variables to other aesthetics. For example, if I want to plot device type as well, I can just pass that to the by argument as the second element of a character vector. Now I've got both treatment and control, and I've mapped device type to the color aesthetic.
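Sketching that: `theme_minimal()` stands in for whatever house theme you have to follow, and `mode` is again a hypothetical name for the device-type column:

```r
library(marginaleffects)
library(ggplot2)

# Treatment on the x-axis, device type mapped to color,
# plus whatever ggplot2 theme you normally use.
plot_predictions(fit, by = c("treatment", "mode")) +
  theme_minimal()
```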

My colleague was like, that's amazing. But now I want to visualize the lift. I don't just want the conversion rates. And I said, you should try the plot_comparisons() function. And now we're plotting the lift for desktop and mobile, similarly to how we computed it using avg_comparisons(). You can see that this function's signature is almost identical to avg_comparisons().
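A sketch of that call, redefining the same lift function so the snippet stands alone; `fit` and `mode` are hypothetical names as before:

```r
library(marginaleffects)

lift <- function(hi, lo) (mean(hi) - mean(lo)) / mean(lo)

# Same signature shape as avg_comparisons(), but returns a ggplot
plot_comparisons(
  fit,
  variables  = "treatment",
  comparison = lift,
  by         = "mode"
)
```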

So what I wanted my colleague to take away from this is that you can make the plots you would have otherwise made, with less code, using marginaleffects. These plot functions return ggplot2 objects, so if you've got a specific theme that you like, you can just tack it on in the way you typically would. And the signatures for these plot functions look very similar to avg_comparisons(), so once you master one, you've mastered them all.

The true story behind the example

Now, I told you that this was a true-ish story, and I think I owe it to you to tell you what was true and what was an exaggeration. The truth is, there was no colleague. I was the colleague. This is my story. And this is a DM from that bird website, where I'm DMing somebody called Noah, and you can see that this is exactly what I asked. I said, I ran this experiment. I've got some heterogeneity of treatment effect. I fit this model. But my stakeholders want something called relative risk; it's simple, and it's very comparable to lift. I want to use marginaleffects. How do I do this? And is this even the right tool? So I would not be up here without the help of the authors of marginaleffects and everybody in the stats Twitter community. Thank you to them.

And there is so much that I couldn't fit into a 20-minute talk. So my call to action for all of you is to go to marginaleffects.com, where there are tons of vignettes, docs, and memes, some of which I wrote. And I'm talking about the memes, not the vignettes or the docs; I didn't write any of those. Seriously, this library is like a godsend to me. I bet that when we all open RStudio, we do two things: we load the tidyverse and we load in data. And honestly, the next thing that I do is load in marginaleffects. It's that useful to me, and it does way more than I've said here. So please go to marginaleffects.com and see what this library can do. And that brings me to the end of my talk. Thank you so much for coming and listening to me, and I'll see you next time.

It means a lot to me, and I'll see you all in 355.23 sols. That's about a year. So thanks so much.