Matthew Kay | Visualizing distributions and uncertainty using ggdist | RStudio (2022)
Transcript
This transcript was generated automatically and may contain errors.
In this talk, I want to give you two things. One, a systematic way to think about building uncertainty visualizations, and two, a taste for how to do it using ggdist.
ggdist is an R package that extends ggplot2 with three additional families of geometries: slabinterval, dotsinterval, and lineribbon. Each is a composite geometry built on the same underlying framework. To understand how to use them, it's really helpful to step back and think about how to construct uncertainty visualizations systematically.
The idea behind ggdist is that you first derive an uncertainty distribution. I don't really care whether you're a Bayesian or a frequentist; it doesn't matter. The point is that you derive an uncertainty distribution and represent it in some form that ggdist understands: a set of samples, a distributional vector, or one of several other formats that I won't go into here.
Once you've derived the uncertainty distribution, you can map its properties onto visual channels, which ggplot2 calls aesthetics. The density plot on the left, for example, takes the density function and maps it onto the thickness of the slab. You then combine it with an interval, which might take two quantiles and map them onto the xmin and xmax of the interval subgeometry.
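To make "map distributional properties onto aesthetics" concrete, here is a rough sketch in Python using only the standard library. ggdist itself is R, and none of this is ggdist's API. The sketch assumes a standard normal as the uncertainty distribution, an arbitrary evaluation grid, and a thickness rescaled so the peak equals 1. It maps the PDF onto thickness and two quantiles onto xmin and xmax.

```python
from statistics import NormalDist

# Hypothetical uncertainty distribution: a normal with mean 0 and sd 1.
dist = NormalDist(mu=0.0, sigma=1.0)

# Evaluation grid for the slab (an assumption; ggdist chooses its own grid).
xs = [-4 + 8 * i / 200 for i in range(201)]

# Slab: map the density onto "thickness", rescaled so the peak equals 1.
densities = [dist.pdf(x) for x in xs]
peak = max(densities)
thickness = [d / peak for d in densities]

# Interval: map two quantiles onto xmin and xmax (a 95% central interval).
xmin, xmax = dist.inv_cdf(0.025), dist.inv_cdf(0.975)
print(f"95% interval: [{xmin:.2f}, {xmax:.2f}]")
```

The slab and the interval are computed from the same distribution object; only the property extracted (density versus quantiles) and the channel it drives differ.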
How ggdist thinks about uncertainty visualization
And this is exactly how ggdist thinks about uncertainty visualization. You use stat_slabinterval and say, "I want to map the PDF, the density function, onto the thickness aesthetic." From there you can vary other properties, such as the orientation of the slab relative to the interval. You might decide you want the CDF instead of the PDF, or a gradient plot, where the density is mapped onto alpha instead of onto the thickness of the slab.
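The point that a density plot, a CDF plot, and a gradient plot differ only in which computed variable feeds which aesthetic can be sketched in Python. This is an illustration of the idea, not ggdist code: in ggdist the mapping is written in R (roughly, an aesthetic such as thickness set to after_stat(pdf); check the ggdist documentation for exact syntax). The distribution, grid, and the helper name `scaled` are hypothetical.

```python
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)  # hypothetical distribution
xs = [-3 + 6 * i / 60 for i in range(61)]

# Compute each distributional property once per grid point.
rows = [{"x": x, "pdf": dist.pdf(x), "cdf": dist.cdf(x)} for x in xs]

def scaled(rows, variable):
    """Rescale a computed variable so its maximum is 1."""
    values = [row[variable] for row in rows]
    top = max(values)
    return [v / top for v in values]

# Each plot type is just a different (aesthetic, variable) pairing.
mappings = {
    "density slab": ("thickness", "pdf"),
    "CDF slab": ("thickness", "cdf"),
    "gradient": ("alpha", "pdf"),
}
layers = {
    name: {aesthetic: scaled(rows, variable)}
    for name, (aesthetic, variable) in mappings.items()
}
```

Nothing about the distribution changes between the three layers; only the pairing in `mappings` does.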
In fact, as of R 4.1, ggdist supports the true gradients that were added to the graphics engine. So if you output to, say, SVG or PDF, the gradients stay high quality no matter how far you zoom in.
We can create a whole variety of geometries from this same slabinterval family. ggdist provides a bunch of shortcut geometries, but they're all derived from the same core geometry. Of course, dotsinterval and lineribbon are similarly flexible, and you can do a lot of cool things with them, but I don't have time to get into the details of all of those.
On the other hand, I did promise Twitter that this talk would be like a Twitter thread directly into your skull. So for the rest of this talk, I'm just going to jump through a whole bunch of different fun examples. I'm not going to have time to tell you how to do any of them, but hopefully it will whet your appetite and you might go and try it out.
Examples gallery
So let's do this. All right, here's an example of visualizing a Poisson distribution using a distribution vector from the distributional package. ggdist knows it's a discrete distribution, so it draws a histogram instead of a density plot. It also lets you, for example, map the probability onto color, which gives you a nice gradient effect.
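A discrete distribution gets one bar per integer rather than a smooth curve. This Python sketch (not ggdist or distributional code) computes Poisson probabilities and maps them onto a color intensity. The rate of 3 and the cutoff at 12 are hypothetical choices, since the talk does not give them.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events for a Poisson with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 3  # hypothetical rate
support = range(0, 13)
pmf = [poisson_pmf(k, lam) for k in support]

# Discrete distribution: one bar per integer instead of a smooth density.
# Map probability onto color intensity, scaled so the most likely bar is 1.
top = max(pmf)
color_intensity = [p / top for p in pmf]

for k, p, c in zip(support, pmf, color_intensity):
    print(f"k={k:2d}  P={p:.3f}  intensity={c:.2f}")
```

For an integer rate, the two most likely values (here 2 and 3) have equal probability, so both bars get the full color intensity.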
Speaking of gradients, here's a gradient plot. It also shows the dodge position from ggplot2 working with ggdist. You might decide to map the CDF onto the gradient instead; strictly, this one uses the complementary CDF. You can think of it as an infinite number of bar charts, each with infinitesimal opacity, stacked on top of each other.
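The "infinitely many faint bar charts" reading of a complementary-CDF gradient can be checked numerically. This Python sketch (not ggdist code) approximates the bars with 1,000 evenly spaced quantiles of a hypothetical standard normal, each bar running from the bottom of the axis up to its quantile. The fraction of bars covering a height x should match 1 - CDF(x).

```python
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)  # hypothetical distribution

# Stand-in for "many bars": n evenly spaced quantiles of the distribution.
n = 1000
bar_tops = [dist.inv_cdf((i + 0.5) / n) for i in range(n)]

def stacked_opacity(x):
    """Fraction of bars (bottom of axis up to each top) that cover height x."""
    return sum(top > x for top in bar_tops) / n

for x in (-1.0, 0.0, 1.0):
    ccdf = 1 - dist.cdf(x)
    print(f"x={x:+.1f}  stacked={stacked_opacity(x):.3f}  1-CDF={ccdf:.3f}")
```

As n grows, the stacked fraction converges to the complementary CDF, which is why stacking many faint bars produces that gradient.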
You can also map aesthetics at the sub-slab level. Here we're saying there's a particular region of interest, say 0 plus or minus 1.5, and we map that region onto a different color. ggdist subdivides the geometry so that you can map colors onto subparts of it.
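Subdividing a slab by a region of interest amounts to labeling each evaluation point and splitting the outline wherever the label changes. The Python sketch below shows that idea in a simplified form; it is not how ggdist implements it internally. The standard normal, the grid, and the helper name `in_region` are hypothetical.

```python
from itertools import groupby
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)  # hypothetical distribution
xs = [-4 + 8 * i / 80 for i in range(81)]

def in_region(x, center=0.0, half_width=1.5):
    """True when x lies in the region of interest: center plus or minus half_width."""
    return abs(x - center) <= half_width

# Label each grid point, then split the slab into contiguous pieces that
# share a fill; each piece would be drawn as its own polygon.
labels = ["inside" if in_region(x) else "outside" for x in xs]
pieces = [(label, len(list(group))) for label, group in groupby(labels)]
print(pieces)

# Probability mass inside the highlighted region.
mass = dist.cdf(1.5) - dist.cdf(-1.5)
print(f"P(|X| <= 1.5) = {mass:.3f}")
```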
Speaking of subdividing the slab, here we're using the CDF, essentially binning it, to show 50%, 80%, and 95% intervals as the fill color of the slab. This is a nice example of using the PDF and the CDF together to construct a geometry that would be difficult to build if you couldn't use both at once.
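Binning the CDF into interval levels relies on one fact: x lies inside the central interval of width w exactly when |2 * CDF(x) - 1| <= w. This Python sketch applies that test to a hypothetical standard normal. It is not ggdist code; ggdist appears to ship an R helper for this (cut_cdf_qi()), but check its documentation rather than relying on this sketch for its behavior.

```python
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)  # hypothetical distribution
WIDTHS = (0.5, 0.8, 0.95)

def interval_level(x, widths=WIDTHS):
    """Smallest central interval (by width) that contains x, or None."""
    # x lies inside the central interval of width w exactly when
    # (1 - w) / 2 <= CDF(x) <= (1 + w) / 2, i.e. |2 * CDF(x) - 1| <= w.
    distance = abs(2 * dist.cdf(x) - 1)
    for w in sorted(widths):
        if distance <= w:
            return w
    return None  # outside the widest interval

for x in (0.0, 1.0, 1.5, 2.5):
    print(x, interval_level(x))
```

Each slab point's level can then drive its fill color, while the PDF still drives thickness.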
Here's another complex example of multiple functions mapped onto the slab simultaneously. The fill color varies by group, and we're also using the fill_ramp aesthetic, which ggdist provides to independently ramp the color toward white as the mapped value gets closer to 0.
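The effect of a fill ramp can be approximated as a blend between a group's color and white, controlled by a value in [0, 1]. This Python sketch assumes a simple linear blend in RGB space; it is not ggdist's implementation, and the helper name `ramp_toward_white` and the group colors are hypothetical.

```python
def ramp_toward_white(rgb, amount):
    """Blend an (r, g, b) color toward white.

    amount = 1 keeps the original color; amount = 0 gives pure white.
    """
    white = (255, 255, 255)
    return tuple(round(w + (c - w) * amount) for c, w in zip(rgb, white))

group_colors = {"a": (31, 119, 180), "b": (214, 39, 40)}  # hypothetical groups

for name, rgb in group_colors.items():
    print(name, [ramp_toward_white(rgb, t) for t in (0.0, 0.5, 1.0)])
```

Because the group sets the base color and the ramp value sets how far it fades, the two mappings stay independent.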
Speaking of doing crazy things with densities, any probabilist will tell you that if you transform a random variable, you have to correct its density by the derivative of the transformation. Here we have a lognormal distribution. On the raw scale, it looks lognormal. Plotted on a log scale, it should look like a normal distribution, and it does, because ggdist works out the derivative and applies the appropriate correction to the density.
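The correction is the change-of-variables rule: if Y = log(X), then X = exp(Y) and f_Y(y) = f_X(exp(y)) * exp(y). This Python sketch (not ggdist code) checks that a lognormal density, corrected this way, matches the normal density on the log scale, and that the uncorrected version does not. The parameters mu = 0 and sigma = 0.5 are hypothetical.

```python
import math
from statistics import NormalDist

mu, sigma = 0.0, 0.5  # hypothetical lognormal parameters

def lognormal_pdf(x):
    """Density of X = exp(Z), Z ~ Normal(mu, sigma), on the raw scale."""
    return math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (
        x * sigma * math.sqrt(2 * math.pi)
    )

def density_on_log_scale(y):
    """Density of Y = log(X), from the raw-scale density plus the Jacobian.

    X = exp(Y), so f_Y(y) = f_X(exp(y)) * |d exp(y) / dy| = f_X(exp(y)) * exp(y).
    """
    x = math.exp(y)
    return lognormal_pdf(x) * x

normal = NormalDist(mu, sigma)
for y in (-1.0, 0.0, 0.5):
    # Corrected density on the log scale versus the normal density.
    print(y, density_on_log_scale(y), normal.pdf(y))
```

Dropping the `* x` factor would leave a curve that is off by a factor of exp(y), so it would not look normal on the log scale.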
Everyone loves raincloud plots these days, and ggdist makes them incredibly easy to create: we just combine a slab geometry with a dotsinterval geometry. This works because the dotsinterval geometry in ggdist automatically figures out how big the dots need to be in order to fit, which makes it really easy to combine dots with other, more complex plot layouts. Here, for example, is what some people call a logit dot plot.
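Automatic dot sizing can be framed as a search for the largest dot diameter whose tallest stack still fits the available height. This Python sketch is a deliberately simple version of that idea (fixed-width binning over a list of candidate sizes); it is not ggdist's actual layout algorithm, and the function name `fit_dot_size` and its parameters are hypothetical.

```python
import math
from collections import Counter

def fit_dot_size(xs, max_height, candidate_sizes):
    """Return the largest dot diameter whose tallest stack fits in max_height."""
    for size in sorted(candidate_sizes, reverse=True):
        # Bin values into columns one dot wide, then find the tallest column.
        stacks = Counter(math.floor(x / size) for x in xs)
        if max(stacks.values()) * size <= max_height:
            return size
    return None  # nothing fits; a real implementation might search further

samples = [0.0, 0.1, 0.2, 0.3]
print(fit_dot_size(samples, max_height=1.0, candidate_sizes=[1.0, 0.5, 0.25]))
```

Shrinking the dots both lowers each dot's height and spreads values across more columns, which is why a smaller size eventually fits.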
Anyway, I hope that I've convinced you that there's a lot of cool stuff you can do with ggdist. I haven't even really scratched the surface here. So please give ggdist a try. And for more of the examples from this talk, check out the URL below.
