Ellis Hughes - Be Kind, Rewind
Transcript
This transcript was generated automatically and may contain errors.
Thank you, everybody. My name is Ellis Hughes. I'm really excited to be here. I am the Associate Director of Data Science at GSK. I also run a screencast called TidyX, where me and my co-host Patrick Ward explain how our code works. If that's something that sounds interesting to you, we have over 180 episodes on YouTube. So you should go check that out, where we do talk a little bit more about what I'm going to be talking about today.
So before I go any further, I do want to say this presentation is my own opinion and not that of my organization. I need to get that out there. All right, so I know it's been a really long day, two days of conferences. So let's just take a second and just stretch a little bit, right? It's been a lot. Everyone stretch. Cool, because I'm going to take a little bit of time to tell you a bit of a story.
The 90s Blockbuster experience
So let's talk about the 90s Friday night starter kit, right? I don't know if you remember, but at least my Friday night starter kit was going to get pizza. We went to the local Papa John's and picked up a pizza, not necessarily the best, but you know, it was a solid pizza. It was the 90s. I was a kid. It didn't matter. Our family would go and sit in front of our old tube TV. You know, the one that made the weird noise when nothing was playing. It had that static going. It was deeper than it was wide, about 25 inches across, and it was probably about 30 inches deep. And that was really big for the day.
And then there were the Blockbuster movies that you'd picked up earlier that evening with your parents. You had to share with your brothers, or at least I had to share with my brothers, the two movies that we were picking, because that was my Friday night. This is actually a picture of the Blockbuster that I used to go to over in Bellevue, across the water here, when I was a little kid. I still have memories of that place: the Nerds Ropes hanging behind the folks who were checking out the movies, dropping off the clamshells that you'd put into the slot to return them, and the smell of that building. I don't know why, but I still remember that smell.
And a part of this was also the fact that when you went and got one of those DVDs, or excuse me, VHS tapes, you really hoped that whoever had taken it before had taken the time to rewind it. Otherwise you'd plug it into your VHS player and hit play, and all of a sudden the climax of the movie, the big reveal, was playing right then. You saw the ending of your movie before you saw the beginning, and you didn't know what was going on. And you had to wait those, you know, five minutes, because they weren't fast, to rewind it all the way to the very beginning and watch everything. But you'd already seen the end. And for those that are too young to know, these were actual physical tapes that needed to spin backwards. It's not like today, where fast forward or rewind is just moving along a screen; the tape physically had to roll back.
So the idea was that the value of this was that people could see the story and understand all of it before they saw the very end, before they saw the reveal or the climactic part of the movie. In 2004, Blockbuster was going gangbusters. Let's be real here, it was growing like crazy. It had over 9,000 stores across the US, or excuse me, across the world, with over 6,000 in the US, and 65 million subscribers. So people were paying to be able to rent movies for a short period of time, and Blockbuster was making money hand over fist.
But that was not meant to be. As we all know, Blockbuster is not around anymore. They ended up having to declare bankruptcy in 2011. And I believe the last Blockbuster is officially closed in Bend, Oregon. Blockbuster had the opportunity at one point to buy Netflix for $50 million. I think we all agree that would have been a good deal now, but at the time they were like, no, it's okay. We think we understand our market. We think we understand what's going on here. Go away. But then, you know, in 2004, Netflix started shipping out the DVDs, people eventually started moving over to streaming, and Blockbuster just didn't keep up with the times, and they died.
Storytelling as a data science responsibility
But storytelling isn't like that. Storytelling has been around for a really long time, from the beginning of time. We've recorded things on cave walls. We've told each other stories to pass information, to pass knowledge to one another. Right? How do you think we learn morals? Why do you think we read books? It's to learn stories, to learn new things, and to be entertained along the way. This has been part of the human experience forever.
Data scientists have a responsibility then to use stories to impart knowledge to our audiences, to our stakeholders, to whoever we're presenting to. And we have lots of tools to help us along the way. We have tools for data manipulation with tidyverse and with tidymodels to help us make sense of the data and create that narrative. We have tools like ggplot and gt to help us view and visualize the results and explain them to new people. And so we can use these tools to create fantastic things that people can understand and read and take in piece by piece.
This is a visualization that won the 2020 visualization of the year. It looks like it's showing up okay on the projection here. But there's a lot of different elements here. So as a person looks through this visualization, you can see the different ratings of coffee of different countries, how they scale up or scale down, how some countries have wide variations, some have lower variations. And Cedric adds a lot of different layers to this visualization. So you can learn and understand and get more information the longer you look at it. Or you can take a quick glance at it and get, okay, that's the top country. Cool. Moving on.
But Cedric doesn't just release this sort of plot when he creates a visualization. He also creates a how-to, how he made that visualization. And so what's really powerful with this is we're able to see Cedric's exploration of the data, where it's not just him throwing up that final plot. Granted, there's a lot to it. But here we can see how Cedric is exploring the data, visualizing, understanding it better as he's going along. And we can follow along with him as he's going through this process. And we can potentially learn a little bit more about the visualization by following along in this process.
Now, at the time of the creation of this visualization, he, you know, had a bunch of code here. And for every single plot that he made, he ran ggsave. Every single time. So all those visualizations, there were probably 100 of them. He ran ggsave for each one of those. When I found out about that, I was like, what are you doing, man? That's so much extra work. I bet you there's a way for us to make this better. Georgios also creates wonderful visualizations. And does the how-to as well. He also used to use ggsave. Again, when I found out: man, that's so much work. Let's make a package to make this easier.
Introducing the camcorder package
So we created the camcorder package to make it much easier for people to record visualizations as they're creating them. So they don't have to stop their exploration process to save the plot and put it away for potential later use. We made it so that it's easy to start up, easy to stop, and that it takes all the plots that you create and puts them into that GIF format that we saw earlier.
So I'm going to take a little bit of time here to show you how that works and how you can use camcorder. So when you're starting, after you've installed camcorder, the way that you initialize camcorder to start recording is incredibly simple: use camcorder's gg_record function. Now you can just stop there, but there are a lot of additional arguments you can add to it. First is dir, the directory that you're going to save to. By default, it saves to a temporary directory. Then there's the device you want to be using. In this case, we're using png, but you can use svg, pdf, etc. And then you set the height and width of the figure that you'd like to be seeing. That way camcorder knows that every single plot you create in the future will be saved to that directory, in this situation as a png with a height of 7 inches and a width of 10 inches.
You can also set basically any argument that you use in ggsave in gg_record. And then from then on, every single ggplot or patchwork that you create will be saved and shown in a viewer where you can zoom in and view it in exactly the dimensions that you've identified and the resolution that you've used, and iterate and see it in real time. You don't have to call ggsave, you don't have to do any additional work; it saves it in the moment and shows it to you in your viewer. And so it creates a nice way for you to iterate and see what's going on exactly as you might be saving it because, well, functionally it is.
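As a rough illustration of the setup just described, the following is a minimal sketch rather than code from the talk. The folder name, the mtcars scatter plot, and the dpi value are assumptions chosen for the example; the dir, device, width, and height arguments are the ones named above.

```r
library(ggplot2)
library(camcorder)

# Hypothetical output folder; if dir is left out, camcorder uses a temporary directory
rec_dir <- file.path(tempdir(), "camcorder-demo")
dir.create(rec_dir, showWarnings = FALSE)

# Start recording: each ggplot printed from now on is also saved to rec_dir
gg_record(
  dir    = rec_dir,
  device = "png",   # other devices such as "svg" or "pdf" can be used instead
  width  = 10,
  height = 7,
  units  = "in",
  dpi    = 300      # assumed value, used the way ggsave() would use it
)

# Stand-in plot built from a dataset that ships with R
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point()

# print() is written out so the sketch also works when sourced as a script;
# at the console, typing p on its own has the same effect
print(p)

gg_stop_recording()
```

After this runs, the folder in rec_dir should hold one png at the requested size, which is the file the viewer displays.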
Now, if you decide that the height and width of the figure, or the resolution of the figure, isn't what you want to be using, you can use the gg_resize_film function. This will take that same plot that you have and re-run it with a new resolution, or a new height and width, and you can change it as many times as you want. You can go back to the original one, you can update it, but every single time you run gg_resize_film, it'll reformat that figure that you have in the height, width, and resolution that you've given it.
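A resize step might look like the sketch below. The folder name, plot, and sizes are assumptions for illustration. The package documentation describes gg_resize_film as changing the settings for plots recorded from that point on, so the sketch prints the plot again explicitly and does not rely on the previous plot being redrawn automatically.

```r
library(ggplot2)
library(camcorder)

rec_dir <- file.path(tempdir(), "camcorder-resize")  # hypothetical folder
dir.create(rec_dir, showWarnings = FALSE)

gg_record(dir = rec_dir, device = "png",
          width = 10, height = 7, units = "in", dpi = 300)

p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point()
print(p)   # recorded at 10 x 7 inches

# Switch the recording to a smaller figure at a different resolution
gg_resize_film(height = 4, width = 6, units = "in", dpi = 350)
print(p)   # recorded again with the new settings

gg_stop_recording()
```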
Now if you're done recording for your session, you can call gg_stop_recording and it'll stop recording. It won't change anything else, you can go back to working however you want to, and it'll remove the bindings that it's got in your session. So it's an easy way to start and stop recording to that temporary directory.
So here, let's go through how you might create a Blockbuster stores over time animation. So you first start gg_record, saving to a directory, using png, and a constant width and height. I'm going to iterate in a for loop, but because I'm using a for loop, I have to use a print statement, right? If you're using ggplot in a for loop, it doesn't automatically print out. And here you can see that I'm running from 1999 to 2011, and it's showing every single plot in the viewer as it goes. So we can let that run for a second, it's almost done.
And then when I want to play it back as a GIF, like I showed you earlier, I use the gg_playback function. So you set the name of the GIF that you want to be saving it to, the frame duration, so how long each plot should be shown, how long the first and last images should be shown, whether you want to show that last image first, and what you want the background to be. And from that moment, it'll take all the plots that you created, turn them into a GIF, and open it for you right then. And so you can go back and review all the plots that you've created up to that point.
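The record, loop, and playback steps can be sketched end to end as follows. This is not the code from the talk: the Blockbuster store counts are not part of this transcript, so ggplot2's built-in economics data stands in for them, and the folder name, file name, and durations are assumptions. The gg_playback argument names follow the package documentation as best recalled and are worth checking against ?gg_playback; writing the GIF also needs the gifski package.

```r
library(ggplot2)
library(camcorder)

rec_dir  <- file.path(tempdir(), "camcorder-loop")   # hypothetical folder
dir.create(rec_dir, showWarnings = FALSE)
gif_path <- file.path(rec_dir, "over_time.gif")      # hypothetical file name

gg_record(dir = rec_dir, device = "png",
          width = 10, height = 7, units = "in", dpi = 150)

for (yr in 1999:2011) {
  # Stand-in data: everything in ggplot2::economics up to the end of this year
  shown <- subset(economics, date <= as.Date(paste0(yr, "-12-31")))

  p <- ggplot(shown, aes(date, unemploy)) +
    geom_line() +
    labs(title = paste("Through", yr))

  # Nothing auto-prints inside a for loop, so print() is what triggers the recording
  print(p)
}

gg_playback(
  name                 = gif_path,
  first_image_duration = 4,      # how long the first frame is held
  last_image_duration  = 12,     # how long the last frame is held
  frame_duration       = 0.5,    # how long every other frame is held
  last_as_first        = FALSE,  # do not lead with the final plot
  background           = "white",
  playback             = FALSE   # set to TRUE to open the GIF once it is written
)

gg_stop_recording()
```

Keeping the width and height constant across the loop, as the talk does, means every frame has the same dimensions when the GIF is stitched together.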
So that's how I created these two different plots that I showed you. Actually, all the plots that I showed you today were created via camcorder. This is my exploration of figuring out how I wanted to show the Blockbuster stores in the U.S. and patchworking together the line plot of the total stores that Blockbuster had over the years. And so that's my exploration. And this is the for loop that we just showed you, showing how Blockbuster rose up until 2004, and then crashed afterwards when they didn't buy Netflix and didn't adapt to the times.
Additional uses of camcorder
So that's just one application of camcorder, where we're iterating and showing ourselves how we explored the data, how we saw what's going on, and then just a for loop there. But camcorder, I think, has some additional uses as well. So we have animations. I actually was happy to see during today's keynote that there was a reference to the Gapminder dataset. So I used the Gapminder dataset here, iterated over every five years, and showed life expectancy versus GDP. And you can show that over time. And so it helps people see not just the one snapshot in time; you can make a video of this animation across time. Now it's not quite the same as gganimate, but it's kind of a quick way to be able to create these animations. And you can iterate over any sort of value. So it doesn't have to be a time value. It could be countries. It could be anything else.
Next, you can use camcorder to run comparisons. So this is the ChickWeight dataset that's included in the datasets package in base R. And here we're iterating over the four different diets the chicks were given. And you can pretty easily see that diets three and four resulted in the highest chick weights at the very end. So then, depending on your interpretation and understanding of the data, you can decide whether diet three or diet four is actually the best. But it shows them quickly, one after the other, and makes it easy for somebody to understand what the data is showing us without having to flip between two different plots or try to decipher the different colors. It adds a little bit extra to it, right?
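A comparison recording along these lines could be sketched as below. The plot design (one diet highlighted against the others in grey) and the folder name are assumptions, not the figure shown in the talk; ChickWeight and its weight, Time, Chick, and Diet columns come with base R.

```r
library(ggplot2)
library(camcorder)

rec_dir <- file.path(tempdir(), "camcorder-diets")  # hypothetical folder
dir.create(rec_dir, showWarnings = FALSE)

gg_record(dir = rec_dir, device = "png",
          width = 10, height = 7, units = "in", dpi = 150)

chicks <- as.data.frame(ChickWeight)

for (diet in levels(chicks$Diet)) {
  p <- ggplot(chicks, aes(Time, weight, group = Chick)) +
    geom_line(colour = "grey80") +                   # all chicks, for context
    geom_line(data = subset(chicks, Diet == diet),   # chicks on the current diet
              colour = "firebrick") +
    labs(title = paste("Diet", diet), x = "Days since birth", y = "Weight (g)")

  print(p)  # one recorded frame per diet
}

gg_stop_recording()
```

Because the full dataset is drawn in every frame, the axes stay fixed from one diet to the next, which is what makes the frames comparable when they are played back in sequence.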
And then finally, creatively. So camcorder is just recording your ggplots. And so you can use it to do something like this, where I used flametree by Danielle Navarro, plotted the generative creation of a flametree, and recorded that. And so I think this ended up being a pretty cool and fun visualization that added an additional element to the generative art, letting you also see the generation of it. And so I think that's a fun way to use camcorder.
And then one use that maybe isn't explicitly documented as a feature of camcorder, but one that I think people are finding very useful, comes from the fact that because we are saving every single plot that you generate and showing it back to you, you're seeing exactly the height, width, and resolution of that figure as you're developing it. Because how often have you created a ggplot that looks great in your viewer, but when you go to save it, all the text is skewed and everything looks way off? This helps you solve that problem. Now, this was kind of an off-label usage, but since Cedric wrote about it and how he uses camcorder, I guess we're gonna have to keep it.
Now I flip it back to you. The camcorder is looking at you. These are just a couple of ways that camcorder could be used, but I'm sure there are hundreds of other ways that camcorder might help you in your visualization journey. And so it's up to you to figure out how to use camcorder. These are just examples.
The value of storytelling in data science
So I want you to think back to Blockbuster and how we use stories to tell everything. While we may give a p-value, or some sort of "this is the difference between the two", the real value of our analysis is in telling the story around it: giving our stakeholders the context, having them understand what the actual impact of this analysis is and what the value of the difference is, and causing them to take an action from it. That's as opposed to just understanding, yeah, they're different, but what does that actually mean to them, what should they be feeling, and what should they be doing about it? And by telling a story, they're much more likely to remember it, even if they don't remember the exact reason why. So make it a Blockbuster night. Thank you.
Q&A
First question: can this be used in any IDE or R environment, or does it depend on a plot that shows up in the RStudio plotting pane? It does not rely on the RStudio plotting pane. It's actually overriding the print method, so I'm sure Hadley and them may not like that, but that's what it's doing. So it should work.
Thanks. Can you embed these GIFs into dashboards or part of HTML? It's just saved as a GIF, so you should be able to use it anywhere you'd use a normal GIF.
Cool, thanks. I have a question for myself. So you said it's overriding the print method. Have you considered using the history that you can get in RStudio? I think we're looking into ways that we can more dynamically pull the code that was run, so that we can record that as well as the plots, and potentially find several additional ways that we could augment this information. So I think we still need to overwrite the print method, but I think we want to figure out additional ways as well.
Great. And can you clarify whether you are overriding the plot code or making a series of plots? Overriding plot code? So we're not overriding the plot code. What we're doing is we're assigning an S3 method to overwrite the print method. I think that's the question. If you want more clarification, come and we can talk more.
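For readers unfamiliar with the mechanism mentioned in these answers, the sketch below shows in base R what assigning an S3 print method looks like in general. It is not camcorder's source code: the demo_plot class and the record_env log are hypothetical names invented for the illustration.

```r
# Hypothetical place to keep a record of everything that gets printed
record_env <- new.env()
record_env$log <- character()

# An S3 print method for the (hypothetical) class "demo_plot":
# it records the object first, then displays it
print.demo_plot <- function(x, ...) {
  record_env$log <- c(record_env$log, x$title)  # side effect: record
  cat("Drawing:", x$title, "\n")                # the usual display step
  invisible(x)
}

p <- structure(list(title = "stores over time"), class = "demo_plot")

# print() dispatches on the class of p, so the method above runs;
# typing p at the console would dispatch the same way
print(p)

record_env$log
```

The object itself is never modified; only what happens when it is printed changes, which is consistent with the answer that the plot code is not being overridden.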
