Resources

James Blair | R, Python, and Tableau: A Love Triangle | RStudio (2022)

video
Oct 24, 2022
16:14

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

Thank you. It's great to be back with everybody. It's been quite a while since I've seen so many faces, and this is awesome. I love being here, and I love being here with you. Like Tatsu said, the name of my talk is R, Python, and Tableau: A Love Triangle.

Some of you may recognize this goes back to a few years ago. We did a talk, and an advertising campaign, around R and Python: A Love Story, and as I was getting ready for this talk, this title just seemed to write itself, and I was like, this is just too good to pass up on. And then I started preparing the talk, and I was like, love triangles are kind of awful. They don't end well. Nothing works out like this. Trying to make an analogy between a love triangle and these three different tools that I hope to illustrate work really well together wasn't working out for me. So we're going to pivot a little bit and instead draw some comparisons to music today, and I hope that's okay.

The music analogy

So to start, we're going to listen to a couple of audio samples from perhaps one of the most famous pieces of classical music. You'll likely recognize it, even if you don't know the name or the composer or anything like that. We'll see if this works.

So some of you may recognize that's the beginning of Beethoven's Fifth Symphony, some of the most famous measures of classical music ever written. Now we're going to listen to one more version of those same few measures of music, and then compare the two.

Okay, same music, same notes, slightly different. In the first case we just have the piano, and nothing against the piano, I'm a pianist, I have been a pianist since I was five years old, I love the piano. But it can't quite compare to the richness of a full symphony, right? As we add additional musical instruments, as we layer in additional components, the depth of the music grows, and we can hear that in the two examples we just looked at. Today we're going to look at how we can do the same with our analyses. If we start with a basic Tableau dashboard, how can we add additional tools and bring additional technologies to that dashboard to add additional depth of insight to what we're delivering to our business users?

The Tableau dashboard and data

Here's kind of an example dashboard. This is a pretty straightforward data exploration dashboard that allows a user to explore a given data set. I can choose an x-axis, I can choose a y-axis, I can explore the relationship between two features. In this case, this data set is a collection of attributes from different pieces of music. Different genres of music, different songs from those genres, how popular were they, how danceable are they, how likeable are they, what the tempo is, things of that nature. So I can create comparisons between classical music and hip-hop or instrumental music or electronic music if I want to.

And a user can come in here and explore and start to determine their own insights and arrive at conclusions based on what they're looking at in this data. Now, if we wanted to take this to another level and add additional details to it, we might want to look to add some machine learning capabilities here. What if we tried to predict the popularity of a song based on different attributes? And how could we evaluate the effectiveness of different models we might be using?

Building the R extension with plumber

To kind of illustrate this example, I'm actually going to turn to my son. I have a 2-year-old son. He just turned 2 a couple days ago. His name is Forrest. Here he is. This picture was taken just before I left. And as you can tell from this photo, he's a bit random. So we are going to be training a random Forrest model today.

In R, what this might look like is something like this. We'll bring in a package, like the ranger package, and we can write a simple function that takes some input data, X and Y, trains a model, spits out the predicted values based on the training data that we supplied. Now for all the machine learning experts in the audience, this is not a course on machine learning, so please recognize that this is perhaps a bit overly simplistic, but it should help us highlight and understand how these different components can all fit together.

If I wrote a function like this in R, and then I wanted that function to be accessible to other tools, we're going to ignore Tableau for a moment and just say, look, I want some generic other service to have access to this thing that I wrote. The best and easiest way to do that in R is using the plumber package. And here's an example of what that might look like. If I bring in the plumber package, first of all I have to bring the package into my workspace, and then all I need to do is add a few special comments to my script that indicate, hey, this should be an API that's listening for requests from other services. It's very straightforward and easy to use, especially if I'm already used to using R.

Now let's take that and make it a little bit more specific. Let's say that instead of just any generic service out there, I actually want this to be used from within Tableau. I want Tableau to be able to submit data to this function, and I want this function to execute and return the data back to Tableau. Well, I could write my own very unique API to do that, or I could use the plumbertableau package, which handles a lot of the specifics behind the scenes to make this API work specifically with Tableau workbooks and dashboards. As you'll notice, again, I've highlighted the differences here between a traditional plumber API and this new plumbertableau API, and they're very minimal. I bring in the plumbertableau package, I add a few additional comments that indicate what type of data I'm expecting from Tableau and what type of data I'm going to be returning to Tableau, and then the most significant piece is that piece there at the bottom where I indicate that I'm creating a Tableau extension.

Building the Python extension with FastAPI

If we do the same thing on the Python side, we could bring in scikit-learn, we could define a function that creates and trains a random forest and again spits out the predicted outcomes based on the input data that we received. If I want to make this a generically available service, I could use a tool like FastAPI: I bring in the fastapi module, I define an app, and then I define routes on that app that are functions listening for incoming input. Just like we saw on the R side with plumbertableau, there's a new package called fastapitableau that does the same thing. In fact, it's maybe even a little bit easier than what we saw on the plumbertableau side, because all I need to do is change import fastapi to import fastapitableau, and I need to change my app instantiation: instead of saying it's a FastAPI app, I now just say it's a FastAPITableau app. Everything else remains the same.
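As a rough sketch of the kind of Python function the talk describes, the following trains a scikit-learn random forest and, like the talk's deliberately simplistic version, just predicts back onto its own training data. The function name, parameters, and hyperparameters here are illustrative, not the talk's actual slide code.

```python
from sklearn.ensemble import RandomForestRegressor

def predict_popularity(x, y):
    """Train a random forest on (x, y) and return fitted predictions.

    x: list of predictor values (e.g. danceability scores)
    y: list of observed outcomes (e.g. popularity scores)
    """
    # scikit-learn expects a 2-D feature matrix, so wrap each value
    features = [[value] for value in x]
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(features, y)
    # Predict on the training data itself, as in the talk's example
    return model.predict(features).tolist()
```

To expose this to Tableau, the change the talk describes is then minimal: import fastapitableau instead of fastapi, and instantiate a FastAPITableau app instead of a FastAPI app, keeping the route function itself the same.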

Hosting extensions on RStudio Connect

Now once I've developed these extensions, whether I used RStudio Workbench or developed them locally, I need somewhere they can be hosted so that I can access them from within Tableau. The easiest place to do that is RStudio Connect. And here's an example of what it looks like when I've published these two extensions onto RStudio Connect.

The first thing I'm going to look at is I'm going to look at the Python extension that we built. Once I come in here, you'll see that I have a nice user interface that illustrates to Tableau users exactly what they need to do if they want to use this extension. There's also a nice feature for the developer which will allow me to come in here and test the API out, make sure the extension is doing what I want it to do. Does it take data appropriately? Does it return the appropriate response? Does it behave the way that I think it should behave?

This is really nice because as an R or a Python developer, this means I never have to open Tableau. I can start from the beginning and I can go all the way through to a working, functioning extension of Tableau, but I never have to actually open Tableau to verify its behavior.


If we look at the R extension, here's the R extension we published. We see a similar thing. We have a nice interface that lets us know how to use this from within Tableau, and we'll take a look at that implementation in just a moment, and I also have the ability like we saw on the Python side to test and verify the behavior of this particular extension. I can copy output data from somewhere else. I can drop it in here and I can verify that this is, in fact, returning the results that I would expect it to be returning given the function that I defined.

Using extensions from within Tableau

Now if we look at how to do this within Tableau, we've tried to make this as easy as possible for somebody who maybe isn't an R or Python developer but is a Tableau user to come in and use an extension that has been shared on a platform like RStudio Connect. In that case, a user might come to this particular page, highlight and copy the usage section of the documentation, move over into Tableau, and create a new calculated field. They'll give that calculated field some sort of a name, in this case we'll call it random forest or something to that effect, and then paste in the script that they copied from the documentation.

Once this is copied in, the only thing that we need to do is we need to supply the actual names of fields that we want Tableau to supply to this extension. In this case I was very creative and so we have X axis and Y axis, but that allows us to simply identify what the X and Y variables are that we're passing in to our underlying model. Once this is done, I can use the calculated field, I can drag it into my workbook and use it just as I would any other variable or feature within my dataset.
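The pasted calculated field ends up looking something like the sketch below. This is a hypothetical example: the route path and field names are placeholders, not values from the talk, though the SCRIPT_REAL pattern is how Tableau passes fields to an analytics extension.

```
// Hypothetical calculated field, e.g. named "Random Forest"
SCRIPT_REAL(
    "/predict",              // route on the hosted extension (placeholder path)
    [X Axis], [Y Axis]       // fields Tableau supplies to the underlying model
)
```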

Putting it all together

So if we bring all of these different components together now, right, we have this original Tableau dashboard that allows users to explore data, define X and Y axes, filter to specific genres of music, identify relationships between these different features that they might be interested in looking at, and then we have these two new extensions that we've developed. Both are training random forest models, but one is done in Python and one is done in R and they're hosted on RStudio Connect.

If we take a look at all these pieces put together, we might have something that looks like this. Our original dashboard is up at the top where users can still define X and Y axes, investigate relationships between different components within the data, but then down below we've added these two new plots that indicate the residual plots from our underlying random forest models that we trained. On the left we see our random forest model in R, on the right we see our random forest model in Python.

And you'll see, as I interact with this particular dashboard, everything is dynamic. Everything updates as I make changes and adjustments to the dashboard. So if I choose to bring electronic music into the mix, I'll see that reflected first in the plot up above, and then also in the plots below as new data. Once again, this is a real-time interaction: new data is being submitted from Tableau to these extensions, some sort of execution is happening, and then the results are being transmitted right back into Tableau for further analysis.

Extending Tableau's capabilities

Now this use case is a little bit specific, and the use case that we've kind of focused on primarily here is I'm in Tableau, but I need to do something to my data that Tableau can't do. And in many cases that might be like something machine learning or machine learning adjacent. I need to do some sort of statistical procedure or analysis that Tableau just isn't fully equipped to do. And I've had good conversations with people at Tableau that have indicated to me, look, we're not trying to make Tableau a fully featured statistical analysis engine, and so these extensions are the way in which we can broaden the horizons of what Tableau is capable of by allowing it to plug into other tools that might be more effective at certain tasks. And we certainly might all agree that R and Python are two tools very well suited to common statistical and machine learning tasks.

There's another class of problems, though, and that class of problems is I have some data in Tableau, and I have in my mind some sort of visualization that I want to do, but I can't do it, or it's really difficult. And if you've ever worked with Tableau, I find myself in this position pretty frequently where I can imagine some sort of visualization, and then I find myself wishing there was some sort of like ggplot Tableau integration thing.

While we don't quite have that, what we do have is a package called shinytableau. I'm not going to go through all of the details here, I'm going to provide links to the documentation and things that you can explore on your own, but shinytableau allows you to define a custom Shiny application that creates a unique extension for a Tableau workbook. What this looks like is: there's a specific manifest file that you write when you create these extensions. There's some nuance to how this gets put together, and if you're familiar with Shiny, it will feel comfortable and familiar to you, but if you're not quite familiar with Shiny, it's a good idea to start with Shiny first and then look at shinytableau as a second step.

But there's this manifest file that I create that defines the extension and some attributes of it, and when I run that Shiny application, or when I publish it to somewhere like RStudio Connect, what I end up with is this page on the right. This doesn't look like any sort of Shiny application you've likely seen before, because there's nothing really interactive here. Instead, what I have is a prompt to download some sort of file. So what I do is I download that file locally, and then if I'm using Tableau Desktop, I can import that file directly into Tableau Desktop. If I'm using something like Tableau Online, I would need to put that file in a location where I can read it from my Tableau Online or Tableau Server environment.

So if we bring all this together and take it once more from the top, here's what that might look like. Here's the existing dashboard that I have with my R and Python extensions operating here on the bottom. If I want to bring this shinytableau extension in, what I need to do is drag an extension onto my dashboard and navigate to that file that I downloaded locally, so here we'll see that I select the file that was downloaded from that Shiny application. Once that's selected, I then need to configure the extension itself, and these are all components that you can adjust and set up when you define the shinytableau extension. I'll come in here, I'll select the data that I want this extension to be reading from, and then I'll select the values that I want Tableau to pass to this specific extension. In this case, once we've finished, we'll have a nice density plot over here on the right-hand side that shows the density of everything along the Y axis, and I can do the same thing to create a density plot along the X axis as well, which might give me a better understanding of the shape of my data, especially when I have as many data points as I do in this particular visualization.

RStudio Connect as the unified platform

One other thing that's worth mentioning that I haven't brought up to this point is the fact that, again, we have both R and Python extensions operating as part of the same Tableau workbook. RStudio Connect is the only endpoint that supports this idea of both R and Python extensions simultaneously in the same workbook. If you were to use something more traditional like TabPy or Rserve, you would be restricted to only Python execution or only R execution. But given the work that we've done with the folks at Tableau and the work that we've done internally with RStudio Connect, RStudio Connect can host R extensions and Python extensions side by side, and they can be leveraged from the same Tableau environment alongside things like shinytableau extensions as well.


So what does this all mean? Going back to our music analogy, the idea here is we can use the right combination of tools for the job and we can deliver a symphony of insight. More than just a piano solo, we can deliver something that astounds and astonishes our end users. Now, is this necessary every time? It's not. I have a 6-year-old daughter who just had her first piano recital, and it was exactly what it needed to be. She didn't need to be playing the violin and the trumpet and everything all at once. The piano recital was enough. And it's up to us as the data scientists, as the analysts, to decide what level of insight is the appropriate level for the analysis that we're doing.

It's important to understand that we have a whole host of tools. Like JJ alluded to in his talk this morning, we continue to look for ways in which we can participate in the broad scientific computing ecosystem, and that includes tools like Tableau and the integrations that we've done there.

If you're interested in learning more, there's some resources here, there's a GitHub repository that has several different examples of how these different extensions can work with different Tableau workbooks and things like that, and you can learn from those examples. The slides and all the content from this talk will be made available on GitHub shortly after I've kind of finished everything up today. And finally, I just want to say thank you, it's been great being back in person, and if there's any questions, we can take those now. Thank you.