Resources

Gagandeep Singh & Xu Fei | Yes, you can use Python with RStudio Team! | RStudio (2022)

video
Oct 24, 2022
18:23

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

Hi, everyone. I'm Gagandeep Singh, and I'm joined by my colleague, Xu Fei. We are solutions engineers at RStudio, soon to be called Posit. And we are here to answer a question that we get asked a lot, which is, can you use Python with RStudio Team? And the answer is, of course, yes, you can.

Very recently, I had to answer this question, so I'm going to start by sharing a story with you. I'm an Indian national. In order to be here in the U.S. to attend this conference, I need to get a visa. And as part of the immigration process, I have to get an interview with an officer, where they ask you questions like, where are you going? What will you be doing? So I told them that I'm going to this conference, I'm giving a talk about using Python with RStudio. The officer said, oh, I know Python, but I didn't know you could use it with RStudio.

Well, I told him that's exactly what my talk is about. Unfortunately, Officer Johnson could not be here with us today. But if you're in the audience, we're assuming that you're a multilingual data scientist who has access to RStudio products and wants to use Python with them. Or you're an R user who works with a lot of Python colleagues; you may or may not have access to our products, but you're looking for ways to collaborate with them.

The bike share use case

So it is one thing to say that you can use Python in RStudio products. We're here to show you how it works with a use case. We're in the D.C. area right now, and there is an active bike share program outside called Capital Bike Share. We went out and took the picture yesterday, and this is the station right outside of the hotel. As you can see, there are bikes docked into the bike station. When you go to the company's website, you will see a map that shows you all the bike stations. When you click on one of the stations, it will show you how many bikes are available right now from a live data feed.

Now, what it doesn't tell you is how many bikes are going to be available in the future. As we know, bike availability varies a lot by time of day and location, and if we want a smooth trip experience, we want to be able to predict how many bikes will be available. Because there's a live data feed, we can actually grab the data, store it somewhere, and build a predictive model so that we know how many bikes are going to be available in the future.

If you were here at this very conference back in 2020, you may have seen a talk from our colleague Alex that uses a similar data set and does everything with R packages. That project has been running successfully for over two years, so a lot of data has been processed and stored as interoperable data assets, which we'll get back to in a second.

When I joined the company last year from more of a Python data science background, Gagan and I started talking: since we have all this data, can we do something different? Can we do something entirely in Python, using our pro products? We wanted to do this for two reasons. Number one, just because we can. Well, that's not really a good reason. The second reason is that we thought this could be a very common scenario for our pro product customers: a team that works in one language wants to collaborate with a team that works in a different language, using interoperable data assets deployed entirely in the pro product ecosystem.

We were also thinking that the conference was going to be in D.C., so we'd actually get to go to D.C. and ride those bikes. And here we are.

Workflow overview

So let's recap the workflow for the bike prediction use case. As with any other project, it starts with the data source, which in this case is the Capital Bikeshare API, and the data is refreshed every day. The objective is to import this data, transform it, and save it either in a database or in an interoperable data asset like pins, depending on the frequency of the update. Then we consume this data and build a prediction model for the number of bikes at each bike station. Then we move this model to a deployable location where its output, the predicted number of bikes, can be shared with other applications. And speaking of other applications, we have to build a way to share the predictions with our users.

As with the previous talk, this work is already available and running successfully, all in R. So when Xu Fei and I looked at this use case, we realized that we already have access to this data, in the form of the database and the pins that the R ETL process keeps updating. In order to address the needs of a team with a similar workflow, where part of the work is in one language and part is in another, we decided to build the model and the dashboards in Python, all within our RStudio products.

So let's take a look at the application itself. It's developed in Shiny for Python, which you heard about yesterday, and deployed on RStudio Connect, which we'll get into in a second. You see a map of the D.C. area with circles containing numbers; these represent clusters of bike stations. If you click on one, the map zooms in, and eventually you'll see blue circles that represent the stations themselves. The bigger the circle, the more bikes are available, and if you click on one of the circles, you get a graph of the predicted number of bikes available over the next 24 hours.

RStudio Team products

So you've heard of RStudio Team and the RStudio professional products. What are they? Why do we need them? And how does Python fit in here? RStudio Team is actually an umbrella term for three professional products, namely Workbench, Connect, and Package Manager.

As Python data scientists, we need an environment in which to write our project's code. Workbench offers a centralized server-based environment that lets you code in your favorite data science IDEs, such as Jupyter and VS Code.

As your project matures, you may want to start sharing your content and deploy it somewhere, and that's where RStudio Connect comes in. It's designed exactly for that, and it lets you deploy a very wide variety of data science assets, especially interoperable ones such as pins and APIs, reports like Quarto documents and Jupyter notebooks, and interactive dashboards like Plotly Dash, Streamlit, and, of course, our favorite and the latest, Shiny for Python. And if you want to install packages from a centralized location, rather than directly from PyPI, you can use Package Manager for that.

What we're going to show you in the next few minutes is a workflow: developing the model in Workbench using existing interoperable data objects, deploying it to Connect, then coming back to the application itself, making a change, and updating the live Connect deployment.

Demo: model building in Workbench

So let's see the demo. As Xu Fei mentioned, we already have access to the data, so we're going to start with the model building process. In order to start working, I need access to an IDE, so I go to our demo Workbench, and as Xu Fei said, it supports multiple IDEs. A controversial opinion: I like to start my work in notebooks, especially when I'm doing model building and data exploration, so I'm going to choose JupyterLab here and launch a JupyterLab session. I've already launched it, and this is the notebook I'm using to train the model and deploy it.

I'm beginning with the basics: importing all the packages I need, setting up my environment, and providing API keys and database passwords. I've also built some functions for repeatable code that I'll use later, and this is where I make a connection to the database, which stores all the bike data. This database is updated by the R ETL process, so I don't need to do the data cleaning and exploration. I can directly import this data as a data frame in my Python environment, create training and testing data sets, and start building my model.
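A minimal sketch of this step, with an in-memory SQLite database standing in for the real one (the actual connection URL, table name, and columns aren't shown in the talk, so the ones below are made up for illustration):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sqlalchemy import create_engine

# Stand-in for the database that the R ETL process keeps updated.
engine = create_engine("sqlite://")  # in-memory; the real URL is elided

# Seed it with a few rows shaped roughly like bike availability data.
seed = pd.DataFrame({
    "station_id": [31000, 31000, 31001, 31001],
    "hour": [8, 9, 8, 9],
    "bikes_available": [12, 7, 3, 5],
})
seed.to_sql("bike_availability", engine, index=False)

# Import the table directly as a data frame -- no cleaning needed,
# because the R ETL process has already done it.
df = pd.read_sql("SELECT * FROM bike_availability", engine)

# Split into training and testing sets.
train, test = train_test_split(df, test_size=0.25, random_state=42)
```

The point is the handoff: the Python side consumes the database exactly as the R pipeline left it.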

For this work we chose a random forest model; another iteration of this work could build different kinds of models and compare their performance. But let's stick with the random forest for this one. Now that I've built the model, I need to test it, so I create my testing set and compare the results with the actual data.
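A sketch of the fit-and-evaluate step with scikit-learn, using synthetic data in place of the real bike features (the actual feature set isn't shown in the talk):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in: availability loosely tied to the hour of day.
rng = np.random.default_rng(42)
X = pd.DataFrame({
    "station_id": rng.integers(31000, 31010, 500),
    "hour": rng.integers(0, 24, 500),
    "weekday": rng.integers(0, 7, 500),
})
y = 10 + 5 * np.sin(X["hour"] / 24 * 2 * np.pi) + rng.normal(0, 1, 500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Fit the random forest on the training set.
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Compare predictions on the held-out set with the actual values.
preds = model.predict(X_test)
mae = mean_absolute_error(y_test, preds)
```

Swapping in a different regressor here is what the "compare their performance" iteration would look like.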

What I have done here so far is what we do every day as data scientists, right? And maybe you all can do it better than I did. But the next challenge for me is to move this model out of my notebook into a place where other applications can consume the predictions it's making. And this is where I'm going to use a combination of Vetiver and RStudio Connect.

Julia introduced Vetiver in her keynote yesterday, and Isabel gave a great talk about using Vetiver in MLOps, so I'm going to skip explaining Vetiver and show you how I use it in my work. To start, I convert the random forest model I just built into a Vetiver model so that Vetiver can interact with it. Once I've converted it, I pin this Vetiver model to an RStudio Connect board. Yes, this is the same pins package that is available in R, but it's now also available in Python, and Vetiver uses it for model versioning. Once I've pinned the model to RStudio Connect, I deploy it as a FastAPI endpoint on Connect itself, using the built-in Vetiver function deploy_rsconnect. I pass the name of the pin directly to the function, and it converts the model into a FastAPI application and deploys it directly to my RStudio Connect instance. So you can see, in under 10 lines of code, I was able to move this model from my notebook to Connect using Vetiver.

I've done the deployment ahead of time to save us time here. Let's see what it looks like. This is my deployed model, running as a FastAPI on RStudio Connect. I get access to the docs that FastAPI provides: I can see the API URL, see what features it takes, make a sample prediction, and play with it here. And because this API is running on RStudio Connect, I also get all the features that Connect provides: I can decide who to share this API with, change its runtime credentials, and manage it here.

Scheduling model updates

What I've done so far is run the model once on the current state of the database, but the R process updates this database every day, because the number of bikes changes every day, so the data keeps getting refreshed. My next task is to update my model with the latest data so that it always reflects the current data.

For this, I'm again using RStudio Connect, which gives you the ability to deploy a Jupyter notebook and schedule it. I'm going to deploy this notebook using Git-based deployment, one of the many deployment options in RStudio Connect. I'm using Git-based deployment to make sure that the code in my Git repo stays consistent with what is available on RStudio Connect.

To save time, since this is a short talk, I've already deployed this notebook on RStudio Connect, and this is what it looks like: the same notebook I had in my IDE. I'm going to use Connect's scheduling feature to create a schedule that runs this notebook every time the data is updated.

Demo: live deployment of the Shiny app

All right. So far we have seen the process of developing the model in Workbench, deploying it to Connect, and letting it run as a notebook on a regular basis. For this part, I'm going to show you the app itself; we'll make a change in the app, live-deploy it to our Connect server, and see how it works.

Take a look at the app again. This time I want you to pay attention to the color of these circles: they look kind of blueish. Keep that in mind; it's going to be relevant in a second.

I'm going to show you something here. We made this application in Shiny for Python. It's very exciting to use this package, and I learned a lot from it. In fact, I found the documentation and the online examples, those kind of magical WASM tutorials, really helpful. If you want to learn more about it, Winston has a talk coming up right after ours, so definitely stay for that.

What I want to highlight here in the code is the interoperability that really saved us a lot of time. As Gagan mentioned, the pins package is available in both R and Python; this is the Python pins, and I'm reading a pin generated by my colleagues Sam and Alex. That pin is produced by a Quarto document running on a schedule: it takes the bike station information, processes it, stores it in the pin, and updates it every day. So I don't have to do anything; I just read it into a data frame and use that data frame in my application, which is really handy for me. At the same time, I use the Vetiver endpoint to call Gagan's API and predict the number of bikes.

Next, I'm going to show you the update. Now, I mentioned the color, right? I don't really like the dark blue color; it looks a little depressing to me. Gagan, do you have any suggestions?

Yeah, let's make it orange, one of our new colors.

Okay, so let's change it to orange. All right, change made. What I'll do now is deploy to the Connect server. I'm going to run it right now, and I'll explain in a second what it means. You probably can't see everything, so I'm going to explain in general concepts. First, when you deploy to Connect: Connect supports a very large list of content types that you can host and deploy. The content we chose is Shiny, but if you want to use Streamlit or Plotly Dash, those interactive dashboards are totally supported as well.

On the other hand, you need to specify where the Connect server is, so I'm going to go to the app and refresh a little bit. You also have to pay attention to the virtual environment, because in Python it's very important to isolate your environment so that you have sufficient package isolation. When you send the command, it uses the rsconnect-python package, which allows you to programmatically deploy content to Connect from the command line. It wraps your content into a bundle first and uploads the bundle, which includes the application itself and the requirements.txt. Connect then looks for a compatible Python version on the server, uses that version to run the app, and uses the requirements.txt to rebuild the environment.

Each deployment is sandboxed: it's robust against future deployments, it does not impact previous deployments, and it's guaranteed to keep running, which is really handy for us. At the end you can see success, with the little green link here. Let's go back to the app, zoom in a little more, and see the orange circles. If you click on one of them, you'll see the predictions. All right: we made a change, and it's live on Connect again.
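As a rough sketch, the command-line flow just described might look like this with the rsconnect-python CLI. The server URL, nickname, and app directory are placeholders, and registering the server requires a real API key, so this is illustrative rather than runnable as-is:

```shell
# Register the Connect server once under a nickname (URL and key are placeholders).
rsconnect add --server https://connect.example.com --api-key "$CONNECT_API_KEY" --name demo

# Bundle the app directory (app.py plus requirements.txt) and deploy it.
rsconnect deploy shiny ./bike-share-app --name demo --title "Bike Share Predictions"
```

Connect rebuilds the environment from requirements.txt on the server, which is the sandboxing behavior described above.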

Wrap-up and next steps

So, let's take a look back at what we just did. We started with our base ETL jobs, which have been running and processing data on Connect. We built Python models in Workbench using Vetiver, pins, and all the good things, and deployed them to Connect so they keep running. And we used the content deployed on Connect, the API, in our Shiny for Python dashboard to serve our purpose: showing us the predicted number of bikes in the future.

Now, what does it mean for you? If you're already working in Python and you have access to the RStudio Team professional products, you can start using your favorite Python tools today. There's really no need to wait. And if you work with Python colleagues, we just showed you a very simple example, but I hope it can serve as a starting point for collaborating and extending capabilities across Python and R teams, all within what RStudio Team can offer. We definitely hope you can take it to the next level beyond what we did.

So what's next for you after this talk? We introduced a lot about our pro products today. Xu Fei and I, and most of our customer-facing team, are at the lounge right outside the room, so come see us and have a chat. All the assets we built, and all the code behind them, are public. If you have access to our pro products and want to start using Python, there's documentation on enabling Jupyter and VS Code sessions. And as Xu Fei was saying, you can deploy many different content types on Connect, so the deployment guide is also available. We also used some really exciting Python packages that were introduced as part of the conference, and there's great documentation available for those. That's our talk. Thank you so much.