How to standardize access & ensure consistent data in data products with FastAPI & Posit Team
Transcript
This transcript was generated automatically and may contain errors.
Hello everyone, and welcome to this month's Enterprise Community Meetup, where we'll discuss another end-to-end data science workflow using Posit Team. My name is Ryan Johnson, and I'm a Data Science Advisor here at Posit, and I'm joined by a few of my colleagues who will be hanging out in the YouTube chat during the webinar, and we'll stick around afterward to answer any questions you may have about the content we'll go over today.
As a reminder, this is a recurring event on the last Wednesday of every month, and the goal is to provide the data science community with an overview of Posit Team. These demos are open to everyone, and the attendance over the past few months has been awesome, so feel free to share these webinars with your colleagues, friends, or anyone else who is looking to improve their own data science workflows with Posit tools.
Now for this month's topic, we're going to focus on how to create an API in Python using FastAPI, and then show how this API can build bridges between the various steps of your data science workflow.
Overview of Posit Team
Now before we dive into creating a FastAPI, let's quickly review Posit Team. Posit Team is a bundled offering of our three professional data science tools, Posit Workbench, Posit Connect, and Posit Package Manager. Now taking a look at the diagram on your screen, we'll start at the very top with the data analyst, who will be writing code and creating insights using Posit Workbench. They can choose to write code in either R or Python, and they have a variety of IDEs to choose from, including RStudio, JupyterLab, Jupyter Notebooks, and VS Code.
For R developers, they may be creating content such as Shiny applications, Pins, R Markdown, or Plumber APIs. And for Python developers, they may be creating content such as Flask or FastAPI APIs, Streamlit, Shiny, or Dash applications, Pins, or Jupyter Notebooks. Once the content is created, the developers will need a way to easily share that content with the people who need to see it. And that's where Posit Connect comes into play. Connect not only makes your content shareable, but also gives your team tight control over content access.
Finally, we have Posit Package Manager, which does exactly as its name implies, and that's to host, organize, and distribute open source R and Python packages, not only from the community, but also any internally developed R or Python packages.
Why use a FastAPI as a data gateway
For today's workflow, we will use VS Code from within Posit Workbench to develop a FastAPI in Python. And to keep things nice and simple, we're going to design this API to do only one thing, and that's to serve data. We'll then publish this FastAPI to Posit Connect and show how it can serve as a gateway to your data, which can be accessed by multiple pieces of content using Python, R, or any other language or system.
Now, many of you may be like me, and when you started your data science journey, the idea of working with APIs, let alone creating your own API, was so daunting that I honestly tried to avoid working with APIs at all costs. Well, I'm here to tell you that they're not that scary and can actually be used to vastly improve your data science workflows.
Now, the acronym API stands for Application Programming Interface, and APIs essentially allow communication between applications. To help explain what APIs are, I like to think of the commonly used restaurant analogy you see on your screen here. Here we have three customers who have entered a restaurant and would like to order some food that the chef in the kitchen will prepare. To request food, the customers need some way to communicate their order to the kitchen, and rarely, if ever, will the customer interact directly with the chef. So that's where the waiter or waitress comes into play. They take the order from the customer and communicate that order to the chef. The chef prepares the food, which is then delivered back to the customer.
So in this analogy, the waiter or waitress allows for communication between the customers and the chef. Essentially, they are the API.
For data scientists, APIs can be an essential workflow building block that gives other users or applications or systems the ability to interact with the functions you've designed, the models you've trained, or the data you've curated. You also don't need to be a software engineer or computer scientist to create an API, and there are numerous frameworks in both R and Python for creating APIs, including Plumber and Flask. For today's workflow, we're going to focus on just FastAPI, which is one of the most popular frameworks for creating APIs using Python, given its simplicity and performance.
Now, as I previously mentioned, the FastAPI we are going to build today and publish to Posit Connect will essentially serve as a gateway to our data, which can be accessed by a variety of content. But why would you want to do this? Well, there are a few reasons. The first reason is that it simplifies your code. You no longer need to read in and transform the data within each individual piece of content, which helps prevent data duplication and standardizes data access across content.
Secondly, by putting data behind an API, it makes it easily accessible to different content types. For example, you can have a Jupyter Notebook built using Python and a Shiny application built using R access the same data via the same API. And later on in this demo, we'll demonstrate this capability. Lastly, by publishing the FastAPI to Posit Connect, you can control access to your data by adding an authentication layer to your APIs. And this is a great option if your team is working with sensitive data.
The demo data
Now, we are very close to building our FastAPI, but I just want to spend one minute and explain the data we're going to have behind this API. This data is completely made up, and I wanted to make it as simple as possible given our limited time today. Now, this data, it's also relatively small. It's only about 500 rows and has only two columns. The first column is a random letter in the English alphabet, and the second column is a random number from 1 to 20. And that's pretty much it.
As we go through today's demo, I want you to think about where your data lives today. In this workflow, our data is going to live as a local CSV file, but maybe your data lives in a database or a shared network drive, or maybe as a pin on Posit Connect. By using an API to access the data, no matter where it lives, we can standardize how the data is accessed across your various pieces of content and your entire data science team.
Building the FastAPI
Okay, so let's build our FastAPI. On this slide is all the code needed to create a simple data gateway FastAPI. And as you can see, it's less than 20 lines of code. In the first three lines of code, we import the libraries, modules, and functions needed for this workflow: FastAPI itself, the typing module for type hints, and finally, the pandas library for reading in and manipulating our data.
We next create the FastAPI instance by calling FastAPI, which we'll name app in this example. Now, the first step of our API is to read in the data as a data frame using the read_csv function from pandas. We'll then store our data frame in a variable called largeDF.
Next, we define our API endpoint. Now, an endpoint takes in the API request, which is usually a URL, processes it, and sends back a response. For our FastAPI, we are using a GET method for our endpoint, which is routinely used to read data. The forward slash followed by the word data defines the route, and any query parameters appended after /data in the request URL are passed as parameters into the function directly below.
And finally, we add the function itself. This is where all of the logic of your FastAPI lives. In our FastAPI, we define a single function called getFilteredData, which takes a single argument called letters. Letters are captured as a list of strings when the API is queried. It's also an optional query parameter and defaults to None if no letters are provided. We then use the list of letters to filter our data. Reading this code left to right, top to bottom, we start by asking if the query contained any letters. If yes, the data is filtered for those specific letters and returned by the function. If no letters are provided, it simply returns the entire dataset.
Publishing to Posit Connect
So now that we have all the code needed, I'm going to copy it to my clipboard and then navigate over to Posit Workbench. So here's the homepage for Posit Workbench. And you can see that I currently have three separate sessions running. I have a VS Code session, an RStudio session, and a Jupyter Notebook session. So I'm going to first navigate into this VS Code session, which I currently have open in another tab. So I'll click on that.
And once here, you'll notice that I'm currently within a directory I called FastAPIDemo. And I previously created a virtual environment in this directory, which is called .venv. I'm first going to activate this virtual environment. To do that, I'm going to open up a new terminal: click on these three lines in the top left corner, select Terminal and New Terminal. And once we have our prompt, I want to source the activate script within that virtual environment. So source .venv/bin/activate and hit Enter.
Oops, spelled source wrong. Let's go ahead and fix that. And there we go. I now have my virtual environment activated. This is typically best practice, and it ensures that the work we do here is isolated from any other project currently on Posit Workbench.
So let's go ahead and create a new Python file by clicking the New File button, which you can find again clicking on these three lines, File, New File. And I'm going to call this main.py and hit Enter. We're going to place it in our current working directory and hit OK. You can see it show up here on the File Explorer. And now we have our file, which is completely empty. And let's go ahead and paste in that FastAPI.
So now that it's been created, let's go ahead and show you how to publish this FastAPI to Posit Connect using the rsconnect Python package, which I previously installed into this environment. So the rsconnect Python package, it provides a command line interface, so you can use it within the terminal down here for deploying content to Posit Connect. So I'm going to type out the command here in the command line, and I'm going to explain each step along the way.
So we first need to, I'll make my screen a little bit bigger here. We first need to tell rsconnect to deploy a FastAPI. Next, we need to indicate which Posit Connect instance we will be deploying to. Now, I previously saved this Connect instance into my environment and gave it the name pctprod, so all we need to do is pass it with -n pctprod. And then finally, we just need to provide the name and location of the FastAPI. This main file lives in my current working directory, which we can indicate with a period; then we just have to provide the name of the file, hit Enter, and that's pretty much it.
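The two terminal steps described above might look like this; the server nickname pctprod and the main.py entry point are from the demo, and the exact flags are a sketch of the rsconnect-python CLI rather than a verbatim capture of the screen:

```shell
# Activate the project's virtual environment first
source .venv/bin/activate

# Deploy the FastAPI in the current directory (".") to the Connect
# server previously saved under the nickname "pctprod"
rsconnect deploy fastapi -n pctprod .
```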
So this is going to kick off the publishing process. The rsconnect Python package is essentially taking a snapshot of my environment, including which packages are in my environment, their versions, and which Python version I'm using. It sends that information from Posit Workbench to Posit Connect, and then it deploys this FastAPI. And so we'll just give this a few more seconds to publish.
And there we go. So we're all done, and we get these two links down here, and so I'm going to go ahead and click on the dashboard content, and that will navigate us directly to Posit Connect.
Interacting with the API on Posit Connect
Now, once it's deployed, we can interact with this FastAPI on Connect using the Swagger interface that you see on your screen. Our API only has one GET endpoint, and we can try querying it using various letters in the alphabet and see the resulting API response. So let's try querying this FastAPI to extract all of the A rows in the dataset. I'll select GET right here, and we'll try it out. And here we can add letters. So I'll add an A here, and then we can hit execute. Once we hit execute, the Swagger interface shows you the curl command that was run to ping the API, the request URL, which is right here, and then if we scroll down, we can see the response, which is in JSON format, where each row of the data is an object in the JSON response.
Let's try a few different combinations just to make sure that the API is behaving correctly. So let me come back up here to the top, and we'll add an additional string item. So maybe we'll do, let's try T, and we'll also try M, hit execute, and we can see the curl response, the request URL, and here we can see the response, where it's just Ts and Ms.
Now, a major benefit of hosting this API on Posit Connect is that you can manage who has access to it, which is shown right over here. If I select any of the options that require authentication, including this bottom option for specific users or groups, or the middle option, all users (login required), then users of the API would need to provide an API key in order to leverage this API. To generate that API key, you can log into Connect, click on your name, and generate those keys right over here.
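From a client's side, supplying that key is a one-line change: Posit Connect accepts API keys in the request's Authorization header. A minimal sketch in Python, where the URL and key value are hypothetical placeholders:

```python
import requests

# Hypothetical values; generate a real key from your Connect profile page
# and use your deployed API's actual URL
api_url = "https://connect.example.com/content/my-fastapi/data"
api_key = "YOUR_CONNECT_API_KEY"

# Posit Connect reads the API key from the Authorization header
headers = {"Authorization": f"Key {api_key}"}

# The authenticated request would then look like:
# response = requests.get(api_url, params={"letters": ["A"]}, headers=headers)
# response.raise_for_status()
```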
Querying the API from Python and R
So now that we have the FastAPI hosted on Connect, let's use it as a gateway for multiple pieces of content using both R and Python. And let's actually start with Python in a Jupyter notebook. So let's come back to Posit Workbench, so we have our three sessions currently running, and I'm going to open up this Jupyter notebook session, which I have open in a separate tab here, and here's a Jupyter notebook that I created.
So in this notebook, in this first code chunk here, we have a few packages and libraries we need for this analysis, specifically the requests library, which is going to be important for pinging our API. So let's run this code chunk. In the next code chunk right down here, we're going to use the requests library to submit a query to our API. You can see the request URL right here. And we're going to feed in a few parameters, which are essentially going to extract data where the letters column is equal to F, G, or R. So let's go ahead and run this.
We then take that JSON response and convert it into a data frame using the pandas package. So we'll run this code chunk, and you can see the output of the query to the API. And then finally, we take the resulting data frame that we received from the API and create a plot to visualize it, using the swarmplot function from the seaborn package. It's also worth noting that I didn't have to manually load and filter the data; the FastAPI does this for us.
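The notebook's query can be sketched as below. The Connect URL is a hypothetical stand-in, so the live network call is left commented out; the testable part shows how requests encodes the repeated letters parameter into the request URL:

```python
import requests

# Hypothetical URL for the API deployed on Posit Connect
api_url = "https://connect.example.com/content/my-fastapi/data"

# requests repeats the "letters" query parameter once per list value,
# which is how FastAPI expects a List[str] parameter to arrive
prepared = requests.Request(
    "GET", api_url, params={"letters": ["F", "G", "R"]}
).prepare()
print(prepared.url)

# The live call in the notebook looks like:
# response = requests.get(api_url, params={"letters": ["F", "G", "R"]})
# df = pd.DataFrame(response.json())  # convert JSON records with pandas
```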
So that's how to query the API from within a Python Jupyter notebook. Let's actually switch gears to the R language and call the API from within a Shiny application. So I'm going to come back here to Posit Workbench, and we'll open up this RStudio session, which, again, I have open in another tab, which is right here.
Now, we won't go in-depth regarding the underlying Shiny code here, but I do want to show how the FastAPI is being queried from within this Shiny application. So here in the top left corner, this is the Shiny application. It's nothing but R code, which is used to create an interactive web application. We're going to use the Shiny package for this, but also note that we're using the jsonlite and httr packages to interact with our API.
So I'm going to scroll down here to the server function and just highlight a few lines of code that are important for pinging our API. This first bit of code right here is where we actually ping that API. We feed it the API URL, which I saved as a variable, and we can provide some query parameters to that API. We then take that response, gather it here, and convert the JSON response into a data frame, which eventually gets converted over to a tibble. We save that as the response data, which gets fed into a ggplot. So let's go ahead and run this Shiny application so you can see how it works.
So the first thing you'll see here is this input box, and this is where we can choose as many letters as we want. So I'll select a few letters. We'll do C, J, M. Let's do P as well. And once we have the letters selected, we can then query the API by clicking on this box. And then once we do that, you can see it returns a plot that's very similar to the plot we saw in a Jupyter notebook. We have letters along the x-axis and the corresponding numbers along the y-axis.
So that's pretty much it. So hopefully this was a good way to show how you can ping the same API and access the same data set from different content types written in two separate languages.
So now that you know a bit more about how APIs, specifically FastAPIs, work, we hope that you can take this knowledge and incorporate it into your own data science workflows. So thanks again, everybody, for joining today, and feel free to stick around if you have any questions about what we talked about today. Otherwise, have a great rest of your day, and we hope to see you again next month.
Thanks so much, Ryan. As Ryan mentioned, we're going to stick around for some Q&A for another 15 minutes or so. YouTube should automatically push you over to that, but I know that wasn't as straightforward last month, so I included the link in the YouTube description, and we'll also copy that over to the chat right now too. As a reminder, there's also a Slido open for anonymous questions at pos.it/demo-questions, and we'll keep that open for the rest of the week as well, and we can add the answers there in Slido too. But thank you again for joining us today, and we'll see you over there in the Q&A.
