What is Posit Team?

video

Apr 2, 2024

20:06

Transcript

This transcript was generated automatically and may contain errors.

At Posit, we want to be the open source data science company. As a public benefit corporation, we do more than just sell software our users love. We aim to make software for data science and scientific and technical computing available to everyone, regardless of their ability to pay.

We believe that the best way to create data science and scientific discoveries that are reliable and reproducible is through code. So our focus is helping people who write code generate value in their organizations more efficiently.

We take a holistic approach to this goal, innovating on free and open source software and making it easier for individual data professionals to create value in the R and Python programming languages.

At the same time, we want our professional tools to be the standard for writing and sharing open source data science. We are relentlessly focused on creating the best ways for people to create, deploy, and use core open source data science capabilities at scale.

As you'll see in the rest of this demo, that means creating the best environment for people to develop and share content in R and Python.

We believe that means not trying to reinvent the wheel, but instead building first-class business and technical integrations with BI tools, MLOps frameworks, generative AI, data stores, and more.

To get started, let's first introduce you to Posit Team.

Introducing Posit Team

Posit Team is a bundled offering of our three professional tools, Posit Workbench, Posit Connect, and Posit Package Manager.

Starting at the top, we have your data scientists and analysts. These are the folks writing code and creating insights using Posit Workbench.

And they can use whatever language they want, including R and Python, and develop in any environment of their choosing, including the RStudio IDE, JupyterLab, Jupyter Notebooks, as well as Visual Studio Code.

For your R developers, they may be creating insights like Shiny applications, pins, R Markdown, or Quarto documents, APIs using the Plumber package, or deploying and monitoring models using Vetiver.

And for your Python developers, they have a home to create interactive web applications using Streamlit, Dash, Bokeh, and Shiny. They can also create documents using Jupyter Notebooks or Quarto, and APIs using Flask and FastAPI. You'll also notice that many of the tools we create here at Posit, including Quarto, Shiny, Pins, and Vetiver, are available to both your R and Python developers.

Now, once the developers create content, they need a way to share it with the people that need to see it, including decision makers, co-workers, clients, or maybe even just friends and family. That's the role of Posit Connect, which is our professional publishing platform. Connect not only makes it easy to share content, but also lets you control access and performance scaling.

These settings are usually in the purview of IT, but Connect puts these controls in the hands of your data scientists.

And finally, we have Posit Package Manager, which helps centralize and organize the great open-source R and Python packages your team uses for data science, machine learning, and more. You can also host internally developed R and Python packages and restrict access to specific packages based on various user-defined criteria, including the presence of known vulnerabilities.

In our eyes, an open-source platform without real package management is not an open-source platform.

Posit Workbench demo

So to start our demo, let's introduce you to Posit Workbench.

After logging into Posit Workbench using your team's preferred authentication method, we land on the Posit Workbench homepage. On the left-hand side, I'll click the New Session button, which brings us to a new window where I can choose my development environment.

Currently, Posit Workbench supports JupyterLab and Jupyter Notebooks, RStudio, and Visual Studio Code.

Depending on your configuration, you can deploy a session on the local Linux machine running Workbench, or if you're working in an HPC environment, developers have the flexibility to choose the compute resources and the image they would like to deploy their session within.

Let's go ahead and open a new RStudio session.

Once inside the RStudio IDE, developers can start coding immediately. Users can easily make connections to external data sources using a variety of methods, including our professional ODBC drivers for many of the most common databases available today, such as Postgres, Snowflake, Redshift, BigQuery, and even Databricks.

We also added seamless authentication pass-through from Posit Workbench to Databricks so that users can easily start, stop, connect, and view details of accessible Databricks clusters. Users can also easily access their Unity Catalog data and models without ever leaving the RStudio IDE.

In this workflow video, a user is leveraging the R package known as sparklyr to access some UFO data stored on Databricks within Posit Workbench. Alternatively, the user could access the data using the provided ODBC driver.

The Quarto document you see in the top left quadrant makes use of the Databricks connection to extract, process, and visualize the UFO data.

Users can also leverage the power of generative AI to assist with their development, including the built-in support for GitHub Copilot within the RStudio IDE on Posit Workbench.

In this workflow, a user is creating a model using a suite of packages called tidymodels, which we've developed here at Posit. As the user writes code, GitHub Copilot takes in cues from the code and comments to provide autocomplete-style suggestions to the developer.

Additionally, developers can access chat-style large language models, such as ChatGPT, directly in RStudio to create the general structure of a modeling pipeline all in one go.

These generative AI capabilities in Posit Workbench are also customizable and extensible, and users can easily write their own add-ons that use self-hosted models.

Once your users have accessed their data, they can begin creating valuable insights to help drive your business forward. These insights can take many shapes, from static reports to interactive web applications.

What you're seeing here in the top-left quadrant is something known as a Shiny application. Shiny is an open-source package we developed here at Posit that allows anyone to create interactive web applications using only R or Python code.

This application was written in R and makes use of the tidymodels packages we discussed before to create different predictive models. We can run this application locally within Posit Workbench to see the final product.

On the left-hand side, we have a drop-down menu to select various model types. After selecting a model, the application executes the modeling process in the background and makes predictions using a simple dataset containing car data.

Here, we are assessing the accuracy of our models by looking at the predicted miles per gallon for various cars on the y-axis versus actual miles per gallon on the x-axis.

Posit Connect demo

While this example application may seem straightforward, it can provide valuable information about the performance of different models. These insights can be of immense value to your team, but only if they are shared with the relevant decision-makers.

So how, then, do we share this application with others? That's where Posit Connect comes into play. There are a variety of methods to publish content to Posit Connect, some of which are as simple as clicking a button.

At the top of the RStudio IDE is a button that will kick off the publishing process. In the subsequent pop-up menu, we first define which files we need to send to Posit Connect. In this example, it's only our Shiny app.R file.

Next, we define the Connect instance we would like to publish to. And finally, we can give our application a name. I'll leave it as Modeling App, and then we click Publish.

Without Posit Team, if a user wanted to share this application with others, the recipients of the app would be required to have R installed, along with all the necessary R packages and the correct versions of those packages.

With Posit Connect, the dependency capturing and environment replication is done automatically. This ensures that the content, once hosted on Posit Connect, runs as intended.
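To make the idea of dependency capturing concrete, here is a minimal Python sketch of what recording an environment can look like. It is not how Posit Connect or its publishing tools do it; the `capture_environment` function is a hypothetical helper that uses only the standard library to list the interpreter version and installed package versions.

```python
import sys
from importlib import metadata


def capture_environment() -> dict:
    """Record the Python version and the versions of installed packages.

    Hypothetical helper for illustration only; publishing tools use their
    own manifest formats.
    """
    packages = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:  # skip distributions with missing or broken metadata
            packages[name.lower()] = dist.version
    return {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "packages": dict(sorted(packages.items())),
    }


snapshot = capture_environment()
print(snapshot["python"], "with", len(snapshot["packages"]), "packages recorded")
```

A server that receives a record like this has what it needs to reinstall the same versions before running the content.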

And here we have the modeling application we developed within Posit Workbench, now hosted on Posit Connect.

As the owner of this content on Posit Connect, I have full control over who has access and how the application behaves. These settings are accessed by clicking the gear icon in the top right corner.

In the Access section, the first thing you'll notice are the sharing options. If you need to ensure that your content remains secure and is only accessible to specific individuals or groups, you can choose the bottom option and then specify the particular users or groups in the text box below.

Similarly, if you want to ensure content is only accessible to individuals with authenticated access to Posit Connect, we can select the middle option.

Finally, to open the content to the public, we can select the top option. Now, the only thing I need to share this content with others is the content URL at the bottom, which can be customized in the text box above.

Once a viewer clicks the URL, they can treat the Shiny application just like any other website and won't need to know anything about R or Shiny.

Again, the primary goal of Posit Connect is to make sharing content as easy as possible with anyone.

Now, not all content hosted on Posit Connect is the same, and in some cases, content may only be viewed by a handful of people, and in other cases, it may be viewed by thousands, potentially all at once.

To help tune and scale your applications and APIs, publishers and administrators have access to runtime settings. These settings can be easily modified to ensure a good user experience.

The scheduling option only applies to static content, things like R Markdown, Jupyter Notebooks, and Quarto documents. We'll discuss these settings later on in the demo.

Next, we come to the Tags setting, which allows teams to establish their own organizational structure for content hosted on Posit Connect.

Finally, we have the Vars setting, which provides developers with the capability to securely incorporate secrets such as passwords, keys, and tokens as environment variables, thereby avoiding the need to embed these sensitive details directly within the code.
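As a rough illustration of reading a secret from an environment variable instead of hard-coding it, consider the sketch below. The variable name `DB_PASSWORD` and the `get_secret` helper are assumptions made for this example, not names that Connect defines.

```python
import os


def get_secret(name: str) -> str:
    """Return a secret from the environment, failing loudly if it is unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# DB_PASSWORD is a hypothetical variable name. On the server it would be
# supplied through the content's environment variable settings; locally it
# may be unset, so the example handles that case instead of crashing.
try:
    password = get_secret("DB_PASSWORD")
    print("Secret loaded from the environment")
except RuntimeError as err:
    print(f"Skipping database connection: {err}")
```

Because the code only refers to the variable by name, the same source can be published, shared, or version-controlled without exposing the value.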

MLOps with Python and Vetiver

So far, we've used the R programming language and various open source tools and packages to create and interact with machine learning models. We are going to switch gears and now use Python within Posit Team to monitor the performance of a model.

At Posit, we've developed an open source package named Vetiver, available in both R and Python. This package aids in the deployment, versioning, and monitoring of machine learning models, a field commonly known as machine learning operations, or MLOps.

Let's use Vetiver to monitor the performance of a model using Python within VS Code running on Posit Workbench. To get started, let's open a new session of VS Code. A major benefit of Posit Workbench is you can have multiple sessions of various IDEs running at the same time.

Within this session of VS Code, I'll first open a Quarto document which contains our Vetiver workflow. Quarto is a Posit-created open source tool for scientific and technical publishing. This Quarto document was written using Python, but Quarto also supports other popular data science languages, including R, Julia, and Observable.

In the Python code, we are creating a model and then measuring the performance of the model as new data is captured. Let's preview the Quarto document and view the performance metrics. We'll get a live preview window that shows the rendered Quarto document in HTML format, thanks to the Quarto VS Code extension.

It's also worth noting that we could view this live preview using the Posit Workbench VS Code extension that provides secure proxying to the local development environment. This is great for viewing documents or testing interactive applications.

Here we have three plots that show three commonly collected metrics for model performance: mean absolute error, mean squared error, and the R-squared value. These valuable metrics, just like our Shiny application, will likely need to be shared with others. So let's publish the Quarto document to Posit Connect.
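For readers unfamiliar with the three metrics named above, the following sketch computes them in plain Python. It is not the Vetiver workflow from the demo, and the miles-per-gallon numbers are made-up illustrative values rather than data from the video.

```python
def mean_absolute_error(actual, predicted):
    """Average size of the errors, ignoring their sign."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average squared error; penalizes large misses more heavily."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Share of the variance in `actual` that the predictions explain."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Illustrative values only (not taken from the demo).
actual_mpg = [21.0, 22.8, 18.7, 30.4]
predicted_mpg = [20.0, 24.0, 19.0, 29.0]

mae = mean_absolute_error(actual_mpg, predicted_mpg)
mse = mean_squared_error(actual_mpg, predicted_mpg)
r2 = r_squared(actual_mpg, predicted_mpg)
print(f"MAE={mae:.3f}  MSE={mse:.3f}  R-squared={r2:.3f}")
```

Lower values are better for the two error metrics, while an R-squared closer to 1 indicates predictions that track the actual values more closely.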

In this example, we are going to use the rsconnect command line interface tool to assist with the dependency capturing and publishing to Posit Connect. In the terminal at the bottom of the screen, I'll type the command rsconnect deploy quarto, followed by the name of our Quarto document, which is mlops.qmd.

I'll hit enter, and then rsconnect will take care of the rest. Just like our Shiny application, my development environment, including my Python version, packages, and package versions, is captured and sent to Connect. Once Connect replicates my environment, it will then deploy the Quarto document. And here it is, now hosted on Posit Connect.

Just like our Shiny application, I have access to the same content settings for our Quarto document, including the sharing settings that we discussed earlier for managing access to content.

However, for reports like Quarto, R Markdown, and Jupyter Notebooks, the source code can be re-executed on a schedule of your choosing. This is great for automating repetitive tasks or streamlining data science workflows.

Since our machine learning model performance metrics are calculated weekly, let's have this Quarto document automatically run every Monday.

I'll first select the schedule tab, and then I have the option to choose my time zone. Next, I'll select the frequency, which I'll change to weekly, and have it run every Monday.
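Connect handles the scheduling itself, but the logic behind "weekly, every Monday" can be sketched in a few lines of Python. The `next_weekly_run` function and the 09:00 run time are assumptions made for illustration, and UTC stands in for whichever time zone is chosen in the schedule settings.

```python
from datetime import datetime, timedelta, timezone


def next_weekly_run(now: datetime, weekday: int = 0, hour: int = 9) -> datetime:
    """Return the next run at `hour`:00 on `weekday` (0 = Monday) after `now`."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    # Move forward to the requested weekday (0 days if today already matches).
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        # Today is the right weekday but the run time has passed.
        candidate += timedelta(days=7)
    return candidate


now = datetime.now(timezone.utc)
print("Next Monday run:", next_weekly_run(now).isoformat())
```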

Finally, I have two options at the bottom. The first option asks to publish the output after it's generated. Since the plots will change every time you run this document, I'll make sure to keep this selected so that Quarto content on Connect is always up to date with the latest metrics.

For the second option, Posit Connect can be set up with an email client that enables you to send emails to users containing a direct link to this content whenever it runs. In the case of Quarto and R Markdown, you also have the option to include plots and tables directly within the email, offering an additional valuable way to share data insights.

Posit Package Manager demo

Open source data science without proper package management can result in a painful coding experience and non-reproducible workflows. That's precisely why we created Posit Package Manager, which is designed to ensure that your open source data science workflows function seamlessly and reliably.

In fact, all the machine learning analyses we've done so far in this demo were using open source packages installed from Posit Package Manager.

Let's start with a quick overview of Posit Package Manager, and we'll discuss many of its amazing features along the way. Once your administrator installs Package Manager, users can access the homepage, where they are immediately greeted with options to select various packages from repositories that your team can customize.

This instance of Package Manager has 11 repositories and is serving over 26,000 packages from CRAN, which is the primary repository for R developers, as well as over half a million PyPI packages, which is the primary repository for Python developers.

Let's take a look at the various repositories in this instance of Package Manager. Here I have five R repositories, including a mirror of CRAN, and five Python repositories, including a mirror of PyPI.

I can also serve packages from Bioconductor, which is a popular repository for bioinformatic analyses. And you'll also notice an internal repository for both R and Python. These are repositories that serve internally developed R and Python packages, making it easier to collaborate and share code with colleagues.

Some of the other repositories configured in this environment include subsets of both CRAN and PyPI, as well as blended repositories, which allow you to combine internally developed packages with subsets of either CRAN or PyPI. And finally, you'll notice a block repository for both R and Python.

This is an added security feature of Posit Package Manager, which allows you to block high-risk and unwanted packages based on various criteria, including specific licenses or known vulnerabilities.

Let's select CRAN and discuss how users would go about installing packages from this repository. Clicking the Setup button will take me to a new page where Package Manager will prompt me with questions regarding my environment.

The first question has me select my operating system. If you're using Posit Workbench, then you would select Linux and then choose the Linux distribution below.

Another feature of Posit Package Manager is that it can serve R and Python packages as pre-compiled Linux binaries, which can dramatically increase the package installation speed for your users.

The next question asks if I want to freeze package versions to enhance reproducibility. Open-source packages tend to evolve quickly, which is great for cutting-edge data science. However, there may be instances where new package versions are not compatible with other packages or code your team has previously developed.

To prevent these conflicts, Posit Package Manager allows you to freeze a repository to a specific date using the calendar below, which helps lock package versions in place.

Finally, after choosing my development environment, I'm presented with a customized repository URL. Reading through the URL, you can see I'm using the eval instance of Posit Package Manager, the CRAN repository, and I will install Linux binaries that are frozen to a specific date. In this example, that date is January 26, 2024.
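The pieces of such a URL can be assembled programmatically. The sketch below is an assumption-laden illustration: the host `packages.example.com` is a placeholder, the `jammy` distribution is an arbitrary choice, and the path layout is modeled on the pattern described above (repository, Linux binaries for a distribution, then the snapshot date), so check the URL your own Package Manager instance generates before relying on it.

```python
from datetime import date


def snapshot_repo_url(base_url: str, repo: str, distro: str, snapshot: date) -> str:
    """Build a repository URL for Linux binaries frozen to a snapshot date.

    Hypothetical helper; the path layout is an assumption for illustration.
    """
    return f"{base_url.rstrip('/')}/{repo}/__linux__/{distro}/{snapshot.isoformat()}"


# Placeholder host and distribution; the date matches the one in the demo.
url = snapshot_repo_url(
    "https://packages.example.com", "cran", "jammy", date(2024, 1, 26)
)
print(url)
```

Pinning every environment to the same dated URL is what keeps package versions identical across developers and deployments.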

Below the repository URL are the setup instructions to help users and administrators configure this repository in the development environment I selected above.

For our previous machine learning operations analysis, we used vetiver, which is an open-source package in both Python and R. Let's take a look at the vetiver R package within this repository.

Once the repository URL has been configured in the developer's environment, all they need to do is run the install.packages function to install vetiver. Below, you can get some more information about vetiver, including the most recent version and any known security vulnerabilities.

The vulnerabilities are reported based on a database of known open-source vulnerabilities called the OSV project. vetiver does not have any known vulnerabilities, but if I navigate to another R package called commonmark, you can see what known vulnerabilities exist.

Administrators can also block packages with known vulnerabilities, which prevents users from installing them into their local development environment.

Summary

In summary, Posit's tools excel in leveraging code to produce highly customized analyses and visually engaging data insights like interactive reports and applications. This customization is what truly sets Posit apart from other analytic and BI platforms.

Moreover, Posit Team significantly contributes to enhanced data and software governance, making it a valuable asset for your team's data-driven endeavors.