How to mitigate package security risks with Posit Package Manager

Transcript

This transcript was generated automatically and may contain errors.

Hi, everybody. I'm Rachel and I lead our customer marketing here at Posit. It was so nice to see so many of you last week at the conference, and thank you to everybody who joined us both in person and virtually. A lot of our time was focused on last week's conference, as you could imagine. So rather than skipping the session this month, we're just going to do things a little bit differently. At last week's conference, Joe Roberts gave a talk on package security that I thought would be helpful as part of our workflow demos here. I recorded this part because I'm actually on vacation this week, but Ryan and Joe will be hanging out with you all in the chat right here. So if you have any questions today, or maybe follow-up questions from last week, you can ask them in the YouTube chat. And we also have a Slido link where you can ask them anonymously too. And I've shown that Slido link here on the screen. But thank you again for joining us and I'll turn it over to Joe.

Hi, I'm Joe Roberts and I'm a product manager here at Posit. And one of the most common questions I get asked, especially by IT administrators and security teams, is how to make sure the public packages that their data scientists want to use are safe. So today we're going to explore that topic. And we're going to start with an overview of the main public package repositories used by data scientists working in R and Python to understand how they work, and more importantly, how they differ in their approaches to package publishing. We'll also explore some of the more common types of security risks in using public packages. And we'll look at strategies to mitigate those risks using tools like Posit Package Manager.

And so latest isn't always the safest there, and sometimes it's worth it to sacrifice a bit of cutting edge and compatibility in the name of security.

So in Package Manager, we can leverage what we call repository snapshots to easily pin our installation source to a slightly older version of the repositories. We also get the added benefit of reproducibility of our projects by not always getting updates to the packages that might break our existing code. And it's not a perfect solution, but it's one tool that can be used together with these other strategies we're talking about to reduce our risks.
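As a rough illustration of snapshot pinning, the sketch below builds a pip index URL frozen at a given date and the matching install command. The base URL and the `/<repo>/<date>/simple` path layout are assumptions modeled on the public service; the real URL for a given server should be taken from that server's setup page. The helper names are hypothetical.

```python
from datetime import date

# Assumed base URL and path layout for a Package Manager PyPI mirror.
# Check your own server's setup page for the real values.
BASE_URL = "https://packagemanager.posit.co"


def snapshot_index_url(snapshot: date, repo: str = "pypi") -> str:
    """Build a pip index URL frozen at the given snapshot date (hypothetical helper)."""
    return f"{BASE_URL}/{repo}/{snapshot.isoformat()}/simple"


def pip_install_command(package: str, snapshot: date) -> list[str]:
    """Return a pip invocation that installs only from the pinned snapshot."""
    # --index-url replaces the default index, so no newer release can sneak in.
    return ["pip", "install", "--index-url", snapshot_index_url(snapshot), package]


if __name__ == "__main__":
    print(" ".join(pip_install_command("pandas", date(2024, 8, 1))))
```

Because the date is part of the URL, every install against it sees the repository as it was on that day, which is also what gives the reproducibility benefit mentioned above.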

Package confusion: dependency confusion

Finally, I want to talk about a different variant, which we call dependency confusion, that preys particularly on larger teams, especially those who develop and share their own internal packages to supplement their work.

So let's go back into our clean environment here, where we're now part of a larger company that has its own internal packages. Let's say we're at Posit, and we have our internal Posit tools package that we use, along with some public packages from PyPI. We may even be so sophisticated as to have our own internal packages in a PyPI-like repository so that pip can install from there.

So we want to install Posit tools. We add this extra index URL flag pointing to our internal server so that pip can find where our additional packages are. In reality, this extra index URL setting is probably set as part of our system configuration, so we don't have to remember to type it every time we install something, but I include it here because this is exactly what would actually happen in reality. So let's say pip goes out, looks for a Posit tools package on PyPI, doesn't find it, realizes it's not there, so it's probably an internal package, and then searches the extra index URL, finds it on our internal repository, and installs it. I also need pandas, so I run the same command, but this time, it does find it on PyPI and installs it just as expected. So everything's perfect until our malicious actor steps in.

So an evildoer is targeting Posit employees, for example, and maybe guesses that we probably have an internal package named Posit tools that's not available on PyPI. A lot of companies have internal packages like this for connecting to internal resources, databases, or other internal sources, and so this malicious actor creates their own malicious package and gives it a name they think that our company might use internally, like Posit tools, and they publish it on PyPI, probably with a large version number, in this case like 9.0, so it looks newer than anything else.

And now we go back to our user who's trying to, say, upgrade the Posit tools package they have installed, and so they, same as before, have their extra index URL, and pip goes out, and it sees this newer version of Posit tools, not from our internal repository, but available now on PyPI, and it doesn't know any better, assumes that's what you want, and installs it, and you've been exploited.
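The scenario above can be modeled in a few lines. This is a simplified sketch, not pip's actual resolver: it assumes that candidates from the main index and the extra index are pooled together and the highest version wins regardless of where it came from. The package name `posit-tools`, the index names, and the version numbers are all hypothetical.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.2.0' into (1, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def naive_pick(package: str, indexes: dict[str, dict[str, list[str]]]) -> tuple[str, str]:
    """Pool candidates from every index and return (source, version) of the highest."""
    candidates = [
        (parse_version(version), source, version)
        for source, packages in indexes.items()
        for version in packages.get(package, [])
    ]
    if not candidates:
        raise LookupError(f"{package} not found on any index")
    _, source, version = max(candidates)
    return source, version


# Before the attack: posit-tools exists only on the internal index.
indexes = {
    "pypi": {"pandas": ["2.2.2"]},
    "internal": {"posit-tools": ["1.1.0", "1.2.0"]},
}
print(naive_pick("posit-tools", indexes))  # ('internal', '1.2.0')

# The attacker publishes a same-named package on PyPI with a huge version.
indexes["pypi"]["posit-tools"] = ["9.0"]
print(naive_pick("posit-tools", indexes))  # ('pypi', '9.0')
```

Nothing in the selection step looks at which index is trusted, so the inflated version number alone is enough to win.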

So, you know, there are lots of variants of this, not just directly installing packages, but these packages being dependencies of other packages, and all of this can happen in a similar way here. So this one's actually great because Posit Package Manager can completely insulate you from this case using a unified local and public repository. So in that case, we access all of our packages through Package Manager, we put our internal packages in front of the public package repository, and present that to the user as a single repository, taking that decision of which package to install completely out of pip's hands.

So now when we ask pip to install Posit tools from our repository here, again, it's probably been pre-configured, so we don't have to add the Package Manager address directly, but there's only one place that pip knows to pull things from. And so pip goes out and asks Package Manager, I need the Posit tools package, and Package Manager knows, hey, I'm always going to give you the internal one, because I know that the internal package supersedes the public one, and I will never ever serve you the public package, and our case is solved. Similarly, using the same server, I ask for pandas. pandas is not in our internal package source, so Package Manager says, okay, here you go, here's the public one from PyPI. And we're still taking advantage of all of the other security measures that we can also use: the curated package blocking or even repository snapshots can all be used in conjunction with this unified local and public repository. Everyone's happy, and we've solved at least one of many security risks that we have to worry about.
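The precedence rule described here can be sketched as a lookup that consults the internal source first and only falls through to the public mirror when the name is not claimed internally. This is a toy model of the behavior as described in the talk, not Package Manager's implementation; the function, package names, and versions are hypothetical.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.2.0' into (1, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def unified_pick(
    package: str,
    internal: dict[str, list[str]],
    public: dict[str, list[str]],
) -> tuple[str, str]:
    """Return (source, version); an internal name always shadows the public one."""
    if package in internal:
        # The name is claimed internally: public versions are never considered.
        source, versions = "internal", internal[package]
    elif package in public:
        source, versions = "pypi", public[package]
    else:
        raise LookupError(f"{package} not found")
    return source, max(versions, key=parse_version)


internal = {"posit-tools": ["1.1.0", "1.2.0"]}
public = {"pandas": ["2.2.2"], "posit-tools": ["9.0"]}  # 9.0 is the malicious upload

print(unified_pick("posit-tools", internal, public))  # ('internal', '1.2.0')
print(unified_pick("pandas", internal, public))       # ('pypi', '2.2.2')
```

The key design choice is that precedence is decided by package name before any version comparison happens, so an inflated public version number has no effect.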

Summary and Posit Public Package Manager

But in summary, the reality is we can't deny that public packages do present risks, but really understanding those risks gives you the knowledge to manage them. Today, we've talked about some of those strategies, and how tools like Posit Package Manager can help you reduce some of those risks.

And for those of you interested in learning more about Posit Package Manager, I want to make you aware of our free hosted service, Posit Public Package Manager, or P3M. We provide full mirrors of CRAN, Bioconductor, and PyPI that are free to use, including historic snapshots that you can take advantage of, as well as the added benefit of our Posit-built binary CRAN packages to make things easier and faster to install in your R environment. You can find it and explore it today at p3m.dev, and definitely reach out to Posit if you're interested in learning more about the advanced risk management features we talked about today, and how you can bring the power of Posit Package Manager into your own organization. Thank you so much.

Thanks so much, Joe. It's great to be able to dive deeper into Package Manager today. And while we're not jumping over to a live Q&A session here, both Joe and Ryan Johnson are here hanging out in the chat to answer any questions that you have. So we'll leave this chat open for the next 15 minutes or so. You can type your questions into YouTube or use the Slido link for anonymous questions. The short link for that, for Slido, is pos.it slash demo dash questions, which you can see on the screen.

We host these monthly end-to-end workflow demos on the last Wednesday of every month. We'd love to have you join us again. Over the last five months, Ryan Johnson has walked us through workflows for deploying visualizations to stakeholders with Dash and Shiny, shared how to create scheduled and company-branded Quarto docs for redundant reports, and also shown two ways to ensure consistent and up-to-date data in your work, with APIs and also with pins. The links to those previous workflow demos are also in the YouTube description below. But thank you again for joining us today. Ryan and Joe will be here in the chat for questions, but have a great rest of the day, everybody.
