Becca Krouse | R Package Assessment: Lessons from Pharma | Posit (2022)
Transcript
This transcript was generated automatically and may contain errors.
My name is Becca Krouse. I am a data scientist at GSK in our Statistics and Data Science Innovation Hub, where my team is working on building support systems and facilitating the adoption of R across biostatistics. Today I'll be talking about R Package Assessment lessons that can be learned from pharma.
In today's world, we have no shortage of products right at our fingertips that can be received in just days or even hours with the click of a mouse. Vendors of all shapes and sizes have the ability to reach consumers across the globe, which means as consumers we have a wide variety of features to choose from, but also quite a range of quality. Sound familiar? Whether you're looking for something you need on Amazon or a package on CRAN, you probably want to find something that suits your needs that you can count on.
So how do you make those decisions? What do you look for? You're probably going to look at indicators that you feel are important. If you think about how you approach online shopping, you probably go to the reviews, the ratings, badges, the name of the seller, and depending on what you're looking for and how much time you have, maybe you dive deep into those reviews. These indicators give you an overall picture of confidence that the product is going to look, feel, and perform as advertised. While the indicators are a little bit different for packages, the overall idea is the same. A little bit of research goes a long way to build that picture of confidence that the package will produce accurate results you can count on.
Now, this idea is critical no matter what your work is, but it's especially critical to those of us in regulated industries like pharma, where it's really important to deliver safe and effective drugs to patients as quickly as we can. To do so, we need to make sure our data, our analyses, our entire pipelines are free of errors and full of integrity. A huge part of our pipelines is the software used to transform the data and produce the results. We really want to do our research, and the regulators expect that we do our due diligence and make defensible choices, meaning we provide sufficient documentation of why we believe our software, our packages, works correctly and produces correct results.
So let's think about that for a minute. How do we really know our results are correct? Well, one thing we can do is compare against a source of truth. That source of truth could be your common sense, another programmer, another programming language, maybe a publication like a textbook. And this is true regardless of whether you're using the latest and greatest R package or a SAS proc that's been around for decades. It's all the same idea, and we want to make sure our results will remain accurate for the many scenarios we haven't encountered yet: new data sets, new studies in the future.
We're in exciting times now, and pharma is making progress full speed ahead towards R being a primary tool used in studies and submission work. Folks across the industry are putting their heads together to align on a common philosophy for what it takes to ensure that our packages are accurate. In the R Validation Hub specifically, this philosophy focuses on package assessment. There are principles and ideas in place, but it's up to individual organizations to implement them and decide how much evidence is enough. At GSK, we're working hard on our own implementation, and we've settled on a set of guiding principles for how we think about package assessment.
I'll present those to you today as the domains of trustworthiness. There are five of them. Each one plays an important role in the overall assessment, and each consists of a collection of indicators. Some are quite easy to measure and quantify; others are more nuanced, harder to measure, and more subjective.
The community domain
So let's step through these now, beginning with the community domain. The purpose of this domain is to get an idea of how broadly a package is used in the community, how much folks are enjoying it, how strong its roots are. Specifically, we can look at something like downloads to get an idea of the volume of use to date; that volume is an indicator of how much the package has been vetted in real-life scenarios, how much it's been stress-tested. We're also interested in the overall trajectory of that use. Is it gaining traction? Is use holding steady, or is it declining for some reason? Maybe another package has come along that supersedes it.
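To make the download checks concrete, here's a minimal sketch using the cranlogs package (assuming it's installed and you have internet access; "teal" is just an illustrative package name):

```r
# Sketch: inspecting download volume and trajectory with cranlogs.
library(cranlogs)

# Volume of use: total downloads over the last month
dl <- cran_downloads(packages = "teal", when = "last-month")
sum(dl$count)

# Trajectory: a longer window lets you eyeball whether use is gaining
# traction, holding steady, or declining
dl_year <- cran_downloads(packages = "teal",
                          from = "2021-01-01", to = "2021-12-31")
plot(dl_year$date, dl_year$count, type = "l",
     xlab = "Date", ylab = "Daily downloads")
```

The raw number matters less than the shape of the curve over time.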
We also want to look at how the community relates to the package and its maintainers: how the community relies on this package, depends on it as part of other software or other materials like books, and also, going the opposite direction, how much the package maintainer has a vested interest in supporting the community and keeping up with its many ever-evolving, moving parts.
Now something to keep in mind is in an industry like pharma, we have some fairly specific needs. Maybe our packages that we are choosing are a little more niche, or we have some internal packages that we're using. So by nature, these are going to have a little bit smaller community user base, and that's okay. We have other domains to look at as well.
The authorship domain
With the authorship domain, we're doing some research to learn who the developers are. What are their backgrounds and experience? And is that experience an appropriate match for the complexity of the tool? We've found this one to be fairly important for statistical packages, where the level of expertise required is quite high and quite specific.
Some things that we look at here are overall reputation, how the community regards the author as an expert in their field and as an R developer, and how much experience they have in both areas. You could look at the other packages that person or those individuals have out there, and any specific qualifications, degrees, or certificates; all of that rounds out our picture of who these authors are and what their backgrounds are. This domain is quite dependent on what you can find, which obviously varies from individual to individual, but it's informative nonetheless.
The documentation domain
Our next domain is the documentation domain, by which I mean functionality documentation. What is the tool supposed to do? What is the intent? And what is the scope, what are the limits of the tool, to help us understand what appropriate, responsible use looks like? This is a prerequisite to being able to prove accuracy: you need to understand in which cases your function is going to produce correct results.
We're all probably pretty familiar with looking around at help files, vignettes, and examples. What we're after is, first, do those things exist, but also, what is their quality? How helpful and clear are they? What types of scenarios are covered? For intended use, we want clear documentation of expected inputs and expected outputs. It's really nice if the edge cases, the limits piece, are covered in there: what might we want to watch out for when using this tool, for instance.
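A quick survey of what documentation a package actually ships with can be done from base R; a sketch, using "dplyr" purely as an example:

```r
# Sketch: surveying a package's documentation with base R tooling.

# List the vignettes the package provides, if any
v <- vignette(package = "dplyr")
v$results[, c("Item", "Title")]

# Count the help pages shipped with the package
# (one Rd entry per documented function or concept)
db <- tools::Rd_db("dplyr")
length(db)

# Open the package's help index interactively to judge quality
help(package = "dplyr")
```

Presence is the easy part; judging how clear and complete those pages are still takes a human read.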
The software development lifecycle domain
And this leads us into the software development lifecycle domain, where we're aiming to capture the overall process for how the code develops and evolves over time, how it is cared for, essentially. With this one, first and foremost, we need transparency into what's happening. Luckily for us, this is increasingly easy with things like GitHub, which is pretty popular these days. It allows us to get a peek into what's going on: what kind of branching model is being used, how things are being worked on and merged in, what the road map of the tool is, and whether there are any critical open issues. All things we can see just by peeking into the repository.
To get a sense of how well supported it is, we can also look at the activity: how often the code is being worked on, issues being closed, releases to CRAN. And then any safeguards that are in place to protect the integrity of the code as it evolves: protected branches, continuous checking, continuous testing, all to prevent bugs from being introduced as the code changes.
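Release cadence on CRAN is one activity signal you can pull programmatically; a sketch assuming the pkgsearch package is installed (it queries CRAN's metadata service over the network; "ggplot2" is just an example):

```r
# Sketch: checking how often a package has been released to CRAN.
library(pkgsearch)

hist <- cran_package_history("ggplot2")
hist          # one row per CRAN release, with version and publication date
nrow(hist)    # total number of releases to date
```

Long gaps between releases are not automatically bad, as the next point explains, but they are worth noticing.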
Now, something to keep in mind is that a package could have all the hallmarks of a good package here, but maybe it's really, really new. It hasn't had a chance to work out its bugs, we're not aware of them yet, and it hasn't been stress-tested enough. Or it's changing a lot. On the other end of things, a package could have fairly stalled development, which could mean it's been abandoned, or it could mean it's just really, really stable. So something to keep in mind in looking at this domain is: what is the history of the project? Where is it in its life cycle?
The testing domain
And our final domain is the testing domain. Here, by testing, I mean the actual testing inside the package. Oftentimes these are your testthat tests, your unit tests. What these do is check that the functionality is working as expected and producing correct results, which sounds a lot like our definition of accuracy from earlier. So a really solid test suite can essentially be a gold standard for how you prove accuracy.
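As a concrete illustration of a test that compares against an outside source of truth, here's a minimal testthat sketch. The "known value" is the textbook sample standard deviation of 2, 4, 6, 8, worked by hand (mean 5, squared deviations sum to 20, sample variance 20/3):

```r
# Sketch: a unit test whose expected value comes from a hand
# calculation rather than from the code under test.
library(testthat)

test_that("sd() matches the hand-computed textbook value", {
  x <- c(2, 4, 6, 8)
  expect_equal(sd(x), sqrt(20 / 3))
})
```

Tests like this are exactly the evidence of accuracy a reviewer can point to.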
So let's take a look at what we might look for inside the package, in your package directory. We can look at whether tests exist and what they're covering: things like code coverage. Which functions are being covered, and to what extent? From there, we can look specifically at what we're interested in. What are the key things we feel we're going to use the most, and how deep is the testing on those? What types of scenarios are covered? What types of inputs are the developers checking? And are they comparing against some known value? Even better if that source of truth comes from an outside resource.
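The coverage questions can be answered with the covr package; a sketch, assuming covr is installed and you have the package source checked out locally (the path here is a placeholder):

```r
# Sketch: measuring how much of a package's code its tests exercise.
library(covr)

cov <- package_coverage(path = "path/to/package/source")

# Breadth: overall percentage of lines exercised by the tests
percent_coverage(cov)

# Gaps: lines that no test touches at all, a starting point for
# judging which functions need your own extra scrutiny
zero_coverage(cov)
```

Printing `cov` itself also gives a per-file breakdown, which helps you focus on the functions you'll actually use.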
I recognize this is a fairly technical task, so it may look a little different depending on who you are and what resources you have. But even being familiar with the first two, the presence and the breadth, and then taking a peek into the depth, maybe reading some of the descriptions in the testthat files, can be very helpful. Another thing to keep in mind about the testing domain is that not everything is easy to test. Visualizations, or really novel, complex algorithms, can be a little tougher to test. And that's okay; again, we have the rest of the domains.
Thinking holistically about assessment
So with those in mind, the assessment may look a little different depending on who you are and what you do. For us, working towards submission work, we're very risk averse, so we do a pretty in-depth assessment. If you're just poking around and exploring, maybe you do a lighter assessment. But getting to know your packages is really important; it will serve you well.
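Much of an assessment like this can be automated as a starting point with the R Validation Hub's riskmetric package; a minimal sketch, assuming riskmetric is installed and using "dplyr" only as an example (the native pipe requires R 4.1 or later):

```r
# Sketch: a riskmetric-based first pass over one package.
library(riskmetric)

pkg_ref("dplyr") |>   # reference the package to assess
  pkg_assess() |>     # gather raw evidence across many metrics
  pkg_score()         # convert each assessment into a comparable score
```

The scores are a triage tool, not a verdict; the more subjective domains above still need human judgment.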
Also, on a task level: if you're transforming the data, changing it in some way, or performing a statistical analysis, the stakes might be higher if something goes awry, and it might be harder to tell when something has gone awry, compared to something that's more aesthetic in nature. So, again, your approach to the assessment might change depending on that.
And so you could also perform this whole assessment and not be quite satisfied with the result. But maybe you have only one option. And this is the package you need. So maybe you're willing to bend a little bit on your risk and build some extra safeguards around that for yourself. Augment it with some testing or extra QC.
Even if you find yourself on the stable, risk-averse side, the rest of the ecosystem is going to continue to evolve and change. And that's the beauty of R, right? There are cutting-edge new things every single day. So it's important, if we are in that situation, to keep our eyes on what's to come, find ways to innovate and experiment, and keep our minds open for any gems that come along. We can in turn grow our tried-and-true collection of packages that we count on.
All right. I said there were five domains, but I'm going to throw you one more, which is about thinking holistically. When we do these assessments, we are not expecting a perfect score by any means. The purpose of the assessment is to allow packages to shine where they are strong, and to let those strengths and weaknesses guide the rest of our assessment and help us focus on what matters. For instance, a package could have really strong tests but not as much community support, or vice versa. These things can counterbalance each other. So if we're flexible and adaptable in our approach, then we can more effectively collect the body of evidence that's going to inform our sense of trust in this wonderfully imperfect open source world. Thank you.
