Becca Krouse | R Package Assessment: Lessons from Pharma | Posit (2022)
Transcript
This transcript was generated automatically and may contain errors.
My name is Becca Krouse. I am a data scientist at GSK in our Statistics and Data Science Innovation Hub, where my team is working on building support systems and facilitating the adoption of R across biostatistics. Today I'll be talking about R Package Assessment lessons that can be learned from pharma.
In today's world, we have no shortage of products right at our fingertips that can be received in just days or even hours with the click of a mouse. Vendors of all shapes and sizes have the ability to reach consumers across the globe, which means as consumers we have a wide variety of features to choose from, but also quite a range of quality. Sound familiar? Whether you're looking for something you need on Amazon or a package on CRAN, you probably want to find something that suits your needs that you can count on.
So how do you make those decisions? What do you look for? You're probably going to look at indicators that you feel are important. If you think about how you approach online shopping, you probably go to the reviews, the ratings, badges, the name of the seller, and depending on what you're looking for and how much time you have, maybe you dive deep into those reviews. These indicators give you an overall picture of confidence that the product is going to look, feel, and perform as advertised. While the indicators are a little bit different for packages, the overall idea is the same. A little bit of research goes a long way to build that picture of confidence that the package will produce accurate results you can count on.
Now, this idea is critical no matter what your work is, but it's especially critical to those of us in regulated industries like pharma, where it's really important to deliver safe and effective drugs to patients as quickly as we can. To do so, we need to make sure our data, our analyses, our entire pipelines are free of errors and full of integrity. A huge part of our pipelines is the software used to transform the data and produce the results. We really want to do our research, and the regulators expect that we do our due diligence and make defensible choices, meaning we provide sufficient documentation of why we believe our software, our packages, works correctly and produces correct results.
So let's think about that for a minute. How do we really know our results are correct? Well, one thing we can do is compare against a source of truth. That source of truth could be your common sense, another programmer, another programming language, maybe a publication like a textbook. And this is true regardless of whether you're using the latest and greatest R package or a SAS proc that's been around for decades. It's all the same idea, and we want to make sure our results will remain accurate for the many scenarios we haven't encountered yet: new data sets, new studies in the future.
We're in exciting times now, and pharma is making progress full speed ahead towards R being a primary tool used in studies and submission work. Folks across the industry are putting their heads together to align on a common philosophy for what it takes to ensure that our packages are accurate. In the R Validation Hub specifically, this philosophy focuses on package assessment. There are principles and ideas in place, but it's up to individual organizations to implement them and decide how much evidence is enough. At GSK, we're working hard on our own implementation, and we've settled on a set of guiding principles for how we think about package assessment.
I'll present those to you today as the domains of trustworthiness. There are five of them. Each one plays an important role in the overall assessment, and each consists of a collection of indicators. Some are quite easy to measure and quantify; others are more nuanced, harder to measure, and more subjective.
The community domain
So let's step through these now, beginning with the community domain. The purpose of this domain is to get an idea of how broadly a package is used in the community, how much folks are enjoying it, how strong its roots are. Specifically, we can look at something like downloads to get an idea of the volume of use to date; that volume is an indicator of how much the package has been vetted in real-life scenarios, how much it's been stress-tested. We're also interested in the overall trajectory of that use. Is it gaining traction? Is use holding steady, or is it declining for some reason? Maybe another package has come along that supersedes it.
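To make the download checks concrete, here's a minimal sketch using the cranlogs package (assuming it's installed and you have internet access; "teal" is just an illustrative package name):

```r
# Sketch: inspecting download volume and trajectory with cranlogs.
library(cranlogs)

# Volume of use: total downloads over the last month
dl <- cran_downloads(packages = "teal", when = "last-month")
sum(dl$count)

# Trajectory: a longer window lets you eyeball whether use is gaining
# traction, holding steady, or declining
dl_year <- cran_downloads(packages = "teal",
                          from = "2021-01-01", to = "2021-12-31")
plot(dl_year$date, dl_year$count, type = "l",
     xlab = "Date", ylab = "Daily downloads")
```

The raw number matters less than the shape of the curve over time.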
We also want to look at how the community relates to the package and its maintainers: how the community relies on this package, depends on it as part of other software or other materials like books, and also, going the opposite direction, how much the package maintainer has a vested interest in supporting the community and keeping up with its many ever-evolving, moving parts.
Now something to keep in mind is in an industry like pharma, we have some fairly specific needs. Maybe our packages that we are choosing are a little more niche, or we have some internal packages that we're using. So by nature, these are going to have a little bit smaller community user base, and that's okay. We have other domains to look at as well.
The authorship domain
With the authorship domain, we're doing some research to learn who the developers are. What are their backgrounds and experience? And is that experience an appropriate match for the complexity of the tool? We've found this one to be fairly important for statistical packages, where the level of expertise required is quite high and quite specific.
Some things that we look at here are overall reputation, how the community regards the author as an expert in their field and as an R developer, and how much experience they have in both areas. You could look at the other packages that person or those individuals have out there, and any specific qualifications, degrees, or certificates; all of that rounds out our picture of who these authors are and what their backgrounds are. This domain is quite dependent on what you can find, which obviously varies from individual to individual, but it's informative nonetheless.
The documentation domain
Our next domain is the documentation domain, by which I mean functionality documentation. What is the tool supposed to do? What is the intent? And what is the scope, what are the limits of the tool, to help us understand what appropriate, responsible use looks like? This is a prerequisite to being able to prove accuracy: you need to understand in which cases your function is going to produce correct results.
We're all probably pretty familiar with looking around at help files, vignettes, and examples. What we're after is, first, do those things exist, but also, what is their quality? How helpful and clear are they? What types of scenarios are covered? For intended use, we want clear documentation of expected inputs and expected outputs. It's really nice if the edge cases, the limits piece, are covered in there: what might we want to watch out for when using this tool, for instance.
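A quick survey of what documentation a package actually ships with can be done from base R; a sketch, using "dplyr" purely as an example:

```r
# Sketch: surveying a package's documentation with base R tooling.

# List the vignettes the package provides, if any
v <- vignette(package = "dplyr")
v$results[, c("Item", "Title")]

# Count the help pages shipped with the package
# (one Rd entry per documented function or concept)
db <- tools::Rd_db("dplyr")
length(db)

# Open the package's help index interactively to judge quality
help(package = "dplyr")
```

Presence is the easy part; judging how clear and complete those pages are still takes a human read.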
The software development lifecycle domain
And this leads us into the software development lifecycle domain, where we're aiming to capture the overall process for how the code develops and evolves over time, how it is cared for, essentially. With this one, first and foremost, we need transparency into what's happening. Luckily for us, this is increasingly easy with things like GitHub, which is pretty popular these days. It allows us to get a peek into what's going on: what kind of branching model is being used, how things are being worked on and merged in, what the road map of the tool is, and whether there are any critical open issues. All things we can see just by peeking into the repository.
To get a sense of how well supported it is, we can also look at the activity: how often the code is being worked on, issues being closed, releases to CRAN. And then any safeguards that are in place to protect the integrity of the code as it evolves: protected branches, continuous checking, continuous testing, all to prevent bugs from being introduced as the code changes.
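Release cadence on CRAN is one activity signal you can pull programmatically; a sketch assuming the pkgsearch package is installed (it queries CRAN's metadata service over the network; "ggplot2" is just an example):

```r
# Sketch: checking how often a package has been released to CRAN.
library(pkgsearch)

hist <- cran_package_history("ggplot2")
hist          # one row per CRAN release, with version and publication date
nrow(hist)    # total number of releases to date
```

Long gaps between releases are not automatically bad, as the next point explains, but they are worth noticing.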
Now, something to keep in mind is that a package could have all the hallmarks of a good package here, but maybe it's really, really new. It hasn't had a chance to work out its bugs, we're not aware of them yet, and it hasn't been stress-tested enough. Or it's changing a lot. On the other end of things, a package could have fairly stalled development, which could mean it's been abandoned, or it could mean it's just really, really stable. So something to keep in mind in looking at this domain is: what is the history of the project? Where is it in its life cycle?
The testing domain
And our final domain is the testing domain. Here, by testing, I mean the actual testing inside the package. Oftentimes these are your testthat tests, your unit tests. What these do is check that the functionality is working as expected and producing correct results, which sounds a lot like our definition of accuracy from earlier. So a really solid test suite can essentially be a gold standard for how you prove accuracy.
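As a concrete illustration of a test that compares against an outside source of truth, here's a minimal testthat sketch. The "known value" is the textbook sample standard deviation of 2, 4, 6, 8, worked by hand (mean 5, squared deviations sum to 20, sample variance 20/3):

```r
# Sketch: a unit test whose expected value comes from a hand
# calculation rather than from the code under test.
library(testthat)

test_that("sd() matches the hand-computed textbook value", {
  x <- c(2, 4, 6, 8)
  expect_equal(sd(x), sqrt(20 / 3))
})
```

Tests like this are exactly the evidence of accuracy a reviewer can point to.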
So let's take a look at what we might look for inside the package, in your package directory. We can look at whether tests exist and what they're covering: things like code coverage. Which functions are being covered, and to what extent? From there, we can look specifically at what we're interested in. What are the key things we feel we're going to use the most, and how deep is the testing on those? What types of scenarios are covered? What types of inputs are the developers checking? And are they comparing against some known value? Even better if that source of truth comes from an outside resource.
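The coverage questions can be answered with the covr package; a sketch, assuming covr is installed and you have the package source checked out locally (the path here is a placeholder):

```r
# Sketch: measuring how much of a package's code its tests exercise.
library(covr)

cov <- package_coverage(path = "path/to/package/source")

# Breadth: overall percentage of lines exercised by the tests
percent_coverage(cov)

# Gaps: lines that no test touches at all, a starting point for
# judging which functions need your own extra scrutiny
zero_coverage(cov)
```

Printing `cov` itself also gives a per-file breakdown, which helps you focus on the functions you'll actually use.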
I recognize this is a fairly technical task, so it may look a little different depending on who you are and what resources you have. But even being familiar with the first two, the presence and the breadth, and then taking a peek into the depth, maybe reading some of the descriptions in the testthat files, can be very helpful. Another thing to keep in mind about the testing domain is that not everything is easy to test. Visualizations, or really novel, complex algorithms, can be a little tougher to test. And that's okay; again, we have the rest of the domains.
Thinking holistically about assessment
So with those in mind, the assessment may look a little different depending on who you are and what you do. For us, working towards submission work, we're very risk averse, so we do a pretty in-depth assessment. If you're just poking around and exploring, maybe you do a lighter assessment. But getting to know your packages is really important; it will serve you well.
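Much of an assessment like this can be automated as a starting point with the R Validation Hub's riskmetric package; a minimal sketch, assuming riskmetric is installed and using "dplyr" only as an example (the native pipe requires R 4.1 or later):

```r
# Sketch: a riskmetric-based first pass over one package.
library(riskmetric)

pkg_ref("dplyr") |>   # reference the package to assess
  pkg_assess() |>     # gather raw evidence across many metrics
  pkg_score()         # convert each assessment into a comparable score
```

The scores are a triage tool, not a verdict; the more subjective domains above still need human judgment.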
Also, on a task level: if you're transforming the data, changing it in some way, or performing a statistical analysis, the stakes might be higher if something goes awry, and it might be harder to tell when something has gone awry, compared to something that's more aesthetic in nature. So, again, your approach to the assessment might change depending on that.
And so you could also perform this whole assessment and not be quite satisfied with the result. But maybe you have only one option. And this is the package you need. So maybe you're willing to bend a little bit on your risk and build some extra safeguards around that for yourself. Augment it with some testing or extra QC.
Even if you find yourself on the stable, risk-averse side, the rest of the ecosystem is going to continue to evolve and change. And that's the beauty of R, right? There are cutting-edge new things every single day. So it's important, if we are in that situation, to keep our eyes on what's to come, find ways to innovate and experiment, and keep our minds open for any gems that come along. We can in turn grow our tried-and-true collection of packages that we count on.
All right. I said there were five domains, but I'm going to throw you one more, which is about thinking holistically. When we do these assessments, we are not expecting a perfect score by any means. The purpose of the assessment is to allow packages to shine where they are strong, and to let those strengths and weaknesses guide the rest of our assessment and help us focus on what matters. For instance, a package could have really strong tests but not as much community support, or vice versa. These things can counterbalance each other. So if we're flexible and adaptable in our approach, then we can more effectively collect the body of evidence that's going to inform our sense of trust in this wonderfully imperfect open source world. Thank you.
