Nicole Jones - Breaking Barriers: Adopting R in Biotech with Posit

video
Oct 31, 2024
17:26

Transcript

This transcript was generated automatically and may contain errors.

So, we're here to talk about how at our biotech Denali we adopted R and mainly leveraged Posit.

Well, just a little introduction. I'm Nicole Jones. I'm a senior data scientist at Denali Therapeutics. There I wear many hats. I'm a study lead. I'm an R package validation lead. I am a Shiny developer, and I'm a newly minted Posit admin.

So, just a disclaimer, all these views in this presentation are my own and do not reflect the views of my company.

The journey to Posit

So, I'm going to share with you first the journey to Posit and what brought me to this stage today. So, when Denali got started, we were a house, a shop of R programmers. We did our internal exploratory analysis in R, and anytime we needed to do regulatory work, we would go to a CRO and do it in SAS.

We didn't love that. We're R programmers. We wanted to leverage R. So, when I joined, one of the first things we laid out was what we wanted our workflows and our programming life to look like. First, we wanted to be able to support internal R-based filing. We needed a GxP-compliant system that would allow us to do this R-based filing and support our workflows. And we all know the famous, hey, it worked on my machine, but it didn't work on your machine. So, we wanted a shared R environment to make sure we didn't have those situations.

So, we found a statistical vendor to give us a statistical computing environment, but it was really designed for SAS. So, we worked alongside them to kind of come up with a solution for R. And this is what the solution looked like. This is the editor inside of our statistical computing environment. You don't have any of the features and the bells and whistles we know and love with RStudio. This is not an interactive session, so when you run your code, an R server gets spun up on the back end, it executes the code, and it gets killed. And then you have to go look at the outputs and things like that. We didn't love this.

We did have the option of using RStudio Desktop to integrate with our SCE, but then we still had that problem of I could have a different version of R, a different version of packages than my colleagues. So, we didn't want to go that route. So, we knew we wanted a proper editor, a proper development environment, and we did not want these disjointed R environments. And so, that led us to Posit, one thing that could allow us to implement these things.

We saw we'd get our familiar RStudio IDE that we know and love with all the interactivity and everything we've come to know. We would have a single R environment across our entire team. We can enable that now. And then there were a couple other added features where we could host packages, internally developed packages straight from GitHub to Posit Package Manager, and we could host our interactive reports and Shiny apps directly to Posit Connect.

Challenges and planning

So, we saw a lot of benefits, but we also knew that there were going to be some challenges, so we had to take a pause and really consider what was the scope, what were we agreeing to kind of take on by adopting Posit. We're not a big pharma company. We don't have the big infrastructure and the support that a lot of these other pharma companies have. In fact, if we were going to do this, it was just myself and my manager, Thomas, who had to be the people implementing this, right? So, we had a small team, and that meant we were going to juggle a lot of things. We were going to have to learn a lot of things and wear many, many hats.

We had to integrate our SCE with Posit Package Manager, and really not just say integrate, but what does that mean? What does that look like? What are the limitations of our SCE? What are the limitations of Posit? How are we going to make this happen? This was not an easy process. This took us a while, many iterations, and even now, we're still working some kinks out. So, definitely start this early and really get down to the granularity of what does this mean, and how does your data get shared, all these different things.

We also wanted to be able, in our Posit Workbench, to support exploratory analyses and regulatory work. We had to work with our QA and IT teams. That added a whole other kink in our process, and we had to modify our workflows. Like I said before, our SCE was not originally built for R, and the solution they built had limitations, and so the typical way code gets processed is a little different inside of our SCE, and we had to learn and modify these workflows.

So, we saw that. We saw our benefits. We saw the challenges. We said, okay, if we're going to do this, we need to come up with a plan. So, we sat down, and we figured out what's our plan.

So, outside of this whole Posit system, we had already developed a process for validating our packages. We used a snapshot-based method: pick a snapshot date, grab packages from that date, validate a subset, and the rest are not validated. So, we gave our SCE access only to those validated packages. Those are the packages approved for use in our regulatory work, so that's all that our SCE would be able to access.

In Posit Workbench, when we're running our programs, we want the same experience that we'd have inside the SCE. So, in our Posit Workbench, we have that same subset of validated packages from that same snapshot, so we don't have an issue with package versions being out of sync and things working in Posit Workbench and then breaking inside of our SCE. But, as I said, we want exploratory work in Posit Workbench, so we gave access to our non-validated packages only inside of Posit Workbench to allow our programmers to still do all the fun things that we do that are outside of our regulatory work.
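
The split described here can be pictured as two views over one snapshot. The following is a minimal Python sketch with made-up package names and a hypothetical snapshot date; it only models which packages each environment can see, not how Posit Package Manager or the SCE actually enforces it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    date: str                 # hypothetical snapshot date
    validated: frozenset      # packages approved for regulatory work
    non_validated: frozenset  # everything else captured on that date


def visible_packages(snapshot, environment):
    """Return the package names an environment is allowed to install."""
    if environment == "sce":
        # Regulatory runs only ever see the validated subset.
        return snapshot.validated
    if environment == "workbench":
        # Workbench sees the same validated subset plus exploratory packages.
        return snapshot.validated | snapshot.non_validated
    raise ValueError(f"unknown environment: {environment}")


# Illustrative data only; these are not the packages used at Denali.
snapshot = Snapshot(
    date="2024-01-15",
    validated=frozenset({"dplyr", "ggplot2"}),
    non_validated=frozenset({"plotly"}),
)
```

Because both views come from the same snapshot date, anything that runs in the SCE resolves to the same package versions in Workbench.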

We had to define our system architecture. What would this look like? So, we opted to have our vendors host everything within the same environment as our SCE. Our SCE uses a Docker container as the backend for R execution, and so we had our Docker container install packages straight from Posit Package Manager. Our Posit Workbench and Posit Connect install from that same Posit Package Manager, and then we had to define how we were going to share the data across all these systems. So, that was definitely a challenge, and we figured that out.

We also didn't want to be limited with our vendors and only allowing them to update things, so we gave some of our team members admin access and keys, essentially, into the system so that we can make updates and not face a bottleneck waiting for our vendors to do minor things. And then, finally, in compliance with our QA and IT requirements, we used our single sign-on method that we use across all our platforms. Definitely, there's benefits and there's pros and cons to using this method, but one of the biggest pros is it's the same login we use everywhere. Make it easy, easy to get into the system.

Putting it into action

And so, we saw benefits, we kind of had a feel for our challenges, we laid out our plan, and now it was time to put it into action. So, all this kind of started about seven months into my role now at Denali, and I get an email, and in this email, it has an IP address, a port, my username, and a password. I say, okay, great, what do I do with this? And then I get an IM from Thomas, my manager, and he says, Nicole, can you go into Posit Connect and start enabling things, SSH into the system? I don't have a background in anything that has to do with anything like that, so this was my reaction.

What do you mean SSH into a server? How do I do that? Where do I start? I'm starting to freak out. I'm seven months into my job. I don't want to tell them I don't know how to do something, right? You know, and I'm just like, take a deep breath. And I say, okay, Thomas, I don't know what you're talking about. Can we get on a call? And a great manager, as he is, he hops on a call, and also, you know, don't be afraid to say you don't know how to do something. That's the only way you learn.

So we get on this call, and he shows me the system, PuTTY. Nice and simple. You put in the IP address. You put in the port. Press okay. Next screen, you put in your username, your password, and then you connect. He even showed me, it's not so clear to see, but there's a configuration file in all of these, in Posit Package Manager, Posit Workbench, and Posit Connect, where you actually go and make these changes and set certain system settings. So he showed me in Posit Workbench how you get to this. He showed me all the things he had already been playing around with, things he created. And then finally, I'm like, okay, how do you edit this? And he says, oh, Vim. And if you've ever done a merge on Git, you've worked with Vim. So familiar, right, familiar kind of text editor. I say, okay, great. All right, I can do this, right? I walk out of that call. I'm confident.

And I go and I say, okay, it's my turn to get into Posit Connect. He's done what he had to do with Posit Workbench. Let me get to Posit Connect. And I run this command. This is a command that should open the config file. And I get this message: sudo, not found. And this is my reaction. Because I just got off this call with him. I know I took notes. I don't want to message him again and say, hey, I'm stuck, right? Because then, like, I'm seven months into my job. Maybe he's going to say I made a mistake, right? So anyway, I did some Googling, and, you know, about 15 minutes later, I realized I just needed to install Vim on the server. Oof, okay, problem solved.

Configuring Posit Connect

So we get Vim installed. And now I'm up and running. So I had essentially two main tasks inside of Connect initially. One, I wanted to enable the deployment of content straight from GitHub to Connect. And then, two, and I found this out afterward, when I tried to deploy a Quarto document, we also had to enable Quarto. So that was another challenge.

So to enable content straight from GitHub to Posit Connect, first I had to work with my IT team. That was a little tricky. Convincing them why we needed this service account and how this would work and the security vulnerabilities, it took a few meetings to really get them to understand what it was that I was doing. But once we understood and they were okay with it, we set up a service account. And then inside the config file, this is all you have to add, right? The only places you need to update are the username and the password. Everything else would stay the same. And that's the service account username and then your GitHub PAT that you generate from that service account. And now anytime we want to deploy anything from GitHub to Connect, we add this service account as a read-only user to that repository, and then it can host. This is the preferred method for us for deploying things. It allows you to have continuous integration and continuous deployment, right? You don't have to worry about republishing when you make updates in your main branch.
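
The slide with the config stanza is not captured in the transcript. As a rough illustration, the Python sketch below renders what such a stanza might look like. The section name `GitCredential` and the keys `Host`, `Username`, `Password`, and `Protocol` are assumptions based on the Posit Connect admin guide, and the account name and token are placeholders; check the guide for the exact keys and for how the token should be stored.

```python
def render_git_credential(username, token, host="github.com"):
    """Build a Git credential stanza for the Connect config file.

    Only the username and token change between installations; the
    section and key names are assumed, not taken from the talk.
    """
    return "\n".join([
        "[GitCredential]",
        f"Host = {host}",
        f"Username = {username}",   # the service account name
        f"Password = {token}",      # the PAT generated from that account
        "Protocol = https",
    ])


# Placeholder values for illustration only.
stanza = render_git_credential("svc-connect", "<github-pat>")
print(stanza)
```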

So in Quarto, as I said, I went to try to deploy a Quarto document and it failed. And so I realized there were a few things. First, we actually have to install Quarto on the Linux server. This is just a snippet of the code. The Posit admin guides have all this well-documented, so go there if you want to see the full code. But this is just a snippet of the code I had to run inside. Once I connected to the server, you execute this in the server, and it installs Quarto into the server. And then inside of the Posit Connect config file, we have to enable Quarto to allow it to actually be able to host Quarto documents on Posit Connect. And then we were up and running.
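
After installing Quarto, one quick sanity check is to confirm that the Connect config actually enables it. The sketch below is a minimal Python check over a config excerpt. The `[Quarto]` section with `Enabled` and `Executable` keys follows the layout I understand the admin guide to use, and the executable path is a made-up placeholder; a real config file may contain constructs this simple parser does not handle.

```python
import configparser


def quarto_enabled(config_text):
    """Return True if the config text has Quarto switched on."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case exactly as written in the file
    parser.read_string(config_text)
    # A missing section or key counts as "not enabled".
    return parser.getboolean("Quarto", "Enabled", fallback=False)


# Hypothetical excerpt of the Connect config file.
sample = """
; Quarto support
[Quarto]
Enabled = true
Executable = /opt/quarto/bin/quarto
"""

print(quarto_enabled(sample))
```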

Configuring Posit Workbench

Once Posit Connect was squared away, we kind of set our sights back on Posit Workbench. As I said before, we wanted to ensure our environment was shared across our development environment and our SCE. We wanted to make sure all the packages were the same. And so in order to do that, as I mentioned before, we used a snapshot method for picking and validating our packages that would be put in the SCE. So we had to do a few things inside of Posit Package Manager to enable this snapshot. One, this first step is optional, enabling binary package installation. But if you don't know the difference between non-binary and binary package installation, just know this is much faster and very recommended. So optional, but I say do it. Make sure you know your Linux distribution. This matters. If you pick the wrong one, it won't work. So we enabled our binary package installation first.

Then the next thing inside of Posit Package Manager, we enabled snapshot features. Way down at the bottom, by the black box, you can kind of see, it's a little small, but the link ends with the date that you select your packages from, and that is the repos link that we're going to use for installation.
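
To make the shape of that link concrete, here is a small Python sketch that assembles a snapshot URL. The host name, the repository name `cran`, the distribution code `jammy`, and the date are all placeholders, and the `__linux__/<distro>` path segment for binary packages is my understanding of the Package Manager URL layout rather than something shown in the talk.

```python
def snapshot_repo_url(base_url, snapshot_date, distro=None, repo="cran"):
    """Build a repository URL pinned to a snapshot date.

    With a distro code, the URL points at Linux binary packages;
    without one, it points at source packages.
    """
    base = base_url.rstrip("/")
    if distro:
        return f"{base}/{repo}/__linux__/{distro}/{snapshot_date}"
    return f"{base}/{repo}/{snapshot_date}"


# Placeholder server, distribution, and date.
print(snapshot_repo_url("https://packages.example.com", "2024-01-15", distro="jammy"))
```

The distribution code has to match the server's operating system, which is why picking the wrong one breaks binary installs.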

The next thing we wanted to do was enable our developers to kind of do this easily. So one, we wanted to set our CRAN repo. We wanted to set our default repo to be the Posit Package Manager and not CRAN. And we also wanted to get rid of this GUI piece that would allow our programmers to change the repo, right, just to kind of nudge them and guide them in the right direction. So two things we had to do inside of the Posit Workbench configuration file. We had to add the repository link from a couple slides ago that was blacked out a little bit. We added that repo link here. So now anytime you do install.packages, by default it's going to go to that snapshot date and install packages from there. And then also we disabled the GUI piece.

And so now inside of your RStudio, you won't have that option to interactively select your new repo. Now, of course, if you use install.packages with repos =, you can always change that. We can't disable that piece. But this is, again, nudging our programmers to do the correct thing and kind of keeping them from, you know, potentially opening themselves up to challenges that are going to lead to their programs not working as well in the SCE.

The next thing we want to do, as I said, our recommended method of deployment for Shiny apps is directly from GitHub to Posit Connect, not pressing this publish button. The reason for this is if you publish from here, it's tied to my account on Posit Connect. And if I win the lottery tomorrow and I decide I don't want to work anymore, well, now we have to go through a hassle of changing who owns that content in Posit Connect. If you deploy straight from GitHub, it's based on the service account. You don't have to worry about any sort of shift, any change in your team.

So very simple. Again, inside of Posit Workbench, just add this one line, allow-publish=0. It gets rid of that publish button. And now, again, if you're still using a package, you can still use the console commands to publish from Posit Workbench. But this is, again, nudging our programmers in the right way so that they're using our best practices.
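
The Workbench settings from this section (default repository, no repository picker, no publish button) can be checked together. The Python sketch below audits the text of a session config file. The key names `r-cran-repos`, `allow-r-cran-repos-edit`, and `allow-publish` are my reading of the Workbench admin guide and should be confirmed there; the snapshot URL is a placeholder.

```python
def parse_conf(text):
    """Parse simple key=value lines, ignoring blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        settings[key.strip()] = value.strip()
    return settings


def audit_session_conf(text, snapshot_url):
    """Return a list of settings that differ from the intended values."""
    expected = {
        "r-cran-repos": snapshot_url,    # default repo is the snapshot
        "allow-r-cran-repos-edit": "0",  # hide the repo picker in the IDE
        "allow-publish": "0",            # hide the publish button
    }
    actual = parse_conf(text)
    return [
        f"{key}: expected {want!r}, found {actual.get(key)!r}"
        for key, want in expected.items()
        if actual.get(key) != want
    ]


# Placeholder snapshot URL and a config that matches the intended setup.
SNAPSHOT = "https://packages.example.com/cran/__linux__/jammy/2024-01-15"
good_conf = f"r-cran-repos={SNAPSHOT}\nallow-r-cran-repos-edit=0\nallow-publish=0\n"
print(audit_session_conf(good_conf, SNAPSHOT))
```

An empty list means all three nudges are in place. As noted in the talk, none of this stops a user from passing a different repository to install.packages directly.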

We had to figure out how to share our data between our SCE and our Posit Workbench. This was definitely a little bit challenging. And this was, like, one of the final things we worked on and we have it up and running. It was, again, this is going to be a very unique situation in how you figure it out for your system, but we had to work with our vendors to figure that out.

As I said, we're learning. This process took quite a while. So there are still two things that are kind of under development. Number one, how we're sharing our data between Posit Connect and the SCE. There are some limitations based on our authentication method that make it a little challenging, so we're still kind of working out the best way to do this, and we've been working closely with our Posit admin team for this. So this should be up and running soon. And then also hosting packages from GitHub to Posit Package Manager. Not too difficult to do; it's just that with all the things we've had to do, we had to prioritize, so this is coming soon as well.

Lessons learned

But, yeah, that's where we were, right? So our lessons learned, and this journey was quite a fun journey, and when we sat back and we looked back at these things, some of the lessons we learned, which started with, you know, get familiar with your Posit admin guides. A lot of the times I had a question, the question was already answered in the Posit admin guides, so get familiar, look at those.

Leverage your Posit support team. They've been an invaluable asset in this whole process. If we had a question, they've done it with so many other organizations, so they can walk you through all the ins and outs, the pros, the cons of different methods that you can go about. I love to say with R, if there's one way to do it, there's 10 other ways to do it. It's the same with this. A lot of times there are different ways you can implement it. It's just based on what works for your system and what process your IT and QA teams will want you to go about.

Work closely with your vendors and really take their experience with the Posit tools into account. It took us many iterations to really make sure they understood what it was we meant when we needed certain features implemented. So work with them. Make sure you're with them every step of the way. And then if you're trying to select vendors, if they've set up Posit and worked with Posit before, definitely make that part of your decision-making process, I would recommend.

Start early. It's going to take you much longer than you think. Like I said, we're still working out some of the kinks, so don't wait until, hey, we're going to want to do some reporting event in six months. No, no, no. Start early. It's going to take you much longer. If you see it coming up anytime soon, start your journey as soon as you can.

Don't do it alone. There's definitely power in numbers. For me, it's only two of us, me and Thomas. But if it was just me, I don't know where I'd be right now. I probably wouldn't be on this stage. So definitely have a battle buddy for you, at least one. Don't be afraid to break the system. I did a couple of times, and it got fixed, so don't worry about it.

And then the final message is you can do it. We did it. We're doing it. I was just on a call with Thomas this morning. He's running tables and figures and stuff in our system. Like, it can get done. We had a small team, but we did it. So thank you, and any questions?

Q&A

If someone wants to update a validated package, how much work is it? Say they see a nice new feature in a package they want to take advantage of, how difficult is it to update that package?

Yeah, great question. So that's part of, like, our package validation, like, flow. And, I mean, all in all, really, like, in the grand scheme of things, not too difficult. But we're just, like, if you're going to request for a package to be updated, I guess the short answer is we have a process for it. I can, in theory, do it in a day. But, like, you know, we have to go through our full QA process and everything. So I think that just comes down to, like, that question really comes down to how do you guys handle package validation, and how often are you guys willing to update your snapshots.