Resources

Roche's End-to-End R Journey to Submission

video
Sep 10, 2024
59:35

image: thumbnail.jpg

Transcript#

This transcript was generated automatically and may contain errors.

Hello. Welcome. Thanks for coming. I am Phil, the Director of Pharma and Healthcare here at Posit. I'm going to be helping to moderate the session today, so if you have any questions, feel free to post those on YouTube as well as in Slido. We'll put the link across the screen now so you'll know how to get it. Very excited to bring to you today the sequel to the popular Roche webinar that we hosted last year, where they discussed and mapped out the open-source backbone for clinical trials. Today, we're going to continue Roche's journey and take you through an application of that backbone for a complete R submission, and the communication and collaboration they had with the regulatory bodies.

We'll kick off the presentation today with Ning Ling, Interim Global Head of the Data Science Acceleration Enabling Platform, as well as Heinel Patel and Jingyuan Chen, both principal data scientists working on the submission at Roche. Last year's webinar covered the innovation and strategy at Roche, detailing the vision to create this open-source backbone for clinical reporting. It detailed the statistical computing environments, the package development, the upskilling, the workflow changes, and all the work that Roche had to do over the years to get to the submission we're going to talk about today, all the while leading and contributing to efforts like R/Pharma, the R Validation Hub, the Pharmaverse, the pilot submissions, and the list goes on. And it's with great excitement that we bring to you today Roche's first end-to-end R-based drug submission. So with that, I'll pass it over to Ning to start the webinar for today.

Roche's open-source vision

Thank you very much, Phil. I'm really, really happy to join my colleagues, Heinel and Jingyuan, today to share our exciting journey toward our first end-to-end R submission to health authorities. In today's presentation, I will start by introducing the Roche vision, especially explaining why we decided to move to an open-source strategy. Following that, Heinel will share her team's journey as early adopters of those new tools. Then Jingyuan will join us to share the regulatory submission journey from Roche. And after that, I will wrap up with a short reflection on the past, the present, and also the future.

So let's start with Roche's vision. As many of you know, Roche is very committed to embracing modernized solutions and open-source technology. Why is that? If you're working in pharma, biotech, or clinical research, you have probably all experienced that during the past few years we have been collecting more and more data in clinical trial research. Those data include new modalities such as genomic data, image data, and live data from digital devices, et cetera. These new data types and modalities bring large data volumes as well as more complexity. With that, all of us need to think harder about how we bring more efficiency into our data pipelines. These new data modalities also allow us to generate new insights by using more advanced methodologies from statistical modeling or AI modeling.

By embracing open-source languages, we can quickly access the latest and greatest developments from academia as well as from the tech industry. That allows us to adopt the latest automation technology from other industries, and also to apply the latest statistical and AI methodologies to generate insight from our clinical trial data.

Also, many of us have experienced over the past 10 years or so that new graduates coming out of school already know R, Python, and Git, and already have some experience in cloud computing. We realized that enabling them to use the tools and skills they learned in school largely improved their productivity as well as their job satisfaction.

So with that, within Roche Product Development Data Science, our mission is to unlock the full potential of data to accelerate innovative healthcare solutions for patients and society. To meet that mission, one of our goals is to empower our data scientists to leverage diverse tools and languages, so they can use the right tool for the right project. If you have worked in or followed pharma for the past few years, you probably know that for clinical trial reporting, commercial software has dominated the space for the last decade or so. So switching to an open-source ecosystem is not easy, and it doesn't happen overnight.

Roche's journey toward open-source adoption

As for myself, I joined Roche Genentech in 2016. If I look back to the earlier years in my career, I recall that at that time, commercial software was heavily used for clinical trial regulatory reporting. When I joined the company, that was when we started to collect more and more genomic data and image data. For those exploratory analyses using genomic and image data, R was widely used as the default language. At that time, we had an R server built internally, but primarily for those exploratory analyses.

On the other side, there were so many people working on those new data modalities that we had a very robust, solid, and growing R user community within Roche. This user community also started to explore whether we could build R packages and POCs to support the regulatory work. So there were a number of grassroots efforts trying to tackle different problems in regulatory reporting, such as data processing or output generation.

In 2021, our leadership team under Roche Data Sciences decided that we needed a holistic strategy for moving to open source and modernized solutions. That strategy had several aspects. One was the infrastructure strategy: we decided that we wanted a validated, modernized infrastructure that enables multiple languages. This infrastructure is called Ocean. From the toolset perspective, our leadership team decided to productionize certain prioritized POCs to build our end-to-end pipeline and enable the use of open-source software in clinical trial reporting. From the people perspective, we started a more holistic upskilling plan to help our people learn R and Git if they were not yet familiar with those tools. We also embedded open-source language expectations into our recruitment strategy.

2022 was a busy year for execution. Internally, we were busy building our infrastructure and productionizing our toolset. We also established our validation pipeline and validated the system and tools. At the same time, we realized that because commercial software had dominated the regulatory space for such a long time, some of the cross-industry standards and good practice guidelines were built around commercial software. Those standards and guidelines were not always friendly to the use of open-source software.

Therefore, together with a number of like-minded individuals from different companies, and with strong collaboration from FDA, we established a number of cross-industry working groups. Those working groups focus on revisiting certain industry guidelines and standards to make them more compatible with the open-source ecosystem. We also identified gaps and tried to build open-source solutions to bridge those technical gaps.

Within Roche, 2023 was a very exciting year, where we delivered an early version of next-gen tools and systems. We welcomed a number of early adopter studies, including Heinel's study, and she will share more on their experience as early adopters of our new tools and systems. Based on the feedback from those early adopters, we iterated on our systems and tools and made enhancements.

This year, we are welcoming wider adoption of our new infrastructure. We have a very bold goal this year: we want to migrate 90% of our molecule team users to the new system called Ocean. Right now we are a little bit above 50%, so we are very optimistic about meeting our goal for this year.

In March 2024, very excitingly, we had our first R-based NDA submission to health authorities, and Jingyuan will share a little bit more on the journey of this first submission. Right now within Roche, we are welcoming more studies adopting the R end-to-end approach, and we are very excited to see more and more studies using a majority of open-source packages for their clinical trial reporting.

Early adopters journey

Thank you, Ning, for the introduction and the recap of our next-generation journey. Before I dive into the early adopters' journey, I'd like to briefly introduce who we are and what we do. At Roche, PD Data Science is a diverse group of data scientists. We are the end users of the next-generation tools and systems. Our team consists of both internal and external members, including experienced programmers and statisticians with a range of expertise in SAS and R. We have varying levels of familiarity with clinical trial data, with some members being multilingual and others strong in either R or SAS. The majority of the team are seasoned SAS programmers with a pharma background who've upskilled in R. Our daily work involves reporting on clinical trials and collaborating closely with key stakeholders such as clinical science and regulatory to generate robust scientific evidence that supports the approval and adoption of innovative medicines.

So with this background in mind, as an early adopter, we engaged in extensive strategic discussions and planning. I'd like to share insights into four key areas of focus, starting with choosing the right study, which was an important consideration. We selected an oncology study, a breast cancer trial, with a straightforward design and the standard analyses typical of a breast cancer study. There was familiarity with the programming specifications, the reporting would align with existing Roche data standards, and we could easily use the available open-source packages. In addition, we carefully considered the study timelines to ensure there was enough upskilling time, allowing our team to enhance their proficiency with the new tools while continuing to meet the study-related activities.

The second area, by far the most intensive, is forming a data science team who are motivated and passionate to upskill. This was really key to our success. Being an early adopter meant there was a lot to learn and not everything was perfect. The data science team members were not just end users; we would co-create with the developers to shape our future systems and tools. This mindset, seeing the bigger picture and wanting to be part of this transformative journey, was also key. Once we had a team, planning enough lead time for upskilling was crucial. This can be challenging, as the upskilling needs vary by individual. They could cover anything from basic R, package-specific training, GitLab training, and system-related training to learning new processes and new ways of working. Allowing enough time to practice new tools and solutions on real clinical trial work is essential, as this is when the learning is really embedded. Having a good mix of R programmers and those with clinical trial experience is crucial, as it allows the team to work as one and support each other in their respective learning journeys.

The third area was planning the programming strategy. Some of us have been diehard SAS programmers for many years, and it's what we're familiar with, so the transition to R for the first time in a submission was a significant step. We therefore planned QC using SAS, and this dual approach gave us confidence in the accuracy of our analyses while also providing flexibility should we need the SAS version of a program in the future. And then finally, stakeholder management is vital. We actively engaged with our internal stakeholders to align on our R submission plans and to reassure them that the transition would not introduce risks to our submission timelines. Additionally, we communicated with health authorities about the R-based submission to foster a collaborative and smooth submission process.

So overall, our learning experience has been positive. Initially, the learning curve felt steep, as there's a lot to grasp: a new system, a new environment, new tools, new processes. However, over time we noticed a growing sense of confidence and familiarity with the new tools within the team, leading to more timely delivery. As we progress, we're already observing benefits in our second study, which is now reporting. The feedback from users of the open-source packages from Pharmaverse has also been positive. Users liked the structured, modular approach to programming, as it helps them navigate and debug code more easily. The documentation is easy to follow, and the template programs provide a consistent starting point and are detailed enough for data scientists new to pharma to pick up.

Programming strategy and key packages

Our programming strategy involves conducting all the first-line programming in R. On this slide, I'm highlighting the key packages used. For the SDTM mapping — SDTM is our industry standard for the raw data we collect and is required by health authorities — we used Oak and Mint. For ADaM, the industry standard for analyses and additional derivations programmed using SDTM as a source, we used admiral and admiralonco. In addition, we have a proprietary package, Admiral Roche, which bridges the gaps with our internal data standards. Then for tables, listings, and graphs — the final outputs used in health authority documents — we used chevron, the TLG Catalog, rlistings, and rtables. And for the readable code for TLGs, which was based on a subset of the key analyses defined in the study protocol, we used the tern package.
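As a small illustration of how the SDTM and ADaM layers fit together, here is a minimal sketch of an admiral-style derivation: converting the exposure start date on SDTM EX to a datetime and merging the first exposure onto DM to derive the treatment start date. The toy data, study ID, and subject IDs are invented; real study code follows the admiral ADSL template.

```r
library(dplyr)
library(admiral)

# Toy SDTM-like inputs (invented data, CDISC-style names)
dm <- tibble::tribble(
  ~STUDYID, ~USUBJID,
  "XX01",   "XX01-001",
  "XX01",   "XX01-002"
)
ex <- tibble::tribble(
  ~STUDYID, ~USUBJID,   ~EXSEQ, ~EXSTDTC,
  "XX01",   "XX01-001", 1,      "2023-01-02",
  "XX01",   "XX01-001", 2,      "2023-01-16",
  "XX01",   "XX01-002", 1,      "2023-01-05"
)

# Convert the ISO 8601 exposure start date to a datetime (EXSTDTM)
ex <- derive_vars_dtm(ex, dtc = EXSTDTC, new_vars_prefix = "EXST")

# Derive TRTSDTM on ADSL as the first exposure start, then its date part
adsl <- dm %>%
  derive_vars_merged(
    dataset_add = ex,
    filter_add  = !is.na(EXSTDTM),
    by_vars     = exprs(STUDYID, USUBJID),
    order       = exprs(EXSTDTM, EXSEQ),
    mode        = "first",
    new_vars    = exprs(TRTSDTM = EXSTDTM)
  ) %>%
  derive_vars_dtm_to_dt(exprs(TRTSDTM))  # adds TRTSDT
```

The same modular pattern, small composable derivation functions applied in sequence, is what the team describes as easier to navigate and debug than monolithic programs.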

All of these packages are well-established, cross-industry collaboration solutions from Pharmaverse, apart from Admiral Roche, of course. Our strategy was to leverage open-source packages to enhance transparency during the health authority review process, enabling reviewers to directly access openly available open-source code and find the answers to their questions should they wish to.

In addition to the end-to-end R submission, we also utilised R Shiny apps built using the teal package. Two R Shiny apps were employed throughout the study lifecycle, both before and after database lock, as well as during the submission process. The apps proved valuable across various business contexts. The first R Shiny app was used informally for study planning, and the second complemented our final static analyses for the trial. I'll provide more details of the apps in the upcoming slides.

Before database lock, whilst we were blinded to the treatment regimen, we used the event tracking app for progression-free survival, PFS. This standard oncology endpoint measures how long patients remain free from disease progression after treatment, to assess its effectiveness. The number of PFS events needed to detect a significant difference between the two treatment groups is calculated, and we then used the R Shiny app to predict when these PFS events could occur, helping to determine the study timelines for the final analysis.

After study database lock, when the required number of PFS events was achieved, we unblinded the study and reran the final analyses to assess the treatment effect. Our second R Shiny app was used during the clinical interpretation meetings with internal clinical science and the regulatory team. The app has also been beneficial for requests from other teams within Roche, such as patent filing teams needing to reproduce the endpoints on particular subsets of patients, and teams involved in accelerating patient access to our treatments, such as pricing and reimbursement. They have also used our app to QC additional analyses they receive from external vendors. Internally, the app supported health authority questions by reproducing key analyses on subpopulations or different baseline characteristics in real time. In addition, we've used event tracking for overall survival, which is another common endpoint in oncology.

So the usage of the apps has provided us the flexibility to replicate similar analyses, enabling faster insights and better management of additional formal analyses, especially when the team is under pressure to address health authority questions. So this approach helps us to avoid generating unnecessary static outputs and allows us to focus on analyses that are truly valuable.


Regulatory submission journey

Yeah, thank you. Thank you, Heinel. After hearing about the technical platform and the way of working in our study team, we also want to share with you our submission journey. Here are some points that we will cover in our story today. We would like to share the way we disclosed the R information in our submission package. We will also unfold the interactions with FDA for the submission in the US, the communication with EMA for the submission in the EU, and also the talks with NMPA for the submission in China. So now let's start with the approach we have taken to include R in our electronic submission package.

As part of our CDISC submission, we have ensured that the data contents and formats are consistent with the CDISC requirements as industry-wide standards. The submitted datasets are all in XPT format; it's just that they are generated through R. If we compare with a traditional package, we can see that the SDTM part remains the same; just the datasets were generated using R. Then for the analysis dataset package, the ADaM datasets are also generated in XPT format, but written with Admiral Roche, which is currently a proprietary package. So the readable code for the ADaM datasets is written with Admiral Roche, but the readable code for the outputs, like the tables, listings, or graphs, is based on the open-source package tern, so that end users can easily install it and replicate our results.
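The XPT step itself is straightforward in R. Here is a minimal sketch (dataset and variable names are illustrative, not from the actual submission) using the haven package, which writes SAS transport files:

```r
library(haven)

# Illustrative ADaM-style dataset (invented values)
adsl <- data.frame(
  USUBJID = c("XX01-001", "XX01-002"),
  TRT01P  = c("Drug A", "Placebo"),
  AGE     = c(54L, 61L)
)

# Version 5 transport format is the one health authorities expect:
# 8-character dataset/variable names and 40-character labels apply.
write_xpt(adsl, "adsl.xpt", version = 5, name = "ADSL")

# The file round-trips back into R for QC
chk <- read_xpt("adsl.xpt")
```

Pharmaverse tooling such as xportr layers variable labels, lengths, and metadata checks on top of this basic write step.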

Then the only differences, the additional efforts, were these: one is to further specify the details of the R packages that were used in our analysis. We included this content in a document we named the Program TOC, in addition to the reviewer's guide. The other is to include the validation reports for the key packages used in our analysis as miscellaneous datasets, to meet the requirements from FDA.

Now, let's take a look at how we constructed the Program TOC document to include the R information as a disclosure. As the structure shown on the slide, the description of the e-submission package provides general information about our study and about our package. Then the lists of the analysis dataset programs and the output programs give the names of the programs and datasets for analysis. So far, this mirrors what is typically found in a traditional submission, ensuring consistency and clarity. The final section, on how to use the readable code, is where we provide the information about the R packages that were used in our analysis.

Given the dynamic nature of R, where new features are added as package versions move up, we also included the version of each package, along with a very brief description of the packages used in the analysis. For example, in the proprietary package part, we listed Admiral Roche, since we used it for generating our analysis datasets. We also used quite a few open-source packages, like admiral and tern, and we included the package versions and a very brief description, which can also be taken as the justification for their use. Then, in the appendix, a very detailed instruction helps reviewers install the R environment and the packages, along with an example of executing our submitted readable code. This is definitely optional for day-to-day R users, but to make our package more approachable for broader reviewers, we still decided to include this part in our submission document.
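A package-version table like the one described can be generated directly from the analysis environment. A small sketch in base R (the package list here is illustrative; a real Program TOC would enumerate every package used in the analysis):

```r
# Build a package/version disclosure table from the current R session.
# "stats" and "utils" stand in for the real analysis packages
# (e.g. admiral, tern) so this sketch runs anywhere.
pkgs <- c("stats", "utils")

pkg_table <- data.frame(
  package = pkgs,
  version = vapply(pkgs, function(p) as.character(packageVersion(p)),
                   character(1)),
  row.names = NULL
)
print(pkg_table)
```

Capturing versions programmatically, rather than typing them by hand, keeps the disclosure consistent with the environment that actually produced the outputs.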

Communication with health authorities

In addition to the technical preparation, communication with the health authorities is another key to making the plan work. For the submission in the US, we actually initiated the talk with FDA during our submission preparation period and proposed to use R and open-source packages as the readable code in our briefing package. SAS programs were actually mentioned in FDA's initial response. But after seeking clarification regarding FDA's statistical software clarifying statement and the acceptance of the dummy R submission pilot from the R Consortium, we received a positive response from the FDA review team permitting us to proceed with R, along with appropriate justification as well as any resources that could provide evidence for the validation of the packages or functions used.

By then, we were pretty happy to continue our journey with R, and we also appreciated the flexibility shown by the agency's review team. Shortly after we submitted our package, we received a request for the application orientation meeting and the technical walkthrough. In addition to the usual content covering the datasets and the supporting documents in Module 5, we also included the list of R-related documents and their locations, such as the Program TOC file and the validation reports. To help ease some of the concerns during the review, we also pointed out the differences in the default options between SAS and R for the primary endpoint analysis. So far, we haven't received any information request related to our R submission.
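The specific defaults the team documented aren't shown here, but one classic example of such a SAS-versus-R difference (an assumption on my part, not necessarily the one on their slide) is tie handling in the Cox model: SAS PROC PHREG defaults to TIES=BRESLOW, while survival::coxph() defaults to the Efron method.

```r
library(survival)

# Fit the same Cox model on the built-in aml data with both tie methods
fit_efron   <- coxph(Surv(time, status) ~ x, data = aml)  # R default: ties = "efron"
fit_breslow <- coxph(Surv(time, status) ~ x, data = aml,
                     ties = "breslow")                    # matches SAS PHREG's default

# Hazard-ratio estimates differ slightly until tie handling is aligned
exp(coef(fit_efron))
exp(coef(fit_breslow))
```

Spelling out such option differences up front saves reviewers from chasing small numeric discrepancies between the submitted R outputs and their own SAS reruns.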

Then for the submission in the EU, we were actually pretty lucky to catch the tail of the application window for the EMA raw data pilot, and EMA promptly confirmed their acceptance. During the raw data pilot meeting, we presented content very similar to the FDA technical walkthrough, but also included some additional details regarding the dataset contents and some explanations of the submitted documents, as it was our first time submitting the data to EMA. So far, there have been no technology-related questions from EMA's end either.

Then moving on to the submission in China. For comparison, we aligned the communication with NMPA with the FDA progress on the same timeline. Initially, we raised the topic at a similar time, but we didn't get a clear response regarding the use of R in our submission. To address this, Roche requested a Type 3 meeting and provided more comprehensive supporting information from FDA and PMDA, as well as on the industry's use of R in submissions. Then, at the beginning of this year, China's CDE provided a written response stating that the biostatistics viewpoint is consistent with FDA's: there is no specific software requirement for the analysis. But similarly to FDA, the response also mentioned that the analysis software and the corresponding functions used should be validated. So we were pretty happy about the response, and we can utilize the e-submission package submitted to FDA and EMA as the basis for the submission in China.

Now, looking back at our submission journey, we feel that it has been both challenging and valuable. Throughout the process of our new drug application submission using R end to end, we found that the health authorities are open to R submissions and would like to know that the packages have been validated. To facilitate a smooth review process, it is critical to secure agreement through prior communication with the health authorities.

Based on our case, we can see that the path for end-to-end R submission has been established, and it takes limited additional effort for applicants to implement the practice of using R with open source and to appropriately disclose the R-related information in the submitted package. So that is the story of our R submission. We hope it has brought you some inspiration. And now I will hand over to Ning for a broader view of the industry.


Reflections and the road ahead

Thank you very much, Jingyuan. To wrap up, I will take the next several minutes to share some of our reflections on the past and the present, and also a little look into the future.

So starting with the plan within Roche, looking into the future. Right now, we are really, really excited about this first R end-to-end submission to health authorities, and really happy to see that there haven't been any technical issues so far. As mentioned, we are welcoming more and more studies using R end to end and using open-source tools in clinical trial reporting. So a big focus for us is further adoption of the modernized platform and solutions. With that adoption, another big emphasis for us is trying to standardize and automate the automatable tasks by leveraging the modernized solutions and technologies. Of course, we are not innovating for innovation's sake. We really want to improve our productivity and reduce the manual, repetitive work for our data scientists.

If you're interested in learning more about our automation work and ideas, please feel free to check out the Pharmaverse packages, such as NEST for output generation and Shiny app generation, admiral for ADaM data generation, and also the CDISC open-source effort around the oak package for SDTM mapping. There are a lot of automation features embedded in those solutions. And also, like everybody, we are pretty excited about the recent advancement of AI, especially large language models, and we are actively exploring how to use AI to assist programming, especially code generation for clinical trial reporting.

I also want to spend some time sharing some reflections of my own, covering the journey over the last five years or so. I remember that the first time I got exposure to R code submission to FDA was in 2020. At that time, Roche had a post-marketing commitment to FDA, where we needed to do some analysis on a large volume of clinical trial data. For the majority of the analysis, we had been using SAS. But there was one particular set of analyses where we needed a pretty advanced methodology that was hard to implement using SAS, and there was already existing R code from academia that we could directly borrow. So we decided to use the R code to fulfill this post-marketing requirement, and then to submit the R code to FDA. At that time, since we didn't have much experience with R-based submission, there were so many questions internally. For example, our biggest question was how we could make sure the FDA reviewers would be able to reproduce the results, and how to ensure that they could work in a computing environment similar to ours.

For example, two very simple questions were: how do we share the open-source packages we use so that FDA reviewers can use the same packages? And also, for the advanced methodology, we had built an internal proprietary R package to wrap up the method, and we wanted to submit that package to FDA. As you may know, every submission to FDA needs to go through the electronic gateway, so the question was how to submit the package through the gateway.

Eventually we made the submission, but with a lot of question marks, and we were not quite sure whether we had followed best practice. I remember that in 2021, our team presented at the R/Pharma conference at Harvard; at that time, it was an in-person conference. It was very interesting that we actually met many people from different companies with similar experiences. People had all started to use R for regulatory submission or clinical trial reporting, and they had realized that certain industry standards and current practices were not very friendly to the use of R or other open-source languages.

So in 2022, a number of cross-industry collaborations were established by passionate people from different companies, with very strong engagement from FDA. I'm calling out two efforts here because those are the ones I'm most familiar with. One is the R Consortium R Submissions Working Group, a collaboration with FDA to showcase publicly available open pilots, to show how to submit using R and what some good practices are, and also to identify gaps for R-based or open-source-language-based submissions to FDA. The other one is called Pharmaverse. As you heard from the earlier presentations, Pharmaverse is an open-source collaboration effort to develop high-quality technical solutions to bridge any gaps in the clinical trial reporting space.

In 2022, it was really exciting that we finished pilot one from the R Consortium working group. We were able to showcase how to use R for an R-based submission and engage with FDA reviewers to identify gaps. Those gaps were then fed back to Pharmaverse for further development of technical solutions to bridge them. Following pilot one, there were a number of other pilots in the past two years. In pilot two, we were able to collaborate with FDA on a Shiny submission. In pilot three, we submitted R code for ADaM generation. And right now we are doing pilot four, a very exciting pilot in which we will explore container and webR technology.

I want to say that all those cross-industry efforts shared so many learnings with the Roche internal team, and I truly believe those valuable insights really streamlined our experience in our first R end-to-end new drug application submission earlier this year. For example, on those iterative learnings: just looking back at the two very simple questions we had during the PMC days, I recall that when we did the first PMC submission within Roche, for open-source packages we simply provided CRAN links to the FDA reviewers. However, we all know this is not optimal, because CRAN may not point to the package version that we used. So during R Consortium submission pilot number one, we had a more in-depth discussion and decided to use renv snapshots, which can point to a specific version of an R package. Also at that time, we started to provide a more detailed dependency graph and descriptions in the reviewer's guide.
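For context, an renv snapshot pins exact package versions in a lockfile, so a reviewer can rebuild the same environment rather than installing whatever is current on CRAN. A minimal sketch of the workflow:

```r
# In the analysis project:
renv::init()      # create a project-local library
renv::snapshot()  # record exact package versions in renv.lock

# On the reviewer's side, with the project (and renv.lock) in hand:
renv::restore()   # reinstall the recorded versions
```

The lockfile travels with the submission materials, which is what turns "here is a CRAN link" into "here is the exact environment we used."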

In 2023, we realized that renv had become pretty mature, so we decided to explore the usage of renv in pilot two, and that was really successful and largely simplified the submission process. Right now we're exploring a container and webR solution, which is really exciting, and I truly believe that will further increase productivity and simplify the submission process. In terms of proprietary R packages, as you may recall, any submission to FDA has to go through an electronic gateway, and when we did the post-marketing commitment submission, we learned that for security reasons the electronic gateway only allows plain file formats. So for the PMC submission, we needed to manually convert all the package content into plain files to do the submission. Then, during the R/Pharma conference, we discussed this with a number of people from different companies, and our collaborators from Merck came up with the very smart idea of building an R package to automate the manual process. So under Pharmaverse, an R package called pkglite was developed, which automated the manual process of file format conversion and largely simplified the process of submitting proprietary R packages. This pkglite solution has also been used in Roche's first R end-to-end submission that Jingyuan mentioned.
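For illustration, pkglite's workflow packs a source package into one plain-text file that can pass the gateway, and unpacks it on the other side. The package path and file names here are hypothetical:

```r
library(pkglite)

# Pack a proprietary package's source into a single plain-text file
# (path and output name are illustrative, not from the actual submission)
"path/to/proprietarypkg" |>
  collate(file_ectd()) |>
  pack(output = "r0pkgs.txt")

# On the reviewer's side: restore the package source from the text file
unpack("r0pkgs.txt", output = "reviewer_dir")
```

The collate step selects which files to include (file_ectd() is the profile aimed at eCTD submissions), so binary artifacts that the gateway rejects never enter the packed file.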

And very excitingly, just several weeks ago, our R Consortium submission working group pilot 3 team worked closely with FDA colleagues to test the feasibility of submitting a zip file for proprietary R packages, and it was very successful. This further streamlines the submission process for proprietary R packages in the future.

So looking into the future for these cross-industry efforts, from my side I just always feel very grateful and proud to be part of this very active community, where everybody is really passionate, really committed, and really supportive of each other. I feel very lucky to be part of this journey, where a lot of us believe that together, by adopting the open-source spirit, ecosystem, and tools, and by contributing back through open source and open collaboration, we are really accelerating bench-to-bedside treatment development as data scientists. We also truly believe that by automating the automatable tasks, reducing repetitive work, and reducing manual work, we will be able to unleash our clinical data scientists to focus on more impactful work that really brings benefit to patients.

So lastly, I just want to say thank you to everybody who has been part of this cross-industry community and has been contributing to it; I'm really grateful to have had the opportunity to work with you directly or indirectly. And if you are not part of the community yet and haven't been contributing to these cross-industry efforts, it's never too late. I'm sharing several links here; please feel free to take a look at the working groups, identify the ones you may be interested in, and just reach out. I'm sure everybody in the community will welcome your contribution. With that, I want to thank you for your attention, and we are happy to take any questions.

Q&A

Awesome. Thank you so much to the entire Roche team for detailing their experience here. I know so many people are very excited to hear about this journey that you've gone down. So we'll tackle a few of these here, and we're going to publish a blog post after the webinar today, so if we don't answer some of these questions, stay tuned and we'll get some information out to the community afterwards. So for the Roche team, to start things off, here's the most popular question: have the FDA or EMA described any barriers to entry for end-to-end submission pipelines? For example, for Fast Track, or for extensions to PDUFA or EMA decision dates.

Yeah, maybe I can start to chime in here. I won't say we heard about any barriers. The clarification request we sent to FDA was the only time we discussed the programs, and actually FDA quickly reached agreement with us: we provide the validation report and the justification, and then it should be fine. And after the submission, we didn't hear anything even resembling a barrier, and there was no follow-up communication regarding the programs or the language that had been used. For the EMA, because we take part in the EU raw data pilot, perhaps by the nature of this new pilot, we hope the data will help the EMA reviewing team evaluate our packages.

Maybe also just to reiterate what Jingyuan mentioned: from FDA, there was actually a specific clarification statement published probably 10 years ago saying FDA does not require any specific software language. And we also learned that FDA is experiencing a similar talent shift to ours. As you can imagine, FDA reviewers who graduated during the past 10 years or so probably already know R very well. So I don't see a real barrier there. But also, as Jingyuan mentioned, communication is the key. It's possible that your program gets paired with a reviewer who prefers one of the languages. But from the FDA perspective, there is no requirement. And as Jingyuan mentioned, if we communicate early enough and work through the technical details, from our experience, as long as we front-load them with the information so they can prepare, there is no disadvantage of one programming language over the other.

Did the regulatory body also require QC done in SAS for the submission created in R? Yeah, from our communication with the health authorities when we planned to submit, we didn't hear anything about requirements for QC. I think they simply expect the numbers and analyses we include in our CSR or submitted documents to be accurate. So we didn't get any question saying QC should be done in a certain way. Yeah, and again, citing the same clarification statement from FDA, they are not asking for any specific programming language, so I don't expect they will ask for QC in a specific programming language. This was just more our comfort level: we wanted to use SAS QC, as opposed to it being a regulatory requirement.

So a lot of questions coming in about package management. One of the popular ones: how did you manage R package version control in-house? Yeah, I want to say the three of us are not the experts, but I will try to answer. Internally, similar to many other companies, we have release cycles. We use containers, and each release of a container version comes with a particular version of R itself and of the R packages, et cetera. If people are interested in learning more, there was actually an R Consortium blog post published earlier this year sharing some of the technical details from different pharma companies in terms of release cycles, et cetera, which probably provides more information.
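The container-per-release pattern described above can be sketched roughly like this (a hypothetical example; the base image tag and lockfile path are illustrative assumptions, not Roche's actual setup):

```dockerfile
# Hypothetical container for one internal release cycle:
# a pinned R version plus a pinned package library.
FROM rocker/r-ver:4.3.1

# Restore the exact package versions recorded for this release
COPY renv.lock /opt/release/renv.lock
RUN R -e "install.packages('renv'); renv::restore(lockfile = '/opt/release/renv.lock')"
```

Because the image tag fixes both R and the package library, every study run against that container version is reproducible later.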

Did you have to create new processes for the R submission, or were you able to keep the same processes that you had with SAS? Yeah, it is our first time doing an R submission, but generally I would say the processes stay the same. We didn't create anything specific for the R submission, but as we shared in our presentation, we just picked a way we think is proper and clear to disclose the R information in our package. Yeah, totally agree. I feel Jingyuan covered it very well: for example, for the programming TOC, we added more information because R is still new-ish. Arguably, when we do a SAS submission we should also show the reviewer how to install SAS, but we know they have experience with that, so we skipped it in previous submissions. So I will say the process is largely the same, but we try to be more specific just in case this is new-ish to some of the reviewers.

So, a lot of questions coming in about the tools that you used. It's probably important to mention that last year at R/Pharma, James Black gave the keynote on the statistical computing environment, and in last year's webinar we went pretty deep into that, but maybe you can share a little more about OCEAN and how you approached the statistical computing environment? Yeah. So, OCEAN is basically AWS-based infrastructure, and on OCEAN we try to enable both SAS and R, and presumably Python if there are more Python users, et cetera. As mentioned, for computing environment version control we have been using containers: different containers for SAS, for R, for different package versions, et cetera. For code version control we use GitLab; internally we have a GitLab instance, and for traceability we use Git. We also use Snakemake for orchestration.
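To give a flavor of what Snakemake orchestration looks like for clinical reporting (a hypothetical Snakefile; the dataset paths and script names are illustrative, not Roche's actual pipeline), each rule declares its inputs, outputs, and the R script that produces them, and Snakemake re-runs only the steps whose inputs changed:

```python
# Hypothetical Snakefile: SDTM -> ADaM -> TLG, one R script per step
rule adsl:
    input: "sdtm/dm.parquet"
    output: "adam/adsl.parquet"
    script: "programs/adsl.R"            # builds the subject-level dataset

rule t_demographics:
    input: "adam/adsl.parquet"
    output: "tlg/t_demographics.rtf"
    script: "programs/t_demographics.R"  # renders the demographics table
```

The dependency graph between rules is what gives the traceability: regenerating one ADaM dataset automatically flags every downstream output that needs to be refreshed.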

When you say end-to-end submissions, is it in the context of eCTD regulatory submissions? Yes, yes. It is eCTD, generally for the electronic data submission.

Awesome. Another question: did Roche develop an internal validation process for the libraries (Admiral Roche, for example), and can you share more details about this? Yeah. So, as Jingyuan mentioned, we do have an internal pipeline, which is largely similar to the R Validation Hub white paper. Basically, proprietary R packages need to go through that internal validation pipeline.

Was the Shiny app itself submitted to the FDA, accessible to the FDA, or just an internal tool? No. So, this was just an internal tool that was informally used by the project team; the app itself was not submitted. But we had the key analyses in the study report, which reflected the same information as the app. Yeah, for the Roche submission, we haven't submitted the Shiny app to FDA. However, if you follow the R Consortium submissions working group, pilot 2 actually tested the feasibility of submitting a Shiny app to FDA. So, if a sponsor prefers to submit a Shiny app as a supplementary utility tool, it's feasible to submit, yeah.

I'm not sure. At least in our journey, we didn't use R to generate our define. As we said, we didn't do anything that deviates very much from the traditional way of preparing our submission packages; we just chose a proper way to disclose the R information in our package. Yeah. I think this is the first time I also saw Mint as part of the Oak workflow. What is the Mint package? So, Mint is a web application where we can define the metadata for the mapping instructions. That was used by the SDTM team. And maybe, you know, that's also part of our automation effort. So, for Mint, I believe we simplify the mapping for standard data sources, right? Maybe you can share a little more. Yeah, yeah. So, we obviously have a metadata repository, so quite a lot of the standard mappings are available already. But it's looking at each study and seeing what's non-standard; that's where the effort lies, where you need to go into the application and add in the non-standard mappings. The majority of the SDTMs are pretty standard, and most of those mappings follow Roche data standards. So the effort there is really focused on the non-standard pieces.

Awesome. A lot of questions about the internal Admiral package. Someone's asking, how was the proprietary R package actually shared with the health authorities? Yeah, in our case, when we prepared our package, we actually used the pkglite package. As Ning mentioned, it's a smart package that helps us translate the content of the package into a text file. So we just included the text file together with our readable code in our submission package. And in the programming TOC appendix, we include very detailed instructions on how to make use of the text file from the end user's point of view and install it in the user's environment. And for future submissions, please check out pilot number 3's repo. Right now, I believe you can use a zip file for your proprietary R package.
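The pack/unpack round trip described here looks roughly like this (a minimal sketch using the pkglite API; the package name and file paths are hypothetical):

```r
library(pkglite)

# Sponsor side: flatten a proprietary package into a plain text file
# that can pass through the eCTD electronic gateway.
"path/to/proprietarypkg/" |>
  collate(file_ectd()) |>
  pack(output = "proprietarypkg.txt")

# Reviewer side: restore the package source from the text file
# and install it into the local environment.
unpack("proprietarypkg.txt", output = "path/to/review/", install = TRUE)
```

`file_ectd()` selects the file types suitable for an eCTD submission, which is why the packed text file stays within the gateway's plain-format restriction.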

Got a couple more questions; two minutes, we'll see how far we get. How are packages like admiral or oak useful for non-CDISC standard data? Yeah, I'm not very sure about that. I think both oak and admiral appeared fairly recently, and the recent submissions to health authorities should be all CDISC. So, yeah, I'm probably not so clear on that. All right. Or we can get back to you. Yeah, we can do some research and get back to this question.

For sure, for sure. So, I'm going to ask my own question; I just thought it was very interesting. I loved the slide that said you're at 50% in training, adoption, and supporting the transition, and that the goal is 90%. What are the plans to help you get to that 90%? I think we are definitely heading there. We have a pretty comprehensive plan: month by month, we plan out which study teams are to be onboarded and what support they need. And for each team, we assign an onboarding coach, most of whom are early adopters or users who are already on the new platform. And one thing I really appreciate is that I see really, really strong passion from the user community. I know that whenever we move to a new system or new tools, the initial phase