Resources

Roche's End-to-End R Journey to Submission

video
Sep 10, 2024
59:35

image: thumbnail.jpg

Transcript#

This transcript was generated automatically and may contain errors.

Hello. Welcome. Thanks for coming. I am Phil, the Director of Pharma and Healthcare here at Posit. I'm going to be helping to moderate the session today, so if you have any questions, feel free to post those on YouTube as well as in Slido. We'll put the link across the screen now so you'll know how to get it. Very excited to bring to you today the sequel to the popular Roche webinar that we hosted last year, where they discussed and mapped out the open-source backbone for clinical trials. Today, we're going to continue Roche's journey and take you through an application of that backbone for a complete R submission, and the communication and collaboration they had with the regulatory bodies.

We'll kick off the presentation today with Ning Ling, Interim Global Head of the Data Science Acceleration Enabling Platform, as well as Heinel Patel and Jingyuan Chen, both principal data scientists working on the submission at Roche. Last year's webinar covered the innovation and strategy at Roche, detailing the vision to create this open-source backbone for clinical reporting. It detailed the statistical computing environments, the package development, the upskilling, the workflow changes, and all the work that Roche had to do over the years to get to the submission we're going to talk about today, all the while leading and contributing to efforts like R/Pharma, the R Validation Hub, the Pharmaverse, the pilot submissions, and the list goes on. And it's with great excitement that we bring to you today Roche's first end-to-end R-based drug submission. So with that, I'll pass it over to Ning to start the webinar for today.

Roche's open-source vision

Thank you very much, Phil. I'm really, really happy to join my colleagues, Heinel and Jingyuan, today to share our exciting journey toward our first end-to-end R submission to health authorities. In today's presentation, I will start by introducing the Roche vision, especially explaining why we decided to move to an open-source strategy. Following that, Heinel will share her team's journey as early adopters of those new tools. Then Jingyuan will join us to share the regulatory submission journey from Roche. And after that, I will wrap up with a short reflection on the past, the present, and also the future.

So let's start with Roche's vision. As many of you know, Roche is very committed to embracing modernized solutions and open-source technology. Why is that? If you're working in pharma, biotech, or clinical research, you have probably all experienced that during the past few years we have been collecting more and more data in clinical trial research. Those data include new modalities such as genomic data, image data, and live data from digital devices, et cetera. These new data types and modalities bring large data volumes as well as more complexity. With that, all of us need to think harder about how we bring more efficiency into our data pipelines. These new data modalities also allow us to generate new insights by using more advanced methodologies from statistical modeling or AI modeling.

By embracing open-source languages, we can quickly access the latest and greatest developments from academia as well as from the tech industry. That allows us to adopt the latest automation technology from other industries, and also to apply the latest statistical and AI methodologies to generate insight from our clinical trial data.

Also, many of us have experienced over the past 10 years or so that new graduates coming out of school already know R, Python, and Git, and already have some experience in cloud computing. We realized that enabling them to use the tools and skills they learned in school largely improved their productivity as well as their job satisfaction.

So with that, within Roche Product Development Data Science, our mission is to unlock the full potential of data to accelerate innovative healthcare solutions for patients and society. To meet that mission, one of our goals is to empower our data scientists to leverage diverse tools and languages, so they can use the right tool for the right project. If you have worked in or followed pharma for the past few years, you probably know that for clinical trial reporting, commercial software has dominated the space for the last decade or so. So switching to an open-source ecosystem is not easy, and it doesn't happen overnight.

Roche's journey toward open-source adoption

As for myself, I joined Roche Genentech in 2016. If I look back to the earlier years in my career, I recall that at that time, commercial software was heavily used for clinical trial regulatory reporting. When I joined the company, that was when we started to collect more and more genomic data and image data. For those exploratory analyses using genomic and image data, R was widely used as the default language. At that time, we had an R server built internally, but primarily for those exploratory analyses.

On the other side, there were so many people working on those new data modalities that we had a very robust, solid, and growing R user community within Roche. This user community also started to explore whether we could build R packages and POCs to support the regulatory work. So there were a number of grassroots efforts trying to tackle different problems in regulatory reporting, such as data processing or output generation.

In 2021, our leadership team under Roche Data Sciences decided that we needed a holistic strategy for moving to open source and modernized solutions. That strategy had several aspects. One was the infrastructure strategy: we decided that we wanted a validated, modernized infrastructure that enables multiple languages. This infrastructure is called Ocean. From the toolset perspective, our leadership team decided to productionize certain prioritized POCs to build our end-to-end pipeline and enable the use of open-source software in clinical trial reporting. From the people perspective, we started a more holistic upskilling plan to help our people learn R and Git if they were not yet familiar with those tools. We also embedded open-source language expectations into our recruitment strategy.

2022 was a busy year for execution. Internally, we were busy building our infrastructure and productionizing our toolset. We also established our validation pipeline and validated the system and tools. At the same time, we realized that because commercial software had dominated the regulatory space for such a long time, some of the cross-industry standards and good practice guidelines were built around commercial software. Those standards and guidelines were not always friendly to the use of open-source software.

Therefore, together with a number of like-minded individuals from different companies, and with strong collaboration from FDA, we established a number of cross-industry working groups. Those working groups focus on revisiting certain industry guidelines and standards to make them more compatible with the open-source ecosystem. We also identified gaps and tried to build open-source solutions to bridge those technical gaps.

Within Roche, 2023 was a very exciting year, where we delivered an early version of next-gen tools and systems. We welcomed a number of early adopter studies, including Heinel's study, and she will share more on their experience as early adopters of our new tools and systems. Based on the feedback from those early adopters, we iterated on our systems and tools and made enhancements.

This year, we are welcoming wider adoption of our new infrastructure. We have a very bold goal this year: we want to migrate 90% of our molecule team users to the new system called Ocean. Right now we are a little bit above 50%, so we are very optimistic about meeting our goal for this year.

In March 2024, very excitingly, we had our first R-based NDA submission to health authorities, and Jingyuan will share a little bit more on the journey of this first submission. Right now within Roche, we are welcoming more studies adopting the R end-to-end approach, and we are very excited to see more and more studies using a majority of open-source packages for their clinical trial reporting.

Early adopters journey

Thank you, Ning, for the introduction and the recap of our next-generation journey. Before I dive into the early adopters' journey, I'd like to briefly introduce who we are and what we do. At Roche, PD Data Science is a diverse group of data scientists. We are the end users of the next-generation tools and systems. Our team consists of both internal and external members, including experienced programmers and statisticians with a range of expertise in SAS and R. We have varying levels of familiarity with clinical trial data, with some members being multilingual and others strong in either R or SAS. The majority of the team are seasoned SAS programmers with a pharma background who've upskilled in R. Our daily work involves reporting on clinical trials and collaborating closely with key stakeholders such as clinical science and regulatory to generate robust scientific evidence that supports the approval and adoption of innovative medicines.

So with this background in mind, as an early adopter, we engaged in extensive strategic discussions and planning. I'd like to share insights into four key areas of focus, starting with choosing the right study, which was an important consideration. We selected an oncology study, a breast cancer trial, with a straightforward design and the standard analyses typical of a breast cancer study. There was familiarity with the programming specifications, the reporting would align with existing Roche data standards, and we could easily use the available open-source packages. In addition, we carefully considered the study timelines to ensure there was enough upskilling time, allowing our team to enhance their proficiency with the new tools while continuing to meet the study-related activities.

The second area, by far the most intensive, is forming a data science team who are motivated and passionate to upskill. This was really key to our success. Being an early adopter meant there was a lot to learn and not everything was perfect. The data science team members were not just end users; we would co-create with the developers to shape our future systems and tools. This mindset, seeing the bigger picture and wanting to be part of this transformative journey, was also key. Once we had a team, planning enough lead time for upskilling was crucial. This can be challenging, as the upskilling needs vary by individual. They could cover anything from basic R, package-specific training, GitLab training, and system-related training to learning new processes and new ways of working. Allowing enough time to practice new tools and solutions on real clinical trial work is essential, as this is when the learning is really embedded. Having a good mix of R programmers and those with clinical trial experience is crucial, as it allows the team to work as one and support each other in their respective learning journeys.

The third area was planning the programming strategy. Some of us have been diehard SAS programmers for many years, and it's what we're familiar with, so the transition to R for the first time in a submission was a significant step. We therefore planned QC using SAS, and this dual approach gave us confidence in the accuracy of our analyses while also providing flexibility should we need the SAS version of a program in the future. And then finally, stakeholder management is vital. We actively engaged with our internal stakeholders to align on our R submission plans and to reassure them that the transition would not introduce risks to our submission timelines. Additionally, we communicated with health authorities about the R-based submission to foster a collaborative and smooth submission process.

So overall, our learning experience has been positive. Initially, the learning curve felt steep, as there's a lot to grasp: a new system, a new environment, new tools, new processes. However, over time we noticed a growing sense of confidence and familiarity with the new tools within the team, leading to more timely delivery. As we progress, we're already observing benefits in our second study, which is now reporting. The feedback from users of the open-source packages from Pharmaverse has also been positive. Users liked the structured, modular approach to programming, as it helps them navigate and debug code more easily. The documentation is easy to follow, and the template programs provide a consistent starting point and are detailed enough for data scientists new to pharma to pick up.

Programming strategy and key packages

Our programming strategy involves conducting all the first-line programming in R. On this slide, I'm highlighting the key packages used. For the SDTM mapping — SDTM is our industry standard for the raw data we collect and is required by health authorities — we used Oak and Mint. For ADaM, the industry standard for analyses and additional derivations programmed using SDTM as a source, we used admiral and admiralonco. In addition, we have a proprietary package, Admiral Roche, which bridges the gaps with our internal data standards. Then for tables, listings, and graphs — the final outputs used in health authority documents — we used chevron, the TLG Catalog, rlistings, and rtables. And for the readable code for TLGs, which was based on a subset of the key analyses defined in the study protocol, we used the tern package.
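As a small illustration of how the SDTM and ADaM layers fit together, here is a minimal sketch of an admiral-style derivation: converting the exposure start date on SDTM EX to a datetime and merging the first exposure onto DM to derive the treatment start date. The toy data, study ID, and subject IDs are invented; real study code follows the admiral ADSL template.

```r
library(dplyr)
library(admiral)

# Toy SDTM-like inputs (invented data, CDISC-style names)
dm <- tibble::tribble(
  ~STUDYID, ~USUBJID,
  "XX01",   "XX01-001",
  "XX01",   "XX01-002"
)
ex <- tibble::tribble(
  ~STUDYID, ~USUBJID,   ~EXSEQ, ~EXSTDTC,
  "XX01",   "XX01-001", 1,      "2023-01-02",
  "XX01",   "XX01-001", 2,      "2023-01-16",
  "XX01",   "XX01-002", 1,      "2023-01-05"
)

# Convert the ISO 8601 exposure start date to a datetime (EXSTDTM)
ex <- derive_vars_dtm(ex, dtc = EXSTDTC, new_vars_prefix = "EXST")

# Derive TRTSDTM on ADSL as the first exposure start, then its date part
adsl <- dm %>%
  derive_vars_merged(
    dataset_add = ex,
    filter_add  = !is.na(EXSTDTM),
    by_vars     = exprs(STUDYID, USUBJID),
    order       = exprs(EXSTDTM, EXSEQ),
    mode        = "first",
    new_vars    = exprs(TRTSDTM = EXSTDTM)
  ) %>%
  derive_vars_dtm_to_dt(exprs(TRTSDTM))  # adds TRTSDT
```

The same modular pattern, small composable derivation functions applied in sequence, is what the team describes as easier to navigate and debug than monolithic programs.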

All of these packages are well-established, cross-industry collaboration solutions from Pharmaverse, apart from Admiral Roche, of course. Our strategy was to leverage open-source packages to enhance transparency during the health authority review process, enabling reviewers to directly access openly available open-source code and find the answers to their questions should they wish to.

In addition to the end-to-end R submission, we also utilised R Shiny apps built using the teal package. Two R Shiny apps were employed throughout the study lifecycle, both before and after database lock, as well as during the submission process. The apps proved valuable across various business contexts. The first R Shiny app was used informally for study planning, and the second complemented our final static analyses for the trial. I'll provide more details of the apps in the upcoming slides.

Before database lock, whilst we were blinded to the treatment regimen, we used the event tracking app for progression-free survival, PFS. This standard oncology endpoint measures how long patients remain free from disease progression after treatment, to assess its effectiveness. The number of PFS events needed to detect a significant difference between the two treatment groups is calculated, and we then used the R Shiny app to predict when these PFS events could occur, helping to determine the study timelines for the final analysis.

After study database lock, when the required number of PFS events was achieved, we unblinded the study and reran the final analyses to assess the treatment effect. Our second R Shiny app was used during the clinical interpretation meetings with internal clinical science and the regulatory team. The app has also been beneficial for requests from other teams within Roche, such as patent filing teams needing to reproduce the endpoints on particular subsets of patients, and teams involved in accelerating patient access to our treatments, such as pricing and reimbursement. They have also used our app to QC additional analyses they receive from external vendors. Internally, the app supported health authority questions by reproducing key analyses on subpopulations or different baseline characteristics in real time. In addition, we've used event tracking for overall survival, which is another common endpoint in oncology.

So the usage of the apps has provided us the flexibility to replicate similar analyses, enabling faster insights and better management of additional formal analyses, especially when the team is under pressure to address health authority questions. So this approach helps us to avoid generating unnecessary static outputs and allows us to focus on analyses that are truly valuable.


Regulatory submission journey

Yeah, thank you. Thank you, Heinel. After hearing about the technical platform and the way of working in our study team, we also want to share with you our submission journey. Here are some points that we will cover in our story today. We would like to share the way we disclosed the R information in our submission package. We will also unfold the interactions with FDA for the submission in the US, the communication with EMA for the submission in the EU, and also the talks with NMPA for the submission in China. So now let's start with the approach we have taken to include R in our electronic submission package.

As part of our CDISC submission, we have ensured that the data contents and formats are consistent with the CDISC requirements as industry-wide standards. The submitted datasets are all in XPT format; it's just that they are generated through R. If we compare with a traditional package, we can see that the SDTM part remains the same; just the datasets were generated using R. Then for the analysis dataset package, the ADaM datasets are also generated in XPT format, but written with Admiral Roche, which is currently a proprietary package. So the readable code for the ADaM datasets is written with Admiral Roche, but the readable code for the outputs, like the tables, listings, or graphs, is based on the open-source package tern, so that end users can easily install it and replicate our results.
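The XPT step itself is straightforward in R. Here is a minimal sketch (dataset and variable names are illustrative, not from the actual submission) using the haven package, which writes SAS transport files:

```r
library(haven)

# Illustrative ADaM-style dataset (invented values)
adsl <- data.frame(
  USUBJID = c("XX01-001", "XX01-002"),
  TRT01P  = c("Drug A", "Placebo"),
  AGE     = c(54L, 61L)
)

# Version 5 transport format is the one health authorities expect:
# 8-character dataset/variable names and 40-character labels apply.
write_xpt(adsl, "adsl.xpt", version = 5, name = "ADSL")

# The file round-trips back into R for QC
chk <- read_xpt("adsl.xpt")
```

Pharmaverse tooling such as xportr layers variable labels, lengths, and metadata checks on top of this basic write step.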

Then the only differences, the additional efforts, were these: one is to further specify the details of the R packages that were used in our analysis. We included this content in a document we named the Program TOC, in addition to the reviewer's guide. The other is to include the validation reports for the key packages used in our analysis as miscellaneous datasets, to meet the requirements from FDA.

Now, let's take a look at how we constructed the Program TOC document to include the R information as a disclosure. As the structure shown on the slide, the description of the e-submission package provides general information about our study and about our package. Then the lists of the analysis dataset programs and the output programs give the names of the programs and datasets for analysis. So far, this mirrors what is typically found in a traditional submission, ensuring consistency and clarity. The final section, on how to use the readable code, is where we provide the information about the R packages that were used in our analysis.

Given the dynamic nature of R, where new features are added as package versions move up, we also included the version of each package, along with a very brief description of the packages used in the analysis. For example, in the proprietary package part, we listed Admiral Roche, since we used it for generating our analysis datasets. We also used quite a few open-source packages, like admiral and tern, and we included the package versions and a very brief description, which can also be taken as the justification for their use. Then, in the appendix, a very detailed instruction helps reviewers install the R environment and the packages, along with an example of executing our submitted readable code. This is definitely optional for day-to-day R users, but to make our package more approachable for broader reviewers, we still decided to include this part in our submission document.
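A package-version table like the one described can be generated directly from the analysis environment. A small sketch in base R (the package list here is illustrative; a real Program TOC would enumerate every package used in the analysis):

```r
# Build a package/version disclosure table from the current R session.
# "stats" and "utils" stand in for the real analysis packages
# (e.g. admiral, tern) so this sketch runs anywhere.
pkgs <- c("stats", "utils")

pkg_table <- data.frame(
  package = pkgs,
  version = vapply(pkgs, function(p) as.character(packageVersion(p)),
                   character(1)),
  row.names = NULL
)
print(pkg_table)
```

Capturing versions programmatically, rather than typing them by hand, keeps the disclosure consistent with the environment that actually produced the outputs.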

Communication with health authorities

In addition to the technical preparation, communication with the health authorities is another key to making the plan work. For the submission in the US, we actually initiated the talk with FDA during our submission preparation period and proposed to use R and open-source packages as the readable code in our briefing package. SAS programs were actually mentioned in FDA's initial response. But after seeking clarification regarding FDA's statistical software clarifying statement and the acceptance of the dummy R submission pilot from the R Consortium, we received a positive response from the FDA review team permitting us to proceed with R, along with appropriate justification as well as any resources that could provide evidence for the validation of the packages or functions used.

By then, we were pretty happy to continue our journey with R, and we also appreciated the flexibility shown by the agency's review team. Shortly after we submitted our package, we received a request for the application orientation meeting and the technical walkthrough. In addition to the usual content covering the datasets and the supporting documents in Module 5, we also included the list of R-related documents and their locations, such as the Program TOC file and the validation reports. To help ease some of the concerns during the review, we also pointed out the differences in the default options between SAS and R for the primary endpoint analysis. So far, we haven't received any information request related to our R submission.
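The specific defaults the team documented aren't shown here, but one classic example of such a SAS-versus-R difference (an assumption on my part, not necessarily the one on their slide) is tie handling in the Cox model: SAS PROC PHREG defaults to TIES=BRESLOW, while survival::coxph() defaults to the Efron method.

```r
library(survival)

# Fit the same Cox model on the built-in aml data with both tie methods
fit_efron   <- coxph(Surv(time, status) ~ x, data = aml)  # R default: ties = "efron"
fit_breslow <- coxph(Surv(time, status) ~ x, data = aml,
                     ties = "breslow")                    # matches SAS PHREG's default

# Hazard-ratio estimates differ slightly until tie handling is aligned
exp(coef(fit_efron))
exp(coef(fit_breslow))
```

Spelling out such option differences up front saves reviewers from chasing small numeric discrepancies between the submitted R outputs and their own SAS reruns.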

Then for the submission in the EU, we were actually pretty lucky to catch the tail of the application window for the EMA raw data pilot, and EMA promptly confirmed their acceptance. During the raw data pilot meeting, we presented content very similar to the FDA technical walkthrough, but also included some additional details regarding the dataset contents and some explanations of the submitted documents, as it was our first time submitting the data to EMA. So far, there have been no technology-related questions from EMA's end either.

Then moving on to the submission in China. For comparison, we aligned the communication with NMPA with the FDA progress on the same timeline. Initially, we raised the topic at a similar time, but we didn't get a clear response regarding the use of R in our submission. To address this, Roche requested a Type 3 meeting and provided more comprehensive supporting information from FDA and PMDA, as well as on the industry's use of R in submissions. Then, at the beginning of this year, China's CDE provided a written response stating that the biostatistics viewpoint is consistent with FDA's: there is no specific software requirement for the analysis. But similarly to FDA, the response also mentioned that the analysis software and the corresponding functions used should be validated. So we were pretty happy about the response, and we can utilize the e-submission package submitted to FDA and EMA as the basis for the submission in China.

Now, looking back at our submission journey, we feel that it has been both challenging and valuable. Throughout the process of our new drug application submission using R end to end, we found that the health authorities are open to R submissions and would like to know that the packages have been validated. To facilitate a smooth review process, it is critical to secure agreement through prior communication with the health authorities.

Based on our case, we can see that the path for end-to-end R submission has been established, and it takes limited additional effort for applicants to implement the practice of using R with open source and to appropriately disclose the R-related information in the submitted package. So that is the story of our R submission. We hope it has brought you some inspiration. And now I will hand over to Ning for a broader view of the industry.


Reflections and the road ahead

Thank you very much, Jingyuan. To wrap up, I will take the next several minutes to share some of our reflections on the past and the present, and also a little look into the future.

So starting with the plan within Roche, looking into the future. Right now, we are really, really excited about this first R end-to-end submission to health authorities, and really happy to see that there haven't been any technical issues so far. As mentioned, we are welcoming more and more studies using R end to end and using open-source tools in clinical trial reporting. So a big focus for us is further adoption of the modernized platform and solutions. With that adoption, another big emphasis for us is trying to standardize and automate the automatable tasks by leveraging the modernized solutions and technologies. Of course, we are not innovating for innovation's sake. We really want to improve our productivity and reduce the manual, repetitive work for our data scientists.

If you're interested in learning more about our automation work and ideas, please feel free to check out the Pharmaverse packages, such as NEST for output generation and Shiny app generation, admiral for ADaM data generation, and also the CDISC open-source effort around the oak package for SDTM mapping. There are a lot of automation features embedded in those solutions. And also, like everybody, we are pretty excited about the recent advancement of AI, especially large language models, and we are actively exploring how to use AI to assist programming, especially code generation for clinical trial reporting.

I also want to spend some time sharing some reflections of my own, covering the journey over the last five years or so. I remember that the first time I got exposure to R code submission to FDA was in 2020. At that time, Roche had a post-marketing commitment to FDA, where we needed to do some analysis on a large volume of clinical trial data. For the majority of the analysis, we had been using SAS. But there was one particular set of analyses where we needed a pretty advanced methodology that was hard to implement using SAS, and there was already existing R code from academia that we could directly borrow. So we decided to use the R code to fulfill this post-marketing requirement, and then to submit the R code to FDA. At that time, since we didn't have much experience with R-based submission, there were so many questions internally. For example, our biggest question was how we could make sure the FDA reviewers would be able to reproduce the results, and how to ensure that they could work in a computing environment similar to ours.

For example, two very simple questions were: how do we share the open-source packages we use so that FDA reviewers can use the same packages? And also, for the advanced methodology, we had built an internal proprietary R package to wrap up the method, and we wanted to submit that package to FDA. As you may know, every submission to FDA needs to go through the electronic gateway, so the question was how to submit the package through the gateway.

Eventually we made the submission, but with a lot of question marks, and we were not quite sure whether we had followed best practice. I remember that in 2021, our team presented at the R/Pharma conference at Harvard; at that time, it was an in-person conference. It was very interesting that we actually met many people from different companies with similar experiences. People had all started to use R for regulatory submission or clinical trial reporting, and they had realized that certain industry standards and current practices were not very friendly to the use of R or other open-source languages.

So in 2022, a number of cross-industry collaborations were established by passionate people from different companies, with very strong engagement from FDA. I'm calling out two efforts here because those are the ones I'm most familiar with. One is the R Consortium R Submissions Working Group, a collaboration with FDA to showcase publicly available open pilots, to show how to submit using R and what some good practices are, and also to identify gaps for R-based or open-source-language-based submissions to FDA. The other one is called Pharmaverse. As you heard from the earlier presentations, Pharmaverse is an open-source collaboration effort to develop high-quality technical solutions to bridge any gaps in the clinical trial reporting space.

In 2022, it was really exciting that we finished pilot one from the R Consortium working group. We were able to showcase how to use R for an R-based submission and engage with FDA reviewers to identify gaps. Those gaps were then fed back to Pharmaverse for further development of technical solutions to bridge them. Following pilot one, there were a number of other pilots in the past two years. In pilot two, we were able to collaborate with FDA on a Shiny submission. In pilot three, we submitted R code for ADaM generation. And right now we are doing pilot four, a very exciting pilot in which we will explore container and webR technology.

I want to say that all those cross-industry efforts shared so many learnings with the Roche internal team, and I truly believe those valuable insights really streamlined our experience in our first R end-to-end new drug application submission earlier this year. For example, on those iterative learnings: just looking back at the two very simple questions we had during the PMC days, I recall that when we did the first PMC submission within Roche, for open-source packages we simply provided CRAN links to the FDA reviewers. However, we all know this is not optimal, because CRAN may not point to the package version that we used. So during R Consortium submission pilot number one, we had a more in-depth discussion and decided to use renv snapshots, which can point to a specific version of an R package. Also at that time, we started to provide a more detailed dependency graph and descriptions in the reviewer's guide.
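For context, an renv snapshot pins exact package versions in a lockfile, so a reviewer can rebuild the same environment rather than installing whatever is current on CRAN. A minimal sketch of the workflow:

```r
# In the analysis project:
renv::init()      # create a project-local library
renv::snapshot()  # record exact package versions in renv.lock

# On the reviewer's side, with the project (and renv.lock) in hand:
renv::restore()   # reinstall the recorded versions
```

The lockfile travels with the submission materials, which is what turns "here is a CRAN link" into "here is the exact environment we used."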

In 2023, we realized that renv had become pretty mature, so we decided to explore the usage of renv in pilot two, and that was really successful and largely simplified the submission process. Right now we're exploring a container and webR solution, which is really exciting, and I truly believe that will further increase productivity and simplify the submission process. In terms of proprietary R packages, as you may recall, any submission to FDA has to go through an electronic gateway, and when we did the post-marketing commitment submission, we learned that for security reasons the electronic gateway only allows plain file formats. So for the PMC submission, we needed to manually convert all the package content into plain files to do the submission. Then, during the R/Pharma conference, we discussed this with a number of people from different companies, and our collaborators from Merck came up with the very smart idea of building an R package to automate the manual process. So under Pharmaverse, an R package called pkglite was developed, which automated the manual process of file format conversion and largely simplified the process of submitting proprietary R packages. This pkglite solution has also been used in Roche's first R end-to-end submission that Jingyuan mentioned.
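For illustration, pkglite's workflow packs a source package into one plain-text file that can pass the gateway, and unpacks it on the other side. The package path and file names here are hypothetical:

```r
library(pkglite)

# Pack a proprietary package's source into a single plain-text file
# (path and output name are illustrative, not from the actual submission)
"path/to/proprietarypkg" |>
  collate(file_ectd()) |>
  pack(output = "r0pkgs.txt")

# On the reviewer's side: restore the package source from the text file
unpack("r0pkgs.txt", output = "reviewer_dir")
```

The collate step selects which files to include (file_ectd() is the profile aimed at eCTD submissions), so binary artifacts that the gateway rejects never enter the packed file.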

And very excitingly, just several weeks ago, our R Consortium submission working group pilot 3 team worked closely with FDA colleagues to test the feasibility of submitting a zip file for proprietary R packages, and it was very successful. This further streamlines the submission process for proprietary R packages in the future.

So looking into the future for these cross-industry efforts, from my side I just always feel very grateful and proud to be part of this very active community, where everybody is really passionate, really committed, and really supportive of each other. I feel very lucky to be part of this journey, where a lot of us believe that together, by adopting the open-source spirit, ecosystem, and tools, and by contributing back through open source and open collaboration, we are really accelerating bench-to-bedside treatment development as data scientists. We also truly believe that by automating the automatable tasks, reducing repetitive work, and reducing manual work, we will be able to unleash our clinical data scientists to focus on more impactful work that really brings benefit to patients.

So lastly, I just want to say thank you to everybody who has been part of this cross-industry community and has been contributing to it; I'm really grateful to have had the opportunity to work with you directly or indirectly. And if you are not part of the community yet and haven't been contributing to these cross-industry efforts, it's never too late. I'm sharing several links here; please feel free to take a look at the working groups, identify the ones you may be interested in, and just reach out. I'm sure everybody in the community will welcome your contribution. With that, I want to thank you for your attention, and we are happy to take any questions.

Q&A

Awesome. Thank you so much to the entire Roche team for detailing their experience here. I know so many people are very excited to hear about this journey that you've gone down. So we'll tackle a few of these here, and we're going to publish a blog post after the webinar today, so if we don't answer some of these questions, stay tuned and we'll get some information out to the community afterwards. So for the Roche team, to start things off, here's the most popular question: have the FDA or EMA described any barriers to entry for end-to-end submission pipelines? For example, for Fast Track, or for extensions to PDUFA or EMA decision dates.

Yeah, maybe I can start to chime in here. I won't say we heard about any barriers. The clarification request we sent to FDA was the only time we discussed the programs, and actually FDA quickly reached agreement with us: we provide the validation report and the justification, and then it should be fine. And after the submission, we didn't hear anything even resembling a barrier, and there was no follow-up communication regarding the programs or the language that had been used. For the EMA, because we take part in the EU raw data pilot, perhaps by the nature of this new pilot, we hope the data will help the EMA reviewing team evaluate our packages.

Maybe also just to reiterate what Jingyuan mentioned: from FDA, there was actually a specific clarification statement published probably 10 years ago saying FDA does not require any specific software language. And we also learned that FDA is experiencing a similar talent shift to ours. As you can imagine, FDA reviewers who graduated during the past 10 years or so probably already know R very well. So I don't see a real barrier there. But also, as Jingyuan mentioned, communication is the key. It's possible that your program gets paired with a reviewer who prefers one of the languages. But from the FDA perspective, there is no requirement. And as Jingyuan mentioned, if we communicate early enough and work through the technical details, from our experience, as long as we front-load them with the information so they can prepare, there is no disadvantage of one programming language over the other.

Did the regulatory body also require QC done in SAS for the submission created in R? Yeah, from our communication with the health authorities when we planned to submit, we didn't hear anything about requirements for QC. I think they simply expect the numbers and analyses we include in our CSR or submitted documents to be accurate. So we didn't get any question saying QC should be done in a certain way. Yeah, and again, citing the same clarification statement from FDA, they are not asking for any specific programming language, so I don't expect they will ask for QC in a specific programming language. This was just more our comfort level: we wanted to use SAS QC, as opposed to it being a regulatory requirement.

So a lot of questions coming in about package management. One of the popular ones: how did you manage R package version control in-house? Yeah, I want to say the three of us are not the experts, but I will try to answer. Internally, similar to many other companies, we have release cycles. We use containers, and each release of a container version comes with a particular version of R itself and of the R packages, et cetera. If people are interested in learning more, there was actually an R Consortium blog post published earlier this year sharing some of the technical details from different pharma companies in terms of release cycles, et cetera, which probably provides more information.
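The container-per-release pattern described above can be sketched roughly like this (a hypothetical example; the base image tag and lockfile path are illustrative assumptions, not Roche's actual setup):

```dockerfile
# Hypothetical container for one internal release cycle:
# a pinned R version plus a pinned package library.
FROM rocker/r-ver:4.3.1

# Restore the exact package versions recorded for this release
COPY renv.lock /opt/release/renv.lock
RUN R -e "install.packages('renv'); renv::restore(lockfile = '/opt/release/renv.lock')"
```

Because the image tag fixes both R and the package library, every study run against that container version is reproducible later.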

Did you have to create new processes for the R submission, or were you able to keep the same processes that you had with SAS? Yeah, it is our first time doing an R submission, but generally I would say the processes stay the same. We didn't create anything specific for the R submission, but as we shared in our presentation, we just picked a way we think is proper and clear to disclose the R information in our package. Yeah, totally agree. I feel Jingyuan covered it very well: for example, for the programming TOC, we added more information because R is still new-ish. Arguably, when we do a SAS submission we should also show the reviewer how to install SAS, but we know they have experience with that, so we skipped it in previous submissions. So I will say the process is largely the same, but we try to be more specific just in case this is new-ish to some of the reviewers.

So, a lot of questions coming in about the tools that you used. It's probably important to mention that last year at R/Pharma, James Black gave the keynote on the statistical computing environment, and in last year's webinar we went pretty deep into that, but maybe you can share a little more about OCEAN and how you approached the statistical computing environment? Yeah. So, OCEAN is basically AWS-based infrastructure, and on OCEAN we try to enable both SAS and R, and presumably Python if there are more Python users, et cetera. As mentioned, for computing environment version control we have been using containers: different containers for SAS, for R, for different package versions, et cetera. For code version control we use GitLab; internally we have a GitLab instance, and for traceability we use Git. We also use Snakemake for orchestration.
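To give a flavor of what Snakemake orchestration looks like for clinical reporting (a hypothetical Snakefile; the dataset paths and script names are illustrative, not Roche's actual pipeline), each rule declares its inputs, outputs, and the R script that produces them, and Snakemake re-runs only the steps whose inputs changed:

```python
# Hypothetical Snakefile: SDTM -> ADaM -> TLG, one R script per step
rule adsl:
    input: "sdtm/dm.parquet"
    output: "adam/adsl.parquet"
    script: "programs/adsl.R"            # builds the subject-level dataset

rule t_demographics:
    input: "adam/adsl.parquet"
    output: "tlg/t_demographics.rtf"
    script: "programs/t_demographics.R"  # renders the demographics table
```

The dependency graph between rules is what gives the traceability: regenerating one ADaM dataset automatically flags every downstream output that needs to be refreshed.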

When you say end-to-end submissions, is it in the context of eCTD regulatory submissions? Yes, yes. It is eCTD, generally for the electronic data submission.

Awesome. Another question: did Roche develop an internal validation process for the libraries (Admiral Roche, for example), and can you share more details about this? Yeah. So, as Jingyuan mentioned, we do have an internal pipeline, which is largely similar to the R Validation Hub white paper. Basically, proprietary R packages need to go through that internal validation pipeline.

Was the Shiny app itself submitted to the FDA, accessible to the FDA, or just an internal tool? No. So, this was just an internal tool that was informally used by the project team; the app itself was not submitted. But we had the key analyses in the study report, which reflected the same information as the app. Yeah, for the Roche submission, we haven't submitted the Shiny app to FDA. However, if you follow the R Consortium submissions working group, pilot 2 actually tested the feasibility of submitting a Shiny app to FDA. So, if a sponsor prefers to submit a Shiny app as a supplementary utility tool, it's feasible to submit, yeah.

I'm not sure. At least in our journey, we didn't use R to generate our define. As we said, we didn't do anything that deviates very much from the traditional way of preparing our submission packages; we just chose a proper way to disclose the R information in our package. Yeah. I think this is the first time I also saw Mint as part of the Oak workflow. What is the Mint package? So, Mint is a web application where we can define the metadata for the mapping instructions. That was used by the SDTM team. And maybe, you know, that's also part of our automation effort. So, for Mint, I believe we simplify the mapping for standard data sources, right? Maybe you can share a little more. Yeah, yeah. So, we obviously have a metadata repository, so quite a lot of the standard mappings are available already. But it's looking at each study and seeing what's non-standard; that's where the effort lies, where you need to go into the application and add in the non-standard mappings. The majority of the SDTMs are pretty standard, and most of those mappings follow Roche data standards. So the effort there is really focused on the non-standard pieces.

Awesome. A lot of questions about the internal Admiral package. Someone's asking, how was the proprietary R package actually shared with the health authorities? Yeah, in our case, when we prepared our package, we actually used the pkglite package. As Ning mentioned, it's a smart package that helps us translate the content of the package into a text file. So we just included the text file together with our readable code in our submission package. And in the programming TOC appendix, we include very detailed instructions on how to make use of the text file from the end user's point of view and install it in the user's environment. And for future submissions, please check out pilot number 3's repo. Right now, I believe you can use a zip file for your proprietary R package.
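The pack/unpack round trip described here looks roughly like this (a minimal sketch using the pkglite API; the package name and file paths are hypothetical):

```r
library(pkglite)

# Sponsor side: flatten a proprietary package into a plain text file
# that can pass through the eCTD electronic gateway.
"path/to/proprietarypkg/" |>
  collate(file_ectd()) |>
  pack(output = "proprietarypkg.txt")

# Reviewer side: restore the package source from the text file
# and install it into the local environment.
unpack("proprietarypkg.txt", output = "path/to/review/", install = TRUE)
```

`file_ectd()` selects the file types suitable for an eCTD submission, which is why the packed text file stays within the gateway's plain-format restriction.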

Got a couple more questions; two minutes, we'll see how far we get. How are packages like admiral or oak useful for non-CDISC standard data? Yeah, I'm not very sure about that. I think both oak and admiral appeared fairly recently, and the recent submissions to health authorities should be all CDISC. So, yeah, I'm probably not so clear on that. All right. Or we can get back to you. Yeah, we can do some research and get back to this question.

For sure, for sure. So, I'm going to ask my own question; I just thought it was very interesting. I loved the slide that said you're at 50% in training, adoption, and supporting the transition, and that the goal is 90%. What are the plans to help you get to that 90%? I think we are definitely heading there. We have a pretty comprehensive plan: month by month, we plan out which study teams are to be onboarded and what support they need. And for each team, we assign an onboarding coach, most of whom are early adopters or users who are already on the new platform. And one thing I really appreciate is that I see really, really strong passion from the user community. I know that whenever we move to a new system or new tools, the initial phase