Posit Pharma Meetup: R for Clinical Study Reports & Submission | Yilong Zhang
Transcript
This transcript was generated automatically and may contain errors.
Everybody, thank you so much for joining us today. Welcome to the RStudio Enterprise Community Meetup. I'm Rachel Dempsey. As you can see, we're doing things a little bit differently from normal meetups today and are streaming out to LinkedIn, Facebook, YouTube, and Twitter. A huge thank you to Tom Mock, who's hanging out and helping behind the scenes as well. I'm also joined by my colleague today, Phil Bowsher.
Hey, everybody, thanks for coming. I'm Phil Bowsher, the Director of Life Sciences and Healthcare here at RStudio.
If you've just joined now, feel free to introduce yourself through the chat window of wherever you're watching from and say hello, maybe where you're calling in from. For anybody who's joining this group for the first time, this is a friendly and open meetup environment for teams to share the work they're doing within their organizations, share lessons learned, and network with each other, so that we can all learn from one another. Thank you all for always making this such a welcoming community. We really want to create spaces where everybody can participate and we can hear from everyone, no matter your level of experience or the industry you work in. While you can ask questions on whichever platform you're watching from, you can also ask questions anonymously through the Slido that we have.
We are so excited to have so many members of the pharma community here today as well. So I'd love to use this opportunity to share some information about a few other pharma communities and hand it over to Phil Bowsher from our team, who is the best resource for all that.
Awesome. Thank you, Rachel. Very excited to be here today. This is the sixth installment of our Life Sciences series. Previously, we've highlighted webinars by GSK, by Novartis, Roche, and Genentech, by Johnson & Johnson, and even the EPA. You can find these webinars on our Life Sciences champion page. Over the last seven years at RStudio, my team and I have helped many, many organizations use R for submissions, especially in the PK/PD space. And I would say in the last two to three years, there's been a lot of interest in using R for late-stage clinical trials. Many of the organizations we work with have different approaches to validation, to qualification, to risk, and to reporting.
At the R in Pharma gathering that happens every year, you can see many talks from past conferences that highlight different approaches to submissions and how organizations are handling some of these things. I'll also point out that the R in Pharma conference is coming up: it's a free virtual conference that will be in November, and we hope to highlight how both small and large organizations are handling clinical submissions using R. Today we're featuring one way that an organization has approached this, and that organization is Merck. This is really going to highlight the awesome work from Nan Xiao, Yilong, our speaker, Keaven Anderson, Sarah Wang, Simya Yi, and countless other people at Merck who have helped to enable R reporting as well as documenting it for the community.
One example of this, which is very exciting, is the use of the R package gsDesign, which Keaven Anderson leads, for the Moderna vaccine. At R in Pharma last year, the talk we're featuring for you today was one of the most highly attended workshops we had. It's so exciting that we get to bring it to the community today, reach a larger audience, and highlight this fantastic work from Merck. So it is with great pleasure that I'll pass it over now to Yilong to talk about the work they're doing.
And I'll just also say, as mentioned earlier, we will have time for a Q&A as well, so you can put your questions into wherever you're watching from, or use the Slido link. I'm so excited to be joined by Yilong Zhang, now on the health technology research team at Meta, to talk about his experience at Merck, where he focused on developing clinical workflows and clinical reporting. With all of that, thank you all again for joining us. It's amazing to see such a great group of people here. I would love to turn it over to you, Yilong.
Introduction and background
Yeah, thank you, Rachel, for the setup, and thank you, Phil, for the great introduction. Today, I'd like to share our experience of using R for clinical study reports and submission. This is work we did at Merck with a group of talented members, within and across pharma, to share our recommendations and experience for late-stage clinical trials submitting work to the FDA or other regulatory agencies, that is, the so-called eCTD package submitted through their portal. Before I start, just a general disclaimer: all opinions expressed here are my own and don't represent any organization.
So before we start, I think it's worth reinforcing the clarification from FDA. From FDA's statistical software clarifying statement, it's clear that FDA does not require the use of any specific software for the statistical analysis of clinical trials. But there is definitely a requirement to fulfill 21 CFR Part 11, to complete the qualification of the software, and to conduct the analysis in a GxP manner. So here, we are sharing how we are thinking of executing against this clarifying statement to meet the requirements from regulatory agencies: to complete the analysis and to generate a report that is ready for submission.
Motivation for using R in clinical trials
So, regarding the motivation for why we use R and different R packages for regulatory development: R has a lot of nice packages that can be used for late-stage clinical trial development. For example, R has been widely used for clinical trial study design, such as group sequential design. As Phil mentioned, the gsDesign package for group sequential design has been widely used within Merck and beyond. And recently, methods for group sequential design under non-proportional hazards have also been developed in R and are publicly available.
Secondly, recently you can see a trend of using the estimand framework following the ICH E9(R1) addendum. In response, novel missing data approaches may need to be applied. For example, in a recent paper we collaborated with academic members on recurrent event data and robust handling of missing data. All those methods have been implemented as R packages and are publicly available. So when clinical trials try to apply those novel methods, instead of re-implementing them in other software, we can just apply the available R packages to the clinical trial study, in a proper manner where the package has been qualified and tested before it is used for the final results.
As you may be aware, R is also widely used for Bayesian statistics, with popular packages like Stan able to handle Bayesian hierarchical modeling. And sometimes, for network meta-analysis in the HTA space, drug reimbursement analysis can also be done in this framework. In addition, R has always had a strength in visualization. There is some dedicated work in pharma: for example, the Safety Graphics Working Group has created a nice Shiny app to summarize a set of commonly used safety graphs. As another example, we're exploring how an interactive forest plot can help simplify the DMC safety monitoring review.
If you are interested, there's a recent talk by our colleague Yujie at the R in Pharma conference highlighting how the forestly package can be used to create an interactive forest plot that aggregates the safety information for quick review and digestion by DMC members. So these are some motivations that drive us to explore a proper and compliant way of using R for regulatory submission.
ICH guidance and the R4CSR book
So for some background, the work here still tries to follow the ICH guidance. For the clinical study report, we follow ICH E3 to structure the content. And the companion book at r4csr.org contains all the detailed information, step by step, on how to create example tables using the tools we provide. We also try to fill the gap to streamline the workflow: in the same book, we discuss the concepts for project management in the clinical trial study space and provide some recommendations. And in the end, the most important part: once we complete an analysis in R, we need a way to properly submit the results, the source code, and the data to the regulatory agency in the required format. That is the eCTD format required in most global submission situations for drug development.
For the tables, listings, and figures in this talk, the focus is still on delivering the results in RTF or Microsoft Word format, because in most pharma companies, RTF or Microsoft Word still plays a central role in combining all the tables, listings, and figures into the clinical study report, and in collaborating with clinicians, medical writing, and the regulatory group. People still use Microsoft Word as the key component to combine everything together. We also acknowledge that different organizations can have different ways of completing their clinical study reports, and the table standards can be different. So this is one recommendation for how we at Merck are handling this framework.
R consortium pilot submission
The part of this work done in the open is the R Consortium pilot submission. The R Consortium formed a nice working group that brings representatives from both the regulatory agencies and the pharma companies together. The goal was to think about how we can submit work completed in R to the FDA. With this working group, we contribute entirely in the open, so you can go to the link to see the members and their affiliations. What I'm sharing now is the outstanding work by this working group for the first pilot: we were able to prepare the analysis package in the eCTD format and submit it to the FDA for review.
Here's a short timeline. Around November last year, we initiated the submission to the FDA, and then received the FDA response in December. In the response, we received some feedback and comments, which we addressed in a revised submission to the FDA. In March this year, we received the final FDA response. All those submission and response letters are available on the R Consortium GitHub website, which is linked in the following slides.
Challenges in the first pilot submission
While we were dealing with this first pilot submission, there were a few challenges to solve. The first is that every organization develops internal packages, for example, to streamline the workflow and wrap open source packages in a way that makes it easy to create table and figure layouts following internal standards. So you always have some internal code that needs to be shared with the regulatory agency. One question is how we can submit those internally developed, proprietary packages to the FDA, because the eCTD format has restrictions on what you can submit: for example, we are not allowed to submit binary files such as a zipped R package within the eCTD package, all the filenames should be in lowercase without any special characters, and all the content should be in ASCII text format. So how do we deal with this in a simplified way? That's the first challenge we tried to resolve, and the simple solution is a package called pkglite, which I will briefly introduce in the following slides.
The second part is how we can follow the ICH and FDA guidance in preparing the eCTD package. The eCTD has multiple modules, and typically all the content here goes in module five: the datasets and the programs for the clinical study report. For the data part, we still follow the standard requirement to use the XPT format, and there are packages to create those files. But for the first pilot, we focused on the analysis part: assuming all the XPT datasets are already available, how do we submit the analysis programs to the regulatory agency? And the last part: even though FDA does not expect all the code to be executable or reproducible from the FDA review perspective, we still tried to think about how we can enhance reproducibility from the FDA reviewer's perspective. How can they rerun all the analysis code we provided in the eCTD package in an easier manner? What kind of step-by-step instructions do we need to provide in the Analysis Data Reviewer's Guide? Those are the challenges we tried to resolve in the first pilot submission, with some simple analyses.
Deliverables from the first pilot
Once we completed the first pilot, there were a few deliverables we combined into the eCTD submission package. First, we created a pseudo-proprietary R package. I call it pseudo because it is also available on GitHub, so you can check what the package looks like. We also provide the R scripts for the analysis, also in a GitHub repo. And as part of the eCTD package, we provide an Analysis Data Reviewer's Guide detailing, for example, which versions of R and R packages were used, the step-by-step instructions to rerun the code, and where to find the datasets and outputs in the eCTD package.
To reproduce the work, there are two open source GitHub repos. The first is from the development perspective: as an organization, we need a space to save all the source code and documentation. That's all saved in this development repo, representing how an organization can organize all that information together. The current recommendation for this part is to use the R package structure to organize the information as needed; those details are also discussed in the r4csr book. The second part: once an organization completes development, we need to reorganize the necessary pieces into module five of the eCTD submission package specifically. This is the second repo, called the eCTD package repo, which contains only the eCTD part that is required to be submitted through the FDA portal to the agency.
FDA response to the pilot submission
After we submitted the initial eCTD package to the FDA, here are some highlights of the response. First, the FDA reviewer confirmed that by using R version 4.1.1, FDA was able to run the submitted code and confirm that the applicant's tables and the submitted figures in the report, basically the PDF file combining the tables, are reproducible. The FDA reviewer also developed their own code to cross-check the results: the analyst was able to independently generate all those tables using the submitted data. There were also some minor comments identified by the FDA reviewer, which we addressed in the revised submission.
In the revised submission, we addressed the comments identified in the first submission. Once we received the response, FDA agreed that the initial phase of this first pilot submission had been completed.
Future work and second pilot
Once we completed the first pilot, the working group members started to brainstorm: what more can we provide to the community, and what will be our future work for the second or potentially third pilot submission? A few things were identified, and team members are working on those efforts. For example, we want to see if it's possible to submit a Shiny app within the eCTD package to the FDA, so the FDA reviewer can rerun the Shiny app on their local machine and explore and digest the information more efficiently, in an interactive way.
Second, the first pilot only targeted the FDA, but as an organization, we typically think about global submission for drug development. So we are also trying to identify points of contact at different regulatory agencies, for example in the EU, China, and Japan, to see if we can start a first pilot with those agencies and hear their feedback, to continue improving the workflow and prepare for global regulatory submission using R. In addition, the analyses in the first submission were quite simple: a baseline characteristics table, some AE summary tables, and a simple ANCOVA analysis for efficacy. More advanced analyses could require complicated R packages with certain system dependencies. So we'd also like to explore the possibility of a submission with some advanced analyses, for example for study design, for missing data, and for Bayesian methods, as discussed in the motivation of this work.
Reproducibility framework
I would like to go a little deeper into how we are thinking about reproducibility, and, based on the guidance from FDA, some recommendations we can take from the pilot submission. This diagram is borrowed from Roger Peng's famous Science paper in 2011 discussing reproducibility in general. The diagram shows the spectrum from full replication, which means fully replicating both the data collection and the analysis, which is unlikely to happen in the clinical trial space given how expensive a clinical trial can be. Most likely, our reproducibility refers to the code part: with a given dataset generated from a clinical trial, how can we reproduce the analysis from that dataset?
So that's the code side, and within the code side, there are also multiple levels of reproducibility. First, if we have internally developed R packages, we need to version them and reproduce results with a specific version. Second, the open-source R version and R package versions also need to be managed to ensure reproducibility. In addition, you may sometimes see discrepancies in results between operating systems; this is also true for many other software. So the assumption is that within one operating system, we can reproduce the results. I think that's also the concept behind why we need a container for many analyses to ensure reproducibility.
And one more, even more challenging situation: in very rare cases, you may run into trouble with different hardware, even when running the same operating system and the same R and R package versions. I would say this is really unlikely to happen, but I just want to highlight that it also needs to be considered if we are doing something seriously. For now, our reproducibility focus is only on the first two parts, the R version and the R package versions, within a predefined operating system and hardware.
FDA requirements for reproducibility
With this definition in place, we started to review the requirements from FDA, that is, how FDA thinks about reproducibility of the code. From the FDA Study Data Technical Conformance Guide, quoting one paragraph: from the sponsor's perspective, we need to provide the software programs and the ADaM datasets used to generate the tables and figures associated with the primary and secondary efficacy analyses. In other words, code for the other analyses is typically considered optional. The focus from FDA's perspective is being able to run the source code for the primary and secondary efficacy analyses. Second, the specific software utilized should be specified in the Analysis Data Reviewer's Guide: which R version and which R package versions need to be documented properly to ensure reproducibility.
FDA also highlights that the goal is to understand the process by which the variables for the respective analyses were created and to confirm the analysis algorithms. Our understanding is that FDA did not necessarily expect the code to be directly executable on their end; the goal of having the software programs is to understand which variables in the ADaM dataset were used for the analysis, and which exact analysis algorithm was applied: for example, for an ANCOVA model, what are your model terms and what is your missing data strategy, as handled in the code, because that's exactly what we do in the analysis. FDA also expects the software programs to be in ASCII text format. In other words, non-ASCII characters shouldn't be put in the source code.
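As a hedged illustration of why the model specification matters to a reviewer, a primary efficacy ANCOVA on an ADaM-style dataset might look like the sketch below. The dataset name `adeff` and the variable names are assumptions following common ADaM conventions, not the pilot's actual code.

```r
# Illustrative sketch only: variable names follow common ADaM conventions
# (CHG = change from baseline, BASE = baseline value, TRTP = planned treatment);
# the pilot's actual dataset and code differ.
fit <- lm(CHG ~ BASE + TRTP, data = adeff)

# The model terms above are exactly what a reviewer wants to confirm from the
# source code: which covariates enter the model and how the treatment effect
# is estimated.
summary(fit)
```

The point is that even non-executable code documents the analysis algorithm unambiguously, which is what the conformance guide asks for.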
Recommendations for implementation
To achieve that, here are some recommendations from our end. First, when doing the final analysis for the clinical study report, we need to fix the R version, for example R 4.1.1. We also need to fix a snapshot date, for example August 31st, 2021. The purpose of the snapshot date is to fix the R package versions used for the analysis. This is achievable with the RStudio Public Package Manager, or with the CRAN Time Machine created by MRAN, or, if your organization has a commercial version of RStudio Package Manager, you can use its time machine to freeze both internally and externally developed R packages. This is how a snapshot date can pre-specify the R package versions.
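As a sketch of how a snapshot date pins package versions in practice, one can point the repository option at a dated Package Manager snapshot. The URL pattern below follows the RStudio (now Posit) Public Package Manager convention, and the date is the one mentioned in the talk; adjust the host and date for your own setup.

```r
# Freeze CRAN packages as of the snapshot date via a dated repository URL.
# URL pattern follows the RStudio (Posit) Public Package Manager convention.
options(repos = c(CRAN = "https://packagemanager.rstudio.com/cran/2021-08-31"))

# Any package installed after this point comes from the frozen snapshot,
# so rerunning the installation later reproduces the same versions.
install.packages("r2rtf")
```

Putting this in a project-level `.Rprofile` makes the snapshot apply automatically whenever the project is opened.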
In the final analysis, you still need to specify each version of the R packages used. We also need to consider the flexibility of input and output paths, because when your code is sent to the FDA reviewer's side, the paths to the source code and the datasets will be different. So some guidance on updating the input and output paths should be provided in the Analysis Data Reviewer's Guide, and all those paths should be clarified in the ADRG. Based on our understanding, the FDA reviewer typically runs the analysis on a Windows machine, while some organizations may complete the analysis on a different operating system, for example a Linux server. In that case, we want to take the eCTD package and try rerunning it on a Windows machine as well, to ensure it is fully reproducible on the sponsor's end before the submission.
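One hedged way to keep input and output paths flexible, so a reviewer only has to edit one place, is to collect them in a single object at the top of each script. The object and folder names below are illustrative, not the pilot's actual convention.

```r
# Illustrative: centralize all paths so a reviewer only edits this one block
# after unpacking the eCTD package on their own machine.
path <- list(
  adam   = "path/to/esub/analysis/adam/datasets", # ADaM .xpt datasets
  output = "path/to/esub/analysis/output"         # where tables/figures go
)

# Downstream code references the list instead of hard-coded absolute paths.
# haven::read_xpt() reads the SAS transport (XPT) format required for data.
adsl <- haven::read_xpt(file.path(path$adam, "adsl.xpt"))
```

The ADRG can then say, in one sentence, which lines to edit before rerunning the programs.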
Tools developed: R2RTF, pkglite, and portable environments
In general, we share the same philosophy that Hadley recommends for R packages. To quote: anything that can be automated should be automated. We want to do as little as possible by hand and as much as possible with functions. That's why we create R packages to simplify the workflow, with the goal that developers can focus on what needs to be delivered instead of worrying about what the R package looks like. We bundle these tools together and make them available in the open.
Here are a few tools we developed to simplify that. The first part is how to generate production-ready tables and figures in RTF format; for that, we created a package called r2rtf. The second part, as mentioned, is how to get the internal R packages into the eCTD package; for that, we created a package called pkglite, which represents an R package in pure TXT format and can restore and install it back on the reviewer's machine. And the last piece is how to create a portable R environment to help the reviewer, or to do an internal dry run, to complete the analysis. On your own machine, you may have different R versions and different R packages in your user space. What if the submission package uses different R and R package versions? You want a self-contained folder to reproduce that, and once you're done, you can just delete it without any impact on your system.
All the details are available in the r4csr book, so if you are interested, you can go there for the specifics. For now, I'll just give some highlights. The first is r2rtf. It contains several functions that can be piped together to generate RTF output. It assumes the dataset is already available: for example, you can prepare it with the tidyverse or other packages, and once the dataset is ready, r2rtf formats it into RTF for production. Other tools are available to do similar work: for example, Roche created the R package rtables, and Atorus has the pharmaRTF package, which also creates tables in RTF format. There are also general tools like gt or xtable.
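A minimal r2rtf pipeline, adapted from the package's introductory example, looks like this; the toy data frame stands in for a pre-computed analysis dataset.

```r
library(r2rtf)

# Assume the summary statistics are already computed (here, a toy data frame);
# r2rtf only handles the formatting into RTF for Microsoft Word.
head(iris) |>
  rtf_body() |>        # define the table body layout
  rtf_encode() |>      # encode the table as RTF syntax
  write_rtf("tmp.rtf") # write the RTF file, ready to open in Word
```

Additional functions such as `rtf_title()` and `rtf_colheader()` can be inserted into the same pipe to add titles and column headers following an organization's table standards.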
The second part is combining the internal R packages into one TXT file that can be placed into module five of the eCTD package. pkglite is a tool to pack R packages into plain text. It also provides a grammar to specify which parts of your internal package need to be packed: for example, you don't have to pack your testing code or every file in the package, because the goal is for the source code in the R folder to be executable on the reviewer's side. It also creates a standard TXT file that can simply be opened and reviewed, so you can still take a look at what has been assembled into the final submission package.
This is just example code. Once you provide the path to the package, there is a convenient function to indicate what we typically consider saving into the eCTD package. Sometimes you want to specify more, for example if you have additional material in the inst folder. You can also pack multiple R packages into one file, so two internally developed packages can be saved into one TXT file. Once the TXT file is available, people can simply unpack it on their local machine, and you can also specify install = TRUE to automatically install the packages into your system.
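The packing and unpacking steps described above roughly follow this pattern, adapted from the pkglite documentation; the package paths and output filename are placeholders.

```r
library(pkglite)

# Pack two internal packages into one plain-text file for the eCTD package.
# file_ectd() selects the files typically needed for submission;
# file_auto("inst") additionally packs everything under inst/.
pack(
  "/path/to/pkg1" |> collate(file_ectd()),
  "/path/to/pkg2" |> collate(file_ectd(), file_auto("inst")),
  output = "r0pkgs.txt"
)

# On the reviewer's side: unpack the file and optionally install the packages.
unpack("r0pkgs.txt", output = ".", install = TRUE)
```

Because the output is plain ASCII text with a lowercase filename, it satisfies the eCTD restrictions described earlier.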
The last piece is the portable R environment. To rerun the analysis, you probably need an .Rproj file to specify the RStudio configuration, plus a .Rprofile or .Renviron, to recreate the running environment. That is enabled in cleanslate. In addition, we can install a specific R version, for example R 4.1.1, into a project folder, and we can specify Rtools, because some R packages need to be compiled. All of this can be covered in a cleanslate pipeline, where you specify what you need in the portable environment. Once set up, it can be used for a dry run, or by the FDA reviewer to review your content, simplifying multiple steps in the ADRG.
Summary and project management recommendations
Okay, as a summary, let me frame this as a user story. As a statistician, we can use the tidyverse, r2rtf, or other internal tools to define the mock tables, listings, and figures for the statistical analysis of a clinical trial, because you typically have more than 100 tables and figures for efficacy and safety in drug or vaccine development, and you need a way to organize them efficiently. As a programmer, we also use the tidyverse, r2rtf, or internal tools to develop and validate the analysis results based on those mock tables. As a statistical programmer, we use pkglite and internal tools to prepare the internally developed packages so they can be saved into the eCTD submission package. And in the end, as an internal or external reviewer, we use cleanslate to reconstruct a portable environment to reproduce the analysis results.
We also give detailed recommendations in the r4csr book. For the folder structure part, where you have more than 100 deliverables, our suggestion is to use the R package folder structure to ensure consistency, reproducibility, automation, and compliance, because there are already a lot of tools around R packages that let you run analyses and do compliance checks. So that's one way the R package structure can potentially be used to organize everything together.
Also on project management: we recommend that people plan ahead, set up for success, and work as a team, because you always have a cross-functional team working together on a trial. It's also worth designing a clean code structure, with proper documentation and clear relationships between all the functions, so other people can understand the code as it moves from project to analysis. It's also necessary to set capability boundaries: that way, people can cross boundaries to help, while the responsibility of each role stays clear. And importantly, we recommend contributing to the community. When you have a tool that doesn't depend on your internal business logic, it can be a standalone R package shared in the open. That way, we can continue to grow and find consensus to improve the workflow across pharma.
When it comes to the software development life cycle, it should also follow several steps, such as planning, development, validation, and operation. At each step, internally developed R packages need to be properly handled, and external R packages need to be qualified. For that, you can refer to the R Validation Hub white paper on how an organization can achieve this.
Cross-industry collaboration and future directions
There are many cross-industry collaborations; the work here could not have been achieved without the outstanding collaboration of people from across pharma. For example, if you are interested in understanding how to qualify external R packages, I would suggest going to the R Validation Hub website and potentially joining the working group. There are several previous presentations there covering the workflow and available tools. In this talk, we discussed the R-based submission pilot, and there is more future work; you are more than welcome to check the website, join, and contribute. Also, the R in Pharma conference is just wonderful; it helps us a lot to come together to share ideas and information and to collaborate.
For the future direction of the current work, we are still trying to enhance compliance and traceability through automation. We also want to consider how to assemble the metadata around the analysis, as you may have heard in efforts like Safety 360. We have provided some prototypes: there are packages called metalite and metalite.ae on the Merck GitHub website, where you can see how we think metadata should flow from planning to reporting in a trial. We also want to enable some advanced design and statistical methods, and introduce interactive visualization in reporting for the DMC, which could also be delivered through web applications and servers.
That basically concludes my presentation today. Thank you for listening.
Q&A
Thank you so much, Yilong. You can't hear us clapping, but we're all clapping. Let me pull Phil back on stage too. Thank you so much for a great presentation. I can see there are a lot of questions coming in already, so we'll do our best to aggregate them from everywhere. But I can see one question that has been pinned in the chat from Manoj, and apologies if I'm pronouncing your name incorrectly. It is: would R replace SAS completely, or its usage to some extent?
Yeah, I don't think R will replace SAS completely. It's more like we add an additional tool to the arsenal of clinical trial development. Within an organization, we can definitely have different tools; even Python or Julia could potentially be used in clinical trial development. But we need to consider what a compliant and proper process for development would be.
Thank you. And I just want to remind people that if you want to ask questions anonymously, you can use the Slido link that I just put up on the screen. But I see one question that's been upvoted on Slido from Kevin: if you're not using containers, is renv playing a role in documenting dependencies?
Yeah, of course. An R package itself already has a well-defined dependency structure in its DESCRIPTION file. You can also use a package like renv to log the package versions. And that's also why pkglite is helpful: it assembles all the packages within a single txt file, and you can unpack them, so your R package versions are fixed on the reviewer's side. So I would say a container is not necessary, but it would be a good addition. One challenge from the FDA perspective is how we can enable a container environment on the FDA side.
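To illustrate the workflow Yilong describes, here is a minimal sketch of bundling package sources into a single text file with pkglite, assuming the published pkglite API; the package path and file names are hypothetical.

```r
library(pkglite)

# Select the files to ship, following the eCTD file specification,
# then pack them into one plain-text file for the submission package.
spec <- collate("path/to/mypkg", file_ectd())
pack(spec, output = "r0pkglite.txt")

# On the reviewer's side, restore the original package sources:
unpack("r0pkglite.txt", output = "restored/")
```

Because everything travels as one text file inside the eCTD folder structure, the exact package versions used for the analysis are preserved for the reviewer.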
A lot of good ones coming in. But Yilong, one I thought you could help with here is a question that says: what's the difference between the eCTD and SDTM formats? Is one preferred, and does your package only support one?
Okay, so these are two different concepts. SDTM is a data standard: it defines how the source data should be organized when you collect it from the CRF. And eCTD is more like a folder structure to organize a clinical submission: all the files from the clinical implementation and operation, your clinical study report, your datasets. The SDTM datasets are just one component within the eCTD package.
Okay, another upvoted question that came in anonymously that I want to be sure to ask: I'm a grad student with limited R experience, interested in trials. The decision to make R my main software is a big one. What are some of the reasons I may want to make that switch?
I'm trying to understand. So definitely people have their own comfort zone; they have their own set of preferences and their favorite tools for development. But an organization has its own process and standards, and a group of people shares the same consensus on how to work. When people join an organization, I would suggest first following its existing process and consensus, and also seeing whether that matches your own interests and your comfort zone. So I wouldn't say people need to switch to R or must use R. It's more like a multilingual space: it's better to learn multiple languages and borrow from the strengths of each.
Fantastic. Awesome. I think I can take the next one, Rachel. To switch up the topics a little, someone says: for deep learning, Python is a popular choice. What efforts are made to address this in R? Yilong, has your team at Merck tackled any of this?
I have very little experience on this.

Yeah, so I can attach some links to a workshop that we teach. There have been a lot of updates in this space. One is the reticulate package, which helps with the interoperability between R and Python, so you can call Python directly from R. That's a big piece of it, and a lot of the ecosystem now supports that. There's also the tensorflow package that was created for R. JJ Allaire, our founder, gave a great keynote on this at the 2018 RStudio conference, so I'll put a link into the chat with some deep learning examples. But I don't see a ton of this in clinical; I was just curious, Yilong, if it had come up for you.
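For context, here is a minimal sketch of the reticulate interoperability mentioned above; it assumes a Python installation that reticulate can discover, and the object names are illustrative.

```r
library(reticulate)  # bridge between R and Python

# Import a Python module as an R object and call its functions
# with ordinary R syntax.
pymath <- import("math")
pymath$sqrt(16)  # 4

# Run arbitrary Python code, then access its objects from R
# through the exported `py` object.
py_run_string("squares = [i ** 2 for i in range(5)]")
py$squares
```

This is the mechanism that lets R users reach Python-based deep learning libraries without leaving an R workflow.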
I'm also going to share a link to a meetup coming up on June 1st on using Python with RStudio Team, if that's of interest to anyone; I'll put it in the chat. But I see there's a question that came over from LinkedIn Live that says: thank you, Yilong, for the great talk. Have you and your team considered the use of R Markdown to integrate the report writing with the analysis and reporting?
Yeah. In the first pilot, R Markdown was used for the reporting part.
Well, open-source software doesn't come with a warranty; it's not like closed-source software where the vendor makes a formal statement. The R Foundation does maintain a PDF document clarifying how R can be used in regulated clinical trial development. And as an organization, you become a consumer of the open-source software, so the organization also needs to maintain documents clarifying how it uses open-source software following GxP guidance. Once that is clear, I think that's sufficient for the analysis part. Under the FDA's classification, FDA can then see whether you use software developed by a company or open-source software.
So I found one here that I thought was pretty cool. It says: how are the formats handled in R? I suspect that's a question about the data formats for clinical trials. I know, Yilong, you're pretty active in that space. What was it like for you tackling the data management side of these clinical workflows?
Yes. The data format is language-independent. We definitely follow the SDTM data standard, and FDA requires data to be submitted in the XPT format. That's an open format, even though it was developed by SAS. So in other words, it's the same workflow we need to achieve no matter which software language we are using.
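As a small illustration of how language-independent the XPT format is in practice, here is a sketch of writing and reading a SAS transport file from R using the haven package; the dataset and file path are made up.

```r
library(haven)

# A tiny ADSL-like dataset; variable names follow CDISC conventions.
adsl <- data.frame(
  USUBJID = c("01-001", "01-002"),
  AGE     = c(64, 71)
)

write_xpt(adsl, "adsl.xpt")    # write a SAS v5 transport file
check <- read_xpt("adsl.xpt")  # read it back into R as a tibble
```

The same adsl.xpt file can be read by SAS, R, or any other tool that understands the transport format, which is why the submission workflow is the same regardless of language.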
Thank you. So I'll go over to some of the anonymous questions. And one is, what steps did you take to validate the R packages themselves?
Yeah. I would just recommend the PharmaSUG paper we developed last year, where we provide clear guidance with step-by-step instructions. In general, it follows the recommendations of the R Packages book, and we use that structure to go through them. But we need to pre-specify the testing plan, and we also need independent testing and potentially double programming for the code. As I mentioned, you will definitely need some execution resources to qualify and develop internal R packages, and this can mimic the existing validation process for the other languages you already have.
I think I've got one here, Rachel, that's pretty good. Someone asked about how to approach macros in R, or things similar to macros in R. What has that been like for you, using R?
Yeah, that's also a question we discussed a lot when we started with R. For now, we're trying to provide an end-to-end solution, because mixing multiple languages can create additional challenges for IT and for your internal process: which part of the process should use SAS and which should use R becomes unclear. So for now, we haven't gone through the details of how to use SAS and R in a hybrid way within one particular analysis. But in a study where you have multiple deliverables, you can have some deliverables in SAS and some in R that can be executed independently.
I think one thing, too, that the question is trying to tackle is: what were the similarities between your experience putting together clinical trials in R and what others may have experienced with macros? Was it the functions that you built? The packages you designed? Did you have routines or workflows shared within the team in R that were reproducible and could be packaged up and shared?
Yeah, my experience is that different companies have different workflows for how to use SAS macros to generate the results, and it's the same in R, right? Merck's flow and Roche's flow can be different. So in r4csr, we provide one recommendation for how to do the project management, and we describe where to save each of the files within an R-package-like infrastructure. This is how we are thinking about handling the large number of tables, figures, and listings required for the analysis.
I was just going to ask a question that I'm selfishly interested in, based on some of the work that we're doing with R in pharma, which is Shiny. Did your team look at or think about using Shiny in the submissions process?
Well, for this you can probably get more answers from the Roche team. Definitely, I think the second pilot, as mentioned, is about the Shiny app part. The team is actually working on that, to explore how to use Shiny with the eCTD and potentially submit a Shiny app.
Thank you. I see Sylvia asked a great question, I think coming through from LinkedIn: thanks, Yilong. In your experience, are the external reviewers at FDA for R versus SAS the same group? Trying to think if there would be a delay caused in review, since SAS has been more popular for the past two decades.
I can't represent FDA, but my general hearing from the working group is that FDA has members familiar with both R and SAS. Also, FDA's guidance recommends that you indicate the software you use in the Type C meeting for the pre-IND submission, so that FDA can be aware of what kind of software you are planning to use.
So let me see what we have here. One question that I saw was about admiral. I don't know where it went, but I think the question was that people are interested in using admiral for prepping datasets for clinical trials. Here it is: is there any other package for clinical trial dataset creation which is useful?
Yeah, I think admiral is a wonderful project: a group of people trying to develop the ADaM dataset code in a central place. In general, for most data manipulation, I would say the tidyverse is sufficient, whether for your exploration or for developing your own code for dataset generation.
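As a small example of the kind of dataset derivation Yilong mentions, here is a sketch of an ADaM-style derivation using only the tidyverse; the variable names follow CDISC conventions, but the data and derivation rules are illustrative.

```r
library(dplyr)

# A toy demographics (DM) domain.
dm <- tibble::tibble(
  USUBJID = c("01-001", "01-002"),
  RFSTDTC = as.Date(c("2021-01-10", "2021-02-03")),  # reference start date
  BRTHDTC = as.Date(c("1958-05-01", "1949-11-20"))   # birth date
)

# Derive age and an age-group variable for a subject-level dataset.
adsl <- dm %>%
  mutate(
    AGE    = floor(as.numeric(RFSTDTC - BRTHDTC) / 365.25),
    AGEGR1 = if_else(AGE >= 65, ">=65", "<65")
  )
```

Packages like admiral standardize derivations such as these across studies, but as the answer notes, plain tidyverse code is often enough for exploration.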
Thanks, Yilong. I see someone else asked on LinkedIn about shinymeta. Is the shinymeta package a better way to build reproducibility for the FDA?
I don't have experience with that myself.

So I can just provide some color. shinymeta is a package that was created by the Shiny team at RStudio probably two or three years ago, and it helps implement reproducibility within Shiny applications. So I would say stay tuned, or watch the Shiny for submissions working group information on the R/Pharma website. There are a couple of different approaches to reproducibility via Shiny, and it will be interesting to see what becomes the standard as more Shiny apps are shipped through the submissions process. I do think it's a great way to achieve reproducibility, but it also takes a bit of learning, so it will be interesting to see whether it's adopted or other approaches are preferred.
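For readers unfamiliar with shinymeta, here is a minimal sketch of its approach, based on my understanding of the package's documented API; the app itself is a made-up example.

```r
library(shiny)
library(shinymeta)

ui <- fluidPage(
  numericInput("n", "Sample size", 100),
  plotOutput("hist"),
  verbatimTextOutput("code")
)

server <- function(input, output) {
  # metaReactive() records the code needed to reproduce the value;
  # ..() splices the current input value into that recorded code.
  x <- metaReactive({
    rnorm(..(input$n))
  })
  output$hist <- metaRender(renderPlot, {
    hist(..(x()))
  })
  # expandChain() expands the recorded steps into a standalone script,
  # which is the reproducibility artifact a reviewer could rerun.
  output$code <- renderPrint({
    expandChain(output$hist())
  })
}
```

The key idea is that the app can emit a plain R script reproducing whatever the user saw interactively, which is what makes it attractive for review settings.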
Awesome. I just want to take a second to address a few questions about where the slides are. I just put on the screen the slides for today that Yilong shared with us before the event. There were also a few questions about the recording: one of the great things about using this live-streaming tool is that the recording will be available immediately on YouTube, so I'll be sure to share that link as well.
Okay, this is a bit of a long question, so let me take a second here. Tanisha asked: all dataset and display creation in SAS and QC in R, or vice versa? Or only display creation in SAS and QC in R? Or would a total mix and match of SAS and R be fine for an FDA submission?
I think what they're getting at is: will submissions be a mix and match of SAS and R? Yilong, do you think organizations will mix the two? Will they choose to use R? Will they use R internally for the prep? What do you think, or what was this like for your team?
I just feel it's a multilingual world, right? But for a particular task, it's better to have consensus on the tool to be used. For example, if the goal is to create a survival analysis, it's better that all the trials in the organization use the same approach instead of multiple ways. Even within SAS or within R, you want to use only one R package instead of multiple packages to do the same thing.
Thank you, Yilong. There are quite a few other anonymous questions, so I want to make sure I get to those as well. And just so everyone knows, we'll be collecting all the questions and making sure they get answered, maybe in a blog post format. One of the questions is: are there any security issues with using R versus Python? This person said their hospital IT team is less familiar with R and has questions about how secure the packages are.
I think the important thing to know is that CRAN and the R Foundation do a wonderful job of running checks when packages are submitted, and that is typically a good thing to lean on. Some organizations also have scanning tools that they use internally. But understanding what it takes to get a package onto CRAN is usually enough: it's a big undertaking, and a lot of work for package developers to maintain and support. So that's a big part of it.
So I see a lot of questions about Docker and containers and using that infrastructure for some of this work. I'm interested, Yilong: was that part of the conversations? What would that look like, maybe in a hypothetical world, with the FDA or other regulatory bodies? Has your team thought about that?
Personally, I haven't tried Docker; I have very limited experience with it. It has definitely come up in many conversations in the R submission working group. I think the one challenge is how to get FDA ready to receive that information, but within your own organization, I think Docker or another container solution could be a very good way. I have seen people from different teams who have a better understanding of this, and a live stream on Docker itself would be nice.
Great, thank you. So one other question; I'll go over to Slido. There was an anonymous question: I know we've touched on reproducibility a bit, but what's the best practice for a group of programmers using R for a submission, to avoid any challenges with reproducibility?
Well, first, start using version control. Second, have your .Rprofile set up properly and in a shared space, so everyone accesses the same location. In general, those are good ways to enhance reproducibility. And if you're familiar with the renv package, try to use it to lock your package versions.
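The renv workflow mentioned here can be sketched in a few lines, assuming the standard renv API; the collaborator scenario is illustrative.

```r
# In the analysis project directory:
renv::init()      # create a project-local library and an renv.lock file

# ...install packages and develop the analysis as usual...

renv::snapshot()  # record the exact package versions in renv.lock

# A collaborator (or a reviewer) who clones the project can then
# rebuild the identical library from the lockfile:
renv::restore()
```

Committing renv.lock to version control is what ties the two recommendations together: everyone on the team, and anyone rerunning the analysis later, gets the same package versions.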
Awesome. Well, with everybody here, I do want to take a second to let you know that we have another pharma meetup coming up in August. If you want to see all future events, you can use the short link we just put on the screen: rstd.io/community-events. But we will have a pharma meetup specifically on August 23rd, led by Ashton at Bristol Myers Squibb. Ashton will be presenting on data as a product: a data science framework for data collaborations. I think that will be a great event as well; I'll share the add-to-calendar link in the chat.
So just rstd.io/pharma-meetup, and you can add that event to your calendar. Thank you so much, Yilong, for sharing your time and your experience with us. It's great to be able to learn from you and to see all these awesome questions from the community.
No, thank you. It's always been exciting to see the work that you do with your team. I always love following it, watching it. And yeah, excited to see what you still produce.
Cool. Thank you. Thanks. Yeah. See ya. Have a great rest of the day, everybody.
