Data Science Hangout | Christina Fillmore, GSK | Open source community collaboration in Pharma

Dec 19, 2022
1:02:48

Transcript

This transcript was generated automatically and may contain errors.

Happy December 1st to everybody and welcome back to the Data Science Hangout. Hope you all had a great week. Last week, we were off for Thanksgiving. And so if you're joining us for the first time today, this is an open space for the whole data science community to connect and chat about data science leadership, questions you're facing and getting to learn about what's going on in the world of data science across different industries.

We share the recordings each week to our Posit YouTube, so you can always go back and rewatch or find helpful resources. We are putting them up on our new Posit Data Science Hangout site too; it's just taking a little bit longer, but they will be there. Together at the hangout, we're all dedicated to creating a welcoming environment for everybody here. So we love when you all can participate, and we can hear from everyone, no matter your level of experience or area of work.

And I had a dream last night that I forgot to say this next part and that nobody was talking. I don't know why I had this dream. But there are always three ways to ask questions and also provide your perspective. So it doesn't just have to be a question if you have something you want to weigh in on or a certain topic. One, you can jump in by raising your hand on Zoom. Two, you can put questions into the Zoom chat, and feel free to put a little star next to it if you want me to read it out instead. And then third, we have a Slido link where you can ask questions anonymously.

One more thing: we also have a Slido, sorry, a LinkedIn group for the hangout too. So this also helps you connect with each other. We'll share that in the chat in a second here. We've learned that you do have to manually turn on the notifications for the group using the little bell at the top if you want to do that.

But with all that, thank you so much for joining us here. Happy to be joined by my co-host for today, Christina Fillmore, data science leader at GSK. And Christina, thanks for joining us here. I'd love to have you maybe start by introducing yourself and sharing a little bit about your role. Maybe something you like to do outside of work too.

Christina's introduction

Okay, so hi, everyone. My name is Christina. Despite my American accent, I do live in the UK. I'm not sure if you can see, it is dark outside. So I thought that's my fun fact. I work at GSK and I work within the biostatistics space. And one of the main things right now that you kind of see across a lot of pharma is pharma moving into the R space and starting to use R as a primary tool in order to report clinical trials. So one of my main jobs today is actually just writing a lot of R packages for our specific needs.

Pharma does a lot of things that are just very pharma, whether it comes from things like the way we format tables and create tables. Pharma does stuff that most of the time you don't do elsewhere, even just in the world of tables. That's my most recent package, so I can talk all about tables and table design, as interesting and exciting as that is to anybody. One example is that we tend to make mock tables: we make a fake version of the table we want before we have any data. And then we use that fake version to make the table we want much later down the line. But oftentimes the fake version of the table happens six months to a year or even more, depending on how long your study is, before making that table. So doing things that make it easier to work through the pharma pipeline and allow for automation while making R packages is basically my job, which is a really long-winded way to say I make R packages.

And I think that was all of your question. The last part was, what is a hobby I have outside of work? And I would say knitting. I knit this jumper. I like knitting.

Team structure and package priorities

I was curious, just to start off, about getting to make packages. Having that be a big part of your role sounds awesome, and I was just curious to learn a little bit more about the team that you're on.

Yeah, so I'm in a pretty small team. There are four of us on my team and we do the majority of the package building here at GSK, at least for our internal use and for some external ones. There are other people in our community who help with other projects like the admiral project. There are people in our department who do that, but I am not one of those people. I do kind of other pharmaverse things. Within the pharmaverse, I am currently the person who writes packages to deal with metadata. I don't know how this has become my life. Like it was an accident. It wasn't like I love metadata. But so I have three primary packages that I maintain within the pharmaverse.

metacore, which deals with metadata around datasets, metatools, which uses that metadata to do fun automation stuff, and tfrmt, which deals with metadata around formatting tables. So that's all of the metadata I hold in my world of pharma-based metadata. So that's kind of me. And then, yeah, my team is primarily me, Becca Krouse, who does safety graphics, and Ellis Hughes, who does valtools. So all of us have our own packages as well, but those are the ones that we kind of maintain right now.

How does the team decide what to work on or prioritize?

At the moment, the priority has really been developing a pipeline for us that's usable. For us, the clock is kind of ticking on how long we're going to keep using SAS. Well, that's not true, we might always use that, but we want a fully developed pipeline for R. And so really, how I started and how it's gone from this point is looking at this pipeline and saying, what do we need so that, if we have all of these things, we can report a study?

We try to identify where there's not already a lot of people doing the same thing. There's kind of an influx of people into pharma building R packages right now. And there's a lot of people doing things like calculations, for instance. I came into this world by being a clinical statistician. I used to be the person who designed and sample-sized studies. And then I needed some things. And I was like, I'm going to build a Shiny app to do some sample size things for me. And over time, I became a Shiny person who then kind of built some packages, and I more fell into this role than was like, I am today going to be a data scientist. It's really based on my own needs and the needs of people around me.

So that's how I fell in here. And so because I'm not a clinical programmer, I really do my best to not take over some of the reporting and stuff we do within the pharma space. Sorry, I'm very pharma jargony right now. Basically, when you're reporting a clinical trial, there are two kinds of people, in terms of job careers, who do the data set things: clinical programmers and clinical statisticians. The statisticians tend to maybe run the final analysis, the main big one that tells you whether the study works. But there's a whole bunch of things that you need to do, like get the number of people in the study, look at all the safety adverse events and calculate the number of adverse events there are, which are not really traditional statistics. They're more like means, standard deviations, cleaning up all of the data sets, all of that sort of stuff. And that's traditionally been done by someone called a clinical programmer.

And so I have never been one of those people. I always say I just play one on TV. And my group isn't full of people who were in that role. So I try to leave the stuff where you need really in-depth knowledge to understand how particular things are calculated in data sets. The people who were clinical programmers are really good at that because they can bring that knowledge to the table. Because I can't, I tend to avoid doing those things and try to focus more on things where I can add value.

Working with customers and collaborators

So Christina, I think I was typing this at the same time you were starting to describe those other folks who are kind of downstream of the work you're doing, and I was just interested in what that collaboration is like. If you think about your most direct customers, are they those clinical programmers and others who are on the study-specific teams, or do you have a different one? What's the mode of interaction there, and how do you figure out who to be responsive to and when as studies come along?

I would say yes to everything. So on one level, I'm a little bit removed from it, in that anything that I'm actively building hopefully isn't also being used in a trial. Like we try our best not to be trying to fly a machine while building it. That's not ideal. So we don't tend to do that. But at the same time, those clinical programmers are oftentimes the people I work with a lot, and they are major customers for me, because if I write functions and they don't understand how they're going to work together and fit, and they don't make sense to them and don't solve problems that they're having, they are useless. I might as well have drawn a picture for myself.

But they're also not my only customers, depending on what I'm doing. So for my data set metadata packages, they actually basically are my only customers. But for the table formatting one, there are other people who are even further down the line than, for instance, a clinical programmer. So a clinical programmer makes all these tables to hand over to somebody else called a medical writer, who then writes the clinical study report that goes to the FDA. And so those people are also my customers, because they have needs like ensuring that the typeface meets the FDA requirements, because it turns out the FDA only accepts like five different fonts, and that the margins have to be very particular. And there's all sorts of rules that they know and I do not.

Getting started in the pharma R space

I'm trying to figure out structurally where you sit in your organization. So are you under IT informatics? Are you somewhere else? Also, what was the road like to get to that position and to have a pipeline set up? Because I'm the only R user in my company. And right now I'm scrambling to build kind of a sandbox playground VM place to start up some of this work, and then to start to be the pied piper for other analysts and scientists in the company to start to play with R and find a safe space to work.

Yeah, so my history: I sit within a department called biostatistics. So in these massive pharma companies, you usually have a department full of statisticians and clinical programmers who do the reporting out of the study and the designing of the study. So that's the department that we sit in. And we grew out of a need to support that department. We actually started with really supporting the statisticians primarily and building tools for them to help design trials. That was kind of the need that I had when I was a statistician.

And so that ended up being the tool that I built with others, not just me. We have some really brilliant statisticians here who know math more than I do. They're very good at the math, but we are a big department in a big company. I don't even know how many statisticians there are, but many, probably at least two to three hundred or something like this. A statistician who knows how to do a particular analysis type or trial design can't sit with all two hundred people when they need help. So we started building out some Shiny apps to help people design trials. So that is how we started.

We being my entire department, because our department's not very old. God, like, about four or five years. So we're a fairly new sub-department within my larger biostatistics department, and I've kind of been around since the beginning. So that's how our department largely started. And then, as we started building those sorts of tools, there was a request: hey, we want to move to trying R for clinical study reporting. And so, as we started to move to that, that's how we started building packages for that need.

Open source strategy and the pharmaverse

Years ago, GSK had a very specific, extensive SAS macro system, and it even had a name, which I now forget. Is the corp now doing this with R?

So I don't know what the grand vision is. I am not the puppet master, but I would say we are really trying to focus as much as possible on doing stuff in the open source. So there's the extensive macro system that we currently have today, which I don't know if I'm allowed to say the name of. I don't know what the rules are. I'm not going to say the name, because you didn't. So let's just keep it that way. It's like the one that shall not be named. A lot of what that macro system does are things that admiral and other packages are kind of taking the place of. And really our goal, as the larger pharmaverse, is to do our best, not just at GSK but at other pharma companies too, to open source as much as we can of these complex macro systems into things that everybody can use.

It makes it easier for everyone. And when we're hiring people, for instance, we can hire someone and they can be like, yes, I have used all of these pharma-specific R packages because I come from pharma, and we don't have to spend six months or whatever training them on our complex macro system. Instead, it's more like, here are some of our processes, but you should feel comfortable with most of these things because you've probably been somewhat exposed to them. So that's one of the reasons that GSK is looking at the pharmaverse, helping in the pharmaverse, and involved in that way.

So, yes, the pharmaverse is a group of packages that is all open source. Basically, it's a series of packages that really look at building out the pharma workflow, because the pharma workflow is somewhat standardized. I mean, the FDA and the EMA have requirements. They're like, you need to have a data set and it's called an ADaM data set. And it looks exactly like this. And you're like, okay, great. And so it's not like everyone does their own thing and gets to be creative about how that data set works. If you do that, the FDA will be like, no, thank you. Go away. Make the thing I've told you to make. So there's standardization, and with standardization can come automation and overall improvements. So that means the pharmaverse is really trying to make some of that automation and time saving possible because of that.

Prioritizing open source work

How does your team kind of prioritize that type of work versus your more quote, unquote, day-to-day responsibilities at GSK? And what does leadership buy-in look like to give your team support to work on, say, the pharmaverse or other efforts that are happening in the industry?

Yeah, so, thankfully, I work under Andy Nicholls. Most people probably won't know that name. He helps with the thing called the R Validation Hub. He very much loves R and the open source. And so we had to do a lot of groundwork laying, but now most of my time is actually spent on dealing with stuff in the open source. I would say 80% of my time is doing stuff like that and building packages for the open source, and then sometimes building internal, like, compendium packages to make those open source packages work for us internally.

But I would say what I do is largely open source work. And having leadership, having Andy do all of the talking to be like, open source is important, has been great. So have yourself an Andy is my top recommendation. Outside of having yourself an Andy, I think for us the pharmaverse has been really great because it has been a collaboration where other pharma companies have also said it's important. No one wants to be the only one out there putting open source stuff out. That feels very uncomfortable when you're standing alone in the wilderness and you're like, I am alone.

But as everyone started to put stuff together, now we can really talk to our leadership. Like, it's important to do open source because if we don't do it, and if our voice doesn't get heard here, sometimes it means that the way we want things to be done might not be done that way, because somebody else built it a while ago, and now that's really popular and everyone uses it, and it's going to be hard to change course, to turn a ship that's already moving. I don't know, some ship analogy about moving ships. And so I think that's generally the argument that we use when talking about the open source.

You'll hear something, I can't remember the exact phrase, but, like, pharma, in my general career path, we're not an IT company. The day pharma becomes an IT company is the day I'm like, please, this is bad. So we don't want to be an IT company. We want to make drugs and help people live better lives. That's our goal as a pharma company, not to become an IT company. So with us developing software that helps everyone deliver on the thing that we're all supposed to be doing, it's easier to argue that that's not intellectual property for us, because our intellectual property tends to be things like molecules. That's the thing that we make.

Starting a cross-company open source community

Libby and I, and a lot of other people have been talking about trying to do something that's inspired by the pharmaverse, but more for human resources and trying to sort of pull together people in different companies to sort of build packages together. Um, we have far less standardization and honestly, analytics is pretty poor. Um, so anything that can sort of help raise the bar would be really good. Any suggestions on, like, how to kick start a movement of getting people together or any lessons learned from the pharmaverse about what we should sort of do or avoid.

I don't personally believe you need to have a big splash in order to start. I think if you wait for a, like, a big, beautiful website and everything to be working before you try and get into anything, you're going to just, like, be waiting in the circle. Um, so I firmly believe finding some friends and start developing packages with them is, like, a great choice and it doesn't have to be a huge number.

Honestly, I wasn't necessarily around for the whole origin story of the pharmaverse, but part of it grew out of something from Michael and Mike Stackhouse, two people who are not here, and who you probably don't know. Basically, they were like, we should build some packages together. And I was like, okay. And they were like, Christina, come help. And I was like, okay. And I built a package for them with Atorus, a different company that is not even a big pharma. It's a CRO. So we and a CRO built a package or two together. And then the Mikes were having more conversations with other people. And all of a sudden there was a beautiful website. And I was like, wow, everything's coming together. And so to some extent, it's just talking to people and then starting to build stuff. Eventually, you'll have enough stuff that you will have your own verse.

But I highly recommend starting. And in terms of, things aren't standard, what do you do, and how do you start? Sometimes a nice starting place is a package that helps standardize stuff. Metadata is not really in a standard format in pharma. Everyone has their own way of holding this metadata. It kind of gets standardized to a thing called a Define-XML, but typically that Define-XML only comes together right at the end, as you're sending the thing to the FDA. It's not ready at the beginning.

And so, one of the first packages that I built as part of the pharmaverse was this package called metacore. As I said, I hold many metadata things, don't know why. But part of the reason was, we wanted to build automation off this metadata. But everyone held it in a different Excel format, typically, or it was in a database, but it was their own way, they designed it their way. So, we needed to have a thing that we could all look at and be like, this is the thing, and we all use this thing, and this is how it works. So what metacore became for us is just an R6 object. It's just an object. It's the most boring package in the whole wide world. It does almost no things. It sits there, and you can poke at it. That's it. But that enabled additional automation to be run off it. So then we have a different package called metatools that does a lot of this automation. But the first step was getting that thing we could all agree was the thing we were using. And it doesn't have to be perfect.
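To make the "one object everyone agrees on" idea concrete, here is a minimal sketch in Python. It is not the metacore API (that package is written in R and built on R6); `VariableSpec`, `DatasetSpec`, and `check_dataset` are hypothetical names invented for this illustration, and the variables shown are just examples. The point is only that once the metadata sits in one agreed shape, automation such as conformance checks can be written once against that shape.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VariableSpec:
    """Hypothetical description of one dataset variable."""
    name: str
    label: str
    dtype: type
    required: bool = True


@dataclass
class DatasetSpec:
    """Hypothetical shared container: the one thing every tool reads from."""
    name: str
    variables: list[VariableSpec] = field(default_factory=list)


def check_dataset(spec: DatasetSpec, rows: list[dict]) -> list[str]:
    """Compare rows against the spec and return a list of problems found."""
    problems = []
    for i, row in enumerate(rows):
        for var in spec.variables:
            if var.name not in row or row[var.name] is None:
                # Only required variables are reported when absent.
                if var.required:
                    problems.append(f"row {i}: missing {var.name}")
                continue
            if not isinstance(row[var.name], var.dtype):
                problems.append(f"row {i}: {var.name} is not {var.dtype.__name__}")
    return problems


# Example spec for a subject-level dataset (illustrative variables only).
adsl_spec = DatasetSpec(
    name="ADSL",
    variables=[
        VariableSpec("USUBJID", "Unique Subject Identifier", str),
        VariableSpec("AGE", "Age", int),
        VariableSpec("SEX", "Sex", str, required=False),
    ],
)

rows = [
    {"USUBJID": "S-001", "AGE": 54, "SEX": "F"},
    {"USUBJID": "S-002", "AGE": "61"},  # AGE stored as text: flagged
    {"AGE": 47, "SEX": "M"},            # USUBJID missing: flagged
]

problems = check_dataset(adsl_spec, rows)
for p in problems:
    print(p)
```

The container itself does almost nothing, which matches the description above; the value comes from every downstream tool being able to rely on the same structure.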

Transitioning from SAS to R

The learning curve is steep from cozy SAS programming to pharmaverse or R open source. Any suggestions for newbies transitioning from SAS to R and Python?

My first suggestion is, try doing a thing that you generally know how to do in SAS. Like, if you don't run AI/ML models in SAS, don't be like, I'm going to learn R and start with AI/ML models because that's a thing I've always wanted. Even if you want to get there eventually, it can be helpful to start in a place where, like, I know what this data set should kind of look like at the end, so I'm going to do something similar. So that's my first general recommendation. But I would also say I really like the R for Data Science book. I think that one's really good.

We, being the people of the pharmaverse, are trying to pull out some sample code used to build a standard ADSL. You're saying you're coming from SAS, so I'm going to use formal words at you. So, a standard data set, so you can start doing things that you're kind of familiar with. But those are my general suggestions. At some point, you've just kind of got to do it, and it will get easier with time.

Because it's December 1, I'll also recommend Advent of Code. It's a thing I like to do. I mean, the questions get really hard towards the end, and so if you can't do them in R, don't worry, I can't do them either. But, like, just practicing problems. I think coming from SAS can be hard because SAS is not an object-oriented language, and I think getting over that first hurdle of having an object-oriented language is the hardest, but then it does get easier after that.

Data access and clinical data engineering

I have a question about how you access data, so kind of one of these boring admin questions. For those not in pharma, there are some software giants in the space that provide solutions to make electronic case report forms possible. So at the clinic or wherever your clinical trial is being facilitated, people have to enter into an interface, like, this patient had this disorder and this event happened on this day. So there are companies and products called things like Medidata and Veeva and InForm, and the way they deliver out data is typically not by way of giving you access to an API or keys to a database. Rather, they have modules that kind of do nefarious things like email data to you. Have you all encountered a better way of making this happen? Because that's one part of the data stack in pharma that feels woefully behind the times.

So short answer, no. Long answer: GSK has a lovely department called Data Management that manages that data and hands it to us in biostatistics as SDTM data. So we get SDTM data, and that's how our departments work. And back to priorities, what packages get made by us today is largely set by what my department needs. My department doesn't have to worry about getting yucky data formats into SDTM, or trying to get those sorts of things from an eCRF into an SDTM. So that is why I don't do that. And that's why we've not looked at it yet. That's not a no forever. It's just thankfully not been my problem, but I sympathize.

Honestly, so many people are changing their name titles to data scientists now. I don't always keep it straight. It's a cool club. Right. It's a great club. Everyone can be in the club. I'm not gatekeeping the club. I just don't remember who has joined the club this week.

Tables and the tabling package

Could you explain how tables in R specifically relate to the work being done by GSK? Maybe a few examples or real use cases.

So when I talk about tables, I mean, like, a demographics table, as in the number of people who are male, female, and then, you know, here's the age breakdown, like mean age, max age, min age, by treatment. So it's like a standard. It's different for every company, but it is a table that we give out for every study that is ever run at GSK. Or you can have a specific one, like the table for your particular stats analysis for your particular study. But when I talk about tables, I mean that sort of table. And that table we can create in R or not create in R, it doesn't really matter. And then it needs to get formatted in a very specific way to meet those medical writing requirements that I was talking about before, where they're like, it needs to have these particular margins, this particular font.

And this year with my team, we developed a package that takes something called analysis results data, which is a long, skinny data standard that CDISC is currently piloting. And basically the idea starts from what happens for the most part today, I won't say through all of pharma but through most of pharma, which is that a clinical programmer, maybe their job today is to make a demographics table for their study. So they're going to make this demographics table. They're going to do all the calculations that they need in order to get the number of people over 65 and the number of people under 65, and all sorts of things, the average weight, the percentage of people who are over 65, whatever, by treatment. They'll make that table, then somebody else comes in and QCs that table.

But when they're QC'ing that table they might be like, oh, all the numbers look good but the spaces are off. Now I'm going to spend two hours arguing with spaces, yay, and everyone has a bad time. So we don't want that, that seems bad. So, the analysis results data idea is that rather than having to QC a formatted thing, you can QC the analysis results data, which is long and skinny, and the point is it's much easier to QC. You can just put your Ns in rows and percents in rows and, you know, make it tidy data. Everything is good for everybody. But then at the end of the day the medical writers still do want a thing that looks like a table that's easy to read, as does the clinician who's trying to understand: we've run this study, the stats say this thing, but clinicians need to look at things to understand what happened in the study.
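As a rough illustration of "Ns in rows and percents in rows", here is a small Python sketch that turns subject-level records into a long, skinny results table. The column names (`trt`, `group`, `stat`, `value`), the made-up subjects, and the `summarize` helper are assumptions for this example only; they are not the CDISC standard's actual schema or any pharmaverse package's output.

```python
from collections import Counter

# Made-up subject-level data, one record per subject.
subjects = [
    {"id": "S-001", "trt": "Placebo", "age": 70},
    {"id": "S-002", "trt": "Placebo", "age": 58},
    {"id": "S-003", "trt": "Drug A", "age": 66},
    {"id": "S-004", "trt": "Drug A", "age": 71},
    {"id": "S-005", "trt": "Drug A", "age": 49},
]


def age_group(age):
    """Bucket an age into the two groups used in the example."""
    return ">=65" if age >= 65 else "<65"


def summarize(subjects):
    """Return one row per (treatment, age group, statistic)."""
    totals = Counter(s["trt"] for s in subjects)
    counts = Counter((s["trt"], age_group(s["age"])) for s in subjects)
    rows = []
    for (trt, group), n in sorted(counts.items()):
        pct = round(100 * n / totals[trt], 1)
        rows.append({"trt": trt, "group": group, "stat": "n", "value": n})
        rows.append({"trt": trt, "group": group, "stat": "pct", "value": pct})
    return rows


results = summarize(subjects)
for row in results:
    print(row)
```

Because every number lives in its own row with plain labels, a second programmer can compare two such tables value by value without any spacing or layout getting in the way.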

We need to make it easy for them. So one of the things that we did is create this package that extracts some of that formatting into metadata, so that the clinical programmer can just create a long and skinny data set and call their job done and dusted. The metadata sits elsewhere, and it can be created when you create these mock tables. Basically, before you start the study you draw out a picture and you're like, I want a demographics table that looks like this picture. And then the clinical programmer has to make a thing that looks like that picture. But our idea is, rather than have it as a picture, have it as metadata, and then the clinical programmer just makes something long and skinny, and then because you wrote the metadata to draw your picture, it will make your picture, and you will be happy because it will be exactly what you said.
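A toy version of "write the metadata, get the picture" might look like the Python sketch below. It is not how tfrmt works internally and does not use its API; `table_spec`, its keys, and `render` are hypothetical, and the numbers are made up. The layout decisions (title, column order, row order, cell pattern) live in the spec, while the data stays long and skinny.

```python
# Long, skinny results as a clinical programmer might hand them over (made-up values).
results = [
    {"trt": "Drug A", "group": "<65", "stat": "n", "value": 1},
    {"trt": "Drug A", "group": "<65", "stat": "pct", "value": 33.3},
    {"trt": "Drug A", "group": ">=65", "stat": "n", "value": 2},
    {"trt": "Drug A", "group": ">=65", "stat": "pct", "value": 66.7},
    {"trt": "Placebo", "group": "<65", "stat": "n", "value": 1},
    {"trt": "Placebo", "group": "<65", "stat": "pct", "value": 50.0},
    {"trt": "Placebo", "group": ">=65", "stat": "n", "value": 1},
    {"trt": "Placebo", "group": ">=65", "stat": "pct", "value": 50.0},
]

# Hypothetical formatting metadata, written when the mock table is drawn,
# long before any data exists.
table_spec = {
    "title": "Age Group by Treatment",
    "columns": ["Placebo", "Drug A"],   # display order of treatment columns
    "rows": ["<65", ">=65"],            # display order of row groups
    "cell": "{n:>2} ({pct:5.1f}%)",     # how n and pct combine in one cell
}


def render(results, spec, label_width=6, col_width=14):
    """Lay out long-format results as a text table according to the spec."""
    lookup = {(r["trt"], r["group"], r["stat"]): r["value"] for r in results}
    lines = [spec["title"]]
    lines.append(" " * label_width + "".join(f"{c:>{col_width}}" for c in spec["columns"]))
    for group in spec["rows"]:
        cells = []
        for trt in spec["columns"]:
            text = spec["cell"].format(
                n=lookup[(trt, group, "n")],
                pct=lookup[(trt, group, "pct")],
            )
            cells.append(f"{text:>{col_width}}")
        lines.append(f"{group:<{label_width}}" + "".join(cells))
    return "\n".join(lines)


table = render(results, table_spec)
print(table)
```

Changing the column order or the cell pattern means editing the spec only; the results data and the QC done on it are untouched.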

Trying to explain code in the abstract is hard, but so that's our tabling package. It really is more table formatting. And yeah, when we talk about tables, it's just, like, the demographics table that the clinician will look at, or the medical writer.

Hiring for open source talent

I'm assuming you look for open source talent when hiring. Has this profile changed over the years?

So, yes. But the years that we've been looking are slim. Our department's, what, like five-ish years old. The people we have hired in have mostly been people who had other open source packages, particularly in the pharma space, because we were like, you clearly do this thing, seems great. And they've been great. And I love to work with them. So, I would say yes, but has it changed? There's not been enough years to say it's changed.

FDA collaboration and R-based submissions

Is any collaboration with the FDA or other agencies being done to adopt more R-based workflows?

So, yes. There are a variety of working groups, I am not personally on any of them, where the FDA also sits on the working group. And other collaborations have happened. For instance, there was an R submission where they submitted a pilot study to the FDA. But it was a pilot study, it wasn't real. It wasn't real data, but it was in the right format. And so those sorts of working groups are how it's being done. And with the pilot study, the FDA was happy to review it. They spent their time doing things like they would have reviewed a real study, which we definitely couldn't have done without them being totally willing to do that.

So yes, but also, and Eric, you can correct me if I'm wrong, I'm now pulling you into this, they don't tend to be on a huge number of working groups, and I have not seen them release any R packages where they're like, this is what we're doing to actively help. They want to help us, as in the larger pharma community, but they're also not building a lot of stuff.

Yeah, most of that is definitely correct, although we are making sure that the code or products we're developing are in the open so that other pharmas can use those the way they see fit. Where we're at currently is pilot two, with a Shiny app in the FDA's submission portal, and all the code that we're doing is open source right now. And in terms of people who want to get involved with that effort, the best place to go would probably be our GitHub repos: file an issue and say you'd like to be a part of it, and we'll make sure our leads for the working group get you involved. It's an exciting time right now. It does move kind of slow sometimes, but we've made a lot of progress in the last couple of years compared to where we were before, so it's pretty exciting.

The tipping point for R adoption in pharma

Pharma, sorry to non-pharma people, they're like lemmings, even more than anything else: if one person does something then they will follow. So, the key event, the key over-the-cliff event, is the acceptance of a submission by the FDA from some therapeutic area. I like to think that that might be oncology. Who's in the lead here that you can see, or is there any lead anywhere? Because there's a lot of resistance internally by many in clinical data, data management, biostats.

I would say it's hard to say. I don't have a real clear, this is definitely going to happen this way. The answer is, I don't know. I don't know who's winning, but I think things like the admiral packages, where there's an admiral specific to a particular thing. There's like an admiral oncology, there's an admiral immunology, there's a couple of admiral specific ones, and I think those are a great start.

I feel like I don't know exactly where the change will be. I think it's going to be more slowly over time than, like, one particular section switching everybody over. At least the GSK approach with choosing SAS versus R when you're trying to do something is largely whatever the programmers feel comfortable with. Part of our goal of moving to R is, I mean, there are a lot of new grads and things who are coming up who only learned R in grad school. And so we want people to use what they're comfortable with.

So, I feel like, at least from what I see in our department, it's going to be like, this clinical programmer who's running the study for submission feels really comfortable in R. They've done a lot of R stuff, they love R, so they're going to push their team to all use R, and those are kind of the submissions. Like, we've started to prepare a submission to the FDA in R where the production is in R and the QC is in SAS. Yes, and I saw you laughing when I said the QC is done in SAS, because we are pharma and still risk averse. So don't you worry, we're not there yet.

Metadata and the tabling workflow in detail

I'm interested to hear a little bit more about the interaction between the metadata tables that you're responsible for, and the downstream kind of work that involves those. So, just thinking about like the critical tables with content from the studies, you know, the users, or, you know, the participants and all the other information, and how that relates to the metadata, and like what that actually looks like for the downstream analysts like how do they put the two together.

So, this is one of our newer R packages. It is something that I maintain, and my team helped me build it. I couldn't have built it without Ellis and Becca, so just to give everyone credit. But also, part of what we built is based off of a very pharma-specific workflow. So I'm just going to put it out there: in pharma, we do a thing where you make a table that is full of nothing but X's, where you're like, this should be treatment, this should be placebo, I want this to be xx.x because I want it to go to one decimal place. So you do this whole thing where you draw out, sometimes in Excel, sometimes in Word, the thing you want it to look like, and that's what's called your mock.

So, what we've done is we've basically made it so that this tfrmt package is a package where you can put this metadata in to say, I want things that are labeled n, for instance, to be rounded to nothing, with no decimal places, but I want things that are labeled as mean that are part of, I don't know, the group age to be rounded like this, and you put it all in there. So it's just an R script where you can put all this metadata in. Our grand vision, although we're not there yet because of time and the limited number of people and resources, is to have that write out to a JSON file. It kind of looks like JSON now; we just pretend in our hearts it is when in reality it's just an R object.
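
As a rough illustration of the idea just described, here is a minimal Python sketch of formatting metadata kept as plain data and written out to JSON. It is not the package's actual API (which is in R); the rule fields `group`, `label`, and `decimals`, the `*` wildcard, and the helper `find_decimals` are all assumptions made up for this example.

```python
import json

# Hypothetical formatting metadata: one rule per (group, label) pattern.
# "*" matches anything; "decimals" is the number of decimal places to show.
format_rules = [
    {"group": "*", "label": "n", "decimals": 0},
    {"group": "Age", "label": "Mean", "decimals": 1},
]


def find_decimals(rules, group, label, default=2):
    """Return the decimals of the last rule matching this group and label."""
    decimals = default
    for rule in rules:
        group_ok = rule["group"] in ("*", group)
        label_ok = rule["label"] in ("*", label)
        if group_ok and label_ok:
            decimals = rule["decimals"]
    return decimals


# Because the rules are plain data, they survive a round trip through JSON.
as_json = json.dumps(format_rules, indent=2)
restored = json.loads(as_json)
```

The point of the sketch is only that rules expressed as data, rather than as a drawn mock, can be stored in a file and looked up later, for example `find_decimals(restored, "Age", "Mean")`.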

And that R object is then what's used whenever you have the data. The data has to get into this semi-standard format called analysis results data, which is just tidy data. Honestly, you could use any tidy data set, as long as it's long and skinny like that. And so the plan would be that the clinical programmer just makes that data set. And then we can store the metadata, which is in our pretend JSON but is for now just an R object. It's basically plug and play: if you have a really standard table, for instance, you can use the same metadata over and over again and just swap out the data sets, as many times as you like. At the moment it goes out to gt, and we'll probably keep that forever because gt is great. Rich has done a really good job.
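
The plug-and-play step can be sketched in Python as follows. This is an illustration under assumptions, not the real workflow: the row fields (`group`, `label`, `column`, `value`), the `metadata` dictionary, and the function `build_table` are hypothetical names, and the numbers are made up. The same metadata is applied to two different long and skinny data sets.

```python
from collections import defaultdict

# Hypothetical metadata: decimal places per statistic label.
metadata = {"n": 0, "Mean": 1, "SD": 2}


def build_table(long_rows, decimals_by_label):
    """Pivot long rows into a wide table of formatted strings.

    Keys are (group, label) pairs; each value maps a column name
    (for example a treatment arm) to the formatted cell text.
    """
    table = defaultdict(dict)
    for row in long_rows:
        places = decimals_by_label.get(row["label"], 1)
        cell = f"{row['value']:.{places}f}"
        table[(row["group"], row["label"])][row["column"]] = cell
    return dict(table)


# Two made-up long and skinny data sets with the same shape.
study_a = [
    {"group": "Age", "label": "n", "column": "Treatment", "value": 50},
    {"group": "Age", "label": "n", "column": "Placebo", "value": 48},
    {"group": "Age", "label": "Mean", "column": "Treatment", "value": 61.27},
    {"group": "Age", "label": "Mean", "column": "Placebo", "value": 59.84},
]
study_b = [
    {"group": "Age", "label": "n", "column": "Treatment", "value": 120},
    {"group": "Age", "label": "SD", "column": "Treatment", "value": 12.346},
]

# Same metadata, different data: only the data set is swapped.
table_a = build_table(study_a, metadata)
table_b = build_table(study_b, metadata)
```

Keeping the formatting rules separate from the data is what lets the second call reuse `metadata` unchanged.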

We've had to work with Rich a little bit, because there are some additional things that we've needed. We worked with Rich to get it to write out to Word, for instance, which was important for us, and he did that and that was amazing. We work with people like Rich because our goal as a GSK department is largely not to build everything, but to work with great tools that already exist. So that's how we would see it work: once you have this gt object, it can go to all of the things you want. It can go to your PowerPoint, or it can go to PDF, Word, RTF, HTML, whatever. And from our perspective, we can hold the formatting metadata in a pretend JSON and then just apply it to as many data sets as you want.

Looking ahead

What is something that you're most excited about thinking out to maybe the year ahead in the pharma space or an open source in general?

I think in the year ahead the thing that I'm starting to see the most is this complete end-to-end solution. We're almost there. Excluding Travis's problem with things that happen before they go into SDTM, but that's not my world, so in my head it's complete even though it's not complete for everyone. I get to care about myself here. It's end to end, depending on where you start. And that I like.

I love to see that. And I think that will also start making it easier for people to do that transition. I think one of the things that has been a barrier up till today is that without every single building block in place, in terms of, we need an R equivalent for everything, it can be hard to say, okay, we agree and we're willing to take the leap and try R on a study. Sometimes it's for stuff you know you won't even need for that particular study or that particular moment, but just being able to say to your leadership, don't worry, that thing exists if we need it, which I don't think we will, but we might, can be helpful in terms of getting leadership buy-in.

We are currently on the path of choose your own adventure. As I said, let people do what they want and what works for them. That is the GSK party line and that is what I know. I don't know of anyone who's doing that at quite that extreme level. I mean, Roche has, to some extent; they've had some of these tools longer, and they're starting to open source more and more of their stuff. They've had rtables for a really long time, which has been open source for a while. So I'm not surprised that they're a little further ahead than everybody else in this space.

Getting involved in the pharma R community

If there are people listening in who want to get more involved in the pharma community, what do you think is the best way to get started?

The best way to get started is kind of, you have to choose what's right for you. I would say, to Eric's point, if there's a working group that you love, make an issue; they probably need your help. If there's a package that you love and want to be involved in, ask the package maintainer. This includes me. I am the sole maintainer on a lot of my packages, and sometimes I have help from my group, but sometimes I'm the only one who does the thing. So if you are like, I have looked at your packages and I also love metadata with a deep and burning passion, or you don't, but you want to help, I would love that.

Because I am one person. And so, yeah, I would say, contact people. For the most part, in the R pharma space, people are so open to help and very receptive. I've yet to go up to someone and be like, I have a pull request for you, and have them be like, no. Most people are like, thank you, that's amazing. So, I would say, generally, it's really an arms wide open sort of community.

GitHub is great. LinkedIn is good. Another option is the pharmaverse Slack; I'm available there as well. So I would say, whatever works best. If you send me a message on GitHub, you probably need to put an issue on one of my actual packages for me to see it. LinkedIn or Slack are probably the easiest ways to get ahold of me, unless you have an issue on one of my packages or want to be involved in a very specific one, in which case an issue would be great.

Thank you so much, Christina, for joining us today and sharing your experience.

Thank you for having me, and thank you, Eric, for being my buddy and doing links for me. That was very helpful.

Yes, thank you. Thank you for all the great questions and resources shared in the chat, too. Have a great rest of the day, everybody.