Orla Doyle - Creating reproducible static reports

Oct 31, 2024
20:33

Transcript

This transcript was generated automatically and may contain errors.

Orla Doyle on creating reproducible static reports. Thanks very much. Thanks, everyone. It's really nice to be here, all the way from Dublin. A good way to fight jet lag is to give a talk in the afternoon, because the adrenaline kind of overrides it.

So I'm going to talk about a process for automated reporting, and I'm also going to talk a little about how we work in the pharmaceutical industry, to give you some insight into what's a bit different about the way we work.

These are my disclaimers: these views are my own and don't represent the company I work for.

Working in clinical drug development

The pharmaceutical industry does lots of things, but the part of it that I work in is clinical drug development. What we're trying to do is develop new medicines and produce robust evidence to understand whether those medicines are safe and effective. In other words: are they safe, and do they work?

And we do this in a very regulated way, and that's not something we're trying to get rid of. We like to sleep at night, so we welcome a lot of that regulation. It makes sure that we adhere to really high standards and that our work is reliable.

To do this, we work with lots of different people, often from very different backgrounds to folks like ourselves in this room; it's a very multidisciplinary environment. So when we want to review this evidence together, we often focus on static reports. People here might go, "Yawn, how boring, move on." But when you have people from very different disciplines, a static report is often a lower barrier to getting done the work we need to get done, which is understanding whether our drugs are safe and effective.

What "regulated" really means

So when we think about the word "regulated", what does it really mean? It means lots of things across our industry, but if we're thinking about statistics or data science, there are three main pillars. First, is what we're doing traceable? If I'm working on a piece of data, do I understand the lineage of that data? What happened to it before I got it, and did the right things happen to it?

Second, is my work reproducible? We submit something to a health authority, they review it, and perhaps they have some queries. Maybe that query comes a couple of weeks down the line; maybe it's a couple of years. We want to be able to come back to our work and, if I have a static figure, understand what's behind it. Could I reproduce it? And could I tweak it?

Third, is it accurate? That's a big, big part of what we do. Often we go from a written specification to me, mostly as a human, interpreting that specification into code and then into an output. At every one of those steps we want to think about accuracy: not just is the number correct, but did I do the right analysis?

Disruptive and non-disruptive change

So those are the kinds of things we think about. A lot of this sounds like we take every step very carefully, and you might be thinking that sounds like a hard industry in which to make change. Sometimes it is, and we don't always take up change very rapidly. Often we move a little more slowly and carefully, again because of that regulated nature and because of the high impact of being right or wrong in our work. An error that propagates through to something that affects a patient is something we want to avoid at all costs.

So we're embracing that regulation, but also trying to think about what it means to change. Today I want to talk about two different types of change. If you talk to anyone here from the pharma industry, we're all in different phases of some big changes: moving from proprietary tools to open source tools, moving from static to dynamic outputs, changing the types of compute environments we work in. There's change everywhere, it's taking us a little time, and we're all at different phases.

But there's also non-disruptive change, and I think non-disruptive change keeps you energized, keeps your teams energized, and also prepares you for the big destination you want to get to. The example today is a nice case of changing how we work in a way that makes things better for someone like a statistician, while the end product, our static report, stays the same. So perhaps we don't need to do as much stakeholder engagement, or change the entire process. What we're doing is adding a pathway instead of changing the entire pathway.

From outputs to Word documents — the copy-paste problem

This is a very simplified version of how we work, and it's not that dissimilar to diagrams we've seen in several people's talks today. How do we get from an idea to a result? First there's code: code we write ourselves, or code we consume from, say, an open source package. We have a lot of metadata, because we work with standards: it could be that the footer of a table should look like this, or that the structure of your data model should look a bit like this. And then we have the input data itself: what are we going to consume in order to derive the outputs we're so interested in?

In the middle we have the magic stuff: our workflow managers, our execution models. We have lots of different names for these, but they're things like Make-style build tools. They help us with reproducibility and audit trails, and they let us produce our outputs in a way that makes us far more confident they are traceable and reproducible. Accuracy, I would say, is really still down to the human.

And we want to get to two places: dynamic outputs and static outputs. We've done all this hard work to make sure things are reproducible and that we have the audit trail. And then we take the sledgehammer of Ctrl+C and Ctrl+V and copy and paste things from a mystery location into a Word document. That feels fundamentally wrong given all the work that has gone before it. It doesn't scale, it's error prone, and it's a frustrating place to be.

So we have different options here. Do we just throw away our static outputs and only do dynamic ones, because there we have a very reproducible pathway? Probably not; at least not tomorrow, maybe in the future. So we need to maintain the pathway that generates static reports, because that is what our end users want to consume, and that's where collaboration often happens. So what do we do?

If we're impatient, we can make an alternative pathway. The experience for the statistician or statistical programmer generating the report is fundamentally different, but the end consumer of the report has the same experience. That enables stepwise, or non-disruptive, change, so we can hope to improve things on a much shorter timescale.

And it's really not just about the code; I'll repeat this on the next slide. The code we're showing here is pretty simple, and we're leveraging other packages like officer. What really helped us was the process: developing with good data science practices, bringing that forward with different stakeholders, and generating what we call company-compliant documents. If you're not from the pharma industry, you might ask what that means. Basically, we have very strict Word document templates with very particular styling and very particular things that need to go in certain places, so manually editing those documents is a slow and sometimes fussy process for us.

Sample size reporting — the use case

So I want to talk about one use case that we covered in this package: sample size estimation and the reporting of that calculation. Before we begin a clinical trial, the very logical thing to ask is how many people we should recruit. How many people should be on each of the treatments, or arms, in our study to ensure that the evidence we generate is reliable? There are lots of statistical methods for these calculations, and this is what our statisticians do, given what we know about the studies we're running. And they want to bring all of this together.
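
To make "how many people should we recruit?" concrete, here is a minimal Python sketch of the textbook normal-approximation formula for comparing two means with equal allocation. This is an illustrative assumption, not the method the team's statisticians or rpact necessarily use; real trial designs often need more elaborate calculations. The function name `n_per_arm` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_arm(delta: float, sigma: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Participants per arm for a two-sided, two-sample comparison of means.

    Normal approximation, equal allocation, common standard deviation sigma:
        n = 2 * (z_{1 - alpha/2} + z_{power})^2 * sigma^2 / delta^2
    """
    if delta <= 0 or sigma <= 0:
        raise ValueError("delta and sigma must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # critical value of the two-sided test
    z_beta = z(power)           # standard normal quantile for the target power
    n = 2 * (z_alpha + z_beta) ** 2 * sigma**2 / delta**2
    return math.ceil(n)  # round up: a fraction of a participant cannot be recruited


# Detect a difference of 5 units when the standard deviation is 10:
print(n_per_arm(delta=5, sigma=10))             # 63 (80% power)
print(n_per_arm(delta=5, sigma=10, power=0.9))  # 85 (90% power)
```

Keeping a calculation like this in the same source file as the report text is what later lets a single document both reproduce the numbers and regenerate the report.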

How we went about this: it was actually our statisticians who came to us. I sit within a team that makes tools for our clinical trial teams, and they said, "I really don't like this part of my job. It's frustrating, it's slow, it's a bottleneck, and I feel like my work is not as reproducible as it could be." So we got them on board and said, "Let's think about this together." But let's also bring in someone from our compliance department, because we want to make sure that whatever we build can be deployed and rolled out. So we got them on board from the very start as well, and designed it together.

Let's leverage the good software development practices that are so common in our teams but maybe a little less common for some of our other users. And let's not wait too long: every two to three weeks, let's show each other what we have and ask, does it look like it's working? Is it going to be a better experience for you? Bringing all of those things together helped us get to a first version in a matter of weeks.

Building the package with R6 classes

So what does it look like? We've done a lot of talking about it, so now I'm going to show you a couple of code snippets and the outputs. The goal to remember is that people wanted to generate Microsoft Word documents but never have to open Word itself.

This is what we can do using a combination of tools; for this particular use case, we used R Markdown. We made classes, specifically R6 classes, for every element of the report that we knew we wanted to create. That really helped us think carefully about the structure. Classes have attributes, their fields, and they have methods, and you can think about these separately. So we could say, "I'll take that class, you take that class," and work in a very nicely parallelized way; a lot of your project management and design is done for you.

So by simply filling out and initializing this class and then doing our rendering in the getTitlePage method, we can produce the Word document title page in the format we expect, using all of the fields we wanted. This was a really exciting thing for us, and it's energizing. People said, "Oh, this is cool, because I hate doing that part," and they got on board because they were excited.
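
The R6 code from the slides isn't reproduced in this transcript, so the following is only a Python analogue of the same idea: one class per report element, with fields for its content and a method that renders it. The field names and the Markdown output are assumptions for illustration; producing the actual .docx (done in the talk with R Markdown and officer) is not shown.

```python
from dataclasses import dataclass


@dataclass
class TitlePage:
    """One report element: title-page fields plus a method that renders them.

    Field names are hypothetical; a real company template dictates which exist.
    """
    title: str
    study_id: str
    author: str
    version: str
    date: str

    def get_title_page(self) -> str:
        # Emit the fields in a fixed order and layout, so every report's
        # title page is consistent without any manual formatting.
        return "\n".join([
            f"# {self.title}",
            "",
            f"**Study:** {self.study_id}",
            f"**Author:** {self.author}",
            f"**Version:** {self.version}",
            f"**Date:** {self.date}",
        ])


page = TitlePage(
    title="Sample size calculation",
    study_id="STUDY-001",  # illustrative values only
    author="A. Statistician",
    version="1.0",
    date="2024-10-31",
)
print(page.get_title_page())
```

Because each element is self-contained, different people can own different classes and develop them in parallel, which is the benefit described above.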

It's the same for the other elements within the report. We created classes for things that had unique properties: we can't just make a general table here, but we can make a changelog table that's really easy to populate, add rows to, and render. We did that for all of the different elements of the report and eventually render the entire document. Under the hood is a single R Markdown file; you render it and it produces your Word document at the end.
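
A changelog element and a document that renders its elements in order might look like this minimal Python sketch. The column names, the pipe-table output, and the convention that every element exposes a `render()` method are all assumptions made for illustration, not the package's actual design.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeLog:
    """A table with a fixed set of columns that is easy to append to."""
    columns: tuple = ("Version", "Date", "Author", "Description")
    rows: list = field(default_factory=list)

    def add_row(self, *values: str) -> "ChangeLog":
        # Reject rows that don't match the template's columns.
        if len(values) != len(self.columns):
            raise ValueError(f"expected {len(self.columns)} values, got {len(values)}")
        self.rows.append(values)
        return self  # returning self allows chained add_row calls

    def render(self) -> str:
        # Emit a pipe table: header, separator rule, then one line per row.
        header = "| " + " | ".join(self.columns) + " |"
        rule = "|" + "---|" * len(self.columns)
        body = ["| " + " | ".join(row) + " |" for row in self.rows]
        return "\n".join([header, rule, *body])


class Report:
    """Collects report elements and renders them, in order, into one document."""

    def __init__(self, *elements):
        self.elements = list(elements)

    def render(self) -> str:
        return "\n\n".join(element.render() for element in self.elements)


log = ChangeLog().add_row("1.0", "2024-10-31", "A. Statistician", "Initial version")
print(Report(log).render())
```

The document itself stays a thin container; all the template-specific knowledge lives in the element classes.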

The other very exciting thing is that you can co-locate the document content with the calculation. If you're using a package like rpact, which is commonly used in our company for general sample size calculations, everything is now in one place. That makes us all feel really happy: if you want to go back to it, there's one single document that generates the Word document and also reproduces the calculations, should you want to revisit them.

We also did a lot of logging here, which again makes us all feel pretty safe. When did I run it? Who ran it? What packages did I have? What did my session look like? Logging all of that gave us the traceability to make sure we were keeping all of the good practices we already follow when we generate an output itself.
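
In R, `sessionInfo()` captures much of this. As a rough Python analogue, assuming a JSON record saved next to the rendered report is acceptable, the "when, who, which packages, what session" questions could be answered like this (the function name `session_log` is hypothetical):

```python
import getpass
import json
import platform
import sys
from datetime import datetime, timezone
from importlib import metadata


def session_log(packages=()):
    """Record when the report was run, by whom, and in what environment."""
    try:
        user = getpass.getuser()
    except Exception:  # getuser() can fail, e.g. in containers without a user entry
        user = "unknown"
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return {
        "run_at": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


# Typically written alongside the rendered report as an audit record.
print(json.dumps(session_log(["pip"]), indent=2))
```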

And we end up here. I know this is a little small, but basically this is it: we get a whole document from a single R Markdown file, and we never have to open Microsoft Word to edit it. Getting to that point was really exciting, and I think it brought in even more understanding: "I can see now what all this open source stuff can do for me, and it isn't so different from what I used to do before."

Object-oriented design and assertions

But again, it wasn't just about the code. It was about doing this in a way that makes sense from a software perspective as well. A report, at least in my mind, really lends itself to being object-oriented. There are natural elements, they need different methods and different attributes, and then you can do a lot with assertions. With assertions, before the class is created, I can assert what I expect the values of its attributes to be. I can say I want my date to have this format; if you don't give it to me in that format, I don't let you make the class. You get an error and you cannot go forward. That's a fairly strict approach, and you can choose different flavors of how strict you want to be, but instead of reviewing a Word document to make sure the date looks like a date, you can do it upfront. A lot of the checking we don't enjoy doing in our jobs can be automated this way, which lets us focus on other things.
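
Here is a minimal Python sketch of "assert at construction time", including a switch between a strict flavor (refuse to create the object) and a lenient one (warn and continue). The date format and the `ReportDate` name are hypothetical; the talk's R6 classes would express the same checks in their `initialize` step.

```python
import warnings
from dataclasses import dataclass
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # hypothetical house style, e.g. "2024-10-31"


@dataclass
class ReportDate:
    value: str
    strict: bool = True  # strict: refuse to build the object; lenient: warn only

    def __post_init__(self):
        # Validate when the object is created, not when someone reviews the Word file.
        # strptime also rejects impossible dates such as "2024-02-30".
        try:
            datetime.strptime(self.value, DATE_FORMAT)
        except ValueError:
            msg = f"date {self.value!r} does not match the required format {DATE_FORMAT}"
            if self.strict:
                raise ValueError(msg) from None
            warnings.warn(msg)


ReportDate("2024-10-31")                  # accepted
ReportDate("31/10/2024", strict=False)    # accepted, with a warning
```

With `strict=True` (the default), `ReportDate("31/10/2024")` raises `ValueError`, so a badly formatted date can never reach the rendered document.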

As developers, we also created a lot of tests for our package and had a CI/CD pipeline. Lots of different people wanted to come in and work on the package, and we wanted to make that open to them, because we couldn't scale to do every possible report in the company ourselves. To work together, we made sure we had tests for every class. So if we change something, we know either that the existing tests pass, that we need to add new tests, or that we've broken something, in which case it's not ready for code review yet. That really helped us have different people working on the package while making sure we could test what we were doing.

Now and next

Okay, so now and next. We've just finished successfully piloting this package in-house for our sample size report. We focused on two scenarios. If someone was doing an amendment and had never used the package before, what would that look like? Or if they were doing a brand-new study, could they jump in, and in what ways? In both cases, this is where your compliance people are so helpful: if the report was different, or if we were using a different version of the template, what would we need to document in terms of our process? Or does it look okay as it is? We had a very engaged compliance person, Maritza, to really help us there as well.

We also have additional use cases that I didn't go through today. Some of our teams said that sometimes something small changes in the data, and then they need to edit 50 figures in a Word document, which is very manual and takes a lot of time. Simply by using a YAML file that says, in effect, "this footer is the key: find that figure and update it with this new one," we could also automate how people update their Word documents. Again, that saves a lot of time and is much less error prone than someone deleting, copying, and pasting again. It's all about having a much more traceable way to make those changes.
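
The footer-keyed update can be sketched in Python against a hypothetical in-memory model of the document's figures; actually editing a .docx is not shown here. In the talk the footer-to-figure mapping lives in a YAML file; here it is a plain dict to keep the sketch dependency-free (with PyYAML it could be loaded via `yaml.safe_load`). The `Figure` class and `update_figures` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Figure:
    footer: str  # the footer text identifies the figure, acting as its key
    image: str   # path of the image currently embedded in the document


def update_figures(figures, updates):
    """Swap in new images for every figure whose footer appears in `updates`.

    `updates` maps footer text -> new image path. Returns the footers that
    matched nothing, so typos in the mapping are reported, not silently ignored.
    """
    found = set()
    for fig in figures:
        if fig.footer in updates:
            fig.image = updates[fig.footer]
            found.add(fig.footer)
    return [key for key in updates if key not in found]


figures = [Figure("Figure 1: Enrollment by site", "fig1_v1.png"),
           Figure("Figure 2: Dropout over time", "fig2_v1.png")]
missing = update_figures(figures, {"Figure 1: Enrollment by site": "fig1_v2.png",
                                   "Figure 9: Not in document": "fig9.png"})
print(missing)
```

Returning the unmatched keys is a design choice for traceability: the update run itself documents anything that could not be applied.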

We thought a lot about how we were going to open source this. It's a funny one, because it's very reliant on our company templates, so it's very specific to what we're doing, but equally we didn't want to keep it fully in-house. What we're working on now is a very generic document that shows how we developed our code. As I said, I don't think the code base is that extensive or complicated, but perhaps we can include a lot of our CI/CD as well, so people can take it, make their own version, and keep some of those practices embedded. We also want to do even more use cases; this is where our testing will come into play, and we'll see how the package changes shape.

But really, we hope this creates a much more enjoyable experience for people who want to produce reports this way, without us having to wait a couple of years to make these types of changes. That's what we're really excited about. I'm very grateful to the people I worked with at Novartis: a combination of people from different parts of Novartis coming to us saying, "Hey, change this; this needs to be better." When those people come to you, they bring such valuable knowledge, and you get great insight into how you can really help. And, as I said, people from compliance know how to get stuff done. Making friends with your compliance people and going together is really, really important, so that whatever you develop has a very good chance of reaching the people you want it to reach. It's always disappointing to find out at the end that you missed a step, or that it's going to take a while to edit an SOP or working practice, so it's good to get that done at the start as well.

Yeah, so thank you very much.

Q&A

Thank you, Orla. Quickly, is Joshua Cook here? If so, could you please come to the front? So, time for a couple of questions. The top one here is: are the reports compliant with accessibility standards, and if so, how do you implement those accessibility elements? Not sure if it's relevant for Novartis, but...

Yeah, no, I think it's a good question. Things like colorblindness for graphics are important. But anything further than that, I have to say, I don't think we have actively checked in the documents. It's a good question, though. I actually wrote it down because somebody raised it at the R in Pharma summit, so it's in my notes: are we doing that?

One more: how do you go about scaling this to preclinical work, or including other modeling results in such reports? What are the steps of the design thinking?

Yeah, I think it's very simple to include different results or different code. Where it's not always as scalable as we'd like is when the templates look very different, because there are some limitations in how dynamically we can work with DOCX and XML. So there's a bit of setup if the template is really different, for sure. But if you have some type of generic report, I think it will scale extremely well. The other thing is getting the people who want it involved, and I think that's a brilliant opportunity: they have different skills than I do, and I have different skills than they do. If people from different parts of your business are willing to adopt some of the software practices, it can almost become knowledge sharing as well. Ideally, getting folks to come in and be part of the development is how we can scale it.

One more quick one. I was thinking about how you work with your compliance people and what that looks like. You're writing code, but then you produce an output: you show it to them, ask whether it's good, and do some tweaking. Is that the cycle you go through with them?

No. I would say that's more statistician to statistician: they would work together and say, "Hey, I have this output. Can you double program it or verify it?" The process for what that looks like is then designed with your compliance person: what documentation do you need as evidence that you did that check? We also worked a lot with Moritz and said, "The working practice for sample size reports is currently like this. If we were to introduce R Markdown, what needs to change, if anything?" So she helped us a lot with the formal documents in our company, and she always has a solution for many things, so she's a great person to work with as well.

Yeah. Great. Thank you so much. Thanks, Luke.