Orla Doyle - Creating reproducible static reports

Transcript

This transcript was generated automatically and may contain errors.

Orla Doyle on creating reproducible static reports. Cool. Thanks very much. Cool. Thanks everyone. Really nice to be here all the way from Dublin. A good way to fight the jet lag is to have a talk in the afternoon because the adrenaline kind of overrides that.

So I'm going to talk about a process of automated reporting and I'm also going to talk a little bit about how we work in the pharmaceutical industry. Just to give you some insight into kind of what's a bit different about some of the ways that we work.

So these are my disclaimers. So these views are my own and don't represent the company that I work for.

But there's also non-disruptive change and I think non-disruptive change, it keeps you energized, it keeps your teams energized and it also prepares you for that big destination that you want to get to.

From outputs to Word documents — the copy-paste problem

And so this is kind of a very simplified version of how we work and again, I think we've seen that diagram, which is not that dissimilar in several people's talks today. How do we get from an idea to a result? And so again, we think about code, that could be the code that we're specifically writing or code that we're consuming from, you know, an open source package. We have a lot of metadata in how we work. We work with standards. So it could be, well, the footer of that table should look like this or it could be, you know, the structure of your data model should look a bit like this. And then we have the input data itself. So what are we going to consume in order to derive those outputs that we are so interested in?

We have this kind of magic stuff in the middle and these are our workflow managers, our execution models. I think we have lots of different names for these things, but these are things like make facilities. These are things that help us with reproducibility. They help us with audit trails and they help us to produce our outputs in a way that we feel far more confident that they are traceable, reproducible. The accuracy really is done by the human, I would say.

And we want to get to maybe two places. So dynamic outputs and static outputs. And we've done all this hard work, right? We've done all this hard work to make sure things are reproducible, to make sure we have the audit trail. And then we take the sledgehammer of Ctrl+C and Ctrl+V and we copy and paste things from a mystery location to a Word document, and that feels very fundamentally wrong given all the work that has gone before it. And it doesn't scale, it's error prone and it's a frustrating place to be.

So we have different options here. Well, do we just throw away our static outputs and we only do dynamic because we feel like we have a very reproducible pathway? Probably not. Probably not tomorrow anyway, at least maybe in the future. So we need to think about maintaining this pathway where we're generating static reports because that is what our end users want to consume. That's where collaboration often happens. So what do we do?

If we're impatient, what we can do is make an alternative pathway. So the feeling, the experience for the statistician or for the statistical programmer generating that report is fundamentally different, but the end consumer of that report has the same experience. So that enables us to make kind of this concept of stepwise or non-disruptive change so we can still have, you know, hope that we can improve on a much shorter timescale.

And it's really not just about the code and I'll repeat this again in the next slide. For us, the code that we're showing here is pretty simple and we're leveraging other packages like officer, but the process is what really helped us here, developing using good data science practices and bringing that forward with different stakeholders and generating what we call company compliant documents. If you're, again, maybe not from the pharma industry, you might say, what does that mean? Basically, we have very strict templates that are Word document templates that have very particular styling and very particular things that we need to put in certain places. So again, the manual editing of those documents is a slow and sometimes fussy process for us.

Sample size reporting — the use case

So I want to talk about just one use case that we covered in this package. So we're talking about sample size estimation and the reporting of that calculation. So before we begin a clinical trial, the very logical thing for us to ask is how many people should we recruit? How many people should be on the different treatments or the different arms that we're going to have in our study to ensure that the evidence that we generate is reliable, right? And there are lots of statistical methods that help us to do those calculations. And this is what our statisticians are doing, given what we know about the different studies that we're running. And they want to bring this together.

So how we went about this is it was actually our statisticians who came to us. So I sit within a team that makes tools for our clinical trials teams and they were saying, I really don't like this part of my job. It's frustrating. It's slow. It's a bottleneck. I feel like my work is not as reproducible as it could be. So we got them on board and we said, hey, let's think about this together. But let's also bring someone from our compliance department because we want to make sure whatever we do in the end is going to be something we can deploy and roll out. So let's get them on board from the very start as well. Let's design it together.

Let's leverage the good software development practices that are so common in our teams, but maybe a little bit less common for some of our other users. And let's not wait too long. So let's every two to three weeks show each other what we have and try to see, does it look like it's working? Is that going to be a better experience for you? And bringing all of those things together, I think, helped us to do this in a matter of, you know, weeks to get to a first version.

Building the package with R6 classes

So what does it look like? So we've done a lot of talking about what it might look like and I'm going to show you a couple of code snippets now and the outputs. So again, I think the goal to remember is people wanted to generate Microsoft Word documents, but never have to go to that product.

And this is really what we can do using a combination and for this particular use case, we used R Markdown. So we made classes, specifically R6 classes for every element of the report that we knew we wanted to create. And what that really helped us to do is to think really carefully about the structure. So classes, they have attributes, right? So they have elements and then they have methods and you could think about these really separately. So we could all kind of be like, I'll take that class, you take that class and you can work in a very nicely parallelized way and kind of a lot of your project management and your design is kind of done for you.

So by simply filling out and initializing this class and then doing our rendering in our getTitlePage method, we can actually produce our Word document title page in a way that is in the format that we expect it to be using all of the different fields that we wanted to have. So this was like a really exciting thing for us and this is energizing, right? People were like, oh, this is cool because I hate doing that faff and they were getting on board because they were excited.
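A minimal sketch of what such a class might look like, assuming the R6 package is installed. The class name `TitlePage`, its fields and the Markdown layout are hypothetical; only the `getTitlePage` method name comes from the talk, and the real package presumably emits company-styled Word content rather than plain Markdown lines.

```r
library(R6)

# Hypothetical title-page element; field names are illustrative only.
TitlePage <- R6Class(
  "TitlePage",
  public = list(
    title = NULL,
    author = NULL,
    date = NULL,

    # Fill in the attributes when the object is created.
    initialize = function(title, author, date = Sys.Date()) {
      self$title <- title
      self$author <- author
      self$date <- as.Date(date)
    },

    # Return the title page as lines of R Markdown text.
    getTitlePage = function() {
      c(
        paste0("# ", self$title),
        "",
        paste0("**Author:** ", self$author),
        "",
        paste0("**Date:** ", format(self$date, "%Y-%m-%d"))
      )
    }
  )
)

page <- TitlePage$new(
  title = "Sample Size Report",
  author = "A. Statistician",
  date = "2024-01-15"
)
title_lines <- page$getTitlePage()
cat(title_lines, sep = "\n")
```

Inside an R Markdown chunk with `results = "asis"`, the `cat()` call would write these lines straight into the rendered document.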

It's the same thing for the different elements within the report. We created classes for things that had some unique properties that were like, hey, can't just make a general table, but here we can make a changelog table that's really easy to populate, add rows and then we can render that again. And we did that for all of the different elements across our report and eventually render the entire document. So what's underneath the hood here is a single R Markdown file and you can render that and it will produce your Word document at the end.
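The populate, add rows, render pattern for a changelog table could be sketched as below. The class name `ChangeLog`, its column names and its method names are assumptions for illustration, not the package's actual interface.

```r
library(R6)

# Hypothetical changelog element; column names are illustrative only.
ChangeLog <- R6Class(
  "ChangeLog",
  public = list(
    rows = NULL,

    # Start with an empty table that has the expected columns.
    initialize = function() {
      self$rows <- data.frame(
        version = character(),
        date = character(),
        description = character(),
        stringsAsFactors = FALSE
      )
    },

    # Append one entry and return the object so calls can be chained.
    addRow = function(version, date, description) {
      new_row <- data.frame(
        version = version,
        date = date,
        description = description,
        stringsAsFactors = FALSE
      )
      self$rows <- rbind(self$rows, new_row)
      invisible(self)
    },

    # Return the table, ready to hand to a table renderer.
    render = function() {
      self$rows
    }
  )
)

log <- ChangeLog$new()
log$addRow("1.0", "2024-01-15", "Initial sample size calculation")$
  addRow("1.1", "2024-03-02", "Updated standard deviation assumption")
changelog_table <- log$render()
print(changelog_table)
```

Returning `invisible(self)` from `addRow` is a design choice that lets rows be added in a chain; a renderer such as `knitr::kable()` could then turn the data frame into a table in the report.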

The other very exciting thing is you could co-locate the document content with the calculation. So if you're using a package like rpact, which is commonly used in our company for general sample size calculations, everything's now in the one place and that makes us all feel really happy because if you want to go back to it, it's just one single document that is going to generate the Word document, but also reproduce the calculations should you want to do that and you want to revisit them.
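To illustrate co-locating a calculation with the text that reports it, here is a small sketch. It uses base R's `power.t.test()` as a stand-in, not rpact, and the inputs (a difference of 0.5 standard deviations, two-sided alpha of 0.05, 80% power) are made-up values chosen only for the example.

```r
# Stand-in calculation with base R's power.t.test(); the talk's teams use rpact.
# Assumed inputs: difference of 0.5 SD, two-sided alpha 0.05, power 0.80.
calc <- power.t.test(
  delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80,
  type = "two.sample", alternative = "two.sided"
)
n_per_arm <- as.integer(ceiling(calc$n))

# The same object feeds the report text, so numbers are never retyped.
sentence <- sprintf(
  "A total of %d participants (%d per arm) gives %.0f%% power to detect a difference of %.1f standard deviations.",
  2L * n_per_arm, n_per_arm, 100 * calc$power, calc$delta / calc$sd
)
cat(sentence, "\n")
```

Because the sentence is built from `calc`, changing an assumption and re-rendering updates both the calculation and the wording in one step.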

We also did a lot of logging here. Again, I think that makes us all feel pretty safe. When did I do it? Who did it? What kind of packages did I have? What was my session looking like? So again, we could do a lot of those logs so we could generate a lot of that traceability again to kind of make sure that we felt we were kind of keeping all of the good practices that we had seen when we generate an output itself.
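The when, who and which-packages questions can all be answered from base R. The helper below is a hypothetical sketch of such a log, with an invented function name and field names, and is not the package's actual logging code.

```r
# Collect basic provenance for a report run; field names are illustrative.
collect_run_log <- function() {
  info <- sessionInfo()
  attached <- names(info$otherPkgs)  # NULL when no extra packages are attached
  list(
    timestamp = format(Sys.time(), "%Y-%m-%d %H:%M:%S %Z"),
    user = Sys.info()[["user"]],
    r_version = R.version.string,
    attached_packages = if (is.null(attached)) character() else attached
  )
}

run_log <- collect_run_log()
str(run_log)
```

Writing this list next to the rendered document, for example as a text file, would give each report its own record of how it was produced.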

And we end up here. I know this is a little bit small, but basically this is it, right? We get a whole document. It's a single R Markdown file and we never have to touch Microsoft Word or go in and edit it. And that was really exciting, to get to that point, and I think again, it kind of brought in even more understanding of I can see now what all this open source stuff can do for me and it isn't different to what I used to do before.

Object-oriented design and assertions

But again, it wasn't just about the code, right? It was about thinking about how can we do this in a way that makes sense from a software perspective as well. So again, a report and at least in my mind, it really lends itself to being object-oriented. There's natural elements. You need to have different methods for them, different attributes, and then you can do a lot with assertions. So assertions are before I'm going to make the class, I can assert what I think the values of these attributes should be. So I can say I want my date to have this format. If you don't give it to me in that format, I don't let you make your class. You have an error and you cannot go forward. So that's a very kind of maybe strict approach. You can choose different flavors of how strict you want to be, but instead of then having to review a Word document to make sure the date looks like a date, you can do it upfront. So a lot of those things that we perhaps don't enjoy doing in our jobs, like a lot of that checking, we can automate that in our approach as well, which I think, you know, makes us able to focus on different things.
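A strict date assertion of the kind described could look like the sketch below, assuming the R6 package is installed. The helper `assert_iso_date`, the class `ReportHeader` and the choice of the YYYY-MM-DD format are all assumptions for illustration; the talk does not say which format or assertion library was used.

```r
library(R6)

# Hypothetical check: accept only real calendar dates written as YYYY-MM-DD.
assert_iso_date <- function(x, name = "date") {
  ok <- is.character(x) && length(x) == 1 &&
    grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x) &&
    !is.na(as.Date(x, format = "%Y-%m-%d"))
  if (!ok) {
    stop(sprintf("`%s` must be a valid date in YYYY-MM-DD format", name),
         call. = FALSE)
  }
  invisible(x)
}

ReportHeader <- R6Class(
  "ReportHeader",
  public = list(
    date = NULL,
    initialize = function(date) {
      assert_iso_date(date)  # fail before any field is set
      self$date <- date
    }
  )
)

header <- ReportHeader$new(date = "2024-01-15")

# A malformed date stops construction with an error.
result <- tryCatch(
  ReportHeader$new(date = "15/01/2024"),
  error = function(e) conditionMessage(e)
)
print(result)
```

A softer flavor would replace `stop()` with `warning()`, so that the object is still created but the problem is flagged.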

We also then as developers created a lot of tests for our package and had a CI/CD pipeline. So we had lots of different people wanting to come in and work on our package as well. We wanted to make that open to them, right? So we couldn't scale to do every possible report in the company. So as we want to work together, making sure that we have a lot of tests for every class. So if we change something, we know either the tests that we have pass, we need to add new tests, or we've broken something. So let's not put that to code review just yet. So that really helped us to have different people working on the package and making sure that we were able to test what we were doing as well.

Now and next

Okay, so now and next. So we've just finished successfully piloting this package for our sample size report in-house. We focused on two different scenarios. So what if someone was doing an amendment, if they'd never used the package before, what would that look like? Or if they were doing a brand new study, could they jump in and then what were the ways that they could jump in? And in both cases, this is where your compliance people are so helpful. Understanding if the report was different, if we were using a different version of the template, what will we need to document in terms of our process? Or does that look okay? So we had a very engaged compliance person, Maritza, to really help us there as well.

And we also have additional use cases that I didn't go through today, but some of our teams were saying sometimes something changes in the data and it's a small thing and then I need to edit 50 figures in a Word document and I don't want to do that. It's very manual. It takes me a lot of time. So simply by using a YAML file to say, hey, this is the footer, which was like the key, find that figure, update it with this new figure, we could also automate how people were updating their Word documents, which again is saving a lot of time and also is much less error prone than someone kind of going delete, you know, copy, paste again, right? So it's all about making sure that we have a much more kind of traceable way to make those changes.
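The YAML-driven update could be configured along the lines sketched below, assuming the yaml package is installed. The keys `figures`, `footer` and `file`, and the example paths, are hypothetical. The sketch covers only reading the config into a lookup table; the step that finds each figure in the Word document and swaps the image is not shown.

```r
library(yaml)

# Hypothetical config: each entry pairs the footer text that identifies a
# figure in the Word document with the image file that should replace it.
config_text <- "
figures:
  - footer: 'Figure 1: Power by sample size'
    file: outputs/power_curve_v2.png
  - footer: 'Figure 2: Enrolment over time'
    file: outputs/enrolment_v2.png
"
config <- yaml.load(config_text)

# Turn the config into a lookup table of footer key -> replacement file.
plan <- data.frame(
  footer = vapply(config$figures, function(f) f$footer, character(1)),
  file = vapply(config$figures, function(f) f$file, character(1)),
  stringsAsFactors = FALSE
)
print(plan)
```

Keeping the mapping in a file means the list of changes can be version controlled and reviewed, which is where the traceability comes from.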

We thought a lot about how we were going to open source this. It's kind of a funny one because it's very reliant on our company templates. So it's very specific to what we're doing, but then equally we didn't really want to just kind of keep this fully in-house. So what we're working on now is just having a very generic document and showing how we developed our code to do this. As I said, I don't think the code base is that extensive or complicated, but perhaps what we can do is include a lot of our CI/CD as well. So people can kind of, you know, take this, make their own version, but kind of keep some of those practices embedded. And we want to do even more use cases. So I think this is where our testing will come into play and we'll see how we can change the shape of it.

But really we hope this is going to create a much more enjoyable experience for people who want to produce reports in this way without us having to wait a couple of years to make these types of changes, right? So that's what we're really excited about. And I'm very grateful to the people I worked with at Novartis. A combination of people from different parts of Novartis coming to us saying, hey, change this, this needs to be better. And you know, when those people come to you, they bring you such valuable knowledge and you get like such great insight into how you can really help. And then also, as I said, people from compliance, they know how to get stuff done. Making friends with your compliance people and going together is really, really important so that whatever you develop has a very good chance of getting to the people you wanted to get to, right? It's always disappointing to find out at the end that perhaps you missed a step or it's going to take a while to edit an SOP or working practice. So good to get that done at the start as well.

Yeah, so thank you very much.