Resources

Ari Siggaard Knoph @ Novo Nordisk | Data Science Hangout

video
Oct 5, 2023
1:00:43


Transcript

This transcript was generated automatically and may contain errors.

Happy Thursday, everybody. Welcome back to the Data Science Hangout. If this is your first time joining us here today, it is very nice to meet you. And thanks for spending your Thursday with us.

If we haven't met, I'm Rachel, I host the Data Science Hangout, of course, and also lead customer marketing at Posit. This is our open space to chat about data science leadership, questions you're facing, and getting to hear about what's going on in the world of data across different industries. So we're here every Thursday at the same time, same place. So if you're watching this recording on YouTube later, the link to add it to your calendar and join us live here will be in the details below.

Also put a link in the chat, just a little like short link, if you ever want to share the Hangout with friends, too. But together, we're all dedicated to making this a welcoming environment for everyone. And we love to hear from everybody, no matter your years of experience, titles, industry, or even languages that you work in.

And so every week at the Hangout, I'm joined by a different leader from the community who joins us to share their experience and answer questions from you all. So if you're here for the first time, let me just give you a little rundown of how this works. It is totally okay to just listen in here. But there's also three ways that you could jump in and ask questions or provide your own perspective on certain topics, too.

So you can jump in by raising your hand on Zoom, and I'll monitor that. You could put questions in the Zoom chat. And just put a little star next to it if it's something you want me to read out loud instead, if you're maybe in a coffee shop or something. And then third, we have a Slido link, which Hannah just shared right when I said that. Thank you. Where you can ask questions anonymously as well.

With all that, thank you so much for joining us here this Thursday. I am so excited to be joined by my co-host, Ari Siggaard Knoph, International Lead Programmer and Statistical Programming Specialist at Novo Nordisk. And Ari, I would love to have you introduce yourself and maybe pronounce your name better than I did, and share a little bit about the work that you do, but also something you like to do outside of work, too.

Definitely. I think you did it pretty well. It's Knoph, that's my last name, and it's probably tricky because we're from Scandinavia.

Yeah, as you said, I'm an International Lead Programmer at Novo Nordisk, which means I'm the responsible programmer for all of our programming deliverables on clinical trials. I'm responsible for writing up documents for the authorities and interacting with them whenever we have interactions. And, in the end, I'm the person primarily responsible for our data packages and the code that goes to the authorities.

And yeah, outside of work, I really love spending time with my wife and kids. I have two sons, and friends and family as well. My oldest son recently picked up golfing. He's only five and a half, but I used to play when I was younger, so we've already been spending a lot of good time on the range. And I really like to play a lot of guitar as well, so that's where the rest of my time goes.

The NDA submission milestone

That's awesome. Well, thank you. It sounds like your son's going to be a superstar, already starting at five.

He loves it, so that's the most important thing, yeah.

Well, I get to start off with the questions here as we wait for everybody to start jumping in. I know in a few weeks you're going to give a presentation here at Posit, too, and you've reached an incredible NDA milestone. Could you explain a little bit about what that means and what you're going to be talking about in a few weeks?

Yeah. So what's really special about this is that in the past few years, whenever we've been at conferences, we've heard so much more about R emerging within pharma, and people are starting to use it more in these GxP-controlled environments. And we've been working towards setting up a GxP-approved environment and really...

And sorry to interrupt: you used an acronym, and I forgot, as you were saying it, what it actually stands for. If we use acronyms, do you mind just explaining what they are too?

Of course.

Okay, cool. Thank you.

Yeah, it's very hard in pharma because there are so many abbreviations, so it's hard to keep talking about a subject without mentioning some of them, but I'll do my best to explain.

No, so what's really special about this milestone is that we've primarily used a proprietary language to do most of our programming within pharma. And in recent years, people have been opening up much more to using open-source languages. That comes with a lot of challenges when you need to use them in a heavily regulated environment such as pharma, because there are a lot of rules and standards you need to layer on top of your processes in order to comply with all the regulatory authorities, not only in the US but around the world. So there are a lot of demands on how we actually produce code and submit code and data to authorities.

So we've been working on it for a while. We started almost seven years ago, just trying to do something with R, and that has really grown into now having an NDA submission where we have tables, figures, and listings done in R. We have some analysis datasets done in R. We have packages we have submitted, and so on. So really the full monty of doing this. And the reason why it's so special is that the number one question, at least that I've heard when we've been at conferences, is whether anybody has done it before.

And the predecessor to what we've done is really the work from the R Consortium's R Submissions Working Group, where the FDA has also played a huge role in giving input on what's possible when submitting R packages and R code to the authorities. So for us, it's really special that at least now we have an answer, that we know that we have done it, and hopefully others will do it soon in the future as well.


Music, self-teaching, and collaboration

And I was wondering, how has music carried over for you into your data work, or do you think that it plays a part?

I think... I mean, I'm a self-taught musician. I learned it off YouTube, back when YouTube was almost brand new; I spent so much time learning stuff off YouTube. And it's actually the same with most of my R skills today. I studied mathematics at university, but it was the chalk-and-blackboard type of mathematics; we weren't doing that much programming until the end. So a lot of the skills I use today, especially around R, come both from what the community is putting out there in terms of educational material and from the many people putting stuff up on YouTube to learn from.

And then there's also the collaboration side. When you play music with people, you're kind of in sync with them, and not to make a cheesy analogy, but it's similar when you're developing code with people. I remember seeing almost a meme about a git commit history looking like a Guitar Hero track, because there's just so much branching and so on. So I think there are some great opportunities to co-create stuff with people, both in music and in code, yeah.

Collaborating with version control

That's great. How do you actually collaborate with your colleagues in code?

So it's not that long ago that we, at least where I work, moved into using version control for most of what we do. Even though we have been doing statistics and programming for quite a while, moving an entire organization into learning git and working with repositories has been quite a challenge. But I think we've been quite successful in lowering the barrier by wrapping a lot of stuff up into other functions, so that you don't need to think about things like how to set up a repo. We primarily use Azure DevOps for it.

How do I set up a repo? How do I set up a pipeline? How do I make sure that I have the upstream set? All of those things are wrapped away, so it's very easy for people to get their heads around commit, push, pull, those three commands. Everything outside of that is a little bit more advanced. So we use that internally. And then externally, I'm also collaborating more with people through the pharmaverse, where we use GitHub and keep up with that as well.
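As a rough illustration of that "lower the barrier" wrapping idea (not Novo Nordisk's actual internal tooling), the three everyday git verbs could be bundled into friendlier functions using the open-source {gert} package; the function names here are invented:

```r
# Hypothetical convenience wrappers around everyday git operations,
# in the spirit of hiding git details from new users.
# Built on the open-source {gert} package; names are illustrative only.
library(gert)

save_my_work <- function(message) {
  changed <- git_status()$file   # list files with uncommitted changes
  git_add(changed)               # stage them
  git_commit(message)            # commit with the supplied message
  git_push()                     # push to the configured upstream
}

get_latest <- function() {
  git_pull()                     # fetch and merge from upstream
}
```

A new user then only needs `save_my_work("update tables")` and `get_latest()`, without thinking about remotes or upstreams.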

So when we were testing out which product to use, we first tested GitLab, but we found we had much more freedom in Azure DevOps, also because we wanted to try to do some of our project organization a little bit differently: the way we organize programs, who should do what, and how we discuss these things. And GitHub doesn't have this project organization to the same extent. It does have Kanban boards, but it doesn't have the same level of interactivity with work items and what you can do in terms of manipulating them with pipelines and APIs and so on. And also, Novo is a major Microsoft customer, so it was right up our alley to go with that product. So we don't use GitHub Actions, unfortunately. We use Azure DevOps and write a lot of YAML files with pipelines and bash scripting and so on.

What the biostatistics team does

Ari, I was thinking maybe in the beginning here, it would be helpful for some of us who don't come from a pharma background to understand a little bit more about a specific use case, or maybe you could walk us through a project that your team might work on.

Yeah, so I sit in the biostatistics department, which is really the evidence-generating department of the drug development cycle. We help run clinical trials. The people in the chain right before us are data management; they set up all the systems to collect data, and then we get the data in biostatistics. And not too long ago, a set of data standards called CDISC (Clinical Data Interchange Standards Consortium; sorry, it's a very long acronym) was rolled out. It's a data standard for collected data and analysis data.

So we create these datasets. They could be based off lab data, bioscience data, or specific measurements, depending on the disease area that you're working in. And this is where we use R; we use SAS for it as well. We run the trials in Git repos, and on top of this, what we're trying to do now is build tables and figures with R, but also Shiny applications on top of that.

One of the first selling points for us to really get R into the organization was creating Shiny apps. We had a very tedious process for reviewing pharmacokinetic data, where people were entering meetings with a 100-page PDF document of pharmacokinetic curves. They had all reviewed them before the meeting and written comments, then you would go through everything, and there were modeling outputs as well. It just seemed like such an obvious use case to move into a Shiny app. And that was one of the first things that we really sold to our stakeholders.
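As a rough sketch of the kind of app described (not the actual Novo Nordisk application; the data and column names are invented), a minimal Shiny review tool replacing a long PDF of curves might look like:

```r
# Minimal illustrative Shiny app: review one subject's PK curve at a
# time, with a free-text comment box instead of notes on a printed PDF.
library(shiny)
library(ggplot2)

# Toy concentration-time data; real trials would read validated datasets
pk <- data.frame(
  subject = rep(c("101", "102"), each = 5),
  time    = rep(c(0, 1, 2, 4, 8), times = 2),
  conc    = c(0, 12, 9, 5, 2, 0, 15, 11, 6, 3)
)

ui <- fluidPage(
  selectInput("subj", "Subject", choices = unique(pk$subject)),
  plotOutput("curve"),
  textAreaInput("comment", "Reviewer comment")
)

server <- function(input, output, session) {
  output$curve <- renderPlot({
    ggplot(subset(pk, subject == input$subj), aes(time, conc)) +
      geom_line() +
      geom_point() +
      labs(x = "Time (h)", y = "Concentration")
  })
}

# shinyApp(ui, server)  # run interactively
```

The real value, as described in the interview, comes from the modules built around this core: shared comments, seeing other reviewers' annotations, and modeling outputs side by side.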

So we do do that, but our bread and butter is creating these datasets, doing statistical analysis, and doing summary statistics: descriptive statistics of adverse events, lab results, and so on. And then at the end, we wrap all of these things up into a submission dossier, where the medical writers write up documents and send them to the authorities. And there are now at least three health authorities around the world who require that we send our data and the code used to produce both the data and the tables and figures.

Pitching Shiny apps to management

Thank you. So, did you basically say to them, hey, you don't have to go through this 100-page document, here's this app?

Yeah, yeah. When you have a project like that, you always tend to want to do too much in the beginning. You want to automate everything; we wanted to have automatic minutes based on the comments people were making. But essentially, in the beginning, it was just: you don't have to look at it in a PDF, you can look at it on the screen, like we're doing here, right? And then we built out the modules around communicating in the app, seeing what other people have done, and so on.

Travis, I see you had a question in the chat a bit earlier, you want to jump in?

Sure. So I know Novo's been working on this OpenStudyBuilder solution, which is more focused on data capture. For those not in pharma, that's electronic data capture: EDC systems are the front end that faces clinicians, or their research assistants, or whoever's entering data, and these are dominated by a couple of giant and, let's say, legacy-looking software providers. This OpenStudyBuilder looks really interesting, but it's very much Neo4j and kind of in the weeds on that side. I'm wondering what the connection to your side of the house looks like on the biostat side. Are they plugging it into R, do you have good APIs, or is that kind of future state?

Yeah, so just to give a little bit of history: we have had a system for metadata management for quite some years. Whenever we start up a trial, we build the entire trial design, with all of the lab parameters and so on, and that builds a metadata infrastructure for when we need to run the trials. We also put in metadata to use in the actual outputs, tables, and figures. And we found that works really well when you're just developing within our own small part in biostatistics.

What we really want to do is, let's say I have metadata about some parameter definition, and they're writing up a protocol where they want to use that parameter. The Open Study Builder will kind of be like the one source of truth for everything that's connected to data in some way or another. That's why they also built it on top of this graph technology from Neo4j. So, what we will use it for is writing protocols where you have topic-based text that you just pull out. It's to build a much more coherent and metadata-driven full pipeline around the Biostatistics and the deliverables we do.

So, we've been using metadata for many years, but we want to reuse a lot of that information in the documents that we write. Because a lot of projects today are kind of making up their – well, not making up their own, but they're slightly changing up the definition of things when they write it in protocols or a data reviewer's guide or whatever it is that we're handing over to authorities. So, it's to build a system to manage all of that. And for us, it will be API calls internally.

The same way we have a master model of our metadata for, for instance, SDTM study data and analysis data, we also have a master model of all the metadata datasets that go into producing both those datasets and the tables, figures, and listings. So we're betting really hard on having a fully metadata-driven pipeline, to the extent that we can do it.

Getting into pharma and career background

Thank you. I'm going to jump over to a few of the anonymous questions, too. And one was, could Ari share a bit about how he got into the pharma industry?

Yes. So, as I said, my background is in mathematics. I was doing topology and high-dimensional geometry. And at some point during the master's part of my education, I was thinking, who needs this out in the real world? So I shifted towards doing more statistics and more programming, and I learned that high-dimensional geometry is not that different from n-dimensional feature spaces. I got really caught up in learning about machine learning, and I did my master's thesis on prediction modeling with MRI data.

So I was really intrigued by that intersection between medicine and technology: what could you do in that field? I was looking for places where I could explore that more. At that time, Siemens was the one creating all the MRI machines, but I don't think Siemens was here in Denmark, where I live. So I ended up applying for a job at Novo, started here straight out of university, and haven't left since.

Package development at Novo Nordisk

So, I saw a really interesting presentation from Novo Nordisk about the, you know, there's essentially a pharmaverse of R packages that you're using. And you have packages for everything, basically, which is amazing. So, are those packages developed, like, from scratch within Novo? Or are those kind of Novo wrappers around other things?

So they're primarily wrappers around tidyverse stuff, because there are internal standards for pretty much anything you could think of. Some of them exist to make it possible to work with R within our environment: the way we store data, the way we export things. And, as I said, utility functions to work with Azure DevOps, to work with Git, to work with all these different systems that we have internally.

Some of it is layering on top of this metadata to do things smarter. Some of it is building, for instance, a package for creating tables that are specific to the Novo way of styling tables, and so on. But one thing that has kind of caught up with us is that they were developed without looking too much at the open-source world. So now, when we actually want to do more with the pharmaverse, where multiple companies are working together, it's really hard to separate the Novo part from the real functionality of the package. We're working on that now.
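A toy example of the wrapper pattern being described, layering an invented house style on top of open-source tooling (here the {gt} table package; the styling rules and function name are made up for illustration):

```r
# Illustrative "house-style" table wrapper: the statistics come from
# open-source {dplyr}, the presentation rules from a company convention.
# Everything company-specific here (name, rounding rule) is invented.
library(dplyr)
library(gt)

company_summary_table <- function(data, by, var) {
  data |>
    group_by(.data[[by]]) |>
    summarise(
      n    = sum(!is.na(.data[[var]])),
      mean = mean(.data[[var]], na.rm = TRUE),
      sd   = sd(.data[[var]], na.rm = TRUE)
    ) |>
    gt() |>
    fmt_number(columns = c(mean, sd), decimals = 2)  # hypothetical house rounding rule
}

# Example use with a built-in dataset:
# company_summary_table(iris, by = "Species", var = "Sepal.Length")
```

Keeping the general-purpose logic separate from the company-specific styling is exactly what makes such wrappers easier to disentangle and open-source later, as the interview notes.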

It's an interesting design issue then, you know, because I can understand that building wrappers on existing things to tailor it for your needs is a really neat way of doing things. But as you say, it kind of then locks you into a particular way of working that maybe, you know, has problems down the line. So, there are probably pros and cons, right?

Exactly. And in recent years, so many companies have opened up much more about how they're actually solving things, and we're doing that as well. We really want to contribute to keeping those lines open. So we're definitely trying to pivot some of our packages: either committing to some of the open-source packages that are out there, or pivoting our own packages to see if they can make it into the open-source world.

Transitioning from SAS to R

So I'm a little bit curious about the transition Novo Nordisk made. The first question is about trying to prove that the pharmaverse, or using R, was just as good as SAS programming. I know there's always a second reader on the data, so did you have one person using R and the other person using SAS, or how did that go? And the other question I have is, I've seen some reports of R and SAS results coming back just slightly different. Nothing that would change a decision or anything, but just slightly different. So how do you address that when you're mid-transition?

To speak to your first question: our primary goal has not been to replace SAS with R. We are really aiming for a much more language-agnostic world, where programmers use the language that they're most comfortable with. In terms of getting it into the organization and having people use it, what we really wanted to prove at the start was that it was possible to make a real deliverable with R, where we make the tables and figures in R. So there we did the actual deliverable in R, and we did the parallel programming in SAS to make sure that it matched what we would expect to get in SAS as well.

And then down the line, I remember one of my good peers, Michael Rimler from GSK, talking about their program called R for Validation, where they rolled out that everybody was allowed to use R for validation. Building on top of that, we thought, well, they're already using it for validation, so why not try to use it in production, on a very small scale? And today we have all kinds of mixtures of how people approach it. Whether they do the primary program in SAS or R is not that important anymore, because we've built up an environment that supports multilingual execution.

So we expect that there is a difference. Many of the compare packages out there let you set a criterion for how many decimal places the numbers should agree to. And when you're storing floating-point numbers, there will always be a small difference, even with the same number within the same system. What we have really been preaching internally is conclusions over numbers. Of course, if you see something that's totally off, you're going to investigate what happened there. But when we set these two systems up against each other, the question is: are we still looking at the same conclusions for these adverse events, or this lab parameter, and so on? That's been our take on it.
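In R, this kind of tolerance-based comparison can be sketched with the base function `all.equal()`; the numbers below are invented for illustration:

```r
# Comparing outputs from two systems within a tolerance, rather than
# demanding bit-identical numbers. Values are made up for illustration.
sas_result <- 12.3456791   # value as exported from one system
r_result   <- 12.3456789   # value computed in the other

# all.equal() compares within a (relative) tolerance
isTRUE(all.equal(sas_result, r_result, tolerance = 1e-6))
#> TRUE

# An exact floating-point comparison would flag a difference
sas_result == r_result
#> FALSE
```

Dedicated comparison packages expose similar tolerance settings; the point is that the acceptance criterion is agreement to a chosen precision, with anything outside it investigated by hand.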


Pitching Shiny apps and enabling data-driven decisions

Jared said: definitely interested in knowing more about how you pitched to your upper management to use Shiny apps. Mainly I get pushback because of validation concerns, versus the standard 200-page PDF files.

Yeah. I would say, luckily, our management has been very open; it's almost expected that you try to solve problems in a new way. That's kind of what we've been raised with here, or at least I've been raised with it here. Of course, there are a lot of technical aspects around that. But we did the same thing that you would do with any other innovation product: we talked with the people who are supposed to use it. Most of them said that it was a really good idea. And then we said to our management, everybody thinks this is a great idea, right? And we think we have the skills to build it.

And one good thing: it's always good to apply a little bit of action to an idea before you go searching for opinions, because opinions can quickly hold you back. We had already applied some action and built something around it. So you're coming to them with something where they almost can't say no, or at least where the answer is, can we test this out in one or two trials?

And then you make sure that all of the downstream processes around this are still in compliance with standards, and that we're still able to deliver everything we're supposed to. And hopefully people will be a lot happier. There's also CO2, right? We really want to save some trees as well.

So that's been a major selling point for us in terms of Shiny apps. Another thing that I think most companies say today is that they really want to be data-driven. And if you want to hold people to that word, our community is probably one of the parts of the organization with the most enabling power to give to people who don't necessarily have data science skills. If you can make somebody higher up in the organization realize that you can let so many people do their jobs in a much simpler way, just by going into a Shiny app, or maybe even just looking at automated reports on Connect, or whatever it may be, you can enable so many people just by doing that. So yeah, that's been our selling point for trying to get all of this into our processes.

Advice for those entering the pharma industry

Hi, Ari. So my background is in mathematics, and I'm now moving into bioscience. I'm currently doing my PhD, and I'm going to finish in a couple of years, and industry just seems like this massive unknown, where, as an academic, I don't really know what is wanted. Do you have any advice on things to learn or technologies to learn to make yourself more competitive in that environment, specifically in bioscience?

Yeah, I can only really speak for my own company. There are, of course, some technologies that we're moving more towards. It helps if you know any of the open-source languages. Recently, when we have candidates, we also look: are they on GitHub? Have they done something interesting? Can they display a passion for what they're doing? For us now, we're also putting in our job ads that it's almost more important that you're willing to learn new stuff. Of course, there are some jobs, also in the pharma industry, where you have to have a specific degree to do that job, but for development jobs, for instance, we're looking much more at that.

You have a passion for doing it, we can see that you have done it before, and when we speak with you, you know your way around, for instance, package development and which packages you use to do it. So it's much more a display of interest now. A lot of companies today do interview questions or exercises beforehand, and if you're applying at a pharmaceutical company, they might ask about a disease. Maybe you've been given some data on some disease: what does this mean? Show that you're interested in what it is they're trying to do in pharma. But keeping up with technology is a really good thing to have on your resume.

When to create a package

I'm going to jump back to the conversation about packages for a second here. I saw there was a Slido question that said: I'm curious, from Ari's experience of writing packages, when would it be considered overkill to create a package, and when is it justified?

Well, it depends. Of course, I'm sitting in an organization where you know that a lot of people are doing the exact same thing, maybe just a little bit differently than you are. Most of the things that we do every day are things that at least 200 other people are doing. And that's a really strong rationale for making a package around working specifically with tables, or Git, or repos, and so on.

But if you're working for yourself or in a small group: my wife is a PhD student currently and has also run a small clinical trial, and you quickly learn that you can gain so much from setting up functions for yourself for tasks you repeat over and over again, like reading in small datasets or formatting something that you want to output later on. So I think there's probably no package too small if somebody benefits from it, especially yourself. But in our case, it's pretty easy to know when you're supposed to make a package. Most things, and this refers back to an earlier comment, end up in a package, because we expect that many people will use them.
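As a sketch of the kind of tiny personal helper being described (the file format, column conventions, and cleaning steps here are invented), even a few lines can pay off when repeated across many datasets:

```r
# Hypothetical personal helper: read a CSV and apply the same cleaning
# every time, instead of repeating these steps in every script.
library(readr)
library(dplyr)

read_lab_data <- function(path) {
  read_csv(path, show_col_types = FALSE) |>
    rename_with(tolower) |>                  # consistent column names
    filter(!if_all(everything(), is.na))     # drop fully empty rows
}

# Usage: labs <- read_lab_data("lab_results.csv")
```

Once two or three such helpers exist, moving them into a small package gives you documentation, testing, and easy reuse, which is the threshold the answer above gestures at.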

Sharing knowledge and making work visible

How are you actually helping share those packages across the organization or helping share what other people are doing in different teams so work doesn't, I guess, get duplicated? Do you have community groups?

Yeah. So one thing that was running, and I think it's still running, is an initiative called Radar, where everybody could post ideas, not just for code development but for process development, or let's get new coffee in the coffee machine, or everyone who starts should get a water bottle. All these different things. In terms of code development, moving into Azure DevOps has been a way for us to, first of all, make all of that work visible. Before, I couldn't necessarily see the code and programs from another project, and we've tried to really open up and make all of that work visible, in a help-yourself way: if I want to see what other people are doing, I can now just see all of their code. It's the same with package development, now that all of that development, and organizational development, is available for everybody to see. So that's the way we've done it, yeah.

Python, Shiny, and future language plans

And I think the question was, do you see Python for Shiny as something you would use for your use case in the future, or now?

I think there will probably be... so I'm not that well versed in Python, but there will definitely be some use cases for Python for us as well. We have a huge SAS infrastructure, and now we've also built a huge R infrastructure, and as we start out with Python, you have the reticulate package and so much interoperability between R and Python that there might be many cases where you use either a Plumber API or reticulate to get the best of both languages. So even if you were, at some point in the future, to do a full trial in Python, you would maybe just call some of the R infrastructure underneath it, because those two languages are now so easy to switch between.
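A minimal sketch of that interoperability, using the {reticulate} package to call Python from R (assuming a Python installation with NumPy is available):

```r
# Calling Python from R via {reticulate}: the Python side computes,
# and the result comes back as an ordinary R value.
library(reticulate)

np <- import("numpy")          # import a Python module into R
v  <- np$array(c(1, 2, 3))     # build a NumPy array from an R vector
as.numeric(np$mean(v))         # Python computes the mean; R receives it
#> 2
```

The reverse direction works too: a Plumber API or reticulate's R-from-Python tooling can expose existing R infrastructure to Python code, which is the scenario described above.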

But I love the quote. I saw Joe Cheng's keynote about Shiny again not too long ago, I think it was last year, where he says that Python is the second-best language for everything. And when you're looking at doing production-grade, enterprise-level apps or systems, built in R and Shiny and also other technologies, I think Python will be part of that. That's also what we've recently hired for: we want Python developers to influence some of the stuff we're doing in R that might be even better to do in Python. So it's definitely coming, I would say.

Career lessons and the "no side projects" rule

Thank you. So, a question that I love to ask leaders in this space: could you share an important lesson that you've learned over your career, or maybe a career lesson that just stands out to you?

I think there are two things that come to mind. One of them, and this is just my personal belief, is about doing new, fun stuff that you think will actually help somebody at work. I try to follow a rule of having no side projects. Usually you're doing your work, and then you have this side project that you're really excited about, something you want to build up, but everybody knows that the actual work is what consumes all of your time. So having no side projects means that if I do have a side project, I try really hard to make it the thing I'm actually supposed to do. The same with the Shiny app we talked about: this is how the process should be, this is how we want to work with it.

And also now with rolling out R: we wanted to do more R development, and we can also see now that we can interface with so many different things. But for the people coming into especially my project, I couldn't just tell them, now we're running these five trials, everything is in SAS, and on the side you'll have to learn R, because in two years' time we're going to do a trial in R. So I've really tried to use my power in that sense, that I really wanted to change their environment so that it's not a side project anymore. I want you to learn this stuff on the job, and we'll try our best to support you in doing that. Because I mean, there's also some logical aspect of working towards a deadline with something you know will be used by somebody, so you try a little harder. But it's also just like, whoa, this is really new and exciting and there are people here to help me do it. And people just grow incredibly fast if you let their side projects be their main projects really.

People just grow incredibly fast if you let their side projects be their main projects really.

And another thing, one of the things that made the biggest splash in terms of new technology we could use, it's not the Shiny apps and it's not using R in other processes. So whenever we have a trial ending, there might be some sensitive or confidential information that you need to keep secret until you do a press release with the results, and everybody who has access to those data needs to be on an insider list. What we previously did is that the responsible programmer, so the programmer in my role, tracks who has access to the data, and then we generate a list that we send to the legal department, and they put those people on the insider list. So it's this long chain, and up until we do the release they're asking every day, maybe twice a day, for this list: did anybody go off of it? Did anybody new get access to the data?

And I remember thinking, I don't want to do that every day. I don't want to sit and figure out who has access to the data. And luckily, with R we could now query the server that we had the data on for accesses, and I just wrapped that into a data table in an R Markdown script, put it on a Connect server, made it open for them, and made it refresh itself every six hours. And so many people who have technical roles that aren't really touching data in that sense, so many people now getting automatically onto the insider list who may not have been on it before. And there were so many people outside of my area who were calling me, like, what is this thing, that I'm now on this list? Not because they wanted to do anything bad with access to the data, but it just created such a huge splash that we were now able to automate this part of it. So many people were affected by something that we had automated that it really resonated far out into our stakeholders.
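The logic of that automated report is simple to sketch. His actual implementation was an R Markdown document with a data table on a Connect server querying the data server's access log; the following is a minimal Python sketch of the underlying idea only, and the record fields and names are invented for illustration.

```python
from datetime import datetime

# Hypothetical access records, as they might come back from querying the
# data server's access log (field names here are made up for illustration).
access_log = [
    {"user": "alice", "granted": datetime(2023, 9, 1), "revoked": None},
    {"user": "bob",   "granted": datetime(2023, 9, 3), "revoked": datetime(2023, 9, 10)},
    {"user": "carol", "granted": datetime(2023, 9, 5), "revoked": None},
    {"user": "alice", "granted": datetime(2023, 9, 7), "revoked": None},  # duplicate grant
]

def insider_list(log, as_of):
    """Everyone with access to the trial data at `as_of` belongs on the insider list."""
    return sorted({
        rec["user"]
        for rec in log
        if rec["granted"] <= as_of
        and (rec["revoked"] is None or rec["revoked"] > as_of)
    })

print(insider_list(access_log, datetime(2023, 9, 12)))  # ['alice', 'carol']
```

In the real setup the equivalent of this function re-runs on a schedule (every six hours, in his case) so legal always sees a current list instead of requesting it twice a day.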

And I mean, we thought it was pretty cool that so many people now automatically got on the list, and that's been one of the biggest splashes: we automated something that actually created value for others. So that was pretty cool, yeah.

I love that. It's really cool when you get to see all of that feedback from everybody right away. Just yesterday I did something really small, where I added a download-CSV button to a table in a Quarto doc, and I realized right away that it made that report so much more impactful to somebody, that they could download it.

Submitting Shiny apps to regulatory agencies

Niels, I see you just put a question here in the chat. Do you want to jump in, or I can read it? It is: what is needed to submit a Shiny app to a regulatory agency? What are the requirements that they are looking for?

So I might not be the best one to answer this one, because it's not something we have done. The only thing I know is that the R Consortium working group for R submissions has some really great people working on getting Shiny apps to the FDA, and there are many aspects of that, like how do you read the data into a Shiny app in a system where you don't know where the data is, right, and how do you spin up something that can run a Shiny app. There is also an association for statistical programmers in pharma called PHUSE, which has had a working group around interactivity and submission for quite a few years, where they were talking about how, if everybody is submitting Shiny apps, the FDA will never cope; there's no common thread around, say, this is how we do data imports in Shiny. So if everybody is making up their own UIs for how you analyze data, I think the FDA will have a hard time going through all of those Shiny apps.

So they have been working a little bit on whether we should try to set up some standards around submitting something like a Shiny app to the FDA. And I think you could even do it with less: if you were doing just an HTML file, if you're using flexdashboard to do stuff, I think that could go a really long way. And I do believe the FDA has a file format specification document over all of the allowed files, and I'm pretty sure .html is on that one. So you could probably get pretty far with doing something in Quarto or R Markdown.

Validation, dual programming, and environment control

Yeah, I know this is kind of a hot-button topic that a lot of groups are getting into, but how do you guys approach the issue of dual independent programming and validating your SDTM, ADaM, and TLF data sets? Was it more that one side used SAS and the other used R? Or, if you both used R, when it came to package and environment versioning and validation, to provide that full traceability to the FDA, how did you guys approach that infrastructure behind the scenes, if you don't mind me asking?

Yeah, so one of the things I think I can mention first is that for quite some years we have been working with our standard operating procedures and making them easier to work under. We have one we call the custom programming SOP, standard operating procedure, where we talk about the different review levels that we're applying to custom programs, as we call them, and there we introduced a self-review level, a peer-review level, and then a double-programming level. And we're actually trying to do less and less double programming. For many years we've kind of had our ears blown full of risk-based approaches to many things, so we've been trying to double program only the really most essential parts: the primary analysis and the most important analysis data sets. Most people who do double programming have done it in SAS, but most people are doing it in R now, so we're also doing R versus R.

And then in terms of doing those TLFs, in order to encapsulate what you're doing on the trial, we're running renv to control the local environment, and that's also why we needed those trials to use Git, because we need to track that local environment. And then if you step one further out, into how we're getting packages into the organization, we use Posit Package Manager and we apply the risk-based approach from the R Validation Hub. We do risk assessment of packages, and some of my earlier colleagues have worked very hard on automating that to some extent. If you want a really good example of how that can look, I think it's Aaron Clark from Biogen who has created a risk assessment app, which is built on the riskmetric package, and we use that as well for our internal risk assessments. And then for the environment itself, the Workbench and the Package Manager, we have had some people from IT help to bring it up to a GxP-validated environment that we can actually run trials in. Yeah, I hope that covers it.
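The risk-based idea he describes, scoring a package on evidence of quality before allowing it in, can be sketched briefly. The real riskmetric R package and the Biogen app are far more thorough; in this toy Python sketch every metric name, weight, and threshold is invented purely to illustrate the shape of the approach.

```python
# Toy sketch of a risk-based package assessment, loosely in the spirit of
# the R Validation Hub approach. All metric names, weights, and cutoffs
# below are hypothetical, not taken from the riskmetric package.
METRIC_WEIGHTS = {
    "has_tests": 0.3,         # ships a test suite
    "has_vignettes": 0.2,     # documentation beyond the reference manual
    "active_maintainer": 0.3, # commits or releases within the last year
    "widely_used": 0.2,       # large download counts / community vetting
}

def risk_score(metrics):
    """Return a 0..1 risk score: 0 = lowest risk, 1 = highest risk."""
    evidence = sum(w for name, w in METRIC_WEIGHTS.items() if metrics.get(name))
    return round(1.0 - evidence, 2)

def risk_category(score, low=0.33, high=0.66):
    """Bucket a score so reviewers can apply different review depth per bucket."""
    return "low" if score < low else "medium" if score <= high else "high"

score = risk_score({"has_tests": True, "active_maintainer": True, "widely_used": True})
print(score, risk_category(score))  # 0.2 low
```

The point of bucketing is exactly what he describes for double programming: spend the heavyweight review on the high-risk items and let well-evidenced, low-risk ones through with lighter checks.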

Yeah, I think the biggest issue I've encountered is just trying to allow any sort of confidential or patient-sensitive data into GitHub, even if it's a private repo or something, to allow that to be moved out of some firewalled, locked state. And so then comes the question of, you know, SVN, or what do you use at a local level, and what do you use to containerize, like Docker or Kubernetes, to pin the versioning as well as the validation of the packages and everything so that it can be built from that. And then it just turns into this mess of trying to explain and keep things separate, if that makes sense.

Yeah, I mean, so maybe luckily I can say we don't use GitHub; we have Azure DevOps within the firewall, and I think one of our recent GxP requirements is that basically any system that you're interacting with when you're doing a trial has to be within the firewall. And in terms of the system itself, of course there's a lot of documentation for the system that you need to have in place if somebody were to inspect it. Our system will soon, I think actually tomorrow, shift to run on Kubernetes; it's been running on Amazon virtual machines until now. And everything there is built as infrastructure as code, so we have control over everything we do there. As I mentioned, the guys from IT are really the ones helping with the requirements around standing up that environment, and they also have inspection experience from other systems that we have, so that's been a huge help in getting it to work for us. No, that's great, yeah, thank you.

Sorry if I cut you off, but thank you so much, everybody, for joining today, and thank you, Ari, for sharing your experience with us. I wanted to remind everybody before you log off: if you want to hear more about the work they're doing at Novo Nordisk and see Ari present again, that event will be on September 12th, and I see Tyler just shared the link in the chat right now if you want to check it out. Ari, thank you so much for joining us today. Yeah, thank you so much for having me. It's been an honor to be here. If people want to get connected with you, is the best way LinkedIn? Yeah, probably on LinkedIn. Okay, awesome. Well, thank you all. Have a great rest of the day. Nice to see you. Bye, everybody. Bye.