Resources

Ari Siggaard Knoph @ Novo Nordisk | Data Science Hangout

video
Oct 5, 2023
1:00:43


Transcript

This transcript was generated automatically and may contain errors.

Happy Thursday, everybody. Welcome back to the Data Science Hangout. If this is your first time joining us here today, it is very nice to meet you. And thanks for spending your Thursday with us.

If we haven't met, I'm Rachel, I host the Data Science Hangout, of course, and also lead customer marketing at Posit. This is our open space to chat about data science leadership, questions you're facing, and getting to hear about what's going on in the world of data across different industries. So we're here every Thursday at the same time, same place. So if you're watching this recording on YouTube later, the link to add it to your calendar and join us live here will be in the details below.

Also put a link in the chat, just a little like short link, if you ever want to share the Hangout with friends, too. But together, we're all dedicated to making this a welcoming environment for everyone. And we love to hear from everybody, no matter your years of experience, titles, industry, or even languages that you work in.

And so every week at the Hangout, I'm joined by a different leader from the community who joins us to share their experience and answer questions from you all. So if you're here for the first time, let me just give you a little rundown of how this works. It is totally okay to just listen in here. But there's also three ways that you could jump in and ask questions or provide your own perspective on certain topics, too.

So you can jump in by raising your hand on Zoom, and I'll monitor that. You could put questions in the Zoom chat. And just put a little star next to it if it's something you want me to read out loud instead, if you're maybe in a coffee shop or something. And then third, we have a Slido link, which Hannah just shared right when I said that. Thank you. Where you can ask questions anonymously as well.

With all that, thank you so much for joining us here this Thursday. I am so excited to be joined by my co-host, Ari Siggaard Knoph, International Lead Programmer and Statistical Programming Specialist at Novo Nordisk. And Ari, I would love to have you introduce yourself and maybe pronounce your name better than I did, and share a little bit about the work that you do, but also something you like to do outside of work, too.

Definitely. I think you did it pretty well. It's Knoph, that's my last name, and it's probably tricky because we're from Scandinavia.

Yeah, as you said, I'm an International Lead Programmer at Novo Nordisk, which means I'm the responsible programmer for all of our programming deliverables on clinical trials. I'm responsible for writing up documents for the authorities and interacting with them whenever we have interactions. And, in the end, I'm the person primarily responsible for our data packages and the code that goes to the authorities.

And yeah, outside of work, I really love spending time with my wife and kids. I have two sons, and friends and family as well. My oldest son recently picked up golfing. He's only five and a half, but I used to play when I was younger, so we've already been spending a lot of good time on the range. And I really like to play a lot of guitar as well, so that's where the rest of my time goes.

The NDA submission milestone

That's awesome. Well, thank you. It sounds like your son's going to be a superstar, already starting at five.

He loves it, so that's the most important thing, yeah.

Well, I get to start off with the questions here as we wait for everybody to start jumping in. I know in a few weeks you're going to give a presentation here at Posit, too, and you've reached an incredible NDA milestone. Could you explain a little bit about what that means and what you're going to be talking about in a few weeks?

Yeah. So what's really special about this is that in the past few years, whenever we've been at conferences, we've heard so much more about R emerging within pharma, and people are starting to use it more in these GxP-controlled environments. And we've been working towards setting up a GxP-approved environment and really...

And sorry to interrupt: you used an acronym, and I forgot, as you were saying it, what it actually stands for. If we use acronyms, do you mind just explaining what they are too?

Of course.

Okay, cool. Thank you.

Yeah, it's very hard in pharma because there are so many abbreviations, so it's hard to keep talking about a subject without mentioning some of them, but I'll do my best to explain.

No, so what's really special about this milestone is that we've primarily used a proprietary language to do most of our programming within pharma. And in recent years, people have been opening up much more to using open-source languages. That comes with a lot of challenges when you need to use them in a heavily regulated environment such as pharma, because there are a lot of rules and standards you need to layer on top of your processes in order to comply with all the regulatory authorities, not only in the US but around the world. So there are a lot of demands on how we actually produce code and submit code and data to authorities.

So we've been working on it for a while. We started almost seven years ago, just trying to do something with R, and that has really grown into now having an NDA submission where we have tables, figures, and listings done in R. We have some analysis datasets done in R. We have packages we have submitted, and so on. So really the full monty of doing this. And the reason why it's so special is that the number one question, at least that I've heard when we've been at conferences, is whether anybody has done it before.

And the predecessor to what we've done is really the work from the R Consortium's R Submissions Working Group, where the FDA has also played a huge role in giving input on what's possible when submitting R packages and R code to the authorities. So for us, it's really special that at least now we have an answer, that we know that we have done it, and hopefully others will do it soon in the future as well.


Music, self-teaching, and collaboration

And I was wondering, how has music carried over for you into your data work, or do you think that it plays a part?

I think... I mean, I'm a self-taught musician. I learned it off YouTube, back when YouTube was almost brand new; I spent so much time learning stuff off YouTube. And it's actually the same with most of my R skills today. I studied mathematics at university, but it was the chalk-and-blackboard type of mathematics; we weren't doing that much programming until the end. So a lot of the skills I use today, especially around R, come both from what the community is putting out there in terms of educational material and from the many people putting stuff up on YouTube to learn from.

And then there's also the collaboration side. When you play music with people, you're kind of in sync with them, and not to make a cheesy analogy, but it's similar when you're developing code with people. I remember seeing almost a meme about a git commit history looking like a Guitar Hero track, because there's just so much branching and so on. So I think there are some great opportunities to co-create stuff with people, both in music and in code, yeah.

Collaborating with version control

That's great. How do you actually collaborate with your colleagues in code?

So it's not that long ago that we, at least where I work, moved into using version control for most of what we do. Even though we have been doing statistics and programming for quite a while, moving an entire organization into learning git and working with repositories has been quite a challenge. But I think we've been quite successful in lowering the barrier by wrapping a lot of stuff up into other functions, so that you don't need to think about things like how to set up a repo. We primarily use Azure DevOps for it.

How do I set up a repo? How do I set up a pipeline? How do I make sure that I have the upstream set? All of those things are wrapped away, so it's very easy for people to get their heads around commit, push, pull, those three commands. Everything outside of that is a little bit more advanced. So we use that internally. And then externally, I'm also collaborating more with people through the pharmaverse, where we use GitHub and keep up with that as well.
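As a rough illustration of that "lower the barrier" wrapping idea (not Novo Nordisk's actual internal tooling), the three everyday git verbs could be bundled into friendlier functions using the open-source {gert} package; the function names here are invented:

```r
# Hypothetical convenience wrappers around everyday git operations,
# in the spirit of hiding git details from new users.
# Built on the open-source {gert} package; names are illustrative only.
library(gert)

save_my_work <- function(message) {
  changed <- git_status()$file   # list files with uncommitted changes
  git_add(changed)               # stage them
  git_commit(message)            # commit with the supplied message
  git_push()                     # push to the configured upstream
}

get_latest <- function() {
  git_pull()                     # fetch and merge from upstream
}
```

A new user then only needs `save_my_work("update tables")` and `get_latest()`, without thinking about remotes or upstreams.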

So when we were testing out which product to use, we first tested GitLab, but we found we had much more freedom in Azure DevOps, also because we wanted to try to do some of our project organization a little bit differently: the way we organize programs, who should do what, and how we discuss these things. And GitHub doesn't have this project organization to the same extent. It does have Kanban boards, but it doesn't have the same level of interactivity with work items and what you can do in terms of manipulating them with pipelines and APIs and so on. And also, Novo is a major Microsoft customer, so it was right up our alley to go with that product. So we don't use GitHub Actions, unfortunately. We use Azure DevOps and write a lot of YAML files with pipelines and bash scripting and so on.

What the biostatistics team does

Ari, I was thinking maybe in the beginning here, it would be helpful for some of us who don't come from a pharma background to understand a little bit more about a specific use case, or maybe you could walk us through a project that your team might work on.

Yeah, so I sit in the biostatistics department, which is really the evidence-generating department of the drug development cycle. We help run clinical trials. The people in the chain right before us are data management; they set up all the systems to collect data, and then we get the data in biostatistics. And not too long ago, a set of data standards called CDISC (Clinical Data Interchange Standards Consortium; sorry, it's a very long acronym) was rolled out. It's a data standard for collected data and analysis data.

So we create these datasets. They could be based off lab data, bioscience data, or specific measurements, depending on the disease area that you're working in. And this is where we use R; we use SAS for it as well. We run the trials in Git repos, and on top of this, what we're trying to do now is build tables and figures with R, but also Shiny applications on top of that.

One of the first selling points for us to really get R into the organization was creating Shiny apps. We had a very tedious process for reviewing pharmacokinetic data, where people were entering meetings with a 100-page PDF document of pharmacokinetic curves. They had all reviewed them before the meeting and written comments, then you would go through everything, and there were modeling outputs as well. It just seemed like such an obvious use case to move into a Shiny app. And that was one of the first things that we really sold to our stakeholders.
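As a rough sketch of the kind of app described (not the actual Novo Nordisk application; the data and column names are invented), a minimal Shiny review tool replacing a long PDF of curves might look like:

```r
# Minimal illustrative Shiny app: review one subject's PK curve at a
# time, with a free-text comment box instead of notes on a printed PDF.
library(shiny)
library(ggplot2)

# Toy concentration-time data; real trials would read validated datasets
pk <- data.frame(
  subject = rep(c("101", "102"), each = 5),
  time    = rep(c(0, 1, 2, 4, 8), times = 2),
  conc    = c(0, 12, 9, 5, 2, 0, 15, 11, 6, 3)
)

ui <- fluidPage(
  selectInput("subj", "Subject", choices = unique(pk$subject)),
  plotOutput("curve"),
  textAreaInput("comment", "Reviewer comment")
)

server <- function(input, output, session) {
  output$curve <- renderPlot({
    ggplot(subset(pk, subject == input$subj), aes(time, conc)) +
      geom_line() +
      geom_point() +
      labs(x = "Time (h)", y = "Concentration")
  })
}

# shinyApp(ui, server)  # run interactively
```

The real value, as described in the interview, comes from the modules built around this core: shared comments, seeing other reviewers' annotations, and modeling outputs side by side.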

So we do do that, but our bread and butter is creating these datasets, doing statistical analysis, and doing summary statistics: descriptive statistics of adverse events, lab results, and so on. And then at the end, we wrap all of these things up into a submission dossier, where the medical writers write up documents and send them to the authorities. And there are now at least three health authorities around the world who require that we send our data and the code used to produce both the data and the tables and figures.

Pitching Shiny apps to management

Thank you. So, did you basically say to them, hey, you don't have to go through this 100-page document, here's this app?

Yeah, yeah. When you have a project like that, you always tend to want to do too much in the beginning. You want to automate everything; we wanted to have automatic minutes based on the comments people were making. But essentially, in the beginning, it was just: you don't have to look at it in a PDF, you can look at it on the screen, like we're doing here, right? And then we built out the modules around communicating in the app, seeing what other people have done, and so on.

Travis, I see you had a question in the chat a bit earlier, you want to jump in?

Sure. So I know Novo's been working on this OpenStudyBuilder solution, which is more focused on data capture. For those not in pharma, that's electronic data capture: EDC systems are the front end that faces clinicians, or their research assistants, or whoever's entering data, and these are dominated by a couple of giant and, let's say, legacy-looking software providers. This OpenStudyBuilder looks really interesting, but it's very much Neo4j and kind of in the weeds on that side. I'm wondering what the connection to your side of the house looks like on the biostat side. Are they plugging it into R, do you have good APIs, or is that kind of future state?

Yeah, so just to give a little bit of history: we have had a system for metadata management for quite some years. Whenever we start up a trial, we build the entire trial design, with all of the lab parameters and so on, and that builds a metadata infrastructure for when we need to run the trials. We also put in metadata to use in the actual outputs, tables, and figures. And we found that works really well when you're just developing within our own small part in biostatistics.

What we really want to do is, let's say I have metadata about some parameter definition, and they're writing up a protocol where they want to use that parameter. The Open Study Builder will kind of be like the one source of truth for everything that's connected to data in some way or another. That's why they also built it on top of this graph technology from Neo4j. So, what we will use it for is writing protocols where you have topic-based text that you just pull out. It's to build a much more coherent and metadata-driven full pipeline around the Biostatistics and the deliverables we do.

So, we've been using metadata for many years, but we want to reuse a lot of that information in the documents that we write. Because a lot of projects today are kind of making up their – well, not making up their own, but they're slightly changing up the definition of things when they write it in protocols or a data reviewer's guide or whatever it is that we're handing over to authorities. So, it's to build a system to manage all of that. And for us, it will be API calls internally.

The same way we have a master model of our metadata for, for instance, SDTM study data and analysis data, we also have a master model of all the metadata datasets that go into producing both those datasets and the tables, figures, and listings. So we're betting really hard on having a fully metadata-driven pipeline, to the extent that we can do it.

Getting into pharma and career background

Thank you. I'm going to jump over to a few of the anonymous questions, too. And one was, could Ari share a bit about how he got into the pharma industry?

Yes. So, as I said, my background is in mathematics. I was doing topology and high-dimensional geometry. And at some point during the master's part of my education, I was thinking, who needs this out in the real world? So I shifted towards doing more statistics and more programming, and I learned that high-dimensional geometry is not that different from n-dimensional feature spaces. I got really caught up in learning about machine learning, and I did my master's thesis on prediction modeling with MRI data.

So I was really intrigued by that intersection between medicine and technology: what could you do in that field? I was looking for places where I could explore that more. At that time, Siemens was the one creating all the MRI machines, but I don't think Siemens was here in Denmark, where I live. So I ended up applying for a job at Novo, started here straight out of university, and haven't left since.

Package development at Novo Nordisk

So, I saw a really interesting presentation from Novo Nordisk about the, you know, there's essentially a pharmaverse of R packages that you're using. And you have packages for everything, basically, which is amazing. So, are those packages developed, like, from scratch within Novo? Or are those kind of Novo wrappers around other things?

So they're primarily wrappers around tidyverse stuff, because there are internal standards for pretty much anything you could think of. Some of them exist to make it possible to work with R within our environment: the way we store data, the way we export things. And, as I said, utility functions to work with Azure DevOps, to work with Git, to work with all these different systems that we have internally.

Some of it is layering on top of this metadata to do things smarter. Some of it is building, for instance, a package for creating tables that are specific to the Novo way of styling tables, and so on. But one thing that has kind of caught up with us is that they were developed without looking too much at the open-source world. So now, when we actually want to do more with the pharmaverse, where multiple companies are working together, it's really hard to separate the Novo part from the real functionality of the package. We're working on that now.
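A toy example of the wrapper pattern being described, layering an invented house style on top of open-source tooling (here the {gt} table package; the styling rules and function name are made up for illustration):

```r
# Illustrative "house-style" table wrapper: the statistics come from
# open-source {dplyr}, the presentation rules from a company convention.
# Everything company-specific here (name, rounding rule) is invented.
library(dplyr)
library(gt)

company_summary_table <- function(data, by, var) {
  data |>
    group_by(.data[[by]]) |>
    summarise(
      n    = sum(!is.na(.data[[var]])),
      mean = mean(.data[[var]], na.rm = TRUE),
      sd   = sd(.data[[var]], na.rm = TRUE)
    ) |>
    gt() |>
    fmt_number(columns = c(mean, sd), decimals = 2)  # hypothetical house rounding rule
}

# Example use with a built-in dataset:
# company_summary_table(iris, by = "Species", var = "Sepal.Length")
```

Keeping the general-purpose logic separate from the company-specific styling is exactly what makes such wrappers easier to disentangle and open-source later, as the interview notes.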

It's an interesting design issue then, you know, because I can understand that building wrappers on existing things to tailor it for your needs is a really neat way of doing things. But as you say, it kind of then locks you into a particular way of working that maybe, you know, has problems down the line. So, there are probably pros and cons, right?

Exactly. And in recent years, so many companies have opened up much more about how they're actually solving things, and we're doing that as well. We really want to contribute to keeping those lines open. So we're definitely trying to pivot some of our packages: either committing to some of the open-source packages that are out there, or pivoting our own packages to see if they can make it into the open-source world.

Transitioning from SAS to R

So I'm a little bit curious about the transition Novo Nordisk made. The first question is about trying to prove that the pharmaverse, or using R, was just as good as SAS programming. I know there's always a second reader on the data, so did you have one person using R and the other person using SAS, or how did that go? And the other question I have is, I've seen some reports of R and SAS results coming back just slightly different. Nothing that would change a decision or anything, but just slightly different. So how do you address that when you're mid-transition?

To speak to your first question: our primary goal has not been to replace SAS with R. We are really aiming for a much more language-agnostic world, where programmers use the language that they're most comfortable with. In terms of getting it into the organization and having people use it, what we really wanted to prove at the start was that it was possible to make a real deliverable with R, where we make the tables and figures in R. So there we did the actual deliverable in R, and we did the parallel programming in SAS to make sure that it matched what we would expect to get in SAS as well.

And then down the line, I remember one of my good peers, Michael Rimler from GSK, talking about their program called R for Validation, where they rolled out that everybody was allowed to use R for validation. Building on top of that, we thought, well, they're already using it for validation, so why not try to use it in production, on a very small scale? And today we have all kinds of mixtures of how people approach it. Whether they do the primary program in SAS or R is not that important anymore, because we've built up an environment that supports multilingual execution.

So we expect that there is a difference. Many of the compare packages out there let you set a criterion for how many decimal places the numbers should agree to. And when you're storing floating-point numbers, there will always be a small difference, even with the same number within the same system. What we have really been preaching internally is conclusions over numbers. Of course, if you see something that's totally off, you're going to investigate what happened there. But when we set these two systems up against each other, the question is: are we still looking at the same conclusions for these adverse events, or this lab parameter, and so on? That's been our take on it.
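In R, this kind of tolerance-based comparison can be sketched with the base function `all.equal()`; the numbers below are invented for illustration:

```r
# Comparing outputs from two systems within a tolerance, rather than
# demanding bit-identical numbers. Values are made up for illustration.
sas_result <- 12.3456791   # value as exported from one system
r_result   <- 12.3456789   # value computed in the other

# all.equal() compares within a (relative) tolerance
isTRUE(all.equal(sas_result, r_result, tolerance = 1e-6))
#> TRUE

# An exact floating-point comparison would flag a difference
sas_result == r_result
#> FALSE
```

Dedicated comparison packages expose similar tolerance settings; the point is that the acceptance criterion is agreement to a chosen precision, with anything outside it investigated by hand.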


Pitching Shiny apps and enabling data-driven decisions

Jared said: definitely interested in knowing more about how you pitched to your upper management to use Shiny apps. Mainly I get pushback because of validation concerns, versus the standard 200-page PDF files.

Yeah. I would say, luckily, our management has been very open; it's almost expected that you try to solve problems in a new way. That's kind of what we've been raised with here, or at least I've been raised with it here. Of course, there are a lot of technical aspects around that. But we did the same thing that you would do with any other innovation product: we talked with the people who are supposed to use it. Most of them said that it was a really good idea. And then we said to our management, everybody thinks this is a great idea, right? And we think we have the skills to build it.

And one good thing: it's always good to apply a little bit of action to an idea before you go searching for opinions, because opinions can quickly hold you back. We had already applied some action and built something around it. So you're coming to them with something where they almost can't say no, or at least where the answer is, can we test this out in one or two trials?

And then you make sure that all of the downstream processes around this are still in compliance with standards, and that we're still able to deliver everything we're supposed to. And hopefully people will be a lot happier. There's also CO2, right? We really want to save some trees as well.

So that's been a major selling point for us in terms of Shiny apps. Another thing that I think most companies say today is that they really want to be data-driven. And if you want to hold people to that word, our community is probably one of the parts of the organization with the most enabling power to give to people who don't necessarily have data science skills. If you can make somebody higher up in the organization realize that you can let so many people do their jobs in a much simpler way, just by going into a Shiny app, or maybe even just looking at automated reports on Connect, or whatever it may be, you can enable so many people just by doing that. So yeah, that's been our selling point for trying to get all of this into our processes.

Advice for those entering the pharma industry

Hi, Ari. So my background is in mathematics, and I'm now moving into bioscience. I'm currently doing my PhD, and I'm going to finish in a couple of years, and industry just seems like this massive unknown, where, as an academic, I don't really know what is wanted. Do you have any advice on things to learn or technologies to learn to make yourself more competitive in that environment, specifically in bioscience?

Yeah, I can only really speak for my own company. There are, of course, some technologies that we're moving more towards. It helps if you know any of the open-source languages. Recently, when we have candidates, we also look: are they on GitHub? Have they done something interesting? Can they display a passion for what they're doing? For us now, we're also putting in our job ads that it's almost more important that you're willing to learn new stuff. Of course, there are some jobs, also in the pharma industry, where you have to have a specific degree to do that job, but for development jobs, for instance, we're looking much more at that.

You have a passion for doing it, we can see that you have done it before, and when we speak with you, you know your way around, for instance, package development and which packages you use to do it. So it's much more a display of interest now. A lot of companies today do interview questions or exercises beforehand, and if you're applying at a pharmaceutical company, they might ask about a disease. Maybe you've been given some data on some disease: what does this mean? Show that you're interested in what it is they're trying to do in pharma. But keeping up with technology is a really good thing to have on your resume.

When to create a package

I'm going to jump back to the conversation about packages for a second here. I saw there was a Slido question that said: I'm curious, from Ari's experience of writing packages, when would it be considered overkill to create a package, and when is it justified?

Well, it depends. Of course, I'm sitting in an organization where you know that a lot of people are doing the exact same thing, maybe just a little bit differently than you are. Most of the things that we do every day are things that at least 200 other people are doing. And that's a really strong rationale for making a package around working specifically with tables, or Git, or repos, and so on.

But if you're working for yourself or in a small group: my wife is a PhD student currently and has also run a small clinical trial, and you quickly learn that you can gain so much from setting up functions for yourself for tasks you repeat over and over again, like reading in small datasets or formatting something that you want to output later on. So I think there's probably no package too small if somebody benefits from it, especially yourself. But in our case, it's pretty easy to know when you're supposed to make a package. Most things, and this refers back to an earlier comment, end up in a package, because we expect that many people will use them.
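As a sketch of the kind of tiny personal helper being described (the file format, column conventions, and cleaning steps here are invented), even a few lines can pay off when repeated across many datasets:

```r
# Hypothetical personal helper: read a CSV and apply the same cleaning
# every time, instead of repeating these steps in every script.
library(readr)
library(dplyr)

read_lab_data <- function(path) {
  read_csv(path, show_col_types = FALSE) |>
    rename_with(tolower) |>                  # consistent column names
    filter(!if_all(everything(), is.na))     # drop fully empty rows
}

# Usage: labs <- read_lab_data("lab_results.csv")
```

Once two or three such helpers exist, moving them into a small package gives you documentation, testing, and easy reuse, which is the threshold the answer above gestures at.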

Sharing knowledge and making work visible

How are you actually helping share those packages across the organization or helping share what other people are doing in different teams so work doesn't, I guess, get duplicated? Do you have community groups?

Yeah. So one thing that was running, and I think it's still running, is an initiative called Radar, where everybody could post ideas, not just for code development but for process development, or let's get new coffee in the coffee machine, or everyone who starts should get a water bottle. All these different things. In terms of code development, moving into Azure DevOps has been a way for us to, first of all, make all of that work visible. Before, I couldn't necessarily see the code and programs from another project, and we've tried to really open up and make all of that work visible, in a help-yourself way: if I want to see what other people are doing, I can now just see all of their code. It's the same with package development, now that all of that development, and organizational development, is available for everybody to see. So that's the way we've done it, yeah.

Python, Shiny, and future language plans

And I think the question was, do you see Python for Shiny as something you would use for your use case in the future, or now?

I think there will probably be... so I'm not that well versed in Python, but there will definitely be some use cases for Python for us as well. We have a huge SAS infrastructure, and now we've also built a huge R infrastructure, and as we start out with Python, you have the reticulate package and so much interoperability between R and Python that there might be many cases where you use either a Plumber API or reticulate to get the best of both languages. So even if you were, at some point in the future, to do a full trial in Python, you would maybe just call some of the R infrastructure underneath it, because those two languages are now so easy to switch between.
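A minimal sketch of that interoperability, using the {reticulate} package to call Python from R (assuming a Python installation with NumPy is available):

```r
# Calling Python from R via {reticulate}: the Python side computes,
# and the result comes back as an ordinary R value.
library(reticulate)

np <- import("numpy")          # import a Python module into R
v  <- np$array(c(1, 2, 3))     # build a NumPy array from an R vector
as.numeric(np$mean(v))         # Python computes the mean; R receives it
#> 2
```

The reverse direction works too: a Plumber API or reticulate's R-from-Python tooling can expose existing R infrastructure to Python code, which is the scenario described above.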

But I love the quote. I saw Joe Cheng's keynote about Shiny again not too long ago, I think it was last year, where he says that Python is the second-best language for everything. And when you're looking at doing production-grade, enterprise-level apps or systems, built in R and Shiny and also other technologies, I think Python will be part of that. That's also what we've recently hired for: we want Python developers to influence some of the stuff we're doing in R that might be even better to do in Python. So it's definitely coming, I would say.

Career lessons and the "no side projects" rule

Thank you. So, a question that I love to ask leaders in this space: could you share an important lesson that you've learned over your career, or maybe a career lesson that just stands out to you?

I think there are two things that come to mind. One of them, and this is just my personal belief, is about doing new, fun stuff that you think will actually help somebody at work. I try to follow a rule of having no side projects. Usually you're doing your work, and then you have this side project that you're really excited about, something you want to build up, but everybody knows that the actual work is what consumes all of your time. So having no side projects means that if I do have a side project, I try really hard to make it the thing I'm actually supposed to do. The same with the Shiny app we talked about: this is how the process should be, this is how we want to work with it.

And also now with rolling out R: we wanted to do more R development, and we can also see now that we can interface with so many different things. But for the people coming into especially my project, I couldn't just tell them, now we're running these five trials, everything is in SAS, and on the side you'll have to learn R, because in two years' time we're going to do a trial in R. So I've really tried to use my power in that sense, that I really wanted to change their environment so that it's not a side project anymore. I want you to learn this stuff on the job, and we'll try our best to support you in doing that. Because I mean, there's also some logical aspect of working towards a deadline with something you know will be used by somebody, so you try a little harder. But it's also just like, whoa, this is really new and exciting and there are people here to help me do it. And people just grow incredibly fast if you let their side projects be their main projects really.

People just grow incredibly fast if you let their side projects be their main projects really.

And another thing, one of the things that made the biggest splash in terms of new technology we could use, it's not the Shiny apps and it's not using R in other processes. So whenever we have a trial ending, there might be some sensitive or confidential information that you need to keep secret until you do a press release with the results, and everybody who has access to those data needs to be on an insider list. What we previously did is that the responsible programmer, so the programmer in my role, tracks who has access to the data, and then we generate a list that we send to the legal department, and they put those people on the insider list. So it's this long chain, and up until we do the release they're asking every day, maybe twice a day, for this list: did anybody go off of it? Did anybody new get access to the data?

And I remember thinking, I don't want to do that every day. I don't want to sit and figure out who has access to the data. And luckily, with R we could now query the server that we had the data on for accesses, and I just wrapped that into a data table in an R Markdown script, put it on a Connect server, made it open for them, and made it refresh itself every six hours. And so many people who have technical roles that aren't really touching data in that sense, so many people now getting automatically onto the insider list who may not have been on it before. And there were so many people outside of my area who were calling me, like, what is this thing, that I'm now on this list? Not because they wanted to do anything bad with access to the data, but it just created such a huge splash that we were now able to automate this part of it. So many people were affected by something that we had automated that it really resonated far out into our stakeholders.
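The logic of that automated report is simple to sketch. His actual implementation was an R Markdown document with a data table on a Connect server querying the data server's access log; the following is a minimal Python sketch of the underlying idea only, and the record fields and names are invented for illustration.

```python
from datetime import datetime

# Hypothetical access records, as they might come back from querying the
# data server's access log (field names here are made up for illustration).
access_log = [
    {"user": "alice", "granted": datetime(2023, 9, 1), "revoked": None},
    {"user": "bob",   "granted": datetime(2023, 9, 3), "revoked": datetime(2023, 9, 10)},
    {"user": "carol", "granted": datetime(2023, 9, 5), "revoked": None},
    {"user": "alice", "granted": datetime(2023, 9, 7), "revoked": None},  # duplicate grant
]

def insider_list(log, as_of):
    """Everyone with access to the trial data at `as_of` belongs on the insider list."""
    return sorted({
        rec["user"]
        for rec in log
        if rec["granted"] <= as_of
        and (rec["revoked"] is None or rec["revoked"] > as_of)
    })

print(insider_list(access_log, datetime(2023, 9, 12)))  # ['alice', 'carol']
```

In the real setup the equivalent of this function re-runs on a schedule (every six hours, in his case) so legal always sees a current list instead of requesting it twice a day.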

And I mean, we thought it was pretty cool that so many people now automatically got on the list, and that's been one of the biggest splashes: we automated something that actually created value for others. So that was pretty cool, yeah.

I love that. It's really cool when you get to see all of that feedback from everybody right away. Just yesterday I did something really small, where I added a download-CSV button to a table in a Quarto doc, and I realized right away that it made that report so much more impactful to somebody, that they could download it.

Submitting Shiny apps to regulatory agencies

Niels, I see you just put a question here in the chat. Do you want to jump in, or I can read it? It is: what is needed to submit a Shiny app to a regulatory agency? What are the requirements that they are looking for?

So I might not be the best one to answer this one, because it's not something we have done. The only thing I know is that the R Consortium working group for R submissions has some really great people working on getting Shiny apps to the FDA, and there are many aspects of that, like how do you read the data into a Shiny app in a system where you don't know where the data is, right, and how do you spin up something that can run a Shiny app. There is also an association for statistical programmers in pharma called PHUSE, which has had a working group around interactivity and submission for quite a few years, where they were talking about how, if everybody is submitting Shiny apps, the FDA will never cope; there's no common thread around, say, this is how we do data imports in Shiny. So if everybody is making up their own UIs for how you analyze data, I think the FDA will have a hard time going through all of those Shiny apps.

So they have been working a little bit on whether we should try to set up some standards around submitting something like a Shiny app to the FDA. And I think you could even do it with less: if you were doing just an HTML file, if you're using flexdashboard to do stuff, I think that could go a really long way. And I do believe the FDA has a file format specification document over all of the allowed files, and I'm pretty sure .html is on that one. So you could probably get pretty far with doing something in Quarto or R Markdown.

Validation, dual programming, and environment control

Yeah, I know this is kind of a hot-button topic that a lot of groups are getting into, but how do you guys approach the issue of dual independent programming and validating your SDTM, ADaM, and TLF data sets? Was it more that one side used SAS and the other used R? Or, if you both used R, when it came to package and environment versioning and validation, to provide that full traceability to the FDA, how did you guys approach that infrastructure behind the scenes, if you don't mind me asking?

Yeah, so one of the things I think I can mention first is that for quite some years we have been working with our standard operating procedures and making them easier to work under. We have one we call the custom programming SOP, standard operating procedure, where we talk about the different review levels that we're applying to custom programs, as we call them, and there we introduced a self-review level, a peer-review level, and then a double-programming level. And we're actually trying to do less and less double programming. For many years we've kind of had our ears blown full of risk-based approaches to many things, so we've been trying to double program only the really most essential parts: the primary analysis and the most important analysis data sets. Most people who do double programming have done it in SAS, but most people are doing it in R now, so we're also doing R versus R.

And then in terms of doing those TLFs, in order to encapsulate what you're doing on the trial, we're running renv to control the local environment, and that's also why we needed those trials to use Git, because we need to track that local environment. And then if you step one further out, into how we're getting packages into the organization, we use Posit Package Manager and we apply the risk-based approach from the R Validation Hub. We do risk assessment of packages, and some of my earlier colleagues have worked very hard on automating that to some extent. If you want a really good example of how that can look, I think it's Aaron Clark from Biogen who has created a risk assessment app, which is built on the riskmetric package, and we use that as well for our internal risk assessments. And then for the environment itself, the Workbench and the Package Manager, we have had some people from IT help to bring it up to a GxP-validated environment that we can actually run trials in. Yeah, I hope that covers it.
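The risk-based idea he describes, scoring a package on evidence of quality before allowing it in, can be sketched briefly. The real riskmetric R package and the Biogen app are far more thorough; in this toy Python sketch every metric name, weight, and threshold is invented purely to illustrate the shape of the approach.

```python
# Toy sketch of a risk-based package assessment, loosely in the spirit of
# the R Validation Hub approach. All metric names, weights, and cutoffs
# below are hypothetical, not taken from the riskmetric package.
METRIC_WEIGHTS = {
    "has_tests": 0.3,         # ships a test suite
    "has_vignettes": 0.2,     # documentation beyond the reference manual
    "active_maintainer": 0.3, # commits or releases within the last year
    "widely_used": 0.2,       # large download counts / community vetting
}

def risk_score(metrics):
    """Return a 0..1 risk score: 0 = lowest risk, 1 = highest risk."""
    evidence = sum(w for name, w in METRIC_WEIGHTS.items() if metrics.get(name))
    return round(1.0 - evidence, 2)

def risk_category(score, low=0.33, high=0.66):
    """Bucket a score so reviewers can apply different review depth per bucket."""
    return "low" if score < low else "medium" if score <= high else "high"

score = risk_score({"has_tests": True, "active_maintainer": True, "widely_used": True})
print(score, risk_category(score))  # 0.2 low
```

The point of bucketing is exactly what he describes for double programming: spend the heavyweight review on the high-risk items and let well-evidenced, low-risk ones through with lighter checks.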

Yeah, I think the biggest issue I've encountered is just trying to allow any sort of confidential or patient-sensitive data into GitHub, even if it's a private repo or something, to allow that to be moved out of some firewalled, locked state. And so then comes the question of, you know, SVN, or what do you use at a local level, and what do you use to containerize, like Docker or Kubernetes, to pin the versioning as well as the validation of the packages and everything so that it can be built from that. And then it just turns into this mess of trying to explain and keep things separate, if that makes sense.

Yeah, I mean, so maybe luckily I can say we don't use GitHub; we have Azure DevOps within the firewall, and I think one of our recent GxP requirements is that basically any system that you're interacting with when you're doing a trial has to be within the firewall. And in terms of the system itself, of course there's a lot of documentation for the system that you need to have in place if somebody were to inspect it. Our system will soon, I think actually tomorrow, shift to run on Kubernetes; it's been running on Amazon virtual machines until now. And everything there is built as infrastructure as code, so we have control over everything we do there. As I mentioned, the guys from IT are really the ones helping with the requirements around standing up that environment, and they also have inspection experience from other systems that we have, so that's been a huge help in getting it to work for us. No, that's great, yeah, thank you.

Sorry if I cut you off, but thank you so much, everybody, for joining today, and thank you, Ari, for sharing your experience with us. I wanted to remind everybody before you log off: if you want to hear more about the work they're doing at Novo Nordisk and see Ari present again, that event will be on September 12th, and I see Tyler just shared the link in the chat right now if you want to check it out. Ari, thank you so much for joining us today. Yeah, thank you so much for having me. It's been an honor to be here. If people want to get connected with you, is the best way LinkedIn? Yeah, probably on LinkedIn. Okay, awesome. Well, thank you all. Have a great rest of the day. Nice to see you. Bye, everybody. Bye.