David Granjon & Bo Wang @ Novartis | User-friendly, self-serve tools | Data Science Hangout
Transcript
This transcript was generated automatically and may contain errors.
Welcome to the Data Science Hangout. Hope everybody's having a great week and enjoying the Shiny conference. Thank you so much to the Appsilon team for putting on this amazing conference and for asking for the Data Science Hangout to be a part of that. It's really cool to be included in the conference.
But I know we have quite a few people here joining the Data Science Hangout for the very first time. So thank you all so much for joining us. I'm Rachel Dempsey and I lead our pro community at Posit. So we host this Data Science Hangout every Thursday at the same time, same place.
So what is it? What's going on here? This is our open space to chat about data science leadership, questions you're facing, getting to hear about what's going on in the world of data across different industries. And so every week we feature a different data science leader as my co-host to help lead our discussion and answer questions from you all. There's no presentations or slides, just all conversation with everybody.
We're all dedicated to making this a welcoming environment for everyone. And so we always love to hear from everybody, no matter your level of experience or area of work. This is a bit different than the Hopin platform that you're in for the Shiny conference. So I just want to make people aware of that to be a bit careful with unmuting and those things that come with Zoom.
It is totally, totally OK to just listen in here. But there are also three ways that you can jump in to ask questions or provide your own perspective on certain topics. First, you can raise your hand here on Zoom. If you're not familiar with Zoom, it's in the bar below where it says reactions. If you click on that, you can raise your hand. You can also put questions in the Zoom chat. And if you want, just put a little star next to it if you don't want to jump in and read it yourself. I'll know that maybe you're in a coffee shop or something and you want me to read it. And then lastly, we also have a Slido link where you can ask questions anonymously.
And our team here at Posit will share that into the chat in just a second. We'll keep posting it in there so that people can find it as well. It's great to see all the messages in the chat. We do share the recordings of each session to the Posit YouTube and our data science hangout site. So you can always go back and rewatch or share with a friend.
But with all of that, thank you so much for joining. It's so amazing to see this big group that we have here today. And thank you so much, David, for joining us as our featured leader. David Granjon is a senior full stack Shiny developer at Novartis and the founder and maintainer of the open source RinteRface organization, where he develops Shiny extensions like bs4Dash and shinyMobile. We also have Bo Wang, senior expert data science at Novartis, who's also joining us in the audience today. He'll be helping with any questions that come up on their data monitoring committees at Novartis as well.
Introductions
So let me bring you both here to introduce yourselves as well. David, I'd love to have you introduce yourself and maybe share a little bit about your role and also something you like to do outside of work.
Yeah, sure. Hi, everyone. Thanks for having me. I'm David Granjon. I was born in France 32 years ago. I'm married. I have one wonderful daughter, a wonderful wife. I've been doing data science at Novartis for four years. My background is in applied mathematics. So I have quite a sort of Swiss Army knife background where I can do physics, mathematics, chemistry, a lot of different things, and I also have a background in web development. So when I learned web development, I learned HTML, CSS, PHP and also databases on the side.
I'm super glad I was able to attend the keynote before from Colleen because I like sport a lot as well. I hate running, but I like cycling a lot and probably for the same reasons as Colleen. So, yeah, at some point, no matter what sport you are doing, if it's making you better, making you feel better, giving you new ideas, giving you some insight into what you are doing, that's just what you have to do.
Thank you again, David. Bo, so nice to have you here, too. It would be great to have you introduce yourself as well.
Thank you. Thank you for having me. First of all, I love running and I hate biking. I didn't know that about you, David, that you didn't like running. Maybe we can team up as a relay team in the triathlon or something.
Anyway, so my name is Bo. I work on the same team as David at Novartis. We are data scientists who build tools and pipelines to support our clinical statisticians and other data science functions. My background is in biostatistics. Since I started working in the pharma industry, I've been working more on the tool development, almost like software development, side than on actual trial analysis. Although I have been working on supporting data monitoring committees, as Rachel mentioned before, and there seems to have been a bit of interest in that topic in previous meetings. So that's why I'm here. Outside of work, like I said, I like running. I like traveling. I'm actually right now in Memphis, Tennessee, even though I'm normally based in Boston. I do not have a wife or a daughter. I do have a dog who I treat as my daughter. She's at my foot, napping. And that's pretty much it. Glad to be here. Thank you.
Building great user interfaces
But something I was really excited to start off with asking is, David, I know you think a lot about building great user interfaces. And I was curious to hear tips you have for all of us in communicating with the business through data visualizations.
Yeah, sure. So I think we started a big project last year, which is around probability of success. And for this project, we really thought about another way to start our Shiny app project, especially from the user interface perspective. So I would say when I joined Novartis at the beginning, most of the time we were, you know, rushing to code. We developed the interface, the app, and then we showed the app to the end users and they were not happy with the result. We completely changed the process. And with this project, we basically started with wireframing. So we met with the users, we drew up the user requirements, we discussed together for one week, two weeks, and then we presented some mockups, some wireframes. So you could basically either use a paper and a pen or you can use more sophisticated tools. I would say it's a bit of a pity that you have to pay for most of the good wireframing tools. We are using Balsamiq at Novartis. You may use it, you may not. It depends on what you know. And then based on that, we spend around, I would say, one month before we can come to an agreement and then we can start to code.
Overview of projects at Novartis
Yeah, I was just wondering if we could get an overview of some of the big projects that you've been working on at Novartis. I know that Novartis is a pharma company and that it's really big, but being a person outside of the pharma universe, I sometimes don't know what you guys are working on.
Sure. So two of the bigger projects I've been working on, one I've been working on for a couple of years. Basically, when a clinical trial is being conducted, there is a lot of data analysis on the patient data that we collect from the enrollees. There are earlier phase trials in a small number of patients with lower doses. And then once the safety is established, you move on to later phases with a larger number of patients and higher doses for efficacy analysis, for example. But the patient safety is always monitored through all the phases.
And for later stage studies, the sponsor, in pharma lingo, is the pharmaceutical company that pays for the clinical trial. Anyone inside the sponsor company is not supposed to know the actual treatment of the patients, if the study is a randomized controlled trial. So we as sponsors have to hire external experts to be unblinded to the actual treatment code and review safety data to make sure that all the patients in the ongoing trial are still safe. So one of the big projects I've been working on in the past couple of years is writing and managing an app, a Shiny app that has a JavaScript backend, to display some of the safety data in a more user friendly way than, say, 200 pages of PDF. Because the DMC members tend to be very highly regarded experts in their field, and their time is very expensive. So they have very limited time that they can give us to review the data. So having an app as a supplementary tool aids their review of the data within the limited time that they have.
Since I joined Novartis, I've been very lucky regarding all the projects I received. Surprisingly, at the beginning, like during my first month, I was given a project, not about clinical data, but for human resources data. Because you have to imagine, you know, Novartis, it's like 100,000 people working. And then you have many departments. It's not only clinical, but it's also sales, human resources and many other things. And basically, we had to develop a tool to better manage the time people spend on projects. And this tool was basically first developed with Shiny, with a lot of databases around just to make sure that data are persistent. And then later, the tool was improved using other technologies like React. But in the end, yeah, it shows, you know, how you can integrate Shiny in the whole ecosystem.
Then I joined the Probability of Success initiative, which was really a rewarding project because we really had a checklist and tried to respect all possible best practices. This checklist was based on the mistakes we made on other projects. And then, yeah, we started with the wireframing. Then we had a strong CI/CD setup. So yeah, I'm responsible for the CI/CD setup of the team. We are using GitLab, so we have runners with proper installation, system dependencies, everything. So if you want to do headless testing for your Shiny apps, you make sure you've got the proper installation. Yeah, so setting up the runners, and how to manage the release, you know, the production release. So this afternoon, during the conference, we had the production release for POS 1.0.0, the first one, which is like one year of work. So that's quite a big achievement.
So POS is clinical data, right? I didn't really give the background, but, you know, when you work in pharma, pharma companies invest billions into making a treatment. It's not just like tens of thousands, it's really billions. So you need to make sure what you are going to do is successful at some point. You need to analyze the market and everything. And these probability of success tools provide a data-driven solution to have an improved diagnosis for your probability of success.
And yeah, finally, I've been involved in migration. So it's more like in the server infrastructure. So to migrate from Shiny Server Pro to RStudio Connect. So initially, yeah, we had Shiny Server Pro and now we have RStudio Connect. So I had to supervise this migration. So essentially talking about thousands of applications and make sure, you know, those applications work correctly on RStudio Connect, Posit Connect now. So that was quite challenging because all apps were not necessarily following all the best practices, you know, essentially around package dependencies. So sometimes you are not able to find, you know, where a package is coming from. Yeah, and the application was just not working anymore. So we did a lot of work around renv to set up reproducible environments for many projects, which has a learning curve.
Yeah, and we are also tracking usage on RStudio Connect. So using the API, getting the data from the server to see, you know, how people are using the applications and stuff like that. So that's very, not only people data, that's all data, all possible data.
Validation in regulated industries
Cool to hear you're using that API too to get those advanced usage metrics.
Thanks, guys, for jumping in and sharing your stories. I know Bo already kind of answered it, but validation is usually, especially in kind of regulated industries, is an arduous task generally. And so I was kind of really interested to see, is that something that you guys have built that required validation to be undertaken? You know, sometimes people have alternatives to that. So if you could speak to that, I guess, a little bit more, Bo or David, that'd be great.
Yes, I think arduous is the right word to describe the whole process, as I discovered in the past couple of years. In my previous experience, in my work throughout my graduate school career, I had never encountered the issue of having to validate a piece of software or an application. Not in the code review and testing sense, but in the process documentation sense. So it was definitely a learning curve, and we have had a few iterations of processes that are based on software validation. When I say software, you can think of RStudio, Shiny Server, or RStudio Connect as a piece of software that we have existing processes, authored by IT, to support. But apparently those processes carry a little too much overhead for a Shiny application.
So in the past, I would say, year and a half, two years, we've been working with our QA colleagues, our IT colleagues, and our colleagues who are managing our deployment system or analytical system to kind of communicate both ways. What does quality mean for our QA colleagues? What does quality mean for IT? What does quality mean for business? And we try to reach an understanding of how much testing, how much documentation is needed. It's still an ongoing process. We've had a few pilot projects deploying apps for use in DMC, but there are exceptions for certain studies that do not apply to other studies. So the solutions for certain issues don't really translate to long-term solutions that you can write into a business process on how to validate a Shiny application. So that's the part that we're still trying to figure out as we incorporate more different types of studies in the program.
User feedback and app adoption
But one thing that's a gap for us: the feedback that we get from the monitors is still kind of clunky. Maybe they'll write a long email or give us Excel sheets or other things back. What are you thinking about, or have you gotten any feedback from users right in your app, which seems like a good opportunity in the Shiny environment?
No, actually on the data points themselves. So a lot of times they'll look at a patient profile and go, ah, this lab value, are you sure this is right? Or, you know, just the usual kinds of, sometimes it's even data quality questions, which are valid because the monitors are often content matter experts, medical experts, and they see things that we as data scientists don't. It would be very convenient if while they're looking at your app, going through what was previously 900 pages of PDFs, if they could say, ooh, like highlight this and kind of submit right through the app and you could kind of get feedback in that format rather than more traditional methods.
Oh, I see. No, we haven't incorporated that in our app to accept feedback directly in the app from the DMC members. In fact, the feedback we've received is mostly not directly from DMC members, even in the email chain, because for pivotal studies, for later phase studies, we hire an external CRO to interact with the DMC members directly. And we normally just get feedback from them because we're not invited to the meetings, because we can't risk unblinding. Even though the developers are not technically part of the trial team, we are still part of the Novartis business. That's one of the things that's been part of my learning curve also, how many silos are intended in the process of clinical data review just to make sure data is secure.
It is indeed. But I remember one project that was specifically for one team, and we had a kind of surprising reaction from the end users, because they were actually not really convinced about the Shiny app. They would actually prefer using the PDF, the table listings, you know, these frightening things. Just because, yeah, there were many options in the safety explorer, widgets, stuff like that. Too many options, they didn't know where to click. It depends if you're already familiar with these kinds of tools, you know, if you're familiar with web applications or not, I would say.
Personal aspirations and career advice
That's a tough one. That's not technical advice. That's just mindset advice, probably: never give up. It's hard today, it will be better tomorrow, you know, these kinds of things. Like Colleen said in the keynote, and I always go back to the keynote, there are days you want to quit, right? It's hard to make things work well using best practices. But in the long run, yeah, you learn a lot about yourself. You collaborate better with people. But yeah, I mean, everything in life is hard. If you want to raise your kids, it's hard. Doing sport is hard. So that's a common point for everything. So believe in yourself and never give up.
CI/CD and deployment with Posit Connect
So David, you were talking a little bit about renv and using Connect for CI/CD sort of deployment. The one thing with Connect is it's a pull. It's designed to pull things in. Sometimes that's different from an enterprise sort of scenario where they're expecting everything to be packaged up and sent to the server. Did you use renv to perhaps address some of those challenges around that?
Yeah. So for instance, when we have a Shiny app we want to send to Connect, first of all, we make sure it's using renv, right, so we have frozen dependencies. Then we have the CI/CD pipeline. We have a specific section in the YAML file which restores the packages in the runner. And then you have some instructions to deploy the app on Posit Connect. So you need to use the connectapi. So yeah, you need an API key and different things. I'm not going to enter too much into details.
Posit Connect doesn't use renv. You know, when it restores the application, it uses packrat. But if you have frozen dependencies in the runner, it's much, much easier than if there are packages and you don't even know where they are coming from, and you try to just push everything, and then you have error messages everywhere in Posit Connect. So yeah, from time to time you still have problems, right? But they are significantly reduced.
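The point about packages whose origin nobody can trace lends itself to a small pre-deployment check. What follows is only a minimal sketch, not the Novartis pipeline: it assumes the lockfile is JSON with a top-level `Packages` object in which each record may carry a `Source` and a `Repository` field, and the package `internalviz`, the version numbers, and the helper `find_untraceable_packages` are all hypothetical, introduced for illustration.

```python
import json

# Hypothetical, trimmed-down lockfile content for illustration only.
# The layout (a "Packages" object with "Source"/"Repository" fields)
# is an assumption about the lockfile format, not taken from the talk.
LOCKFILE_TEXT = """
{
  "R": {"Version": "4.2.2"},
  "Packages": {
    "shiny": {"Package": "shiny", "Version": "1.7.4",
              "Source": "Repository", "Repository": "CRAN"},
    "internalviz": {"Package": "internalviz", "Version": "0.3.0",
                    "Source": "unknown"},
    "bs4Dash": {"Package": "bs4Dash", "Version": "2.2.1",
                "Source": "GitHub"}
  }
}
"""


def find_untraceable_packages(lockfile_text):
    """Return sorted names of packages whose origin the lockfile does not pin down."""
    lock = json.loads(lockfile_text)
    flagged = []
    for name, record in lock.get("Packages", {}).items():
        source = record.get("Source", "unknown")
        if source.lower() == "unknown":
            # No recorded origin at all.
            flagged.append(name)
        elif source == "Repository" and not record.get("Repository"):
            # Claims to come from a repository but does not say which one.
            flagged.append(name)
    return sorted(flagged)


untraceable = find_untraceable_packages(LOCKFILE_TEXT)
print(untraceable)
```

A CI job could run a check like this before the deploy step and fail early when the list is not empty, instead of discovering the problem as restore errors on the server.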
Shiny modules as reusable building blocks
Super impressed by what's happening at Novartis where you've built essentially the, the data science team have built these Lego bricks for modules for building Shiny apps. Then other people who want to build Shiny apps aren't starting from a blank sheet, but they're kind of reusing those Lego pieces. Do you want to talk us through that? Cause I, I think that's a novel approach.
So Mike, when you talk about the Lego pieces, are you talking about data pipeline presentation from.
Yeah. We had some presentation at our, some years ago where, yeah, there are some modules to develop visualizations for clinical trials. And then you can combine the modules together to create your application. So you don't need to, you know, develop them yourself.
So the modules we, I don't know how many modules we have, but we probably have like 30, 42 modules, which you can basically plug into your app, depending on what you want to do: a safety explorer, a patient profile, whatever. They're very still. And in that case, people need to dive into the code, and at some point that's inevitable. But for sure, that's a big change for us. The main challenge is also making modules that follow best practices in terms of software development.
You know, we are not even 10 in our team. So that's a lot on our shoulders. As Colleen said before, from time to time you hear people who are not super satisfied. But in the meantime, you also have people that are really thankful for what you are doing. So that's what's so rewarding as a developer.
Tracking app usage with the connectapi
I wanted to ask you about the users, because the beauty of Shiny is that we can build apps very fast in small iterations, and we can keep getting feedback and upgrading them. And this makes it a bleeding edge kind of software that we can provide to the users. And I know that you're also aware that it is not that easy to get feedback from the users directly. So I wanted to hear a little bit more about it. How do you cope with this?
So, yeah, just to know whether your applications are properly used, these kinds of things, right, and how people are using the applications, typically.
So, I'll be honest, this is something we will start to do at Novartis. I did it in the open source, but I'm really talking about what we are doing at Novartis. So we have the tracker in Posit Connect, which is basically an application that we developed last year, when I was in Japan. So, you know, when you travel, you can do many things. And yeah, this basically queries the API to get all applications. For each application you get who is accessing the application and for how much time, which basically gives you some insights. So when you develop an application for several teams, you can have some insight into which team is using this application the most, and how you could target your support. Like if you see one team is using the application 20 times more than another one, maybe invest more people to provide support here.
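The per-team comparison described here boils down to a small aggregation once the usage records have been fetched. The sketch below is an illustration under stated assumptions, not the tracker application itself: the record fields (`app`, `user`, `started`, `ended`), the user-to-team mapping, and the function `summarize_usage` are hypothetical, and the sample data is made up.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical usage records. The field names are assumptions made for
# this sketch; a real server API may name and shape them differently.
USAGE = [
    {"app": "safety-explorer", "user": "alice",
     "started": "2023-03-01T09:00:00", "ended": "2023-03-01T09:30:00"},
    {"app": "safety-explorer", "user": "alice",
     "started": "2023-03-02T10:00:00", "ended": "2023-03-02T10:15:00"},
    {"app": "safety-explorer", "user": "bob",
     "started": "2023-03-02T11:00:00", "ended": "2023-03-02T11:10:00"},
    {"app": "safety-explorer", "user": "carol",
     "started": "2023-03-03T14:00:00", "ended": "2023-03-03T14:05:00"},
]

# Hypothetical mapping from user to team.
USER_TEAM = {"alice": "oncology", "bob": "oncology", "carol": "immunology"}


def summarize_usage(records, user_team):
    """Count sessions and total minutes per (app, team) pair."""
    summary = defaultdict(lambda: {"sessions": 0, "minutes": 0.0})
    for rec in records:
        team = user_team.get(rec["user"], "unknown")
        started = datetime.fromisoformat(rec["started"])
        ended = datetime.fromisoformat(rec["ended"])
        entry = summary[(rec["app"], team)]
        entry["sessions"] += 1
        entry["minutes"] += (ended - started).total_seconds() / 60
    return dict(summary)


for (app, team), stats in sorted(summarize_usage(USAGE, USER_TEAM).items()):
    print(app, team, stats["sessions"], stats["minutes"])
```

Comparing the session counts between teams gives the kind of ratio mentioned above, which can then inform where support effort goes.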
And on the other side, if you want to know how people are using the app, so like the in-app usage, you may know shinyHeatmap. That started from an experiment I made around the JS library. Because in the past, I was using a tracking tool on my website. And one day, yeah, I was just wondering why I shouldn't just use it. So I tried to put the tracking in the app, and then I couldn't see anything. It didn't work. I contacted the support, and they told me, yes, it won't work, because that's quite a specific technology. And then I wrote some very simple code and put it in a package, yeah, shinyHeatmap. So basically each time you click in your app, it will record the X and Y location, and then aggregate the data at the end of the day. So you can see where people are clicking. For instance, you can identify some dead zones in the application, which will help you to refactor the design at some point. You know, if you realize there is a path that nobody visits, maybe that's because it's badly designed.
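The idea of recording click coordinates and aggregating them to find dead zones can be sketched in a few lines. This is not how the package is implemented; it is a simplified illustration that assumes clicks arrive as `(x, y)` pixel pairs and that the page is divided into a coarse grid. The function names `bin_clicks` and `dead_zones`, the page size, and the sample clicks are all hypothetical.

```python
from collections import Counter


def bin_clicks(clicks, width, height, cols, rows):
    """Count clicks per grid cell; each click is an (x, y) pixel coordinate."""
    counts = Counter()
    for x, y in clicks:
        if not (0 <= x < width and 0 <= y < height):
            continue  # ignore clicks recorded outside the page area
        col = int(x * cols / width)
        row = int(y * rows / height)
        counts[(row, col)] += 1
    return counts


def dead_zones(counts, cols, rows):
    """Return the (row, col) cells that received no clicks at all."""
    return [(r, c) for r in range(rows) for c in range(cols)
            if counts[(r, c)] == 0]


# Made-up clicks on an 800x600 page split into a 2x2 grid.
clicks = [(100, 100), (150, 120), (700, 50), (90, 500), (900, 10)]
counts = bin_clicks(clicks, width=800, height=600, cols=2, rows=2)
print(dead_zones(counts, cols=2, rows=2))
```

A finer grid gives a more detailed picture at the cost of sparser counts per cell, so the grid size is a judgment call that depends on how much click data is available.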
Shiny versus other front-end technologies
For apps supporting clinical projects, the easier answer is that we don't use anything besides Shiny right now. First, we don't really have super big apps, super complicated apps, or super big sample sizes. Clinical trial data are quite small compared to many other industries. And second, we actually tried React before, separating the front end and back end, just to improve flexibility and the visual effect and make it look better. But eventually, like I said in the comments, our team is very small. Like David mentioned, our goal is not to write a tool and maintain a tool for each study that comes to us, but to create something that we can ship off to our statisticians to use on their own and to manage on their own. And we have much, much better adoption of R, especially with a younger generation of statisticians coming in from academia. Many of them are quite fluent in R.
For some other bigger scale apps, like the one that David worked on for the HR data or the POS, maybe. You're right, Bo. I totally agree. We decided to go for R. And if tomorrow we tell people, yeah, we have to go to Julia or whatever, at some point, you know, there'll be a big clash. R is doing what we want to do. It's doing it well. Now we have many tools. If we want to speed up, we use parallel packages to speed up computations. If we start to develop something with React, the problem is there are like three of us in the team who know React. So if we leave in the future, then who is going to maintain the tool? So if we develop it in R, maybe it's not as fast, maybe it's not as optimized, but at least we can find someone to maintain it in the future. And this is what matters, you know, when you're in the industry at some point.
Favorite resources for learning Shiny
For me personally, Stack Overflow. But for someone who's just starting off, I think a lot of the gallery and the tutorials on Posit's website are there. As I understand, they're current, there are news articles. But you can always go back. I think there's a series of what, 15, 20 videos of Shiny tutorials from the very beginning. And I found that to be really helpful. Yeah, and later, at least, Mastering Shiny. I believe it's on version two right now.
Awesome. Well, I know we are coming to the end of the time here and I don't want anybody to be late for the next event. And I did just want to remind people that you'll go back to the Shiny conference page and use Hopin to get to the next event. But if you had fun today and want to join us again, we do have these Data Science Hangouts every Thursday from 12 to 1 Eastern time, so the same time and the same place. And so I'm just going to put this link into the chat. If you want to add it to your calendar, you can do so there. There's also a Data Science Hangout site on the Posit site where you can check out recordings of past sessions as well.
Thank you so much, David and Bo, for joining us and answering everyone's questions. Thank you all for jumping over here to the hangout. It's been fun spending time with you and seeing all the great discussions in the chat. I'm not able to follow along with everything as we go, so I'm looking forward to reading it after as well. Thank you everybody.
