Resources

Data Science Hangout | Eric Nantz, Eli Lilly | Innovation in clinical trials with open source

video
Dec 16, 2022
1:01:15

Transcript

This transcript was generated automatically and may contain errors.

Welcome back to the Data Science Hangout. I was going to play around with a lesson I learned from Matt Dancho last week. If you've been here before, do you want to try putting a one in the chat? If it's your first time joining today, maybe put a two in the chat.

It was fun seeing all the people commenting last week with Matt. But if it is your first time, it's so nice to meet you. I'm Rachel. This is an open space for the whole data science community to connect and chat about data science leadership, questions you're facing, and getting to learn about what's going on in the world of data science across different industries and companies.

We share the recordings each week to our Posit YouTube, so you can always go back and rewatch or find helpful resources. These go up to the new Data Science Hangout site at Posit.co as well. I was a bit slow on this, but I just put up Jesse's and Marco's sessions as well this week.

Together, we're all dedicated to creating a welcoming environment for everybody here, so we love when everyone can participate and we can hear from you all, no matter your level of experience or area of work.

There's always three ways that you can jump in and ask questions or also provide your perspective on a point. It doesn't just have to be a question. You can jump in by raising your hand on Zoom here. You can put questions in the Zoom chat, and feel free to just put a little star next to it if it's something you want me to read out loud, if you're in a coffee shop or something like that. Then we also have a Slido link, which I forgot to share with Hannah ahead of time, so I'll copy that over in a second here and put that into the chat.

With the Slido link, you can ask questions anonymously too.

I am so excited to be joined by my co-host for today, Eric Nantz. Eric is a director at Eli Lilly and Company, a podcaster, and a community builder whom you may have seen around the open-source community a few times. He just celebrated his 100th podcast episode as well, so congrats, Eric. I would love to have you jump in, introduce yourself, and share a little bit about your role.

Well, thank you so much, Rachel, and I'm really thrilled to be here. I've been watching the Hangout since the very beginning. It's really an exciting space to be a part of and to be a co-host with you. It's tremendously exciting.

My name is Eric, and I've been in my current role as a director in the statistical innovation group at Lilly for over 12 years now, which does not seem that long; time certainly flies. My day-to-day is a mix of a few things. I'm sure this will probably come up in the conversation, but I kind of wear a few hats. I'm a mini-manager, as they say. I've got a couple of direct reports in our group, and I'm trying to help enable them to succeed. Where I put a lot of my focus these days is bringing innovative algorithms and designs to the new clinical studies we're trying to launch. And throughout all of that, the one tool that unifies everything is R. It is literally at the forefront of what I do, whether it's building Shiny apps, internal packages, consulting, or even the community building that I'm sure we'll touch on as well.

So I've been involved in a mix of a lot of different things at Lilly, and I've learned a lot along the way, and I'm certainly happy to share my insights from that. And yeah, there's never a dull moment, as they say.

Getting started with open source and podcasting

Well, I would love to get the conversation going around some of the community work that you do, because I'm always so impressed with all that you do. And I was just curious, how did you first get started in all this, and like podcasting and just getting involved with so many community events?

Yeah, this goes back a ways, but the roots of this actually started with my dissertation back in grad school, which I would not have been able to complete without R, because R had the package from one of the leading authors of a manuscript on the method I was using, called competing risks. Ever since that point, I not only used that package to crunch through the numbers, but I was also able to produce the dissertation with Sweave, which was a precursor to knitr and now what we know as R Markdown and Quarto. So I was hooked; I thought R was amazing.

I never really had training in it until then, so I was just enamored with the world of open source. Around this time, I was also getting into open source with respect to the Linux operating system and things like that. As I was learning things about Linux to help with my grad school duties, I found podcasts. There were some podcasters teaching me great skills around Linux and open source, and they really helped me in quite a few situations with some education there. And I thought, well, I like R so much, and I certainly don't know everything about it, but maybe I should just start to put myself out there a little bit and see what people might be receptive to, giving myself a little nudge to learn out loud before that became a catchphrase from Mara.

So I started the R Podcast back in 2012 or 2011; I forget the year. It really just started with me learning about R and a lot of the things I hadn't had a chance to pursue yet. And lo and behold, I started to get feedback that people were liking it, and I decided to keep going. What really turned the corner was not just doing my own learning, but showcasing the brilliant developers, package authors, and practitioners who were doing amazing work. I wanted to give them a platform to share their backstories, how they got to where they were.

And that started my interview sequence, you might say. Through those connections, things just happened organically after that. I started collaborating with Posit, formerly RStudio, quite a bit, and started going to the various conferences and meetups. And lo and behold, I was learning so much from the community that it became a passion to keep that going. More recently, I decided to augment that with Shiny, because I'm so passionate about Shiny as well, starting the Shiny Developer Series a couple of years ago, and now the R Weekly project to help curate what is available in the community. So all these things just kind of happened organically.

But I thought, not many others were trying this route back in 2012, putting themselves out there via voice; there were only a few blogs out there. If the Linux podcasters helped me so much, maybe I could help others in the R community too along the way. And like I said, learn for myself as well, and just share it with the world. So it all just kind of happened organically, and I'm glad I stuck with it. It's certainly been off and on, as they say, but it's been a real passion of mine to share that knowledge and, like I said, to showcase who else is doing brilliant work in the community.

Marrying a love for R with professional work

So this is another mini story time, as they say, but it was about a few years into my job where in the very beginning, I wasn't using R to do very much because I was assigned to do a more, you might say, traditional study project statistician role where I was helping oversee the analysis strategy of a particular study. And as many might know in life sciences, SAS has been the predominant player in computation there, but that's slowly changing, which we may get to later.

But over time, they hooked me into a pretty open-ended assignment where we were getting a lot of complicated biomarker data. This data was huge, and we were trying to use some pretty novel algorithms with it. That's where R came into play. I was finally able to take what I learned in my dissertation and start applying it to actual work at the day job. And along the way, we had another talented colleague who was running what was at that time called the S and R user group at Lilly.

And I started calling into it, much like a hangout here, but it was just people sharing what they were doing, a small but enthusiastic group. Within maybe a year or so of me attending, the leader of this group said, hey, Eric, you know what? I think you should take the reins on this. At first I was kind of shocked, because I had only been with the company for a couple of years, and this great colleague, who was very well respected, was asking me to do this. But ever since then, that's when I thought, okay, now it's time to take this to another level. And lo and behold, I've been leading the internal R group for about eight or nine years.

Moving from SAS to R in pharma

Yeah, well, that's a good thing. We've got some time for this, but I can keep it kind of brief to start. It was leading by example a little bit. To solve complicated issues, especially what I was doing in that early assignment with biomarker analysis, I wanted to demonstrate the outputs I could get, the analytics I could run, and do it efficiently by tying into some really important HPC (high-performance computing) infrastructure, which at that time was not very easy to do in the SAS world. I was able to turn around answers pretty quickly and share insights with the leadership and the clinical team. And they were saying, hey, this is great, keep going with it. So, leading by example.

And then honestly, the other key event happening across our industry is the advent of the R/Pharma conference, now in its fourth or fifth year, where many of us who are passionate about using R in the clinical realm got together and organized this event from the ground up. Maybe we have some of them listening here in the chat. But that really started to open the eyes of many industry leaders in life sciences to say, hey, look what these people are doing with open source. Look how they're bringing in automation in a seamless way, integrating R with other pipelines and generating interactive insights. And it's just really grown exponentially since then.

And a lot of this is not possible without the open-source nature of R: what we can grab from the community, but also integrate with other systems really easily, whether it's databases, high-performance computing, or obviously Shiny for web applications. There are so many integrations possible that other proprietary packages may have solutions for, but A, they cost money, and B, they can cost a lot of time. If you build up the talent, you can do amazing things with R and those integrations. And I'm not going to sugarcoat it: we're still not quite where I want to get to yet, but we're definitely a lot farther along in the last few years than we ever were when I started working in life sciences.

OK, well, time for some hot takes, perhaps. As that question probably insinuates, I also started with SAS; I did not know R before SAS. They are fundamentally different. So honestly, the first thing to keep in mind is that many of the practices, or maybe tricks, you've learned for doing things in SAS are probably not going to work in R. They are fundamentally different from a language perspective. So you've got to set that to the side a little bit and go in with an open mind.

It will be challenging at first, but I'm going to echo what others in the community have said: getting started with the tidyverse, the way the pipe works and other things like that with dplyr and the like, is going to feel pretty familiar compared to what you might have done historically in the SAS data step. So I think the tidyverse eases that transition a little bit, but there are going to be some new habits you'll have to grow into. The materials online are really great for getting started. In fact, Mine Çetinkaya-Rundel's Data Science in a Box is another excellent resource that we can probably link to. There are lots of educational materials out there, but unlearn the old habits first before you dive in. That would be my biggest advice.
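
To make that concrete, here is a minimal sketch of the tidyverse style mentioned above; the data frame, column names, and values are invented for illustration only:

```r
library(dplyr)

# Toy data standing in for a clinical dataset (invented for illustration)
trial <- data.frame(
  subject = 1:6,
  arm     = c("placebo", "drug", "placebo", "drug", "drug", "placebo"),
  change  = c(-0.2, -1.4, 0.1, -1.1, -0.9, -0.3)
)

# Each verb pipes into the next, roughly playing the role that a DATA step
# plus PROC MEANS with a CLASS statement would in SAS
trial |>
  filter(!is.na(change)) |>
  group_by(arm) |>
  summarise(n = n(), mean_change = mean(change))
```

Unlike a SAS DATA step, nothing here modifies `trial` in place; each verb returns a new data frame, which is one of the habit changes the transition involves.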

Building a community of practice

Yeah, this is certainly another passion area of mine, as I say, because, as I said at the outset, the idea of bringing people together and sharing knowledge has been something I've been excited about for a long time. It's not always easy, because we sometimes have to fight a little stigma amongst ourselves, feeling like the things we're doing, whether with R or with analytics in general, aren't really that innovative; it's just part of the job, so to speak. But I'm trying to get people to come out of their shell a little bit and celebrate it, and not feel like everything has to be super polished before they show it to us.

I really made a concerted effort in the last few years to say, hey, we are a welcoming group. We would love to hear your ideas. If you want advice, we're happy to give it. If you hit a stumbling block, we're happy to help solve it. Even if you just found a great package that simplified your workflow, we want to know about it. It doesn't have to be anything fancy. So making sure people felt comfortable sharing any contribution, no matter how "big" or "small," has been one of the things we've really tried to do.

And honestly, finding a diverse set of topics we can learn from, because not everybody is going to be interested in the things I'm interested in, right? Finding others to give a different voice, a different perspective, whether it's creating that great table with gt or automating a pipeline by connecting R with AWS; lots of different things are out there, and making sure there's protected time for people to showcase them. Having a regular cadence has always been helpful too. I'll admit I'm not the best at it, but I try to surround myself with others who are enthusiastic about the same things, about building this community. So not doing it alone is definitely one piece of advice: find somebody else to tag along and help along the way.

Oh yeah, okay, another hot topic for me. It's been a mix of both, I'll say. What helped me early in my career was making great relationships with that same research IT group that gave me the keys, so to speak, to that virtual server. That has grown into probably my most valuable collaboration in the entire company. They have helped us out immensely with infrastructure, with our HPC needs, and with really pushing the envelope of what we can do with R inside the enterprise.

And how I got there is a fun, you might say, data mishaps story that I shared earlier this year at a data mishaps conference. In my eagerness to turn loose on this really complicated analysis, I just so happened to bring down the entire HPC cluster one day. Like, I brought it down. You can imagine the panic I felt, being a new statistician, doing something like that. But this IT group didn't say, okay, we're not going to give him any of these capabilities anymore. They took me under their wing and really educated me on what I could do better next time. They said, we want to partner with you; we know you've got the ambition to really utilize the tech in the best way. And that's been really fruitful.

So whenever I need anything from them, or they need anything from me, we're there for each other. It's not always that way in other parts of the IT world, so it does pay to build relationships with those who kind of get where you're trying to go, and who can help translate some of the processes or hoops you have to jump through. I've been trying to keep those relationships going fruitfully. Beyond that, it's really just being persistent: showing why you need that certain access key, or why you need access to that API; really showing them what you're trying to do. And I've been fortunate. Once they see the value of it, they'll go to bat for me to get it.

Being a square peg in a round hole

So I think one thing that's helped me, as I struggle with this (and to this day I still struggle with parts of it), is not being afraid of it, but using those strengths to help the people immediately around me and then letting that spread out. I won't lie, there's some pressure in being known as one of the R people at the org that folks go to with questions every week. But what I've been trying to do is what I said earlier: bring up others around me to help carry certain messages or certain steps in what makes me that square peg, showing where these skills apply in which situations, and getting some help along the way.

And one tangible example from earlier this year: I was really sick and tired of writing the same email response to the questions I got every week about getting access to the RStudio server, getting access to the HPC cluster, how do I launch a Shiny app, Eric, help. It was almost a copy-paste response, and I got tired of it. And when I get tired of something, that's when I'm like, I'm going to put everything else down for a bit and focus on solving this right now. So I took the first two months of this year to produce an internal documentation portal, written in Hugo with a really cool theme, to document almost every type of info I could think of for enabling people to get started with R.

But when I launched that, this square peg thing, I felt like I had now brought value out of it to others, because I took the knowledge that was mostly in my head and put it out there, gave links to people, and now people are sending those documentation links to others when they have questions. It doesn't always fall on me. So I feel fortunate that spreading that knowledge has helped.

Biggest changes at Eli Lilly over 12 years

Well, there's been a lot. The change that is most at the forefront for me was an initiative many years ago where we tried to minimize the time from when a treatment is discovered in, say, the discovery lab to when we can actually get that treatment out to the patients who can benefit from it. It's called minimizing that white space. To do that, we had to do a lot of things differently. There have been reorgs in terms of how trials are designed and more emphasis on other functions. But within statistics, it actually empowered us quite a bit to push new cutting-edge algorithms for design and for simulation, and to not just do the traditional analysis for, like, a clinical outcome.

So the fact that we got this mandate, like, okay, we've got to shorten the time we're in this research phase or this study phase, meant we had to put all hands on deck to put out new solutions. That brought innovation out of necessity, but it also opened the door for those in my group at the time to really think differently about how we leverage R in the design space, and to become one of the industry leaders in clinical simulation. We've definitely relied a lot on the functionality my colleagues have built.

And I'm not going to come here and say I know all the ins and outs of every statistical method. That's why we have a team, right? Everybody can specialize in their key areas, but we've all been able to pitch in and bring that innovation to more novel designs that can shorten the time a patient has to be in a trial. The industry itself still has a ways to go, but just that emphasis on getting treatments out to patients sooner organically brought out some new innovations in our statistics work. And it really gave us a voice at the table, showing that we could bring some really innovative change to how we were doing things in the past.

The targets package and ML Ops

So it very much helps that Will Landau, the author of targets, is on our extended team; we're in the same group. We get to share knowledge, and I've learned way more from him than he learns from me. But yes, targets has been a huge factor in our clinical simulation pipelines and other endeavors, helping us simplify, automate, and make sure we don't have to repeat the same analysis over and over again. It's used very heavily in our group.

And yes, the learning curve might be a little harsh to start with, but Will has the most extensive documentation for a package that I have ever seen. Yeah, you can quote that; it's recorded. Will does an amazing job with his documentation. Just go to the targets home site and you'll see manuals and other tutorials, and he's beefing that up even more with things like debugging pipelines. As for the MLOps domain, he's definitely done examples of targets there in the past. We're using targets quite heavily, and future you will thank you for investing in targets.
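
As a rough sketch of what a targets pipeline looks like (the function names and the simulation here are hypothetical, not Lilly's actual code), a `_targets.R` file defines each step as a target, and targets tracks the dependencies between them:

```r
# _targets.R -- hypothetical clinical-simulation pipeline sketch
library(targets)

# In a real project these helpers would normally live in R/functions.R
simulate_trial <- function(n_per_arm) {
  data.frame(
    arm    = rep(c("placebo", "drug"), each = n_per_arm),
    change = c(rnorm(n_per_arm, mean = 0), rnorm(n_per_arm, mean = -1))
  )
}
summarise_trial <- function(dat) {
  aggregate(change ~ arm, data = dat, FUN = mean)
}

# Each tar_target() is one cached step; downstream targets rerun
# only when an upstream target or its code changes
list(
  tar_target(sim_data, simulate_trial(1000)),
  tar_target(summary_tbl, summarise_trial(sim_data))
)
```

Running `tar_make()` builds the pipeline and skips any target whose inputs are unchanged, which is what avoids rerunning the same simulation over and over.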

Podcasting tools and open source apps

What I've settled on now, in particular for the R Weekly show that I do with Mike Thomas (who's been in this group a few times in the past; shout out to Mike if he's listening): we connect virtually using a platform called Zencastr, but there are quite a few like it where you can record your voice in the cloud and then download it and do whatever you want afterwards. We use that every Tuesday or Wednesday, morning or afternoon depending on our schedules, to record our voices. Then I bring it down locally and use a piece of software, unfortunately not open source but the state of the art for this, called Reaper, to do the actual editing, where I can automate a lot of the voice processing to get the levels nice, avoid clipping, and make everything sound pretty, or at least halfway decent.

And it's super blazing fast to X out all my little ums, or, if I botched somebody's name, to do that take over again. Being able to smooth all that out, and do it pretty quickly, is why Reaper has been my tool of choice for audio editing and bringing the voice tracks in. And it's cross-platform: Mac, Windows, and Linux. So me being a Linux guy, I get to use it there too. That's been helpful.

And then for the video side, for the Shiny Dev Series (I know that's not part of the question, but I can't not talk about the Shiny Dev Series), I record everything, whether it's a live stream or my video tutorials, with OBS, Open Broadcaster Software. It's a lot to get into at first, but Jim Hester, a former Posit employee, was doing a few tutorials about some of the things he was working on, and I asked him at rstudio::conf a couple of years ago, hey, those are great tutorials, what are you using? He said, matter-of-factly, oh, I'm using OBS. I'm like, OBS, isn't that super hard to learn? He's like, oh, you'll get the hang of it. And sure enough, I got addicted to it. So now I use that for my video production, and it's been a huge win for me because it's so flexible. Members of the Linux community that I follow have also supercharged their OBS setups, and I follow their lead and bring that into my workstation over here. So OBS for video, and Reaper for audio.

What are your favorite open-source alternative apps that you would recommend folks migrate over to? Well, those are definitely two of my current favorites. We all know the situation going on with Twitter; we don't have to dive into it, but Mastodon has been where a lot of the R community is going for social messaging. I have an account on there, and I just created an R Weekly account on there too. So I highly recommend Mastodon. And for keeping notes, Standard Notes is a Markdown-based open-source editor that syncs very quickly across all my devices. So anytime I'm out and about, maybe at a kid's practice, and I see something on RStudio Community or another site, I can quickly jot down the name of a package or a website in Markdown, bring it onto my computer when I get back, and research it. I keep all my production notes on there too.

And then other apps I use day-to-day are definitely more on the Linux and media production side. Like I said earlier, OBS; I use Shotcut for video editing; obviously Firefox and Brave for web browsing; and I'm using VS Code quite a bit now for my development needs, alongside the RStudio IDE. So there are lots of things in my toolbox, and the little utilities just add up to really help out. But those are the big ones for sure, on every computer I have here.

Oh, this has, like I said, been a real thrill for me. And I can talk about this stuff all day, literally. So it was good to be on here. But the Data Science Hangout is a wonderful place. So if this is your first time attending, definitely come back. It's a great place to be.

Thank you so much. And I just remembered, I almost forgot to say, next week is Thanksgiving on Thursday. So we won't have a Hangout next week, but we'll be there the week after. So see you in two weeks, everybody.