Daniel Sjoberg - gtsummary: Streamlining Summary Tables for Research and Regulatory Submissions
videoimage: thumbnail.jpg
Transcript#
This transcript was generated automatically and may contain errors.
I'm a data scientist at Genentech, and today I want to talk to you about the gtsummary package, which has, adorably, been mentioned in the previous two talks, so I love that.
So this is a bit of a different talk than you might usually hear about a package at posit::conf. Oftentimes you hear about cool new tools you're learning about for the first time, and while I think this is a cool tool, it's not exactly new. It just turned five years old on CRAN, so happy birthday, gtsummary.
I want to talk to you a bit today about this journey from honestly being an absolute R noob to developing a package that was meant really for my team or my department at Sloan Kettering, where I was working at the time, and then to maintaining a project that's actually pretty widely used in the community now.
Origins at Memorial Sloan Kettering
So picture it: New York City, the year is 2018, and I'm working as a biostatistician at Memorial Sloan Kettering Cancer Center. My team and I took major pride in the quality of our code and the reproducibility of our results; we were constantly patting ourselves on the back about it. I debated whether I should share how we were reproducible, because it's a little embarrassing in retrospect, but here's what we would do. In Stata (cue the Imperial March music), if we needed a table one, a demographics table, we would say: hey, I need age and grade and stage, and I need that in table one format. It would calculate the median, the IQR, what have you, round them, and print the table to the console with ampersands between where you would see the columns. Then we would copy it, paste it into a Word document, highlight it, convert it to a table, and we're like, wow, reproducibility.
So while this was good, you know, we weren't transcribing manually, we weren't retyping numbers, it left something to be desired, for sure. It was around this time that I started hearing some rumblings about something called R Markdown, and it was infinitely superior to what we were doing, and it was very clear that the team needed to make a change.
So we were just delighted to find the solution. Almost as delighted as I was to find a color match between this sticker and the crushed velvet ensemble Dorothy's wearing. Flawless, right?
So we needed to make the switch. There were just a couple of issues. We didn't know R. I had kind of used R in passing, but I didn't know what the tidyverse was; I didn't know anything, really. But we thought that shouldn't be an issue. It's just writing things in a script; how hard could it be?
So before we made this transition, we took a survey of what was out there: does it really meet our needs as a team? ggplot was great, but for summary tables, we didn't find the exact solution that was going to work for us. So we thought: we'll just build one, you know? How hard could it be?
So with the confidence that you can have only when you have absolutely no idea what you're doing, we were like, we are going to build this package, it's going to be so great, it's going to make this table and this table. These are the two common tables that I make in my work as a statistician here.
So I should say that while I was totally ignorant of how to program in R at the time, thinking about statistical reporting was something I had been doing for many years. I was on the editorial board of the journal European Urology for a couple of years, and I had co-authored the reporting guidelines for all of the studies there, and those guidelines have since been adopted by seven other journals. So thinking about how to report statistics was something I thought a lot about. The mechanism of doing that with R, I was way out of my league, but I didn't even know it.
Anyway, fast forward, we cobbled this together. Picking up a project is a great way to learn a programming language, right? So that's what we did. The first release came in May 2019, and I remember being so nervous about putting this work out in public to be scrutinized, to see what people thought of what I did. It's really funny to think back on those feelings, because I'll now put pretty much anything out and be like, hey, I made some garbage. Do you like it?
But the community's reaction was incredibly kind and engaging, and it was through that community engagement that the package added additional functionality, and the product just got better and better and better. The package grew in both users and functionality, and it was really exciting to see the contributions coming in, both code contributions via pull requests and ideas for improvements, some really fantastic ideas. This was also the beginning of my engagement with the R community, and it was such a happy time.
And funny enough, everyone speaking in this session, Rich, Shannon, Becca, hello. We all knew each other before this from the community. Rich, for example: I was in Toronto two weeks ago, and he gives a great walking tour of the city, FYI. Shannon and I have collaborated on a couple of packages together as well, and we have found ourselves in no fewer than two service elevators trying to get to a rooftop bar that we may or may not have had a reservation at. And Becca and I have a standing weekly call because we're actively collaborating right now. So I just love the community. It's really wonderful.
How it's going
So that's kind of how it got started, but how's it going? Well, much better than I anticipated, honestly. So we got over a million downloads from CRAN, a thousand GitHub stars, a hundred GitHub forks, and in 2021, the American Statistical Association gave the package the Innovation Programming Award, and last month, Augustin created an entire trial readout using gtsummary and won the 2024 Posit Table Contest. So that's pretty awesome. It's really exciting stuff.
The package in practice
So you've heard about the package, but let's take a look at it in practice. The ethos, the idea behind this package, was that we wanted dead simple code that was also super customizable, which are sometimes two conflicting goals.
So in this example, I think it's pretty dead simple. We have trial, which is a data frame, a data set, and we pass that to tbl_summary() from the gtsummary package, and we say: I would like summary statistics split by treatment, and I want to see age, grade, and response. So internally, we look at this and say: okay, age looks continuous, I'll default to the median and IQR. Grade looks pretty categorical, I'll do that. Response looks dichotomous, I'll just give you a single line. Two of these, age and tumor response, have missing values; let me make sure I put some information about that in the table so you don't think there's no missing data.
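The call described here might look like this: a minimal sketch using the trial data frame that ships with gtsummary, where the include argument limits the table to the listed columns.

```r
library(gtsummary)

# Summarize age, grade, and response, split by treatment arm.
# Continuous variables default to median (IQR), categorical to n (%),
# and missing values get their own "Unknown" row automatically.
trial |>
  tbl_summary(
    by = trt,
    include = c(age, grade, response)
  )
```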
And it looks at each variable again. Start at the top, at age: it sees that your ages span roughly this range, so rounding to zero decimal places, the nearest integer, is reasonable for this variable. So there's a lot of stuff going on in the background so that, in a couple lines of code, you can get a table that is pretty much ready to go to a journal. That was the idea behind it. Dead simple code, quick table. That's the takeaway right here.
And those defaults, of course, are all modifiable. So in this example, I'm adding the statistic argument and saying: take all the continuous variables, which in this case is just age, and rather than the default median and IQR, I would like to see the mean with the standard deviation in parentheses next to it. You may recognize some of this syntax. It looks just like glue, which you see in dplyr, in stringr, throughout the tidyverse. And that's exactly what we're doing, but we make it do double duty. We're not just saying, put that in here. The mean is in curly brackets, so we go look for the mean function, run it on the age vector, format the result, do the same for the standard deviation, and then do the proper gluing, popping those numbers into the table. So you can end up with pretty complicated, very cute looking tables using this glue syntax. It's pretty simple. I love it.
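A minimal sketch of overriding the default statistic with the glue-style syntax described above:

```r
library(gtsummary)

# Report mean (SD) instead of the default median (IQR)
# for every continuous variable (here, just age).
# "{mean}" and "{sd}" are looked up and run on each variable's vector.
trial |>
  tbl_summary(
    by = trt,
    include = c(age, grade, response),
    statistic = all_continuous() ~ "{mean} ({sd})"
  )
```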
Composable tables and regression summaries
So in addition to these kinds of basic tables, gtsummary tables are very composable. While this drug A, drug B comparison is a very important part of the table, there's oftentimes much more we want to report about these items, right? A very common thing is that you'll want to see a p-value comparing the values of, for example, age, or in this case marker level and tumor response, across those treatments of drug A and drug B. So we have a function, add_p(), that will do that, and it adds a single p-value column. There's also an add_difference() function: when you have two groups, it will add the mean difference or the rate difference along with a confidence interval and the p-value. And from these tables, you can merge these statistics into a single cell, or you can hide the p-value if you don't like p-values.
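The add_p() and add_difference() calls described above chain directly onto a summary table; a sketch:

```r
library(gtsummary)

# Add a p-value column comparing the two treatment arms
trial |>
  tbl_summary(by = trt, include = c(age, marker, response)) |>
  add_p()

# Or, with two groups, report the difference with its
# confidence interval and p-value instead
trial |>
  tbl_summary(by = trt, include = c(age, marker, response)) |>
  add_difference()
```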
Quite customizable here. And while we have wonderful, lovely defaults, they're, of course, all modifiable. I realize while practicing this talk, I'm going to say this maybe three more times.
My next favorite feature is regression model summaries. Here, you may recognize your typical logistic regression model from the glm() function in the stats package, base R essentially. We have tumor response as our endpoint and treatment and marker level as covariates. And you've all seen this output: it's not that easy to read, and it's often not that easy to work with. So we have a function called tbl_regression(). You pass it the modeling object, and it gives you something quite reasonable back. In this case, because we did a logistic regression, I want to exponentiate, so we have odds ratios. The function can recognize: hey, this looks like a logistic regression, so these are odds ratios; let me put a good header up there, OR, with a footnote saying that OR means odds ratio. It also does fantastic stuff like identifying the reference level. This is big. This is not ambiguous. That's what I love: the clarity and the context you're adding to these tables is really fantastic. Now, we default to an em dash for the reference level, because I love em dashes, but you can put a 1 there instead. A lot of people use 1 or "ref", depending on the journal you're submitting to, for example.
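The logistic regression example walked through above might be sketched like this, again using the trial data that ships with the package:

```r
library(gtsummary)

# Logistic regression: tumor response by treatment and marker level
mod <- glm(response ~ trt + marker, data = trial, family = binomial)

# exponentiate = TRUE reports odds ratios; tbl_regression() adds the
# OR header, its footnote, and the reference-level rows automatically
tbl_regression(mod, exponentiate = TRUE)
```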
But I can't move on from this table without a small anecdote, again, about the community. I had written the original version to work with linear models, GLMs, and Cox proportional hazards regression, because those were kind of my bread and butter at the time. Joseph Larmarange, who wrote the labelled package, which you also heard about earlier today, said: hey, this is fantastic, can I take this little garbage you wrote and make it better, so we can support everything? And I was like, yeah, that sounds amazing. So there's another package called broom.helpers. He didn't describe it like that; that's me saying that. broom.helpers is kind of like broom++. It takes your modeling object and does all sorts of additional formatting for it, like finding those reference rows, and it even handles really complex things if you're using complex contrasts, what have you. So now tbl_regression() and broom.helpers support 40-plus packages and regression modeling functions, which is just really, really great. And if you are writing a regression modeling function, just do the basic stuff that R wants you to do: write a model.frame() method, write a model.matrix() method, and it'll be supported out of the box.
Table cobbling
My next favorite feature: table cobbling. While we export many functions for creating rather simple tables, you need to make a complex table every now and then, I presume. I did as well, and so did my team. So we implemented infrastructure for merging, stacking, and stratifying tables very, very easily. On the far right, where it says multivariable in the header, that's the exact table you saw previously, the multivariable regression model. To the left, under the univariable heading, is the output of another function, tbl_uvregression(), which creates univariable regression model summaries. I pass both of these to tbl_merge(), specify the headings I want, and now I have a rather complex table with my univariable results on one side and my multivariable results on the other, all merged together with one line of code.
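A sketch of the univariable/multivariable merge described above, with tbl_uvregression() fitting one model per covariate for the univariable column:

```r
library(gtsummary)

# Univariable logistic regressions, one model per covariate
tbl_uv <- trial |>
  tbl_uvregression(
    method = glm,
    y = response,
    method.args = list(family = binomial),
    exponentiate = TRUE,
    include = c(trt, marker)
  )

# The multivariable model from before
tbl_mv <- glm(response ~ trt + marker, data = trial, family = binomial) |>
  tbl_regression(exponentiate = TRUE)

# Merge side by side, with a spanning header over each set of columns
tbl_merge(
  list(tbl_uv, tbl_mv),
  tab_spanner = c("**Univariable**", "**Multivariable**")
)
```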
Similarly, you can stack them. And I love the story I'm about to tell you about this table in particular. If you took epidemiology in school, you probably recognize it. It's a very common table of odds ratios, combining odds ratios across strata using the Cochran-Mantel-Haenszel test or method; that's the CMH you see over there.
So very common table. So some epidemiologists in Sweden said, hey, this is a super common table, and we love your package. Can we make this with gtsummary? And my initial reaction is always, yes. Let me help you. Let's write the function. Let's do it together. And then I started writing, and I was like, you know what? Maintaining work is a thing. So if I put this in my package, I have to maintain it forever. And it's not something that I needed. So I had a better idea. I said, how about I teach you how to cobble this table together, and we put it in a package that you maintain? And they were really excited, and it worked out really well.
So just quickly, the basic component of this table is a cross-tabulation, tbl_cross(), to get that exposed and not exposed tabulation by T stage. Once you have that one table, you just do it again for cases and controls and merge them. Then do it for stage and for grade, and stack them. And the add_stat() function is a very general way to add pretty much anything you want to a gtsummary table, so you can slap your odds ratios on there. So very, very easily, you have a pretty complex table, which I think is so cute.
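The cross-tabulate-then-stack steps described above might be sketched like this. The final add_stat() step, which would append the odds-ratio and CMH column computed by a function you supply, is elided here because its shape depends entirely on the statistics you want.

```r
library(gtsummary)

# One cross-tabulation: T stage by tumor response
t_stage <- trial |>
  tbl_cross(row = stage, col = response)

# Repeat for another stratifying variable...
t_grade <- trial |>
  tbl_cross(row = grade, col = response)

# ...then stack the pieces into a single table
tbl_stack(list(t_stage, t_grade))
```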
Broad community and language support
So like I said, I've interacted with these epidemiologists, and I wasn't working as an epidemiologist. Over the years, I've interacted with people in many, many fields, and hearing how they do their work has been super interesting. I've learned a lot from them, and vice versa, perhaps: economists, financial analysts, research scientists, data scientists, across the gamut of people doing analytic work. And the package is better for that. In addition to being for everyone, gtsummary tables can be translated into, I believe, 14 languages at the moment. You can just set at the top of your script: hey, I'm going to Spanish, or Icelandic, I just saw, and your results will all be translated for you. So if you're working in a country where English isn't the primary language, the package is still really wonderful, and you can show your results in your native language. I will say that if you find something that's not translated, because we do keep adding new functionality, just shoot me a message with what the translation should be, and I'll add it.
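Setting the output language as described is one call at the top of the script, via the theme_gtsummary_language() theme function; a sketch:

```r
library(gtsummary)

# Translate table headers, labels, and footnotes into Spanish
theme_gtsummary_language("es")

trial |>
  tbl_summary(by = trt, include = c(age, grade))
```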
So we've covered some really basic summaries here, but I wanted to touch on the fact that there are also cross-tabulations and other continuous, subgroup-type analyses you can do. Wide summaries, where you have your n and percent, for example, in different columns. Survey data and survival, or time-to-event, data are easy to summarize. And any number you see in a gtsummary table can be reported inline in an R Markdown or Quarto document using some other functionality that we have. I just love that for the end-to-end reproducibility of it. So it's fantastic.
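The inline reporting mentioned above is handled by inline_text(), which pulls a formatted statistic out of a table; a sketch (the column value assumes the treatment level is labeled "Drug A" in this table):

```r
library(gtsummary)

tbl <- trial |>
  tbl_summary(by = trt, include = age)

# In R Markdown / Quarto prose you would write:
#   `r inline_text(tbl, variable = age, column = "Drug A")`
# which returns the formatted "median (IQR)" string for that cell
inline_text(tbl, variable = age, column = "Drug A")
```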
Output engines and GT integration
Next favorite feature. Well, some of this is becoming moot, I think, the more I learn about where gt is going today with PowerPoint and Excel, so I'm looking forward to that. And gt is obviously the focus; it's in the name of the package. But you can also export to all of these other engines for drawing your table. flextable has some pretty strong connections within the Microsoft Office universe, so Word and PowerPoint. huxtable, I think, is really nice for going to PDF or even to Excel. kableExtra is great at PDF as well. And I love kable for when you're giving examples on Stack Overflow or making reproducible examples on GitHub, because its output is just a simple markdown table, so it renders properly there. It doesn't have all the bells and whistles of a gt table, of course, but most of the time it tells you what you need to know. So this, I think, is super helpful, because you're able to use the package in various contexts, whatever you need at that time in your analytic life.
If you just print a gtsummary table, that conversion to gt happens in the background; you don't even know it's happening. But if you explicitly call as_gt(), then you have the, I don't know, 150-plus functions from the gt package to style your table in precisely the way you love. That is incredibly powerful, because it means I don't have to support all of that: Rich is already doing it.
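The output engines listed above are each one conversion call on the finished table; a sketch:

```r
library(gtsummary)

tbl <- trial |>
  tbl_summary(by = trt, include = c(age, grade))

tbl |> as_gt()          # gt: full styling control from there
tbl |> as_flex_table()  # flextable: Word / PowerPoint
tbl |> as_hux_table()   # huxtable: PDF, Excel
tbl |> as_kable_extra() # kableExtra: PDF
tbl |> as_kable()       # plain markdown table for GitHub / Stack Overflow
```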
Getting started and ARDs for pharma
So I hope you'll check out the package in the near future, and I would start at the package's website. The documentation is wonderful: there are a lot of vignettes and articles going through basic and advanced use, and a gallery full of pretty complex tables to illustrate how you'd put all this together. There's a one-hour presentation on my YouTube channel, and on the R in Medicine channel there's a three-hour workshop plus a 20-minute talk. Lots of ways, whether you like to read or to listen and watch, to learn more about this.
But I wanted to add one more thing for my pharmaceutical friends in the audience. It's about analysis results datasets, an emerging standard from CDISC. ARDs are a super structured way to store results. In the pharmaceutical industry, these live after your ADaM datasets, the datasets that are ready for analysis, and before you create your TLGs: your tables, listings, and graphs. And gtsummary, in the 2.0 release, has been entirely refactored to run on ARDs. That means that for every gtsummary table, you can go in and get a highly structured object with all of the unformatted results in there, and that's going to be perfect for being compliant with this CDISC specification. You can also build your own ARDs and send them through the same tabling functions. So here's what an ARD looks like; I'm running low on time, so I'll skip it.
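A sketch of pulling the ARD back out of a finished table; gather_ard() reflects my understanding of the 2.0 API, so check the current documentation for the exact name:

```r
library(gtsummary)

tbl <- trial |>
  tbl_summary(by = trt, include = c(age, grade))

# Extract the underlying analysis results dataset: a structured,
# unformatted record of every statistic shown in the table
gather_ard(tbl)
```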
So I love the community. Thank you for being my friend.
And before we ask the first question, I just wanted to put one little note out there for tomorrow. We're having a lunch in the Regency Ballroom for the Rainbow R community. Everyone's welcome; please join. Thank you so much. It was a great talk. We have time for one quick question.
How does someone get involved with suggesting updates or making updates to gtsummary? I would get started on GitHub, or if you see me in the hall, you can just stop me and chat with me. Great. Thank you so much. Thank you, everyone.
