Richard Iannone - Adequate Tables? No, We Want Great Tables
Transcript
This transcript was generated automatically and may contain errors.
Thank you. Adequate Tables? No, We Want Great Tables. That's the name of my talk. It's also a true statement. Okay, so tables. They're so great for displaying information. They're underrated sometimes. I talk to people and they're not really appreciative of tables as much as I am. They can be awesome. Here's a few that I made. I even showed some of these recently in a workshop I gave yesterday. So this one's really nice. The first one here, I'm going to use my pointer. Look at the stuff it has. It has scientific notation in it. It's got some units, chemistry even. Great. This is wonderful. This one's even got plots within a table. Super great stuff. But as good as those are, other people have made even better tables.
These are ones I found on the internet. Mine can't hold a candle to these ones. These are really, really nice. They're super nicely styled. They have titles, footnotes. Very informational and beautiful. And so the great thing is all these tables were made with gt. It's a package for making tables. You can make useful tables for publication, tables for sharing, and importantly, you can make them look good. So the goal of the talk is to look at the story of gt, sort of like look back a little bit, discover where we came from, where we're going, and there's lots to it.
The origins of gt
So let's dive in. So when gt started, it was in 2018, March 20th. That day is when I started the GitHub repo. It was a week after I joined the company, which was then RStudio. It was wild. Back then, in 2018, there were at least nine table packages for R. Great names, too, like pixiedust, stargazer. xtable was first released in the year 2000. That goes so far back. It's wild. There's so many of them. I wasn't, you know, deterred by this. We decided to go for it, make our own, because why not? We had some goals, too. So these are like the main goals I had before even starting, or near to. One, comprehensive table structuring. Two, a large selection of functions for formatting values. Three, flexible and easy to use, one hopes, methods for table styling. And four, we wanted to have table rendering that goes to multiple output types.
So in secret, I worked on gt for most of 2018, added nearly 60 functions, hitting on those topics I just mentioned, those goals. Made it public December 6th. I think that was in RStudio Conf. Oh, my God, that goes way back. RStudio Conf back then. That's when we made it public, just before that. And it was pretty good at that time. This is like a little image I had on the readme that showed the API, all the functions that are available. It was not bad for a public release.
Table structuring and formatting
So our first goal was, you know, pretty much met: comprehensive table structuring. We had a basic blueprint for a table. We wanted to have things like a header, a footer, some place to put row labels and, you know, just like tease them apart and, you know, make them composable, configurable. So we worked on formalizing those components. And in the end, it makes everything a little bit easier to understand as well. And we eventually solidified our table framework. We got some nomenclature, like stub, stubhead, you know, these are fresh new things. And we really sort of like tried to really formalize things. And it's pretty good. To this day, we still use it. We haven't modified it much. I think it really works quite well. So we use this all through the documentation to explain how this API works.
Okay. So our second goal was a large selection of functions for formatting values. We wanted to get this package on CRAN. So we had to meet this goal. So we started with these formatters. They all begin with fmt_. And they're just for formatting values in the table body. They're super important. They're very fundamental to me. And this is a pretty reasonable starter pack of formatters for the CRAN release, which was in March 2020.
The weird thing is, it took two years from the start of the package to get it to CRAN. That's a very long time for any package. It's only because we wanted to really get things right, not because I was super slow. I was pretty much working on it all the time. Just took our time. That's all it was. And because I'm totally obsessed with formatting functions, we added more in 2021 to 2022. Pretty much almost every release had one new one, except for 0.5, where I was a slacker. Didn't do it then. And then the last few releases, there was a large volley of formatters because I wanted to do them. But they're really useful. And they're really good. The package is better for it. So we had a ton of formatters in the last three releases. So all together, the current set of formatters, it covers a ton of formatting tasks. We have 29 of them. You may need them. You may not need them. But I think at least some people need one of these, at least one of these.
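To make the formatter idea concrete, here is a minimal sketch in plain Python. It is not gt's or Great Tables' implementation or API: the helper names only borrow the fmt_ prefix mentioned in the talk, the row-dict table model is an assumption, and the values are made up. What it tries to show is the pattern described above, where each formatter targets a column of the table body and turns raw values into display strings.

```python
# Illustrative only: these helpers mimic the *idea* of fmt_* formatters.
# They are hypothetical and are not gt's or Great Tables' API.
# A table body is modeled as a list of row dicts (an assumption).


def fmt_number(rows, column, decimals=2):
    """Fixed decimals with thousands separators, e.g. 1234.5 -> '1,234.50'."""
    return [{**r, column: f"{r[column]:,.{decimals}f}"} for r in rows]


def fmt_scientific(rows, column, decimals=2):
    """Scientific notation written as 'm × 10^n'."""
    out = []
    for r in rows:
        # Python's 'e' format gives e.g. '1.23e-04'; split it into its parts.
        mantissa, exponent = f"{r[column]:.{decimals}e}".split("e")
        out.append({**r, column: f"{mantissa} × 10^{int(exponent)}"})
    return out


def fmt_percent(rows, column, decimals=1):
    """Proportion shown as a percentage, e.g. 0.256 -> '25.6%'."""
    return [{**r, column: f"{r[column] * 100:.{decimals}f}%"} for r in rows]


# Made-up values, purely for demonstration.
rows = [{"item": "A", "count": 1234567.891, "tiny": 0.000123, "rate": 0.256}]

# Each formatter targets one column and returns new rows, so calls chain
# without mutating the input data.
formatted = fmt_percent(
    fmt_scientific(fmt_number(rows, "count", decimals=1), "tiny"),
    "rate",
)
print(formatted[0])
```

Keeping the raw data untouched and producing formatted copies is one possible design; it means the same input can be formatted differently for different tables.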
Styling and rendering
So our third goal was flexible and easy-to-use methods for table styling. Table styling is super important. It makes the table look good if you want it to look good. And one way we had it with the initial release was using the function tab_style(). That allows you just to, you know, anywhere in the table, add a style or multiple styles. Another way, which is also equally good, is using tab_options(). Just another way of inserting styles into a table. And then eventually, in a much later release, we had opt_stylize() put in. Just a way to theme a table quickly with one function with very few arguments. It's a good jumping-off point. And, you know, constantly working on this. So we made improvements to the things we added along the way. And I think it's kind of important to, you know, improve the features you already have in subsequent releases. So because of styling, we could have awesome-looking tables like this. These are not my tables at all. But they look really good because of, you know, all the styling possibilities in the package.
Okay. The fourth goal was to do really good rendering. Table rendering to multiple output types. So, you know, the steps are you have an input data table, then a gt object. So you use functions to make the table. And then finally, you have to render it, you know, essentially print it out to an output table. Also, you shouldn't need to have to change your table code depending on the output type. Whatever context you're in, the code just makes that table. Whether it's good or bad is another issue. So three output types were targeted initially. HTML, LaTeX, RTF. HTML support was pretty good on the release. I was pretty satisfied. But LaTeX and RTF, they were like half good, experimental, you might say. But they were at least there. These were the beginnings. You have to start somewhere.
So, you know, starting somewhere means that you finish elsewhere. So we had throughout many other releases, we fixed up, improved LaTeX and RTF. And even the last release had RTF and LaTeX improvements. And HTML, as good as it is, we made changes there to make it even better. So we focused on speed of rendering tables and, of course, having accessible tables in HTML, which is very important to us.
Later, and because I'm giving you dates here, August 2022, the DocX Word output type was implemented. And Ellis Hughes, if you're in the room, he's from GSK, he did a huge amount of work here to make this really good. I'm forever in debt to him for this. But that was done in, yeah, 2022. And it's great. And we're improving that a lot as well. And looking future-wise, I'm sort of breaking into the future, we're looking at adding Excel and PowerPoint output as well. Mostly because there's super strong demand for Excel output. Like everybody wants it. Why aren't you doing it, Rich? Come on. Get on it. Do it now. Okay. But we're going to do it soon. I promise you that.
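The rendering goal in this section (one table definition, several output types, no change to the table code) can be sketched in plain Python. This is a simplified illustration under stated assumptions, not how gt or Great Tables render: the Table class, the to_html and to_latex functions, and the render dispatcher are all hypothetical names, and the LaTeX path does no escaping of special characters.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Table:
    """Hypothetical table description: a title, column labels, body rows."""
    title: str
    columns: list
    rows: list


def to_html(t: Table) -> str:
    head = "".join(f"<th>{escape(c)}</th>" for c in t.columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(v)}</td>" for v in row) + "</tr>"
        for row in t.rows
    )
    return (
        f"<table><caption>{escape(t.title)}</caption>"
        f"<thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"
    )


def to_latex(t: Table) -> str:
    # Simplification: cell text is assumed to contain no LaTeX specials.
    n = len(t.columns)
    lines = [
        f"\\begin{{tabular}}{{{'l' * n}}}",
        f"\\multicolumn{{{n}}}{{l}}{{\\textbf{{{t.title}}}}} \\\\",
        " & ".join(t.columns) + " \\\\",
        "\\hline",
    ]
    lines += [" & ".join(row) + " \\\\" for row in t.rows]
    lines.append("\\end{tabular}")
    return "\n".join(lines)


# The table is described once; the output type is chosen only at render time.
RENDERERS = {"html": to_html, "latex": to_latex}


def render(t: Table, output: str) -> str:
    return RENDERERS[output](t)


table = Table(
    title="Planets",
    columns=["Planet", "Relative mass"],
    rows=[["Earth", "1.00"], ["Mars", "0.107"]],
)
html_out = render(table, "html")
latex_out = render(table, "latex")
print(latex_out)
```

Adding another output type in this sketch would mean registering one more function in RENDERERS, leaving the table definition as it is.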
New goals and future plans
So we met these goals. We had a lot of goals. And we did sort of good. Reasonably good. We're always improving. But as we were developing the package, we also developed new goals. There's always more to do. So we got four more. We wanted to make, well, let's go one by one through them. We wanted to have the package be useful across many disciplines and use cases. So a lot of work is ensuring that the formatters are really good, that you can format according to your conventions, whatever field you're in. And we looked at lots of tables from all sorts of fields. Like there's just a smattering of the fields there. We tried to see if we can reproduce the tables, whether there's blockers for people in marine biology, for instance. And identifying those, we just made improvements. So we did things like improve fmt_scientific(), add fmt_partsper() (if you know it, you need it), and add stuff for chemistry as well.
And we wanted to make sure that gt worked for users all over the world. So a big focus was ensuring that, you know, numbers, dates and times, even for words, fit the language and the region that you're in or you want to format for. So most formatting functions have a locale option. And we support nearly 600 locales. And we always update our code with CLDR guidance. CLDR is basically just a repository of localization stuff that is constantly updated. It's a Unicode project. We look at that a lot. We try to implement the best stuff from there into our code.
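As a rough illustration of what a locale option on a formatter does, here is a small plain-Python sketch. The separator table is hand-written for three locales and is an assumption of this example; per the talk, gt takes this kind of data from CLDR. The function name format_localized is hypothetical and is not part of gt or Great Tables.

```python
# Hypothetical, hand-written separators for three locales. A real
# implementation would take these from CLDR data rather than hard-code them.
SEPARATORS = {
    "en": {"group": ",", "decimal": "."},
    "de": {"group": ".", "decimal": ","},
    "fr": {"group": "\u202f", "decimal": ","},  # narrow no-break space
}


def format_localized(value: float, decimals: int = 2, locale: str = "en") -> str:
    """Format a number with the grouping and decimal marks of `locale`."""
    seps = SEPARATORS[locale]
    # Start from Python's built-in en-style output, e.g. '1,234,567.89'.
    text = f"{value:,.{decimals}f}"
    # translate() swaps both marks in one pass, so they cannot collide.
    return text.translate(
        str.maketrans({",": seps["group"], ".": seps["decimal"]})
    )


for loc in ("en", "de", "fr"):
    print(loc, format_localized(1234567.891, locale=loc))
```

The same value and the same call produce different strings per locale, which is the behavior the locale option is meant to give.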
And good documentation. That's like super important to us. We're always trying to improve our documentation. For instance, we found it super important to have many examples for every function in the package. And the idea there is if you land on a certain page, you find something useful fast, you know, whether it's like descriptive text or some examples that are close to what you need, you shouldn't have to spend too long to get the table you want. And a big thing is we talked to a lot of people in Pharma, and they have specific needs for building tables. So that's why we added RTF support initially, because they use that a lot for Pharma-specific tables. And we also more recently included functionality for splitting tables across pages for paginated output formats. We're going to continually improve that.
Okay. So looking towards the future, I have a short list of things that we need for a future gt. Big list here. But I'll just go through them one by one. One big thing is like a way to reorder your rows of data within gt so you don't have to do that beforehand, say in dplyr. Another thing is footnotes. They're really good right now. But there's many possibilities for better affixing footnotes where you want them in the table. I mentioned Excel before. It's a popular file format. So that's definitely going to be in the future. And better table splitting. I also alluded to that: a way to split tables with more flexibility. Right now, we don't have much integration with database tables. If you were just to give gt a database table, I'm not sure what would happen. Probably nothing good. So we want to ensure that we can at least take in like, say, a DuckDB table or some other database table and use it as input for gt. Ways to better style text. Right now, if you have a text-heavy table, you really have to do some pre-planning, pre-work to make it look good. So I feel like there's some work that can be done there to make that better. And we need new ways to merge cells. I talked about this with lots of people. Say you have redundant text and you just want to merge it all together and have one label somewhere. You can't do that right now. I feel like that should be a thing. So that's a future thing.
Great Tables for Python
We've also been working pretty hard on the Great Tables Python package. Right now, Python doesn't have a thing that's like gt. So we're like, yeah, let's just port over gt to Python. And we call it Great Tables. So we started the work for that mid-2022. But it really started going earlier this year. And again, the goal is to bring all that gt goodness over to the Python language through that package. And we're in the process of porting all the features while making it fit in well with the Python ecosystem.
So we're working on it. But it's actually really capable today. Tables made with Great Tables look pretty close to the analogous gt tables, with the same code, really. So these two tables are made with pretty much the same code. Of course, one in Great Tables code, one in gt code. They look kind of the same. If you superimpose them, you just, you know, they're like the same thing. So basically, our plan is to port over the most important features first. So, you know, we did that. These tables look pretty good. But there's a lot of sort of secondary features that are not in there yet. But they're coming.
Okay. This is a bit of a short talk. But that's not a bad thing altogether. So this is like a wrap up. I just want to say the future is full of great tables. We have Great Tables and gt. We have these gt-adjacent packages, which are great. A lot of them have gt in the name, which is so wonderful. But they're really good packages that work well off of gt and they're used for, like, pharma and other things. And, again, you deserve more than adequate tables in your life. So try out more and more of these packages. And also, there's a brand new Python version of Reactable, courtesy of Mike Chow, who is probably in the room somewhere. If he is, like, stand up. You're probably already standing up. Okay. But, yeah, check this out. It's not really well publicized. It just got released two days ago. And the best way to search it up is just Reactable-py. And if you know Reactable, it's the same thing, but in Python. And that's amazing. Because Reactable is amazing. Okay. So really good stuff. That is it.
My presentation is online. Find information on GitHub. And thank you.
Q&A
Okay. Thank you so much, Rich. That was a great talk. I appreciate it. Excellent tables in that presentation. I do have a few questions from our virtual audience. Will gt work with Typst?
Right now, it kind of does. There was a presentation this morning showing that if you have an HTML table, either with gt or Great Tables, as long as you're using Quarto, it will translate that HTML table to Typst. But I know other people are asking for native Typst as well. That might be a thing. If more people ask for that, I might do that. Who knows? Three people asked for it. So it seems three. All right. I'll do that.
What slide software do you use? Because the transitions look great. Everybody asks that. Okay. It's Keynote. Yeah, I swear by this. I don't know why. I don't use Quarto slides. That's an endorsement. Just because I'm used to this. It's old habits die hard. Quarto slides is great. Reveal.js. Love it. Don't use it. But it doesn't mean I don't love it.
Good. Does gt work for HTML links and buttons within the cells? Yeah. You can totally do that. We have a function for that: fmt_url(). So you can make links. You can actually make buttons from that function. Or you can just pop in some HTML, like just pass it through, and you got your buttons. Yeah.
So how is your documentation so good? You just put in the time. Half the time I'm putting in tons of docs, like just new examples. Just time. You just got to spend the time on it. That's all. No secret.
Can you talk about why you might not want to include Excel output? You know, is it happening that way to us? Sure. I can talk about that. Well, it's a lot of work. And it might be terrible at the beginning. So that might be bad. Disappointing. But you have to start somewhere, as I said before. And people really want it. So there's got to be an avenue where you can swing into it and it's not too bad. And hopefully people use it. And I also don't really have it scoped in terms of what people really want from that. So just some open questions. So I'm hoping that it becomes clear as work continues on it or starts on it, at least.
In your development, do you ever default to relying on other packages to handle functionality you want? The gtsummary to gt pipeline is useful. gtsummary has some of those functions, cell merging, etc. Is there room to leverage that typical pipeline to prevent needing to reinvent the wheel? Yeah, absolutely. Like gtsummary, for instance, has table conversion functions. So you can go from, you know, one table to another, and it sort of tries to faithfully convert all the stuff that you did into the other table package's table. But also we use Reactable under the hood to make, you know, interactive tables, which is a nice bridge. If you don't use Reactable, you just want to do some gt, but still make a pretty serviceable interactive table. We do leverage that. So that's a good example.
Does this work with Shiny? It does. I've seen it. Yeah. It's better with obviously interactive tables, because like Shiny is just a better environment with interactive things. But I've seen both like just static gt tables and interactive ones.
Can we get a very quick comparison of gt with DT? And what are some of the benefits of gt over DT? Yeah, yeah. For instance, DT has interactive tables first, like first and foremost. And gt has that, but they're a bit slow. They're not really server-side. So you don't really have that. One advantage is, though, you can style it pretty easily. And there's a lot of formatters. So DT doesn't really have that.
Can gt do cubes and hierarchies? No, I'm pretty confidently saying no, because I'm not even sure what that is. I know what that is, but no, it can't do it. Yeah. No cubic tables. Okay.
Okay. Can you use gt as a filter on HTML documents? A filter? I've never heard that. So maybe we can get some clarification. Yeah, get some clarity, because right now I'm like, probably not. Yeah, I wouldn't do it.
Okay. And yeah, I think we're good. Thank you so much, Rich. Thank you.
