Richard Iannone - Great Tables for Everyone | SciPy 2024
Transcript
This transcript was generated automatically and may contain errors.
Thank you. So my talk is Great Tables for Everybody because I'm talking about the package Great Tables. Incidentally, I've got three goals for this talk. Make you like tables by the end of the talk, especially Great Tables tables, the package, the tables you make from Great Tables, and convince you of their goodness in science. We can use Great Tables for Science, we're at SciPy after all, so I want to show you some examples of that.
Okay, to start, why do we make Great Tables? We did it to enable the building of tables, obviously, but it's also focused purely on the display of tables. If you ever had to share a table with somebody, you want to make it look good, you want to make it presentable. So that's the whole idea here. So again, this is not the only approach to table making in Python, but what we have is something that's comprehensive, it's actively developed, and it's really great for science-y tables, as you'll soon see.
And then there's a bit of confusion with tables, right? What are we talking about, really? We can give them other sort of alternate names, like display tables, summary tables, presentation tables. These are just kind of like tables you want to show to people that don't look terrible. That's the idea.
So what I want is... What I want to see in the world is less of the stuff on the left, which is basically just a DataFrame printout in a console, and more of the table on the right, which looks super nice. It has a title, it has nicely formatted pretty much everything. So that's the world I want to live in.
Why not just use a DataFrame or Excel?
So how do we make tables today? Okay. So we could do this. Raw DataFrame, give it to somebody, maybe it's fine, probably it isn't. I don't recommend it. I wouldn't do it. Because you can't even see all the data, for one thing. And it looks kind of like not so great.
Another idea. This is something I'm guilty of. Take your data, dump it to a CSV, and then bring it to Excel, make your table there, and then bring it back to wherever you need to bring it back to. That's okay. But the problem is your reproducible workflow, if you want one, is now broken. So terrible. I don't like it.
What I'm recommending, the whole purpose of this talk, is to use Great Tables, make really great tables entirely in Python. So it's reproducible. Probably less effort once you learn how to use Great Tables. And the tables, they look good. I'll show you more good-looking tables soon.
How Great Tables structures a table
So let me tell you a bit about how Great Tables thinks of a table, how it structures a table. It's basically done with components. So here's the most basic form of a GT table. It's just got column labels, a table body, pretty bog simple. But we have this other feature called a stub. It's a little thing that goes to the left, essentially another column, but it's kind of set off. It's nice. You can put row labels in there, and in the top left, you can put a label on top of all these labels. So it's not always needed, but it's actually quite useful.
Now another thing you can do to add structure to a table is bundle your rows into groups, and those groups can have labels themselves. Nice way to sort of structure your data. We also let you put in a title and a subtitle in the table header. It's great for describing what the table really is. And at the bottom, you can have additional notes in the table footer, kind of source notes, but you can use them for whatever. So additional information about the table.
And another way to structure your data is you can structure your columns as well. Use spanners for that. We see this all the time in tables printed in, say, journals, for instance. So we call those spanner labels. You can have unlimited levels of those above the column labels. So yeah, that's how Great Tables looks at a table, and this is used all throughout the API.
The Great Tables workflow
Okay, so when you make a table with Great Tables, here's a sort of prototypical workflow. You have your table data. So what you want to do is you want to put your data into a form that's reasonably close to the actual output of the table. So we can use all sorts of tools, like Polars and pandas. In this talk, I'll be using mostly Polars. And then we have our GT object. That's when you can add all sorts of stuff, just use the API, make a great-looking table. And you can preview this table inside the notebook. You can sort of iteratively create the table, which is nice. And once you have the table and it looks satisfying to you, you can output it to HTML or save it as an image. You can also output via Quarto, if you've heard of that. And you can use these tables wherever HTML is accepted, essentially. Web apps, notebooks, et cetera.
So we have a lot of datasets inside of Great Tables. They're great for making examples, and they're just great to have. So we have 16 of them, which is a lot. But they're useful, because today I'm going to show you three tables using three different datasets. And because we're making the tables from code, we're benefiting from the reproducible workflow. So a bit of a spoiler. These are the tables I'm going to make. It's kind of good to show you what we're going to end up with. So three datasets, three great-looking tables made with Great Tables.
Example 1: reactions dataset
I'm going to start with the reactions dataset. And here's the final look at the table. And I'm going to show you just some of the features that I want inside the table. So a title. That's what I want. A column spanner. That's going to span across three columns. And then nice column labels. I want them to be nice, not like the original labels from the DataFrame itself. You know, sort of dressed-up column labels. And then format. We're going to format the values so they're in scientific notation. Pretty cool. And then stylistically, we're going to change the font from the default font. And we're going to apply a table theme so it looks sort of, like, all dressed up.
So what I have... I'm going to show you lots of code in this talk for the three different tables. This is some Polars code. And we don't have to look really closely at this. But one takeaway is... One good thing is, like, we use Polars selectors here. And also within the Great Tables API. Because it allows you to select columns with these simple selectors, like starts_with, ends_with. Things like that. So we're going to import that. And that's useful in Great Tables.
So the first thing we do is use the GT class. And now we have our HTML table. Very nice. We have a stub as well. Then we're going to add a title. So we do that with tab_header. That's the method. So just a bunch of methods is what we're using. And then very simple arguments to build up the table. So there's the title. Next, we're going to add a spanner label above four columns. And see here we're using, in columns, cs.ends_with. We're using some Polars inside of our Great Tables API. Super cool. And a really cool thing here for science is that between these curly braces, we have a special notation for defining units. So it's nice. You can define things which are, you know, otherwise a little bit difficult to define, like superscripts, essentially.
And another thing we do is we're going to format some values. We can format compounds, like chemical formulas, so they look really nice. That's fmt_units. And all those values in scientific notation, that's fmt_scientific. And then there's missing values. There was a bunch of Nones in the table. Doesn't look really great for presentation. So we're going to substitute those with just em dashes with sub_missing. And then finally, we had a column which had nothing but missing values. That's pretty, you know, not really useful. So we're going to hide that column from display with cols_hide.
Then we're changing our column labels. I recommend doing this every single time. Just changing your column labels from the default to something nice. That's what we did there. And then we're using opt_stylize to apply a theme. Then finally, we're doing one more thing. We're changing the padding horizontally. So there's more gutter space between the different columns. So it presents better. We're changing the default font to a different font using a stack. Which just means a theme, a font theme, essentially. Here's our table. Looks better than the original DataFrame. I would present this somewhere.
Formatting methods overview
So I'm going to take a little segue here. Let's talk about formatting methods. We have lots of them. I just want to show you a few examples of how they're used. Which methods we have. Here's an unformatted column. If you were to use fmt_number, we can get things like, right off the bat, grouping separators and fixed decimal places. And that's without using any options. If you just want integers, you can use fmt_integer. If you want scientific values, just like you've seen, fmt_scientific is your friend. If you want percentage values, you can use fmt_percent. Currency values. Great Tables knows all about currencies. You just give it a currency code. It knows what symbol to use and how many decimal places past, you know, like the decimal mark. Really awesome. There's some specialized ones, like fmt_bytes, in case you need it. And each of these methods has an argument called pattern. It lets you decorate the formatted values with any sort of literal text around those formatted values.
Example 2: nuclides dataset
Dataset number two. Example number two. Using the nuclides dataset. So here's my Polars code. Just to get a huge table into a table that's only 14 rows and four columns. I won't go through this, obviously. But there's the code. This works. So we use GT. That's how you start off any sort of Great Tables table. And we have this table. We're off to a good start. Adding a title, as before. Very useful in tables. This is adding a stubhead label in the top left. We're just adding isotope. And then we're formatting number values. And also some values in scientific notation. So you have a few arguments there. And you can actually scale values and choose a number of decimal places.
And we have missing values. But in this case, we're gonna do a targeted selection of two columns with two sub_missing calls. It's only because we want the replacement value to be a little bit different in each one. Okay. Finally, we have another method called data_color. In this case, we have categoricals down that column, decay_1. And all it does is sort of, like, use those colors in the palette. And the first seen value just gets that color and so forth. Finally, we're using cols_label to make the labels look presentable. And we're actually doing a little bit of alignment. So that decay_1 column is center aligned. Because I think it looks better.
Finally, we're doing some final tweaks with the opt methods. We're changing the alignment of the table header to the left. And we're also changing the vertical padding so the table is not so tall but a little bit scrunched up. There's our table. Looks really, really good. I would present this.
Another segue. I'm gonna talk about another method, data_color. Heat maps and tables. This is a table made with Great Tables. And the cool thing about it is it has a heat map, courtesy of data_color. So we colorize data by value. And the cool thing about that is we get to emphasize differences in values. It's nice. And we get to reveal some trends in the data just from doing that. So if you look at, you know, one row, you can compare across measures without even looking at the values. You just look at the sort of color. And same with columns. You just look up and down. You sort of get a sense of, like, you know, what the scale is just from the colors changing.
Example 3: Gibraltar weather dataset
Finally, third dataset. Third example. Gibraltar. Anybody ever visited Gibraltar? One person. Whoa. Nice. So basically this is a weather dataset. And what I've done is I used some Polars code. Probably pretty poorly. But I did my best. And what I've done is I made list columns with temperature and humidity. Because the idea there is I want to show a lot of, like, basically all the temperature data per day. So I've got an aggregate of every day. There's some temperature data. There's some humidity data. And we're gonna make a table out of this.
So this looks bad. But it's gonna get better. Trust me. So I added a header with a title and a subtitle this time. And you might see that, you know, like, the list columns, they look fully printed out. It's gonna look better. Because what I'm doing is I'm gonna use a method called fmt_nanoplot. And what it does is it actually makes little plots inside of your table. And these are, like, really nice fun plots that you can customize. And they pack a lot of data into a table. And they fit nicely inside the table cells. They're very customizable. So what we've done here is we've used the nanoplot_options helper and sort of saved that to nano_opts. That's what you see on the right. It just means that we can customize these individually.
And then one final thing I'm gonna do for formatting is format the dates so they look kind of nice. I'm using fmt_date to do that. And it has a bunch of, like, keywords where you can just really simply format the date from some presets. No muss, no fuss. That sort of thing. cols_label. We're doing a cool thing where we actually make it so you get degrees Celsius there with a little bit of curly brace code. Because you can get symbols from that. Otherwise it's a little bit hard to do. And there's our table. So a lot of data in a table. And it's a really great summary of weather.
Nanoplots
Third and final segue about a method. This one's fmt_nanoplot. Nanoplots were actually inspired by sparklines, which you may have heard of. They were popularized by Tufte in his writings. And there's an implementation, of course, in Excel. And people really love that feature a lot. And so it seemed like a cool thing to add to Great Tables. So here's another table. Using a dataset that's actually in Great Tables. And it basically has nanoplots in it. And the cool thing about nanoplots is that you can see right away for this patient, the white blood cell count increased beyond the normal range. And that's super important. If you just had numbers, the time to that insight would probably be a little bit longer. So this is really good to have. And it's also really cool. Because you can actually interactively inspect exact values. So features of tables, like having precisely formatted values, are there inside nanoplots. You just have to hover over the values. So that's really super, super cool. And actually kind of fun. And did I mention that these are all super customizable? You can have bars. You can have just lines like this. It's really quite nice.
And nanoplots, of course, you can probably think of a million ways to use them. Here's just three more examples. Stock prices. Weather data. Not too dissimilar from what you've already seen. And some sales data based on pizza sales.
Wrapping up
Summing up, you can make really nice, beautiful, I'd say, publication quality tables with Great Tables. I think tables are cool. I really feel like they're gonna get their day one day. And I know that people love good looking ones. When I see a table with data I care about, I'll pore over that table for quite a while. It's really amazing. And you don't have to compromise on reproducibility when you use Great Tables. You can basically use Great Tables in the visualization step. You don't have to dump out your data. You can just use it all in one go. And your reports and such will probably be greatly improved because of that.
So to get started with Great Tables, it's on PyPI. So you can just pip install great_tables. And there's a great project website we have at posit-dev. I'm from Posit, by the way. posit-dev on GitHub, great-tables. Just search it up. We have tons of examples there in the examples gallery. And we have a blog as well, which has a few articles, even some new ones. Okay. Thank you so much.
Q&A
I think in your first example you showed some code where you could use various ASCII characters to modify the formatting of some labels, like subscripts, superscripts, that kind of stuff. Do you also support inserting LaTeX for equations and Greek characters?
We have Greek characters actually within that type of formatting. LaTeX equations, that's coming soon. Or later, I should say. And LaTeX as an output format for tables is also forthcoming. So right now, if you learn to use that sort of shorthand, you can actually insert symbols by surrounding a keyword with colons. There's quite a range of, like, the usual symbols you would expect in units.
Does any of this work in the terminal?
Yes. You can run it in the terminal. And we now have a show method. So it will actually emit the table into a browser, yeah. So you won't just get the object printed out.
What I mean is, can you render any of this fancy stuff?
Oh, I see. Like, terminal rendering. Yeah, no. Rich does that. Like, the Rich package is adept at that sort of thing. We don't really have that. We might have that in the future if there's sort of lots of demand for it.
I was curious about the nanoplot function. Are you using another library to handle that, or did you write your own plotting handler?
I just did it myself. Basically it's SVG creation. So essentially, yeah, it's all code that's within Great Tables. We may strip that out and make a separate library, because it feels kind of strange that it's just, like, trapped inside there. It's probably useful. But right now it's just, like, our own sort of code.
I'm curious about the use of Polars selectors. Does that work for pandas inputs as well, or do Polars selectors only work on Polars inputs?
Yeah. You can have... Of course. This can ingest pandas DataFrames, but Polars selectors do not work with pandas. Essentially we have a... This is before Narwhals came out. We sort of made our own abstract backend. And so we accept some things, you know, from... Only Polars DataFrames will accept Polars API stuff. With pandas, you'd have to use, like, a lambda, or just the column names, or go without, essentially.
We had an online question as well. They were asking... Since you had a lot of colors shown in your tables and such, do you have color-blind friendly palettes?
Yeah. In data_color, we do have a bunch of keywords that allow you to select palettes, and we have a few color-blind ones from Viridis, or Cividis, I should say, which is a branch of Viridis.
So you mentioned that you generate HTML. I'm curious, with the nanoplots, are you including the data as well in that HTML plot? Is it, like, base64 encoded and you render it?
Essentially, it's all HTML. It's pure HTML. And that's it. But there's no base64-encoded stuff in there at all. Oh, the plot is an SVG, which is totally acceptable. Just a tag inside. Got it. Yeah. That's how it works.
