Mine Cetinkaya-Rundel - Reproducible, dynamic, and elegant books with Quarto
Transcript
This transcript was generated automatically and may contain errors.
Today I want to talk about reproducible, dynamic, and elegant books with Quarto. And if you've seen me give talks before, particularly on the topic of Quarto, you know that when I'm trying to cover a tool, I do my best to convey the best practices and sort of do a comprehensive overview.
Today's talk is going to be a little bit different. It's going to be about things that have worked for me in my adventures with writing books with Quarto. So some of these practices could be best, some of them might be good, and some might just be good enough. And since we're keeping things personal, I thought I'd bring my favorite book along with me. Some of you with a keen eye might already realize, but it's The Little Prince. I love the story. Some of my favorite quotes come from it. But today I'm not going to be talking about the story, not the content of the book, but instead the making of the book.
So my favorite types of books look like this. And for those of you in the back, they look like this. These pop-up books. I've loved these since I was a child. I had zero ambitions of ever being an author, to be perfectly honest. I still don't know how I've stumbled into this. But if I could, I would actually quit what I do and make pop-up books.
Well, I'm not going to do that, but I'll talk about the making of Quarto books today. And I'm going to talk about three books that I've worked on over the last year. The first one is Introduction to Modern Statistics. This is an open source introductory statistics book. And the characteristic of this book related to this talk is that it uses code, but it doesn't teach or show code. So figures and tables are made with code in a Quarto document, but the purpose of this particular textbook is not to teach code.
The second one might be one that many, many of you are familiar with, R for Data Science. I've had the pleasure of joining the team to write its second edition. And as you probably all know, it does teach code. So we have code and we display code. And the last one is a work in progress. And it's a little meta. It's a book about Quarto written in Quarto.
Introduction to Modern Statistics: multiple outputs
So let's start with IMS, as I call it, Introduction to Modern Statistics. And there are two things that I want to highlight about this book, multiple outputs and accessibility checks. So let's start with multiple outputs. Ultimately, I want a web book, in HTML, and I also want a PDF. And this seems like a no-brainer working with Quarto. We set format HTML, format PDF, and we should be on our way. But if you take a careful look at these two pages, these are not default themed pages.
There are these boxes, for example, at the beginning of each chapter that sort of give us a summary. There's an icon, there's coloring, so on and so forth. And our goal was to get them to look as close to each other as possible. How do we achieve this? Well, it's one source. So we put it in a fenced div, and we give it a class. So in this case, that's the chapter intro class. And then for HTML, with the help of meticulous styling in an SCSS file, I write a class for chapter intro and basically give it some things. Like, I want some padding, I want some borders, I want some background color, I want an icon.
That works nicely for HTML, but how do we get it to work for PDF? Well, maybe some of you will recognize this: with a lot of ambition, coffee, and frustration, you can make it happen, and you can get close enough. What you're seeing here, even if you've never had to write LaTeX code in this way (and if so, I envy you), is that we're defining environments, basically the equivalent of divs. You can see that we're using the same name, you can see that we are referring to the same icons, and ultimately we try to get them to look as alike as possible. And then Quarto allows us to define a theme where I pull in this SCSS file for HTML, and, for PDF, to include in the header the style file that I wrote.
Now, what is HTML without dark mode, right? So we have to have dark mode, obviously, because my students are really willing to study all night long, so they need the dark mode. So how do we do this? Another SCSS file where we make minor tweaks and make sure that things are still legible, and we pull that into our Quarto YAML file as well. This all sounds like magic, but it's not all magic.
Then comes the PDF, which requires page breaks. And if you've ever had to do LaTeX documents that are multi-page, you have to painstakingly add these \clearpage or \newpage type commands over and over, wherever they go. But the nice thing is you can. The word I've recently heard used was litter: you can litter your QMD file with these tags, because HTML processing will completely ignore them. But when you're rendering to PDF, they will get picked up, one after another, and ultimately you actually get to have the page breaks where you want them.
Now, let's bring back the magic. The fact that a QMD to HTML process will ignore things that it has no business with means that I can continue to litter my QMD document with index tags. So here, for example, I'm defining a term that in the HTML format I just want bolded, but when I have a PDF output, so for the printed book, I actually want to make an index, so I tag that. Tagging the things that you want in your index throughout is a manual process that you as the author are going to want to do. In addition to that, you write a TeX file that is going to be included at the end of the processing of your PDF and that basically says, in the back matter, print the index, and your Quarto YAML gives you a place to declare that as well. It says, let's include that after the body and create our index. Then, when you render to PDF, Quarto will actually take a little bit longer and render things twice to get the page references correct for all your index calls.
All right, well, that was all nice, but what I am looking forward to, and I have the, you know, luxury of giving this talk towards the end of the conference, is Typst. So today, it's one source, two style files, and two outputs. Hopefully, next time I'm touching this book, I'll be able to get away with one source, one style file, and two outputs. We'll see. If we can get there, great, and if not, that's also fine.
But another thing that working with these multiple outputs affects, in addition to the styling, is your code output. So today, a lot of the code in the book looks like this. Let's do some data processing to create a table and then make a table out of it that will be printed both in HTML and in PDF. And the kableExtra package is wonderful for this, I've found, but I have to declare things, some for HTML and some for PDF. So each of the code cells where I am creating tables has a lot of lines of code. Why is that annoying? Because sometimes I change my mind, and changing everything can be quite tedious, since I have to make sure I catch all of the occurrences.
Accessibility checks
All right, the next thing that I want to mention in the context of this textbook is some accessibility features. Now, Quarto allows you to write, for example, alternative text for your figures, and the best practice is that as you create your plot, you write the alternative text at the time of writing the caption or whatever else. Then, when you are actually reading this book in the browser, the alternative text is available to the screen reader, and it reads it out loud for you. Unfortunately, I hate to admit, particularly because this was a second edition of a book, at times I've gotten sloppy and maybe forgotten to add these in a few places.
Then I look to see how many places, and how do I make sure that all of the figures have alternative text? Well, it looks like every single plot is created with ggplot in this book. So if I search on GitHub, there are 46 files that have ggplot calls in them, and actually there are about 419 references to ggplot. So I'm definitely not going to manually go through each and every one of these cells.
So what I'm about to show you is not a package, is not even necessarily a single function, but a snippet of code that you can make use of to sort of parse your QMD file. The package I'm going to leverage here is parsermd. First I read my QMD file, and note that I've deliberately broken one of the QMD files from my chapters by removing one of the alternative texts, so that we can get an alert when something is missing. I pipe that into the parseQMD function, which basically splits it into the markdown text and the code. And then I'm looking for anything that is a code chunk that matches ggplot in its contents and does not have the option fig-alt. Then I can ask it to give me the label for that. I do meticulously label all of my code cells, so this is generally good enough for me to go back and add the missing text. But if you haven't done so, you can also say give me the contents of that code cell, and then you can just sort of control-F in your document to find them. I think in a future iteration, I'd maybe like to turn all of this into a function and maybe even have it be automatically checked every time I push my files to GitHub. But at least the tooling is there for you to be able to check for these things.
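As a rough illustration of the check described above, here is a minimal sketch in Python using only the standard library. It is not the parsermd-based R snippet from the talk: the function name `cells_missing_alt_text` and the regular expressions are assumptions made for this example, and it only understands fenced cells of the form `{r}` with hash-pipe (`#|`) options.

```python
import re

# An executable cell: an opening fence with {engine ...}, a body, a closing fence.
CELL_RE = re.compile(
    r"^```\{(?P<header>[^}\n]*)\}[ \t]*\n(?P<body>.*?)^```[ \t]*$",
    re.MULTILINE | re.DOTALL,
)
# Hash-pipe option lines such as "#| label: fig-scatter".
OPTION_RE = re.compile(r"^#\|\s*(?P<key>[\w.-]+)\s*:\s*(?P<value>.*)$", re.MULTILINE)


def cells_missing_alt_text(qmd_text):
    """Return labels of cells that call ggplot() but set no alternative text."""
    missing = []
    for index, match in enumerate(CELL_RE.finditer(qmd_text), start=1):
        body = match.group("body")
        if "ggplot(" not in body:
            continue  # not a plotting cell, nothing to check
        options = {
            m.group("key"): m.group("value").strip()
            for m in OPTION_RE.finditer(body)
        }
        # Accept either the hash-pipe option or a knitr-style header option.
        if options.get("fig-alt") or "fig.alt" in match.group("header"):
            continue
        # Fall back to the cell's position when it has no label.
        missing.append(options.get("label", f"unlabeled cell {index}"))
    return missing


if __name__ == "__main__":
    fence = "`" * 3  # built indirectly so the demo can live inside a fenced block
    demo = "\n".join([
        f"{fence}{{r}}",
        "#| label: fig-missing",
        "ggplot(df, aes(x)) + geom_histogram()",
        fence,
    ])
    print(cells_missing_alt_text(demo))  # prints ['fig-missing']
```

A plain-text scan like this is cruder than a real parser (for example, it would also match a ggplot call inside a comment), but it is enough to produce a list of labels to revisit, and it could run as a step in a scheduled or on-push check.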
R for Data Science: leveraging R and GitHub Actions
All right, let's move on to the second book, R for Data Science, and we're going to talk about two things here. We'll continue to talk about how we can leverage R as we write Quarto books, and also talk about GitHub Actions a little bit. So when you want to solve a problem in a new tool, you can try to learn everything about that new tool, or you can say, do I have anything that I brought with me that I can maybe use? So at the beginning of each of our chapters in R for Data Science, we load a file called common.R, or rather _common.R. What are the types of things we do in that? We set a seed for every single one of the chapters, a very special day of the year for those of you who know.
We set code chunk options that are going to apply to every single one of our code chunks. We can override them on a per chunk basis, but we want something consistent. We set some R options, and we also set a theme. My collaborator loves theme_gray, even though I'm more of a theme_minimal gal, so we stuck with theme_gray. And we set a base size of 12, so all of the plots look consistent with each other. And we can keep going with this. These are very obvious things to put in a common R type file and to start every one of your book chapters with.
But another thing we had to deal with is that this was a living and breathing document, and a lot of people use R for Data Science, and we didn't want to keep the text away from them as we were working on the second edition. In fact, many times we asked for feedback as we were writing the text. So how do we do that? We can share it with the world, but we also need to alert readers that things might be in shambles, particularly for people like me who are educators. We don't want people sending their students to a page and then the next day, like, we've completely altered that page. So we wanted to have some sort of signaling.
And at the time, either there wasn't or we couldn't think of a way to do this purely with Quarto, so what can you do? Use your R function writing skills as an escape hatch, and to avoid duplication. So write a function that says, we'll give you a status, and based on that status, choose the callout box type and write some text for us. And at the beginning of each one of our chapters, we set a status, and as we went through drafting, polishing, and complete, we updated these so our readers could know. Well, this was back in the day. If you're doing this today, Quarto actually does have a feature for this, the announcement banner that you can add to your Quarto pages. So you could choose the approach where you have, perhaps, more fine-tuned control of what that message looks like, or you can use an out-of-the-box option with Quarto.
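The status helper described above was an R function in the book. To make the idea concrete, here is a minimal sketch of the same pattern in Python; the function name `status_callout`, the status-to-callout mapping, and the message wording are all hypothetical and not taken from R for Data Science.

```python
# Hypothetical mapping from a chapter status to (callout type, message).
STATUS_CALLOUTS = {
    "drafting": (
        "important",
        "This chapter is an early draft and may change substantially.",
    ),
    "polishing": (
        "note",
        "This chapter should be readable but is still being polished.",
    ),
    "complete": (
        "note",
        "This chapter is largely complete and only needs proofreading.",
    ),
}


def status_callout(status):
    """Return the markdown for a Quarto callout block matching the status."""
    try:
        callout_type, message = STATUS_CALLOUTS[status]
    except KeyError:
        # Fail loudly on typos so a chapter never ships without its notice.
        raise ValueError(f"unknown status: {status!r}") from None
    return f"::: {{.callout-{callout_type}}}\n{message}\n:::\n"


if __name__ == "__main__":
    print(status_callout("drafting"))
```

In a Quarto document, a cell that prints this string would also need to emit its output as raw markdown (the `output: asis` cell option) for the callout to render. Keeping the wording in one place is the point of the pattern: changing a message means editing one mapping rather than every chapter.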
All right. Another thing is that you want to be able to set it and forget it when you're writing a book. So how do we keep things in check daily? Well, I live a life where sometimes on a Sunday morning at 10:18 a.m., I get emails like this, something failed. And I'm like, okay, we'll get to you later. So how is this happening? So one of the things we do with R for Data Science is we have completely avoided the freeze option for Quarto, which means that every day when we run the checks, every single line of code reruns. This may very well be overkill for many projects. In fact, for many, many projects that I work on, it totally is. But we wanted to make sure that, particularly at a time when we were making changes to R for Data Science while the packages it writes about were being updated as well, at all times we could check that the latest version of each package and the code were in sync with each other. So every night at 11 p.m., R for Data Science rebuilds. And some nights it's a nice green check. And some nights we get an email. And then we look to see what can be done to fix it. So using GitHub Actions, you can do these daily checks.
You may have heard this quote before. Whenever faced with a problem, some people say, let's use regular expressions. Now they have two problems. There's a nice Hacker News-like thread about who the source of this quote is. And here's one from me. Let's use GitHub Actions. Now you have so many more problems. Well, don't reinvent the wheel. You can borrow from other people's GitHub Actions, and there's also a Quarto Actions repo that you can grab actions from to start with.
Quarto, the Definitive Guide: multiple languages
And finally, briefly, let's talk about Quarto, the Definitive Guide, which is a work in progress. So here we're using multiple languages and multiple computing environments. So we want two languages in a single QMD file. And we want each of them to be executed with its own engine. So not using reticulate basically in between them. How do we do this? We're actually using the embed option, which you're going to see mentioned a lot in Quarto manuscript-type projects, to create separate notebooks that we're going to embed individual cells from. So here we have a tab set with R and Python. And we're embedding from two notebooks, an R notebook and a Python notebook, each of which has a code cell called plot. And now you can see that as a result, we have been able to sort of go between the two. And if you recognize outputs of the knitr and Jupyter engines, you'll see that these look like how you would expect them to look.
And here I'm going to say, let's do use freeze, because we are managing multiple environments. And the last thing you want to do when you pour yourself a nice cup of coffee and say, I'm going to write some today, is to, like, manage a virtual environment. So if your collaborator has done a bit of work and frozen their computations, let's, you know, benefit from that and use the frozen computations and work on your chapters. So I'm going to say you can safeguard your sanity with freeze as well.
Looking ahead: interactive books
All right. To wrap things up, I've tried to talk about making books. And I hope that you will find that many of these books are pretty, but I also want to think about making books that are functional. What do I mean by that? When you have pop-up books, sometimes you have things like this, where you're interacting with a book. And sometimes it is just to keep the child busy. And sometimes that interaction is really to tell the story. So these little tags allow you to discover more about the story. So what I'm looking forward to in my next Quarto book journey is this project that George talked about earlier. We can now have code cells that are actually interactive in our books. So hopefully in the next editions of these books that display the code, that's what you're going to be seeing. Thank you so much for listening. The slides are at this link. And the repo has the code for the slides as well as some of the code snippets that I've shown.
Q&A: accessibility tools
We have time for just one question while the next speaker comes up.
You talked a little bit about accessibility in your talk. Are there any automated tools you'd recommend for assessing accessibility?
Yeah, so you can automate some of these checks. So the type of check that I have done, you certainly can automate that. There are great R packages even that will check for the accessibility of the color choices of your plots, for example. So that would be another one that I would recommend. I'll also do a plug for this open source tool that I encountered a couple of years ago and have been using, called Sim Daltonism. It's not automated, but it will put this, like, filter on your screen and you can check for color accessibility sort of on the fly. And as someone who's not necessarily an expert in either accessibility or web design, but who cares about it, I like these interactive tools that, you know, sort of teach me things along the way as well.
