Joshua Cook - Quarto: A Multifaceted Publishing Powerhouse for Medical Researchers

video
Oct 31, 2024
19:23

Transcript

This transcript was generated automatically and may contain errors.

Give me a hand in welcoming Joshua Cook, who will be talking about Quarto as a multifaceted publishing powerhouse.

Thank you so much for having me. I look up to a lot of the people in this room, so it's a real honor to present today. My name is Joshua Cook. I'm from the University of West Florida. Here's a little bit about me.

I recently graduated this past May with my master's in data science. Before that, I was in clinical research, and then before that, I was in biomed. Right now, I'm an adjunct professor at the university. I teach anatomy and physiology, and I work with a lot of Ph.D. researchers and a lot of physicians. I was a research quality analyst and a clinical research coordinator for a year before that. Actually, for three years before that.

Medical research. I've heard a lot about pharma. I want to insert my background a little bit here. I have never worked in pharma. I've never done clinical trials data. I'd like to. I'd like to learn it and get into it, but my experience has mostly been with phase one clinical trials or benchtop studies. That's mostly in academia. Sometimes you'll get it done in healthcare and industry.

When I talk about medical researchers, I'm talking about M.D.'s, D.O.'s, nurse practitioners, occasionally Ph.D.'s, or if you're responsible for preparing documents for somebody in this category.

The medical research dissemination process

The general process. You start with study planning. You initiate your study. There's some data collection, data analysis, and then dissemination. Pretty similar to clinical research, except that we don't have as strict guidelines in terms of our data structure, and then these are usually, like I said, in academia and stuff like that.

The publication needs are a little bit different. Our main goal is not a regulatory submission. I've only ever worked on one or two of those briefly before. My main goal in my roles has been statistical reports, and sometimes there'll be a simulation study before the report. Interim reports while a study's going on. Interim presentations, if a doctor or a Ph.D. is going to give a presentation at a conference and the study's still ongoing, and then when it's done, a final report, a final presentation for them to go present again.

They usually want the manuscript submitted as soon as possible. They want a blurb for their website. They want it on their blog, and then there's usually this new idea around patient-participant material. The patients themselves are starting to ask for their results back, so whenever you have people that are taking part in a clinical trial, they've been in it for six months, they want to know what their labs have been looking like. Well, it's very, very hard to extract that if you're entering data into an EMR or some dense medical system, and so being able to very quickly extract that information for that patient and then allow the doctor to have a presentation on hand to talk with them at their next visit's really helpful. Finally, sometimes there will be a regulatory submission.

Just in summary, this is a lot of work to keep up with for medical researchers, people who are not used to coding, people who don't do submissions, especially if changes are made. Many, many times a doctor will look at my statistical report and say, it's great, and then they're getting ready for their presentation, they're actually out there, they're not at the institution anymore, and they're texting me, I need you to update figure one as quickly as possible. Well, I do that for them, but then every single document that I made before that's out of date.

So that's really what my presentation is focused on, is how do we use Quarto to make it to where when I update one document for the physician or the researcher, everything else updates with it, we have a change log, we have something to say what it looked like before.

So here's the traditional method. You draft an initial document, everybody approves, all documents are generated, I've got a report, a presentation, et cetera. Then I get a random edit request, I update the report, and then I frantically copy-paste everything that I updated into all the subsequent versions. So yes, the doctor wanted figure one to be different, but now I've got to change it everywhere else. During that time, something gets missed, an error is made, or some more edits have been requested since the first request. Then the author, the medical researcher will go out and present, and they accidentally present something that's not true, or some outdated piece of information. They get frustrated with me, the stakeholders are frustrated, and then we continuously repeat this process. It looks like some of you have experienced this before, so that's good.

Quarto projects overview

This is where Quarto comes in handy. It's the next generation of R Markdown. I won't go too deep into what Quarto is; I think everybody is aware at this point. But Quarto projects allow you to create various project types from a single source. And when I talk about project types, I'm talking about an organization of files. So we're familiar with reports, and manuscripts, and presentations. This could also be a Quarto website, or Quarto blog. And then when I talk about project formats, I'm talking about how it's going to be output in the end: an HTML, a PDF, or a .docx.

So a general setup with Quarto, as with any project in RStudio, you need a primary folder. I'll go over my folder structure in just a minute. These Quarto projects are the directories, they're the central location for all of your folders, your images, your scripts, and then any output that you're going to generate. This is a key concept for this presentation. We're going to do three different projects today in this talk. We're going to generate a report, we're going to generate a manuscript, and we're going to generate a presentation with templates, all from the same source code.

So Quarto projects, they have this shared YAML for metadata. Each of the project types typically has a different setup for its YAML. I've got examples in here. But when you're in RStudio, all you have to do is click the create project button. It gives you an option to include a directory name and change where your working directory is going to be. And then finally, in project type, you can start setting these things up. So Quarto website, blog, book, manuscript, they're all listed there. It's very easy to click them. There are also terminal commands to set these up. It's pretty straightforward.

Whenever you start a new Quarto document, it's going to give you a template file, usually index.qmd. It comes with some markdown code, and there's an example below. It also includes a very basic YAML header. The title is the only thing in this one, and then an example R code chunk. You can change the language if you want to. If you don't want to use R and you want to use Python, that's supported. So is Julia and a couple other languages. Toolbar gives you very easy access to editing tools without knowing markdown syntax. And this is just the basic layout. So you can continue to add chunks and markdown content as you need.

The YAML header is very, very important. This is a versatile human-readable data serialization language. This is where you configure how your document is going to populate. So you'll see throughout my examples today, a lot of front matter. The title, the subtitle, date, author, et cetera. The important sections are format, which designates the output format. This is also where I designate all of my templates. So if you're interested in, oh, my author usually generates manuscripts for a very specific journal. I use the Public Library of Science in my example. You can download that template, and then it will automatically output to that when you render the PDF. There's a bunch of YAML options. I included a link to go explore more.
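As a rough illustration of the front matter described above, a YAML header might look something like the following. The title, subtitle, and author values are placeholders, not the ones from the speaker's slides; `format` is the key that selects the output.

```yaml
---
title: "Example Study Report"        # placeholder title
subtitle: "Interim statistical report"  # placeholder subtitle
author: "Author Name"                # placeholder author
date: today                          # Quarto fills in the render date
format: html                         # output format; a journal template name would go here instead
---
```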

Case example: file structure and the report

But here's my case example for medical researchers. So this is the file structure. My working directory is called PositConf. And within PositConf, I have these folders: data, analysis, images, manuscript, presentation. And then within the working directory itself, I have my report, R project, my index.qmd, and my Quarto YAML. So the report itself is going to be in the working directory. Everything that you're about to see is within this index.qmd down at the bottom.

All right, so in my report, it's very, very simple. I have this front matter where I'm supplying a title, a logo, the author names, and the date. This section is what you probably want to pay attention to. This is telling it that it should be output as an HTML document, don't mess with my tables, and embed all the resources so it doesn't have any dependencies. And then you can hit Render, and it will populate with all the front matter that I specified. I also added in a code chunk that is not being executed, but my lead statistician may want to review, so I included my code. And then I included this figure. This figure is an analysis of all the missing data in our dataset.
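A minimal sketch of what the format section of such a report could look like is below. The slide itself is not reproduced in the transcript, so the exact keys are assumptions: `embed-resources` is Quarto's option for producing an HTML file with no external dependencies, and `html-table-processing: none` is one option that tells Quarto to leave HTML tables alone, which may or may not be the setting the speaker used.

```yaml
format:
  html:
    embed-resources: true          # bundle images, CSS, and JS into a single .html file
    html-table-processing: none    # assumption: leave HTML tables as the code produced them
```

A code chunk that is shown for review but not run is typically marked with the cell options `#| echo: true` and `#| eval: false`.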

Shortcodes for modularity

Now, importantly, neither of these things are coded directly within my report. You'll see that I use what's called shortcodes. So in a nutshell, the goal of this is to maintain quality and consistency through modularity. Quarto shortcodes are special markdown directives that generate our content from another file. So if you look back at my file directory, all of my actual QMD files are in this analysis folder. I'm not doing anything within the Quarto markdown. It's not hosted locally. It's just pulling from this QMD. All the data is in one separate file. It's not being generated and brought into R each time I want to do a report.

And then down here, you can see there's two types of shortcodes I use. The first one is embed, which directly embeds that Quarto markdown file or Jupyter Notebook into my report, my presentation, my manuscript. It's just going to run the code that's there. And if you're not careful, it will copy in everything that's in that QMD. So everything that's generated will be put into your document. You can specify what you want, though, using hashtags. And then include is a direct copy-paste. So it's not embedding the QMD. It is just grabbing the code and then running it in your main file.

The example of the include shortcode is on line one. I didn't specify any particular figure, so it's going to throw everything that's in my data processing QMD, everything that's being output into my documents. The second one is an example of embed. And you'll see that there's a hashtag. That's where I specified, I only want figure one. There's a bunch of other figures in there that I was doing as part of EDA, but this saves time and processing and makes sure you only grab what you want.
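The two shortcodes described above could be written roughly as follows. The file name `analysis/data-processing.qmd` and the label `fig-missing` are hypothetical stand-ins for the names on the slide, and the two lines are shown together only for comparison; in practice you would usually pick one or the other for a given file.

```markdown
<!-- index.qmd: file and label names below are hypothetical -->

<!-- include: pastes the whole source file in, as if it were typed here -->
{{< include analysis/data-processing.qmd >}}

<!-- embed: pulls in only the output of the cell labelled fig-missing -->
{{< embed analysis/data-processing.qmd#fig-missing >}}
```

For the embed to find the figure, the chunk in the analysis file needs a matching label, for example `#| label: fig-missing`.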

Manuscript and presentation outputs

So here's the YAML for my manuscript that's in the manuscript folder. It's a little bit different, pretty much the same front matter that I copy-pasted. But this time, it is a PDF. Most manuscripts want a PDF. And then I specified a template. So please use the Public Library of Science, specifically for Global Public Health. This was a public health study. That way, any specific formatting that needs to be done is automatically going to be applied. Here's the manuscript, again, with all the front matter. And importantly, it's that same exact figure. Figure's not being generated within the manuscript QMD itself. I'm doing it using shortcodes.
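A hedged sketch of the manuscript's format line is below. It assumes the PLOS template comes from the `quarto-journals/plos` extension, which provides a `plos-pdf` format; the speaker's actual YAML, including any journal-specific options for Global Public Health, is not shown in the transcript.

```yaml
---
title: "Example Manuscript Title"   # placeholder
format:
  plos-pdf: default                 # assumes the quarto-journals/plos extension is installed in the project
---
```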

So I just want to keep using that idea of never include all of your code in one Quarto markdown, because every time you render, it's got to process. And then if you change one thing there, since it's not being shortcoded in, you have to go change it everywhere else you mention that figure.

And then finally, for the presentation, same thing, same front matter, little bit more complicated YAML header. I had to specify not only a template, which in this case is a reveal.js template called Clean, but also self-contained and preview links. There's a couple different features you can add in that are really similar to PowerPoint. So if your medical researcher is used to PowerPoint or to Keynote, they may want those things in there. And then everything else is front matter.
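The presentation header might be sketched like this. It assumes the Clean template is the `grantmcdermott/quarto-revealjs-clean` extension, whose format name is `clean-revealjs`; the other two keys are standard reveal.js options in Quarto, but the values here are guesses at what the slide showed.

```yaml
format:
  clean-revealjs:            # assumption: the Clean theme extension is installed
    embed-resources: true    # single self-contained HTML file (older option name: self-contained)
    preview-links: true      # open links in a preview overlay inside the slides
```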

So here's the presentation. It's the same exact template that I used for this talk. It's the title, all of the front matter that I mentioned before, the logo, same exact figure being shortcoded in one line of code from my analysis script. Now, importantly, when that medical researcher comes back to me as they're about to present this in five minutes and they say, figure one is wrong, I want you to update the caption. As long as I go do it in my data processing QMD, whenever I go to re-render these documents, that change will be applied everywhere because I used shortcodes.

I've been in the mix where things kind of get messed up, where they have a certain copy that I generated last week, and then I generated this new copy, and everything's just a mess. This saves that, especially if you're using GitHub, because then you can track your changes as everything is taking place.

And then furthermore, researchers don't know how to use reveal.js sometimes. They prefer PowerPoint, Keynote, that's fine. You can just change the format to PPTX, and then it will output those figures, those tables, the inline code that you referenced into a PowerPoint format. That way, the only thing they really have to do is copy that slide from your new output into their existing presentation. That makes it very quick and easy. They don't have to worry about resizing. There's no messy screenshotting of reports, which is what physicians have done with me in the past: they just take a screenshot of what I generated in the updated report, and the whole thing's messed up. But yeah, everything will update for their new presentation.
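As a small sketch of the format change mentioned above, a presentation file can list PowerPoint next to reveal.js so that both are produced from the same source. This is an illustration, not the speaker's file.

```yaml
format:
  revealjs: default   # HTML slides
  pptx: default       # PowerPoint file rendered from the same .qmd
```

With more than one format listed, `quarto render` produces each of them.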

So in this case example, this is really important because all of these documents are frequently used by medical researchers. And you could expand it further to Quarto websites, blogs, or other types of projects where maybe they have a website, where they have a research portfolio, maybe they're applying for grants. I've heard all of those things. You could add that in there as well. As you're updating your one QMD file, every subsequent Quarto output is going to update. If an error is identified or something else, simply altering those files will cue them to refresh.

And I'll show you a setting in a second that you have to put into the Quarto YAML to kind of freeze at that time point. And then anytime there is a change detected, it will re-render that document. If another template is needed, we've all been rejected from journals before. This is probably one of the biggest time-saving tasks that I've had is we'll submit to the Public Library of Science, and they'll reject us. And then we'll say, okay, submit to Nature. Well, then you have to go reformat the references, the figures, the file structure, everything needs to be changed. Instead, why not just enter in one line of code to change the template over to Nature, and then specify that in your YAML header? Everything else will be automatically rearranged. The only catch is some journals require you to submit figures and tables separately. In that case, you will want to disable the generation and then just save them natively to a file.

So even if the researcher prefers traditional PowerPoint or Word, you can still output to those. It just gets a little bit tricky with tracking. I still recommend it to protect against redundancy when updating your analysis code, though. And then, in other words, we are effectively moving beyond copy-paste. We're making it to where they no longer have to take the report and screenshot whatever change I made. It's automatically in all their documents. If they have access to the directory that I'm outputting these documents to, maybe if it's on a shared drive or something like that, I can just say, hey, every document that you need is now in the updated drive. Saves me a ton of work and it's really efficient for submissions.

Advanced features and conclusion

So to wrap up, there's a bunch of advanced features that I didn't want to spend time going into because I could spend a whole lecture on them. But the first one is inline programming. So again, instead of copy-pasting values from tables or from figures, which can change, you know, I go back and update the analysis code, suddenly the mean has changed by two units. Just use inline programming. So that can make sure that no matter what format you're outputting to, that value is being pulled from my actual analysis. It's not being copy-pasted from somebody else.
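A one-line sketch of inline programming in a .qmd file is below. The data frame `trial` and its column `age` are hypothetical and would need to be created in an earlier R chunk of the same document.

```markdown
The mean age at enrollment was `r round(mean(trial$age, na.rm = TRUE), 1)` years.
```

When the analysis changes, the sentence picks up the new value at the next render, whichever output format is used.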

Second one is everything's dynamically updating based on the _quarto.yml file. So under execute, there is a freeze option and typically it's set to true. If you set it to auto, it will automatically re-render files if changes are detected. So let's say you want to render a report and everything's fine. You don't want to keep re-rendering that processing file because it takes time and computational power, especially if nothing's changing. Only change if somebody has gone in there and edited something.
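For reference, a project-level configuration with this setting could be sketched as follows. The `project` type shown is an assumption. Per the Quarto documentation, `freeze: auto` re-executes a document during a project render only when its source file has changed, while `freeze: true` never re-executes it during a project render.

```yaml
# _quarto.yml (project-level configuration)
project:
  type: default    # assumption: a plain project; website, book, or manuscript would go here

execute:
  freeze: auto     # re-run code only for documents whose source has changed
```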

Number three is templates. I used a template today for the manuscript and for the presentation. There's plenty of open-source templates online. So if you have a specific use case or a specific journal that you're always submitting to, I highly recommend using them. I will never personally format by hand for specific journals anymore after figuring this out. Number four is collaboration tools. With a little bit of setup, you can set up Hypothesis, which allows for commenting, notes, and highlights, which is what medical researchers are used to with track changes in Word documents. This works with Zotero libraries for reference tracking, which is great. Zotero is free. It can import references from PubMed and other sources and then input them into your references within Quarto Markdown. There's also some compatibility with EndNote, but you have to do some trickery because it is proprietary software.

Number six, hopefully, this will enhance patient communication. Very quick release of their documents when they're wondering where they're at in a study. It's their data. They have a right to see it, especially if there's nothing stopping them from unblinding or any other concerns in the protocol. And then number seven, hopefully, this will speed up the delivery of treatments because physicians and other medical researchers no longer have to spend time converting my statistical report into their manuscript, into their presentation, into their website.

So, in conclusion, Quarto is a multifaceted publishing powerhouse for medical researchers that allows us to efficiently create these polished formats from a single-source document. There's templates available to support a wide array of submission requirements. There's tools for collaboration. And this will hopefully enhance participant communication and the speed of delivery to patients.

Limitations: the setup and organisation may not be appropriate if your project is small. This is a little bit of heavy lifting at the front end. Tables seem to have a problem with the embed shortcode. Not sure on a fix yet. I just, you know, coded them directly into the Quarto markdown. And then I really want to think about ways to reuse the markdown text because we found a way to reuse our code, inserting the same exact figure into multiple outputs. What about all of the context that you're adding in markdown? How do we take that, maybe with a large language model, and repurpose it for other formats?

And that's it. I want to thank my partner, Nathaniel Nicholas, and my mentors over the years, and Mine, who gave a great talk. I linked it here. She did a very similar presentation on Quarto. My QR code is here if you guys have any questions. And I am actively seeking a full-time position if there's any people in the audience. Thank you so much.

Q&A

Thank you, Joshua. That was a great talk. Quick question here. Have you ever experienced any resistance to using Quarto for slides instead of PowerPoint that the presenter can edit last minute? And I guess my follow-up on top of that is it seems like a PowerPoint is an output option. So how do you know that they're not editing it last minute?

So typically, they can't edit my figures because the figures are being outputted as an image. And so they'll email me or text me, hey, I need you to update figure two. In that case, I know they need me, and I will just re-output the PowerPoint. And then they can copy that whole slide from my output of PowerPoint into their final PowerPoint. That makes it very quick and easy. If they're editing content, that's their own thing. I don't add content to the slides. Typically, I don't have the background to do that, and many of us don't. If we want to talk about, you know, latest developments in cancer research, that's all on the physician or medical researcher.

Great. Thank you so much. Thank you.