NHS R Community and Posit Quarto Q&A
Transcript
This transcript was generated automatically and may contain errors.
Shall I do like an official recording? I'm going to pass it over to Posit and Jeremy in particular to start the session. Thank you. Yeah, well, thanks for being here. For those that don't know me, I'm Jeremy Allen. I work at Posit. I'm on our marketing team. I've been in sales and I've been in customer success, mostly helping public sector Posit customers all around the world, lots in the U.S., Canada, the U.K., Asia, a ton in Europe. So I've gotten a good amount of exposure to public sector folks who are using Posit tools, and that includes Quarto, but also Shiny, all the other open source packages, plus our professional tools like Workbench, Posit Connect, and Posit Package Manager.
In fact, Zoe and I started planning this Q&A session when I was still a part of our sales, our public sector sales team. I've since moved over to our product marketing team where I can focus a bit more on products and really helping Posit understand the unique requirements of our public sector customers. So this event still fits really well with that. So we're happy to learn about what you're doing and how you're using the open source R and Python packages and what your workflows look like. And then I have our Quarto team or representatives from our Quarto team with us so that we can answer your questions about Quarto. So let me let them introduce themselves. Andrew, do you want to go first?
Sure. My name is Andrew Holz. I actually work across a couple of different teams: the Quarto team, the Shiny team, and the MLverse teams, and I'm just really excited to be here. I mean, honestly, Christophe and Carlos know thousands of times more than I do about Quarto, but I'm really interested to hear what challenges you all have in adopting it, and to figure out how we fit that in as a small team trying to get a big product out the door. So really looking forward to the discussion.
Thanks. Andrew is being humble. He's the head of open source engineering. So that includes Shiny and Quarto and all the other open source teams.
I'm Carlos. I've been working at Posit for about three years now in Quarto all of the time. I'm currently the lead engineer on the project. I started a few years ago, like I said, three years or so, on incorporating some of the interactive visualization, sort of reactive JavaScript, so very much on the engineering part. Before joining Posit, I was a professor. I used to live in Tucson until just a month ago or so. So I lived there for 10 years. I was a professor at the University of Arizona for like eight years or so. And I've always been in the periphery of open source development for sort of scientific communication. I wrote my thesis on that kind of thing and been working on it ever since. And so it's great to be at Posit and working on open source that gets to sort of help all you do incredibly important work.
So hi, everyone. I'm Christophe. So I'm also in the Quarto team. So I joined Posit initially on the R Markdown team, and I'm still working on R Markdown and maintaining the R package. But now most of my time is on the Quarto project alongside all the R work I'm still doing. So I'm coming from the R community.
Thanks, gentlemen. Appreciate it. And we also have Colin from our partner, Jumping Rivers, who's over on your side of the world. Colin, do you want to say hi?
Hi, I'm Colin Gillespie of Jumping Rivers. I've probably met most of you at NHS-R over the last, I don't know, four, five, six, seven years. And as Jeremy said, we're based in the UK up in Newcastle and we do lots of R, Posit, Python things. Thanks.
Overview of the session
Excellent. And let me give a quick overview of how we're going to run the meeting today. I'm going to drop a link in the chat right now. This link is to the site where you can ask your questions anonymously. If you want to put your name, you can put your name, and then you can upvote questions from other people. So that way the popular questions rise to the top and we'll make sure we get those answered, even if we can't answer all the questions. So make sure you upvote other questions that you see that you like, and you might want to have answered. That way we cover the most important things first.
So that'll be the majority of what we do today. Go through this Q&A. Our Quarto team and myself will help answer those questions. I would like to start us off with a little bit of Q&A from us to you. We would like to know a little bit more about some of the R and Python projects you have going on. Obviously, don't tell us anything that's proprietary or sensitive in nature, but if you don't mind, let us know a couple of really important projects you're working on that require some kind of communication or reporting that Quarto might be used for, either Quarto documents or Quarto websites or now Quarto dashboards.
NHS community member projects
Hi, thanks for this talk. I've really looked forward to it. I adopted Quarto quite early on and often thought it was Quattro, like the Audi Quattro or whatever. I was saying it incorrectly for years. And I use parameterized reporting to help support feasibility assessments for research studies that we deliver in our emergency department. So I lead a team of nurses that are funded through the NIHR. And we are often invited to take part in studies. But what's important is to know whether or not you have the right population to deliver a clinical trial. And that's what I do. So I do a lot of research in the emergency department. Clinical trials are a lot of work and they're expensive to run, and if you aren't going to be able to deliver a study, you shouldn't really open it. You should focus on other studies.
So I use kind of our national data sets that we collect and use parameters to build SQL queries that then build out analysis and population reports to suggest whether or not we might have a population that would meet the requirements of a study. And whereas I used to do each one manually, I've turned it into a kind of a reproducible pipeline so that I just put in the SNOMED codes for the disease population parameters and it runs in five minutes. And I get the same report out, which I can send to the investigators and they can decide whether or not they want to pick it up and run with it or not.
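As a rough illustration of the parameterised approach described above, the sketch below builds a SQL query from a list of diagnosis codes and a date range. It is not the speaker's actual pipeline: the table name `ed_attendances`, its columns, and the placeholder codes are hypothetical, and an in-memory SQLite database stands in for the real data source.

```python
import sqlite3


def build_cohort_query(snomed_codes):
    """Build a parameterised query filtering by diagnosis code and date range."""
    if not snomed_codes:
        raise ValueError("at least one SNOMED code is required")
    # One "?" placeholder per code keeps the values out of the SQL text itself.
    placeholders = ", ".join("?" for _ in snomed_codes)
    return (
        "SELECT patient_id, snomed_code, arrival_date "
        "FROM ed_attendances "  # hypothetical table name
        f"WHERE snomed_code IN ({placeholders}) "
        "AND arrival_date BETWEEN ? AND ? "
        "ORDER BY arrival_date"
    )


def fetch_cohort(conn, snomed_codes, start_date, end_date):
    """Run the cohort query; the codes and dates are the report parameters."""
    sql = build_cohort_query(snomed_codes)
    params = [*snomed_codes, start_date, end_date]
    return conn.execute(sql, params).fetchall()


def demo_connection():
    """Create a tiny in-memory table with made-up rows for demonstration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ed_attendances "
        "(patient_id INTEGER, snomed_code TEXT, arrival_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO ed_attendances VALUES (?, ?, ?)",
        [
            (1, "CODE_A", "2024-01-05"),
            (2, "CODE_B", "2024-02-10"),
            (3, "CODE_A", "2024-06-01"),
            (4, "CODE_C", "2024-01-20"),
        ],
    )
    return conn
```

Calling `fetch_cohort(demo_connection(), ["CODE_A", "CODE_C"], "2024-01-01", "2024-03-31")` returns only the attendances that match both the code list and the date window. In a real report the same values would presumably arrive as document parameters rather than being typed into the function call.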
So we, so most of it links to academic outputs. So there are a few papers that we've put together now that are based off data extraction through Quarto reproducible pipelines now that I've put together. So we've got one that's built around head injury processes and the whole pipeline was based around a Quarto report for extracting patients who attend the emergency department between two date ranges with presenting complaint of head injury who have a CT head. And then I extract and pull all the data that describes their pathway.
In terms of delivering trials for other people, we're a very successful team and one of the most successful research teams in the country for delivering emergency trials. Part of that is that we don't open trials that won't be relevant to our population. So we've taken part in studies looking at patients with chest pain, and the combination of the tooling that I've got available to me through R and Quarto and our reasonably, decently developed EHR, which collects decent data about patients, has meant that I can automate the extraction of complete observational cohort studies, which other centres in the UK can't do. And so we've put 7,500 patients into an observational study of cardiac chest pain. And we're one of the only centres in the UK who have been able to do that because everyone else would have had to collect it manually.
Thanks. It was just a bit of context, I suppose, for how my team and I use it.
So, hi, everyone. I'm Chris Mainey. I'm Head Data Scientist with Birmingham and Solihull Integrated Care Board. And because not everybody works in our country or indeed system, just to explain a little bit about how the health service is structured. So, we have a national health service, which is all under one umbrella, but it's actually composed of lots of different organisations who work under a common sort of banner. We have regional things that organise the care and the funding, and they are called integrated care boards. And under them, they have a so-called integrated care system. So, we have hospitals and other providers underneath those in regions. So, I work for an integrated care board rather than a provider, rather than a hospital.
So, the sorts of work that myself and our team do are things like looking at population health changes, forecasting forward into the future, trying to work out what our population might need. Often things to do with dissecting the makeup of the current population, dissecting the activity at different providers. And we, I suppose, like many other people in the country, are moving, I guess, technically from a profession where really we weren't really a defined profession. We were often people who were fairly smart and people who use computers.
That profession has gone through a series of stages: people using Excel, building up SQL skills, using wider BI tools. And then really with, I suppose, data science becoming more accessible, maybe from 2012 onwards really, we started to see people who were maybe interested and maybe with academic interests picking up R or Python or other things, or bringing stuff in from other places they've worked. The NHS-R community, I think, rode that wave a little bit where there were people dotted around organisations saying, how do we do this? We can't get this through our IT departments. We can't get the right infrastructure. How do we do this? Can we share and try and work together to do this?
So, I've got a small but very smart team who work across various different things. So, there's very strong SQL focus. There's a lot of Power BI work, but there's a lot of people with some R skill to a lesser or greater extent. I suppose I spend probably 80% of my time in R, but I have a couple of Python people as well. So, with Quarto coming forward in the last few years, it's been very useful to me because I can now look at a reproducible framework across my Python users and my R users. Because before that, I was pushing R Markdown because that's fantastic. But I couldn't spool that out to my Python users as well. So, we're at the point where we're trying to do that. I suppose I'm interested now on how do I move the team towards code first stuff on GitHub, whether it's R or Python, communicating in a standard way using templates through Quarto.
Underutilized Quarto features
Our first question is, what are your favorite features and capabilities of Quarto that you think are underutilized?
One of the first things I implemented when I started on the project was the Observable JavaScript integration. So, this is a little on the wonky programming side for folks who are not deep in it. But there's a separate company, actually, that builds an open source product called Observable JavaScript, which makes it very easy to write web pages with interactive content in a way that is very close to what Shiny does. But instead of writing R code, you write JavaScript. So, it's kind of like a reactive JavaScript thing, which is really great if you want to write fast interactive content that doesn't need to talk back to a server as often as it would in R.
And one of my really favorite things that I think is actually underutilized is how Quarto works with the R reactives that you get from Shiny. So, you can use Quarto to build a Shiny app: you can actually have Quarto generate, from a single markdown page, both the UI and the server page to get real Shiny applications going. If that page has Observable reactives, they both work together transparently. So, from R, you can refer to the JavaScript observables to get JavaScript values. And from JavaScript reactives, you can refer to Shiny reactives. And they both work reactively and know what parts go on the server and what parts go on the client.
You know, I can say something. Maybe it's only an impression, but for me, it's the whole Quarto theming system that we have. It's been here since the beginning of Quarto, and I think it's underutilized, or it was, I would say. And we worked on the brand feature, which is new, and which you may or may not know about yet, and which hopefully will simplify the use of that. But it's really powerful because it's something we started to do late in R Markdown. And so, in the latest R Markdown versions, we had features, thanks to the bslib work and all the work from the Shiny team, to customize documents, but it's way less advanced than what we have in Quarto. But it requires knowing a bit of SCSS and some technology. And so, I'm thinking maybe we need more documentation, templates, examples. But hopefully, the brand system will help with customization, because it's really easy to get non-default outputs with Quarto.
I guess the answer is it's underutilized because we released it just over a month ago in the latest version of Quarto. But brand.yaml, if you haven't seen it, is a really nice way to create themed documents. And they theme not only HTML, but they also theme your Typst PDFs, which is quite nice. So, you get this consistent output. And they theme presentations, and they theme dashboards all uniformly from the same format. So, if you need to do branding, and you have fonts and colors, you just edit this one YAML file, and then it works across formats.
And to piggyback on that, it also works in Shiny. So, once you have your brand.yaml, you have a file that becomes very portable, not just across Quarto documents, but also into Shiny as well. And one of the goals that we have for this year is we're going to be putting engineering time into making it very easy for plotting libraries in multiple languages to also support that. So, if you're using, for example, ggplot, you should be able to just point it at a brand.yaml file and, with one command, have it pick that up, and then your charts are themed in exactly the same way. We're hoping to do that for matplotlib and seaborn, and then perhaps for plotnine and some of the other plotting libraries.
I'm sorry, Jeremy. I just wanted to piggyback one comment. You know, Carlos also mentioned Typst. I think just in general, that's one of the things that's really exciting. Again, it's kind of cheating to call it underutilized because it's on the newer side. But Quarto is continuing to invest in making Typst work really well. So, if you have to generate PDFs, and you've fought with LaTeX, and you're sick of that nonsense, you know, Typst, and the templating, and how fast it is, it's really clean, and Quarto can work really well with it. And we're going to continue to, as I said, invest and chat with them and figure out how we can make it all work really well. So, it's something to investigate if that's an area that you work in, generating complex reports. Like, we did our Posit PBC report with it, and it came out really well.
Yeah. Continuing, apparently, the tradition of systems with hard-to-pronounce names: Typst is how they want you to pronounce that. It is a system that has exploded in popularity. It has existed only for five years, but it stands a genuine chance of displacing LaTeX in academic circles, which is not something I thought I would ever say in my lifetime, which is fantastic. So, I used LaTeX in my previous academic life, and it's great, but it's also really, really awful in terms of user experience and how slow it is, and Typst fixes many, many of those things. You can use it as a standalone system, and it's great, but Quarto offers great support for it, so you can have your markdown files, and you can just use format: typst, and it's an alternate PDF format. So, it generates PDFs just like you would with LaTeX, but it generates them a lot faster, and it's a lot easier to control appearance and things of that sort.
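To make the "format: typst" remark concrete, here is a small sketch that writes a minimal Quarto document whose front matter selects the Typst PDF format. The helper name, file name, and body text are made up for illustration; only the `format: typst` front-matter key comes from the discussion above.

```python
from pathlib import Path


def write_typst_report(path, title):
    """Write a minimal .qmd file whose YAML front matter selects Typst output."""
    front_matter = (
        "---\n"
        f'title: "{title}"\n'
        "format: typst\n"  # render this document to PDF via Typst
        "---\n"
    )
    body = "\n## Summary\n\nBody text written in ordinary Markdown.\n"
    path = Path(path)
    path.write_text(front_matter + body, encoding="utf-8")
    return path
```

After writing the file, running `quarto render` on it from a terminal should produce a PDF through Typst rather than LaTeX, assuming a Quarto version that includes Typst support.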
Accessibility and PDF output requirements
Yeah, that's a good one. I have a follow-up question, and this is for the NHS folks, about your reporting outputs, particularly PDF. Do you have strict requirements for the format of your PDF output?
Different organizations might have their own requirements, but there isn't an NHS-wide thing for anything other than a national statistic. So, if you have a regulated national statistic, you have to publish it in certain formats. So, they have to be open, accessible formats. So, you have to have the open equivalents of kind of Word documents and things like that, and they're supposed to look identical. But that's the only place I've seen that requirement in our area is when you have a regulated official statistic.
I'm going to add for NHS, I don't think there is necessarily a standard, but there is in civil service. And so, there are standards around what we publish as well. So, really, we shouldn't be using just PDF. We should be publishing for public consumption things that are fully accessible. So, you don't necessarily need things to read it. So, there's definitely information from civil servants, but that's not necessarily followed by public sector.
And just to add as well, our identity colours for NHS-R Community are the same as the NHS generally, but we don't use some things which are particularly NHS branded. So, there are things that are copyright, like the logo, and we don't use the particular typeset font because you have to pay for the official one. So, we use the second one, which is available, which is Arial. So, just to say for the purposes of recording, accessibility should be something that we should be thinking about a lot.
So, to quickly follow up on accessibility, this is something that Posit is paying very close attention to. Against the UK and European Union guidelines, which have existed for a long time, we do pretty well for accessibility in our HTML formats. There are a few things that we could do better, but we do fairly well. But in the context of American institutions, the US government, there is a change in the way that the Americans with Disabilities Act will be enforced starting next year, and it will start essentially kicking in for websites as well. And so, public universities in the United States are quite worried about this because there are folks that believe that any websites that are published, even as course materials, will have to pass the accessibility guidelines, or they will be liable.
Serverless Shiny in Quarto
The next question up is from Daniel, and it says, I would be interested in any updates regarding deploying a serverless Shiny application for R within Quarto.
So, there are a number of ways that you can do that. There's a link in the question to an extension that was developed by an external contributor. James Balamuta is a frequent contributor to some Quarto features and has a lot of really good extensions. coatless is his GitHub username, I believe, or something like that. But Posit has a number of libraries that allow that as well, if you search for Shinylive; they are developed by Barret Schloerke and the Shiny team. And so, this is possible to do if your application doesn't need to communicate with data behind a server, right?
Yes, I wanted to add that the link shared in the question points directly to, for example, the Shinylive Quarto extension. So, this extension should be working. Maybe there are cases that are not handled yet, but regarding the question about any updates, we would need more details about what is expected. We know we probably have some things to look into. For example, recently we had a question on the discussion board about doing an interactive dashboard with Quarto using Shinylive, and currently it only works with regular Shiny, not Shinylive, so not serverless Shiny. So, maybe there are more things to do, but there is no recent update on that. But as Carlos mentioned, Shinylive is developed not by the Quarto team directly, but by another team. So, we know we need to work closer together on some use cases, but we also need more use cases. So, if you have any use case to share, that would really help us to know what we need to prioritize on this side.
So, the use case is primarily, as you say, if you want to use a Shiny app, then you have to have a Shiny server set up, and that is just more overhead, more infrastructure required. So, if you want to share, call it a dashboard, call it a report, with stakeholders, and you need to set up a Shiny server to do that, then that's just more overhead. And if it was possible to do so without a Shiny server, then that would be advantageous.
In an ideal world, really, when sharing data with stakeholders, then it'd be nice to share a sort of HTML format report with them that they can then drill down on their own machine, but that's not so easy at the moment because, as I say, it kind of depends on a Shiny server. In our setting in Newcastle Hospitals, where I work, then we share, for example, antimicrobial resistance patterns using HTML format Quarto reports with stakeholders, and then they can look at resistance patterns in their own patient cohort to decide on things like which antibiotic do I feature in my guidelines, for example, based on the resistance patterns of the organisms in that patient cohort.
So we have lots of this kind of data, and I currently share it with stakeholders in HTML format reports that I send to them by email. I guess what I'm looking for is an answer to the question: am I able to distribute this data in a more efficient way than sending parameterized reports to the stakeholders, who then have a static report to look at? Ideally, really, they would be able to drill down on that report slightly, and that's why I think that has some sort of role.
Yeah, I really appreciate the extra context. That's very helpful. Yeah, the basic idea is perhaps eliminate a few reports. If a single one has some interaction that could then encompass the equivalent amount of info that might be in two or three reports, they get one instead. And then if it's ShinyLive, it doesn't need a server component, and so now you're sort of maximizing what the person can get out of the report and minimizing the infrastructure necessary to make it happen.
There's a more efficient way to share reports now rather than email, but it still requires a server. It wouldn't necessarily be a specific Shiny server. It could be something like Posit Connect, which I think the NHS has, but I don't know if you all have access to it. But there is a license for it there, so you could share that way. Folks could access the reports there just via a URL.
Just to clarify, we were shaking our heads, so if we do blur this out, the NHS doesn't have its own license for all of the NHS, so we tend to have to have them per trust or team or whatever. So, we're seen as independent organizations, unfortunately. So, it means that it's not actually available to many people. But in the trust I'm working at at the moment, we are using flexdashboard because, although they have a Connect server, Shiny isn't available to everybody without logging in, because the Connect server makes it so that it makes it public. And we can't have that public even though they're aggregate figures.
So, where Shiny has become very important over flexdashboard, but we can't use it as a solution, is the scenario where you have a report, this is for incidents, and people like it in their department area, and then they want it in each other department area. So, then you have 20 metrics that become 20 times 5, but it's the same code underneath it. So, that's when reports, even when they're parameterized, have to be handled very carefully, because you'll have a staging bit and a development bit, and it just kind of explodes really in numbers. So, something like Quarto, if it can be used, is also very familiar to people, whereas Shiny can be quite difficult with its reactive nature and things. So, there are lots of colleagues who know R and could use R Markdown or Quarto really easily, but then Shiny is another step for people on the maintenance side.
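One way to picture "the same code underneath" is a single parameterised document rendered once per department. The sketch below only builds the `quarto render` command lines; the document name `incidents.qmd`, the parameter name `department`, and the department names are hypothetical, and the document itself would need to declare a matching parameter.

```python
import shlex
from pathlib import Path


def render_commands(qmd_file, departments, param_name="department"):
    """Build one `quarto render` command per department for a single .qmd file."""
    stem = Path(qmd_file).stem
    commands = []
    for dept in departments:
        slug = dept.lower().replace(" ", "-")
        commands.append([
            "quarto", "render", qmd_file,
            "-P", f"{param_name}:{dept}",       # pass the report parameter
            "--output", f"{stem}-{slug}.html",  # one output file per department
        ])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch has no side effects.
    for cmd in render_commands("incidents.qmd", ["Emergency", "Intensive Care"]):
        print(shlex.join(cmd))
```

To actually render, each list could be handed to `subprocess.run(cmd, check=True)`. Keeping the commands as lists rather than strings avoids quoting problems when a department name contains spaces.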
I just want to add a quick follow-up: we, as a company, and the Shiny team in particular, are keenly aware of that barrier you just described, which is folks who come from R Markdown and Quarto, and then they need to build something in Shiny. They sort of hit a wall, to be very blunt about it, which is like, you know, this is very hard. And so, I won't go and say that this problem has been solved by what I'm about to tell you, but I think they're working really hard on it. And so, one of the things that folks are doing is they're attempting to leverage modern large language models to create assistants. And so, folks at Posit are putting a lot of engineering behind taking these models, which sometimes help but sometimes can be weirdly unhelpful, and actually turning them into something that's quite usable. So, I'm going to share a link in the chat. So, this is called Shiny Assistant. You can use it today. And this is built by the Shiny team, doing a lot of work behind the scenes on making it a chat interface that helps you create Shiny applications.
Why Quarto, and how it differs from R Markdown
The next question is, why did we make Quarto, and how is it different from R Markdown?
Really quick on why we made it: the R Markdown ecosystem had a bunch of really awesome features that people really love, and as JJ mentioned in a recent presentation, it has a requirement for the knitr package. So, folks from other languages would then have to accept that requirement for R and for the knitr package. With Quarto, we're able to have folks experience all the great things from R Markdown, but without the requirement for R or knitr. Because there's a Jupyter engine in Quarto, anything that the Jupyter engine can do, you can do without having this dependency on R.
And from a user point of view, there are some really nice things about Quarto over R Markdown. In R Markdown, I would have rmarkdown, blogdown, xaringan for presentations, flexdashboard for dashboards, and so on and so forth, and you end up installing a lot of packages. With Quarto, all of that is within the Quarto installation, and one nice benefit of that is it makes the syntax across all of those outputs the same. Whereas xaringan has its own syntax for making presentations that's different from an R Markdown HTML output, for example, in Quarto it's the same syntax across formats, so that's a really nice user enhancement.
Yeah, I was just going to say that not only can you do multiple languages, which you pointed out, it is a lot easier to use on a polyglot team or a multi-language team. These are becoming more and more common, right, where you have a team where some folks have R expertise and some folks have Python expertise because they come from different domains, or that's just what they know. And so Quarto is designed so that you can have a website, or you can even have a book, where one of the chapters is written in R and another chapter is written in Python, and the end result should look the same, because essentially different parts of the document are handled by different engines, right, the thing that takes your code and intersperses its output with the Markdown we generate. And so it really is designed for this polyglot world in which we see R as being an amazing tool, and we're still supporting it. R Markdown is actually still supported, and, you know, it's not going anywhere, but there's a big whole world out there, and we want things to play nice with one another, and Quarto is very much designed around, okay, what can we do to make that possible?
Rendering output to a subfolder
I think we have time for one more, and there is a technical question. It's two down on the list, so I'm going to ask that one since we have our technical folks with us. Is there a way to get Quarto to render output to a subfolder? I can then have the subfolder in the .gitignore.
I was secretly hoping we would be able to squeeze by without being asked that question because time would bail us out. Unfortunately, that's surprisingly hard for us to do well. The reason for that is your knitr code or your Jupyter code has very deep expectations of where inputs and specific files are. When you change that, and if you want to do relocatable output in that way, you have to be very careful and know absolutely everything about what your knitr script opens or your Python file opens, so that if some of the dependencies you have are moved, we move them in the right way.
That is not very well supported right now, so there are ways that, if you really twist Quarto's arm, you can make it do it, but you're probably going to regret having made that decision. This is something we want to change in the next couple of years, so we are going to fix that, but it's not a short-term thing for the next release. There's a lot of work that we're doing behind the scenes to enable that, and it's ongoing, but there are some formats that allow you to do that. There are some ways that you can make that work, but they are fraught with peril.
Yeah, to add to that, because the question is concise, but maybe it's about the output flag. Just to be clear, with a Quarto project, you can move outputs to another directory, but with a single document, you can't. Sorry. So, if you use a Quarto project, you can do it. It's how a website works or a book works: your output will be in another directory. With a single document, it's not the same, so you could use a Quarto project with the default project type, and you could have something like that. But it's really difficult to do per document, as Carlos said, and it's something we are careful about, also because of the history of R Markdown. In R Markdown, this was possible, so users that know R Markdown may think, why shouldn't it be possible with Quarto? But in fact, it's really difficult, and we had a lot of issues. Path handling is really tricky, and we were stuck with what we did in R Markdown, with some issues we weren't able to solve. So we wanted to do it the right way in Quarto, and that's why it's taking longer or is more difficult to do.
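The project-level route mentioned above can be sketched as follows: a `_quarto.yml` with an `output-dir` entry sends rendered files to a subfolder, which can then be listed in `.gitignore`. The helper name and the folder name `_output` are illustrative choices, and this does not address the harder single-document case discussed here.

```python
from pathlib import Path


def init_project(project_dir, output_dir="_output"):
    """Write a minimal _quarto.yml that renders into a subfolder, and ignore it in git."""
    project_dir = Path(project_dir)
    project_dir.mkdir(parents=True, exist_ok=True)

    # A default-type project whose rendered files go to `output_dir`.
    config = f"project:\n  type: default\n  output-dir: {output_dir}\n"
    (project_dir / "_quarto.yml").write_text(config, encoding="utf-8")

    # Add the output folder to .gitignore once, preserving any existing entries.
    gitignore = project_dir / ".gitignore"
    lines = gitignore.read_text(encoding="utf-8").splitlines() if gitignore.exists() else []
    entry = f"/{output_dir}/"
    if entry not in lines:
        lines.append(entry)
        gitignore.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return project_dir
```

With that file in place, running `quarto render` inside the project directory should place the rendered documents under the chosen subfolder instead of next to the sources.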
So, just to give a concrete consequence of that, right, and Christophe, correct me if I'm wrong, but in R Markdown, the default option is this thing called self-contained output. You get one HTML file with everything, so it's easy for you to email it, right? You can use something like blogdown to package things differently, but with a single R Markdown document, you get one big HTML file with everything, right? If you want to put things on a website, that doesn't work so well, because every page gets all of the CSS and all of the fonts and all that in one file, and your file is huge, so you don't get any benefits of shared resources. Quarto goes the other way around and tries to make it efficient, so that you can view things on your phone easily and things of that sort. As a consequence, we now need to be very careful with paths, and that's where these constraints come into play, right? So, we're trying to figure that out, but it's a long path. We're doing a little better on some things, but don't hold your breath.
All right, we're out of time. Zoe, back to you. I just want to thank everybody for coming and for putting up with some of the administration around this, and thank you so much for everybody for the questions and the answers to the questions. It's been really wide-ranging, a bit like the Quarto products that you've got as well. So, I'll end the call, the recording, I should say. All right, thanks, everyone.
