How to schedule a Quarto document on Posit Connect
Transcript
This transcript was generated automatically and may contain errors.
Hello everyone, my name is Ryan Johnson and I'm a Data Science Advisor here at Posit, and welcome to this month's Enterprise Community Meetup, where we'll discuss another end-to-end data science workflow using Posit Team. As a reminder, this is a recurring event on the last Wednesday of every single month, and its goal is to provide the data science community with an overview of Posit Team, especially for those that may be unfamiliar with our professional tools or maybe for those that are looking for a refresher or hoping to explore new capabilities and features within Posit Team.
Every month, we'll cover a different topic, and we're going to highlight all three tools from within Posit Team. We have Posit Workbench, Posit Connect, and Posit Package Manager. During and after each presentation, members of the Posit Team will be in the chat to address any questions you have about our tools, workflows, licensing questions, and anything else you may be curious about.
So for this month's topic, we're going to take a deep dive into one of our newest open-source tools called Quarto. Quarto is a technical publishing system, and in many ways it's similar to R Markdown, but there's no dependency on the R programming language. So if you're a Python, Observable, or Julia developer, you can also create a Quarto document.
Now if you're new to Quarto, you are absolutely in the right place, as we're going to create a brand new Quarto document from scratch using the RStudio IDE within Posit Workbench. We'll then show how Quarto can be used in combination with Posit Connect to fuel some really powerful workflows.
What is Posit Team?
Now before we get started, I just want to quickly review Posit Team, and we'll start right here at the top with your data scientists and your data analysts who will be creating insights using Posit Workbench. And they can choose to write code in either R or Python, and they have a variety of IDEs to choose from, including RStudio, JupyterLab, Jupyter Notebooks, and VSCode.
Now for the R developers, they may be creating content such as Shiny applications, Pins, R Markdown, or Plumber APIs. And for Python developers, they may be creating content such as Flask or FastAPIs, Streamlit, Shiny, Dash applications, Pins, Jupyter Notebooks. Now once the content is created, the developers will need a way to easily share that content with viewers, and that's where Posit Connect comes into play.
Connect not only makes your content shareable, but it also gives your team tight control over content access, and can also be used for job scheduling, which we're going to talk about today.
Finally, we have Posit Package Manager, which does exactly as its name implies, and that's to host, organize, and distribute open source R and Python packages, not only from the community, but also any internally developed R and Python packages.
Now for today's workflow, we're going to use the RStudio IDE from within Posit Workbench to extract and save some data as a pin, and then we'll create some insights from that pin data using Quarto, which we'll then publish to Posit Connect, and then run it on a schedule.
Today's data: package download metrics
So to get started, let's first talk about the data we'll be analyzing today. And I thought it might be interesting to extract some R package download data from within Posit Package Manager. And we can do this via an experimental API from within Package Manager, which can be used to gather lots of great information, including package download metrics. So this is great for auditing purposes, or maybe you're just curious to know how popular is a certain package from within your instance of Posit Package Manager.
For example, here's some example data down here of what can be gleaned from the Package Manager API. On the left hand, we can see the R package name, in this case, ggplot2, and then the number of downloads of this package on a specific day.
Now this data is constantly updating as people continue to download packages from Posit Package Manager. So one of the goals for today is to automate the data extraction and visualization steps so that our final reports will reflect the latest and greatest data.
Introducing Quarto
And that brings us to Quarto. So for today's demo, we're going to create two Quarto documents. One will be for gathering, processing, and then saving our data. This is often referred to as an ETL, or extract, transform, load, workflow. The other Quarto document will be to generate a report with tables and plots to visualize our results. But what exactly is Quarto?
Well, Quarto, first and foremost, is an open source tool. So if you have a computer, and you have internet access, you can create a Quarto document. It's perfect for scientific and technical publishing, and can really be thought of as like a next generation of R Markdown, or our way of bringing R Markdown to everyone. And since there's no dependency on the R programming language, you can create a Quarto document if you're an R, Python, Julia, or Observable developer.
You can also create Quarto documents in whatever environment is most comfortable for you, including VS Code, RStudio, which is where we'll be today, JupyterLab, or any text editor. And you can also create a variety of content, including articles, reports, presentations, websites, blogs, and books in a variety of formats, including HTML, PDF, Microsoft Word, EPUB, and lots more.
So let's review our workflow for today. So we're going to draw in package download data from Posit Package Manager's API, and we're going to pin that data to Posit Connect using Quarto. We'll then show you how to schedule this Quarto document to run using Posit Connect, which will automatically update the pinned data.
Finally, we'll create a second Quarto document for visualizing our package download metrics, which we'll also publish and schedule to run on Posit Connect. We'll also apply a custom theme to the second Quarto document to make it look really nice.
Now, for our data, we are going to use Posit's public package manager here. So let me click on this link. So this is a free hosted service that provides standard mirrors of popular R and Python repositories, including CRAN, Bioconductor, and PyPI. Additionally, you can track repository changes over time. You can freeze repositories to specific dates and install packages as pre-compiled Linux binaries.
Now, this instance of package manager is used by hundreds of data science teams throughout the world and will serve as a good proxy for understanding our package download trends.
Creating the ETL Quarto document
So to get started, let me come back here to my slides. To get started, we're going to show everyone how to create a Quarto document from scratch using the RStudio IDE. So we're going to go into Posit Workbench here and open up a new RStudio session and then create an RStudio project for our workflow.
All right, so we're going to be using our demo environment here at Posit. So I'll select Posit Workbench. You can see I already have an RStudio session running, but we're going to create a brand new session for today's workflow. So I'll click New Session. Here, I can choose my IDEs, and I'm going to select RStudio for today.
We can give it a custom name if we wanted to. Now, our demo environment's in a Kubernetes cluster, which allows me to choose various cluster options. I'll just leave it as default today, but if I did have a more computationally intensive job, I could choose a larger instance. But we'll just leave it as default, and I'll hit Start Session.
All right, here we are within our new RStudio session, and in the top right corner, you can see Project None. So I'm not currently within a project, so let's go ahead and change that. I'll select this option at the top, click New Project.
Now, we have some options of where we want to place this project. It can be in a new directory, an existing directory, or we can pull in a project from version control. We'll select New Directory for today. Select New Project, and I'll give this directory a name. I'll just say Demo Test, and I'll put 123 after it. And we'll hit Create Project.
Now, anytime you open up a new project, it will open up a new RStudio session, so we'll just give this a few seconds to boot up.
All right, so here we are within our new RStudio project, and you can see the name of our project here in the top right corner. And our first step will be to create a blank Quarto file. Now, we can do this by clicking this New Blank File dropdown right here, and we can select Quarto Doc. I'm going to name this Quarto document ETL, for our extract, transform, and load workflow here, and hit OK.
All right, so here we have a blank Quarto document on the left-hand side of our screen in our source panel. Next, we'll get some help from the visual markdown editor. You can see I'm currently on Source, so I'm going to click on Visual.
So we're going to add our first component to our Quarto document, and that's going to be the YAML. Now, YAMLs are usually found at the very top of Quarto documents and are used to provide various metadata to our documents. So let's go ahead and add a YAML to this blank Quarto doc, and then we'll add a title and an author. So to add a YAML, I'll go to this Insert dropdown, and all the way towards the bottom, you'll see YAML block. All right, YAML has these three dashes at the top, three dashes at the bottom. Put my cursor right in between, and I'll start typing title, and we'll give this title, I'll call it ETL. We're going to add one more key value pair here. I'm going to say author, and I'll put my name, and that's it.
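For reference, the YAML block described above would look roughly like the following. This is a sketch: the transcript does not show the exact text typed on screen, and the author value is assumed from the speaker's introduction.

```yaml
---
title: "ETL"
author: "Ryan Johnson"
---
```

The three dashes open and close the block, and each `key: value` line is one piece of document metadata.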
Now, just for right now, we haven't added a whole lot, but let's go ahead and render this Quarto document, and by default, it'll use HTML. So we can click on this Render button, and we'll let Quarto work its magic. All right, so here's the current state of our Quarto document. We have our title, and then the author.
So next, we're going to add some R code, and we do this by adding an R code chunk, or a cell, to our document. And this can be found in the same kind of Insert dropdown menu. So I'm going to click Insert, Code Cell, and we're going to add an R code chunk, or a code cell.
Now, to save some time, I'm actually going to copy and paste some R code, which I'll explain here in a bit. So for right now, I'm just going to copy off screen here, and I'm going to paste into this code chunk some functions to load a few packages for our workflow for today. But again, we'll explain all this code here in a bit.
So next, I'm going to copy in some text, which provides the reader with some background information regarding the ETL process within this Quarto doc. So again, I'm also going to copy this off screen, and I'm going to paste it right here into my Quarto document. And you can see this nicely rendered markdown text within this visual markdown editor. Let's go ahead and render this document and now see what it looks like.
All right, so we still have our title, there's our author, you can see some of the code and its output right underneath. And then down here towards the bottom, you can see all that text that we added, and it's starting to look really nice. So let's close out of this. Now I'm going to add some more code and text. And I'm going to explain this code and text in a little bit. So I'm just going to go ahead and paste it all in here. And so pretty much we have our final ETL Quarto document. And we can now publish it to Posit Connect.
So I'm going to go ahead and start that process now, since some of the R code takes a little bit of time to run. And while it's publishing, we'll go back into that Quarto document and explain some of the code and the text that we added. So to publish this Quarto document, you'll see this little blue button at the top of your screen, this is our publishing button. So I'll click on this. We're going to choose Posit Connect. And we're going to publish this document with the source code, which is really important for job scheduling.
Alright, so we're only going to publish the single ETL.qmd file, which is what you see over here on the left hand side. We're going to publish it to our demo environment of Connect, which we call Colorado. And we'll just leave the title as ETL. And I'll just go ahead and hit publish. And once we do that, down here in our console, there's going to be this deploy tab that opens up. And this is all running behind the scenes. Essentially, what's happening is that it's capturing my environment. It's looking to see what packages I use, what versions of those packages, what R version, what Quarto version. It captures that and sends it to Connect, Connect replicates my environment, and then deploys this Quarto document.
So let's come back into this Quarto document and explain all the code and the text that I added. So in this first code chunk right up here, we are just loading some R packages for this analysis. And then in the text right here, we're essentially just, you know, adding some information about what this ETL process is doing. And essentially, I'm creating two custom functions, which will query that Posit Package Manager API.
So then we use those two functions in this following code chunk that starts right here, it's a pretty big one. These are the two functions that I created. So once we create those two functions, basically, what we're going to be doing is using those two functions to gather package download metrics for three packages, we have ggplot2, dplyr, and shiny. This first code chunk right here is going to get the total downloads over the last 30 days. For each of those packages, it then combines it into a single data frame or a table, and then it will print the results.
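The "query each package, then combine into a single data frame" step can be sketched in base R. The two custom API functions are not shown in the transcript, so `get_total_downloads()` below is a hypothetical stand-in that returns placeholder numbers instead of calling the Package Manager API.

```r
# Hypothetical stand-in for the custom function that queries the
# Package Manager API. It returns placeholder counts, not real metrics.
get_total_downloads <- function(pkg) {
  placeholder <- c(ggplot2 = 300L, dplyr = 200L, shiny = 100L)
  data.frame(package = pkg, downloads = placeholder[[pkg]])
}

packages <- c("ggplot2", "dplyr", "shiny")

# Call the function once per package, then stack the one-row results
# into a single data frame.
total_downloads <- do.call(rbind, lapply(packages, get_total_downloads))

print(total_downloads)
```

Keeping the query in its own function means adding a fourth package only requires extending the `packages` vector.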
And then this last code chunk, it's already done publishing, we'll come back here. In this next code chunk right here, we're going to get similar data. But instead, we're going to get the total downloads per day over that same time frame, the same 30 days for ggplot2, dplyr, shiny, and then we're going to print just the first 10 rows of that resulting data frame. Finally, we take the package download results, and we're going to pin them to Posit Connect.
All right, that's all done right here in this code chunk. Now for those that are new to the pins package, this is a very easy two step process. We first register our pin board, which in this case is Posit Connect, using the board_connect() function. And then we'll write the two data sets as pins to Posit Connect using the pin_write() function.
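As a minimal sketch of that two step process: the demo registers Posit Connect with `board_connect()`, which needs a Connect server and credentials, so the example below assumes a temporary local board from `board_temp()` as a stand-in so that it runs anywhere. The pin name and the data values are placeholders, not the ones used in the demo.

```r
library(pins)

# Step 1: register a board. The demo uses board_connect(); board_temp()
# is a local stand-in so this sketch runs without a Connect server.
board <- board_temp()

# Placeholder data standing in for the package download totals.
total_downloads <- data.frame(
  package   = c("ggplot2", "dplyr", "shiny"),
  downloads = c(300L, 200L, 100L)
)

# Step 2: write the data frame to the board as a pin.
# The pin name "total_downloads" is hypothetical.
pin_write(board, total_downloads, name = "total_downloads", type = "rds")

# Read the pin back, as a downstream report would.
restored <- pin_read(board, "total_downloads")
```

Swapping `board_temp()` for `board_connect()` should be the only change needed to target a Connect server, provided the server address and API key are configured in the environment.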
And that's pretty much it. So here is the published Quarto document now on Posit Connect. But I'd first like to show you what the two pinned data sets look like on the Posit Connect server. So let me open up a new tab here.
Duplicate this one. I'm going to come back to the content on our Connect instance. And here are those two pins that we just pinned to Posit Connect. So I'll click on this first one. And at the very top, you can see the name of the pin followed by a link to download the data directly. Next, there is an R code snippet that gives users instructions for how to read this pin into their workflows. And finally, we can see a paginated preview of the pinned data.
Scheduling the ETL document on Posit Connect
All right. Now let's switch back over to our Quarto document that we just published to Posit Connect. Now, as a publisher, I not only have control over who has access to this document, but I could also schedule all the code to run on a recurring schedule. Now, since the data we're working with, this package download data, since it updates daily, let's set this document to run every single morning at 7 a.m. So to do this, I'll first click on the schedule tab right over here.
And we'll select the box to schedule output for default. So next, you can choose your time zone and then the start date and time. Now, we're going to set this to run every day at 7 a.m. So let's move the time to 7 a.m. And we're going to choose daily, which is already selected. All right. So essentially now this will run every single morning at 7 a.m. This way, the pin data sets will remain updated without any manual intervention. You can essentially set it and forget it.
You'll also see two boxes down here at the bottom. So that first one means that the report you see on your left will be updated and republished every time the document runs. And the second box right down here is to send an email to users letting them know the document was rerun. Now, we are going to create a second Quarto document right now, and we're actually going to demo this feature.
Creating the report Quarto document
All righty. So let's go ahead back into Posit Workbench. I'm going to close out of this ETL script. And we're going to create another Quarto report. I'll select a new blank file. Quarto Doc. Now, I'm going to call this document boring report and hit OK.
Now, I'm first going to create a YAML just like we did before. I'm going to add a title and an author. And again, we'll use the visual markdown editor to help us out here. So I'll go to insert, scroll down to YAML, add a title, and I'll call this boring report. You'll see why it's boring here in a second. And then I'll have author and I'm going to add my name.
So next, I'm going to copy in some code to load a few R packages. So let me first go ahead and create a code chunk or code cell. And then I'm going to add in some code right here. We're going to load in two packages. We have the pins package and also the tidyverse package. So after that, we're going to add in a little bit of text here, which is just kind of introducing all of our readers to the report. So I'm just copying and pasting this off screen for the sake of time.
All right. And now I'm going to copy in some more code here. And the first code chunk is basically going to read in that pinned data. So if you remember when we looked at that pin on Posit Connect, it actually included some R code snippets to draw in those pins into our workflows. So I'm going to go ahead and copy that code. Let me first actually give this a heading right here, which we can do with two hashes. I'll say package download metrics. Add in a new code cell.
And I'm going to paste in this code. So again, we first register our pin board, which this case is Posit Connect. And then we're going to read in those two pinned data sets.
Now, finally, I'm going to add two code chunks. The first is going to print that first pin, this one right here. And it's just going to print kind of the raw data, which is going to be a data frame. The second, we're actually going to create a plot using the ggplot2 package. And it'll look at the daily downloads for those same packages over the same 30 day window. So we can kind of visualize some trends in our package download metrics.
So let me go ahead and grab that code. I'll first add a code chunk here. And again, this first one is just going to print the results or print that first pin data set right there. And then I'm going to add another code cell. And this is going to create a plot. Again, using the ggplot2 package. So let's go ahead and render this Quarto document to see what it currently looks like. And again, this is our boring report.
So render this and we'll give it a few seconds to run. And here we go. So this is our final HTML boring report. You can see there's my name. And then we have our code. And again, if there's any output from that code, whether it be a table, a plot, or just some text, it'll show up directly underneath that code chunk.
We have some of the text that we added right here. All right, and then we have our package download metrics. So this is where we connected to our Posit Connect instance, and then read those two pins. This first code chunk, all it does is just print the first pin. So you can see those three packages and how many downloads over the last 30 days. And then the second code chunk right here, it creates a plot. And it's a pretty good looking plot. All right, we're looking at trends of those three packages over time, which is along the x axis, downloads along the y axis.
Adding custom theming
So this report, you know, we created, it certainly gets the job done. But it's kind of bland, and certainly could use some sprucing up. Also, it could be really nice to add some custom theming to our Quarto document that includes our company's colors and logo. And yeah, we want to create something that would really impress our leadership. So to get started, I'm actually going to come back here to Posit Workbench. And I'm going to download a custom Posit format that I created and is now hosted on GitHub. And that's what you're seeing right here. I did this just a few days ago. And this is a simple Posit Quarto theme.
Now, if you're ever curious about how to create your own theme or template, the Quarto website has some great documentation for this. So here we have for custom HTML themes, and also for creating customized templates for your workflows for your team. So coming back to this custom theme I created on GitHub, I included some instructions for how to use it. So I'm going to copy this code, I'm going to come back into Posit Workbench, and open up my console down here. And within the terminal tab right here, I'm going to paste that command, and we'll hit enter.
And we'll basically just follow the instructions. The first thing it asks, do you trust the authors? Well, the author is me, so I certainly trust myself. I'll say y for yes. And then we can give this template, it's going to be into a directory. So we can give that directory a name. I'll just say posit theme and hit enter.
And now if you look in our files directory, right down here, you'll see this posit theme folder. Now within this new directory, let me click into it. It includes a template Quarto document, which is shown right here, it is posit theme dot qmd. And it also includes an example HTML output file, which is right here. So if I click on this and view in my web browser, here's what that Quarto theme looks like. And it looks pretty nice. You see we have a nice logo at the top of the screen, all the headers and the text and the links basically reflect Posit colors that we use for our company. And so this is kind of the template we want to use for our report.
So let's go ahead and open up this posit theme template right here. And I'm currently viewing it in the source viewer. So I'm going to actually delete pretty much all the placeholder text right here. I'm going to come into my boring report, I'm going to click on source. And I'm just going to copy everything from our boring report, copy and paste it into our nice report. And we'll select the visual markdown editor again. Alright, let's just go ahead and render this posit theme, this custom theme report and see what it looks like.
Alright, so here is now our theme report. So again, you can see the logo, the headers look really great. It's starting to look pretty good. Alright, the text looks good. We got a lot of code, we got a lot of kind of a basic kind of noise here, like, do we really need to see this? You know, we can probably really spruce this up a little bit. And also the plot, while it's still a good looking plot, you know, the colors are the generic ggplot2 colors, maybe we can have those reflect Posit colors.
So to improve the look of this Quarto document, I'm going to close out of this. Let's first add a custom Posit color palette. And we're going to do this using a package called RColorBrewer. So let me go ahead and first load that into my environment, using the library function, RColorBrewer. And we're going to add that palette directly underneath here. So I'm going to copy this off screen, come back here into this R code, and I'm going to paste it right in there. So this is all the code we need to create a Posit themed color palette, which we're going to use for our plot, and also for our table.
So let's come back down here, we have our pin_read code chunk, and then we had this code chunk, which, as we know, just printed the raw data frame, and it did not look very good. So let's convert this printed text output. And we're going to basically make it a table, and I'm going to add some Posit themed coloring. To do this, we're going to use a package called gt. So let's first load that package at the very top. And I'm going to copy in the code to create a gt table. And we'll run it here. So you can see the output. So I'll first run library to load the gt package, I'm going to run this code chunk here within Posit Workbench.
All right, we'll come back down here, let's read in those pinned data sets. And now I'm going to come in right here, I'm going to replace this text with a gt table. And if you've never used gt tables before, they are so much fun, create really nice tables. So I'm going to paste in all the code. And let's run it. And here's now what our table looks like, which looks a whole lot nicer than just that raw data frame text.
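The table code itself is not shown in the transcript, so the following is only a sketch of what a themed gt table could look like. The data values, the labels, and the header color are all placeholders, and the hex value is not claimed to be an official brand color.

```r
library(gt)

# Placeholder data standing in for the first pinned data set.
total_downloads <- data.frame(
  package   = c("ggplot2", "dplyr", "shiny"),
  downloads = c(300L, 200L, 100L)
)

# Build the table, add a title, relabel the columns, and color the
# column header row. "#447099" is a placeholder hex value.
downloads_table <- gt(total_downloads) |>
  tab_header(title = "Package downloads, last 30 days") |>
  cols_label(package = "Package", downloads = "Downloads") |>
  tab_options(column_labels.background.color = "#447099")

downloads_table
```

Printing the object at the end of a code chunk is what makes the rendered table appear in the Quarto output.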
Finally, let's change the look of our ggplot to include our Posit colors. And we'll also tweak the look by adding a minimal theme and customizing the title and the labels as well. So I'm just going to copy this code off screen, come back here, I'm going to replace this text with some more ggplot text. And let me run this so you can see it here within Quarto. And there we go. So you see some nice Posit colored lines right here.
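A comparable plot can be sketched as follows. The demo builds its palette with RColorBrewer; this sketch instead assumes a plain named vector of hex colors passed to `scale_color_manual()`. The data, colors, and labels are placeholders.

```r
library(ggplot2)

# Placeholder daily counts for three packages over three days.
daily_downloads <- data.frame(
  date      = rep(as.Date("2024-01-01") + 0:2, times = 3),
  package   = rep(c("ggplot2", "dplyr", "shiny"), each = 3),
  downloads = c(100, 110, 105, 80, 85, 90, 40, 45, 50)
)

# Hypothetical palette; swap in your organization's brand colors.
brand_colors <- c(ggplot2 = "#447099", dplyr = "#EE6331", shiny = "#72994E")

# One line per package, custom colors, minimal theme, custom labels.
downloads_plot <- ggplot(
  daily_downloads,
  aes(x = date, y = downloads, color = package)
) +
  geom_line() +
  scale_color_manual(values = brand_colors) +
  theme_minimal() +
  labs(
    title = "Daily package downloads",
    x     = "Date",
    y     = "Downloads",
    color = "Package"
  )

downloads_plot
```

Naming the palette vector by package keeps each package tied to the same color even if the row order of the data changes.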
And this is already starting to look a lot better. Let's go ahead and render it and see the current state of this Quarto document. Alright, starting to look pretty good. Still have our Posit theme, scroll down here to our table. So that looks really nice. And then we'll scroll down here to our plot. And that also looks really nice.
Now, it looks pretty good. But since most of our viewers of our document won't be really interested in the code, let's go ahead and hide all the code and any kind of messages from the final HTML output. So let me close out of this. I'm going to scroll back up to my Quarto document, the very top.
So we're going to add a few things here, I'm going to add an execute key. And we're going to apply some code chunk options, which will basically apply to all the code chunks in the document. So I first want to hide all the code from the output by setting echo to false. And then we'll hide any warning messages by setting warning to false as well. Alright, I think this is gonna look pretty good. Let's go ahead, I'll save it, render this final Quarto document.
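The document-level options described above would sit in the YAML header alongside the title and author. A sketch of just that portion:

```yaml
execute:
  echo: false
  warning: false
```

Options under `execute` apply to every code chunk in the document, and an individual chunk can still override them with its own chunk-level option.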
And here we go. So now we're not seeing any of the code, we're only seeing the output, the tables and the plots. Alright, so we can see our nice table right here, which again, is looking at those three packages, and total downloads over the past 30 days. And here we look at those same three packages, every single day, their download metrics for the past 30 days.
Publishing and scheduling the final report
Alright, this is a great report. So now that we have this final report, let's go ahead and publish it to Posit Connect, just like we did before. So I'll click on that blue Publish button here within Workbench.
Select Connect, we're going to publish this document with the source code, since we want to schedule to run just like before. We're going to publish it to our same demo environment, Colorado, here's all the files and some of the dependencies needed for that custom theme. And we'll leave the title as Posit theme as well.
So again, anytime you publish from within your RStudio IDE using this blue button, the next thing that will happen is this deploy tab opens. And again, essentially, what's happening is RStudio is capturing my environment. And you can see some of the logs here, it'll print out, you know, certain packages that are needed for this Quarto document, and the associated versions.
So we'll just give this a few seconds to run. And then we'll come back together once it's all published. Give us a few more seconds. It's now rendering the Quarto document, it's the last step. It's running all those various code chunks, there's the output file. And there we go. And now here is our final report, published to Posit Connect.
And so once it's here, again, the main advantage is you have tight control as a publisher over who has access to this document. So for example, if I wanted to define specific users or groups, I can define them right here. So if I wanted to share this with Rachel, for example, I most certainly could. And now we would be the only two people here at Posit that would have access to this Quarto report.
I could alternatively select this middle option right here, and basically have all users login required. So as long as you can log into Posit Connect, you can view this content that would essentially open this up to all Posit employees.
So next, we'll want to make sure that this Quarto document is scheduled to rerun so that it reads in that new pinned data set, which again, is being updated every single day at 7am. And it updates this report so that these numbers reflect you know, those new package data every single day. So let's set it to rerun basically daily at let's just say noon every single day. And we'll go through the same workflow, I'll click on schedule, schedule output for default, choose your time zone, and I'm going to select noon. And we'll run this every single day.
Now this time, we're also going to send an emailed notification to our teammates to let them know the report was run. So to do this, we select this box right here at the bottom. And we can add our collaborators or viewers, which we assigned previously. So that was Rachel, or I can come in here and I can add Rachel directly, for example.
And so once we do this, Rachel basically every single day around noon, she'll get an email with a direct link to this report so that she can view the latest package download metrics. And if we switch back to my slides here, I actually have an example of what one of those email reports looks like. So you can see here that it's sent directly by Posit Connect with me as the sender. And it links to the latest versions of the report. The subject line you see up here at the top is also customizable. And it also attaches an HTML version of the Quarto report directly to this email.
All right. And so with that, we have come to the end of our workflow for today. Now I hope everyone found this month's demo helpful. And we'd love to chat more about Posit Team, Quarto, pins, job scheduling or anything else. So feel free to stick around and we'll have a few Posit folks available to answer any questions you have. Thanks, everyone for joining. And I look forward to seeing everyone again next month.
Thank you so much for the great demo, Ryan. And thank you everybody for joining us today. I see there's already been a few great questions in the chat. We are going to do things a little bit differently than the last two months where we're going to try and answer questions live instead of just in the chat as well. So as a reminder, there's also a Slido link open for anonymous questions. And so that's at pos.it/demo-questions. And I'll put that here in the chat in another second here too. But we're going to send you over to another platform for the Q&A and it should automatically send you there from YouTube right now. But thank you again and have a great rest of the day. See you over in the Q&A.
