Resources

RStudio Connect | Cut down on the grunt work. Deliver insights more effectively with RStudio

video
Dec 9, 2021
1:02:42

image: thumbnail.jpg

Transcript#

This transcript was generated automatically and may contain errors.

Hi, everybody. Thank you so much for joining us for today's RStudio Data Science Live event. My name is Tom Mock. I'm going to be your host for today. I'm our customer enablement lead at RStudio. I'm going to be going through a lot of different content specifically related to RStudio Connect.

We're also running kind of a fun experiment in streaming to many different locations. So a lot of the time when we've been talking to our customers or people interested in learning about open source data science, it might be coming from YouTube or LinkedIn or Twitter. So we're trying to meet everyone where they want to be in streaming to all those different platforms. So we're hoping everything goes really well today. But if you do have any hiccups, please bear with us.

For today, this will be recorded so you can always watch the recording afterwards on YouTube. And the content will be available in terms of the slides as well as the code I show here on the screen. So the slides are going to be at colorado.rstudio.com rsc.automate. And we can actually send that out through the chat. And then we'll also have the code examples I'm using available as well.

So again, my name is Tom Mock. We are trying out this new Data Science Live, which is kind of a webinar-style event, but going to many different locations. Before I get started, I also want to give a quick shout out to my colleague Alex Gold, who's one of our solutions engineering managers here at RStudio. He put together this great bike share example that we're going to be walking through in depth along with a few other pieces of code.

And then also my friend Katie Masiello, who's a customer success manager at RStudio. And she put together some great pointblank demo content that we're going to be showing today as well, as part of the data validation, ETL scripts, and other content.

Overview of RStudio Connect

So as far as the core idea of what we're going to be talking about today, we're going to be looking at RStudio Connect, which is kind of our premier platform for building and sharing, or I guess in this case, sharing data science products and building kind of insights among your organization. The core idea is that RStudio Connect makes it easy to share your data products that you build in both R and in Python. For your data scientists, they get to stay and use their preferred open source data science tooling. So again, R and Python, but they're also able to get these things off of their laptop and share these insights to impact decision making across the organization.

So they might be creating things like APIs with Plumber, or sharing datasets as a pin, or versioning models with pins, or deploying Shiny applications for interactive web applications. Or lastly, what we'll be talking a lot about today, sharing R Markdown reports, or automating and scheduling R scripts through R Markdown documents.

Now for our Python colleagues, in terms of the Python data scientists, they also have a rich workflow that they can use to publish things to Connect. Specifically, we'll be talking a little bit about Jupyter today, which is a great notebook that actually supports Python and R inside it, just like R Markdown can support R and Python. And there's other data products you can publish to Connect like Streamlit, Dash, and Bokeh for interactive web apps, or Flask and FastAPI for publishing RESTful APIs in Python.

Now, as far as getting these things out to your stakeholders, there's what we call kind of the easy button. You can just kind of go into a data product, click publish, and then send it out to your organization. Your decision makers, whether they're business stakeholders, executives, or other colleagues can then access the things you're creating via a web portal with authentication and with kind of automatic scaling within your server component.

Notebooks and automation

So the idea here is you can automate the execution of arbitrary, basically any R and Python code you want with notebooks. So you can have R Markdown for native R code or Jupyter for native Python code in the RStudio Connect ecosystem.

Now, when I talk about automation in a little bit or scheduling, what I'm really saying here is that Connect gives you this ability to schedule the re-execution of that code. So not only can you create the initial document, publish that, or share it wherever you want, but you can also have it re-execute on a schedule. So I have lots of scripts that I run either daily or weekly that clean up some data, pull it together, and then save the data out, basically ETL jobs.

What Connect gives me the ability to do is to specify things like a specific time zone. I'm in, you know, Texas, so I'm in GMT minus six, and I want things to execute at a specific time. So let's say 10:37 a.m. on December 9th, which is about 30 minutes from now, give or take a few minutes. And I want this to run every single day, and I want it to republish the output, basically overwrite itself when it's published.

I also have the ability to do things like send an email from Connect, and this would actually send out an email to myself or to colleagues. Now, when I say arbitrary R code or arbitrary Python code up here, that really means essentially any R and Python code you can write, you can embed into these documents. So I could have things like HTML widgets or leaflet plots or interactive kind of basic JavaScript inside my R Markdown document.

Now, you might ask, why notebooks as opposed to maybe a .R or a .py file? Well, notebooks are very powerful in that you can achieve the primary goal, which might be actually executing your Python or R code, plus very useful side effects. So you could also have things like writing out to disk or to a separate output. So you might be generating an RDS file or CSV file, or a pickle file in Python, or graphics or tables or whatever else as your primary goal. But your useful side effect is that you're also creating a nice document or a nice report that comes along with it.

And that's what the power of a notebook gives you. So you get this rich self-contained output and report. Basically, the notebook is the data product in addition to whatever it's actually doing. It's actually self-documenting. It's showing you when it occurred. It's telling you all these things about like the code you executed and the context it was run in.

So importantly, R Markdown and Jupyter almost always show the code that's used to execute along with all the embedded outputs in line. So from a reproducibility standpoint, you kind of know what's going on and what the code was that was run when this was scheduled. Now, you can hide some of that code execution if you're reporting to business users.

And then lastly, R Markdown and things scheduled on RStudio Connect can also generate additional side effects. Things like sending an email. So not only can it execute the notebook, execute the R code or the Python code, it's also sending out this email or kind of many emails to all your colleagues, updating them only when something happens. So kind of meeting where they are in their inbox rather than asking them to go visit, say, a dashboard or something else.

Why RStudio Connect

But why RStudio Connect? Obviously, part of the beauty of open source data science is that you can build your own tooling. RStudio Connect is a great solution in that you can use multiple languages there. So R and Python natively, along with embedded SQL or Spark through sparklyr or PySpark. You have environment isolation, in terms of you're limiting the packages and libraries to specific versions, so capturing that environment. And when you publish it, that environment is kind of trapped and isolated with it.

You have your enterprise-grade authentication, so you can make IT happy and have all the things you're sharing be secure. From a data science perspective, you also have things like push-button deployment for easily deploying it out, as well as things that are more robust like Git or continuous integration or continuous deployment from, say, Azure DevOps or something.

Live demo: Connect platform

So I've hopped into Connect. You might have actually seen for that brief moment that I'm showing my slides here, and that I have this ability to schedule and distribute data science products. This is actually a data science product running on RStudio Connect, so my slides are on Connect. But more typically, rather than slides, I might actually create something like a basic report that's being scheduled.

So here I've got an older script that I've been running for a while, just looking at the weather forecast in Boston. So RStudio's headquarters are in Boston, so let's take a look at that. So this is a very basic report: it's just got a title, a little bit of explanation about a data set, and a very basic plot.

The actual benefit is that this is taking that data and writing it out to what we call a pin, an external data set that's being updated. So this could be running on RStudio Connect, and you could actually be saving data to Connect, or you could be saving data out to Amazon S3, or to Azure Data Lake, or to SharePoint, or wherever you really want to save it.

You can see here on the right inside Connect that I already have the document scheduled, so it's going to be running in my time zone in Chicago. I started it just before Thanksgiving in preparation for today's kind of presentation, and I have it run every morning at 6 a.m. Just in case I ever had to go to the Boston office, I want to know before I fly out, what's the weather like? And it's going to run every single day and publish an output after it's generated.

Now, this is overwriting itself, so you can see that for today, it was run as of December 9th, but with Connect, I can also go up here and look at the history, so I can see every run that occurred and look at older timeframes. So I have these snapshots of what the report looked like, even though the data is being updated to a separate environment.

Another benefit of Connect, more from the admin perspective, is if I go in here to the admin panel and look at scheduled content, I can see all the different things that have been scheduled for my entire team. So you can see there's a lot of stuff that we've scheduled here for our demo server, and some of these run very frequently. So some of them are occurring down to the minute, and some of them are occurring maybe once a day or a few times a day.

What to automate

So there's a spectrum of needs here in terms of things you might publish or automate. So you might have data updates like I just showed, like you have this ETL script where it's taking data from an API, saving it, and then appending it to a database or to some flat file where it's storing it for later. You might have a report that's running. So you just want to report on that data that's being updated.

Another common use case we see is automated model training, updates, or batch scoring. So here you're either training a new model, updating an existing model, or sending that model out to actually do some scoring.

And then lastly, something I'll talk about in a little bit is automated email delivery with warnings or updates or just nice-to-knows. So not only do you have the actual report running and Connect handling the environments and the creation of those reports, but sometimes you want conditional reports. Basically, you want to know only when something fails or only when something goes above or below a certain threshold. So by combining emails along with your reporting or your automation of data science, you can basically get your warning only when it's needed. So you have a high signal-to-noise ratio and you're not just getting updates all the time, but only when something occurs.

Pins for data sharing

So here, as you might see, this is actually a .R file. So I'm loading the pins library, which allows me to take datasets and push them out to a remote location. And I'm loading dplyr just so I have the pipe, but maybe I want to do some data manipulation as well. So number one, I'm actually going to form a connection to what we call a board, where a board is a collection of pins, or a collection of datasets that you're storing. There's native connectors to things like AWS S3 or Azure Data Lake Storage, as well as Google Cloud, Kaggle, and a few other locations.

So I say: this board that I just connected to, take a dataset, in this case mtcars, a little toy dataset, and write it out as a pin to this remote location. So if I execute all this code, it's going to connect to Colorado, which is our demo server, take that dataset, and write it out as mtcars.

Now, you might say, like, nothing exciting happened. I just saved the dataset out somewhere. But the benefit being, is that it's now hosted on RStudio Connect. So if I go to content for our RStudio Connect demo server, I can actually show you that dataset we just created. So now I have this dataset. And of course, I can download the dataset if I want to by clicking on it. It gives me a nice little preview. But more importantly, I have this code, meaning I can read it from anywhere. So I can read this from a scheduled document. I can read it from a different computer or from a different server. I can give this to someone else and they can use it.

So the pins package makes it easy to publish things like data, models, and other objects and make it easy to share across projects and with your colleagues. So what I showed you was using Connect as a board and writing a data frame out to it. So we first formed this connection. I'm using board_rstudio_connect(), but it could very easily be a different cloud provider or somewhere else besides RStudio Connect.

And then I take that board and I write it out somewhere else. So here I write out our tidy sales data, save it as sales_summary, and it's going to be of type RDS. So I can basically specify the file type. Maybe I want a CSV so it's interoperable between Python and R, or I want to upload a Parquet file so it's more efficiently stored.

And then the real benefit here is that in a downstream automated report, I can actually connect back to that board and read it in. So you might have heard of people having friction when they're scheduling something or trying to automate a task, and they're not able to access the data. So having pins available to access very specific files, update them in place, or overwrite them is a very powerful workflow.
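As a rough sketch, that write-then-read workflow with the pins 1.0 API might look like this (the server, account name, and sales data are hypothetical; on Connect, credentials typically come from the CONNECT_SERVER and CONNECT_API_KEY environment variables):

```r
library(pins)
library(dplyr)

# Connect to a board hosted on RStudio Connect
board <- board_rstudio_connect()

# Hypothetical tidy sales data, summarized before pinning
sales_summary <- mtcars %>%
  group_by(cyl) %>%
  summarize(avg_mpg = mean(mpg))

# Write it out as an RDS pin; type = "csv" or "parquet" also work
pin_write(board, sales_summary, name = "sales_summary", type = "rds")

# In a downstream scheduled report, read the pin back by its full name
sales <- pin_read(board, "tom.mock/sales_summary")
```

The same code works against other boards (for example board_s3() or board_azure()) by swapping only the board constructor.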

The other benefit is that while pins is really used a lot as a way to share data across a team, you can upload basically any arbitrary R object. So that means you could save model objects. You could train a model, save it as an RDS, and save that as a pin. You can then version the pin, and you can access it downstream and look at the very different uses of that model, whether it's loading it into an API or loading it into a Shiny application and then using it there.

Data sources: databases and web APIs

We're going to talk a little bit more about rvest and the httr package for accessing web data. rvest specifically allows us to access websites or data embedded in websites and kind of download or scrape it. And httr actually allows us to query APIs. So you might have a RESTful API that you're sending some credentials to and accessing data from. We also have dbplyr and sparklyr for accessing, writing to, or reading from SQL databases or Spark clusters.

Rather than working with a bunch of flat files, you might be working with a SQL database, and you can write either native SQL code in R notebooks or in Python notebooks, or you can use something like dbplyr, which is a database backend for dplyr, to actually translate your R code into SQL on the backend. So this is very powerful: you use the exact same syntax for in-memory operations with the dplyr you know and love, but you can also query SQL databases with that same dplyr code.

So first, just like we showed with pins, you need to form a connection. Basically, you need to give it your credentials: who am I, and what am I going to connect to? So in this example, we're connecting to a MySQL database, and then I'm making a connection to a very specific table within that database. With that connection and the tbl() command, I can then run my dplyr queries and it'll actually execute them in the database.
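A minimal sketch of that connection pattern (the host, database, and table names here are all hypothetical, and in practice the credentials should come from environment variables or a secrets manager, never hard-coded):

```r
library(DBI)
library(dplyr)

# Hypothetical MySQL connection; adjust driver and details for your database
con <- dbConnect(
  RMariaDB::MariaDB(),
  host     = "db.example.com",
  dbname   = "sales",
  user     = Sys.getenv("DB_USER"),
  password = Sys.getenv("DB_PASSWORD")
)

# Point at one specific table; nothing is pulled into memory yet
daily_sales <- tbl(con, "daily_sales")

# dplyr verbs are translated to SQL and executed inside the database
daily_sales %>%
  filter(region == "northeast") %>%
  count(product)
```

Only collect() (or printing a preview) actually brings rows back into R, which is what makes this pattern efficient for large tables.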

ETL automation

So you're going to extract data from a variety of sources: a database, your customer data, marketing data, business data, machine data, an API, whatever. You're just grabbing data from these heterogeneous data sources. Then, while the data is kind of streaming in, you're transforming it into a standardized, useful format so it's ready for your downstream analysis or your downstream applications and systems. Then you're going to load the data, the L in ETL, into the long-term store, most typically a database, warehouse, or data lake as a flat file like Parquet, for example.

As an alternative, you might actually do something like extract, load, transform, or ELT. Here we're extracting the raw data itself and then just loading it into a storage area, basically saving it as it is. We're not transforming it ahead of time. We're just saving the raw files or loading them in as they are. Downstream applications or other processes are then going to transform or otherwise process the raw or semi-structured data, as opposed to it being ready for analysis by your data analysts.

Having the ability to write out to files like Parquet is really useful here or to JSON, for example, as you're going to be getting potentially data from APIs that are stored as JSON or you want to have the ability to read files. R and Python both have rich support for doing this within the cloud environments.
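A small sketch of that pattern, pulling JSON from a hypothetical API endpoint and landing it untransformed as Parquet:

```r
library(httr)
library(jsonlite)
library(arrow)

# Hypothetical endpoint that returns JSON records
resp <- GET("https://api.example.com/v1/records")
records <- fromJSON(content(resp, as = "text"), flatten = TRUE)

# ELT style: land the raw data as Parquet for downstream processing
write_parquet(as.data.frame(records), "records_raw.parquet")
```

On Connect, a script like this would typically be wrapped in a scheduled R Markdown document, with the output written to a pin or cloud storage rather than the local disk.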

Data validation with Point Blank

So as you're pulling data in, obviously you want to check that the data is as you expected, and this is what pointblank in R is for: not only bringing the data in and doing your extraction and transformation, but validating the data as you're loading it.

So if you're scheduling dataset updates or automated data pipelines, you're bringing in data from all over the place: SQL databases, potentially flat files or individual files like CSVs, or web APIs. These get read into R or Python notebooks that are being executed on Connect. Once the report is executed, then you have a filled-out report. Basically, you can fill it with all this information: what was the data you brought in? What was the date? Was it successful? What did the data look like?

So pointblank, again, lets me do data validation on either in-memory or in-database data. So as part of your ETL process, you might pull in some data. As we're seeing here, this is bird strike data from a different API that Katie Masiello put together.

So here for the API, it's actually going and finding an API, querying it, saying here's the timeframe it's being queried at, throwing in our API key, which is being added here, and then getting some type of response. So we're looking at the response, exploring how long it took to run, and then doing a data validation here. Because it's done in R Markdown, you have this ability to have basically green and red indicators for whether it was successful or failed.

So you've used pointblank to create this agent. It's going to look over all the different parts of the table, interrogate it, and say: did it pass all the different validations? If any failed, it would give us a warning about where they failed.

Now, as far as whether the data makes sense, we might have more robust checks, like was the data within a range, or greater than or equal to some value? So pointblank also gives you the ability to query specific columns. So here we're checking that these values are between 0 and 500. We should never have negative values here, and 500 is the theoretical limit that we've ever seen.
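As a sketch of that kind of check, here's a pointblank agent using the package's built-in small_table dataset so it's self-contained (the specific columns and bounds are just illustrative, not the demo's actual validation rules):

```r
library(pointblank)

agent <- create_agent(tbl = small_table) %>%
  # values in column d should fall between 0 and 500
  col_vals_between(vars(d), left = 0, right = 500) %>%
  # dates should never be missing
  col_vals_not_null(vars(date)) %>%
  interrogate()

agent  # printing the agent renders the green/red validation report
```

Any step that fails shows up flagged in the rendered report, which is exactly what surfaces as those red indicators when the notebook runs on a schedule.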

The bike share end-to-end example

So here, we're actually doing something similar. We're bringing data in from an API, scheduling it to be cleaned, updated, and saved out to both a database and to a PIN, and then loading that into an API downstream that's running in production. And all of this is being executed inside of RStudio Connect.

So this is where we're going to spend most of our time remaining, is kind of diving into a little bit of this example and how you can kind of schedule these out and what they look like running on RStudio Connect.

So again, bringing data in, data import, cleaning, and training some models, saving it out to a database, and then serving it out as an API and a Shiny application that are being fed the downstream data. Because it's on Connect, I have this kind of landing page I've created, or that we've created, with Connect widgets, which basically gives me the ability to write about this project.

You can imagine that with a data science project of this scale, you have a bunch of different scheduled jobs. Not only do you have the ETL tasks, things we've covered a little bit before, but you also have modeling. So maybe you're scoring the data and doing batch scoring and writing out the values to a database or training the model itself, or you want to send out a specific email if a criteria is met or not met.

So here, again, you have a more bare-bones report. It's basically just saying, hey, here's the analysis. It ran about a week ago and was executed on a schedule. It's using the odbc package to form a connection to a database, and then we clean up the data and write it back out.

So we're grabbing data from the database, a pin, and an API, and then we're writing out the data with dbplyr. So the query that we're actually running is a group by, summarize, and inner join, and then we're writing out the raw SQL code and showing that query. So I don't know about you, but rather than writing this SQL code out by hand, I'd much rather write it in dplyr and get the rendered SQL code through dbplyr.
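A self-contained sketch of that translation, using dbplyr's in-memory SQLite helper in place of the real database (the table and column names are made up for illustration):

```r
library(dplyr)
library(dbplyr)

# memdb_frame() copies a data frame into an in-memory SQLite database,
# so the same code runs without the production connection
rides    <- memdb_frame(station_id = c(1, 1, 2), n_bikes = c(10, 12, 7))
stations <- memdb_frame(station_id = c(1, 2), name = c("Fenway", "Back Bay"))

query <- rides %>%
  group_by(station_id) %>%
  summarize(avg_bikes = mean(n_bikes, na.rm = TRUE)) %>%
  inner_join(stations, by = "station_id")

show_query(query)  # prints the SQL that dbplyr generated
collect(query)     # actually executes it and pulls the results into R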

Automated reporting for business users

So for this automated reporting, you can, again, schedule R Markdown and Jupyter, which can be used as a report. And these reports have many different purposes. They might be intended just for the data scientist or their team, meaning that they're going to show all the code. The metadata there is so that the data scientists and the data analysts working with the data know what's going on.

Alternatively, there might be reports for business users or executives, people who just want to consume the final output or they want to get some type of like insight or they want to get a report delivered to them as opposed to understanding what the code is doing. They're more interested in what the actual output is.

So for business users, these reports are going out to potentially like non-technical colleagues who are decision makers or business analysts or executives. And here, you're more worried about packaging up the data or your story, your visualization, your tables, but you're not as worried about showing the code, although some people may want to see the code.

So how do you make this report that fits your company style or one that's attractive and useful for them? We showed those reports earlier. They were pretty bare bones and not that many people would be excited about receiving just a black and white report with a lot of code being printed out. So we can do, and the examples I'll show here in one second, are using different themes or using different customization of the appearance.

So R Markdown theming, say, with the bslib package. This allows you to customize the entire appearance of R Markdown documents as well as Shiny applications or other kinds of downstream outputs.

So I'm taking the R Markdown document, just changing the HTML document output, and applying a bslib theme to it. So if I knit this document, this R Markdown doc, it's going to look a little bit different. Still fairly bare-bones, but you can see that it's got this kind of gray background and the fonts have been changed, but the graphics haven't been changed, in terms of we still have this white background on the graphic.

So we're missing a component here of translating the R Markdown theming into the graphics themselves. So there's one more step we can do here. We've themed essentially the R Markdown document, but we need to translate that theming into the plot, into Plotly or the table or ggplot.

So there's this chunk here with the thematic package, which allows me to pass all of this theming that I've applied to the R Markdown automatically into ggplot and Plotly graphics. So I've set this chunk's eval option to TRUE. I can knit this, and we'll take a look at it again now.
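Putting those two pieces together, the YAML theme plus a thematic setup chunk might look roughly like this (the Bootswatch theme and font choices are just examples, not the demo's exact settings):

```r
# In the YAML header of the .Rmd:
#
# output:
#   html_document:
#     theme:
#       bootswatch: minty
#       base_font:
#         google: "Open Sans"

# In a setup chunk, carry that document theme into ggplot2 and plotly output
thematic::thematic_rmd()
```

With thematic_rmd() enabled, plot backgrounds, text, and default color scales pick up the document's theme automatically, which is what removes that contrasting white plot background.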

So on this report, we can see that the graphic completely blends into the document, in terms of there's not a white background that's making it contrast. We've actually applied the theming of the overall document into the graphic. It's applied this alternative color scale, like a blue-green-yellow, that's a bit more visually appealing than the default colors that ggplot uses. And this works for both the interactive Plotly graphics and the ggplots that we see here.

So while these might seem like small changes that you're doing, they're actually very powerful for connecting with your business users and helping them actually want to see the reports or kind of working with them and the theming that you're working on.

You may not even want to worry about doing a lot of customization. You could use something like the distill package. So this is like scientific and technical writing native to the web. It's got amazing defaults, and it looks really, really nice. So this gives you the ability to very quickly take the exact same code you're running, but make it look better.

And now we have a nice-looking report with very good defaults. So this instantaneously, even though it's black and white, to me, it's much more visually appealing. I've got kind of this floating table of contents where I can navigate back and forth between the different components. So with essentially no work on my part, just by changing one chunk of the YAML header, I can make a nicer-looking report, or if I wanted to go very deep, I could go full customization with Bootstrap Lib and make these reports look even better.
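For comparison, switching to distill really is just a YAML change (the title here is hypothetical):

```yaml
---
title: "Boston Weather Forecast"
output:
  distill::distill_article:
    toc: true
    toc_float: true
---
```

Everything below the header, all the chunks and prose, stays exactly the same; the toc options give you that floating table of contents for free.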

Sending emails with Blastula

Now, the last portion we have in the last few minutes for today is talking about emails and sending out emails from RStudio Connect with the blastula package. So blastula makes it easy to produce and send HTML emails from R. This is very powerful in being able to combine things like ggplots or tables or data and embed them into emails that you're sending to your colleagues.

So emails can obviously be sent through SMTP, so you can do something like send them via Gmail, but more commonly we see a lot of customers use RStudio Connect to send these emails through the built-in mail server. So this uses your actually existing company email and uses that type of authentication as opposed to being limited to Gmail or something external to your company.

It's really easy to kind of render these emails. Basically, you call render_connect_email() with a specific body document, and then attach_connect_email() to render the email and attach it to the document's output. So it's not only going to show up inline, in terms of your actual email will be shown when they open their inbox, but you can attach different things to it, whether that's a copy of the HTML or a PDF report or an Excel file, whatever you want to attach.
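That render-and-attach step, sketched with blastula's Connect helpers (the file name and subject line are hypothetical; this goes inside the scheduled report itself):

```r
library(blastula)

# Render a separate .Rmd that defines the email body
email <- render_connect_email(input = "email-body.Rmd")

# Attach the email to this report's output on Connect, including the
# rendered report itself as an attachment
attach_connect_email(
  email,
  subject = "Daily sales summary",
  attach_output = TRUE
)
```

When Connect re-executes the report on its schedule, it picks up the attached email and sends it to whichever viewers and collaborators are subscribed.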

So part of what Connect is doing is letting you bring in things like your authentication groups so you can send to your entire executive team or the entire sales team or the entire data science team and send it out to all of them.

So here we have an R Markdown report. This R Markdown report is not special in any sense. We can knit it and take a look at the repository. The main thing we're looking at here is this is what the actual R Markdown document looks like. So you could schedule this to be executed and it would generate this, which is intended for the data scientist.

Here at the bottom, and I'll zoom in a little bit, it has some information saying we can actually attach the email. That was the render_connect_email() and attach_connect_email() we were doing. It kind of jumped the gun there, but it actually sent it out, and this is what the email would look like inside your actual email client. So this would actually be the body of the email. You not only have text and headers, but you can embed ggplot graphics, you can print data, or even embed specific tables.

So that was the preview. Now in terms of if I were inside Connect, like let's actually go to an email one. So I'm going to go back into this and talk a little bit about the batch sending of emails because I know that was a question that came up.

Because Connect has a connection to an email server that's been installed alongside Connect, I could email the report and say, just send me a copy of this report. I can basically say do it now. But more often what I would do is like schedule the document and have it send an email out to people. So every time it's run, it sends the email out.

As far as who it's being sent to, that is up to you. I'm the only person looking at this document right now, but maybe I want to add my colleague Kelly or I want to add the entire solutions team at RStudio to receive this. So now all these people being brought in from our authentication protocol will be receiving the email if I want them to.

So I can go back to schedule. I can say send email after update. And I can send it to all the viewers and all the collaborators. Basically send it to everybody who is interested and have all this batch email being sent out.

Conditional email sending

Now that might be useful in terms of sending these emails, but what if that email gets too noisy running every single day? Maybe I don't want to see it every day. And that's also where Connect can help you or Blastula can help you. So this idea of conditional execution is basically, yes, every time a doc is rendered, it can be useful to send out an email. But sometimes you want that higher signal to noise ratio basically saying only send an email if a criteria is met or not. So maybe data quality is below expectation, or your model is predicting a value outside a specific range, or your data set is too small. It should be thousands of rows, and it's 20 rows. Basically whatever logical criteria you want to define, you can build that logic into your R Markdown doc, schedule it, and then send out emails conditionally based on that.

So here are a few code examples. Here's one that basically says: if predictions are outside a range, or greater than a threshold value, then send me this email saying the model is drifting or the model values are too high, generate this other report, and email it to me. Otherwise, if the values are below that threshold, it just doesn't send the email. It still executes the code and still renders your report, but it doesn't warn anyone, because there's nothing to warn about.
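The pattern above can be sketched with blastula inside a scheduled R Markdown chunk on Connect. This is a minimal sketch, not the exact code from the demo: the threshold, `model`, `new_data`, and the `alert-email.Rmd` template are all hypothetical placeholders.

```r
# Conditional email logic inside a scheduled R Markdown doc on Connect.
# `model`, `new_data`, the 100 threshold, and "alert-email.Rmd" are
# illustrative placeholders, not from the actual demo.
library(blastula)

predicted <- predict(model, new_data)

if (max(predicted) > 100) {
  # Predictions out of range: build the alert email from a template
  # and attach it so Connect sends it after this render
  email <- render_connect_email(input = "alert-email.Rmd")
  attach_connect_email(
    email,
    subject = "Model predictions outside expected range"
  )
} else {
  # Report still renders and publishes; Connect just skips the
  # scheduled email for this run
  suppress_scheduled_email()
}
```

Either way the document renders on schedule; the branch only controls whether the email goes out.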

Python and Jupyter on Connect

I've shown a lot of R Markdown, but you can do very similar things with Jupyter. I don't want to leave those folks out in the cold: you can do similar things natively in Python. Here we have some stock examples where we use Matplotlib and pandas to pull data into these documents, generate some graphics, and save some data out.

So this native Python code in a Jupyter notebook can also be scheduled; it will run every weekday and then publish the output. You have the same ability to look at older versions, and the same access controls and scheduling for those documents.
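The kind of notebook cell Connect would run on that schedule looks roughly like this. It's a sketch with made-up data and file names; in practice the DataFrame would come from your real source (a database, a CSV, a pin).

```python
# Sketch of a scheduled notebook cell: pull data, draw a chart, save both
# artifacts. The data and file names here are illustrative stand-ins.
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render without a display, as on a server
import matplotlib.pyplot as plt

# Stand-in for reading from a shared source
df = pd.DataFrame({"day": range(1, 6), "rides": [120, 135, 150, 160, 180]})

fig, ax = plt.subplots()
ax.plot(df["day"], df["rides"])
ax.set_xlabel("day")
ax.set_ylabel("rides")
fig.savefig("rides.png")             # graphic published with the notebook
df.to_csv("rides.csv", index=False)  # data saved out for downstream use
```

Each scheduled run regenerates the graphic and the CSV alongside the rendered notebook.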

A second one might be model training. Rather than just relying on pandas, you could also bring in XGBoost or scikit-learn to train a model natively in Python: do the train/test split, train the model, and save it out. This can also be run on RStudio Connect as a Jupyter notebook, with the same scheduling options.
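That train/split/save workflow can be sketched with scikit-learn like this. The synthetic data, model choice, and output file name are assumptions for illustration, not the demo's actual code.

```python
# Sketch of model training in a scheduled Jupyter notebook, assuming
# scikit-learn is available. Data and hyperparameters are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
import joblib

# Stand-in data; in practice this would come from your real source
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = GradientBoostingClassifier(random_state=42)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)  # held-out accuracy

# Save the trained model so downstream content can load it
joblib.dump(model, "model.joblib")
```

Scheduling this notebook means the model retrains on fresh data each run, and the saved artifact is what downstream apps or reports pick up.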

Wrap-up and resources

So we're right here at the end. I've linked to all the different content here. Again, the slides are available, and I'll put those into the chat; that's the link on colorado.rstudio.com, and the slides have all the sublinks out to the code. There's information about blastula emails and custom emails, pins, and making beautiful reports with distill, thematic, and bslib. You can go a bit deeper into the bike share example, that piecemeal, in-depth example of an end-to-end data science solution.

And then here on the right, we have Connect scheduling, email, and updating, plus two other webinars that might interest you. These already occurred, but there's a Beyond Dashboard Fatigue webinar with additional content, again open source code you can adapt and use in your own organization. And Rethinking Reporting with Automation is another great webinar, from outside RStudio, that really focused on the business stakeholder.

With that, thank you so much for hanging around. That was a full hour, so it's always good that you stuck with us for that time. Thank you for being part of this experiment in doing more live webinar content, and hopefully you were able to join us from different locations, whether YouTube, LinkedIn, Twitter, or somewhere else.

So this page has all the source code. It links to the YouTube video, which is the primary long-term storage, and the slides I demoed today. And you can see all the demo code, which you can download, adapt, modify, or play around with. You could also clone it and edit it; it's released under a CC BY 2.0 license.

Oh, thanks so much, y'all. It looks like Ahmad gave me some help. Again, because I'm hosting this on Connect, I'm going to flip this over to "anyone can access it, no login required." So go ahead and take a look at it. Thank you, Ahmad, for the shout-out; it should be public and accessible for everyone now. Have a wonderful day, and thanks for joining us on another data science livestream.

So I'll say goodbye from here. Thanks again. And if you have other suggestions, feel free to reach out to me on Twitter at @thomas_mock or on LinkedIn. I'd love to hear from y'all in the community: what's the next thing you want to see? What type of content would be useful or helpful for your learning and your education? So thanks again, have a great day, stay safe, and happy holidays.