Nathan Stephens | Scaling Spreadsheets with R | RStudio

This code here: I dusted off that thing that took weeks and weeks to build and recreated it from the ground up in R in about six hours. From six weeks to six hours, literally.

Summary and Q&A

How do you scale spreadsheets with R, using powerful coding tools that make your work reproducible and that communicate results through apps and notebooks?

So hopefully this boundary is a little bit clearer. I know I drew it as a hard line, but I actually think it's a pretty wide spectrum. There's a lot of variation in the point at which you're going to move from Excel to R. But hopefully this presentation has helped clarify why you would use R instead of Excel, even when you're not doing a bunch of random forests. Even when you just need to do some regular old data science exploration.

So the last thing I will say is that in this R space, you know, you're doing data science. You can do data science in Excel, right? You can do it in R. And Excel is a great tool. It's Turing complete. It's got Power Query built into it. You can publish things to Power BI. And you've got Visual Basic in the back end, which is not really my cup of tea, but it is a programming language and it works.

One of the nice things about R is that it makes the code explicit. And I'm going to suggest to you that in data science, the source for all your results really is your code. It's not the report, and it's not the output; it's the process you used to get that output. Think about Excel: you share some output from Excel, say you take some pivot tables from Excel and put them in PowerPoint. That's nice, and that output is really valuable. But the source for it was the process you went through to get that information, and that whole process is where the value is in data science. When you use R or another scripting language, you capture that process as a natural part of the work.

Christina, I see you just asked: what if the executive wants the report for every combination of parameters? Do you have strategies to iterate through them and generate a full PDF report? Yeah, so if your executive wants every combination, it sounds like your executive wants to do data science. You can build tools for that executive to do the data science, and that's what you would do with a Shiny application. You build a Shiny application and you put it out there.

What we actually do is build Tableau dashboards, but then we have to apply the filters for basically every combination. And then we have this C++ program that takes web shots of all of them and puts the report together. It's a really complicated process. That's what I was getting at.

Oh, that's so interesting. I actually took that part out of this presentation, but I've given another version of it that does the web shots and puts them into PowerPoint presentations programmatically, which would let you do something like that. I can send you a link. Tableau is a great tool. Tableau and RStudio products are basically solving the same problem; it's just that R does it with code and Tableau does not. So I'd be surprised if there were things you're doing in Tableau that you couldn't do in R, and then some. As for what you mentioned, programmatically pulling all the combinations, taking web shots, and putting them in a presentation: yes, that's all possible, and I've done it.
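For the "every combination" case, one common R pattern is a parameterized R Markdown report rendered once per row of a parameter grid. The sketch below is illustrative, not the speaker's code: the template, the `region` and `year` parameters, and the file names are hypothetical, and rendering assumes pandoc is available (RStudio bundles it).

```r
library(rmarkdown)

# Hypothetical parameterized template, written to a temp file so the sketch is self-contained
template <- c(
  "---",
  "title: \"Sales report\"",
  "output: html_document",
  "params:",
  "  region: \"North\"",
  "  year: 2020",
  "---",
  "",
  "Report for region `r params$region` in `r params$year`."
)
rmd_path <- file.path(tempdir(), "report.Rmd")
writeLines(template, rmd_path)

# Every combination of the hypothetical parameters: 3 regions x 2 years = 6 reports
combos <- expand.grid(
  region = c("North", "South", "West"),
  year = c(2019, 2020),
  stringsAsFactors = FALSE
)

# Render one report per combination; a fresh environment per call keeps params isolated
render_one <- function(region, year) {
  render(
    rmd_path,
    params = list(region = region, year = year),
    output_file = sprintf("report_%s_%d.html", region, year),
    output_dir = tempdir(),
    quiet = TRUE,
    envir = new.env()
  )
}

# Only attempt rendering when pandoc is installed
if (pandoc_available()) {
  outputs <- mapply(render_one, combos$region, combos$year)
}
```

Switching the template's `output` to `pdf_document` would produce PDFs instead, assuming a LaTeX installation is present.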

Leo asked: could you please show a quick step-by-step of how to achieve that awesome email with the graph and formatted table? Oh, you want to see how the email is created? Yeah, I can do that.

The blastula package is the one you're looking for, and gt is the other one. Not DT: gt, the Grammar of Tables. So this is where I put in the subject line, and this is where I define the plot. Then I compose the email. This might look like comments, but it's all being recognized by R, so it's actually pulling in objects from R and inserting them into the content of the document. You can read all about that on the blastula help site. Then on the R Markdown side, you attach the subject line here. So this is the subject line, this is the body, these are images, and these are attachments. The XLS file gets attached here in the R Markdown file. Makes sense?
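A minimal sketch of composing an email with a plot and a formatted table using blastula and gt. This is not the speaker's demo code: it assumes ggplot2 is installed, and the data (`mtcars`), text, and object names are illustrative.

```r
library(blastula)
library(gt)
library(ggplot2)

# Illustrative plot; any ggplot object can be embedded in the email body
mpg_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Formatted table built with gt, converted to HTML with inlined CSS for email clients
summary_tbl <- gt(head(mtcars[, c("mpg", "cyl", "wt")]))
summary_tbl <- fmt_number(summary_tbl, columns = c(mpg, wt), decimals = 1)
table_html <- as_raw_html(summary_tbl)

# add_ggplot() saves the plot as an embedded image and returns an HTML fragment
plot_html <- add_ggplot(mpg_plot, width = 6, height = 4)

# md() converts the Markdown text, with the raw HTML pieces passed through, into the body
email <- compose_email(
  body = md(c(
    "Hello,",
    "",
    "Here is this week's summary:",
    plot_html,
    table_html
  )),
  footer = md("Sent from R with blastula")
)
```

For the R Markdown route described above, blastula also provides `render_connect_email()` and `attach_connect_email()`, the latter taking `subject` and `attachments` arguments; check the blastula documentation for the exact setup on RStudio Connect.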

I also want to make sure you're aware that when you create a new presentation in R Markdown, one of the options is PowerPoint. So you can actually generate PowerPoint decks from R Markdown. And, going back to the web shots, you can put web shots in them to pick up whatever you're trying to insert into the PowerPoint.
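A minimal sketch of PowerPoint output from R Markdown, assuming pandoc 2.0.5 or later is available (PowerPoint support requires it). The slide content is illustrative; the `powerpoint_presentation` format itself is part of rmarkdown.

```r
library(rmarkdown)

# Illustrative source: the YAML header selects PowerPoint output,
# and each "##" heading starts a new slide
slides <- c(
  "---",
  "title: \"Weekly update\"",
  "output: powerpoint_presentation",
  "---",
  "",
  "## Fuel economy",
  "",
  "- Cars in sample: `r nrow(mtcars)`",
  "- Average mpg: `r round(mean(mtcars$mpg), 1)`",
  "",
  "## Next steps",
  "",
  "- Add web shots of the dashboards here"
)
rmd_path <- file.path(tempdir(), "weekly_update.Rmd")
writeLines(slides, rmd_path)

# Render to .pptx only when a recent enough pandoc is installed
if (pandoc_available("2.0.5")) {
  pptx_path <- render(rmd_path, output_dir = tempdir(), quiet = TRUE, envir = new.env())
}
```

To place a web shot on a slide, one option is the webshot2 package's `webshot()` to save a page as a PNG, then `knitr::include_graphics()` in a chunk; webshot2 needs a local Chrome or Chromium.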

I see a conversation in the Zoom chat right now. Cassandra asked what level of security is available when using RStudio and a Shiny app, because they work in the treasury with sensitive banking information.

So we take security very seriously at RStudio. It became very clear to me early on that we don't have a company without secure software. I'm not saying we've got everything figured out, because it's an ongoing process. But I will tell you that we have onboarded hundreds of customers with a variety of security needs, and we haven't lost any deals specifically because of security.

Yeah, the general question is: can I trust my Shiny applications? Keep in mind that these Shiny applications are normally used for internal purposes; normally you're putting them behind your firewall. So that's usually the standard you're looking for.

I see another anonymous question: Excel may be considered an interactive app for data analysis. How do you help Excel users successfully transition to being Shiny app developers? Well, you're in luck. There are a lot of resources for learning Shiny. If you go to shiny.rstudio.com, that's a great place to get started. The creator of Shiny is Joe Cheng. Shiny is 10 years old now; it's been around for a while. But the cool thing about Shiny is that it was designed to fail nicely. Joe talks about the pit of success: you might make a mistake, but Shiny is going to be nice to you when you make that mistake.

So if you're transitioning to Shiny, be nice to yourself, start simple, and then just follow the patterns and fall into that pit of success. I say that because I'm not the world's best Shiny app developer. My background is in big data, as opposed to front-end application design. But I have learned that if you just go with the patterns, go with the flow, you're going to have a pretty gentle ride with Shiny. And if you do have problems, go to community.rstudio.com. There are a lot of resources on the community site about Shiny.

One interesting thing I will point out about Shiny is that its reactive nature is very similar to the Excel paradigm. In Excel, you change one cell and the change trickles through the whole spreadsheet and can have massive effects. Shiny is the same way: you change one thing and it reactively percolates through the entire application.
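A minimal sketch of the spreadsheet analogy in Shiny: two inputs play the role of cells A1 and B1, and a `reactive()` plays the role of a formula cell `=A1+B1`. The cell names and layout are illustrative, not from the talk.

```r
library(shiny)

# Two "cells" the user edits, like A1 and B1 in a spreadsheet
ui <- fluidPage(
  numericInput("a", "A1", value = 1),
  numericInput("b", "B1", value = 2),
  textOutput("c1")
)

server <- function(input, output, session) {
  # Like a formula cell =A1+B1: recomputed automatically when either input changes
  total <- reactive(input$a + input$b)

  # Anything that depends on total() updates downstream, the way Excel recalculates
  output$c1 <- renderText(paste("C1 =", total()))
}

# Build the app object; launch it with runApp(app) in an interactive session
app <- shinyApp(ui, server)
```

Changing either input invalidates `total()`, which in turn invalidates `output$c1`, so the change percolates through the app without any explicit update code.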

Another question on Slido, from Josh: tools such as GitHub, SQL, Power BI, Tableau, and Excel. Curious about your thoughts on these tools, as they don't seem to be emphasized enough with respect to their use in the workplace.

I'm struggling with that question a little bit, because I deal with customers on a daily basis and it seems like everybody's using those tools. And SQL has been a dominant paradigm for so long. I've worked with NoSQL databases for quite a while, non-structured things like Hadoop or Redis, and there's always some sort of SQL-ness around them. Take Hive: it's not even SQL, but they put HQL on top of it. So I think SQL is one of the most enduring paradigms in technology that I'm aware of. Don't bet against it.

I think one of the problems with SQL in general is that very few people get formal SQL training. I was lucky enough to get it eventually, but first I learned SQL on the job. Nobody in academia talks about SQL. I shouldn't say nobody, but in my experience nobody in academia ever mentioned anything about SQL. Then I went into the job and that's all anybody was doing, so I learned SQL that way. Later I actually took a training class on SQL that covered query planners, clauses you may not be so familiar with, window functions, and different styles of writing SQL. That was very useful.

If I could just clarify, I think you did end up touching on what I meant. I learned a lot about R in school but was barely exposed to any of the other tools. So now I'm backtracking to learn the other tools to match my use of R, but I barely use R in the workplace.

I think R is a niche programming language mostly used by scientists who are seeking the truth. If you go to the R project page today, it still describes R as an environment for statistical computing and graphics. How many companies are sitting around thinking, I need some statistical computing and graphics software around here? That's not what they're thinking. They've got data, they've got operations, they've got BI, they've got these big systems. So it's no surprise that in academia you're using R to do science and learning, while industry is using more enterprise tools.

I think one of the reasons why I wanted to join RStudio was to help bridge that gap. But I think, yeah, Josh, you and I should grab a cup of coffee and talk.

I wrote a post many years ago, when I first joined RStudio, about making R legitimate in your organization, and its tenets were to make a decision and allocate resources. You're looking for a stakeholder, usually someone very senior in your organization, to say: we will consider R to be an analytic standard, and we're going to support it, just like we would SQL or these other tools we're talking about. Then, if you can get resources allocated, whether that's a server, money, or software, or individuals, as in, we're going to train these people on R, that's what you're looking for. That's the natural process most organizations use to recognize standards.

There's an assumption that data science is something people do on their laptops with free software, and I think that greatly undermines the value of data science. Josh pointed out, well, what about Tableau? And what about Power BI? Power BI is a little different because it's Microsoft, and Microsoft is its own thing. But take any of these other tools. If you brought Tableau into your organization, somebody went up there and said, we're going to make Tableau an analytic standard for this organization, and we're going to allocate resources for it. You have to do the same thing with the R programming language.

I think most organizations aren't very data-driven. There's a spectrum, but they want to be, and they know that it's relevant. I think the approach is to take all the emotion out of it and say: we're talking enterprise here, and we have a business to run. R is a competitive advantage. It brings in great people. It allows you to use new technologies. It needs support. You're already using C#, some Python, and Java. Why not use R? Those are arguments you can win.

There's usually some sort of fulcrum, and getting that executive sponsor is key. If there's no executive sponsor, and no one's willing to actually say, yes, this is what we're going to do, then you're back in guerrilla warfare. You're just in there doing stuff, throwing something against the wall and hoping it sticks. And I have to say, if you're in that organization, there are better organizations out there. If you're looking to make an impact, make a difference, and improve your skills, why would you want to be in an organization that doesn't value data science?

And data science is mainstream now, but most organizations are not data-driven at all. For them, it's one extra step that they need to take. And people are lazy; they don't want to take an extra step if they don't have to. So they need to put in the effort before they realize all the benefits.

One of the questions from earlier, Nathan, was: what packages do you recommend for importing Excel files? Wasn't there readxl, and then an xlsx package as well? Yeah, that's a miss in my presentation, because I was showing how to read in flat files. But reading those files is pretty straightforward. Everyone in the chat is confirming that it's readxl. Okay, so if I come in here to Environment, Import Dataset, From Excel: yes, readxl is the one that's used by default.
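RStudio's Import Dataset > From Excel dialog generates readxl code along these lines. This minimal sketch uses the example workbook that ships with readxl so it runs as-is; in practice you would swap in the path to your own file.

```r
library(readxl)

# readxl ships example workbooks; replace this with the path to your own .xlsx file
path <- readxl_example("datasets.xlsx")

# List the sheets, then read one of them into a tibble
sheets <- excel_sheets(path)
cars_df <- read_excel(path, sheet = "mtcars")

# Optional arguments handle common spreadsheet layouts, e.g. reading only a cell range
first_rows <- read_excel(path, sheet = "iris", range = "A1:C6")
```

Other arguments such as `skip` (rows to skip before the header) and `col_types` cover spreadsheets with title rows or mixed-type columns.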

Another question from earlier was: can you version control Excel? Yeah, the one I'm familiar with is a paid solution, though; you actually have to subscribe to it. It's an enterprise solution, which makes sense, because Excel is an enterprise tool that you have to pay for. That's one of the nice things about R: it's free, so if you want to use it, you can get started easily.

But a few months ago, they released a new function that made Excel Turing complete. So Excel is here to stay. And it's not just Excel, either. There's also Google Sheets, and there's also OpenOffice. So you actually have three common spreadsheet platforms to choose from.

My experience: I've been using R and R Markdown for three years in enterprise business. I first started by introducing people to R, but that step was a little too big. Once I introduced them to R Markdown, they could see the link with Excel very easily. They love R Markdown, and it connects with them, and with Excel, because you can work in the R chunks and generate nice reports easily, step by step.

How I use it now in the enterprise, in oil and gas and in marketing: these businesses have ERP systems, SAP and Microsoft Dynamics, and reporting from those systems is very cumbersome. With R Markdown, they extract the data from the back end, put it in a small database, and access the data with R Markdown. Then it's easy peasy for them. They can generate fantastic reports, and they love it.
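The extract, load, and report workflow described above can be sketched with DBI and an in-memory SQLite database (via RSQLite). The `orders` table and its values are a made-up stand-in for an ERP extract; a real extract would come from an export or connector for SAP or Dynamics.

```r
library(DBI)

# Small local database standing in for the reporting store (in-memory SQLite)
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical extract from an ERP back end
orders <- data.frame(
  region = c("North", "North", "South", "South", "South"),
  amount = c(120, 80, 200, 50, 25)
)
dbWriteTable(con, "orders", orders)

# The kind of query an R Markdown chunk would run to feed a report table or chart
totals <- dbGetQuery(con, "
  SELECT region, SUM(amount) AS total, COUNT(*) AS n_orders
  FROM orders
  GROUP BY region
  ORDER BY region
")

dbDisconnect(con)
```

Inside an R Markdown document, the same query could also live in a `sql` chunk whose `connection` chunk option points at `con`.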

Yeah, I appreciate that feedback. I showed the visual editor today in the demo. It broke at one point, which was unfortunate; it's new. I'd be curious to hear your feedback on introducing these same people to the visual editor. That actually made things even better for them.

They don't like R, but they like R Markdown. They love R Markdown. Well, I mean, R Markdown has R code chunks inside it, so you can't really use R Markdown without R, right? But you're saying they don't like the R script. They don't like the R script; they like R Markdown. It's easier to understand and to connect with Excel, and everything works.

I like it because I've had too many experiences where I wrote R code, came back six months later, and couldn't understand what the heck I was thinking. I hardly recognized that it was me. R Markdown at least gives you an opportunity to say: this is what I'm thinking, this is why I'm doing this, this is what you should expect here. And that's really valuable to me personally, as a coder.

What I also like is the different output formats you can easily generate with R Markdown. You can make an HTML page, a dashboard, a PDF, whatever you wish.
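A minimal sketch of rendering one R Markdown source to several formats with `rmarkdown::render()`. The document content is illustrative; rendering assumes pandoc is available, PDF would additionally need LaTeX, and a dashboard would need the flexdashboard package's `flex_dashboard` format.

```r
library(rmarkdown)

# One illustrative source document...
doc <- c(
  "---",
  "title: \"Monthly summary\"",
  "---",
  "",
  "Average fuel economy: `r round(mean(mtcars$mpg), 1)` mpg."
)
rmd_path <- file.path(tempdir(), "summary.Rmd")
writeLines(doc, rmd_path)

# ...rendered to several formats; each call returns the path of the output file
formats <- c("html_document", "word_document")
if (pandoc_available()) {
  outputs <- vapply(
    formats,
    function(fmt) render(rmd_path, output_format = fmt, output_dir = tempdir(),
                         quiet = TRUE, envir = new.env()),
    character(1)
  )
}
```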