Brad Lindblad | Professional Financial Reports with {rmarkdown} | Posit
videoimage: thumbnail.jpg
Transcript#
This transcript was generated automatically and may contain errors.
Well, thanks, Rachel. I'm excited to be here. I've got 20 minutes to talk, so some of this might be a little bit rushed. So I just want everybody to know they can reach out to me. There's my contact info on the slide, and it'll be at the end, too, if they want me to elucidate on any of the points that I'm making. So the title of this talk is Professional Financial Reports with rmarkdown. But first, I'll tell you a little bit about myself.
So I'm a data scientist at a company called Ag Country Farm Credit. It's a farm credit organization in North Dakota. We do a lot of ag lending. I'm a tidyverse instructor through the RStudio education group. If anyone has any need to learn how to just teach in general better, and especially to teach technology, I would recommend talking to Rachel about it. It's a great program. A few of us went through it, and it's been paying dividends. So I'm also the author of the tidyUSDA package, which lets you query the USDA survey and census data and then throw them into nice county and state maps, which are – it was kind of a pain doing that manually. So I use this every day at work. And I'm also author of a slightly more useful package called Shroot. And if anyone's familiar with the TV show The Office, this package has all of the transcripts from all of the nine seasons. So you can create things like this chart here that counts how many times someone said that's what she said. And obviously, Michael is the winner here. This is also available in Python and Julia. A lot of people like using this package just to practice NLP and text analysis just because it's a data set that a lot of people know pretty well.
I also had the chance to use R this spring. So we had our second child, and we couldn't decide on the middle name. So we used – I think it's called babynames, a package, and we kind of narrowed it down to our three names we had, and we were looking at the trends to see how popular they were just to make sure we weren't jumping on the bandwagon. So it was kind of a fun thing to do. I still don't know if my wife appreciated that or not, but hopefully she forgot all the pain medicine and stuff. But I always like using R in any way I can.
Overview of the project
And this project is an example of doing something that may have taken a few different technologies in the past, like Photoshop and some kind of photo editing thing, Excel, what have you. And the goal is to make a nice packet for a fictional financial industry company. We're calling it Black Swan Financial.
The output of this will be three separate packets, one for each branch in this company, and we're going to do it all in R Markdown. And the kicker is it's all going to be PDF outputs because HTML is arguably a lot easier to do in R Markdown than PDF. But unfortunately, people in the finance industry still like to print out reports.
So we're going to tackle that, and this is what it's going to kind of look like.
So the key players, the packages that I'll be going into a little bit in this talk. R Markdown is important. Cable extra lets us format cables, which are table format that we use for PDFs. It lets us set things up really nicely. I don't know how I could do this without this package unless I knew a lot more about text, which I don't. Using the RN, I think that's how you pronounce it, the RN package, which is, I think it's relatively new. It's a nice package that lets you reproduce your environment on someone else's computer, kind of like in Python or Julia. And it might make installing my dependencies easier on your system because Markdown dependencies can be painful sometimes. And then we use the glue package to glue together bits of SQL and things like that and some strings. We use that quite a bit in this example.
Disclaimers. Like I've been kind of hinting at, R Markdown, it works great when you install it on your computer and get it to work in your computer. But the minute someone else has to reproduce it, it can be trouble. So if you've never used the RN package, you should just be able to download the repo from the first page and hit RN restore, which should make your environment almost identical to mine. Potentially, you might have to re-install Tiny Text, which will fix some kind of text backend stuff. And there's more troubleshooting on the readme. I tested this on Windows 10, Linux Mint 20, and Mac as well.
Okay. So this is kind of the high-level map of what we're going to be doing. I like to think of anything in terms of inputs and outputs for projects like this. So the inputs are these three branches that we have for our company. We're going to use one R Markdown document. We're going to iterate over these branches, pull data from a SQL database, in this case, SQLite database, just for an example. And then when it's all said and done, we're going to have a PDF for each one of these branches with data for just that branch.
So I'm going to go through the four pages of the documents here, and then we're going to jump into the code a little bit. So there's going to be a front page with personalized, it's going to personalize this to show what branch this packet is for. We're going to have just a really basic income statement. Another table with a ggplot polygon map, just to show that you can put plots in. And then this is the same as the first income statement, it's just a trend. So just kind of grouping it differently in your process. So get into the code.
Walking through the code
All right, so if you folks can see down here, I'm going to be kind of navigating this file structure, but I'm going to briefly touch on how I created the data, because if someone else, if you yourself have to create fake data for a talk like this, there's a really nice package called charlatan that makes this really simple. You can create fake users, fake credit cards, you can fake geospatial stuff, it's really nice. So for this example, we're creating just a SQLite database. We're saving in the file structure. We're creating two tables, financials, so this has kind of the data you'd see in the income statement, then customers one, which was on the page with the map, then we're just writing that to this table. So just making some simple data, not really important to remember that, but in case you're interested.
Dive into the R folder. So we've got a setup, which is pretty common. I labeled the main function that we're going to run through main, and then we have a little functions thing that has a few more functions. And there's an R markdown folder, which has our R markdown. So start the setup.
I've got some packages here, just calling them all right there. I'm pulling the current year and month, because generally when you have like a packet like this, let's just say it's income statements for a branch, you'd get this monthly. So we're going to be iterating through that. I'm creating my connection to the SQLite database that I can use later, and I'm creating my input data frame. So this is the three branches that we're going to iterate over. So if I run this, actually I already ran it, it's going to show, there it is, it's going to show the three branches that we're going to go over. So Fargo, Belfield, and Kindred, those are all towns in North Dakota. And so we're going to run our finished packet against each of these.
So that's the setup. Now main, this is where all the magic happens. This is where we source the setup, the setup file we just talked about. We have a function that'll create the reports, it'll knit them, and then put them in our directory that we want. And then we have an iterator function. So I'm going to go through this really quick.
So our create reports function, it looks at this, our main Markdown report that creates the packet, it's called reportmaster.rmb in this, in this case. We're using the R Markdown render function to take the inputs that we're going to pass to this function. It's going to tell us where we want the outputs to go. It's going to also name, name the outputs. And then it's all, and then it's all, we're also specifying the parameters we're going to use, which are the, it's the magic sauce for this. It lets us iterate over these, these different branches.
So that's, that's the main function. And one thing I like to do is use the per possibly function instead of a try catch. So let's say you have 40 branches, and this packet is huge, and it takes two minutes, you know, each to do. So it's, it's a pain to have to go through 38 of those and then have it fail. And then you're stuck. So this kind of lets you just go through them and it fails gracefully.
So this kind of lets you just go through them and it fails gracefully.
So I have a three-year-old and whenever he wants to ask me, whenever he asks his mom or me to do something that we know, he knows we're not going to say, he tries to be as cute as possible. And he says, you know, maybe ice cream, maybe, you know, go for a walk at nine o'clock at night. So that's kind of what I named this, because it's what I'm thinking in my head, like maybe create reports. Hopefully this will work. That kind of helps me remember silly things like this.
So we're going to take, this is kind of the main generator, iterator part of this. We're going to take the input data. So that was the little data frame with just the three records in it. And we're going to use the per pwok function to iterate those, these branches over the markdown to create the outputs.
The R Markdown document structure
So that's the main R files. Now we'll get into the R markdown parts of it.
So in the past, when I would make R markdown reports, you know, it'd be like a thousand lines long because I didn't know this trick. So, and it was just a pain because you'd just be scrolling and scrolling, trying to find the things and you'd be navigating with the tree and whatnot. But there's a nice way to do this. There's a little parameter you can call it called child. So it lets you knit an individual markdown document and place them all together into one document. So I have a separate markdown document for each page of this packet. So as you can see, front page, income statement, customer, trend on the back page. So then your main page looks really clean and you can kind of organize it better.
So as far as the top matter goes, I have the parameters that we're going to use here. These are what we're going to be using to iterate, branch, month, year. I've got a lot of text stuff that I honestly couldn't explain what it means or how it works. I just know that after like a week of, you know, getting this format the way I want, I had to have these in this part. So it's one of those things that I can't really explain, but it works. So copy it if you want. Something that you might run into trouble with if you're on Mac or Linux is the font. So I change it to use a sans serif font. And I'm on Windows now, so I'm using Calibri. But if you're on Linux, you know, use Ubuntu or what have you. So you might have an error there if you just try to run what I have playing on Mac or Linux. So I'm going to say that.
So let's run through each one of these pages and open up. So front page, we're taking the parameter branch and we are passing that to this function called createImage. So there's a nice package called magic in R that lets you edit images, do whatever. In our case, we're going to be annotating it with the text that we're going to see in the front cover.
This is just literally a PNG of the front page that we have saved. But this part right here, where it says Oregon, North Dakota branch, this is going to be different for each one of the branches that, you know, the next one's going to say Belfield and what have you. So we're going to be creating this in our markdown. And we're using the glue package, the glue function to just glue these parameters together to create our string. We're going to throw that string on that image and return it. So it's a nice way to avoid having to use Photoshop or, you know, some program like that to create your cover page.
Income statement and customer pages
So the next page is the income statement page, which is this one. So it's pretty basic. It's just a single column. I've got the data up there. And as you can see, it's grouped by just different subheadings. As you can see, that's bolded. Some certain fields are bolded. And this is the R markdown page to do it. So we're going to be calling, rather querying our SQL database that we created in the beginning, the SQLite database. I'm using glue again to select that data from the financials. So grabbing a data frame at the very top. And then I'm not going to go into what each of these are. But, you know, we have the data frame. We want to, you know, change it. We want to, you know, mix new columns, you know, add up things. And that's what all this is.
So basic data monitoring. And at this point, you're going to have data that's wide, right? So nice and tidy. But for something like an income statement, you want the data to be vertical. So there's a nice function in data table called transpose that lets you just flip the axis of your table before we finally put it into the cable. So this is where we use cable extra. And cable extra has a lot of functions that let you style the table the way you want. So in this case, for each of these rows, the rows that I'm aggregating on, rather, I'm bolding them. And this pack rows function gives us this kind of nested thing you see here, which is, you know, pretty common on income statements. So in a nutshell, we're grabbing the data from the SQL database. We're munging it. And then we're throwing it into a cable and using cable extra to format it, which spits out, you know, this nice looking page.
I'll move on to the next customer page. The best customer, rather. So this is just a simple table that looks at the top customers sorted by interest income. And then we have just a nice ggplot that aggregates that by state and plots it. So same idea. We're just pulling the data from the SQL database. And we're creating that top table. And we're just doing a ggplot. And you can obviously put all kinds of cool graphs and charts on here. And the point of this talk isn't to show you how to use ggplot. It's to give you the idea of how you can iterate over reports like this.
So last important page is the trend, which is identical to the income statement page. But we're just grouping by the date. So we have kind of the way you'd normally see an income statement with previous months there. So you can look for trends and whatnot. And you will also notice when you look through this that this page is horizontal. And you actually have to write, I think this is text or latex. I'm not an expert on this. But if you duplicate my repo, you should be able to just type this command and b landscape and then n landscape. And it lets you have one page in your packet, which is horizontal, which is huge because this is a pain in the butt to try to format to look really skinny.
And then we have just a simple, for the back page, I just literally have the front page without all this text on here. So just a nice way to bookend the project. And I'm just including the png right there and knitting it all together piece by piece in this master document.
What you can do with this
All right. So I've only got one minute left here. But so what can you do with this? I know I went really fast. I didn't have a ton of time. But like I said, you can feel free to reach out to me or fork my repo or do whatever you need. What really helps is in any financial institution, you have people usually in the accounting department that are making Excel reports like this by hand, which introduces opportunities for error opportunities. Obviously, it's a lot quicker to do things in R when you spend six hours one day creating this and then you don't have to touch it forever, essentially.
So it's a good way to pick up these Excel reports and also get people that that maybe are just Excel jockeys like I was, like a lot of us were, and turn them into R developers because they see what they can do with R versus with Excel. And it really kind of gets the wheels turning. You can also make your own reporting easier as well. So if you're a data scientist like me, you have to look at various tables every morning. And instead of looking at these tables one by one, I just have R Markdown reports that I've parametrized to look at certain tables and give me the data I need. So with that, I feel like I talked really fast. So I apologize. But I think we have maybe five minutes for questions, Rachel.
Q&A
Yes, that was awesome. Thank you so much, Brad. And yeah, we have we have five minutes for questions now. I didn't notice there's anything in the chat window yet. But if you want to ask a question there through the chat window, please feel free to go do so.
Or even just ask aloud. It looks like a lot of people are just saying that was great, Brad. Thank you so much. Great presentation.
Hey, Brad, if I could ask a question real quick. So how do you automate this right now? Is it off of your like, you're using something like Microsoft Task Scheduler, or RStudio's automation? Or that's an issue we've run into with R Markdown reports is they don't run consistently. And we don't have access to a reporting server right now. So have you run into that? And then how have you kind of solved that problem?
So my day job at Egg Country, it's a Windows 10 shop. So, you know, we don't have the best, you know, Windows isn't my favorite platform. But what I've found to work the best is if I make a batch file, and call, in this, in the case of this project, it would be, you know, our script, our main dot R. So it'd be running my iterator page. That's, I try to make this a whole project like this roll up just to one call one function call. So let me pull that up really quick. So everything, if I were to run this, it would run this whole process, I just get just described. And I find if I make a batch file that does this, that's the best way to run on Windows 10. I like to use, you know, Linux for my personal things. And, you know, putting this on a cron job is, is easier. We don't have RStudio Connect yet at my, my office, but I think it's a lot easier to schedule things like, like this with the professional suite of tools. I mean, Rachel can say more about that. Yeah, and so with RStudio Connect, you could just schedule this to run as often as you needed it to, and then email out to specific people that needed to receive it. Or I'll put a I'll just put a link in the chat too, if anyone wants to look at it.
Another question in the chat is, Brad, what about requests to drill into individual transactions? Do you get these from your users?
So, so is the question kind of scope creep based on the report that I've created? So let's, let's say we have this income statement report. I think the question is, you know, do we have people asking to drill into these things further? And if that's the case, yes, all the time. And that's more of a person management thing than a, than a data management thing. But with our markdown reports like this, that are spitting out PDFs, it's a little bit tougher because you have to, you know, once you have this format down, and it's creating these tables, it's, it's it's a little bit tougher to change them. Versus if you're spitting out an HTML report, you could use all kinds of different frameworks like GT that let you drill into things a lot better. So, you know, you, you lose flexibility when you're spitting out to a PDF, in my opinion.
There's a few other questions in the chat trying to read through which ones haven't been been answered yet. But someone had asked for any recommendations for learning latex books or websites. I don't know latex. I know the our markdown book, books that you put out. Maybe Rachel can post those in the chat or something. But the author of our markdown has a really, really good book. And, you know, I based, you know, my learning on making parameterized reports on his little chapter that he has in there. I just had to expand on a lot because it was kind of bare bones. But that's, that's where I started. As far as like learning latex proper. I don't have a clue. Awesome. It looks like Nick put a helpful link in the chat for us.
I think we have time for one more question. I see Luis just asked, my question is more culture related. Did you encounter any resistance to start using R for reports instead of Excel or outdated report writers? Oh, yeah. I tell people, you know, when I first started at my current company, there was no one using R, there was no one using any language like this, period. And it was all Excel. And it was, you know, I was pretty confident in my R capabilities. It was spreading, you know, evangelizing people on telling them that, you know, R is better because we can do things like this. And you're not going to have people fat fingering, you know, errors that, you know, cost millions of dollars. So it's, the evangelism part is, is really the kicker. And you have to have, you have to develop people skills. And in the end, you really have to show what's in it for those people. I mean, because you and I know that R is, for certain things, R is better than Excel, right? You just have to show the value and then people get it.
The evangelism part is, is really the kicker. And you have to have, you have to develop people skills. And in the end, you really have to show what's in it for those people.
Thank you so much. And we'll have a few more minutes at the end of the meetup for a few other questions too. But if there's anything that didn't get answered, we can continue using the meetup discussion page as well.
