Dan Caley | Demystifying the art of creating custom libraries for your organization | RStudio (2022)

video
Oct 24, 2022
14:50

Transcript

This transcript was generated automatically and may contain errors.

The title of my talk is Demystifying Creating Custom Libraries for Your Own Organization. And just like a lot of our talks, we like to go with long titles. So I don't know if mine, like, wins the long title. Haven't counted it out yet.

But yeah, so, you know, when I was thinking about this talk, I was thinking, you know, libraries feel like they're reserved for more developers, and they're this, you know, tough thing that, you know, is super difficult, you have to read many chapters. And what I found in my journey, it's not. And this talk could actually be two minutes instead of 20 minutes. But just to hit the requirements, I put a bunch of fluff in this. No, no, I'm kidding. There's some good stuff here.

So just going forward, you know, my hopes from this presentation is that you guys all get inspired. You know, inspired to, you know, create your own libraries, and, you know, inspire other people on your team to contribute to those libraries and those packages.

You know, before we jump in, like, let's do a little introduction of myself. My name is Dan Caley. My hobbies are hiking and taking care of my dog, Helmut. That's him in a little vest. I was hoping for a little bit more oohs and aahs when everyone saw my dog, but, whatever, we'll talk after.

I work at Custom Ink as a senior data analyst. And at Custom Ink, what we do is we create custom t-shirts for organizations. And so we do that because we're trying to build community and bring people together. You know, just like this conference, when we got that RStudio t-shirt, we're all, you know, 20% of the people here aren't wearing it. And it's bringing us all together. And this company was founded in 2001, and I swear to you, all of these people here look just the same, and it's 2022. So if you ever work at Custom Ink, they might have discovered immortality. I haven't confirmed it yet.

And this is, you know, this idea of bringing us together from, like, a community perspective is kind of the same thing about libraries. You know, we all love these libraries. We get involved. We want to contribute. And there's a little bit of community.

Why package your functions in a library

So the things that we're going to cover is, you know, why package your functions in a library? Because that's what a library is. It's just a bunch of functions. You know, and we're just going to briefly talk about functions and how to create a function and then create a library and then document your function and then how to install your library. And how to install your library is going to be the important part, because that's going to allow you to share it either via GitHub, through, you know, your GitHub's firewall, or, you know, you can have it saved in a local directory.

So, you know, why package your function in a library? If you've ever had this internal thought process, you know, where did I save that function again? Like, you're building something, you create this cool function, and you're like, hey, I want to use that function again, but where did I save it? Where did I put it? What file? Like, what folder? Did I delete it? Oh, my gosh. That's, like, that's scary.

You know, you start spiraling, you know, what did I do? Did I save it in a notepad, an R markdown? Did I print it out and then fax it to myself? Like, you're going south, you know?

You know, I'll just rewrite the function. You know, it takes you 30 minutes, an hour, two hours, again, to rewrite it. And so this is the idea of, like, packaging your code, is that you want to take those functions, package them up so that they're already in your R environment, and then you could just load them up and then use them. And then also to allow your co-workers to use them and inspire them, and maybe they're having the same problems, or maybe they didn't realize that they could create a function like that because they're just so busy.

Creating a function and building a package

So let's just cover what a function is real fast. So a function is a set of statements that, when combined, performs a specific task. So an example that we're going to use, we're going to use this function for when we build our package. So we're going to build a not-in function. So for anybody who used SQL, there's an in and a not-in. And the in function, you know, filters on a list of values. Not-in does the opposite.

And that's nice, because let's say you have, like, a list of 20 things, and you want to exclude only two of them. You don't want to list out 18, you only want to list the two, so you can include the 18. So in order to create that function, all we're going to do, and it's really simple, is we're going to go Negate on %in%, and now we have our not-in function right there. So we're going to use this to create our library.
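As a minimal sketch of the one-liner described here: the operator name `%notin%` is an assumption for illustration, since the talk only calls it "not-in", and the vector of 20 values is made up to mirror the "exclude two, keep 18" example.

```r
# Negate() wraps a function and flips its logical result,
# so this turns `%in%` into its opposite.
# `%notin%` is an illustrative name, not necessarily the one used in the talk.
`%notin%` <- Negate(`%in%`)

# 20 things, of which we only want to name the two to exclude.
things <- 1:20
kept <- things[things %notin% c(3, 7)]

length(kept)  # 18
```

Defining it as an infix operator (a name wrapped in percent signs) lets it read the same way `%in%` does.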

So what we're going to do is we're on a clean R document. We're going to go file, new, package, new directory, and then R package right here. And we're going to call this Danalytics, because my name's Dan, and it's my talk. Create package. And then once we create a package, you can already see there's already some default things. They have a print hello world, which is just called hello. So let's just delete that, let's remove it. And you can see, this is just an R script. So you can have a bunch of functions that you already have in R script, and just plop it in here, and you're halfway there.

In this case, we're going to drop our not-in function in here. So we're going to do that Negate on %in%. I promise you I'm not this slow at typing. I go much faster. And then we're just going to do some housecleaning. Let's rename it. And again, my talk, not yours, Danalytics. And then, okay, great. And then we're going to go to the top, and build, and then install, and then install package. And that's pretty much how you create a package. That's it.
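The menu steps above produce a small folder of plain files. As a rough sketch, assuming base R only, the same skeleton can be written by hand. It goes to a temporary directory here, every DESCRIPTION value is a placeholder, and `%notin%` is an illustrative name for the not-in function.

```r
# Build the skeleton in a temporary directory so nothing real is overwritten.
pkg_dir <- file.path(tempdir(), "Danalytics")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)

# DESCRIPTION holds the package metadata (all values below are placeholders).
writeLines(c(
  "Package: Danalytics",
  "Title: Shared Helper Functions",
  "Version: 0.1.0",
  "Author: Your Name",
  "Maintainer: Your Name <you@example.com>",
  "Description: Internal helper functions collected in one place.",
  "License: GPL-3"
), file.path(pkg_dir, "DESCRIPTION"))

# NAMESPACE lists what the package makes available to users.
writeLines('export("%notin%")', file.path(pkg_dir, "NAMESPACE"))

# The R/ folder holds ordinary R scripts containing the functions.
writeLines("`%notin%` <- Negate(`%in%`)", file.path(pkg_dir, "R", "notin.R"))

# Console counterpart of Build > Install (left commented out on purpose):
# install.packages(pkg_dir, repos = NULL, type = "source")
```

The NAMESPACE here exports the operator by name because a pattern that only matches names starting with a letter would skip a name that starts with a percent sign.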

So we have a bunch of functions. We're pretty much done. And that's why I'm saying this could have been two minutes, but we have some other things to do that we can add here.

So let's see if this works. So we're going to load the library, Danalytics, and we're going to have attendees. We're going to have Dan, Michael, or, yeah, Dan, Michael, John, Andrew. We're going to put that to a variable, and then we're going to go attendees in John, and then attendees not in John. And as you can see at the bottom, it goes false, false, true, true, and it did the opposite, and that's what we want. True, true, false, false. So we see that the not in function works.
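A sketch of that check follows. It defines the operator locally instead of loading the package so that it runs on its own, and `%notin%` is an illustrative name. The transcript only mentions John, but the output it describes has two TRUE values, so this assumes the comparison set was John and Andrew.

```r
# Stand-in for the function that library(Danalytics) would provide.
`%notin%` <- Negate(`%in%`)

attendees <- c("Dan", "Michael", "John", "Andrew")

# Assumption: the lookup set was John and Andrew.
is_in  <- attendees %in% c("John", "Andrew")
is_out <- attendees %notin% c("John", "Andrew")

is_in   # FALSE FALSE  TRUE  TRUE
is_out  #  TRUE  TRUE FALSE FALSE
```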

Documenting your functions

And this is a big part, documenting your functions. And how you're going to do that is pretty simple. So there's this folder, the man folder right there, so we're just going to load it again, and we're going to go to man, and you can see in there that there's already a hello file in there, and it already has some documentation. So we can take that, use that, and replace it with what we want to add, which is going to be the not in. And just like cooking show magic, it's there.

And then, what's nice about it is, if we just do the help or F1 key, if we do help and then not-in, you can see there's some documentation. And this is important because sometimes your function might be pretty complex, and you can document it, or you might have to install libraries that are outside of your function, and you can document it all there so it's easy for users to jump in and see what it's doing.
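For a sense of what goes into a file in the man folder, here is a rough sketch that writes a small help file and checks that R can parse it. The topic name `notin`, the operator name `%notin%`, and all of the wording are assumptions. In a real package the file would sit in man/ rather than a temporary directory.

```r
# Lines of an Rd help file. Percent signs are comment characters in Rd,
# so they are escaped with a backslash (doubled inside an R string).
rd_lines <- c(
  "\\name{notin}",
  "\\alias{\\%notin\\%}",
  "\\title{Negated value matching}",
  "\\description{Returns TRUE for each element of x that is absent from table.}",
  "\\usage{x \\%notin\\% table}",
  "\\arguments{",
  "  \\item{x}{Values to test.}",
  "  \\item{table}{Values to exclude.}",
  "}",
  "\\value{A logical vector the same length as x.}",
  "\\examples{",
  "c(\"Dan\", \"John\") \\%notin\\% \"John\"",
  "}"
)

rd_file <- file.path(tempdir(), "notin.Rd")
writeLines(rd_lines, rd_file)

# parse_Rd() reads the file the same way the help system does.
parsed <- tools::parse_Rd(rd_file)
```

Notes on required packages or setup steps can go into the description or an extra details section of the same file.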

Installing your library

So now how do we install this library? And there's going to be three options for us here. We can either do GitHub public, which anyone can access. It's not through CRAN, which is a little bit more involved, this is just free form. And that's not behind a firewall, it's just your own GitHub. And then either through GitHub Enterprise, and that one's going to be behind a firewall, so I'll teach us how to get through that securely, with only read access and not write access to that enterprise. And then another one, a lot of companies don't have GitHub. They have a file directory, and it's all shared. So you can just save it there, and manage it there, and install it there, and allow your colleagues to still access it.

So installing it from GitHub public is just this: install.packages devtools, call that library, and then install_github with the developer name and package name. And so in this example, install_github dkd5005, and then slash danalytics, and that's it. And when you install it, it'll just look like a normal installation.
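A sketch of those three steps, wrapped in a function so that nothing is downloaded until it is called. The function name `install_from_github` is made up for illustration, and the repository string is the one named in the talk.

```r
# Hypothetical wrapper: installs devtools if it is missing,
# then installs the package straight from a public GitHub repository.
install_from_github <- function(repo = "dkd5005/danalytics") {
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  # repo is "developer name/package name".
  devtools::install_github(repo)
}

# Run this when you actually want to install:
# install_from_github()
```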

And this one's going to be a few more steps, but it's not going to be as daunting as it might seem. So installing from GitHub Enterprise, we're going to install this package, usethis, and then library usethis. And then what you're going to do is put in whatever your GitHub username is and your email, in this case, mine is daffy.duck, and it's daffy.duck at funnyducks.com, and you're going to run create_github_token. So what's happening here, it's going to use that Git config: create_github_token is going to use that configuration to then open up a web browser, and then you'll log into your GitHub, and it will bring you to this page.

And here, we would just retitle it, you know, you have notes, this is the Danalytics package, and then you can set an expiration. You might want to set this to 90 days for security purposes if you're ultra conservative, or you can set it to never expire. And then there's pre-settings here that you can adjust, but a lot of this is just going to be read-only, so it won't let you, like, write directly against it and, like, upset your GitHub. You don't want your company to be mad at you.

And then where it's going to take you, once you scroll down and you hit yes, it's going to take you to here. I blurred out this token. I probably didn't have to, because I deleted it afterwards. But it's going to give you a token where, like, the blurry line is, and you're going to copy that token and make sure to put it, like, in a shared password of some sort, you know, keep track of it. It's no problem. You could create this process and have a token if you needed to.
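The token setup described above could look roughly like this. The wrapper name `setup_github_token` and the example arguments are placeholders, and this sketch assumes the usethis package is installed.

```r
# Hypothetical wrapper around the two usethis calls.
setup_github_token <- function(user_name, user_email) {
  # Record the Git identity that goes with the GitHub account.
  usethis::use_git_config(user.name = user_name, user.email = user_email)
  # Open the browser on GitHub's "new token" page.
  usethis::create_github_token()
}

# Interactive use only, since it opens a browser:
# setup_github_token("your.username", "you@example.com")
```

The token itself is shown once on the resulting page, which is why it needs to be copied somewhere safe straight away.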

And then you would just do this, just like before, install.packages devtools, devtools install_github, and then your company name, and then danalytics, and then whatever it is that is on the main branch. And then you would put your token in there, run that, and it would install that package.
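A hedged sketch of that install call follows. The wrapper name, the `GITHUB_PAT` environment variable as the place the token is kept, and the `host` default are all assumptions. A GitHub Enterprise server normally needs its own API host, which the talk does not specify.

```r
# Hypothetical wrapper: installs a package from a private repository
# using a personal access token.
install_from_enterprise <- function(repo,
                                    ref = "main",
                                    token = Sys.getenv("GITHUB_PAT"),
                                    host = "api.github.com") {
  # Fail early with a clear error if no token was supplied.
  stopifnot(nzchar(token))
  devtools::install_github(repo, ref = ref, auth_token = token, host = host)
}

# Example call with placeholder names:
# install_from_enterprise("your-company/danalytics")
```

Reading the token from an environment variable is one way to avoid pasting it into a script that might be shared.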

So then installing from a file directory is much simpler than that. All you're going to do is use this right here in devtools: you go install, and then you put the file directory in there, and ironically, in this example it is put into a GitHub folder for you then to download it from a directory. But yeah, so you would just do that, and then you could see, if we call the library danalytics, it's in, and then you have access to it.
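As a sketch of the shared-directory route, assuming the devtools package is installed and the package source folder sits somewhere your colleagues can read. The wrapper name and the example path are placeholders.

```r
# Hypothetical wrapper: installs a package from a folder of source files.
install_from_directory <- function(pkg_path) {
  # Stop with an error if the folder cannot be found.
  stopifnot(dir.exists(pkg_path))
  devtools::install(pkg_path)
}

# Example call with a placeholder path:
# install_from_directory("//shared-drive/r-packages/danalytics")
```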

Real-world use cases at Custom Ink

So I know, like, the not-in function is, like, probably not that exciting of a function to use. So I wanted to kind of give some examples of some use cases where we use this over at Custom Ink. Connecting to a SQL database. You know, I think, maybe it's just me, I go online and I find this all the time, and this is from the RStudio website, is this right here, this whole credentials thing. This is, like, 12 lines of code. I have no reason to ever want to remember this. I don't ever want to remember this.

And so I created a function called InkBase. Custom Ink likes to brand everything. Our mascot is called Inky. So databases are InkBases. And so I created a function, InkBase. And then what we can do is set up some documentation, and here you can see, like, how do you, you know, what do you need? You need to install the package keyring, the library is keyring, and then how would you set that up? You go key_set, so what's the service that you're using, what's your username, you know, it would pop up and ask for a password that you need to put in there that it secures. And so it has, like, some setup in the documentation for new users who have not connected to a SQL database.
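The real InkBase function is not shown in the talk, so this is only a guess at the general shape of such a helper. The function name `ink_base_sketch`, every argument, and the default port are assumptions. It relies on the DBI, RPostgreSQL, and keyring packages, which are only needed once the function is called.

```r
# Hypothetical connection helper: hides the credential boilerplate
# behind one function call.
ink_base_sketch <- function(service, username, host, dbname, port = 5439) {
  DBI::dbConnect(
    RPostgreSQL::PostgreSQL(),
    host = host,
    port = port,
    dbname = dbname,
    user = username,
    # The password is read from the operating system's credential store,
    # so it never appears in a script.
    password = keyring::key_get(service, username)
  )
}

# One-time setup per user (prompts for the password and stores it):
# keyring::key_set(service = "my-warehouse", username = "my-user")
```

Keeping the one-time setup in the function's help page means a new colleague can get connected without asking anyone.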

And then another thing that I like is GetSQL. Just like, you know, when you go read_csv, this is the same thing. I like to code in a different IDE for SQL and then save it down, rather than having maybe, like, 100 lines of a SQL statement or 1,000 lines of a SQL statement in R. I'd rather have it in a different file and then call that file, read it in, and then run that query. And so I created this function, which is like a GetSQL. And it just, like, takes in the SQL file and reads it in as a text file.
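A minimal sketch of a helper along these lines, using base R only. The name `get_sql_sketch` and the sample query are made up, since the talk does not show the actual GetSQL code.

```r
# Hypothetical helper: read a .sql file into a single string.
get_sql_sketch <- function(path) {
  paste(readLines(path, warn = FALSE), collapse = "\n")
}

# Write a small query to a temporary file to stand in for a real .sql file.
sql_file <- tempfile(fileext = ".sql")
writeLines(c("SELECT order_id, total", "FROM orders", "WHERE total > 0;"), sql_file)

query <- get_sql_sketch(sql_file)
cat(query)
```

The resulting string can then be handed to a query function such as `DBI::dbGetQuery()` together with an open connection.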

And so, yeah, so you can see here, you know, this is no longer R Markdown, it's a QMD file. And so I'm going InkBase for the function. I bring in RPostgreSQL. And then I come over here and I go, OK, I want to pull in orders. So I go to this orders.sql file, which is on the right side right there. I pull it in. And then I, you know, put my connection in there, which is this Redshift connection, orders, and then pull in the data. And I can see that the data is up there at the top right of R.

And then another one that I like is, I love data tables when you put them in, like, R Markdown files, because then it allows users, you have visualizations, and then what's the data behind those visualizations. But one thing that I want users to be able to do once I give them my R Markdown file is I want them to have the ability to copy it or download it as a CSV, so maybe they can do their own visualizations, because maybe they have some other additional analytics they want to do on their side. And again, like, I would have to write all of that stuff, which I have no intention to try to remember.

And so I created a function where you can just, you know, it has copy and CSV, and then the lengthMenu, you can just enter it in there, you know, for the Y one. You can say, hey, I want it to be 10 rows long, 15, 25, whatever you want. And that looks like this, where you have the copy and then CSV at the top. Rather than having to remember all that, I can just call that function.
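The talk does not show this function's code, so the following is a sketch of how such a wrapper might be put together with the DT package and its Buttons extension. Both function names and the default row counts are assumptions.

```r
# Hypothetical helper: builds the table options in one place.
export_options <- function(length_menu = c(10, 15, 25)) {
  list(
    dom = "lBfrtip",             # l = length menu, B = buttons, then the usual table parts
    buttons = c("copy", "csv"),  # the Copy and CSV buttons above the table
    lengthMenu = length_menu,    # choices offered for rows per page
    pageLength = length_menu[1]  # rows shown when the table first loads
  )
}

# Hypothetical wrapper: a data table with the export buttons switched on.
# DT is only needed once this function is called.
export_table_sketch <- function(data, length_menu = c(10, 15, 25)) {
  DT::datatable(data, extensions = "Buttons", options = export_options(length_menu))
}

# Example call inside an R Markdown or Quarto document:
# export_table_sketch(mtcars)
```

Splitting the options into their own function keeps the settings easy to inspect and reuse across reports.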

And that is the end of my talk. If you want to see this, this is going to be on the RStudio GitHub page. It's not up there now, but I'll put this up there.