
Luis D. Verde Arregoitia - Why’d you load that package for?

Video, Oct 31, 2024 (4:24)


Transcript

This transcript was generated automatically and may contain errors.

Hi, everyone. My name is Luis. I'm a biologist. I study mammals. I do research in ecology and evolution. But as part of my work, I'm often helping students and colleagues with their analysis. And this involves a lot of running and explaining and fixing code that I didn't write myself.

By the way, in this research setting, people tend to work interactively with scripts, and in those scripts they load different packages to do different things. Sometimes I recognize the names and what they're for, but a lot of the time I don't. That's fine. No one's expected to know all about the tens of thousands of packages out there. And I'm not the only one who sometimes needs clarification, right? Like this user here. His grammar, not mine. That's my talk title.

Adding comments to package load calls

Say we're loading some packages in a script, like here. My practice nowadays is to ask the people I'm working with what these packages are for, or whether there's anything I should know about them. If so, let's just go ahead and add these little code comments. At least to me, this already gives a pretty good idea of what's going to happen later on in the script.

I know this conflicts with the good practice of not adding too many code comments. But in my opinion, these little package comments are fairly unobtrusive, and ideally we load packages only once per script, so it's not too much clutter. I think it's okay to do.

And if I'm asking people to add these comments to their code, then I should help create them, or find a way to automate them. So I developed some tools that can build these comments, either by pulling information from package descriptions or by checking a script to see how the different package elements are used. After that, I wanted to examine comments in public code to see whether people are already doing more or less this.
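As a rough illustration of the first idea, building comments from package descriptions, here is a minimal Python sketch that treats an R script as plain text. It is not annotater's implementation (annotater is an R package). The function name `annotate_titles` is hypothetical, the package titles are passed in as a dictionary rather than read from installed package metadata, and only bare, uncommented `library()`/`require()` lines are matched.

```python
import re
from typing import Dict

# Matches a bare library()/require() call with nothing after it, e.g.
# library(dplyr) or require("sf"); group 1 captures the package name.
LOAD_CALL = re.compile(r"^\s*(?:library|require)\(\s*[\"']?([A-Za-z][\w.]*)[\"']?\s*\)\s*$")


def annotate_titles(script: str, titles: Dict[str, str]) -> str:
    """Append '# <title>' to each bare load call whose package is in `titles`.

    `titles` is supplied by the caller: an assumption standing in for the
    Title field of each package's DESCRIPTION file.
    """
    annotated = []
    for line in script.splitlines():
        match = LOAD_CALL.match(line)
        if match and match.group(1) in titles:
            line = f"{line.rstrip()} # {titles[match.group(1)]}"
        annotated.append(line)
    return "\n".join(annotated)


if __name__ == "__main__":
    demo = "library(dplyr)\nlibrary(ggplot2) # plots\n\nmtcars |> filter(cyl == 4)"
    print(annotate_titles(demo, {"dplyr": "A Grammar of Data Manipulation"}))
```

For example, with `{"dplyr": "A Grammar of Data Manipulation"}`, the line `library(dplyr)` becomes `library(dplyr) # A Grammar of Data Manipulation`, while packages missing from the dictionary and load calls that already carry a comment are left unchanged.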


The annotater package

Here's the tool I built for this: the annotater R package. It can add various bits of information to library load calls. Let me show you. Through either functions or add-ins, we can do things like add the package title next to each call, and package titles are usually short and informative, in my opinion. Or we can list the exported functions from each package that appear in a script. Or, one that I use a lot, we can add package versions and sources.
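The second annotation type, listing which of a package's exported functions a script actually calls, can be sketched the same way. Again, this is a hypothetical Python illustration, not the annotater code: `annotate_used_functions` is an invented name, each package's exports are supplied as a set (in R they could come from something like `getNamespaceExports()`), and call detection is a naive regex that also picks up names inside strings and comments.

```python
import re
from typing import Dict, List, Set

# Bare library()/require() call; group 1 captures the package name.
LOAD_CALL = re.compile(r"^\s*(?:library|require)\(\s*[\"']?([A-Za-z][\w.]*)[\"']?\s*\)\s*$")
# Any name immediately followed by "(" (naive: also matches inside strings/comments).
CALL = re.compile(r"([A-Za-z.][\w.]*)\s*\(")


def annotate_used_functions(script: str, exports: Dict[str, Set[str]]) -> str:
    """Append '# f1, f2' to each load call, listing that package's exports the script calls."""
    called = set(CALL.findall(script))  # every name used as a call anywhere in the script
    annotated: List[str] = []
    for line in script.splitlines():
        match = LOAD_CALL.match(line)
        if match:
            # Keep only the exports of this package that were actually called.
            used = sorted(exports.get(match.group(1), set()) & called)
            if used:
                line = f"{line.rstrip()} # {', '.join(used)}"
        annotated.append(line)
    return "\n".join(annotated)
```

Collecting every called name once and intersecting it with each package's exports keeps the work independent of how many packages are loaded; the trade-off is that a name exported by two loaded packages is listed for both.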

What people are doing on GitHub

Now, I've seen comments like this here and there, especially in blogs and teaching materials, but I wanted to check more thoroughly. So I searched a snapshot of public GitHub repositories. The details are in my blog, but from about half a million relevant files, I located 4,000 comments I could look at. I'm not showing package names here, just the comments, but overall people mention the general purpose of the packages they load or specific functions of interest, which was good to know.
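To look at the comments people already write, one starting point is to pull out any comment that follows a load call on the same line. The sketch below does that with one regular expression per line. It is an assumption for illustration only: the search described in the talk is documented in the author's blog and is not reproduced here, and the function name `commented_loads` is hypothetical.

```python
import re
from typing import List, Tuple

# A library()/require() call followed by an end-of-line comment;
# group 1 is the package name, group 2 the comment text.
COMMENTED_LOAD = re.compile(
    r"^\s*(?:library|require)\(\s*[\"']?([A-Za-z][\w.]*)[\"']?\s*\)\s*#+\s*(.+?)\s*$"
)


def commented_loads(source: str) -> List[Tuple[str, str]]:
    """Return (package, comment) pairs for every commented load call in `source`."""
    pairs = []
    for line in source.splitlines():
        match = COMMENTED_LOAD.match(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs
```

For instance, `commented_loads("library(dplyr) # data wrangling\nlibrary(here)")` returns `[("dplyr", "data wrangling")]`; the uncommented `library(here)` line is skipped.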

There were a fair number of comments about pipes, for some reason, and people also left various technical details. Here are a few more that I found amusing. Again, I'm not naming names; you can try and guess which packages those are for.

Let me leave you with this: at least consider adding these sorts of comments to your code, especially if it's meant to be read by others. It's a low-effort thing to do with a potentially high reward. It's like this child's drawing here: if we know what the little kid is trying to draw, we can recognize the elements sooner. I think the same goes for code.

So if we provide some context and intention, we reduce the amount of guesswork and interpretation people have to do when they encounter code they haven't seen before. We should do this to give users of our code a head start. That's it for me; catch me outside if you want. Thank you.
