Seth Colbert-Pollack - Level up! Empowering industry R users with different levels of experience

Transcript

This transcript was generated automatically and may contain errors.

Good afternoon. Welcome to Level Up! Empowering industry R users with different levels of experience. I'm Seth Colbert-Pollack, a data scientist at Picnic Health. I work on a team of seven wonderful data scientists and analysts, and we all come from different backgrounds.

We have different backgrounds in education. My manager is a PhD linguist, and I also work with PhDs in neuroscience and biostatistics and an MPH in epidemiology. We also have different work histories.

My co-worker Chris came from the operations side of our company as a data analyst. We have people who came from educational non-profit policy advocacy, and we have people who came from clinical research. We also have different programming backgrounds. A few of us were R experts from the start, or started with R and grew into experts. Others had more experience with SAS before coming to Picnic Health, and a few of us had bits and pieces from various programming languages or SQL that we now bring to the team.

So how do we all work together? This talk is like a mullet. A mullet is a hairstyle with business in the front and a party in the back, so this talk is going to have technology in the front as solutions for leveling up our team, and then we're going to move on to cultural solutions. So the technological solutions might sound familiar to many of you, but the cultural ones, although they do come up, I feel like we discuss less in our community, so I'm excited to talk about those with you today.

Building internal R packages

So first, technology. The first tool that I'm going to share with you out of two is building your own internal R packages. We speak a lot about building R packages for a wider community, sharing them as open source software, which is great, but we can't always do that, and if we're working on proprietary projects, we still will want to be able to take advantage of the features of R packages.

So why would we want to do that? We don't want to reinvent the wheel when we're reusing code, and we'd like to reduce boilerplate. So when we're working on the clinical data that Picnic Health produces, it might come up that we want to calculate the age of our patients on a certain date, and to do that, we'll need the birth date and, if relevant, the death date. The calculation is fairly straightforward, but we'd rather make sure that we're doing it correctly once, and then using that correct solution throughout our code, instead of copying and pasting, possibly with errors, throughout many scripts. It's also nice that updating the package updates your code. So if in our internal data source, our birth date column changes name to date of birth, we can update in our package what column we're referring to, and then our code doesn't need to think about that further. And finally, it's always nice when team members are using the same functionality, even when they're working on different projects, because it increases interoperability, so when I'm handing off work to a teammate, they can pick up running, and when I'm reading code in a code review, or I'm trying to figure out what a co-worker did who's no longer at the company, I have more familiarity with what they were doing.
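As a minimal sketch of the age helper described above, the function below computes age in whole years from a birth date and an optional death date. The function names, the `birth_date` column name, and the choice to stop counting at death are illustrative assumptions, not Picnic Health's actual code.

```r
# Hypothetical helper: age in whole years on a reference date.
# Assumption: if the patient died before the reference date, age stops at death.
age_on_date <- function(birth_date, on_date, death_date = as.Date(NA)) {
  birth_date <- as.Date(birth_date)
  on_date <- as.Date(on_date)
  death_date <- as.Date(death_date)

  # Use the earlier of the reference date and the death date;
  # na.rm = TRUE ignores a missing death date.
  end_date <- pmin(on_date, death_date, na.rm = TRUE)

  b <- as.POSIXlt(birth_date)
  e <- as.POSIXlt(end_date)

  years <- e$year - b$year
  # Subtract one year if the birthday has not yet occurred in the end year.
  before_birthday <- (e$mon < b$mon) | (e$mon == b$mon & e$mday < b$mday)
  years - as.integer(before_birthday)
}

# The column name lives in one place, so a rename in the data source
# (for example to "date_of_birth") means one edit in the package.
BIRTH_DATE_COL <- "birth_date"

add_age <- function(patients, on_date) {
  patients$age <- age_on_date(patients[[BIRTH_DATE_COL]], on_date)
  patients
}

patients <- data.frame(birth_date = as.Date(c("1980-06-15", "2000-01-01")))
add_age(patients, "2020-06-14")
```

Scripts that call `add_age()` never mention the column name directly, which is what lets a package update carry the rename.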

One sign that you might need an R package is if your scripts all begin with a laundry list of functions that you wrote elsewhere. So in this example, you're actually already 80% of the way to your package, because all of your content is already there.

When you're building your internal R package, I encourage you to start small with a handful of functions that you've already written. This way, you're focusing on the process of building the package and not getting bogged down in the content, which you can always expand later. So start small and iterate. Get your package to build without errors so that it works, and use your package yourself. Then you can share it with others. Beyond that, there are all sorts of fancy things you can do, like building a pkgdown website or writing tests, but before you can run, you have to walk. So begin small. It's good enough if you have tons of warnings and notes when you run devtools::check(), which tells you how well your package is building, as long as you have zero errors for now.
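The start-small loop could look roughly like the sketch below. It assumes the usethis and devtools packages are installed; `ourtools` and `age_on_date` are placeholder names. These commands are normally typed one at a time in an interactive session, not run as a script.

```r
# Placeholder names throughout; run interactively, one step at a time.
usethis::create_package("ourtools")  # scaffold DESCRIPTION, NAMESPACE, and R/

# The remaining steps are run from inside the new package directory.
usethis::use_r("age_on_date")        # create R/age_on_date.R; paste in an existing function
                                     # and add a "#' @export" roxygen comment above it
devtools::document()                 # regenerate NAMESPACE and help files
devtools::check()                    # aim for 0 errors; warnings and notes can wait
devtools::install()                  # install locally, then library(ourtools) in your scripts
```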

For further reading, the standard text by Hadley Wickham and Jenny Bryan is called R Packages, and it's available for free online. It relies on a package called devtools, and the material is the same whether you're writing an open source package or an internal one. There was a talk at this conference last year about writing your first R package, and for a slightly more advanced approach, you can host your package on Posit Package Manager, which is a paid Posit product. This is what we do at Picnic Health, so once we've set up our options, we just have to write install.packages with the package name, just as we would for a package from CRAN.
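Setting up the options usually means adding the internal repository to the `repos` option, for example in a project or user `.Rprofile`. The repository URL and package name below are made up for illustration; substitute the address of your own Package Manager instance.

```r
# Hypothetical repository URL and package name.
options(repos = c(
  internal = "https://packages.example.com/internal/latest",
  CRAN     = "https://cloud.r-project.org"
))

# With the option set, the internal package installs like any CRAN package.
install.packages("ourtools")
```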

Hosting content on Posit Connect

The second technological solution I want to share with you is hosting content on Posit Connect. If you're like me, you've heard a lot about Posit Connect over the past few days, so even if you had never heard of it before, you now might consider yourself an expert. So we'll do this quickly, but what I love about Posit Connect in terms of a team is that our work is reproducible and schedulable, so if we have a report that we're running, we don't have to worry about those things, and we don't have to email attachments back and forth. So once you know the URL for your resource, you can reference it. When an update is published, it will be to the same address, so you can reference it again. And as a teaching and learning tool, teammates see the complete example of a data analysis output with the code and the resulting figures and analysis explanation, which really helps link the programming and the resulting product in the mind.

So Posit Connect is a little bit harder to just get started with than writing an R package, because it is a paid product. I suggest that you discuss with your team whether this is something that you'd like to invest in, and then talk to one of the many people from Posit at this conference who would love to tell you more.

Cultural solutions: wiki and knowledge sharing

All right, we've made it through the business in the front, and now we have our party. So now I'm talking about cultural solutions to upskilling our team together. And the first one is a wiki. By a wiki, I just mean a website that is editable by its users, in this case, your teammates. Wikis are good because they are highly discoverable. So if you're not sure what you don't know, you can go to your wiki, read through it, and see what your teammates have deemed useful and necessary knowledge for you. I also advise that when you contribute to the wiki, you advertise your contributions to your team. This will encourage them to do the same.

As an example, at our company, this is what our wiki looks like. We're using a product called GitBook to host it, and there are a lot of paid products that will do this. I don't necessarily care which one you choose, but this one meets our needs. You can see we have sections about troubleshooting common errors, how to access our data, how to use our Git and GitHub setup, and much more. When you're advertising your contributions, let your teammates know what you're adding, maybe give them a link to the new page, and then they can learn from what you just wrote, and they are encouraged to do the same when they edit the wiki.

I also want to point out that just like in our package, you might already be 80% of the way there. A Google doc can be a wiki. All we need is an editable page or website that your teammates can view, and a Google doc can be that. So why not start with taking a list of bug fixes that you might have posted in Slack already and putting them in a Google doc and slapping the name of wiki on it.

Office hours and pair programming

The next cultural solution I want to share is office hours and pair programming. Set aside an optional meeting; we have ours once or twice a week with my team. I'm coming from a team of seven, so if you have a team of 50 or two, your cadence might be different. Invite folks to come with their problems, their questions, maybe some work that they need a little help collaborating on, and share your knowledge with one another. For example, you might talk about resolving merge conflicts in Git, or updating a function in your package, or even set differences between vectors in R.
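To illustrate the last of those topics, here is a small sketch of set differences with base R's `setdiff()`. The vectors are made-up patient IDs. A common point of confusion is that the result depends on argument order.

```r
# Made-up IDs for illustration.
enrolled  <- c("p01", "p02", "p03", "p04")
with_labs <- c("p02", "p04", "p05")

# Elements of the first vector that are not in the second.
missing_labs <- setdiff(enrolled, with_labs)   # "p01" "p03"
not_enrolled <- setdiff(with_labs, enrolled)   # "p05"

# Elements that appear in exactly one of the two vectors.
in_one_only <- union(missing_labs, not_enrolled)  # "p01" "p03" "p05"
```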

We can also share our knowledge directly with our teammates, not waiting for a meeting. We don't want to hoard our knowledge. We want to share it. And we can use whatever format our team usually uses to communicate. So if you are an in-person team and you always hang by each other's cubicles, go ahead and chat. If you are on a remote-first team like me, share in Slack or Teams, whatever you use, or email. The format doesn't really matter. What matters is that we're sharing our knowledge. For example, maybe you found a blog post about ordering the colors of your legend so they match the order of colors from top to bottom in your plot. Or maybe you found that two functions that you use a lot can sometimes return multiple values instead of just one. Let your teammates know, since they're likely to be using the same functions, encountering the same issues, or wanting to pretty up their plots in the same way.

The inverse of sharing a public service announcement is asking for help. If you're wondering something and you haven't found the answer yet, well, others might be wondering the same thing. Set an example that it's okay to need help. And make yourself approachable. This goes especially for senior team members. Again, on a remote team, especially a bigger remote team, it can be difficult to post in a public forum or chat where everyone can see that you don't know what you're doing. As a senior team member, you can set an example, show that it's okay, and become more approachable.

And I want to emphasize that even if it takes 65 replies to get to the bottom of your bug, you have now saved a bunch of other people 65 replies' worth of work to solve the same issue.
