
Seth Colbert-Pollack - Level up! Empowering industry R users with different levels of experience

video
Oct 31, 2024
17:43


Transcript

This transcript was generated automatically and may contain errors.

Good afternoon. Welcome to Level Up! Empowering industry R users with different levels of experience. I'm Seth Colbert-Pollack, a data scientist at Picnic Health. I work on a team of seven wonderful data scientists and analysts, and we all come from different backgrounds.

We have different educational backgrounds. My manager is a PhD linguist, and I also work with people who have PhDs in neuroscience and biostatistics and an MPH in epidemiology. We also have different work histories.

My co-worker Chris came from the operations side of our company as a data analyst. We have people who came from education and nonprofit policy advocacy, and people who came from clinical research. We also have different programming backgrounds. A few of us were R experts from the start, or started with R and grew into experts. Others knew SAS better before coming to Picnic Health, and a few of us had bits and pieces of various programming languages or SQL that we now bring to the team.

So how do we all work together? This talk is like a mullet. A mullet is a hairstyle with business in the front and a party in the back, so this talk will have technology in the front, as solutions for leveling up our team, and then move on to cultural solutions. The technological solutions might sound familiar to many of you, but the cultural ones, although they do come up, are something I feel we discuss less in our community, so I'm excited to talk about those with you today.

Building internal R packages

So first, technology. The first of the two tools I'm going to share with you is building your own internal R packages. We speak a lot about building R packages for the wider community and sharing them as open source software, which is great, but we can't always do that. If we're working on proprietary projects, we still want to take advantage of the features of R packages.

So why would we want to do that? We don't want to reinvent the wheel when we're reusing code, and we'd like to reduce boilerplate. For example, when we're working on the clinical data that Picnic Health produces, we might want to calculate our patients' ages on a certain date. To do that, we need the birth date and, if relevant, the death date. The calculation is fairly straightforward, but we'd rather make sure we do it correctly once and then use that correct solution throughout our code, instead of copying and pasting it, possibly with errors, into many scripts. It's also nice that updating the package updates your code. If the birth date column in our internal data source is renamed to date of birth, we can update the column reference in our package, and our code doesn't need to think about it further. Finally, it's always nice when team members use the same functionality, even on different projects, because it increases interoperability. When I'm handing off work to a teammate, they can hit the ground running, and when I'm reading code in a code review, or trying to figure out what a co-worker who's no longer at the company did, I'm more familiar with what they were doing.
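To make the age example concrete, here is a minimal sketch of what such a shared helper might look like as a file in an internal package. The function name patient_age() and its arguments are hypothetical; the talk does not show Picnic Health's actual code. The #' lines are roxygen2 comments, which devtools::document() turns into help pages and NAMESPACE exports.

```r
# R/patient_age.R in a hypothetical internal package (names are illustrative)

#' Age in completed years on a given date
#'
#' @param birth_date Dates of birth (Date or "YYYY-MM-DD" strings).
#' @param as_of Date on which to measure age.
#' @param death_date Optional dates of death; NA means no death recorded.
#' @return An integer vector of ages in completed years.
#' @export
patient_age <- function(birth_date, as_of, death_date = NA) {
  birth <- as.Date(birth_date)
  # Measure age at death instead if death occurred before `as_of`
  end <- pmin(as.Date(as_of), as.Date(death_date), na.rm = TRUE)

  b <- as.POSIXlt(birth)
  e <- as.POSIXlt(end)
  years <- e$year - b$year
  # Subtract one year if the birthday has not yet come around in the end year
  before_birthday <- e$mon < b$mon | (e$mon == b$mon & e$mday < b$mday)
  years - before_birthday
}
```

Because every script calls the same function, a correction (for example, to leap-day birthdays) only has to be made once, in the package.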

One sign that you might need an R package is if your scripts all begin with a laundry list of functions that you wrote elsewhere. So in this example, you're actually already 80% of the way to your package, because all of your content is already there.

When you're building your internal R package, I encourage you to start small, with a handful of functions that you've already written. This way, you're focusing on the process of building the package and not getting bogged down in the content, which you can always expand later. So start small and iterate. Get your package to build without errors so that it works, and use your package yourself. Then you can share it with others. Beyond that, there are all sorts of fancy things you can do, like building a pkgdown website or writing tests, but before you can run, you have to walk. So begin small. For now, it's good enough if you get tons of warnings and notes when you run devtools::check(), which tells you how well your package builds, as long as you have zero errors.

For further reading, the standard text is R Packages by Hadley Wickham and Jenny Bryan, and it's available for free online. You'll use the devtools package, and the material is the same whether you're writing an open source package or an internal one. There was also a talk at this conference last year about writing your first R package. For a slightly more advanced approach, you can host your package on Posit Package Manager, which is a paid Posit product. This is what we do at Picnic Health: once we've set up our options, we just call install.packages() with the package name, just as we would for a package from CRAN.
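As a rough sketch of what "setting up our options" can look like: R reads its package sources from the repos option, so adding an internal Package Manager repository next to CRAN lets install.packages() find internal packages. The URL, the repository name "internal", and the package name picnicutils below are placeholders, not real endpoints; your Package Manager administrator would supply the actual values, and teams often put the options() call in a shared .Rprofile.

```r
# Point R at a (placeholder) internal Package Manager repository alongside CRAN.
# Replace the URL and package name with your organization's real values.
options(repos = c(
  internal = "https://packagemanager.example.com/internal/latest",
  CRAN = "https://cloud.r-project.org"
))

# Installs exactly like a CRAN package once the repository is configured
install.packages("picnicutils")
library(picnicutils)
```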

Hosting content on Posit Connect

The second technological solution I want to share with you is hosting content on Posit Connect. If you're like me, you've heard a lot about Posit Connect over the past few days, so even if you had never heard of it before, you might now consider yourself an expert. So we'll do this quickly. What I love about Posit Connect for a team is that our work is reproducible and schedulable, so if we have a report that runs regularly, we don't have to worry about reproducing or rerunning it ourselves, and we don't have to email attachments back and forth. Once you know the URL for a resource, you can reference it, and when an update is published, it goes to the same address, so the same reference keeps working. And as a teaching and learning tool, teammates see the complete example of a data analysis output, with the code, the resulting figures, and the explanation of the analysis, which really helps link the programming and the resulting product in the mind.

Posit Connect is a little harder to get started with than writing an R package, because it is a paid product. I suggest discussing with your team whether this is something you'd like to adopt, and then talking to one of the many people from Posit at this conference, who would love to tell you more.

Cultural solutions: wiki and knowledge sharing

All right, we've made it through the business in the front, and now we get to the party. I'm now talking about cultural solutions for upskilling our team together, and the first one is a wiki. By a wiki, I just mean a website that is editable by its users, in this case your teammates. Wikis are good because they are highly discoverable. If you don't know what you don't know, you can go to your wiki, read through it, and see what your teammates have deemed useful and necessary knowledge. I also advise that when you contribute to the wiki, you advertise your contributions to your team. This will encourage them to do the same.

As an example, this is what our wiki looks like at our company. We use a product called GitBook to host it; a lot of paid products will do this, and I don't necessarily care which one you choose, but this one meets our needs. You can see we have sections about troubleshooting common errors, how to access our data, how to use our Git and GitHub setup, and much more. When you advertise your contributions, let your teammates know what you're adding, maybe with a link to the new page. Then they can learn from what you just wrote, and they're encouraged to do the same when they edit the wiki.

I also want to point out that, just like with your package, you might already be 80% of the way there. A Google Doc can be a wiki. All we need is an editable page or website that your teammates can view, and a Google Doc can be that. So why not start by taking a list of bug fixes you've already posted in Slack, putting them in a Google Doc, and slapping the name "wiki" on it.

Office hours and pair programming

The next cultural solution I want to share is office hours and pair programming. Set aside an optional meeting, which for my team of seven is once or twice a week (with a team of 50, or of two, the cadence might be different), and invite folks to come with their problems, their questions, or maybe some work they need a little help collaborating on, and share your knowledge with one another. For example, you might talk about resolving merge conflicts in Git, updating a function in your package, or even set differences between vectors in R.
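As an illustration of the kind of small R detail that makes a good office-hours topic, setdiff() is not symmetric: it returns the elements of its first argument that are missing from the second, so swapping the arguments changes the answer. The vectors below are made up for the example.

```r
# Made-up example vectors, e.g. IDs appearing in two data extracts
ids_2023 <- c("a", "b", "c", "d")
ids_2024 <- c("c", "d", "e")

setdiff(ids_2023, ids_2024)  # in 2023 but not 2024: "a" "b"
setdiff(ids_2024, ids_2023)  # in 2024 but not 2023: "e"

# Elements found in exactly one of the two vectors (symmetric difference)
union(setdiff(ids_2023, ids_2024), setdiff(ids_2024, ids_2023))  # "a" "b" "e"
```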

We can also share our knowledge directly with our teammates, without waiting for a meeting. We don't want to hoard our knowledge; we want to share it. And we can use whatever format our team usually uses to communicate. If you're an in-person team and you always hang out by each other's cubicles, go ahead and chat. If you're on a remote-first team like me, share in Slack, Teams, email, or whatever you use; the format doesn't really matter. What matters is that we're sharing our knowledge. For example, maybe you found a blog post about ordering the colors in your legend so they match the top-to-bottom order of colors in your plot. Or maybe you found that two functions you use a lot can sometimes return multiple values instead of just one. Let your teammates know, since they're likely using the same functions, encountering the same issues, or wanting to pretty up their plots in the same way.

The inverse of sharing a public service announcement is asking for help. If you're wondering about something and you haven't found the answer yet, others might be wondering the same thing. Set an example that it's okay to need help, and make yourself approachable. This goes especially for senior team members. On a remote team, especially a bigger one, it can be difficult to post in a public forum or chat where everyone can see that you don't know what you're doing. As a senior team member, you can set an example, show that it's okay, and become more approachable.

And I want to emphasize that even if it takes 65 replies to get to the bottom of your bug, you have now saved a bunch of other people 65 replies' worth of work solving the same issue.

Demoing your work

Another cultural solution is demoing your work. Here, my co-worker Ashley is showing a dashboard that she made. She's showing what's possible with the tools we already have, or with new tools that she has implemented in our stack. It's also great to get feedback on your work from co-workers more senior or more junior than you. The audience can be small, just the people you work most closely with, or it could be your entire company.

What these solutions have in common

So what do all of these solutions have in common? I see two things. First, you can start unilaterally. Regardless of your seniority, and regardless of whether you control the purse strings, you can build a Google Doc of your favorite bug fixes, or share a blog post that you really love. Second, you can lead by example. By doing any of these things, you show that it's okay to do and encourage your co-workers to do the same. As a result, your work will become faster, smarter, and more legible to your teammates, and your team will level up.

Thank you.

Here you can see my LinkedIn, my email, and a link to the slides, as well as the QR code to the slides. And I believe we have some time for Q&A on Slido.

Q&A

That's correct. Thank you so much for your talk. So I have a question here. It says, the tool is amazing. I would love to use but I'm not working with U.S. data. Any scope of translating this tool to other countries? What is needed for that in terms of census data? Something tells me this is the wrong one. That might be a different talk. It does look like a different talk.

So can you give us an example of the first functions you put in the package you published internally? Yes. We have a lot of utilities for using the data that we share with our research partners, things like getting a patient's age, the span between their first and last visits, or their geographic location. We also have a whole other package that we use not to manipulate our data, but as infrastructure for connecting to our databases, authenticating with Google, and authenticating with RStudio Connect. So that's a whole other list of utilities that can go in an internal package.

That's awesome. Now we have one for this actual room. It says, do you have any advice for people who are a team of one? Good question. The intended audience of the talk was people on a team of two or more. On a team of one, although you might not get the same benefit from sharing a demo, you might still benefit from using an R package. You have the same capabilities for integrating with the RStudio IDE, and you can write documentation and run tests. I think largely anything that involves writing things down, as an artificial memory enhancer, is still going to be impactful on a team of one, because I can't tell you the number of times I've searched for my problem in the wiki or in my company Slack and found that I was the one who had the same problem six months ago. So maybe the same will work for a team of one. I don't know.

I'm very familiar with having the same problem six months ago. All right, I like this one a lot. It says, how do you motivate colleagues to get on board with new cultural practices? It's a good question. I'm lucky to have a team that is very on board with positive cultural practices, so I can't really speak to the other side from experience. What I can tell you is that if people see how it directly benefits them, they're more likely to buy in. So if someone has a question and you know the answer is already on the wiki, you can share a link to it. Or conversely, if someone answered a question that you had, maybe you can add that to the wiki for them and let them know, and that can show the benefits.

That's great. I think we have time for one last question. It says, where do you store your internal packages, and how do you get them on Connect? Is it Package Manager or something else? Yes, we use Posit Package Manager, and we have it set up so that it reads directly from our GitHub, which is private, as we are a private company. So we have a few packages that go to Posit Package Manager. They are rebuilt whenever a commit is made to the relevant package, and we can see them once we have our options in R configured appropriately. Well, thank you again. I appreciate it. Thank you.