Aaron Chafetz | Digging a Pit of Success for Your Organization | RStudio (2022)
Transcript
This transcript was generated automatically and may contain errors.
Thanks for joining my talk today. I hope you guys brought along your shovels since we're going to be talking about digging a pit of success. Over the next 15 minutes or so, I'm going to talk about a framework that we developed at our organization for developing a pit of success. And so let me talk a little bit about me before I dive into that.
My name is Aaron Chafetz. I work for a federal agency called USAID as an economist, and I work in the Office of HIV/AIDS. Our work in the Office of HIV/AIDS feeds into a larger interagency effort known as PEPFAR, the President's Emergency Plan for AIDS Relief. PEPFAR was started in 2004 with the goal of ending the global HIV epidemic. And we work in an interagency fashion with a host of other agencies like CDC, Department of Defense, and Peace Corps.
So one of the factors that has made PEPFAR so successful is the reliance on, and the push to collect, measurement and evaluation data across the world. So what does that data look like? We're collecting data on HIV status, treatment status, and prevention indicators, upwards of maybe 30 different indicators collected at 30,000 health facilities all across the globe. This data is also coming in to us each quarter. So as you can imagine, this adds up very quickly, and we are swimming in data.
What makes this even more challenging is when you don't have the right tools for the job. So when I started in the Office of HIV/AIDS back in 2015, the primary means of doing analysis was Excel. Excel is great, and clearly there are good uses for it. But when you're working with data that come in so frequently and are so large, it's probably not the right tool for the job. And at that point, you kind of have to recognize that there are better things out there, like R. And so in 2016 I started realizing that there was a use case here for using R. I had been using Stata as an economist, but even Stata, a stats package, didn't quite cut it in terms of the scale of the data we were working with. And I got really excited about R and really spent a long time learning it and then tried to convince other people in our organization that R was a great path forward and we needed to learn it.
The problem, though, is that R has a steep learning curve, and so trying to convince other people to learn R is much like telling them to just go out and run a marathon, right? It's not easy to do, especially when you don't have a training plan in place. And so after a few unsuccessful years of proselytizing R, just talking about it, what I realized our organization needed was a pit of success. We needed something so that users didn't have to go it alone when they were learning a new skill like R.
What is a pit of success?
The problem, I think, in terms of this idea of success is we often frame it as one of climbing a mountain, right? We have to fight this uphill battle, and only when we have summited this mountain after all of this work, we have achieved something. The idea of a pit of success, though, is flipping that on its head. It's saying why can't we create an infrastructure so that in order to succeed, it's as easy as falling into that pit? And in order to do so, we kind of lay out the guide rails and the framework to make this easy for other people to do.
This is by no means a new concept or one that I came up with. Hadley was talking about it way back in 2016 in relation to the tidyverse. And the tidyverse definitely was the primary pit of success for me in learning R, and I think this framework also applies to an organization. The problem is, if Hadley's talking about it, why aren't we all doing this already? And I think there are definitely some clear barriers in the way of doing this. Obviously, it's resource intensive to set up this pit. You need people to kind of shine that light to show others the path to move this forward. And it's time intensive, both for those people who are leading the way and for new people to come on and learn the skill. And then lastly, as part of the government, our hiring practices are definitely not as nimble as the private sector's, and our data stacks are a little slower to innovate.
Three ways to dig the pit
So I want to talk about three ways in which we were able to dig this pit of success at our organization. The first, the primary means of doing so, was developing packages to create that infrastructure, then creating buy-in at multiple levels, and then lastly, fostering a community of support for our users.
So starting things off, I think the foundations of this pit of success were through packages. In our work in PEPFAR, as I mentioned earlier, we have lots and lots of data, measurement and evaluation data, that's coming in every quarter. And this is a good candidate for automation, and R is a great tool for that. So if we have this data that's similar and coming in, we can start writing scripts that we can automate. And those scripts, as we get more comfortable, we can turn into functions, so that those functions are useful to you, but they can also be useful to others. And slowly, I started developing packages. Initially by myself, but I saw that this was a means to help others enter the field and learn R.
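A minimal sketch of that script-to-function step, using base R and toy data. The column names, the values, and the function name summarise_indicator are hypothetical and are not taken from the team's packages.

```r
# Toy quarterly data with hypothetical column names; not real PEPFAR data.
quarterly <- data.frame(
  facility  = c("A", "A", "B", "B"),
  indicator = c("on_treatment", "newly_enrolled", "on_treatment", "newly_enrolled"),
  value     = c(120, 15, 80, 9)
)

# What starts as a one-off script line...
#   aggregate(value ~ indicator, data = quarterly, FUN = sum)
# ...becomes a function that can be rerun on every quarter's file.
summarise_indicator <- function(df) {
  # Fail early if the expected columns are missing
  stopifnot(all(c("indicator", "value") %in% names(df)))
  # Total the values for each indicator across all facilities
  aggregate(value ~ indicator, data = df, FUN = sum)
}

totals <- summarise_indicator(quarterly)
print(totals)
```

Once a function like this exists, collecting several of them into a package is mostly a matter of organization and documentation.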
So as I became more proficient in R, and we started bringing on more people who learned R, and we had more expertise there, we were able to grow out this infrastructure to allow new people to enter more easily. So we've developed over time a ton of different packages, whether they are around solving a problem for a particular project that we have to repeat, or they're more general utility functions so that people don't have to reinvent the wheel every time they encounter a problem. So an example of this is one of our utility packages called glamr, where we do something as simple as setting up a folder structure, right? Doing this manually is very easy, but when you're talking about having to repeat this process across lots and lots of different projects and doing this across lots of users, having something that allows you to just fall into that pit of success is important. And we do this in a lot of different ways. So, for example, securely storing and loading secrets, to make it easy for people to have all their credentials in one place and to make sure that they're not posting their credentials to GitHub.
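The two utilities described above might look roughly like the following in base R. This is a sketch under stated assumptions, not the package's actual code: the function names setup_folders and load_secret, the default folder names, and the use of environment variables as the secret store are all hypothetical stand-ins.

```r
# Hypothetical helper: create a standard set of project folders.
# The folder names are illustrative assumptions, not the package's defaults.
setup_folders <- function(path = ".",
                          folders = c("Data", "Dataout", "Scripts", "Images")) {
  targets <- file.path(path, folders)
  # Only create folders that do not exist yet, so reruns are safe
  missing <- targets[!dir.exists(targets)]
  for (d in missing) dir.create(d, recursive = TRUE)
  invisible(targets)
}

# Hypothetical helper: read a credential from an environment variable
# (for example one set in ~/.Renviron) instead of hard-coding it in a script
# that could end up on GitHub.
load_secret <- function(name) {
  value <- Sys.getenv(name, unset = NA_character_)
  if (is.na(value) || !nzchar(value)) {
    stop("Secret '", name, "' is not set; add it to your .Renviron file.",
         call. = FALSE)
  }
  value
}

# Usage: set up a throwaway project in a temporary directory
project <- file.path(tempdir(), "demo_project")
created <- setup_folders(project)
print(basename(created))
```

Because every analyst calls the same helper, every project ends up with the same layout, which is the consistency the packages are meant to provide.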
So packages really formed the foundation of this infrastructure, because they created reproducibility for our work and consistency across all of our users. And more importantly, this created an approachable way for new users to come on without having to know the full R ecosystem, because they had tools that were specific to our workflows that they could use and take advantage of.
Creating buy-in
So while we had those packages, we didn't really have the buy-in. We didn't have people clamoring to be at the table to use R. And where we started to be able to turn things around was really convincing leadership that R was the right direction to go. Leadership is really critical, because they're the ones who are ultimately creating that tradeoff structure, telling analysts whether they should be focusing their attention on training versus having to turn around products, right? Because there's always that need for those concrete products. Leadership can create that influence both in the current workforce and in hiring, where they can treat R as a marker when they're looking at CVs for new staff.
And the way we were able to lure in our leadership was through concrete products, specifically around visualizations. We were able to build off ggplot by creating our own package, glitr, that used our own themes and colors and allowed us to quickly and efficiently create high-quality products that our leadership could use to understand where our progress was and use as communication pieces. And they wanted to see more and more of these more quickly.
We also had to convince our colleagues that R is a worthwhile investment, right? And so I think it's clear automation makes a ton of sense to all of us in this room, but there is that learning curve. And so we needed to convince our colleagues that that upfront investment cost is worthwhile. Spending anywhere from two hours to two weeks up front on that process means that in a year, when you're rerunning all of these things, it only takes seconds to run those analyses. That's as opposed to staying in the comfort of a point-and-click world where, yes, it's easy to do, and you have spent 40 minutes on it instead of two weeks, but you have to spend that 40 minutes every single time that you're doing this.
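The tradeoff can be made concrete with a little arithmetic. The sketch below assumes that "two weeks" means ten 8-hour working days and that a scripted rerun takes effectively no time; both are assumptions for illustration, and break_even_runs is a hypothetical helper name.

```r
# Number of reruns after which the upfront scripting time has paid for itself.
# All times are in minutes.
break_even_runs <- function(upfront_min, manual_min, scripted_min = 0) {
  stopifnot(manual_min > scripted_min)
  ceiling(upfront_min / (manual_min - scripted_min))
}

manual_min <- 40  # point-and-click time per run, from the talk

# Low end of the range: two hours of upfront work
low  <- break_even_runs(2 * 60, manual_min)       # 120 / 40  = 3 runs
# High end: two weeks, assumed here to be 10 working days of 8 hours
high <- break_even_runs(10 * 8 * 60, manual_min)  # 4800 / 40 = 120 runs

print(c(two_hours = low, two_weeks = high))
```

Under these assumptions the small investment pays off after a handful of reruns, while the large one needs the task to be repeated often or shared across many analysts.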
And so the way we were able to influence change for our colleagues was to tailor support to their different workflows and intervene in specific instances. So maybe it was working with a colleague to show them how to set up and work with an API to bring their data in without having to do a manual download. Or maybe it was showing them how to create calculations before they moved their data into Tableau. Or maybe they felt comfortable in the point-and-click world for all of their munging, but then wanted to have a high-quality product using R for the visualization side.
Fostering community
So it's great. We started to have some buy-in from our users, but we needed then this last piece of community. We needed to support them in some way. And I think everybody here is familiar, as you've started R or some new programming language, there's lots of resources out there, and it's easy to be inundated and overwhelmed by everything and not sure where you should start learning. And so we realized early on that we needed to curate an approach for them, a path that they could move forward so that they knew what direction to go and what was applicable for our work.
So as a result, we created a community of practice in our office called coRps, which is really founded on the idea of being able to share our workflows and best practices between inexperienced users and the ones who had a lot more experience. And as we developed more and more R users, we could do things like having one-on-one mentorship, since we had people who had been around and knew what was going on. We were able to develop classroom-like settings to have more people able to kind of know the 101s of R. Setting up a Slack channel was really important as well, because it created a forum where people felt like they could ask questions very easily, and other people could share new things that they'd learned. And it didn't have to be a one-way street like a classroom setting. And then lastly, just kind of setting up infrequent makeover challenges so people became more familiar with ggplot.
So here's kind of an example of how we tailored our approach in the classroom setting. Other colleagues and I set up a 12-part series that's based on R for Data Science. The core principles from R for Data Science are there, and many of these chapters are almost copied verbatim. But the difference is that we tailored the data to what our analysts were familiar with. We used our PEPFAR data so that an analyst could focus on the fundamentals of R rather than having to worry about new data. We also tailored all of the examples to things that worked in our workflow, so that they understood the data, they understood these workflows, and could go from the classroom-like setting to applying it the very next day in their work, which was quite successful.
Results and closing thoughts
So setting up this pit of success was something that didn't just happen overnight and did take a lot of time, and it's something that we are still continuing to do. And over the last few years, we've drastically increased our user base. But I think what's important is not just the number of users we have but the work we've been able to produce. We've been able to increase our efficiency and do more and higher-quality work as a result of more of an emphasis on R.
I think this model of building success and creating an infrastructure for others in your organization is really important, and I'm happy to be a resource for anybody who feels like they are that N of one at their organization and wants to set this up. My email is here, and it's on the next slide as well. And then I have some additional resources that are posted on this slide and are also on the conference website. So with that, I'll end, and I wish any of you who are willing to take on this challenge the best of luck in digging a pit of success for your organization.
