Lee Durbin - Coding in a Cyclone: open-source and the public sector in the birthplace of R

Oct 31, 2024
18:51

Transcript

This transcript was generated automatically and may contain errors.

I'm going to introduce myself in the Māori language to connect with the community and to recognize the importance of indigenous culture in my work.

So for those of you who don't speak te reo Māori, and I'm a beginner myself: hello everyone. My ancestors hail from Cymru, Wales. I arrived in Aotearoa, New Zealand in 2019. I currently live in Tāmaki Makaurau, Auckland, and my name is Lee.

So I work for Auckland Council, from the beautiful country of Aotearoa, New Zealand, from where I've flown to be here with you. But New Zealand had a pretty tough start to 2023. So on the 25th of January, Prime Minister Jacinda Ardern announced her resignation, which came as a surprise to just about everyone. She looks very happy here, but it was quite a sad time. And then just two days later, Auckland experienced some of its worst flooding in living memory. This image is taken from the inside of Auckland International Airport on that day.

And then just two weeks after that, a severe tropical cyclone was bearing down on the country, Cyclone Gabrielle, which you can see here approaching the top of the North Island. So I was working from home on that day, as was just about everybody that I knew, when I got a call from somebody at work, and the conversation went something like this.

Hi, he said. I work for Auckland Emergency Management. I manage the rostering team, and we need to coordinate our volunteers so we can distribute them across the emergency shelters tomorrow, after the cyclone has hit, to help our communities in need. We're currently using a big Excel spreadsheet to manage this, and it's kind of chaotic. Somebody told me that you're sort of good with data, so could you help us? We need it by 6am tomorrow. What do you reckon?

So I looked out of my office window at the giant elm tree as it was leaning towards my house and back again. I sat down. I took a deep breath. OK, I said. Here's what I can do.

The first realization: proficiency

Now, I realised three significant things in this moment. It was a significant moment for me, and there were three things that I realised. To understand the first of those things, we need to go back again three years to January 2020, because the first of those things is proficiency, which I define as delivering quality at pace.

So in January 2020, I first started working for Auckland Council. It was actually my first job in New Zealand, having moved there the previous year, and I was pretty excited to start as a senior data analyst in the libraries and information department. So I'm autistic, and this job helped me to combine two of my special interests, data and books. So I was really excited.

In my first week, my manager sat me down and she said, I need your help with something. We need to report some numbers to the lead team, but the data sources for these numbers are lots of different spreadsheets, 20 different spreadsheets that are all in different places, and surprise, surprise, they're all formatted differently. Can you bring all of that together into one data set and visualise it for the lead team? And by the way, we need to do that every month. What do you reckon?

So I sat down, I took a deep breath. Now, I knew before I moved to Auckland, I knew that R had been invented in Auckland. Some of you might know that. It was invented in part by this guy, Ross Ihaka, along with his colleague, Robert Gentleman, back in the 1990s. This wasn't the reason I moved to Auckland, by the way, that R was invented there. There's another story behind that. So I knew that R was invented in Auckland. And so I naturally assumed that all of the people working with data in Auckland Council would just be churning out R scripts all day, right? Wrong. The most common tool in use in the team I worked in was, can anybody guess? Excel. Yeah. Along with a bit of SQL and a bit of Power BI, but mainly Excel.

Now, I knew that if I were to do what my manager was asking me to do using Excel, then that would become my job. I would be spending days every month copying and pasting between Excel spreadsheets and it would be demoralising. I didn't want it to become my job. So I thought, well, R was invented here in Auckland. I work for Auckland Council, so I'll just do it in R. The problem was, in January 2020, I'd never written a line of R code. So I taught myself R.

One of the reasons I chose R was not just because it was invented in Auckland, but because of the incredibly supportive R community and the wonderful array of free, freely available resources online, which made this process so much easier.

Which includes this amazing book, which I'm sure many of you here are familiar with, R for Data Science. It really helped me along this journey, and I still refer to it all these years later. Incredibly helpful. As well as TidyTuesday, for those of you who know that, as well as Dave Robinson's TidyTuesday screencasts on YouTube. He doesn't do them anymore, but they're still really worth watching. They were really helpful. So there were loads of different resources that helped me along in that journey, such that within about six weeks or so, I had written my first R script. Yay. It was terrible. The code was awful. But it worked. It worked, which meant that this thing that would have taken me a couple of days every month took less than an hour. So I'd saved all of that time. But then at the end of this, I asked myself, but was it worth the time? Because I've just spent six weeks doing that.
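A consolidation script of the kind described here might look something like the following minimal sketch in base R. It is not the script from the talk: the two small tables, their column names, and the `standardise()` helper are all hypothetical, standing in for spreadsheets that would normally be read from files.

```r
# Hypothetical monthly extracts with inconsistent headers (illustrative data only).
branch_a <- data.frame(Month = "2020-01", Visits = 1200)
branch_b <- data.frame(month = "2020-01", `visit count` = 950, check.names = FALSE)

# Map every known header variant onto one standard column name.
standard_names <- c(
  Month = "month", month = "month",
  Visits = "visits", `visit count` = "visits"
)

# Hypothetical helper: rename columns, tag the source, and fix the column order.
standardise <- function(df, source) {
  names(df) <- unname(standard_names[names(df)])
  df$source <- source
  df[, c("source", "month", "visits")]
}

# Stack the cleaned tables into one data set ready for reporting.
combined <- rbind(
  standardise(branch_a, "branch_a"),
  standardise(branch_b, "branch_b")
)
combined
```

Once each source is mapped onto the same columns, adding a twentieth spreadsheet is one more `standardise()` call rather than another round of copying and pasting.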

So I often refer to this XKCD comic that some of you might have seen. And this is actually the background on my work laptop, because I'm always tempted to improve processes, and I have to kind of go, no, Lee, hold on. Is it worth doing this? This particular process was a monthly process. I spent about six weeks on it. And if you look at this chart, it was definitely worth putting in that time to improve this process. I saved a lot of time. And I was so enthused by this that I started automating all the things, any data wrangling process that came my way, building R scripts. More and more R scripts started to accumulate, which was great.
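The "is it worth the time?" check can be worked through as rough arithmetic. The figures below are assumptions loosely based on the talk (six weeks invested, roughly two days saved each month) and on the five-year window the XKCD chart uses; they are not measured values.

```r
# Assumed figures; only the rough magnitudes come from the talk.
days_invested      <- 6 * 5   # six weeks at five working days each
days_saved_monthly <- 2       # "a couple of days every month", ignoring the hour the script takes
horizon_months     <- 5 * 12  # five-year window, as in the XKCD chart

# Months until the time saved equals the time invested.
break_even_months <- days_invested / days_saved_monthly

# Working days saved over the whole window, net of the investment.
net_days_saved <- days_saved_monthly * horizon_months - days_invested

break_even_months  # 15
net_days_saved     # 90
```

Under these assumptions the investment pays for itself in a little over a year, before counting the R skills carried over to later projects.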

And so as time went on, I became better and better, and I became faster, and my code became better and better. And then in 2021, I joined an R-Ladies, I think it was an R-Ladies Melbourne virtual webinar, where Fonti Kar, who might be here today, led a session on building your first R package, which Seth has already nodded to from the session of last year's PositConf. Prior to that session, I thought that building R packages was like something that software engineers did. But it's not. I mean, it is as well. But if you want to get started, it's not as hard as you think. So that webinar got me started.

This book, which Seth also acknowledged, if two people independently are recommending the same things to you, you should probably check them out. This book was also really, really helpful, R Packages, as well as the Building Tidy Tools workshop from PositConf 2022, Emma Rand and Ian Lyttle. You can find all the material online. Amazing. Highly, highly recommend that as well. All the links for this stuff, by the way, will be in my GitHub, which I'll show you at the end. So I started building R packages. I started taking these scripts that I'd written and packaging them, which meant that I had better documentation, I had unit tests, all that good stuff. And it was great. So as 2021 went into 2022, I now felt pretty good about what it was that I was building. I felt proficient. I wasn't an expert. There are very many people, many of whom are in this room, who are much better at writing R code than me. But I didn't feel like an imposter anymore. I felt proficient. Which brings us back to February 2023, as that cyclone was approaching and I got that phone call. I knew I could help because I was proficient. I knew I could do this quickly because of that proficiency. I knew that it wouldn't be perfect, but let's be honest, anything's better than Excel.

The second realization: business value

So that was pretty good. As the storm approached and there were these winds all around the house, the house was shaking, the elm tree was leaning towards the house, and I was churning out this R code to build something better. And it was fairly simple. It was just a data frame that listed all the emergency shelters and all the shifts for each of those shelters and grouped them by different regions, connected that to somewhere where the rostering team could access it and have more control over allocating volunteers. It was pretty simple, but I did it. And then I realized the second thing as I reflected on that the following day, which was about business value, which I define as advancing the organization's mission.
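A data frame of that shape can be sketched in a few lines of base R. This is an illustration only, not the code written that night: the shelter names, regions, and shift times are made up, and the empty `volunteer` column is an assumed way of leaving slots for the rostering team to fill.

```r
# Hypothetical shelters, regions, and shifts (not the real Auckland data).
shelters <- data.frame(
  shelter = c("Shelter A", "Shelter B", "Shelter C"),
  region  = c("North", "North", "South")
)
shifts <- c("06:00-14:00", "14:00-22:00")

# One row per shelter-shift combination (by = NULL gives a cross join),
# with an empty slot for the volunteer to be allocated.
roster <- merge(shelters, data.frame(shift = shifts), by = NULL)
roster$volunteer <- NA_character_
roster <- roster[order(roster$region, roster$shelter, roster$shift), ]

# Split by region so each area can be reviewed and filled separately.
by_region <- split(roster, roster$region)
by_region$North
```

Keeping one row per shelter and shift means an allocation is a single cell to fill in, which is easier to check than a wide spreadsheet with a column per shift.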

So business value is a bit of a nebulous concept sometimes, particularly if you work in the nonprofit sector or public sector. So I work for Auckland Council and defining business value can sometimes be a little tricky when you're working in the data space because you don't always see the impact that your work is necessarily having. But here was a perfect example of direct impact from a solution that I had built using my R skills, using my data skills. So I felt pretty good about that. And then I started to reflect on the previous few years and realized that I'd actually been delivering business value all along, albeit indirectly. So I automated a process, which meant that I freed up my time, which meant that I could automate another process and deliver things as well. So I was delivering direct business value and indirect business value. But also along that journey, I had written a whole bunch of code that I never used that was a little bit self-indulgent. I just had fun writing the code. And in those situations, it's a bit like a pointless box. It's beautifully crafted. It's pretty cool, but it's maybe a bit pointless. Or is it? Because this is the sort of paradox of business value in a way.

In order to increase business value, you need greater proficiency. But increasing proficiency doesn't always necessarily increase business value. So, Baffle. I want to keep this family friendly. So if you don't know what that means, feel free to Google it afterwards. But essentially, in order to get better at something, you just have to play around. You have to experiment. You have to try things that will fail, that might not lead anywhere. But you learn from that. And then down the line, you can apply that learning to deliver that business value. So, business value can be direct, indirect, or maybe deferred, maybe delayed as you're learning.

The third realization: sustainability

So, I was at this point where I could quickly create a new solution, where I was delivering business value. And I felt really good. I felt like, hey, I'm the expert. I've got all this knowledge. They came to me. Awesome. As I was practicing this talk with somebody the other day, they said, you're sounding pretty egotistical at this point, which is true.

And that was the third realization that I'd had, which is this thing that I'd built, if the cyclone had swept me away, and this thing broke or needed to change, I didn't know anybody else who could do that at the organization. In other words, it wasn't a sustainable solution, which brings me to sustainability. So, what is sustainability? In a nutshell, to me, it just means just share things. Share your knowledge. Share what you know. Share what you've built. Share where you've learned things. Share the shortcuts along the way. Just share as much as you possibly can.

And that was the realization I had on that night, was I had gained so much value from the R community. I had benefited from what that community was sharing with me, but I wasn't sharing back. And I was really letting my community of Aucklanders down by not building sustainable solutions. So, I needed to change the way that I was operating. And I was in a really fortunate position in February 2023, because just a few weeks later, I was seconded into my manager's role. And so, now I was working with a team of data analysts that I had previously been in. And so, I had this nine-month window of opportunity to help them along on that journey, to share with them what I had learned, where I had learned it, and the shortcuts along the way.

And it's really easy to share when you write code. I love this quote from Jeffrey Snover, who invented PowerShell. He says, the mouse is antisocial, the GUI is antisocial. So, what does that mean? You have a problem to solve, and you solve it with the GUI, and what do you have? Problem solved. But when you solve it with a command line interface and a scripting environment, you have an artifact. And that artifact can be shared with someone. And I had built an entire ecosystem of artifacts that I could share with that team. And as time went on, they started to contribute to those artifacts themselves. And they started to build their own artifacts. This was amazing. And so, we had sustainable solutions, and we had this more sustainable culture. And I went a bit further. I created an internal R package that made things easier for them to work with some of our internal data. And I built a package that made it easier to build packages called PackagePal, which is on CRAN, if you want it. It's just a checklist that tells you these are the things you should think about when you're building an R package. So, that was great.

Wrapping up

And so, the end of this journey. A mature data culture is based on proficiency, business value, and sustainability. But remember, proficiency takes time and practice. There are no shortcuts. Business value is important, but increasing proficiency sometimes comes at the expense of delivering value. And that's okay. Sustainability is a choice that puts the needs of our community ahead of our own. Ultimately, in order to create a mature culture, we need all of these things in place. But a mature data culture is one where individual practitioners have the space to grow and the time to share. And everyone benefits from that. Thank you.

Q&A

Thank you, Lee. Really appreciate that. I really like that idea that the value could be deferred. I never thought about it that way. You practice, and the business value is just deferred while you're gaining the proficiency. As we were waiting here to get some questions in, I actually do have a question. It may not be specific to what the subject was, but what happened to the elm tree? Did it survive?

That's a bit of a sad story. So, it did survive the cyclone, but then it sort of contracted Dutch elm disease and we had to cut it down. So, sorry. Sad ending.

That's sad. I was hoping it would survive.

It survived the night.

The first package that you were able to write for your team internally, where was it published? Was it published on GitHub or did you have a way of being able to share internally?

Yes. So, I published it on a private GitHub repo. And so, then I added my team into the organization account I created and then they could just install the package from the GitHub repo in R using... There's a package that makes that really easy to install. So, yeah, there were a couple of them that I did make public. They were more kind of general use, but then we also had a bunch of internal private ones.

That's great. Well, thank you again. Appreciate it. Thank you. Thank you, Lee.