Meghan Hall | Cultivating Your Own R Ecosystem as a Solo Contributor | RStudio (2022)
Transcript
This transcript was generated automatically and may contain errors.
I am Meghan Hall, and I am going to be speaking today on the experience of being the only R user at your workplace. I'm so excited to give this talk. I'm sorry I'm not there in person as planned, but I do have to give a special thanks to the conference organizers, who so deftly handled my very last-minute switch to presenting virtually instead.
The team, I suppose, formerly known as RStudio, made these great talk graphics for us, but there is not quite enough self-promotion for my taste, so feel free to follow me on Twitter. But really, I just want to thank you all so much for watching the talk, and I want to give a special shout-out and welcome to anyone who might be watching virtually, or even watching this recording when it's posted later, maybe someone who's even a little earlier in their R journey, because this talk is also for you. There should be time at the end of the talk for questions. If any come up while I'm speaking, you can put them on slido.com with the conference hashtag, or I'll also be around the Discord server this afternoon if there are any topics you want to chat further about.
So this is my laptop. It probably looks like a lot of your laptops out there, but to me this is the most special laptop of all because this laptop is why I learned R.
To set the stage a little bit: I've been working as a data professional for over seven years in higher ed administration. I currently work at Brown University as a data manager, and in that role I am really the data person in a functional business office. I don't work in IT, I don't work on a data team, and there aren't a lot of other data-fluent people around me. As you can imagine, my environment is mostly Excel, and so at the beginning of my career I was also using Excel for my data analysis, because that was the atmosphere I was in. Then a few years ago I started getting into hockey, and because I am a numbers person, or whatever other euphemism you prefer to use for nerd, I pretty quickly started gravitating to the analytics side of the game and getting involved in the public sports analytics community. And as I embarked on my own analysis, I started with Excel, because that was what I was most recently familiar with, what I used at work for a lot of the data analysis work that I did.
But let me tell you, this lovely laptop that you see here, which I still have, this 2015 MacBook Air could not handle Excel files that had over a million rows. It just didn't work. It would crash; it couldn't even filter, couldn't even do basic stuff. So out of technical necessity, I started learning R. I had been somewhat exposed to R, SAS, and Stata back in grad school, but I decided to pursue R because it truly seemed to have the most welcoming and inclusive community, and it seemed like something that was really learnable to me. And so, in an experience that many of you are probably familiar with, I became happier doing my data analysis with R, as R is much better suited to that type of work than Excel is. But even though I was very happy using R for the sports analytics work I was doing on the side, it took me a little while to incorporate all of that R knowledge into my day job, the work I was doing for money, because, as I said, no one around me was using R, so it didn't seem like a very welcoming environment to start using it.
R doesn't have to be all or nothing
I realized I don't need to spend a lot of time preaching to this crowd that R is great, because we all know that R is an ideal tool for reproducible data analysis. One of the things that is magical and amazing about R is that you can use it for your entire data analysis workflow. But I would argue that what's equally amazing is that you don't have to use R for your entire data analysis workflow. That might work for a lot of people, but it might not work for everyone, and it might not work for you if it doesn't fit within the constraints you have at work. I would argue it's an equal strength of R that even if you just use bits and pieces of it, incorporating them where they really help solve your specific problems, you don't need the entire A-to-Z workflow in order to get the benefits of R.
And so you can really just focus on what is possible for you and what helps you, because it's certainly true that there are struggles to incorporating R into a workplace that might not feel very welcoming toward it. Today I'm going to talk about some of the struggles I have faced going through that process, how I've handled them, and why I still think they are greatly outweighed by the benefits you get from using R, even if you can only use bits and pieces of the ecosystem. I hope this talk will be helpful and inspiring to you, whether you yourself are starting to think about using R at work, whether you're like me, a lone wolf using R at work, trying to spread the gospel and looking for tips to make that job easier, or maybe you're involved in helping usher other people along similar journeys of incorporating R into their workflows.
In addition to my work in higher ed, I also spent some time at Zelus Analytics, which is a sports analytics company. There I worked as a data scientist among a team of other data scientists on a software development team, and my entire workflow was in R. Within R, I could write my SQL code and connect directly to the databases to pull the data that I needed. The products I created were in R Markdown, as were all my analysis, my visualization, et cetera. And of course everything was controlled through Git and version control. If you work in a traditional data role on a data team, that does not sound foreign or special to you at all, but I can tell you that not all of us are so lucky. Not everyone gets the luxury of choosing their entire tech stack. If you work in a role that's slightly less traditionally a data role, or a data role embedded among a lot of non-data people, far from other tech-friendly folks, your tech stack options might be limited and not ideal, which was certainly my professional situation.
While going through a couple of examples, I'm going to focus on my two best tips for dealing with that problem: first, always try to be creative and do as much as possible in R and as little as possible in other tools; and second, focus on what is really realistic for your situation, not what's ideal for the general population or for someone in a more traditional data science role.
Working around data access limitations
So perhaps you can't access data through R. I am very familiar with this problem: I cannot access my data through R, and I cannot connect directly to the databases that hold the data I need. I have to use an external Oracle-based reporting tool in order to access that data. Here's the process I've developed to handle that.
This is not a unique idea, but anytime I open a new RStudio project, I automatically set up some files that I always use, and my setup file is devoted specifically to all of my reporting decisions. The Oracle reporting software that I have to use is pretty powerful: it allows you to make calculations, create new fields, write SQL, and do some pretty complicated filtering. But I prefer to keep all of that in R as much as possible, so I can control, change, and more easily reproduce the code I want, and write it in the language I want to write it in. So within my setup files, I document all report decisions: the name of the report, where the report is located, any dates, filters, et cetera. I also make sure that the reports themselves are pretty broad, population-wise. I'm mostly dealing with people data, and I know that when I go to my next steps and analyze the data, I'll need to create new fields, filter, et cetera. So I try to pull the biggest possible population, the simplest version of the report that I can run in the external software, which lets me do all the data manipulation work in R, where R excels.
And this is just a very simple example of what one of my setup files might look like. They always start with lots of comments: the when, where, why, and how of the report. And it's also just a convenient place to read in all of my data.
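A minimal sketch of what such a setup file might look like. All of the report names, paths, dates, and fields below are hypothetical stand-ins, not the actual reports from the talk:

```r
# setup.R -- report provenance and data import for this project.
# Data comes from the external Oracle reporting tool, not a direct
# database connection, so every report decision is documented here.

library(readr)
library(dplyr)

# Report: "Active Students -- Broad Population" (hypothetical name)
# Where:  reporting tool > Shared Folders > Registrar > active_students
# When:   pulled 2022-07-01
# Filters applied in the tool: term = Fall 2022 ONLY. No other filters,
# so all further subsetting happens below in R, where it can be tracked.

students_raw <- read_csv("data/active_students_2022-07-01.csv")

# All field creation and filtering lives in R, not in the reporting tool
students <- students_raw |>
  filter(enrollment_status == "active") |>
  mutate(full_name = paste(first_name, last_name))
```

Keeping the report as broad and simple as possible in the external tool means the R file above is the single place where every downstream decision is recorded.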
Dashboarding outside of R
You also might need to dashboard elsewhere. If it were up to me, all of the end products of my work would be in R Markdown or Quarto or Shiny or something like that, but I work at a Tableau organization. I have to use Tableau. It can be a little dicey mentioning Tableau at an R-focused conference, especially when I know there's a Shiny talk going on at the same time right now, but it is a necessary tool that some of us need to use. Anytime I have a project that involves any kind of dashboard component, I automatically create a dashboard file. If you're not familiar with Tableau, it actually does have some pretty advanced capabilities when it comes to dealing with data: you can handle relational data, and you can look at data at different levels of detail, aggregations, calculations, et cetera. That's all fine and good, but again, I would really prefer to do all of that data prep work in R, where I prefer the language.
That way I have all the code in one file that I can much more easily change, track, and reproduce, rather than having it scattered about the Tableau software. So within this dashboarding file, which I create whenever I know I have a Tableau component to an analysis project, I make sure I have all the code: I create extra data files at the different levels of aggregation that I know I might need for Tableau, along with any calculations that I know I'll need to incorporate in my Tableau dashboard. And I make sure I save all of those data files in a specific folder. This is a very bare-bones example of what that dashboard R file might look like for a project, but any code to create anything specific for Tableau goes there, and all the various files it creates automatically get written into a special folder, usually just called "for Tableau," that lives in that project's working directory.
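A bare-bones sketch of that kind of dashboard file. The toy `enrollment` data, the column names, and the `for-tableau` folder name are all illustrative, but the pattern is the one described above: aggregate in R, write everything Tableau needs to one folder:

```r
# dashboard.R -- everything the Tableau dashboard needs, prepared in R
library(dplyr)
library(readr)

# Toy stand-in for data already read in by the setup file (hypothetical)
enrollment <- tibble(
  department = c("Biology", "Biology", "History"),
  year       = c(2021, 2022, 2022),
  credits    = c(12, 15, 9)
)

# Pre-aggregate at each level of detail the dashboard will need,
# instead of building those calculations inside Tableau itself
by_department <- enrollment |>
  group_by(department) |>
  summarise(headcount = n(), mean_credits = mean(credits), .groups = "drop")

by_department_year <- enrollment |>
  group_by(department, year) |>
  summarise(headcount = n(), .groups = "drop")

# Every file Tableau consumes gets written to one dedicated folder
dir.create("for-tableau", showWarnings = FALSE)
write_csv(by_department, "for-tableau/by_department.csv")
write_csv(by_department_year, "for-tableau/by_department_year.csv")
```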
So that means that when I actually need to open Tableau, I just import all of the data files in that one specific folder. Then I can use Tableau for what it is, admittedly, really good at: efficiently making really beautiful, interactive data visualizations that fit with the IT infrastructure at my organization, while leaving all of the things I prefer to do in R, the data prep work, joining data, et cetera, in R, where I can more easily track them.
Version control workarounds
And lastly, this might be shocking to some of you depending on the type of role that you are in, but not all of us have access to Git or any kind of version control. I don't, unfortunately. So within my data prep and data analysis files, any files where I'm writing code and making decisions about data, I have a lot of comments, probably more comments than a lot of people. If I don't have access to the collaboration and communication benefits of version control, I need to do that collaboration and communication somewhere, and for me, it's easiest within my workflow to use dated comments in my data files.
So I comment any type of major decision that gets made, and in the comments you can also link to any kind of supporting documentation, whether that's an internal wiki or even something as simple as referencing a March 25th email chain where you decided to change the definition of some metric. Now, to some of you, this might seem like, whoa, that's a lot of comments. I won't show you an example file, just not to scare you. I know there's a working theory out there that if your code needs comments, your code isn't clear enough, that code is only clear enough when it doesn't need any comments to explain what's going on. And that might be a perfectly valid philosophy in some organizations and some setups. But if you don't have version control, and I certainly don't want multiple versions of my data prep files floating around, I need to use comments in order to communicate and collaborate with someone else, or, more likely, with myself in six months. And so, as is underlined at the bottom of the screen: focus on what's realistic for your situation, even if it might not be the commonly supported practice or what people think is ideal for the general population. Ideal for everyone does not mean ideal for you.
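To make the convention concrete, here is a small illustrative slice of a data prep file where dated comments carry the history that version control would normally hold. The dates, decisions, and toy `students` data are all made up for the example:

```r
library(dplyr)

# Toy stand-in for data read in elsewhere (hypothetical)
students <- tibble(
  id          = 1:4,
  status      = c("enrolled", "leave_of_absence", "enrolled", "enrolled"),
  fall_return = c("2022-09-01", NA, NA, "2022-09-01")
)

# 2022-03-25: per email chain with institutional research, the retention
#             metric now EXCLUDES students on leave of absence
# 2022-05-02: definition also recorded on the internal wiki
retention <- students |>
  filter(status != "leave_of_absence") |>   # 2022-03-25 decision
  mutate(retained = !is.na(fall_return))
```

Six months later, the comments answer "why does this filter exist?" without any commit history to consult.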
The benefits outweigh the struggles
So, given that, there are definitely struggles. I've just demonstrated a handful of the ones that I personally have had to come up with techniques to face. There are definitely challenges, but I would argue the benefits greatly outweigh them. I'm going to go through a couple of examples of the things I have found, in my work as essentially the sole R user, to have the highest leverage on the work that I do, mostly because they leverage the benefits of R: how easily it can ease the burden of repeated reporting, if that's a feature of your job, and how easily it can transfer and hold institutional knowledge.
Internal packages
And so the first, and biggest, leverage win that I have is through the development of internal packages. If you don't know, as I didn't know when I started with R, packages do not need to be public. They do not need to live on GitHub. You can have private packages that are only for you or only for your team, and they can live on a shared drive; there are ways to deploy your package to a shared drive. Packages are incredibly useful for having easy access to common datasets that you use across multiple projects, and also for documenting data definitions and calculations. We can probably all agree that documentation is a near-universal problem, and I cannot promise that having data definitions in your internal package will solve that issue in your organization. But whether your organization uses some other enterprise tool or internal wikis, it is also really nice to have those data definitions and calculations right at the fingertips of the people who are using them, when they're using them.
So an internal package won't solve the entire data documentation problem, but it can help solve it at the very small, micro level, even if that's just within your team, documenting the data fields, functions, et cetera, that you use on a frequent basis. And lastly, the most common thing we think packages are for is packaging together functions, right? So it's very easy, of course, for an internal package to hold any common analysis functions that you use, as well as ggplot themes. As a specific example: I obviously do a lot of data analysis, and a small but frequent part of my job is when people ask me to run an analysis and create plots that other people use to present.
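As an illustration of definitions living at the user's fingertips, a function in a hypothetical internal package can carry its data definition in its roxygen documentation, so `?retention_rate` shows it on demand. The function name and definition here are made up:

```r
# R/metrics.R in a hypothetical internal package

#' Retention rate
#'
#' Share of an entering cohort that returned the following fall.
#' (This definition is illustrative; yours would match your institution's.)
#'
#' @param entered  Number of students in the entering cohort.
#' @param returned Number of those students who returned.
#' @return The retention rate, rounded to three decimal places.
#' @export
retention_rate <- function(entered, returned) {
  stopifnot(entered > 0, returned >= 0, returned <= entered)
  round(returned / entered, 3)
}

retention_rate(4, 3)  # 0.75
```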
Now, I would prefer to do my presentations in Quarto, but no provost of mine is viewing a presentation in Quarto, however cool that would be. They obviously use PowerPoint. So I pretty frequently get asked to run or rerun certain analyses to create a plot that someone can drop into a PowerPoint presentation. And thanks to my package, I have several ggplot themes, including one that is specifically for plots created for PowerPoint presentations. My theme has my university's fonts and colors, and it has the font sizes that I know work really well for a PowerPoint presentation. I can very easily attach it to the plots that I create, generate all of those plots, save them in a specific place, and even track and reuse the specs that I know work perfectly with my theme, to create plots that look cohesive and appropriately sized in the PowerPoint templates that we use. So again, just a very small, specific example of how using the features of internal packages has saved me a ton of time over the years.
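A minimal sketch of that pattern. The theme settings, the brand color, and the save specs below are hypothetical placeholders; the point is that they are defined once and reused everywhere:

```r
library(ggplot2)

# A hypothetical PowerPoint theme: sizes and styling chosen once
# to fit the slide template, then attached to every plot
theme_powerpoint <- function(base_size = 18) {
  theme_minimal(base_size = base_size) +
    theme(
      plot.title       = element_text(face = "bold", size = base_size + 4),
      panel.grid.minor = element_blank(),
      legend.position  = "bottom"
    )
}

p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point(color = "#4E2A84") +   # stand-in for a university brand color
  labs(title = "Example plot for a slide") +
  theme_powerpoint()

# Save with width/height/dpi specs known to fit the slide template
dir.create("plots", showWarnings = FALSE)
ggsave("plots/example.png", p, width = 9, height = 5, dpi = 300)
```

In an internal package, `theme_powerpoint()` would be exported alongside the analysis functions, so every plot destined for a deck starts from the same specs.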
Parameterized reporting
Likewise, I've been able to leverage parameterized reporting with R Markdown and Quarto, which I'm going to use slightly interchangeably in this example. Whichever one you use or move toward using is, as we all know, so great because there are so many output formats, including Microsoft Office formats, if that's something that you need or use. And being able to leverage parameterized reporting within R Markdown is so useful if you have any combination of code, text, plots, and data with varying parameters. If repeated reporting is any part of your job, I guarantee that you'll be able to find some efficiencies through parameterized reporting.
To go through one example: let's say you have an R Markdown file that creates a PDF report with some text, some plots, and some tables, and you know that you need to create this report every year for, let's say, each division that you or your group oversees. We can use the render function from the rmarkdown package to specify where these files should be saved and how they should be titled, and then use a little bit of functional programming to render that R Markdown file, for a given year, for every unique value of division within your data set. Just those few lines of code make it extremely easy to turn one R Markdown file into a whole bunch of PDF files, saving tremendous amounts of time. I have a few similar projects, reports that get run a couple of times a year for, in my case, dozens of departments, and doing that manually or quasi-manually through Tableau or Word or some other system is just not nearly as efficient as doing it in R Markdown.
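Those few lines of code might look something like this. It assumes a hypothetical `division_report.Rmd` whose YAML header declares `division` and `year` under `params:`; the division names are made up:

```r
library(rmarkdown)
library(purrr)

# Hypothetical: the unique divisions from your data set
divisions   <- c("Arts", "Sciences", "Engineering")
report_year <- 2022

dir.create("reports", showWarnings = FALSE)

# Render the same .Rmd once per division, producing one PDF each
walk(divisions, function(div) {
  render(
    input       = "division_report.Rmd",
    params      = list(division = div, year = report_year),
    output_file = sprintf("%s-%s-report.pdf", report_year, div),
    output_dir  = "reports"
  )
})
```

One `.Rmd` file in, a folder of consistently named, consistently formatted PDFs out.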
Closing thoughts
So, there are for sure struggles. I'm not going to pretend that it's totally smooth sailing if you are the only person using R at your workplace and trying to fit R into an existing workflow. But hopefully I have demonstrated that the benefits of using R, even if you only use specific bits and pieces of it to solve the specific problems you face, even if you can't use an entire R workflow, are worth overcoming whatever struggles you might encounter. Because the time you spend reproducing your own analysis is just wasted time. I have been in that place where you open an Excel file from last year, and you should remember everything that you did, but you don't, and because it's Excel, it's not recorded very well and it's not reproducible. Having to reproduce your own analysis on any time interval is time wasted that could be made much more efficient with R.
And so, the less time you spend doing that, the more time you have for whatever work in your line of work you find important, that you are uniquely good at, that you are able to put your specific skills toward. So hopefully this talk has inspired you to continue to incorporate bits and pieces of R into your existing workflow, even if you can't use all of R the way you see other people use it. And I hope it has comforted you that not everyone uses the entire R ecosystem A to Z. It doesn't mean that you aren't a real programmer or a real coder. As long as you're working within the constraints of your job, you can still find ways to use R to help ease some of your specific problems.
Thank you again. Hopefully there is time for questions, but if not, I will be in the Discord server this afternoon.
