Kelly O'Briant | Building a business case for data science & advocating for analytic infrastructure
Transcript
This transcript was generated automatically and may contain errors.
Hi again, everybody. Thank you so much for joining us today. Welcome to the RStudio Enterprise Community Meetup. I'm Rachel calling in from Boston today. I will be heading to Nashville for my birthday. I'm super excited, heading there later today. But I'm joined by my colleague, Lou, who's going to be helping out in the chat a bit if anybody has questions there.
But if you've just joined now, feel free to introduce yourselves through the chat window too and say hello, maybe where you're calling in from. I do want to also make note that if you want to turn on live transcription during the meetup, you can do so as well in the Zoom bar below, you just press the more button. But to go through a brief agenda, we will go through some short introductions of this meetup group. And Kelly will introduce us to tips for navigating internal processes and advocating for data science infrastructure, while also sharing her own experience in doing so. And so hopefully that will be able to help a few of you also do the same.
We're doing things a little bit differently here today. So after Kelly's presentation, we'll stop the recording and would love to open it up to discussion with everyone here to hear what you're facing and how we can all help each other too. So for anyone joining this group for the first time, a special welcome to you. This is a friendly and open meetup environment for teams to share the work they're doing within their organizations, teach lessons, learn, network with each other, and really just allow us to all learn from each other. So thank you all so much for making this a welcoming community. We really want to create spaces where everybody can participate and we can hear from everyone. So I'd like to reiterate that we really love to hear from everyone no matter your level of experience or the industry that you work in. So for questions for the meetup, you can always ask questions in the Zoom chat or raise your hand to ask live. But we also have a Slido link that you can use so you can ask questions anonymously as well.
And if you ever have suggestions or general feedback, please let me know. I'll share a few links in the chat. But if you do find this format helpful today, we'd love to have more champion chats. So I'll start by saying the third Monday of every month. Maybe we'll try to stick to that, but I'll share more info when I put the recording up as well.
But with all of that, thank you again so much for joining us. I'd love to turn it over to our speaker today, Kelly O'Briant. Kelly is a product manager here at RStudio and is interested in configuration and workflow management with a passion for R and data science.
Thanks, Rachel. Yeah, I'm honored to kick off the first champions chat here. And I just have a couple of slides put together to hopefully spark some discussion later on. And I apologize up top that my voice is a little terrible because I did like three days of hard gravel riding over the weekend as my fun vacation. And I am suffering now. But happy to be here talking about analytic infrastructure, which I have been passionate about for a long time.
Before I was a product manager here at RStudio, I worked on the solutions engineering team. And that's how I joined RStudio. And I see we have, I think, a couple of solutions engineers on the call. Gagan, I saw on the call. And we've got Nick from Customer Success. I worked with him for a long time here at RStudio. And Rachel, of course, our ultimate champion of champions.
And so happy to share some of the experience that I've gained here at RStudio and that brought me to RStudio, actually. So I'm going to talk a little bit about what advocating for analytic infrastructure sort of looks like to me and in my past lives. And I want to talk about what it is and why it matters and ideas for identifying leverage points for yourself so that you can be a better advocate for your own workflows internally at your own organizations. And so that you know how to interact with the community and ask for the support that you need from us and from the community at large.
Identifying where you are in your organizational culture
So before I get into that, I want to start by acknowledging or identifying where you are now on the spectrum inside of your own organization. And this slide, it can be a little bit depressing. So I promise that we'll have some more optimism from here. But everybody starts inside of an organization that's at a different stage in its organizational culture and its growth. And it's important to identify where you are on the spectrum of organizational culture typology because you need to know what you're sort of working against when it comes to doing advocacy and asking for change inside of your own organization.
So on the right-hand side, you have your generative, performance-oriented organizations. This is the ideal. If you are in this type of organization, hopefully you feel really supported and you can ask for what you need and you can build business cases to support those asks. And everybody is cooperating, and building bridges between teams is highly encouraged. You can take risks and feel supported in that area. You can fail and ask questions about what happened and you are, through all of that process, sort of taken care of.
In the center, you've got your rule-oriented organizations. These are more bureaucratic. There's some cooperation, but not super highly motivated cooperation. You're sort of ignored, and bridges are tolerated but not encouraged. Also, novel problem-solving can work over time, but you have to work really hard at it. Maybe you have to build some new rules along the way or show why the old rules are outdated. And it's a lot of work to get things done.
On the left-hand side, you've got your pathological organizations, which is a funny word, but I've definitely been a part of these places where they are entirely power-oriented. There's very little cooperation. You don't want to stick your head up or offer new solutions because you'll be shot down and everybody is shirking their responsibilities. Everybody is scapegoating and all novel ideas are crushed. This is a real soul-sucking place to be in. If you are in this place, I'm very sorry.
I hope that you'll find some support within this group as well because you can find yourself inside of these places and teams. It's really unfortunate. We want to help you understand and see from the outside that it's really almost impossible to make change and advocate for yourself if you are truly inside of a pathological organization. We want to see you move towards healthier spaces where you can find more cooperation and support. I just lay this out at the beginning of every time I give one of these talks because I know everybody's coming from different types of organizations and spaces. I don't want to paint this as a super rosy picture where you can do everything because sometimes there are just forces at play that are really difficult to navigate and move against. I'll leave it at that.
What is analytic infrastructure
We'll now go into the more optimistic side of the story. I hope that as you're looking at yourself in this picture, you're finding yourself more on the right-hand than the left-hand side. If you are more on the left-hand side, then just know that this is a safe space. We're here to talk about it. What is analytic infrastructure? I define it as all the how, where, and with what that goes into doing your daily data science work. This is a picture of a bunch of stuff that could make up portions of your analytic infrastructure environment. Ignore the red line in the box in the center. It's just in there from a different diagram. You've got data sources. You've got processes. You've got artifacts that you're building. You're deploying, managing, scaling those artifacts that you've built off of processes that involve data sources. You could be integrating with other things like BI, databases, reporting systems, workflows. Underneath, you've got security and control, access validation, and then other things, programming languages, packages, and dependencies, and operating systems. All of these things play into how you do your daily data science work.
All of these things look different in different organizations. They'll look different for you. We use terms like levels of sophistication, but even that is a little bit, I don't know, hard to put into a box. When I have previously polled groups of our users about what their definition of production is, here are some of the varied answers I've gotten to that question. Production could be doing things at scale. It could be things that are customer or user-facing. It could be something as technical as production is a service-level agreement, there for mission-critical operations. It could just be a set of environment requirements, like areas where validated applications are deployed in a locked-down way. It could be that production means just documenting processes, doing due diligence around testing and monitoring. My favorite answer is in the middle. Somebody just wrote a single-word response, credibility, which is great. You can see there's all of these different things that go into production. Production, to me, is how you define the environment across the board, from starting with development all the way to whatever it ends up being that you deliver. Those are the environments that you're doing your work inside of.
Don't get caught up in the hype
I don't want people to get caught up in the hype machine here. This is a screenshot of me Googling DevOps tools and the crazy sets of tools that you could employ to build out these environments. I'm not going to provide any solutions here. RStudio sells products, but you can build environments completely on open-source tools. We're not here to give you any easy answers even because there are none. There is no standard architecture because you know the types of problems that you are trying to solve. You need to understand what tools make sense for you in your own environment. There's also no perfect deployment pipeline. Everything is and should be evolving over time. You don't even get to rest on your laurels. You should always be evaluating your processes and determining whether you can make them better.
Tactical metrics: code deployment lead time
These are all lessons I've learned from the world of DevOps. When I say DevOps, I don't mean the tools that you find when you Google what is DevOps. I mean the philosophy of DevOps, teams working together to accomplish goals inside of an organization.
One of the really interesting, I think, tactical metrics that spoke to me when I did a deep dive on DevOps was the idea of code deployment lead time. This is something that I've never bothered trying to measure in my own life, but I've realized that I was, as I have been on different teams trying to accomplish goals and projects, almost subconsciously measuring how long does it take us to accomplish this thing? How long does it take us from going from raw materials, point A, to some kind of finished product? When I feel like that time is growing or it's taking too long, it starts to grate on me in my heart, in my core being. It eats away at me. That's what, I think, makes me a natural analytic administrator. It makes me a champion because I am very bothered by these types of problems.
Both of these questions are really interesting to ask yourself if you haven't. How long does it take you to get from raw materials to a finished product? Is that something you measure or is it something that you just intuit and eats away at you? How many teams do you have to traverse to make real impact with the product of your work? That's another really interesting one. It may or may not be relevant to you, but also very relevant to the world of DevOps philosophy.
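One way to make those two questions concrete is to write the numbers down. What follows is a minimal sketch in Python, not something shown in the talk: the project names, dates, and team lists are made up for illustration, and "lead time" is assumed here to mean calendar days from starting on the raw materials to delivering the finished product.

```python
from datetime import datetime
from statistics import median

# Hypothetical project log. Names, dates, and teams are illustrative only.
projects = [
    {"name": "quarterly-report", "started": "2022-03-01", "delivered": "2022-03-15",
     "teams": ["data science", "IT"]},
    {"name": "churn-dashboard", "started": "2022-04-04", "delivered": "2022-04-08",
     "teams": ["data science"]},
    {"name": "pricing-model", "started": "2022-05-02", "delivered": "2022-06-01",
     "teams": ["data science", "IT", "security", "finance"]},
]


def lead_time_days(started: str, delivered: str) -> int:
    """Calendar days between starting on raw materials and delivering the product."""
    fmt = "%Y-%m-%d"
    return (datetime.strptime(delivered, fmt) - datetime.strptime(started, fmt)).days


# Question 1: how long does it take to get from raw materials to a finished product?
lead_times = {p["name"]: lead_time_days(p["started"], p["delivered"]) for p in projects}
median_lead_time = median(lead_times.values())

# Question 2: how many teams did the work have to traverse?
teams_traversed = {p["name"]: len(p["teams"]) for p in projects}

for name in lead_times:
    print(f"{name}: {lead_times[name]} days, {teams_traversed[name]} team(s)")
print(f"median lead time: {median_lead_time} days")
```

Even a rough log like this, kept for a few months, gives you a number to bring to the table instead of a feeling that things are taking too long.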
Why does this matter? Your analytic infrastructure is what enables you or teams to deliver value through decreasing that code deployment lead time. It also dominates how your daily work is performed. These are pretty obvious lessons, I think. This author, Gene Kim, has written a number of really influential books in the DevOps space, two of which are novels about IT and DevOps. One is called The Phoenix Project, and one is The Unicorn Project.
I've enjoyed both of them. If you want to read a really corny novel about IT and DevOps, I highly recommend it. This quote is from him. Improving daily work is even more important than doing daily work. If you don't have the tools you need to accomplish the data science daily work that you're doing, if you are doing things in spreadsheets that take days instead of what you could automate in minutes or even seconds, it's affecting you, it's affecting your team, it's affecting your organization's bottom line. Taking the time and energy to advocate for better processes is, in fact, more important in those cases than doing the daily work of those tasks and moving your spreadsheets around from place to place. That's just an example, but it's something from my real life. I have spent days and days and hours and hours doing things inside of spreadsheets that could be automated.
It's really interesting because I have a really high tolerance actually for doing that sort of stuff, even though I said it does crush my soul at a certain point. I find it very meditative to work inside of spreadsheets. I don't have anything against spreadsheets, but there is also this power and real importance to advocating for process automation and getting to the point where you can see and make the business case for the importance of automating basic processes. I have had a lot of success in my own career doing that sort of advocacy and work, but it does sometimes come at the cost of then realizing the type of organization that you're a part of.
Really, these stories are both the same, both The Phoenix Project and The Unicorn Project. They're both very fun because they start inside of organizations that are somewhere between bureaucratic and a little bit pathological. Then they build out from that. You watch these characters claw their way back out of these organizations and build a better future for themselves. I read The Unicorn Project probably whenever it came out in 2020 or 2019.
This captures this really visceral feeling that I'm talking about. In this scene, the main character, Maxine, is a developer. She's trying to improve how development work is done on her project in this crucial, critical way. Everything she tries fails. Every bit of progress she makes seems impossible. She's at her wits' end. I'll do a dramatic reading. She buries her head in her hands and silently screams down her keyboard.
If you are looking for some sort of catharsis, I highly recommend this book.
Phoenix is the most important project in the company. They've spent $20 million over three years. Yet, here she is trying to help. They won't spend $5,000 on more disk space. Now, she won't get a development environment for five months. If you've ever felt like you have been in this scenario, then read this book and feel better.
The analytic administrator
Also, come to this group. This is who we all are. We're this type of person. We've experienced this type of pain. Nathan Stephens, my old boss here at RStudio, who has since moved on to bigger, better roles, maybe. We're sad to see him go. He wrote very eloquently about analytic administration and the type of role that this person, at the time we called it the R admin, inhabited. Now, we're more language inclusive. We can open it up to R and Python admins or data science admins in general. It's a data scientist who onboards new tools, deploys solutions, supports existing standards, works closely with IT to maintain, upgrade, and scale analytic environments, influences others in the organization to be more effective, and who is passionate about making data science a legitimate analytic standard within the organization. I really like this definition of who we are in this space because it feels very powerful and it really spoke to me. This type of thought leadership that Nathan was writing at the time was one of the reasons why I joined RStudio, the solutions engineering group, because I felt a real sense of community here, and that's what I hope that we can chat about today.
Challenges for analytic administrators
I'll open it up to the group from here, but I wanted to leave this as sort of, here are some of the challenges that I see for analytic administrators in this space. I typically break them out in different talks into organizational challenges and technical challenges. On the organizational side, like we've been talking about, there's the question of credibility and legitimacy. Do you have that? Are you able to advocate inside your organization from a credible and legitimate place? Do you have the right leverage points or business cases built out to support you when you're asking for the tools that you need? Do you have the right relationships? Have you built out a functional working relationship with IT, and how do you go about doing that? And then on the technical side, are there places where you don't even feel like you're ready to take on the mantle of the analytic admin title yet? Maybe you need some more technical skills, experience, education or exposure in order to get to that level. And so across these topics especially, I would love to know how we could support each other as a community, and how RStudio can support you as well.
And I think I love this site, especially the building a business case section of the Champion site, if you haven't checked it out, at rstudio.com/champion. I know Rachel had a big hand in putting this together, as well as I'm sure a bunch of other people and I'm sure some folks on this call, some customers who have helped us out in a huge way in pulling this information together. This section in particular on how to build a business case and a playbook for that is really powerful. So I recommend starting here, especially if you're not sure where to get started or need ideas for how to get conversations rolling. Like, you need to first and foremost come to the table with some sort of ask, some sort of plan, and this is a really great framework for thinking about how to start building that. So I'll stop talking. Sorry. Once I get rolling, I just keep going.
That was great. Thank you so much, Kelly. We were just saying in the chat when you were showing the image from the plumber hex sticker, I always equate that to you because I first saw that hex image at a Women in Data Science conference and it's when I met you. So it's funny to see it there too. Just a reminder to everyone that you can ask questions on Slido as well if you wanted to ask anything anonymously. But I am going to stop the recording now just so that everybody can feel free to just jump in to the conversation as well. But for people watching the recording, thank you.
