Communicating the value of data science | Led by Merav Yuravlivker
Transcript
This transcript was generated automatically and may contain errors.
Well, hi, everybody. Thank you so much for joining us today. Happy Monday. Welcome to the RStudio Enterprise Community Meetup. I'm Rachel calling in from Boston today. I would love to introduce our speaker, Merav Yuravlivker, co-founder and CEO of Data Society. Merav will share with us some lessons in communicating the value of data science and bridging the gap between teams.
Well, thank you so much for joining. Let me go ahead and share my screen here so you can all see what I'm seeing. This is meant to be informal. As Rachel said, this is really, really fun for me to do these types of presentations to be able to share the best practices that we've learned and then hear from you and hear what else you're interested in. So don't be shy with questions. That's exactly why I'm here.
So I like to call this presentation talk data to me because I'm a nerd and I like puns. But really, the subtext here is communicating the value of data. So I'll give you a bit of a history of myself and talk a little bit about what I do. Before I do that, just wanted to let everyone know, presentation best practices. So I like to have this at the beginning of each one, just as a gentle reminder. We want this group to be interactive. You're all here to get value and to learn something new.
So as Rachel said, I'm the CEO and co-founder of Data Society. Just to give you an idea of who we are, it's a company that I helped start back in 2014. I actually did not start in the data science field. My background is in education. I started as a public school teacher in New York City, teaching elementary special education for a number of years. And that's where my passion is. It's really in education and empowering individuals and providing skills for them to really supercharge what they're able to do and maximize their impact. Now that's really what we do at Data Society, where we just help professionals use data better. We deliver custom data science training programs to organizations.
The frustrated data scientist survey (2018)
Let's turn back time. 2018, folks. What was happening in 2018? Well, for those who don't remember, Black Panther came out. People were eating Tide Pods. Does anybody else remember that in the U.S.? Beyoncé made history at Coachella. This is also when the Time's Up movement started. K-pop gained popularity kind of all over the globe. Harry married Meghan. They were also still royals.
Specific to data science, when we think about what did the field look like back then, there were a few things that were happening at the time. The first one is that there was more of a focus on some data pipelines starting to shift. APIs were becoming very popular because people just heard that they were being used a lot. There were some advances in NLP as well as deep learning. I believe the BERT model paper came out during that time.
There was also an increased demand on model interpretation. As data was growing, what I would say is more people were realizing, oh, we don't really understand this. Just to give you an idea, when we started in 2014, we would go to meetings with potential clients and partners, and we would say, we teach R programming, for example. We got the question, no kidding, we got the question, oh, why is it called R? Why didn't you call it S? We say Python programs. Oh, why is it called Python? Really, if we remember all the way back in 2014, open source languages were not really well known outside of the data community, statistician community.
Fast forward to 2018, it became a lot more popular, but what we found is that a lot of data professionals who we worked with felt just really frustrated and stuck. Typically, they were maybe the only person in the organization that knew how to use one of these tools. They got a lot of nonsensical requests. They spent most of their time explaining data, digging around in data, using Excel, you know, so there was a lot of that happening.
One of the things we wanted to find out is, okay, well, how can we better help people? So, back in 2018, we were a pretty small company, pretty small community, so I just posted something on Medium, said why we're giving away data science training materials for free, because I built a survey that kind of asked some of these questions to understand what the pain points were, and at the end of the survey, I'd be able to compile the results, take some of those trends, and then be able to actually build a toolkit that could address some of these pain points.
I left that survey up for a month-ish. We ended up getting almost 200 responses, which for us was great at the time. Like I said, we're a small company. We're bootstrapped, so we didn't have a VC or anything like that, and then I took that. We then posted the results on Medium again, and then we were able to send out a toolkit that was really focused for data professionals who felt frustrated, felt as though they were stuck, felt as though they couldn't communicate insights correctly.
And I have a really terrible joke that I'll tell you now, which is if a data scientist finds a new insight but nobody around her understands it, does it matter? Sadly, the answer is no. It's kind of a sad joke, but I appreciate it for anybody who gave me the pity laugh while you were on mute, so thank you.
And so the results that we found at the time, this was again, this was 2018, leadership, non-data colleagues, they didn't really understand what data could do. About 30% of data professionals' time was spent explaining results, so imagine in 40 hours, in a 40-hour week, you're spending 10 to 15 hours of that literally just telling other people about the work that you're doing. And we got a lot of magic comments in our open-ended section where a lot of people said, yeah, my boss feels like what I do is magic. I just click a button, magic happens, right.
The data science communicator toolkit
So based on that, we developed a toolkit, and what does the toolkit include? So, there are kind of three components to it. We call it the data science communicator toolkit, and what it's meant to do is to help facilitate a conversation between you, the data professional, and people around you who might not understand data. So the first thing is marketing, right, so the toolkit comes with a presentation. It comes with a script, and it also comes with sample email templates that you can really just copy and paste and send out, putting in whatever you want.
What's changed since 2018
Come 2022, and things are a little bit different, right, so first of all, COVID happened. That was obviously a huge, huge shift. People are working more remotely. There's a ton more data that exists, even from 2018 to today. Alongside that, there's really been, yeah, an increased focus on data everywhere, right, so if you have a Fitbit, which I have as well, or you have an Apple Watch. If you take bikes in the city or any type of individual transportation, like the scooters, people can track movements within a city.
So you're all of a sudden seeing, again, such an influx of individual information on top of the fact that, especially with the COVID pandemic, hey, data actually became, I mean, it was always, there were some cases where it was life or death, but now there were a lot of cases where that's true, right? If people, if health agencies weren't sharing information with each other, we're really talking about lives that were at stake.
So COVID plus increased data everywhere, and then in the past few years, one of the shifts that we've seen is a focus for more companies and more organizations to become data-driven. So what I would say is in our earlier years, even through 2018 and 2019, training programs were focused on specific technical skills, right, so just intro programming, machine learning techniques, neural networks, text mining. In the past few years, we've seen a lot more of a demand for almost like a holistic data academy, continuous learning, where now there's a mandate coming from senior leadership, and I don't know if you've seen this where you are, where they're determined to become a data-literate organization. So now it's not just the data scientists, but it's literally everybody within the organization needs to have at least a basic understanding of what data does, how to use it, how to manage it, right?
We're seeing an increase in data governance, so if you work in any type of a healthcare company or finance, you probably have already had a lot of these regulations in place. If you don't, you're probably going to see a lot more of that, so obviously Europe passed GDPR, and that was a really, really big shift in terms of how we think about data. I know California also passed, I believe it's called the CCPA, similar to the GDPR, about giving ownership back to individuals, so ownership of data back to individuals.
Updated survey results
So on the back of all of these trends, we thought, okay, well, it's been about four years. Let's check in with the people that we saw a little while ago, right? What's happening now? What are they seeing in the space? So, I'm going to caveat this. We released the updated survey a few weeks ago, and we have about forty responses at my last check, so the number of responses is a little bit lower, which is why I encourage everybody, if you have an opportunity to take it, please do. That'll help us better understand where we are today.
But even so, we're finding some insights and some shifts that I wanted to share with you, so these are the never-before-seen results of the updated survey. One of the biggest shifts that we saw from 2018 to today, and we, by the way, over time, we've collected almost four hundred responses, so now we have a sample size of four hundred versus about forty. There was a shift in terms of what data professionals wish that their management and leadership knew. The top issue identified in 2018 was what data science can and cannot do. In 2022, it's actually flipped with the second-place one, which is how data science can actually impact the company, and everything else actually looks to be about the same. People are still feeling like nobody understands how much time it takes, what different data science methods do, what different tools do.
Beyond that, do you think your organization is using data effectively? The same proportion of people said yes versus no. I was hoping that the proportion would be a little bit more even, but we're looking at about thirty-five percent to sixty-five percent who said no, so it's clear that, at least in terms of the data professional world, there's a lot of work left to be done. People understand that data is important, but maybe they're still not leveraging it effectively.
Some other trends here, so some of the questions that we asked, what are some of the biggest changes that other people have seen? An increased demand for data-related roles, so we all know hiring has had some ups and downs, especially this year, but in general, data roles still remain pretty high in demand, and now there's less of a lone data scientist, like, in existence. Now, what we're seeing is that there's more people that are joining teams of data professionals, so that's also a shift that we're seeing. Increased amounts of training, more investment in data tools.
And then some of the biggest challenges, and this actually mirrors very much what we see as well: putting work into scalable production, again, building those pipelines, so I know that MLOps has become a really popular topic. We actually developed our own program for it last year because we saw the demand for it. People can build algorithms, but now it's a question of how do you operationalize it? How do you scale it? Getting access to the data that you need, again, that's a pretty tough challenge for people. Still see that, and communicating data effectively with my colleagues, so those were the top three challenges that people are still seeing.
Some other trends that we're seeing, and I pull these quotes directly from our survey. Most people still think it's really easy to gather data. I don't know why. This is something we try to focus on with a lot of the programs that we run, which is it takes a lot of time and a lot of patience. Magic still appears, even in 2022, so machine learning can't do magic, right? Again, we had a lot of responses saying there were misperceptions, misconceptions about how data analytics can be performed.
And then on the other end of the spectrum, we have a situation where people have enough knowledge to know that it can do stuff but don't understand the limitations, so data science means anything is possible because look at Google. Yes, if we were to look at Google, if we were to look at those really high-tech organizations, it seems like they can do anything and everything with data analytics and machine learning, but it's easy to look at them and say they can do all of this. We can too, and it's another thing to actually take a look and evaluate the infrastructure that you have in place with an organization. How is your data stored? How is it being collected? Who knows how to use it, right?
Interestingly enough, 30% of time on average is still spent explaining results. I would have hoped that would be a little bit less, but overall, what we're seeing, at least in the responses so far, and we're hoping to collect some more, is that people have a better idea of data, but they don't really understand the nuances behind what's possible and what's not. They're having difficulty understanding the applications, and that still seems to be a big challenge for most people.
Discussion: what are you seeing?
With that, I have a discussion question for you, and I'm seeing some people are already chiming in, so I'll give folks a few minutes in the chat. How is this similar to what you're seeing? Alternatively, are there different pieces that I'm missing here? Are there other pain points that you're seeing in your organization that I haven't addressed?
I figured I'd say it rather than type because I typed so much during the day, but yeah, very much feeling that. I started my analytics career 20 years ago and then worked in academia for a while and then back in industry, and it just is shocking how little progress has been made or maybe being in a different industry, how things are still so messy, and it's just shocking.
Yeah, absolutely. Thank you so much for that comment, and you have a really interesting experience going from industry to academia and then back to industry, and so there's so much advancement in the actual field of data, but then industry, it feels as though takes a longer time to catch up as well.
Sure. I just wanted to expand. It seems a couple of us have a similar issue is that our data team, we can provide places to start inputting data from our fields. I work in a fundraising capacity, so we can give pipelines for people to start recording various activities that they perform to see what leads to, say, a donation, but so our leadership is saying, well, we've given you ways to measure those. Now, why aren't you giving me the answer when maybe our field doesn't understand that they need to enter the data, or they have their own shadow Excel system, and they aren't recording into our pipeline, so our pipeline doesn't necessarily look correct. And so the data team are like, we don't want to give you this answer that you've asked for because we don't trust this data at all, but they'll say, well, give me the answer, and we'll worry about it later.
Yeah, and Laura, I hear you 100%. We see a lot of that, and I see that trend continuing for the foreseeable future because it takes a while for people to kind of figure out to know what they don't know and then actually be willing to learn it and then apply it in that situation, so it's hard when you're starting from a point where people might not really understand what you're saying, but you have to be able to communicate results in some form.
A couple of other points to make. I think John made a good point where explaining data is not always lost time, especially if you're doing it for clients. Absolutely agree, and maybe I need to reframe my question a little bit because I had in my mind explaining the work that you're doing to other colleagues, which again is not necessarily time lost, but if there are ways to streamline that to make it easier, but I completely hear what you're saying, and I do think that makes sense.
I'm also seeing a couple of folks saying, yeah, people want to hire more data scientists, but what they really need is a data engineer, for example, and there's definitely a lot of job postings that I've seen out there. I'm sure you have too, where somebody wants to hire a data scientist that has experience with every tool under the sun plus a decade of experience in analysis, and you can tell that they're not really sure what they're looking for. So I think that's a really interesting point as well. People know that they need these folks but don't really know how to describe the responsibilities.
Concrete steps to build a data-driven culture
So, when we're talking about everybody on this call, you have pain points, and I recognize that you're all working really hard, so this is not meant to be an added burden but rather an opportunity to think through what options could work best for you, given where you are and given where your organization is. First thing is to find a champion or be a champion, right? What we found to be most effective is finding somebody in leadership. If you have a CTO, CIO, CDO, or anybody at the C-level who's passionate about data, who has access to budget, who has a say, having somebody from leadership saying, Hey, this is the direction we need to go in. Here are resources for you, is incredibly powerful, especially with people who might be a little bit more resistant to learning about data.
Along those lines, if you're feeling up to it, offering some types of data trainings to develop a common data vocabulary, empower staff. Again, the updated toolkit that we'll be releasing probably within the next couple of weeks can be a great starting point for just a lunch and learn. Help people around you understand what data can and cannot do. Help them understand how they can contribute and make your life easier, because I think most people in general want to be able to do this on their own, they just don't know how.
Giving people time and space to be innovative, so asking others, maybe other data professionals around you, Hey, how are you using data? I think one of the most important things, and this is why I love the RStudio community, is sharing best practices. Especially internal to an organization, it's such a powerful resource that generally isn't leveraged to the fullest. Finding other people in your organization who can do that and starting to share, you're going to really find a strong group of folks that will encourage data adoption and will also potentially help you solve some challenges that you're seeing in the space, because they're familiar with the architecture, with the tools within an organization.
Providing space and technology for people to innovate or bringing in outside resources to do the same. And I'll just put in a plug, one of the things that I do, honestly, this is just something that I do for free, is I do have a presentation that talks about what data science can and cannot do, and the whole purpose of it is to facilitate this type of conversation. Happy to chat with you if you think that's helpful for your organization. Easy for us to set up a lunch and learn where I can just come in and do that, and hopefully start to support your efforts and facilitate that communication.
Being a role model, I'm sure most of you are already doing this. But if you're not, or if you think there are ways to improve, here are some suggestions as well. So, asking for metrics or analysis behind conclusions and reports, or providing them, and showing that this is what the standard needs to look like. Making sure that you're asking and addressing powerful questions, so talking about what the implications are behind the insights, using tools and exercises to get people out of the box, finding ways to connect with them that will help them better understand what you're doing. Talk about what it means to make a decision based on data. I saw somebody earlier said, we're still at a level where we're all trusting our gut. I saw several of those comments also in even the updated survey responses. So, it's still happening. You can trust your gut and then see if the data verifies it. I would say it's not always either or, because we don't want to discount levels of experience, but I think we're at a point where those levels of experience need to be supplemented by data.
Providing a safe space, so a safe space like this one, a safe space like Nerd Lunch, any type of lunch and learn that you have, even with other data professionals, just what types of crazy ideas do you have? What would be interesting to explore? Talk about what's possible, and one of the things that I always encourage any data professional to do is find successful examples of data projects within your organization, so whether it's one that you did, one that your colleague did. Leadership understands projects that are done internally, so if you can show results, they will have much more understanding of what's possible, and they're much more likely to support your efforts, so make sure to communicate those successes, and again, build that community of practice.
Giving recognition. Again, if there's a successful project, shout to the rooftops about it. Have a lunch and learn about it. Is there a newsletter? Maybe write a blurb about it. If you have a hackathon, demonstrate what were the top three most successful outcomes from a hackathon. Shedding light on that not only encourages leadership to look more into data, but it also encourages them to keep doing a good job. People like to be recognized for the work that they're doing, and so being able to recognize that is incredibly important, and then don't be afraid to make mistakes.
One of the things that we do internally is, if there's ever something that goes a little bit differently than what we expected, whether it's on the solutions or the training side, we do a debrief and just say, Okay, why did this happen? How do we prevent it moving forward? There's no blame, and that helps people feel comfortable innovating. It also helps people feel more comfortable asking for help when they need it.
Second discussion: what's worked for you?
Alright, last discussion question, and then I'll open it up to any type of questions that you have, but my question to you is, we talked about some concrete things that you yourself can do. What else have you seen be effective to build a data-driven environment or to help other people understand what you're doing in a way that's supportive to you?
Awesome, so Lisa's saying, having upper management who support the effort. Oh, Eugene, that's a really interesting point. Stop using the word data. Get specific. I like that. What metrics are you looking at? What type of data, right?
Yeah, thanks. I just wanted to share something, an initiative that we've been doing at the Financial Times, where I work. I run the data science team as part of the wider analytics organization, and we've had a big data democratization program, which is all around data in the hands of people, and as part of that, we really invested in hiring a learning and development individual who's a specialist in data, so she comes from a data analytics background, but her education is in learning and development, and it's quite a unique, niche role. Quite hard to source these types of people, but that's been fantastic because part of her role is in educating, and a lot of what we've been talking about is that whole education of the wider business community. It's both a push and a pull in the sense that she will both educate the wider business about the different types of analyses, descriptive, prescriptive, predictive, and help them to learn to interrogate their own questioning about what they're trying to find out from data.
That's awesome, and can I ask, Leanne, how often does that individual come to work with you? What's the cadence there?
So it's kind of, so they're 100% dedicated to analytics. Unfortunately, this is where it gets into difficulty, that the business still doesn't quite understand, like they're a lot less of a priority than other roles in our organization. At the moment, the way that it works is that whenever I've got, so for example, my two lead data scientists need a career plan putting together for them and their teams, and so I've gone to her for some career coaching consultancy for them, so in that scenario, it's very ad hoc.
I'm seeing some other comments, so I'm just going to go in here, you know, definitely using words that people can connect with, like changing analytics to informatics. Laura, the confusion of data requests and report analysis requests, data is not the same thing as analysis. Yep, I hear you, and then talking, Libby mentioned, you know, having a manager director who's willing to say no to things that don't make sense for the team to work on, who protects their time. I agree, if you have a manager who's supportive, even if they're not necessarily technical, but they understand what you're doing, that does make a huge difference as well, because then you have an advocate in your corner that can also bring the work that you're doing to the forefront.
Q&A
With that, that's the end of my formal presentation. This is my email. It's just marav.datasociety.com. I'm not burdening people with having to spell my last name, but if you want to chat, if anything that I said resonated with you, or if you're interested in doing a Lunch and Learn for your own organization, just let me know. This is what I love doing. Like I said, my background's in education, so I enjoy sharing best practices, sharing the knowledge that we've acquired over the past eight years as we've been working on this.
One of the questions that was on Slido was, how did you make the transition from special education teacher into data? Yeah, that's a great question, so I'll tell you that it was a lot of serendipity. I was in the classroom for a number of years, and I decided that I wanted to be able to have an impact on a larger scale, so I then spent the next several years working at different educational institutions like Kaplan, like the International Baccalaureate Organization, just really learning everything there is to know about education, training, and assessments.
And honestly, I had a mutual friend who was having trouble finding out how to learn about data. Specifically, actually, it was R programming. He was an analyst on Wall Street who spent most of his time in Excel, and a friend of his showed him how he could automate literally three weeks of work into an afternoon, so he saw the value of that. And then when I was in my career, I also was using some data-driven methods, but I couldn't find any resources that were specific to me as a professional where I could just pick it up quickly, so we had a conversation and thought, well, if we can't find what we need, let's build it. So it was myself and then two other co-founders, and I said that the best thing that happens is that we still have a company and we build something cool, and the worst thing that happens is that I learn a lot. And I did quit my job nine months into it, and I said, well, I can always go find another job.
We do get a lot of pushback. Interestingly enough, what I would say is it's not always from your leadership. A lot of times, it's management folks that have been doing this for a while. They don't really see a need to change. There are a few points that I bring up that I think are particularly effective. The first one is it's not something, let's say, that we're forcing you to do tomorrow. We're not going to shift everything that you're doing. We're not saying that what you're doing is wrong. What we're doing is showing you a different set of tools and a different set of skills that might make your life easier, so by alleviating the type of pressure of being forced to implement this, you're giving people room to explore and understand for themselves why it's important.
I think the second point is, and again, it depends on the level that you're speaking to. It will be really, really hard to stay competitive without these skills, just period. There are organizations that are becoming really data-driven. There are organizations that are leveraging their data very, very effectively, and that translates to increased profits, increased revenue, increased efficiency, more satisfaction, increased retention of employees. And guess what? If you're competing against that organization in five years and you don't have that same level of skills, you're going to be feeling it.
And then the third one is finding any type of successful project or any type of use case that will speak to that audience, so whenever I give my presentations, I like to pull from different use cases in an industry where people can relate to, and I find that that really helps with the light bulb moment because they start to understand how they could be doing something similar, so those are three points that I would recommend. It doesn't always work. It really does depend where the organization is, their budget. If somebody's invested like 10 million dollars in a tool, they're going to want to use that tool whether it's helpful or not, so there are some things that are harder to get around.
Definitely. Thank you, and I want to stay true to what I said where I'll stop the recording so we all can jump in the conversation too, but on that point of finding different industry examples, I wanted to say this in the recording. I did share the Champion site, rstudio.com slash champion, which has a ton of different industry examples grouped together, so it pulls together like webinars and meetups and blog posts and just lots of people talking about the ways they're using data science. But I will stop the recording here so we can all just jump in and chat with each other, but to anybody watching the recording, thank you. Thanks, Merav.
