Data Science Hangout | Merav Yuravlivker, Data Society | Getting People Invested in Data Science

Transcript#

This transcript was generated automatically and may contain errors.

Welcome, everyone, to the Data Science Hangout. I know a lot of you have been here before, but if this is your first time, it is an open space for current and aspiring data science leaders really to just connect and chat about some of the more human-centric questions around data science leadership. And so we really want to create a space where everybody can participate and hear from everyone from the community. So there's three ways that you could ask questions. You could jump in live, put questions in the chat, and we also have a Slido link that Rob will share shortly, but you could ask anonymous questions there too. And just a quick note for everyone that it is recorded, so it will be shared up to YouTube as well for anybody who's missed it.

But I am so excited to introduce my co-hosts for today's Hangout, Merav, who is co-founder and CEO of Data Society. And Merav, I'd love to have you also just introduce yourself and share a little bit about the work you do in Data Society.

Thanks, Rachel. Thank you so much for having me. I love doing these types of events, so great to see you all. If you have a camera, you feel like turning it on, I love to see people. If not, that's fine too. I understand people might be running around with kids in the background and or eating lunch, so that's absolutely acceptable. As Rachel said, my name is Merav Yuravlivker, CEO and co-founder of Data Society. Just to give you a bit of a background as to who I am, I'm a teacher first and foremost. I actually started my career in the classroom way back in the day in New York City public schools and spent several years in the education space before starting Data Society.

And the reason why we started it is because we saw that there were a lot of professionals in the space who needed to understand how to use data, but didn't have $40,000 to get a master's degree, didn't have hundreds of hours to spend looking through YouTube videos, and we saw an opportunity to build practical customized training programs to deliver specifically to federal agencies as well as Fortune 500 companies to really help people get up to speed quickly. Because what we believe and I'm sure what a lot of you do as well is that when you layer a data skill set on top of the industry knowledge that many people already have, you are really transforming the way that people work and the way that industries run. And our goal is to help industries and companies become more data driven because we think that's what's going to lead to the greatest innovative leaps and it gives us the most power to be able to solve a lot of the challenges that we're seeing in today's space.

So we're really passionate about helping people understand how to use data. We also have our consulting side. Frankly, it just came out of teaching. When you teach what you do, a lot of people ask you to do what you teach. So we build custom software, predictive algorithms, help with digital transformation, and we do a lot on that side of it as well. We're based in Washington, D.C., but we have offices in Richmond and we have folks all over the country, especially as we've gone remote during the pandemic. So that's a little bit about me and who I am and what I do.

Getting started with new clients

Thank you, Merav. And while we're waiting for some questions to come in from the audience, I'm just curious, what does that process look like when you start with a new client and are starting off that whole training process?

Yeah, that's a good question. We're very much focused on solutions and outcomes. So for us, every single client that we have has a bit of a consultative approach at first. We talk about what are the learning objectives, what are people's current skill sets. We also deliver assessments that can help people actually understand what skill sets are. So we go through that and then we develop customized pathways based on those objectives and the starting points. And we have an in-house team of data scientists as well as instructional designers that work together to make sure that all the content that we deliver is best of breed. We work with a lot of instructors who have industry experience. We have in-house instructors as well. And our main goal is really to create a good experience where people can walk away and actually immediately start to use the skills that they have.

So a lot of our programs also have capstone projects, which are a really great way to demonstrate return on investment and also application of data skills directly to a challenge that they're facing. There's actually one other piece that I didn't mention, which is as we expanded our technical training, what we saw is that there's a bigger gap that we're trying to bridge right now, which is the communication gap between data and non-data professionals. We've seen that's a really big pain point in a lot of organizations. So we actually have our whole non-technical series that works with executives, managers, and general staff, so that people can all speak the same language because we want to facilitate that level of collaboration and communication across an organization.

Getting people invested in data science internally

That's awesome. I'm thinking that from the audience here, there's also a lot of people who are maybe doing this training internally at their organizations or are like the one kind of spearheading that and getting people interested. So do you have any tips or best practices for doing that?

Yes. First of all, congratulations. That's a really hard job to have. So I'm always happy to help if you ever want to chat with me specifically about that. One of the biggest ways that we've seen that can facilitate interest in this type of training is to demonstrate the value that it has. So if you have data science teams who are using data analytics or there's a particular project within your industry or your organization that was successful, shout it from the rooftops, get some lunch and learns, get that newsletter out. Make sure that people are aware of the benefits that they can have if they start to implement these skills.

And then the other piece of it is make sure that you have data champions in high places. What we've seen especially to start with is if you don't have a mandate from the top to become more data-driven, I mean, most people just won't take the time out of their day to learn these skills because they don't feel like it's mandated. So it has to come from the top. You have to have folks saying, we need to make sure that we're using these metrics, that you understand what they mean. We need to be asking questions about where this data is coming from. We need to make sure that our data is kept securely.

And having that type of conversation come from the top helps give cover to people in a way who want to take the time out of their day but maybe weren't able to before because maybe their direct supervisor didn't see the value in it. So get a data champion that's high up there and then make sure any successful data projects that you have are well publicized are two tips that I would give.

And then the other piece of it is make sure that you have data champions in high places. What we've seen especially to start with is if you don't have a mandate from the top to become more data-driven, I mean, most people just won't take the time out of their day to learn these skills because they don't feel like it's mandated. So it has to come from the top.

Thank you and feel free anyone to raise your hand too if you want to or I can take people unmuting as raising their hand as well. But just to follow up on that, so finding that data champion higher up in the company, how should you actually go about doing that if you're part of a large organization?

Yeah, it's not always easy. That's what I'll say. Politics in large organizations can vary. Sometimes they're less difficult than others. I would say start with the allies you know you have. If there's one person, even if they're your colleague as opposed to somebody higher up, just get together. Start talking about maybe building something like this, an event like this internally, right? Once a month you just meet people who are interested in using data. Start to get a quorum so that when you go to a supervisor or an executive you can demonstrate the level of interest that your organization has.

The other piece of it is bring in other examples of companies that might be more data-driven than yours because we all want to make sure we won't become obsolete and what I've seen is that organizations now, if they are not going towards this whole holistic data literacy and building a data culture, if organizations aren't moving towards that, it's going to be very difficult for them to keep up with the market in the next five to ten years. And when you say things like that to an executive audience, they tend to listen. So if you see that your competitors or other players in your industry are doing this, that's another good motivator, especially for those at the executive level.

Thank you. I see a follow-up anonymous question that came in on Slido. When you mentioned well-publicized, what do you mean by making sure your work's well-publicized?

So if there is a project that went well, I would look at what avenues you have available to you to just disseminate that information. I personally love Lunch and Learn because, just like this one, it highlights a person who's doing a great job and also encourages knowledge sharing across an organization. If you have enough people who are participating, it tends to disseminate. So I would do that if there are newsletters. I know we work with a very large client who says they get 20 newsletters a week. Sometimes it's not really the best avenue, but for a lot of folks, you also might have Slack, you might have Microsoft Teams. If you have any media outlets, I know that organizations love to show how data-driven they are. So if you are in touch with your marketing team or your PR team, throw that at them. That may be something they would want to do a little press release about or potentially reach out to one of their contacts. So there are a few different ways to go about it to try to garner that level of attention.

That's a great point. Frank, I see you asked a question in the chat if you want to ask that live.

Hey, folks. Good to see everyone again. This is a really fascinating way to start the hour. You got me wondering, is there any tips that you have to try to find a balance between, hey, let's do this. We got in a huddle. We have this idea. We have this project. Let's see it through. And hopefully, once we get it, we can put it in the newsletter or we can put it out there on social media and show people that we're data-driven. But man, we can't wait. You don't want to wait 3, 6, 12 months for that cycle to happen. Is there a way that you work faster, I don't know, to experiment to say, oh, yeah, this is the right direction. Or no, let's pivot and be OK with pivoting and try something else.

That's a great question. The short answer is it depends on the project you want to focus on. And it also depends on the data that you have. So especially if you're starting from a point where you just want to get something off the ground, start with the data that you have. Because data collection, I think, as many of us know, can take 3, 6, 9 months depending on the type of data that you have. So if you have data that you think can answer maybe one or two questions that you have that would be relevant to some of the challenges that your team is facing, your department, look for that low-hanging fruit. Because once you have the data, the cycle to actually go through, cleaning can take a while. Hopefully, your data's in good shape. But once you have that, it's pretty straightforward to run through any data visualizations if you want to build a dashboard. Or alternatively, if you're running it through some machine learning algorithms, that whole prototyping process is pretty quick.

I would also start with a small amount of data just to see if you're on the right track. So take a subset of your data, train your model on that, then test it, and then see if it works out. And if not, that's OK. That's all part of the process. It's hard to guarantee a good outcome when we're doing this type of work.

Yeah, I'm a big advocate. It's so hard for especially large organizations. This is not a one-month project. This is a multi-year project. So I always recommend, especially if you're just getting started, look for the low-hanging fruit. Where are the people in your company that you know would be big advocates for this? Where is the data that you know, if even put into a visual interactive dashboard, would garner that type of attention? That's where you want to start with. I love data visualization because it's usually presented in a way that most people understand, and it's less intimidating to executives who might not know how to use data. Everybody loves to look at a pretty picture and point to trends, right? That's a nice and easy way to get people on board.

So just out of that project, we're looking at a million dollars a year in a new revenue stream that they wouldn't have had before.

Gut feel versus data-driven decisions

Yeah, absolutely, and gut feeling is interesting to me because we get a lot of people who say, well, I've been in this industry for twenty to thirty years. I know what I'm doing. I don't need data to tell me what I'm doing, and that's not necessarily something we ever want to dismiss because, listen, at the end of the day, computers are good at a lot of things, including repetitive tasks, but nobody understands, no machine understands concepts and context better than we do. But I like to point to this example that was a study that was done, gosh, it might be a decade ago now, where there was a group of researchers who surveyed eighty law professors and asked them to predict the outcomes of cases based on the case file, and then they also developed an algorithm to predict the outcomes of cases as well, and in fact, the algorithm outperformed all of the experience of those law professors.

Now, that doesn't mean we should throw out all the law professors that now only focus on algorithms, right? Obviously, there's a lot of nuance to all of this, but what it does mean is that we should always encourage folks to use data as a starting point, as a baseline, to make sure that their gut is headed in the right direction. I don't think that these types of algorithms will replace us anytime soon, but what I like to think of them doing is to augment the skills that we already have so that we can ensure that we're making the best decisions possible with the information that we have.

Yeah, that's another really great point. That's why I love, you know, code repositories, so something like GitHub or Bitbucket, developing shared drives to host, to keep all of this data, to keep all of these algorithms and coding templates is crucial to ensure that you're not redoing things you have been before or losing information. So data governance, and we haven't really touched upon this, but data governance is a really important piece of that just to set the standard of how data should be used and then what the expectations are so that everyone's aligning on those rules.

Building trust with clients

Robert asked, how do you gain the trust of a new client after the engagement starts? In other words, how do you make them feel like your team is delivering value?

That's a great question. The key to that is communication and clarity. What I've found is building trust takes time, of course. And while we say that we have everybody's best interests at heart, I know that there are folks that have already always had bad experiences with people like that, right? So it's not enough to say it, but we have to show it. So what does that mean? In terms of our training programs, we usually set up weekly check-ins, especially if it's a longer program. We ensure that we solicit feedback from students. So for some of our longer programs, we also have these very short check-ins. There are typically two questions. The first one is how difficult did you find the material? So to help us ensure that we're meeting their expectations in terms of skill level. And the second one is how relevant is this material to your work to make sure that we're always teaching the most important topics.

In terms of communication for our consulting projects, we operate very similarly in terms of that level of communication, ensuring we have those regular touch points, ensuring that our clients understand the progress that we're making, and then also making sure that when issues pop up, we bring them up. I think sometimes clients are afraid to come with me with negative feedback. And I get it. Nobody likes to tell somebody anything unpleasant. But for me, I welcome it because that's the only way that we grow and that we get better. And if I don't know about it, then I won't be able to fix it. So whenever I have conversations like that, the first thing I always say is, hey, thank you for bringing this to me. That should be everybody's automatic reaction. And then the second piece of it is, let's talk about what we can do, what steps we can take to make sure that you are getting what you are expecting from us.

So that's a long-winded answer of saying it's just communication more than anything else. That's the number one way to build a good relationship. And that's not even in the professional field. I would just argue that as a general rule.

Prioritizing multiple projects

Yeah. Good question. Well, we're all data people. We love our matrices. So I like creating and evaluating against two different axes. The first one is feasibility, and the second one is impact. So if there is a project that you've identified is quite feasible and has a high level of impact, so that would be in the upper right quadrant for me. Then that's the one that I would start with first. And then based on that, you can see, OK, maybe there's a project somebody threw in there, but it's actually really difficult to get the data, and it's not going to move the needle for us that much. Maybe that would drop to a lower priority.

So it's really dependent on the objective that your organization has, as well as the resources that you have, and then what the timing sense is. Because even for a project that's maybe less feasible right now but has a high level of impact, maybe there are some tasks that you can start doing now to prepare and to improve the feasibility of that project. So if it's data collection, maybe there's something you can run in the background just to start collecting data for a project that right now isn't feasible but might be in three months when you have more data.

Yeah. When I started my new role, I did a couple weeks of meeting with various people and tried to find a way to rank projects. And we landed on this, what you're saying. One, how hard is it? Two, what's the impact? And then three, is the business ready for this or not? Which is related to impact but not quite. So we developed an internal weighted number and then scaled it from zero to 100 to figure out what to do first.

Using data just to determine those projects. I love it. That's great. I like business readiness a lot. I think that's important. We get a lot of clients who ask us to do neural networks off the bat. And I'm like, well, let's talk about the data that you have and then we'll see how it goes from there. Executives like to throw around terms that they're seeing without necessarily understanding the implications behind it. So it's one of the, again, this goes back to building trust. One of the biggest pieces and biggest responsibilities that we have is to educate whoever we're speaking with to make sure they understand what they're actually asking for. Because a lot of time what clients are asking us to do versus what they actually need can be different. So that's why we like to dig in there to make sure that we're addressing what the actual need is.

Building a predictive model

Yep. So in terms of building the model, we have a framework that we use. We also teach. And it has six steps. I'm sure that there are variations of this all over the place, but we always start with asking the question. So like I mentioned earlier, without the right question, it is very difficult to build a model that will make an impact. So you want to ask the right question. You want to make sure that you have the right data. Honestly, that will take 70 to 80% of any data project time because collecting data and cleaning data can be very labor intensive. And that also goes back to the piece of setting up the appropriate data infrastructure and having the right data governance in place. Because if you do have both of those pieces, it dramatically shortens the amount of time that your data scientists need to spend in R or Python just wrangling the data and manually understanding it.

Once you have that, depending on what the objective is, you'll determine which algorithm makes the most sense. So if you're just starting out with unlabeled data and trying to understand different types of groups that your data falls into, you might want to do something like clustering, which is a well-known unsupervised machine learning algorithm. If you already have data that has existed and you're trying to predict trends or categories, you might do something like classification, maybe with a random forest or something like that. And use a part of your training data for that to build a model. And of course, you always want to test it. And then you want to validate it with real world data. And then the last piece of it is interpreting the results.

So a lot of times that's the typical framework. And there's a lot of back and forth, so it sounds linear right now. But during the actual process, you should be prepared to go back and try different algorithms. A lot of times people like to build several different models to determine which one presents the most accuracy. You'll also want to take into account what does accuracy mean to you. So just to give you an example, if you're building an algorithm for the Air Force to determine how often planes need to come in for maintenance, you really want to err on the side of caution for that. If you wait too long and planes start falling out of the sky, it's going to be a really big problem. So your accuracy level, you might need to adjust your parameters to be as conservative as possible. So a lot of that also depends on inputs that you can give your data science team to make sure that you're prioritizing the appropriate business cases.

Keeping students engaged after training

Going back to the training piece, do you know if students continue to use the skills, specifically coding skills, after training has ended? Do you have any tips to keep students engaged?

That's a good question, and that's one that we're working on internally right now, because one of the big trends that we're seeing is not just teaching and learning, not just teaching individuals, but then also building this data culture and culture of data literacy to foster innovation and collaboration. We have seen incidences of people using our skills afterwards. I'll be honest with you, it's an area that we are looking to grow internally, so maybe deliver some assessments and surveys six months to a year out from a particular training program to see how they're using those skills.

We have seen evidence of it, so for example, with the programs that we run at HHS, the Department of Health and Human Services, what we've seen is that groups from the cohorts that we've run continue to meet on a monthly basis, even after a program is over. So what does that lead to? It leads to an exchange of ideas, problem solving together, and just again helps build that collaboration that ripples out and has a much bigger impact otherwise. We've seen other folks who have taken the skills that they've learned in our programs and then applied that to some proofs of concepts that have won awards for their organizations, won projects and things like that. So we do see evidence of it, but as a data-driven person, it's probably not up to the standard that I'd like that to be, and again, that's something that we're working on with our company.

Scaling up the business

Hi, really nice talk, really interesting topic. I'm really happy I joined today because it's something that I was always kind of interested in because I do a lot of training with a lot of academic groups and also for younger researchers and people coming into the field. And I've done some work with companies and I'm always interested to hear how other people scaled up their business and always interested to know if they bootstrapped, if they got investors, how they standardized procedures, how they kind of met standards for instructional design and things like that. So when you introduced the topic and talked about you have all these internal people and instructional designers and data scientists, I was like, wow, how did you get to that level and how did you grow from your background and build up to this level?

It was a messy process, I'll just put it that way. For us, we bootstrapped, so we didn't raise any capital. We decided that we wanted to retain control of our company and where it would go. For us, it was really important to do that. So because of that, we needed to ensure that we could keep the lights on from day one. We started out relying on individuals like data scientists to build the content for us because I actually didn't have a background in data science, obviously I was coming in from education. So I was relying on the expert to build it up for me. And what we found is that there was a really big variety of quality in that content. And that just because somebody knows the topic, it's one thing to know it. And it's obviously a very different thing to teach it and to understand where a student is coming from.

So even though it's much easier, frankly, to rely on one person to build content and deliver it, and that's how a lot of companies do it today, we decided that that wasn't the direction that we wanted to go in because quality is really important. And my background in education makes it the number one priority, we want to deliver something good. So we developed a process in-house that goes through several iterations between our data scientists and our instructional designers so that we get the best of breed from a technical standpoint, and then also best of breed from an instructional design standpoint. And it took us a few years to get to that point.

Now that we have a lot of content, it's usually a faster process, but because we customize to almost all of our clients, so we incorporate their data, their use cases, we still go through that process. Not to mention the fact that, as we all know, technology is a double-edged sword. It's great, but it updates every three months, so we just need to make sure that nothing is deprecated when we're teaching it. So we've developed this process, and actually one thing that we've worked on internally, which I'm really excited about when we're talking about scaling, is that our data science team actually developed an internal software that will automatically customize content for us. So instead of us having to run through that customization, we can actually take a data set, input it into the software with the shell of our course, and what it will do is actually run that data set and populate it, and not only populate it with the algorithms and the data that we're running, but it will also pull relevant case studies from Google using key search words. So we built a custom search algorithm for that as well. So it's one of the beauties of running a tech company is we get a lot of folks who are very innovative, very motivated, and that will help us really fast-track our scalability.

That sounds awesome. Yeah, the underlying thing here, by the way, is also just hire really good people and trust them to do the right thing. That's just the number one. We have an absolutely amazing team, and a lot of the innovations that have come from our company come from them, and my job is to make sure that we have enough business coming in and that we create a really good work environment for people so that we have a good culture and good retention.

Change management and data culture

Can you hear me? Okay, great. So I was wondering if you cover any change management topics more broadly with your clients to help them implement the new skill sets that their teams are learning on a more organizational level, because I know that that can be a challenge. Learning the skill set is one thing. Having the whole organization adopt that sort of shift in using that is a different story, so I'm curious how you implement that or if you even touch on it.

Yeah, absolutely, Libby. Great question. We do touch on that because we saw exactly what you were talking about where the joke that I like to make is if a data scientist has an insight but nobody around her understands, it doesn't matter. So no is the answer to that question. It's a lame joke, but sometimes people laugh at it, and it speaks to an underlying current of truth where at the end of the day, if your executives and your leadership are not on board with what you're doing, it's very difficult to implement the type of change that you need in order to see that massive return on investment of your data. So we do a lot of work with executives. We deliver a lot of workshops with them as well as with managers. Managers tend to be a little bit more hesitant because a lot of times people in higher levels feel as though they're being told that what they've been doing is wrong, and our first point is to make sure to let them know that that's absolutely not the case. Obviously, they've been successful to get to this point, and what we're here to do is give them a whole other skill set and a satchel of tools that they'll be able to pull out and use to just increase the impact that they have. So we do a lot of work with change management and get the leadership on board so that they can actually start to develop a data strategy around an organization.

Reinforcement learning in practice

Thank you, Rachel. Yes, I would like to add a bit of context to the question. I'm actually busy with a master's degree in big data analytics, and the modules only cover supervised and unsupervised learning, and it was the usual cases where you look at a credit card fraud when you apply a large integration or SVM, and some other cases where you apply unsupervised learning for clustering or principal component analysis, but what really intrigued me was the realm of reinforcement learning and how we can use it to understand or develop new things like robotics or learning the lunar rover. So I just wanted to understand like in the mainstream day-to-day job, are there any specific projects that are reinforcement learning driven?

That's a great question. So what I would say is the majority of cases that we see and that we teach are foundational unsupervised and supervised machine learning. So typically we'll look at clustering, we'll do some classification techniques. Text mining is usually where people top out, and text mining is actually one of our areas of expertise, and it's incredibly, it's grown in popularity a lot, especially over the past few years because of the explosion of text data that comes through social media as well as customer comments. Reinforcement learning is really interesting that you mentioned that because we actually were on a call this morning with one of our clients who is asking for a specific case for reinforcement learning. So we'll be building out that program for them. It's less common because it is a lot more complex, so we haven't seen a lot of demand for it to date. And I would argue based on what I've seen, a lot of the challenges that people face today, it's less about building those types of predictive algorithms and more about even understanding the data that they have currently to make better decisions. It's getting from that zero to one that we've seen is most difficult and most in demand, and then beyond that, usually you have people who are experienced enough who can develop those algorithms without as much support.

Overcoming fear of AI

Yes, is the short answer to your question. A lot of what we do today is help alleviate trepidation about using data because, especially now, organizations that are not data-driven feel like they're so behind, there's no way to catch up, whereas, in fact, I would still argue most organizations are still at that beginning stage.

The key thing is to just have a level of reassurance as well as appropriate terminologies to deal with the fears that people have. So coming in and starting with, hey, what level are you at? And then letting them know, if you're talking to your team about this as well, you can just say, hey, actually, this is true across our industry, right? Most of our organizations in this industry do not actually have this in place. So helping provide some of that level that this problem is not unique, we've seen it before, and there is a pathway forward, is really critical to help people feel more at ease with the idea of learning data.

The other piece of the puzzle is just giving them a basic framework of what data science is, and the way that I like to describe it is that you are quantifying the relationships around you. That's what data science is, and we're able to do that now on a much higher level because we have this massive influx of data that we've never been able to collect before. So we look at data science as the intersection between three things, math, programming, as well as industry knowledge, and the most important piece is industry knowledge because at the end of the day, you're the ones that know where the bodies are buried, where the biggest challenges are, and if we can build on top of that and give you a little bit of that programming skill, a little bit of that math background, you can actually leverage that much greater than somebody who's unfamiliar with your industry coming in with data science skills.

The other piece of the puzzle is just giving them a basic framework of what data science is, and the way that I like to describe it is that you are quantifying the relationships around you. That's what data science is, and we're able to do that now on a much higher level because we have this massive influx of data that we've never been able to collect before.

So we like to set that tone. I have a one-hour presentation I do for Lunch and Learns. It's free. I just like doing this information that goes over the basic and foundational terminology of data science. We talk about what is data science, big data, supervised and unsupervised machine learning, and just by giving them that baseline of information, they're already starting to feel more comfortable because it's really intimidating. As somebody who learned data science when I started this company, it is very intimidating, and so it's really important to just understand that it's not magic. It's a step-by-step process, and by learning the terminology and the foundations, people actually start to feel more comfortable with that idea, so then we just roll the ball forward from there.