Prabhakar Thanikasalam | Enterprise-Level Data Science Success | RStudio (2022)
Transcript
This transcript was generated automatically and may contain errors.
I'm here to talk about enterprise-level data science success. It's making data science a success in a large company. That's what I want to talk to you about.
Who am I? I've used R for about six years. I come from an engineering background. In my professional role, I lead a data science, analytics, and software team in the supply chain organization. The silver lining of the last two years is I don't have to explain what supply chain means to anybody. From toilet paper to semiconductor chips, everybody understands what supply chain means.
Flex is a large electronics manufacturing company, a Fortune 500 company, operating in 30 different countries. Outside of work, my primary role is dad to two young kids, and mom currently thinks I'm on vacation at a fancy conference resort.
The session was themed working with people is hard. It is hard. I agree that it is not easy, but I'm here to try and convince you a little bit that it need not be this hard.
Defining data science success in an enterprise
So what does data science success in an enterprise mean? I'm going to give you one definition, my definition: there is effort that leads to some output, then decisions are made with that output, and then actions and outcomes for the organization or enterprise.
We know the tools we use for the effort and the outputs: we use R and Python, and we publish our outputs. For the second part, decisions are generally made with other people in meetings, whether with higher-ups, with your peers, or with the practitioners and users of the output that we provide.
Decisions and outcomes typically mean we make more sales and more revenue; the key performance indicators show improvement after implementing the analytics. My talk centers on the third and fourth phases of making data science a success. In most organizations, data science and analytics is a means to an end: success is measured in dollars and in the metrics that matter to the business.
I see four sets as fundamental to data science success in a corporation: the data set, the tool set, the skill set, and the mindset. The first three are talked about a lot at this conference; we have different tools, and we improve our skills. In this talk, I'm going to focus on the mindset within an organization for taking work from development to deployment, because implementation, actioning, and outcome, the last two phases you saw, are where that mindset change is required.
Partnership and decision maturity
Partnership. This is a word that means a lot, and I emphasize this within the organization. There is a tendency to treat data science and IT organizations as a service provider within an organization. My passionate recommendation is do not settle for this.
Ask to be equal partners. What that means is the data science team gets equal input to key decisions made with the analysis: they get to offer opinions, and they get to guide the non-data-science and business people in executing on the output. In return, the data science team also owns the business outcome, so we take on the additional challenge of owning business outcomes, not just model deployment.
In terms of an organization's maturity to utilize and implement data science, I think of it as two-dimensional. One dimension is analytics or data science maturity: ingesting large amounts of data, running complex models. The other important vector is decision maturity: how good is the organization's data literacy, and how good are they at making decisions with the uncertain outputs that we produce?
Decision maturity is a fancy term for saying have a decision framework to utilize the output that is given from analytics. For the data science team, it is important to understand what are the key metrics that are important for the practitioner of the analytics, or what's important for the business. And we talk a lot about model metrics, model accuracy, classification error, whatever the case might be.
The model metric only needs to beat the current business process metric to be implementable. It's a two-horse race. Model accuracy does not need to be 99% to be implemented if the business process metric today is at 70%. You don't have to beat the world record to beat the other horse; you just have to beat the other horse. Then you can make the decision to implement the data science approach on the business side.
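The two-horse-race idea can be written down as a trivially small check. This is a sketch, not anything from the talk: the function name and the example numbers (other than the 70% baseline mentioned above) are hypothetical.

```python
# "Two-horse race": a model only needs to beat the current business
# process metric, not some absolute bar like 99% accuracy.
def worth_deploying(model_metric: float, business_baseline: float) -> bool:
    """Return True if the model beats the current business process."""
    return model_metric > business_baseline

# Example: the manual process gets it right 70% of the time.
print(worth_deploying(model_metric=0.82, business_baseline=0.70))  # True
print(worth_deploying(model_metric=0.65, business_baseline=0.70))  # False
```

The point is that the comparison on the right-hand side is against the baseline, not against perfection.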
This sounds contradictory to, you know, working nicely with people, but it's important to pick some fights, and you have to fight for the right reasons. Friction is a necessary component of moving forward. Good friction removes the hurdles and blockades that sometimes arise when different teams have to work together to move things forward.
From model output to decision making
I want to make an illustration or an example of how do we go from model output to decision making. You see a picture of Google Maps that takes us from where we are today to Dulles Airport. Many of us will be making this trip in the next 24 hours. Google Maps tells me it's going to take 45 minutes.
I ask Google Maps the question, how long is it going to take? But that's not the real question. When I have to make a decision with the answer it gives me, the decision breaks down into multiple tree-type decisions. Do I want to take Uber, or can I afford to take public transportation? Do I have time for coffee?
When you put this decision making framework on top of the very complex analytics that's running behind Google Maps, that gives you a framework to connect data science to business-type decision making. Talk to your stakeholders to understand what decisions do they really want to make with the data output.
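The layering of a decision framework on top of a model output can be sketched like this. The 45-minute ETA comes from the talk's Google Maps example; everything else (the slack thresholds, the options, the function name) is hypothetical, for illustration only.

```python
# A minimal sketch of tree-type decisions layered on top of a model
# output (here, a travel-time estimate from a maps service).
def travel_plan(eta_minutes: int, minutes_until_departure: int) -> list:
    decisions = []
    slack = minutes_until_departure - eta_minutes
    if slack < 15:
        decisions.append("take a rideshare")        # fastest option
    else:
        decisions.append("take public transportation")
    if slack >= 30:
        decisions.append("stop for coffee")
    return decisions

# ETA of 45 minutes, flight boarding in two hours: plenty of slack.
print(travel_plan(eta_minutes=45, minutes_until_departure=120))
# ['take public transportation', 'stop for coffee']
```

Note that the model output (the ETA) is just one input; the decisions themselves live in the business logic around it, which is exactly what "decision maturity" is about.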
Going from a generic example to a specific supply chain one: within the supply chain organization, we have analytics that provide output for what the price of a part should be. My organization and company buy millions of parts every year, so you can see the scale of the pricing problem.
When we provide model outputs that show a price lower than the current one, the person who has to execute on it feels some pressure or fear when they first see it, a fear that they have to go meet that target. And the pressure isn't confined to that one person's scope: their manager also sees the data, and that's where the pressure comes from. There is a very high chance that delivering analytics in an organization creates some pressure or fear.
However, if the person achieves any lower price for that part, regardless of whether it meets the target, that's an opportunity. It is very important to work within an organization to change people's perception of analytics from feeling pressure or fear to creating that opportunity and growth mindset. We call it the pressure-to-opportunity mindset change, and it takes time when you work with different teams.
In analytics, people tend to judge machines and models very harshly. I'll give two scenarios with this exact case of pricing a part. Say the current price of the part was a dollar, the model output was 90 cents, and the actual price achieved after negotiation was 95 cents. In scenario one, with a model, people tend to judge the model and say the model has 50% error, because only half of the predicted 10-cent saving was realized.
In scenario two, without a machine or a model, when a human went in with a low bid of 90 cents as a starting point and settled at 95 cents, it's called a good negotiating strategy. So humans tend to judge machines and models very harshly while having a lot of empathy for other humans. What we have learned doing analytics at scale within a large company is: be aware that you will be judged harshly, and prepare for that.
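The arithmetic behind the "50% error" judgment, using the numbers from the pricing example above (variable names are mine):

```python
# Current price $1.00, model target $0.90, negotiated result $0.95.
current, target, achieved = 1.00, 0.90, 0.95

predicted_savings = current - target    # model promised 10 cents
realized_savings = current - achieved   # negotiation delivered 5 cents

# Judged against the model: half the predicted savings were missed.
error = (achieved - target) / predicted_savings
print(f"{error:.0%}")  # 50%
```

The same 5 cents of realized savings gets read as "50% model error" in scenario one and as "a good negotiating strategy" in scenario two, which is the asymmetry the talk is pointing at.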
Building relationships and teams
So with people, it's about building relationships up, down, and sideways: up the management chain, with your teams, and with your peers. To be successful, the data science and analytics team has to learn to understand and speak the language of the business teams. When we say models, random forest, and regression, not everybody understands those terms; it makes them a little closed off and fearful.
So we have to open up and learn to talk their language, so they can execute and do their jobs better with what we give them. Building a successful team takes time. Managing, leading, and coaching are three different things, and all three have to be done differently. And just as we challenge the non-data-science teams to understand our language, we have to challenge the data science team in an organization to speak the business teams' language.
Some pitfalls I've learned over time: when management asks for machine learning, when their starting question is "let's see if AI and ML can help here," that's usually a rabbit hole. "Can we find something interesting in the data?" is another big rabbit hole. Be aware of scope creep in an organization. Have a clear plan, and agree with your project stakeholders on a threshold for aborting a project.
So working with people need not be this hard. Working with people is a very necessary component of enterprise-level data science success.
