Matt Frazier @ Pie Insurance | Removing blockers for your team | Data Science Hangout

Transcript#

This transcript was generated automatically and may contain errors.

Welcome to the Data Science Hangout, hope everyone's having a great week. If we haven't met yet, I'm Rachel Dempsey, I lead our Pro Community here at Posit. This is our open space to chat about data science leadership, questions you're facing and getting to hear what's going on in the world of data across many different industries.

And we're here the same time, same place every Thursday. So if you are watching this as a recording sometime in the future, there'll be a link where you can add it to your calendar below as well. Together, we're all dedicated to making this a welcoming environment for everybody. And so we love to hear from everybody, no matter your level of experience, or area of work or industry.

I like to say that it's totally okay to just listen in, though, if you want. But there's also three ways you can jump in and ask questions, or provide your own perspective. So you can raise your hand here on Zoom, and I'll be on the lookout here. You can put questions in the Zoom chat. And if you're maybe in a coffee shop or your dog's barking or something, just put a little star next to your question in the Zoom chat, and I can read it instead.

Otherwise, I'll just call on you to jump in and add some context. And then lastly, we also have a Slido link where you can ask questions anonymously. And I see Hannah just shared that in the chat here.

But thank you so much, Matt, for joining us here today as our featured leader. My pleasure. Matt Fraser is former Chief Analytics and Underwriting Officer, currently a Strategic Advisor at Pi Insurance. And Matt, I'd love to have you maybe kick us off by introducing yourself and sharing a little bit about your role and something you like to do outside of work, too.

Matt's background and path into data science

So I'm Matt Fraser. I actually started in data science in 2008, right after the financial crisis. Prior to that, I was more of an insurance guy. I actually have a background in philosophy, specifically metaphysics. And so I'm sort of a Karl Popperian at heart, if anybody knows who Karl Popper is, who was the first empirical skeptic. And so I sort of transitioned my empirical skepticism over into data science and started to learn it in 2008.

Before that, I carried a number of executive positions at different insurance companies in the underwriting space. At that point in time, right, from let's say 1998 through 2008, I did not know of an insurance company that was actually doing any predictive modeling at all. And it is still the case that really the national carriers are using AI, they're using data science, they're using predictive modeling. But most of the smaller mutuals, right, the ones that are only in a couple of states, are really using other predictive modeling software as a service companies to do all of their predictive modeling.

So data science was a sort of an obvious transition after I made it, but I had no idea what the heck it was before I got into it. I was just a philosophy guy. I was a language guy. I love language. And I love understanding what is knowable and what isn't knowable, right. But I didn't really realize that you could translate that into mathematics until I got into a company called Valen Technologies.

So I sort of got out of the insurance industry and went into a company that was adjacent to the insurance industry, actually selling to insurance companies, predictive analytic and data science solutions, right. But before that, I was pretty good on the actuarial side. I'm not a classically trained actuary at heart, but I was the director of pricing at a number of different insurance companies as I sort of progressed through my career.

And so I got pretty good at using Excel, okay. I got pretty good at grinding through data. I get pretty good at philosophically understanding data and understanding, you know, what it meant if certain data had low integrity versus high integrity and things like that. So I was really good at data wrangling, but I didn't have any idea what a GLM was. And don't ask me to write any formulas down because, you know, I let you guys, I let the data scientists do that, right.

But I'm pretty good at picking models apart because of my empirical skepticism that I learned at a very early sort of age through college and high school. My dad was a physician. He always used to kind of hammer me on, hey, are you sure you know that? Do you know that or do you think you know that, right? Do you believe you know that? Because beliefs can be wrong.

And so when I was at Valen, I was able to get trained by a former chief modeling officer at Capital One. Capital One Technologies, one of the major investors was Nigel Morris, who was one of the co-founders of Capital One. And Capital One was sort of one of the first banks that got into machine learning, got into predictive modeling on the credit card side. And so they had a very robust understanding. They were one of the early adopters, let's say, in the banking space and the fintech space of actually using predictive models in order to create a competitive advantage or arbitrage opportunity within the credit card space.

And so we were able to really capitalize on that connection with Capital One. We were able to take some really smart people from Capital One, bring them into Valen. And I really learned on the job from one of the former chief modeling officers on the commercial auto and auto finance side from Capital One.

And the second I got into it, I absolutely loved it. I mean, Valen was my favorite job. The second I got into data science, I was like, oh, God, this is what I've been missing my whole life. Right. Like, I love the language. But to be able to blend the language and the math together, the language and the data together and actually produce something that gives another company a moat or a competitive advantage in a marketplace, man, that's really cool.

But to be able to blend the language and the math together, the language and the data together and actually produce something that gives another company a moat or a competitive advantage in a marketplace, man, that's really cool.

Like I can actually take things from the philosophy side where all you're really doing is creating more questions for yourself and actually translating that into real, durable value in the world, like something that's tangible, that can be seen through dollars and cents. I was super duper excited, you know, I got really excited about that.

And then I translated that into Pi Insurance. I was one of the what I would call the silent co-founder at Pi. We took all of the learnings that we were able to extract from Valen Technologies, working with about 56 different insurance companies in the property and casualty space. And we translated that into building durable models that created a competitive advantage and an arbitrage opportunity for Pi Insurance.

Pi started in, oh, gosh, I want to say it was 2018 now. So we're now writing about 300 million. We're on the path to about 350, maybe 400 million this year. And I believe we're the fastest growing and quickest, most capitalized in insure tech that has ever existed.

So somehow I was able to take that philosophy degree, translate it into data science, create a competitive advantage, and raise a whole bunch of money and create a company that now employs. Unfortunately, we just went through what we call a reduction force. But we had about 460 employees, we now have about 400 employees now. And I think that's the end of the cutting so far.

But I've been very pleased with what I've been able to do and what I've been able to learn in data science. And I'm just looking forward to a lot more learning opportunities as we move forward. You asked, what do I like to do? Skiing, fly fishing, hiking, I like to be outside. So as long as it's outside and it's sort of a more of a single sport, I really like that kind of a thing.

My job is to find smart people and then set up the conditions and get the heck out of the way. And if there's a blocker that is in their way, I need to go find a way to remove that blocker, right?

Now, are there performance reviews and things like that? Yeah, every company has them. I am a strong proponent against them. So I hate performance reviews. I think that performance reviews should be done every single day between the individuals that are working together. And I don't think it should be between a boss and that boss's employees. I think it should be the individuals on, that they're colleagues on the team that are working together with one another. That's where that feedback should be coming from.

It should be a 360 degree feedback because performance reviews are a thing that can be weaponized so easily. And they say more about the person who's giving the review than they do about the person who's being reviewed, period. That is my position. Now, I'm happy to argue with anyone about that particular position because of how strongly I actually believe in that position.

I think that if we can create an egalitarian society where everyone is responsible to everyone else and everyone is giving constant feedback to everyone else on how to improve with empathy, and with honesty, that is where you can create an organization or a social structure and a social hierarchy that actually allows everyone to win and allows everyone to succeed and allows everyone to explore all of the levels of Maslow's hierarchy, right? That's how you become fully human and fully yourself.

Removing blockers and communicating needs

You said part of the role in a leader is removing some of those blockers for your team. And I'm wondering for some of us here who maybe are in a role where we've identified certain blockers for us, how would you recommend you go about like sharing that with your leadership to kind of get past those roadblocks?

Whether it's an I or a we, right? Sometimes there's a blocker for a whole team and sometimes there's a blocker for an individual, right? I think what you need to do is, I don't know if any of you have taken any courses in nonviolent communication. It's kind of a big thing now. We've taken a lot in nonviolent communication. I don't believe all of it, but I think it is a tool set where you can sort of mix and match some of the tools that come out of nonviolent communication. One of those things is to listen for needs and to express needs, right?

So you start with, I'm trying to do this for the company. I believe that this is going to be very valuable. Do you agree? Right? So first, you've got to get buy-in on what you're trying to do is valuable for the organization and valuable for the company. Okay? The second thing is, in order to do that, I have certain needs that are not being met. Okay? This is what those needs are. It could be an individual. It could be a group of individuals. It could be a dynamic. It could be technology that isn't there. It could be budget. It could be whatever, right?

And then you really express that as a need, not as a so-and-so is doing something to me and I'm mad, right? That's a very sort of a selfish communication. What you want to do is you want to try to abstract that as much as you can. Now, you can't always because sometimes it's just a person who isn't on board, right? And so you say, look, I really need this person to be on board, but I don't think this person is on board. And I want to ask you what we can do together in order to try and identify that other person's needs and discover why it is that they appear to be a blocker.

It may be in my own head and I'm just, you know, I'm just in an echo chamber, but let's have that conversation. And if you don't feel confident having that conversation yourself, go to someone who you think can have that conversation. And it doesn't have to be a manager. It could be a colleague who knows that person really well and can set up that meeting so that you can have that meeting and identify what those needs are.

There's another book that I would really strongly recommend, and unfortunately it's escaping me right now. The guy's name is Chris Voss. It's called Never Split the Difference. I don't know if any of you have read that. Strongly encourage you to read that book. Some of it's throwaway, right? But, you know, I follow the Pareto principle, so there's always 20% of, you know, absolutely fantastic nuggets in every book.

So that book basically teaches you to get to know. You talk to an individual and you have a dialogue with them. You have a dialectic with them until you get to why they're saying no to whatever thing you want to get through, right? Because yeses that aren't really yeses don't matter. You have to get to the no, right? And so that I think making sure that you understand what the other person's needs are or the organization's needs, if that's a blocker. And then making sure that you're constructing the right conversation and you're having those conversations through an expression of needs on both sides, a reciprocal expression of needs. Usually that is the black swan that will bust right through that blocker.

Regulation and data asymmetry in insurance

Matt, kind of following on the thread that you started with about these opportunities coming from data to democratize this industry. I wonder if there's a counterpart regulatory piece that you see. Are there regulatory gaps in that being able to be successful in making sure that you're successful in this industry that you described as being a monopolistic cabal? Are there regulatory things that you think are really important either for consumer protection or for pushing back against anti-competitive kind of practices that would make a difference?

Yes. And this is where my opinions are probably the most provocative. I believe that, by and large, insurance is massively overregulated. Massively. To the point that it's actually been regulated in such a way that it's extremely difficult for new entrants to get in, right?

I think we've democratized some of it with a lot of the things that companies like Valen have done and other companies constructing consortium data sets. But the regulatory world in insurance is extremely antiquated. They're basically operating on 1970s principles, by and large, okay? And those principles were set up by the insurance industry at that time, which was populated by a very few, extremely large carriers, specifically for the purposes of creating barriers to entry for new entrants, okay? So, I believe all of the regulations need to be completely rewritten, and they need to be completely rewritten in view of the latest technologies.

There is something that one has to understand about insurance, and that is that it is an asymmetric information game, okay? The insurance carrier gets to ask questions, and they have to ask questions mostly through a third party, through the insurance agent, right? And so, not only do you have information asymmetry, but you have information distortion as well.

And so, what the insurance companies are trying to do is put themselves on a level playing field, because the insured has all of the knowledge about what their exposures and what the risks are. The insurance agent has probably 20% of that information, and the insurance carrier probably has about 10% of that information. A lot of what's coming available now is actually increasing that 10% to maybe 30%, 35% of the information, right?

There's still unconscientious, disagreeable people out there that are lying about all of their information. They're a roofer, and they're saying that they're a computer programming company. They're a trucker, and they're saying that they're a retail shop, right? And that's where models are actually quite good at determining whether someone is actually lying to you or not, right? And identifying and shaving down the moral hazard.

Now, having said that, there are still ways now, and will be ways in the future, that insurance companies can take advantage of information that the insured doesn't even know they have. And so, I think that information transparency is a huge piece of the puzzle. I think that if an insurance company is making a decision, letting the insured know what information they're using in order to render that decision is probably going to be very important in the future.

But I think that the regulations around machine learning and filing models with the Department of Insurance and the scrutiny that they have to go through is massively, like exponentially slowing down the innovation that could occur in the insurance industry. It's also why most insurance companies that I've worked with are choosing not to file their models. There are very few people that are actually filing models in the commercial insurance space. In the personal auto space or the personal line space, you have to file all of your models. So, the Department of Insurance gets to look at it. They get to scrutinize it. They can say no on just about anything, and they do.

And there are some states that don't even allow for trade secret. So, once you have to file your model with the Department of Insurance, your competitor can actually get all of that information and rebuild your model, which I've done several times. So, I think there needs to be massive changes in regulation around data science, machine learning, and algorithms and how they're used in determining pricing and availability of insurance coverage.

Automation of underwriting and the future of data science roles

I wanted to get your opinion on see how far is some of these job functions being automated, such as underwriting or app input, because AI is so much advanced, and it's reading so much data from your database, vast databases. So your thoughts on that?

Yeah, so I don't believe that all risks. And now with the latest technology, my opinion is changing sort of actively as I start to understand a little bit more about these LLMs and transformers and the like. I believe that probably 85% to 90%, maybe even 95% of the underwriting decisions that are currently being made by human beings in the world will be eliminated through the use of machine learning and AI. I mean, that's just a simple fact.

It is already the case that machine learning models can make a significantly better and more consistent decision than humans. Now, it is game theoretic, right? So what the models are not quite as good yet at doing, and obviously there's ways to solve this using models, they're not quite as good at understanding the market conditions and the changing market conditions with regard to what the best price might be, right?

Insurance carriers have to be able to understand good risk from bad risk, but they also have to write policies, right? And that means that sometimes you have to make trade-offs, and that's where human beings come in. They're making those trade-offs. They're saying, well, I might write this at a slightly higher loss ratio, but if I don't write this packet of business at a slightly higher loss ratio, I'm not going to have enough revenue, right?

So the game theoretic piece of the puzzle, sort of price optimization, which is a really bad word in insurance, but that's how we think, right? We're actually doing price optimization every single day. It's just that we're doing it through human underwriters. I think where the world's going to go in insurance is that a lot of the customer service personnel are going to be eliminated because LLMs are going to take on that, you know, chatbots and RPA and automation are going to be able to do probably 95 to 100 percent of policy changes, endorsements, things like that.

But on the underwriting side, I think that it will probably eliminate 85 to 90 percent of the decisions. That doesn't mean that the underwriter's role gets eliminated. It means that the underwriter's role changes significantly. It changes to a portfolio management role. It changes to a gap identification role. And it changes to a role where an underwriter is now the subject matter expert that works directly with the data science team in order to identify and explore the gaps that may exist in the model solutions and helps to fill those with novel data and perspective that sometimes a lot of the data science teams don't have because they're not on the front lines.

And that's sort of a surrogate is it is really important in insurance that the data science team actually has a working knowledge of the line of business that they're actually looking at. So if you are building a model for commercial auto, you have to understand commercial auto. You have to understand how it's rated. You have to understand the coverages, et cetera.

I have a lot of PhD data scientists that I've worked with in the past that felt like, hey, I'm a data scientist. I don't need to know the underlying environment. I'm just going to grab the data and I'm going to build a model. I can tell you that in my experience, 90% of the models that are built in that fashion fail because they cannot be productionalized. The data scientists didn't find out whether the information was available at the time that the decision needed to be made or the target was not transformed in a way that was critically important because the data was centered or there was a treatment effect that they did not identify until after they built the model.

The model building process

Generally speaking, we will, on a regular basis, go out to each one of the department heads and we will ask them, what are your problems today? What is creating too much friction? What is something that you think could be a better decision that's being poorly decision today? Then we'll get all of that information back.

Once we construct all of the potential opportunities for data science, then we start to look around at the available data sources that we have and we start to do some discovery around whether or not we can actually solve that particular problem with data. Do we have the data available to be able to solve that problem? Is that problem a big enough problem to solve?

Usually, once we go to the department heads, we'll then go to the business or department analysts within that particular department and say, hey, how big of a problem is this? Can we quantify the friction? Can we quantify how much this is costing us? How many minutes? How many hours? How many days? How many employee days does this take? Theoretically, if we were able to remove that and automate that, what would the savings be?

Then, of course, there's other things in the industry. How much better can we make around these decisions? Can I separate good risk from bad risk? Can I identify the probability that someone's actually going to pay their bills? Can I identify the cost if they don't pay their bills? Can I identify how many times they're not going to pay their bills and we've got to do something with that policy, whether it be to cancel it and reinstate it and all that stuff because that costs insurance company money.

Once you sit on a problem and you know that there's enough value there and you know that you have the data to be able to solve that problem, then it's a pretty standard process. You have to wrangle the data. You have to normalize the data. You have to work together with the department experts in that department to identify what they think might be interesting or valuable covariates.

So usually we will do some unsupervised modeling just to figure out what the data looks like. Where might there be blooms of data that could be interesting? And then we'll do a correlation study, which is with as many covariates that we can actually put together and then bring that back to the department and then ask the subject matter experts what's not in here that you think should be in here. And then they'll start to think about and construct some additional features. Then you start the feature engineering phase.

You do that with the subject matter experts in that department and you try and get as many features that you think might be useful as possible. And then, of course, there's some target transformation sometimes. Are you going to build a logistic model? Are you going to build a linear model? Are you going to build something that you're going to use for triage? Is it going to be sort of a categorical model where you've got five or six different decisions that need to be made and you're going to split those up? You kind of figure out what the best sort of opportunity is there.

And then we usually go through a sample and partitioning. How are you going to sample this? What's your strategy going to be? What is your validation strategy? So are you going to do cross-fold validation? What's your holdout going to look like? What's your train set going to look like? What's your testing set going to look like? Are those stratified correctly? Is anything time series based? Because that has a significant impact on how you're thinking about your sampling strategy.

Usually in insurance, depending on data size and the composition of the target, we use anywhere between three partitions and 10 partitions. If it's a more complicated problem and we've got a lot of data, we'll use up to 10 partitions and we'll hold three or four of them out. And then we'll use incrementally four or five or six or seven of those in order to increment through the overall model building process.

We'll use the first partition, maybe even split that up in order to do feature engineering and basic variable selection and reduction. Then we will build our initial model and then we'll test it on the next partition. And then normally what we do is we'll build on, let's say you have five different partitions. We build on one and two, test on three. Build on one and three, test on two. Build on two and three, test on one, right? And then once you're really confident that you've got the right model using that data, you add the next partition.

Now you've got one, two, and three built on four, four, three, and two built on, you know, tested on one and so on and so forth, right? And we're constantly looking at coefficients, whether they're shifting, whether they're going up and down. If it's a tree model, we're looking at what the splits look like. We're sort of doing hyperparameter selection usually on those early partitions until you get to the final model, usually with two or three validations that's already held out. And in insurance, we're usually using between five and eight cross-fold validation at each one of those stages, just to make sure that we're getting, there's a lot of volatility because of the zero point math in insurance. So you really need to make sure that you've got enough cross-fold validation there because you can have a very large loss that makes things look really weird in any one partition of the data that you're looking at.

Advice for aspiring data scientists

I mean, so I think a working knowledge of both Python, and I'm going to say it, R, right? R and Python. A working knowledge of both of those is really important. The biggest piece of the puzzle for me is an understanding of what I would call pragmatic, productionalizable model building, right? Do not narrow your scope or your interest down to research alone. You need to know how to get models into production so that they can be used by actors in the real world, right?

If you're doing descriptive statistics, that's great. I mean, and if that's what you want to do, great. But you're going to cut yourself short if you don't know how to deploy a model. You're going to cut yourself short if you don't know what the process looks like in order to deploy a model. You don't necessarily need to know how to do the engineering, right? You can get an MLOps team for that. But you need to know as a data scientist, how to do things like parity testing, predictions between the production environment and the model build environment, right? You need to understand what that looks like, what the process looks like, right?

And you need to understand how to actually go the last mile, if you will, right? If you're just building general linear models and you're doing that descriptively and you're saying, look, I built a model and it looks great, right? The next question that any executive is going to ask you is, okay, how do we use this? How are we going to deploy this? Are we going to deploy it in SQL? Are we going to containerize it? Are we going to use Scala? Are we going to do something else in order to get that model into PMML, whatever the case may be? How do we get that model into production? And how do we know that that production model is going to score the same way as it did in the model build environment?

I think the other one is having a really good understanding and an ability to articulate what the model building process looks like all the way from finding the data sources all the way through to validating that model and making sure that you don't have an overflip model and that that model is really going to perform as you say it's going to perform in the real world on data that it hasn't seen before. Those are all really, really important.

The last point that I'll make is, it is critically important for you to show in any interview or to any business manager that you have a high level of curiosity in the business that you're in. Don't sell yourself short and just be a data scientist. You need to be a curious data scientist. If it's an insurance, you need to want to learn insurance. If it's in aeronautics, you need to want to learn aeronautics. If it's in process optimization for, I don't know, making candy bars, you need to really be curious and interested in what the process looks like to make a candy bar, right?

I think that that's really, really important is that data scientists can't just be data scientists. They have to be Renaissance people and constantly curious and autodidactic. That's the best advice that I can give is don't narrow yourself. Don't be an inch wide and a mile deep. Be a mile wide and more than a few inches deep.

I think that that's really, really important is that data scientists can't just be data scientists. They have to be Renaissance people and constantly curious and autodidactic.

Thank you so much, Matt. I know we quickly came to the top of the hour here, but I really appreciated this conversation and you jumping on here and joining us. Absolutely my pleasure. I had a fun time. Thank you all for all the great questions too. I'm going to put Matt's LinkedIn in the chat here if anybody wants to connect, but also I will share the recording, of course, to YouTube and to the Posit Data Science Hangout site. Have a great rest of the day, everybody.

Matt Frazier @ Pie Insurance | Removing blockers for your team | Data Science Hangout

Transcript#

Matt's background and path into data science

AI, ChatGPT, and the insurance industry

Data privacy and LLMs in insurance

Bias in models and synthetic data

Effective leadership and servant leadership

Removing blockers and communicating needs

Regulation and data asymmetry in insurance

Automation of underwriting and the future of data science roles

The model building process

Advice for aspiring data scientists