Brian Fannin @ CAS | Copy & paste is operationally dangerous | Data Science Hangout

video
Jul 17, 2023
58:55

Transcript

This transcript was generated automatically and may contain errors.

Hey everybody, welcome to the Data Science Hangout. Hope everybody's having a great week. I'm Rachel Dempsey, if I haven't had a chance to meet you yet, and I lead our pro community at Posit. Data Science Hangout is our open space to chat about data science leadership and questions you're facing. Just getting to hear what's going on in data across different industries all over the world. But we're here every Thursday at the same time, same place. So if you are in the future somewhere watching this on YouTube, you can add it to your calendar, and the details will be in the YouTube description below.

Brian, I think that's actually how you first found out about the Data Science Hangout. But together, we're all dedicated to making this a welcoming environment for everyone. We love to hear from everybody, no matter your level of experience or area of work. You can join in the conversation a few different ways and jump in and ask questions. You can jump in by raising your hand on Zoom. And to get to that, it's in the Zoom bar below where you click on reactions. And then secondly, you could put questions in the Zoom chat. And feel free to just put a little star next to it if you want me to read it. And then we also have a Slido link where you can ask questions anonymously, too. And I like to add in that it is totally okay to just listen in, too, if you want.

You don't have to jump in the conversation. But I am so excited to have Brian joining us here today. And before introducing Brian, I forgot, Libby, that I was going to have you do a little 30-second pitch to share something with everybody here, too. So let me do that first. And then I'll jump over to introducing Brian.

Awesome. Thank you so much. This is a pitch on behalf of Andis and I. Andis and I are going to be doing a reread book club of ISLR. We're going to do the second edition because we've gone through the first edition. And the second edition came out, I think, like August of last year or something. And it has fantastic resources because everything is really, really made available by the authors online. There's YouTube lectures. There's slides. There's all kinds of stuff. So who is this for? If you would like to join us, we're probably going to do it on Discord. It's probably going to be one chapter a month. And it would be best for people who are rereading like we are. A lot of people went through ISLR in their past. They're already pretty familiar with all of these statistical concepts. And they would like to dive into the second edition and kind of experience ISLR's take on some of the new material, like deep learning that's been added. So if you are interested in that, please private DM Andis or I your email address just so that when we are ready to do our thing together, we can include you too. Or find me on LinkedIn and send me your email address that way. It also works. Thank you, Rachel.

Yeah, absolutely. And ISLR, that's Introduction to Statistical Learning.

Introducing Brian Fannin

Brian Fannin is our featured leader for the day. And he is currently a research actuary at the Casualty Actuarial Society. And speaking of books, Brian is also author of the book, R for Actuaries and Data Scientists with Applications to Insurance. So really excited to ask you about that today too, Brian, but to get us started here, it'd be great to have you introduce yourself and maybe share a little bit about your role now and something you like to do outside of work, too.

Okay, so I'm an actuary and have been doing that for almost at the point where I can say 30 years. One of the first questions that I get when I tell people that I'm an actuary is what exactly is an actuary? I'll keep it very, very simple for folks, though I'm happy to go into detail if folks are interested. An actuary is an applied statistician, particularly focused on financial risk. The easier way to think of that perhaps is we work a lot in insurance, we figure out what people need to pay for transferring the risk of financial loss to damage to their car, to their home, or to what have you, any number of things. So that's what an actuary does. I've been with the Casualty Actuarial Society for about five years, more than five years, and we are very engaged in professional education, credentialing actuaries, but also actuarial research. So that's where I nerded up with trying to develop new concepts to improve the way that actuaries can carry out their work.

I also do a lot of speaking. As I mentioned, I was just in Boston recently at a conference, so I do dabble somewhat in professional education. Outside of work, it will be the same set of non-professional hobbies that I think everyone does that are fascinating and enriching to me. I spend time with my wife and my kids, I cook, I watch movies. I can't think of anything that would really distinguish me all that much outside of work. I like to travel, but you know, we all do. It's a really dreadful dating profile, but happy to talk about movies or travel.

Actuaries and the shift away from Excel

Brian, something I thought might be interesting to kick off the conversation with is, I know you mentioned that a majority of actuaries are still doing the lion's share of their work in spreadsheets, specifically Excel, but I was just wondering how you've seen that start to change or how you think that might be changing in the future.

So it is changing, albeit slowly. And I can say that my experience professionally as an actuary for the first 20 years that I was doing this was largely spreadsheet-based. That's very, very common. There's a very easy on-ramp for people to work with Excel, which I think helps explain some of its ubiquity. And when I say Excel, that's usually what I mean, spreadsheets more generally, but typically it's going to be Excel. Two years ago, the CAS issued a technology survey about what tools actuaries were using because we're curious to see how quickly it's moving. We got a thousand-some-odd respondents, so a pretty decent sample size, and what we found was that 98%, I think, use Excel on pretty much a daily basis. Whereas for R or Python, the numbers on a daily basis were certainly comfortably less than one out of three. I would be one of the ones who uses R every day. We redid the survey last year and the numbers moved a little bit. That survey will launch again this summer, and I'm curious to see if that slow and steady pace is continuing.

Anecdotally, I can say that when I'm at conferences and talking about some of the work, I do talk to folks about what tools they're using. I hear it more. I think Python is moving a little bit quicker, largely because R had some penetration, but I do get the sense that there is a subtle shift. A lot of it is going to be demographic. Within the actuarial community, I'll be one of the older R users. They tend to be a little bit younger than I am, so that shift is partially a generational change.

With that, have you had experience in helping bring people on board? I'm curious what's been most helpful for people just getting started?

I'm not sure what has or hasn't been helpful for other people. I can tell you my approach has been to zealously evangelize using scripting tools. I'll say R specifically. It's the one that I came to first and the one that I love the most, but Python is good, too. I have less to say about things like Julia or JavaScript. JavaScript might not be a great example, but Julia would be a good one. They're not bad, I'm just not as familiar with them, and I think they don't have quite the network effect, but evangelizing scripting tools is one of the things that I do. That's one of the reasons why I wrote the book, so that there would be an entry point for folks who are new to R to get familiar with the language.

Later today, I'll be co-instructing a virtual workshop for actuaries using R, and I also assist with the Python instruction. I hope that's helpful. I can tell you what has been helpful for me is just recognizing the benefit that I get from using technologies like R, like Python. It has made me a better statistician. I suddenly found that when I was able to script a lot of these models, I understood the statistics a lot better than I thought I did when I was trying to translate pen and paper into a spreadsheet.

Brian's career journey

I know you've worked across many different insurance companies, like Munich Re or Swiss Reinsurance. You were chief actuary as well. I'd love to hear about that journey and maybe also what that transition has been like going into the Casualty Actuarial Society, more of a professional society rather than a large organization.

Well, I did say that I love to travel. One of the ways that I tried to put that into action was I was working in Chicago. This would have been about 20 years ago now. One of the professional realities of white-collar work that some of us have experienced, unfortunately, is a company where I was working was sold to someone else. I thought, well, I'd like to work somewhere outside of the United States. I had this opportunity where I found myself open to other positions and was able to move to Germany, not because I had said that's going to be the place. I phoned a headhunter and said, I'd like to move somewhere on earth where English is not the native language. Within this particular field, that means if you want to be a non-native or rather where English is not the native language, that will be either Switzerland or Germany, maybe France, maybe a few other countries. That was where I wound up. It was terrific. I learned German. I got to travel quite a bit and got to see a very different perspective.

I'm pretty sure that I'm the only actuary on this call, although if I'm not, please say hello. For people who are not familiar with the field, the perspective and the mode of practice is different all over the world. Actuaries in London are different than actuaries in New York are different than actuaries in Asia are different than actuaries in Switzerland. Getting that perspective was really eye-opening. Also, it kind of fed into my interest in R because a lot of the folks that I've worked with in Europe, their statistical skills were loads better than mine. I think they were able to think in equations and be able to do things in ways that I struggled to keep up with. That fed into my chafing against the constraints of the spreadsheet. I'm going to insult my own intellect. I'll go ahead and just out myself with that. Like I say, I know more mathematical statistics than the average person on the street, but I don't know as much as the typical actuary in, say, Switzerland. That was one of the draws to getting me to R.

Databases and NoSQL

My company is finally looking into a proper database to help store data. That is not my area of expertise at all, but I thought someone here might have some advice. When I look into different kinds of databases, it seems like R and RStudio is pretty set up to work with SQL, relational databases, but I don't know what tools are out there for if we wanted to go with a NoSQL database. If anyone has advice, I would love to hear it.

I don't want to preempt anyone else responding. Your question about NoSQL, the experience with working with relational databases has been pretty smooth. I have just for fun dabbled in MongoDB and Neo4j. There are R libraries out there. At scale, there may be some pain points, but I can say that in terms of prototyping and just getting started, there are packages that support certainly Neo4j and MongoDB.

Actuarial statistical methods and packages

An anonymous question was, what mathematical and statistical methods do you use? Are there packages for this specifically? And this being actuarial science, or are we still on databases?

There are a few, and I'll have to reference some specific actuarial techniques. There is a technique called loss reserving. Effectively, it's establishing the amount of liability, financial liability, for an insurance company. Practically, statistically, it is very similar to longitudinal data analysis. Not quite the same, but very, very similar. That is one area where there are at least two packages. Both of them are called chain ladder. One of them is in R, and then there was a port to Python. Those are a couple. There's a package called actuar, A-C-T-U-A-R, which is, I think, French for actuary, developed in France. It has some support for statistical distributions beyond what you get in the stats package. Actuaries often need to rely on probability distributions that have thick tails. We've got highly skewed distributions. This is a good thing if you're a consumer. It means that really terrible things are rare, like really terrible events are rare, but as actuaries, we need to be able to come up with an estimate there. Actuar is another package.

Which begs the question, is that all that we need? The answer there is no. Actuaries use pretty standard techniques, like generalized linear models, for a lot of the work that we do.
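
To make the loss reserving idea above concrete, here is a minimal sketch of the basic chain ladder calculation in plain Python. It is an illustration only and not the API of either chain ladder package mentioned in the conversation. The triangle values are invented for the example, and the names `triangle`, `factors`, and `projections` are hypothetical. The method assumes that later accident years will develop the way earlier years did.

```python
# Cumulative reported losses (made-up numbers).
# Row = accident year, column = development age.
# Newer years have fewer observed ages, which gives the triangle shape.
triangle = [
    [1000.0, 1500.0, 1650.0],
    [1100.0, 1650.0],
    [1200.0],
]

n_ages = len(triangle[0])

# Volume-weighted age-to-age factors: sum of the later column over the sum
# of the earlier column, using only rows where both ages are observed.
factors = []
for age in range(n_ages - 1):
    rows = [row for row in triangle if len(row) > age + 1]
    factors.append(sum(r[age + 1] for r in rows) / sum(r[age] for r in rows))

# Project each accident year from its latest observed value to the final age.
projections = []
for row in triangle:
    latest = row[-1]
    ultimate = latest
    for age in range(len(row) - 1, n_ages - 1):
        ultimate *= factors[age]
    projections.append(
        {"latest": latest, "ultimate": ultimate, "reserve": ultimate - latest}
    )

total_reserve = sum(p["reserve"] for p in projections)
for p in projections:
    print(p)
print("total reserve estimate:", round(total_reserve, 2))
```

For this toy triangle the development factors work out to 1.5 and 1.1, so the newest year's 1,200 is projected to 1,200 × 1.5 × 1.1. The gap between each projected ultimate and the latest observed value is the liability estimate for that year.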

Bridging data science and actuarial knowledge

Adam, do you want to jump in and introduce yourself?

Yeah, hi, I'm Adam. I'm a director of data science at the Hartford. I just joined two months ago, but it just came up this morning, in fact, that it's very common with the rise of data science that we often have lots of resources and cheat sheets and ways for folks to understand what data science is as it comes into their business, but we don't often see it go the other way. I've been in the insurance industry a few years, but I know a lot of new hires, especially in data science, struggle with learning those actuarial concepts and the jargon and what things mean. I'm kind of looking for other resources, other ways for us to go out and find, besides, of course, your book, other resources where we can go out and understand these concepts and use that to collaborate with our product partners, for example.

Everyone that I've worked with is going to be really upset that I don't have one of those at the ready. The Actuarial Foundation may have some kind of Actuary 101 type of material. I'm sure that that material exists. I've not needed it for a very long time. Generally, if I'm talking to folks, even if they're not actuaries, it's largely more about insurance terms. Here's a good example, like the abbreviation IBNR. Actuaries know what that means. So do a lot of people in the insurance industry that actuaries work with. To end the suspense, IBNR stands for incurred but not reported. That is an amount of insurance claims that the company must estimate as being owed to policyholders. But it's so standard on insurance financial statements that even outside the profession, folks tend to know it. If you're a data scientist and let's say last week you were working for, I don't know, Best Buy or Home Depot and today you're working at the Hartford, you're not going to know what that is.

Yeah, so The Institutes, which I guess organizes designations that actuaries tend to take. So like CPCU is a very common, very big one, but they have a few other very specific designations that are low effort in terms of the amount of work involved that I think are quite useful. So one of them in the chat is Associate in Insurance Data Analytics. So it takes maybe like nine months to complete fully, but it's an official designation that I think has been useful for our team, and also for being in the insurance industry, in terms of just getting up to speed on basic terms that can be very confusing to someone starting off.

Infrastructure and tooling for actuaries

Brian, are your colleagues and peers working on local laptops or perhaps on an on-prem larger system? And the follow-up question is, would your typical actuary need to install and manage RStudio or Anaconda on their own machine? What does the setup look like for them?

Typically it's going to be, well, I'll describe my setup, which is probably typical these days, 2023. I have a laptop. I work remotely. So I've got a laptop, which will refresh to our corporate server. We're an Office, sorry, Microsoft shop, which means that we've got OneDrive. And so everything is cloud synced. And I think that's fairly typical. I have R installed. I'm not using RStudio Server, although Rachel, everybody ought to. It's a really good product. I have used it. I know the Hartford does. I did some consulting for the Hartford and they had it. And it was a fantastic experience. The installation of R and RStudio is pretty straightforward. It used to be a lot more painful, but I find that it's pretty smooth these days.

So installing locally, my use of R professionally has been largely on my own. The collaboration has usually been between me and someone at another company. And there, GitHub will be the way that we're collaborating. Ideally, that's my favorite way to collaborate, or we'll get lazy and use something like SharePoint or just email, forgive me. But those are easy ways to collaborate. Anaconda, I'm not actually a big fan of Anaconda. It's really opinionated. And I find that whenever I install it, I'll wind up with somehow three or four different flavors of Python on my device. I'm probably doing something wrong. It's probably my fault. But I usually just install Python. And as with R, it's usually pretty straightforward. If I were working with a team of actuaries for an insurance company, that's not my situation right now. I'm one of only a few actuaries. But if I were working with a team or managing a team, I would do something like RStudio Server.

Survival analysis and simulation

Another anonymous question was, when you say that actuarial statistical analyses are similar to longitudinal analysis, does that include survival analysis sometimes?

Yes. There are globally two different flavors of actuary. There's a life insurance actuary, and then there's every other actuary. In the US, it's life and health insurance. Outside of the US, there's public health insurance schemes. But a life insurance actuary would absolutely use survival analysis. I'm not a life insurance actuary, so I'm talking out of school a little bit, but I think there are a lot of techniques in survival analysis that have their origin in human mortality studies. So yeah, survival analysis is a thing. I wrote a paper about using survival analysis in loss reserving, so you could kind of think of that as a hybrid of survival analysis and longitudinal analysis.

In the insurance space, there is a need for publicly available data sets for research. As you can guess, there are a lot of privacy issues that would make people uncomfortable. Companies certainly and policyholders are very uncomfortable sharing that data. There are a few kind of toy data sets out there, but we do simulate data for research purposes from time to time. For that reserve data, the longitudinal-ish data, there are several packages that will do that. There are about four or five. If you're not an expert, you've never heard of these things, but they will simulate these kinds of financial transactions. Beyond that, fairly straightforward, the ability to randomly generate a number in R, it's one of those really bedrock things that I really enjoyed when I first started using R. In Excel, I can remember I wanted to simulate something from a Poisson distribution. At that time, probably still today, there's no easy way to simulate Poisson, which is a very, very common thing in actuarial applications. You couldn't do it. Maybe I could have just inverted the cumulative function, but whatever. In R, I don't even need to do any of that. It's just rpois, and I've got 10,000 observations.
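
For readers curious what "inverting the cumulative function" would look like by hand, here is a minimal sketch in Python using only the standard library. It is not what R's rpois does internally. The function name `poisson_by_inversion`, the mean of 3.0, and the seed are all assumptions chosen for illustration.

```python
import math
import random


def poisson_by_inversion(lam, rng):
    """Draw one Poisson(lam) value by inverting the cumulative distribution."""
    u = rng.random()        # uniform draw on [0, 1)
    k = 0
    pmf = math.exp(-lam)    # P(N = 0)
    cdf = pmf
    # Walk up the CDF until it first exceeds the uniform draw.
    # The pmf > 0 guard stops the loop if the tail underflows to zero.
    while u >= cdf and pmf > 0.0:
        k += 1
        pmf *= lam / k      # P(N = k) from P(N = k - 1)
        cdf += pmf
    return k


rng = random.Random(2023)   # fixed seed so the run is repeatable
claim_counts = [poisson_by_inversion(3.0, rng) for _ in range(10_000)]
sample_mean = sum(claim_counts) / len(claim_counts)
print(f"simulated {len(claim_counts)} counts, mean = {sample_mean:.3f}")
```

The sample mean should land close to the chosen mean of 3.0. In practice a built-in generator replaces all of this with a single call, which is the convenience being described.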

Things like that. You can stack them. If you've got a cascading process, sort of a hierarchical simulation, pretty straightforward to do in R. My favorite way to tackle that is I'm a very data frame-centric programmer. I've got a data frame where I've got maybe 1,000 scenarios that have different parameters, and then just go row by row using purrr or something like that. Each row in a data frame becomes a new data frame that is simulating a scenario. I can just easily nest and unnest and summarize all of that stuff. It's that Lego-based approach to simulation.
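
The scenario-table pattern described above can be sketched in plain Python as well. This is a loose analogue of the data frame workflow, not a translation of any particular R code. The two scenarios, their parameters, and the names `simulate_year` and `summary` are hypothetical. The cascade here is a Poisson claim count followed by one exponential loss amount per claim.

```python
import math
import random
import statistics


def simulate_year(freq_mean, sev_mean, rng):
    """One simulated year: a Poisson claim count, then one exponential loss per claim."""
    # Poisson count via Knuth's multiplication method (fine for small means).
    limit = math.exp(-freq_mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    # expovariate takes a rate, so a mean of sev_mean is a rate of 1 / sev_mean.
    return sum(rng.expovariate(1.0 / sev_mean) for _ in range(count))


# One row per scenario, each with its own parameters (made-up values).
scenarios = [
    {"name": "low frequency, high severity", "freq_mean": 0.5, "sev_mean": 20_000.0},
    {"name": "high frequency, low severity", "freq_mean": 10.0, "sev_mean": 1_000.0},
]

rng = random.Random(42)  # fixed seed so the run is repeatable
summary = []
for row in scenarios:
    # Each scenario row expands into its own table of simulated years ...
    years = [simulate_year(row["freq_mean"], row["sev_mean"], rng) for _ in range(5_000)]
    # ... which is then collapsed back into a single summary row.
    summary.append(
        {
            "name": row["name"],
            "expected_mean": row["freq_mean"] * row["sev_mean"],
            "simulated_mean": statistics.fmean(years),
        }
    )

for row in summary:
    print(row)
```

Both scenarios have the same expected annual loss, so comparing the simulated means against `expected_mean` is a quick sanity check. The expand-then-collapse loop plays the role that nesting and unnesting plays in the data frame version.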

Public speaking tips

Do you have any tips for us who are trying to get better at public speaking?

I think finding a friendly room is not a bad place to start. So locally, here in Research Triangle, North Carolina, there are not very many actuaries, but I think folks realize that this is a really good spot to be if you're into data science or statistics. We've got the Research Triangle Park, we've got the universities, and we've got SAS here. So there is a very active and super-friendly community of data scientists. I'm very much involved with a meetup group of Research Triangle analysts, and we have monthly meetings. People can give talks there. Really friendly room. That's not a bad place to start. Yeah, present to your family, present to your friends. That's another way to find a friendly room. And then finally, just keep doing it. The first time that I ever spoke at an actuarial conference, I was so nervous. I thought that I was going to have a heart attack. I couldn't believe just how petrified I was. But that was like 13, 14 years ago. It gets easier.

R Markdown, Quarto, and reproducible reporting

An anonymous question on Slido was, do you see R Markdown or Quarto's publishing functionality used in actuarial stats work? For example, publishing your notebooks into tables, docs, or slides directly?

Not as often as I would like. I am a big, big fan of R Markdown slash Quarto. I'm a little bit late to the party on Quarto. I use it. The differences between Quarto and R Markdown, I'm still learning, so I use the terms interchangeably. I wrote a 600-page textbook using R Markdown. I do pretty much all of my work in R Markdown. I find that my notes and long-form comments suddenly become the most important bit in the document, and so it just becomes easy to have R Markdown slash Quarto as my starting point for work. Professionally, I've used it a lot. Stakeholders very often will want, this isn't all that they want, but very often they can be satisfied with a reasonably concise and informative Word document, or PDF if you'd rather, but giving them a Word document, like, okay, here you go. Here are the tables, and here are the graphs, and here's the executive summary. I think that's a great way to share information.

Actuaries are not using it as much. I think there's not as much awareness of R Markdown. I think the initial presumption is that R's greatest strength, or only strength rather, is in calculation. It can absolutely do that. It's got an incredible library of statistical and machine learning functionality, but it does a lot more than that. That is slowly changing as with adoption of R generally.

I see that as being a real big selling point when people find out that they can replace a very typical actuarial and kind of analytic workflow where I create a plot in Excel, and I have to copy and paste it into Word. That's operationally inefficient. It's operationally dangerous. So the notion that my entire analytics report is in one file, all of the code, all of the narrative, all of the tables, it's like telling me that I can breathe underwater and fly. It's really fantastic. So I think that as actuaries become aware of that capability, they will get pretty happy about it, as will people who need to communicate with actuaries.

I like the way you put that copying and pasting is operationally dangerous. Yeah. I'm not the only actuary who has been burned by that. Where you're in a meeting, everyone's looking at the same file, and the plot didn't update. What do you mean we're losing $1 billion a day? Oh, sorry, that's yesterday's graph.

Writing the book with R Markdown

It's really cool that your book was built on R Markdown as well. I was wondering, what was that process like of writing a book, and how did you decide that you were ready to write a book?

Two things. One, I'll start off by saying that if it were not for R Markdown, there is no way I ever would have written that thing. I can't imagine. Certainly, I'm not going to write the entire document in LaTeX. There are people who can, and God bless them. I'm not the person that could have that much LaTeX to think about, nor do I want to juggle about 12 or 13 different Word documents or one monolithic Word document. R Markdown was a tremendous solution, and it was really empowering.

The idea for the book came about from workshops that I and others had been doing for the CAS. We had one-day workshops, day-and-a-half workshops. I found that there was a lot of foundational material. I'm thinking about vectors and slicing vectors and concatenation and lists and data frames. This was taking up a large amount of scarce time. Teach somebody to be an actuary using R in about seven and a half hours. That's a tall order. When you spend about three hours or more just making sure that folks understand what a data frame is, that's scarce time that we're losing. Those early slide decks, probably all of which were written in Markdown, by the way. I'm translating it to RevealJS or something like that. We already had all of the script for those. I sat down with those decks and said, okay, these look like book chapters. You take a 45-minute session, and you've already got a lot of the code and the examples. At the time, I thought, well, I'll just flesh this out with a few paragraphs, and bang, I'll have a book. It wound up being more complicated than that. It turns out that it does take a little bit of time to write all of the words in such a way that they make sense. But at that point, it was too late. I was already, I think, 80 pages into it and just needed to finish.

AI, Bayesian methods, and the future of actuarial science

Do you have any opinion on using ChatGPT and the effects on your industry?

Not yet. The only thing that I know is that I know very little about ChatGPT. I ought to be more conversant in AI. A few years back, I just dabbled with some very, very basic exercises, just to understand the mechanical flow. I'm blanking on the co-author, but JJ Allaire has a really good book about AI. That was kind of my intro to that.

AI generally is, how do I phrase this? It's a slightly awkward fit for actuarial practice. That's my view, and it may be wrong. There are definitely cases where it works. My sense is that AI relies on a lot of observations. Insurance claims, thankfully, are pretty rare. So those observations are kind of hard to come by. Now, that's not to say that it has no place whatsoever in the actuarial workflow, and we've got some papers that have looked at this. Last year at a conference, we had a really good talk about long short-term memory models in loss reserving. So that's an active conversation, and we're looking at more and more applications. Ron Richman, I'll throw his name out there, is a very, very talented expert who's looking at AI and deep learning in actuarial applications.

Returning to ChatGPT, I've already outed myself as an old man, and I haven't really used it very much, other than I think it was last week or the week before I posted something on LinkedIn. Someone had posed a question to ChatGPT about the Grateful Dead. I'm a big Deadhead, and the answer was just wrong. So I know that some actuaries have fed it an actuarial exam to see if it can pass, and so far it can't. So that's good news for the profession, I suppose.

Hey, Brian, that's called hallucinating. As a Deadhead, you should be familiar with that concept.

I'm a Deadhead, but I'm also a big database nerd, and don't get me started on the statistical analysis of Grateful Dead performances. But at the moment, especially specifically something like ChatGPT, there are not active applications that I'm aware of. Deep learning generally, yeah, there are a few, but I see a lot more scope for lift for actuaries out of Bayesian techniques, because we simply don't have a lot of observations, but we have loads of kind of qualitative wisdom to bring to bear. And I would say that Bayesian techniques are just woefully underutilized. Actuaries were at the forefront of promoting Bayesian techniques for a long time. There's a really good book called The Theory That Wouldn't Die. Sort of like The Lady Tasting Tea, it's kind of a popular, easy-ish to read book about statistics. There is an entire chapter about actuaries. One of the few books you're going to find that will do that.

But they're just not mainstream. And especially with the advances in something like MCMC, there's no reason why we can't be using them. So I'm loads more bullish on Bayesian techniques than I am on deep learning, not immune to deep learning. That is something that we're looking at, but I see a lot more lift out of Bayesian techniques.
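
As a small illustration of why Bayesian techniques suit sparse claims data, here is a sketch of a conjugate Gamma-Poisson update in Python. The prior parameters, the claim count, and the exposure are invented for the example, and the function name `gamma_poisson_update` is hypothetical. A closed-form update like this needs no MCMC; it only shows how a prior belief and a handful of observations combine.

```python
def gamma_poisson_update(prior_shape, prior_rate, claims, exposure):
    """Posterior Gamma parameters for a Poisson claim frequency with a Gamma prior."""
    return prior_shape + claims, prior_rate + exposure


# Prior belief: about 0.05 claims per policy-year (shape / rate = 2 / 40).
prior_shape, prior_rate = 2.0, 40.0
prior_mean = prior_shape / prior_rate

# Sparse data: 3 claims observed over 20 policy-years.
claims, exposure = 3, 20.0
observed_frequency = claims / exposure

post_shape, post_rate = gamma_poisson_update(prior_shape, prior_rate, claims, exposure)
posterior_mean = post_shape / post_rate

# The same answer written as a weighted average of data and prior.
weight_on_data = exposure / (exposure + prior_rate)
blended = weight_on_data * observed_frequency + (1.0 - weight_on_data) * prior_mean

print(f"prior mean         = {prior_mean:.4f}")
print(f"observed frequency = {observed_frequency:.4f}")
print(f"posterior mean     = {posterior_mean:.4f}")
print(f"weight on data     = {weight_on_data:.4f}")
```

The posterior mean sits between the prior mean and the observed frequency, and the weight on the data grows as exposure accumulates. With only 20 policy-years the prior still carries most of the weight, which is the "qualitative wisdom" doing its job.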

Career advice and closing thoughts

Thinking about your career so far and your mentors along the way throughout your career, is there a piece of career advice that stands out to you?

Good question. I'm not sure that I'm an amazing person to give career advice. I've had a lot of luck over the years, but I've kind of randomly gone from one thing to the next, and I don't know that that's great advice. Maybe I could spin that positively and just say, I feel like I benefited from an openness to do other things, and also just an awareness that depending on what industry you're in, you will need some resilience. I worked for a really cool, really fun startup for about a year until it just evaporated. You go into work one day and they say, well, you know, it's been fun, but the market has other ideas about the viability of what we're all doing here, so you do need some resiliency and some openness.

I guess the mentor there has just been experience, just having lived through some good fortune and some not so good episodes has been what has taught me that. I will say, though, that I've got a really, really cool network. I've worked with some really, really great people. This is advice. I'm not going to be the first person to have shared this, but as often as you can, try to make sure that you're the dumbest person at the table. You know, not so hard for me, but being deliberate about that has been useful.

I appreciate that advice. I also think just like listening to you today, what I've learned is to also ask for things that you want. If you want to go and travel to a different country and teach a workshop there, ask people if they're looking for that. Brian, thank you so much for joining us today and sharing your experience. This has been great. I'm going to share your LinkedIn here in the chat for everybody as well, but thank you all for the great questions, too. Have a great rest of the day, everybody.