Gerard Sentveld @ Prudential | Data Science Hangout

video

Feb 12, 2024

57:40

image: thumbnail.jpg

Transcript#

This transcript was generated automatically and may contain errors.

Welcome back to the Data Science Hangout. I'm Rachel. I lead Customer Marketing at Posit. I'm excited to have you all here with us today. The Hangout is our open space to hear what's going on in the world of data across different industries, chat about data science leadership, connect with others facing similar things as you. We get together here every Thursday at the same time, same place. If this is your first time joining us today, it's so nice to meet you.

I like to ask, is this anybody's first Data Science Hangout? Please say hello in the chat because we'd love to welcome you in for anybody joining for the first time. We'd love to hear from you no matter your years of experience, titles, industry, or languages that you work in. It's also totally okay if you just want to listen in here. Maybe you're having lunch or you're walking your dog or something, but also awesome to be a part of the party that happens in the Zoom chat too. There's also three ways you can jump in and ask questions or provide your own perspective on topics. One, you could raise your hand on Zoom and I'll call on you. Two, you could put questions in the Zoom chat and just put a little star next to it if it's something that you wanted me to read out loud instead. And then third, we have a Slido link where you can ask questions anonymously too.

And just a quick note before we get started that if you are watching the recording and sometime in the future and want to join us live, the link to add it to your calendar will be in the details below to join us. And there's no rule that you have to stay the whole time or talk, come and go as it fits your schedule. With that, thanks again for joining us. I'm so excited to be joined by my co-host today, Gerard Sentveld, Director of Data Analytics, Operational Risk Management at Prudential. And Gerard, I'd love to have you just introduce yourself here, share a little bit about your role, but also something you like to do outside of work.

Sure. Thank you. So first, the obligatory, these are my opinions, not Prudential's. But so I'm a Director of Data Analytics within the Operational Risk Management team within Prudential. And what that basically means is that for an insurance company and an asset management company that Prudential is, there's in that regulated industry, there's a lot of focus on us demonstrating that we manage risk. And there's two kinds of risk. There's financial risk, which is kind of like the ability for the company to manage fluctuations in the market and fluctuations in regarding to the actuarial models that we have. So COVID was a fun one for that. But then there's also operational risk and COVID was a fun one for that as well.

As an organization, we have to be able to manage disruptions as far as weather-related events, sales practices, fraud, cyber attacks, all of that. And the Operational Risk Management Department is responsible for that. And at a very high level, we are responsible for calculating what's called risk-adjusted capital, which is a number that is tied to how much operational risk we have. And so the less risk we can demonstrate, the less of that capital we as a company need to hold aside. And of course, cash flow is king. The more money you can freely move around, the more flexible you are as a company. So that's the importance of our role within Prudential.

About something else, how to work. Yeah, so I love music and the gadgets that come around with it. So I love my stereo system. And last week, we spoke a little bit about home automation, which is kind of like DIY home automation. I love DIY audio. So the amplifiers that are powering those beautiful speakers in the background, I built myself. I built myself a streamer that I have, and my music server that holds my music and streams it to those speakers, I put together myself as well.

Operational risk explained

That's great. Thank you. So thank you for describing your role a little bit, too. I know when we were talking before and you were explaining risk to me, I didn't at first realize the difference between operational risk versus financial risk. And I know one example you shared was about if people working on a project or a workflow all live in the same area as well. And I was wondering if you could just describe that a little bit more about what that means, operational risk.

Yeah, so as a company, the business continuity team is responsible for being able to re-establish all our business processes in case of a catastrophe, like Hurricane Sandy or something like that, people can't get into the office. That means other people will have to fill in those roles. And so in order to do that, you need to be fully aware of your business processes, and you need to know who is responsible for them, who are the key subject matter experts in that process. And you kind of want to spread them out so that if there are scenarios that impact a very geographically centered location, that you don't have everybody living there so that you can adjust for those scenarios. So that's what we kind of try to express in a number associated to each business process, so that the stakeholders that own that business process can think to themselves, okay, maybe it's time for like a rotation of some resources that are associated to them.

Maybe to put us all in your team's mindset too, could you share an example project that your team's working on or maybe a past project? I think that helps us understand your challenges and team goals too.

Yeah. So a big one that I can describe is we try to express our IT incidents in their impact to business processes. And the thing is that when something in a data center goes down, a ticket gets raised and they say computer ABC is, you know, hard drive is full or memory issues or CPU needs to be upgraded or something like that. But that doesn't make it possible for a business owner to understand what that really means. And so within our IT system, there's this giant list of, you know, all these assets and it's a giant graph, a network of all these pieces that are combined. And so our team created a network out of that and calculated which business processes are impacted if something goes down or needs to be replaced or, yeah.

Shifting industries and team structure

But Gerard, I know you previously worked in the pharmaceutical industry before moving to Prudential. And I was wondering, what was that shift like for you shifting industries? And if you have any advice for some of us who might be doing something similar?

I think one of the most interesting aspects there was trusting, I guess, my core skills, my data science core skills, and being able during my interview process to express how that could be valuable in a completely different industry. And so in working in the pharmaceutical industry, and so in prior projects at Merck and at Bristol Myers, I had worked with third-party risk management teams. And obviously, that's an operational risk. And so I was able to highlight that when I switched to Prudential. And yeah, one of the first things I had to do was learn a lot, do a lot of training and resources that the company had made available about what it is that Prudential does and what the department really was responsible for.

Could you also share with us a bit about the data team structure at Prudential, too?

Yeah. And that's kind of what we hinted at at the description of this hangout. It's very much a matrix organization. There's a lot of people that have dual reporting relationships up and so I'm now part of operational risk management, which is a fairly small group, because the thinking is that the actual owner of the risk is the owner of the business process. And so they're in the business and they have resources that are knowledgeable about risk management as well. And so from a functional perspective about risk, they look at us as a source of knowledge and skills and effort. We help them with their work from that perspective. But from a managerial line, they report into the business. And so that's kind of where we're at right now.

And then on the IT side, we have a similar scenario. So I consider myself to be a data scientist because I apply data science principles to all the data that my stakeholders own. But there's an entire organization out there called the Chief Data Office that has many data scientists that really work on taking the models that we develop and migrating them up to production and integrating them with operational processes. And within that organization, there's a lot of different roles and responsibilities. And so coordinating all that is a giant amount of work.

Types of risk and time sensitivity

Russ asked, what kinds of risks is the team looking at? Is it issues with computers, hard drives, or Prudential's processes? And how time-sensitive is the risk?

Yeah, so it's what we consider all operational risk. So it's, according to the regulatory bodies, there's like a whole taxonomy of risks out there. So people and processes is one of those branches. IT is a branch technology. And all of those are divided into smaller pieces. And so there's a lot of variety. If you're interested in a particular type, you can find your fun projects to work on within the organization, because there's so many different things to work on. About the time sensitivity, a lot of work that we do has to do with highlighting and finding areas of high risk so that we can alert business owners on where to pay extra attention and put in extra controls in order to manage it. The more, I guess, frequently a process gets executed and the more ability there is to use the data science model to predict things going wrong, the more we try to integrate it in a business process.

Sure. Thank you. So I apologize. I was a minute too late. I don't know the full context of the modeling we're doing, but in your sector, I think you guys are a little bit ahead of upstream oil and gas. So I'm with the upstream oil and gas, and there's a thought that everyone should have this democratic ability to build a model. And at the end of the day, a model, some sort of predictive capability. I'm trying to fit and predict. And if we could do all that, we'd make even more oil and gas and we'd all be making more money, but it's very hard. So I want to understand in your domain, the methods and the deployment strategy, because I have this illusion in my mind. I don't know if it's true, if it's fantasy, that there's this very robust process you have to identify, develop, validate, and then deploy these models or products. And I don't see that in my day-to-day, and I'm trying to understand what good looks like in that space.

Sure. So model risk is kind of a risk area that falls in between financial risk and operational risk at Prudential, because the financial models that, for instance, determine a quote for a life insurance policy or things like that, obviously, if there's an issue with that, they create financial risk. But the development of it and the processes around it and the rigor that's put in the whole development of that is an operational risk, because that's like a process risk. And so that really, that whole process of model risk governance is what controls the quality of the model. So the higher the impact that you can have with your model, like monetarily and just from a reputational risk perspective, the more rigor is placed on that process. And it has to do with quality of documentation. Sometimes even the model is so important that the model governance team creates a secondary model from scratch without any of the resources that the model team had in order to verify that the outcomes are, as you mentioned, good enough. There's testing involved, et cetera, to prove out that the model is doing its work. And then there is on the IT side, the more processes you impact, the more involvement you have from the ML engineers and the data scientists to, again, validate that your model is doing what it's supposed to be doing. Again, when you get to more impactful models, there will be less open source allowed and more restrictions on that. That's kind of how we control that risk.

Managing black swan events and audit

How do you approach managing risk for something like the COVID pandemic, a one-time situation with no recent precedent?

So I think the actuaries were impacted from a financial perspective. Many of these models have tables that are updated every year to reflect mortality and data from a certain period of time. And so, obviously, with COVID, those models needed to be, tables needed to be updated more frequently, and a lot more scrutiny was placed into that whole process. From an operational risk, a lot of work was placed into enabling people to work from home and be effective and looking at all the issues that come along with that. But yeah, that is, for an operational risk perspective, is jumping on, dropping everything you're working on and fully paying attention to that new risk.

Yeah, thanks. Hey, everybody. I'm curious if the kind of risk assessment that you do is subject to auditing or inspection, and if so, what does that mean for how you do it, what kinds of processes you have to have in place, and what kinds of documentation you have to wrap it in? Is it all part of a quality system, or do you have some latitude where you can flex around a little bit?

So, within Prudential, audit, compliance, and risk are considered non-financial risk management, and so they're kind of under one umbrella. And so they're, I guess, a partner organization of ours, and yeah, our processes in and of itself are governed by compliance and audit as well. I don't know if that answered your question. Yeah, and I guess maybe just to follow on, does that create constraints for the kinds of modeling you can do or the kinds of recommendations you can make?

I've not encountered that yet. There is an interesting consequence of the different roles that we have within audit, compliance, and risk. You would love to be able to reuse a lot of the methods that you use in order to obtain the data, in order to assess risk or calculate risk, but because audit has an independent function, and they need to operate independently, and they need to be able to fact-check anyone, they also need to be able to fact-check us and what we do. And so, while we would like to get data as refined and as cleansed as possible, audit usually wants the data as raw as possible. So that's kind of like a challenge there, because obviously you would want to reuse as much as possible, but because of that independence, they can't necessarily trust that the data cleansing and the data prep is good. So they start even further back, yeah.

While we would like to get data as refined and as cleansed as possible, audit usually wants the data as raw as possible.

Tools, languages, and open source

So the first question, I've seen mostly Python and Excel. I know there are R users within the quant and the financial modeler organization, but I haven't really done much work with them. There's also older models out there prior to Python even existing. And that's an interesting aspect of operational risk, knowing that the majority of practitioners are moving along to R and Python. Excel is considered an end user compute solution. And so to control that risk, once that becomes something of importance, there's a lot of rules that come around of actually using it for decision making. As you can imagine, you can mess up a lot with an Excel file. And so there's specific rules around it, how well it's documented, what the structure of the solution is. And therefore, a lot of it is migrated to Python modules.

And then the other question was, books about operational risk management. So there is the classic view of operational risk management for that calculation of risk-adjusted capital. That's fairly accounting focused. And yeah, there's books around it. I think if you search for operational risk management on your favorite bookstore, you'll find some. I found that having understanding of business processes and I guess business continuity, so disaster recovery and disaster recovery planning, is also very helpful because it gives you a good understanding of how to express things that are really important to the company. There's a lot of things that during your day, you can do right or wrong, but the impact to the company is minimal and it really is not that great. Once it's described as a business-critical process and someone wrote it down as what would happen if this whole data center stops working or this building is on fire or there's a hurricane at headquarters. Once people start describing that and get to the point of, well, this person is so important that if something happens, they need to be evacuated or given a new computer within an hour. Once you get to that kind of level, you really learn a lot about the business processes that keep your company alive.

When you describe the process of, so if an Excel document becomes business-critical, it then gets moved over to more of like a code-first solution. And I'm just wondering if you could expand a bit on what does that process look like and what is the kind of collaboration between different teams?

I think it stays within the team that owns that business decision. But so it's a control that there shouldn't be any Excel files that don't follow that process and that rigor that are used to make business-critical decisions. And so it's up to the stakeholder to identify that and then migrate it off. And if that, I get a model like that comes along, then the model risk management group will approach them and go through the whole process of what any other model would have to go through.

Team size, structure, and data science community

Yeah, sure. So my company has a really small data science team. We've only got three members, including myself. And I'm just wondering for companies that have more established data science practices, such as Prudential, do you guys have a chief data officer that's kind of in your corner helping to promote data science as a whole?

Um, I think we have multiple of those people within the organization. I think the chief data office is hundreds of people. And that group does not include the quants of the company. So the financial modelers, that's a whole different group, and on the actuaries, there's a whole different group on top of that. And because Prudential is an insurance company, there is very little you need to do to convince them that data science and statistical solutions are the way to go in order to make good decisions.

Yeah, I'm just sort of thinking of it in terms of the consumer packaged goods industry. We don't have anybody like that at our company. So I just thought I'd pick your brain about that a bit.

Yeah, I would think in that space, you could, I guess you could make references to companies like Amazon and other key players in that space that have used data science to squeeze the margins and be successful that way. But yeah, you know, being an insurance company, we're kind of cheating that way. It's not difficult to convince leadership that historical data will give you quite a good prognosis on mortality and occurrence of fires and stuff like that.

The actuarial sort of background seems to exclusively prefer a frequentist approach to these sort of statistics. Somebody else had already asked about the Black Swan events, these rare one-offs like COVID and stuff. Do you all ever dive into Bayesian approaches as well, or do you stick strictly to frequentist approaches, or what?

I believe there are teams that use that. As far as the models that calculate risk-adjusted capital, there's a lot of extreme value theory at play. You could simulate a lot of that, of course, and there are some Bayesian techniques in there. But yeah, you're absolutely right. From an actuarial perspective, there's a lot of focus to frequentist methods. The downside with some of the extreme value approaches is that you need a good amount of history, and in certain cases, you don't have that necessarily. Simulation and synthetic data generation can be helpful there.

Oh, about the team. Yeah, Jared, I'm curious if you can say more about the team generally, and I'm curious if we've talked a whole bunch about modeling, and I wonder what other kinds of activities are folks doing, if anything, or is everybody really focused? And then related to that, how broadly do people operate? Are people really specialized in domain areas or areas of the business, or do people tend to sort of float around and respond to needs as they come up and sort of from a shared kind of pool perspective?

Yeah, so the teams are, because of the matrix nature of the organization, the teams are really comprised of people from all over. So for instance, one of the projects that I described earlier, we have a stakeholder that is within IT because it's an IT risk, and we have myself, which is in risk modeling, and then we have resources from the chief data office that are assisting with the creation of the graph and testing out the model and creating visuals, et cetera, et cetera. And then we have folks from the data platform that enable access to the data and pump it all in. So they're from all over, and I think that enables just a lot of possibilities in order to work on different things. So for me, because I'm in this operational risk role, I can jump into all of the different projects that are out there.

And our group is fairly small. So we have like, from a modeling perspective, from a data analytics perspective, I would say like four or five people, depending on how you classify their primary role. Yeah. Thanks. Yeah, I appreciate that. Yeah, sure. And around managing risk, right, there's also just a lot of paperwork that comes around with it. And that paperwork is not just for fun or, you know, just it has that regulatory requirement as well. We are required to track any financial loss over a certain amount. We are required to have descriptions of all the risk events that we expect and that we track. Key risk indicators, they all have to be documented. They all have to have owners. They all have to be reviewed every now and then. That whole process of managing risks, processes, is just, I'd say that's the larger part of the operational risk management group. There's more people in that than in my team.

Data science community and LLMs

Gerard, I know I asked you this before when we were chatting a few weeks ago, but I know you're heavily involved in the data science community and data science competitions in your last role in the pharmaceutical industry. And I had asked, well, what does that look like at Prudential? Is that something that you're involved with? And you mentioned not yet, but weren't really sure why. Could you chat a little bit about how the different data teams at Prudential get connected with each other?

Yeah. I think because Prudential is such a data-driven kind of business, I think it's more assumed that everybody has those certain skills. I see a lot more independence there. There's not necessarily a community. But I think things like the emergence of ChatGPT, and the fact that there's a lot of emergence of large language models, all of a sudden drove that point home, that there is a hunger of sharing knowledge and expertise, because everybody is trying to wrap their minds around it. What can we do with it? How can we do it effectively? From a risk perspective, we're on top of it because we don't necessarily, we don't want any intellectual property to be shared with companies like OpenAI. And in that space, there now is a think tank, and there's a community building that's slowly becoming more mature.

Within the prior companies that I worked, there were more pockets of data scientists that didn't know about each other. So I think within Prudential, again, because it's driven by a lot of data, a lot of the teams are aware of each other and they know where to find each other. Within the pharmaceutical industry, it wasn't necessarily a lot of sharing between the R&D folks, and even within R&D, the R and the D was not necessarily talking to each other from a data science perspective. Similarly, the folks within corporate and within manufacturing, they were not necessarily looking at each other for best practices, for sharing of compute platforms, for sharing of training materials, et cetera, et cetera. And so when I guess data science, the term was coined, it became more obvious that those teams were also trying to figure out how to do this better. And having a community at that point was really powerful to showcase that they are, even though their methods are different and their subject is different, they're doing data science, and they can learn from each other, and they can benefit from having these discussions, and most importantly, the investments in compute platforms.

Having a community at that point was really powerful to showcase that they are, even though their methods are different and their subject is different, they're doing data science, and they can learn from each other, and they can benefit from having these discussions, and most importantly, the investments in compute platforms.

Sure. Good afternoon. So I wanted to ask, we have similar risk profile here, and just kind of messaging around the use of various LLMs, but at the same time, we hear, this is the future, this is where we have new possibilities, we'd be happy to leverage just technology. Is there any kind of lab access or just general strategy on allowing people to learn the technology and use it, but not expose company data so that we're ready when risk has come back and said, hey, now we turn this on, you have this ability?

Yeah, sure. Thanks for joining, JJ. Good to see you again. So the process kind of went, like you described, we started out with a lab where, you know, really no internal access at all possible, no internal documents, just to figure out how it worked. From a risk perspective, only ideas that do not touch any critical business process were allowed to be further developed just to see what this potentially could add. And within the operational risk organization and within model risk, AI or generative solutions were classified as a new category of risk. So a new control was put in place in order to make sure that if a model like that makes its way out of development into production, that there's a process around verifying it and making sure that it's not exposing us to any more risk than we're willing to take.

Guiding leadership on AI investment

And the question was lots of high level leadership for many big companies are so excited about AI or generative AI. Some already did invest money in it, but got low ROI. As a data scientist, what is a better way to guide leadership? How to better use AI or high tech practically, along with building a foundation?

Yeah, so I like two approaches. One, I like the let your business leaders pick a discretionary budget of an amount that they're willing to spend on just trying things out. I think a fun example, I think at Chevy went a little bit awry when Chevy started promoting other brand cars when you asked it nicely, but they came up with a chat bot that could tell you everything about their cars. But as I mentioned, they quickly realized that it was not infallible. But that's something that is a nice to have a fun feature. It's a marketing kind of gimmick and you can do that with discretionary budget or with a budget that you assign every year. The more valuable areas I think is where you can reduce the amount of busy work that people have reading unstructured text going through videos, et cetera, et cetera. If you're capable of using these technologies to adequately summarize it or describe it, obviously you need to put a lot of rigor around doing that correctly and ethically. But then you can express your benefit in opportunity cost. If an analyst spends half their day transcribing something that a generative AI can describe in a few minutes, then they have four hours that they can work on something else. And that's how I would describe it. I would never describe it as, so we don't need that four hours of that person. They can do more valuable work.

Open source governance and data validation

The question was, you mentioned limits on open source. Is that regulatory? What is the process for getting approvals or weighing risk?

I think it's similar to what others have mentioned last week. We have a repository that acts like a copy of what CRAN would be for our users and what Anaconda is for Python users. And the closer you are to production, the more restrictive your ability is to download things from that environment. So for my desktop use, if I want to try something out, I have access to quite a lot. And as soon as I download something, it will be scanned and there will be a lot of processes that are kicked off. And then depending on what you download, you may get an email saying, well, we recommended you stop using this package. But that would never get to that point for something that would make it into production. There's a very prescribed list of packages that you can use in those environments. And of course, if you need something, then you can request it. And then the process kicks in of manually reviewing all the code and much more scrutiny is played at the open source at that point.

The question was, how did data scientists ensure the data they are reliant on has been validated or verified? As an architect in a large organization, the request for data to be integrated into local repositories and data platforms is staggering.

Yeah. And I think that also comes back to what Alan was asking about. So from an operational risk perspective, that is definitely a concern. And depending on the impact of the process, the audit process will catch up with that and will ensure that the data that a model uses is faithful to the source. So if there is a lot of processing and a lot of data manipulation in that whole data pipeline from the source to when it's finally used in a model, that none of that is going wrong.

How did you go about building the lab environment? At my company, we were thinking about setting up an enterprise account with open AI instead of individual accounts. Is that a reasonable approach?

So I don't really know. That is something that really fits within that chief data office. That's what we rely on them for to establish. They already had a team that specifically is focusing on AI solutions. But the chief data office recognized that there's machine learning and AI and generative AI solutions as well out there. And so they established a platform and they do the intake for requests for functionality. So as soon as chat GPT was launched, they were bombarded with requests for, we need this, how can we do this, how can we get our hands on this? And the first reaction was, well, you can't. And the second reaction is, well, give us some time and we'll figure it out. And establishing the sandbox and establishing a contract were first steps in that whole process. And it's an evolving process. I think every company is trying to keep up with the likes of open AI and Google and Amazon and Microsoft on what they're rolling out on a weekly basis.

Closing reflections

Yeah, absolutely. Gerard, I actually started my career as an actuary with Mercer, now work for Charles Schwab. I was just wondering, I'm out of the actuarial space, but I still remember those days and that was definitely the rise of open source. I was wondering if you use any open source packages for any of the actuarial side work that you might do?

I don't know with a hundred percent certainty. I would find it almost impossible to believe that there's no open source being used. But I wouldn't be surprised if it is the core of our top product that someone took it completely apart and rewrote it completely from the first letter to the last letter of that open package. I wouldn't be surprised.

Gerard, do your actuaries and data scientists collaborate on projects at Prudential? And if so, how does that usually work?

I would imagine. I'm not really involved in that space because only when they create something that needs to be moved to production, the operational risk team really gets involved and we get to know what they're doing. I hope at one point I'll see something like that come by. Also, I've only been with Prudential for like a year and a half, so I have not nearly scratched the surface of what happens within all the nooks and crannies of the company.

But what is one of the most memorable pieces of career advice that you've either received or given throughout your professional journey?

I thought about it after we did our prep meeting. Let me rephrase it as the most impactful that happened to me. I was born and raised in the Netherlands. I worked for a little company at the time, and my mentor at the time emigrated to Curacao, which is in the Caribbean. I was trying to make a career switch, and I asked him for advice. He said, why don't you come work with me? Four weeks later, I was on a Caribbean island working there. So that was the most impactful advice that I ever followed, but I'm not so sure if it's applicable to a lot of people on the call right now.

Well, thank you all so much for joining us. Thank you so much, Gerard. I really appreciate it. Sure. Thanks. It was a blast.