Benedikt Kahmen @ Generali | Data Science Hangout
Transcript
This transcript was generated automatically and may contain errors.
Hello, everybody. Welcome back to the Data Science Hangout. If we haven't had a chance to meet yet, I'm Rachel. I lead Customer Marketing at Posit. I'm so excited to have you all here joining us. If it is your first time, the Hangout is our open space to hear what's going on in the world of data across different industries, chat about data science leadership, and connect with others facing similar things as you. We get together here every Thursday at the same time, same place.
But again, if it is your first time joining us, so nice to meet you. Say hi in the chat if you'd like; we'd love to welcome anybody joining for the first time. We're all dedicated to keeping this a friendly and welcoming space for everybody, and we love to hear from you no matter your years of experience, title, industry, or the languages you work in.
There are also three ways that you can jump in and ask questions or share your own perspective today. First, you can raise your hand on Zoom and I'll call on you to jump in. Second, you can put questions into the Zoom chat with a little star next to them. And third, we also have a Slido link where you can ask questions anonymously.
And quick note for anybody watching this recording in the future on YouTube and if you want to join us live, the link to add the event to your calendar will be in the details below. If you are adding the recurring event, just double check the time zone for you. So it's every Thursday from 12 to 1 eastern time. No rule that anybody has to stay on the whole time or talk. Just come and go as it fits your own schedule.
But with all that out of the way, welcome again. I'm so excited to be joined by my co-host today, Benedikt Kahmen, Head of Analytics, Data, and AI at Generali. And Benedikt, I'd love to kick things off by having you introduce yourself and your role a little bit, but also share something you like to do outside of work.
Yes, sure. Thank you. Yeah, I'm Benedikt. I'm the head of analytics, data, and AI at Generali Deutschland, the German branch of the Generali Group, which is one of the leading primary insurers in Germany. We're working with a bit over 9,000 colleagues here. I've been with the company for the last 10 years and had the amazing opportunity to build a data science department from scratch, from literally one or two colleagues to now three full teams. Before that, maybe somewhat unusually, I did a PhD in philosophy. So if you're interested in that, just go ahead and ask me.
Yeah, and something outside work: I'm also a marathon runner. Hopefully Luxembourg is next. I live in a part of Germany that's close to Luxembourg, and they do a night marathon there. So maybe that's a good challenge.
Philosophy, curiosity, and data science
That's awesome. I'm curious because you mentioned your background in philosophy as well. How has that influenced your approach to building and scaling data science teams?
So maybe the intersection first. I did my PhD in a part of philosophy called philosophy of mind, which tries to understand how thinking works. Well, one way to do that is to think a lot, which is what philosophers do. Another way is to do experiments, which is what psychologists do. And a third way, the engineering approach, is to try to build a mind, which is what data science and AI researchers do. So it's not really that far removed, although I never saw one bit of real data in all of my philosophical studies.
How has it influenced my approach to building teams? Well, maybe not the philosophy background specifically, but the academic background has influenced me there, because my desire to study that and do a PhD was always driven by curiosity. And I think that's one of the central values, central to the quality of life for data scientists in their day-to-day work. So I try to give as much room as possible for this exploration and curiosity in my teams as well.
Team structure and use cases
So we have major data science teams in four or five functions within the organization. Being an insurance company, of course, we have data scientists in the actuarial departments, which, for example, are responsible for the insurance prices. Then we have also data science teams who work across different functions. And my teams are part of that. So we are part of the chief customer officer function, which is like the umbrella for everything that has to do with distribution, but also with the customer contact and the customer lifetime.
The size of the team: it's currently 23 people. And with that, we are, I think, one of the largest, maybe the largest, dedicated data science teams in the Generali Group.
We have a number of use cases across the different functions that we touch. For a financial services firm, of course, marketing and distribution are very important. So we spend a lot of time doing marketing mix modeling or attribution modeling, just to find out how to optimize our marketing spending and distribution activities. Very closely linked to that are use cases within the customer relationship management area. So, for example, calculating next best offers, the next best actions for customer agents, is one of the things that we do. At the intersection with the actuarial departments, we also develop models that inform the pricing that we do. So we are involved with price elasticity models or with survival and retention modeling.
And we also have quite a big interest in operations, which, for a services firm, again has a lot to do with extracting information from unstructured data, a lot of paperwork actually, but also knowledge retrieval. Generali has been in the business for quite a while. So we sometimes have very old insurance conditions, and you get very complicated questions that you have to answer about what is insured in a policy that's 20 years old. So we work on routing customer intents to the right person who has this knowledge, and on helping that person find the knowledge even faster.
R and Python in the team
So right now I think it's an even split, or roughly an even split, with no clear tendency. We have the requirement that every data scientist who joins the team and only speaks one of the two languages learns the other within roughly the first year, so that everybody is comfortable working in both languages.
Initially, we tried out projects where, within one project, we also mixed the languages: write one module in R and another in Python, and a third sometimes in Java. We don't do that anymore. That's not very efficient, obviously, but we had to learn that. So most of the time it's just one language per project, but that can change depending on the people and the requirements. It's a bit easier to integrate Python into operative applications in our system landscape, and sometimes we have to resort to Java to be really compliant with all the regulations that we have in the industry.
Yeah, same for me. I mean, I feel most at home in R. The marketing mix model we're building, which I'm closest to in my data science work, is also completely in R. So that's my home zone. But I have to remind myself to stay fit in Python.
Polyglot challenges and onboarding
Well, one of the challenges was package management and dependency management. If you have both languages, you just have to manage more potential holes when you go to IT security for clearance to go into production. So that's just a lot of work. It also has implications for the kind of documentation we have to do; it's just more documentation if you use many different languages. And then there are the switching costs. If one and the same person, even if he or she is fluent in both languages, switches intraday between languages, people just felt that that is a friction in their everyday work that isn't necessary.
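For readers who want a concrete picture of what pinning dependencies for a security review can look like on the R side, here is a minimal sketch using renv; the tooling choice is an illustration, as the episode doesn't say which package manager Generali uses:

```r
# Pin the R half of a bilingual project to exact package versions,
# so the dependency list handed to IT security is reproducible.
# (The Python half would get the analogous requirements/lockfile treatment.)
renv::init()      # set up a project-local library
renv::snapshot()  # write renv.lock with exact versions
renv::restore()   # rebuild the same environment on another machine
```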
In the beginning, it made sense because when we started off, we were like four, five, six people doing that. So one had a Python background and another had an R background, and we had to throw that all together in the first project. But as we grew, we had subgroups who focused on one of the languages, and then it was more natural to stay within that context in that one language per project. So maybe it's also a function of the size of the team.
So in terms of the skilling process, we use DataCamp as a learning platform to do the data science track there, just to have a harmonized set of skills, so that people talk about the functions in the same way, things like that. We have our own helper packages that interface with our infrastructure at various points in both languages. So everybody who joins gets an introduction there, and then we have the usual onboarding process within the projects, with an introduction to the code and the specifics of the project.
I don't have the feeling that I really get pushback, so maybe the pushback is more subtle and I just don't read it right. Of course, we're joking about that. So there's the Python team, and then there's the R team, and we make our jokes about the one true language, but they're jokes in my opinion.
We had an initial phase where we had to discuss with the rest of the organization, especially, of course, with our IT colleagues, who are very Java-oriented. Suddenly there is a new data science department that wants to do stuff in R, which was strange, and Python, which was not so strange but still not on the radar, or hadn't been on the radar. That's a few years ago; of course, now it's standard. And you asked for the value proposition. I mean, with us, it's part of the interview and onboarding process. So we tell potential candidates very, very openly that we require R and Python skills. So there is a clear expectation, and everybody who joins has at least said that he or she is fine with that.
Marketing mix modeling
Yeah, sure. So the starting point was Facebook's Robyn model. It's a mostly-R setup, and if you go through it line by line, you get a really good feeling for how to do the various parts that are required, like adstocking, and for what the parts of a good marketing mix model are.
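As a concrete illustration of adstocking, here is a minimal sketch of the geometric variant in R; the decay rate theta is purely illustrative, and Robyn itself supports more flexible transformations:

```r
# Geometric adstock: today's effective exposure is today's spend plus a
# decayed carry-over of yesterday's effective exposure.
adstock <- function(spend, theta = 0.5) {
  out <- numeric(length(spend))
  out[1] <- spend[1]
  for (t in 2:length(spend)) {
    out[t] <- spend[t] + theta * out[t - 1]
  }
  out
}

# A one-off burst of spend keeps contributing for several days afterwards.
adstock(c(100, 0, 0, 0, 0))
#> [1] 100.00  50.00  25.00  12.50   6.25
```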
Now, one of the things from this first experience with Robyn that didn't work well for us: we had the requirement that we, on the one hand, wanted to know what the effects of our brand are on sales, but on the other hand also wanted to know what the effects of performance marketing, like search advertising, are on sales, and we wanted to have that in the same model. If you go to an agency and buy a marketing mix model, you usually get two different models: one for the performance marketing, and another, maybe a structural equation model, for the brand part, because that's longer term.
That was the initiator for us to build our own model, and one of the central requirements was to have a marketing mix model that we can refresh every day. I mean, if the search colleagues come to me having launched a new campaign, and a few days after that they want to know whether their campaign works, whether I see any differences in the marginal contributions of the different campaigns to sales, well, it's no good if I have to tell them to come back in half a year, because that's the refresh rate of the model, and that it's built on weekly data, which is the usual thing for a marketing mix model.
And then you have to make model adjustments. For example, one of the central questions for the model is time-varying coefficients. If you use a standard marketing mix model, you get one coefficient per channel; that's what Robyn does, for example. But if you want to do a daily model, and you want the model to pick up changes, then you can't have one coefficient for the whole time period, say a year, because then you don't see any differences. You want to see these coefficients change in the model. And there was a lot of experimenting with the feature engineering and with how to get the glmnet parameters just right, so that these differences in coefficients show up in the model.
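One common way to get time-varying coefficients out of a penalized linear model is to split each channel's regressor into time windows, so the elastic net fits one coefficient per window. The sketch below uses toy data, quarterly windows, and an arbitrary alpha; the episode doesn't reveal Generali's actual feature engineering. It reuses the adstock() helper from the sketch above:

```r
library(glmnet)
set.seed(1)

n      <- 365
dates  <- seq(as.Date("2024-01-01"), by = "day", length.out = n)
x_chan <- adstock(runif(n, 0, 100))        # one channel's adstocked spend
window <- cut(dates, breaks = "quarter")   # coarse time windows

# One column per (window, channel): the regressor is zeroed outside its
# window, so glmnet can fit a separate coefficient for each period.
X <- model.matrix(~ 0 + window:x_chan)

# Toy sales series whose channel effect drifts upward over the year.
beta_t <- seq(0.1, 0.5, length.out = n)
y      <- 50 + beta_t * x_chan + rnorm(n, sd = 5)

fit <- cv.glmnet(X, y, alpha = 0.5)        # elastic net with cross-validation
coef(fit, s = "lambda.min")                # roughly recovers the drift per window
```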
There are lots of other bits of tinkering. For example, we played around with causal inference, because we had a lot of sales process data that just got thrown out of the window in the standard models with Robyn. So we integrated a few mediating models that represented the process steps in the causal inference chain that we had, in theory at least, in our sales process, and then we can better isolate the effects.
To illustrate that: in one of the first iterations of this model, we had a quite strong impact from organic social posts on sales. Later on, we also added our internal email campaigns to the model; they weren't part of the initial data engineering effort and weren't there at the beginning. Then we saw, while analyzing the causal inference chain of all these intermediate models, that the effect of the email campaigns on sales, when controlling for social posts, was greater than the effect of social posts on sales when controlling for email campaigns. That was an indicator that our organic social posts at least partly reached our existing customers, which still drove engagement but didn't drive additional sales. So that was an interesting discussion with our marketing colleagues about how we could readjust the targeting of those posts, for example.
Yeah, sure. So that was the beginning of one of the examples. In the development phase, we had this very short feedback loop where we would work on the model each day and just send one or two output plots of the contributions of different marketing campaigns to sales to our then head of marketing. And he would give us very quick reactions like: total crap, I don't understand this, this isn't possible, how can we have negative sales there? So it helped us fix issues in the data engineering part. But it also transferred to us a lot of domain knowledge about marketing that we didn't have.
After, I don't know, 50, 60, 70 of these exchanges, because for a time it was just every day, the pictures started to synchronize. We had contributions from channels that they invested in. When they changed something, we could see something. And when we said, try this or that, and they tried it, we could see a difference in the model. That helped a lot with the acceptance of the model.
Team design principles
So I said at the beginning that I think it's important to keep things as interesting as possible for the individual data scientists. So one of our principles is that anybody is, in principle, allowed to work on anything. I don't want a setup that prohibits anybody from looking into a topic, a potential model, or a use case that he or she finds interesting, or from acquiring a technical skill that he or she finds relevant and interesting.
The other principle is that the best use cases start from business questions. I don't want to incentivize anybody in the teams to just build the model, to have the attitude of "give me the problem and I'll build you the model." That's not the right attitude. You want people to want to solve the business problem, so that they are really part of the business. It also gets you a seat at the table when the business decision comes along.
So with that in mind, we set our teams up along the axis of whom we are working for. We have one team that focuses on gaining insights for human consumption, another team that's more focused on building models, and a third team that focuses on applications that use ML models. Naturally, the first of these teams, the insights-focused team, is closer to departments like marketing or various other business functions, and the last of the three is closer to an IT function.
But it helps us also to avoid handoffs, for example, to IT or handoffs between the teams, because we can just create virtual teams between these three teams for any task that comes up. So it feels a bit complex, but it forces everybody to work across teams all the time, which I believe is a huge benefit.
Model management and governance
I wouldn't say there's one way to solve all these problems. In a way, it depends. What we do to manage a model depends on the application or the area of application. There are models that are just meant as decision support, or for informing decisions, and for those models the regulatory requirements are not as high as for others. For those, it's usually enough to have version tracking for the code and the different model iterations: for example, upload those models into GitLab, use a GitLab runner, and have a schedule there.
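For a sense of what "a GitLab runner with a schedule" can look like, here is a minimal .gitlab-ci.yml sketch; the image, script path, and job name are placeholders, not Generali's actual configuration:

```yaml
# Retrain and score on a schedule configured under CI/CD > Schedules.
retrain-model:
  image: rocker/r-ver:4.3.2        # pinned R image for reproducible runs
  script:
    - Rscript scripts/retrain.R    # refits the model and writes versioned output
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
```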
For the more critical applications, MLflow is one of the alternatives that we use. I would assume that we will use it in the future even more than we do now, with new regulation coming in from the AI Act. And for the most critical applications, for example when we build models that help with pricing, they are integrated into the production pipeline of our pricing software, which is specialized insurance software.
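As an illustration of the kind of tracking MLflow provides, here is a minimal sketch with the mlflow R API; the experiment name, parameters, and stand-in model are illustrative, and it assumes a tracking server is already configured:

```r
library(mlflow)

mlflow_set_experiment("price-elasticity")

# Record what was fit, with what settings, and how well it did,
# so every model iteration stays auditable.
with(mlflow_start_run(), {
  fit <- lm(dist ~ speed, data = cars)   # stand-in for a real pricing model
  mlflow_log_param("formula", "dist ~ speed")
  mlflow_log_metric("rmse", sqrt(mean(resid(fit)^2)))
  saveRDS(fit, "model.rds")
  mlflow_log_artifact("model.rds")       # persist the model object with the run
})
```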
We don't rely that heavily on third-party models. Well, it changes a bit with LLMs, but that's a special case right now. Apart from that, we don't really rely on third-party models. One reason is governance and trustworthiness. So we want to have the full control over what the model does. But also, I'm really skeptical about vendor lock-in and possible cost cuts that will then disable you from building new use cases. So for me, it also has to do with this robustness to be able to help yourself and build use cases in the future, not dependent on one single tool.
Being part of the business, not just support
So one of the things I see, particularly in people who come newly into the team, maybe not even from the industry: what happens a lot if you build dashboards or reports without industry experience is that you talk about the what that you see in the data. So you have, in principle, somebody reading out a dashboard to you. That may be interesting, and the dashboard may look nice, but it's not really interesting for the business.
Most of the time, the business colleagues are quite quantitative. So they see the what immediately, and most of the time even better than I would, for example. But the why is actually the important question, and you don't normally get to the why just by looking at the data. You have to get out of your data science home zone in the office and go to the people who do the real work.
And most of the time, you come back with quite a few hypotheses about what's going wrong or what's going right. Of course, you can then think about designing experiments. But most of the time, if you just dig deep into the data, you can find a quite obvious answer: part of the process is maybe just broken, or just faster than another process.
So this obsession with getting to the why and getting your hands dirty in non-data science work, I think that's most important to be part of the business because everybody else in the business does it as well. You're not in a better position or you're not allowed to have any kind of attitude just because you're a data scientist.
Generative AI and RAG
So one of the focus areas right now is, of course, generative AI. It was borne out of last year's hype about that kind of artificial intelligence. We are currently, for example, working on a system that helps us retrieve knowledge from these very old insurance policies, which our customer agents spend quite some time on.
We have various very complex and partially very old knowledge databases, where our customer agents have to gather information across, I don't know, five databases to answer a question if it's really, really complicated. And then they maybe even need two or three other colleagues to be really precise about the answer. Because if you're not precise, then the company is, of course, liable for every false statement that the customer agent makes.
And we are currently in the process of building a retrieval-augmented generation stack to help get to the right documents faster. We're also experimenting with generating the answer. We have to be very cautious about that because of the liability issues, and we are currently also very careful not to expose that to the end customer. I want to have a competent human always in between to check it. It's still in the development phase. But that's one of the main use cases where we use LLMs and generative AI.
So we use quite a standard RAG architecture with an external API call to OpenAI at the moment, but we're evaluating switching models there. We are estimating four metrics, if I remember correctly, at the same time: factual correctness is one, then relevance, conciseness, and completeness, which I always find the hardest. Because the insurance policies for our flagship product run to 277 pages for one product. And it's hard to find out just how much text you quote, and how many options and riders and exclusions you add to the answer you want to generate, without overloading the whole chat. Then it would be useless: people could just read all the pages themselves, and they wouldn't speed up their operative process.
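To make the generation step of such a RAG stack concrete, here is a hedged sketch in R using httr2 to call the OpenAI chat completions API. The helper name, model, and prompt wording are illustrative assumptions, not Generali's production code; retrieval and the four-metric evaluation are separate steps not shown here:

```r
library(httr2)

# Paste the retrieved policy passages into the prompt, then ask the model
# to answer strictly from them; the human agent still reviews the output.
answer_from_context <- function(question, passages) {
  resp <- request("https://api.openai.com/v1/chat/completions") |>
    req_auth_bearer_token(Sys.getenv("OPENAI_API_KEY")) |>
    req_body_json(list(
      model = "gpt-4o-mini",  # placeholder; the team is evaluating models
      messages = list(
        list(role = "system",
             content = "Answer only from the supplied policy excerpts."),
        list(role = "user",
             content = paste(c(passages, "Question:", question),
                             collapse = "\n\n"))
      )
    )) |>
    req_perform()

  resp_body_json(resp)$choices[[1]]$message$content
}
```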
Causal inference in practice
So I do embrace Judea Pearl's The Book of Why, and I try to spread that in the team. But that's still a process; we are still in the learning phase there. Nobody in the team has received formal training in causal inference, as far as I am aware, but we are trying to educate ourselves as well as we can.
We use causal inference mainly within the marketing mix model that I've talked about a bit. The reason to try out causal inference there was that we have a lot of data about the sales process that normally gets thrown out when you do standard econometric marketing mix modeling. Usually you have the marketing spending for various channels and try to model sales as a function of that spending. But we have so much more data: data on the impressions, on the clicks, on the visits, on the calls, on the email campaigns. And I always thought that it would be a waste of all this information and signal just to throw it out in an effort to do a marketing mix model.
So we started drawing causal diagrams where these process data are mediating variables. We have our independent variables, the spending, and then a lot of mediating variables; I mean, we wouldn't get any quotes if we didn't have the spending in the first place. It also helped with controlling for unknown confounders, which was one of the reasons early marketing mix models didn't get off the ground. You know, one of the best ways to sabotage anybody's modeling effort is to think of one potential confounder and ask them if they have considered it.
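For illustration, a causal diagram of this shape can be written down with the dagitty package in R; the variable names are examples for this sketch, and the actual diagram at Generali is surely richer:

```r
library(dagitty)

# Spend influences sales only through observed sales-process mediators,
# while an unobserved confounder U affects both spend and sales: the
# structure that motivates a front-door-style analysis.
dag <- dagitty("dag {
  spend -> impressions -> clicks -> quotes -> sales
  spend <- U -> sales
  U [latent]
}")

paths(dag, "spend", "sales")   # list the causal and confounding paths
plot(graphLayout(dag))         # draw the diagram for business discussions
```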
So now what we're doing is: we build several models from the spending to these mediating variables, and then one model from the mediating variables to sales. And then we can do, I think it's called the front-door adjustment in causal inference, from spending to sales, and use that for simulations. Our business partners, I don't think they are deeply interested in the specifics of causal inference. But they are very interested in the causal diagram, because it represents how they see the process that they are in. And it gives me a better basis for connecting with them and discussing how this process works and how the model works.
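Here is a toy sketch of that two-stage idea in R, with simulated data and a single mediator. A full front-door adjustment would additionally average the second stage over the exposure's marginal distribution; the confounder-free toy data lets us skip that step:

```r
set.seed(42)

# Stage 1: spend -> mediator (e.g. quotes); stage 2: mediator -> sales.
n      <- 500
spend  <- runif(n, 0, 100)
quotes <- 2.0 * spend + rnorm(n, sd = 10)
sales  <- 0.5 * quotes + rnorm(n, sd = 5)

stage1 <- lm(quotes ~ spend)
stage2 <- lm(sales ~ quotes)

# Chain the two models to simulate the effect of spend on sales
# through the observed process steps.
simulate_sales <- function(new_spend) {
  q_hat <- predict(stage1, newdata = data.frame(spend = new_spend))
  predict(stage2, newdata = data.frame(quotes = q_hat))
}

simulate_sales(c(10, 50, 90))   # expected sales at three spend levels
```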
Career advice
One of the earliest pieces of advice I got was: Benedikt, you're going to fail. It's a guarantee. You're going to fail, and maybe you're even going to fail every single day. You can just choose where you fail. To elaborate a bit: every day, you receive tons of deadlines and emails and requests, and it's a certainty that I cannot meet all of the things that are thrown in my direction. And that's fine. I mean, that's part of my job.
So I'm not thinking about trying to meet all the deadlines and do all the stuff that I have to do. I'd rather think about where I can afford to fail, and where I want to fail and not meet expectations. That has helped a lot with accepting things that go wrong and discussing them more openly with my team.
