Benedikt Kahmen @ Generali | Data Science Hangout
Transcript
This transcript was generated automatically and may contain errors.
Hello, everybody. Welcome back to the Data Science Hangout. If we haven't had a chance to meet yet, I'm Rachel. I lead Customer Marketing at Posit. I'm so excited to have you all here joining us. If it is your first time, the Hangout is our open space to hear what's going on in the world of data across different industries, chat about data science leadership, and connect with others facing similar things as you. We get together here every Thursday at the same time, same place.
But again, if it is your first time joining us, so nice to meet you. Say hi in the chat if you'd like; we'd love to welcome anybody joining for the first time. We're all dedicated to keeping this a friendly and welcoming space for everybody, and we love to hear from you no matter your years of experience, title, industry, or the languages you work in.
There are also three ways that you can jump in and ask questions or share your own perspective today. First, you can raise your hand on Zoom and I'll call on you to jump in. Second, you can put questions into the Zoom chat with a little star next to them. And third, we also have a Slido link where you can ask questions anonymously.
And quick note for anybody watching this recording in the future on YouTube and if you want to join us live, the link to add the event to your calendar will be in the details below. If you are adding the recurring event, just double check the time zone for you. So it's every Thursday from 12 to 1 eastern time. No rule that anybody has to stay on the whole time or talk. Just come and go as it fits your own schedule.
But with all that out of the way, welcome again. I'm so excited to be joined by my co-host today, Benedikt Kahmen, Head of Analytics, Data, and AI at Generali. And Benedikt, I'd love to kick things off by having you introduce yourself and your role a little bit, but also share something you like to do outside of work.
Yes, sure. Thank you. Yeah, I'm Benedikt. I'm the head of analytics, data, and AI at Generali Deutschland, the German branch of the Generali Group, which is one of the leading primary insurers in Germany. We're working with a bit over 9,000 colleagues here. I've been with the company for the last 10 years and had the amazing opportunity to build a data science department from scratch, from literally one or two colleagues to now three full teams. Before that, maybe somewhat unusually, I did a PhD in philosophy. So if you're interested in that, just go ahead and ask me.
Yeah, and something outside work: I'm also a marathon runner. Hopefully Luxembourg is next. I live in a part of Germany that's close to Luxembourg, and they do a night marathon there. So maybe that's a good challenge.
Philosophy, curiosity, and data science
That's awesome. I'm curious because you mentioned your background in philosophy as well. How has that influenced your approach to building and scaling data science teams?
So maybe the intersection first. I did my PhD in a part of philosophy called philosophy of mind, which tries to understand how thinking works. Well, one way to do that is to think a lot, which is what philosophers do. Another way is to do experiments, which is what psychologists do. And a third way, the engineering approach, is to try to build a mind, which is what data science and AI researchers do. So it's not really that far removed, although I never saw one bit of real data in all of my philosophical studies.
How has it influenced my approach to building teams? Well, maybe not the philosophy background specifically, but the academic background has influenced me there, because my desire to study that and do a PhD was always driven by curiosity. And I think that's one of the central values, central to the quality of life for data scientists in their day-to-day work. So I try to give as much room as possible for this exploration and curiosity in my teams as well.
Team structure and use cases
So we have major data science teams in four or five functions within the organization. Being an insurance company, of course, we have data scientists in the actuarial departments, which, for example, are responsible for the insurance prices. Then we have also data science teams who work across different functions. And my teams are part of that. So we are part of the chief customer officer function, which is like the umbrella for everything that has to do with distribution, but also with the customer contact and the customer lifetime.
The size of the team: it's currently 23 people. And with that, we are, I think, one of the largest, maybe the largest, dedicated data science teams in the Generali Group.
We have a number of use cases across the different functions that we touch. For a financial services firm, of course, marketing and distribution are very important. So we spend a lot of time doing marketing mix modeling or attribution modeling, just to find out how to optimize our marketing spending and distribution activities. Very closely linked to that are use cases within the customer relationship management area. So, for example, calculating next best offers, the next best actions for customer agents, is one of the things that we do. At the intersection with the actuarial departments, we also develop models that inform the pricing that we do. So we are involved with price elasticity models or with survival and retention modeling.
And we also have quite a big interest in operations, which, for a services firm, again has a lot to do with extracting information from unstructured data, a lot of paperwork actually, but also knowledge retrieval. Generali has been in the business for quite a while. So we sometimes have very old insurance conditions, and you get very complicated questions that you have to answer about what is insured in a policy that's 20 years old. So we work on routing customer intents to the right person who has this knowledge, and on helping that person find the knowledge even faster.
R and Python in the team
So right now I think it's an even split, or roughly an even split, with no clear tendency. We have the requirement that every data scientist who joins the team and only speaks one of the two languages learns the other within roughly the first year, so that everybody is comfortable working in both languages.
Initially, we tried out projects where, within one project, we also mixed the languages: write one module in R and another in Python, and a third sometimes in Java. We don't do that anymore. That's not very efficient, obviously, but we had to learn that. So most of the time it's just one language per project, but that can change depending on the people and the requirements. It's a bit easier to integrate Python into operative applications in our system landscape, and sometimes we have to resort to Java to be really compliant with all the regulations that we have in the industry.
Yeah, same for me. I mean, I feel most at home in R. The marketing mix model we're building, which I'm closest to in my data science work, is also completely in R. So that's my home zone. But I have to remind myself to stay fit in Python.
Polyglot challenges and onboarding
Well, one of the challenges was package management and dependency management. If you have both languages, you just have to manage more potential holes when you go to IT security for clearance to go into production. So that's just a lot of work. It also has implications for the kind of documentation we have to do; it's just more documentation if you use many different languages. And then there are the switching costs. If one and the same person, even if he or she is fluent in both languages, switches intraday between languages, people just felt that that is a friction in their everyday work that isn't necessary.
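For readers who want a concrete picture of what pinning dependencies for a security review can look like on the R side, here is a minimal sketch using renv; the tooling choice is an illustration, as the episode doesn't say which package manager Generali uses:

```r
# Pin the R half of a bilingual project to exact package versions,
# so the dependency list handed to IT security is reproducible.
# (The Python half would get the analogous requirements/lockfile treatment.)
renv::init()      # set up a project-local library
renv::snapshot()  # write renv.lock with exact versions
renv::restore()   # rebuild the same environment on another machine
```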
In the beginning, it made sense because when we started off, we were like four, five, six people doing that. So one had a Python background and another had an R background, and we had to throw that all together in the first project. But as we grew, we had subgroups who focused on one of the languages, and then it was more natural to stay within that context in that one language per project. So maybe it's also a function of the size of the team.
So in terms of the skilling process, we use DataCamp as a learning platform to do the data science track there, just to have a harmonized set of skills, so that people talk about the functions in the same way, things like that. We have our own helper packages that interface with our infrastructure at various points in both languages. So everybody who joins gets an introduction there, and then we have the usual onboarding process within the projects, with an introduction to the code and the specifics of the project.
I don't have the feeling that I really get pushback, so maybe the pushback is more subtle and I just don't read it right. Of course, we're joking about that. So there's the Python team, and then there's the R team, and we make our jokes about the one true language, but they're jokes in my opinion.
We had an initial phase where we had to discuss with the rest of the organization, especially, of course, with our IT colleagues, who are very Java-oriented. Suddenly there is a new data science department that wants to do stuff in R, which was strange, and Python, which was not so strange but still not on the radar, or hadn't been on the radar. That's a few years ago; of course, now it's standard. And you asked for the value proposition. I mean, with us, it's part of the interview and onboarding process. So we tell potential candidates very, very openly that we require R and Python skills. So there is a clear expectation, and everybody who joins has at least said that he or she is fine with that.
Marketing mix modeling
Yeah, sure. So the starting point was Facebook's Robyn model. It's a mostly-R setup, and if you go through it line by line, you get a really good feeling for how to do the various parts that are required, like adstocking, and for what the parts of a good marketing mix model are.
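As a concrete illustration of adstocking, here is a minimal sketch of the geometric variant in R; the decay rate theta is purely illustrative, and Robyn itself supports more flexible transformations:

```r
# Geometric adstock: today's effective exposure is today's spend plus a
# decayed carry-over of yesterday's effective exposure.
adstock <- function(spend, theta = 0.5) {
  out <- numeric(length(spend))
  out[1] <- spend[1]
  for (t in 2:length(spend)) {
    out[t] <- spend[t] + theta * out[t - 1]
  }
  out
}

# A one-off burst of spend keeps contributing for several days afterwards.
adstock(c(100, 0, 0, 0, 0))
#> [1] 100.00  50.00  25.00  12.50   6.25
```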
Now, one of the things from this first experience with Robyn that didn't work well for us: we had the requirement that we, on the one hand, wanted to know what the effects of our brand are on sales, but on the other hand also wanted to know what the effects of performance marketing, like search advertising, are on sales, and we wanted to have that in the same model. If you go to an agency and buy a marketing mix model, you usually get two different models: one for the performance marketing, and another, maybe a structural equation model, for the brand part, because that's longer term.
That was the initiator for us to build our own model, and one of the central requirements was to have a marketing mix model that we can refresh every day. I mean, if the search colleagues come to me having launched a new campaign, and a few days after that they want to know whether their campaign works, whether I see any differences in the marginal contributions of the different campaigns to sales, well, it's no good if I have to tell them to come back in half a year, because that's the refresh rate of the model, and that it's built on weekly data, which is the usual thing for a marketing mix model.
And then you have to make model adjustments. For example, one of the central questions for the model is time-varying coefficients. If you use a standard marketing mix model, you get one coefficient per channel; that's what Robyn does, for example. But if you want to do a daily model, and you want the model to pick up changes, then you can't have one coefficient for the whole time period, say a year, because then you don't see any differences. You want to see these coefficients change in the model. And there was a lot of experimenting with the feature engineering and with how to get the glmnet parameters just right, so that these differences in coefficients show up in the model.
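One common way to get time-varying coefficients out of a penalized linear model is to split each channel's regressor into time windows, so the elastic net fits one coefficient per window. The sketch below uses toy data, quarterly windows, and an arbitrary alpha; the episode doesn't reveal Generali's actual feature engineering. It reuses the adstock() helper from the sketch above:

```r
library(glmnet)
set.seed(1)

n      <- 365
dates  <- seq(as.Date("2024-01-01"), by = "day", length.out = n)
x_chan <- adstock(runif(n, 0, 100))        # one channel's adstocked spend
window <- cut(dates, breaks = "quarter")   # coarse time windows

# One column per (window, channel): the regressor is zeroed outside its
# window, so glmnet can fit a separate coefficient for each period.
X <- model.matrix(~ 0 + window:x_chan)

# Toy sales series whose channel effect drifts upward over the year.
beta_t <- seq(0.1, 0.5, length.out = n)
y      <- 50 + beta_t * x_chan + rnorm(n, sd = 5)

fit <- cv.glmnet(X, y, alpha = 0.5)        # elastic net with cross-validation
coef(fit, s = "lambda.min")                # roughly recovers the drift per window
```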
There are lots of other bits of tinkering. For example, we played around with causal inference, because we had a lot of sales process data that just got thrown out of the window in the standard models with Robyn. So we integrated a few mediating models that represented the process steps in the causal inference chain that we had, in theory at least, in our sales process, and then we can better isolate the effects.
To illustrate that: in one of the first iterations of this model, we had a quite strong impact from organic social posts on sales. Later on, we also added our internal email campaigns to the model; they weren't part of the initial data engineering effort and weren't there at the beginning. Then we saw, while analyzing the causal inference chain of all these intermediate models, that the effect of the email campaigns on sales, when controlling for social posts, was greater than the effect of social posts on sales when controlling for email campaigns. That was an indicator that our organic social posts at least partly reached our existing customers, which still drove engagement but didn't drive additional sales. So that was an interesting discussion with our marketing colleagues about how we could readjust the targeting of those posts, for example.
Yeah, sure. So that was the beginning of one of the examples. In the development phase, we had this very short feedback loop where we would work on the model each day and just send one or two output plots of the contributions of different marketing campaigns to sales to our then head of marketing. And he would give us very quick reactions like: total crap, I don't understand this, this isn't possible, how can we have negative sales there? So it helped us fix issues in the data engineering part. But it also transferred to us a lot of domain knowledge about marketing that we didn't have.
After, I don't know, 50, 60, 70 of these exchanges, because for a time it was just every day, the pictures started to synchronize. We had contributions from channels that they invested in. When they changed something, we could see something. And when we said, try this or that, and they tried it, we could see a difference in the model. That helped a lot with the acceptance of the model.
Team design principles
So I said at the beginning that I think it's important to keep things as interesting as possible for the individual data scientists. So one of our principles is that anybody is, in principle, allowed to work on anything. I don't want a setup that prohibits anybody from looking into a topic, a potential model, or a use case that he or she finds interesting, or from acquiring a technical skill that he or she finds relevant and interesting.
The other principle is that the best use cases start from business questions. I don't want to incentivize anybody in the teams to just build the model, to have the attitude of "give me the problem and I'll build you the model." That's not the right attitude. You want people to want to solve the business problem, so that they are really part of the business. It also gets you a seat at the table when the business decision comes along.
So with that in mind, we set our teams up along the axis of whom we are working for. We have one team that focuses on gaining insights for human consumption, another team that's more focused on building models, and a third team that focuses on applications that use ML models. Naturally, the first of these teams, the insights-focused team, is closer to departments like marketing or various other business functions, and the last of the three is closer to an IT function.
But it helps us also to avoid handoffs, for example, to IT or handoffs between the teams, because we can just create virtual teams between these three teams for any task that comes up. So it feels a bit complex, but it forces everybody to work across teams all the time, which I believe is a huge benefit.
Model management and governance
I wouldn't say there's one way to solve all these problems. In a way, it depends. What we do to manage a model depends on the application or the area of application. There are models that are just meant as decision support, or for informing decisions, and for those models the regulatory requirements are not as high as for others. For those, it's usually enough to have version tracking for the code and the different model iterations: for example, upload those models into GitLab, use a GitLab runner, and have a schedule there.
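For a sense of what "a GitLab runner with a schedule" can look like, here is a minimal .gitlab-ci.yml sketch; the image, script path, and job name are placeholders, not Generali's actual configuration:

```yaml
# Retrain and score on a schedule configured under CI/CD > Schedules.
retrain-model:
  image: rocker/r-ver:4.3.2        # pinned R image for reproducible runs
  script:
    - Rscript scripts/retrain.R    # refits the model and writes versioned output
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
```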
For the more critical applications, MLflow is one of the alternatives that we use. I would assume that we will use it in the future even more than we do now, with new regulation coming in from the AI Act. And for the most critical applications, for example when we build models that help with pricing, they are integrated into the production pipeline of our pricing software, which is specialized insurance software.
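As an illustration of the kind of tracking MLflow provides, here is a minimal sketch with the mlflow R API; the experiment name, parameters, and stand-in model are illustrative, and it assumes a tracking server is already configured:

```r
library(mlflow)

mlflow_set_experiment("price-elasticity")

# Record what was fit, with what settings, and how well it did,
# so every model iteration stays auditable.
with(mlflow_start_run(), {
  fit <- lm(dist ~ speed, data = cars)   # stand-in for a real pricing model
  mlflow_log_param("formula", "dist ~ speed")
  mlflow_log_metric("rmse", sqrt(mean(resid(fit)^2)))
  saveRDS(fit, "model.rds")
  mlflow_log_artifact("model.rds")       # persist the model object with the run
})
```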
We don't rely that heavily on third-party models. Well, it changes a bit with LLMs, but that's a special case right now. Apart from that, we don't really rely on third-party models. One reason is governance and trustworthiness. So we want to have the full control over what the model does. But also, I'm really skeptical about vendor lock-in and possible cost cuts that will then disable you from building new use cases. So for me, it also has to do with this robustness to be able to help yourself and build use cases in the future, not dependent on one single tool.
Being part of the business, not just support
So one of the things I see, particularly in people who come newly into the team, maybe not even from the industry: what happens a lot if you build dashboards or reports without industry experience is that you talk about the what that you see in the data. So you have, in principle, somebody reading out a dashboard to you. That may be interesting, and the dashboard may look nice, but it's not really interesting for the business.
Most of the time, the business colleagues are quite quantitative. So they see the what immediately, and most of the time even better than I would, for example. But the why is actually the important question, and you don't normally get to the why just by looking at the data. You have to get out of your data science home zone in the office and go to the people who do the real work.
And most of the time, you come back with quite a few hypotheses about what's going wrong or what's going right. Of course, you can then think about designing experiments. But most of the time, if you just dig deep into the data, you can find a quite obvious answer: part of the process is maybe just broken, or just faster than another process.
So this obsession with getting to the why and getting your hands dirty in non-data science work, I think that's most important to be part of the business because everybody else in the business does it as well. You're not in a better position or you're not allowed to have any kind of attitude just because you're a data scientist.
Generative AI and RAG
So one of the focus areas right now is, of course, generative AI. It was borne out of last year's hype about that kind of artificial intelligence. We are currently, for example, working on a system that helps us retrieve knowledge from these very old insurance policies, which our customer agents spend quite some time on.
We have various very complex and partially very old knowledge databases, where our customer agents have to gather information across, I don't know, five databases to answer a question if it's really, really complicated. And then they maybe even need two or three other colleagues to be really precise about the answer. Because if you're not precise, then the company is, of course, liable for every false statement that the customer agent makes.
And we are currently in the process of building a retrieval-augmented generation stack to help get to the right documents faster. We're also experimenting with generating the answer. We have to be very cautious about that because of the liability issues, and we are currently also very careful not to expose that to the end customer. I want to have a competent human always in between to check it. It's still in the development phase. But that's one of the main use cases where we use LLMs and generative AI.
So we use quite a standard RAG architecture with an external API call to OpenAI at the moment, but we're evaluating switching models there. We are estimating four metrics, if I remember correctly, at the same time: factual correctness is one, then relevance, conciseness, and completeness, which I always find the hardest. Because the insurance policies for our flagship product run to 277 pages for one product. And it's hard to find out just how much text you quote, and how many options and riders and exclusions you add to the answer you want to generate, without overloading the whole chat. Then it would be useless: people could just read all the pages themselves, and they wouldn't speed up their operative process.
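To make the generation step of such a RAG stack concrete, here is a hedged sketch in R using httr2 to call the OpenAI chat completions API. The helper name, model, and prompt wording are illustrative assumptions, not Generali's production code; retrieval and the four-metric evaluation are separate steps not shown here:

```r
library(httr2)

# Paste the retrieved policy passages into the prompt, then ask the model
# to answer strictly from them; the human agent still reviews the output.
answer_from_context <- function(question, passages) {
  resp <- request("https://api.openai.com/v1/chat/completions") |>
    req_auth_bearer_token(Sys.getenv("OPENAI_API_KEY")) |>
    req_body_json(list(
      model = "gpt-4o-mini",  # placeholder; the team is evaluating models
      messages = list(
        list(role = "system",
             content = "Answer only from the supplied policy excerpts."),
        list(role = "user",
             content = paste(c(passages, "Question:", question),
                             collapse = "\n\n"))
      )
    )) |>
    req_perform()

  resp_body_json(resp)$choices[[1]]$message$content
}
```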
Causal inference in practice
So I do embrace Judea Pearl's The Book of Why, and I try to spread that in the team. But that's still a process; we are still in the learning phase there. Nobody in the team has received formal training in causal inference, as far as I am aware, but we are trying to educate ourselves as well as we can.
We use causal inference mainly within the marketing mix model that I've talked about a bit. The reason to try out causal inference there was that we have a lot of data about the sales process that normally gets thrown out when you do standard econometric marketing mix modeling. Usually you have the marketing spending for various channels and try to model sales as a function of that spending. But we have so much more data: data on the impressions, on the clicks, on the visits, on the calls, on the email campaigns. And I always thought that it would be a waste of all this information and signal just to throw it out in an effort to do a marketing mix model.
So we started drawing causal diagrams where these process data are mediating variables. We have our independent variables, the spending, and then a lot of mediating variables; I mean, we wouldn't get any quotes if we didn't have the spending in the first place. It also helped with controlling for unknown confounders, which was one of the reasons early marketing mix models didn't get off the ground. You know, one of the best ways to sabotage anybody's modeling effort is to think of one potential confounder and ask them if they have considered it.
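For illustration, a causal diagram of this shape can be written down with the dagitty package in R; the variable names are examples for this sketch, and the actual diagram at Generali is surely richer:

```r
library(dagitty)

# Spend influences sales only through observed sales-process mediators,
# while an unobserved confounder U affects both spend and sales: the
# structure that motivates a front-door-style analysis.
dag <- dagitty("dag {
  spend -> impressions -> clicks -> quotes -> sales
  spend <- U -> sales
  U [latent]
}")

paths(dag, "spend", "sales")   # list the causal and confounding paths
plot(graphLayout(dag))         # draw the diagram for business discussions
```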
So now what we're doing is: we build several models from the spending to these mediating variables, and then one model from the mediating variables to sales. And then we can do, I think it's called the front-door adjustment in causal inference, from spending to sales, and use that for simulations. Our business partners, I don't think they are deeply interested in the specifics of causal inference. But they are very interested in the causal diagram, because it represents how they see the process that they are in. And it gives me a better basis for connecting with them and discussing how this process works and how the model works.
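Here is a toy sketch of that two-stage idea in R, with simulated data and a single mediator. A full front-door adjustment would additionally average the second stage over the exposure's marginal distribution; the confounder-free toy data lets us skip that step:

```r
set.seed(42)

# Stage 1: spend -> mediator (e.g. quotes); stage 2: mediator -> sales.
n      <- 500
spend  <- runif(n, 0, 100)
quotes <- 2.0 * spend + rnorm(n, sd = 10)
sales  <- 0.5 * quotes + rnorm(n, sd = 5)

stage1 <- lm(quotes ~ spend)
stage2 <- lm(sales ~ quotes)

# Chain the two models to simulate the effect of spend on sales
# through the observed process steps.
simulate_sales <- function(new_spend) {
  q_hat <- predict(stage1, newdata = data.frame(spend = new_spend))
  predict(stage2, newdata = data.frame(quotes = q_hat))
}

simulate_sales(c(10, 50, 90))   # expected sales at three spend levels
```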
Career advice
One of the earliest pieces of advice I got was: Benedikt, you're going to fail. It's a guarantee. You're going to fail, and maybe you're even going to fail every single day. You can just choose where you fail. To elaborate a bit: every day, you receive tons of deadlines and emails and requests, and it's a certainty that I cannot meet all of the things that are thrown in my direction. And that's fine. I mean, that's part of my job.
So I'm not thinking about trying to meet all the deadlines and do all the stuff that I have to do. I'd rather think about where I can afford to fail, and where I want to fail and not meet expectations. That has helped a lot with accepting things that go wrong and discussing them more openly with my team.
