Data Science Hangout | Mythili Krishnaraj, AXA XL | Platform Governance With a Shared Vision
Transcript
This transcript was generated automatically and may contain errors.
Welcome, everybody, to the Data Science Hangout. If you're joining for the first time, it is so nice to meet you. I'm Rachel. The Data Science Hangout is an open space for the whole data science community to connect and chat about data science leadership, questions you're facing and what's going on in the world of data science. So these sessions are recorded and shared to the RStudio YouTube, as well as the RStudio Data Science Hangout site. Try to get them up at least a week after each session. So you can always go back and rewatch or find helpful resources. We also have a LinkedIn group for the Hangout, too. So if you ever want to continue a certain discussion, or if anybody wants to talk to each other in there, other than me just talking in there, please feel free to join that group.
Together, we're all dedicated to creating a welcoming environment for everybody here. So we love when everyone can participate and we can hear from everybody. So there's three ways that you can ask questions today. You can jump in by raising your hand on Zoom. You can put questions in the Zoom chat. Feel free to put a little star next to it if you want me to read it, or I can call on you to introduce yourselves and add some context, too. Lastly, we also have a Slido link where you can ask questions anonymously. And Hannah will share that in the chat as well. I just want to reiterate before we get started that we love to hear from everybody, no matter your level of experience or area of work, too.
And today, I am so excited to be joined by my co-host for the day, Mythili Krishnaraj, Global Delivery Lead, Pricing and Analytics Platform at AXA XL. And Mythili, I would love to have you just start by introducing yourself and telling us a little bit about your role, the company, and maybe also something you like to do in your free time outside of work.
Sure, sure. Thank you, Rachel. That's quite a warm welcome. And it's nice to be here because being, you know, a customer on the other side of RStudio, and I keep telling this to everyone, that it's become a product which I started loving so much. So I'm glad to be in the Data Science Hangout today. And I'm Mythili Krishnaraj. I'm part of Global Delivery Lead for AXA XL for Pricing and Analytics Platform. I've been in the industry for more than 20 years in terms of insurance and technology, have played different roles in terms of data and technology. Today, I'm here to share some of the experience with RStudio we had over the two years in terms of the maturity of the platform within the organization.
And free time, yes, I am a busy mom with two kids, but my older one is 16 and the younger one is nine now. So I'm getting a lot more time, I can say. I started my Executive MBA in March this year. So pretty busy with all that. But I love reading. And I also love some artwork, like glass art or crochet or sewing. So that's about me, and I'm so, so happy to be here today. Thank you.
Excitement for the year ahead
Thank you. So while we wait for questions to come in from everybody, I'm curious, what's something that you're really excited about thinking about the data science team in the next year ahead?
Okay, so it's a very exciting year for us. Because when I was talking about RStudio, it's been on on-premise servers since about 2018. I joined the team two years back, taking over the management of the whole platform. But it's been on-prem and we are in the process of migrating everything to cloud. We started the work this year and this is going to be one of our strategic platforms for the pricing and analytics world. So I want to see the product evolve. And again, that's going to be a completely different experience in terms of, again, bringing the maturity. And also, you know, the operational cost always comes up for discussion whenever we do such migrations and get the platform there. So I'm really looking forward to that.
Platform responsibilities and team structure
So when I say I'm the global delivery lead for the pricing and analytics platform, "platform" means different things in different organizations, so let me put it in, you know, an easier box. So the platform lead comes with responsibilities in terms of security, because AXA XL is, you know, one of the global players in the insurance and commercial space. Data privacy, GDPR, and all the other security regulations are quite demanding for us. We have to be very, very careful with the data we handle, and also with all the integrations of different systems. So my role is to ensure that I'm aligning to all the security policies that are put forward by information security. It also comes with operational work, like all the data analytics work and all the data science work, pricing work, because we use RStudio also for pricing.
Because in insurance, if you see, we have pricing as one of the important functions of underwriting. So we use it for that as well. So we have to ensure that the platform is stable. And there is one property model we have for pricing, which I can say should be up and running 24 by seven. So that, again, comes under my responsibility. And apart from that, you know, definitely the budgets, right? So anything you run in terms of platform, it's not that you have huge investment to do, but you have to make sure that you have the right platform. So with the given budgets, and also considering what is coming in terms of expansion from different users, I have to ensure that everything stays in line.
Moving to the cloud
So the main reason why we are moving to cloud is not only for this particular platform. It's the whole decision of the corporate taken by AXA and then AXA XL. By 2023, we wanted all the data centers to be gone. And we went with private cloud. And now it's more of hybrid, choosing private and public. It's the decision of moving to cloud is again, you know, different reasons. One is the operational cost, definitely. Because if you see, we had, so when I say AXA XL, we are part of XL group and XL group acquired Catlin. Then we were again, you know, acquired by AXA. So we have like, say, three integrations gone into this whole of this acquisition. And because of that, the technology stack is huge. And we are trying to streamline everything. And in terms of the cost also, we wanted to say that, okay, move it to cloud. And cloud is also, you know, way of looking into saying by demand, you put on the infrastructure rather than just giving the maximum. So that is the thought process behind.
But I will also, yes, agree to a point that cloud could be pricey. It could come up with its own, you know, different way of configuring it. And but since we know how much we have spent all these years in terms of on-prem, we have something like on comparative basis. And we'll try to be within that. But I can't promise saying that definitely it's going to be cheaper than the on-prem.
Platform governance and maturity journey
So, you know, it's quite, you know, I'm going to be talking about this, I think, throughout my career, because this is something which I have really enjoyed doing it. So the platform, when I started managing it two years back, it was, you know, all managed by the business users. So there was a bit of, you know, hidden detail, I can say, like a black box. We don't know what was the models that were sitting there. Also, we don't know who were the users, because it was being expanded to different departments within the organization. It started like a POC. And I think it started expanding to different business units, because they saw the benefits that was coming out of it. And then it was also picked up by one of the biggest property model pricing.
So it had its own, you know, visibility in terms of, oh, we can do some in this platform, all these things. And from always from the for the users who have been on Excel sheet for some time, you're giving a platform which comes with its own benefits, they will love to use it. So that's how it started onboarding. There was very, very thin line of support already there. When I say support, that was in terms of making sure the service doesn't go down, and also onboarding the new users. So that was the minimal thing I saw when I entered this unit. And then I have to take the maturity as my priority, because as I said earlier, the operational cost is something, you know, as a delivery leader, I have to make sure that we are within that, and also security, because tech debt and security became priority after AXA acquisition for us.
So with all this as my main objectives, I said without even knowing what is there inside the black box, it will be very difficult. So we started with the journey of the maturity of what I was talking about. It was due diligence that first I wanted to do in terms of understanding what has gone in the system, and also how to bring the stabilization of the platform, because we also had complaints that it was mainly for the property pricing, the platform was coming down, and they were not knowing the reason what was happening. Is it with the service? Is it with the network? Is it with the application? So that's where I think RStudio team also really helped us, and Kevin Hayden is here. So he was so helpful.
He got the engineers, and we did a complete architecture review. So due diligence was going on one side, the inventory. I will also tell how I did the inventory, because that might be useful for many of them here. So it's a huge organization, different departments. I don't know anyone in person as well. So what we did was, and they're all spread across the global region. So what we said was, okay, let's start doing a questionnaire. So it was a simple questionnaire prepared with, you know, a lot more questions for us to first understand who are you, what are you doing here, and what is your pipeline, and how many, you know, models you have, what is your data sources, and that was the type of the questions. And we sent it out. We collected all the information, which was very, very helpful for us to say that, okay, this is what the platform is getting used for.
And then one by one, we needed to come out with the plan of what we call application currency, because, you know, we also quantify whatever changes you bring into the platform. So we started with those projects to get to the maturity level. And quite a lot has been done over the two years, and still I'm saying that we are in progress. I won't say it's closed, because, you know, this is a platform with a large number of users. If we talk about IDE users, we have around 200 to 250. And in terms of Connect viewers, we have around 1,250. So for such a platform, you know, the work never ends. It's still ongoing.
Relationship between data scientists and IT
I will say very, very collaborative. See, one thing I believe in is technology cannot achieve what they want to achieve without the business. And it's the same for business users also, the data scientists, they can't do what they want without the technology, because some of these requests which is coming from them. Simple example, I'll give you in terms of the packages they need, for example, right? Sometimes when it comes to us, we need to understand the importance of it and quickly turn it around. And at the same time, their deployment. So now recently, we implemented proper deployment release management for all our models. So nothing goes to production without going through the deployment, which we call it as release management.
And there, you know, I can understand being 20 years in technology. For other applications, it's okay, we get that particular build after, say, three days or four days, or even a month planning can be done. But for analytics, they want something to the market very quickly. They wanted to give the scoring models to underwriters. They wanted to give the pricing models very quickly in terms of, you know, change in inflation and things like that. So what we did was I have to come up with a different plan of release management, saying that, okay, let's put, you know, let's customize it. We won't treat every model as an application and go through the entire change management. We will go through a very, very thin line, only the minimal, right? The basics, what we need to do in terms of the guidelines. So we have to customize everything for our data scientists.
And I think they saw that when first, when I talked about bringing in guardrails and, you know, all that, they were thinking, oh, this is going to be complicating what we do now. We are not going to have the flexibility and we are not going to have that efficiency, which we had earlier. But now they are seeing why did I do it and how come it has streamlined their process. And now it's, you know, it's become like more easy for me to go and tell them this is what I'm planning to do for the next phase. So I took time, but I think now it's been two years and without their help, I will not be able to do anything.
Yes. So, you know, the tips is really for the IT team to understand the value of what you're doing, because everything you do, like data scientists, there is something going to be an output. It could take one week, it could take a month, or even it could take six months, which you are doing an R&D. But for them to understand the value we're going to bring in is going to be very, very important. And also explaining to them the flexibility you need, because analytics and, you know, the data science is all about the speed that comes with any platform. It could be any platform, right? So that is something which you need to explain to them. I'm fine. I'm here to support in terms of the governance or security or stability, what you want to bring in, even the operational cost. But at the same time, can I get this? So it's always, you know, two ways.
So if you say, I'm going to be really supportive of your guardrails, but at the same time, I'm expecting this from your team, I think it's going to be more good. And definitely I feel that that collaboration is required. Like I said, otherwise, it's very, very difficult to make this platform mature. And one thing what I'm seeing now is after this journey, two years journey, the platform has matured. And now we both know how we work. I mean, each team knows how we work. And there are sometimes trade-offs, both sides, but we are able to take up that. The reason being, finally, we wanted to give the best to the output for our business and for the business to earn the profits what they want. So finally, the objective is same, both for technology and the data scientist world. But I think where that sort of conflict comes in is, you know, when objectives, there is no shared vision. So to get that shared vision, I think both teams need to talk. That's very important.
Why RStudio won over other platforms
Yeah, definitely. So I think it's, I mean, I won't take the entire credit. Like I said, it's the users, it's the business users. I think I mentioned this to Kevin sometime back. They saw the flexibility that comes with RStudio. When I say flexibility, it's not like just, you know, going and using it without any guardrails. They know that the entire governance is there, but what they saw is, like, the IDE. Now, after Workbench, we have the launcher, which they felt is one of the good functionalities that came out because, you know, you have Python, Jupyter Notebook, anything they can launch, and also package management. And the other thing is, you know, the ease of, like, going and developing something and then publishing it to Shiny. So the property model, which I'm talking about, is entirely built on Shiny.
And like past two years, you know, we have not had any major failures in terms of the platform, even though the users are expanding and, you know, we have it on prem with servers, still the business is able to do what they wanted to do, the data scientists. So that's one of the reasons, you know, their inputs convinced even our architecture to think that we cannot lose RStudio. And another advantage is the RStudio completely, you know, the way it can be installed, like how we are doing now. An IDE now, if we want to copy the image and set it up, we also have very good scenarios. We have got the complete IDE set up within two, three days. So that is something which is going to be helpful even when we go in cloud. It's going to be more placed on containers, but if you wanted to do something very quickly, it can be launched. So that is something, you know, it was useful for us to convince. Yes, there was a debate about, you know, even using other competitive platform and also we had Azure and all that, but finally RStudio won the case.
Managing 200 users and governance
Okay. So, when we did the due diligence also, you know, there were already some good things happening. Like the onboarding of users was not just the license was given. They were put in different AD groups. So that helped us. So there will be some users who just wanted to use only IDE and not publish any content to the connect. Like say they don't build anything in Shiny. So, they were like put into different AD groups. So, that was there. But one thing what it was lacking is we didn't know who were the users onboarding. And even if there is a leaver, we didn't know that person has left and we need to revoke their access completely so that we could take the license back.
So, the governance was thought about, okay, what can we do in terms of each step? So, first is the license. How do you get them like onboard? And then how do you keep an audit trail of when are they leaving? What are they doing? And say there will be some users, they get the license, but unfortunately, they are not able to use the application for say five months, six months. So, we started putting a timeline around it. If they don't use the platform for six months, we wanted to get in touch with them and find out whether they need the license. Otherwise, we take the license out of them.
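As an illustration only, and not a description of AXA XL's actual tooling, the six-month review rule described here could be sketched roughly as below. The record fields, the 183-day cutoff, and the sample users are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Roughly six months; the exact cutoff is an assumption for this sketch.
INACTIVITY_LIMIT = timedelta(days=183)


@dataclass
class LicenseRecord:
    user: str
    last_active: Optional[date]  # None means the user never logged in
    granted_on: date


def licenses_to_review(records, today):
    """Return users whose last activity (or grant date, if never active)
    is older than the inactivity limit, so someone can contact them."""
    flagged = []
    for rec in records:
        reference = rec.last_active or rec.granted_on
        if today - reference > INACTIVITY_LIMIT:
            flagged.append(rec.user)
    return flagged


# Hypothetical sample data.
records = [
    LicenseRecord("alice", date(2022, 9, 1), date(2021, 1, 10)),
    LicenseRecord("bob", date(2022, 1, 15), date(2020, 6, 1)),
    LicenseRecord("carol", None, date(2022, 2, 1)),
]
flagged = licenses_to_review(records, today=date(2022, 10, 1))
print(flagged)
```

The function only produces a list of people to get in touch with. Revoking a license stays a manual decision, which matches the "get in touch first" step described above.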
So, everything has been put in terms of onboarding. And in terms of governance, like I said, we have segregated the environments now. So, we have the development and the production separate. So, in development also, we have staging and non-prod. So, when they are developing the models, when they are going to publish it, they need to come through the, again, a ticket, which comes to us, to my team. And then we go through it. What are they deploying? And what is the model? And which is the data sources? And have they used any sort of hard code passwords? Because those are all the things we have filtered now. And there is also a validation framework, which we are using in terms of validating, okay, what's happening? Are they using licensed libraries? Are they using the proper packages?
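The validation framework itself is not described in the session, so the following is only a guess at what one of its checks might look like: a simple scan for hard-coded passwords in R or Python source before a deployment ticket is approved. The keyword list and the regular expression are assumptions and would need tuning for a real codebase.

```python
import re

# Hypothetical pattern: an identifier containing a suspicious keyword,
# assigned (with = or <-) directly to a non-empty quoted literal.
SECRET_PATTERN = re.compile(
    r"""
    \w*(password|passwd|pwd|secret|api_key|token)\w*   # suspicious name
    \s*(=|<-)\s*                                       # Python or R assignment
    (["'])[^"']+\3                                     # quoted literal value
    """,
    re.IGNORECASE | re.VERBOSE,
)


def find_hardcoded_secrets(source):
    """Return the 1-based line numbers that look like hard-coded secrets."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if SECRET_PATTERN.search(line):
            findings.append(lineno)
    return findings


# Hypothetical script submitted for deployment.
sample = "\n".join([
    "library(DBI)",
    'db_password <- "hunter2"',
    'con <- dbConnect(odbc::odbc(), pwd = Sys.getenv("DB_PWD"))',
    "api_key = 'abc123'",
])
print(find_hardcoded_secrets(sample))
```

Reading the value from an environment variable, as on the third sample line, is not flagged, because the right-hand side is a function call, not a literal.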
Packages is one good thing, because we have, again, it only can go through my team. So, any packages that is requested, my team will be installing them. And now, if we need to, say, download it from some website, we need to whitelist it. That is a separate team. We need to send the request for whitelisting, and I need to approve it. Otherwise, all of our packages come from MRAN, the Microsoft MRAN. And that's how we are managing the governance for new users.
Package approval and open source
See, there is no hard rule that they have to use only the packages from MRAN. If they choose from some other website, like say when you Google and you find it, it comes for whitelisting. And this was implemented recently. So, we need to send out the request to make it whitelisting so that we can use that link to download the package. But one step, what is in terms of governance is it goes through my approval. So, I need to ask several questions. One thing I have to make sure is there is no vulnerabilities that we are exposing, right? So, if that is the case, and mostly I will say that we have not come across any time that the users have requested something and we are not able to get that for them. Any website it could be, we see there is, you know, it is a proper website and it has its, you know, we know that it's not going to cause us any problems, then we are happy to go ahead with that.
To be honest, it's a very small team because like I said, always operational cost is also something we wanted to keep it within. So, it's a very small team I hold. But we go by, you know, the business priority. If they say something, no, it's very, very critical because the next week I need to release this for, you know, for the scoring or, you know, for underwriters, then we try to give that as a priority. So, that's how we keep and now, I mean, for past year, we are using Kanban board where we keep all our tasks, everything, including installation of packages and it could be anything related to governance and security. We are following agile model there and we pull out the tasks based on the priorities. And we know who's working on that. If it doesn't come back, why it's not coming back. So, I think in terms of project management also, we have picked up this agile way of doing it. So, it's very quick.
Yes, sometimes they need to because, you know, they could be asking it from some other websites outside. So, sometimes, yes, they need to explain why. It could be very, very useful for them in terms of some functions. And like I said, this is one of the main platforms for our actuaries as well as the data scientists. So, we can't say, you know, just bluntly no to it. So, we also put in time to see is there anything that we could do about it. Is there an alternative? If it is not there, then I'm happy to, you know, approve the whitelisting. And also, there is something else we have discussed also. In case we wanted that website to be used to install a package but we are thinking that we are not very sure whether that has any sort of risks that it is posing, then what we do is we do the installation of the package. Then again, we remove it from whitelisting so that no one is using it further. So, that's how we are trying to do. But like I said, many times, I think I can say 99% it has gone through without any issues.
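The "whitelist, install, then remove from the whitelist" step could be modeled along the following lines. This is a toy sketch: the allowlist is just an in-memory set, `install_package` is a stand-in and not a real installer, and the URLs are made up.

```python
from contextlib import contextmanager


@contextmanager
def temporarily_allowed(allowlist, url):
    """Allow `url` only for the duration of the block, then revoke it."""
    newly_added = url not in allowlist
    allowlist.add(url)
    try:
        yield allowlist
    finally:
        # Only revoke what this block added; permanent entries stay.
        if newly_added:
            allowlist.discard(url)


def install_package(name, source_url, allowlist):
    """Stand-in for a real installer: refuses sources that are not allowed."""
    if source_url not in allowlist:
        raise PermissionError(f"{source_url} is not on the allowlist")
    return f"installed {name} from {source_url}"


# Hypothetical permanent and one-off sources.
allowlist = {"https://approved.example/repo"}
one_off = "https://vendor.example/pkgs"

with temporarily_allowed(allowlist, one_off):
    result = install_package("somepkg", one_off, allowlist)

print(result)
print(one_off in allowlist)
```

Using a context manager means the source is revoked even if the installation fails partway through, so nobody else can keep using it afterwards.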
Open source, security, and the balance
Actually, I mean, definitely, if anyone wants to jump in, I will also love to hear. For AXA XL, open source was allowed earlier, but then it started becoming very strict that we are not supposed to be downloading anything. That's why, with all this whitelisting, we can't use anything without getting it approved by the info security group as well. So, after my approval, it goes through info security. They have to approve, then only it goes through. So, that was put in. Earlier, they could go to open source and download it. But now, it's very, very strict. But one thing I wanted to tell is, I'm not very familiar with all the packages, but I can tell you that there are around 1,000 packages in use currently. So, I'm thinking that most of those packages are available in Microsoft MRAN. So, I think most of the functionality which our users are using is coming from the validated package list.
Yeah. So, Manju, the whole thing, right? See, it's again, like I said in the start, it's looking at both ways, right? So, I agree to you definitely, because you lose that sort of, you know, the flexibility to get what you want to do as a data scientist. But the whole thing is, it's not like I said, a very hard rule that nothing should be got from open source or nothing can be installed. We also have a process of, you know, finding alternatives. Other ways, there is also a process called going through, you know, complete security process, the toll gate, and then, you know, explaining why do we need this. And then, you know, we have something called as a RAM process. And if that is being set, a year is set on it, like say expiry date is set on it, every year it will be reviewed. And if we have already got an alternative, they'll close that. Otherwise, they will continue using it.
So, the risks organizations are having is huge. Coming from insurance organization, if something goes wrong, even one data gets leaked, the amount of penalty that the organization has to pay is very huge, which, you know, we can't even quantify how much is that. So, that's the risk we hold. And the data privacy is like really the topmost priority for us. And we have to ensure that for different applications. So, this is one of the applications, but there is quite a lot. And the governance is required. Otherwise, it will be too hard after a few years. Because we won't even know from where the packages have been downloaded and what each package is bringing the vulnerability. And it becomes very, very difficult to manage.
No, definitely. I mean, I, I understand where you come from. Yes, it, it has to be nurtured. But I think it's a balance. And if we can achieve that balance, it's good for both sides, right? See something like you said, if everyone is being bit conscious, what they are exactly doing, am I every step I do, if I'm being conscious that no, I'm not going to pose a risk, then it's fine. It's absolutely fine. But when it comes to the reality and everyone doing their jobs, sometimes it's because of the time pressure or the timelines pressure also, there are chances that they might forget to check something. And, you know, even though I'm working in technology all these years, I am still not an expert of security. So, that's why we sometimes go to our security advisors to find out whether is this open source okay and whether can we allow it. And that's why they keep one more approval after me that the infosecurity has to approve it.
So, I think it's again, you know, like I said, it's a balance. There should be, if you really need something, there should not be a complete blunt no. There should be channels to help you. So, if organizations could do that, I think it's really nurturing both mindsets, right? One side, we making sure, you know, we are aligning to all of our security. At the same time, we are also giving the flexibility for the data scientists.
Dev, staging, and production environments
Yes, we do. So, we have dev, prod, and also we have staging environments because some of the things that our team, like, wanted to test, say, for example, we were, like, moving to Workbench last year. So, we wanted to install everything. We wanted to try out, we use one environment where we don't have data scientists at all. So, we have such an environment and also we have, like, for the users to try out anything in terms of their trial, like, the lower environment. Once everything is ready, that's where they submit the ticket. Okay, everything is done and I have my data sources and I have my algorithm ready and now I need to deploy this or publish this content, then it goes through the quickest way, right? That's what I explained. We have customized it because, say, there is a dashboard which someone has built it for within two days and it has to be immediately going to production and we see it's a very simple dashboard. Not many external data sources are there and there is no sort of any security that we need to validate. Then what we do is we give an SLA of even one to three hours. It will be published in the production, but say if it is a model which they also have built it for, say, six months, three months or nine months and it has got quite a lot of data sources and also there is some hard-coded passwords sitting in the code, then that goes through its, you know, little bit of validation. Again, I'm not talking about it will take a week. Again, there the SLA is two to three days.
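To make the two release tracks concrete, here is a minimal sketch of the triage described above. Only the two SLAs (one to three hours, two to three days) come from the discussion. The request fields and the thresholds that decide what counts as "simple" are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReleaseRequest:
    name: str
    external_data_sources: int
    has_hardcoded_credentials: bool
    development_months: float


def triage(request):
    """Return (track, sla) for a deployment ticket.

    The thresholds below are illustrative assumptions, not stated policy.
    """
    simple = (
        request.external_data_sources <= 1
        and not request.has_hardcoded_credentials
        and request.development_months < 1
    )
    if simple:
        return ("fast", "1-3 hours")
    return ("validated", "2-3 days")


# Hypothetical tickets.
dashboard = ReleaseRequest("claims-dashboard", 0, False, 0.1)
model = ReleaseRequest("property-pricing-model", 6, True, 6)

print(triage(dashboard))
print(triage(model))
```

Anything that fails one of the "simple" conditions drops to the validated track, so the fast path stays reserved for low-risk content.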
Deployment process and CI/CD
So definitely no waiting time of two weeks. And there is also, you know, we wanted to give them access from, like, automated checkout from GitHub and going into production. That is also possible. And since we are moving into cloud, the DevOps is going to get involved. And it's going to be all automated, the CI CD pipeline. So then it becomes very quick. But now on-prem also, that's what I said, the maximum we are taking is only two to three days. Otherwise, it's like, say, two hours or three hours, even one hour. Say there is a change, like you explained to us that it has to be done. Maybe instead of 30 seconds, it's going to take an hour. That's all. So it's not a major difference. But we also have a checkout process from GitHub, which is going to be very automated.
So for us, because like I said, since I am owning pricing and analytics platform, I have customized this release process a little bit different. What our information security calls out is you definitely need to have non-production and production. You can't do your development in production. Please do it and then go through release management. We have a change request that we need to submit. But what I have done is since it is data scientists and pricing actuaries, they can't wait for every change, the change request to be raised and going through the entire process. If I have an auditing mechanism that I can document it, then I have, you know, the flexibility to customize it. That's why I have done it that way. So if it is a small change, no implications, and it can go within one hour, we send it within one hour. They need not go through the entire change request process that we do for all the other applications.
R and Python users in the same environment
No, they are all in the same environment. And even before the Workbench, there was some very little usage of Python. That time they had a server where they were using it. But now after this, we don't have separate environments. We have R IDE as one of the main thing because Workbench, we purchased it last year. So after that, I think it's only our platform that's getting used now. But there are also, in other departments, there are other competitor platforms, but we have to see when we move to cloud, what is getting segregated. R is going to be there, definitely RStudio is going to be there. But along with that, what other platforms are going to be also given to the community, we are not very sure of.
It has to go through the POC process. And earlier on, I mean, I'm talking about like, because now it's my 13th year in the organization, but before that, we used to have like anything you wanted to try, you can try and then go and put that as a business case and get it. But now we need to go through the procurement just to make sure even the POC departments, I mean, they know that they have to go through procurement, they have to reach out to the vendor, get the product and try it. And if we like it, then go with it. So there, if they wanted new languages or anything, or they want new platform, it has to go through the procurement process. And like I said, it also has to be justified. Why are we moving into new technology and the benefits of it?
Awesome. I don't see any other questions, but I just want to say thank you so much, Mythili, for joining us. It was so helpful to hear your perspective and experience here.
No, thank you very much. Thank you very much for the RStudio team as well, because I think throughout this journey I'm talking about, I don't know how many times I've asked my team to reach out to RStudio and especially Kevin and say that, oh, this is very much urgent for us. Can you please help us? So I should really thank even RStudio team for that. And yeah, I look forward to working long time in this. Let's see how that goes.
That's awesome. You just reminded me, I wanted to say this as well. If anybody is ever curious if your team is using RStudio's products or if you want to be connected with your RStudio account manager, I can't believe how many times that there will be teams who reach out, who just don't know that their colleagues are using our tools yet. Feel free to reach out to me in the least salesy way possible. I can just let you know. I'll put my LinkedIn on here too, if you ever just want to ask and are curious. But thank you all so much for joining today and for all the great questions too. We will see you back hopefully next week. We also have an awesome meetup next Tuesday at 12 Eastern time with Julia Silge and Isabel Zimmerman on using MLOps with Vetiver in R and Python. So I am also just going to put the short link real quick to that meetup if you want to just add it to your calendar too.
Awesome. Well, thank you, everybody. Have a great rest of the day. Bye.
