Data Science Hangout | Mythili Krishnaraj, AXA XL | Platform Governance With a Shared Vision

Transcript

This transcript was generated automatically and may contain errors.

Welcome, everybody, to the Data Science Hangout. If you're joining for the first time, it is so nice to meet you. I'm Rachel. The Data Science Hangout is an open space for the whole data science community to connect and chat about data science leadership, questions you're facing and what's going on in the world of data science. These sessions are recorded and shared to the RStudio YouTube channel, as well as the RStudio Data Science Hangout site. We try to get them up about a week after each session, so you can always go back and rewatch or find helpful resources. We also have a LinkedIn group for the Hangout. So if you ever want to continue a certain discussion, or if you want to talk to each other there rather than just hearing from me, please feel free to join that group.

Together, we're all dedicated to creating a welcoming environment for everybody here. So we love when everyone can participate and we can hear from everybody. There are three ways you can ask questions today. You can jump in by raising your hand on Zoom. You can put questions in the Zoom chat. Feel free to put a little star next to it if you want me to read it, or I can call on you to introduce yourself and add some context, too. Lastly, we also have a Slido link where you can ask questions anonymously, and Hannah will share that in the chat as well. I just want to reiterate before we get started that we love to hear from everybody, no matter your level of experience or area of work.

And today, I am so excited to be joined by my co-host for the day, Mythili Krishnaraj, Global Delivery Lead, Pricing and Analytics Platform at AXA XL. And Mythili, I would love to have you just start by introducing yourself and telling us a little bit about your role, the company, and maybe also something you like to do in your free time outside of work.

Sure, sure. Thank you, Rachel. That's quite a warm welcome. It's nice to be here because, as a customer on the other side of RStudio, I keep telling everyone that it's become a product I've really come to love. So I'm glad to be in the Data Science Hangout today. I'm Mythili Krishnaraj, Global Delivery Lead for the Pricing and Analytics Platform at AXA XL. I've been in insurance and technology for more than 20 years and have played different roles in data and technology. Today, I'm here to share some of our experience with RStudio over the past two years as the platform has matured within the organization.

And free time, yes, I am a busy mom with two kids, but my older one is 16 and my younger one is nine now, so I'm getting a lot more time, I can say. I started my Executive MBA in March this year, so I'm pretty busy with that. But I love reading, and I also love artwork like glass art, crochet, or sewing. So that's me, and I'm so, so happy to be here today. Thank you.

But I think where that sort of conflict comes in is when there is no shared vision of the objectives. To get that shared vision, I think both teams need to talk. That's very important.

Why RStudio won over other platforms

Yeah, definitely. I won't take the entire credit. Like I said, it's the users, the business users. I think I mentioned this to Kevin some time back. They saw the flexibility that comes with RStudio. When I say flexibility, I don't mean using it without any guardrails. They know the full governance is there, but what they valued was the IDE. Now, with Workbench, we have the Launcher, which they felt was one of the best features to come out, because they can launch Python, Jupyter Notebook, anything, and there is also package management. The other thing is how easy it is to develop something and then publish it to Shiny. The property model I'm talking about is built entirely on Shiny.

And over the past two years, we have not had any major platform failures. Even though the user base is expanding and we run it on-prem on our own servers, the business, meaning the data scientists, is still able to do what it wants to do. That's one of the reasons their input convinced even our architecture team that we cannot lose RStudio. Another advantage is the way RStudio can be installed, as we are doing now. When we want to copy the image and set up an IDE, we have had very good experiences: we got a complete IDE set up within two or three days. That will be helpful when we move to the cloud, too. There it will be based more on containers, but if you want to do something very quickly, it can be launched. That was useful in making the case. Yes, there was a debate about using other competing platforms, and we also had Azure and so on, but in the end RStudio won the case.

Managing 200 users and governance

Okay. So, when we did the due diligence, there were already some good things happening. Onboarding users wasn't just handing out a license; users were put into different AD groups, and that helped us. For example, some users only wanted to use the IDE and not publish any content to Connect, because they don't build anything in Shiny, so they were put into a different AD group. But one thing that was lacking was visibility: we didn't know which users were being onboarded. And if someone left, we didn't know that the person had left and that we needed to revoke their access completely so we could take the license back.
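
As a rough illustration of the AD-group segregation described above (IDE-only users versus users who also publish to Connect), here is a minimal Python sketch. The group names and capability labels are hypothetical; in a real deployment this mapping lives in the directory service and the products' own authentication settings, not in application code.

```python
# Hypothetical AD group names mapped to the capabilities each one grants.
GROUP_CAPABILITIES: dict[str, set[str]] = {
    "rstudio-ide-users": {"ide"},
    "rstudio-publishers": {"ide", "publish_to_connect"},
}


def capabilities_for(user_groups: list[str]) -> set[str]:
    """Union of the capabilities granted by every known group the user is in."""
    caps: set[str] = set()
    for group in user_groups:
        # Unknown groups grant nothing rather than raising an error.
        caps |= GROUP_CAPABILITIES.get(group, set())
    return caps


def can_publish(user_groups: list[str]) -> bool:
    """True only if some group grants publishing rights."""
    return "publish_to_connect" in capabilities_for(user_groups)
```

For example, `can_publish(["rstudio-ide-users"])` is `False`, while `can_publish(["rstudio-publishers"])` is `True`.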

So we thought about governance step by step. First is the license: how do you onboard people? Then, how do you keep an audit trail of when they leave and what they are doing? And some users get a license but, unfortunately, aren't able to use the application for, say, five or six months. So we started putting a timeline around it: if someone hasn't used the platform for six months, we get in touch with them to find out whether they still need the license. If not, we take the license back.
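
The leaver and six-month inactivity rules above could be expressed as a periodic license review. This is a minimal sketch under stated assumptions: the `LicensedUser` record, the set of active employees (for example, exported from HR or AD), and approximating "six months" as 183 days are all hypothetical choices, not details from the talk.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: "six months" approximated as 183 days.
INACTIVITY_THRESHOLD = timedelta(days=183)


@dataclass
class LicensedUser:
    username: str
    last_active: date  # hypothetical: last login or session date from usage logs


def review_licenses(
    users: list[LicensedUser], active_employees: set[str], today: date
) -> tuple[list[str], list[str]]:
    """Split licensed users into leavers to revoke and inactive users to contact."""
    revoke: list[str] = []
    contact: list[str] = []
    for user in users:
        if user.username not in active_employees:
            # Leaver: revoke access completely and reclaim the license.
            revoke.append(user.username)
        elif today - user.last_active >= INACTIVITY_THRESHOLD:
            # Still employed but idle: ask whether they still need the license.
            contact.append(user.username)
    return revoke, contact
```

Running the review on a schedule also gives a simple audit trail of who was flagged and when.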

So that covers onboarding. In terms of governance, like I said, we have now segregated the environments: development and production are separate, and within development we also have staging and non-prod. When users develop models and are ready to publish, they need to raise a ticket, which comes to my team. Then we go through it: What are they deploying? What is the model? What are the data sources? Have they used any hard-coded passwords? Those are all things we now check for. There is also a validation framework we use to check what's happening: Are they using licensed libraries? Are they using the proper packages?
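
One of the ticket checks above, looking for hard-coded passwords, can be partly automated. The sketch below is a hypothetical regex heuristic over R source text, not the validation framework mentioned in the talk; the variable names it looks for are assumptions, and a dedicated secret scanner would catch far more cases.

```python
import re

# Hypothetical heuristic: a credential-like name assigned a string literal
# with either `=` or R's `<-` operator.
CREDENTIAL_PATTERN = re.compile(
    r"""(?i)\b(password|passwd|pwd|secret|api_?key|token)\b\s*(=|<-)\s*["'][^"']+["']"""
)


def find_hardcoded_credentials(source: str) -> list[int]:
    """Return the 1-based line numbers that look like hard-coded credentials."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if CREDENTIAL_PATTERN.search(line)
    ]
```

Reading secrets from the environment (for example, `Sys.getenv("API_KEY")` in R) does not trigger the pattern, which is the behavior a reviewer would want to encourage.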

Packages are one good example, because, again, they can only go through my team. Any package that is requested, my team installs. If we need to download one from some website, we need to whitelist it. That is handled by a separate team: we send the request for whitelisting, and I need to approve it. Otherwise, all of our packages come from MRAN, the Microsoft R Application Network. And that's how we are managing governance for new users.

Package approval and open source

See, there is no hard rule that they can only use packages from MRAN. If they find a package on some other website, say by Googling, it goes for whitelisting. This was implemented recently. We send out the whitelisting request so that we can use that link to download the package. But one governance step is that it goes through my approval, so I need to ask several questions. One thing I have to make sure of is that we are not exposing any vulnerabilities, right? That said, I would say we have hardly ever come across a case where users requested something and we were not able to get it for them. Whatever the website is, if we see that it is a proper website and we know it's not going to cause us any problems, then we are happy to go ahead with it.

To be honest, it's a very small team because, like I said, we always want to keep operational costs contained. So I run a very small team. But we go by business priority. If users say something is very, very critical because they need to release it next week for scoring or for underwriters, then we try to prioritize it. For the past year, we have been using a Kanban board where we keep all our tasks, including package installations and anything related to governance and security. We follow an agile model there and pull tasks based on priority. We know who's working on what, and if something doesn't come back, we know why. So in terms of project management, too, we have picked up this agile way of working, and it's very quick.

Yes, sometimes they need to, because they could be requesting it from some outside website. So sometimes, yes, they need to explain why. It could be very useful to them for certain functions. And like I said, this is one of the main platforms for our actuaries as well as our data scientists, so we can't just say a blunt no. We also put in time to see whether there is anything we can do about it: Is there an alternative? If there isn't, then I'm happy to approve the whitelisting. There is also something else we have discussed. If we want to use a website to install a package but we are not sure whether it poses any risk, we install the package and then remove the site from the whitelist again so that no one uses it further. That's how we are approaching it. But like I said, I think I can say 99% of the time it has gone through without any issues.
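
The "whitelist, install, then remove" practice above maps naturally onto a context manager that guarantees the removal step even if the install fails. This is a minimal sketch assuming the whitelist can be modeled as an in-memory set of domains; in practice it is a proxy or firewall rule managed by a separate team, and the domain names here are placeholders.

```python
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temporarily_whitelisted(allowlist: set[str], domain: str) -> Iterator[None]:
    """Allow `domain` only for the duration of the block, then remove it."""
    already_present = domain in allowlist
    allowlist.add(domain)
    try:
        yield
    finally:
        # Runs even if the installation inside the block raises an error.
        # Only remove the domain if this block was the one that added it.
        if not already_present:
            allowlist.discard(domain)
```

Usage would look like `with temporarily_whitelisted(allowlist, "packages.example.org"): ...`, with the package installation inside the block; once the block exits, the domain is no longer allowed.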

Open source, security, and the balance

Actually, if anyone wants to jump in, I would love to hear from you too. At AXA XL, open source was allowed earlier, but then it became very strict: we are not supposed to download anything. That's why there's all this whitelisting; nothing can be used without also being approved by the information security group. So after my approval, it goes to information security, and only once they approve does it go through. That was put in place. Earlier, users could go to open-source sites and download packages directly, but now it's very, very strict. One thing I wanted to tell you: I'm not familiar with every package, but I can tell you there are around 1,000 packages in use currently. I think most of those packages are available in Microsoft MRAN, so most of the functionality our users rely on comes from the validated package list.
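
The two-stage sign-off described above (platform lead first, then information security, and only then does the request go through) can be modeled as a tiny state machine. This is a hypothetical sketch; the status names are illustrative and not taken from any ticketing tool used at AXA XL.

```python
from enum import Enum


class Status(Enum):
    PENDING_PLATFORM = "pending platform lead approval"
    PENDING_INFOSEC = "pending information security approval"
    APPROVED = "approved"
    REJECTED = "rejected"


# Each approval hands the request to the next stage, in a fixed order.
NEXT_STATUS = {
    Status.PENDING_PLATFORM: Status.PENDING_INFOSEC,
    Status.PENDING_INFOSEC: Status.APPROVED,
}


def review(status: Status, approved: bool) -> Status:
    """Apply one reviewer's decision; a rejection at any stage is final."""
    if status not in NEXT_STATUS:
        raise ValueError(f"request is already final: {status.value}")
    return NEXT_STATUS[status] if approved else Status.REJECTED
```

Keeping the stage order in one table makes it impossible for a request to reach `APPROVED` without passing both reviewers.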

Yeah. So, Manju, here's the whole thing, right? Like I said at the start, it's about looking at both sides. I definitely agree with you, because you lose some of the flexibility to do what you want as a data scientist. But like I said, it's not a hard rule that nothing can come from open source or nothing can be installed. We also have a process for finding alternatives. Otherwise, there is a process of going through the complete security review, the tollgate, and explaining why we need this. Then we have something called the RAM process. Once that is in place, an expiry date is set one year out, and it is reviewed every year. If we have found an alternative by then, they close it; otherwise, they continue using it.
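
The yearly expiry-and-review cycle described above can be sketched as follows. This is a hypothetical model of an approved package exception, not the RAM process itself: modeling "a year" as 365 days and renewing from the previous expiry date (rather than from the review date) are assumptions.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta

# Assumption: "a year" modeled as 365 days.
REVIEW_PERIOD = timedelta(days=365)


@dataclass(frozen=True)
class PackageException:
    package: str
    expires: date
    closed: bool = False


def annual_review(
    exc: PackageException, today: date, alternative_found: bool
) -> PackageException:
    """Close the exception if an alternative exists, otherwise renew it."""
    if exc.closed or today < exc.expires:
        return exc  # Already closed, or not yet due for review.
    if alternative_found:
        return replace(exc, closed=True)
    # No alternative yet: keep using it and schedule the next review.
    return replace(exc, expires=exc.expires + REVIEW_PERIOD)
```

Because each review either closes the exception or pushes the expiry forward, no exception can silently stay open without being looked at again.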

The risks organizations face are huge. In an insurance organization, if something goes wrong and even one piece of data gets leaked, the penalty the organization has to pay is huge; we can't even quantify it. So that's the risk we hold, and data privacy is really the top priority for us. We have to ensure it across many applications; this is one of them, but there are quite a lot. So governance is required. Otherwise, after a few years it will be too hard, because we won't even know where packages were downloaded from or what vulnerabilities each package brings, and it becomes very, very difficult to manage.

No, definitely. I understand where you're coming from. Yes, it has to be nurtured. But I think it's a balance, and if we can achieve that balance, it's good for both sides, right? Like you said, if everyone is conscious of exactly what they are doing, if at every step I am conscious that I'm not going to pose a risk, then it's fine. It's absolutely fine. But in reality, with everyone doing their jobs under time and deadline pressure, there are chances they might forget to check something. And even though I've worked in technology all these years, I am still not a security expert. That's why we sometimes go to our security advisors to find out whether a given open-source package is okay and whether we can allow it. And that's why there is one more approval after mine: information security has to approve it.

So again, like I said, it's a balance. If you really need something, there should not be a blunt no; there should be channels to help you. If organizations can do that, I think it really nurtures both mindsets, right? On one side, we make sure we are aligned with all of our security requirements. At the same time, we are also giving data scientists flexibility.
