Resources

Alix Schmidt & Amanda Ahrens @ Dow | Creating business value with community | Data Science Hangout

video
May 31, 2023
1:01:19


Transcript

This transcript was generated automatically and may contain errors.

Welcome to the Data Science Hangout. So nice to see you all and hope everybody's having a great week. I'm Rachel Dempsey. I lead our pro community here at Posit. The Data Science Hangout is our space to chat about data science leadership, questions you're facing, getting to hear about what's going on in the world of data across all different industries. And so if this is your first time joining us here today, it happens every Thursday at the same time in the same place. At the Hangout, we're dedicated to making this a welcoming environment for everybody. And so we love when we can hear questions from everyone, no matter your level of experience or area of work.

I'm so excited. We have two featured leaders here joining us today. Amanda Ahrens is ESG Data Architect at Dow and Alix Schmidt, Chemical Engineer and Data Scientist at Dow as well. And I'd love to have you both maybe kick things off by introducing yourself and sharing a little bit about your role and also something you like to do outside of work. Amanda, do you want to go first?

Hi, I'm Amanda Ahrens. My job is in enterprise architecture. So like the title says, I focus on environmental, social, and governance data architecture. So really all things sustainability, climate, circularity, and safer materials for Dow. I wanted to take the role because as a data scientist, I was kind of frustrated that I didn't necessarily have data access in a streamlined way, and so deployment of models was challenging. I wanted to be part of that solution. And I've always wanted to get more into the sustainability realm, so it was a good opportunity. Outside of work, I really enjoy curling. Even though I moved to Texas, I still keep up with curling and get to find others from Canada and the Midwest who transplanted to Texas.

Alix, do you want to introduce yourself? Yeah, my name is Alix Schmidt. I am originally a chemical engineer, as Rachel said, and got into data science with a master's degree kind of halfway through my career at this point. I've worked mostly in R&D at Dow, so making new products and understanding materials, and a little bit in manufacturing, where we can use our operations data to troubleshoot problems. Right now I am in Dow's central R&D group for really deep technical expertise in information research. I'm mostly working on our MLOps and model deployment strategy. Of course there's a technical aspect to that, but a lot of what I work on is really around change management: upskilling data scientists so that they can follow our governance standards, and getting team leaders and project sponsors educated so we are all on the same page about what it takes to deploy a model. That's not something I ever expected to be doing, but it's fun, and I get to work with a whole lot of different people all the time.

Outside of work, I am a potter. So I do pottery. And I say that because I'll be drinking coffee, but I only use handmade mugs now to support local artists. So this is a local artist in Midland, Michigan named Anna Safranski.

The Insight Scientist Community at Dow

Something I have been really excited to ask you all about is about the community building that you all have done at Dow and about data competitions and conferences. And I just posted on LinkedIn earlier this week that a lot of people have recently been asking me about how to get started with their own internal data conference. Amanda, maybe you can kick off with the sort of background of the community and the journey that we've been on.

Yeah, that sounds good. We have an internal data science community that really laid the foundation for us to even be able to approach a data science conference, and we did successfully have a full data science conference last year. We call it the Insight Scientist Community, and it was established in 2015. I had the privilege of helping to launch it as one of the managers for the community at that point in time. Since then, it has grown to over 500 members inside Dow. Of those, about 100 are full-time practitioners, and the rest are interested parties: citizen data scientists and people getting into data science learning. We have a good breakdown: about 35% from R&D, another third from operations, a good chunk from IT and supply chain, and the rest scattered across the other functions and businesses within the company.

Throughout the years, we started off smaller with poster sessions, and we've kept up these monthly technical meetings where people can showcase what they've been working on. Then we were able to do our first data science challenge in 2017, another one in 2019, and our last one as part of the full data science conference in 2022. So it's been quite a journey so far.

Executive sponsorship and the hackathon

I remember when we were talking a little bit about the competition, you mentioned that you had some executives who were part of that as well and they were helping judge some of the submissions. And I was just curious, like, how did you go about, like, bringing this community up to the executive level as well?

Yeah, we're very lucky that we've had really strong executive sponsorship, even from the beginning. When Amanda was there at the founding of the community, the sponsor, the manager that really said, yes, I'm going to put my people's time into starting to do this, reported directly to our CIO, or maybe two levels down. So it was a very close connection to executive leadership, even from the beginning.

And it was actually that executive leader, CIO Melanie Kalmar, who I think proposed the first event that ended up being the data science challenge. This was the first thing where the executive leaders really said, how can we use this community to start creating value for the company? And so they suggested, why don't you go figure out how to do a hackathon? Okay, that word can mean a lot of things to a lot of people. But after really talking about it, we settled on this sort of Shark Tank-style thing, like the TV show, where groups come up with a project idea and pitch it to an executive panel.

How can we use this community to start creating value for the company?

Amanda, I don't know if you were involved in the step from getting that one executive sponsor to getting a panel of eight executives in the first challenge?

Not so much in the first one. But in other areas, I feel like what really works is having the executive that's already on board sell it to the rest. If you can go along with that executive and kind of explain the vision of why you want to do a data science challenge or a data science conference, and what's in it for them, it was pretty easy to get them to say yes. They felt like it was an opportunity that the other executive was giving them to really help set data science direction for the company. Because in all three data science challenges, winning projects were implemented in some way. Even if they weren't funded directly initially, there was such high interest by the executives in moving forward with the ideas.

Yeah, and the number one project from last year actually just won an Edison Award, which is a really big research and development award in STEM. So that's external recognition, and that project actually brought three sort of very small projects together, from our toxicology department, our molecular discovery research area, and our supply chain area, to make a really powerful tool for making safer and more sustainable products. I think that one is such an awesome example of exactly the vision of the event, and to get that external recognition was really cool.

Choosing themes and cross-functional participation

I think a huge part of being able to do that is making sure that you set a theme that resonates with the executives, but also with the mid-level managers, because they have to be supportive of their employees actually participating in the data science challenge. Because ours was Shark Tank style, it wasn't just a day or two where people were working on a problem with the same data set. It was actually over a longer period of time; from beginning to end, the challenge ran around two months.

The first theme was more general. It was improving EBITDA. The second theme was around employee experience and customer experience, which was extremely top of mind, a hot topic at that time. And the last one that we did in 2022 was all around sustainability: climate, especially decarbonization, water, circularity, and safer materials. That really resonated with everybody. Everybody at Dow is trying to prove that they're contributing to that, and it united a lot of people to come up with some great ideas cross-functionally.

Actually, the latest theme was brought to me by one of the technical principals at the company, who just thought it would be a really uniting theme. We pitched it to the executives that we knew were already on board and going to help us, and they were like, yeah, absolutely, let's go forward with it. Dow seeks to be the most sustainable materials science company in the world, so that's a very front-and-center theme for the company, not just for the data science challenge. Finding themes that are enterprise level, for the whole company, is really important, because one of the things we were thinking about is, okay, let's make sure the theme doesn't exclude anyone whose job just doesn't touch that area.

Upskilling and model deployment education

Libby, you had asked a question a little bit earlier. Do you want to jump in here and ask that? Yeah, it kind of falls in line with Catherine's question, too, because Catherine was asking about that same area with upskilling for your tech stack. She was asking about the curriculum and how you tackle the different use cases and skill levels that people have. And then my question was on the side of getting leaders and ELT people on the same page with understanding what it actually takes to deploy a model.

Yeah. So I think the different parts of Dow have been on a journey with this, especially as we start using more of the cloud-based tools. We're a Microsoft Azure company and we're trying to use those things. That was pretty much just a slog. We just had to get through the first couple of projects, and bless those people, because it was very painful. But they documented very well. They documented the internal work process of how to make the external technology work for us.

My role is really specifically in R&D. And in R&D, we have quite a different situation because it's research. You never know what's going to happen. So you charter a project and it's very difficult to estimate the next things that you need because it could just totally fail and whatever. And then you're working with all these people from information systems who have project managers and they're used to this very formal scrum scheduling and you know exactly what you want to implement. And those are just like opposite things.

The situation that we had was R&D managers sponsoring projects with a sort of researchy amount of funding and then doing the research part. And that's good. But then they expected a deployment out of the end, because that's what you see when you hear about AI and machine learning deployments in the world. So that was kind of the same story: the first few projects were just really painful. We had to come back to the sponsors and say, I'm sorry, this is going to take like eight months longer than you thought it would.

So now that we're on sort of the other side of that first batch of really painful projects, what I am trying to do is like really capture what we learned and work with those people who now understand, they're the ones who went through the learning journey, and make sure we can replicate that. So it's things like literally, I'm going to be preparing a sort of mini course called Model Deployment 101. It's just like literally what is a piece of software, what is a software architecture? How do you know what you need?

And in my case, we have a very powerful tool, which is using analogies to chemical manufacturing. Everybody in R&D knows you make a little flask of your chemical in the lab. You're like, yeah, I got a new product, but taking it from that much on the lab bench to a 10,000-gallon reactor, that process is called scale-up. They know that it's hard and it doesn't always work, and it requires all these specific types of engineers to make it happen. So we're trying to say, this is the exact same thing: just because you've done it in your development environment and this model works really well, there's still this whole process to get from that to a production software solution. And you can put up the architecture diagram and be like, look, these are like pipes. Data has to flow through them. Would you have a bench chemist design your data pipes? Like, no, you wouldn't do that.

These are like pipes. Data has to flow through them. Would you have a bench chemist design your data pipes? Like, no, you wouldn't do that. That's why we have to have this particular skill set on the team.

Training cadence and asynchronous learning

In terms of developing these kinds of continuing education or community learnings for your company, are you doing those live? And do you have somewhere that they live after you've done them where somebody can access it if they get onboarded later?

I think we're trying to make most of them not live just because the need to educate somebody is so irregularly timed, right? So it's, you know, every once in a while, like next week, we're going to have a bunch of interns starting. And that would be a time when I could do maybe one live session and get a bunch of people trained at once. But for like full-time hires, they sort of join randomly. Some of these things are so fundamental that people need to hear them like day one, right? Like, okay, the expectation is that you use Git, like please do that. So I think we're going to go mostly like self-learning asynchronous kind of approach.

One of the things that we've been thinking about is like how do we facilitate these conversations between the technical experts and the resource holders where they basically describe what they need to do and then they get the estimate of how much resource it will take. And that's really the point when they need the education, right? We're really focused on not trying to get everything perfect on the first try and right now and just like getting stuff out there. And then we'll build together based on feedback from some of those early adopters.

Enterprise architecture and R&D agility

Hi. I think my question is about the communication and coordination between the enterprise architecture level and the R&D level. If EA needs longitudinal planning about structure and people and resources, do you end up with tension between being nimble at the R&D level and knowing that you'll be supported?

So I swear we didn't pay Alan to ask this question. The answer to that is actually Posit. In R&D, Posit products such as Workbench, Connect, and Package Manager are actually one of the things that differentiate the R&D strategy from elsewhere in the company, which is fully Azure based and does development in Azure, because Posit is so accessible. That accessibility allows us to have much more participation in data science from all of our researchers. And it's expected that 75%, probably more, maybe 80 or 90%, of things won't ever make it to production, and that's totally fine. Posit is excellent for that.

And then of course we can use Connect to deploy. If something really scales big enough and we have some kind of technical reason we need to move it to Azure, then we can. So we think of it as kind of an accessibility pipeline, right? We just want things to be super easy in the beginning, and then make sure that we have governance in place to recognize where we've crossed some kind of threshold where we do need improved reliability. It is our policy at Dow, because our Posit server is on-prem, that we don't let anybody outside of Dow into it. So if you have an app that you want to release to a customer, we have to do something different.

I'll just add from the rest-of-Dow experience. Dow has digital centers, which are responsible for digital innovation. There's information research embedded in R&D, and there's a digital marketplace center, more on the commercial side, supply chain side, and operations side. I feel like the original plan was that there would be a sister organization that would partner to help support things longer term. We're still working toward that and laying the groundwork. I will say we have some growing pains that we're working through still, but we're certainly making a lot of progress, even over the last five years.

Hiring and domain expertise

So this is the other thing we've run into when hiring. How do you figure out like this is a skill that this position needs to have coming in versus this is something that we can train people up in?

Oh, that's a hard question. I think the only answer that I have for that question that I think we really know is at this point for our materials and process related positions. So in R&D and manufacturing, we have found that it is much easier to hire people with degrees in that subject matter. So typically chemical engineering, and then build up the data science skill from there, rather than hire a computer scientist and teach them enough chemical engineering to have those really powerful conversations that you need to have in data science. So I think that's the only thing we really know for sure is like we want to start from the domain expertise that we'll need for that role and then add on digital skills from there.

A lot of the technology, especially the particular Azure microservices and all that different stuff, is so new that we recognize there's nobody that's got like 10 years of experience in Azure Data Factory, for example. So that gives you a little bit of relief, knowing that everybody's kind of in the same boat, and we're training and upskilling people right now.

Yeah, I think it depends on the different areas of Dow that you're looking at, too. I will also say, ideally you would have both the kind of industry experience and background as well as some exposure to data science or other digital skills. Dow is investing in programs that help provide that, like the intern program Alix already mentioned, with a bunch of interns starting that have more of that background: they're pursuing PhDs in chemistry or chemical engineering or whatever, but they also have some experience in data science.

Also, I am helping to plan a program called DAISY, Data Analytics for Science Immersion Experience, kind of a mouthful, but it's actually targeted at undergrads that are studying STEM, mostly undergrads from historically Black colleges and universities. We are giving them exposure to data science as a partnership with Carnegie Mellon and Accenture. They're going to be at Carnegie Mellon for a week, being exposed to some of the lecture material from a master's in data science program that they have. And then they're actually going to come to Dow and see a poster session where they're exposed to data science projects at Dow, and get exposed to technical principals and data science fellows and hear about their career paths.

I will add one other thing. It's a really good point that we're well connected within our community enough that if somebody gets a resume that would be a great add to Dow, but not their group, we know who to send it to. And I would say that's probably mostly because of the Insight Scientist community. So the purpose of the Insight Scientist community is like to make sure that we don't get too isolated in each of our functions, right? And that we're sharing learnings because data science does cross all domains.

Community structure and sub-chapters

We have technical seminars for the full Insights community, but then we have, what we wanted to do was enable our members to find any type of close-knit community they wanted to. So you are allowed to make a sub-chapter of the community, as long as you have enough interested people to do that. And so we started with local chapters, so we have certain sites. There's like one in Michigan, one in Texas, there's one in the Netherlands, and one in Shanghai. For where there's sort of a high concentration of people, they can have local stuff, and then they have basically full autonomy over what that group does. So the Midland one, we pretty much just have happy hours. Other ones have book clubs or presentations or things like that.

So we really try to, we're not like the end-all be-all, right, of community planning. We try to create a platform where people can use our distribution list if they want to get a hold of everyone. They can create another channel in the Microsoft Teams if they want to make a subchapter. Yeah, just be a kind of a platform for folks to create whatever kinds of groups they need to create.

Rebuilding trust after missed expectations

There was an anonymous question a little bit earlier that was, how do you regain trust and rebuild reputations after failing to meet expectations, whether time and budget, if you have experience with this, and especially when dealing with a difficult sponsor?

Yeah, that's a really, really good question. I'll say we definitely understand exactly what you're asking. Mostly it's sort of organizationally, we do have such senior leadership that's saying like, hey, this stuff is really important. We have to start learning how to do it. So we do have kind of senior leadership behind us in terms of saying, okay, look, we have it out there. This didn't go well, but we need to try again. Like we have to try again. We don't have a choice.

Personally, for just me and how I run my projects, I just am transparent, right? I can just say, yeah, that was awful. It was not fun for anybody. If you cover it up, they're not going to trust you. I think transparency and authenticity are the only way to build trust.

I kind of explain it this way: for typical IT projects, the only definition of success is delivery of the product that you were supposed to go out and deliver. Whereas in R&D, that's not necessarily the case. If you kill a project quickly, that can still be seen as a success. So I've been trying to use that analogy: hey, we are really genuinely trying to prove a concept here. If we execute this project and find that the approach we were hoping would work is absolutely not going to work, that is actually still a success. And it's about making sure that the sponsors are clear on what that definition of success is from the beginning.

What would you do differently?

If you were to start this process of training and onboarding new data scientists over again, is there anything that you would do differently, like having the experience that you have now?

I don't know that there is. I think it's really hard to separate what our specific people have done, and where, like what a data scientist's job looks like now versus five years ago. Like if I knew we were going to be here five years ago, I could have said, oh yeah, like we would have done a lot more software development, best practice training, and coding skills, and some of that other stuff. But I just think things have changed, and so we will change.

I like listening to the new people's perspectives because since I've been at Dow for almost eight years now, there's been pretty drastic improvement. I'm not saying we figured everything out. We do have a long way to go still. But I feel like sometimes I'm more rosy about some of the progress, whereas somebody new could be extremely frustrated. Like, why can't I do this? Like, why am I having these access issues? So taking note of what they have to say is really critical because we would like to retain them as well.

Upskilling without application

I have seen this happen with a lot of people and in a lot of organizations. So I'm curious how Dow handles it when you upskill a bunch of employees into better data skills or a better data stack or whatever it is, and then they are not able to actually use those skills in their day-to-day job. I've seen it kind of sow a lot of dissent in people. They're like, okay, you want me to learn all this stuff, but you've given me nowhere to apply all of this.

I mean, I've definitely seen it be an issue at Dow, like when people go through more generic Azure service training, and then we've Dow-ized the environment, so then they can't do things exactly how they learned in the training, so that can be really frustrating. And so from like an Insight Scientist community perspective, we definitely kind of raised those issues to leadership that could actually help us take action. Like we got leadership to help us talk to security about the issues that were preventing people from having a similar experience.

I'm so glad you brought that up. That is another one of the key things that the Insight Scientist community does, is serve as a collective voice for like the technical people on the ground, right? Because you have senior leaders that go talk to tech company executives and hear sales pitches and whatever, and it's like, this is so easy, you know, it'll accelerate your work. And then they bring them in and we do a training, and then we get it in the Dow environment and with the Dow security policies and work processes and all that stuff, and it's nothing like it was in the training, right? And then your senior leaders don't understand why they haven't seen the acceleration that they were told would happen, and it's just a matter of education.

We also have a lot of training we're doing for people who are never going to be data scientists, but they either generate data or they might use data for day-to-day decision making within their jobs. Everybody needs some kind of data literacy at this point, right? What we've done is basically really, really customize the training to Dow and to their role. We don't deliver the training unless it's got examples of, literally, this is what you do. And the gist is, if you can't figure out how to create that content for a role, then maybe you should reconsider whether that training is needed for that role.

Planning the data science conference

Maybe in the final minutes here to tie it back to what we had opened up with about some of the conferences and competitions, how did you actually learn, like, what was needed for the conference or, like, what goes into planning an event like that?

Luckily, because Dow is a huge company, we do have people whose entire job is planning events, so we have lots of admin support. In terms of everything else, I think we've learned a lot from the poster sessions that we've done every year since 2015. That helped us learn how to gather technical submissions, how to set up a SharePoint to receive abstracts, and things like that. This year, people even submitted their proposals via a SharePoint list, then added their abstract and some extra information once they were accepted, and then we used a Power App to make the program for the sessions available on mobile.

And yeah, the other key thing is, you've got to schedule the day around the executives. It's not easy to pick a date, especially considering the event has six to eight weeks of lead-up. You have to have a time when folks can be working on that, so you don't have the August month of vacation for Europe, and you don't have December off for North America, that kind of stuff.

Yeah, it took a year to plan the whole thing. So it's every other year, because it's a lot. A lot of work. Definitely start small and literally keep every single document. Like, if you spent more than like 15 minutes crafting an email, make sure you save that so that it can be reused in the future. Because then eventually, we just naturally grew. We're like, oh, well, what if we do a poster session and some presentations, or a poster session and the data science challenge, and then it turned into a large conference. And don't forget to have fun, right? We scheduled in certain times for fun and just hanging out and catching up with people, especially post-COVID. Yeah, absolutely. And celebrate with mimosas and drinks after.

Definitely start small and literally keep every single document. Like, if you spent more than like 15 minutes crafting an email, make sure you save that so that it can be reused in the future.

Well, thank you both so much for joining us here and answering everybody's questions. Thank you all for the great questions, too. I appreciate it and hope to see everybody back next Thursday, too.