
From Chemical Engineer to Dow Data Science Leader | Michael Hausinger | Data Science Hangout

video
Nov 26, 2024
58:03

Transcript

This transcript was generated automatically and may contain errors.

Welcome back to the Data Science Hangout, everybody. If we haven't had a chance to meet before, I'm Rachel, I lead Customer Marketing at Posit. Posit builds enterprise solutions and open source tools for people who do data science with R and Python.

And I'm joined by my lovely co-host here, Libby. Hey, I am a Community Manager. I work with Posit to help foster our Hangout community here. And I'm also a mentor in Posit Academy, where I help professionals do more with data by learning how to use both R and Python.

We're so happy to have you all joining us today. The Hangout is our open space to hear what's going on in the world of data across all different industries, chat about data science leadership, and connect with others who are facing similar things as you. We get together here every Thursday at the same time, same place, except on holidays.

But if you're watching the recording and want to join us in the future, there will be details below in the YouTube description where you can learn how to add it to your own calendar and join us live.

But you can always reach out to me directly on LinkedIn as well. I love getting to connect with you all.

At the Hangout, we love hearing from you. It doesn't matter what your years of experience are or your title or your industry or what language you use or don't use. We really encourage you to talk in the chat, connect with each other, add your LinkedIn profile, introduce yourself, what you do, what your role is. If you have any roles that you're hiring for or if you're looking for a role, please, please talk about that so that we can connect people to each other.

And as a reminder, this is a community-driven discussion with our featured leader, so if you don't ask questions, we don't have a conversation. There are three ways to ask questions, which is what you're all here to do. You can raise your hand on Zoom, and we'll call on you so you can jump in live. You can also put questions in the Zoom chat; we'll find them and make sure they get asked to Michael. If you put a little asterisk next to a question, we'll ask it for you. If you don't, we'll call on you so you can ask it live.

And if you feel more comfortable asking questions anonymously, Isabella will put a Slido link in the chat. You can ask your questions there, and we'll see them and ask them for you.

With all that, we're so excited to be joined by our featured leader and co-host today, Michael Hausinger, a scientist in research and development at Dow. And Michael, I'd love to have you kick us off with introducing yourself and a little bit about your role today, but also something you like to do for fun.

Michael's background and role at Dow

Thank you. Michael Hausinger. I'm excited to be here today. This will be fun. I sit in our product development organization within the Consumer Solutions business at Dow. Dow as a whole is a huge chemical and materials manufacturing company. The Consumer Solutions business itself mostly makes silicone-containing materials, as well as some other non-silicone home and personal care materials.

The silicone side of things goes into just about any industry you can imagine: additives in shampoos and conditioners, coatings used in airbags, conductive greases, all kinds of things. A lot of times it's a small component that makes everything else work in combination with the other parts of a larger construction or formulation.

In my role in product development, we're trying to make the next generation of interesting products for whatever application we're aiming for. There are people in the organization working on all those different industries. But I've ended up in this unusual role where I'm more focused on the digital space, helping the rest of the organization design their new products a little more efficiently.

So I'm trying to make it easier for people to look up the information they need as they're picking a raw material or picking a formulation to modify. I'm trying to make it easier to run calculations on a test that my coworkers ran on some piece of analytical equipment, where they want to do the exact same calculation every time they run that test. So I get the opportunity to help them do that more efficiently and have the computer do the tedious parts, where otherwise they'd spend a bunch of their time clicking the same buttons over and over again, which is no fun for anyone.

It's been a lot of fun ending up in this unusual role over the last few years. I think it's been very valuable, and I've enjoyed it a huge amount.

From bench science to data science via Posit Academy

So I started off, I'm a chemical engineer by training, and the first 10 years or so that I was working for the company, I was doing fairly standard chemical engineering type things. I was doing lab experiments where we would try to figure out how to improve the quality on some existing product, try to figure out how to make more material or make material faster in our existing assets, try to figure out what was important if we're scaling up a brand new material into larger and larger pieces of equipment and making sure that the quality stays good, all of those sorts of things.

And over all of that time, I would say I did not do anything programming related. I did things in Excel all the time that pushed the boundaries of what Excel is good at, but it was all with more standard, non-programming tools.

In 2021, some of my coworkers here asked if I wanted to participate in Posit Academy. I see at least one of them is on the call already, which is great. I was able to join and start learning some tools. And it was really good timing, because I had some calculations that I knew I was running very inefficiently with the methods I knew.

We were trying to put together a new version of some databases about the properties of ingredients we might use when developing a new product. The way we were calculating all this information would have been extremely tedious if we didn't start using programming tools. So the timing lined up very fortunately.

I had some programming background from many years ago as a college student, and I think just about every engineer gets at least a little introduction to programming. So I knew some of the concepts of what programming can do. But it had just been a class: I took it, I finished it, and then I was done with that language. It was some C++ at the time. Like, okay, I finished that class, I'm never going to see that language again. I saw a couple of other semi-programming tools through other engineering classes, and then I was done until I got the opportunity to start learning R in Academy.

And it immediately clicked. The way Academy is structured, you go through a set of lessons, and for me that was a great setup. The lessons walk you through a lot of the things you're likely to do wrong when you're learning the functions: oh, this doesn't work because you forgot to put a text string in quotes; this doesn't work because you put the comma in the wrong place; can you fix it? That sort of thing. The structure just worked very well.

Part of the structure is also that you very quickly start working on a project. Some of my coworkers at Dow had put together data on molecular weight distributions of polymers, which is a very familiar type of data to me. Not those specific experiments, but I know that type of data, so I was very quickly making graphs that made sense to me.

You go through Academy with a cohort of around half a dozen people who are all working through the lessons at the same time. You're all challenged to recreate something with the target data set and then extend it, going off on a bit of a tangent into something that interests you, whether that's changing the formatting on a plot, making a new calculation in a new column of a data table, or whatever it is. Hearing what everyone else in the group came up with was really helpful for me as well, because it's a bunch of people who do similar work to mine, and everybody was constantly coming up with different ideas for what to do with the data set.

Whatever somebody else came up with was probably something I would have wanted to know how to do a month later; I just hadn't thought of it yet. So that was a very nice structure. That was when I first got into programming and realized it could be very valuable to me on a daily basis, since I do a lot of analysis of experimental work, and I haven't looked back.

I quickly became one of the apprentice mentors and have helped out with a couple of cohorts, and I keep learning all the time from the questions the students ask. They ask great questions about things I haven't gotten to yet, and pretty soon I'm passing that along as a friendly tip when somebody asks, hey, what about this? Oh, I remember that, because so-and-so asked it last time.

Knowing when Excel isn't enough

There was one particular worksheet I built in Excel that did what I wanted, but I knew immediately that I was going way past what Excel was designed for. I was trying to simulate a functional group reacting on a polymer chain, and I had set it up so that each column of the spreadsheet was something like a thousand rows, and each column was one step in the reaction, driven by random numbers.

So I'd have a column with some information in it, pick a random number, and select the spot in that column where the reaction happens. The next column is, okay, one reaction happened, now I'm at two reactions, where's the next one going to occur, and so forth, and I'm combining all this text together. I ended up with this ridiculous spreadsheet that was a few thousand rows long and a few thousand columns wide, with each column one step in the reaction, all to calculate a single number. Like, no, this is bad, this is not the way to do that.

Then, about two weeks into Academy, which was around the same time, I could calculate the same thing in maybe ten lines of R code, and that was a lot more efficient.

The other real warning sign I've gotten with Excel comes from various lookup functions and conditional aggregates like COUNTIFS, AVERAGEIFS, and MAXIFS. Those are really powerful functions in Excel, but they get a little memory intensive. If you use them on a long enough spreadsheet, it might take a few minutes to recalculate, and that's a good sign you're definitely pushing the boundaries of what Excel can handle.
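The R code itself isn't shown, so here is a hypothetical Python sketch of the same idea, not Michael's actual code. It assumes each step reacts one not-yet-reacted site chosen uniformly at random, and, because the transcript doesn't say what the "one number" was, it assumes the quantity of interest is the number of contiguous blocks of reacted sites. The point is that a loop replaces the one-column-per-step spreadsheet.

```python
import random


def simulate_reactions(n_sites, n_steps, seed=None):
    """Return a list of booleans, True where a site has reacted.

    Assumption: each step reacts one still-unreacted site, chosen
    uniformly at random (one step = one spreadsheet column in Excel).
    """
    if n_steps > n_sites:
        raise ValueError("cannot react more sites than exist")
    rng = random.Random(seed)
    reacted = [False] * n_sites
    unreacted = list(range(n_sites))
    for _ in range(n_steps):
        idx = rng.randrange(len(unreacted))
        site = unreacted[idx]
        # Swap-remove keeps each step O(1) instead of shifting the list
        unreacted[idx] = unreacted[-1]
        unreacted.pop()
        reacted[site] = True
    return reacted


def count_blocks(reacted):
    """Count contiguous runs of reacted sites (hypothetical summary number)."""
    blocks = 0
    previous = False
    for is_reacted in reacted:
        if is_reacted and not previous:
            blocks += 1
        previous = is_reacted
    return blocks


# Average the summary number over many independent chains
trials = [count_blocks(simulate_reactions(1000, 250, seed=s)) for s in range(200)]
mean_blocks = sum(trials) / len(trials)
```

Averaging over seeded trials also makes the run reproducible, which the recalculating spreadsheet was not.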
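One likely reason those sheets slow down is that a COUNTIFS or AVERAGEIFS filled down every row rescans its whole range for each row, so the work grows roughly with the square of the row count. In code you can group once and compute everything in a single pass. This is a minimal Python sketch; the column names (`catalyst`, `conversion`) are hypothetical.

```python
from collections import defaultdict


def summarize_by_group(rows, key, value):
    """One-pass equivalent of COUNTIFS / AVERAGEIFS / MAXIFS per group."""
    count = defaultdict(int)
    total = defaultdict(float)
    maximum = {}
    for row in rows:
        group = row[key]
        v = row[value]
        count[group] += 1
        total[group] += v
        if group not in maximum or v > maximum[group]:
            maximum[group] = v
    return {
        group: {
            "count": count[group],
            "mean": total[group] / count[group],
            "max": maximum[group],
        }
        for group in count
    }


# Hypothetical lab results
rows = [
    {"catalyst": "A", "conversion": 0.5},
    {"catalyst": "A", "conversion": 0.7},
    {"catalyst": "B", "conversion": 0.9},
]
summary = summarize_by_group(rows, "catalyst", "conversion")
```

In R, the same pattern is a grouped summary rather than one formula per row.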

What made Posit Academy's format effective

I really liked the format because it quickly gave me enough tools to start making graphs. To me, being able to look at a graph is a much better way to convey information than table manipulation alone. I think in week two of Posit Academy we were already making graphs, and that helped a lot.

Then I could start tinkering with the formatting. My favorite part about Academy as a whole, though, is that it forces you to start exploring things on your own. The lessons have the, well, try this, this thing's broken, try to fix it, all that sort of stuff, which is fairly standard, I think.

But then there's the prompt to work in a group going through those lessons simultaneously, try to recreate something, and then go beyond the initial target: you have to do something else, but it's up to you exactly what that is. And then you talk about what you did and what you found out. That was really valuable to me, and I hadn't encountered it in a lot of the classes I'd taken in the past.

It's not even that there are multiple correct answers; there's an infinite number of correct questions you could try to answer, and that was really valuable from my perspective. It was a format that worked well for me. I can't say it works well for everybody, but it certainly worked well for me and a lot of other people around here.

Career advice: find the intersection of enjoyment and value

So when people ask me about career advice, the first thing I'll say is I don't really know how to advise somebody to end up in a role like mine because I wasn't trying to get into a role like mine and it didn't really exist at the time. But what I did do is that I kept track of things that I enjoy doing. And if you can find a match between things that you enjoy doing and are good at and things that other people find valuable but have no interest in doing, you're probably in a really good spot.

I enjoy a lot of this data manipulation stuff, and I enjoy putting together complicated graphics. There are a lot of people who want absolutely nothing to do with computer programming tools and would find it extremely tedious to put together a database of information, or to work through all the bugs in the formatting of text or a graphic, or to design the table that feeds the graphic, and all that sort of stuff.

But I felt a lot of pull from people who wanted the finished product: a system where they could search through information, make a clean graphic, or do their lab data analysis more efficiently. And I enjoyed the process of making those tools, so it's worked out as a nice fit between those two things.

Collaborating across teams and building prototypes

So yeah, there are a lot of different types of questions you might get asked, with varying degrees of how easy they are to answer and how easy it is to build a Shiny app or whatever it is. One of the things I've learned is that it really helps to sit down with somebody and have them hand you the start of their data. If they can just sketch out, well, this is what I want it to look like, and maybe a little of the math in between, then you can start putting together a first version.

Once I have a first version, I always like to hand the Shiny app to somebody and say, try this, let me watch. The first thing they try to click on tells you a lot about what they actually want to do. If I hand somebody a webpage and it takes them three tries to click on the right thing, okay, I need to label things differently or design things differently.

A lot of times it's not much effort to redesign how something is displayed. The data cleaning, the import, and the math are the hard parts that take more effort. If you just tweak the display so your graph switches axes, or switches from a bar plot to a scatter plot, that makes life a lot easier and gives the person the answer they really wanted. So if you can get a quick prototype out there and let them look at it, that goes a very long way.

Understanding the data before you start

The most helpful thing for me is having a good understanding of what format the underlying data is actually in. If somebody says, well, I want to take all this information and put it in a more structured database, or make a graph out of it, I want to see a representative set of the files everything is sitting in right now, so I can tell: how consistent is the naming of your files? How consistent is the naming of what's going to end up in a column? Is it a bunch of paragraphs of text with pieces of information buried in them? That's going to take a lot of work if we're actually going to turn it into tables.

Or maybe it's all sitting in some consistent analytical output that's a spreadsheet but is very non-tidy, to use the programming terminology. If it always comes out the same way, you can probably do something with it. So just that first glance at what the data looks like is huge.

I'll also say that understanding the value is huge: at a minimum, how much time it takes somebody to do whatever they're doing right now, and how many people would benefit from a new tool. If you know the same tool is going to be useful for a couple dozen people, that's a lot better than designing it specifically for one person. But if it's one person who spends an hour a day doing the exact same data cleanup, it's probably still worth it to speed that process up.
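A quick way to answer "how consistent is the naming of your files?" is to test every filename against the convention people say they use and look at what fails. This Python sketch assumes a hypothetical convention, `SAMPLEID_YYYY-MM-DD_runN.csv`; the pattern and field names are illustrations, not a Dow standard.

```python
import re

# Hypothetical convention, e.g. "PDMS123_2024-11-26_run2.csv"
NAME_PATTERN = re.compile(
    r"^(?P<sample>[A-Za-z0-9]+)_(?P<date>\d{4}-\d{2}-\d{2})_run(?P<run>\d+)\.csv$"
)


def check_names(filenames):
    """Split filenames into parsed metadata and names that break the convention."""
    parsed, rejects = [], []
    for name in filenames:
        match = NAME_PATTERN.match(name)
        if match:
            fields = match.groupdict()
            fields["run"] = int(fields["run"])
            parsed.append(fields)
        else:
            rejects.append(name)
    return parsed, rejects


parsed, rejects = check_names(
    ["PDMS123_2024-11-26_run2.csv", "pdms 123 final.xlsx"]
)
```

The size of `rejects` relative to the whole folder gives a rough, early estimate of how much cleanup the project will need.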
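"Non-tidy but consistent" often means a wide table, for example one row per time point and one column per sample. Because the layout never changes, one small function can reshape it into long (tidy) records, one observation per row, which is roughly what `tidyr::pivot_longer()` does in R. This Python sketch uses a hypothetical column layout and skips empty cells.

```python
import csv
import io


def wide_to_long(csv_text, id_column):
    """Melt a wide table (one column per sample) into long, tidy records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    long_rows = []
    for row in reader:
        for column, cell in row.items():
            # Keep the identifier column out of the melt; skip blank cells
            if column == id_column or cell in ("", None):
                continue
            long_rows.append(
                {id_column: row[id_column], "sample": column, "value": float(cell)}
            )
    return long_rows


# Hypothetical instrument export: time in minutes, one column per sample
export = "time,S1,S2\n0,1.0,2.0\n5,1.5,\n"
tidy = wide_to_long(export, "time")
```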
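The rough math behind "is it worth building?" can be written down directly. This Python sketch assumes a hypothetical 230 working days per year and measures build effort in hours. It compares the hours a tool saves per year with how many working days it takes to pay back the time spent building it.

```python
def annual_hours_saved(people, hours_saved_per_day, working_days=230):
    """Hours saved per year; 230 working days is an assumed default."""
    return people * hours_saved_per_day * working_days


def payback_days(build_hours, people, hours_saved_per_day):
    """Working days until the time saved equals the time spent building."""
    daily_savings = people * hours_saved_per_day
    if daily_savings <= 0:
        raise ValueError("the tool must save some time to pay back")
    return build_hours / daily_savings


# One person saving an hour a day vs. two dozen people saving 15 minutes a day
solo = annual_hours_saved(1, 1.0)
team = annual_hours_saved(24, 0.25)
```

With a hypothetical 40-hour build, the single-person tool pays back in 40 working days and the team tool in under 7. Both are likely worth doing, which matches the point above.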

Visualization training and binary file formats

On the visualization side, one of my favorite Shiny apps I've ever put together was one I used in a visualization training. The concept came from a training I liked running in person, where I could hand out a folder to everyone in the room and say: open this in a minute. I'm going to put a question up on the board about what's in your folder. It's got one sheet of paper with some data on it. Answer this question, say, which catalyst is best at giving high conversion under these circumstances, for the data in your folder, and just raise your hand when you know the answer.

What I didn't tell them is that there were about a dozen different versions. Some people got bar graphs, some got scatter plots, some got a table, and so on. So some people would raise their hand in five seconds because their scatter plot gave them an immediate answer. Other people would still be staring at their text 30 seconds later thinking, why do I feel dumb? Everyone else is raising their hand and I can't answer this question. What am I missing? They just got unlucky with a data format that wasn't suited to that question.

So I put together a Shiny app that does the same thing. It's just a web link. You click on it, and a random number generated in the background decides which of the dozen possible data displays loads when the page opens. That way I can run these visualization trainings online, or just drop a link in the chat, and it looks completely unsuspicious because everybody really is clicking on the same link. That was a lot of fun.
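The mechanism is a single random draw per page load that picks one display from a fixed list. The real app is Shiny, and its code isn't shown; this Python sketch only illustrates the selection logic, and the twelve display names are hypothetical.

```python
import random

# Hypothetical stand-ins for the dozen data displays
DISPLAYS = [
    "bar_chart", "grouped_bar_chart", "scatter_plot", "line_plot",
    "dot_plot", "box_plot", "heatmap", "pie_chart",
    "sorted_table", "unsorted_table", "text_paragraph", "bullet_list",
]


def pick_display(rng=random):
    """Choose which display a visitor sees, uniformly at random."""
    return rng.choice(DISPLAYS)


# Simulate a room full of people opening the same link
rng = random.Random(0)
assignments = [pick_display(rng) for _ in range(2000)]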

On the binary file side, a lot of analytical instruments have their own system for how things get categorized or saved. And there's a lot out there where a particular file format is just a wrapper around some other format. For example, an .xlsx file is actually a zip file: you can rename it to .zip on your computer, unzip it, and find a bunch of XML files inside. It's always fun to find the pages and graphics and so forth inside all these Microsoft file formats.

There are a handful of file formats we use regularly that we wanted to dig into. For a lot of them, somebody has already published a parser online, at least to some extent, and sometimes the format is even publicly documented. But a published parser isn't always complete; maybe the author missed some things when putting it together. So the ability to dig around in there has been useful. It's nice to start with Excel files or CSVs if you have them, but sometimes you can't, and being able to tweak how you read other structured file formats goes a long way.
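The ".xlsx is a zip" trick also works from code, without renaming anything. This Python sketch uses the standard `zipfile` module to list and read the parts of a package. `make_demo_package()` builds a tiny stand-in archive in memory so the example runs without a real spreadsheet. A real workbook typically also contains parts such as `xl/workbook.xml` and `xl/worksheets/sheet1.xml`.

```python
import io
import zipfile


def list_package_parts(path_or_file):
    """List the files inside a zip-based package such as .xlsx or .docx."""
    with zipfile.ZipFile(path_or_file) as zf:
        return sorted(zf.namelist())


def read_part(path_or_file, member):
    """Return one XML part of the package as text."""
    with zipfile.ZipFile(path_or_file) as zf:
        return zf.read(member).decode("utf-8")


def make_demo_package():
    """Build a tiny in-memory stand-in for an .xlsx file, for illustration."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("xl/workbook.xml", "<workbook/>")
    buf.seek(0)
    return buf
```

With a real file, `list_package_parts("results.xlsx")` (a hypothetical filename) shows every part the spreadsheet is made of.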
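For formats that aren't wrappers, "digging around" usually means reading a fixed binary header and then the data it describes. The instrument formats mentioned here aren't specified, so this Python sketch uses an entirely hypothetical layout: 4 magic bytes, a little-endian `uint16` version, a `uint32` point count, a `float64` sampling interval, and then that many `float32` values. It relies on the standard `struct` module.

```python
import struct

# Hypothetical header: magic (4 bytes), version (uint16), n_points (uint32),
# sampling interval (float64), all little-endian with no padding
HEADER = struct.Struct("<4sHId")


def parse_instrument_file(data):
    """Parse the hypothetical header and the float32 readings that follow it."""
    magic, version, n_points, interval = HEADER.unpack_from(data, 0)
    if magic != b"INST":
        raise ValueError(f"unexpected magic bytes: {magic!r}")
    values = struct.unpack_from(f"<{n_points}f", data, HEADER.size)
    return {"version": version, "interval": interval, "values": list(values)}


# Build a sample file in memory with the same hypothetical layout
sample = HEADER.pack(b"INST", 2, 3, 0.5) + struct.pack("<3f", 1.0, 2.5, -4.0)
parsed_file = parse_instrument_file(sample)
```

When a published parser turns out to be incomplete, this is typically where the tweaks go: another header field, a different byte order, or an extra block after the values.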

Internal vs. external tools and navigating the Microsoft ecosystem

It has been much easier for me because so far I'm making things for internal use. We have a sizable organization in charge of our Dow.com site, and everything gets more complicated when it needs to be external facing, because branding is very important to a company like ours: making sure everything has the right formatting and the right color scheme to fit with everything else Dow.

When you're putting together a smaller tool for internal use, you don't have to worry about that to the same extent, which is nice. On the other side, though, we have a number of things we're working on pushing toward external-facing tools. Being able to put together a Shiny app or a Dash app or a Streamlit app and say, this is what I want it to look like, this is the functionality I want, is really valuable to the people working on the true web interface for the company. They can say, okay, maybe I'm going to write this in a different suite of tools, but I can see all the functionality I need to incorporate and that the users find important. I think that speeds up the process and requires fewer iterations of back and forth.

Having a prototype, even if it's an internal prototype, goes a very long way on that sort of communication.

Quantifying the value of learning R

Yeah, quantifying the value of any digital tool has been a major struggle for all of us, because a lot of times the value is in time saved on a task. And then the real question is, would you have even bothered to spend that time on that task if you didn't have the programming tool?

So one of the biggest compliments I can give R is that I've run a number of calculations we just wouldn't have attempted without a programming tool. I can answer questions where we just would have said, we can't do that.

Now, what value can I put on those? That's really hard. And time saved also gets messy when you're communicating to stakeholders. If you say, well, I saved this group of a dozen people two hours a week each, they say, oh, that's great, does that mean we can have one fewer person in our group? Well, that's not really the answer I'm looking for.

If you can save some of the tedious steps, that's a good thing. If you can answer questions you just would not have been able to answer in the past, because you would have given up first, that's even more value. But putting a number on it is not something I've found a good solution to.

Overcoming imposter syndrome and building confidence

Fortunately, I was definitely not the first person learning R at Dow. There were other people, like Tony, and James, who I know is here, and others who are very friendly and open to consulting on things. One of the things I think all of us have learned is that when you have a group of people working in the same space and a new person asks a question, a more experienced person might be able to answer it without necessarily knowing the answer directly. It's like, I'm 90% sure the answer is this. Let's try it. Yep, it worked. That's cool, I had never thought about it that way.

So I've learned a ton from the broader Dow community in this area. But I think I've also been able to share quite a few things with them, because I'm asking slightly different questions than they had asked in the past. Once you realize that more experienced people are very willing to say, I think it's this, but I've never tried it before, that quickly gets you out of the imposter zone, because you know it's not a stupid question. You're doing something new and unusual, and that realization comes very quickly.

The other thing I've found is that there's a huge amount of information online for all these programming languages, where you can ask a question in some form and there's a good chance you'll find an answer. That goes a long way. Convincing people that they're going to get somewhere by Googling it is huge.

With one of the recent Posit Academy groups I was working with, one of my favorite moments was when somebody had a question about making a plot on a log scale instead of a linear scale. I said, I want you to share your screen. We're all here, we've got five minutes left in class. Just type something into your search engine using the terminology you would use, open up a page, and see if it gives you the right function to use. Let's go, let's try it.

She Googled something and found a ggplot function, scale x log or logarithmic or something like that, that did what she wanted. It was right there. I had seen it before, but it was new to everyone else in the group, and the point was, I'm going to make you Google this, you're going to find it, and you're going to get that answer. And the confidence that you can do that, and that even if it breaks, you've only wasted 30 seconds on it crashing once and you can go Google it again and try something else, goes a very long way as well.

It's a lot easier to run tests on code, especially if you're just doing something like modifying the graph, than it is to run a new reaction in lab that may take you hours or days or whatever it is. And the difference in time there is something that takes a lot of scientists some practice to get used to.

What makes Academy successful at Dow

In my experience, Dow has always had a good willingness to let people learn new skills, and a desire for people to learn them. With Academy itself, one of the things we've learned is that before somebody even starts, it goes a long way to get in touch with their management chain and get explicit approval that, yes, we will set aside enough time for so-and-so to have the bandwidth to work on all the lessons in Academy.

The other thing is, the faster you can work with your own data and get some benefit out of it, the better. It's more motivating to you, and it's more motivating to your management. Say three weeks into Academy you can tell them, I just took this Excel file I get every day out of some instrument and turned it into the graph I have to make every time, and it was faster. All of a sudden it's: okay, you set aside this time for me to work on this class, and I'm still going to need that time, but now one of the things I was doing every day, I don't have to do anymore. People quickly start to see the value there, and it's exciting for both the leadership and the students. That's a really valuable milestone when people hit it.

So we've had good support, but being up front with the communication and the approvals is really important, so that people don't get pulled away into other responsibilities, whether that's a different training program or a new job responsibility that pops up.

What's next: broadening awareness of what's possible

I think the next big thing is getting more people aware of what's possible with programming tools. Not necessarily teaching them all the tools, but getting to where people can come in and say, here's this concept I have, I think it might be easy for somebody who knows R or Python, and actually be mostly correct in that assessment even if they don't know the programming themselves. That's a huge step forward.

So we're trying to put out more broadly useful examples and advertise them to the organization: here are a couple of examples of things we've been doing recently, you might like some of these, but let us know if you have other ideas that fall into the same realm or look similar. Come talk to us; it might be really fast to accomplish.

I think that's a big goal for everyone in the data science world: to have the right communication with the non-programmers out there who have a lot of great ideas. The faster you can sort those ideas into the ones that are good for computers and the ones that aren't as computer-friendly, the easier your job becomes.

Well, thank you so much, Michael, for joining us today and sharing your experience. I'm so glad to hear how Academy has gone for you, and I'm just so happy to have so many people from the Dow team here today as well.