Resources

Data Science Hangout | Stephen Bailey, Whatnot | From Academia to Industry

video
Mar 18, 2022
1:08:57

Transcript

This transcript was generated automatically and may contain errors.

Welcome back to the Data Science Hangout, everyone. If you're joining for the first time, it's great to meet you. I'm Rachel, I'm the host of the Data Science Hangout.

As I mentioned at the meetup yesterday, if you were there, I do want to take a moment to say it's nice to be able to share some space with everyone right now. What we do at RStudio is only made possible because of the community. And we're all beneficiaries of so many amazing community members, many of whom are affected by the war in Ukraine right now. So we also want to use this opportunity to support them back in any way that we can.

And for anybody joining for the first time, the Data Science Hangout is an open space for the whole data science community to connect and chat about data science leadership, the questions you're facing, and what's really going on in the world of data science. We really want this to be a space where everybody can participate and we can hear from everyone. There are three ways to ask questions. You can always jump in live; raising your hand on Zoom is probably the best way to do that. You can put questions in the Zoom chat and add a little star if you want me to read them, or I can call on you to bring you into the conversation. And lastly, we also have a Slido link where you can ask questions anonymously.

Just like to reiterate, we love to hear from everyone, no matter your level of experience or area of work too. But for today, I'm so happy to be joined by my co-host, Stephen Bailey. Stephen's a data engineer at Whatnot. And Stephen, I'd love to turn it over to you to introduce yourself and maybe share a bit about the work that you do.

Stephen's background and journey

Yeah, absolutely. Thanks, Rachel. Hey, everybody. It's really my pleasure to be here today. I'll give you a little bit of my story over the last five years or so and where I'm at now.

My data journey started as a PhD student doing biomedical image analysis at Vanderbilt University. My PhD was on this cool mix of MRI images of the brain and cognitive development in children, like educational pedagogy with reading. Essentially, we brought kids in every summer, took brain scans of them, and then looked at how their brains changed, and how interactions between brain areas changed, as they learned to read and became fluent in decoding and reading comprehension. It was really fun. We got to do a lot of data engineering and processing and basically learned everything from the ground up: statistics, the whole data science workflow.

Towards the end of that program, I decided I wanted to go more into industry rather than sticking with academia because I just love the process so much. By the time a project got to the poster session and I was explaining things, I was already tinkering on the next thing. Industry was a great fit. I started at a company called Immuta, which is a data catalog company that's very focused on compliance-related functionality. For example, if you load data into your data warehouse and it has PII on it, a lot of times you want to protect that data or apply permissions and policies on it so that only certain people can see it.

Immuta automated that. I got to build out the data team there and learn a lot about data management and metadata management. Just recently this year, I went from a director position there, managing a team of about four or five people, to an individual contributor position at a company called Whatnot, which is kind of like QVC meets eBay. People can hop on the app and sell stuff, especially collectibles like trading cards, sports cards, vinyl records, and vintage clothing, and they can build a following. It's a very interesting mix of social networking, real-time auctions, and live stream data. I'm running the data platform there, or at least managing the engineering aspect of it. It's been very much a drinking-from-the-fire-hose experience, but it's been really awesome to start learning.

Moving from management back to individual contributor

I know on these chats we talk a lot about what it's like to move into leadership or into a first data science management role, but what has it been like to go the other way, back to being an individual contributor?

Yeah, that's a great question. I don't know what the real percentage of data professionals who go into management and then come back to being individual contributors is, but anecdotally I know a lot of people make that switch, and I think for two reasons. One is that a lot of us get into data and move very organically into positions of influence within the organization, simply because we love answering questions, getting close to the business problem, and trying to use data to improve it. For me at Immuta, it happened naturally: we were growing, we needed more people to focus on the data work, and I was well situated to lead that team. It wasn't an "I want to go be a manager" type of experience. It was very organic.

As I spent more time in that role, I learned a ton, but I also missed the data side, because being an individual contributor right now is just so fun. You get to play with so many cool tools, and the opportunities and possibilities for building different data products are endless. It's getting easier every month. So that's really what motivated me to move back into a data engineering role: the opportunity to build bigger, cooler, more interesting systems than I had before.

Business problems at Whatnot

Cool. To put us all in the mindset of the work that you do at Whatnot, I'm curious: what are some of the business problems that you're helping solve?

Yeah, that's a great question. I think Whatnot is really exciting because it's a peer-to-peer marketplace. It's kind of like Uber, where you've got a driver and a passenger and the app is really facilitating a transaction between them. The same is true for us: you've got a seller with Pokemon cards, you've got a buyer who wants to buy Pokemon cards, and we have to put them together at the same time, in real time, so a transaction can take place. So what's really exciting at Whatnot is the focus on real-time analytics and real-time data.

I very much come from a world of big batch processing, like with medical imaging. And even with a lot of business data, it doesn't need to be that fast, because as long as the report's there in the morning, everyone can do their jobs. But at Whatnot, because things move so fast, the ability to implement real-time systems is actually very important. If someone's in an auction selling something valued at $5,000, a lot of people want to go see that, and the auction is going to be over in 30 to 60 seconds. So the speed at which things are moving is a really challenging technical problem, but there's so much opportunity for building interesting insights and systems off of it.

Cool tools: real-time databases and dbt

So, in the same vein as the real-time analytics conversation, there's a whole host of new databases becoming popular that let you build streaming pipelines much more easily than you normally would. This whole space is fairly new to me, but in the past, if you wanted to stream data from an application and then consume it for an analytics application, you would have to set up a whole Kafka event bus and a very carefully tuned streaming system, with tools like ksqlDB for analytics on top. But now tools like Rockset, Materialize, and TimescaleDB are trying to make it much easier to just write SQL on top of these systems. Once you set up the pipe from your event source, whether that's clickstream data or whatever, you can just write SQL queries on it and then expose those to applications.
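To make that concrete, here's a minimal sketch in plain Python (not tied to Rockset, Materialize, or any specific engine) of the kind of tumbling-window aggregation that a streaming SQL `GROUP BY`-over-a-time-window query computes on clickstream events:

```python
from collections import Counter

def tumbling_window_counts(events, window_seconds=60):
    """Count events per (window_start, event_type) bucket.

    `events` is an iterable of (timestamp_seconds, event_type) pairs,
    assumed to arrive roughly in time order. A streaming SQL engine
    expresses the same thing as a COUNT(*) grouped by a time window.
    """
    counts = Counter()
    for ts, event_type in events:
        # Bucket each event into the window it falls in.
        window_start = int(ts // window_seconds) * window_seconds
        counts[(window_start, event_type)] += 1
    return counts

# Hypothetical auction clickstream: three clicks and one bid in the
# first minute, one click in the second minute.
events = [(3, "click"), (15, "click"), (42, "bid"), (59, "click"), (61, "click")]
print(tumbling_window_counts(events))
```

A real streaming engine does this incrementally and keeps results fresh as events arrive; the batching-by-window logic is the same idea.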

Basically, you have a much lighter-weight way to implement recommendation systems or ranking systems, or even payment analysis systems. Fraud detection, for example, is a big issue that we're trying to focus on, and I think the data team has a big opportunity to help make that easier. The other tool that I just love working with is dbt, which is a data management tool. They have some new functionality out around what's called a metrics layer, which is basically a way to simplify the creation of metrics and govern them within the database logic.

Data engineering vs. data science

Yeah, hey there. Thanks for doing this. I work at RStudio. I was curious if you could clarify the difference between a data engineer and a data scientist in your eyes, because I see that your title is currently data engineer, but previously it was in data and analytics.

Yeah, I feel like I've been all over the map from a title perspective. My training in the PhD program was very much about becoming an independent investigator. That means you're designing the experiment, collecting the data, processing the data, storing it somewhere for reuse, analyzing it, creating the images, and then also presenting it. You get the full life cycle.

I would say the data engineering side of things I think of more as designing data models, ingesting data from different sources across the company, making sure those pipelines run efficiently and reliably, and really building trust in the raw data and the lightly processed data. The data scientist, by contrast, I think of as much more pointed toward a problem area. So the engineer I think of as a systems thinker: we're bringing in data and we want to make it reusable for a large number of applications. Whereas a data scientist is very much: I need to solve a problem, I need to create a specific application or a specific model that's going to be deployed to make decisions.

Learning dbt and finding meaning in your career

So my background was as a scientist, right? You run an experiment and take it through to the conclusion: what did we learn from this? It's very project-based. When I moved to Immuta, I started as a data scientist and I had those projects. When I started building out the internal team and the data platform, it was a very different kind of task, because you're building out an organizational capability, which is the data platform. It's not a project; it becomes a system that you're building. And I was not equipped at all to understand how to build a system that would be reusable for the company over time.

What dbt does, its whole shtick, is basically decoupling the pulling in of data from the using of data. dbt basically says: all right, you've got data in your database and you want to build a dashboard over here, so put all of the logic in SQL and put it in your database. It makes that very easy to do, and the way it does it is very nicely managed for you. It makes it easy for even someone who's very new to database management to do it the right way, right off the bat.

So first of all, it let me build a system that was not terrible, right away. And secondly, it helped me, and the team, improve the system and build conventions over time. Because it's all in version control, in GitHub, as we added more people to the team they could go back and look at what was happening. They could submit pull requests, and we could have conversations over new data models coming in and things like that. So you get the first win just from it helping you do things the right way early on. And the second win is it creates a channel for communication and collaboration that is really hard to create if you're not using something like GitHub.

For me, I get a lot of reward out of doing things where I can see an impact on the way people work, or out of the personal side of collaborating with people. The types of things that I really enjoyed at Immuta were when I could sit down with a business stakeholder and say, hey, what's your problem, and really understand the sorts of tedious tasks they're doing that we could automate. Oftentimes it's pretty easy for a data person to build something that will automatically pull those numbers and sort them in the right way, something that helps a business user take action more effectively.

At Whatnot, I'm doing similar things, where I can build pipes and systems that make it easier for us to understand our customers, or build workflows that are more efficient. But I do miss the healthcare domain, the image processing stuff, thinking about the bigger picture of science and generalization. I think it's offset a little bit, though, because the data world is so deep and the technology is always changing. There are a lot of conversations around what's the best way to do this, or what's the best way to organize the teams, and all of that. So even though I miss some of the healthcare-type learning and thinking very deeply about specific problems, there's more than enough exciting stuff for me to learn and dig into that keeps me satisfied.

Privacy-enhancing technologies and data compliance

So zero-knowledge proof is basically a concept from cybersecurity where you can validate something without actually knowing the underlying answers. I was wondering if there are any applications around personal data that help make working with it easier, because I'm in Europe and GDPR is quite a big thing over here, and there are a lot of concerns around privacy.

Yeah, I can't speak too knowledgeably about that specific question, but I do have some experience implementing privacy-enhancing technologies in organizations. At Immuta, the product would scan a data warehouse for sensitive data and tag it. It would say, all right, it looks like you have social security numbers, addresses, names, and things like that. And then there were a couple of approaches we could automatically apply if you had a policy that wanted to mitigate privacy concerns. The ones we had were masking methods, like hashing, redaction, replacement with a string, rounding, and so on. We had one called k-anonymization, and we had one called differential privacy.
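As an illustration of the masking methods mentioned (a generic sketch, not Immuta's actual implementation; the field names are hypothetical):

```python
import hashlib

def mask_hash(value: str, salt: str = "tenant-salt") -> str:
    """Consistent hashing: the same input always maps to the same token,
    so joins still work, but the raw value is not directly readable."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:12]

def mask_redact(value: str) -> str:
    """Redaction: replace the value entirely with a constant string."""
    return "REDACTED"

def mask_round(value: float, nearest: int = 10) -> int:
    """Rounding: coarsen numeric values (ages, salaries) into buckets."""
    return int(round(value / nearest) * nearest)

row = {"name": "Ada Lovelace", "ssn": "078-05-1120", "age": 36}
masked = {
    "name": mask_redact(row["name"]),
    "ssn": mask_hash(row["ssn"]),
    "age": mask_round(row["age"]),
}
print(masked)
```

The choice between methods is a utility trade-off: hashing preserves joinability, redaction destroys it, and rounding preserves approximate analytics.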

What I'll say is that for a lot of organizations, just implementing the basics at scale is very, very challenging, because you have to have a lot of high-quality metadata to do privacy management well. You have to have a language around what sensitive data is. You have to have metadata on the data itself that says this column is sensitive. You have to have policies in place in your organization, and someone who's actually translating them into actionable policies, like "we need to mask this type of data for these types of users." And you have to have high-quality user data to know who should have access.

So it's extremely challenging to get all of those things right. What we spent a lot of our time doing at Immuta was trying to help organizations build a language that would allow them to address those sorts of questions at scale. All of that to say: what we found, oftentimes, was that people would come in and say, we want to do differential privacy, which is essentially injecting randomization into your data set to provide a level of privacy guarantees. But then they'd always end up falling back to, let's just mask the data and get started using it, rather than trying to do the most private thing first.
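A sketch of the "injecting randomization" idea behind differential privacy, using the standard Laplace mechanism for a counting query. This is a toy illustration of the concept, not a production-ready privacy implementation:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Draw one sample from a Laplace(0, scale) distribution via
    inverse transform sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def dp_count(true_count: int, epsilon: float = 1.0) -> float:
    """Laplace mechanism for a counting query. A count has sensitivity 1
    (one person changes it by at most 1), so the noise scale is
    1/epsilon. Smaller epsilon means more privacy and a noisier answer."""
    return true_count + laplace_noise(1.0 / epsilon)

random.seed(42)
print(dp_count(1000, epsilon=0.5))  # close to 1000, but randomized
```

The privacy/utility tension the transcript describes is visible here: the answer is only approximately right, which is exactly why many teams fall back to plain masking first.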

Transitioning from data scientist to data engineer

I actually wanted to ask you about your transition from being a data scientist to a data engineer. How was the transition itself? What are some of the things from your data science experience that really helped you in your current job? And what were the weaknesses, or the things you had to learn immediately?

So for me, I love the systems-building part of data engineering. I love being able to think big picture about the patterns we're implementing in these systems and how to make them high quality and efficient, like implementing checks. That was true even during my PhD: I often found myself gravitating toward the methods, rerunning pipelines and trying to make them more efficient, even when that probably wasn't the best use of my time.

The challenge that moving into a data engineering role has presented is that you have to know a lot about the technologies, and you have to learn a lot of software engineering patterns that I was never taught. Things like domain-driven design and testing: how do you design a Python package? How do you implement good testing? How do you implement observability and log tracing across multiple systems? Networking concerns. I would say the technology side of things is much more important in the data engineering world than it is on the data science side.

I think one thing I bring to a data engineering role that I wouldn't have if I hadn't spent time as a data scientist is understanding the use cases for data. A good data engineer has a lot of leverage in helping the organization get data not just to the right places at the right time, but in the right way, with the right metadata that makes it useful, like doing some pre-processing to make it really easy for data scientists to work with. So I bring a lot of contextual information about how the data is going to be used, which makes me better at building systems for the users.

Being the first data scientist vs. joining an existing team

That's a great question. I think there are pros and cons, and a lot of self-knowledge comes into play here. What's great about being the first data scientist at a company is that you get to build the system; you're the pioneer charting uncharted territory. That can be very exciting for people. It can also be very overwhelming and lonely, because you don't know if your decisions are going to be right. You don't know if you're going to have to rebuild the system in the future. You might not even really know what you're doing.

When I left Immuta, I knew for sure that I wanted to join an existing data team, because I wanted to learn from others. That's one of the differences: if you're charting your own course, there are great communities and learning materials out there, but you're still ultimately going to be alone to some extent. I wanted to see what other people were doing, how they were doing it, what kinds of dashboards they were building, what kinds of conversations they were having, what opportunities they saw. And it's been very, very rewarding to be in that sort of position at Whatnot. Like anything, it's got trade-offs. I would say, if you want to be the first data hire at a startup, go for it, but make sure you know what you're getting into and are prepared for it.

Working with real-time data

You mentioned earlier that when you started at Whatnot, you went from working with batch data in the past to real-time data, and I'm curious what unique and specific challenges you ran into when navigating that switch.

Yeah, this is very much an active area of work for our company, but what I would say is that the technical side of things matters a lot more. With batch data, you can kind of just shove stuff into an S3 bucket and not think too much about schedules or processing efficiency. If you're trying to deliver real-time analytics to an application, like a mobile application, there's almost no room for error. Whatever your SLA is, it might be five seconds. So after you click a button, that event has to go somewhere, logic has to be applied to it, maybe it has to be joined with other data or a model has to run on it, and then the result has to be sent back to your mobile device. If that takes 20 seconds, as a mobile user, that's an eternity.
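One way to picture that constraint is as a per-stage latency budget against the end-to-end SLA. The stage names and numbers below are invented for illustration; they are not Whatnot's actual pipeline or SLA:

```python
# Hypothetical stage-by-stage latency budget for a 5-second
# end-to-end real-time SLA, as described above.
SLA_SECONDS = 5.0

STAGE_BUDGET = {
    "ingest event": 0.5,          # click reaches the event pipeline
    "apply logic / joins": 1.5,   # enrich the event with other data
    "model scoring": 1.0,         # e.g. ranking or fraud model
    "deliver to device": 0.5,     # push the result back to the app
}

def within_sla(budget: dict, sla: float) -> bool:
    """Check that the stage budgets sum to no more than the SLA."""
    return sum(budget.values()) <= sla

print(sum(STAGE_BUDGET.values()), "seconds budgeted of", SLA_SECONDS, "allowed")
print(within_sla(STAGE_BUDGET, SLA_SECONDS))
```

The point of budgeting this way is that every stage has to be engineered for its slice; a single slow join or model call blows the whole SLA.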

So the stakes are just so much higher, because you're trying to do something that affects the user much more closely and is much more interactive. That's one of the biggest shifts, almost from an emotional standpoint: people are going to be using this, and it's going to affect the user experience. If this thing doesn't work, maybe a show doesn't show up on someone's feed. That means the seller doesn't get featured as much, which changes the seller's perception of their experience on the app.

R vs. Python for data engineering

I see Ian had asked earlier, what's your opinion of R as a data engineering tool versus other languages like Python or C, for example?

I'm not an expert in R. I used it during my PhD for a number of use cases, and I love R Shiny. R is so much more elegant for a lot of data science work than anything in Python. But I made a deliberate switch to learn Python for two reasons. One was that a lot of my work used image processing libraries like OpenCV and SciPy; there were just more libraries out there for image processing work.

And my sense is that there are more out-of-the-box solutions for Python. It's much more of a lingua franca in the data engineering world than R is. Just thinking about what AWS provides: AWS provides a library, boto3, for Python users to interact with resources in AWS. Maybe there's something like boto3 for R, I'm not sure. But I would say that to the extent you have to do things outside of a data management and processing flow, like ingesting data or moving data from a source system into a database, rather than doing the processing itself, my guess is that Python has a few more libraries out there.

The data engineer's role and organizational thinking

Yeah, so I think one of the fundamental things that most data engineers are responsible for at some point is the ingestion pipelines: either ETL, or ELT, which is a newer pattern where you just extract and load, or replicate, the data into your warehouse and then do the transformation in the warehouse or the data lake. You spend a lot of time setting that up. But there are some good tools out there now, I'd say in the last five years, like Fivetran, Stitch, Meltano, and now Airbyte, that have essentially made data replication a commodity.

But one of the areas that I think is newer, something emerging in the data engineering world, is the data engineer as not just a technical person, but as a systems builder and a systems thinker. Some larger organizations have data architects, but the idea is that across the whole company you have information flowing around, and you want to avoid a situation where, as the company grows, you start having all of these data silos, with everyone getting their own sources of truth and creating their own metrics, and none of the metrics agree. The data engineer is really well positioned to think about: how does data move throughout the organization? What conventions do we want to put around publishing data products? What guarantees do we want to make? What kind of language do we want to use? What compliance patterns do we need to implement to make sure our data is being used correctly?

Towards the end: moving from academia to industry

Thanks, Stephen. I see Tatsu had a question as well. He said, as a fellow recovering academic. If you want to jump in, Tatsu.

Yeah, sure thing. Hey, Stephen. Great to see another post-PhD doing well in industry. Actually, it's pretty funny, because my background is very similar to yours. I think you said you were doing imaging while you had kids reading, or something like that. Mine is basically the same, with EEG substituted for the imaging medium, and we had kids exercising. So, a lot of similarities there. I ended up landing in the customer success space. I work at RStudio. And, of course, I have a background in using RStudio, and here I am, right?

But what I've found interesting so far, here and there as you've been answering questions, is that I'm very curious about your perspective on that transition and how easy or hard it is. For me, it was very difficult. I come from psychology, which traditionally only teaches you to become an academic. There isn't a whole lot of help for the student who's considering a career in, say, a data-related role, which actually lends itself really well to the skill set you're taught. But there isn't any course that teaches you how to make that transition, and there aren't a whole lot of networking events that lend themselves well to it.

And at the same time, from the industry side, I think people are starting to recognize that, oh wow, people who come through the PhD pipeline have a lot of what we need. But the issue then becomes, from our perspective, that we don't know how to phrase those skill sets so we can present them in an interview setting or anything like that. So I'd just like to hear your thoughts on that. Is it a failure of the higher education system, or is it something more that industry folks could be doing better to identify?

That's such a great question. I think my brain is scrambled on it a little bit. How do I put this: it's almost like a relationship. When you go through a PhD program, it's a job, but it's more than a job, too. You have a relationship with the field, you're very invested in the research, you're part of the scientific community. And I think one of the things that's very jarring about leaving a field is that you lose that. You feel like you're getting divorced, or something like that; you're severing that relationship and severing that community, because it really is almost two different worlds.

For me, I identified pretty early on that I wanted to move out, and that helped me, because I was able to spend a lot of time talking with people who weren't in the university and building relationships, so that when I graduated, I already knew people on the other side, so to speak. So even though I felt lonely, and I was the first data science hire, and no one knew what a data scientist did, I also felt like I had a pretty good understanding of what doing data in a business looks like.

So if anyone's thinking about transitioning, or doing something similar, I think talking to people and building relationships before that transition can really help make it softer.

I had a very similar experience. When you go through a PhD program, they teach you how to learn, right? We know how to learn how to learn, and we can figure out this whole getting-hired-in-an-industry-role thing, and that's exactly what we do. But yeah, it would have been nicer if you weren't the one who had to actively do all of this networking, if there were just some easier formats for you to plug yourself into.

Yeah, one of the things that kind of surprised me is that I was this biomedical imaging PhD at Vanderbilt in Nashville, and Nashville is a huge healthcare hub, and I couldn't find a job. I looked for quite a while, and I couldn't find a job, and I met skepticism. I had one interview where they were kind of like, you sound great, but why would you want to work here?

And then I had another one that was like, well, we don't really do any imaging stuff. And I was kind of like, it's about the learning, it's about the impact; it's not about the imaging. That's just one example, one subset of problems that I'm interested in. So you do have to go through it and intentionally think about branding yourself. And I think the biggest thing in industry is that you can't get obsessed with the problem. You can't fall in love with a problem; it makes your job of finding a job and fitting in much harder. You have to fall in love with your position in some way, with how you're solving problems, more like applying the mindset of a scientist in a business context.

Yeah, I haven't spoken to a PhD for whom things like this don't resonate, and I've certainly struggled through that myself. Everyone carves their own path. As Tatsu was saying, you learn to learn, you figure it out, and I think the skill sets everyone builds in graduate school, no matter the subject area, are very helpful in addressing that. But my biggest piece of advice would be: don't be afraid to reach out to people. Most people are very willing to talk to you about their journey, no matter where they're at in their career, whether or not they have a PhD. And once you see that and start to talk to people, you realize that your room is much bigger. There are a lot of people from all walks of life who have moved across multiple careers or multiple industries.

What data scientists should do more and less

Yeah, so I would say for data practitioners in general, and this goes for analysts and scientists: when I was managing the team at Immuta, one thing I found myself saying to our data scientists a lot was, how can we simplify? Simplify, simplify. In the business context, and again, we were a smaller company, we weren't building models that had to accommodate a bunch of edge cases. We were building data products: dashboards, scores, and things like that for our business consumers.

I can't overstate the value of having transparent logic for end users. You hear a lot about explainability in the context of neural networks, but the same is true for anything: any function that takes input data and outputs something. The more explainable it is, the more likely it is to be adopted by the business user. So if you can get the same output from logistic regression that you can from some Bayesian model, go with logistic regression.
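The point about transparent logic can be sketched in a few lines. This is a minimal, hypothetical example, not anything from the Immuta team: a linear score whose output can be traced term by term, so a business user can see exactly why an account got the number it did. All field names and weights are made up for illustration.

```python
# Hypothetical "transparent logic": a lead score where every feature's
# contribution is visible, instead of a black-box model output.
WEIGHTS = {
    "logins_last_30d": 0.5,
    "support_tickets": -1.0,
    "seats_purchased": 2.0,
}

def score_account(account: dict) -> tuple:
    """Return a total score plus a per-feature breakdown a user can audit."""
    contributions = {
        feature: weight * account.get(feature, 0)
        for feature, weight in WEIGHTS.items()
    }
    return sum(contributions.values()), contributions

total, breakdown = score_account(
    {"logins_last_30d": 12, "support_tickets": 2, "seats_purchased": 5}
)
print(total)      # 12*0.5 + 2*(-1.0) + 5*2.0 = 14.0
print(breakdown)  # each term is individually explainable
```

The breakdown dictionary is the "explainability" here: anyone questioning the score can reconcile it against the inputs by hand.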

The other thing I would say is to always think with an eye to deployment: what's the simplest way we can do this, so that we can get it into production as quickly as possible? Can we do this as a SQL statement? That was often the decision we had to make on the Immuta team: can this be a SQL statement that produces a table that can be read in, or does it need to be some Python Lambda function that runs on a schedule? That additional complexity compounds as the system grows. So the more you can keep things simple, the better.
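The "can this just be a SQL statement?" question can be made concrete with a tiny sketch. This uses Python's built-in `sqlite3` purely as a stand-in for a warehouse; the table and column names are hypothetical. The whole "model" is one `CREATE TABLE ... AS SELECT` that materializes a table a BI tool could read, with no function code or scheduler-side logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_events (user_id TEXT, amount REAL);
    INSERT INTO raw_events VALUES ('a', 10.0), ('a', 5.0), ('b', 2.5);

    -- The entire deployable artifact: one SQL statement producing a table.
    CREATE TABLE user_spend AS
    SELECT user_id, SUM(amount) AS total_spend
    FROM raw_events
    GROUP BY user_id;
""")
rows = conn.execute(
    "SELECT user_id, total_spend FROM user_spend ORDER BY user_id"
).fetchall()
print(rows)  # [('a', 15.0), ('b', 2.5)]
```

If the logic outgrows a single statement, that is the signal to reach for the scheduled Python job, and to accept the extra operational complexity that comes with it.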

Yeah, that reminds me of a meme about data science being unmasked as actually just if-else underneath. Exactly, yep. And I would say always do the if-else if you can. You can still call yourself a data scientist. I mean, they do it in science too, right? You never want to look at an actual professor's code. Just read the paper, get the insights. Don't worry about what's underneath.

Building trust in data and getting people to feel ownership

So one of the light bulb moments for me, while leading the data team at Immuta and starting to grow the system, was reading a book called Data Management at Scale. It's fairly theoretical, but one of its principles is domain-driven design: as data comes into your system, you don't want to build just a big black box. Again, it comes back to explainability. When data comes in from the marketing team, the sales team, and the customer success team, you don't want to merge it all into one super table and expose that. You want to set up interfaces: here's the lightly processed marketing data, here's the lightly processed sales data, here's the lightly processed customer success data. You want to track the lineage.

What I found is that the more you can do that, the more people feel ownership over it. We did this one thing at Immuta where we exposed Salesforce, which is our customer relationship management platform; it has all of our customer lists and so on. Salesforce data comes into the warehouse, then it gets exposed in our BI tool, Looker. And there was this weird thing where the person who managed Salesforce would look at numbers in Looker and say, I don't know what these are. They didn't feel ownership over the numbers simply because the data was going through our pipes. So we re-architected things to keep them much more domain oriented, and that had the positive effect of making them feel more ownership. They would look at Looker, look at Salesforce, and say, okay, this matches up.
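The "lightly processed, domain-oriented" idea can be sketched the same way. Again using `sqlite3` as a stand-in warehouse with hypothetical names: instead of merging sources into one super table, each source system gets its own thin view that maps one-to-one back to the source, so its owner can reconcile the numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_salesforce (account TEXT, arr REAL, _synced_at TEXT);
    INSERT INTO raw_salesforce VALUES ('acme', 1000, '2022-03-01');

    -- Lightly processed: rename and clean columns, but keep a 1:1 mapping
    -- back to the Salesforce source so its owner can check the lineage.
    CREATE VIEW salesforce__accounts AS
    SELECT account AS account_name, arr AS annual_recurring_revenue
    FROM raw_salesforce;
""")
rows = conn.execute("SELECT * FROM salesforce__accounts").fetchall()
print(rows)
```

The per-domain prefix (`salesforce__`) is one common convention for making the source of each exposed table obvious; the specific naming scheme here is an illustration, not what Immuta used.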

And that sort of building of trust and familiarity with the data is really hard. But I think it's easier if you can keep things separate and lightly processed, with good documentation and clarity. It's a challenge, though; probably one of the biggest challenges in data management.

Thank you so much, Stephen, for jumping on and sharing your experience with us. And thank you all for asking all these amazing questions, too. I just wanted to check, Stephen: what's the best way to get in touch with you if people have follow-up questions or want to connect? Is it LinkedIn?

Yeah, LinkedIn is good. I have a Twitter account, but Twitter kind of scares me and I don't want to get sucked in, so I don't post there too much. LinkedIn is good. I started writing more this year, so feel free to subscribe to that. But yeah, I really appreciate the time, everybody. This is awesome. Thank you so much.

And if you want to continue any of the conversations from today, I'll put the LinkedIn group in the chat, too. Feel free to start your own discussions there; I usually share a few of the helpful links there as well. Thanks all. Have a great rest of the day.