Sean Nguyen @ S2G Ventures | Data Science Hangout
Transcript
This transcript was generated automatically and may contain errors.
Hey everybody, welcome to the Data Science Hangout and our last Data Science Hangout of 2023. I know people always say this, but I seriously can't believe that we're already here at the end of the year. I want to say a huge thank you to the 47 leaders who have joined us at the Data Science Hangout this year. And those three sessions where we all were the featured leaders. And who knows if this is accurate with Zoom, but the over 2,600 people who have joined us at Data Science Hangout since we've started them and the core group here who keeps coming back and hanging out with us each week. So thank you all so much for making this space what it is.
If this is your first time joining us today, so nice to meet you. I'm Rachel. I lead customer marketing at Posit. This is our open space to chat about data science leadership, questions you're facing, and getting to hear about what's going on in the world of data across different industries. So we're here together every Thursday at the same time, same place. We'll be back on January 11th. We're all dedicated to making this a welcoming environment for everybody.
There's always three ways you can jump in and ask questions or provide your own perspective on certain topics. So you can raise your hand on Zoom and I'll keep my eye out. You can put questions in the Zoom chat and just put a little star next to it if it's something you want me to read out loud instead. And then we also have a Slido link where you can ask questions anonymously.
With that, I'm so excited to be joined by my co-host today, Sean Nguyen, Senior Staff Data Scientist at S2G Ventures. And Sean, I'd love to have you jump in here and introduce yourself and share a little bit about your role, but also something you like to do outside of work too.
Sean's background and role at S2G Ventures
Yeah, so I'm Sean. I'm the first data science hire at S2G Ventures. We're a venture capital firm based in Chicago. We invest in food, oceans, and clean energy companies. We were founded in 2017, and I was brought on in 2021 as the first data science hire to flesh out the capabilities of the organization. My background is in molecular biology. I was really interested in working with data, and then I learned about this thing called data science, and then I got really addicted to learning how to program. And then in terms of things that I like to do outside of work, I really like photography and cooking. So I'm a bit of a person that likes to try out new restaurants in Chicago and cook at home.
Proof of concept and getting buy-in
Yeah. So as the first data science hire within the org, I knew we wanted to get information to our business units. And so what I wanted to do was run analyses and send them to, let's say, my manager. The thing is that when I created an R Markdown document at the time, in order to share it with her, I had to basically render the HTML file and send it over. Because this is proprietary information, we can't share it easily, because you don't want it out in the world. So I'd send it to her as an email attachment. And then she would be like, oh, this is great, but I don't want to download something and have to open it. I want to be able to click a URL link.
And so in terms of what it was, it's kind of an analysis of anything that she wanted to know. So let's say, here are the meetings that you have during the week. And that's really insightful information that you can't necessarily get from, let's say, a dashboard, because it's super specific to them. So the idea of the POCs is to give that nugget of, oh, wow, this is really useful to me. And then what that does is it gets that buy-in, because oftentimes when you're trying to, let's say, get a new service or a new product, some people will look at the sticker price and it's like, oh, wow, there's no way I'm going to pay that. We already have Tableau. We already have Power BI, whatever. Why do I want to get another, you know, data science product I've never heard of? So in order to get the buy-in, you give them that, oh, wow, the art of the possible, right? To be like, oh, you know, here's this cool thing I was able to do. And then they're like, oh, my gosh, I really, really like that. Tell me more.
Communicating insights proactively
Yeah, so I think it's the idea of being able to proactively find information that might be relevant to people, because to your point, I can create 10 dashboards and then just check the dashboards. Other people don't have time to do that. And so sometimes you can be like, oh, it looks like you're meeting with, you know, Mary from whatever company. Did you know that X, Y, Z thing happened? And then they can go armed with this knowledge whenever they meet with this individual, and they don't have to check a dashboard. So you find whatever information, whether it be from a different data source, some analysis or report, and then you're able to send them an email or some type of alert dynamically. So I use the blastula package with R to send emails programmatically to people. And that's how I was able to get them to really latch onto that, because to your point, people don't necessarily want to have to check something. It'd be nice if it was delivered to you in your inbox and you can be like, oh, wow, this is really helpful. And then you go on your way.
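As a rough illustration of the "deliver it to the inbox" idea, here is a minimal Python sketch that builds a pre-meeting alert email. It is not Sean's code: he describes using the blastula package in R, and this uses Python's standard `email` library instead. The function name, the addresses, and the example insight are all hypothetical.

```python
from email.message import EmailMessage


def build_meeting_alert(recipient, contact, company, insight,
                        sender="data-team@example.com"):
    """Build (but do not send) a short pre-meeting alert email.

    All names and addresses here are placeholders for illustration.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Before your meeting with {contact} ({company})"
    msg.set_content(
        f"Heads up: {insight}\n\n"
        "This note was generated automatically from our data sources."
    )
    return msg


alert = build_meeting_alert(
    recipient="manager@example.com",
    contact="Mary",
    company="Example Co",
    insight="Example Co announced a new funding round this week.",
)
print(alert["Subject"])

# Sending is left out on purpose. With an SMTP server available, it could
# look like: smtplib.SMTP("smtp.example.com").send_message(alert)
```

In practice a script like this would run on a schedule, so the recipient never has to open a dashboard to get the information.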
I love Mike's suggestion in the chat here to get everybody going because we love the conversation that happens in the chat. But share your favorite holiday film in there.
The S2G wrapped and Quarto presentations
One thing I did recently, I think on Tuesday. So I decided last year to make a Quarto reveal.js document. So a slide deck that we call S2G Wrapped, like Spotify Wrapped, where I summarize all the things that we've done in the organization. And it's more than just a PowerPoint deck because it actually embeds all the different widgets and dashboards. So I use gt with its interactivity. I used all these different charts from highcharter and all that, and it allows people to interact with stuff and see everything as a summary. And it's one of those things where I got so much good feedback. Oh, my gosh, this is amazing. And then it allows folks to see all the work that we've done.
And so that's like one thing that I kind of created. And I would encourage other people to do that is to use like Quarto. You can create an HTML file, a PDF, or you use like the reveal JS presentation or you can render it as a PowerPoint as well. But it allows you to go from just routine analysis to something that's really amazing that you can share with people in your community.
Selling data science work
Yeah, I mean, I don't have experience in sales, but I think to your point, I think it's about trying to find the way that you can provide value. So it's to find value that the person would be interested in. And so then you're trying to make their life a little bit easier. Whether it be selling something or you're trying to give them information, in my case, information that allows them to act and, you know, kind of do their job a little bit better without friction.
Absolutely. Yeah, for sure. I've noticed that firsthand, right? Where, yeah, you can have the fanciest models or anything like that, but unless you can surface it to them and something that's meaningful to them, it's exactly that point, right? I think it was David Robinson, right? If it lives on your laptop, it's essentially useless, right? You need to ship it or send out something so that people can actually see it or appreciate it, right?
Defining success as the first data science hire
So, I think my role has evolved over time at S2G, right? And at first, it's just that we were trying to, we had all this information, right? Because like at the end of the day, technically, for the bottom line for S2G is like, being able to invest in innovative companies, and then hopefully being able to generate market returns for the fund, right? And so, it was, for us, it was clear as like, hey, are we driving value for our portfolio companies? Because if they're successful, then in the end, we'll be successful.
And so, it was clear from there, like, if I'm able to identify investors or companies that are relevant to us, then it will help the bottom line of the organization. So if I'm able to create different models that allow us to, you know, find a company that's a diamond in the rough, or an investor, or someone that we may not have interacted with otherwise, right? We're able to find them by doing analyses and things like that. And that's how, at least for me, I was successful, right?
One way to do that is to be able to kind of interview the different stakeholders, not necessarily your manager per se, right? You get guidance from them, but then you interview stakeholders, identify their pain points, right? It's like, oh, I, every time, you know, I'm scrambling right before my meeting to find the talking points to whoever I'm meeting with, right? So, then I was like, okay, let's take that, take the different data sources using the contextual knowledge of, like, the different data sources that we have. So, for us, we use Salesforce, other data sources, right, trying to integrate them, and then, you know, provide a proof of concept or whatever, and then you drive value there.
Working alongside BI tools and ETL pipelines
Yeah, so what I've done recently. We had some BI tools before, but what we didn't have was, because sometimes those tools require, essentially, more or less modeled data layers, where it's pretty clear that when you're querying from this, you feel confident in the data that you're pulling, in terms of the rows and things like that. So what I had to do is create ETL pipelines. That's getting data from our raw data sources, transforming it, and modeling it so that it's actually usable, and then that actually feeds our BI tools downstream, right. And so, at least for me, we use Posit Connect to run scripts that take the data, do the thing, and then send it back to our data warehouse, right, and then from there, our BI analysts can take the data and do whatever they need to do, and feel confident about it.
So it's not like we're trying to replace one another, we kind of live in harmony, right. I'm able to take the data, clean it, and then run my own analysis, but then other people, like power users, can use it to run analyses themselves and not have to worry about whether it's correct or not, so yeah.
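The extract, transform, and load-back pattern described here can be sketched in a few lines of Python. This is only a toy under stated assumptions: an in-memory SQLite database stands in for the data warehouse, and the table names (`raw_deals`, `clean_deals`) and columns are invented for the example rather than taken from S2G's setup.

```python
import sqlite3


def run_pipeline(conn):
    """Read raw rows, clean and aggregate them, and write a modeled table."""
    # Extract: pull everything from the (hypothetical) raw table.
    rows = conn.execute("SELECT name, amount_usd FROM raw_deals").fetchall()

    # Transform: drop incomplete rows, normalize names, sum per company.
    totals = {}
    for name, amount in rows:
        if name is None or amount is None:
            continue
        company = name.strip().title()
        totals[company] = totals.get(company, 0.0) + float(amount)

    # Load: rebuild the clean table that downstream BI tools would query.
    conn.execute("DROP TABLE IF EXISTS clean_deals")
    conn.execute(
        "CREATE TABLE clean_deals (company TEXT PRIMARY KEY, total_usd REAL)"
    )
    conn.executemany(
        "INSERT INTO clean_deals VALUES (?, ?)", sorted(totals.items())
    )
    conn.commit()
    return len(totals)


# Stand-in "warehouse" with some messy raw data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_deals (name TEXT, amount_usd TEXT)")
conn.executemany(
    "INSERT INTO raw_deals VALUES (?, ?)",
    [(" acme foods ", "100"), ("Acme Foods", "50.5"),
     (None, "10"), ("blue ocean", None)],
)

n_companies = run_pipeline(conn)
clean_rows = conn.execute(
    "SELECT company, total_usd FROM clean_deals ORDER BY company"
).fetchall()
print(n_companies, clean_rows)
```

A scheduler would rerun a script like this on a regular cadence, so analysts always query the clean table instead of the raw one.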
Productionalizing models and deploying securely
Yeah, so I would say it's, it was a struggle at the beginning, right, like there's no doubt about it, of getting all the, A, the data, getting the model, and then I would say maybe spent like, I don't know, 75 percent of the time getting the value, in the sense that like getting the nugget of information, but it was in like a, a Colab notebook, right, so I had this machine learning pipeline, and it was like essentially a white glove service, right, of like, Sean, give me the top, you know, 100 investors for this company, or whatever, right. And so I would do the analysis, and run that stuff, but then they were like, oh, you know what, I want someone that's going to write a 10 million dollar check, so then I have to go in and change it, change it, change it. So then what I ended up doing was like, okay, this is not scalable, we need to productionalize this to a POC, like a Streamlit application, right, so it's like a Python app, web app, they can kind of do whatever they want, change the parameters, do it, it's self-service.
But I didn't, I couldn't do that until I had like the foundation of like the model, so to speak, so it's like a ton of time went into that. So then I was like, okay, let's create a web application, and then that was a beast in and of itself, but then once that's up and running, you're kind of, you know, smooth sailing from there. So it's like 75 percent, and then the other 25 percent for me was creating a web application, or you know, some type of widget thing that the end unit, or the business unit could use to be able to do whatever they want, and I call it productionalizing it, because it's like, it's no longer in a notebook, but it's an actual app, right.
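To make the "self-service" step concrete, here is a small Python sketch of the kind of parameterized function a web app could wrap. It assumes the model has already produced a score per investor. The `Investor` fields, the scoring, and the sample data are hypothetical, and the Streamlit layer itself is left out so the example runs with the standard library alone.

```python
from dataclasses import dataclass


@dataclass
class Investor:
    name: str
    score: float        # hypothetical relevance score from an upstream model
    max_check_usd: int  # hypothetical largest check this investor writes


def top_investors(investors, min_check_usd=0, top_n=100):
    """Return the highest-scoring investors that meet the check-size filter."""
    eligible = [i for i in investors if i.max_check_usd >= min_check_usd]
    return sorted(eligible, key=lambda i: i.score, reverse=True)[:top_n]


candidates = [
    Investor("Fund A", score=0.91, max_check_usd=2_000_000),
    Investor("Fund B", score=0.85, max_check_usd=15_000_000),
    Investor("Fund C", score=0.60, max_check_usd=25_000_000),
]

# The "I want someone who writes a 10 million dollar check" request
# becomes a parameter change instead of a code change.
big_checks = top_investors(candidates, min_check_usd=10_000_000, top_n=2)
print([i.name for i in big_checks])
```

In an app, `min_check_usd` and `top_n` would be bound to input widgets, so the business user adjusts them directly rather than asking for a rerun.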
And then another caveat is the whole deploying piece. When you're in the enterprise, I would love to just deploy it on GitHub, right, and make it open to the world, but because it's proprietary information, you can't. That's another security thing, getting buy-in from your IT people, right, because that's a whole beast if you're in a big organization. And I can talk about when I got Connect, right, because I'm not a Unix or Linux admin, and so I had to deploy it and learn all that stuff as well, right, to be able to deploy it internally, so I can just send a link to an internal Connect server, and then anyone in the organization can access the web application easily.
Small wins and simple apps that make an impact
Yeah. So I think for me, being able to host web applications, whether it be Streamlit, Shiny, or other like Flask applications securely, because that's like a big hurdle within, at least for me, is like I can't share it publicly, right? And so one of the quick wins for me is being able to host, whether it be a Quarto document.
So a good example is, I was helping one of our analysts out. I made a way to do, like, a pivot table essentially, but where they just upload whatever CSV they want and it automatically does it for them. It was a quick thing that I did for them that made their life so much easier, right? And it would take me time to do it by hand each time. I was like, you know, I don't want to repeat myself over and over. So to this day, there's this little pivot table thing that will automatically do whatever they need to do.
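A pivot over an uploaded CSV does not need much code. The sketch below is a guess at the core of such a tool, not the app Sean built: it uses only Python's standard library, and the column names (`quarter`, `metric`, `value`) are made up for the example.

```python
import csv
import io
from collections import defaultdict


def pivot_csv(csv_text, index, columns, values):
    """Sum `values` for every (index, columns) pair found in the CSV text."""
    table = defaultdict(lambda: defaultdict(float))
    for row in csv.DictReader(io.StringIO(csv_text)):
        table[row[index]][row[columns]] += float(row[values])
    # Convert nested defaultdicts to plain dicts for a predictable result.
    return {key: dict(inner) for key, inner in table.items()}


uploaded = """quarter,metric,value
Q1,revenue,100
Q1,revenue,50
Q1,customers,3
Q2,revenue,200
"""

pivoted = pivot_csv(uploaded, index="quarter", columns="metric", values="value")
print(pivoted)
```

A web front end would only need a file-upload control and three dropdowns for `index`, `columns`, and `values`.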
It doesn't have to be complex because I think, you know, even I thought so. I wanted to have the most sophisticated app that has all the little, it's like a Ferrari, right? But you don't need that. You just need something simple, right? Just if it does one thing and it does it well and robustly, people will begin to use it over and over and over, right? And that's kind of, it's so funny. It's like, I was so proud of this app that I made, but it's like the simple thing, like the pivot tables, like people like that, right? And it's like, whatever, it's cool. It did not take me nearly as long, right? But sometimes it's those little small things that it does it well. And that's what, you know, can get people really excited about whatever you're producing.
How do you find out like what those things are that are taking people way longer than it should or like the opportunities to help somebody? Yeah. So I think it's just one of those things where it's like pattern recognition, where someone has come to you like more than three times to do this one thing because you do it well, it's very simple. Whether it be deduping or looking up or aggregating, like for us, we want to aggregate our quarterly, you know, metrics or KPIs. And so I do that a couple of times, like, you know what, maybe we should make this into a little self-service application so that they can kind of do whatever they want. And so it's just kind of another David Robinson thing. Don't do it more than three times, right? Then create a function. But in this case, if you do something, maybe create a Shiny app or whatever app that can help them do whatever they need to do more efficiently.
Working with IT and deploying Posit Connect
Yeah, so are you asking more about how I implemented Posit Connect and Workbench within the org? Yeah, no, so truthfully, I was kind of self-taught, right? And so I didn't know any better, right? So a good example is learning how, okay, I have data that's duplicated in Salesforce, how can I make it so I can dedupe it easily, right? And so then I created some scripts that would dedupe it on a nightly basis, but then to your point, I need to be able to, let's say I have Connect or some type of process to automate it, right? Making sure you have buy-in from your IT folks.
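For a sense of what a nightly dedupe script might do, here is a minimal Python sketch. It assumes records are plain dictionaries, that duplicates share the same name and email once case and whitespace are normalized, and that the most recently modified copy should win. Those rules, and the field names, are assumptions for illustration and not a description of the Salesforce setup discussed here.

```python
def dedupe_records(records, key_fields=("name", "email")):
    """Keep one record per normalized key, preferring the latest modification.

    `last_modified` is assumed to be an ISO 8601 date string, so plain
    string comparison orders the records chronologically.
    """
    latest = {}
    for rec in records:
        key = tuple(str(rec.get(f, "")).strip().lower() for f in key_fields)
        if key not in latest or rec["last_modified"] > latest[key]["last_modified"]:
            latest[key] = rec
    return list(latest.values())


contacts = [
    {"name": "Mary Smith", "email": "mary@example.com", "last_modified": "2023-01-05"},
    {"name": "mary smith ", "email": "MARY@example.com", "last_modified": "2023-06-01"},
    {"name": "Li Chen", "email": "li@example.com", "last_modified": "2023-03-10"},
]

unique_contacts = dedupe_records(contacts)
print(len(unique_contacts))
```

Scheduling it nightly is then a deployment question, which is where the IT buy-in comes in.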
And sometimes with organizations, you are really restricted, right? To be able to even access the database. It's like sometimes you can't even get that, right? And at least for me, I was fortunate enough in that we were kind of building out our cloud infrastructure. And so I had a little bit of a hand in that, but I by no means knew like what was up. I think I talked about in my Connect talk for Posit that I was trying to update the server and I totally borked the image, right? And so luckily I backed it up. And so right now our production image is like the RStudio Connect backup. Like that's what we use in production now, right? Because I didn't know any better, right?
And so it's just, I think it's kind of having that mentality to be able to move quickly and to kind of iterate and then know that you're going to make mistakes. But then I think it's highly dependent on your IT infrastructure, right? And then being communicative, right? So sometimes like their job is to make sure everything's secure and to not have data leakage or anything like that. And so making sure that you're on the same team, right? To be like, hey, look, I'm trying to, we're on the same team, we're trying to provide value to the organization and understand like what are their concerns, right? And then that you're not going rogue because what I've experienced is that that's not fun for anyone, right?
Tackling data silos and understanding business needs
Yeah, so data silos can exist. For me, it was data was siloed in the brains of like the business unit. And so what we've done is do informational interviews where you kind of, you go tag along with them, right, to kind of see like, how do they go about doing their job, right. And so by going to, let's say, the business development team, or the investment team, and kind of shadowing them, you understand their either their pain points, or like what they can, what you can do as a data scientist to kind of help make their job easier.
Okay, but at least for me, all the data is in BigQuery, because we feed everything into the data warehouse. And then we do all the transformations and send it back to a clean layer. But in order for me to know what to extract, because there's so much raw data, right? You almost have to understand what the business is trying to achieve. Because I've done things all the time where I do stuff that I think addresses what they need, but it's not exactly what they want. They may ask for ABC, but they really need DEF, right. But you don't realize that until you talk to them. Even though they say, I want ABC, they really need DEF. And you don't get that until you actually understand things.
So, you know, because I was in grad school, I'm used to working late night hours and just working on it, brute force figuring it out. But what I had to realize when I pivoted to industry is the value of being able to iterate quickly, to be able to show them a janky looking report with some number. It's like, is this right? Is this not right? Is this in the ballpark? And then getting feedback or directionality. Because, you know, I do like the shiny, well-polished application, but you'll get to that point where you've worked on it, and then they're like, this is not what I wanted, right? And so you're much better off showing something that's really rough but somewhat relevant to them, and then iterating again, than taking all the time to make it super beautiful looking when it's not the information that they want, right? So that was one thing I had to unlearn going from graduate school to industry, to make sure you're getting feedback from them.
Managing long-term vision with leadership
Yeah, so I would say, you know, what I've done, right, is I do whatever is asked of me, right? So let's say it's the XYZ KPIs, you do that. But then sometimes I'll work on a side project. And at least for me, I was doing a relational graph network this year for the Wrapped. And I kind of slipped that in as a little thing, like, I think this is pretty cool, they're gonna think it's relevant. And you do everything, you're not going rogue, per se. But you're just adding that little, oh, look at this cool thing, right? And then you try to find the most, not shiny, but more like interesting thing, if you can, and insert it. And then that will pique their interest, right? And then they'll ask you for more. And then you almost kind of get that permission to, oh, that was just something I spun up, you know, last afternoon, right? And then they're like, oh, yeah, let's do that.
I'm not advocating for like, disobeying your manager or anything like that. But like, what I'm saying is, you know, they may not know what's possible, right? And so what we're trying to do is show them like, hey, this might be possible. Would you like me to spend more time on this, right? And you're still do whatever you need to do, right? That's the beauty of having code is you're able to use something and recycle it and improve it. But then at the same time, you can kind of work on other things and maybe slip that in and then get them to to buy in.
Yeah, I think it's one of those things where you try to have that open conversation with management, where you're asking, oh, what do you think about next year, us creating an S2G chatbot, or using LangChain, LLM type of stuff. And, you know, they may not necessarily understand, so you'd be like, oh, a ChatGPT for the organization. And so then they're like, oh, yes, we would love that. And it's almost kind of like, you know, it depends on your relationship with management as well. Having a conversation would be like, you know, let's just brainstorm. At least my relationship with management is like that, where you can say, hey, what do you think about a ChatGPT for the work? It's going to be a huge effort, don't get me wrong, but you're having a conversation. Is this something worth pursuing, or is it not?
Well, thank you all so much for joining us today and for all the hangouts that you've joined us in in 2023. Thank you so much, Sean, for sharing your experience with us today as well. Thank you for having me.
Wishing everybody a great holiday and we'll see you all back in the new year. So we'll be back on January 11th with Marin from TD Bank. And I've been listening to a lot of podcasts lately where they do their credits at the end and I realize I don't do that here. So I do want to say thank you as well to all the people from Posit who make the data science hangout happen. So thank you to Hannah, Tyler, Robert, Curtis and Catherine, who will jump in to do that, as well as our creative team who helps us make sure we get these recordings up and the upcoming speaker list. So thank you to Olivia, Maya, Green and Margaret. But have a great holiday season and a happy new year, everybody.
