Data for good, mentoring, and stellar internships | Sebastien Ouellet | Data Science Hangout
Transcript
This transcript was generated automatically and may contain errors.
Welcome back to the Data Science Hangout, everyone. I am a community manager working with Posit to foster our Hangout community, and I'm also a Posit Academy mentor. I help professionals do more with data by learning how to use R and Python for data science. I am joined by our lovely host, who is the person who got all of this started, and we have to thank for everything. So, Rachel, would you like to introduce yourself?
Hi, everybody. I'm Rachel Dempsey. I lead customer marketing here at Posit, and I'm going to be hanging out behind the scenes as a co-host today. I'm joined by my other Posit colleagues who are helping gather the questions there in the chat, but nice to see you all.
Thank you. Yes, thank you to Curtis, and to Isabella, and to everybody behind the scenes at Posit who is helping make this happen. One thing that I didn't realize when I joined, Rachel, is how much work it is to do Hangout stuff behind the scenes, so huge kudos to everybody.
We are so happy to have you joining us today. The Hangout is our open space. We want to hear what's going on across the world of data in different industries. We want to chat about data science leadership. We want to connect with other people who are facing the same things as us, so we get together every Thursday, same time, same place, almost every Thursday. Obviously, not on Thanksgiving, and every once in a while we have something like PositConf that interrupts things, but most Thursdays we're here. If you are watching this recording on YouTube and you want to join us in the future, go down in the description. There's going to be details on how you can add it to your calendar, and then you can join us live on Thursdays.
I want to say thank you to everybody who's made this the friendly and welcoming space that it is today. We are super dedicated to making it stay that way, so if you have feedback about your experience you'd like to share with us anonymously, good or bad, or maybe suggestions for topics to dive deeper on, Rachel is going to share a Google form in the chat. There it is, where you can leave feedback, and you can also always reach out to us on LinkedIn as well.
Okay, so this is the part where I tell you this is led by you. This is a community-led discussion, so we love hearing from you no matter your years of experience, what industry you work in, your title, your language that you work in, or no language at all. We really want you to ask questions and connect with each other in the chat. So the chat is your space to do whatever you would like with. It's usually so much fun.
I would talk about your role, talk about where you're based, make sure you include your LinkedIn link or a link to your website. Also feel free to share any jobs that are being posted in your organization that you're hiring for, and then also if you are looking for a role, please let people know what you're looking for so that we can help connect you with hiring positions.
There are three ways to jump in and ask questions today and share your own experience as well. If you just have an anecdote to share, you can raise your hand on Zoom. We will jump in and call you. You can put questions in the Zoom chat. Feel free to put a little asterisk next to them before or after if maybe you are somewhere loud like a cafe or your mic doesn't work, we can read your question for you. Or you can ask anonymously in Slido. So Isabella just put the Slido link in the chat there and we will check that throughout and get your questions asked.
I also encourage you as you're introducing yourselves in the chat today to let us know whether or not you've ever worked in a Data for Good initiative, whether or not you've ever volunteered in a Data for Good initiative. We're going to be talking about that a little bit today. I'm very curious, so get your questions ready for that. All right, I am so excited to be joined by our co-host today, Sebastien Ouellet. He is a staff developer at Kinaxis. Sebastien, welcome. We would love to hear a little bit about you, what you do, and then also a little bit about what you like to do for fun outside of work.
Introducing Seb
Sure, yeah. Welcome, everyone. Yeah, you can call me Seb, everyone. And I'm based in Ottawa. I grew up in Quebec and went to Ottawa for studies. I've done studies in physics and then cognitive science and then landed in computer science. My work typically in cognitive science was really about like computational modeling. And I then moved back to Ottawa between Montreal and Ottawa, then started working as a data scientist, then a data engineer, then as a machine learning developer. And really, it's very similar work across all those titles. It's just different companies like different titles.
I guess just to give a bit of context to it, I guess I've been in industry for nearly, yeah, about nearly 10 years now. So across all of these, I've worked in machine learning applications for things related to weather predictions, things related to physiological responses, things related to just data engineering in general, where we talk about data infrastructure, optimization, and forecasting. So that's a lot of what I'm doing now today at Kinaxis.
So at Kinaxis, we're a supply chain company. A lot of it is related to demand forecasting and supply planning. And when we talk about supply planning, there's a lot in terms of optimization there. And at Kinaxis, my role has grown really into leading the incubation programs, the internship programs. So as part of my main responsibilities now, I'm doing a lot of mentorship, a lot of ideation, and consulting with different teams within the company, so that I can bring in something related to, again, ML expertise: what's really possible to do with the type of data we've been collecting, and what data we need to collect to achieve extra goals.
Outside data, let's say. I'm a big fan of game design, not just gaming, but really a lot of game design, tabletop type of design. I really like swing dancing as well, even though I haven't done it too recently. But again, winter is probably a good time to dive back into it.
And then, yeah. So in terms of my free time outside of my typical work hours, something I also do for fun, but data related, is the Data for Good initiatives that Libby already mentioned. So it's been maybe seven to eight years now that I've been contributing with organizations like DataKind, I just recently started with TechNYC, and I've been doing stuff with Data for Good Ottawa, and a few other initiatives mixed in. For example, there's something called Canada Learning Code that does workshops for people who don't have, let's say, a technical background, who want to learn a bit about scripting, bits of Python, bits of R, to bring into their jobs that are not, again, programming related, but just being able to write a tool for themselves.
Career path and background
So yeah, so Python is by far the main one. I'm not, I'm like familiar with R enough, but there aren't really too many use cases so far, especially because it's also quite easy in Python to do some interoperability there, and like again, bits of C++ whenever needed. If we're looking for something like a very critical path that we need to optimize, that we call from Python, it's always basically the main language for that, Python.
Right, so my very first research project was related to biophysical simulations. So I was doing physics, and then a summer grant was really like, how do we simulate small particles in the fluid? And then I was like, I really like the simulation part of it. I am also learning Python on the side. I'm learning about cognitive science as well, and I decided to switch, because then in cognitive science, there's a stream that we have here at Carleton University, so that's where I went to school. I went to University of Ottawa, then Carleton University, then Université de Montréal. One of the streams was cognitive modeling, computational streams.
So as part of this, we are building essentially out of data collected from experiments, from how people would typically reason about things. And also, just to keep in mind, this is before the deep learning boom. So I was doing there in 2010. So again, people weren't really thinking about neural networks too much already by then. So we're thinking, okay, given how people react in experiments, how their visual cortex kind of handles information, what should we try to model to make sure that we have something that looks like how someone would react? Again, we can talk about like agent-based modeling, not the type of agents that people talk about today, which is again, LLM-based. And things that have a bit of a mapping to perceptual differences, as opposed to, here's a bunch of images, and we have very specific labels. It's more like, well, if essentially there are optical illusions, we need to reproduce those as well. Things like this, like the biases that people bring in into their decisions and their perception.
So yeah, it was a lot of rule-based modeling, still some simulation. And then from there to machine learning, it was like, okay, well, there's definitely a lot of statistical inference we're doing already for that type of computational modeling. I want to do some more of it. And then that's where I landed in terms of machine learning, computer science programs.
Internship program and mentorship
One of the main things that we like to do is always give options to the interns. So a number of interns are reporting to me now. And on their first week, I make sure that I have always a list of projects, ideas that we believe are interesting, that we don't know how well they will work eventually. But we know there's something to start with. We have some data somewhere, potentially, sometimes part of the project is collecting data from external sources or producing it. And out of this list, then it's a bit of a skill and interest matching problem.
So what we're doing really is saying like, what really don't you want to learn? What don't you want to do while you're here in the four or eight or 12 months that you're here? And what do you want to learn? What do you feel familiar with? And then they pick projects. I give them time over the week to read about the projects, there's always supporting documentation, and there are conversations with me and other team members. And then they get to own that project over the length of their internship. And if they hit a milestone early, there's always flexibility.
Something that we do try to do is always have an intern own the project, be the main contributor, and make sure that they can then explain what that project is and what the value of that project is to other stakeholders. So we have them interact with product managers, with directors, and VPs often. So it's really part of building those communication skills, building that sense of ownership, which makes people care about the projects, makes people care about what they're learning a lot more if they're like the single source of progress for a project for a while. And then we always try again to support them with other full-time team members, myself included, doing a lot of meetings across essentially the internship.
So that's part of our program. It's been successful so far. So a lot of these projects, they're moving in two different phases. Some of them released, some of them in... we've published a paper and patents related to those projects that interns have been, again, owning.
Absolutely. And there's something also very valuable in having people who, let's say, have been at the company for four years. And then you say, hey, could you mentor that person? It forces people who are just deep into the software, the problems and such, to teach it to someone else who has extremely little context, because again, they're an intern. They may be in their second year of university. So the tools are new, the problems are new, everything is new, as opposed to onboarding someone who's another senior developer. Again, they may know the tools, they may have context already. So I think people start from scratch and go, okay, yeah, how do I teach this thing? What is missing from our documentation, from my mental model of how things work? Am I hitting an edge case that just kind of happened to work well, but now someone is questioning, like, why does it work that way? All of those are really good questions for people who are just kind of, yeah, doing day-to-day work.
And it forced them to reframe a few things. And I think that's really good for avoiding those bits where there's internal knowledge, like tribal knowledge that's just like in the heads of a few people, having them teach interns, even though the interns will leave the organization after a few months. It makes people realize, oh, we've got those gaps. We have to improve a few things. And then, like you mentioned, it also just allows us to learn a few more things because of those perspectives.
Traits for success
So one important thing is being willing to communicate often and early. Like I mentioned, they own the project, and it's easy for someone to fall back to, okay, I own the project. I'll give an update once a week and then that'll be good enough. I see a lot of success when people communicate their successes every single day. They say, I've done this today and that worked out, that didn't work. And then I go, sure, like the thing that didn't work, you've still got time. And then there's that communication, that constant rapport between the intern and anyone, really.
I say intern here, but that's what I've seen across lots of people across our organization. The more communicative you are, the better it is. And how you communicate, that can be very different. That can be tailored to the people. So if it's just like a few lines in MS Teams, good. If it's a few screenshots, a recording, if it's just a call that you have a few times a week, just about this, again, anything kind of works. If it's just like PRs, you say, hey, here's my GitHub PR, can you review it? Essentially. All of those just allow people to stay on track, and allow people to, again, spread the knowledge because people know what you've been working on. You could do great work, but then people aren't exactly aware of it because they're just not able to see it.
Being curious is the trait that I always, it's one that we literally interview for, I could say. Being very curious about, again, I mentioned like interns poking holes into, let's say, different architectures that are already established. That's something that's always a good thing because it's very easy for people to just keep working in a specific way. And someone who's curious will say, well, I really want to know why it's been done this way, as opposed to accepting it and then falling into this thing where there's a bit less innovation, because again, we're just following a track as opposed to diverging.
The one thing I always do as well with every intern, I do like a one-on-one every week. And that one-on-one is their space to provide feedback and ask questions. It's not necessarily, hey, how is your project going? It's really more, how is your internship going? Like, what am I doing that's maybe lacking or not lacking, or I'm doing too much? What is it that you're missing from, let's say, your environment? And then we can talk about the project, but it's always giving like this space. Again, when you mentioned being vulnerable, it can be scary for people to just go in and like, there's a group chat, there's like 20 other people, and they might not want to just raise a question or anything like this. So I think it's just always this placeholder every week, being able to do this.
Contributing to open source
So one thing that I can provide some bit of context before answering that specific question as well. So across the work that I've been doing for Data for Good, one project that I'm maintaining specifically is open source. And then I'll mention what's lacking from it for people to be easily contributing to it, even though you could, and what projects typically do.
So one thing that you can do is read the contribution guidelines. So for every R package, Python library, anything that's, again, open source software, if it's big enough, it'll have contributing guidelines and development guidelines. So that will basically tell you what the workflow should be. The typical workflow is that you'll fork the repo, create your change, and do a pull request from your forked repo to the main repo. And then that will go into the list of pull requests on the original repo. And then mechanistically, it's that simple.
Now, how to make sure that your PR is going to be valuable and read by others and not just declined immediately. Again, those contributing guidelines, they're really important because they tell you like, if you need to add specific types of tests, how you need to integrate it with specific components of the software. And you'll just need to be careful about like a checklist often. So this is really part of the problem. And then you should look at the issues as well that are open on GitHub. Because typically, especially with something as big as scikit-learn, there are tons of issues. And then those are basically tickets that often people triage already for you.
So some great projects, they say something like beginner friendly. So if it's like your first time touching this piece of software, from a development point of view, or if it's your first time contributing to open source projects, then look for those labels. You can look at the list of labels and see like, okay, that's an easy one. That's a quick one. That's a beginner friendly. People will typically label those. And then just filter by those, look at one that you go, oh, yeah, I could do this. I know how to do it. And maybe it's not like an amazing contribution, because it's like, okay, I need to improve the documentation there, add a few tests, do something like, I'll add this method. And then this method is just like something that's well known already, but it's just not integrated in the package. So those bits where you kind of know you can succeed at it, before trying to engage with the whole rest of the workflow, right?
So you can say, okay, the first one, it'll be like 50 lines, and the learning will be how to do all those steps and interact with the community. And then once you've already interacted with the community, typically, open source projects, they don't have like 300 contributors, they have a core group of contributors, and then you'll get to know them. And then there's lots of other contributors who are there like, you know, contributing once or twice, and then that'll be it, right? So just getting to know the community through those small PRs and being part of, let's say, the Slack, the Discord or whatever they use. Or even just the discussions, like there's a discussion section on GitHub. It's not always, you know, there, but a lot of times it is, and there's a lot going on there that you just don't know, because you haven't clicked the discussions tab.
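The label-filtering step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the label names, the dictionary layout, and the sample issues are all made up for the example, and every project picks its own labels, so check the actual label list on the repo you care about.

```python
from typing import Iterable

# Hypothetical label names; real projects choose their own wording.
BEGINNER_LABELS = {"good first issue", "beginner friendly", "easy"}


def beginner_issues(issues: Iterable[dict], wanted: set = BEGINNER_LABELS) -> list:
    """Return open issues that carry at least one beginner-oriented label."""
    wanted_lower = {label.lower() for label in wanted}
    picked = []
    for issue in issues:
        # Compare labels case-insensitively, since projects capitalize differently.
        labels = {label.lower() for label in issue.get("labels", [])}
        if issue.get("state") == "open" and labels & wanted_lower:
            picked.append(issue)
    return picked


# Made-up sample data standing in for a project's issue list.
sample_issues = [
    {"number": 101, "title": "Fix typo in user guide", "state": "open",
     "labels": ["Documentation", "Good first issue"]},
    {"number": 102, "title": "Rewrite solver backend", "state": "open",
     "labels": ["enhancement"]},
    {"number": 103, "title": "Add missing unit test", "state": "closed",
     "labels": ["easy"]},
]

for issue in beginner_issues(sample_issues):
    print(issue["number"], issue["title"])
```

With the sample data, only issue 101 is kept: 102 has no beginner label and 103 is already closed. On GitHub itself the same filtering is available through the label list in the issues tab, so a script like this is optional.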
Working in supply chain
So one surprising thing is that there, you would think that there is a lot more automation for things related to supply chain. But one of the biggest value adds in terms of like supply chain software is really visibility and automation. So one giant thing, like the one of the biggest problems is really people have data from 20 different sources, and they don't talk to each other. So having a single source, where you see everything from all those sources, and you see the interactions, and you see the impacts, the influences, et cetera, that seems to be like just a giant piece of a solution.
And I was thinking when I started, I was like, okay, it's going to be about, let's say, complicated algorithms that solve like those problems in a very efficient manner. And it's still a part of it. But what customers really see is like, oh, I can see my demand and my supplies, like connected. And if the demand changes, I can see what's the impact on my supply orders immediately. And then they don't exactly care about like, oh, yeah, it's going to use this specific method from like this new publication that will solve everything much better. No, it's really like, I want the visibility, I want to be able to share it with others, because people, again, those organizations that are using supply chain software, huge organizations, and they may have like hundreds of planners just sharing info. And they don't want people to just send emails with screenshots of Excel spreadsheets and say like, oh, is this looking odd to you? They want a platform that really just says, can I try something out and share it with someone?
Getting started with Data for Good
Right, and that's honestly the biggest thing I hear here. So when people go, oh, how did you get started? How did you know you could solve those problems? The truth is, like, I don't stop myself from thinking, oh, am I familiar enough with a problem to try to get involved? Or do I have the skills? Do I have the familiarity with the methods? I just let the project tell me if it's very complicated, or I need more time to learn about something. And then it's typically much better to just jump in, and then learn while you're trying to solve it, as opposed to stopping yourself from even applying, or contributing, or connecting with others.
So often, again, like you mentioned, I didn't have supply chain experience before I started. For my two previous positions, I didn't have any experience in the domain either. So when I mentioned something like weather products, like weather prediction products, I didn't have any meteorology-specific knowledge before, but then I learned it on the go. And the same thing for DataKind, for example, like I started doing optimization work there, and I didn't have any experience in there before. Some of it was about data quality assessments in healthcare data. Again, I didn't have any healthcare-data-specific knowledge.
So all of those specific problems, if you're very curious, one of the ways to get started is look at your local organization. So in Data for Good Ottawa, we had nonprofits here that were just like, we've got data, we collect stuff about the people we serve, the people who are funding us, our campaigns, our operations, but we don't know how to make sense of it. We don't know how to extract value out of it. And then they'll reach out to something like Data for Good, and say, Data for Good Ottawa, there's also like Data for Good Toronto, that's quite big. And then they have chapters like this across other cities as well.
And then they just say, okay, well, we'll put something like a Shiny app together for you, where you connect your data, and then there you go. You've got the dashboard, and then a few metrics, a few things that will be more informative than just a giant spreadsheet that's kept over the years. And that's probably the easiest thing to do. If you want to go and commit a little more, often, there will be things like data dives, like DataKind organizes those. There are calls for volunteers across other organizations where they say, you have to kind of apply, but overall, it's just to do a bit of skill matching.
Again, you don't look at this and say, oh, do I have really enough computer vision knowledge to like, try to identify fish in undersea footage? Well, if you've done any computer vision, it's not going to be that different from just other camera footage. So just go for it. If you've done a tiny project about computer vision, apply. I mean, it's probably going to be a few times before people onboard you. Like with any kind of applications, they may just be overwhelmed with a bunch of applications. Maybe they don't have a need right now, et cetera. Those are not like job postings where they have deadlines typically. So you often just apply to a few. Like, I'm experienced and I sometimes apply to a new organization and I get nothing back, right? I don't take it personally. I just think like, well, they're probably doing something else. And maybe they just didn't like my profile either. And that's normal.
So it's really just a case of, yeah, look at a few places, go to meetups. So like we have a great meetup here called Civic Tech. So we have like Toronto Civic Tech, Ottawa Civic Tech, and tons of people are just coming together to see, okay, can we try to provide more transparency to government procurement programs? Like it's very Civic Tech-ish, right? And people are from, like, Bike Ottawa, for example: how do we make sure that we have more data related to better infrastructure for non-motorized vehicles, let's say.
Oh, one thing I could mention too: in two or three weeks, there's going to be a DataKit event for healthcare-related issues hosted by DataKind. So if you Google DataKind DataKit event December, you can subscribe. And then there'll be a GitHub repo where you can just contribute. It'll be very similar to the open source flow we discussed earlier. So if you want to dip your toes into it, that's just an open application. If you click register, you'll get in and you can start looking at the data they have, look at the problems they have, they have a list of issues, and then people will be able to comment on your work and you'll see what everyone else is doing.
Career advice
Yeah, so one piece of advice I received early on, that I probably keep repeating, is don't be afraid to bug people about work that you care about. So that achieves two things, typically. The first thing is, if you are curious about something, and you have like lots of people in your organization, and you go, why would I send a direct message to my VP, because they mentioned this in the meeting or whatever, or like something public? Well, the first thing is, typically people will respond. And if they don't respond, they probably were not bothered by it. It might look scary to just send direct messages to people who seem like outside your sphere of typical interactions. And it's just a good thing to remind people that maybe you're waiting on something from them, maybe you have just questions, and people will just answer them, even though you thought that maybe they didn't know you existed, things like this.
So I know that typically for making things work, you kind of need to remind others of a few things, especially if you're like caring about the project. And then not a lot of people are caring about the project yet, because it's part of your project. If you keep emailing people, again, like if I send an email that I don't want to reply to, it doesn't take much of my time at all. And on the off chance that some people you email go, oh, yeah, no, I, you know what, that's interesting. Then you start forming those connections. And that's why I say like, you can bug people with emails. It's not like a phone call, you're not stopping people on the street. You're really just saying like, here's a message. Let's see what happens. And it's how a lot of things kind of work in the end. Like, you have people aware of your project. And then people are forgetful. So if they were supposed to send you some data next week, next week, it's been a month. Well, again, keep emailing them, and they'll send it to you eventually.
Closing and the Closeread Prize
Next week Thursday is Thanksgiving. So obviously no meeting on Thanksgiving in the US. We hope that you have a wonderful holiday wherever you are, whatever you are doing.
There's an event called the Closeread Prize. If you guys have ever heard of scrollytelling, those pieces were super popular I feel like 10 years ago on the New York Times. Actually, a couple of weeks ago, The Economist had one. Basically, as you scroll down the page, things zoom in or the charts evolve or they become interactive, that sort of thing. Closeread is a Quarto extension that implements scrollytelling with Quarto. And the developers of that package are James Goldie, a data journalist in Australia, and Andrew Bray, a data science professor at Berkeley. They released the package over the summer. They have a conference video if you want to learn more. And they would just really love to have people play around with it and maybe implement the kind of data-driven document that they've always wanted to exist. So if that's the type of thing you're interested in, I'd encourage you to check out the Closeread Prize. And the package is under active development, so if there's any bug, if there's any feature you think would be really cool to exist, they are all ears for it.
The deadline's December 15th, so it's this perfect, well, not perfect, but maybe if you have some free time. And then we may extend it to the end of January 1st, so it might be a good kind of holiday project. So yeah, I encourage you: if you, or a friend of yours, might be really excited about making these kinds of data-driven documents, check it out.
Awesome. Thank you, Curtis. Yes, we all know that like sometimes things grind to a halt in December, so if your weeks get slow, you have something to work on. And definitely go find James Goldie's talk from PositConf 2024. All right, well, with that, I will wrap up and say I hope you have a wonderful rest of your day. Seb, thank you so much for spending time with us. We had a wonderful time talking with you, and we'll see everybody back in a couple of Thursdays.
