Tom Mock @ Posit PBC | Data Science Hangout

video
Jun 12, 2024
56:55

Transcript

This transcript was generated automatically and may contain errors.

Hi, everybody. Welcome back to the Data Science Hangout. I'm Rachel Dempsey, and I lead Customer Marketing at Posit. A few people are actually learning about Posit through the Hangout, so I'm adding this. If Posit is new to you, we are the open source data science company building tools for the individual, team, and enterprise. I'm so happy to have you joining us here today. The Data Science Hangout is our open space to hear what's going on in the world of data across different industries, chat about data science leadership, and connect with others facing similar things as you.

We get together here every Thursday at the same time, same place, so if you are watching this as a recording on YouTube in the future and want to join us live, there's details to add it to your own calendar below. Just double check it adds it for 12 Eastern time so that you'll be able to join live. I'm learning from the Hangout survey that we shared a few weeks ago that people really enjoy connecting with other attendees in the chat, so if you're interested in connecting with others, I want to encourage you to say hello in the chat and briefly introduce yourself, maybe share your role, where you're based, and something you do for fun too.

Also like to add, it is absolutely okay to just listen in here if you want, but you can also be a part of the conversation happening in the chat or also jump in live and ask questions or provide your own perspective. So you can do that three different ways. You can raise your hand on Zoom. There'll be a reactions bar in the button below. You just press that to raise your hand. You can put questions in the Zoom chat and just put a little star next to it or asterisk if you want me to read it. Maybe you're in a coffee shop or something and we'll have to mute anybody who accidentally is unmuted. And then we also have a Slido link where you can ask questions anonymously too.

You'll notice we have a different featured leader today than our originally scheduled leader who unfortunately was sick this morning, but you will not be disappointed. I am so excited to be joined by my co-host for the day, Tom Mock, product manager at Posit. And Tom, thank you for jumping in here and being our featured leader for the day. I'd love to have you introduce yourself and share a little bit about your role and also something you like to do for fun too.

Yeah, absolutely. Thank you for inviting me. It's always fun to join the Data Science Hangout, although it's my first time as a host. So thanks for having me. My name is Tom Mock. I'm a product manager here at Posit. I oversee the Posit Workbench tool as well as RStudio, the IDE.

I've been here at Posit since about 2018. I want to say it was June the first time I actually joined the company, but I actually officially got hired in August. Part of my onboarding ask was, hey, I'll take the job if you let me go to work week before I'm hired in August, because I was finishing up my PhD. So always be negotiating is my rule for when you're getting hired or looking for moving between jobs.

Something fun. Let's see. So I live in San Antonio, Texas, and it gets awful hot very early in the year. So we're lucky enough to have a nice backyard. So we do a lot of hosting Sunday Fundays, we call them, although it's not always Sunday. We grill out. We've got a huge backyard for the dogs to play in. I've got a new son who is walking now. So he toddles around and occupies energy and is a lot of fun.

Tom's career journey

That's awesome. Well, Tom, I know you just mentioned your PhD, but I thought maybe to kick it off, you could share a little bit about your journey of like, how did you get here to this role of product manager you're in today?

Yeah, absolutely. I think it's funny that it's like you go to college to do a specific task. Usually you're like, oh, this is gonna be my major. This is what I want to be when I grow up. And you kind of try to set yourself up on a path. So about halfway through college for me, I was like, hey, I really like science. I really like being an athlete and exercise. So maybe exercise physiology is something I could do. So maybe physical therapy school was the right path for me.

It wasn't quite the right path for me, but about third or fourth year of school, I was like, well, I don't really know what I want to be when I grow up. So I will go to more school because that's always the answer to problems is get more education, right? So ended up in a master's degree doing exercise physiology, which used a lot more science and some data analysis on top of that. Although ironically, mostly with SPSS and Excel.

And then halfway through the master's degree, I was like, still don't quite know where this is leading us, but we'll try PhD school because that has more education or opportunity maybe. Halfway through PhD, I was like, yep, still not the right path for me, but I started using R for doing my analysis. And I was like, there's this new field to me, data science. Maybe I want to be a data scientist. So tried to actively explore that route. I did end up finishing my PhD, which was great, but...

So actively started finishing up a PhD. I had actually talked to, I think Curtis was the first person I talked to at what was RStudio at the time about opportunities here, because I used RStudio to do a lot of my analysis.

I ended up on the sales team here at RStudio, now Posit. That was 2018. So this is a very long-winded answer and apologies, but we're getting there. I really do think that sales and customer success is a really interesting role because you're exposed to everything. You still get to do... I was still doing some data analysis in terms of how was I doing as an individual. I got exposed to hundreds of different companies doing data science here using our professional tools and open source, and got to talk to a lot of practicing data scientists.

I did that for about three years. Ended up creating a role called customer enablement, which was mostly around education and enablement of our professional customers, as well as open source users about how to use our tooling. And then after about a year of doing that, ended up in product management, helping out with Quarto as we were getting that launched and off the ground, integrating it into our Posit Team stack, as well as just kind of bringing it to the community with workshops and education resources when it was brand new. And then I've now been in the Posit Workbench and RStudio product manager role for a little over a year now. And that's been a huge blessing and kind of where I want to be for the next five plus years here at Posit is kind of working in this role.

On sales and selling ideas

Yeah, my secret here is that I think everyone is doing sales no matter what you're doing, right? Even as a true data scientist, you're trying to sell an idea. You're not trying to sell an idea that's false, right? That's what a good salesperson does is try and figure out what's the problem that you have. Do I have something that is valuable to you that can solve that problem and that you're willing to kind of expend money or time or something for?

So always be selling is another kind of mantra of no matter what you're doing, you're marketing yourself, you're selling a tool, you're creating a model, you're creating an application. You have to convince people that it's worth using. And again, you're not trying to lie to them. You're not trying to do the classical kind of used car salesman deal of just like get them in and get them out, whatever you can do to get a sale. You're really trying to see like, okay, here's the problem space. How do I help out this person or help out the company or in some cases help yourself? Like you're trying to sell yourself to get hired somewhere or do something like that.

Creating your own role and career capital

I think a common one, and so before talking about myself, I'll talk about others in the sense of I know a lot of people are trying to move laterally within the company. And sometimes that means moving outside the company to be hired in a different role. But let's say you're working at a company as like a business analyst or a data analyst. And you're like, hey, my aspiration is that I want to be a data scientist. Right. Like I want to move and grow within the role and I have to advocate or kind of pitch or sell the idea of me being in that role.

There's a great book that I used as kind of like helping design a career or kind of think about your career long term called So Good They Can't Ignore You. It's not a catch all. Right. Like everyone's experience is individual. But So Good They Can't Ignore You is like a nice book by Cal Newport that talks about building a career and this idea of like a craftsman mindset. Basically, you're not always going to be able to follow your exact passion, but developing like how am I going to be really, really good at what I do and enjoy and challenge myself to do that and develop like a passion for the work and really enjoy what I'm doing, no matter what it is.

So that was kind of a non-answer, but the idea of like you're cashing in career capital again, like you're selling an idea. And if you're trying to move laterally or create a brand new position, one, you want to be, again, quite literally so good, they can't ignore you. Right. Like a valuable employee. So you're not trying to say, hey, don't fire me. Give me a new role. You're trying to say, don't lose me. Give me a new role.

And so being confident and kind of, again, building a career that you want to have requires delivering value to your company initially. So I really enjoyed sales. I did it for three years. I did what you would call like run your lane of like be successful, have your customers be really, really successful and therefore continue to invest. But ultimately, I was like, oh, I have other aspirations. I did want to move horizontally. So cashing in some of the like, hey, I've done really good work. I think I could continue making an impact in this space of like unblocking people or educating people or allowing them to realize more value and kind of moving that direction. Yeah. And people bought into it.

What sales teaches you

Yeah, absolutely. Do that. Sometimes both people can be in full agreement and still not get there. Right. Of it's not as easy as people think of like, oh, well, the product should sell itself or all you have to do is get agreement that someone needs to buy something. Everything's a tradeoff. And that might mean, hey, this is great. You're trying to switch roles. We really believe in you, but we can't do it now. It's six months from now because we have to backfill for you or we have to figure out what does it mean to switch you into this position that requires a new job title and other things.

And then for actual selling of like selling product again, you could have people they're like, I love your product. I love using it. It's really valuable to me. I can't convince the people who control the budget that we can move forward with this. And at that moment, it's a no, like, you know, both parties are in agreement, but still the answer is no. And you have to be kind of willing to play the long game together of like, I want to help unblock you. Let's see what we can do to develop more of a business case or say, like, if you don't have this tool or you don't have me in this role, here's what you're missing out on or the lost value or whatever, like just reframing it.

So again, realizing that the immediate no or the immediate pain or the immediate roadblock is not eternal and being willing to kind of be creative or work on it long-term again across years or months. It takes time.

TidyTuesday origins

So Tidy Tuesday is one of the most amazing community initiatives. And I love getting to see what people put out from that. And we'd love to just hear from you how it started.

Yeah, absolutely. So I think that was ironically how Curtis and I first started talking. So again, I gave this long monologue background of like how I got here and apologies for everyone who had to listen through it, but it really was about this, like, trying to figure out what you're doing with your life. Right. And so part of that was like, I needed to learn R at work, when I was doing my PhD, AKA paying someone to let me work, which is an interesting life choice for a little bit.

My boss had a tool that was not R that we used for all of our statistical analyses. So I had to, one, like convince our lab that I can do everything we needed to in R and probably faster, but I also had to learn how to do that itself. So I had to have an outlet that wasn't my day-to-day work because I was kind of concurrently doing that. And I wanted to eventually end up in a data science role that probably was not using mouse data. And it's like, in an interview, I didn't want to sell myself as, hey, I am a neurobiology of aging person that can help you. I'm trying to sell myself as a data scientist. So more business-oriented datasets, more datasets that are outside of my domain expertise, because there's not that many companies that are hiring data scientists for mouse behavior as opposed to human behavior or something else.

I think ultimately it's like that led to, okay, I need datasets to work on. And I was part of what was at the time the R for Data Science online learning community. I joined that along with Jesse Mostipak and a bunch of other people at the time who were wanting to learn out loud or do a book club and read R for Data Science to learn more about using R for data science. And as part of that, there was little cohorts of people that were like, oh, we're going to create projects that people can work on. And R for Data Science, the book club was like, oh, well, what if we did a weekly challenge where we showed people datasets and then people could work on it and create visualizations? So Tidy Tuesday was like, oh, well, every Tuesday we'll put together a data science project or a dataset. You analyze it with the Tidyverse or R or whatever programming language you want, and you can post it on social media. So we ended up doing that since April of 2018.

Again, really just before I joined RStudio at the time. And it's been going on since. I've taken a step back from that in terms of like the data science learning community, which is what that is called now. They're managing Tidy Tuesday and the overall community. It's huge. It's amazing. But it's not something that like I could continue to run as an individual. So I'm really happy that they're keeping that going.

I have a deep love for absolutely messy data. And so I think some of my favorite projects or datasets were ones where the analysis was actually here's 14 Excel spreadsheets of varying type and now create a function to clean them and put them into like a tidy third normal form dataset that's all one and it's reproducible. Like that to me was like, oh, this is why statistical programming or data science or R were useful to me is because I could do things that were essentially remarkably painful with Excel or just unblocked me in ways that it's like, OK, I couldn't have done this on my own.
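To make the "many messy spreadsheets into one tidy dataset" idea concrete, here is a minimal sketch in Python. It is not Tom's code, and it makes several assumptions: CSV text stands in for the Excel files, and the column names, the `HEADER_ALIASES` mapping, and the sample data are all hypothetical. It uses only the standard library.

```python
import csv
import io

# Hypothetical mapping from messy header variants to one canonical column name.
HEADER_ALIASES = {
    "team": "team",
    "team name": "team",
    "pts": "points",
    "points": "points",
}


def clean_sheet(text: str, source: str) -> list[dict]:
    """Parse one CSV 'sheet' and return rows with canonical names and types."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for raw in reader:
        row = {"source": source}  # keep provenance so the result is traceable
        for header, value in raw.items():
            key = HEADER_ALIASES.get(header.strip().lower())
            if key is None:
                continue  # drop columns we do not recognise
            row[key] = value.strip()
        row["points"] = int(row["points"])  # enforce one type per column
        rows.append(row)
    return rows


def combine_sheets(sheets: dict[str, str]) -> list[dict]:
    """Apply the same cleaning function to every sheet and stack the rows."""
    tidy = []
    for source, text in sheets.items():
        tidy.extend(clean_sheet(text, source))
    return tidy


# Two made-up sheets with inconsistent headers and stray whitespace.
sheets = {
    "week1.csv": "Team Name, PTS\nOwls, 3\nHawks,1\n",
    "week2.csv": "team,points\nOwls,0\nHawks,3\n",
}
tidy = combine_sheets(sheets)

if __name__ == "__main__":
    for row in tidy:
        print(row)
```

The point of writing the cleaning step as a function is the reproducibility Tom mentions: a fifteenth file only needs to be added to the input, not cleaned by hand.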

Teaching and unblocking learners

I do think that reframing it to something they have experience with is helpful. And again, I think that's why I personally struggled with like figuring out a career path so long is because I really wanted something applied, right? It was not very rewarding for me to learn foundational concepts just to learn foundational concepts as much as it's like healthy and eating your vegetables. It was not something I truly enjoyed in my educational background.

So when I'm working with learning or if I'm developing like a workshop, I usually try to present like a little bit of like the foundational, here's how you use something in Quarto, for example, like here's the rules of how it works, but now like here's why that's interesting, right? You have to connect it back to value for the individual. You're not just learning Quarto to learn Quarto, you're learning Quarto so that you can be better at reporting or it saves you time or it allows you to create things that otherwise would be very, very difficult. And not only that, but like in your job, so your role is creating presentations, like here's how we can automate 50% of those steps and the different parts integrating there and saving you time, like whatever, just like trying to drill down into, hey learner, what do you care about? Why are you here? Why are you either paying me money for a workshop or paying me to be your teacher? Like what is, what are we going there?

R community growth and evolution

Absolutely. So I would consider myself both an old hat and a new hat in the sense of, if we're saying like I started using R really around 2016, 2017. So this was, again, right around the time that R for DS, the book, R for Data Science came out. And I remember the first time I used R, I was not a fan. Absolutely not. Like we used it in one of my courses. I was using a tool called R Commander, and I don't want to belittle the tool, but it did, it was not compelling to me in terms of, again, I did not see the immediate value as the learner of like, why do I care about this tool? Because it had some of the limitations of using something like Excel with very little of the benefits of using code.

And then circa 2017, I actually started using RStudio, reading R for Data Science, and I was like, I get it now. Like here's all the pieces together of connecting the dots, like a syntax I can get behind, actually using source control and source code, not just typing it into my console and clicking enter, right? Like I got the benefit of using code.

R as an ETL tool

The question was, what are your thoughts on R as an ETL tool to build data pipelines? I have a few flows that use R, but I struggle to containerize and automate the flow without manual scripts running.

Absolutely. Yeah. So there's some really cool tooling out there using things like dbplyr, Apache Arrow, DuckDB, where you can do real data engineering with R in the sense of like work with datasets that aren't just CSVs and work with like more traditional data warehousing tools or object storage or databases themselves.

Part of the ask there, I think, is around containerization. You're trying to capture an environment and recreate it. There are some pre-built containers out there, things like the Rocker project that are ready to go and already have like the Tidyverse and R and a bunch of other things pre-baked into the container that you might be able to use. I, of course, have to give a plug for Posit Connect in terms of that's one of the problems it solves is not having to worry about managing Docker containers for isolation. You can actually just like publish your source code to Connect and it handles the isolation and recreation of environments. But ultimately, it's like thinking about it at that level of like, again, suggestion for Arrow, the R package, DuckDB, dbplyr. They allow you to do a lot of really powerful things in R, but you don't have to like drastically expand your dependency tree, right?

gtExtras and favorite R projects

Absolutely. Yeah. There's two that come to mind immediately. One of which is a project I still maintain that was like an area of deep interest for me, which was making tables and really pretty functional tables with R. So there's a package out there called gt for like a grammar of tables or great tables, whatever you want to call it. I have a package called gtExtras, which I add to a lot and still have some issues open for adding new capabilities.

But especially during COVID and the height of the pandemic, that was essentially like therapy for me of like, I'm trapped inside my house. I can't really go places. I was able to expend a lot of like creative energy and have a lot of fun writing an R package for making really cool tables with gt and then allowing others to take that and run with it. Again, like start kind of a movement or start something and let that community grow on its own. Like if you look on Tidy Tuesday or social media, like there are lots of people using gtExtras to do things I never even could have thought of because they're using other domains. So that one I think is a lot of fun.

More directly at work, there's an R package that we've started called Bauhaus, which is A, my favorite art movement, shout out to László Moholy-Nagy, which is the picture behind me. He was a Bauhaus artist in, I think, the 1940s, 1930s. But the R package is called Bauhaus around like our warehouse, like our data warehouse that we use internally here at Posit and using that to clean up data sets that typically it's like you're joining a lot of tables, you're doing some ETL processes, you're trying to figure out data about the product that I'm managing and it makes life simpler. So we've been working on that and I think once that project comes to fruition, it'll make, again, save me hours of time and not having to write things manually, which is the dream of any R package or function is that it saves you time and makes life easier.

Building a coding community

Yeah, there's actually some teams made their own internal Tidy Tuesday, actually using Tidy Tuesday data or just using that to kick off the initial energy, kind of the inertia or startup energy to get people together and doing like a book club or a weekly sync. At very large companies, we've often seen that that could end up being hundreds or more R developers, Python developers, data scientists, whatever, coming together and doing presentations. While in other smaller groups, it's like two to five people that are just enjoying the struggle and the success of data science in their group or data analysis in their group.

I think the biggest part, and this is to Rachel's kudos, show up. I think that's the biggest thing is like every Thursday, Rachel shows up. And every Thursday, Rachel has friends and colleagues who show up to help manage adding people to the Zoom call or helping answer questions or just being involved and just doing something every week for an extended period of time. Essentially, if you build it, they will come. There's going to be roadblocks. There's going to be days where you're sick. There's going to be days where you're texting a friend at 10:30 at night and being like, are you available, please? And Rachel has lots of social capital to cash in because of all the cool things she does. And so I think that's the same thing for you as an individual in an organization. You might have to bootstrap that first couple weeks or first couple sessions, but then you start being like, hey, other people are getting excited and they have an opportunity to grow or grow their own career or present or even just hang out. But just showing up on a continuous cadence at a specific time at a specific place is often enough to get something started and grow from there.

Balancing coding with the rest of the job

Right. It's a great question. And I wish there was a perfect easy answer. But again, going back to the sales role when I was trying to create new opportunities for myself or even when I was in my PhD, it was side work at some level. So I'm not saying that's what is my current role. But in PhD, I did all. I ran my lane. I did my job and I did it well for what I was supposed to do for my PhD. And then I came in a little bit early every day and did practice or R stuff in the morning. Like when no one else was there, I could just have the lab to myself. And similarly, when I was in sales, it was like I ran my lane. I hit my numbers. I really helped out customers and I used opportunities of overlap of this customer's asking about a new thing to do in R Markdown. So therefore, I'm going to spend some of my time to learn more about doing this. So I can help educate them and unblock them and get them excited about it and make them more successful, which makes us both more successful in like a sales or customer success role.

So if you're not in sales or you're not in education or not in a PhD school, it is often like, again, going back to do a good job and have your core job function, like you have to run your lane to cash in some of the other things. But trying to do that overlap of like being aware of opportunities where it's like, hey, here's a project where if I do use this new skill I'm trying to learn here, I think it'll pay off for me and it will also pay off for whoever's paying me to do my job.

Leading a great data science organization

There's a couple data science leaders I've talked to that I think always have the right mindset, which is you're not just trying to provide value to the business, you're trying to provide value for your direct reports. So as a data science leader, of course you have to say like we are running our lane and providing value to the business and they understand how data science can be used to educate, inform, influence decision-making, or in some cases be the product of our company. But you also have like a responsibility to your team to unblock them, help them grow, help them educate themselves, and be successful at your company. Because that's, again, it's like you're investing in yourself or in your team to make a bigger impact.

I think the opposite of that coin is people who are overly concerned with like, oh, well, we can't do that because that would require them learning new skills, or this is the way we've always done it, and we can't refactor. And I get it, like building a product, I absolutely understand like the fear or the anxiety around, well, we're gonna have to refactor or change this or change how we work and change management is really hard. But being willing to, A, very much engage with the business and help them understand why data science is valuable. And for your team, making them valuable by letting them have cycles to grow, giving them growth opportunities, giving them the tools they need to be successful, as opposed to just kind of rolling over to whatever kind of, oh, well, these are just the limitations we have to live with. Like there's blockers. It may take months, it may take weeks or years, but push on those blockers, don't live behind them.

Thank you so much, Tom. And I know you told me that you have to run right about now, so I want to give you the space to go and do that, but say thank you so much for taking the time to join us today and kind of jumping in here last minute. I so appreciate it.

Of course. My takeaway for y'all is read books, have fun, join the Data Science Hangout with Rachel and stay hungry, humble and kind, which is our motto in the Posit organization. So thank you very much for the time today. And I'm around on LinkedIn and other things if you want to chat.

Table contest and DuckDB discussion

If folks haven't heard about the table contest, I almost don't want to call it a contest because the goal at the end of the day is kind of a community sharing event. However, there are winners and there are runners-up and there's honorable mentions to kind of recognize the awesomeness. So Rich and I have been running it, I think since 2020. And so we've done, I think three or four years. I think we skipped a year. And I think there's like almost over 200 submissions, you know, and each one of those are like kind of beautiful, well-documented. They could be made with gt or reactable or DT or huxtable or the list goes on and on.

And basically there's a deadline and you submit, you either submit a table and then a lot of other people have submitted a different type of thing, which was like a tutorial. So they maybe use, you know, R Markdown or Quarto or something else and kind of give like a step-by-step guide to kind of teach people how to do stuff. In fact, Tom Mock, if you look at his blog, back maybe two or three years ago, he was doing a ton of gt blog posts to kind of help himself understand as well as his customers understand how to take advantage of gt and do all these kinds of custom things and like a lot of styling things that I think his customers were asking him how to do.

The other part of the contest is sometimes people just need like a deadline and a community event to just get you kind of over the hurdle to like make this stuff. So that's kind of what it is. If you want to participate, there's the low end, easy stuff, like make a beautiful table and share it. It could be, you know, HTML, it could be interactive. Yeah. There's Shiny tables, for example. If you've been inclined to make a blog post for your personal website, where you kind of show off how to do something particularly cool or something particularly tricky that you didn't see well documented on the internet, that's also a perfectly good thing to share.

There's a deadline. Rich and I have all these table package maintainers and past winners who are going to be kind of like the judgy review process. And at the end of it too, we put it all into a gallery. We're going to revamp our table gallery, where everyone who gives us permission, we'll host it so that you have this nice large collection of table examples that are all well-documented or they're, you know, they're like detailed tutorials. So super exciting.

Yeah, I could chime in on that one. I've been looking at DuckDB lately. So it's analytical. So think about, you're not connecting to a relational database, you know, like management system, you have like some flat file that's really massive. I'm talking like billions of records. And so it's awesome for like lazy loading, quick analysis, and it introduces new syntax to SQL. You know, it has its own dialect. It's interesting, you know, slight differences here. Like you could do stuff like summarize from table and you get like full numeric summaries. Like query just, you know, gives you, you know, summaries like Q1, Q3 and all that.

And so it's great in that. It's like brings that power of, you're literally reading off like flat files, parquet files, for example. And then it has also this thing where it can read from like your file tree. So you could say, select all from star dot parquet. And if they all have the same, you know, structure, it'll actually literally, you know, read and then do an analysis on all of the files at the same time. You could have a thousand files, you know, you collect it and it could read JSON, CSV, parquet files. And so it brings that kind of like power of, you know, dealing with, you know, like your relational database to your file system, but it's awesome for like really massive records. And then it has great APIs for Python, for R, you know. So think about what you could do with dbplyr, you know, with your relational database. But now you're dealing with, you know, some flat file and you're connecting through DuckDB.

Thank you all so much for joining us today. I know Tom's gone now on a new meeting, but thanks to Tom for jumping in and joining us as our featured leader. Always nice to see you all. Have a great rest of the day.