Enabling Citizen Data Scientists at Dow Chemical with Posit Academy
Transcript
This transcript was generated automatically and may contain errors.
Hi, everybody. Thank you so much for joining. Welcome to the RStudio enterprise community meetup. I'm Rachel Dempsey calling in from the RStudio office in Boston today. We are streaming out to LinkedIn and YouTube live right now. So huge thank you to Hannah who is hanging out and helping behind the scenes too. If you've just joined now, feel free to introduce yourselves through the chat window and say hello, maybe where you're calling in from. We can see it coming in from LinkedIn and YouTube.
But for today's meetup, we are joined by James Wade, research scientist at Dow Chemical. At RStudio, we get the chance to talk with so many people who are helping others across their company improve their data science skills. And one of those great people being James Wade with us here today. So I'm so excited to hear from James today about the lessons learned through teaching, community building, and collaboration with scientists and engineers across Dow.
Just a few notes after and during James' presentation, we will have lots of time for Q&A as well. So you can put your questions into the YouTube chat, LinkedIn, or use the Slido link as well for anonymous questions, which we've shown on the screen here. But for anybody who's joining for the first time, this is a friendly and open meetup environment for teams to share use cases, teach lessons learned, and just meet each other and ask questions.
So thank you all for making this a welcoming community. We really want to create spaces where everybody can participate and we can hear from everyone. I really appreciate how kind and helpful this community is, and just like to reiterate, we love to hear from everyone, no matter your level of experience in the field or the industry that you work in. So always just a nice reminder for us all to be nice in the chat too.
But with that, thank you all again for joining us. I would love to introduce James and pull him up on the stage here virtually as well. So James is a research scientist working in the chemicals manufacturing industry as part of a research and development team. James has a special interest in sustainable materials design to develop new capabilities for research. In talking with James, it's clear he has a passion for helping convince others to give data science a try as well. So I'll pull you up over here, James, and share your screen. Great to have you here.
Introduction to Dow and citizen data science
Thanks so much, Rachel, and thank you for giving me the opportunity to come here today and share our experience with RStudio Academy. We have been focusing on upskilling what we're calling citizen data scientists, certainly not a new term, but one that we use readily throughout our research and development group. And before I talk in detail about our experience with RStudio Academy, I want to start by telling you a little bit about who I am, a little bit about Dow, and a little bit about our research and development group and how RStudio Academy plays a key role in upskilling our researchers within this R&D community.
So I'm originally from South Carolina. I have my journey through the United States on the map here, and I continue to move further and further north. I'm now located in Midland, Michigan, and if I go any further north, I'm going to have to get some immigration documents. I'm a chemist by training. I have a PhD in chemistry, in particular analytical chemistry, where in my undergraduate and graduate work, I was mostly focused on measuring biologically related materials, mostly proteins with a focus on diagnostics.
But now at Dow, I work on, as Rachel mentioned, characterization of materials. I have a particular interest in looking at sustainable materials design and development, and also with a data science focus, have a strong interest in high-throughput research automation and how we can use tools, our tool belt of statistics and AI and ML to advance our material science innovation.
So I mentioned that I'm here in Midland, Michigan. This is where Dow is headquartered, and I want to take a few minutes to describe what Dow is focused on and how our research and development group plays into that so that you can get a better sense for the types of people that were targeted for this RStudio Academy program. Dow is a very large company, though I wouldn't be surprised if many of you have never either heard of us or don't know of the products that we develop, and that's because we're a B2B or business-to-business company predominantly.
Many of our product development and, in turn, our research efforts are aimed at addressing many of the global megatrends that I'm sure all of you are familiar with. These are things like climate change and a huge growth in the emerging middle class, a significant investment across the globe in digital transformation, and trends related to urbanization. As a material science company, we're focused on developing material science innovation across areas touching packaging, infrastructure, consumer, and mobility.
Now I wish that I had time to tell you about particulars of our innovation. It really is what we get excited about every day coming into work, but the overall key takeaway here is that we're looking to apply science and engineering expertise, along with data science tooling, to innovate in this material science arena.
Within R&D in particular, we have a broad suite of capabilities. We have a few thousand researchers that are working day in and day out towards that innovation, and our capabilities span many things that relate to data science in terms of the amount of data that we're generating. We have robust high throughput and analytical capabilities. Analytical, in this sense, is how I identify, which is getting excited about making very careful measurements of the most sophisticated materials that we develop. We also have robust application testing, a significant in silico modeling capability, and really what brings me here today is a growing and significant data science presence to take advantage of all of this enormous amount of data that we're generating.
What does citizen data science mean?
I mentioned at the beginning of the talk that we're using RStudio Academy to enable citizen data scientists at Dow. So what do we mean by that? Well, we as researchers come in to work every day and are either generating, working with, or both, a set of messy, unstructured, and scattered data. Many of you, I'm sure, are used to working with some messy data, but one of the particular challenges with the material science and chemistry landscape is that the representation of materials and polymers, molecules, things of that nature introduces some distinct challenges related to data that make it even messier than what you might anticipate from a naive perspective.
Where we see a lot of value, in particular with data science, is actually taking this significant amount of data that we generate, take a researcher that we deeply embed into this data analysis workflow, and they can take their subject matter expertise, say an analytical chemist like myself, or maybe a synthetic chemist who's an expert at making new polymers, and they can apply that expertise in how they analyze their data. It's that combination of subject matter expertise with those data science skills where we see a significant amount of value.
Data science as an interdisciplinary endeavor
Now, data science is a bit of a nebulous term, and you can sort of map onto the definition whatever you would like, but one useful representation that we have internally is this example of a combination of domain expertise, math and statistics, and computer science. Now, I imagine that many of you have seen something similar to this in the past. In particular, for how this applies to research, we see the combination of domain expertise and math and statistics. This would be the skill set that people typically come into our research environment with. We would call that our traditional research.
When we introduce additional software skills and computer science expertise, we can combine that with some math and statistics, and that's going to fall under this machine learning branch. A third piece that's often underestimated in terms of its value and significance for innovation is the combination of our domain knowledge with computer science skills that we term data stewardship, and it's at the intersection of all of these where we find data science.
Now, again, we didn't come up with this representation. We're just borrowing it from others. The key here is that we're not necessarily looking to find a single expert that can touch all of these areas. Rather, we're looking to build a community of innovators that collectively can come and advance our data science efforts leading to new material science innovations.
Towards that end, we have this citizen data science program that aims at taking this data collection, wrangling, and analysis, and translating that into decisions or actions that we can take for innovation, ultimately wanting to make more money by selling new materials that address our customer needs.
This aim of the citizen data science effort is to provide a governance framework for best data practices, along with the behaviors to adhere to that governance framework that are all data centric. When we pair the citizen data science foundation along with a technology base, we have significant investment in our digital infrastructure, behavior, and technology comes together to form our data foundation, again, with the aim of getting to decision making, actionable insights to create new value.
When we think of the goal of a citizen data scientist within our research and development group, ultimately, we're aiming to construct a tidy data project. Now, in an RStudio meetup, I doubt I have to define tidy data, but the overall aim here is to have a collaborative, transparent, and reproducible workflow. Each one of those descriptors is key for packaging our projects into leverageable components that can further their overall innovative pipeline.
Guidelines for success
To supplement this citizen data science program, we've defined a series of guidelines that we believe are critical towards the success here. Now, I'm not going to go through all of these that you see on the screen here, but they fall into four different categories. The first of which is data organization. These are things that I imagine are familiar to many of you, like storing our data in tidy data tables and storing our data in a managed system of record.
Another category here is the data analysis, where we want to separate out our data storage and our analysis workflows. And we also want to create some transparency in terms of how we analyze our data. This is looking at things like Git-based version control to ensure that we have good reproducibility and reliability on these methods.
Third is data access. This is a particular challenge for much of our research where we're using often proprietary data formats. This is where we want to make sure that we can store data in a way that's going to be accessible for the long term. We can have organized data, but it needs to be both organized and accessible in order for us to have that durable value out of that.
And then in particular, looking for long-term value preservation, we're teaching our citizen data scientists to package their analyses so that they can be leveraged by others who might not understand all of the intricate details of a particular project, but they can go and still leverage those key subject matter expertise that's encoded in that package so that it can continue to further innovation long past the lifetime of the individual project.
We have a significant program aimed across all of these guidelines here. They're easier said than done, but this is our ambition for where we're aiming our citizen data science program to head.
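As a minimal sketch of the "tidy data tables" guideline from the data organization category above (this is not Dow's tooling; the sample names, temperatures, and viscosity values are made up for illustration), the snippet below reshapes a table whose column headers are temperature values into a tidy table with one row per observation and one column per variable.

```python
# Hypothetical untidy table: the column headers "25" and "50" are
# temperature values (in deg C), not variable names.
wide_rows = [
    {"sample": "A", "25": 1.20, "50": 0.80},
    {"sample": "B", "25": 1.50, "50": 1.10},
]


def to_tidy(rows, id_key, names_to, values_to):
    """Return one row per observation: (id, header-as-variable, value)."""
    tidy = []
    for row in rows:
        for header, value in row.items():
            if header == id_key:
                continue  # the identifier is carried along, not reshaped
            tidy.append({
                id_key: row[id_key],
                names_to: int(header),  # the header becomes a real column
                values_to: value,
            })
    return tidy


tidy_rows = to_tidy(wide_rows, "sample", "temperature_c", "viscosity")
for observation in tidy_rows:
    print(observation)
```

In the tidy form, temperature is an ordinary column that can be filtered, grouped, or plotted against, rather than information hidden in the header row.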
Welcoming new users: the RStudio Team workbench
Of course, I'd be remiss if I didn't mention the tools that we use for this. There's a lot of different options that you can go choose. You'll probably not be surprised by what we actually do choose for our environment, but it's critical that we have a workbench for our citizen data scientists to work in. And here are the criteria that we look at to evaluate the platforms overall.
Number one is we want to make sure that we have both an approachable and a collaborative workspace that will welcome new people that either are new to the company or new to applying data science to their research efforts. The second is that we want to have a modern, capable environment that has extremely low barriers to our exploring ideas. So it needs to be collaborative so that new researchers can work with those who are more knowledgeable in this data science space. And we want to make sure that getting started is something like a 10-minute exercise where you don't have to go learn an entire ecosystem in order to start applying code-based data analysis to gain additional insights from your data.
Third, we want to make sure that we have an extremely safe and secure environment that has robust data connectivity. So we're not spending much of our time trying to get access to the data. Instead, we can focus most of our time in analyzing that data, again, ensuring that those connections are made in a safe and secure manner.
Lastly, this is a critical piece, which is making sure that you can actually deploy your insights to other people. To get value out of your data science work, you have to be able to share that with others. And so a critical performance metric for our workbench is having these simple and secure and, of course, rapid deployments, as close to a single push-button deployment as possible.
I said that you wouldn't be surprised. In our case, we use RStudio Team and take advantage of the Workbench, Connect, and Package Manager servers to collectively form this RStudio Team product for most of our workbench efforts in Dow. Now, this isn't our only tool that we use for data science, but it's certainly one that I'm most excited about and we're able to generate a lot of value from.
How do we build this? Introducing RStudio Academy
Okay, so I've talked a little bit about what we want to build, focusing on that tidy data project concept of being collaborative and transparent. I've talked about where we're going to build this work using the RStudio Workbench or Team product. Now, the question is, how do we build that? If you're like me, before RStudio Academy, you went and did a simple Google search. You'll find with these searches that you have an immense amount of free online resources that are available to you to go learn pretty much any programming language of interest.
You could spend all of your time for the rest of your life, I think, trying to consume all of these open, available resources for learning programming. However, what we found internally with our citizen data scientists is that they just didn't have the time outside of work hours to go and learn something new and add it to their role. So, in particular, RStudio Academy has met a critical need of empowering people to work during business hours to go and learn these capabilities.
Now, many of you might be unfamiliar with RStudio Academy overall, so I want to take just a few minutes to talk about some of the specifics of what Academy is and why it was so useful for our citizen data scientists here at Dow. You can go to the linked address here and learn all about the details, probably in a more eloquent and organized way than I'm going to describe today, but I want to talk about the experience that we've had so far.
As you might expect, there's going to be a series of lessons. These are interactive tutorials. If you've looked at the learning section of the RStudio website in the past, or looked at some of the learning modules on RStudio Cloud, I'm guessing that you'll see a very similar experience with Academy. What sets it apart is, I think, sort of the rest of the components here.
The project milestone is a challenge to Academy participants, where their aim here is to recreate something like a ggplot of a linear regression, for example. From this, you're taking the lessons, much like many other curriculum you might see, and you're applying it in practice. It's the practice piece that I think is so critical to this.
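To make the milestone idea concrete, here is a minimal sketch of the computation behind a plotted linear regression: an ordinary least-squares line fit. An actual Academy milestone would be done in R with ggplot2; this Python version, with made-up data points and a hypothetical `fit_line` helper, only illustrates the fit itself, not the plot.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and of cross deviations in x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data lying exactly on y = 2x + 1, so the fit is easy to check.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

A milestone extension in this spirit might fit a separate line per sample group, or add residuals to check how well the line describes the data.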
You will then bring everybody together in group sessions, where you can showcase not only the milestone recreation, but a milestone extension to say you've taken some core concepts and then can apply them to go and see where else you can take these capabilities. It's these milestone extensions where I've learned a tremendous amount from the other participants within RStudio Academy.
I've been quite impressed, even in the first weeks of learning new things, where I thought that I had already learned all that there was to know about something like the RStudio development environment. These group sessions are ways where not only you can show off what you've done, but you can also learn from your peers and mentors. For our setup, we had mentors both from Dow and from RStudio, both of which were critical to ensure success of the learners, where we had the programming expertise and strong familiarity with the content, paired with Dow experts who could translate what the students were learning week to week into practical lessons for how they could apply it to their own research.
Also with Academy, there's an opportunity to meet with the mentors in an office hours type setting. These also were critical to ask questions that maybe people were a little bit too hesitant to ask in a larger group setting, and also a way to focus on getting particular advice for their own research projects that might not apply to the larger group setting.
And the final piece here is something that may sound obvious. However, the daily practice of making this continuous learning within the data science realm a habit, I think that can't be overstated. It's something that has been critical for me in my data science journey, and I think that it's one of the best ways that we can ensure that through a little bit of effort each and every day, you can see tangible benefits to create value through projects that are outside of Academy overall.
Now, I talked there through a couple of the specifics about what is included in Academy, but I want to talk a little bit broader about how to think about Academy overall. I'm stealing this analogy from Eric, who I think is on the meeting with us today, where we're not really talking about going to a training class where you're going to be watching videos or listening to a lecture series. I emphasize that practice component. Really what you're doing here is going and learning a new skill, much like you would if you were practicing the piano.
Now, you're not going to be playing advanced sonatas on day one, but you still can produce music. You still can produce value. You still can produce insights from that, and it's that continuous learning concept that I think really is critical where you're developing this habit as an apprentice participating in this Academy program.
Nuts and bolts: cohort structure and candidates
Okay, so let's talk a little bit about nuts and bolts. For Academy, you need a couple of things, and these are from our experience. I welcome interruptions from Eric or others if anything I'm saying is no longer accurate, but for us, when you're putting together a cohort, you need to have, of course, people who want to learn. You're looking at something like five to seven participants within each of your cohorts. I was mentioning the mentors before. You need one mentor from RStudio and then another mentor from your own organization.
As I mentioned previously, the key role of the mentor from your organization is to translate the learnings into the systems that you already have in place for your company, for example. The third piece is you need a project to work on. The closer the project is to your work, the better. You're going to be spending a significant amount of time working on this project, so it's worth the upfront investment to make sure that it's one that really makes sense for your workflows.
Thinking about who could participate, who would be a good candidate for Academy? The number one predictor of success in our experience is intrinsic motivation. That might sound somewhat obvious, but you need to want to do this. You need to really care if you are learning the content within Academy. Secondly, a good candidate will have a project that they can go and apply the knowledge that they're learning right away. Ideally, they could do it in parallel as they're learning the key concepts within Academy. And third, you need to have both the time and support in order to participate. Practically, you probably want to make sure that your manager, your boss, whomever, supports this program and wants to make sure that you can dedicate time at work during working hours to invest in upskilling yourself through the Academy curriculum.
All right, so on the flip side of that, who might be a bad candidate? Who probably should not participate in Academy? Now, there's lots of examples that I could think of here, but I wanted to highlight a couple that you might not think about. Number one, you're an experienced R user. We had a number of people say that they knew R, but what they were looking for was wanting to do it the right way. I can say that from experience here, that probably means that you either are too knowledgeable to participate compared to the other members of your cohort, or you don't really have enough of that intrinsic motivation.
You're not necessarily learning a particular "right way" to do things, but rather some best practices that you can apply with your peers. Also, thinking of this experience here, if you're already regularly building and deploying Shiny apps, you're probably too advanced. If you're a regular contributor to Tidy Tuesday on Twitter, again, probably too advanced. Maybe the number one giveaway: if you're able to successfully write some new tidy evaluation code, and you don't have to go look on the RStudio community or go through the docs, yeah, you're definitely too advanced for Academy.
The other category to look out for here is people who are excited about learning how to code, but they're just thinking of it as a new skill. We had a survey that we sent out before our initial cohorts with Academy to get some sense of the community of why they wanted to participate. We had a number of different answers along the lines of, yeah, I think this would be great for a skill to add to my list. You probably don't want to select those people for RStudio Academy, predominantly because they don't have enough of that intrinsic motivation to want to go and complete the curriculum and then apply it to their work. The other thing to look out for is if you don't have a project to apply it to, it's likely that the skills that you learn, if you don't maintain that habit, if you're not developing this as a continuous practice, you're probably not going to have that good of an experience.
Okay, enough of the negatives. What makes a good cohort? I mentioned that a cohort needs to have somewhere between five and seven people in it. And Dow is a very large company. We have around 36,000 employees. I mentioned a few thousand within our R&D groups spread across the globe. This made picking our cohorts a bit of a challenge. What we've learned from this, and we've had several now, is that the components that make a good cohort are, number one, that the people are from a similar work group. It doesn't literally have to be that people have identical roles. In fact, we've found some good outcomes from having people at maybe some of our highest levels of technical leadership roles alongside some people at the very earliest entrance into the company. Those seem to work okay together.
What is more important is that the types of problems that they're thinking about on a daily basis are similar enough that they can discuss with how they could apply the new skills that they're learning to their own work. Now, again, a nuts and bolts component here is you want to make sure that you have a similar time zone. You don't want people struggling to attend the group sessions. Those are critical to success, as we've heard from our participants.
And then the third piece here, something that we sort of learned the hard way, is making sure that everybody within a given cohort is of a similar skill level. I had that long list of identifiers that might suggest that somebody is too skilled for Academy. You want to make sure that if you do have somebody with advanced skills participating, they're well matched with people who are at least close to their skill level. You probably don't want a brand new beginner with somebody who's regularly deploying Shiny apps, for example.
Survey results and participant feedback
Okay, so let's talk about what people thought of this. We surveyed our first cohort, where we had about 19 people participate, to see what their experience was like. I have two quotes that I pulled out from the survey here. The first one is talking about the fantastic user experience they had, really enjoying the bite-sized learning messages that they got from the lessons themselves. And the second one, which was particularly encouraging to us, was hearing how highly they recommended this program for other researchers within our company to go and apply to their own work as well.
This also gives some hopefully helpful markers in terms of the time commitment that you should expect. The guidance that we gave in advance was to spend about 30 minutes a day on the curriculum, not including the group sessions, so maybe adding an additional hour to that. You can see from the learners that we surveyed that most people were spending between three and ten hours per week on Academy. A few learners were spending more than ten hours, and a few maybe were a bit too advanced, spending less than three hours per week on the content.
We also surveyed them about whether they thought it was a good use of their time, if they recommended it to others, if we should invest more money into this, and the criticality of the mentors to this. Now, these might be a little bit difficult to read, but pretty much everybody said that they either agreed or strongly agreed, with a few being neutral. But overall, what we saw was quite positive feedback.
Maybe the most important metric from this was, are they still using the content that they've learned? Surveying them six months after they left Academy, we found that 16 out of the 17 survey respondents were still writing code at least once per month, or more frequently. The one person out of the 17 who isn't let me know that she took a new role that was more or less a data science manager, and so she wasn't writing code on a regular basis, but was still applying what she learned every single day, and in fact, more so than when she first started the program.
We were blown away by this feedback, and it really encouraged us as we started to shape out future cohorts. I will admit that we probably had a bias in the amount of enthusiasm of our initial participants, so I don't think we'll have quite the same level of positive feedback as we survey future cohorts, but I do anticipate that we're going to continue to see really positive outcomes from our participants and Academy overall.
After Academy: next steps and community building
One point that we're still working on today is, what do we do with people after they graduate from Academy? Academy, at least for our curriculum, and it will be different for yours for sure, but we didn't focus too much on reproducibility and deployment, but sharing your analyses, as I mentioned for the criterion for our workbench, the ability to share what you're creating, share your insights, that's really where all the value resides.
So, looking at some future developments for our own internal efforts after Academy, we are wanting to find ways to offer more advanced lessons. I will give a bit of a challenge here, which is creating some of the more advanced lessons is a bit of a challenge in that we can't look to RStudio for as much help here, because most of the specifics are actually something that is related to us as a company, not necessarily to a generic data science concept.
We also have a focus on transferring people from the Academy environment onto our local workbench, where they can go and apply that to our most sensitive or internal data. There certainly was some work going on alongside the participation in Academy, where people were using our on-premises capabilities, but smoothing that transition is something that we could work on in the future.
And then the last piece is focused on community building. Now that we have more and more people that are able to participate in Academy, we want to have a landing zone for them to come and continue to learn from others. We have heard a number of times that people miss Academy, they wish that it could have gone on longer, and that community building is a critical piece to ensure that people have that same positive spirit that they can apply their data science skills continuously.
A success story: from Academy to Shiny app
I want to close here by telling a little bit of a story that came out of Academy, which is one of our success stories of what an Academy participant was able to create with a little bit of additional effort. She got particularly excited about the capabilities that she was building, and this is an Academy graduate who did not have any experience with R prior to Academy, so this shows what she was able to go and create.
In particular, the need here was that there was a piece of equipment that was used for analyzing some of our products, in particular a polymer analysis tool, where users could come up and use this instrumentation and collect their own data. A benefit of this is that they don't have to rely on subject matter experts to go collect this, but there was a gap in that people were unable to go and analyze their data without advanced training.
As a solution to this, a Shiny app was built that allowed the user to come and analyze their data using subject matter expertise that was not encoded in one-on-one mentoring. Instead, it was encoded in code, where you could go and apply that code to analyze the data and extract those valuable insights. The app itself had a number of different components, such as selecting and plotting data, analyzing that data, automatic report generation, viewing of historical data, and then an ability to go and inspect calibration curves to ensure that the results were, in fact, of value.
Now, I will say that this was not solely the Academy participant's effort, but she is a very valuable and active contributor to this capability and is currently becoming the predominant maintainer of this codebase. So, Academy empowered her to be able to continue to learn, tackle new code, tackle new packages, and be able to apply that to a tool that is creating an improved employee experience both for the developers of the tool themselves, who don't have to spend as much time coaching people on data analysis, and also for the end users of the Shiny app, who have a much more streamlined approach and are no longer reliant on others to extract insights from their data.
The last thing I want to say here is a huge acknowledgement to many of the people that made this successful Academy partnership possible. I want to give a particular shout out to Tony Sokolov. I saw him in the chat here, so I know that he's watching along with me. A huge thanks to both Dow and RStudio mentors for all the work that they put in to make this program a success. And I want to thank all of you for giving me the opportunity to speak about this today, and I'd be very happy to take any questions that people have about our own Academy experience or about the questions you might have for your own. So, thank you very much.
Q&A
Thank you so much, James. That was great. We're all clapping, even though you can't hear us right now. I see there's quite a few questions coming in right now, so let me organize myself here. But just a reminder to everybody that you can ask questions on YouTube Live, if you're watching there, or on LinkedIn, if you're watching there. And if you want to ask questions anonymously, you can do so as well. I just put the link there on the screen.
But I see Laura asked a great question over on Slido, and it was, would be fun to learn. How do you handle those that know coding would be useful to them, but can't motivate to take five steps back to take 10 steps forward?
Yeah, that's probably our toughest use case. But typically what we'll do in that case, because we have a somewhat large base of researchers, we typically will pair them with somebody who might have that capability, where you could accelerate their individual project work. So you pair somebody like an RStudio Academy graduate, and have them work side by side on potentially going and doing the actual project itself. But if it's the situation where they predominantly want to learn these capabilities themselves, if it's somebody who really just wants to learn that, but doesn't have a use case right away, our recommendation in that case is to have them go and take advantage of the free online resources that all of us, I'm sure, are familiar with.
So I think it's probably an unsatisfactory answer. I know that I would probably be unsatisfied if I got that from my survey response that I filled out. But it's really a question of, again, that intrinsic motivation. If they're not excited to learn the content, we think that there might be other opportunities for them to go and learn that instead. I guess one thing I will add as well is, if it's a time issue, we did spend a good bit of effort to convince leadership, our internal leadership, of the value of these sorts of skill sets. So having a base of projects that have already created value from this is a really good argument, I think, for dedicating the time.
Penny asked a question, how can more advanced users participate in developing curriculums? Yeah, that's a great question. What we have done is we, a group of us, it was predominantly Tony, I should say, was heavily involved in developing the curriculum for our learners. So it was a curriculum that was based off a real data set that was sanitized and anonymized in a way that would allow us to release it externally. So what I would look for, for the more advanced users, is to identify use cases where you have a good data set that might be representative of what other people could see.
I also would encourage anybody who has the skill set, who is already familiar with Shiny app development, to go and try to create some modules of your own. The caveat I would give there is that there has been tremendous thought put into developing the Academy curriculum overall. Don't let your leadership convince you that you don't need this, that you can just do it yourself. It's a fantastic curriculum and I would encourage you to try to take advantage of that rather than try to roll your own. For the more advanced content, maybe take advantage of some publishing capabilities like maybe an R Markdown-based website or maybe a Quarto-based website if you have some early adopters out there, where you can go and create some interactive or at least some simple websites that you could publish for others to go and take advantage of.
Another question over from YouTube is, does Academy also teach good coding style and version control? The actual curriculum, like the particular curriculum that you'll see, will be based off of your own desires. So if you have a strong desire to include Git-based version control, that is certainly something that can happen. It was not covered in ours in favor of other topics. There's a lot to cover in 12 weeks. We wish we had more. The coding style will come out of how all of the tutorials are written, following, I assume, the tidyverse style guide. That's just how I learned, so I don't know anything else from that. But you can select amongst the list of topics whatever is important for the participants in the individual session.
Someone had asked on Slido anonymously, if you're trying to sell Academy to the individual who would fill the group mentor role, what level of commitment and bandwidth do they need to have? So I'm actively trying to sell that role as we're shaping future cohorts. The most significant piece there is that the skill set you need to have is a familiarity with how data science is done at your organization. You don't have to, for example, be an expert in R. We have some experts in Python who have already participated as mentors. And in terms of effort, you need to have, I would say, about three hours a week that you could dedicate. The first hour would be attending the group sessions. The second hour would be attending an open office hours period, which is somewhat optional, but I would encourage a mentor to attend. And the third is being able to answer questions sort of on an ad hoc basis.
Now, I would say that three hours a week is probably the floor. I would encourage a mentor to dedicate at least a half a day, if not more. And what likely will come out of this is some project work where you can work one-on-one with some of the participants to translate what they're learning. And you'll see opportunities where you could streamline some of their code. If they haven't really gotten deep into dplyr yet, and you can see a way to have a streamlined pipeline, that's something that would take a little bit more effort, but I think it would be worth it to accelerate some of the learning.
Someone just asked, is the type of data you work with relevant to the work you do at Dow, or is it random data sets regardless of which company you're with? For our use case, it was literally data from a project. It was not a successful project, but it was a real project. So it was as real as we can get in terms of the data. We did simplify it a small bit to not intimidate people too much from some of the volume or maybe the messiness of the data, but it's as real as we could get and still let sort of people outside of the company see it as well.
So somebody else asked, what other ways of teaching R have you tried or considered at Dow, and how does Academy compare? I can speak from my own experience about what is not the right way to do it. I tried to organize a local section within my technology area, having meetings every couple of weeks where we would cover maybe a chapter or two of R for Data Science, a book that I imagine many of you are familiar with. What I found was that people were able to quite easily complete the challenges throughout those chapters. What was missing there was the actual community building and the practice. So people were able to hear me talk about why I got so excited about the capabilities, but not as much about how they could go and translate that to their own work. It was more that they were understanding the tools that I was building rather than doing it in practice.
The other aspects that we've looked into are some of the open, I'm forgetting the name of it, but the online courses that you could go and take. I myself learned from a Coursera course, but that was, I don't know, six years? No, not quite. Five years ago, I think, at this point. I really enjoyed that, but you'll see that there's a significant pool of people who get so excited about the programming that they're enough of self-learners that they can go and do that. So it's sort of maybe an unsatisfactory answer there, but we found that people who can learn this stuff on their own, for the most part, already have. That's where Academy fits in: addressing the people who want to learn it, but have not been able to find the bandwidth to go and learn it on their own.
There's another question coming over from YouTube, and it is, what is the duration for the training in RStudio Academy? So for ours, it's 10 weeks of content. There is a break week built in. It was impossible for us, and I think for everybody, to find a continuous set of weeks where nobody's going to go on vacation. So you have a catch-up week, and then there was also a week where we had somebody from RStudio come in and share some of the capabilities of the RStudio Connect product. This is something that I mentioned at the beginning of the talk that we already have internally, and it was kind of a demo of the capabilities of that, where we actually put some additional content to take people from seeing it presented from the RStudio perspective to how you would actually do that internally.
Let's see. One is, can you have multiple groups go through at the same time or just one cohort, and what if we wanted to upskill hundreds of people? We have had three cohorts and four cohorts go at the same time, and we are currently wondering how we can go and scale it. I will say that the limitation on scaling it to hundreds of people is more on us. We want to make sure that if we had 100 people or 200 people do it at the same time, the Academy participants would have just as good of a user experience as the 19 people who did it the first time.
I'm questioning if we have the bandwidth internally to make sure that we're giving everybody the attention that they would need. In particular, that means carrying them from seeing something that they want to go do to actually doing it in practice. You're naturally going to feel frustrated and get very used to seeing errors in the R console whenever you're learning a new programming language, but we don't want that to be coupled with frustrations in how we actually go do that for something like, okay, here's a data source that I want to go connect to. Here's a transactional database. That sort of work is something where you would need dedicated additional mentors outside of the Academy mentors to make sure that people are able to go and do that work.
So we've started somewhat slow intentionally to make sure that we get this right. We found that there's a broad level of upskilling to that. I was talking about the citizen data science capabilities, maybe a baseline level of data science competency. That can be scalable. That can go to hundreds of people. The in-depth training, going into writing code day in and day out, that's something that at least for us, we've started out going slowly and we're, as I mentioned, looking at how we could go and scale that to many more people.
A question that I had is how did you find out who might be interested and get the word out to everybody? Initially, alongside this citizen data science program, we also have a large program related to upgrading our digital infrastructure. So we had a series of internal focal points that we contacted to see who they thought would be good candidates. Those candidates were asked to fill out a survey. We asked them questions like how interested are you in learning R on a five-star scale? What projects would you like to apply this to? What have you learned in the past? What programming languages do you know? What is your level of R already?
I will say that even the people that I know for a fact are quite advanced R practitioners would not call themselves advanced in the survey. So whatever people say, I sort of bump it up one. So if somebody says they're a beginner, I kind of slot them in as intermediate. But those are the sorts of questions that we asked via the survey. We then got together at the end of it to select the cohorts. And that's where the advice came from on how to shape the cohorts in the end. And we're still learning exactly what an ideal cohort looks like.
Someone had asked, this seems like a good way to set your company apart from others, advertising that you help learners upskill in coding. Are you hiring? We definitely are hiring. I am not very involved in that. But I can, I see that Tony has been answering some questions here. So Tony might be able to go and grab some links related to that. We're particularly interested in people who have that scientific expertise that they want to combine with data science. And it's a, we are all way busier than we would like to be. So we would love to hire lots of people to come help us do this work.
One other question I see over here is, after RStudio Academy's 10-week training, is Academy still available for questions, support, or consult? Yeah. So the Academy content is up for at least three months. I don't know how long it actually is, but it's, the content itself is still available. I don't know if we have any promises about the mentors to hang around, but I will tell you that they've still been answering questions.
So we have, we added them to, we use Teams as our internal messaging service, and I still see them popping up. What has helped with this is this longer-term, ongoing relationship with RStudio, where we feel perfectly comfortable bothering some folks with questions when we have them, leveraging support as well for some of these capabilities. So I don't know what is available to everybody, but I know that we have not struggled to get answers to questions when people have them.
I will say that I mentioned that community building piece. We're, one part that we've struggled with is actually getting people away from just the Academy community to the larger data science community, and we're trying, one action we've taken for that is having bi-weekly office hours where anybody, it doesn't have to be an Academy participant, but anybody can show up with questions, cool stuff they want to show off, talk about anything that's happening within the R or Python communities. So that's really been, it's growing slowly, but it is growing. That's a piece where I would recommend, rather than relying on the RStudio folks for questions, is to focus predominantly on the community building.
