How to collaborate effectively with other data scientists (version control, project sharing, etc.)
Transcript
This transcript was generated automatically and may contain errors.
Hello everybody, my name is Ryan Johnson, and I'm a data science advisor here at Posit. Welcome to this month's Enterprise Community Meetup, where we'll discuss another end-to-end data science workflow using Posit Team. Just as a reminder, this is a recurring event that occurs on the last Wednesday of every month, and we hope it'll serve as a good primer for those that are brand new to Posit Team, but also help current Posit Team users become more familiar with our tools.
Now with the holiday season upon us, there's no better way to express your holiday spirit than by sharing the gift of a beautifully crafted, organized data science workflow. So with that in mind, and during today's demo, we're going to discuss tools and techniques within Posit's toolchain specifically designed for sharing and collaborating with others.
Overview of Posit Team
So to kick things off, I first want to make sure everyone here has a solid high-level understanding of Posit Team. So Posit Team, it's a bundled offering of our three professional tools. We have Posit Workbench, Posit Connect, and Posit Package Manager. And if we start right up here at the top with your data scientists, they're going to be creating insights, so writing code within Posit Workbench, and they can choose whichever IDE they want. It can be RStudio, JupyterLab, Jupyter Notebook, or VS Code.
And they can also program in whatever language they want. So for your R developers, maybe they're creating things like Shiny applications, or R Markdown, or Quarto documents. And for your Python developers, maybe they're creating Jupyter Notebooks, or FastAPI, Streamlit, Shiny, Vetiver, and the list goes on. But it doesn't do your developers any good if they can't share this content with the people that need to see it. And that's the whole purpose of Posit Connect.
This is our professional publishing platform, so it gives a home for all this content your data scientists are creating, so that they can easily share them with the people that need to see them. And that could be, you know, coworkers, it could be decision makers at your company, or maybe you just want to share it with friends and family. And then our last tool is Posit Package Manager, which does exactly as its name implies. It helps control all those amazing open source R and Python packages your team uses. It also gives a home to any internally developed R and Python packages. We also just released some really cool new features within Package Manager, so you can actually control access over which packages your team can use.
Why Posit cares about open source data science
Now, before we get into the actual demo, I want to take a moment to talk about why Posit cares so much about open source data science. The list I'm going to present is not exhaustive, but I did want to highlight a few reasons why Posit is so invested in open source data science, especially in the context of sharing and collaborating.
First, open source data science is free, meaning if you want to share your workflow with someone else, there is no requirement to purchase any software. All the components needed to run the workflow are completely free of charge. Second, open source data science promotes transparency. When you share a workflow with someone, they will know everything, and I mean everything, that went into your analyses because they have access to your source code. There are no proprietary or black-box analyses hidden from your collaborators.
And finally, open source data science allows teams to easily collaborate with specific people, but also with the greater data science community. I'm sure many folks listening to this webinar right now have likely had a data science question and posted it to a public forum and hopefully got a response from the community. This is the type of data science learning and support that Posit wants to encourage.
Collaboration options in Posit's tools
So what options exist today to collaborate and share content using Posit's tools? Well, the first way is to share your source code, and that's where version control comes into play. So many of the tools that we create here at Posit, including the various IDEs included in Posit Workbench, they have integrations to make working with version control, including Git, a breeze. Version control and repository hosting services like GitHub, or maybe your team uses Bitbucket, are some of the most important tools for data science collaboration.
We also have some cool features built into Posit Team that can assist with data science collaboration, including project sharing and Git-backed deployment. Project sharing is a feature in Posit Workbench that allows users to work together on RStudio projects. You can even do live code editing with this feature, which we'll demonstrate a little bit later on. Finally, we'll finish our workflow today by demonstrating Git-backed deployment within Posit Connect, which can be used to directly deploy content to Posit Connect from a Git repository.
So to go over today's workflow, we are first going to start by creating a new repository on GitHub, which will be the home base, so to speak, for our collaborative data science project. We'll then pull the repository into Posit Workbench as an RStudio project. For this simple workflow, we're going to be creating a Shiny application in R, and we'll show you how more than one user can collaborate on the same piece of code using version control as well as project sharing. And I'll be showing version control features in both RStudio and Visual Studio Code running within Posit Workbench. Finally, we'll show you how to publish the collaborative Shiny application hosted on GitHub to Posit Connect using Git-backed deployment.
Creating a GitHub repository
So to kick off this workflow, here I am on my GitHub homepage, and I'm going to start a brand new repository. So in the top right corner, you'll see this green New button. Go ahead and click on this. I'm going to choose myself as the owner, and I'm just going to give this repository a random name. I'll just call it test-repo-123. For the description, I'll just say this is a test repo. We'll make this public, and I'll go ahead and add a README. And then I'll hit Create repository.
So here's the current state of the repository. It's pretty bare bones. The only thing you'll see inside of it is that README file, and that's pretty much it. But what I want to do next is take this repository I just created and pull it into Posit Workbench as an RStudio project. And to do that, the first thing I need to do is just grab the URL for this GitHub repository.
Setting up an RStudio project from GitHub
So let me navigate over to Posit Workbench. And if this is the first time you're seeing Posit Workbench, this is the home screen when you first log in. I'm signed in to this evaluation instance as Publisher 1. And I'm just going to open up a brand new session by clicking right here. You can see the four separate IDEs that Posit Workbench currently supports. And for right now, we're just going to live within RStudio Pro. And I'll go ahead and kick off the session.
So here we are within the RStudio IDE. On the left-hand side, we have our console. In the top right corner, we have our environment pane. And in the bottom right corner, you can see our file directory. So I'm going to go ahead and pull in a GitHub repository as an RStudio project. And to do that, there's actually a few different ways you can create a project in RStudio. But my preference is to go in the top right corner. You can see Project None. I'm going to click on this and select New Project.
And when you click on that option, you'll get this pop-up menu to select how you want to create your project. You can create a new directory, which is like creating a new folder on your desktop or your computer, for example. You can also create a project in an existing directory if that folder already exists on your computer. Or you can pull in a project from Version Control. And this is the option I want to use. Since we're using Git and GitHub, I'm going to select Git. And then I'll paste in the URL for my repository. We'll just leave the name as test-repo-123. And I'm going to place the project in my home directory, which is represented by this tilde. And then I'm going to use R version 4.3.2 for this analysis. A cool feature of Posit Workbench is that you can actually choose from different R versions, and you can have as many R and Python versions as you want installed on your server. And with that, we'll hit Create Project.
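For readers who prefer the terminal, the Version Control > Git option in the New Project dialog corresponds roughly to a `git clone`. The following is a minimal sketch, not the exact commands RStudio runs. It uses a throwaway local bare repository as a stand-in for the GitHub URL so that it runs without network access; the temporary paths and the placeholder identity are assumptions for illustration only.

```shell
# Stand-in for the GitHub repository: a local bare repo (hypothetical path).
# With a real GitHub repository you would pass its URL to `git clone` instead.
work="$(mktemp -d)"
git init --quiet --bare "$work/remote.git"
git -C "$work/remote.git" symbolic-ref HEAD refs/heads/main

# Seed the stand-in with a README on main, like GitHub's "Add a README" option.
git init --quiet "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/main
echo "# test-repo-123" > "$work/seed/README.md"
git -C "$work/seed" add README.md
git -C "$work/seed" -c user.name="Publisher 1" \
    -c user.email="publisher1@example.com" commit --quiet -m "Initial commit"
git -C "$work/seed" push --quiet "$work/remote.git" main

# The step the dialog performs: clone the repository into a project directory.
git clone --quiet "$work/remote.git" "$work/test-repo-123"
ls "$work/test-repo-123"
```

RStudio additionally creates the `.Rproj` file and a `.gitignore` when it opens the cloned directory as a project; a plain `git clone` does not.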
Every time you create a new project, it will always open up a new RStudio session. So here we are. And if you look in the top right corner, you can see now we're in the test-repo-123 project.
Creating the Shiny application
So now that we have this project created, we first want to create something, because right now it's just a bare-bones project. And so I'm going to create a Shiny application. Now, there's actually a template that comes built into the RStudio IDE. So if you click in the top left corner up here, you'll see Shiny Web App. I'm going to use this as kind of a scaffold, and then bring in a custom application that our team has created. So I'm going to call this test-app. This is going to be a single file, app.R. And I'm going to place it within our project's directory, which again we called test-repo-123. And I'll hit Create.
So you'll notice down here in the file directory, and I can make my screen a little bit bigger here, you can see test-app. That's the directory we just created. And inside of it is this single file, the app.R file, which is what you're seeing now in the top left corner of the RStudio IDE. This is your source pane. Now if you're not familiar with Shiny, Shiny is just a way for you to create interactive web applications using R code. You can also do it in Python.
Now I mentioned before that this is like a placeholder application that you can run by clicking this little Run Application button. And you can see here's an example Shiny application. But let's go ahead. I'm going to pop in some code for a little bit more of an interesting Shiny application. So I'm going to go ahead and delete all this placeholder information. And I'm going to copy in some code over here on my screen. And I'm going to paste it in.
All right, so this is a slightly longer application, but it's still not too long. It's less than 100 lines of code. But it's going to incorporate some really cool new features of Shiny that we're really excited about, including this bslib package, which is really great for creating really nice-looking Shiny applications and dashboards. So let's go ahead and run this application and explore it a little bit.
So you can see right here I'm actually missing a package. This is a pretty cool little tangent we can go on here. I need to install this package for the application to work. So I'm going to go ahead and hit this Install button. We're going to get this Jobs tab that opens up. And it's going to go really fast, because I'm using Posit Package Manager to install this package as a pre-built binary. So it's only going to take a couple of seconds. And boom, it's done. So now we can try running the application.
All right, so here's this Shiny application, again, using bslib. And what it's looking at is a dataset called Palmer Penguins. Some of you may have heard of this dataset before. It's a really popular dataset for doing exploratory data analysis. But it's actually real data, which is why I like it so much. And you can see we get some nice value cards along the top that summarize some information about these various penguins. And then you can see some charts in here, like the bill length, bill depth, and body mass, separated by either species, island, or whether it's a male or female penguin. And just some cool features of bslib: we can click on these individual cards and expand them so you can see the plot in more detail. So this is going to be the Shiny application that we're going to work on today.
Project sharing in Posit Workbench
All right, so the first feature that I want to go over within Posit's tools for sharing and collaborating is going to be project sharing. Let's say, for example, that you want to call up your coworker or someone on the other side of the world, and you want to have them help you out when writing this Shiny application. Now, if anyone here is familiar with, like, Google Docs, for example, and you know when two people are working on the same Google Doc at the exact same time, you can actually see their cursor and you can see them typing live. And that's a lot of the same functionality that you'll get with project sharing within Posit Workbench and these RStudio projects.
So I'm going to go ahead and share this project, this test-repo-123 project, with another user on this instance of Posit Workbench. So I'm first going to click the name of the project in the top right corner, and you're going to see this option, Share Project. In this pop-up window, we can choose who we want to share this project with. I mentioned before that on this instance, I'm currently logged in as Publisher 1. Let's go ahead and share it with Publisher 2. So I'll start typing out Publisher 2. And this is typically going to be tied into your team's authentication. So I'll select Publisher 2 and select Add. And there we go. We've now shared this project with Publisher 2.
So now we kind of have to use our imagination a little bit, and I'm going to switch roles into Publisher 2. I'm going to bring on another screen here, and I'm going to use the dark theme of Posit Workbench to show that I'm logged in here as Publisher 2. And here on the home screen of Posit Workbench, you can see over here on the right-hand side, there's test-repo-123, and you'll see the little folder icon with the arrow inside of it. That indicates a project that has been shared with me as Publisher 2. And I can click right here to jump right into that project.
All right, so here I am within that project, test-repo-123. And the first thing you'll notice up here at the top is this little P. That stands for Publisher 1. So if I want to follow Publisher 1's cursor, all I need to do is click on this box right here. And you can see it opens up the window right here with the Shiny application that we were working on previously. And then if I switch back over to Publisher 1, you can see that little P up here. That stands for Publisher 2. And let's go ahead and follow their cursor as well.
So now I'm going to try my best to demonstrate this functionality. So here in the background, we have Publisher 1, and in the foreground, we have Publisher 2. And if I start typing here, you'll notice, I'll just make some comments here, that this edit was made by Publisher 2. And then if I wanted to, conversely, I can come back here, make another edit as Publisher 1, and you can see how that's been reflected right here in the other window. So again, this is a great option for live collaboration. So if there's a piece of source code that you want to work on at the exact same time with somebody else, then you should really check out project sharing from within Posit Workbench.
Version control with Git and GitHub
Now, while project sharing is an extremely powerful tool, we'd certainly advocate for the use of version control and repository hosting services like GitHub, Bitbucket, or GitLab for sharing your data science projects. So what I'm going to do now within this session, again logged in as Publisher 1, is take all this source code that we've been working on, including the Shiny application, save it here locally within Posit Workbench, and show you how we can push these changes up to that GitHub repository that we previously connected to.
So the first thing I'm going to do is save this Shiny application. You can see here the file name is in red text with a little asterisk. That lets me know that it's not currently saved. So I'm going to go ahead and hit the floppy disk symbol right here. It's going to save my file. And now to show some of the Git integrations, if you look over here on the top right-hand side, there's a tab called Git. And this is going to show you all the Git integrations within the RStudio IDE.
And currently we see a lot of question marks. The first file is this .gitignore file. This was actually added by RStudio when we created this project, as was this test-repo-123.Rproj file, which is always created in the top-level directory of an RStudio project. And then we also have this test-app directory. So if I come back to my project's directory, which is right here, inside of it is going to be that Shiny application. Now, we see all these question marks because Git sees these files but doesn't really know what to do with them. Should it ignore them? Should it track them? We want to have Git track these files so that we can push them up to GitHub.
So to kick off that process, the first thing we're going to do is stage all these files. And I like to think of it just like staging for a play. So while the curtains are still closed, you want to make sure everything's set up in the right place, it's staged and ready to go for when those curtains open. So let's go ahead and stage these three objects. So I'll click the bottom one, the middle one, and the top one right here.
So once we have these all staged, we're going to go ahead and make a local commit to version control. And if you see right here, this toolbar, it provides a lot of helpful operations for Git right here within the RStudio IDE. So you don't necessarily have to use the terminal or the command line. So I'm going to select this commit box. And it's going to open up a window here, which I'm going to adjust. And let's just go ahead and explore this for a second. So over here in the top left corner, this is the same window that we were seeing within the RStudio IDE, the three files. And now since they're staged, you can see this little A. That corresponds to being added. So these have been added to version control, they're staged, they're ready to go.
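The question marks and the "A" markers in the Git tab correspond to what `git status --short` prints before and after staging. The sketch below illustrates this in a throwaway repository; the file contents are placeholders, not the files RStudio actually generates.

```shell
# Throwaway repository with the same three untracked items as in the demo
# (hypothetical path and placeholder file contents).
repo="$(mktemp -d)/test-repo-123"
mkdir -p "$repo/test-app"
git init --quiet "$repo"
echo ".Rproj.user" > "$repo/.gitignore"
echo "Version: 1.0" > "$repo/test-repo-123.Rproj"
echo "library(shiny)" > "$repo/test-app/app.R"

# Untracked items are listed with "??", the question marks in the Git tab.
git -C "$repo" status --short

# Stage them, like ticking the checkboxes; they are now listed with "A".
git -C "$repo" add .gitignore test-repo-123.Rproj test-app
git -C "$repo" status --short
```

Staging only records what the next commit will contain; nothing has been committed or sent to GitHub at this point.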
And down here at the bottom, you can see any differences. Now, every single one of these files is completely new to version control, which is why everything's showing up in green. But if you were to make any edits or remove any lines, you might see some red lines in here as well, corresponding to things that were deleted. So if you're happy with all these changes, which we are, we have this new Shiny application that we worked on collaboratively with Publisher 2. Let's go ahead and commit this locally to version control. We have to add a commit message here. Now, this is a bit of an art form because you want to make these commit messages as informative as possible, but also pretty short. So I'm just going to say, create Shiny app. And that's pretty much it. I'll hit Commit.
You'll see this pop-up window, which shows you some of the Git commands that are running in the background, along with their output. So if I close this window, you can now see the Git window over here is empty, but we see this little information flag, letting us know that our branch is ahead of origin/main by one commit. What that is basically telling us is that our local repository is one commit ahead of what's on GitHub. So if we're ready to push these changes up to our GitHub repository, we can again use the RStudio IDE to help us out here by simply clicking this Push button.
Now, you may have to set up your Git and GitHub credentials beforehand. This is something I did before this workshop, just to make sure that my instance of RStudio here can talk to GitHub, that it knows who I am, and that I have the credentials to make these pushes. So now that we've pushed those changes, let me switch back over to my GitHub repository and refresh my page. And so here's the current state of our GitHub repository. We have that README file, which we originally created. Here are the .Rproj and .gitignore files that RStudio automatically added for us. And here's our test-app. Here's that Shiny application we're working on. I can click into it, click on the app.R file, and see all of the code.
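The Commit and Push buttons wrap ordinary Git commands. The sketch below reproduces the "ahead by one commit" state and then clears it with a push. It again uses a local bare repository as a stand-in for GitHub, and the identity, paths, and file contents are placeholders.

```shell
# Stand-in for GitHub: a local bare repository (hypothetical path).
work="$(mktemp -d)"
git init --quiet --bare "$work/remote.git"
git -C "$work/remote.git" symbolic-ref HEAD refs/heads/main
git clone --quiet "$work/remote.git" "$work/test-repo-123" 2>/dev/null
repo="$work/test-repo-123"
git -C "$repo" symbolic-ref HEAD refs/heads/main
git -C "$repo" config user.name "Publisher 1"              # placeholder identity
git -C "$repo" config user.email "publisher1@example.com"

# First commit and push, so that origin/main exists (the README from earlier).
echo "# test-repo-123" > "$repo/README.md"
git -C "$repo" add README.md
git -C "$repo" commit --quiet -m "Initial commit"
git -C "$repo" push --quiet -u origin main

# Commit the app locally, as the Commit button does.
mkdir -p "$repo/test-app"
echo "library(shiny)" > "$repo/test-app/app.R"
git -C "$repo" add test-app
git -C "$repo" commit --quiet -m "Create Shiny app"

# The branch line reports "[ahead 1]" until the commit is pushed.
git -C "$repo" status --short --branch
ahead_before="$(git -C "$repo" rev-list --count origin/main..main)"

# Push, as the Push button does; afterwards the branch is no longer ahead.
git -C "$repo" push --quiet origin main
ahead_after="$(git -C "$repo" rev-list --count origin/main..main)"
echo "ahead before push: $ahead_before, after push: $ahead_after"
```

Against a real GitHub remote, the push step is where credentials are needed; the local stand-in used here requires none.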
Collaborating across IDEs
So now that we have the Shiny application on GitHub, it is primed for collaboration. So let's say, for example, that Publisher 2 wanted to pull in this GitHub repository and make some changes to the Shiny application, which we're going to demonstrate right now. So here's the current state, again, of our GitHub repository. Let me go ahead and re-grab the URL. And then I'm going to drag in Publisher 2.
So I'm going to make some changes to that Shiny application as Publisher 2, because, again, I'm a collaborator. I want to work alongside Publisher 1 to develop the Shiny application. Now, we could do this in RStudio, but let me just go ahead and show you how someone might do it within Visual Studio Code running within Posit Workbench. So I'm going to open up another session, because you can have multiple sessions running within Posit Workbench. And I'm going to select Visual Studio Code and hit Start Session.
All right. So here we are within Visual Studio Code, and it's pretty much a blank environment. If I click on my File Explorer over here, it's letting me know no folder has been opened. And if we look along the left-hand side, you'll notice there's actually a source control icon. So we can click on this. And let's go ahead and clone that repository that I just copied the URL for. So I'll click Clone Repository and paste in the URL. And I'm going to place it right here in my home directory. Let's go ahead and open it up. And now you can see test-repo-123, and here's my test-app with that app.R file being shown right here. And again, this is all running within VS Code.
So let's say, again, as Publisher 2, I want to make some changes. I'm not going to make any dramatic changes here. I'm just going to scroll down to, let's see, line 29. You can see here's the title of this dashboard, Penguins Dashboard. Let's make that a little bit more informative. Let's say Palmer Penguins Dashboard because that's the dataset that we're going to be using. So we're just going to add Palmer here. I'll hit Save. And you'll notice there's this little one icon over here on the source control icon. So let's go ahead and click on that. And you'll notice that there's been a change here within this app.R file.
So let's go through some of the same steps we just did within RStudio. The first thing we need to do is stage this change. And we can do that by clicking this little plus icon. And once it's been staged, we want to add a commit message. And I'm just going to make it nice, short, and sweet. So I'll say change app title. And we'll hit Commit. And once those changes have been committed, again, locally here within VS Code, running on Posit Workbench, we can then push those changes up to GitHub. So again, we can just click on Sync Changes. And that should push everything up to our GitHub repository.
All right, so now that that's done, let me switch back over to my GitHub repository. And I'll refresh my page. Let's just go into the application and make sure that change is there. So we'll scroll down here. And now you can see on line 29, Palmer Penguins Dashboard. Now I'm going to switch back over to Publisher 1 here. Because again, now I have a change in my GitHub repository that's not currently reflected within my local code here as Publisher 1. You can still see the title is the old one, just Penguins Dashboard. So now would be a great time to pull in those changes from that GitHub repository, those changes made by Publisher 2. And to do that, again, within the RStudio IDE, we have nice, easy features to do this. We already pushed before. Let's go ahead and pull. And once we do that, you can see all the output here from running the pull command. And now if we scroll down in our source code here, Palmer has been added to the title. So again, that's how you can work collaboratively with version control using whatever IDEs you want with whomever you want.
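The round trip described here (one collaborator pushes, the other pulls) can be sketched with two clones of the same remote. As before, a local bare repository stands in for GitHub, and the identities, paths, and the one-line `app.R` are placeholders rather than the demo's real files.

```shell
# Stand-in for GitHub: a local bare repository (hypothetical path).
work="$(mktemp -d)"
git init --quiet --bare "$work/remote.git"
git -C "$work/remote.git" symbolic-ref HEAD refs/heads/main

# Publisher 1 creates the app and pushes it.
git clone --quiet "$work/remote.git" "$work/publisher1" 2>/dev/null
p1="$work/publisher1"
git -C "$p1" symbolic-ref HEAD refs/heads/main
git -C "$p1" config user.name "Publisher 1"
git -C "$p1" config user.email "publisher1@example.com"
mkdir -p "$p1/test-app"
echo 'title = "Penguins Dashboard"' > "$p1/test-app/app.R"
git -C "$p1" add test-app
git -C "$p1" commit --quiet -m "Create Shiny app"
git -C "$p1" push --quiet -u origin main

# Publisher 2 clones, edits the title, commits, and pushes.
git clone --quiet "$work/remote.git" "$work/publisher2"
p2="$work/publisher2"
git -C "$p2" config user.name "Publisher 2"
git -C "$p2" config user.email "publisher2@example.com"
echo 'title = "Palmer Penguins Dashboard"' > "$p2/test-app/app.R"
git -C "$p2" commit --quiet -am "Change app title"
git -C "$p2" push --quiet origin main

# Publisher 1 still has the old title until they pull.
git -C "$p1" pull --quiet origin main
cat "$p1/test-app/app.R"
```

Because Publisher 1 made no local commits in the meantime, the pull is a simple fast-forward. If both collaborators had edited the same line, Git would ask for the conflict to be resolved instead.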
Git-backed deployment to Posit Connect
Okay, so we've shown you a few different things now which are helpful for collaborating and sharing data science workflows with others. But now we have this great Shiny application over here on the left-hand side that's been worked on by two separate developers. And ultimately, we want to be able to share this application with whoever needs to see it. And that's where Posit Connect comes into play. So we're going to go ahead and show you how to publish this application to Posit Connect.
Now, here within the RStudio IDE, you can actually use something known as push-button deployment. And we showed this in some of our previous workflows. Here's that little blue icon at the top of your screen which you can easily push. And I could manually publish this to Posit Connect. And this is a great option, especially if you just want to quickly share an application, or any piece of content that can be hosted on Connect, with somebody. However, if I publish this to Connect and then I come back to my source code and make a change, I'm going to have to click the button again and republish. So it's a bit of a manual process.
So what we're going to show you here is something called Git-backed deployment. Instead of publishing it directly from the RStudio IDE, we're actually just going to have Posit Connect look at our GitHub repository. And if it ever detects a change in that GitHub repository, so a commit in that GitHub repository, it can automatically rebuild that content for us.
So to kick off the Git-backed deployment process, we have a few things we need to do to get started. I'm going to go ahead and just clear my console down here in the bottom left. And I want to go into the directory that houses the content that we ultimately want to be published on Connect. So here within that test-app directory, you can see we just have the single app.R file. And I'm actually going to set this as my working directory just to make things a little bit simpler.
Now, when we publish this application to Posit Connect, Posit Connect needs to know some information about my current working environment, such as what packages I'm using, what versions of those packages, and what R version I'm using. That information is needed by Connect so that it can run this application correctly without any issues. Now, for push-button deployment, this little blue button, a lot of that's done automatically. But if we're going to use Git-backed deployment, we need to provide that information ourselves, and it's super easy to do.
So from within the directory of the content you want to publish, all you need to do is run a function from the rsconnect package called writeManifest. That's it. I'll hit Enter. It's going to capture some of my dependencies using another really great package called renv. And then it generates this manifest.json file. I can click into it, and we can just take a quick peek, and it's pretty noisy. There's a lot of great information in here that Connect can easily read. But for the most part, it's just information about packages in my environment, like what versions of those packages I have and where I obtained them from. It's all the information Connect needs to replicate my environment exactly as it is right here within Posit Workbench.
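The same call can also be made from a terminal rather than the R console. The sketch below assumes that `Rscript` is on the `PATH`, that the rsconnect package is installed, and that the app lives in a `test-app` directory relative to where the script runs; if any of those assumptions does not hold, it reports that instead of writing a manifest.

```shell
# Hypothetical app directory, matching the demo's layout.
app_dir="test-app"

if command -v Rscript >/dev/null 2>&1 && [ -f "$app_dir/app.R" ]; then
  # Same function as in the RStudio console, called from the shell instead.
  if Rscript -e "rsconnect::writeManifest(appDir = '$app_dir')"; then
    manifest_status="written"   # manifest.json now sits next to app.R
  else
    manifest_status="failed"    # for example, rsconnect is not installed
  fi
else
  manifest_status="skipped"     # no Rscript on PATH, or no app.R in app_dir
fi
echo "manifest: $manifest_status"
```

Passing `appDir` explicitly avoids having to change the working directory first, which is the step done interactively in the demo.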
So now that we've created this file, we want to make sure that we also include this file within our GitHub repository. So let's go ahead and just show you that workflow one more time. So we have this manifest.json file that's been added within my Git repository or within version control. Let's go ahead and stage it. I'm going to commit it. And we'll just make a quick message here of add manifest file. And then we'll commit that up to GitHub.
So if I come back to my GitHub repository and click on test-app, I need to refresh here. Oh, seems like I've done something wrong here. Oh, yeah, I've got to push it, obviously. So let's go ahead and push that. There we go. So I committed it, and then I had to push it. Now if I refresh my page, we have the app and we have the manifest file right alongside it.
So let's go ahead and take this application on my GitHub repository and publish it directly from this GitHub repository. So I'm going to re-grab the URL for the third time. And I'm going to switch over to Posit Connect here in this demo environment. So this is the home screen of Posit Connect. Just some example content that we have here just to explore all the various pieces of content you can currently host on Posit Connect. But in the very top here, you'll see this publish drop-down, and I'm going to select import from Git. I'm going to paste in the URL and I'll click next.
And it's going to ask me what branch I want to publish from. Now our GitHub repository just has a single branch. It's that main branch. But there's actually a pretty cool workflow here where you can have multiple versions of a single piece of content published on different branches. So maybe you have a production version on the main branch, but maybe you have a development version on some other branch. You can actually publish two pieces of content from the same GitHub repository from different branches. But we just have one branch. That's main. So we'll select next. And then it's going to look for directories where it can find that manifest.json file. So these are deployable directories. And it found one inside that test app directory. And all we have to do is give our application a name. I'll say test app. We'll just keep it nice and simple. And then we'll hit deploy content.
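The production-and-development pattern mentioned here relies on ordinary Git branches. The sketch below shows a `dev` branch carrying a change that `main` does not yet have. The branch name `dev`, the identity, and the file contents are assumptions for illustration; deploying each branch to Connect is done through the import dialog and is not shown.

```shell
# Throwaway repository with one commit on main (hypothetical path and contents).
repo="$(mktemp -d)/test-repo-123"
mkdir -p "$repo/test-app"
git init --quiet "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/main
git -C "$repo" config user.name "Publisher 1"              # placeholder identity
git -C "$repo" config user.email "publisher1@example.com"
echo 'title = "Palmer Penguins Dashboard"' > "$repo/test-app/app.R"
git -C "$repo" add test-app
git -C "$repo" commit --quiet -m "Create Shiny app"

# Create a development branch and commit a change that exists only there.
git -C "$repo" checkout --quiet -b dev
echo 'title = "Palmer Penguins Dashboard (dev)"' > "$repo/test-app/app.R"
git -C "$repo" commit --quiet -am "Try a new title on dev"

# Back on main, the production version is unchanged.
git -C "$repo" checkout --quiet main
git -C "$repo" branch --list
dev_only="$(git -C "$repo" rev-list --count main..dev)"
echo "commits on dev not yet on main: $dev_only"
```

Once both branches are pushed, each could be selected separately in the branch step of the import, giving two pieces of content from one repository.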
So at this point, Posit Connect is reading in that manifest file and just making sure it can replicate my environment. And once it's done, we can click Open Content. And we'll just give it a few seconds to boot up.
So here's that shiny application that we've been working on. And you can actually see the title here reflects that change that Publisher 2 made, Palmer. And it looks and it behaves just like it did when it was running within Posit Workbench.
Now, to demonstrate how powerful Git-backed deployment is, remember that this application is now linked to that GitHub repository. And if anyone I'm collaborating with makes a change to the main branch on that GitHub repository, it'll automatically be reflected here within the deployed content on Posit Connect. So it'll automatically stay updated. So let me actually go ahead and show you that workflow quickly.
I'm going to come back here to my environment as Publisher 1. But anyone can do this, Publisher 2 or anyone else that's working on this GitHub repository and has access to that main branch. Let's go ahead and make another change to the title. So I have Palmer Penguins Dashboard. And I'm just going to add a dash. No, I'm going to add a smiley face. There we go. Kind of silly, but another change. We'll hit Save. Now you can see in the Git tab, we have modified this file. So let's go ahead and stage it. I'm going to commit it locally. And I'll say add smiley face to title. And then we'll commit this locally and then push it up to GitHub.
So now if I switch back over to my GitHub repository, let's just check the application and make sure that change is reflected. So you can see the smiley face right here on line 29. Now I'm going to switch back over to Posit Connect. I'm going to refresh my page here. And you'll notice that the smiley face isn't there yet. That's because, by default, this piece of content only checks about every 15 minutes to see if there's a change to that GitHub repository. So we could just hang out here for 15 minutes and wait for that change to take effect. And this is a timeframe that can be modified by your system administrators. But we can actually just manually kick off that check. So if I go to my Info tab here and scroll down, I can have it check for updates periodically for that GitHub repository, or I can have it check manually by clicking Update Now. So let's go ahead and click this button.
So it says changes were found in the Git repository, so it automatically rebuilds it. Hit OK. And now you can see the smiley face reflected here in our final Shiny application. So again, Git-backed deployment is a fantastic way of collaborating with others and also having a piece of content hosted on Connect that's automatically updated anytime someone makes a change to that branch on the GitHub repository.
And that wraps up our last Posit Team workflow of 2023. And I really hope you all found today's session helpful for collaborative data science. Now, usually for these sessions, we'd have a live question and answer right afterwards. But just given the time of the year and the fact that a lot of people are out of the office, we instead wanted to provide you with another way to ask questions. So on the screen here will be a link to a forum on Posit's community page, where we'd welcome any questions about today's content. Finally, as a reminder, all of these Posit Team demos are recorded and publicly available, and you can view them on YouTube. We hope you have a great holiday season, and we look forward to seeing you again for our next workflow in 2024.

