Resources

Open Source Environmental Monitoring with Shiny! | Wayne Jones, Shell

video
Oct 18, 2022
1:01:10

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

Welcome back to the RStudio Enterprise Community Meetup. Hope everyone's having a great start to the week. If you've just joined, feel free to say hi through the chat window, and maybe share where you're calling in from. Because we're focusing on Shiny today, maybe also share your favourite Shiny app that you've ever created, or a public app that you've seen. I always love to see new examples. A special welcome if this is your first time joining us today.

This is a friendly and open meetup environment for teams to share different use cases, teach lessons learned, meet each other, and ask questions too. These meetups happen every Tuesday at 12 Eastern time. Together we're all dedicated to making this space inclusive and open for everybody, no matter your experience, industry, or background. You can also ask questions anonymously through the Slido link that I'll put up on the screen right now. We'll try to answer as many questions as possible at the end of the presentation.

Today we have the pleasure of learning about open source environmental monitoring done with Shiny. And with that, I would love to bring my friend here up on stage, Wayne Jones, who will be presenting today. Thanks, Wayne. Thank you very much. Can you see the screen? Everything looking okay? Yep, looks good. Well, thanks a lot for the opportunity to talk to you today about my experiences with open source monitoring with Shiny.

Introducing GWSDAT

I'm going to talk principally today about an application that we've built called GWSDAT, which stands for GroundWater Spatiotemporal Data Analysis Tool. A little bit about myself: I'm Wayne Jones, a principal data scientist in Shell Global Solutions, and I've been working in Shell for 15 years now. I would say that this is the project I've been working on for the longest, almost since I've been at Shell.

A little bit about soil and groundwater activities at Shell and the wider industry. Shell operates globally across different countries with different types of assets. We manage a large amount of land, and we have some cases of groundwater risk and remediation. For example, if a petrol tank at a fuel station leaks, we have to remediate the situation, and we do that working with the environmental regulators. We also monitor proactively: at our larger sites, typically refineries and terminals, we always monitor the groundwater.

We look for any contamination. The way we operate in Shell is that we work with third-party environmental engineering companies. They typically go out, do the analysis, give us reports, and work with the environmental regulators to remediate the situation as and when needed. And we've got a long history of research and development with academia and joint industry projects.

So, first and foremost, what is GWSDAT? Well, GWSDAT is a collaborative project between Shell and the University of Glasgow. It's a user-friendly, open-source decision support tool for the analysis of groundwater monitoring data. Underlying it is the R statistical programming language, and we use a combination of R packages, obviously including Shiny, together with some custom-developed algorithms built at the University of Glasgow as well. It's been around for a long time; I can't remember exactly, to be honest with you, but I think it was around 2010 when we released the first version.

From Excel and tcl/tk to Shiny

So the way GWSDAT used to work before Shiny was effectively a combination of an Excel add-in, which contained various functions including data input templates, and a graphical interface using the rpanel package, which is basically a wrapper for tcl/tk; you can see a snapshot of the GUI here in the bottom right. tcl/tk has very strange programming paradigms and is very, very difficult to program in, and the rpanel package basically wraps up some of these, dare I say, obscure tcl/tk functions, which makes it a bit less painful, but the long and short of it is that it's still very, very hard.

And so when Shiny became popular, it was actually the University of Glasgow that approached Shell and said, look, we think this would strategically be a really good idea, and what's more, they were willing to go above and beyond the call of duty and actually do it themselves. By that point Git had come about as well, which shows how long this has been around for, and I've put the GitHub site there where we host the GWSDAT package. It's open and visible, and you can view all the code from there.

What GWSDAT is designed to do

Before I give a demonstration of the tool in action, I think it's worth pointing out what it's actually designed to do. It's there for the analysis of trends in groundwater monitoring. How that typically works is you get a site and you construct wells into the underlying water table, you take samples of the water, send them to the lab for analysis, and they come back with the concentrations of the various different solutes: xylene, toluene, total hydrocarbons, and so on. The job of GWSDAT is to help analysts interpret trends. Is it going down? Is it going up? Where is this concentration spreading to?

And to do that, we use smoothing statistics, and I'll give more examples of this during the demonstration. We use smoothing in one-dimensional time series, for example local linear regression, and we also use spatial smoothing. Uniquely in the industry, we use a spatiotemporal smoother. The way the industry typically works is you get a concentration contour for one round of sampling, another concentration contour fitted independently to the second round you collect, and so on and so on. In the old traditional way of doing things you would have, say, three separate spatial kriging models, but GWSDAT does it differently: it effectively fits everything in one go, and there are various benefits of doing that.
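
The one-dimensional smoother mentioned here, local linear regression, is easy to sketch: at each evaluation point you fit a weighted straight line, with weights decaying away from that point. The following is a toy Python illustration of the general technique, not GWSDAT's actual R implementation; the Gaussian kernel and the bandwidth `h` are my assumptions:

```python
import math

def local_linear(xs, ys, x0, h):
    """Kernel-weighted linear fit at x0: solve weighted least squares for
    intercept a and slope b, then return the local fit a + b*x0."""
    # Gaussian kernel weights centred at the evaluation point x0
    w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, xs))
    swy = sum(wi * y for wi, y in zip(w, ys))
    swxx = sum(wi * x * x for wi, x in zip(w, xs))
    swxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    # Normal equations of weighted simple linear regression
    denom = sw * swxx - swx * swx
    b = (sw * swxy - swx * swy) / denom
    a = (swy - b * swx) / sw
    return a + b * x0

# Smooth a short (made-up) concentration series at each observed time
xs = [0, 1, 2, 3, 4, 5]
ys = [0.1, 2.0, 4.2, 5.9, 8.1, 9.8]
trend = [local_linear(xs, ys, x0, h=1.5) for x0 in xs]
```

Unlike a global linear fit, the trend estimate can bend: shrinking `h` makes it follow the data more closely, growing `h` pushes it towards a single straight line.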

It's worth pointing out that the underlying methodology for the spatiotemporal statistics, developed with the help of the University of Glasgow, is a penalised-spline-based methodology, and we went this way more than anything because of the speed of computation. There are other options, like Gaussian process modelling or three-dimensional kriging, but we found them to take a long time to run.
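
To show why penalised approaches are computationally cheap, here is the simplest discrete relative of a penalised spline, a Whittaker-style smoother: minimise the squared fit to the data plus a penalty on second differences, which reduces to a single linear solve. This is a toy Python sketch of that general idea only, not the spatiotemporal model developed with Glasgow:

```python
def second_diff_penalty(n):
    """Build D'D for the (n-2) x n second-difference matrix D."""
    D = [[0.0] * n for _ in range(n - 2)]
    for i in range(n - 2):
        D[i][i], D[i][i + 1], D[i][i + 2] = 1.0, -2.0, 1.0
    return [[sum(D[k][i] * D[k][j] for k in range(n - 2)) for j in range(n)]
            for i in range(n)]

def solve(A, b):
    """Gaussian elimination with partial pivoting (fine for small systems)."""
    n = len(b)
    A = [row[:] for row in A]
    b = b[:]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (b[r] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x

def whittaker(y, lam):
    """Minimise sum (y_i - f_i)^2 + lam * sum (second diff of f)^2,
    i.e. solve the linear system (I + lam * D'D) f = y."""
    n = len(y)
    DtD = second_diff_penalty(n)
    A = [[lam * DtD[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return solve(A, y)

# A spike in the middle gets shrunk towards its neighbours
f = whittaker([1.0, 2.0, 10.0, 4.0, 5.0], lam=10.0)
```

The key point is that the whole fit is one sparse linear system, which is why penalised smoothers run in a fraction of the time of, say, a Gaussian process fit on the same data.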

Live demonstration

So at that point, I'm going to move over to a bit of a demonstration of the actual tool. I'm going to start off by showing what it used to look like, so you get a feel for it. We have a GWSDAT Excel add-in, and this is typically how most people in industry access GWSDAT. The way it used to work was this: this is the old data input format here, with effectively two tables that need to be populated. You've got the historical monitoring data table: the name of the well, the constituent, the date it was sampled, together with the concentration value.

In addition to concentrations, you can add groundwater elevations, which is how high the water table is, and that enables you to estimate groundwater flow direction. The second table is a table of the well coordinates. In terms of the connectivity between Excel and R, it's a bit of a hack, if I'm being totally honest with you. When you click Analyse from the data menu here, basically what's happening is it's using a VBA macro to create CSV files, then running R in batch mode, and returning this graphical user interface.

This is the new Shiny graphical user interface for GWSDAT. You can see instantaneously that it looks a lot nicer and a lot more up to date; you don't have to overlay all the graphs in one place, you've got lots of tabs and more sophisticated functionality.

This initial plot here is called the spatial plot. You've got the locations of the wells, with the names of the wells underneath and the last measured value on top of each well location, overlaid with the predictions of the spatiotemporal model, with the colour key on the right-hand side. You can scroll back and forward in time, and basically build up a picture of how this plume has changed over time.

In addition, there's functionality to better quantify what exactly has happened to the plume, because without objective measures you can get confused about the interpretation. Let me give an example. We press this button here, plume diagnostics. What this does is take a time slice from the spatiotemporal model, delineate a plume at a prescribed threshold value of 10 micrograms per litre, and, once it's enclosed, use numerical integration to give you a plume mass and a plume area. You can then track that through time, so you can effectively measure the total amount of contamination at your site. If you press this tab here, for example, what you get is a time series of plume mass, total plume area, and the average plume concentration. This is an objective measure calculated from the underlying analysis, which doesn't rely on the user's interpretation of these plots.
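
The plume diagnostics just described can be sketched as: take a gridded slice of the fitted concentration surface, keep the cells above the threshold, and numerically integrate over them. Below is a toy Python version on a regular grid, illustrating the idea only; the grid values, the 5 m cell size, and the 10 µg/L threshold are made-up example numbers:

```python
def plume_diagnostics(grid, cell_area, threshold):
    """Delineate the plume as grid cells above `threshold` and integrate.
    Returns (plume_area, integrated_concentration, average_concentration)."""
    above = [c for row in grid for c in row if c > threshold]
    area = len(above) * cell_area            # total delineated plume area
    integral = sum(above) * cell_area        # crude numerical integration
    avg = integral / area if area else 0.0   # average in-plume concentration
    return area, integral, avg

# A 3x3 slice of predicted concentrations (ug/L) on 5 m x 5 m cells
grid = [
    [2.0, 11.0, 3.0],
    [12.0, 40.0, 14.0],
    [4.0, 13.0, 1.0],
]
area, integral, avg = plume_diagnostics(grid, cell_area=25.0, threshold=10.0)
```

Repeating this for each time slice gives exactly the kind of time series of plume area and average concentration the talk describes, without any subjective reading of contour plots.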

There are various drop-down boxes here on the right-hand side where you can choose different options: different substances from the initial data set (I've chosen xylene there), and you can change the units. There's a range of different plot types; if for whatever reason you just want to plot the data, you can plot it as circles, for example. You can toggle well labels on and off, and show the groundwater contours here. The blue arrows correspond to the estimated direction and strength of groundwater flow: you can see in this situation, because the water levels are high in the northwest region and low in the southeast, we anticipate the groundwater flowing in that direction.

The blue arrows are estimated with a technique that does a triangulation between the well locations, fits a plane to each triangle, and reads the gradient off that, but the contours are done with a loess smoother, which is basically local linear regression. The nice thing here is that you have two independent methods trying to estimate the same thing, and you can see that, generally speaking, they're in good agreement: multiple lines of evidence, effectively.

There's a whole host of different output formats as well. One of the nice things is the ability to generate PowerPoint animations. What this does is run through time as an animation and output the results in a PowerPoint slide pack, like so, all overlaid exactly over one another. Typically we see these styles of contours being appended to the reports that go to the environmental regulators.

There are loads of other things I could show; I'll just go to the time series plot. This is where you can look at the different substances on a well-by-well basis. It plots the data and overlays it with a local linear regression. That's a challenge to convention in the industry, where they typically just do linear fits, and there are examples here where the interpretation of a linear fit is very different: the linear fit is just flat, basically, whereas this smoother is not constrained to be monotonic in direction, and the interpretation here, I think, is closer to the truth, which is that it's mainly increasing, sometimes rapidly and sometimes slowly.

Obviously, there are potentially a lot of different combinations of wells and solutes, so what we have here is the trend and threshold indicator matrix. It's a high-level lookup table of all the combinations at a particular time slice, and it gives you a colour key reporting on the trend: dark green means it's trending downwards very quickly; red and dark red, strongly increasing. So this is a high-level lookup of which wells you may be having issues with, and the ones drawing attention here are benzene at monitoring well 7 and benzene at monitoring well 11. Let's do a deep dive on that: go to benzene at monitoring well 7, and you can see that, yes indeed, it's increasing rapidly, and well 11 the same as well. That might be harder to spot if you were going through each one of them individually.

There's a variety of other plots here as well; this is one of my personal favourites. It uses lattice, as most people might recognise (we haven't moved to ggplot2 yet; we might one day in the future), and you can customise it. This is basically all the concentration data for all the wells at this site: the different solutes are in different colours, and each cell corresponds to a different well. It's a very concise plot, and it's also quite useful for looking at the correlation between the different wells.

What I've shown you so far is the local version of the app, for local installation; we install it as an R package, effectively, together with the Excel add-in. But there's also a publicly available implementation online, and this is the advantage of going Shiny: the previous architecture was only available locally, but with Shiny we can deploy it over the web. It also opens the door in terms of file formats: in the past you were locked into the Excel add-in, but here we support various formats, and you can manually enter the data too.

Adoption and impact

It's been used and been around for a long, long time, and when we did the last questionnaire, it came down to four main points. First, design optimisation of monitoring and remediation programmes: you just see things you wouldn't otherwise see. Second, early identification of potential new releases. One example: when we were promoting the first version of GWSDAT, the way we did it was to ask people for their data sets and present the data back to them in GWSDAT, and even in those few exercises they noticed things they hadn't previously seen. So it really does give you more insights than the traditional methods people were using.

Third, rapid interpretation of complex data sets. It's principally designed for the smaller data sets, typically retail fuel stations, where it's perhaps not worth setting up a full GIS database, for example, but it works equally well for large data sets. And fourth, it's got inbuilt reporting functions; consultants used to take ages to do this. Imagine replicating some of this in Excel: it would take ages and ages. So that on its own is a huge time saver.

One of the good things about being open source is that it's open and free to use, and it gets used a lot more than it would be if it wasn't open source. One of the bad things about open source is that you don't know exactly where it's being used. But we've collected evidence of its applied use, and it's been used globally. It turns up in all manner of different places that still surprise me today: it's been used on the Navajo reservation in America, it was discussed in an Indonesian student geology group, and it's been used across the US, Australia, Pakistan, and loads of other places across the globe.

We've worked with many environmental regulators, and it's referenced in some of the environmental regulators' guidance documents; we've highlighted the UK Environment Agency there, but we have also done joint presentations with the US EPA. They particularly like the plume diagnostics tool and its ability to quantify the degree of contamination in the plumes. And we are included in standards and guidance documents for the field of groundwater monitoring. So it's well established, it's been around for a long time, and it's still being used more and more.

Critical success factors

Retrospectively, looking back at why it's been successful: it was the right tool at the right time. It wasn't trying to be more than it actually was; it wasn't a hugely sophisticated statistical tool either. It's basically a visualisation tool with some three-dimensional smoothing, effectively. But critical was adopting the open source framework, especially in the field of groundwater monitoring and environmental modelling in general. It increases accessibility, so more people use it, and the more people use it, the more well respected and adopted it becomes.

Also, there's transparency. If we were using a black-box tool, our environmental regulators might be inclined to question the underlying methodology, whereas here it's all laid out, with all the code fully transparent, so it's fully open to scrutiny. And it's free to use, which is also important, especially in certain countries, and there are a lot of countries in the world using it now.

Another critical success factor has been developing a strong relationship with an academic partner. We've typically done it through engagements with a string of different PhD students, which has helped build that relationship; we've been working together for over 10 years now, and we're on our third PhD student. I think it's also worth pointing out that the university, as a UK-based university, was incentivised by the Research Excellence Framework, which is about quantifying the impact of academic work on society. I think this was probably one of the reasons why Glasgow were willing to go above and beyond the call of duty and put their own time and resources into developing the Shiny version of GWSDAT: they paid for a postdoc for a six-month period to port the code from where it was before into Shiny.

Other factors include computational speed. When you work with Glasgow, they say, well, there are different models we could use, but there was always a trade-off between the time it took to run the model and model correctness, and I think we hit upon a reasonable compromise in using penalised splines; they're very, very quick. I don't think it would have been adopted the way it has been if the model took hours to run. The fact that it's computationally fast is, in my opinion anyway, one of the critical success factors.

Again, no special data requirements; this is interesting as well. We didn't ask for any special data that people didn't already have; it's just a case of formatting it in a particular way for data input. So there are no barriers, effectively: we made it as open and transparent as possible and reduced the barriers to entry. And of course, the one I'd like to talk about specifically is adopting the Shiny framework. I think that's been absolutely one of the critical success factors, and there are many benefits: first and foremost, a more sophisticated, up-to-date graphical user interface.

Shiny has a structured GUI programming syntax, which I really like; it makes sense, and there's plenty of documentation, plenty of tutorials, and plenty of support out there. And it's taken GWSDAT from being a local installation for specific users to being more widely adopted online, with all the benefits of an online web app: you don't need to install anything, you just go to a web URL, and that's it.

And there is a publicly available version here, hosted by the University of Glasgow on shinyapps.io. However, this is mostly for demonstration purposes, I would say, or fine if the data isn't sensitive. If you are concerned about data security, there are two pieces of advice. One is to go down the local installation route and keep it local, so any data you have doesn't move off your own PC. Or, if you want an online version, you can build your own internal Shiny server within your organisation. In Shell, we have our own internal RStudio Connect service, and I know BP have their own internal service too.

Another benefit is that the underlying architecture has been successful not just for GWSDAT but for other open source tools as well. There's a monitoring toolbox by Concawe: again, an open source monitoring tool, this one for LNAPL. LNAPL (light non-aqueous phase liquid) is the kind of pollution you see floating on top of puddles, the rainbow sheen, so it's more related to that type of contamination. The interesting thing about this one is that, yes, it used GWSDAT as a modelling template, but it uses one of the latest things out of RStudio: a combination of both R and Python code in the same app. I asked them, why did you do it in both languages? Because I think it's easier to maintain one language. They said it was resource: they didn't have dedicated R developers or dedicated Python developers, so with Shiny having the flexibility to use R and Python together, they could use different developers to get the job done. I think that's an important point, especially for where RStudio is going.

Future directions

Going forward, and I'm conscious of the time now because I'm coming to an end, where we see the future is in sustainable remediation. This again alludes to what I said earlier about spatiotemporal modelling. We did a research exercise where we compared the data efficiency of the old traditional technique of independent spatial modelling versus one spatiotemporal model, and there was a range of benefits, including getting more information from fewer observations.

You can turn that on its head and say the spatiotemporal methods can achieve the same level of performance with fewer data points. Fewer data points means fewer journeys in vehicles to collect samples, fewer lab tests, and less impact on the environment. And this is a current theme with our latest student, Peter, at the University of Glasgow: we're trying to implement cost-effective spatiotemporal approaches to optimise groundwater monitoring.

One last closing point: what haven't I done that I would really like to do with GWSDAT? The typical data format is Excel, and that's because typically the data sets are warehoused in Excel. But more and more now we're seeing central databases, where the data is managed in a much better way, and the approach up to now has basically been to query these databases and then put the results into the Excel GWSDAT format. That's still a two- or three-step process. What I'd really like is full integration, so you could log in, it gives you a table, and it automatically queries the database for the site that you choose. So there would be one portal, and anyone in, say, Shell or any organisation would have at their fingertips an analysis of all their groundwater data.

That's one of my to-dos for the future. I think I have to do that.

Q&A

Thank you so much, Wayne. It's amazing to see how companies are able to work together, too, and use this open-source work. I really love the case studies that you put on the website. It seems like you've thought so much about the marketing and communication of the app as well and, like, anticipating people's demands and knowing that people may want to use this from Excel and using the Excel add-in, so I'm really curious, how do you gather all this user feedback and kind of, like, know how to communicate this out to everybody?

Yeah, that's a good question. Although it might look like we do a lot of marketing and communication, we probably didn't do as much as we could have done. Basically, it's working hand in hand with people, understanding and gathering their feedback. On the GitHub site, there's a list of suggestions for improvements, and the way I operate, basically, is that we get lots of different people asking for lots of different functionality, but if lots of people are asking for the same thing, that's when it goes up the priority list.

In terms of how we marketed it to begin with, how we promote it, and how we get feedback: groundwater monitoring is a whole field of environmental engineering, and in Shell there are teams who are fully integrated into the industry. We've presented in lots of different forums; we work hand in hand with CL:AIRE (Contaminated Land: Applications in Real Environments), which is a kind of European portal for contaminated land and groundwater, and we also work with the American Petroleum Institute. Professional bodies, I would say, is probably the one-word answer to that question. And one piece of feedback we've been getting lately is that there should be more training material out there, and we're in the process of putting some together.

Russ said: hey, Wayne, very interesting solution, what geospatial libraries are you using here? That's a good question. Off the top of my head I'm struggling a bit, but if you look on the GitHub site, it should tell you the package dependencies; sm, sp and splancs are the ones that are spatial in nature. We're not using anything off the shelf for the spatiotemporal part; to be fair, there's very little spatiotemporal stuff out there, certainly nothing off the shelf, and those that do exist almost insist on the data being regular in nature, because it's convenient. We don't have regular data, so we've had to build our own components, and all the spline code is within the package source.

Yeah, I mean, we are in the process of building different additional functionality. I don't think there's an existing package out there for what we want to implement, because if there was, we would have already used it. We're having to build this stuff ourselves now, probably because it's not been done before. Case in point: the latest thing we've implemented is leave-one-well-out analysis. This is the ability to basically just drop a well, refit, and compare the results directly against one another. Going forward, there'll be more sophisticated tools to do this, which will actually give you sampling designs as to which wells you should sample at different points. Much to my regret, we haven't found any existing packages for that, so, yeah, we're having to develop them ourselves, basically.
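
The leave-one-well-out analysis mentioned here can be sketched generically: drop each well in turn, refit on the remaining wells, and compare the prediction at the dropped location with its observation. The toy Python sketch below uses simple inverse-distance weighting as a stand-in interpolator (GWSDAT refits its spatiotemporal model instead); all well positions and values are made up:

```python
def idw(wells, x, y, power=2.0):
    """Inverse-distance-weighted prediction at (x, y) from (wx, wy, value) wells."""
    num = den = 0.0
    for wx, wy, v in wells:
        d2 = (wx - x) ** 2 + (wy - y) ** 2
        if d2 == 0:
            return v                      # exactly at a well: return its value
        w = 1.0 / d2 ** (power / 2.0)
        num += w * v
        den += w
    return num / den

def leave_one_well_out(wells):
    """For each well, predict its value from the other wells; return residuals."""
    residuals = []
    for i, (x, y, v) in enumerate(wells):
        rest = wells[:i] + wells[i + 1:]
        residuals.append(v - idw(rest, x, y))
    return residuals

# Three ordinary wells plus one anomalously high one
wells = [(0, 0, 5.0), (100, 0, 7.0), (0, 100, 6.0), (100, 100, 20.0)]
res = leave_one_well_out(wells)
```

A large residual flags a well whose readings the rest of the network cannot explain, which is exactly the comparison that supports decisions about dropping or keeping a sampling location.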

Another question, an anonymous one over on Slido: I know you just shared the GitHub, but do you have a reference for your work generating the PowerPoint slides from Shiny? Yes. Off the top of my head I can't remember exactly, but I think it's using officer; if you have a look at the code, it will tell you how to do it. If you search for PowerPoint in the cloned GitHub repository, you'll find it there. There is also a very old-fashioned package called R2PPT; I am the author of that package, but it's a bit outdated now and it's probably going to get retired fairly soon.

One of the things I have experimented with is using R Markdown for report creation, where instead of a PowerPoint slide pack, you basically have an animation in a Word document. What I've found up to now is that it's very easy to create HTML reports and PDFs, but a bit tougher with Word. I think it needs to be Word, though, because the consultants will always need to amend the standard report. So that's still a work in progress.

Another one: for the Excel manipulation, is openxlsx being used? No, it isn't. We used to use a package called RDCOMClient, but we found it not particularly well supported, so I wrote a little bit of a hack, to be honest. If you download the add-in, you will see that all the code for how it's done is there. The way it actually works is that it dynamically builds a text file on the fly, which saves off your different variables and so on, and then sends that over, together with the data sets, as input to an R batch job. And that works fine because it's only one-way traffic: you're just sending the data from Excel into R. If you had to go the other way as well, the architecture I'm currently using would fall flat on its face.
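
The one-way handoff described here (a text file of variables plus CSV data, consumed by a batch process) is a generic pattern. Below is a minimal Python sketch of both sides of such a handoff, standing in for the VBA writer and the R batch reader; the file layout, column names, and option keys are all invented for illustration:

```python
import csv
import io

def write_handoff(options, rows):
    """Writer side (the VBA macro's job): dump options as key=value lines
    and the monitoring data as CSV text."""
    cfg = "".join(f"{k}={v}\n" for k, v in options.items())
    buf = io.StringIO()
    w = csv.writer(buf)
    w.writerow(["WellName", "Constituent", "SampleDate", "Result"])
    w.writerows(rows)
    return cfg, buf.getvalue()

def read_handoff(cfg, data):
    """Reader side (the batch script's job): parse options and data back."""
    options = dict(line.split("=", 1) for line in cfg.splitlines() if line)
    rows = list(csv.reader(io.StringIO(data)))
    return options, rows[0], rows[1:]

# Round-trip a single hypothetical monitoring record
cfg, data = write_handoff(
    {"units": "ug/l", "aggregate": "monthly"},
    [["MW-7", "Benzene", "2021-03-01", "12.4"]],
)
options, header, records = read_handoff(cfg, data)
```

Because the traffic is strictly one-way, the writer never needs to parse anything back, which is exactly why the simple file-based architecture holds up.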

I see some nice feedback in the Slido: hey, Wayne, nice presentation and Shiny app, love how everything is reactive to input widgets and how you've got rid of a trigger button. The follow-up question there was: how long did it take to build the application, and what were the key things you knew needed to be in the app from the start?

Yeah, that's a good question. It wasn't actually me that did the first Shiny version; we had a postdoc who did it over a six-month period. He ported most of the functionality that was there before, and then I've added to it since. The number one key thing was the spatial plot: having that movie of the spatial distribution and seeing how it changes through time is number one, and everything effectively follows from that. That's the unique plot; that's the plot that was really hard to do in Excel, for example, which is typically how some of the consultants used to do this in the past.

And I guess I would say convenience as well: having it all there in one place, in a reproducible, repeatable fashion. The data input is always the same, and you get lots of plotting options. The same cannot be said of doing it in Excel manually; it would take all day for people to come up with a lattice of plots, and you can do it here in literally seconds.

One other question is, did you have to make the case for using open source? This is a very good question, right? Honestly, it wasn't even my idea to go open source: it was the guys in the groundwater team. They work in the space, they knew how it worked, and they realised early on that open source was the way to go, for the reasons I pointed out earlier. And what I'd say is, Shell as an entity, more and more, wants to adopt, use and also write open source code, and there are various people I know doing exactly that. It's also a really great motivator for our employees: I think people really like giving back through open source, and it motivates them to promote it. I think we'll do more of it going forward at Shell, definitely.

But thinking about the upkeep of the application as you make improvements to it, I was just curious, Wayne, what does your team look like? How many Shiny developers do you have, and how many data scientists? Yeah, what I'd say is, it's a smallish team, right? I have Luke from the soil and groundwater team, who's the subject matter expert in that field. Then there's the collaboration with the University of Glasgow, which tends to be more on the R&D side of the app. In terms of support and maintenance, it tends to be me, right?

And again, this comes back to being open source. I suppose I get four or five emails a week on average, I'd say. I don't know whether that's a lot of people using it, each with a small number of issues, or a small number of people with a large proportion of the issues. But yeah, it is a worry, really, in terms of sustainability. If I fell under a bus, God forbid, there needs to be a better plan B in place, right? Too much of it relies purely on me.

But that's one of the reasons why we're doing it open and transparent: we welcome people to come on board and get involved in these open source projects, so it doesn't rely solely on one person. So if anyone wants to volunteer, then by all means, we'd gladly welcome the help.

Thanks for sharing, Wayne. We're facing a bottleneck with openxlsx going from R to Excel; I would never have thought of calling R via VBA. Is the VBA code on GitHub too? The VBA code is embedded within the Excel add-in, right? If you open the add-in and look at the code, it's all there, visible.

I was curious, Wayne, I know you mentioned the LinkedIn group for GWSDAT, but if people are interested in working on this with you, what's the best place to get in touch? Just drop me an email. Alternatively, have a look at the GitHub page: look at the issues, suggest some enhancements, all that stuff. What I'd say about it is that it's a very specific tool for a specific area, and that's its strength, really. So really it's people who work in the geology space, with a bit of domain knowledge, who could come in and help on that side, but everyone is welcome to help, believe you me.

I'm curious to know how you manage any confounding variables, or biased data, in the open source tool you presented. It's a good question. I would say, typically, we don't have any confounders: in the time series plots we're not regressing one variable against another, it's everything against time. When it comes to fitting the spatial plot, it uses splines, right? A three-dimensional spline basis, and those basis terms, by design, are not going to confound one another. The danger there is overfitting, more than confounding parameters, and this is where the p-spline comes into its own: it penalises step changes in the parameters estimated for the spline, and that's what controls the degree of complexity and balances between under- and overfitting. So confounding variables aren't really an issue, because with the spline design they're not confounded; overfitting is, but it's controlled through the penalty.
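GWSDAT's actual model is a full spatiotemporal p-spline, but the penalty principle described here is easy to demonstrate with its simplest relative, the Whittaker smoother: minimise ||y − z||² + λ||D z||², where D takes d-th order differences. A minimal numpy sketch, purely illustrative and not GWSDAT's code:

```python
import numpy as np

def whittaker_smooth(y, lam, d=2):
    """Penalised smoother: argmin_z ||y - z||^2 + lam * ||D_d z||^2.

    lam = 0 reproduces the data exactly (overfitting); large lam drives
    the d-th differences toward zero (underfitting toward a smooth
    trend). The penalty weight, not the basis size, controls model
    complexity -- the same principle the p-spline uses.
    """
    n = len(y)
    D = np.diff(np.eye(n), n=d, axis=0)          # d-th order difference matrix
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

# Noisy signal: the penalised fit is provably never rougher than the data,
# and flattens out further as lam grows.
rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + rng.normal(0, 0.3, x.size)
z = whittaker_smooth(y, lam=50.0)
```

Tuning `lam` is exactly the "balance between under- and overfitting" mentioned above: the fitted curve interpolates the noise at `lam=0` and approaches a near-linear trend as `lam` grows.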

But thank you all for the great questions. I did just want to let everybody know, while you're all here, that if you're interested in joining future events, we'd love to see you there. You can view upcoming meetups, which happen every Tuesday at the same time, on this calendar here, and we also have a data science hangout every Thursday at noon eastern time, where you can hear from different data science leaders in a very informal setting and ask them any and all questions. Next week, I'm very excited to have a presentation on an intro to using Git and GitHub in RStudio, so that will be next Tuesday.

I just want to say thank you so much, Wayne. It's amazing to see your team's use case and hear about your experience. It's really awesome to hear how you're working together with environmental agencies and other companies through this as well. Thank you so much for sharing with us. Thank you.