Oliver Bridges - Smart DCC | Energy Meetup | RStudio

video
Aug 6, 2021
29:09

Transcript

This transcript was generated automatically and may contain errors.

Thank you everyone for joining us today. I'm really pleased to be able to present our journey into data for the DCC. We started a few years ago, and it's been a really interesting journey and we've enjoyed it all the way. But firstly I'd like to say thank you to everyone for joining us. It's really great to be part of this community; we've had a lot of help along our way and it's nice to give something back.

On top of that I'd like to say thank you to RStudio; they've been a fantastic help all along, and the journey couldn't have been done without them. My name's Oliver Bridges, I'm head of data science at the DCC, and I'm going to show you our journey from having a few canned reports through to having a team of 16 data scientists, data wranglers and report writers, and how we've progressed through that.

So the first couple of slides are a little bit complex and a little bit wordy, so I apologize for that; please bear with me. We'll get on to the data science side of things very quickly, but I'd just like to set the scene of what the Smart DCC has done and why we've had to make some of the decisions that we have. So let's go on to the first slide.

About the DCC

As I said, I'm head of data science at the data communications company, the DCC. It's important to understand what the DCC is. We're a licensed and regulated body that was set up and established in the United Kingdom to roll out smart meters across the country. We were set up and licensed by BEIS, which is a government department, so our license was enacted through parliament, and we are regulated by the energy watchdog Ofgem, who watch out for everyone.

The license started in 2013, but we didn't start installing meters until about 2017, and one of the most important things to consider as part of our journey is that we are scrutinized on every cost we have. We have a price control, and everything must be agreed with our stakeholders and also our customers: our stakeholders being the government and Ofgem, and our customers being the energy suppliers and the network operators.

So our network is the network that connects these smart meters across the country to the users of the system, and at the moment we've got about 13 million meters connected across the solution. That will rise to about 15 million in the next year or so, and eventually to about 50 million meters and about 250 million connected devices on our network. So really we're a very large-scale IoT solution.

Our whole network is rolled out by our service providers, so the DCC themselves do not provide any infrastructure or any kit, really; we manage the service providers who do that. At the moment we have about 15 million messages across our network every day, so it's quite a busy network, but on top of that we also have all the inventory data that is there to support it.

At go-live everything was outsourced, as I said, including our analytics, so the DCC did not provide any analytics at that point. But we were mandated to have a reporting function that was responsible for governing and distributing the reports. They would QA-check them, and as part of this they had SQL access into the system: just a little bit of SQL, enough to go in and check that our reports were right. But we did not have an analytics function. Now we've grown from a team of 4 to 16, and we've taken all the analytics in-house, so the DCC provides it all.

It's been a journey to get there, and I'll take you through it step by step. We've got a complex data set, and what feels like a large data set as well, and it's growing all the time as we get busier and more installs happen. We've also got a huge customer base: over 100 customers that we have to satisfy across the solution, and they all use it in a slightly different way. So that's where we are now.

The smart meter ecosystem

The next slide will try and show you the landscape that we're operating in. I'm going to explain this diagram, which is a little bit complicated, from left to right. On the left-hand side is the home, and this is where smart meters would sit; smart meters are installed to enable people to have greater functionality at the local end of the electricity market. So alongside an electric meter and a gas meter you'd also see an in-home display, which allows the user to see how much electricity or gas they're using, what tariff they're on, or any other interesting information. But there are also a number of other smart devices that would be in the home, including prepayment devices and devices to enable electric vehicles.

All of this is installed within the home, and one of our day jobs is to ensure that the installation of these meters takes place. These devices communicate via what's called a comms hub. So on the left-hand side, within the home, are all those devices plus a comms hub, which allows communication both within the home and, via a wide area network, back out to our users.

And if you look at the middle piece, that describes our network. So we've got communication service providers, data service providers and a whole host more. There are eight shown there, but in reality there are 15 at the moment, and that will be moving up to about 40 by the time the solution is complete. On the far right-hand side are our customers, who communicate with the meters or receive communications from the meters: the energy suppliers, the people who supply energy to the home, also the network operators, and there is the ability for authorized third parties. That might be a comparison website, for instance, who might want to take some meter readings and recommend the best tariff from the best provider. That is the ecosystem that we're working with, and it is a complex one.

The scale and complexity of the data

So, the next slide: my team specifically asked me to bring this in, because it is complex data that we're dealing with. If you look there, that's a map of all the installs on just one day; that was the number of meters and where they were installed in one day, and it's actually taken from one of our applications that we utilize.

That means we've got an ever-growing base of devices out there communicating across our network, growing by about 20,000 a day, which means the volume of data we're dealing with is growing too. You can see there that just last month we had over half a billion records in a month that we need to process and analyze to provide results to our 100-plus customers. This creates a complex data set: there's all the transactional data, but on top of that there's the data that supports those transactions, the inventory data.

And why does this get complicated? Well, smart meters have protocols that allow them to communicate, and there are 127 different types of message that can be sent across the network. Those messages might vary from a meter read to a firmware update or a certificate swap, and all those messages come across our network. They vary in size, and some of them have to be sent in a specific order to allow them to be processed. So there's a complicated series of messages that goes across the network.

As we mentioned earlier, our infrastructure is a complicated technical solution, which means the data that maps to it is complicated too, and we have to map it all together. And finally there are our users. How are they using the solution? They're all using it differently, so we have to understand the differences between our 100-plus users and how they're using it, and how we report on that data to them, to our governing bodies and to government. So our data is large and complex.

The analytics journey: from outsourcing to in-house

So, to go through the journey that I talked about earlier, from outsourcing our analytics function to taking it in-house: we go back to 2017, when our first meters were installed and we finally had a live solution. At that point we provided 22 canned reports that were produced by a service provider, and we had four engineers who were there to ensure that they were distributed correctly and had the correct data in them. They had some SQL access that allowed them to look at the data, but the reality was they weren't set up as an analytics function; they were a reporting function.

But of course, once we went live, with more and more meters being installed on our network, a brand new multi-billion pound program, the data questions started coming in. As those data questions came in we tried to service them, but we weren't set up for that type of reporting. We did start creating some additional reporting beyond what we were mandated to do as part of our contract, but a backlog started building up.

Then in 2018 the DCC formed a new service called the TOC, the Technical Operations Centre. This was stood up to allow the DCC to monitor the solution end-to-end; our service providers were all mandated to manage their own piece of the network, but no one was looking across the whole service to see how it was performing. Were our customers getting what they needed out of it? So the TOC was stood up, four new staff came in, and we brought a visualization tool in with us; suddenly we had some capabilities to perform some complex analytics and produce some more reports. But of course, as we got more tools to answer data questions, even more data questions came in.

So by 2019 the analytics tool was dealing with a million-plus rows of data. The visualizations were taking an hour, two hours, several hours to run; the analytics we were being asked to produce were becoming more and more complicated; and those data questions just did not stop coming.

Discovering R and RStudio

But we noticed the visualization tool had an R plugin, and this really was where our journey transitioned. Within the team, by this point a team of eight, we decided perhaps we'd better learn R. None of us knew R; we had varying levels of basic SQL around the team, some of us had been Linux scripters, and some of us had experience in other programming languages. So in 2019 we downloaded R and RStudio Desktop, the free version that's available to download, and I must warn anyone who's thinking of doing this: once you start, you won't stop.

It is a gateway drug, and you know, we're all addicts now; we sit there every day working in RStudio. It was a fantastic moment for us, especially once we plugged it in with connectivity into our database using DBI. We suddenly had the ability to do a lot more with the data that we wanted, and we could do it programmatically as well. We were still dealing with very large data sets, and at this point we eventually stumbled across data.table, which is fantastic.
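As a rough illustration of the DBI connectivity described here (not the DCC's actual code: the table, columns and query are invented, and an in-memory SQLite database stands in for their SQL Server):

```r
# Connect to a database with DBI and pull query results into a data frame.
# RSQLite's in-memory database is used here purely for illustration.
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "meters", data.frame(id = 1:3, region = c("N", "S", "N")))

# Query programmatically; the result comes back as an ordinary data frame
northern <- dbGetQuery(con, "SELECT id, region FROM meters WHERE region = 'N'")
dbDisconnect(con)
```

Swapping the connection for an odbc one against SQL Server leaves the rest of such a workflow unchanged, which is much of DBI's appeal.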

I love data.table. It's got a syntax that's a bit hard to follow, but it is so fast it's unbelievable. I've put there that we started dealing with 10 million rows of data; I think we're up to about 100 million in data.table now, and we're doing some incredible stuff with it that you just wouldn't believe. On top of that we started using Shiny for dashboards, but of course we were only on the desktop version, so for any dashboards that we wanted to show off, people had to have R and RStudio installed and everything else. It really wasn't something we could push out across our whole company. But we were using Shiny, we loved it, we couldn't believe what we could do with it. And again: data questions, more data, more data, more data.
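To give a flavour of the `DT[i, j, by]` syntax being praised here, a minimal sketch; the message types and sizes are invented for illustration, not DCC data:

```r
# Aggregate a table of hypothetical smart-meter messages by day and type.
library(data.table)

msgs <- data.table(
  day  = rep(as.Date("2021-08-01") + 0:2, each = 4),
  type = rep(c("meter_read", "firmware_update"), times = 6),
  size = 1:12
)

# data.table's DT[i, j, by] form: j computes, by groups
daily <- msgs[, .(n = .N, total_size = sum(size)), by = .(day, type)]
```

The same single `[` call is the form that stays fast at the row counts mentioned above, which is where data.table's speed pays off.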

Moving to RStudio Team

So in 2020, just over a year ago, we signed our commercial license with RStudio, and it's one of the best things we've done. I'll thank RStudio quite a few times during this, but they were very patient with us: as I said, we have a complex license, which meant we had a few asks as part of the contract, and they were very patient with us. I know it was just over a year ago because we've just renewed our license. And of course with RStudio Team you open up to having Workbench, Connect and Package Manager, and we got all of that.

At the same time we expanded from 8 to 16 bodies, so that's roughly what we are now: 16 people within the team. We also migrated our database to the cloud to be closer to the processing, so everything is cloud-based; how we expand that might be the next part of our journey, as we're still on SQL Server at the moment. We also got our first contractual delivery, something we got paid for rather than just helping our customers out: we now produce a monthly 500-page PDF, with a lot of the processing done within RStudio (we still visualize with our visualization tool). It is a huge report that takes days of processing, aggregating half a billion records as part of that, and it's very well received by our customers.

Now I'll talk you through each of the parts of RStudio Team, starting with RStudio Workbench. As I've said, all the heavy lifting is done there; this is where we sit, logged on all day utilizing it. I've got a call-out there: whoever put the rainbow parentheses into 1.4, we do love that, and there's new functionality being brought in all the time. Our regular reports have increased from 22 to 150, and that's increasing all the time as there's more and more demand for analytics from our team. We also take advantage of the Git integration, and that's fabulous, because our code really is in constant evolution, and before we had that our version control needed to be improved.

And we're not just creating reports and analytics; there's all sorts of hypothesis testing, statistical inference and forecasting that we're doing, with some ML. Below I've given a rough idea of our workflow. We'll schedule our reports with cronR; we know that we perhaps should be doing that via R Markdown, but we're very familiar with cron. As I said, we interface with the database using DBI. We then wrangle the data using data.table, the tidyverse and dplyr; there's a big debate within my team over who likes to use what, but at our scale we end up putting everything in a data.table, and of course you can then use all the tidyverse functions on top of that data.table just like a data frame. ggplot2 is called out there separately from the rest of the tidyverse: we generally create our visualizations with ggplot2.
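A compressed sketch of that wrangle-and-plot step, with invented install figures standing in for real data: tidyverse verbs applied directly to a data.table, then ggplot2 for the visual.

```r
# A data.table works with dplyr verbs like any data frame; ggplot2 plots it.
library(data.table)
library(dplyr)
library(ggplot2)

installs <- data.table(
  week  = rep(1:4, each = 2),
  fuel  = rep(c("electricity", "gas"), times = 4),
  count = c(10, 8, 12, 9, 15, 11, 18, 14)
)

# Wrangle with tidyverse syntax on top of the data.table
weekly <- installs |>
  group_by(week) |>
  summarise(total = sum(count))

# Visualize with ggplot2
p <- ggplot(weekly, aes(week, total)) +
  geom_line() +
  geom_point()
```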

RStudio Connect

Next is Connect, which really is the glory of everything we do: this is what people get to see, and of course it's what we didn't have whilst we were just on the desktop. With Connect we managed to create a monitoring dashboard, and this dashboard allows us to measure, analyze and report on over 3,000 metrics every five minutes. There's a RAG status that we put on it, and if it's red or amber then we'll send out an email, and people can see it on the dashboard as well. It's all totally configurable: we can configure the metrics we're reporting on and also who the emails and so on will go to.
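The RAG idea can be sketched in a few lines of base R; the thresholds and metric names below are invented for illustration, not the DCC's real configuration:

```r
# Classify a metric value as red/amber/green against thresholds,
# then collect the names that would trigger an alert email.
rag_status <- function(value, amber = 80, red = 95) {
  if (value >= red) "red" else if (value >= amber) "amber" else "green"
}

metrics <- c(cpu = 72, queue_depth = 88, error_rate = 97)
status  <- vapply(metrics, rag_status, character(1))
alerts  <- names(status)[status != "green"]   # these would get an email
```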

We also use Connect for parameterized reports, R Markdown, pins (we're really starting to get into pins) and also APIs. APIs allow us to publish a single version of the truth, and that was one of the biggest problems we had in the beginning: you could write SQL to get an answer any way you wanted, and those answers would be different but equally correct. Now we can publish things both externally and internally with APIs. With pins we've really started putting data sets there that we can share: all this beautiful aggregated data that takes two or three hours to combine together and get the results out, we can then pin, and other teams external to us can utilize that data.
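A minimal sketch of pinning an aggregate for other teams, using the pins package's board API; a temporary board stands in for the Connect board, and the data set is invented:

```r
# Write an aggregated data set to a board, then read it back elsewhere.
library(pins)

board <- board_temp()  # stand-in for the Connect board used for sharing

agg <- data.frame(metric = c("installs", "messages"),
                  value  = c(20000, 15e6))
pin_write(board, agg, "daily_aggregates")

# Another script or team can now pull the same aggregate
shared <- pin_read(board, "daily_aggregates")
```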

We've also got an access portal, so instead of people coming to us asking for data, we can distribute that data to them, and we can do it with access controls so that different people see different data; that's all handled as part of Connect. We're also really starting to turn into an application support team, because with this huge infrastructure that we've built with RStudio Team we need to monitor it: for each report that runs, we need to know what it's produced, has it failed, and all of that is built in with more Shiny dashboards. Just down the bottom there I've mentioned a few of our favorite parts: Shiny and shinydashboard, which we utilize all the time, pins, plumber for the APIs, quite often plotly for visualization because that's more interactive, and of course R Markdown for the parameterized reports.
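A hypothetical sketch of one of those single-version-of-the-truth endpoints, using plumber's programmatic interface; the route and figures are invented, and on Connect the router would be deployed rather than run locally:

```r
# Build a plumber router exposing one GET endpoint.
library(plumber)

api <- pr() |>
  pr_get("/installed-meters", function() {
    # In reality this would read a shared, pinned aggregate
    list(meters = 13e6, as_of = "2021-08-06")
  })

# api |> pr_run(port = 8000)  # serve locally; Connect hosts it for users
```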

Package manager

Moving on, we've got Package Manager, and this really is the unsung hero of everything we do, because it seamlessly manages the packages for us. I had a quick look: in the year that we've had it, we've used 500 unique packages across 8,000 package downloads. No one really knows they're doing it, it's all managed for us, and that's fantastic; there are so many packages out there to answer every question, really, it's amazing. We've also started creating our own packages, and that integrates nicely into Workbench; we use some of the tools within there, and we've started making packages for our SQL connectivity, our logging and our report writing. So Package Manager sits there doing its job, and it really is the unsung hero of it all.
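Starting an internal package like those could be sketched with usethis; the package name here is invented, and the real SQL, logging and report-writing packages are the DCC's own:

```r
# Scaffold a new package skeleton in a temporary directory.
library(usethis)

path <- file.path(tempdir(), "dccsql")   # hypothetical package name
create_package(path, rstudio = FALSE, open = FALSE)

# Functions (SQL helpers, loggers, report utilities) then live under R/,
# and the built package can be served from your own Package Manager repo.
```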

Q&A

Holly, would it be okay to ask you a question that came from one of the previous slides? "Yeah, I'd love to." So someone had asked what machine learning algorithms were used for forecasting. "I'd have to hand that one off to someone in my team. Certainly, within some of the forecasting that we're doing, we're utilizing Prophet, Facebook's Prophet, I think, for some of the time series forecasting, but the ML that's done, that is someone else in the team; I'm very much the script writer, not the data scientist here." Awesome, thank you. They can put it in the chat as well if other people have feedback too.

How things have changed

How has this changed things? We've gone from having no meters installed and no data to having huge data, and it's got more complex, but we've got more accurate as it's grown, because we're now able to do so much more with the data that we've got. As I said, we've been able to grow with the solution: we're at 13 million meters, and that will move up to 50 million meters and 250 million connected devices, and we will be able to grow with that; we've seen that already. And obviously, being stood up on the cloud as well, it's very much easier for us to add memory or scale in any way that we want, and that's been fantastic.

As I said, we've now been paid for analytics, so we're producing the standard that industry utilizes to understand how they're doing: it's not so much about how we're doing but how industry is doing, how they're utilizing our smart meter solution. We've become the trusted advisors to our customers, our partners, our service providers and also our stakeholders, and they're now utilizing us more and more. But of course, what does that mean? Even more data questions. They don't stop coming, and we'll keep having more and more of those.

Thanks and demo screenshots

Before I do, because I've switched PCs I can't give you a live demo of some of our products, but luckily I did put them in as reserve slides, so we can go on to those in a moment. But first I want to say thanks to everyone out there who's helped us, from our first days of trawling Stack Overflow for answers. How does anyone write any code without Stack Overflow? I don't think it's possible. The whole community out there, the R community, is just amazing; what gets delivered to people who are doing stuff is incredible, the amount of help that is out there. Of course I want to say thank you to everyone at RStudio: they've been with us on all of our journey, longer than they actually know, because obviously once we downloaded the desktop version we started using their blogs and all of the information that they put out there, and they've helped us every step of the way.

And a big thank you to our customers who ask the questions. Whether they're internal or external, we get asked all the interesting questions, and you know, we do love to answer them. And my team: you'll see them online at midnight, coding away, because everyone loves just getting into that data, using the products we've got and just having fun, really. We really do have fun with the data.

So I was at this point going to show a live demo, but I do have some of the apps that I was going to show. This is our forecasting app; obviously it's a stationary screenshot. This is what uses Prophet for the time series analysis. Within our industry we need to forecast where we're going: how many devices are going to be needed next year and the year after. But also, our comms team love to announce that we're at our 15 millionth meter or whatever, and you can see there's a forecast there at the bottom for the 9th of September; we'll probably get given a chocolate bar within the company for that, and we're 98% confident that we'll be hitting that date. Generally we're within one day of where we're forecasting. So this forecasting helps us with our comms, but also with our ordering of the comms hubs and the other devices that are on our network, and the infrastructure that people are going to need to process these messages.
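A minimal sketch of Prophet-style forecasting like the app described, with synthetic install counts standing in for the real data (roughly the 20,000-installs-a-day growth mentioned earlier):

```r
# Fit Prophet to a daily series and forecast 30 days ahead; predict()
# returns yhat plus an uncertainty interval (yhat_lower / yhat_upper).
library(prophet)

history <- data.frame(
  ds = seq(as.Date("2021-01-01"), by = "day", length.out = 120),
  y  = 13e6 + 20000 * seq_len(120)   # synthetic cumulative meter count
)

m        <- prophet(history)
future   <- make_future_dataframe(m, periods = 30)
forecast <- predict(m, future)
```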

The next one is a lovely app that I like to use. We get a lot of questions, quite often from an MP; someone from parliament will come to us and say, how is my constituency doing, have I got a lot of connected homes? It seems to be a repeated question. We've got all that data available and we can map all the installs to it. You can see the drop-downs on the left where I could choose the constituencies, you can see where those two constituencies come out, and then the data at the bottom shows the number of households in that area and the number of connected homes. So this is utilized within the DCC a lot for those questions that come in from outside the DCC, and how we can answer them; they appear live on the map as well, which is actually lovely when I use it live.

Going on to the analytics that we utilize for our application support: we're running many reports every day and we need to monitor them. You can see the average is about 250 reports produced every day, and we've got an analytic tool that allows us to see not just how many have been produced but also their performance, and it takes you to the logs for them. So it really is an application support tool that allows our 24/7 team to examine what is happening, and we don't have to worry about going in and checking logs; it all happens automatically.

And finally, I mentioned monitoring with a dashboard. This is the dashboard that monitors about 3,000 metrics across our whole infrastructure, and it's highly configurable. You can see there (and I fixed it in time so as not to have any commercially sensitive data on there) that we were measuring about 1,500 metrics, two of which we found were of interest, and our 24/7 team would have a look at those. There are hyperlinks all through it where you can see the history, get a nice plotly graph come up, and you can also link across to a help guide via R Markdown.

So apologies that those last four weren't interactive, and thank you for bearing with me during the technological meltdown that I had at the beginning. I'm really pleased to have got through that, and to have got here and presented to you today. I'm happy to take any questions.