Data Science Hangout | Wayne Jones, Shell | Thinking Empathetically & Using Your Initiative
Transcript
This transcript was generated automatically and may contain errors.
Hi everybody, welcome to the Data Science Hangout. If you're joining for the first time, it's great to meet you. I'm Rachel. I'm the host of the Hangout.
If this is your first Hangout, as I know it is for Wayne too, this is an open space for the whole data science community to connect and chat about data science leadership, questions you're facing, and what's going on in the world of data science.
So as I mentioned, the sessions are recorded and shared to YouTube, as well as the RStudio Data Science Hangout site. So you can always go back and rewatch or find helpful resources there too.
We always want to create spaces where everybody can participate and we can hear from everyone. So there's three ways that you can ask questions during these. So you can jump in live and raise your hand on Zoom. You can put questions in the Zoom chat. And feel free to put a little star next to it if you want me to read the question for you if maybe it's a little loud where you are. But we also have a Slido link where you can ask questions anonymously too, and Tyler will share that.
She was right ahead of me and shared it just in the chat. I just always like to reiterate, we love to hear from everybody, no matter your level of experience or area of work too.
But today I am so happy to be joined by my co-host, Wayne Jones, Principal Data Scientist at Shell. And Wayne, typically we get these started with just having you introduce yourself a little bit and sharing a bit about the work that you do.
Wayne's background and role at Shell
So my name is Wayne Jones. I'm a Principal Data Scientist in Shell. I've been there now for 15 years. Prior to that, my background has been in ecological modelling. So my PhD was in modelling the growth of fish. And following that, I went to work for an energy company, EDF, as a quant analyst. And then I went to work at a pricing consultancy for about five or six years. Then I've been at Shell for 15 years now.
I would say most of the time I've been in the role of a statistical consultant and as such, you get exposed to lots and lots of different business problems across all manner of areas. Some of them people might find quite surprising, given that I'm working for an energy company. So, you know, I would say certainly in the earlier years, a lot of my time has been spent developing open source software for environmental monitoring applications.
GWSDAT: groundwater monitoring open source tool
Groundwater monitoring is something that is a very open and transparent industry, basically, where we work with environmental regulators and we periodically monitor our sites, whether it be petrol stations or refineries or terminals. And we report out the latest status to the environmental regulators and we work with them to develop the best solutions for remediation, if there are any issues effectively.
Right now, this is something where we work with a range of different environmental engineering companies as consultants. And they perhaps go out and do the sampling of the data, send the data to the lab, get concentration results back. And then in the past, there were lots of different ways of analysing the same data. So there's an opportunity to standardise, make things even easier for everyone and to increase the level of interpretability from these data sets.
Right. So we basically created, in collaboration with the University of Glasgow, an application called GWSDAT. It's available as an R package. GWSDAT stands for Groundwater Spatiotemporal Data Analysis Tool. You guys can just download it and it's free to use effectively. Yeah, and to summarise what it does effectively, it's there to identify trends in groundwater solute concentrations.
Excitement about offshore wind and energy transition
I suppose the most exciting thing I've done recently is I've been working on a bid with offshore wind. Right. So the Dutch government, for example, they have for each offshore wind site, they have different criteria for winning that bid effectively. And the one I was working on was on environmental and ecological aspects of having a wind farm. How do you avoid bird strike, for example? How do you keep noise down? How do you monitor the environment? How do you demonstrate that you've had a net positive impact on the environment as well?
And this kind of brought me back to my ecological modelling days and it was really good. I can't give full details of the kind of stuff we're proposing as part of the bid, but I have to say it's very, very exciting. And again, if we're successful in the bid, it will be done in a very open and transparent way. So we'll be looking to develop open source solutions and also, you know, being open with data.
Advocating for open source at Shell
Well, yeah, that's another good question, right? So when we first started using open source tools, we were actually challenged, right? I remember speaking to an architect in Shell and him saying, don't bother investing in open source tools, Wayne, because they'll probably be banned by Shell in a couple of years time, right? I mean, I look at how the future's panned out. It couldn't be any different, right? We use lots of open source tools like R and Python in our jobs on a day to day basis now.
I would say scientists really like the idea of making their code open source. So quite a few people come to me and ask for the pros and cons of doing it, right? Now, for GWSDAT, the business case for being open source is fairly open and shut, right? You know, basically it's open, transparent, you know, environmental regulators haven't got a hell of a lot of money. They're not going to trust black box solutions. So it's an open and shut case.
But we're expanding now and our business leaders are opening their eyes to the benefits of being open source, where perhaps in the past we didn't see the benefit that we see now. It's becoming a more open culture.
Standardizing tools and global reach
So, yeah. So first of all, it's a global industry. So we do this across the globe and there's different standards in different areas of the globe. Right. There are many, many industry bodies that advocate best practice in these areas. Right. And you'll see in the list of references where we are actually referencing best guidance documents. And this, if you like, is the kind of document which outlines statistical techniques in a digestible fashion to environmental engineers.
And that's one of the things I'm particularly proud of with GWSDAT. Right. You know, is that because it's open source, there's no barriers to its use and it gets used globally. Right. It gets used. You know, I'm really, really surprised by the way it gets used. So the last person who joined the GWSDAT group on LinkedIn is from Yemen. Right. We've got people from, you know, all over the globe, Australia, America, you know, especially America. Right. But then, you know, you're talking like Malta, China, Pakistan. You know, it truly is a global tool. So that's one of the things I'm most proud about.
Shiny app design and the Golem package
Oh, do you know what? I've never heard of Golem, if I'm honest here. Maybe you can tell me a bit about it. So obviously I haven't heard about it. It's just not written using Golem.
It's a similar paradigm to the question that Joe asked, right, about standardization. So it's a package that allows you to develop Shiny apps using a standard method. Essentially, Colin Fay says that Golem allows you to build industrial Shiny apps that are robust. So he has a certain methodology that you follow in Golem to build a Shiny app.
Well, it isn't. And I will certainly check out Golem because it sounds very interesting. Right. It might not have been around when we were doing this, to be fair, because it's quite a while ago since we first developed this version of GWSDAT. Shiny is absolutely fantastic for the smaller apps. Right. But the large, big industrial apps, sometimes if you don't design them properly, they can become rather unwieldy and very hard to support and maintain. You know, and what's really important in the spirit of open source is to make it in a way that someone can come and run with it for themselves. So I will certainly check out Golem for future Shiny applications.
15 years of change in open source data science
Yeah, I mean, that's a good question. First of all, I would say it's been the adoption, to begin with, of open source science, open source tools. And if I roll it back even further, you know, when I was in university doing a PhD, it was an old fashioned computer lab. And we had the old Unix machines. Right. And they were, I think, about £20,000 each. And that's over 20 years ago now. Right. And then also R wasn't about; you had something called S-PLUS instead. Right. And the license was about £20,000 a year and the rest of it.
Right. So within the time that I had finished my PhD, we'd moved to Linux, which meant you could install on standard PCs, which obviously saved about, you know, £19,000 per PC. And we had started using R, which saved all the license costs for S-PLUS and all. So that's going back a long way. And that saved a maths department lots of money.
Coming back to Shell now, you know, the changes I see from my perspective: first of all, there was a reluctance to adopt open source software. Right. So, you know, in those discussions with architects, they just didn't trust it from an architectural security perspective. Right. And some of those fears could still be warranted now. Right. But it was also about the technical correctness of the underlying outcomes as well. Right. So in the past, I've done work in R and we've had to double check it in SAS.
But then in my experience, because it has such a large user base, lots of the functionality gets a hell of a lot of testers. Right. And generally speaking, especially for the packages which have been used in high volume, it's very, very mathematically correct. Indeed, I've seen mistakes in some commercial offerings of software because they're just not tested to the same extent.
So then there's the adoption of open source and then there's actually making things open source, you know, stories like mine where, you know, we don't have this very protective business model. Right. We realise that there's benefits in doing open source software. So not only are we using open source tools, we're giving back to the open source community now as well, engaging the open source community.
Machine learning use cases and energy transition
Yeah. So, I mean, Shell's business model is completely changing. Right. You know, there is an energy transition and more and more I'm involved in what we refer to as renewable energy solutions problems. Right. So this is related to low carbon technologies such as wind, solar, carbon sequestration projects. And these are all fairly new right now. And the scale and the speed at which developments are happening in this space are quite, quite surprising.
Right. So all the machine learning technologies are being used to a lesser or greater degree. Right. You know, we're using all manner of technologies, whether it be basic, simple time series forecasting. Machine vision is also a strong one. So that's the deep learning. LSTMs when it comes to solar forecasting. When I first started in Shell, right, I was almost a heretic for bringing random forest to the table. Right. But now, you know, random forest and machine learning algorithms such as that are becoming par for the course as well.
Another big use case that we have, and I suppose a big ticket item, is predictive maintenance. Right. This is where we listen to the data continuously streaming, look out for anomalies and warn people when we think components are about to fail. So you can plan for failure rather than reacting to failures when they actually occur. So the answer to your question is loads. Right. I mean, I could keep on going for a long, long time. Right. There's loads of different new and novel applications coming in, and they are mostly associated with the energy transition.
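To make the "listen to the stream, look out for anomalies" idea concrete, here is a minimal Python sketch. It is not Shell's method: the rolling z-score rule, the function name `flag_anomalies`, the window size, the threshold and the sample readings are all assumptions chosen purely for illustration.

```python
from collections import deque
from statistics import mean, stdev


def flag_anomalies(readings, window=5, threshold=3.0):
    """Yield (index, value) for readings far from the recent rolling mean.

    Hypothetical rule: a reading is anomalous when it sits more than
    `threshold` standard deviations away from the last `window` normal values.
    """
    recent = deque(maxlen=window)
    for index, value in enumerate(readings):
        if len(recent) == window:
            centre = mean(recent)
            spread = stdev(recent)
            # Guard against a perfectly flat window, where the spread is zero.
            if spread > 0 and abs(value - centre) > threshold * spread:
                yield index, value
                continue  # keep the outlier out of the baseline window
        recent.append(value)


# Made-up sensor readings with one obvious spike at position 7.
vibration = [1.0, 1.1, 0.9, 1.0, 1.05, 1.0, 0.95, 4.2, 1.0, 1.02]
print(list(flag_anomalies(vibration)))
```

A production system would look at many sensors at once and use models tuned to each type of equipment; the sketch only shows the shape of the idea, a baseline learned from recent data and a warning when a new reading departs from it.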
Benefits of open source recognized by leadership
Right. OK, so I mean, I guess it has to do with the themes around machine learning and AI. Right. Back in the day, you know, when I was this consultant in Shell, we were rather a niche and there wasn't the general IT support for us. Right. So we typically did skunk projects, basically. Right. And we just had our own IT infrastructure and all the rest of it. But now, you know, it's completely gone, you know, huge scale.
Right. And in Shell as a business, and I'm sure lots of other businesses as well, we have the IT departments basically working with us now. Right. Supporting us in our endeavours because they can see the benefits of it as well. Right. And I would say with things like Python and R, they understand the benefits of deploying these open source solutions and their capabilities in machine learning and AI, for example. Right. And indeed, it's the chosen way to go. If you didn't go open source, there wouldn't be a hell of a lot of choice out there either for that matter. So it's really been revolutionary. I'm talking about the last 10 years, I would say, not just recently.
Making the case for open source
Well, I guess I'll use GWSDAT again as an example. Two reasons. Right. One, it's transparent. Right. And we're not hiding anything. Right. Because if your company is selling your software, then you're not going to give your IP away by letting people see underneath the hood. Right. And the second one is cost. Right. And in the case of GWSDAT, working with the environmental regulators, we're saying this is open source. We're making it publicly available so that other people can use it as well. So there's greater adoption. Greater adoption means the more you trust it.
So certainly in my scenario, that's where we see the biggest benefits of, you know, using and distributing open source software. In other areas, it could be a bit of a gray area, right? You know, my steer has always been, you've got to make that business case bulletproof before you go and open source your software. Because one thing I would say about it is that it is a one way street. Right. Once you've made it open source, you can't take it back. So you'd better make damn sure when you do make it open source that it's for the right reasons and you have clearance from the business to do so.
Handling sensor data at scale
Right now, if you look online, you might see lots of presentations about this from Dan Jeavons, our VP. Right. Out there, we have millions of sensors, you know, pumping out data points on a second by second basis. So there's a huge volume. So we've actually partnered up with a company called C3.AI and this is an industrial scale, highly distributable, highly scalable way of deploying analytical solutions to lots and lots of data. So it's obviously cloud based, Spark based technology at an industrial scale. So if you're going to have a look at C3.AI, this is Shell's chosen partner to develop these solutions at scale.
And I think you can deploy, you know, Python based scripts. The clever part is how you distribute and allocate those jobs and coordinate and orchestrate those jobs. Not sparklyr; I think most of the applications are Python, basically, on C3. So it's PySpark. However, I have done work with Spark, not sparklyr, right?
Collaborating with Excel users via GWSDAT
I mean, when we designed GWSDAT, you know, it was with in mind that it's not statisticians, it's not R programmers that are going to be using it, it's environmental engineers. Right. And we wanted the barrier to entry to be as low as possible. OK, so the primary user interface for GWSDAT as it was designed in the first instance is an Excel add-in effectively. Right. Where users have an Excel add-in menu which behind the scenes connects with R, grabs the data, sends it to R, uses the GWSDAT package and uses Shiny to give you the graphical interface. So, you know, no one has to do any R programming effectively, it's a point and click, what you see is what you get kind of interface.
Yeah. However, the code is openly available. We do have pull requests from environmental engineers, you know, who wanted something updated in the software and all the rest of it. So it works. It really does work in that respect.
There are a few hacks in the background to make it work in Excel, right? Effectively, how it works is in Excel, it takes the settings and the data and puts them into a temporary set of CSV files and a temporary notepad file. And then it looks at the registry to see where R lives and then it uses R in batch mode. Right. So it grabs the temporary files and runs Shiny in the GWSDAT package like so.
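The hand-off described above (dump data and settings to temporary files, locate the interpreter, start it in batch mode) can be sketched in a few lines of Python. This is not the GWSDAT add-in's actual code: the function name `prepare_batch_job`, the file names, the launcher script `launch_app.R`, the argument order and the sample rows are all hypothetical, and the interpreter location is passed in as a parameter rather than read from the Windows registry.

```python
import csv
import tempfile
from pathlib import Path


def prepare_batch_job(rows, settings, r_executable="Rscript", launcher="launch_app.R"):
    """Write data and settings to temporary files and build the batch command.

    All names here are illustrative assumptions, not GWSDAT's real interface.
    """
    workdir = Path(tempfile.mkdtemp(prefix="excel_to_r_"))

    # 1. Dump the spreadsheet rows to a temporary CSV file.
    data_file = workdir / "data.csv"
    with data_file.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

    # 2. Dump the user's settings to a plain text file, one key=value per line.
    settings_file = workdir / "settings.txt"
    settings_file.write_text(
        "".join(f"{key}={value}\n" for key, value in settings.items())
    )

    # 3. Build the command that would start R non-interactively.
    command = [r_executable, launcher, str(data_file), str(settings_file)]
    return command, data_file, settings_file


# Made-up monitoring rows standing in for what the spreadsheet would hold.
rows = [
    {"well": "MW-1", "date": "2021-01-15", "concentration": 12.5},
    {"well": "MW-2", "date": "2021-01-15", "concentration": 3.1},
]
command, data_file, settings_file = prepare_batch_job(rows, {"units": "ug/l"})
print(command)
# A real add-in would now hand `command` to the operating system, for example
# with subprocess.run(command, check=True), once R's location is known.
```

Passing plain files between the two programs keeps the spreadsheet side simple: it only needs to write text and start a process, and everything statistical stays inside R.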
Career path and team structure at Shell
I never thought I'd be this lucky, if I'm honest, you know, I mean, being able to spend most of my time in North Wales, you know, having a role where you're given quite a lot of autonomy, you know, and freedom to explore the things which interest you. I never dreamt it would be so. And the work life balance as well. Right. You know, I was thinking about this the other day, actually, never in my wildest dreams did I ever think I'd have it so good.
Wayne, I saw that Joe had asked another question as well around AI ethics frameworks. What sorts of AI ethics frameworks are applied at Shell and in open source data science in general? You know what? That's a really good question. Right. And the way I would describe it is work in progress at the moment. All right. You know, the way it stands now, it's important, obviously, but it's going to become increasingly important as AI gets more sophisticated. Right. So I think, you know, there's lots of people in Shell working on building an AI ethics framework. I can't comment any further than to say it's work in progress.
So we've changed quite a lot, right? You know, so back a few years ago, maybe five or six years ago, we effectively had a statistical consulting team, I think about 15, 20 people. And, you know, now I would say those 15, 20 people are basically embedded in other roles, as senior managers, typically, nowadays, right? So you've got a whole range of different teams that actually sit within data science. Within data science, we have optimisation, we have statistical consulting, we have the data scientists themselves. All right. And you've got other teams like machine vision, artificial intelligence as well. So it's gone really, really big now, effectively. And I think in the whole organisation there's over 100 in data science, data science only.
As for my role, you know, I guess you reach a position in your career when you're forced to make a choice between going down the managerial route or the technical route, right? I was kind of offered an opportunity in the managerial route and thought, no, I don't want that. And it kind of crystallised for me that I wanted to go down the technical career path. In Shell, I'm happy to say, you know, there's a very organised career structure with job ladders and, you know, effectively in a technical career path, you can become what's called a principal science expert, which I'm happy to say I was awarded in 2019. And that's kind of recognition that you're a senior technical leader, and your job, you know, if you go down that route, is basically to be a technical expert and provide technical leadership in this field.
Mentorship and using your initiative
Yeah, just coming back to people, right, just because I'm going down the technical career path doesn't mean that I don't work with or manage people, but I suppose the difference is, instead of being a team lead or a manager, you tend to be more of a mentor. And I do this for the Royal Statistical Society as well, you know, mentor a few people to get their chartered statistician status. And I get a lot out of it, right, just to be purely nosy as well, if I'm being totally honest, it's a different industry, just to see what it's like in other places. But yeah, I do like mentoring people and trying to shape their careers and trying to get them on the right path and use the benefit of my experience, you know, to help them in their path.
I would say the biggest thing is, you know, particularly in Shell is, you know, you have to use your initiative, right? You know, if you see white space, put your name on it. I remember what someone said to me, you know, use your initiative.
I would say, from the team leads' side, you know, at the moment, the biggest challenge we're having in this space is recruiting. All right. You know, and I think that this is a challenge not just in data science, but also in lots of different areas. I've never known a jobs market to be so liquid and never known so many opportunities. Right. You know, if you're looking for a job, I've never known it so healthy. You know, people are spoiled for opportunity out there basically at the moment, not just in data science, but in a whole range of other industries as well. Yeah. So it's definitely a seller's market, guys.
I love that you just brought up mentorship there, too, and I'd love to just dive deeper into that. If you have maybe some tips for us all who want to reach out and help mentor people but don't really know where to get started.
Yeah, I mean, I mean, yes, I mean, well, semi-formal, semi-informal, I would say. Right. Well, what I try and do is people who identify themselves fairly early that they want to go down a technical career path, I basically help shape them into giving them advice as to what they need to do to start achieving that. Right. And it's something that, you know, obviously, deep technical expertise doesn't come overnight. It takes a long time. Right. And, you know, we try and make sure that people who want to go down this career track try and get the right skills and background there as well.
You know, working with universities, I think, is a good one, right? You know, something I really like doing is working with different academic institutions. I've got a good relationship with the University of Glasgow, I'm one of their members of staff now. Also got a good relationship with the University of Lancaster, University of Newcastle, UCL as well. Right. And that's part of what my job is. I'm trying to advocate that we build more academic relationships in Shell.
So, you know, being a principal science expert in Shell, you know, you're talking about a very diverse set of skills. Right. So we've got lots of chemists as well, geologists, data scientists. So what I tell people if they want to go down the technical career path is to just soak it all up, right? The best you can, right? Because as you go perhaps further into your career, maybe further up the food chain, right? You get less time to do technical work, right? You seem to get dragged in other directions, right? So when you're early on in your career, you know, like doing a PhD as well. Right. You have a lot of time to do technical work. So soak it up, kids, you know, and use your initiative.
I'll give you an example of using initiative, right? You know, this is going back again, we bought some new PCs, ran some algorithms, realised they weren't running any quicker. Right. And I was like, why not? Well, the reason is that even though the PC was quicker, more powerful, it was by virtue of having more cores. Right. But each core had the same sort of processing power. So some of the algorithms we were using were just single-threaded, not multi-threaded, whereas, you know, we wanted to move to multi-threaded. So I had a look, just off my own back, right? I looked into parallel processing in R and then when we actually had some more use cases in R, I started using this parallel processing. Right. That got me noticed by some of the guys. And then when Spark came in, I was the person to do the pilot with Spark, for example. And it ended up with me and the now VP, you know, at a Spark conference explaining what was done there. So that's just one example of using initiative. Don't hold back. Don't wait. Just do it.
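The point about cores can be shown with a small Python sketch. Wayne's work was in R; Python and its standard `multiprocessing` module are used here purely as an illustration, and the toy workload (`count_primes_below`), the task sizes and the worker count are assumptions. The serial version keeps one core busy however many cores the machine has; the parallel version spreads the same independent tasks across several.

```python
import time
from multiprocessing import Pool


def count_primes_below(limit):
    """CPU-bound toy task: count primes below `limit` by trial division."""
    count = 0
    for candidate in range(2, limit):
        if all(candidate % divisor for divisor in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return count


def run_serial(limits):
    """Single-threaded: one core works through every task in turn."""
    return [count_primes_below(limit) for limit in limits]


def run_parallel(limits, workers=4):
    """Multi-process: the tasks are spread over `workers` worker processes."""
    with Pool(processes=workers) as pool:
        return pool.map(count_primes_below, limits)


if __name__ == "__main__":
    limits = [20_000] * 8  # eight independent, equally sized tasks

    start = time.perf_counter()
    serial = run_serial(limits)
    serial_seconds = time.perf_counter() - start

    start = time.perf_counter()
    parallel = run_parallel(limits)
    parallel_seconds = time.perf_counter() - start

    assert serial == parallel  # same answers either way
    print(f"serial:   {serial_seconds:.2f}s")
    print(f"parallel: {parallel_seconds:.2f}s")
```

How much faster the parallel run is depends on the number of cores available and on the overhead of starting worker processes, so the timings are printed rather than promised.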
Communication skills and empathy
Yeah, I mean, in terms of, you know, what I'd be looking for, it's good technical depth. Right. But also, you know, if you're working in industry as well, you need to be able to explain and communicate yourself as well. And this is perhaps somewhere where data scientists could learn a bit more. You know, I still go to conferences and see really, really badly presented presentations of very, very good pieces of work. And I think in industry, you know, it's a real benefit if you can be both technically good and also very good at communicating these often complex principles to non-mathematicians as well.
Yeah, I mean, I'm not the expert in that, but there's a colleague in Shell and she has, you know, compiled lots of hints and tips about doing it as well as courses you can take as well. Off the top of my head, I can't think of any of them. Right. But then what's helped me is mentors. Right. For one thing and business consulting. Right. So one of the things when I first joined Shell I was most impressed about is people's ability to communicate very well.
Yeah. So, you know, if you spot someone who you think is really good at communicating. Right. And you're impressed by them, you know, just buddy up. Right. Say, look, I like the way you did that. Can you give me some hints and tips as to how you do it? Right. And often there's the basic stuff as well. Like, you know, the trouble is that in data science, you kind of focus on a very small area sometimes and you get kind of obsessed by the details. Right. I think I've done a presentation about data storytelling in the past. Right. And one of the first things I advocate is to step back, step back, step back some more. So when you're presenting a problem, you give the bigger picture and you narrow in to the data science, technical details rather than actually, you know, obsessing about the details immediately. You've got to bring the people in. And the other thing I would say is empathy. Right. If you think more empathically. Right. Then that will go a long way. So instead of thinking about it from a data science perspective, put yourself in the customer's or the client's shoes and think about it from their perspective. That really helps. Right. Really helps. Those are my two tips.
That you do. I really do think, right, if there's one tip I would give to people, you know, it's that with more empathy, you'd make yourself a better consultant. You have to put yourself in the client's shoes. Definitely.
Overcoming nerves when presenting
Sure. I'm talking about presentations and whatnot. We use SCRUM methodologies on our team. One of the ceremonies is every two weeks. We review something. I've started to just to practice presenting in front of people, just presenting something, anything, 10 minutes, five minutes, just trying to be succinct about it. I still get nervous when I present virtually, which doesn't make any sense to me because I'm in my home, I'm comfortable, but just seeing the little cues in front of me makes me a little nervous. Just getting those reps in I found to be useful. Then there was a comment a little further up in the chat that said, ask for feedback. That's been valuable. You can present something terribly, but unless somebody tells you it's terrible, you just don't know.
I think eventually, you know, you do get more confident, right, as you do more. So you obviously encourage yourself to do more as well. Right. But being from Wales, I think one of the scariest things I've had to do is go to the Eisteddfod. So this is a national pageant and as a child, you have to do it. You have no choice. Right. And you're on a stage in front of, you know, hundreds of people, basically. Right. And I suppose it's trial by fire. If you can cope with that, then you can cope with presenting.
Another thing I really like in school, to get young children to talk to the rest of the class without being nervous, is when they bring something into the class that they really care about. Right. So bringing something in, and then, you know, the child is not thinking about the fact that they're having to present something to the rest of the class. They're just thinking about this toy or dolly that they brought in. And, you know, I think that can help overcome some of the nervousness as well.
Data science and climate change
Yeah, so that's a very good question. Right. And I think, you know, data science and statistics and optimization have lots of roles to play in that. Right. You know, so, for example, you know, our stats group has been doing lots of work in leak detection. So this is identifying leaks of, say, methane or CO2. Right. So that obviously is a direct benefit to the environment, you know, stuff like that has a benefit on the environment. Climate is different. We're doing lots of weather modelling, and we're moving to low carbon technologies and we're bringing that data science to bear on those low carbon technologies such as wind, for example, we're doing optimization around wind.
One interesting thing is microgrids, for example. This is a combination of using batteries, you know, for storage, coupled up with, say, wind or solar and stuff like that. You can use optimization algorithms to reduce the CO2 output as well. Right. Those are a very, very small set of examples. And, you know, if you do a search on Dan Jeavons Shell, you'll see, you know, a bigger picture there as well. He talks in greater detail about what AI can bring to bear.
Thank you so much, Wayne. I'm just going to put into the chat right now the LinkedIn group that we all have if, if there's certain topics that people want to continue the conversation. And I try to put a summary of a few tips on there every week as well.
Thank you so much Wayne for for sharing your experience with us and all of your insights. Really appreciate all the great questions too.
And I know there's a lot that we can learn from you technically as well. And I know we talked about doing a meetup and showing the Shiny application and diving into that a bit deeper, so would love to do that as well.
Awesome. Thank you all. And if we're looking ahead to next week, we'll be joined by Lindsay Clark, Director of Data Science at Healthcare Bluebook. As well, we are getting better about sharing the upcoming list here on the website too. Thank you all so much. Have a great rest of the day.
