James Laird-Smith @ Bank of England | Data Science Hangout

video
Oct 31, 2023
59:32

Transcript

This transcript was generated automatically and may contain errors.

Awesome. Well, happy Thursday, everybody, and welcome to the Data Science Hangout. Hope everyone's having a great week. It was so awesome to get to meet so many of you at the Posit Conference in Chicago, and so happy to be back with everybody this week. For anybody I haven't had the chance to meet yet, I'm Rachel Dempsey, and I'm the host of our Data Science Hangout here and lead customer marketing at Posit.

So this is our open space to chat about data science leadership and the questions you're facing, and really just to hear what's going on in the world of data across all different industries. We're here every Thursday at the same time, same place. And James, I think that's where you first came across the Hangouts: you were watching a lot of them on YouTube. Yeah, I've become a bit obsessed with programming YouTube. I watch quite a lot of programming tutorials and programming conference talks online, and play YouTube through the television, so it's quite a constant stream for me.

Well, at the Hangout, we're all dedicated to making this a welcoming environment for everybody. So we love hearing from everyone, no matter your years of experience, titles, industry, or the languages that you work in, too. And so it's totally okay for you to just listen in. You can also jump in and ask questions or provide your own perspective on certain topics. You can put questions in the Zoom chat and just put a little star next to it if it's something that you want me to read out loud. And then lastly, we also have a Slido link where you can ask questions anonymously, too.

Introducing James Laird-Smith

But with all that, I am so excited to be joined by my co-host today, James Laird-Smith, data scientist at the Bank of England. And James, I'd love to kick things off by having you introduce yourself and share a little bit about your role, a little bit about the work you do, and also something you like to do outside of work. Yeah, sure. Thanks for having me. It's great to be on.

So yeah, like you introduced, my name is James. I am a data scientist at the Bank of England, where I work as a kind of internal data consultant. The bank is quite big, so the team I'm part of partners with different business areas to try to improve their data science workflows and to bring best practice to the work that they do. Separately, I have the specific title of R business owner, which I'm not sure exists in many places. As part of that, I am the business's representative for the tool, which means I attend some of the meetings where decisions about R are made.

I should also maybe say, because this is a frequent source of misunderstanding, that the Bank of England is the UK's central bank. Some people think it's a retail bank, but it's not. For those joining from the United States, it's much more akin to the Federal Reserve. People from elsewhere might compare it to the Reserve Bank of Australia, the Bank of Japan, or the European Central Bank. So we're not a retail bank; we're the central bank for the United Kingdom.

And what about something you like to do outside of work too? So one thing people sometimes notice is my accent, though that's not something I do outside of work. I'm South African, and I only came to the UK just under six years ago. I think the thing people most know me for, although I don't do it much anymore, is that I used to be very involved in competitive debating. I debated competitively for a long time at high school and university. Debating is mainly based in schools and universities, so I don't do it much anymore, but it's something I dedicated quite a lot of free time to.

What are a few of the things that they have in common, or what lessons have you learned from your experience debating? The answer is not much. If I had to stretch, I would say that a lot of debating is what we call impromptu. We only get a short time to prepare, which makes us quite good at brainstorming. Sometimes when I do statistical modeling, part of the activity is trying to imagine the forces at play in your model, so what's really happening in the real world, and debating has some of that. Of course, on a skills level, it's very useful to have practice speaking in front of people, because that obviously has widespread use, not just in data science but in lots of professional capacities.

Supporting 1,000 R users at the Bank of England

James, when we were talking a few weeks ago, you mentioned to me that the bank has 4,000 employees, and I think close to 1,000 of them use R. Yeah. I think the thing to understand is that I always think of the Bank of England as a giant data factory. We have a huge amount of data going in, and a broad set of data. The surprising thing is not so much that we have 1,000 R users; it's that we should actually have more. Data is so central to basically every facet of what the bank does that a lot of our aim is to get more people into R and the data science languages.

I have a brief breakdown which I prepared for this, which I think is instructive on this point. The bank is hugely varied in the kinds of things it does. There's at least one division of the bank which is a bit like an investment bank, a hedge fund, or a mutual fund. Their work involves time series and the usual things you would think of in quantitative finance. Then there's another area of the bank which is like an academic department, primarily involved in publishing papers and research. They publish in LaTeX and use Quarto and things like that. There's also a huge part of the bank which is just operational. For example, we collect data on banknotes, the physical currency. That's an operational division of the bank, but it also collects data, and we use that data to make decisions.

The other thing to note for people who might not know is that we actually have good telemetry on this. We keep track of the number of users of all our different tools, so it's not just that we count downloads or something like that. About a thousand users have used RStudio over the course of the year.

The business owner and IT owner model

The question was: you're the business champion for R. Is there an IT-side owner as well, and how do you work together? Yes, absolutely. The idea is that we have a technology owner, which is much like IT, and a business owner, and we absolutely do work together. The technology owner, unsurprisingly, is often somebody from a sysadmin type of role. The business owner is designed to be separate from that; this person is much more commonly a data scientist, someone working with the tool day to day. The technology owner and I shoot each other questions every day. The business owner is also meant to be a kind of champion. Because the bank is very big, we have a team of administrators, and if there's something wrong with the tool, it's my job to raise issues with them when they come up and to be a sounding board for the users in general. Sometimes I have to walk a bit of a tightrope, balancing different users' interests.

Laura, I see you had a question somewhat related as well. Yeah. So it's tagging on to that and it's related to, you know, when there's two owners of one thing, obviously there's never just, like you said, there's a lot of overlap. Have you had to weigh the, you know, there's ownership and then there's territorialism. And in every organization I've ever worked at, there are some people who have just moved into the territorialism niche.

Yeah. So it's a great question. Mercifully, I have been spared this particular quandary, in the same way I think you have. If there are tensions, they tend to come from prioritizing. The trade-off I have to pay attention to is what the majority of users are likely to find troublesome, and that's often a judgment call.

Thankfully, I have never needed to escalate that; the bank is very collegial in that respect. But if it came to that, there is a chain of command. On that same note, a lot of it is individual judgment, but one thing I try to keep in mind when assessing these things is to be mindful of the people who don't complain: people who have tried to use a tool but given up because it's too difficult. I always have to be aware that I don't hear from those people, because they've given up. There's a selection bias there: you don't hear from the people who've given up.

One of the things I'm working on right now, and will continue to work on, is just getting package installs right. We have a set of our own internal packages, and usually the process is quite streamlined. But of course, if there is a problem, that's a very big stumbling block, especially for a new user. So things like that I tend to prioritize quite highly. But yeah, it's not the easiest thing to do, and it requires judgment and experience.

Cross-government collaboration and modernizing submissions

The question was, as part of a government agency, is there a cross-governmental agency group, perhaps loosely affiliated with a focus on R or other languages and perhaps emerging technologies? We are trying. So there are definitely people who use R across the government sector. We have pretty good relationships there, not quite to where it could be. And there's definitely more sharing that could be done and that could also extend to data sharing. Yeah, definitely room to grow on that one.

Well, it was more that within pharma industry, we're engaging quite a lot in discussions with the regulatory agencies, you know, FDA being a good example, to define what they want to see from us when we interact with them. I've been talking to one or two people from different parts of finance who then said, well, it'd be fantastic if there was a similar discussion between various bodies and financial regulators to say, you know, are we giving you what you want? Can we modernize this?

Yes. There are a couple of avenues on this. One thing we are trying quite hard to do is modernize our submissions. In financial services, and the finance industry in general, a lot of the time submissions need to be made to the bank on various aspects of regulation. We have put, and are still putting, a lot of effort into trying to make that as seamless and streamlined as possible.

Onboarding and adoption

Yeah, and the answer is that there are no easy answers. On a lot of the things you've touched on, we do have dedicated training and dedicated trainers, and we have a schedule. Documentation is also important alongside that, as the offline, asynchronous resource that people can consult. I will also say about documentation in general that writing is a delicate art. One of the things that makes a difference for me is being given time to write something properly, so that I can give the user a beginning-to-end experience that is reproducible for them. That is underrated, in my experience.

The third pillar is just solving common problems as much as possible. The holy grail of package rollouts, or rather platform rollouts, is the one-liner: can I solve this in one line? Is this a one-line thing that I can put in and get the experience? And everybody on this call will know that it's never as easy as that. We have a range of R packages at the bank which are specifically for data access. The aim is just to get it to one line: one command that the user can run that gets them from zero to connected. The more of that you can give to users, the better.
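To illustrate the one-liner idea in general terms, here is a minimal Python sketch; the bank's actual data-access packages are written in R and are not described in detail, so nothing here reflects them. The function name `connect`, the environment variable `WAREHOUSE_DSN`, and the use of an in-memory SQLite database as a stand-in warehouse are all hypothetical assumptions.

```python
import os
import sqlite3

# Hypothetical: WAREHOUSE_DSN is an assumed environment variable; an
# in-memory SQLite database stands in for a real data warehouse.
DEFAULT_DSN = os.environ.get("WAREHOUSE_DSN", ":memory:")


def connect(dsn: str = DEFAULT_DSN) -> sqlite3.Connection:
    """Return a ready-to-use connection in a single call.

    All setup (finding the database, setting sensible defaults) lives
    here, so a new user only ever has to write `con = connect()`.
    """
    con = sqlite3.connect(dsn)
    # Let rows be read by column name as well as by position.
    con.row_factory = sqlite3.Row
    return con


if __name__ == "__main__":
    con = connect()  # zero to connected in one line
    print(con.execute("SELECT 1 AS ok").fetchone()["ok"])
```

The design choice is that every decision a new user might get wrong is made once, inside the helper, while users who need something different can still pass an explicit `dsn`.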

Reproducibility across a large user base

Yeah, totally. Again, some training goes into this. I really think that Posit's tool set actually does this incredibly well. You can't just render tiny bits of an R Markdown document; it's really meant to run start to finish, which basically means it's reproducible by default and you actually have to opt in to the non-reproducible bits.

Otherwise, it's the usual sorts of things that people on this call will know about: try to make examples reproducible, for example with the reprex package, and a lot of it is about culture and just running things. It also works in combination with what I just said, which is that it's easier if people have easy data connections. If you're able to give people a data connection package, then they can do it start to finish. In my experience, a lot of users are quite good at spotting, oh, now I can run this end to end because I've got this data connection; I'm not working with extracts.

Maintenance, handoff, and building for others

So, this is coming back to James's point that when you develop stuff, you definitely don't want to own that solution forever or maintain it. Within the business I'm in, we have a kind of policy that I think is similar to what James was saying, where if we're developing something, it's worth the time to develop it in a way that the people we hand it off to can maintain it themselves. Maybe not necessarily expand on it greatly, but if there are bits they tweak, they know where to tweak and they can get in there and add a new variable or change something from A to B quite readily. If you take a little bit longer and give them the right solution that's easy for everyone to maintain going forward, it relieves the burden on everyone.

I can't help but 100% endorse all of that. I just have an example to add, which is that these can be straightforward things like function names. Just take time to name your functions. It's weird, because it feels like make-work, but it's not, actually. Naming functions and variables is a really important part of someone else understanding your code in the future, and then being able to audit or debug it.

And, sorry, to expand on that, another example is markdown documents or reports that you construct for someone. If you make them parameterized, and you drive that parameterization by ingesting something like a YAML file, then they can always expand it, add future parameters, and do all kinds of clever things with it. Yeah, 100%.
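A rough Python analogue of that pattern, kept in the standard library: the report template stays fixed, and the parameters live in a separate config. The speaker mentions a YAML file; this sketch assumes JSON read with the `json` module instead, to avoid a third-party YAML parser, and the template text, parameter names, and values are hypothetical.

```python
import json
from string import Template

# Hypothetical report template: each $name is a parameter slot.
REPORT = Template("Monthly report for $region\nTarget: $target")

# Hypothetical parameter sets, kept outside the code. Adding a region
# means adding an entry here, not editing the rendering logic.
PARAMS_TEXT = """
[
  {"region": "North", "target": 100},
  {"region": "South", "target": 80}
]
"""


def render_all(template: Template, params_text: str) -> list[str]:
    """Render one report per parameter set found in the config text."""
    param_sets = json.loads(params_text)
    # substitute() raises KeyError if a parameter the template needs is missing.
    return [template.substitute(params) for params in param_sets]


if __name__ == "__main__":
    for report in render_all(REPORT, PARAMS_TEXT):
        print(report)
        print("---")
```

Because the rendering code never names a specific region, whoever inherits the report can add or change parameter sets without touching it, which is the maintainability point being made.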

Dashboard design and avoiding dashboard fatigue

Donald had said in that thread, one of the biggest barriers I face in building dashboards or packages is convincing people that the work is worth the maintenance costs from either my team or theirs and would love additional discussions around this.

Yeah, so everything I've said also applies to dashboards. Dashboard design is also a hugely underrated skill, and it takes time and experience to appreciate when a dashboard is made well. What I try to say to managers, or whomever I'm trying to convince, is that it's much more preferable to have a smaller number of insights conveyed well and continuously than to have a great wedge of insights haphazardly thrown together. The reason that's not always easy is that there's always somebody who wants their favorite chart. That's where you get this dashboard fatigue phenomenon, where everybody wants to put their own bit on the Christmas tree, and eventually the Christmas tree doesn't look good or falls over.

I am constantly struck by this. I mean, the name does it justice, right? This is design, and design is hard. I feel the same way about dashboards as I do about good writing. Good writing starts well: the first sentence should engage the reader, in the same way that the first thing a dashboard shows you should be the most important thing. Again, structure is important. As humans, we're very good at structuring text, because we get lots of training in it: structure, headings, citations, and things like that. We don't really get that training for dashboards, but a lot of the same things apply. Especially asking yourself, and answering on behalf of the user: what are you trying to communicate to them? What question do they have that you are trying to answer?

Testing and rolling out new versions

Yeah. So this is a bit of a journey that the bank has been on. As we've grown our user base, we've started to see this a lot more. When we roll out a new version of R, I'm now the person responsible for testing it, but for a start, I can't test the code of a thousand users. Not everybody is using version control, but we are getting there. Part of what I've tried to do is build a test bank: a set of things which absolutely should work. These are unit tests, regression tests, and integration tests.

We try to get other business areas of the bank to contribute their code. That's not always easy, because they're busy as well. I will also say that it's, again, something which just takes time and energy. I invest quite a lot of time in putting together our test suites now; it's hard in the short term but pays a lot of dividends in the long term. The other thing which is useful here is that my area of focus is our data warehouse, and I'm also partly responsible for the tooling on the data warehouse. So that's my way of bringing it together.
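The test bank idea can be sketched in Python as a registry of known-good cases: each entry records code that must keep working and the output it produced on the current version, and the same registry is rerun against a candidate upgrade. The case names, functions, and expected values below are hypothetical placeholders, not the bank's actual tests.

```python
from typing import Any, Callable

# Hypothetical test bank: name -> (function, arguments, expected output
# recorded on the current, known-good version).
TestCase = tuple[Callable[..., Any], tuple, Any]

TEST_BANK: dict[str, TestCase] = {
    "sum_of_flows": (sum, ([1.5, 2.5, 3.0],), 7.0),
    "sorted_codes": (sorted, (["GBP", "EUR", "USD"],), ["EUR", "GBP", "USD"]),
    "max_rate": (max, ([0.25, 5.25, 4.0],), 5.25),
}


def run_test_bank(bank: dict[str, TestCase]) -> list[str]:
    """Rerun every case and return the names of those that regressed."""
    failures = []
    for name, (func, args, expected) in bank.items():
        try:
            result = func(*args)
        except Exception:
            # An error on the new version counts as a regression too.
            failures.append(name)
            continue
        if result != expected:
            failures.append(name)
    return failures


if __name__ == "__main__":
    failed = run_test_bank(TEST_BANK)
    print("all passed" if not failed else f"regressions: {failed}")
```

Code contributed by other business areas would simply become more entries. In practice a proper testing framework would replace this hand-rolled runner; the sketch only shows the shape of the idea.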

Adopting new tools and languages

So yeah, I would say that there's a balance, and with important work, we will facilitate. The one thing to bear in mind is that the bank is quite sensitive about security, so we have policies on what can be installed, which I think is fairly common at many organizations. We try to balance wanting to adopt new technologies with wanting to be secure, and also not adopting something too quickly. The risk is that somebody develops a business-critical process in a language that's not supported, and then people move on to other roles or perhaps leave the bank. There's a maintenance burden that goes along with that, which we have to be mindful of. There is a bit of work happening with Julia right now, but we haven't promoted it to fully fledged support yet.

Career advice and the transformative power of coding

Oh, that's a good question. It's going to sound very obvious, but it's just to really get enjoyment out of what you do, just to... I think that that probably speaks to a lot of people on the call, given your attendance here today. But just being interested in the subject matter that your work is involved in is just huge for your motivation and your ability to continue with it.

I think it's not good career advice perhaps, but it's always just to keep perspective about what your career is and what your job is and that you should never be sort of hamstrung or bound to a particular career or company or anything like that. It's just to assess as rationally as you can the situation you're in and where you want to be and to cash out, to take time for yourself and to make sure you have that balance of the considerations that you care about.

But if you do get a chance, I do think programming is transformative. We on this call are probably long past the stage of being surprised at the power of programming. But if I think back to my past self, the ability to use code to interact with the various parts of my life and my work was definitely a turning point for me. So I don't mind advising people to take up code, because I know what code and software have done for me, and it happens to be a reasonably good career prospect at the same time. What a fantastic thing for us to be able to engage in something which is hugely interesting, hugely rewarding, and has good career prospects along the way.

Well, I'll tell you a bit about my experience of learning. I grew up in South Africa, and it's changed a lot now, but internet use as a South African was far removed from what you might encounter in the West. South Africans used to turn off mobile data on their phones because they were afraid of being charged on their mobile plan; the internet was very expensive. So what I would do is go to the university library and find a way to download YouTube videos. I downloaded talks on Python because I knew it would be too expensive to watch them all in my flat at home. These demonstrations, which were video recorded, were like gold dust to me. They were a way for me, as somebody who lived thousands of kilometers away, to experience this world. I got insight into this world through live demonstrations, and I'm very, very grateful for that. It changed my life. Of course, you watch them on repeat, because you can never download enough.

I would also say that this reinforces the value of video. I know that some conferences and some speakers understandably don't want to be recorded, but it is so, so appreciated by people who cannot be in the room with you. There are kids in other parts of the world who will watch that, who will go out of their way to watch it. And you're always speaking to them.

Novel technologies and R at the bank

So yeah, in the middle section, there's definitely data.table. data.table is great, a slightly underrated R technology. When a lot of people take data out of the database and want to work on really big data, they need the performance. Lots of people also like data.table for its syntax. And I guess, like everybody, we are now just starting on the journey with LLMs. We have a hackathon quite soon, but we're not using them in earnest yet.

Of course, we also do text mining. The bank has a network of agents across the country. Those agents do interviews and report to us on the state of different industries and businesses, and text mining is very fruitful for extracting insights from that. So yeah, like I said before, it's hugely varied. The bank is very data hungry, and those are just some of the ways that we use R in particular.

Thank you so much, James, for joining us today and sharing your perspective. It's a reminder to me to also say thank you to everybody who ends up watching these hangouts on YouTube in the future as well. I just want to say like, you all are part of this community as well. But a special thank you as well to everybody here for all the great questions today. Thanks everybody for making this community what it is and hope you all have a great rest of the day.