
Navigating technology for data teams | Stanislav Seltser | Data Science Hangout

video
Dec 20, 2024
1:00:28


Transcript

This transcript was generated automatically and may contain errors.

Welcome back to the Data Science Hangout, everybody. If we haven't had a chance to meet before, I'm Rachel, and I lead Customer Marketing at Posit. I'm going to keep adding this in here, because I learned from a friend at the Hangout that it's helpful to let people know that Posit is the company formerly called RStudio. We build enterprise solutions and open source tools for people who do data science with both R and Python.

Sure, I'm Libby. I am helping out Rachel with the community stuff here with the Hangout, helping host. And I also work with Posit Academy to help people learn R and Python to do better work with data in their jobs. So happy to have you all joining us today. The Hangout is our open space to hear what's going on in the world of data across all different industries to chat about data science leadership and connect with others who are facing similar things as you. So we get together here every Thursday at the same time, same place.

But thank you so much to those who have helped make this the friendly and welcoming space that it is today. And we're all dedicated to keeping it that way. So if you ever have feedback about your experience that you'd like to share with me anonymously, good and bad, or maybe suggestions for topics to dive deeper on, I'm going to share a Google form with you in the chat here.

And I would love to hear your feedback there. And as mentioned, you can always reach out directly on LinkedIn as well. Yes, absolutely. We love hearing from you. And we love hearing about your experiences here in the Hangout and all of your questions. This is a discussion that's driven by the audience, right? By us as a community. No matter your years of experience, your title, your industry, what language you work in, we want to hear from you.

Before we get in and do an introduction for our featured leader, I wanted to make sure we shared a recent blog post that Libby had written last week and published, which shares career advice that has been shared here in the Hangout. And Libby, do you want to share a little bit more about that?

Yeah, I just put it in the chat. I'm really excited to be able to sort of amplify the voices that we have on the Hangout, because I feel like all of our guests and our participants and our community members are so brilliant and smart and amazing. And there's so much wisdom trapped in these episodes that people attend live or like are up on YouTube, and I really want to share them with the world. So this is the first one. It's pieces of career advice that really resonated with me. I hope that they will resonate with you too.

Well, with all that, thank you again for joining us today. I'm so excited to be joined by our featured leader, Stanislav Seltser, VP of Enterprise Architecture and AI Infrastructure at State Street Global Advisors.

And Stanislav, I'd love to have you introduce yourself and share a little bit about your role to get us started, but also something that you like to do outside of work for fun.

Yeah, so my name is Stanislav Seltser. I work at SSGA, which is a division of State Street, and I lead a group that focuses on infrastructure, AI, and data science. We don't do data science ourselves, but our primary users are data scientists, or economists, or financial engineers, whatever you want to call them. It's a variety of people with different backgrounds who are really interested in solving problems in economics, finance, and predictive analytics. I'm there to help them do their jobs and to make the transformation from ideas to actual operations smooth.

Outside of my job, I'm really into ping-pong. I am crazy about ping-pong. I just won a company ping-pong tournament, and I play for a club in ping-pong tournaments. I also do triathlons in my spare time, so I split my time between triathlon and ping-pong. It's really good exercise and keeps me in shape and challenged.

Pushing new technology forward in organizations

So Stanislav, a lot of times in the Data Science Hangout conversations come up about how do we ask for approval for new tools and how do we approach our IT organization? And I think it's really exciting to have you on the call to be able to share the experience from that side of the house as well. And when we were talking before, you shared with me the company is not going to invest in a tool just because a bunch of people like it. So I'm wondering what have you found most helpful when pushing an idea or new technology forward?

I think it usually works in organizations the way it does when somebody starts a startup. You put together a business plan and a technical prototype. Those two pieces go together: one shows your idea in action, and the other shows what it's worth in the future. Inside a company it works in a very similar way. You have internal decision-making teams, which in my opinion function like a VC. So you need to think about both the non-technical presentation and the technical one. Ideally you prototype something and use it as an example that shows what the capabilities are.

I think merely proposing something is not sufficient. When you're looking to get something into the organization, you need to find somebody who really feels it is necessary, and that person should be driving the company's principal business. In the case of financial companies, that would be a financial engineer; at a pharmaceutical company, it would be pharmaceutical engineers. They really need to be your partner to help you push the technology forward. I think that's how we got all the new technologies adopted at the bank: we always partner with somebody, and it's actually not easy to find those people.

So you need to do some digging, sales, and marketing on your own to understand what people are doing, how they go about their jobs, and where the pain points are. Merely approaching the CEO or CTO with the idea doesn't really work. Oftentimes it's best to talk to your managers and peers, and to prototype, even on your personal PC, even outside the company, just to show that you tried it yourself and that you have some understanding of the technology as the person proposing it.


Yeah, that really resonated with me when we were chatting earlier, when you said companies work like a VC. So when you're bringing in a new idea, they're going to be looking for the least risk and the maximum value for them. You're not the only one bringing ideas. Think of yourself as an investor: there are so many attractive options. You can invest in a risky enterprise, you can buy U.S. government bonds, or you can buy an S&P 500 index fund. All of those options have different risk characteristics, and as an investor you're going to consider each of them when weighing something. I think the whole premise of finance is that you need to be compensated for the risk. So when you're proposing something, the people who are going to fund you are going to think about the risk and ask, okay, what am I going to get in exchange for that risk? That is a question you need to be ready to answer.
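The talk doesn't name a formula, but "being compensated for the risk" is commonly quantified in finance with the Sharpe ratio: expected return above a risk-free rate (for example, a government bond yield), divided by volatility. The sketch below swaps in that standard measure to treat a tool proposal as one more investment option. Every return, volatility, and option name is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class InvestmentOption:
    name: str
    expected_return: float  # expected annual return, e.g. 0.08 means 8%
    volatility: float       # annual standard deviation of returns (the "risk")


def sharpe_ratio(option: InvestmentOption, risk_free_rate: float) -> float:
    """Excess return earned per unit of risk taken."""
    if option.volatility <= 0:
        raise ValueError("Sharpe ratio is undefined for non-positive volatility")
    return (option.expected_return - risk_free_rate) / option.volatility


# Hypothetical numbers for illustration only; none of these come from the talk.
RISK_FREE_RATE = 0.04  # stand-in for a government bond yield
options = [
    InvestmentOption("index fund", expected_return=0.08, volatility=0.16),
    InvestmentOption("risky venture", expected_return=0.20, volatility=0.60),
    InvestmentOption("internal tool proposal", expected_return=0.12, volatility=0.20),
]

# Rank options by risk-adjusted return, best first.
ranked = sorted(options, key=lambda o: sharpe_ratio(o, RISK_FREE_RATE), reverse=True)
for option in ranked:
    print(f"{option.name:24s} {sharpe_ratio(option, RISK_FREE_RATE):.2f}")
```

With these invented inputs, a proposal with a modest but steadier payoff ranks above the high-return, high-variance bet, which is the kind of comparison a funding committee is implicitly making.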

How the team uses Posit and Shiny

So I think for this group here to learn a little bit more about the work your team does, it might be helpful: could you tell us a little bit about the ways your team uses Posit, and also Shiny? So, we use a variety of tools. I think Posit was a dominant force for our data scientists, and still is. R was the main language for many engineers coming out of college with graduate degrees in finance and economics, so most of the people we work with are familiar with R and its libraries and methodologies for financial modeling. R was very widely adopted. We use Posit as a tool first of all because it has a very nice built-in debugger, and the UI is integrated with Git, which is important for IT processes. It also has a nice browser, the ability to build packages, and all that kind of stuff.

Now we're expanding a little. We're looking at market trends, and Python is really dominating the data science market, so we're expanding our tools. Visual Studio is now our second main development tool, alongside Jupyter notebooks. Posit is still there; Posit was there and will be there. We use Shiny for R as a library that lets users easily design their own applications. That has appeal because IT doesn't have to be involved in developing the application. Somebody who doesn't know that much about programming can, with Shiny for R, relatively easily put together a fairly sophisticated application.

Not to mention that nowadays you can use LLMs to generate a whole Shiny for R application, deploy it on your PC, and show it to somebody really quickly. So there's that appeal too. There's now Shiny for Python as well, and we're looking into that. The good thing about Shiny is that somebody who, like I said, is not a professional software engineer can not only develop an application but, if you buy the enterprise version of Posit, can also easily upgrade it to meet IT standards: single sign-on, authentication, authorization, integration with corporate security, and so on.

Inspiring creativity and experimentation

And I know we talked before a little bit about how Shiny gives people the freedom to experiment, and how not allowing people that freedom might stifle their creativity. So how do you inspire creativity across the organization?

Well, I think there are multiple aspects to that. Essentially, the long-term goal for us is, I wouldn't say to totally minimize the role of IT, but to make it easier and easier for people to dream up things, describe them in English, and transform them into applications as quickly as possible. We are an investment bank, and investment banks are very research-oriented. That means we produce research papers and results that we publish, and those research papers eventually turn into mutual funds and other financial products that we deliver to market every year. Speeding up the process from idea to product is what we're trying to do.

The ultimate goal is to let people test things in a relatively short time. If somebody comes up with a strategy, they should be able to validate it quickly and as cheaply as possible. Everybody has a budget, so if you strive to do things cheaply, economically, and efficiently, more people can try more ideas and generally experiment with technologies. The data science market is very agile, so a bank also has to stay abreast of all the technologies and constantly evaluate what's happening in the market.

What problems the team solves

I feel like I need a simpler explanation of the types of problems you solve so I can understand exactly what you do. So explain it to me like I'm a 10-year-old: what types of problems do you solve, or, if you're building tools, what types of problems do your end users solve?

I think about half of the bank's workforce is researchers, researchers in economics and finance. Pretty much every researcher is either a portfolio manager or a researcher in economics who works with portfolio managers. They run mutual funds, hedge funds, and ETF products, and their goal is to achieve performance. You sell somebody an ETF, and that ETF should show reasonably good performance. Achieving performance means predicting the future.

If you're running mutual funds, you're constrained by the investors, who tell you what the risk characteristics are and the profile of how long they're willing to invest. It's your job to think about what's going to happen in the future and where to invest the money. Forecasting the future is incredibly difficult because there are so many things you don't know. You may be observing some things, like prices on a stock exchange, but you don't know what's driving them, what the underlying forces are. That's a really fundamental challenge.

For us, the challenge is predicting where technology is going to go. Think of it this way: once you pick something, you cannot change it easily, because lots of people are going to use it and it will be next to impossible to change. So you need to be really careful and thoughtful when you're picking the underlying technology for researchers to use. Picking a modeling language, for example, is one choice that has a big impact. There are so many languages: you have Julia, you have Python, you have Rust. So you're making a trade-off decision.

What is the best strategy? If you use a commercial proprietary tool and the company goes out of business, what's going to happen to us? Who's going to support it? For that reason, over the last 15 years or so, we've been gradually moving to open source tools with commercial support. Posit fits perfectly into that category: they support Python and R, they provide the tools for free, and they give us commercial support, which is what we need when things go wrong, as they sometimes do.

And what we noticed is that if you buy really good proprietary software, eventually it dies. End of life is inevitable, not just for people but also for software, and it actually happens much faster than you'd think. So our goal is to predict when this is going to happen, how quickly, and what we can do to defend ourselves. If you're asking what the challenge is for us, it's picking the tools and figuring out how long they're going to last.

Career advice

I think if you're really looking to build a career, the first thing is networking: being social and connecting with as many people as you can. That's my advice number one. Present when you can, go to meetups, and become a good speaker.

That's number one. Number two, I would say, is that helping other people succeed in their jobs will also help you succeed as an influencer and motivator. Inside SSGA we're very big on that: we mentor people, and senior executives mentor us. I think that creates very powerful cohesion within the company.

Another thing I would say is collaborating with people. Look at the eighties and nineties, when the U.S. was outsourcing a lot of work overseas. In the 21st century, it doesn't really matter where you are; what matters is collaboration and your skills.

Some people make career changes pretty often, and some people not so often. If you change careers often, say every two or three years, the problem is that you don't get to focus on big things; you focus on relatively small things. To make big changes happen in an organization, to make an impact, you really need time to sell your ideas to the organization, deliver them, train people, and maintain and support them afterward.

So Python, Rust, AI, PyTorch, that's machine learning: those are the areas I'm focusing on, and I'm trying to go really deep rather than too wide, focusing on things I can really be an expert in. If somebody asks me a question about an ML model, I'm able to read it, diagnose it, and correct it in a relatively short amount of time. If somebody asks me why their R code isn't working, I'm able to debug it and figure out what's wrong. To me, you really need a very disciplined focus on what you're doing.

I'm also an associate professor of computer science at BU, so to me learning is a continuous process. You need to think about that, not only because technology keeps changing for those of us who work in technology, so you have to keep learning, but also for yourself, to challenge yourself with different things. You're in the job market for a long time, like 25 or 30 years.

Forecasting when a tool will go obsolete

The first red flag for us is when people stop committing to the tool. Let's say you use something open source: you go to GitHub and look at how many people make commits to the tool. If you see that the rate of commits is dropping, that's a red flag. So whenever we use an open source package, we monitor that metric: how often people commit. If nobody has committed to a package for, let's say, a year, we know that thing is in trouble. Maybe the maintainer died, or nobody's interested anymore. Something is up.
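The two open source signals described above, a year without commits and a falling commit rate, can be expressed as a small check. This is a minimal sketch, not the team's actual tooling: it assumes commit dates have already been collected (for example, parsed from `git log --format=%cd --date=short`), and the 180-day comparison window, the function names, and the sample histories are all hypothetical.

```python
from datetime import date, timedelta


def days_since_last_commit(commit_dates: list[date], today: date) -> int:
    """Number of days between the most recent commit and `today`."""
    return (today - max(commit_dates)).days


def commit_rate_dropping(commit_dates: list[date], today: date, window_days: int = 180) -> bool:
    """True if the most recent window has fewer commits than the window before it."""
    recent_start = today - timedelta(days=window_days)
    prior_start = recent_start - timedelta(days=window_days)
    recent = sum(1 for d in commit_dates if recent_start < d <= today)
    prior = sum(1 for d in commit_dates if prior_start < d <= recent_start)
    return recent < prior


def open_source_red_flags(commit_dates: list[date], today: date, stale_after_days: int = 365) -> list[str]:
    """Apply the two heuristics: a year without commits, and a falling commit rate."""
    if not commit_dates:
        return ["no commits recorded"]
    flags = []
    if days_since_last_commit(commit_dates, today) >= stale_after_days:
        flags.append("no commits in over a year")
    if commit_rate_dropping(commit_dates, today):
        flags.append("commit rate dropping")
    return flags


# Hypothetical commit histories for two made-up packages.
today = date(2024, 12, 20)
fading = [date(2024, 1, 5), date(2024, 2, 10), date(2024, 3, 15), date(2024, 5, 1), date(2024, 9, 30)]
abandoned = [date(2023, 8, 1), date(2023, 10, 15)]

print(open_source_red_flags(fading, today))
print(open_source_red_flags(abandoned, today))
```

Comparing two equal windows rather than using an absolute threshold keeps the check fair to small packages that were never very active; a flag is a prompt to investigate, not a verdict.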

So then we investigate. If it's not open source but proprietary, the way we look at it is how often the vendor delivers releases, and whether those releases are maintenance releases or bring innovative features. Market leaders usually deliver releases every quarter, and a healthy company will deliver a mix of new features and maintenance. If it goes into maintenance-only mode for six months or a year, that's a red flag.

Going to customer conferences and talking to companies can also show you whether the number of customers is increasing or decreasing; you can just look at how many people come to the conference. Then there are technology upgrades. I'm not going to give you a name, but we had a vendor for 10 years that was bought by a bigger vendor, and the bigger vendor just stopped upgrading the underlying compilers; they kept using the same compiler. Compilers get a release every year; the GCC compiler, let's say, has a release every year. So if the vendor's product relies on a compiler, they stop upgrading it, and you see the same compiler for two or three years, that's a red flag.
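The proprietary-vendor signals above (roughly quarterly releases, a mix of features and maintenance, and a toolchain that keeps moving) can be sketched the same way. This is illustrative only: the `Release` record, the "feature"/"maintenance" labels, the compiler strings, and both release histories are hypothetical, and the thresholds simply mirror the numbers mentioned in the talk (four releases a year, six months of maintenance, two or more years on one compiler).

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Release:
    released: date
    kind: str      # hypothetical label: "feature" or "maintenance"
    compiler: str  # hypothetical label for the toolchain the release was built with


def vendor_red_flags(releases: list[Release], today: date) -> list[str]:
    """Heuristics: quarterly cadence, feature/maintenance mix, compiler upgrades."""
    if not releases:
        return ["no releases recorded"]
    flags = []
    ordered = sorted(releases, key=lambda r: r.released)

    # Market leaders ship roughly every quarter, i.e. about four releases a year.
    past_year = [r for r in ordered if (today - r.released).days <= 365]
    if len(past_year) < 4:
        flags.append("fewer than four releases in the past year")

    # Maintenance-only releases for six months suggest innovation has stalled.
    past_half_year = [r for r in past_year if (today - r.released).days <= 182]
    if past_half_year and all(r.kind == "maintenance" for r in past_half_year):
        flags.append("only maintenance releases in the past six months")

    # Walk back from the latest release while the compiler stays the same.
    current = ordered[-1].compiler
    first_on_current = ordered[-1].released
    for r in reversed(ordered):
        if r.compiler != current:
            break
        first_on_current = r.released
    years = (ordered[-1].released - first_on_current).days / 365
    if years >= 2:
        flags.append(f"same compiler ({current}) for {years:.1f} years")
    return flags


today = date(2024, 12, 20)
# Invented release histories for two hypothetical vendors.
stagnant = [
    Release(date(2021, 3, 1), "feature", "gcc-9"),
    Release(date(2022, 1, 15), "feature", "gcc-9"),
    Release(date(2023, 2, 1), "maintenance", "gcc-9"),
    Release(date(2024, 4, 10), "maintenance", "gcc-9"),
    Release(date(2024, 9, 5), "maintenance", "gcc-9"),
]
healthy = [
    Release(date(2024, 1, 15), "feature", "gcc-13"),
    Release(date(2024, 4, 15), "maintenance", "gcc-13"),
    Release(date(2024, 7, 15), "feature", "gcc-14"),
    Release(date(2024, 10, 15), "maintenance", "gcc-14"),
]
print(vendor_red_flags(stagnant, today))
print(vendor_red_flags(healthy, today))
```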


So we get worried: maybe the underlying engineering team is shrinking, or maybe the company just wants to use the product for cash flow. That's usually how you see a product dying in the marketplace: somebody buys it and runs it purely for cash flow, for as long as customers keep paying. That is a place where we don't want to be.

And it's hard to predict when somebody is going to buy somebody; this happens all the time. Tomorrow Posit could be bought by a bigger company, and that's okay. What we would be worried about is if RStudio releases slowed down in frequency. I think that would be... We're a public benefit corporation.

Monitoring open source ecosystems

Yeah, Spark is one of the main tools we monitor very actively, and we actually have the commercial version of it, Databricks. Spark has, I would say, over 200 committers, and those committers are not just individual people; they're companies. If you look at where those people work, you see major companies: Intel, Apple, Microsoft, IBM, and Databricks obviously dominates. When you see committers like that, it usually means they're using the open source project internally, running operations on it, and making changes based on that feedback.
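One way to turn "who are the committers working for" into a number you can track over time is a concentration measure. The sketch below swaps in the Herfindahl-Hirschman index (the sum of squared shares), which the talk does not mention; its reciprocal reads as an "effective number of organizations" behind a project. It assumes each commit has already been attributed to an employer (for example, by email domain), and the organization labels and counts are invented, not real Spark statistics.

```python
from collections import Counter


def org_shares(commit_orgs: list[str]) -> dict[str, float]:
    """Fraction of commits attributed to each organization."""
    counts = Counter(commit_orgs)
    total = sum(counts.values())
    return {org: n / total for org, n in counts.items()}


def herfindahl_index(shares: dict[str, float]) -> float:
    """Sum of squared shares: 1.0 means a single organization does everything."""
    return sum(s * s for s in shares.values())


# One entry per commit, labeled with the author's employer (invented data).
commit_orgs = ["org-a"] * 60 + ["org-b"] * 15 + ["org-c"] * 10 + ["org-d"] * 10 + ["org-e"] * 5

shares = org_shares(commit_orgs)
hhi = herfindahl_index(shares)
top_org, top_share = max(shares.items(), key=lambda kv: kv[1])
print(f"largest contributor: {top_org} ({top_share:.0%})")
print(f"HHI = {hhi:.3f}, effective number of organizations = {1 / hhi:.1f}")
```

A shrinking effective number of organizations over successive quarters would suggest the project increasingly depends on fewer companies, which fits the "is anybody still invested in this" question raised above.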

We also generally look at the discussions people have on the public Jira, because it really gives you insight into what's coming. Usually if you go to commercial vendors and ask what's coming, they say, oh, it's a secret. But if a product is backed by open source and people are committing to it, what's coming down the line is not a secret. You can see it in the discussions: if you summarize the discussions and read the Jira comments people make, you can actually see where things are going.

For example, recently there was a big change in Spark: Databricks stopped supporting SparkR and switched to Posit's sparklyr, and Posit is now an official partner. That change was not announced publicly until the Databricks AI conference in June this year. But if you read the Jira comments carefully, you could see it coming very clearly. We saw it around February or March; we already knew this was going to happen before they publicly announced the change.
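The talk doesn't say how the team summarizes Jira discussions. As a crude stand-in, counting how many comments per month mention a set of keywords can make a trend like the one described above visible before any announcement. In this sketch the comments are invented (not real Spark Jira text), the export format of `(date, text)` pairs is an assumption, and the keyword `"deprecat"` is chosen as a prefix so it matches "deprecate", "deprecated", and "deprecation".

```python
from collections import Counter
from datetime import date


def monthly_mentions(comments: list[tuple[date, str]], keywords: list[str]) -> dict[tuple[int, int], int]:
    """Count comments per (year, month) that mention any keyword, case-insensitively."""
    needles = [k.lower() for k in keywords]
    counts: Counter = Counter()
    for posted, text in comments:
        body = text.lower()
        if any(n in body for n in needles):
            counts[(posted.year, posted.month)] += 1
    return dict(sorted(counts.items()))


# Invented comments standing in for an export of public issue-tracker discussions.
comments = [
    (date(2024, 1, 9), "Fix flaky test in the SQL module"),
    (date(2024, 1, 22), "Should the R API be deprecated?"),
    (date(2024, 2, 3), "Add a deprecation warning to the R API"),
    (date(2024, 2, 17), "Docs: point R users to sparklyr"),
    (date(2024, 3, 2), "Deprecation follow-up for the R API"),
    (date(2024, 3, 11), "Migration notes for sparklyr users"),
    (date(2024, 3, 25), "Remove deprecated R examples"),
]

trend = monthly_mentions(comments, ["deprecat", "sparklyr"])
for (year, month), n in trend.items():
    print(f"{year}-{month:02d}: {n}")
```

Keyword counts are noisy; in practice someone still has to read the threads, but a rising count is a cheap way to decide which threads to read.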

So congratulations to Posit for quickly picking up the slack. From our perspective, I think it's one of the advantages of using open source with commercial support: you can actually see what's happening with the products and where they're going. You can look at development branches before changes make it to the main branch, and you can see the future. That gives you the ability to plan forward, which is important for us. We like to plan for a long time, and we don't like surprises.

Avoiding and exiting dying technologies

The way we look at technology, the first and most important thing is usability. If you see something that's not usable, that's probably the first indication not to go there. Usable means easy to learn, easy to test, and easy to maintain.

We invested a lot of effort into R, and R is not really easy to use. It doesn't have a built-in debugger, for example. So when you're trying to debug something without RStudio, and you're under operational pressure to fix a problem, you could be in a hell of a lot of trouble.

The other thing is sheer community size. Again, compare R to Python. If you look at the number of packages on PyPI, Python has close to 700,000 packages, and you look at the number of packages on CRAN, 37. So it's not even one order of magnitude; it's almost two orders of magnitude. When there is a big community behind something, I think it indicates where the market is going to go. If you pick something that's on the edge, you may find yourself with something that's slowly dying over time.

So the way we look at it: if you're doing operational reporting, maybe you're better off not with Databricks but with Oracle or Snowflake or some other vendor. But if you're doing analytics at a big scale, Databricks is the platform for you. I'm not trying to sell Databricks or anything; it's just an example of how we think about it: the technology has to fit what you're doing. And there are all those aspects: cost, usability, maintainability, whether it's going to enhance your personal market skills, all of that. But in the end, what matters is that the product you choose gives you ROI.
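The criteria listed above (cost, usability, maintainability, market skills) and the final ROI test can be combined in a simple weighted decision matrix. This is a generic sketch, not the team's process: the weights, the 1-5 scores, the tool names, and the dollar figures are all hypothetical. Every criterion is scored so that higher is better, so a high "cost" score means the tool is cheap.

```python
def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-criterion scores (here on a 1-5 scale)."""
    total_weight = sum(weights.values())
    return sum(weights[c] * scores[c] for c in weights) / total_weight


def roi(gain: float, cost: float) -> float:
    """Return on investment: net gain divided by cost."""
    return (gain - cost) / cost


# Hypothetical weights: usability first, as described in the talk.
weights = {"usability": 4, "maintainability": 3, "cost": 2, "market_skills": 1}
candidates = {
    "tool-a": {"usability": 5, "maintainability": 4, "cost": 2, "market_skills": 4},
    "tool-b": {"usability": 3, "maintainability": 3, "cost": 5, "market_skills": 2},
}

for name, scores in candidates.items():
    print(f"{name}: {weighted_score(scores, weights):.1f}")

# Invented dollar figures for the final "does it pay off" check.
print(f"ROI: {roi(gain=150_000, cost=100_000):.0%}")
```

The matrix narrows the field; the ROI estimate is what the funding discussion ultimately turns on.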

I was just thinking of a good example. Again, this is just a personal anecdote, but I really used to like Android phones because they were very cheap, open source, and all that. Then at some point I switched to the iPhone. It's expensive, it's a closed system, and it has lots of disadvantages. However, when you look at the iPhone, it keeps its value. If you look over the long term, let's say five years, Apple does maintenance releases every quarter, and once a year they do a feature release.

If you look at Android, the number of releases that deliver new features is actually much smaller. I'm surprised that even old iPhones continue to hold their value, whereas with an Android phone, say a Samsung that costs exactly the same as an iPhone, the initial purchase cost is very close, but over time the decline in resale value is much steeper. That tells you something about the companies' attitude toward long-term product maintenance.

Skills worth going deep on

Again, this is my personal opinion, not an official anything. When I was in the PhD program at WPI, they had this class called Concrete Mathematics. I was like, okay, what's concrete mathematics? Isn't mathematics abstract by definition? It turned out they just gave us a large, very broad set of problems and expected us to solve them. To me, the core skill I learned from the entire PhD program was that your strength is not in one particular thing; your strength is in solving as many different problems as you can.

That is what allows you to build up your technical skills. So when I'm teaching classes at BU, we solve problems from Kaggle, which is the main competition website, and from ARC Prize, which is another major competition. If you've never heard of ARC Prize, I highly recommend it. It's a $1 million prize competition where they give you, I think, about 500 puzzles, and they expect you to solve the puzzles in a way that's generic.

That's actually very interesting, because people say an LLM can solve all the puzzles. That is not true, because an LLM can only solve puzzles that other people have already solved. If it has never seen the puzzle, the chances of an LLM solving it are relatively low. So it's actually very interesting to try the ARC challenge. Out of 400, I've done just 25. Any single puzzle is not that complicated, but when you try to think, okay, how do I do this generically, not just solve specific puzzles in a specific way, you understand it's a very difficult problem.

So my recommendation: if you want to get better, do problems that are hard. Even if you don't solve them and you fail, you still learn something. And if you get really disappointed, you can go solve a somewhat simpler problem and give yourself a little reward. For me personally, solving 25 puzzles on the ARC challenge really works you out.


Kaggle too. Kaggle is a great site with competitions from different industries, and it lets you do a lot of machine learning: you join a team or form a team, and that's purely machine learning. ARC Prize is more toward AI. I try to do problems from different domains: some from pure mathematics, some from pure statistics, some from machine learning, some from AI. I try to get all around data science, because it's such a big area.

Thank you all so much. And Stanislav, was that the right link? Uh, yes, it is. Yeah. Okay. Awesome. Well, thank you again. Thank you. Have a great rest of the day, everybody.