Luis Lopez - earthaccess: Accelerating NASA Earthdata sci through open, collaborative development

Transcript

This transcript was generated automatically and may contain errors.

Hi, my name is Luis Lopez, and I'm a software engineer at one of the NASA Earth Science Data Centers. I'll try not to use a lot of acronyms, because that's, you know, a disease that NASA and other government agencies have, but I'll explain any acronyms I do use, and hopefully this won't be a boring talk for you. I'm going to present a Python package that has improved the experience of scientists trying to get data from NASA Earth science.

This is basically the gist of it. If you want one message to take home, it's that a couple of years ago we started developing this package, which makes it easier for scientists to access satellite data and reduces the time to science. I think this is a must for the times we're living in, and it has now become a small, community-led project that has helped a lot of different research groups accelerate their science. The other message to take home is that we have the coolest logo. You're probably an R user, and you may notice something particular about our logo: it fits right in with your ecosystem. The reason is that it was designed by Allison Horst, the artist behind a lot of the artwork for Shiny packages and other things in the R world. So we're really happy about that.

And we can all agree that it shouldn't be that way. Science is a collaborative effort, and my vision, which the people I work with share, is that it shouldn't be exclusive. Everybody who can participate in science should be able to, and software shouldn't be a limitation. No one should be left behind.

The earthaccess solution

So in late 2021, NASA partnered with Openscapes. If you were here last year, you know what Openscapes is; if not, it's a really cool project that teaches open science practices to cohorts of scientists across different organizations. NASA partnered with Openscapes to learn together: we know this is painful, and we know that data in the cloud is going to be problematic for scientists, so how can we all try to solve this problem together?

So we went back to what is problematic for scientists, and we divided the problem into the operations you have to perform to get to the data. Regardless of where the data is, you need to authenticate, you need to search for the data, and once you find it, you need to access it. Once we divided it that way, two years ago, it occurred to me that this should be a package, because those operations are very concrete. That's how the package I'm talking about came to be.

earthaccess is really simple. There are probably a lot of things going on under the hood, but for scientists it's basically three lines of code. First, they need a credential with NASA to log in. Then they search for something: you can use the dataset's DOI, if one is available, your bounding box, your temporal domain, and other parameters as well. Finally, they access the results. As you can see, even if you don't know Python, it's very concrete and very simple.
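For readers who haven't seen the package, the three steps (log in, search, access) can be sketched roughly as follows. This follows the usage pattern in the public earthaccess README (`earthaccess.login`, `earthaccess.search_data`, `earthaccess.download`) and is not the exact code shown on the slide. It assumes the installed earthaccess version accepts a `doi` filter in `search_data`, as the talk describes. `build_search_kwargs` is a hypothetical helper that makes the search parameters explicit and checkable without network access, and any DOI, box, or dates you pass are placeholders.

```python
from datetime import date


def build_search_kwargs(doi, bounding_box, start, end, count=10):
    """Hypothetical helper: assemble keyword arguments for earthaccess.search_data.

    bounding_box is (west, south, east, north), i.e. lower-left lon/lat then
    upper-right lon/lat, in degrees.
    """
    west, south, east, north = bounding_box
    # Simplified check: this rejects boxes that cross the antimeridian.
    if not (-180 <= west < east <= 180 and -90 <= south < north <= 90):
        raise ValueError("bounding box must be (west, south, east, north) in degrees")
    if start > end:
        raise ValueError("start date must not be after end date")
    return {
        "doi": doi,
        "bounding_box": (west, south, east, north),
        "temporal": (start.isoformat(), end.isoformat()),
        "count": count,
    }


def fetch(doi, bounding_box, start: date, end: date, local_path="./data"):
    # Imported lazily so the helper above works without earthaccess installed.
    import earthaccess

    # 1. Authenticate with NASA Earthdata Login (env vars, ~/.netrc, or a prompt).
    earthaccess.login()
    # 2. Search for granules (files) matching the dataset and space/time filters.
    results = earthaccess.search_data(**build_search_kwargs(doi, bounding_box, start, end))
    # 3. Access: download the matching files to a local folder.
    return earthaccess.download(results, local_path=local_path)
```

Separating the parameter building from the network calls is only a choice for this sketch. It lets the search definition be reviewed or tested on its own, while `fetch` stays as short as the three-line workflow described in the talk.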

Now, this is a real science example with earthaccess. It searches for data, retrieves what in NASA lingo are called granules (files from the satellite), and plots that data using xarray, the library we heard about in a previous talk. The plot shows sea level rise: the line on the right comes from the satellite, and the other comes from a different dataset collected in situ in the ocean. What you're looking at is the whole code for doing this, and it's not as long as it used to be, because earthaccess now lets scientists focus on their science rather than on the problems of accessing the data. With very few lines of code, you can reproduce by yourself the infographic that NASA's sea level group has on its webpage.
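As a rough illustration of that kind of workflow, the sketch below opens search results with xarray and then summarizes a sea level series as a linear trend. It assumes `results` came from `earthaccess.search_data` for a gridded sea surface height product. The variable name `SLA` and the `latitude`/`longitude` coordinate names are assumptions about that product. The helper functions are hypothetical and are not the code from the talk.

```python
import numpy as np


def open_as_dataset(results):
    # Lazy imports: only needed when working with real search results.
    import earthaccess
    import xarray as xr

    # earthaccess.open returns file-like objects that xarray can read directly
    # (depending on the format, an engine such as h5netcdf may need to be installed).
    return xr.open_mfdataset(earthaccess.open(results))


def global_mean_series(ds, var="SLA"):
    # Assumed variable/coordinate names. Weight by cos(latitude) so high-latitude
    # grid cells, which cover less area, count less; NaNs (e.g. land) are skipped.
    weights = np.cos(np.deg2rad(ds["latitude"]))
    return ds[var].weighted(weights).mean(dim=("latitude", "longitude"))


def linear_trend(t_years, values):
    """Least-squares slope of values vs. time, in units of values per year."""
    t = np.asarray(t_years, dtype=float)
    v = np.asarray(values, dtype=float)
    ok = np.isfinite(t) & np.isfinite(v)  # ignore gaps (NaN) in the series
    slope, _intercept = np.polyfit(t[ok], v[ok], 1)
    return slope
```

With real data, `global_mean_series(open_as_dataset(results)).plot()` would draw the curve. Applying `linear_trend` to that series after converting its time coordinate to decimal years would give a rate such as millimetres per year, provided the variable is stored in millimetres.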

The next thing we're trying to do with earthaccess starts from the fact that this is, kind of, a toy example. What scientists really want is to answer the planetary questions I mentioned earlier, so they don't deal with one or two files. They deal with archives that contain thousands or millions of files, and this has to scale. If we apply the same philosophy at that scale, the code shouldn't change, and neither should the semantics: what the scientist wants is the answer. So the next challenge for the library is how to handle distributed authentication, optimized reads, I/O, caching, and all of that cool stuff while keeping the API simple.
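The idea that the per-file code should not change when the archive grows can be sketched with a generic pattern. You write the analysis for one granule, map it over many granules with a thread pool, and cache repeated reads. This is not how earthaccess implements distributed access. `make_cached_reader` and `map_granules` are hypothetical names, and the `read` function is whatever actually fetches a granule. In practice, a tool like Dask would often take the place of the thread pool for truly large archives.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def make_cached_reader(read: Callable[[str], bytes], maxsize: int = 256) -> Callable[[str], bytes]:
    """Wrap a (hypothetical) granule reader so repeated reads of a URL hit memory.

    Note: lru_cache keeps its own bookkeeping thread-safe, but two threads asking
    for the same uncached URL at the same moment may both call `read`.
    """
    return lru_cache(maxsize=maxsize)(read)


def map_granules(analyze: Callable[[bytes], T], urls: Iterable[str],
                 read: Callable[[str], bytes], workers: int = 8) -> List[T]:
    """Apply the same per-granule analysis to one granule or a million.

    Reads are I/O bound, so threads overlap network waits; output keeps input order.
    """
    def one(url: str) -> T:
        return analyze(read(url))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(one, urls))
```

The `analyze` function is the part a scientist writes once. Scaling from a handful of files to a whole archive changes only the list of URLs and the number of workers, not the analysis itself.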

A bridge between ecosystems and languages

In a way, I think the project has become a technical and social bridge between the APIs NASA has produced over the years, and is still producing, and the libraries scientists use to work with that data. You can use earthaccess to get some data and put it in pandas or open it in xarray, and we have this interoperability because the ecosystems are connected. Fortunately, that has resonated with scientists, because it's like, wow, this is not as complicated as it used to be. What started with only me and a few others has grown to the point where a lot of people are using the library, because it makes sense. It's simpler.
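As a small sketch of that interoperability, granule metadata returned by a search can be flattened into a pandas DataFrame. This assumes each search result is a dict-like record holding CMR UMM-G metadata under a `"umm"` key. The field names `GranuleUR`, `TemporalExtent.RangeDateTime`, and `RelatedUrls` come from the UMM-G schema. `granules_to_records` and `to_dataframe` are hypothetical helpers, not part of earthaccess.

```python
from typing import Any, Dict, Iterable, List


def granules_to_records(granules: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: flatten UMM-G granule metadata into one row per granule."""
    rows = []
    for g in granules:
        umm = g.get("umm", {})
        rng = umm.get("TemporalExtent", {}).get("RangeDateTime", {})
        # Keep only direct data links ("GET DATA"), not browse images or documentation.
        links = [u.get("URL") for u in umm.get("RelatedUrls", [])
                 if u.get("Type") == "GET DATA" and u.get("URL")]
        rows.append({
            "granule": umm.get("GranuleUR"),
            "start": rng.get("BeginningDateTime"),
            "end": rng.get("EndingDateTime"),
            "n_links": len(links),
            "first_link": links[0] if links else None,
        })
    return rows


def to_dataframe(granules):
    import pandas as pd  # lazy import: pandas is only needed for this step

    df = pd.DataFrame(granules_to_records(granules))
    # Parse ISO 8601 timestamps so the table can be filtered or sorted by time.
    for col in ("start", "end"):
        df[col] = pd.to_datetime(df[col], utc=True)
    return df
```

Keeping the flattening step in plain Python means the same records could feed another tool instead of pandas, which is in the spirit of the bridge described above.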

The other side of this is that I thought from the beginning that this shouldn't be language specific. Carl Boettiger, a professor at UC Berkeley, had the idea of taking part of earthaccess, the login mechanism, and doing it in R for R scientists. That started a project called earthdatalogin, which has some of the core things that earthaccess does. My hope is that this becomes an SDK, because that's what scientists will need in the end: they are the same operations, and they shouldn't be limited to one language. Whatever language a scientist uses, that's where earthaccess should live. We've had some conversations with people who work with Julia, so that will be the next one.

And I think this is the message: if we reduce accidental complexity and make things simpler for scientists, we help them actually solve the problems, because we're not the ones solving them. That is the complicated part. I think the technical side, on the left of this graph, is the simple part. The really complicated thing comes once you have, you know, the biggest study of what's happening in the Earth system, like what's happening to the Amazon or what's happening in the rivers: how do you transform that into policy? That is the complicated stuff. So on the technical side, I invite you all to help your local scientists solve that part.

These are some of the people who have helped with the library over the last two years, and you might see some familiar faces. With that, thank you, and I'm happy to take your questions.