Resources

More productive but a lot less fun — with Charlie Marsh

video
Feb 25, 2026
1:39:59

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

Welcome to The Test Set. Here we talk with some of the brightest thinkers and tinkerers in statistical analysis, scientific computing, and machine learning, digging into what makes them tick, plus the insights, experiments, and OMG moments that shape the field.

In this episode, we sit down with Charlie Marsh, the creator of Ruff, a mind-blowingly fast Python linter. He's also the founder of Astral, where he and his team have built tools like uv for package management and Ty for type checking, and I have to say they've really taken the ecosystem by storm. Later in the conversation, Charlie talks about his heavy use of AI for development, including whether his AI use reduces his team's confidence in his PRs today. And regarding AI, he told us, "I think it's a hard time to be a leader." And I'm still chewing on that.

Alright, let's get to it.

Alright, Charlie, welcome to The Test Set. We're in Times Square, and joined by — In New York City. In New York City, I'm so sorry. Joined by my co-host, Wes McKinney. So excited to have you on. I'm a giant fan of Ruff, and I know you started a company a few years ago, so you're the CEO of Astral, making blazing fast Python tools. Thanks so much for coming on.

Absolutely. It's great to be here. I live here in New York. I don't come to Times Square every day, so it's also exciting to be here in Times Square and on the show. Yeah, I'm Charlie. I'm the founder and CEO of Astral. We build high-performance developer tools focused on the Python ecosystem, so we're very focused on our open source tools. Ruff is one that you mentioned. It's a static analysis tool, and then we also build uv, which is a package management tool, and a couple others.

Yeah, awesome. Yeah, and I'm so excited. I saw you have Ty recently, too, which is doing a lot of the type checking. Yeah, exactly. Yep, yep. So we recently released Ty. It's our type checker and language server, and so that came out of beta just before Christmas.

So, you know, we kind of work across — we think of it as like two domains. We build a lot of static analysis tooling, so we have Ruff, which is our linter and formatter, Ty, which is our type checker and language server, and then we have uv, which is our package manager, and then we also do our own builds of CPython. So there's kind of two big areas that we play in in the open source.

Yeah. Yeah, I feel like it's wild. Like the past few years, I feel like Ruff, uv, and I'm sure Ty have just taken over kind of everywhere, and I'm so curious to talk a little bit more about those tools. And I do also want to flag, like I've seen a lot of your recent tweets of using things like Claude Code and AI. Yep. I'm so excited to get into both this really interesting set of blazing fast tooling, but also this AI component that has sort of emerged, I feel like, over the last —

Yeah, 100%. Yeah, I'd love to go deep on that. I mean, I feel like work is changing so fast, even month over month, like the way that I work, the way that I see our users working, and even the way that we build our own tools, you know, Ruff, uv, Ty, the way that we've tried to fold in Claude Code and Codex and all these other tools has really changed how we develop. I think it's also changing in ways that we, as an industry, haven't really figured out. It's also changing open source in a lot of ways, and what the engagement and the interactions look like between contributors, community, maintainers. So, yeah, I mean, there's like no shortage of things to talk about there.

Charlie's background and origin of Astral

Yeah. I think one thing I'm curious about, like, before we jump into — I'm curious to hear a little bit more about you and kind of how you got into software engineering and stuff. How'd you get into the field?

Let's see. I mean, I taught myself to program the summer before I went to college, so that's what I did with my summer classes. I read a textbook on programming with Java. Java was my first programming language. And then by the time I got to school, I think I really enjoyed it. I was pretty convicted that I wanted to study computer science, so I did. I went to Princeton. I studied computer science, and from there, I've worked as a software engineer ever since. So my first job out of school was at Khan Academy, an education technology nonprofit. I was there for a couple of years, and then I went and worked at a computational biology company, so I had no bio background. I also had really no, like, ML or computation background, so I was largely responsible for the software infrastructure and the data infrastructure and a lot of machine learning infrastructure that we built up over the course of a few years.

And, you know, it's like, I guess the way I look at why I started to work on all this Python tooling was that, well, I think there were, like, a few threads that came together. Like, one was everywhere I'd worked, we just had, like, a lot of Python, especially when I was working at Spring, which was this, you know, AI-enabled biotech company. Like, we were doing scientific computing, so we just had, like, tons and tons of Python. And I was always on kind of small teams trying to do a lot, and so we were trying to take advantage of tools. And Python, to me, always felt like a language that was succeeding in spite of its tooling. Like, it just kept growing and growing and growing, but the tooling, I mean, there were things happening, but it didn't quite feel like it was accelerating the language.

Like, I remember the first time I met you, I was introduced to Charlie when he was just starting Astral, so we'd never met before that. And I think I had just heard about Ruff, or, like, it was just Ruff at that point. And so I remember meeting with you and thinking, I'm like, okay, like, he seems really smart, there's Ruff, Ruff's clearly very fast, it's a linter. I was like, okay, a company for a linter. How could a linter be a company? Right, right, right. But, like, oh, he seems really motivated, so, yeah, like, you know, let him run with that and see what happens. But I think it's, to me, it's honestly amazing what you and the team have pulled off, like, in a very, in a short amount of time.

Oh, thank you. Because, in a sense, you know, I've been involved in Python for a long time, for, you know, it's almost 20 years now. I think I started doing Python in 2007. Wow, powerful. And so, like, things have obviously grown and changed a lot during that time. But there was always, like, this feeling that, with each passing year, Python is getting better and worse at the same time, if that makes sense.

Oh, like, we have all these nice tools and libraries and things, and look at all these problems we can solve. And the community of Python developers is getting bigger, and more and more people are choosing Python. I remember just getting from the point where Python was a language that was, like, a niche, like, hacker language, or something that was, like, a little bit underground. Like, you had to be an enthusiast. If you ever remember, like, Paul Graham's, you know, Python paradox essay. I was going to say, yeah, yeah. So, a lot of people were choosing Python, but it was not a popular thing. Like, the fact that, like, companies such as Dropbox were using Python was considered avant-garde. Like, oh, Reddit uses, Reddit uses Python. So, people would talk about companies using Python, but it wasn't a mainstream, a mainstream language. And so, somewhere in the early 2010s, it crossed the chasm from being, like, not a mainstream language to being a mainstream language. But then there was this kind of explosion of, like, tools and development and different things. And then you had, like, everyone grappling with, like, gosh, like, how do we, like, manage our environments? How do we deploy our code? And basically, just things turned into this giant, like, giant tangled mess where, like, you know, every project was doing things a little bit different way.

And so, you know, companies would do things differently and be like, oh, we're using, you know, this packaging and virtual environment management tool. And, like, oh, we're using Conda over here. Or, like, there's many different ways you can install and deal with Conda. And there's different package channels. And, like, it was just a giant, a giant mess that everyone would complain about and that was making nobody happy. And then there was also, like, the friction between the scientific, data, machine learning ecosystem, now the AI ecosystem, and the traditional, like, OG, you know, Python web developer community. And so, you had folks who didn't have any compiled code at all, like, no C extensions, no Rust extensions. Maybe this was even pre-Rust, like, we weren't even thinking about Rust at that point.

And so, we would have these conversations. I remember, at one point, we talked to Guido van Rossum, the creator of Python, about the whole packaging conundrum and how it's a giant mess. And his reaction was kind of like, oh, well, maybe you should, you know, build your own packaging and installation solution to solve this problem. So, I think that... And that became Conda, right? And that became, yeah, that became Conda and, like, what's now, you know, conda-forge and all that. But, yeah, I think the fact that you stormed onto the scene and made a difference in a short amount of time, considering how many failed attempts there were at doing that, is, frankly, kind of amazing.

And so, like, I think one question I have is, like, what, you know, kind of, you know, what made you decide, like, you know, I can fix this, like, I'm going to do this, you know? Like, you know, did you kind of visualize, like, where things would be today and, like, how rapidly the Python ecosystem could change and then, like, what are some of the things that you think, like, what were the catalysts, like, what was the tipping point or, like, how do you feel like that rapid, you know, hockey stick type growth for uv in particular happened?

Yeah, yeah, definitely. So, I think when I decided to work on this full time and start a company and bring people on board, like, I certainly didn't have everything figured out. I mean, I probably still don't, although I hope I have, like, better ideas now than I did then around how the company will grow. But I did have at least a clear set of things that I thought we could build. And for me, it was really, like, Ruff was really starting to succeed. And I think it was also showing me evidence that people would change their tools if you could do a few things right. Like, one, you had to build something that I knew was, and people really believed was, significantly better than whatever they were using. Like, it's very hard to get people to switch tools if it's going to require, like, a bunch of changes in their workflow and it's only, like, a little bit better in, like, these one or two ways. Like, why would anyone spend time on that?

And with Ruff, I think the way that I tried to view it was, okay, we want it to be, like, very compatible with your existing tools today. And, you know, like, orders of magnitude faster, hopefully. And so for a lot of teams, it was, like, well, if it's drop-in compatible and it's orders of magnitude faster, it inverts the calculus: like, why wouldn't I switch? Like, we should do it. And, you know, I think the project was also bolstered a lot by early adopters who had important projects and big presences and were willing to give it a shot. Like, Sebastian Ramirez from FastAPI, like, he was pretty early in the issue tracker asking for things. And I was, like, okay. Or, you know, or, like, Samuel Colvin from Pydantic. I just thought, if we're just kind of ruthless about, like, thinking about what those people need in order to migrate and, like, really focus on trying to do those things, I think it gave the product, the tool, a lot more social credibility early on.

So, you know, a couple of projects migrating over. And then we started to get, like, projects that are much more established, like SciPy, you know, like migrating over. And I was, like, okay, if, like, that's happening, if those projects that have been around for decades are willing to adopt this sort of radical, like, Rust-based tool where I'm shipping a release, like, every day and I have no versioning policy and, like, blah, blah, blah, then to me that was basically a sign that there was a lot of demand if we could build something that was really good.

I think I always assumed that the packaging adoption curve would be much more difficult because it's, you know, it's closer to production. There's, like, a lot more ways that people work. People are very ingrained in their workflows. And I think we did a couple of things right there. Like, one is we started with this uv pip interface, and, like, actually when we first shipped uv, that was all there was. And a lot of people were, like, oh, that's weird. Why is it, like, uv pip? It should just be uv install, blah, blah, blah. But it was very intentional. It was, like, that's kind of, like, our compatibility layer. It's kind of the old world. Yeah, in some ways. And, like, I think, you know, people didn't realize what we were even pitching with the new world at that point, which is fine. But it was, like, that was the thing that we shipped to start.

How uv works

And just in case, I think uv's kind of, like, taken over the world. But just in case someone hasn't heard of uv, how would you, like, explain it to a general audience? Yeah, yeah, of course. Yeah, I mean, I think I view uv as our sort of, like, all-in-one Python packaging binary or toolkit. So, you know, it's a single binary that you install that will install Python for you, you know, manage your Python installations. And those are Pythons that we build. It'll automatically create virtual environments for you. It'll automatically lock and resolve your dependencies across platforms.

So it's, you know, it's designed to be extremely fast. And like our other tools, we really thought about in building it, you know, we wanted to build the entire stack in, like, a very cohesive way. I think one of the things that's been hard in Python tooling is you often have, like, lots of different tools that are built separately or built on top of one another. And, you know, for us, it was like, okay, we're actually going to try and do, like, you know, basically everything ourselves from, maybe except the runtime. Like, we do our own Python build, so we don't have our own runtime. But then it's like the virtual environment layer, the lock file, the resolver, the package installer, you know, it's all one system.

And when we launched, you know, we were very focused on compatibility because we wanted to build something that was easy to adopt, that demonstrated, you know, the performance piece, and let us get something out and get it, you know, battle tested. So, like, we just had, like, uv pip install, uv venv. Like, these were basically meant to be very close to drop-in replacements for, or drop-in alternatives for, like, you know, pip and virtualenv. And that actually grew a lot. But after six months or so, we launched, like, a whole new interface, which was, like, uv run, uv lock, uv sync. And that was, you should be able to clone a project and just do, like, uv run. And then we resolve the dependencies, install Python, create the virtual environment, like, wire the dependencies up to your runtime and run the command. And, like, all of that should happen without you having to really think about it.
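The two interfaces Charlie describes can be sketched as a terminal session — a rough sketch, not from the interview; the project and package names are placeholders:

```shell
# Compatibility layer: near drop-in alternatives to pip and virtualenv
uv venv                    # create a virtual environment
uv pip install requests    # pip-style install into that environment

# Project interface, launched about six months later
uv init my-app             # scaffold a new project (placeholder name)
cd my-app
uv add requests            # declare a dependency; updates the cross-platform lock file
uv run main.py             # resolve dependencies, install Python if needed,
                           # create the venv, wire it all up, and run the command
```

The point of the second interface is that `uv run` does the whole chain in one step, so a freshly cloned project needs no manual setup.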

Building the team at Astral

I'm so curious. I know, like, Ruff, like, between releasing Ruff and I think between releasing uv, you started a company. Yes. And became CEO. Yeah. And, you know, so you're using, like, a lot with uv. I'm really curious the kind of design and the process that went into that from being, like, I'd imagine when you were building Ruff, you were sort of, like, a solo creator. Yeah. And kind of up to the point of uv, you might have had a crew. Yeah. Can you tell us, like, a little bit about that?

Yeah, yeah, of course. Yeah, I mean, building Ruff was, like, I mean, at least the version up until we hired our first team member, who started, like, March 2023. Actually, between starting a company and hiring our first team member, I also had a kid. So it was a really crazy period in my life. But, so, like, Ruff, yeah, the initial Ruff code base, it was me sort of figuring out how to do things. I think, like, our code quality has gone up a lot over time, especially as we've hired people who are much better than me at Rust or more experienced than me at Rust. But, yeah, we, like, you know, we started working on Ruff. And we grew the team. We've always grown the team, the word I often use is, like, you know, deliberately. Like, we're usually hiring, but we're not usually, like, hiring, like, a million people. So, you know, we grew the team from, like, one to two to three to four. I think when we launched uv, we might have been, like, six or seven total.

Are you all remote? Yes, we're all remote. So I'm here in New York. We have two other people here in New York. I mean, they just sort of happened to be here in New York. And then we have team members. We're about 23 now. And we have team members from Pacific all the way through to CET, like, Germany, Switzerland. And then we also have one team member in India. So we, like, yeah, I mean, we try to span, like, a lot of time zones and learn to work together remotely, you know, in an effective way.

But I would say, like, you know, in terms of building out the team, it's been pretty amazing. Like, I think I'm, like, stunned at the quality of the team that we've been able to put together. And, you know, I think a lot of that is we're kind of a unique and different company. Like, they're just, you know, if you want to work on programming language tooling, like, written in Rust, and now we have, like, massive distribution with the stuff we've built. So everything we ship, you know, has a really big impact. You know, open source, remote, it's, like, we've been able to bring in, like, I think a really amazing team.

And, you know, there were sort of, like, a couple, I guess, philosophical points that I've thought about over time with building the team. Like, one is we've kind of, like, intentionally tried to hire people from outside the Python ecosystem, because I wanted to bring in, like, a lot of these different ideas from, like, how other programming ecosystems work. So we've brought in a bunch of people who, I think, would be best known for, like, their work in the Rust ecosystem, for example. Mika, who was, like, the third person to join the team, had, like, basically never written any Python before, but he built very similar tooling for TypeScript. And so he brought a lot of, you know, just, like, knowledge about what it looks like to build these tools correctly, but in a different context. And I've learned, like, so much from him.

At the same time, I think it's really important that, like, everything we do, all of our success, basically, I think comes from having a really good understanding of our users. It's not, like, a unique observation. Like, I think most companies would say that. But, like, for us, it's, like, a lot of the ideas for Ruff and uv came from our own time working in Python. So we've also brought in people who have been involved in the Python ecosystem for a very long time. Like, Carl Meyer on our team was, like, one of the authors of virtualenv, and he was very involved in Django early on. And so, you know, we try to make this mix of people with very different programming backgrounds, and, like, bring them together in a very collaborative and fun environment. And I think that's been really successful for, you know, for trying to build tools that work for a lot of different people.

I mean, I think, to your point, it's, like, building in Python is very, it's interesting and it's challenging because it's, like, every company on earth is using Python for something within, like, some margin of error, right, for something. But, like, the things they're doing can be super different. Like, you know, some companies, it's, like, they're building a huge web application, it's a giant monorepo that's, like, all Python. Some companies, it's, like, they're all Ruby, but, like, they have an AI/ML or data science team, and, like, they're obviously all using Python. Some are, like, research labs, even that's pretty different. Some are, like, it's all build scripting or networking or whatever else. And so you're building it for, like, a very, very diverse user base, and we're trying to build things that work both for beginners but can scale to enormous projects. So we've put all these hard design constraints in place where we're trying to satisfy all these different people.

I think we do a good job managing it, but to me it feels quite different than, I mean, the TypeScript ecosystem is absolutely enormous, but, like, most people building in TypeScript are building web applications, and the applications can be very complicated. Like, it's not to say that they aren't, but, like, the user story, I think, is a little bit more straightforward. I mean, there are people doing all sorts of things with TypeScript, but, like, you know, by and large, a lot of people are building web applications or websites. And so for us, it's, like, we have our feet in all these different communities. I mean, just like Python is.

Python's diverse ecosystem and the Conda world

Yeah, yeah. I mean, I think from what I've seen, you know, from, like, the segment of the Python ecosystem that I come from, you know, originally the SciPy, NumPy, scientific computing ecosystem, which turned into, like, now the PyData, you know, data ecosystem. Yep. And now, of course, it's, you know, all AI and, like, all the most popular Python repositories on GitHub are AI-related. I actually looked at, like, the top 100 Python repositories by GitHub stars, and most of them I haven't even heard of. There's something related to AI or LLMs. Yes, yeah, yeah, yeah. It's totally mind-blowing. But it's interesting because I think in my world, like, I've had a lot of exposure to companies that have a lot of stuff that's written in Python, but then a lot of other software which is not written in Python, and stuff that has to be, you know, built and deployed. Yep. And so often it's not library software, but it's stuff that has to be built and deployed with the Python. And that's why, like, even still, for many years and even still, like, I became a big proponent of the, you know, like the Conda, CondaForge way of doing things, basically a, you know, cross-language binary packaging system where you can have all of your system libraries, like all of your C, C++ libraries, Rust libraries.

Can you give some context to Conda? Like, what's Conda to kind of a general audience? Yeah, so Conda is a packaging tool originally written in Python. Now there's a parallel C++ resolver and installer backend called Mamba, or libmamba, that powers Conda. But originally it was all written in Python. It was developed by Continuum Analytics, which is now Anaconda. And it was basically like an evolved approach to Python packaging that would allow you to package Python libraries, but also non-Python application dependencies. And so the idea is that you could have an application that has R in it, or that has, you know, something written in really any programming language. You could imagine it as being like a non-OS-specific version of a Linux distribution.

So if you think about like using apt or yum or a Linux package manager, you have access to this whole compendium of like all these libraries and tools that have been packaged so that they can be installed in Linux. And so whenever you're doing like apt install this tool, apt install that tool, you don't care what programming language that is as long as the libraries and the bin files end up in the right place. And so Conda is like that except that it works on every OS and it's not OS dependent. So you can have a single Conda environment definition that you can use on Windows, you can use on macOS, you can use on Linux. And they figured out how to like make the binaries that ship in a Conda package channel portable across Linux distributions. Python also had to solve the same problem of like Linux distribution portability for binaries, with the manylinux specifications. So Conda essentially was like a whole parallel world that was created, you know, initially for solving packaging and distribution for polyglot stacks. So like, yes, Python, but also all the other things. Like a lot of the banks in New York City are Conda users because they have large, complex C++ code bases that they need to package and version, and version pin alongside their Python libraries. And so using pip to manage everything, or now uv to manage everything, actually wouldn't be practical because you would have to figure out how to take those non-Python things and turn them into things that uv and pip can install.
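As a concrete sketch of the cross-language story Wes describes — the package names below come from conda-forge, and the environment name is arbitrary:

```shell
# One resolver handles Python, R, and C/C++ libraries together,
# from the same channel, under one set of version constraints
conda create -n polyglot -c conda-forge python=3.12 r-base libprotobuf grpc-cpp

# The resulting environment definition is portable: the same spec
# can be solved on Linux, macOS, or Windows
conda env export -n polyglot > environment.yml
```

Because the C++ libraries are packages in their own right, two Python packages that both need libprotobuf can link against the single shared copy in the environment rather than each bundling their own.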

But this is interesting because this is actually the exact problem that all the AI libraries have had to solve to figure out how to get like PyTorch and all the, you know, gigabytes of CUDA runtime libraries just magically pip installing. Things went through like a messy period where you could pip install PyTorch, but there were like PyTorch versions for different versions of the CUDA runtime. Right. And it was just a giant mess and very painful for everyone. And so I can imagine like with the rapid growth and everyone wanting to be able to, you know, uv pip install or uv add packages that depend on, you know, complex things. Essentially it's like the Python part is the small part. It's actually these gigabytes of like, you know, CUDA libraries and stuff that all the packages depend on, that also now have to get installed. Otherwise people get into this morass of like, oh, I have to go to the NVIDIA website and figure out like, oh, this Python package depends on what version of the CUDA runtime. Like make sure I install exactly that version. Yeah. And I don't have like a version conflict where it doesn't link when I, you know, import the Python. I've of course suffered greatly from these problems over the years. So I feel like you've solved it. Like this problem has been solved at least for the AI stack, but still there's a lot of other underserved applications where, like, if you want to have Python and other stuff, I think Conda is probably still the right solution, but maybe in the fullness of time, kind of the uv world can expand to maybe solve that problem and do it better.

Just kind of like recap, you're saying like Conda, like if I want to install GeoPandas as a classic thing, it's like GeoPandas has some Python, but also like a bunch of other libraries. Well, I'm sure that you can install GeoPandas with pip, but what happens with the Python Package Index — so, like, I understand you're also building an alternative to the Python Package Index, or maybe something like that, we can talk more about that — but the idea is that whenever you deploy a library like PyArrow or GeoPandas to the Python Package Index, there is a requirement that all of those binary packages that you deploy are self-contained. And so any system libraries that they depend on, so if there are C or C++ libraries or Rust libraries that your package depends on, those have to all be fully self-contained. So they have to be either bundled and shipped inside the wheel, like inside the Python package, or they have to be statically linked inside your C extensions that are inside the library.
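On Linux, that bundling requirement is typically automated with auditwheel, which vendors the external shared libraries into the wheel — a sketch, with a hypothetical package name and wheel filename:

```shell
# Build a wheel; its C extensions still link against system shared libraries
pip wheel . -w dist/

# auditwheel copies those external .so files into the wheel and rewrites
# the extension modules to load the bundled copies, yielding a
# self-contained manylinux wheel that satisfies the index's requirements
auditwheel repair dist/mypkg-1.0-cp312-cp312-linux_x86_64.whl -w wheelhouse/
```

The trade-off is exactly the one discussed next: every wheel carries its own private copy of each library, so two packages can end up loading incompatible copies of the same dependency.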

And so the outcome of that, you know, for better or for worse, is that whenever you pip install stuff or uv pip install stuff, you end up with a lot of libraries that have all these C and C++ libraries that have been statically linked. So just tying back to uv, you're saying Conda solves a really big problem. Like if I'm pip installing and it requires like other things, I'm going to have a bad time. But uv has made a really good time in Python. Like it's fast. Are you wondering like will uv cover the non-Python part?

Yeah, I don't want to go too far down the technical rabbit hole because a lot of people will be like, oh my gosh, what is Wes talking about? But just to give you an example, one library that often ends up statically linked into Python packages is gRPC. So that's Google's RPC messaging library. And so there's a lot of projects that need to use gRPC, but if you want to use gRPC, you have to statically link it. And so you might install two Python packages that have different versions of gRPC, which also depends on protobuf. And so there's situations that people get into where they import Python libraries in the wrong order and they get some kind of weird like linker error, because when the two libraries load, there's some conflict between incompatible statically linked versions of libprotobuf. And this is still a problem that plagues us greatly.

And so the better scenario would be if the offending library dependency, libprotobuf or libgrpc, is actually something that can be managed by uv, so it isn't having to be statically linked. And the static linking is something that's required by the Python Package Index. And so I'm optimistic because I think this is a problem that Charlie can solve. This is you. He's solved all these other problems. Lighting a candle. I'm getting to Charlie. Charlie, please. I believe in Charlie. We brought you on here.

Well, I mean, I actually feel kind of lucky because I think we started building uv at a moment in time where it became very practical in some sense to build, because there was a lot of work that happened in standards in the years that preceded working on uv. Like I think it would have been a lot harder to build uv a few years before. You mean like pyproject.toml? Yeah, yeah. Just like there were a bunch of Python standards that happened that formalized various behaviors or made it so more things were available statically and you didn't have to depend on Python to do a bunch of other things. And I remember, I'll probably misparaphrase this slightly, but after we released uv, one of the reactions that I saw from a pip maintainer was something along the lines of: I actually think it's great that this is possible. Like it's very good for the ecosystem for there to be multiple installers that follow the same spec and the same standard. And, maybe ending the paraphrase, now me talking, you know, I think the fact that that's possible is actually very good for the ecosystem. You don't just have pip as a de facto standard of how things work. You have a set of standards that are written down and codified and evolved through a governance process, and then tools can actually implement those.

And so, you know, that wasn't even obvious to me when we started working on it. It was only obvious once I got deeper into packaging that there were a bunch of things where we were basically implementing written standards that were pretty new, and the ecosystem had just evolved a lot. And I think it's similar with bundling extension modules into Python packages, where you're including Rust or C or C++ code or Fortran or whatever else. That is now so normal, and there's been a lot of standards work to make that happen. I think at the period of time when Conda emerged, that was not seen as something that the Python ecosystem was treating as a priority. And now, that's the superpower of Python, at least from my perspective. The fact that you can do that, I think, is really amazing.

And we're continuing to try and push the standards there. Like, we're working on a proposal with people from NVIDIA and PyTorch that we call wheel variants. It's now up for comments, so there will be a big debate around whether that actually happens. But there's still a lot of evolution, I think, in terms of those problems. And this is for, like, you know, CUDA runtime versions? Exactly. Yeah. This is for being able to encode, right now when you build a Python package, you know, a file someone can install with Python, you encode certain things like the operating system, like Windows, Linux, macOS, and maybe the CPU architecture, like x86 or ARM. But there's a bunch of stuff that you can't really encode in any way. Like, does it depend on CUDA, and if so, what driver version does it need? And so this is trying to generalize that problem in a way that, I think, parts of it are definitely learning from the Conda ecosystem and looking at how Conda approached things.
And that's sort of like what I always view as like the most important part of standards and like language and ecosystem design is like you have to learn from what other people have done. You don't have to do exactly what they did, but you should be very curious about why they made certain choices and how this has played out.
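To make the encoding gap concrete, here is a minimal sketch (not a spec-complete parser) of what a wheel filename carries today under the binary distribution format: name, version, an optional build tag, then Python, ABI, and platform tags. There is no slot for something like a required CUDA driver version, which is the gap the wheel variants proposal targets.

```python
def parse_wheel_filename(filename: str) -> dict:
    # Wheel filenames are dash-separated; package names are normalized
    # (dashes become underscores), so splitting on "-" is safe here.
    stem = filename.removesuffix(".whl")
    parts = stem.split("-")
    if len(parts) == 6:  # optional build tag present
        name, version, build, py_tag, abi_tag, plat_tag = parts
    elif len(parts) == 5:
        name, version, py_tag, abi_tag, plat_tag = parts
        build = None
    else:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": py_tag,
        "abi": abi_tag,
        "platform": plat_tag,
    }

info = parse_wheel_filename("torch-2.1.0-cp312-cp312-manylinux_2_28_x86_64.whl")
# The platform tag encodes OS and CPU architecture, but nothing about CUDA.
print(info["platform"])  # -> manylinux_2_28_x86_64
```

The torch filename here is just an illustrative example of the format, not a claim about any particular release.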

So, yeah, that's just to say, I think Python packaging is still evolving a lot, and I'm optimistic that we can push it forward. There are also things that we've done in uv where either there are ambiguities around how things should behave, and it leads to a lot of clarification, like, well, the spec isn't clear about this, what should we do? Or things we've done in uv that have then turned into standards proposals, which has been cool to see. So I feel like that is an interesting kind of challenge, or maybe design path too, with Ruff and uv, of ambiguities, or, I know a lot of these tools you mentioned, compatibility. Yeah. So really being sure that it enhances people's experience but is largely compatible with, say, pip, or like how Ruff used a lot of existing linting rules. Yep. Just a great kind of bootstrap, or steps, it feels like, into the tool, because it's often things people already know.

Yeah. I mean, I would say we try to be very intentional about where we knowingly are incompatible or make very different decisions, and we try to document those. An example would be, with pip and with uv, if you try to install packages across multiple different indexes, like maybe you have your own private index and PyPI, and you want to configure the tool to look at both, how the tools make decisions about which one to look in and where to get packages from are different. And that was a very intentional choice that we made, largely because, you know, everyone has different opinions, and we thought our design was more secure. And so we did that. We got a lot of user complaints about it, but it was because people were like, this is different. And in some sense that's very justified. But in another sense, we try to be very principled about where we, I guess I would say, break compatibility, or where we do something different. And then we also try to listen. There are some areas where we really changed our mind about things. I don't remember the details around this, but a week or two after we shipped uv, we made a pretty big breaking change, because there was a very confusing behavior around what counts as a project and what doesn't, and blah, blah, blah. And we were constantly getting reports, like tons of reports, of people being just confused about this behavior. And we were like, OK, we took a very principled stance on that, but it's clearly not working out. And so we were willing to correct course.
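For concreteness, the multi-index behavior being discussed is configurable in uv. This is a hedged sketch based on my reading of uv's pip-compatible interface (the index URLs and package name are placeholders; verify the flag names against uv's documentation before relying on them):

```shell
# By default, uv resolves a package from the first index that contains it
# at all, which helps mitigate dependency-confusion attacks when mixing a
# private index with PyPI. pip's behavior of picking the best version
# across all indexes is opt-in via the index strategy:
uv pip install \
    --index-url https://private.example.com/simple \
    --extra-index-url https://pypi.org/simple \
    --index-strategy unsafe-best-match \
    my-internal-package
```

The trade-off is exactly the one described above: the stricter default surprises users coming from pip, but closes a real attack surface.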

Shipping cadence and maturity

So I think, you know, being open minded about it, and also being mindful of where the tool is at in its maturity cycle. Like, I said this earlier, but when I was working on Ruff initially, I was shipping a new release literally every day, sometimes twice a day. And, you know, they would have all sorts of things. Sometimes they would have breaking changes. And it sort of got to the point where I would ship a release and there would almost always, within five or ten minutes, be an issue that was some kind of regression or something that broke people's workflows. And there was a moment where I kind of had to change. That way of working makes sense for a project at a certain stage in its life. But as the project became more mature, we had to get way more serious about how often we ship, having a clear versioning policy, a lot of QA, all that kind of stuff. So we just evolved that a lot over time. And in some ways it's a productivity tax, but it's a very important one to pay. Eventually we put in a versioning policy, so now we do not ship breaking changes outside of minor releases; we use patch and minor releases. But, you know, the point is we have a very clear versioning policy, we ship more like once a week, all that kind of stuff. We have a whole preview system, whereby whenever we add new features, by and large, we put them under preview flags. And then there's a period where we take commentary on it, and we sort of reserve the right to make breaking changes while the preview flags are up. So, you know, changing over the course of time how the project works and how it operates, I think, is pretty important. Like, it's fine to ship breaking changes if you're a very new thing.
But eventually like the project gets adopted and you have to view like how you release and how you ship very differently.

AI and coding workflows

I thought maybe to shift a bit. I know I've seen you mention a lot about Claude Code, and using AI and Claude Code quite a bit.

Oh, well, one thing I wanted to say on that is, I think one of the reasons I've become such a big fan of uv is actually because of AI. For real. Yeah. Because one thing we didn't really talk about is that one of the problems uv solves is this whole problem of: I made a thing, I want you to run my thing. Because it used to be that you had to say, oh, you've got to create a virtualenv, and then you've got to install into it and activate the virtualenv, and it was this whole kind of nightmare. But now I can say, here's my Python script, just run this. And so I think that feature was developed pre AI coding, but yet it's also the killer app for making self-contained, runnable things that are outputted by AI coding.
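The "here's my Python script, just run this" flow described here corresponds to PEP 723 inline script metadata, which uv supports via `uv run`. A minimal sketch (the dependency list is empty here so the example stays self-contained; a real script would list third-party packages there):

```python
# /// script
# requires-python = ">=3.9"
# dependencies = []  # e.g. ["requests"] for a script that needs third-party deps
# ///
# With the metadata block above, `uv run script.py` creates an ephemeral
# environment with the declared dependencies and runs the script --
# no manual venv creation or activation required.
import json

payload = {"tool": "uv", "feature": "inline script metadata"}
print(json.dumps(payload, sort_keys=True))
```

This is what makes AI-generated scripts easy to hand off: the environment they need travels with them in a comment block.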

I mean, it's real. I do feel like step zero of Claude Code with Python is ensure that it's using uv. It's like people's number one priority. Yeah, it's like, we're always using uv. Do not pip install. If I see it pip installing or using pip, I'm like, escape. Like, no, do not. Yeah, no, only uv. Yeah, it will still ignore me sometimes, but.

Yeah, yeah, yeah. I think the models are getting better about it over time. Like, I think there's more, hopefully, RLing in the labs. I think the people there are also Python users. And so they're also on the leading edge. They're also users.

Yeah. Well, anyway, I guess to Michael's point, I'm curious about your journey with AI. We were looking, and you've been doing stuff with AI and large language models for a lot longer than many people. Yeah, it's kind of freaky. Just, you know, how things have developed, how you write, what your current workflow looks like, and what's been your path to getting there.

And for a little bit of context, I saw in like 2022 you had an experiment called Autobot. I feel like you were kind of on the leading edge, tinkering with this stuff. Yeah, Autobot was one of the projects I was working on when I also started working on Ruff. I had a couple of different projects going, and that was AI-assisted codemods, refactoring. I mean, it's something that sounds really trivial now because of how much everything has changed, but you would give it a few examples and then it would find others in your code base to apply the same pattern. You know, it's straightforward, but it was very early for that.

It's actually mind blowing for me to hear that you were working on Ruff and Autobot at the same time. It was the same time. Yeah, yeah. But, I mean, I've gone through just so much change. Even comparing November to now, I think the way I work is really different. For me, there was a long period of time where I was in, like, IntelliJ IDEs, just using Copilot for tab completion, and that was a big step up from normal tab completion. And then I remember I'd go on an airplane and I'd find it very hard to write code because I couldn't tab complete. And, of course, that's even more intense now. But I remember that kind of shift for me. And then I started using Cursor. And again, that was really about the really great tab completion that felt like it was predicting what I wanted to do next, and jumping me around the file and across files, and the Command-K sort of more targeted editing, where I'd highlight a bunch of code and say, rewrite this to do X, Y and Z, or give me an implementation for A and B. But I wasn't really using agents very much. And I was pretty bearish on them, I guess because I'm short-sighted, and they weren't feeling reliable. I was like, I just don't feel like I can trust the code. So there was a long period of time, like a lot of last year, where I wasn't really using agents, but I was using Cursor, for example, really heavily, for Python and Rust.

And I felt like, for a long time, the models were way better at Python than they were at Rust. And I think that also impacted my own adoption or perspectives on things, because I mostly write Rust, but our hosted product, pyx, is written in Python, like a lot of it's in Python. And that's actually a very intentional choice, because we want to be using our own tools to build the thing, which I think has been really good. So I'm writing a lot of Python and a lot of Rust.

What tipped the scales for you? You mentioned using AI for suggestions, or kind of really beefed-up autocomplete, and now it seems like you're using agents a lot. I'm curious if something shifted. Yeah. Yeah. I mean, now, for me, I still use my editor, but it's mostly for reading code, or maybe making touch-ups if I have a complicated change that I want to finish up on my own. I would say at least since the beginning of December, maybe earlier, I am exclusively using Claude Code and Codex. And I really just run those locally. I don't use the web products too much, although some people on our team do.

So, you know, basically everything I'm doing now is going through Claude Code or Codex. I mean, so many people are writing about this, and there's so many things to say about it. So what I do locally: I'm typically working on a bunch of things at once. I think that's actually been the big unlock for me. It's not always the case that it can do any one thing way faster or way better than I could, but I can parallelize or multitask a lot more. This is also true for Rust. Like, I think the models have gotten significantly better at Rust. And so for Ty, for uv, for Ruff, any change I make there will start with a Claude Code session. And then same for pyx or for anything else that we do. So the way I work now is, maybe I'll have four terminals open. I'm not doing like 20 different terminals all YOLOing on something. Just four. Yeah, it's more intentional than that. It's like I'm working on four changes, and once it's done, I'm kind of reading the code, prompting more, maybe going in and making some edits myself. I would say increasingly, though, when there are small things I want to change, I'm telling it the small things rather than editing the code myself.

It's not always the case that it can do any one thing way faster or way better than I could, but I can parallelize or multitask a lot more.

So, I mean, I think the main thing is, yeah, agents are just becoming more and more capable. I do think using them well is absolutely a skill like anything else. And I think it's very easy to use them poorly, and also to get burned on them and think they're not useful, because they're billed as this magical thing. And then, you know, if you just sort of walk in and try it for something, and it doesn't do it right, or it makes one of the mistakes that I would recognize as the unobvious thing that an LLM might get wrong, like, okay, it tells me something that's not true. A lot of people will have that interaction and they're like, I can't trust it for anything. I actually think that's not right. I think it's a skill like anything else to figure out how to use them well, when to accept what they say, how to get them to validate it. It's obviously bad if the thing is telling you something that's not true. But I think as you use it more and more, you start to identify why that would happen and how you can avoid it. So, I view it very much as a skill that you can get better and stronger at.

And, you know, a big thing that I say a lot with the team is, like I said earlier, so much of our success has come from having a deep understanding of our users. And increasingly, our users are building with agents. And so, even if we don't want to build with agents for whatever reason, we sort of have a responsibility to, at least I feel that way. Because I need to have a really good understanding, I think, of how people are working, how programming is changing, how people are using our tools. That's why, like, pyx being written in Python, we have a big Python project where we're using uv, Ruff, and Ty. So, if you're not building large applications in Python, then you're not experiencing what it's like. Yeah, exactly. And with agents, right? And so, as a team, we're sort of this mass of people all trying to figure out how to use these tools in different ways. And we're learning a lot from each other, but it's also changing super fast. And I think the best thing, at least from my perspective, the best thing we can be doing is just experimenting and trying things, and viewing that as a core part of the job, to learn and understand how this stuff works.

Open source contributions and "slop PRs"

Do you have any advice for people maybe looking to contribute to Ruff or uv or things like that, who are dipping their toes into agents? It's a challenging time. Yeah, I would say, I mean, there are certainly a lot of projects that have publicly put out AI policies in different ways. Like, I think tldraw stopped accepting external PRs, or something to that effect. And then I know Ghostty, I think, has a very specific AI policy. We're not there yet. I think it's almost guaranteed that we will be.

But I think the thing that's challenging is: writing the code, a draft of the code, is not expensive. That is the cheap and easy part. The hard thing is validating correctness, thinking hard about the architecture, all these things that come downstream of writing code. And so increasingly what we see is an interaction that, I don't really know what we do about this, but it just doesn't really make sense: we have an issue, a contributor takes it, they just drop it into Claude Code and say, fix, and link to the issue. And then they just put up the PR, right? And it's sort of like, what's the point of that? Because I could just do that, right? The hard part is what happens after, which is we have to give a bunch of feedback, we have to think hard about the change, we have to think hard about how you validate it. And if the person who puts it up hasn't done that work, or thought about doing that work, that is the hard part, right? I can prompt the agent. A lot of, I guess I call them slop PRs. Well-intentioned. Yeah, well-intentioned. Often well-intentioned.

But it does create a weird dynamic, because for us it's like, we get a huge PR, and we kind of have to think through, well, how do we engage with this? Is the person going to engage if we ask them hard follow-up questions? Or maybe they have all these questions, like, well, what should I do next? And we're like, well, that's actually the hard part: we don't really know what you should do next. You have to kind of do the work of figuring out, how do we develop confidence in the change? And so we're sort of trying to figure out how to handle that. And it's not even exclusive to open source contributors. There are elements of this that happen within the team, too. Like, I got some feedback, which I really appreciated, which was that people on the team felt like they couldn't trust my PRs as much anymore, which is completely fair, completely fair and very interesting. Because the point was, previously, when people were reviewing my changes within the team, they didn't feel like they had to review them super closely. They trusted my work a lot. And now, if I'm putting up a PR that's done by Claude Code, like, empirically, I'm putting up code that has patterns or mistakes or issues in it that I just wouldn't have put up before.

people on the team felt like they couldn't trust my PRs as much anymore, which is like completely fair, completely fair and very interesting.

Do you feel like your trust or confidence in your own PRs stayed the same? No, I mean, it's definitely gone down. Fair. I think I'm still a lot more productive, but this is why I think it's a very challenging moment to figure out how to think about these things. And sometimes, too, a bunch of stuff gets pointed out on a PR. Sorry, what it took for me to realize this, by the way: I put up a PR, and I will try to review my PRs myself first, but it's a little different when it's code that you've been close to writing. But I put up a PR and I thought it was fine. And then I got a lot of feedback from someone on our team who's great. He looked at it really closely and he gave me a lot of real feedback. He's like, this is unused. Why are we doing this here? Why aren't we using the implementation we have from over here? All these problems that I hadn't really noticed. And so, getting a review from someone who wasn't using an agent at all, and was a domain expert, and was thinking really carefully about it. Like, over time I'm trying to build up the philosophy that I'm still responsible for my own code. Because it's very easy to forget that when the agents are just churning stuff out and you're like, this looks good.

Yeah, I mean, lately I've been on this journey of, essentially, I think I'm also in the same place that you are now, where I'm primarily using Claude Code to implement. So my workflow is, I have a bunch of stacked terminals, usually three or four or five Claude Code sessions. I work on different projects in parallel, because while I'm busy working on one thing, I can work on something else. And so I've enjoyed the productivity benefit of being able to nudge along work on multiple things, or multiple work trees within the same project. But, like you, I've run into the same problem of the code quality just being, you know, often surprisingly poor. I mean, it has certainly gotten a lot better, but there's this code quality problem, either of solving the problem in a much more verbose way, or with a lot of code duplication, like copy and pasting the same thing. Especially the test suites end up much bigger and much more bloated and duplicated. And I'm one of these people obsessed with code cleanliness, and not repeating yourself, and not having code duplication. So, you know, code smells. I built a linter. Yeah, you know a thing or two about this. Yeah, so whenever I open up these code bases, I look at the PR and I'm like, ooh, as far as code smells go, this is, you know, stinky, stinky cheese level of code smell.

But, you know, I feel like there's certainly the human element of code review that's important. And I think for you, the agent user, part of it is your responsibility to do as much of that before putting up a PR, so that you aren't essentially asking other people to do all of the review labor. And so I've been trying to figure out how to do that. I've also been investing a lot in what I call adversarial code review tools, because if you ask Claude Code to review its own code that it just generated, like you just do /review, it's going to be like, looks great, ready for production, you know. And then if you're like, no, it's not, it's like, you're absolutely right, I need to write more test cases. And so I think we've all been through that gaslighting of the agent's overconfidence and self-confidence in its work.

So just to connect these two things, I think it's so interesting: people on your team are like, I feel less confident in your PRs, and you're like, maybe me too. I'm like, that's totally fair. And Wes is exploring this idea of adversarial reviews, like, is there a way to up the confidence and the quality? Yeah. I'm really curious. I think it connects back to something you said on Twitter, that reviewing code is the hard part.

Developing intuition for agent limitations

We're not very good at understanding code quality, I think, as humans. And, well, I guess the mentality I try to take towards some of this stuff is: first of all, the thing where you have a Claude Code session that writes the code, and then you create a separate session and ask it to review it, and it comes up with a bunch of valid feedback, is incredibly unintuitive if you know nothing about how these work. If you don't know anything about how the models work, it's incredibly unintuitive that that could ever happen. Because you're like, wait a second, it's the same agent. It wrote the code. How does it not find these problems? And for me, that's actually a good example of trying to develop intuition around the limitations that these things have. Because it actually does make sense if you understand how they work. But if you don't, and you just view it as, I put things into the model and it should always do the perfect right thing. It's an oracle. It's just, it's the truth oracle. Yeah, if you view it in a certain way, it's a very good example to me of how you have to learn to use the tools. I mean, that's not a hard thing to learn or to do, but it's very unintuitive that that would be the case, in my opinion, if you don't understand how they work.

I think, I mean, one thing I've tried to internalize is, and we're still, I would say, we as a team don't have answers to any of this stuff, we're still really trying to figure it out. I think for me, again, it's that people need to be responsible for their own code. When you see the model do things wrong, try to find ways to prevent that in the future, whether it's updating the CLAUDE.md or the AGENTS.md or adding static analysis tooling. We're actually probably turning on more lint rules now than we used to. Even in Rust, we've always used Clippy pedantic, it's like the most pedantic, with a couple of rules disabled. So, try to enforce things programmatically, because the model is good at following that: if you have something that's enforced programmatically and it's getting diagnostics about it, it will fix those things. Or try to give feedback in the CLAUDE.md or similar. Like, we had all these, they're not quite problems, but, you know, in Python, Claude kept adding dunder __all__ to all these modules, and we basically never want to use that, and blah, blah, blah. Or in Rust, it was like, oh, it's not using let chains, which are a very new feature, so there's probably not as much of it in the training data, or it doesn't realize it has access to them. So, putting that in the CLAUDE.md.
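As a concrete sketch of "enforce it programmatically": Ruff's configuration lives in pyproject.toml, and selections like the following give an agent machine-checkable diagnostics it will act on (the rule groups chosen here are illustrative, not a claim about Astral's own configuration):

```toml
[tool.ruff]
preview = true  # opt in to preview rules while they are stabilizing

[tool.ruff.lint]
# Illustrative selection: pyflakes (F), pycodestyle errors (E),
# and import sorting (I). A team would tune this list to the
# conventions they want the agent to follow automatically.
select = ["F", "E", "I"]
```

The point from the conversation is that a rule the linter enforces gets fixed by the model without prompting, whereas a convention that lives only in prose gets forgotten.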

And then over time, trying to bolster, I mean, this was important even before AI, but trying to bolster how we can get confidence in changes faster. Like, in Ty, for example, we invested a lot in this ecosystem report infrastructure. So every time we put up a change, we run over a bunch of projects, and we have a really nice report that diffs all the diagnostics and links to the lines of code and helps you understand what's going on. And that's really helpful for building confidence in changes, because we can click through that report and see what changed and try to identify problems. Ironically, a thing that's also incredibly useful is you give Claude that report and you ask it, are these false positives or false negatives? Go analyze the code. We're saving hours and hours and hours on that. Like, try and understand whether these changes are false positives or false negatives. Or, we get this user report, come up with a very minimal reproduction. Someone on our team yesterday was like, I think this would have taken me six hours before. And so, I guess another lesson is trying to find things that are off the critical path, that aren't shipping production code, that you can automate.
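A toy sketch of the core idea behind an ecosystem report (this is not Astral's actual infrastructure, just an illustration of the technique): run the tool over the same projects before and after a change, then diff the diagnostics so a reviewer sees exactly what the change added or removed.

```python
def diff_diagnostics(before: set[str], after: set[str]) -> dict:
    # Each diagnostic is a string like "file.py:LINE rule-name".
    # Anything only in `after` was introduced by the change;
    # anything only in `before` was fixed (or suppressed) by it.
    return {
        "added": sorted(after - before),
        "removed": sorted(before - after),
    }

before = {"a.py:3 unused-import", "b.py:7 undefined-name"}
after = {"b.py:7 undefined-name", "c.py:1 unused-variable"}
print(diff_diagnostics(before, after))
```

A report like this, fed back to an agent with "classify each added diagnostic as a true or false positive," is the workflow described above.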

Yeah, and I've discovered just through trial and error that different agents are better at different things. Like, for the stuff that I'm working on, it happens to be that Codex from OpenAI is a better code reviewer. It's a lot slower, so I wouldn't necessarily use it for implementing, because to me it feels slow, and I just prefer the ergonomics of Claude Code. But I'm happy to let Codex grind all day doing my code reviews.

But maybe as a segue into a slightly deeper topic, I'm very curious on your take on some ideas that I've been having lately. We're talking about code quality, but the question I've been asking myself, and it's been keeping me up at night, is, you know, if humans aren't... does it even matter? Maybe this idea of code quality is something that only people care about. And so if people aren't reading the code anyway, what is the new code quality? I'm thinking about this a lot now. Maybe what is good code quality for agents is different than what it is for humans. Like, actually, maybe code duplication, which I detest, and whenever I see it I'm like, no, no, this could be factored out into a helper function, maybe the repetition is actually good for the agent. Like, it helps establish what is the proper pattern for solving that problem. And so I found myself being like, well, should I refactor this? Is this going to improve, or hurt, the agent's performance in the future? I don't rightly know the answer to that.

Maybe what is good code quality for agents is different than what it is for humans. Maybe code duplication, which I detest, and whenever I see it I'm like, no, no, this could be factored out into a helper function, maybe the repetition is actually good for the agent.

And, you know, that's kind of the segue into the big idea that, beyond that, has been keeping me up at night lately. I feel like there was a big sea change in like September, October, when Opus 4.5 came out, when it was evident that the quality had changed. As I dabbled, I started using Claude Code and became an adherent back in April. And I was using Claude Code to do fairly significant changes in Positron, which is Posit's data science IDE. And I have worked on the Data Explorer, which is an advanced data viewer within Positron. And I found myself often struggling

because I'm bad at UI code, and it uses React, and there's a lot of nitty-gritty React details. And so whenever I had to touch the UI layer of the Data Explorer, I would find myself just slowing down, because not only do I not enjoy TypeScript at all, but there's all these foreign concepts. I'm not a front-end developer. And so in the past I found myself being dependent on other developers to get help. Like, I can't do front-end, please help. And now I'm like, Claude, please help.

And so I found that actually was productive. And so I was getting a lot of stuff done, but obviously with like a high error rate and a lot of like just struggling with this massive, because Positron is a fork of VS Code, so it's a massive TypeScript code base. Even getting to the point where I could orient Claude in the code base where it could like run the build system, run the unit tests for that part of, you know, this million-line code base was a lot of work.

But, you know, September, October, there was like an evident sea change where like, wow, no, no, like, because I went from being a huge AI skeptic, was not even using Cursor or Windsurf prior to last March. I was still like Emacs, you know, I was like writing code by hand like a caveman. And so there was an evident sea change. I'm like, okay, coding agents of the future. And then September, October, like, okay, like this is the way I should be writing code from now on. Like I don't anticipate writing much code anymore.

Python in an agentic world

But what happened as I moved on was like, Python seems like the wrong language to be using. And the reason that I started feeling that way was because the code bases that I was creating, and this was especially like personal tools and projects.

I just want to take a second to dwell on the big point you made, which was existential, like what's our job? That got like a bodily response. Well, sorry, my response was to the question of, yeah, which I've also thought about before, which is like if, like some of this stuff, I mean, I tweeted about this the other day. I was kind of kidding because like I occasionally, I'm, for whatever reason, I really like my code comments to be complete sentences. Right, like a psychopath. And end in a period, like stuff like that. And so I find myself at the end of a session often saying like, make sure all the comments, it's in my claude.md, but sometimes it doesn't do it. And I'm like, make sure the comments end in periods. Or like, I really don't like when you use like em dashes in comments, like use a semicolon or a comma or a period or a parenthetical or whatever.

And it's like, okay, that's like actually a good example maybe because it's like, does it matter? Like, sorry, that didn't even matter before. So maybe it's a bad example. You're like, am I holding Claude back? Yeah, well, no, but it's like, yeah, it's a question of like, okay, like I don't really like when the model uses like dunder __all__. Or like I don't really like in Python when the model does like local imports. Like sometimes it's just easier for it to import us. Oh, I hate local imports, yeah. Yeah, yeah, yeah. And it's like, okay, I, you know, we added a rule in our project where it's like you can't do that except for certain modules where you have to do that because they're big SDKs or whatever. And so now it's like enforced.
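An aside for readers: the kind of rule Charlie describes here can be enforced mechanically. This pyproject.toml sketch assumes Ruff's PLC0415 (import-outside-top-level) rule; the exempted path is hypothetical, standing in for the "big SDK" modules he mentions:

```toml
[tool.ruff.lint]
# Ban imports inside functions/methods project-wide.
extend-select = ["PLC0415"]  # import-outside-top-level

[tool.ruff.lint.per-file-ignores]
# Hypothetical modules where lazy imports of big SDKs are intentional.
"src/heavy_sdks/*.py" = ["PLC0415"]
```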

But the question for me is like, does it matter? And like I think, well, first of all, I absolutely don't know the answer. And I think no one does. And I think anyone who claims that they are confident basically about anything that is going to happen in the next like six months is mistaken.

Like one, I do think like the notions of code quality will be really different. Like I think what's good for agents and what's good for humans like will be different. I don't think we really know like what's good for agents yet. And I think that itself will also evolve a lot as the models get more capable. Like things that are helpful maybe right now might get completely solved by the next generation of models with regards to like as in divergences around like code quality, like duplication. Like maybe I won't even say because I have no data on it, but like maybe duplication is helpful right now but then it gets solved by a future release. This is a random thing. Like I think it will change a lot.

I think we will need like much better ways to understand like code quality. And I would love to see like actual quantitative metrics about this stuff because every opinion that basically that everyone has in the industry right now is just based on feeling, which is not nothing. But it's like I'd love to understand like can we do large-scale, like, longitudinal experiments where we're like how does an agent act more efficiently to get to the end result? Like what are actual metrics that we can think about?

I also think this will, like, different software is built very differently. Like that's even true today. And I think that will continue to be true. Like certain projects, you know, there are certain parts of my code base too where I'm like way more hands off with the agents and certain parts where I'm like way more hands on. Like I think in Ty, if we just like merged every PR that we had, outright, it would completely destroy the project. We would get a bunch of things wrong. We would accumulate incredible debt, performance problems, like correctness issues, like also no one would understand it, which is a really big problem for that code base because it's like a big, it's a compiler, right? I mean it's a type checker, but it kind of looks like a compiler. It's like a very sophisticated project.

In the pyx front end, if we just merged every PR that we had, outright, things would actually probably be totally fine, right? I mean that's a bit extreme, but my point is like those are very different. Or like I have a bunch of experimental scripts that I use to like pull data and stuff and it's like I care about that being correct and so I should care about the correctness, but I don't really care like what the code looks like. It's really about the output artifact. And so my point is just like I think different projects will have very different needs and the way they are built will probably be different.

Just to put a cap on that too, it's so interesting to hear you talk about, starting with, should sentences and comments end with periods, you know, like what's the quality of that, to like, should people be able to do this thing like a local import in a function, all the way out to like, does it matter in Ty if we just accept Claude's commits, or does it matter in the package manager? Yeah, I don't know. I mean I think we're still just like figuring all this out. All my opinions on this stuff, by the way, are like super different, like I said, than they would have been like four months ago.

Excitement and uncertainty about the moment

I mean for me, the prevailing feeling I've had lately is like I think that there is like probably no more exciting time to be alive than right now. I think honestly like I'm 40, so like I was 10 years old in 1995. Because 1995 was like the big year of like the Internet became a thing, like Netscape browser. So like I was using computers and like there was an evident like a palpable feeling of like the world is about to change in a very significant way.

For me, I had that same feeling like when I started doing Python, I was like this language is, I think there's a world in which like it gets big and the world is profoundly different because people are programming in this language that is like fun and accessible. Because it felt to me that programming was so like kind of painful and like not accessible. And so that was like the big thing was like why Python was so important was because of its, and the reason it's so popular now is because it's so ergonomic and fun to work in. Like writing Python is fun, like the language is ergonomic, like it's readable, it's concise.

And clearly like you know there were parts of Python that weren't very good and you know Charlie and your team, you know you've fixed, largely fixed you know some of the parts of Python that weren't working like the packaging and distribution and like the fast installation. And so now like the problem is Python has still got some issues like Python distribution is still like you know it's not a static binary like you can make in Rust or you can make in Go. Python still like compared to compiled languages is relatively slow. And so that means like running your unit tests, you know the same unit tests we run you know written in Rust or Go might take three seconds to run but it takes three minutes to run in Python.

And so now like viewed through the lens of like the agent kind of grinding on the code base and iterating and figuring out like what's the right way to solve this problem. Like what I've observed in using coding agents is that the agent is like, I've never like prior to this like I'd never felt like my MacBook Air getting hot before because basically the agent is just like pinning my CPUs all the time running all the unit tests. And what I found and why I was saying a few minutes ago that I'm starting to question whether I should be using Python at all is because that whole, like, agentic loop is slower in Python. Like the agents are really good at writing Python because they have massive amounts of training data and reinforcement learning. And I do believe that like Python's readability and conciseness is also a benefit for the LLMs generating that code.

But ultimately if you compare like the, you know, agentic Claude Code / Codex loop or agent of your choice loop building a Go code base that solves one problem, you compare that to the Python, you know, if you set two people to work building the same project in Go versus the same project in Python, what I'm seeing is that you can build the same project in Go faster because the agentic loop is faster, like the tests run a lot faster, a hundred times faster, but then also the distribution problem. So if you build a Rust project, you end up with a static binary that you can just cp over here. But with Python still it's like, oh, like we got to like, you know, we got to put it on the package index and like you got to install uv and like even though uv's fast, like you still got to install it. And so there's like all these caveats.

And so I do think that like I'm not of the opinion that AI is going to cause massive job loss and like put software engineers out of work. I'm more of like the optimist, like we're going to end up with 10 to 100 times as much software and all these personalized, customized tools and audiences. But I do believe that like we're going to see on a proportional basis less software being written in Python because of like that, like agentic loop.

Yeah, yeah. I mean, I guess I have a lot of thoughts on it. I mean, I think that's like very plausible. I think it's definitely possible that areas where you would have reached for Python before, it's no longer the right fit. I think that evolution might happen or might be happening, like things like standalone CLIs or whatever else. Because I think different languages will have different strengths. And suddenly the calculus around how you choose what to build in is very different. And so it will just like naturally be the case that like some things will be better for certain things versus others.

I do, I mean, I'm still sort of like figuring all this out myself. I guess like one thing I think about a lot is I actually like haven't really been like a huge Python defender as like a language in general. I think the thing is part of why Python is so powerful is like just like network effects and path dependence are like very strong. And so Python has this huge scientific computing ecosystem. And it's very hard if you want to do anything that involves any form of scientific computing to use anything else. And a lot of that will like continue to be true. Like those things exist and they will continue to be critical.

I think the other is, you know, we certainly as a team want to figure out like how can we make Python a great choice for agents. And so like how should the language and the tooling evolve, right? And like we thought, you know, for example, we thought a lot about building our own like test infrastructure or even like should we do our own runtime, like something that makes very different trade-offs. Those are obviously all very hard problems. But the question is like what if we wanted Python to be a great choice or remain a great choice for agents. Like I think we kind of have a responsibility to try to solve some of those problems and like make it a better choice. Like I'd love to do, you know, standalone binaries. Like just for example, kind of like what you can do with like Bun in JavaScript. We should absolutely just like make that like possible. And there's no reason we can't. It's just a question of like prioritization and I'm sure there will be trade-offs and whatever else.
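For context on where this gap could narrow: uv already supports PEP 723 inline script metadata, where a single file declares its own requirements and `uv run` handles the environment. It's not a static binary, but it's a sketch of the direction. A minimal example (the empty dependency list is just for illustration; real scripts would list packages there):

```python
# /// script
# requires-python = ">=3.10"
# dependencies = []   # third-party deps would go here, e.g. ["requests"]
# ///
# With this PEP 723 header, `uv run tool.py` resolves dependencies into an
# ephemeral environment and runs the script -- no venv, no manual install.
import json
import sys

info = {"python": f"{sys.version_info.major}.{sys.version_info.minor}", "deps": []}
print(json.dumps(info))
```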

But, you know, I think what I would say is like we're trying to learn about how to build software with agents and like what's required. I think it'll probably impact our roadmap a lot in terms of the problems we choose to take on. I think it will also be the case that like the way people make decisions about what programming language to use will be like super different. It might not be the case that most people are writing most of their code with agents this year. I'm not sure if the distribution will happen that fast. But like I think it's important, when you think about building tools, it's like, that's where software is going. Software is going to be like really different. And so I want to fold that into our roadmap. I want to fold that into how we think about what we're building. Even if we choose to say like actually like we're going to build this other tool that has nothing to do with Python. Like I'm even open-minded to that because it's like, what's just, what's the thing that's going to make people like more productive.

Yeah, I mean, but you're right. I mean, you're absolutely right that I mean one of the, I mean, Python has enormous inertia in the scientific world. Yeah, yeah. Most people listening know, but like I think almost all of the LLMs and the AI labs themselves like, you know, modern LLMs are all built with Python. Yeah, I mean like entirely Python. Yeah, yeah, yeah. All Python. So it's PyTorch and it's JAX and even Google which has its own silicon. So Python is clearly not, you know, not going anywhere.

And I think the shift with agent ergonomics may influence like the builders of those AI frameworks. It may influence their future choices in terms of like where they build the, you know, even, you know, even in the fullness of time, I think probably the architects of the LLMs, the people building training, post-training, reinforcement learning for the LLMs, the people writing that code increasingly, they are going to be using agents to write that code. And so at some point, you know, maybe it's like three years down the line, maybe it's 10 years down the line, I can imagine that we'll end up with like a whole parallel ecosystem of like non-Python, you know, AI training frameworks, you know, in a sense like maybe Google's already, you know, already working on it.

But for now, like the foothold that the kind of like the anchor that we have through PyTorch and JAX and like the whole ecosystem of AI frameworks for Python like that, you know, it's a juggernaut, right? And there's already teams in place that are continuing to build and maintain those libraries. And so, you know, I think like some people might hear what I'm saying and say, oh my God, is Wes saying Python's going to die? I'm like, no, I'm not saying that at all. Like, you know, I think if we look at like, you know, if you look at like the Stack Overflow language rankings, you see like Python just, you know, emerged out of the 2000s. And now it's just like this hockey stick of like, oh, it's the most popular language, you know, by far. And I think it will still remain, you know, the number one or the number two or three language. But I expect that we'll see, you know, significant growth in other languages, but the amount of Python that's generated will be, you know, 10x or 50x or 100x.

Yeah, I mean, obviously I would say this because I'm working on a company that builds Python tooling, but like I don't think Python's going anywhere. But I do think that like the way that we choose like what we build with, the things that make a language great, like all that will change a lot. And like it even impacts how we think about things like today, obviously. Like we're building, you know, we're building Ty. It's a type checker and a language server. So, but there's a bunch of things we're building there where I'm like, wow, like this is like really specific to how humans work. Things like hover, like you hover over a symbol and like what does it show you? Machines don't hover. Yeah, well, I'm like, it's like, oh, we're spending a lot of time on like really good hover. And it's like, it does matter. But like, you know, an agent will never use it. Or even like auto completion, right? Like that's actually, that's a very hard problem. Like we're spending a lot of time on that. And like, I absolutely think it's merited. I think it's like good, important work. Like we have to have those things. It's still table stakes to be like a usable LSP. We have to have like good auto complete. But it is like, you know, the way that like agents would use that tool is going to be like really different.

Building resilient tooling

And so, you know, at least I'm thinking about it a lot. Like, you know, I think I've always tried to think ever since like AI became really usable. I've just thought a lot about how can we make sure what we're building is resilient. Because like we really cannot predict the future. But like we can think about like what's going to be resilient and what's not. So like static analysis, like having really good verification, like really fast verification. Like that's, in my opinion, like very resilient. Like as you write Python, I think like typing, for example, will only become like more important and more useful. Because it's a very fast way to validate correctness. And then package management. It's like, okay, like I mean maybe there's a world where you have no dependencies. And you're like building all of everything from scratch. But like I think we're like much further away from that. And so, you know, being able to install and manage your dependency toolchain. Making it really easy to start and run your project. Like those are things that to me, I feel like we're making very resilient bets. But we also have to be thinking a lot, you know, just about how it's changing.
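To make the fast-verification point concrete, a toy sketch (the function is made up for illustration): the annotations let a type checker such as ty or mypy reject the commented-out call below without executing anything, even though at runtime it would silently produce garbage.

```python
def scale(values: list[float], factor: float) -> list[float]:
    """Multiply each element by factor; annotations make misuse statically visible."""
    return [v * factor for v in values]

ok = scale([1.0, 2.0, 4.0], 2)
print(ok)  # [2.0, 4.0, 8.0]

# bad = scale("124", 2)
# At runtime this "works" and returns ['11', '22', '44'] -- silently wrong,
# because str * int repeats characters. A type checker rejects the call from
# the signature alone, with zero execution: exactly the cheap, fast signal an
# agent loop can iterate against instead of re-running a slow test suite.
```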

If I could just try to put a thread through it too. Because I feel like all these points are so interesting. And I hate to go back to it. But this point of like, starting at, should I put a period at the end of my comment? Yeah. Like, all the way to, should I just use Go for this? Yeah. And this fact that like agents, there's an ergonomics to what agents care about. And what do you choose for humans versus machines is such an interesting topic. And to your point, like what would be resilient? What's going to endure, you know, five, ten years? That's why it's an amazing time. I don't know. We don't know the answers to any of these. We're just sitting here talking.

I mean, like I frankly find like the uncertainty. Like I haven't felt, you know, been in this kind of state of like, you know, just excited enthusiasm about what's happening. I find it terrifying. Yeah. I mean, it's scary. It's scary. But I think that's part of what makes it so exciting is because it is also scary as like, well, gosh, like, you know, this is, you know, just even taking in what's happening on like a daily basis or a weekly basis. Like I open Hacker News, like news.ycombinator.com, in the morning. Like, oh, what fresh hell is being unleashed on us today? Yeah. You know?

But even like, you know, having wild thoughts, like, you know, talking about IDEs, like I haven't touched an IDE in like two months. Like the only reason I've opened an IDE is because there's still like IDE tooling that is necessary for like, you know, if I'm working on Positron, like you need to use the whole VS Code IDE tooling for debugging. And like, yeah, there's still, you know, the need to like, you know, sometimes there's a need to set breakpoints. But even like maybe breakpoints are only for humans. Like, you know, just agents got to figure out how to do that themselves. Yeah, yeah.

And I think it's a very like, at least I feel, it's a hard time to be a leader because like in any capacity. And I, because I think we just like, there's so much uncertainty. I think, I don't know, my approach there, which I guess is the same as it's always been, is just to be really honest and be like, this is what we think we understand. This is what we don't. Here's what I think we should be trying to answer. Do you think, is that hard because like as a leader uncertainty is hard? Or do you think that's hard because there's like something about the uncertainty or like this situation?

Maybe it is a little bit of the latter because I think things are changing really fast. And so like there are lots of products that you could build today. Like even if you said, okay, we're going to like really focus only on building around like AI and agents. There are lots of products that you would think about building today that could be made completely redundant in like three months or something. And so I think that, again, that's where, for me, it comes down to like trying to build around resilience. And it's like, I think the layer that we're at is a very good one for being robust to change, but like, yeah, there are lots of products that we might think about building today. Like, I don't know, like what if we were like, oh, we want to build like a code review tool and not to comment too much on that example, but it's like, okay, like how relevant is that going to be in like three years? Like it's so, I think, you know, to some degree it's like uncertainty is hard. To some degree, I think it's just the change is so fast and the capabilities are changing so much that maybe it is a little bit unique.

But so maybe kind of like a mix of both, you know, as a leader, like you don't, not that I'm like an amazing leader or super experienced leader or anything, but like, at least from my, in my opinion, you don't always have to have like all the answers, but you have to have like some opinions and like a clear strategy to rally around, even if it's like, get more information or like, or learn more.

How Astral the company is adapting

I mean, how, how do you, I mean, so there's, you know, there's, there's, you know, Charlie Marsh, the person, but then there's also like Astral the company and the team. So I'm curious like how you, as the team and the company, have changed, you know, in the last six, nine months, like in reaction to everything that's been happening. I think like I'm seeing like all kinds of reactions from different companies and teams. And obviously there's a lot of new companies being created right now, like AI agent native from the ground up, like, you know, people are talking about like Y Combinator, you know, it's like a startup accelerator, talking about like, oh, 95% of the code written by these companies is all AI generated. And so many companies are starting from scratch with this like totally AI native, agent native mindset. Whereas like, you know, companies that have been around, you know, or that are a little bit older, you know, it could be a lot older, are approaching it and adapting in different ways. And so there's like internal ways of operating and business processes, but then also like team culture and like how people operate.

And so it's like, I feel like everything is being turned on its head and people, every individual is adapting and changing their, the way that they interact with the world of software engineering and open source. And probably maybe after this, like, I do want to talk about kind of, you know, the whole open source, you know, ecosystem aspect, but, but I do want to hear about kind of like how, how just the impact it's had on Astral, the company and the team and, and, and your culture.

Yeah. I mean, I think like, thankfully, I guess thankfully we haven't had to make too many like major strategic shifts. Like, I feel like the things that we were focused on have remained good bets, maybe just by chance. But like, I think a lot of it is actually cultural around trying to, you know, get people like curious and interested in experimenting with a lot of these tools. So I've tried to do that a lot. Like we've set up, cause across the team, like the adoption varies a ton. Like there are some people who don't use this stuff at all. And then there are people like me, I guess. And, and, you know, maybe a few others who are on the other end of the spectrum where it's like everything we're doing is there.

And I think, you know, a lot of that has been like getting, we're small enough, you know, we're small and we're flexible, which is like a huge benefit. Like I'd much rather be like a 20 person team than like a 200 person team trying to figure out how to adapt. It's like way harder. But I think, you know, we, we'll do these like AI, basically AI knowledge sharing sessions where we bring the whole team together and someone just like screen shares and they literally just like walk through how they do their work. So it's just like, yeah, this is like, I'll have like this open and then I'll do this and I'll do blah, blah, blah. I mean, it's like pair programming, you know, it's just like sharing how you work, but like everyone has a super different thing. And then a bunch of people have like reactions and we just like have an open conversation about it.

And so, you know, just trying to like get people thinking about it and in a very, I think a very genuine way. Like the good thing, I think, about having a company culture like this is like, we, we can come in and I can be like, yeah, I do like this, but like, it kind of sucks. And like this, I've tried this, like doesn't really work or like, well, like it's not like we're sitting there going like, this is the most amazing thing ever. You have to use this right now or like you're about to be made redundant, like your job is gone or like programming is over. It's really not like that. It's like, hey, we should all be like pretty curious about like what's happening here and like blah, blah, blah.

So, you know, I think, I think another thing that's been cool is like we do, we try to do a lot of, I mean, we run the company like mostly asynchronously and we try and do a lot of just like sharing in Discord around like what's working and what's not. So like, hey, I built, I built this thing. Especially for internal tooling, that's been like a great test bed for us, or tooling around PRs, like the ecosystem stuff that I mentioned. Like I think Claude created this like really nice report that, you know, so it's not just a GitHub comment, it's actually like an interactive report and you can click on it and that's like super useful. And so like, I think seeing wins like that come out in areas that are kind of low stakes, like all that's been helpful, I think for, you know, getting people like interested.

I mean, there's another part too, which is just like access. This isn't like a huge problem for us because we're a small team, but it's like, how do you make it so like the bar to trying this stuff is very low? There's like a lot of different tools. And so, you know, basically everyone at the company, everyone on the team has a corporate card. Anyone can just put like a Claude Max subscription on it and like no questions asked. Yeah. It's like, I mean, that's like an easy thing to do and that companies should be doing. But just making sure that there's like a very low bar to like trying things out and sharing like what's working and what's not.

Yeah. Yeah. It's interesting to hear too, like where you mentioned like this feeling of uncertainty and the challenge as a leader in like building AI products. But it's also interesting to hear like inside the company, like how do you encourage like a culture of experimenting and like how do you, yeah. No, cause there are definitely people on the team who are like skeptical for sure. Like, you know, our team, I mean, it probably doesn't represent like, you know, an average sampling of like human beings across America or whatever, but it's like, we have programmers who come from like all different sorts of backgrounds. And, and like, I don't know, like ages or like programming ecosystems or like what they did before or like whether they're self-trained or like, you know, it's just like people have very different opinions about a lot of this stuff.

And so, you know, I don't approach that, and I think people on our team don't approach that, in like a combative way. Like you have to be using this stuff or like blah, blah, blah. I really want it to just be more like, okay, this is like, this is weird. Like we have to like figure out like what we're doing. Yeah. Yeah. So it's a little, I don't know. That's sort of like been my approach. And I feel, yeah, I feel a lot more uncertainty honestly about like, yeah, things like the open source and like the contributors and like how we manage those interactions. Like it is increasingly the case that we get these PRs that are like a lot of work to shepherd and we don't quite know what to do with them yet.

Open source contributions in the age of AI

And I did hear you, like, just to put a spotlight on that. One thing that struck me before is you mentioned like when someone opens a PR and you're thinking about a response, I think you mentioned like you don't want to discourage them. Like you don't want to give them a response that sends them away. So yeah, I think hearing you say that, like thinking about contributors in the context of not sending them away, being like really welcoming. Yeah. That's always been a big part of the project for me is like we wanted to have, we want to have, you know, a community where like, even if we say no to whatever you're asking for, you should feel respected. Like you should feel like we listened to what you asked for, we thought about it, and we gave you a reasonable rationale for why we couldn't. And then obviously if we say yes, you know, that's always a better experience for people.

But that for me, I think that was like also a big, it was something that created really positive feedback loops over the course of the project, was, I think we just created a community where people felt like we took them seriously and like, you know, we're really responsive. And, and then I guess like you, sorry, I switched, but you were talking a bit about like today, today, like in open source with AI, these contributions now as part of a welcoming community. It's very hard. I don't really know how to navigate it. I mean, we also get things like issues, you know, that are clearly written by, and there are all these tells that it's written by an LLM, like lots of headers, like lots of bulleted lists with like bolded items, like lots of parentheticals, like maybe like some emojis, things that are way too thorough, like a human would never do it this way. Like it not only talks about like the problem, but also like the proposed solution and like blah, blah. It's like a human would just not do that.

And when we get issues like that, we don't have any policies on this yet, but sometimes, or even when we get a PR now, sometimes we'll basically just ask: was this written by an LLM? If so, can you talk us through how you verified correctness? And I don't really know. It's just that things that used to make sense don't quite make sense anymore, even now in the context of open source and code review. Sometimes I put up a PR that I wrote with Claude, and hopefully I did a bunch of work, a bunch of iteration, hopefully it's not just a first-pass slop thing that I put up. Someone puts up comments, and then, a decent chunk of the time,

what I'm doing is I just copy and paste the comments into Claude Code, right? And have Claude fix things. And I review, I paste one comment... You can have Claude Code read the comments using the gh CLI. I know, I know, I know. But sometimes I will basically just copy the comment, put it in Claude Code, and do it one at a time. Ideally I'll look at what changes, commit that, and then go to the next one, and so on. So I am being thoughtful about what it's doing, but that's pretty weird. Like, why are we working that way?
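The loop described here, pulling each inline review comment off the PR and handing it to a coding agent one at a time, can be sketched against GitHub's REST API. This is just an illustrative sketch, not a description of Charlie's actual setup: the function names and the sample comment are hypothetical, though the `GET /repos/{owner}/{repo}/pulls/{number}/comments` endpoint is real.

```python
import json
import urllib.request


def fetch_review_comments(owner: str, repo: str, pr_number: int) -> list[dict]:
    """Fetch the inline review comments on a PR via GitHub's REST API.

    (Real public endpoint; a private repo would need an Authorization header.)
    """
    url = f"https://api.github.com/repos/{owner}/{repo}/pulls/{pr_number}/comments"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def comment_to_prompt(comment: dict) -> str:
    """Turn one review comment into a standalone prompt for a coding agent,
    so each comment can be addressed, reviewed, and committed one at a time."""
    location = f"{comment['path']}:{comment.get('line') or '?'}"
    return (
        f"A reviewer left this comment at {location}:\n"
        f"{comment['body']}\n"
        "Please address it."
    )


# Example with a stubbed comment, so no network access is needed:
stub = {"path": "src/resolver.rs", "line": 42, "body": "This clone looks unnecessary."}
print(comment_to_prompt(stub))
```

The point of the one-comment-at-a-time shape is exactly what's described above: each prompt produces a small, reviewable diff that can be committed before moving on, rather than one big pass over all the feedback at once.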

There's a bunch of connectivity that's lost. Right. And I just think the way that whole loop works will be really different. And the way the open source contract works sort of has to change, because it used to be that writing the code was the hard part. And now, putting up code that passes the tests, let's just put it that way, is not hard. Whereas validating, especially in a project like uv, for example: okay, I'm going to write some code, and then that has to work on Windows, macOS, Linux.

It has to work with all sorts of different drive setups people might have. Maybe it has to deal with symlinks or hard links. There are all these things that are actually very hard to test, and we have no way to really verify that.

The Lisp paradox and open source collaboration

It reminds me, maybe you've read this famous essay from years ago, I believe it's called the Lisp Paradox. I don't think I've heard of it. So I'll summarize it for you. The idea of the Lisp Paradox is that it's trying to reckon with why Lisp, the Lisp ecosystem, which includes Scheme, and there was a whole Lisp thing in the eighties, why did Lisp not become more popular? And the idea is that problems that are technical problems in other programming languages are social problems in Lisp.

And so the idea was that individuals could craft solutions to problems in this elegant way all by themselves. The barrier for an individual to create their own personalized solution to a problem was not as high as it would have been in C or, you know, COBOL or other languages of the 1980s. And so the outcome of this, this is obviously back in the 1980s, when programming was a lot harder, was that people were less incentivized to collaborate, because they could just do their own thing. Right.

And I feel like now we have that problem, but worse, because it's like, why bother collaborating? Why bother doing open source when you can just fork the project and make it exactly the way that you want? Yeah. And so even the incentive to participate, to propose changes, and to have to interact with humans to get changes into a project: I'm seeing people just fork it, fix it, and move on. Maybe they'll throw up a PR, maybe they won't. I think we have that Lisp Paradox problem now. The challenge is how we continue to incentivize that healthy collaboration, which yields communities that work together to solve problems over a long period of time, when the code is super cheap and it's so easy to say, well, I'm not going to collaborate with you, because it's easier to talk to my agent than it is to talk to a human on GitHub, so I'm just going to go solve the problem and move on, because I've got a job to do. Yeah.

I'm not going to collaborate with you, because it's easier to talk to my agent than it is to talk to a human on GitHub. And so I'm just going to go solve the problem and move on, because I've got a job to do.

The whole cost-benefit of open source has completely changed. And I don't mean to catastrophize, but I'm seeing people talking about the end of open source, and they may not be wrong. I don't know. I think one challenge with that, and with a lot of these things, is that there will be compounding effects over time that we won't see acutely. Okay, one person forking a project is one thing. But over the course of a year, or two, or three years, everyone forking and no one building new foundational tools: what are the effects of that? Everyone's forking and introducing their own CVEs, you know, security issues. Yeah. And so, I don't know. I think it's important to be thinking about right now. I don't know what to do about it. Yeah.

It is so interesting, you mentioning Lisp. I think sometimes, too, it's called the curse of Lisp, where it's like too much of a good thing. I think the essay may actually be called the Lisp Curse or something like that; I was thinking of the Python Paradox, the Paul Graham essay. I think it's so spot on, though: it's a good thing to have a really powerful language and want to roll something yourself, and it's a good thing to have a really powerful coding agent. But it is so interesting, your points about being social, about needing to reach out, as hints of the curse. Like you mentioned working on Positron,

this tool, and when you needed to work on React stuff before, I think you mentioned you'd have to grab someone on the team, and now you can kind of just fire it off on your own. Right. Which, I feel, has that big curse energy. It's like a good thing, but... No, it's like invariably the thing that I do will be worse, because I also don't know React, and so I don't have the ability to read the output and judge: is this the right way to solve the problem? Yeah. And so essentially you're pushing that burden off onto the code reviewer, who has more experience. And so, yeah, I don't have a good answer.

But, you know, I view it as more of an opportunity. I'm certainly not catastrophizing, but I think it's good that we're aware of it and we're asking these questions. Because for us, as open source maintainers, I'm very passionate about open source. You're very passionate. We're all open source package maintainers. And I love open source. I love working with the community. I love working with other people. And so I'm optimistic that we'll be able to create new tools and solutions to incentivize human collaboration.

Closing thoughts and what's next

And, I mean, it's been so helpful to hear, Charlie, what you're doing in the ecosystem, keeping it welcoming, and also just thinking in terms of, yeah, what you build next and how you think about focusing on humans and agents. Yep. I mean, I think there's so much to chew on just in this space of how people will interact with code. And I think y'all have built such great tools for running really helpful things fast, whether it's linting or packaging and running scripts, which is uv, which I'm running all day, every day.

I'm so excited and curious to see what you all do, and, I guess, embrace both the human and the machine part of the future. Closing out, I'd be curious what you're maybe most excited about for the coming years. Well, I guess there's a broad theme. I think the way we build software will change a lot, and I'm hopeful... People keep quoting this back at me, because I tweeted this thing that was like, I feel like working with agents is a lot more productive, but a lot less fun.

I feel like working with agents is a lot more productive, but a lot less fun.

And that's what you're excited about? I'm hopeful that we will solve some of the things that make it feel less fun. It's not about having fun. Okay, having fun is great. I love having fun. Everyone loves having fun. But I think for me, it's more that the way we build right now, there's a lot of friction between this potential new world and the current world that we're in. And I think we'll just learn a lot more about what building software should look like. And I'm excited for that.

I think in the Python world, this is fairly niche, but we've been working a lot on these standards. They are largely born out of the question of: how do we make it easier to install PyTorch? Because I spend a lot of time helping people install PyTorch and making that easy, and the PyTorch ecosystem is absolutely enormous. And so the standards we're pushing on there, around what we call wheel variants, I think are kind of an exciting change that, should they be accepted, will change a lot of what it feels like to work in Python, and enable us to solve some of the pain points that we just have to live with today and can't solve in uv. So that's one sort of more near-term thing, and one sort of very long-term, big-picture thing.

I love the two of those: how can this be fun, and this very specific, detailed pain point. I mean, I think it is spot on for Astral that a lot of these tools have made things way faster and funner, and there's been so much thought put into these tools. So I'm so excited. I appreciate that. Thank you. Well, thanks. Thanks for coming on. I know we went super long, a lot to think about. It felt short. It was super fun. Yeah. Thank you. Thanks so much for having me, and for the great questions, the great perspectives from both of you. And, yeah, I'm excited to see what the next year brings.