Resources

Aleksander Dietrichson - AI for Gaming: How I Built a Bot to Play a Video-Game with R and Python

video
Oct 31, 2024
19:34

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

Thank you. So I'll be speaking about AI for gaming. What you see here is a piece of software called Gatai playing a video game called Christmas 3. This is from one of our training sessions, when we tried to do this by brute force, something that didn't work. We'll get back to why shortly.

First, a little bit of history. Gato is a web portal for game creators. When they approached me about this project, Gato became Gato AI, which turned into Gatai, and then Gatai the AI. Gato means cat in Spanish, and I live and work in Argentina, which is why Gatai is a feline, as you can see here. That stuck throughout the project.

Christmas 3 is a puzzle game that you can play online; you can look it up and play it directly in the browser. The object of the game is to move any of these tiles, either vertically or horizontally, so that they line up in groups of three, four, or five, and you get different scores based on that. So it's not a very complex game. These would be two valid moves to make on this board. There are 10 different types of tiles, and the board is 10 by 6.

The state space problem

That is why the brute-force approach will not work: with 10 tile types on 60 cells, you are looking at 10 to the 60th possible combinations. So we had to look for strategies to reduce this. This is what is called the state space, the total number of possible configurations of the board, and it's an important concept in reinforcement learning, which is what we're going to use to play this game.

To put this number in context: it's more than the number of atoms on Earth, though less than in the observable universe. And it's actually quite a bit bigger than the state space of chess, which surprised me when I looked it up. We can talk about why afterwards, but it surprised me: this game is more complex than chess when you just look at the state spaces.


So what is a state? It's a configuration of the board. We use reinforcement learning with an algorithm called Q-learning, which works in terms of a state space, an action space, a reward, which is essentially the score, and the resulting state. The state is what is on the board; the action space is what you can do with it; the reward is the score; and the resulting state is what the board looks like once you have made your move. These go into an update formula, and the algorithm can converge and then essentially solve the game.
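To make the state/action/reward/resulting-state loop concrete, here is a minimal sketch of tabular Q-learning in Python. This is an illustration of the general algorithm, not the talk's actual R implementation; the hyperparameter values and the state/action representations are assumptions.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2  # learning rate, discount, exploration rate

Q = defaultdict(float)  # maps (state, action) -> estimated long-run value

def q_update(state, action, reward, next_state, next_actions):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

def choose_action(state, actions):
    """Epsilon-greedy policy: mostly exploit the best known move, sometimes explore."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])
```

With enough plays, repeated `q_update` calls are what make the value estimates converge so the policy can "solve" the game.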

Reducing the state space

So we had to reduce the state space, and we had three strategies for that. The first was to play one tile type at a time: do the stars first, then the red balls, then the little candy sticks, and so on. I've arbitrarily labeled the types A through G on the board here. This reduces the space to 2 to the 60th, which is still far too much for a brute-force approach, but several orders of magnitude less. It's also quite convenient if you're going to program it, because the board for one type is a binary number that you can pass through functions and have fun with.
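The "binary number" observation can be sketched as a bitboard: once you only care about one tile type, each of the 60 cells is a single bit, and the whole configuration fits in one integer. The function names and board representation here are hypothetical, not from the talk's code.

```python
ROWS, COLS = 6, 10  # the 10-by-6 board

def encode(board, tile_type):
    """board: ROWS lists of COLS tile labels. Returns a 60-bit int where each
    set bit means 'the chosen tile type occupies this cell'."""
    bits = 0
    for r in range(ROWS):
        for c in range(COLS):
            if board[r][c] == tile_type:
                bits |= 1 << (r * COLS + c)
    return bits

def has_tile(bits, r, c):
    """Check one cell of the encoded single-type board."""
    return bool(bits >> (r * COLS + c) & 1)
```

An encoded board compares, hashes, and copies as cheaply as any integer, which is handy when it becomes the key of a Q-table.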

The second strategy, probably the most important one, is windowing: if you create a small sub-board, say 3 by 2, you can slide it across the bigger board. If you know how to play the pattern on the left, you can use that on any part of the bigger board. The third is transposition: because the game is symmetrical, if you know how to play the position on the left, you can also play the position on the right. Together these reduce the space quite a bit, and we can take advantage of that.
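Windowing and transposition can be sketched in a few lines: enumerate every small sub-board of the full board, and mirror a window so a policy trained on one orientation covers the other. This is an illustrative sketch, not the project's actual code.

```python
def windows(board, h, w):
    """Yield every h-by-w sub-board of `board` with its top-left coordinate,
    so a policy trained on the small pattern can be applied anywhere."""
    rows, cols = len(board), len(board[0])
    for r in range(rows - h + 1):
        for c in range(cols - w + 1):
            yield (r, c), [row[c:c + w] for row in board[r:r + h]]

def transpose(window):
    """Flip rows and columns: by the game's symmetry, one trained window
    then also covers the transposed position."""
    return [list(col) for col in zip(*window)]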

So I trained the model on these sub-boards, synthetically, and I got as far as 5 by 4. After that, the algorithm would not converge; I left it running overnight and it still hadn't. I used Posit Cloud for this, so standard commodity hardware. You can probably tweak more out of it, but it's going to start costing money right about at that point.

So those were the models we had available. I trained the first two to begin with, which doesn't take very long, and then you want to do some spot checks. Here's one where I want to know that it makes an intelligent move: you call predict on the state you've created, and it says B1 to C1, which is the blue move you see on the screen here. And here's one where there isn't a reasonable move to make in a 3 by 3 grid, and it comes back with a pass. So it has been trained to pass when there isn't a reasonable move to make.

The other interesting question was: does it actually think ahead? It turns out that it does. If you give it the board on the left, it will propose A1 to B1, and we assume that's in preparation for the next move, which then scores 50 points in this case. And what happens when you have competing moves? The move to the left here scores 50 points; the move down scores 100 points the way the scoring system works. It actually chooses the higher-scoring move.

So at this point we had trained the 3 by 3 and the 3 by 4 sub-boards, and it seemed reasonable that this actually worked as a concept. We trained the rest of the sub-boards up to 5 by 4, which is as far as we got.

Browser automation and the vision model

Then I wanted to interact with the GUI and actually play the game in the browser. For that, I used Selenium with a Chrome browser, with Python for the connecting code, because I couldn't get the R package to work. There is an R package, RSelenium, but it didn't work on my hardware at the time. It actually works on the hardware I'm bringing today, because I've switched in the meantime: this is a MacBook, and I was working on a Chromebook then. So it depends on whether you have the right drivers; you're somewhat beholden to the hardware here.

The way this works is that the AI starts the game: it sends a URL to the Selenium driver, which loads the game in Chrome. The driver then takes a screenshot that gets sent back to the AI. The AI analyzes the screenshot and sends moves back to Chrome through the driver.
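The loop just described can be sketched as below. `driver` can be any object exposing the small Selenium-like surface used here (`get`, `get_screenshot_as_png`, `execute_script`); with Selenium installed you would pass `webdriver.Chrome()`. The URL and the `analyze` callback are placeholders, not the project's real code.

```python
GAME_URL = "https://example.com/game"  # placeholder, not the real game URL

def play_session(driver, analyze, n_moves=10, url=GAME_URL):
    """Load the game, then alternate screenshot -> analysis -> move."""
    driver.get(url)                            # AI sends the URL to the driver
    for _ in range(n_moves):
        png = driver.get_screenshot_as_png()   # screenshot comes back to the AI
        move_js = analyze(png)                 # vision model + policy -> a JS move
        if move_js is None:                    # no reasonable move: stop/pass
            break
        driver.execute_script(move_js)         # send the move back into Chrome
```

Sending moves via `execute_script` is the same mechanism the assistant mode uses later to draw hints directly onto the page.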

So this is what we get in; this is our raw material. Of course, we have to cut it into individual tiles to be able to evaluate what moves would be reasonable. The problem is that when you calculate the tile coordinates, you come up with decimals, which means that depending on where you are on the board, a tile is going to be slightly smaller or bigger, so you can't do a direct pixel comparison. We therefore had to create a vision model as well. We used magick, a package for image manipulation, together with multinomial logistic regression. It was relatively easy in that we could just count the amounts of red, green, and blue in the image, and that was enough information to distinguish the tile types.
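The feature idea is simple enough to sketch: the per-tile totals of red, green, and blue are the entire feature vector. The talk used R's magick package plus multinomial logistic regression; in this sketch a nearest-centroid classifier stands in for the regression, and a "tile" is just a list of (r, g, b) pixel tuples. All names here are illustrative.

```python
def rgb_totals(pixels):
    """Sum each color channel over the tile's pixels: the whole feature vector."""
    return tuple(sum(p[i] for p in pixels) for i in range(3))

def classify(pixels, centroids):
    """centroids maps label -> a typical (r, g, b) total for that tile type;
    return the label whose centroid is closest (squared Euclidean distance)."""
    f = rgb_totals(pixels)
    return min(centroids,
               key=lambda lab: sum((x - y) ** 2 for x, y in zip(f, centroids[lab])))
```

Because the features ignore pixel positions entirely, the slightly varying tile sizes from the fractional crop coordinates stop mattering.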

The way this works is that you give it an image file or an image in memory. Reading it here and printing it out, it's a star. You can also call predict, since this is a regression model, and if you ask for probabilities, it will give you a probability for every category. For our purposes we just choose the highest one; you obviously get a non-zero probability for every category, because that's how the algorithm works.

Results and business cases

So when I had this, I was able to play the game on the board, and we tried out the different strategies. We got 42.5. That's the mean score per move, which I think is the only reasonable metric to use here. There's a version of the game where you play against a timer, but then you're really testing your hardware, not the AI you created. So I chalked this up as a victory for Gatai.

Then comes the question: why do all this? The main one for me was bragging rights. Some nerds play video games, and bigger nerds create them. But if you write the code that plays the video games that these super-nerds write, then I think you are a member of the geekhood.


The other thing was that I had taken their product from their website, a wild-caught product from their catalog, and applied AI to it. That is what I wanted to do, and I wanted this to be a portfolio project. I wanted to go back to the company and say: here are some business cases where you can use AI.

The game I analyzed is not one where you have an opponent, but it's relatively easy to see how you could handle that. I did, however, implement an assistant. Let's see if I can get this to run. Here I am actually playing the game on the board, deliberately not making very good moves and stalling a little, and at some point Gatai detects that I'm stuck and, like Clippy, shows up and proposes some moves. This is also done with Selenium, because you can send JavaScript all the way through into the browser, and the JavaScript paints the Gatai logo and draws the text and the arrows and everything. From that point of view, it's quite powerful.

But what really caught their interest was the idea of applying generative AI to this at scale: we can scan in all the games in your catalog and build an LLM that creates custom games for your users. I know what kind of game you'd like to play; here is one made exactly for you by our system. Then they asked me how much that would cost, and I said, you know, maybe 5 million dollars, maybe 10. Then they asked when I could submit my last invoice. And that was the end of my career in the gaming industry.

Using the project in teaching

Now, I have a day job: I am a teacher, and I teach data science at the university. I thought about how to reuse this project, or a project like it, in my teaching, because we did a lot of work. We ran a lot of simulations and got results back. With an empirical approach, you can say: from these results we can extrapolate these rules, with a certain p-value. For example, one of them was that there is always going to be a reasonable two-move combination on this board; we have a p-value for that. Now you can take that to the next level and say: well, that's a conjecture; can we prove it mathematically? And you can. And if you want to follow that trail: what's the minimum number of tile types you need for that to be true?

When you're done with that, you can play with the geometry of the board. We're doing 6 by 10; what if you do 1 by 10? What are the rules then, mathematically? Or 1 by infinity? Infinity is always fun to work with when you're proving things mathematically.

Of course, all the code we wrote should sit in a package, so we can talk about package development. I implemented the state spaces as S3 classes and the robot as R6 classes, and there is a reason for that: what are the pros and cons of each type of class system we have available? Graphics processing is an important topic, because images are data too, and we should be able to analyze them, manipulate them, and use them in data science. And browser automation, I think, is a cool topic.

So there are a couple of tracks here: one very theoretical track, and one more engineering track, maybe with some systems integration. And of course, we have to write this up; we have to create the presentation, using Quarto for that. So I think there's enough here for a one-semester seminar for a group of students. I'm hoping to distribute the work so that the people who like mathematical proofs get to work on those, and the people who want to do more engineering can do that. And since we do this as a group project, we also learn how to work in an interdisciplinary team. So this is how I intend to use it going forward as a teacher. And that's what I had for you today. Thank you for your attention.

Q&A

Very interesting. I can't wait to go and play some Christmas 3.

A quick question from the audience: you mentioned that the model got an average score of 42.5 per move. Did you get a chance to compare that to an average human player? Only to myself, yeah. And I got, you know, 20 per move or something like that.

Okay, interesting. So do you have any thoughts about applying this concept to other kinds of games, or have you looked into that for the future? Yes. This is a puzzle game, and this kind of algorithm works really well for a puzzle game. And you can also use the Selenium connector to actually play it in the browser, which I think is very powerful.

If you want to do Space Invaders, for example, then things need to move faster, so you need to intercept. You can't really play it in the browser with a Selenium setup, take a screenshot, and evaluate; you're not going to have that much time. For that, the solution I have would be a platform called Unity, which is used to create games and actually has a plugin for AI agents. Then you bypass the rendering to the screen and play the game synthetically, just before it goes to the screen and the hardware, so you can actually train models for faster games, so to speak.

Very cool. One last question, a short one: how long did this project take, start to finish, if you're willing to say? Yeah, three weeks.

Very interesting. Thank you so much. Yeah, absolutely.