Using AI to retime speedruns loadless
5 years ago
Russia

Hello there everyone.

Recently I started a project to train a neural network that takes a speedrun video as input and outputs the time without loads. And it seems I'm having stunning success with it at the moment!

I've used Need for Speed: Carbon as a baseline, since I happen to mod it and it is timed without loads. Even though the network isn't working entirely as I intended yet, it manages to hit 1-second accuracy on any% NG+ runs, which already makes it useful for retiming runs in that particular category.

If there are any active mods reading this, I could use some of your feedback. In particular, I would like to know what you would expect from such a tool - PC requirements, accuracy, analysis speed, etc. Would you consider it acceptable to use AI for such a task, given that it has good accuracy?

Also feel free to ask questions.

Tenka, Quivico and 5 others like this
Texas, USA

Over at HM64 (https://www.speedrun.com/hm64), we've been having trouble with emulators that have shorter load times than console. There are categories ranging from 2.5 minutes to 8 hours long, and there's too much RNG for us to establish something across all the runs. I'm not a mod, but something like this would be really helpful in equalizing the playing field for both emulator and console runners - a problem we've been struggling with for a while.

Personally, I think accuracy would take precedence over just about anything else. If it takes all night to process, that's fine, but it'd be much better to know that the results are accurate than how quick a number is returned. Honestly, if you're serious about this, I'd recommend hosting a website so things like PC requirements aren't an issue; it'd only be things like upload size and file type.

Edited by the author 5 years ago
ShikenNuggets likes this
Scotland

I still think human checking is always best, but this is kind of cool. Whether it's correct is the big thing, and it would also have to be trained on lots of different games if it were going to be used beyond one game. But if it's just for Need for Speed, it just needs to get it right every time ^_^ seems like a fun little project tho :3

oddtom likes this
Canada

I really like this idea, and I hope you pursue it further, but I have some concerns about how practical it would be. As @oddtom mentioned, accuracy is by far the most important thing. I wouldn't even consider relying on something like this unless I was absolutely sure it was perfectly accurate to the frame in all cases. Any less and I'd just have to double-check the results manually.

Another concern I have is, how much data would be needed to train the AI to work for a particular game? For a game like Super Mario Odyssey that has thousands of runs (5,895 as of writing) I imagine this would be less of an issue, but the game I moderate only has 9 runs. I imagine the amount of data needed would be prohibitively high for that game, and likely for the vast majority of games on the site.

As for analysis speed... this is a tricky one. Will the amount of time it takes to analyze a video increase linearly with video length? If so then that would be easier to work with for runs of any particular length, but if not then that could make it problematic for longer runs. In any case, if running it for like 12 hours isn't enough then I probably wouldn't want to use it, especially if it's a very hardware-intensive process that works fastest with no other processes running.

Again, it's a really cool idea, but there are some pretty difficult hurdles it would need to be able to jump for it to be useful in a lot of cases. But if it did manage to jump those hurdles, or even just some of those hurdles, then it could be a very helpful tool in speedrun verification.

Edited by the author 5 years ago
Russia

Thank you guys for your kind responses.

@oddtom Accuracy is indeed the #1 priority over anything else, but, unfortunately, we can't really speak of 100% accuracy because of some technical details behind the scenes - the result of neural network processing is not a "true/false" answer, but rather a probability of a certain case, like "98% sure it's a loading screen". Using statistics or brute force we can suppress the effect of such errors and get effectively 100% accuracy in practice, but technically speaking 100% accuracy is unreachable.
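As a rough sketch of what I mean by "using statistics" - averaging the per-frame probabilities over a few neighbouring frames before thresholding washes out single-frame slip-ups (the window size and threshold here are just example numbers, not what I actually use):

```
import numpy as np

def smooth_and_threshold(frame_probs, window=5, threshold=0.95):
    """frame_probs: per-frame 'this is a loading screen' probabilities."""
    probs = np.asarray(frame_probs, dtype=np.float32)
    kernel = np.ones(window) / window
    smoothed = np.convolve(probs, kernel, mode='same')  # sliding-window mean
    return smoothed > threshold    # boolean load mask, one value per frame
```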

About the PC reqs... they are not that strict: you need a moderate CPU plus an Nvidia GPU (CUDA is needed for the calculations). I'm using an i5-3470 and a GTX 1050 Ti and that is more than enough - the GPU is the most important part, obviously. I would like to test it on some other PCs as well, like CPU-only setups, etc... Oh, you also need Linux for now, but I plan on making a Windows variant as well for obvious reasons :) This PC scans videos at about 3x real-time speed, or around 90 fps on average.

@ShikenNuggets, about how much data is needed - tl;dr: more data is not always better. A more detailed explanation below.

Currently, I use 16k training and 6k validation images, totaling 22k images - roughly 12 minutes of 30 fps video. Bigger is better, but overfitting is a huge problem - the AI can just "remember" all the input images and not really "learn" to tell them apart; I already had that happen on my test set of 22k images. More brain power is required to actually train it in the best way, which is what I'm trying to do at the moment.
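Roughly speaking, the split works like this (a toy sketch - only the 16k/6k numbers come from my setup, everything else is made up for illustration):

```
import random

frames = list(range(22_000))      # stand-ins for the labelled frame images
random.shuffle(frames)
train_set = frames[:16_000]
val_set   = frames[16_000:]       # held out, never shown during training

# If accuracy on train_set is ~100% but accuracy on val_set is much lower,
# the network has "remembered" the training images instead of learning.
```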

And, to answer @MelonSlice, humans make mistakes as well. I don't consider human-retimed runs to be 100% accurate either, as we at NFS have already run into cases where different people got different results for the same runs. It's kinda OK to make mistakes. With a neural network we can give the time with a certain margin of error - for example, we can say "this run is 12:46 ±1 second loadless". So, if that 1 second makes a difference, e.g. it would move a particular run up or down the leaderboard, it would require further investigation. Otherwise I don't see much of a problem with leaving it as is, with a comment that a 1-second margin of error may be present.

Scotland

Well, with the human checking thing I didn't just mean the time, but also splices, speedups, or anything else someone might use to cheat - I was thinking of the whole thing, not just the time :)

Russia

Oh, that thing, yeah. While it is possible to check for splices and cheating using AI, it sounds more like a Nobel Prize Winning Project than a srcom thread ^^

Canada

"100% acc is unreachable... It's kinda OK to make mistakes" Unfortunately this is a pretty big deal breaker for me wanting to rely on something like this. I understand why you're okay with it being slightly off, but for me, if I know the number I'm being given could be wrong then I'd rather just go through the process manually. That being said, being able to analyze 90 frames a second is something that would make re-timing longer runs (like in the 3+ hour range) much more feasible. Although another possible issue is, would the margin of error increase with run length? If so, a one second margin of error on NFS Carbon's NG+ category (which lasts 12-15 minutes) could be as much as a 12 second margin of error on a 3 hour long run. And if there was a conflict that needed to be looked into further because of that margin of error, I doubt anyone would want to spend however long it takes to re-time multiple 3 hour long runs to ensure that there's no problem.

"different people would have different results for same runs" Is this a common occurrence? This is somewhat off-topic, but if so, are you sure that all the moderators for the game are on the same page about where loading screens start and end, what a loading screen is, etc? Human error certainly happens, but if everyone's on the same page then I wouldn't expect it to be that much of an issue.

While typing all that out, I thought of a way this could be made more useful. I have no idea how feasible this is or how compatible it is with what you already have, it's just a thought. What if, instead of just doing the analysis and coming up with the time, it was more of an AI assistant? So after it performs the analysis, it shows you everything that it thinks is a loading screen, which you can either confirm or deny, adjust the start or end of, etc. This way you can still benefit from the speed and automation, but still be able to see what's going on and manually get it to whatever "close enough" is for whoever's using it. It would also make manual re-timing easier later on if necessary since you can still use the AI analysis as a baseline.
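As a rough sketch of what I'm imagining (purely illustrative - I have no idea what the actual analysis output looks like), the per-frame results could be turned into a list of suspected loading segments for a human to click through and confirm or adjust:

```
def load_segments(load_mask, fps):
    """load_mask: list of booleans, one per frame -> [(start_s, end_s), ...]"""
    segments = []
    start = None
    for i, is_load in enumerate(load_mask):
        if is_load and start is None:
            start = i                                  # loading segment begins
        elif not is_load and start is not None:
            segments.append((start / fps, i / fps))    # segment ends
            start = None
    if start is not None:
        segments.append((start / fps, len(load_mask) / fps))
    return segments
```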

Edited by the author 5 years ago
Russia

@ShikenNuggets that is... actually a pretty dope idea! Just making some kind of timeline that the AI "marks" with what it thinks is a loading screen, so you can later jump between those places and recheck everything manually. Taken into consideration, but it would take much (I mean MUCH) more time to implement. GUIs are always a pain to work with.

Also an update: I'm now hitting matches within a second on NG+ runs, and any% runs are only 5 seconds off, making the current version 99.93% accurate, which is already close to ideal.

ShikenNuggets likes this

The assistant is a really good idea imo.

@GrimMaple How does your architecture look? Are you using convolutions? How many layers? Are you feeding in individual frames and classifying them binary (load/not load) per frame?

Which frames are wrongly classified? You could make a second-pass sanity check and remove load frames that stand alone (where the neighboring frames are non-load frames), as there is no game that has single-frame loads (they always come in a bunch). Same with non-load frames.
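Something like this quick sketch (just an illustration, assuming the output is a per-frame boolean mask):

```
def sanity_pass(load_mask):
    """load_mask: list of booleans, one per frame (True = load frame)."""
    cleaned = list(load_mask)
    for i in range(1, len(load_mask) - 1):
        if load_mask[i - 1] == load_mask[i + 1] != load_mask[i]:
            cleaned[i] = load_mask[i - 1]   # isolated frame -> match neighbours
    return cleaned
```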

Maybe it's worth trying what happens when your input is a stack of 4 consecutive frames, to somehow get the time dependency into the network.
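For example, something like this (illustration only; the first conv layer of the net would need 12 input channels to accept it):

```
import numpy as np

def stack_frames(frames):
    """frames: list of 4 consecutive HxWx3 frames -> one HxWx12 array."""
    assert len(frames) == 4
    return np.concatenate(frames, axis=2)   # channels become 3 * 4 = 12
```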

Those results already sound really impressive! Good job, looking forward to seeing a prototype :)

Albania

This is a pretty good idea @GrimMaple . It actually gives me a decent idea of what I could use an RNN for in speedrunning. You should create a Git repo and share your ideas so we can help this project along. I am assuming you're using a CNN for this. Either way, deep learning can be applied to so many things, it's nice to see someone creating something here.

Russia

@Tigger77 I'm using a pretty standard solution: GoogLeNet via Caffe. The reasoning is that it was the only solution that worked out of the box and doesn't require a lot of hardware - in the worst-case scenario it can easily be run on a CPU. I'm not that deep into the fancy theory behind all the architectures and stuff, so please forgive my ignorance :)

I'm feeding in individual frames and getting a float in [0; 1] as a result - loading screens usually stand out, and when a frame is a loading screen the result is usually much closer to 1 than anything else. For now most actual loading screens have >0.96 output and I take advantage of that.
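For the curious, the per-frame pass looks roughly like this sketch (paths, blob names and the class index are placeholders, and mean subtraction is left out for brevity):

```
import cv2
import numpy as np
import caffe

caffe.set_mode_gpu()
net = caffe.Net('deploy.prototxt', 'snapshot.caffemodel', caffe.TEST)

LOAD_THRESHOLD = 0.96      # actual loading screens usually score above this
LOAD_CLASS = 1             # index of the "loading screen" class (example)

cap = cv2.VideoCapture('run.mp4')
fps = cap.get(cv2.CAP_PROP_FPS)
total_frames = 0
load_frames = 0

while True:
    ok, frame = cap.read()
    if not ok:
        break
    total_frames += 1
    # resize to the net's input size and go HxWxC -> CxHxW float
    img = cv2.resize(frame, (224, 224)).astype(np.float32).transpose(2, 0, 1)
    net.blobs['data'].data[...] = img
    prob = net.forward()['prob'][0]
    if prob[LOAD_CLASS] > LOAD_THRESHOLD:
        load_frames += 1

print('loadless time: %.2f seconds' % ((total_frames - load_frames) / fps))
```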

We [me and my coworkers] discussed the idea of feeding in more than 1 frame, but it was quickly rejected as a GPU-heavy solution. After some talk we came to the conclusion that it would be best to see how a single frame works and only take further action if necessary. And it seems like it isn't necessary :)

About the classification: at first it was clear that the network struggled really hard when the loading screens differed. For example, if one loading screen was notably different from another, it would only learn to recognize one of them. I solved the issue by creating more classes (e.g. game, loading1, loading2, etc.), and that seems to work really well - initially this gave me the biggest boost towards having a working thing. In conclusion, it really matters to introduce as many classes as the network needs.
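In practice the extra classes just get folded back into a single "loading" decision, roughly like this (class names and indices are made up for the example):

```
CLASSES = ['game', 'loading_blackscreen', 'loading_progressbar', 'menu']
LOAD_CLASS_IDS = {1, 2}        # indices of the loading* classes above

def is_load_frame(prob, threshold=0.96):
    """prob: softmax vector over CLASSES for one frame."""
    return sum(prob[i] for i in LOAD_CLASS_IDS) > threshold
```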

@Soulkilla thank you for the reply, but I would rather not share much at this point -- I don't enjoy sharing barely working things, it just sucks IMO. I don't want interference either, because merging could be a huge pain. I will eventually make a git repo once I'm sure the project is usable.