Most of you reading this will probably have noticed the sudden performance problems and inconsistency bugs throughout the site. Let me give a little background on what happened during the last two days.
We have been struggling to keep up with the growth of the speedrunning community for a while now and are constantly looking for ways to improve the site's performance. Right now the site is a big monolith, running on a single, beefy server. To reduce that server's load we rented a second dedicated machine just to host the database. On Saturday we made the switch and moved MariaDB over to that new server.
In the hours following that we saw that performance decreased substantially, e.g. mean page load times went from 100ms to 400ms. To give the new server time to warm up its caches, we decided to let the experiment run until Monday morning, closely watching the server's metrics in the meantime.
Unfortunately the situation did not improve and we saw a lot more failed and slow requests throughout the nights. This, plus the fact that user registration, game requests and other features broke, made us decide to revert the experiment a few hours ago. We moved the database back and the site should be behaving normally (meaning "not great, but certainly much better than over the weekend") again.
We've learned that the database indeed requires 80-90% of this server's performance and that all other services (nginx, Redis, PHP, e-mail) are negligible in terms of memory and CPU usage. We also learned how valuable our monitoring is and improved that setup a lot over the weekend.
Our next action items are:
- Debug why moving the database caused weird consistency errors. We configured MariaDB to run in full ACID compliance, so we don't expect transactions to just disappear without any kind of error.
- Improve our database queries in general. We had a 100 Mbit/s connection between the two servers and our queries alone nearly saturated that link. We could also see that MariaDB spent a considerable amount of time just sending out packets. Reducing the result set sizes should free up some time for actual query logic.
- Further improve the caching layer and make more use of Redis in general. During the weekend we saw that the API rate limiting was causing 90% of all locks in MariaDB, and just moving that logic to Redis was a small but quick win for performance.
- We will most likely run a similar experiment with a dedicated database server in the future. We are thinking about replication and running multiple read-slaves, but still fear the additional complexity in our setup.
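To illustrate the rate limiting item above, here is a minimal sketch of a fixed-window limiter built on two Redis commands, INCR and EXPIRE. It is not the site's actual code: the key format, the limit of 100 requests and the 60-second window are assumptions, and `FakeRedis` is a hypothetical in-memory stand-in so the example runs without a Redis server.

```python
import time


class FakeRedis:
    """Hypothetical in-memory stand-in for the two Redis commands used below."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._values = {}   # key -> integer counter
        self._expires = {}  # key -> absolute expiry timestamp

    def _purge(self, key):
        # Drop the key if its time-to-live has passed.
        expiry = self._expires.get(key)
        if expiry is not None and self._clock() >= expiry:
            self._values.pop(key, None)
            self._expires.pop(key, None)

    def incr(self, key):
        # Like Redis INCR: create the key at 0 if missing, then add 1.
        self._purge(key)
        self._values[key] = self._values.get(key, 0) + 1
        return self._values[key]

    def expire(self, key, seconds):
        # Like Redis EXPIRE: set a time-to-live on an existing key.
        self._purge(key)
        if key in self._values:
            self._expires[key] = self._clock() + seconds


def allow_request(store, client_id, limit=100, window_seconds=60):
    """Fixed-window rate limit: at most `limit` requests per window per client."""
    key = f"ratelimit:{client_id}"  # assumed key format
    count = store.incr(key)
    if count == 1:
        # First request in this window: start the countdown.
        store.expire(key, window_seconds)
    return count <= limit
```

The counter lives entirely in the key-value store, so a check like this never takes a lock on a database table. The stand-in only mirrors the `incr` and `expire` methods, which the redis-py client also provides under those names.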
We apologise for the disruptions over the weekend. As they say, you can't make an omelette without breaking eggs.
-- The Team
I was watching Highspirits' any% run ( https://www.speedrun.com/Dragon_Quest_Builders/run/8yved96m ) and noticed that it's classified as PS3 -- but in the VOD he says he's playing on PS4 (I cba to find the exact timestamp). From the accidental save screens it sure does look like the PS4 as well.
Dutchj, I can't reproduce your problem. Can you give me more details (what game, what browser are you using)?
Yeah the algorithm for relative times is not 100% identical, but I personally like it ;-)
We have just changed how timezones are handled throughout the site. Until now, we used a cookie to render times in your timezone on the server. Over time, this caused all sorts of issues, starting with endless reload loops on the PS4 and some other browsers and ending in a lot of hacks to handle DST changes. It also prevented us from caching HTML output, as the output depended on each user's preferences.
On top of that, when the site was moved to a new server, we also switched from running in CET/CEST (UTC+1/2) to using UTC. The many hacks throughout the site to compensate for the original timezone led to some problems (like new run times being off by an hour), which motivated us to finally do something about it.
A note to marathon managers: We noticed that times seem off by 1-2 hours for newer marathons (with our schedule being the only source for the start time, it's hard to verify if it's correct ;-)). Please check your schedules and make sure the start time is correct. Please note that you need to configure it in UTC, not -- as earlier -- in your local timezone.
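For anyone unsure how to work out the UTC value, here is a small sketch of the conversion. The start time is a made-up example, and the fixed CET/CEST offsets are an assumption taken from the zones mentioned above; use your own offset if you are elsewhere.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for the two zones named in the post (assumed, no DST lookup).
CET = timezone(timedelta(hours=1), "CET")
CEST = timezone(timedelta(hours=2), "CEST")


def local_to_utc(local_naive, tz):
    """Attach `tz` to a naive local datetime and convert it to UTC."""
    return local_naive.replace(tzinfo=tz).astimezone(timezone.utc)


# Hypothetical marathon start: 18:00 local time during CEST.
start_utc = local_to_utc(datetime(2016, 7, 2, 18, 0), CEST)
print(start_utc.isoformat())  # 2016-07-02T16:00:00+00:00
```

So a marathon starting at 18:00 CEST would be entered as 16:00 in the schedule.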
This is not a mistake, the 1-Loop category looks like non-misc to me. =)
Concerning your other question: No, this is not possible and not planned, you would have to crawl all categories and runs (you would probably fetch a plain list of runs instead of querying each category for its runs).
Fixed, thanks. There was a typo in the code, killing the XSS prevention.
We encourage external sites to integrate with us and use our data. That's why I've built the REST API over the last few months. All we ask for in return is that those users give credit. Really, a small link to us will probably be sufficient. That is not much to ask.
Without a license on our site, external sites must assume that they are not allowed to use the data at all. Just because we don't have a license does not mean our data is automatically public domain. We therefore need the license to enable others to safely use our stuff.
Just for reference: Images are supposed to be cached for one hour in everyone's browser cache. You can, after uploading a new one, check your changes by doing a hard reload (Ctrl+F5 in most browsers on Windows), but most users will only get the change after that one hour.
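As a rough illustration of what a one-hour browser cache usually means at the HTTP level, here is a sketch of the response header involved. The function name is hypothetical, and the assumption that the site uses `Cache-Control` with `max-age` is mine, not something confirmed above.

```python
def image_cache_headers(max_age_seconds=3600):
    """Build a header asking browsers to reuse an image for `max_age_seconds`."""
    # Assumption: caching is controlled via Cache-Control; 3600 s = one hour.
    return {"Cache-Control": f"public, max-age={max_age_seconds}"}


print(image_cache_headers())  # {'Cache-Control': 'public, max-age=3600'}
```

With a header like this, a browser keeps using its stored copy until the hour is up, which is why a hard reload is needed to see a fresh upload sooner.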
The site required cookies to work. Browsers that had them disabled ended up in a reload loop.
I "fixed" that, accepting that for users with no cookies, dates would be in the wrong timezone (a proper fix would take more time to implement and atm the first priority was that those users can at least see *something* of the site).
Right now, I can't reproduce your problems [any more]. http://www.speedrun.com/gtasa works fine for me in Vivaldi.
No problem, I will kill your account. As for why this is not an option for everyone, I don't really know. Maybe a good chunk of our users is constantly drunk and would delete their account during binges. ;-)