R106 Outage - Recovery update

Discussion in 'Announcements' started by Ravalox, Oct 2, 2022.

Thread Status:
Not open for further replies.
  1. Ravalox

    Ravalox Chief Cook and Bottle Washer Moderator SOTA Developer

    First off: no, the game is not going away. We'll be doing some special things as a thank-you for being so positive and standing by through the whole thing, taking into account lost time and rewards. But first we need to get through the recovery phase.

    Please note that all sub rewards will still be delivered when you log in. There is no expectation that anything will disappear from your account.

    Chris, DC, and I have made really good progress. We still can't give an exact estimate of when we will be back up, but the Triage, Troubleshooting, and Determination phases of Disaster Recovery are pretty much done. We are now moving into recovery.

    We have a few things on deck to do, and there is a range of possible outcomes. None are too scary at the moment, just time-consuming. Our highest priority is to preserve the players' progress in the game and their account inventories. This is why this wasn't a same-day or overnight fix.

    To better understand, a summary:

    After the release was applied, the Database was updated for the Heritage change. The update failed, but not in a way that would or should have caused us issues. We then performed our sanity testing to make sure the game was stable before opening it to the players. We found that errors were showing and that some house lots that should have been claimed were showing as unclaimed. These same lots could not be claimed, because the "U" deed menu showed that the player still owned the lot. (None of the items were gone; only the reference to the lot the items should be on was missing, so there was no data loss.)

    We then restored the Database backup from just before the update started (taken after the game server was closed). This is the de facto primary emergency plan. We found that the missing-lot issue was still present in the R105 DB. After looking at older DB backups (we keep over a month's worth, to ensure that we have DBs going back to the previous server update/restart), we found that the issue had existed for some time, long enough that we could not consider a DB rollback.

    This is what led us to take the time to better understand what was impacted and to form a solid set of plans for recovery.

    We are currently working on a QA recovery build. Once that is executed and tested, we will know whether we can open QA for you all to come in, test, and give feedback on whether things are as we expect them to be. We will give details in a testing directive format so you know what we expect will be there and what issues to ignore (if any).

    After ...

    We will be having a post mortem live stream once this is all wrapped up. It will give insight into what we are changing in the release process to better guard against such a major failure, and will go into detail about what led up to the issues, etc.

    Further updates will be posted here and in Discord.
     
  2. Ravalox

    Current status:

    We have made further progress in the recovery efforts, and work will continue through the night. The issue we have been facing is being addressed in two main parts. The first was to eliminate the errors being generated due to the missing references; this has been done. The second part is to heal the Database. This is what we are working on now.

    As soon as we have this next part in place and tested internally, we will look to get as many players onto QA as possible to ensure that no edge cases are out there.

    When we are in the final preparation, we will publish a Testing Directive style post with details of what to look for and what to try. With that sanity testing done, we will then migrate the fixes onto the live server.

    Thanks once again for your patience!
     
  3. Ravalox

    Greetings!

    We are approaching the moment when we will be asking players to log in to the QA server and test.

    The recovery has required a staged process. So far we have applied two DB Migration scripts and are currently running a third script against the database that is touching every item on every character in the game to resolve the issues. This last phase is performing over 400 million record read/write operations. The script has been running for two hours and, though we cannot commit to when it will be done exactly, the next step will be to open QA to the players.

    We are preparing the Live server with a build, which will also need to have the scripts applied to it before we can get everything back up. We are hoping to get to that point within 24 hours, with confirmation that QA is stable after testing with players.

    I will publish a QA Testing directive in the usual forum space, and I will also post the thread link in Discord, this Announcement Forum thread, and on SotA's Twitter.

    We are putting together a set of rewards and sales for enduring this event with us. The package takes into consideration the time lost, time on potions that was lost, etc. The details of the items will be published as we get close to publishing the Live server, ensuring accuracy.

    There will also be a postmortem Livestream on Friday, where we will be able to relate the what, the how, and the processes we applied to recover, in addition to the improvements in both process and infrastructure we will be implementing to prevent this level of catastrophic failure from occurring again.

    Once again, thank you for sticking with us through this ordeal, and so many thanks for being so positive! The activity in Discord has given the team an opportunity to smile while we worked on the issues. You all have proven to us that the hard work and hours we have been putting in during this recovery effort are well worth any lost sleep. On behalf of the entire team, thank you.
     