Server outage (Dec 24 2023) Root Cause Analysis

Discussion in 'Announcements' started by Ravalox, Dec 24, 2023.

    Starting at midnight last night, the server became unresponsive to players. This was due to the database backup routine failing at startup, which left the database locked.

    The outage lasted a total of 1 hour, with no rollbacks or other impact.

    The database is backed up every 8 hours through an automated process. At the start of the process, the primary DB is locked and one of our secondary DB servers takes over. In this case, the script failed immediately after locking the database.

    This caused the gameserver to become unresponsive, since it received no confirmation response when writing updates to the database. (Waiting for that confirmation is a bit of a safety catch to ensure that data is not lost.)

    We found that the issue was caused by a rare "glare condition" (conflicting execution of multiple processes at the exact same time). We restarted the backup routine and all operations recovered as expected.
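To illustrate the kind of timing conflict described above, here is a minimal Python sketch that flags jobs whose start time falls too close to a backup start. It is not our actual tooling: the job names, the 5-minute minimum gap, and the helper functions are hypothetical. Only the 8-hour backup interval and the midnight start come from this post.

```python
from datetime import timedelta

BACKUP_INTERVAL = timedelta(hours=8)  # from the post: backups run every 8 hours
DAY = timedelta(hours=24)

def backup_starts(first=timedelta(0)):
    """Backup start offsets within one day, with the first run at midnight."""
    starts = []
    t = first
    while t < DAY:
        starts.append(t)
        t += BACKUP_INTERVAL
    return starts

def colliding_jobs(jobs, min_gap=timedelta(minutes=5)):
    """Return the names of jobs that start within min_gap of any backup start.

    min_gap is an assumed safety margin, not a value from the post.
    """
    collisions = []
    for name, start in jobs.items():
        for backup in backup_starts():
            diff = abs(start - backup)
            diff = min(diff, DAY - diff)  # treat 23:58 as 2 minutes from 00:00
            if diff < min_gap:
                collisions.append(name)
                break
    return collisions

# Hypothetical DB-dependent jobs, given as offsets from midnight.
jobs = {
    "stats_rollup": timedelta(hours=0),
    "log_archive": timedelta(hours=3, minutes=30),
}
print(colliding_jobs(jobs))  # prints ['stats_rollup']
```

A check like this only catches conflicts that are visible in the schedule. It does not replace proper locking between the processes themselves.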

    We will be making two changes to prevent this from occurring again:

    1. Adding an unlock command to the error-exit portion of the backup script.
    2. Changing the start time of other DB-dependent processes to ensure that a glare condition cannot occur in the future.
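The first change can be sketched roughly as follows. This is an illustration in Python, not the real backup script: `PrimaryDB`, `run_backup`, and `failing_backup` are hypothetical names, and the actual script may be written in a different language entirely. The point is only that the unlock runs on every exit path, including a failure immediately after the lock is taken.

```python
class PrimaryDB:
    """Hypothetical stand-in for the primary database."""

    def __init__(self):
        self.locked = False

    def lock(self):
        self.locked = True

    def unlock(self):
        self.locked = False


def run_backup(db, do_backup):
    """Lock the primary, run the backup, and always unlock on the way out."""
    db.lock()
    try:
        do_backup()
        return True
    except Exception as exc:
        # Error exit: report the failure instead of leaving the DB locked.
        print(f"backup failed: {exc}")
        return False
    finally:
        # Runs on both the success path and the error path.
        db.unlock()


def failing_backup():
    """Simulates the script failing immediately after the lock is taken."""
    raise RuntimeError("simulated failure right after locking")


db = PrimaryDB()
ok = run_backup(db, failing_backup)
print(ok, db.locked)  # prints False False
```

With the unlock in the cleanup path, a failed backup still needs attention, but the database is no longer left locked while it waits.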
     