1. Here you will find official announcements and updates. These announcements are also linked in the Official SotA Discord server.
    We encourage comments from the community! To keep the announcements official, we ask that comment threads be created in the General forums for player input.

                                                 Thanks!

Connectivity Issues April 1, 2017 (Saturday Morning)

Discussion in 'Announcements' started by DarkStarr, Apr 1, 2017.

Thread Status:
Not open for further replies.
  1. tphilipp

    tphilipp SotA Dev Moderator SOTA Developer

    :)

    Well, I do hope that my post wasn't that whiny, though... That was not my intention. We have a ton of work still to do, and 2112Starman is not wrong that we had a downtime we ideally shouldn't have had. It's sometimes a "pick your battles" kind of deal in a small team where most of us wear multiple hats, etc.
     
  2. Brass Knuckles

    Brass Knuckles Avatar

    No, I actually loved reading it. I miss all the techy convos... thank you!
     
  3. 2112Starman

    2112Starman Avatar

    In the gig before my last gig, I was at a medium-sized company (a 150-VM environment). I actually fired up VMware Airwatch (called the hybrid cloud back then) for it (and AWS too, just to a lesser extent) and migrated 25% of my infrastructure up there. This was just after release, and it turned out I ended up being one of their "beta testers". I say that because it was not ready for release, and I spent a year on the phone with the VMware engineers trying to get their stuff to work. That said, they have come a long way and now actually use AWS on their back end. I think the cloud has its uses, but from now on I'll stick to a local, on-premises cloud. I still remember even Office 365 going down for 8 hours a while back. It happens. Now, I don't really have any issues with what you and Port are doing, largely because this is the first outage I have seen, which means you are doing your job. I mainly responded to your VMware comment because it's something I have never heard uttered by mid- to high-level system/network/datacenter administrators. The low-level ones usually say "ya, we use Hyper-V because it's not Linux" (that one drops me to the floor) or "we use it because it's what we are used to".
     
    Last edited: Apr 1, 2017
  4. Roycestein Kaelstrom

    Roycestein Kaelstrom Avatar

    So the next time someone whines about horse statue costs on the add-on store vs the license and asks if everyone is ok with it, we should ask how many hours we want to wait when the service is down instead.
     
  5. tphilipp

    tphilipp SotA Dev Moderator SOTA Developer

    I hear you :) My VMware comment was meant to be humorous at first, but yeah, it's also based on me being a bit opinionated about it.
    I think I associate VMware with bloat, b/c I had to administer it a few times, and at one point all I needed was to basically see what's running, shut one VM down, etc. In my mind that should be as simple as having a command line somewhere. And they do have one: you can ssh into the host (although one sometimes has to learn their, unfortunately, changing toolset and commands there). At that time, though, I was basically told "oh man, you need to use vSphere, it's the only way", so I started downloading this 250MB monster and kept asking myself "why?".

    And yeah, I hear you about the "because it's not Linux" argument, I also get that kind of comment from time to time.

    I did, over the years, learn to accept the "we use x b/c that's what we are used to" argument, though. One can't deny that it's better to have people who know their way around one thing and can jump in quickly if needed than to force them to use something they feel uncomfortable with (of course, eventually they learn and might even appreciate it, if time is available). As long as the ecosystem doesn't end up a mix of too many different things, it can be good to stick with what people know best. Again, it's a pick-your-battles kind of situation.
     
    Last edited: Apr 1, 2017
  6. 2112Starman

    2112Starman Avatar

    Around 80% of mid-sized businesses are virtualized. Of them, 82% are running VMware and 38% are running Hyper-V. 3% are using "other", which it sounds like you guys are a part of. Point being, this does define "industry standard", and it's really, really odd to hear about companies not going with the things that actually work in the enterprise market (in tiny to epically huge environments) :)

    I'd bet you guys could run everything on 2 latest-gen servers filled with RAM, and you can buy VMware packs to run full HA for less than $5K (vSphere Essentials).
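With only two hosts, "full HA" implies that either host must be able to carry the whole load on its own if the other one dies. A minimal sketch of that N+1 capacity check, considering RAM only; the host names and sizes are hypothetical, and real vSphere HA admission control offers several policies that this does not model.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str    # hypothetical host name
    ram_gb: int  # RAM this host can give to VMs


def tolerates_one_host_failure(hosts: list[Host], demand_gb: int) -> bool:
    """True if the total VM RAM demand still fits after losing the largest host.

    Simplified N+1 check: assume the worst case (the biggest host fails) and
    compare the remaining capacity with what the VMs need. CPU is ignored.
    """
    if len(hosts) < 2:
        return False  # with one host there is nothing to fail over to
    total = sum(h.ram_gb for h in hosts)
    largest = max(h.ram_gb for h in hosts)
    return total - largest >= demand_gb


cluster = [Host("esx-a", 512), Host("esx-b", 512)]
print(tolerates_one_host_failure(cluster, 400))  # True: either host can carry it all
print(tolerates_one_host_failure(cluster, 600))  # False: the load needs both hosts
```

In practice this means a two-node HA cluster should be run at no more than about half of its combined capacity.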

    Tell Starr to start diverting all my pledge money to some vmware licenses hahahahah!!!
     
    Last edited: Apr 1, 2017
  7. Greyfox

    Greyfox Avatar

    <delete>
     
    Roycestein Kaelstrom likes this.
  8. Greyfox

    Greyfox Avatar

    Thanks for the quick response and relatively quick recovery. Every MMO and Internet service suffers downtime.
     
  9. Roycestein Kaelstrom

    Roycestein Kaelstrom Avatar

    Several UO shards that I played on in the past used dedicated servers over VMs due to some weird network issue that caused lag/rubber-banding effects when the number of connected clients got high.

    Does SotA use VMs? If so, could that contribute to the lag we've been getting when we go into places with lots of dynamic item loads?

    Just curious about the hardware architecture here.
     
    Time Lord and tphilipp like this.
  10. tphilipp

    tphilipp SotA Dev Moderator SOTA Developer

    Messages:
    535
    Likes Received:
    1,747
    Trophy Points:
    63
    Partly. We do use VMs for about half of our servers. We actually used to have everything on VMs. However, because of a combination of issues with the cloud services we used and cost effectiveness, we moved the other half to dedicated, but external, cloud-managed servers. After some disappointments there, and having gotten our own real hardware in the meantime, we moved those servers again. This half of the servers includes the game servers. So yes, during the time SotA has been open to the public, we have actually moved servers twice. In some ways it's ironic that our biggest downtime happened yesterday and lasted longer than either of those moves.
    Anyway, we now have full control over those machines, but also full responsibility, of course. There are actually also VMs there, but we control the host now, which became increasingly important to us given our past experiences. The VMs are not used for the actual game servers, though; those are dedicated at the moment. This setup will slowly grow, of course, and become more automated when it comes to failover, etc. If we'd had back then the experience we have now, we could've avoided both moves and the work involved, I guess... well, that's how you learn. :)
     
  11. Vyrin

    Vyrin Avatar

    A few hours of downtime since persistence, for a game "in development"? Aren't we supposed to still be testing/fixing, not assuming perfection? A solution was found quickly, in the off hours.

    Looked at the right way, this was a very good thing to happen. It would be silly to offer anything other than encouragement (and for those who can, technical ideas) to everyone involved here. Keep up the good work.
     
    tomcat_de, Time Lord, Elwyn and 5 others like this.
  12. Daxxe Diggler

    Daxxe Diggler Avatar

    Messages:
    2,692
    Likes Received:
    5,711
    Trophy Points:
    153
    Gender:
    Male
    Location:
    Virtue Oasis - Hidden Vale
    I think this problem happened at the perfect time. I was not even at my computer during the entire outage so I didn't even know about it until the next day! :p

    Still, for it to happen on a weekend at this stage of development... and get fixed before the following Monday (actually before the next day) is very promising to me. I have played other games that were already "released" and a server crash on a Friday night meant don't bother trying until Monday. :(

    So kudos to the team for making this incident virtually unknown to me. ;)
     
  13. 2112Starman

    2112Starman Avatar

    Messages:
    3,613
    Likes Received:
    7,989
    Trophy Points:
    165
    The servers being virtualized won't really affect those things unless they are configured pretty poorly. At this point, most servers in the world are virtualized (80%). This goes for massive servers hosting everything you see on the internet, from small to huge sites.

    I can explain it best like this:

    Computer hardware has far outpaced operating systems and software. 10 years ago you would buy one hardware server per server (say, Windows Server 2012, which is a derivative of Win8, or Server 2016, which is a derivative of Win10). But then 64-bit computing came around and lifted the 4 GB RAM limit.

    The last server I built had 32 cores and 2 TB of RAM, and it could run hundreds of virtual servers and barely crack 10% CPU utilization (it could easily run 1,000 Windows 7 VMs if we wanted it to).
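For light desktop workloads the "1,000 Win7 VMs" figure is mostly a RAM question: at around 2 GB per VM, 2 TB fills up long before 32 cores do. A back-of-the-envelope sketch, assuming no memory overcommit; the 2 GB per VM and the 32 GB hypervisor reserve are illustrative numbers, not figures from the post.

```python
def max_vms_by_ram(host_ram_gb: float, ram_per_vm_gb: float,
                   hypervisor_reserve_gb: float = 0.0) -> int:
    """How many identical VMs fit in host RAM without overcommitting memory."""
    usable = host_ram_gb - hypervisor_reserve_gb
    if usable <= 0 or ram_per_vm_gb <= 0:
        return 0
    return int(usable // ram_per_vm_gb)


# 2 TB host from the post; 2 GB per VM and a 32 GB reserve are assumptions.
print(max_vms_by_ram(2048, 2, 32))  # 1008
```

Hypervisor features such as memory overcommit can stretch this further, so treat the result as a conservative floor rather than a hard limit.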

    What people don't realize is that the same is true for home hardware, with the exception of video cards, which are still heavily utilized in games.

    The i7 Skylake I run right now could run dozens of Win7 machines (RAM dependent) all at once and barely crack 10% CPU utilization, if those machines were just doing office-type work, internet browsing and email.

    The servers in this case, and the Win7 VMs, don't even know they aren't really on physical hardware (well, they are smarter about it now, in fact designed for it); they have, for example, VMware drivers for all of their virtual hardware. Virtualized machines also run far better than their physical counterparts because of this: in a VMware environment, there are hundreds of millions of servers out there running the same extremely optimized drivers, which means they are extremely efficient, and that's one of the reasons you get huge uptimes. You go from having a 20 GB Windows install on a drive that uses drivers tied to the hardware (flawed) to VMware ESXi on the server, which is a 250 MB install, with the virtual servers using these really efficient drivers.

    In a virtual environment, the only time you need to reboot a server is for patches; 1 year of uptime is normal, and they very rarely crash. If a physical host does crash, the main point of using virtualization is that, at that point, all of its virtual servers pick up and are restarted on the other physical servers running in the cluster. 10 years ago, this was like magic.
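What HA does after a host failure can be pictured as a placement problem: the failed host's VMs have to be restarted wherever there is spare capacity. A toy sketch of that idea, considering RAM only; it is a simple greedy heuristic with hypothetical VM and host names, not the actual vSphere HA placement logic.

```python
from typing import Dict, Optional


def restart_plan(failed_vms: Dict[str, int],
                 free_ram: Dict[str, int]) -> Dict[str, Optional[str]]:
    """Toy restart placement after a host failure (RAM only).

    failed_vms maps VM name -> RAM needed (GB); free_ram maps surviving
    host -> free RAM (GB). Largest VMs are placed first, each on the host
    with the most free RAM; VMs that fit nowhere map to None.
    """
    remaining = dict(free_ram)
    plan: Dict[str, Optional[str]] = {}
    for vm, need in sorted(failed_vms.items(), key=lambda kv: kv[1], reverse=True):
        host = max(remaining, key=remaining.get) if remaining else None
        if host is not None and remaining[host] >= need:
            remaining[host] -= need
            plan[vm] = host
        else:
            plan[vm] = None  # not enough spare capacity on any surviving host
    return plan


# Hypothetical VMs from a failed host and free RAM on the two survivors.
print(restart_plan({"game-1": 64, "db-1": 128, "web-1": 16},
                   {"esx-b": 150, "esx-c": 100}))
# {'db-1': 'esx-b', 'game-1': 'esx-c', 'web-1': 'esx-c'}
```

Placing the biggest VMs first reduces the chance that a large VM is left without a home after the small ones have fragmented the free space.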

    So that's why I asked them whether High Availability kicked in and migrated all the SotA virtual servers to a physical server that's still running.
     
    StarLord, Time Lord and tphilipp like this.
  14. tphilipp

    tphilipp SotA Dev Moderator SOTA Developer

    In general I agree, but it depends on what's running on the other VMs. There is more than just the CPU. I fully agree with you that virtualization is perfect for making more effective use of machines than before. We are not against it. However, it's still possible for one VM alone to bog down a CPU completely, if it really wanted to, and that definitely does have an effect on the other VMs. Ever ended up on a host at AWS where you see a high steal time? Oh, turns out the machines on the same hardware are owned and are mining bitcoins with some efficient client... great. It became a rule of thumb for us to check whether a new instance on services like AWS feels sluggish after starting it, and if so, kill it and start another one.
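Steal time is exposed by the Linux kernel in /proc/stat (the eighth counter on the aggregate "cpu" line, shown as "st" in top), so the "does this new instance feel sluggish?" check can be made a bit more objective. A minimal sketch, assuming a Linux guest with a kernel that reports the steal column; what counts as "too much steal" is left to you.

```python
def steal_percent(before: str, after: str) -> float:
    """Percentage of CPU time stolen by the hypervisor between two samples.

    Each argument is the aggregate "cpu" line from Linux /proc/stat:
      cpu user nice system idle iowait irq softirq steal guest guest_nice
    guest/guest_nice are already included in user/nice, so only the first
    eight counters make up the total.
    """
    def counters(line: str) -> list:
        fields = line.split()
        if fields[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        return [int(x) for x in fields[1:9]]

    b, a = counters(before), counters(after)
    deltas = [x - y for x, y in zip(a, b)]
    total = sum(deltas)
    return 100.0 * deltas[7] / total if total else 0.0


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the first line of /proc/stat, the aggregate "cpu" line."""
    with open(path) as f:
        return f.readline()
```

Typical use: call read_cpu_line(), wait a few seconds (time.sleep), call it again, and pass both lines to steal_percent. A consistently high value on a freshly started instance is a hint that its neighbours are busy.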


    The GPU makers are actually adding virtualization support to their hardware now, too. Still, there are so many other shared components that also have a say, depending on your needs.


    True... except if the hardware itself dies. Which, in fact, was our problem on Saturday...


    ... one needs additional hardware. And all the management logic itself needs to be on failsafe hardware. This is the direction we are heading, b/c (just to stress this point) we have no problem with virtualization and totally see the benefit. However, given our past experiences, it's no longer acceptable for us not to be able to control the host/hardware, b/c plenty of side effects that come from other instances do matter.


    Because you are thinking of some server in some scalable array having had an issue. That wasn't the case, though.
    The thing is, given what happened, it wouldn't have made any difference to use VMs for any server. No VM tech would have helped. Specifically, what happened was that the interface on the main firewall/load balancer, where the data center's uplink cable is plugged in, died. Actually, "died" is not correct: it went into some kind of zombie state where it still handled existing connections but refused new ones. And this was on the hardware side. The operating system didn't notice any of that: no errors, nothing. To it, it appeared as if simply nobody was connecting anymore, while it was still handling all existing connections. The hardware itself did notice that something was wrong, though, as warning LEDs came on (and yes, I've never experienced such a thing before).
    All of this added to the delay in getting this taken care of, b/c we needed physical access. It didn't help either that it was the middle of the night on a weekend.
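Because the interface kept serving existing connections while refusing new ones, anything that only watched link state or existing sessions would have looked healthy. One way to catch this kind of half-dead state from the outside is a monitor that regularly opens a fresh connection through the device. A minimal sketch; the target host/port and how an alert gets raised are assumptions, and this is not a description of SotA's actual monitoring.

```python
import socket


def accepts_new_connections(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a brand-new TCP connection to host:port can be opened.

    Watching link state or existing sessions would not catch a device that
    keeps serving old connections while refusing new ones, so this probe
    always performs a fresh TCP handshake and closes it again.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False
```

Run from a machine outside the data center (for example once a minute against the public game address), a few consecutive False results would be enough to page someone, even while every player already logged in sees nothing wrong.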

    So, to keep this from happening again, yes, we need failover redundancy there. This could take many forms; my choice would be more than one box (the second one is actually there, and always was, fully set up, just not live, b/c we haven't had the time to push this all the way), more than one cable, and CARP. The fact is, this failover has to be hardware based, not VM based.
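CARP (Common Address Redundancy Protocol, as in OpenBSD/FreeBSD carp(4)) lets two or more boxes share one virtual IP: the node that advertises most frequently holds it, and a backup takes over once the master's advertisements stop. A simplified model of that election, for illustration only; the box names are hypothetical, it ignores preemption, demotion counters and the network itself, and the takeover timeout is an approximation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CarpNode:
    name: str      # hypothetical box name
    advbase: int   # base advertisement interval, whole seconds
    advskew: int   # 0..254; adds advskew/256 s, so a higher skew is less preferred
    alive: bool = True

    @property
    def interval(self) -> float:
        """Advertisement interval in seconds."""
        return self.advbase + self.advskew / 256


def elect_master(nodes: List[CarpNode]) -> Optional[CarpNode]:
    """The live node advertising most often (shortest interval) holds the virtual IP.

    Simplification: ties, preemption and demotion are not modelled.
    """
    live = [n for n in nodes if n.alive]
    return min(live, key=lambda n: n.interval) if live else None


def master_down_timeout(backup: CarpNode) -> float:
    """Approximate silence (seconds) a backup tolerates before taking over."""
    return 3 * backup.advbase + backup.advskew / 256


fw1 = CarpNode("fw1", advbase=1, advskew=0)
fw2 = CarpNode("fw2", advbase=1, advskew=100)
print(elect_master([fw1, fw2]).name)  # fw1
fw1.alive = False                      # e.g. fw1 goes down and stops advertising
print(elect_master([fw1, fw2]).name)  # fw2
print(master_down_timeout(fw2))        # 3.390625
```

The point of the design is that failover happens between physical boxes in a few seconds, without anyone needing physical access in the middle of the night.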
     
  15. jammaplaya

    jammaplaya Avatar

    Messages:
    1,139
    Likes Received:
    1,995
    Trophy Points:
    113
    +1 great idea!! To accomplish this, you could add layering deeds to the add-on store that allow players to change the slot their masks go into.

    For example, I could change the slot of my plain carnival mask to the cloak slot and then wear an epic cotton hood over it! That would look amazing I think.

    Only 1 DR masks would work with the layering deeds, of course, since cloaks always only give 1 DR as well, I think. Not quite sure.

    Just make sure that whatever is in the cloak slot would go underneath whatever is in the head slot and it should be just fine.

    A great addition to artistic freedom I should think.
     
    Time Lord and Hornpipe like this.
  16. tphilipp

    tphilipp SotA Dev Moderator SOTA Developer

    ;)
    Now that's a way to hijack this thread :D
     
    Time Lord and jammaplaya like this.
  17. Elwyn

    Elwyn Avatar

    You mean a third floppy drive? They already have two in the lobby.
     
    Time Lord and Brass Knuckles like this.
  18. 2112Starman

    2112Starman Avatar

    Messages:
    3,613
    Likes Received:
    7,989
    Trophy Points:
    165
    This is when it's great to have DRS on the cluster and vROps installed, so that one rogue VM can't take down all the VMs on one server. Shared GPUs are OK, but pretty limited; we use them now in a 250-Win7-VM Horizon build. The cruddy thing is that Nvidia now wants to charge per-use licensing fees to use these cards. vROps gives you any and every piece of information you could ever imagine on the VMs in the cluster over time. Having vROps installed allows you to proactively watch for major issues in hardware. For example, I used vROps to prove that one of our massive SQL servers, which the SQL team built wrong, was pegging the VM so hard at times that it was sitting at 120 ms latency to storage and was also causing the other 70 VMs on its datastore to have 80+ ms. Very powerful stuff.
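The SQL example is the classic noisy-neighbour pattern: one VM's I/O raises storage latency for every VM on the same datastore. A monitoring suite like vROps collects far more than this, but the core check can be sketched in a few lines; the 20 ms threshold, the data layout and the VM names are hypothetical.

```python
from statistics import median
from typing import Dict


def contended_datastores(latency_ms: Dict[str, Dict[str, float]],
                         threshold_ms: float = 20.0) -> Dict[str, str]:
    """Flag datastores where the median VM storage latency exceeds threshold_ms.

    latency_ms maps datastore -> {VM name -> observed latency in ms}.
    For each flagged datastore, the VM with the highest latency is returned
    as a first suspect (confirm with per-VM IOPS/throughput before acting).
    """
    flagged: Dict[str, str] = {}
    for datastore, vms in latency_ms.items():
        if vms and median(vms.values()) > threshold_ms:
            flagged[datastore] = max(vms, key=vms.get)
    return flagged


samples = {
    "ds-sql": {"sql-01": 120.0, "app-01": 85.0, "app-02": 82.0},
    "ds-web": {"web-01": 4.0, "web-02": 6.0},
}
print(contended_datastores(samples))  # {'ds-sql': 'sql-01'}
```

Using the median rather than the maximum means a datastore is only flagged when most of its VMs are suffering, which is what distinguishes a shared-storage problem from one slow VM.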
     
  19. tphilipp

    tphilipp SotA Dev Moderator SOTA Developer

    Sounds powerful indeed, thanks for sharing, as I (as you would've guessed) am not following VMware's development regularly. I do hope, though, that one day we'll need 70 servers for SotA and constantly have to start and stop instances... if that's the case, then SotA will have proven to be majorly successful :)
     
  20. StarLord

    StarLord Avatar

    The network seems to be down again (Thursday 10PM CEST)
     
    Time Lord likes this.