Understanding Probability and Sample Sizes

Discussion in 'General Discussion' started by Poor game design, Nov 5, 2016.

Thread Status:
Not open for further replies.
  1. MrBlight


    Messages:
    2,388
    Likes Received:
    4,452
    Trophy Points:
    153
    I skipped most of the posts in here.
    So far I'm not seeing an issue with the RNG like some are, but I did see someone mentioning gear breaking.
    I also agree: make gear unrepairable, or make the max durability unrepairable.
    Regardless of the RNG, the chance could be 5% instead of 25%. Once you have said armor or weapon, you never need to craft one again.
    Currently everything is a one-time purchase. I need 1 bow. I need 1 armor set. And it lasts forever, or until I upgrade.

    This means zero player-to-player economy. Already seeing it.
     
  2. Vagabond Sam


    Messages:
    294
    Likes Received:
    816
    Trophy Points:
    40
    Gender:
    Male
    Location:
    Brisbane
    Let's not confuse the issue.

    People have criticized the system, saying the RNG is broken, and there are good arguments why their data is problematic.

    However, it is unreasonable to expect players to gather and craft enough to meet the confidence standards required to know the real rate within a meaningful margin of error.

    More importantly though, what this feedback tells us is the system is not "fun".

    Interestingly enough, it is also not immersive, as it requires suspension of disbelief to imagine my skilled carpenter cannot control the outcome of the craft with regard to the stat distribution.

    But threads like this are just confusing the underlying issue by debating the interpretation of symptoms.

    Not seeing the forest for the trees.
     
    LoneStranger likes this.
  3. Burzmali


    Messages:
    1,290
    Likes Received:
    1,771
    Trophy Points:
    113
    To counter OP's initial point: 400 samples isn't enough to accurately determine the actual probability to within an arbitrary margin of error, but it is far more than sufficient to evaluate the validity of a proposed probability. I spent years running a statistical quality control program for products that ignite your vehicle if they fail, and the industry standard was around 5 units if you could measure it and 30 if it was go/no-go. Basically, if my spec says that 1 in 100 should fail a test and I have 2 in 30 fail in a sample, that'd be enough to shut down a line that cost thousands an hour to run, in order to verify that the process was still working as expected. In binomial distributions like crafting, calculating the odds that the advertised number is actually governing the roll is pretty trivial. Getting heads four times in a row is a 1 in 16 chance, which is not so uncommon as to create concern. But if a coin comes up heads 9 times in a row (1 in 512), that's roughly the criterion to trigger suspicion in modern industrial processes, i.e. you'd stop assuming the coin is fair until proven otherwise.

    To be clear, I don't have any specific complaint about the fairness of SotA's dice (though I have a sneaking suspicion it forgets to roll them sometimes when the server is busy), but the assertion that you need some massive sample size to evaluate whether or not a system is being held to a published probability is nonsense.
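The figures quoted in this post (2 failures in a sample of 30 against a 1-in-100 spec, and runs of heads on a fair coin) can be checked directly. This is only a minimal sketch, assuming independent trials; the function names are illustrative and do not come from the thread or from any SotA tooling.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), assuming independent trials."""
    # Complement of the probability of seeing fewer than k successes.
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


def prob_streak(flips: int, p_heads: float = 0.5) -> float:
    """Probability that `flips` consecutive flips all come up heads."""
    return p_heads**flips


# Spec says 1 in 100 should fail; chance of 2 or more failures in 30 units.
p_two_in_thirty = prob_at_least(2, 30, 0.01)

# Fair coin: 4 heads in a row (1 in 16) and 9 heads in a row (1 in 512).
p_four_heads = prob_streak(4)
p_nine_heads = prob_streak(9)

print(f"P(>=2 failures in 30 at 1%) = {p_two_in_thirty:.4f}")
print(f"4 heads in a row: 1 in {1 / p_four_heads:.0f}")
print(f"9 heads in a row: 1 in {1 / p_nine_heads:.0f}")
```

Under these assumptions the 2-in-30 result has a probability of roughly 3.6% if the 1% spec really holds, which is why it is treated as a trigger for investigation rather than as proof of a fault.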
     
    Spoon and LoneStranger like this.
  4. Drocis the Devious


    Messages:
    18,188
    Likes Received:
    35,440
    Trophy Points:
    153
    Gender:
    Male
    If you flipped a coin and it came up heads 16 times in a row I wouldn't "shut down the line". Neither would anyone else flipping a coin.

    Because we're talking about an MMO crafting system and not drivers that might catch on fire, I'm comfortable not freaking out about a sample size of 400. I don't doubt that you have more experience and more knowledge about statistics than I do. But the concept is still the same. You can't conclude, based on some guy's "random sample size of 400", that the RNG is broken. It may be, but you can't come to that conclusion.

    When you use statistics you're applying a methodology, at your job, or even here. There's no hard and fast rule that says you need 100 vs. 400 vs. 5 million. I believe the term that is often used is that something is or is not "statistically relevant". Which doesn't mean that it's 100% accurate, just that the methodology is considered to be meaningful vs. meaningless.

    Do you know why most companies use 100 as a sample size? Because they can get away with it. (my opinion) Because to do more would cost more, take more time, and productivity is key (keep that line moving!). Since we don't have a line to keep moving, I think we have the time and it's not going to cost too much for the devs to "check it".

    The 400 sample size may or may not fall within that threshold; however, it doesn't actually prove anything. It proves far less than a larger sample size would, and it's debatable whether the sample was truly random. What's really important here is that people "don't like it". That's fair feedback. I'm not one of those people; I happen to like it just fine, although I think it should be more draconian, with percentages that are more punishing (for the good of the economy). All I'm saying in the OP is that I don't think a lot of the people doing the complaining have any idea what they're talking about when it comes to "the math" and the "probability".
     
    Last edited: Nov 6, 2016
    jammaplaya likes this.
  5. UnseenDragon


    Messages:
    404
    Likes Received:
    1,097
    Trophy Points:
    55
    Gender:
    Male
    Location:
    Columbia, MD

    Yes, one would use a smaller sample to trigger an investigation, especially in a model such as this where there is a known 'correct' prior. And companies, if they are worth their salt, don't just pick 100 as their sample size. They may report X out of 100, but there is a simple formula to calculate the sample size needed to test for a given confidence interval in a given population size. So, to the point you made: yes, people finding numbers way out of expectation is worthy of investigation. Does it mean that it couldn't happen by chance? Of course not. But, especially when it is an untested product in progress, I would absolutely use that as a sign of something to look at. And 400, if controlled properly, would be a reasonable number, and doubling that wouldn't tighten the confidence interval that much.
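The post does not say which "simple formula" is meant. One common choice is the normal-approximation sample size for a proportion, with a finite population correction; the sketch below assumes that formula, a 95% confidence level, and the worst case p = 0.5. All names and the example population of 10,000 are illustrative assumptions.

```python
from math import ceil, sqrt
from typing import Optional

Z_95 = 1.96  # two-sided 95% critical value of the standard normal


def margin_of_error(n: int, p: float = 0.5, z: float = Z_95) -> float:
    """Half-width of the normal-approximation interval for a proportion."""
    return z * sqrt(p * (1 - p) / n)


def required_sample_size(margin: float, population: Optional[int] = None,
                         p: float = 0.5, z: float = Z_95) -> int:
    """Sample size for a target margin, optionally corrected for a finite population."""
    n0 = z**2 * p * (1 - p) / margin**2
    if population is not None:
        # Finite population correction (assumed form of the formula).
        n0 = n0 / (1 + (n0 - 1) / population)
    return ceil(n0)


print(f"n=400: +/- {margin_of_error(400):.3f}")
print(f"n=800: +/- {margin_of_error(800):.3f}")
print("n for +/-5%:", required_sample_size(0.05))
print("n for +/-5%, population 10000:", required_sample_size(0.05, population=10_000))
```

With these assumptions, going from 400 to 800 samples narrows the margin from about 4.9 to about 3.5 percentage points, since the margin shrinks only with the square root of the sample size.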
     
    LoneStranger likes this.
  6. Turk Key


    Messages:
    2,561
    Likes Received:
    4,012
    Trophy Points:
    153
    Gender:
    Male
    I understand your point as to reducing the odds of success. But can you imagine if, a year from now, only 20 people out of the whole population had succeeded in producing the Best Gear? We would have a riot on our hands, with every possible explanation about how unfair it is. People work, have kids, have to study, and on and on. Where is the sweet spot? Telling everyone that they don't need what the 20 have just will not cut the mustard IMO.
     
  7. Drocis the Devious


    Messages:
    18,188
    Likes Received:
    35,440
    Trophy Points:
    153
    Gender:
    Male
    That's a valid concern. But so is the idea that everyone should have the best gear.

    It's very similar to limited housing. If everyone has housing, is housing still special? I would argue it's not.

    But I also argue that the current system is FAR from excluding the average person from getting the best gear. What I actually read people complaining about is that they often have gear destroyed when they have a 44% or above chance of success, and that makes them sad. They do not want failure; they want a high degree of success. Nowhere do I read players with this attitude thinking about the long or short term effects of "everyone being successful" on the economy.

    I want everyone to be happy and have fun, but I don't believe everyone having an easy time of it is the path that will lead us there. Ultimately no one is going to have fun if everyone in the game has the same +7 Longsword that they made themselves.
     
  8. Vagabond Sam


    Messages:
    294
    Likes Received:
    816
    Trophy Points:
    40
    Gender:
    Male
    Location:
    Brisbane
    I don't understand how an item lottery creating system increases difficulty.

    I also don't think exclusivity that's tied to success via large number averages is particularly innovative or fair.

    Particularly in a game marketed as single player as well as multiplayer.
     
  9. Drocis the Devious


    Messages:
    18,188
    Likes Received:
    35,440
    Trophy Points:
    153
    Gender:
    Male
    You're against loot tables then?

    The same logic applies to those.
     
  10. KnownInGameAsGeorge


    Messages:
    190
    Likes Received:
    369
    Trophy Points:
    18
    That would be ridiculous, this is Novia and we are all adults here.
    Draw your weapon and buff defense like a grownup or you are not going to last long once I start swinging my mace.
     
  11. Vagabond Sam


    Messages:
    294
    Likes Received:
    816
    Trophy Points:
    40
    Gender:
    Male
    Location:
    Brisbane
    Most loot tables in games consist of a controlled range of drops in the range of dozens.

    I can choose what tables to farm, and I can predict with reliability what the expected outcome is.

    Better items or higher rarity items are controlled through encounter difficulty.

    Assume the 'legendary' equivalent item in Shroud of the Avatar is an exceptional item with 3 enchants and 3 masterworks.

    Your loot table, inclusive of failures due to breakage and sub-par rolls, is huge, uncontrollable and unpredictable, with no mechanism to prefer the stats you desire.

    I'd like to make an item with the Light enchantment, but I have no appreciation of the odds of getting that roll, and at current material costs I just can't justify that gamble.

    Based on that, I'd argue your equivalence of the two systems is not very fair.

    To be clear, I don't want loot tables like this in SOTA.
     
    Kirran likes this.
  12. meadmoon


    Messages:
    218
    Likes Received:
    456
    Trophy Points:
    18
    *rolls eyes*

    Latex or boffer?
     
  13. Burzmali


    Messages:
    1,290
    Likes Received:
    1,771
    Trophy Points:
    113
    Of course they would. The odds against a fair coin turning up heads 16 times in a row are 65535 to 1. Assuming you were checking that coin 4 times a day, it'd be decades between false positives. Honestly, if you are going to pontificate about prob and stats, turn off YouTube and take a college course or 3.
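The arithmetic behind the 16-heads figure can be reproduced in a few lines. This sketch assumes that each "check" is an independent run of 16 flips, which is one reading of the post rather than something it states outright; the variable names are illustrative.

```python
FLIPS_IN_STREAK = 16
CHECKS_PER_DAY = 4  # figure taken from the post

# A fair coin has 2**16 equally likely sequences of 16 flips; one is all heads.
outcomes = 2**FLIPS_IN_STREAK
odds_against = outcomes - 1

# Assumption: every check is an independent 16-flip run.
expected_days_between = outcomes / CHECKS_PER_DAY
expected_years_between = expected_days_between / 365.25

print(f"1 in {outcomes} (odds {odds_against} to 1 against)")
print(f"about {expected_years_between:.0f} years between false positives")
```

On that reading the expected gap between false alarms is around 45 years, which matches the "decades" in the post.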
     
    Aldo likes this.
  14. Drocis the Devious


    Messages:
    18,188
    Likes Received:
    35,440
    Trophy Points:
    153
    Gender:
    Male
    That's what we would call a personal attack.

    Also, I have taken several college courses, enough to know that I'm no statistician. To call Khan Academy "youtube" is kind of funny though. It's not quite a cat video or some guy in his basement making stuff up.

    Here, please listen to this episode of Radio Lab at the 8 minute 25 mark.
    http://www.radiolab.org/story/91684-stochasticity/

    If you still want to argue with me after hearing from the Professor of Statistics at the University of California, Berkeley, and the Professor of Finance and Law at Arizona State, I think we'll just have to agree to disagree.


    @Chris @DarkStarr @Berek Please feel free to use this Radio Lab example if you ever make a post about the RNG, I think it would be a great help to most people.
     
    Last edited: Nov 7, 2016
  15. Burzmali


    Messages:
    1,290
    Likes Received:
    1,771
    Trophy Points:
    113
    Well, I guess we can agree on something.

    As far as crafting goes, a sample size of 400 looks to be good for detecting a discrepancy of around 3-5 percentage points in a published rate. That is, in a system with a published rate of 10%, a sample of 400 would be enough to reliably detect whether the actual rate was likely outside the range of 6-14%, with more accuracy than is required by your typical medical journal.
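The post does not show how the 6-14% range was derived. One way to arrive at numbers of that size is a normal approximation to the binomial around the published rate; the sketch below assumes that method, and the function name and the two z values (95% and 99% two-sided) are choices made here for illustration.

```python
from math import sqrt
from typing import Tuple


def acceptance_range(p0: float, n: int, z: float) -> Tuple[float, float]:
    """Range of observed rates consistent with a published rate p0 at critical value z."""
    # Standard error of the observed proportion if p0 is the true rate.
    se = sqrt(p0 * (1 - p0) / n)
    return p0 - z * se, p0 + z * se


published_rate = 0.10
sample_size = 400

for label, z in (("95%", 1.96), ("99%", 2.576)):
    low, high = acceptance_range(published_rate, sample_size, z)
    print(f"{label}: {low:.3f} to {high:.3f}")
```

Under this approximation the 95% range is roughly 7.1% to 12.9% and the 99% range roughly 6.1% to 13.9%, so an observed rate outside 6-14% would be surprising even at the stricter level, provided the sample was collected in a controlled way.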
     
  16. Zapatos80


    Messages:
    538
    Likes Received:
    753
    Trophy Points:
    63

    That depends on your definition of good gear. One thing is for sure though: making great gear is exponentially difficult. For example, I'm trying to get a 3 ench 3 MW piece for my chest.

    I'm guessing, on average, it will take about 30-35 pieces to get the stats I want on top of the 3/3 (I'm around 75 masterwork, 88 ench).

    Let's say 30 pieces, so 300 bolts of fustian cloth.

    1) 300 fustian bolts = 600 fustian spools = 2400 cotton spools = 4800 raw cotton + 600 suet. At decent market prices (about 130g/suet and 5g/cotton atm), that comes out to 102k gold.
    But let's not forget those sneaky fuel costs.

    2) Processing 4800 cotton is 1200 times the operation at 2 wax per, so 2400 wax. Processing the resulting 2400 spools of cotton into 600 fustian spools is 600 times the operation at 4 wax per, so another 2400 wax. Finally, the fustian bolts is 300 times the operation at 1 wax per, so 300 wax, for a total of 5100 wax at 4 per, so 20400g in fuel.

    3) Finally, refining all those materials is about 4-5 hours sitting at the loom, depending on how fast you can process your batches of 20.
    So 30 fustian chests comes out to 122,400g and 5 hours work.

    4) NOW do all this math for gold & silver ingots (I'm not going to get into probabilities here, since calculating the exact number of ingots you'll use per attempt on average would take way too long). But let's ballpark and say 20 ingots used per chest, overall, to get the 3/3 I want, so 600 ingots total.

    600 ingots = 2400 ore + 600 coal, and a bit less than an hour at the forge.
    Right now gold is 3-3.5k/100 and silver about 5k/100. Let's average at roughly 4k/100 with coal at 6g per.
    600 ingots = 96,000g of ore + 3600g of coal = 99,600g + 1 hour at the forge.

    5) Finally, the fuel cost in mandrake root. 600 ingots = 60 attempts = 600 mandrake root at 4 per, so 2400g in mandrake root. And about 30 minutes work processing and picking enchants/MW.

    6) Let's tally :

    Gold required = 122,400g for the 30 chests, 99,600g for the ingots and 2400g in mandrake root = 224,400g
    Time required = 5 hours + 1 hour + 30 minutes = 6h30mins


    FOR A GRAND TOTAL OF!

    SO, for a single "great" chest, ON AVERAGE, 224,400g and 6 hours and 30 minutes spent working on it. Seems fair to me.
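The tally above can be re-run as a short script. The prices and quantities are the ones quoted in this post; the per-step conversion ratios are inferred from the post's own equalities (for example, 300 bolts = 600 fustian spools) and are not confirmed game data. Variable names are illustrative.

```python
# Quantities from the post: 30 chests, 300 bolts, 20 ingots per chest.
chests = 30
bolts = 300
fustian_spools = bolts * 2          # 600
cotton_spools = fustian_spools * 4  # 2400
raw_cotton = cotton_spools * 2      # 4800
suet = fustian_spools               # 600

# Cloth materials at 5g per cotton and 130g per suet.
cloth_materials = raw_cotton * 5 + suet * 130

# Wax: 1200 ops at 2, 600 ops at 4, 300 ops at 1, priced at 4g each.
wax = 1200 * 2 + 600 * 4 + 300 * 1
cloth_fuel = wax * 4

# Ingots: 4 ore (about 40g each) and 1 coal (6g) per ingot.
ingots = chests * 20
ingot_cost = ingots * 4 * 40 + ingots * 6

# Mandrake root: 60 attempts, 600 root at 4g each.
mandrake_cost = 600 * 4

total_gold = cloth_materials + cloth_fuel + ingot_cost + mandrake_cost
total_hours = 5 + 1 + 0.5

print(f"cloth: {cloth_materials + cloth_fuel:,}g")
print(f"ingots: {ingot_cost:,}g")
print(f"total: {total_gold:,}g in {total_hours} hours")
```

The totals agree with the figures in the post: 122,400g for the chests, 99,600g for the ingots, and 224,400g overall.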

    Keep in mind those numbers could probably be 20-25% lower if I was GM Ench GM MW (but let's not even get into how much ore, gold, XP and time is needed to get there).


    Finally, if you go for a 4th enchant OR Masterwork, those numbers are multiplied, easily, by a factor of 8. So 1.2M+ for a 4/3 or 3/4 piece of gear as a GM/GM.


    So yea, "good" gear is easy, "great" gear is VERY hard and expensive, and "amazing" gear is just insanity. Which IMO, is fair!
     
  17. Drocis the Devious


    Messages:
    18,188
    Likes Received:
    35,440
    Trophy Points:
    153
    Gender:
    Male
    You're not really showing your work here, you're just saying it's true because you said it was true. Also, you're using a methodology that includes the hard core bar of a "typical medical journal". Are we talking medical trials where all you have to do is prove something is as effective as a placebo? That wouldn't be a very high bar, nor would it help us make a video game any better.

    The OP is about two things really. One, that the average person on these forums doesn't understand probability. Two, that a sample size of 400 doesn't explain if the RNG is broken. You're attacking number two (without showing your work) and using criteria that may work well for you at your job but may not (I don't know) apply here at all. Again, we don't have an assembly line to "stop".

    You're also assuming the data is controlled, which seems to be a big assumption considering your inclination to discount anything on the internet as uninformed. :)
     
  18. Drocis the Devious


    Messages:
    18,188
    Likes Received:
    35,440
    Trophy Points:
    153
    Gender:
    Male
    Yes, it is. That's by design.

    Well in my opinion it's great! I don't know what fair has to do with it at all. We all have the same opportunity to make great gear (over time).
     
  19. Zapatos80


    Messages:
    538
    Likes Received:
    753
    Trophy Points:
    63
    Yea, I'm saying that if my calculations are correct, then the game seems pretty balanced. ESPECIALLY when you consider that there's really not a big difference between a 3/2, 3/3 or 4/3 piece of gear, effectively "soft capping" gear, but always giving the true, dedicated crafter a chance to go for a true masterpiece. I like it!

    Also I agree that the average person has no idea how statistics work. As an ex "semi-pro" part time poker player, I made a lot of money off that counter-intuitive math :p
     
  20. Zapatos80


    Messages:
    538
    Likes Received:
    753
    Trophy Points:
    63
    While I do agree here about the +7 Longsword, you have to remember it's still quite a bit of work to do yourself. Raising the smith skill high enough to craft a longsword, mining the ingots for the sword, mining the silver and gold for enchants/MW, the fuel costs, and obviously a good bit of time invested.

    Yes, it can be done "relatively" easily, but if your goal is only to get decent/good gear like that +7 longsword, you can buy those items by farming monsters for 20 minutes, MUCH faster than going at it yourself. Plus, a lot of people aren't interested in crafting at all. Finally, I think the real "player market" for crafters will be the high-end stuff, with a middling market for average goods sold under mats price for training's sake. Hope that makes sense :)
     