Software simulator and unexpected results

k_c

Well-Known Member
#41
sagefr0g said:
ok icnt, i see the difference. :)

but what's the why of it?
or at least the how of it?
i mean what's the source and reason for these differences, errhh ehhmm, i mean the mechanics of what's going on. :confused::whip:

just i would imagine that the mechanics of that 156 card grab and riffle is unrealistic for the long run sort of thing.:rolleyes::whip: but it could happen.

probably what i'm asking is what's it all mean, or i'd just like to hear some rationale with respect to what you laid out there.

part of it is i guess you're showing maybe a random shuffle is unbiased while a non-random repeated same-way shuffle could be biased over the long run? the implication being the closer to random the better for sims?:confused::whip:
sagefr0g, do you mind if I join you as a non-expert in physical card shuffling? :)

I know that from viewing blackjack forums over a period of time, I have seen it shown in various discussions that the effect of a non-random shuffle, as might be done in a casino, is pretty much the same in the long run as that of a random shuffle. I found these 2 articles on the subject.
http://www.blackjackforumonline.com/content/manvscom.htm
http://www.blackjackforumonline.com/content/nonrandom_shuffle_blackjack_systems.htm

If true, then I would imagine that a non-random shuffle would have to be unpredictable in its implementation details, though, to turn out to be equivalent to a random shuffle in the long run.

On a separate but maybe related note, if the ideal random shuffle is defined as the perfect interleaving of alternate cards, evidently that is not random at all, because with enough such shuffles the deck will eventually be returned to its initial state.
http://forum.wolframscience.com/showthread.php?s=&threadid=537
 

sagefr0g

Well-Known Member
#42
k_c said:
sagefr0g, do you mind if I join you as a non-expert in physical card shuffling? :)
good company is always welcome. :)
I know that from viewing blackjack forums over a period of time, I have seen it shown in various discussions that the effect of a non-random shuffle, as might be done in a casino, is pretty much the same in the long run as that of a random shuffle. I found these 2 articles on the subject.
http://www.blackjackforumonline.com/content/manvscom.htm
http://www.blackjackforumonline.com/content/nonrandom_shuffle_blackjack_systems.htm

If true, then I would imagine that a non-random shuffle would have to be unpredictable in its implementation details, though, to turn out to be equivalent to a random shuffle in the long run.

On a separate but maybe related note, if the ideal random shuffle is defined as the perfect interleaving of alternate cards, evidently that is not random at all, because with enough such shuffles the deck will eventually be returned to its initial state.
http://forum.wolframscience.com/showthread.php?s=&threadid=537
a wealth of info in those links, more than i can digest in a short period of time. thank you.

just me maybe, but i can't seem to get a handle on what random really means. automonkey in the chat the other night said something about meta-physical stuff.:rolleyes::whip:

i know there are phenomena in nature that are thought of as really random, stuff such as how radioactivity behaves in certain instances is i believe one case.

i dunno, i think it has to do with uncertainty, stuff that's knowable and unknowable. so it's to me kind of weird cause it seems randomness would have a dual nature such as how matter can be viewed as particulate or as a wave, sort of thing. not sure what he meant but i think QFIT posted something about he looked into the Heisenberg uncertainty principle when he was messing with the issue of rngs and random stuff.

but whatever, the post where you explained the limitations of a rng and shuffle stuff was interesting even if it does make my head spin, lol.
 

London Colin

Well-Known Member
#43
k_c said:
You have me convinced. There are definitely 27 possibilities in your 3-card example, and 6 equally likely orders are what's needed. It's impossible for 27 equally likely possibilities to be representative of 6 equally likely orders.

When I get time I am going to switch my algorithm in the programs that do shuffling. Thank you for pointing this out. :)
You're most welcome. :)

k_c said:
My original point in this thread was to suggest the possibility of a shuffling bias to the original poster, due to his statement that he restarted his sim 10,000 times. Both my algorithm and the Knuth algorithm would be susceptible to this bias if only the rand() function and modulus division are used each time the shoe is reordered, for the reasons I stated.
At the risk of sounding like a broken record, your plan to replace the modulus operation with this ...
int j = int(double(rand())/(RAND_MAX+1)*totCards+1)
will not eliminate the bias; it will only redistribute it. The same pigeonhole argument applies to this as did to the shuffle algorithm. The only solution I am aware of is to discard any random number that falls out of range, and ask for another, like I do in the function I put in my first post.

That being said, the Wiki article says -
This form of bias can be avoided in a number of ways. Probably the easiest involves deliberately limiting the number of usable outputs of the random number generator to a multiple of the number of choices to be made prior to performing the modulo operator, and re-trying the random number generator until a value within that range is produced. Although this algorithm is, strictly speaking, not guaranteed to terminate, given a true random number generator, the probability of this algorithm not terminating in a reasonable time is so small that it can be neglected for all practical purposes.
So, apparently there are other solutions, but I don't know what they might be.

As with the shuffle, the pigeonhole argument can easily be demonstrated by shrinking the numbers to create a suitably small example. Suppose you want to generate integers in the range 0..3, and RAND_MAX is 5 (i.e. 6 possible return values from rand() : 0..5).

Code:
[B]rand()   rand()%4   (rand()/6 * 4)[/B]
0        0          0
1        1          0
2        2          1
3        3          2
4        0          2
5        1          3
So the bias just shifts from (0 and 1) to (0 and 2).
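A minimal sketch of that discard-and-retry approach, assuming only that rand() is uniform over its full 0..RAND_MAX range (randIndex is just a name for this illustration):

```cpp
#include <cassert>
#include <cstdlib>

// Unbiased random integer in [0, n): reject draws from the leftover
// top range so every residue class gets exactly the same number of
// rand() outcomes mapped onto it.
int randIndex(int n)
{
    // Largest multiple of n not exceeding RAND_MAX + 1.
    const long usable = (((long)RAND_MAX + 1L) / n) * n;
    long r;
    do {
        r = rand();
    } while (r >= usable);          // discard the biased tail and retry
    return (int)(r % n);
}
```

In the toy example above (6 possible rand() values, target range 0..3), usable would be 4, so draws of 4 and 5 are simply thrown away and each of 0..3 keeps probability exactly 1/4.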
 

London Colin

Well-Known Member
#44
fabietto said:
That's the function I'm using in my software. According to the documentation (http://stdcxx.apache.org/doc/stdlibref/random-shuffle.html), the elements in the array are shuffled according to a uniform distribution. Seems to me a good and easy way to obtain a proper shuffle. Might be interesting to see if calling the std::rand() function multiple times (i.e., at the beginning of each trial) makes any difference compared to just calling it once at the beginning of a simulation.
Did you mean to say 'std::srand()', rather than rand? (i.e. seeding the PRNG?)

I'm using random_shuffle() too, but I've still got the original code present, commented out, and may go back to it. As I recall, random_shuffle() proved faster. The worry has to be, though, that you are depending on a bug-free implementation on every system on which your program is built. I know you could say the same thing about any standard library function, or even the compiler itself, but given all the subtleties that my enquiries into this area have thrown up, I'm tempted to adopt the do-it-yourself approach.

Another issue is that I don't think there is any guarantee that random_shuffle() must use rand() internally. (Indeed, it might be better if it didn't.) So you can't be sure that a call of srand() will have any impact on its operation, meaning every run could be identical. I had a quick look at your code and saw that this is what you are doing, so there may be portability issues.

You can get around that by utilizing the optional third parameter to random_shuffle(). This also offers the possibility of specifying an entirely different, better PRNG.

I mentioned the Boost libraries earlier. I had a quick look, and there is indeed plenty of support there for random number generation, including an implementation of the 'Mersenne Twister', which I know is highly regarded. The support includes the ability to create a RandomNumberGenerator function object, which can be used as the third parameter to random_shuffle().
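As a sketch of that route: with C++11's &lt;random&gt; (Boost offers essentially the same classes for older compilers), a Mersenne Twister can drive std::shuffle, the modern successor to random_shuffle, and the modulo-bias issue disappears into the library. The shuffleShoe name and 0..415 labelling are just for this illustration:

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Build and shuffle an 8-deck shoe (416 cards, labelled 0..415) using
// the Mersenne Twister. std::shuffle asks the generator for unbiased
// indices internally, so there is no modulo bias to correct by hand.
std::vector<int> shuffleShoe(unsigned seed)
{
    std::vector<int> shoe(416);
    std::iota(shoe.begin(), shoe.end(), 0);   // 0, 1, ..., 415
    std::mt19937 gen(seed);
    std::shuffle(shoe.begin(), shoe.end(), gen);
    return shoe;
}
```

In real code the seed should of course come from somewhere better than a constant, e.g. std::random_device.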
 

k_c

Well-Known Member
#45
London Colin said:
You're most welcome. :)


At the risk of sounding like a broken record, your plan to replace the modulus operation with this ...

will not eliminate the bias; it will only redistribute it. The same pigeonhole argument applies to this as did to the shuffle algorithm. The only solution I am aware of is to discard any random number that falls out of range, and ask for another, like I do in the function I put in my first post.

That being said, the Wiki article says -

So, apparently there are other solutions, but I don't know what they might be.

As with the shuffle, the pigeonhole argument can easily be demonstrated by shrinking the numbers to create a suitably small example. Suppose you want to generate integers in the range 0..3, and RAND_MAX is 5 (i.e. 6 possible return values from rand() : 0..5).

Code:
[B]rand()   rand()%4   (rand()/6 * 4)[/B]
0        0          0
1        1          0
2        2          1
3        3          2
4        0          2
5        1          3
So the bias just shifts from (0 and 1) to (0 and 2).
Yes, you are correct. My first thought was to discard the excesses but I wanted to avoid the possibility of indefinite postponement so I rather haphazardly came up with what I posted.

I like the way that you analyze problems by starting with a simple model. Basically that is what I do. My main problem right now is that I don't have as much time as I would like.
 

k_c

Well-Known Member
#46
sagefr0g said:
good company is always welcome. :)

a wealth of info in those links, more than i can digest in a short period of time. thank you.

just me maybe, but i can't seem to get a handle on what random really means. automonkey in the chat the other night said something about meta-physical stuff.:rolleyes::whip:

i know there are phenomena in nature that are thought of as really random, stuff such as how radioactivity behaves in certain instances is i believe one case.

i dunno, i think it has to do with uncertainty, stuff that's knowable and unknowable. so it's to me kind of weird cause it seems randomness would have a dual nature such as how matter can be viewed as particulate or as a wave, sort of thing. not sure what he meant but i think QFIT posted something about he looked into the Heisenberg uncertainty principle when he was messing with the issue of rngs and random stuff.

but whatever, the post where you explained the limitations of a rng and shuffle stuff was interesting even if it does make my head spin, lol.
To me random means just that, random. Anything can happen.

Simulating random is a different issue. If one event is expected to occur 1 time in 10 and another event is expected to occur 9 times in 10, and you want to simulate the random occurrences of each event, then a valid sim should do 2 things:
1. Ensure that anything can happen
2. Ensure that Event 2 occurs 9 times as much as event 1 in the long run
Determining how often something is expected is where combinations and permutations might be used.

Addressing the above is what London Colin has been most helpful with.
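Those two requirements are easy to sanity-check in code. A small illustration of the 1-in-10 / 9-in-10 example above (using C++11's &lt;random&gt;; countRareEvent is just a name for this sketch):

```cpp
#include <random>

// Simulate 'trials' independent draws where event 1 has probability
// 1/10 and event 2 has probability 9/10, and return how often the
// rare event fired. Any count is possible on a given run, but in the
// long run event 2 should occur about 9 times as often as event 1.
long countRareEvent(long trials, unsigned seed)
{
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> d(1, 10);
    long rare = 0;
    for (long i = 0; i < trials; ++i)
        if (d(gen) == 1)   // event 1: exactly 1 face out of 10
            ++rare;
    return rare;
}
```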

Since I don't think there is such a thing as a perfect random number generator the next problem is to look at efficiency and in particular relative to blackjack. As a fellow non-expert I can only give my perceptions.

Some rngs are better than others, maybe even to a great degree. I think the general consensus of those seeking the best solutions is that the rngs that come with programming languages such as Visual Basic, C++, etc. are notoriously weak. But there are an awful lot of possible shuffles in blackjack, so even a very good rng might not be a great deal more effective than a poor one. Kinda like comparing
000000000000001/99999999999999999999999999999999999999999999999999999999999999999999
to
999999999999999/99999999999999999999999999999999999999999999999999999999999999999999

The second numerator is far larger than the first, but scientists would view these fractions as virtually identical to a large number of significant figures.
 

QFIT

Well-Known Member
#47
k_c said:
a very good rng might not be a great deal more effective than a poor one. Kinda like comparing
000000000000001/99999999999999999999999999999999999999999999999999999999999999999999
to
999999999999999/99999999999999999999999999999999999999999999999999999999999999999999

The second numerator is far larger than the first, but scientists would view these fractions as virtually identical to a large number of significant figures.
The difference between a good RNG and a bad RNG is enormous. Compilers nearly always use LCGs (Linear Congruential Generators): very fast, very easy, and worthless for Blackjack. The RNG in Excel (unless they have changed it) was known to degrade after 30,000 numbers or so. The period is the number of results before the RNG repeats. (An algorithmic RNG must repeat after some period.) The MarZam II RNG has a period so long that if you had started a PC generating numbers as fast as it could at the beginning of the Universe, it would not yet have repeated.
 

iCountNTrack

Well-Known Member
#48
QFIT said:
The difference between a good RNG and a bad RNG is enormous. Compilers nearly always use LCGs (Linear Congruential Generators): very fast, very easy, and worthless for Blackjack. The RNG in Excel (unless they have changed it) was known to degrade after 30,000 numbers or so. The period is the number of results before the RNG repeats. (An algorithmic RNG must repeat after some period.) The MarZam II RNG has a period so long that if you had started a PC generating numbers as fast as it could at the beginning of the Universe, it would not yet have repeated.
Actually, not all LCGs are bad. I use an LCG by L’Ecuyer with an added Bays-Durham shuffle to generate random shuffles. The algorithm is described in Numerical Recipes (2nd edition, p. 994) as the function ran2(). It provides nearly 2^32 distinct values, and the sequence of random numbers has a period in excess of 10^18.
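The Bays-Durham idea itself is easy to sketch: keep a small table of pending outputs and let the previous output choose which slot to emit next, which breaks up short-range serial correlations. The toy class below wraps a plain Park-Miller "minimal standard" LCG purely for illustration; it is not ran2(), which applies the same shuffle to two combined L’Ecuyer generators:

```cpp
// Toy Bays-Durham shuffle around a minimal-standard LCG
// (x' = 48271 * x mod 2^31-1). Illustrative only.
class ShuffledLcg {
    static const int TABLE = 32;
    long long state;
    long long v[TABLE];   // table of pending outputs
    long long y;          // most recently emitted value
    long long rawLcg() { state = 48271LL * state % 2147483647LL; return state; }
public:
    explicit ShuffledLcg(long long seed) : state(seed > 0 ? seed : 1)
    {
        for (int i = 0; i < TABLE; ++i)   // warm up and fill the table
            v[i] = rawLcg();
        y = v[0];
    }
    long long next()
    {
        int j = (int)(y % TABLE);  // previous output picks the slot
        y = v[j];                  // emit what was waiting there
        v[j] = rawLcg();           // refill the slot for later
        return y;
    }
};
```

The emitted order thus depends on the generator's own history, not just its raw sequence.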
 

QFIT

Well-Known Member
#49
iCountNTrack said:
Actually, not all LCGs are bad. I use an LCG by L’Ecuyer with an added Bays-Durham shuffle to generate random shuffles. The algorithm is described in Numerical Recipes (2nd edition, p. 994) as the function ran2(). It provides nearly 2^32 distinct values, and the sequence of random numbers has a period in excess of 10^18.
According to Marsaglia, LCGs generate numbers that fall in distinct planes in a 3D scatter chart, which could affect results in an unpredictable manner. There is also a problem of degradation, or of non-randomness in some segment of the bits. But if you always apply the shuffle to the previously shuffled shoe, this helps.
 

QFIT

Well-Known Member
#51
iCountNTrack said:
FYI
"A 623-dimensionally equidistributed uniform pseudorandom number generator"
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/ARTICLES/mt.pdf
It is claimed to have superior distribution properties and a period of 2^19937-1! (exclamation mark, not factorial :))
Yes, the Mersenne Twister is not an LCG. Marsaglia doesn't like it, but it passes his tests. I've been accused of using this RNG, although I never have.
 

k_c

Well-Known Member
#52
QFIT said:
The difference between a good RNG and a bad RNG is enormous. Compilers nearly always use LCGs (Linear Congruential Generators): very fast, very easy, and worthless for Blackjack. The RNG in Excel (unless they have changed it) was known to degrade after 30,000 numbers or so. The period is the number of results before the RNG repeats. (An algorithmic RNG must repeat after some period.) The MarZam II RNG has a period so long that if you had started a PC generating numbers as fast as it could at the beginning of the Universe, it would not yet have repeated.
Here's what I don't understand. I think the difference between a good rng and a poor one is the period, which I think is the number of trials before it begins to repeat, right? If I am wrong about this then this question is meaningless.

I researched a little and found the period for the C++ rng is 2^32 - 1 and the period for a good rng to be around 2^45 to 2^90.

Anyway, let's say a poor rng has a low period that is adequate to a small extent. If you use an rng such as this to continuously shuffle, at some point it begins to repeat the same shuffle, right? However, if the shoe is never reset to an ordered state, then when it begins to repeat it's almost certain that it will be shuffling a totally different shoe order. Why wouldn't this be random enough?
 

QFIT

Well-Known Member
#53
k_c said:
Here's what I don't understand. I think the difference between a good rng and a poor one is the period, which I think is the number of trials before it begins to repeat, right? If I am wrong about this then this question is meaningless.

I researched a little and found the period for the C++ rng is 2^32 - 1 and the period for a good rng to be around 2^45 to 2^90.

Anyway, let's say a poor rng has a low period that is adequate to a small extent. If you use an rng such as this to continuously shuffle, at some point it begins to repeat the same shuffle, right? However, if the shoe is never reset to an ordered state, then when it begins to repeat it's almost certain that it will be shuffling a totally different shoe order. Why wouldn't this be random enough?
Period is only one aspect. Distribution, patterns, whether high or low bits are as random as all bits. Some RNGs start out ok, don't repeat, but distribution starts to deteriorate.
 

London Colin

Well-Known Member
#54
QFIT said:
Period is only one aspect.
Just focusing on that aspect, I have similar problems to k_c, trying to get my head around all the implications of a short period.

Wikipedia said:
Even if only pseudo-random shuffles are required, there is another potential pitfall: the number of pseudo-random shuffles that can be generated cannot exceed the number of states that the pseudo-random number generator can take. For many older systems, this can be as small as 2^32, a number which is 58 orders of magnitude smaller than 52!, and thus 52-card shuffles generated using this system cannot possibly be unbiased. Indeed, since they can be completely enumerated, it is often possible to tell from the first few cards in such a pseudo-randomly shuffled pack what the remainder of the cards will be. Thus, in addition to all of the implementation problems above, to have any chance at all of generating a plausible pseudo-random shuffle, the pseudo-random generator must have in excess of log2 52! = 225.58 bits of internal state, preferably by a considerable margin.
And that is just talking about shuffling a single deck; for 8 decks we are dealing with 416! permutations.
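Both figures are easy to check with the standard log-gamma function from &lt;cmath&gt;, since log2(n!) = ln Γ(n+1) / ln 2 (a quick sketch; bitsForDeck is just a name for this illustration):

```cpp
#include <cmath>

// Minimum bits of RNG state needed merely to be able to reach every
// ordering of an n-card pack: log2(n!) = lgamma(n+1) / ln(2).
double bitsForDeck(int n)
{
    return std::lgamma(n + 1.0) / std::log(2.0);
}
```

This gives about 225.58 bits for a single deck and roughly 3025 bits for a 416-card shoe, so a 32-bit state falls short by an even more spectacular margin in the 8-deck case.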

So, to summarise my (lack of) understanding -

I can see that if we continually shuffle the same initial deck, we can only ever produce a subset of all the possible shuffled decks. But need that subset be biased, as stated above? (That is, can we infer anything about whether the 2^32 available decks will be as representative of an average deck, for the purposes of blackjack, as any truly randomly chosen 2^32 decks would be?)

If each shuffle is performed on the previous shuffled deck, the size of the reachable subset presumably goes up. But what impact does that have?


QFIT said:
Distribution, patterns, whether high or low bits are as random as all bits. Some RNGs start out ok, don't repeat, but distribution starts to deteriorate.
Long ago, I wrote a text-based version of the code-guessing game 'Mastermind', for my own amusement, using the numbers 1-6 to substitute for the coloured pegs of the original. I generated the numbers in the 'obvious' way: 1 + rand()%6.

I had to play the game an embarrassingly large number of times before I noticed something eerily familiar about the codes I was solving. Evidently the (DEC VAX/VMS C compiler) implementation of rand() suffered greatly from the problem of lack of randomness in the low bits, and I was generating a really short, repeating sequence of numbers, let's say 612564.

All my 4-digit codes were just windows into that repeating sequence - 6125, 1256, .. , 6461, etc. Not very taxing for a codebreaker!
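That kind of failure is characteristic of LCGs with a power-of-two modulus: when the multiplier and increment are both odd, the bottom bit of the state simply alternates on every call. A toy demonstration with the textbook constants 1103515245 and 12345 (I don't know the exact constants the VAX compiler used, but the mechanism is the same):

```cpp
// One step of an LCG with modulus 2^32 (unsigned wrap-around).
// Modulo 2, the update x' = a*x + c with a and c both odd reduces to
// x' = x + 1, so the low bit flips on every call -- and code in the
// style of 1 + rand()%6 leaks that period-2 pattern into the output.
unsigned lcgStep(unsigned &state)
{
    state = state * 1103515245u + 12345u;
    return state;
}
```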
 

k_c

Well-Known Member
#57
London Colin said:
You're most welcome. :)


At the risk of sounding like a broken record, your plan to replace the modulus operation with this ...

will not eliminate the bias; it will only redistribute it. The same pigeonhole argument applies to this as did to the shuffle algorithm. The only solution I am aware of is to discard any random number that falls out of range, and ask for another, like I do in the function I put in my first post.

That being said, the Wiki article says -

So, apparently there are other solutions, but I don't know what they might be.

As with the shuffle, the pigeonhole argument can easily be demonstrated by shrinking the numbers to create a suitably small example. Suppose you want to generate integers in the range 0..3, and RAND_MAX is 5 (i.e. 6 possible return values from rand() : 0..5).

Code:
[B]rand()   rand()%4   (rand()/6 * 4)[/B]
0        0          0
1        1          0
2        2          1
3        3          2
4        0          2
5        1          3
So the bias just shifts from (0 and 1) to (0 and 2).
I think I solved the problem without the possibility of indefinite postponement. In order to eliminate the bias, what I did was to generate a new random number each time the original random number fell outside of the accepted range. Then I added this new number to the system time and took the modulus of that to put it into the acceptable range.

This seems to give good proportionate results for numbers through 415 which is what would be needed to shuffle 8 decks.

I am attaching a test program. It generates numbers in the range 0 to an upper limit that is input, with the total of random numbers to be generated also input. It then displays how many of each number were randomly generated.

I think using this and the Knuth shuffle will work well. Thanks again for your comments. If you would like to see the source code, send a PM.
 


k_c

Well-Known Member
#59
QFIT said:
Rule #1 of random number generation. NEVER use the system time.
Rule #1 for learning about random number generation. USE the system time. It's the most prevalent method for beginners to intermediates out there. Do not put the cart before the horse. :)
 