Faster/clearer computation of dealer probabilities?

#1
assume_R said:
Eric,

I see you computed the dealer probabilities non-recursively. I wonder if that would be faster, as I compute them recursively. I pm'ed you a link to my recursive function.
This seemed like it deserved a new thread. Please let me know if different practice is recommended/required.

I started to respond to assume_R's comment above in our pm, but as others have pointed out, (1) the computation of dealer probabilities is where about 99% of the time is spent in my CA, and (2) it is one of the most complex and difficult to read sections of the code. So I thought I would try to describe here in more detail what I was up to, so perhaps others might have suggestions for how I might make it faster?

First, a big benefit of a recursive computation is that it is extremely easy to read, since it corresponds almost exactly with a prose description of what a dealer actually does. I started out with this approach; following is the corresponding snippet from my very first working draft of a finite-deck CA, from a long time ago :) (no snickering please, I was just a kid):

Code:
// Compute_Recursive() is called by Compute_Probabilities() and itself recursively to count the
// possible dealer outcomes and respective probabilities.  The argument chance_hand is the
// probability of the current state of the dealer hand.

void Dealer_Hand::Compute_Recursive(double chance_hand) {
    int card;
    double chance_card;

    if (count < 17 || (HIT_SOFT_17 && count == 17 && soft)) {   // all hands 16 or less and maybe
                                                                // soft 17's get another card
        for (card = 1; card <= 10; card++)                      // for each possible hit card
            if (shoe.cards[card - 1]) {                         // that's still in the shoe
                chance_card = shoe.Chance(card);
                Deal(card);                                     // deal it
                Compute_Recursive(chance_hand*chance_card);     // and do it again
                Undeal(card);
            }
    }
    else if (count > 21)                                        // keep track of dealer busts
        chance_bust += chance_hand;
    else
        chance_count[count] += chance_hand;                     // and stand on 17,18,19,20,21
}
Simple, and very similar to your implementation. But mine was slow. The problem with the recursive approach is that we end up "visiting" a particular subset of cards in the dealer hand many times, corresponding to the many different orders in which those cards may be drawn. But the key observation is that every one of those different orderings of the same subset of cards has the same probability. So, my approach is to pre-compute, just one time, an enumeration of all possible subsets of cards, keeping track of the number of orderings for each. (This is done in the usual recursive way; see lines 316-357 of blackjack.cpp.)

When we want to compute the dealer probabilities in any particular situation, we walk through the pre-computed list, discarding any subsets that require more cards than are available in the current shoe (lines 225-233). For each subset, we compute the probability of that subset (lines 240-247), multiplied by the number of orderings of that subset-- that is, the number of times we *would* have visited the subset with a normal recursive approach (lines 252-257).

There are more details, but this is a starting point for motivating why I did things this way. The resulting current version of the algorithm ends up being approximately 7 times faster than the recursive implementation. But I haven't revisited this in a long time, and other approaches, ideas, etc., are appreciated!
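
If it helps to see the shape of this in code, here is a rough sketch of the per-shoe walk (illustrative only-- the names are simplified and some bookkeeping details, such as handling of the dealer up card and dealer blackjacks, are glossed over; this is not the actual code in blackjack.cpp):

Code:
#include <vector>

// Hedged sketch of the non-recursive walk-- not the actual blackjack.cpp code.
// Each pre-computed Subset records an unordered collection of cards, its
// standing total, and the number of valid dealer orderings of those cards.
struct Subset {
    int cards[10];      // cards[c] = number of cards of value c + 1 in the subset
    int total;          // dealer standing total, 17..21
    int multiplier;     // number of valid orderings (dealer-legal permutations)
};

struct Shoe {
    int cards[10];      // cards[c] = number of cards of value c + 1 remaining
    int numCards;       // total cards remaining
};

// Accumulate the probability of each dealer standing total for the given shoe.
void computeDealerProbabilities(const std::vector<Subset>& subsets,
                                const Shoe& shoe, double pCount[22])
{
    for (int total = 17; total <= 21; ++total)
        pCount[total] = 0.0;

    for (const Subset& s : subsets) {
        double p = 1.0;         // probability of one particular ordering
        int removed = 0;
        bool possible = true;
        for (int c = 0; c < 10 && possible; ++c) {
            for (int k = 0; k < s.cards[c]; ++k) {
                if (shoe.cards[c] <= k) { possible = false; break; }
                p *= static_cast<double>(shoe.cards[c] - k)
                     / (shoe.numCards - removed);
                ++removed;
            }
        }
        if (possible)           // every ordering is equally likely, so just
            pCount[s.total] += p * s.multiplier;    // scale by the count
    }
}
The point is that the expensive recursion is gone; what remains is a single linear pass over a fixed, pre-computed list.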

P.S. Note that all line numbers above refer to version 5.2 on my web site, or the initial <c369d5b1e28d> changeset in the Bitbucket repository.
 

MangoJ

Well-Known Member
#2
How do you handle the soft Ace in this approach? If the Ace were always hard,
then yes we could just permute all cards.

But I don't see it for soft Aces. Say we want to calculate the Dealer standing on 17. One of those sequences we need to evaluate would be 5A83. However, the permutation 5A38 is unphysical, since 5A3 is already a 19.
So, do you also check if a permutation is valid?
 
#3
MangoJ said:
How do you handle the soft Ace in this approach? If the Ace were always hard,
then yes we could just permute all cards.

But I don't see it for soft Aces. Say we want to calculate the Dealer standing on 17. One of those sequences we need to evaluate would be 5A83. However, the permutation 5A38 is unphysical, since 5A3 is already a 19.
So, do you also check if a permutation is valid?
This is a good point. A simple "n-factorial-ish" computation of the number of all possible permutations of cards in a given subset would include some that are invalid, such as in your example above. Instead, in the pre-computation step (lines 316-357), we actually do the entire "slow" recursive traversal of all possible permutations of dealer hands-- but just once-- and for each one, we increment an integer multiplier associated with the corresponding unordered subset of cards. (See lines 341-355; in 342 we increment the multiplier for a subset we have already seen, and in 353 we set the multiplier to 1 for a newly created subset that we haven't seen before.)

In short, because we enumerate permutations via a recursive process similar to what the dealer actually does, we never even visit "invalid" permutations.

One detail: I basically do everything separately by dealer up card, including keeping track of these multiplicities. So, using your 5A38 example above, the number of valid permutations (the DealerHand.multiplier[] variable) is 4 for up cards of 3, 5, or 8; it is just 2 for an ace up card.
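
To make the pre-compute step concrete, here is a rough sketch of the kind of enumeration I mean (again, illustrative only-- these are not the actual names or code in blackjack.cpp):

Code:
#include <cstring>
#include <vector>

// Hedged sketch of the pre-compute step.  We deal dealer hands recursively
// from an *infinite* deck, exactly the way the dealer would, and every time a
// hand stands we bump the multiplier of its unordered subset of cards, kept
// separately for each up card.
struct Subset {
    int cards[10];          // cards[c] = number of cards of value c + 1
    int total;              // standing total, 17..21
    int multiplier[10];     // multiplier[u] = valid orderings for up card u + 1
};

static const bool HIT_SOFT_17 = false;
static std::vector<Subset> subsets;
static int hand[10];        // unordered counts of the cards dealt so far

static void recordHand(int total, int upCard)
{
    for (Subset& s : subsets)       // simple linear search-or-append
        if (s.total == total && std::memcmp(s.cards, hand, sizeof hand) == 0) {
            ++s.multiplier[upCard - 1];     // another valid ordering
            return;
        }
    Subset s = {};                          // first time for this subset
    std::memcpy(s.cards, hand, sizeof hand);
    s.total = total;
    s.multiplier[upCard - 1] = 1;
    subsets.push_back(s);
}

static void enumerate(int count, bool soft, int upCard)
{
    if (count < 17 || (HIT_SOFT_17 && count == 17 && soft)) {
        for (int card = 1; card <= 10; ++card) {    // infinite deck: every
            ++hand[card - 1];                       // value is always available
            int c = count + card;
            bool s = soft;
            if (card == 1 && c + 10 <= 21) { c += 10; s = true; }
            else if (s && c > 21)          { c -= 10; s = false; }
            enumerate(c, s, upCard);
            --hand[card - 1];
        }
    }
    else if (count <= 21)
        recordHand(count, upCard);      // busts need no table entry
}

// Usage: run once per up card, e.g.
//     ++hand[upCard - 1];
//     enumerate(upCard == 1 ? 11 : upCard, upCard == 1, upCard);
//     --hand[upCard - 1];
// (The real CA also separates out dealer blackjacks; that detail is omitted here.)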
 

MangoJ

Well-Known Member
#4
I'm sorry, I'm not good at reading other people's code, so I will just ask you for some clarification (which I could probably find in the code itself).

So you have a table, from up card to each stand value, with each "line" a different combination of drawn cards, where the table entry is the number of valid permutations of that combination.
This table depends only on S17/H17 (no other rules needed), and is most probably just a few thousand entries. It could in principle be precompiled (or created by a class method), using a recursive run on an infinite deck.

Did I get this right?

Then the exact dealer probability (depending on deck size and removed cards) would be obtained by summing over all combinations in the table, each multiplied by its stored number of permutations (in most cases a multinomial coefficient) and by its probability of being drawn.

Then my next question would be: one would need an efficient two-way addressing scheme for a specific combination which is invariant under permutation. I've seen a similar mapping in a paper by Nairn (however, it is one-way), so how is it implemented in your code?

Thanks for your time, I learn a lot from your comments.
 

assume_R

Well-Known Member
#5
Let's race!

Hi, Eric.

So let's pick a set of rules, and a set of cards and we'll compare the non-recursive approach to the recursive approach, and scientifically see which is faster. I have your code, and will set up a system which will create a shoe and I will recompute the dealer probabilities 1,000 times. And then do the exact same thing with my recursive function.

As mentioned previously, your recursive code is similar (though not exactly equal) to mine, and I have made numerous efforts to optimize the recursive probability calculations as much as possible. However, you are correct in that the same dealer hands with different orderings do get calculated separately in the recursive function, and that may make your non-recursive code way faster.

But we have to remember to take into account the "preprocessing" step in your code.

So let's say off the top 8D, S17? We'll give the dealer a 2, and hence there will be 31 2's instead of 32 2's. Every other rank will have 32 cards, except 10 which will have 128 cards.

I get the following output:
17: 0.1396941124
18: 0.1345183452
19: 0.1299287332
20: 0.1240209940
21: 0.1183078290
Bust: 0.3535299860
BJ: 0

Do you agree that a straight speed test is the fairest way to see if recursive can give better results than non-recursive?
 
#7
MangoJ said:
So you have a table, from up card to each stand value, with each "line" a different combination of drawn cards, where the table entry is the number of valid permutations of that combination.
This table depends only on S17/H17 (no other rules needed), and is most probably just a few thousand entries. It could in principle be precompiled (or created by a class method), using a recursive run on an infinite deck.

Did I get this right?
Exactly. For example, for S17, there are just 1676 subsets of cards that we care about: 248, 291, 341, 386, and 410 for hand totals of 17 through 21, respectively. My CA does exactly what you suggest could be done in principle, just once in a pre-compute setup; it recursively deals from an infinite deck to generate the list of subsets.

MangoJ said:
Then the exact dealer probability (depending on deck size and removed cards) would be obtained by summing over all combinations in the table, each multiplied by its stored number of permutations (in most cases a multinomial coefficient) and by its probability of being drawn.

Then my next question would be: one would need an efficient two-way addressing scheme for a specific combination which is invariant under permutation. I've seen a similar mapping in a paper by Nairn (however, it is one-way), so how is it implemented in your code?
If I understand your question correctly, during the recursive dealing of hands in the pre-compute step, when we encounter a particular permutation of cards (say, 5-A-8-3), we want to map that to a canonical representation of the corresponding unordered subset, right?

My CA does this in the dumbest possible way: it maintains the list of encountered subsets in a simple linear array, and given a particular permutation, it simply walks linearly through the list of subsets already encountered so far, searching for the subset with a single ace, 3, 5, and 8. (See lines 332-340.) If it doesn't find it-- i.e., if this is the first time we have seen this particular subset of cards-- then it appends it to the end of the list.

You are correct that there are more efficient and sophisticated ways of doing this... but I would argue that doing so would violate at least the second rule of premature optimization, namely: don't do it... yet. :) By that I mean that this particular section of the algorithm is currently (1) pretty short, (2) relatively easy to read, and (3) not even close to the long pole in the tent in terms of overall execution time. To optimize here will hurt (1) and (2) and do almost nothing to improve (3).

In my opinion, the place where significant performance improvement would be useful is in the computation of the probability of a given subset (multiplied by the number of corresponding permutations). Generating the list of subsets is the easy part-- the hard part is being quick about what we do with the list once we have it.
 
#9
assume_R said:
So let's pick a set of rules, and a set of cards and we'll compare the non-recursive approach to the recursive approach, and scientifically see which is faster. I have your code, and will set up a system which will create a shoe and I will recompute the dealer probabilities 1,000 times. And then do the exact same thing with my recursive function.

So let's say off the top 8D, S17? We'll give the dealer a 2, and hence there will be 31 2's instead of 32 2's. Every other rank will have 32 cards, except 10 which will have 128 cards.

I get the following output:
17: 0.1396941124
18: 0.1345183452
19: 0.1299287332
20: 0.1240209940
21: 0.1183078290
Bust: 0.3535299860
BJ: 0
This sounds like a very interesting and useful exercise. If it helps, following is example code that uses my CA to perform this same calculation (although see below):

Code:
#include "blackjack.h"

int main()
{
    BJDealer dealer(false);
    BJShoe shoe(8);
    dealer.computeProbabilities(shoe);
}
If you want some output to verify that we are computing the same thing:

Code:
#include "blackjack.h"

#include <iostream>
#include <iomanip>
#include <limits>

int main()
{
    BJDealer dealer(false);
    BJShoe shoe(8);
    dealer.computeProbabilities(shoe);

    int upCard = 2;
    std::cout << std::setprecision(std::numeric_limits<double>::digits10 + 1);
    for (int count = 17; count <= 21; ++count)
    {
        std::cout << count << ": " <<
            dealer.getProbabilityCount(count, upCard) << std::endl;
    }
    std::cout << "Bust: " << dealer.getProbabilityBust(upCard) << std::endl;
    std::cout << "BJ: " << dealer.getProbabilityBlackjack(upCard) << std::endl;
}
We should be careful to ensure that the "tic and toc" timing is wrapped around the same computation in both implementations. For example, note that the "public" interface to my code does the calculation for all dealer up cards at once. Let me know how it goes, and we can compare results!
 

MangoJ

Well-Known Member
#10
ericfarmer said:
In my opinion, the place where significant performance improvement would be useful is in the computation of the probability of a given subset (multiplied by the number of corresponding permutations). Generating the list of subsets is the easy part-- the hard part is being quick about what we do with the list once we have it.
Well, that's exactly my point. If one had an efficient direct translation from the index of this table to the canonical representation of a combination, all one would need to do is cycle through all entries of the table, get their canonical representations, and then calculate the probability of drawing each representation.

Thinking more about this, one would only need it one-way, and ~2k entries is sufficiently small that I would hard-code a table (best to have a meta-program generate this table as an include file). For better usability, I would split the table into separate parts by dealer total. One would not need (and I don't think you have) a table for bust hands, since all non-BJ, non-standing hands are busted.

I don't know your canonical representation; let's assume it's just a fixed-length (10) array of the cards used in that representation, together with the total number of cards (for efficiency of calculation). This is the representation used by my CA.
So an A4T4 would have the canonical representation {{1,0,0,2,0,0,0,0,0,1}, 4}.
Then all one would need is a table like

Code:
typedef struct {
    int cards[10];
    int total;
} canonical;

typedef struct {
    int multiplier;
    canonical representation;
} DealerTableEntry;

DealerTableEntry DealerS17Table[1676] = {
...
{X, {{1,0,0,2,0,0,0,0,0,1}, 4}},  // A44T
...
};

DealerTableEntry DealerH17Table[1676] = ... ;
(where X is the correct multiplier for A44T).

Depending on the given rules, one would then just use a pointer to the S17 or H17 table (i.e. DealerTableEntry* DealerTable = DealerS17Table).
For the probability of the dealer standing on 19 one would then just iterate over the corresponding part of the table, calculate the probability of drawing that specific representation (some ratios of factorials), and accumulate.

Code:
canonical Deck = {{8,8,8,8,8,8,8,8,8,32}, 104}; // i.e. double deck
double q = 0.;
int i, card;
for (i = INDEXLOW[19]; i <= INDEXHIGH[19]; i++) {
    DealerTableEntry entry = DealerTable[i];

    // probability p of drawing entry.representation from Deck
    // (factorial() is assumed to return a double)
    double p = factorial(Deck.total - entry.representation.total) / factorial(Deck.total);
    for (card = 0; card < 10; card++) {
        p *= factorial(Deck.cards[card]) / factorial(Deck.cards[card] - entry.representation.cards[card]);
    }

    q += entry.multiplier * p;
}
To further improve performance, the factorials could be rearranged and should be tabulated as well.
This is not tested code, just a sketch of an idea. What do you think? I also apologize for the C style (I'm not very familiar with C++, which is part of the reason I cannot read your original code well enough).

Edit: One should optimize the expression "factorial(Deck.total - entry.representation.total) / factorial(Deck.total)", as the factorials will grow quite large, while the ratio itself will be rather small.
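For example, something like this (again untested, reusing the canonical struct above):

Code:
// Untested sketch: the ratio factorial(N - n) / factorial(N) is just
// 1 / (N * (N-1) * ... * (N-n+1)), so the whole drawing probability can be
// accumulated card by card without ever forming a large factorial.
double drawProbability(const canonical *deck, const canonical *rep)
{
    double p = 1.0;
    int remaining = deck->total;
    int card, k;

    for (card = 0; card < 10; card++) {
        for (k = 0; k < rep->cards[card]; k++) {
            p *= (double)(deck->cards[card] - k) / remaining;
            remaining--;
        }
    }
    return p;   /* multiply by the table entry's multiplier when accumulating */
}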
 
#11
MangoJ said:
Well, that's exactly my point. If one had an efficient direct translation from the index of this table to the canonical representation of a combination, all one would need to do is cycle through all entries of the table, get their canonical representations, and then calculate the probability of drawing each representation.
I think we are in violent agreement. As you'll see below, your description of how this might work is almost exactly what my CA currently does.

MangoJ said:
Thinking more about this, one would only need it one-way, and ~2k entries is sufficiently small that I would hard-code a table (best to have a meta-program generate this table as an include file). For better usability, I would split the table into separate parts by dealer total. One would not need (and I don't think you have) a table for bust hands, since all non-BJ, non-standing hands are busted.

I don't know your canonical representation; let's assume it's just a fixed-length (10) array of the cards used in that representation, together with the total number of cards (for efficiency of calculation). This is the representation used by my CA.
So an A4T4 would have the canonical representation {{1,0,0,2,0,0,0,0,0,1}, 4}.
Then all one would need is a table like...
In place of your example code, following is a snippet from my CA (blackjack.h):

Code:
    struct DealerHand {
        int cards[10],
            multiplier[10];
    };
    struct DealerHandCount {
        int numHands;
        DealerHand dealerHands[423];
    } dealerHandCount[5];
Almost exactly the same idea, with a couple of exceptions. First, I keep track of a multiplier for each subset, for each possible up card, for the "invalid permutation" reasons we discussed earlier. Second, just as you suggest, I keep a separate list of subsets for each hand total 17 through 21; the array size 423 is the maximum number of subsets encountered for any (standing) hand total, S17 or H17.

MangoJ said:
For the probability of the dealer standing on 19 one would then just iterate over the corresponding part of the table, calculate the probability of drawing that specific representation (some ratios of factorials), and accumulate.
...
To further improve performance, the factorials could be re-arranged and should be tabulated as well.
So far, I think we are on exactly the same page, and we are at the challenging step of efficiently computing this probability. I try to take some steps to "re-arrange and tabulate" those factorial expressions; this is probably the most opaque section of code in my CA:

Code:
// max*Values[c] are the largest values of s and h for which we need to compute
// lookup[][c][].
//
// lookup[s][c][h] is the probability of h + 1 consecutive cards with value
// c + 1 being dealt from the shoe, given that s other cards have been removed
// from the shoe, and is equal to
// f(shoe.cards[c], h + 1)/f(shoe.numCards - s, h + 1), where f is the falling
// factorial function.
const int BJDealer::maxSvalues[10] = {0, 11, 11, 11, 12, 11, 10, 9, 8, 7};
const int BJDealer::maxHvalues[10] = {11, 8, 5, 4, 3, 2, 2, 1, 1, 1};

void BJDealer::computeProbabilities(const BJShoe & shoe) {

    // Compute lookup[][][].
    for (upCard = 1; upCard <= 10; upCard++) {
        int maxS = maxSvalues[upCard - 1],
            maxH = maxHvalues[upCard - 1],
            numCards = shoe.numCards;
        for (int s = 0; s <= maxS && numCards; s++, numCards--) {
            double *l = lookup[s][upCard - 1];
            int cards = shoe.cards[upCard - 1],
                n = numCards;
            double p = l[0] = (double)cards--/n--;
            for (int h = 1; h <= maxH && cards; h++) {
                p *= (double)cards--/n--;
                l[h] = p;
            }
        }
    }
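To give a sense of how the table gets used later in computeProbabilities() (a paraphrase of the idea, not the actual code, and the array bounds here are illustrative only), the probability of each unordered subset is just a product of lookup[][][] entries, one per card value present in the subset, with the s index advancing by the number of cards already accounted for:

Code:
// Paraphrase only-- not the actual blackjack.cpp code.
double subsetProbability(const int handCards[10],           // cards of each value in the subset
                         const double lookup[13][10][12])   // lookup[s][c][h] as documented above
{
    double p = 1.0;
    int s = 0;                          // cards of *other* values already accounted for
    for (int c = 0; c < 10; ++c) {
        if (int h = handCards[c]) {
            p *= lookup[s][c][h - 1];   // h consecutive cards of value c + 1
            s += h;
        }
    }
    return p;   // then scale by the subset's multiplier for the current up card
}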
To be honest, I don't really have much idea how well my implementation performs compared to the many other CAs out there. If there are improvements to this same basic approach, or completely different approaches to doing this, I would be very interested to learn more.
 
#12
assume_R said:
So let's pick a set of rules, and a set of cards and we'll compare the non-recursive approach to the recursive approach, and scientifically see which is faster. I have your code, and will set up a system which will create a shoe and I will recompute the dealer probabilities 1,000 times. And then do the exact same thing with my recursive function.

<snip>

So let's say off the top 8D, S17? We'll give the dealer a 2, and hence there will be 31 2's instead of 32 2's. Every other rank will have 32 cards, except 10 which will have 128 cards.
I had some time to experiment with this, and came up with some results. The short version: the non-recursive algorithm is a little over 7 times faster than the recursive one.

Longer version: For 8D, S17, I measured the time required for 10,000 iterations of computing the probabilities for each of the 10 possible dealer up cards (see code below). This is on a comically old 2.4 GHz computer, with MSVC++ 2003 and Boost 1.44. (I have newer compilers, but 2003 is faster. :))

It took 3.468 seconds with the non-recursive algorithm, and 24.907 seconds with the recursive one.

Following is the code that performed the test:
Code:
#include "blackjack.h"
#include "Combinatorial.h"
#include <iostream>
#include <ctime>

// Simple wall clock timer.
struct Timer
{
    Timer(double& elapsed) : elapsed(elapsed), tic(std::clock()) {}
    ~Timer()
    {
        elapsed = static_cast<double>(std::clock() - tic) / CLOCKS_PER_SEC;
    }
    double& elapsed;
    std::clock_t tic;
};

int main()
{
    int numDecks = 8;
    int numSamples = 10000;
    double elapsed = 0;

    // Time Eric's code.
    {
        Timer t(elapsed);
        BJShoe shoe(numDecks);
        BJDealer dealer(false);
        for (int i = 0; i < numSamples; ++i)
        {
            dealer.computeProbabilities(shoe);
        }
    }
    std::cout << "Eric     -> " << elapsed << " seconds" << std::endl;

    // Time assume_R's code.
    {
        Timer t(elapsed);
        simmarca::DealerTotals P;
        simmarca::DealerCards deck;
        deck.assign(4 * numDecks);
        deck[9] = 16 * numDecks;
        for (int i = 0; i < numSamples; ++i)
        {
            for (int upCard = 1; upCard <= 10; ++upCard)
            {
                P.assign(0);
                --deck[upCard - 1];
                simmarca::CalculateTotals(P, 1, deck, 52 * numDecks - 1,
                    ((upCard == 1) ? 11 : upCard), 1, upCard == 1, false,
                    false);
                ++deck[upCard - 1];
            }
        }
    }
    std::cout << "assume_R -> " << elapsed << " seconds." << std::endl;
}
 

k_c

Well-Known Member
#13
ericfarmer said:
To be honest, I don't really have much idea how well my implementation performs compared to the many other CAs out there. If there are improvements to this same basic approach, or completely different approaches to doing this, I would be very interested to learn more.
When I first saw your lookup implementation I didn't know what to make of it except that it was probably there to increase efficiency. After a very long time it finally dawned on me what it did.

What I do in my programs that I think does the same thing is to use 2 parallel arrays. I think the efficiency is about the same.

Code:
	bool comp[21][10][21] = {false};
	double p[21][10][21];
The 3 indices of each array represent the number of cards removed from the starting composition, each of 10 possible ranks, and the number of cards of the specific rank in the present dealer hand. For dealer probabilities, 21 is more than you need in the first and last dimensions, but using 21 definitely covers all possibilities.

The first array is boolean and if a combination of cards has already been computed it is set to true. The second array contains the computed probabilities.

Then in my code I check to see if a value has already been computed. If not, it is computed and the boolean flag is set to true; otherwise the already-computed value is used.

Code:
        if (poss)                                   // hand combination is possible
        {
            short rem = 0;                          // cards of previous ranks already removed
            double handProb = 1;

            for (short rank = 1; rank <= 10; rank++)
            {
                if (r->comp[rank - 1])              // hand contains this rank
                {
                    short numRank = r->comp[rank - 1];
                    if (comp[rem][rank - 1][numRank - 1])
                        handProb *= p[rem][rank - 1][numRank - 1];   // use cached value
                    else
                    {
                        // compute once, then cache for later hands
                        double prob = r->getProbRank(shoe, rem, rank);
                        p[rem][rank - 1][numRank - 1] = prob;
                        comp[rem][rank - 1][numRank - 1] = true;
                        handProb *= prob;
                    }
                    rem += numRank;
                }
            }

            .......
 

assume_R

Well-Known Member
#14
ericfarmer said:
I had some time to experiment with this, and came up with some results. The short version: the non-recursive algorithm is a little over 7 times faster than the recursive one.

Longer version: For 8D, S17, I measured the time required for 10,000 iterations of computing the probabilities for each of the 10 possible dealer up cards (see code below). This is on a comically old 2.4 GHz computer, with MSVC++ 2003 and Boost 1.44. (I have newer compilers, but 2003 is faster. :))

It took 3.468 seconds with the non-recursive algorithm, and 24.907 seconds with the recursive one.
Well I'll be damned! I get 1.344 seconds for non-recursive and 6.915 seconds for recursive (VS 2010 64-bit). So a 5-fold increase in speed. I lose the race :laugh:.

k_c said:
What I do in my programs that I think does the same thing is to use 2 parallel arrays. I think the efficiency is about the same.

Code:
	bool comp[21][10][21] = {false};
	double p[21][10][21];
The 3 indices of each array represent the number of cards removed from the starting composition, each of 10 possible ranks, and the number of cards of the specific rank in the present dealer hand. For dealer probabilities, 21 is more than you need in the first and last dimensions, but using 21 definitely covers all possibilities.

The first array is boolean and if a combination of cards has already been computed it is set to true. The second array contains the computed probabilities.

Then in my code I check to see if a value has already been computed. If not, it is computed and the boolean flag is set to true; otherwise the already-computed value is used.
So this is an interesting twist on the speed test, because the test Eric and I just performed was a straight calculation of the dealer probabilities. k_c, did you calculate them recursively as well?

Now in my code, when I actually run my CA, before computing the dealer probabilities I first check whether they have already been computed.

Eric, does your code always recalculate the probabilities when doing the CA even if they have been calculated once before? i.e. do you store the calculated probability values?

One more interesting twist is that during the recursive calculation, I can check whether that "subset" has already been calculated. I had avoided doing this because I assumed the overhead would be too much when you're already "inside" the recursive function.

Finally, Eric, I used Visual Studio 2010 Ultimate to profile your code, and here is where the biggest bottlenecks are (mostly the division). Any way we can think of to speed this up even more?

 

London Colin

Well-Known Member
#16
assume_R said:
Finally, Eric, I used Visual Studio 2010 Ultimate to profile your code, and here is where the biggest bottlenecks are (mostly the division). Any way we can think of to speed this up even more?

One of the more minor changes I made in my copy was to eliminate most of the minus ones that permeate the code in array lookups. This was mainly for the sake of readability, but might also have some impact on speed.

i.e., instead of having something like int cards[10] and indexing it everywhere with card-1, add an extra, redundant element - int cards[10+1] - leaving the zeroth element unused, and index it directly.


I also used public methods, wherever one exists, rather than refer to the array directly. I made sure these were all inline functions, so there would be no performance overhead as a result. That does mean the code would look the same, even without the additional, redundant array elements.

So I would call: shoe.getCards(upCard), which equates to shoe.cards[upCard] in my code, or shoe.cards[upCard-1] in the original.
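
In code, the change looks something like this (just a sketch of the idea, not my exact edits):

Code:
// Sketch only.  The zeroth element is deliberately unused, so card values
// 1..10 index the array directly, and the inline accessor keeps call sites
// readable with no runtime cost.
class BJShoe {
public:
    int getCards(int card) const { return cards[card]; }   // inline accessor
    // ...
private:
    int cards[10 + 1];   // cards[0] unused; cards[1..10] = count of each value
    // ...
};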

[Actually, looking at the code, I can see I tried different implementations, and currently have one enabled in which the arrays are replaced by classes, with overloaded operator[], so that the redundant array members are optional even without using the access functions like getCards(upCard).]

Hope that makes sense.:)
 

k_c

Well-Known Member
#17
assume_R said:
Well I'll be damned! I get 1.344 seconds for non-recursive and 6.915 seconds for recursive (VS 2010 64-bit). So a 5-fold increase in speed. I lose the race :laugh:.



So this is an interesting twist on the speed test, because the test Eric and I just performed was a straight calculation of the dealer probabilities. k_c, did you calculate them recursively as well?

Now in my code, when I actually run my CA, before computing the dealer probabilities I first check whether they have already been computed.

Eric, does your code always recalculate the probabilities when doing the CA even if they have been calculated once before? i.e. do you store the calculated probability values?

One more interesting twist is that during the recursive calculation, I can check whether that "subset" has already been calculated. I had avoided doing this because I assumed the overhead would be too much when you're already "inside" the recursive function.

Finally, Eric, I used Visual Studio 2010 Ultimate to profile your code, and here is where the biggest bottlenecks are (mostly the division). Any way we can think of to speed this up even more?

I think what Eric's code does is to compute all possibilities for a given shoe composition once, in order to guarantee that nothing is computed more than once. After that, when the final probabilities are computed, all that needs to be done for a given dealer hand composition is to look up the value that is known to have already been computed, and then multiply by the number of ways that the hand can be dealt.

My approach is similar, except that I don't pre-compute anything but rather build a list of already-computed data as I go, so nothing is ever computed more than once. After that I also multiply by the number of ways a hand can be dealt, so no, I do not compute recursively.
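
In rough outline, the caching looks something like this (a simplified sketch, not my actual code):

Code:
// Simplified sketch of the "cache as you go" idea.  cached[rem][rank - 1][num - 1]
// holds the probability of drawing 'num' cards of the given rank after 'rem'
// other cards have been removed from the shoe.  Both arrays are reset whenever
// the starting shoe composition changes.
bool   computed[21][10][21] = { false };
double cached[21][10][21];

double probRank(const int shoeCards[10], int shoeTotal,
                int rem, int rank, int numRank)
{
    if (!computed[rem][rank - 1][numRank - 1])
    {
        double p = 1;
        for (int k = 0; k < numRank; k++)       // falling-factorial ratio
            p *= double(shoeCards[rank - 1] - k) / (shoeTotal - rem - k);
        cached[rem][rank - 1][numRank - 1] = p;
        computed[rem][rank - 1][numRank - 1] = true;
    }
    return cached[rem][rank - 1][numRank - 1];
}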
 
#18
assume_R said:
Well I'll be damned! I get 1.344 seconds for non-recursive and 6.915 seconds for recursive (VS 2010 64-bit). So a 5-fold increase in speed. I lose the race :laugh:.
Very interesting. I have tried to maintain a suite of compilers, continuing to cling desperately as far back as M$VS 2003, partly due to academic interest, but at times out of necessity for performance. For example, running the same test compiled with VS 2008, on the same hardware, yields 3.218 vs. 30.188 seconds. (Compare this with the earlier 3.468 vs. 24.907 on the same hardware with VS2003.) With the newer compiler, mine got faster and yours got slower!

x64 is another monkey wrench. In my experience at work, the 64-bit compilers have simply not caught up yet, so to speak, usually tending to be slower than their 32-bit counterparts. Although sometimes it's tricky: the relative performance between two different applications on 32-bit is sometimes reversed when compiled 64-bit!

(I have seen even more marked differences: I wrote some CPU-intensive board game AI not long ago that I release with a pre-built executable using VS2003... because the 2005 build is twice as slow, and the 2008 build is twice again as slow!)

assume_R said:
Eric, does your code always recalculate the probabilities when doing the CA even if they have been calculated once before? i.e. do you store the calculated probability values?
k_c answered the mail well here. I cache the distinct subsets in a list ahead of time, and calculate probabilities as I traverse the list in linear order.

I definitely have my teeth in this again; I plan to poke around both in the dealer probabilities (making it faster), as well as in my calculation of expected values for splitting more than once (making this accurate). We'll see how it goes...
 

assume_R

Well-Known Member
#19
ericfarmer said:
Very interesting. I have tried to maintain a suite of compilers, continuing to cling desperately as far back as M$VS 2003, partly due to academic interest, but at times out of necessity for performance. For example, running the same test compiled with VS 2008, on the same hardware, yields 3.218 vs. 30.188 seconds. (Compare this with the earlier 3.468 vs. 24.907 on the same hardware with VS2003.) With the newer compiler, mine got faster and yours got slower!

x64 is another monkey wrench. In my experience at work, the 64-bit compilers have simply not caught up yet, so to speak, usually tending to be slower than their 32-bit counterparts. Although sometimes it's tricky: the relative performance between two different applications on 32-bit is sometimes reversed when compiled 64-bit!

(I have seen even more marked differences: I wrote some CPU-intensive board game AI not long ago that I release with a pre-built executable using VS2003... because the 2005 build is twice as slow, and the 2008 build is twice again as slow!)
That is going to depend on what you use. For example, if you use 32-bit values such as float, a 64-bit system might have to up-convert them all to 64-bit numbers (i.e. double) before sending them to the CPU. However, most modern compilers know if you use "int" and actually convert it to the appropriate width before sending it to the CPU. So the short answer (as usual) is really "it depends".

Also are your optimization flags the same in both versions? I've only used VS 2010 and I've been happy with its optimization. It is pretty much on par with the latest gcc, which I've also compiled my CA with on Linux.

ericfarmer said:
k_c answered the mail well here. I cache the distinct subsets in a list ahead of time, and calculate probabilities as I traverse the list in linear order.
So I haven't spent as much time with your code as you have, but I would imagine that a linear order is far from optimal. If you think having it sorted would help, you should store the list as a "set", which will always remain sorted as you add new elements to it. Can we think of how a sorted list might make it more efficient? When you search through the list, you don't need to traverse linearly - you just give the "set" (or "multiset") the value you want, and it returns the matching element directly. Visual Studio stores sets internally as red-black trees, I believe.
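
For example, something like this (untested sketch) for storing and looking up the subsets:

Code:
#include <array>
#include <map>

// Untested sketch: key a std::map on the unordered card counts, so the
// find-or-add step is O(log n) instead of a linear scan.
typedef std::array<int, 10> CardCounts;     // counts of each card value 1..10

struct SubsetInfo {
    int multiplier[10];                     // per-up-card ordering counts
};

std::map<CardCounts, SubsetInfo> subsets;   // std::array provides operator<

void recordPermutation(const CardCounts& hand, int upCard)
{
    // operator[] inserts a zero-initialized SubsetInfo on first encounter
    ++subsets[hand].multiplier[upCard - 1];
}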

I will look at your code again and we can discuss ways to speed this up.

Edit: Some minor optimizations include using ++iterator in your for-loops instead of iterator++, and using != instead of <. These are nowhere near as significant as algorithmic changes, but they are some "good to know" tricks.
 
#20
assume_R said:
Also are your optimization flags the same in both versions? I've only used VS 2010 and I've been happy with its optimization. It is pretty much on par with the latest gcc, which I've also compiled my CA with on Linux.
Yep, release config, same flags. Unfortunately, as you say, the only useful benchmark is your own application, since relative performance varies a lot. Whenever I release pre-built software, I simply try all of the compilers and pick the fastest one... which frequently (but not always) turns out to be the *older* one. (Note that compilers are evolving with at least two goals, and only one of them is performance. The other is language compliance, and in this respect things are indeed mostly getting better.)

assume_R said:
So I haven't spent as much time with your code as you have, but I would imagine that a linear order is far from optimal.
Hmmm. I may be misunderstanding your idea here. I would think that any traversal other than barging through a simple linear array would be slower. The list/set we are talking about is the collection of distinct subsets of cards in the dealer's hand (totaling somewhere between 17 and 21), right? Assuming that we generate that collection ahead of time, as I do, I'm not sure what difference it makes in which order that collection is traversed when computing the probability for each element (i.e., subset of cards).

assume_R said:
Edit: Some minor optimizations include using ++iterator in your for-loops instead of iterator++, and using != instead of <. These are nowhere near as significant as algorithmic changes, but they are some "good to know" tricks.
Agreed that pre-increment is cheaper, and is now habit. My C++ code-- and even the language itself-- has evolved a lot in the last 14 years since my first cut at this. As I think I said here or maybe in a pm, if I had it to do all over again...
 