Very interesting write-up. This is an important question-- rephrasing slightly, "How do we know that a 2 billion round simulation is good enough to calculate the expected value of a blackjack round?"
I think the initial response must be another question: "How good do you want it to be?" Because if "good enough" means, for example, needing an estimate of expected value that we can be confident is accurate to 4 decimal places (in percent of initial wager), then 2 billion rounds is nowhere near enough. On the other hand, if "good enough" simply means accurate to 1 decimal place, then 2 billion rounds is overkill.
The Central Limit Theorem is relevant here. Suppose that we know in advance the standard deviation (sigma) of the outcome of a single round. (We don't, but we could estimate it from the simulation as well. Let's use the 1.1418 value from the Wizard of Odds appendix. I know this is for 6 decks, not 1, but this is a back-of-the-envelope calculation.)
Then for a large number n of samples, our estimated EV is approximately normally distributed with standard deviation sigma/sqrt(n). So if we were to run our n-round simulation repeatedly, we should expect our estimated EV to be within 2*sigma/sqrt(n) of the *true* expected value about 95% of the time.
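Plugging in 1.1418 for sigma and 2 billion for n gives 2*1.1418/sqrt(2,000,000,000), a 2-sigma margin of error (the "plus or minus" half-width) of about 0.005% of the initial wager, which is good for about 2 decimal places in percent.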
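As a rough check on that arithmetic, here is a minimal Python sketch. The function names `half_width` and `rounds_needed` are made up for illustration, and I am assuming that "accurate to k decimal places" means a 2-sigma margin of half a unit in the k-th decimal place of the percentage (e.g. 0.05% for 1 decimal place).

```python
import math

SIGMA = 1.1418  # per-round standard deviation quoted above, in units of the initial wager

def half_width(n, sigma=SIGMA, z=2.0):
    """Approximate 95% margin of error (z = 2) for the average of n rounds."""
    return z * sigma / math.sqrt(n)

def rounds_needed(margin, sigma=SIGMA, z=2.0):
    """Smallest n whose z-sigma margin of error is at most `margin`."""
    return math.ceil((z * sigma / margin) ** 2)

if __name__ == "__main__":
    # Margin for a 2 billion-round simulation, printed as a percentage of the wager.
    print(f"2 billion rounds: +/- {half_width(2_000_000_000):.4%}")
    # Assumed meaning of "k decimal places": margin of half a unit in the k-th place.
    for places in (1, 2, 4):
        margin = 0.5 * 10 ** (-places) / 100  # convert percent to a fraction of the wager
        print(f"{places} decimal place(s): about {rounds_needed(margin):.2e} rounds")
```

Under those assumptions, 1 decimal place needs on the order of 20 million rounds and 4 decimal places on the order of 20 trillion, which is why 2 billion is overkill for one and nowhere near enough for the other.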
As discussed elsewhere, this is easy to demonstrate, simply by running your 2 billion-round simulation multiple times. For example, suppose you run your simulation, sample 2 billion rounds, and get an estimated EV of -0.071826%. Is it appropriate to include this many digits in the result? No, because if you run it again, you may see -0.069712%, or -0.078315%. If you kept running the simulation many more times, about 95% of the results would be within 0.005% of the *true* expected value. If you want more "quotable" digits, you need more sample rounds.
Note that this leaves all of the combinatorics at the door, so to speak. The number of decks, whether suits matter, etc., are not the important factors. All that matters is the *variance* of the underlying distribution. For example, suppose that we simulate a round, not by shuffling a deck and actually playing out a hand-- which has all of the concerns associated with the large number of permutations-- but instead by making a single random draw from the probability distribution of outcomes, such as the table in the Wizard of Odds appendix above. Then all of the above analysis still applies: more accuracy requires more samples, based *solely* on the *variance* of the underlying distribution. (Of course, if we knew the distribution in the appendix ahead of time, then we wouldn't need to run the simulation in the first place, but you get the idea.)
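To make the "single random draw" idea concrete, here is a small Python sketch. The outcome table in it is made up purely for illustration (it is NOT the Wizard of Odds table), and the helper names `estimate_ev` and `coverage` are hypothetical. It repeats an n-round "simulation" many times and counts how often the estimate lands within 2*sigma/sqrt(n) of the true expected value.

```python
import math
import random

# Hypothetical net outcomes (in units of the initial wager) and their probabilities.
# These numbers are invented for illustration; they are NOT the Wizard of Odds table.
OUTCOMES = [-2.0, -1.0, 0.0, 1.0, 1.5, 2.0]
PROBS = [0.05, 0.43, 0.09, 0.33, 0.045, 0.055]

# Exact mean and standard deviation of the toy distribution.
TRUE_EV = sum(x * p for x, p in zip(OUTCOMES, PROBS))
SIGMA = math.sqrt(sum(p * (x - TRUE_EV) ** 2 for x, p in zip(OUTCOMES, PROBS)))

def estimate_ev(n, rng):
    """Average of n independent draws from the outcome distribution."""
    return sum(rng.choices(OUTCOMES, weights=PROBS, k=n)) / n

def coverage(n, runs, rng):
    """Fraction of repeated n-round runs that land within 2*sigma/sqrt(n) of TRUE_EV."""
    margin = 2 * SIGMA / math.sqrt(n)
    hits = sum(abs(estimate_ev(n, rng) - TRUE_EV) <= margin for _ in range(runs))
    return hits / runs

if __name__ == "__main__":
    rng = random.Random(12345)  # fixed seed so reruns are repeatable
    for n in (1_000, 100_000):
        margin = 2 * SIGMA / math.sqrt(n)
        print(f"n={n}: margin={margin:.5f}, coverage={coverage(n, 200, rng):.3f}")
```

No cards are shuffled anywhere in this sketch, yet the coverage should come out near 95% for each n while the margin shrinks like 1/sqrt(n). Only the variance of the distribution enters the calculation.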