question about standard deviation and variance

sagefr0g

Well-Known Member
#1
in another thread i posted a bell curve graph of some data
https://www.blackjackinfo.com/community/attachments/gaussian-jpg.8969/

well anyway, in the bell curve graph, i just let excel do its thing: i followed instructions on how to create a bell curve from some data, sorta thing. didn't really understand fully what i was doing, but i did precisely follow microsoft's instructions on how to do so.

but anyway, i was further just fooling around, tinkering with excel and that data, far as getting numbers for variance and standard deviation. i first allowed excel to compute variance and standard deviation for all of the data and excel came up with numbers for variance and standard deviation. but then, i realized that my data contained negative numbers of losing events and positive numbers of winning events, also the number zero for tie events. so, further just fooling around, i made excel compute variance and standard deviation for the positive and negative events separately.

so, it confuses me, far as how to interpret those two sets (complete data & negative, positive data separated) of numbers for variance and standard deviation. or maybe it's not proper to separate the positive and negative data, far as creating numbers for variance and standard deviation?

anyway, if anyone understands my confusion, please illuminate.o_O :confused:
 

gronbog

Well-Known Member
#2
Positive, negative and zero data points are perfectly valid for the computation of variance and standard deviation of a system (like a blackjack game) for which those results actually occur.

Variance deals with this by squaring each data point's difference from the mean (and then averaging those squares), thus eliminating the "direction" of the difference. Standard deviation is simply the square root of variance.

You can separate the positive and negative results and calculate the individual variances/SDs, but I don't know why you would want to do that.
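
For anyone following along, here is a minimal sketch in Python rather than Excel. The results list is made up for illustration (it is not the data from this thread), and it uses the population formulas, which correspond to Excel's VAR.P and STDEV.P; VAR.S and STDEV.S divide by n - 1 instead of n.

```python
import statistics

# Hypothetical results: losses negative, ties zero, wins positive.
results = [-2, -1, -1, 0, 1, 1, 2, 4]

mean = statistics.fmean(results)
# Variance: the average of the squared differences from the mean.
variance = sum((x - mean) ** 2 for x in results) / len(results)
std_dev = variance ** 0.5

# The standard library gives the same population variance.
assert abs(variance - statistics.pvariance(results)) < 1e-12

# Splitting by sign: each subset is measured around its own mean.
wins = [x for x in results if x > 0]
losses = [x for x in results if x < 0]

print("all data:   ", variance, std_dev)
print("wins only:  ", statistics.pvariance(wins), statistics.pstdev(wins))
print("losses only:", statistics.pvariance(losses), statistics.pstdev(losses))
```

In this example both subsets come out with a smaller variance than the full data, because each one is spread around its own mean and the gap between typical wins and typical losses no longer counts.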
 

sagefr0g

Well-Known Member
#3
@gronbog
thank you so much for your reply!
i didn't really know why i wanted to separate the positive and negative results, either, just was tinkering around in my relative confusion.
but anyway, another question regarding that separation of positive and negative results, would it be possible to create a gaussian curve for just those separate results? eg. the negative results only.
if so, would it come out looking like a bell curve, sort of?
errhhh, i know i could just try and do it with my data, but just asking.

edit: by the way, your response has helped to allay some of my confusion.
 

gronbog

Well-Known Member
#4
The gaussian (normal) distribution is the idealized bell curve (it is also what the binomial distribution approaches for large samples), so if you used only the negative data points, I would expect it to look like 1/2 of a bell curve.
 

sagefr0g

Well-Known Member
#5
this is what i come up with:
all data, negative only data & positive only data, images in that order below.
note: the green is actual data(histogram?) graph, blue is i guess excel's rendition of the (random?) data (histogram?) as a bell curve, and red histogram bin data (not sure really what that is).
not sure what the word histogram means, lol, guess i oughta look that up.


[attachment: alldata-jpg.8982] all data
[attachment: negdata-jpg.8983] negative only data
[attachment: posdata-jpg.8984] positive only data
 

sagefr0g

Well-Known Member
#6
bear with me a bit here, as i have no formal education regarding this stuff and am just trying to gain some understanding.
so one thing i think i understand about the above histograms, is that the frequency data for the green line 'from original data' type and the blue line 'generated from data randomly filled' are unique numerical values, whereas the bin numbers are the same for each. the implication to me is that the random data is needed to build the 'classic' bell curve, sorta thing. apparently the random numbers excel generates relative to the data can help 'fill in the blanks' that real data tends to lack (perhaps because it is so sparse and prone to herky jerky fluctuation), so that the proper fit of a bell curve can be derived from those 'fitted' frequency numbers. in other words, perhaps with enough original data, the green curve would tend to look more like the blue curve. but i wonder, with more data, might the blue curve change as well to a better fit, being as the more data there is the better fit a created bell curve would be, sorta thing?
@gronbog , you wondered why i might want to separate the negative and positive values in my data and compute the variance and standard deviation. i really had no idea, but i must say, i do find the three graphs above interesting. i mean it's a fact my losses are weighted less than my wins, far as the data goes, and just me maybe, but to me the graphs above clearly show that.
just me maybe, but it seems a good thing to have both the green and the blue curve, as the green curve is real deal stuff, and the blue curve is more theoretical. so one might take with a grain of salt, either graph and go from there when ruminating over the implications either graph may have behind it.
but anyway, i think that i get the idea of degrees of standard deviation and the mean relative to gaussian graphs, at least on an elementary level.
edit: another question, far as a bell curve, like when one refers to say one standard deviation low and one standard deviation high isn't the normal way to look at it if you are referring to a bell curve, that the one standard deviation low is on the left side of the curve and the one standard deviation high is on the right side of the curve?
 

gronbog

Well-Known Member
#8
sagefr0g said:
isn't the normal way to look at it if you are referring to a bell curve, that the one standard deviation low is on the left side of the curve and the one standard deviation high is on the right side of the curve?
The curve should peak near the mean (if it's a perfect normal curve, it will peak exactly at the mean). Standard deviations high and low will occur to the right and to the left of the mean respectively.
 

sagefr0g

Well-Known Member
#9
gronbog said:
The curve should peak near the mean (if it's a perfect normal curve, it will peak exactly at the mean). Standard deviations high and low will occur to the right and to the left of the mean respectively.
yes, that’s how i thought it was. i never even have come close to understanding how to interpret a bell curve or histogram. never even knew what a histogram was till going through the process of trying to make a bell curve that fit my data. i had just understood in a foggy sort of way that results would fall within 1 std around 68.2% of the time, between 1 and 2 std around 27.2%, between 2 and 3 std around 4.2% and between 3 and 4 std around 0.2%, and that there were high std’s and low std’s, such that low std’s were interpreted on the left side of the curve and the high std’s on the right side of the curve, such that the 1st high or low std is valued at 34.1%, the 2nd at 13.6%, the 3rd at 2.1% and the 4th at 0.1%. essentially degrees of +,- sigma distributed (or arranged) so that it is dispersed incrementally increasingly or decreasingly from the mean and about the mean sorta thing.
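
For reference, those percentages can be checked with a minimal sketch in Python. It assumes a perfect normal curve (real results will only approximate it) and uses the standard identity that the fraction within k standard deviations of the mean is erf(k / sqrt(2)). The helper name `within` is just for illustration.

```python
import math

def within(k):
    """Fraction of a perfect normal distribution within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

prev = 0.0
for k in range(1, 5):
    inside = within(k)
    band = inside - prev  # share between k-1 and k SDs, both sides combined
    print(f"{k} SD: within {inside:.2%}, band {band:.2%}, per side {band / 2:.2%}")
    prev = inside

print(f"beyond 4 SD: {1 - within(4):.4%}")
```

The "band" column is the two-sided share for each step out from the mean, and "per side" is half of that, which is the number usually printed on each slice of a bell curve diagram.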

so anyway, i was kind of blindsided by the fact that negative data could yield a bell curve. but now i think i realize that a set of data negative, positive or mixed, has a mean and that data can be distributed (or arranged) so that it is dispersed incrementally increasingly or decreasingly from the mean and about the mean sorta thing. kind of a wakeup call, for me, lol.

so i’m guessing a bell curve can be, or is, in essence a probability space of sorts, to where one can guesstimate the frequency of future events, far as how those events relate to some degree of high or low standard deviation, as long as the conditions of the future events at least match the average of the prior data’s conditions.

one thing kind of throws me, far as histograms and bell curves go: it has to do with fitting a bell curve to the results of data from advantage play, and the symmetrical appearance of a bell curve, sorta thing. intuitively, because such data should be heavily weighted toward win amounts over lightly weighted loss amounts, one would expect the histogram to look skewed (which it does) in favor of positive amounts over negative amounts, sorta thing. but bell curves are symmetric on both sides of the mean, to where in my mind the bell curve belies the weighting of the events that (in my mind) are supposed to be depicted (in other words, i would intuitively expect the bell curve to appear skewed because of the respective weighting of wins & losses). i know i must be confused or missing some relevant understanding with respect to that concern. please, anyone, illuminate me on my confusion if possible. i suspect my confusion is in the same realm as my confusion over the fact that negative data could be depicted by a bell curve, sorta thing.o_O:confused:
 

gronbog

Well-Known Member
#10
Perfect bell curves are symmetric, but the actual bell-like curves produced when graphing data points from actual systems are often skewed to the left or right. There is a formula for quantifying the amount of skewness. It might not be obvious to the eye when looking at the graph.
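
As a rough illustration of one such formula, here is a minimal Python sketch of the adjusted Fisher-Pearson sample skewness, which is the formula Excel documents for its SKEW function. The function name and both data sets are made up for illustration.

```python
import statistics

def sample_skewness(data):
    """Adjusted Fisher-Pearson sample skewness of a list of raw values."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 data points")
    mean = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)

symmetric = [-3, -1, 0, 1, 3]
right_skewed = [-1, -1, -1, 0, 1, 2, 6]  # hypothetical: small losses, one big win

print(sample_skewness(symmetric))     # approximately 0 for symmetric data
print(sample_skewness(right_skewed))  # positive: the longer tail is on the win side
```

A value near zero means roughly symmetric data; a positive value means the long tail is on the right (high) side, and a negative value means it is on the left.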
 

sagefr0g

Well-Known Member
#11
gronbog said:
Perfect bell curves are symmetric, but the actual bell-like curves produced when graphing data points from actual systems are often skewed to the left or right. There is a formula for quantifying the amount of skewness. It might not be obvious to the eye when looking at the graph.
now that you mention that, i can definitely see skewness in the histograms of my data produced by excel, but also i believe i can see some less than obvious small degree of irregularity in the 'bell curve' produced by excel.
i did find some functions in excel that can evaluate skew and kurtosis
skew-jpg.8986
kurt-jpg.8987


edit: i did try and apply these functions to some of the graphs created above, but found that i couldn't figure out what part of the histogram tables (bin & frequency or just bin or just frequency) to include in the function. when i attempted each combination, none of the results was zero. so if any of my attempts to use the skew function was proper, then there is some skew, lol.
definitely shows me that i don't understand very well, what the heck the bin & frequency table really is.o_O:confused:
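
On the bin & frequency table: it is just a count of how many raw results land in each interval (bin). A minimal Python sketch of building one, with made-up results and a bin width chosen arbitrarily, is below. Labelling each bin by its lower edge is my choice here; Excel's histogram tool may label bins differently.

```python
from collections import Counter
import math

# Hypothetical raw results (not the data from this thread).
results = [-2.0, -1.5, -1.0, -1.0, 0.0, 0.5, 1.0, 1.0, 1.5, 3.0, 4.5]
bin_width = 1.0  # arbitrary choice for this sketch

# Each value goes into the bin whose lower edge is the multiple of
# bin_width at or below it, e.g. 1.5 lands in the bin [1.0, 2.0).
frequency = Counter(math.floor(x / bin_width) * bin_width for x in results)

print("bin (lower edge)  frequency")
for edge in sorted(frequency):
    print(f"{edge:>16.1f}  {frequency[edge]}")
```

Since the table is only a summary, functions like skew and kurtosis are meant to be fed the raw results themselves, not the bin or frequency columns.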
 

sagefr0g

Well-Known Member
#12
@gronbog
one final question (all right i lied, probably will have others,lol)
but anyway, am i correct in thinking that essentially a gaussian bell curve ( a perfect one) is a theoretical construct of a histogram (for which the condition or state being analyzed is appropriate for the construction of a bell curve) of the ultimate (virtually infinite) data for some conditional state for which data can be collected? in other words, is a gaussian bell curve supposed to be the theoretical most accurate histogram of data, assuming we were able to obtain all data possible, flawless in every way? or another way of saying it, a gaussian bell curve is the ultimate correct histogram in a set of histograms of increasing accuracy and data availability?
if the above is correct, (then i really did lie, because another question immediately pops up). the question of which conditions or states of being are or are not appropriate for creating a bell curve.o_O
 

gronbog

Well-Known Member
#13
As I understand it, the Gaussian distribution is a theoretical distribution which can be used to estimate the properties of systems whose distributions resemble it (i.e. look like a bell curve). No actual system is likely to converge exactly to the Gaussian distribution.
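
To illustrate the "theoretical" part, here is a minimal Python simulation sketch. It assumes the simulated results really are drawn from a normal distribution, which actual game results will not be exactly, and the seed and sample sizes are arbitrary. It only shows how the share of results within 1 SD of the mean settles toward the theoretical 68.27% as the amount of data grows.

```python
import math
import random

random.seed(12345)  # arbitrary seed so the run is repeatable
theory = math.erf(1 / math.sqrt(2))  # share within 1 SD for a perfect normal curve

def fraction_within_one_sd(n):
    """Draw n simulated results and return the share within 1 SD of their mean."""
    data = [random.gauss(0.0, 1.0) for _ in range(n)]
    mean = math.fsum(data) / n
    sd = math.sqrt(math.fsum((x - mean) ** 2 for x in data) / n)
    return sum(abs(x - mean) <= sd for x in data) / n

for n in (100, 10_000, 200_000):
    print(n, round(fraction_within_one_sd(n), 4), "theory:", round(theory, 4))
```

Small samples bounce around the theoretical figure; larger ones sit closer to it, but a real system with skew would settle on its own value rather than the Gaussian one.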
 