coffeeyay's picture
Statistical Theory: Using Bayes' Estimators

Using a Bayes' Estimator to Provide a Best Estimate of a Random Villain's Opening Frequency Given a Small Sample

Hi guys, I still don't have access to PokerStars so I've been working on theory instead. Chad asked an interesting question that got stuck in my head until I worked out my solution. Here's the question:

"What maths would we use to find the distribution of likelihoods of a specific outcome given a finite sample? For example, say someone raised 4/5 buttons and we are looking to find the likelihood of their PFR being greater than 70%. How would we go about this (or just point me in the direction of the right theory and I'll give it a shot)? Thinking in the context of making significant adjustments in the face of small samples."

So first off, a disclaimer: I'm not Mers. This is the first time I'm doing this kind of problem. I do have some experience with this kind of stuff since I have a degree in Physics and Math and I took a grad-level stats class, but I'm not a stats whiz. I might still be doing stupid stuff, and I might be flat-out wrong. But here is my best take on it, along with the research I did to get to my conclusions. It's pretty long, but the results are pretty simple.

The idea of finding a distribution and estimating a parameter (in this case opening frequency) belongs to an area of statistics called estimators. Finding "best" estimators is an interesting subject with a lot of depth to it. That said, I'm doing a cursory job here, jumping to the chase and avoiding much of the underlying theory. Read more about estimators here: http://en.wikipedia.org/wiki/Estimator

As we start the problem we immediately reach a crossroads, because there are two main approaches. The first is the direct method--all we know is that someone raised 4/5 buttons. We focus in and care only about whether they raised or did something else. The data is in the form {1, 1, 0, 1, 1}, where 1 = raised and 0 = didn't. This means the sample comes from a Bernoulli distribution (http://en.wikipedia.org/wiki/Bernoulli_distribution), and we could use that directly to answer Chad's question. But I didn't do that. I'm working for myself, so I'm going to answer the question in an expanded way that makes more intuitive sense to me and will provide a more accurate estimate.

The problem is that I don't think the way we have framed the problem is accurate, and I don't think solving it as posed will give practical information. Chad wants to use the information to make decisions, like whether we can make a significant adjustment such as 3-bet bluffing. For me the question becomes interesting when we immerse the problem in reality--we aren't playing against an unknown coin. We are playing against a player drawn from a population. I want to use more advanced math and give an accurate estimate for the following question: I am playing against a random $30 ST player, the effective stacks are between 20 and 25bb, and this player just opened 4/5 of his buttons. What is his most likely opening frequency, o? Hint: it's not 80%, as it was in Chad's question. Why not? Because the average $30 player has an established average frequency, and we have to use that to get this villain's most likely opening frequency. In his article about 3-betting, Mers specifically stated that his estimate of the mean opening frequency is 55%. We want to use that kind of info.

So here's where we get deeper into the math. We want to use prior information to make our estimate more accurate. This kind of statistics is called Bayesian statistics: http://en.wikipedia.org/wiki/Bayesian_statistics. The basic idea is that behind the parameter o (the opening frequency) there is a prior probability distribution; we use this together with our random sample (the 5 hands in which he opened 4/5) to create a posterior distribution describing what we think his opening frequency looks like. I want to use the best possible Bayes' estimator to do this: http://en.wikipedia.org/wiki/Bayes_estimator

To make life easier I'm going to use a common trick in Bayesian statistics--a convenient type of prior distribution. For each sampling distribution there's a distribution that is particularly easy to work with, called the conjugate prior: http://en.wikipedia.org/wiki/Conjugate_prior. For the Bernoulli distribution this is the Beta distribution: http://en.wikipedia.org/wiki/Beta_distribution. It's a somewhat ugly distribution itself, and it might not be the most accurate model of the actual prior, but it lets us black-box a lot of the math, and all of the formulas we'll need in the end come out pretty and exact.

So let's figure out this prior distribution. The Beta distribution has two parameters (called hyperparameters), alpha and beta (the two shape parameters). One way to do Bayesian stats is just to guesstimate the hyperparameters using reasonable assumptions (like using Mers' 55% mean, for instance). I'm sure this works fine for someone who has an intuition for the Beta distribution, but after playing around with the two hyperparameters I realized that I don't. I also want this to be relevant to $30 ST games in particular, and I don't intuitively know what that looks like either. So I'm going to use actual data--just some small set so everyone can get an idea of the results this method can produce. I used a small random sample of villains from my database of Full Tilt superturbos, which unfortunately had to be pulled through Poker CoPilot, whose filtering is really limited, so it isn't filtered by blind level or stake. The only filter I applied was to use the villains against whom I had the most hands, so that at least each data point wouldn't be skewed by small sample size. I supplemented this with some data from Mr. Bambocha's HEM, which also isn't necessarily filtered for effective stacks. This won't be super accurate, but if there's interest then together as part of the FastTrack forum we can always improve it. For now I just wanted something reasonable, so here is the data I used: {0.60; 0.55; 0.69; 0.62; 0.23; 0.65; 0.23; 0.27; 0.41; 0.60; 0.34; 0.63; 0.08; 0.64; 0.00; 0.54; 0.80; 0.65; 0.38; 0.38; 0.20; 0.56; 0.55; 0.56; 0.59; 0.53; 0.61; 0.54; 0.43; 0.70; 0.57; 0.77; 0.51; 0.46; 0.69; 0.54; 0.57; 0.15; 0.36; 0.62; 0.58; 0.24; 0.58; 0.70}

I then plugged these values into this calculator, http://www.wessa.net/rwasp_fitdistrbeta.wasp, to get the best Beta distribution fit for the data. This gives me the Beta hyperparameters alpha and beta, called shape1 and shape2 on that website, that best fit the data set. And now we have our prior distribution--it's defined entirely by being a Beta distribution with alpha = shape1 = 2.48 and beta = 2.68. I graphed it with Wolfram Alpha and it looks like this: http://www.wolframalpha.com/input/?i=beta+distribution+2.48%2C+2.68. So now we can use it to solve our initial problem.
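If you'd rather not rely on the website, here's a minimal Python sketch that estimates the two hyperparameters by moment matching. Note this is a different fitting method than whatever that calculator runs, so the numbers won't match its fit exactly, and the function name is my own:

```python
import numpy as np

# Opening frequencies from the sampled villains (the data set above).
freqs = np.array([0.60, 0.55, 0.69, 0.62, 0.23, 0.65, 0.23, 0.27, 0.41, 0.60,
                  0.34, 0.63, 0.08, 0.64, 0.00, 0.54, 0.80, 0.65, 0.38, 0.38,
                  0.20, 0.56, 0.55, 0.56, 0.59, 0.53, 0.61, 0.54, 0.43, 0.70,
                  0.57, 0.77, 0.51, 0.46, 0.69, 0.54, 0.57, 0.15, 0.36, 0.62,
                  0.58, 0.24, 0.58, 0.70])

def beta_from_moments(mean, var):
    """Match a Beta(a, b) distribution to a given mean and variance."""
    # For Beta(a, b): mean = a/(a+b) and var = mean*(1-mean)/(a+b+1),
    # so a+b = mean*(1-mean)/var - 1.
    total = mean * (1.0 - mean) / var - 1.0
    return mean * total, (1.0 - mean) * total

a, b = beta_from_moments(freqs.mean(), freqs.var())
# Prints roughly a = 3.15, b = 3.18 for this data -- same ballpark as, but not
# identical to, the website's fitted 2.48, 2.68.
print(f"prior: a = {a:.2f}, b = {b:.2f}")
```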
Using the following site about Bayes' estimators, http://www.ds.unifi.it/VL/VL_EN/point/point4.html, I look at the part about the Bernoulli distribution. In their notation, we have a = 2.48, b = 2.68, Xn = number of opens = 4, and n = number of hands = 5. I now use their formulas to get the answers I want. First, the formula right after bullet 1: the posterior distribution is again a Beta distribution (that's the magic of a conjugate prior--the posterior has the same form as the prior), given by a' = a + Xn = 2.48 + 4 = 6.48 and b' = b + (n - Xn) = 2.68 + (5 - 4) = 3.68. Again I plotted it, and it looks like this: http://www.wolframalpha.com/input/?i=beta+distribution+6.48%2C+3.68. This is our new population distribution--we are no longer dealing with a random villain, but with a random villain who then proceeded to open 4/5 buttons. The posterior distribution represents that population. This is the qualitative part of the answer--for example, how likely is it that his frequency is over 65%?

But I want the directly useful result, the answer to MY problem: the best estimate of villain's opening frequency given a prior. This is given by the Bayes' estimator after bullet 3: Un = (Xn + a)/(n + a + b). Plugging in, Un = (4 + 2.48)/(5 + 2.48 + 2.68) = 0.638 = 63.8% = o. We have it!

I know this was a wall of text with math involved, so CLIFFS: Given a population sample, we can use the data to compute a prior distribution in the form of a Beta distribution with hyperparameters a and b. In this case I used a small sample and got a = 2.48 and b = 2.68. Then we use our random sample of data from the current villain, in the form of the total number of opens Xn and the total number of hands n. In Chad's case that was Xn = 4 and n = 5. Finally, this lets us compute the most likely opening frequency o% = (Xn + a)/(n + a + b) = 63.8%.

Some things that I know can be improved and worked on further:

1. Improving the random villain data set. The more data we put into this, the more accurate our estimator will be and therefore the more useful. It could be made very accurate by filtering by blind level and stake. As it is, I have some data points, but there should be many more, and it would really be best to make sure they are accurately filtered by stake and effective stacks. This is key to making the method accurate and useful in game--essentially, once we have accurate alpha and beta numbers, we will always have an accurate idea of villain's ranges.

2. Working out more uses for the posterior distribution we calculate--for example, you can use it later in a match as a better prior to calculate from. It also contains more information, like the variance and the likelihoods of opening frequencies other than the most likely one.

3. Applying this kind of approach to non-binary ranges. By incorporating limping and folding, and distinguishing open-shove from minraise frequencies, we can try to make our estimator more powerful, nuanced, and perhaps more accurate. It involves generalizing our conjugate prior and sampling distributions from the univariate Beta and Bernoulli to their multivariate analogs, the Dirichlet (http://en.wikipedia.org/wiki/Dirichlet_distribution) and categorical (http://en.wikipedia.org/wiki/Categorical_distribution) distributions.
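The whole calculation fits in a few lines of Python. This is just a sketch transcribing the formulas above (the function name is mine):

```python
def bayes_opening_estimate(opens, hands, a=2.48, b=2.68):
    """Bayes estimate of opening frequency, plus the posterior hyperparameters.

    opens = Xn (observed opens), hands = n (observed buttons),
    a, b  = prior hyperparameters (defaults are the values fitted above).
    """
    a_post = a + opens                         # a' = a + Xn
    b_post = b + (hands - opens)               # b' = b + (n - Xn), i.e. add the folds
    estimate = (opens + a) / (hands + a + b)   # Un = (Xn + a)/(n + a + b)
    return estimate, a_post, b_post

est, a_new, b_new = bayes_opening_estimate(opens=4, hands=5)
print(round(est, 3))  # 0.638 -- the 63.8% from the post
```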

hokiegreg's picture

LOL. This is a perfect example of a question appropriate for the Mers-Only thread. Just direct any math-related questions/ideas to that thread. I'm not even going to pretend to be able to understand Bayesian theorem type stuff :) I'm good at reviewing hands, talking about general theory, mental game stuff, and owning people's lives.

coffeeyay's picture

I was hoping to incite some discussion. It isn't so much a question as an answer :) I tried to make it accessible; read through it and ask if you have any questions.

hokiegreg's picture

Oh, I misread the beginning, I see. You are responding to a question! Haha. Will read through it, good stuff!

Champaz's picture
I was thinking about this stuff too.

Interesting stuff, especially for us superturbo players who play vs new random opponents all the time. I didn't quite get how you came up with these formulas, but I guess I'm gonna read a little bit about Bayesian theory; if you could explain it in a little simpler way so that even retards could understand, I'd be super happy. I'm also quite interested in how you, Greg, would more practically think about these kinds of problems without going too deep into the math side of it. Btw, can you see in HEM how much an "average opponent" opens at different blind levels, and if so, how?

mersenneary's picture

Hokie, if you're going to pay people to come onto the team, you should at least let me know first.

mersenneary's picture

...wait a minute, I'm getting told this is a post by a student. Awesome stuff, coffee. I think the biggest thing you have to worry about is practicality: distilling this information into a form that people can apply to actually change the decisions they make at the table. That said, introducing more math in the quest for precision is never going to be a bad thing, and this really is quite valuable. Thanks for posting this.

coffeeyay's picture

Well, the nice thing is that the formula you get at the end is quite simple: o% = (Xn + a)/(n + a + b). Since the hyperparameters a and b are computed away from the table, all you need to do is add the sample size to the denominator and the observed opens to the numerator, which you can do on a calculator. Suddenly you have a best estimate of their opening frequency and you can use it for 3-bet shove math. I'd argue that finding this opening frequency is easier than figuring out what range to shove over it. It would be nice to expand this to include limping, because then the results would possibly be less obvious, since limping ranges tend to be small.

Practicality-wise I think there clearly are still big problems. One is that this method says nothing qualitative about villain's range--you still have to guess whether it's merged or polarized. However, if properly generalized to include limping and shoving, it's possible we could end up with numbers that let us make educated guesses.

The other problem is constructing a good prior, since we really need confidence that it is relevant. People play very differently at 20-25bb versus 0-10bb, so I'm worried that the current sample I have is a bit polluted and thus barely relevant. Similarly, when using this at the table you need to be careful that you're counting the random sample all at the same or similar effective stacks. One place where this can be helpful: if you correctly calculate the posterior early in the game and then villain moves out of that effective stack range, you can quickly stick the new a and b hyperparameters in your notes. If the game comes back to those stack sizes due to a swing, or you play villain again in a few days, you can use those new hyperparameters to make your decision even more accurate, without worrying about reading your HUD for which opens came from which stack sizes.
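To make the note-taking idea concrete, here's how the stored hyperparameters would get reused, building on the bayes_opening_estimate sketch from earlier (the 2-opens-in-6-buttons follow-up sample is invented for illustration):

```python
# Early in the match at 20-25bb effective: villain opens 4 of his first 5 buttons.
est, a, b = bayes_opening_estimate(opens=4, hands=5)   # est = 0.638; a, b are now 6.48, 3.68

# Stacks drift out of range, so jot a and b down. When stacks swing back (or
# you meet him again days later), reuse the stored posterior as the new prior
# and fold in the fresh sample, say 2 opens in 6 buttons:
est, a, b = bayes_opening_estimate(opens=2, hands=6, a=a, b=b)
print(round(est, 3))  # (2 + 6.48)/(6 + 6.48 + 3.68) = 0.525
```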

coffeeyay's picture

What other concerns are there? I'd love to make this stuff as useful and accessible as possible :) I know the math is a bit intense, but you can just treat it like a black box. With that in mind, I'll quickly run through what I did without all the dead mathematicians. The only stats jargon I'll use below is the prior distribution, which is the general population tendency, and the posterior distribution, which is the population tendency given that villain acted in a certain way. Our goal is to use the prior to find the posterior and the most likely opening frequency.

INPUTS: The first input is a sample of villains' opening frequencies, which I use to compute two numbers, a and b, which define our prior distribution. The second input is in-game information: the number of hands, n, and the number of opens, Xn.

OUTPUTS: The first output is the most likely opening frequency, which we compute from our inputs with the formula o = (Xn + a)/(n + a + b). The second output is an update of our population tendencies: the posterior distribution. This represents the fact that our villain isn't random anymore, since he opened Xn/n buttons. It is found by updating our original inputs a and b into the new a' and b' via the formulas a' = a + Xn and b' = b + (n - Xn). Note that in the b' formula, (n - Xn) is simply the number of folds, so to update our prior we simply add opens to a and folds to b.

Now we jot down the opening frequency and use it as our best estimate for the next few hands, and we jot down a' and b' in our notes, so that if we want to do this process against this villain again we can do it more accurately, using a' and b' as inputs rather than the original a and b.

coffeeyay's picture
Data gathering

Anyone have thoughts or questions on any of this? In particular, any thoughts on how to get good data for more situations, like c-bet %, limp %, or open-shove %?

I don't have HEM yet, so I don't know how easy it will be to get data like this. I think the problem of filtering by effective stacks is the most important one. With these Bayesian methods it's not so bad if you don't have much prior data, since that's taken care of in the hyperparameters (essentially, the larger the total of a and b, the more weight you are placing on the prior--see the quick illustration below). But it's really bad if you have incorrect data, since then you're going to be consistently making errors akin to using incorrect reads.

The more I've been considering this process, the more applicable I think it is. Given aggregate c-bet %, you can tell how soon you can start check-raising light or donking a lot; given limp %, you can start jamming over limps light; etc. And the math is simple enough that you just need an Excel sheet set up where you plug in n and Xn, and voila, you have a best estimate of whatever you need. It really lets us quantify small-sample reads in a pretty useful way. But it hinges a lot on the prior being decent, as well as obviously knowing what to do with the reads ;)
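A quick illustration of that "a + b as certainty" point, reusing the earlier sketch function with made-up hyperparameters:

```python
# Two priors with the same 50% mean but different confidence weights:
weak   = dict(a=1.0, b=1.0)    # a + b = 2:  prior worth about 2 hands of data
strong = dict(a=10.0, b=10.0)  # a + b = 20: prior worth about 20 hands of data

for prior in (weak, strong):
    est, _, _ = bayes_opening_estimate(opens=4, hands=5, **prior)
    print(round(est, 3))
# weak:   (4 + 1)/(5 + 2)   = 0.714 -- the 4/5 sample dominates
# strong: (4 + 10)/(5 + 20) = 0.560 -- the prior dominates
```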

mersenneary's picture

This would be pretty sick as a HUD number, even for people who don't understand where it comes from. People aren't going to want to calculate it, but if it's just there, I think people would really like it. And it definitely is useful. If someone minraises the first 3 hands and the HUD says 100%, people sometimes make mistakes jamming hands they shouldn't.
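(For concreteness: with the a = 2.48, b = 2.68 prior fitted above, the Bayes estimate after three straight minraises would be (3 + 2.48)/(3 + 2.48 + 2.68) = 67%, rather than the 100% the HUD shows.)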

coffeeyay's picture

Yeah, I was kinda thinking it would be quite useful in a HUD. I'm going to keep mulling over methods, and I've been looking into the generalization to a multivariate setting to incorporate limping. It doesn't seem too hard to set up math-wise, but the prior and the in-game work would be a bit more involved, so I'm not quite sure it's worth it. That said, in the context of a HUD it would be incredibly +EV, since information about his limping range gives information about his opening range and vice versa. I still think the biggest hurdle would be dealing with incorporating effective stacks.
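For the curious, the multivariate version really is just "add the counts" again. A sketch with invented prior numbers (a real prior would be fitted to population data, like the Beta one above):

```python
import numpy as np

# Hypothetical Dirichlet prior over button actions -- these alpha values are
# invented for illustration, not fitted to any data.
actions = ["minraise", "limp", "open-shove", "fold"]
alpha = np.array([2.0, 1.0, 0.5, 2.5])

# Observed button actions so far: 3 minraises, 1 limp, 0 shoves, 1 fold.
counts = np.array([3, 1, 0, 1])

# Conjugacy works exactly as in the Beta-Bernoulli case: the posterior is
# Dirichlet(alpha + counts), and the estimates are the posterior means.
alpha_post = alpha + counts
for action, est in zip(actions, alpha_post / alpha_post.sum()):
    print(f"{action}: {est:.2f}")
```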

mersenneary's picture

Yep, agreed.

chadders0's picture

OK, so I've fiddled with PT3 and got some stats for a 20-25bb average population: stuff like their average defend % and their 3-bet %. I think with more fiddling later I could fill most requests for the necessary stats. The info can be filtered for 25bb exactly or for 20-25bb, the latter having a sample size at least 5 times as big.

One number I pulled: 3-bet % on the first hand from opponents was around 21%, with an even split of all-ins and non-all-ins (though some of those non-all-ins were no doubt stuff like 230, which I guess is an effective all-in). This became 18% when filtered for 20-25bb, so I guess that may be a slight indication of first-hand spazz factoring in. They flat an additional 40%ish (I think this was approximately the same for 25bb and 20-25bb).

It's actually really satisfying to pull those stats out, so if anyone playing has a much bigger sample at a similar stake, I will happily help them do the same to get more accurate results. I had only a 10k-game sample.

 


coffeeyay's picture

Nice! Some progress! The only problem is that an average is only one piece of data--the other piece is a confidence, or variance. Together they would let us compute the hyperparameters a and b, and then know how far an observed frequency o should move us away from the average prior frequency f. But I think just having those averages is incredibly useful from a qualitative point of view. Can you post them here? If possible, include sample sizes, and then I can use a bit of guesstimation for the variance and create hyperparameters for each effective stack size.