1.d3 and 1.e3 are dead. TWIXT PP
12 replies. Last post: 2005-02-10
Alan Hensel at 2005-02-05
Inspired by bennok’s posting in the “300,000th game” thread about downloading 50,000 games, I decided to write a Ruby script to download all of the Twixt games. As bennok noted, it’s a pain because of timeouts. I had to restart the script several times. It should have taken only 50-55 hours, but it actually took more than 4 days.
Worse, there is no easy way to distinguish between games in progress, forfeits, and outright wins (wins by connecting opposite sides). I had to do some extra programming to parse all of the moves and determine whether opposite sides were connected. Some more programming went into scraping the HTML page to find the “Result 2:0” or “Result 0:2” string that would signify a forfeit.
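The two checks described above could look roughly like this in Python. This is only a sketch, not the original Ruby script: the function names, the (column, row) peg representation, and the 24-row board size are assumptions, and it presumes the links have already been reconstructed from the move list.

```python
import re
from collections import deque

# The post says a forfeit shows up on the game page as "Result 2:0" or "Result 0:2".
FORFEIT_RE = re.compile(r"Result\s+(2:0|0:2)")

def is_forfeit(html):
    """True if the scraped page text contains a forfeit result string."""
    return FORFEIT_RE.search(html) is not None

def connects_sides(links, size=24):
    """True if one player's links form a chain from row 0 to row size-1.

    links: iterable of ((col, row), (col, row)) pairs for a single player
    (hypothetical representation). For the player who connects left to
    right, swap col and row before calling.
    """
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    start = [peg for peg in adjacency if peg[1] == 0]
    seen = set(start)
    queue = deque(start)
    while queue:  # breadth-first search from the pegs on the first border row
        peg = queue.popleft()
        if peg[1] == size - 1:
            return True
        for nxt in adjacency[peg]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

A finished game that is neither a forfeit, a draw, nor a connection would then count as a resignation.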
The first TwixtPP game played on Little Golem was 37491, Richard Malaschitz (Little Golem’s webmaster) vs. jymon, on March 24, 2003. Richard lost.
Since then, 12243 more TwixtPP games have been played on Little Golem, and 322 more are in progress.
A Twixt game can end in one of 4 ways: resignation, connection, draw, and forfeit. The 12244 completed games ended this way:
9555 (78.0%) - resign
1531 (12.5%) - connect (a player actually connected his border rows)
1136 (9.3%) - forfeit (a player abandoned the game; it timed out)
22 (0.2%) - draw
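For anyone who wants to check the arithmetic, here is a small Python sketch that recomputes the percentages from the counts above. The variable names are mine, not from the original script.

```python
# Counts copied from the list above.
outcomes = {"resign": 9555, "connect": 1531, "forfeit": 1136, "draw": 22}

total = sum(outcomes.values())
shares = {name: round(100 * n / total, 1) for name, n in outcomes.items()}

# Games decided by resignation or connection (forfeits and draws excluded).
decisive = outcomes["resign"] + outcomes["connect"]

for name, n in outcomes.items():
    print(f"{n:5d} ({shares[name]:.1f}%) - {name}")
```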
Notice how few draws there are! There are so few, you can inspect them by hand. Having done that, I would classify 9 of them as real draws, and 13 as “courtesy” draws. What do I mean by “courtesy” draws? For example, when Clever Hunk was kicking Tanya’s ass, and then seemed to suddenly change strategy with respect to Tanya’s ass, and offered a draw, and she accepted.
240489 (Holy Cr@p!)
With only 9 real draws in Little Golem TwixtPP history, draws account for less than 1 in 1000 games!
Games ended with an average of 24.4 pegs on the board, and a median of 26 pegs.
33.2% of games – just a smidge under 1/3 – are swapped. Are games underswapped? Well, excluding forfeits and draws, the first player won 5584 out of 11086 games, or 50.4% of the time. If we imagine the 0.4% represents newbies who haven’t figured out the swap option exists, then it’s actually remarkable how well the swap rule is working.
Speaking of the first move, here’s a really interesting question: how strong has each opening move been, historically? Which ones are near that magic 50-50 point that you’re aiming for in a first move, and which ones should you swap? The historical data is fascinating, so I’ve made a whole separate page for it. Draw your own conclusions.
Door1 at 2005-02-05
I was wondering whether the reason there are not more draws is that people resign. I think there could be a few such games, although I have not checked.
David J Bush ★ at 2005-02-05
Thanks for all your work!
Of those 9 real draws, 3 of them involved the same player, Dan Vasilescu: 47271, 50381, and 286636.
What do the colors mean on that quadrant chart? I thought they might indicate red for “swap this,” blue for “don't swap,” and purple for “unclear” but what algorithm is used to determine this?
It might be interesting to see two more quadrant charts: how a specific first move fares when it is swapped, and how it fares when it is not swapped. I assume you combined these two cases into one for your chart.
Also would it be possible to sample just those games where both players have ratings above some cutoff value?
Why does your title say 1.d3 and 1.e3 are dead? Don’t tell Klaus...
Alan Hensel at 2005-02-05
Certainly there is some truth in that, Door. It’s part of why I never begrudge anyone playing beyond the point where it looks like they could reasonably resign.
In the recent draw between Dan Vasilescu and technolion (game 286636), if technolion had chosen 39.resign instead of 39.g22, no one would have batted an eyelash. It looked like it was over. Who could have known that Dan would blunder on move 40?
In game 84238 between catherine and deyzer, the draw condition is clearly there. It’s very tempting to want to count it as a draw in the stats. But the game actually ended as a forfeit. Catherine let it go.
Another interesting case: isn’t there something Cynic Kim could have done in game 73589 against Pit? There are only 6 moves in the whole game – 3 each – can it really be an honest win?
But it could just be that I’ve been geeking out on Twixt data lately, and these special cases have caught my attention. Maybe these cases are really rare, and the vast majority of resignations are the right call. Then again, who knows.
Just don’t begrudge anyone playing it out.
Alan Hensel at 2005-02-05
So, I wonder if there’s something about Dan’s playing style that elicits draws?
The colors are explained at the bottom of the page. You might have to google “confidence intervals”, as I know they’re hardly common knowledge. But it plays into the color coding, as purple denotes a certain degree of uncertainty – not enough data... Everybody, play more Twixt!!! :-)
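To make the confidence-interval idea concrete, here is a Python sketch. It uses a Wilson score interval at roughly 95% confidence, which is an assumption: the thread does not say which interval or which thresholds the chart uses. The function names and verdict labels are also made up for illustration.

```python
import math

def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for a win rate (z=1.96 is roughly 95% confidence)."""
    if games == 0:
        return (0.0, 1.0)  # no data: the win rate could be anything
    p = wins / games
    denom = 1 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return (center - half, center + half)

def verdict(wins, games):
    """Classify a first move by whether its interval clears the 50% mark."""
    low, high = wilson_interval(wins, games)
    if low > 0.5:
        return "first player favoured"
    if high < 0.5:
        return "second player favoured"
    return "unclear"  # interval straddles 50%: not enough data

# Overall figure from the first post: first player won 5584 of 11086 decisive games.
print(verdict(5584, 11086))
```

The fewer games an opening has, the wider its interval, so rarely played openings tend to land in the "unclear" bucket.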
As for rating-based sampling, I think I might toy with that next. It might actually be easier and more meaningful to generate the ratings myself from the data.
My title was meant to provoke. My first idea was “Twixt stats”, but that was dull. And I, for one, now that I’ve seen my data, will only play d3 or e3 if I think the other player will not swap it. And I’ve noticed that the top players have been swapping d3 and e3 lately. Good intuition, perhaps? In any case, I think the stats have given me the confidence to play new opening pegs. C3 or f3 if I’m being conservative; L3, j4, c10, or maybe others if I’m feeling more experimental.
Alan Hensel at 2005-02-06
Well, I’ve toyed with ratings. And the results are in. What I did was sort the games by game ID, run an Elo rating system over them, and sample only those games where the rating (after the game) of both players was above average.
The table looks a little different now. Much more purple. D3 is still bright red, but e3 now looks more reasonable, and f3 has gone dark purple. C3 is holding steady hovering over 0.5. More spots down the c-v column look good.
Method: I used the same Elo rating formulas as Little Golem, but the numbers don’t exactly match, for a couple of reasons. Little Golem calculates new ratings at the end of the game, but my data does not track when the games ended, so I calculate ratings in the order that they started. I also didn’t bother to exclude the unrated games. And I don’t know if Little Golem ever rounds to integers behind the scenes, but I don’t. As a sanity check, here’s how the Top 10 come out of this system:
1. 2533 Klaus Hußmanns
2. 2320 Pit
3. 2230 Axel Wehrenberg
4. 2202 David J Bush
5. 2164 Tim
6. 2157 Alan Hensel
7. 2115 Steven
8. 2110 Loren Schenkelberg
9. 2063 Dan Vasilescu (silexu)
10. 2013 tasuki
Seems about right. Numbers are a little higher; not sure why. By this unofficial calculation, Klaus is a Grand Master Twixt player!
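The rating pass described above could be sketched in Python as follows. This is the textbook Elo update, not necessarily Little Golem's exact formula: the starting rating of 1500 and the K-factor of 32 are placeholder assumptions, draws are ignored, and "above average" is taken to mean above the mean rating of all players seen so far.

```python
from collections import defaultdict

INITIAL = 1500.0  # assumed starting rating (placeholder)
K = 32.0          # assumed K-factor (placeholder)

def expected(ra, rb):
    """Standard Elo expected score of a player rated ra against one rated rb."""
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))

def rate_games(games):
    """games: iterable of (game_id, winner, loser) tuples.

    Processes games in game-ID order and returns (ratings, strong_ids), where
    strong_ids lists games in which both players' post-game ratings are above
    the current mean rating.
    """
    ratings = defaultdict(lambda: INITIAL)
    strong_ids = []
    for game_id, winner, loser in sorted(games):
        rw, rl = ratings[winner], ratings[loser]
        delta = K * (1.0 - expected(rw, rl))
        ratings[winner] = rw + delta
        ratings[loser] = rl - delta
        mean = sum(ratings.values()) / len(ratings)
        if ratings[winner] > mean and ratings[loser] > mean:
            strong_ids.append(game_id)
    return dict(ratings), strong_ids

# Hypothetical toy data, only to show the call shape.
ratings, strong = rate_games([(1, "a", "c"), (2, "b", "c"), (3, "b", "d"), (4, "a", "b")])
```

With these assumptions every update moves the same number of points from loser to winner, so the mean stays at the starting rating and "above average" amounts to "above 1500".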
David J Bush ★ at 2005-02-06
I don’t recommend anyone play 1.L3. Never mind the stats, it’s a horrible move which should never be swapped. 1.L4 is almost as bad, and I sure wouldn’t swap 1.L5 for that matter. I was a bit stunned to see that 1.L1 or its reflection was played in 62 games. That tells me the signal-to-noise ratio is a bit low. 1.L1 is absolutely the weakest first move you could possibly make.
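Counting a move together with its reflections means folding every opening into one quadrant of the board. A Python sketch of that folding, assuming a 24x24 board with columns a to x and rows 1 to 24 (the function names are hypothetical, and this is not necessarily how the chart was built):

```python
SIZE = 24  # assumed board size: columns a-x, rows 1-24

def parse_move(move):
    """Turn a move like 'd3' or 'L1' into zero-based (col, row)."""
    col = ord(move[0].lower()) - ord("a")
    row = int(move[1:]) - 1
    return col, row

def fold(move):
    """Map a first move onto its mirror image in the a-l, 1-12 quadrant."""
    col, row = parse_move(move)
    col = min(col, SIZE - 1 - col)  # reflect left-right
    row = min(row, SIZE - 1 - row)  # reflect top-bottom
    return f"{chr(ord('a') + col)}{row + 1}"

print(fold("v22"))  # c3
print(fold("m24"))  # l1
```

Both reflections leave each player's border rows on that player's own sides, which is why mirrored openings can be pooled.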
Alan, you mention decisive games which arguably should have been draws. I would be particularly interested in decisive games which would have been draws under the standard Twixt ruleset (or better still, decisive for the other player.) This is not a very well defined criterion; what I mean is, if the winning path involves crossing links, and the losing side arguably could have forced a draw or even a win under standard rules at some point, then that may be an example. If the winning side could have made an alternate move which would have won under standard rules, then that is not an example.
My purpose is to provide evidence that the rules difference does make a difference, and standard rules with link removal should be implemented instead. This was discussed in another thread, where game 113257 was discussed in detail. That’s the only example I know of so far where the rules difference made a difference in the game outcome.
And just in case you were wondering, here is an example of a win for one side turning into a win for the other side when the rules are changed.
Alan Hensel at 2005-02-08
There are currently 435 finished Twixt games with crossed links on Little Golem, out of a total of 12323. Of these, the winner had crossed a link in 331. (I’m having a hard time imagining how link crossing could make a difference if the link-crosser lost the game.)
So, more than 3.5% of Little Golem’s Twixt games have crossed links. And in 2.7% of the Twixt games (one in 37) the winner crossed links.
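A search like this needs no AI, just a segment-intersection test. Here is a Python sketch under my own assumptions: pegs are (column, row) tuples, a link is a pair of pegs, and a game "has crossed links" if any two of one player's links intersect. The names are hypothetical and this is not the original script.

```python
from itertools import combinations

def _orient(o, a, b):
    """Cross product of (a - o) and (b - o); its sign says which side b is on."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def links_cross(link1, link2):
    """True if two links intersect somewhere other than a shared peg."""
    p, q = link1
    r, s = link2
    if {p, q} & {r, s}:
        return False  # links that share a peg only touch; they do not cross
    return (_orient(p, q, r) * _orient(p, q, s) < 0
            and _orient(r, s, p) * _orient(r, s, q) < 0)

def has_crossed_links(links):
    """True if any two links in the list cross each other."""
    return any(links_cross(a, b) for a, b in combinations(links, 2))

# Figures quoted in the post above.
crossed, winner_crossed, finished = 435, 331, 12323
print(f"{100 * crossed / finished:.1f}% of finished games have crossed links")
print(f"{100 * winner_crossed / finished:.1f}% were won by a player with crossed links")
```

The strict inequalities are enough here because a knight's-move link has no grid points between its two ends, so one link can never pass exactly through another link's peg.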
Would you like to sift thru those 331? Or all 435? If so, how would you like me to serve the list? And is there anything else I can search for, that doesn’t require AI?
(I just poked thru a few at the top of the list. 39278, perhaps? I didn’t look at it very closely.)
Flemming Jensen ★ at 2005-02-09
Thank you for all the work. That’s very interesting, indeed.
best regards Flemming
David J Bush ★ at 2005-02-09
Wow, fantastic! 39278 is indeed an example I was looking for. White might have been able to achieve a standard win by varying moves earlier in the game, but by move 46 the only winning path I see for White crosses itself. Again one of Dan V.'s games!
You’re right Alan, if only the losing side has crossing links, then standard rules would not have made any difference. Would you be willing to email me a text file of URLs? Just the game numbers would suffice, but URLs would be more convenient. Oh, and a bag of chips would be nice. WITH dip.
I am indebted to you for your hard work! Many thanks!
My email is firstname.lastname@example.org
Dan Mircea Vasilescu at 2005-02-10
Now I am an experiment. :O).
The reason my games are so different is that lately I am thinking very little about each game. I am playing more by intuition. And the day before yesterday it was horrible, because I managed to blow a lot of games. 1800+ here I come.
Sorry if I messed up this post with my petty complaints.
Good luck everyone.
David J Bush ★ at 2005-02-10
After sifting through 332 games, I believe I found 8 examples where the rules difference makes a difference. I posted my results in the thread “Changing the rules back to standard”.
Door, you were right. I found a couple of games there that should have been draws even under TwixtPP rules, but one side resigned anyway.
Many thanks to Alan for providing more fuel for my argument!