2 years of Twixt TWIXT PP

10 replies. Last post: 2005-06-06

  • Alan Hensel at 2005-03-24

    It was 2 years ago today that the first TwixtPP game on Little Golem (37491) was started. Happy anniversary, TwixtPP! Since then, 13500 more TwixtPP games have been played, and 723 more are in progress (a high number, because Championship 8 just started).

    I've made some more attempts to get more meaningful results out of the first-move stats. There are so many ways to cook the numbers. But my current favorite is an adjusted scoring system. It's based on the idea that the main source of inaccuracy in the original raw numbers is the fact that the better players may have different first-move preferences.

    For example, if player A has a win expectancy of .8 over player B, then in the course of 5 games, on the average, player A will win 4 games, and player B will win 1. If in each of these games, player A took 1.f3, then the stats for the f3 spot are strengthened by increasing the numerator by 4, and the denominator by 5. However, an observer who knows the relative strengths of the players will not be moved to believe f3 is anything but neutral by these results.

    If we score the spot by adding .2 for each win, and subtracting .8 for the loss, then the expected neutral result is achieved. Said in a more general way, the score is increased if the owner of the spot won, and decreased if he lost; it is changed by the larger win expectancy if it was an upset, and the smaller if it was the expected result.
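A minimal sketch of the per-game scoring rule described above. The function name and signature are hypothetical, and the win expectancy is taken as an input because the post does not say how it is derived from ratings. The "owner" is whoever ends up holding the first peg after any swap.

```python
def adjusted_score(owner_won: bool, owner_win_expectancy: float) -> float:
    """Contribution of one game to the score of its first-move spot.

    owner_win_expectancy is the pre-game probability that the owner of the
    first peg wins (hypothetical input; its derivation is not given in the post).
    A win adds the opponent's expectancy, a loss subtracts the owner's own,
    so upsets move the score more than expected results do.
    """
    if owner_won:
        return 1.0 - owner_win_expectancy
    return -owner_win_expectancy


# The example from the post: player A (win expectancy 0.8) opens 1.f3
# in five games and wins four of them.
results = [True, True, True, True, False]
total = sum(adjusted_score(won, 0.8) for won in results)

# Four wins at +0.2 and one loss at -0.8 cancel out, up to rounding error.
print(abs(total) < 1e-9)
```

Written this way, each game contributes (actual result minus expected result), which is why a spot played only by a stronger player winning at the expected rate comes out neutral.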

    The average is still taken at the end so that the moves are comparable. A player's first 10 games don't count, because his rating has little meaning at first. I also multiplied the results by 1000 because all the “0.” repetition was annoying. Then I added color coding. Purple = neutral, Red = strong, Blue = weak, Black = insufficient data.
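A rough sketch of how the averaging step might look, not the actual script behind the charts. The record layout and all key names are hypothetical, and skipping a game when either player has fewer than 10 prior games is one possible reading of "a player's first 10 games don't count".

```python
from collections import defaultdict

MIN_PRIOR_GAMES = 10  # first 10 games of a player are ignored, per the post
SCALE = 1000          # multiply by 1000 to avoid the leading "0."


def cooked_table(games):
    """Return {move: average adjusted score * 1000}.

    Each game is a dict with hypothetical keys:
      move                  first move, e.g. "f3"
      owner_won             True if the owner of the first peg won
      owner_expectancy      owner's pre-game win expectancy (0..1)
      owner_prior_games     games the owner had completed before this one
      opponent_prior_games  same for the opponent
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for g in games:
        # Assumption: drop the game if either rating is still unreliable.
        if min(g["owner_prior_games"], g["opponent_prior_games"]) < MIN_PRIOR_GAMES:
            continue
        result = 1.0 if g["owner_won"] else 0.0
        totals[g["move"]] += result - g["owner_expectancy"]
        counts[g["move"]] += 1
    return {move: SCALE * totals[move] / counts[move] for move in totals}


def game(move, won, expectancy, owner_games=50, opponent_games=50):
    """Small helper for building sample records (hypothetical data)."""
    return {"move": move, "owner_won": won, "owner_expectancy": expectancy,
            "owner_prior_games": owner_games, "opponent_prior_games": opponent_games}


sample = [game("f3", True, 0.8)] * 4 + [
    game("f3", False, 0.8),
    game("e3", False, 0.5),
    game("d3", True, 0.5, owner_games=3),  # skipped: owner is too new
]
table = cooked_table(sample)
print(sorted(table))
```

With this sample data, f3 averages out to neutral, e3 goes negative, and d3 never appears because its only game involved a player with too few games. A spot with very few counted games would be the "insufficient data" case; the cutoff for that is not stated in the post.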

    Updated “uncooked” numbers

    Updated “uncooked” numbers for above-average players

    Fully “cooked” numbers

    The cook's comments on the cooking: 1.d3 still looks too strong, but 1.e3 has sunk into negative territory. So, maybe e3 shouldn't be swapped, after all. It's highly controversial! ;-) The striking neutrality of 1.f3 should make David Bush happy.

    Darkness indicates insufficient data. There is a lot of insufficient data, and a few bright spots that I don't believe. That's to be expected. Still, there are a lot of possibilities for the adventurous, down the c column and in the upper-middle regions.

    Have fun!

  • Alan Hensel at 2005-03-24

    Um, I don't know how I screwed up the link, sorry…

    Fully cooked numbers

  • technolion at 2005-03-26

    w-w-w-what? I am sorry, after reading your post once I was completely lost. ;)

    But slowly I am trying to grasp your statistics calculations :) Thanks a lot for putting so much work into evaluating first-move statistics!

  • David J Bush ★ at 2005-03-26

    This is an amazing amount of work, and your charts are fun to peruse, but of course you never claimed that they provide a useful guide for choosing a first move. There are three main reasons that keep the signal-to-noise ratio low, as I see it:

    1. There haven't been enough games between strong players.

    2. Maybe even the strong players aren't strong enough to produce results that correlate well with the intrinsic value of the first move.

    3. There's a mindset that may or may not be correct: Most experienced players have their own ideas about what would be a well-balanced first move. I'm not likely to try 1.H10 for example, because it looks horribly unbalanced to me. I doubt anyone rated over 1900 would be willing to play 1.H10 either. So, the data you get from such moves near the central region of the board probably involve at least one relatively weak player. But Twixt is full of surprises. Maybe some bizarre first move that looks horrible to an experienced player, is actually a very balanced first move, if you know the tactics involved. But such a move would probably take a long time to find, since experienced players would have to get past their preconceptions in order to give it a fair shot.

    Am I making any sense at all?

  • Hjallti ★ at 2005-03-28

    I'm not experienced at all, but I think there is another kind of noise involved…

    4. In games against weaker players a master may choose not to swap a certain opening he would swap against another master, because he believes the strength of that opening move is difficult for a newbie to grasp. (I mean that opening z23 can be effective only if you play this or that second move, but a newbie won't be able to see that.)

    Probably the same kind of difference even exists between two master players, somehow… with the tactics and intuition of player A, f3 might be strong, while e3 is better for player B. (If such situations are common in a kind of game, the swap-rule advantage of the second player is not as evident as it seems, but that is off topic.)

  • Alan Hensel at 2005-03-29

    It's not work, David, it's play! 8-)

    And, no offense intended, I'm not as interested in helping you, David, as I am in helping newbies and intermediates. Some newbies may still insist that the swap move is unnecessary. But it should be hard to maintain that position after seeing my stats, with that nice warm red glow in the lower right corner.

    What does “the intrinsic value of the first move” mean?

    Imagine if Twixt were solved (for which one must imagine some great leaps forward in quantum computing). Between two computers that had solved Twixt, each first move would lead to either defeat or victory (or maybe a draw, which would be very interesting). Every square on my graphs would be either bright red or bright blue.

    On the other hand, if both sides played randomly, my graphs would become purple all over.

    Of course, we are somewhere in between. In a sense, it's a measure of our Twixt talents, skills, and intelligence, though it can't be put on a linear scale. I believe it is possible for player A to consistently beat player B, who consistently beats player C, who consistently beats player A. That would be hard to prove statistically, but I'm sure it's possible.

    One thought I had was to look at the second moves for 1.d3. Maybe there's a better response than 2.swap. In a sense, what the charts are telling you about 1.d3 is that 2.swap is better than the average response to 1.d3.

    But then, once it gets out how to best respond to 1.d3, 1.d3's stats would start sinking. So, then, what's the “intrinsic value” that belongs in the d3 square? It depends on people's opening move knowledge. But not entirely; if people improve at end-game scenarios that involve races to the corner, or those end-game scenarios become more or less common due to our understanding of mid-game tactics, then the stats of opening moves along or near the crucial diagonals may be affected.

    These stats are only a reflection… I hope they mix things up and keep things interesting.

  • David J Bush ★ at 2005-03-30

    The intrinsic value of any move in Twixt is based on whether it wins, loses, or draws under perfect play. Greater value could be given to a faster win, or a safer win. In a losing position, greater value could be given to a slower loss, or a move which could be regarded as confusing. It is difficult to define what is meant by a confusing move or a safe move, but I believe these terms could be defined independently of who is playing the game.

    So, if Twixt ever were solved, I would imagine a table such as yours could contain more information than just “this should be swapped” or “don't swap” or “draw.” Strictly speaking, any initial move which does not draw is a losing move, but some losing moves may be more confusing than others. In fact, if a player becomes familiar with a specific opening, this might confer a practical advantage in an actual game.

    I agree your table has useful information, but there's a sort of catch-22 element. Stronger players usually could beat weaker players even if they made the wrong swap choice or played a bad initial move. Two strong players usually follow their own ideas about opening moves and do not experiment with 1.L12 for example. And the result of a game between weaker players probably has low correlation with the initial move. Doesn't it?

    Anyway, the opening move is not the best thing for an inexperienced player to focus on. Tactics are much more important.

  • Alan Hensel at 2005-04-02

    Nevertheless, the tables have a lot of red in them; this suggests that a lot of games are being started with an unbalanced move.

  • ab at 2005-06-06

    Hi!

    where can i get all twixt games played at little golem (as text file for example)?

  • Alan Hensel at 2005-06-06

    I can e-mail them to you. It's about 4MB. Just send me a private message with your e-mail address.
