An analysis of OSKI - to help determine Player #2's edge

31 replies. Last post: 2010-11-22

An analysis of OSKI - to help determine Player #2's edge
  • Ed Collins at 2010-10-18

    OSKI players may enjoy reading a short article I wrote titled, “How big of an advantage does Player #2 have?”

    I was going to post the entire article here, but then I decided to just post it on my website.

    The website link is:

    http://www.edcollins.com/oski.htm

  • Aganju at 2010-10-18

    Nice! So the advantage is very strong in the sense that it gives 95+% to player 2, but only by ~1.1 points or so. It might balance the game well if we just start player 2 with 1 point!

    For those who are lazy, here is the link as a link.

  • Ed Collins at 2010-10-18

    Player 2's edge is much closer to a full two-point advantage than to 1.1 points.

    After seeing the data, given the choice of playing first and receiving a 2-point head start, or playing second and giving up 2 points, I'd definitely choose the latter.

    Thanks for the 'lazy link.' I should have thought of doing that.

  • kingofthebesI at 2010-10-19

    Very good and interesting, thank you.

    It might be interesting to run unlimited tests on the least promising start words from the player #1 perspective. Then the most unfair words can be eliminated from being chosen as start position or given some kind of balancing factor!

  • Charlo at 2010-10-19

    Thank you for posting this. This shows pretty conclusively that player 2 has a nearly insurmountable advantage.

    One thing that would be interesting is to rewrite the program so that when there are multiple candidate plays of the same length, the one that uses the least-common letter (like X or U) is chosen. I wonder how strongly this “defensive” style of play would affect the score of the game.

    Another thing to try would be to give player #1 a two-point handicap to start with and see if that evens out the winning percentages any.

  • Ed Collins at 2010-10-19

    Charlo,

    Believe it or not, I have also thought about both of your suggestions.

    The second idea we should already be able to determine. For example, look at the 9-letter column. Player #2 won by exactly one point 55 times. (Out of the 262 total games played.) So all 55 of these games would now be Player #1 wins. Player #2 won by exactly two points 45 times. So all of these games would now be draws. We can then redo the math and come up with new win percentages.

    Idea #1. When I'm playing a game, I've often an uncommon letter, also thinking that this letter should give my opponent less options on his turn. I could try that with just ONE of the players, to see if that results in a higher win percentage for that player. If I try it for both players, I suspect the final total score might be lower, but the win percentages might stay about the same.

    I can make that change (it's a simple change to make) later this evening, after I get off work, and then rerun the program overnight.

  • MarleysGhost at 2010-10-19

    Ed, does Catherine do more than a 1-ply lookahead? If so, how many ply? Alpha-beta? MCTS?

  • FatPhil at 2010-10-19

    Ed - new stats request - number of words available for the last-but-one move, and for the final move.

  • Ed Collins at 2010-10-19

    Typo above: Idea #1. When I'm playing a game, I've often considered using an uncommon letter…

    MG: No. As mentioned somewhere in the article, for the moment there is no lookahead at all, other than just the current board position. Catherine finds all possible plays for that position and that's it.

    FP: Acknowledged. Good idea. In fact, it might be interesting to see how many words are available at EACH move. For example, the peak number might be around move 10, when large words are now possible and several choices of spaces still remain.

    By comparison, the first few moves probably have few words available, when just a few letters are on the board. Also, the last few moves should see a reduction, simply because there are fewer spaces (choices) left to select.

    If the peak number of moves comes when it's Player #2's turn, that also might help, a tiny tiny bit, to explain why Player #2 wins such a large percentage of the time.

  • FatPhil at 2010-10-19

    Maybe measure the 'perimeter' of the used cells?

  • Ed Collins at 2010-10-19

    If anyone's curious…

    Letter Frequency Table
    SOWPODS word list
    entire word list

    e  275,582
    s  234,672
    i  220,483
    a  188,703
    r  170,521
    n  163,637
    o  161,752
    t  159,471
    l  127,865
    c   98,230
    d   81,731
    u   80,636
    p   73,286
    m   70,700
    g   67,910
    h   60,702
    b   44,953
    y   39,772
    f   28,930
    v   22,521
    k   22,075
    w   18,393
    z   11,772
    x    6,852
    q    4,104
    j    4,010

    267,751 words
    2,439,263 letters

    ------------

    Letter Frequency Table
    SOWPODS word list
    9-letter words and less

    e  132,191
    s  108,276
    a   91,481
    i   89,801
    r   80,665
    o   71,121
    n   68,965
    t   66,787
    l   61,424
    d   44,798
    c   41,415
    u   40,878
    g   34,206
    p   33,055
    m   32,813
    h   28,280
    b   24,741
    y   19,494
    f   16,263
    k   15,740
    w   12,791
    v   10,667
    z    5,407
    x    3,425
    j    2,744
    q    2,038

    155,302 words
    1,139,466 letters
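
    For anyone who wants to reproduce counts like these, here is a minimal sketch. It assumes the word list is available as a plain sequence of strings; the function name `letter_table` and the tiny sample list are made up for illustration and are not part of Ed's program.

```python
from collections import Counter


def letter_table(words, max_len=None):
    """Count letter occurrences across a word list.

    If max_len is given, only words of that length or shorter are kept
    (for example, 9 for a "9-letter words and less" table).
    Returns (letter counts, number of words kept, total letters counted).
    """
    kept = [w.lower() for w in words if max_len is None or len(w) <= max_len]
    counts = Counter(ch for w in kept for ch in w if ch.isalpha())
    return counts, len(kept), sum(counts.values())


# Tiny made-up word list, only to show the shape of the output.
sample = ["oski", "word", "games", "catherine"]
counts, n_words, n_letters = letter_table(sample, max_len=5)
for letter, count in counts.most_common():
    print(f"{letter} {count:,}")
print(f"{n_words:,} words")
print(f"{n_letters:,} letters")
```

    With a real list, the words would be read from a file, one per line, and passed in the same way.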

  • Ed Collins at 2010-10-19

    Argh. So much for my nice, neat columns.

  • Richard Malaschitz ★ at 2010-10-20

    I developed my own software for testing and got very similar results:

    1000 games (max 9 letter)

    +66

    =94

    -840

    avg score diff: 2.11

    The same test with another language (German) gave the same result. Then I tried playing on a triangular board. Triangle with 15 fields (6 moves per player):

    +88

    =143

    -769

    avg score diff: 1.59

    Triangle with 21 fields (9 moves per player):

    +132

    =137

    -731

    avg score diff: 1.69

  • Ed Collins at 2010-10-20

    I made two small changes to Catherine and I'm rerunning the program, currently with a vocabulary of all 8-letter words or less. (Complete game played in about two minutes.)

    One, whenever it is Player #1's turn, when given a choice of two or more top-scoring moves, Player #1 now chooses the word using the LEAST popular letter. Example: If staKes and staTes are both valid plays, staKes will be chosen because it adds a K to the board, rather than a T.

    The idea is that this should make it slightly more difficult for Player #2 to find a reply on Player #2's next move.

    This won't change the mechanics of the game - in a real game Player #2 could ALSO adopt this strategy - but Charlo and I were curious to see how MUCH it helps.

    The least popular letter is determined by the frequency it appears in all 3 to 8-letter words in SOWPODS.
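
    The tie-break described above can be sketched roughly as follows. This is an illustration, not Catherine's actual code: the function name and the `(word, added_letter)` candidate format are assumptions, a play is assumed to score its word length, and the letter counts are borrowed from the "9-letter words and less" table earlier in the thread because the 3- to 8-letter counts were not posted.

```python
# Counts copied from the "9-letter words and less" table earlier in the
# thread; they stand in for the 3- to 8-letter counts, which were not posted.
LETTER_COUNT = {"k": 15_740, "t": 66_787, "s": 108_276}


def choose_defensive(candidates, letter_count):
    """Pick a play from (word, added_letter) pairs.

    Assumes a play scores its word length, so the top-scoring plays are
    the longest words. Among those, the play whose added letter is least
    frequent in the word list wins the tie.
    """
    best_len = max(len(word) for word, _ in candidates)
    top = [(word, letter) for word, letter in candidates if len(word) == best_len]
    return min(top, key=lambda play: letter_count[play[1]])


plays = [("stakes", "k"), ("states", "t"), ("stake", "k")]
print(choose_defensive(plays, LETTER_COUNT))
```

    Here the two six-letter plays tie on score, and the K play is preferred because K is much rarer than T.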

    Preliminary results are very good. After 260 games played, Player #1's win percentage is about 14% and Player #2's win percentage is almost 66%. This is MUCH closer than the 4% and 84% numbers arrived at on the first run, before this change.

    Two, for fun, I'm also keeping track of the average number of available moves at each turn. So far, Move #9 (Player #1's fifth turn) is leading the pack. It might not surprise anyone that the numbers appear to resemble the common bell curve.

    Full data after at least 800 games have been played.

  • Ed Collins at 2010-10-21

    Okay, the results are in.

  • Hjallti ★ at 2010-10-22

    I would suggest a komi of 1.5 if the gain is between 1 and 2.

    It would make the game drawless and pretty even.

    Would the results be independent of language? I am the current Dutch leader, but I haven't checked yet.

  • kingofthebesI at 2010-10-22

    Making a game drawless for the sake of it is wrong. The only reason to deliberately make a game drawless is if the players are going to get money from a tournament sponsor because of the extra drama that is potentially created!

    GWG is flawed because of grids with negative value and that bidding isn't integer.

    Why should someone be able to play optimally and lose? Admittedly, a starting EV of ~+/-0.5 would be an improvement over a starting EV of ~+/-2, but it is still unfair.

    It would be great to pick the most relevant start positions garnered from the initial test and run them with unlimited word length, as I think 10-, 11-, and 12-letter words would change things.

    Why not an auction for who goes second that would start at -1 (to allow for the freak possibility that there is a grid favouring the starter)? I don't see the bidding going above 3 so it wouldn't slow the game that much (if that was a concern)!

  • beppi at 2010-10-22

    Very interesting, Ed, thank you!

    Maybe some limitation on White's power could be useful, like in WYPS. For example, White could be limited to a word of at most 4 letters on his very first move. In that case, the advantage of having one more useful letter on the board would be removed.

    Would it take long to run this test, Ed?

    And Thanks!

  • Ed Collins at 2010-10-22

    beppi: I think I already know what the test results would be.

    Almost every single game, White (Player #2) is able to play a five-letter move on his first move. (As we all know, Player #1 can only play a four-letter move on his/her first turn.)

    The number of times Player #2 was not able to play a five-letter move, of ALL the games played, I could probably count on one hand… and I wouldn't even need all of my fingers.

    Thus, if White were not permitted to play a five-letter word, (on Move #2) White's average final score would drop exactly one point.

    But since White only wins by one point about 23 to 29% of the time, and since White's average margin of victory is much greater than one point, White would still have an edge.

    Of course, we ALSO have to consider and factor in how much one less letter on the board would affect Black's SECOND move… and then WHITE's second move, etc. But my gut tells me both of these moves, and future moves, would be affected about equally, canceling each other out.

    But let's find out… that change took two seconds to implement. I'm running the program now. I can give you the results of approximately 300 games ten hours from now, after I get home from work this evening. (For this run I'm removing the 'defensive strategy' mode described earlier.)

    Whatever the results, I'm not crazy about the idea… it just doesn't seem… oh, I don't know… what's the word? Elegant? Instead, I would rather give Black a one point lead initially, and I think the results will show it's the same end result. This way White would have no restrictions.

    Hjallti: I don't speak Dutch and I'm completely unfamiliar with the language, but yes, I'm pretty sure the results would be very very similar for any language. White has a big advantage after just three moves that is almost impossible to overcome, and I think that almost certainly holds true for most languages.

  • Ed Collins at 2010-10-22

    Beppi, your results are in.

    No big surprises.

    (Player #2 only allowed to play, at best, a 4-letter word on his first move.)

    Number of Games Played: 300

    Number of wins by Player #1: 28 (9.33%)

    Number of wins by Player #2: 225 (75.00%)

    Number of Draws: 47 (15.67%)

    Average Final Score Player #1: 53.70

    Average Final Score Player #2: 55.06

    Number of Halftime Wins by Player #1: 13 (4.33%)

    Number of Halftime Wins by Player #2: 239 (79.67%)

    Number of Halftime Draws: 48 (16.00%)

    Average Halftime Score by Player #1: 23.28

    Average Halftime Score by Player #2: 24.56

    Number of wins by Player #1 by exactly 1 point : 20 (71.43%)

    Number of wins by Player #1 by exactly 2 points: 6 (21.43%)

    Number of wins by Player #1 by exactly 3 points: 1 (3.57%)

    Number of wins by Player #1 by exactly 4 points: 1 (3.57%)

    Number of wins by Player #1 by 5 points or more: 0 (0.00%)

    Number of wins by Player #2 by exactly 1 point : 88 (39.11%)

    Number of wins by Player #2 by exactly 2 points: 77 (34.22%)

    Number of wins by Player #2 by exactly 3 points: 37 (16.44%)

    Number of wins by Player #2 by exactly 4 points: 19 (8.44%)

    Number of wins by Player #2 by 5 points or more: 4 (1.78%)

    Number of games Player #1 did better on final move: 57 (19.00%)

    Number of games Player #2 did better on final move: 51 (17.00%)

    Number of games final move of each player was equal: 192 (64.00%)

    Summary:

    The new 9.33% win rate for Player #1 is an increase from 4.95%.

    Player #2's win rate has been reduced from 84.62% to 75%.

    Draws have increased from 11% to 15.67%.

    The new average final score difference (margin of victory) has been reduced for Player #2 from 2.02 points to just 1.36 points.

    And naturally, the average halftime score is much closer.
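
    The handicap arithmetic discussed earlier in the thread (shift every final margin by the head start, then recount wins and draws) can be applied to the margin counts from this 300-game run. This is only a sketch: the names are made up, it assumes a head start would not change how either side plays, and the numbers come from a run in which Player #2 was limited to a 4-letter first move.

```python
# Signed final margin (Player #2 score minus Player #1 score) -> games,
# taken from the 300-game run above. The "5 points or more" bucket is
# stored as 5, which is only safe for handicaps below 5.
MARGINS = {-4: 1, -3: 1, -2: 6, -1: 20, 0: 47, 1: 88, 2: 77, 3: 37, 4: 19, 5: 4}


def apply_handicap(margins, handicap):
    """Re-score every game after giving Player #1 a `handicap`-point head start."""
    totals = {"p1_wins": 0, "draws": 0, "p2_wins": 0}
    for margin, games in margins.items():
        adjusted = margin - handicap
        if adjusted > 0:
            totals["p2_wins"] += games
        elif adjusted < 0:
            totals["p1_wins"] += games
        else:
            totals["draws"] += games
    return totals


for handicap in (1, 1.5, 2):
    totals = apply_handicap(MARGINS, handicap)
    n = sum(totals.values())
    print(handicap, {k: f"{v} ({100 * v / n:.2f}%)" for k, v in totals.items()})
```

    Under those assumptions, a 1-point head start gives 75 Player #1 wins, 88 draws and 137 Player #2 wins; a 2-point head start gives 163, 77 and 60. A half-point value such as the komi of 1.5 suggested earlier removes draws entirely and gives 163 wins to 137.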

  • beppi at 2010-10-25

    Wow, I thought there would be a much better result for Black.

    Thanks for the test, Ed. No way.

  • FatPhil at 2010-10-25

    So, what should this function be for Oski:

    ->

    to improve RoRoRo's predictions.

    Evolution of such functions usually goes via intermediate steps such as “pretend P1 gets a ratings adjustment of X, and apply the normal formula”. That would probably work fairly well here.
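
    One way to read the “ratings adjustment of X” idea is sketched below. It assumes the “normal formula” is the standard Elo logistic expectation; the thread does not say which formula RoRoRo actually uses, and both function names are hypothetical.

```python
import math


def expected_score_p1(rating_p1, rating_p2, adjustment=0.0):
    """Expected score for Player #1 under the standard Elo formula,
    after shifting Player #1's rating by `adjustment` (negative if
    moving first is a disadvantage)."""
    shifted = rating_p1 + adjustment
    return 1.0 / (1.0 + 10 ** ((rating_p2 - shifted) / 400.0))


def adjustment_for_even_match_score(p1_score):
    """Adjustment X that reproduces `p1_score` (wins plus half of draws,
    as a fraction strictly between 0 and 1) between equally rated players."""
    return 400.0 * math.log10(p1_score / (1.0 - p1_score))


# Example with an arbitrary illustrative score, not a figure from the thread.
x = adjustment_for_even_match_score(0.25)
print(round(x, 1), round(expected_score_p1(1500, 1500, x), 3))
```

    The observed Player #1 score from a large sample of games between similarly rated players would be the natural input for the second function.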

  • beppi at 2010-11-08

    Following from your statistics, it appears that something should be done to make the tournaments even.

    Currently, in a 4-player tournament, one player has 3 matches, 2 as White and 1 as Black, which is a big disadvantage. Of course, this doesn't happen in 5-player tournaments.

    Since OSKI is a rather short game, I think it would be an important change to consider a game to be composed of two rounds, with the score being the sum of the two results.

  • Charlo at 2010-11-09

    I agree with beppi. I think having two-game matches and determining the winner from the total difference in scores would be a good way to run tournaments until OSKI is made more even.

  • Richard Malaschitz ★ at 2010-11-18

    I tried this simulation:

    - Triangular board - 21 fields

    - Play with a limited set of 30 pieces: QJXZVWKFYBHMPGUCDLTNORIASEEEAS

    - There is no restriction on the words used

    - The program prefers to use more frequent letters (E, A, S, I, …)

    - For the first move, the program prefers to play “in line” to reduce the opponent's possibilities

    - The program uses only the 50,000 most frequent English words

    Results after 1000 games:

    First player 438 wins

    Second player 426 wins

    136 Draws

    Average score: -0.086

    There were 42 games finished before last move (program was not able to find word) - 41 wins for first player and 1 win for second player.

  • MarleysGhost at 2010-11-18

    Did the simulation do any lookahead? Did it choose randomly among words of the same length?

  • Richard Malaschitz ★ at 2010-11-18

    No lookahead. When words have the same length, the one with the more frequent letter is used. If a random word is chosen instead, then the results after 1000 games are:

    First player 365 wins

    Second player 473 wins

    162 Draws

    Average score: -0.307

  • Carroll at 2010-11-22

    Richard, why use the more frequent letters first?

    What is the average score, and why is it negative in both of your posts even though the situations are reversed: in the first, the second player wins less often, and in the second, the second player wins more often?

    Couldn't someone add lookahead for the last moves (to prevent Player 2 from getting a last move), as the tree must be thin there?

  • kingofthebesI at 2010-11-22

    Using the more frequent letters first stops your opponent using them, making it hard for them to make big words!

  • Carroll at 2010-11-22

    I don't quite agree. If you put down an 's', this letter may be used by your opponent too… opening a whole universe of words!

    But I have not played any Oski games, so maybe I'm missing something.
