I am the journeyer from the valley of the dead Sega consoles. With the blessings of Sega Saturn, the gaming system of destruction, I am the Scout of Silence… Sailor Saturn.

  • 0 Posts
  • 20 Comments
Joined 1 year ago
Cake day: June 29th, 2023



  • Here are the results of these three models against Stockfish—a standard chess AI—on level 1, with a maximum of 0.01 seconds to make each move

    I’m not a chess person or familiar with Stockfish, so take this with a grain of salt, but I found a few interesting things perusing the code / docs which I think make useful context.

    Skill Level

    I assume “level” refers to Stockfish’s Skill Level option.

    If I mathed right, Stockfish roughly estimates Skill Level 1 to be around 1445 Elo (source). However, it says “This Elo rating has been calibrated at a time control of 60s+0.6s”, so the effective strength may be significantly lower here.

    Skill Level affects the search depth (it appears to use a depth of 1 at Skill Level 1). It also enables MultiPV 4 to compute the four best principal variations and randomly picks from them (more randomly at lower skill levels).
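
    For the curious, that 1445 figure can be re-derived. A minimal sketch, assuming the internal Elo range (1320–3190) and the cubic Elo-to-skill mapping are as I read them in the Stockfish source — the coefficients are hand-copied and may be out of date, so treat this as illustrative rather than authoritative:

```python
# Hedged sketch: numerically invert Stockfish's Elo -> Skill Level mapping
# to estimate the Elo corresponding to Skill Level 1. The range (1320-3190)
# and polynomial coefficients are hand-copied from my reading of the
# Stockfish source and may be out of date.

LOW_ELO, HIGH_ELO = 1320, 3190

def skill_from_elo(elo: float) -> float:
    """Approximate mapping from a UCI_Elo value to an internal skill level."""
    e = (elo - LOW_ELO) / (HIGH_ELO - LOW_ELO)
    return ((37.2473 * e - 40.8525) * e + 22.2943) * e - 0.311438

def elo_for_skill(target: float) -> float:
    """Bisect for the Elo whose skill level equals `target`.

    The cubic's derivative has negative discriminant, so the mapping is
    monotonically increasing and bisection is safe.
    """
    lo, hi = float(LOW_ELO), float(HIGH_ELO)
    for _ in range(60):
        mid = (lo + hi) / 2
        if skill_from_elo(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(elo_for_skill(1)))  # roughly 1444, i.e. ~1445 as claimed
```

    Which lands within a point or two of the 1445 figure, so the back-of-envelope math checks out.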

    Move Time & Hardware

    This is all independent of move time. The author used a move time of 10 milliseconds for Stockfish (no mention of how much time the LLMs got). … or at least they did if they accounted for the “Move Overhead” option defaulting to 10 milliseconds. If they left that at its default, then 10 ms − 10 ms = 0 ms, so 🤷‍♀️.

    There is also no information about the hardware or number of threads they ran this on, which I feel is important information.
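
    For a reproducible benchmark, the relevant knobs boil down to a handful of UCI options plus the move-time arithmetic. A minimal sketch — no engine is launched here, and whether Stockfish really deducts “Move Overhead” from a fixed movetime is my reading of the comment above, so treat the arithmetic as illustrative:

```python
# Hedged sketch of the UCI setup a reproducible benchmark should log.
# Commands are built as strings only; no engine is launched. The
# "effective budget" arithmetic assumes Move Overhead is deducted from
# a fixed movetime, which is the post's reading of the source.

def uci_setup(skill_level: int, move_overhead_ms: int, threads: int) -> list[str]:
    """Build the UCI option commands worth recording alongside results."""
    return [
        f"setoption name Skill Level value {skill_level}",
        f"setoption name Move Overhead value {move_overhead_ms}",
        f"setoption name Threads value {threads}",
    ]

def effective_budget_ms(movetime_ms: int, move_overhead_ms: int) -> int:
    """Thinking time left once the overhead is deducted (floored at zero)."""
    return max(0, movetime_ms - move_overhead_ms)

print(uci_setup(skill_level=1, move_overhead_ms=0, threads=1))
print(effective_budget_ms(10, 10))  # 0 -- the default-overhead shrug case
print(effective_budget_ms(10, 0))   # 10 -- overhead explicitly zeroed
```

    Logging those three options (and the hardware) alongside the results would answer exactly the questions raised above.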

    Evaluation Function

    After the game was over, I calculated the score after each turn in “centipawns” where a pawn is worth 100 points, and ±1500 indicates a win or loss.

    Stockfish’s FAQ mentions that they have gone beyond centipawns for evaluating positions, because the engine is strong enough that material advantage matters much less than it used to. I assume it doesn’t really matter at level 1 with ~0 seconds to produce moves, though.

    Still, since the author has Stockfish handy anyway, it’d be interesting to use it in its unhandicapped form to evaluate who won.
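
    The “centipawn” bookkeeping the author describes is easy to reconstruct. A minimal sketch using textbook piece values (pawn = 100, knight = 320, bishop = 330, rook = 500, queen = 900) — these are my illustrative numbers, not whatever evaluation the author actually used, and emphatically not Stockfish’s own evaluation, which as the FAQ says has moved well past raw material counting:

```python
# Minimal sketch of a "centipawn" material count from a FEN string.
# Piece values are classic textbook numbers, chosen for illustration;
# they are not the author's scoring nor Stockfish's evaluation.

PIECE_VALUES = {"p": 100, "n": 320, "b": 330, "r": 500, "q": 900}

def material_cp(fen: str) -> int:
    """White-minus-black material balance in centipawns, from a FEN."""
    placement = fen.split()[0]  # first FEN field is the piece placement
    score = 0
    for ch in placement:
        value = PIECE_VALUES.get(ch.lower())  # kings and digits are skipped
        if value is not None:
            score += value if ch.isupper() else -value
    return score

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(material_cp(start))  # 0 -- the opening position is balanced
```

    The author’s ±1500 win/loss sentinel would then be a value bolted on outside this function once the game ends, since mate isn’t a material quantity.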











  • OpenAI is in the position of constantly working on newer bigger shinier models, while saying every model they do release will be the one.

    AGI is just around the corner and we promise 3, 4, 4-Scarlett-Johansson, o1, 5 is the one that will be good enough to help you with your homework and legal letters and medical questions and remove the loneliness from your life!

    Would be funny if they manage to pop the bubble in the process of trying to go for-profit.


  • Sshh don’t tell the investors, I’ve managed to be paid for a decade by updating my code to work with other people updating their code to work with other people updating their code, all without actually doing anything new.

    We as a profession have developed a careful balancing act where we’re always busy doing nothing. If the balance was off just a little someone might actually have to think about new features instead of, say, migrating from CGI to PHP to JavaScript to jQuery to AngularJS to Angular to React to ???, rejecting LLM generated changes, “fixing” the same bug year after year, or reverting reverts of reverts of reverts of reverts of changes.

    And thinking is hard.







  • So today I learned there are people who call themselves superforecasters®. Neat!

    The superforecasters® have had a melding of the minds and determined that covid-19 was 75% likely to not be a lab leak. Nifty! This is useless to me!

    Looking at the website of these people with good enough judgement to call themselves “Good Judgement”, you can learn that 100% of superforecasters® agree that there will be less than 100 deaths from H5N1 this year. I don’t know much about H5N1 but I guess that makes sense given that it’s been around since 1996 and would need a mutation to be contagious among humans.

    I found one of the superforecaster®-trainee discussion topics where they reveal some of the secrets to their (super)forecasting(®)-trainee instincts:

    I have used “Copilot” LLM AI to point me in the right direction. And to the point of the LLM they have been trained not to give a response about conflict as they say they are trying to permote peace instead of war using the LLM.

    Riveting!

    Let’s go next to find out how to give up our individuality and become a certified superforecaster® hive brain.

    To minimize the chance that outstanding accuracy resulted from luck rather than skill, we limited eligibility for GJP superforecaster status to those forecasters who participated in at least 50 forecasting questions during a tournament “season.”

    Fans of certain shonen anime may recognize this technique as Kodoku – a deadly poison created by putting a bunch of insects in a jar until only one remains:

    100 species of insects were collected, the larger ones were snakes, the smaller ones were lice, Place them inside, let them eat each other, and keep what is left of the last species. If it is a snake, it is a serpent, if it is a louse, it is a louse. Do this and kill a person.


    “But what’s the catch, Saturn?” I can hear you say. “Surely this is somehow a grift nerds find, or a way to fleece money out of governments.”

    Nonono, you’ve got the completely wrong idea. Good Judgement offers a $100 Superforecasting Fundamentals course out of the goodness of their heart, I’m sure! I mean, after all, if they spread Superforecasting to the world then their Hari-Seldon-esque hivemind would lose its competitive edge, so they must not be profit motivated.

    Anyway if you work for the UK they want to hear from you:

    If you are a UK government entity interested in our services, contact us today.

    Maybe they have superforecasted the fall of the British empire.


    And to end this, because I can never resist a web design sneer.

    Dear programmers: if you apply the CSS word-break: break-all; to the string “Privacy Policy” it may end up rendered as “Pr[newline]ivacy Policy” which unfortunately looks pretty unprofessional :(