Creating my first test

James Sun edited this page Jan 7, 2018 · 8 revisions

Contributor Guidelines

  1. Keep your patch concise and well-formatted.
  2. Even if a patch passes the Fishtest tests, there is no guarantee that it will be merged. Patches that add significant complexity will need to show a large benefit to be considered.
  3. Submit each idea individually as its own patch. (Parameter tweaks which support an idea are OK, except for bugfixes).
  4. Non-functional patches need only to demonstrate a speedup (benchmark numbers remain constant except for explicitly approved bugfixes).

Pull requests

  • Are up-to-date with current master.
  • Ideally consist of a single commit; consider rebasing your branch if it has a long history.
  • Employ a coding style that is similar to surrounding code, including white space.
  • Pass continuous integration testing, which checks for standard conformance and reproducibility of search.
  • State in the git commit log message, and not only in the pull request comment, whether the patch is 'No functional change.' or changes the search. For the latter, also indicate for which variant(s) functionality is changed.
  • Report, for all patches that require testing (see below), the results obtained in multi-variant fishtest at STC and LTC, e.g.:
STC:
LLR: 2.96 (-2.94,2.94) [0.00,10.00]
Total: 6979 W: 3553 L: 3351 D: 75
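
To get a rough feel for what such a result line means, the win/loss/draw counts can be turned into an Elo estimate. The sketch below assumes the usual logistic Elo model with draws counted as half a point; the function name elo_from_wld is made up for this page, and fishtest itself may compute and display its estimate differently.

```shell
#!/bin/bash
# Logistic Elo estimate from wins, losses and draws (assumed model, not
# necessarily the one fishtest uses). Draws count as half a point.
# Not handled: a score of exactly 0 or 1, which has no finite Elo.
elo_from_wld() {
  awk -v w="$1" -v l="$2" -v d="$3" 'BEGIN {
    score = (w + d / 2) / (w + l + d)
    printf "%.1f\n", 400 * log(score / (1 - score)) / log(10)
  }'
}

# Counts taken from the STC example above.
elo_from_wld 3553 3351 75   # prints: 10.1
```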

Initial setup

Create a fork of the Multi Variant Stockfish GitHub repository and create a local git clone of your fork. GitHub has a good tutorial on this process. If you need help with git, it has great documentation.

Synchronize with the Multi Variant master branch

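Before creating a new patch, make sure that your master branch is up to date and has all the newest commits from the Multi Variant Stockfish master branch. You can use a local script such as the following example. It assumes that a git remote named upstream points to the Multi Variant Stockfish repository, and it discards any local commits on your master branch.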

#!/bin/bash
# sync-with-mvsf.sh
# usage: bash sync-with-mvsf.sh 
# synchronize the local git and personal GitHub repository
# with Multi Variant Stockfish GitHub repository

# go to the src directory for Multi Variant Stockfish (edit to reflect your installation)
cd ./GitHub/Stockfish/src

# update the local master branch with the new commits from Multi Variant Stockfish's master
# and then push them to the personal GitHub
git checkout master
git fetch upstream
git reset --hard upstream/master
git push origin master --force

Create your test

Standard tests

  1. In a terminal, browse to the Stockfish folder
  2. Create a branch for your work: git checkout -b my_branch
  3. Edit the Stockfish source to make your changes
  4. Compile the source with make build ARCH=x86-64
  5. Get the branch signature from the output of ./stockfish bench, which contains a line like Nodes searched : 4190940. Here, 4190940 is the signature.
  6. Commit your changes locally with git commit -am "My commit message"
  7. Push your branch to GitHub: git push origin my_branch
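
Copying the signature by hand is easy to get wrong, so the lookup in step 5 can be scripted. This is a minimal sketch: the helper name bench_signature is made up for this page, and it assumes that bench writes its summary to stderr, as the redirections in the benchmark script further down this page suggest.

```shell
#!/bin/bash
# Read bench output on stdin and print the number that follows "Nodes searched".
# bench_signature is a hypothetical helper name, not part of Stockfish.
bench_signature() {
  awk -F':' '/Nodes searched/ { gsub(/[^0-9]/, "", $2); sig = $2 }
             END { if (sig != "") print sig }'
}

# Possible usage from the directory that contains the binary
# (assumes the summary goes to stderr, hence 2>&1):
#   ./stockfish bench 2>&1 | bench_signature
```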

Tuning with SPSA

  1. In a terminal, browse to the Stockfish folder

  2. Add the official repo as a remote with git remote add upstream https://github.com/official-stockfish/Stockfish.git

  3. Create your branch using the command git fetch upstream tune:my_tuning_branch. Change my_tuning_branch to whatever you want to name your branch.

  4. Switch to your newly created branch with git checkout my_tuning_branch

  5. Remove const qualifiers from the variables in the source code that you want to tune.

  6. Flag the variables you want to tune with the TUNE macro. For example, if you have:

    int myKing = 10, myQueen = 20; 
    Score myBonus = S(5, 15); 
    Value myValue[][2] = { { V(100), V(20) }, { V(7), V(78) } }; 

    simply add the following line somewhere after them:

    TUNE(myKing, myBonus, myValue);

    The type of each variable must be one of int, Score, or Value. Variables can also be arrays of arbitrary dimensions.

    You can have multiple invocations of TUNE in different places. For example the code below is equivalent to the one above:

    TUNE(myKing);
    ...
    TUNE(myBonus, myValue);

  7. If you have a function that needs to be called after the variables are updated, for example void my_post_update() {}, simply add its name to the TUNE arguments:

    TUNE(myKing, myBonus, myValue, my_post_update);

    You can add multiple functions and they will be called in the order you add them.

  8. By default, a variable v is tuned in the range [0, 2 * v]. You can change that by adding a custom range as another argument to TUNE as follows:

    TUNE(SetRange(-100, 100), myKing, myQueen); 

    This changes the range for all the variables that follow it. To customize it further, you can set another range for the remaining variables:

    TUNE(SetRange(-100, 100), myKing, SetRange(-20, 20), myQueen); 

    Here myKing is tuned in [-100, 100] while myQueen is tuned in [-20, 20].

    To return to the default range, use SetDefaultRange:

    TUNE(SetRange(-100, 100), myKing, SetDefaultRange, myQueen);

    so that myQueen is tuned in its default range.

    Note: you can also change the range of each parameter manually as you input them to fishtest, as will be shown below.

  9. After you are done specifying what you want to tune and how, compile the source by running make build ARCH=x86-64-modern (run make help to learn more about compiling).

  10. Run ./stockfish. On startup it prints a comma-separated list of the parameters to tune. Copy that list somewhere.

  11. Get the branch signature from ./stockfish bench, which will look like Nodes searched : 4190940. Here, 4190940 is the signature.

  12. Commit your changes locally by running the command git commit -am "My commit message"

  13. Push your changes to GitHub with git push origin my_tuning_branch
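
Instead of copying the list from step 10 out of the terminal by hand, it can be filtered from the engine's startup output. The sketch below assumes the list is printed to standard output with one parameter per line, in the six-field form described in the SPSA test steps on this page; the helper name spsa_lines and the sample values in the comments are made up for illustration.

```shell
#!/bin/bash
# Keep only lines of the form name,value,min,max,ck,rk (six comma-separated
# fields, the last five numeric). spsa_lines is a hypothetical helper name.
spsa_lines() {
  awk -F',' 'NF == 6 {
    ok = 1
    for (i = 2; i <= 6; i++)
      if ($i !~ /^-?[0-9]+(\.[0-9]+)?$/) ok = 0
    if (ok) print
  }'
}

# Possible usage (assumes the engine exits on "quit" and prints the list to stdout):
#   printf 'quit\n' | ./stockfish | spsa_lines > spsa_params.txt
```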

Run your test

Standard tests

  1. Go to http://35.161.250.236:6543/tests/run
  2. Select the variant for which the patch is to be tested.
  3. Fill in the test branch (e.g. passed_pawns) and the (standard chess) test signature (e.g. 4190940)
  4. Fill in base branch (usually master) and base signature (bench output for master branch)
  5. Check whether base and test signatures are the same, since patches for variants should not affect standard chess functionality.
  6. Set the opening book. If you have already set the variant, you can use the "auto" button to auto-select an opening book that matches the variant. The opening book's name must match one of the files in https://github.com/ianfab/books. Due to differences in FEN formats and variant rules, a book can usually only be used for the variant that it is named after.
  7. Make sure your test repo is correct (has no trailing slash, e.g. https://github.com/yourname/Stockfish)
  8. Fill in the notes describing your change
  9. Please follow the testing methodology below, unless you have a good reason to do differently.
  10. Click run test.
  11. If your STC test passes, you can move on to an LTC test (see below for parameters).
  12. If your LTC test passes, congratulations! Now, please create a pull request against the Multi Variant Stockfish repository, so your changes can be code reviewed. Please remember, even if a patch passes STC and LTC, it is not guaranteed to be committed.
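
Two of the checks in the list above (matching signatures and no trailing slash on the repo URL) are mechanical, so they can be scripted before filling in the form. This is only a sketch: check_submission and its messages are invented for this page, and a signature mismatch is merely a warning here, since a patch may intentionally change standard chess.

```shell
#!/bin/bash
# Hypothetical pre-submission check. Arguments: base signature, test signature,
# test repo URL. Returns non-zero if anything deserves a second look.
check_submission() {
  local base_sig=$1 test_sig=$2 repo=$3 status=0
  if [ "$base_sig" != "$test_sig" ]; then
    echo "warning: signatures differ (base=$base_sig, test=$test_sig)"
    status=1
  fi
  case $repo in
    */) echo "warning: test repo has a trailing slash: $repo"; status=1 ;;
  esac
  return $status
}

# Possible usage:
#   check_submission 4190940 4190940 https://github.com/yourname/Stockfish
```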

SPSA tests

  1. Go to http://35.161.250.236:6543/tests/run

  2. Select the variant for which the tuning should be performed.

  3. Set the test branch and base branch both to the tuning branch and fill in the branch signatures.

  4. Paste the list that you copied into the SPSA Parameters list. This is comma-separated data for parameter, initial value, minimum, maximum, ck value and rk value, in that order. Here you can also make manual changes to the min and max values for parameters. Note: if you use the default range with an initial value of 0, the parameter will not be tuned, since both ends of the default range are then 0.

  5. Leave "SPSA clipping" and "SPSA rounding" at their defaults ("old" and "deterministic") unless you have tried the defaults already and seen poor convergence. These are experimental options which should be used with care; for more details please see https://groups.google.com/forum/#!topic/fishcooking/LT55ExR0m9U

  6. Set the opening book. See the above section for details.

  7. Make sure your test repo is correct (e.g. https://github.com/yourname/Stockfish)

  8. Fill in the notes describing your change

  9. Click run test.

  10. When your tuning session is finished and you want to apply the tuned values to your repository, you can use https://github.com/ianfab/fishutils to avoid having to manually copy the values.

Note: Do not modify the number of games in an SPSA test while it's running. This breaks the algorithm.
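
Before pasting the parameter list, it can help to look for parameters that cannot be tuned because their range has collapsed, such as the default range for an initial value of 0 mentioned in the steps above. The sketch assumes one parameter per line in the six-field order described there; degenerate_params and the sample lines in the comments are made up for illustration.

```shell
#!/bin/bash
# Print the names of parameters whose minimum equals their maximum.
# Input on stdin: name,value,min,max,ck,rk (one parameter per line, assumed).
degenerate_params() {
  awk -F',' 'NF == 6 && $3 + 0 == $4 + 0 { print $1 }'
}

# Possible usage with a saved list (the file name is an example):
#   degenerate_params < spsa_params.txt
```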

Testing methodology (or when should I change the standard parameters for tests?)

Definitions:

  • parameter tweak = changing the value of some constants in the code. The generated machine code has the same complexity (aiming at the same number of processor instructions).
  • simplification = the number of lines in the source code of Stockfish goes down, or the number of processor instructions in the generated code goes down.
  • bug = a bug which has been discussed in the Fishcooking forum and confirmed as such by the maintainers. Potential bug fix solutions shall be first discussed in the forum, then tested in the framework.
  • hippopotamus = a patch which adds at least 10 lines of code.
  • STC = Short time control (10+0.1)
  • LTC = Long time control (30+0.3)
  • SPRT(x,y) = SPRT test with elo0 = x and elo1 = y

The following recommendations for choosing the right parameters are best understood with the graphical SPRT calculator at http://chess-sprt-calc.azurewebsites.net, which draws curves displaying the pass rate and average length of runs for various values of SPRT(x,y). When in doubt, stick with the standard test parameters.
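
The bounds shown in a result line such as LLR: 2.96 (-2.94,2.94) are the stopping thresholds of the SPRT. As a sketch, the classic Wald thresholds can be computed from the two error rates; the values alpha = beta = 0.05 used below are an assumption, chosen because they reproduce the (-2.94, 2.94) bounds shown in the example on this page. The function name sprt_bounds is made up.

```shell
#!/bin/bash
# Wald SPRT thresholds: lower = ln(beta / (1 - alpha)), upper = ln((1 - beta) / alpha).
# Arguments: alpha (false positive rate), beta (false negative rate).
sprt_bounds() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    printf "%.2f %.2f\n", log(b / (1 - a)), log((1 - b) / a)
  }'
}

# Assumed error rates; a test stops once the LLR leaves this interval.
sprt_bounds 0.05 0.05   # prints: -2.94 2.94
```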

Standard tests

We use these for almost all our tests. They are our workhorse, designed to commit only robust patches that almost surely work. Our goal is to minimize the possibility of regression and to avoid adding unnecessary complexity. Standard tests are also the fastest tests in the framework.

  1. STC: SPRT(0, 10)
  2. LTC: SPRT(0, 10)

Parameter tweaks tests

For parameter tweaks only, i.e. patches where the only change is in the value of one or more parameters, you can use the following (more sensitive) SPRT parameters. The rationale is that parameter tweaks do not introduce new code, and a successful parameter tweak usually falls in the +1 to +3 ELO ballpark.

Among parameter tweaks, a special sub-case is the so-called union patch or combo patch, that is, a bundle of patches that each failed SPRT but with a positive or near-positive score. Sometimes retesting the union as a whole passes SPRT. Due to the nature of the approach, and because each individual patch has already failed, a union has some constraints:

  • Maximum 2 patches per union
  • Each patch shall be trivial, like a parameter tweak. Patches that add/remove a concept/idea/feature shall pass individually.

  1. STC: SPRT(0, 8)
  2. LTC: SPRT(0, 8)

No-regression tests

These must be used for all simplifications, even one-liners, to test whether removing the code is detrimental to Stockfish's strength. We try to reject an ELO loss, and even a neutral patch can fail. Nevertheless, because the code under test is simpler or smaller than the original, we do not require the stricter standard mode. These tests are also used for bug fixes and other special cases, but only after being discussed and approved in advance, so that no-regression mode does not become a preferred shortcut around the stricter standard mode.

  1. STC: SPRT(-10, 5)
  2. LTC: SPRT(-10, 5)

Non-functional changes

If your patch is a non-functional change, you usually do not need to run it through fishtest.

Code refactoring

Send a pull request directly, explaining the rationale of your changes in the pull request.

Speed optimization

Test it on your machine by running stockfish bench. It is recommended to run it several times and compute the mean and standard deviation to see if the improvement is statistically significant. On Linux or Windows (MSYS2), you can use this script:

#!/bin/bash
if [[ $# -ne 3 ]]; then
  echo "usage:" $0 "base test n_runs"
  echo "example:" $0 "./stockfish_base ./stockfish_test 10"
  exit 1
fi

base=$1
test=$2
n_runs=$3

# temporary files initialization
cat /dev/null > base000.txt
cat /dev/null > test000.txt
cat /dev/null > tmp000.txt

# preload of CPU/cache/memory
($base bench >/dev/null 2>&1)&
($test bench >/dev/null 2>&1)&
wait

# bench loop: SMP bench with background subshells
for k in `seq 1 $n_runs`;
  do
    printf "run %3d /%3d\n" $k $n_runs

    # swap the execution order to avoid bias
    if [ $((k%2)) -eq 0 ];
      then
        ($base bench >/dev/null 2>> base000.txt)&
        ($test bench >/dev/null 2>> test000.txt)&
        wait
      else
        ($test bench >/dev/null 2>> test000.txt)&
        ($base bench >/dev/null 2>> base000.txt)&
        wait
    fi
  done

# text processing to extract nps values
cat base000.txt | grep second | grep -Eo '[0-9]{1,}' > base001.txt
cat test000.txt | grep second | grep -Eo '[0-9]{1,}' > test001.txt

for k in `seq 1 $n_runs`;
  do
    echo $k >> tmp000.txt
  done

printf "\nrun\tbase\ttest\tdiff\n"
paste tmp000.txt base001.txt test001.txt | awk '{printf "%3d  %d  %d  %+d\n", $1, $2, $3, $3-$2}'
paste base001.txt test001.txt | awk '{printf "%d\t%d\t%d\n", $1, $2, $2-$1}' > tmp000.txt

# compute: sample mean, 1.96 * std of sample mean (95% of samples), speedup
# std of sample mean = sqrt(NR/(NR-1)) * (std population) / sqrt(NR)  
awk '{sum1 += $1 ; sumq1 += $1^2 ; sum2 += $2 ; sumq2 += $2^2 ; sum3 += $3 ; sumq3 += $3^2 } END {printf "\nbase = %10d +/- %d\ntest = %10d +/- %d\ndiff = %10d +/- %d\nspeedup = %.6f\n\n", sum1/NR , 1.96 * sqrt(sumq1/NR - (sum1/NR)^2)/sqrt(NR-1) , sum2/NR , 1.96 * sqrt(sumq2/NR - (sum2/NR)^2)/sqrt(NR-1) , sum3/NR , 1.96 * sqrt(sumq3/NR - (sum3/NR)^2)/sqrt(NR-1) , (sum2 - sum1)/sum1 }' tmp000.txt

# remove temporary files 
rm -f base000.txt test000.txt tmp000.txt base001.txt test001.txt

Then send us a pull request. We will verify the speedup, for three reasons:

  1. The speedup needs to be statistically significant and not just random noise.
  2. The speedup needs to be confirmed on different machines. Sometimes a speedup on one machine is a slowdown on another.
  3. The speedup needs to be put in perspective with the code changes. If the code changes are invasive and/or ugly, only to achieve a small speedup, we will probably not accept the patch. This is subject to our judgment.

If the above steps are not enough to decide whether to commit the patch, then the patch will go through a special fishtest SPRT(0, 10) test at STC. The rationale is that a speedup is comparable to a normal patch: it adds complexity with the aim of improving ELO, so it makes sense to test it under the same conditions. Because a speed optimization is a non-functional change, STC is enough; there is no need to also run the LTC.
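
To see what the "+/-" figures printed by the benchmark script above represent, here is the same statistic in isolation: the sample mean and 1.96 times the standard error of the mean. The helper name mean_ci and the three input values are made up for illustration; real nps values will of course be much larger.

```shell
#!/bin/bash
# Read one number per line and print "mean +/- half-width", where the half-width
# is 1.96 * (population std) / sqrt(N - 1), as in the benchmark script above.
mean_ci() {
  awk '{ sum += $1; sumq += $1 ^ 2 }
       END {
         if (NR < 2) { print "need at least two values"; exit 1 }
         mean = sum / NR
         printf "%.1f +/- %.1f\n", mean, 1.96 * sqrt(sumq / NR - mean ^ 2) / sqrt(NR - 1)
       }'
}

# Made-up sample values.
printf '100\n110\n90\n' | mean_ci   # prints: 100.0 +/- 11.3
```

If the intervals for base and test overlap heavily, the measured speedup is probably within the noise, and more runs are needed.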