Progress stalled? Currently checking #112

Open · Vandertic opened this issue Apr 6, 2020 · 6 comments

@Vandertic (Member)

The Elo of recently promoted networks is not very exciting. We will run an exceptional number of promotion matches for the current generation, to see if we can understand what the problem may be.
Stay tuned.

@l1t1 commented Apr 22, 2020

Maybe the 12x256 structure is not as good as 15x192?

@Vandertic (Member Author)

Going to try an experiment today. There will be quite a lot of fast, small matches to get some feedback: do not panic! :-)

@wonderingabout commented May 17, 2020

It seems we spotted a false positive (38a13344) just in time and avoided a false promotion. This gives me a few ideas on how SAI's promotion strategy could be improved; I'd like to know if you find some of them interesting 🙂

  1. Just like we manually did here, instead of always promoting, we could change the promoted network if 3+ older networks have a 55%+ result against it, and play new reference matches to see if another network of the promotion batch performs better. If yes, promote it; if no, take another network of the promotion batch and repeat, etc.

  2. It seems that as SAI gets stronger and stronger, the differences in strength between networks get thinner and thinner. In the past there were big 65% jumps all the time, but I think it is the natural growth of an AI to show smaller and smaller differences. So I think we may need to increase the significance of the promotion matches, from 50 games to 80-100, for example. Since we promote only one network in the end, we could reduce the number of candidates in a cycle by about 2 to compensate for the extra games (even if we play 50-100 extra games per cycle all in all, it may be a worthwhile investment; see the sketch after this list for rough numbers).

  3. An idea I'm not sure about: to increase the significance of the promotion matches, we could reduce visits to something like 1600 (self-play games would not be affected, still 3500). At a 90-95% reliability level, I guess we would get almost the same reliability of strength comparison, but we would produce double or triple the games. What do you think?
    Changing visits in promotion matches should also be reasonably safe, especially if the number of games is increased to match
    (e.g. from 50 (70) games at 3500 visits to 120 (140) games at 1750 visits).
    For example, we already do this in games against LZ networks at 1600 visits and we trust the comparison data we get:
    http://sai.unich.it/viewmatch/29c3a5ed69d23c37a7444c476222824c40239dc48e083f2a6bc316ed78a4f492?viewer=wgo

  4. In a similar vein, we saw in Leela Zero's journey that the stronger it gets, the longer it takes to get stronger. So far SAI has been fine with 5k-game promotions, but what if these small steps introduce a bigger instability, by promoting many similar networks that may in fact be weaker? I understand there's a concern about diversity, so maybe we could increase the window between two promotions to something like 7.5k-10k games at first to see (Leela Zero sometimes easily took 50k to 300k games between promotions).
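To put rough numbers on points 2 and 3, here is a minimal sketch (my own back-of-the-envelope calculation, not SAI code) of how the 95% confidence interval on a measured win rate shrinks with the number of match games, using the normal approximation to the binomial:

```python
import math

def ci95_half_width(p: float, games: int) -> float:
    """Half-width of the 95% confidence interval for a win rate p
    measured over `games` games (normal approximation to the binomial)."""
    return 1.96 * math.sqrt(p * (1 - p) / games)

# A 55% observed score is only weak evidence until the sample grows:
for games in (50, 100, 140, 400):
    print(f"{games:4d} games: 55% +/- {ci95_half_width(0.55, games):.1%}")
```

With 50 games the interval is about ±14 percentage points, so a 55% score is still compatible with a weaker network; at 100-140 games it narrows to roughly ±8-10 points, which is why playing more, cheaper games could sharpen the comparison.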

What do you think about all this?

@Vandertic (Member Author)

About promotion strategy: we did many different experiments in 7x7 runs and three experiments in 9x9 runs, so that we could more or less get it right on the first attempt at 19x19.
We proved (but only on 7x7, of course) that no gating works better than gating for SAI: it gives faster growth and higher final strength, and maybe the greater diversity in training data is a plus too.

We are now using a weak gating strategy, tested with good results in one of the 9x9 runs: always promote a network after the same number of games (5120, currently), but test a small number of candidates and choose the best among those. The aim is to avoid picking a very bad network, not to always have a real improvement.
These tests have another important aim: to experiment with different hyperparameters. No one really knows what the right training rate is, or for how many steps we should train in each generation, for example. We try different configurations for each block of 3 or 5 networks, then compare the results and decide whether anything needs to change.
You can find the recent training "recipes" here.
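To make the procedure concrete, here is a minimal sketch of the weak gating loop as described above (all names, including the helper `play_match`, are hypothetical, not the actual SAI pipeline):

```python
import random

def play_match(net, games):
    """Hypothetical stand-in: plays `games` promotion-match games and
    returns the candidate's score against the current network."""
    return random.random()  # replace with real match results

def weak_gating_promotion(candidates, match_games=50):
    """Always promote after a fixed number of self-play games (5120,
    currently): test a small batch of candidates and keep the best one.
    The aim is to avoid picking a very bad network, not to guarantee a
    strict improvement over the current one."""
    best_net, best_score = None, -1.0
    for net in candidates:
        score = play_match(net, games=match_games)
        if score > best_score:
            best_net, best_score = net, score
    return best_net  # promoted even if best_score <= 0.5 (weak gating)
```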

Recently we have been hand-picking the networks a little, doing early reference and comparison tests before promotion, to be sure that we are choosing correctly. In principle, this could be avoided completely: even if we let the procedure choose a network that is a step back in strength, it will be a small step back (no more than 50 Elo, I would say) and recovered quickly in the next generations.
It is even possible that this hand-picking is actually destabilizing the procedure, by choosing greedily rather than exploring a little more. So we are basically not even sure it is a good idea, but it seems worth trying, both for communication reasons (it is easier to make people happy as long as they see a positive derivative) and as a patch for a very annoying problem we are facing.

The problem is that at this level of play, the final results of games do not hold enough information to discriminate between similar networks. Of course you can have big steps in strength when you increase blocks or make other major changes, but generally similar nets will score around 50% against each other. You can see that there is not much difference in our results against LZ107 through LZ116.

If you study a match game with a stronger AI, you can identify the key move that lost the game. If you then check again with the AI that made the mistake, you will often see that the move was chosen by chance, in the sense that if you ask two or three times it will often change its mind and choose the right move instead. So I believe that the result of match games at this level is 5% difference in strength and 95% a fair coin toss.

We will need to find a way to compare networks at a finer level of detail: move by move, instead of game by game. We have ideas, but it is a lot of work, so maybe in the future.
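One conceivable shape for such a move-by-move comparison, just to illustrate the idea (all helper functions here are hypothetical wrappers around an engine, not existing SAI code):

```python
def mean_value_loss(net, positions, reference):
    """Rate a network move by move instead of game by game: for each test
    position, measure how much win probability the network's chosen move
    gives up according to a stronger reference engine.  Every position
    contributes a sample, so a few hundred positions carry far more
    information than a 50-game match.  `best_move`, `evaluate` and
    `legal_moves` are hypothetical helpers."""
    losses = []
    for pos in positions:
        move = best_move(net, pos)
        best = max(evaluate(reference, pos, m) for m in legal_moves(pos))
        losses.append(best - evaluate(reference, pos, move))
    return sum(losses) / len(losses)
```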

In conclusion, we are not changing the weak-gating strategy; in particular, every promoted network will play the same number of games and then be replaced. Apart from this, we will listen to suggestions, but remember that match game results are noisy and carry little information, and that it is not so important to always have a stronger net.

Finally, I believe that while we can play with the visit number in reference, panel and comparison games, for promotion matches we want to keep the visit number equal to the official one for self-play, because this way we are judging networks fairly. (Recall that v=1 means only the policy head, and the higher the visits, the more weight you put on the value head; so if you change the visits, you weight the two components differently.)
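For intuition on how visits shift weight between the two heads, here is a generic AlphaZero-style PUCT selection term (illustrative only, not SAI's exact formula):

```python
import math

def puct_score(q_value, prior, parent_visits, child_visits, c_puct=1.0):
    """Generic PUCT child-selection score.  At very low visit budgets the
    exploration term, driven by the policy prior, dominates; as visits
    accumulate, q_value (the value-head estimate) takes over.  Changing
    the promotion-match visit budget therefore re-weights the two heads."""
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q_value + exploration
```

So a match at 1600 visits and a match at 3500 visits test slightly different mixtures of policy and value, which is the fairness concern above.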

@wonderingabout commented May 17, 2020

> We proved (but only on 7x7, of course) that no gating works better than gating for SAI: it gives faster growth and higher final strength, and maybe the greater diversity in training data is a plus too.

Yes, no gating is great too :)

> We are now using a weak gating strategy, tested with good results in one of the 9x9 runs: always promote a network after the same number of games (5120, currently), but test a small number of candidates and choose the best among those. The aim is to avoid picking a very bad network, not to always have a real improvement.

Good point.

> These tests have another important aim: to experiment with different hyperparameters. No one really knows what the right training rate is, or for how many steps we should train in each generation, for example. We try different configurations for each block of 3 or 5 networks, then compare the results and decide whether anything needs to change.
> You can find the recent training "recipes" here.

Ah, I wasn't aware of this. So the data serves many purposes; good initiative indeed.

> Recently we have been hand-picking the networks a little, doing early reference and comparison tests before promotion, to be sure that we are choosing correctly. In principle, this could be avoided completely: even if we let the procedure choose a network that is a step back in strength, it will be a small step back (no more than 50 Elo, I would say) and recovered quickly in the next generations.
> It is even possible that this hand-picking is actually destabilizing the procedure, by choosing greedily rather than exploring a little more. So we are basically not even sure it is a good idea, but it seems worth trying, both for communication reasons (it is easier to make people happy as long as they see a positive derivative) and as a patch for a very annoying problem we are facing.

Yes, hand-picking is generally something we'd want to avoid; ideally the AI training would be autonomous, for sustainability.

I admit as a contributor that I'm always looking forward to promotions that increase strength, but I am not expecting it to always improve either (I know it can go up and down normally). Still, I think it may be a letdown for some contributors to see it "regress" in their eyes (even though results show consistent, if not rising, global strength against LZ networks).

I think this is something SAI would greatly benefit from communicating more about. Instead of the "generic" Leela Zero introduction, some SAI-specific words like:

> In SAI we don't use gating (55% promotions); our greatest purpose is not to promote a very strong network but to avoid a very bad one.

> It is expected that the Elo goes up and down, but this is important for the diversity and health of SAI, and the global strength trend is expected to rise over dozens of generations, as can be seen in comparison matches against Leela Zero.

> We also use all these matches to experiment with different hyperparameters (lambda, beta, etc.) to better tune SAI's future training.

An introduction like that on the website, in SAI's own words, would be super effective and appreciated.

I didn't finish reading, but I'm commenting this first so I don't randomly lose it :)

@wonderingabout commented May 17, 2020

> The problem is that at this level of play, the final results of games do not hold enough information to discriminate between similar networks. Of course you can have big steps in strength when you increase blocks or make other major changes, but generally similar nets will score around 50% against each other. You can see that there is not much difference in our results against LZ107 through LZ116.
>
> If you study a match game with a stronger AI, you can identify the key move that lost the game. If you then check again with the AI that made the mistake, you will often see that the move was chosen by chance, in the sense that if you ask two or three times it will often change its mind and choose the right move instead. So I believe that the result of match games at this level is 5% difference in strength and 95% a fair coin toss.
>
> We will need to find a way to compare networks at a finer level of detail: move by move, instead of game by game. We have ideas, but it is a lot of work, so maybe in the future.
>
> In conclusion, we are not changing the weak-gating strategy; in particular, every promoted network will play the same number of games and then be replaced. Apart from this, we will listen to suggestions, but remember that match game results are noisy and carry little information, and that it is not so important to always have a stronger net.
>
> Finally, I believe that while we can play with the visit number in reference, panel and comparison games, for promotion matches we want to keep the visit number equal to the official one for self-play, because this way we are judging networks fairly. (Recall that v=1 means only the policy head, and the higher the visits, the more weight you put on the value head; so if you change the visits, you weight the two components differently.)

OK, I understand it all, big thanks 💯
Hopefully other contributors will also be interested in these details you gave.

I also think SAI's weak gating approach is working fine and there is no reason to change it.

It also seems I was a bit too focused on strength improvement at every promotion (all the more reason why SAI would greatly benefit from clarifying that on the website with a small introduction).

If the significance of promotion matches can be increased (for example by increasing the game count a bit), I think it could only be a good thing, as long as it is not done at the expense of other things SAI needs.

Thanks for all this information 💯
