Using KaTrain with a CoreML-KataGo on Apple Silicon

Since I am using macOS on Apple Silicon, I wanted to use KaTrain with the latest KataGo 1.15.3 and CoreML versions of the strongest stable b18 and b28 models. That took a lot of compiling, converting and fiddling around.

Using CoreML is much better than OpenCL because CoreML can run the neural network on the Apple Neural Engine (ANE), so you get a lot more playouts in the same time.

ChinChangYang has forked KataGo on GitHub and improved it so it can run CoreML models. He has provided instructions on how to use it:

https://github.com/ChinChangYang/KataGo/blob/metal-coreml-stable/docs/CoreML_Backend.md

I wanted to keep completely isolated environments for each version of KataGo and each network, and decided that they should live in ~/Documents/apps/.

mkdir -p ~/Documents/apps/katago-1.15.3-coreml-b18c384nbt-s9996604416

After compiling KataGo, I copied the resulting files so the directory contents look like this:

KataGoModel19x19fp16.mlpackage
KataGoModel19x19fp16v14s9996604416.mlpackage
coreml_analysis.cfg
kata1-b18c384nbt-s9996604416-d4316597426.bin.gz
katago

KataGoModel19x19fp16.mlpackage is a symlink to KataGoModel19x19fp16v14s9996604416.mlpackage.
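
The symlink can be created from inside the directory. This is only a minimal sketch: the helper name link_model is my own invention and not part of KataGo, and the file names are the ones from the listing above.

```shell
# link_model DIR VERSIONED_PACKAGE
# Hypothetical helper: makes KataGoModel19x19fp16.mlpackage in DIR a
# relative symlink to the versioned package sitting next to it.
link_model() {
    (
        cd "$1" || exit 1
        # Refuse to create a dangling link.
        [ -e "$2" ] || { echo "missing: $1/$2" >&2; exit 1; }
        # -s symbolic, -f replace an existing link, -n do not follow it
        ln -sfn "$2" KataGoModel19x19fp16.mlpackage
    )
}

# Example call:
# link_model ~/Documents/apps/katago-1.15.3-coreml-b18c384nbt-s9996604416 \
#     KataGoModel19x19fp16v14s9996604416.mlpackage
```

Because the link is relative, it keeps working if the whole directory is moved or renamed.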

Then in KaTrain, “General and Engine Settings”, I left all “Path to…” fields empty and in “Override Engine Command”, used:

/Users/marcel/Documents/apps/katago-1.15.3-coreml-b18c384nbt-s9996604416/katago analysis -model /Users/marcel/Documents/apps/katago-1.15.3-coreml-b18c384nbt-s9996604416/kata1-b18c384nbt-s9996604416-d4316597426.bin.gz -config /Users/marcel/Documents/apps/katago-1.15.3-coreml-b18c384nbt-s9996604416/coreml_analysis.cfg -override-config homeDataDir=/Users/marcel/.katrain

Using a b28 model

I have ChinChangYang’s fork of KataGo in ~/3rd/go/. So I switched to the right branch:

cd ~/3rd/go/KataGo
git switch metal-coreml-stable

Then I also followed the instructions on the page linked above.

When installing miniconda, it wrote some code to ~/.bash_profile but since I use zsh I copied the code to a separate script which I called conda-shell-setup:

# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/opt/homebrew/Caskroom/miniconda/base/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/opt/homebrew/Caskroom/miniconda/base/etc/profile.d/conda.sh" ]; then
        . "/opt/homebrew/Caskroom/miniconda/base/etc/profile.d/conda.sh"
    else
        export PATH="/opt/homebrew/Caskroom/miniconda/base/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<

Then after starting a new shell I sourced this script with . conda-shell-setup.
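
To avoid wondering later whether the script was actually sourced, a small wrapper can do the sourcing and check the result. This is a sketch; load_conda is a made-up helper name, and the location of conda-shell-setup in the example call is an assumption.

```shell
# load_conda SETUP_FILE
# Hypothetical helper: sources the setup script if it exists, then
# reports whether a `conda` command is available in this shell.
load_conda() {
    [ -f "$1" ] || { echo "no such file: $1" >&2; return 1; }
    . "$1"
    command -v conda >/dev/null 2>&1
}

# Example call (the path is an assumption, use wherever you saved the script):
# load_conda "$HOME/conda-shell-setup" && conda --version
```

Passing a full path sidesteps the question of where the shell looks for a file given to the . command.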

Between the conda create and conda activate steps I had to run conda init; conda told me about this.

From https://katagotraining.org/networks/ I downloaded the “Raw Checkpoint” file for the strongest confidently-rated b28 network. I unzipped it to get the .ckpt file.

Then I created the binary model and the CoreML model following the instructions, only using a different path to the Python programs, i.e. ~/3rd/go/KataGo/python/...

Assuming that the downloaded model.ckpt file is in the current directory, you now have the following files:

KataGoModel19x19fp16.mlpackage/Data/com.apple.CoreML/model.mlmodel
KataGoModel19x19fp16.mlpackage/Data/com.apple.CoreML/weights/weight.bin
KataGoModel19x19fp16.mlpackage/Manifest.json
model/log.txt
model/metadata.json
model/model.bin.gz
model.ckpt

I reorganized the files:

mv model/model.bin.gz .
rm -rf model.ckpt model

I wanted to keep the b18 and b28 networks side-by-side so I created a new directory for the networks, the katago executable and the analysis config file.

mkdir ~/Documents/apps/katago-1.15.3-coreml-b28c512nbt-s7709128960

Working in ~/Documents/apps, I copied over katago and the analysis config file from the b18-related directory:

cp katago-1.15.3-coreml-b18c384nbt-s9996604416/katago katago-1.15.3-coreml-b18c384nbt-s9996604416/coreml_analysis.cfg katago-1.15.3-coreml-b28c512nbt-s7709128960

Then I copied the model files generated above to this directory. So now it looked like:

KataGoModel19x19fp16.mlpackage/Data/com.apple.CoreML/model.mlmodel
KataGoModel19x19fp16.mlpackage/Data/com.apple.CoreML/weights/weight.bin
KataGoModel19x19fp16.mlpackage/Manifest.json
coreml_analysis.cfg
katago
model.bin.gz

Note that unlike in the b18 directory, here we don’t need a symlink, and the binary model is simply called model.bin.gz.
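
Before pointing KaTrain at a directory, it can help to confirm that all the pieces are there. The following is a sketch under the layout shown above; check_engine_dir is a hypothetical helper, not something KataGo or KaTrain provides.

```shell
# check_engine_dir DIR MODEL_FILE
# Hypothetical helper: reports anything missing from a KataGo directory
# laid out as above. Returns non-zero if something is missing.
check_engine_dir() {
    missing=0
    [ -x "$1/katago" ] || { echo "not executable: $1/katago"; missing=1; }
    for f in coreml_analysis.cfg "$2"; do
        [ -f "$1/$f" ] || { echo "missing file: $1/$f"; missing=1; }
    done
    # The .mlpackage is a directory (or a symlink to one, as in the b18 setup).
    [ -d "$1/KataGoModel19x19fp16.mlpackage" ] ||
        { echo "missing package: $1/KataGoModel19x19fp16.mlpackage"; missing=1; }
    return $missing
}

# Example call:
# check_engine_dir ~/Documents/apps/katago-1.15.3-coreml-b28c512nbt-s7709128960 model.bin.gz
```

The same function works for the b18 directory when given kata1-b18c384nbt-s9996604416-d4316597426.bin.gz as the model file.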

Now KaTrain’s “Override Engine Command” needs to look like this:

/Users/marcel/Documents/apps/katago-1.15.3-coreml-b28c512nbt-s7709128960/katago analysis -model /Users/marcel/Documents/apps/katago-1.15.3-coreml-b28c512nbt-s7709128960/model.bin.gz -config /Users/marcel/Documents/apps/katago-1.15.3-coreml-b28c512nbt-s7709128960/coreml_analysis.cfg -override-config homeDataDir=/Users/marcel/.katrain

By changing that KaTrain setting it’s easy to switch between the b18 and the b28 model.
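
Since both commands follow the same pattern, they can be generated rather than typed. This is a sketch that assumes the directory layout described above; engine_command is a hypothetical helper name, and it uses only the katago arguments already shown in this post.

```shell
# engine_command DIR MODEL_FILE
# Hypothetical helper: prints the "Override Engine Command" for a directory.
engine_command() {
    printf '%s analysis -model %s -config %s -override-config homeDataDir=%s\n' \
        "$1/katago" "$1/$2" "$1/coreml_analysis.cfg" "$HOME/.katrain"
}

apps="$HOME/Documents/apps"
# b18 command
engine_command "$apps/katago-1.15.3-coreml-b18c384nbt-s9996604416" \
    kata1-b18c384nbt-s9996604416-d4316597426.bin.gz
# b28 command
engine_command "$apps/katago-1.15.3-coreml-b28c512nbt-s7709128960" model.bin.gz
```

On macOS the output of a single call can be piped to pbcopy and then pasted into the KaTrain setting. The sketch does no quoting, so it assumes paths without spaces.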

Conclusions

The b28 network is a lot slower than the b18 network at equal playouts.

I ran a full analysis at 500 playouts per move on one of my recent tournament games, which has 258 moves. With the b18 network this took 10 minutes and 25 seconds. With the b28 network it took 17 minutes and 40 seconds, which is about 70% longer.
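
For anyone who wants to redo this comparison with their own timings, the arithmetic can be sketched as follows. The numbers are the ones from my test; dividing by the move count gives only a rough per-move figure.

```shell
# to_seconds MINUTES SECONDS: convert a duration to whole seconds
to_seconds() {
    echo $(( $1 * 60 + $2 ))
}

moves=258
b18=$(to_seconds 10 25)   # 625 seconds
b28=$(to_seconds 17 40)   # 1060 seconds

# The shell only does integer arithmetic, so awk handles the divisions.
awk -v b18="$b18" -v b28="$b28" -v n="$moves" 'BEGIN {
    printf "b18: %.2f s per move\n", b18 / n
    printf "b28: %.2f s per move\n", b28 / n
    printf "b28 takes %.2f times as long as b18\n", b28 / b18
}'
```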

The win rate graphs were very similar between the b18 and the b28 networks. I did not check move by move whether the recommended move, the recommended sequence or the point loss was the same between the two networks.

All this is on a MacBook Air M1 from 2020.

Overall I think it’s not worth using the b28 network for now. The b18 analysis is just as valuable and a lot faster.