Analysis of AlphaZero training data [D]

I am trying to train an AlphaZero model for Othello on a 6x6 board. Having been warned that too little exploration during data generation can lead to models being overconfident and trapped in some tight region of the search tree, I started with c_puct = 4.0 and reduced it to 3.5 after a few generations. I also added fairly peaked Dirichlet noise (alpha = 0.15) to the prior predictions at the root of each tree search, mixed in with weight epsilon = 0.25. The move-selection temperature was initially set to 1.0 and then reduced to 0.8 after 20 generations.
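
For concreteness, here is a minimal sketch of where these three settings usually enter self-play, assuming the standard AlphaZero formulation (PUCT selection with a constant c_puct, root priors mixed as (1 - epsilon) * p + epsilon * noise, and moves sampled in proportion to N^(1/T)). The function names are illustrative only and may not match any particular implementation.

```python
import math
import random


def add_root_noise(priors, alpha=0.15, epsilon=0.25, rng=random):
    """Mix Dirichlet(alpha) noise into root priors: (1 - eps) * p + eps * noise."""
    # A Dirichlet sample is a vector of independent Gamma(alpha, 1) draws, normalized.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in priors]
    total = sum(gammas)
    if total == 0.0:
        # Extremely unlikely underflow case: fall back to uniform noise.
        noise = [1.0 / len(priors)] * len(priors)
    else:
        noise = [g / total for g in gammas]
    return [(1 - epsilon) * p + epsilon * n for p, n in zip(priors, noise)]


def puct_score(q, prior, parent_visits, child_visits, c_puct=4.0):
    """Selection score: Q + c_puct * P * sqrt(N_parent) / (1 + N_child)."""
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)


def move_probabilities(visit_counts, temperature=1.0):
    """Turn root visit counts into a move distribution proportional to N^(1/T)."""
    weights = [n ** (1.0 / temperature) for n in visit_counts]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    rng = random.Random(0)
    print(add_root_noise([0.5, 0.3, 0.2], rng=rng))
    print(puct_score(q=0.0, prior=0.5, parent_visits=16, child_visits=3))
    print(move_probabilities([10, 30], temperature=0.8))
```

Under these assumptions, a larger c_puct gives more weight to the prior-driven exploration term relative to Q, and lowering the temperature from 1.0 to 0.8 sharpens the move distribution toward the most-visited move.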