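java - Monte Carlo tree search is not working

I am currently writing an AI for the board game Hex. I want to use Monte Carlo tree search for this and have already tried to implement it. However, the AI makes incredibly stupid (seemingly random) moves, and I cannot figure out why it is not working.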

import java.util.ArrayList;
import java.util.Random;

/**
 * Created by Robin on 18.03.2017.
 */
public class TreeNode {


    private static final Random random = new Random();
    private static final double epsion=10e-5;
    protected double nvisits;
    protected double totValue;
    protected int move=-1;

    private HexBoard board;
    protected ArrayList<TreeNode>children ;



    public TreeNode(HexBoard board){
        this.board =board;
    }


    //Copy-Constructor
    public TreeNode(TreeNode treeNode){
        this.nvisits=treeNode.nvisits;
        this.totValue=treeNode.totValue;
        this.move=treeNode.move;
        this.board = new HexBoard(treeNode.board);

    }

    public void update(double value){
        totValue+=value*board.color;
        nvisits++;
    }



    public void expand(){
        assert(children==null);
        children = new ArrayList<>(121-board.moveCount);
        for(int i=0;i<121;i++){
            if(board.board[i]!=HexBoard.EMPTY)
                continue;

            TreeNode newNode = new TreeNode(board);
            newNode.move =i;
            children.add(newNode);
        }
    }

    public void calculateIteration(){
        ArrayList<TreeNode>visited = new ArrayList<>();
        TreeNode current =this;
        visited.add(current);

        while(!current.isLeafNode()){
            current =current.select();
            board.makeMove(current.move);
            visited.add(current);
        }

        //Found a leaf node
        double value;
        if(current.board.getWinner()==0){
            current.expand();
            TreeNode newNode =current.select();
            value =playOut(newNode.board);
        }else{
            value =current.board.getWinner();
        }

        //update all the nodes

        for(int i=1;i<visited.size();i++){
            visited.get(i).update(value);
            board.undoMove(visited.get(i).move);
        }
        visited.get(0).update(value);
    }

    public static int playOut(HexBoard board){
        int winner=0;

        if(board.moveCount==121) {
            winner=board.getWinner();

            return winner;
        }

        //Checking-Movecount vs actual stones on the board


        final double left =121-board.moveCount;
        double probibility =1/left;
        double summe =0;
        double p =random.nextDouble();

        int randomMove =0;
        for(int i=0;i<121;i++){
            if(board.board[i]!=HexBoard.EMPTY)
                continue;

            summe+=probibility;

            if(p<=summe && probibility!=0) {
                randomMove = i;
                break;
            }
        }

        board.makeMove(randomMove);
        winner =playOut(board);
        board.undoMove(randomMove);

        return winner;
    }


    public TreeNode select(){

        TreeNode bestNode=null;
        double bestValue =-10000000;
        for(TreeNode node : children){

            double uctvalue =(node.nvisits==0)?100000:(node.totValue/(node.nvisits)+Math.sqrt((Math.log(this.nvisits))/(2*node.nvisits)));
            uctvalue+=epsion*random.nextDouble();

            if(uctvalue>bestValue){
                bestValue=uctvalue;
                bestNode =node;
            }
        }

        return bestNode;
    }

    public boolean isLeafNode(){
        return (children==null);
    }
}

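Is my implementation in the calculateIteration() method correct?

I know this may not be a very appealing question, but I would appreciate any help.

Best answer

The OP added extra information in the comments below the question. The important part of that information is that the makeMove() method is implemented so that it checks which player is to move next (to make sure the board is updated correctly).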

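Given that information, the implementation of select() in the OP is incorrect, because it does not take into account which player is to move when computing the UCT score. The UCT score consists of an "exploitation" part (the first fraction, which is the average score over all previous simulations) and an "exploration" part (the term under the square root, which grows for nodes that have been visited rarely relative to their parent). When the opponent is the one who gets to move next, the exploitation part of this equation should be negated. If this is not done, the AI essentially assumes that the opponent is willing to actively help the AI, rather than assuming that the opponent will try to win for themselves.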
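
The sign flip described above can be sketched as follows. This is only a minimal illustration, not the OP's code: the names UctSketch, Stats, uctScore and selectIndex are hypothetical, and the sketch assumes that totValue accumulates playout results from one fixed perspective (+1 when the first player wins, -1 when the second player wins) and that playerToMove is +1 or -1. The exploration term keeps the same form as in the question's select().

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: Stats, uctScore and selectIndex are hypothetical
// names, not part of the OP's TreeNode or HexBoard classes.
class UctSketch {

    // Per-child statistics. ASSUMPTION: totValue sums playout results from one
    // fixed perspective (+1 = first player won, -1 = second player won).
    static final class Stats {
        final double totValue;
        final double nvisits;

        Stats(double totValue, double nvisits) {
            this.totValue = totValue;
            this.nvisits = nvisits;
        }
    }

    // UCT score of a child as seen by the player choosing at the parent.
    // playerToMove is +1 or -1; parentVisits is assumed to be at least 1.
    static double uctScore(Stats child, double parentVisits, int playerToMove) {
        if (child.nvisits == 0) {
            return Double.POSITIVE_INFINITY; // try unvisited children first
        }
        // Exploitation: average result, negated when the second player chooses.
        double exploitation = playerToMove * (child.totValue / child.nvisits);
        // Exploration: same form as in the question's select().
        double exploration = Math.sqrt(Math.log(parentVisits) / (2 * child.nvisits));
        return exploitation + exploration;
    }

    // Index of the child with the highest UCT score (the first one wins ties).
    static int selectIndex(List<Stats> children, double parentVisits, int playerToMove) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < children.size(); i++) {
            double score = uctScore(children.get(i), parentVisits, playerToMove);
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Child 0 looks good for the first player, child 1 for the second.
        List<Stats> children = Arrays.asList(new Stats(8, 10), new Stats(-8, 10));
        System.out.println(selectIndex(children, 20, +1)); // prints 0
        System.out.println(selectIndex(children, 20, -1)); // prints 1
    }
}
```

With identical visit counts the exploration terms cancel, so the choice depends only on the sign of the exploitation term: each player picks the child whose average result favors that player. How the sign should be wired into the OP's TreeNode depends on how update() and HexBoard.color interact there, which the post does not fully show.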

Source: the corresponding question on Stack Overflow: https://stackoverflow.com/questions/42913263/
