java - 为什么 Alpha/Beta 剪枝对我的 MiniMax 算法没有影响?

标签 java algorithm minimax alpha-beta-pruning

首先,我很抱歉标题有点不正确,我只是不想让它有 30 个字那么长。 当我将它应用到我的 TicTacToe 游戏时,我实现的 alpha/beta 修剪极大地减少了评估的数量,请参见下面的内容。

每对评估计数均使用与输入相同的游戏状态进行测量。

Pruning vs no pruning

当我想对我一直在研究的玩神经网络的西洋跳棋实现修剪时,问题就出现了。这就是整个事情的目标,我刚刚制作了 TicTacToe 游戏来试验 MiniMax + Alpha/Beta,因为我以前从未处理过这些算法。

这是使用神经网络进行的同类实验。

checkers minimax evals checkers minimax with pruning evals

现在是代码(西洋跳棋,如果您想看一下 TicTacToe 版本,请告诉我,尽管它们几乎相同)。

我将只粘贴一次这两种方法的开头,因为它们完全相同,我将显示两个签名,因为它们略有不同。

Small note to make the code more clear.

Board is the object which keeps track of pieces, available moves, which turn it is, if the game has been won/drawn etc...

Move is the object which contains all information pertinent to moves, when I make the clone as the first line of the method I simply make a clone of the given board and the constructor applies the given move to it.

private double miniMax(Board b, Move m, int depth) {

private double alphaBeta(Board b, Move m, int depth, double alpha, double beta) {

两种方法的开始:

Testboard clone = new Testboard(b, m);
    // Making a clone of the board in order to
    // avoid making changes to the original one

    if (clone.isGameOver()) {

        if (clone.getLoser() == null) 
            // It's a draw, evaluation = 0
            return 0;   

        if (clone.getLoser() == Color.BLACK)
            // White (Max) won, evaluation = 1
            return 1;

        // Black (Min) won, evaluation = -1
        return -1;  
    } 

    if (depth == 0) 
        // Reached the end of the search, returning current Evaluation of the board
        return getEvaluation(clone);

常规 MiniMax 延续:

    // If it's not game over
    if (clone.getTurn() == Color.WHITE) {

        // It's white's turn (Maxing player)
        double max = -1;
        for (Move move : clone.getMoves()) {
            // For each children node (available moves)
            // Their minimax value is calculated
            double score = miniMax(clone, move, depth-1);
            // Only the highest score is stored
            if (score > max)
                max = score;
        }
        // And is returned
        return max;
    } 

    // It's black's turn (Min player)
    double min = 1;
    for (Move move : clone.getMoves()) {
        // For each children node (available moves)
        // Their minimax value is calculated
        double score = miniMax(clone, move, depth-1);
        // Only the lowest score is stored
        if (score < min)
            min = score;
    }
    // And is returned
    return min;
}

带有 Alpha/Beta 剪枝延续的 MiniMax:

    // If it's not game over
    if (clone.getTurn() == Color.WHITE) {

        // It's white's turn (Maxing player)
        for (Move move : clone.getMoves()) {

            // For each children node (available moves)
            // Their minimax value is calculated                
            double score = alphaBeta(clone, move, depth-1, alpha, beta);

            if (score > alpha)
                // If this score is greater than alpha
                // It is assigned to alpha as the new highest score
                alpha = score;
            if (alpha >= beta)
                // The cycle is interrupted early if the value of alpha equals or is greater than beta
                break;
        }
        // The alpha value is returned
        return alpha;
    } 

    // It's black's turn (Min player)
    for (Move move : clone.getMoves()) {

        // For each children node (available moves)
        // Their minimax value is calculated            
        double score = alphaBeta(clone, move, depth-1, alpha, beta);

        if (score < beta)
            // If this score is lower than beta
            // It is assigned to beta as the new lowest score
            beta = score;
        if (alpha >= beta)
            // The cycle is interrupted early if the value of alpha equals or is greater than beta
            break;
    }
    // The beta value is returned
    return beta;
}

老实说,我被困住了,我不确定我该怎么做才能弄清楚发生了什么。我已经在几个不同甚至随机生成的神经网络上尝试过 MiniMax+A/B,但在评估次数方面我从未见过任何改进。我希望这里的人能够对这种情况有所了解,谢谢!

最佳答案

感谢@maraca 帮我解决了这个问题,因为他只回复了一条评论,所以要回答我自己。

我发布的代码没有任何问题,问题出在搜索达到深度限制后我使用的评估函数。

我使用的是一个仍然未经训练的神经网络,它本质上只是吐出随机值,这迫使 MiniMax+A/B 遍历所有节点,因为与答案不一致,结果证明这是必要的修剪发生。

关于java - 为什么 Alpha/Beta 剪枝对我的 MiniMax 算法没有影响?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/42988523/

相关文章:

java - 从 DOM 中删除节点

用于短文本的 JavaScript 字典编码器

algorithm - 如果该行或列包含 0,则将矩阵中的每个单元格设置为 0

prolog - Minimax 在 "Prolog Programming for Artificial Intelligence"中实现 - min_to_move/1 和 max_to_move/1 是什么?

c++ - 使用 Alpha Beta 剪枝提高此 MiniMax 的性能

algorithm - 为 Tic Tac Toe 的极小极大算法找出可能的走法

java - 子类中的自制方法显示语法错误

java - ThreadLocal 与新局部变量相比的真正优势

java - 是否推荐支持 Comet 的轻量级可嵌入 Java Web 框架?

algorithm - XOR在算法中有哪些实际应用