Minimax

The minimax algorithm is an algorithm for determining the optimal playing strategy in finite two-person zero-sum games with perfect information. Such games include board games like chess, Go, Othello/Reversi, checkers, Nine Men's Morris and Connect Four, in which both players always know the entire history of the game. For games with an element of chance, such as backgammon, the minimax algorithm can be extended on the basis of expected values. Usually, but not exclusively, the minimax algorithm is applied to games in which the players move alternately.

A strategy computed with the minimax algorithm is called a minimax strategy. It guarantees the player concerned the highest payoff that can be achieved regardless of the opponent's play. The pair formed by the minimax strategies of both players constitutes a Nash equilibrium.

For non-zero-sum games, in which the opponent's loss does not necessarily coincide with one's own gain, the minimax algorithm does not necessarily yield an optimal strategy.

Variants of the minimax algorithm form the core of game-playing programs such as chess programs. The increasing processing power of computers has meant that even in games as complex as chess, most people can now be beaten by a computer without difficulty.

For some games, such as the Nim game, an optimal strategy can be computed by more efficient algorithms from combinatorial game theory.
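For Nim specifically, such a strategy is given by Bouton's theorem: a position is lost for the player to move exactly when the bitwise XOR (the nim-sum) of all heap sizes is zero, so no game-tree search is required. A minimal sketch (the function name is our own):

```python
from functools import reduce
from operator import xor

def nim_losing_position(heaps):
    """Bouton's theorem: a Nim position is a loss for the player to move
    exactly when the XOR (nim-sum) of all heap sizes is zero."""
    return reduce(xor, heaps, 0) == 0

print(nim_losing_position([1, 2, 3]))  # True: 1 ^ 2 ^ 3 == 0, the mover loses
print(nim_losing_position([3, 4, 5]))  # False: nim-sum is 2, the mover can win
```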

Evaluation function

An ideal evaluation function assigns a position the value +1 if player A wins, the value -1 if player B wins, and 0 for a draw. If the search tree can be built from all game positions down to the maximum depth (that is, down to the terminal positions, where the winner is known), the algorithm plays a perfect game. In practice, however, fully building the search tree is feasible only for very simple games such as tic-tac-toe.

In almost all other games this is computationally too expensive. One therefore settles for building the search tree only down to a given search depth (the horizon). The evaluation function is modified so that very good positions for A receive very high values and very good positions for B receive very low values. Heuristics are used to estimate these values.
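As an illustration of such a heuristic, here is a minimal material-counting sketch for a chess-like game. The board encoding (a flat list of piece letters, uppercase for A, lowercase for B) and the piece values are our own illustrative assumptions, not part of the article:

```python
# Illustrative piece values for a chess-like game (pawn=1, knight/bishop=3,
# rook=5, queen=9); positive scores favour player A, negative favour B.
PIECE_VALUES = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9}

def evaluate(board):
    """Sum material: uppercase pieces belong to A, lowercase to B."""
    score = 0
    for square in board:
        if square.upper() in PIECE_VALUES:
            value = PIECE_VALUES[square.upper()]
            score += value if square.isupper() else -value
    return score

# A has a rook and a pawn, B has two pawns: the evaluation favours A.
print(evaluate(['R', 'P', 'p', 'p']))  # 5 + 1 - 1 - 1 = 4
```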

Search Tree Example

The picture shows a simple tree with search depth 4; player A is to move.

The nodes of levels 0 and 2 correspond to game situations in which player A is to move. Here the evaluation function is maximized over the child nodes: the move most favourable for A is selected and its value assigned to the parent node.

The nodes of levels 1 and 3 correspond to game situations in which player B is to move. Here the evaluation function is minimized over the child nodes: the move most favourable for B is selected and its value assigned to the parent node.

The algorithm starts at the bottom, in the leaves, and works its way up to the root. At level 3 the algorithm selects the smallest value among the child nodes and assigns it to the parent node (minimization). At level 2 the largest value among the child nodes is assigned to the parent node (maximization). This alternates until the root is reached, which is assigned the value of its largest child node. That child corresponds to the move that should be played.

Comments

  • The minimax procedure is essentially independent of the particular game, so that e.g. chess and Reversi can use the same algorithm.
  • The interface to a specific game consists of only two program components: Which moves are possible in a given game situation? How is a game situation evaluated numerically?
  • In addition to the minimax method, a game program may use further game-specific techniques, such as precomputed libraries of opening moves.

The running time of the minimax algorithm is linear in the number of moves to be checked; in general it therefore grows exponentially with increasing search depth. (Note that in theory, for a game with finitely many states, the running time is bounded by a constant, since beyond a certain depth the computation time no longer increases. Since in most games this depth can never realistically be reached, it is legitimate to speak of exponential growth.) On the other hand, a greater search depth tends to increase the quality of the result (depending on the numerical evaluation).
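To make the growth concrete: with branching factor b (moves per position) and search depth d, the number of leaf positions is b to the power d. The figure of about 30 moves per position used below is an illustrative assumption, a rough value often quoted for chess middlegames:

```python
# Each extra ply multiplies the work by roughly the branching factor b.
b = 30                      # assumed moves per position (illustrative)
for d in (2, 4, 6):
    print(d, b ** d)        # 2 900 / 4 810000 / 6 729000000
```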

There are therefore various optimized variants, such as

  • Variable search depth: if only a few moves are possible per game situation, e.g. because only a few pieces remain on the board, the search depth can be increased (and vice versa).
  • Dynamic search depth: if the numerical values at some point in the tree change strongly from level to level, the search depth can be increased locally (and vice versa).
  • Alpha-beta search can be used.

Significant time savings can be achieved by storing previously examined positions together with their evaluations. If a position can be reached from the initial position by several different move sequences, the entire search tree beneath it would otherwise have to be searched anew each time.
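Such position storage (a transposition table) can be sketched as follows. The helpers `successors` and `evaluate`, and the toy game in the usage example, are our own illustrative assumptions, not part of the article:

```python
def minimax_cached(position, depth, maximizing, successors, evaluate, table=None):
    """Minimax with a table of already-searched positions, so a position
    reached via different move orders is evaluated only once."""
    if table is None:
        table = {}
    key = (position, depth, maximizing)
    if key in table:
        return table[key]               # position seen before: reuse its value
    children = successors(position, maximizing)
    if depth == 0 or not children:
        value = evaluate(position)
    elif maximizing:
        value = max(minimax_cached(c, depth - 1, False, successors, evaluate, table)
                    for c in children)
    else:
        value = min(minimax_cached(c, depth - 1, True, successors, evaluate, table)
                    for c in children)
    table[key] = value
    return value

# Toy game for illustration: a position is an integer, each move adds 1 or 2,
# and the heuristic value of a position is the integer itself.
succ = lambda p, maximizing: [p + 1, p + 2]
print(minimax_cached(0, 4, True, succ, lambda p: p))  # 6
```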

Iterative deepening

Especially when the time for the search is limited (e.g. in tournament chess), iterative deepening is used. Here the search is repeated from the position to be examined, with the desired search depth increased step by step. If the positions already examined are stored, as described above, only the positions newly reached since the previous search need to be evaluated with the evaluation function. This continues until the available search time is exceeded or a "sufficiently good" result has been found.

Without iterative deepening, if the time limit were exceeded, in the worst case only a single move would have been examined, but that one to a very great depth. The next move, which might have secured the win after only a single further move, would never have been tried at all.
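The time-limited loop can be sketched as follows. Here `search(root, depth)` stands for an assumed depth-limited minimax returning its best result; as a simplification, each depth is allowed to complete (a real program would also abort the inner search when time runs out):

```python
import time

def iterative_deepening(root, time_limit, search):
    """Repeat a depth-limited search at depth 1, 2, 3, ... until the time
    budget is spent; the result of the last completed depth is returned."""
    deadline = time.monotonic() + time_limit
    best = None
    depth = 1
    while time.monotonic() < deadline:
        best = search(root, depth)   # completed result at this depth
        depth += 1
    return best

# Dummy stand-in for a real depth-limited search, for illustration only:
result = iterative_deepening(0, 0.05, lambda root, d: ('move', d))
print(result[0])  # 'move'
```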

Implementation

Main program ( excerpt):

   gespeicherterZug = NULL;
   int gewuenschteTiefe = 4;
   int rating = max(1, gewuenschteTiefe);
   if (gespeicherterZug == NULL)
      there were no further moves;
   else
      play gespeicherterZug;

The normal version is:

   int max(int player, int depth) {
      if (depth == 0 or keineZuegeMehr(player))
         return evaluate();
      int maxValue = -infinity;
      generiereMoeglicheZuege(player);
      while (moves remain) {
         fuehreNaechstenZugAus();
         int value = min(-player, depth - 1);   // opponent moves next
         macheZugRueckgaengig();
         if (value > maxValue) {
            maxValue = value;
            if (depth == gewuenschteTiefe)
               gespeicherterZug = Zug;
         }
      }
      return maxValue;
   }

   int min(int player, int depth) {
      if (depth == 0 or keineZuegeMehr(player))
         return evaluate();
      int minValue = infinity;
      generiereMoeglicheZuege(player);
      while (moves remain) {
         fuehreNaechstenZugAus();
         int value = max(-player, depth - 1);   // opponent moves next
         macheZugRueckgaengig();
         if (value < minValue) {
            minValue = value;
         }
      }
      return minValue;
   }

The NegaMax variant is:

   int miniMax(int player, int depth) {
      if (depth == 0 or keineZuegeMehr(player))
         return rate(player);
      int maxValue = -infinity;
      generiereMoeglicheZuege(player);
      while (moves remain) {
         fuehreNaechstenZugAus();
         int value = -miniMax(-player, depth - 1);   // negate: opponent's viewpoint
         macheZugRueckgaengig();
         if (value > maxValue) {
            maxValue = value;
            if (depth == gewuenschteTiefe)
               gespeicherterZug = Zug;
         }
      }
      return maxValue;
   }

Note: while the standard implementation maximizes for one player and minimizes for the other, the NegaMax variant maximizes for both players, negating the value (and, with alpha-beta pruning, swapping and negating alpha and beta) in the recursive call. It follows that the evaluation function must behave differently in the two implementations.

  • Standard implementation: the better the board position is for the maximizing player, the greater the return value of the evaluation function evaluate(). The better it is for the minimizing player, the smaller the return value.
  • NegaMax implementation: since both players maximize, the evaluation function rate(player) must return larger values the better the board position is for the player currently to move. The value is always given from that player's point of view.

Description: The Negamax algorithm

In order to simplify the code and allow every node of the search tree to be treated identically, we define that each player tries to achieve the maximum result for himself, that is, the maximum value of the evaluation function. To do this, the evaluations of the successor positions must be multiplied by -1 (negated, hence the name). It is then no longer necessary to distinguish whether A or B is to move and whether the maximum or the minimum is to be computed; instead, each position is simply assigned the maximum of the negated evaluations of its successor positions.
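A runnable sketch of this idea, with an explicit toy tree and helper functions that are our own illustrative assumptions rather than part of the article:

```python
def negamax(node, depth, player, successors, rate):
    """NegaMax: always maximize from the viewpoint of the player to move.
    `player` is +1 or -1; `rate(node, player)` scores a position from
    that player's viewpoint; `successors` generates the child positions."""
    children = successors(node, player)
    if depth == 0 or not children:
        return rate(node, player)
    value = float('-inf')
    for child in children:
        # Negate the child's value: it was computed from the opponent's view.
        value = max(value, -negamax(child, depth - 1, -player, successors, rate))
    return value

# Toy encoding: inner nodes are lists, leaves are numbers stored from A's
# viewpoint; rate() flips the sign for player B, as NegaMax requires.
successors = lambda n, p: n if isinstance(n, list) else []
rate = lambda n, p: p * n
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
root_value = negamax(tree, 3, 1, successors, rate)
print(root_value)  # 3, the same value plain minimax computes for this tree
```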
