﻿ A* search algorithm

# A* search algorithm

The A * algorithm ("A star" or in English " a star", and A * - search) belongs to the class of informed search algorithms. It is used in computer science computing a shortest path between two nodes in a graph with positive edge weights. It was first described in 1968 by Peter Hart, Nils J. Nilsson, and Bertram Raphael. The algorithm is considered to be a generalization and extension of the Dijkstra algorithm, in many cases, but may be vice versa A * also reduced to Dijkstra.

Contrary to uninformed search algorithms of the A * algorithm uses an estimation function ( heuristic ) to look purposeful and thus to reduce the run time. The algorithm is optimal. This means that the optimum solution is found, if one exists.

• 6.1 Optimality
• 6.2 Time complexity

## Idea of the algorithm

The A * algorithm always examined first the nodes that are likely to quickly lead to the goal. To determine the most promising node to all known nodes each associated with a value that indicates how long is the path from the origin to the destination by using the node under consideration, in the best case. The node with the lowest value is examined next.

For a node called from the previous cost from the start node to reach. denotes the estimated cost of up to the destination node. The heuristic used must never overestimate the cost. For the example, the route search the crow is an appropriate estimate: The actual route is never shorter than the direct link.

## Operation

The nodes are divided during the search in three different classes:

• Unknown nodes: These nodes have not been found during the search. It is to them no way known. Each node (except the start node ) is unknown at the beginning of the algorithm.
• Known nodes: These nodes is a (possibly suboptimal ) route known. All known nodes are stored together with their value in the so-called Open List. From this list the most promising node is always selected and examined. The implementation of the Open List has great influence on the running time and is often used as a simple priority queue implemented (eg, binary heap). Initially known only to the starting node.
• Finally examined nodes: These nodes, the shortest path is known. The concluded as nodes are stored in the so-called closed list, so they are not examined more than once. In order to determine efficiently whether an element is in the closed list, this is often implemented as a set. The closed list is empty at the beginning.

Any known or finally visited node contains a pointer to its parent node. Using this pointer, the path can be traced back to the starting node.

If a node is fully investigated (including expanded or relaxed), as his successor nodes are inserted into the open list and added to the closed list. The predecessor pointer of the successor nodes are set to. If a successor node already on the closed list, it will not re- inserted into the Open list and not change its predecessor pointer. If a successor node already on the open list, the node is updated only ( value and predecessor pointer ) when the new way to get there is shorter than the previous.

If the destination node is fully investigated, the algorithm terminates. The path found is reconstructed using the predecessor pointer and output. If the Open List is empty, there are no more nodes that could be investigated. In this case, the algorithm terminates because there is no solution.

Due to the predecessor pointer of the path found by the target, starting output backward to the start. To get the path in the correct order can be reversed, for example, before the route search start and finish. Thus is searched from the actual target for the start and the Wegausgabe begins after the initial launch node.

Declare open list as priority queue with nodes / / priority queue declare closed list as set with Nodes program a- star      / / Initialize the Open List, the Closed List is empty      / / ( The priority or the f value of the start node is irrelevant )      openlist.enqueue (start node, 0)      / / This loop is executed until either      / / - The optimal solution is found or      / / - It is clear that no solution exists      repeat          / / Remove the node with the lowest f value from the Open List          currentNode: = openlist.removeMin ()          / / Did we find a destination?          if currentNode == destination node then              return PathFound          / / The current node is intended by subsequent functions          Is / / not further investigated so that no cycles          closedlist.add ( currentNode )          / / If the target has not been found: successor node          / / Set the current node to the Open List          expandNode ( currentNode )      until openlist.isEmpty ()      / / Open the list is empty, there is no path to the destination      return NoPathFound end / / Checks every successor node and adds it to the Open List added when either / / - The successor node is found for the first time or / / - A better way can be found to this node expandNode function ( currentNode )      foreach successor of currentNode          / / If the successor node is already on the Closed List - do nothing          if closedlist.contains ( successor ) then              continue          / / G value for the new way to calculate: g value of its predecessor plus          / / The cost of the edge of the currently used          tentative_g = g ( currentNode ) c ( currentNode, successor )          / / If the successor node is already on the open list,          / / But the new way is not better than the old one - do nothing          if openlist.contains ( successor ) and tentative_g > = g ( successor ) then              continue          / / Set the previous pointer and note g value          successor.predecessor: = currentNode          g ( successor ) = tentative_g          / / Update the f value of the node in the Open List          / / Insert or nodes with f value in the Open List          f: = tentative_g h ( successor )          if openlist.contains ( successor ) then              openlist.decreaseKey ( successor, f)          else              openlist.enqueue ( successor, f)      end end areas of application

The A * algorithm is generally an optimal path between two nodes in a graph. Optimal can the shortest, fastest or easiest mean, depending on the nature of the weighting function, which assigns to the edges of its length. In theory, the algorithm can solve all the problems that can be represented by a graph and for which an estimate of the remaining costs to the objective can be made ​​. Among the restrictions of A * see cons section.

A * is often used for route search route planners or in computer games. Heuristics straight line distance between states is usually used to target because he always optimistic estimates due to the triangle inequality. Other applications include games such as the 15 - puzzle or the women's issue in which a given goal state to be achieved. Heuristics can be used, the number of misplaced tiles.

## Example

In the example, the shortest path from Saarbruecken to Würzburg to be found. The diagram shows a small selection of cities and routes. The edge labels corresponding to the cost ( here: distance in kilometers ) to get from one city to the next. Each node contains the estimated distance to the destination node ( the value function). As a heuristic, the air line is being used.

The step -by-step pictures, the color of a node indicates its status:

• White: not yet found
• Gray: found, located on the Open List
• Black: studied, is located on the Closed List

Once a node has been reached, shows an edge on the parent node. For all nodes on the open list, the value is specified.

Step 1: It will be considered all successor nodes from the start node Saarbrücken:

• Kaiserslautern is inserted into the open list. ( The cost to get from Saarbruecken to Saarbruecken are 0, the cost of the edge Saarbrücken- Kaiserslautern is 70, and the estimated remaining costs of Kaiserslautern and Würzburg are at 158 This refers to the cost of the edge between nodes and. )
• Karlsruhe is inserted into the open list.

Saarbrücken is entered in both node as parent node. The node Saarbrücken is now considered final and is set to the Closed List. ( Intuitively: It's pointless, just drive around for a while, back again to arrive at the start, then at some point to come and say that this would be the optimal way. )

Step 2: Open List now contains two points. Since Kaiserslautern has the lower value, Kaiserslautern will be examined next, and considers its successor node. ( The way over Kaiserslautern is at least 228 km long, of about Karlsruhe at least 285 km. Accordingly, it is the most skill, first check Kaiserslautern in more detail. )

• Saarbrücken is already on the closed list and is not considered further.
• Frankfurt is inserted into the open list. The cost to Frankfurt calculated from the costs to Kaiserslautern plus the cost of the most recently used edge ( Kaiserslautern - Frankfurt). Kaiserslautern is stored as a parent node.
• Ludwigshafen is inserted into the open list. Kaiserslautern is stored as a parent node.

Kaiserslautern is set to the Closed List.

Step 3: The node with the lowest value is Ludwigshafen. ( The actual cost from Ludwigshafen to Würzburg deviate considerably from the estimate, since only highways to be used in this example. )

• Kaiserslautern is already on the closed list and is not considered further.
• The target node Würzburg is found but not yet issued as a solution, but first inserted into the open list. Ludwigshafen is stored as a parent node.

Ludwigshafen is set to the Closed List.

Here is an important difference between " nodes located on the Open List " and " node is located on the Closed List" clear: As long as a node is considered non-exhaustive is the path to that node only provisionally. Maybe there is a better path! Node to the closed list, however, are considered final. The shortest path to it is known.

Step 4: Frankfurt is explored. As the only successor node that is not yet on the Closed List, Würzburg is found. The value is calculated 289, which is lower than the previous value of Würzburg. This means that the route via Frankfurt is shorter than the path found first via Ludwigshafen. Therefore, the value of Würzburg is changed. Also, as a parent node now Frankfurt saved.

Frankfurt is set to the Closed List.

Step 5: As Karlsruhe now has the smallest value of these nodes will be examined next. Heilbronn is inserted into the open list and Karlsruhe set to the Closed List.

Step 6: Now Würzburg has the lowest value and is being investigated: the goal is found and the shortest way is Saarbrucken - Kaiserslautern -Frankfurt- Würzburg.

This example is merely for the understanding of the operation of the algorithm. The specificity of the targeted searching here is not clear, since all nodes are within the direct line between Saarbrücken and Würzburg. Under Web Links, Java applets, which are the targeted searches find better.

## Heuristics

There are two types of heuristics for the A * - algorithm: Allowable and monotonous heuristics. Each monotone heuristic is also admissible. Monotony is therefore a stronger property than the admissibility. In general monotone heuristics are used. The heuristic to estimate the distance between two cities - the crow - for example monotone.

A heuristic is admissible if the costs are never overestimated. That is, the estimated cost must always lie in the interval when the actual costs referred. If the heuristics used only permissible, but not monotonic, then to an expanded node is not necessarily the shortest route known. Therefore it must be possible to expand a node more than once. So there must be no closed list are used.

In the diagram on the right is an example graph and a feasible (but not monotone ) heuristic is shown. The best way from start to finish has cost 40 and runs through the nodes K1 and K2. The path through the detour node U has cost 45 At the start node, the heuristic estimates of costs and 40 at node K1 cost of 30, which respectively correspond to the actual costs. The nodes K2, U and target cost of 0 will be appreciated. As a heuristic must not overestimate, always costs are estimated from 0 to a target node.

Example of Closed List

• In the first step, the start node is expanded. The two successor nodes K1 and U are found and registered with the values ​​and in the Open List.
• In the second step, the node U is expanded, since its value is the lowest. The successor node K2 is found and registered with the value in the Open List.
• In the third step, the Open List is with the node and the node K2 K1. So the node K2 is expanded. The two successor nodes K1 and aim to be found. The new value is greater than the old value. Therefore, the value for the node K1 in the Open List is not updated. The target node is inserted into the open list.
• In the fourth step are with and with the target node K1 in the Open List. The expansion of K1 produced no successor node, since both start and K2 are already on the closed list. The use of the Closed List prevented here that the optimal solution is found.
• The fifth step is the only node is expanded to the Open List: the destination node. The solution found is thus start -U -K2- target and with cost 45 is not optimal.

Example without Closed List

• In the first step, the start node is expanded. The two successor nodes K1 and U are found and registered with the values ​​and in the Open List.
• In the second step, the node U is having expanded. The successor node K2 and start to be found and registered with the values ​​and in the Open List.
• In the third step, the node K2 is expanded. The successor node K1 (), U () and end () are found.
• In the fourth step, the successor node of K1 () are inserted into the open list: start with and K2.
• In the fifth step K2 is expanded for the second time, now with. The knots K1 (), U () and target () can be found.

### Monotone heuristics

Thus, a heuristic monotonic ( or consistent ), it must satisfy two conditions:

• The costs may never be overestimated (condition for an admissible heuristic ).
• For each node and each successor node must apply. Herein, the actual cost to get from to.

The second condition is a form of triangle inequality: The estimated cost of a node may not be greater than the actual cost to a successor node plus the estimated cost of this node.

The admissible heuristics from the example violating this condition: the cost of nodes K1 are estimated at 30. If you move now 20 in the direction of the target to node K2 suddenly no longer costs are estimated. This applies.

In many cases, such as the Euclidean distance ( straight line ) to target a monotonic heuristic, for example, the route with a car.

If a heuristic is monotonic, it can be incorporated in the cost function, and the A * search becomes a Dijkstra algorithm.

## Properties

The A * algorithm is

• Complete: If a solution exists, it is found.
• Optimal: It is always found the optimal solution. If there are several optimal solutions, once found it (depending on implementation details ).
• Optimally efficient: There is no other algorithm that finds the solution using the same heuristics faster. (more precisely: A * expands a minimal number of nodes. )

### Optimality

The A * algorithm is optimal if a monotonic heuristic is used. If no closed list is used, the optimality is maintained even when an admissible heuristic. In the following, the optimality is proved using a monotone heuristic.

Claim: The A * algorithm always finds an optimal solution.

Proof: Let be an optimal solution with cost and a suboptimal solution. Since the heuristic never overestimates the cost to the goal applies to each destination node and in particular for:

As a sub-optimal solution is valid for the costs:

The heuristic estimates the actual cost or underestimated them. So is true for any node on the shortest path to the optimal solution:

Thus, the following applies:

And in particular:

This means that the A * algorithm, the sub-optimal solution is not output while a better solution exists. It is therefore first the optimal solution found.

### Time complexity

The time complexity (or asymptotic running time) described here has only minor importance, since the strength of the A * algorithm is performed targeted searches and compared to the total number of nodes usually only a small portion is examined it. In labyrinths, however, this is often not possible and the actual running time approaches the specified worst-case running time. The time complexity is estimated only using simplifying assumptions. It largely depends on two factors:

• Accuracy of the heuristic used: Is not monotonic heuristic, the running time is exponentially as nodes are expanded several times. The more accurate the cost estimate, the less nodes are examined.
• Implementation of Open and Closed List: The cost-intensive operations in the A * algorithm are the methods to insert elements in the lists and remove, as well as elements in the Open List to change. These must be efficiently supported by the data structures used to allow a short duration.

In the following, the set of nodes of the underlying graph is referred to. It is assumed that all nodes and links are known in advance. (This is at a pathfinding usually the case. ) The heuristic used is monotonous. The Open List is implemented as a binary heap, the closed list as an array. ( Each node has a flag, whether it's on the list or not. ) The A * algorithm thus has a quadratic worst-case running time:

This term is specified as follows: The main loop of the algorithm is executed for each node at most once. The functions openlist.removeMin, expandNode and closedlist.add be so called maximum times. openlist.removeMin contains a maximum of nodes and required in a binary heap logarithmic term. closedlist.add requires only constant maturity in an array. This results in a provisional term of:

The term of expandNode is composed of: closedlist.contains has constant running time. openlist.contains also has constant running time when each node also an open list flag stores. The call to openlist.find can be omitted if each node also stores its value. It will either openlist.decreaseKey (requires linear running time to find the appropriate element ) or openlist.enqueue (requires logarithmic running time ) is called. The log of the linear term is dominant. A loop run within expandNode thus requires linear runtime.

All functions are called for each successor node. It is assumed that each node has only a maximum of outgoing edges. The number of loop iterations within expandNode is thus constant and can be neglected in the asymptotic runtime analysis. This assumption does not apply to graphs in which each node is connected with almost every other node.

The total running time is calculated as:

Optimization potential with respect to the worst-case running time is greatest when openlist.decreaseKey. The costly search in the heap does not apply if each node stores the position of its corresponding entry in the heap. Thus, the term of decreasekey would logarithmic and reduce the total running time to linear- logarithmic:

The limiting factor in A * is often not the processing time, but the memory space. Since all known nodes are kept in memory (Open and Closed List), A * for many problems is not suitable. Even the simple 15 - puzzle, the complete graph has nodes. With a correspondingly long approach, the available memory is not enough and A * can not find a solution. In the section related algorithms to find similar algorithms that try to restrict the usage of storage space.

## Related algorithms

The A * algorithm is used with the Dijkstra's algorithm, and a greedy algorithm. The Dijkstra's algorithm does not use heuristics ( so ), and selects nodes only on the basis of previous costs. When the heuristic of the A * algorithm monotone, so they can be in the cost function Anticipated (ie, ) and then use the algorithm of Dijkstra. A greedy algorithm, however, noted the cost is not ( so ), and selects nodes only on the basis of the estimated remaining costs. For example, the route search of the Dijkstra's algorithm is more appropriate if known prior to the route search, the goal is not (for example, the next station), and therefore it is not possible to use a heuristic.

Many similar to A * algorithms try to solve the storage problem. These include the following:

• IDA * (Iterative Deepening A *), a variant of iterative deepening search
• RBFS ( Recursive Best -First Search), the memory consumption linear to the limited length of the solution
• MA * (Memory - Bounded A *), SMA * (Simplified MA *), each use a fixed predetermined amount of space

These algorithms to limit the memory consumption at the expense of run time. This is no longer needed all nodes can be kept in memory under certain circumstances. Bad nodes are then forgotten and need to be re- generated later. Using a monotone heuristics and provided that enough memory is available, all these algorithms optimal. If the disk space limit is too restrictive, the optimal solution may be unattainable. In this case, a sub-optimal solution, or no solution is found.

An extension of A * is the D * algorithm (Dynamic A *). This algorithm is able to process new information about the environment efficiently. For example, if a bridge opposite to the initial assumption impassable, so the path is only partially rescheduled rather than A * reapply to the slightly modified graph. Especially in dynamic or unknown environments (robots moving through a disaster area ) may require repeated re-planning of the path already found. D * is A * optimal and optimally efficient.

Other graph-based algorithms are the Bellman - Ford algorithm (allowing negative edge weights ) or the algorithm of Floyd and Warshall (computes the shortest paths between all pairs of nodes ).

21137
de