awk

Awk is a programming ( scripting) language for editing and evaluating structured text data, such as CSV files. The accompanying interpreter was one of the first tools that appeared in version 3 of Unix; it is still widely used together with sed in shell scripts to edit, transform or analyze data. The term " awk " is composed of the initial letters of the surnames of its three authors Alfred V. Aho, Peter J. Weinberger and Brian W. Kernighan.

A version of awk is to be found in almost every Unix system, which is historically attributed to UNIX, as well as in any Linux distribution. A similar program is also available but for almost all other operating systems. The Free Software Foundation provides under the name gawk an extended free version available. Another free implementation is mawk by Mike Brennan. mawk is smaller and faster than gawk, which is, however, paid for by some limitations.

The language works almost exclusively with the data type string (english string). In addition, associative arrays (that is, with strings indexed arrays, also called hashes ) and regular expressions basic components of the language.

The efficiency, compactness, but also the limitations of awk and sed scripts suggested Larry Wall on the development of the Perl language.

Structure of a program

The typical design of an awk program is operations - such substitutions - perform on an input text. To this end, the text line by line read and based on a selected separator - usually a series of spaces and / or tabs - split into fields. Then the awk statements on the respective line to be applied.

The awk instructions while having the following structure:

Condition { statement block } This is determined for the line read if they (often a regular expression ) satisfy the condition. If the condition is met, the code is executed within the space enclosed by braces statement block. Derogation, a statement only of an action

{ Statement block } or only one condition

Condition exist. Is missing, the condition, the action is executed for each line. If there is no action, the default action is executed to write the entire row if the condition is met.

Variables and Functions

The user can define variables within statement blocks by referencing an explicit declaration is not necessary. The scope of the variable is global. One exception is the function arguments whose validity is limited to the function that defines it.

Functions can be defined at any point, the declaration must not take place before the first use. If they are scalars, function arguments are passed by value parameters, otherwise as a reference parameter. The arguments in a function call do not match the function definition, excess arguments are treated as local variables, omitted arguments with the special value uninitialized - numeric zero as a string and the value of the empty string - provided.

Functions and variables of all types take the same name space, so that the same designation results in undefined behavior.

In addition to user-defined variables and functions (field separator of Engl. ) And standard variables and standard functions are available, for example, the variable $ 0 for the entire line, $ 1, $ 2, ... for each i-th field of the line and FS for the field separator, and the functions gsub (), split () and match ().

Commands

The syntax of the command statements of awk programming language similar to that of C. Elementary commands are assignments to variables, comparisons between variables and loops or conditional instruction execution ( if-else ). In addition, there are calls to both fixed and implemented at user-programmed functions.

Outputting data to the standard output is possible through the "print " command. To print the second field as a command line, the command is

Print $ 2 uses.

Conditions

Conditions are in awk programs either of the form

Expression comparison- operator expression or of the form

Expression match operator / regular search pattern / Regular search patterns are formed as the grep command, and match operators are ~ " pattern found " for and! ~ " Pattern not found " for. As an abbreviation for the condition "$ 0 ~ / regular search pattern / ' (ie the whole row satisfies the search pattern ) can be used " / / regular pattern ".

As the specific conditions of the words BEGIN and END, in which the accompanying statement blocks are executed before reading the first line and after reading the last line apply.

In addition, conditions can be composed with logical links to new conditions, such as

$ 1 ~ / ^ E / && $ 2 > 20 { print $ 3} AWK causes this command that of each line starts with the second field, and e is a number greater than 20, the third field is displayed.

Examples

Here are some simple application examples to you such as Linux can simply type in a shell:

Echo Hello World | awk ' { print $ 1}'     echo Hello World | awk ' { print $ 2}' produce output "hello" or " world "

Echo Hello World | awk ' { printf " % s% s \ n ", $ 1, $ 2}' produces the output "Hello World! " and a newline.

Versions, dialects

The newer and more advanced version of awk is nawk (new awk ). It was introduced in 1985 and is the basis of current awk implementations. It differs by the ability to define your own functions, as well as a larger set of operators and predefined functions. The call is still mostly about " awk ", since a distinction between the two versions has become obsolete.

A written by Michael Brennan implementation of awk is mawk.

258558
de