sort (Unix)

sort (/usr/bin/sort) is a Unix utility that sorts data streams or files, merges them, or checks whether input is already sorted. Sort keys can be compared alphabetically or numerically and can cover configurable parts of each input line, in a configurable order.

The feature set and the behavior of sort are defined by the POSIX standard, although GNU sort deviates from the standard in some respects. The Single Unix Specification lists the sort utility as "mandatory" (a required component) and specifies the behavior that can be expected of it.


sort is line-oriented: the objects being sorted are called records (corresponding to lines) and are separated by newline characters. Each record in turn consists of fields, which are separated by a field separator. The default field separator is the blank, but any other character can be chosen with the command-line option -t.
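As a minimal sketch of records and fields, the following splits three made-up records at ':' and orders them by their second field. The sample data and the variable names are assumptions for illustration. LC_ALL=C is set only to make the result independent of the local collation rules, and the key option -k 2,2 restricts the comparison to field 2.

```shell
# Hypothetical sample records: name, department and age, separated by ':'.
records='carol:sales:41
alice:support:29
bob:engineering:35'

# -t ':' selects ':' as the field separator; -k 2,2 sorts on field 2 only.
by_department=$(printf '%s\n' "$records" | LC_ALL=C sort -t ':' -k 2,2)
printf '%s\n' "$by_department"
```

Each output line is still a complete record; only the order of the records changes.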

A sort key is defined by specifying a field (or part of one, for example the third through fifth characters of a particular field) and the associated sorting method (alphabetical or numeric). Complex sort keys can be composed of several such individual keys in succession. For example, a date field in the format "DD-MM-YYYY" can be sorted numerically, primarily by its 7th to 10th characters (the year), secondarily by its 4th and 5th characters (the month), and finally by its 1st and 2nd characters (the day). The -n option at the beginning makes all subsequent keys numeric:

    sort -n -k 1.7,1.10 -k 1.4,1.5 -k 1.1,1.2 /path/to/input

Unless explicitly stated otherwise, a key extends from its start position to the end of the line (in the extreme case, when no key is defined at all, sort compares entire records). If this is not desired, the end of the key must be specified explicitly:

    sort -k 2 /path/to/input      # sort by field 2 through the end of the line
    sort -k 2,2 /path/to/input    # sort on field 2 only

Alphabetical sorting is strongly influenced by the internationalization settings, in particular the variables LANG, LC_ALL, and LC_COLLATE, and numeric sorting responds to the value of LC_NUMERIC.
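The compound date key and the effect of an explicit key end can be tried on small samples. The following is only a sketch: the dates, the three-field records, and all variable names are invented for illustration, and LC_ALL=C is set so that the locale variables just mentioned do not change the outcome.

```shell
# Hypothetical sample dates in DD-MM-YYYY format.
dates='15-03-2021
02-11-2019
28-01-2021
09-12-2019'

# Year (characters 7-10), then month (4-5), then day (1-2), all numeric.
by_date=$(printf '%s\n' "$dates" |
  LC_ALL=C sort -n -k 1.7,1.10 -k 1.4,1.5 -k 1.1,1.2)
printf '%s\n' "$by_date"

# Hypothetical three-field records to show the effect of the key end.
rows='x b 1
y a 2
z a 1'

# -k 2: the key runs from field 2 to the end of the line, so field 3 takes part.
open_key=$(printf '%s\n' "$rows" | LC_ALL=C sort -k 2)

# -k 2,2: the key is field 2 only; lines with equal keys are then ordered
# by comparing the whole line.
closed_key=$(printf '%s\n' "$rows" | LC_ALL=C sort -k 2,2)

printf 'open key:\n%s\nclosed key:\n%s\n' "$open_key" "$closed_key"
```

With this input the open key places the "z" record before the "y" record because field 3 is compared as well, whereas the closed key leaves the two tied on field 2 and falls back to comparing the full lines.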

Input and output behavior, return values

Unless told otherwise, sort writes its output to stdout and its error messages to stderr. The output can be redirected by the usual means (pipeline, redirection). In addition, the -o option names a file to be used as the destination instead of standard output.
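A short sketch of both output paths follows. The temporary directory and the file names are assumptions made for the example. It also relies on a point the text above does not spell out: POSIX allows the file given to -o to be one of the input files, which makes sorting a file in place possible, whereas redirecting stdout onto the input file would truncate it before sort reads it.

```shell
# Work in a throwaway directory; the file names are purely illustrative.
workdir=$(mktemp -d)
printf '%s\n' pear apple fig > "$workdir/fruit.txt"

# Default: sorted output goes to stdout and can be redirected as usual.
LC_ALL=C sort "$workdir/fruit.txt" > "$workdir/sorted.txt"

# -o names the output file; here it is the input file itself.
LC_ALL=C sort -o "$workdir/fruit.txt" "$workdir/fruit.txt"

in_place=$(cat "$workdir/fruit.txt")
redirected=$(cat "$workdir/sorted.txt")
printf '%s\n' "$in_place"
rm -r "$workdir"
```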

sort accepts as input either a data stream on stdin or one or more files given as arguments. If multiple files are specified, they are merged into a single output in the course of sorting. The special file name - stands for stdin, so a data stream can also be combined with other files.
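Combining a data stream with a file might look like the following sketch. The file name part1.txt and the sample words are made up for the example.

```shell
# Throwaway directory with one hypothetical input file.
workdir=$(mktemp -d)
printf '%s\n' delta bravo > "$workdir/part1.txt"

# '-' stands for stdin, so the piped lines are sorted together with the file.
combined=$(printf '%s\n' charlie alpha | LC_ALL=C sort - "$workdir/part1.txt")
printf '%s\n' "$combined"
rm -r "$workdir"
```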

Besides the usual return values 0 (success) and >1 (an error occurred), the value 1 is returned when a file is only checked for sortedness and turns out not to be sorted with respect to the given criterion.
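The check mode is selected with the POSIX option -c, which the paragraph above describes without naming. The sketch below wraps it in a small helper, check_sorted, which is a hypothetical name introduced only for this example, and records the exit status for sorted and unsorted numeric input.

```shell
# Hypothetical helper: reads lines on stdin and prints the exit status
# of a numeric sortedness check (diagnostics on stderr are discarded).
check_sorted() {
  if LC_ALL=C sort -n -c 2>/dev/null; then
    echo 0
  else
    echo $?
  fi
}

status_sorted=$(printf '%s\n' 1 2 10 | check_sorted)
status_unsorted=$(printf '%s\n' 1 10 2 | check_sorted)

echo "sorted: $status_sorted, unsorted: $status_unsorted"
```

In a script the status would normally be tested directly, for example with `if sort -c file; then ...`, rather than printed.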

Notes on use

Outdated methods of key definition

The original sort did not know the now common and standardized form of key definition using multiple -k expressions. Instead, the start of a key was specified with +N[.M] and the end of the respective partial key with -N[.M], where N is the (zero-based) number of the field and M is the (also zero-based) number of the character within the field. The following example gives the same key definition in the old and the new notation. It sorts the user database /etc/passwd numerically (-n) by the third field (the user ID), using ":" as the field separator (-t ':'):

    sort -t ':' -n +2 -3 /etc/passwd
    sort -t ':' -n -k 3,3 /etc/passwd

The old notation is still frequently found in existing scripts, but its use is now discouraged. Although most current implementations still understand it, it is not part of the POSIX standard, and portable scripts should therefore not rely on it.
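The portable form can be tried without touching the real user database. The sketch below uses four invented entries in /etc/passwd format; the entries and the variable names are assumptions for illustration. Only the -k notation is exercised, since the obsolete notation is not guaranteed to be accepted.

```shell
# Hypothetical entries in /etc/passwd format (field 3 is the numeric user ID).
passwd_sample='www:x:33:33:web:/var/www:/bin/false
root:x:0:0:root:/root:/bin/sh
alice:x:1000:1000:Alice:/home/alice:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/false'

# Portable notation: 1-based field number, explicit key end.
by_uid=$(printf '%s\n' "$passwd_sample" | LC_ALL=C sort -t ':' -n -k 3,3)

# Show only the login name and the user ID of each sorted entry.
uid_order=$(printf '%s\n' "$by_uid" | cut -d ':' -f 1,3)
printf '%s\n' "$uid_order"
```

Without -n the user IDs would be compared as text, so 1000 would sort before 33.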

Influences on the sorting sequence

Apart from the basic distinction between alphanumeric and numeric sorting and the internationalization variables mentioned above, the user has several other ways to influence the sort order. Each can be applied either globally to the entire sort via an option or to a single partial key via a trailing modifier. The option and the modifier are spelled the same in each case.
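The difference between a global option and a per-key modifier can be made visible with a small sketch. The three records and the variable names are invented for the example; n is used because it has already been introduced, and the other modifiers of this section attach to a key in the same way.

```shell
# Hypothetical records: a name followed by two numeric fields.
items='pear 10 3
apple 9 20
fig 10 12'

# Global -n: both keys are compared numerically (3 sorts before 12).
global_numeric=$(printf '%s\n' "$items" | LC_ALL=C sort -n -k 2,2 -k 3,3)

# Modifier n on the first key only: field 3 is compared as text,
# so "12" sorts before "3".
first_key_numeric=$(printf '%s\n' "$items" | LC_ALL=C sort -k 2,2n -k 3,3)

printf 'global -n:\n%s\nn on key 1 only:\n%s\n' \
  "$global_numeric" "$first_key_numeric"
```

The two records with the value 10 in field 2 change places between the two runs, because only the second key decides their order.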

    sort -n -k 3,3 -k 4,4 /path/to/input    # n applies globally to both keys
    sort -k 3,3n -k 4,4 /path/to/input      # n applies only to the first partial key

The following modifiers are available:

  • b (ignore leading blanks): leading blanks are ignored, which also holds for keys that do not begin with the first character of the field. The key definition -k 2.2b,2 describes a sort key that begins with the second non-blank character of the second field and ends with the last character of the second field.
  • d (dictionary): dictionary-style sorting. Only alphanumeric characters and blanks are taken into account; the value of LC_CTYPE determines what counts as alphanumeric.
  • f (fold lower case to upper case): turns off case sensitivity by sorting characters that have an uppercase equivalent as if they had been replaced by it. LC_CTYPE determines which characters form such pairs.
  • i (ignore unprintables): like d, except that all non-printable characters are ignored instead. Again, LC_CTYPE defines what counts as non-printable.
  • n (numeric): sorts numerically instead of alphanumerically.
  • r (reverse): reverses the sort order, so that the output is in descending instead of ascending order.

Leading spaces

A regular source of confusion is the different treatment of leading blanks depending on whether -t is specified on the command line, in particular when the blank itself is given as the field separator, which would seem merely to restate the default.

If -t is not specified, leading field separators are counted as part of the field that follows them, so leading blanks belong to the first field. With -t ' ', by contrast, blanks are treated like any other separator character: each one acts as a field separator, and when -t is specified the field separator is not considered part of a field. The POSIX standard gives the following example in its explanatory notes (blanks shown as ␣):

    sort << EOF
    ␣␣foo
    EOF
    # first field "␣␣foo", second field empty, third field empty

    sort -t ' ' << EOF
    ␣␣foo
    EOF
    # first field empty, second field empty, third field "foo"
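The two ways of splitting can also be observed through the resulting sort order. The following sketch assumes a sort that follows the POSIX field-splitting rules described above; the three input lines and the variable names are invented for illustration.

```shell
# Made-up input: two lines with two leading blanks, one line with a single one.
lines='  foo
  bar
 zzz'

# Without -t the leading blanks belong to field 1, so the keys are
# "  foo", "  bar" and " zzz".
default_split=$(printf '%s\n' "$lines" | LC_ALL=C sort -k 1,1)

# With -t ' ' every blank separates fields: "foo" and "bar" are field 3,
# while " zzz" has no third field, so its key is empty and sorts first.
blank_split=$(printf '%s\n' "$lines" | LC_ALL=C sort -t ' ' -k 3,3)

printf 'default:\n%s\nwith -t:\n%s\n' "$default_split" "$blank_split"
```

In the default case the line with a single leading blank sorts last, because a blank compares lower than "z" in the C locale; with -t ' ' it sorts first, because its third field is empty.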
