Sed
sed (short for stream editor) is a Unix tool for editing text data streams. The data stream can also be read from a file. Unlike with a text editor, the original file is not changed.
The sed command set is based on that of the line-oriented text editor ed. For matching text, sed uses a particular variety of regular expressions, the so-called POSIX Basic Regular Expressions (BRE) defined by the POSIX specification. However, the widespread GNU implementation uses GNU BREs, which differ slightly from POSIX BREs.
Even though the language of sed may appear quite limited and specialized, it is nevertheless Turing-complete. Turing completeness can be proven by programming a Turing machine in sed or by writing an interpreter for another Turing-complete language in sed.
Operation
sed can work both within a pipeline and on files.
Output is always written to standard output (error messages to standard error), so a typical invocation looks like this:
    sed 'instruction1
    instruction2
    ...
    instructionN' inputfile > outputfile
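As a minimal sketch of both modes of operation, the same one-instruction program can be run once in a pipeline and once on a file whose result is redirected into a new file. The file names demo-input.txt and demo-output.txt are hypothetical scratch files chosen for the illustration.

```shell
#!/bin/sh
# Hypothetical scratch file for the demonstration.
printf 'old value\n' > demo-input.txt

# 1. Inside a pipeline: sed reads standard input and writes standard output.
piped=$(printf 'old value\n' | sed 's/old/NEW/')

# 2. On a file: sed reads the file, the result is redirected into a
#    separate file, and the input file itself stays unchanged.
sed 's/old/NEW/' demo-input.txt > demo-output.txt
from_file=$(cat demo-output.txt)
unchanged=$(cat demo-input.txt)

printf '%s\n' "$piped" "$from_file" "$unchanged"
rm -f demo-input.txt demo-output.txt
```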
Programming
sed instructions can be roughly divided into three groups: text manipulation, branches and other instructions. (Most sed manuals, as well as the POSIX specification, instead divide the instructions into two-address, one-address and address-less commands, see below, but this grouping is not well suited to an introduction.)
Text manipulation
This is by far the most frequently used function, and its instruction set is also particularly rich. In general, an instruction has the following structure (two-address command):
    sed '/start/,/end/ s/old/NEW/' inputfile

    input     output
    x old     x old
    start     start
    y old     y NEW
    end       end
    z old     z old

"old" is replaced by "NEW", but only from the line containing "start" up to the line containing "end" (two-address variant). In contrast, with a one-address command such as /^[yz]/ s/old/NEW/, the same substitution is carried out in every line that begins with "y" or "z" (one-address variant).
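A minimal runnable sketch of both variants, assuming the five sample lines from the table as input (fed in with printf instead of a file), might look like this:

```shell
#!/bin/sh
# Sample input from the table: five lines.
input='x old
start
y old
end
z old'

# Two-address variant: substitute only from "start" through "end".
range=$(printf '%s\n' "$input" | sed '/start/,/end/ s/old/NEW/')

# One-address variant: substitute in every line beginning with y or z.
single=$(printf '%s\n' "$input" | sed '/^[yz]/ s/old/NEW/')

printf '%s\n\n%s\n' "$range" "$single"
```

Note that the range also includes the "start" and "end" lines themselves; they are simply left alone here because they contain no "old".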
Instead of a single command, a list of instructions enclosed in { ... } can also be used. The rules described above apply to these instructions in turn, and they may themselves consist of further compound commands. An example:
    sed '/^[yz]/ {
    s/^\([yz]\)/(\1)/
    s/old/NEW/
    }' inputfile

    input     output
    x old     x old
    start     start
    y old     (y) NEW
    end       end
    z old     (z) NEW

Branches
sed has two types of branches: unconditional branches (jump instructions) and conditional ones, which are executed depending on whether a previous substitution was or was not made.
A typical example: a piece of source code has been indented with leading tab characters, and each of these leading tabs is to be replaced by 8 blanks.
Tabs may also appear elsewhere in the text, not at the beginning of a line, but these must not be changed.
The problem is that the multiplication involved (replace N tabs by N * 8 blanks) cannot be expressed as a regexp.
On the other hand, a global replacement would also affect the tab characters within the text.
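Therefore, a loop is formed with jump instructions (the example assumes GNU sed, which accepts \t for a tab character; in other implementations a literal tab has to be typed in its place):

    sed ':start
    s/^\( *\)\t/\1        /
    t start' inputfile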
Here the resemblance to assembly language is clear: a control structure comparable to the repeat-until loop built into common high-level languages is assembled from a condition and a label.
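A self-contained sketch of this tab-expansion loop follows. To stay portable, it builds a literal tab character with printf instead of relying on GNU sed's \t; the sample input (two leading tabs plus one tab inside the text) is an assumption for illustration.

```shell
#!/bin/sh
# A literal tab character, usable with any POSIX sed.
tab=$(printf '\t')

# Two leading tabs, plus one tab inside the text that must survive.
input="${tab}${tab}foo${tab}bar"

# Loop: replace the first tab that follows only leading blanks by
# 8 blanks, then jump back to the label as long as "t" sees that a
# substitution was made (repeat-until built from a label and a branch).
output=$(printf '%s\n' "$input" | sed ':start
s/^\( *\)'"$tab"'/\1        /
t start')

printf '%s\n' "$output"
```

The loop ends on the first pass in which the regexp no longer finds a tab directly after the leading blanks, so the tab between "foo" and "bar" is never touched.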
Other instructions
Hold Space Manipulation
A powerful (although relatively little-known) feature of sed is the so-called hold space. This is a free memory area that works much like the accumulator known from some assembly languages. Data in the hold space cannot be manipulated directly, but data in the pattern space can be copied into the hold space or exchanged with its contents. Appending the pattern space to the hold space, or vice versa, is also possible.
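To make these operations concrete, here is a sketch using the standard commands h (copy the pattern space into the hold space) and G (append a newline plus the hold space to the pattern space). This well-known one-liner is not taken from the text above; it prints its input in reverse line order, similar to tac.

```shell
#!/bin/sh
# 1!G : from the second line on, append the hold space (all previous
#       lines, already reversed) to the current line
# h   : copy the result back into the hold space
# $p  : on the last line, print the accumulated text
# -n  : suppress the automatic output at the end of every other cycle
reversed=$(printf 'a\nb\nc\n' | sed -n '1!G;h;$p')
printf '%s\n' "$reversed"
```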
The following example illustrates how the hold space works: the text of a "chapter heading" is stored and appended to every line of that chapter, while the heading line itself is suppressed:
    sed '/^=/ {
    s/^=//
    s/^/ (/
    s/$/)/
    h
    d
    }
    G
    s/\n//' inputfile

    input        output
    =Chapter1    Line 1 (Chapter1)
    Line 1       Line 2 (Chapter1)
    Line 2       Line 3 (Chapter1)
    Line 3       Line A (Chapter2)
    =Chapter2    Line B (Chapter2)
    Line A       Line C (Chapter2)
    Line B
    Line C

Whenever a line begins with "=", the statement block is executed: it removes this character and encloses the rest of the line in parentheses preceded by a blank. This text is then copied into the hold space ("h") and deleted from the pattern space ("d"), which ends the program for this line and reads the next line. For "normal" lines the condition of the block does not apply, so only the last two instructions are executed: "G" appends a newline followed by the contents of the hold space to the pattern space, and "s/\n//" removes this newline so that the heading appears on the same line.
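A self-contained run of the chapter-heading program, with the sample input fed in through printf, might look as follows. Note the final s/\n//: G inserts a newline between the line and the hold-space contents, so without it each heading would be printed on a line of its own.

```shell
#!/bin/sh
# Sample input: two "chapters", each introduced by a line starting with "=".
input='=Chapter1
Line 1
Line 2
Line 3
=Chapter2
Line A
Line B
Line C'

# Heading lines: strip "=", wrap in " (...)", store in hold space, delete.
# Other lines: append newline + hold space (G), then drop that newline.
output=$(printf '%s\n' "$input" | sed '/^=/ {
s/^=//
s/^/ (/
s/$/)/
h
d
}
G
s/\n//')

printf '%s\n' "$output"
```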
Multi-line instructions
Not all text manipulations can be performed within individual lines. Sometimes information from other lines must be included in the decision, and sometimes substitutions must be performed across line boundaries. For this, the sed programming language provides the instructions N, P and D, with which several lines of the input text can be loaded into the pattern space at the same time ("N") and parts of it can be output ("P") or deleted ("D"). A typical example is the following one-liner (actually two one-liners), which adds line numbers to a text:
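    sed '=' inputfile | sed 'N; s/\n/ /'

The first sed outputs each line number on a line of its own in front of the corresponding line; the second sed reads two lines at a time with N and replaces the newline between them by a blank.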
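A runnable sketch of the line-numbering pipeline (assuming a blank as the separator between number and text), together with a classic N/P/D one-liner, not from the text above, that removes consecutive duplicate lines in the manner of uniq:

```shell
#!/bin/sh
input='alpha
beta
beta
gamma'

# Line numbering: "=" prints each line number on a line of its own;
# the second sed joins each pair with N and replaces the newline.
numbered=$(printf '%s\n' "$input" | sed '=' | sed 'N; s/\n/ /')

# Consecutive duplicates:
#   $!N                 append the next line (except on the last line)
#   /^\(.*\)\n\1$/!P    print up to the first newline unless both lines match
#   D                   delete up to the first newline, restart with the rest
deduped=$(printf '%s\n' "$input" | sed '$!N; /^\(.*\)\n\1$/!P; D')

printf '%s\n\n%s\n' "$numbered" "$deduped"
```

Because D restarts the cycle without reading new input, the second line of each pair becomes the first line of the next comparison, which is what lets the script look at every adjacent pair exactly once.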
Applications, options, notes
Capacity limits
sed is not subject to any (real) limits on file size. Apart from the available disk space, which is a practical limit, most implementations keep the line counter in an int or long int. On today's common 64-bit systems, where long int is usually 64 bits wide, the risk of an overflow can therefore be neglected.
However, like most text-manipulating tools in Unix, sed is subject to a limit on line length (more precisely, on the number of bytes up to the next newline character). POSIX defines a minimum for this value; the actual size can vary from system to system and can be looked up as the value of the constant LINE_MAX in the header file /usr/include/limits.h. The length is specified in bytes, not characters, which is why the limit has to be converted accordingly when processing UTF-encoded files, which represent single characters with several bytes.
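As a small sketch, the value in effect can also be queried from the shell with the POSIX getconf utility instead of reading the header file. POSIX requires it to be at least 2048 bytes (_POSIX2_LINE_MAX); the actual figure printed depends on the system.

```shell
#!/bin/sh
# Ask the system for LINE_MAX (a length in bytes, not characters).
line_max=$(getconf LINE_MAX)
printf 'LINE_MAX = %s bytes\n' "$line_max"
```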
Greediness
For regexps, a distinction is made between greedy and non-greedy matching. sed regexps are always greedy, which means that a regexp always matches the longest possible span:
    /a.*B/    "an 'a', followed by zero or more arbitrary characters, followed by 'B'"

    text:                                   axyBBBskdjfhaaBBpweruBjdfh
    longest possible match (greedy):        axyBBBskdjfhaaBBpweruB
    shortest possible match (non-greedy):   axyB

The reason is that sed is optimized for speed, and non-greedy regexps would require costly backtracking. If you want to force non-greedy behavior, this is usually achieved with negated character classes. In the example above:
    /a[^B]*B/    "an 'a', followed by zero or more characters other than 'B', followed by 'B'"

Practical limitations in shell programming
It should be mentioned that by far the most common practical use of sed (and of awk, tr and similar filter programs), namely the ad-hoc manipulation of the output of other commands, as in

    ls -l /path/to/myfile | sed 's/^\([^ ][^ ]*\) .*/\1/'    # prints file type and file mode

strictly speaking constitutes an abuse. Since every call to an external program requires the costly system call fork(), shell-internal methods such as the so-called variable expansion are usually preferable to calling external programs, even if they are much longer to write. The rule of thumb is: if the output of the filtering process is a file or data stream, the filter program is the right tool; otherwise variable expansion is preferable.
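A sketch of the two approaches side by side, assuming a hypothetical scratch file created with mktemp in place of /path/to/myfile. The ls call remains in both cases; only the sed process is replaced by the POSIX parameter expansion ${var%% *}, which strips everything from the first blank onward.

```shell
#!/bin/sh
# Hypothetical scratch file standing in for /path/to/myfile.
file=$(mktemp)

# Filter approach: an additional sed process extracts the first field.
with_sed=$(ls -l "$file" | sed 's/^\([^ ][^ ]*\) .*/\1/')

# Shell-internal approach: no extra process is forked for the filtering.
line=$(ls -l "$file")
with_expansion=${line%% *}

printf '%s\n%s\n' "$with_sed" "$with_expansion"
rm -f "$file"
```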
In-Place Editing
Because of the way sed performs text manipulation, it cannot work directly on the input file. A separate output file is required instead, which can afterwards be copied over the input file.
sed '...
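Spelled out, the separate-file approach might look like the following sketch; the file name demo-file.txt is hypothetical. GNU sed also offers an -i option that performs these steps internally, but implementations differ in how they handle it (BSD sed, for example, expects a backup suffix argument after -i).

```shell
#!/bin/sh
# Hypothetical file to be edited "in place".
file=demo-file.txt
printf 'old line\nanother old line\n' > "$file"

# sed cannot overwrite its own input while reading it, so the result
# is first written to a temporary file ...
tmp=$(mktemp)
sed 's/old/NEW/' "$file" > "$tmp" &&
    # ... and only on success copied back over the original.
    # (cat > "$file" keeps the original file's permissions, unlike mv.)
    cat "$tmp" > "$file"
rm -f "$tmp"

result=$(cat "$file")
printf '%s\n' "$result"
rm -f "$file"
```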
RegExp notation
It has become customary to delimit regular expressions with slashes, as in the examples above. sed, however, does not require this: any character that follows a substitution command is accepted as the delimiter and is then expected at the corresponding places in the rest of the command. The following two statements are therefore equivalent:
    s/^\([^ ][^ ]*\) \([^ ][^ ]*\)/\2 \1/    swaps the first and second word of a line
    s_^\([^ ][^ ]*\) \([^ ][^ ]*\)_\2 \1_    the same, with "_" instead of "/"

This is convenient when the slash itself is part of the regexp, because the tedious escaping (marking it for use as a literal) can then be avoided. One simply switches to another, unused character.
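A short sketch of both cases: the word swap with "/" and with "_" as delimiter, and a pattern containing slashes written once with escaping and once with "|" as delimiter. The sample strings are assumptions chosen for illustration.

```shell
#!/bin/sh
# Swap the first two words, once with "/" and once with "_".
slash=$(echo 'hello big world' | sed 's/^\([^ ][^ ]*\) \([^ ][^ ]*\)/\2 \1/')
underscore=$(echo 'hello big world' | sed 's_^\([^ ][^ ]*\) \([^ ][^ ]*\)_\2 \1_')

# A path in the pattern: every "/" must be escaped with the default
# delimiter, but not when another delimiter such as "|" is used.
escaped=$(echo '/usr/local/bin' | sed 's/\/usr\/local/\/opt/')
other=$(echo '/usr/local/bin' | sed 's|/usr/local|/opt|')

printf '%s\n' "$slash" "$underscore" "$escaped" "$other"
```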
Some typical methods
Deletion of parts of text
Parts of a line are deleted by replacing them with nothing. An explicit deletion of part of the pattern space is only provided from its beginning up to the first line separator (D). The command

    /expression/d

however, does NOT delete the subexpression, but every line that contains "expression"! Here "expression" acts as the address (see above, one-address variant of the command d).
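A minimal sketch contrasting the two, using three made-up sample lines: s/old // removes only the matched text, while /old/d removes every line that contains the pattern.

```shell
#!/bin/sh
input='keep this line
drop the old part
old line'

# Deleting part of a line: replace the first "old " by nothing.
partial=$(printf '%s\n' "$input" | sed 's/old //')

# /old/d uses "old" as an address: every matching line is deleted.
lines=$(printf '%s\n' "$input" | sed '/old/d')

printf '%s\n\n%s\n' "$partial" "$lines"
```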
Matching at least one occurrence
Unlike GNU BREs, POSIX BREs do not provide the quantifier \+ for "one or more of the preceding expression". To write portable sed scripts that run not only with GNU sed, the expression should therefore be written twice, with the * quantifier (zero or more) applied to the second copy.
    /xa\+y/    GNU variant: "'x', followed by one or more (but not zero) 'a', followed by 'y'"
    /xaa*y/    the same in POSIX: "'x', followed by 'a', followed by zero or more 'a', followed by 'y'"

Replacement of several or all occurrences within a line
Without further options, the replacement rule always applies only to the first occurrence of the search text:
    sed 's/old/NEW/' inputfile

    input                  output
    old                    NEW
    old old                NEW old
    old old old            NEW old old
    old old old old        NEW old old old
    old old old old old    NEW old old old old

This behavior can, however, be changed by specifying an option after the command: if a number N is given, only the Nth occurrence is changed; g (for global) changes all occurrences:
    sed 's/old/NEW/g' inputfile

    input                  output
    old                    NEW
    old old                NEW NEW
    old old old            NEW NEW NEW
    old old old old        NEW NEW NEW NEW
    old old old old old    NEW NEW NEW NEW NEW

Filtering specific lines
By default, sed outputs the contents of the pattern space after the last statement. If you want to suppress this for individual lines, you can either use a rule that deletes certain lines (explicit filtering) or switch the behavior off completely with the command-line option -n (implicit filtering). In that case only what is requested with the explicit print command (p) is output. p can serve either as a separate statement or as a flag of the s command. The following example outputs only the "chapter headings" from the text used above:
    sed -n 's/^=\(.*\)$/Chapter heading: \1/p' inputfile

    input        output
    =Chapter1    Chapter heading: Chapter1
    Line 1       Chapter heading: Chapter2
    Line 2
    Line 3
    =Chapter2
    Line A
    Line B
    Line C

Debugging
For troubleshooting, it can be useful to output intermediate results so that the development of the pattern space can be followed more easily. The print command p mentioned above can be used for this; lines may well be output several times. For the example program above, for instance:
    sed '/^=/ {
    s/^=//
    p
    s/^/ (/
    p
    s/$/)/
    p
    h
    d
    }
    p
    G
    s/\n//' inputfile