Awk
Taken from Tutorials Point. See also Geeks for Geeks.
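'awk' is an interpreted programming language designed for text processing. An awk program is a sequence of pattern-action rules: each input line (record) is read in turn, and every rule whose pattern matches it has its action executed (a rule with no pattern matches every line). Optional BEGIN and END blocks run once before the first line is read and once after the last line, which makes them useful for setup and tear-down. Enclose the awk program in single quotes so that the shell passes it to awk unchanged.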
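A minimal sketch of the three parts of a program. It assumes three made-up input lines produced by `printf` in place of a real file.

```shell
# Three made-up input lines stand in for a real file.
# BEGIN runs once before any input, the bare {...} rule runs on every line,
# and END runs once after the last line.
printf 'alpha\nbeta\ngamma\n' |
  awk 'BEGIN { print "start" } { print NR ": " $0 } END { print "done" }'
```

This should print `start`, then `1: alpha`, `2: beta`, `3: gamma`, then `done`.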
In its simplest form awk can be used to print the contents of a file:
awk '{print}' <filename>
The '-v' option assigns variables before the program starts, so they are already set when the BEGIN block runs:
awk -v name="World!" 'BEGIN{printf "Hello, %s\n", name}'
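A small sketch showing that each assignment needs its own `-v` and that the values are visible in every rule, not only in BEGIN. The variable names `prefix` and `suffix` and the input lines are arbitrary examples.

```shell
# "prefix" and "suffix" are hypothetical names; each needs its own -v flag.
# Both are usable in the per-line rule, not just in BEGIN.
printf 'one\ntwo\n' | awk -v prefix='[' -v suffix=']' '{ print prefix $0 suffix }'
```

This should print `[one]` and `[two]`.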
By default, awk splits each input line into fields separated by whitespace, and print separates comma-separated output items with a single space. Fields are referenced by index, starting from 1 ($1, $2, ...):
awk '{print $3,$4}' <filename>
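With the default field separator, leading blanks are ignored and any run of spaces or tabs counts as one separator. A sketch using a single made-up line with irregular spacing:

```shell
# Irregular spaces and a tab: the fields are still "a", "b", "c" and "d".
printf '  a   b\tc d\n' | awk '{print $3,$4}'
```

This should print `c d`.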
Or you can insert a tab separator between the output fields:
awk '{print $3 "\t" $4}' <filename>
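An alternative sketch, not taken from the source: setting the output field separator OFS to a tab makes every comma in a print statement emit a tab, so the separator does not have to be written between each pair of fields. The input line is made up.

```shell
# OFS replaces the space that print puts between comma-separated items.
# awk processes the \t escape in a -v value, so OFS becomes a real tab.
printf 'w x y z\n' | awk -v OFS='\t' '{print $3, $4}'
```

This should print `y`, a tab, then `z`.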
A pattern, such as a regular expression between slashes, placed before the action selects which lines the action applies to:
awk '/<pattern>/ {print $3 "\t" $4}' <filename>
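A sketch of a regex filter, assuming hypothetical three-column input (name, count, colour). The regex matches anywhere in the line, not in a particular field.

```shell
# Made-up input; only the lines containing "red" are printed.
printf 'apple 1 red\nbanana 2 yellow\ncherry 3 red\n' |
  awk '/red/ {print $1 "\t" $2}'
```

This should print `apple` and `cherry`, each followed by a tab and its count.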
When a rule has no action block, awk's default action is to print the whole record. $0 refers to the whole input record, so these two programs produce the same output:
awk '/<pattern>/ {print $0}' <filename>
awk '/<pattern>/' <filename>
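A quick check of that equivalence on made-up input: both commands select the same lines, and the second one relies on the default action.

```shell
# Both commands should print "keep this" and "keep that".
printf 'keep this\ndrop this\nkeep that\n' | awk '/keep/ {print $0}'
printf 'keep this\ndrop this\nkeep that\n' | awk '/keep/'
```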
Variables can be used without prior declaration; an unset variable starts as the empty string, which counts as 0 in arithmetic:
awk '/<pattern>/ {++counter} END {print "Lines matched =", counter + 0}' <filename>
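A sketch with made-up input showing why adding 0 helps when the count might be zero: if nothing matches, `counter` is never assigned, and without `+ 0` the END block would print an empty value instead of 0.

```shell
# Two of the three made-up lines contain "error".
printf 'error one\nok\nerror two\n' |
  awk '/error/ {++counter} END {print "Lines matched =", counter + 0}'

# No line matches, so counter is never assigned; "+ 0" makes it print as 0.
printf 'ok\nok\n' |
  awk '/error/ {++counter} END {print "Lines matched =", counter + 0}'
```

These should print `Lines matched = 2` and `Lines matched = 0`.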
'length' is a built-in function that returns the length of its string argument, or of $0 when called without one. To print all lines longer than 82 characters, use:
awk 'length($0) > 82' <filename> # using default action of {print}
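A sketch with a smaller threshold so the effect shows on short, made-up lines. Because `length` with no argument measures `$0`, both commands select the same lines.

```shell
# Made-up lines of 3, 6 and 11 characters; only the last two exceed 5.
printf 'abc\nabcdef\nabcdefghijk\n' | awk 'length($0) > 5'
printf 'abc\nabcdef\nabcdefghijk\n' | awk 'length > 5'   # same result
```

Both should print `abcdef` and `abcdefghijk`.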
Awk comes with a number of built-in variables:
Variable | Meaning |
---|---|
ARGC | number of command-line arguments, counting ARGV[0] (the program name) but not the awk program or options |
ARGV | array of the command-line arguments, indexed from 0 to ARGC-1 |
CONVFMT | conversion format used when turning numbers into strings (default "%.6g") |
ENVIRON | associative array of environment variables, indexed by name |
FILENAME | name of the current input file |
FS | input field separator (default is a single space, which splits on runs of spaces and tabs) |
RS | input record separator (default newline) |
NF | number of fields in the current record |
NR | number of records read so far, across all input files |
FNR | number of the current record within the current file |
OFMT | output format for numbers printed by print (default "%.6g") |
OFS | output field separator (default space) |
ORS | output record separator (default newline) |
RLENGTH | length of the string matched by the match function (-1 if there is no match) |
RSTART | starting position of the string matched by the match function (0 if there is no match) |
SUBSEP | separator used to join the subscripts of multi-dimensional array elements |
awk 'BEGIN {print "Arguments =", ARGC}' one two three four # prints 5: ARGV[0] (the program name) plus four arguments
awk 'BEGIN {
    for (i = 0; i < ARGC; i++) {
        printf "ARGV[%d] = %s\n", i, ARGV[i]
    }
}' one two three four # ARGV[0] is the program name; ARGV[1] to ARGV[4] are the arguments
awk 'BEGIN { print ENVIRON["USER"] }' # print the value of the USER environment variable
awk 'NF < 10' <filename> # print lines with fewer than 10 fields
awk 'NF < 10 {print NF}' <filename> # print the number of fields in lines with fewer than 10 fields
awk 'NR < 11' <filename> # prints first 10 records
awk 'BEGIN { if (match("Hello, World!", "or")) {print RLENGTH} }' # prints 2, the length of the matched text "or"
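The examples above leave the separator variables at their defaults and show only RLENGTH. A sketch covering FS, OFS and RSTART, assuming made-up colon-separated input similar in shape to /etc/passwd. FS and OFS are set in BEGIN so they take effect before the first record is split.

```shell
# Input fields are separated by ":"; output fields are joined with " -> ".
printf 'root:x:0\nalice:x:1000\n' |
  awk 'BEGIN { FS = ":"; OFS = " -> " } { print $1, $3 }'

# match() sets both RSTART and RLENGTH; "or" starts at character 9.
awk 'BEGIN { if (match("Hello, World!", "or")) print RSTART, RLENGTH }'
```

The first command should print `root -> 0` and `alice -> 1000`; the second should print `9 2`.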
Additional variables are available with GNU AWK (gawk):
Variable | Meaning |
---|---|
ARGIND | Index in ARGV of the current file being processed |
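BINMODE | on non-POSIX systems, selects binary mode for file I/O |
ERRNO | string describing the error if getline or close fails |
FIELDWIDTHS | space-separated list of field widths; when set, input is split into fixed-width columns instead of by FS |
IGNORECASE | when nonzero or non-null, makes regular-expression matching and string comparisons case-insensitive |
LINT | controls lint warnings at run time, like the --lint option |
PROCINFO | array giving access to information about the running awk process |
TEXTDOMAIN | text domain used to look up translations of the program's marked strings |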