Linux Bash Scripting Tutorial


AWK Programming

AWK is a pattern-scanning and text-processing language.
The awk program interprets this language to process data and generate reports.

Awk was originally intended for everyday data-processing tasks such as information retrieval, data validation, and data transformation and reduction.

Awk can be used to select data from files and then format the selected data into a report.



awk syntax:

		Using only a pattern

		awk 'pattern' filename            # every line that matches the pattern is printed

		Using only an action

		awk '{action}' filename           # the action is applied to every line

		Using a pattern and an action together

		awk 'pattern {action}' filename   # the action is applied only to lines that match the pattern

      
The $ at the beginning of a line is the prompt from the system; it may be different on your machine.

Running an awk program from a file

awk -f progfile [filenames]
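
As a minimal sketch of the -f form, the commands below put a one-rule program in its own file and run it against the grocery data used later in this tutorial. The file name expensive.awk and the scratch directory are made up for illustration.

```shell
# Work in a scratch directory so no existing file is overwritten.
workdir=$(mktemp -d)

# Sample data: the same rows as groceries.data in this tutorial.
cat > "$workdir/groceries.data" <<'EOF'
Tomoto    1.99   3
Apples    3.99   10
Capsicum  1.99   7
Mushrooms 3.99   3
Potatoes  2.99   5
Chicken   4.99   5
EOF

# The awk program lives in its own file (hypothetical name: expensive.awk).
cat > "$workdir/expensive.awk" <<'EOF'
# Print the name of every item whose price is above 3.
$2 > 3 { print $1 }
EOF

# -f makes awk read the program from the file instead of the command line.
result=$(awk -f "$workdir/expensive.awk" "$workdir/groceries.data")
echo "$result"

rm -rf "$workdir"
```

Keeping the program in a file avoids shell quoting problems and makes longer programs easier to edit and reuse.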

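Fields

awk splits each input record (by default, one line) into fields. $1, $2, ... refer to the individual fields and $0 refers to the whole record.

NF - Number of Fields

The number of fields in the current record

NR - Number of Records

The number of records read so far

FS - Field Separator

Controls the input field separator (whitespace by default)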
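
The three variables can be seen together in a short sketch. The input lines are invented for illustration, and $NF (the last field of the record) is standard awk but is not otherwise covered in this tutorial.

```shell
# Three colon-separated records with different numbers of fields.
# For each record, print its number (NR), its field count (NF)
# and its last field ($NF).
result=$(printf 'a:b:c\nd:e\nf:g:h:i\n' |
  awk 'BEGIN { FS = ":" } { print NR, NF, $NF }')
echo "$result"
# prints:
# 1 3 c
# 2 2 e
# 3 4 i
```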

Patterns

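Patterns control the execution of actions. When a pattern matches the current line, its associated action is executed.

BEGIN {statements}                   # executed once, before any input has been read
END {statements}                     # executed once, after all the input has been read
expression {statements}              # executed for each line where the expression is true
/regular expressions/ {statements}   # executed for each line that matches the regular expression
compound pattern {statements}        # patterns combined with &&, || and !
pattern1,pattern2 {statements}       # executed for each line from one matching pattern1 through the next one matching pattern2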

The awk commands below use the following data file, groceries.data:

	Tomoto    1.99   3
	Apples    3.99   10
	Capsicum  1.99   7
	Mushrooms 3.99   3
	Potatoes  2.99   5
	Chicken   4.99   5
	

expression {statements}

$ awk '$2 > 1.99' groceries.data
    or
$ awk '$2 > 1.99 {print $0}' groceries.data

		Apples    3.99   10
		Mushrooms 3.99   3
		Potatoes  2.99   5
		Chicken   4.99   5
	

compound pattern {statements}

$ awk '$2 > 1.99 && $3 > 3' groceries.data
    or
$ awk '$2 > 1.99 && $3 > 3 {print $0}' groceries.data

	Apples    3.99   10
	Potatoes  2.99   5
	Chicken   4.99   5

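/regular expressions/ {statements}

In a regular expression, * means zero or more of the preceding character. /Tomo*/ therefore matches any line containing "Tom" followed by any number of "o" characters, which in this file is the same single line that /Tomoto/ matches.

$ awk '/Tomoto/ {print $0}' groceries.data
    or
$ awk '/Tomo*/ {print $0}' groceries.data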

	Tomoto    1.99   3
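
To see how the * in /Tomo*/ behaves, here is a small sketch with three made-up input lines. The pattern is not a shell wildcard: it matches "Tom" followed by zero or more "o" characters anywhere in the line.

```shell
# "Tom" and "Tomoto" both contain "Tom", so they match;
# "Tamato" does not contain "Tom", so it is skipped.
result=$(printf 'Tom\nTomoto\nTamato\n' | awk '/Tomo*/ { print $0 }')
echo "$result"
# prints:
# Tom
# Tomoto
```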
	

pattern1,pattern2 {statements}

$ awk '$2>3,$3>3' groceries.data
    or
$ awk '$2>3,$3>3 {print $0}' groceries.data

	Apples    3.99   10
	Mushrooms 3.99   3
	Potatoes  2.99   5
	Chicken   4.99   5
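
A range pattern switches on at a line where pattern1 is true and stays on through the next line where pattern2 is true, inclusive. Both patterns are checked on the starting line, so a range can begin and end on the same line, as happens for Apples and Chicken above. The sketch below uses made-up input and NR so that the start and end of the range are easy to see.

```shell
# The range opens on record 2 and closes on record 4, both included.
result=$(printf 'one\ntwo\nthree\nfour\nfive\n' |
  awk 'NR==2,NR==4 { print NR, $0 }')
echo "$result"
# prints:
# 2 two
# 3 three
# 4 four
```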

Usage of BEGIN and END patterns


     BEGIN and END are special patterns in awk. The BEGIN action is executed before any line is read from a file or piped command. The END action is executed after all lines have been read.


	$ awk 'BEGIN {print "Vegetables  " "Price  " "Quantity  "} {print $1,"\t",$2,"\t",$3}'  groceries.data
	
	Vegetables  Price  Quantity  
	Tomoto 	 1.99 	 3
	Apples 	 3.99 	 10
	Capsicum 	 1.99 	 7
	Mushrooms 	 3.99 	 3
	Potatoes 	 2.99 	 5
	Chicken 	 4.99 	 5
	$ awk 'BEGIN {print "Vegetables  " "Price  " "Quantity  " " Total Price"} {print $1,"\t",$2,"\t",$3,"\t",$2*$3}'  groceries.data
	
	Vegetables  Price  Quantity   Total Price
	Tomoto 	 1.99 	 3 	 5.97
	Apples 	 3.99 	 10 	 39.9
	Capsicum 	 1.99 	 7 	 13.93
	Mushrooms 	 3.99 	 3 	 11.97
	Potatoes 	 2.99 	 5 	 14.95
	Chicken 	 4.99 	 5 	 24.95
$ awk 'BEGIN {print "Vegetables  " "Price  " "Quantity  " " Total Price"} {tp=$2*$3;print $1,"\t",$2,"\t",$3,"\t",tp; total = total + tp} END {print "Total Price: ",total}'  groceries.data
	
Vegetables  Price  Quantity   Total Price
Tomoto 	 1.99 	 3 	 5.97
Apples 	 3.99 	 10 	 39.9
Capsicum 	 1.99 	 7 	 13.93
Mushrooms 	 3.99 	 3 	 11.97
Potatoes 	 2.99 	 5 	 14.95
Chicken 	 4.99 	 5 	 24.95
Total Price:  111.67
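
The reports above use print, which leaves the columns ragged and shows 39.9 rather than 39.90. As a sketch of one alternative, awk's printf statement can fix the column widths and the number of decimals. The widths and the header labels here are arbitrary choices, not part of the original example.

```shell
# Sample data: the same rows as groceries.data in this tutorial.
data=$(mktemp)
cat > "$data" <<'EOF'
Tomoto    1.99   3
Apples    3.99   10
Capsicum  1.99   7
Mushrooms 3.99   3
Potatoes  2.99   5
Chicken   4.99   5
EOF

# %-10s left-aligns the name; %6.2f and %8.2f print two decimals.
result=$(awk '
  BEGIN { printf "%-10s %6s %4s %8s\n", "Item", "Price", "Qty", "Total" }
        { tp = $2 * $3
          total = total + tp
          printf "%-10s %6.2f %4d %8.2f\n", $1, $2, $3, tp }
  END   { printf "Total Price: %.2f\n", total }
' "$data")
echo "$result"

rm -f "$data"
```

Unlike print, printf does not add a newline by itself, so each format string ends with \n.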

Find Total Size of the Files in a Directory

     In this example, ls -l produces a long listing of the current directory; the 5th column holds the size of each file or directory in bytes.
The grep command keeps only regular files. In the listing, lines for regular files start with a hyphen '-' and lines for directories start with 'd', so directories are excluded.
The BEGIN action is executed before any line is read from the pipe, so it prints the header "Total Size of the Files".
The main action, { sum = sum + $5 }, adds the size in $5 to the running total held in the sum variable. This is repeated until all lines have been read.
The END action is executed after awk has processed every line, and prints the value of sum.

ls -l | grep '^-' | awk 'BEGIN {print "Total Size of the Files"}{ sum = sum + $5 } END { print sum," bytes"}'
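
Because awk can match regular expressions itself, the grep step can be folded into the awk pattern. The sketch below builds a scratch directory with two small made-up files and one subdirectory so the result is predictable. It assumes the usual ls -l layout, in which the size is the 5th whitespace-separated column (user and group names without spaces).

```shell
# Scratch directory: two files (5 and 4 bytes) and one subdirectory.
workdir=$(mktemp -d)
printf 'hello' > "$workdir/a.txt"
printf 'awk!'  > "$workdir/b.txt"
mkdir "$workdir/subdir"

# /^-/ keeps only lines for regular files, replacing grep '^-'.
# "sum + 0" prints 0 instead of an empty value when no file matches.
result=$(ls -l "$workdir" |
  awk '/^-/ { sum = sum + $5 } END { print sum + 0, "bytes" }')
echo "$result"
# prints: 9 bytes

rm -rf "$workdir"
```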

Display Sequence Number using the NR variable

		$ awk '{print NR" " $0}' groceries.data
	
		1 Tomoto    1.99   3
		2 Apples    3.99   10
		3 Capsicum  1.99   7
		4 Mushrooms 3.99   3
		5 Potatoes  2.99   5
		6 Chicken   4.99   5
	

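FS and OFS usage in awk

FS sets the input field separator. OFS sets the output field separator, which print places between its comma-separated arguments. The command below prints only one field, so OFS has no visible effect on its output.

	  $ echo "sam:software engineer:50000:12-12-2011:banking domain"| awk 'BEGIN {FS = ":"; OFS = "\t"} {print $5}'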
	
	banking domain
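
To make the effect of OFS visible, a variation on the command above can print several fields separated by commas. This is a sketch using the same input record; the choice of fields 1, 3 and 5 is arbitrary.

```shell
# FS splits the input on ":"; OFS (a tab) is written between the
# comma-separated arguments of print.
result=$(echo "sam:software engineer:50000:12-12-2011:banking domain" |
  awk 'BEGIN { FS = ":"; OFS = "\t" } { print $1, $3, $5 }')
echo "$result"
```

The three fields come out separated by tab characters. Writing print $1 $3 $5 without commas would concatenate them with no separator at all.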
	
