September 13, 2007

Doc contribution: Using awk

Here is a contribution that came from far away (from Doc):
I wanted to see only part of the data in a file. The data must be ASCII text; otherwise this doesn't work. Let's create a simple data file with bash:

for i in `seq 1 1000`; do echo -n $i " "; echo "$i*$i" | bc ; done > dati.txt

[This means: for each i in the sequence (seq) from 1 to 1000 (the backticks ` are important: they run seq and put its output into the loop), do the following. Print (echo), without a trailing newline (-n), the value of the variable i ($i) followed by a space (" "). Then (;) print the text "$i*$i", in which bash has already replaced $i with its value (for example 7*7). echo only prints text and does not compute anything, so this expression is sent through a pipe (|) to another program, bc, a simple calculator, which prints the result. When the loop is finished, its whole output is written (>) to the file called dati.txt.]
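Bash can also do integer arithmetic by itself with $(( )), so bc is optional here. The following is a minimal sketch, assuming bash and the seq utility, that writes an equivalent dati.txt. It uses $( ), the modern form of the backticks, and separates the columns with a single space. The original loop leaves two spaces, which awk does not mind.

```bash
#!/usr/bin/env bash
# Write "i i^2" for i = 1..1000.
# $(seq 1 1000) is command substitution (same job as the backticks);
# $((i * i)) is bash's own integer arithmetic, so no bc is needed.
for i in $(seq 1 1000); do
  echo "$i $((i * i))"
done > dati.txt
```

Either version works with the awk command that follows, because awk splits fields on any run of blanks.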
So we have a file with two columns: the first holds the numbers from 1 to 1000, the second their squares. (The order of the lines does not matter here.) Now I want to see only the lines where the first column ends with 5 AND the second column is between 2000 and 3000:

awk '{if ($1 ~ /5$/ && $2 > 2000 && $2 < 3000) print $1, $2}' dati.txt
45 2025

[(This part is harder for me.) This calls awk, a powerful text-processing program, in the form awk 'ProgramText' file. $1 and $2 are the first and second fields of each line of dati.txt, that is, the first and second number on the line. awk prints (print) $1 and $2 only if the condition ($1 ~ /5$/ && $2 > 2000 && $2 < 3000) is true. The condition is made of three tests joined by a logical AND (&&). The first, $1 ~ /5$/, means that the first field must match (~) the regular expression 5$, i.e. a 5 at the end of the field, because $ anchors the match to the end. If the 5 had to be at the beginning, the regular expression would be ^5. The second ($2 > 2000) and third ($2 < 3000) tests require the second field to lie between 2000 and 3000, both ends excluded.]
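Here is a sketch of two variations on the same filter, assuming bash and a POSIX awk. The first writes the test as an awk pattern in front of the { } action instead of using an explicit if; the two forms are equivalent. The second swaps 5$ for ^5 to show the start-of-field anchor. The sketch regenerates the sample file first, so it runs on its own.

```bash
#!/usr/bin/env bash
# Recreate an equivalent sample file (single space between the columns).
for i in $(seq 1 1000); do echo "$i $((i * i))"; done > dati.txt

# pattern { action }: the action runs only on lines where the pattern is true.
# 5$ means "5 at the end of the first field".
awk '$1 ~ /5$/ && $2 > 2000 && $2 < 3000 { print $1, $2 }' dati.txt
# prints: 45 2025

# ^5 means "5 at the start of the first field".
awk '$1 ~ /^5/ && $2 > 2000 && $2 < 3000 { print $1, $2 }' dati.txt
# prints 50 2500, 51 2601, 52 2704, 53 2809, 54 2916 (one pair per line)
```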
