Saturday, April 26, 2008

Using awk to extract lines in a text file

awk is not an obvious choice as a tool for strictly extracting rows from a text file. It is better known for its column/field manipulation capabilities. More obvious choices are sed and perl. You can see how sed does it in my earlier entry.

If you opt for awk, you can use its NR variable, which contains the number of input records read so far.

Suppose the text file is somefile.txt:
$ cat > somefile.txt
Line 1
Line 2
Line 3
Line 4

To print a single line, say line 2:
$ awk 'NR==2' somefile.txt
Line 2


If the text file is huge, you can cheat by exiting the program on the first match. Note that this hack will not work if multiple lines are being extracted.
$ awk 'NR==2 {print;exit}' somefile.txt
Line 2


To extract a section of the text file, say lines 2 to 3:
$ awk 'NR==2,NR==3' somefile.txt
Line 2
Line 3
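
The early-exit trick can be stretched to cover a range of lines by exiting once the last wanted line has been printed. Below is a minimal sketch of that idea. The function name print_range and its argument order are made up for illustration, and I have not measured how much time it saves on a huge file.

```shell
# print_range FIRST LAST FILE
# Print lines FIRST through LAST of FILE, then stop reading.
# (print_range is a hypothetical helper name, not a standard command.)
print_range() {
    awk -v first="$1" -v last="$2" '
        NR >= first { print }   # inside the wanted range: print the line
        NR >= last  { exit }    # last wanted line reached: stop reading
    ' "$3"
}

# Sample data with the same content as somefile.txt.
sample=$(mktemp)
printf 'Line 1\nLine 2\nLine 3\nLine 4\n' > "$sample"

print_range 2 3 "$sample"   # prints Line 2 and Line 3
rm -f "$sample"
```

The range pattern NR==2,NR==3 keeps reading to the end of the file; the explicit exit is what lets awk stop early.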


A more interesting task is to extract every nth line from a text file. I showed previously how to do it using sed and perl.

Using awk, to print every second line counting from line 0 (first printed line is line 2):
$ awk '0 == NR % 2'  somefile.txt
Line 2
Line 4


To print every second line counting from line 1 (first printed line is line 1):
$ awk '0 == (NR + 1) % 2'  somefile.txt
Line 1
Line 3


% is the mod (i.e. remainder) operator.
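
The two one-liners above can be folded into one general form that takes the step and the first line to print as parameters. This is only a sketch: every_nth, n and first are names I picked for illustration.

```shell
# every_nth N FIRST FILE
# Print every Nth line of FILE, starting with line FIRST.
# (every_nth is a hypothetical helper name.)
every_nth() {
    # A line is printed when it is at or past FIRST and its distance
    # from FIRST is a multiple of N.
    awk -v n="$1" -v first="$2" 'NR >= first && (NR - first) % n == 0' "$3"
}

# Sample data with the same content as somefile.txt.
sample=$(mktemp)
printf 'Line 1\nLine 2\nLine 3\nLine 4\n' > "$sample"

every_nth 2 2 "$sample"   # prints Line 2 and Line 4
every_nth 2 1 "$sample"   # prints Line 1 and Line 3
rm -f "$sample"
```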

Friday, April 25, 2008

Fast way to execute sequential commands from the command history

When I had to navigate to a subdirectory in a deep hierarchy, I found myself doing a lot of cd's.
$ cd /home/peter
$ cd level1
$ cd level2
$ cd level3
$ cd level4
$ pwd
/home/peter/level1/level2/level3/level4

If I ever need to revisit that level4 subdirectory, re-running those same commands in the same order from the command history can be a real chore. You have to keep finding your way back in history to execute the next command.

bash has a shortcut, Ctrl-O (as in Control-Oh), that is your friend here.

Simply go back to the first command in the series (cd /home/peter).
Hit Ctrl-O (instead of Enter), and it will run that command and automatically display the next command for you (cd level1).

You can do one of several things at this point:

  • You can hit Ctrl-O to run the currently displayed command and display the next one (cd level2).
  • You can hit Enter to run the current command and end the sequence (the next command is NOT displayed).
  • You can hit Ctrl-C; the current command is NOT executed, and the next command is NOT displayed.


Ctrl-O can be a real time saver.

Sunday, April 20, 2008

Quick hex / decimal conversion using CLI

Once in a while, you need to convert a number from hexadecimal to decimal notation, and vice versa.

Say you want to know the decimal equivalent of the hexadecimal number 15A.

You can convert in many different ways, all from the bash command line and all relatively easy.

To convert a number from hexadecimal to decimal:

 $ echo $((0x15a))
346


 $ printf '%d\n' 0x15a
346


 $ perl -e 'printf ("%d\n", 0x15a)'
346


 $ echo 'ibase=16;obase=A;15A' | bc
346


Note that ibase and obase specify the input and the output notation respectively.
By default, the notation for both is decimal unless you change it using ibase or obase.

Because you change the notation to hex using ibase, your obase needs to be specified in hex (A in hex = 10 in decimal).

The input number (15A) needs to be in UPPER case. 15a will give you a parse error.


To convert from decimal to hex:
$ printf '%x\n' 346
15a


 $ perl -e 'printf ("%x\n", 346)'
15a


 $ echo 'ibase=10;obase=16;346' | bc
15A
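
If you do these conversions often, the printf variants can be wrapped in two small shell functions. This is just a sketch; hex2dec and dec2hex are names I made up, not existing commands.

```shell
# hex2dec HEX: print the decimal value of a hexadecimal number.
# An optional leading 0x is stripped first, so 15a and 0x15a both work.
# (hex2dec and dec2hex are hypothetical helper names.)
hex2dec() {
    printf '%d\n' "0x${1#0x}"
}

# dec2hex DEC: print the (lower case) hexadecimal form of a decimal number.
dec2hex() {
    printf '%x\n' "$1"
}

hex2dec 15a   # prints 346
dec2hex 346   # prints 15a
```

Unlike bc, printf accepts hex digits in either case.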

Saturday, April 19, 2008

Extracting columns and fields from a text file

I posted about extracting lines from a text file ([1], [2]).

Enough about lines for now. Let's turn our attention to extracting columns and delimited fields in a text file. For instance, one task is to extract columns 5 to 7 in a file. Sometimes, the data you want reside in variable-length fields that are delimited by some character, say ",". A sample task is to extract the second field in a comma-delimited file.

As usual, there is more than one way to accomplish the tasks. The tools that we will use are cut, awk, and perl.

The text file is somefile.

$ cat > somefile
1234567890
1234567890
1234567890
1234567890

To extract fixed columns (say columns 5-7 of a file):

$ cut -c5-7 somefile
567
567
567
567

$ perl -pe '$_ = substr($_, 4, 3) . "\n"'  somefile
567
567
567
567

The current line ($_) is replaced with substr($_, 4, 3), the 3-character substring starting at offset 4 (perl is 0-based, so offset 4 is column 5).
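
For completeness, awk can extract fixed columns too, using its own substr function. Here is a minimal sketch; the wrapper name extract_cols is made up for illustration. Note that awk's substr counts from 1, unlike perl's.

```shell
# extract_cols START LEN FILE
# Print LEN characters of each line of FILE, starting at column START.
# awk's substr() is 1-based, so START=5 really means column 5.
# (extract_cols is a hypothetical helper name.)
extract_cols() {
    awk -v start="$1" -v len="$2" '{ print substr($0, start, len) }' "$3"
}

# Sample data with the same content as somefile.
sample=$(mktemp)
printf '1234567890\n1234567890\n' > "$sample"

extract_cols 5 3 "$sample"   # prints 567 on each line
rm -f "$sample"
```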

To illustrate extracting a particular field, let's use /etc/passwd, a colon-delimited file. Say we extract the 6th field (home directory of users).

$ cut -d: -f6 /etc/passwd

$ awk -F : '{print $6}' /etc/passwd

$ perl -p -e '$_ = (split(/[:\n]/))[5] . "\n"' /etc/passwd

Here, I used the split function to separate the fields delimited by a colon or a newline. The output of split is a list, and we assign the element at index 5 (the 6th field, since perl is 0-based) to the current line. \n is needed in the delimiter class [:\n]; otherwise, extracting the last field would leave an extra newline in the output.
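
The same tools extend naturally to more than one field. The sketch below pulls out fields 1 and 6 together, and also the last field of each line. It runs on a small made-up sample in /etc/passwd format rather than the real file, and the function names are my own.

```shell
# Made-up sample data in /etc/passwd format (7 colon-delimited fields).
sample=$(mktemp)
printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'peter:x:1000:1000:Peter:/home/peter:/bin/sh' > "$sample"

# Fields 1 and 6 (login name and home directory) with cut.
name_and_home_cut() { cut -d: -f1,6 "$1"; }

# The same with awk; OFS is the separator placed between printed fields.
name_and_home_awk() { awk -F: -v OFS=: '{ print $1, $6 }' "$1"; }

# Last field of each line; NF holds the number of fields on the line.
last_field() { awk -F: '{ print $NF }' "$1"; }

name_and_home_cut "$sample"   # prints root:/root and peter:/home/peter
name_and_home_awk "$sample"   # same output as the cut version
last_field "$sample"          # prints /bin/bash and /bin/sh
rm -f "$sample"
```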

If you can think of a simpler way to do this, please share it with us in the comments.

Thursday, April 17, 2008

Use sed or perl to extract every nth line in a text file

I recently blogged about the use of sed to extract lines in a text file.

As examples, I showed some simple cases of using sed to extract a single line and a block of lines in a file.

An anonymous reader asked how one would extract every nth line from a large file.

Suppose somefile contains the following lines:
$ cat > somefile
line 1
line 2
line 3
line 4
line 5
line 6
line 7
line 8
line 9
line 10
$


Below, I show 2 ways to extract every 4th line: lines 4 and 8 in somefile.
  1. sed
    $ sed -n '0~4p' somefile
    line 4
    line 8
    $

    0~4 means select every 4th line, beginning at line 0. The first~step address form is a GNU sed extension.

    Line 0 has nothing, so the first printed line is line 4.

    -n means only explicitly printed lines are included in the output.

  2. perl
    $ perl -ne 'print ((0 == $. % 4) ? $_ : "")'  somefile
    line 4
    line 8
    $


    $. is the current input line number.

    % is the remainder operator.

    $_ is the current line.


    The above perl statement prints out a line if its line number
    can be evenly divided by 4 (remainder = 0).

    Alternatively,
    $ perl -ne 'print unless (0 != $. % 4)' somefile
    line 4
    line 8
    $
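
For a sed that lacks the GNU first~step address form, the n command offers another route. This is a sketch rather than a drop-in replacement: it is hard-wired to every 4th line, and the function name every_4th is made up.

```shell
# every_4th FILE
# Print every 4th line of FILE using only the n and p commands.
# Each n replaces the pattern space with the next input line, so after
# three of them the pattern space holds the 4th line of the cycle.
# If the input runs out, n ends the script and nothing more is printed
# because of -n.
# (every_4th is a hypothetical helper name.)
every_4th() {
    sed -n 'n;n;n;p' "$1"
}

# Sample data with the same content as somefile.
sample=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 10; i++) print "line " i }' > "$sample"

every_4th "$sample"   # prints line 4 and line 8
rm -f "$sample"
```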

Click here for a more recent post on sed tricks.