March 16, 2016
Because of the arcane syntax?
Because other languages can't do the job?
No.
I resisted AWK for a long time. Couldn't I already do everything I needed with
sed and grep? I felt that anything more complex should be done with a real
language. AWK seemed like yet-another-thing to learn, with marginal benefits.
Why Learn AWK?
Let me count the ways.
You are working TOO HARD
Too many times, I've seen people working way too hard at the command line trying to
solve simple tasks.
Imagine programming without regular expressions.
Can you even imagine the alternative? Would it entail building FSMs
from scratch? Would it be easy to program? Would it be fun? Would it work the way you want?
That's life without AWK.
For simple tasks ("only print column 3" or "sum the numbers from column 2"),
the kind that almost fall in the grep-and-sed category but where you feel you
might need to open a man page, AWK is usually the solution.
And if you think that creating a new script file (column_3.py or sum_col_2.rb),
putting it somewhere on disk, and invoking it on your data isn't bad, I'm
telling you that you're working too hard.
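As a rough sketch of those two tasks, each one fits in a one-liner. The sample data and the shell variable names below are made up for illustration:

```shell
# Hypothetical sample data: name, count, city (whitespace-separated).
data='alice 10 paris
bob 20 tokyo
carol 12 lima'

# "only print column 3"
col3=$(printf '%s\n' "$data" | awk '{ print $3 }')
echo "$col3"

# "sum the numbers from column 2"
total=$(printf '%s\n' "$data" | awk '{ sum += $2 } END { print sum }')
echo "$total"
```

No script file, no imports: the whole program is the text between the single quotes.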
Available EVERYWHERE
On Linux, BSD, or Mac OS, AWK is already available. It is required by any
POSIX-compliant OS.
More importantly, it will be the AWK you know. It has been around for a long
time, and the way it works is stable. Any upgrade would not (could not) break
your scripts; it's the closest thing to "it just works".
Contrast with BASH or Python: do you have the right version? Does it have
all the language features you need? Is it backward and forward compatible?
When I write a script in AWK, I know 2 things:
- AWK is going to be anywhere I deploy
- it's going to work
Scope
You shouldn't write anything complicated in AWK. That's a feature: it
limits what you're going to attempt with the language. You are not going to
write a web server in AWK, and you know it wouldn't be a good idea.
There's something refreshing about knowing that you're not going to import a library
(let alone a framework) or worry about dependencies.
You're writing an AWK script, and you're going to focus on what AWK is good at.
Language Features
Do you want the following? (especially compared to BASH)
- hashes
- floating-point numbers
- modern regular expressions (POSIX extended syntax, as in egrep, rather than full Perl syntax)
It's all there, ready to go. Don't worry about the version number, the
bolted-on syntax, or the dependence on other tools.
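A small sketch combining two of those features, a regular expression match and floating-point arithmetic. The input lines and the `cpu`/`mem` labels are invented for the example:

```shell
# Keep lines whose first field matches a regular expression,
# and average the floating-point values found in column 2.
average=$(printf 'cpu0 1.5\ncpu1 2.0\nmem 9.0\n' |
  awk '$1 ~ /^cpu[0-9]+$/ { sum += $2; n++ } END { print sum / n }')
echo "$average"
```

The `mem` line does not match the pattern, so only the two `cpu` values are averaged.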
Convenience: minimized bureaucracy
In a script sandwich, your logic is the meat, and the surrounding bureaucracy
is the bread. In practice, bureaucracy means:
- opening and closing files
- iterating over each line of each file
- parsing or breaking a line into fields
These things are needed, but they aren't what your script is about. AWK takes
care of all that: your code is implicitly surrounded by a loop that's going to
iterate over every input line.
DISCLAIMER: This isn't AWK, it's JavaScript. It might as well be pseudocode.
All code simplified and for illustrative purposes only.
// open each file, assign content to "lines"
lines.forEach(function(line) {
  // the code you write goes here
});
// close all the files
AWK is going to break each line into fields (or columns); for many people,
that feature is the main reason to use AWK. By default, AWK breaks a line into
fields based on whitespace (i.e. /\s+/) and ignores leading or trailing
whitespace.
Also, AWK is automatically going to set a bunch of useful variables for you:
- NF: how many fields in the current line
- NR: what the current line number is
- $1, $2, $3 ... $9 ...: the value of each field on the current line
- $0: the content of the current line
- and more
// open each file, assign content to "lines"
var NR = 0;
lines.forEach(function(line) {
  NR = NR + 1;
  var fields = line.trim().split(/\s+/);
  var NF = fields.length;
  // the code you write goes here
});
// close all the files
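In actual AWK, the same bookkeeping comes for free. A minimal sketch, with made-up input, showing NR, NF, and the last field ($NF) on each line:

```shell
# Hypothetical input; note the leading spaces on the first line
# and the varying number of fields per line.
fields=$(printf '  one two\nthree four five\nsix\n' |
  awk '{ print NR ": " NF " fields, last is " $NF }')
echo "$fields"
```

The leading spaces on the first line do not produce an empty first field, which matches the default whitespace splitting described above.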
Convenience: automatic conversions
AWK does automatic string-to-number conversions. That's something terrible in
real programming languages, but very convenient within the scope of the things
you should attempt with AWK.
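A quick sketch of what that conversion looks like in practice. The input is invented, and the second line deliberately carries a non-numeric value:

```shell
# Field 2 arrives as text, but arithmetic treats it as a number.
# A string that does not look like a number counts as 0.
converted=$(printf 'widget 3.5\ngadget n/a\n' | awk '{ print $1, $2 * 2 }')
echo "$converted"
```

The silent fallback to 0 is exactly the kind of behavior that would be alarming in a large program, and is merely handy in a one-liner.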
Convenience: automatic variables
Variables are automatically created when first used; you don't need to declare variables.
a++
Let's unpack it:
- the variable a is created
- using ++ treats it as a number: a is initialized to 0
- the ++ operator increments it
It's even more useful with hashes:
things[$1]++
- things is created, as a hash
- using dynamic key $1, a value is initialized to 0 (implicit in ++ use)
- the ++ operator increments it
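Put to work, that one statement is most of a frequency counter. A sketch with hypothetical input, counting how often each value appears in column 1:

```shell
# Count how many times each value appears in column 1.
# The for-in loop visits keys in an unspecified order, so sort the output.
counts=$(printf 'GET /a\nPOST /b\nGET /c\n' |
  awk '{ things[$1]++ } END { for (key in things) print key, things[key] }' |
  sort)
echo "$counts"
```

The END block runs once after the last input line, which is where the accumulated counts get printed.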
Convenience: built-in functions
AWK has a bunch of numeric and
string functions at your disposal.
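A small sampler, assuming nothing beyond the standard functions toupper, length, substr, sqrt, and int. The input line is made up:

```shell
# A few string and numeric built-ins applied to one input line.
builtins=$(printf 'hello 16\n' |
  awk '{ print toupper($1), length($1), substr($1, 1, 3), sqrt($2), int(7 / 2) }')
echo "$builtins"
```

Note that substr counts characters from 1, not 0.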
AWK is PERFECT*
AWK is PERFECT when you use it for what it's meant to do:
- very powerful one-liners
- (or) short AND portable scripts
Now What?
Maybe I've convinced you to reconsider AWK: good.
How do you learn AWK?
There are many possibilities:
In my next post, I'll explain everything you need to get you started with AWK.