awk: A Tiny Programming Language for Text

awk is a small programming language disguised as a Unix command. Its specialty: column-oriented text. If your data has fields separated by spaces or commas — log files, /etc/passwd, CSVs, ps output — awk is faster than writing a Python script.

The mental model

awk reads input line-by-line. Each line is split into fields by whitespace (or whatever you specify). For each line, awk checks patterns and runs actions:

awk 'PATTERN { ACTION }' file

Patterns can be regex, line numbers, or expressions. Actions can be print statements, assignments, or full programs.
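Either part can be omitted: a pattern alone prints matching lines (the default action is print), and an action alone runs on every line (the default pattern matches everything). A tiny demo with made-up input:

```shell
# Pattern only: the default action prints each matching line
printf 'one\ntwo\nthree\n' | awk '/t/'
# two
# three

# Action only: runs on every line
printf 'one\ntwo\nthree\n' | awk '{print NR": "$0}'
# 1: one
# 2: two
# 3: three
```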

Field variables

  • $0 — entire current line
  • $1, $2, … — first field, second field, etc.
  • NF — number of fields on this line
  • NR — line number (record number)
  • $NF — last field on the line
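All five variables in one throwaway pipeline (the sample line is made up):

```shell
# Feed awk one line and print each built-in variable
echo "alpha beta gamma" | awk '{print "line:", $0; print "first:", $1; print "fields:", NF; print "last:", $NF; print "record:", NR}'
# line: alpha beta gamma
# first: alpha
# fields: 3
# last: gamma
# record: 1
```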

Quick examples

# Print second column of any whitespace-delimited file
awk '{print $2}' file.txt

# Print first AND last column
awk '{print $1, $NF}' file.txt

# Print every line where the third column is "ERROR"
awk '$3 == "ERROR"' log.txt

# Print every line longer than 80 characters
awk 'length($0) > 80' file.txt

# Print line numbers
awk '{print NR, $0}' file.txt

# Skip header line
awk 'NR > 1' file.csv

Custom field separator

# For CSV (comma separated)
awk -F',' '{print $1, $3}' data.csv

# For /etc/passwd (colon separated)
awk -F':' '{print $1, $7}' /etc/passwd
# alice /bin/bash
# bob /bin/zsh

# For multiple delimiters (space OR comma OR tab)
awk -F'[ ,\t]+' '{print $1}' messy.txt

BEGIN and END blocks

Run code before/after the main loop:

# Sum the third column of a CSV
awk -F',' 'BEGIN {sum=0} {sum+=$3} END {print "Total:", sum}' data.csv

# Average
awk '{sum+=$1; count++} END {print sum/count}' numbers.txt

# Print header before processing
awk 'BEGIN {print "Name | Shell"} {print $1, "|", $7}' /etc/passwd

Conditionals and loops

# If/else
awk '{ if ($3 > 100) print $1, "BIG"; else print $1, "small" }' data.txt

# for loop (sum columns)
awk '{ for (i=1; i<=NF; i++) total += $i } END {print total}' numbers.txt

# Multiple patterns
awk '/error/ {errs++} /warning/ {warns++} END {print errs, "errors,", warns, "warnings"}' log.txt

Real-world one-liners

# Top 10 IPs by request count in nginx log
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -10

# Average response time from nginx access log (last field)
awk '{sum+=$NF; n++} END {print "avg:", sum/n}' access.log

# Find all users using bash
awk -F: '$7 == "/bin/bash" {print $1}' /etc/passwd

# Count files per extension in current dir
ls | awk -F. '{print $NF}' | sort | uniq -c | sort -rn

# Convert space-separated to TSV
awk '{$1=$1; print}' OFS='\t' file.txt

# Print only odd-numbered lines
awk 'NR%2 == 1' file.txt

# Sum disk usage from "du -sh" output (a hack — du gives K/M/G suffixes)
du -sh */ | awk '{sum+=$1} END {print sum, "total (mixed units, not great)"}'

Built-in functions

length(s)               # length of string
substr(s, start, len)   # substring
index(s, t)             # position of t in s
split(s, arr, sep)      # split string into array
toupper(s) / tolower(s) # case conversion
gsub(/pat/, repl, s)    # global substitute in s
printf "%-20s %d\n"     # formatted print (like C)
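A few of these exercised on a sample string (the input is made up; note that gsub operates on $0 when you omit the third argument):

```shell
echo "hello world" | awk '{
    print length($0)            # 11
    print substr($0, 1, 5)      # hello
    print index($0, "world")    # 7
    print toupper($1)           # HELLO
    printf "%-8s|%d\n", $1, NF  # hello   |2
    n = gsub(/o/, "0")          # no third arg: substitutes in $0
    print n, $0                 # 2 hell0 w0rld
}'
```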

Common mistakes

  • Forgetting to quote the awk program: always wrap it in single quotes, or the shell expands $1 before awk ever sees it.
  • Mixing up $0 (line) with $1 (first field).
  • Trying to do everything in awk when sed or grep would be cleaner.
  • Not using BEGIN {FS=","} or -F',' for CSVs and getting confused.
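The quoting mistake in action, on a throwaway sample file (demo.txt is invented for the illustration):

```shell
printf 'a b\nc d\n' > demo.txt

# WRONG: double quotes let the shell expand $1 (usually empty),
# so awk runs '{print }' and prints whole lines
awk "{print $1}" demo.txt

# RIGHT: single quotes pass $1 through to awk untouched
awk '{print $1}' demo.txt
# a
# c
```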

What to learn next

Next up: the remaining musketeers of text processing (sort, uniq, wc, cut). These are smaller utilities that combine perfectly with grep, sed, and awk.
