awk: A Tiny Programming Language for Text
awk is a small programming language disguised as a Unix command. Its specialty: column-oriented text. If your data has fields separated by spaces or commas — log files, /etc/passwd, CSVs, ps output — awk is often faster than writing a Python script.
The mental model
awk reads input line-by-line. Each line is split into fields by whitespace (or whatever you specify). For each line, awk checks patterns and runs actions:
awk 'PATTERN { ACTION }' file
Patterns can be regex, line numbers, or expressions. Actions can be print statements, assignments, or full programs.
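A quick illustration of the three pattern kinds, using a made-up three-line input generated inline:

```shell
# Sample input: three lines, two fields each
input='alpha 1
beta 2
gamma 3'

echo "$input" | awk '/beta/'              # regex pattern: lines matching /beta/
echo "$input" | awk 'NR == 3'             # expression on the record number: line 3 only
echo "$input" | awk '$2 > 1 {print $1}'   # expression + action: field 1 where column 2 > 1
```

With no action, the default action is to print the matching line; with no pattern, the action runs on every line.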
Field variables
$0 — the entire current line
$1, $2, … — first field, second field, etc.
NF — number of fields on this line
NR — line number (record number)
$NF — last field on the line
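A one-liner makes these concrete (the sample line is made up):

```shell
# Four whitespace-separated fields on one line
echo 'jan feb mar apr' | awk '{print "NF=" NF, "first=" $1, "last=" $NF, "line=" NR}'
# NF=4 first=jan last=apr line=1
```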
Quick examples
# Print second column of any whitespace-delimited file
awk '{print $2}' file.txt
# Print first AND last column
awk '{print $1, $NF}' file.txt
# Print every line where the third column is "ERROR"
awk '$3 == "ERROR"' log.txt
# Print every line longer than 80 characters
awk 'length($0) > 80' file.txt
# Print line numbers
awk '{print NR, $0}' file.txt
# Skip header line
awk 'NR > 1' file.csv
Custom field separator
# For CSV (comma separated)
awk -F',' '{print $1, $3}' data.csv
# For /etc/passwd (colon separated)
awk -F':' '{print $1, $7}' /etc/passwd
# alice /bin/bash
# bob /bin/zsh
# For multiple delimiters (space OR comma OR tab)
awk -F'[ ,\t]+' '{print $1}' messy.txt
BEGIN and END blocks
Run code before/after the main loop:
# Sum the third column of a CSV
awk -F',' 'BEGIN {sum=0} {sum+=$3} END {print "Total:", sum}' data.csv
# Average
awk '{sum+=$1; count++} END {print sum/count}' numbers.txt
# Print header before processing
awk 'BEGIN {print "Name | Shell"} {print $1, "|", $7}' /etc/passwd
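A BEGIN block can also set the field separator from inside the program, equivalent to -F (sample CSV inlined):

```shell
# BEGIN {FS=","} behaves the same as -F','
printf 'alice,30,/bin/bash\nbob,25,/bin/zsh\n' | awk 'BEGIN {FS=","} {print $1, $3}'
# alice /bin/bash
# bob /bin/zsh
```

FS must be set in BEGIN (or on the command line) to affect the first line, since fields are split as each record is read.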
Conditionals and loops
# If/else
awk '{ if ($3 > 100) print $1, "BIG"; else print $1, "small" }' data.txt
# for loop (sum columns)
awk '{ for (i=1; i<=NF; i++) total += $i } END {print total}' numbers.txt
# Multiple patterns
awk '/error/ {errs++} /warning/ {warns++} END {print errs, "errors,", warns, "warnings"}' log.txt
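The counters above generalize: awk arrays are associative, so you can count occurrences by arbitrary key in one pass (input inlined here; real logs would come from a file):

```shell
# Count occurrences of each distinct value in column 1
printf 'GET /a\nPOST /b\nGET /c\n' |
  awk '{count[$1]++} END {for (k in count) print count[k], k}' |
  sort -rn   # iteration order of "for (k in ...)" is unspecified, so sort the output
```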
Real-world one-liners
# Top 10 IPs by request count in nginx log
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -10
# Average response time from nginx access log (last field)
awk '{sum+=$NF; n++} END {print "avg:", sum/n}' access.log
# Find all users using bash
awk -F: '$7 == "/bin/bash" {print $1}' /etc/passwd
# Count files per extension in current dir
ls | awk -F. '{print $NF}' | sort | uniq -c | sort -rn
# Convert space-separated to TSV
awk '{$1=$1; print}' OFS='\t' file.txt
# Print only odd-numbered lines
awk 'NR%2 == 1' file.txt
# Sum disk usage from "du -sh" output (a hack — awk reads "1.2M" as just 1.2, ignoring K/M/G suffixes)
du -sh */ | awk '{sum+=$1} END {print sum, "total (mixed units, not great)"}'
Built-in functions
length(s) # length of string
substr(s, start, len) # substring
index(s, t) # position of t in s
split(s, arr, sep) # split string into array
toupper(s) / tolower(s) # case conversion
gsub(/pat/, repl, s) # global substitute in s
printf "%-20s %d\n", s, n # formatted print (like C)
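A few of these in action (all inputs made up):

```shell
echo 'hello world' | awk '{print toupper($1), length($2)}'          # HELLO 5
echo 'a-b-c'       | awk '{n = split($0, p, "-"); print n, p[2]}'   # 3 b
echo 'foo bar foo' | awk '{gsub(/foo/, "baz"); print}'              # baz bar baz
awk 'BEGIN {printf "%-8s|%d\n", "name", 42}'                        # name    |42
```

Note that gsub modifies its target in place (here $0, the default) and returns the number of substitutions made.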
Common mistakes
- Forgetting to quote the awk program — always wrap it in single quotes.
- Mixing up $0 (the whole line) with $1 (the first field).
- Trying to do everything in awk when sed or grep would be cleaner.
- Not using BEGIN {FS=","} or -F',' for CSVs and getting confused by wrong fields.
What to learn next
The fourth musketeer of text processing — sort, uniq, wc, and cut — is up next: smaller utilities that combine perfectly with grep, sed, and awk.