grep, beyond the basics

What is grep?

grep is the workhorse text-search tool on Unix-like systems. In day-to-day DevOps and platform work it's everywhere — pulling lines out of logs, filtering output, hunting for the one error in a 50 MB file. This post covers the basics quickly and then digs into the flags and patterns I reach for most often.

A bit of history

grep first showed up in early-1970s Unix. The name comes from the ed editor command g/re/p — "globally search a regular expression and print." It's been polished steadily since, and modern grep is fast enough to chew through gigabytes without breaking a sweat.

The basics

grep prints lines that match a pattern. The flags I use most often:

Examples:

bash
# case-insensitive search
grep -i "GET" /var/log/nginx/access.log
# 192.168.1.1 - - [10/Oct/2023:14:32:10 +0000] "GET /index.html" 200 ...
# everything that's NOT a GET
grep -v "GET" /var/log/nginx/access.log
# 192.168.1.1 - - [10/Oct/2023:14:32:15 +0000] "POST /api/v1/data" 201 ...
# how many GETs in this file?
grep -c "GET" /var/log/nginx/access.log
# 124
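
One detail that is easy to miss: -c counts matching lines, not individual matches. Here is a minimal sketch against a throwaway file (the contents are invented for the demo) that compares the flags above. The `grep -o ... | wc -l` form is one common way to count occurrences instead.

```shell
# Throwaway sample data; the contents are made up for this demo.
tmp=$(mktemp)
printf '%s\n' 'GET /a then GET /b' 'POST /c' 'get /d' > "$tmp"

# Lines containing an uppercase GET (the first line only).
lines=$(grep -c 'GET' "$tmp")

# Same count, ignoring case (now 'get /d' matches too).
lines_i=$(grep -ic 'GET' "$tmp")

# Lines that do NOT contain an uppercase GET.
others=$(grep -vc 'GET' "$tmp")

# Individual occurrences: -o prints each match on its own line.
occurrences=$(grep -o 'GET' "$tmp" | wc -l | tr -d ' ')

echo "lines=$lines lines_i=$lines_i others=$others occurrences=$occurrences"
rm -f "$tmp"
```

The first line holds two GETs, so the line count and the occurrence count disagree.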

Regex: where grep gets useful

By default grep uses BRE (basic regex). Pass -E for ERE (extended) — alternation, +, ?, {} work without backslashes. I almost always use -E.

bash
# lines starting with a specific IP (anchored)
grep "^78\.172\.216\.215" /var/log/nginx/access.log
# 78.172.216.215 - - [10/Oct/2023:14:32:10 +0000] "GET /index.html" 200 ...
# pull 404s
grep " 404 " /var/log/nginx/access.log
# 78.172.216.215 - - [10/Oct/2023:14:32:12 +0000] "GET /non-existent-file.html" 404 ...
# combined: lines starting with "client" that contain "error"
grep "^client.*error" /var/log/nginx/error.log
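
To make the BRE/ERE difference concrete, here is a small sketch. The sample file and its key names are invented for illustration. In BRE an interval has to be written `\{3\}`; with -E the braces work bare, and alternation with `|` becomes available.

```shell
# Throwaway sample data; the keys and values are made up for this demo.
tmp=$(mktemp)
printf '%s\n' 'status=200' 'status=404' 'status=5' 'retries=12' > "$tmp"

# BRE (the default): interval braces need backslashes.
bre=$(grep 'status=[0-9]\{3\}' "$tmp")

# ERE (-E): the same interval without backslashes.
ere=$(grep -E 'status=[0-9]{3}' "$tmp")

# ERE alternation and +: either key followed by one or more digits.
alt=$(grep -cE '^(status|retries)=[0-9]+$' "$tmp")

echo "$bre"
echo "$ere"
echo "$alt"
rm -f "$tmp"
```

Both of the first two commands print the two three-digit status lines; the alternation matches all four lines.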

Show context around matches

When you're debugging, the matching line alone is rarely enough. -A, -B, -C print lines after, before, or surrounding the match.

bash
# 3 lines after each match
grep -A 3 "error" /var/log/nginx/error.log
# 3 lines before
grep -B 3 "error" /var/log/nginx/error.log
# 3 before and 3 after
grep -C 3 "error" /var/log/nginx/error.log
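
Context output is easier to read with line numbers. A quick sketch on a made-up five-line file: adding -n (not used elsewhere in this post) prefixes each line with its number, using ":" for the matching line and "-" for context lines.

```shell
# Throwaway sample data; the log lines are made up for this demo.
tmp=$(mktemp)
printf '%s\n' 'line 1' 'line 2' 'error: upstream timed out' 'line 4' 'line 5' > "$tmp"

# One line of context on each side, with line numbers.
ctx=$(grep -n -C 1 'error' "$tmp")
echo "$ctx"
# 2-line 2
# 3:error: upstream timed out
# 4-line 4
rm -f "$tmp"
```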

-o returns just the part that matched, not the whole line. Combined with sort | uniq -c it's a quick way to see distinct values and their counts.

bash
# every "error" occurrence on its own line
grep -o "error" /var/log/nginx/error.log
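
Here is a sketch of the `-o` plus `sort | uniq -c` combination, counting status codes in a made-up, simplified access log. It assumes the status code is the last field on each line, which is not true of the real nginx format. The final awk only trims the padding that uniq -c puts in front of each count.

```shell
# Throwaway sample data; a simplified, invented log format with the status last.
tmp=$(mktemp)
printf '%s\n' \
  '"GET /index.html" 200' \
  '"GET /missing" 404' \
  '"GET /about.html" 200' \
  '"POST /api/v1/data" 201' \
  '"GET /also-missing" 404' \
  '"GET /index.html" 200' > "$tmp"

# -o keeps only the three digits at the end of each line;
# sort groups duplicates so uniq -c can count them.
codes=$(grep -oE '[0-9]{3}$' "$tmp" | sort | uniq -c | sort -rn | awk '{print $1, $2}')
echo "$codes"
# 3 200
# 2 404
# 1 201
rm -f "$tmp"
```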

Working with files

bash
# which files contain a match
grep -l "error" /var/log/nginx/*.log
# /var/log/nginx/error.log
# include / exclude file patterns
grep "error" /var/log/nginx/*.log --exclude="*.gz"
grep -r "TODO" --include="*.py" .
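
The file flags combine well. A minimal sketch in a scratch directory (the file names and contents are invented) that lists only the Python files containing a TODO. Note that -r, --include and --exclude are extensions found in GNU and BSD grep rather than POSIX features, so this assumes one of those implementations.

```shell
# Scratch directory with made-up files for this demo.
dir=$(mktemp -d)
mkdir -p "$dir/src"
printf '# TODO: handle retries\n' > "$dir/src/app.py"
printf 'print("done")\n' > "$dir/src/util.py"
printf 'TODO: write docs\n' > "$dir/notes.txt"

# -r recurses, --include limits the search to *.py, -l prints only file names.
hits=$(cd "$dir" && grep -rl "TODO" --include="*.py" .)
echo "$hits"
# ./src/app.py
rm -rf "$dir"
```

notes.txt also contains a TODO but is filtered out by --include; util.py is searched but has no match.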

Real-world log analysis

Find unique IPs and hit counts

The classic grep | sort | uniq -c pipeline. Pull every IPv4 in the log, sort it, count duplicates:

bash
grep -oE "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" /var/log/nginx/access.log \
  | sort | uniq -c | sort -rn | head
bash
     28 103.154.125.98
     19 103.0.0.0
     16 104.209.133.168
      8 102.0.0.0
      4 101.200.46.19
      ...

(The trailing sort -rn | head gives you the top talkers.)
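
One caveat: the unanchored pattern picks up anything shaped like an IPv4 address anywhere on the line, including version strings inside request paths. A sketch of one possible tightening, assuming the client IP is the first field on each line as in the sample lines earlier: anchor the pattern with ^. The log lines here are invented, and neither pattern validates octet ranges (999.999.999.999 would still match).

```shell
# Throwaway sample data; the second line has an IP-shaped version in the path.
tmp=$(mktemp)
printf '%s\n' \
  '10.0.0.1 - - "GET /index.html" 200' \
  '10.0.0.2 - - "GET /releases/1.2.3.4/notes" 200' \
  '10.0.0.1 - - "GET /about.html" 200' > "$tmp"

# Unanchored: counts the three client IPs plus the 1.2.3.4 in the path.
loose=$(grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' "$tmp" | wc -l | tr -d ' ')

# Anchored to the start of the line: client IPs only.
anchored=$(grep -oE '^[0-9]{1,3}(\.[0-9]{1,3}){3}' "$tmp" \
  | sort | uniq -c | sort -rn | awk '{print $1, $2}')

echo "$loose"
echo "$anchored"
rm -f "$tmp"
```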

Match multiple patterns

-E with alternation, or pass -e multiple times:

bash
grep -E "error|fail" /var/log/nginx/error.log
grep -e "error" -e "fail" /var/log/nginx/error.log
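
A quick check on invented sample lines that the two forms select the same lines. Matches come out in file order, not pattern order.

```shell
# Throwaway sample data; the log lines are made up for this demo.
tmp=$(mktemp)
printf '%s\n' 'connect() failed' 'upstream error' 'request ok' > "$tmp"

# One ERE with alternation.
with_E=$(grep -E 'error|fail' "$tmp")

# Two separate patterns; a line matches if any of them matches.
with_e=$(grep -e 'error' -e 'fail' "$tmp")

echo "$with_E"
# connect() failed
# upstream error
rm -f "$tmp"
```

The -e form tends to be handier when the patterns come from variables, since there is no need to join them with | first.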

Use it through SSH

Grep against a remote log without copying it locally first:

bash
ssh user@remote-server "grep 'error' /var/log/nginx/error.log"

In a shell script

bash
#!/bin/bash
# -i ignores case; -c counts matching lines, not individual occurrences
errors=$(grep -ic "error" /var/log/nginx/error.log)
echo "Total error count: $errors"
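
In scripts, grep's exit status matters as much as its output: 0 means at least one line matched, 1 means nothing matched, and anything higher signals an error. A minimal sketch, using a temporary file as a stand-in for a real log, of two patterns that can help: guarding a count so a zero result does not abort a script run under `set -e`, and using -q when only a yes/no answer is needed.

```shell
# Stand-in for a real log file; the contents are made up for this demo.
log=$(mktemp)
printf '%s\n' 'all good' 'still fine' > "$log"

# With no matches, grep -c prints 0 but exits with status 1.
# "|| true" keeps that from stopping a script that uses "set -e".
errors=$(grep -ic "error" "$log" || true)
echo "Lines mentioning error: $errors"

# -q prints nothing; only the exit status is used.
if grep -qi "error" "$log"; then
  status="errors found"
else
  status="clean"
fi
echo "$status"
rm -f "$log"
```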

Common pitfalls

Modern alternatives worth knowing

grep is everywhere, but for everyday code search:

For one-off log analysis on a server you SSH'd into, grep is still king because it's there.

References