grep, beyond the basics

What is grep?

grep is the workhorse text-search tool on Unix-like systems. In day-to-day DevOps and platform work it's everywhere — pulling lines out of logs, filtering output, hunting for the one error in a 50 MB file. This post covers the basics quickly and then digs into the flags and patterns I reach for most often.

A bit of history

grep first showed up in early-1970s Unix. The name comes from the ed editor command g/re/p — "globally search a regular expression and print." It's been polished steadily since, and modern grep is fast enough to chew through gigabytes without breaking a sweat.

The basics

grep prints lines that match a pattern. The flags I use most often:

-i — case-insensitive
-v — invert match (show non-matching lines)
-c — count matching lines
-n — show line numbers
-r — recursive (search a directory tree)

Examples:

```bash
# case-insensitive search
grep -i "GET" /var/log/nginx/access.log
# 192.168.1.1 - - [10/Oct/2023:14:32:10 +0000] "GET /index.html" 200 ...

# everything that's NOT a GET
grep -v "GET" /var/log/nginx/access.log
# 192.168.1.1 - - [10/Oct/2023:14:32:15 +0000] "POST /api/v1/data" 201 ...

# how many GETs in this file?
grep -c "GET" /var/log/nginx/access.log
# 124
```
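To try these flags without a real nginx log, here is a small sketch that writes three made-up access-log lines to a temp file (created with mktemp) and runs each flag against it. It also shows -n, which the examples above don't use. Note the third line: with -i, "GET" also matches the "get" inside /budget, which is the kind of false positive case-insensitive search can introduce.

```bash
# Three hypothetical access-log lines, written to a throwaway temp file
log=$(mktemp)
trap 'rm -f "$log"' EXIT
printf '%s\n' \
  '10.0.0.1 - - [10/Oct/2023:14:32:10 +0000] "GET /index.html HTTP/1.1" 200 512' \
  '10.0.0.2 - - [10/Oct/2023:14:32:15 +0000] "POST /api/v1/data HTTP/1.1" 201 64' \
  '10.0.0.3 - - [10/Oct/2023:14:32:20 +0000] "POST /budget/report HTTP/1.1" 404 0' > "$log"

grep -c "GET" "$log"     # 1: matching is case-sensitive by default
grep -ic "GET" "$log"    # 2: -i also matches the "get" inside /budget
grep -vc "GET" "$log"    # 2: counts the lines that do NOT contain "GET"
grep -n "POST" "$log"    # each matching line is prefixed with its line number (2: and 3:)
```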

Regex: where grep gets useful

By default grep uses BRE (basic regular expressions). Pass -E for ERE (extended regular expressions): alternation (|), grouping with (), and the +, ? and {} quantifiers all work without backslashes. I almost always use -E.

```bash
# lines starting with a specific IP (anchored)
grep "^78\.172\.216\.215" /var/log/nginx/access.log
# 78.172.216.215 - - [10/Oct/2023:14:32:10 +0000] "GET /index.html" 200 ...

# pull 404s (note: a response size of exactly 404 bytes would match too)
grep " 404 " /var/log/nginx/access.log
# 78.172.216.215 - - [10/Oct/2023:14:32:12 +0000] "GET /non-existent-file.html" 404 ...

# combined: lines starting with "client" that contain "error"
grep "^client.*error" /var/log/nginx/error.log
```
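To make the BRE/ERE difference concrete, here is a sketch that runs the same two searches both ways against a few made-up log lines in a temp file. Both forms select the same lines; the ERE version just drops the backslashes. One caveat: \| in a BRE is a GNU extension, so the BRE alternation line assumes GNU grep.

```bash
# Four hypothetical request lines: method in quotes, status at the end
log=$(mktemp)
trap 'rm -f "$log"' EXIT
printf '%s\n' \
  '10.0.0.1 "GET /a" 200' \
  '10.0.0.2 "POST /b" 201' \
  '10.0.0.3 "DELETE /c" 204' \
  '10.0.0.4 "GET /d" 404' > "$log"

# Alternation: GET or POST (lines 1, 2 and 4)
grep '"\(GET\|POST\) ' "$log"    # BRE: grouping and | need backslashes (\| is GNU-only)
grep -E '"(GET|POST) ' "$log"    # ERE: same result, no backslashes

# Interval: a 2xx status at the end of the line (lines 1, 2 and 3)
grep ' 2[0-9]\{2\}$' "$log"      # BRE interval uses \{ \}
grep -E ' 2[0-9]{2}$' "$log"     # ERE interval uses plain { }
```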

Show context around matches

When you're debugging, the matching line alone is rarely enough. -A, -B, -C print lines after, before, or surrounding the match.

```bash
# 3 lines after each match
grep -A 3 "error" /var/log/nginx/error.log

# 3 lines before
grep -B 3 "error" /var/log/nginx/error.log

# 3 before and 3 after
grep -C 3 "error" /var/log/nginx/error.log
```
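A quick way to see how context output is laid out: in this sketch (made-up lines in a temp file), two matches whose context windows don't touch are printed as separate groups, divided by a -- line. GNU grep also has --no-group-separator if that line gets in the way of further processing.

```bash
# Hypothetical log excerpt: two "error" lines far enough apart
# that their one-line context windows don't overlap
log=$(mktemp)
trap 'rm -f "$log"' EXIT
printf '%s\n' 'start' 'error: disk full' 'retrying' 'ok' 'ok' 'ok' 'error: timeout' 'giving up' > "$log"

# Prints: "error: disk full", "retrying", "--", "error: timeout", "giving up"
grep -A 1 "error" "$log"
```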

Print only the matched substring

-o returns just the part that matched, not the whole line. Combined with sort | uniq -c it's a quick way to see distinct values and their counts.

```bash
# every "error" occurrence on its own line
grep -o "error" /var/log/nginx/error.log
```
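Here is the -o | sort | uniq -c combination applied to log levels. The three nginx-style error-log lines are invented and written to a temp file; -oE pulls out just the bracketed level, sort puts identical values next to each other, and uniq -c counts each run.

```bash
# Three hypothetical error-log lines with bracketed severity levels
log=$(mktemp)
trap 'rm -f "$log"' EXIT
printf '%s\n' \
  '2023/10/10 14:32:10 [error] 101#0: *1 open() failed' \
  '2023/10/10 14:32:11 [warn] 101#0: *2 upstream response buffered' \
  '2023/10/10 14:32:12 [error] 101#0: *3 connect() failed' > "$log"

# -o keeps only the matched level; sort groups duplicates so uniq -c can count them
grep -oE '\[(error|warn|crit)\]' "$log" | sort | uniq -c
```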

Working with files

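```bash
# which files contain a match
grep -l "error" /var/log/nginx/*.log
# /var/log/nginx/error.log

# include / exclude file patterns: *.log* also picks up rotated logs
# (access.log.1, access.log.2.gz), and --exclude then skips the compressed ones
grep --exclude="*.gz" "error" /var/log/nginx/*.log*
grep -r "TODO" --include="*.py" .
```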
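A sketch of -r, --include and -l together against a throwaway directory (created with mktemp -d; the file names and contents are hypothetical). It also shows -L, the inverse of -l, which lists the files that contain no match.

```bash
# Hypothetical tree: two Python files and one text file
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir -p "$dir/src"
printf 'x = 1  # TODO: rename\n' > "$dir/src/app.py"
printf 'TODO: write docs\n'      > "$dir/README.txt"
printf 'y = 2\n'                 > "$dir/src/util.py"

# -r walks the tree, --include limits it to *.py, -l prints only file names
grep -rl "TODO" --include="*.py" "$dir"    # .../src/app.py

# -L lists the *.py files with no TODO at all
grep -rL "TODO" --include="*.py" "$dir"    # .../src/util.py
```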

Real-world log analysis

Find unique IPs and hit counts

The classic grep | sort | uniq -c pipeline: pull every IPv4 address out of the log, sort them, and count duplicates (uniq only collapses adjacent identical lines, which is why sort comes first):

```bash
grep -oE "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" /var/log/nginx/access.log \
  | sort | uniq -c | sort -rn | head
```

```text
     28 103.154.125.98
     19 103.0.0.0
     16 104.209.133.168
      8 102.0.0.0
      4 101.200.46.19
      ...
```

(The trailing sort -rn | head gives you the top talkers.)
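One caveat with that pipeline: -o extracts every IPv4-looking string on the line, so an address that appears in a request path or query string is counted too. If the client address is the first field (as in nginx's default combined format), anchoring with ^ counts only that field. A sketch with two made-up lines; `[0-9]{1,3}(\.[0-9]{1,3}){3}` is just a shorter spelling of the same pattern, and like the original it would also accept impossible octets such as 999.

```bash
# Two hypothetical access-log lines; the second has an IP inside its query string
log=$(mktemp)
trap 'rm -f "$log"' EXIT
printf '%s\n' \
  '10.0.0.1 - - [10/Oct/2023:14:32:10 +0000] "GET /index.html HTTP/1.1" 200 512' \
  '10.0.0.2 - - [10/Oct/2023:14:32:11 +0000] "GET /lookup?ip=10.0.0.1 HTTP/1.1" 200 64' > "$log"

ipv4='[0-9]{1,3}(\.[0-9]{1,3}){3}'

# Unanchored: 10.0.0.1 is counted twice (once as a client, once from the query string)
grep -oE "$ipv4" "$log" | sort | uniq -c | sort -rn

# Anchored with ^: only the client address at the start of each line is counted
grep -oE "^$ipv4" "$log" | sort | uniq -c | sort -rn
```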

Match multiple patterns

-E with alternation, or pass -e multiple times:

```bash
grep -E "error|fail" /var/log/nginx/error.log
grep -e "error" -e "fail" /var/log/nginx/error.log
```
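A third option is -f, which reads patterns from a file (one per line) and treats them as alternatives, just like repeated -e. This is handy once the list gets long. The sketch below uses made-up log lines and temp files from mktemp; all three forms select the same two lines.

```bash
# Hypothetical log lines plus a pattern file, both in temp files
log=$(mktemp)
patterns=$(mktemp)
trap 'rm -f "$log" "$patterns"' EXIT
printf '%s\n' 'upstream timed out' 'connect() failed' 'worker error: exit 1' > "$log"

# One pattern per line; a line matches if ANY pattern matches
printf '%s\n' 'error' 'fail' > "$patterns"

grep -E "error|fail" "$log"         # ERE alternation
grep -e "error" -e "fail" "$log"    # repeated -e
grep -f "$patterns" "$log"          # patterns read from a file
```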

Use it through SSH

Grep against a remote log without copying it locally first:

```bash
ssh user@remote-server "grep 'error' /var/log/nginx/error.log"
```

In a shell script

```bash
#!/bin/bash
# -c counts matching lines (not individual occurrences); -i ignores case
errors=$(grep -ic "error" /var/log/nginx/error.log)
echo "Total error count: $errors"
```
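In scripts it's often grep's exit status, not its output, that matters: 0 when a line matched, 1 when none did, and 2 on an error such as a missing or unreadable file. A sketch (the log line is made up and written to a temp file) using -q for a yes/no check, plus one gotcha with the counting pattern above: when nothing matches, grep -c still prints 0 but exits 1, which aborts a script running under set -e unless you add || true.

```bash
#!/bin/bash
set -e

# One hypothetical log line that contains no "error"
log=$(mktemp)
trap 'rm -f "$log"' EXIT
printf '%s\n' '2023/10/10 14:32:10 [warn] 101#0: *2 slow upstream' > "$log"

# -q prints nothing and stops at the first match; only the exit status is used
if grep -qi "error" "$log"; then
  echo "errors found"
else
  echo "no errors"
fi

# No match: grep -c prints 0 but exits 1; "|| true" keeps set -e from aborting here
errors=$(grep -ic "error" "$log" || true)
echo "Total error count: $errors"
```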

Common pitfalls

No output: pattern didn't match. Double-check the file path and the regex (try -i to rule out case sensitivity, or -F if you actually want a literal string and your pattern has metacharacters).
Permission denied: log files are often readable only by root. Run sudo grep ..., or use journalctl for systemd-managed services.
grep matching itself in ps output: classic gotcha — ps aux | grep nginx finds your grep nginx process too. Either pipe through grep -v grep or do pgrep nginx.
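To make the -F tip concrete: in a regex, . matches any character and [...] is a bracket expression, so patterns copied straight from logs (IP addresses, bracketed tags) can match more than intended. A sketch with made-up lines in a temp file:

```bash
# Hypothetical lines: a real IP, a look-alike, and a bracketed tag
log=$(mktemp)
trap 'rm -f "$log"' EXIT
printf '%s\n' 'client 10.0.0.1 connected' 'client 10a0b0c1 connected' 'price: $5 [approx]' > "$log"

grep "10.0.0.1" "$log"       # "." matches any character, so both client lines match
grep -F "10.0.0.1" "$log"    # -F: fixed string, only the real IP matches

grep '[approx]' "$log"       # bracket expression: any line containing a, p, r, o or x (all three)
grep -F '[approx]' "$log"    # -F: only the line with the literal "[approx]"
```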

Modern alternatives worth knowing

grep is everywhere, but for everyday code search:

ripgrep (rg): usually much faster on large source trees, recursive by default, and it respects .gitignore and skips hidden and binary files out of the box.
ack, ag (the silver searcher): older but still around.

For one-off log analysis on a server you SSH'd into, grep is still king because it's there.
