Grep Across Multiple Lines

Philip Wilkinson

Software Engineer, Amazon

Published: 2/1/2024

Quick Reference

Command	Explanation
$ grep -Pzo '(?s)from.*to' <file_name>	The simplest way to match across multiple lines with grep: pass -Pzo, then a pattern made of (?s), the first word, .*, and the end word, followed by the file name.
$ ggrep -Pzo '(?s)from.*to' <file_name>	Where the installed grep does not support -P (for example, the default grep on macOS), use ggrep after installing it with brew install grep.
$ pcre2grep -M 'from(\n|.)*to' <file_name>	pcre2grep with the -M flag allows a pattern to match across more than one line.

Multiple matches with grep

To match several words on the same line, the grep command takes the form:

$ grep 'from.*to' <file_name>

This uses regular expression syntax to match lines that contain from followed, anywhere later on the same line, by to. Here . matches any single character, while * repeats it zero or more times and matches as many characters as possible.
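As a minimal sketch of single-line matching, the snippet below writes a small sample file and searches it for scheduled followed by complete. The file contents and the variable names sample and out are made up for illustration.

```shell
# Create a small sample file (hypothetical contents).
sample=$(mktemp)
printf '%s\n' \
  'job scheduled and complete' \
  'job scheduled' \
  'job complete' > "$sample"

# Only the first line contains both words, in that order, on one line.
out=$(grep 'scheduled.*complete' "$sample")
printf '%s\n' "$out"

rm -f "$sample"
```

Only the first line is printed, because the other two lines each contain just one of the two words.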

Using grep -P or ggrep -P to grep multiple lines

To match across multiple lines with grep, the command becomes more complicated:

# if your machine supports grep -P
$ grep -Pzo '(?s)from.*to' <file_name>
# using ggrep instead
$ ggrep -Pzo '(?s)from.*to' <file_name>

If your machine does not support grep -P (as with the default grep on macOS), you can install GNU grep from homebrew-core using brew:

$ brew install grep

GNU grep will then be available as ggrep.

The parameters for this are:

-P uses Perl-compatible regular expressions (PCRE).
-z treats the input as a set of lines, each terminated by a zero byte instead of a newline. For a text file that contains no zero bytes, this makes grep treat the whole file as a single line rather than as multiple lines.
-o prints only the matching text; otherwise the entire file would be printed. The complication is that each match is followed by a trailing zero byte, which can cause problems in later processing.
(?s) activates PCRE_DOTALL, which means that "." matches any character, including a newline.
.* matches everything, including newlines, up until to, because (?s) has been added to the regular expression.
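One way to deal with the trailing zero byte is to pipe the output through tr. The sketch below assumes a GNU grep with -P support (substitute ggrep where needed); the sample file contents and variable names are hypothetical.

```shell
# Create a sample file where "from" and "to" sit on different lines.
sample=$(mktemp)
printf '%s\n' \
  'alpha from here' \
  'middle line' \
  'then to there' \
  'omega' > "$sample"

# -z reads the whole file as one record, -o prints only the match.
# The match ends with a zero byte, which tr turns into a newline.
out=$(grep -Pzo '(?s)from.*to' "$sample" | tr '\0' '\n')
printf '%s\n' "$out"

rm -f "$sample"
```

The printed match starts at from on the first line and ends at to on the third line, spanning the line in between.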

If you simply want to print the names of files that contain a match for the regular expression, replace the -o flag with -l, which lists all matching file names.
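A short sketch of the -l variant, assuming a GNU grep with -P support. The directory, the file names match.txt and nomatch.txt, and their contents are invented for the example.

```shell
# Two hypothetical files: only one contains "from" ... "to" across lines.
dir=$(mktemp -d)
printf '%s\n' 'from here' 'to there' > "$dir/match.txt"
printf '%s\n' 'nothing relevant' > "$dir/nomatch.txt"

# -l prints the names of files with a match instead of the matched text.
out=$(cd "$dir" && grep -Pzl '(?s)from.*to' match.txt nomatch.txt)
printf '%s\n' "$out"

rm -rf "$dir"
```

Only match.txt is listed, since nomatch.txt never contains the starting word.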

Grep for single line to the final word in another line

$ grep -Pzo '(?s)success.*failure' process_output.txt
# or
$ ggrep -Pzo '(?s)success.*failure' process_output.txt

Grep for start of line containing multiple instances of the same word to the end of a line containing multiple instances of the same word

$ grep -Pzo '(?s)scheduled.*complete' process_output.txt
# or
$ ggrep -Pzo '(?s)scheduled.*complete' process_output.txt

Grep for word at the end of one line to the final word in another line

$ grep -Pzo '(?s)failure.*complete' process_output.txt
# or
$ ggrep -Pzo '(?s)failure.*complete' process_output.txt
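The three commands above can be tried against a small stand-in for process_output.txt. The contents below are hypothetical, as is the helper name multiline_grep, and the sketch assumes a GNU grep with -P support (use ggrep in its place where needed).

```shell
# Build a hypothetical process_output.txt in a scratch directory.
workdir=$(mktemp -d)
log="$workdir/process_output.txt"
printf '%s\n' \
  'job A scheduled' \
  'job A success' \
  'job B scheduled' \
  'job B failure' \
  'job A complete' \
  'job B complete' > "$log"

# Helper (hypothetical name): run a multiline match and turn the
# trailing zero byte produced with -z into a newline.
multiline_grep() {
  grep -Pzo "$1" "$2" | tr '\0' '\n'
}

first=$(multiline_grep '(?s)success.*failure' "$log")
second=$(multiline_grep '(?s)scheduled.*complete' "$log")
third=$(multiline_grep '(?s)failure.*complete' "$log")

# Print the three results separated by a marker line.
printf '%s\n---\n%s\n---\n%s\n' "$first" "$second" "$third"

rm -rf "$workdir"
```

With this sample, the first match runs from success on line 2 to failure on line 4, the second runs from the first scheduled to the last complete and so covers almost the whole file, and the third runs from failure on line 4 to the last complete on line 6.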

Using pcre2grep to grep multiple lines

An alternative is the pcre2grep utility, which simplifies the command through its -M flag:

$ pcre2grep -M 'from(\n|.)*to' <file_name>

The -M (or --multiline) flag allows patterns to match more than one line. pcre2grep has built-in support for Perl-compatible regular expressions and may already be installed on your system alongside grep. Otherwise, it can be installed using your package manager.

You can also use the (?s) trick from before to turn on PCRE_DOTALL and make the dot character match newlines as well, which simplifies the command to:

$ pcre2grep -M '(?s)from.*to' <file_name>

Common “gotchas” when using grep across multiple lines

grep will use the first and last instances of the words

When using grep across multiple lines, be aware that the match starts at the first instance of the from word and extends to the last instance of the to word, because .* is greedy. This may differ from the output you expected, especially when from or to appears more than once in your document. By contrast, tools such as awk or sed start from the first line containing from and finish at the first following line containing to.
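If the shortest span is wanted instead, PCRE offers the lazy quantifier .*?, which stops at the nearest to rather than the last one. The comparison below is a sketch on a made-up file and assumes a GNU grep with -P support; the variable names are hypothetical.

```shell
# Sample file with two "from" ... "to" pairs (hypothetical contents).
sample=$(mktemp)
printf '%s\n' \
  'from one' \
  'to two' \
  'from three' \
  'to four' > "$sample"

# Greedy: one match, from the first "from" to the last "to".
greedy=$(grep -Pzo '(?s)from.*to' "$sample" | tr '\0' '\n')

# Lazy: each "from" is paired with the nearest following "to",
# so -o reports two separate matches.
lazy=$(grep -Pzo '(?s)from.*?to' "$sample" | tr '\0' '\n')

printf 'greedy:\n%s\n\nlazy:\n%s\n' "$greedy" "$lazy"

rm -f "$sample"
```

The greedy pattern returns a single block covering both pairs, while the lazy pattern returns the two pairs as separate matches.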

grep uses regex standards

It is important to know that the pattern following the grep command is matched against the document according to the rules of regular expressions. This means that simply typing fail will also match failure. To match only whole words when matching across multiple lines, you can use word-boundary anchors such as \b. For example:

$ grep -Pzo '(?s)\bfail\b.*\n.*\bsuccess\b' <file_name>
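To see the difference the word boundaries make, the sketch below runs a loose and a strict pattern over the same made-up file. It assumes a GNU grep with -P support, and the names loose and strict are illustrative only.

```shell
# Sample file where "failure" appears before the exact word "fail".
sample=$(mktemp)
printf '%s\n' \
  'step 1: failure' \
  'step 2: fail' \
  'step 3: success' > "$sample"

# Without \b, the match begins inside "failure" on the first line.
loose=$(grep -Pzo '(?s)fail.*success' "$sample" | tr '\0' '\n')

# With \b, only the standalone word "fail" can start the match.
strict=$(grep -Pzo '(?s)\bfail\b.*\n.*\bsuccess\b' "$sample" | tr '\0' '\n')

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"

rm -f "$sample"
```

The loose pattern starts at failure on the first line, whereas the strict pattern skips it and starts at fail on the second line.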

grep is case sensitive

grep is also case sensitive by default, but you can use the -i flag to ignore case.
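The -i flag combines with the multiline flags in the usual way. A small sketch, assuming a GNU grep with -P support and a hypothetical sample file:

```shell
# Sample file using different capitalisation from the pattern.
sample=$(mktemp)
printf '%s\n' 'FROM here' 'To there' > "$sample"

# Case-sensitive search finds nothing, so this variable stays empty.
sensitive=$(grep -Pzo '(?s)from.*to' "$sample" | tr '\0' '\n')

# Adding -i lets "from" match "FROM" and "to" match "To".
insensitive=$(grep -Pzoi '(?s)from.*to' "$sample" | tr '\0' '\n')

printf 'sensitive: [%s]\ninsensitive: [%s]\n' "$sensitive" "$insensitive"

rm -f "$sample"
```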

Find out more about grep

As always, if you want to find out more about how to use grep, you can use:

$ man grep

Which will print out all the options with explanations. Or:

$ grep --help

Which will print out a short page of all the available options.

Alternative tools

Alternatively, tools such as awk and sed can make this command much simpler to implement. For awk the command would be:

$ awk '/from/,/to/' <file_name>

where from is the first word or regular expression you are searching for and to is the final word you are looking for.

In sed the command is similar and takes the form:

$ sed -n '/from/,/to/p' <file_name>

As with the prior example, from is the first word or regular expression and to is the final word or regular expression you are looking for.
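Both range commands can be compared on the same input. The sketch below uses a made-up file and hypothetical variable names; note that awk and sed print whole lines rather than only the matched text, and stop at the first line that matches to.

```shell
# Sample file with one "from" line and two "to" lines (hypothetical).
sample=$(mktemp)
printf '%s\n' \
  'intro' \
  'from one' \
  'middle' \
  'to two' \
  'to three' \
  'outro' > "$sample"

# Print every line from the first /from/ line to the next /to/ line.
awk_out=$(awk '/from/,/to/' "$sample")
sed_out=$(sed -n '/from/,/to/p' "$sample")

printf 'awk:\n%s\n\nsed:\n%s\n' "$awk_out" "$sed_out"

rm -f "$sample"
```

Both tools print the three lines from from one through to two and leave out to three, which the greedy grep pattern would have included.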
