Grep Multiple Strings

Last Updated 2022-12-29

Grep is a command-line utility useful for many text-based search tasks, including searching for two or more strings or regular expressions in a single command instead of running [.inline-code]grep[.inline-code] several times.

In brief, both of these commands accomplish finding multiple strings:

 # using the or condition
 $ grep 'hello\|fizz' myfile.txt
 # using -e
 $ grep -e 'hello' -e 'fizz' myfile.txt

[#for-two-strings]Full example to grep for two strings[#for-two-strings]

 # first let's seed a test file with some text for us to search for
 $ echo -e "hello world \nfizzbuzz \nNoMatch" > test.txt

 # now let's run our command
 $ grep 'hello\|fizz' test.txt
 hello world
 fizzbuzz

Continuing to use that same [.inline-code]test.txt[.inline-code] file, here’s an example with some simple regular expressions:

 $ grep 'he.*ld\|^fizz.*' test.txt
 hello world
 fizzbuzz

This second example could equivalently be written as:

 $ grep -e 'he.*ld' -e '^fizz.*' test.txt
 hello world
 fizzbuzz

The [.inline-code]-e[.inline-code] option followed by a pattern (or expression) is useful to keep things readable, especially as you go beyond searching for just two patterns.

Recall that, by default, grep is case sensitive. Use [.inline-code]-i[.inline-code] to make grep case insensitive.
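Here’s a minimal sketch of combining [.inline-code]-i[.inline-code] with several [.inline-code]-e[.inline-code] patterns. The file name [.inline-code]sample_case.txt[.inline-code] and its contents are made up for this illustration.

```shell
# Hypothetical sample file, created only for this illustration
printf '%s\n' 'Hello World' 'FIZZbuzz' 'NoMatch' > sample_case.txt

# Case-sensitive search: neither pattern matches because of the capital letters.
# grep exits non-zero when nothing matches, hence the "|| true".
sensitive=$(grep -e 'hello' -e 'fizz' sample_case.txt || true)

# Case-insensitive search: -i applies to every pattern given with -e
insensitive=$(grep -i -e 'hello' -e 'fizz' sample_case.txt)

printf 'case-sensitive matches: [%s]\n' "$sensitive"
printf 'case-insensitive matches:\n%s\n' "$insensitive"
```

The case-sensitive search comes back empty, while the [.inline-code]-i[.inline-code] version returns the first two lines.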

[#searching-for-more-than-two-patterns]Searching for more than two patterns[#searching-for-more-than-two-patterns]

Grep can handle as many patterns as you would like to give it. Just continue adding [.inline-code]\|[.inline-code] between the patterns as in the first two examples, or additional patterns prefaced with [.inline-code]-e[.inline-code] as in the third example. To walk through this, let’s quickly modify our test text file with:

 $ echo -e "hello world \nfizzbuzz \nyay, a match" > test.txt

And now we can run either of the commands below:

 $ grep 'he.*ld\|fizz\|yay' test.txt
 $ grep -e 'he.*ld' -e 'fizz' -e 'yay' test.txt

And get the result we are looking for, with all three lines matched:

 hello world
 fizzbuzz
 yay, a match
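To check that the two forms really are interchangeable, here’s a small sketch that runs both and compares the results. The file [.inline-code]sample_many.txt[.inline-code] is a made-up stand-in for the test file. One portability note: alternation with [.inline-code]\|[.inline-code] in a basic regular expression is a GNU extension rather than part of POSIX, whereas repeated [.inline-code]-e[.inline-code] options are specified by POSIX, so the [.inline-code]-e[.inline-code] form is the safer choice for scripts that may run on other grep implementations.

```shell
# Hypothetical sample file, created only for this illustration
printf '%s\n' 'hello world' 'fizzbuzz' 'yay, a match' 'NoMatch' > sample_many.txt

# Form 1: one pattern string with \| between alternatives (GNU grep BRE extension)
with_alternation=$(grep 'he.*ld\|fizz\|yay' sample_many.txt)

# Form 2: one -e option per pattern
with_e_flags=$(grep -e 'he.*ld' -e 'fizz' -e 'yay' sample_many.txt)

if [ "$with_alternation" = "$with_e_flags" ]; then
  echo "both forms matched the same lines:"
  echo "$with_alternation"
else
  echo "the two forms disagree on this grep implementation"
fi
```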

[#common-gotcha-with-regex]Common gotcha: [.inline-code]grep[.inline-code] with complicated regular expressions[#common-gotcha-with-regex]

If the regular expression you are trying to use isn’t working with grep, it’s possible that you’re using an Extended Regular Expression (ERE). By default, grep interprets patterns as Basic Regular Expressions (BRE). Not to worry: passing the [.inline-code]-E[.inline-code] flag tells grep to treat the pattern as an ERE (note: passing the [.inline-code]-E[.inline-code] flag is the same thing as using the now-deprecated [.inline-code]egrep[.inline-code] tool). You can still search for multiple patterns with the [.inline-code]-E[.inline-code] flag, but instead of separating the patterns with [.inline-code]\|[.inline-code], you use a plain [.inline-code]|[.inline-code] with no [.inline-code]\[.inline-code] in front of it.

 $ grep -E 'he.*ld|(z.*){4}' test.txt
 hello world
 fizzbuzz

The [.inline-code](z.*){4}[.inline-code] pattern above looks for lines with at least four ‘z’s. It doesn’t work without the [.inline-code]-E[.inline-code] flag because in a BRE an unescaped [.inline-code]{[.inline-code] (like an unescaped [.inline-code]([.inline-code]) is treated as a literal character rather than as an operator.
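To make the BRE/ERE difference concrete, here’s a sketch that runs the same search three ways. It assumes GNU grep (for [.inline-code]\|[.inline-code] in BRE mode), and [.inline-code]sample_ere.txt[.inline-code] is a made-up file for this illustration.

```shell
# Hypothetical sample file, created only for this illustration
printf '%s\n' 'hello world' 'fizzbuzz' 'yay, a match' > sample_ere.txt

# ERE: | ( ) { } act as operators without backslashes
ere=$(grep -E 'he.*ld|(z.*){4}' sample_ere.txt)

# BRE (GNU grep): the same operators need a backslash in front of them
bre=$(grep 'he.*ld\|\(z.*\)\{4\}' sample_ere.txt)

# BRE with the unescaped ERE pattern: | ( ) { } are all literal characters,
# so grep looks for text like "ld|(z" and finds nothing
literal=$(grep 'he.*ld|(z.*){4}' sample_ere.txt || true)

printf 'ERE matches:\n%s\n' "$ere"
printf 'escaped BRE matches:\n%s\n' "$bre"
printf 'unescaped pattern in BRE mode: [%s]\n' "$literal"
```

The first two searches return the same two lines, and the third returns nothing, which is the silent failure this gotcha describes.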

There are also Perl-compatible Regular Expressions (PCRE), which we will use in the “Searching Between Two Strings” section. We won’t be getting into the differences between these three syntaxes in this post.

[#patterns-from-a-file][.inline-code]grep[.inline-code] with search patterns drawn from a file[#patterns-from-a-file]

Another way to search for multiple patterns using [.inline-code]grep[.inline-code] is to employ the [.inline-code]-f[.inline-code] flag. This allows you to draw your search keywords from a file instead of supplying them as command line arguments. Let’s run through an example.

First let’s load the patterns we are searching for into a [.inline-code]keywords.txt[.inline-code] file. These can be any basic regular expression.

 $ printf '%s\n' 'he.*ld' '^fizz.*' > keywords.txt

Now let’s run our command on our existing [.inline-code]test.txt[.inline-code] file and see our results!

 $ grep -f keywords.txt test.txt
 hello world
 fizzbuzz
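One thing to watch for with pattern files is a stray blank line. Here’s a small sketch of what happens with GNU grep; the file names ([.inline-code]sample_f.txt[.inline-code], [.inline-code]sample_keywords.txt[.inline-code], [.inline-code]sample_keywords_blank.txt[.inline-code]) are made up for this illustration.

```shell
# Hypothetical sample files, created only for this illustration
printf '%s\n' 'hello world' 'fizzbuzz' 'NoMatch' > sample_f.txt

# One pattern per line: only the first two lines of sample_f.txt match
printf '%s\n' 'he.*ld' '^fizz' > sample_keywords.txt
clean=$(grep -f sample_keywords.txt sample_f.txt)

# Same patterns with an empty line in between: the empty line is an
# empty pattern, and an empty pattern matches every input line
printf '%s\n' 'he.*ld' '' '^fizz' > sample_keywords_blank.txt
with_blank=$(grep -f sample_keywords_blank.txt sample_f.txt)

printf 'clean pattern file:\n%s\n' "$clean"
printf 'pattern file with a blank line:\n%s\n' "$with_blank"
```

If a [.inline-code]grep -f[.inline-code] search suddenly returns the whole file, an empty line in the pattern file is a likely culprit.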

[#between-two-strings-or-patterns][.inline-code]grep[.inline-code] between two strings or patterns[#between-two-strings-or-patterns]

Sometimes what you really need is to search and extract the text between two strings. Let’s say that you have a large log file pertaining to an incident and you need to search for and extract all of the account IDs that were affected. Assuming that the relevant lines of the log file look something like this:

 Time: 2022-12-08 00:25:10.828 UTC AccountID: 1563 Env: prod [...]

We would want to extract the text between [.inline-code]AccountID: [.inline-code] and [.inline-code] Env[.inline-code]. To do this we can use:

 $ grep -oP '(?<=AccountID: )(.*)(?= Env)' log_file.txt

Here we are using a positive lookbehind ([.inline-code]?<=[.inline-code]) to locate the AccountID string, then matching 0 or more of any character within our capturing group, and finally using a positive lookahead ([.inline-code]?=[.inline-code]) to halt our search after the account ID. Note that there is a trailing space after “AccountID:” and a leading space before “Env”.

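You can substitute a regular expression for “ Env” in the lookahead and it will work as expected. The lookbehind is more restricted: PCRE does not allow an unbounded quantifier such as [.inline-code].*[.inline-code] inside a lookbehind, so a pattern like [.inline-code](?<=A.*ID: )[.inline-code] is rejected. To use a variable-length prefix, match it normally and follow it with [.inline-code]\K[.inline-code], which drops everything matched so far from the reported match. As a toy example, this would return the same results:

 $ grep -oP 'A.*ID: \K.*(?= Env)' log_file.txt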

The [.inline-code]-o[.inline-code] flag makes [.inline-code]grep[.inline-code] print only the matching part of each line, while the [.inline-code]-P[.inline-code] flag specifies that you’re passing [.inline-code]grep[.inline-code] a Perl-compatible regular expression (PCRE). Note that the [.inline-code]grep[.inline-code] installed on some operating systems (including the BSD grep that ships with macOS) does not support the [.inline-code]-P[.inline-code] flag, in which case using another tool like [.inline-code]sed[.inline-code] or [.inline-code]awk[.inline-code] would likely be the best alternative. The [.inline-code]sed[.inline-code] command for this would be:

 $ sed -n 's/.*AccountID: \(.*\) Env.*/\1/p' log_file.txt
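Here’s a self-contained sketch of the [.inline-code]sed[.inline-code] approach that should run on both GNU and BSD systems. The log lines in [.inline-code]sample_log.txt[.inline-code] are invented for this illustration, modeled on the sample line above, and the capture group uses [.inline-code][^ ]*[.inline-code] on the assumption that account IDs contain no spaces.

```shell
# Hypothetical log lines, created only for this illustration
printf '%s\n' \
  'Time: 2022-12-08 00:25:10.828 UTC AccountID: 1563 Env: prod' \
  'Time: 2022-12-08 00:25:11.002 UTC healthcheck ok' \
  'Time: 2022-12-08 00:25:12.417 UTC AccountID: 2781 Env: staging' > sample_log.txt

# -n suppresses sed's default printing; the trailing p prints only the
# lines where the substitution succeeded, so the healthcheck line is skipped.
# [^ ]* captures everything up to the next space (assumes IDs have no spaces).
ids=$(sed -n 's/.*AccountID: \([^ ]*\) Env.*/\1/p' sample_log.txt)

echo "$ids"
```

Without [.inline-code]-n[.inline-code] and [.inline-code]p[.inline-code], [.inline-code]sed[.inline-code] would also print the lines that don’t contain an account ID, unchanged.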

[#searching-across-multiple-lines]What about searching across multiple lines?[#searching-across-multiple-lines]

While [.inline-code]grep[.inline-code] is a very powerful tool, if you need to search across multiple lines, you’re likely better off using another command like [.inline-code]awk[.inline-code] or [.inline-code]sed[.inline-code]. Here’s a quick example using [.inline-code]awk[.inline-code] to print every line from the first match of one pattern through the next match of another, using our final [.inline-code]test.txt[.inline-code]:

 $ awk '/he.*ld/,/match/' test.txt
 hello world
 fizzbuzz
 yay, a match
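The same range idea works in [.inline-code]sed[.inline-code]. Here’s a sketch comparing the two on a made-up file, [.inline-code]sample_range.txt[.inline-code], that has extra lines before and after the range so the effect is easier to see.

```shell
# Hypothetical sample file, created only for this illustration
printf '%s\n' 'NoMatch' 'hello world' 'fizzbuzz' 'yay, a match' 'trailing line' > sample_range.txt

# awk: print from the first line matching /he.*ld/ through the next line matching /match/
awk_range=$(awk '/he.*ld/,/match/' sample_range.txt)

# sed: same range; -n plus p prints only the lines inside it
sed_range=$(sed -n '/he.*ld/,/match/p' sample_range.txt)

printf 'awk range:\n%s\n' "$awk_range"
printf 'sed range:\n%s\n' "$sed_range"
```

Both commands print the three lines from “hello world” through “yay, a match” and skip the lines outside the range. Note that the patterns are case sensitive, so “NoMatch” does not start or end the range.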