
Geoff Papilion has made a living running infrastructure for the past 15 years. He is currently employed at Wikia.com, scaling the infrastructure to 1.5 billion requests per day. Geoffrey is a DZone MVB, is not an employee of DZone, and has published 26 posts at DZone. You can read more from him at his website.

Fork Less in Bash and See Performance Wins

07.17.2013

If you haven't seen this page, you should take a look: it covers a whole bunch of interesting techniques for manipulating strings in bash. If you work with bash a lot, you'll probably use them often, since they can save a lot of time.

Let's take a pretty typical task: stripping the domain off an email address.

This deliberately inefficient program splits an email address at the first @ and prints the part before it:

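#!/bin/bash
# Deliberately slow: each pass runs a pipeline, which forks a subshell for echo and a new awk process
i=0
while test $i -lt 1000; do
	STRING="foo@bar.com"
	echo "${STRING}" | awk -F@ '{print $1}'   # print the first @-separated field
	i=$(($i+1))
done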

Its counterpart, which does not fork, uses bash's built-in parameter expansion to remove everything from the first @ onward:

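#!/bin/bash
# Same loop, but the split happens inside bash itself, so no new process is started
i=0
while test $i -lt 1000; do
	STRING="foo@bar.com"
	echo "${STRING%%@*}"   # %% strips the longest suffix matching "@*", i.e. everything from the first @ on
	i=$(($i+1))
done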
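The %% in ${STRING%%@*} is one of four bash parameter-expansion operators that trim a glob pattern from one end of a string, all without starting a new process. The sketch below uses made-up sample values to show how the shortest-match (%, #) and longest-match (%%, ##) forms differ:

```shell
#!/bin/bash
# Sample values are hypothetical, chosen to show shortest vs. longest matching.
EMAIL="foo@bar.com"
MULTI="a@b@c"

# ${var%pattern}   removes the shortest match of pattern from the end
# ${var%%pattern}  removes the longest match of pattern from the end
# ${var#pattern}   removes the shortest match of pattern from the start
# ${var##pattern}  removes the longest match of pattern from the start
echo "${EMAIL%%@*}"   # user part: foo
echo "${EMAIL#*@}"    # domain part: bar.com
echo "${MULTI%@*}"    # shortest suffix "@c" removed: a@b
echo "${MULTI%%@*}"   # longest suffix "@b@c" removed: a
echo "${MULTI#*@}"    # shortest prefix "a@" removed: b@c
echo "${MULTI##*@}"   # longest prefix "a@b@" removed: c
```

Because %% takes the longest match, ${STRING%%@*} cuts at the first @, which is the same field that awk -F@ '{print $1}' prints.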

So, what's the execution difference?

$ time bash with_awk_split.sh > /dev/null
 
real	0m3.737s
user	0m0.196s
sys	0m0.556s
 
$ time bash with_bash_split.sh > /dev/null
 
real	0m0.034s
user	0m0.020s
sys	0m0.012s

Skipping the fork makes the script roughly 100x faster: 0.034s of wall-clock time versus 3.737s.
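To reproduce the comparison in a single run, a sketch like the following wraps both loops in functions and times them with bash's time keyword. The function names, the for-loop form, and the output check are illustrative choices rather than the author's original scripts, and your timings will vary by machine:

```shell
#!/bin/bash
# Sketch: time both approaches in one script using bash's `time` keyword.
# Function names and the iteration count are illustrative.
TIMEFORMAT='%R s elapsed'   # make `time` report only wall-clock seconds

with_awk() {
  local i
  for ((i = 0; i < 1000; i++)); do
    echo "foo@bar.com" | awk -F@ '{print $1}'   # pipeline: forks on every pass
  done
}

with_expansion() {
  local i s="foo@bar.com"
  for ((i = 0; i < 1000; i++)); do
    echo "${s%%@*}"                              # handled inside bash, no new process
  done
}

# Sanity check: a speedup only counts if both versions print the same thing.
if [[ "$(with_awk)" == "$(with_expansion)" ]]; then
  echo "outputs match"
fi

# `time` reports on stderr, so discarding stdout leaves just the timings.
time with_awk > /dev/null
time with_expansion > /dev/null
```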

Granted, this is a pretty contrived example, and it's easy to beat even the pure-bash version: drop the loop and let a single awk process handle all of the input, which is about 3x faster than the pure-bash solution. So think about what you're doing: prefer a pipe over a loop, and if you can't do that, look for a built-in that can take care of your needs.
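Here is a minimal sketch of that advice for the case where you have many addresses rather than one, assuming the (hypothetical) addresses are already in a bash array or arrive one per line on stdin. Option 1 is the single-awk pipeline; options 2 and 3 keep the splitting entirely inside bash:

```shell
#!/bin/bash
# Sketch: strip domains from many addresses without forking once per address.
# The sample addresses are hypothetical.
emails=("foo@bar.com" "alice@example.org" "bob@mail.example.net")

# Option 1: one awk process for the whole stream (a pipe instead of a loop).
printf '%s\n' "${emails[@]}" | awk -F@ '{print $1}'

# Option 2: pure bash; the expansion is applied to every array element,
# and printf is a builtin, so no external process is started at all.
printf '%s\n' "${emails[@]%%@*}"

# Option 3: input arrives line by line (e.g. from a file); `read` and the
# expansion run inside bash, and the pipe forks once for the whole stream.
strip_domains() {
  local email
  while IFS= read -r email; do
    printf '%s\n' "${email%%@*}"
  done
}
printf '%s\n' "${emails[@]}" | strip_domains
```

Each option prints foo, alice and bob. Options 1 and 3 start a fixed number of processes regardless of how many addresses there are, and option 2 starts none; time them against your own data before picking one.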

Published at DZone with permission of Geoffrey Papilion, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Loren Kratzke replied on Wed, 2013/07/17 - 1:25pm

Interesting stuff. On the flip side, when you pipe the output of one operation into the input of another, you are able to utilize multiple cores. This can be good when you are chaining multiple ffmpeg commands together, or similar operations that work longer on a larger amount of data, as opposed to iterating many times over a relatively quick-running command such as the one above. The data and task have a big impact on which techniques execute the fastest.
