Sed


This is from an old assignment...


  • SED is a stream editor. The most common use for it is to do global replacement of regular-expression patterns by strings. The general syntax for this use is as follows:
    sed s/[substitution-pattern]/[replacement-string]/g [filename]
    
    Besides 'man sed', the most useful resource on sed is probably the SED FAQ. There is also a user's manual, SED - A Non-interactive Text Editor, and a collection of SED one-liners. Versions of SED are available for many platforms, including Windows.
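
    As a minimal sketch of the syntax above, the following pipes an invented sample line into sed instead of reading a file. The sample text and the variable name result are assumptions made only for illustration; the single quotes keep the shell from touching the * in the pattern.

```shell
# Replace every match of the regular expression "colou*r" with "COLOR".
# The input line is made-up sample text fed through a pipe.
result=$(printf 'color and colour\n' | sed 's/colou*r/COLOR/g')
echo "$result"
```

    The pattern matches both spellings because u* means "zero or more u characters".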
  • Go to the base directory of your copy of the lnx.sl.edu web site (from assignment #4) and give the command 'cat index.html' to print the home page to the screen. Now give the command:
    sed s/P/p/g index.html
    
    The default behavior for SED is to write to standard output--so a modified version of the file should scroll past in which uppercase P's in the paragraph tags have all been changed to lowercase p's. Note that when you are doing global replacements like this you must watch out for unintended hits on the substitution pattern. In this case the capital P's in the phrase 'Home Page' have also been changed to lowercase. This can be avoided by choosing a slightly larger chunk for replacement (see below on handling special symbols). Now do 'cat index.html' again. The original file has not been changed. In order to save the changes we would need to redirect the output to a file like this:
    sed s/P/p/g index.html > index.bak
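
    The behavior described above can be checked without touching your real index.html. This is only a sketch: the scratch directory, the file names sample.html and sample.bak, and the one-line page are all invented for the example.

```shell
# Work in a scratch directory so no real files are touched.
tmpdir=$(mktemp -d)
printf '<P>Home Page</P>\n' > "$tmpdir/sample.html"

# sed writes the modified text to standard output; here it is redirected to a new file.
sed s/P/p/g "$tmpdir/sample.html" > "$tmpdir/sample.bak"

original=$(cat "$tmpdir/sample.html")
modified=$(cat "$tmpdir/sample.bak")
echo "original: $original"
echo "modified: $modified"

rm -r "$tmpdir"
```

    The original file keeps its uppercase P's, while the copy shows both the intended change to the tags and the unintended change to the word Page.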
    
  • Now let's try another command which appends a string to an already existing string:
    sed s/size="2"/& color="#FF0000"/g index.html
    
    Unfortunately, you will get errors when you try this. The problem is that bash, which parses the command line first, gives special meaning to several characters in the command: it strips the double quotation marks, splits the command at the space, and treats the ampersand as a request to run a command in the background. To get around this problem, and let SED see the substitution command intact, you should enclose the substitution command in single quotes like this:
    sed 's/size="2"/& color="#FF0000"/g' index.html
    
    It is good practice to always use single quotes around the substitution. It would let us fix our first example, for instance, by using the entire paragraph tag for replacement, since '<' and '>' also have special bash meanings. (Try this.) The use of single quotes is something you will also see in the syntax of other commands that must be parsed by bash. Note that the ampersand, when used in a replacement string, indicates that whatever string matched the pattern on the left should be inserted at that point. Hence, what the above SED command does is look for 'size="2"' and replace it with 'size="2" color="#FF0000"'. This should affect two locations at the very bottom of the page and cause a color specification to be added to the font tag. We can make this change permanent by redirecting the output to a file and then replacing index.html with the modified file, as follows:
    sed 's/size="2"/& color="#FF0000"/g' index.html > index.bak && 
    	mv index.bak index.html
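
    Both points made above, matching the whole tag and using the ampersand, can be tried on a single line of text. This is a sketch on an invented sample line (the variable names line, tags_only and with_color are assumptions), not on the real page.

```shell
line='<P><font size="2">Home Page</font></P>'

# Match the whole tag so that the P in "Page" is left alone.
# The slash inside the closing tag is escaped because / is also the delimiter.
tags_only=$(printf '%s\n' "$line" | sed 's/<P>/<p>/g; s/<\/P>/<\/p>/g')

# & in the replacement stands for whatever text the pattern matched.
with_color=$(printf '%s\n' "$line" | sed 's/size="2"/& color="#FF0000"/g')

echo "$tags_only"
echo "$with_color"
```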
    
    Try this, and then open the page in your browser and see what happened. Note that when making permanent global changes like this it is very important that you be aware of whether another global change can undo what you have done. In this case, there is an obvious SED command which will change the color back. See if you can find it and then check the result with your browser. In other cases, though, there might not be an easy way to restore the file to its original state (for instance, if we were to change all uppercase letters to lowercase--easy to do but not easy to undo).
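
    One cautious habit, sketched here under the assumption that disk space for a second copy is not a concern, is to save an untouched copy before making a change that cannot be reversed. The file names and the sample text are invented for the example, and the sed y command is used to do the lowercasing.

```shell
tmpdir=$(mktemp -d)
printf 'Home Page\n' > "$tmpdir/page.html"

# Keep an untouched copy before making the irreversible change.
cp "$tmpdir/page.html" "$tmpdir/page.html.orig"

# Lowercase every letter; the original capitalization cannot be recovered from the result.
sed 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/' "$tmpdir/page.html" > "$tmpdir/page.tmp" &&
	mv "$tmpdir/page.tmp" "$tmpdir/page.html"
lowered=$(cat "$tmpdir/page.html")

# Restore the file from the saved copy.
cp "$tmpdir/page.html.orig" "$tmpdir/page.html"
restored=$(cat "$tmpdir/page.html")

echo "lowered:  $lowered"
echo "restored: $restored"
rm -r "$tmpdir"
```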
  • So far we have only done things that could have been done as easily with an interactive text editor. The real advantage of SED, though, is in processing very large files (such as log files) or multiple files that cannot be so easily managed with a text editor. Files can be designated using wild cards, and you can even do recursive replacement throughout an entire directory structure (a great tool for web developers) by combining find and sed into a single shell script--another example reinforcing the point made in The Unix philosophy of small tools. So use a text editor to create a file called SR (for search and replace) which contains the following script:
    #SR does global replacement in html files--use with caution!
    
    find ./ -name '*.html' -print | while IFS= read -r i
    do
    sed s/gentle_05.jpg/gray_fab.jpg/ "$i" >"$i.bak" && mv "$i.bak" "$i"
    done
    
    
    Now make the script executable with 'chmod +x SR' and execute it by typing SR. (If this does not work, it is probably because the directory in which you saved SR is not in your path--either edit your .bash_profile to change your path, or use ./SR to indicate the current directory--in general it is a good idea to keep scripts in a special directory which has been permanently added to your path). Once you have run the script, use your browser to check the changes to your web structure. The background should be different throughout.
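
    To see the recursive replacement at work without risking the real site, the same find-and-sed loop can be run on a scratch tree. This is a sketch: the directory layout and file contents are invented, and the file name variable is quoted so that a directory name containing a space also works.

```shell
# Build a small scratch tree with HTML files in nested directories.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/site/sub dir"
printf '<body background="gentle_05.jpg">\n' > "$tmpdir/site/index.html"
printf '<body background="gentle_05.jpg">\n' > "$tmpdir/site/sub dir/page.html"

# Run the loop from the top of the scratch tree, in a subshell so the
# caller's working directory is not changed.
(
  cd "$tmpdir/site" || exit 1
  find ./ -name '*.html' -print | while IFS= read -r i
  do
    sed s/gentle_05.jpg/gray_fab.jpg/ "$i" > "$i.bak" && mv "$i.bak" "$i"
  done
)

top=$(cat "$tmpdir/site/index.html")
nested=$(cat "$tmpdir/site/sub dir/page.html")
echo "top:    $top"
echo "nested: $nested"
rm -r "$tmpdir"
```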

    It would be nice, of course, not to have to edit this script every time we want to make a different global change. So modify the script by putting double quotes around the substitution command and using positional variables for the substitution pattern and replacement string, as follows:

    #SR does global replacement in html files, replacing its first  
    #parameter by its second parameter--use with extreme caution!
    
    find ./ -name '*.html' -print | while IFS= read -r i
    do
    sed "s/$1/$2/g" "$i" >"$i.bak" && mv "$i.bak" "$i"
    done
    
    The difference between double quotes and single quotes, in this instance, is that the double quotes still allow the bash positional variables, $1 and $2, to be processed by bash, while single quotes would have hidden these from bash--see 'man bash'. Also note that the g switch has again been added to the end of the substitution command. This merely allows multiple instances within a line to be replaced--otherwise SED would only replace the first instance on a given line. Now test this script by executing the command:
    SR gray_fab.jpg swirly.jpg
    
    (or perhaps ./SR gray_fab.jpg swirly.jpg)
    
    Check the result with your browser. This illustrates the importance of being able to undo global changes. There are other patterns you can play with in the root of your web site, and many on the internet. See if you can make some other global changes to font, color, spacing, text, or images.
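
    The effect of the quoting style and of the g switch can be compared side by side on one line of text. In this sketch the positional parameters are set by hand with 'set --' to imitate a call to SR, and the sample line and variable names are invented.

```shell
# Imitate the script's two positional parameters.
set -- gray_fab.jpg swirly.jpg
line='gray_fab.jpg and gray_fab.jpg'

# Double quotes: bash expands $1 and $2 before sed sees the command.
double=$(printf '%s\n' "$line" | sed "s/$1/$2/g")

# Single quotes: sed receives the literal text $1 and $2, so nothing matches.
single=$(printf '%s\n' "$line" | sed 's/$1/$2/g')

# Without g, only the first match on each line is replaced.
first_only=$(printf '%s\n' "$line" | sed "s/$1/$2/")

echo "double:     $double"
echo "single:     $single"
echo "first_only: $first_only"
```

    Because the parameters are pasted directly into the sed command, a pattern or replacement that contains a slash would end the expression early. For that case sed accepts another delimiter character, as in s|old|new|g.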


    Updated March 1, 2002