Search and Replace in Multiple Files

(Version 4)

What is This?

It is a simple program for doing search & replace in multiple files from a command line, with support for recursion into subdirectories, regular expressions (a.k.a. regexps a.k.a. regexs) and a test (dry run) mode.

Many such programs have been written by others in the past, but I just wanted a very simple command line one that works on Microsoft Windows as well as Linux, so I hacked a copy of my Bulk File & Directory Renamer with Recursion & Regular Expressions program to do this simpler task.

Although I mainly wrote it for Microsoft Windows 2k (where very few programs support regular expressions), it should run equally well under Linux or Mac OS X with Perl. (Of course one does not really need a special program for this if one is using GNU/Linux/Bash, as a one line mess of 'find', '-name', '-exec' & 'sed'/'awk' can do it, but I am too forgetful of syntax & sloppy with typing to risk destroying files attempting that.)

System Requirements

A Perl interpreter (with the 'File::Find', 'File::Path' & 'Getopt::Std' modules, but those usually come as standard with Perl anyway).

How to Use It

Basic Usage

To process all files and directories in the current working directory and subdirectories thereof changing any occurrence of the expression <From> to the expression <To>, simply:

perl SearchAndReplaceInMultipleFiles.pl <From> <To>

Depending on how you installed the program you might be able to discard some of it and less pedantically do:

SearchAndReplaceInMultipleFiles <From> <To>

Non-trivial expressions and expressions containing spaces will probably need to be in (double for Windows) quotation marks so the shell passes them to the program as strings rather than trying to split up or process them itself.

You will probably want to use the '-m g' option (see below), as that will cause it to replace every match in each of the files rather than just the first match in each file, which is its default behaviour.
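The difference between the default behaviour and '-m g' can be illustrated with a short sketch. It uses Python's 're' module purely as a stand-in (the program itself uses Perl regular expressions, and this is not its code); the sample text and the names 'first_only' and 'every_match' are made up for the illustration.

```python
import re

text = "colour, colour and more colour"

# Analogue of the default behaviour: only the first match is replaced.
first_only = re.sub(r"colour", "color", text, count=1)

# Analogue of '-m g': every match is replaced.
every_match = re.sub(r"colour", "color", text)

print(first_only)
print(every_match)
```

Without the global modifier, the later matches are left untouched, which is why repeated runs would otherwise be needed.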

Although designed primarily for plain text (and similar, such as HTML) files, it can work on any file type (e.g. to replace dates in EXIF headers of JPG files) but take care as it simply treats all files as if they were plain text or binary, blithely ignoring structure, so it is easy to accidentally corrupt more complex formats.

Options

There are additional options which can be inserted before the two obligatory regular expressions:

perl SearchAndReplaceInMultipleFiles.pl <options> <From> <To>

SearchAndReplaceInMultipleFiles <options> <From> <To>

The options are provided in the common Linux short format of single letters, each prefixed by '-' and separated from each other & from parameters by spaces. Options that don't require additional parameters can be grouped (e.g. '-et ' means the same as '-e -t ').

-h
Print a summary of the instructions. (The instruction summary will also be printed if a syntax error is found in the options or if the program is run without parameters.)
-n <name>
Limit to filenames matching the <name> regular expression.
-m [egimosx]
Apply the given Perl search & replace modifiers. The most useful for search & replace are 'g' (replace every match in a file, not just the first) and 'i' (ignore letter case).
-b <directory>
Process files within <directory> directory instead of the current working directory.
-e
Do not recurse into subdirectories.
-t
Test mode. (It does a dry run, printing out the changes it would have made, but does not actually make the changes.)
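
To make the interplay of the options concrete, here is a rough sketch of the general technique (walk a directory tree, filter by name, substitute, optionally only pretend). It is written in Python for illustration and is not the program's actual Perl code. The function 'search_and_replace' and all of its parameter names are hypothetical, and it assumes the files are UTF-8 text.

```python
import os
import re
import tempfile


def search_and_replace(base_dir, pattern, replacement, name_pattern=None,
                       recurse=True, replace_all=False, dry_run=True):
    """Return the files that changed (or would change in a dry run)."""
    regex = re.compile(pattern)
    name_regex = re.compile(name_pattern) if name_pattern else None
    changed = []
    for directory, subdirs, filenames in os.walk(base_dir):
        if not recurse:
            subdirs[:] = []  # like '-e': do not descend any further
        for filename in sorted(filenames):
            if name_regex and not name_regex.search(filename):
                continue  # like '-n': skip names that do not match
            path = os.path.join(directory, filename)
            with open(path, encoding="utf-8", newline="") as handle:
                old_text = handle.read()
            # count=0 replaces every match (like '-m g'); count=1 only the first.
            new_text = regex.sub(replacement, old_text,
                                 count=0 if replace_all else 1)
            if new_text == old_text:
                continue
            changed.append(path)
            if not dry_run:  # like leaving out '-t': really write the file
                with open(path, "w", encoding="utf-8", newline="") as handle:
                    handle.write(new_text)
    return changed


def read(path):
    with open(path, encoding="utf-8", newline="") as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "sub"))
    for relative in ("a.txt", "b.html", os.path.join("sub", "c.txt")):
        with open(os.path.join(base, relative), "w", encoding="utf-8") as handle:
            handle.write("teh cat saw teh dog\n")

    # Dry run over the whole tree, limited to names ending in '.txt'.
    would_change = search_and_replace(base, r"\bteh\b", "the",
                                      name_pattern=r"\.txt$", replace_all=True)
    dry_names = sorted(os.path.basename(p) for p in would_change)
    after_dry = read(os.path.join(base, "a.txt"))

    # Dry run again, but without recursing into 'sub'.
    top_only = search_and_replace(base, r"\bteh\b", "the",
                                  name_pattern=r"\.txt$", recurse=False)
    top_names = sorted(os.path.basename(p) for p in top_only)

    # Real run: the matching files are rewritten.
    search_and_replace(base, r"\bteh\b", "the", name_pattern=r"\.txt$",
                       replace_all=True, dry_run=False)
    after_real = read(os.path.join(base, "a.txt"))
    html_after = read(os.path.join(base, "b.html"))

print(dry_names, top_names)
```

The dry run reports the same files a real run would touch but leaves them as they were, which is the point of trying '-t ' first.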

Example: Simple Search and Replace in Multiple Files

It can be very useful for correcting spelling mistakes duplicated across lots of files, for example with holiday photographs where a friend spots that I have consistently misspelt the name of a place across dozens of photograph captions.

All it needs is (replace the from & to strings with those required):

Windows: SearchAndReplaceInMultipleFiles -m g "\bLund'n Bridj\b" "London Bridge"

Linux: SearchAndReplaceInMultipleFiles -m g "\bLund'n Bridj\b" 'London Bridge'

You can use it just as it is as a recipe but if you want an explanation, here goes. The '-m g' tells it to replace every occurrence in each file (so, for example, 'Lund'n Bridj viewed from Lund'n Bridj tour' becomes 'London Bridge viewed from London Bridge tour', not 'London Bridge viewed from Lund'n Bridj tour'). The '\b' marks word boundaries (more generic than spaces, it also includes punctuation and string ends) to prevent it changing words of which 'Lund'n Bridj' is a substring. One does not really need these complications in this case, as substrings are not likely to be a problem, so one could simply do 'SearchAndReplaceInMultipleFiles "Lund'n Bridj" "London Bridge"' & repeat the command until it makes no further changes.
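
For a case where the word boundaries do matter, consider a short word that also occurs inside longer ones. The sketch below uses Python's 're' module as a stand-in for Perl (for plain ASCII text like this, '\b' behaves the same way in both); the sample text and variable names are invented for the illustration.

```python
import re

text = "cat, concatenate, cat's cradle, bobcat"

# Without '\b' the pattern also matches inside longer words.
without_boundaries = re.sub(r"cat", "dog", text)

# With '\b' only the stand-alone word is replaced; punctuation such as
# the comma and the apostrophe counts as a boundary.
with_boundaries = re.sub(r"\bcat\b", "dog", text)

print(without_boundaries)
print(with_boundaries)
```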

Avoiding Command Line Problems

One complication of running programs from a command line is that the command line interpreter ('shell') treats some characters specially and replaces them with other things (such as values of settings or unprintable characters) before running the program. Some of these characters are even treated so inside strings. The risky characters are '$' & '\' on Linux and '%' on Windows.

There are two solutions in Linux. The simplest is to use single quotation marks (''...'') instead of double quotation marks ('"..."') for the strings. The other is to prefix ('escape') each '$' & '\' with another '\'. E.g. instead of

SearchAndReplaceInMultipleFiles -t "$20" "$30"

use

SearchAndReplaceInMultipleFiles -t '$20' '$30'

or

SearchAndReplaceInMultipleFiles -t "\$20" "\$30"

In Windows I don't know how to prevent it substituting things beginning with '%' if it recognises them as settings (such as '%TMP%', the path to the system temporary directory). Within batch files supposedly prefixing '%' with '^' works but that did not work when I tested it directly on a command line. Fortunately Windows typically only has about 20 settings strings (use 'SET' to see which your installation is using) & '%' is usually used in English text with a space or punctuation after it so this is rarely likely to be a problem.
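
One way to sidestep shell substitution altogether, on either system, is to launch the program from a small script that hands over the arguments as a list, so no command line interpreter ever sees them. The following Python sketch shows the idea. To keep it self-contained, it starts a hypothetical stand-in child process (a Python one-liner that prints back the arguments it received) rather than the real program.

```python
import subprocess
import sys

# Hypothetical stand-in for the real program: a child process that just
# prints each argument it received on its own line.
echo_args = "import sys; print('\\n'.join(sys.argv[1:]))"

# Arguments containing characters a shell might otherwise substitute.
arguments = ["-t", "$20", "%TMP%", r"\bLund'n Bridj\b"]

# Passing a list (and not using shell=True) starts the child directly,
# without going through a shell.
result = subprocess.run(
    [sys.executable, "-c", echo_args, *arguments],
    capture_output=True, text=True, check=True,
)
received = result.stdout.splitlines()
print(received)
```

To run the real program the same way, the first three list items would be replaced with something like 'perl' and the path to 'SearchAndReplaceInMultipleFiles.pl', assuming 'perl' is on the system path.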

Safety

This is a powerful program that, running with sufficient file permissions, could corrupt every file on your computer (and networked drives), so take care. Preferably make a back-up copy of your files before use, check that you are running it on the directory you intend to run it on, and run it in test mode (the '-t ' option) first, checking that the changes it is going to make are what you want them to be. Treat it with the care you would treat 'rm -rf *' on Linux or 'del /s /q *' on Windows.

Installation Options

It does not need fancy installation. Provided Perl has been installed and this program has been downloaded, it should be ready to run! The following are just options for making it look tidier.

Changing Installation Directory

If you are only going to use it for a one off job, just put it in the directory you want the search & replace done in and run it from there with 'perl SearchAndReplaceInMultipleFiles.pl' (it might accidentally corrupt itself by searching & replacing in itself, but its job will have been done by then) and delete it afterwards.

Alternatively, to keep it for later use, put it anywhere listed in the computer's 'path' setting (e.g. '/usr/local/bin/' on Linux & 'C:\Windows\System32\' will typically work) or put it wherever you like and add its directory to the computer's 'path' setting.

Getting rid of the '.pl' Filename Extension

The '.pl' on the end that tells the computer that it is a Perl program looks a bit untidy.

On Linux, you can remove it by renaming the program, provided you tell your computer it is a Perl script either by always running it explicitly prefixed with 'perl ' or by editing the first line of the program so that the bit after the '#! ' is the location of your computer's Perl interpreter.

On Microsoft Windows you cannot remove the '.pl' without using the explicit 'perl ' method but you can avoid needing to type it by adding ';.PL' to the 'PATHEXT' system setting (that tells Windows that '.pl' files are executables and therefore, like '.exe' files, don't need the file extension typed when searching for them in the system path directories).

Shortening the Name

'SearchAndReplaceInMultipleFiles' is nicely descriptive but long to type. That is not a problem if using GNU/Linux/Bash because file names in the system path directories can be autocompleted by pressing the tab key.

Unfortunately on Microsoft Windows, only items in the current working directory, not the system path directories, can be autocompleted. Hence on Windows it is probably worthwhile renaming the program to something much shorter if it is going to be used frequently.

Download

Download SearchAndReplaceInMultipleFiles.pl (5 KiB).

Other Perl Scripts, Disclaimers Etc.

See my computer programs index page for more simple useful computer programs.