#! D:\perl\bin\perl.exe
###############################################################################
###############################################################################
#
# To remove the mass of unwanted formatting & other crap that M$ Word
# puts in HTML when converting a Word document to HTML
#
# by Andrew Hardwick, http://duramecho.com,
# Released under GNU Public Licence.
#
###############################################################################
###############################################################################
# Version 1, 2001/12/12
# Version 2, 2002/3/22
#  More garbage removal added.
# Version 3, 2002/8/11
#  More garbage removal added.
# Version 4, 2005/3/5
#  Just added this version history section into the comments.
# Version 5, 2008/3/21
#  Made it convert from M$ Windows codepage 1252 character set to UTF-8.
###############################################################################
###############################################################################
# How To Use
#  Run from a command line with the source file name as argument.
#  Output is to the same directory with file name prepended with 'Stripped'.
#  The following things still need manual correction:
#   Remove small caps formatting before converting to HTML as Word converts
#    the characters instead of applying it as formatting.
#   Get rid of the ... subscripting of pictures created from
#    Equation Editor equations.
#   Convert bulleted lists back from the paragraphs with dots that Word
#    saves them as to HTML bulleted
/
/gsi; $Html=~s///gsi; $Html=~s/