How to Create PDF Files (without using expensive software)

Summary

Creating PDF files is not straightforward, so here is some advice in case you need to do so manually (there are automatic alternatives for many popular operating systems). The exact set of commands will depend on the software & operating system one uses (an example for M$ Win32 is below) but the route is essentially to convert the document first to Postscript, as if printing it, and thence to convert the Postscript file to PDF using Ghostscript.

  1. Install Ghostscript if it is not already installed. (It will read, render, print & convert Postscript files.)
  2. Install Ghostview if it is not already installed. (It is a graphical user interface add-on for Ghostscript.)
  3. Install Adobe Acrobat Reader if it is not already installed. (This is for checking the resulting PDF file. Ghostscript can read PDF files as well as Postscript but Acrobat Reader can only read PDF, so opening the file in it confirms that the file has not remained as Postscript.)
  4. Configure your computer to believe it has a Postscript printer attached.
  5. Open the document you want to convert in some program that can print the document to a printer.
  6. Tell the program to print it to the Postscript printer.
  7. Divert the printing to a file. (This gives you a Postscript file version of the document.)
  8. Open the Postscript version in Ghostscript/Ghostview.
  9. Tell the program to convert it (if there is no 'convert' menu, look under 'print' instead because it imitates printing to perform the conversion) and choose 'pdfwrite' as the converter.
  10. Test the PDF file by checking that its file size is not ridiculously large, that it opens okay in Acrobat Reader and that the letters still look smooth when zoomed in on.
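
Steps 8 and 9 can also be done without the Ghostview menus by calling Ghostscript directly. The following is only a minimal Python sketch, not the method used in this article. It assumes the Ghostscript console program is on the search path under the name 'gs' (on M$ Windows it is usually called 'gswin32c' or 'gswin64c' instead), and the file names are hypothetical.

```python
import subprocess


def build_gs_command(ps_path, pdf_path, gs_exe="gs"):
    """Return the Ghostscript command line that converts Postscript to PDF."""
    return [
        gs_exe,
        "-dBATCH",            # exit when the input has been processed
        "-dNOPAUSE",          # do not wait for a key press after each page
        "-sDEVICE=pdfwrite",  # the 'pdfwrite' converter from step 9
        "-sOutputFile=" + pdf_path,
        ps_path,
    ]


def convert_ps_to_pdf(ps_path, pdf_path, gs_exe="gs"):
    """Run Ghostscript; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_gs_command(ps_path, pdf_path, gs_exe), check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_gs_command("document.ps", "document.pdf")))
```

Building the command separately from running it makes it easy to print and inspect the exact command before letting it loose on a real file.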

What are PDF, Postscript, etc. anyway?

'PDF' is a document format optimised for sending documents across the Internet with fixed formatting for printing on fixed size paper, whereas HTML is optimised for sending documents across the Internet with flexible formatting for display on computer screens. HTML is far better for general purpose WWW pages but I add PDF alternatives, ready formatted for printing, to some pages, like dance instructions, which readers are likely to prefer printed out. PDF is also useful as a "better than nothing" way of putting pages which were originally designed just for printing on the web when converting them to HTML is not practical and the original format is not something with widely available viewers.

'Postscript' is a document format designed as a standard way of sending documents to printers. As it was designed to be used only for temporary files and to be transmitted only over a fixed wire link from a computer to an attached printer, it has very little compression in the file and even includes an embedded programming language. PDF is essentially Postscript with the programming & other non-vital bits stripped out and what's left compressed. Both Postscript & PDF are proprietary languages invented by Adobe, the computer graphics software company, but their specifications are openly available and there are free viewers.

'PDF' is officially an acronym for 'Portable Document Format' but I suspect that the real origin of its name was a politically incorrect pun on "PDF File" which some joking programmer managed to slip past the management at Adobe!

Automatic Setup

Since writing the original version of this article, I have come across automatic ways of setting up free PDF creation on various operating systems.

However, this article might still be of use for those who want to know how such automatic systems work.

Solutions to some Problems when Creating PDF Files

There are many pitfalls in this which typically lead to the PDF file being huge in size, with text looking as scruffy as a bad Fax, and/or not displaying text correctly on computers which don't have the fonts you used installed. This is caused by one or more of the converters drawing text as lots of dots, as on screen, instead of leaving it as text and including the font in the file for the next stage to render it with.

Here are the solutions to the ones I encountered when creating a PDF file on M$ Windows 2000. I guess the solutions will be similar for other operating systems.

Installing a Postscript printer driver.
Install it like a normal printer. Indeed, one can just pick a normal printer which one knows uses Postscript (such as an Apple Laserwriter). Alternatively use the Adobe Acrobat Distiller fake printer driver from Adobe which imitates a generic Postscript printer (and therefore should not add any printer-specific garbage to your Postscript file).
Installing a Postscript driver.
There is already one from M$ in M$ Windows 2000 that is installed by default with the operating system but a better replacement is available from Adobe.
Why are there 2 things which are printer drivers?
The term 'printer driver' in M$ Windows terminology is misleading as it actually covers two parts which are normally installed together when a new printer is added. One is the real driver, which is a program that converts the output of other programs into the form that printers accept. The same Postscript driver should suffice for all Postscript printers. The other part is a 'ppd' file, which is a small data file that tells the real driver what the hardware of a specific printer is capable of ('ppd' stands for 'PostScript Printer Description').
How to ensure that the conversion to Postscript embeds the fonts.
Set the printer properties, if they are not already set that way, to always download fonts to the printer rather than using the printer's internal fonts.
How to ensure that the conversion to Postscript does not convert the text to bitmaps.
Set the printer properties, if they are not already set that way, to never bitmap text and always send it as vector fonts (or 'Truetype' or 'outline' or whatever that dialogue calls vector fonts). (If the options are something like bitmapping fonts below some size and using vector fonts above some size, then set both those sizes to zero to force vector fonts to always be used.)
How to ensure that the Postscript goes to a file instead of attempting to go to a real printer.
The printing dialogues of some applications have a "Print to file" checkbox. Set it to divert to a file and you will be prompted for the file name. Alternatively (and to ensure that it works in programs which don't have that option), set the default for the printer to be a file. The most fundamental way to route printing to a file is, when installing the printer driver, to set it to be on the local port called 'FILE:', which is on the same list as the real ports like 'LPT1:' & 'COM1:', in which case it should override the setting of the "Print to file" checkbox and always print to file. (In Unix I guess a straightforward pipe to a file would be the obvious method but M$ Windows does not support piping well.)
Where did the file go?
If the file prompt dialogue box was a normal M$ Windows one displaying folders then it went wherever you told it to but if it was a minimal one with just a line (which I think happens when the "Print to file" checkbox was used instead of setting the printer port to 'FILE:'), then one has to type the full path into the box each time, otherwise it will be saved to the root of the system drive (typically 'C:\') or sometimes an operating system directory (e.g. 'C:\WINNT\System32\', yuk!).
Why is the file '*.prn' not '*.ps'? Why is the icon for unknown file type even though Ghostview is installed?
That is another annoying feature of M$ Windows. It uses the extension 'prn' (for 'printer') for all files of redirected printing even if one explicitly tells it to use 'ps' (for Postscript). Simply rename the extension to 'ps' after the file is saved. (Of course, if you are foolish enough to use Windows Explorer with the file extensions hidden (the default mode) then this will have been really confusing! (It is not a good idea to hide the file extensions because those are what tell M$ Windows what to do with files when opened, whereas the icons, which appear to naive users to do that job, are easy to fake.))
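
If there are many such files, the renaming can be scripted. This is a small Python sketch of one way to do it, not something the article's setup requires. The function name is my own invention and it refuses to overwrite an existing '.ps' file rather than guess what you want.

```python
from pathlib import Path


def rename_prn_to_ps(path):
    """Rename 'something.prn' to 'something.ps' and return the new path."""
    source = Path(path)
    if source.suffix.lower() != ".prn":
        raise ValueError(f"not a .prn file: {source}")
    target = source.with_suffix(".ps")
    if target.exists():
        # Behaviour on overwrite differs between operating systems,
        # so stop here instead of silently replacing a file.
        raise FileExistsError(f"refusing to overwrite {target}")
    source.rename(target)
    return target
```

For example, rename_prn_to_ps("dances.prn") would leave a file called 'dances.ps' in the same folder ('dances.prn' being a hypothetical name).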
What is the difference between Ghostscript & Ghostview?
Ghostscript is a program for displaying (and converting, printing etc.) Postscript files which is operated by text commands. Ghostview is a graphical user interface which can be added to Ghostscript to make it easier to use manually with menus etc. (if one wants to control Ghostscript from another program, it is probably easier to call it directly though).
Why do the fonts look ragged in Ghostscript/Ghostview?
This might be because the Postscript conversion has bitmapped the fonts (if so then go back and sort out the printer driver stuff) or simply because Ghostview is not antialiasing the text it displays. To tell which, magnify the view greatly. If the raggedness grows horribly in proportion to the character size then the fonts have been bitmapped, but if the raggedness remains only on the edge pixels of the characters then all is well: it is just aliasing in the on-screen display and the underlying Postscript file is fine.
How to ensure that the conversion from Postscript to PDF embeds the fonts.
Set the 'pdfwrite' converter properties in Ghostscript/Ghostview, if they are not already set that way, to always embed fonts.
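
When Ghostscript is called directly rather than through Ghostview, the equivalent settings are passed as command options. The Python sketch below assembles such a command. It assumes that your version of Ghostscript accepts '-dEmbedAllFonts' and '-dSubsetFonts' (its names for the corresponding Distiller parameters) and that the program is called 'gs'. The file names are hypothetical and the defaults differ between versions, so check the documentation for yours.

```python
def pdfwrite_font_options(embed_all=True, subset=True):
    """Return Ghostscript options controlling font embedding in 'pdfwrite'."""
    def flag(value):
        return "true" if value else "false"

    return [
        "-dEmbedAllFonts=" + flag(embed_all),  # embed every font used
        "-dSubsetFonts=" + flag(subset),       # keep only the glyphs used
    ]


# Assemble a full command line (hypothetical file names).
command = ["gs", "-dBATCH", "-dNOPAUSE", "-sDEVICE=pdfwrite"]
command += pdfwrite_font_options()
command += ["-sOutputFile=document.pdf", "document.ps"]
print(" ".join(command))
```

Subsetting keeps the file smaller, which matters if the document uses only a few characters from a large font.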
Why is the file not '*.pdf'? Why is the icon not for a PDF file even though a PDF viewer is installed?
Ghostscript was not originally written for M$ Windows and other operating systems do not all require the file extension to tell of the file type, so it does not append '.pdf' by default. Just type '.pdf' on the end of the file name when saving if you want it (Ghostscript won't add its own unwanted extension like the M$ Windows printing to file does).
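
Since the extension cannot be trusted, the contents can be checked instead: PDF files begin with '%PDF-' and Postscript files with '%!'. Here is a rough Python sketch of such a check. Searching the first kilobyte rather than only the very first bytes is my own assumption, made because some printer drivers put job-control bytes ahead of the Postscript.

```python
def sniff_document_type(path):
    """Guess whether a file is PDF or Postscript from its first bytes."""
    with open(path, "rb") as handle:
        head = handle.read(1024)
    # Check for PDF first: its signature is the more specific of the two.
    if b"%PDF-" in head:
        return "pdf"
    if b"%!" in head:
        return "postscript"
    return "unknown"
```

A result of "postscript" for a file that was meant to be the finished PDF means the conversion step did not actually happen.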
Why do the fonts look ragged in Acrobat Reader?
If all the other possible causes of font bitmapping are solved then maybe your copy of Ghostscript is too old. Some old ones (e.g. version 5.10) had this as a bug.
What size paper?
'A4' unless there is some specific size more fitting to the job's requirements, because most personal computer printers use the international metric 'A4' size paper or the similar USA 'Letter' size. 'Letter' is slightly wider but shorter than 'A4' (about 216 mm by 279 mm against 210 mm by 297 mm), so 'A4' formatted pages only fit on 'Letter' if they have generous top & bottom margins or are scaled down slightly when printed. For smaller pages 'A5' is good since 2 pages of that fit exactly onto 'A4' and it has the same aspect ratio, so it can be magnified to 'A4' if desired (that is the ingenious part of the 'A' series paper sizes). If you really want something which is fully flexible with any size of paper, don't use PDF.
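
The arithmetic behind these paper sizes can be checked with a few lines of Python. The sketch uses the standard nominal sizes in millimetres ('Letter' being 8.5 by 11 inches) and the function names are purely illustrative.

```python
import math

# Paper sizes in millimetres (width, height), portrait orientation.
A4 = (210.0, 297.0)
A5 = (148.0, 210.0)
LETTER = (215.9, 279.4)  # 8.5 x 11 inches


def aspect_ratio(size):
    """Height divided by width."""
    width, height = size
    return height / width


def fits_on(page, sheet):
    """True if 'page' fits on 'sheet' without scaling or rotation."""
    return page[0] <= sheet[0] and page[1] <= sheet[1]


def scale_to_fit(page, sheet):
    """Largest uniform scale factor at which 'page' fits on 'sheet'."""
    return min(sheet[0] / page[0], sheet[1] / page[1])


# The 'A' series keeps height/width close to the square root of 2.
print(round(aspect_ratio(A4), 3), round(aspect_ratio(A5), 3), round(math.sqrt(2), 3))
# Two A5 pages side by side cover an A4 sheet turned sideways.
print(2 * A5[0] <= A4[1] and A5[1] <= A4[0])
# Does an unscaled A4 page fit on Letter, and what scale would make it fit?
print(fits_on(A4, LETTER), round(scale_to_fit(A4, LETTER), 3))
```

The scale factor comes out a little under 1 because 'Letter' is the shorter sheet, which is why most printing dialogues offer a "shrink to fit" option.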

Accessibility & Editability

Remember that PDF was designed to represent a printed page with fixed formatting as the final stage before printing which makes it a good format for distribution for printing but not a good one for display on a computer screen or for subsequent editing. Viewing on a computer screen is normally much better done as HTML because that can be reformatted automatically to fit different screens and viewing preferences (and even converted to different media such as a speech output for browsing over a mobile telephone or Braille for blind computer users).

Therefore, if you put a PDF document on the WWW, please also put an HTML version, or at least the editing format file you created the PDF file from, on the WWW as well so your readers can read it how best suits them.

On this site, I have enclosed my PDF files in an archiving/compressing format (such as 'tar.gz' or 'zip') to deter readers, and especially search engines, from going to the PDF files by accident for on-line viewing instead of the HTML alternatives of the same text, which are far more comfortable to read on-line. It would be ironic if someone struggled with a PDF page in a browser plug-in or used an on-line PDF to HTML converter when a pure HTML version was there waiting for them!

A Microsoft Windows 'Walk-through' in Detail

The following is how I set it up on a M$ Windows computer. The software versions were: Ghostscript 7.03, Ghostview 4.1, Adobe Postscript Printer Driver 1.0.5, Acrobat Distiller PPD 3.0, M$ Windows 2000 SP2. The particulars will be different for different systems, of course.

Acknowledgements

I wish to thank Hin-Tak Leung for solving the font embedding problems for me and explaining how printer drivers worked in 2002 & VoltageX for telling me that PDFCreator now automates the process on M$ Win32 in 2005.