How to Create PDF Files (without using expensive software)
Summary
Creating PDF files is not straightforward, so here is some advice in case you
need to do so manually (there are automatic
alternatives for many popular operating systems). The exact set of commands
will depend on the software & operating system one uses (an example for M$
Win32 is below) but the route is essentially to convert the document first to
Postscript, as if printing it, and thence to convert the Postscript file to PDF
using Ghostscript.
- Install Ghostscript if it is not already installed. (It will read, render,
print & convert Postscript files.)
- Install Ghostview if it is not already installed. (It is a graphical user
interface add-on for Ghostscript.)
- Install Adobe Acrobat Reader if it is not already installed. (To check the
resulting PDF file in. Ghostscript can read PDF files as well as Postscript but
Acrobat Reader can only read PDF so it will check that the file has not
remained as Postscript.)
- Configure your computer to believe it has a Postscript printer attached.
- Open the document you want to convert in some program that can print the
document to a printer.
- Tell the program to print it to the Postscript printer.
- Divert the printing to a file. (This gives you a Postscript file version of
the document.)
- Open the Postscript version in Ghostscript/Ghostview.
- Tell the program to convert it (if there is no 'convert' menu, look under
'print' instead because it imitates printing to perform the conversion) and
choose 'pdfwrite' as the converter.
- Test the PDF file by checking that its file size is not ridiculously
large, that it opens okay in Acrobat Reader and that the letters still look
smooth when zoomed in on.
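The final conversion step can also be done without the Ghostview menus by calling Ghostscript directly. The following is only a rough sketch, not the method used when writing this article: the function names are made up for illustration, and the executable names tried ('gs' on Unix-like systems, 'gswin64c' or 'gswin32c' for the console builds on M$ Windows) are assumptions about how Ghostscript was installed.

```python
import shutil
import subprocess


def find_ghostscript():
    """Return the path of a Ghostscript console executable, or None.

    The candidate names are assumptions about a typical installation.
    """
    for name in ("gs", "gswin64c", "gswin32c"):
        path = shutil.which(name)
        if path:
            return path
    return None


def pdfwrite_command(gs, ps_path, pdf_path):
    """Build the argument list that asks Ghostscript to 'print' to PDF."""
    return [
        gs,
        "-dBATCH",            # exit once the input has been processed
        "-dNOPAUSE",          # do not wait for a key press after each page
        "-sDEVICE=pdfwrite",  # the PDF-writing output device
        "-sOutputFile=" + pdf_path,
        ps_path,
    ]


def convert(ps_path, pdf_path):
    """Run the conversion; raises if Ghostscript is missing or fails."""
    gs = find_ghostscript()
    if gs is None:
        raise FileNotFoundError("no Ghostscript executable found on PATH")
    subprocess.run(pdfwrite_command(gs, ps_path, pdf_path), check=True)
```

Calling convert("dance.ps", "dance.pdf") would then do the same job as choosing 'pdfwrite' in the menus, provided a Ghostscript executable is on the search path.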
What are PDF, Postscript, etc. anyway?
'PDF' is a document format optimised for sending documents with fixed
formatting for printing on fixed size paper across the Internet whereas HTML is
optimised for sending documents with flexible formatting across the Internet
for display on computer screens. HTML is far better for general purpose WWW
pages but I add PDF alternatives ready formatted for printing to some pages like
dance instructions which readers are likely to prefer printed out. PDF is also
useful as a "better than nothing" way of putting pages, which were
originally designed just for printing, on the web when converting them to HTML
is not practical and the original format is not something with widely available
viewers.
'Postscript' is a document format designed as a standard way of sending
documents to printers. As it was designed to only be used for temporary files
and only be transmitted over a fixed wire link from a computer to an attached
printer, it has very little compression in the file and even includes an
embedded programming language. PDF is essentially Postscript with the
programming & other non-vital bits stripped out and what's left compressed.
Both Postscript & PDF are proprietary languages invented by Adobe, the
computer graphics software company, but their spec' is openly available and
there are free viewers.
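Both formats announce themselves in the first few bytes of the file: a PDF file begins with '%PDF-' and a Postscript file conventionally begins with '%!'. A small sketch of telling them apart, regardless of what the file extension claims, might look like this. The function names are hypothetical, and skipping a leading Control-D byte is an assumption about what some printer drivers put in front of the Postscript.

```python
def sniff_format(first_bytes):
    """Guess 'pdf', 'postscript' or 'unknown' from the start of a file."""
    # Assumption: some drivers send a Control-D (0x04) before the job.
    data = first_bytes.lstrip(b"\x04")
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"%!"):
        return "postscript"
    return "unknown"


def sniff_file(path):
    """Read just enough of the file at `path` to guess its format."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(16))
```

This is the same check that opening the file in Acrobat Reader performs by hand: a file which has remained as Postscript will not be reported as 'pdf'.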
'PDF' is officially an acronym for 'Portable Document Format' but I suspect
that the real origin of its name was a politically incorrect pun on "PDF
File" which some joking programmer managed to slip past the management at
Adobe!
Automatic Setup
Since writing the original version of this article, I have come across
automatic ways of setting up free PDF creation on various operating systems:
- Mac OS X: PDF is so embedded in the operating system &
the Mac OS is so designed to be easy to use that one would expect PDF export to
come ready set up. Indeed it is.
- GNU/Linux: In the recent distributions I have tried
(Slackware, Knoppix & Red Hat), PDF (and Postscript) creation via fake
printers has been set up automatically.
- M$ Windows: This is the only popular personal office
computer operating system in which it is not set up as standard. However,
although Microsoft have not bothered, others have made free programs to do the
automatic setup such as CutePDF Writer (Lite version of a commercial product)
and PDFCreator (an open source project).
However, this article might still be of use for those who want to know how such
automatic systems work.
Solutions to some Problems when Creating PDF Files
There are many pitfalls in this which typically lead to the PDF file being
huge in size and with text looking as scruffy as a bad Fax and/or it not
displaying text correctly on computers which don't have the fonts you used
installed. This is caused by one or more of the converters drawing text as lots
of dots as on screen instead of leaving it as text and including the font in
the file for the next stage to render it with.
Here are the solutions to the ones I encountered when creating a PDF file on
M$ Windows 2000. I guess the solutions will be similar for other operating
systems.
- Installing a Postscript printer driver.
- Install it like a normal printer. Indeed, one can just pick a normal
printer which one knows uses Postscript (such as an Apple Laserwriter).
Alternatively use the Adobe Acrobat Distiller fake printer driver from Adobe
which imitates a generic Postscript printer (and therefore should not add any
printer-specific garbage to your Postscript file).
- Installing a Postscript driver.
- There is already one from M$ in M$ Windows 2000 that is installed by
default with the operating system but a better replacement is available from
Adobe.
- Why are there 2 things which are printer drivers?
- The 'printer driver' in M$ Windows terminology is misleading as it actually
consists of two parts which are normally installed together when a new printer
is added. One is the real driver which is a program that converts the output of
other programs into the form that printers accept. This same Postscript driver
should suffice for all Postscript printers. The other part is a 'ppd' file
which is a small data file that tells the real driver what the hardware of a
specific printer is capable of ('ppd' stands for 'PostScript Printer
Definition').
- How to ensure that the conversion to Postscript embeds the fonts.
- Set the printer properties, if they are not already set that way, to always
download fonts to the printer rather than using the printer's internal fonts.
- How to ensure that the conversion to Postscript does not convert the text
to bitmaps.
- Set the printer properties, if they are not already set that way, to never
bitmap text and always send them as vector fonts (or 'Truetype' or 'outline' or
whatever that dialogue calls vector fonts). (If the options are something like
to bitmap fonts below some size and to vector fonts above some size then set
both those sizes to zero to force vectored fonts to always be used.)
- How to ensure that the Postscript goes to a file instead of attempting to
go to a real printer.
- The printing dialogues of some applications have a "Print to
file" checkbox. Set it to divert to a file and you will be prompted for
the file name. Alternatively (and to ensure that works in programs which don't
have that option), set the default for the printer to be a file. The most
fundamental way to route printing to a file is to, when installing the printer
driver, set it to be on the local port called 'FILE:' which is on the same list
as the real ports like 'LPT1:' & 'COM1:' in which case it should override
the settings on the "Print to file" checkbox and always print to
file. (In Unix I guess a straightforward pipe to a file would be the obvious
method but M$ Windows does not support piping well.)
- Where did the file go?
- If the file prompt dialogue box was a normal M$ Windows one displaying
folders then it went wherever you told it to but if it was a minimal one with
just a line (which I think happens when the "Print to file" checkbox
was used instead of setting the printer port to 'FILE:'), then one has to type
the full path into the box each time, otherwise it will be saved to the root of
the system drive (typically 'C:\') or sometimes an operating system directory
(e.g. 'C:\WINNT\System32\', yuk!).
- Why is the file '*.prn' not '*.ps'? Why is the icon for unknown file type
even though Ghostview is installed?
- That is another annoying feature of M$ Windows. It uses the extension 'prn'
(for 'printer') for all files of redirected printing even if one explicitly
tells it to use 'ps' (for Postscript). Simply rename the extension to 'ps'
after the file is saved. (Of course, if you are foolish enough to use Windows
Explorer with the file extensions hidden (the default mode) then this will have
been really confusing! (It is not a good idea to hide the file extensions
because those are what tell M$ Windows what to do with files when opened
whereas the icons which appear to naive users to do that job are easy to
fake.))
- What is the difference between Ghostscript & Ghostview?
- Ghostscript is a program for displaying (and converting, printing etc.)
Postscript files which is operated by text commands. Ghostview is a graphical
user interface which can be added to Ghostscript to make it easier to use
manually with menus etc. (if one wants to control Ghostscript from another
program, it is probably easier to call it directly though).
- Why do the fonts look ragged in Ghostscript/Ghostview?
- This might be because the Postscript conversion has bitmapped the fonts (if
so then go back and sort out the printer driver stuff) or simply because
Ghostview is not antialiasing the text it displays. To tell which, magnify the
view greatly. If the raggedness grows horribly in proportion to the character
size then the fonts have been bitmapped but if the raggedness remains only on
the edge pixels of the characters then it is okay, it is just aliasing whilst
being displayed on screen with the underlying Postscript file okay.
- How to ensure that the conversion from Postscript to PDF embeds the
fonts.
- Set the 'pdfwrite' converter properties in Ghostscript/Ghostview, if they
are not already set that way, to always embed fonts.
- Why is the file not '*.pdf'? Why is the icon not for a PDF file even though
a PDF viewer is installed?
- Ghostscript was not originally written for M$ Windows and other operating
systems do not all require the file extensions to always tell of the file types
so it does not append '.pdf' by default. Just type '.pdf' on the end of the file name
when saving if you want (Ghostscript won't add its own unwanted extension like
the M$ Windows printing to file does).
- Why do the fonts look ragged in Acrobat Reader?
- If all the other possible causes of font bitmapping are solved then maybe
your copy of Ghostscript is too old. Some old ones (e.g. version 5.10) had this
as a bug.
- What size paper?
- 'A4' unless there is some specific size more fitting to the job's requirements
because most personal computer printers use the international metric 'A4' size
paper or the similar USA 'Letter' size. 'Letter' is slightly wider but shorter
than 'A4' so 'A4' formatted pages need generous top & bottom margins to fit on
'Letter' as well. For smaller pages 'A5' is good since 2 pages of that fit
exactly onto 'A4' and it is of the same aspect ratio so it can be magnified to
'A4' if desired (that is the ingenious part of
the 'A' series paper sizes). If you really want something which is fully
flexible with any size of paper, don't use PDF.
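The 'A' series property mentioned in the last answer can be checked with a little arithmetic. The sketch below assumes the usual definition (A0 is 841 mm by 1189 mm and each smaller size is the previous one halved across its long side, rounded down to a whole millimetre); the function name is made up for illustration.

```python
import math


def a_series_mm(n):
    """Return (width, height) in mm of portrait A<n> paper, for n >= 0."""
    width, height = 841, 1189  # A0
    for _ in range(n):
        # Halve the long side; the old short side becomes the new long side.
        width, height = height // 2, width
    return width, height


a4_width, a4_height = a_series_mm(4)
a5_width, a5_height = a_series_mm(5)

# Two A5 pages side by side fill an A4 sheet turned on its side.
pair_fits = (a5_height == a4_width) and (2 * a5_width <= a4_height)

# Both sizes have (almost exactly) the same aspect ratio, the square root
# of 2, so A5 can be magnified onto A4 without changing the margins.
a4_ratio = a4_height / a4_width
a5_ratio = a5_height / a5_width
magnification = a4_height / a5_height
```

The magnification needed to go from 'A5' to 'A4' is therefore about 141%, which is the figure offered by many photocopiers for this purpose.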
Accessibility & Editability
Remember that PDF was designed to represent a printed page with fixed
formatting as the final stage before printing which makes it a good format for
distribution for printing but not a good one for display on a computer screen
or for subsequent editing. Viewing on a computer screen is normally much better
done as HTML because that can be reformatted automatically to fit different
screens and viewing preferences (and even converted to different media such as
a speech output for browsing over a mobile telephone or Braille for blind
computer users).
Therefore, if you put a PDF document on the WWW please also put an HTML
version or at least the editing format file you created the PDF file from on
WWW as well so your readers can read it how best suits them.
On this site, I have enclosed my PDF files in an archiving/compressing format
(such as 'tar.gz' or 'zip') to deter readers, and especially search engines,
from going to the PDF files by accident for on-line viewing instead of the HTML
alternatives of the same text which are far more comfortable to read on-line.
It would be ironic if someone struggled with a PDF page in a browser plug-in or
used an on-line PDF to HTML converter when a pure HTML version was there
waiting for them!
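Wrapping a PDF file in an archive is easy to script. The following is a minimal sketch using the 'zip' format and Python's standard library, not a description of how the archives on this site were actually made; the function name and the file names are hypothetical.

```python
import io
import zipfile


def zip_single_file(name, data):
    """Return the bytes of a zip archive holding one member called `name`."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, data)
    return buffer.getvalue()
```

To use it, read the PDF file in binary mode, pass its contents in as `data` with a member name such as 'dance.pdf', and write the returned bytes to a file such as 'dance.zip'.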
A Microsoft Windows 'Walk-through' in Detail
The following is how I set it up on a M$ Windows computer. The software
versions were: Ghostscript 7.03, Ghostview 4.1, Adobe Postscript Printer Driver
1.0.5, Acrobat Distiller PPD 3.0, M$ Windows 2000 SP2. The particulars will be
different for different systems, of course.
- Installing:
- Downloaded Ghostscript, Ghostview, Acrobat Reader, Postscript Printer
Driver & Acrobat Distiller PPD software installers (there was no charge for
these items).
- Logged in as Administrator.
- Installed Adobe Acrobat Reader (removing the desktop advertising shortcut).
- Installed Ghostscript (changing its installation directory to be under
'C:\Program Files' with most other applications).
- Installed Ghostview (changing its installation directory to be under
'C:\Program Files' with most other applications). Because it was installed
after Ghostscript, it detects and uses Ghostscript automatically.
- Expanded the self-extracting archive containing the Acrobat Distiller PPD
into a temporary directory (ignoring the installation instructions that came
with it because they required the use of a postscript driver configuration
utility which I haven't got).
- Installed the Adobe Postscript Printer Driver (I did this after getting the
PPD file ready because it is easiest to set up the printer settings whilst
installing this driver) and configured the fake printer (port='FILE:'; for the
printer, I used the 'Have disk' button, went to the temporary directory
containing the Acrobat Distiller PPD file and chose 'Acrobat Distiller 3.0';
paper size='A4'; minimum font size to download='0'; maximum font size to
bitmap='0').
- Using:
- I opened the document for conversion in its normal program for display and
printed it choosing 'Acrobat Distiller 3.0' as the printer, a temporary
directory as destination & ending the file name with '.ps' after checking
the printer properties were okay (some programs override the global settings
with their own and have to be set independently).
- I opened the *.ps file from Windows Explorer thereby launching
Ghostview/Ghostscript.
- I used 'Convert' from the menu (with settings of device='pdfwrite',
resolution='600' dpi, and properties.EmbedAllFonts='true') choosing the
destination file path & ending the file name with '.pdf'.
- I opened the *.pdf file from Windows Explorer thereby launching Acrobat
Reader and checked the PDF file worked okay.
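The file naming and the 'Convert' settings from the walk-through could be written down for a script roughly as follows. This is a sketch under assumptions rather than what was actually done above: the function names are invented, the doubled 'dance.ps.prn' form is a guess at what M$ Windows might produce when told to use 'ps', and the switches are the Ghostscript command line equivalents of the three menu settings.

```python
from pathlib import PureWindowsPath


def postscript_name(printed_path):
    """Give a redirected print file a '.ps' extension.

    Handles both 'dance.prn' and the doubled 'dance.ps.prn' form.
    """
    path = PureWindowsPath(printed_path)
    if path.suffix.lower() != ".prn":
        return str(path)              # nothing to fix
    base = path.with_suffix("")       # drop the '.prn'
    if base.suffix.lower() == ".ps":
        return str(base)              # was 'name.ps.prn'
    return str(base.with_name(base.name + ".ps"))


def pdf_name(ps_path):
    """Derive the destination name by swapping '.ps' for '.pdf'."""
    return str(PureWindowsPath(ps_path).with_suffix(".pdf"))


def walkthrough_switches():
    """Ghostscript switches matching the 'Convert' settings used above."""
    return [
        "-sDEVICE=pdfwrite",      # device='pdfwrite'
        "-r600",                  # resolution='600' dpi
        "-dEmbedAllFonts=true",   # properties.EmbedAllFonts='true'
    ]
```

The switches would go before the input file name on a Ghostscript command line, and the two naming functions only compute names; the renaming itself is left to the file manager or to a further line of script.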
Acknowledgements
I wish to thank Hin-Tak Leung for solving the font embedding problems for me
and explaining how printer drivers worked in 2002 & VoltageX for telling me
that PDFCreator now automates the process on M$ Win32 in 2005.