| There is a program called detex that can be used to strip away TeX and
LaTeX commands, but it is not a great solution (it is designed to be a
pre-processor for a spelling checker).
Maybe the best solution is to generate a PostScript file and then use
ps2ascii to convert from PostScript to ASCII. From the man page:
ps2ascii uses gs(1) to extract ASCII text from PostScript(tm) or
Adobe Portable Document Format (PDF) files. If no files are specified
on the command line, gs reads from standard input; but PDF input
must come from an explicitly-named file, not standard input.
If no output file is specified, the ASCII text is written to
standard output.
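The PostScript route described above can be sketched as a small Python wrapper. This is only an illustration, not a tested recipe: it assumes latex, dvips and ps2ascii are all on your PATH, and the function names build_commands and convert are made up for this example. The ps2ascii call follows the "input file, then optional output file" form from the man page.

```python
import subprocess
from pathlib import Path


def build_commands(tex_file):
    """Return the commands for the LaTeX -> DVI -> PostScript -> text route."""
    tex = Path(tex_file)
    stem = tex.stem  # latex writes <stem>.dvi into the working directory
    return [
        # nonstopmode keeps latex from stopping to prompt on errors
        ["latex", "-interaction=nonstopmode", tex.name],
        ["dvips", "-o", stem + ".ps", stem + ".dvi"],
        # ps2ascii takes the input file, then an optional output file
        ["ps2ascii", stem + ".ps", stem + ".txt"],
    ]


def convert(tex_file):
    """Run each command in the directory that holds the .tex file."""
    workdir = Path(tex_file).resolve().parent
    for cmd in build_commands(tex_file):
        # check=True raises CalledProcessError if a step fails
        subprocess.run(cmd, cwd=workdir, check=True)
    return workdir / (Path(tex_file).stem + ".txt")
```

Calling convert("paper.tex") would, under those assumptions, leave paper.txt next to paper.tex. A document with cross-references may need more than one latex run, which this sketch does not attempt.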
ps2ascii doesn't look at font encoding, and isn't very good at dealing
with kerning, so for PostScript (but not currently PDF), you might
consider pstotext from
http://www.research.digital.com/SRC/virtualpaper/pstotext.html
You need at least version 1.8g for compatibility with Ghostscript 6.0.
It is available from http://rpmfind.net/linux/RPM/pstotext.html
(pick the most recent version).
The short script ps2ascii should be included with your gs distribution
(on Linux systems you should find it in /usr/bin, ready to go).
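If you are not sure whether the script is installed, a quick check along these lines may help. It is a minimal sketch that searches the directories on your PATH rather than looking in /usr/bin specifically, and find_ps2ascii is just a name chosen for this example.

```python
import shutil


def find_ps2ascii():
    """Return the full path to ps2ascii if it is on PATH, else None."""
    return shutil.which("ps2ascii")


if __name__ == "__main__":
    location = find_ps2ascii()
    if location is None:
        print("ps2ascii not found on PATH; check your gs installation")
    else:
        print("ps2ascii found at", location)
```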
2000-Apr-24 11:28am furnstahl.1@osu.edu |