LaTeX as an Archiving Format: Benefits and Problems


Today, LaTeX is the standard format for writing papers in mathematics and the preferred format in a large part of physics. For presentation on the Web, LaTeX documents are usually converted to PDF, a convenient format that is available on many platforms and can be viewed directly with appropriate rendering ("Reader") software. PDF is not, however, the optimal format for long-term storage, because:

- it is owned by a commercial company;
- it is not stable over time (some older files cannot be read by newer rendering software);
- it is not fault-tolerant: compressed PDF files in particular may become completely unreadable if corrupted;
- some PDF files do not allow efficient extraction of the text behind the presentation, which prevents efficient indexing for search and retrieval.

Since LaTeX is a purely text-based format with additional mark-up, and is available as open source software, it is a much safer choice for long-term preservation. But it presents several problems of its own:

- While the PDF is not the original, it provides a fixed pagination for reference; different compilations of the same LaTeX file under different conditions may produce different paginations.
- The LaTeX version of a dissertation may involve several different files (TeX sources, images, styles, macro packages, etc.), some of which are necessary and others not.
- Some LaTeX files do not compile correctly because necessary files are missing.

Despite these problems, the advantages of using LaTeX as an archival format outweigh the drawbacks. For LaTeX to be used efficiently in this role, however, some developments are necessary:

- automated validation of (collections of) TeX files
- efficient administration of auxiliary files

It might be useful to consider packaging LaTeX files into one (opaque) file, which could be rendered using a "TeX-Reader". This could tremendously increase the acceptance of LaTeX outside the mathematics community.
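
As an illustration of what automated validation of a collection of TeX files might involve, the following is a minimal sketch in Python, not an existing tool. It scans a main TeX file for a few common file-referencing commands and reports those whose targets are not present next to it. The function names (referenced_files, missing_files), the list of commands, and the list of candidate file extensions are all assumptions made for this example; a real validator would also have to follow nested \input files and check packages installed system-wide.

```python
import re
from pathlib import Path

# Commands to look for; an assumed, deliberately small selection.
# "includegraphics" is listed before "include" so the longer name is tried first.
_REF = re.compile(
    r"\\(includegraphics|include|input|bibliography)"
    r"\s*(?:\[[^\]]*\])?\s*\{([^}]*)\}"
)

# Extensions tried when a reference omits one (assumed defaults).
_EXTENSIONS = {
    "input": [".tex"],
    "include": [".tex"],
    "bibliography": [".bib"],
    "includegraphics": [".pdf", ".png", ".jpg", ".eps"],
}


def strip_comments(source):
    """Drop everything from an unescaped % to the end of each line."""
    return re.sub(r"(?<!\\)%.*", "", source)


def referenced_files(source):
    """Return (command, name) pairs for every file reference in the source."""
    refs = []
    for command, argument in _REF.findall(strip_comments(source)):
        # \bibliography accepts a comma-separated list of names.
        for name in argument.split(","):
            name = name.strip()
            if name:
                refs.append((command, name))
    return refs


def missing_files(main_tex):
    """List referenced names that cannot be found relative to main_tex."""
    main_tex = Path(main_tex)
    base = main_tex.parent
    source = main_tex.read_text(encoding="utf-8", errors="replace")
    missing = []
    for command, name in referenced_files(source):
        candidates = [base / name]
        candidates += [base / (name + ext) for ext in _EXTENSIONS[command]]
        if not any(c.is_file() for c in candidates):
            missing.append(name)
    return missing
```

Calling missing_files("thesis/main.tex") (a hypothetical path) returns the referenced names for which no file was found; an empty list means every reference covered by the sketch resolves. A check of this kind only addresses the missing-file problem; it says nothing about whether the document compiles or which files in the directory are unnecessary.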