Using XML for long-term preservation






One of the objectives of the project is to explore the possibility of using XML as a format for long-term preservation. For this reason, we evaluated the practical use of XML in different parts of the system before deciding on the design. An XML schema, the DiVA XML schema, has been developed to describe the inter-relationships among the various data elements and processes, and to support long-term preservation of the actual documents. XML Schema provides a means of defining the structure, content and semantics of XML documents. It is an XML-based alternative to the XML Document Type Definition (DTD).

Because one of the primary reasons for using XML was to support long-term preservation, the most popular document DTDs, DocBook and TEI, were reviewed. Both were found to have limitations regarding metadata descriptions, so the decision was made to develop a new structure for DiVA using XML Schema. This schema combines the DocBook DTD for the textual parts of the document with an internal schema for all metadata (bibliographic and administrative data).

Several applications that implement the DiVA XML schema for content management and communication between applications were developed. Some of their purposes are essential for long-term preservation:

- Make persistent National Bibliographic Numbers (NBN) available to the URN resolver at the Royal Library in Stockholm.
- Send MARC21 records in MARC-XML to the National Library.
- Create file archives for long-term preservation, checksum them, archive them in the DiVA archive and send them to the Royal Library.

Currently the file archives for long-term preservation contain the original full-text file, in various formats, and the DiVA XML file, which contains all the metadata about the document. The DiVA XML file also contains every part of the full-text file that can be converted into XML. In the future it might be possible to convert the entire full text into XML, so that the file archives could contain only DiVA XML files.
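
The archiving step described above (package the full-text file together with its DiVA XML file, then checksum the result) can be illustrated with a minimal sketch. The text does not say which archive format or checksum algorithm DiVA uses, so ZIP and SHA-256 are assumptions made for illustration only, and the function names build_archive, sha256_of and verify_archive are hypothetical.

```python
import hashlib
import zipfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_archive(archive_path: Path, members: list[Path]) -> str:
    """Pack the given files into a ZIP archive and return its checksum.

    Assumption: a ZIP container; the actual DiVA archive format is not
    stated in the text.
    """
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for member in members:
            # Store each file under its bare name, without directory parts.
            zf.write(member, arcname=member.name)
    return sha256_of(archive_path)


def verify_archive(archive_path: Path, expected: str) -> bool:
    """Check that an archive still matches the checksum recorded earlier."""
    return sha256_of(archive_path) == expected
```

In such a setup the checksum would be recorded when the archive is created and checked again after each transfer or at regular intervals, so that silent corruption of a stored archive can be detected.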