Design and Implementation of a "Bitstream Preservation Planning Tool"
Job description: The University, Public Library, and Archive of Trier are working on the digital reconstruction of the mediaeval library of the Benedictine abbey of St. Matthias. For this purpose, manuscripts (about 150,000 pages) that are spread all over the world have to be digitized and stored. The images are combined with descriptive metadata so that scholars can research them effectively. The Karlsruhe Institute of Technology cooperates with partners in Trier and Darmstadt and has built a distributed storage system to ensure data integrity. The EU-funded DARIAH project aims to build a sustainable research infrastructure for the arts and humanities. As a basic service offered by DARIAH, the “Bit Preservation Service” provides long-term, reliable storage for humanities research data. This service is composed of several exchangeable components to enhance the sustainability of the software.

Scope of work:
Bitstream preservation, the physical preservation of data objects, is, alongside long-term interpretability and readability, of crucial importance for the long-term storage of research data. Checksums, replication, and erasure codes are examples of measures that protect data integrity. However, comprehensive measures of this kind reduce performance while storing data or increase the demand for disk space. Nevertheless, data such as the digitized images of the manuscripts are too valuable to risk losing, and in some cases cannot be reproduced at all.
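
The disk space trade-off mentioned above can be illustrated with a minimal sketch. It assumes an ideal (MDS) erasure code such as Reed-Solomon, where any k of the k+m fragments suffice to rebuild an object; the function names and the example figures (3 copies, 10 data plus 4 parity fragments) are hypothetical and not taken from the DARIAH service.

```python
def replication_overhead(copies: int) -> float:
    """Raw storage needed per byte of payload when keeping `copies` full replicas.

    Such a scheme survives the loss of copies - 1 replicas.
    """
    if copies < 1:
        raise ValueError("at least one copy is required")
    return float(copies)


def erasure_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Raw storage needed per byte of payload for an ideal (MDS) erasure code.

    Assumption: any `data_fragments` of the stored fragments can rebuild the
    object, so the scheme survives the loss of `parity_fragments` fragments.
    """
    if data_fragments < 1 or parity_fragments < 0:
        raise ValueError("need data_fragments >= 1 and parity_fragments >= 0")
    return (data_fragments + parity_fragments) / data_fragments


if __name__ == "__main__":
    # Illustrative configurations only (hypothetical values).
    print("3 replicas:         ", replication_overhead(3))
    print("10+4 erasure coding:", erasure_overhead(10, 4))
```

Under these assumptions, erasure coding tolerates more lost fragments for less raw disk space than replication, but pays for this with encoding and reconstruction work, which is the performance cost referred to above.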

The objective of this work is to implement a “Bitstream Preservation Planning Tool” for the scholars of the DARIAH project. This tool, preferably implemented as a web application, lets users determine the reliability of a chosen configuration on the basis of adjustable parameters such as amount of data, time span, or number of replicas. The output should consist of the probability of data loss as a numerical value, together with various graphics that help to classify and assess the probability determined. The main task is therefore to identify and evaluate different bitstream preservation strategies, using the data from the mediaeval manuscripts, in order to develop a calculation basis for the application.
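
To make the intended kind of calculation concrete, the following is a minimal sketch of one possible calculation basis, not the model to be developed in this work. It rests on strong simplifying assumptions: replicas fail independently, each replica is lost within one audit period with a fixed probability, and all damaged replicas are repaired at the end of every period. All names and numbers are hypothetical.

```python
def loss_probability(p_replica_fail: float, replicas: int, periods: int) -> float:
    """Probability that one object is lost within `periods` audit periods.

    Assumptions (hypothetical model): replicas fail independently with
    probability `p_replica_fail` per period, and the object is lost only if
    all replicas fail within the same period, before repair takes place.
    """
    if not 0.0 <= p_replica_fail <= 1.0:
        raise ValueError("p_replica_fail must lie in [0, 1]")
    if replicas < 1 or periods < 0:
        raise ValueError("need replicas >= 1 and periods >= 0")
    p_all_fail = p_replica_fail ** replicas      # all replicas fail in one period
    return 1.0 - (1.0 - p_all_fail) ** periods   # at least one such period occurs


def any_object_lost(p_object_loss: float, n_objects: int) -> float:
    """Probability that at least one of `n_objects` independent objects is lost."""
    if not 0.0 <= p_object_loss <= 1.0 or n_objects < 0:
        raise ValueError("need p_object_loss in [0, 1] and n_objects >= 0")
    return 1.0 - (1.0 - p_object_loss) ** n_objects


if __name__ == "__main__":
    # Hypothetical inputs: 1% replica failure per year, a 10-year time span,
    # and roughly 150,000 page images treated as independent objects.
    for replicas in (1, 2, 3, 4):
        p_obj = loss_probability(0.01, replicas, periods=10)
        p_any = any_object_lost(p_obj, 150_000)
        print(f"{replicas} replicas: per object {p_obj:.3e}, any object {p_any:.3e}")
```

A real planning tool would have to replace these assumptions with failure rates, repair times, and correlations derived from the evaluation of the preservation strategies; the sketch only shows how adjustable parameters such as number of replicas, time, and amount of data could feed into a single numerical loss probability.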
