
A logical file format is a specific, coherent structure for organising data in a file so that software can read, process, present (render) and store the information.
To fulfil the task of preserving digital documents for the future, we have to extend our knowledge about which file formats and format versions are at risk of becoming unreadable, and which measures to take for each format.
It is also necessary to know which routines to follow and how to develop these routines, as well as what competence is needed to solve the problems. We also need to know which tools exist and how we can acquire, develop and improve them for our specific needs.
The general criteria were developed as a knowledge base for selecting a format that maximises the lifetime of a digital file and ensures that the information stored in it can be read and understood. Each logical file format should be given a general rating based on the criteria, irrespective of the file's content. Besides the general criteria, one must also look at the information to be stored.
This information can be divided into separate information categories:
First and foremost, the information to be saved needs to be evaluated: in what quality it should be preserved and what its significant properties are. This varies between the categories and must be determined for every single archive.
In general, first decide what quality to store in and which significant properties need to be preserved, and then choose an appropriate file format. Finding a proper archiving format for your organisation thus means weighing quality and significant properties against the general criteria.
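The weighing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the criterion names follow the groups discussed in this text, while the 0 to 2 scale, the equal weighting and the names `FormatAssessment` and `choose_format` are hypothetical and not taken from the CODA-FORM report.

```python
from dataclasses import dataclass

# Criterion groups named after the sections of this text. The 0-2 scale and
# the equal weighting are assumptions made only for this illustration.
CRITERIA = (
    "logical_structure",
    "technical_mechanisms",
    "external_dependency",
    "usage",
    "patents",
    "transparency",
    "self_contained_metadata",
)


@dataclass
class FormatAssessment:
    name: str
    scores: dict          # criterion -> 0 (poor), 1 (acceptable), 2 (good)
    preserves: frozenset  # significant properties the format can carry

    def general_rating(self) -> float:
        """Average score over all criteria, independent of file content."""
        return sum(self.scores[c] for c in CRITERIA) / len(CRITERIA)


def choose_format(candidates, required_properties):
    """Keep the formats that preserve every required property, then pick
    the one with the highest general rating (None if no format qualifies)."""
    required = frozenset(required_properties)
    suitable = [c for c in candidates if required <= c.preserves]
    return max(suitable, key=FormatAssessment.general_rating, default=None)
```

In this sketch the significant properties act as a hard requirement and the general rating only ranks the formats that meet it. An archive could just as well decide to accept the loss of a property in exchange for a higher rating; that decision has to be made per archive.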
During 2007 the LDP Centre and its parts worked together in a project called CODA-FORM. One of its missions was to define criteria for archive file formats. The final report "CODA 2007" describes the whole project and its results.
The following parts present the general criteria for logical file formats. The aim of CODA-FORM was to assess a format's suitability for archiving.
To really understand and interpret a logical format, one needs to comprehend how it is built up and structured and how the format stores the bitstream. Without this knowledge of the format structure there is a great risk that the information cannot be rendered. All that is left in the future is a combination of 1s and 0s in an unreadable stream.
That is why criteria about logical structure are important for the possibility of building an interpreter that renders the information in a file.
This group includes:
The two latter criteria can be seen as a measure of the correctness of the preceding ones.
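To illustrate why knowledge of the logical structure matters, the following Python sketch interprets the first bytes of a bitstream using a published layout. PNG is used only as an example of an openly documented format; it is chosen for this illustration and is not named in the CODA-FORM material. The helper names `build_ihdr` and `read_ihdr` are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def build_ihdr(width, height):
    """Build the first 33 bytes of a PNG file (signature plus IHDR chunk)."""
    # Bit depth 8, colour type 2 (truecolour), compression 0, filter 0,
    # interlace 0. All multi-byte integers are big-endian.
    data = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    crc = zlib.crc32(b"IHDR" + data)
    return (PNG_SIGNATURE + struct.pack(">I", len(data)) + b"IHDR"
            + data + struct.pack(">I", crc))


def read_ihdr(stream):
    """Interpret the leading bytes of a PNG bitstream using the known layout."""
    if stream[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG bitstream")
    # Each chunk starts with a 4-byte length and a 4-byte type.
    length, chunk_type = struct.unpack(">I4s", stream[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("unexpected first chunk")
    data = stream[16:16 + length]
    (stored_crc,) = struct.unpack(">I", stream[16 + length:20 + length])
    # The checksum covers the chunk type and the chunk data.
    if zlib.crc32(chunk_type + data) != stored_crc:
        raise ValueError("checksum mismatch")
    width, height, bit_depth, colour_type = struct.unpack(">IIBB", data[:10])
    return {"width": width, "height": height,
            "bit_depth": bit_depth, "colour_type": colour_type}
```

Every offset, field width and byte order in `read_ihdr` comes from the format's specification. Without that documentation the same 33 bytes are just an uninterpreted sequence of 1s and 0s.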
This criterion looks at technical mechanisms that affect the inner structure of the format, such as encryption. Formats that are bound to physical media or encrypted can obstruct the preservation of information in an archive. To simplify preservation, a file format should be free from the following aspects:
This criterion concerns the degree of external dependency (on hardware or software) of a file format. Being bound to specific hardware is especially problematic, but operating systems and rendering software can also create problems in a future technical environment. Because of this, the following two criteria are important.
This criterion measures whether the format is widespread and how and where it is used. A format that has been in use for a long period and is widespread can be regarded as well tested and is likely to have a longer life expectancy. That is why it is wise to choose a format that fulfils some of these criteria:
Patents can affect support for the digital format in an archive. Existing patents can restrict future development of open source tools, and the cost of software applications for future conversions can be large. The problem is not the patent itself but the stipulations it brings. To avoid this problem, check the format against the following criteria:
The term transparency means the possibility of rendering the file's content with common tools. Another benefit is if any attached metadata can easily be analysed. For non-textual information, a standard or simple representation is preferable to an optimised representation. When you choose a file format, check the following criteria to assess the format's transparency.
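One rough way to get a first impression of transparency is to check how much of a file can be read as plain text with the simplest of tools. The Python sketch below is a heuristic under stated assumptions: it only recognises ASCII text, so legitimate text in other encodings scores lower, and the 0.95 threshold and the function names are hypothetical choices for this illustration.

```python
import string

# Byte values of the printable ASCII characters, including whitespace.
PRINTABLE = frozenset(string.printable.encode("ascii"))


def printable_ratio(data: bytes) -> float:
    """Share of bytes that are printable ASCII (0.0 for empty input)."""
    if not data:
        return 0.0
    return sum(b in PRINTABLE for b in data) / len(data)


def looks_transparent(data: bytes, threshold: float = 0.95) -> bool:
    """Heuristic only: True if nearly all bytes are readable as ASCII text."""
    return printable_ratio(data) >= threshold
```

A marked-up text fragment such as `b"<title>Report</title>"` passes this check, while compressed or encrypted content typically does not. The result says nothing about whether the content is also understandable, so it can support but not replace an assessment against the criteria.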
Digital formats that can store metadata about themselves in a transparent form inside the file are preferable for archiving. Such files are easier to manage over time and less sensitive to information loss than files whose metadata is stored outside the file format. For this reason it is an advantage if a file format supports these two criteria: