The final report from the CODA-WEBB project consists of three parts:
- Strategy for archiving a website
- A literature review of tools for web crawling
- Review of "Open repositories" for publishing, storing and providing access to digital documents
The first part of the report is intended as a handbook for authorities and other organizations in the initial phase of web archiving. It concludes that several areas need to be investigated, including aims and requirements, selection, users and access. Part 1 also discusses the frequency and timing of web collection, as well as methods, tools and technical questions.
In part 2, three web crawlers were evaluated and compared: HTTrack Website Copier, Heritrix and PageNest. The aim of the survey was to choose an appropriate tool for the Test platform project at the LDP Centre. Heritrix was selected for this purpose.
Part 3 consists of an evaluation of Open repository tools, with the aim of choosing the best tool for the Test platform. Five tools were evaluated through a literature review: EPrints, DSpace, Fedora, Greenstone and DAITSS. The study suggested Fedora as the most suitable for the Test platform.