Problem areas
Typical problems with ingest:
Collecting websites involves some typical problems that are connected to how the website is built. At ingest, this can mean that parts of the website are missing or do not function as intended.
- Objects that no link points to. Every object must have at least one link leading to it in order to be collected.
- Search functionality that relies on text typed in by the user.
- Calendars, since they are "endless".
- Dynamic objects, e.g. maps with built-in functionality such as zooming.
- Other functionality built with e.g. JavaScript and AJAX that depends on communication with a server.
- Content outside www.domain.xx, e.g. material published on social media, is not collected during a standard collection.
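The first and last points in the list can be illustrated with a minimal sketch, assuming a crawler that only follows links and stays on one host. The link table, the URLs other than www.domain.xx, and the function names are hypothetical and only serve as an example; real collection tools are considerably more elaborate.

```python
from collections import deque
from urllib.parse import urlsplit


def in_scope(url, host="www.domain.xx"):
    """Return True if the URL is on the host being collected."""
    return urlsplit(url).hostname == host


def reachable(start, links, host="www.domain.xx"):
    """Follow in-scope links breadth-first, as a simple crawler might."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if in_scope(target, host) and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: each page maps to the links found on it.
links = {
    "https://www.domain.xx/": [
        "https://www.domain.xx/about",
        "https://social.example/org",  # out of scope, not followed
    ],
    "https://www.domain.xx/about": ["https://www.domain.xx/"],
}

# Everything that actually exists on the server (assumed known here).
objects = {
    "https://www.domain.xx/",
    "https://www.domain.xx/about",
    "https://www.domain.xx/report.pdf",  # no page links to this file
}

collected = reachable("https://www.domain.xx/", links)
missed = objects - collected
print(sorted(missed))
```

In this example the file report.pdf is missed because no page links to it, and the social media page is never visited because it lies outside www.domain.xx.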
Points to consider when preserving:
- Choose file formats suitable for long-term preservation
- All files to be preserved require software that can open them throughout the whole preservation period. It is a good idea to stick to as few file formats as possible, and to formats that are as easy as possible to preserve.
- Proprietary formats such as .doc created in Word, .xls from Excel and .ppt from PowerPoint are not suitable for preservation, since their format specifications are not openly available. If the producer ends support for these formats, tools to handle them cannot be developed in the future, because that requires open specifications. Thus, if documents and records are to be preserved in the long run, we recommend conversion to PDF/A-1.
- Compressed files can be risky to preserve, since specific tools and algorithms are required to open them.
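As a rough illustration of the points above, the following sketch sorts file names into groups by their extension before preservation. The list of compressed extensions and the function name are assumptions made for the example. An extension is only a hint about the actual format, so this is a first screening and not a format identification.

```python
from pathlib import PurePosixPath

# Formats named in the text above as unsuitable for preservation.
PROPRIETARY = {".doc", ".xls", ".ppt"}

# Assumption: a few common compressed formats, chosen for illustration.
COMPRESSED = {".zip", ".gz", ".rar", ".7z"}


def classify(filename):
    """Group a file by its extension: proprietary, compressed, or other."""
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix in PROPRIETARY:
        return "proprietary"  # candidate for conversion to PDF/A-1
    if suffix in COMPRESSED:
        return "compressed"   # needs specific tools to open
    return "other"            # review separately


for name in ["Report.DOC", "archive.tar.gz", "page.html"]:
    print(name, classify(name))
```

Files in the group "other" are not automatically safe to preserve; they simply fall outside the two risk groups named in the list.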