March 2005 Edition

More than meets the eye:
some aspects of a digitisation project

What follows is a short version of a paper given as an introduction to the Digital Image, Digital Text colloquium at the Dublin Institute for Advanced Studies, 4 December 2004.

Irish Script on Screen (ISOS) is a digitisation project of the School of Celtic Studies, Dublin Institute for Advanced Studies. Its purpose is to create high-quality digital images of selected manuscripts in the Irish language, page by page, cover to cover, and to make these images available, free of charge, to the scholarly community, via a dedicated website. As a working project we have been in existence since 1998, and, during that period, ISOS, like many other projects, has expanded and contracted, depending on the economic conditions of the time.

The rationale behind Irish Script on Screen is that it represents a logical extension of the statutory remit of the School of Celtic Studies to investigate and publish manuscript material in the Irish language (Dublin Institute for Advanced Studies Act 1940). The project is funded from the annual budgetary allocation of the School, within the federation that is the Dublin Institute for Advanced Studies. After the initial capital outlay when the project was launched back in 1998, the main expenses have been current ones. Generous grants from the Heritage Council have enabled us to purchase additional equipment, and to re-organise the website.

 


ISOS works on a partnership basis: necessarily so, as the School of Celtic Studies does not own any of the manuscripts it digitises. We form contractual partnerships with the various holding libraries, and these contracts set out clearly the respective functions and responsibilities of the ISOS project and of the library in question, so that from the start we have a thorough understanding of each other's roles. Our partners to date have been the Library, Royal Irish Academy; the National Library of Ireland; the James Hardiman Library at NUI Galway; the Library, Mount Melleray Abbey, Co. Waterford; the Library, Trinity College Dublin; and the OFM-UCD Partnership that manages the collection of Franciscan manuscripts at the Archives Department in University College Dublin, and where ISOS is currently working. In the early stages of the project, a partnership with the School of Computing at Dublin City University was entered into with regard to provision of technical expertise; this partnership continues at the level of shared research.

There are three distinct parts to our work: the digitisation itself, the processing of the digital material, and the delivery of the results of that work to the users. We now have in excess of 22,000 individual images on display on the ISOS website, and as each image is displayed in two formats, not including the thumbnail images, the total number of images available is over 44,000. The traffic on this site is considerable for a specialist website. Since May 2004, for example, we have registered a total of over 1.3 million hits, with 0.5 million requests for pages. We have a system of registration, explained below, for the use of our higher-grade images, and the figure for registered users stands at close to 300.

There will always, of course, be a healthy demand for us to deliver more than we can do, and that is probably an index to the use being made of the site, and to the interest in it. We try to be as accommodating as possible in dealing with feedback and requests from ISOS-users; in very exceptional cases we have supplied TIFF files on CD to facilitate private scholarly research. We have also been receptive to special requests by holding libraries to digitise specific manuscripts that lay outside the strict remit of the project; the Stowe Missal in the Royal Irish Academy and the Giraldus Cambrensis manuscript in the National Library of Ireland are cases in point.

Ideally we would like very much to develop a parallel text and manuscript interface, and that is an ambition that I hope will be realised in the future. So also is the idea of a section of the site that could be devoted to secondary or even primary schools, and that would illustrate matters of manuscript creation and writing, the history of the book and so forth, but, again, that is for the future, and, as with parallel text, could be catered for when the work of digitisation has come to an end. For the moment, then, we are concentrating all our efforts on image capture, processing and display.

 



Keeping up with technology is an obvious and important component of a project such as this. As an instance of the speed at which things advance, I may cite our use of 1 and 2 GB JAZ disks to transport files from our digitising stations when we began this work. JAZ disks were ideal for this purpose at the time but, in the space of a few years, they have become obsolete, for our purposes at any rate: we now use pocket-drives with a capacity of up to 90 GB for the transfers that we made with 2 GB JAZ disks six years ago.

On average, ISOS, now reduced to a single camera, and scanning at a resolution of 600 dpi, generates between 30 and 50 scans per day, averaging 100MB per scan. These scans are processed to the extent that headers and footers are added, the header identifying the holding library, the footer containing the copyright information; a rule is also added for scale. These files, which are in TIFF format, are what we store in our archive. The TIFFs are condensed to low and high-grade JPEGs to facilitate access on the website: the low-grade JPEG averages between 200 and 400KB in size, and is accessible to everyone on the ISOS website; this format is a 20% reduction of the full-size, high-grade JPEG (created when the TIFF is converted to a JPEG), which can be around 5MB, and which is also available on the site. The reduced JPEG that is available to all is generally perfectly adequate, even for scholarly use, but for those who require access to the full JPEG, we ask that they fill out a registration form, post it to us, and then we supply them with a username and password. Registration information is included in the preliminary material on the ISOS website.
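The figures just quoted can be turned into a rough estimate of daily output. What follows is only an illustrative sketch, not part of the ISOS workflow: it assumes decimal units (1 GB = 1000 MB), treats the quoted averages as exact, and all variable and function names are hypothetical.

```python
# Figures quoted in the text; the averages are approximate.
SCANS_PER_DAY = (30, 50)   # low and high estimates of scans per day
TIFF_MB = 100              # average size of one archived TIFF
HIGH_JPEG_MB = 5           # "around 5MB" for the full-size JPEG


def daily_volume_gb(scans, mb_per_scan=TIFF_MB):
    """Return the TIFF volume of one day's scanning, in decimal GB."""
    return scans * mb_per_scan / 1000


low, high = (daily_volume_gb(n) for n in SCANS_PER_DAY)
tiff_to_jpeg = TIFF_MB / HIGH_JPEG_MB

print(f"TIFF output per day: {low:.0f} to {high:.0f} GB")
print(f"TIFF to full-size JPEG: roughly {tiff_to_jpeg:.0f}:1")
```

On these assumptions a day's scanning adds between 3 and 5 GB to the archive, and the full-size JPEG is about one twentieth of the size of the TIFF from which it is made.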

The TIFFs are now archived in two ways by ISOS. At the time of its inception, tape back-up was the only reasonable storage system around, and ISOS has used tapes to store the processed, uncondensed scans since that time. Nowadays these tapes have a stated capacity of 40GB, uncompressed, and a projected life-expectancy of 30 years. They are very suitable for the deep archive of the type of digital material that we are generating. There are, however, some disadvantages to the use of tapes, particularly if one wishes to retrieve material from them quickly; they are not meant for practical daily use: they are archival depositories. Another consideration is that although the projected life of the tapes is 30 years, that is merely a projection: as this is a recent technology, no-one can say for sure whether the projection will hold or, even if it does, whether the technology for reading the tapes (the tape-drive) will last that long. For that reason, therefore, anyone holding these items should have a programme in place whereby the tapes can be checked every few years, to ensure that they are still readable.
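One common way of carrying out such periodic checks is to record a checksum for every file when it is archived and to recompute the checksums whenever the files are read back. The sketch below is an assumption about how this could be done, not a description of ISOS practice: it uses SHA-256 from the Python standard library, supposes that the tape contents have been restored to an ordinary folder, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file under folder (by relative path) to its checksum."""
    folder = Path(folder)
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def verify(folder, manifest):
    """Return the names of files that are missing or no longer match."""
    folder = Path(folder)
    problems = []
    for name, expected in manifest.items():
        target = folder / name
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(name)
    return problems
```

A manifest built at archiving time with build_manifest would be kept separately from the tapes; at each check the restored files are passed to verify, and an empty list indicates that every file listed in the manifest was read back unchanged.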

The tapes are meant for archival use, and daily file-handling and file-inspection are not practical options. We might have improved our situation with the purchase of an automatic tape-loader, such as is currently available, and which can retrieve files from tape automatically, but, as we would still be dealing with tape, we decided instead to pursue another option. The receipt of a generous grant from the Heritage Council meant that in 2003 ISOS was in a position to purchase a disk-array consisting of two computers containing 12 drives each, each drive with a capacity of 200 GB: as disk-capacity is increasing yearly, it is not unreasonable to expect that some day the ISOS archive will be stored on just a couple of drives. As a result of this, we are now able to store the TIFF files on this disk-array, as well as archiving the material on tape. This means that files can be viewed or managed in much the same way as one views or manages one's own conventional files on a PC, and that migration of material from one computer to another is much easier. The benefits of this are obvious. At present we are also storing the working version of the website itself on the disk-array; that is the version that we use to test the system when we add new material, before letting it go live. A vital aspect of this pre-delivery stage is the scrutinizing of each image, as part of the quality-control policy of the project. The size of the website currently stands at 81GB.
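To give a sense of scale, the capacities mentioned here can be set against the size of the archive. The following back-of-envelope sketch rests on several assumptions that are not in the text: decimal units, one 100MB TIFF for each of the 22,000 images on the website, and a raw array capacity that ignores any redundancy or file-system overhead. The variable names are hypothetical.

```python
import math

COMPUTERS = 2
DRIVES_PER_COMPUTER = 12
DRIVE_GB = 200
TAPE_GB = 40       # stated uncompressed capacity of one tape
TIFF_MB = 100      # average size of one archived scan
IMAGES = 22_000    # "in excess of 22,000" images (assumed one TIFF each)

# Raw capacity of the disk-array, before redundancy or overhead.
raw_array_gb = COMPUTERS * DRIVES_PER_COMPUTER * DRIVE_GB

# Approximate size of the TIFF archive, in decimal GB.
archive_gb = IMAGES * TIFF_MB / 1000

# Whole tapes needed to hold the same archive.
tapes_needed = math.ceil(archive_gb / TAPE_GB)

# Fraction of the raw array capacity taken up by the archive.
share_of_array = archive_gb / raw_array_gb

print(f"Raw array capacity: {raw_array_gb} GB")
print(f"Estimated TIFF archive: {archive_gb:.0f} GB")
print(f"Tapes needed at {TAPE_GB} GB each: {tapes_needed}")
print(f"Share of raw array used: {share_of_array:.0%}")
```

On these assumptions the archive comes to something over 2 TB, which would occupy rather less than half of the raw capacity of the array but would require some 55 tapes.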

It is fair to say, then, that at this point in the ISOS project, the question of storage and of archive management has become as vital as the continued generation of digital content. One might even say that it is perhaps the single most important issue, and, as technology changes, we must ensure that we will be in a position to change with it, to maintain our deep archive and to migrate it to accommodate new technologies when the time arrives. For projects starting out, or at the planning stage, the question of storage and management must be a priority. Digitisation is a long-term commitment, and for projects that are in progress, concerns about management and storage may, in time, even supersede the elementary question of finance: if a project ceases due to lack of funding, the question of the archive still remains.

Technical support is a vital component in any project such as this. In a larger institution, one supposes, such back-up might be taken for granted. But in our case, we have come to regard technical support as a necessarily integral part of the project, complementing the digitisation, and fairly crucial to the long-term issue of archive management.

Another aspect of a digitisation programme that is web-delivered is the question of accessibility. Many will be familiar with the World Wide Web Consortium's guidelines on this subject <w3.org>, and the question is a huge one. Navigability and legibility are two of the issues that pertain to our own project. While all our pages are navigable by keyboard up to the point where the thumbnails come into play, and while the large JPEGs are still legible at 200% magnification, it is not something we should be complacent about; and certainly when we reach the stage of including parallel text, this will be an issue that will be at the top of the agenda.

I have outlined the elementary and fairly obvious issues that underlie this project, and that should be considered in advance by anyone contemplating something similar. I have laid emphasis on these, not because content and presentation are in some way secondary issues, which of course they are not, but because for a project to succeed, and to have a healthy prognosis, project infrastructure must receive at least as much attention as project delivery, and this is especially true as the project grows and gathers momentum.

In conclusion, I believe that even from a small project such as ours there are a number of things that may be learned, particularly, perhaps, by those who might be contemplating similar projects in the future. That there should be great emphasis on planning goes without saying. And among the elements to be taken account of in that planning, I would give priority to questions of the project targets, quality-control, on-going and immediate technical back-up, long-term storage, and reasonable flexibility and understanding in dealing with feedback from the end-user.

Professor Pádraig Ó Macháin
School of Celtic Studies
Dublin Institute for Advanced Studies

 