REPRESENTATIONS OF WALES IN FICTION OF THE ROMANTIC PERIOD:
DIGITISATION PROJECT
Andrew Davies and Anthony Mandal
[To view our sample electronic
text, Bleddyn (see below), please
click this underlined
link. Visitors can also access downloadable
versions of both the sample text and this report from our
Project Downloads
section.]
[Portions of this
project were developed in co-operation with Belser Wissenschaftlicher
Dienst, who kindly supplied permission to reproduce materials
published in the Corvey Microfiche Edition.]
Recent advances in Information Technology have
afforded faster and more convenient ways of searching texts,
catalogues and bibliographies than have been previously available. For
some time, Cardiff University’s Centre
for Editorial and Intertextual Research (CEIR) has explored
the feasibility of producing digitised texts that are searchable
by computer. A report was presented in September
1997 by Anthony
Mandal which examined possible avenues for the digitisation
of literary materials available at Cardiff.
In February 1998, a scheme was agreed upon by
members of the Board of CEIR to begin a pilot project, in
collaboration with Andrew
Davies, to convert five novels with a Welsh context from
the Romantic period into fully searchable electronic texts. This
would enable those working on the project to develop an appropriate
methodology for the conversion process, as well as to explore
the logistics of such a procedure.
The first phase of this scheme
was completed in August 1998, resulting in the production
of a CD-ROM sampler, which was demonstrated at the ‘Scenes
of Writing, 1750-1850’ conference in July 1998. The
project generated praise and enthusiasm, with many members
of both the academic and publishing communities expressing
interest in either purchasing such material or collaborating
on future works. The final aim of this project
is to combine the texts with other relevant material
(e.g. biographies, bibliographies, background, and criticism)
to form a multimedia resource on CD-ROM.
Developing an electronic resource
in a similar fashion to the CD-ROM sampler would serve three
important purposes:
i) academic: creating editable texts, which could be searched
for key words/phrases;
ii) practical: for use in the process of editing text, either
for training or for production of final texts;
iii) to complement facsimile reproductions in codex form, possibly
as a fully edited version of the source text.
This could be supported by additional
apparatus, such as biographical and historical details, bibliographies
of both primary and critical texts, secondary criticism, glossaries,
notes, etc. Additionally, the electronic text could
serve as the basis of future reset editions.
THE DIGITAL TEXTS
The file format currently employed
for this scheme is Adobe’s proprietary Acrobat
Portable Document Format (PDF), which works along the
same principles as the hypertext found on the World Wide Web. Sections,
sentences, even words, can be ‘hotlinked’ to other similar
segments of the text/s through a simple mouse-click. The
advantages of the PDF format include its self-contained nature
(all the typography, images, and searchable information are
included in one file), its flexibility, and its consistent,
user-friendly approach to document management. Additionally,
because the software required to browse PDF documents (the
Acrobat Reader) is freely distributable, the scheme avoids the
considerable cost of developing a proprietary interface.
PDF offers great potential
in the field of academic research. Among its
many features, the electronic text enables a variety of searches,
including:
i) different uses of certain keywords could be studied
(e.g. ‘sentimental’ in a particular text)
ii) words can be queried in proximity to one another
(e.g. in a gothic novel, the use of ‘castle’ and
‘horrid’ within a certain number of pages of each other)
iii) word-stems can be interrogated (e.g. ‘terrify’ would bring
up details of ‘terrible’, ‘terrifying’, ‘terrified’, etc.)
iv) Boolean searches for certain word combinations using logical
operators (e.g. ‘horrid’ AND ‘spectre’,
‘sentimentalism’ OR ‘sensibility’, ‘villany’
NOT ‘villainy’)
v) searching across a range of texts (both primary and secondary)
for particular words, terms, or phrases of interest
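The search types listed above can be illustrated in a few lines of Python. This is only a minimal sketch over plain text, not a description of how Acrobat's own search engine works. The sample sentence and the function names are invented for illustration, proximity is measured in words rather than pages, and the stem search is a crude prefix match rather than true stemming.

```python
import re

def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def keyword_positions(tokens, word):
    """Return the token indexes at which `word` occurs."""
    return [i for i, tok in enumerate(tokens) if tok == word.lower()]

def near(tokens, a, b, window):
    """True if `a` and `b` occur within `window` tokens of each other."""
    pos_a = keyword_positions(tokens, a)
    pos_b = keyword_positions(tokens, b)
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

def stem_matches(tokens, prefix):
    """Crude stem search: every distinct token beginning with `prefix`."""
    return sorted({tok for tok in tokens if tok.startswith(prefix.lower())})

def has_all(tokens, *words):
    """Boolean AND: every word is present."""
    vocab = set(tokens)
    return all(w.lower() in vocab for w in words)

def has_any(tokens, *words):
    """Boolean OR: at least one word is present."""
    vocab = set(tokens)
    return any(w.lower() in vocab for w in words)

def has_but_not(tokens, wanted, unwanted):
    """Boolean NOT: `wanted` is present and `unwanted` is absent."""
    vocab = set(tokens)
    return wanted.lower() in vocab and unwanted.lower() not in vocab

# Invented sample sentence (an assumption, not taken from any of the texts).
SAMPLE = ("The horrid spectre haunted the castle. She was terrified "
          "by the terrible villany of the count.")
TOKENS = tokenize(SAMPLE)

if __name__ == "__main__":
    print(keyword_positions(TOKENS, "castle"))
    print(near(TOKENS, "castle", "horrid", window=5))
    print(stem_matches(TOKENS, "terri"))
    print(has_all(TOKENS, "horrid", "spectre"))
    print(has_but_not(TOKENS, "villany", "villainy"))
```

Searching across a range of texts would amount to running the same functions over each text in turn and collecting the results.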
PDF files and the Acrobat Reader
which accompanies these texts therefore offer significant
potential for development, which would complement
and expand the already recognised importance of the facsimile
edition.
METHODOLOGY
I. Preparation of Materials
i) Xeroxing: the
book pages are photocopied, either as two-page or four-page
openings per A4 sheet
ii) Scanning:
the xeroxed sheets are scanned at optimum resolution into
document management software to create PC images
iii) Optimisation:
the images are then cropped, sharpened, cleaned
to ensure the highest quality for conversion to text
II. Conversion of Image to Editable Text
i) Optical
Character Recognition: the images are
then passed through an OCR engine, in order to convert the
graphical images into actual text, in Word 97 format. The
success ratio is approximately 75%, but can obviously vary
depending on the quality of the source image: the software
also allows ‘training’, so that idiosyncrasies of the text
(e.g. the elongated ‘s’ of pre-nineteenth-century texts) can
be anticipated. Initial (i.e. very basic) proofing
is done at this stage to correct glaring anomalies in the
conversion process
ii) Formatting
for Proofreading: regular styles are applied
to the different textual elements (chapter headings, body
text, verses, etc.). Minimal standardisation is
employed, based on established house rules: at the moment,
policy is to adhere as closely as possible to the conventions of
the source text, while ensuring visual consistency. Hard-copies
are then printed from this regularised text
III. Editing and Corrections
i) Collation:
the hard-copy is carefully compared to the source text for
errors and differences
ii) Silent
correction: slipped type and accidentals in
the copy-text are then marked on the hard-copy for alteration
iii) Editorial
intervention: decisions are made on any cases for editorial
intervention (e.g. whether to standardise, how to regularise
names, which rules of predominance apply)
iv) Application
of changes: corrections and alterations recorded
on hard-copy are then transcribed onto the electronic text
and the changes noted in appropriate forms
IV. Creation of Electronic Text
i) Optimise
for online viewing: the text is formatted and
structured for viewing on a monitor, and converted into a
PDF file. Initially, we used Word 97 for this process,
but are now moving over to a full desktop publishing package,
and plan to experiment with both Adobe PageMaker
6.5 and Corel Ventura 8.0,
in order to find the most flexible and appropriate medium
ii) Optimise
for printing: a second PDF file is prepared
especially for printing, with a proportionately smaller font
size, and a page layout of two ‘book pages’ per A4 sheet. These
could then be used as ‘work-texts’ for study, teaching, training,
consultation, mark-up, etc. [At this stage of the pilot
project, this option is not currently available]
V. Addition of Supplementary Materials
i) User
guide: an online guide (also printed manual?)
would be available to instruct the end-user how the Acrobat
software works
ii) Introduction:
an introduction to the series and the text [At this
stage of the pilot project, this option is not currently available]
iii) Biographical
information [At this stage of the pilot project,
this option is not currently available]
iv) Bibliographical
information: both primary and secondary [At
this stage of the pilot project, this option is not currently
available]
v) Critical
apparatus, etc. [At this stage of the pilot project,
this option is not currently available]
VI. Mastering the Final Package
i) Prepare user interface:
a console to install and navigate through the package will
be constructed using the latest version of the industry standard
Macromedia Authorware
ii) Print
Hard-Copy: a final hard-copy of both the online and
printed versions of the electronic texts and apparatus would
be made and once again proofed for errors
iii) Record
Image onto CD-ROM: the data files for the electronic
text as well as the Reader software would then be mastered
onto disk, for both archiving and duplication. High
quality versions of any images from the texts are also supplied
in a universal file format (uncompressed/compressed TIFF)
for printing, etc.
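The OCR and collation stages above both come down to comparing a machine-produced text with a corrected one. The following sketch shows one possible way to put a figure on that comparison with Python's standard `difflib` module. How the success ratio quoted above was actually measured is not stated, so this similarity ratio is an assumption and will not necessarily match that figure. The sample strings are invented, and the long-s step applies only if the OCR output preserves the character ‘ſ’ instead of misreading it as ‘f’.

```python
import difflib

def normalise_long_s(text):
    """Replace the long s (U+017F) with a modern 's'."""
    return text.replace("\u017f", "s")

def character_accuracy(ocr_text, corrected_text):
    """Similarity ratio in [0, 1] between OCR output and the proofed text."""
    matcher = difflib.SequenceMatcher(None, ocr_text, corrected_text,
                                      autojunk=False)
    return matcher.ratio()

def word_differences(ocr_text, corrected_text):
    """List (ocr_words, corrected_words) pairs where the two texts diverge."""
    a, b = ocr_text.split(), corrected_text.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [(" ".join(a[i1:i2]), " ".join(b[j1:j2]))
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

# Invented example: long s misread as 'f' by the OCR engine.
OCR_OUTPUT = "The caftle was filent and the fpectre rofe"
CORRECTED = "The castle was silent and the spectre rose"

if __name__ == "__main__":
    print(f"accuracy: {character_accuracy(OCR_OUTPUT, CORRECTED):.0%}")
    for wrong, right in word_differences(OCR_OUTPUT, CORRECTED):
        print(f"{wrong!r} -> {right!r}")
```

A list of differences of this kind could serve as a checklist during collation, although it would not replace comparison against the source text itself.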
This screenshot demonstrates how various
navigation structures, in addition to basic back/forward
movements, can be used. Bookmarks
(left) allow users to move through significant sections
of texts (e.g. chapter, tales, event, etc.). Thumbnails
(right) display a reduced graphical facsimile of pages,
enabling the viewer to move through sections based on
their visual appearance (e.g. illustrations, changes from
prose to poetry, etc.).
ADOBE ACROBAT
Adobe’s Portable Document
Format (PDF) is an established file system for
the storage and presentation of electronic texts in a consistent
and flexible way. It operates on principles similar
to those of the HyperText Markup Language (HTML) used for
web pages on the Internet, in that it allows hyperlinking,
searching, and bookmarking. PDF, however, also
goes well beyond the limits of HTML, in that it preserves
the richness of a printed document (fonts, graphics, and layout
are duplicated in PDF exactly as they were prepared in the
original application) and allows for sophisticated querying
and navigation around the electronic text in a way that is
currently impossible (or at least highly difficult to sustain)
in HTML. Moreover, any and all additional
elements (fonts, images, colour, sounds, and
movies) are embedded directly into a single file, allowing
for complete and compact portability.
In order to browse a PDF file
it is necessary to install the Acrobat
Reader: this is a freely distributable application,
able to run on a multitude of operating systems, including
Windows (3.1, 9x, NT), MacOS, Unix, Solaris, and many others. This
again substantiates the claim of PDF to be as universal as
HTML is via the World Wide Web: indeed, many web sites
employ PDF either as a supporting format to HTML or as a primary
vehicle for the dissemination of information. PDF
is also being strongly championed by a number of people
as the document format of choice for the new ‘e-books’
which will be appearing over the next few years.
The analogy to a printed document
is apposite, as the process of actually creating a PDF document
involves printing the
document to a PostScript file (PostScript being an established and
sophisticated programming language used to control high-quality printers).
The PostScript file is then distilled
through the Acrobat software into an online and fully
searchable electronic text. Various options can be selected
to set the quality of image reproduction, and typefaces can
be fully or partially embedded in the document depending on
the requirements of the author.
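The distilling step can be approximated today with the free Ghostscript interpreter, which also converts PostScript to PDF. The sketch below is a substitute technique, not the Acrobat software used in this project. It assumes a `gs` executable is installed, and the file names are hypothetical. It builds a command line that selects an image-quality preset and switches font embedding on or off, corresponding loosely to the options described above.

```python
import subprocess

# Ghostscript's standard quality presets for its pdfwrite device.
PRESETS = {"screen", "ebook", "printer", "prepress", "default"}

def build_distill_command(ps_path, pdf_path, quality="screen",
                          embed_fonts=True):
    """Return a Ghostscript command list that converts PostScript to PDF."""
    if quality not in PRESETS:
        raise ValueError(f"unknown quality preset: {quality}")
    return [
        "gs",
        "-dBATCH",                      # exit when the job is finished
        "-dNOPAUSE",                    # do not prompt between pages
        "-sDEVICE=pdfwrite",            # write PDF output
        f"-dPDFSETTINGS=/{quality}",    # image-quality preset
        f"-dEmbedAllFonts={'true' if embed_fonts else 'false'}",
        f"-sOutputFile={pdf_path}",
        ps_path,
    ]

def distill(ps_path, pdf_path, **options):
    """Run the conversion; requires Ghostscript to be installed."""
    subprocess.run(build_distill_command(ps_path, pdf_path, **options),
                   check=True)

if __name__ == "__main__":
    # Hypothetical file names: one PDF for the screen, one for printing.
    print(" ".join(build_distill_command("bleddyn.ps", "bleddyn-online.pdf")))
    print(" ".join(build_distill_command("bleddyn.ps", "bleddyn-print.pdf",
                                         quality="printer")))
```

Using two presets in this way mirrors the distinction drawn in the methodology between a file optimised for online viewing and one optimised for printing.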
As well as allowing users to view
illustrations at up to 1600%
magnification (with Acrobat 4.0) without losing detail,
Acrobat provides a Search Query that enables
sophisticated searches to be made across not only single
texts but whole ranges of collections, from a single dialogue
box.
SELECTING THE TEXTS
Our policy in selecting the initial texts for our
pilot project was to choose rare or unique texts, which displayed
significant depictions of Wales and Welshness in the context
of the Romantic era. Each initial text would display
some sort of variation in genre and style, and would also
represent different authors from the period. Bearing
in mind that this was a tentative scheme to explore the potential
and feasibility of such a project, and that there was a great
deal of further material worthy of exploration, the five texts
which met the above criteria were:
Emily Clark, Ianthé, or the Flower of Caernarvon (London:
For the Author, 1798): essentially a sentimental novel set
against the rustic backdrop of West Wales, with the typical
plot motif of the virtuous heroine kidnapped by the bigamous
seducer to a remote Scottish castle [source based on a
private copy held by CEIR]
Anon., Welsh Legends: A Collection of Popular Oral Tales (London:
J. Badcock, 1802): five illustrated stories, in both prose
and verse, covering Welsh folk tales and including such diverse
figures as demon lovers, patriotic bards, and betrayed Moorish
soldiers [source held in the Salisbury Library, Cardiff
University]
Evan Jones, The Bard; or, the Towers of Morven. A Legendary
Tale (London: R. Dutton, 1809): a pseudo-Gothic story
set in the mystic past of North Wales, replete with revealed
identities, kidnapped heroines, and assassins [permission
to copy from the microfiche kindly supplied by the original
publishers, Belser Wissenschaftlicher Dienst]
Olivia More, The Welsh Cottage (Wellington, Salop: F. Houlston
& Son, 1820): a domestic, moral tale, which features Wales
as a rustic and unspoiled place of retreat and contemplation
[source held in the Salisbury Library, Cardiff University]
W. S. Wickenden, Bleddyn; a Welch National Tale, Being the
First of a Series (London: For the Author, 1821): an example
of the post-Scott regional-historical tale, set in a Wales
torn between Royalists and Parliamentarians during the seventeenth
century [permission to copy from the microfiche kindly
supplied by the original publishers, Belser Wissenschaftlicher
Dienst]
Each text typically took 35 hours of labour
(for about 200-250 pages of original text)
to complete in a basic form, including microfiching, conversion,
and standardisation. Additional factors such as
formatting, mastering onto CD-ROM, preparing the user guide,
and so on, increased this figure.
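The labour figures above translate into a rough throughput. The short calculation below is only a back-of-envelope sketch: it reads the page count as a range of 200 to 250 pages per text, takes the 35 hours as covering the basic form only, and ignores the additional factors just mentioned.

```python
HOURS_PER_TEXT = 35            # basic form only, as stated above
PAGES_LOW, PAGES_HIGH = 200, 250   # assumed reading of the page range
TEXTS = 5                      # the five texts of the pilot project

def pages_per_hour(pages, hours=HOURS_PER_TEXT):
    """Average number of source pages processed per hour of labour."""
    return pages / hours

def total_hours(texts=TEXTS, hours=HOURS_PER_TEXT):
    """Labour needed to bring `texts` works to a basic form."""
    return texts * hours

if __name__ == "__main__":
    low = pages_per_hour(PAGES_LOW)
    high = pages_per_hour(PAGES_HIGH)
    print(f"{low:.1f} to {high:.1f} pages per hour")
    print(f"{total_hours()} hours for {TEXTS} texts in basic form")
```

On these assumptions the pilot texts required roughly 175 hours between them, at about six to seven pages per hour.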
THE SECOND PHASE
With the first phase complete, and the basic
five texts digitised and only slightly standardised, the second
phase of the project offers a number of opportunities for
refining and improving upon the techniques already developed. The
project has shown the potential of converting a large quantity
of texts and preparing them for sophisticated searches and
analysis: as well as being strong analytical tools, the digitised
texts form an easily accessible (and printable) corpus of
literary works. An obvious example of this would
be the digitisation of whole runs of early periodicals, such
as the Gentleman’s Magazine, Quarterly Review,
Edinburgh Review, and so forth. Users could
then search for particular authors’ names, generic keywords
(e.g. ‘gothic’, ‘sentimental’, etc.), reviewers’ phrases,
or publishers’ concerns. Of course, this would
represent a massive undertaking requiring months, if not years,
of committed labour; however, it is a programme we are
seriously considering. One more tangible result
of this project has been the decision to establish a longer-term
project within Cardiff University’s Centre
for Editorial and Intertextual Research, which would involve
creating a large digital corpus of literary (and non-literary)
texts which have been selected from significant editions of
quality, and edited rigorously by members of the Centre.
A second aspect of development
would be the inclusion of secondary apparatus on equal
terms with the original (perhaps edited) text. Material
such as biographical information, bibliographies, expansive
annotations, articles, chronologies, and contextual information
would make a CD-ROM containing a selection of literary works
not merely an occasional research tool but a fully featured
academic package. Again, this kind of involvement
requires a concentrated amount of labour, both physical
and mental, and we hope to undertake such a scheme as resources
allow.
One definite result of this programme
has been the proof that the digitisation (for archival and/or
analytical purposes) of rare and significant works is both
feasible and attainable with minimal effort (in academic terms). The
potential that electronic texts offer is only just being realised,
but that something can be done relatively easily does not
mean that it will be done well. Such schemes, as
with anything else, require careful planning and preparation:
a clear and consistent policy needs to be laid down at the
outset, and it must be followed rigorously. Without
a thesis, a programme like this simply becomes an end in itself,
and such an approach negates its usefulness in academic terms. The
terminal point must always remain in sight, and one needs
to exercise caution about what can be achieved and about the remit
of such projects. Ultimately, easy and instantaneous
access to information can never replace the difficult task
faced by all scholars: the acquisition of knowledge itself.