requirements for users of the BHL-Europe website
composed and proposed by Francisco Welter-Schultes, UGOE, 19-05-2009
these guides or suggestions could be used to develop a preliminary website (Deliverable 5.1), to create an initial BHL-Europe web presentation that could be as useful as possible, before starting with WP 5.5 (user surveys, user testing, improving the preliminary web presentation).
in the course of programming www.animalbase.org I have become used to giving "approximately exact", programmer-compliant descriptions of user requirements. I have also learned that librarians often have no idea what scientists may be looking for in old literature, and much less how they actually try, and succeed, to find it.
I also know that such requirements are clear to me but not always to the programmer. please ask if something is unclear.
I only know zoologists' requirements, I cannot say much about botany, but botanists' requirements and search strategies should be more or less the same.
General requirements for BHL-Europe web content presentation
- 1 - the most important requirement: the design of the search page should be tested by the programmers themselves with the oldest available browsers, to make sure that all scientists can access the result. all website functions should be tested by all members of the consortium, with all their different browsers and systems, to avoid browser-specific features that would prevent some users from seeing some content. we must be aware that our pages are also requested by scientists in Albania and Mongolia, who may have access to the internet only once a week, for a few minutes, with a very old computer. these are also scientists, and we have a high responsibility for them too. for testing our system we should not hesitate to use equally old equipment. if programmers don't have a 12-year-old computer, ask me: I have one, and I'm still using it every day.
- 2 - this involves avoiding frames wherever this is possible.
- 3 - keep the default size of any delivered page as small as possible. many scientists, especially in tropical countries, have to pay a lot of money for the amount of data transferred with the requested pages, and it is our responsibility not to discourage people from reading literature. higher resolution than the eye can resolve should be delivered only after an extra click.
- 4 - try to find a website design that is independent of modern trends and will not look outdated after a while. researchers come to BHL over and over again, perhaps with large time gaps in between, and nobody is happy to be forced to spend time getting used to a new website design.
- 5 - website content should be found by the Google search engine. google listing should be tested.
URL requirements
the URL is an important tool for users, it is also one of the most important visible presentations of bhl-europe
- 1 a short URL main address. www.bhl.eu is for sale (blocked by a Dutch company, possibly the rights can be taken over by law)
www.bhl.org is still free. the abbreviation BHL has become commonly understood by the community
the URL www.biodiversitylibrary.org is much too long.
- Comment from Chris Freeland: bhl.org is not available. It is registered by Future Media Architects and that company does not sell domains, ever. They have had bhl.org registered longer than BHL has been in existence, so we have no legal ground for obtaining the domain. We ran into this same issue when BHL initially launched, as bhl.org was also our favored domain.
- Comment from Chris Freeland: I am also in agreement that biodiversitylibrary.org is too long. Again, this discussion was raised back when BHL launched and we decided for the sake of branding to promote biodiversitylibrary.org rather than a shortened URL. I would advocate keeping biodiversitylibrary.org and also using a shorter alias, such as www.biodivlib.org.
- Comment from Francisco Welter-Schultes: yes, www.biodivlib.org would also do it.
- 2 - URLs to literature titles must remain permanent. this requirement should have the highest priority in this section.
- 3 - the (total length of the) URL of a literature page must be as short as possible.
- 4 - the URL should end with the ID number (this is necessary because users are used to finding items by modifying ID numbers in URLs).
bad examples:
http://www.animalbase.uni-goettingen.de/zooweb/servlet/AnimalBase/home/speciestaxon?id=6778 - such a URL is too long and should be avoided if this is technically possible
http://www.fishbase.org/Summary/speciesSummary.php?ID=2420&genusname=Gasterosteus&speciesname=aculeatus+aculeatus - this is much too long, the ID is not at the end
good examples:
http://www.biodiversitylibrary.org/bibliography/824
http://www.biodiversitylibrary.org/item/16108
http://www.faunaeur.org/full_results.php?id=278059
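The URL requirements above can be turned into a simple automated check. The following is only a sketch: the limit of 70 characters is an assumption chosen for illustration (the requirement itself only says "as short as possible"), and the function names are hypothetical.

```python
import re
from typing import List, Optional

# Assumed limit for illustration only; the requirements give no number.
MAX_URL_LENGTH = 70


def trailing_id(url: str) -> Optional[int]:
    """Return the numeric ID at the very end of the URL, or None."""
    match = re.search(r"\d+$", url)
    return int(match.group()) if match else None


def check_url(url: str) -> List[str]:
    """List the URL requirements that this URL violates."""
    problems = []
    if len(url) > MAX_URL_LENGTH:
        problems.append("too long")
    if trailing_id(url) is None:
        problems.append("ID not at the end")
    return problems


def neighbour_url(url: str, step: int = 1) -> str:
    """Imitate a user who edits the trailing ID to reach a nearby item."""
    if trailing_id(url) is None:
        raise ValueError("URL does not end with an ID")
    return re.sub(r"\d+$", lambda m: str(int(m.group()) + step), url)
```

With these assumptions the three good examples pass both checks, the AnimalBase URL fails only on length, and the FishBase URL fails on both counts.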
search page = entry page
- 1 - the search boxes section must appear as high up as possible on the entry page. It is unacceptable to force the user to scroll down to reach the search functions.
- 2 - what needs to be searchable:
search function for all titles (monographs and journals)
definition: a monograph is a book (Buch), which has author(s), year and title. a journal (or periodical, Zeitschrift in German library catalogues) was published periodically, usually over several years and in several volumes, usually with contributions by several authors. these contributions are called articles. BHL uses the expression serial, which includes journals and some titles regarded here as monographs; the expression serial should not be used here. serial publications like Novitates Conchologicae and Systematisches Conchylien-Cabinet could be classified either as subsequently published monographs (as in German library catalogues, where monographs and journal titles appear in list pages), or as journals (as in AnimalBase, where monographs and journal articles appear in list pages), or as both. researchers are used to this unclear boundary between the two classes. for BHL-Europe such a case should perhaps be treated as both, i.e. as several subsequent monographs and as a journal (this practice is also common in German library catalogues). a journal article is treated like a monograph.
search functions should all be presented with a scroll box: Begins with (= default), Contains
- - ALL WORDS: all words (= freetext function lumping searchable strings from author(s), co-authors, year, title, place of publication, name of publisher)
- - AUTHOR: last name of author(s) (the first name should be neglected, it confuses more than it helps)
- - YEAR: year of publication (1877, 1876-1878)
- - TITLE: words or parts of words of title (caution: "Contains" function must be able to search each word separated by a space independently! the current BHL site allows only single search strings, without indicating that! query "journal sciencias" currently does not find Journal de Sciencias, but query "rnal de sc" does find it)
- - ID NUMBER: item number (identification number of the title)
- - JOURNAL: extra function for journal search (this function shall exclude monographs from search results. this is necessary to allow selectively reducing the number of hits. it is important to give easiest tools to reduce hit numbers)
- - alphabetical list pages of all digitized journals: A B C D E F ..., so that users can select parts of the journal list. users should also be able to obtain a complete journal list for a search query [Begins with] "journal of sc" or "evo"
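The intended behaviour of the "Begins with" and "Contains" options, and of the YEAR field, can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of the existing BHL search: the function names are hypothetical, and matching is simply done case-insensitively on the title string.

```python
from typing import Tuple


def matches(title: str, query: str, mode: str = "begins") -> bool:
    """Match a title against a query in "begins" (default) or "contains" mode."""
    t = title.casefold()
    q = query.casefold()
    if mode == "begins":
        return t.startswith(q)
    if mode == "contains":
        # Every space-separated word is searched independently,
        # so the words need not be adjacent in the title.
        return all(word in t for word in q.split())
    raise ValueError("mode must be 'begins' or 'contains'")


def parse_years(query: str) -> Tuple[int, int]:
    """Turn '1877' or '1876-1878' into an inclusive (first, last) year range."""
    parts = [p.strip() for p in query.split("-")]
    if len(parts) == 1:
        first = last = int(parts[0])
    elif len(parts) == 2:
        first, last = int(parts[0]), int(parts[1])
    else:
        raise ValueError("expected a year or a range such as 1876-1878")
    if first > last:
        raise ValueError("range must be ascending")
    return first, last
```

With this behaviour both "journal sciencias" and "rnal de sc" find Journal de Sciencias in "Contains" mode, which is what the TITLE requirement asks for.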
powers of the search functions:
- - multiple author synonyms (search for linne shall show results referring to author names Linné, Linne, Linnaeus, Linnæus, Linnaei, Linnæi, Linnei, search for linnaeus shall give the same results) - this problem can technically be solved either by the way German library catalogues have solved it, or (simpler) as in AnimalBase (master name with synonyms).
- - on the other hand, search functions must not be too tolerant (see also Don't 1)
- - a professional special character management
definition: a special character here means any character outside the set of 26 Latin letters (in German Sonderzeichen, é, ú, æ, ?, @, -, ö, +, #, ß etc.).
this includes:
- correct showing of special characters in any browser that supports UTF-8, in author names, titles and other text components (this includes names of authors in taxon finder functions)
- special characters must be searchable (if æ is searched then ae and æ must be found)
- special characters must be findable (if ae is searched then ae and æ must be found)
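The special-character and author-synonym requirements above could be met by comparing a folded form of both the query and the stored text. The sketch below is one possible approach, not the method of the German library catalogues or of AnimalBase: the `fold` function, the ligature table (only æ comes from this page, œ is an added assumption) and the choice of "Linnaeus" as master name are all illustrative.

```python
import unicodedata
from typing import Dict, List, Optional

# æ is taken from the requirement above; œ is an assumed addition.
LIGATURES = {"æ": "ae", "œ": "oe"}


def fold(text: str) -> str:
    """Reduce text to a plain comparison form: lower case, no diacritics."""
    text = text.casefold()  # also turns ß into ss
    for ligature, replacement in LIGATURES.items():
        text = text.replace(ligature, replacement)
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks that NFKD splits off (é -> e, ü -> u).
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Master name with synonyms; the master name chosen here is an assumption.
AUTHOR_SYNONYMS: Dict[str, List[str]] = {
    "Linnaeus": ["Linné", "Linne", "Linnæus", "Linnaei", "Linnæi", "Linnei"],
}


def build_author_index(table: Dict[str, List[str]]) -> Dict[str, str]:
    """Map the folded form of every spelling to its master name."""
    index = {}
    for master, variants in table.items():
        for name in [master, *variants]:
            index[fold(name)] = master
    return index


def master_name(query: str, index: Dict[str, str]) -> Optional[str]:
    """Return the master name for any known spelling, else None."""
    return index.get(fold(query))
```

A search for linne or linnaeus then resolves to the same master name, and the results for all its spellings can be returned together. A known limitation of this sketch is that German transcriptions such as "ue" for "ü" are not handled.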
Examples for search pages
BHL
www.biodiversitylibrary.org
Search for:
Archiv für Naturgeschichte
gives 4 results: Natural History Museum, London , New York Botanical Garden and MBLWHOI Library (2)
Search for:
Archiv f
gives 2 additional results for the same journal: American Museum of Natural History Library (2)
The reason is that the ü was badly decoded by that library, which gave as result:
Archiv furgeschichte. Abteilung A. .
Search for:
Archiv Naturgeschichte
gives 0 results
You have to know that the search function only works for words appearing directly one after another. Searching for French journals is always a problem if the title contains characters with diacritic marks.
Gallica:
http://gallica.bnf.fr/
Same search
Archiv für Naturgeschichte gives 159 results, obviously none matches. You have to scroll down and click a lot to be sure that your journal is really not there.
Search for:
Annales des Sciences Naturelles
gives 8149 results, the first 3-4 results seem to match well. This does also work if "des" is omitted.
The search function in Gallica is more powerful than in BHL, and it also seems to have fewer problems with diacritic marks, but it clearly gives too many results (see also below).
What would be the ideal search page?
Monographs:
A biodiversity researcher who needs to consult information in a monograph comes in with the following information: name of the author, year of publication, title of the monograph (often they also know the total number of pages, the place of publication and the name of the publisher), and the page or plate number on which the requested information is published.
Examples for citations of monographs in biodiversity related scientific contexts:
Brera, V. L. 1802. Lezioni medico-pratiche sopra i principali vernie del corpo umano vivente et le cosi dette malattie verminosa. - pp. 1-186. Crema. (Ronna).
Bloch, M. E. 1786 Naturgeschichte der ausländischen Fische. Berlin. v. 2: i-viii + 1-160, Pls. 145-180.
Erichson, W. F. 1837. Die Käfer der Mark Brandenburg. Erster Band. Erste Abtheilung. - pp. I-VIII [= 1-8], 1-740. Berlin. (Morin).
Fabricius, O. 1780. Favna Groenlandica, systematice sistens animalia Groenlandiae occidentalis hactenvs indagata, qvoad nomen specificvm, triviale, vernacvlvmque; synonyma avctorvm plvrimvm, descriptionem, locvm, victvm, generationem, mores, vsvm captvramque singvli, provt detegendi occasio fvit, maximaque parte secvndvm proprias observationes. - pp. I-XVI [= 1-16], 1-452, pl. [1]. Hafniae, Lipsiae. (Rothe).
Forsskål, P. 1775 Descriptiones animalium avium, amphibiorum, piscium, insectorum, vermium; quae in itinere orientali observavit... Post mortem auctoris edidit Carsten Niebuhr. Hauniae. Descriptiones animalium quae in itinere ad Maris Australis terras per annos 1772 1773 et 1774 suscepto, ...: 1-20 + i-xxxiv + 1-164, map.
Geoffroy, E. L. 1799. Histoire abrégée des insectes, dans laquelle ces animaux sont rangés suivant un ordre méthodique; Nouvelle édition, revue, corrigée, & augmentée d'un supplément considérable. Tome second. - pp. [1], 1-744, Pl. XI-XXII [= 11-22]. Paris. (Calixte-Volland, Rémont).
Westwood, 1845. Arcana entomologica, or illustrations of new, rare and interesting insects, 2: (1843-1845) [vi], 192pp, 48 pls [49-95 + "68"].
Examples for presentations of results:
BHL:
Searching for "Arcana entomologica" gives 1 result:
Arcana entomologica, or, Illustrations of new, rare, and interesting insects / by J.O. Westwood.
Publication info: London :William Smith,1845 [i.e. 1841-1845]
Contributed By: Smithsonian Institution Libraries
This presentation by BHL is fairly good and well arranged. Author, year and title show up, and usually this is sufficient. It is good to show the name of the institution which digitized the work, because the user can then estimate how good or bad the quality of the scan is likely to be. It would be desirable to add the year of contribution.
The given link leads to the bibliography page, which takes some time to load. From there the user has to click once again to get to the item page with the digitized content, and this page also takes time to load:
http://www.biodiversitylibrary.org/item/44314
In works which have only one single volume, like this one, it would be desirable to link directly to the item page.
On the item page the user is directed to the first scanned page, usually the front cover. This is uninteresting. A good researcher will verify the title page first, to see whether the name of the author, the year and the correct title appear there. So this again takes time to load, up to 20 seconds. The next step is to consult page 169 or whatever. So it would be desirable to set a deep link directly to the title page, so that the title page appears when the item page is first opened.
Very good: the stable item page URL always remains visible, regardless of which page you are currently viewing. At any time during your work in the book you can copy this URL.
Searching for "Histoire abrégée des insectes" gives 4 results, alphabetically arranged by first words of the title. This is acceptable, you can quickly find what you are looking for.
Histoire abrégée des insectes : dans laquelle ces animaux sont rangés suivant un ordre méthodique ... / par M. Geoffroy, docteur en médecine.
Publication info: A Paris :Chez Calixte-Volland, libraire, quai des Augustins, no. 24 :An VII de la République françoise [1799]
Contributed By: Smithsonian Institution Libraries
Also this is good. Here we have 2 volumes, so a link to the bibliography page is useful.
Gallica:
Searching for "Histoire abrégée des insectes" gives 2029 (!!) results, fortunately those with the word in the title rank first:
Histoire abrégée des insectes, dans laquelle ces animaux sont rangés suivant un ordre méthodique. Tome 2 / par M. Geoffroy,... - C. Volland (Paris) - 1799
Book. Image mode only
Extract :
Subject : Insectes -- Classification -- Ouvrages avant 1800
Table of contents : 0099169.htm TABLE ALPHABÉTIQUE DES noms françois des INSECTES, contenus dans le second Volume. Les noms en caracteres italiques sont ceux des espéces. L'ABEILLE, p. 385-419. Sup. p. 728-729. L'agathe, p. 124. L'albastre, p. 168. L'alchymiste, p. 149. Amaryllis, p. 52. L'amélie, p. 223. L'aminthe
Full bibliographic record Add to your collection
Get the document on
Gallica
This is far too much information, and not well arranged. It contains unnecessary information for those who search for a title (the Extract, Subject and Table of contents sections are absolutely superfluous here). The query returns too many results. If the title does not show up you are helpless, because you ask yourself: is the title not digitized, or will it show up on page 1834?
The frames are much too broad and leave much too little space for the really interesting information. If I get a set of results, the next thing I want to know is which one of the hits is the work I was looking for. Thumbnails are not needed here.
Very good: "Full bibliographic record" allows one click to see quickly more and really valuable information without the need to load a new page.
The link leads directly to the digitized work, in this case to the title page of the volume. This is useful.
http://gallica.bnf.fr/ark:/12148/bpt6k991697.r=Histoire+abr%C3%A9g%C3%A9e+des+insectes.langEN
The presentation of the actual scan in Gallica has always been the best one of all, it is slightly worse than in the previous version but still good. Page scrollbox (Browse by page/pagination) has always been very useful. As a scientist I only rarely consult the sections "thumbnails" and "table of contents". In most cases I know the page number.
Very good: there exists a permalink on this document, extremely useful:
http://gallica.bnf.fr/ark:/12148/bpt6k991697
Not good:
(1) the much too broad frame at the right side. the permalink and the other information could also be placed at the left margin to leave more space for the digitized page. "Full screen" mode has the disadvantage that the page number scrollbox disappears. The navigation within a digitized work was better in the previous version, where it was possible to enlarge the image without the page scrollbox disappearing.
(2) the permalink URL could also show up in the browser line; it is not necessary to add .r=Histoire+abr%C3%A9g%C3%A9e+des+insectes.langEN.
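Point (2) amounts to showing the address without the appended search context. As a small sketch, assuming that the search suffix always starts with `.r=` as in the two Gallica URLs quoted on this page (this is an observation from those examples, not documented Gallica behaviour), a hypothetical helper could reduce a browser-line URL to the part before the suffix:

```python
def strip_search_suffix(url: str) -> str:
    """Cut off everything from the first '.r=' onwards, if present."""
    head, _separator, _suffix = url.partition(".r=")
    return head
```

Applied to the document URL above, this yields exactly the permalink shown on the Gallica page. A URL without the suffix is returned unchanged.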
Journals:
A biodiversity researcher who needs to consult information in a journal, comes in with the following information:
Name of the journal (or an abbreviation), volume number, page number.
In most cases one single page is quickly consulted and then the job is done. Sometimes researchers like to download PDF files.
A biodiversity related database has in addition the desire to link to the journal by a permanent URL. This URL should be that of the list of all volumes of this journal, not to every volume independently.
Examples for citations of journal articles in scientific contexts:
Bloch, M. E. 1788 Beskrivelse over tvende nve Aborrer fra Indien. Kongelige Danske Selskab Skrivter, Nye Samlingafdet v. 3: 383-385.
Bloch, M. E. 1788. Beskrivelse over tvende nye Aborrer fra Indien. - Nye Samling af det Kongelige Danske Videnskabers Selskabs Skrivter 3: 383-385, pl. [1-2]. Kiøbenhavn.
Brown & Mielke, 1967 Lepidoptera of the Central Brazil Plateau. I. Preliminary list of Rhopalocera (continued)
J. Lep. Soc. 21 (3): 145-168
Da Silva Mengo, J. 1867. Descripção de um «Helix» novo de Portugal. - Jornal de Sciencias Mathematicas Physicas e Naturaes 1 ["1868"] : 170-171. Lisboa.
Examples for journal presentations in result pages:
BHL:
Anzeiger der kaiserlichen Akademie der Wissenschaften, mathematisch-naturwissenschaftliche Klasse
http://www.biodiversitylibrary.org/bibliography/6335 (Smithsonian Institution Libraries)
38
39
40
41
44
45
v.1 (1864)
v.2 (1865)
v.3 (1866)
This presentation is not bad. All digitized volumes are visible on a well-arranged page, and in the scroll box we see the volume numbers (with years and without years; those with years are more useful). We can then click on the journal volume we need, http://www.biodiversitylibrary.org/item/28679, and then scroll to the page we need to consult. The presentation of the scan itself is too slow: the JPG2000 format is bad and takes a long time to load.
Archiv für Naturgeschichte:
http://www.biodiversitylibrary.org/bibliography/2371 (Natural History Museum, London)
(no volume description)
(no volume description)
(no volume description)
(no volume description)
(no volume description)
This is the worst standard of all. Without any information on volume number and year you have to click on every single item, then go to the title page and wait a long time until it finally loads. If you finally conclude that your volume was not digitized, you will have spent more than 30 minutes clicking and waiting.
Gallica:
Annales des Sciences Naturelles
http://gallica.bnf.fr/ark:/12148/cb343504237/date.r=Annales+des+Sciences+Naturelles.langEN
1 1824 (T1).
Periodical issue. Image and text mode. Full text search available
Add to my documents
Display plain text
Get the document sur Gallica
Not very well arranged: 5 lines instead of 1 per volume. Too much information forces the user to scroll down to see the other volumes and to play with two (!) frames. Frames are fixed and not flexible. It is not necessary to occupy more than one single line per volume/year, as shown in the BHL example.
Very bad: this page with the important overview of the volumes seems to have no permanent URL.
The volume link leads to the digitized content page, with page number scrollbox at the margin, or metadata section in the Table of contents section. The presentation is the same as for monographs. The volume page has also a permanent link.
Biologiezentrum Linz (LANDOE):
Annalen des kaiserlich-königlichen Naturhistorischen Hofmuseums
http://www.biologiezentrum.at/biophp/de/annalen.php
6: (1891): Annalen des K.K. Naturhistorischen Hofmuseums 6
One line per volume, volume number and year, name of the journal (which may have changed slightly after 20 years), good presentation. The link leads to this page:
http://www.biologiezentrum.at/biophp/de/band_det.php?litnr=26258
Very good: short URL with ID at the end, and the next volume has the same ID + 1.
Gives a table of contents. Only PDFs for download per article, lack of a page browser is of course a shortcoming.
Metadata: names of authors, titles of articles and page numbers of articles, this is a high standard.
Uni-Bibliothek Bielefeld:
Der Naturforscher
http://www.ub.uni-bielefeld.de/diglib/aufkl/naturforscher/naturforscher.htm
23.St., 1788
Two frames, on the left the volume numbers, on the right the digitized title page of the first volume. One line per volume, volume number and year. The frame is flexible. A good presentation. The idea to show the title page of the first volume and to stay in the same window with a flexible frame is not bad.
Also good: no headline occupies any space.
Clicking on the volume link shows in the left frame the titles, authors and pages of the articles.
No enlargement is possible, but the pages are bitonal and load very quickly (1 second). One of the most efficient presentations of digitized works.
Shortcomings:
- No direct page navigation. You must navigate by arrows + or - 5 or 10 pages.
- Also plates are bitonal scans, and of course in unacceptable quality.
Uni-Bibliothek Göttingen:
Beyträge zur Naturgeschichte/Beiträge zur Geschichte der Amphibien
http://resolver.sub.uni-goettingen.de/purl?PPN572173725
Beyträge zur Naturgeschichte . . . Band Heft 2
One line per volume, obviously arranged by ID and not by volume number or year, the year is not given.
Clicking on a volume leads to a page with metadata, here with some chapters (in other journals articles with names of authors and first page links).
Clicking on one of the pages finally gets one into the digitized document, only here with the page number scrollbox.
The pURL is shown in the headline.
The digitized page can be enlarged (zoom in, zoom out), pages load relatively quickly (5-10 seconds).
This is generally not a very good presentation, obviously some more experience is needed for compiling the metadata. It is not necessary to click 3 times (if you know exactly where to click) to come to the digitized contents.
Plate volumes can show the plate numbers in the scrollbox:
http://resolver.sub.uni-goettingen.de/purl?PPN600750280
What would be the ideal presentation?
Selecting the best details of all.
- Short and permanent URL which should show up in the browser line, like in BHL, Bielefeld and Linz.
- Journal entry page link should lead to a bibliography page showing all volumes with years and volume numbers, and current name of the journal, one line per volume number, like in Linz.
- Volume number entry page should lead directly to the digitized title page, like in Gallica and Bielefeld.
- Page scrollbox for navigation is most useful if placed at the left margin in a frame, occupying a relatively long space downwards like in Gallica (not as short as in BHL).
- Page scrollboxes should show pages and plates, like in Gallica and Göttingen.
- Frames should be flexible, like in Bielefeld.
- Zoom in and zoom out should be possible and relatively quick in the same frame, like in Göttingen.
- Digitized pages should be presented in TIFF or JPG (not in JPG2000) and load quickly, like in Bielefeld, Gallica or Göttingen.
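The "one line per volume" listing with volume number, year and current journal name, as in the Linz example, could be generated along the following lines. This is a sketch only: the `Volume` record and its field names are hypothetical, and volumes without a recorded year are simply placed at the end and marked as such.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Volume:
    number: int
    year: Optional[int]   # None if the year was never recorded
    journal_name: str     # name of the journal as used for this volume


def volume_lines(volumes: List[Volume]) -> List[str]:
    """Return one line per volume, in chronological order."""
    ordered = sorted(
        volumes,
        # Volumes without a year sort after all dated volumes.
        key=lambda v: (v.year is None, v.year or 0, v.number),
    )
    lines = []
    for v in ordered:
        year = str(v.year) if v.year is not None else "year unknown"
        lines.append(f"{v.number}: ({year}): {v.journal_name} {v.number}")
    return lines
```

For a volume 6 of 1891 this produces the same line as the Linz page: "6: (1891): Annalen des K.K. Naturhistorischen Hofmuseums 6".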
digitized title homepage
definition: a homepage in the sense as used here is a page that has one single ID, in our case the page where the digitized title is shown (item in BHL language)
- 1 - entry page of a digitized monograph should be the digitized-monograph homepage http://www.biodiversitylibrary.org/item/16108, not the bibliography page http://www.biodiversitylibrary.org/bibliography/1265. on the bibliography page almost nothing useful is written, this can totally be skipped and the contents should be shown on the digitized-monograph homepage.
- 2 - page views of contents should currently not be presented in JPG2000. old browsers are not able to read JPG2000. see also general requirements No. 3. also, the procedure of delivering an image in quadrants is not very useful. I have the impression that BHL digitized pages load much more slowly than, for example, the "cheap" JPG images shown at the GDZ web presentation at Göttingen (example), or by DFG viewer http://dfg-viewer.de. I think BHL has not found a good solution to show the digitized content.
- 3 - the "first" digitized-monograph page should not show the cover but immediately the title page of the work (this should be the default for everyone who comes from anywhere else to access the item: those who arrive to see this digitized work should not see the outside cover of the book, they should see the title page). this is necessary because the scientist is not at all interested in what the book cover looks like (it is absolutely no problem if the book cover is not digitized at all), but needs to verify that it is the correct book, and only after having verified this continues to consult its content. the title page is the page on which the title is recorded. many books have pretitles or other printed pages bound before the title page; these are not meant here and should be "skipped".
- 4 - the search result for a journal should lead to some kind of useful entry page for the journal, where the volume numbers and/or possibly the articles show up, in any case with years and in chronological order! BHL has left us a big mess in this respect: many journals were uploaded and are online without any metadata at all, so that the user needs to search the title pages of 20 or more volumes to see which volume is from which year. I anticipate that we need manual tools to help us enter the metadata for this kind of undocumented journal volume. this also concerns the identification of title pages. in the arbitrarily selected work http://www.biodiversitylibrary.org/item/16108, which I have taken here as an example, no pagination was documented at all. we will need tools to insert it. perhaps the goobi system may be able to help here too.
- 5 - on the digitized item homepage the citation should show up somewhere in the very upper section of the page
- 6 - in addition, the ID of the delivered title should be recorded somewhere on the page (if possible somewhere in the upper section of the page without a need to scroll down)
- 7 - the size (memory space, 342 KB or something) of the delivered page should be recorded somewhere on the page (a need to scroll down for this is no problem. this is important for those scientists who have to pay money for delivered memory space)
- 8 - if links are given to partner projects etc: include a statistical tool to record how often these links are clicked on, and if they are too rarely clicked, delete the links from pages where they are not necessary. see also don't number 5
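The statistical tool mentioned in point 8 could be as simple as counting how often each link is shown and how often it is clicked. The sketch below is an illustration only: the class name, the link labels and both thresholds (a minimum of 100 views and a click rate below 1%) are assumptions, not values taken from these requirements.

```python
from collections import Counter
from typing import List


class LinkStats:
    """Count page views and clicks per outgoing link."""

    def __init__(self) -> None:
        self.views: Counter = Counter()
        self.clicks: Counter = Counter()

    def record_view(self, link: str, count: int = 1) -> None:
        self.views[link] += count

    def record_click(self, link: str, count: int = 1) -> None:
        self.clicks[link] += count

    def rarely_clicked(self, min_rate: float = 0.01, min_views: int = 100) -> List[str]:
        """Links shown often enough to judge, but clicked too rarely."""
        return sorted(
            link
            for link, views in self.views.items()
            if views >= min_views and self.clicks[link] / views < min_rate
        )
```

Links returned by `rarely_clicked` are candidates for removal from the pages where they are not necessary. Links with too few views are left alone until enough data has accumulated.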
minimum requirements for metadata standards for digitized works
- 1 - the title page shall be recorded. this is the page which shall open by default when the digitized work is accessed by a user.
- 2 - page numbers and also plate numbers shall be recorded.
- 3 - alphabetical indexes which are often at the end of volumes should be visible in the page scroll box, as well as the fact that some pages contain lists of contents.
- 4 - to save time and money, chapter titles should not be documented too thoroughly. when taxonomists consult old literature they usually know exactly on which page they have to verify what they need, or they look for the page in the index. usually they look for a location where a certain name is mentioned; scientists don't read a scientific book as if it were a novel. the page scroll box is much more important than any documentation of chapter titles.
- 5 - for journals: series numbers, volume numbers, issue (Heft) numbers and years shall be recorded, as in: Journal for Studies (3) 5 (4), 1886.
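The minimum metadata listed above can be pictured as a small record per digitized work. The following is a sketch under assumptions: the class and field names are hypothetical, the page kinds ("cover", "pretitle", "title", "plate", "index", "contents") are an illustrative vocabulary, and the citation format follows the example in point 5, read as series in parentheses, then volume, then issue in parentheses, then year.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScanPage:
    label: str          # printed page or plate number, "" if unnumbered
    kind: str = "text"  # e.g. "text", "plate", "title", "index", "cover"


@dataclass
class DigitizedWork:
    pages: List[ScanPage]

    def default_page_index(self) -> int:
        """Open on the recorded title page; fall back to the first scan."""
        for i, page in enumerate(self.pages):
            if page.kind == "title":
                return i
        return 0

    def scrollbox_labels(self) -> List[str]:
        """Labels for the page scroll box, marking indexes, title page etc."""
        labels = []
        for page in self.pages:
            if page.kind in ("text", "plate"):
                labels.append(page.label)
            else:
                labels.append(f"{page.label} [{page.kind}]".strip())
        return labels


def journal_citation(journal: str, volume: int, year: int,
                     series: Optional[int] = None,
                     issue: Optional[int] = None) -> str:
    """Format series, volume, issue and year as in point 5."""
    parts = [journal]
    if series is not None:
        parts.append(f"({series})")
    parts.append(str(volume))
    if issue is not None:
        parts.append(f"({issue})")
    return " ".join(parts) + f", {year}"
```

With such a record, the viewer can skip the cover and any pretitles by default, and the scroll box can show page numbers, plate numbers and the location of indexes without any chapter-level documentation.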
Don'ts
- 1 - no default automatic correction of what the user has searched for (worst example: google). scientists usually know very well what they are looking for, and frequently they select rare and uncommon spellings of words contained in titles or specific names, because they know that they will find their title quickly among the few expected results
- so in the BHL search function, search results for folk must not contain volk. if the user likes to have such a function, an additional click allowing to enable this feature must be necessary. it must not be the default function.
- 2 - no frames subdividing the webpage contents! it should be possible to create a modern website in one single frame.
- 3 - no advertising, no commercials.
- 4 - no blinking/moving image components, no popups, everything must be silent and static.
- 5 - not every page needs the same links to things users are never interested in.
- 6 - no unnecessary changes of website design, general appearance and locations of functions and links.
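Don't number 1 can be illustrated with a search that is strict by default and becomes tolerant only when the user explicitly asks for it. This is a minimal sketch, not the BHL implementation: the function name, the two example titles and the similarity cutoff of 0.7 are assumptions, and the tolerant mode here uses Python's standard `difflib` purely as a stand-in for whatever fuzzy matching might be chosen.

```python
import difflib
from typing import List


def search(query: str, titles: List[str], tolerant: bool = False,
           cutoff: float = 0.7) -> List[str]:
    """Find titles containing the query word; fuzzy matching is opt-in."""
    q = query.casefold()
    hits = []
    for title in titles:
        words = title.casefold().split()
        if q in words:
            hits.append(title)  # exact word match, always allowed
        elif tolerant and difflib.get_close_matches(q, words, n=1, cutoff=cutoff):
            hits.append(title)  # similar word, only on explicit request
    return hits


# Hypothetical titles for illustration.
TITLES = ["Folk tales of the sea", "Das Volk der Bienen"]
strict_hits = search("folk", TITLES)
tolerant_hits = search("folk", TITLES, tolerant=True)
```

In the default mode a search for folk does not return the title containing volk; that title appears only after the user has enabled the tolerant mode with an extra click.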