BHLStaff092009
BHL Staff Call
24 September 2009
10 EST
Becky M., Joe D., Diana D., Don W., Diane R., Matthew B., Diana S., Kai, Doug H., Keri T., Jane S., Marco, Bianca, Suzanne, Matt P., Mike L., Kevin N., Danianne, Tom G., John M.
1. Status of the BHL Staff calls. Format and responsibilities.
The wiki will have a table set up so that institutions can pick the call/meeting that they will lead. There will be a place to assign a note taker/recorder. See StaffConferenceCallAgendas. Let it be known that if a month is not picked up by someone to be the point person, it will be assigned. Suzanne assigns without fear.
2. Status of CDL ingest analysis. Findings and reports to Institutional Council
Please see Ingest Analysis Summary. Ingest is the term used for incorporating biodiversity-related texts that are available in IA. A lot of work has gone into the methodology of selecting material. We have the set included with our own material in the beta site.
Time sensitivity: We want to move forward on this sooner rather than later for various reasons: funders are asking about our material, and the quantity is not as high as we had said it would be by now. Once the major first ingest is done, we will then move to a monthly load. The longer we wait, the larger the first ingest will be. We had hoped to turn around this test set soon, but we already have a lag and a lot more data has been added since this extract from IA. We will probably need to do an entire re-grab for the initial ingest. This takes about three weeks as it involves shipping hard drives from Woods Hole to St. Louis etc.
The database can handle the size of the dataset. We do not know if the increase in content will affect search and retrieval (for example, if we get a lot more user traffic).
If we plan to move to a monthly schedule, then we should be very clear on our acceptance criteria for this data. The profile should be set up and reviewed carefully. If we rush, we will be patching up poor-quality metadata, scanning, images and subject scope afterwards.
Image quality: we can’t fix things that are not ours, for example the missing page problems. It will be interesting to see from user feedback whether this becomes a frustration point.
Metadata that is wrong or poor can be fixed in the portal, by hand. We can always “bring down” problems that get reported.
Deal breakers? A missing page? We can’t find these before ingest and will need to rely on reports from users. If we don’t ingest, we will not have the title at all. Is this a deal breaker?
Institutional Council will make the final decisions. The pros, cons and related issues need to be written out for the IC by the next call, which Tom is in the midst of scheduling for the 8th or 15th of October. Everyone should use the wiki pages and the document that Bianca has drafted to add comments and discuss these issues and others.
We as a group can propose workarounds and workable options, e.g. BHL staff will fix metadata to match the book that was non-BHL acquired etc. These can be articulated in the document for the IC.
It needs to be clearly communicated to the users what they are retrieving through the BHL once the ingest records are there. A BHL public-facing wiki might solve the problem as a source of FAQ and documentation. It is clear that users do not go to documentation and read the details. Other ways of making information clear to them need to be worked into the user interface. People are discovering BHL material from different avenues, so the ‘front door’ cannot be our only way of helping users understand our data set. The webpage needs to be self-documenting. Though QA is one of our big concerns right now, we cannot predict that it will be the top concern for our users. We may not need to highlight many of our issues as prominently as it would appear to us today.
Relevancy of material is an issue already. This is a known problem and is being looked at by others. It will not be solved in a timely manner and should not affect the timing of ingest.
Citebank development: searching and browsing will address some of the issues that will need to be improved with the inclusion of the IA material. Conversation at the St. Louis technical meeting implied that the next phase of BHL will have a different look and functionality using the Drupal Biblio module.
None of these things are going to show up in our deduplication tools. We all have issues with our current model for dedupping. Kai discussed the need to potentially rework these tools to have an internal format that can help with the dedupping. This will especially be needed as more data will be coming with varying degrees of MARC, or without MARC but in other schemas.
The current monographic dedupper was not built to be as robust as we need it to be now. It cannot handle the ingesting of large quantities like the whole IA ingest set. The current workflow is not working even for the BHL partners' data alone. This whole process needs to be thought out and redone to manage our own scanning workflow and the ingesting of new data.
A complete discussion of dedupping needs to take place. It has large overall issues.
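As a starting point for that discussion, the idea of an internal format for dedupping could look something like the sketch below. This is only an illustration, not the existing dedupper: the InternalRecord fields, the match key (normalized title plus year) and the sample records are all assumptions made up for the example, and a real tool would need many more fields and fuzzier matching.

```python
from __future__ import annotations

import re
import unicodedata
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class InternalRecord:
    """Hypothetical schema-neutral record, filled from MARC or any other schema."""
    source: str      # e.g. "BHL" or "IA" (assumed labels)
    identifier: str  # identifier within the source
    title: str
    year: str


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", ascii_only.lower())
    return " ".join(cleaned.split())


def match_key(record: InternalRecord) -> tuple[str, str]:
    """Assumed match key: normalized title plus publication year."""
    return (normalize(record.title), record.year.strip())


def find_candidate_duplicates(records: list[InternalRecord]) -> list[list[InternalRecord]]:
    """Group records by match key and return groups with more than one member."""
    groups: dict[tuple[str, str], list[InternalRecord]] = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return [group for group in groups.values() if len(group) > 1]


# Invented sample records, for illustration only.
sample = [
    InternalRecord("BHL", "b1", "Flóra of  Exampleland.", "1936"),
    InternalRecord("IA", "ia1", "flora of exampleland", "1936"),
    InternalRecord("IA", "ia2", "Birds of Exampleland", "1850"),
]
candidates = find_candidate_duplicates(sample)
```

The point of the internal format is that the matching logic only ever sees InternalRecord, so each incoming schema needs just one converter. The groups returned are candidates for review, not confirmed duplicates.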
We need to schedule another separate call just to discuss ingest.
3. Suzanne will set up a Doodle to schedule the rest of the agenda that we did not cover today:
- Gemini Project: error reporting / suggestions from the public system
- Portal editing
- Round robin: status updates from the units
4. We do believe we need a face-to-face meeting soon. A group needs to form to spearhead the organization. Bianca will begin the process and Don, Matthew and Kai will help. Everyone should check their calendars for November 2-5.