TechCall_17Oct2016
Agenda
Batch updates (Bianca)
Move (Joel and Martin)
BHL DOI reporting (Bianca)
General updates (all)
Notes
BHL Move update:
databases reloaded; admin dash is offline
new and old BHL have the exact same number of items
to continue testing this week
some performance tests
see http://new.biodiversitylibrary.org
no URL to new admin dash interface until BHL has actually been switched
hope to find that the site is faster and more responsive!
still on track to switch Tuesday of next week
Mike and Joel to beat up server for a while tomorrow to test how it responds
right now it is very fast because no one is really using it
Joel’s next step: open an issue with the IT dept to make the switch and coordinate with them on when it actually takes place; aim for sometime in the morning
once DNS entries are changed, Admin Dash will take 1-2 hours to become available
any discussion about powering down at MBG? not yet, WU still using data there but Mike hoping that can be removed by next Wednesday
Joel to schedule a call with Chuck about power down schedule, Mike to join
If any kind of support needed to keep WU’s project running, BHL offers support
WU’s project runs through the end of the year; from now on MBG data will be outdated
MK has been in touch with WU about move schedule so he is aware of timing
usually when a switch happens like this, for everyone around the world, it takes about 2
Admin Dash won’t be available until switch takes place
Footer will be changed from old site to include “Terms of Use” and “Privacy Policy” links instead of “Licensing & Copyright”
If Mike and Joel are successful then users shouldn’t notice any problems
Joel has done these kinds of migrations many times in the past, generally with no trouble
where users have problems it’s more than likely a network issue
[ ] staff call Thursday - Joel to talk about move update
Batch updates
when BHL sees changes it tries to re-download from IA, which slows ingest down
not a big deal but better to slow things down only once than several times
Year field not displayed on public UI
Bianca reports many negative Year values, for example
Year field is character, not numeric, because Items can have year ranges
Address year field issues first, then generate reports
treat year field updates and copyright metadata updates as separate projects
[ ] Bianca to switch gears and review Year fields and send to Mike
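A rough sketch of how the Year review could be pre-sorted before it goes into spreadsheets. The patterns below are assumptions (four-digit years, ranges written as "1850-1855", negatives written with a leading minus); the actual shapes of the BHL Year values were not described on the call, so the function name and categories are hypothetical.

```python
import re

# Assumed value shapes; adjust once the real Year data has been inspected.
SINGLE = re.compile(r"^\d{4}$")
RANGE = re.compile(r"^(\d{4})\s*-\s*(\d{4})$")
NEGATIVE = re.compile(r"^-\d+$")


def classify_year(value):
    """Return a coarse category for a character-typed Year value."""
    text = (value or "").strip()
    if not text:
        return "empty"
    if SINGLE.match(text):
        return "single"
    match = RANGE.match(text)
    if match:
        start, end = int(match.group(1)), int(match.group(2))
        return "range" if start <= end else "reversed_range"
    if NEGATIVE.match(text):
        return "negative"
    # Anything else needs a human look.
    return "other"
```

Grouping items by the returned category would let the "negative" and "other" buckets be reviewed first, while "single" and "range" values are left alone.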
batch updates to copyright metadata should help us in mapping to
Item ID and new value, plus original value to help identify patterns for avoiding with future ingests
always going to be a moving target so might as well take report from August to do data analysis
goal is to give Mike a request for all Years at one time
copyright metadata for contributors can be chunked into 10 or so at a time
can do up to 5,000 without slowing the ingest down too much
give Rod a heads up
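A minimal sketch of the report shape discussed above: one row per change with Item ID, original value, and new value, split into batches of at most 5,000. The column names and helper functions are hypothetical, and this only prepares the files; it does not reflect how Mike actually applies updates.

```python
import csv
import io


def chunk(rows, size=5000):
    """Yield consecutive slices of rows, each at most `size` long."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def write_batch(rows):
    """Render one batch as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Assumed column names; keep the original value to help spot patterns.
    writer.writerow(["item_id", "original_value", "new_value"])
    writer.writerows(rows)
    return buffer.getvalue()
```

Each batch from `chunk(rows)` can be passed to `write_batch` and saved as its own file, so one batch can be applied and ingest checked before the next is sent.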
METS files send updates to IA
then all items get examined during ingest
this slows ingest down
batch changes
Rod Page notices that ingest takes a couple of days, so do BHL Staff
not really a cascade of changes on his end but a cascade up to IA
value in batch updating, but also danger
do this batch change once and see if it makes sense to implement something
more of a process thing - how do we fix this data going forward
are other things normalizing data?
BC: quirky data is always going to be a thing
see what the data looks like in 6 months
how clean is the new data? what can we do to prevent quirky data? will BC need to go through this exercise again down the road?
Workflow: ML runs report > Bianca reviews via spreadsheets > produces reports for Mike > Mike implements > rinse repeat
Could BC provide a rule that would help simplify the process?
Identify “standard” contributors vs. “nonstandard” contributors
also depends on Year value being cleaned
[ ] BC to see if she can develop a rule
DOIs
ML disappointed there is no context for DOI reports
[ ] BC to receive reports for BHL from now on
We need to have a convo with CrossRef
Does the report identify broken links that could be resolved?
Set up a call with CrossRef
Google Analytics might know links that people come from re: CrossRef
ML to stop receiving and BC to receive going forward
CrossRef started putting a huge amount of information into the reports
ML warns that the emails are HUGE