Fahrenheit 675: October 2010

Thursday, October 28, 2010

Review of 3 search providers

The three service providers I looked at this week were all found on www.openarchives.org, primarily because the gita.grainger.uiuc.edu link was down at the time of this writing. From the list, I elected to explore the following providers: CASSIR (Cross Archive Search Services for Indian Repositories), Hispana (Spanish repositories), and NORA (Norwegian Open Research Archives).

CASSIR uses the same harvester we installed on our VMs. It includes 23 repositories, mostly science-based, although one is specific to dentistry and another to management. Because I was already familiar with the layout, I found it easy to use and search. Most of the complaints I had about the search function on my practice machine were not present here. For example, "All Archives" can be searched across multiple fields, and previous results can be retrieved by clicking the "back" button.

Hispana is similar to Europeana, which covers European collections, but is specific to Spain. In all, 129 repositories contribute to Hispana, and over 2.5 million digital objects are included. The 503 digital collections can be searched individually, or one can search within specific repositories (the default search encompasses all repositories). Hispana covers a diverse range of topics from across Spanish academia.

NORA is specific to Norway and covers 6 broad topics (agriculture, humanities, social science, math/science, technology, and medicine). Dozens of repositories contribute (the list is long); however, a total number is not advertised. Unlike CASSIR's, NORA's search results show the name of the repository, and in many instances the actual document can be downloaded in PDF format.

I was impressed with each of these, and found the metadata substantial and searching easy. None of them has an overwhelming number of repositories or records, and each appears manageable at its current scale. I entered this exercise expecting to advocate for service providers that specialize in a small number of topics, which I imagine would benefit users and help maintain the most useful metadata. I still think there is real value in this approach. However, large providers (like the oaister.org example) also have a role to play by bringing huge quantities of metadata together. Such large providers can give collections greater context, may be funded at a higher level that promotes preservation and sustainability, and may offer more dynamic searching.

Tuesday, October 19, 2010

More of a challenge than I expected...

This week, working with EPrints and the LOC subject headings, is the first time during 675 that I've encountered issues related to cataloging my collection. Now, I'd like to state up front that I've yet to take IRLS 530 (cataloging and metadata) - although it is on my schedule for spring semester. Hopefully that course will bring some clarity to the process.

Using my own taxonomy, as I did earlier in class, seemed easier for my 15 items than LOC because I could hand-pick subjects and fields that I knew were related to my modest collection. LOC (or LCSH), of course, provides comprehensive subject headings, which, surprisingly, can be difficult to navigate for items of a general, nondescript nature. For instance, 3 of my items are simply pictures of books. I don't know the subject, the author, or any details about their publication. I struggled with how to classify these, and eventually settled on "Fine Arts - Print Media". I'm not sure this is the best place for these items, but I gave it a shot. The problem is, if I were looking for images of books, I'm not sure this is the first, or even second, place I would think to look.

I ran into a similar problem with my three sports-related pictures. I settled on a category under "Recreation", but it wasn't specific to the sport itself, which I found disappointing. I'm trying to be consistent in category selection, although I may only be consistently wrong in my choice of subject headings. That's obviously a separate issue. Say I have a collection of photos that don't neatly fit a specific subject heading, so I use my best judgment and pick one. Even if I'm consistent in using that heading, someone else with similar material may have chosen a different heading, and now our metadata doesn't match up, which is a problem for users and for queries across repositories.

Hopefully my examples make sense. As I mentioned, cataloging is new to me, but I can already see that it's not nearly as easy as one might suppose, and that it will take a lot of practice and skill to apply consistently.

Wednesday, October 13, 2010

EPrints is bringing up the rear...

I didn't care much for DSpace during the first week, but by the second week I had warmed up to it. EPrints is running a similar deficit right off the bat, and I'm not sure it'll catch up. The installation itself went fairly smoothly (I rarely have installation problems). But customization appears to require quite a bit of command-line tinkering, which isn't my preference. For instance, the subjects2.txt file appears to be the only way to change the LOC subject headings, whereas I much prefer editing them in a GUI, as in Drupal. Additionally, I could only customize the home page comment line through the command line, which isn't necessarily a problem (I got it to work), but isn't my first choice. This is probably just an issue of confidence and practice, which could be overcome with time and EPrints experience. Nonetheless, EPrints seems to offer less robust customization than the range of options Drupal provides.

This week I "branded" the home page by using the command line to add a short sentence describing my collection. I'm still working on editing the subject headings. My collection is entirely images, which doesn't seem to be the primary format used by most EPrints repositories. From what I've seen, EPrints caters more to the theses/papers/text crowd. In fact, the primary audience seems to be institutions seeking a CMS for the digital preservation of research papers. At this point, I think Drupal or DSpace would be a better choice for housing my collection than EPrints.

Monday, October 4, 2010

Rethinking DSpace and a couple (rhetorical) questions...

Last week I was down on DSpace. Today I logged back into it for the first time in a few days, and now I can't remember what all the fuss was about. I entered the rest of my collection: no problems. Got the workflows going: no issues. Everything seems fine. I still prefer Drupal (I like setting my own taxonomy and prefer its appearance), but I'm warming up to DSpace. A couple of things do strike me as odd, however. For instance, when submitting a new item it gives three choices (more than one title, previously published, etc.), so why not a simple default with just one title? At first glance, none of the choices appeared to apply to my simple collection of images. Also, must one click the license approval after each item is entered? There must be a workaround for this, because granting the license for each entry would be incredibly tedious for large collections.

Now my questions, which shall go unanswered for the time being. I am enjoying the practice with digital collections, and now think I may pursue a career as a digital archivist after graduation. What is not clear to me is exactly what degree of expertise is required to compete for these positions. I feel like DigIn is providing a good foundation, but I seriously doubt I'll exit the program ready to install, create, and manage production digital collections. Is that the goal of the program? What prospects exist for those of us interested in moving toward a position where we can contribute to a project like digitalMETRO? One of the authors of that initiative graduated from DigIn, which is an encouraging sign. However, his job title and expertise (as evidenced by digitalMETRO) lead me to believe he has additional computer experience. If so, what did he know, and when did he know it? Because right now, I fear that my interest and basic understanding will be established, but I'll still lack the skill set to compete for digital archivist positions.

This line of questioning probably sounds naive: "Of course you'll need more experience before you can be a full-blown digital archivist. Take a job with a digital component, learn that, and move up!" Is that how it's done? Maybe some of this insecurity will begin to vanish when I start looking for jobs in earnest next spring. But for the time being, I'm a little worried that my experience and skills will forever be lacking compared to those of people with long-standing computer experience.