Tuesday, November 23, 2010

The Final Blog

Well, I guess this is it. 672 and 675 went fast - but I learned a lot. I remember being concerned when 672 started that my computer skills weren't up to the challenge, and that I would fall behind and never catch up. Fortunately that didn't happen, and both courses exceeded my expectations (including those I had for myself).

My 675 take-aways fall into two categories: familiarity with the process of installing and manipulating CMSs, and knowledge of how to select an appropriate CMS for a given collection. Both are important, but the second in particular has lasting ramifications. Choosing the wrong CMS for a project can lead to huge financial and labor expenses if the project later has to switch platforms. Looking at various CMSs provided the opportunity to see how they work, how they differ, and where each is strong or weak. Additionally, practice installing the software was vital, although I must admit that I still lack confidence that I could install a fully functioning production environment.

I don't pretend to be prepared to lead a full-scale collection development project. If that was the goal of 672/675, then I apologize for failing in this regard. However, I think I can contribute to researching CMSs, assist with installation and administration, and help establish the beginnings of a digital collection. In other words, I consider myself entry-level in terms of creating an IR. I do, however, believe I have a solid foundation to expand on, and look forward to increasing my knowledge, contributing to a production IR, and achieving greater levels of responsibility in creating/administering/managing institutional digital collections.

Tuesday, November 9, 2010

Prefer "do-it-yourself" VMs

My only VM experience comes courtesy of IRLS 672 and 675, so I've never seen or worked with a pre-installed VM (unless I'm confusing the definitions). If pre-installed VMs were clean and easy to use, then I could see some benefit in having more time to manage the collection. However, judging from the blog assignment's statements that "the files are large" and "in practice, it's almost as much work and sometimes difficult to troubleshoot", it doesn't sound like the benefits would be easily realized.

And, of course, the learning experience can't compare to doing it yourself. For those managing a small or private collection, maybe the pre-installed VM is a good option. But for larger collections, with sophisticated features, numerous users, and large amounts of data, I think the "do-it-yourself" VM is the better option. Serious collection administrators will need to know the structure and design of their system in order to troubleshoot and take full advantage of the features; something I'm not sure you could master using a pre-installed VM.

Now, I'm writing with the understanding that the VM I run in VMware Workstation is not a pre-installed VM, but one I created from scratch. Assuming that's the case, I feel like I'm approaching a level of competency (and confidence) that suggests my preference is for building my own VM. If I'm wrong about my understanding, then I guess I'd have to reconsider, but at this point I believe my knowledge and career would both benefit from knowing how to administer a "do-it-yourself" VM installation. Of course, as I stated earlier, I don't believe I've ever used a pre-installed VM, so it's possible I'm speaking out of ignorance and perhaps these are (or someday will be) better options.

Thursday, November 4, 2010

Brief home sites review

I assume by "home sites" the assignment means the web presence of the CMSs we've looked at, and not the home pages of the repositories themselves. In any case, that's how I will be proceeding - by looking briefly at the web sites associated with EPrints, Drupal, DSpace, and Omeka.

It's hard not to be slightly biased because I prefer two of these (Omeka, Drupal) over the other two for my own collection. The home sites of all four were easy to find through a Google search, so accessibility is not an issue (and, of course, you only need to find the site once and add it to your "favorites"). The EPrints home page is as wordy as most of the collections that utilize it - not appealing. And the dense content actually makes what you're looking for harder to identify.

The Omeka site is pretty good - it's easy to find repository examples, user forums are prominent, and news and download options are easily identifiable. It looks like a site that is well thought out and organized. I like it better than Drupal's site, which (although I like the product) doesn't look great and feels a little disjointed. The map identifying the global locations of people posting issues seems like overkill and doesn't add anything; I think this content should be left in the forums. The Drupal site also looks dated compared to other sites, which is surprising considering how many attractive sites have been designed with Drupal. Fortunately there are plenty of links to content that most developers would need, so I'm sure one could get used to the site pretty quickly if using Drupal for a production repository.

Finally there's DSpace, whose home page includes almost too little information. In fact, a weakly worded sentence is the only indication of what DSpace even does. Of course, there are links to other content, but the home page should include more, look better, and advertise the product more clearly. Like Drupal, the product deserves better than the home page that supports it. And, again like Drupal, the home site looks old and boring.

Obviously this is a pretty superficial analysis based entirely on first impressions. From a developer perspective, I'm not sure how much these observations matter. Anyone actually using any of these platforms would become intimately familiar with these sites, and would probably care little about how they look. I think from a marketing perspective they should look more modern and clean (only Omeka does a good job of this right now), but it's not critical. Organization is important, and would be a consideration for those debating which CMS to use for their repository, but it would probably be outweighed by the robustness of the product itself, assuming users can find what they need without too much inconvenience.

Thursday, October 28, 2010

Review of 3 search providers

The three service providers I looked at this week were all found on www.openarchives.org - primarily because the gita.grainger.uiuc.edu link was down at the time of this writing. From the list, I elected to explore the following providers: CASSIR (Cross Archive Search Services for Indian Repositories), Hispana (Spanish repositories), and NORA (Norwegian Open Research Archives).

CASSIR uses the same harvester we installed on our VMs. It includes 23 repositories, mostly science based, although one is specific to dentistry and another to management. Because I was already familiar with the layout, I found it easy to use and search. Most of the complaints I had about the search function in my practice machine were not present in this instance. For example, "All Archives" can be searched across multiple fields, and the results can be retrieved by clicking the "back" button.

Hispana is similar to Europeana for European collections, but is specific to Spain. 129 repositories contribute to Hispana, and over 2.5 million digital objects are included. The 503 total digital collections can be searched individually, or one can search under specific repositories (default searches encompass all repositories). Hispana includes a diverse range of topics from across Spanish academia.

NORA is specific to Norway, and includes 6 broad topics (agriculture, humanities, social science, math/science, technology, and medicine). Dozens of repositories contribute (the list is long); however, a total number is not advertised. Unlike CASSIR, the search results show the name of the repository and, in many instances, the actual document can be downloaded in PDF format.

I was impressed with each of these, and found the metadata to be substantial and searching easy. None of them have an overwhelming number of repositories or records, and each appears manageable at current scale. I entered this exercise expecting to advocate for service providers that specialize in a small number of topics, which I imagine would benefit users and help maintain the most useful metadata. I still think there is real value to be had by this approach. However, large providers (like the oaister.org example) also have a role to play by bringing huge quantities of metadata together. Such large providers can provide greater context to collections, may be funded at a higher level which promotes preservation and sustainability, and may offer more dynamic searching.
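For anyone curious what a service provider does behind the scenes, here is a minimal sketch of the request it would send to each contributing repository. It assumes the providers harvest over OAI-PMH, which their listing on www.openarchives.org suggests but which I have not confirmed for each one. The repository addresses and the function name are made up for illustration; only the `verb`, `metadataPrefix`, and `set` parameters come from the protocol itself.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for one repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional: restrict the harvest to one set (e.g. one collection).
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoints, for illustration only.
repositories = [
    "https://repo-a.example.org/oai",
    "https://repo-b.example.org/oai",
]

# A service provider would request records from every repository it covers
# and merge the returned metadata into one searchable index.
for repo in repositories:
    print(build_list_records_url(repo))
```

A provider covering 23 repositories and one covering hundreds would send the same kind of request; the difference in scale shows up in how much metadata has to be merged and kept consistent afterwards.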

Tuesday, October 19, 2010

More of a challenge than I expected...

This week, working with EPrints and the LOC subject headings, is the first time during 675 that I've encountered issues related to cataloging my collection. Now, I'd like to state up front that I've yet to take IRLS 530 (cataloging and metadata) - although it is on my schedule for spring semester. Hopefully that course will bring some clarity to the process.

Using my own taxonomy, as I did earlier in class, seemed easier for my 15 items than LOC because I could hand pick subjects and fields that I knew were related to my modest collection. LOC (or LCSH), of course, provides comprehensive subject headings which, surprisingly, can be difficult to navigate for items of a general, nondescript nature. For instance, 3 of my items are simply pictures of books. I don't know the subject, author, or any details about their publication. I struggled with how to classify these, and eventually settled on "Fine Arts - Print Media". I'm not sure this is the best place for these items, but I gave it a shot. Problem is, were I looking for images of books, I'm not sure this is the first, or even second, place I would think to look.

I ran into a similar problem with my three sports-related pictures. I settled on a category in "Recreation", but it wasn't specific to the sport itself, which I found disappointing. I'm trying to be consistent in category selection, although I may only be consistently wrong in the choice of subject headings. That's obviously a separate issue. Say I have a collection of photos that don't neatly fit a specific subject heading, so I use my best judgement and pick one. Even if I'm consistent in using that heading, someone else with similar material may have chosen a different heading, and now our metadata doesn't match up - a problem for users and queries across repositories.
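To make the mismatch concrete, here is a minimal sketch of two repositories describing similar photos under different headings. The records, the second heading, and the `search_by_subject` helper are all invented for illustration; a real cross-repository search is more forgiving than an exact match, but the underlying problem is the same.

```python
# Two hypothetical records describing similar material, each cataloged
# consistently within its own repository but under different headings.
records = [
    {"repo": "mine", "title": "Stack of old books",
     "subject": "Fine Arts - Print Media"},
    {"repo": "other", "title": "Books on a shelf",
     "subject": "Books - Pictorial works"},  # hypothetical heading
]


def search_by_subject(records, heading):
    """Return the records whose subject heading matches exactly (ignoring case)."""
    return [r for r in records if r["subject"].lower() == heading.lower()]


# A user searching on my heading never sees the other repository's item.
hits = search_by_subject(records, "Fine Arts - Print Media")
print([r["repo"] for r in hits])
```

Each cataloger was consistent, yet a subject search finds only half of the relevant images.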

Hopefully my examples make sense. As I mentioned, cataloging is new to me, but I can already see that it's not nearly as easy as one might suppose, and will take a lot of practice and skill to apply on a consistent basis.

Wednesday, October 13, 2010

EPrints is bringing up the rear...

I didn't care much for DSpace during the first week, but by the second week I had warmed up to it. EPrints is running a similar deficit right off the bat, and I'm not sure it'll catch up. The installation itself went fairly smoothly - I rarely have installation problems. But the customization appears to require quite a bit of command line tinkering which isn't my preference. For instance, the subjects2.txt file appears to be the only way to change subject headings from LOC, whereas I much prefer editing them in a GUI like in Drupal. Additionally, customization of the home page comment line was accomplished only through the command line, which isn't necessarily a problem (I got it to work), but isn't my first choice. This is probably just an issue of confidence and practice, which could be overcome with time and EPrints experience. Nonetheless, it looks like EPrints offers less robust customization when compared against the range of options provided by Drupal.
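For what it's worth, the text-file approach can be scripted rather than edited by hand. The sketch below assumes the subjects file uses one colon-separated entry per line in the form `id:name:parents:depositable`; that layout should be checked against the documentation for the installed EPrints version before relying on it. The subject ids, names, and helper functions are hypothetical.

```python
def subject_line(subject_id, name, parents, depositable=True):
    """Format one subject entry as id:name:parents:depositable (assumed layout)."""
    flag = "1" if depositable else "0"
    return ":".join([subject_id, name, ",".join(parents), flag])


def parse_subject_line(line):
    """Split one entry back into its four fields (names must not contain ':')."""
    subject_id, name, parents, depositable = line.strip().split(":")
    return {
        "id": subject_id,
        "name": name,
        "parents": parents.split(","),
        "depositable": depositable == "1",
    }


# A small custom tree for an image collection instead of the LOC headings.
custom_tree = [
    subject_line("subjects", "My Collection Subjects", ["ROOT"], depositable=False),
    subject_line("books", "Pictures of Books", ["subjects"]),
    subject_line("sports", "Sports Photographs", ["subjects"]),
]

print("\n".join(custom_tree))
```

The output could be saved to a text file and loaded with whatever import step the installation provides; it is still not a GUI, but it avoids typing each line by hand.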

This week I "branded" the home page from the command line by adding a short sentence describing my collection. I'm still working to edit the subject headings. My collection is entirely images, which doesn't seem to be the primary format used by most EPrints repositories. From what I've seen, EPrints caters more toward the theses/papers/text crowd. In fact, the primary audience seems to be institutions seeking a CMS for digital preservation of research papers. At this point, I think Drupal or DSpace would be better choices for housing my collection than EPrints.

Monday, October 4, 2010

Rethinking DSpace and a couple (rhetorical) questions...

Last week I was down on DSpace. Today I logged back into it for the first time in a few days, and now I can't remember what all the fuss was about. I entered the rest of my collection - no problems. Got the workflows going - no issues. Everything seems fine. I still prefer Drupal (I like setting my own taxonomy and prefer the appearance), but I'm warming up to DSpace. A couple things do strike me as odd, however. For instance, when submitting a new item it gives three choices (more than one title, previously published, etc.), but why not a simple default with just one title? At first glance, it didn't appear that any of the choices applied to my simple collection of images. Also, must one click the license approval after each item is entered? There must be a work-around for this, because granting the license for each entry would be incredibly tedious for large collections.

Now my questions, which shall go unanswered for the time being. I am enjoying the practice with digital collections, and now think I may pursue a career as a digital archivist post-graduation. What is not clear to me is exactly what degree of expertise is required to compete for these positions. I feel like DigIn is providing a good foundation, but I seriously doubt I'll exit the program ready to install/create/manage production digital collections. Is that the goal of the program? What prospects exist for those of us interested in moving toward a position where we can contribute to a project like digitalMETRO? One of the authors of that initiative graduated from DigIn - an encouraging sign. However, his job title and expertise (as evidenced by digitalMETRO) lead me to believe he has additional computer experience. If so, what did he know, and when did he know it? Because, right now, I fear that my interest and basic understanding will be established, but I'll still lack the skill set to compete for digital archivist positions.

This line of questioning probably sounds naive - "Of course, you'll need more experience before you can be a full-blown digital archivist. Take a job with a digital component, learn that, and move up!" Is that how it's done? Maybe some of this insecurity will begin to vanish when I start looking in earnest for jobs next spring. But for the time being, I'm a little worried that my experience/skills will forever be lacking when compared to people with long-standing computer experience.