Fahrenheit 675: 2010

Tuesday, November 23, 2010

The Final Blog

Well, I guess this is it. 672 and 675 went fast - but I learned a lot. I remember being concerned when 672 started that my computer skills weren't up to the challenge, and that I would fall behind and never catch up. Fortunately that didn't happen, and both courses exceeded my expectations (including those I had for myself).

My 675 take-aways can basically be summarized in two categories: familiarity with the process of installing and manipulating CMSs, and knowledge gained on how to select an appropriate CMS for a given collection. Both of these are important, but the second one in particular has lasting ramifications. Choosing the wrong CMS for a project can lead to huge financial and labor expenses if the effort is made to switch platforms. Looking at various CMSs provided the opportunity to see how they work, how they differ, and what the strengths and weaknesses of each are. Additionally, practice installing the software was vital, although I must admit that I still lack confidence that I could install a fully functioning production environment.

I don't pretend to be prepared to lead a full-scale collection development project. If that was the goal of 672/675, then I apologize for failing in this regard. However, I think I can contribute to researching CMSs, assist with installation and administration, and help establish the beginnings of a digital collection. In other words, I consider myself entry-level in terms of creating an IR. I do, however, believe I have a solid foundation to expand on, and look forward to increasing my knowledge, contributing to a production IR, and achieving greater levels of responsibility in creating/administering/managing institutional digital collections.

Tuesday, November 9, 2010

Prefer "do-it-yourself" VMs

My only VM experience comes courtesy of IRLS 672 and 675, so I've never seen or worked with a pre-installed VM (unless I'm confusing the definitions). If pre-installed VMs were clean and easy to use, then I could see some benefit to having more time to manage the collection. However, judging from the blog assignment's statements that "the files are large" and "in practice, it's almost as much work and sometimes difficult to troubleshoot", it doesn't sound like the benefits would be easily realized.

And, of course, the learning experience can't compare to doing it yourself. For those managing a small or private collection, maybe the pre-installed VM is a good option. But for larger collections, with sophisticated features, numerous users, and large amounts of data, I think the "do-it-yourself" VM is the better option. Serious collection administrators will need to know the structure and design of their system in order to troubleshoot and take full advantage of the features, something I'm not sure you could master using a pre-installed VM.

Now, I'm writing with the understanding that the VM I run in VMware Workstation is not a pre-installed VM, but one I created from scratch. Assuming that's the case, then I feel like I'm approaching a level of competency (and confidence) that suggests my preference is for building my own VM. If I'm wrong about my understanding, then I guess I'd have to reconsider, but at this point I believe my knowledge and career would both benefit from knowing how to administer a "do-it-yourself" VM installation. Of course, as I stated earlier, I don't believe I've ever used a pre-installed VM, so it's possible I'm speaking out of ignorance and perhaps these are (or someday will be) better options.

Thursday, November 4, 2010

Brief home sites review

I assume by "home sites" the assignment means the web presence of the CMSs we've looked at, and not the home pages of the repositories themselves. In any case, that's how I will be proceeding - by looking briefly at the web sites associated with EPrints, Drupal, DSpace, and Omeka.

It's hard not to be slightly biased because I prefer two of these (Omeka, Drupal) over the other two for my own collection. The home sites of all four were easy to find through a Google search, so accessibility is not an issue (and, of course, you only need to find the site once and add it to your "favorites"). The EPrints home page is as wordy as most of the collections that utilize it - not appealing. And the dense content actually makes what you're looking for harder to identify.

The Omeka site is pretty good - it's easy to find repository examples, user forums are prominent, and news and download options are easily identifiable. It looks like a site that is well thought out and organized. I like it better than Drupal's site which (although I like the product) doesn't look great and feels a little disjointed. The map identifying the global locations of people posting issues seems like overkill and doesn't add anything. I think this content should be left in the forums. The Drupal site also looks dated compared to other sites, which is surprising considering how many attractive sites have been designed with Drupal. Fortunately there are plenty of links to content that most developers would need, so I'm sure one could get used to the site pretty quickly if using Drupal for a production repository.

Finally there's DSpace, whose home page almost includes too little information. In fact, a weakly worded sentence is the only indication of what DSpace even does. Of course, there are links to other content, but the home page should include more, look better, and advertise the product more clearly. Like Drupal, the product deserves better than the home page that supports it. And, again like Drupal, the home site looks old and boring.

Obviously this is a pretty superficial analysis based entirely on first impressions. From a developer perspective, I'm not sure how much these observations matter. Anyone actually using any of these platforms would become intimately familiar with these sites, and would probably care less how they looked. I think from a marketing perspective they should look more modern and clean (only Omeka does a good job of this right now) but it's not critical. Organization is important, and would be a consideration for those debating which CMS to use for their repository, but it would probably be outweighed by the robustness of the product itself, assuming one can find what they need without too much inconvenience.

Thursday, October 28, 2010

Review of 3 search providers

The three service providers I looked at this week were all found on www.openarchives.org - primarily because the gita.grainger.uiuc.edu link was down during the time of this writing. From the list, I elected to explore the following providers: CASSIR (Cross Archive Search Services for Indian Repositories), Hispana (Spanish repositories), and NORA (Norwegian Open Research Archives).

CASSIR uses the same harvester we installed on our VMs. It includes 23 repositories, mostly science based, although one was specific to dentistry and another to management. Because I was already familiar with the layout, I found it easy to use and search. Most of the complaints I had about the search function in my practice machine were not present in this instance. For example, "All Archives" can be searched across multiple fields, and the results can be retrieved by clicking the "back" button.
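For anyone curious what a harvester like this does behind the scenes, here is a minimal sketch in Python. It builds an OAI-PMH ListRecords request URL and pulls the Dublin Core titles out of a response. To be clear, this is not the code of the harvester we installed: the endpoint URL, the function names, and the trimmed sample response are all made up for illustration, and nothing here touches the network.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical repository endpoint, for illustration only.
BASE_URL = "https://repository.example.org/oai"

def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (no request is sent)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional: limit the harvest to one set
    return base_url + "?" + urlencode(params)

# A made-up, heavily trimmed response, standing in for what a
# repository might send back. Real responses carry more elements.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample thesis on monsoon rainfall</dc:title>
          <dc:creator>Example, A.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()

# Namespace of the Dublin Core elements, in ElementTree's {uri}tag form.
DC = "{http://purl.org/dc/elements/1.1/}"

def extract_titles(xml_text):
    """Return the text of every dc:title element in the response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(DC + "title")]

print(list_records_url(BASE_URL))
print(extract_titles(SAMPLE_RESPONSE))
```

A service provider repeats this kind of request against each contributing repository and indexes what comes back, which is why a search across "All Archives" is possible at all.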

Hispana is similar to Europeana for European collections, but is specific to Spain. 129 repositories contribute to Hispana, and over 2.5 million digital objects are included. The 503 total digital collections can be searched individually, or one can search under specific repositories (default searches encompass all repositories). Hispana includes a diverse range of topics from across Spanish academia.

NORA is specific to Norway, and includes 6 broad topics (agriculture, humanities, social science, math/science, technology, and medicine). Dozens of repositories contribute (the list is long); however, a total number is not advertised. Unlike CASSIR, the search results show the name of the repository and, in many instances, the actual document can be downloaded in PDF format.

I was impressed with each of these, and found the metadata to be substantial and searching easy. None of them have an overwhelming number of repositories or records, and each appears manageable at current scale. I entered this exercise expecting to advocate for service providers that specialize in a small number of topics, which I imagine would benefit users and help maintain the most useful metadata. I still think there is real value to be had by this approach. However, large providers (like the oaister.org example) also have a role to play by bringing huge quantities of metadata together. Such large providers can provide greater context to collections, may be funded at a higher level which promotes preservation and sustainability, and may offer more dynamic searching.

Tuesday, October 19, 2010

More of a challenge than I expected...

This week, working with EPrints and the LOC subject headings, is the first time during 675 that I've encountered issues related to cataloging my collection. Now, I'd like to state up front that I've yet to take IRLS 530 (cataloging and metadata) - although it is on my schedule for spring semester. Hopefully that course will bring some clarity to the process.

Using my own taxonomy, as I did earlier in class, seemed easier for my 15 items than LOC because I could hand-pick subjects and fields that I knew were related to my modest collection. LOC (or LCSH), of course, provides comprehensive subject headings which, surprisingly, can be difficult to navigate for items of a general, nondescript nature. For instance, 3 of my items are simply pictures of books. I don't know the subject, author, or any details about their publication. I struggled with how to classify these, and eventually settled on "Fine Arts - Print Media". I'm not sure this is the best place for these items, but I gave it a shot. Problem is, were I looking for images of books, I'm not sure this is the first, or even second, place I would think to look.

I ran into a similar problem with my three sports-related pictures. I settled on a category in "Recreation", but it wasn't specific to the sport itself, which I found disappointing. I'm trying to be consistent in category selection, although I may only be consistently wrong in the choice of subject headings. That's obviously a separate issue. Say I have a collection of photos that don't neatly fit a specific subject heading, so I use my best judgement and pick one. Even if I'm consistent in using that heading, someone else with similar material may have chosen a different heading, and now our metadata doesn't match up - a problem for users and queries across repositories.
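To make the mismatch concrete, here is a small Python sketch. The two "repositories", their records, and the crosswalk table are all invented for illustration; a real cross-repository search would be far more involved. It shows how an exact-match subject query misses a similar item filed under a different heading, and how an agreed mapping between headings (which someone would have to build and maintain) recovers it.

```python
# Two made-up repositories describing similar photos with different headings.
repo_a = [{"title": "Pitcher mid-throw", "subject": "Recreation"}]
repo_b = [{"title": "Batter at the plate", "subject": "Baseball"}]

def search_by_subject(records, heading):
    """Exact (case-insensitive) match on the subject heading."""
    return [r["title"] for r in records
            if r["subject"].lower() == heading.lower()]

# Hypothetical crosswalk: maps one cataloger's heading onto another's.
CROSSWALK = {"baseball": "recreation"}

def normalize(heading):
    """Lower-case a heading and apply the crosswalk if it has an entry."""
    h = heading.lower()
    return CROSSWALK.get(h, h)

def search_normalized(records, heading):
    """Match on headings after both sides go through the crosswalk."""
    target = normalize(heading)
    return [r["title"] for r in records
            if normalize(r["subject"]) == target]

combined = repo_a + repo_b
print(search_by_subject(combined, "Recreation"))   # misses the repo_b photo
print(search_normalized(combined, "Recreation"))   # finds both photos
```

The crosswalk only works if both catalogers' choices are known in advance, which is exactly the consistency problem I'm describing.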

Hopefully my examples make sense. As I mentioned, cataloging is new to me, but I can already see that it's not nearly as easy as one might suppose, and will take a lot of practice and skill to apply on a consistent basis.

Wednesday, October 13, 2010

EPrints is bringing up the rear...

I didn't care much for DSpace during the first week, but by the second week I had warmed up to it. EPrints is running a similar deficit right off the bat, and I'm not sure it'll catch up. The installation itself went fairly smoothly - I rarely have installation problems. But the customization appears to require quite a bit of command line tinkering which isn't my preference. For instance, the subjects2.txt file appears to be the only way to change subject headings from LOC, whereas I much prefer editing them in a GUI like in Drupal. Additionally, customization of the home page comment line was accomplished only through the command line, which isn't necessarily a problem (I got it to work), but isn't my first choice. This is probably just an issue of confidence and practice, which could be overcome with time and EPrints experience. Nonetheless, it looks like EPrints offers less robust customization when compared against the range of options provided by Drupal.

This week I "branded" the home page by using the command line to add a short sentence describing my collection. I'm still working to edit the subject headings. My collection is entirely images, which doesn't seem to be the primary format used by most EPrints repositories. From what I've seen, EPrints caters more toward the theses/papers/text crowd. In fact, the primary audience seems to be institutions seeking a CMS for digital preservation of research papers. At this point, I think Drupal or DSpace would be better choices for housing my collection than EPrints.

Monday, October 4, 2010

Rethinking DSpace and a couple (rhetorical) questions...

Last week I was down on DSpace. Today I logged back into it for the first time in a few days, and now I can't remember what all the fuss was about. I entered the rest of my collection - no problems. Got the workflows going - no issues. Everything seems fine. I still prefer Drupal (I like setting my own taxonomy and prefer the appearance), but I'm warming up to DSpace. A couple things do strike me as odd, however. For instance, when submitting a new item it gives three choices (more than one title, previously published, etc.), but why not a simple default with just one title? At first glance, it didn't appear that any of the choices applied to my simple collection of images. Also, must one click the license approval after each item is entered? There must be a work-around for this, because granting the license for each entry would be incredibly tedious for large collections.

Now my questions, which shall go unanswered for the time being. I am enjoying the practice with digital collections, and now think I may pursue a career as a digital archivist post-graduation. What is not clear to me is exactly what degree of expertise is required to compete for these positions. I feel like DigIn is providing a good foundation, but I seriously doubt I'll exit the program ready to install/create/manage production digital collections. Is that the goal of the program? What prospects exist for those of us interested in moving toward a position where we can contribute to a project like digitalMETRO? One of the authors of that initiative graduated from DigIn - an encouraging sign. However, his job title and expertise (as evidenced by digitalMETRO) lead me to believe he has additional computer experience. If so, what did he know, and when did he know it? Because, right now, I fear that my interest and basic understanding will be established, but I'll still lack the skill set to compete for digital archivist positions.

This line of questioning probably sounds naive - "Of course, you'll need more experience before you can be a full-blown digital archivist. Take a job with a digital component, learn that, and move up!" Is that how it's done? Maybe some of this insecurity will begin to vanish when I start looking in earnest for jobs next spring. But for the time being, I'm a little worried that my experience/skills will forever be lacking when compared to people with long-standing computer experience.

Wednesday, September 29, 2010

DSpace ain't no picnic

Wow... after a good week with Drupal, I'm feeling a little disappointed with DSpace. The installation went fine - as usual, the directions were clear and I had no issues. Now, could I have figured the installation out on my own? Certainly not. I recognized many of the commands and could follow what was going on about 85-90% of the time, but no way I could have completed the install without help from a systems specialist. Basic steps I get, but others - like editing files - leave me bewildered. Just not confident I would have ever known to do that on my own...

Drupal I found fairly easy to use and manipulate. DSpace seems much less user friendly. Many of the tutorial instructions seem more complex than I expected, and it looks like some require more code than Drupal. I haven't fooled with "look and feel" yet, but I'm already wondering if Drupal is more focused on presentation, and DSpace more about content. I also prefer how Drupal allows the administrator to assign metadata categories, whereas DSpace uses DC by default - which is fine - but I prefer the flexibility of Drupal. Plus, at first glance, editing categories in DSpace seems a bit difficult.

Wednesday, September 22, 2010

Final thoughts on Drupal?

If this was the final week we use Drupal, then I at least wish to report how much I enjoyed the experience. It was actually kinda fun - a new feeling for me since (especially before DigIn) I used to approach new computer applications with apprehension. But everything went smoothly, worked, and the results were quite satisfactory. I look forward to using Drupal again in the future.

I don't have a lot to say, so I'll briefly address both issues presented in the blog assignment. First, I decided to install the Imagemenu module for assignment 2. I'm not sure it enhances my collection in any meaningful way at this point. I chose it because it seemed relevant to a collection of images, and appeared to be an easy install without dependencies. It allows images to be arranged in a menu, and supports "optional mouseover behavior". I'll have to play with it a bit more to realize its full potential, but for now I'm satisfied that the install worked and the module is functional.

Finally, it's clear (as I stated last week) that Drupal is a great application for managing digital collections. It's a popular choice among institutions for obvious reasons. Of course, we've barely scratched the surface, but this novice feels informed enough to recommend Drupal to libraries/archives and I hope to work with it again in earnest. One thing we didn't really discuss is how to manipulate the appearance of our sites. I know this is secondary to learning how to manage content, but I'm still interested to discover how the attractive Drupal sites I've seen online were created.

Friday, September 17, 2010

Impressed with Drupal

Drupal is the first CMS I've used, so my impression lacks the benefit of meaningful comparisons. I mentioned in a discussion post that Drupal reminds me of DotNetNuke, an open-source web management system (I guess it might be considered a CMS), which I have some novice experience using. In any case, I'm impressed with Drupal. It's fairly easy to use, and is clearly powerful for the development and maintenance of digital collections as evidenced by the number of institutions using it for their own collections.

Of course, based on my limited knowledge, I consider Drupal well suited to managing my 675 collection. There's a great deal of flexibility in defining categories - in fact, it appears any desired category can be created. I'm curious to know how to affect the presentation of the collection, since it currently exists in the default view only. Likewise, I look forward to using some of the "contrib modules" to expand the features currently available.

I'm not sure I have well-defined criteria yet for what features are important to me in a CMS. As I stated, Drupal is the first real experience I've had with this, so I need further exposure to really know what I'm looking for in terms of features and usability. It's natural to trust Drupal because so many institutions (including UofA) use it. I will say that an intuitive interface is important to me, and will be for users as well. Additionally, a variety of stable and powerful modules is important to ensure the site looks professional and the features work correctly.

Wednesday, September 8, 2010

Short one this week...

Everything has been going smoothly for me so far in 675, so I don't have much to report this week. The instructions have been clear for all tech assignments, and I've encountered no major problems. My only concern at the moment comes when I contemplate the Capstone course next summer. I'm afraid that without the instructions, I won't remember how to do each process in setting up a LAMP server, or installing/using Drupal. I'm not sure if that's a real or imagined fear - I'm not exactly sure what Capstone involves yet - but I do find myself worrying about it occasionally. In any case, since 672 I've printed out and saved the tech instructions each week for future reference, but I expect having the processes committed to memory will be more beneficial during the final course.

In short, the pace has been fine and my results satisfactory... no complaints or issues at this point.

Sunday, September 5, 2010

Small college CMS

The article "Building a collection development CMS on a shoe-string" by Regina Beach and Miqueas Dial (Library Hi Tech, 2006, vol. 24, issue 1) highlights the efforts made by the small campus of Texas A&M University-Kingsville to digitize their book ordering process.

Before this project, faculty at this small college of 5000 students continued to rely on physical order form cards to request acquisitions from library staff. The process was slow, unreliable, and difficult to track. Often books were ordered, but no follow-up was forthcoming to inform faculty of their arrival. Additionally, library holdings were considered insufficient to promote scholarly competitiveness, and professors frequently found themselves "teaching down to the library resources".

In partial remedy of this situation, the library instituted a small CMS designed to streamline the book ordering process. Created using Microsoft Access, the new database provides faculty with a one-stop platform to submit and track orders. A secondary goal, through collaboration with the acquisitions and cataloging staff, is to consolidate information by allowing bibliographic data to be entered into the library system only once (instead of multiple times across platforms).

I chose this article for two reasons. First, I was curious to see what CMS solution a small college (with limited staff and resources) would implement. Second, because it's quite likely my first position post-graduation will be with a smaller institution, I was interested in learning how they approach such issues. Quite frankly, I was expecting an open-source solution, and was a little surprised to find they used Access. Most likely they already had it on their systems, and didn't expend resources to acquire it solely for this purpose. I was also surprised by the limited scope of the project. I often imagine CMSs being very large and comprehensive, but this project addressed such a modest issue that the term "CMS" almost seemed too broad in relation to the problem being solved.

Monday, August 30, 2010

Collection preliminaries

I'm still ruminating over the best way to proceed with this collection, and what subject I want to focus on. So far, I've gathered 15 pictures from Getty Images divided into 5 subjects with 3 images each; namely, sports, books, travel, space (the final frontier variety), and nature.

My initial idea is to create a simple collection of images that might be used as a stock photograph resource for people needing pictures for various projects like blogs, PowerPoint presentations, papers, etc. Obviously, this is not a unique idea, but I don't think this project requires a high degree of originality.

However, it will require a useful and well-organized mechanism to search and tag the collection objects. Tags I've considered at this point include the subject headings listed above, plus a word describing each picture such as Greece, Jordan, baseball, book, stars, elephant, etc.
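As a rough sketch of the tagging idea, here is how it might look in Python. The file names and the tag assignments are placeholders I made up (I haven't settled on either), and a CMS would handle this through its own taxonomy features rather than a script like this. The point is only that each picture carries a subject tag plus a descriptive word, and a search returns the pictures that have every tag asked for.

```python
from collections import defaultdict

# Placeholder items: file name -> [subject heading, descriptive word].
items = {
    "img01.jpg": ["travel", "Greece"],
    "img02.jpg": ["travel", "Jordan"],
    "img03.jpg": ["sports", "baseball"],
    "img04.jpg": ["nature", "elephant"],
}

def build_tag_index(items):
    """Map each lower-cased tag to the set of items that carry it."""
    index = defaultdict(set)
    for name, tags in items.items():
        for tag in tags:
            index[tag.lower()].add(name)
    return index

def find(index, *tags):
    """Return the items carrying every requested tag, sorted by name."""
    sets = [index.get(t.lower(), set()) for t in tags]
    if not sets:
        return []
    return sorted(set.intersection(*sets))

idx = build_tag_index(items)
print(find(idx, "travel"))            # both travel pictures
print(find(idx, "travel", "greece"))  # narrowed to one picture
```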

That said, I'm still unsettled concerning my collection choice. I have a strong interest in ancient history, and am wondering whether a collection specific to this topic, and including formats beyond images (such as articles, books, and maps), might not make for a more interesting and meaningful collection. The audience in this case would be people with a similar interest who might peruse the collection to find relevant pictures or recent articles concerning ancient civilizations. Any feedback the reader may care to offer on my collection ideas is greatly appreciated.