Codex Monkey: 2007

Wednesday, September 19, 2007

Symposium on the Future of ILSs: Let's see numbers

I had the good fortune to attend the Symposium on the Future of Integrated Library Systems recently. It was an excellent, excellent conference. The Lincoln Trail folks should be proud of themselves. I have pages and pages of notes, but I figured I would do a couple of posts on what seemed to me to be the points common to several of the talks. We'll start with the one that has the most resonance with me: "We need to see the evidence".

This point kept coming up over and over, although no one was obnoxious about it. We need to have more evidence to support our actions and decisions as we move forward. One part of this is that we simply need better information on our own costs and expenditures. Chip Nilges from OCLC mentioned the value of a link in one of his talks. Do you know how many people view an individual catalog record? Can you estimate how much that space is worth? This seems vague and fuzzy in the library world, but I'm not in administration. Perhaps that's just the view from below.

Perhaps even more important is the gap in our knowledge about our own users. We see organizations like Google and Amazon rise because they focus a lot on the average user. They are constantly studying logs, creating and reading usability studies, and just talking with people. It's not to say that librarians haven't done this in the past. I know there are some excellent papers out there from libraries and from some of the Information Retrieval folks. However, it seems to me that in our day-to-day planning we make wild guesses, ones that are frequently wrong.

It's difficult to get funding or budgets for usability studies. Some of this seems to be changing recently, but it's difficult to tell if this is a general trend or just a local one. I'd like to think some people at least have gotten used to me trying to figure out what our users are actually doing and have started to try to find better evidence for changes they'd like to make, but I'm really not that important. More likely it's become clear to some who resisted things like this that our current ways just aren't working.

Now, I want to clarify something. The need for evidence shouldn't be a chilling factor. I've seen some people recently become overly critical of fledgling efforts and seemingly require usability studies and the like before a project even starts. This is a severe burden when someone is just starting a cycle of development. Usability should come early, but you need experimentation as well. It shouldn't be something that each researcher and experimenter needs to be an expert on, but something that gets built into the overall process for research and development. Ideally there's a constant cycle of experimentation, feedback, development, and feedback.

To clarify, this is one time when not having much data shouldn't be a sin. It shouldn't be an excuse to kill a project before it starts. Yes, it's a good indication if there are existing studies showing that a user might like recommendations. But it's madness not to move forward with at least examining, experimenting with, and researching the idea of recommendations just because there are no documented usability studies about how people like them. The foundations for the actual usability and user studies should be allowed to be created.

So... in an attempt to stave off the book I could probably write about this, let me just conclude: user testing and user-oriented design are great. They should be much, much more involved in all levels of the library. They should be a recurring part of the feedback loops within the library. A healthy institution has a feedback loop between it and the real world. It feels like a living, breathing, reacting thing. An unhealthy one seems like a machine shambling along blind, deaf, and oblivious of its surroundings. Keep working at trying to incorporate actual information about patrons and your own people and your library might just start to feel a little bit more alive, maybe even a little more human.

Saturday, September 08, 2007

Open Source Misconceptions: Evaluating Software

I've noticed lately in the library software world that a false distinction is being made between commercial and open source software. I'm not saying there's no difference, but I am saying that when evaluating software it's often useful to ignore whether it's open source or proprietary and actually decide on some of the qualities you're looking for. I watch a lot of Food Network. They constantly have food contests where judging is done by choosing some qualities (originality, smell, flavor, flammability, what not). There's a table where the food's marked with numbers or letters. Judges taste the food and rate it in every category.

So, ok, we can't do it blind. But setting criteria can help avoid some biases.

So, for example, let's look at some possible categories.

Support:
Don't be fooled here. Multiple vendors can service proprietary software, just as they can open source software. True, open source supporters will be quick to remind people that you can pay someone to develop any software. But an actual vendor with experience is really required.

Reputation is important here. Very important. What's the use of going with a vendor that's infamous for taking money and bug reports and doing nothing for years? Of course, people in the library world seem terrified of complaining about bad vendors. That's another post though.

Ease to modify:
Difficult to judge if you're not experienced. Some software is really easy to configure, but a pain to extend and modify. If it has an API, direct database access, or the code is visible, it's probably easier to modify than something locked on a vendor service.

Ease to configure:
A bit different than the above. Is there any way to change how the software functions? Do you have to stumble through badly documented and bizarre text files? A screen full of unexplained little icons?

Expense of the software itself:
Well, it's a consideration. Really.

Quantity of customers/community:
Are there a lot of people using the software? Some guy and his friend?

Quality of customers/community:
Are they enhancing it, tweaking it, generally loving it? Or do they mostly buy it, install it on some server, and then write a bit in the newsletter and forget about it?

Longevity:
How long has the vendor/community/software been around? How healthy does it look?

You'll probably see some general trends that distinguish open source from proprietary solutions, but you might be surprised when you start examining it. Some open source projects might have a vibrant community with lots of users. Others are dead on arrival in the undergraduate's dorm room. A vendor might have built up an excellent product with a high level of quality. Or it could have transformed into a company full of managers and salesmen striving to milk every dollar out of a product they actually no longer know how to enhance or fix.

So, hopefully we can start moving beyond simplifications.

Sunday, April 29, 2007

The almost lightning talk: Accessibility

At the Code4Lib conference I was asked a few times if I was going to do a lightning talk. I really could only think of a few topics and while I stood around deciding, we ended up getting the perfect number of needed lightning talks.

But all my thinking did give me one idea that seemed well-suited to a lightning talk. And since I figure a good lightning talk should translate to a blog posting pretty well, here it is.

Recently I've been tilting at a particularly obstinate windmill, one that would do the Don of La Mancha proud. There have been complaints about the accessibility of our Voyager OPAC from various groups such as the blind, those with low vision, and those with hand-eye coordination issues. The complaints are right on the money, but sadly there isn't much we can do. With the great help of some good folks at the consortium and our disability offices we're making at least some progress. From this experience I'm picking up a better sense of what's really important for accessibility.

Navigation:
This includes navigation both within the page and across the overall site. You need unique titles for pages and you need to use headers. So a good title might be "Detailed information for Moby Dick: Penguin Ed.". You can position the headers in negative space so they don't appear for sighted users, but headers allow a person using a screen reader to jump around.

Forms:
This is something I've been guilty of. Each form input element should have an id and a corresponding label with a 'for' attribute pointing to the input. I tend to be heavy with ids for later javascript manipulation, but I must admit to an old habit of avoiding labels due to styling quirkiness in browsers that no longer exists. They seem to be fine now, no reason not to use them.

Session timeouts and size of returns:
It seems when you broach a topic like this, you end up getting at least one person who says "Well, who would take more than ten minutes to read through results?" The fact is though, there are many patrons who might. The ideal here might be to have no timeout. Practical reality says try to have the most important pieces of information about each item on the screen. This is an area where it would be interesting to evaluate some of the faceted interfaces. Is there a way to display results so even people who take longer to scan can do so successfully? I really don't know.

Clean HTML:
Really. Really. I still see really bad html, well, everywhere. At least skim through the spec, it's really not that long or hard. This seems actually less important, as long as you don't have a ton of tables. But still, clean html makes for easier post-processing manipulation for everyone, including yourself. I was around in the 90s. And I made clean html. Mostly because I'm lazy ;).

One last word, there is one trap to avoid:

Alt attributes seem to be constantly mentioned by the designers and web people, but don't seem to be as much of a concern for the actual patrons who are blind or have low vision. The important part... have alt text if it's an image that has meaning (it is a control, or it indicates the quality of a return). Putting things like "filler" or "pretty picture of duck" can be quite annoying to some. I'm not quite sure why designers seem to center on this.

Wednesday, February 07, 2007

Stupid XSLT Trick #1: Escape Madness

XSLT is a language that has caused endless amounts of teeth gnashing and wailing. When you combine XSLT with some of the spectacularly bad parts of RSS that make people gnash their teeth and scream, you end up with toothless programmers condemned to speak in whispers due to damaged vocal cords.

But what if I said there was a way out of the darkness, a quick and dirty hack that offered salvation from eating pudding for the rest of your life?

The following code takes advantage of the ability of XSLT to have "modes". If you attended a course on the fundamentals of Computer Science and managed to stay awake, you'll remember your professor mentioning Turing machines. These hypothetical machines travel along a series of 1s and 0s in various states. For example, if it's in state A and sees a one, it might change to state B and go forward two numbers. You can have just the same amount of fun with XSLT.

Seriously though, it allows us to traverse an unpredictable tree and specify behavior in a concise way.

So here's the code:


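<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Root: wrap the escaped copy of the document in a container element. -->
<xsl:template match="/">
 <someAlmostMarkup>
   <xsl:apply-templates mode="escape" />
 </someAlmostMarkup>
</xsl:template>

<!-- Elements: print the start tag, attributes, children, and end tag as text. -->
<xsl:template match="*" mode="escape">
 <xsl:text>&lt;</xsl:text>
 <xsl:value-of select="name()" />
 <xsl:apply-templates mode="escape" select="@*" />
 <xsl:text>&gt;</xsl:text>
 <xsl:apply-templates mode="escape" />
 <xsl:text>&lt;/</xsl:text>
 <xsl:value-of select="name()" />
 <xsl:text>&gt;</xsl:text>
</xsl:template>

<!-- Attributes: print a space followed by name="value". -->
<xsl:template match="@*" mode="escape">
 <xsl:text> </xsl:text>
 <xsl:value-of select="name()" />
 <xsl:text>="</xsl:text>
 <xsl:value-of select="." />
 <xsl:text>"</xsl:text>
</xsl:template>

</xsl:stylesheet>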

And even an example XML doc:


<songlist>
<song>
<Artist status="boozed" born="1-27-1913">J-Live</Artist>
<Genre>Rap</Genre>
</song>
<song>
<Artist status="dead">Phish</Artist>
<Genre>Rock</Genre>
</song>
<song>
<Artist status="energy" born="1-1-812">Radical From Planet G</Artist>
<Genre>Rap</Genre>
</song>
<song>
<Artist status="rocking">Queen</Artist>
<Genre>Rock</Genre>
</song>
</songlist>

In pseudo-XSLT, this is saying

For each element you come across, print:
< the name of the current element (apply templates to the current element's attributes) > (apply templates to the current element's child elements and text nodes) </ the name of the current element >

For each attribute print name="value"

I'm relying on the default template for text nodes, which is to simply print them out, as well as the default behavior of an <xsl:apply-templates> with no select attribute, which is to select both element nodes and text nodes.
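To get a feel for what the templates should emit, the same escaping can be approximated on a single line of the sample document with sed. This is only a stand-in for illustration (sed knows nothing about XML), and the escape_markup function name is made up for this sketch:

```shell
#!/bin/sh
# escape_markup: replace the three characters that matter in element content.
# The ampersand is handled first so the entities added by the later
# substitutions are not escaped a second time.
escape_markup() {
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

printf '%s\n' '<Artist status="dead">Phish</Artist>' | escape_markup
# prints: &lt;Artist status="dead"&gt;Phish&lt;/Artist&gt;
```

That escaped text is what a feed reader would then unescape and treat as markup.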

Notice you'll want to replace match="/" with something else, probably foo.

Of course, most people building up an RSS feed won't want to just stick an escaped element in there. Well, remember, you're just sending out text. That means you can't use all those cool nifty things XSLT can do with XML. But you can always write something that looks like an element by using &lt; and the name() function.

Still don't like it? Complain to the RSS spec designers.

Sunday, January 21, 2007

Why I love the command line.

Last year about this time I did several workshops on how to use the command-line in a Linux environment, ranging from how to log into some servers to more advanced find/xargs manipulations. I've recently been asked to do another series of workshops around the same topic. One issue is that these workshops aren't focused on code monkeys or techies. Instead they're more for folks in the humanities who have found themselves needing web servers and the like.

So I guess underlying this is the fundamental question: "Why do I love the command-line so much?"

There are actually several answers:

It's cheap
It sounds like a crass reason, but working with the command-line only requires a Linux/Unix machine or installing Cygwin on Windows. Free. No need to pay extra money for help documentation (as Microsoft did for a brief while with VBA in its student deals). No need to have an expensive IDE or macro program.

It's quick and easy.
Learning how to use the command-line may take a while, but once you do, it opens up the ability to write quick and dirty commands that would otherwise require some programming. Good example: Recently there was a website where I needed documents that followed a certain filename convention, something like "*foo*bar.doc". The page that had the links had a LOT of other links. So...I do something like lynx -dump -listonly http://thepage.com | grep -o 'http.*foo.*bar\.doc' | xargs -n 1 wget
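One thing worth spelling out about the filter step: grep takes a regular expression, not a shell glob, so the "*" in a filename convention like "*foo*bar.doc" is written ".*" and the dot needs a backslash. A small sketch on a made-up list of links (the example.com URLs and the filter_links name are assumptions for illustration):

```shell
#!/bin/sh
# filter_links: read one URL per line on stdin and keep those whose
# filename follows the "*foo*bar.doc" convention.
filter_links() {
    # As a regex, "foo*bar" would mean "fo", zero or more "o", then "bar".
    # ".*" is the regex spelling of the glob "*", and "\." is a literal dot.
    grep -E 'foo.*bar\.doc$'
}

# Hypothetical sample input standing in for a dumped list of links.
links='http://example.com/docs/2007-foo-annual-bar.doc
http://example.com/docs/foobar.doc
http://example.com/docs/fobar.doc
http://example.com/about.html'

printf '%s\n' "$links" | filter_links
# prints the first two URLs only
```

Piping that output to xargs -n 1 wget would then fetch each matching document in turn.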

Acting on sets of things
Similar to my example above, the command-line makes it pretty easy to act on sets of things, say files that meet certain criteria.
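As a rough sketch of what "acting on a set" looks like in practice, find can select the files and xargs can hand them to another command. The directory layout and file names below are invented for the example, and -print0 / -0 are common GNU and BSD extensions rather than strict POSIX:

```shell
#!/bin/sh
# Build a throwaway directory with a few sample files (hypothetical names).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT          # clean up when the script exits
mkdir -p "$workdir/essays" "$workdir/notes"
printf 'one\ntwo\n' > "$workdir/essays/draft.txt"
printf 'three\n'    > "$workdir/notes/to do.txt"
printf 'ignored\n'  > "$workdir/notes/photo.jpg"

# Select the set: every regular file ending in .txt, at any depth.
# -print0 and xargs -0 keep filenames containing spaces in one piece.
# Act on the set: count the lines in each file.
find "$workdir" -type f -name '*.txt' -print0 | xargs -0 wc -l
```

Swapping wc -l for another command (grep, chmod, cp) changes the action without changing how the set is chosen.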

Easy to automate
It's typically easier to turn a series of steps in the command-line environment into a script that can be automated than to try to automate GUIs with macros.
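A minimal sketch of that idea, assuming the steps are "make a dated archive of a directory": the same commands you would type by hand go into a function inside a script. The backup_dir name, the paths, and the scratch files are all made up for illustration:

```shell
#!/bin/sh
# backup_dir SRC DEST: archive SRC into DEST as a dated tar file and print its path.
backup_dir() {
    src=$1
    dest=$2
    stamp=$(date +%Y-%m-%d)
    archive="$dest/backup-$stamp.tar"
    mkdir -p "$dest"                   # step 1: make sure the destination exists
    tar -cf "$archive" -C "$src" .     # step 2: archive the contents of SRC
    printf '%s\n' "$archive"           # step 3: report where the archive went
}

# Demo on a scratch directory so nothing real is touched.
scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT
mkdir -p "$scratch/site"
printf 'hello\n' > "$scratch/site/index.html"
backup_dir "$scratch/site" "$scratch/backups"
```

Once the steps live in a script, a scheduler can run it unattended; a crontab entry such as 0 2 * * * /home/you/bin/backup.sh (a hypothetical path) would run it nightly at 2 a.m.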

Documentation
For most Linux/Unix systems, each command has documentation, and there are several tools for searching it. (Look at man, info, and apropos for some examples of command-line help systems.)

Cause my inner-geek occasionally revels in complexity
Ok, so occasionally it's cool because I can do weird and complex things. Makes me feel like some great and powerful sorcerer. You can do so much with just a few words.

No guessing what all the little icons mean
Ok, this is one of my pet peeves. It always seems so difficult to figure out what the heck the weird little icons do in software inspired by Windows Office icon mania. Icons can be done well. But they can be done very, very badly.
