On Google and Privacy and Library Search

10-Jun-07

A quick comment on the latest privacy analysis conducted by Privacy International. Their methodology is a very interesting read because it appears to be directly counter to what Google is doing with search and data aggregation. Two points in particular:

Data collection and processing
What type of information does the site collect, with and without consent? On some sites the personal information submitted by customers is necessary (e.g. billing addresses) but there are many sites that collect information that may be unnecessary (age, marital status, home address, preferences, medical information, extraneous financial information) from customers without adequate information about why this information is needed and how it is used. Some companies may collect and mine other information, such as viewing habits and preferences (e.g. musical genre, lifestyle choices etc.) Here, it is also important to note the status of ‘Internet Protocol Addresses’ (IP addresses). Many companies state that they see this data as non-personal - even anonymous - information, permitting them to collect and track users’ movements around the site to determine what a specific user reads. This approach permits profiling of a user’s habits and interests.
Data retention
Some companies delete the information they collect once it is no longer needed. Other companies are not quite so clear, and a few sites are quite open that they do not intend to delete personal information at all (or at least not until they are ready to do so). With increased consumer concern about information breaches from stolen and lost computing resources, or through malicious hackers gaining access to resources, companies need to be aware that the risk to their market position and customer base may be proportionate to the amount of personal data they store.

Both of those points directly contradict how I use Google daily. I rely on Google in my ceaseless and compulsive drive deep into the Long Tail. For me, it’s all about localization and frankly, I live in a small community. That’s true physically, esthetically, professionally, and personally. And I don’t think I’m alone in that.

In my experiment this year (using Google tools for email, blog reading, calendaring, and more) what they are doing for me is aggregating my online experience. If you’ve ever had the frightening opportunity of typing in your email password on a third-party social site, you know what I mean. That screams privacy suicide, but it’s also incredibly effective. Just today I took the plunge on Facebook and by comparing their database to my Gmail contact history I found several people I would like to be connected with. It’s a genuine service that Google is offering and it relies on their having collected extraneous information from me, using it in a way that I couldn’t have foreseen, and then storing it long-term even though I can’t remember ever giving them permission to do so [although I'm sure a Google lawyer would correct me on that point]. So I don’t think Google is in the wrong on those two points, at least in terms of their long-term vision.

Here’s where I think they’re doing us a disservice: “Openness and Transparency” and “Responsiveness” (from Privacy International’s metrics). If Google were to let me into their system as an account holder and allow me to selectively weed out information that I didn’t want on their servers, I would be much more comfortable. Of course this type of activity would be a huge albatross for a private company to take on, especially taking into account the changeability of Internet business. They would have to make sure that all services offered, now and in the future, were designed in a way that allows me to access the underlying data regardless of how it is structured and where it’s located. I’m assuming they wouldn’t take on this task unless forced to.

Here’s where academic digital libraries could lead (assuming the availability of sufficient resources, of course) by enabling our researchers’ natural and vital drive into various “long tail” topics using socially derived data as jumping-off points. This could be done by building a consortial database of personal information (demographics, searches, keywords, citations used, email contacts, professional memberships, conferences attended, etc.) with the caveat that members can easily weed personalized data at will. And this data would need to be stored and accessible in perpetuity.

It would be a dream come true for researchers, and potentially our worst nightmare in the wrong hands.

Libraries offering Video on Demand

07-Jun-07

There’s an interesting, but brief, writeup on the new Overdrive and Recorded Books services in this month’s print edition of Library Journal.

Recorded Books, a supplier of downloadable material to libraries, plans to announce an agreement with major studios at the annual conference of the American Library Assn. June 21-27 in Washington, according to Recorded Books VP Brian Downing.

The company already offers titles from indie house Film Movement along with public domain and other films, such as The Autobiography of Miss Jane Pittman. The company also offers travel and cooking programming and will soon add children’s entertainment.

Called MyLibraryDV the download service went live in February and already has almost 600 libraries signed up, Downing says, including all the libraries in the entire state of Wyoming. Some libraries are seeing “thousands of downloads and thousands of users,” he says.

Exciting stuff. The service seems to be targeting public libraries, but we have several academic classes which require viewing mainstream and indie films as part of the curriculum.

Jott to Self: Future of the Library Interface?

24-May-07

I’ve been playing around with the beta release of Jott [http://jott.com/] and it’s an interesting example of the potential for the cell phone to become the next search interface. It’s a free service: you give it a call, it recognizes your phone number as belonging to an account holder, and then it transcribes your message and sends it to yourself or others as an email.

Playing around with the service, it definitely has some downsides. My first test message wasn’t very well planned and I haltingly said something like “umm, test message to myself”. When I got the email, the transcription read “Tapped own weapons”. A couple more emails like that and I’ll get a visit from the FBI!

I can see something like this being used in a library if a patron wanted to save an overheard citation (as an example) and email it to their account. So my next message was the following:

Pitts, M. G., & Browne, G. J. (2007). Improving requirements elicitation: An empirical investigation of procedural prompts. Information Systems Journal, 17, 89-110.

The transcription took several minutes, or at least the email took several minutes to arrive, and when it did arrive it read like this:

Pip and Brown 2007 improving requirements [unclear speech, please listen] and imperial investigation of procedural prompts. Information system Journal IM-17, page of 89 to 110.

While that’s not perfect, there’s enough information there to figure out what I was trying to remember. Also, the email arrives with a speaker/audio icon which allows me to listen to the original message.

It seems we’re a long way off because of the quality of transcriptions, but applying this technology to an OPAC search doesn’t seem quite so distant now.

The Landscape of User Research

09-May-07

I’m currently enjoying this book:

Mulder, S., & Yaar, Z. (2007). The user is always right: A practical guide to creating and using personas for the web. Berkeley, CA: New Riders.

I like it mostly because of the way the authors break down the persona creation process. Their landscape of user research chart (pg. 40) is a good example of their technique:

The Landscape of User Research

Part of the reason this is useful is because of its delineation between insight and validation and between what people say and what they do. I think a lot of times groups approach usability from the standpoint of tools, i.e. “we have this survey software, let’s conduct some user research”. This chart highlights the need to decide ahead of time exactly what your goals are for whatever study you’re embarking on.

For example, if your project needs funding, and a validation of what your users are currently putting up with will help highlight that, then a user survey is entirely appropriate. The Montana Memory Project is one such animal, because it’s a software platform in search of sponsorship, but the users are fairly well on board. A survey should go a long way to validate the depth of enthusiasm for the project as well as highlight gaps in service to users across the state.

On the other hand, this chart also points out the fact that surveys cannot satisfy all research needs in this area. For that you need an open-ended format, preferably face to face where “accidents” can happen. A really good example was a study I worked on in the past where I was interviewing a faculty member about to retire. He admitted that he uses library article databases merely to find phone numbers of people researching in his field so he could be sure to keep track of their work. That’s a use case I would never have dreamed of for this project, and it probably wouldn’t have come up in a survey because it was tangential to what we were studying at the time.

Lastly, I like this chart because it points out the breadth, depth, and potential of usability studies. Good stuff.

Game-like Elicitation Methods

23-Mar-07

I found a link to MindCanvas today by browsing the site for this week’s Information Architecture Summit in Vegas. Their business is built around a meme called Game-like Elicitation Methods (or GEM) and their manifesto makes the following claim:

The overuse of Likert scales must stop. We understand that Likert scales have their uses, but every study to understand people does not need Likert scales. Likert scales are problematic in many ways - response biases, sheer rating boredom, statistical issues (ordinal or interval- depends on who you ask) are just some of them!


I like that sentiment quite a bit. A quick browse through their online demos reveals examples of implementations that are actually a lot of fun. I love their divide-the-dollar method and would love to use it; if only they were creating OSS solutions instead of running a consultancy. They also list their influences:

  1. Zaltman, Gerald. How Customers Think: Essential Insights into the Mind of the Market. [Amazon]
  2. Russell Bernard and his research methods in cognitive psychology
  3. James Surowiecki’s Wisdom of Crowds
  4. Malcolm Gladwell
  5. Tufte (of course)
  6. The games at Popcap
  7. Doug Engelbart’s ideas for Augmentation, not automation
  8. Ben Shneiderman’s work with visualization
  9. Gmail (which I’ll report on at the end of the month)
  10. SurveyMonkey

It’s interesting how they’ve mixed a dot-com approach into usability studies.

Free academic images from ARTstor

19-Mar-07

Found via Academic Commons: ARTstor is planning to start offering high-def digital images for academics soon. This is great news, and somewhat surprising since it seems many companies are trying to crack down on the availability of digital content: Viacom’s lawsuit against YouTube and the RIAA’s influence regarding potential fees levied against online radio are two recent examples.

Expert Usability Review vs. Usability Testing

16-Mar-07

Lisa Halabi has a very good article at usabilitynews.com on the common question of whether or not to implement usability guidelines (aka an expert usability review) in place of conducting usability testing. An expert review is tempting because a reasonable person would expect to be able to codify what is and what is not good usability. Once that’s taken care of, all that would be left is to filter out usability bugs by way of a checklist. Take for example the excellent guidelines created by the US Dept. of Health and Human Services at usability.gov. It’s a fascinating read and very helpful.

Halabi differentiates the two techniques this way:

- An expert usability review is when a usability specialist inspects a website to identify potential usability problems.
- Usability testing involves getting people from the target audience to evaluate your site whilst performing tasks.

There are so many variables involved for the expert reviewer that getting an accurate read on the real issues is very difficult:

Often, expert reviews will:
- Miss usability issues that arise during usability testing
- Find some issues that usability testing didn’t
- Report false alarms (i.e. not real issues)

Getting users to talk to you is where real issues come to the fore. By definition, if a user has a problem with a designer’s solution, it’s a usability problem and not a false drop. Recording and sharing actual users interacting with a site is definitely the best and most compelling way to go.

re: Not so fast, broadband providers tell big users

14-Mar-07

Comcast is shutting down service to patrons whose bandwidth use reaches a high but undefined level, according to this Boston Globe article. They’re refusing to let customers know what the cutoff point is, claiming that the number changes based on network capacity. One section I thought interesting:

Feddeman declined to say where Comcast draws the line on too much Internet usage, instead saying the amount of data that could trigger a warning call would be roughly the equivalent of 13 million e-mail messages or 256,000 photos a month. Although those files vary in size, a typical photo file size is 1 to 2 megabytes, meaning that excessive users are downloading hundreds of gigabytes per month.

Interesting that they are using an “e-mail” as a benchmark. Taking a look at this Wikipedia article on streaming media, it appears that the more common unit of measurement is not an “e-mail” but mebibytes, gibibytes, and tebibytes.
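For the curious, here is a quick back-of-the-envelope check of the quoted figures, sketched in Python. It assumes the article means decimal megabytes (1 MB = 10^6 bytes), which it doesn't actually say, and the function names are my own. It also works out what size of “e-mail” the 13 million figure would imply if both benchmarks describe the same volume of data, which is another assumption on my part.

```python
# Back-of-the-envelope check of the Comcast figures quoted above.
# Assumption: photo sizes are decimal megabytes (1 MB = 10**6 bytes).

MB = 10**6    # decimal megabyte, in bytes
GB = 10**9    # decimal gigabyte, in bytes
GIB = 2**30   # gibibyte, in bytes

PHOTOS_PER_MONTH = 256_000       # figure quoted in the article
EMAILS_PER_MONTH = 13_000_000    # figure quoted in the article


def monthly_bytes(photo_size_mb):
    """Total bytes per month if every photo is photo_size_mb megabytes."""
    return PHOTOS_PER_MONTH * photo_size_mb * MB


def implied_email_bytes(photo_size_mb):
    """Average e-mail size if 13 million e-mails equal the same volume."""
    return monthly_bytes(photo_size_mb) / EMAILS_PER_MONTH


if __name__ == "__main__":
    for size_mb in (1, 2):
        total = monthly_bytes(size_mb)
        print(
            f"{size_mb} MB photos: {total / GB:.0f} GB "
            f"({total / GIB:.1f} GiB) per month, "
            f"implied e-mail size {implied_email_bytes(size_mb) / 1000:.1f} kB"
        )
```

Under those assumptions the threshold lands somewhere in the low hundreds of gigabytes per month, which matches the article’s “hundreds of gigabytes” claim, and the implied e-mail is only a few tens of kilobytes.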

Google’s forthcoming Ocropus

14-Mar-07

Found via DIGLIB: it appears that Google is not only working on an OCR package, they’re also planning on releasing it as open source. Ocropus (open source document analysis and OCR system) currently runs only on Linux, but will surely be ported once it finds its legs. This is one to watch.

http://code.google.com/p/ocropus/

Is our enthusiasm for Library 2.0 justified?

09-Mar-07

There’s an interesting article out in Information Systems Journal about the difficulties encountered while trying to get a list of requirements from users. The problem doesn’t affect only users; it also affects the investigators and designers practicing user-centered design:

Pitts, M. G., & Browne, G. J. (2007). Improving requirements elicitation: An empirical investigation of procedural prompts. Information Systems Journal, 17, 89-110. [via Blackwell Synergy] or [via World Cat]

The below figure is a chart from the article which outlines the various cognitive limitations that users and designers have with processing information. While a lot of this may not be new, it’s handy to see it all in one place.

Limitations on Information Processing

This is a great list of pitfalls to keep in mind while designing from a user’s perspective. For example, and here I’m thinking about Library2.0, I have this nagging worry that our users may not want the type of information interactivity that librarians love. The same thing holds for wikis; I love wikis, but if you’ve ever tried to get a small group of busy people to use a wiki, it can be tough sledding. So what exactly is going on? To what extent do our users want to get into metadata, tagging, reviewing, etc.? While I believe they do want to, I also think it will take a series of usability studies to find out why exactly.