On Google and Privacy and Library Search

A quick comment on the latest privacy analysis conducted by Privacy International. Their methodology is a very interesting read because it appears to be directly counter to what Google is doing with search and data aggregation. Two points in particular:

Data collection and processing
What type of information does the site collect, with and without consent? On some sites the personal information submitted by customers is necessary (e.g. billing addresses) but there are many sites that collect information that may be unnecessary (age, marital status, home address, preferences, medical information, extraneous financial information) from customers without adequate information about why this information is needed and how it is used. Some companies may collect and mine other information, such as viewing habits and preferences (e.g. musical genre, lifestyle choices etc.). Here, it is also important to note the status of ‘Internet Protocol Addresses’ (IP addresses). Many companies state that they see this data as non-personal - even anonymous - information, permitting them to collect and track users’ movements around the site to determine what a specific user reads. This approach permits profiling of a user’s habits and interests.
Data retention
Some companies delete the information they collect once it is no longer needed. Other companies are not quite so clear, and a few sites are quite open that they do not intend to delete personal information at all (or at least not until they are ready to do so). With increased consumer concern about information breaches from stolen and lost computing resources, or through malicious hackers gaining access to resources, companies need to be aware that the risk to their market position and customer base may be proportionate to the amount of personal data they store.

Both of those points directly contradict how I use Google daily. I rely on Google in my ceaseless and compulsive drive deep into the Long Tail. For me, it’s all about localization and frankly, I live in a small community. That’s true physically, esthetically, professionally, and personally. And I don’t think I’m alone in that.

In my experiment this year (using Google tools for email, blog reading, calendaring, and more) what they are doing for me is aggregating my online experience. If you’ve ever had the frightening opportunity of typing in your email password on a third-party social site, you know what I mean. That screams privacy suicide, but it’s also incredibly effective. Just today I took the plunge on Facebook, and by comparing their database to my Gmail contact history I found several people I would like to be connected with. It’s a genuine service that Google is offering, and it relies on their having collected extraneous information from me, using it in a way that I couldn’t have foreseen, and then storing it long-term even though I can’t remember ever giving them permission to do so [although I'm sure a Google lawyer would correct me on that point]. So I don’t think Google is in the wrong on those two points, at least in terms of their long-term vision.

Here’s where I think they’re doing us a disservice: “Openness and Transparency” and “Responsiveness” (from Privacy International’s metrics). If Google were to let me into their system as an account holder and allow me to selectively weed out information that I didn’t want on their servers, I would be much more comfortable. Of course this type of activity would be a huge albatross for a private company to take on, especially taking into account the changeability of Internet business. They would have to make sure that all services offered, now and in the future, were designed in a way that allows me to access the underlying data regardless of how it is structured and where it’s located. I’m assuming they wouldn’t take on this task unless forced to.

Here’s where academic digital libraries could lead (assuming the availability of sufficient resources, of course) by enabling our researchers’ natural and vital drive into various “long tail” topics using socially derived data as jumping-off points. This could be done by building a consortial database of personal information (demographics, searches, keywords, citations used, email contacts, professional memberships, conferences attended, etc.) with the caveat that members can easily weed personalized data at will. And this data would need to be stored and accessible in perpetuity.
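To make the "weed at will" idea a little more concrete, here is a rough Python sketch of what such a store might look like. Everything in it is hypothetical: the names `ConsortialStore`, `Record`, `view`, and `weed`, and the `category`/`source` fields, are my own illustrations and not any existing library system. It only shows the two properties I care about: a member can see everything held about them, and can remove any of it selectively.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Record:
    category: str  # hypothetical labels, e.g. "search", "citation", "conference"
    value: str
    source: str    # hypothetical: the member institution that contributed it


@dataclass
class ConsortialStore:
    # member id -> everything the consortium holds about that member
    _records: Dict[str, List[Record]] = field(default_factory=dict)

    def add(self, member_id: str, record: Record) -> None:
        self._records.setdefault(member_id, []).append(record)

    def view(self, member_id: str) -> List[Record]:
        """Return a copy of everything stored about the member."""
        return list(self._records.get(member_id, []))

    def weed(self, member_id: str, should_remove: Callable[[Record], bool]) -> int:
        """Delete the member's records that match the predicate.

        Returns the number of records removed.
        """
        current = self._records.get(member_id, [])
        kept = [r for r in current if not should_remove(r)]
        self._records[member_id] = kept
        return len(current) - len(kept)


store = ConsortialStore()
store.add("researcher-1", Record("search", "long tail privacy", "Library A"))
store.add("researcher-1", Record("email_contact", "colleague@example.org", "Library A"))
store.add("researcher-1", Record("conference", "Example Conference", "Library B"))

# The member decides that email contacts should not be kept.
removed = store.weed("researcher-1", lambda r: r.category == "email_contact")
remaining = [r.category for r in store.view("researcher-1")]
```

The hard part is not this interface but the promise behind it: every service built on the data, now and later, would have to route through something like `view` and `weed`, which is exactly the burden I doubt a private company would accept voluntarily.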

It would be a dream come true for researchers, and potentially our worst nightmare in the wrong hands.
