2004/11/19

Google Scholar: personal reflections for the digital library

I've just finished poking around the new Google service. Wow! I'm very excited and impressed. I agree with the Free Range Librarian's post on Google Scholar that Google's new scholarly search service changes how the open access, library, and publishing communities will have to interact with their customers.

For the record, indexing scientific publications is not new. Nor is using citation counts as a factor in ranking publications. Eugene Garfield pioneered citation indexing in his seminal work in the 1970s, and researchers in bibliometrics have since studied how scholars learn and adapt ideas through their social networks. PageRank is a natural extension of this work, applying the same idea to the hyperlinked medium of the web. NEC/IST Citeseer has implemented this notion for the computer science community for the past five or six years, but only now has Google cast its net out to deliver this service back to the community. I was wondering when Google was going to tackle this problem, and its arrival is at once a surprise and not a surprise.
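To make the difference between a raw citation count and a PageRank-style score concrete, here is a minimal sketch over a hypothetical five-paper citation graph (the papers `A` to `E` and the `CITES` table are made up for illustration). It is the textbook power-iteration form of PageRank with the commonly used damping factor of 0.85, not how Google Scholar or Citeseer actually rank results.

```python
# Hypothetical toy citation graph: each paper maps to the papers it cites.
CITES = {
    "A": ["C"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
    "E": ["D"],
}


def citation_counts(cites):
    """Garfield-style signal: how many papers cite each paper."""
    counts = {p: 0 for p in cites}
    for refs in cites.values():
        for r in refs:
            counts[r] += 1
    return counts


def pagerank(cites, damping=0.85, iterations=100):
    """Power-iteration PageRank: a citation from a highly ranked paper counts for more."""
    papers = list(cites)
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Papers that cite nothing spread their rank evenly over all papers.
        dangling = sum(rank[p] for p in papers if not cites[p])
        new = {p: (1 - damping) / n + damping * dangling / n for p in papers}
        # Every other paper splits its rank equally among the papers it cites.
        for p, refs in cites.items():
            for r in refs:
                new[r] += damping * rank[p] / len(refs)
        rank = new
    return rank


if __name__ == "__main__":
    counts = citation_counts(CITES)
    ranks = pagerank(CITES)
    for paper in sorted(CITES, key=ranks.get, reverse=True):
        print(paper, counts[paper], round(ranks[paper], 3))
```

Papers C and D are each cited twice, so raw counts tie them; PageRank places D higher because one of its citations comes from C, which is itself well cited.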

In a recent talk at WIDM, Lee Giles, one of the founders of Citeseer, argued for more topic-specific search engines: a general engine like Google was not the way to serve niche needs, and the existence and popularity of Citeseer were proof of that. Google's scholarly incarnation will definitely put this claim to the test. Scholar is a niche search engine, but one from Google itself, with all of Google's hallmarks: speed, a clean user interface and relevant results. What remains to be seen is what auxiliary services will be provided. Work on Citeseer continues with the release of its nascent API (I've gotten an API key but have yet to find time to try it out).

How large is the collection? Les Carr, in the http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/ forum, estimates that about 317 million articles have been indexed, based on a search for the definite article "the". By comparison, Citeseer currently indexes about 700,000 articles. Carr states that the 317 million figure corresponds to about "13 years of the world's scientific and scholarly peer-reviewed research journal output". This figure will almost certainly be revised downward as we get a better understanding of the types of items contained in Scholar.

A number of reference librarians have already blogged quite a bit about Scholar and its deficiencies as a reference tool. They all make good points. Items already on the wishlist include differentiating peer-reviewed from non-peer-reviewed articles and access by controlled vocabularies, as pointed out by Kennedy and Price http://www.resourceshelf.com/2004/11/wow-its-google-scholar.html. The latter is a big one for most librarians. From the perspective of the digital library community, what else can we wish for? There is so much to push for here that I'll add only two items to the wishlist for now: differentiating author names (work from JCDL this year seems like a good bet here) and integration with the OpenURL, DOI and CrossRef standards, so that researchers can seamlessly plug in their institution's authorization to access materials on publishers' websites.

Meanwhile, I think I will go have a nice spot of tea. That's enough excitement for one day.