Friday, September 05, 2008

Musing on statistics and e-resources: ideas for manipulating the data

I was assigned the job of dealing with our usage statistics and have been working through them since 2006. Things have picked up this year (2008): we have already had one budget cut and are looking at continued cuts over the next few years, so the stats have become very important for collection development purposes going into 2009. Alongside that work, I have been combing the literature over these past years to discover how other people are manipulating their usage stats, and in 2008 more information is finally appearing. E-book stats are still relatively new and won’t be commented on here.

These are the manipulations of usage stats that we’re working with for databases: cost per session, cost per search, and cost per download. For databases and collections of e-journals that provide journal reports, we are also looking at the top 10 downloaded titles and the percentage of total downloads they represent, and the number of titles with zero downloads (and with one download) and the percentage they represent, both separately and combined. Since we have access to both vendor websites and a digital repository, we combine the stats before running the above measures, and we also compare vendor website usage stats with the downloads from the digital archive.
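These measures are simple ratios and counts, so a short sketch may help. All of the numbers below (the subscription cost, the session/search/download counts, and the journal titles) are hypothetical illustrations, not our actual data:

```python
# Hypothetical cost-per-use measures for one database.
cost = 12000.00                          # annual subscription cost (made up)
sessions, searches, downloads = 4500, 9800, 3200

cost_per_session = cost / sessions
cost_per_search = cost / searches
cost_per_download = cost / downloads

# Journal-report measures: top-10 share of downloads, and titles
# with zero or one download (titles and counts are made up).
title_downloads = {"Journal A": 900, "Journal B": 410, "Journal C": 0,
                   "Journal D": 1, "Journal E": 250}

total = sum(title_downloads.values())
top10 = sorted(title_downloads.values(), reverse=True)[:10]
top10_share = 100 * sum(top10) / total

zero = sum(1 for n in title_downloads.values() if n == 0)
one = sum(1 for n in title_downloads.values() if n == 1)
pct_zero = 100 * zero / len(title_downloads)
pct_one = 100 * one / len(title_downloads)
pct_zero_or_one = 100 * (zero + one) / len(title_downloads)
```

In practice the vendor-site and repository counts for each title would be summed before any of these ratios are computed, as described above.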

Our work with the usage stats for downloaded journal articles hints at something I’ve started to run across in the literature: the “skewness” found in the titles, that is, how the titles clump (or don’t). One author recommended quantile analysis, which I have to admit I don’t yet understand, as a way to pursue this beyond the basic top-10-downloads measure noted above.
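As I understand it so far, quantile analysis here would mean splitting the per-title download counts into equal-probability groups (quartiles, say) and seeing how concentrated usage is in the top group. A minimal sketch, with made-up download counts:

```python
import statistics

# Hypothetical per-title annual download counts, showing a skewed
# distribution: a few heavily used titles and a long tail.
downloads = [900, 410, 250, 120, 60, 30, 12, 5, 2, 1, 0, 0]

# statistics.quantiles with n=4 returns the three quartile cut points
# (Q1, median, Q3) that split the titles into four equal-sized groups.
q1, median, q3 = statistics.quantiles(downloads, n=4)

# Share of all downloads accounted for by the top quarter of titles.
k = len(downloads) // 4
top_quartile_share = 100 * sum(sorted(downloads, reverse=True)[:k]) / sum(downloads)
```

With numbers like these, the top quarter of titles accounts for the large majority of downloads, which is exactly the kind of clumping a top-10 list hints at but does not quantify.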

Other ideas include something called a “usage factor,” which is the ratio of article downloads to the number of articles available per journal (UKSG @ http://www.uksg.org/usagefactors ), and the number of accesses divided by the number of articles, which will give you an average (?) per session. I’ve even seen mention of the total number of articles in a collection, per year, divided by the total annual accesses for the collection (I’m not sure of the utility of such a measure).
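The usage factor, at least, is straightforward to compute. A sketch with hypothetical journals and counts (these are not real figures):

```python
# Hypothetical per-journal counts: article downloads for the year,
# and the number of articles the journal made available.
downloads_per_journal = {"Journal A": 900, "Journal B": 410}
articles_per_journal = {"Journal A": 300, "Journal B": 820}

# Usage factor = downloads / articles available, per journal.
usage_factor = {j: downloads_per_journal[j] / articles_per_journal[j]
                for j in downloads_per_journal}
```

The point of the ratio is that a journal with fewer raw downloads but a much smaller article base can still show heavier use per article than a larger, more-downloaded title.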

One approach I haven’t used, mostly because of the lack of accuracy in the tools available, is overlap analysis combined with a cost per use analysis.

These measures might be combined with scholarly publishing measures such as ISI Journal Impact Factors, Journalprices.com (which ranks by price per article), Eigenfactor.org, citations by our faculty, and the journals in which our faculty publish, along with other considerations such as green publishers, to give a better picture of use, and the context of use, for collection development or management decisions.

A lot to think about.