Sourceforge: Number of Downloads per Project
UPDATE (2006-Jul-11): As far as I can tell, the problem below has been fixed and the "number of downloads" files are all set for you to use! Enjoy.
UPDATE (2006-Jul-10): Today, an alert user pointed out a problem with the number-of-downloads data that I released yesterday. Sure enough, numeric values greater than 999 contained errant commas, which caused the SQL sum() to add values incorrectly for projects with large numbers of downloads. New files are being generated now, and they'll be posted shortly! Thanks for your patience. (I've removed the bad files, so for the time being the links below won't work.)
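For anyone cleaning similar scraped numbers, here is a minimal sketch of the kind of fix involved: strip the thousands separators before summing. The function name and the sample values are made up for illustration; this is not the actual script used to regenerate the files.

```python
def parse_count(text: str) -> int:
    """Convert a scraped count such as '1,234' to the integer 1234."""
    # Remove thousands separators and surrounding whitespace before converting.
    return int(text.replace(",", "").strip())


# Hypothetical sample values, not real SourceForge data.
raw_values = ["999", "1,234", "12,500"]

total = sum(parse_count(v) for v in raw_values)
print(total)  # 14733
```

Cleaning the values before they reach the database means sum() only ever sees plain integers.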
Original posting:
=================
From the Sourceforge stats page, you can get a variety of measures for a particular project, such as number of downloads, rank, etc.
I have begun releasing these measures (summed per project, over the 60 days between SF scrapes) as Raw Downloads under the SFRawData package. Here are the links, retroactive back to December 2005:
June, 2006 (link to release)
Apr, 2006 (link to release)
Feb, 2006 (link to release)
Dec, 2005 (link to release)
Some obvious applications would be to take a particular group of projects you are interested in (a dozen, a hundred, whatever) and track their number of downloads over these periods.
One thing to understand is how the project download data is collected: on the day(s) that I do the scrape of SF, I collect THAT DAY's 60-day stats page. This means that if the scrape is done on January 1 (for example), the 60-day stats will cover the approximately two-month period before that (i.e., Nov 3 through Dec 31).
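To work out which dates a given release roughly covers, here is a small sketch. It assumes the window is exactly 60 days long and ends the day before the scrape; the exact boundaries SF uses aren't something I can confirm, and the function name is just for illustration.

```python
from datetime import date, timedelta


def stats_window(scrape_day: date, days: int = 60) -> tuple[date, date]:
    """Return the (start, end) dates covered by a stats page.

    Assumption: the window is `days` long and ends the day before the scrape.
    """
    end = scrape_day - timedelta(days=1)
    start = scrape_day - timedelta(days=days)
    return start, end


start, end = stats_window(date(2006, 1, 1))
print(start, end)  # 2005-11-02 2005-12-31
```

Under that assumption a January 1 scrape starts on Nov 2, a day earlier than the approximate Nov 3 in the example above, so treat the boundaries as rough.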
- megan's blog