donated data, yay!

The OSSmole team has successfully imported data from Dawid Weiss' crawl of Sourceforge from December 2004. (Moles: This information has datasourceID=4 in the database.) Thanks, Dawid, for making your data available and for donating it to this project!

Anyone else who wants to be a mole: if you have data from ANY open source repository, ANY time frame, please let us know if you'd like to donate it. You can email Megan Conklin (mconklin AT elon DOT edu) or James Howison (jhowison AT syr DOT edu) or hop on IRC (irc.yuggoth.org #ossmole) to chat about what you have, and how it can be integrated into the OSSmole repository.

the good and the bad news

There's good news and there's bad news. The bad news is that we've found some problems with the developer data collected during the October 2004 run: the last half of the letter 'z' (specifically, projects with unixnames > 'zin') wasn't collected. This means that there could be other problems lurking under the surface of the data for the October run, such as other missing chunks of information. Yuck.

The good news is multifold:
(a) we found the problem (yay);
(b) we have an active developer community that is using the data for real problems and is able to fix things like this when they arise;
(c) the newer developer runs (January) don't seem to be affected;
(d) we're adding in some donated data soon that will help fill in holes like this;
(e) we've got a brand new collection engine in alpha right now that will get the runs done faster and more accurately, thus reducing these risks in the future!
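
For moles who want to check a run for this kind of gap themselves, here is a minimal sketch in Python. It assumes you have two plain lists of project unixnames: the full project list, and the names that actually show up in a developer run. The function name and the sample names are made up for illustration and are not part of the OSSmole code.

```python
def find_missing_tail(expected, collected):
    """Return expected names that sort after the last collected name.

    A non-empty result suggests the run stopped early, as happened
    with unixnames > 'zin' in the October 2004 developer data.
    """
    collected_set = set(collected)
    if not collected_set:
        # Nothing was collected at all, so everything is missing.
        return sorted(expected)
    last_seen = max(collected_set)  # plain string comparison
    return sorted(name for name in expected if name > last_seen)


# Hypothetical sample data, for illustration only.
full_list = ["abc", "zin", "zip", "zope"]
october_run = ["abc", "zin"]
missing = find_missing_tail(full_list, october_run)
print(missing)
```

This only catches a run that was cut off at the end. To find holes in the middle of a run, a simple set difference such as `sorted(set(full_list) - set(october_run))` is probably the better starting point.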

a september pattern at sourceforge?

Last September, right about the time we started up OSSmole, Sourceforge sent out a monthly email newsletter that included this observation:

(9/20/2004) Welcome to the September sitewide email. September is typically our busiest month for new traffic on SF.net. Students are arriving at college and getting on high speed connections. Open Source developers and consumers of Open Source software are returning from their summer vacations. If you are back from vacation, it's good to have you back.


I was of course reminded of The Long September on usenet. Early participants on usenet began noticing that every September a new wave of clueless college students would flood in, ask dumb questions, and make life miserable for a couple of months each year (until 1993, when usenet was made available to AOL users and the so-called "Long September" was born).

The SF message above talks about lots of "new traffic" on SF.net during September, but I'm not sure how "new traffic" is defined. It could mean generic web site traffic, as in "new users visiting the web site". Or it could mean "new projects being built", or it could mean "new users signing up". Or it could be some vestiges of September memories from usenet. Or, most likely, some combination of all of these things.

some graphs

I'm experimenting with making some graphs of the data we've collected.

Here is a graph of the growth in programming languages used on Sourceforge projects from October 2004 until January 2005.



Here is a graph showing the growth in the number of projects added per month to the Sourceforge repository from November 1999 until January 2005.



Here is a graph showing growth in total numbers of Sourceforge projects, by month, from November 1999 until January 2005.


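
The two project-growth graphs are related: the total is just the running sum of the projects added per month. Here is a rough sketch of that calculation in Python, assuming each project comes with a registration month as a 'YYYY-MM' string. That input format and the function name are hypothetical, and not necessarily how the OSSmole database stores things.

```python
from collections import Counter
from itertools import accumulate


def monthly_growth(registration_months):
    """Return (month, projects_added, running_total) tuples in date order.

    `registration_months` is an iterable of 'YYYY-MM' strings, one per
    project. Months with no new projects do not appear in the result.
    """
    counts = Counter(registration_months)
    months = sorted(counts)  # 'YYYY-MM' strings sort chronologically
    added = [counts[m] for m in months]
    totals = list(accumulate(added))
    return list(zip(months, added, totals))


# Hypothetical sample data, for illustration only.
sample = ["1999-11", "1999-11", "1999-12", "2000-01", "2000-01", "2000-01"]
for month, added, total in monthly_growth(sample):
    print(month, added, total)
```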

January 2005 Summary Reports

The January 2005 Sourceforge summary reports have been posted:

sourceforge raw data files released

We have issued a new release of raw data files on sourceforge projects.

  • Raw data: full lists of projects and programming languages used, operating systems used, target user interfaces, etc. The information included is for the October 2004 run and the January 2005 run. Download raw data files here.
  • Raw developer data: lists of all developers; list of projects and the developers on each. The information included is for the October 2004 run and the January 2005 run. Download raw developer files here.

full project list released

We have released the full list of sourceforge project names as of 28-Jan-2005. You can get the list here.

Fall developer data fixed

The sfRawDeveloperData12-Nov-2004 files have been fixed. There was a corrupt file (raw developer data was messed up) in this release, so I have fixed that and re-released the tar.gz file. Enjoy! Download it here.

developer data files

sfRawDeveloperData12-Nov-2004 includes raw data files generated from the ossmole databases. Included data files are: sfRawDeveloperData12-Nov-2004.txt, sfRawDeveloperProjectsData12-Nov-2004.txt. These files list the developers on sourceforge (loginname, real name, sourceforge email) and which developer works on which project (and what role they occupy on that project).

october raw data

sfRawData21-Oct-2004 includes raw data files generated from the ossmole databases. Included data files are: sfRawLicenseData21-Oct-2004.txt, sfRawStatusData21-Oct-2004.txt, sfRawProjectData21-Oct-2004.