almost there...

I'm leaving for the beach, but as soon as we get Internet set up over there (this afternoon?), I'll pick back up. Finished 'm', 'n', and 'o' last night, so we're on "p".


OSSmole gives a huge shout-out to swik, an open wiki-like database about open source projects.

Here's the OSSmole page on swik, and hopefully it'll reflect my comments here about the July run happening at this very moment... JULY DATA should be done before I leave for the beach next week, yay. I'm on 'g' right now.

july data - stay tuned

July data coming soon... I'm running the scrape of SF e'en as we speak... Then to add freshmeat!

java spider code released

We've released our spider code (Java) and a nice library (with documentation!) so you can spider Sourceforge yourself. Here is the Java library file (API), and you can go to CVS (project=OSSmoleJava) to see the source. Enjoy! Special thanks to gconklin for this code.

April 2005 Summary Reports

The April 2005 Summary Reports have been posted:

How to use this data

(Note: This message is updated periodically with new info.)

The FLOSSmole project provides data about:

(a) all projects on Sourceforge
(b) all developers on Sourceforge
(c) all projects on Sourceforge AND who is developing for them, their roles, whether they are an administrator, etc.
(d) all Sourceforge projects and their programming languages, operating systems, user interfaces, end user audience, registration dates, etc (new: donations!)
(e) Edit, Oct-2005: much of the above, but for Freshmeat, also
(f) Edit, Jul-2006: also, Rubyforge
(g) Edit, Jul-2006: also, Objectweb
(h) Edit, Jan-2007: also, Free Software Foundation directory
(i) Edit, Feb-2007: also, SourceKibitzer donates data

We have done runs on Sourceforge starting in early 2004 and we have received donated Sourceforge data for December 2004 from Dawid Weiss in Poland.

We have also begun scraping Freshmeat, Rubyforge, and Objectweb, and we receive data from SourceKibitzer. Get the complete list of data sources here. (This is a list of each of our scrapes, its date, and its "datasource" ID.) The abbreviations for the forges are RF (Rubyforge), SF (Sourceforge), FM (Freshmeat), OW (Objectweb), FSF (Free Software Foundation Directory), SK (SourceKibitzer).
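If you want to group the data source list by forge in your own scripts, here is a minimal sketch in Python. The abbreviations come from the list above; the record layout (ID, abbreviation, date) and the sample IDs are made up for illustration and are not the real FLOSSmole datasource IDs.

```python
from collections import defaultdict

# Forge abbreviations as listed in the post above.
FORGES = {
    "RF": "Rubyforge",
    "SF": "Sourceforge",
    "FM": "Freshmeat",
    "OW": "Objectweb",
    "FSF": "Free Software Foundation Directory",
    "SK": "SourceKibitzer",
}


def forge_name(abbrev):
    """Return the full forge name for an abbreviation such as 'sf' or 'SF'."""
    return FORGES[abbrev.strip().upper()]


def group_by_forge(records):
    """Group (datasource_id, forge_abbrev, date) tuples by full forge name.

    The tuple layout is an assumption made for this sketch.
    """
    grouped = defaultdict(list)
    for datasource_id, abbrev, date in records:
        grouped[forge_name(abbrev)].append((datasource_id, date))
    return dict(grouped)


# Hypothetical sample rows, not real datasource IDs.
sample = [
    (1, "SF", "2005-04"),
    (2, "FM", "2005-10"),
    (3, "sf", "2006-07"),
]
by_forge = group_by_forge(sample)
```

Here `by_forge["Sourceforge"]` holds the two Sourceforge rows, and an unknown abbreviation raises a `KeyError` rather than being silently dropped.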

April 2005 Raw Data Released

I've released the raw data files for the April 2005 Sourceforge scrape.
  1. Get the Raw List of Projects (full list of SF projects, registration dates, etc)
  2. Get the Raw Project Data (includes operating systems, programming languages, etc)

data donations

Thanks to all who have donated and used FLOSSmole data. Here is a short list of who has supported us or donated data so far:

  • Partially supported by NSF Grants 03-41475 and 04–14468. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

  • Partial data donated by Dawid Weiss, Institute of Computing Science, Poznan University of Technology, from research funded by the European Commission via FP6 Co-ordinated Action Project 004337 in priority IST-2002- (CALIBRE).

  • Partial data donated by Megan Conklin, Elon University, Department of Computing Sciences.

  • Partial data donated by Kevin Crowston and James Howison, Syracuse University.

  • Partial data donated by Mark Kofman and Anton Litvinenko of

  • [Your name here!]

Sourceforge Bug Tracker data and analysis scripts

Just wanted to put in a pointer to the data and scripts that we used for our recent First Monday paper, The social structure of Free and Open Source software development. This data is part of OSSmole, and Megan and I are currently working away at merging our databases. But it is available now on the Syracuse FLOSS research site if people want to jump in.

Graphs of developer counts over time

As an example of the data and analysis in the system, here is a graphic of developer counts over time, taken from the Project Summaries pages, developed by James Howison and Kevin Crowston using the OSSmole data. The time series are sorted, programmatically, into 6 categories: constantly rising, mostly rising, not trending, mostly falling, consistently falling, and dead projects.
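To give a feel for how such a sort could work, here is a rough sketch in Python. It is not the code used for the Project Summaries pages: the rules below (a project is "dead" when its latest count is zero, and "mostly" means at least two thirds of the non-flat steps) are assumptions picked only to illustrate the six categories.

```python
def classify_trend(counts):
    """Sort one series of developer counts into one of six categories.

    All thresholds here are illustrative assumptions, not the rules
    used for the actual Project Summaries graphs.
    """
    if len(counts) < 2:
        raise ValueError("need at least two observations")

    # Assumption: a project is "dead" when its latest count is zero.
    if counts[-1] == 0:
        return "dead"

    # Differences between consecutive observations.
    steps = [later - earlier for earlier, later in zip(counts, counts[1:])]
    ups = sum(1 for s in steps if s > 0)
    downs = sum(1 for s in steps if s < 0)
    moves = ups + downs

    if moves == 0:
        return "not trending"          # completely flat series
    if downs == 0:
        return "constantly rising"     # rises at least once, never falls
    if ups == 0:
        return "consistently falling"  # falls at least once, never rises

    # Assumption: "mostly" means at least 2/3 of the non-flat steps.
    if 3 * ups >= 2 * moves:
        return "mostly rising"
    if 3 * downs >= 2 * moves:
        return "mostly falling"
    return "not trending"


# Made-up series for illustration.
examples = {
    "steady growth": classify_trend([1, 2, 3, 5]),
    "one dip": classify_trend([1, 2, 3, 2, 4]),
    "abandoned": classify_trend([3, 2, 0]),
}
```

A real version would probably also want to handle gaps between scrapes and very short series more carefully.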