OSSmole gives a huge shout-out to swik, an open wiki-like database about open source projects.

Here's the OSSmole page on swik, and hopefully it'll reflect my comments here about the July run happening at this very moment... JULY DATA should be done before I leave for the beach next week, yay. I'm on 'g' right now.

july data - stay tuned

July data coming soon... I'm running the scrape of SF e'en as we speak... Then to add freshmeat!

java spider code released

We've released our spider code (Java) and a nice library (with documentation!) so you can spider Sourceforge yourself. Here is the Java library file (API), and you can go to CVS (project=OSSmoleJava) to see the source. Enjoy! Special thanks to gconklin for this code.

How to use this data

(Note: This message is updated periodically with new info.)

The FLOSSmole project provides data about:

(a) all projects on Sourceforge
(b) all developers on Sourceforge
(c) all projects on Sourceforge AND who is developing for them, their roles, whether they are an administrator, etc.
(d) all Sourceforge projects and their programming languages, operating systems, user interfaces, end user audience, registration dates, etc (new: donations!)
(e) Edit, Oct-2005: much of the above, but for Freshmeat, also
(f) Edit, Jul-2006: also, Rubyforge
(g) Edit, Jul-2006: also, Objectweb
(h) Edit, Jan-2007: also, Free Software Foundation directory
(i) Edit, Feb-2007: also, SourceKibitzer donates data

We have done runs on Sourceforge starting in early 2004 and we have received donated Sourceforge data for December 2004 from Dawid Weiss in Poland.

We have also begun scraping Freshmeat, Rubyforge, and Objectweb, and we receive data from SourceKibitzer. Get the complete list of data sources here. (This is a list of each of our scrapes, with its date and its "datasource" ID.) The abbreviations for the forges are RF (Rubyforge), SF (Sourceforge), FM (Freshmeat), OW (Objectweb), FSF (Free Software Foundation Directory), SK (SourceKibitzer).

April 2005 Raw Data Released

I've released the raw data files for the April 2005 Sourceforge scrape.
  1. Get the Raw List of Projects (full list of SF projects, registration dates, etc)
  2. Get the Raw Project Data (includes operating systems, programming languages, etc)
  3. Get the Raw Developer Data (includes developer list and developer-projects list, with new administrative flag!)

This is good stuff! Summary reports coming soon.

data donations

Thanks to all who have donated and used FLOSSmole data. Here is a short list of who has supported the project or donated data so far:

  • Partially supported by NSF Grants 03-41475 and 04-14468. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

  • Partial data donated by Dawid Weiss, Institute of Computing Science, Poznan University of Technology, from research funded by the European Commission via FP6 Co-ordinated Action Project 004337 in priority IST-2002- (CALIBRE), http://www.calibre.ie/

  • Partial data donated by Megan Conklin, Elon University, Department of Computing Sciences.

  • Partial data donated by Kevin Crowston and James Howison, Syracuse University.

  • Partial data donated by Mark Kofman and Anton Litvinenko of SourceKibitzer.org

  • [Your name here!]

Sourceforge Bug Tracker data and analysis scripts

Just wanted to put in a pointer to the data and scripts that we used for our recent First Monday paper, The social structure of Free and Open Source software development. This data is part of OSSmole, and Megan and I are currently working away at merging our databases. But it is available now on the Syracuse FLOSS research site if people want to jump in.

Graphs of developer counts over time

As an example of the data and analysis in the system, here is a graphic of developer counts over time, taken from the Project Summaries pages, developed by James Howison and Kevin Crowston using the OSSmole data. The time series are sorted, programmatically, into 6 categories: consistently rising, mostly rising, not trending, mostly falling, consistently falling, and dead projects.

This picture only shows curves and categories for a sample of 120 projects. This can be compared against the categorizations of the total population of Sourceforge projects as is shown in the histogram.

The sample of 120 projects has substantially more consistently rising projects, so it seems clear that the sample is generally more successful. The large unchanging (or not trending) category in the total population reflects the fact that the mode is NA,NA,1,1,1 and our finding that 65,561 of the total 98,568 projects (i.e. 67%) seen over 5 years have never had more than 1 developer.
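
To give a feel for what sorting a time series into these categories could look like, here is a minimal sketch in Python. It is not the code behind the Project Summaries pages: the function name classify_trend, the use of None for NA, and every rule below (counting rises and falls between consecutive observations, calling a project dead when its last observed count is 0) are assumptions made purely for illustration.

```python
from typing import Optional, Sequence


def classify_trend(counts: Sequence[Optional[int]]) -> str:
    """Sort one developer-count series into one of six trend categories.

    None stands for NA (no observation for that period). All rules here
    are illustrative assumptions, not the project's actual method.
    """
    # Drop NA entries and keep only the periods where a count was seen.
    observed = [c for c in counts if c is not None]

    # Assumption: a project is "dead" if its last observed count is 0.
    # A series with no observations at all is also treated as dead here.
    if not observed or observed[-1] == 0:
        return "dead"

    # Differences between consecutive observed counts.
    steps = [b - a for a, b in zip(observed, observed[1:])]
    ups = sum(1 for s in steps if s > 0)
    downs = sum(1 for s in steps if s < 0)

    # As many rises as falls (including none of either): no trend.
    if ups == downs:
        return "not trending"
    # Every single step went up, or every single step went down.
    if ups == len(steps):
        return "consistently rising"
    if downs == len(steps):
        return "consistently falling"
    # Otherwise the majority direction decides.
    return "mostly rising" if ups > downs else "mostly falling"


if __name__ == "__main__":
    # The modal series mentioned above: two NAs, then a lone developer.
    print(classify_trend([None, None, 1, 1, 1]))
    print(classify_trend([1, 3, 2, 4, 5]))
```

Under these assumed rules the modal series NA,NA,1,1,1 lands in the not trending category, which fits the large unchanging group in the histogram. Real rules would probably need thresholds, so that a single one-developer blip does not tip a project into a rising or falling category.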

The latest Database schema

Megan and I have done some work on commenting the proposed database schema, explaining what each field is and why it is there. The schema is in CVS and is available via the web interface. It is easier to read in an editor capable of syntax coloring for MySQL.

We'd very much appreciate feedback on the generality, or lack thereof, and coverage of people's interest areas. The best place is the ossmole-discuss mailing list.