megan's blog

April 2005 Summary Reports

The April 2005 Summary Reports have been posted:

How to use this data

(Note: This message is updated periodically with new info.)

The FLOSSmole project provides data about:

(a) all projects on Sourceforge
(b) all developers on Sourceforge
(c) all projects on Sourceforge AND who is developing for them, their roles, whether they are an administrator, etc.
(d) all Sourceforge projects and their programming languages, operating systems, user interfaces, end user audience, registration dates, etc (new: donations!)
(e) Edit, Oct-2005: much of the above, but for Freshmeat, also
(f) Edit, Jul-2006: also, Rubyforge
(g) Edit, Jul-2006: also, Objectweb
(h) Edit, Jan-2007: also, Free Software Foundation directory
(i) Edit, Feb-2007: also, SourceKibitzer donates data

We have been running scrapes of Sourceforge since early 2004, and we have received donated Sourceforge data for December 2004 from Dawid Weiss in Poland.

We have also begun scraping Freshmeat, Rubyforge, and Objectweb, and we receive data from SourceKibitzer. Get the complete list of data sources here. (This is a list of each of our scrapes, its date, and its "datasource" ID.) The abbreviations for the forges are RF (Rubyforge), SF (Sourceforge), FM (Freshmeat), OW (Objectweb), FSF (Free Software Foundation Directory), SK (SourceKibitzer).

April 2005 Raw Data Released

I've released the raw data files for the April 2005 Sourceforge scrape.
  1. Get the Raw List of Projects (full list of SF projects, registration dates, etc)
  2. Get the Raw Project Data (includes operating systems, programming languages, etc)

data donations

Thanks to all who have donated and used FLOSSmole data. Here is a short explanation of who has collected data for us so far:

  • Partially supported by NSF Grants 03-41475 and 04-14468. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

  • Partial data donated by Dawid Weiss, Institute of Computing Science, Poznan University of Technology, from research funded by the European Commission via FP6 Co-ordinated Action Project 004337 in priority IST-2002- (CALIBRE),

  • Partial data donated by Megan Conklin, Elon University, Department of Computing Sciences.

  • Partial data donated by Kevin Crowston and James Howison, Syracuse University.

  • Partial data donated by Mark Kofman and Anton Litvinenko of

  • [Your name here!]

using 'free' and 'open' to name Sourceforge projects

The following graph shows the relative popularity of the words 'free' and 'open' in the names of new Sourceforge projects from 11-1999 through 12-2004. Note that 1999 includes only 2 months' worth of new project registration data (November and December), which is why the 1999 totals are much lower than those for the other years on the chart. Even so, 10 new projects in 1999 had 'free' in their names, while only 9 had 'open'. Looking at the chart, we might surmise that during the years 2000-2001, 'open' became the preferred term over 'free'.
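For anyone who wants to try a similar count on the raw project list, here is a rough sketch of one way to do it. It is not the script behind the graph: the sample rows, the `count_words_by_year` helper, and the simple case-insensitive substring match are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample rows: (project name, registration date as YYYY-MM-DD).
# These are made-up values for illustration, not FLOSSmole data.
SAMPLE_PROJECTS = [
    ("FreeCalc", "1999-11-15"),
    ("OpenWidget", "1999-12-02"),
    ("open-tools", "2000-03-20"),
    ("OpenFreeMix", "2000-07-01"),
    ("gizmo", "2001-01-09"),
]


def count_words_by_year(projects, words=("free", "open")):
    """Count, per registration year, the project names containing each word."""
    counts = {word: Counter() for word in words}
    for name, registered in projects:
        year = int(registered[:4])  # first four characters are the year
        lowered = name.lower()
        for word in words:
            if word in lowered:  # case-insensitive substring match
                counts[word][year] += 1
    return counts


counts = count_words_by_year(SAMPLE_PROJECTS)
for word, by_year in counts.items():
    print(word, dict(sorted(by_year.items())))
```

A substring match will also pick up names like "freedom" or "reopen", so a stricter rule may be needed depending on what you want to measure.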

donated data, yay!

The OSSmole team has successfully imported data from Dawid Weiss' crawl of Sourceforge from December 2004. (Moles: This information has datasourceID=4 in the database.) Thanks, Dawid, for making your data available and for donating it to this project!

the good and the bad news

There's good news and there's bad news. The bad news is that we've found some problems with the developer data collected during the October 2004 run, namely that the projects in the last half of the letter 'z' (specifically, project unixnames > 'zin') weren't collected. This means that there could be other problems lurking under the surface of the data for the October run, such as other missing chunks of information. Yuck.

The good news is multifold:
(a) we found the problem (yay);
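A gap like the one described above can be spotted by comparing the unixnames from one run against another run. The sketch below is only an illustration of that idea, not the check we actually ran: the `missing_after` helper and the sample names are hypothetical.

```python
def missing_after(reference_names, collected_names, cutoff):
    """Return reference names that sort after `cutoff` and were not collected."""
    collected = set(collected_names)
    return sorted(
        name
        for name in reference_names
        if name > cutoff and name not in collected  # plain string comparison
    )


# Hypothetical unixnames, made up for illustration.
reference_run = ["apple", "zebra", "zinc", "zope"]
suspect_run = ["apple", "zebra"]

print(missing_after(reference_run, suspect_run, "zin"))
```

Setting the cutoff to an empty string checks the whole alphabet rather than just the tail end of 'z'.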

a september pattern at sourceforge?

Last September, right about the time we started up OSSmole, Sourceforge sent out a monthly email newsletter that included this observation:

(9/20/2004) Welcome to the September sitewide email. September is typically our busiest month for new traffic on Students are arriving at college and getting on high speed connections. Open Source developers and consumers of Open Source software are returning from their summer vacations. If you are back from vacation, it's good to have you back.

I was of course reminded of The Long September on usenet. Early participants on usenet began noticing that every September a new wave of clueless college students would flood in, ask dumb questions, and make life miserable for a couple of months each year (until 1993, when usenet was made available to AOL users and the so-called "Long September" was born).

The SF message above talks about lots of "new traffic" during September, but I'm not sure how "new traffic" is defined. It could mean generic web site traffic, as in "new users visiting the web site". Or it could mean "new projects being built", or it could mean "new users signing up". Or it could be some vestiges of September memories from usenet. Or, most likely, some combination of all of these things.

some graphs

I'm experimenting with making some graphs of the data we've collected.

Here is a graph of the growth in programming languages used on Sourceforge projects from October 2004 until January 2005.


Here is a graph showing the growth in the number of projects added per month to the Sourceforge repository from November 1999 until January 2005.


Here is a graph showing growth in total numbers of Sourceforge projects, by month, from November 1999 until January 2005.
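The numbers behind the last two graphs (new projects per month and the running total) can be derived from project registration dates. Here is a minimal sketch of that calculation, assuming dates formatted as YYYY-MM-DD; the `monthly_and_cumulative` helper and the sample dates are hypothetical, and this is not the code used to make the graphs.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical registration dates (YYYY-MM-DD), made up for illustration.
SAMPLE_DATES = [
    "1999-11-03",
    "1999-11-20",
    "1999-12-05",
    "2000-01-11",
    "2000-01-30",
    "2000-01-31",
]


def monthly_and_cumulative(dates):
    """Return (month, new projects that month, running total) tuples."""
    per_month = Counter(date[:7] for date in dates)  # key is "YYYY-MM"
    months = sorted(per_month)  # "YYYY-MM" strings sort chronologically
    new_counts = [per_month[month] for month in months]
    totals = accumulate(new_counts)  # running sum of the monthly counts
    return list(zip(months, new_counts, totals))


for month, new, total in monthly_and_cumulative(SAMPLE_DATES):
    print(month, new, total)
```

Months with no registrations at all do not appear in the output, so they would need to be filled in with zeros before plotting.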


January 2005 Summary Reports

The January 2005 Sourceforge summary reports have been posted: