FLOSSmole as a catalyst for research

One of the papers at the 2011 OSS conference is entitled "Building Knowledge in Open Source Software Research in Six Years of Conferences". It surveys the contributions of papers presented at the OSS conferences, and builds social networks of the papers, identifying research streams along the way.

Findings particular to FLOSSmole:

"Cluster #82. The largest cluster originates from node #82. Paper #82 introduces the OSSmole project (later called FLOSSmole). OSSmole is a repository of data, scripts, and analysis of data collected from OSS projects."


Current challenges for Fall

1. The Free Software Foundation directory changed its layout to a wiki, so we're rewriting our collector to parse RDF instead. This will change the tables we use for FSF data.

2. We were able to convince our dear colleague Audris Mockus to run his Google Code collector and gather the latest list of project names for us. SWEET! This means a Google Code run is imminent.

3. UDD and Debian still need to be re-run and automated.
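
For anyone curious what "parse RDF instead" might look like, here is a minimal sketch in Python. It is not our actual collector. The sample record, the use of the DOAP vocabulary, and the function name parse_projects are all assumptions made for illustration; the real FSF directory export may use different properties entirely.

```python
import xml.etree.ElementTree as ET

# Hypothetical RDF/XML for a single directory entry. The real FSF export
# may use other vocabularies and property names; this is illustration only.
SAMPLE_RDF = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:doap="http://usefulinc.com/ns/doap#">
  <doap:Project rdf:about="http://example.org/project/hello">
    <doap:name>hello</doap:name>
    <doap:license rdf:resource="http://example.org/license/GPLv3"/>
  </doap:Project>
</rdf:RDF>
"""

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "doap": "http://usefulinc.com/ns/doap#",
}


def parse_projects(rdf_text):
    """Return one dict per project element found in the RDF/XML text."""
    root = ET.fromstring(rdf_text.strip())
    rows = []
    for project in root.findall("doap:Project", NS):
        license_el = project.find("doap:license", NS)
        rows.append({
            # rdf:about carries the identifier of the resource being described
            "url": project.get("{%s}about" % NS["rdf"]),
            "name": project.findtext("doap:name", default="", namespaces=NS),
            # rdf:resource points at the license; None if no license is listed
            "license": (license_el.get("{%s}resource" % NS["rdf"])
                        if license_el is not None else None),
        })
    return rows


if __name__ == "__main__":
    for row in parse_projects(SAMPLE_RDF):
        print(row)
```

Each dict could then be mapped to a row in whatever tables the new FSF schema ends up using.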

Data Resources: 

Forges paper pre-print

Here is the pre-print copy of the paper on forges that David and I have written. I am going to present it at HICSS 45 in January.

Squire, M. and Williams, D. (2012). Describing the software forge ecosystem. 45th Hawaii International Conference on System Sciences. Maui, Hawaii. January 4-7. Forthcoming.


September 2011 data, in progress

Here is the status of each collection for September 2011:

The stages are:
1. collecting (some projects have sub-stages here)
2. parsing
3. files released to Google Code
4. data released to Teragrid

UPDATED as of 05-Sep-2011 at 12:41PM:
Freshmeat - collector/parser being re-written for accuracy and bugfixes

Rubyforge - files released to Google Code & data uploaded into Teragrid

Objectweb - files released to Google Code & data uploaded into Teragrid


June Data: Google Code, Launchpad, Github

Summer is a beautiful thing. Moles, we've got a huge Google Code release for you (ds=271), the revamped Launchpad (ds=272), and Github (ds=273).

Get your FRESH June data on our Google Code Downloads Page or LIVE on the Teragrid.

Tigris is fixed and is running right now. We're also writing a new collector for Alioth! Lots of new stuff.


Everything you ever wanted to know about software forges (code forges), June 2011


We have taken a list of 24 software forges and classified them according to what features and artifacts are present on that forge (as of early June 2011). The word cloud below represents the relative frequency of the forge tags. The links lead to tables that show what characteristics each code forge has.
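
As a rough illustration of how the relative tag frequencies behind a word cloud like this can be computed, here is a small Python sketch. The forge names and tags below are invented placeholders, not our actual classification, and tag_frequencies is a hypothetical helper.

```python
from collections import Counter

# Hypothetical forge-to-tag mapping, for illustration only. The real
# classification covers 24 forges and uses its own tag vocabulary.
FORGE_TAGS = {
    "forge_a": ["bug tracker", "mailing lists", "git"],
    "forge_b": ["bug tracker", "svn"],
    "forge_c": ["bug tracker", "git", "wiki"],
}


def tag_frequencies(forge_tags):
    """Return the fraction of forges that carry each tag."""
    # set() ensures a tag listed twice for one forge is only counted once
    counts = Counter(tag for tags in forge_tags.values() for tag in set(tags))
    total = len(forge_tags)
    return {tag: n / total for tag, n in counts.items()}


if __name__ == "__main__":
    freqs = tag_frequencies(FORGE_TAGS)
    # Print the most common tags first
    for tag, share in sorted(freqs.items(), key=lambda kv: -kv[1]):
        print(f"{tag}: {share:.2f}")
```

A word cloud generator would then scale each tag's font size by its share.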

We have a paper summarizing and extending these findings, which you can download as a pre-print attached to this posting. The citation is:


May 2011 Data Released

May 2011 data has been released to Google Code and uploaded into Data Central at Teragrid.

Datasource  Forge      Date      Description
263         UDD        2011-Mar  bugfix replaces 262
264         UDD        2011-Mar  bugfix replaces 263
265         UDD        2011-May  May 2011 UDD donation
266         Rubyforge  2011-May  Rubyforge 2011-May
267         Objectweb  2011-May  Objectweb 2011-May
268         FSF        2011-May  Free Software Foundation 2011-May
269         Savannah   2011-May  Savannah 2011-May
270         FM         2011-May  May 2011 Freshmeat

Debian data, Ultimate Debian Database

Hello moles. A quick update on the Debian collections.

I told you earlier that we'd been collecting some Debian data and calculating software engineering metrics for each C/C++ package, and providing that data on both the raw data downloads page and in the database at Teragrid.


Bug on Google File Download page

Heads up Moles - If you've been searching our very, very long file list on the Google Code site, you might have noticed that "search" acts strangely over there. (Strange that Google Code would have search issues...but anyway...)

Today I turned in bug #5211, for some odd behavior in the way search results are returned for (in this case) the keyword "Debian".


Mozilla uses metrics

Here is an interesting post about how Mozilla is beginning to study itself using metrics gathered about contributors and contributions over time. They create charts and tables about patch rates and the like.