July 2012 data released

Data files have been released for July 2012. Check them out on our Google Code downloads page or sign up for direct database access.

Special release notes:
--Alioth data has not yet made it onto the SDSC site (direct db access), but we are working on it. In the meantime, the data is on the Google Code site.
--The Google Code collector is still using an old list of projects, but this bug is being worked on.
--The Launchpad collector is being re-written.
--The Alioth collector is being re-written.

May 2012 data releases

Data files have been released for May 2012. Check them out on our Google Code downloads page.

Re-writes:
--Free Software Foundation has been re-written from scratch to match their new layout.
--Google Code collector has been re-written to fix a few bugs (still running)
--Launchpad has been finished and will be re-written for June to fix bugs
--Alioth is being re-written to fix bugs

Student work using FLOSSmole data

I often have my students tackle FLOSSmole data as a way of learning more about FLOSS, databases, data visualization, etc.

Here is an example of one of the graphs my students worked on last week, using Freecode data in FLOSSmole, R, and Illustrator.

January 2012 Freecode data set is available here.

February Github data released

February data has been released for Github.

Get the data here from our Google Code downloads page or request direct database access here.

Included with Github data are the following values:
project name
developer name
description
private yes/no
fork number
homepage
number of watchers
open issues
...and all the xml values that these fields are based on!

Have fun!

February Google Code data released

Google Code data has been released for January/February 2012.

Get the data here from our Google Code downloads page or request direct database access here.

Be aware that there is one open bug for Google Code collection that may affect your use of this data.

Included in the Google Code run this time are: project info, the developer list for each project (names obfuscated in some cases), blog info, labels, links, groups, etc. Have fun!

January 2012 releases

We're cruising ahead with January 2012 releases. Grab the data from the Google Code site or from the Teragrid.

Freecode - done (formerly known as Freshmeat)
Savannah - done
Tigris - done
Rubyforge - done
Objectweb - done
Launchpad - done

Google Code - still running
Alioth - bug submitted #54
Github - will start as soon as Google Code is done

Free Software Foundation - bug still not fixed (this is my fault) #51

Interesting things: most popular data from November ..... drumroll please.... Google Code, Github.

Google Code data available

Google Code is our longest data collection effort each month. We've collected everything for November and posted it for your data mining pleasure. Get the files or access it on the Teragrid with direct database access (datasource_id=285).

Freshmeat becomes Freecode, and how our data is affected

Three things happened recently that affect our Freshmeat collection:

1. Freshmeat announced a name change to Freecode.
2. We have an issue (issue #43) that talks about how the trove definitions for Freshmeat are out of date.
3. Freshmeat replaced trove with tagging, and we missed the memo.

What I've done is as follows:

For issue #1 - decided not to rename our abbreviation for Freshmeat. It will remain "FM".

For issues #2 and #3 - added a new table to hold the tags associated with a project. It's called fm_project_tags.

CREATE TABLE IF NOT EXISTS `fm_project_tags` (
`project_id` int(11) NOT NULL DEFAULT '0',
`datasource_id` int(11) NOT NULL DEFAULT '0',
`tag_name` varchar(50) NOT NULL DEFAULT '0',
`timestamp` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`project_id`,`datasource_id`,`tag_name`),
KEY `datasource_id_index18` (`datasource_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

Added a new release file to hold the data from this table. The new file is called fmProjectTags2011-Nov.txt. Did not remove trove; we are still collecting the trove values. However, there is no longer any "trove definition" list that I know of to describe each trove number, so these are not as useful as the tags. I'm leaving trove alone in the database for historical purposes.

Here is a shot of the tags page for a sample project on Freshmeat, called amms.

Here is a shot of the way the tags look now in our release files (or database table) for that same project (#78922)
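If you have direct database access, pulling the tags for one project out of fm_project_tags might look roughly like the sketch below. It is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for the real MySQL database, so the schema is lightly adapted (no backticks, ENGINE, or CHARSET). The helper name tags_for_project, the tag names, and datasource_id 999 are all invented for the example; only the table layout and project #78922 come from this post.

```python
import sqlite3

# SQLite adaptation of the MySQL fm_project_tags table shown above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS fm_project_tags (
    project_id    INTEGER NOT NULL DEFAULT 0,
    datasource_id INTEGER NOT NULL DEFAULT 0,
    tag_name      VARCHAR(50) NOT NULL DEFAULT '0',
    timestamp     DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
    PRIMARY KEY (project_id, datasource_id, tag_name)
);
CREATE INDEX IF NOT EXISTS datasource_id_index18
    ON fm_project_tags (datasource_id);
"""


def tags_for_project(conn, project_id, datasource_id):
    """Return the sorted tag names for one project in one collection run."""
    rows = conn.execute(
        "SELECT tag_name FROM fm_project_tags "
        "WHERE project_id = ? AND datasource_id = ? "
        "ORDER BY tag_name",
        (project_id, datasource_id),
    ).fetchall()
    return [row[0] for row in rows]


def demo():
    """Build an in-memory table with hypothetical rows and query it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Hypothetical data: the tag names and datasource_id 999 are made up.
    conn.executemany(
        "INSERT INTO fm_project_tags VALUES (?, ?, ?, ?)",
        [
            (78922, 999, "example-tag-b", "2011-11-01 00:00:00"),
            (78922, 999, "example-tag-a", "2011-11-01 00:00:00"),
            (11111, 999, "example-tag-a", "2011-11-01 00:00:00"),
        ],
    )
    return tags_for_project(conn, 78922, 999)


if __name__ == "__main__":
    print(demo())
```

Filtering on datasource_id as well as project_id matters because the same project can show up in more than one monthly collection, each with its own datasource_id.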

November 2011 data entered

Here is the status of the November 2011 collection:

done & ready to download on Google Code or query in Teragrid...
============
RUBYFORGE
OBJECTWEB
TIGRIS
LAUNCHPAD
SAVANNAH
ALIOTH
GITHUB

still collecting...
============
GOOGLE

collectors broken and waiting to be fixed...
============
FRESHMEAT (BUG # 43)
UDD (BUG # 50)
DEBIAN (BUG # 48)
FREE SOFTWARE FOUNDATION (BUG # 51)

FLOSSmole as a catalyst for research

One of the papers at the 2011 OSS conference is entitled "Building Knowledge in Open Source Software Research in Six Years of Conferences". It surveys the contributions of papers presented at the OSS conferences, and builds social networks of the papers, identifying research streams along the way.

Findings particular to FLOSSmole:

"Cluster #82. The largest cluster originates from node #82. Paper #82 introduces the OSSmole project (later called FLOSSmole). OSSmole is a repository of data, scripts, and analysis of data collected from OSS projects."

and

"Large clusters are initiated by empirical papers with the only exception being the paper on the FLOSSmole repository."

and

"Papers with a large number of citations [ed: such as FLOSSmole paper] are synthesizers of research often presenting a framework or a platform to guide research in OSS."

and

"In particular, we have found that the creation of a big repository for data mining (FLOSSmole) has originated research in social network analysis, tools for data mining, and analysis of code artefacts to understand maintenance processes, specifically bug fixing."
