Collection information

Details about the repository collections

Launchpad data released for June 2010

Introducing a new data source: Launchpad. This collection covers about 19k Launchpad projects and about 34k developers.

mysql> select count(*) from lp_projects where datasource_id=227;
18956

mysql> select count(distinct dev_loginname) from lp_developer_indexes where datasource_id=227;
34051
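
The two counts above can be wrapped in a small helper so they are easy to repeat for another datasource_id. This is only a minimal sketch: it assumes the table and column names shown in the queries, and it runs against an in-memory SQLite database with a few made-up rows instead of the real MySQL data. The project_name column and every sample value are hypothetical.

```python
import sqlite3


def count_projects_and_developers(conn, datasource_id):
    """Return (project count, distinct developer count) for one datasource."""
    cur = conn.cursor()
    cur.execute(
        "SELECT COUNT(*) FROM lp_projects WHERE datasource_id = ?",
        (datasource_id,),
    )
    projects = cur.fetchone()[0]
    cur.execute(
        "SELECT COUNT(DISTINCT dev_loginname) FROM lp_developer_indexes "
        "WHERE datasource_id = ?",
        (datasource_id,),
    )
    developers = cur.fetchone()[0]
    return projects, developers


def build_toy_db():
    """Create an in-memory stand-in with made-up rows (not FLOSSmole data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- project_name is a hypothetical column, used only for illustration
        CREATE TABLE lp_projects (datasource_id INTEGER, project_name TEXT);
        CREATE TABLE lp_developer_indexes (
            datasource_id INTEGER, dev_loginname TEXT, project_name TEXT
        );
        INSERT INTO lp_projects VALUES
            (227, 'alpha'), (227, 'beta'), (226, 'old');
        INSERT INTO lp_developer_indexes VALUES
            (227, 'alice', 'alpha'), (227, 'alice', 'beta'),
            (227, 'bob', 'beta'), (226, 'carol', 'old');
        """
    )
    return conn


if __name__ == "__main__":
    toy = build_toy_db()
    print(count_projects_and_developers(toy, 227))
```

Against the real MySQL data the same SQL applies, but the connection object and the parameter placeholder style depend on the MySQL driver you use.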

Available on our Google Code downloads page: Launchpad data


Github data released for May 2010

Data has been released for Github for May 2010. It is on our FLOSSmole Google Code downloads page.


May 2010 Data released

May 2010 data has been released for the following forges:

-Freshmeat (datasource 218)
-Rubyforge (datasource 219)
-ObjectWeb (datasource 220)
-Free Software Foundation (datasource 221)
-Google Code (datasource 222) - list of projects only

Our collectors for Savannah, Sourceforge, Github, Tigris, and Launchpad are all undergoing maintenance at the moment.

UPDATE May 28, 2010
-Savannah data has been released (datasource 224)

Link to download the FLOSSmole data on Google Code.


February 2010 Data Released

Lots of new data for you to peruse out on our FLOSSmole Data Downloads Page.

Here's what's out there, recently added:

Google Code, March 2010 (GC) - list of all GC projects donated by Audris Mockus (HUGE THANK YOU TO AUDRIS FOR THIS!!)
Freshmeat, February 2010 (FM)
Objectweb, February 2010 (OW)
Rubyforge, February 2010 (RF)
Github, February 2010 (GH)
Free Software Foundation, February 2010 (FSF)
Savannah, February 2010 (SV)
and Sourceforge from December 2009 (SF)


December Sourceforge Data released

After a long delay, the December Sourceforge data has been released. You may recall that over the summer of 2009, SF redesigned their web site, which broke many of our crawlers and all of our parsers.

We have rewritten these and, with only a few exceptions, have pretty much the same data as we always had.

Here are some release notes:


December 2009 data released

December data has been released for the following forges:

(datasource-abbreviation-full name)
200-fm-freshmeat
201-rf-rubyforge
202-ow-objectweb
203-fsf-free software foundation
204-sv-savannah
205-gh-github

Sourceforge is in progress... it will be datasource_id=206.

Get the data here:
http://code.google.com/p/flossmole/downloads/list

Remember that the files marked "DM" are SQL files (MySQL), while the files marked .txt are flat, delimited text files.
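
For the flat text files, a few lines of Python are enough to get the rows into memory. The sketch below is an illustration only: it assumes the files are tab-delimited and UTF-8 encoded, which you should check against the file you actually download, and the function names are ours, not part of FLOSSmole.

```python
import csv
import io


def parse_delimited(handle, delimiter="\t"):
    """Yield each line of an open text file as a list of field strings.

    The tab delimiter is an assumption; pass a different one if needed.
    """
    for row in csv.reader(handle, delimiter=delimiter):
        yield row


def load_flat_file(path, delimiter="\t", encoding="utf-8"):
    """Read a whole delimited .txt file from disk into a list of rows."""
    with open(path, "r", encoding=encoding, newline="") as handle:
        return list(parse_delimited(handle, delimiter))


if __name__ == "__main__":
    # Made-up sample content standing in for a downloaded flat file.
    sample = io.StringIO("proj_name\tdatasource_id\nexample\t204\n")
    for fields in parse_delimited(sample):
        print(fields)
```

The files marked "DM" are SQL and are meant to be loaded into MySQL directly, so this helper only applies to the .txt downloads.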


November 2009 data released

This month we have data from Freshmeat, Rubyforge, Objectweb, Savannah, Github, and Free Software Foundation.

Downloads available at Google Code

Remember, the SQL is available in the datamart*.sql.bz files; the flat (delimited) data is available in the other files.

We're still working on getting our Sourceforge scraper back up and running, and we thank you for your patience.


October 2009 data released

October 2009 data has been released. Here are the forges we have this month:
Freshmeat
Rubyforge
ObjectWeb
Free Software Foundation directory
Savannah (new)
GitHub (new)

FLOSSmole Downloads

Our Sourceforge collector is still being rewritten, but we will be collecting from there again soon. In the meantime, don't forget that the June 2009 data is available, and there is also the Notre Dame data if that helps at all.


September 2009 data released

Data has been released for FSF, FM, RF, OW. Go get it!! Have fun.

Google Code Downloads Page

That Freshmeat data looks fairly popular. Anyone want to tell us how you use this data?


Savannah data available

Savannah data has been released for July. See what you think! (Datasource_id = 182)

