Submitted by megan on June 18, 2010 - 9:59pm
Introducing a new data source: Launchpad. This collection has about 19k projects and about 34k developers.
mysql> select count(*) from lp_projects where datasource_id=227;
mysql> select count(distinct dev_loginname) from lp_developer_indexes where datasource_id=227;
Available on our Google Code downloads page: Launchpad data
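If you would rather try the two count queries from a script than from the mysql prompt, here is a minimal sketch. It uses Python's built-in sqlite3 with a tiny in-memory stand-in so it runs anywhere. Only the table names, `datasource_id`, and `dev_loginname` come from the queries above; the other columns and all sample rows are made up for illustration and are not the real schema or data.

```python
import sqlite3

# In-memory stand-in for the real tables. Only the table names and the
# datasource_id / dev_loginname columns appear in the post; the other
# columns (name, project_name) and every row below are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE lp_projects (name TEXT, datasource_id INTEGER)")
cur.execute(
    "CREATE TABLE lp_developer_indexes "
    "(dev_loginname TEXT, project_name TEXT, datasource_id INTEGER)"
)
cur.executemany(
    "INSERT INTO lp_projects VALUES (?, ?)",
    [("alpha", 227), ("beta", 227), ("old", 200)],
)
cur.executemany(
    "INSERT INTO lp_developer_indexes VALUES (?, ?, ?)",
    [
        ("alice", "alpha", 227),
        ("alice", "beta", 227),  # same developer on a second project
        ("bob", "beta", 227),
        ("carol", "old", 200),
    ],
)


def count_projects(cur, datasource_id):
    """Count project rows collected under one datasource."""
    cur.execute(
        "SELECT COUNT(*) FROM lp_projects WHERE datasource_id = ?",
        (datasource_id,),
    )
    return cur.fetchone()[0]


def count_developers(cur, datasource_id):
    """Count distinct developer logins collected under one datasource."""
    cur.execute(
        "SELECT COUNT(DISTINCT dev_loginname) FROM lp_developer_indexes "
        "WHERE datasource_id = ?",
        (datasource_id,),
    )
    return cur.fetchone()[0]


print(count_projects(cur, 227))
print(count_developers(cur, 227))
```

The `DISTINCT` matters in the second query because a developer who works on several projects shows up in more than one row.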
Submitted by megan on June 6, 2010 - 11:11am
Submitted by megan on June 4, 2010 - 2:18pm
The database schema page here has been updated. I've got a single-page PNG of the schema, and an MWB file for those of you who like MySQL Workbench.
Submitted by megan on May 30, 2010 - 7:32am
I've decided to start using the wiki on our Google Code site to mark changes as they are made to the project. You can see what we're working on and where we left off on a particular project. This is as much for me as it is for you! Sometimes I forget where I was in a particular project and this will help me.
Here is a link to the wiki of the May 2010 Changelog on Google Code.
Submitted by megan on May 30, 2010 - 7:28am
It's been a long time since we shipped new data to Teragrid. I apologize for that oversight.
The new connection info is and then the mysql port and your username/password.
Differences/shortcomings/things to know:
1) We are still collecting data all the time, so the Teragrid copy is always a bit behind the master collection at Elon and the project file downloads at Google Code.
2) This includes datasources up to 222, although 223 is listed in the datasources table.
Submitted by megan on May 4, 2010 - 1:01pm
May 2010 data is released for some forges.
-Freshmeat (datasource 218)
-Rubyforge (datasource 219)
-ObjectWeb (datasource 220)
-Free Software Foundation (datasource 221)
-Google Code (datasource 222) - list of projects only
Our collectors for Savannah, Sourceforge, Github, Tigris, and Launchpad are all undergoing maintenance at the moment.
UPDATE May 28, 2010
-Savannah data has been released (datasource 224)
Link to download the FLOSSmole data on Google Code.
Submitted by megan on February 17, 2010 - 11:54am
After a long delay, the December Sourceforge data has been released. You may recall that over summer 2009, SF redesigned their website, which broke many of our crawlers and all of our parsers.
We have rewritten these and, with only a few exceptions, have pretty much the same data as we always had.
Here are some release notes:
Submitted by megan on November 19, 2009 - 9:28am
This month we have data from Freshmeat, Rubyforge, ObjectWeb, Savannah, Github, and the Free Software Foundation.
Downloads available at Google Code
Remember, the SQL is available in the datamart* files; the flat (delimited) data is available in the other files.
We're still working on getting our Sourceforge scraper back up and running, and we thank you for your patience.
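For anyone loading the flat (delimited) files mentioned above into a script, here is a rough sketch using Python's csv module. It assumes a tab delimiter and a header row, neither of which is stated in this post, so check the actual file first. The column names and sample rows are invented for illustration.

```python
import csv
import io


def read_delimited(handle, delimiter="\t"):
    """Read a delimited text file with a header row into a list of dicts.

    The tab default is an assumption; pass the delimiter the file really uses.
    """
    return list(csv.DictReader(handle, delimiter=delimiter))


# Made-up sample standing in for a downloaded flat file; the column names
# (proj_unixname, datasource_id) are hypothetical.
sample = io.StringIO("proj_unixname\tdatasource_id\nalpha\t218\nbeta\t218\n")
rows = read_delimited(sample)
print(len(rows), rows[0]["proj_unixname"])
```

With a real download you would pass an open file handle instead of the `io.StringIO` sample. Note that every value comes back as a string.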
Submitted by megan on October 11, 2009 - 9:47am
October 2009 data has been released. Here are the forges we have this month:
Free Software Foundation directory
Savannah (new)
GitHub (new)
FLOSSmole Downloads
Sourceforge is still undergoing a re-write, but we will be collecting from there again soon. In the meantime, don't forget that the June 2009 data is available, and there is also the Notre Dame data if that helps at all.
Submitted by megan on September 2, 2009 - 8:45pm
Data has been released for FSF, FM, RF, OW. Go get it!! Have fun.
Google Code Downloads Page
That Freshmeat data looks fairly popular. Anyone want to tell us how you use this data?