
Collection information

Details about the repository collections

New data for March 2011

Most of the March data has been released to our page on Google Code. The forge collections included are Free Software Foundation, Freshmeat, Rubyforge, Objectweb, Savannah, and Tigris. The Google Code collection is still running. The Github and Launchpad collectors are not functional right now (waiting on bug fixes).

There are two ways to get the data:
You can download the data at our downloads page - the flat files are so marked, and the SQL files are marked "datamarts". Note that each datamart contains only one collection, so the newest datamart holds only the latest data. If you want previous months' worth of data, you'll have to grab those months' datamarts too.

You can also log into our database on the Teragrid and live-query the data. Read these instructions on getting a login.

Have fun!

Jan/Feb 2011 data uploaded to Teragrid

I've backed up the Jan/Feb data to Teragrid for your live queries. Be sure to log in there and use your database querying tool of choice to check out the data. (If you need an account, read these instructions for how to get yourself an account.)

The datasource_id information is as follows:

237 FM-Freshmeat
238 RF-Rubyforge
239 OW-ObjectWeb
240 FSF-FreeSoftwareFndtn
241 SV-Savannah
243 GC-GoogleCode
244 TG-Tigris
246 - Debian metrics

Enjoy!!

Debian metrics data released

One of the undergraduate students working on this project, Carter Kozak, recently collected all of the Debian package data for January and parsed it for relevant software engineering metrics. He concentrated on C/C++ code. He also integrated some Debian metadata, such as popcon (popularity contest) and sources.gz. He has written up his findings in a paper (as yet unpublished, but just you wait!) and has donated his data to FLOSSmole. You can find the data on our FLOSSmole data downloads page on Google.

January file releases

Just released data files for the following forges. You can head over to the FLOSSmole data downloads page at Google Code to download any of these files, or wait for them to be released to the Teragrid for live querying (shortly!).

datasource_id  forge_id  abbreviation  name
237            2         FM            Freshmeat
238            3         RF            Rubyforge
239            4         OW            ObjectWeb
240            5         FSF           Free Software Foundation
241            10        SV            Savannah
243            12        GC            Google Code
244            13        TG            Tigris

Still running...
245            14        LP            Launchpad

Broken...
242            11        GH            Github
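
For anyone scripting against these IDs, the listing above can be turned into a small lookup table. This is only a sketch: the `Forge` record, the `SECTIONS` dictionary, the `parse_forges` helper, and the status labels are names made up for illustration and are not part of the FLOSSmole schema. The rows themselves are copied from the listing.

```python
from typing import NamedTuple


class Forge(NamedTuple):
    """One row of the listing (hypothetical record type, not a FLOSSmole table)."""
    datasource_id: int
    forge_id: int
    abbreviation: str
    name: str
    status: str


# Rows copied from the January listing, grouped by the status given in the post.
SECTIONS = {
    "released": """\
237 2 FM Freshmeat
238 3 RF Rubyforge
239 4 OW ObjectWeb
240 5 FSF Free Software Foundation
241 10 SV Savannah
243 12 GC Google Code
244 13 TG Tigris
""",
    "still running": "245 14 LP Launchpad\n",
    "broken": "242 11 GH Github\n",
}


def parse_forges(sections):
    """Return a dict mapping datasource_id to a Forge record."""
    forges = {}
    for status, block in sections.items():
        for line in block.splitlines():
            if not line.strip():
                continue
            # The name may contain spaces, so split only on the first three gaps.
            ds_id, forge_id, abbrev, name = line.split(maxsplit=3)
            forges[int(ds_id)] = Forge(int(ds_id), int(forge_id), abbrev, name, status)
    return forges


FORGES = parse_forges(SECTIONS)
print(FORGES[240].name)
print(sorted(f.abbreviation for f in FORGES.values() if f.status != "released"))
```

Keying on datasource_id rather than forge_id matches how the IDs are used in the queries on this page, since a new datasource_id is assigned to each collection.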

Adding 1000 data files to Google Code

I've got about 1000 files that were (and still are) hosted on Sourceforge, but I'm trying to move all our files into one place. I am running scripts all day to download these from SF, relabel them, and move them to Google Code.

If you see old files showing up at Google Code, that's why! Don't forget that you can use the search box there if you are looking for a specific file or topic. Also, send email to the mailing list if you can't find something you're looking for and I'll help you out.

UPDATE: This action apparently broke the Google Code files download page for our project. I've submitted a bug report.

January data update

Freshmeat, Objectweb, Rubyforge, Savannah, FSF, Tigris - all done, waiting for release

The Github collector is broken; I have a student working on fixing that now. Github suddenly made it hard to seed the initial project list, so we're trying to figure out a way to get the entire corpus of projects. I'm getting flashbacks of when SF got really big and made it difficult for everyone to work with.

Google Code is plugging along. We're on the 7th out of 8 collection processes. Won't be too much longer on that.

Debian data is being cleaned for release. Have a meeting with a student tomorrow to see the status of that cleaning but it should be released this weekend.

New features in the hopper: (1) auto-generating DOI information at the time of release so we won't be behind, (2) federated search - I know that is huge, but we'll see how it goes.

September 2010 data trickle

Just released Github data for September today. This took about 10 days to collect, parse, and release. You can download the files here (along with Freshmeat, Rubyforge, Objectweb, Free Software Foundation, Savannah, Tigris, etc.), or wait for the next Teragrid backup if you want direct database access. (Google Code is collecting now, and upon completion of that, I'll do a final September Teragrid backup.)

Launchpad data released for June 2010

Introducing a new data source: Launchpad data. In this collection, Launchpad has about 19k projects in it and about 34k developers.

mysql> select count(*) from lp_projects where datasource_id=227;
18956

mysql> select count(distinct dev_loginname) from lp_developer_indexes where datasource_id=227;
34051
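
The same two counts can be run from a script instead of the mysql prompt. The sketch below is an assumption-heavy stand-in: it uses Python's built-in sqlite3 with an in-memory database and a handful of invented rows, so it runs anywhere but will not reproduce the real numbers. Only the table names, `datasource_id`, and `dev_loginname` come from the queries above; the `project_name` column and the helper function names are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the real MySQL server; all rows are toy data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lp_projects (
        project_name  TEXT,      -- hypothetical column, for illustration only
        datasource_id INTEGER
    );
    CREATE TABLE lp_developer_indexes (
        dev_loginname TEXT,
        project_name  TEXT,      -- hypothetical column, for illustration only
        datasource_id INTEGER
    );
""")
conn.executemany(
    "INSERT INTO lp_projects VALUES (?, ?)",
    [("alpha", 227), ("beta", 227), ("alpha", 200)],
)
conn.executemany(
    "INSERT INTO lp_developer_indexes VALUES (?, ?, ?)",
    [("ann", "alpha", 227), ("ann", "beta", 227),
     ("bob", "beta", 227), ("cy", "alpha", 200)],
)


def count_projects(conn, datasource_id):
    """Number of project rows in one collection."""
    row = conn.execute(
        "SELECT COUNT(*) FROM lp_projects WHERE datasource_id = ?",
        (datasource_id,),
    ).fetchone()
    return row[0]


def count_developers(conn, datasource_id):
    """Number of distinct developer logins in one collection."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT dev_loginname) FROM lp_developer_indexes "
        "WHERE datasource_id = ?",
        (datasource_id,),
    ).fetchone()
    return row[0]


print(count_projects(conn, 227))
print(count_developers(conn, 227))
```

Passing the datasource_id as a bound parameter makes it easy to rerun the same counts for a different collection. Note the DISTINCT in the developer count: a developer who appears on several projects has several index rows but should be counted once.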

Available on our Google Code downloads page: Launchpad data

Github data released for May 2010

Data has been released for Github for May 2010. It is on our FLOSSmole Google Code downloads page.

May 2010 Data released

May 2010 data is released for some forges.

-Freshmeat (datasource 218)
-Rubyforge (datasource 219)
-ObjectWeb (datasource 220)
-Free Software Foundation (datasource 221)
-Google Code (datasource 222) - list of projects only

Our collectors for Savannah, Sourceforge, Github, Tigris, and Launchpad are all undergoing maintenance at the moment.

UPDATE May 28, 2010
-Savannah data has been released (datasource 224)

Link to download the FLOSSmole data on Google Code.
