About

FLOSSmole aims to:

  • freely provide data about open source projects in multiple formats for anyone to download
  • integrate donated data from other research teams
  • provide some tools so that you can gather your own data
  • provide a community for researchers to discuss public data about open source software development

FLOSSmole contains:

  • 300 GB of data covering the period from 2004 to the present, and growing
  • data sets from over 200 web spidering operations, and growing each month
  • data about more than 200,000 different open source projects and their developers

Citation
Here's how to cite FLOSSmole data:

Howison, J., Conklin, M., & Crowston, K. (2006). FLOSSmole: A collaborative repository for FLOSS research data and analyses. International Journal of Information Technology and Web Engineering, 1(3), 17–26.

All original data is copyrighted by its owners.

Conditions of Use

  1. If you use the data, please cite the source as shown above.

December 2009 data released

December data has been released for the following forges:

(datasource_id-abbreviation-full name)
200-fm-freshmeat
201-rf-rubyforge
202-ow-objectweb
203-fsf-free software foundation
204-sv-savannah
205-gh-github

Sourceforge collection is still in progress; it will be datasource_id=206.

Get the data here:
http://code.google.com/p/flossmole/downloads/list

Remember that the files marked "DM" are SQL files (MySQL), while the files marked .txt are delimited flat text files.
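
As a rough illustration, one of the flat text files could be read in Python as sketched below. The tab delimiter, the header row, and the column names are assumptions made for this example, not documented properties of the FLOSSmole files, so check a downloaded file before relying on them. The helper name read_delimited is hypothetical.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded .txt file.
# The tab delimiter, header row, and column names are assumptions.
SAMPLE = (
    "proj_unixname\tdescription\tdatasource_id\n"
    "alpha\tRuby\t201\n"
    "beta\tC\t201\n"
)


def read_delimited(handle, delimiter="\t"):
    """Return the rows of a delimited file as a list of dicts keyed by header."""
    return list(csv.DictReader(handle, delimiter=delimiter))


rows = read_delimited(io.StringIO(SAMPLE))
for row in rows:
    print(row["proj_unixname"], row["description"])
```

For a real download, pass an open file handle instead of the in-memory sample and adjust the delimiter to match the file.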

November 2009 data released

This month we have data from Freshmeat, Rubyforge, Objectweb, Savannah, Github, and the Free Software Foundation.

Downloads available at Google Code

Remember, the SQL is available in the datamart*.sql.bz files; the flat (delimited) data is available in the other files.

We're still working on getting our Sourceforge scraper back up and running, and we thank you for your patience.

October 2009 data released

October 2009 data has been released. Here are the forges we have this month:
Freshmeat
Rubyforge
ObjectWeb
Free Software Foundation directory
Savannah (new)
GitHub (new)

FLOSSmole Downloads

Our Sourceforge scraper is still being rewritten, but we will be collecting from there again soon. In the meantime, don't forget that the June 2009 data is available, and the Notre Dame data may also help.

Enjoy!

September 2009 data released

Data has been released for FSF, FM, RF, OW. Go get it!! Have fun.

Google Code Downloads Page

That Freshmeat data looks fairly popular. Anyone want to tell us how you use this data?

Savannah data available

Savannah data has been released for July. See what you think! (Datasource_id = 182)

July 2009 data

Hello moles, our July 2009 data has been released. This month we have Objectweb, Freshmeat, Rubyforge, and the Free Software Foundation directory.

Go to our Google Code pages to download the data.

The most recent datasource_ids are:
178-fm-July2009
179-rf-July2009
180-ow-July2009
181-fsf-July2009

What are the top programming languages used by projects listed in Rubyforge?

Description

This chart shows the top programming languages used by projects in Rubyforge.

Visualization

Rubyforge Programming Language Chart

SQL Script

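-- Replace <current> with the datasource_id of the Rubyforge release you want,
-- e.g. 201 for the December 2009 data.
SELECT rfpl.description, COUNT(DISTINCT rfpl.proj_unixname) AS lang
FROM rf_project_programming_language rfpl
WHERE rfpl.datasource_id = <current>
GROUP BY rfpl.description
ORDER BY lang DESC;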
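
The query above leaves <current> as a placeholder. The sketch below shows one way it might be filled in and run from Python. It uses an in-memory SQLite table with invented rows as a stand-in for the real MySQL table, and it assumes that the highest datasource_id in the table is the most recent collection. The function name top_languages is hypothetical.

```python
import sqlite3

# In-memory stand-in for the real MySQL table; every row here is invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rf_project_programming_language ("
    "proj_unixname TEXT, description TEXT, datasource_id INTEGER)"
)
conn.executemany(
    "INSERT INTO rf_project_programming_language VALUES (?, ?, ?)",
    [
        ("alpha", "Ruby", 179),
        ("alpha", "Ruby", 201),
        ("beta", "Ruby", 201),
        ("beta", "C", 201),
        ("gamma", "Ruby", 201),
    ],
)


def top_languages(conn):
    """Count distinct projects per language for the newest datasource_id."""
    # Assumption: the largest datasource_id is the most recent collection.
    (current,) = conn.execute(
        "SELECT MAX(datasource_id) FROM rf_project_programming_language"
    ).fetchone()
    return conn.execute(
        "SELECT rfpl.description, COUNT(DISTINCT rfpl.proj_unixname) AS lang "
        "FROM rf_project_programming_language rfpl "
        "WHERE rfpl.datasource_id = ? "
        "GROUP BY rfpl.description "
        "ORDER BY lang DESC",
        (current,),
    ).fetchall()


result = top_languages(conn)
print(result)
```

Passing the id as a bound parameter rather than pasting it into the SQL string keeps the query text identical from month to month.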

What are the top operating systems used by projects listed in Rubyforge?

Description

This chart shows the top operating systems used by projects in Rubyforge.

Visualization

Rubyforge Operating System Chart

SQL Script

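-- Replace <current> with the datasource_id of the Rubyforge release you want.
-- The alias is num_projects because SYSTEM is a reserved word in newer MySQL versions.
SELECT rfop.description, COUNT(DISTINCT rfop.proj_unixname) AS num_projects
FROM rf_project_operating_system rfop
WHERE rfop.datasource_id = <current>
GROUP BY rfop.description
ORDER BY num_projects DESC;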

What are the top programming languages used by projects listed in Sourceforge?

Description

This chart shows the top programming languages used by projects in Sourceforge.

Visualization

Sourceforge Programming Language Chart

SQL Script

-- Replace <current> with the datasource_id of the Sourceforge release you want.
SELECT ppl.description, COUNT(DISTINCT ppl.proj_unixname) AS lang
FROM project_programming_language ppl
WHERE ppl.datasource_id = <current>
GROUP BY ppl.description
ORDER BY lang DESC;
