Getting the Data

FLOSSmole collects data from numerous open source software development forges, and we also accept data donations.

Organization of the Data
Each forge we collect from is given a "forge id", and each collection from a forge (or each donation) is given a number, called a "data source id". The collected data is stored in our database, parsed, and re-released for researchers to use as they wish. To help you understand how the data is organized, we provide schemas for our database and details about each collection.

Getting the FLOSSmole Data
There are three ways to access the FLOSSmole data, which are described below.

  1. Flat delimited files
    Download these from FLOSSdata. The files are named with the abbreviation for their forge or source (e.g. SourceForge is "SF") and are divided by date of collection. Example: sfRawPublicAreas2009-Jun.txt.bz2 (These files are compressed in bzip2 format, which most standard decompression utilities can open.)
  2. SQL files (MySQL CREATE and INSERT statements, generated with mysqldump)
    Download these from FLOSSdata. These files all start with the word "datamart", followed by the forge abbreviation and then the date (for example, datamart_sf_stats.2009-Jun.sql.bz2). Refer to the schema descriptions as you download these files so you'll know what you're getting.
  3. Direct database access
    If you are comfortable running SQL queries or using phpMyAdmin, you can request access to our MySQL database by joining the FLOSSmole mailing list. More details on getting direct database access are available.
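
For the flat delimited files, a short Python sketch may help show how a downloaded file could be read without decompressing it first. This is an illustration only, not an official FLOSSmole tool: the function names collection_date and read_flat_file are hypothetical, and the tab delimiter, UTF-8 encoding, and unquoted fields are assumptions that should be checked against the file you actually download.

```python
import bz2
import csv
import re

# Matches the "YYYY-Mon" collection date seen in example names such as
# sfRawPublicAreas2009-Jun.txt.bz2 and datamart_sf_stats.2009-Jun.sql.bz2.
DATE_PATTERN = re.compile(r"(\d{4})-([A-Z][a-z]{2})\.(?:txt|sql)\.bz2$")


def collection_date(filename):
    """Return (year, month_abbreviation) from a file name, or None if absent."""
    match = DATE_PATTERN.search(filename)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def read_flat_file(path, delimiter="\t", encoding="utf-8"):
    """Yield each row of a bzip2-compressed delimited file as a list of strings.

    Assumptions (not confirmed by the FLOSSmole documentation): fields are
    tab-separated, the text is UTF-8, and no quoting is applied to fields.
    """
    with bz2.open(path, mode="rt", encoding=encoding, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        yield from reader
```

Calling read_flat_file on a downloaded file such as sfRawPublicAreas2009-Jun.txt.bz2 yields one list of field values per line, so large files can be processed row by row rather than loaded into memory at once. If the rows come back as single-element lists, the delimiter assumption is probably wrong for that file.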

In addition, we have older data in flat files and SQL files that are still available from our older project hosting locations on Google Code and SourceForge, but we think you'll find it easier to just grab the files from us directly using one of the three methods above!