Debian data released

I collected some Debian package data and started parsing it to see what we might find in there.

I will probably need some help from the user community on this one: which data in these packages do you find interesting?

Here are the files I collected:


Obviously there is a lot of information there, and I only parsed some of it out for this initial run. Here are the items I parsed and released:

  • package name, version, parent directory

July data released

July data is out for Freshmeat, Rubyforge, Objectweb, and the Free Software Foundation.

Go get it!

June data released - all forges

June data has been released for all forges.

Head over to the project page on Sourceforge and gather all the data you need!

May 2007 data released for small forges

May 2007 data is released for the small forges. (Reminder that Sourceforge data is next scheduled for a release in June.)

As usual, there are 3 ways to get FLOSSmole data:
(1) Flat files (includes May 2007 data, plus historical data if you wish)
(2) Get the data marts
(3) Browse results of common queries

Announcement: data marts now available

By popular demand of our user base (and some hard work by our developers, especially ruphus_13), we now provide data marts for Sourceforge data.

The new package, called DataMarts, contains all the SQL create and insert statements for creating your own version of the FLOSSmole database, covering multiple data sources (Sourceforge, Freshmeat, Rubyforge, ObjectWeb, FSF).

The marts are created after each of our data collections. We collect and parse the data, load it into our database, and create the raw flat-file data dumps, as we have been doing since 2004. The new feature we are announcing today is that we now also provide SQL data dumps, so you can load our data into your own local database for easier processing and more complex mining tasks.

So, there are now numerous ways to get our data:

--install the data marts into your own MySQL database
--download and analyze the flat, delimited data files
--play around with the query tool
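
As a rough sketch of the second option, a delimited flat file can be read with Python's csv module. The sample contents, the tab delimiter, and the column names (proj_unixname, dev_count) are assumptions made up for illustration; check the header of the file you actually download.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded flat file.
# The real column names and delimiter may differ.
SAMPLE = "proj_unixname\tdev_count\nalpha\t3\nbeta\t12\n"


def load_flat_file(handle, delimiter="\t"):
    """Read a delimited file with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter=delimiter))


rows = load_flat_file(io.StringIO(SAMPLE))

# Values arrive as strings, so convert before doing arithmetic.
total_devs = sum(int(row["dev_count"]) for row in rows)
print(len(rows), total_devs)
```

For a real file, pass an open file handle (for example, open(path, newline="")) instead of the in-memory sample.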

April 2007 data released for all forges

April 2007 data is released for all forges. Here is a summary of the data we have and where to get it:

  • Sourceforge data
    • General Forge Information (Get it)
      • Project code names, project display names, developer counts, date project was registered, long project descriptions

    • Developer Information (Get it)
      • Developer login names, real names, developers per project, each developer's role on the project, and whether the developer is an admin

    • Data about Projects (Get it)
      • Database type by project, number of downloads per project, rank of project, intended audience, topic of project, status of project, license(s), operating system(s), programming language(s), real URL of project, tracker data, donors to projects, user interfaces


  • Freshmeat data (Get it)

New Query Tool

Check out the New Query Tool for running common, predefined queries. (Thanks Gregg!)

The old query tool is still available. We'll be adding real-time graphing and some more bells and whistles to the new tool as time allows.

March data released

March data is released for the following sources (forges/directories/repositories):

--Freshmeat
--Rubyforge
--ObjectWeb
--Free Software Foundation
--SourceKibitzer

Get the data from our Sourceforge file release page

Enjoy!

(The April release will include Sourceforge and the other 5 forges.)

new donations from SourceKibitzer

Great news, moles: we have a new donation partner, SourceKibitzer.

The facts:
--In our system, SourceKibitzer is forge #6, and has the abbreviation "SK".
--SK will be part of the monthly data cycle, so expect new SK files once per month (just like Freshmeat, Rubyforge, Objectweb, and the Free Software Foundation).
--SK files are available on our file releases page on Sourceforge.

The first file we released from SourceKibitzer is for February 2007. For each of some 500-odd projects, it includes:

project name
density of comments
todo count
commented lines of code
total lines of code
non-comment lines of code
non-commenting source statements
number of methods
sum of data abstraction coupling
boolean expression complexity
fanout
npath complexity
weighted method count

Some very interesting stuff! Get the SourceKibitzer February data.

February Data Released for All Forges

Hi moles!

We've been digging as usual, and we now announce that February data is released for all 5(!) forges. This includes:

forge (abbreviation) - datasource_id
=====================================
Sourceforge (SF) - 46
Freshmeat (FM) - 47
Rubyforge (RF) - 48
Objectweb (OW) - 49
Free Software Foundation (FSF) - 50
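
If you combine files from several forges, a small lookup built from the table above can label each record with its forge. This is only a sketch: the IDs and abbreviations come from the table, but the record layout (a datasource_id field on each row) and the sample rows are assumptions for illustration.

```python
# datasource_id -> (forge name, abbreviation), as listed in the table above.
FEBRUARY_DATASOURCES = {
    46: ("Sourceforge", "SF"),
    47: ("Freshmeat", "FM"),
    48: ("Rubyforge", "RF"),
    49: ("Objectweb", "OW"),
    50: ("Free Software Foundation", "FSF"),
}


def forge_abbreviation(datasource_id):
    """Return the forge abbreviation for an ID, or None if it is unknown."""
    entry = FEBRUARY_DATASOURCES.get(datasource_id)
    return entry[1] if entry else None


# Hypothetical records; real files may name or place this field differently.
records = [
    {"project": "alpha", "datasource_id": 46},
    {"project": "beta", "datasource_id": 48},
    {"project": "gamma", "datasource_id": 99},
]

labels = [forge_abbreviation(r["datasource_id"]) for r in records]
print(labels)
```

Note that these IDs apply to the February release only; a later collection may be assigned different datasource_id values.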

Get the files at our Sourceforge Project Page