September Launchpad data is released

A huge thank-you to our former student and FLOSSmole alumnus Christian Funkhouser for re-writing the Launchpad collector to use the API.

Get the code!

The datasource_id for Launchpad in September is 342!

September 2012 data released

Data files have been released for September 2012. Go check them out on our Google Code downloads page or sign up for direct database access.

Special release notes:
--The Google Code, GitHub, and Launchpad collections are not included this month.
--Alioth is back, and Tigris is back, including emails.

August 2012 data released

Data files have been released for August 2012. Go check them out on our Google Code downloads page or sign up for direct database access.

Special release notes:
--The Google Code, GitHub, and Launchpad collections are not included this month.

Freecode New Project Registrations (1998-2011) and language tags

This chart shows new project registrations for each year from 1998 to 2011, broken down by the programming language those projects were tagged with.

For example, 2003 was the peak year for new "C" project registrations on Freecode (then called Freshmeat).

A few caveats about the data: (1) A project could be tagged with one language when it is created, and the tag could change later. For example, a project may have been created as "C" and later switched its tag to "Java". This chart shows whatever the most current language tag is for that project. I have not calculated how often this happens, but I suspect it is not too common. (2) The languages in the legend are ordered by the total number of times they appear as tags on any project. (3) Not every project has a language tag.

Here is the SQL code used to generate the data sets for this graph:

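-- Count distinct projects registered per year that carry a given language tag.
SELECT YEAR(p.date_added) AS year_added,
       COUNT(DISTINCT p.project_id) AS new_projects
FROM fm_projects p
INNER JOIN fm_project_tags t
        ON t.project_id = p.project_id
WHERE p.datasource_id = 316   -- substitute your datasource_id
  AND t.datasource_id = 316
  AND t.tag_name = 'C'        -- substitute any language tag
GROUP BY 1
ORDER BY 1;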

Substitute your current datasource_id and any valid programming language tag. I used the following:

C
Java
C++
PHP
Perl
Python
JavaScript

These are listed in order of the total number of times each appeared in the sitewide tag list over time.

Other top languages, in order of how frequently the tag is used:

Unix Shell
XML
SQL
HTML
Ruby
Tcl
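
Running the query above once per language gets repetitive, so here is a minimal sketch of how the per-language counts could be collected in one pass. It is an illustration, not the script used to build the chart. The table and column names come from the query above; everything else is an assumption: the helper names (registrations_by_year, demo), the handful of toy rows, and the use of an in-memory SQLite database as a stand-in for the real MySQL database. Because SQLite has no YEAR() function, the sketch uses strftime('%Y', ...) instead, and it joins on datasource_id rather than filtering both tables separately, which should be equivalent.

```python
import sqlite3
from collections import defaultdict

# The seven language tags used for the chart, in the order listed above.
LANGUAGES = ["C", "Java", "C++", "PHP", "Perl", "Python", "JavaScript"]


def registrations_by_year(conn, datasource_id, languages):
    """Return {year: {language: distinct project count}} for the given tags."""
    placeholders = ",".join("?" for _ in languages)
    sql = f"""
        SELECT CAST(strftime('%Y', p.date_added) AS INTEGER) AS yr,
               t.tag_name,
               COUNT(DISTINCT p.project_id)
        FROM fm_projects p
        INNER JOIN fm_project_tags t
                ON t.project_id = p.project_id
               AND t.datasource_id = p.datasource_id
        WHERE p.datasource_id = ?
          AND t.tag_name IN ({placeholders})
        GROUP BY yr, t.tag_name
        ORDER BY yr, t.tag_name
    """
    counts = defaultdict(dict)
    for year, tag, n in conn.execute(sql, [datasource_id, *languages]):
        counts[year][tag] = n
    return dict(counts)


def demo():
    """Build a tiny made-up data set and run the query against it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE fm_projects (
            project_id INTEGER, datasource_id INTEGER, date_added TEXT);
        CREATE TABLE fm_project_tags (
            project_id INTEGER, datasource_id INTEGER, tag_name TEXT);
        -- Toy rows, invented for illustration only.
        INSERT INTO fm_projects VALUES
            (1, 316, '2003-04-01 00:00:00'),
            (2, 316, '2003-09-15 00:00:00'),
            (3, 316, '2004-01-20 00:00:00'),
            (4, 999, '2003-01-01 00:00:00');
        INSERT INTO fm_project_tags VALUES
            (1, 316, 'C'), (2, 316, 'C'), (2, 316, 'Java'),
            (3, 316, 'Java'), (4, 999, 'C');
    """)
    return registrations_by_year(conn, 316, LANGUAGES)


if __name__ == "__main__":
    for year, per_language in sorted(demo().items()):
        print(year, per_language)
```

A project with several language tags is counted once under each of them, which matches what you get from running the single-language query separately for each tag. Against the real database you would swap the SQLite connection for a MySQL one and put YEAR(p.date_added) back in place of the strftime call.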

July 2012 data released

Data files have been released for July 2012. Go check them out on our Google Code downloads page or sign up for direct database access.

Special release notes:
--Alioth data has not yet made it onto the SDSC site (direct db access), but we are working on it. In the meantime, the data is on the Google Code site.
--The Google Code collector is still using an old list of projects, but this bug is being worked on.
--Launchpad collector is being re-written
--Alioth collector is being re-written

May 2012 data released

Data files have been released for May 2012. Go check them out on our Google Code downloads page.

Re-writes:
--Free Software Foundation has been re-written from scratch to match their new layout.
--Google Code collector has been re-written to fix a few bugs (still running)
--Launchpad has been finished and will be re-written for June to fix bugs
--Alioth is being re-written to fix bugs

Student work using FLOSSmole data

I often have my students tackle FLOSSmole data as a way of learning more about FLOSS, databases, data visualization, etc.

Here is an example of one of the graphs my students worked on last week, using Freecode data in FLOSSmole, R, and Illustrator.

The January 2012 Freecode data set is available here.

February GitHub data released

February data has been released for GitHub.

Get the data here from our Google Code downloads page or request direct database access here.

Included with the GitHub data are the following values:
project name
developer name
description
private yes/no
fork number
homepage
number of watchers
open issues
...and all the XML values that these fields are based on!

Have fun!

February Google Code data released

Google Code data has been released for January/February 2012.

Get the data here from our Google Code downloads page or request direct database access here.

Be aware that there is one open bug for the Google Code collection that may affect your use of this data.

Included in the Google Code run this time are project info, the developer list for each project (names obfuscated in some cases), blog info, labels, links, groups, and more. Have fun!
