warning: Creating default object from empty value in /var/www/drupal/modules/taxonomy/taxonomy.pages.inc on line 33.

Examples

Queries and visualizations to help people use data

Apache Camel data

We have released several files of Apache Camel IRC log data.

Sources:
originally stored by Dan Kulp
More about Apache Camel

Related Data Sets
Apache Twitter Handles
Apache Project People & Roles

Sample Queries for the IRC data:

List the most prolific IRC posters, in order of their post count
SELECT about_user, count( * )
FROM apache_camel_irc
GROUP BY 1
ORDER BY 2 DESC

List the twitter handles and svn_ids (if known) for anyone who is also on Apache Camel's IRC
SELECT distinct i.about_user, t.twitter_screen_name, t.svn_id
FROM apache_camel_irc i
inner join apache_twitter t
on i.about_user = t.svn_id

The datasources for the IRC data are (currently) #393-1572. Each log file (daily) gets its own datasource_id (since each one is a separate source)!

Freecode New Project Registrations (1998-2011) and language tags

This chart shows the new project registrations for each year 1998-2011, and what programming language those projects were tagged with.

For example, 2003 was the highest year for new "C" projects to be registered with Freecode (then called Freshmeat).

(Couple of caveats about the data here: (1) A project could be tagged with one language when it's created, and then the tag could change later. For example a project may have been created as "C" and then later switched its tags to "Java". This chart will show whatever the most current language tag is for that project. I have not calculated the number of times this happens or if it is likely, but I suspect that it is not too common. (2) The languages in the legend are in order of the total number of times they exist as tags for any project. (3) Not every project has a language tag for itself.)

Here is the SQL code used to generate the data sets for this graph:

SELECT YEAR(p.date_added ) , COUNT( DISTINCT p.project_id )
FROM fm_projects p
INNER JOIN fm_project_tags t
ON t.project_id = p.project_id
WHERE p.datasource_id = 316
AND t.datasource_id=316
AND t.tag_name="C"
GROUP BY 1
ORDER BY 1;

Substitute your current datasource_id and any valid programming language tag. I used the following:

C
Java
C++
PHP
Perl
Python
JavaScript

These are in order of total number of times they appeared in the sitewide tag list over time.

Other top languages, in order of frequency the tag is used:

Unix Shell
XML
SQL
HTML
Ruby
Tcl

Objectweb Project Licenses, July 2012

Objectweb allows each project to state what license it uses. Here is a chart showing these licenses for all the projects on Objectweb, as of July 2012.

Here is the SQL code to generate the data set:

SELECT description, count( * )
FROM `ow_project_licenses`
WHERE datasource_id =318
GROUP BY 1
ORDER BY 2 DESC;

Rubyforge License Counts, July 2012

Each project on Rubyforge can list what license it uses. The following chart was generated in July 2012 (datasource_id=317) to show the most common licenses (all those with more than 10 projects using it) and how many projects.

What seems interesting about this chart is that Rubyforge is the only forge I've seen where GPL (or some variant thereof) is NOT the most common license.

Here is the SQL code used to generate the data for this chart:

SELECT description, count( * )
FROM rf_project_licenses
WHERE datasource_id =[current datasource_id]
GROUP BY 1
ORDER BY 2 DESC;

Tigris License Counts, July 2012

Tigris lists the open source license that each project is using. The following chart shows the licenses used by projects on Tigris, and how many projects use each. Note that UNLIKE some other forges, variations of the GPL (and other licenses with multiple versions) are not differentiated by version, rather they are all lumped together.

Here is the SQL needed to generate the data set from the FLOSSmole database:

SELECT license, count( * )
FROM tig_projects
WHERE datasource_id = [current datasource_id]
GROUP BY 1
ORDER BY 2 DESC

Freecode Project Dependencies, July 2012

Freecode allows each project to declare which other projects it is dependent upon. Here is a chart showing the projects that were declared dependencies most frequently. PHP was the project selected most often by other projects as a dependency.

Here is the SQL code used to generate this data set from the FLOSSmole database:

SELECT dependency_project_title, count(*)
FROM fm_project_dependencies
WHERE datasource_id=[current datasource_id]
GROUP BY 1
ORDER BY 2 DESC
LIMIT 25

FSF Network of Required Projects, July 2012

FSF allows each project to state what other packages are required by this project.

Here is the SQL code to generate the data showing which software packages are most commonly required by other projects.

Notes: (1)We also have data on the TYPE of requirement (use requirement, build requirement, etc), but this column is not used in this query. (2) The required project does not have to be listed on the FSF directory to be included in the list. (For example, Perl is the most commonly required package, but this is not a FSF-directory listed project.)

SELECT requirement, count( * )
FROM fsf_project_requirements
WHERE datasource_id =319
GROUP BY 1
ORDER BY 2 DESC

FSF Network of Related Projects, July 2012

The Free Software Foundation directory (FSF) allows each project to list other FSF projects that are related to it.

We could construct a social network showing what projects are related. Here is the SQL code needed to make the list of what projects are related to what other projects:

SELECT pr.proj_num, p.proj_unixname, pr.related_project_name
FROM fsf_project_related pr
INNER JOIN fsf_projects p ON p.proj_num = pr.proj_num
WHERE p.datasource_id=[current datasource_id]
AND pr.datasource_id=[current datasource_id];

You can also just see the most commonly related projects by counting as follows:

SELECT related_project_name, count( * )
FROM `fsf_project_related`
WHERE datasource_id = [current datasource_id]
GROUP BY 1
ORDER BY 2 DESC

FSF Count of Projects by License, July 2012

FSF requires projects to list a license under which the project is released. This chart shows the top 25 licenses. Note that there is an "Other" category (211 projects) that includes ALL other licenses AND projects that chose "Other" as their actual license choice. This chart was generated using the licenses listed for projects in July 2012 (datasource_id=319).

SQL Code to generate the data set to build this chart:

SELECT license, count(*)
FROM `fsf_project_licenses`
WHERE datasource_id=[current data source]
GROUP BY 1
ORDER BY 2 DESC;

FSF release dates, by month and year (as of July 2012)

The Free Software Foundation directory lists free software projects and pertinent information about those projects.

One of the items we collect is "released on" date. This information refers to the last release date for the software, as reported in the FSF Directory.

This graph shows the aggregated release dates for the 6855 software packages listed in the FSF directory during the month of July 2012 (datasource_id=319).

Code to collect the number of projects:
SELECT count( * )
FROM `fsf_projects`
WHERE datasource_id = [current datasource];

Code to collect data to fill the chart:
SELECT MONTH( released_on ) , YEAR( released_on ) , COUNT( DISTINCT proj_num )
FROM fsf_projects
WHERE datasource_id =[current datasource_id]
GROUP BY 2 , 1
ORDER BY 2 , 1;

Syndicate content