rubyforge

Mining Software Repositories '16 paper, slides & data

This weekend I'll be presenting at Mining Software Repositories 2016 in Austin, TX. My talk is in the data sets track and is entitled "Data Sets: The Circle of Life in Ruby Hosting, 2003-2015" (PDF). Here are the slides.

Growth of Projects in Each Repository (June 2014)

Description:

This chart shows the number of NEW projects added to each repository by month/year.

Visualization:

Notes: Rubyforge (RF) had ~750 projects without a project start date.

SQL Script:

SELECT MONTH( date_added ) , YEAR( date_added ) , COUNT( DISTINCT project_id )
FROM fm_projects
WHERE datasource_id = [current]
GROUP BY YEAR( date_added ) , MONTH( date_added )
ORDER BY YEAR( date_added ) , MONTH( date_added );
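
To try the monthly grouping without access to the database, here is a minimal sketch using Python's built-in sqlite3 module and a few made-up rows. The table layout is an assumption based only on the column names in the query above, and SQLite's strftime stands in for MySQL's MONTH() and YEAR(). Rows with no date_added are filtered out explicitly so they do not form a group of their own.

```python
import sqlite3

# SQLite version of the monthly query; strftime replaces MONTH()/YEAR().
MONTHLY_SQL = """
    SELECT strftime('%Y-%m', date_added) AS ym,
           COUNT(DISTINCT project_id) AS n
    FROM fm_projects
    WHERE datasource_id = ?
      AND date_added IS NOT NULL   -- skip projects with no start date
    GROUP BY ym
    ORDER BY ym
"""

conn = sqlite3.connect(":memory:")
# Assumed minimal schema: only the columns the query touches.
conn.execute(
    "CREATE TABLE fm_projects ("
    "project_id INTEGER, date_added TEXT, datasource_id INTEGER)"
)
conn.executemany(
    "INSERT INTO fm_projects VALUES (?, ?, ?)",
    [
        (1, "2014-01-05", 1),
        (2, "2014-01-20", 1),
        (3, "2014-02-11", 1),
        (4, None, 1),           # no start date, excluded
        (5, "2014-02-12", 2),   # other datasource, excluded
    ],
)

monthly_counts = conn.execute(MONTHLY_SQL, (1,)).fetchall()
for ym, n in monthly_counts:
    print(ym, n)
```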

Data Resources: 

Growth of use of "Free" and "Open" in Project Names (2014 data)

Description:

This chart shows the number of new projects in each repository that use the words "Free" and "Open" in project names through 2014.

Visualization:

SQL Script:

Freshmeat:

SELECT YEAR( date_added ) , COUNT( DISTINCT project_id ) AS Count
FROM fm_projects
WHERE projectname_full LIKE "%free%"
AND datasource_id = [current]
GROUP BY YEAR( date_added )
ORDER BY YEAR( date_added );
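
The query above only covers "free"; the "open" counts presumably come from the same query with the pattern changed. A minimal sketch of that idea, run against made-up rows in an in-memory SQLite database (the schema is an assumption taken from the column names above, and strftime replaces MySQL's YEAR()):

```python
import sqlite3

def count_by_year(conn, word, datasource_id):
    """Count distinct projects per year whose full name contains `word`."""
    sql = """
        SELECT CAST(strftime('%Y', date_added) AS INTEGER) AS yr,
               COUNT(DISTINCT project_id) AS n
        FROM fm_projects
        WHERE LOWER(projectname_full) LIKE ?
          AND datasource_id = ?
        GROUP BY yr
        ORDER BY yr
    """
    pattern = f"%{word.lower()}%"
    return conn.execute(sql, (pattern, datasource_id)).fetchall()

conn = sqlite3.connect(":memory:")
# Assumed minimal schema: only the columns the query touches.
conn.execute(
    "CREATE TABLE fm_projects (project_id INTEGER, projectname_full TEXT, "
    "date_added TEXT, datasource_id INTEGER)"
)
conn.executemany(
    "INSERT INTO fm_projects VALUES (?, ?, ?, ?)",
    [
        (1, "FreeCiv", "2012-03-01", 1),
        (2, "OpenSSL", "2012-05-10", 1),
        (3, "Open Free Tool", "2013-01-15", 1),
        (4, "Libfoo", "2013-02-02", 1),
        (5, "OpenOld", "2013-07-07", 2),   # other datasource, excluded
        (6, "OpenFoo", "2013-09-01", 1),
    ],
)

free_counts = count_by_year(conn, "free", 1)
open_counts = count_by_year(conn, "open", 1)
print(free_counts)
print(open_counts)
```

Note that a substring match like this also counts names such as "Freedom" or "Reopen", so the totals are an upper bound on projects that use the word on its own.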

Data Resources: 

Number of Projects per Team Size in Rubyforge (June 2014)

Description:

This chart shows the number of projects for each team size in Rubyforge.

Visualization:

Projects listed as having 0 developers were disregarded (159 projects).

SQL Script:

SELECT dev_count, COUNT( DISTINCT proj_unixname ) AS count
FROM rf_projects
WHERE datasource_id = [current]
GROUP BY dev_count
ORDER BY count DESC , dev_count;
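
The query as written still returns a row for dev_count = 0, so the projects with no listed developers were presumably dropped afterwards. One way to exclude them in the query itself is sketched below, using an in-memory SQLite database with made-up rows. The extra dev_count > 0 condition is our assumption, not part of the original script.

```python
import sqlite3

TEAM_SIZE_SQL = """
    SELECT dev_count, COUNT(DISTINCT proj_unixname) AS n
    FROM rf_projects
    WHERE datasource_id = ?
      AND dev_count > 0        -- assumption: drop projects with no developers
    GROUP BY dev_count
    ORDER BY n DESC, dev_count
"""

conn = sqlite3.connect(":memory:")
# Assumed minimal schema: only the columns the query touches.
conn.execute(
    "CREATE TABLE rf_projects ("
    "proj_unixname TEXT, dev_count INTEGER, datasource_id INTEGER)"
)
conn.executemany(
    "INSERT INTO rf_projects VALUES (?, ?, ?)",
    [
        ("alpha", 1, 1),
        ("beta", 1, 1),
        ("gamma", 2, 1),
        ("delta", 0, 1),   # zero developers, excluded
        ("omega", 3, 2),   # other datasource, excluded
    ],
)

team_sizes = conn.execute(TEAM_SIZE_SQL, (1,)).fetchall()
for dev_count, n in team_sizes:
    print(dev_count, n)
```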

Data Resources: 

Most Commonly Used Operating Systems by Rubyforge Projects (June 2014)

Description:

This chart shows the top operating systems used by projects in Rubyforge.

Visualization:

SQL Script:

SELECT rfop.description AS System, COUNT( DISTINCT rfop.proj_unixname ) AS Count
FROM rf_project_operating_system rfop
WHERE rfop.datasource_id = [current]
GROUP BY System
ORDER BY Count DESC;

Data Resources: 

Most Commonly Used Languages by Rubyforge Projects (June 2014)

Description:

This chart shows the top programming languages used by projects in Rubyforge.

Visualization:

SQL Script:

SELECT rfpl.description AS Lang, COUNT( DISTINCT rfpl.proj_unixname ) AS Count
FROM rf_project_programming_language rfpl
WHERE rfpl.datasource_id = [current]
GROUP BY Lang
ORDER BY Count DESC;

Data Resources: 

Rubyforge Project Registrations, 2003-2014

Rubyforge is a software development "code forge" for projects written in the Ruby programming language. This chart shows the growth in new projects registered on the forge from July 2003 to December 2013. We used datasource_id=317 in this example.

The SQL to generate the data used to populate this graph is as follows (fill in the datasource_id accordingly):

Data Resources: 

Rubyforge License Counts, June 2014

Each project on Rubyforge can list the license it uses. The following chart, generated in June 2014 (datasource_id=12987), shows the most common licenses (those used by more than 10 projects) and the number of projects using each.

Here is the SQL code used to generate the data for this chart:

SELECT description, count( * )
FROM rf_project_licenses
WHERE datasource_id = [current datasource_id]
GROUP BY 1
ORDER BY 2 DESC;
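
The script returns every license, so the "more than 10 projects" cutoff was presumably applied when the chart was drawn. If you would rather apply it in SQL, a HAVING clause does the job. Here is a minimal sketch against an in-memory SQLite database with made-up rows; the proj_unixname column is an assumption borrowed from the other rf tables.

```python
import sqlite3

LICENSE_SQL = """
    SELECT description, COUNT(*) AS n
    FROM rf_project_licenses
    WHERE datasource_id = ?
    GROUP BY description
    HAVING COUNT(*) > ?        -- keep only licenses above the cutoff
    ORDER BY n DESC
"""

conn = sqlite3.connect(":memory:")
# Assumed minimal schema; proj_unixname is a guess at the project key.
conn.execute(
    "CREATE TABLE rf_project_licenses ("
    "proj_unixname TEXT, description TEXT, datasource_id INTEGER)"
)
rows = (
    [(f"gpl{i}", "GPL", 1) for i in range(12)]
    + [(f"mit{i}", "MIT", 1) for i in range(11)]
    + [(f"bsd{i}", "BSD", 1) for i in range(3)]     # below cutoff
    + [(f"old{i}", "BSD", 2) for i in range(20)]    # other datasource
)
conn.executemany("INSERT INTO rf_project_licenses VALUES (?, ?, ?)", rows)

common_licenses = conn.execute(LICENSE_SQL, (1, 10)).fetchall()
for description, n in common_licenses:
    print(description, n)
```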

Data Resources: 

Last Rubyforge Collection

The last Rubyforge collection happened yesterday. Its datasource_id is 12987. All the data is available on our file downloads site or in the database (ossmole_merged schema, tables prefixed 'rf'; use datasource_id=12987 in your SQL queries).

RIP Rubyforge! We have been collecting from there for 10 years. Charts and graphs coming soon.

rubyforge shuts down

Data Resources: 

New March 2014 data released

New forge data, collected 04-Mar-2014, has been released.

The datasource_ids are as follows:

8079 - freecode
8080 - rubyforge
8081 - objectweb
8082 - savannah
8083 - tigris
8084 - alioth

IRC data:
8085 - 8134: Apache ServiceMix
8135 - 8185: Apache Camel
8186 - 8236: Apache ActiveMQ
8237 - 8287: Apache CXF
8288 - 8338: Apache Aries
8339 - 8389: Apache Kalumet
8390 - 8440: Apache Karaf

Data Resources: 
