warning: Creating default object from empty value in /var/www/drupal/modules/taxonomy/taxonomy.pages.inc on line 33.

Examples

Queries and visualizations to help people use data

How do various corporations populate the Apache projects?

With the FLOSSmole Apache Project/Contributor/Roles data we updated earlier today, we thought an interesting initial analysis would be to figure out how various corporations populate the Apache projects (at least according to the official lists of contributors posted on each Apache project page).

Here is a list of the Apache projects with the highest density of participation by a single corporation:


We only show the first page of results here.

How did we get this result?

1. We used the Apache Contributor Data Set described in this previous FLOSSmole blog posting. Each project in Apache family of projects lists their members, and sometimes they list what company that person works for. Here is an example from the Geronimo Project.

Not every project lists its members, and not every project lists its members' affiliations.

2. We limited our analysis to datasource_id 65935, or May 18, 2016

3. We created a view in SQL so that we could more easily calculate the percentage of the total number of developers for each project:

SELECT project_name, count(*) as 'devcount'
FROM apache_people_projects
WHERE real_name IN (
SELECT distinct(real_name) FROM `apache_people_projects` WHERE datasource_id=65935
ORDER BY `apache_people_projects`.`real_name`)
AND datasource_id=65935
GROUP BY 1 ORDER BY 1 ASC;

4. Then we ran this SQL query to generate the data shown in the table above. The rows are sorted by the highest percent.

SELECT app.project_name, app.organization, count(*) as 'org dev count', app2.devcount as 'all devs', cast((count(*)/devcount)*100 as decimal(4,2)) as 'pct of team' FROM apache_people_projects app
INNER JOIN apache_project_dev_count_65935 app2
ON app.project_name = app2.project_name
WHERE app.real_name
IN (SELECT distinct(real_name) FROM `apache_people_projects` where datasource_id=65935)
AND app.organization IS NOT NULL
AND app.organization !=""
AND app.datasource_id=65935
GROUP BY 1, 2
ORDER BY 5 DESC, 1 asc;

Interested in getting this data? Apache Contributor Data Set

Want to see more examples of how to use FLOSSmole data? Examples

Django IRC Contributions Graph

Django IRC D3 CONTRIBUTIONS GRAPH

This graph represents The number of posts in the Django IRC logs. The lighter green squares represent days with less posts than the darker green squares. Months go from left to right and are separated by the darker lines. Days go by columns from left to right.

SQL Code

SELECT YEAR( date_of_entry ) , MONTH( date_of_entry ) , DAY( date_of_entry ) , COUNT( `type` )
FROM `django_irc`
WHERE TYPE = 'message'
GROUP BY 1 , 2, 3
ORDER BY YEAR( date_of_entry ) ASC, MONTH( date_of_entry ) ASC,DAY( date_of_entry ) ASC

Source code

This visualization was created using D3 and JavaScript. To view similar source code on GitHub head over to the FLOSSmole Github repositories

Want to see more examples of how to use FLOSSmole data? Examples

Ubuntu IRC Contributions Graph

Ubuntu IRC D3 CONTRIBUTIONS GRAPH

This graph represents The number of posts in the UbuntuIRC logs. The lighter green squares represent days with less posts than the darker green squares. Months go from left to right and are separated by the darker lines. Days go by columns from left to right.


SQL Code

SELECT YEAR( date_of_entry ) , MONTH( date_of_entry ) , DAY( date_of_entry ) , COUNT( `type` )
FROM `ubuntu_irc`
WHERE TYPE = 'message'
GROUP BY 1 , 2, 3

Source code

This visualization was created using D3 and JavaScript. To view similar source code on GitHub click here

Perl6 Data Visualizations

The following are a few examples of some quick queries and visualizations we made to show how to use the Perl6 IRC data.

(1) Posts by hour of day over the years

SQL code:

SELECT YEAR(`date_of_entry`),HOUR(`time_of_entry`),COUNT(HOUR(`time_of_entry`))
FROM `perl6_irc`
GROUP BY 1,2

(2) Perl6 IRC Posts by hour

SQL Code:

SELECT HOUR( `time_of_entry` ) , COUNT( HOUR( `time_of_entry` ) )
FROM `perl6_irc`
GROUP BY 1

(3) Perl6 Developer IRC Posts Over Time

SQL Code:

SELECT `send_user` , YEAR( `date_of_entry` ) , MONTH( `date_of_entry` ) , `date_of_entry` , COUNT(
TYPE )
FROM `perl6_irc`
WHERE `send_user` = 'pmichaud'
OR `send_user` = 'audreyt'
OR `send_user` = 'TimToady'
OR `send_user` = 'jnthn'
GROUP BY 1 , 2, 3

Bitcoin IRC Graphs

Here are some graphs and SQL to show how to use the Bitcoin IRC data. These charts track the activity of three main developers of Bitcoin: Mike Hearn, Gregory Maxwell, and Gavin Andresen. These three developers were featured in a New York Times article about Bitcoin developers.

(1) Bitcoin Developers IRC Posts Over Time




Click to enlarge image

SQL Code

SELECT `send_user` , YEAR( date_of_entry ) , MONTH( date_of_entry ) , COUNT( `type` )
FROM `bitcoindev_irc`
WHERE `send_user` = 'gavinandresen'
GROUP BY 1, 2 , 3;

(2)Bitcoin Developers IRC Posts by Total Percentage




Click to enlarge image

SQL Code

SELECT YEAR( date_of_entry ) , MONTH( date_of_entry ) , COUNT( `type` )
FROM `bitcoindev_irc`
WHERE TYPE = 'message'
GROUP BY 1 , 2;

(3) Bitcoin IRC Log by Day of the Week




Click to enlarge image

SQL Code

SELECT DAYOFWEEK( date_of_entry ) , COUNT( `type` )
FROM `bitcoindev_irc`
WHERE TYPE = 'message'
GROUP BY 1 ;

Bitcoin IRC D3 Contributions Graph

BITCOIN IRC D3 CONTRIBUTIONS GRAPH

This graph represents The number of posts in the Bitcoin IRC logs. The lighter green squares represent days with less posts than the darker green squares. Months go from left to right and are separated by the darker lines. Days go by columns from left to right.

SQL Code

SELECT YEAR( date_of_entry ) , MONTH( date_of_entry ) , DAY( date_of_entry ) , COUNT( `type` )
FROM `bitcoindev_irc`
WHERE TYPE = 'message'
GROUP BY 1 , 2, 3

Source code

This visualization was created using D3 and JavaScript. To view the source code on GitHub click here

Growth of Projects in Each Repository (June 2014)

Description:

This chart shows the number of NEW projects added to each repository by month/year.

Visualization:

Notes: RF had ~750 projects without a project start date.

SQL Script:

SELECT MONTH( date_added ) , YEAR( date_added ) , COUNT( DISTINCT project_id )
FROM fm_projects
WHERE datasource_id = [current]
GROUP BY YEAR( date_added ) , MONTH( date_added )
ORDER BY YEAR( date_added ) , MONTH( date_added );

SELECT MONTH( date_registered ) , YEAR( date_registered ) , COUNT( DISTINCT proj_unixname )
FROM rf_projects
WHERE datasource_id = [current]
GROUP BY YEAR( date_registered ) , MONTH( date_registered )
ORDER BY YEAR( date_registered ) , MONTH( date_registered );

SELECT MONTH( registration_date ) , YEAR( registration_date ) , COUNT( DISTINCT project_name )
FROM sv_projects
WHERE datasource_id = [current]
GROUP BY YEAR( registration_date ) , MONTH( registration_date )
ORDER BY YEAR( registration_date ) , MONTH( registration_date );

Growth of use of "Free" and "Open" in Project Names (2014 data)

Description:

This chart shows the number of new projects in each repository that use the words "Free" and "Open" in project names through 2014.

Visualization:

SQL Script:

Freshmeat:

SELECT YEAR( date_added ) , COUNT( DISTINCT project_id ) AS Count
FROM fm_projects
WHERE projectname_full LIKE "%free%"
AND datasource_id = [current]
GROUP BY YEAR( date_added )
ORDER BY YEAR( date_added );

SELECT YEAR( date_added ) , COUNT( DISTINCT project_id ) AS Count
FROM fm_projects
WHERE projectname_full LIKE "%open%"
AND datasource_id = [current]
GROUP BY YEAR( date_added )
ORDER BY YEAR( date_added );

Rubyforge:

SELECT YEAR( date_registered ) , COUNT( DISTINCT proj_unixname ) AS Count
FROM rf_projects
WHERE proj_unixname LIKE "%free%"
AND datasource_id = [current]
GROUP BY YEAR( date_registered )
ORDER BY YEAR( date_registered );

SELECT YEAR( date_registered ) , COUNT( DISTINCT proj_unixname ) AS Count
FROM rf_projects
WHERE proj_unixname LIKE "%open%"
AND datasource_id = [current]
GROUP BY YEAR( date_registered )
ORDER BY YEAR( date_registered );

Savannah:

SELECT YEAR( registration_date ) , COUNT( DISTINCT project_name ) AS Count
FROM sv_projects
WHERE project_name LIKE "%free%"
AND datasource_id = [current]
GROUP BY YEAR( registration_date )
ORDER BY YEAR( registration_date );

SELECT YEAR( registration_date ) , COUNT( DISTINCT project_name ) AS Count
FROM sv_projects
WHERE project_name LIKE "%open%"
AND datasource_id = [current]
GROUP BY YEAR( registration_date )
ORDER BY YEAR( registration_date );

Number of Projects per Team Size in Rubyforge (June 2014)

Description:

This chart show the number of projects for each team size in Rubyforge.

Visualization:

Projects listed as having 0 developers were disregarded (159 projects).

SQL Script:

SELECT DISTINCT dev_count, COUNT( DISTINCT proj_unixname ) AS count
FROM rf_projects
WHERE datasource_id = [current]
GROUP BY dev_count
ORDER BY count DESC , dev_count

Number of Projects per Team Size in Objectweb (June 2014)

Description:

This chart shows the number of projects of each team size listed in Objectweb.

Visualization:

SQL Script:

SELECT DISTINCT dev_count, COUNT( DISTINCT proj_unixname ) AS count
FROM ow_projects
WHERE datasource_id = [current]
GROUP BY dev_count
ORDER BY count DESC , dev_count

Syndicate content