With the FLOSSmole Apache Project/Contributor/Roles data we updated earlier today, we thought an interesting initial analysis would be to figure out how various corporations populate the Apache projects (at least according to the official lists of contributors posted on each Apache project page).
Here is a list of the Apache projects with the highest density of participation by a single corporation:
We only show the first page of results here.
How did we get this result?
1. We used the Apache Contributor Data Set described in this previous FLOSSmole blog posting. Many projects in the Apache family list their members, and some also list the company each person works for. Here is an example from the Geronimo Project.
Not every project lists its members, and not every project lists its members' affiliations.
2. We limited our analysis to datasource_id 65935, which corresponds to May 18, 2016.
3. We created a view in SQL that counts the developers on each project, so that we could more easily calculate each organization's percentage of a project's developers:
CREATE VIEW apache_project_dev_count_65935 AS
SELECT project_name, count(*) as 'devcount'
FROM `apache_people_projects`
WHERE real_name IN (
SELECT distinct(real_name) FROM `apache_people_projects` WHERE datasource_id=65935
ORDER BY `apache_people_projects`.`real_name`)
GROUP BY 1 ORDER BY 1 ASC;
4. Then we ran this SQL query to generate the data shown in the table above. The rows are sorted by percentage, highest first.
SELECT app.project_name, app.organization, count(*) as 'org dev count', app2.devcount as 'all devs', cast((count(*)/devcount)*100 as decimal(5,2)) as 'pct of team' FROM apache_people_projects app
INNER JOIN apache_project_dev_count_65935 app2
ON app.project_name = app2.project_name
WHERE app.real_name IN (SELECT distinct(real_name) FROM `apache_people_projects` where datasource_id=65935)
AND app.organization IS NOT NULL
AND app.organization !=""
GROUP BY 1, 2
ORDER BY 5 DESC, 1 asc;
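If you want to try the calculation without a copy of the FLOSSmole database, here is a minimal sketch of the two steps above using Python's built-in sqlite3 module. Several things in it are assumptions for illustration: the toy rows and the names in them are invented, the table is reduced to four columns, a common table expression stands in for the MySQL view, and rows are filtered on datasource_id directly rather than through the real_name subquery. It is a simplified approximation of the queries in this post, not a copy of them.

```python
import sqlite3

# Hypothetical toy rows: (project_name, real_name, organization, datasource_id).
# None of these people, projects, or companies come from the real data set.
ROWS = [
    ("alpha", "Ann", "Acme", 65935),
    ("alpha", "Bob", "Acme", 65935),
    ("alpha", "Cy", "Initech", 65935),
    ("alpha", "Dee", "", 65935),      # blank affiliation: counted as a dev only
    ("beta", "Eve", "Acme", 65935),
    ("beta", "Fay", None, 65935),     # missing affiliation: counted as a dev only
    ("beta", "Old", "Acme", 60000),   # different collection: ignored entirely
]

QUERY = """
WITH dev_count AS (
    -- Step 3: total developers listed per project for one collection.
    SELECT project_name, COUNT(*) AS devcount
    FROM apache_people_projects
    WHERE datasource_id = :ds
    GROUP BY project_name
)
-- Step 4: each organization's share of a project's listed developers.
SELECT app.project_name,
       app.organization,
       COUNT(*) AS org_dev_count,
       dc.devcount AS all_devs,
       ROUND(100.0 * COUNT(*) / dc.devcount, 2) AS pct_of_team
FROM apache_people_projects app
INNER JOIN dev_count dc ON app.project_name = dc.project_name
WHERE app.datasource_id = :ds
  AND app.organization IS NOT NULL
  AND app.organization != ''
GROUP BY app.project_name, app.organization, dc.devcount
ORDER BY pct_of_team DESC, app.project_name ASC
"""


def org_share(rows, datasource_id):
    """Return (project, organization, org devs, all devs, percent) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE apache_people_projects ("
        "project_name TEXT, real_name TEXT, "
        "organization TEXT, datasource_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO apache_people_projects VALUES (?, ?, ?, ?)", rows
    )
    result = conn.execute(QUERY, {"ds": datasource_id}).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in org_share(ROWS, 65935):
        print(row)
```

On the toy rows, the invented company Acme accounts for 2 of the 4 developers on "alpha" and 1 of the 2 on "beta". People with a blank or missing affiliation still count toward the project total, which is why the percentages for a project do not have to add up to 100.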
Interested in getting this data? Apache Contributor Data Set
Want to see more examples of how to use FLOSSmole data? Examples