Submitted by megan on July 21, 2016 - 2:01pm
With the FLOSSmole Apache Project/Contributor/Roles data we updated earlier today, we thought an interesting initial analysis would be to figure out how various corporations populate the Apache projects (at least according to the official lists of contributors posted on each Apache project page).
Here is a list of the Apache projects with the highest density of participation by a single corporation:
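One way such a density figure could be computed, as a rough sketch: for each project, count the contributors per organization and divide the largest count by the project's total number of listed contributors. The rows below are made-up placeholders, not FLOSSmole data, and the function name `top_org_density` is hypothetical.

```python
from collections import Counter, defaultdict

# Made-up placeholder rows (project, contributor, organization).
# These are NOT real FLOSSmole data; they only illustrate the calculation.
rows = [
    ("project_a", "alice", "ExampleCorp"),
    ("project_a", "bob", "ExampleCorp"),
    ("project_a", "carol", "OtherOrg"),
    ("project_b", "dave", "OtherOrg"),
    ("project_b", "erin", None),  # organization not listed
]


def top_org_density(rows):
    """Return {project: (organization, share)} for the organization
    with the most listed contributors in each project."""
    per_project = defaultdict(Counter)
    totals = Counter()
    for project, _person, org in rows:
        totals[project] += 1  # every listed contributor counts in the total
        if org is not None:
            per_project[project][org] += 1
    result = {}
    for project, total in totals.items():
        if per_project[project]:
            org, count = per_project[project].most_common(1)[0]
            result[project] = (org, count / total)
    return result


density = top_org_density(rows)
# Print projects from highest to lowest single-organization share.
for project, (org, share) in sorted(density.items(), key=lambda kv: -kv[1][1]):
    print(f"{project}: {org} {share:.0%}")
```

Contributors with no listed organization still count toward the project total here, which lowers the share; dropping them from the denominator instead would be an equally defensible choice.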
Submitted by megan on July 21, 2016 - 12:39pm
I spent a few days in May updating the list of all the Apache project contributors (full name & Apache system name when available) and their organizations when available. This data set was first released in 2013 in the MSR paper entitled "Project Roles in the Apache Software Foundation: A Dataset".
Fields:
Submitted by megan on August 1, 2014 - 12:17pm
In my continuing quest to be organized, I've created a new schema to hold just the IRC log data. On the database server (access instructions here), there is a new schema called 'irc' and it includes (for now) Ubuntu logs, Django logs, 7 Apache projects, and the topic lines from Freenode for all channels with 3+ users.
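As a minimal sketch of the kind of query this makes possible, the snippet below counts messages per day of logs. It assumes a layout with one datasource_id per day and one row per message; the table name `irc_messages` and its columns are hypothetical, and an in-memory SQLite database stands in for the real database server.

```python
import sqlite3

# In-memory SQLite stands in for the real database server.
conn = sqlite3.connect(":memory:")

# Hypothetical table: one datasource_id per day of logs, one row per message.
conn.execute(
    """
    CREATE TABLE irc_messages (
        datasource_id INTEGER,
        line_num      INTEGER,
        speaker       TEXT,
        message       TEXT
    )
    """
)

# Placeholder rows for illustration only; not real log data.
conn.executemany(
    "INSERT INTO irc_messages VALUES (?, ?, ?, ?)",
    [
        (1, 1, "user_a", "hello"),
        (1, 2, "user_b", "hi"),
        (2, 1, "user_a", "anyone around?"),
    ],
)

# Count messages per datasource_id, i.e. per day of logs.
per_day = conn.execute(
    "SELECT datasource_id, COUNT(*) FROM irc_messages "
    "GROUP BY datasource_id ORDER BY datasource_id"
).fetchall()
print(per_day)
```

Against the actual server, the same GROUP BY idea would apply once the real table and column names are substituted.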
Coming soon: email updates, including the Linux Kernel Mailing List (LKML), and more IRC (WordPress, etc.).
Enjoy!
Submitted by megan on January 22, 2014 - 10:44am
Hello moles! Happy January. Here are some fresh new data sources for your mining pleasure:
1. Freenode channel list and topics (all public channels with 3 or more users). The table is called "fn_irc_channels".
2. Apache ActiveMQ IRC logs (one datasource_id per day, one row per message).
3. Apache Aries IRC logs
4. Apache Camel IRC logs
5. Apache CXF IRC logs
6. Apache Karaf IRC logs
7. Apache Kalumet IRC logs
8. Apache ServiceMix IRC logs
Here is a sample of what the structure looks like for 2-8:
Submitted by megan on December 24, 2013 - 6:13pm
Hot off the presses! Another update to the Apache people-roles-projects data:
Datasources 1578-1585 have updated information on people working on Apache projects, including committer lists, PMC lists, PMC chairs, etc.
Timezones are now being collected as well.
This is an update to the original dataset described in the paper "Project Roles in the Apache Software Foundation: A Dataset" (2013), written by yours truly.
Submitted by megan on December 16, 2013 - 1:53pm
Submitted by megan on December 16, 2013 - 1:29pm
December data has been released. We have a few old standbys (fc, rf, ow, sv, al) and some hot fresh data as well.
What is new, you ask? Well, we have some IRC chat log data for the Apache project Camel [1]. A nice new social data set, all parsed and organized into relational database format for you to query.
Submitted by megan on March 19, 2013 - 2:15pm
Hi moles! I've got two new datasets for you to play with. These aren't perfect, but they're the start of a new type of dataset for FLOSSmole!
(1) Apache Roles: This dataset stores information about people affiliated with all the subprojects of the Apache Software Foundation, their roles, and what project they're working on with that role. Data sources include: Apache web site pages, board meeting minutes, etc. (Pre-Print on FLOSShub describing collection, curation, storage, sample queries)