apache

How do various corporations populate the Apache projects?

Submitted by megan on July 21, 2016 - 2:01pm

With the FLOSSmole Apache Project/Contributor/Roles data we updated earlier today, we thought an interesting initial analysis would be to figure out how various corporations populate the Apache projects (at least according to the official lists of contributors posted on each Apache project page).
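One way to approach this kind of analysis is sketched below in Python. It is only an illustration under stated assumptions, not the query actually used for this post: the input layout (project, person, organization), the function name `single_org_density`, and the sample rows are all hypothetical, and "density" is assumed here to mean the share of a project's listed contributors who belong to its most-represented organization.

```python
from collections import defaultdict


def single_org_density(rows):
    """Rank projects by the share of listed contributors from one organization.

    rows: iterable of (project, person, organization) tuples, where
    organization may be None or "" when it is not listed.
    (This layout is an assumption, not the real FLOSSmole schema.)

    Returns a list of (project, top_organization, share), sorted by
    share descending, then by project name.
    """
    people = defaultdict(set)                       # project -> all listed people
    by_org = defaultdict(lambda: defaultdict(set))  # project -> org -> people

    for project, person, org in rows:
        people[project].add(person)
        if org:  # skip contributors with no listed organization
            by_org[project][org].add(person)

    result = []
    for project, orgs in by_org.items():
        # Pick the organization with the most people; ties are broken
        # by organization name so the result is deterministic.
        top_org, members = max(orgs.items(), key=lambda kv: (len(kv[1]), kv[0]))
        share = len(members) / len(people[project])
        result.append((project, top_org, share))

    result.sort(key=lambda r: (-r[2], r[0]))
    return result


# Made-up sample rows, for illustration only.
sample_rows = [
    ("alpha", "ann", "OrgA"),
    ("alpha", "bob", "OrgA"),
    ("alpha", "cat", "OrgB"),
    ("alpha", "dan", None),
    ("beta", "eve", "OrgB"),
    ("beta", "fay", "OrgB"),
]

for project, org, share in single_org_density(sample_rows):
    print(f"{project}: {org} ({share:.0%})")
```

Contributors with no listed organization still count toward a project's total in this sketch, so projects with sparse organization data show lower density; whether that is the right choice depends on how complete the contributor pages are.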

Here is a list of the Apache projects with the highest density of participation by a single corporation:

Data Resources:

Examples

Tags:

apache


New "Apache Projects & Contributors" data dump

Submitted by megan on July 21, 2016 - 12:39pm

I spent a few days in May updating the list of all Apache project contributors (full name and Apache system name, when available) along with their organizations, when available. This data set was first released in 2013 in the MSR paper "Project Roles in the Apache Software Foundation: A Dataset".

Fields:

Data Resources:

Collection information

Tags:

apache


New schema for IRC data

Submitted by megan on August 1, 2014 - 12:17pm

In my continuing quest to be organized, I've created a new schema to hold just the IRC log data. On the database server (access instructions here), there is a new schema called 'irc' and it includes (for now) Ubuntu logs, Django logs, 7 Apache projects, and the topic lines from Freenode for all channels with 3+ users.

Coming soon: email updates, including the Linux Kernel Mailing List (LKML), and more IRC (WordPress, etc.).

Enjoy!

Data Resources:

Collection information


New Apache project IRC data

Submitted by megan on January 22, 2014 - 10:44am

Hello moles! Happy January. Here are some fresh new data sources for your mining pleasure:

1. Freenode channel list and topics (all public channels with 3 or more users). The table is called "fn_irc_channels".
2. Apache ActiveMQ IRC logs (one datasource_id per day, one row per message).
3. Apache Aries IRC logs
4. Apache Camel IRC logs
5. Apache CXF IRC logs
6. Apache Karaf IRC logs
7. Apache Kalumet IRC logs
8. Apache ServiceMix IRC logs

Here is a sample of what the structure looks like for 2-8:

Data Resources:

Collection information

Tags:

apache

irc

freenode


New Apache People-Roles-Projects data

Submitted by megan on December 24, 2013 - 6:13pm

Hot off the presses! Another update to the Apache people-roles-projects data:

Datasources 1578-1585 have updated information on people working on Apache projects, including committer lists, PMC lists, PMC chairs, etc.

Timezones are now being collected as well.

This is an update to the original dataset described in the paper "Project Roles in the Apache Software Foundation: A Dataset" (2013), written by yours truly.

Data Resources:

Collection information

Tags:

apache

1585


Apache Camel data

Submitted by megan on December 16, 2013 - 1:53pm

We have released several files of Apache Camel IRC log data.

Sources:
originally stored by Dan Kulp
More about Apache Camel

Sample Queries for the IRC data:

Data Resources:

Examples

Tags:

apache

December 2013 data released

Submitted by megan on December 16, 2013 - 1:29pm

December data has been released. We have a few old standbys (fc, rf, ow, sv, al) and some hot fresh data as well.

What is new, you ask? Well, we have some IRC chat log data for the Apache project Camel [1]. A nice new social data set, all parsed and organized into relational database format for you to query.

Data Resources:

Collection information



Two new data sets

Submitted by megan on March 19, 2013 - 2:15pm

Hi moles! I've got two new datasets for you to play with. These aren't perfect, but they're the start of a new type of dataset for FLOSSmole!

(1) Apache Roles: This dataset stores information about people affiliated with all the subprojects of the Apache Software Foundation, their roles, and the project on which they hold each role. Data sources include Apache web site pages, board meeting minutes, etc. (Pre-Print on FLOSShub describing collection, curation, storage, sample queries)

Data Resources:

Collection information

Tags:

apache

twitter
