Submitted by megan on March 11, 2014 - 11:57am
Some new forge data, collected 04-Mar-2014, has been released.
The datasource_ids are as follows:
8079 - freecode
8080 - rubyforge
8081 - objectweb
8082 - savannah
8083 - tigris
8084 - alioth
IRC data:
8085 - 8134: Apache ServiceMix
8135 - 8185: Apache Camel
8186 - 8236: Apache ActiveMQ
8237 - 8287: Apache CXF
8288 - 8338: Apache Aries
8339 - 8389: Apache Kalumet
8390 - 8440: Apache Karaf
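If you want to turn a datasource_id back into the forge or IRC project it belongs to, a small lookup like the one below may help. This is only a sketch: the ids and ranges are copied from the list above, but the names `FORGES`, `IRC_RANGES`, and `source_for` are made up for illustration and are not part of any FLOSSmole schema or tool.

```python
from typing import Optional

# Single datasource_ids for the forge collections in this release.
FORGES = {
    8079: "freecode",
    8080: "rubyforge",
    8081: "objectweb",
    8082: "savannah",
    8083: "tigris",
    8084: "alioth",
}

# Inclusive (low, high, project) datasource_id ranges for the IRC logs.
IRC_RANGES = [
    (8085, 8134, "Apache ServiceMix"),
    (8135, 8185, "Apache Camel"),
    (8186, 8236, "Apache ActiveMQ"),
    (8237, 8287, "Apache CXF"),
    (8288, 8338, "Apache Aries"),
    (8339, 8389, "Apache Kalumet"),
    (8390, 8440, "Apache Karaf"),
]


def source_for(datasource_id: int) -> Optional[str]:
    """Return the forge or IRC project for an id, or None if it is not in this release."""
    if datasource_id in FORGES:
        return FORGES[datasource_id]
    for low, high, project in IRC_RANGES:
        if low <= datasource_id <= high:
            return project
    return None
```

For example, `source_for(8200)` falls inside the 8186 - 8236 range and so resolves to Apache ActiveMQ, while an id outside this release returns `None`.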
Submitted by megan on January 22, 2014 - 10:44am
Hello moles! Happy January. Here are some fresh new data sources for your mining pleasure:
1. Freenode channel list and topics (all public channels with 3 or more users). The table is called "fn_irc_channels".
2. Apache ActiveMQ IRC logs (one datasource_id per day, one row per message).
3. Apache Aries IRC logs
4. Apache Camel IRC logs
5. Apache CXF IRC logs
6. Apache Karaf IRC logs
7. Apache Kalumet IRC logs
8. Apache ServiceMix IRC logs
Here is a sample of what the structure looks like for 2-8:
Submitted by megan on December 24, 2013 - 6:13pm
Hot off the presses! Another update to the Apache people-roles-projects data:
Datasources 1578-1585 have updated information on people working on Apache projects, including committer lists, PMC lists, PMC chairs, etc.
Timezones are now being collected as well.
This is an update to the original dataset described in the paper "Project Roles in the Apache Software Foundation: A Dataset" (2013), written by yours truly.
Submitted by megan on December 16, 2013 - 1:53pm
Submitted by megan on December 16, 2013 - 1:29pm
December data has been released. We have a few old standbys (fc, rf, ow, sv, al) and some hot fresh data as well.
What is new, you ask? Well, we have some IRC chat log data for the Apache project Camel [1]. A nice new social data set, all parsed and organized into relational database format for you to query.
Submitted by megan on December 1, 2013 - 11:12am
We've been collecting RubyForge data almost since the beginning. Last month we reported on the decline of RubyForge in light of newer forges, like GitHub. Here's the chart we drew:
Now we've got this lovely pair of images to contend with:
Submitted by megan on October 2, 2013 - 2:34pm
Many of you know that we provide flat files of our data for download by anyone at any time. Until recently we hosted these on Google Code (before 2009 or so, we hosted them on SourceForge). Google Code has announced that projects will not be able to offer file downloads as of January 2014, so we had to find a new home for our files.
Submitted by megan on August 5, 2013 - 10:13am
Submitted by megan on March 19, 2013 - 2:15pm
Hi moles! I've got two new datasets for you to play with. These aren't perfect, but they're the start of a new type of dataset for FLOSSmole!
(1) Apache Roles: This dataset stores information about people affiliated with all the subprojects of the Apache Software Foundation, their roles, and what project they're working on with that role. Data sources include: Apache web site pages, board meeting minutes, etc. (Pre-Print on FLOSShub describing collection, curation, storage, sample queries)
Submitted by megan on February 7, 2013 - 12:34pm
Hello moles! I have completed the move of all our data from the Teragrid over to FLOSSdata and it is ready to go.
There are some new things to know:
1. New schemas for 'old', 'sf' and 'udd' data. You're used to seeing those in the 'ossmole_merged' schema, but they've been moved out into their own schemas.
2. Reminder that the 'sf' data is quite old and we recommend that you use SRDA instead.
3. There are some new tables coming for Apache data. More information on these will be forthcoming.