Freecode is no longer updating

Freecode (formerly Freshmeat), the directory of Free and Open Source Software Projects, is no longer accepting new submissions. As of June 18, their site has this message on top:

The last full scrape of the Freecode RDF files took place in March. The data for the March 2014 Freecode collection is available for download from the FLOSSmole FC data site (or in the MySQL database).

RIP Freecode, nee Freshmeat.

Last Rubyforge Collection

The last Rubyforge collection happened yesterday. The datasource_id = 12987. All the data is located on our file downloads site, or in the database (ossmole_merged schema, tables prefixed 'rf', use datasource_id=12987 in your SQL queries).

RIP Rubyforge! We have been collecting from there for 10 years. Charts and graphs coming soon.


Django IRC data loaded into database

Django is a Python web framework, and of course it is an open source project. I have downloaded the entire collection of IRC logs for this project, starting with the first logs from 2011. The logs have been split into lines, parsed into fields (message, sender, time, date, etc.), and loaded into the ossmole_merged database on our live MySQL server, in a table called django_irc.

Each datasource_id represents one day's log file. Right now we have datasource_ids 8442-9435.

We will update the collection periodically.

Usage example:

SELECT about_user, count(*)
FROM django_irc
GROUP BY about_user
ORDER BY 2 desc;
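
The usage query above can also be limited to a range of daily logs by filtering on datasource_id. What follows is only a minimal sketch: it uses an in-memory SQLite database as a stand-in for the live MySQL server, the table layout beyond datasource_id and about_user is an assumption borrowed from the Apache IRC tables, the sample rows are invented, and the helper name top_posters is hypothetical.

```python
import sqlite3

# Stand-in for the live MySQL server: an in-memory SQLite database.
# Only datasource_id and about_user are confirmed by the post; the other
# columns are assumed to mirror the Apache IRC tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE django_irc (
        datasource_id INTEGER NOT NULL,
        line_num      INTEGER NOT NULL,
        about_user    TEXT    NOT NULL,
        line_message  TEXT    NOT NULL,
        PRIMARY KEY (datasource_id, line_num)
    )
    """
)

# Invented sample rows, for illustration only.
rows = [
    (8442, 1, "alice", "hello"),
    (8442, 2, "bob", "hi"),
    (8443, 1, "alice", "any news on the release?"),
    (9436, 1, "carol", "outside the requested range"),
]
conn.executemany("INSERT INTO django_irc VALUES (?, ?, ?, ?)", rows)


def top_posters(conn, first_id, last_id):
    """Return (about_user, message_count) pairs, busiest poster first."""
    query = """
        SELECT about_user, COUNT(*) AS messages
        FROM django_irc
        WHERE datasource_id BETWEEN ? AND ?
        GROUP BY about_user
        ORDER BY messages DESC, about_user
    """
    return conn.execute(query, (first_id, last_id)).fetchall()


# Restrict the count to the datasource_ids announced in this post.
posters = top_posters(conn, 8442, 9435)
print(posters)
```

Against the real server, the same SELECT should work with the MySQL client of your choice, using that driver's own parameter placeholder style.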

Like the Apache IRC logs, the Django IRC data will not be released as flat files since it's already available at the original django-irc-logs site.

New March 2014 data released

Some new forge data, collected 04-Mar-2014, has been released.

The datasource_ids are as follows:

8079 - freecode
8080 - rubyforge
8081 - objectweb
8082 - savannah
8083 - tigris
8084 - alioth

IRC data:
8085 - 8134: Apache ServiceMix
8135 - 8185: Apache Camel
8186 - 8236: Apache ActiveMQ
8237 - 8287: Apache CXF
8288 - 8338: Apache Aries
8339 - 8389: Apache Kalumet
8390 - 8440: Apache Karaf

Data is available either in the flat files or by direct database access. Happy digging!
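
Because each collection in this release owns a contiguous block of datasource_ids, a small lookup table is enough to tell which collection a given row came from. This is just a sketch: the ranges are copied from the lists above, while the names DATASOURCE_RANGES and collection_for are made up for illustration.

```python
# Ranges copied from the March 2014 release notes (inclusive on both ends).
DATASOURCE_RANGES = [
    (8079, 8079, "freecode"),
    (8080, 8080, "rubyforge"),
    (8081, 8081, "objectweb"),
    (8082, 8082, "savannah"),
    (8083, 8083, "tigris"),
    (8084, 8084, "alioth"),
    (8085, 8134, "Apache ServiceMix"),
    (8135, 8185, "Apache Camel"),
    (8186, 8236, "Apache ActiveMQ"),
    (8237, 8287, "Apache CXF"),
    (8288, 8338, "Apache Aries"),
    (8339, 8389, "Apache Kalumet"),
    (8390, 8440, "Apache Karaf"),
]


def collection_for(datasource_id):
    """Return the collection name for a datasource_id, or None if unknown."""
    for first, last, name in DATASOURCE_RANGES:
        if first <= datasource_id <= last:
            return name
    return None


print(collection_for(8150))
```

Any datasource_id outside this release (for example, the Django IRC ids) returns None rather than a guess.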

New Apache project IRC data

Hello moles! Happy January. Here are some fresh new data sources for your mining pleasure:

1. Freenode channel list and topics (all public channels with 3 or more users). The table is called "fn_irc_channels".
2. Apache ActiveMQ IRC logs (one datasource_id per day, one row per message).
3. Apache Aries IRC logs
4. Apache Camel IRC logs
5. Apache CXF IRC logs
6. Apache Karaf IRC logs
7. Apache Kalumet IRC logs
8. Apache ServiceMix IRC logs

Here is a sample of what the structure looks like for items 2-8:

CREATE TABLE `apache_servicemix_irc` (
`datasource_id` int(11) NOT NULL,
`line_num` int(11) NOT NULL,
`full_line_text` varchar(500) NOT NULL,
`line_message` varchar(500) NOT NULL,
`date_of_entry` date NOT NULL,
`time_of_entry` varchar(5) NOT NULL,
`type` enum('action','system','message') NOT NULL,
`about_user` varchar(50) NOT NULL,
`last_updated` datetime NOT NULL,
PRIMARY KEY (`datasource_id`,`line_num`)
)

These are available on the live MySQL connection.
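
If you want to try queries before requesting database access, the structure above can be approximated locally. The sketch below is not the production setup: it recreates the table in an in-memory SQLite database, replaces the MySQL enum with a CHECK constraint (SQLite has no ENUM type), stores dates as text, and uses invented rows with made-up datasource_ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite approximation of the MySQL definition shown above.
conn.execute(
    """
    CREATE TABLE apache_servicemix_irc (
        datasource_id  INTEGER NOT NULL,
        line_num       INTEGER NOT NULL,
        full_line_text TEXT    NOT NULL,
        line_message   TEXT    NOT NULL,
        date_of_entry  TEXT    NOT NULL,
        time_of_entry  TEXT    NOT NULL,
        type           TEXT    NOT NULL
                       CHECK (type IN ('action', 'system', 'message')),
        about_user     TEXT    NOT NULL,
        last_updated   TEXT    NOT NULL,
        PRIMARY KEY (datasource_id, line_num)
    )
    """
)

# Invented rows: one day's log (one datasource_id), one row per line.
sample = [
    (1, 1, "[09:00] * alice has joined", "has joined",
     "2014-01-06", "09:00", "system", "alice", "2014-01-07 00:00:00"),
    (1, 2, "[09:01] <alice> morning all", "morning all",
     "2014-01-06", "09:01", "message", "alice", "2014-01-07 00:00:00"),
    (1, 3, "[09:02] <bob> hi alice", "hi alice",
     "2014-01-06", "09:02", "message", "bob", "2014-01-07 00:00:00"),
    (1, 4, "[09:03] * bob waves", "waves",
     "2014-01-06", "09:03", "action", "bob", "2014-01-07 00:00:00"),
]
conn.executemany(
    "INSERT INTO apache_servicemix_irc VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    sample,
)

# How many lines of each type are in the log?
counts = dict(
    conn.execute(
        "SELECT type, COUNT(*) FROM apache_servicemix_irc GROUP BY type"
    ).fetchall()
)
print(counts)
```

Filtering on type = 'message' is a simple way to leave joins, parts, and other system lines out of activity counts.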

New Apache People-Roles-Projects data

Hot off the presses! Another update to the Apache people-roles-projects data:

Datasources 1578-1585 have updated information on people working on Apache projects, including committer lists, PMC lists, PMC chairs, etc.

Timezones are now being collected as well.

This is an update to the original dataset described in the paper "Project Roles in the Apache Software Foundation: A Dataset" (2013), written by yours truly.

Apache Camel data

We have released several files of Apache Camel IRC log data.

Sources:
originally stored by Dan Kulp
More about Apache Camel

Related Data Sets
Apache Twitter Handles
Apache Project People & Roles

Sample Queries for the IRC data:

List the most prolific IRC posters, in order of their post count
SELECT about_user, COUNT(*)
FROM apache_camel_irc
GROUP BY 1
ORDER BY 2 DESC;

List the twitter handles and svn_ids (if known) for anyone who is also on Apache Camel's IRC
SELECT DISTINCT i.about_user, t.twitter_screen_name, t.svn_id
FROM apache_camel_irc i
INNER JOIN apache_twitter t
ON i.about_user = t.svn_id;
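
To see what the join above does and does not return, here is a small sketch run against an in-memory SQLite stand-in. Only the column names used in the query come from the real tables; the cut-down table definitions and every row are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Cut-down stand-ins holding only the columns the join touches.
conn.execute("CREATE TABLE apache_camel_irc (about_user TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE apache_twitter ("
    "twitter_screen_name TEXT NOT NULL, svn_id TEXT NOT NULL)"
)

# Invented data: alice posts twice, bob once; dave never appears on IRC.
conn.executemany(
    "INSERT INTO apache_camel_irc VALUES (?)",
    [("alice",), ("alice",), ("bob",)],
)
conn.executemany(
    "INSERT INTO apache_twitter VALUES (?, ?)",
    [("alice_tweets", "alice"), ("dave_tweets", "dave")],
)

matches = conn.execute(
    """
    SELECT DISTINCT i.about_user, t.twitter_screen_name, t.svn_id
    FROM apache_camel_irc i
    INNER JOIN apache_twitter t
    ON i.about_user = t.svn_id
    """
).fetchall()
print(matches)
```

DISTINCT collapses alice's two IRC lines into one row, and the inner join drops bob (no Twitter record) and dave (no IRC lines). Since the match is made between the IRC nickname and svn_id, anyone whose nickname differs from their svn_id will not show up either.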

The datasources for the IRC data are (currently) #393-1572. Each log file (daily) gets its own datasource_id (since each one is a separate source)!

December 2013 data released

December data has been released. We have a few old standbys (fc, rf, ow, sv, al) and some hot fresh data as well.

What is new, you ask? Well, we have some IRC chat log data for the Apache project Camel [1]. A nice new social data set, all parsed and organized into relational database format for you to query.

Get direct database access
Or, download the files either as delimited flat files or as .sql files ("marts")

[1] Why Apache Camel? Well, it was one of the only Apache projects for which we were able to get both email archives in mbox format and IRC chat logs. More projects are forthcoming (we're working on that now).

Rubyforge goes into the sunset

We've been collecting Rubyforge data almost since the beginning. Last month we reported on the decline of Rubyforge in light of newer forges, like GitHub. Here's the chart we drew:

Now we've got this lovely pair of images to contend with:

and

It's apparently been offline for four or five days now.

We'll keep collecting as long as we can.

New home for flat files

Many of you know that we provide flat files of our data for download by anyone at any time. Until recently we hosted these on Google Code (before 2009 or so, we hosted them on SourceForge). Google Code recently announced that projects will no longer be able to offer file downloads as of January 2014, so we had to find a new home for our files.

The new home is a server called FLOSSdata. Hope you enjoy the flat file access. As usual, if you require direct database access, we have a procedure for giving you that as well.

On FLOSSdata, there are about 3300 files, either delimited or SQL dumps. These are divided into forge abbreviations and then stored by date. Enjoy!
