Debian metrics data released

One of the undergraduate students working on this project, Carter Kozak, recently collected all of the Debian package data for January and parsed it for relevant software engineering metrics. He concentrated on C/C++ code. He also integrated some Debian metadata, such as popcon (popularity contest) and sources.gz. He has written up his findings in a paper (as yet unpublished, but just you wait!) and has donated his data to FLOSSmole. You can find the data on our FLOSSmole data downloads page on Google Code.

Data Resources: 

January file releases

We've just released data files for the following forges. You can head over to the FLOSSmole data downloads page at Google Code to download any of these files, or wait for them to be released to the Teragrid for live querying (shortly!).

datasource_id  forge_id  abbreviation  name
237            2         FM            Freshmeat
238            3         RF            Rubyforge
239            4         OW            ObjectWeb
240            5         FSF           Free Software Foundation
241            10        SV            Savannah
243            12        GC            Google Code
244            13        TG            Tigris

Still running...
245            14        LP            Launchpad


Adding 1000 data files to Google Code

I've got about 1000 files that were hosted on SourceForge (they still are), but I'm trying to move all our files into one place. I'm running scripts all day to download these from SF, relabel them, and move them to Google Code.
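For the curious, the relabel-and-move step looks roughly like the sketch below. Everything here is hypothetical: the file names, the SourceForge URL, and the staging layout are made up for illustration, and the actual upload to Google Code happens with their uploader tool afterwards.

```shell
#!/bin/sh
# Hypothetical sketch of the relabel/move step; names and paths are made up.
mkdir -p staging
for f in fmProjects-2009-Dec.txt.bz2 rfProjects-2009-Dec.txt.bz2; do
  # The real script would first fetch each file from SourceForge, e.g.:
  #   wget "https://downloads.sourceforge.net/ossmole/$f"
  touch "$f"                        # stand-in file so the demo runs offline
  mv "$f" "staging/flossmole-$f"    # relabel with a consistent prefix
done
ls staging                          # files now staged for upload to Google Code
```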

If you see old files showing up at Google Code, that's why! Don't forget that you can use the search box there if you are looking for a specific file or topic. Also, send email to the mailing list if you can't find something you're looking for, and I'll help you out.


January data update

Freshmeat, Objectweb, Rubyforge, Savannah, FSF, Tigris - all done, waiting for release

The Github collector is broken; I have a student working on fixing that now. Github suddenly made it hard to seed the initial project list, so we're trying to figure out a way to get the entire corpus of projects. I'm getting flashbacks to when SF got really big and made it difficult for everyone to work with.

Google Code is plugging along. We're on the 7th of 8 collection processes, so it won't be too much longer on that.


Teragrid backed up for December

Hello moles! All Sept-Dec data has been backed up to Teragrid. (I had to re-write the backup scripts because the ones we had no longer worked, plus they took forever anyway. The backup is much faster now so I will be able to do it more frequently.)

You should see the new datasources in there (228-235). This includes the Google Code data, which is HUGE.


Google Code Sept 2010 data out

Whew! Google Code is collected, parsed, and released. Backup to Teragrid is happening now (just as soon as I solve this little issue of "disk quota exceeded" - fun!). In the meantime, go to the FLOSSmole Google Code Downloads Page and get your hot fresh data.

Remember, the files marked "datamarts" are SQL code you can use to make your own version of the database. The raw delimited files are marked .txt.bz2. The datasource_id for Google Code this release is 235.
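As a quick sketch of working with a release (the file names below are hypothetical; substitute the actual files you download), unpacking a raw delimited file and loading a datamart might look like this:

```shell
#!/bin/sh
# Hypothetical file names -- substitute the real release files.
# A raw release is just bzip2-compressed delimited text:
printf 'datasource_id|proj_name\n235|sample-project\n' > gcProjects-Sep2010.txt
bzip2 gcProjects-Sep2010.txt            # this is the .txt.bz2 form we release
bunzip2 gcProjects-Sep2010.txt.bz2      # unpack it back to plain text
head -1 gcProjects-Sep2010.txt          # inspect the column headers
# A "datamarts" file is SQL, so you would rebuild the database with, e.g.:
#   mysql -u youruser -p yourdb < gcDatamart-Sep2010.sql
```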

September 2010 data trickle

Just released Github data for September today. This took about 10 days to collect, parse and release. You can download the files here (along with Freshmeat, Rubyforge, Objectweb, Free Software Foundation, Savannah, Tigris, etc), or wait for the next Teragrid backup if you want direct database access. (Google Code is collecting now, and upon completion of that, I'll do a final September Teragrid backup.)


Code backed up to Teragrid

Hello moles,

We have new data backed up to Teragrid. Here is what is included:

Google Code - metadata on projects, developers, issues, etc. Plus HTML.
Github - metadata. Plus XML.
Launchpad - metadata on projects, groups, developers, wiki. Plus HTML.
Tigris - discussions! messages! project metadata. Plus HTML.

Plus all the other forges you've known and loved for so many years: fm, ow, rf, fsf, sv, etc.


New Google Code Data Released

Hello moles! I've released a new set of Google Code project data to our own downloads page (on Google Code, no less!) - the datasource_id is 226.

This data took over a month to collect. Included are the following:

Crawlers vs API

Interesting article by some folks at 80Legs about crawling the web versus using an API to gather data. On several occasions we've chosen to use an API rather than crawl. This article pretty much summarizes the limitations of that choice.