Debian data released

I collected some Debian package data and started parsing it to see what kind of information we might find in it.

I will probably need some help from the user community on this one to learn what sort of data you find interesting in these packages.

Here are the files I collected:

Obviously there is a lot of information there, and I only parsed some of it out for this initial run. Here are the items I parsed and released:

  • package name, version, parent directory
  • any URLs found on the copyright page, and any URLs in the project's textual description on the stable project page
  • developers (maintainers and co-maintainers listed on the developer information page)
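Pulling URLs out of free text such as a copyright page or a package description can be done with a simple pattern match. The sketch below is illustrative only and not necessarily how the released data was produced. It assumes a rough regex heuristic (not a full RFC 3986 parser), and the sample text and the `extract_urls` helper name are hypothetical.

```python
import re

# Rough heuristic: http(s)/ftp URLs, ending at whitespace, angle brackets,
# quotes, or closing brackets. Not a complete URL grammar.
URL_RE = re.compile(r"(?:https?|ftp)://[^\s<>\"')\]]+")


def extract_urls(text: str) -> list[str]:
    """Return the unique URLs found in text, in order of first appearance."""
    seen = set()
    urls = []
    for match in URL_RE.finditer(text):
        # Drop sentence punctuation that the pattern swallowed at the end.
        url = match.group(0).rstrip(".,;:")
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Hypothetical sample resembling a machine-readable debian/copyright file.
sample = """Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Upstream-Name: example
Source: http://example.org/downloads/example-1.0.tar.gz.
See also <http://example.org/> and http://example.org/."""

if __name__ == "__main__":
    for url in extract_urls(sample):
        print(url)
```

Repeated URLs are reported once, so a page that mentions its upstream homepage several times yields a single entry.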

I'd love to hear from the community about what items you would like to see parsed out.


Hey, I've been gathering Debian index files since 2006. Are they useful to you? Currently I have 1197 of them, totaling 7GB. You can reach me by email at marcosdumay at gmail.

Ok, I tried answering your email, but it seems you don't read it that often... I've set up an HTTP server with my Debian index files at: The indexes are for the main branch of Sid. There is a file named "files" there with a list (one per line) of the files you can download. Some of the indexes may have a length of 0; that happens because of problems downloading them.

Since I am behind a slow connection and don't want my upload bandwidth entirely taken by this, I ask you to download those files between 01:00 and 11:00 GMT. Also, please send me an email before you do so; this way, I can configure the computer to stay on during that time (otherwise it automatically turns off by 03:00 GMT).
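A downloader that follows these rules needs three small checks: read the "files" listing (one name per line), only fetch inside the 01:00 to 11:00 GMT window, and skip zero-length indexes. The sketch below assumes GMT can be treated as UTC and that the window end is exclusive. The helper names and filenames are hypothetical, and the actual fetching is left out because the server address is not given here. It does not replace emailing first.

```python
from datetime import datetime, time, timezone

# Assumed download window from the comment, with GMT treated as UTC.
WINDOW_START = time(1, 0)
WINDOW_END = time(11, 0)  # exclusive


def parse_file_list(text: str) -> list[str]:
    """Parse the 'files' listing: one filename per line, blank lines ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def in_download_window(now: datetime) -> bool:
    """True if the timezone-aware datetime `now` falls in 01:00-11:00 UTC."""
    t = now.astimezone(timezone.utc).time()
    return WINDOW_START <= t < WINDOW_END


def is_usable(data: bytes) -> bool:
    """Zero-length indexes come from failed downloads and should be skipped."""
    return len(data) > 0


if __name__ == "__main__":
    # Hypothetical listing contents; real filenames are not known here.
    listing = "Packages-2006-01-01.gz\n\nPackages-2006-01-02.gz\n"
    print(parse_file_list(listing))
    print(in_download_window(datetime.now(timezone.utc)))
```

Checking the window against UTC instead of local time keeps the rule correct no matter which timezone the downloader runs in.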