Debian data released
I collected some Debian package data and started parsing it to see what we might find in there.
I will probably need some help from the user community on this one, to know what sort of data you find interesting in these packages.
Here are the files I collected:
- project home pages for stable, unstable, and testing versions (Example of a stable page)
- copyright pages (Example of a copyright page)
- developer information page (Example of a developer information page)
- changelog page (Example of a changelog)
- bug reports page (Example of a bug reports page)
Obviously there is a lot of information there, and I only parsed some of it out for this initial run. Here are the items I parsed and released:
- package name, version, parent directory
- any URLs found on the copyright page, and any URLs found in the project's textual description on the stable project page
- developers (maintainers and co-maintainers listed on the developer information page)
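For readers who want to try something similar, here is a minimal sketch of the URL-extraction step. It assumes the copyright page or project description has already been fetched and converted to plain text. The regular expression, the `extract_urls` helper, and the sample text are all hypothetical illustrations, not the parser actually used for this release.

```python
import re

# Hypothetical pattern (an assumption, not the release's parser): matches
# http(s) and ftp URLs up to whitespace or common closing delimiters.
URL_PATTERN = re.compile(r"(?:https?|ftp)://[^\s<>\"')\]]+")


def extract_urls(text):
    """Return the unique URLs found in text, in order of first appearance."""
    urls = []
    for match in URL_PATTERN.finditer(text):
        # Drop trailing punctuation that usually ends a sentence, not a URL.
        url = match.group(0).rstrip(".,;:")
        if url not in urls:
            urls.append(url)
    return urls


# Made-up sample text in the style of a Debian copyright file.
sample = """Upstream-Name: example
Source: https://example.org/releases/
It was downloaded from ftp://ftp.example.org/pub/example.tar.gz."""

print(extract_urls(sample))
# ['https://example.org/releases/', 'ftp://ftp.example.org/pub/example.tar.gz']
```

A regex like this is quick to write but will miss or mangle some URLs (for example, ones wrapped across lines or enclosed in unusual punctuation), so real copyright pages may need extra cleanup.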
I'd love to hear from the community about what items you would like to see parsed out.
- megan's blog
Comments
Hey, I've been gathering Debian index files since 2006. Are they useful to you? Currently I have 1197 of them, totaling 7 GB. You can reach me by email at marcosdumay at gmail.
Ok, I tried answering your email, but it seems you don't read it that often... I've set up an HTTP server with my Debian index files at: http://marcosdumay.no-ip.org/index-debian

Those indexes are for the main branch of Sid. There is a file named "files" there with a list (one per line) of the files you can download. Some of the indexes may have a length of 0; that happens because of problems downloading them.

Since I am behind a slow connection and don't want my upload bandwidth entirely taken up by this, I ask you to download those files between 01:00 and 11:00 GMT. Also, please send me an email before you do so; this way I can configure the computer to stay on during that time (otherwise it automatically turns off by 03:00 GMT).