Project created on GitHub

25 Jan 2012

Hi everyone,

I’ve started work on a parser for the static content of dylanchords.info. It’s still a bit rough, but I think it’s a decent start. I started by creating a schema based on what (I think) Eyolf was looking for. The source files were parsed in Perl and loaded into a database. I’ve also set up a quick-and-dirty CodeIgniter site to view the parsed data like DC1.0. It includes a nice little CRUD component to view and edit the raw/live data. Finally, I put it all up on GitHub, including the parsers, a database dump, and the CodeIgniter site. Here are the links:

Please take a look and let me know what you think. I’m sure more fields need to be added, and you may want to normalize. The data/parser definitely needs work; some songs parsed better than others. The original HTML was largely preserved and should probably be cleaned up. Perhaps the data should be massaged manually?


Eyolf Østrem

I’ve now had a closer look, finally, and I think it’s good. Some remarks, mainly on the schema:



The tuning field should be an id reference to a tunings table, not just a text field.

The same goes for the Venue field.
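A minimal sketch of what the two remarks above could look like, using Python's built-in sqlite3 module as a stand-in for the real database. The table and column names (tunings, venues, tuning_id, venue_id), the helper get_or_create, and the sample values are assumptions for illustration, not the actual schema in the repository.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces references when asked to

# Hypothetical layout: the free-text fields become id references to lookup tables.
conn.executescript("""
CREATE TABLE tunings (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE venues (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE song_version (
    id        INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    tuning_id INTEGER REFERENCES tunings(id),
    venue_id  INTEGER REFERENCES venues(id)
);
""")


def get_or_create(conn, table, name):
    """Return the id of `name` in a lookup table, inserting the row if needed."""
    if table not in ("tunings", "venues"):
        raise ValueError(f"unknown lookup table: {table}")
    conn.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (name,))
    row = conn.execute(f"SELECT id FROM {table} WHERE name = ?", (name,)).fetchone()
    return row[0]


# Made-up sample data, only to show the round trip.
tuning_id = get_or_create(conn, "tunings", "Open D")
venue_id = get_or_create(conn, "venues", "Example Hall")
conn.execute(
    "INSERT INTO song_version (title, tuning_id, venue_id) VALUES (?, ?, ?)",
    ("Example Song", tuning_id, venue_id),
)
```

With this layout a parser could call get_or_create for each tuning or venue string it finds, so that spelling variants can later be merged in one place instead of in every song row.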

Missing fields:
- preamble
- Chords
- sounding key
- tabbedby
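As a rough illustration, the fields listed above could be added to an existing table without rebuilding it. The sketch below again uses Python's sqlite3 as a stand-in; that the fields belong on song_version, that they are all plain text, and the helper name add_missing_columns are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical starting point; the real table has more columns than this.
conn.execute("CREATE TABLE song_version (id INTEGER PRIMARY KEY, title TEXT)")

# Field names taken from the list above; the TEXT type is an assumption.
MISSING_COLUMNS = {
    "preamble": "TEXT",
    "chords": "TEXT",
    "sounding_key": "TEXT",
    "tabbedby": "TEXT",
}


def add_missing_columns(conn, table, columns):
    """Add every column the table does not have yet and return the names added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, sql_type in columns.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
            added.append(name)
    return added


first_run = add_missing_columns(conn, "song_version", MISSING_COLUMNS)
second_run = add_missing_columns(conn, "song_version", MISSING_COLUMNS)
```

Because the helper checks which columns already exist, running it a second time adds nothing, which makes it safe to rerun after the database is reloaded from a dump.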

There should also be a field to indicate whether the song_version is live or studio. If it is studio, that almost always implies that it’s an outtake, but there are exceptions, so these have to be separate fields.

Related: the outtake problem. We must also record which album a version is an outtake from. Perhaps this could/should go in the album_song_version table? That would at least solve the problem with “same version on multiple albums”.

Case study: Blind Willie McTell. It’s an Infidels outtake, but it’s also a bona fide track on Bootleg Series 1-3. If the “outtake” flag were set in the song_version table, it would register as an outtake even on BS1-3, which is not what I’m after: I only want to list on the various album pages the songs that were recorded but didn’t make it to the album. In the Blind Willie case, the album_song_version table for Infidels would have an indication that it’s an outtake there (but no such entry in the album_song_version table for BS1-3).
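The case study can be sketched as follows, with the live/studio indication on song_version and the outtake flag on the link table. This uses Python's sqlite3 as a stand-in; the column names recording_type and is_outtake and the helper outtakes_for are assumptions, and only the table names and the Blind Willie McTell example come from the discussion above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE album (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE song_version (
    id             INTEGER PRIMARY KEY,
    title          TEXT NOT NULL,
    -- live/studio is a property of the recording itself
    recording_type TEXT NOT NULL CHECK (recording_type IN ('live', 'studio'))
);
CREATE TABLE album_song_version (
    album_id        INTEGER NOT NULL REFERENCES album(id),
    song_version_id INTEGER NOT NULL REFERENCES song_version(id),
    -- outtake status is a property of the album/version pairing
    is_outtake      INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (album_id, song_version_id)
);

INSERT INTO album VALUES (1, 'Infidels'), (2, 'Bootleg Series 1-3');
INSERT INTO song_version VALUES (1, 'Blind Willie McTell', 'studio');
-- Same version, two albums: an outtake from Infidels, a regular track on BS1-3.
INSERT INTO album_song_version VALUES (1, 1, 1), (2, 1, 0);
""")


def outtakes_for(conn, album_title):
    """List the titles recorded for an album that did not make it onto it."""
    rows = conn.execute(
        """
        SELECT sv.title
        FROM album_song_version AS asv
        JOIN album AS a ON a.id = asv.album_id
        JOIN song_version AS sv ON sv.id = asv.song_version_id
        WHERE a.title = ? AND asv.is_outtake = 1
        ORDER BY sv.title
        """,
        (album_title,),
    ).fetchall()
    return [row[0] for row in rows]


print(outtakes_for(conn, "Infidels"))            # ['Blind Willie McTell']
print(outtakes_for(conn, "Bootleg Series 1-3"))  # []
```

Keeping the flag on the link table means the same song_version row can be an outtake in one album context and a regular track in another, which is the behaviour the case study asks for.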

As for the other tables, I think they look OK. There may be some fields missing, but it’s probably easier to clean that up later anyway.


I haven’t tested it thoroughly, but it looks good overall. I’ve come across a few strange things (e.g. in the Intro field in the album table, empty lines are inserted between each line of code, and the end-tag of the <div> wrapper is not included), but once such bugs are squashed, I think it’s ready to roll.
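Until the parser itself is fixed, the two Intro symptoms could be papered over with a small post-processing step. The function below is a hypothetical stopgap in Python, not part of the existing Perl parsers. It assumes that blank lines in the field carry no meaning and that any unclosed <div> is the outer wrapper.

```python
import re


def tidy_intro(html):
    """Drop blank lines and close any <div> that was left open."""
    # Keep only lines that contain something other than whitespace.
    lines = [line for line in html.splitlines() if line.strip()]
    cleaned = "\n".join(lines)

    # Count opening and closing div tags; append whatever closers are missing.
    opened = len(re.findall(r"<div\b", cleaned, flags=re.IGNORECASE))
    closed = len(re.findall(r"</div\s*>", cleaned, flags=re.IGNORECASE))
    cleaned += "\n</div>" * max(0, opened - closed)
    return cleaned


# Made-up sample that mimics the two symptoms described above.
broken = '<div class="intro">\n\n<p>First line</p>\n\n<p>Second line</p>\n'
print(tidy_intro(broken))
```

Counting tags with a regular expression is crude and would mis-handle markup inside comments or attributes, so this is only meant for checking the existing dump, not as a replacement for fixing the parser.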

Anonymous
Glad you got it winkorg.I

kettle
Nice work - I uploaded screenshots of my working schema to the file browser: dc_ds_cs_kettle and dylanchords_kettle_1.


Eyolf Østrem

Thanks a lot for this. I haven’t had time to look at it more closely yet, so this note is just to acknowledge that I’ve seen it and that I really appreciate the effort. AND that I will have a look at it ASAP.

