Web Site, XML structures (16)

1 Name: Charlie : 2007-01-22 23:26 ID:keHVuLob

I have started to sketch out some XML structures that I think will work with the SimpleXML and reinforce the object referencing principles. And with some well designed Perl modules we should be able to manage the content pretty well.

So that we can do shared edits, I have created a page in my temporary wiki: http://ibcwiki.spaceroom.org

I thought about a regular document but editing it would be a pain.

2 Name: talysman!!/0CigS8/ : 2007-01-23 03:32 ID:f/H4VQyH

I had started trying to create an XML structure myself, and it turns out mine doesn't match yours. I added my ideas to the wiki as an alternate XML structure.

3 Name: Charlie : 2007-01-23 14:29 ID:xeoBy0UD

I turned on comments for the page on the wiki in case simple things might be appropriate, but I think keeping the discussion in one place makes sense.

I see where you are going with the page oriented structure. I will have to think this through. I have a sense that perhaps a hybrid of the two will be needed. I like having a structure that explicitly defines page content. I do not see how the wiki text content merges.

The main factor will be how we manage the content in the interface. The approach I was taking was to be dynamic: load information in the background and update the relevant parts of the page. However, this is a very dynamic system and does not work well for limited-functionality browsers.

If we need to include all relevant information on a page (even to support scrolling features), then each page change will require a complete re-rendering of the page. In that case I would suggest using XSLT to transform the XML into the page format. If we can assess browser capability, we can use the XSLT to generate content styles that are appropriate to the browser. (That could include the use of JavaScript or not.)

Hmm. Maybe that could be a way to determine whether or not dynamic would be allowed?

I still need to get a better full picture of how to implement this with the functions we want.

4 Name: Charlie : 2007-01-26 00:06 ID:keHVuLob

OK. I started playing with Perl, using XML::Simple.
Pretty cool. I was able to load my database.xml file with the different sections and select the albums, songs, artists, whatever by name. That means it will be easy to extract data elements.

Now I am stuck on a simple Perl syntax issue.

Let's assume I have the hash structures created by XML::Simple.
<database>

<albums>
<album name="Needs More Wanger" />
</albums>

</database>

turns into $database->{Albums}->{Album}

How do I insert a new album hash into the structure?

5 Name: Doctroid : 2007-01-26 10:54 ID:GOfjN7PO

Just

$database->{"Albums"}->{"Siberian Fields of Cheese"} = $value;

isn't it?

In Perl hashes, a new entry is defined when you assign to an undefined entry.
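
A minimal sketch of that, assuming the structure XML::Simple returned is an ordinary nested hash. The exact layout depends on the options used, and the track data here is made up for illustration. The value assigned can itself be a hash reference, so a more complicated album record goes in the same way as a simple value.

```perl
use strict;
use warnings;

# Stand-in for what XML::Simple might return. The real shape depends on
# the ForceArray/KeyAttr options, so this layout is an assumption.
my $database = {
    Albums => {
        'Needs More Wanger' => {},
    },
};

# Build the new album as its own hash reference first (hypothetical fields)...
my $new_album = {
    Track => {
        'Some Song' => { track => 1 },
    },
};

# ...then assign it to a key that does not exist yet.
# The assignment itself creates the entry.
$database->{Albums}{'Siberian Fields of Cheese'} = $new_album;

my @names = sort keys %{ $database->{Albums} };
print join(', ', @names), "\n";
```

Because `$new_album` is a reference, later changes made through `$new_album` also show up inside `$database`.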

6 Name: Charlie : 2007-01-26 17:00 ID:keHVuLob

I guess I was a bit vague.
If the value is simple then yes. I guess I need to build up more complicated pieces and then composite them together.

I had tried creating an object and it almost worked but the hash was surrounded by a "bless" which I have no idea how to get rid of.

I will probably have to put up an example
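
Without the example this is only a guess: Data::Dumper prints a blessed reference wrapped in bless( ... ), so the wrapper may simply mean the hash was turned into an object. A sketch under that assumption, with a made-up class name `Album` and made-up fields, showing one way to get a plain hash back out of a hash-based object:

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical object: a hash reference blessed into a made-up class.
my $album_obj = bless { name => 'Needs More Wanger', tracks => [] }, 'Album';

# Copying the key/value pairs into a fresh anonymous hash drops the blessing.
# This is a shallow copy: nested references are shared with the original,
# and any nested objects stay blessed.
my $plain = { %$album_obj };

print defined blessed($album_obj) ? "object is blessed\n" : "object is plain\n";
print defined blessed($plain)     ? "copy is blessed\n"   : "copy is plain\n";
```

The plain copy can then be assigned into the larger structure like any other hash reference.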

7 Name: talysman!!/0CigS8/ : 2007-01-26 23:33 ID:f/H4VQyH

Glad you've started work with XML::Simple. I tried some stuff myself a couple days ago, but there appears to be a problem with my perl installation that perl -MCPAN -e shell doesn't seem to fix (the process hung on their server and won't stop.)

I'm not sure about the example of adding albums, because of the type of element it's using:

<album name="Needs More Wanger" />

If forcearray is on and keyattr is set to fold on both "album" and "name", this sets $database->{"Albums"}->{"Album"}->{"Needs More Wanger"} to... something, I'm not sure what, maybe just "1". By default, the method for adding to it should be similar to what Doctroid suggested:

$database->{"Albums"}->{"Album"}->{"Needs More Wanger"} = $value;

I think we need to simplify it a little. Take out the <albums> block and have a series of <album> elements, which get put into a hash named "album" with album names as keys and values equal to the other attributes in the element. Either that, or use <album> as a block around plain text, like this:

<album>Needs More Wanger</album>

This will wind up with something like:

$database->{"Albums"}->[0] = "Needs More Wanger";

8 Name: talysman!!/0CigS8/ : 2007-01-27 00:12 ID:f/H4VQyH

ok, update: somehow, I got a test script to work to verify what a given structure would look like. I used forcearray => [ 'image', 'mp3' ] and the defaults for keyattr ('name', 'key', 'id') and tested an XML structure that looked like this:

<page>
  <version name="foo (version 1)">
    <album>bar</album>
    <image>albums/bar_front_thumb.jpg</image>
    <image>albums/bar_back_thumb.jpg</image>
    <artists>
      <concept>jwgh</concept>
      <lyrics>talysman</lyrics>
      <composer>Casey Bennetto</composer>
      <guitar>jwgh</guitar>
      <guitar>manfire</guitar>
    </artists>
    <mp3>songs/foo_polka.mp3</mp3>
    <mp3>songs/foo_disco.mp3</mp3>
    <tab key="lyrics" />
    <tab key="linernotes" />
    <tab key="artists" />
    <tab key="comments" />
  </version>
  <version ...>
</page>

It translated to this:

{
  'version' => {
    'foo (version 2)' => {
      'mp3' => [
        'songs/foo_grunge.mp3'
      ],
      'album' => 'bar',
      'artists' => {
        'vocals' => [
          'Major Zed',
          'Kerri'
        ],
        'guitar' => 'Charlie',
        'composer' => 'Major Zed',
        'concept' => 'jwgh',
        'lyrics' => 'talysman'
      },
      'tab' => {
        'artists' => {},
        'linernotes' => {},
        'comments' => {},
        'lyrics' => {}
      },
      'image' => [
        'albums/bar_front_thumb.jpg',
        'albums/bar_back_thumb.jpg'
      ]
    },
    'foo (version 1)' => {
      'mp3' => [
        'songs/foo_polka.mp3',
        'songs/foo_disco.mp3'
      ],
      'album' => 'bar',
      'artists' => {
        'guitar' => [
          'jwgh',
          'manfire'
        ],
        'composer' => 'Casey Bennetto',
        'concept' => 'jwgh',
        'lyrics' => 'talysman'
      },
      'tab' => {
        'artists' => {},
        'linernotes' => {},
        'comments' => {},
        'lyrics' => {}
      },
      'image' => [
        'albums/bar_front_thumb.jpg',
        'albums/bar_back_thumb.jpg'
      ]
    }
  }
};

This means that elements of the form <tab key='comments' /> get turned into a 'comments' key with an empty hash as a value, as part of the hash named 'tab', which answers some questions we had about the <album> element in the example you gave, Charlie.

9 Name: talysman!!/0CigS8/ : 2007-01-27 00:24 ID:f/H4VQyH

Oh, and as a result of this test, I'm thinking the <version> element is a bad choice, because it adds an unnecessary level. I'm thinking about the best way to change this.

The key values set to an empty hash may be a problem, too, since I was planning on testing for their presence with a simple if($database->{$version}->{'tab'}->{'comments'}) test.
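
For what it's worth, a reference is always true in Perl's boolean context, even a reference to an empty hash, so the simple test should still pass for a tab that is present. `exists` says the same thing more directly. A small sketch using a hand-built copy of part of the dump above (the 'video' key is hypothetical, there only to show a missing tab):

```perl
use strict;
use warnings;

# Hand-built copy of part of the dumped structure, with the version
# level already selected.
my $database = {
    'foo (version 1)' => {
        tab => { comments => {}, lyrics => {} },
    },
};
my $version = 'foo (version 1)';

# A hash reference is true even when the hash it points to is empty.
my $truthy = $database->{$version}{tab}{comments} ? 1 : 0;

# exists checks for the key itself, whatever its value is.
my $has_comments = exists($database->{$version}{tab}{comments}) ? 1 : 0;
my $has_video    = exists($database->{$version}{tab}{video})    ? 1 : 0;

print "truthy=$truthy comments=$has_comments video=$has_video\n";
# prints "truthy=1 comments=1 video=0"
```

One thing to watch with either form: looking through a `$version` that is not in the hash will quietly create the intermediate entries (autovivification).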

10 Name: Charlie : 2007-01-27 16:53 ID:keHVuLob

One of the reasons I like having a container tag such as Albums is from an efficiency angle. If I am looking for an album, the code is not going to have to do a selection from a large heterogeneous group of tags. There will be one Albums tag, one Artists tag, etc. Once I have that element, the selection is narrowed down to only Albums.

In my schema, Album is also a complex type since it will have a Page tag for text content, as well as Tracks for listing the song references with the extra track number attribute as well as other data.

I think the programming problem for me is how do I create a complex hash and insert it into another hash.

<Album name="House Made of Cheese">
  <Page />
  <Track name="Angsty Teen Suess" track="3" />
</Album>

into

<database>
  <Albums>

    <Album name="The Last Operatic Fortran Singer">
    ...
    </Album>

  </Albums>
</database>

I see the same issue at other levels.

The Pages only have to have references to the content and the scripts can extract the information and build a complete XML.

11 Post deleted by user.

12 Name: Charlie : 2007-01-28 14:38 ID:keHVuLob

I have been thinking. To keep things simple and move forward, I would like to try talysman's page model. I can create an XSLT stylesheet that will place the data into appropriate divs, which we can then style accordingly.

To do that we just need to define our page structure. For dynamically enhanced pages, the order is not so important, but for non dynamic/text only pages it will have a big impact.

For ease of mutual editing I will create a new page in the ibcwiki.spaceroom.org area. I have an initial ordering, but I have no real attachment to it, so feel free to propose something different/better.

13 Name: Charlie : 2007-01-28 17:13 ID:keHVuLob

I have been working on trying to get Last Days of the Crazy People's Super Market to fit into the page xml structure.

There is an area of redundancy that we need to decide how to deal with. On the original page there is a track listing with the songs and the order in which they are on the album. (This corresponds with the track element that I introduced.)

However there is also a list of credits for each of the songs that effectively duplicates much of the information from the actual song pages. Replicating within the display page is fine, but if we store it in the page xml file, then we will inevitably wind up having it get out of sync with the song pages.

One way we could do it is only include the track information and then when the page is requested resolve the references to the song XML files and include that (or a subset of the information such as credits, mp3 URL, etc.) in the album page prior to stylesheet transformation.
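
A rough sketch of that resolve-on-request step, assuming the album and song files have already been loaded into Perl hashes. The field names, the `resolve_tracks` helper, and the sample track are all assumptions for illustration, not a settled format.

```perl
use strict;
use warnings;

# The album page data stores only track references (song name + number).
my $album = {
    name   => "Last Days of the Crazy People's Super Market",
    tracks => [
        { track => 1, song => 'foo' },
    ],
};

# Song data as it might look once each song's XML file is loaded.
my %songs = (
    foo => {
        credits => { lyrics => 'talysman', concept => 'jwgh' },
        mp3     => ['songs/foo_polka.mp3'],
        lyrics  => 'not copied into the album page',
    },
);

# Return a copy of the album with the chosen song fields merged into each
# track. The stored album data is left untouched, so nothing can drift.
sub resolve_tracks {
    my ($album, $songs, @fields) = @_;
    my @resolved;
    for my $track (@{ $album->{tracks} }) {
        my $song  = $songs->{ $track->{song} } || {};
        my %entry = %$track;
        for my $field (@fields) {
            $entry{$field} = $song->{$field} if exists $song->{$field};
        }
        push @resolved, \%entry;
    }
    my %page = (%$album, tracks => \@resolved);
    return \%page;
}

my $page = resolve_tracks($album, \%songs, qw(credits mp3));
print $page->{tracks}[0]{credits}{lyrics}, "\n";
```

The resolved structure would then be what gets written out as XML and handed to the stylesheet.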

14 Name: talysman!!/0CigS8/ : 2007-01-28 20:01 ID:f/H4VQyH

I think the main difference between our two approaches is that you are seeing all the data loaded from a single file, while I'm worried that that approach will use a lot of memory for the CGI process and slow things down; I see the primary content being stored as plain text (with Markdown formatting) and the XML files pointing to which files should be linked together to create a particular page.

That does leave open a mixed approach: putting the data that is most likely to be useful all together in one file, but leaving the bulk of the content (like lyrics) in text files. I'm still not sure whether this will be quick enough, but we can test both approaches and see differences in speed and simplicity of coding.

There's also the question of reusability. I'd like the actual CGI to be content-agnostic as much as possible. Even though songs, albums, and artists are displayed differently, I don't want the code to have to determine which is what.

XSLT looks interesting, because we can create artists.xslt, mp3.xslt, and so on, one for each major XML data grouping, and store them in a /format directory. When the script is building the page and needs to display artists for a given song, for example, it loads and applies the .xslt file with the same name as the tag, if one exists. This would allow the web site to be extensible without rewriting the code.

I added a page to the temporary wiki on the IBC CGI so that we can better visualize what the script is actually going to do, which should make it easier to work out what kind of data we need. I suggested there that we should use XML elements in the page info files with the format <tagname key='keyname' /> to define content areas. Combining this with the XSLT idea: if the page info file contains data that matches the keyname of a content area, the script would apply format/keyname.xslt; otherwise, it would load, translate and include keyname.txt.
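
To make the rule concrete, here is a sketch of just the decision step, using a page hash shaped like the earlier XML::Simple dump. The `plan_content_areas` name and the pages/foo directory are hypothetical; the real script would go on to run the XSLT or translate the text file.

```perl
use strict;
use warnings;

# Decide how to fill each content area named by a <tab key='...' /> element.
# Returns a hash mapping each area to an action and a path.
sub plan_content_areas {
    my ($page, $page_dir) = @_;
    my %plan;
    for my $key (sort keys %{ $page->{tab} || {} }) {
        if (exists $page->{$key}) {
            # The page info file holds data for this area: format it with XSLT.
            $plan{$key} = { action => 'xslt', path => "format/${key}.xslt" };
        }
        else {
            # No data in the page info file: fall back to the text file.
            $plan{$key} = { action => 'text', path => "${page_dir}/${key}.txt" };
        }
    }
    return \%plan;
}

my $page = {
    artists => { guitar => ['jwgh', 'manfire'], lyrics => 'talysman' },
    tab     => { artists => {}, lyrics => {}, comments => {} },
};

my $plan = plan_content_areas($page, 'pages/foo');
for my $key (sort keys %$plan) {
    print "${key}: $plan->{$key}{action} $plan->{$key}{path}\n";
}
```

Since the code only looks at key names, adding a new kind of content area means adding a tab element and either data plus an .xslt file or a .txt file, with no change to the script.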

15 Name: talysman!!/0CigS8/ : 2007-01-28 20:52 ID:f/H4VQyH

Now, here are my thoughts about a mixed approach to the XML file and to problems of redundancy. I don't think redundancy is a big problem, because I can only think of three cases where we might have duplicated information:

  • images associated with album pages duplicated in song pages;
  • artists and mp3s associated with song pages duplicated in album pages;
  • artistic roles (guitar, drums, vocals, lyrics) associated with song or album pages duplicated in artist pages.

I think these kinds of redundancy are mostly unavoidable; they will happen whether we have individual page files or a single database file. However, the amount of data duplicated is negligible, just a page name, really. The bigger issue is synchronization, when editing one page changes the contents of others. You mentioned the issue of Last Days Of The Crazy People's Super Market, although I think that's not a good example, because those songs were added to the album after being recorded, so changes to the song page (adding another arrangement, for example) shouldn't affect the album at all.

A better example would be the Tribute Album or Bad Coelacanth, which have unrecorded songs linked to them. We could make a rule that if a song page associated with an album is updated, the CGI should automatically update the album page, too... but this leads to a problem of alternate arrangements of songs already added to an album. For example, if I record the radio edit version of "Peroxide Piranha", what happens to the Super Market page? This may be a flaw in our data model...

I have to think about it more, but here's an idea: as I mentioned, I'm concerned about loading one big XML file into memory, even though that may give us some special search and sorting capabilities we might want to use later. However, it seems there's two kinds of data objects we are working with: stand-alone items (mp3s, album images, music videos, pdfs for cd inserts) and page items (songs, albums, artists.) Stand-alone items, once created, never need to be updated except to correct errors; a specific mp3 will never mysteriously gain another vocalist, although someone could remix it with another vocalist, creating a new arrangement.

Let's create an xml file for each stand-alone item class, like mp3.xml or image.xml, which would list every item on the site other than text files. Pages, on the other hand, would have individual xml files that would describe which content display areas that page needs, as well as things like page name, number of versions, track lists, and three kinds of artist: lyrics, melody, and concept. For each content display area, the script would check if there's a directory with the same name; if so, it tries to load a text file with that name. Otherwise, it tries to load an xml file with that name and looks for images, mp3s, or other objects tagged as belonging to that page. This might help us avoid synchronization problems and confusion about which mp3 belongs to an album.

16 Name: Charlie : 2007-01-28 21:00 ID:keHVuLob

I have no problem separating the content into individual xml files. For example each album would have its own xml file with the album content, and songs could have their own content too, as well as artists. So in many ways it is much like the wiki organization where the element is contained in a like named page.

The challenge is trying to eliminate content that is repeated across all of those pages because it will get out of sync and be inconsistent without careful editing of content across the files.

I just want to separate content from presentation. I think the page files are a good presentation description in that they describe what information is relevant to a page. (Forget about styling for a moment.)

XSLT can be very handy because we can convert from data to layout formats very easily. The only thing I would say, from using XSLT quite a bit, is that you have to match on tags that are known entities. Your rules are coded to match certain tag names. This means that it is not very easy to match against the example you had before with the artist sub tags, because those tag names are mutable. However, if we can define that information as an attribute of a tag, such as <Credit type="composition" name="Casey B"/>, then it is much easier than <Composition name="Casey B" />. So I think this is in line with your tagname key="keyname" idea. If you look at my proposal, I don't think we are too far off from each other.
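
As an illustration of the difference, this sketch takes the artists hash from the earlier XML::Simple test, where the role is the key, and flattens it into uniform records where the role is just data. The type/name record layout is an assumption modeled on the Credit example, and the output does no XML escaping.

```perl
use strict;
use warnings;

# The <artists> block from the earlier test, as XML::Simple returned it:
# role names are keys, and a value is a string or an array of strings.
my $artists = {
    guitar   => ['jwgh', 'manfire'],
    composer => 'Casey Bennetto',
    concept  => 'jwgh',
    lyrics   => 'talysman',
};

# Flatten to one record per credit, with the role held as a value.
my @credits;
for my $type (sort keys %$artists) {
    my $names = $artists->{$type};
    my @list  = ref($names) eq 'ARRAY' ? @$names : ($names);
    for my $name (@list) {
        push @credits, { type => $type, name => $name };
    }
}

# Every record now has the same shape, so one fixed rule can handle them all.
for my $credit (@credits) {
    print qq{<Credit type="$credit->{type}" name="$credit->{name}" />\n};
}
```

With this shape a stylesheet only ever needs to match Credit, and a new role such as "kazoo" is new data rather than a new tag.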

I read the IBC CGI description and I think it makes sense. It is really just working out the details of the XML structure to keep things easy for ourselves.
