The blog of Charles Pence. For more non-blog content, head to my website.

Tuesday, May 12, 2009

Website Maintenance with Pandoc and Markdown

I've been struggling for a long time with how to maintain a website properly and simply. The eternal dilemma seems to be this: if you want to keep the HTML simple enough to edit readily in a text editor, you're stuck with very simple styling. On the other hand, a nice theme comes with lots of header and footer material, which makes editing a page on a regular basis a nightmare. You can smooth over the problem, but only if you can run something server-side, like PHP or server-side includes, and since my professional website is hosted on academic web servers, I don't have any control over what happens server-side.

I finally think I've got a solution, using Pandoc. More below the fold.


I've known about Pandoc for a while now -- I'm using it to generate the user manual for Logos. But it only just dawned on me that a careful use of Pandoc could make for a great easy-maintenance website system. So I've got a website tree set up like this:

/
/research
/teaching
/source
/source/research
/source/teaching

In the source directories are a bunch of files with the ".mdtext" extension, written in Pandoc's extended Markdown (the Pandoc extension I use the most is the title block, which lets you set a custom title for each page).

Then, at the root of the source directory, I have a bash script which walks the source tree and, for every .mdtext file under /source/, does the following: (1) calls pandoc on the file, using a different HTML header and footer depending on whether or not the page is the main index (which has a slightly different theme), (2) moves the output to the appropriate place in the directory tree, (3) runs HTML Tidy over it to sanitize Pandoc's pretty heinous HTML source, and (4, and most importantly) computes the depth in the directory tree at which the .mdtext file is located, builds the matching string of "../" segments (one per level), and runs sed over the HTML output, replacing every $TOP with that string of dots.
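For the curious, here is a rough sketch of what the per-page part of such a script could look like. It is not my actual script, just an illustration under some assumptions: the header and footer file names (header.html, footer.html, header-index.html, footer-index.html) are made up, the function names top_prefix and build_page are mine, the script is assumed to run from inside the source directory with the site root one level up, and the header and footer are fed to pandoc through its --include-before-body and --include-after-body options, which is only one of several ways to do it.

```shell
#!/bin/sh
# Sketch only: file names and function names here are hypothetical.

# Print one "../" per directory level between a page and the site root.
# $1: page path relative to the source root, e.g. "research/index.mdtext".
top_prefix() {
    rel=${1#./}          # drop a leading "./" as produced by find
    prefix=""
    while :; do
        case $rel in
            */*) prefix="../$prefix"; rel=${rel#*/} ;;  # one more level down
            *)   break ;;
        esac
    done
    printf '%s' "$prefix"
}

# Build one page. $1: page path relative to the source root.
# Assumes the current directory is the source directory.
build_page() {
    page=${1#./}
    out="../${page%.mdtext}.html"

    # The main index gets its own header and footer (hypothetical names).
    if [ "$page" = "index.mdtext" ]; then
        hdr=header-index.html; ftr=footer-index.html
    else
        hdr=header.html; ftr=footer.html
    fi

    mkdir -p "$(dirname "$out")"
    # Steps 1 and 2: convert, writing straight into the output tree.
    pandoc -f markdown -t html --standalone \
        --include-before-body="$hdr" --include-after-body="$ftr" \
        --output="$out" "$page"
    # Step 3: tidy exits non-zero on mere warnings, so don't treat that as fatal.
    tidy -quiet -modify "$out" || true
    # Step 4: replace the literal text $TOP with the computed prefix.
    sed 's|\$TOP|'"$(top_prefix "$page")"'|g' "$out" > "$out.tmp" &&
        mv "$out.tmp" "$out"
}
```

With those two functions defined, something like find . -name '*.mdtext' | while IFS= read -r f; do build_page "$f"; done would rebuild the whole tree. The depth helper just counts slashes, so top_prefix research/index.mdtext prints "../" and a file sitting directly in the source root gets an empty prefix.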

Why are the dots so important? Well, I don't want to use any absolute paths in my code, since there are multiple addresses you could be using to get to my website (charlespence.net, as well as whatever host it currently lives on, ND at the moment). So I need to be able to do things like refer to the stylesheet in each document using a relative path. But the header and footer are the same for all the documents. So I set the href in the link tag to "$TOPstyle.css", and then have the script turn this into "style.css" or "../style.css" or "../../style.css" as the occasion requires.
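To make the substitution concrete, here is a small sketch of just that step. The helper name expand_top and the exact link tag are mine, for illustration only; the one real trick is that the dollar sign has to be escaped and single-quoted so that neither the shell nor sed treats it specially.

```shell
# Replace every literal "$TOP" on stdin with the prefix given as $1.
expand_top() {
    sed 's|\$TOP|'"$1"'|g'
}

# A header line as it might appear in the shared header file (illustrative).
link='<link rel="stylesheet" href="$TOPstyle.css">'

printf '%s\n' "$link" | expand_top ""        # page at the site root
printf '%s\n' "$link" | expand_top "../"     # page one directory down
printf '%s\n' "$link" | expand_top "../../"  # page two directories down
```

The three calls print the same tag with href set to "style.css", "../style.css" and "../../style.css" respectively. Using | as the sed delimiter keeps the slashes in the prefix from needing any escaping.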

Finally, I can mount the Notre Dame web host using SFTP and FUSE, and use rsync (rsync -crLO -T /tmp --progress --delete) to copy the files from my local directory to the server mountpoint. At last, I have a website system that makes things easy to maintain!

Watch this space -- I'm planning to put up a little collection of history of biology resources, now that I'm teaching a course this summer and can easily do such a thing.