Using HaXml to make a PDF slideshow from an Inkscape SVG

I recently got a tablet to input handwritten math for slideshow presentations, but instead of using a note-taking program ([Jarnal](http://www.dklevine.com/general/software/tc1000/jarnal.htm),
[Xournal](http://xournal.sourceforge.net/),
[Gournal](http://www.adebenham.com/gournal/)) I decided that I wanted the full power of image manipulation of a program like [Gimp](http://www.gimp.org/) or [Inkscape](http://www.inkscape.org/). Neither of these, though, has the level of support for multi-page documents that you find in note-taking software. But Inkscape uses SVG as its native file format, so I wrote this Haskell script to transform the layers of an Inkscape SVG file into the slides of a PDF presentation. I use the
[HaXml](http://www.cs.york.ac.uk/fp/HaXml/) library to manipulate the SVG, the Inkscape command-line interface to convert each page to PDF, and [pdftk](http://www.pdfhacks.com/pdftk/) to glue the whole thing back together.

[Images: slide001.png, slide002.png, slide003.png, slide004.png]

As usual, this post is a literate Haskell file, so you can try it out by saving it to `Inkscape.lhs`, compiling with `ghc --make Inkscape`, grabbing the demo SVG (http://mathematicalpamphlet.files.wordpress.com/2008/04/slide0041.pngwp-content/uploads/2008/04/demo.svg), and running `./Inkscape < demo.svg`. The output will appear in `Slides.pdf` (and your directory will be polluted with temp files, so be aware).

For the record, multi-page documents have been on the Inkscape feature
request tracker for many versions, so I presume it is a significant
change. I _do_ grok C and C++, thanks to the legacy-oriented
education system, but take little enough pleasure from them that I
would rather hack around the issue in Haskell.

> import Text.XML.HaXml
> import Text.XML.HaXml.Pretty
> import Text.XML.HaXml.Posn
> import Text.PrettyPrint.HughesPJ
> import Text.Printf
> import Data.List
> import System.IO
> import System.Cmd

HaXml is based on a combinator library of `CFilter`s that filter, search, and output XML content. It is a little crufty in some ways: many datatypes are transparent, and you have to do a lot of your own setup and teardown. The expected way to use it seems to be via `processXmlWith :: CFilter -> IO ()`, which is not sufficient for today’s task. The Hackage documentation pointed to an old version of the API, so I used the current version of the source code for documentation. I’d love any criticism like “you didn’t have to do X” or “here is an easier, safer way to do Y”.

I couldn’t think of a better way to narrate this code, so I’ll start with `main` for a high-level read, and then later fill in all the helper functions. Naturally we start with a call to `xmlParse`; the `"-"` is a required filename for error reporting.

> main = do input <- getContents
>           let xml = xmlParse "-" input

Then I grab the names of all the layer objects in the order they appear in the file, except for the special layer “Background”, which I’ll include behind every slide. The call to `verbatim` spits them out as `String`s instead of XML `Content`, and the `"-"` is yet another required filename for error reporting.

>           let names = delete "Background"
>                       $ map verbatim
>                       $ filterElem "-" getLayerNames
>                       $ xmlElem
>                       $ xml
>           putStrLn $ "Making slides from layers:"
>                        ++ concatMap ("\n\t"++) names ++ "\n"

Then for each layer, make a new version of the file with just that layer visible.

>           let outXmls = map (flip selectLayer xml) names
>               usedSlides = take (length names) slideNames
>           mapM_ (uncurry writeFile)
>                 (zip (map (++".svg") slideNames)
>                      (map (renderStyle xmlStyle . document) outXmls))

And some shell scripting done in Haskell. I didn’t even try to find a scripting library or anything to e.g. prevent me from building a malformed command.

>           mapM_ (\slide -> do
>                    system $ "inkscape --export-text-to-path --export-pdf='"
>                             ++ slide ++ ".pdf' '" ++ slide ++ ".svg'")
>                 usedSlides
>
>           system $ "pdftk "
>                      ++ concat (intersperse " " (map (++".pdf") usedSlides))
>                      ++ " cat output Slides.pdf"
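
One way to avoid hand-built command strings, offered as a sketch rather than something tested against the script above: pass every argument as its own list element to `rawSystem` from `System.Process`, so that no shell ever parses the command line. The helper names `inkscapeArgs`, `pdftkArgs`, and `exportSlides` are hypothetical; the flags are the ones used in the commands above.

```haskell
import System.Exit (ExitCode)
import System.Process (rawSystem)

-- Arguments for exporting one slide; flags copied from the command above.
inkscapeArgs :: String -> [String]
inkscapeArgs slide =
  [ "--export-text-to-path"
  , "--export-pdf=" ++ slide ++ ".pdf"
  , slide ++ ".svg"
  ]

-- Arguments for concatenating the per-slide PDFs into one file.
pdftkArgs :: [String] -> FilePath -> [String]
pdftkArgs slides out = map (++ ".pdf") slides ++ ["cat", "output", out]

-- Run both steps without going through a shell, so spaces or quotes
-- in a file name cannot change how the command is parsed.
exportSlides :: [String] -> IO ExitCode
exportSlides slides = do
  mapM_ (rawSystem "inkscape" . inkscapeArgs) slides
  rawSystem "pdftk" (pdftkArgs slides "Slides.pdf")
```

With this shape, a slide name containing a space or a quote reaches Inkscape as a single argument, whereas the string-built version would need escaping.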

So now to the little details:

Grabbing the layer names
------------------------

Here is the first helper I wrote, wrapping HaXml’s `attrval` for a common case. This filter returns every tag whose `attr` attribute has the string value `val`.

> matchAttrString :: String -> String -> CFilter i
> matchAttrString attr val = attrval (attr, AttValue [Left val])

The next helper is one that maps a tag to its attribute value, otherwise discards anything else it sees. The HaXml function `iffind` will pass the `attr` attribute value of a tag to `literal` which just returns it. If the attribute isn’t found, or the XML data isn’t a tag, then `none` will discard it.

> showAttr :: String -> CFilter i
> showAttr attr = iffind attr literal none

The Inkscape layers are contained in `<g>` tags marked with `inkscape:groupmode="layer"`. The name of the layer is in the `inkscape:label` attribute. I imagine this will change as Inkscape evolves. The `o` is the composition operator for `CFilter`s.

> isLayer = matchAttrString "inkscape:groupmode" "layer"
> getLayerNames = showAttr "inkscape:label" `o` isLayer `o` children
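
To see why a chain of `o` reads right to left, here is a toy model that uses only the Prelude. It is not HaXml's implementation: the `Node` type and the helpers `attrIs` and `attrOf` are stand-ins invented for this sketch, mirroring `matchAttrString` and `showAttr`. The only thing borrowed from HaXml is the shape of a filter, a function from one piece of content to a list of results.

```haskell
-- A toy XML node: tag name, attributes, children.
data Node = Node String [(String, String)] [Node]
  deriving (Eq, Show)

-- A filter maps one node to zero or more results.
type Filter a = Node -> [a]

-- Composition: run g first, then f on every result of g.
o :: Filter a -> (Node -> [Node]) -> Filter a
f `o` g = concatMap f . g

children :: Filter Node
children (Node _ _ cs) = cs

-- Keep a node only if the attribute has the given value.
attrIs :: String -> String -> Filter Node
attrIs key val n@(Node _ attrs _)
  | lookup key attrs == Just val = [n]
  | otherwise                    = []

-- Replace a node by the value of one attribute, or by nothing.
attrOf :: String -> Filter String
attrOf key (Node _ attrs _) = maybe [] (: []) (lookup key attrs)

layerNames :: Filter String
layerNames = attrOf "inkscape:label" `o` attrIs "inkscape:groupmode" "layer" `o` children

-- A made-up document with two layers and two non-layer elements.
demo :: Node
demo = Node "svg" []
  [ Node "defs" [] []
  , Node "g" [("inkscape:groupmode", "layer"), ("inkscape:label", "Background")] []
  , Node "g" [("inkscape:groupmode", "layer"), ("inkscape:label", "Title")] []
  , Node "g" [("id", "plain-group")] []
  ]
```

With this `demo`, `layerNames demo` evaluates to `["Background","Title"]`: `children` runs first, then the layer test, then the label lookup.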

Isolating the layers
--------------------

Again proceeding from the outside of my program inwards, a layer is isolated with this helper, using `iffind` to match either the layer name or the layer “Background”, which I’m going to leave in all the output files. The final `keep` argument to `iffind` says to keep parts of the XML that don’t have the `"inkscape:label"` attribute.

> selectLayer :: String -> Document Posn -> Document Posn
> selectLayer layer doc = onContent "-" (chip (visible `o` onlyLayer)) doc
>     where onlyLayer = iffind "inkscape:label" layerOrBG keep
>           layerOrBG l = if l == layer || l == "Background" then keep else none
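
The three cases that `iffind` distinguishes here can be spelled out as a plain predicate. This only illustrates the decision, not the XML rewriting: `keepsElement` and `exampleLabels` are hypothetical names, and the `Maybe String` stands for the value of `inkscape:label` if the element has one.

```haskell
-- Does a top-level element survive when isolating the given layer?
keepsElement :: String -> Maybe String -> Bool
keepsElement _     Nothing      = True   -- no label: not a layer, so `keep`
keepsElement layer (Just label) = label == layer || label == "Background"

-- Labels of some made-up top-level elements; Nothing stands for an
-- element without an inkscape:label attribute.
exampleLabels :: [Maybe String]
exampleLabels = [Nothing, Just "Background", Just "Title", Just "Proof"]
```

Here `filter (keepsElement "Title") exampleLabels` keeps the unlabelled element, “Background”, and “Title”, and drops “Proof”.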

In writing `visible` I was surprised that there was a combinator to set _all_ attributes for a tag, but none to set a single attribute.

> visible = setAttr "style" "display:inline"
> setAttr key val (CElem (Elem tag attrs cs) i) = [CElem (Elem tag newattrs cs) i]
>     where newattrs = (key, AttValue [Left val]) : filter ((/= key) . fst) attrs
> setAttr key val other = [other] -- Hackish?
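
Stripped of the XML constructors, `setAttr` is a replace-or-insert on an association list. A minimal sketch of just that step, with `setKey` as a hypothetical name:

```haskell
-- Replace the value stored under a key, or add the key if it is absent.
-- The updated pair always ends up at the front of the list.
setKey :: Eq k => k -> v -> [(k, v)] -> [(k, v)]
setKey key val pairs = (key, val) : filter ((/= key) . fst) pairs
```

Note that the old value is discarded wholesale, so for `style` any other properties that were stored in the attribute are lost.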

As I mentioned before, there is no way that I see to directly apply this filter to an XML file using HaXml. The type `CFilter = Content -> [Content]` needs wrapping to apply to an XML `Element` directly. Notice how I have to pass in a `file` for error reporting; it feels like I’m doing things I’m not supposed to.

> filterElem :: FilePath -> CFilter Posn -> Element Posn -> [Content Posn]
> filterElem file f e = f (CElem e (posInNewCxt file Nothing))

> xmlElem (Document _ _ e _) = e

And now the function to actually apply a filter to an XML document. This is straight from the body of `processXmlWith` in the HaXml source, with `filterElem` pulled out.

> onContent :: FilePath -> (CFilter Posn) -> Document Posn -> Document Posn
> onContent file filter (Document p s e m) =
>     case filterElem file filter e of
>              [CElem e' _] -> Document p s e' m
>              []           -> error "produced no output"
>              _            -> error "produced more than one output"
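
If crashing via `error` is undesirable, the same case analysis can return an `Either` and leave the decision to the caller. This is a sketch of the idea only; `exactlyOne` is a hypothetical helper, not part of HaXml.

```haskell
-- Succeed only when the list holds exactly one result.
exactlyOne :: [a] -> Either String a
exactlyOne [x] = Right x
exactlyOne []  = Left "produced no output"
exactlyOne _   = Left "produced more than one output"
```

A variant of `onContent` built on this would return `Either String (Document Posn)`, at the cost of handling the `Left` case in `main`.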

Bits and pieces
---------------

I also used a modified style for the HughesPJ pretty printer:

> xmlStyle = style { mode = LeftMode }

And a big list of slide names with three digits, for this one-off job. Better would be to use an API for generating fresh temporary files.

> slideNumbers = map (printf "%03d") ([1..999] :: [Int])
> slideNames = map ("Slide"++) slideNumbers
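
For the temporary-file approach, a minimal sketch assuming the `directory` package that ships with GHC: `openTempFile` picks a fresh name from a template, so nothing is overwritten and nothing lands in the working directory. The helper name `writeTempSvg` is hypothetical.

```haskell
import System.Directory (getTemporaryDirectory)
import System.IO (hClose, hPutStr, openTempFile)

-- Write one SVG document to a freshly named file in the system
-- temporary directory and return the path that was chosen.
writeTempSvg :: String -> IO FilePath
writeTempSvg contents = do
  dir <- getTemporaryDirectory
  (path, handle) <- openTempFile dir "slide.svg"
  hPutStr handle contents
  hClose handle
  return path
```

The returned paths would replace the fixed `Slide001`-style names, and the caller is still responsible for deleting the files afterwards.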

Filed under Haskell, LaTeX, Linux, Mathematics

4 Responses to Using HaXml to make a PDF slideshow from an Inkscape SVG

  1. ingolia

    I was just today thinking about making animated slides from an Inkscape SVG file by generating a set of PDF files with only a subset of layers. This seems like a pretty good place to start.

    For this project, when you don’t need to examine or update the top-level element itself, I found it useful to use

    myFilterElem cfilt (Elem _ _ contents) = concatMap cfilt contents
    layerNames = map verbatim . myFilterElem getLayerNames . xmlElem

    and similarly

    filterLayers layerPred doc = myOnContent (visible `o` onlyLayer) doc
      where onlyLayer = iffind "inkscape:label" layerOrBG keep
            layerOrBG l = if layerPred l then keep else none

    myOnContent cfilt (Document p s (Elem en ea contents) m)
      = Document p s (Elem en ea (concatMap cfilt contents)) m

  2. Björn Buckwalter

    Nice. What tablet model did you get? I’d like to be able to embed images (in particular free-form drawings from a tablet) in my literate haskell code, perhaps as XML or some mime-encoded binary format(?). Would of course need a decent editor to go with it…

  3. I got this Wacom Bamboo mostly because it was so cheap. (Student budget)
