mircoschoenfeld

Extract Notes From PowerPoint Files

Edit 2022-03-25: The tool is part of something bigger! Read more in a separate post.

Recently, I built a PDF viewer that mimics PowerPoint's presenter view. Among other things, it shows slide notes next to the corresponding slides. But since I still create my slides in PowerPoint, I also still write my slide notes there. That's why I needed a way to extract the slide notes from PowerPoint presentation files and convert them to a text file that my PDF viewer accepts.

The result is a Bash script that requires xmlstarlet (sudo apt install xmlstarlet). Usage is pretty straightforward:

extractnotespptx /path/to/presentation.pptx

It produces a file /path/to/presentation.notes containing all the slide notes in the format required by presenterview-detached:

#1
Lorem ipsum...

#2
Further bla bla
...

Under the hood

Continue reading if you want to know how it's done. The complete script can be found on GitHub.

Unzip PPTX

Fortunately, pptx files are just zip files containing a bunch of XML files. Inside an unzipped pptx you can find a folder ppt/notesSlides with a number of XML files called notesSlide1.xml, notesSlide2.xml, notesSlide3.xml, and so on. These contain the notes that were added to the slides. Note that the number in the file name doesn't necessarily match the slide number, e.g. notesSlide13.xml may contain the notes you added to slide number 22.
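To illustrate the unzip step, here is a minimal sketch that unpacks only the notes XML into a scratch directory. The helper name unpack_notes and the use of mktemp are my assumptions for illustration; the actual script may handle this differently.

```shell
#!/usr/bin/env bash
# Sketch: unpack only the notes XML from a .pptx into a target directory.
# `unpack_notes` is a hypothetical helper name, not taken from the original script.
unpack_notes() {
  local pptx=$1 dest=$2
  # -q: quiet, -o: overwrite without prompting, -d: target directory.
  # The quoted pattern is matched by unzip itself, not expanded by the shell.
  unzip -q -o "$pptx" 'ppt/notesSlides/*.xml' -d "$dest"
}

# Usage, guarded so the snippet does nothing when called without arguments.
if [ "$#" -ge 1 ]; then
  workdir=$(mktemp -d)
  unpack_notes "$1" "$workdir"
  ls "$workdir/ppt/notesSlides"
fi
```

Restricting the pattern to the XML files in ppt/notesSlides keeps the slides, media, and relationship files out of the scratch directory.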

Extract Notes From XML

The above-mentioned XML-files have two interesting fields: /p:notes/p:cSld/p:spTree/p:sp/p:txBody/a:p/a:fld[@type='slidenum']/a:t and /p:notes/p:cSld/p:spTree/p:sp/p:txBody/a:p. The former contains the number of the slide to which the notes belong and the latter contains a lot of subfields with the actual slide notes.

Using xmlstarlet, you can easily extract the contents: xmlstarlet sel -t -m "//a:fld[@type='slidenum']" -v . gives you the slide number, and xmlstarlet sel -t -m "//p:txBody//a:p[.//a:r//a:t]" -v . -n extracts the actual note text, printing one line for each matching paragraph (a:p) element.
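As a sketch, the two queries can be wrapped in small functions that take a notes file as their argument. The function names are hypothetical, and I bind the a: and p: prefixes explicitly with -N here so the queries don't depend on how the prefixes are declared in the input file; the original script may not do that.

```shell
#!/usr/bin/env bash
# Namespace bindings for DrawingML (a) and PresentationML (p).
NS_A='a=http://schemas.openxmlformats.org/drawingml/2006/main'
NS_P='p=http://schemas.openxmlformats.org/presentationml/2006/main'

# Print the slide number stored in a notes file (hypothetical helper name).
slide_number() {
  xmlstarlet sel -N "$NS_A" -N "$NS_P" -t \
    -m "//a:fld[@type='slidenum']" -v . -n "$1"
}

# Print the note text, one line per paragraph that contains text runs
# (hypothetical helper name). The slide-number paragraph holds a field
# rather than a run, so the predicate filters it out.
note_text() {
  xmlstarlet sel -N "$NS_A" -N "$NS_P" -t \
    -m "//p:txBody//a:p[.//a:r//a:t]" -v . -n "$1"
}
```

Calling slide_number and note_text on each unpacked notes file yields the key and the value for one slide.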

The final script collects these things in an associative Bash array, sorts the entries by array key, i.e. by slide number, and writes out the Markdown file that presenterview-detached needs.
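The collect-and-sort step could look roughly like the following sketch. The sample entries are made up and stand in for the extracted notes, and render_notes is a hypothetical name. Associative arrays require Bash 4 or newer.

```shell
#!/usr/bin/env bash
# Hypothetical sample data standing in for the extracted notes,
# keyed by slide number and deliberately out of order.
declare -A notes=(
  [10]="Notes for slide ten"
  [2]="Notes for slide two"
  [1]="Notes for slide one"
)

# Emit the notes in ascending slide order, using the "#<n>" format.
render_notes() {
  local n
  # Associative arrays are unordered, so sort the keys numerically first.
  while IFS= read -r n; do
    printf '#%s\n%s\n\n' "$n" "${notes[$n]}"
  done < <(printf '%s\n' "${!notes[@]}" | sort -n)
}

render_notes
```

The numeric sort (sort -n) matters: a plain lexical sort would place slide 10 before slide 2. Redirecting the output, e.g. render_notes > presentation.notes, would produce the notes file.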



Published

9. Apr, 2021

Last Updated

Apr 9, 2021
