Mark Boas: Hyperaudio - Making Audio a First Class Citizen of the Web

Published by Orde Saunders

Mark Boas (@maboa) was talking about audio on the web at Scotland.js; these are my notes from his talk.


  • Non-passive - not like a traditional player
  • Dynamically generated
  • Integrated into the web experience.

As hypertext is to text, hyperaudio is to audio.


  • Requires only partial attention - can do other things whilst listening.
  • Conveys emotion well
  • Goes deeper...

Making audio interactive

Demo linking a speech by Martin Luther King to the text of the speech - click on a word to jump to that word in the audio.

  • Set the playback position
  • Get the playback position

Use this to build hypertranscripts: each word sits in a span with an attribute such as data-t="123", and JavaScript links it to that time in the audio. The player can also work the other way, using its playback position to show a live transcript as it plays. Getting within about 250ms is probably good enough.
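Both directions fit in a few lines. This is a minimal sketch, not Hyperaudio's code: it assumes word times are in seconds (the unit behind data-t isn't stated), uses made-up timings, and swaps the real HTMLMediaElement for a hypothetical MediaLike interface exposing only currentTime, so it runs outside a browser.

```typescript
// Hypothetical stand-in for HTMLMediaElement: only currentTime is needed here.
interface MediaLike {
  currentTime: number; // seconds
}

// One transcript word and the time it starts (seconds, an assumption).
interface TimedWord {
  text: string;
  start: number;
}

// Set the playback position: what clicking a word's span would do.
function seekToWord(media: MediaLike, word: TimedWord): void {
  media.currentTime = word.start;
}

// Get the word at the playback position: the last word whose start time is
// <= currentTime. Words must be sorted by start; returns -1 before the first.
function activeWordIndex(words: TimedWord[], currentTime: number): number {
  let lo = 0;
  let hi = words.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (words[mid].start <= currentTime) {
      found = mid; // candidate; keep looking for a later one
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}

// Made-up timings for illustration.
const words: TimedWord[] = [
  { text: "I", start: 0.0 },
  { text: "have", start: 0.4 },
  { text: "a", start: 0.75 },
  { text: "dream", start: 0.9 },
];

const media: MediaLike = { currentTime: 0 };
seekToWord(media, words[3]); // click on "dream"
console.log(media.currentTime); // 0.9
console.log(activeWordIndex(words, 0.5)); // 1, i.e. "have"
```

In a page, seekToWord would run from a click handler on each span and activeWordIndex from the media element's timeupdate event. The binary search keeps that per-event lookup cheap on long transcripts, and with the roughly 250ms tolerance mentioned in the talk, small timing errors don't matter.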

Hypertranscripts break audio out of its black box and make it navigable, searchable and shareable. You can scroll through the content much more easily and share it by linking straight to parts of the audio. The audio timings can also be used to link to images and other web resources.
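Searchability falls out of the same word timings: find a phrase in the text and you know where it is in the audio. A hypothetical sketch, assuming each word carries a start time in seconds, that the transcript is English, and that matching should ignore case and punctuation (the timings are made up):

```typescript
interface TimedWord {
  text: string;
  start: number; // seconds (assumed unit)
}

// Lower-case and strip punctuation so "Dream," matches "dream".
function normalise(s: string): string {
  return s.toLowerCase().replace(/[^a-z0-9']/g, "");
}

// Return the start time of every occurrence of a (possibly multi-word) phrase.
function findPhrase(words: TimedWord[], phrase: string): number[] {
  const target = phrase.split(/\s+/).map(normalise).filter((w) => w.length > 0);
  const hits: number[] = [];
  if (target.length === 0) return hits;
  for (let i = 0; i + target.length <= words.length; i++) {
    let match = true;
    for (let j = 0; j < target.length; j++) {
      if (normalise(words[i + j].text) !== target[j]) {
        match = false;
        break;
      }
    }
    if (match) hits.push(words[i].start);
  }
  return hits;
}

const transcript: TimedWord[] = [
  { text: "I", start: 0.0 },
  { text: "have", start: 0.4 },
  { text: "a", start: 0.75 },
  { text: "dream.", start: 0.9 },
  { text: "I", start: 2.1 },
  { text: "have", start: 2.4 },
  { text: "a", start: 2.7 },
  { text: "Dream,", start: 2.85 },
];
console.log(findPhrase(transcript, "have a dream")); // [ 0.4, 2.4 ]
```

Each returned time can be used directly as a seek target or as the start of a shared link.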


  • Popcorn.js - light, modular and can trigger events based on time.
  • jPlayer - Flash fallback for legacy (IE6+) support
  • jQuery
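Popcorn.js's own API isn't reproduced here. As a hypothetical, library-free illustration of triggering events based on time (for instance showing an image while a passage plays), this sketch registers cues with start and end times in seconds and fires enter/exit callbacks as playback time changes:

```typescript
// A time range with callbacks; hypothetical, loosely in the spirit of Popcorn.js.
interface Cue {
  start: number; // seconds
  end: number;   // seconds, exclusive
  onEnter: () => void;
  onExit: () => void;
  active: boolean;
}

class CueScheduler {
  private cues: Cue[] = [];

  add(start: number, end: number, onEnter: () => void, onExit: () => void = () => {}): void {
    this.cues.push({ start, end, onEnter, onExit, active: false });
  }

  // Call with the current playback time, e.g. from a "timeupdate" handler.
  // Comparing desired and current state also copes with seeks in either direction.
  update(time: number): void {
    for (const cue of this.cues) {
      const shouldBeActive = time >= cue.start && time < cue.end;
      if (shouldBeActive && !cue.active) {
        cue.active = true;
        cue.onEnter();
      } else if (!shouldBeActive && cue.active) {
        cue.active = false;
        cue.onExit();
      }
    }
  }
}

const log: string[] = [];
const cueScheduler = new CueScheduler();
// e.g. show a photo while seconds 5 to 10 of the audio play
cueScheduler.add(5, 10, () => log.push("show photo"), () => log.push("hide photo"));

cueScheduler.update(3);  // before the cue: nothing happens
cueScheduler.update(6);  // enters the cue
cueScheduler.update(7);  // still inside: no repeat
cueScheduler.update(12); // leaves the cue
console.log(log); // [ 'show photo', 'hide photo' ]
```

Because the scheduler compares states rather than only reacting to time moving forward, jumping around via the transcript still shows and hides linked resources correctly.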

How do we create word aligned transcripts?

We need to get the transcription from the audio, whether from third parties, by hand or both. It can't be fully automated.
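However the word timings are obtained, turning them into hypertranscript markup is mechanical. A minimal sketch, assuming timings arrive in seconds and choosing (as an assumption) to store whole milliseconds in the data-t attribute named in the talk; the timings are made up:

```typescript
interface TimedWord {
  text: string;
  start: number; // seconds, from whatever transcription/alignment source was used
}

// Escape characters that are special in HTML text content and attributes.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wrap each word in a span carrying its start time. The data-t attribute
// name follows the talk; storing whole milliseconds is this sketch's choice.
function toHypertranscript(words: TimedWord[]): string {
  const spans = words.map(
    (w) => `<span data-t="${Math.round(w.start * 1000)}">${escapeHtml(w.text)}</span>`
  );
  return `<p>${spans.join(" ")}</p>`;
}

console.log(toHypertranscript([
  { text: "Free", start: 1.25 },
  { text: "at", start: 1.6 },
  { text: "last!", start: 1.8 },
]));
// <p><span data-t="1250">Free</span> <span data-t="1600">at</span> <span data-t="1800">last!</span></p>
```

Integer milliseconds avoid floating-point formatting noise in the markup; a reader script would divide by 1000 before seeking.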


Video has an audio track, so we can apply exactly the same process to video.

The Hyperaudio Pad

When we synchronise audio with text, can we manipulate the audio in the same way as the text? That is the Hyperaudio Pad: copy and paste text over and the media comes with it. This is much easier than using a traditional video editor. Nothing is destroyed; the pad just points to start and end times. You can type in transitions, e.g. [fade through green over 1 second], so the program reads like a script. Effects are added with a library called seriously.js.
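The non-destructive idea can be modelled with clips that only point into the source. A hypothetical sketch: Source, Clip and the selection indices are illustrative names, the media URL is a placeholder, the timings are made up, and the transition parser recognises only the one directive form quoted in the talk.

```typescript
interface TimedWord {
  text: string;
  start: number; // seconds
}

interface Source {
  url: string;        // placeholder media URL
  duration: number;   // seconds
  words: TimedWord[]; // sorted by start
}

// A clip only points into the source; the media itself is never modified.
interface Clip {
  url: string;
  start: number;
  end: number;
}

// "Copying" words first..last (inclusive) yields the matching stretch of media.
// The clip ends where the next word starts, or at the end of the source.
function clipFromSelection(src: Source, first: number, last: number): Clip {
  const start = src.words[first].start;
  const end = last + 1 < src.words.length ? src.words[last + 1].start : src.duration;
  return { url: src.url, start, end };
}

interface Transition {
  effect: "fade";
  colour: string;
  seconds: number;
}

// Parse a typed directive such as "[fade through green over 1 second]".
// Only this one hypothetical form is recognised; anything else returns null.
function parseTransition(line: string): Transition | null {
  const m = /^\[fade through (\w+) over (\d+(?:\.\d+)?) seconds?\]$/.exec(line.trim());
  return m ? { effect: "fade", colour: m[1], seconds: Number(m[2]) } : null;
}

const speech: Source = {
  url: "https://example.com/speech.mp3",
  duration: 4,
  words: [
    { text: "I", start: 0.0 },
    { text: "have", start: 0.4 },
    { text: "a", start: 0.75 },
    { text: "dream", start: 0.9 },
    { text: "today", start: 1.6 },
  ],
};
console.log(clipFromSelection(speech, 1, 3)); // "have a dream"
console.log(parseTransition("[fade through green over 1 second]"));
```

Selecting "have a dream" yields a clip from 0.4s to 1.6s of the same URL; pasting it elsewhere copies only those three values.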

  • Create audio and video programs easily
  • Web-based intuitive interface
  • Each program comes with hypertranscript by default
  • Remix the remixes
  • Programs come with source intact - nothing is left on the cutting room floor.
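Because a clip is just a source URL plus start and end times, remixing a remix never needs the intermediate media. A hypothetical sketch, using an illustrative Clip type and treating a program as a list of clips, that cuts a stretch of one program's timeline into a new program; URLs and times are placeholders:

```typescript
interface Clip {
  url: string;
  start: number; // seconds into the source media
  end: number;
}

type Program = Clip[];

function duration(p: Program): number {
  return p.reduce((sum, c) => sum + (c.end - c.start), 0);
}

// Cut the stretch [from, to) of a program's timeline (in program seconds) into
// a new program. Every clip still points at its original source, so a remix
// of a remix resolves straight back to the untouched originals.
function excerpt(p: Program, from: number, to: number): Program {
  const out: Program = [];
  let offset = 0; // program time at which the current clip begins
  for (const c of p) {
    const len = c.end - c.start;
    const lo = Math.max(from, offset);
    const hi = Math.min(to, offset + len);
    if (hi > lo) {
      out.push({ url: c.url, start: c.start + (lo - offset), end: c.start + (hi - offset) });
    }
    offset += len;
  }
  return out;
}

const original: Program = [
  { url: "https://example.com/a.mp3", start: 10, end: 20 },
  { url: "https://example.com/b.mp3", start: 0, end: 5 },
];
const remix = excerpt(original, 8, 12);     // spans the join between a and b
const remixOfRemix = excerpt(remix, 1, 3);  // still points at a.mp3 and b.mp3
console.log(remixOfRemix);
```

Each level of remixing only rewrites offsets into the original sources, which is one way to keep a program's source intact.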

Ideal for non-professionals and professionals in a rush.

Something completely different

  • speech.js - text to speech. Perfect for creating robot voices.
  • Perceptive media -
  • Dynamically generated audio via Web Audio API - synthesises audio in the browser.
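In a browser this would usually use Web Audio nodes such as an OscillatorNode. To show what synthesising audio means at the sample level, and to keep the example runnable outside a browser, this sketch computes a sine tone with a linear fade-out into a Float32Array. In a page, the samples could be copied into an AudioBuffer from AudioContext.createBuffer() and played through an AudioBufferSourceNode; the frequency, length and gain here are arbitrary choices.

```typescript
// Compute `seconds` of a sine tone at `frequency` Hz, sampled at `sampleRate`
// samples per second, with a linear fade-out so it doesn't end with a click.
function sineTone(frequency: number, seconds: number, sampleRate = 44100, gain = 0.5): Float32Array {
  const length = Math.round(seconds * sampleRate);
  const samples = new Float32Array(length);
  for (let i = 0; i < length; i++) {
    const t = i / sampleRate;        // time of this sample in seconds
    const envelope = 1 - i / length; // falls linearly from 1 towards 0
    samples[i] = gain * envelope * Math.sin(2 * Math.PI * frequency * t);
  }
  return samples;
}

const tone = sineTone(440, 0.5); // half a second of A440
console.log(tone.length); // 22050
```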

What's next?

  • WebRTC - peer-to-peer networking.
  • Opus Audio Codec - combines several coding modes and is dynamically adjustable (e.g. bitrate)
  • Web Speech API - speech-based input, server-based (like Siri)
  • Media Fragments - split out a video by area of screen, time dimension or track (e.g. audio).
  • Mobile is getting better - still lots of constraints.
  • Live streaming is possible
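Media Fragments URI 1.0 (a W3C Recommendation) defines a temporal dimension written as #t=start,end in normal play time (seconds, or mm:ss / hh:mm:ss), which also gives a standard way to share a link straight to part of a recording. A partial sketch that builds and parses only that form; the spatial (xywh) and track dimensions, SMPTE and wall-clock times, and percent-encoding are not handled, and the URLs are placeholders:

```typescript
interface TimeRange {
  start: number;      // seconds
  end: number | null; // null means "to the end of the media"
}

// Build a shareable URL for [start, end), replacing any existing fragment.
function withTemporalFragment(url: string, start: number, end?: number): string {
  const base = url.split("#")[0];
  return end === undefined ? `${base}#t=${start}` : `${base}#t=${start},${end}`;
}

// Parse one NPT time: "90", "90.5", "1:30" (mm:ss) or "0:01:30" (hh:mm:ss).
function parseNpt(s: string): number {
  const parts = s.split(":").map(Number);
  if (parts.some((n) => isNaN(n))) throw new Error(`bad time: ${s}`);
  return parts.reduce((total, n) => total * 60 + n, 0);
}

// Read the t= parameter from a URL's fragment, e.g. "#t=10,20" or "#t=npt:,20".
function parseTemporalFragment(url: string): TimeRange | null {
  const hashAt = url.indexOf("#");
  if (hashAt < 0) return null;
  for (const param of url.slice(hashAt + 1).split("&")) {
    if (param.slice(0, 2) !== "t=") continue;
    const value = param.slice(2).replace(/^npt:/, "");
    const [startText, endText] = value.split(",");
    return {
      start: startText ? parseNpt(startText) : 0, // omitted start means 0
      end: endText ? parseNpt(endText) : null,
    };
  }
  return null;
}

const shareLink = withTemporalFragment("https://example.com/speech.mp3", 95, 110);
console.log(shareLink); // https://example.com/speech.mp3#t=95,110
console.log(parseTemporalFragment("https://example.com/speech.mp3#t=npt:1:35,1:50")); // { start: 95, end: 110 }
```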

Native support in browsers is getting much better - IE8 is the main problem.

The future is now.
