Ambisonics - not the format of the future

I know, radical, right? But let's talk for a moment about our ear-brain system - how do we perceive position? I'm not going to go all academic here and cite sources because you can easily write a thesis on how all of this works and nobody wants to wade through a thesis...

Start with the obvious: we have two ears. Our head casts an acoustic shadow for frequencies whose wavelength is smaller than the size of our head, which means a high-frequency sound coming from the right will be louder in the right ear than in the left. Frequencies with a wavelength larger than our head diffract around it and reach the far ear with no real change in level (unless the sound is very close to the head - inverse square law and all that).

However, my head is only about 1.7kHz wide (20cm-ish - the wavelength of a 1.7kHz tone), yet I can discern the direction of sounds with frequency content below 1.7kHz. So there's something else at play here.
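
If you want the arithmetic behind that figure, here's a quick sanity check - just a back-of-envelope snippet assuming a speed of sound of 343 m/s and a roughly 20cm head, both round numbers:

```python
# Rough crossover frequency for head shadowing: the frequency whose
# wavelength equals the width of the head (lambda = c / f).
SPEED_OF_SOUND = 343.0  # m/s at room temperature
HEAD_WIDTH = 0.2        # metres, a round-number average

crossover_hz = SPEED_OF_SOUND / HEAD_WIDTH
print(f"Head shadow kicks in above roughly {crossover_hz:.0f} Hz")
# -> roughly 1715 Hz, i.e. the "my head is 1.7kHz wide" figure
```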


Time. Our brain uses the arrival time of sounds at each ear to position them. You can prove this by playing a sound through two loudspeakers placed some distance apart: if you add delay to the left loudspeaker, the "image" pans to the right. The level coming out of each loudspeaker stays the same, and therefore the level at your ears stays the same, but the source still appears to move. Our ear-brain system uses time information as well as level information to decode location.
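
Here's a minimal sketch of that experiment, assuming numpy and scipy are to hand: both channels carry the tone at identical level, but the left channel is delayed, so over loudspeakers the image pulls to the right.

```python
# Delay-only panning demo: equal levels in both channels, left delayed.
# Play the resulting WAV over loudspeakers (headphones spoil the effect).
import numpy as np
from scipy.io import wavfile

SAMPLE_RATE = 48000
DELAY_MS = 0.5  # interaural-scale delay; try values from 0 to ~1 ms

t = np.arange(0, 2.0, 1.0 / SAMPLE_RATE)
tone = 0.3 * np.sin(2 * np.pi * 440.0 * t)

delay_samples = int(SAMPLE_RATE * DELAY_MS / 1000.0)
left = np.concatenate([np.zeros(delay_samples), tone])   # delayed channel
right = np.concatenate([tone, np.zeros(delay_samples)])  # on-time channel

stereo = np.stack([left, right], axis=1).astype(np.float32)
wavfile.write("delay_pan_demo.wav", SAMPLE_RATE, stereo)
```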

This is where ambisonics has issues. Ambisonics is level-based panning (at low orders and with low-density loudspeaker layouts) - there is no time information. To recreate time information for what I would call real-world loudspeaker setups (i.e. anything that isn't an array of loudspeakers forming a circle around the listener), you need to know your loudspeaker layout and the position of every source in your environment, then calculate the arrival time of each source at each loudspeaker in real time (not actually as complicated as it sounds - I've written a Q-SYS plugin that does it using a delay/level matrix, sketched below).
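
I won't reproduce the plugin here, but the geometry it relies on is straightforward. Here's a sketch under simple assumptions (point sources, 1/r level rolloff, 343 m/s speed of sound) - just the calculation described above, not the actual Q-SYS code:

```python
# Sketch of a delay/level matrix: for each (source, loudspeaker) pair,
# delay by distance / speed-of-sound and attenuate by 1/distance.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def delay_level_matrix(sources, speakers):
    """sources: (N, 3) positions in metres; speakers: (M, 3).
    Returns (delays_seconds, gains), each of shape (N, M)."""
    diff = sources[:, None, :] - speakers[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    delays = dist / SPEED_OF_SOUND           # arrival time per pair
    gains = 1.0 / np.maximum(dist, 0.1)      # 1/r rolloff, clamped near zero
    return delays, gains

# One source 3 m to the listener's right, speakers front-left and front-right
sources = np.array([[3.0, 0.0, 0.0]])
speakers = np.array([[-2.0, 1.0, 0.0], [2.0, 1.0, 0.0]])
delays, gains = delay_level_matrix(sources, speakers)
print(delays * 1000.0)  # ms: the speaker nearest the source fires first
print(gains)
```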

Low order ambisonics only excites half of the ear-brain localisation system.

However, be warned: as soon as you start moving things around on a system that calculates time arrivals, you get the side effect of pitch shifting! My preference is a hybrid - time- and level-based positioning for stationary objects, and level-only positioning for moving objects.
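
That pitch shift is just Doppler in disguise: a time-varying delay resamples the signal. As a rough back-of-envelope (same 343 m/s assumption as above), if a virtual source approaches a loudspeaker at speed v, the delay shrinks at v/c seconds per second and the pitch rises by that fraction:

```python
# Back-of-envelope Doppler from a time-varying delay: if the
# source-to-speaker distance shrinks at v m/s, the delay changes at
# v/c seconds per second, resampling (and pitch-shifting) the signal.
import math

SPEED_OF_SOUND = 343.0  # m/s

def pitch_shift_cents(approach_speed_mps):
    """Approximate pitch shift, in cents, for a virtual source moving
    toward a loudspeaker at the given speed."""
    ratio = 1.0 + approach_speed_mps / SPEED_OF_SOUND
    return 1200.0 * math.log2(ratio)

for v in (0.5, 2.0, 10.0):
    print(f"{v:4.1f} m/s -> {pitch_shift_cents(v):+5.1f} cents")
# Slow moves shift by a few cents; fast moves are audibly obvious.
```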

I know that's not the end of the story - our pinnae are also a vital part of discerning position, which is why people who can only hear with one ear can still localise sounds. But let's tackle one part of this process at a time!

Thinking about all of this makes you realise why it's so hard to generate decent immersive audio content for playback on headphones. There are so many ways to excite our ear-brain system, and all of them need to be activated in exactly the right way for our brain to be fooled (I think "exactly right" is different for each person too!).

