Serious Games @ LIP6
Serious Games By the MOCAH research team (LIP6 lab, Paris)


LipSync component with Flash Builder 4 (Feder Project)

Thursday 18 November 2010, by Amel Yessad

We are currently working on a framework to help develop “serious games”.

Here you’ll find an example of what we did on the dynamic lipsync problem. This could become one of the components of this framework…


First of all, what is dynamic lipsync?

“Lip-sync or Lip-synch (short for lip synchronization) is a technical term for matching lip movements with voice” (Wikipedia). So if you say an [e] (in English), you have to display a smiling face, just like when you actually speak. We’ll call the [e] sound a phoneme and the smiling face a viseme. A phoneme usually has one viseme associated with it, but one viseme can correspond to several phonemes.
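
To make the many-to-one relation concrete, here is a minimal sketch in Python (not the ActionScript of the component itself). The phoneme and viseme labels are hypothetical, picked only for illustration, and are not the ones used in our project.

```python
# Hypothetical phoneme and viseme labels, chosen only for illustration.
PHONEME_TO_VISEME = {
    "a": "open",
    "e": "smile",
    "i": "smile",    # a second phoneme sharing the "smile" viseme
    "u": "rounded",
    "o": "rounded",  # a second phoneme sharing the "rounded" viseme
}


def viseme_for(phoneme, default="rest"):
    """Return the mouth shape to display for a phoneme, or a neutral default."""
    return PHONEME_TO_VISEME.get(phoneme, default)


def phonemes_for(viseme):
    """Return every phoneme drawn with the given viseme, in sorted order."""
    return sorted(p for p, v in PHONEME_TO_VISEME.items() if v == viseme)
```

Looking up a phoneme always gives exactly one viseme, while looking up a viseme may give back several phonemes, which is why the detection only needs to be as precise as the set of mouth shapes you can draw.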

What did we do?

We didn’t find much material explaining how to implement this in Flash, so we decided to run our own experiments. As most of the resources were theoretical, we took a step-by-step approach based on what we understood and what Flash allows us to do.

The first problem was to isolate one audio channel to work on it (and only it). This feature might seem very basic, but Flash does not offer a proper sound engine. You think I’m kidding? If you have the Flash debugger in your browser, just try to open Deezer or any sound player AND any Flash spectrum analyser found on the web, like this one.

As you can see, it does not work. The problem is that the spectrum analyser in Flash doesn’t give you much control over the sound you’re playing. For some mysterious reason you cannot call computeSpectrum() on a SoundChannel, only globally on the SoundMixer. The bug comes from the fact that the SoundMixer is shared by every application on your computer that plays sound through it, which of course throws a security error… Anyway, if you want to do some lipsync, you don’t want the voice and the game’s music mixed together!

The solution came from the Audio Processing Library for Flash (ALF), which gave us a better sound engine! Once you have this library, you get quite a lot of cool features to manipulate sounds!

If you search for lipsync on Google, you’ll find that two different transformations of the sound’s waveform seem to be useful for detecting phonemes. The first one is Linear Predictive Coding (LPC) coefficients and the second one is the Fast Fourier Transform (FFT). We tried both. We didn’t see any typical, easy-to-use pattern in the LPC output, but the FFT gave us some great results.
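
For readers who have never looked at a spectrum, here is a minimal sketch in Python of what the transform gives you for one frame of sound. It is not ALF’s API: it uses a naive discrete Fourier transform (same result as an FFT, only slower), and the frame is a synthetic sine wave standing in for a slice of recorded voice. The sample rate (8000 Hz) and frame size (256) are arbitrary assumptions.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive O(n^2) discrete Fourier transform.

    Returns the magnitude of bins 0 .. n/2 - 1, normalised by n.
    """
    n = len(samples)
    spectrum = []
    for k in range(n // 2):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        spectrum.append(abs(acc) / n)
    return spectrum


def sine_frame(freq_hz, sample_rate=8000, size=256):
    """Synthetic frame: a pure sine, standing in for recorded voice."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(size)]


frame = sine_frame(1000)
spectrum = magnitude_spectrum(frame)

# Index of the strongest bin, converted back to a frequency in Hz.
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
peak_hz = peak_bin * 8000 / 256
```

A real voice frame has several peaks instead of one, and it is the position of those peaks that differs from one vowel to another.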

Here you can see the FFT representations of 3 different sounds made by 2 people.

Representation of the FFT for 3 different sounds made by 2 people
Sound | 1st person | 2nd person
[a] (as in “attack”) | image (PNG, 18.7 kb) | image (PNG, 17.9 kb)
[e] (as in “bee”) | image (PNG, 14.8 kb) | image (PNG, 15.6 kb)
[u] (English speakers don’t have this sound; in French we would say “rue”) | image (PNG, 16.2 kb) | image (PNG, 15.9 kb)

As you can see, there are some similarities!

So we simplified this shape into a 5-bit array. Then we just did some pattern matching to find out whether the sound looks like something known. This is probably not the best way to do it, but it works and the results are not too bad. It might sometimes be useful to look at the order of the peaks (that might even be enough).
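
One possible reading of this step, sketched in Python under several assumptions: the spectrum is cut into 5 equal bands, each band becomes a 1 if its peak reaches half of the overall peak, and the resulting 5 bits are compared to known patterns with a Hamming distance. The threshold, the distance and above all the reference patterns are hypothetical; they are not the values measured in our experiments.

```python
def band_signature(spectrum, bands=5, ratio=0.5):
    """Reduce a magnitude spectrum to `bands` bits.

    A band's bit is 1 when its peak reaches `ratio` times the overall peak.
    """
    if len(spectrum) < bands:
        raise ValueError("spectrum is shorter than the number of bands")
    overall = max(spectrum)
    if overall == 0:
        return (0,) * bands  # silence: no band stands out
    width = len(spectrum) // bands
    bits = []
    for b in range(bands):
        start = b * width
        # The last band absorbs any leftover bins.
        end = len(spectrum) if b == bands - 1 else start + width
        bits.append(1 if max(spectrum[start:end]) >= ratio * overall else 0)
    return tuple(bits)


# Hypothetical reference patterns, NOT measured values from the article.
KNOWN_PATTERNS = {
    "a": (1, 1, 0, 0, 0),
    "e": (1, 0, 0, 1, 0),
    "u": (1, 0, 0, 0, 0),
}


def closest_phoneme(signature, max_distance=1):
    """Return the known phoneme with the smallest Hamming distance, or None."""
    best, best_dist = None, max_distance + 1
    for phoneme, pattern in KNOWN_PATTERNS.items():
        dist = sum(a != b for a, b in zip(signature, pattern))
        if dist < best_dist:
            best, best_dist = phoneme, dist
    return best
```

Adapting this to another language would mean replacing the entries of KNOWN_PATTERNS with signatures recorded from speakers of that language.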

I’m sorry, but for now it only works for French. You’ll only have to adapt the pattern matching and the visemes to get it working in your language!

Update: If you want to play, here are the sources. Warning: the work has not yet been adapted to Flex 4.0.
