Celemony Melodyne DNA: Has this man done the impossible?

This is Peter Neubäcker from Celemony. You've probably already seen this promo video of his new product, Direct Note Access. It's a new version of the autotune-type pitch correction software which - it appears - can work with polyphonic sound. Record a chord, and it lets you explode that chord and re-tune individual notes. I thought that this was impossible. Peter Neubäcker says "What doesn't work in theory can still work in reality."
Well, maybe. In May 2005, a startup called Zenph Studios claimed to have cracked the problem of polyphonic transcription. They analyse old piano recordings (e.g. Glenn Gould playing the Goldberg Variations in 1955) and produce a high-resolution MIDI-type file with exact pedal movements and note/pressure data. They feed that into a Disklavier MIDI grand piano, and record the results. They've had good reviews (at least in audiophile mags) for the recordings.
The potential of this kind of polyphonic transcription is enormous - it would let you sample a performance, not just the recording of a performance. Zenph may be able to do it in a slow, precise way, presumably with a considerable amount of human help, but they're just pulling out note data, not separating the actual sounds of the notes. Celemony are claiming a lot more. If it works, it's a revolution. It shouldn't be long before you can separate any mixed recording into unmixed tracks. You'll be able to turn any guitar into a guitar synth with no special hardware.
It's very exciting. Does it actually work? I can't imagine how it could, but I know almost nothing about signal processing or the theory of sound. That's where you come in... (More coverage at Create Digital Music)

Posted by Tom Whitwell.

Comments:

Seriously impressive, if it works as well as claimed.

I notice that the examples in the video are all of plucked/hit sounds (clean guitars, Fender Rhodes), so I wonder if the algorithm is using the fast attack times to help identify the timing of the notes, and the pitch and various harmonics (which remain constant in time - I haven't noticed any pitch bends) to get the pitch. It's very impressive that it seems to recognise the very gravelly Rhodes bass notes though, given how different their timbre is from the standard quiet Rhodes notes.

I think the theory he's referring to with "What doesn't work in theory can still work in reality" is the self-evident theory that in order to split a recording up into individual notes, you have to tell the algorithm exactly what a general "note" sounds like. If Celemony have managed to do this well, even for just the limited class of "plucked-type" sounds, then I think they're still miles ahead of the academic research.

# posted by

Chris : 12:58 am

The mind reels. Or boggles. Or one of those quaint old expressions that I don't know the origin of.

# posted by

KeithHandy : 1:29 am

I guess it could work to a certain extent in that you would be able to manipulate individual frequency bands that change over time, but how on earth are you going to isolate individual instruments from a mix?

# posted by

Unknown : 8:17 am

Re: sander - "how on earth are you going to isolate individual instruments from a mix". The FAQ on the Celemony site says that while it can separate notes, it can't separate different instruments. This is OK if the two instruments never play the same note at the same time, but if this does happen, the program treats it as a single note.

# posted by

Chris : 9:03 am

I saw the presentation yesterday and it works and it is really mindblowing. Yes, we would have to listen to it in a quieter environment than Musikmesse, but let me tell you: it is the holy grail.

# posted by

tanzpartner : 9:26 am

Perhaps it uses some kind of granular and/or resynthesis system to recreate the notes. So it hears the chord, analyses it, and replaces it with a slightly smudgy copy, which can then be tweaked...

# posted by

Tom Whitwell : 9:41 am

My guess is it uses short-time ICA for a first cut segmentation and then searches for harmonically related partials (perhaps with some help from the phase spectrum).

The fact that it can't separate two tones of the same pitch suggests it relies heavily on harmonics.

So you won't be able to separate drumkits, and it might not work with timbres that have lots of non-harmonic partials (e.g. bells).

What's really impressive is the quality of the implementation. At least in the demos I heard there were almost no artifacts. Now that's mindblowing.

# posted by

kkonkkrete : 10:35 am
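To illustrate kkonkkrete's guess about "harmonically related partials": this is a minimal sketch of grouping spectral peaks by fundamental, not Celemony's method, which has not been published in this thread. The function name `group_harmonics`, the 3% tolerance, and the peak list are all made up for the example.

```python
def group_harmonics(peaks_hz, f0_hz, max_harmonic=10, tolerance=0.03):
    """Map harmonic number -> peak frequency for peaks lying near
    integer multiples of the candidate fundamental f0_hz."""
    matched = {}
    for peak in peaks_hz:
        n = round(peak / f0_hz)  # nearest harmonic number
        if n < 1 or n > max_harmonic:
            continue
        ideal = n * f0_hz
        # Relative deviation from the ideal harmonic frequency.
        deviation = abs(peak - ideal) / ideal
        # Keep the first peak that fits each harmonic slot.
        if deviation <= tolerance and n not in matched:
            matched[n] = peak
    return matched


# Hypothetical spectral peaks from a two-note chord (A2 = 110 Hz, E3 = 165 Hz).
peaks = [110.0, 165.0, 220.5, 329.0, 331.0, 440.0, 495.5, 551.0]

a2 = group_harmonics(peaks, 110.0)
e3 = group_harmonics(peaks, 165.0)

print(a2)
print(e3)
```

Both notes claim the peak at 329 Hz (third harmonic of the A, second harmonic of the E), which is the kind of ambiguity a real separator has to resolve somehow.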

As to how it works? Simple really. Voodoo.

# posted by

tomh : 1:08 pm

For those who think it only works with guitars, or only with certain types of signal ... watch the sonicstate Messe video. At 12:00 Peter splits a Chet Baker recording from 1950 into separate components and plays around with the trumpet solo.

http://www.sonicstate.com/news/shownews.cfm?newsid=6281

# posted by

Simulated Person : 1:11 pm

"You'll be able to turn any guitar into a guitar synth with no special hardware."

This much is actually already possible. Details here: http://www.decrementia.com/MeansBlog.htm#p030608

# posted by

Unknown : 1:49 pm

This man is a genius. I can't even begin to fathom the software DSP involved with this program. So much FFT....

Anyway, there comes a point when an audio track can only contain so much data. If a piano and a guitar play the same note at the same time, it's guaranteed that the two will be playing at exactly the same frequency at any given time. This means that if you split the sounds up, you'll have to make up some data to compensate for the overlap of the two instruments.

# posted by

Unknown : 2:33 pm

Well, the Zenph thing is a little different from this. As a transcription tool, you have to do more, just to figure out things like pedaling, articulation and dynamics, to which our ears are more sensitive than we might imagine, even on a piano, let alone something more complex. Just separating the pitches is hard, but it at least removes some dimensions.

# posted by

Unknown : 3:47 pm

Tom wrote that piece recently bemoaning the way that the music industry overprocesses music... I'm interested in the idea of abusing the Melodyne system, by feeding it crazily distorted sounds and seeing what it comes up with, but isn't it likely to result in music based even less on performance and more on producers engineering product out of what might have been quite ropey performances? It's clever technology, but I don't know if it's going to help the spirit of music.

# posted by

Museum of Techno : 12:39 am

Museum of techno: yeah, but at least it will be a fresh, new kind of abuse. Out of the fire and into the frying pan. A relief until it becomes the staple of 95% of the hit songs on the radio.

On a more serious note, I think "the spirit of music" is separate from technology altogether. You can have spirit and technology, or just spirit, or just technology, or neither. They're on two different axes, and we have to resist the temptation to draw correlations where there aren't necessarily any.

Now, marketing, fashion, and focus groups, that's another story.

# posted by

KeithHandy : 12:49 am

This blog is very interesting, I'm looking forward to reading more!

# posted by

Mariposa : 6:28 am

Is that Santa Claus?

# posted by

Unknown : 7:20 am

I have no idea how he does it, but ...

1) this research problem has been around for decades.

2) If you want to start reading about related DSP, try looking up "phase vocoder" and "multi-frequency cepstral coefficients" (or the acronym MFCC). These are variants on the FFT for audio pitch/overtone analysis.

3) Then, look up "polyphonic transcription" or "polyphonic transcription algorithm".

Have fun!

# posted by

Unknown : 3:55 am

Doesn't this creep you guys out just a little bit? Like, next, he's gonna build a program to gain direct access to your thoughts/feelings, and pull them apart with similar ease. Yes I do worry.

# posted by

Anonymous : 8:28 am

Maybe I'm confused, but what's the big deal with this? Haven't Finale and Sibelius had polyphonic transcription of sound for a while now? I don't see how adding an auto-tune program is that big of a step forward....

# posted by

Wes : 3:57 pm

Wes: transcription discards timbral information, and can't recreate the actual sounds of isolated notes from within a chord.

# posted by

KeithHandy : 4:08 pm

Doesn't transcription software have to isolate the actual sound in order to determine its specific pitch? Perhaps only to an extent, leaving the sound incomplete or chopped, but it must be separating the notes, because it reads chords from wave files. Alternatively, wouldn't Celemony DNA be flawed when separating chords that span a full octave or more? Surely pulling a high C and a middle C from the same chord is going to split some harmonics up between the two that they would normally share, making at least one of the notes incomplete?

# posted by

Wes : 4:35 pm

"Surely pulling a high C and a middle C from the same chord is going to split some harmonics up between the two that they would normally share, making at least one of the notes incomplete?"

Yes. Maybe as DNA attempts to make the "best guess" in dealing with shared harmonics, it's not a dealbreaker for human perception.

But I would think transcription software doesn't even make the attempt, probably discarding all but the loudest few harmonics before beginning to analyze. (Not being the person who coded said software, I'm not saying this authoritatively, but it seems logical.)

DNA has to be more complicated than "transcription plus autotune" as you initially suggested.

# posted by

KeithHandy : 4:46 pm
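Wes's point about a high C and a middle C sharing harmonics can be checked with simple arithmetic. This is a sketch under stated assumptions: "high C" is taken to be the C one octave above middle C, both notes are idealised as perfectly harmonic, and the frequencies are the usual equal-temperament values with A4 = 440 Hz. The function names are invented for the example.

```python
def harmonics(f0_hz, count):
    """First `count` harmonics of an ideal (perfectly harmonic) tone."""
    return [f0_hz * k for k in range(1, count + 1)]


def shared_partials(f_low, f_high, count=8, tolerance_hz=1.0):
    """Return (k_low, k_high) pairs of harmonic numbers whose
    frequencies coincide to within tolerance_hz."""
    pairs = []
    for k_low, a in enumerate(harmonics(f_low, count), start=1):
        for k_high, b in enumerate(harmonics(f_high, count), start=1):
            if abs(a - b) <= tolerance_hz:
                pairs.append((k_low, k_high))
    return pairs


MIDDLE_C = 261.63  # C4, equal temperament
HIGH_C = 523.25    # C5, assumed here to be the "high C"

octave = shared_partials(MIDDLE_C, HIGH_C)
print(octave)
```

Within the first eight harmonics of each note, every overlap pairs an even harmonic of middle C with a harmonic of the upper C, so in this idealised case the upper note has no partial it can call its own. Any separation has to decide how to share that energy between the two notes.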

I have always wondered if there was technology that could do this, the same way I always wondered if one day amputated limbs could really be reattached like Luke Skywalker's hand... no joke! This is a fascinating story!

# posted by

Kaybee322 : 12:21 am

Just to explain and expand on what I wrote before:

1) The "phase vocoder" is an old way of extracting precise time-varying pitch and overtone frequencies from FFTs of one instrument playing one note at a time.

2) "MFCC"s are a sort of Fourier transform with a logarithmic frequency axis (in other words, octaves instead of Hertz). This is great for recognizing overtones and chords.

Here's more:

3) By analyzing MFCCs or something called a constant-Q transform (CQT), you can get a "chord spectrum" which is essentially the chords in the music. If you search for "chord spectrum" online, you will see lots of interesting papers.

4) Yes, the problem of instruments playing exactly the same pitch is there, but real instruments will be off, or have vibrato or whatever.

5) kkonkkrete mentioned ICA. This is Independent Component Analysis, which is a "blind" way to adaptively separate a signal into different "sources" with no prior information.

That was a lot of words, but it means that there are real known algorithms to do what Celemony claims to have done.

# posted by

Unknown : 6:43 am

Drat. I was slightly incorrect in my terms. MFCC means "mel frequency cepstral coefficients". Look them up on Wikipedia.

# posted by

Unknown : 6:58 am
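For anyone following up the MFCC pointer above: the "mel" part refers to a perceptual frequency scale that is close to linear at low frequencies and roughly logarithmic higher up. Below is a minimal sketch of one commonly used conversion formula. The full MFCC computation (filterbank, logarithm, cosine transform) is left out, the helper names are made up, and nothing here is known to be what Melodyne uses.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (a common 2595 * log10 variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_spaced_frequencies(f_min, f_max, count):
    """Return `count` frequencies in Hz, evenly spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (count - 1)
    return [mel_to_hz(lo + i * step) for i in range(count)]


centers = mel_spaced_frequencies(100.0, 8000.0, 10)
gaps = [b - a for a, b in zip(centers, centers[1:])]

for c in centers:
    print(round(c, 1))
```

The printed centre frequencies get further apart in Hz as they rise, which is why analysis on this scale gives relatively more resolution to low pitches than a plain FFT bin layout does.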

Well, it doesn't sound perfect, but it sounds good. It still has some grainy quality in the audio. But of course, who cares. It opens new possibilities to sound mangling. I'd love to apply it to complex chords and find out the random errors it would churn out. I think it will allow us to experiment a lot. It will be fun, if it is as easy as the video shows.

Interesting news, considering how boring the audio world has become in recent years.

However, I'd like to see it working. I still remember all the hopes we put in Vocaloid. Sigh.

# posted by

Masked Avenger : 5:45 am

My hunch is that it's a formant-based system. Extracting an individual pure-tone note is simple (narrow-band FFT), but what seems to have been managed here is working out which harmonics should be extracted and shifted along with it. It could be using the attack, as suggested, to guide this in some way.

# posted by

Dan : 9:00 am
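Dan's "simple" part, picking one pure tone out of a mix, can be sketched with a single-frequency Fourier sum. This is only an illustration of the narrow-band idea: the test signal, sample rate and function name are invented, and the hard part he describes (deciding which harmonics belong to which note) is not attempted.

```python
import cmath
import math


def tone_amplitude(samples, sample_rate, freq_hz):
    """Estimate the amplitude of a sinusoid at freq_hz by correlating
    the signal with a complex exponential (one DFT bin, in effect)."""
    n = len(samples)
    acc = 0j
    for i, x in enumerate(samples):
        acc += x * cmath.exp(-2j * math.pi * freq_hz * i / sample_rate)
    # Factor of 2 because a real sinusoid splits its energy between
    # the positive and negative frequency components.
    return 2.0 * abs(acc) / n


SAMPLE_RATE = 8000

# One second of a synthetic two-tone mix: 440 Hz at amplitude 1.0
# plus 660 Hz at amplitude 0.5.
signal = [
    1.0 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 660 * i / SAMPLE_RATE)
    for i in range(SAMPLE_RATE)
]

amp_440 = tone_amplitude(signal, SAMPLE_RATE, 440)
amp_660 = tone_amplitude(signal, SAMPLE_RATE, 660)
amp_550 = tone_amplitude(signal, SAMPLE_RATE, 550)

print(round(amp_440, 3), round(amp_660, 3), round(amp_550, 3))
```

The clean result depends on the tones being steady and fitting a whole number of cycles into the analysis window. Real notes decay, bend and overlap, which is where the difficulty starts.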

Watch the demo at Musikmesse. Wow.

http://digitalmusicmag.blogspot.com/2008/03/wow-melodyne-direct-note-access.html

# posted by

Unknown : 6:24 pm

Can't wait till v/vm get hold of this technology!

http://brainwashed.com/vvm/

# posted by

cemenTIMental : 5:38 pm

There's a video on YouTube of Sound On Sound editor Paul White talking to Peter: http://www.youtube.com/watch?v=xNx7MrBPm-Y

Around 4 minutes in they talk about how the detection works (in broad terms), and the suggestion is that it doesn't only look for the attack part of the notes but may use the body of the note and then (I guess) search backwards for the start - he gives a soft-attack violin as an example.

You have to wonder what is next for Melodyne and what they are working on in the backroom now :-)

# posted by

Anonymous : 3:36 pm

To me, it doesn't seem hard to understand how to distinguish between a unison and an octave (or other consonant harmonic interval): after establishing the base tone, you simply have to compare the intensity of the harmonics; in the case of fifths, for example, this would be followed by searching for dissonant harmonics.

In previous versions of Melodyne, there is a bar with which you can adjust the sensitivity. This seems to be proof that the most important thing is to analyse the intensity, and not only the nature, of the harmonics.

IMO the real issue is to understand how it would work for dissonant intervals....

# posted by

Anonymous : 8:26 pm

Peter Neubäcker and Celemony are the real deal. If it was any other developer making this claim, I would call BS. But you can rest assured that Melodyne DNA will exceed expectations. Melodyne has been doing the impossible since 2001.

# posted by

Anonymous : 7:04 am

....as I watch all of the vids on the process, it seems absolutely mindblowing....

My question is: when you are splitting two notes that are exactly the same, say piano and guitar, is the program "smart" enough to identify the two separate instruments if the start time for each instrument is slightly off, as you would see in a live performance?

# posted by

Anonymous : 4:02 am