RE for Vocal Timbre Dynamics manipulation/transformation?

This forum is for discussing Rack Extensions. Devs are all welcome to show off their goods.
jappe
Moderator
Posts: 2437
Joined: 19 Jan 2015

28 Jan 2015

Hi!

I wonder if there is any dev planning to make an RE with the aim to tweak vocals?

I'm thinking of something more advanced than a standard vocoder: something that can analyze the timbre dynamics and provide a GUI that can tweak relevant parameters.

Most of the time I sound like a pillock when I try to sing. Melodyne can correct the pitch and formant, but there's more than this to a beautiful singing voice, and I imagine that's the timbre dynamics.


Also, if the above can be done, it should also be possible to have voice transformation presets like "Frank Zappa", "Demis Roussos" and "Mark Mothersbaugh".
A person can mimic another person's voice, sometimes to perfection, so it should be possible for a program to at least help out with that.

Juan Rosa
Posts: 96
Joined: 26 Jan 2015

30 Jan 2015

Agree!

bxbrkrz
Posts: 3811
Joined: 17 Jan 2015

30 Jan 2015

jappe wrote:Hi!

I wonder if there is any dev planning to make an RE with the aim to tweak vocals?

I'm thinking of something more advanced than a standard vocoder: something that can analyze the timbre dynamics and provide a GUI that can tweak relevant parameters.

Most of the time I sound like a pillock when I try to sing. Melodyne can correct the pitch and formant, but there's more than this to a beautiful singing voice, and I imagine that's the timbre dynamics.


Also, if the above can be done, it should also be possible to have voice transformation presets like "Frank Zappa", "Demis Roussos" and "Mark Mothersbaugh".
A person can mimic another person's voice, sometimes to perfection, so it should be possible for a program to at least help out with that.
Like a convolution voice RE. I am for it!
757365206C6F67696320746F207365656B20616E73776572732075736520726561736F6E20746F2066696E6420776973646F6D20676574206F7574206F6620796F757220636F6D666F7274207A6F6E65206F7220796F757220696E737069726174696F6E2077696C6C206372797374616C6C697A6520666F7265766572

selig
RE Developer
Posts: 11685
Joined: 15 Jan 2015
Location: The NorthWoods, CT, USA

30 Jan 2015

I'm not aware of any devices that can alter the sound of your voice, à la turning Tiny Tim into Frank Sinatra.

There is already a device that can "analyze the timbre dynamics and provide a GUI that can tweak relevant parameters", and it's called an EQ. ;)

I would fully expect synthesized speech to accomplish this goal much sooner than voice modification. 
:)
Selig Audio, LLC

jappe
Moderator
Posts: 2437
Joined: 19 Jan 2015

30 Jan 2015

selig wrote:I'm not aware of any devices that can alter the sound of your voice, à la turning Tiny Tim into Frank Sinatra.

There is already a device that can "analyze the timbre dynamics and provide a GUI that can tweak relevant parameters", and it's called an EQ. ;)

I would fully expect synthesized speech to accomplish this goal much sooner than voice modification. 
:)
I agree that it could be simpler to do with synthesized speech, where for example a dialect can be added, rather than having to transform a possibly bad imitation into Elvis's true voice.
(Just like it's sometimes easier to write an entirely new program than to transform one complex program into another.)
But if the synthesized speech can be done, then we'd only need real-time phoneme/speech detection of a voice to feed that singing synthesizer with text, and we could have that dream device.


I was vague when I mentioned timbre dynamics: I'm not thinking about static timbre, but about capturing spectral patterns of how the timbre changes over time, or how it depends on other parameters like pitch, volume or tone duration. A device that works in the frequency domain, like Parsec...hmm..."Voicec"

To gather all possible intelligence from a singing voice, and make a smart interface to tweak interesting parameters without too much effort. 

So when I want to tweak the timbre dynamics, I want the RE to analyze clusters of frequencies that are related to each other (for example: if frequency A increases Y times, then frequency B decreases 2 x Y times).
And after the analysis, I want to have knobs for changing the relevant parameters of the identified change patterns, for example increasing/decreasing Y in the example above.
That and tons of other possible modifications.
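
As a rough illustration of the analysis described above (not an RE SDK example, and not how any existing device works), here is a minimal Python sketch using only the standard library. It measures two frequency bins frame by frame and fits a slope between their log magnitudes, which is one possible way to quantify "when A goes up, B goes down twice as fast". The names `bin_magnitude` and `coupling_slope`, the frame size and the synthetic test signal are all assumptions made up for the example.

```python
import cmath
import math

def bin_magnitude(frame, k):
    """Magnitude of DFT bin k for one frame (naive single-bin DFT)."""
    n = len(frame)
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
    return abs(acc) / n

def coupling_slope(xs, ys):
    """Least-squares slope of ys plotted against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

# Hypothetical test signal: partial A (bin 4) grows from frame to frame,
# while partial B (bin 9) shrinks in proportion to 1 / A^2.
FRAME = 64
frames = []
for step in range(1, 9):
    amp_a = 0.1 * step
    amp_b = 0.01 / amp_a ** 2
    frames.append([
        amp_a * math.sin(2 * math.pi * 4 * i / FRAME)
        + amp_b * math.sin(2 * math.pi * 9 * i / FRAME)
        for i in range(FRAME)
    ])

# Track both bins over time and relate them in the log domain.
log_a = [math.log(bin_magnitude(f, 4)) for f in frames]
log_b = [math.log(bin_magnitude(f, 9)) for f in frames]
slope = coupling_slope(log_a, log_b)
print(round(slope, 3))
```

A slope near -2 here means B falls twice as fast (in dB) as A rises. A "knob" in the sense of the post would then change that slope before resynthesis. A real voice would need windowed, overlapping frames and many bins rather than two clean partials.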


Hmm...unsure if that made anything more clear

selig
RE Developer
Posts: 11685
Joined: 15 Jan 2015
Location: The NorthWoods, CT, USA

30 Jan 2015

jappe wrote:
I agree that it could be simpler to do with synthesized speech, where for example a dialect can be added, rather than having to transform a possibly bad imitation into Elvis's true voice.
(Just like it's sometimes easier to write an entirely new program than to transform one complex program into another.)
But if the synthesized speech can be done, then we'd only need real-time phoneme/speech detection of a voice to feed that singing synthesizer with text, and we could have that dream device.
We already have speech to text, and we already have text to speech. We're just waiting for the quality to improve, right? 

But here's something else to consider. It's the unique phrasing and intensity changes that can't be easily applied by a real time device. For someone wanting to use this effect, they would STILL have to put in a lot of work learning to phrase like the singer they are emulating. Otherwise it wouldn't be worth the trouble, because if you don't have good phrasing there's little a voice emulator can do for you. That is to say, there are many more qualities that make a great vocal track beyond tone and pitch, but folks seem to assume that if you can just tune me and correct my tone, I'd be a fantastic singer. That would only be true if those two qualities were all you lacked and you nailed the rest.
jappe wrote:I was vague when I mentioned timbre dynamics: I'm not thinking about static timbre, but about capturing spectral patterns of how the timbre changes over time, or how it depends on other parameters like pitch, volume or tone duration. A device that works in the frequency domain, like Parsec...hmm..."Voicec"

To gather all possible intelligence from a singing voice, and make a smart interface to tweak interesting parameters without too much effort. 

So when I want to tweak the timbre dynamics, I want the RE to analyze clusters of frequencies that are related to each other (for example: if frequency A increases Y times, then frequency B decreases 2 x Y times).
And after the analysis, I want to have knobs for changing the relevant parameters of the identified change patterns, for example increasing/decreasing Y in the example above.
That and tons of other possible modifications.


Hmm...unsure if that made anything more clear
Yes, but it seems you are asking for something that would require almost as much training to pull off as learning to sing better in the first place IMO! At that level of complexity, you would also have to make it timeline based, which means it can't be an RE. You would have to introduce a new set of controls, concepts, and parameters that could be quite complex to someone who has never manipulated speech on this level before. With these comments I'm totally ignoring the limits on current technology that would likely prohibit such a device today, certainly as a real time effect.

And finally, I'm not sure I'd even care to listen to someone who doesn't have any vocal personality of their own - I'd probably rather listen to synthesized vocals if those were my only choices!

For me, the biggest thing I coach singers on in the studio is "believability". I don't want to hear someone READ the lyrics even if they're in tune and have great tone: I want to FEEL the lyrics. We can fix the rest! ;)

There's very likely no device in our lifetime that will impart a believable feel on a lifeless vocal. 

Some day, but not today (for which I'm thankful!).
Selig Audio, LLC

jappe
Moderator
Posts: 2437
Joined: 19 Jan 2015

30 Jan 2015

jappe wrote:[...dreaming about a voice-to-voice transformer...]
selig wrote:
We already have speech to text, and we already have text to speech. We're just waiting for the quality to improve, right? 

But here's something else to consider. It's the unique phrasing and intensity changes that can't be easily applied by a real time device. For someone wanting to use this effect, they would STILL have to put in a lot of work learning to phrase like the singer they are emulating. Otherwise it wouldn't be worth the trouble, because if you don't have good phrasing there's little a voice emulator can do for you. That is to say, there are many more qualities that make a great vocal track beyond tone and pitch, but folks seem to assume that if you can just tune me and correct my tone, I'd be a fantastic singer. That would only be true if those two qualities were all you lacked and you nailed the rest.
jappe wrote:[...explaining a vision about a device with the purpose to modify spectral characteristics of a voice, without any ambition to transform it to Elvis...]
selig wrote:
Yes, but it seems you are asking for something that would require almost as much training to pull off as learning to sing better in the first place IMO! At that level of complexity, you would also have to make it timeline based, which means it can't be an RE. You would have to introduce a new set of controls, concepts, and parameters that could be quite complex to someone who has never manipulated speech on this level before. With these comments I'm totally ignoring the limits on current technology that would likely prohibit such a device today, certainly as a real time effect.

And finally, I'm not sure I'd even care to listen to someone who doesn't have any vocal personality of their own - I'd probably rather listen to synthesized vocals if those were my only choices!

For me, the biggest thing I coach singers on in the studio is "believability". I don't want to hear someone READ the lyrics even if they're in tune and have great tone: I want to FEEL the lyrics. We can fix the rest! ;)

There's very likely no device in our lifetime that will impart a believable feel on a lifeless vocal. 

Some day, but not today (for which I'm thankful!).
Yes, that voice-to-voice transformer is not likely to be feasible with a good result. And like you say, there are many characteristics that you'd need to imitate anyway (unless that transformer is going to do its own bolting interpretation of what I sing, and that could be really weird and with awful lipsync lol). And then there are the technical difficulties.

But capturing spectral patterns in a voice should be feasible IMO, and once they are captured, they can be tweaked.
EDIT: though I don't yet know about the RE SDK limitations when it comes to FFT/realtime manipulation in the frequency domain. (...and it shall ofc not be discussed here due to NDA).
I have registered as an RE dev, but have had no time yet to dig into the APIs and manuals.
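
For readers wondering what "manipulation in the frequency domain" means in practice, here is a minimal sketch in plain Python. It says nothing about the RE SDK. It transforms one frame with a naive DFT, scales a single frequency bin, and transforms back. The function names (`dft`, `idft`, `scale_bin`), the frame length and the test tones are assumptions for the example, and a real-time device would presumably use an FFT on overlapping windowed frames instead.

```python
import cmath
import math

def dft(frame):
    """Naive discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        for k in range(n)
    ]

def idft(spectrum):
    """Inverse of dft(); returns the real part of each sample."""
    n = len(spectrum)
    return [
        (sum(c * cmath.exp(2j * math.pi * k * i / n) for k, c in enumerate(spectrum)) / n).real
        for i in range(n)
    ]

def scale_bin(frame, k, gain):
    """Return the frame with DFT bin k (and its mirror bin) multiplied by gain."""
    n = len(frame)
    spectrum = dft(frame)
    spectrum[k] *= gain
    mirror = (n - k) % n
    if mirror != k % n:
        # Real signals have a conjugate-symmetric spectrum, so scale both halves.
        spectrum[mirror] *= gain
    return idft(spectrum)

# Hypothetical frame: a tone at bin 3 plus a quieter tone at bin 7.
N = 32
frame = [
    math.sin(2 * math.pi * 3 * i / N) + 0.5 * math.sin(2 * math.pi * 7 * i / N)
    for i in range(N)
]

# Double the level of the bin-7 partial and compare with the signal we expect.
tweaked = scale_bin(frame, 7, 2.0)
expected = [
    math.sin(2 * math.pi * 3 * i / N) + 1.0 * math.sin(2 * math.pi * 7 * i / N)
    for i in range(N)
]
max_error = max(abs(a - b) for a, b in zip(tweaked, expected))
print(max_error < 1e-9)
```

The point of the sketch is only that a "tweak" becomes a gain applied per bin and per frame; deciding which gains to apply over time is the hard part discussed in this thread.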


