RVC's Realtime AI Voice Changer - Is It Any Good?

Today, you will learn how to use RVC's free AI Voice Changer - FREE & Realtime! Transform your voice into your favorite YouTuber, VTuber, Anime Character, and more! We'll also talk about whether this is better than W-Okada's voice changer or whether you should just stick with that one.

Go to this link to install the Voice Changer:

How to get it to work with Discord and other apps:

How to find your own models:

How to train your own model:

W-Okada's Voice Changer:

Search the most complete list of AI Tools, also available in 中文, español, 日本語:

DISCLAIMER:
Please do not use these models for malicious, harmful, or deceitful things. Please use them to have fun and experience this new technological age.

~~~~~~~~~~~~Timecodes~~~~~~~~~~~~
Intro - 0:00
Installation Tutorial - 0:23
Using the Software - 3:42
Is it better than W-Okada? - 9:44
Wrapping up - 10:59
~~~~~~~~~~~~Timecodes~~~~~~~~~~~~

Here's our equipment, in case you're wondering:

Secondary GPU: GTX 1080 (too old, would not recommend)

If you found this helpful, consider supporting me here. Hopefully I can turn this from a side-hustle into a full-time thing!
Comments

I've been experimenting with this for a bit, and I'm disappointed by how vague and incomplete the English documentation on these settings is. In an effort to remedy this, here's my breakdown of each setting:

Response threshold: Controls the noise gate. Any sound below the threshold is suppressed. This is used to prevent background noise and hiss from being turned into strange mumbling. Equivalent to "S. Threshold" in w-okada. Not applicable in RVC WebUI.
Pitch settings: Applies a pitch offset to your input voice. Each step of 1 raises or lowers the voice by a semitone, so every multiple of 12 shifts it by a full octave. Shifting by whole octaves is mainly useful for making sure you can still sing in the same key. Equivalent to "TUNE" in w-okada. Equivalent to "Transpose" in RVC WebUI.
Index rate: When an index file is provided, this slider augments the target voice by preserving more of its accent and less of the input voice (to reduce tone leakage). This is particularly useful for voices trained with a low epoch count (roughly 200 or fewer). If set too high, it can cause strange pronunciation artifacts. I usually find something around 0.30 to sound good, but it varies by voice model. Equivalent to "INDEX" in w-okada. Equivalent to "Search feature ratio" in RVC WebUI.
Loudness factor: Controls how much of the input performance's loudness is discarded. At 0, the loudness of the cloned voice should match the loudness of the input voice. At 1, the cloned voice will always be at full loudness. 0 is useful if you want to distinguish between whispers, talking, screaming, etc. 1 is useful to have the cloned voice always speak loudly and clearly, as loud as the loudest things it was trained on (which can have artifacts such as mic clipping depending on the training set). Values in between give partial volume control that is biased toward louder output the closer you get to 1. There is no equivalent in w-okada. Equivalent to "volume envelope scaling" in RVC WebUI.
Pitch detection algorithm: Different algorithms are better at different things. rmvpe is the current state-of-the-art and is usually both the fastest and the highest quality. Equivalent to "F0 Det." in w-okada. Equivalent to "pitch extraction algorithm" in RVC WebUI.
Sample length: The realtime voice changer works by sending small chunks of audio for quick conversion, then stitching them together. Longer sample lengths feed in longer chunks, making the stitches less obvious and reducing GPU requirements but increasing output latency. On a low-end GPU, setting this too low will leave the GPU unable to keep up and produce stutters. On a high-end GPU, setting this too low will cause warbling as an artifact of stitching many overly short chunks together. Equivalent to "CHUNK" in w-okada. Not applicable in RVC WebUI.
Number of CPUs: Self-explanatory. Note, however, that rmvpe is a GPU-based pitch extractor and should be relatively unaffected by this setting. There is no equivalent in w-okada. Not applicable in RVC WebUI.
Fade length: The length of the overlap between adjacent chunks that gets crossfaded together. Longer values may reduce warbling. Equivalent to "overlap" in w-okada advanced settings. Not applicable in RVC WebUI.
Extra inference time: How much old audio to load into each chunk. The extra context usually improves voice quality for the generated chunk but is more demanding for the GPU. Equivalent to "EXTRA" in w-okada. Not applicable in RVC WebUI.
Input noise reduction: Attempts to remove non-speech background noise from the input to prevent sounds from being turned into strange mumbling. Equivalent to "NOISE" in w-okada. Not applicable in RVC WebUI.
Output noise reduction: Applies the same noise reduction to the output voice. Possibly good for poorly trained voices with lots of background noise. There is no equivalent in w-okada, but the usefulness of this setting is dubious. Not applicable in RVC WebUI.
Input voice monitor: Sends the audio being passed into the voice changer to the target output device so you can hear it. Useful to ensure you are passing in the audio you actually want, or to pass your audio through without voice changing. Comparable to "monitor" settings in w-okada. Not applicable in RVC WebUI.
Output converted voice: Outputs the voice conversion to the target output device.
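
To put two of the settings above into numbers, here is a rough Python sketch. It is not taken from either tool's source code: `semitone_ratio` and `crossfade_stitch` are hypothetical names, the pitch formula is the standard equal-temperament relation, and the linear crossfade is only an assumption about how overlapping chunks might be blended (the real tools may use a different fade curve).

```python
def semitone_ratio(semitones: float) -> float:
    """Frequency multiplier for a pitch offset given in semitones.

    12 semitones make one octave, and one octave doubles the frequency.
    """
    return 2.0 ** (semitones / 12.0)


def crossfade_stitch(prev_chunk, next_chunk, fade_len):
    """Join two chunks whose last/first `fade_len` samples overlap.

    The overlap is blended with a linear crossfade (an assumption made
    for illustration): the old chunk fades out while the new one fades in.
    """
    if fade_len < 0:
        raise ValueError("fade_len must be non-negative")
    if fade_len > min(len(prev_chunk), len(next_chunk)):
        raise ValueError("fade_len cannot exceed either chunk length")
    if fade_len == 0:
        return list(prev_chunk) + list(next_chunk)

    head = list(prev_chunk[:-fade_len])   # untouched part of the old chunk
    tail = list(next_chunk[fade_len:])    # untouched part of the new chunk
    overlap_start = len(prev_chunk) - fade_len

    blended = []
    for i in range(fade_len):
        w = (i + 1) / (fade_len + 1)      # weight of the new chunk, rising toward 1
        blended.append((1.0 - w) * prev_chunk[overlap_start + i] + w * next_chunk[i])
    return head + blended + tail
```

So a pitch setting of 12 doubles the frequency and -12 halves it, while a longer fade length spreads each seam over more samples.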

Main features RVC realtime has that w-okada doesn't:
Loudness factor controls. W-okada seems to always use a value of 0.
Significantly lower CPU usage at equivalent performance settings, in my experience.
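
One way to picture the loudness factor is as a gain applied to each converted chunk. The Python below is only a sketch under my own assumptions: it measures loudness as a single RMS value per chunk, and `apply_loudness_factor` is a hypothetical name, not a function from RVC or w-okada. The real tool may work on a finer-grained volume envelope.

```python
import math


def rms(samples):
    """Root-mean-square level of a chunk of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def apply_loudness_factor(input_chunk, converted_chunk, factor, eps=1e-6):
    """Scale the converted chunk according to the loudness factor.

    factor = 0: the output is rescaled to the input's loudness.
    factor = 1: the converted voice keeps its own loudness unchanged.
    Values in between interpolate on a logarithmic (ratio) scale.
    """
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must be between 0 and 1")
    rms_in = max(rms(input_chunk), eps)       # eps avoids division by zero
    rms_out = max(rms(converted_chunk), eps)
    gain = (rms_in / rms_out) ** (1.0 - factor)
    return [s * gain for s in converted_chunk]
```

With a quiet input and a loud converted chunk, a factor of 0 pulls the output down to the input's level, and a factor of 1 leaves the converted chunk untouched.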

Main things RVC realtime is missing compared to w-okada:
No system to save model presets.
Input/output gain is missing.
Input noise reduction is less robust than in w-okada, which offers echo reduction and multiple noise suppression techniques.
Unlike w-okada, you cannot pass the output through to the input mic. You need a virtual audio cable to get the cloned voice into voice calls and microphone recording programs.
In w-okada, when the mic loudness falls below the response threshold, the tool is paused until speech is once again loud enough, saving GPU and CPU resources. RVC realtime keeps passing audio through whenever it is running.
Unlike w-okada, you cannot monitor the cloned voice while outputting it. You can work around this by using the "listen" feature in the Windows sounds panel on a virtual audio cable instead.
No built-in recording functionality.
Missing most of the settings in the w-okada "advanced settings" menu.
No way to choose which GPU to run the voice model on. You can get around this by setting CUDA_VISIBLE_DEVICES=# in a terminal before launching the tool from there, where # is the index of your target GPU (0, 1, 2, etc.).
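
If you would rather script the GPU workaround from the last item than type it into a terminal each time, something like this Python sketch should do it. The helper names `env_for_gpu` and `launch_on_gpu` are my own, and the launch command is left as a placeholder because it depends on how you installed the tool. The only fixed part is the CUDA_VISIBLE_DEVICES variable itself, which has to be set before the program starts using CUDA.

```python
import os
import subprocess


def env_for_gpu(gpu_index, base_env=None):
    """Return a copy of the environment with only one GPU made visible.

    The launched program then sees the chosen GPU as its device 0.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def launch_on_gpu(command, gpu_index):
    """Run `command` (a list of program + arguments) pinned to one GPU."""
    return subprocess.run(command, env=env_for_gpu(gpu_index), check=False)


# Usage: replace the placeholder with whatever you normally run to start
# the voice changer; the list below is NOT a real command.
# launch_on_gpu(["<your usual launch command>"], gpu_index=1)
```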

MarioManTV

Installation tutorial: "So I won't go through most of the installation" wtf dude

paulbess

You can use it in a real-time environment like Zoom or Teams. You just need to delay the video by the same amount as the AI voice is delayed. Use OBS for that.

DominicFlynn

these AI voices are scarily accurate, even the Markiplier one

Hollarite

Your content can still be easily understood by a non-native English speaker. Thank u <3.🥰

rjay

I rlly do appreciate the effort u put in ur content i rlly do

The_Spooky_Boi

Will this work better on AMD video cards compared to W-Okada?

VeteranX

Is anyone else having issues getting this particular application to work in games? It seems like it does not work in the background.

TheDkmariolink

It just won't work (no response / stops working when I run it).
Any ideas what I could possibly have done wrong?
The code won't even run; it just shows the input and output devices and then 'cuda_is_available: True'

Then nothing more.

adad

hey, can anyone help me? when i try to load it, it's just showing a terminal with two lines, and the GUI is not opening

Piyush-bv

How do we uninstall the RVC or Okada software if we want to update? Is it okay to just delete the folder?

rrchids

Can you recommend a good text-to-speech AI whose voice I can change using RVC?

liangzx

I am trying this one because my RTX 2060 is struggling a bit with W-Okada and it cuts out a letter or two from some words I say

saturnGT

i need help, i have all my models installed, and have my microphone and headset plugged in and put the correct ones into the settings. but once i start the audio conversion it crashes. Can anyone help?

prodbyMyles-qx

Kinda waiting and hoping for ElevenLabs to release a voice changer one day.

valermo

I have an RX 580 with 8 GB and it gives this error: ''Error opening Stream: Illegal combination of I/O devices [PaErrorCode -9993]''

FoxyWT

*Bro really finding excellent excuses to use Gura's voice and I'm here for it* 👏🐴

DjHazardous

Why is there not just one installer with all the requirements? It would make life easier. It takes so much time and power to install this.

ALEXVORN

Question, how do you convert mp3 or wav files to the correct format to use other clones?

jimbarrofficial

Does this work on Mac? I noticed something about dependencies for Mac in the readme, but I can't read Japanese so not sure if they talk about building the app for macOS

PaulRadaker