AI-powered voice denoiser, optimized for Apple Silicon

Post by James Perrett » Sat Jan 21, 2023 12:46 pm

iansampson wrote: ↑Sat Jan 21, 2023 3:31 am That also touches on the question of whether this voice denoiser (and any spinoff tools) would be more useful as a plugin or a standalone app. Right now it’s an app mainly for practical reasons: getting something with this much complexity to run in real-time is quite a challenge, and for the moment it’s more consistent and more stable to process all the audio offline. Though an offline editor can have advantages too, like the ability to select parts of a spectrogram and just de-noise those rather than the whole signal.

On the other hand, I did get a working prototype of an AU plugin running in Logic (on an M1 Mac Mini)—but with 300 ms of latency, and only one instance at a time. Thanks to the Neural Engine handling all the processing, the CPU usage stays pretty low, but it’s still a bit cumbersome—you’d have to make liberal use of the bounce/freeze functions if you needed multiple instances. I wonder if that’d still be useful?

I've tried a couple of Spleeter-based VST plug-ins, but they really haven't been very practical. For this sort of work I don't think you would lose much by keeping it as a simple standalone app. I keep all my processed files as FLAC to reduce the space needed, although, as you are Apple-based, I guess ALAC would be better for you.

Post by iansampson » Tue Jan 24, 2023 11:00 pm

Wonks wrote: ↑Sat Jan 21, 2023 12:09 pm I was thinking more along the lines of a built-in processing-tool that comes with a DAW, so it wouldn't need to be real time and latency wouldn't be an issue.

Ah yes, a built-in offline processor for Logic would definitely come in handy. I tend to keep all my edited/assembled takes as separate regions so I can always go back and change things later — it’d be very convenient to process those all together without having to export anything. As you say, though, there’s no way to add extensions (besides real-time plugins) without going through Apple, and I have no idea how you’d even approach them about that. So for now, yes, a standalone app may well be the best way to go.

Post by iansampson » Tue Jan 24, 2023 11:30 pm

James Perrett wrote: ↑Sat Jan 21, 2023 12:46 pm I keep all my processed files as FLAC to reduce the space needed, although, as you are Apple-based, I guess ALAC would be better for you.

Ah, that’s good to know. The app can import FLAC and ALAC already, but only exports .wav and .aif — I’ll work on adding support for losslessly compressed formats to a future release.

James Perrett wrote: ↑Sat Jan 21, 2023 12:46 pm I’ve tried a couple of Spleeter-based VST plug-ins, but they really haven't been very practical.

Out of curiosity, what was it about the Spleeter-based plugins that made them impractical? Was the audio quality not good enough, or is a plugin just not the right tool for that sort of thing?

James Perrett wrote: ↑Sat Jan 21, 2023 12:46 pm For this sort of work I don't think you would lose much by keeping it as a simple standalone app.

So far I’ve kept things as simple as can be: a minimal drag-and-drop interface, where you drop audio files onto the app and it saves denoised versions to a specified folder. More like a batch processor, I suppose, since you can do several at a time. Ultimately, I’d like to move towards a document-based approach, where you open an audio file and see the waveform and/or spectrogram, with the ability to process the whole thing or just selected regions. Turns out spectrograms are pretty tricky to implement efficiently, though, so a full-blown editor like that may have to wait a little.

Post by James Perrett » Wed Jan 25, 2023 12:21 am

iansampson wrote: ↑Tue Jan 24, 2023 11:30 pm Out of curiosity, what was it about the Spleeter-based plugins that made them impractical? Was the audio quality not good enough, or is a plugin just not the right tool for that sort of thing?

They were very processor-hungry and weren't very stable. While my computer is fairly old, it is still reasonably powerful compared to many, so I would expect it to be able to run most things, though maybe only a single instance rather than the 2 or 3 that a modern computer could run.

iansampson wrote: ↑Tue Jan 24, 2023 11:30 pm Ultimately, I’d like to move towards a document-based approach, where you open an audio file and see the waveform and/or spectrogram, with the ability to process the whole thing or just selected regions. Turns out spectrograms are pretty tricky to implement efficiently, though, so a full-blown editor like that may have to wait a little.

In my experience, my old copy of Adobe Audition does spectrograms far better than iZotope RX does. RX always redraws the spectrum from the left-hand side, which is very annoying when you are scrolling through a file because you have to wait for the whole screen to redraw before you see the new part. Audition seems smarter because it doesn't seem to redraw the whole spectrum all the time, only the parts that have newly come into view.

It may be worth exploring the Cockos WDL library - they certainly have FFT functions - I don't know if they've included the code they use to do spectral displays in Reaper.

https://www.cockos.com/wdl/
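The Audition-style behaviour described in the post above (recompute only the spectrogram columns that have newly scrolled into view, and reuse the rest) can be sketched roughly as follows. This is a minimal illustration in Python, not how Audition, RX, Reaper or the denoiser app actually work: the names `newly_visible`, `SpectrogramView` and the `compute_column` callback are all hypothetical, and a real editor would also need cache eviction and handling for zoom changes.

```python
from typing import Callable, Dict, List, Tuple

Range = Tuple[int, int]  # half-open range of spectrogram columns: [start, end)


def newly_visible(prev: Range, new: Range) -> List[Range]:
    """Return the parts of `new` that were not already covered by `prev`."""
    p0, p1 = prev
    n0, n1 = new
    if n0 >= n1:
        return []                      # nothing is visible
    if p0 >= p1 or n1 <= p0 or n0 >= p1:
        return [(n0, n1)]              # no overlap: the whole view is new
    ranges: List[Range] = []
    if n0 < p0:
        ranges.append((n0, p0))        # strip exposed on the left
    if n1 > p1:
        ranges.append((p1, n1))        # strip exposed on the right
    return ranges


class SpectrogramView:
    """Caches computed columns and only computes the ones newly exposed."""

    def __init__(self, compute_column: Callable[[int], object]) -> None:
        # Hypothetical callback: column index -> column data (e.g. one FFT frame).
        self._compute = compute_column
        self._cache: Dict[int, object] = {}
        self._visible: Range = (0, 0)
        self.computed = 0              # how many columns were actually computed

    def scroll_to(self, start: int, end: int) -> List[object]:
        for a, b in newly_visible(self._visible, (start, end)):
            for col in range(a, b):
                if col not in self._cache:
                    self._cache[col] = self._compute(col)
                    self.computed += 1
        self._visible = (start, end)
        # Columns in the overlap were cached while the previous view was shown.
        return [self._cache[col] for col in range(start, end)]


view = SpectrogramView(lambda col: col * 2)  # stand-in for a real FFT
view.scroll_to(0, 100)    # first draw: all 100 columns are computed
view.scroll_to(20, 120)   # scroll right by 20: only 20 new columns are computed
```

In this sketch the cache grows without bound, which is fine for illustration but would need a size limit for long files.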

Post by iansampson » Wed Mar 22, 2023 5:11 pm

Sorry for the hiatus! Had my nose to the grindstone the last couple months getting the app ready for public release. For anyone that’s interested, Luke Wood just published a great write-up about it over in the news section.

James Perrett wrote: ↑Wed Jan 25, 2023 12:21 am
iansampson wrote: ↑Tue Jan 24, 2023 11:30 pm Ultimately, I’d like to move towards a document-based approach, where you open an audio file and see the waveform and/or spectrogram, with the ability to process the whole thing or just selected regions. Turns out spectrograms are pretty tricky to implement efficiently, though, so a full-blown editor like that may have to wait a little.

In my experience, my old copy of Adobe Audition does spectrograms far better than iZotope RX does. RX always redraws the spectrum from the left-hand side, which is very annoying when you are scrolling through a file because you have to wait for the whole screen to redraw before you see the new part. Audition seems smarter because it doesn't seem to redraw the whole spectrum all the time, only the parts that have newly come into view.

As for spectrograms (to revive a two-month-old topic!), I’m still working on it, but taking the Audition-like approach that James recommends here, i.e. redrawing only the parts that come into view. Easier said than done, though, if you want the spectra to follow a non-uniform frequency scale (closer to human hearing, unlike FFT bins, which are spaced linearly along the frequency axis) and to stay resizable, etc. For now, the app is still a batch processor with a simple drag-and-drop interface, but I do hope to get the spectrogram editor working in the near future.
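To illustrate why a non-uniform frequency axis complicates things, here is a rough Python sketch that resamples one column of linearly spaced FFT magnitudes onto a perceptual scale. It assumes the mel scale (using the common 2595 · log10(1 + f/700) formula) purely as an example; the post does not say which scale the app uses. The function names are hypothetical, and simple linear interpolation is a simplification: a real display would probably need to average or take the maximum over bins where several fall into one row.

```python
import math
from typing import List, Sequence


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def row_to_bin(row: int, n_rows: int, n_bins: int, sample_rate: float) -> float:
    """Fractional FFT bin shown at a display row (row 0 = 0 Hz, last row = Nyquist).

    Assumes n_rows >= 2 and that the n_bins bins span 0 Hz to Nyquist evenly.
    """
    nyquist = sample_rate / 2.0
    mel = hz_to_mel(nyquist) * row / (n_rows - 1)   # rows are evenly spaced in mel
    pos = mel_to_hz(mel) / nyquist * (n_bins - 1)   # bins are evenly spaced in Hz
    return min(max(pos, 0.0), float(n_bins - 1))    # guard against rounding error


def resample_column(magnitudes: Sequence[float], n_rows: int,
                    sample_rate: float) -> List[float]:
    """Map one spectrogram column from linear FFT bins to mel-spaced rows."""
    n_bins = len(magnitudes)                        # needs at least 2 bins
    out: List[float] = []
    for row in range(n_rows):
        pos = row_to_bin(row, n_rows, n_bins, sample_rate)
        lo = min(int(pos), n_bins - 2)
        frac = pos - lo
        out.append((1.0 - frac) * magnitudes[lo] + frac * magnitudes[lo + 1])
    return out


# 513 bins (a 1024-point FFT) squeezed into 64 display rows at 48 kHz.
column = resample_column([float(k) for k in range(513)], 64, 48000.0)
```

Because the row-to-bin mapping depends on the view height, resizing the window changes which bins feed each row, so any cached columns have to be stored before this mapping (or recomputed after a resize).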

Post by resistorman » Wed Mar 22, 2023 7:46 pm

Sure could have used this 6 weeks ago

Congrats...

Post by iansampson » Thu Mar 23, 2023 3:00 am

resistorman wrote: ↑Wed Mar 22, 2023 7:46 pm Sure could have used this 6 weeks ago Congrats...

Haha thanks! If only I’d known, I’d have finished it sooner :p.

Post by Wonks » Thu Mar 23, 2023 10:56 am

Well done, Ian.

Post by iansampson » Thu Mar 23, 2023 4:03 pm

Thanks! Really appreciate all the folks who posted earlier in this thread — gave me a much-needed push to finally get this done.
