Multispeaker Community Vocoder model for DiffSinger

Trained with ~95 hours of varied singing data.

The goal of our vocoder model is to get higher-quality output from DiffSinger acoustic models. The vocoder can be used with any voice.
We would like to send huge thanks to the Western DiffSinger community for providing datasets for training! Without them, this model wouldn’t be possible.

The code used to train the vocoder is available on the HiFiPLN GitHub.
A pretrained checkpoint for finetuning is available at the bottom of this page.

  • Model Developed & Trained by Scarfmonster
  • Data coordination by PixPrucer

How To Use

  1. Download your vocoder of choice.
  2. Drag and drop the downloaded .oudep file onto the OpenUtau window.
  3. Open the dsconfig.yaml configuration file of your chosen voice and set vocoder: to the matching value. For your convenience, the value is listed in the Download table. Save the file and restart OpenUtau if it was open.
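Step 3 can also be scripted. The following is a minimal sketch, assuming `vocoder:` is a top-level key on its own line in dsconfig.yaml; the function names `set_vocoder` and `update_dsconfig` are hypothetical, and the path you pass in is whatever location your voice's dsconfig.yaml actually has. It edits the file as plain text so that the rest of the configuration is left untouched.

```python
import re
from pathlib import Path


def set_vocoder(config_text: str, vocoder_name: str) -> str:
    """Return config_text with its top-level `vocoder:` entry set to vocoder_name.

    Assumption: the entry, if present, starts at the beginning of a line.
    If no such line exists, one is appended at the end.
    """
    new_line = f"vocoder: {vocoder_name}"
    pattern = re.compile(r"^vocoder:.*$", flags=re.MULTILINE)
    if pattern.search(config_text):
        # Replace only the first match; a lambda avoids escape handling.
        return pattern.sub(lambda _match: new_line, config_text, count=1)
    # No existing entry: append one, making sure it starts on its own line.
    separator = "" if config_text.endswith("\n") or not config_text else "\n"
    return f"{config_text}{separator}{new_line}\n"


def update_dsconfig(path: str, vocoder_name: str) -> None:
    """Rewrite the dsconfig.yaml at `path` in place (hypothetical helper)."""
    config_path = Path(path)
    original = config_path.read_text(encoding="utf-8")
    config_path.write_text(set_vocoder(original, vocoder_name), encoding="utf-8")
```

For example, `update_dsconfig("path/to/voice/dsconfig.yaml", "hifipln_1.1")` would point that voice at the hifipln 1.1 vocoder. Restart OpenUtau afterwards, as in the manual steps.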

Download

Name      Download   Version            dsconfig.yaml          Notes
hifipln   89.1 MB    1.1 (2024-02-17)   vocoder: hifipln_1.1   sample_rate: 44100, n_mels: 128, hop_length: 512
ddsppln   23.2 MB    1.0 (2024-01-29)   vocoder: ddsppln_1.0   A DDSP-like vocoder. Not realistic, somewhat robotic, but some people like the sound. sample_rate: 44100, n_mels: 128, hop_length: 512
hifipln   89.1 MB    1.0 (2024-02-03)   vocoder: hifipln_1.0   sample_rate: 44100, n_mels: 128, hop_length: 512
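The Notes column lists the mel-spectrogram parameters of each vocoder. As a rough illustration of what they mean, the sketch below computes the frame timing implied by `sample_rate` and `hop_length`. It also includes a hypothetical helper, `mismatched_params`, for comparing these values with another model's settings. Presumably the acoustic model feeding the vocoder should use the same values, but that is an assumption here, not something this page states.

```python
# Mel parameters as listed in the Notes column of the Download table.
VOCODER_PARAMS = {"sample_rate": 44100, "n_mels": 128, "hop_length": 512}


def frame_duration_ms(sample_rate: int, hop_length: int) -> float:
    """Time covered by one hop (one mel frame step), in milliseconds."""
    return 1000.0 * hop_length / sample_rate


def frames_per_second(sample_rate: int, hop_length: int) -> float:
    """Number of mel frames produced per second of audio."""
    return sample_rate / hop_length


def mismatched_params(other: dict, expected: dict = VOCODER_PARAMS) -> dict:
    """Hypothetical helper: map each differing key to (expected, found)."""
    return {
        key: (value, other.get(key))
        for key, value in expected.items()
        if other.get(key) != value
    }


if __name__ == "__main__":
    sr, hop = VOCODER_PARAMS["sample_rate"], VOCODER_PARAMS["hop_length"]
    print(f"{frame_duration_ms(sr, hop):.2f} ms per frame")
    print(f"{frames_per_second(sr, hop):.2f} frames per second")
```

With the listed values, each frame step covers about 11.6 ms of audio, which is roughly 86 frames per second.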

Singing databases used for training the model

Name Length Languages Contributor
AdoVoc Pro 00:05:28 Caló, Spanish AdoVoc Pro
A.I.chi 01:47:52 English, Japanese Peeslubn
Aida 04:00:48 EN, JA, DE, FR Violin
Albert 01:17:14 Polish SzTJ
Aleks 00:06:10 Polish SzTJ
Ameko Kero 02:54:31 English, Japanese HoodyPisDed
Ariel 01:09:01 Japanese ariika
Brent 00:05:15 Spanish Beatrix
Cantoria Dataset 02:25:08 Spanish Cantoria Dataset
Codie 01:00:37 Japanese code41den
Deshi 01:47:45 Japanese, Tagalog UtaUtaUtau
Esmuc Choir Dataset 00:21:31 German Esmuc Choir Dataset
Evelyn 01:05:32 English Violin
Filip 01:17:49 Polish Rainygardens
Geppei 00:30:15 Japanese, Polish, Ukrainian vahntanabe
Hania 00:05:32 Polish SzTJ
Hisaki 02:42:57 Japanese ryutsu
Inka 00:39:18 English, Japanese postTEENIDOL
Jalo 00:54:53 Polish SzTJ
Karasu 00:49:55 Japanese rev
Kazuo 00:33:40 Japanese Felipe Souza
Kiiro 01:44:54 English, Japanese Ryouichi
Konryuu 01:10:55 Japanese PixPrucer
Kurenai 00:55:26 Japanese liure
Leif 01:28:40 English, Japanese Tigermeat
Lem 00:14:22 Polish Wik
Liee 00:25:01 JA, EN, Latin julieraptor
Makam Acapella 00:38:53 Turkish Makam Acapella
Makku 02:06:58 JA, EN, ES, IT Gianloop
Mat 00:35:44 Polish hq_png
Matsuki Max 01:25:32 Japanese Haraoo
Mava 01:46:33 English, Japanese Enzo
Mora 01:49:03 English, Japanese funhouse
Namine Criss 00:31:02 Spanish, Japanese CrissZ3R0VZ
Nanabot 00:29:23 English postTEENIDOL
Naoky 03:31:55 EN, JA, KO, ZH xuu
—— 03:55:02 —— Anonymous Contributor
Paulina 00:29:34 Polish SzTJ
Peiton 02:31:09 English NebulaMeadow
PIX 04:10:54 Polish, Japanese PixPrucer
Otozora Rinly 02:49:43 Japanese UniverStars
Ron 02:28:10 EN, JA, PL, KO, ZH Galanist
Rose 00:42:39 Polish, Japanese Kisa
Ryszard 02:24:16 Polish, Japanese Scarfmonster
—— 01:50:00 —— Anonymous Contributor
Singing Database 02:46:46 Chinese, Italian Singing Database
Ace 02:50:26 English, Japanese SpoopyAce
Stefan 02:49:07 Polish, Latin SzTJ
Suzu 01:42:03 Japanese ariika
Taylor 01:09:24 English postTEENIDOL
Teo Vampa 01:56:33 Japanese Delphic
Tetsu 01:13:46 Japanese ariika
Tiger 03:31:27 EN, ES, JA, KO, ZH, PT, FR Tigermeat
Tomo 00:57:00 Spanish, Japanese Tomo
Vocadito 00:13:37 EN, FR, HAW, ES, TL, Valencian Vocadito
VocalSet 08:46:18 Vocalise VocalSet
Wanda 01:11:03 Polish Vieri
Wioletta 00:32:56 Polish SzTJ
Zethiel Yu 02:19:19 English xiel exalt
Zethiel Zero 00:32:07 English, Japanese xiel exalt
Total length: 98:28:51
Used length: 82:06:15
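The lengths in the table use an HH:MM:SS format in which the hours can exceed 24. The sketch below shows one way to parse and sum such values and to work out what share of the total length was used. The function names are illustrative and not part of any project tooling.

```python
def parse_hms(text: str) -> int:
    """Convert an 'HH:MM:SS' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def format_hms(total_seconds: int) -> str:
    """Convert a number of seconds back to 'HH:MM:SS'."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def total_length(durations: list[str]) -> str:
    """Sum a list of 'HH:MM:SS' strings, e.g. the Length column."""
    return format_hms(sum(parse_hms(d) for d in durations))


def used_share(used: str, total: str) -> float:
    """Fraction of the total length that was used."""
    return parse_hms(used) / parse_hms(total)


if __name__ == "__main__":
    share = used_share("82:06:15", "98:28:51")
    print(f"Used share of the collected data: {share:.1%}")
```

Applied to the two figures above, the used length is about 83% of the total length.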

Pitch distribution

Dataset

[Figure: dataset pitch distribution]

After augmentation

[Figure: augmented dataset pitch distribution]

Checkpoints for finetuning

Name      Download   Version            Notes
hifipln   378 MB     1.0 (2024-02-03)   sample_rate: 44100, n_mels: 128, hop_length: 512