- Reference GitHub repository:
GitHub - roedoejet/FastSpeech2: An implementation of Microsoft’s “FastSpeech 2: Fast and High-Quality End-to-End Text to Speech”
- Python command used for training:
python3 train.py -p config/AISHELL3/preprocess.yaml -m config/AISHELL3/model.yaml -t config/AISHELL3/train.yaml
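- Walkthrough of the training code
3.1 Code architecture overview:
The whole script runs through the if __name__ == "__main__" entry point:
it loads the data from "train.txt" via dataset.py,
calls model.py in the utils folder to load the model and the vocoder,
uses the FastSpeech2Loss class from loss.py in the model folder to set up the loss function,
then trains the model with the loaded model and loss function, saving results and writing logs.
3.2 The code, broken down by training step: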
Step 0 : Define the configurable training arguments and call the main function

    import argparse
    import yaml

    if __name__ == "__main__":
        # Define args
        parser = argparse.ArgumentParser()
        parser.add_argument("--restore_step", type=int, default=0)
        parser.add_argument(
            "-p",
            "--preprocess_config",
            type=str,
            required=True,
            help="path to preprocess.yaml",
        )
        parser.add_argument(
            "-m", "--model_config", type=str, required=True, help="path to model.yaml"
        )
        parser.add_argument(
            "-t", "--train_config", type=str, required=True, help="path to train.yaml"
        )
        args = parser.parse_args()  # args holds the configurable training arguments

        # Read config
        preprocess_config = yaml.load(
            open(args.preprocess_config, "r"), Loader=yaml.FullLoader
        )
        model_config = yaml.load(open(args.model_config, "r"), Loader=yaml.FullLoader)
        train_config = yaml.load(open(args.train_config, "r"), Loader=yaml.FullLoader)
        configs = (preprocess_config, model_config, train_config)

        # Run the main function
        main(args, configs)
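To see how the training command maps onto these arguments, the parser can be exercised on its own by passing an explicit argument list instead of reading the command line. This is a minimal sketch that reuses the argument definitions from Step 0; the helper name build_parser is an assumption introduced for illustration, and the YAML loading is left out so that the snippet needs only the standard library.

```python
import argparse


def build_parser():
    # Hypothetical helper: wraps the argument definitions from Step 0.
    parser = argparse.ArgumentParser()
    parser.add_argument("--restore_step", type=int, default=0)
    parser.add_argument("-p", "--preprocess_config", type=str, required=True,
                        help="path to preprocess.yaml")
    parser.add_argument("-m", "--model_config", type=str, required=True,
                        help="path to model.yaml")
    parser.add_argument("-t", "--train_config", type=str, required=True,
                        help="path to train.yaml")
    return parser


# Same flags as in the training command, passed as a list.
args = build_parser().parse_args([
    "-p", "config/AISHELL3/preprocess.yaml",
    "-m", "config/AISHELL3/model.yaml",
    "-t", "config/AISHELL3/train.yaml",
])
print(args.restore_step)       # default value, since --restore_step was not given
print(args.preprocess_config)
```

Because --restore_step is optional with a default of 0, the command shown earlier starts training from step 1; passing a non-zero value makes the step counter in Step 5 start from that point instead.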
Step 1 : Enter the main function and unpack the configurable training settings

    def main(args, configs):
        print("Prepare training ...")

        # Unpack the configurable training settings
        preprocess_config, model_config, train_config = configs
Step 2 : Load the data from train.txt and process it with dataset.py and torch's DataLoader

    def main(args, configs):
        # Get dataset (read from train.txt)
        dataset = Dataset(
            "train.txt", preprocess_config, train_config, sort=True, drop_last=True
        )
        batch_size = train_config["optimizer"]["batch_size"]
        group_size = 4  # Set this larger than 1 to enable sorting in Dataset; initial value is 4
        assert batch_size * group_size < len(dataset)
        loader = DataLoader(
            dataset,
            batch_size=batch_size * group_size,
            shuffle=True,
            collate_fn=dataset.collate_fn,
        )
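The DataLoader above fetches batch_size * group_size samples at a time, and the training loop later iterates over several batches per loaded item. dataset.py is not shown in these notes, so the following is only a sketch under the assumption that collate_fn sorts the loaded samples by length and cuts them into batches of batch_size; the function group_into_batches and the sample lengths are hypothetical.

```python
def group_into_batches(lengths, batch_size, drop_last=True):
    """Sort sample indices by length (longest first) and cut them into batches."""
    order = sorted(range(len(lengths)), key=lambda i: -lengths[i])
    usable = len(order) - len(order) % batch_size
    batches = [order[i:i + batch_size] for i in range(0, usable, batch_size)]
    tail = order[usable:]
    if tail and not drop_last:
        batches.append(tail)  # keep the incomplete last batch
    return batches


# One loaded item holds batch_size * group_size samples.
batch_size, group_size = 2, 4
lengths = [5, 12, 7, 3, 9, 11, 4, 8]  # made-up text lengths
batches = group_into_batches(lengths, batch_size)
for batch in batches:
    # Samples of similar length end up in the same batch.
    print([lengths[i] for i in batch])
```

Grouping samples of similar length keeps padding inside each batch small, which is a plausible reason for loading group_size batches at once before sorting.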
Step 3 : Define the model, the vocoder, and the loss function

    def main(args, configs):
        # Prepare model (get_model also returns the optimizer)
        model, optimizer = get_model(args, configs, device, train=True)
        # Wrap the model for data-parallel training on the available devices
        model = nn.DataParallel(model)
        # Count the model parameters
        num_param = get_param_num(model)
        print("Number of FastSpeech2 Parameters:", num_param)
        # Set up the loss function
        Loss = FastSpeech2Loss(preprocess_config, model_config).to(device)
        # Load the vocoder
        vocoder = get_vocoder(model_config, device)
Step 4 : Initialize logging: create the train and val folders under "./output/log/AISHELL3" to hold the logs

    def main(args, configs):
        # Init logger
        for p in train_config["path"].values():
            os.makedirs(p, exist_ok=True)
        train_log_path = os.path.join(train_config["path"]["log_path"], "train")
        val_log_path = os.path.join(train_config["path"]["log_path"], "val")
        os.makedirs(train_log_path, exist_ok=True)
        os.makedirs(val_log_path, exist_ok=True)
        train_logger = SummaryWriter(train_log_path)
        val_logger = SummaryWriter(val_log_path)
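The directory setup in Step 4 can be tried in isolation. The sketch below assumes a train_config whose "path" section contains ckpt_path and log_path (the two keys these notes mention); the dictionary and the temporary root directory are stand-ins for the real values in train.yaml, and the SummaryWriter objects are omitted.

```python
import os
import tempfile

# Hypothetical stand-in for train_config["path"]; real values come from train.yaml.
root = tempfile.mkdtemp()
train_config = {
    "path": {
        "ckpt_path": os.path.join(root, "ckpt", "AISHELL3"),
        "log_path": os.path.join(root, "log", "AISHELL3"),
    }
}

# Create every configured output directory, including missing parents.
for p in train_config["path"].values():
    os.makedirs(p, exist_ok=True)

train_log_path = os.path.join(train_config["path"]["log_path"], "train")
val_log_path = os.path.join(train_config["path"]["log_path"], "val")
os.makedirs(train_log_path, exist_ok=True)
os.makedirs(val_log_path, exist_ok=True)
# Repeating the call is harmless because of exist_ok=True.
os.makedirs(val_log_path, exist_ok=True)

print(sorted(os.listdir(train_config["path"]["log_path"])))
```

Since exist_ok=True makes the calls idempotent, restarting training with --restore_step does not fail on directories left over from an earlier run.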
Step 5 : Prepare for training: read the step and optimizer settings from train_config

    def main(args, configs):
        # Training
        step = args.restore_step + 1
        epoch = 1
        grad_acc_step = train_config["optimizer"]["grad_acc_step"]
        grad_clip_thresh = train_config["optimizer"]["grad_clip_thresh"]
        total_step = train_config["step"]["total_step"]
        log_step = train_config["step"]["log_step"]
        save_step = train_config["step"]["save_step"]
        synth_step = train_config["step"]["synth_step"]
        val_step = train_config["step"]["val_step"]

        outer_bar = tqdm(total=total_step, desc="Training", position=0)
        outer_bar.n = args.restore_step
        outer_bar.update()
Step 6 : Prepare for training: create the per-epoch progress bar and call the to_device function from tools.py in the utils folder to move each batch onto the compute device

    while True:
        inner_bar = tqdm(total=len(loader), desc="Epoch {}".format(epoch), position=1)
        for batchs in loader:
            for batch in batchs:
                batch = to_device(batch, device)
Step 7 : Start training: forward pass, loss computation, backward pass, gradient clipping, and model weight update

    # Load data
    for batch in batchs:
        batch = to_device(batch, device)

        # Forward
        output = model(*(batch[2:]))

        # Cal Loss
        losses = Loss(batch, output)
        total_loss = losses[0]

        # Backward
        total_loss = total_loss / grad_acc_step
        total_loss.backward()
        if step % grad_acc_step == 0:
            # Clipping gradients to avoid gradient explosion
            nn.utils.clip_grad_norm_(model.parameters(), grad_clip_thresh)

            # Update weights
            optimizer.step_and_update_lr()
            optimizer.zero_grad()
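Step 7 combines two techniques: gradient accumulation (the loss is divided by grad_acc_step and the weights are updated only every grad_acc_step steps) and gradient clipping by norm. The sketch below imitates both in plain Python on a single toy parameter so it runs without PyTorch. The loss (w - t)^2, the fixed learning rate, and the plain gradient-descent update are assumptions made for illustration; the real optimizer is whatever get_model returns.

```python
import math


def clip_by_norm(grads, max_norm):
    """Scale grads so that their L2 norm does not exceed max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads], total_norm


w = 0.0                         # single toy parameter
targets = [1.0, 3.0, 5.0, 7.0]  # one made-up target per training step
grad_acc_step = 2
grad_clip_thresh = 1.0
lr = 0.1                        # hypothetical fixed learning rate

grad = 0.0
for step, t in enumerate(targets, start=1):
    # Gradient of (w - t)^2 / grad_acc_step; backward() would add this to .grad.
    grad += 2.0 * (w - t) / grad_acc_step
    if step % grad_acc_step == 0:
        (clipped,), norm = clip_by_norm([grad], grad_clip_thresh)
        w -= lr * clipped       # stands in for the optimizer step
        grad = 0.0              # stands in for optimizer.zero_grad()
        print(step, round(norm, 3), round(w, 3))
```

Dividing each loss by grad_acc_step makes the accumulated gradient an average over the accumulated steps rather than a sum, so the effective batch grows without changing the gradient scale. In this toy run the accumulated norm exceeds the threshold both times, so each update is capped at roughly lr * grad_clip_thresh.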
Step 8 : When the step count reaches a multiple of the preset log_step, call the log function from tools.py in the utils folder to record the losses and the step

    if step % log_step == 0:
        losses = [l.item() for l in losses]
        message1 = "Step {}/{}, ".format(step, total_step)
        message2 = "Total Loss: {:.4f}, Mel Loss: {:.4f}, Mel PostNet Loss: {:.4f}, Pitch Loss: {:.4f}, Energy Loss: {:.4f}, Duration Loss: {:.4f}".format(
            *losses
        )

        with open(os.path.join(train_log_path, "log.txt"), "a") as f:
            f.write(message1 + message2 + "\n")

        outer_bar.write(message1 + message2)

        log(train_logger, step, losses=losses)
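The log line written in Step 8 is easy to reproduce outside the training loop. In this small sketch the loss values, the step, and total_step are made up; the order of the six entries (total loss first, then mel, mel PostNet, pitch, energy, duration) is inferred from the format string above.

```python
# Made-up loss values in the order implied by the format string (total first).
losses = [2.5, 1.0, 0.75, 0.25, 0.3, 0.2]
step, total_step = 100, 900000  # hypothetical step counters

message1 = "Step {}/{}, ".format(step, total_step)
message2 = (
    "Total Loss: {:.4f}, Mel Loss: {:.4f}, Mel PostNet Loss: {:.4f}, "
    "Pitch Loss: {:.4f}, Energy Loss: {:.4f}, Duration Loss: {:.4f}"
).format(*losses)

# This is the line that would be appended to log.txt and echoed on the progress bar.
print(message1 + message2)
```

The list must contain exactly as many values as the format string has placeholders, which is why the tensors in losses are converted with .item() as a whole list before formatting.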
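Step 9 : When the step count reaches a multiple of the preset synth_step, call the log function and the synth_one_sample function from tools.py in the utils folder. Judging from its return values, synth_one_sample takes one sample from the current batch and returns a figure plus two waveforms (one reconstructed from the ground-truth mel-spectrogram, one synthesized from the model's prediction), which are then logged so the audio quality can be inspected during training

    if step % synth_step == 0:
        fig, wav_reconstruction, wav_prediction, tag = synth_one_sample(
            batch,
            output,
            vocoder,
            model_config,
            preprocess_config,
        )
        log(
            train_logger,
            fig=fig,
            tag="Training/step_{}_{}".format(step, tag),
        )
        sampling_rate = preprocess_config["preprocessing"]["audio"]["sampling_rate"]
        log(
            train_logger,
            audio=wav_reconstruction,
            sampling_rate=sampling_rate,
            tag="Training/step_{}_{}_reconstructed".format(step, tag),
        )
        log(
            train_logger,
            audio=wav_prediction,
            sampling_rate=sampling_rate,
            tag="Training/step_{}_{}_synthesized".format(step, tag),
        )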
Step 10 : When the step count reaches a multiple of the preset val_step, call the evaluate function from evaluate.py to run the evaluation, and record the result in log/AISHELL3/val/log.txt

    if step % val_step == 0:
        model.eval()
        message = evaluate(model, step, configs, val_logger, vocoder)
        with open(os.path.join(val_log_path, "log.txt"), "a") as f:
            f.write(message + "\n")
        outer_bar.write(message)

        model.train()
Step 11 : When the step count reaches a multiple of the preset save_step, save the model being trained

    if step % save_step == 0:
        torch.save(
            {
                "model": model.module.state_dict(),
                "optimizer": optimizer._optimizer.state_dict(),
            },
            os.path.join(
                train_config["path"]["ckpt_path"],
                "{}.pth.tar".format(step),
            ),
        )
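Checkpoints are named after the step at which they were saved, so a value for --restore_step can be derived from the files in ckpt_path. The helper below is hypothetical and not part of the repository; it only assumes the "<step>.pth.tar" naming shown above, and it uses empty placeholder files in a temporary directory instead of real checkpoints. Whether get_model actually loads the checkpoint matching --restore_step is an assumption not confirmed by these notes.

```python
import os
import re
import tempfile


def latest_checkpoint_step(ckpt_path):
    """Return the largest step among files named '<step>.pth.tar', or 0 if none."""
    steps = []
    for name in os.listdir(ckpt_path):
        match = re.fullmatch(r"(\d+)\.pth\.tar", name)
        if match:
            steps.append(int(match.group(1)))
    return max(steps, default=0)


# Empty placeholder files standing in for saved checkpoints.
ckpt_path = tempfile.mkdtemp()
for step in (10000, 20000, 30000):
    open(os.path.join(ckpt_path, "{}.pth.tar".format(step)), "w").close()

restore_step = latest_checkpoint_step(ckpt_path)
print(restore_step)
```

Comparing the parsed integers rather than the file names avoids the lexicographic trap where "9000.pth.tar" would sort after "30000.pth.tar".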
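Step 12 : When the step count reaches the preset total_step, exit training

                if step == total_step:
                    quit()
                step += 1
                outer_bar.update(1)  # once per batch

            inner_bar.update(1)  # once per item fetched from the loader
        epoch += 1  # once per full pass over the loader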
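Steps 8 to 12 all hang off the same step counter through modulo checks. The following sketch replays that schedule without any training so the interplay is easier to see. The interval values and total_step are hypothetical (the real ones are read from train.yaml), and triggered_events is a helper introduced only for this illustration.

```python
def triggered_events(step, intervals, total_step):
    """List which periodic actions fire at a given step."""
    events = [name for name, every in intervals.items() if step % every == 0]
    if step == total_step:
        events.append("quit")
    return events


# Hypothetical intervals; the real ones are read from train.yaml.
intervals = {"log": 2, "synth": 4, "val": 4, "save": 6}
total_step = 12
restore_step = 0

step = restore_step + 1  # same starting rule as in Step 5
while True:
    events = triggered_events(step, intervals, total_step)
    if events:
        print(step, events)
    if step == total_step:
        break  # the training script calls quit() here
    step += 1
```

Because the quit check comes after the log, synth, val, and save checks, a total_step that is a multiple of save_step guarantees that a final checkpoint is written before the script exits.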
- Output of the training code
Logs are written to train_log_path and val_log_path
The models saved every save_step steps during training are written to ckpt_path