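Overview
The Index-TTS2 synchronous speech synthesis API accepts file uploads: you upload a speaker timbre reference audio clip and, optionally, an emotion reference audio clip, and the API generates speech with that voice and emotional tone. The API supports streaming output for real-time synthesis. Because heavy site traffic and frequent calls can cause requests to fail, additional compute endpoints have been added as backup capacity to keep calls stable and responsive.
To try Index-TTS2 speech synthesis online, visit the Index-TTS2 online speech synthesis page and select Mode 1.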
API information
- Endpoint: https://www.yuntts.com/api/v1/indextts2_infer
- Method: POST
- Authentication: Authorization: Bearer <API_KEY>
- Content-Type: multipart/form-data
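Request parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| input | string | Yes | - | Text to synthesize |
| speed | float | No | 1.0 | Speech rate, range 0.1-4.0 |
| sample_rate | int | No | 24000 | Target audio sample rate; supported values are 16000, 22050, and 24000 |
| gain | float | No | 1.0 | Volume, range 0.1-10.0 |
| interval_silence | int | No | 200 | Silence between sentences (ms) |
| use_random | boolean | No | false | Enable emotion randomness |
| stream_mode | boolean | No | true | Enable streaming output |
| emo_control_method | int | No | 0 | Emotion control method: 0 = no emotion reference, 1 = emotion audio, 2 = emotion vector, 3 = emotion text |
| emo_alpha | float | No | - | Emotion weight, range 0.0-1.0; only effective when emo_control_method is not 0 |
| emo_vec | array | No | - | Emotion vector with 8 dimensions: [happy, angry, sad, afraid, disgusted, melancholic, surprised, calm]; each dimension ranges from 0 to 1.2 and the sum must not exceed 1.5; only effective when emo_control_method=2 |
| emo_text | string | No | - | Emotion text; only effective when emo_control_method=3 |
| spk_audio_file | file | No | - | Speaker timbre reference audio; wav and mp3 supported; 20MB maximum |
| emo_audio_file | file | No | - | Emotion reference audio; wav and mp3 supported; 10MB maximum; required only when emo_control_method=1 |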
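The ranges in the parameter table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the official API: the helper name `build_form_fields` is hypothetical, and encoding booleans as the strings "true"/"false" is an assumption that mirrors what a browser FormData sends in the front-end reference page.

```python
ALLOWED_SAMPLE_RATES = (16000, 22050, 24000)


def build_form_fields(text, speed=1.0, sample_rate=24000, gain=1.0,
                      interval_silence=200, use_random=False, stream_mode=True,
                      emo_control_method=0, emo_alpha=None, emo_text=None):
    """Validate values against the documented ranges and return string form fields."""
    if not text:
        raise ValueError("input must not be empty")
    if not 0.1 <= speed <= 4.0:
        raise ValueError("speed must be within 0.1-4.0")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError("sample_rate must be 16000, 22050, or 24000")
    if not 0.1 <= gain <= 10.0:
        raise ValueError("gain must be within 0.1-10.0")
    if emo_control_method not in (0, 1, 2, 3):
        raise ValueError("emo_control_method must be 0, 1, 2, or 3")

    def flag(value):
        # Assumption: booleans travel as the strings "true"/"false".
        return "true" if value else "false"

    fields = {
        "input": text,
        "speed": str(speed),
        "sample_rate": str(sample_rate),
        "gain": str(gain),
        "interval_silence": str(interval_silence),
        "use_random": flag(use_random),
        "stream_mode": flag(stream_mode),
        "emo_control_method": str(emo_control_method),
    }
    # emo_alpha only takes effect when an emotion control method is selected.
    if emo_control_method != 0 and emo_alpha is not None:
        if not 0.0 <= emo_alpha <= 1.0:
            raise ValueError("emo_alpha must be within 0.0-1.0")
        fields["emo_alpha"] = str(emo_alpha)
    # emo_text only takes effect with emo_control_method=3.
    if emo_control_method == 3 and emo_text:
        fields["emo_text"] = emo_text
    return fields
```

The returned dictionary can be passed as the `data` argument of `requests.post`, with the audio files supplied separately through `files`.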
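Emotion vector dimensions
The emotion vector has 8 dimensions, in the following order:
- Happy - joy and cheerfulness
- Angry - anger and irritation
- Sad - sorrow and grief
- Afraid - fear and dread
- Disgusted - dislike and aversion
- Melancholic - gloom and low spirits
- Surprised - surprise and shock
- Calm - calm and serenity
Each dimension takes a value in [0, 1.2], and the sum of all dimensions must not exceed 1.5.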
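The constraints above (8 dimensions in a fixed order, each within [0, 1.2], sum at most 1.5) are easy to get wrong when a vector is built by hand. The sketch below is one way to check them before sending a request; the function name `encode_emo_vec` and the English dimension keys are illustrative choices, not identifiers defined by the API.

```python
import json

# Dimension order as documented; the English keys are labels chosen for this sketch.
EMOTION_ORDER = ["happy", "angry", "sad", "afraid",
                 "disgusted", "melancholic", "surprised", "calm"]


def encode_emo_vec(weights):
    """Turn a {name: weight} mapping into the JSON array string sent as emo_vec."""
    unknown = set(weights) - set(EMOTION_ORDER)
    if unknown:
        raise ValueError(f"unknown emotion names: {sorted(unknown)}")
    # Dimensions that are not mentioned default to 0.0.
    values = [float(weights.get(name, 0.0)) for name in EMOTION_ORDER]
    for name, value in zip(EMOTION_ORDER, values):
        if not 0.0 <= value <= 1.2:
            raise ValueError(f"{name} must be within [0, 1.2], got {value}")
    # A small tolerance absorbs floating-point rounding in the sum.
    if sum(values) > 1.5 + 1e-9:
        raise ValueError("the sum of all dimensions must not exceed 1.5")
    return json.dumps(values)
```

For example, `encode_emo_vec({"happy": 0.8, "calm": 0.2})` produces the same array used in the emotion-vector request examples.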
Request examples
Sending requests with cURL
Example 1: no emotion reference (emo_control_method=0)
curl -X POST "https://www.yuntts.com/api/v1/indextts2_infer" \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "input=Hello and welcome to the Index-TTS2 speech synthesis service" \
-F "speed=1.0" \
-F "emo_control_method=0" \
-F "spk_audio_file=@speaker.wav"
Example 2: emotion audio (emo_control_method=1)
curl -X POST "https://www.yuntts.com/api/v1/indextts2_infer" \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "input=Hello and welcome to the Index-TTS2 speech synthesis service" \
-F "speed=1.0" \
-F "emo_control_method=1" \
-F "emo_alpha=0.6" \
-F "spk_audio_file=@speaker.wav" \
-F "emo_audio_file=@emotion.wav"
Example 3: emotion vector (emo_control_method=2)
curl -X POST "https://www.yuntts.com/api/v1/indextts2_infer" \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "input=Hello and welcome to the Index-TTS2 speech synthesis service" \
-F "speed=1.0" \
-F "emo_control_method=2" \
-F "emo_vec=[0.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2]" \
-F "emo_alpha=0.6" \
-F "spk_audio_file=@speaker.wav"
Example 4: emotion text (emo_control_method=3); the emo_text value 开心 means "happy"
curl -X POST "https://www.yuntts.com/api/v1/indextts2_infer" \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "input=Hello and welcome to the Index-TTS2 speech synthesis service" \
-F "speed=1.0" \
-F "emo_control_method=3" \
-F "emo_text=开心" \
-F "emo_alpha=0.6" \
-F "spk_audio_file=@speaker.wav"
Sending requests with JavaScript
// Build the form data
const formData = new FormData();
formData.append('input', 'Hello and welcome to the Index-TTS2 speech synthesis service');
formData.append('response_format', 'mp3');
formData.append('speed', '1.0');
// Ask for a JSON response; with the default stream_mode=true the API returns raw audio bytes
formData.append('stream_mode', 'false');
formData.append('emo_control_method', '2');
formData.append('emo_vec', JSON.stringify([0.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2]));
formData.append('emo_alpha', '0.6');
formData.append('spk_audio_file', spkAudioFile); // spkAudioFile is a File object taken from a file input
// Send the request
const response = await fetch('https://www.yuntts.com/api/v1/indextts2_infer', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY'
  },
  body: formData
});
// Parse the response
const data = await response.json();
if (response.ok) {
  console.log('Synthesis succeeded:', data);
  console.log('Audio URL:', data.data.audio_url);
} else {
  console.error('Synthesis failed:', data.message);
}
Sending requests with Python (emotion text)
import requests

url = "https://www.yuntts.com/api/v1/indextts2_infer"
headers = {
    "Authorization": "Bearer YOUR_API_KEY"
}
# Form fields
data = {
    "input": "Hello and welcome to the Index-TTS2 speech synthesis service",
    "response_format": "mp3",
    "speed": "1.0",
    "stream_mode": "false",  # return JSON instead of the default audio stream
    "emo_control_method": "3",
    "emo_text": "开心",  # "happy"
    "emo_alpha": "0.6",
}
# Open the speaker reference audio; the with-block closes it once the request is done
with open("speaker.wav", "rb") as spk_audio:
    files = {"spk_audio_file": spk_audio}
    # Send the request
    response = requests.post(url, headers=headers, data=data, files=files)
# Handle the response
result = response.json()
if response.status_code == 200:
    print("Synthesis succeeded!")
    print(f"Audio URL: {result['data']['audio_url']}")
else:
    print(f"Synthesis failed: {result['message']}")
Front-end reference page
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>IndexTTS2 Speech Synthesis Test Page</title>
<link href="https://cdn.staticfile.net/bootstrap/5.3.2/css/bootstrap.min.css" rel="stylesheet">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.0/css/all.min.css">
<style>
body {
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, 'Helvetica Neue', Arial, sans-serif;
background-color: #f8f9fa;
}
.upload-area {
border: 2px dashed #ced4da;
border-radius: 8px;
padding: 2rem;
text-align: center;
transition: all 0.3s ease;
}
.upload-area:hover {
border-color: #007bff;
background-color: #f0f8ff;
}
.upload-area.dragover {
border-color: #007bff;
background-color: #e3f2fd;
}
.card {
box-shadow: 0 0.5rem 1rem rgba(0, 0, 0, 0.15);
border-radius: 0.75rem;
}
#result-container {
display: none;
}
.audio-player {
width: 100%;
margin-top: 1rem;
}
</style>
</head>
<body>
<!-- Page header -->
<header class="bg-primary text-white py-5">
<div class="container">
<div class="text-center">
<h1 class="display-5 fw-bold mb-4">Index-TTS2 Speech Synthesis Test</h1>
<p class="lead mb-4">Test API: /api/v1/indextts2_infer</p>
</div>
</div>
</header>
<!-- Main content -->
<section class="py-5">
<div class="container">
<div class="row g-4">
<!-- Left column: form -->
<div class="col-md-7">
<div class="card p-4">
<form id="synthesize-form" enctype="multipart/form-data">
<!-- API key input -->
<div class="mb-4">
<label for="api-key" class="form-label">API key</label>
<input type="text" class="form-control" id="api-key" placeholder="Enter your API key" required>
<div class="form-text">The API key is used for authentication</div>
</div>
<!-- Speaker timbre reference audio -->
<div class="mb-4">
<label class="form-label">Speaker timbre reference audio</label>
<div class="upload-area mb-3" id="spk-audio-dropzone">
<input type="file" id="spk-audio-file" name="spk_audio_file" class="d-none" accept="audio/mp3,audio/wav">
<i class="fas fa-cloud-upload display-4 text-primary mb-2"></i>
<p class="mb-1">Click or drag an audio file here to upload</p>
<p class="text-muted small">WAV and MP3 supported, 20MB maximum</p>
</div>
<div class="upload-area" id="spk-audio-result" style="display: none;">
<div id="spk-waveform"></div>
<div class="d-flex gap-2 mt-2">
<button type="button" class="btn btn-sm btn-outline-primary" id="spk-audio-play">
<i class="fas fa-play me-1"></i>Play
</button>
<button type="button" class="btn btn-sm btn-outline-secondary" id="spk-audio-stop">
<i class="fas fa-stop me-1"></i>Stop
</button>
<button type="button" class="btn btn-sm btn-outline-secondary ms-auto" id="spk-audio-remove">
<i class="fas fa-times me-1"></i>Replace audio
</button>
</div>
</div>
</div>
<!-- Text to synthesize -->
<div class="mb-4">
<label for="synthesize-text" class="form-label">Text to synthesize</label>
<textarea class="form-control" id="synthesize-text" rows="5" placeholder="Enter the text to synthesize" required maxlength="600"></textarea>
<div class="d-flex justify-content-between align-items-center mt-2">
<div class="form-text mb-0">Supports Chinese, English, and other languages</div>
<div class="form-text mb-0">
<span id="char-count-display">0</span>/600 characters
</div>
</div>
</div>
<!-- Advanced settings -->
<div class="mb-4">
<button class="btn btn-primary w-100 d-flex justify-content-between align-items-center" type="button" data-bs-toggle="collapse" data-bs-target="#advanced-options" aria-expanded="false" aria-controls="advanced-options">
<span>Advanced settings</span>
<i class="fas fa-chevron-down"></i>
</button>
<div class="collapse" id="advanced-options">
<div class="card card-body mt-2">
<div class="row g-3">
<!-- Output format: the remote API returns only WAV, so no selector is needed -->
<div class="col-md-6 mb-4">
<label for="sample-rate" class="form-label">Target audio sample rate</label>
<select class="form-select" id="sample-rate">
<option value="16000">16000</option>
<option value="22050">22050</option>
<option value="24000" selected>24000</option>
</select>
</div>
<div class="col-md-6 mb-4">
<label for="interval-silence" class="form-label">Silence between sentences (ms, default 200)</label>
<input type="number" class="form-control" id="interval-silence" min="0" max="1000" value="200" />
</div>
<!-- Max tokens per sentence: not defined in the documentation, so it was removed -->
<div class="col-md-6 mb-4">
<div class="rounded-3 border border-primary-subtle p-3 bg-white">
<div class="d-flex justify-content-between align-items-center mb-2">
<label for="speed" class="form-label mb-0">Speech rate (default 1.0)</label>
<span class="form-text text-muted fw-semibold"> Current value: <span id="speed-value">1.0</span>
</span>
</div>
<input type="range" class="form-range rounded-pill border border-primary/20 p-1" id="speed" min="0.1" max="4.0" step="0.1" value="1.0" oninput="document.getElementById('speed-value').textContent = this.value" />
</div>
</div>
<div class="col-md-6 mb-4">
<div class="rounded-3 border border-primary-subtle p-3 bg-white">
<div class="d-flex justify-content-between align-items-center mb-2">
<label for="gain" class="form-label mb-0">Volume (default 1.0)</label>
<span class="form-text text-muted fw-semibold"> Current value: <span id="gain-value">1.0</span>
</span>
</div>
<input type="range" class="form-range rounded-pill border border-primary/20 p-1" id="gain" min="0.1" max="10.0" step="0.1" value="1.0" oninput="document.getElementById('gain-value').textContent = this.value" />
</div>
</div>
</div>
</div>
</div>
</div>
<!-- Emotion randomness -->
<div class="my-5">
<div class="form-check form-switch">
<input class="form-check-input" type="checkbox" id="use-random" />
<label class="form-check-label" for="use-random">Enable emotion randomness</label>
</div>
<div class="form-text"> Adds some randomness to emotion control for more varied output </div>
</div>
<!-- Streaming output -->
<div class="my-5">
<div class="form-check form-switch">
<input class="form-check-input" type="checkbox" id="stream-mode" checked />
<label class="form-check-label" for="stream-mode">Enable streaming output</label>
</div>
<div class="form-text"> When on, the audio stream is returned in real time; when off, the complete audio file is returned after synthesis finishes </div>
</div>
<!-- Emotion control method -->
<div class="mb-4">
<label class="form-label">Emotion control method</label>
<div class="row g-3">
<div class="col-md-4 emotion-control-option" data-target="emotion-control-0" style="cursor: pointer;">
<div class="form-check border p-4 rounded d-flex align-items-center justify-content-start h-100">
<input class="form-check-input me-3 mx-2" type="radio" name="emotion-control" id="emotion-control-0" value="0" checked onchange="toggleEmotionControl()" />
<label class="form-check-label" for="emotion-control-0"> No emotion reference </label>
</div>
</div>
<div class="col-md-4 emotion-control-option" data-target="emotion-control-1" style="cursor: pointer;">
<div class="form-check border p-4 rounded d-flex align-items-center justify-content-start h-100">
<input class="form-check-input me-3 mx-2" type="radio" name="emotion-control" id="emotion-control-1" value="1" onchange="toggleEmotionControl()" />
<label class="form-check-label" for="emotion-control-1"> Emotion audio </label>
</div>
</div>
<div class="col-md-4 emotion-control-option" data-target="emotion-control-2" style="cursor: pointer;">
<div class="form-check border p-4 rounded d-flex align-items-center justify-content-start h-100">
<input class="form-check-input me-3 mx-2" type="radio" name="emotion-control" id="emotion-control-2" value="2" onchange="toggleEmotionControl()" />
<label class="form-check-label" for="emotion-control-2"> Emotion vector </label>
</div>
</div>
<div class="col-md-4 emotion-control-option" data-target="emotion-control-3" style="cursor: pointer;">
<div class="form-check border p-4 rounded d-flex align-items-center justify-content-start h-100">
<input class="form-check-input me-3 mx-2" type="radio" name="emotion-control" id="emotion-control-3" value="3" onchange="toggleEmotionControl()" />
<label class="form-check-label" for="emotion-control-3"> Emotion text </label>
</div>
</div>
</div>
<div class="form-text">Select the emotion control strategy the model uses; the emotion-audio method requires uploading an emotion reference audio file</div>
</div>
<!-- Emotion audio -->
<div class="mb-4" id="emotion-audio-section">
<label class="form-label">Emotion reference audio (required for the emotion-audio method)</label>
<div class="upload-area mb-3" id="emo-audio-dropzone">
<input type="file" id="emo-audio-file" name="emo_audio_file" class="d-none" accept="audio/mp3,audio/wav">
<i class="fas fa-cloud-upload display-4 text-primary mb-2"></i>
<p class="mb-1">Click or drag an audio file here to upload</p>
<p class="text-muted small">WAV and MP3 supported, 10MB maximum</p>
</div>
<div class="upload-area" id="emo-audio-result" style="display: none;">
<div id="emo-waveform"></div>
<div class="d-flex gap-2 mt-2">
<button type="button" class="btn btn-sm btn-outline-primary" id="emo-audio-play">
<i class="fas fa-play me-1"></i>Play
</button>
<button type="button" class="btn btn-sm btn-outline-secondary" id="emo-audio-stop">
<i class="fas fa-stop me-1"></i>Stop
</button>
<button type="button" class="btn btn-sm btn-outline-secondary ms-auto" id="emo-audio-remove">
<i class="fas fa-times me-1"></i>Replace audio
</button>
</div>
</div>
</div>
<!-- Emotion vector controls -->
<div class="mb-4 bg-light rounded-4 border border-secondary-subtle p-4" id="emotion-vector-section" style="display: none;">
<label class="form-label fw-medium py-3">Emotion vector dimension weights</label>
<div class="alert alert-info mb-4 p-3 rounded-3 border border-info-subtle">
<strong>Sum limit:</strong> the values of all emotion dimensions must not add up to more than 1.5. <span class="float-end">Current sum: <strong id="emotion-vector-total">0.0</strong></span>
</div>
<div class="row g-3">
<div class="col-md-6">
<div class="rounded-3 border border-primary-subtle p-3 bg-white">
<div class="d-flex justify-content-between align-items-center mb-2">
<label for="emotion-happy" class="form-label text-body mb-0">Happy</label>
<span class="form-text text-muted fw-semibold"> Current value: <span id="emotion-happy-value">0.0</span>
</span>
</div>
<input type="range" class="form-range rounded-pill border border-primary/20 p-1" id="emotion-happy" min="0.0" max="1.2" step="0.1" value="0.0" oninput="document.getElementById('emotion-happy-value').textContent = this.value; updateEmotionVectorTotal()" />
</div>
</div>
<div class="col-md-6">
<div class="rounded-3 border border-primary-subtle p-3 bg-white">
<div class="d-flex justify-content-between align-items-center mb-2">
<label for="emotion-angry" class="form-label text-body mb-0">Angry</label>
<span class="form-text text-muted fw-semibold"> Current value: <span id="emotion-angry-value">0.0</span>
</span>
</div>
<input type="range" class="form-range rounded-pill border border-primary/20 p-1" id="emotion-angry" min="0.0" max="1.2" step="0.1" value="0.0" oninput="document.getElementById('emotion-angry-value').textContent = this.value; updateEmotionVectorTotal()" />
</div>
</div>
<div class="col-md-6">
<div class="rounded-3 border border-primary-subtle p-3 bg-white">
<div class="d-flex justify-content-between align-items-center mb-2">
<label for="emotion-sad" class="form-label text-body mb-0">Sad</label>
<span class="form-text text-muted fw-semibold"> Current value: <span id="emotion-sad-value">0.0</span>
</span>
</div>
<input type="range" class="form-range rounded-pill border border-primary/20 p-1" id="emotion-sad" min="0.0" max="1.2" step="0.1" value="0.0" oninput="document.getElementById('emotion-sad-value').textContent = this.value; updateEmotionVectorTotal()" />
</div>
</div>
<div class="col-md-6">
<div class="rounded-3 border border-primary-subtle p-3 bg-white">
<div class="d-flex justify-content-between align-items-center mb-2">
<label for="emotion-fear" class="form-label text-body mb-0">Afraid</label>
<span class="form-text text-muted fw-semibold"> Current value: <span id="emotion-fear-value">0.0</span>
</span>
</div>
<input type="range" class="form-range rounded-pill border border-primary/20 p-1" id="emotion-fear" min="0.0" max="1.2" step="0.1" value="0.0" oninput="document.getElementById('emotion-fear-value').textContent = this.value; updateEmotionVectorTotal()" />
</div>
</div>
<div class="col-md-6">
<div class="rounded-3 border border-primary-subtle p-3 bg-white">
<div class="d-flex justify-content-between align-items-center mb-2">
<label for="emotion-disgust" class="form-label text-body mb-0">Disgusted</label>
<span class="form-text text-muted fw-semibold"> Current value: <span id="emotion-disgust-value">0.0</span>
</span>
</div>
<input type="range" class="form-range rounded-pill border border-primary/20 p-1" id="emotion-disgust" min="0.0" max="1.2" step="0.1" value="0.0" oninput="document.getElementById('emotion-disgust-value').textContent = this.value; updateEmotionVectorTotal()" />
</div>
</div>
<div class="col-md-6">
<div class="rounded-3 border border-primary-subtle p-3 bg-white">
<div class="d-flex justify-content-between align-items-center mb-2">
<label for="emotion-melancholy" class="form-label text-body mb-0">Melancholic</label>
<span class="form-text text-muted fw-semibold"> Current value: <span id="emotion-melancholy-value">0.0</span>
</span>
</div>
<input type="range" class="form-range rounded-pill border border-primary/20 p-1" id="emotion-melancholy" min="0.0" max="1.2" step="0.1" value="0.0" oninput="document.getElementById('emotion-melancholy-value').textContent = this.value; updateEmotionVectorTotal()" />
</div>
</div>
<div class="col-md-6">
<div class="rounded-3 border border-primary-subtle p-3 bg-white">
<div class="d-flex justify-content-between align-items-center mb-2">
<label for="emotion-surprise" class="form-label text-body mb-0">Surprised</label>
<span class="form-text text-muted fw-semibold"> Current value: <span id="emotion-surprise-value">0.0</span>
</span>
</div>
<input type="range" class="form-range rounded-pill border border-primary/20 p-1" id="emotion-surprise" min="0.0" max="1.2" step="0.1" value="0.0" oninput="document.getElementById('emotion-surprise-value').textContent = this.value; updateEmotionVectorTotal()" />
</div>
</div>
<div class="col-md-6">
<div class="rounded-3 border border-primary-subtle p-3 bg-white">
<div class="d-flex justify-content-between align-items-center mb-2">
<label for="emotion-calm" class="form-label text-body mb-0">Calm</label>
<span class="form-text text-muted fw-semibold"> Current value: <span id="emotion-calm-value">0.0</span>
</span>
</div>
<input type="range" class="form-range rounded-pill border border-primary/20 p-1" id="emotion-calm" min="0.0" max="1.2" step="0.1" value="0.0" oninput="document.getElementById('emotion-calm-value').textContent = this.value; updateEmotionVectorTotal()" />
</div>
</div>
</div>
</div>
<!-- Emotion text -->
<div class="mb-4 shadow-none p-3 bg-light rounded" id="emotion-text-section" style="display: none;">
<label for="emo_text" class="form-label">Emotion (optional)</label>
<input type="text" class="form-control" id="emo_text" list="emotion-options" placeholder="Select or type an emotion..." value="开心" />
<datalist id="emotion-options">
<option value="开心">
<option value="高兴">
<option value="生气">
<option value="悲伤">
<option value="害怕">
<option value="厌恶">
<option value="忧郁">
<option value="惊讶">
<option value="平静">
</datalist>
<div class="form-text mt-1">Choose an emotion from the list or type a custom one.</div>
</div>
<!-- Emotion blend weight -->
<div class="mb-4 rounded-3 border-primary-subtle" id="emotion-weight-section">
<div class="p-3 md-2">
<div class="d-flex justify-content-between align-items-center w-100 mb-2">
<label for="emo_alpha" class="form-label mb-0">Emotion blend weight (0.0-1.0)</label>
<span class="form-text text-muted fw-semibold"> Current value: <span id="emo_alpha-value">1.0</span>
</span>
</div>
<input type="range" class="form-range rounded-pill border border-primary/20 p-1" id="emo_alpha" min="0.0" max="1.0" step="0.1" value="1.0" oninput="document.getElementById('emo_alpha-value').textContent = this.value" />
</div>
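<div class="form-text mt-1">Controls how strongly the emotion features affect the output; the larger the value, the more pronounced the emotion in the generated speech.</div>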
</div>
<!-- Synthesize button -->
<div class="d-grid gap-2">
<button type="submit" class="btn btn-primary btn-lg" id="synthesize-btn">
<span id="btn-text">Start synthesis</span>
<span id="btn-spinner" class="spinner-border spinner-border-sm" role="status" aria-hidden="true" style="display: none;"></span>
</button>
</div>
</form>
</div>
</div>
<!-- Right column: result -->
<div class="col-md-5">
<div class="card h-100">
<div class="card-header">
<h5 class="card-title mb-0">Synthesis result</h5>
</div>
<div class="card-body">
<!-- Empty state -->
<div id="empty-state" class="text-center py-5">
<i class="fas fa-waveform fa-3x text-primary mb-3"></i>
<h5 class="text-muted mb-2">No synthesis result yet</h5>
<p class="text-muted small">Fill in the form, then click "Start synthesis"</p>
</div>
<!-- Result state -->
<div id="result-container" class="py-3">
<div class="mb-3">
<label class="form-label">Synthesis status</label>
<div id="status-message" class="alert alert-info"></div>
</div>
<div class="mb-3">
<label class="form-label">Audio preview</label>
<audio id="result-audio" class="audio-player" controls></audio>
</div>
<div class="mb-3">
<label class="form-label">Audio link</label>
<input type="text" id="audio-url" class="form-control" readonly>
<button type="button" class="btn btn-sm btn-outline-primary mt-2" onclick="copyAudioUrl()">
<i class="fas fa-copy me-1"></i>Copy link
</button>
</div>
<div class="mb-3">
<label class="form-label">Usage and cost</label>
<div id="cost-info" class="bg-light p-3 rounded"></div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</section>
<!-- Footer -->
<footer class="bg-dark text-white py-4">
<div class="container text-center">
<p class="mb-0">IndexTTS2 Speech Synthesis Test Page © 2026</p>
</div>
</footer>
<!-- Required JavaScript -->
<script src="https://cdn.staticfile.net/bootstrap/5.3.2/js/bootstrap.bundle.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/wavesurfer.js/7.3.2/wavesurfer.min.js"></script>
<script>
// Global state
let spkWaveSurfer = null;
let emoWaveSurfer = null;
const allowedAudioExtensions = ['wav', 'mp3'];
const allowedAudioMimeTypes = ['audio/wav', 'audio/mp3', 'audio/mpeg'];
// DOM element references
const $synthesizeForm = document.getElementById('synthesize-form');
const $synthesizeBtn = document.getElementById('synthesize-btn');
const $btnText = document.getElementById('btn-text');
const $btnSpinner = document.getElementById('btn-spinner');
const $synthesizeText = document.getElementById('synthesize-text');
const $charCountDisplay = document.getElementById('char-count-display');
const $emptyState = document.getElementById('empty-state');
const $resultContainer = document.getElementById('result-container');
const $statusMessage = document.getElementById('status-message');
const $resultAudio = document.getElementById('result-audio');
const $audioUrl = document.getElementById('audio-url');
const $costInfo = document.getElementById('cost-info');
// Character counter
$synthesizeText.addEventListener('input', function() {
const charCount = this.value.length;
$charCountDisplay.textContent = charCount;
});
// Switch between emotion control methods
function toggleEmotionControl() {
const emotionControl = document.querySelector('input[name="emotion-control"]:checked').value;
// Hide every emotion control section
document.getElementById('emotion-audio-section').style.display = 'none';
document.getElementById('emotion-vector-section').style.display = 'none';
document.getElementById('emotion-text-section').style.display = 'none';
document.getElementById('emotion-weight-section').style.display = 'none';
// Show the sections that match the selection
if (emotionControl === '0') {
// No emotion reference: nothing to show
} else if (emotionControl === '1') {
document.getElementById('emotion-audio-section').style.display = 'block';
document.getElementById('emotion-weight-section').style.display = 'block';
} else if (emotionControl === '2') {
document.getElementById('emotion-vector-section').style.display = 'block';
document.getElementById('emotion-weight-section').style.display = 'block';
} else if (emotionControl === '3') {
document.getElementById('emotion-text-section').style.display = 'block';
document.getElementById('emotion-weight-section').style.display = 'block';
}
}
// Update the displayed sum of the emotion vector
function updateEmotionVectorTotal() {
const emotions = ['happy', 'angry', 'sad', 'fear', 'disgust', 'melancholy', 'surprise', 'calm'];
let total = 0;
emotions.forEach(function(emotion) {
total += parseFloat(document.getElementById('emotion-' + emotion).value) || 0;
});
document.getElementById('emotion-vector-total').textContent = total.toFixed(1);
}
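// Validate an audio file; type is 'spk' (20MB limit) or 'emo' (10MB limit)
function validateAudioFile(file, type) {
// 1. Reject a missing or empty file
if (!file || file.size === 0) {
alert('Please select a valid audio file; the file must not be empty');
return false;
}
// 2. Check the size against the documented limit (falls back to the 20MB speaker limit)
const maxSizeMB = type === 'emo' ? 10 : 20;
if (file.size > maxSizeMB * 1024 * 1024) {
alert('The audio file must not exceed ' + maxSizeMB + 'MB');
return false;
}
// 3. Check the file extension
const fileExtension = file.name.split('.').pop().toLowerCase();
if (!allowedAudioExtensions.includes(fileExtension)) {
alert('Unsupported file format; only WAV and MP3 are supported');
return false;
}
// 4. Check the MIME type
if (!allowedAudioMimeTypes.includes(file.type) && !file.type.startsWith('audio/')) {
alert('Invalid audio file type');
return false;
}
return true;
}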
// Set up drag-and-drop handling
function setupDragDrop(dropzone, fileInput, handleFunc, type) {
dropzone.addEventListener('dragover', (e) => {
e.preventDefault();
dropzone.classList.add('dragover');
});
dropzone.addEventListener('dragleave', () => {
dropzone.classList.remove('dragover');
});
dropzone.addEventListener('drop', (e) => {
e.preventDefault();
dropzone.classList.remove('dragover');
const files = e.dataTransfer.files;
if (files.length > 0 && validateAudioFile(files[0], type)) {
// Attach the dropped file to the input so it is included when the form is submitted
fileInput.files = files;
handleFunc(files[0], type);
}
});
}
// Handle a selected audio file
function handleAudioFile(file, type) {
const fileUrl = URL.createObjectURL(file);
const dropzone = type === 'spk' ? document.getElementById('spk-audio-dropzone') : document.getElementById('emo-audio-dropzone');
const result = type === 'spk' ? document.getElementById('spk-audio-result') : document.getElementById('emo-audio-result');
dropzone.style.display = 'none';
result.style.display = 'block';
// Initialize wavesurfer to draw the waveform
initWaveSurfer(fileUrl, type);
}
// Initialize WaveSurfer
function initWaveSurfer(fileUrl, type) {
const container = type === 'spk' ? '#spk-waveform' : '#emo-waveform';
const waveColor = type === 'spk' ? '#4f46e5' : '#ef4444';
const progressColor = type === 'spk' ? '#818cf8' : '#f87171';
const playBtn = type === 'spk' ? document.getElementById('spk-audio-play') : document.getElementById('emo-audio-play');
if (type === 'spk' && spkWaveSurfer) {
spkWaveSurfer.destroy();
} else if (type === 'emo' && emoWaveSurfer) {
emoWaveSurfer.destroy();
}
const newWaveSurfer = WaveSurfer.create({
container: container,
waveColor: waveColor,
progressColor: progressColor,
cursorColor: waveColor,
barWidth: 2,
barRadius: 3,
height: 80,
normalize: true
});
newWaveSurfer.load(fileUrl);
newWaveSurfer.on('play', () => {
playBtn.innerHTML = '<i class="fas fa-pause me-1"></i>Pause';
});
newWaveSurfer.on('pause', () => {
playBtn.innerHTML = '<i class="fas fa-play me-1"></i>Play';
});
if (type === 'spk') {
spkWaveSurfer = newWaveSurfer;
} else {
emoWaveSurfer = newWaveSurfer;
}
}
// Clear the selected audio file
function clearAudioFile(e) {
const isSpk = e.target.id.includes('spk');
const dropzone = isSpk ? document.getElementById('spk-audio-dropzone') : document.getElementById('emo-audio-dropzone');
const result = isSpk ? document.getElementById('spk-audio-result') : document.getElementById('emo-audio-result');
const fileInput = isSpk ? document.getElementById('spk-audio-file') : document.getElementById('emo-audio-file');
const wavesurfer = isSpk ? spkWaveSurfer : emoWaveSurfer;
fileInput.value = '';
result.style.display = 'none';
dropzone.style.display = 'block';
if (wavesurfer) {
wavesurfer.stop();
wavesurfer.destroy();
if (isSpk) {
spkWaveSurfer = null;
} else {
emoWaveSurfer = null;
}
}
}
// Bind events
document.addEventListener('DOMContentLoaded', function() {
// Initialize the emotion control sections
toggleEmotionControl();
// Bind the speaker audio events
const spkAudioDropzone = document.getElementById('spk-audio-dropzone');
const spkAudioFile = document.getElementById('spk-audio-file');
const spkAudioRemove = document.getElementById('spk-audio-remove');
const spkAudioPlay = document.getElementById('spk-audio-play');
const spkAudioStop = document.getElementById('spk-audio-stop');
spkAudioDropzone.addEventListener('click', () => spkAudioFile.click());
spkAudioFile.addEventListener('change', (e) => {
if (e.target.files.length > 0) {
if (validateAudioFile(e.target.files[0], 'spk')) {
handleAudioFile(e.target.files[0], 'spk');
} else {
e.target.value = '';
}
}
});
spkAudioRemove.addEventListener('click', clearAudioFile);
spkAudioPlay.addEventListener('click', () => spkWaveSurfer && spkWaveSurfer.playPause());
spkAudioStop.addEventListener('click', () => spkWaveSurfer && spkWaveSurfer.stop());
// Set up drag and drop
setupDragDrop(spkAudioDropzone, spkAudioFile, handleAudioFile, 'spk');
// Bind the emotion audio events
const emoAudioDropzone = document.getElementById('emo-audio-dropzone');
const emoAudioFile = document.getElementById('emo-audio-file');
const emoAudioRemove = document.getElementById('emo-audio-remove');
const emoAudioPlay = document.getElementById('emo-audio-play');
const emoAudioStop = document.getElementById('emo-audio-stop');
emoAudioDropzone.addEventListener('click', () => emoAudioFile.click());
emoAudioFile.addEventListener('change', (e) => {
if (e.target.files.length > 0) {
if (validateAudioFile(e.target.files[0], 'emo')) {
handleAudioFile(e.target.files[0], 'emo');
} else {
e.target.value = '';
}
}
});
emoAudioRemove.addEventListener('click', clearAudioFile);
emoAudioPlay.addEventListener('click', () => emoWaveSurfer && emoWaveSurfer.playPause());
emoAudioStop.addEventListener('click', () => emoWaveSurfer && emoWaveSurfer.stop());
// Set up drag and drop
setupDragDrop(emoAudioDropzone, emoAudioFile, handleAudioFile, 'emo');
// Form submit event
$synthesizeForm.addEventListener('submit', function(e) {
e.preventDefault();
synthesizeSpeech();
});
});
// Synthesize speech
async function synthesizeSpeech() {
// Show the loading state
$btnText.style.display = 'none';
$btnSpinner.style.display = 'inline-block';
$synthesizeBtn.disabled = true;
try {
// Read the form values
const apiKey = document.getElementById('api-key').value;
const text = document.getElementById('synthesize-text').value;
const emotionControl = document.querySelector('input[name="emotion-control"]:checked').value;
const emoAlpha = document.getElementById('emo_alpha').value;
// Read the advanced settings
const sampleRate = document.getElementById('sample-rate').value;
const intervalSilence = document.getElementById('interval-silence').value;
const speed = document.getElementById('speed').value;
const gain = document.getElementById('gain').value;
const useRandom = document.getElementById('use-random').checked;
const streamMode = document.getElementById('stream-mode').checked;
// Build the request parameters
const formData = new FormData();
formData.append('input', text);
formData.append('sample_rate', sampleRate);
formData.append('interval_silence', intervalSilence);
formData.append('speed', speed);
formData.append('gain', gain);
formData.append('use_random', useRandom);
formData.append('stream_mode', streamMode);
formData.append('emo_control_method', emotionControl);
// Send the emotion blend weight only when an emotion control method other than 0 is selected
if (emotionControl !== '0') {
formData.append('emo_alpha', emoAlpha);
}
// Add the emotion-specific parameters
if (emotionControl === '2') {
// Emotion vector: convert the slider strings to numbers so the JSON array holds numeric values
const emotions = ['happy', 'angry', 'sad', 'fear', 'disgust', 'melancholy', 'surprise', 'calm'];
const emoVec = emotions.map(emotion => {
return parseFloat(document.getElementById('emotion-' + emotion).value);
});
formData.append('emo_vec', JSON.stringify(emoVec));
} else if (emotionControl === '3') {
// Emotion text
const emoText = document.getElementById('emo_text').value;
formData.append('emo_text', emoText);
}
// Attach the audio files
const spkAudioFile = document.getElementById('spk-audio-file');
if (spkAudioFile.files.length > 0) {
formData.append('spk_audio_file', spkAudioFile.files[0]);
}
const emoAudioFile = document.getElementById('emo-audio-file');
if (emoAudioFile.files.length > 0) {
formData.append('emo_audio_file', emoAudioFile.files[0]);
}
// Send the request
const response = await fetch('https://www.yuntts.com/api/v1/indextts2_infer', {
method: 'POST',
headers: {
'Authorization': 'Bearer ' + apiKey
},
body: formData
});
// Handle the response
if (streamMode) {
// Streaming output: the response body is the audio data itself
if (response.ok) {
const blob = await response.blob();
const audioUrl = URL.createObjectURL(blob);
$emptyState.style.display = 'none';
$resultContainer.style.display = 'block';
$statusMessage.className = 'alert alert-success';
$statusMessage.textContent = 'Synthesis succeeded (streaming output)!';
$resultAudio.src = audioUrl;
$audioUrl.value = audioUrl;
$costInfo.innerHTML = `
<p>Characters: ${text.length}</p>
<p>Cost: calculating...</p>
`;
} else {
// Try to parse the error response
try {
const errorData = await response.json();
$emptyState.style.display = 'none';
$resultContainer.style.display = 'block';
$statusMessage.className = 'alert alert-danger';
$statusMessage.textContent = `Synthesis failed: ${errorData.message || 'unknown error'}`;
} catch {
$emptyState.style.display = 'none';
$resultContainer.style.display = 'block';
$statusMessage.className = 'alert alert-danger';
$statusMessage.textContent = 'Synthesis failed: server error';
}
$resultAudio.src = '';
$audioUrl.value = '';
$costInfo.innerHTML = '';
}
} else {
// Non-streaming output: parse the JSON response
const data = await response.json();
if (response.ok) {
$emptyState.style.display = 'none';
$resultContainer.style.display = 'block';
$statusMessage.className = 'alert alert-success';
$statusMessage.textContent = 'Synthesis succeeded!';
$resultAudio.src = data.data.audio_url;
$audioUrl.value = data.data.audio_url;
$costInfo.innerHTML = `
<p>Characters: ${data.data.char_count}</p>
<p>Cost: ${data.data.points_deducted} points</p>
`;
} else {
$emptyState.style.display = 'none';
$resultContainer.style.display = 'block';
$statusMessage.className = 'alert alert-danger';
$statusMessage.textContent = `Synthesis failed: ${data.message || 'unknown error'}`;
$resultAudio.src = '';
$audioUrl.value = '';
$costInfo.innerHTML = '';
}
}
} catch (error) {
console.error('Synthesis failed:', error);
$emptyState.style.display = 'none';
$resultContainer.style.display = 'block';
$statusMessage.className = 'alert alert-danger';
$statusMessage.textContent = 'Synthesis failed: network error, please try again later';
} finally {
// Restore the button state
$btnText.style.display = 'inline';
$btnSpinner.style.display = 'none';
$synthesizeBtn.disabled = false;
}
}
// Copy the audio link
function copyAudioUrl() {
const audioUrl = document.getElementById('audio-url');
audioUrl.select();
document.execCommand('copy');
alert('Audio link copied to clipboard');
}
</script>
</body>
</html>
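Response format
Success response
Streaming output: when stream_mode is enabled, the API returns the audio binary data directly; when it is disabled, the API returns a JSON response like the one below (its message field reads "Synthesis succeeded!").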
{
"code": 200,
"message": "合成成功!",
"data": {
"audio_url": "https://www.yuntts.com/wp-content/uploads/audio/processed/indextts_infer_5e8f9a_1678901234.mp3",
"format": "mp3",
"char_count": 15,
"points_deducted": 0.01
}
}
Error response (the message below reads "Invalid emotion vector: it must contain 8 dimensions")
{
"code": 400,
"message": "情绪向量无效:必须包含8个维度",
"data": null
}
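Because the body is raw audio in streaming mode and JSON otherwise, client code has to branch on the stream_mode value it sent. The sketch below shows one way to do that in Python. The function name `save_synthesis_result` is hypothetical, and `response` is assumed to be a `requests.Response` (ideally obtained with `requests.post(..., stream=True)` so that `iter_content` reads the body incrementally). Error bodies are assumed to follow the JSON error format shown above.

```python
def save_synthesis_result(response, stream_mode, out_path):
    """Return the saved file path (streaming) or the audio URL (non-streaming)."""
    if response.status_code != 200:
        # Error bodies are expected to be JSON with a "message" field.
        try:
            message = response.json().get("message", "unknown error")
        except ValueError:
            message = "non-JSON error body"
        raise RuntimeError(f"synthesis failed ({response.status_code}): {message}")
    if stream_mode:
        # Streaming: write the audio bytes to disk chunk by chunk.
        with open(out_path, "wb") as fh:
            for chunk in response.iter_content(chunk_size=8192):
                if chunk:
                    fh.write(chunk)
        return out_path
    # Non-streaming: the JSON body carries a URL to the finished audio file.
    return response.json()["data"]["audio_url"]
```

Branching on the flag the caller sent, rather than on a response header, mirrors what the front-end reference page does; the document does not state which Content-Type the streaming response uses.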
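Error codes
| Code | Description |
|---|---|
| 401 | Unauthorized: the API key is invalid or missing |
| 400 | Bad request parameters, such as empty text or a malformed emotion vector |
| 403 | Insufficient balance to complete the synthesis |
| 500 | Internal server error, such as a failed API call or file save |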
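A client can turn the error codes above into a decision about whether a retry is worthwhile. The following sketch is illustrative only: the name `classify_error` is hypothetical, and treating 500 as the only retryable code is an assumption made here, not a rule stated by the API.

```python
# Meanings taken from the error code table.
ERROR_MEANINGS = {
    400: "bad request parameters",
    401: "unauthorized: API key invalid or missing",
    403: "insufficient balance",
    500: "internal server error",
}

# Assumption: only server-side failures are worth retrying; 400, 401, and 403
# need a corrected request, a valid key, or a top-up first.
RETRYABLE_CODES = {500}


def classify_error(code):
    """Return (meaning, should_retry) for an error code."""
    meaning = ERROR_MEANINGS.get(code, "undocumented error code")
    return meaning, code in RETRYABLE_CODES
```

For instance, `classify_error(403)` reports insufficient balance and advises against retrying.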
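Notes
- API key: keep your API key safe and do not expose it in front-end code.
- File size: the speaker timbre reference audio may be up to 20MB and the emotion reference audio up to 10MB.
- Text length: keep the text for a single synthesis request under 600 characters.
- Emotion vector: make sure the vector has 8 dimensions and that every value is within the valid range.
- Emotion audio: when the emotion control method is 1, an emotion reference audio file is required.
- Streaming output: streaming is enabled by default and returns the audio stream in real time.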
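Longer texts need to be split to stay within the 600-character guideline. The sketch below splits at sentence-ending punctuation and packs sentences into chunks. The function name `split_text` is hypothetical, and it assumes the limit counts plain characters (Python `len`), which the document does not spell out.

```python
import re


def split_text(text, limit=600):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Split after Chinese or Western sentence-ending punctuation, keeping it attached.
    sentences = [s for s in re.split(r"(?<=[。!?;.!?;])", text) if s]
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-wrapped.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        # Start a new chunk when the next sentence would overflow the current one.
        if len(current) + len(sentence) > limit:
            chunks.append(current)
            current = ""
        current += sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request and the resulting audio concatenated by the caller.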
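Billing
- Billing unit: charged by the number of characters in the synthesized text
- Billing rules: different discounts apply depending on the user type
- Minimum charge: 0.01 yuan
- Character counting: each Chinese character counts as 2 characters; every other character counts as 1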
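The character-counting rule can be estimated locally before submitting text. This is a rough sketch with a hypothetical function name, `billable_chars`. It assumes that "Chinese character" means the CJK Unified Ideographs block (U+4E00 to U+9FFF) and that punctuation, including full-width punctuation, counts as 1. The service's exact definition is not given in this document, so the server-reported char_count remains authoritative.

```python
def billable_chars(text):
    """Estimate the billable character count: 2 per Chinese character, 1 otherwise."""
    total = 0
    for ch in text:
        # Assumption: only the basic CJK Unified Ideographs block counts double.
        total += 2 if "\u4e00" <= ch <= "\u9fff" else 1
    return total
```

With these assumptions, a text made of two Chinese characters followed by two Latin letters counts as 6 billable characters.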
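Example use cases
- Personalized speech synthesis: upload a speaker timbre reference audio clip to generate speech with a personal voice.
- Emotional speech synthesis: use an emotion vector or an emotion reference audio clip to generate speech with a specific emotion.
- Multimedia production: generate voice-overs for videos, animations, and other multimedia content.
- Smart assistants: give a smart assistant personalized spoken responses.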
Contact us
If you run into any problems, contact our technical support team:
- WeChat: yuntts
- Website: https://www.yuntts.com