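API Overview

The Index-TTS2 synchronous speech synthesis API is a speech synthesis endpoint that accepts file uploads. By uploading a speaker timbre reference audio file and an emotion reference audio file, users can generate speech with a specific emotional tone and a chosen voice. The endpoint supports streaming output for real-time synthesis. Because heavy site traffic and frequent calls can cause requests to fail, additional compute endpoints have been added as a supplement to keep calls stable and responsive.
To use Index-TTS2 speech synthesis online, visit the Index-TTS2 online speech synthesis page and choose Mode 1 to try it out.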

Endpoint Information

  • Endpoint URL: https://www.yuntts.com/api/v1/indextts2_infer
  • Request method: POST
  • Authentication: Authorization: Bearer <API_KEY>
  • Content-Type: multipart/form-data
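
As a minimal sketch of the authentication scheme listed above, the helper below builds the request headers in Python. The function name build_headers is hypothetical. Content-Type is left out on purpose: when files are attached, an HTTP library such as requests generates the multipart/form-data header together with its boundary, and setting it by hand would drop the boundary.

```python
def build_headers(api_key: str) -> dict[str, str]:
    """Return the Authorization header for the endpoint (hypothetical helper).

    Content-Type is not set here; a multipart-capable HTTP client adds it,
    including the boundary, when the request carries file parts.
    """
    if not api_key:
        raise ValueError("api_key must not be empty")
    return {"Authorization": f"Bearer {api_key}"}


headers = build_headers("YOUR_API_KEY")
print(headers)
```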

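Request Parameters

Parameter           Type     Default  Description
input               string   -        Text to synthesize
speed               float    1.0      Speech rate, range 0.1-4.0
sample_rate         int      24000    Target audio sample rate; supported values: 16000, 22050, 24000
gain                float    1.0      Volume, range 0.1-10.0
interval_silence    int      200      Silence between sentences (ms)
use_random          boolean  false    Enable emotion randomness
stream_mode         boolean  true     Enable streaming output
emo_control_method  int      0        Emotion control method: 0 = no emotion reference, 1 = emotion audio, 2 = emotion vector, 3 = emotion text
emo_alpha           float    -        Emotion weight, range 0.0-1.0; only effective when emo_control_method is not 0
emo_vec             array    -        Emotion vector with 8 dimensions: [happy, angry, sad, afraid, disgusted, melancholic, surprised, calm]; each dimension ranges from 0 to 1.2 and the sum must not exceed 1.5; only effective when emo_control_method=2
emo_text            string   -        Emotion text; only effective when emo_control_method=3
spk_audio_file      file     -        Speaker timbre reference audio; wav and mp3 supported, maximum 20MB
emo_audio_file      file     -        Emotion reference audio; wav and mp3 supported, maximum 10MB; required only when emo_control_method=1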
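
The ranges in the table can be checked on the client before a request is sent. The following is a rough sketch, not part of the API: the function names are hypothetical, and it assumes 1MB means 1024 * 1024 bytes, which the table does not specify.

```python
import os

SUPPORTED_SAMPLE_RATES = {16000, 22050, 24000}
SUPPORTED_EXTENSIONS = {".wav", ".mp3"}


def check_basic_params(speed: float = 1.0, gain: float = 1.0,
                       sample_rate: int = 24000) -> list[str]:
    """Return a list of problems; an empty list means all values are in range."""
    problems = []
    if not 0.1 <= speed <= 4.0:
        problems.append(f"speed {speed} is outside 0.1-4.0")
    if not 0.1 <= gain <= 10.0:
        problems.append(f"gain {gain} is outside 0.1-10.0")
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        problems.append(f"sample_rate {sample_rate} is not one of 16000, 22050, 24000")
    return problems


def check_audio_file(filename: str, size_bytes: int, max_mb: int) -> list[str]:
    """Check extension and size; use max_mb=20 for spk_audio_file, 10 for emo_audio_file."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"{filename}: only wav and mp3 are supported")
    # Assumption: 1MB is treated as 1024 * 1024 bytes.
    if size_bytes > max_mb * 1024 * 1024:
        problems.append(f"{filename}: larger than {max_mb}MB")
    return problems


print(check_basic_params(speed=5.0))
print(check_audio_file("speaker.wav", 3 * 1024 * 1024, max_mb=20))
```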

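Emotion Vector Dimensions

The emotion vector has 8 dimensions, in the following order:

  1. Happy - pleasure and joy
  2. Angry - anger and irritation
  3. Sad - sorrow and sadness
  4. Afraid - fear and dread
  5. Disgusted - dislike and disgust
  6. Melancholic - gloom and low spirits
  7. Surprised - surprise and shock
  8. Calm - calm and serenity

Each dimension takes a value in [0, 1.2], and the values of all dimensions must not add up to more than 1.5.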
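
The ordering and limits above can be enforced when the vector is built. The sketch below is illustrative only: the English keyword names are labels chosen for this example (the API itself only receives the 8-element array), and build_emo_vec is a hypothetical helper.

```python
import json

# Order of the 8 dimensions as listed above; the names are labels for this sketch only.
EMOTION_ORDER = ["happy", "angry", "sad", "afraid",
                 "disgusted", "melancholic", "surprised", "calm"]


def build_emo_vec(**weights: float) -> str:
    """Return the emo_vec form value as a JSON array string, or raise ValueError."""
    unknown = set(weights) - set(EMOTION_ORDER)
    if unknown:
        raise ValueError(f"unknown emotion names: {sorted(unknown)}")
    vector = [float(weights.get(name, 0.0)) for name in EMOTION_ORDER]
    for name, value in zip(EMOTION_ORDER, vector):
        if not 0.0 <= value <= 1.2:
            raise ValueError(f"{name}={value} is outside [0, 1.2]")
    # Small tolerance so sums such as 0.1 * 15 are not rejected by rounding error.
    if sum(vector) > 1.5 + 1e-9:
        raise ValueError(f"sum {sum(vector):.2f} exceeds 1.5")
    return json.dumps(vector)


print(build_emo_vec(happy=0.8, calm=0.2))
# prints [0.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2]
```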

Request Examples

Sending a request with cURL

Case 1: no emotion reference (emo_control_method=0)

curl -X POST "https://www.yuntts.com/api/v1/indextts2_infer" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "input=Hello, welcome to the Index-TTS2 speech synthesis service" \
  -F "speed=1.0" \
  -F "emo_control_method=0" \
  -F "spk_audio_file=@speaker.wav"

Case 2: emotion audio (emo_control_method=1)

curl -X POST "https://www.yuntts.com/api/v1/indextts2_infer" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "input=Hello, welcome to the Index-TTS2 speech synthesis service" \
  -F "speed=1.0" \
  -F "emo_control_method=1" \
  -F "emo_alpha=0.6" \
  -F "spk_audio_file=@speaker.wav" \
  -F "emo_audio_file=@emotion.wav"

Case 3: emotion vector (emo_control_method=2)

curl -X POST "https://www.yuntts.com/api/v1/indextts2_infer" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "input=Hello, welcome to the Index-TTS2 speech synthesis service" \
  -F "speed=1.0" \
  -F "emo_control_method=2" \
  -F "emo_vec=[0.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2]" \
  -F "emo_alpha=0.6" \
  -F "spk_audio_file=@speaker.wav"

Case 4: emotion text (emo_control_method=3)

curl -X POST "https://www.yuntts.com/api/v1/indextts2_infer" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "input=Hello, welcome to the Index-TTS2 speech synthesis service" \
  -F "speed=1.0" \
  -F "emo_control_method=3" \
  -F "emo_text=开心" \
  -F "emo_alpha=0.6" \
  -F "spk_audio_file=@speaker.wav"
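
The four cases differ only in which emotion fields accompany emo_control_method. A minimal Python sketch of that selection follows; emotion_fields is a hypothetical helper. Treating emo_vec and emo_text as mandatory for methods 2 and 3 is an assumption made here, since the parameter table only says when they take effect.

```python
import json
from typing import Optional, Sequence


def emotion_fields(method: int,
                   emo_alpha: Optional[float] = None,
                   emo_vec: Optional[Sequence[float]] = None,
                   emo_text: Optional[str] = None) -> dict[str, str]:
    """Return the text form fields for one emo_control_method.

    For method 1 the emotion audio is uploaded separately as emo_audio_file.
    """
    if method not in (0, 1, 2, 3):
        raise ValueError("emo_control_method must be 0, 1, 2 or 3")
    fields = {"emo_control_method": str(method)}
    if method == 0:
        return fields  # emo_alpha has no effect without an emotion reference
    if emo_alpha is not None:
        if not 0.0 <= emo_alpha <= 1.0:
            raise ValueError("emo_alpha must be within 0.0-1.0")
        fields["emo_alpha"] = str(emo_alpha)
    if method == 2:
        if emo_vec is None:  # assumption: treated as mandatory in this sketch
            raise ValueError("emo_vec is needed for method 2")
        fields["emo_vec"] = json.dumps(list(emo_vec))
    elif method == 3:
        if not emo_text:  # assumption: treated as mandatory in this sketch
            raise ValueError("emo_text is needed for method 3")
        fields["emo_text"] = emo_text
    return fields


print(emotion_fields(3, emo_alpha=0.6, emo_text="开心"))
```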

Sending a request with JavaScript

// Build the form data
const formData = new FormData();
formData.append('input', 'Hello, welcome to the Index-TTS2 speech synthesis service');
formData.append('response_format', 'mp3');
formData.append('speed', '1.0');
formData.append('emo_control_method', '2');
formData.append('emo_vec', JSON.stringify([0.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2]));
formData.append('emo_alpha', '0.6');
formData.append('spk_audio_file', spkAudioFile); // spkAudioFile is a File object taken from a file input

// Send the request; the browser sets the multipart Content-Type and boundary itself
const response = await fetch('https://www.yuntts.com/api/v1/indextts2_infer', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY'
  },
  body: formData
});

// Parse the response
const data = await response.json();
if (response.ok) {
  console.log('Synthesis succeeded:', data);
  console.log('Audio URL:', data.data.audio_url);
} else {
  console.error('Synthesis failed:', data.message);
}

Sending a request with Python (emotion text)

import requests

url = "https://www.yuntts.com/api/v1/indextts2_infer"
headers = {
    "Authorization": "Bearer YOUR_API_KEY"
}

# Form fields
data = {
    "input": "Hello, welcome to the Index-TTS2 speech synthesis service",
    "response_format": "mp3",
    "speed": "1.0",
    "emo_control_method": "3",
    "emo_text": "开心",  # emotion text; "开心" means "happy"
    "emo_alpha": "0.6"
}

# Open the speaker reference audio; the with block closes it even if the request fails
with open("speaker.wav", "rb") as spk_audio:
    files = {"spk_audio_file": spk_audio}
    # Send the request
    response = requests.post(url, headers=headers, data=data, files=files)

# Handle the response
result = response.json()
if response.status_code == 200:
    print("Synthesis succeeded!")
    print(f"Audio URL: {result['data']['audio_url']}")
else:
    print(f"Synthesis failed: {result['message']}")
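
The example above reads a JSON body, but stream_mode defaults to true and the reference page describes that mode as returning an audio stream in real time. The sketch below shows one way a client might write such a stream to disk. It assumes, without confirmation from this document, that the streamed response body is raw audio bytes; save_audio_chunks and the output file name are hypothetical.

```python
from typing import Iterable


def save_audio_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Write byte chunks to path as they arrive and return the total bytes written."""
    total = 0
    with open(path, "wb") as out:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                out.write(chunk)
                total += len(chunk)
    return total


# Hypothetical usage with requests (not executed here). It assumes the body
# is raw audio when stream_mode is true:
#
# with requests.post(url, headers=headers, data=data, files=files, stream=True) as resp:
#     resp.raise_for_status()
#     save_audio_chunks(resp.iter_content(chunk_size=8192), "output.wav")
```

Passing stream=True keeps requests from loading the whole body into memory, and iter_content yields the chunks as they are received.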

Front-End Reference File

<!DOCTYPE html>
<html lang="zh-CN">

<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>IndexTTS2 Speech Synthesis Test Page</title>
    <link href="https://cdn.staticfile.net/bootstrap/5.3.2/css/bootstrap.min.css" rel="stylesheet">
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.0/css/all.min.css">
    <style>
        body {
            font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, 'Helvetica Neue', Arial, sans-serif;
            background-color: #f8f9fa;
        }

        .upload-area {
            border: 2px dashed #ced4da;
            border-radius: 8px;
            padding: 2rem;
            text-align: center;
            transition: all 0.3s ease;
        }

        .upload-area:hover {
            border-color: #007bff;
            background-color: #f0f8ff;
        }

        .upload-area.dragover {
            border-color: #007bff;
            background-color: #e3f2fd;
        }

        .card {
            box-shadow: 0 0.5rem 1rem rgba(0, 0, 0, 0.15);
            border-radius: 0.75rem;
        }

        #result-container {
            display: none;
        }

        .audio-player {
            width: 100%;
            margin-top: 1rem;
        }
    </style>
</head>
<body>
    <!-- Page header -->
    <header class="bg-primary text-white py-5">
        <div class="container">
            <div class="text-center">
                <h1 class="display-5 fw-bold mb-4">Index‑TTS2 Speech Synthesis Test</h1>
                <p class="lead mb-4">Test API: /wp-json/v1/indextts2_infer</p>
            </div>
        </div>
    </header>

    <!-- Main content area -->
    <section class="py-5">
        <div class="container">
            <div class="row g-4">
                <!-- Left: form -->
                <div class="col-md-7">
                    <div class="card p-4">
                        <form id="synthesize-form" enctype="multipart/form-data">
                            <!-- API key input -->
                            <div class="mb-4">
                                <label for="api-key" class="form-label">API key</label>
                                <input type="text" class="form-control" id="api-key" placeholder="Enter your API key" required>
                                <div class="form-text">The API key is used for authentication</div>
                            </div>

                            <!-- Speaker timbre reference audio -->
                            <div class="mb-4">
                                <label class="form-label">Speaker timbre reference audio</label>
                                <div class="upload-area mb-3" id="spk-audio-dropzone">
                                    <input type="file" id="spk-audio-file" name="spk_audio_file" class="d-none" accept="audio/mp3,audio/wav">
                                    <i class="fas fa-cloud-upload display-4 text-primary mb-2"></i>
                                    <p class="mb-1">Click or drag an audio file here to upload</p>
                                    <p class="text-muted small">WAV and MP3 supported, maximum 20MB</p>
                                </div>
                                <div class="upload-area" id="spk-audio-result" style="display: none;">
                                    <div id="spk-waveform"></div>
                                    <div class="d-flex gap-2 mt-2">
                                        <button type="button" class="btn btn-sm btn-outline-primary" id="spk-audio-play">
                                            <i class="fas fa-play me-1"></i>Play
                                        </button>
                                        <button type="button" class="btn btn-sm btn-outline-secondary" id="spk-audio-stop">
                                            <i class="fas fa-stop me-1"></i>Stop
                                        </button>
                                        <button type="button" class="btn btn-sm btn-outline-secondary ms-auto" id="spk-audio-remove">
                                            <i class="fas fa-times me-1"></i>Replace audio
                                        </button>
                                    </div>
                                </div>
                            </div>



                            <!-- Text to synthesize -->
                            <div class="mb-4">
                                <label for="synthesize-text" class="form-label">Text to synthesize</label>
                                <textarea class="form-control" id="synthesize-text" rows="5" placeholder="Enter the text to synthesize" required maxlength="600"></textarea>
                                <div class="d-flex justify-content-between align-items-center mt-2">
                                    <div class="form-text mb-0">Supports Chinese, English, and other languages</div>
                                    <div class="form-text mb-0">
                                        <span id="char-count-display">0</span>/600 characters
                                    </div>
                                </div>
                            </div>

                            <!-- Advanced settings -->
                            <div class="mb-4">
                                <button class="btn btn-primary w-100 d-flex justify-content-between align-items-center" type="button" data-bs-toggle="collapse" data-bs-target="#advanced-options" aria-expanded="false" aria-controls="advanced-options">
                                    <span>Advanced settings</span>
                                    <i class="fas fa-chevron-down"></i>
                                </button>
                                <div class="collapse" id="advanced-options">
                                    <div class="card card-body mt-2">
                                        <div class="row g-3">
                                            <!-- Output format: the remote API only returns WAV, so no selector is needed -->
                                            <div class="col-md-6 mb-4">
                                                <label for="sample-rate" class="form-label">Target audio sample rate</label>
                                                <select class="form-select" id="sample-rate">
                                                    <option value="16000">16000</option>
                                                    <option value="22050">22050</option>
                                                    <option value="24000" selected>24000</option>
                                                </select>
                                            </div>
                                            <div class="col-md-6 mb-4">
                                                <label for="interval-silence" class="form-label">Silence between sentences (ms) (default 200)</label>
                                                <input type="number" class="form-control" id="interval-silence" min="0" max="1000" value="200" />
                                            </div>
                                            <!-- Maximum tokens per sentence: not defined in the documentation, removed -->
                                            <div class="col-md-6 mb-4">
                                                <div class="rounded-3 border border-primary-subtle p-3 bg-white">
                                                    <div class="d-flex justify-content-between align-items-center mb-2">
                                                        <label for="speed" class="form-label mb-0">Speech rate (default 1.0)</label>
                                                        <span class="form-text text-muted fw-semibold"> Current value: <span id="speed-value">1.0</span>
                                                        </span>
                                                    </div>
                                                    <input type="range" class="form-range rounded-pill border border-primary/20 p-1" id="speed" min="0.1" max="4.0" step="0.1" value="1.0" oninput="document.getElementById('speed-value').textContent = this.value" />
                                                </div>
                                            </div>
                                            <div class="col-md-6 mb-4">
                                                <div class="rounded-3 border border-primary-subtle p-3 bg-white">
                                                    <div class="d-flex justify-content-between align-items-center mb-2">
                                                        <label for="gain" class="form-label mb-0">Volume (default 1.0)</label>
                                                        <span class="form-text text-muted fw-semibold"> Current value: <span id="gain-value">1.0</span>
                                                        </span>
                                                    </div>
                                                    <input type="range" class="form-range rounded-pill border border-primary/20 p-1" id="gain" min="0.1" max="10.0" step="0.1" value="1.0" oninput="document.getElementById('gain-value').textContent = this.value" />
                                                </div>
                                            </div>
                                        </div>
                                    </div>
                                </div>
                            </div>

                            <!-- Emotion randomness -->
                            <div class="my-5">
                                <div class="form-check form-switch">
                                    <input class="form-check-input" type="checkbox" id="use-random" />
                                    <label class="form-check-label" for="use-random">Enable emotion randomness</label>
                                </div>
                                <div class="form-text"> Adds some randomness to emotion control for more variety </div>
                            </div>

                            <!-- Streaming output -->
                            <div class="my-5">
                                <div class="form-check form-switch">
                                    <input class="form-check-input" type="checkbox" id="stream-mode" checked />
                                    <label class="form-check-label" for="stream-mode">Enable streaming output</label>
                                </div>
                                <div class="form-text"> When enabled, the audio stream is returned in real time; when disabled, the complete audio file is returned after synthesis finishes </div>
                            </div>

                            <!-- Emotion control method -->
                            <div class="mb-4">
                                <label class="form-label">Emotion control method</label>
                                <div class="row g-3">
                                    <div class="col-md-4 emotion-control-option" data-target="emotion-control-0" style="cursor: pointer;">
                                        <div class="form-check border p-4 rounded d-flex align-items-center justify-content-start h-100">
                                            <input class="form-check-input me-3 mx-2" type="radio" name="emotion-control" id="emotion-control-0" value="0" checked onchange="toggleEmotionControl()" />
                                            <label class="form-check-label" for="emotion-control-0"> No emotion reference </label>
                                        </div>
                                    </div>
                                    <div class="col-md-4 emotion-control-option" data-target="emotion-control-1" style="cursor: pointer;">
                                        <div class="form-check border p-4 rounded d-flex align-items-center justify-content-start h-100">
                                            <input class="form-check-input me-3 mx-2" type="radio" name="emotion-control" id="emotion-control-1" value="1" onchange="toggleEmotionControl()" />
                                            <label class="form-check-label" for="emotion-control-1"> Emotion audio </label>
                                        </div>
                                    </div>
                                    <div class="col-md-4 emotion-control-option" data-target="emotion-control-2" style="cursor: pointer;">
                                        <div class="form-check border p-4 rounded d-flex align-items-center justify-content-start h-100">
                                            <input class="form-check-input me-3 mx-2" type="radio" name="emotion-control" id="emotion-control-2" value="2" onchange="toggleEmotionControl()" />
                                            <label class="form-check-label" for="emotion-control-2"> Emotion vector </label>
                                        </div>
                                    </div>
                                    <div class="col-md-4 emotion-control-option" data-target="emotion-control-3" style="cursor: pointer;">
                                        <div class="form-check border p-4 rounded d-flex align-items-center justify-content-start h-100">
                                            <input class="form-check-input me-3 mx-2" type="radio" name="emotion-control" id="emotion-control-3" value="3" onchange="toggleEmotionControl()" />
                                            <label class="form-check-label" for="emotion-control-3"> Emotion text </label>
                                        </div>
                                    </div>
                                </div>
                                <div class="form-text">Select the emotion control strategy the model uses. The emotion audio option requires uploading an emotion reference audio file.</div>
                            </div>

                            <!-- Emotion audio -->
                            <div class="mb-4" id="emotion-audio-section">
                                <label class="form-label">Emotion reference audio (optional)</label>
                                <div class="upload-area mb-3" id="emo-audio-dropzone">
                                    <input type="file" id="emo-audio-file" name="emo_audio_file" class="d-none" accept="audio/mp3,audio/wav">
                                    <i class="fas fa-cloud-upload display-4 text-primary mb-2"></i>
                                    <p class="mb-1">Click or drag an audio file here to upload</p>
                                    <p class="text-muted small">WAV and MP3 supported, maximum 10MB</p>
                                </div>
                                <div class="upload-area" id="emo-audio-result" style="display: none;">
                                    <div id="emo-waveform"></div>
                                    <div class="d-flex gap-2 mt-2">
                                        <button type="button" class="btn btn-sm btn-outline-primary" id="emo-audio-play">
                                            <i class="fas fa-play me-1"></i>Play
                                        </button>
                                        <button type="button" class="btn btn-sm btn-outline-secondary" id="emo-audio-stop">
                                            <i class="fas fa-stop me-1"></i>Stop
                                        </button>
                                        <button type="button" class="btn btn-sm btn-outline-secondary ms-auto" id="emo-audio-remove">
                                            <i class="fas fa-times me-1"></i>Replace audio
                                        </button>
                                    </div>
                                </div>
                            </div>

                            <!-- Emotion vector control -->
                            <div class="mb-4 bg-light rounded-4 border border-secondary-subtle p-4" id="emotion-vector-section" style="display: none;">
                                <label class="form-label fw-medium py-3">Emotion vector dimension weights</label>
                                <div class="alert alert-info mb-4 p-3 rounded-3 border border-info-subtle">
                                    <strong>Sum limit:</strong> The values of all emotion dimensions must not add up to more than 1.5. <span class="float-end">Current total: <strong id="emotion-vector-total">0.0</strong></span>
                                </div>
                                <div class="row g-3">
                                    <div class="col-md-6">
                                        <div class="rounded-3 border border-primary-subtle p-3 bg-white">
                                            <div class="d-flex justify-content-between align-items-center mb-2">
                                                <label for="emotion-happy" class="form-label text-body mb-0">Happy</label>
                                                <span class="form-text text-muted fw-semibold"> Current value: <span id="emotion-happy-value">0.0</span>
                                                </span>
                                            </div>
                                            <input type="range" class="form-range rounded-pill border border-primary/20 p-1" id="emotion-happy" min="0.0" max="1.2" step="0.1" value="0.0" oninput="document.getElementById('emotion-happy-value').textContent = this.value; updateEmotionVectorTotal()" />
                                        </div>
                                    </div>
                                    <div class="col-md-6">
                                        <div class="rounded-3 border border-primary-subtle p-3 bg-white">
                                            <div class="d-flex justify-content-between align-items-center mb-2">
                                                <label for="emotion-angry" class="form-label text-body mb-0">Angry</label>
                                                <span class="form-text text-muted fw-semibold"> Current value: <span id="emotion-angry-value">0.0</span>
                                                </span>
                                            </div>
                                            <input type="range" class="form-range rounded-pill border border-primary/20 p-1" id="emotion-angry" min="0.0" max="1.2" step="0.1" value="0.0" oninput="document.getElementById('emotion-angry-value').textContent = this.value; updateEmotionVectorTotal()" />
                                        </div>
                                    </div>
                                    <div class="col-md-6">
                                        <div class="rounded-3 border border-primary-subtle p-3 bg-white">
                                            <div class="d-flex justify-content-between align-items-center mb-2">
                                                <label for="emotion-sad" class="form-label text-body mb-0">Sad</label>
                                                <span class="form-text text-muted fw-semibold"> Current value: <span id="emotion-sad-value">0.0</span>
                                                </span>
                                            </div>
                                            <input type="range" class="form-range rounded-pill border border-primary/20 p-1" id="emotion-sad" min="0.0" max="1.2" step="0.1" value="0.0" oninput="document.getElementById('emotion-sad-value').textContent = this.value; updateEmotionVectorTotal()" />
                                        </div>
                                    </div>
                                    <div class="col-md-6">
                                        <div class="rounded-3 border border-primary-subtle p-3 bg-white">
                                            <div class="d-flex justify-content-between align-items-center mb-2">
                                                <label for="emotion-fear" class="form-label text-body mb-0">Afraid</label>
                                                <span class="form-text text-muted fw-semibold"> Current value: <span id="emotion-fear-value">0.0</span>
                                                </span>
                                            </div>
                                            <input type="range" class="form-range rounded-pill border border-primary/20 p-1" id="emotion-fear" min="0.0" max="1.2" step="0.1" value="0.0" oninput="document.getElementById('emotion-fear-value').textContent = this.value; updateEmotionVectorTotal()" />
                                        </div>
                                    </div>
                                    <div class="col-md-6">
                                        <div class="rounded-3 border border-primary-subtle p-3 bg-white">
                                            <div class="d-flex justify-content-between align-items-center mb-2">
                                                <label for="emotion-disgust" class="form-label text-body mb-0">Disgusted</label>
                                                <span class="form-text text-muted fw-semibold"> Current value: <span id="emotion-disgust-value">0.0</span>
                                                </span>
                                            </div>
                                            <input type="range" class="form-range rounded-pill border border-primary/20 p-1" id="emotion-disgust" min="0.0" max="1.2" step="0.1" value="0.0" oninput="document.getElementById('emotion-disgust-value').textContent = this.value; updateEmotionVectorTotal()" />
                                        </div>
                                    </div>
                                    <div class="col-md-6">
                                        <div class="rounded-3 border border-primary-subtle p-3 bg-white">
                                            <div class="d-flex justify-content-between align-items-center mb-2">
                                                <label for="emotion-melancholy" class="form-label text-body mb-0">Melancholic</label>
                                                <span class="form-text text-muted fw-semibold"> Current value: <span id="emotion-melancholy-value">0.0</span>
                                                </span>
                                            </div>
                                            <input type="range" class="form-range rounded-pill border border-primary/20 p-1" id="emotion-melancholy" min="0.0" max="1.2" step="0.1" value="0.0" oninput="document.getElementById('emotion-melancholy-value').textContent = this.value; updateEmotionVectorTotal()" />
                                        </div>
                                    </div>
                                    <div class="col-md-6">
                                        <div class="rounded-3 border border-primary-subtle p-3 bg-white">
                                            <div class="d-flex justify-content-between align-items-center mb-2">
                                                <label for="emotion-surprise" class="form-label text-body mb-0">Surprised</label>
                                                <span class="form-text text-muted fw-semibold"> Current value: <span id="emotion-surprise-value">0.0</span>
                                                </span>
                                            </div>
                                            <input type="range" class="form-range rounded-pill border border-primary/20 p-1" id="emotion-surprise" min="0.0" max="1.2" step="0.1" value="0.0" oninput="document.getElementById('emotion-surprise-value').textContent = this.value; updateEmotionVectorTotal()" />
                                        </div>
                                    </div>
                                    <div class="col-md-6">
                                        <div class="rounded-3 border border-primary-subtle p-3 bg-white">
                                            <div class="d-flex justify-content-between align-items-center mb-2">
                                                <label for="emotion-calm" class="form-label text-body mb-0">Calm</label>
                                                <span class="form-text text-muted fw-semibold"> Current value: <span id="emotion-calm-value">0.0</span>
                                                </span>
                                            </div>
                                            <input type="range" class="form-range rounded-pill border border-primary/20 p-1" id="emotion-calm" min="0.0" max="1.2" step="0.1" value="0.0" oninput="document.getElementById('emotion-calm-value').textContent = this.value; updateEmotionVectorTotal()" />
                                        </div>
                                    </div>
                                </div>
                            </div>

                            <!-- Emotion text -->
                            <div class="mb-4 shadow-none p-3 bg-light rounded" id="emotion-text-section" style="display: none;">
                                <label for="emo_text" class="form-label">Emotion (optional)</label>
                                <input type="text" class="form-control" id="emo_text" list="emotion-options" placeholder="Select or enter an emotion..." value="开心" />
                                <datalist id="emotion-options">
                                    <option value="开心">
                                    <option value="高兴">
                                    <option value="生气">
                                    <option value="悲伤">
                                    <option value="害怕">
                                    <option value="厌恶">
                                    <option value="忧郁">
                                    <option value="惊讶">
                                    <option value="平静">
                                </datalist>
                                <div class="form-text mt-1">Choose from the list or type a custom emotion.</div>
                            </div>

                            <!-- Emotion blending weight -->
                            <div class="mb-4 rounded-3 border-primary-subtle" id="emotion-weight-section">
                                <div class="p-3 md-2">
                                    <div class="d-flex justify-content-between align-items-center w-100 mb-2">
                                        <label for="emo_alpha" class="form-label mb-0">Emotion blending weight (0.0-1.0)</label>
                                        <span class="form-text text-muted fw-semibold"> Current value: <span id="emo_alpha-value">1.0</span>
                                        </span>
                                    </div>
                                    <input type="range" class="form-range rounded-pill border border-primary/20 p-1" id="emo_alpha" min="0.0" max="1.0" step="0.1" value="1.0" oninput="document.getElementById('emo_alpha-value').textContent = this.value" />
                                </div>
                                <div class="form-text mt-1">Controls how strongly the emotion features influence the output; the larger the value, the more pronounced the emotion in the generated speech.</div>
                            </div>

                            <!-- Synthesize button -->
                            <div class="d-grid gap-2">
                                <button type="submit" class="btn btn-primary btn-lg" id="synthesize-btn">
                                    <span id="btn-text">Start Synthesis</span>
                                    <span id="btn-spinner" class="spinner-border spinner-border-sm" role="status" aria-hidden="true" style="display: none;"></span>
                                </button>
                            </div>
                        </form>
                    </div>
                </div>

                <!-- Right: result -->
                <div class="col-md-5">
                    <div class="card h-100">
                        <div class="card-header">
                            <h5 class="card-title mb-0">Synthesis Result</h5>
                        </div>
                        <div class="card-body">
                            <!-- Empty state -->
                            <div id="empty-state" class="text-center py-5">
                                <i class="fas fa-waveform fa-3x text-primary mb-3"></i>
                                <h5 class="text-muted mb-2">No synthesis result yet</h5>
                                <p class="text-muted small">Fill in the form and click "Start Synthesis" first</p>
                            </div>

                            <!-- Result state -->
                            <div id="result-container" class="py-3">
                                <div class="mb-3">
                                    <label class="form-label">Synthesis status</label>
                                    <div id="status-message" class="alert alert-info"></div>
                                </div>
                                <div class="mb-3">
                                    <label class="form-label">Audio preview</label>
                                    <audio id="result-audio" class="audio-player" controls></audio>
                                </div>
                                <div class="mb-3">
                                    <label class="form-label">Audio link</label>
                                    <input type="text" id="audio-url" class="form-control" readonly>
                                    <button type="button" class="btn btn-sm btn-outline-primary mt-2" onclick="copyAudioUrl()">
                                        <i class="fas fa-copy me-1"></i>Copy link
                                    </button>
                                </div>
                                <div class="mb-3">
                                    <label class="form-label">Usage information</label>
                                    <div id="cost-info" class="bg-light p-3 rounded"></div>
                                </div>
                            </div>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </section>

    <!-- Footer -->
    <footer class="bg-dark text-white py-4">
        <div class="container text-center">
            <p class="mb-0">IndexTTS2 Speech Synthesis Test Page &copy; 2026</p>
        </div>
    </footer>

    <!-- Required JavaScript -->
    <script src="https://cdn.staticfile.net/bootstrap/5.3.2/js/bootstrap.bundle.min.js"></script>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/wavesurfer.js/7.3.2/wavesurfer.min.js"></script>
    <script>
        // Global state
        let spkWaveSurfer = null;
        let emoWaveSurfer = null;
        const allowedAudioExtensions = ['wav', 'mp3'];
        const allowedAudioMimeTypes = ['audio/wav', 'audio/mp3', 'audio/mpeg'];

        // DOM element references
        const $synthesizeForm = document.getElementById('synthesize-form');
        const $synthesizeBtn = document.getElementById('synthesize-btn');
        const $btnText = document.getElementById('btn-text');
        const $btnSpinner = document.getElementById('btn-spinner');
        const $synthesizeText = document.getElementById('synthesize-text');
        const $charCountDisplay = document.getElementById('char-count-display');
        const $emptyState = document.getElementById('empty-state');
        const $resultContainer = document.getElementById('result-container');
        const $statusMessage = document.getElementById('status-message');
        const $resultAudio = document.getElementById('result-audio');
        const $audioUrl = document.getElementById('audio-url');
        const $costInfo = document.getElementById('cost-info');

        // Update the character counter as the user types
        $synthesizeText.addEventListener('input', function() {
            const charCount = this.value.length;
            $charCountDisplay.textContent = charCount;
        });

        // Switch the emotion control method
        function toggleEmotionControl() {
            const emotionControl = document.querySelector('input[name="emotion-control"]:checked').value;

            // Hide all emotion control sections
            document.getElementById('emotion-audio-section').style.display = 'none';
            document.getElementById('emotion-vector-section').style.display = 'none';
            document.getElementById('emotion-text-section').style.display = 'none';
            document.getElementById('emotion-weight-section').style.display = 'none';

            // Show the sections that match the selection
            if (emotionControl === '0') {
                // No emotion reference: nothing to show
            } else if (emotionControl === '1') {
                document.getElementById('emotion-audio-section').style.display = 'block';
                document.getElementById('emotion-weight-section').style.display = 'block';
            } else if (emotionControl === '2') {
                document.getElementById('emotion-vector-section').style.display = 'block';
                document.getElementById('emotion-weight-section').style.display = 'block';
            } else if (emotionControl === '3') {
                document.getElementById('emotion-text-section').style.display = 'block';
                document.getElementById('emotion-weight-section').style.display = 'block';
            }
        }

        // Update the displayed sum of the emotion vector
        function updateEmotionVectorTotal() {
            const emotions = ['happy', 'angry', 'sad', 'fear', 'disgust', 'melancholy', 'surprise', 'calm'];
            let total = 0;
            emotions.forEach(function(emotion) {
                total += parseFloat(document.getElementById('emotion-' + emotion).value) || 0;
            });
            document.getElementById('emotion-vector-total').textContent = total.toFixed(1);
        }

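        // Validate an audio file
        function validateAudioFile(file) {
            // 1. Reject a missing or empty file
            if (!file || file.size === 0) {
                alert('Please select a valid audio file; the file must not be empty');
                return false;
            }

            // 2. Check the file size. This test page applies a 5 MB cap, which is
            //    stricter than the 20 MB / 10 MB limits documented for the API.
            const maxSize = 5 * 1024 * 1024;
            if (file.size > maxSize) {
                alert('The audio file must not exceed 5 MB');
                return false;
            }

            // 3. Check the file extension
            const fileExtension = file.name.split('.').pop().toLowerCase();
            if (!allowedAudioExtensions.includes(fileExtension)) {
                alert('Unsupported file format; only WAV and MP3 are supported');
                return false;
            }

            // 4. Check the MIME type
            if (!allowedAudioMimeTypes.includes(file.type) && !file.type.startsWith('audio/')) {
                alert('Invalid audio file type');
                return false;
            }

            return true;
        }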

        // Set up drag-and-drop handling
        function setupDragDrop(dropzone, fileInput, handleFunc, type) {
            dropzone.addEventListener('dragover', (e) => {
                e.preventDefault();
                dropzone.classList.add('dragover');
            });

            dropzone.addEventListener('dragleave', () => {
                dropzone.classList.remove('dragover');
            });

            dropzone.addEventListener('drop', (e) => {
                e.preventDefault();
                dropzone.classList.remove('dragover');
                const files = e.dataTransfer.files;
                if (files.length > 0 && validateAudioFile(files[0])) {
                    // Store the first dropped file on the file input so that
                    // it is included in the upload when the form is submitted
                    const transfer = new DataTransfer();
                    transfer.items.add(files[0]);
                    fileInput.files = transfer.files;
                    handleFunc(files[0], type);
                }
            });
        }

        // Handle a selected audio file
        function handleAudioFile(file, type) {
            const fileUrl = URL.createObjectURL(file);
            const dropzone = type === 'spk' ? document.getElementById('spk-audio-dropzone') : document.getElementById('emo-audio-dropzone');
            const result = type === 'spk' ? document.getElementById('spk-audio-result') : document.getElementById('emo-audio-result');

            dropzone.style.display = 'none';
            result.style.display = 'block';

            // Initialize wavesurfer to render the waveform
            initWaveSurfer(fileUrl, type);
        }

        // Initialize WaveSurfer
        function initWaveSurfer(fileUrl, type) {
            const container = type === 'spk' ? '#spk-waveform' : '#emo-waveform';
            const waveColor = type === 'spk' ? '#4f46e5' : '#ef4444';
            const progressColor = type === 'spk' ? '#818cf8' : '#f87171';
            const playBtn = type === 'spk' ? document.getElementById('spk-audio-play') : document.getElementById('emo-audio-play');

            if (type === 'spk' && spkWaveSurfer) {
                spkWaveSurfer.destroy();
            } else if (type === 'emo' && emoWaveSurfer) {
                emoWaveSurfer.destroy();
            }

            const newWaveSurfer = WaveSurfer.create({
                container: container,
                waveColor: waveColor,
                progressColor: progressColor,
                cursorColor: waveColor,
                barWidth: 2,
                barRadius: 3,
                height: 80,
                normalize: true
            });

            newWaveSurfer.load(fileUrl);

            newWaveSurfer.on('play', () => {
                playBtn.innerHTML = '<i class="fas fa-pause me-1"></i>Pause';
            });

            newWaveSurfer.on('pause', () => {
                playBtn.innerHTML = '<i class="fas fa-play me-1"></i>Play';
            });

            if (type === 'spk') {
                spkWaveSurfer = newWaveSurfer;
            } else {
                emoWaveSurfer = newWaveSurfer;
            }
        }

        // Clear the selected audio file
        function clearAudioFile(e) {
            const isSpk = e.target.id.includes('spk');
            const dropzone = isSpk ? document.getElementById('spk-audio-dropzone') : document.getElementById('emo-audio-dropzone');
            const result = isSpk ? document.getElementById('spk-audio-result') : document.getElementById('emo-audio-result');
            const fileInput = isSpk ? document.getElementById('spk-audio-file') : document.getElementById('emo-audio-file');
            const wavesurfer = isSpk ? spkWaveSurfer : emoWaveSurfer;

            fileInput.value = '';
            result.style.display = 'none';
            dropzone.style.display = 'block';

            if (wavesurfer) {
                wavesurfer.stop();
                wavesurfer.destroy();
                if (isSpk) {
                    spkWaveSurfer = null;
                } else {
                    emoWaveSurfer = null;
                }
            }
        }

        // Bind events
        document.addEventListener('DOMContentLoaded', function() {
            // Initialize the emotion control sections
            toggleEmotionControl();

            // Bind the speaker audio events
            const spkAudioDropzone = document.getElementById('spk-audio-dropzone');
            const spkAudioFile = document.getElementById('spk-audio-file');
            const spkAudioRemove = document.getElementById('spk-audio-remove');
            const spkAudioPlay = document.getElementById('spk-audio-play');
            const spkAudioStop = document.getElementById('spk-audio-stop');

            spkAudioDropzone.addEventListener('click', () => spkAudioFile.click());
            spkAudioFile.addEventListener('change', (e) => {
                if (e.target.files.length > 0) {
                    if (validateAudioFile(e.target.files[0])) {
                        handleAudioFile(e.target.files[0], 'spk');
                    } else {
                        e.target.value = '';
                    }
                }
            });
            spkAudioRemove.addEventListener('click', clearAudioFile);
            spkAudioPlay.addEventListener('click', () => spkWaveSurfer && spkWaveSurfer.playPause());
            spkAudioStop.addEventListener('click', () => spkWaveSurfer && spkWaveSurfer.stop());

            // Set up drag and drop
            setupDragDrop(spkAudioDropzone, spkAudioFile, handleAudioFile, 'spk');

            // Bind the emotion audio events
            const emoAudioDropzone = document.getElementById('emo-audio-dropzone');
            const emoAudioFile = document.getElementById('emo-audio-file');
            const emoAudioRemove = document.getElementById('emo-audio-remove');
            const emoAudioPlay = document.getElementById('emo-audio-play');
            const emoAudioStop = document.getElementById('emo-audio-stop');

            emoAudioDropzone.addEventListener('click', () => emoAudioFile.click());
            emoAudioFile.addEventListener('change', (e) => {
                if (e.target.files.length > 0) {
                    if (validateAudioFile(e.target.files[0])) {
                        handleAudioFile(e.target.files[0], 'emo');
                    } else {
                        e.target.value = '';
                    }
                }
            });
            emoAudioRemove.addEventListener('click', clearAudioFile);
            emoAudioPlay.addEventListener('click', () => emoWaveSurfer && emoWaveSurfer.playPause());
            emoAudioStop.addEventListener('click', () => emoWaveSurfer && emoWaveSurfer.stop());

            // Set up drag and drop
            setupDragDrop(emoAudioDropzone, emoAudioFile, handleAudioFile, 'emo');

            // Form submit event
            $synthesizeForm.addEventListener('submit', function(e) {
                e.preventDefault();
                synthesizeSpeech();
            });
        });

        // Synthesize speech
        async function synthesizeSpeech() {
            // Show the loading state
            $btnText.style.display = 'none';
            $btnSpinner.style.display = 'inline-block';
            $synthesizeBtn.disabled = true;

            try {
                // Read the form values
                const apiKey = document.getElementById('api-key').value;
                const text = document.getElementById('synthesize-text').value;
                const emotionControl = document.querySelector('input[name="emotion-control"]:checked').value;
                const emoAlpha = document.getElementById('emo_alpha').value;

                // Read the advanced settings
                const sampleRate = document.getElementById('sample-rate').value;
                const intervalSilence = document.getElementById('interval-silence').value;
                const speed = document.getElementById('speed').value;
                const gain = document.getElementById('gain').value;
                const useRandom = document.getElementById('use-random').checked;
                const streamMode = document.getElementById('stream-mode').checked;

                // Build the request parameters
                const formData = new FormData();
                formData.append('input', text);
                formData.append('sample_rate', sampleRate);
                formData.append('interval_silence', intervalSilence);
                formData.append('speed', speed);
                formData.append('gain', gain);
                formData.append('use_random', useRandom);
                formData.append('stream_mode', streamMode);
                formData.append('emo_control_method', emotionControl);
                
                // Send the emotion weight only when the emotion control method is not 0
                if (emotionControl !== '0') {
                    formData.append('emo_alpha', emoAlpha);
                }

                // Add the emotion parameters
                if (emotionControl === '2') {
                    // Emotion vector: send numbers rather than the raw string values
                    const emotions = ['happy', 'angry', 'sad', 'fear', 'disgust', 'melancholy', 'surprise', 'calm'];
                    const emoVec = emotions.map(emotion => {
                        return parseFloat(document.getElementById('emotion-' + emotion).value) || 0;
                    });
                    formData.append('emo_vec', JSON.stringify(emoVec));
                } else if (emotionControl === '3') {
                    // Emotion text
                    const emoText = document.getElementById('emo_text').value;
                    formData.append('emo_text', emoText);
                }

                // Add the audio files
                const spkAudioFile = document.getElementById('spk-audio-file');
                if (spkAudioFile.files.length > 0) {
                    formData.append('spk_audio_file', spkAudioFile.files[0]);
                }

                // The emotion reference audio is only used when emo_control_method is 1
                const emoAudioFile = document.getElementById('emo-audio-file');
                if (emotionControl === '1' && emoAudioFile.files.length > 0) {
                    formData.append('emo_audio_file', emoAudioFile.files[0]);
                }

                // Send the request
                const response = await fetch('https://www.yuntts.com/api/v1/indextts2_infer', {
                    method: 'POST',
                    headers: {
                        'Authorization': 'Bearer ' + apiKey
                    },
                    body: formData
                });

                // Handle the response
                if (streamMode) {
                    // Streaming output: the response body is the audio data itself.
                    // Note that response.blob() waits for the complete body.
                    if (response.ok) {
                        const blob = await response.blob();
                        const audioUrl = URL.createObjectURL(blob);

                        $emptyState.style.display = 'none';
                        $resultContainer.style.display = 'block';
                        $statusMessage.className = 'alert alert-success';
                        $statusMessage.textContent = 'Synthesis succeeded (streaming output)!';
                        $resultAudio.src = audioUrl;
                        $audioUrl.value = audioUrl;
                        $costInfo.innerHTML = `
                            <p>Characters: ${text.length}</p>
                            <p>Cost: calculating...</p>
                        `;
                    } else {
                        // Try to parse the error response
                        try {
                            const errorData = await response.json();
                            $emptyState.style.display = 'none';
                            $resultContainer.style.display = 'block';
                            $statusMessage.className = 'alert alert-danger';
                            $statusMessage.textContent = `Synthesis failed: ${errorData.message || 'unknown error'}`;
                        } catch {
                            $emptyState.style.display = 'none';
                            $resultContainer.style.display = 'block';
                            $statusMessage.className = 'alert alert-danger';
                            $statusMessage.textContent = 'Synthesis failed: server error';
                        }
                        $resultAudio.src = '';
                        $audioUrl.value = '';
                        $costInfo.innerHTML = '';
                    }
                } else {
                    // Non-streaming output: parse the JSON response
                    const data = await response.json();

                    if (response.ok) {
                        $emptyState.style.display = 'none';
                        $resultContainer.style.display = 'block';
                        $statusMessage.className = 'alert alert-success';
                        $statusMessage.textContent = 'Synthesis succeeded!';
                        $resultAudio.src = data.data.audio_url;
                        $audioUrl.value = data.data.audio_url;
                        $costInfo.innerHTML = `
                            <p>Characters: ${data.data.char_count}</p>
                            <p>Cost: ${data.data.points_deducted} points</p>
                        `;
                    } else {
                        $emptyState.style.display = 'none';
                        $resultContainer.style.display = 'block';
                        $statusMessage.className = 'alert alert-danger';
                        $statusMessage.textContent = `Synthesis failed: ${data.message || 'unknown error'}`;
                        $resultAudio.src = '';
                        $audioUrl.value = '';
                        $costInfo.innerHTML = '';
                    }
                }
            } catch (error) {
                console.error('Synthesis failed:', error);
                $emptyState.style.display = 'none';
                $resultContainer.style.display = 'block';
                $statusMessage.className = 'alert alert-danger';
                $statusMessage.textContent = 'Synthesis failed: network error, please try again later';
            } finally {
                // Restore the button state
                $btnText.style.display = 'inline';
                $btnSpinner.style.display = 'none';
                $synthesizeBtn.disabled = false;
            }
        }

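        // Copy the audio link
        async function copyAudioUrl() {
            const audioUrl = document.getElementById('audio-url');
            try {
                // Prefer the asynchronous Clipboard API (requires a secure context)
                await navigator.clipboard.writeText(audioUrl.value);
            } catch {
                // Fall back to the deprecated execCommand for older browsers
                audioUrl.select();
                document.execCommand('copy');
            }
            alert('Audio link copied to clipboard');
        }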
    </script>
</body>
</html>

 

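Response Format

Successful Response

Streaming output: when stream_mode is enabled, the API returns the audio as binary data directly; when it is disabled, the API returns a JSON response such as the following: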

{
  "code": 200,
  "message": "合成成功!",
  "data": {
    "audio_url": "https://www.yuntts.com/wp-content/uploads/audio/processed/indextts_infer_5e8f9a_1678901234.mp3",
    "format": "mp3",
    "char_count": 15,
    "points_deducted": 0.01
  }
}
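
Because the response body differs between the two modes, a client has to decide how to read it before touching the payload. The sketch below shows one way to do that. It assumes, which the documentation above does not state, that JSON responses carry a `Content-Type` header containing `application/json`; the helper names `isJsonResponse` and `extractResult` are hypothetical.

```javascript
// Hypothetical helpers; the Content-Type check is an assumption about the server.

// Returns true when the Content-Type header value denotes a JSON body.
function isJsonResponse(contentType) {
  return typeof contentType === 'string' &&
    contentType.toLowerCase().includes('application/json');
}

// Validates a parsed non-streaming success body and returns its fields.
// Throws if the body does not match the documented success shape.
function extractResult(body) {
  if (!body || body.code !== 200 || !body.data) {
    const message = body && body.message ? body.message : 'unknown error';
    throw new Error('Synthesis failed: ' + message);
  }
  const { audio_url, format, char_count, points_deducted } = body.data;
  if (typeof audio_url !== 'string' || audio_url.length === 0) {
    throw new Error('Response is missing data.audio_url');
  }
  return {
    audioUrl: audio_url,
    format: format,
    charCount: char_count,
    pointsDeducted: points_deducted,
  };
}
```

With `fetch`, a caller could test `isJsonResponse(response.headers.get('content-type'))` and then read the body with either `response.json()` or `response.blob()`.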

Error Response

{
  "code": 400,
  "message": "情绪向量无效:必须包含8个维度",
  "data": null
}

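Error Codes

Error code Description
400 Invalid request parameters, for example empty text or a malformed emotion vector
401 Unauthorized: the API key is invalid or missing
403 Insufficient balance to complete the synthesis
500 Internal server error, for example a failed API call or a failed file save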
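
The error codes can be turned into a small lookup so that client code reacts to failures consistently. This is a sketch rather than part of the API: the name `describeError`, the English wording, and the decision to treat only 500 as worth retrying are all assumptions.

```javascript
// Descriptions mirror the documented error codes; the `retryable` flags are an assumption.
const ERROR_CODES = new Map([
  [400, { description: 'Invalid request parameters', retryable: false }],
  [401, { description: 'Unauthorized: API key invalid or missing', retryable: false }],
  [403, { description: 'Insufficient balance', retryable: false }],
  [500, { description: 'Internal server error', retryable: true }],
]);

// Combines the table entry with the server-provided `message`, if any.
function describeError(body) {
  const code = body ? body.code : undefined;
  const entry = ERROR_CODES.get(Number(code)) ||
    { description: 'Unknown error', retryable: false };
  const detail = body && body.message ? ' (' + body.message + ')' : '';
  return { code: code, text: entry.description + detail, retryable: entry.retryable };
}
```

A caller might show `describeError(body).text` to the user and schedule a retry only when `retryable` is true.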

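Notes

  1. API key: keep your API key safe and do not expose it in front-end code.
  2. File size: the speaker timbre reference audio may be at most 20 MB, and the emotion reference audio at most 10 MB.
  3. Text length: a single synthesis request should preferably contain no more than 600 characters.
  4. Emotion vector: make sure the emotion vector has 8 dimensions, with each value between 0 and 1.2 and a total of at most 1.5.
  5. Emotion audio: when the emotion control method is 1, an emotion reference audio file must be provided.
  6. Streaming output: streaming output is enabled by default, so the audio stream can be returned in real time.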
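
Several of these notes can be checked on the client before a request is sent. The following sketch assumes a plain object whose field names follow the request parameters, plus two hypothetical fields, `spkAudioSize` and `emoAudioSize`, holding file sizes in bytes. `validateRequest` is an illustrative name, the 600-character recommendation is assumed to refer to plain string length, and the server remains the authority on what is accepted.

```javascript
const MB = 1024 * 1024;

// Returns { errors, warnings }; empty arrays mean nothing was flagged.
function validateRequest(p) {
  const errors = [];
  const warnings = [];

  if (typeof p.input !== 'string' || p.input.trim() === '') {
    errors.push('input must not be empty');
  } else if (p.input.length > 600) {
    // The text length is a recommendation, so it is only a warning here.
    warnings.push('input is longer than the recommended 600 characters');
  }

  // Size limits for the two reference audio files.
  if (p.spkAudioSize > 20 * MB) errors.push('spk_audio_file exceeds 20 MB');
  if (p.emoAudioSize > 10 * MB) errors.push('emo_audio_file exceeds 10 MB');

  const method = Number(p.emo_control_method ?? 0);

  // Mode 1 needs an emotion reference audio.
  if (method === 1 && !(p.emoAudioSize > 0)) {
    errors.push('emo_audio_file is required when emo_control_method is 1');
  }

  // The emotion vector is only used in mode 2.
  if (method === 2) {
    const vec = p.emo_vec;
    if (!Array.isArray(vec) || vec.length !== 8) {
      errors.push('emo_vec must contain exactly 8 dimensions');
    } else if (vec.some(v => typeof v !== 'number' || !(v >= 0 && v <= 1.2))) {
      errors.push('each emo_vec value must be between 0 and 1.2');
    } else if (vec.reduce((a, b) => a + b, 0) > 1.5 + 1e-9) {
      // The small tolerance absorbs floating-point rounding in the sum.
      errors.push('emo_vec values must sum to at most 1.5');
    }
  }

  return { errors, warnings };
}
```

Returning lists instead of throwing lets a form display every problem at once rather than one per submission.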

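Billing

  • Billing unit: charged by the number of characters in the synthesized text
  • Billing rule: different discounts apply depending on the user type
  • Minimum charge: 0.01 yuan
  • Character counting: each Chinese character counts as 2 characters; every other character counts as 1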
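
The counting rule can be illustrated with a short function. This is a sketch only: it assumes that a Chinese character means any code point in the Unicode Han script and that every other code point counts once. The service's actual counting, for example of punctuation or emoji, may differ, and `countBillableChars` is a hypothetical name.

```javascript
// Assumption: "Chinese character" = any code point in the Unicode Han script.
const HAN = /\p{Script=Han}/u;

// Counts Han characters as 2 and every other code point as 1.
function countBillableChars(text) {
  let total = 0;
  // for...of iterates by code point, so a surrogate pair is visited once.
  for (const ch of text) {
    total += HAN.test(ch) ? 2 : 1;
  }
  return total;
}
```

Under these assumptions, `countBillableChars('Hi 你好')` returns 7: two Latin letters, one space, and two Han characters counted twice each.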

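Example Use Cases

  1. Personalized speech synthesis: upload a speaker timbre reference audio to generate speech with a personal voice.
  2. Emotional speech synthesis: use an emotion vector or an emotion reference audio to generate speech with a specific emotion.
  3. Multimedia production: generate voice-overs for videos, animations, and other multimedia content.
  4. Smart assistants: give a smart assistant a personalized spoken response.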

Contact Us

If you run into any problems while using the API, please contact our technical support team.
