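Microsoft Azure Speech Synthesis Service: API Usage Guide

Introduction

Speech synthesis is one of the most visible applications of AI voice technology, and Microsoft's service is popular for its natural-sounding output and broad language support. However, Microsoft's official service requires verification with a foreign credit card. That is a high barrier for users in mainland China, and it keeps many companies and developers from using the service.

To solve this, we drew on our technical experience to build our own API for Microsoft TTS (text-to-speech). It wraps Microsoft's speech synthesis in a service adapted for users in China: rather than integrating with an overseas platform directly, you call our API to convert text to speech. This removes the foreign-credit-card requirement while keeping the high-fidelity, natural-sounding output of Microsoft's voices, along with core features such as Chinese and multilingual synthesis, adjustable speaking rate and pitch, and SSML (Speech Synthesis Markup Language).

Whether you are a developer adding voice announcements to an application or a business that wants to distribute content more efficiently, the API provides stable, efficient speech synthesis. By simplifying integration and lowering the barrier to entry, we aim to bring state-of-the-art speech technology to more users in China and to support new intelligent-interaction applications.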

Usage

Endpoint

https://www.yuntts.com/api/v1//azure   (supports GET and POST)
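The page states that GET is supported but does not say how parameters are passed with it. The sketch below assumes (unverified) that a GET request reads the same parameter names from the query string that POST reads from its JSON body, and that the API key still goes in the Authorization header described next. `buildGetUrl` is a hypothetical helper.

```javascript
// Endpoint copied verbatim from the documentation above.
const ENDPOINT = "https://www.yuntts.com/api/v1//azure";

// Hypothetical helper. ASSUMPTION: a GET request reads the same parameter
// names from the query string that a POST request reads from its JSON body.
function buildGetUrl(params) {
  const url = new URL(ENDPOINT);
  for (const [key, value] of Object.entries(params)) {
    // searchParams.set percent-encodes non-ASCII text such as Chinese characters.
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const getUrl = buildGetUrl({ text: "你好", voice: "zh-CN-XiaoxiaoNeural", rate: 0 });
console.log(getUrl);
```

The resulting URL would then be requested with something like `fetch(getUrl, { headers: { Authorization: "Bearer API-KEY" } })`.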

Request headers

Content-Type: application/json
Authorization: Bearer API-KEY  // your API key

Note: log in and generate a key at https://www.yuntts.com/user/api/. The same key gives access to every API service on this site.
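Rather than hard-coding the key in source files, a server-side client can read it from an environment variable. A minimal sketch; the variable name `YUNTTS_API_KEY` and the helper `buildHeaders` are hypothetical.

```javascript
// Hypothetical helper: builds the two documented request headers.
// The key is read from an environment variable (the name is an assumption)
// so that it never appears in source code or version control.
function buildHeaders(apiKey = process.env.YUNTTS_API_KEY) {
  if (!apiKey) {
    throw new Error("API key missing: set YUNTTS_API_KEY or pass the key explicitly");
  }
  return {
    "Content-Type": "application/json",
    Authorization: `Bearer ${apiKey}`,
  };
}
```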

Request parameters

{
    "text": "让配音达到自然流畅境界！多种调音功能助你打造精品配音！",
    "ssml": false,
    "cache": true,
    "language": "zh-CN",
    "voice": "zh-CN-XiaoxiaoNeural",
    "style": "general",
    "role": "null",
    "styledegree": 1,
    "rate": 0,
    "pitch": 0,
    "volume": 100,
    "audioformat": "audio-16khz-128kbitrate-mono-mp3",
    "backgroundaudio": "",
    "backgroundaudio_volume": 100,
    "fadein": 0,
    "fadeout": 0
}
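The sample body above can double as a set of defaults. A minimal Node.js 18+ sketch (for the global `fetch`) that merges per-call overrides into those defaults and POSTs them; `DEFAULTS`, `buildBody` and `synthesize` are hypothetical names, and the code assumes the service answers with JSON as described under the response section.

```javascript
const ENDPOINT = "https://www.yuntts.com/api/v1//azure";

// Defaults mirror the sample request body (text is supplied on each call).
const DEFAULTS = {
  ssml: false,
  cache: true,
  language: "zh-CN",
  voice: "zh-CN-XiaoxiaoNeural",
  style: "general",
  role: "null",
  styledegree: 1,
  rate: 0,
  pitch: 0,
  volume: 100,
  audioformat: "audio-16khz-128kbitrate-mono-mp3",
  backgroundaudio: "",
  backgroundaudio_volume: 100,
  fadein: 0,
  fadeout: 0,
};

// Overrides win over defaults; text is always taken from the argument.
function buildBody(text, overrides = {}) {
  return { ...DEFAULTS, ...overrides, text };
}

// Sends the request and returns the parsed JSON response (network call).
async function synthesize(text, overrides, apiKey) {
  const res = await fetch(ENDPOINT, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(buildBody(text, overrides)),
  });
  if (!res.ok) {
    throw new Error(`HTTP ${res.status}`);
  }
  return res.json();
}
```

Example call: `synthesize("你好", { rate: 10 }, process.env.YUNTTS_API_KEY).then((r) => console.log(r.audio_url));`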

Parameter reference

Parameter	Description	Format	Default	Required
language	Language	Locale code	zh-CN	No
voice	Voice	e.g. zh-CN-XiaoxiaoNeural	none	Yes
text	Text to synthesize	-	none	Yes
style	Speaking style	-	general	No
role	Role-play	-	null	No
rate	Speaking rate	-50 to +100	0	No
pitch	Pitch	-50 to +100	0	No
volume	Volume	0 to 100	100	No
styledegree	Style intensity	0.01 to 2	1	No
backgroundaudio	Background music	Audio URL	none	No
backgroundaudio_volume	Background music volume	Integer, 0 to 100	100	No
fadein	Fade-in duration	0 to 10000; a string giving the time in milliseconds	3000	No
fadeout	Fade-out duration	0 to 10000; a string giving the time in milliseconds	4000	No
cache	Cache the result	Boolean, true or false	true	No
ssml	Enable SSML synthesis	Boolean, true or false	false	No
audioformat	Audio format	-	audio-16khz-128kbitrate-mono-mp3	No
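The numeric ranges above can be checked on the client before a request is sent. A minimal sketch; `validateParams` and `RANGES` are hypothetical, the background-volume key uses the sample request's spelling `backgroundaudio_volume`, and the server's own validation may be stricter or looser.

```javascript
// Inclusive numeric ranges transcribed from the parameter table.
const RANGES = {
  rate: [-50, 100],
  pitch: [-50, 100],
  volume: [0, 100],
  styledegree: [0.01, 2],
  backgroundaudio_volume: [0, 100],
  fadein: [0, 10000],
  fadeout: [0, 10000],
};

// Returns a list of problems; an empty list means no violations were found.
function validateParams(params) {
  const problems = [];
  // text and voice have no default value, so they are treated as required here.
  if (!params.text) problems.push("text is required");
  if (!params.voice) problems.push("voice is required");
  for (const [name, [min, max]] of Object.entries(RANGES)) {
    if (params[name] === undefined) continue; // omitted: the service default applies
    const value = Number(params[name]); // fadein/fadeout may arrive as strings
    if (!Number.isFinite(value) || value < min || value > max) {
      problems.push(`${name} must be between ${min} and ${max}, got ${params[name]}`);
    }
  }
  return problems;
}

console.log(validateParams({ text: "你好", voice: "zh-CN-XiaoxiaoNeural", rate: 150 }));
```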

Response

Non-streaming response

{
    "code": 200,
    "msg": "合成成功",
    "audio_url": "https://www.yuntts.com/wp-content/uploads/2025/05/682ad63e4b927_1747637822.mp3",
    "format": "mp3",
    "characters_used": 52,
    "price": 0.01,
    "quota_remaining": 184.4315
}
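With a non-streaming response the audio is downloaded separately from `audio_url`. A minimal Node.js 18+ sketch; because the error codes are not documented on this page, treating any `code` other than 200 as a failure and surfacing `msg` is an assumption. `extractAudioUrl` and `downloadAudio` are hypothetical helpers.

```javascript
const fs = require("fs/promises");

// Returns audio_url from a parsed response, or throws with the server's msg.
// ASSUMPTION: success is signalled by code === 200.
function extractAudioUrl(response) {
  if (response.code !== 200 || !response.audio_url) {
    throw new Error(`synthesis failed: ${response.msg ?? "unknown error"}`);
  }
  return response.audio_url;
}

// Fetches the hosted audio file and writes it to outPath (network call).
async function downloadAudio(response, outPath) {
  const res = await fetch(extractAudioUrl(response));
  if (!res.ok) throw new Error(`download failed: HTTP ${res.status}`);
  await fs.writeFile(outPath, Buffer.from(await res.arrayBuffer()));
  return outPath;
}
```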

Streaming response

{
    "code": 200,
    "msg": "合成成功",
    "audio_data": "<base64-encoded audio data>",
    "format": "mp3",
    "characters_used": 52,
    "price": 0.01,
    "quota_remaining": 184.4115
}
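In the streaming response the audio arrives inline, base64-encoded in `audio_data`. A minimal Node.js sketch that decodes it and writes a file named after the `format` field; `decodeAudioData` and `saveStreamedAudio` are hypothetical, and treating only `code: 200` as success is an assumption.

```javascript
const fs = require("fs/promises");

// Decodes the base64 audio_data field into raw audio bytes.
// ASSUMPTION: success is signalled by code === 200.
function decodeAudioData(response) {
  if (response.code !== 200 || typeof response.audio_data !== "string") {
    throw new Error(`synthesis failed: ${response.msg ?? "unknown error"}`);
  }
  return Buffer.from(response.audio_data, "base64");
}

// Writes the decoded audio to "<basename>.<format>", e.g. output.mp3.
async function saveStreamedAudio(response, basename = "output") {
  const path = `${basename}.${response.format || "bin"}`;
  await fs.writeFile(path, decodeAudioData(response));
  return path;
}
```

In a browser, the same field can be played directly through a `data:audio/mpeg;base64,...` URL instead of being written to disk.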

Errors

Reference code

<!DOCTYPE html>
<html lang="zh-CN">

<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Azure TTS Front-End Page</title>
    <link href="https://cdn.bootcdn.net/ajax/libs/twitter-bootstrap/5.3.3/css/bootstrap.min.css" rel="stylesheet">
</head>

<body>
    <div class="container mt-4">
        <div class="row">
            <div class="col-md-8">
                <div class="mb-3">
                    <button type="button" class="btn btn-primary" data-bs-toggle="modal" data-bs-target="#backgroundAudioModal">
                        Choose background music
                    </button>
                </div>
                <div class="mb-3">
                    <label for="textInput" class="form-label">Text input</label>
                    <textarea class="form-control" id="textInput" rows="25"></textarea>
                </div>
                <div class="d-flex justify-content-end gap-3">
                    <div class="form-check form-switch">
                        <input class="form-check-input" type="checkbox" id="ssmlSwitch">
                        <label class="form-check-label" for="ssmlSwitch">Enable SSML synthesis</label>
                    </div>
                    <div class="form-check form-switch">
                        <input class="form-check-input" type="checkbox" id="cacheSwitch" checked>
                        <label class="form-check-label" for="cacheSwitch">Cache result</label>
                    </div>
                </div>
            </div>
            <div class="col-md-4">
                <div class="mb-3">
                    <label for="languageSelect" class="form-label">Language</label>
                    <select class="form-select" id="languageSelect">
                        <option value="zh-CN">zh-CN</option>
                    </select>
                </div>
                <div class="mb-3">
                    <label for="voiceSelect" class="form-label">Voice</label>
                    <select class="form-select" id="voiceSelect">
                        <option value="zh-CN-XiaoxiaoNeural">zh-CN-XiaoxiaoNeural</option>
                    </select>
                </div>
                <div class="mb-3">
                    <label for="styleSelect" class="form-label">Style</label>
                    <select class="form-select" id="styleSelect">
                        <option value="general">Default style</option>
                    </select>
                </div>
                <div class="mb-3">
                    <label for="roleInput" class="form-label">Role-play</label>
                    <select class="form-select" id="roleInput">
                        <option value="null">Default (no role)</option>
                    </select>
                </div>
                <div class="mb-3">
                    <label for="audioFormatSelect" class="form-label">Audio format</label>
                    <select class="form-select" id="audioFormatSelect">
                        <option value="audio-16khz-128kbitrate-mono-mp3">audio-16khz-128kbitrate-mono-mp3</option>
                    </select>
                </div>
                <div class="mb-3">
                    <label for="styleDegreeRange" class="form-label">Style intensity</label>
                    <div class="d-flex align-items-center">
                        <input type="range" class="form-range" id="styleDegreeRange" min="0.01" max="2" step="0.01" value="1">
                        <span class="ms-2" id="styleDegreeValue">1</span>
                    </div>
                </div>
                <div class="mb-3">
                    <label for="rateRange" class="form-label">Speaking rate</label>
                    <div class="d-flex align-items-center">
                        <input type="range" class="form-range" id="rateRange" min="-50" max="100" step="1" value="0">
                        <span class="ms-2" id="rateValue">0</span>
                    </div>
                </div>
                <div class="mb-3">
                    <label for="pitchRange" class="form-label">Pitch</label>
                    <div class="d-flex align-items-center">
                        <input type="range" class="form-range" id="pitchRange" min="-50" max="100" step="1" value="0">
                        <span class="ms-2" id="pitchValue">0</span>
                    </div>
                </div>
                <div class="mb-3">
                    <label for="volumeRange" class="form-label">Volume</label>
                    <div class="d-flex align-items-center">
                        <input type="range" class="form-range" id="volumeRange" min="0" max="100" step="5" value="100">
                        <span class="ms-2" id="volumeValue">100</span>
                    </div>
                </div>
                <div class="d-flex flex-wrap gap-2">
                    <button type="button" class="btn btn-primary flex-grow-1" id="submitBtn">Synthesize <span class="spinner-border spinner-border-sm" role="status" aria-hidden="true" style="display: none;"></span></button>
                    <button type="button" class="btn btn-success flex-grow-1" id="playBtn">Play audio <span class="spinner-border spinner-border-sm" role="status" aria-hidden="true" style="display: none;"></span></button>
                    <button type="button" class="btn btn-warning flex-grow-1" id="downloadBtn">Download audio <span class="spinner-border spinner-border-sm" role="status" aria-hidden="true" style="display: none;"></span></button>
                </div>
            </div>
            <div class="modal fade" id="backgroundAudioModal" tabindex="-1" aria-labelledby="backgroundAudioModalLabel" aria-hidden="true">
                <div class="modal-dialog">
                    <div class="modal-content">
                        <div class="modal-header">
                            <h5 class="modal-title" id="backgroundAudioModalLabel">Choose background music</h5>
                            <button type="button" class="btn-close" data-bs-dismiss="modal" aria-label="Close"></button>
                        </div>
                        <div class="modal-body">
                            <div class="mb-3">
                                <label for="backgroundAudioUrl" class="form-label">Background music URL</label>
                                <input type="text" class="form-control" id="backgroundAudioUrl">
                            </div>
                            <div class="mb-3">
                                <label for="backgroundAudioVolume" class="form-label">Volume</label>
                                <input type="number" class="form-control" id="backgroundAudioVolume" min="0" max="100" value="100">
                            </div>
                            <div class="mb-3">
                                <label for="fadeIn" class="form-label">Fade-in time (ms)</label>
                                <input type="number" class="form-control" id="fadeIn" min="0" max="10000" value="0">
                            </div>
                            <div class="mb-3">
                                <label for="fadeOut" class="form-label">Fade-out time (ms)</label>
                                <input type="number" class="form-control" id="fadeOut" min="0" max="10000" value="0">
                            </div>
                        </div>
                        <div class="modal-footer">
                            <button type="button" class="btn btn-secondary" data-bs-dismiss="modal">Close</button>
                            <button type="button" class="btn btn-primary" id="confirmAudioBtn">Confirm</button>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
    <script src="https://cdn.bootcdn.net/ajax/libs/twitter-bootstrap/5.3.3/js/bootstrap.bundle.min.js"></script>
    <script>
        function updateSliderValue(elementId, displayId) {
            document.getElementById(elementId).addEventListener('input', function () {
                document.getElementById(displayId).textContent = this.value;
            });
        }

        updateSliderValue('rateRange', 'rateValue');
        updateSliderValue('pitchRange', 'pitchValue');
        updateSliderValue('volumeRange', 'volumeValue');
        updateSliderValue('styleDegreeRange', 'styleDegreeValue');

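        // Upgrade an http:// URL to https:// so the audio still loads when this page is served over HTTPS.
        function convertToHttps(url) {
            return url.replace(/^http:\/\//i, 'https://');
        }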

        document.addEventListener('DOMContentLoaded', function () {
            document.getElementById('playBtn').disabled = true;
            document.getElementById('downloadBtn').disabled = true;
            document.getElementById('rateValue').textContent = document.getElementById('rateRange').value;
            document.getElementById('pitchValue').textContent = document.getElementById('pitchRange').value;
        });

        function sendRequest() {
            document.getElementById('submitBtn').disabled = true;
            document.getElementById('submitBtn').querySelector('.spinner-border').style.display = 'inline-block';

            const data = {
                text: document.getElementById('textInput').value,
                ssml: document.getElementById('ssmlSwitch').checked,
                cache: document.getElementById('cacheSwitch').checked,
                language: document.getElementById('languageSelect').value,
                voice: document.getElementById('voiceSelect').value,
                style: document.getElementById('styleSelect').value,
                role: document.getElementById('roleInput').value,
                styledegree: parseFloat(document.getElementById('styleDegreeRange').value),
                rate: parseFloat(document.getElementById('rateRange').value),
                pitch: parseFloat(document.getElementById('pitchRange').value),
                volume: parseInt(document.getElementById('volumeRange').value),
                audioformat: document.getElementById('audioFormatSelect').value,
                backgroundaudio: document.getElementById('backgroundAudioUrl').value,
                backgroundaudio_volume: parseInt(document.getElementById('backgroundAudioVolume').value),
                fadein: parseInt(document.getElementById('fadeIn').value),
                fadeout: parseInt(document.getElementById('fadeOut').value)
            };
            console.log('Request payload:', data);
            fetch('https://www.yuntts.com/api/v1//azure', {
                method: 'POST',
                headers: {
                    'Content-Type': 'application/json',
                    // Replace API-KEY with your own key. A key embedded in a web page is visible to every visitor.
                    'Authorization': 'Bearer API-KEY'
                },
                body: JSON.stringify(data)
            })
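            .then(response => {
                // Content-Type may be missing; fall back to '' so includes() cannot throw.
                const contentType = response.headers.get('Content-Type') || '';
                if (contentType.includes('audio/mpeg')) {
                    return response.blob();
                }
                return response.json();
            })
            .then(data => {
                console.log('Response data:', data);
                let audioUrl;

                if (data instanceof Blob) {
                    // Raw audio body: expose it to the <audio> element as an object URL.
                    audioUrl = window.URL.createObjectURL(data);
                } else if (data.code === 200 && data.audio_url) {
                    // Non-streaming JSON response: the audio is hosted at audio_url.
                    audioUrl = convertToHttps(data.audio_url);
                } else if (data.code === 200 && data.audio_data) {
                    // Streaming JSON response: base64 audio, played through a data: URL.
                    const mime = !data.format || data.format === 'mp3' ? 'audio/mpeg' : `audio/${data.format}`;
                    audioUrl = `data:${mime};base64,${data.audio_data}`;
                } else {
                    alert(data.msg || 'Synthesis failed');
                }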

                if (audioUrl) {
                    let audioElement = document.getElementById('generatedAudio');
                    if (!audioElement) {
                        audioElement = document.createElement('audio');
                        audioElement.id = 'generatedAudio';
                        audioElement.style.display = 'none';
                        document.body.appendChild(audioElement);
                    }
                    audioElement.src = audioUrl;

                    document.getElementById('playBtn').disabled = false;
                    document.getElementById('downloadBtn').disabled = false;
                }

                document.getElementById('submitBtn').disabled = false;
                document.getElementById('submitBtn').querySelector('.spinner-border').style.display = 'none';
            })
            .catch(error => {
                console.error('Request failed:', error);
                document.getElementById('submitBtn').disabled = false;
                document.getElementById('submitBtn').querySelector('.spinner-border').style.display = 'none';
            });
        }

        document.getElementById('playBtn').addEventListener('click', function () {
            const audioElement = document.getElementById('generatedAudio');
            if (audioElement && audioElement.src) {
                this.querySelector('.spinner-border').style.display = 'inline-block';
                this.disabled = true;

                audioElement.play().then(() => {
                    this.querySelector('.spinner-border').style.display = 'none';
                    this.disabled = false;
                }).catch(error => {
                    console.error('Failed to play audio:', error);
                    this.querySelector('.spinner-border').style.display = 'none';
                    this.disabled = false;
                });
            } else {
                alert('Please synthesize audio first');
            }
        });

        document.getElementById('downloadBtn').addEventListener('click', function () {
            const audioElement = document.getElementById('generatedAudio');
            if (audioElement && audioElement.src) {
                this.querySelector('.spinner-border').style.display = 'inline-block';
                this.disabled = true;

                const a = document.createElement('a');
                a.href = audioElement.src;
                a.download = 'output.mp3';
                a.click();

                setTimeout(() => {
                    this.querySelector('.spinner-border').style.display = 'none';
                    this.disabled = false;
                }, 500);
            } else {
                alert('Please synthesize audio first');
            }
        });

        document.getElementById('confirmAudioBtn').addEventListener('click', function () {
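            // Reuse the instance Bootstrap created via data-bs-toggle; a brand-new instance is not "shown", so its hide() would do nothing.
            const modal = bootstrap.Modal.getOrCreateInstance(document.getElementById('backgroundAudioModal'));
            modal.hide();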
        });

        document.getElementById('submitBtn').addEventListener('click', sendRequest);
    </script>
</body>

</html>

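Pricing

For pricing, see "Overview of Billing for All Voice Workshop Services". If a request uses very few characters, the computed charge can fall below 0.01 (for example, 10 characters × unit price 0.00031834 = 0.0031834); in that case 0.01 is charged.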
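The minimum-charge rule can be written as a one-line function: characters × unit price, but never less than 0.01. A minimal sketch; the unit price is the example figure quoted above, `computeCharge` is a hypothetical name, and because the page does not say how amounts above the minimum are rounded, no rounding is applied.

```javascript
const MIN_CHARGE = 0.01;

// Charge for one request: characters × unit price, floored at the 0.01 minimum.
function computeCharge(characters, unitPrice) {
  return Math.max(characters * unitPrice, MIN_CHARGE);
}

console.log(computeCharge(10, 0.00031834)); // 0.0031834 is below the minimum, so 0.01
```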

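Official documentation

Customize voice and sound with SSML

Voice and sound with Speech Synthesis Markup Language (SSML)

Voice name (name) attribute

Language support - Speech service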

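Additional notes

Speaking styles

The first table lists values of the emotional style (`style`) attribute; the second lists values of the role-play (`role`) attribute.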

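Style	Description
`style="advertisement_upbeat"`	Promotes a product or service in an excited, high-energy tone.
`style="affectionate"`	Expresses a warm and affectionate tone, with higher pitch and volume. The speaker is trying to attract the listener's attention, and the speaker's personality is often endearing.
`style="angry"`	Expresses an angry and annoyed tone.
`style="assistant"`	A warm and relaxed tone for digital assistants.
`style="calm"`	Speaks with a cool, collected, and composed attitude. Tone, pitch, and prosody are much more uniform than in other types of speech.
`style="chat"`	Expresses a casual and relaxed tone.
`style="cheerful"`	Expresses a positive and happy tone.
`style="customerservice"`	Supports customers in a friendly and helpful tone.
`style="depressed"`	Expresses a melancholic and despondent tone, with lower pitch and volume.
`style="disgruntled"`	Expresses a disdainful and complaining tone. Speech in this emotion shows displeasure and contempt.
`style="documentary-narration"`	Narrates in a relaxed, interested, and informative style, suitable for documentaries, expert commentary, and similar content.
`style="embarrassed"`	Expresses an uncertain and hesitant tone when the speaker feels uncomfortable.
`style="empathetic"`	Expresses a sense of caring and understanding.
`style="envious"`	Expresses a tone of admiration when you desire something that someone else has.
`style="excited"`	Expresses an upbeat and hopeful tone. It sounds as if something great is happening and the speaker is happy about it.
`style="fearful"`	Expresses a scared and nervous tone, with higher pitch, higher volume, and a faster rate. The speaker is tense and uneasy.
`style="friendly"`	Expresses a pleasant, inviting, and warm tone. It sounds sincere and caring.
`style="gentle"`	Expresses a mild, polite, and pleasant tone, with lower pitch and volume.
`style="hopeful"`	Expresses a warm and yearning tone. It sounds as if something good will happen to the speaker.
`style="lyrical"`	Expresses emotion in a melodic and sentimental way.
`style="narration-professional"`	Reads content in a professional, objective tone.
`style="narration-relaxed"`	Reads content in a soothing and melodious tone.
`style="newscast"`	Narrates news in a formal and professional tone.
`style="newscast-casual"`	Delivers general news in a versatile, casual tone.
`style="newscast-formal"`	Delivers news in a formal, confident, and authoritative tone.
`style="poetry-reading"`	Reads poetry in an emotional, rhythmic tone.
`style="sad"`	Expresses a sorrowful tone.
`style="serious"`	Expresses a strict and commanding tone. The speaker often sounds stiffer and less relaxed, with a firm cadence.
`style="shouting"`	Expresses a tone that sounds as if the voice is distant or in another location and is making an effort to be heard clearly.
`style="sports_commentary"`	A relaxed and interested tone for broadcasting sports events.
`style="sports_commentary_excited"`	A fast, energetic tone for broadcasting exciting moments in sports events.
`style="whispering"`	Expresses a soft tone, trying to make a quiet and gentle sound.
`style="terrified"`	Expresses a scared tone, with a fast pace and a shaky voice. The speaker sounds unsteady and frantic.
`style="unfriendly"`	Expresses a cold and indifferent tone.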

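Role	Description
`role="Girl"`	The voice imitates a girl.
`role="Boy"`	The voice imitates a boy.
`role="YoungAdultFemale"`	The voice imitates a young adult woman.
`role="YoungAdultMale"`	The voice imitates a young adult man.
`role="OlderAdultFemale"`	The voice imitates an older adult woman.
`role="OlderAdultMale"`	The voice imitates an older adult man.
`role="SeniorFemale"`	The voice imitates a senior woman.
`role="SeniorMale"`	The voice imitates a senior man.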
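In Azure SSML, `style`, `styledegree` and `role` are attributes of the `mstts:express-as` element. A sketch that builds such a document for use with `ssml: true`; it assumes (not stated on this page) that the SSML is sent in the `text` field, and `buildExpressAsSsml` is a hypothetical helper. Not every voice supports every style or role; the example uses `zh-CN-XiaomoNeural`, a voice Microsoft documents with role-play support.

```javascript
// Escapes the five XML special characters so text cannot break the markup.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: wraps text in mstts:express-as inside a single voice.
// A missing role, or this API's "null" placeholder, omits the role attribute.
function buildExpressAsSsml({ text, voice, style, lang = "zh-CN", styledegree = 1, role = null }) {
  const roleAttr = role && role !== "null" ? ` role="${escapeXml(role)}"` : "";
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" ` +
    `xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="${escapeXml(lang)}">` +
    `<voice name="${escapeXml(voice)}">` +
    `<mstts:express-as style="${escapeXml(style)}" styledegree="${styledegree}"${roleAttr}>` +
    escapeXml(text) +
    `</mstts:express-as></voice></speak>`
  );
}

console.log(buildExpressAsSsml({
  text: "您来的挺快的，怎么过来的？",
  voice: "zh-CN-XiaomoNeural",
  style: "calm",
  role: "YoungAdultFemale",
}));
```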

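Background audio

Attribute	Description	Required or optional
`src`	URI of the background audio file.	Required
`volume`	Volume of the background audio file. Accepted values: `0` to `100` inclusive. The default is `1`.	Optional
`fadein`	Duration of the background audio fade-in, in milliseconds. The default is `0`, meaning no fade-in. Accepted values: `0` to `10000` inclusive.	Optional
`fadeout`	Duration of the background audio fade-out, in milliseconds. The default is `0`, meaning no fade-out. Accepted values: `0` to `10000` inclusive.	Optional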
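In Azure SSML these attributes belong to a single `mstts:backgroundaudio` element, which must be the first child of `speak`, ahead of any `voice` element. This service's `backgroundaudio`, `backgroundaudio_volume`, `fadein` and `fadeout` request parameters appear to map onto it. A sketch with the range checks from the table; `buildBackgroundAudioSsml` is hypothetical and the audio URL is a placeholder.

```javascript
// Escapes the five XML special characters so values cannot break the markup.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Throws unless value is a finite number within the inclusive range.
function checkRange(name, value, min, max) {
  if (!Number.isFinite(value) || value < min || value > max) {
    throw new RangeError(`${name} must be between ${min} and ${max}, got ${value}`);
  }
}

// Hypothetical helper: mstts:backgroundaudio is emitted as the first child of
// <speak>, before the voice element. Defaults follow the attribute table.
function buildBackgroundAudioSsml({ text, voice, src, volume = 1, fadein = 0, fadeout = 0, lang = "zh-CN" }) {
  checkRange("volume", volume, 0, 100);
  checkRange("fadein", fadein, 0, 10000);
  checkRange("fadeout", fadeout, 0, 10000);
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" ` +
    `xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="${escapeXml(lang)}">` +
    `<mstts:backgroundaudio src="${escapeXml(src)}" volume="${volume}" ` +
    `fadein="${fadein}" fadeout="${fadeout}"/>` +
    `<voice name="${escapeXml(voice)}">${escapeXml(text)}</voice></speak>`
  );
}

console.log(buildBackgroundAudioSsml({
  text: "欢迎收听本期节目。",
  voice: "zh-CN-XiaoxiaoNeural",
  src: "https://example.com/background.mp3", // placeholder URL
  volume: 50,
  fadein: 3000,
  fadeout: 4000,
}));
```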

Audio formats

Streaming formats are listed first; the riff-* formats in the second group are non-streaming.

amr-wb-16000hz
audio-16khz-16bit-32kbps-mono-opus
audio-16khz-32kbitrate-mono-mp3
audio-16khz-64kbitrate-mono-mp3
audio-16khz-128kbitrate-mono-mp3
audio-24khz-16bit-24kbps-mono-opus
audio-24khz-16bit-48kbps-mono-opus
audio-24khz-48kbitrate-mono-mp3
audio-24khz-96kbitrate-mono-mp3
audio-24khz-160kbitrate-mono-mp3
audio-48khz-96kbitrate-mono-mp3
audio-48khz-192kbitrate-mono-mp3
g722-16khz-64kbps
ogg-16khz-16bit-mono-opus
ogg-24khz-16bit-mono-opus
ogg-48khz-16bit-mono-opus
raw-8khz-8bit-mono-alaw
raw-8khz-8bit-mono-mulaw
raw-8khz-16bit-mono-pcm
raw-16khz-16bit-mono-pcm
raw-16khz-16bit-mono-truesilk
raw-22050hz-16bit-mono-pcm
raw-24khz-16bit-mono-pcm
raw-24khz-16bit-mono-truesilk
raw-44100hz-16bit-mono-pcm
raw-48khz-16bit-mono-pcm
webm-16khz-16bit-mono-opus
webm-24khz-16bit-24kbps-mono-opus
webm-24khz-16bit-mono-opus

riff-8khz-8bit-mono-alaw
riff-8khz-8bit-mono-mulaw
riff-8khz-16bit-mono-pcm
riff-22050hz-16bit-mono-pcm
riff-24khz-16bit-mono-pcm
riff-44100hz-16bit-mono-pcm
riff-48khz-16bit-mono-pcm
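Because a saved file's extension should match the chosen `audioformat`, a helper that derives one from the format name can be useful. A sketch under stated assumptions: the mapping is inferred from the naming pattern (`riff-*` is audio with a RIFF/WAV header, `raw-*` has no header, and so on), and `extensionFor` and `isStreamingFormat` are hypothetical names.

```javascript
// Hypothetical mapping from an audioformat name to a file extension,
// inferred from the naming pattern of the formats listed above.
function extensionFor(format) {
  if (format.includes("mp3")) return "mp3";
  if (format.startsWith("riff-")) return "wav"; // RIFF (WAV) header + PCM/A-law/mu-law
  if (format.startsWith("ogg-")) return "ogg";
  if (format.startsWith("webm-")) return "webm";
  if (format.startsWith("amr-wb")) return "amr";
  if (format.endsWith("-opus")) return "opus";
  if (format.startsWith("raw-")) return format.endsWith("truesilk") ? "silk" : "raw";
  return "bin"; // e.g. g722-16khz-64kbps: no common extension assumed
}

// Following the grouping above, only the riff-* formats are non-streaming.
function isStreamingFormat(format) {
  return !format.startsWith("riff-");
}
```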

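For more on audio formats, see "A Complete Guide to Audio Formats: An In-Depth Look at Encoding, Sample Rates, and Container Formats".

Notice: unless otherwise stated, all articles on this site are original content published by this site. No individual or organization may copy, misappropriate, scrape, or republish this site's content on any website, in books, or in any other media without this site's permission. If any content on this site infringes the legitimate rights of an original author, please contact us and we will address it.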
