MAGIC-TTS Demo

Fine-Grained Timing-Controllable Speech Synthesis

Using the same default voice, this demo compares baseline timing, critical pause control, local content stretching, and spontaneous generation across four practical scenes.

Scenes: 4
Modes: controlled / spontaneous
Control Unit: token duration / local pause

Controlled vs Spontaneous

Scene Comparisons

Scene 01

Navigation Turn

前方路口,左转。 ("Intersection ahead, turn left.")

Goal: the action phrase must be heard reliably; natural synthesis timing alone is not enough.

v1 Baseline

All content tokens are fixed at 170 ms each.

v2 Pause Only

Keep content tokens at 170 ms and set the comma pause before the action phrase to 260 ms.

v3 Pause + Content

Keep the 260 ms pause and stretch “左 / 转” ("turn left") to 300 ms each.
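The three variants of this scene can be written out as per-character duration plans. The sketch below is only an illustration: `build_plan` and `specified_total` are hypothetical helper names, not part of MAGIC-TTS, and it assumes one token per character with the sentence-final pause left unspecified, since the demo does not state it.

```python
def build_plan(text, content_ms=170, pause_ms=None, stretch=None):
    """Return (character, duration_ms) pairs; None means "left to the model".

    content_ms: default duration for every content character.
    pause_ms:   duration of the comma pause (None = unspecified).
    stretch:    maps a content character to an overriding duration.
                Every occurrence of that character is stretched.
    """
    stretch = stretch or {}
    plan = []
    for ch in text:
        if ch in ",,":                 # ASCII or full-width comma
            plan.append((ch, pause_ms))
        elif ch in "。.":               # final pause: not specified in the demo
            plan.append((ch, None))
        else:
            plan.append((ch, stretch.get(ch, content_ms)))
    return plan


def specified_total(plan):
    """Sum only the durations that were set explicitly."""
    return sum(ms for _, ms in plan if ms is not None)


text = "前方路口,左转。"
v1 = build_plan(text)                                               # baseline
v2 = build_plan(text, pause_ms=260)                                 # pause only
v3 = build_plan(text, pause_ms=260, stretch={"左": 300, "转": 300})  # pause + content

for name, plan in [("v1", v1), ("v2", v2), ("v3", v3)]:
    print(name, specified_total(plan), "ms")
```

Under these assumptions the explicitly specified time grows from 1020 ms (v1) to 1280 ms (v2) and 1540 ms (v3), with all of the added time falling on the comma pause and the two action characters.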

Scene 02

Kids Reading

请跟我读,苹果。 ("Read after me: apple.")

Goal: in a guided-reading setting, the syllables of the target word should be clearly drawn out.

v1 Baseline

All content tokens are fixed at 170 ms each.

v2 Pause Only

Set the comma pause before the target word to 260 ms.

v3 Pause + Content

Keep the 260 ms pause and stretch “苹 / 果” ("apple") to 300 ms each.

Scene 03

Accessibility Code

验证码是三七九,二一八。 ("The verification code is 379, 218.")

Goal: when reading out a digit string, the grouping and the durations of key digits must be controlled explicitly rather than left to fluency alone.

v1 Baseline

All content tokens are fixed at 170 ms each.

v2 Pause Only

Set the comma pause at the boundary between the two 3-digit groups to 260 ms.

v3 Pause + Content

Keep the 260 ms pause and stretch all six digits to 300 ms each.
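To see where the time goes in the v3 setting of this scene, the durations can be laid out on a timeline. This is a rough sketch, not the demo's actual pipeline: `digit_plan` is a hypothetical helper, the final period is omitted, and the onsets assume that the specified durations are simply concatenated with no extra gaps.

```python
from itertools import accumulate

CN_DIGITS = set("零一二三四五六七八九")


def digit_plan(text, content_ms=170, digit_ms=300, pause_ms=260):
    """Assign a duration to each character by category (v3-style settings)."""
    durations = []
    for ch in text:
        if ch in ",,":              # group boundary (ASCII or full-width comma)
            durations.append(pause_ms)
        elif ch in CN_DIGITS:        # digits are stretched
            durations.append(digit_ms)
        else:                        # remaining content keeps the default
            durations.append(content_ms)
    return durations


text = "验证码是三七九,二一八"       # final period left out of the sketch
durations = digit_plan(text)

# Onset of each character = sum of all durations before it.
onsets = [0] + list(accumulate(durations))[:-1]

for ch, start, ms in zip(text, onsets, durations):
    print(f"{ch}  start={start:4d} ms  duration={ms} ms")
print("total:", sum(durations), "ms")
```

With these numbers the prefix takes 680 ms, each 3-digit group takes 900 ms, and the second group starts at 1840 ms, 260 ms after the first group ends. The same sentence at a uniform 170 ms per content character would specify only 1700 ms in total.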

Scene 04

Station Arrival

前方到站,五山站。 ("Next stop: Wushan Station.")

Goal: the station prefix and the station name itself should be separated, and the station name should be emphasized.

v1 Baseline

All content tokens are fixed at 170 ms each.

v2 Pause Only

Set the comma pause before the station name to 260 ms.

v3 Pause + Content

Keep the 260 ms pause and stretch “五 / 山 / 站” ("Wushan Station") to 300 ms each.

Spontaneous Duration Modeling

Spontaneous Duration Modeling Demos

Navigation Turn

No explicit target-side duration is provided; the model predicts the rhythm on its own.

Kids Reading

No explicit target-side duration is provided; the model predicts the rhythm on its own.

Accessibility Code

No explicit target-side duration is provided; the model predicts the rhythm on its own.

Station Arrival

No explicit target-side duration is provided; the model predicts the rhythm on its own.
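The difference between the two modes can be summarized as whether any target-side duration is supplied. The request format below is purely hypothetical (the demo page does not document an API); `make_request` and its dictionary keys are invented for illustration.

```python
def make_request(text, durations=None):
    """Bundle text with optional per-character durations (hypothetical format).

    durations=None, or a list containing only None, means spontaneous mode:
    the model is expected to predict all timing itself.
    """
    if durations is None:
        durations = [None] * len(text)
    if len(durations) != len(text):
        raise ValueError("need exactly one duration entry per character")
    mode = "controlled" if any(d is not None for d in durations) else "spontaneous"
    return {"text": text, "mode": mode, "durations": list(durations)}


text = "前方到站,五山站。"

# Spontaneous: nothing is specified on the target side.
spontaneous = make_request(text)

# Controlled: only the comma pause (index 4) is fixed, as in a "Pause Only" variant.
pause_only = make_request(text, [None] * 4 + [260] + [None] * 4)

print(spontaneous["mode"], pause_only["mode"])
```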