Microsoft Speech API (Windows 10) Sample (PowerShell)

April 03, 2023

This sample runs on Windows 10 desktop and uses the local Windows Speech API (System.Speech). The later part of the script also converts the generated WAV file from an ordinary WAV into a headerless WAV/VOX file, a format commonly used by IVR systems (IVRS).

# To be run in PowerShell

# Code converted from C# to PowerShell

# Tested on a Windows 10 version 1809 installation

# Requires at least the Chinese (Hong Kong) IME to be installed

# https://blog.darkthread.net/blog/microsoft-speech-api/

Add-Type -AssemblyName System.Speech

$speak = New-Object System.Speech.Synthesis.SpeechSynthesizer

# Using Prompt Builder

$pb = New-Object System.Speech.Synthesis.PromptBuilder

# Simple way: select the voice directly (Cantonese, Tracy)

$pb.StartVoice('Microsoft Tracy Desktop')

$pb.AppendText('歡迎致電')

# Define using SSML, English, under Zira

$pb.AppendSsmlMarkup("<voice name='Microsoft Zira Desktop'>Thanks for calling </voice>")

# Define using SSML, Mandarin, under Huihui

$pb.AppendSsmlMarkup("<voice name='Microsoft Huihui Desktop'>歡迎致電</voice>")

$pb.EndVoice()

# This three-argument constructor produces PCM: 8 kHz, 8-bit, mono

$format = New-Object System.Speech.AudioFormat.SpeechAudioFormatInfo(8000, 8, 'Mono')

$speak.SetOutputToWaveFile("C:\temp\testwave.wav", $format)

$speak.Speak($pb)

# Set output back to speaker (unlock the WAV file)

$speak.SetOutputToDefaultAudioDevice()

# Clear the PromptBuilder, ready for next Prompt

$pb.ClearContent()

# List the installed voices

$speak.GetInstalledVoices().VoiceInfo

# Microsoft David Desktop --> Male, English

# Microsoft Zira Desktop --> Female, English

# Microsoft Huihui Desktop --> Female, Mandarin

# Microsoft Tracy Desktop --> Female, Cantonese

# SSML reference - https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-synthesis-markup

# Phonetic set SAPI for zh-HK

# https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-ssml-phonetic-sets#zh-hk

$pb.ClearContent()

# Raise the pitch by one semitone, slow the rate by 10%, and set the volume to 50

$pb.AppendSsmlMarkup("<voice name='Microsoft Tracy Desktop'><prosody pitch=""+1st"" rate=""-10%"" volume=""50""> 香港</prosody></voice>")

#https://learn.microsoft.com/en-us/dotnet/api/system.speech.audioformat.speechaudioformatinfo.-ctor?view=netframework-4.8#system-speech-audioformat-speechaudioformatinfo-ctor(system-speech-audioformat-encodingformat-system-int32-system-int32-system-int32-system-int32-system-int32-system-byte())

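# Change the output format to u-law, which is common for IVRS.

# EncodingFormat values: 7 is u-law; 6 is A-law

# Arguments: encoding, samples/s, bits per sample, channels, average bytes/s, block align, format-specific data

$format = New-Object System.Speech.AudioFormat.SpeechAudioFormatInfo(7, 8000, 8, 1, 8000, 1, $null)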

$speak.SetOutputToWaveFile("C:\temp\testwave.wav", $format)

# $speak.SetOutputToWaveFile("C:\temp\testwave.wav")

$speak.Speak($pb)

# Set output back to speaker

$speak.SetOutputToDefaultAudioDevice()
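The average bytes per second and block align arguments of the seven-argument SpeechAudioFormatInfo constructor follow from the sample rate, bits per sample and channel count. The sketch below shows that arithmetic for fixed-size sample formats such as PCM, A-law and u-law. Get-WaveFormatSizes is a hypothetical helper name introduced here for illustration; it is not part of System.Speech, and it assumes the bits per sample value is a multiple of 8.

```powershell
# Hypothetical helper (not part of System.Speech): derive the two dependent
# fields for formats where every sample has a fixed size (PCM, A-law, u-law).
# Assumes BitsPerSample is a multiple of 8.
function Get-WaveFormatSizes {
    param(
        [int]$SamplesPerSecond,
        [int]$BitsPerSample,
        [int]$ChannelCount
    )

    # Block align: bytes needed for one sample across all channels
    $blockAlign = [int]($ChannelCount * $BitsPerSample / 8)

    # Average byte rate: one block per sample period
    $averageBytesPerSecond = $SamplesPerSecond * $blockAlign

    [pscustomobject]@{
        BlockAlign            = $blockAlign
        AverageBytesPerSecond = $averageBytesPerSecond
    }
}

# 8 kHz, 8-bit, mono, as used for the u-law output in this script
$sizes = Get-WaveFormatSizes -SamplesPerSecond 8000 -BitsPerSample 8 -ChannelCount 1
$sizes.AverageBytesPerSecond   # 8000
$sizes.BlockAlign              # 1
```

For 8 kHz, 8-bit, mono audio this gives a byte rate of 8000 and a block align of 1, since each sample occupies exactly one byte.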

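# Convert to a headerless VOX file for IVRS

# Source is the WAV file written above

$path = "C:\temp\testwave.wav"

$pathvox = "C:\temp\testwave.vox"

# Header length to skip, in bytes. 46 assumes a 12-byte RIFF header, an 8+18-byte fmt chunk and an 8-byte data chunk header, with no other chunks.

$headersize = 46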

# Open the WAV file for reading

$binaryReader = New-Object System.IO.BinaryReader([System.IO.File]::Open($path, [System.IO.FileMode]::Open, [System.IO.FileAccess]::Read, [System.IO.FileShare]::ReadWrite))

$FileInfo = New-Object System.IO.FileInfo($path)

# Read and discard the header, then read the remaining audio data

$buffhead = $binaryReader.ReadBytes($headersize)

$buffvox = $binaryReader.ReadBytes($FileInfo.Length - $headersize)

# Release the WAV file

$binaryReader.Close()

# Write the raw audio data; WriteAllBytes creates or overwrites the VOX file

[System.IO.File]::WriteAllBytes($pathvox, $buffvox)
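The conversion above skips a fixed-size header. As an alternative sketch, the audio data can be located by walking the RIFF chunk list until the "data" chunk is found, which does not depend on the header length. Get-WaveDataBytes is a hypothetical helper name introduced here, not part of any Microsoft API. It assumes a well-formed RIFF/WAVE file small enough to hold in memory and a little-endian machine (which is what BitConverter expects on typical Windows hardware).

```powershell
# Hypothetical helper: return the audio payload of a RIFF/WAVE byte array by
# walking the chunk list until the "data" chunk is found.
function Get-WaveDataBytes {
    param([byte[]]$Bytes)

    $ascii = [System.Text.Encoding]::ASCII

    # A RIFF/WAVE file starts with "RIFF", a 4-byte size, then "WAVE"
    if ($Bytes.Length -lt 12 -or
        $ascii.GetString($Bytes, 0, 4) -cne 'RIFF' -or
        $ascii.GetString($Bytes, 8, 4) -cne 'WAVE') {
        throw 'Not a RIFF/WAVE file'
    }

    $offset = 12   # the first chunk starts right after "WAVE"
    while ($offset + 8 -le $Bytes.Length) {
        # Each chunk has a 4-character id followed by a 4-byte little-endian size
        $id    = $ascii.GetString($Bytes, $offset, 4)
        $size  = [System.BitConverter]::ToInt32($Bytes, $offset + 4)
        $start = $offset + 8

        if ($id -ceq 'data') {
            # Clamp in case the header declares more data than the file holds
            $count = [Math]::Min($size, $Bytes.Length - $start)
            $data  = [byte[]]::new($count)
            [System.Array]::Copy($Bytes, $start, $data, 0, $count)
            return ,$data   # the comma keeps the result as a single byte[]
        }

        # Skip this chunk; chunks are padded to an even number of bytes
        $offset = $start + $size + ($size % 2)
    }

    throw 'No data chunk found'
}

# Usage (paths are examples only):
# $wav = [System.IO.File]::ReadAllBytes('C:\temp\testwave.wav')
# [System.IO.File]::WriteAllBytes('C:\temp\testwave.vox', (Get-WaveDataBytes -Bytes $wav))
```

The fixed-offset approach is shorter, but it silently produces wrong output if the writer adds extra chunks or uses a different fmt chunk size; walking the chunks trades a few more lines for that robustness.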
