Command line tool to generate audio using SAPI.

SAPI command line interface

A simple tool to generate audio from text.

Command line options are parsed with getoptW.

Development process

Microsoft can suck a big fat poopy pee pee.


  • Install Visual Studio Build Tools (direct download) and select "Desktop Development with C++". In the panel on the right, make sure "C++ ATL for latest ..." is checked. To save space, you can uncheck "C++ CMake tools ...", "Testing tools ..." and "C++ AddressSanitizer".

Build Tools

After installing, run "Developer Command Prompt for VS 2022" from the Start menu, or just hit "Launch" in Visual Studio Installer. Change to the folder where you cloned this repo, then into sapicli, and run:

msbuild sapicli.vcxproj -p:Configuration=Release

EVNT chunk

.wav files generated by using SPBindToFile() contain an EVNT chunk, which is a list of serialized events. Each event is laid out as an SPSERIALIZEDEVENT structure, followed by any string that the event references.
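As a rough illustration of how such a chunk could be located, here is a minimal sketch that scans the top-level chunks of a RIFF/WAVE buffer. It is not taken from sapicli.cpp. The names ChunkSpan, readLe32 and findChunk are hypothetical, and the sketch assumes the chunk identifier is the literal four characters "EVNT".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Location of a chunk's payload inside a RIFF buffer (hypothetical helper type).
struct ChunkSpan {
    std::size_t offset;   // byte offset of the payload, just past the 8-byte chunk header
    std::uint32_t size;   // payload size in bytes, as stored in the chunk header
};

// Read a little-endian 32-bit value.
static std::uint32_t readLe32(const std::uint8_t* p) {
    return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
           (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
}

// Scan the top-level chunks of a RIFF/WAVE buffer for the given FourCC.
// Returns true and fills `out` when the chunk is found.
bool findChunk(const std::vector<std::uint8_t>& riff, const char* fourcc, ChunkSpan& out) {
    assert(fourcc != nullptr);
    if (riff.size() < 12 || std::memcmp(riff.data(), "RIFF", 4) != 0 ||
        std::memcmp(riff.data() + 8, "WAVE", 4) != 0) {
        return false;  // not a RIFF/WAVE file
    }
    std::size_t pos = 12;  // first chunk starts after "RIFF" <size> "WAVE"
    while (pos + 8 <= riff.size()) {
        const std::uint32_t size = readLe32(riff.data() + pos + 4);
        if (size > riff.size() - (pos + 8)) {
            break;  // chunk claims more data than the buffer holds
        }
        if (std::memcmp(riff.data() + pos, fourcc, 4) == 0) {
            out.offset = pos + 8;
            out.size = size;
            return true;
        }
        pos += 8 + std::size_t(size) + (size & 1);  // RIFF chunks are padded to even length
    }
    return false;
}
```

Calling findChunk(bytes, "EVNT", span) on the contents of a generated .wav file would then give the offset and size of the serialized event list, if the chunk is present.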

The first byte is the event type, and most events are 24 bytes long. Strings that follow events are in wide char format. String lengths are padded upwards to multiples of 4. So if the string is 126 bytes, it is stored as 128 bytes, with the last two bytes being zeroes.
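The padding rule can be sketched in a few lines. The helper names roundUpToDword and packString are hypothetical. The sketch assumes 2-byte little-endian wide characters and a terminating zero character counted before rounding, as the wcslen(...) + 1 term in the sphelper.h excerpt suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Round a byte count up to the next multiple of 4 (one DWORD).
constexpr std::size_t roundUpToDword(std::size_t n) {
    return (n + 3) / 4 * 4;
}

// The example from the text: 126 bytes are stored as 128.
static_assert(roundUpToDword(126) == 128, "126 bytes pad to 128");

// Lay out a string the way it is assumed to follow a serialized event:
// UTF-16 code units, a terminating zero, then zero padding to a multiple of 4.
std::vector<std::uint8_t> packString(const std::u16string& text) {
    const std::size_t raw = (text.size() + 1) * 2;      // code units plus terminator
    std::vector<std::uint8_t> out(roundUpToDword(raw), 0);
    for (std::size_t i = 0; i < text.size(); ++i) {
        out[2 * i] = static_cast<std::uint8_t>(text[i] & 0xFF);      // low byte first
        out[2 * i + 1] = static_cast<std::uint8_t>(text[i] >> 8);
    }
    assert(out.size() % 4 == 0);
    return out;
}
```

With this layout a one-character string takes 4 bytes (character plus terminator), and a two-character string takes 8 bytes (6 bytes of data plus 2 bytes of padding).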

EVNT Chunk in RIFFPad

Excerpt from sphelper.h:

/*
* SpSerializedEventSize *
*   Description:
*       Returns the size, in bytes, used by a serialized event.  The caller can
*   pass a pointer to either a SPSERIAILZEDEVENT or SPSERIALIZEDEVENT64 structure.
*   Returns:
*       Number of bytes used by serizlied event
*/

template <class T>
inline ULONG SpSerializedEventSize(const T * pSerEvent)
{
    ULONG ulSize = sizeof(T);

    if( ( pSerEvent->elParamType == SPET_LPARAM_IS_POINTER ) && pSerEvent->SerializedlParam )
    {
        ulSize += ULONG(pSerEvent->SerializedwParam);
    }
    else if ((pSerEvent->elParamType == SPET_LPARAM_IS_STRING || pSerEvent->elParamType == SPET_LPARAM_IS_TOKEN) &&
             pSerEvent->SerializedlParam != NULL)
    {
        ulSize += ((ULONG)wcslen((WCHAR*)(pSerEvent + 1)) + 1) * sizeof( WCHAR );
    }
    // Round up to nearest DWORD
    ulSize += 3;
    ulSize -= ulSize % 4;
    return ulSize;
}
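
The same size rule can be applied without the SAPI headers to step through the payload of an EVNT chunk. The following is a sketch under stated assumptions, not code from this project. The field offsets (event id at byte 0, parameter type at byte 2, SerializedwParam at byte 16, SerializedlParam at byte 20) and the numeric parameter type values are assumptions about the 32-bit SPSERIALIZEDEVENT layout and should be checked against sapi.h. The names EventRecord, readLe and walkEvents are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed values of SPEVENTLPARAMTYPE; verify against sapi.h.
enum : std::uint16_t { kParamToken = 1, kParamPointer = 3, kParamString = 4 };

constexpr std::size_t kEventHeaderBytes = 24;  // size of one serialized event header

struct EventRecord {
    std::size_t offset;      // where the record starts in the chunk payload
    std::size_t size;        // header plus trailing data, rounded up to 4 bytes
    std::uint16_t eventId;   // assumed 16-bit event id at the start of the record
};

// Read a little-endian unsigned value of 1 to 4 bytes.
static std::uint32_t readLe(const std::uint8_t* p, int bytes) {
    std::uint32_t v = 0;
    for (int i = 0; i < bytes; ++i) v |= std::uint32_t(p[i]) << (8 * i);
    return v;
}

// Split an EVNT payload into records, mirroring SpSerializedEventSize.
std::vector<EventRecord> walkEvents(const std::vector<std::uint8_t>& evnt) {
    std::vector<EventRecord> out;
    std::size_t pos = 0;
    while (evnt.size() - pos >= kEventHeaderBytes) {
        assert(pos <= evnt.size());
        const std::uint8_t* p = evnt.data() + pos;
        const std::size_t avail = evnt.size() - pos;
        const std::uint16_t eventId = static_cast<std::uint16_t>(readLe(p, 2));
        const std::uint16_t paramType = static_cast<std::uint16_t>(readLe(p + 2, 2));
        const std::uint32_t wParam = readLe(p + 16, 4);
        const std::uint32_t lParam = readLe(p + 20, 4);

        std::size_t size = kEventHeaderBytes;
        if (paramType == kParamPointer && lParam != 0) {
            size += wParam;  // wParam holds the byte count of the trailing data
        } else if ((paramType == kParamString || paramType == kParamToken) && lParam != 0) {
            // Count 2-byte characters up to the terminating zero, staying in bounds.
            std::size_t units = 0;
            while (kEventHeaderBytes + 2 * units + 2 <= avail &&
                   (p[kEventHeaderBytes + 2 * units] | p[kEventHeaderBytes + 2 * units + 1]) != 0) {
                ++units;
            }
            size += (units + 1) * 2;  // characters plus terminator
        }
        size = (size + 3) / 4 * 4;    // round up to nearest DWORD
        if (size > avail) break;      // truncated record, stop here
        out.push_back({pos, size, eventId});
        pos += size;
    }
    return out;
}
```

Each returned record gives the offset and padded size of one event, so any trailing string starts at offset + 24 within the payload.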