DLSpeech: Fix various issues #2671

Merged 14 commits on Dec 6, 2019
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -66,6 +66,10 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
- `component`: Updated timer to use functional component, by [@spyip](https://github.com/spyip) in PR [#2546](https://github.com/microsoft/BotFramework-WebChat/pull/2546)
- Fixes [#2651](https://github.com/microsoft/BotFramework-WebChat/issues/2651). Add `ends-with` string module to es5 bundle, by [@corinagum](https://github.com/corinagum) in PR [#2654](https://github.com/microsoft/BotFramework-WebChat/pull/2654)
- Fixes [#2658](https://github.com/microsoft/BotFramework-WebChat/issues/2658). Fix rendering of markdown images in IE11, by [@corinagum](https://github.com/corinagum) in PR [#2659](https://github.com/microsoft/BotFramework-WebChat/pull/2659)
- Fixes [#2662](https://github.com/microsoft/BotFramework-WebChat/issues/2662) and [#2666](https://github.com/microsoft/BotFramework-WebChat/issues/2666). Fix various issues related to Direct Line Speech, by [@compulim](https://github.com/compulim) in PR [#2671](https://github.com/microsoft/BotFramework-WebChat/pull/2671)
- Added triple-buffering to reduce pops/cracks.
- Enable Safari by upsampling to 48000 Hz.
- Support detailed output format on Web Chat side.

### Changed

Binary file not shown.
2 changes: 1 addition & 1 deletion packages/directlinespeech/package-lock.json


20 changes: 9 additions & 11 deletions packages/directlinespeech/src/createAdapters.js
@@ -92,20 +92,17 @@ export default async function create({

// Supported options can be found in DialogConnectorFactory.js.

// Set the language used for recognition.
config.setProperty(PropertyId.SpeechServiceConnection_RecoLanguage, speechRecognitionLanguage);

// The following code sets the output format.
// As advised by the Speech team, this API may change in the future.
config.setProperty(PropertyId.SpeechServiceResponse_OutputFormatOption, 'detailed');

// Previous attempts, none of which enabled the detailed output format:
// config.setProperty(PropertyId.SpeechServiceResponse_OutputFormatOption, OutputFormat[OutputFormat.Detailed]);
// config.setProperty(PropertyId.SpeechServiceResponse_RequestDetailedResultTrueFalse, true);
// config.setProperty(OutputFormatPropertyName, OutputFormat[OutputFormat.Detailed]);
// config.setServiceProperty(PropertyId.SpeechServiceResponse_RequestDetailedResultTrueFalse, "true", ServicePropertyChannel.UriQueryParameter);

// Set the user ID for starting the conversation.
// The following code is adapted from the C# sample. It should set from.id, but currently it does not.
// https://github.com/Azure-Samples/Cognitive-Services-Direct-Line-Speech-Client/blob/master/DLSpeechClient/MainWindow.xaml.cs#L236
userID && config.setProperty(PropertyId.Conversation_From_Id, userID);

// Set up Custom Speech and Custom Voice.
// The following code is adapted from the C# sample and is not working yet.
// https://github.com/Azure-Samples/Cognitive-Services-Direct-Line-Speech-Client/blob/master/DLSpeechClient/MainWindow.xaml.cs
// speechRecognitionEndpointId && config.setServiceProperty('cid', speechRecognitionEndpointId, ServicePropertyChannel.UriQueryParameter);
@@ -115,11 +112,12 @@ export default async function create({

dialogServiceConnector.connect();

// Renew the token periodically.
if (authorizationToken) {
const interval = setInterval(async () => {
// If the connector has been disposed, we should stop renewing the token.
// TODO: We should use a public implementation if the Speech SDK offers one, instead of "privIsDisposed".
if (dialogServiceConnector.privIsDisposed) {
clearInterval(interval);
}
108 changes: 108 additions & 0 deletions packages/directlinespeech/src/createMultiBufferingPlayer.js
@@ -0,0 +1,108 @@
// Currently, we use a triple-buffering approach.
const NUM_BUFFER = 3;

function zeroBuffer(buffer) {
const channels = buffer.numberOfChannels;

for (let channel = 0; channel < channels; channel++) {
const audioData = buffer.getChannelData(channel);

[].fill.call(audioData, 0);
}
}

function copyBuffer(buffer, multiChannelArrayBuffer) {
const channels = buffer.numberOfChannels;

for (let channel = 0; channel < channels; channel++) {
const arrayBuffer = multiChannelArrayBuffer[channel];

// Safari does not support AudioBuffer.copyToChannel yet.
if (buffer.copyToChannel) {
buffer.copyToChannel(arrayBuffer, channel);
} else {
const { length: arrayBufferLength } = arrayBuffer;
const perChannelBuffer = buffer.getChannelData(channel);

for (let offset = 0; offset < arrayBufferLength; offset++) {
perChannelBuffer[offset] = arrayBuffer[offset];
}
}
}
}
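The Safari fallback path above can be exercised against a plain-object stand-in for AudioBuffer. The mock below is an assumption for illustration only; the real code receives an AudioBuffer from the Web Audio API.

```javascript
// copyBuffer with the Safari fallback, run against a mock AudioBuffer.
// The mock deliberately omits copyToChannel, forcing the sample-by-sample path.
function copyBuffer(buffer, multiChannelArrayBuffer) {
  const channels = buffer.numberOfChannels;

  for (let channel = 0; channel < channels; channel++) {
    const arrayBuffer = multiChannelArrayBuffer[channel];

    if (buffer.copyToChannel) {
      buffer.copyToChannel(arrayBuffer, channel);
    } else {
      // Safari path: copy one sample at a time.
      const perChannelBuffer = buffer.getChannelData(channel);

      for (let offset = 0; offset < arrayBuffer.length; offset++) {
        perChannelBuffer[offset] = arrayBuffer[offset];
      }
    }
  }
}

const channelData = [new Float32Array(4), new Float32Array(4)];
const mockBuffer = { numberOfChannels: 2, getChannelData: channel => channelData[channel] };

copyBuffer(mockBuffer, [new Float32Array([1, 2, 3, 4]), new Float32Array([5, 6, 7, 8])]);
console.log(Array.from(channelData[0])); // [1, 2, 3, 4]
console.log(Array.from(channelData[1])); // [5, 6, 7, 8]
```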

// This is a multi-buffering player. Callers can keep pushing buffers to it.
// Each buffer is realized as a BufferSource and queued to the AudioContext.
// We queue as soon, and as much, as possible.
// We do not support progressive buffering (pushing partial buffers) and have no plan for it.

export default function createMultiBufferingPlayer(audioContext, { channels, samplesPerSec }, numSamplePerBuffer) {
const freeBuffers = new Array(NUM_BUFFER)
.fill()
.map(() => audioContext.createBuffer(channels, numSamplePerBuffer, samplesPerSec));
let queuedBufferSources = [];
let nextSchedule;

const queue = [];

const playNext = () => {
if (typeof nextSchedule !== 'number') {
nextSchedule = audioContext.currentTime;
}

const bufferSource = audioContext.createBufferSource();
const multiChannelArrayBuffer = queue.shift();

if (typeof multiChannelArrayBuffer === 'function') {
// If the queued item is a function, the caller has called "flush".
// The "flush" function resolves when all buffers queued before the "flush" call have played.
multiChannelArrayBuffer();
} else if (multiChannelArrayBuffer) {
const nextBuffer = freeBuffers.shift();

// If all buffers are currently occupied, prepend the data back to the queue.
// When one of the buffers finishes, it will call playNext() again to pick up items from the queue.
if (!nextBuffer) {
queue.unshift(multiChannelArrayBuffer);

return;
}

zeroBuffer(nextBuffer);
copyBuffer(nextBuffer, multiChannelArrayBuffer);

bufferSource.buffer = nextBuffer;
bufferSource.connect(audioContext.destination);
bufferSource.start(nextSchedule);

// Remember all BufferSources currently queued at the AudioContext, via bufferSource.start().
// This allows cancelAll() to effectively cancel all BufferSources queued at the AudioContext.
queuedBufferSources.push(bufferSource);

nextSchedule += nextBuffer.duration;

bufferSource.addEventListener('ended', () => {
queuedBufferSources = queuedBufferSources.filter(target => target !== bufferSource);

// Mark this buffer as free to pick up on the next round.
freeBuffers.push(nextBuffer);
playNext();
});
}
};

return {
cancelAll: () => {
queue.splice(0);

// Although the queue is cleared, there are still BufferSources queued at the AudioContext that need to be stopped.
queuedBufferSources.forEach(bufferSource => bufferSource.stop());
},
flush: () => new Promise(resolve => queue.push(resolve)),
push: multiChannelArrayBuffer => {
queue.push(multiChannelArrayBuffer);

playNext();
}
};
}
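The free-list rotation at the heart of this player can be sketched standalone. The names below are illustrative assumptions; plain objects stand in for AudioBuffers, and an explicit `endOldest()` call stands in for the BufferSource "ended" event.

```javascript
// Minimal, synchronous sketch of the triple-buffer rotation described above.
const NUM_BUFFER = 3;

function createTripleBufferQueue() {
  const freeBuffers = Array.from({ length: NUM_BUFFER }, (_, id) => ({ id }));
  const queue = [];
  const playing = [];

  const playNext = () => {
    const chunk = queue.shift();

    if (chunk === undefined) return;

    const buffer = freeBuffers.shift();

    if (!buffer) {
      // All buffers occupied: put the chunk back until one frees up.
      queue.unshift(chunk);
      return;
    }

    playing.push({ buffer, chunk });
  };

  return {
    push: chunk => {
      queue.push(chunk);
      playNext();
    },
    // Simulates the "ended" event of the oldest playing BufferSource.
    endOldest: () => {
      const { buffer } = playing.shift();
      freeBuffers.push(buffer);
      playNext();
    },
    playing
  };
}

const q = createTripleBufferQueue();

[0, 1, 2, 3, 4].forEach(chunk => q.push(chunk));

// Only 3 chunks play immediately; chunks 3 and 4 wait for a free buffer.
console.log(q.playing.map(({ chunk }) => chunk)); // [0, 1, 2]

q.endOldest();
console.log(q.playing.map(({ chunk }) => chunk)); // [1, 2, 3]
```

Chunks beyond the third are not dropped; they sit in the queue until a buffer returns to the free list, mirroring how playNext() re-enters from the "ended" listener in the real player.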
134 changes: 99 additions & 35 deletions packages/directlinespeech/src/playCognitiveServicesStream.js
@@ -1,27 +1,29 @@
/* eslint no-magic-numbers: ["error", { "ignore": [0, 1, 8, 16, 32, 128, 1000, 32768, 96000, 2147483648] }] */
/* eslint no-await-in-loop: "off" */
/* eslint prefer-destructuring: "off" */

import cognitiveServicesPromiseToESPromise from './cognitiveServicesPromiseToESPromise';
import createDeferred from 'p-defer';
import createMultiBufferingPlayer from './createMultiBufferingPlayer';

// Safari requires an audio buffer with a sample rate of at least 22050 Hz.
// We use a minimum of 44100 Hz; the Speech SDK's default 16000 Hz samples will be upsampled 3x to 48000 Hz.
const MIN_SAMPLE_RATE = 44100;

// We assume the Speech SDK chops packets at 4096 bytes; this size is hardcoded in the Speech SDK.
// We will set up our multi-buffering player with 3 buffers, each of 4096 bytes (2048 16-bit samples).
// For simplicity, our multi-buffering player currently does not support progressive buffering.

// Progressive buffering means buffers of any sample size could be queued and would be concatenated.
// For example, queueing 1000 samples and then 1048 samples would concatenate them into a single buffer of 2048 samples.

// Currently, for simplicity, we queue them as two buffers:
// the first is 1000 samples followed by 1048 zeroes, the second is 1048 samples followed by 1000 zeroes.

// There is no plan to support progressive buffering unless the Speech SDK chops at dynamic sizes.
const DEFAULT_BUFFER_SIZE = 4096;

function average(array) {
return array.reduce((sum, value) => sum + value, 0) / array.length;
}

function formatTypedBitArrayToFloatArray(audioData, maxValue) {
@@ -56,6 +58,49 @@ function abortToReject(signal) {
});
}

// In 2-channel audio (A/B), the data comes interleaved, like "ABABABABAB".
// This function takes "ABABABABAB" and returns an array ["AAAAA", "BBBBB"].
function deinterleave(channelInterleavedAudioData, { channels }) {
const multiChannelArrayBuffer = new Array(channels);
const frameSize = channelInterleavedAudioData.length / channels;

for (let channel = 0; channel < channels; channel++) {
const audioData = new Float32Array(frameSize);

multiChannelArrayBuffer[channel] = audioData;

for (let offset = 0; offset < frameSize; offset++) {
audioData[offset] = channelInterleavedAudioData[offset * channels + channel];
}
}

return multiChannelArrayBuffer;
}
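As a quick check, the deinterleave step can be run standalone; the function body below mirrors the one above.

```javascript
// Deinterleave "ABABAB" channel-interleaved samples into per-channel arrays.
function deinterleave(channelInterleavedAudioData, { channels }) {
  const multiChannelArrayBuffer = new Array(channels);
  const frameSize = channelInterleavedAudioData.length / channels;

  for (let channel = 0; channel < channels; channel++) {
    const audioData = new Float32Array(frameSize);

    multiChannelArrayBuffer[channel] = audioData;

    for (let offset = 0; offset < frameSize; offset++) {
      audioData[offset] = channelInterleavedAudioData[offset * channels + channel];
    }
  }

  return multiChannelArrayBuffer;
}

// 2-channel data with A = 1, 2, 3 and B = 10, 20, 30 arrives as [1, 10, 2, 20, 3, 30].
const [left, right] = deinterleave(new Float32Array([1, 10, 2, 20, 3, 30]), { channels: 2 });

console.log(Array.from(left)); // [1, 2, 3]
console.log(Array.from(right)); // [10, 20, 30]
```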

// This function upsamples the audio data by an integer multiplier.
// We implement simple anti-aliasing; for simplicity, the anti-aliasing does not roll over to the next buffer.
function multiplySampleRate(source, sampleRateMultiplier) {
if (sampleRateMultiplier === 1) {
return source;
}

const lastValues = new Array(sampleRateMultiplier).fill(source[0]);
const target = new Float32Array(source.length * sampleRateMultiplier);

for (let sourceOffset = 0; sourceOffset < source.length; sourceOffset++) {
const value = source[sourceOffset];
const targetOffset = sourceOffset * sampleRateMultiplier;

for (let multiplierIndex = 0; multiplierIndex < sampleRateMultiplier; multiplierIndex++) {
lastValues.shift();
lastValues.push(value);
target[targetOffset + multiplierIndex] = average(lastValues);
}
}

return target;
}
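The upsampler can likewise be exercised standalone; the functions below mirror multiplySampleRate and average above.

```javascript
// Moving-average helper used by the upsampler.
function average(array) {
  return array.reduce((sum, value) => sum + value, 0) / array.length;
}

// Upsample by an integer multiplier with a simple moving-average anti-alias.
function multiplySampleRate(source, sampleRateMultiplier) {
  if (sampleRateMultiplier === 1) {
    return source;
  }

  const lastValues = new Array(sampleRateMultiplier).fill(source[0]);
  const target = new Float32Array(source.length * sampleRateMultiplier);

  for (let sourceOffset = 0; sourceOffset < source.length; sourceOffset++) {
    const value = source[sourceOffset];
    const targetOffset = sourceOffset * sampleRateMultiplier;

    for (let multiplierIndex = 0; multiplierIndex < sampleRateMultiplier; multiplierIndex++) {
      // Slide the window forward and emit its average, smoothing each step edge.
      lastValues.shift();
      lastValues.push(value);
      target[targetOffset + multiplierIndex] = average(lastValues);
    }
  }

  return target;
}

// Doubling [0, 1]: the 0-to-1 transition is smoothed through 0.5 by the moving average.
console.log(Array.from(multiplySampleRate(new Float32Array([0, 1]), 2))); // [0, 0, 0.5, 1]
```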

export default async function playCognitiveServicesStream(
audioContext,
audioFormat,
@@ -66,7 +111,6 @@

try {
const abortPromise = abortToReject(signal);
let lastBufferSource;

const read = () =>
Promise.race([
@@ -79,43 +123,63 @@
throw new Error('aborted');
}

let newSamplesPerSec = audioFormat.samplesPerSec;
let sampleRateMultiplier = 1;

// Safari requires a minimum sample rate of 22050 Hz.
// We calculate a multiplier so the stream meets the minimum sample rate.
// We prefer an integer multiplier to simplify our upsampler.
// For safety, we only upsample up to 96000 Hz.
while (newSamplesPerSec < MIN_SAMPLE_RATE && newSamplesPerSec < 96000) {
sampleRateMultiplier++;
newSamplesPerSec = audioFormat.samplesPerSec * sampleRateMultiplier;
}

// The third parameter is the number of samples per buffer.
// For example, the Speech SDK sends us 4096 bytes of 16-bit samples, that is, 2048 samples per channel.
// The multi-buffering player will be set up to handle 2048 samples per buffer.
// With a 3x multiplier, it will handle 6144 samples per buffer.
const player = createMultiBufferingPlayer(
audioContext,
{ ...audioFormat, samplesPerSec: newSamplesPerSec },
(DEFAULT_BUFFER_SIZE / (audioFormat.bitsPerSample / 8)) * sampleRateMultiplier
);

// For safety, we only handle up to 1000 chunks.
for (
let chunk = await read(), maxChunks = 0;
!chunk.isEnd && maxChunks < 1000 && !signal.aborted;
chunk = await read(), maxChunks++
) {
if (signal.aborted) {
break;
}

// Data received from the Speech SDK is interleaved: 2 channels (A/B) arrive as "ABABABABAB".
// Each sample (A/B) is an 8- to 32-bit number.

// First, we convert each 8- to 32-bit number into a floating-point number, as required by the Web Audio API.
const interleavedArrayBuffer = formatAudioDataArrayBufferToFloatArray(audioFormat, chunk.buffer);

// Then, we deinterleave them into per-channel array buffers, as "AAAAA" and "BBBBB".
const multiChannelArrayBuffer = deinterleave(interleavedArrayBuffer, audioFormat);

// Lastly, if needed, we upsample them. With a 2x multiplier, "AAAAA" becomes "AAAAAAAAAA" (with anti-aliasing).
const upsampledMultiChannelArrayBuffer = multiChannelArrayBuffer.map(arrayBuffer =>
multiplySampleRate(arrayBuffer, sampleRateMultiplier)
);

// Queue it to the buffering player.
player.push(upsampledMultiChannelArrayBuffer);
}

abortPromise.catch(() => player.cancelAll());

if (signal.aborted) {
throw new Error('aborted');
}

await Promise.race([abortPromise, player.flush()]);
} finally {
queuedBufferSourceNodes.forEach(node => node.stop());
}