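Build Your First Voice-Driven Web Application

Some time ago, I had a goal: to build a web application that could automatically write down what I say and let me perform actions through voice commands. I also thought it would be a good idea to provide a way to listen to feedback from the application. After some quick research, I found a couple of Web APIs that solve this problem.

In this post, I'll explain how to use modern Web APIs so that you can speak to your web application and have it respond to you. We will implement the app from scratch.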

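What Is an API?

API is the acronym for Application Programming Interface. According to MDN:

"APIs are constructs made available in programming languages to allow developers to create complex functionality more easily."

In simple words, APIs provide a way to create complex applications without having to learn or implement the underlying details.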

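Web APIs

Have you ever used fetch or a Service Worker? Maybe you have used or accessed the DOM from JavaScript?

Well, you can accomplish complex tasks with those features because they are part of an extensive list of Web APIs. These APIs are not part of the JavaScript language itself; however, you can use them from JavaScript (or from any JavaScript-based library or framework).

On the other hand, before you start building an app on top of a Web API, you may need to make sure your web browser fully supports it. For example, if you plan to work with fetch, you can check which browsers or JavaScript engines support it.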
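That support check can also be done at runtime through feature detection. The following is a minimal sketch and not part of the original project: the helper name `supportsApi` is hypothetical, and it only tests whether the given property names exist on a scope object (in a browser you would pass `window`).

```typescript
// Hypothetical helper: reports whether a scope object (for example `window`)
// exposes every one of the given API names.
function supportsApi(scope: object, ...names: string[]): boolean {
  return names.every((name) => name in scope);
}

// A stand-in for a browser's global object, so the sketch runs outside a browser.
const fakeWindow = { fetch: () => undefined, speechSynthesis: {} };

console.log(supportsApi(fakeWindow, 'fetch'));                   // true
console.log(supportsApi(fakeWindow, 'webkitSpeechRecognition')); // false
```

In a real page, a call such as `supportsApi(window, 'webkitSpeechRecognition')` could decide whether to show the voice controls at all.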

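Web Speech API

As the image above shows, this Web API helps you with the following:

Generates speech-to-text output
Uses speech recognition as input
Supports continuous dictation (you can dictate a complete letter)
Provides a control interface for web browsers

For more details, please see the Web Speech API specification.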

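The SpeechSynthesis Interface

You get the idea from the image above. The Web SpeechSynthesis interface can generate text-to-speech output.

Please refer to the specification to learn more about this interface.

Watch the Video

Demo: Voice-Driven Web Application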

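Implement the Web Application

The application will be based on HTML, CSS, and TypeScript as the programming language. We will use the latest version of Angular together with Angular Material components.
We'll also follow a reactive programming approach using Observables and the AsyncPipe from Angular. Finally, we'll provide an implementation of the Strategy pattern, among other features.

Creating the Project

Let's create the web application from scratch using the latest Angular CLI: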

ng new web-speech-angular --routing --style css --prefix wsa --strict

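--routing: Generates a routing module for the project.
--style: Sets the file extension for style files.
--prefix: Sets a prefix for the component selectors.
--strict: Available from Angular 10. Enables stricter type checking and build optimization options.

Adding Angular Material

Adding Angular Material at this point is simple: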

ng add @angular/material

Now we can follow Angular's overall structural guidelines to generate the shared and material modules:

ng generate module shared --module app
ng generate module shared/material --module shared

These commands will generate the following structure in your project:

|- src/
    |- app/
        |- shared/
            |- material/
                |- material.module.ts
            |- shared.module.ts

Adding the `web-speech` Module

It's time to add a new module to define the components needed to display the app's controls.

ng generate module web-speech --module app
ng generate component web-speech

Now we'll have the following structure:

|- src/
    |- app/
        |- shared/
        |- web-speech/
            |- web-speech.module.ts
            |- web-speech.component.ts|html|css

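Adding the `web-apis` Directory

Let's create a new folder to group the services related to the Web APIs we are going to use. Let's also define some TypeScript files for the languages, notifications, errors, and events supported by the new service.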

ng generate service shared/services/web-apis/speech-recognizer

After running the previous command and creating the model files, the structure will be as follows:

|- src/
    |- app/
        |- shared/
            |- shared.module.ts
            |- services/
                |- web-apis/
                    |- speech-recognizer.service.ts
            |- model/
                |- languages.ts
                |- speech-error.ts
                |- speech-event.ts
                |- speech-notification.ts
        |- web-speech/
            |- web-speech.module.ts
            |- web-speech.component.ts|html|css

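Modeling Notifications, Events, and Errors

Since the current specification is written for JavaScript, we can provide some TypeScript code to take advantage of static typing. This is even more important because the project has been configured with TypeScript's strict mode enabled.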

// languages.ts
export const languages = ['en-US', 'es-ES'];
export const defaultLanguage = languages[0];

// speech-error.ts
export enum SpeechError {
  NoSpeech = 'no-speech',
  AudioCapture = 'audio-capture',
  NotAllowed = 'not-allowed',
  Unknown = 'unknown'
}

// speech-event.ts
export enum SpeechEvent {
  Start,
  End,
  FinalContent,
  InterimContent
}

// speech-notification.ts
export interface SpeechNotification<T> {
    event?: SpeechEvent;
    error?: SpeechError;
    content?: T;
}

Pay attention to the SpeechError enum. Its string values match the actual error codes defined by the SpeechRecognitionErrorEvent specification.
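Because the enum values are identical to the raw error codes, a string coming from the browser can be converted with a simple lookup. The sketch below is only an illustration: the helper `toSpeechError` is a hypothetical name, and the enum is redeclared so that the snippet is self-contained.

```typescript
// Redeclared here so the sketch is self-contained; it mirrors speech-error.ts.
enum SpeechError {
  NoSpeech = 'no-speech',
  AudioCapture = 'audio-capture',
  NotAllowed = 'not-allowed',
  Unknown = 'unknown',
}

// Hypothetical helper: maps a raw error code to the enum,
// falling back to Unknown for any code that is not modeled.
function toSpeechError(code: string): SpeechError {
  const known: string[] = [
    SpeechError.NoSpeech,
    SpeechError.AudioCapture,
    SpeechError.NotAllowed,
  ];
  return known.indexOf(code) >= 0 ? (code as SpeechError) : SpeechError.Unknown;
}

console.log(toSpeechError('not-allowed')); // not-allowed
console.log(toSpeechError('network'));     // unknown
```

A lookup like this is one possible alternative to a switch statement when translating error events into the model.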

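Create the `SpeechRecognizerService` (Asynchronous Speech Recognition)

The main goal is to define an abstraction of the functionality the application needs:

Define a basic configuration for the SpeechRecognizerService (a webkitSpeechRecognition instance, which is supported by Google Chrome).
Define a language configuration.
Catch interim and final results.
Allow starting and stopping the recognizer service.

The following code provides an implementation of those requirements: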

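// speech-recognizer.service.ts
import { Injectable } from '@angular/core';

// Chrome exposes the constructor with a `webkit` prefix. It is declared here
// because it may be missing from the TypeScript DOM typings.
declare const webkitSpeechRecognition: any;

@Injectable({
  providedIn: 'root',
})
export class SpeechRecognizerService {
  // Both are assigned in initialize(), hence the definite assignment assertions.
  recognition!: SpeechRecognition;
  language!: string;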
  isListening = false;

  constructor() {}

  initialize(language: string): void {
    this.recognition = new webkitSpeechRecognition();
    this.recognition.continuous = true;
    this.recognition.interimResults = true;
    this.setLanguage(language);
  }

  setLanguage(language: string): void {
    this.language = language;
    this.recognition.lang = language;
  }

  start(): void {
    this.recognition.start();
    this.isListening = true;
  }

  stop(): void {
    this.recognition.stop();
  }
}

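Now it's time to provide a reactive API that uses Observables to deliver a continuous data flow. This helps to "catch" the inferred text while the user keeps talking (we don't need to pull values every time to see whether there is something new).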

export class SpeechRecognizerService {
  // previous implementation here...

  onStart(): Observable<SpeechNotification<never>> {
    if (!this.recognition) {
      this.initialize(this.language);
    }

    return new Observable(observer => {
      this.recognition.onstart = () => observer.next({
        event: SpeechEvent.Start
      });
    });
  }

  onEnd(): Observable<SpeechNotification<never>> {
    return new Observable(observer => {
      this.recognition.onend = () => {
        observer.next({
          event: SpeechEvent.End
        });
        this.isListening = false;
      };
    });
  }

  onResult(): Observable<SpeechNotification<string>> {
    return new Observable(observer => {
      this.recognition.onresult = (event: SpeechRecognitionEvent) => {
        let interimContent = '';
        let finalContent = '';

        for (let i = event.resultIndex; i < event.results.length; ++i) {
          if (event.results[i].isFinal) {
            finalContent += event.results[i][0].transcript;
            observer.next({
              event: SpeechEvent.FinalContent,
              content: finalContent
            });
          } else {
            interimContent += event.results[i][0].transcript;
            observer.next({
              event: SpeechEvent.InterimContent,
              content: interimContent
            });
          }
        }
      };
    });
  }

  onError(): Observable<SpeechNotification<never>> {
    return new Observable(observer => {
      this.recognition.onerror = (event) => {
        const eventError: string = (event as any).error;
        let error: SpeechError;
        switch (eventError) {
          case 'no-speech':
            error = SpeechError.NoSpeech;
            break;
          case 'audio-capture':
            error = SpeechError.AudioCapture;
            break;
          case 'not-allowed':
            error = SpeechError.NotAllowed;
            break;
          default:
            error = SpeechError.Unknown;
            break;
        }

        observer.next({
          error
        });
      };
    });
  }  
}

In the previous code, we wrote wrapper functions that return Observables to manage the following event handlers:

recognition.onstart = function() { ... }
recognition.onend = function() { ... }
recognition.onresult = function(event) { ... }
recognition.onerror = function(event) { ... }

To better understand how these functions work, see the API specification for SpeechRecognition Events, SpeechRecognitionResult, and SpeechRecognitionErrorEvent.
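The loop inside the onresult handler above walks the result list from resultIndex and separates final from interim transcripts. To make that logic easier to follow outside a browser, here is a minimal sketch that uses a plain object type (`ResultLike`, a hypothetical stand-in shaped like a SpeechRecognitionResult) and a hypothetical function `splitTranscripts`. Unlike the service, it returns both strings once instead of emitting a notification per result.

```typescript
// Simplified stand-in for SpeechRecognitionResult: index 0 holds the first alternative.
interface ResultLike {
  isFinal: boolean;
  0: { transcript: string };
}

// Hypothetical helper: concatenates final and interim transcripts separately,
// starting at the first result that changed (resultIndex).
function splitTranscripts(
  results: ResultLike[],
  resultIndex: number
): { finalContent: string; interimContent: string } {
  let finalContent = '';
  let interimContent = '';
  for (let i = resultIndex; i < results.length; ++i) {
    const text = results[i][0].transcript;
    if (results[i].isFinal) {
      finalContent += text;
    } else {
      interimContent += text;
    }
  }
  return { finalContent, interimContent };
}

const sample: ResultLike[] = [
  { isFinal: true, 0: { transcript: 'old text' } },
  { isFinal: true, 0: { transcript: 'hello ' } },
  { isFinal: false, 0: { transcript: 'wor' } },
];

console.log(splitTranscripts(sample, 1));
// { finalContent: 'hello ', interimContent: 'wor' }
```

Results before resultIndex are skipped because they were already reported by an earlier event.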

Working on the `WebSpeechComponent`

Since the SpeechRecognizerService is already available, it's time to define the Angular component:

// web-speech.component.ts
import { ChangeDetectionStrategy, Component, OnInit } from '@angular/core';
import { merge, Observable, Subject } from 'rxjs';
import { map, tap } from 'rxjs/operators';
import { defaultLanguage, languages } from '../shared/model/languages';
import { SpeechError } from '../shared/model/speech-error';
import { SpeechEvent } from '../shared/model/speech-event';
import { SpeechRecognizerService } from '../shared/services/web-apis/speech-recognizer.service';

@Component({
  selector: 'wsa-web-speech',
  templateUrl: './web-speech.component.html',
  styleUrls: ['./web-speech.component.css'],
  changeDetection: ChangeDetectionStrategy.OnPush,
})
export class WebSpeechComponent implements OnInit {
  languages: string[] = languages;
  currentLanguage: string = defaultLanguage; // Set the default language
  totalTranscript?: string; // The variable to accumulate all the recognized texts

  // The observables are assigned in initRecognition(), so they are optional under strict mode.
  transcript$?: Observable<string>; // Shows the transcript in "real-time"
  listening$?: Observable<boolean>; // Changes to 'true'/'false' when the recognizer starts/stops
  errorMessage$?: Observable<string>; // An error from the Speech Recognizer
  defaultError$ = new Subject<undefined>(); // Clean-up of the previous errors

  constructor(private speechRecognizer: SpeechRecognizerService) {}

  ngOnInit(): void {
    // Initialize the speech recognizer with the default language
    this.speechRecognizer.initialize(this.currentLanguage);
    // Prepare observables to "catch" events, results and errors.
    this.initRecognition();
  }

  start(): void {
    if (this.speechRecognizer.isListening) {
      this.stop();
      return;
    }

    this.defaultError$.next(undefined);
    this.speechRecognizer.start();
  }

  stop(): void {
    this.speechRecognizer.stop();
  }

  selectLanguage(language: string): void {
    if (this.speechRecognizer.isListening) {
      this.stop();
    }
    this.currentLanguage = language;
    this.speechRecognizer.setLanguage(this.currentLanguage);
  }
}

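In essence, the previous code shows how to define the main attributes and functions needed to:

Allow switching the language used for speech recognition.
Know when the SpeechRecognizer is "listening".
Allow starting and stopping the SpeechRecognizer from the component context.

The question now is: how can we get the transcript (what the user is saying, as text), and how can we know when the speech service is listening? Also, how do we know whether there is an error with the microphone or with the API itself?

The answer is: by using the Observables from the SpeechRecognizerService. Instead of calling subscribe, we get the Observables from the service and assign them to component properties, which will later be consumed through the Async Pipe in the template.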

// web-speech.component.ts
export class WebSpeechComponent implements OnInit {
  // Previous code here...
  private initRecognition(): void {

    // "transcript$" now will receive every text(interim result) from the Speech API.
    // Also, for every "Final Result"(from the speech), the code will append that text to the existing Text Area component.
    this.transcript$ = this.speechRecognizer.onResult().pipe(
      tap((notification) => {
        if (notification.event === SpeechEvent.FinalContent) {
          this.totalTranscript = this.totalTranscript
            ? `${this.totalTranscript}\n${notification.content?.trim()}`
            : notification.content;
        }
      }),
      map((notification) => notification.content || '')
    );

    // "listening$" will receive 'true' when the Speech API starts and 'false' when it's finished.
    this.listening$ = merge(
      this.speechRecognizer.onStart(),
      this.speechRecognizer.onEnd()
    ).pipe(
      map((notification) => notification.event === SpeechEvent.Start)
    );

    // "errorMessage$" will receive any error from Speech API and it will map that value to a meaningful message for the user
    this.errorMessage$ = merge(
      this.speechRecognizer.onError(),
      this.defaultError$
    ).pipe(
      map((data) => {
        if (data === undefined) {
          return '';
        }
        let message;
        switch (data.error) {
          case SpeechError.NotAllowed:
            message = `Cannot run the demo.
            Your browser is not authorized to access your microphone.
            Verify that your browser has access to your microphone and try again.`;
            break;
          case SpeechError.NoSpeech:
            message = `No speech has been detected. Please try again.`;
            break;
          case SpeechError.AudioCapture:
            message = `Microphone is not available. Please verify the connection of your microphone and try again.`;
            break;
          default:
            message = '';
            break;
        }
        return message;
      })
    );
  }
}
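The tap step in transcript$ above accumulates every final result into totalTranscript. That rule can be isolated as a pure function, which makes it easy to check on its own. This is only a sketch: `appendTranscript` is a hypothetical name, and unlike the component code it appends an empty string when the content is missing.

```typescript
// Hypothetical helper mirroring the accumulation rule of the tap() step:
// the first final result becomes the transcript as-is, later ones are
// trimmed and appended on a new line.
function appendTranscript(
  total: string | undefined,
  content: string | undefined
): string | undefined {
  if (!total) {
    return content;
  }
  return `${total}\n${(content || '').trim()}`;
}

let total: string | undefined = undefined;
total = appendTranscript(total, 'hello world');
total = appendTranscript(total, ' how are you ');
console.log(JSON.stringify(total)); // "hello world\nhow are you"
```

Keeping the rule in a function like this would leave the tap callback with a single assignment.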

The `WebSpeechComponent` Template

As we said before, the component's template is powered by Async Pipes:

<section>
  <mat-card *ngIf="errorMessage$| async as errorMessage" class="notification">{{errorMessage}}</mat-card>
</section>
<section>
  <mat-form-field>
    <mat-label>Select your language</mat-label>
    <mat-select [(value)]="currentLanguage">
      <mat-option *ngFor="let language of languages" [value]="language" (click)="selectLanguage(language)">
        {{language}}
      </mat-option>
    </mat-select>
  </mat-form-field>
</section>
<section>
  <button mat-fab *ngIf="listening$ | async; else mic" (click)="stop()">
    <mat-icon class="soundwave">mic</mat-icon>
  </button>
  <ng-template #mic>
    <button mat-fab (click)="start()">
      <mat-icon>mic</mat-icon>
    </button>
  </ng-template>
</section>
<section *ngIf="transcript$ | async">
  <mat-card class="notification mat-elevation-z4">{{transcript$ | async}}</mat-card>
</section>
<section>
  <mat-form-field class="speech-result-width">
    <textarea matInput [value]="totalTranscript || ''" placeholder="Speech Input Result" rows="15" disabled="false"></textarea>
  </mat-form-field>
</section>

At this point, the app is ready to enable the microphone and listen to your voice!

Adding the `SpeechSynthesizerService` (Text-to-Speech)

Let's create the service first:

ng generate service shared/services/web-apis/speech-synthesizer

Add the following code to that file:

// speech-synthesizer.service.ts
import { Injectable } from '@angular/core';

@Injectable({
  providedIn: 'root',
})
export class SpeechSynthesizerService {
  speechSynthesizer!: SpeechSynthesisUtterance;

  constructor() {
    this.initSynthesis();
  }

  initSynthesis(): void {
    this.speechSynthesizer = new SpeechSynthesisUtterance();
    this.speechSynthesizer.volume = 1;
    this.speechSynthesizer.rate = 1;
    this.speechSynthesizer.pitch = 0.2;
  }

  speak(message: string, language: string): void {
    this.speechSynthesizer.lang = language;
    this.speechSynthesizer.text = message;
    speechSynthesis.speak(this.speechSynthesizer);
  }
}

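Now the application is able to talk to you. We can call this service when the application is ready to perform a voice-driven action. We can also confirm when an action has been completed, or even ask for parameters.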
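The service above hardcodes volume, rate, and pitch. The Web Speech API bounds these values (volume from 0 to 1, rate from 0.1 to 10, pitch from 0 to 2), so if they ever become configurable it may help to clamp them first. The following sketch assumes exactly that scenario; `VoiceSettings` and `clampVoiceSettings` are hypothetical names that do not exist in the project.

```typescript
// Hypothetical settings object; the fields mirror SpeechSynthesisUtterance properties.
interface VoiceSettings {
  volume: number;
  rate: number;
  pitch: number;
}

// Restricts a value to the inclusive range [min, max].
function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Returns a copy of the settings with every field forced into its valid range.
function clampVoiceSettings(settings: VoiceSettings): VoiceSettings {
  return {
    volume: clamp(settings.volume, 0, 1),
    rate: clamp(settings.rate, 0.1, 10),
    pitch: clamp(settings.pitch, 0, 2),
  };
}

console.log(clampVoiceSettings({ volume: 3, rate: 0, pitch: 0.2 }));
// { volume: 1, rate: 0.1, pitch: 0.2 }
```

The clamped values could then be copied onto the SpeechSynthesisUtterance inside initSynthesis().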

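The next goal is to define a set of voice commands that perform actions on the application.

Define the Actions through Strategies

Let's think about the main actions to be performed by voice commands in the application:

The app can change the default theme to any other theme available from Angular Material.
The app can change the title property of the application.
At the same time, we should be able to append every final result to the existing Text Area component.

There are different ways to design a solution for this context. In this case, let's define some strategies to change the theme and the title of the application.

For now, Strategy is our favorite keyword. After a look at the world of design patterns, it's clear that we can use the Strategy pattern for this solution.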
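For readers who have not used the pattern before, here is a minimal, framework-free sketch of the idea. All names in it (`TextStrategy`, `UpperCaseStrategy`, `ReverseStrategy`, `TextContext`) are hypothetical and unrelated to the classes of the project: a context object holds the currently selected strategy and delegates the work to it.

```typescript
// A strategy encapsulates one interchangeable behavior.
interface TextStrategy {
  run(input: string): string;
}

class UpperCaseStrategy implements TextStrategy {
  run(input: string): string {
    return input.toUpperCase();
  }
}

class ReverseStrategy implements TextStrategy {
  run(input: string): string {
    return input.split('').reverse().join('');
  }
}

// The context delegates to whichever strategy is currently selected
// and returns the input unchanged when none is set.
class TextContext {
  private current?: TextStrategy;

  setStrategy(strategy: TextStrategy | undefined): void {
    this.current = strategy;
  }

  run(input: string): string {
    return this.current ? this.current.run(input) : input;
  }
}

const textContext = new TextContext();
console.log(textContext.run('title')); // title
textContext.setStrategy(new UpperCaseStrategy());
console.log(textContext.run('title')); // TITLE
textContext.setStrategy(new ReverseStrategy());
console.log(textContext.run('title')); // eltit
```

Swapping the behavior at runtime without touching the caller is the property that makes the pattern a good fit for voice commands that switch between actions.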

Adding the `ActionContext` Service and the Strategies

Let's create the ActionContext, ActionStrategy, ChangeThemeStrategy, and ChangeTitleStrategy classes:

ng generate class shared/services/action/action-context
ng generate class shared/services/action/action-strategy
ng generate class shared/services/action/change-theme-strategy
ng generate class shared/services/action/change-title-strategy

// action-context.ts
@Injectable({
  providedIn: 'root',
})
export class ActionContext {
  private currentStrategy?: ActionStrategy;

  constructor(
    private changeThemeStrategy: ChangeThemeStrategy,
    private changeTitleStrategy: ChangeTitleStrategy,
    private titleService: Title,
    private speechSynthesizer: SpeechSynthesizerService
  ) {
    this.changeTitleStrategy.titleService = titleService;
  }

  processMessage(message: string, language: string): void {
    const msg = message.toLowerCase();
    const hasChangedStrategy = this.hasChangedStrategy(msg, language);

    let isFinishSignal = false;
    if (!hasChangedStrategy) {
      isFinishSignal = this.isFinishSignal(msg, language);
    }

    if (!hasChangedStrategy && !isFinishSignal) {
      this.runAction(message, language);
    }
  }

  runAction(input: string, language: string): void {
    if (this.currentStrategy) {
      this.currentStrategy.runAction(input, language);
    }
  }

  setStrategy(strategy: ActionStrategy | undefined): void {
    this.currentStrategy = strategy;
  }

  // Private methods omitted. Please refer to the repository to see all the related source code.
}

// action-strategy.ts
export abstract class ActionStrategy {
  protected mapStartSignal: Map<string, string> = new Map<string, string>();
  protected mapEndSignal: Map<string, string> = new Map<string, string>();

  protected mapInitResponse: Map<string, string> = new Map<string, string>();
  protected mapFinishResponse: Map<string, string> = new Map<string, string>();
  protected mapActionDone: Map<string, string> = new Map<string, string>();

  constructor() {
    this.mapFinishResponse.set('en-US', 'Your action has been completed.');
    this.mapFinishResponse.set('es-ES', 'La accion ha sido finalizada.');
  }

  getStartSignal(language: string): string {
    return this.mapStartSignal.get(language) || '';
  }

  getEndSignal(language: string): string {
    return this.mapEndSignal.get(language) || '';
  }

  getInitialResponse(language: string): string {
    return this.mapInitResponse.get(language) || '';
  }
  getFinishResponse(language: string): string {
    return this.mapFinishResponse.get(language) || '';
  }
  abstract runAction(input: string, language: string): void;
}

// change-theme-strategy.ts
@Injectable({
  providedIn: 'root',
})
export class ChangeThemeStrategy extends ActionStrategy {
  private mapThemes: Map<string, Theme[]> = new Map<string, Theme[]>();
  private styleManager: StyleManager = new StyleManager();

  constructor(private speechSynthesizer: SpeechSynthesizerService) {
    super();
    this.mapStartSignal.set('en-US', 'perform change theme');
    this.mapStartSignal.set('es-ES', 'iniciar cambio de tema');

    this.mapEndSignal.set('en-US', 'finish change theme');
    this.mapEndSignal.set('es-ES', 'finalizar cambio de tema');

    this.mapInitResponse.set('en-US', 'Please, tell me your theme name.');
    this.mapInitResponse.set('es-ES', 'Por favor, mencione el nombre de tema.');

    this.mapActionDone.set('en-US', 'Changing Theme of the Application to');
    this.mapActionDone.set('es-ES', 'Cambiando el tema de la Aplicación a');

    this.mapThemes.set('en-US', [
      {
        keyword: 'deep purple',
        href: 'deeppurple-amber.css',
      }
    ]);
    this.mapThemes.set('es-ES', [
      {
        keyword: 'púrpura',
        href: 'deeppurple-amber.css',
      }
    ]);
  }

  runAction(input: string, language: string): void {
    const themes = this.mapThemes.get(language) || [];
    const theme = themes.find((th) => {
      return input.toLocaleLowerCase() === th.keyword;
    });

    if (theme) {
      this.styleManager.removeStyle('theme');
      this.styleManager.setStyle('theme', `assets/theme/${theme.href}`);
      this.speechSynthesizer.speak(
        `${this.mapActionDone.get(language)}: ${theme.keyword}`,
        language
      );
    }
  }
}

// change-title-strategy.ts
@Injectable({
  providedIn: 'root',
})
export class ChangeTitleStrategy extends ActionStrategy {
  private title?: Title;

  constructor(private speechSynthesizer: SpeechSynthesizerService) {
    super();
    this.mapStartSignal.set('en-US', 'perform change title');
    this.mapStartSignal.set('es-ES', 'iniciar cambio de título');

    this.mapEndSignal.set('en-US', 'finish change title');
    this.mapEndSignal.set('es-ES', 'finalizar cambio de título');

    this.mapInitResponse.set('en-US', 'Please, tell me the new title');
    this.mapInitResponse.set('es-ES', 'Por favor, mencione el nuevo título');

    this.mapActionDone.set('en-US', 'Changing title of the Application to');
    this.mapActionDone.set('es-ES', 'Cambiando el título de la Aplicación a');
  }

  set titleService(title: Title) {
    this.title = title;
  }

  runAction(input: string, language: string): void {
    this.title?.setTitle(input);
    this.speechSynthesizer.speak(
      `${this.mapActionDone.get(language)}: ${input}`,
      language
    );
  }
}

Pay attention to how the SpeechSynthesizerService is used and where it is called. The moment you call the speak function, the app will use your speakers to answer you.
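Going back to ActionContext: processMessage relies on private methods that are omitted from the listing above, so their actual implementation is only available in the repository. The sketch below is just a guess at how matching a spoken message against start and end signals could work. The `SignalStrategy` type, the `strategies` list, and both functions are hypothetical simplifications, not the author's code.

```typescript
// Simplified, hypothetical stand-in for ActionStrategy: only the signals matter here.
interface SignalStrategy {
  name: string;
  startSignal: Record<string, string>; // keyed by language code
  endSignal: Record<string, string>;
}

const strategies: SignalStrategy[] = [
  {
    name: 'theme',
    startSignal: { 'en-US': 'perform change theme' },
    endSignal: { 'en-US': 'finish change theme' },
  },
  {
    name: 'title',
    startSignal: { 'en-US': 'perform change title' },
    endSignal: { 'en-US': 'finish change title' },
  },
];

// Returns the strategy whose start signal equals the spoken message, if any.
function findStartedStrategy(message: string, language: string): SignalStrategy | undefined {
  const msg = message.trim().toLowerCase();
  for (const strategy of strategies) {
    if (strategy.startSignal[language] === msg) {
      return strategy;
    }
  }
  return undefined;
}

// Reports whether the message is the end signal of the active strategy.
function isEndSignal(
  active: SignalStrategy | undefined,
  message: string,
  language: string
): boolean {
  return !!active && active.endSignal[language] === message.trim().toLowerCase();
}

const started = findStartedStrategy(' Perform change title ', 'en-US');
console.log(started ? started.name : 'none');                      // title
console.log(isEndSignal(started, 'finish change title', 'en-US')); // true
console.log(isEndSignal(started, 'finish change theme', 'en-US')); // false
```

Under this assumption, any message that is neither a start nor an end signal would be forwarded to the runAction method of the active strategy, which matches the flow of processMessage.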

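Source Code and Live Demo

Source Code

Find the complete project in this GitHub repository: https://github.com/luixaviles/web-speech-angular. Don't forget to give it a star ⭐️ or send a Pull Request if you decide to contribute more features.

Live Demo

Open your Chrome web browser and go to https://luixaviles.com/web-speech-angular/. Review the notes inside the app and test it in English or even in Spanish.

Final Words

Even though the demo has been written using Angular and TypeScript, you can apply these concepts and Web APIs with any other JavaScript framework or library.

You can follow me on Twitter and GitHub to see more about my work.

Thank you for reading!

- Luis Aviles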

Source: https://dev.to/luixaviles/build-your-first-voice-driven-web-application-1h99
