Detect Toxic Language in Twilio Chat with TensorFlow.js

Rude or offensive comments can run rampant in today's online communication; with the power of machine learning, however, we can start to combat them.

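This blog post shows how to classify text as obscene or toxic on the client side using a pre-trained TensorFlow model and TensorFlow.js. We will then apply that classification to messages sent in a chat room built with Twilio Programmable Chat.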

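Google provides a number of pre-trained TensorFlow models that we can use in our applications. One of them was trained on a labeled dataset of Wikipedia comments available on Kaggle. Google also hosts a live demo of the pre-trained TensorFlow.js toxicity model where you can test phrases. Before reading on, you may also want to read "10 Things You Need to Know Before Getting Started with TensorFlow" on the Twilio blog.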

Setup

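Before getting started, clone the Twilio JavaScript Chat demo repository with the following command:

git clone https://github.com/twilio/twilio-chat-demo-js.git

Make sure you have a Twilio account so you can get your Account SID, API Key SID and Secret, and a Chat Service SID, which you can create in the Chat dashboard of the Twilio Console.
On the command line, make sure you are in the project directory you just cloned: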

cd twilio-chat-demo-js

# create credentials.json by copying credentials.example.json, then replace the placeholder values in it with the credentials from your Twilio account
cp credentials.example.json credentials.json 

# install dependencies 
npm install 
# then start the server 
npm start

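Now, if you visit http://localhost:8080, you can test out a basic chat application! You can log in as a guest with a username of your choice or with a Google account. Be sure to create a channel so that you can detect potentially toxic messages with TensorFlow.js!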

Integrating TensorFlow.js into Twilio Programmable Chat

Open /public/index.html and, somewhere between the <head></head> tags, add TensorFlow.js and the TensorFlow Toxicity model with the following lines:

<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/toxicity"></script>

This makes "toxicity" available as a global variable in your JavaScript code. That's it! The model is installed.

In the same HTML file, above the typing-indicator div, add the following line, which will display warning text if a chat message is deemed offensive.

<div id="toxicity-indicator"><span></span></div>

Directly below it, add the following style updates for that div.

<style>
#channel-messages { 
    margin-bottom: 100px; 
    position: relative; 
    width: 100%; 
    height: calc(100% - 100px); 
    overflow-y: auto; 
}  
#toxicity-indicator { 
    padding: 5px 15px; 
    font-weight: bold; 
    color: #E30000; 
} 
#toxicity-indicator span { 
    display: block; 
    min-height: 18px; 
}
</style>

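Now open /public/js/index.js and get ready to do a lot. First, we will create a function called classifyToxicity that retrieves a prediction of how likely the chat input is to be toxic. It takes two parameters: "input" and "model".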

function classifyToxicity(input, model) {

We need to call the classify() method on the model to predict the toxicity of the input chat message. This call returns a Promise that resolves with predictions.

    console.log("input ", input);
    return model.classify(input).then(predictions => {

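predictions is an array of objects containing the probabilities for each label. The TensorFlow model can make predictions for the following labels: identity_attack, insult, obscene, severe_toxicity, sexual_explicit, threat, and toxicity. Next, we will loop through that array and read three values for each label: the label itself; the match, which is true (the probability of a match is greater than the threshold), false (the probability of no match is greater than the threshold), or null (neither is greater); and the prediction (how confident the model is, as a probability, that the input is a match).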

      return predictions.map(p => {
        const label = p.label;
        const match = p.results[0].match;
        const prediction = p.results[0].probabilities[1];
        console.log(label + ': ' + match + '(' + prediction + ')');
        return match != false && prediction > 0.5;
      }).some(label => label);
    });

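In the code above, a predicate checks, for each of the seven toxicity labels the model can predict, whether the match is not false and the model is more than 50% confident that the input is toxic. If the check passes for any label, the function resolves to true. The complete classifyToxicity() function looks like this: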

function classifyToxicity(input, model) {
  console.log('input ', input);
  return model.classify(input).then(predictions => {
    return predictions.map(p => {
      const label = p.label;
      const match = p.results[0].match;
      const prediction = p.results[0].probabilities[1];
      console.log(label + ': ' + match + '(' + prediction + ')');
      return match != false && prediction > 0.5;
    }).some(label => label);
  });
}
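
To see what the predicate does without loading the model, the sketch below runs the same check on a hand-written predictions array. The helper name anyLabelToxic and the sample probabilities are invented for illustration; only the object shape (label, results[0].match, results[0].probabilities) comes from the code above.

```javascript
// Hand-written sample in the shape the code above reads.
// The numbers are invented for illustration, not real model output.
const samplePredictions = [
  { label: 'insult', results: [{ probabilities: [0.08, 0.92], match: true }] },
  { label: 'threat', results: [{ probabilities: [0.97, 0.03], match: false }] },
  { label: 'obscene', results: [{ probabilities: [0.45, 0.55], match: null }] }
];

// Hypothetical helper: the same predicate as classifyToxicity(),
// minus the model call and the logging.
function anyLabelToxic(predictions) {
  return predictions.some(p => {
    const match = p.results[0].match;
    const prediction = p.results[0].probabilities[1];
    return match != false && prediction > 0.5;
  });
}

console.log(anyLabelToxic(samplePredictions));      // true
console.log(anyLabelToxic([samplePredictions[1]])); // false
```

Note that a null match still counts as toxic here when its probability is above 0.5, because null != false evaluates to true in JavaScript.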

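Now we need to call this function whenever someone enters a new message in the chat. Next, we will load the model with toxicity.load(), which accepts an optional threshold parameter. It defaults to 0.85, but in this post we set it to a constant 0.9 to be more precise. Given an input (here, a chat message), the labels are the outputs you are trying to predict, and the threshold is how confident the model must be before it reports a match or non-match for each of the seven toxicity labels.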

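In theory, the higher the threshold, the higher the accuracy; however, a higher threshold also means predictions are more likely to return null because they fall below it. Feel free to experiment with the threshold to see how it changes the predictions the model returns.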
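
As a rough illustration of that trade-off, the sketch below mimics how a match value of true, false, or null can be derived from a pair of probabilities and a threshold. The function matchFor is hypothetical, written to mirror the description in this post; it is not the library's own code, and the probabilities are invented.

```javascript
// Hypothetical helper mirroring the description in this post;
// it is not part of the toxicity package.
// probabilities = [P(not a match), P(match)]
function matchFor(probabilities, threshold) {
  if (probabilities[1] > threshold) return true;  // confident it is a match
  if (probabilities[0] > threshold) return false; // confident it is not a match
  return null;                                    // neither side clears the threshold
}

const probabilities = [0.12, 0.88]; // invented values
console.log(matchFor(probabilities, 0.85)); // true
console.log(matchFor(probabilities, 0.9));  // null
```

The same input flips from true to null when the threshold rises from 0.85 to 0.9, which is the effect described above.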

Search for $('#send-message').on('click', function () { and add the following above that line:

$('#send-message').off('click');
  const threshold = 0.9;
  toxicity.load(threshold).then(model => {
    $('#send-message').on('click', function () {

toxicity.load returns a Promise that resolves with the model. Loading the model also means loading its topology and weights.

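Topology: a file describing the architecture of the model (the operations it uses), containing references to the model's weights, which are stored externally.
Weights: binary files containing the model's weights, usually stored in the same directory as the topology.
(Reference: the TensorFlow guide on saving and loading models)
You can read more about topology and weights in the TensorFlow docs and the Keras docs, and there are many research papers that cover them in detail.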
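
Because loading fetches both the topology and the weights, you will likely want to do it only once per page. Below is a minimal sketch of that idea, assuming a hypothetical loadOnce wrapper; the fake loader stands in for toxicity.load so the snippet runs without downloading anything.

```javascript
// Hypothetical wrapper: caches the Promise so the loader runs at most once.
function loadOnce(loader) {
  let cached = null;
  return function () {
    if (cached === null) cached = loader();
    return cached;
  };
}

// Stand-in for () => toxicity.load(threshold); an assumption for illustration only.
let loadCalls = 0;
const fakeLoader = () => {
  loadCalls += 1;
  return Promise.resolve({ name: 'fake-model' });
};

const getModel = loadOnce(fakeLoader);
const first = getModel();
const second = getModel();
console.log(first === second, loadCalls); // true 1
```

In the chat page, the loader passed in would be a function that calls toxicity.load(threshold).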

Now we will add some extra code to the function that handles a user trying to send a message. Between $('#send-message').on('click', function () { and var body = $('#message-body-input').val(); add

$('#toxicity-indicator span').text('');

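This clears the warning message if we have set one. Next, inside the send-message click event, we check the message with the classifyToxicity function. If it resolves to true, the message is not sent and a warning is displayed. The complete code looks like this: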

toxicity.load(threshold).then(model => {
    $('#send-message').on('click', function () {
      $('#toxicity-indicator span').text('');
      var body = $('#message-body-input').val();
      classifyToxicity(body, model).then(result => {
        if (result) {
          $('#toxicity-indicator span').text('This message was deemed to be toxic, please be more kind when chatting in this channel.');
          $('#message-body-input').focus();
        } else {
          channel.sendMessage(body).then(function () {
            $('#message-body-input').val('').focus();
            $('#channel-messages').scrollTop($('#channel-messages ul').height());
            $('#channel-messages li.last-read').removeClass('last-read');
          });
        }
      });
    });
  });
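
If you would like the warning to say why a message was blocked, one option is to collect the matching labels instead of collapsing them into a single boolean. The sketch below assumes a hypothetical toxicLabels helper and invented sample values; it applies the same check as classifyToxicity() to a predictions array.

```javascript
// Hypothetical helper: returns the labels that pass the same check
// used in classifyToxicity().
function toxicLabels(predictions) {
  return predictions
    .filter(p => p.results[0].match != false && p.results[0].probabilities[1] > 0.5)
    .map(p => p.label);
}

// Invented sample values in the shape read by the code above.
const sample = [
  { label: 'insult', results: [{ probabilities: [0.05, 0.95], match: true }] },
  { label: 'threat', results: [{ probabilities: [0.98, 0.02], match: false }] },
  { label: 'toxicity', results: [{ probabilities: [0.07, 0.93], match: true }] }
];

console.log(toxicLabels(sample).join(', ')); // insult, toxicity
```

An empty array then means the message can be sent, and a non-empty one can be joined into the warning text.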

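Save the file, make sure npm start is running from the command line, and test out the chat at localhost:8080! You should see the application detect toxic language and display the warning. For friendlier user input you will not get a warning message, but you can still see the probabilities by looking at the JavaScript console. Depending on your threshold, a message such as "I love you, you are so nice" will log a label, match value, and probability for each of the labels there.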

What's Next?

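There are other use cases for this TensorFlow model: you could perform sentiment analysis, censor messages, send other warnings, and more! You could also try this with Twilio SMS or another messaging platform. Depending on your use case, you could also try different toxicity labels. Stay tuned for more posts on using TensorFlow with Twilio! Let me know in the comments or online what you are building.
GitHub: elizabethsiegle
Twitter: @lizziepika
Email: lsiegle@twilio.com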

Source: https://dev.to/twilio/detect-token-language-in-twilio-chat-with-tensorflow-js-2662