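Lightning-Fast Search with Elasticsearch

If you're reading this blog, chances are you're really interested in Elasticsearch and the solutions it provides. This blog introduces Elasticsearch and explains how to add fast search to your application in under 10 minutes. We won't build a complete, production-ready search solution here, but the concepts below will help you get up to speed quickly. So, without further ado, let's get started!

What is Elasticsearch?

Elasticsearch is a distributed search and analytics engine that provides near real-time search and analytics for all types of data: structured or unstructured text, numerical data, or geospatial data. One of its key strengths is that it stores and indexes data efficiently, which is what makes fast search possible. It also goes beyond simple data retrieval: it can aggregate information to help you discover trends and patterns in your data.

Why do you need it?

Elasticsearch is fast. Because it is built on top of Lucene, it excels at full-text search. Elasticsearch is also a near real-time search platform, meaning the latency between indexing a document and that document becoming searchable is very short, typically about one second. As a result, Elasticsearch is well suited for time-sensitive use cases such as security analytics and infrastructure monitoring.

Elasticsearch is distributed by nature. The documents stored in Elasticsearch are distributed across different containers known as shards, which are replicated to provide redundant copies of the data in case of hardware failure. This distributed design allows Elasticsearch to scale out to hundreds (or even thousands) of servers and handle petabytes of data.

The speed and scalability of Elasticsearch, and its ability to index many types of content, mean that it can be used for a number of use cases:

Application search
Website search
Enterprise search
Logging and log analytics, and many more...

We at Webiny are building a new feature for the upcoming v5 release that uses Elasticsearch to perform super-fast searches in our core apps, such as Page Builder, File Manager, and Headless CMS. Check out our GitHub repository to learn more.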

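Getting started with Elasticsearch

Setting up an Elasticsearch cluster

You can create a hosted deployment or set up an Elasticsearch cluster on your local machine. In this blog, we assume an Elasticsearch cluster is already running at localhost:9200. If you'd like to go with a local setup, refer to this guide.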

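Setting up the Elasticsearch Node.js client

We'll be using the official Node.js client for Elasticsearch. You can create a new Node.js project or use this example project.

The examples in this post use the 7.x client API, where request bodies are passed in a `body` parameter and responses are returned under `body`. To install a 7.x version of the client, run the following command:

npm install @elastic/elasticsearch@7

The client is very easy to use: it supports all of Elasticsearch's public APIs, and every method exposes the same signature.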

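Configuring the client

The client is designed to be easily configured for your needs. The example below shows how simple it is to configure it with a few basic options.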

const { Client } = require("@elastic/elasticsearch");

const client = new Client({
  // The Elasticsearch endpoint to use.
  node: "http://localhost:9200",
  // Max number of retries for each request.
  maxRetries: 5,
  // Max request timeout in milliseconds for each request.
  requestTimeout: 60000,
});

Note: As mentioned earlier, we assume an Elasticsearch cluster is running locally at localhost:9200.
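In a real project you probably won't hard-code the endpoint. The sketch below builds the same options object from environment variables. The helper `buildClientOptions` and the variable names `ELASTICSEARCH_URL`, `ELASTICSEARCH_USERNAME`, and `ELASTICSEARCH_PASSWORD` are hypothetical choices for this example; `node`, `maxRetries`, `requestTimeout`, and `auth` are options the client accepts.

```javascript
// Hypothetical helper: build Client options from an environment object.
// The environment variable names are assumptions, not a client convention.
function buildClientOptions(env = process.env) {
  const options = {
    // Fall back to the local cluster used throughout this post.
    node: env.ELASTICSEARCH_URL || "http://localhost:9200",
    // Max number of retries for each request.
    maxRetries: 5,
    // Max request timeout in milliseconds for each request.
    requestTimeout: 60000,
  };
  // Only add basic auth when both credentials are present.
  if (env.ELASTICSEARCH_USERNAME && env.ELASTICSEARCH_PASSWORD) {
    options.auth = {
      username: env.ELASTICSEARCH_USERNAME,
      password: env.ELASTICSEARCH_PASSWORD,
    };
  }
  return options;
}
```

You would then create the client with `new Client(buildClientOptions())`, so the same code can point at a local cluster in development and a hosted deployment elsewhere.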

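Elasticsearch in action

Before diving into the core topic of this blog, search, we need to create an index and add some documents to it.

Creating an index

Let's create an index in our Elasticsearch cluster.

You can use the create index API to add a new index to an Elasticsearch cluster. When creating an index, you can specify the following:

Settings for the index (optional)
Mappings for fields in the index (optional)
Index aliases (optional)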

await client.indices.create({
  // Name of the index you wish to create.
  index: "products",
});

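We're relying on dynamic mapping, so we haven't added any settings or mappings to the request body here. If needed, though, we could add them like this: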

await client.indices.create({
  // Name of the index you wish to create.
  index: "products",
  // If you want to add "settings" & "mappings"
  body: {
    settings: {
      number_of_shards: 1,
    },
    mappings: {
      properties: {
        field1: { type: "text" },
      },
    },
  },
});
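For the product documents used in this post, explicit mappings might look like the sketch below. The field types are assumptions based on the sample documents (for example, `float` for `price`, where dynamic mapping would pick `long` for whole numbers). The `name` field mirrors what dynamic mapping does for strings: a `text` field with a `keyword` sub-field, so exact-match queries can target `name.keyword`. The helper name `productsIndexParams` is hypothetical.

```javascript
// Hypothetical helper returning the parameters for client.indices.create().
// Field types are assumptions based on the sample product documents.
function productsIndexParams() {
  return {
    index: "products",
    body: {
      settings: {
        number_of_shards: 1,
      },
      mappings: {
        properties: {
          id: { type: "integer" },
          // Full-text field plus an exact-match "keyword" sub-field.
          name: {
            type: "text",
            fields: { keyword: { type: "keyword", ignore_above: 256 } },
          },
          description: { type: "text" },
          price: { type: "float" },
        },
      },
    },
  };
}
```

Usage: `await client.indices.create(productsIndexParams());`. Defining mappings up front avoids surprises such as a field being mapped with the wrong type because of the first document that happened to be indexed.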

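Indexing documents

Now that we've created the `products` index, let's add some documents so that we can search them later. Depending on the use case, there are basically two ways to do this:

Index a single document.
Bulk index multiple documents.

We'll cover both use cases below.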

Indexing a single document

Here we'll use the `create` method of the client we set up earlier. Let's take a look at the code:

await client.create({
  // The name of the index.
  index: "products",
  // Unique identifier for the document. The create API requires an ID;
  // to let Elasticsearch generate one, use client.index() without `id`.
  id: 1,
  body: {
    id: 1,
    name: "iPhone 12",
    price: 699,
    description: "Blast past fast",
  },
});

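We can index a new JSON document with either the `_doc` resource (the client's `index` method) or the `_create` resource (the client's `create` method). Using `_create` guarantees that the document is indexed only if it does not already exist. To replace an existing document, you must use the `_doc` resource.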

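Indexing multiple documents at once

That's all well and good, but sometimes we need to index many documents at once. In our case, for example, wouldn't it be better to index all the new iPhones in one go? We can use the `bulk` method for exactly this scenario. Let's take a look at the code: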

const dataset = [
  {
    id: 2,
    name: "iPhone 12 mini",
    description: "Blast past fast.",
    price: 599,
  },
  {
    id: 3,
    name: "iPhone 12 Pro",
    description: "It's a leap year.",
    price: 999,
  },
  {
    id: 4,
    name: "iPhone 12 Pro max",
    description: "It's a leap year.",
    price: 1199,
  },
];

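// Each document becomes an action line followed by its source.
// Reusing the product's own id as the Elasticsearch _id keeps IDs predictable.
const body = dataset.flatMap((doc) => [{ index: { _index: "products", _id: doc.id } }, doc]);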

const { body: bulkResponse } = await client.bulk({ refresh: true, body });

if (bulkResponse.errors) {
  const erroredDocuments = [];
  // The items array has the same order of the dataset we just indexed.
  // The presence of the `error` key indicates that the operation
  // that we did for the document has failed.
  bulkResponse.items.forEach((action, i) => {
    const operation = Object.keys(action)[0];
    if (action[operation].error) {
      erroredDocuments.push({
        // If the status is 429, you can retry the document;
        // otherwise it's very likely a mapping error, and you should
        // fix the document before trying it again.
        status: action[operation].status,
        error: action[operation].error,
        operation: body[i * 2],
        document: body[i * 2 + 1],
      });
    }
  });
  // Do something useful with it.
  console.log(erroredDocuments);
}

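The `bulk` method lets us perform multiple operations in a single request. Here we used the `index` operation, but you can also use the `create`, `delete`, and `update` operations as needed.

Tip: `bulk` performs multiple index or delete operations in a single API call. This reduces overhead and can greatly increase indexing speed.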
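Each operation type has a slightly different shape in the bulk body: `index` and `create` are followed by the full document, `update` is followed by a `{ doc: ... }` partial document, and `delete` has no source line at all. The hypothetical helper `buildBulkBody` below is a sketch of turning a list of high-level operations into that flat array; the `{ type, id, doc }` operation format is an assumption made for this example.

```javascript
// Hypothetical helper: turn high-level operations into the flat
// action/source array that client.bulk() expects as `body`.
function buildBulkBody(index, operations) {
  return operations.flatMap((op) => {
    const meta = { _index: index, _id: op.id };
    switch (op.type) {
      case "index":
      case "create":
        // Action line followed by the full document source.
        return [{ [op.type]: meta }, op.doc];
      case "update":
        // Action line followed by a partial document to merge.
        return [{ update: meta }, { doc: op.doc }];
      case "delete":
        // Delete has no source line.
        return [{ delete: meta }];
      default:
        throw new Error(`Unsupported bulk operation: ${op.type}`);
    }
  });
}
```

Usage: `await client.bulk({ refresh: true, body: buildBulkBody("products", operations) })`. The error-handling loop shown earlier assumes exactly two body entries per item (`body[i * 2]`), so it would need adjusting once `delete` operations are mixed in.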

Updating an existing document

We often need to update existing documents. We'll use the `update` method for this.

It lets you script document updates. The script can update, delete, or skip modifying the document. To increment `price`, you can call the `update` method with the following script:

await client.update({
  // The name of the index.
  index: "products",
  // Document ID.
  id: 1,
  body: {
    script: {
      source: "ctx._source.price += params.price_diff",
      params: {
        price_diff: 99,
      },
    },
  },
});

The `update` API also supports passing a partial document, which is merged into the existing document. Let's use it to update the `description` of the product with `id = 1`:

await client.update({
  // The name of the index.
  index: "products",
  // Document ID.
  id: 1,
  body: {
    doc: {
      description: "Fast enough!",
    },
  },
});
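The "skip modifying the document" case works by setting `ctx.op = 'noop'` inside the script. The sketch below applies a price change only if the result stays at or above a minimum price; the helper `priceChangeParams` and the `min_price` script parameter are hypothetical names chosen for this example.

```javascript
// Hypothetical helper: a scripted update that skips the write
// (ctx.op = 'noop') when the new price would fall below a minimum.
function priceChangeParams(id, priceDiff, minPrice = 0) {
  return {
    index: "products",
    id,
    body: {
      script: {
        lang: "painless",
        source:
          "if (ctx._source.price + params.price_diff < params.min_price) {" +
          " ctx.op = 'noop';" +
          " } else {" +
          " ctx._source.price += params.price_diff;" +
          " }",
        params: { price_diff: priceDiff, min_price: minPrice },
      },
    },
  };
}
```

Usage: `await client.update(priceChangeParams(1, -100, 499))`. When the script chooses `noop`, the response's `result` field is `"noop"` instead of `"updated"`, and the document is left untouched.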

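Deleting an existing document

It goes without saying that at some point we'll also need to delete existing documents.

We'll use the `delete` method to remove a document from an index. To do this, we must specify the index name and the document ID. Let's look at an example:

await client.delete({
  // The name of the index.
  index: "products",
  // Document ID.
  id: 1,
});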

Searching

The `search` API lets us execute a search query and get back the search hits that match it.

Let's start with a simple query.

// Let's search!
const { body } = await client.search({
  // The name of the index.
  index: "products",
  body: {
    // Defines the search definition using the Query DSL.
    query: {
      match: {
        description: "blast",
      },
    },
  },
});

This query returns all documents whose `description` field matches "blast".
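The matching documents live under `hits.hits` in the response, each with its `_id`, relevance `_score`, and original `_source`. The sketch below assumes the 7.x client (where `client.search()` resolves to an object whose `body` holds the response) and Elasticsearch 7.x (where `hits.total` is an object with `value` and `relation`); the helper name `extractHits` is hypothetical.

```javascript
// Hypothetical helper: pull the useful parts out of a search response body.
function extractHits(body) {
  return {
    // In Elasticsearch 7.x, hits.total is { value, relation }.
    total: body.hits.total.value,
    products: body.hits.hits.map((hit) => ({
      id: hit._id,
      score: hit._score,
      source: hit._source,
    })),
  };
}
```

Usage: `const { total, products } = extractHits(body);` right after the search call. Note that `total` can be a lower bound when `hits.total.relation` is `"gte"`.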

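Info: See the Elasticsearch documentation to learn more about the Query DSL.

Simple, right? But that's not all! We can also run more specific queries. Let's look at some examples:

Search for exact text, such as a product name.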

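// Let's search for products with the exact name "iPhone 12 Pro"!
const { body } = await client.search({
  // The name of the index.
  index: "products",
  body: {
    // Defines the search definition using the Query DSL.
    query: {
      // `term` matches exact values, so we query the un-analyzed
      // "keyword" sub-field that dynamic mapping adds to string fields.
      term: {
        "name.keyword": {
          value: "iPhone 12 Pro",
        },
      },
    },
  },
});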
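A `term` query checks a single exact value. To match any one of several exact names, Elasticsearch provides the `terms` query. The sketch assumes `name` has a `keyword` sub-field (which dynamic mapping creates for string fields by default); the helper name `exactNamesQuery` is hypothetical. Because keyword values are not analyzed, the match is case- and whitespace-sensitive: "iphone 12 pro" would not match.

```javascript
// Hypothetical helper: match any of several exact product names
// with a `terms` query on the keyword sub-field.
function exactNamesQuery(names) {
  return {
    index: "products",
    body: {
      query: {
        terms: {
          "name.keyword": names,
        },
      },
    },
  };
}
```

Usage: `await client.search(exactNamesQuery(["iPhone 12 Pro", "iPhone 12 Pro max"]))`.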

Search for a range of numeric values, such as products within a given price range.

// Let's search for products ranging between 500 and 1000!
const { body } = await client.search({
  // The name of the index.
  index: "products",
  body: {
    // Defines the search definition using the Query DSL.
    query: {
      range: {
        price: {
          gte: 500,
          lte: 1000,
        },
      },
    },
  },
});
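In an app, price bounds often come from optional UI filters, so only some of `gte`/`lte` may be present. The hypothetical helper `priceRangeQuery` below sketches one way to build the query clause from optional bounds, falling back to `match_all` when neither bound is given.

```javascript
// Hypothetical helper: build a price query clause from optional bounds.
// Missing bounds are left out of the range.
function priceRangeQuery(min, max) {
  const range = {};
  if (min !== undefined) range.gte = min;
  if (max !== undefined) range.lte = max;
  if (Object.keys(range).length === 0) {
    // No bounds at all: match every document.
    return { match_all: {} };
  }
  return { range: { price: range } };
}
```

Usage: `await client.search({ index: "products", body: { query: priceRangeQuery(500) } })` returns products priced at 500 or more.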

Search with multiple conditions

// Let's search for products that are either ranging between 500 and 1000
// or description matching "stunning"
const { body } = await client.search({
  // The name of the index.
  index: "products",
  body: {
    // Defines the search definition using the Query DSL.
    query: {
      // Return result for which this nested condition is TRUE.
      bool: {
        // Acts like an OR operator.
        // Returns TRUE even if one of these conditions is met
        should: [
          {
            range: {
              price: {
                gte: 500,
                lte: 1000,
              },
            },
          },
          {
            match: {
              description: "stunning",
            },
          },
        ],
      },
    },
  },
});

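If you need a search query where all conditions must match, use the `must` operator inside `bool`. It acts like an AND operator and returns TRUE only if all conditions are met. `bool` also supports `filter` (must match, but does not affect the relevance score) and `must_not` (must not match) clauses, which you can use as needed.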
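The sketch below combines the three clause types in one `bool` query: `must` for the scored full-text match, `filter` for the price cap (no scoring needed), and `must_not` to exclude one product. The helper name `iphoneSearchQuery` and its parameters are hypothetical, and the `must_not` clause assumes `name` has a `keyword` sub-field.

```javascript
// Hypothetical helper combining bool clause types:
// - must: must match and contributes to the relevance score
// - filter: must match, but does not affect the score
// - must_not: must not match
function iphoneSearchQuery(text, maxPrice, excludedName) {
  return {
    index: "products",
    body: {
      query: {
        bool: {
          must: [{ match: { description: text } }],
          filter: [{ range: { price: { lte: maxPrice } } }],
          must_not: [{ term: { "name.keyword": excludedName } }],
        },
      },
    },
  };
}
```

Usage: `await client.search(iphoneSearchQuery("leap", 1100, "iPhone 12 Pro max"))`. Putting non-scoring conditions such as price ranges in `filter` rather than `must` keeps them out of the score calculation and lets Elasticsearch cache them.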

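These are just a few examples of search queries; you can run far more specific and advanced ones.

Info: See the Elasticsearch documentation to learn more about searching with the Query DSL.

Sorting search results

Elasticsearch lets us add one or more sorts on specific fields. Each sort can also be reversed. Sorting is defined per field, with the special field name `_score` to sort by relevance score and `_doc` to sort by index order.

The order defaults to "desc" when sorting on `_score`, and to "asc" when sorting on anything else.

Let's look at the following example: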

// Let's sort the search results!
const { body } = await client.search({
  // The name of the index.
  index: "products",
  body: {
    // Defines the search definition using the Query DSL.
    query: {
      bool: {
        // Acts like an AND operator.
        // Returns TRUE only if all of these conditions are met.
        must: [
          {
            range: {
              price: {
                gte: 500,
                lte: 1100,
              },
            },
          },
          {
            match: {
              name: "iPhone",
            },
          },
        ],
      },
    },
    // Sort the search result by "price"
    sort: [
      {
        price: {
          order: "asc",
        },
      },
    ],
  },
});

Here we sorted the search results by `price` in "asc" order.
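Multiple sorts are applied in order, with later entries acting as tie-breakers. The sketch below sorts by price (highest first), then by relevance, then by name. The helper name `sortedSearchParams` is hypothetical, and the last sort assumes `name` has a `keyword` sub-field, since `text` fields cannot be sorted on by default.

```javascript
// Hypothetical helper: sort by price (highest first), then by
// relevance score, then alphabetically by exact name.
function sortedSearchParams(text) {
  return {
    index: "products",
    body: {
      query: { match: { name: text } },
      sort: [
        { price: { order: "desc" } },
        // "_score" on its own defaults to descending order.
        "_score",
        { "name.keyword": { order: "asc" } },
      ],
    },
  };
}
```

Usage: `await client.search(sortedSearchParams("iPhone"))`.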

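Paginating search results

Pagination is a must-have for any decent real-world app, and Elasticsearch helps us with this too. Let's see how! 🙂

By default, the `search` method returns the top 10 matching documents.

To page through more results, you can use the search API's `size` and `from` parameters. The `size` parameter is the number of matching documents to return. The `from` parameter is a zero-indexed offset from the beginning of the complete result set that indicates the document you want to start with.

For example, the following `search` call sets the `from` offset to 15, meaning the request offsets (or skips) the first fifteen matching documents.

The `size` parameter is 15, meaning the request can return up to 15 documents, starting at the offset.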

// Let's paginate the search results!
const { body } = await client.search({
  // The name of the index.
  index: "products",
  body: {
    // Starting offset (default: 0)
    from: 15,
    // Number of hits to return (default: 10)
    size: 15,
    // Defines the search definition using the Query DSL.
    query: {
      match: {
        description: "blast",
      },
    },
  },
});
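UIs usually work with page numbers rather than offsets. The hypothetical helper `pageParams` below sketches the conversion `from = (page - 1) * size`. It also guards against the `index.max_result_window` setting: by default, Elasticsearch rejects `from`/`size` requests where `from + size` exceeds 10,000. The constant assumes that default has not been changed; for deeper pagination, Elasticsearch recommends `search_after` instead.

```javascript
// Assumes the default index.max_result_window of 10,000.
const MAX_RESULT_WINDOW = 10000;

// Hypothetical helper: translate a 1-based page number into from/size.
function pageParams(page, pageSize = 10) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be a positive integer");
  }
  const from = (page - 1) * pageSize;
  if (from + pageSize > MAX_RESULT_WINDOW) {
    throw new RangeError("page is beyond index.max_result_window");
  }
  return { from, size: pageSize };
}
```

Usage: spreading `...pageParams(2, 15)` into the request `body` produces the same `from: 15, size: 15` as the example above.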

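Conclusion

If you're looking for a fast search mechanism for your app or website, I'd recommend considering Elasticsearch as a solution.

If you're interested in building full-stack serverless web applications, I highly recommend trying Webiny, the easiest way to adopt serverless. We have Elasticsearch and DynamoDB built into our core apps, such as Page Builder, File Manager, and Headless CMS, to deliver super-fast search.

I hope this blog helps you on your web development journey. And of course, if you have any further questions, concerns, or ideas, feel free to ping me 💬 on Twitter or directly via our community Slack.

Thanks for reading this blog! My name is Ashutosh, and I work as a full-stack developer at Webiny. If you have any questions or comments, or just want to say hi, feel free to reach out to me via Twitter. You can also subscribe 🍿 to our YouTube channel, where we post knowledge-sharing videos every week.

FYI: This blog was originally written and published by Ashutosh.

Source: https://dev.to/webiny/lighting-fast-search-with-elasticsearch-n82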
