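So you have a bunch of things to do. Why not build a pipeline?

When developing software, it's a good idea to write code that reads well. And, like any good storyteller, you want to leave out details that aren't important. You also want to leave breadcrumbs so the reader can get at the details when they need them.

Sit back, grab a hot drink, and let's get straight into it.

The elements of a good story

What do stories, procedures, processes, functions and algorithms have in common?

They all have a start, a middle, and an end.

When we describe a procedure, we start with the prerequisites and materials needed to execute it: the inputs of the procedure. Then we describe the steps needed to execute it. When all is said and done, the description also includes the expected result: the output.

If you're thinking that this sounds remarkably like a function call, you are absolutely correct. But if that deduction escapes you, don't worry, this article will walk you through the concept. 😁

Defining inputs

Let's put on our cosplay suit. Your role in this story is that of an analyst tasked with delivering reports on selected subreddits. You will be given a list of subreddits and asked to generate several types of reports from each page.

Your task is to generate a few reports for each given subreddit front page:

the median of the word count for each post
the median of the number of comments for each post
the ratio of posts with images attached vs. all posts

As for the URL, take your pick, but in this example we'll be using /r/dataisbeautiful: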

https://www.reddit.com/r/dataisbeautiful/

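When you're done taking a look, try out the JSON URL (the same address with the trailing slash replaced by .json) so you get a feel for how the data is structured:

https://www.reddit.com/r/dataisbeautiful.json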
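For orientation, here is a heavily abridged sketch of the listing shape, limited to the fields this article reads later on (data.children, data.title, data.selftext, data.num_comments, data.post_hint). Every value below is an invented placeholder, not real Reddit data, and the real response carries many more fields.

```javascript
// Hypothetical, abridged listing; all values are made-up placeholders.
const sampleListing = {
  data: {
    children: [
      {
        data: {
          title: 'Example chart title',
          selftext: '',             // assumed empty for a post without body text
          num_comments: 12,
          post_hint: 'image'
        }
      },
      {
        data: {
          title: 'Example discussion',
          selftext: 'Some body text goes here.',
          num_comments: 3
          // post_hint is left out on the assumption that it is not always present
        }
      }
    ]
  }
};

// the array of posts lives under data.children
console.log(sampleListing.data.children.length);
```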

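Defining steps

So, first things first: we need to break the problem down into well-defined steps. The more granular they are, the easier they will be to understand, debug and reuse. The rule of the game is: do one thing and do it well.

Let's take the first report and write down the steps, the more granular the better.

generate URL
fetch JSON data
extract posts
extract post text and title for each post
generate a word count for each text
calculate the median value for all texts

Ideally, you would have tests for each of these steps. For brevity, I'm omitting the tests in this article, but that would definitely not fly if I were reviewing your code in a code review!

Step 1: generate URL

This one is simple: take a Reddit URL, remove the trailing slash (if any) and append the .json string.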

const getRedditJSONUrl = url => url.replace(/\/?$/, '.json');
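A quick check of both cases the regular expression is meant to handle. This is only a sanity check of the one-liner above, nothing more:

```javascript
const getRedditJSONUrl = url => url.replace(/\/?$/, '.json');

// with a trailing slash: the slash is replaced by '.json'
const withSlash = getRedditJSONUrl('https://www.reddit.com/r/dataisbeautiful/');

// without a trailing slash: '.json' is simply appended
const withoutSlash = getRedditJSONUrl('https://www.reddit.com/r/dataisbeautiful');

console.log(withSlash);     // https://www.reddit.com/r/dataisbeautiful.json
console.log(withoutSlash);  // https://www.reddit.com/r/dataisbeautiful.json
```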

Step 2: fetch JSON data

A simple call to fetch, followed by converting the response to JSON, does the trick.

const fetchData = url => fetch(url).then(response => response.json());

Step 3: extract posts

We know that every page contains the data.children property, which holds the array of posts we're interested in.

const extractPosts = redditPage => redditPage.data.children;

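Step 4: extract post text for each post

The title of each post can be found in the data.title attribute, and the body text in data.selftext. We'll concatenate them using a newline, \n.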

const extractPostTextAndTitle = post => post.data.title + '\n' + post.data.selftext;

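Step 5: generate word count for each text

This one is a bit tricky. There's no quick way to reliably count words, so we're going to use a more sophisticated utility function from NPM, @iarna/word-count.

Note that we're still creating a function that wraps the library function. This isolates us from the library in case we need to swap out the implementation, or if the function call changes as we refactor our side of the code.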

const _wordCount = require('@iarna/word-count');

const countWords = text => _wordCount(text);

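Step 6: calculate the median

To calculate the median of a set of numbers, we sort them from smallest to largest. The median is the value that splits the ordered set into two equal halves. For sets with an odd number of values, it's the middle value. For sets with an even number of values, it's the midpoint between the two values in the middle.

Here's the median for an odd and an even-sized set: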

[1 1 2 3 5 8 13] ~ size = 7
       ^ median = 3

[1 1 2 3 5 8 13 21] ~ size = 8
        ^ median = (3+5)/2

Here's the implementation:

const numberValueSorter = (a, b) => a - b;

const calculateMedian = list => {
  // an empty list has no median
  if (list.length == 0) return undefined;

  // sort the values
  const sorted = Array.from(list).sort(numberValueSorter);

  if (sorted.length % 2 == 0) {
    // we're dealing with an even-sized set, so take the midpoint
    // of the middle two values
    const a = sorted.length / 2 - 1;
    const b = a + 1;
    return (sorted[a] + sorted[b]) / 2;
  } else {
    // pick the middle value
    const i = Math.floor(sorted.length / 2);
    return sorted[i];
  }
}
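To check the two worked examples, here is a compact variant of the same algorithm run on deliberately unsorted input. The important detail is that the values are read from the sorted copy, not from the original list; the shortened function body is my own condensation, not a different method.

```javascript
const numberValueSorter = (a, b) => a - b;

// same algorithm as above, condensed; reads from the sorted copy
const calculateMedian = list => {
  if (list.length === 0) return undefined;
  const sorted = Array.from(list).sort(numberValueSorter);
  const middle = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[middle - 1] + sorted[middle]) / 2
    : sorted[middle];
};

const odd = [13, 1, 8, 2, 1, 5, 3];       // unsorted on purpose
const even = [21, 13, 1, 8, 2, 1, 5, 3];  // unsorted on purpose

console.log(calculateMedian(odd));   // 3
console.log(calculateMedian(even));  // 4
console.log(calculateMedian([]));    // undefined
console.log(odd[0]);                 // 13, the input array is left untouched
```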

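Connecting the steps

Now that we have the steps in place, let's write out the code in classic, imperative style so we get a better idea of what the process looks like.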
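A minimal sketch of what that imperative version could look like. To keep it runnable offline, fetchData and countWords are replaced by stand-ins (three canned posts and a naive whitespace splitter); both stand-ins are assumptions of this sketch, not the real steps defined above.

```javascript
// Hypothetical stand-ins: canned data instead of the network,
// and a naive splitter instead of @iarna/word-count.
const fetchData = async url => ({
  data: { children: [
    { data: { title: 'One two', selftext: 'three' } },
    { data: { title: 'Four', selftext: '' } },
    { data: { title: 'Five six seven', selftext: 'eight nine' } }
  ] }
});
const countWords = text => text.split(/\s+/).filter(Boolean).length;

// The remaining steps, as defined in the article.
const getRedditJSONUrl = url => url.replace(/\/?$/, '.json');
const extractPosts = redditPage => redditPage.data.children;
const extractPostTextAndTitle = post => post.data.title + '\n' + post.data.selftext;
const calculateMedian = list => {
  if (list.length == 0) return undefined;
  const sorted = Array.from(list).sort((a, b) => a - b);
  const middle = Math.floor(sorted.length / 2);
  return sorted.length % 2 == 0
    ? (sorted[middle - 1] + sorted[middle]) / 2
    : sorted[middle];
};

// Imperative style: call each step, keep the intermediate result,
// hand it to the next step.
const getMedianWordCountReport = async subredditUrl => {
  const jsonUrl = getRedditJSONUrl(subredditUrl);
  const page = await fetchData(jsonUrl);             // this one needs await
  const posts = extractPosts(page);
  const texts = posts.map(extractPostTextAndTitle);  // this one needs map
  const wordCounts = texts.map(countWords);          // and so does this one
  return calculateMedian(wordCounts);
};

getMedianWordCountReport('https://www.reddit.com/r/dataisbeautiful/')
  .then(median => console.log('Median word count:', median));
```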

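As far as storytelling goes, the flow seems all over the place. Instead of simply listing the steps, we call each step in turn, save the intermediate result and hand it to the next step.

There are also a couple of gotchas in that story: some steps require awaiting the result, and some require wrapping the call in map to process each item.

"What if we could just connect these steps into something that would pass these results down the chain?" he asks with a twinkle in his eye.

Enter the pipeline

Here's where we need to introduce a new concept: the pipeline function. Let's start by looking at our original process, which takes a subreddit URL and generates the median word count for the page: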

const getMedianWordCountReport = async subredditUrl => {
  /* something something spaceship */
  return 'voilá!';
};

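We said that our process is defined by the six steps described above. Let's assume pipeline exists and write the fantasy code that lets us create the process function from a sequence of steps: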

const getMedianWordCountReport = pipeline(
  getRedditJSONUrl,
  fetchData,
  extractPosts,
  map(extractPostTextAndTitle),
  map(countWords),
  calculateMedian
);

const URL = 'https://www.reddit.com/r/dataisbeautiful/';

// it's an async function, so we need to wait for it to resolve
getMedianWordCountReport(URL)
  .then(median =>
    console.log('Median word count for ' + URL, median)
  )
  .catch(error => console.error(error));

Ah, but what about that map() function? It's just Array::map, changed so that it is curried with the mapping function before it accepts the array:

const map = mapper => array => array.map(mapper);

So far, so good. We now know what the function should do; we just need to define it. Let's start by defining its signature:

const pipeline = (...steps) => {  // take a list of steps,
  return async input => {         // return an async function that takes an input,
    return input;                 // and eventually returns a result
  };
};

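We've created a function that takes an arbitrary number of functions (steps) and returns an async function, the process function.

For every step, the function should take the last intermediate result, feed it to the next step, and save the new intermediate result.

If there are no more steps, it returns the last intermediate result.

Ready? Go!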

const pipeline = (...steps) => {    // take a list of steps defining the process
  return async input => {           // and return an async function that takes input;
    let result = input;             // the first intermediate result is the input;
    for (const step of steps)       // iterate over each step;
      result = await step(result);  // run the step on the result and update it;
    return result;                  // return the last result!
  };
};

You might be thinking, "No, that can't be it. Is that really all of it?"

Yep. Try it yourself:
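One network-free way to try it: a toy pipeline that mixes synchronous and asynchronous steps. The step functions (double, delayedIncrement, range, sum) are hypothetical helpers made up for this sketch; only pipeline and the curried map come from the article.

```javascript
const pipeline = (...steps) => {
  return async input => {
    let result = input;
    for (const step of steps)
      result = await step(result);  // await works for plain values and promises alike
    return result;
  };
};

const map = mapper => array => array.map(mapper);

// Hypothetical toy steps.
const double = n => n * 2;                                   // synchronous
const delayedIncrement = n =>                                // asynchronous
  new Promise(resolve => setTimeout(() => resolve(n + 1), 10));
const range = n => Array.from({ length: n }, (_, i) => i);   // value -> list
const sum = list => list.reduce((a, b) => a + b, 0);         // list -> value

const demo = pipeline(double, delayedIncrement, range, map(double), sum);

// 2 -> 4 -> 5 -> [0, 1, 2, 3, 4] -> [0, 2, 4, 6, 8] -> 20
const demoResult = demo(2);
demoResult.then(result => console.log(result));  // 20
```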

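Simplifying the pipeline

We'd like to straighten out some of the bends in the pipeline. There's a point where the result changes from a single value to a list of values (extractPosts) and back again (calculateMedian). It would be nicer if we could group together the functions that deal with individual items.

To do that, let's create a composition function that takes a number of steps meant to process a single value and strings them together to operate on a list of values: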

const map = (...mappers) =>                 // take an array of mappers,
  array =>                                  // and return a function that takes an array;
    array.map(                              // map each item of the array
      item => mappers.reduce(               // through a function that passes each item
        (result, mapper) => mapper(result), // through the chain of mappers,
        item                                // starting from the item itself
      )
    );

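Now, there's a caveat to this function: the mapper functions passed into this map function must be synchronous. For completeness, let's assume that each mapper might be an async function and should be treated accordingly.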

const map = (...mappers) =>
  async array => {                      // we now have to return an async function
    const results = [];
    for (const value of array) {        // for each value of the array,
      let result = value;               // set the first intermediate result to the first value;
      for (const mapper of mappers)     // take each mapper;
        result = await mapper(result);  // and pass the intermediate result to the next;
      results.push(result);             // and push the result onto the results array;
    }
    return results;                     // return the final array
  };

Now that we've solved that edge case, we can reformulate our process function by grouping the two single-item functions into a single step:
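A sketch of that regrouped process function, runnable offline. As an assumption of this sketch, fetchData returns three canned posts and countWords is a naive whitespace splitter standing in for @iarna/word-count; everything else follows the definitions above.

```javascript
const pipeline = (...steps) => async input => {
  let result = input;
  for (const step of steps) result = await step(result);
  return result;
};

const map = (...mappers) => async array => {
  const results = [];
  for (const value of array) {
    let result = value;
    for (const mapper of mappers) result = await mapper(result);
    results.push(result);
  }
  return results;
};

// Hypothetical stand-ins so the sketch needs no network or NPM package.
const fetchData = async url => ({
  data: { children: [
    { data: { title: 'One two', selftext: 'three' } },
    { data: { title: 'Four', selftext: '' } },
    { data: { title: 'Five six seven', selftext: 'eight nine' } }
  ] }
});
const countWords = text => text.split(/\s+/).filter(Boolean).length;

const getRedditJSONUrl = url => url.replace(/\/?$/, '.json');
const extractPosts = redditPage => redditPage.data.children;
const extractPostTextAndTitle = post => post.data.title + '\n' + post.data.selftext;
const calculateMedian = list => {
  if (list.length == 0) return undefined;
  const sorted = Array.from(list).sort((a, b) => a - b);
  const middle = Math.floor(sorted.length / 2);
  return sorted.length % 2 == 0
    ? (sorted[middle - 1] + sorted[middle]) / 2
    : sorted[middle];
};

const getMedianWordCountReport = pipeline(
  getRedditJSONUrl,
  fetchData,
  extractPosts,
  map(                        // both single-item steps now live in one map
    extractPostTextAndTitle,
    countWords
  ),
  calculateMedian
);

const report = getMedianWordCountReport('https://www.reddit.com/r/dataisbeautiful/');
report.then(median => console.log('Median word count:', median));
```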

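And it still works!

Forking the pipeline

So now we have a pipeline function that we can use to declaratively construct a single function that describes our process. But so far, we've only covered one of the three original goals we were given in our cosplay scenario.

Oh noes!

Let's write up all of the processes to take stock of what we still have to do.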

const getMedianWordCount = pipeline(
  getRedditJSONUrl,
  fetchData,
  extractPosts,
  map(
    extractPostTextAndTitle,
    countWords
  ),
  calculateMedian
);

const getMedianCommentCount = pipeline(
  getRedditJSONUrl,
  fetchData,
  extractPosts,
  map(countComments),
  calculateMedian
);

const getImagePresentRatio = pipeline(
  getRedditJSONUrl,
  fetchData,
  extractPosts,
  map(hasImageAttached),
  calculateRatio
);

OK, so we need to write a couple more steps to have all of the functions needed to assemble the processes. Let's add them now:

const countComments = post => post.data.num_comments;

const hasImageAttached = post => post.data.post_hint == 'image';

const calculateRatio = array => {
  if (array.length == 0) return undefined;
  return array.filter(value => !!value).length / array.length;
};

With that done, let's see if it all runs:
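Here is an offline sketch that runs all three processes. The canned posts returned by fetchData and the naive countWords are assumptions made for this sketch; with real Reddit data the numbers will of course differ.

```javascript
const pipeline = (...steps) => async input => {
  let result = input;
  for (const step of steps) result = await step(result);
  return result;
};
const map = (...mappers) => async array => {
  const results = [];
  for (const value of array) {
    let result = value;
    for (const mapper of mappers) result = await mapper(result);
    results.push(result);
  }
  return results;
};

// Hypothetical stand-ins: canned posts and a naive word counter.
const fetchData = async url => ({
  data: { children: [
    { data: { title: 'One two', selftext: 'three', num_comments: 4, post_hint: 'image' } },
    { data: { title: 'Four', selftext: '', num_comments: 10 } },
    { data: { title: 'Five six seven', selftext: 'eight nine', num_comments: 6, post_hint: 'image' } },
    { data: { title: 'Ten', selftext: 'eleven', num_comments: 0, post_hint: 'link' } }
  ] }
});
const countWords = text => text.split(/\s+/).filter(Boolean).length;

// Steps as defined in the article.
const getRedditJSONUrl = url => url.replace(/\/?$/, '.json');
const extractPosts = redditPage => redditPage.data.children;
const extractPostTextAndTitle = post => post.data.title + '\n' + post.data.selftext;
const countComments = post => post.data.num_comments;
const hasImageAttached = post => post.data.post_hint == 'image';
const calculateMedian = list => {
  if (list.length == 0) return undefined;
  const sorted = Array.from(list).sort((a, b) => a - b);
  const middle = Math.floor(sorted.length / 2);
  return sorted.length % 2 == 0
    ? (sorted[middle - 1] + sorted[middle]) / 2
    : sorted[middle];
};
const calculateRatio = array => {
  if (array.length == 0) return undefined;
  return array.filter(value => !!value).length / array.length;
};

const getMedianWordCount = pipeline(
  getRedditJSONUrl, fetchData, extractPosts,
  map(extractPostTextAndTitle, countWords), calculateMedian
);
const getMedianCommentCount = pipeline(
  getRedditJSONUrl, fetchData, extractPosts,
  map(countComments), calculateMedian
);
const getImagePresentRatio = pipeline(
  getRedditJSONUrl, fetchData, extractPosts,
  map(hasImageAttached), calculateRatio
);

const subredditUrl = 'https://www.reddit.com/r/dataisbeautiful/';

// run the three processes one after the other and collect the results
const runAll = async () => ({
  medianWordCount: await getMedianWordCount(subredditUrl),
  medianCommentCount: await getMedianCommentCount(subredditUrl),
  imagePresentRatio: await getImagePresentRatio(subredditUrl)
});

const allReports = runAll();
allReports.then(reports => console.log(reports));
```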

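Great, we now know that we can build processes with these building blocks. There's a slight problem, though. Each process does much of the same work, and it seems wasteful to have each one fetch the same data and go through the same motions every time.

Let's create a fork function to handle that problem.

Ideally, we'd like to split the pipeline into specific pipelines for each process, then join them together to get the end result. Let's write some fantasy code to make the goal a bit clearer: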

const getMedianWordCount = pipeline(
  map(
    extractPostTextAndTitle,
    countWords
  ),
  calculateMedian
);

const getMedianCommentCount = pipeline(
  map(countComments),
  calculateMedian
);

const getImagePresentRatio = pipeline(
  map(hasImageAttached),
  calculateRatio
);

// this is a convenience function that associates names to the results returned
const joinResults = ([
  medianWordCount,
  medianCommentCount,
  imagePresentRatio
]) => ({
  medianWordCount,
  medianCommentCount,
  imagePresentRatio
});

// the process function, now with forking!
const getSubredditMetrics = pipeline(
  getRedditJSONUrl,
  fetchData,
  extractPosts,
  fork(
    getMedianWordCount,
    getMedianCommentCount,
    getImagePresentRatio
  ),
  joinResults
);

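According to the above requirements, the fork function takes a series of pipelines.

At this point, I'd advise you to go ahead and try writing your own implementation of fork, given the above constraints. Your implementation might turn out very similar to the extended map.

Here's my take on the fork function: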

const fork = (...pipelines) =>       // a function that takes a list of pipelines,
  async value =>                     // returns an async function that takes a value;
    await Promise.all(               // it returns the results of promises...
      pipelines.map(                 // ...mapped over pipelines...
        pipeline => pipeline(value)  // ...that are passed the value.
      )
    );

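If it looks confusing, don't worry. It takes a while to unpack what the function does.

The trick is to remember that Promise.all() takes an array of promises and returns a promise that resolves once all of the values have resolved. The result is the array of promise results, in the same order. If any of the values isn't a promise, it is simply treated as an immediately resolved promise with that value.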
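A small sketch of that behaviour. The later helper is a hypothetical utility invented for this example; it resolves with a value after a delay, so the promises finish out of order.

```javascript
// Hypothetical helper: a promise that resolves with `value` after `ms` milliseconds.
const later = (value, ms) =>
  new Promise(resolve => setTimeout(() => resolve(value), ms));

const allSettledInOrder = Promise.all([
  later('slow', 30),  // resolves last
  later('fast', 5),   // resolves first
  'plain value'       // not a promise; treated as already resolved
]);

// The results follow the order of the input array, not the order of completion.
allSettledInOrder.then(results => console.log(results));
// [ 'slow', 'fast', 'plain value' ]
```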

The final result

So, does fork work, and does it save us the extra overhead? Let's see.
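One way to check this offline is to count how often the data gets fetched. In this sketch fetchData is a hypothetical stand-in that returns canned posts and increments a counter; only two of the three branches are forked to keep it short.

```javascript
const pipeline = (...steps) => async input => {
  let result = input;
  for (const step of steps) result = await step(result);
  return result;
};
const map = (...mappers) => async array => {
  const results = [];
  for (const value of array) {
    let result = value;
    for (const mapper of mappers) result = await mapper(result);
    results.push(result);
  }
  return results;
};
const fork = (...pipelines) =>
  async value => await Promise.all(pipelines.map(pipeline => pipeline(value)));

// Hypothetical stand-in for fetchData that counts how often it is called.
let fetchCalls = 0;
const fetchData = async url => {
  fetchCalls += 1;
  return { data: { children: [
    { data: { num_comments: 4, post_hint: 'image' } },
    { data: { num_comments: 10 } },
    { data: { num_comments: 6, post_hint: 'image' } },
    { data: { num_comments: 0, post_hint: 'link' } }
  ] } };
};

const getRedditJSONUrl = url => url.replace(/\/?$/, '.json');
const extractPosts = redditPage => redditPage.data.children;
const countComments = post => post.data.num_comments;
const hasImageAttached = post => post.data.post_hint == 'image';
const calculateMedian = list => {
  if (list.length == 0) return undefined;
  const sorted = Array.from(list).sort((a, b) => a - b);
  const middle = Math.floor(sorted.length / 2);
  return sorted.length % 2 == 0
    ? (sorted[middle - 1] + sorted[middle]) / 2
    : sorted[middle];
};
const calculateRatio = array => {
  if (array.length == 0) return undefined;
  return array.filter(value => !!value).length / array.length;
};

const getMedianCommentCount = pipeline(map(countComments), calculateMedian);
const getImagePresentRatio = pipeline(map(hasImageAttached), calculateRatio);

const getSubredditMetrics = pipeline(
  getRedditJSONUrl,
  fetchData,               // runs once, before the fork
  extractPosts,
  fork(getMedianCommentCount, getImagePresentRatio),
  ([medianCommentCount, imagePresentRatio]) =>
    ({ medianCommentCount, imagePresentRatio })
);

const metrics = getSubredditMetrics('https://www.reddit.com/r/dataisbeautiful/');
metrics.then(result => console.log(result, 'fetch calls:', fetchCalls));
```

Both branches receive the same array of posts, and the counter stays at one fetch per report instead of one fetch per metric.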

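One last magic trick

Still with me? OK, remember that when we started our cosplay we wanted to generate these reports for a list of URLs, not just one? Can we create a sort of process of processes that takes an array of URLs and returns an array of reports?

Maybe.

Let's break down the problem. We have an array of URLs. We know we can pass each URL through the pipeline and get back a promise that resolves to the report. If we map the array of URLs with the pipeline, we get back an array of promises.

And we already know how to resolve an array of promises!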

const distribute = pipeline =>  // distribute takes a pipeline,
  values =>                     // and returns a function that takes a list of values;
    Promise.all(                // it returns a promise of all the values...
      values.map(pipeline)      // ...passed through each pipeline
    );

Yep, I think that does it! Let's try it out by passing an array of URLs to see how it does:

const fetch = require('node-fetch');
const _wordCount = require('@iarna/word-count');

const getRedditJSONUrl = url => url.replace(/\/?$/, '.json');

const fetchData = url => fetch(url).then(response => response.json());

const extractPosts = redditPage => redditPage.data.children;

const extractPostTextAndTitle = post => post.data.title + '\n' + post.data.selftext;

const countWords = text => _wordCount(text);

const numberValueSorter = (a, b) => a - b;

const calculateMedian = list => {
  if (list.length == 0) return undefined;
  const sorted = Array.from(list).sort(numberValueSorter);
  if (sorted.length % 2 == 0) {
    const a = sorted.length / 2 - 1;
    const b = a + 1;
    return (sorted[a] + sorted[b]) / 2;
  } else {
    const i = Math.floor(sorted.length / 2);
    return sorted[i];
  }
};

const pipeline = (...steps) => {
  return async input => {
    let result = input;
    for (const step of steps)
      result = await step(result);
    return result;
  };
};

const map = (...mappers) =>
  async array => {
    const results = [];
    for (const value of array) {
      let result = value;
      for (const mapper of mappers)
        result = await mapper(result);
      results.push(result);
    }
    return results;
  };

const countComments = post => post.data.num_comments;

const hasImageAttached = post => post.data.post_hint == 'image';

const calculateRatio = array => {
  if (array.length == 0) return undefined;
  return array.filter(value => !!value).length / array.length;
};

const fork = (...pipelines) =>
  async value =>
    await Promise.all(pipelines.map(pipeline => pipeline(value)));

const getMedianWordCount = pipeline(
  map(
    extractPostTextAndTitle,
    countWords
  ),
  calculateMedian
);

const getMedianCommentCount = pipeline(
  map(countComments),
  calculateMedian
);

const getImagePresentRatio = pipeline(
  map(hasImageAttached),
  calculateRatio
);

// this is a convenience function that associates names to the results returned
const joinResults = ([
  medianWordCount,
  medianCommentCount,
  imagePresentRatio
]) => ({
  medianWordCount,
  medianCommentCount,
  imagePresentRatio
});

const getSubredditMetrics = pipeline(
  getRedditJSONUrl,
  fetchData,
  extractPosts,
  fork(
    getMedianWordCount,
    getMedianCommentCount,
    getImagePresentRatio
  ),
  joinResults
);
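To see distribute on its own without hitting the network, here is a sketch with a hypothetical fakeGetSubredditMetrics standing in for the real getSubredditMetrics pipeline. The second URL is an arbitrary example, and the report contents are made up.

```javascript
const distribute = pipeline =>
  values =>
    Promise.all(
      values.map(pipeline)
    );

// Hypothetical stand-in for getSubredditMetrics: resolves to a fake report.
const fakeGetSubredditMetrics = async url => ({
  url,
  report: 'metrics for ' + url
});

const urls = [
  'https://www.reddit.com/r/dataisbeautiful/',
  'https://www.reddit.com/r/programming/'  // arbitrary second example
];

const getAllReports = distribute(fakeGetSubredditMetrics);

// one report per URL, in the same order as the input array
const allReports = getAllReports(urls);
allReports.then(reports => console.log(reports));
```

One thing to keep in mind: values.map(pipeline) also passes the index and the whole array as extra arguments. Functions built with pipeline only read their first argument, so this is harmless here, but a hand-written function with optional parameters could be affected.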

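...and they lived happily ever after.

Congratulations on making it this far! You have successfully gone through the process of designing and developing an entire system of async coordination mechanisms from scratch, which is no easy feat.

To wrap things up, let's extract the general utility functions that we used to build our process functions and make them available as a module: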

export const pipeline = (...steps) =>
  async input => {
    let result = input;
    for (const step of steps)
      result = await step(result);
    return result;
  };

export const map = (...mappers) =>
  async array => {
    const results = [];
    for (const value of array) {
      let result = value;
      for (const mapper of mappers)
        result = await mapper(result);
      results.push(result);
    }
    return results;
  };

export const fork = (...pipelines) =>
  async value =>
    await Promise.all(
      pipelines.map(pipeline => pipeline(value))
    );

export const distribute = pipeline =>
  values =>
    Promise.all(
      values.map(pipeline)
    );

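Using just these four functions, we've managed to build a complete suite of generic primitives that can process a finite amount of work, in under 350 characters of minified code. 😉

You can get out of that cosplay costume now.

Source: https://dev.to/krofdrakula/so-you-have-a-bunch-of-things-to-do-why-not-build-a-pipeline-31o0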