Published on 2026-01-06

I Wanted to Learn Faster, So I Built a Voice AI Tutor with GPT-4 in a Weekend

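A few months ago, I noticed something frustrating.

I was spending hours trying to learn new concepts: watching videos, reading articles, or chatting with ChatGPT. It still felt slow, clunky, and passive. Typing out questions felt like homework. Scrolling to find the "right" explanation was exhausting.

So I asked myself:

What if learning felt more like a conversation?

That question led to Learnflow AI, a voice-powered learning assistant you can talk to like a personal tutor.

In this series, I'll show you how I built a real-time, voice-enabled GPT-4 app from scratch using Vapi, Next.js, and OpenAI.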

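What is Learnflow AI?

Learnflow AI is a voice-first learning interface. Think of it as ChatGPT without the typing: you speak, and the AI replies in real time.

It uses Vapi.ai for voice interaction and GPT-4 for the answers. The combination makes for a very natural tutoring experience: no cluttered interface, just press a button and start talking.

You can use the same stack to build:

  • AI tutors that speak and listen
  • Voice assistants and productivity bots
  • Hands-free tools for assisted learning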

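What we're building in this part

Goal: a production-ready MVP that lets you talk to GPT-4 and get spoken answers in real time.

Part 1 covers:

  • A voice assistant built with Vapi.ai
  • GPT-4 for reasoning
  • A Next.js frontend using the App Router
  • Tailwind, Radix, and Shadcn for styling and components
  • No user authentication, database, memory, or credits yet (that's Part 2)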

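Why voice-first?

Typing is slow, scrolling is messy, and voice does away with both.

When you ask a question out loud:

  • You get to the question faster (there is no typed prompt to compose)
  • It feels more natural and intuitive
  • It mimics a real learning conversation with a tutor

Speaking feels like learning; typing feels like searching.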

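My tech stack

Layer | Tech | Why I chose it
Voice interface | Vapi.ai | Real-time audio streaming, OpenAI-compatible
LLM provider | OpenAI GPT-4 | High-quality answers, fast inference
Frontend | Next.js (App Router) | Scalable file-based routing
Styling | Tailwind CSS | Fast iteration, responsive
Components | Radix UI + Shadcn | Accessible, low-level UI primitives
Language | TypeScript | DX + type safety
Hosting | Vercel | Instant deployment for Next.js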

File structure (voice MVP)

This part of the app is intentionally simple: the goal is to get a working voice assistant in front of you quickly.

learnflow-ai/
├── app/
│   ├── globals.css
│   ├── layout.tsx
│   └── page.tsx
├── constants/
│   └── soundwaves.json
└── lib/
    ├── utils.ts
    └── vapi.sdk.ts

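We focus on getting the voice flow working before adding state, authentication, a database, or personalization.

Step-by-step guide: setting up the voice assistant

This section assumes you have already set up a Next.js App Router codebase and installed shadcn. With that in place, follow these steps:

Step 1: Set up the app layout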

import type { Metadata } from "next";
import { Bricolage_Grotesque } from "next/font/google";
import "./globals.css";

const bricolage = Bricolage_Grotesque({
  variable: "--font-bricolage",
  subsets: ["latin"],
});

export const metadata: Metadata = {
  title: "Learnflow AI",
  description: "A voice-only learning platform for developers",
};

export default function RootLayout({
  children,
}: Readonly<{
  children: React.ReactNode;
}>) {
  return (
    <html lang="en">
      <body className={`${bricolage.variable} antialiased`}>
          {children}
      </body>
    </html>
  );
}

Step 2: Create a folder named constants and a file named soundwaves.json inside it, then paste in the following JSON (constants/soundwaves.json)

{"nm":"Render","ddd":0,"h":250,"w":250,"meta":{"g":"LottieFiles AE 3.1.1"},"layers":[{"ty":4,"nm":"Arrow Outlines 4","sr":1,"st":0,"op":300.00001221925,"ip":0,"hd":false,"ddd":0,"bm":0,"hasMask":false,"ao":0,"ks":{"a":{"a":0,"k":[13,15.5,0],"ix":1},"s":{"a":0,"k":[170,170,100],"ix":6},"sk":{"a":0,"k":0},"p":{"a":0,"k":[100.5,126,0],"ix":2},"r":{"a":0,"k":0,"ix":10},"sa":{"a":0,"k":0},"o":{"a":0,"k":100,"ix":11}},"ef":[],"shapes":[{"ty":"sh","bm":0,"hd":false,"mn":"ADBE Vector Shape - Group","nm":"Path 1","ix":1,"d":1,"ks":{"a":1,"k":[{"o":{"x":0.333,"y":0},"i":{"x":0.667,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,13.618],[12.471,14.676]]}],"t":0},{"o":{"x":0.333,"y":0},"i":{"x":0.667,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,8.103],[12.471,18.868]]}],"t":30},{"o":{"x":0.333,"y":0},"i":{"x":0.667,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,8.324],[12.471,19.235]]}],"t":45},{"o":{"x":0.333,"y":0},"i":{"x":0.667,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.515,4.206],[12.515,25.853]]}],"t":70},{"o":{"x":0.333,"y":0},"i":{"x":0.667,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,8.985],[12.471,17.912]]}],"t":83},{"o":{"x":0.333,"y":0},"i":{"x":0.667,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.515,5.309],[12.544,24.088]]}],"t":97},{"o":{"x":0.333,"y":0},"i":{"x":1,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.456,7.588],[12.456,20.044]]}],"t":109},{"o":{"x":0.333,"y":0},"i":{"x":1,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.456,6.265],[12.456,21]]}],"t":121},{"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,13.618],[12.471,14.676]]}],"t":128.000005213547}],"ix":2}},{"ty":"st","bm":0,"hd":false,"mn":"ADBE Vector Graphic - Stroke","nm":"Stroke 1","lc":2,"lj":1,"ml":4,"o":{"a":0,"k":100,"ix":4},"w":{"a":0,"k":4,"ix":5},"c":{"a":0,"k":[0.9922,0.949,0.9922,1],"ix":3}},{"ty":"sh","bm":0,"hd":false,"mn":"ADBE Vector Shape - Group","nm":"Path 2","ix":3,"d":1,"ks":{"a":0,"k":{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]},"ix":2}}],"ind":1},{"ty":4,"nm":"Arrow Outlines 3","sr":1,"st":0,"op":300.00001221925,"ip":0,"hd":false,"ddd":0,"bm":0,"hasMask":false,"ao":0,"ks":{"a":{"a":0,"k":[13,15.5,0],"ix":1},"s":{"a":0,"k":[170,170,100],"ix":6},"sk":{"a":0,"k":0},"p":{"a":0,"k":[146.75,126,0],"ix":2},"r":{"a":0,"k":0,"ix":10},"sa":{"a":0,"k":0},"o":{"a":0,"k":100,"ix":11}},"ef":[],"shapes":[{"ty":"sh","bm":0,"hd":false,"mn":"ADBE Vector Shape - Group","nm":"Path 1","ix":1,"d":1,"ks":{"a":1,"k":[{"o":{"x":0.973,"y":0},"i":{"x":0.581,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,13.618],[12.471,14.676]]}],"t":0},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,13.765],[12.441,18.353]]}],"t":30},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,13.618],[12.471,13.721]]}],"t":45},{"o":{"x":0.167,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.529,7.074],[12.5,15.191]]}],"t":70},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.456,5.529],[12.441,16.735]]}],"t":83},{"o":{"x":0.167,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.544,7.147],[12.515,15.044]]}],"t":97},{"o":{"x":0.973,"y":0},"i":{"x":0.592,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.456,3.103],[12.471,21.809]]}],"t":109},{"o":{"x":0.973,"y":0},"i":{"x":0.893,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]}],"t":122},{"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,13.618],[12.471,14.676]]}],"t":129.000005254278}],"ix":2}},{"ty":"st","bm":0,"hd":false,"mn":"ADBE Vector Graphic - Stroke","nm":"Stroke 1","lc":2,"lj":1,"ml":4,"o":{"a":0,"k":100,"ix":4},"w":{"a":0,"k":4,"ix":5},"c":{"a":0,"k":[0.9922,0.949,0.9922,1],"ix":3}},{"ty":"sh","bm":0,"hd":false,"mn":"ADBE Vector Shape - Group","nm":"Path 2","ix":3,"d":1,"ks":{"a":0,"k":{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]},"ix":2}}],"ind":2},{"ty":4,"nm":"Arrow Outlines 2","sr":1,"st":0,"op":300.00001221925,"ip":0,"hd":false,"ddd":0,"bm":0,"hasMask":false,"ao":0,"ks":{"a":{"a":0,"k":[13,15.5,0],"ix":1},"s":{"a":0,"k":[170,170,100],"ix":6},"sk":{"a":0,"k":0},"p":{"a":0,"k":[116.75,126,0],"ix":2},"r":{"a":0,"k":0,"ix":10},"sa":{"a":0,"k":0},"o":{"a":0,"k":100,"ix":11}},"ef":[],"shapes":[{"ty":"sh","bm":0,"hd":false,"mn":"ADBE Vector Shape - Group","nm":"Path 1","ix":1,"d":1,"ks":{"a":1,"k":[{"o":{"x":0.333,"y":0},"i":{"x":0,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,13.618],[12.471,14.676]]}],"t":0},{"o":{"x":0.333,"y":0},"i":{"x":0,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]}],"t":24},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,4.647],[12.441,25.706]]}],"t":41},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]}],"t":55},{"o":{"x":0.167,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,6.118],[12.471,20.412]]}],"t":70},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.456,1.926],[12.456,22.838]]}],"t":87},{"o":{"x":0.973,"y":0},"i":{"x":0,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,7.735],[12.471,20.265]]}],"t":101},{"o":{"x":0.333,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]}],"t":115},{"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,13.618],[12.471,14.676]]}],"t":134.000005457932}],"ix":2}},{"ty":"st","bm":0,"hd":false,"mn":"ADBE Vector Graphic - Stroke","nm":"Stroke 1","lc":2,"lj":1,"ml":4,"o":{"a":0,"k":100,"ix":4},"w":{"a":0,"k":4,"ix":5},"c":{"a":0,"k":[0.9922,0.949,0.9922,1],"ix":3}},{"ty":"sh","bm":0,"hd":false,"mn":"ADBE Vector Shape - Group","nm":"Path 2","ix":3,"d":1,"ks":{"a":0,"k":{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]},"ix":2}}],"ind":3},{"ty":4,"nm":"Arrow Outlines","sr":1,"st":0,"op":300.00001221925,"ip":0,"hd":false,"ddd":0,"bm":0,"hasMask":false,"ao":0,"ks":{"a":{"a":0,"k":[13,15.5,0],"ix":1},"s":{"a":0,"k":[170,170,100],"ix":6},"sk":{"a":0,"k":0},"p":{"a":0,"k":[131.75,126,0],"ix":2},"r":{"a":0,"k":0,"ix":10},"sa":{"a":0,"k":0},"o":{"a":0,"k":100,"ix":11}},"ef":[],"shapes":[{"ty":"sh","bm":0,"hd":false,"mn":"ADBE Vector Shape - Group","nm":"Path 1","ix":1,"d":1,"ks":{"a":1,"k":[{"o":{"x":0.973,"y":0},"i":{"x":0.581,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,13.618],[12.471,14.676]]}],"t":0},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.5,2],[12.5,28.5]]}],"t":30},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]}],"t":45},{"o":{"x":0.167,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.5,2],[12.5,28.5]]}],"t":70},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]}],"t":83},{"o":{"x":0.167,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.5,2],[12.5,28.5]]}],"t":97},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,5.75],[12.471,21.809]]}],"t":109},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]}],"t":125},{"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,13.618],[12.471,14.676]]}],"t":132.00000537647}],"ix":2}},{"ty":"st","bm":0,"hd":false,"mn":"ADBE Vector Graphic - Stroke","nm":"Stroke 1","lc":2,"lj":1,"ml":4,"o":{"a":0,"k":100,"ix":4},"w":{"a":0,"k":4,"ix":5},"c":{"a":0,"k":[0.9922,0.949,0.9922,1],"ix":3}},{"ty":"sh","bm":0,"hd":false,"mn":"ADBE Vector Shape - Group","nm":"Path 2","ix":3,"d":1,"ks":{"a":0,"k":{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]},"ix":2}}],"ind":4},{"ty":4,"nm":"cir 1","sr":1,"st":0,"op":300.00001221925,"ip":0,"hd":false,"ddd":0,"bm":0,"hasMask":false,"ao":0,"ks":{"a":{"a":0,"k":[0,0,0],"ix":1},"s":{"a":1,"k":[{"o":{"x":0.775,"y":0},"i":{"x":0.206,"y":1},"s":[111.8,111.8,100],"t":0},{"o":{"x":0.665,"y":0},"i":{"x":0.274,"y":1},"s":[120,120,100],"t":30},{"o":{"x":0.167,"y":0},"i":{"x":0.206,"y":1},"s":[111.8,111.8,100],"t":60},{"o":{"x":0.665,"y":0},"i":{"x":0.274,"y":1},"s":[120,120,100],"t":71.289},{"o":{"x":0.167,"y":0},"i":{"x":0.206,"y":1},"s":[111.8,111.8,100],"t":82.576},{"o":{"x":0.665,"y":0},"i":{"x":0.274,"y":1},"s":[120,120,100],"t":97.628},{"o":{"x":0.167,"y":0},"i":{"x":0.667,"y":1},"s":[111.8,111.8,100],"t":107.661},{"s":[111.8,111.8,100],"t":134.000005457932}],"ix":6},"sk":{"a":0,"k":0},"p":{"a":0,"k":[126.539,126.166,0],"ix":2},"r":{"a":0,"k":0,"ix":10},"sa":{"a":0,"k":0},"o":{"a":0,"k":100,"ix":11}},"ef":[],"shapes":[{"ty":"gr","bm":0,"hd":false,"mn":"ADBE Vector Group","nm":"Ellipse 1","ix":1,"cix":2,"np":3,"it":[{"ty":"el","bm":0,"hd":false,"mn":"ADBE Vector Shape - Ellipse","nm":"Ellipse Path 1","d":1,"p":{"a":0,"k":[0,0],"ix":3},"s":{"a":0,"k":[99.367,99.367],"ix":2}},{"ty":"fl","bm":0,"hd":false,"mn":"ADBE Vector Graphic - Fill","nm":"Fill 1","c":{"a":0,"k":[0.1255,0.1529,0.1725,1],"ix":4},"r":1,"o":{"a":0,"k":100,"ix":5}},{"ty":"tr","a":{"a":0,"k":[0,0],"ix":1},"s":{"a":0,"k":[100,100],"ix":3},"sk":{"a":0,"k":0,"ix":4},"p":{"a":0,"k":[-1.539,-1.166],"ix":2},"r":{"a":0,"k":0,"ix":6},"sa":{"a":0,"k":0,"ix":5},"o":{"a":0,"k":100,"ix":7}}]}],"ind":5},{"ty":4,"nm":"cir 2","sr":1,"st":0,"op":300.00001221925,"ip":0,"hd":false,"ddd":0,"bm":0,"hasMask":false,"ao":0,"ks":{"a":{"a":0,"k":[0,0,0],"ix":1},"s":{"a":1,"k":[{"o":{"x":0.775,"y":0},"i":{"x":0.206,"y":1},"s":[110,110,100],"t":0},{"o":{"x":0.665,"y":0},"i":{"x":0.274,"y":1},"s":[194.775,194.775,100],"t":34},{"o":{"x":0.167,"y":0},"i":{"x":0.206,"y":1},"s":[150,150,100],"t":60},{"o":{"x":0.665,"y":0},"i":{"x":0.274,"y":1},"s":[194.775,194.775,100],"t":71.289},{"o":{"x":0.167,"y":0},"i":{"x":0.206,"y":1},"s":[150,150,100],"t":82.576},{"o":{"x":0.665,"y":0},"i":{"x":0.274,"y":1},"s":[194.775,194.775,100],"t":97.628},{"o":{"x":0.167,"y":0},"i":{"x":0.274,"y":1},"s":[150,150,100],"t":117},{"s":[110,110,100],"t":123.966255049249}],"ix":6},"sk":{"a":0,"k":0},"p":{"a":0,"k":[126.539,126.166,0],"ix":2},"r":{"a":0,"k":0,"ix":10},"sa":{"a":0,"k":0},"o":{"a":0,"k":100,"ix":11}},"ef":[],"shapes":[{"ty":"gr","bm":0,"hd":false,"mn":"ADBE Vector Group","nm":"Ellipse 1","ix":1,"cix":2,"np":3,"it":[{"ty":"el","bm":0,"hd":false,"mn":"ADBE Vector Shape - Ellipse","nm":"Ellipse Path 1","d":1,"p":{"a":0,"k":[0,0],"ix":3},"s":{"a":0,"k":[99.367,99.367],"ix":2}},{"ty":"fl","bm":0,"hd":false,"mn":"ADBE Vector Graphic - Fill","nm":"Fill 1","c":{"a":0,"k":[0.1255,0.1529,0.1725,1],"ix":4},"r":1,"o":{"a":0,"k":10,"ix":5}},{"ty":"tr","a":{"a":0,"k":[0,0],"ix":1},"s":{"a":0,"k":[100,100],"ix":3},"sk":{"a":0,"k":0,"ix":4},"p":{"a":0,"k":[-1.539,-1.166],"ix":2},"r":{"a":0,"k":0,"ix":6},"sa":{"a":0,"k":0,"ix":5},"o":{"a":0,"k":100,"ix":7}}]}],"ind":6},{"ty":4,"nm":"cir 3","sr":1,"st":0,"op":300.00001221925,"ip":0,"hd":false,"ddd":0,"bm":0,"hasMask":false,"ao":0,"ks":{"a":{"a":0,"k":[0,0,0],"ix":1},"s":{"a":1,"k":[{"o":{"x":0.775,"y":0},"i":{"x":0.206,"y":1},"s":[110,110,100],"t":0},{"o":{"x":0.665,"y":0},"i":{"x":0.274,"y":1},"s":[230,230,100],"t":34},{"o":{"x":0.167,"y":0},"i":{"x":0.206,"y":1},"s":[190,190,100],"t":60},{"o":{"x":0.665,"y":0},"i":{"x":0.274,"y":1},"s":[230,230,100],"t":71.289},{"o":{"x":0.167,"y":0},"i":{"x":0.206,"y":1},"s":[190,190,100],"t":82.576},{"o":{"x":0.665,"y":0},"i":{"x":0.274,"y":1},"s":[230,230,100],"t":97.628},{"o":{"x":0.167,"y":0},"i":{"x":0.274,"y":1},"s":[190,190,100],"t":118},{"s":[110,110,100],"t":134.000005457932}],"ix":6},"sk":{"a":0,"k":0},"p":{"a":0,"k":[126.539,126.166,0],"ix":2},"r":{"a":0,"k":0,"ix":10},"sa":{"a":0,"k":0},"o":{"a":0,"k":100,"ix":11}},"ef":[],"shapes":[{"ty":"gr","bm":0,"hd":false,"mn":"ADBE Vector Group","nm":"Ellipse 1","ix":1,"cix":2,"np":3,"it":[{"ty":"el","bm":0,"hd":false,"mn":"ADBE Vector Shape - Ellipse","nm":"Ellipse Path 1","d":1,"p":{"a":0,"k":[0,0],"ix":3},"s":{"a":0,"k":[99.367,99.367],"ix":2}},{"ty":"fl","bm":0,"hd":false,"mn":"ADBE Vector Graphic - Fill","nm":"Fill 1","c":{"a":0,"k":[0.1255,0.1529,0.1725,1],"ix":4},"r":1,"o":{"a":0,"k":5,"ix":5}},{"ty":"tr","a":{"a":0,"k":[0,0],"ix":1},"s":{"a":0,"k":[100,100],"ix":3},"sk":{"a":0,"k":0,"ix":4},"p":{"a":0,"k":[-1.539,-1.166],"ix":2},"r":{"a":0,"k":0,"ix":6},"sa":{"a":0,"k":0,"ix":5},"o":{"a":0,"k":100,"ix":7}}]}],"ind":7}],"v":"4.8.0","fr":29.9700012207031,"op":135.000005498663,"ip":0,"assets":[]}

Step 3: Create another file in the constants folder (constants/index.ts) and paste this code into it.

export const subjects = [
  "javascript",
  "python",
  "html",
  "css",
  "algorithms",
  "databases",
];

export const subjectsColors = {
  javascript: "#FFD166",
  python: "#9BE7FF",
  html: "#FF9AA2",
  css: "#B5EAD7",
  algorithms: "#CBAACB",
  databases: "#FFDAC1",
};

export const voices = {
  male: { casual: "2BJW5coyhAzSr8STdHbE", formal: "c6SfcYrb2t09NHXiT80T" },
  female: { casual: "ZIlrSGI4jZqobxRKprJz", formal: "sarah" },
};

export const recentSessions = [
  {
    id: "1",
    subject: "javascript",
    name: "Codey the JS Debugger",
    topic: "Understanding Closures",
    duration: 40,
    color: "#FFD166",
  },
  {
    id: "2",
    subject: "python",
    name: "Snakey the Python Guru",
    topic: "List Comprehensions & Lambdas",
    duration: 35,
    color: "#9BE7FF",
  },
  {
    id: "3",
    subject: "html",
    name: "Structo the Markup Architect",
    topic: "Semantic Tags & Accessibility",
    duration: 25,
    color: "#FF9AA2",
  },
  {
    id: "4",
    subject: "css",
    name: "Stylo the Flexbox Wizard",
    topic: "Flexbox vs Grid Layouts",
    duration: 30,
    color: "#B5EAD7",
  },
  {
    id: "5",
    subject: "algorithms",
    name: "Algo the Problem Solver",
    topic: "Binary Search Explained Visually",
    duration: 45,
    color: "#CBAACB",
  },
  {
    id: "6",
    subject: "databases",
    name: "Query the Data Whisperer",
    topic: "SQL Joins: Inner vs Outer",
    duration: 20,
    color: "#FFDAC1",
  },
];
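
The voices map is keyed first by voice and then by style. Here is a minimal sketch of a lookup that falls back to a default ID when either key is unknown. resolveVoiceId and FALLBACK_VOICE_ID are names I made up for illustration, and the voices object is repeated so the snippet runs on its own.

```typescript
// Same data as the `voices` constant above, repeated so this sketch is standalone.
const voices: Record<string, Record<string, string>> = {
  male: { casual: "2BJW5coyhAzSr8STdHbE", formal: "c6SfcYrb2t09NHXiT80T" },
  female: { casual: "ZIlrSGI4jZqobxRKprJz", formal: "sarah" },
};

// Assumption: "sarah" is the default, matching the fallback in lib/utils.ts.
const FALLBACK_VOICE_ID = "sarah";

// Hypothetical helper: returns the fallback when the voice or the style is
// unknown, instead of throwing when `voices[voice]` is undefined.
function resolveVoiceId(voice: string, style: string): string {
  return voices[voice]?.[style] ?? FALLBACK_VOICE_ID;
}
```

With this shape, an unexpected voice or style value degrades to the default voice rather than crashing the call setup.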

Step 4: Install the lottie-react package

npm install lottie-react

Step 5: Install the Vapi SDK

npm install @vapi-ai/web

You'll need a free Vapi account to get an API key.

Step 6: Initialize the Vapi client (lib/vapi.sdk.ts)

This sets up the Vapi SDK with your API key so your app can connect to Vapi's voice infrastructure:

import Vapi from "@vapi-ai/web";

export const vapi = new Vapi(process.env.NEXT_PUBLIC_VAPI_WEB_TOKEN!);

This initializes the core Vapi client, which handles real-time audio, streaming, and the connection to your AI assistant. Every voice interaction starts here.
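
One thing worth guarding against is a missing token, since the non-null assertion (!) only silences TypeScript and does nothing at runtime. Below is a small sketch; requireValue is a hypothetical helper, not part of the Vapi SDK. It takes the value directly rather than looking it up by name, because Next.js only inlines NEXT_PUBLIC_ variables that are referenced statically as process.env.NAME.

```typescript
// Hypothetical helper (not part of the Vapi SDK): fail fast with a readable
// error when a required environment value is missing or blank.
function requireValue(value: string | undefined, name: string): string {
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing environment variable: ${name}`);
  }
  return value;
}

// Intended use in lib/vapi.sdk.ts, shown as a comment so this block runs standalone:
//   const token = requireValue(
//     process.env.NEXT_PUBLIC_VAPI_WEB_TOKEN,
//     "NEXT_PUBLIC_VAPI_WEB_TOKEN",
//   );
//   export const vapi = new Vapi(token);
```

The trade-off is a hard failure at startup instead of a confusing connection error later, which I find easier to debug.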

Step 7: Create a lib/utils.ts file and paste this code into it.

import { clsx, type ClassValue } from "clsx";
import { twMerge } from "tailwind-merge";
import { subjectsColors, voices } from "@/constants";
import { CreateAssistantDTO } from "@vapi-ai/web/dist/api";

export function cn(...inputs: ClassValue[]) {
  return twMerge(clsx(inputs));
}

export const getSubjectColor = (subject: string) => {
  return subjectsColors[subject as keyof typeof subjectsColors];
};

export const configureAssistant = (voice: string, style: string) => {
  const voiceId = voices[voice as keyof typeof voices][
          style as keyof (typeof voices)[keyof typeof voices]
          ] || "sarah";

  const vapiAssistant: CreateAssistantDTO = {
    name: "Companion",
    firstMessage:
        "Hello, let's start the session. Today we'll be talking about {{topic}}.",
    transcriber: {
      provider: "deepgram",
      model: "nova-3",
      language: "en",
    },
    voice: {
      provider: "11labs",
      voiceId: voiceId,
      stability: 0.4,
      similarityBoost: 0.8,
      speed: 1,
      style: 0.5,
      useSpeakerBoost: true,
    },
    model: {
      provider: "openai",
      model: "gpt-4",
      messages: [
        {
          role: "system",
          content: `You are a highly knowledgeable tutor teaching a real-time voice session with a student. Your goal is to teach the student about the topic and subject.

                    Tutor Guidelines:
                    Stick to the given topic - {{ topic }} and subject - {{ subject }} and teach the student about it.
                    Keep the conversation flowing smoothly while maintaining control.
                    From time to time make sure that the student is following you and understands you.
                    Break down the topic into smaller parts and teach the student one part at a time.
                    Keep your style of conversation {{ style }}.
                    Keep your responses short, like in a real voice conversation.
                    Do not include any special characters in your responses - this is a voice conversation.
              `,
        },
      ],
    },
    clientMessages: [],
    serverMessages: [],
  };
  return vapiAssistant;
};
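
The {{topic}}, {{subject}}, and {{style}} placeholders are filled in from the variableValues passed when the call starts. The substitution happens on Vapi's side; the sketch below only illustrates the idea, and fillTemplate is a hypothetical helper, not a Vapi API.

```typescript
// Illustration only: Vapi performs this substitution for you.
// `fillTemplate` is a hypothetical helper written for this sketch.
function fillTemplate(template: string, values: Record<string, string>): string {
  // Matches both {{name}} and {{ name }}; unknown names are left untouched.
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : match,
  );
}

const firstMessage =
  "Hello, let's start the session. Today we'll be talking about {{topic}}.";

// Same topic as the demo values used later in app/page.tsx.
const greeting = fillTemplate(firstMessage, { topic: "React and Typescript" });
console.log(greeting);
```

Leaving unknown placeholders in place is a deliberate choice here: a missing variable shows up verbatim in the output instead of silently turning into an empty string.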

Step 8: Create your assistant

Paste this code into your app/page.tsx file:

'use client';

import {useEffect, useRef, useState} from 'react'
import {cn, configureAssistant, getSubjectColor} from "@/lib/utils";
import {vapi} from "@/lib/vapi.sdk";
import Image from "next/image";
import Lottie, {LottieRefCurrentProps} from "lottie-react";
import soundwaves from '@/constants/soundwaves.json'
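
// Minimal local types for the transcript messages this page handles.
// Assumption: the field names follow how the messages are used below.
interface SavedMessage {
    role: 'user' | 'system' | 'assistant';
    content: string;
}

interface Message {
    type: string;
    role: SavedMessage['role'];
    transcriptType?: 'partial' | 'final';
    transcript: string;
}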

enum CallStatus {
    INACTIVE = 'INACTIVE',
    CONNECTING = 'CONNECTING',
    ACTIVE = 'ACTIVE',
    FINISHED = 'FINISHED',
}

const Page = () => {

    // Demo details
    const subject = "javascript"
    const topic = "React and Typescript"
    const name = "Better Call Saul"
    const style = "casual"
    const voice = "male"
    const userName = "Shola - student"
    // next/image needs a leading slash for files served from /public
    const userImage = "/images/me.png"

    const [callStatus, setCallStatus] = useState<CallStatus>(CallStatus.INACTIVE);
    const [isSpeaking, setIsSpeaking] = useState(false);
    const [isMuted, setIsMuted] = useState(false);
    const [messages, setMessages] = useState<SavedMessage[]>([]);

    const lottieRef = useRef<LottieRefCurrentProps>(null);

    useEffect(() => {
        if(lottieRef) {
            if(isSpeaking) {
                lottieRef.current?.play()
            } else {
                lottieRef.current?.stop()
            }
        }
    }, [isSpeaking, lottieRef])

    useEffect(() => {
        const onCallStart = () => setCallStatus(CallStatus.ACTIVE);

        const onCallEnd = () => {
            setCallStatus(CallStatus.FINISHED);
        }

        const onMessage = (message: Message) => {
            if(message.type === 'transcript' && message.transcriptType === 'final') {
                const newMessage= { role: message.role, content: message.transcript}
                setMessages((prev) => [newMessage, ...prev])
            }
        }

        const onSpeechStart = () => setIsSpeaking(true);
        const onSpeechEnd = () => setIsSpeaking(false);

        const onError = (error: Error) => console.log('Error', error);

        vapi.on('call-start', onCallStart);
        vapi.on('call-end', onCallEnd);
        vapi.on('message', onMessage);
        vapi.on('error', onError);
        vapi.on('speech-start', onSpeechStart);
        vapi.on('speech-end', onSpeechEnd);

        return () => {
            vapi.off('call-start', onCallStart);
            vapi.off('call-end', onCallEnd);
            vapi.off('message', onMessage);
            vapi.off('error', onError);
            vapi.off('speech-start', onSpeechStart);
            vapi.off('speech-end', onSpeechEnd);
        }
    }, []);

    const toggleMicrophone = () => {
        const isMuted = vapi.isMuted();
        vapi.setMuted(!isMuted);
        setIsMuted(!isMuted)
    }

    const handleCall = async () => {
        setCallStatus(CallStatus.CONNECTING)

        const assistantOverrides = {
            variableValues: { subject, topic, style },
            clientMessages: ["transcript"],
            serverMessages: [],
        }

        // @ts-expect-error - The configureAssistant function's return type doesn't match the expected type, but it works at runtime
        vapi.start(configureAssistant(voice, style), assistantOverrides)
    }

    const handleDisconnect = () => {
        setCallStatus(CallStatus.FINISHED)
        vapi.stop()
    }

    return (
        <section className="flex flex-col h-[70vh]">
            <section className="flex gap-8 max-sm:flex-col">
                <div className="companion-section">
                    <div className="companion-avatar" style={{ backgroundColor: getSubjectColor(subject)}}>
                        <div
                            className={
                            cn(
                                'absolute transition-opacity duration-1000', callStatus === CallStatus.FINISHED || callStatus === CallStatus.INACTIVE ? 'opacity-100' : 'opacity-0', callStatus === CallStatus.CONNECTING && 'opacity-100 animate-pulse'
                            )
                        }>
                            <Image src={`/icons/${subject}.svg`} alt={subject} width={150} height={150} className="max-sm:w-fit" />
                        </div>

                        <div className={cn('absolute transition-opacity duration-1000', callStatus === CallStatus.ACTIVE ? 'opacity-100': 'opacity-0')}>
                            <Lottie
                                lottieRef={lottieRef}
                                animationData={soundwaves}
                                autoplay={false}
                                className="companion-lottie"
                            />
                        </div>
                    </div>
                    <p className="font-bold text-2xl">{name}</p>
                </div>

                <div className="user-section">
                    <div className="user-avatar">
                        <Image src={userImage} alt={userName} width={130} height={130} className="rounded-lg" />
                        <p className="font-bold text-2xl">
                            {userName}
                        </p>
                    </div>
                    <button className="btn-mic" onClick={toggleMicrophone} disabled={callStatus !== CallStatus.ACTIVE}>
                        <Image src={isMuted ? '/icons/mic-off.svg' : '/icons/mic-on.svg'} alt="mic" width={36} height={36} />
                        <p className="max-sm:hidden">
                            {isMuted ? 'Turn on microphone' : 'Turn off microphone'}
                        </p>
                    </button>
                    <button className={cn('rounded-lg py-2 cursor-pointer transition-colors w-full text-white', callStatus ===CallStatus.ACTIVE ? 'bg-red-700' : 'bg-primary', callStatus === CallStatus.CONNECTING && 'animate-pulse')} onClick={callStatus === CallStatus.ACTIVE ? handleDisconnect : handleCall}>
                        {callStatus === CallStatus.ACTIVE
                        ? "End Session"
                        : callStatus === CallStatus.CONNECTING
                            ? 'Connecting'
                        : 'Start Session'
                        }
                    </button>
                </div>
            </section>

            <section className="transcript">
                <div className="transcript-message no-scrollbar">
                    {messages.map((message, index) => {
                        if(message.role === 'assistant') {
                            return (
                                <p key={index} className="max-sm:text-sm">
                                    {
                                        name
                                            .split(' ')[0]
                                            .replace(/[.,]/g, '')
                                    }: {message.content}
                                </p>
                            )
                        } else {
                           return <p key={index} className="text-primary max-sm:text-sm">
                                {userName}: {message.content}
                            </p>
                        }
                    })}
                </div>

                <div className="transcript-fade" />
            </section>
        </section>
    )
}

export default Page
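
The page moves through four call states. As a rough sketch, the same transitions can be written as a pure function, which makes them easy to test outside React. nextStatus and the event names are hypothetical, and unlike the page, this version ignores a start click while a call is already connecting or active.

```typescript
type CallStatus = "INACTIVE" | "CONNECTING" | "ACTIVE" | "FINISHED";

// Hypothetical event names: two come from button clicks, two from Vapi events.
type CallEvent = "start-clicked" | "call-start" | "call-end" | "stop-clicked";

// Pure transition function mirroring the state changes in the page above.
function nextStatus(status: CallStatus, event: CallEvent): CallStatus {
  switch (event) {
    case "start-clicked":
      // Only begin connecting from an idle state.
      return status === "INACTIVE" || status === "FINISHED" ? "CONNECTING" : status;
    case "call-start":
      return "ACTIVE";
    case "call-end":
    case "stop-clicked":
      return "FINISHED";
  }
}
```

In the component, this could sit behind setCallStatus((s) => nextStatus(s, event)) so every handler goes through one place.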

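Why Vapi?

Without Vapi, you'd have to manage:

  • WebSockets
  • Speech-to-text (STT)
  • Text-to-speech (TTS)
  • Audio playback and streaming synchronization

Vapi handles all of this in a few lines of code. For voice-first AI apps, it feels like magic.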

How the voice assistant works (step by step)

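  1. The user clicks the call button
  2. Vapi opens a real-time voice stream
  3. The user asks a question
  4. Vapi transcribes it and sends it to OpenAI (GPT-4)
  5. OpenAI returns a response
  6. Vapi converts the response to speech
  7. The browser plays the response to the user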
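
The steps above can be sketched as a three-stage pipeline. Everything here is a stand-in: the stage names, the VoicePipeline type, and the stub implementations are hypothetical, and plain strings take the place of audio. Vapi runs the real version of this for you.

```typescript
// A stage takes one input and asynchronously produces one output.
type Stage<In, Out> = (input: In) => Promise<Out>;

// Hypothetical shape of one conversational turn; strings stand in for audio.
interface VoicePipeline {
  transcribe: Stage<string, string>; // speech-to-text
  complete: Stage<string, string>;   // LLM answer for the transcribed question
  synthesize: Stage<string, string>; // text-to-speech
}

// Runs a single turn: audio in, question text, answer text, audio out.
async function handleTurn(pipeline: VoicePipeline, audioIn: string): Promise<string> {
  const question = await pipeline.transcribe(audioIn);
  const answer = await pipeline.complete(question);
  return pipeline.synthesize(answer);
}

// Stub stages so the flow can be exercised without any real service.
const stubPipeline: VoicePipeline = {
  transcribe: async (audio) => audio.replace("audio:", ""),
  complete: async (question) => `You asked: ${question}`,
  synthesize: async (text) => `speech:${text}`,
};
```

Calling handleTurn(stubPipeline, "audio:what is a closure") walks one question through all three stages in order.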

Local setup

Prerequisites:

  • Node.js 18+
  • A Vapi API key

.env.local file:

NEXT_PUBLIC_VAPI_WEB_TOKEN=your_vapi_web_token
VAPI_SECRET_KEY=your_vapi_secret_key

Run it:

npm install
npm run dev

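Open http://localhost:3000 and press the button to start talking to GPT-4.

Key takeaways

Learnflow AI proves one thing:

Talking to an AI flows far better than typing to it.

The Vapi + GPT-4 combination lets you build capable assistants with:

  • Real-time voice conversations
  • A zero-friction UI
  • High retention and comprehension

And you can build the entire MVP in a weekend.

Coming up in Part 2

Next, we'll go deeper and make it more personal.

Try the MVP or build your own version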

GitHub: github.com/sholajgede/learnflow_ai

If you'd like to set up Kinde Auth before moving on to Part 2, check out this article.

Source: https://dev.to/sholajgede/i-wanted-to-learn-faster-so-i-built-a-voice-ai-tutor-with-gpt-4-in-a-weekend-3ke0