13 Hidden Open-Source Libraries to Become an AI Wizard 🧙‍♂️🪄

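I have been building AI applications for the past 4 years and have contributed to major AI tooling platforms.

Over this period, I have used many tools and frameworks to build:

AI agents that actually work in the real world.
Tools for AI agents.
End-to-end RAG applications.

I have curated a coveted list of open-source tools and frameworks that will help you build robust and reliable AI applications. 🔥

Feel free to explore their GitHub repositories, contribute to your favourites, and support them by starring the repositories.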

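1. Composio 👑 - Build reliable agents 10x faster

I have built many agents, and honestly, while creating them is easy, getting them to work correctly is a different ball game altogether.

Building efficient AI agents that actually work requires efficient toolsets. This is where Composio comes in.

Composio lets you augment your AI agents with robust tools and integrations to accomplish AI workflows.

It provides native support for Python and JavaScript.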

Python

Get started with the following pip command.

pip install composio-core

Add a GitHub integration.

composio add github

Composio handles user authentication and authorization for you.
Here is how you can use the GitHub integration to star a repository.

from openai import OpenAI
from composio_openai import ComposioToolSet, App, Action

openai_client = OpenAI(api_key="******OPENAIKEY******")

# Initialise the Composio Tool Set
composio_toolset = ComposioToolSet(api_key="******COMPOSIO_API_KEY******")

# Get GitHub tools that are pre-configured
actions = composio_toolset.get_actions(actions=[Action.GITHUB_ACTIVITY_STAR_REPO_FOR_AUTHENTICATED_USER])

# Describe the task for the agent
my_task = "Star a repo ComposioHQ/composio on GitHub"

# Create a chat completion request to decide on the action
response = openai_client.chat.completions.create(
    model="gpt-4-turbo",
    tools=actions,  # Passing actions we fetched earlier.
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": my_task},
    ],
)

Run this Python script to execute the given instruction using the agent.
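The chat completion in the Python example only returns the model's decision about which tool to call; something still has to execute that call. Composio's toolset takes care of this for you, so the following is only a minimal, library-free sketch of the general idea, not Composio's implementation. The `star_repo` function, the `TOOLS` registry, and the hand-written `tool_call` dictionary are hypothetical stand-ins.

```python
import json

# Hypothetical local tool; a real integration would call the GitHub API here.
def star_repo(owner: str, repo: str) -> dict:
    return {"starred": f"{owner}/{repo}"}

# Registry mapping tool names (as the model sees them) to Python callables.
TOOLS = {"star_repo": star_repo}

def handle_tool_call(tool_call: dict) -> dict:
    """Look up the requested tool and run it with the model-supplied arguments."""
    name = tool_call["function"]["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    # In the OpenAI tool-call format, arguments arrive as a JSON string.
    arguments = json.loads(tool_call["function"]["arguments"])
    return TOOLS[name](**arguments)

# A hand-written tool call shaped like the ones in an OpenAI chat completion.
tool_call = {
    "function": {
        "name": "star_repo",
        "arguments": json.dumps({"owner": "ComposioHQ", "repo": "composio"}),
    }
}
result = handle_tool_call(tool_call)
print(result)
```

Keeping the registry explicit means the model can only trigger functions you have deliberately exposed.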

JavaScript

You can install it using npm, yarn, or pnpm.

npm install composio-core

Define a method to let the user connect their GitHub account.

import { OpenAI } from "openai";
import { OpenAIToolSet } from "composio-core";

const toolset = new OpenAIToolSet({
  apiKey: process.env.COMPOSIO_API_KEY,
});

async function setupUserConnectionIfNotExists(entityId) {
  const entity = await toolset.client.getEntity(entityId);
  const connection = await entity.getConnection('github');

  if (!connection) {
      // If this entity/user hasn't connected the account yet, start the connection flow
      const newConnection = await entity.initiateConnection('github');
      console.log("Log in via: ", newConnection.redirectUrl);
      return newConnection.waitUntilActive(60);
  }

  return connection;
}

Add the required tools to the OpenAI SDK and pass the entity name to the executeAgent function.

async function executeAgent(entityName) {
  const entity = await toolset.client.getEntity(entityName)
  await setupUserConnectionIfNotExists(entity.id);

  const tools = await toolset.get_actions({ actions: ["github_activity_star_repo_for_authenticated_user"] }, entity.id);
  const instruction = "Star a repo ComposioHQ/composio on GitHub"

  const client = new OpenAI({ apiKey: process.env.OPEN_AI_API_KEY })
  const response = await client.chat.completions.create({
      model: "gpt-4-turbo",
      messages: [{
          role: "user",
          content: instruction,
      }],
      tools: tools,
      tool_choice: "auto",
  })

  console.log(response.choices[0].message.tool_calls);
  await toolset.handle_tool_call(response, entity.id);
}

executeAgent("joey")

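Execute the code and let the agent do the work for you.

Composio works with popular frameworks such as LangChain, LlamaIndex, and CrewAI.

For more information, visit the official docs; for more complex examples, see the examples section of the repository.

Star the Composio.dev repository ⭐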

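2. Julep - A framework for building stateful agents

Developing AI applications, especially those that need long-term memory, presents significant challenges.

Julep addresses this problem. It is an open-source framework for building production-ready stateful AI agents.

It provides a built-in state management system that helps with efficient context storage and retrieval.

Context storage helps maintain conversation continuity, so interactions with the AI stay consistent and contextually relevant over time.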
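To make "context storage and retrieval" concrete, here is a minimal in-memory sketch of a per-session message history with a sliding window. It is not how Julep implements state management; the `SessionStore` class and its `remember` and `recall` methods are hypothetical names chosen for illustration, and a production system would persist the data rather than keep it in a dictionary.

```python
from collections import defaultdict

class SessionStore:
    """Keeps an ordered message history per session id (in memory only)."""

    def __init__(self, max_messages: int = 20):
        self.max_messages = max_messages
        self._history = defaultdict(list)

    def remember(self, session_id: str, role: str, content: str) -> None:
        history = self._history[session_id]
        history.append({"role": role, "content": content})
        # Drop the oldest messages once the window is full.
        if len(history) > self.max_messages:
            del history[: len(history) - self.max_messages]

    def recall(self, session_id: str) -> list:
        # Return a copy so callers cannot mutate the stored history.
        return list(self._history.get(session_id, []))

store = SessionStore(max_messages=3)
store.remember("s1", "user", "hey")
store.remember("s1", "assistant", "hi, what's up?")
store.remember("s1", "user", "what do u think of Starbucks?")
store.remember("s1", "assistant", "ugh, so basic.")
context = store.recall("s1")
print(context)
```

The recalled list can be passed as the `messages` of the next model call, which is what keeps a conversation continuous across turns.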

Get started with the following pip command.

pip install julep

Here is how it works.

from julep import Client
from pprint import pprint
import textwrap
import os

base_url = os.environ.get("JULEP_API_URL")
api_key = os.environ.get("JULEP_API_KEY")

client = Client(api_key=api_key, base_url=base_url)

# Create an agent
agent = client.agents.create(
    name="Jessica",
    model="gpt-4",
    tools=[]    # Tools defined here
)
# Create a user
user = client.users.create(
    name="Anon",
    about="Average nerdy tech bro/girl spending 8 hours a day on a laptop",
)
#create a session
situation_prompt = """You are Jessica. You're a stuck-up Cali teenager. 
You basically complain about everything. You live in Bel-Air, Los Angeles and 
drag yourself to Curtis High School when necessary.
"""
session = client.sessions.create(
    user_id=user.id, agent_id=agent.id, situation=situation_prompt
)
#start a conversation

user_msg = "hey. what do u think of Starbucks?"
response = client.sessions.chat(
    session_id=session.id,
    messages=[
        {
            "role": "user",
            "content": user_msg,
            "name": "Anon",
        }
    ],
    recall=True,
    remember=True,
)

print("\n".join(textwrap.wrap(response.response[0][0].content, width=100)))

Julep also supports JavaScript. Check out the documentation for more.

Star the Julep repository ⭐

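3. E2B - Code interpreting for AI apps

If I am building an AI app with code execution capabilities, such as an AI tutor or an AI data analyst, E2B's Code Interpreter is my go-to tool.

E2B Sandbox is a secure cloud environment for AI agents and apps.

It lets AI run safely for a long time using the same tools as humans, such as GitHub repositories and cloud browsers.

E2B offers native Code Interpreter SDKs for Python and JavaScript/TypeScript.

The Code Interpreter SDK lets you run AI-generated code in a secure small VM (the E2B sandbox) for AI code execution. Inside the sandbox is a Jupyter server that you can control from the SDK.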
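The Jupyter-style model matters because state survives between cells: a variable set in one call is visible in the next. The sketch below illustrates only that stateful-cell behaviour in plain Python. It is not E2B's SDK and it is not a sandbox: `exec` and `eval` run in the current process with no isolation, which is exactly the risk a tool like E2B exists to remove. The `Notebook` class and `exec_cell` method are hypothetical names.

```python
import ast

class Notebook:
    """Runs code cells against one shared namespace, like a notebook kernel.

    WARNING: exec/eval run in the current process with no isolation.
    """

    def __init__(self):
        self.namespace = {}

    def exec_cell(self, source: str):
        tree = ast.parse(source)
        last_value = None
        # If the cell ends with an expression, evaluate it separately
        # so its value can be returned, as notebooks do.
        if tree.body and isinstance(tree.body[-1], ast.Expr):
            last_expr = ast.Expression(tree.body.pop().value)
            exec(compile(tree, "<cell>", "exec"), self.namespace)
            last_value = eval(compile(last_expr, "<cell>", "eval"), self.namespace)
        else:
            exec(compile(tree, "<cell>", "exec"), self.namespace)
        return last_value

notebook = Notebook()
notebook.exec_cell("x = 1")
value = notebook.exec_cell("x += 1; x")
print(value)
```

Never run model-generated code this way outside of a throwaway environment; the point is only to show why a persistent kernel is a convenient execution model.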

Get started with E2B using the following command.

npm i @e2b/code-interpreter

Execute a program.

import { CodeInterpreter } from '@e2b/code-interpreter'

const sandbox = await CodeInterpreter.create()
await sandbox.notebook.execCell('x = 1')

const execution = await sandbox.notebook.execCell('x+=1; x')
console.log(execution.text)  // outputs 2

await sandbox.close()

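For more on how to work with E2B, visit the official documentation.

Star the E2B repository ⭐

4. Camel-ai - Build communicative AI systems

Solving for scalable multi-agent collaborative systems can unlock a lot of potential in building AI applications.

Camel is well positioned for this. It is an open-source framework offering a scalable approach to studying the cooperative behaviours and capabilities of multi-agent systems.

If you intend to build a multi-agent system, Camel can be one of the best choices available in the open-source scene.

Get started by installing it with pip.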

pip install camel-ai

Here is how to use Camel.

from camel.messages import BaseMessage as bm
from camel.agents import ChatAgent

sys_msg = bm.make_assistant_message(
    role_name='stone',
    content='you are a curious stone wondering about the universe.')

#define agent 
agent = ChatAgent(
    system_message=sys_msg,
    message_window_size=10,    # [Optional] the length of chat memory
    )

# Define a user message
usr_msg = bm.make_user_message(
    role_name='prof. Claude Shannon',
    content='what is information in your mind?')

# Sending the message to the agent
response = agent.step(usr_msg)

# Check the response (just for illustrative purposes)
print(response.msgs[0].content)

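Voila, you have your first AI agent.

For more information, refer to the official documentation.

Star the camel-ai repository ⭐

5. CopilotKit - Build AI copilots for React apps

Look no further if you want to add AI capabilities to your existing React application. CopilotKit lets you use GPT models to automate interaction with your application's front end and back end.

It is a ready-made copilot that you can integrate with your application or any code you can access (OSS).

It offers React components such as text areas, popups, sidebars, and chatbots to augment any application with AI capabilities.

Get started with CopilotKit using the following command.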

npm i @copilotkit/react-core @copilotkit/react-ui

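A CopilotKit component must wrap all components that interact with CopilotKit. It is also a good idea to start with CopilotSidebar (you can switch to a different UI provider later).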

"use client";
import { CopilotKit } from "@copilotkit/react-core";
import { CopilotSidebar } from "@copilotkit/react-ui";
import "@copilotkit/react-ui/styles.css";

export default function RootLayout({children}) {
  return (
    <CopilotKit publicApiKey=" the API key or self-host (see below)">
      <CopilotSidebar>
        {children}
      </CopilotSidebar>
    </CopilotKit>
  );
}

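Check out the documentation for more information.

Star the CopilotKit repository ⭐

6. Aider - The AI pair programmer

Imagine having a pair programmer who is always helpful and never annoying. Well, now you do!

Aider is an AI-powered pair programmer that can start a project, edit files, work with an existing Git repository, and more, all from the terminal.

It works with leading LLMs such as GPT-4o, Claude 3.5 Sonnet, DeepSeek Coder, and Llama 70B.

You can get started quickly like this: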

pip install aider-chat

# Change directory into a git repo
cd /to/your/git/repo

# Work with Claude 3.5 Sonnet on your repo
export ANTHROPIC_API_KEY=your-key-goes-here
aider

# Work with GPT-4o on your repo
export OPENAI_API_KEY=your-key-goes-here
aider

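For more details, see the installation instructions and other documentation.

Star the Aider repository ⭐

7. Haystack - Build composable RAG pipelines

There are plenty of frameworks for building AI pipelines, but if I want to integrate production-ready end-to-end search pipelines into my application, Haystack is my go-to.

Whether it is RAG, Q&A, or semantic search, Haystack's highly composable pipelines make development, maintenance, and deployment a breeze.

Haystack's clean and modular approach is what sets it apart.
It lets you effortlessly integrate rankers, vector stores, and parsers into new or existing pipelines, making it easy to turn prototypes into production-ready solutions.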
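As a rough illustration of what "composable" means here, the sketch below chains small, swappable components that each read from and add to a shared dictionary. It does not use Haystack and does not mirror its API; `MiniPipeline`, `retriever`, `prompt_builder`, and the two toy documents are all hypothetical, and the word-overlap retrieval stands in for a real retriever.

```python
from typing import Callable, List, Tuple

Component = Callable[[dict], dict]

class MiniPipeline:
    """Runs named components in order; each one reads and extends a shared dict."""

    def __init__(self):
        self.components: List[Tuple[str, Component]] = []

    def add(self, name: str, component: Component) -> "MiniPipeline":
        self.components.append((name, component))
        return self  # allow chaining

    def run(self, data: dict) -> dict:
        for _name, component in self.components:
            data = {**data, **component(data)}
        return data

DOCS = [
    "Leonardo da Vinci died in 1519 at the age of 67.",
    "The Mona Lisa hangs in the Louvre.",
]

def retriever(data: dict) -> dict:
    # Toy retrieval: pick the document sharing the most words with the query.
    query_words = set(data["query"].lower().split())
    best = max(DOCS, key=lambda d: len(query_words & set(d.lower().split())))
    return {"context": best}

def prompt_builder(data: dict) -> dict:
    return {"prompt": f"Context: {data['context']}\nQuestion: {data['query']}"}

pipeline = MiniPipeline().add("retriever", retriever).add("prompt_builder", prompt_builder)
result = pipeline.run({"query": "What age was Leonardo when he died?"})
print(result["prompt"])
```

Because every component has the same shape, replacing the toy retriever with a vector-store lookup would not require touching the prompt builder.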

Haystack is a Python-only framework; you can install it using pip.

pip install haystack-ai

Now, build your first RAG pipeline with Haystack components.

import os

from haystack import Pipeline, PredefinedPipeline
import urllib.request

os.environ["OPENAI_API_KEY"] = "Your OpenAI API Key"
urllib.request.urlretrieve("https://www.gutenberg.org/cache/epub/7785/pg7785.txt", "davinci.txt")  

indexing_pipeline =  Pipeline.from_template(PredefinedPipeline.INDEXING)
indexing_pipeline.run(data={"sources": ["davinci.txt"]})

rag_pipeline =  Pipeline.from_template(PredefinedPipeline.RAG)

query = "How old was he when he died?"
result = rag_pipeline.run(data={"prompt_builder": {"query":query}, "text_embedder": {"text": query}})
print(result["llm"]["replies"][0])

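For more tutorials and concepts, check out the documentation.

Star the Haystack repository ⭐

8. Pgvectorscale - The fastest vector database

Modern RAG apps are incomplete without vector databases. A vector database stores documents (text, images) as embeddings, enabling users to search for semantically similar documents.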
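The core operation behind such a search is ranking stored vectors by their similarity to a query vector. Here is a minimal brute-force sketch using cosine similarity. The three-dimensional vectors and document ids are made up for illustration; real embeddings have hundreds or thousands of dimensions, and a vector database replaces this linear scan with an index.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional embeddings keyed by document id.
index = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.3, 0.1],
    "doc_taxes": [0.0, 0.2, 0.9],
}

def search(query_vector, k=2):
    """Return the k document ids most similar to the query vector."""
    scored = [(cosine_similarity(query_vector, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

top = search([1.0, 0.2, 0.0])
print(top)
```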

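Pgvectorscale is an extension that builds on pgvector, the vector extension for PostgreSQL, and it integrates seamlessly with existing Postgres databases.

If you are building an application with vector stores, this is a no-brainer. Pgvectorscale reportedly outperforms Pinecone's storage-optimized index (s1), at 75% lower cost.

You can install it from source, with a package manager such as Yum, Homebrew, or apt, or by using a Docker container.

To get started, compile and install it.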

# install prerequisites
## rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
## pgrx
cargo install --locked cargo-pgrx
cargo pgrx init --pg16 pg_config

#download, build and install pgvectorscale
cd /tmp
git clone --branch <version> https://github.com/timescale/pgvectorscale
cd pgvectorscale/pgvectorscale
cargo pgrx install --release

Connect to your database:

psql -d "postgres://<username>:<password>@<host>:<port>/<database-name>"

Create the pgvectorscale extension:

CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;

CASCADE automatically installs pgvector.

Create a table with an embedding column. For example:

CREATE TABLE IF NOT EXISTS document_embedding  (
    id BIGINT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    metadata JSONB,
    contents TEXT,
    embedding VECTOR(1536)
);

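For more on how to use it, check out the repository.

Star the Pgvectorscale repository ⭐

9. GPTCache - Semantic caching for AI apps

LLMs are expensive.

If you are building an app that requires long, extended conversations with chat models and you do not want to max out your credit card, you need caching.

However, traditional caching is of no use here, because it relies on exact key matches and users rarely phrase the same question identically. This is where GPTCache comes into the picture.

It is a semantic caching tool from Zilliz, the parent organization of the Milvus vector store.

It lets you store conversations in your preferred vector store.
Before sending a query to the LLM, it searches the vector store; if there is a hit, it fetches the cached response. Otherwise, it routes the request to the model.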
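The lookup-then-route flow described above can be sketched in a few lines. This is not GPTCache's API: the `SemanticCache` class, the word-count `embed` function, the 0.8 threshold, and `fake_llm` are all assumptions made for illustration. A real semantic cache would use an embedding model and a vector store instead of word counts and a Python list.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": word counts. A real cache would use an embedding model.
    return Counter(text.lower().split())

def similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, llm, threshold: float = 0.8):
        self.llm = llm
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def ask(self, query: str) -> str:
        vector = embed(query)
        for cached_vector, answer in self.entries:
            if similarity(vector, cached_vector) >= self.threshold:
                return answer  # cache hit: skip the model call
        answer = self.llm(query)  # cache miss: route to the model
        self.entries.append((vector, answer))
        return answer

calls = []

def fake_llm(query: str) -> str:
    # Stand-in for a paid model call; records every invocation.
    calls.append(query)
    return f"answer #{len(calls)}"

cache = SemanticCache(fake_llm)
first = cache.ask("what is the capital of France")
second = cache.ask("what is the capital of france ?")
third = cache.ask("how do I bake bread")
print(first, second, third, len(calls))
```

The threshold is the main tuning knob: too low and users get answers to questions they did not ask, too high and the cache rarely hits.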

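For more information, visit the official documentation page.

Star the GPTCache repository ⭐

10. Mem0 (EmbedChain) - Build personalized LLM apps

Mem0 provides a smart, self-improving memory layer for large language models.

It lets you add persistent memory for users, agents, and sessions.
If you are building a chatbot or Q&A system on custom data, consider Mem0.

Get started with Mem0 using pip.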

pip install mem0ai

Here is how to use Mem0 to add a memory layer to large language models.

from mem0 import Memory

# Initialize Mem0
m = Memory()

# Store a memory from any unstructured text
result = m.add("I am working on improving my tennis skills. Suggest some online courses.", user_id="Alice", metadata={"category": "hobbies"})
print(result)
# Created memory: Improving her tennis skills. Looking for online suggestions.

# Retrieve memories
all_memories = m.get_all()
print(all_memories)

# Search memories
related_memories = m.search(query="What are Alice's hobbies?", user_id="Alice")
print(related_memories)

# Update a memory
result = m.update(memory_id="m1", data="Likes to play tennis on weekends")
print(result)

# Get memory history
history = m.history(memory_id="m1")
print(history)

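For more, refer to the official documentation.

Star the Mem0 (Embedchain) repository ⭐

11. FastEmbed - Embed documents faster

Execution speed is paramount in software development, and it matters even more when building an AI application.

Embedding generation can often take a long time, slowing down the entire pipeline. However, this does not have to be the case.

FastEmbed from Qdrant is a fast, lightweight Python library built for embedding generation.

It uses the ONNX runtime instead of PyTorch, making it faster. It also supports most state-of-the-art open-source embedding models.

To get started with FastEmbed, install it using pip.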

pip install fastembed

# or with GPU support

pip install fastembed-gpu

Here is how to create embeddings of documents.

from fastembed import TextEmbedding
from typing import List

# Example list of documents
documents: List[str] = [
    "This is built to be faster and lighter than other embedding libraries, e.g. Transformers, Sentence-Transformers, etc.",
    "FastEmbed is supported by and maintained by Qdrant."
]

# This will trigger the model download and initialization
embedding_model = TextEmbedding()
print("The model BAAI/bge-small-en-v1.5 is ready to use.")

embeddings_generator = embedding_model.embed(documents)  # reminder this is a generator
# You can also convert the generator to a list, and that to a Numpy array
embeddings_list = list(embedding_model.embed(documents))
len(embeddings_list[0])  # Vector of 384 dimensions

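Check out the repository for more information.

Star the FastEmbed repository ⭐

12. Instructor - Structured data extraction from LLMs

If you have worked with LLM outputs, you know it can be challenging to validate structured responses.

Instructor is an open-source tool that streamlines the validation, retrying, and streaming of LLM outputs.

It uses Pydantic for data validation in Python and Zod in JS/TS, and it supports various model providers beyond OpenAI.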
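The validate-and-retry pattern can be sketched with the standard library alone. This is not Instructor's implementation and it uses hand-written checks rather than Pydantic; `parse_user`, `extract_user`, and `fake_llm` are hypothetical names, and the stand-in model simply returns one malformed reply followed by a valid one.

```python
import json
from dataclasses import dataclass

@dataclass
class User:
    name: str
    age: int

def parse_user(raw: str) -> User:
    """Parse and validate a JSON string; raise ValueError if it does not fit."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("name"), str):
        raise ValueError("name must be a string")
    age = data.get("age")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(age, int) or isinstance(age, bool):
        raise ValueError("age must be an integer")
    return User(name=data["name"], age=age)

def extract_user(llm, prompt: str, max_retries: int = 2) -> User:
    last_error = None
    for _ in range(max_retries + 1):
        # On a retry, feed the validation error back so the model can correct itself.
        full_prompt = prompt if last_error is None else f"{prompt}\nFix this error: {last_error}"
        try:
            return parse_user(llm(full_prompt))
        except ValueError as error:
            last_error = error
    raise ValueError(f"no valid output after {max_retries + 1} attempts: {last_error}")

# Stand-in model: returns a malformed answer first, then a valid one.
replies = iter(['{"name": "Jason Liu", "age": "thirty"}', '{"name": "Jason Liu", "age": 30}'])

def fake_llm(prompt: str) -> str:
    return next(replies)

user = extract_user(fake_llm, "Jason Liu is 30 years old")
print(user)
```

Passing the validation error back into the prompt is the design choice that makes retries useful rather than a blind repeat of the same request.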

Get started with Instructor using the following command.

npm i @instructor-ai/instructor zod openai

Now, here is how you can extract structured data from LLM responses.


import Instructor from "@instructor-ai/instructor";
import OpenAI from "openai"
import { z } from "zod"

const oai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY ?? undefined,
  organization: process.env.OPENAI_ORG_ID ?? undefined
})

const client = Instructor({
  client: oai,
  mode: "TOOLS"
})

const UserSchema = z.object({
  // Description will be used in the prompt
  age: z.number().describe("The age of the user"), 
  name: z.string()
})

// User will be of type z.infer<typeof UserSchema>
const user = await client.chat.completions.create({
  messages: [{ role: "user", content: "Jason Liu is 30 years old" }],
  model: "gpt-3.5-turbo",
  response_model: { 
    schema: UserSchema, 
    name: "User"
  }
})

console.log(user)
// { age: 30, name: "Jason Liu" }

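For more information, visit the official documentation page.

Star Instructor ⭐

13. LiteLLM - Drop-in replacement for LLMs in OpenAI format

Let's be honest: we have all screamed at some point because a new model provider does not follow the OpenAI SDK format for text, image, or embedding generation.

However, with LiteLLM, using the same implementation format, you can use any model provider (Claude, Gemini, Groq, Mistral, Azure AI, Bedrock, etc.) as a drop-in replacement for OpenAI models.

It also supports load balancing, fallbacks, and spend tracking across 100+ LLMs.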
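To show what a fallback means in practice, here is a minimal sketch that tries providers in order and returns the first successful reply. It does not use LiteLLM and does not reflect how LiteLLM implements routing; `call_with_fallback`, `flaky_provider`, and `backup_provider` are hypothetical stand-ins for real provider clients.

```python
def call_with_fallback(providers, messages):
    """Try each (name, callable) provider in order; return the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(messages)
        except Exception as error:  # a real router would catch narrower errors
            errors.append(f"{name}: {error}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))

# Hypothetical stand-ins for provider clients.
def flaky_provider(messages):
    raise TimeoutError("request timed out")

def backup_provider(messages):
    return f"echo: {messages[-1]['content']}"

providers = [("primary", flaky_provider), ("backup", backup_provider)]
used, reply = call_with_fallback(
    providers, [{"role": "user", "content": "Hello, how are you?"}]
)
print(used, reply)
```

This only works cleanly when every provider accepts the same message format, which is the problem a unified OpenAI-style interface solves.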

Install LiteLLM using pip.

pip install litellm

Here is how you can use the Claude-2 model as a drop-in replacement for GPT models.

from litellm import completion
import os

# LiteLLM with OpenAI Models

os.environ["OPENAI_API_KEY"] = "your-API-key"

response = completion(
  model="gpt-3.5-turbo",
  messages=[{ "content": "Hello, how are you?","role": "user"}]
)

# LiteLLM with Claude Models
os.environ["ANTHROPIC_API_KEY"] = "your-API-key"

response = completion(
  model="claude-2",
  messages=[{ "content": "Hello, how are you?","role": "user"}]
)

For more, refer to the official documentation.

Star LiteLLM ⭐

Do you use or have you built some other cool tool or framework?

Let me know in the comments :)

Source: https://dev.to/composiodev/13-hidden-open-source-libraries-to-become-an-ai-wizard-4ng9