Chat with Your CSV: Visualize Your Data with Langchain and Streamlit

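Large language models (LLMs) are becoming increasingly capable. They can be used for a wide range of tasks, including generating text, translating languages, and answering questions.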

Langchain is a Python module that makes LLMs easier to work with. It provides a standard interface for accessing LLMs and supports many of them, including GPT-3, LLama, and GPT4All.

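In this article, I'll show how to use Langchain to analyze a CSV file. We'll use the OpenAI API to access GPT-3 and Streamlit to build the user interface. The user uploads a CSV file and asks questions about the data; the system then generates an answer and, where appropriate, draws a table or a chart.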

Getting started

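First, install langchain, openai, streamlit, and python-environ, plus tabulate, which the pandas agent uses to format DataFrames. You can install them all with pip: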

pip install langchain openai streamlit python-environ tabulate

Setting up the agent

All the code for this project is available on my github.

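Setting up the agent is straightforward, because we'll use create_pandas_dataframe_agent, which ships with langchain. For those unfamiliar with the term, an agent is a program that uses a large language model (LLM) to work out how to respond. It receives the user's input, processes it, and produces a response. Agents can also access and work with data from other sources, such as databases, APIs, or, in this case, a CSV file. With create_pandas_dataframe_agent, the LLM writes pandas code that the agent runs against the DataFrame to find the answer.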

We'll use the python-environ module to manage the API key.

Create a .env file and add your key to it like this:

apikey=your_openai_api_key
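If you're curious what "reading the .env file" amounts to, here is a minimal stdlib sketch of the idea. It is not python-environ's implementation, and the helper names parse_dotenv and load_dotenv_file are hypothetical. It reads KEY=VALUE lines, skips blanks and comments, strips one layer of quotes, and copies the values into os.environ without overwriting variables that are already set.

```python
import os


def parse_dotenv(text):
    """Hypothetical helper: parse KEY=VALUE lines (as in a .env file) into a dict."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without '='.
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, value = line.split("=", 1)
        value = value.strip()
        # Remove one layer of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_dotenv_file(path=".env"):
    """Hypothetical helper: copy parsed values into os.environ (existing ones win)."""
    with open(path, encoding="utf-8") as fh:
        values = parse_dotenv(fh.read())
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values
```

With the .env file above, load_dotenv_file() followed by os.environ["apikey"] would return your key. In the real project, environ.Env.read_env() plays this role.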

Create a file named agent.py and add the following code:

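# agent.py
# These import paths match the older langchain releases this article targets;
# in newer releases create_pandas_dataframe_agent lives in langchain_experimental.agents.
from langchain import OpenAI
from langchain.agents import create_pandas_dataframe_agent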
import pandas as pd

# Setting up the api key
import environ

env = environ.Env()
environ.Env.read_env()

API_KEY = env("apikey")


def create_agent(filename):
    """
    Create an agent that can access and use a large language model (LLM).

    Args:
        filename: Path to the CSV file that contains the data, or a file-like
            object such as the file returned by Streamlit's uploader.

    Returns:
        An agent that can access and use the LLM.
    """

    # Create an OpenAI object.
    llm = OpenAI(openai_api_key=API_KEY)

    # Read the CSV file into a Pandas DataFrame.
    df = pd.read_csv(filename)

    # Create a Pandas DataFrame agent.
    return create_pandas_dataframe_agent(llm, df, verbose=False)

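The create_agent function takes a CSV file as input and returns an agent that can access and use the LLM. It first creates an OpenAI object, then reads the CSV file into a Pandas DataFrame, and finally creates and returns a Pandas DataFrame agent. Because pd.read_csv accepts file-like objects as well as paths, the same function also works with the uploaded file that the Streamlit interface passes to it.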

Now add the following function to agent.py:

#agent.py

# ...

def query_agent(agent, query):
    """
    Query an agent and return the response as a string.

    Args:
        agent: The agent to query.
        query: The query to ask the agent.

    Returns:
        The response from the agent as a string.
    """

    prompt = (
        """
            For the following query, if it requires drawing a table, reply as follows:
            {"table": {"columns": ["column1", "column2", ...], "data": [[value1, value2, ...], [value1, value2, ...], ...]}}

            If the query requires creating a bar chart, reply as follows:
            {"bar": {"columns": ["A", "B", "C", ...], "data": [25, 24, 10, ...]}}

            If the query requires creating a line chart, reply as follows:
            {"line": {"columns": ["A", "B", "C", ...], "data": [25, 24, 10, ...]}}

            There can only be two types of chart, "bar" and "line".

            If it is just asking a question that requires neither, reply as follows:
            {"answer": "answer"}
            Example:
            {"answer": "The title with the highest rating is 'Gilead'"}

            If you do not know the answer, reply as follows:
            {"answer": "I do not know."}

            Return all output as a string.

            All strings in the "columns" list and the "data" list should be in double quotes.

            For example: {"columns": ["title", "ratings_count"], "data": [["Gilead", 361], ["Spider's Web", 5164]]}

            Let's think step by step.

            Below is the query.
            Query: 
            """
        + query
    )

    # Run the prompt through the agent.
    response = agent.run(prompt)

    # Convert the response to a string.
    return str(response)

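The query_agent function is where all the magic happens. It takes an agent (the Pandas DataFrame agent) and a query as input and returns the agent's response as a string. It first builds a prompt that tells the agent exactly what kind of response we want. I want the agent to return a string that will later be converted into a dictionary; depending on the dictionary's contents, the program renders a chart, a table, or a plain text answer.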
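Since the whole app depends on the model actually following this format, it can help to check a reply before rendering it. Below is a sketch of such a check; classify_response is a hypothetical helper, not part of langchain or the original project. It expects exactly one of the four top-level keys the prompt defines and verifies that chart and table data line up with their column labels.

```python
import json

# The four top-level keys that the query_agent prompt asks the model to use.
EXPECTED_KINDS = ("answer", "bar", "line", "table")


def classify_response(response):
    """Hypothetical helper: return the response kind, or raise ValueError."""
    payload = json.loads(response)
    if not isinstance(payload, dict) or len(payload) != 1:
        raise ValueError("expected a JSON object with exactly one key")
    kind = next(iter(payload))
    if kind not in EXPECTED_KINDS:
        raise ValueError(f"unexpected response kind: {kind!r}")
    body = payload[kind]
    if kind == "answer":
        if not isinstance(body, str):
            raise ValueError("'answer' must be a string")
        return kind
    # Charts and tables both need a "columns" list and a "data" list.
    if not isinstance(body, dict) or "columns" not in body or "data" not in body:
        raise ValueError(f"{kind!r} needs 'columns' and 'data'")
    if kind in ("bar", "line"):
        # One label per value, so the labels can serve as the chart's index.
        if len(body["columns"]) != len(body["data"]):
            raise ValueError("'columns' and 'data' must have the same length")
    else:
        # Table: every row must supply one value per column.
        width = len(body["columns"])
        if any(len(row) != width for row in body["data"]):
            raise ValueError("every table row needs one value per column")
    return kind
```

A reply that fails this check could be retried or shown as raw text instead of crashing the app.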

Setting up the Streamlit interface

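Streamlit is an open-source Python library that makes it easy to build web apps for machine learning and data science. It is designed to be fast and simple to use: you can build attractive, interactive apps without any JavaScript or CSS knowledge. See the documentation for more information.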

Streamlit is quite easy to use. Create a file named interface.py and add the following:

import streamlit as st
import pandas as pd
import json

from agent import query_agent, create_agent


def decode_response(response: str) -> dict:
    """This function converts the string response from the model to a dictionary object.

    Args:
        response (str): response from the model

    Returns:
        dict: dictionary with response data
    """
    return json.loads(response)

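The decode_response function simply converts the agent's response (a string) into a dictionary. Note that json.loads raises json.JSONDecodeError if the model replies with anything other than valid JSON, for example a dictionary written with single quotes or a plain sentence.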
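If malformed replies turn out to be a problem, a more forgiving decoder is one option. The sketch below is a hypothetical drop-in alternative, not the original code. It tries strict JSON first, then falls back to ast.literal_eval for Python-style dictionaries with single quotes, and otherwise wraps the raw text as a plain answer so write_response still has something to display.

```python
import ast
import json


def decode_response_safely(response):
    """Hypothetical alternative to decode_response that never raises on bad input."""
    text = response.strip()
    try:
        # 1. The format the prompt asks for: strict JSON.
        parsed = json.loads(text)
    except json.JSONDecodeError:
        try:
            # 2. A Python-literal dict, e.g. {'answer': '...'} with single quotes.
            #    literal_eval only evaluates literals, never arbitrary code.
            parsed = ast.literal_eval(text)
        except (ValueError, SyntaxError, TypeError):
            parsed = None
    if isinstance(parsed, dict):
        return parsed
    # 3. Anything else (plain sentences, lists, numbers) becomes a text answer.
    return {"answer": text}
```

The trade-off is that a badly formatted chart request is shown as text instead of raising an error, which hides model mistakes but keeps the app responsive.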

Add the following code to interface.py:

#interface.py

#...

def write_response(response_dict: dict):
    """
    Write a response from an agent to a Streamlit app.

    Args:
        response_dict: The response from the agent.

    Returns:
        None.
    """

    # Check if the response is an answer.
    if "answer" in response_dict:
        st.write(response_dict["answer"])

    # Check if the response is a bar chart.
    if "bar" in response_dict:
        data = response_dict["bar"]
        df = pd.DataFrame(data)
        df.set_index("columns", inplace=True)
        st.bar_chart(df)

    # Check if the response is a line chart.
    if "line" in response_dict:
        data = response_dict["line"]
        df = pd.DataFrame(data)
        df.set_index("columns", inplace=True)
        st.line_chart(df)

    # Check if the response is a table.
    if "table" in response_dict:
        data = response_dict["table"]
        df = pd.DataFrame(data["data"], columns=data["columns"])
        st.table(df)

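This function takes the response dictionary as input and writes the response to the Streamlit app. It can render answers, bar charts, line charts, and tables.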

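It first checks whether the response is an "answer", that is, a plain text response to a question such as "How many rows are in the document?". If so, it writes the answer to the app.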

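Next, it checks whether the response is a bar chart. If so, it builds a DataFrame from the response data, uses the "columns" values as the index, and draws the bar chart in the app.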

It then checks whether the response is a line chart. If so, it builds the DataFrame the same way and draws a line chart in the app.

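It then checks whether the response is a table. If so, it builds a DataFrame from the rows in "data", using "columns" as the column names, and renders it as a table in the app.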
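To see what write_response hands to Streamlit without running the app, here is a plain-Python sketch of the two conversions. The helper names chart_points and table_records are hypothetical. chart_points pairs each label with its value, which is roughly what pd.DataFrame(data).set_index("columns") produces for a bar or line payload; table_records turns each table row into a dict keyed by column name, like one row of pd.DataFrame(data["data"], columns=data["columns"]).

```python
def chart_points(payload):
    """Hypothetical helper: pair each "columns" label with its "data" value.

    A list of pairs (rather than a dict) keeps duplicate labels, as a
    DataFrame index would.
    """
    columns, data = payload["columns"], payload["data"]
    if len(columns) != len(data):
        raise ValueError("'columns' and 'data' must have the same length")
    return list(zip(columns, data))


def table_records(payload):
    """Hypothetical helper: turn a table payload into one dict per row."""
    columns = payload["columns"]
    return [dict(zip(columns, row)) for row in payload["data"]]
```

Keeping these conversions in small pure functions makes them easy to unit-test separately from the Streamlit calls.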

Finally, we'll create the initial interface. Add the following lines:

#interface.py

#...

st.title("👨‍💻 Chat with your CSV")

st.write("Please upload your CSV file below.")

data = st.file_uploader("Upload a CSV", type="csv")

query = st.text_area("Insert your query")

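if st.button("Submit Query", type="primary"):
    # Guard against submitting before a file and a query have been provided.
    if data is None or not query.strip():
        st.warning("Please upload a CSV file and enter a query first.")
    else:
        # Create an agent from the uploaded CSV file (a file-like object).
        agent = create_agent(data)

        # Query the agent.
        response = query_agent(agent=agent, query=query)

        # Decode the response.
        decoded_response = decode_response(response)

        # Write the response to the Streamlit app.
        write_response(decoded_response)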