使用vanna AI 进行text2sql的业务炼丹体验

官网:Vanna.AI – Personalized AI SQL Agent

本文以 mysql-ollama-chromadb 为例。

一、前置要求

1、安装 ollama,见官网:Ollama

自行拉取安装所需模型即可

2、安装 python ,官网下载即可

# 配置下全局镜像源
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple

二、使用虚拟环境开发

cd your_project

# 这里使用 uv 来作为管理
pip install uv

# 安装依赖
uv pip install 'vanna[chromadb,ollama,mysql]'

# 进入虚拟环境
.venv\Scripts\activate

三、使用

1、编写代码,新建 main.py ,安装官方文档示例进行配置

使用 Ollama 和 ChromaDB 生成 SQL for MySQL – Vanna.AI

完整示例代码如下:(注意修改配置)

from vanna.ollama import Ollama
from vanna.chromadb import ChromaDB_VectorStore
from vanna.flask import VannaFlaskApp


class MyVanna(ChromaDB_VectorStore, Ollama):
    def __init__(self, config=None):
        ChromaDB_VectorStore.__init__(self, config=config)
        Ollama.__init__(self, config=config)

# 配置 ollama 模型
vn = MyVanna(config={'model': 'deepseek-r1:7b'})
# 连接数据库
vn.connect_to_mysql(host='my-host', dbname='my-db', user='my-user', password='my-password', port=123)



# ------------------ 训练信息 ------------------

# The information schema query may need some tweaking depending on your database. This is a good starting point.
# 读取表元信息
df_information_schema = vn.run_sql("SELECT * FROM INFORMATION_SCHEMA.COLUMNS")

# This will break up the information schema into bite-sized chunks that can be referenced by the LLM
plan = vn.get_training_plan_generic(df_information_schema)
plan

# If you like the plan, then uncomment this and run it to train
vn.train(plan=plan)

'''
# The following are methods for adding training data. Make sure you modify the examples to match your database.

# DDL statements are powerful because they specify table names, colume names, types, and potentially relationships
vn.train(ddl="""
    CREATE TABLE IF NOT EXISTS my-table (
        id INT PRIMARY KEY,
        name VARCHAR(100),
        age INT
    )
""")

# Sometimes you may want to add documentation about your business terminology or definitions.
vn.train(documentation="Our business defines OTIF score as the percentage of orders that are delivered on time and in full")

# You can also add SQL queries to your training data. This is useful if you have some queries already laying around. 
# You can just copy and paste those from your editor to begin generating new SQL.
vn.train(sql="SELECT * FROM my-table WHERE name = 'John Doe'")

# At any time you can inspect what training data the package is able to reference
training_data = vn.get_training_data()
training_data


# You can remove training data if there's obsolete/incorrect information. 
vn.remove_training_data(id='1-ddl')

## Asking the AI
# Whenever you ask a new question, it will find the 10 most relevant pieces of training data and use it as part of the LLM prompt to generate the SQL.
vn.ask(question=...)
'''

# ------------------ 训练信息 ------------------



# 启动 WebUI
app = VannaFlaskApp(vn)
app.run()

注意:这里训练如果数据表有变更可以多次训练

2、使用WebUI

直接 python main.py 命令启动运行!

期间可能会下载其他依赖,耐心等待多试几次即可。这里可能需要依赖魔法网络,或可参考 huggingface 的镜像网站解决:参见 HF-Mirror

启动成功后会显示地址,打开即可!

WebUI 提供 :训练数据(DDL/文档/sql)、删除、问答训练标记等功能,从这里开始炼丹~

训练部分可参考文章:
How to Train – Vanna.AI Documentation
https://juejin.cn/post/7457182240075612201

Author: thinkwei

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注