深度學習系列75：sql大模型工具vanna

1. 概述

vanna是一個可以將自然語言轉為sql的工具。簡單的demo如下：

!pip install vanna
import vanna
from vanna.remote import VannaDefault
vn = VannaDefault(model='chinook', api_key=vanna.get_api_key('my-email@example.com'))
vn.connect_to_sqlite('https://vanna.ai/Chinook.sqlite')
vn.ask("What are the top 10 albums by sales?")

執行下面的代碼運行圖形界面

from vanna.flask import VannaFlaskApp
VannaFlaskApp(vn).run()

2. 配置

數據庫可以是任何數據庫，比如mysql如下：

import pandas as pd
import psycopg2def run_sql(sql):conn = psycopg2.connect(host="localhost",database="my_database",user="my_user",password="my_password")return pd.read_sql(sql, conn)vn.run_sql = run_sql
vn.run_sql_is_set = True

向量數據庫稍微麻煩一些，目前支持的包括：
參考代碼如下：

from vanna.chromadb.chromadb_vector import ChromaDB_VectorStore
class MyVanna(ChromaDB_VectorStore):def __init__(self, config=None):ChromaDB_VectorStore.__init__(self, config=config)vn = MyVanna(config={'path': '/path/to/chromadb'})

3. 訓練

訓練數據可以是：DDL、documentation、sql以及Question-SQL Pairs

vn.train(ddl="CREATE TABLE my_table (id INT, name TEXT)")
vn.train(documentation="Our business defines XYZ as ABC")
vn.train(sql="SELECT col1, col2, col3 FROM my_table")

可以設置auto_train = True

4. 詢問

vn.ask("What are the top 10 customers by sales?")

它包含下列幾個函數：

vn.generate_sql
vn.run_sql
vn.generate_plotly_code
vn.get_plotly_figure

visualize=False

5. 啟用服務

參考https://github.com/vanna-ai/vanna-flask，將LLM、embedding、vectorStore都改造成自己的代碼。
首先是LLM，改造框架為：

from vanna.base import VannaBase
class MyLLM(VannaBase):def __init__(self,config=None):VannaBase.__init__(self, config=config)...def system_message(self, message: str) -> any:return {"role": "system", "content": message}def user_message(self, message: str) -> any:return {"role": "user", "content": message}def assistant_message(self, message: str) -> any:return {"role": "assistant", "content": message}def submit_prompt(self, prompt, **kwargs) -> str:...

然后是embedding，需要定義encode_documents和encode_queries兩個函數，例如：

class BgeM3:def __init__(self, url):self.url = urldef encode_documents(self, docs):....def encode_queries(self, queries):....

接下來是vectorStore，我們使用milvus，它會自動調用config中的embedding_function，我們把它定義成上面的BegM3即可：

class MyVanna(Milvus_VectorStore, QwenLLM):def __init__(self, config=None):Milvus_VectorStore.__init__(self, config=config)QwenLLM.__init__(self, config=config)vn = MyVanna(config={'milvus_client': MilvusClient(...),'embedding_function':BgeM3(...)})

然后定義連接的數據庫，可以換成任意的其他數據庫：

def run_sql(sql: str) -> pd.DataFrame:cnx = mysql.connector.connect(...)cursor = cnx.cursor()cursor.execute(sql)result = cursor.fetchall()columns = cursor.column_namesdf = pd.DataFrame(result, columns=columns)return dfvn.run_sql = run_sql
vn.run_sql_is_set = True

接著執行python app.py即可啟用服務，訪問localhost:5000可以打開頁面：
在這里插入圖片描述
同時也可以調用接口：

import requests
response = requests.get(url+'/api/v0/get_training_data',headers={'Content-Type':'application/json'})
response.json()

所有可用的接口清單可以參考app.py。

本文來自互聯網用戶投稿，該文觀點僅代表作者本人，不代表本站立場。本站僅提供信息存儲空間服務，不擁有所有權，不承擔相關法律責任。
如若轉載，請注明出處：http://www.pswp.cn/web/66700.shtml
繁體地址，請注明出處：http://hk.pswp.cn/web/66700.shtml
英文地址，請注明出處：http://en.pswp.cn/web/66700.shtml

如若內容造成侵權/違法違規/事實不符，請聯系多彩編程網進行投訴反饋email:809451989@qq.com，一經查實，立即刪除！