LlamaIndex

1. What is the value of a large language model development framework?

SDK (Software Development Kit): a collection of software tools and resources designed to help developers create, test, deploy, and maintain applications.

The core value of every development framework (SDK) is reducing development and maintenance costs.

The value of an LLM development framework is to let developers build applications on top of large language models more easily. It mainly provides three kinds of help:

  1. Abstractions over third-party capabilities, such as LLMs, vector databases, and search APIs
  2. Packaged versions of common tools and solutions
  3. Encapsulation of low-level plumbing, such as streaming interfaces, timeouts and reconnects, and async/parallel execution
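To make point 3 concrete: retries on transient failure are exactly the kind of plumbing a framework hides from application code. A minimal retry-with-backoff sketch in plain Python (function names invented for illustration; real SDKs add timeouts, jitter, and error classification):

```python
import time


def with_retries(fn, attempts=3, backoff=0.01):
    """Call fn(); on failure, back off exponentially and retry."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(backoff * (2 ** i))  # exponential backoff


calls = {"n": 0}

def flaky():
    # Simulates a transient network error: fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

print(with_retries(flaky))  # ok
```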

A good development framework should have the following traits:

  1. High reliability and robustness
  2. High maintainability
  3. High extensibility
  4. A low learning curve

Some down-to-earth examples:

  • Decoupled from external functionality
    • e.g., you can swap out the LLM without heavy refactoring
    • the same goes for swapping third-party tools
  • Frequently changing parts are maintained outside the code, not hard-coded
    • e.g., prompt templates
  • Works in all kinds of environments
    • e.g., thread safety
  • Easy to debug and test
    • at a minimum, it should feel easier with the framework than without it
    • valid input should never trigger errors inside the framework
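The first bullet, swapping the LLM without refactoring, usually comes down to programming against a small interface. A minimal plain-Python sketch (all class names here are invented; this is not LlamaIndex's API):

```python
from typing import Protocol


class LLM(Protocol):
    """The only surface the application code is allowed to see."""
    def complete(self, prompt: str) -> str: ...


class FakeOpenAI:
    """Stand-in adapter for one provider."""
    def complete(self, prompt: str) -> str:
        return f"[openai] {prompt}"


class FakeLocalModel:
    """Stand-in adapter for another provider."""
    def complete(self, prompt: str) -> str:
        return f"[local] {prompt}"


def answer(llm: LLM, question: str) -> str:
    # Application logic never names a concrete provider,
    # so swapping models is a one-line change at the call site.
    return llm.complete(question)


print(answer(FakeOpenAI(), "hi"))      # [openai] hi
print(answer(FakeLocalModel(), "hi"))  # [local] hi
```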
Key point: pick the right framework and you get twice the result for half the effort; pick the wrong one and it's the reverse.
An example: with the SDK, 4 lines of code implement a simple RAG system

!pip install --upgrade llama-index

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("How many parameters does Llama 2 have?")
print(response)

2. Introducing LlamaIndex

「 LlamaIndex is a framework for building context-augmented LLM applications. Context augmentation refers to any use case that applies LLMs on top of your private or domain-specific data. 」

LlamaIndex is a framework (i.e., an SDK) for building "context-augmented" LLM applications. Context augmentation broadly covers any case where an LLM is applied on top of private or domain-specific data. For example:


  • Question-Answering Chatbots (i.e., RAG)

  • Document Understanding and Extraction

  • Autonomous Agents that can perform research and take actions

LlamaIndex comes in Python and TypeScript versions; the Python documentation is the more complete of the two.

  • Python docs: https://docs.llamaindex.ai/en/stable/

  • Python API reference: https://docs.llamaindex.ai/en/stable/api_reference/

  • TypeScript docs: https://ts.llamaindex.ai/

  • TypeScript API reference: https://ts.llamaindex.ai/api/

LlamaIndex is an open-source framework. GitHub: https://github.com/run-llama

Core modules of LlamaIndex


Installing LlamaIndex

  1. Python

pip install llama-index

  2. TypeScript

# via npm
npm install llamaindex

# via yarn
yarn add llamaindex

# via pnpm
pnpm add llamaindex

This post uses the Python version for all examples.

3. Data Loading

SimpleDirectoryReader is a simple local file loader. It walks the given directory and automatically loads files (their text content) based on file extension.

Supported file types:

  • .csv - comma-separated values
  • .docx - Microsoft Word
  • .epub - EPUB ebook format
  • .hwp - Hangul Word Processor
  • .ipynb - Jupyter Notebook
  • .jpeg, .jpg - JPEG image
  • .mbox - MBOX email archive
  • .md - Markdown
  • .mp3, .mp4 - audio and video
  • .pdf - Portable Document Format
  • .png - Portable Network Graphics
  • .ppt, .pptm, .pptx - Microsoft PowerPoint
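The extension-based auto-loading above boils down to a table from file suffix to reader function. An illustrative plain-Python sketch (the reader functions are invented; this is not LlamaIndex's internals):

```python
import tempfile
from pathlib import Path


# Toy readers: each returns the "text" of one file type.
def read_text(p: Path) -> str:
    return p.read_text(encoding="utf-8")

def read_markdown(p: Path) -> str:
    return p.read_text(encoding="utf-8")  # a real reader would strip markup

DEFAULT_READERS = {".txt": read_text, ".md": read_markdown}


def load_dir(root, readers=DEFAULT_READERS):
    """Walk a directory, dispatch on suffix, skip unsupported files."""
    docs = []
    for p in sorted(Path(root).iterdir()):
        reader = readers.get(p.suffix.lower())
        if reader is not None:
            docs.append(reader(p))
    return docs


tmp = Path(tempfile.mkdtemp())
(tmp / "a.txt").write_text("hello", encoding="utf-8")
(tmp / "b.xyz").write_text("ignored", encoding="utf-8")
print(load_dir(tmp))  # ['hello']
```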
import json
from pydantic.v1 import BaseModel


def show_json(data):
    """Pretty-print JSON data"""
    if isinstance(data, str):
        obj = json.loads(data)
        print(json.dumps(obj, indent=4))
    elif isinstance(data, (dict, list)):
        print(json.dumps(data, indent=4))
    elif issubclass(type(data), BaseModel):
        print(json.dumps(data.dict(), indent=4, ensure_ascii=False))


def show_list_obj(data):
    """Pretty-print a list of objects"""
    if isinstance(data, list):
        for item in data:
            show_json(item)
    else:
        raise ValueError("Input is not a list")


from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(
    input_dir="./data",       # target directory
    recursive=False,          # whether to recurse into subdirectories
    required_exts=[".pdf"]    # (optional) only load files with these extensions
)
documents = reader.load_data()

show_json(documents[0])
print(documents[0].text)
{
    "id_": "358482ee-4232-45eb-a5ae-8f595f16c8cd",
    "embedding": null,
    "metadata": {
        "page_label": "1",
        "file_name": "llama2-extracted.pdf",
        "file_path": "/home/jovyan/lecture-notes/07-llamaindex/data/llama2-extracted.pdf",
        "file_type": "application/pdf",
        "file_size": 401338,
        "creation_date": "2024-06-14",
        "last_modified_date": "2024-06-14"
    },
    "excluded_embed_metadata_keys": [
        "file_name",
        "file_type",
        "file_size",
        "creation_date",
        "last_modified_date",
        "last_accessed_date"
    ],
    "excluded_llm_metadata_keys": [
        "file_name",
        "file_type",
        "file_size",
        "creation_date",
        "last_modified_date",
        "last_accessed_date"
    ],
    "relationships": {},
    "text": "Llama 2: OpenFoundation andFine-Tuned ChatModels\nHugo Touvron?Louis Martin?Kevin Stone?\nPeter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov SoumyaBatra\nPrajjwal Bhargava Shruti Bhosale Dan Bikel LukasBlecher Cristian CantonFerrer MoyaChen\nGuillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu BrianFuller\nCynthia Gao VedanujGoswami NamanGoyal AnthonyHartshorn Saghar Hosseini RuiHou\nHakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa IsabelKloumann ArtemKorenev\nPunit Singh Koura Marie-AnneLachaux ThibautLavril Jenya Lee Diana Liskovich\nYinghai Lu YuningMao Xavier Martinet Todor Mihaylov PushkarMishra\nIgor Molybog Yixin Nie AndrewPoulton Jeremy Reizenstein Rashi Rungta Kalyan Saladi\nAlan Schelten Ruan Silva EricMichael Smith Ranjan Subramanian XiaoqingEllenTan BinhTang\nRoss Taylor AdinaWilliams JianXiang Kuan PuxinXu ZhengYan Iliyan Zarov YuchenZhang\nAngela Fan MelanieKambadur SharanNarang Aurelien Rodriguez RobertStojnic\nSergey Edunov ThomasScialom?\nGenAI, Meta\nAbstract\nIn this work, we develop and release Llama 2, a collection of pretrained and fine-tuned\nlarge language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.\nOur fine-tuned LLMs, called Llama 2-Chat , are optimized for dialogue use cases. Our\nmodels outperform open-source chat models on most benchmarks we tested, and based on\nourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesubstituteforclosed-\nsource models. We provide a detailed description of our approach to fine-tuning and safety\nimprovements of Llama 2-Chat in order to enable the community to build on our work and\ncontribute to the responsibledevelopmentof LLMs.\n?Equal contribution, corresponding authors: {tscialom, htouvron}@meta.com\n?Second author\nContributions for all the authors can be found in Section A.1.arXiv:2307.09288v2  [cs.CL]  19 Jul 2023",
    "mimetype": "text/plain",
    "start_char_idx": null,
    "end_char_idx": null,
    "text_template": "{metadata_str}\n\n{content}",
    "metadata_template": "{key}: {value}",
    "metadata_seperator": "\n",
    "class_name": "Document"
}
Llama 2: OpenFoundation andFine-Tuned ChatModels
Hugo Touvron?Louis Martin?Kevin Stone?
Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov SoumyaBatra
Prajjwal Bhargava Shruti Bhosale Dan Bikel LukasBlecher Cristian CantonFerrer MoyaChen
Guillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu BrianFuller
Cynthia Gao VedanujGoswami NamanGoyal AnthonyHartshorn Saghar Hosseini RuiHou
Hakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa IsabelKloumann ArtemKorenev
Punit Singh Koura Marie-AnneLachaux ThibautLavril Jenya Lee Diana Liskovich
Yinghai Lu YuningMao Xavier Martinet Todor Mihaylov PushkarMishra
Igor Molybog Yixin Nie AndrewPoulton Jeremy Reizenstein Rashi Rungta Kalyan Saladi
Alan Schelten Ruan Silva EricMichael Smith Ranjan Subramanian XiaoqingEllenTan BinhTang
Ross Taylor AdinaWilliams JianXiang Kuan PuxinXu ZhengYan Iliyan Zarov YuchenZhang
Angela Fan MelanieKambadur SharanNarang Aurelien Rodriguez RobertStojnic
Sergey Edunov ThomasScialom?
GenAI, Meta
Abstract
In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned
large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.
Our fine-tuned LLMs, called Llama 2-Chat , are optimized for dialogue use cases. Our
models outperform open-source chat models on most benchmarks we tested, and based on
ourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesubstituteforclosed-
source models. We provide a detailed description of our approach to fine-tuning and safety
improvements of Llama 2-Chat in order to enable the community to build on our work and
contribute to the responsibledevelopmentof LLMs.
?Equal contribution, corresponding authors: {tscialom, htouvron}@meta.com
?Second author
Contributions for all the authors can be found in Section A.1.arXiv:2307.09288v2  [cs.CL]  19 Jul 2023
Note: for image, video, and audio files, the embedded text is not extracted automatically by default. If you need that, see the Data Connectors introduced below.

The default PDFReader's output is not ideal, so we can swap in a different file loader.

# !pip install pymupdf
from llama_index.core import SimpleDirectoryReader
from llama_index.readers.file import PyMuPDFReader

reader = SimpleDirectoryReader(
    input_dir="./data",        # target directory
    recursive=False,           # whether to recurse into subdirectories
    required_exts=[".pdf"],    # (optional) only load files with these extensions
    file_extractor={".pdf": PyMuPDFReader()}  # use a specific loader for .pdf
)
documents = reader.load_data()
print(documents[0].text)
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron?
Louis Martin?
Kevin Stone?
Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra
Prajjwal Bhargava Shruti Bhosale Dan Bikel Lukas Blecher Cristian Canton Ferrer Moya Chen
Guillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian Fuller
Cynthia Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn Saghar Hosseini Rui Hou
Hakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa Isabel Kloumann Artem Korenev
Punit Singh Koura Marie-Anne Lachaux Thibaut Lavril Jenya Lee Diana Liskovich
Yinghai Lu Yuning Mao Xavier Martinet Todor Mihaylov Pushkar Mishra
Igor Molybog Yixin Nie Andrew Poulton Jeremy Reizenstein Rashi Rungta Kalyan Saladi
Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang
Ross Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang
Angela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic
Sergey Edunov
Thomas Scialom?
GenAI, Meta
Abstract
In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned
large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.
Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our
models outperform open-source chat models on most benchmarks we tested, and based on
our human evaluations for helpfulness and safety, may be a suitable substitute for closed-
source models. We provide a detailed description of our approach to fine-tuning and safety
improvements of Llama 2-Chat in order to enable the community to build on our work and
contribute to the responsible development of LLMs.
?Equal contribution, corresponding authors: {tscialom, htouvron}@meta.com
?Second author
Contributions for all the authors can be found in Section A.1.
arXiv:2307.09288v2  [cs.CL]  19 Jul 2023

Further PDF loaders include SmartPDFLoader and LlamaParse. Both offer richer parsing, including recovering section and paragraph structure, but neither is 100% accurate (text is occasionally lost or misplaced), so test and evaluate them against your own needs.

3.2 Data Connectors

Data Connectors handle richer data types, reading them into Document form (text + metadata).

More Data Connectors:
  • Built-in file loaders
  • Loaders that connect to third-party services, such as databases
  • Many more loaders can be found on LlamaHub
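Whatever the source, a connector's job is to normalize everything into that same text-plus-metadata shape. An illustrative sketch (the dataclass and loader below are stand-ins, not LlamaIndex's actual Document class or API):

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Illustrative stand-in for a connector's output: text + metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def load_fake_db_rows(rows):
    # A toy "connector": each source record becomes one Document,
    # with provenance kept in metadata for later filtering or citation.
    return [Document(text=r["body"], metadata={"source": "db", "id": r["id"]})
            for r in rows]


docs = load_fake_db_rows([{"id": 1, "body": "hello"}, {"id": 2, "body": "world"}])
print(docs[0].text)            # hello
print(docs[1].metadata["id"])  # 2
```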

4. Text Splitting and Parsing (Chunking)

To make retrieval convenient, we usually split a Document into Nodes.

In LlamaIndex, a Node is defined as a "chunk" of text.

4.1 Splitting text with TextSplitters

For example, TokenTextSplitter splits text into chunks of a specified number of tokens.

from llama_index.core.node_parser import TokenTextSplitter
from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(
    input_dir="./data",        # target directory
    recursive=False,           # whether to recurse into subdirectories
    required_exts=[".pdf"]     # (optional) only load files with these extensions
)
documents = reader.load_data()

node_parser = TokenTextSplitter(
    chunk_size=100,    # maximum length of each chunk
    chunk_overlap=50   # overlap between adjacent chunks
)
nodes = node_parser.get_nodes_from_documents(
    documents, show_progress=False
)
for node in nodes:
    print(node)
Node ID: be6157bd-acd5-419b-903b-fb335ebf1805
Text: Llama 2: Open Foundation and Fine-Tuned Chat Models Hugo
Touvron? Louis Martin? Kevin Stone? Peter Albert Amjad Almahairi
Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti
Bhosale Dan Bikel Lukas Blecher
Node ID: b76c809f-16b6-4988-94e2-7feafcfdd506
Text: Louis Martin? Kevin Stone? Peter Albert Amjad Almahairi Yasmine
Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale
Dan Bikel Lukas Blecher Cristian Canton Ferrer Moya Chen Guillem
Cucurull David
Node ID: 578cb67c-1bff-4dd8-9329-af4c2f8f881b
Text: Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti
Bhosale Dan Bikel Lukas Blecher Cristian Canton Ferrer Moya Chen
Guillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian
Fuller Cynthia Gao
Node ID: dbb31d51-67ad-4d21-93d5-5062275a66c1
Text: Shruti Bhosale Dan Bikel Lukas Blecher Cristian Canton Ferrer
Moya Chen Guillem Cucurull David Esiobu Jude Fernandes Jeremy Fu
Wenyin Fu Brian Fuller Cynthia Gao Vedanuj Goswami Naman Goyal Anthony
Hartshorn Saghar Hosseini Rui
Node ID: 2d6466e9-a4d4-454e-a2d3-631364d0a126
Text: Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian
Fuller Cynthia Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn
Saghar Hosseini Rui Hou Hakan Inan Marcin Kardas Viktor Kerkez Madian
Khabsa Isabel Kloumann
Node ID: 3dcaf924-e52e-4d93-98bc-9292a1884a40
Text: Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn Saghar
Hosseini Rui Hou Hakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa
Isabel Kloumann Artem Korenev Punit Singh Koura Marie-Anne Lachaux
Thibaut Lavril Jenya
Node ID: 8fa64cb9-c510-4223-b0cb-e223178ebdb9
Text: Rui Hou Hakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa
Isabel Kloumann Artem Korenev Punit Singh Koura Marie-Anne Lachaux
Thibaut Lavril Jenya Lee Diana Liskovich Yinghai Lu Yuning Mao Xavier
Martinet Todor
Node ID: 9f5c2d90-ffe9-4efe-a5b1-68eef0119b5d
Text: Madian Khabsa Isabel Kloumann Artem Korenev Punit Singh Koura
Marie-Anne Lachaux Thibaut Lavril Jenya Lee Diana Liskovich Yinghai Lu
Yuning Mao Xavier Martinet Todor Mihaylov Pushkar Mishra Igor Molybog
Yixin Nie Andrew
Node ID: fff15a52-a0da-40c8-a09a-0fe1ea2d6ea1
Text: Marie-Anne Lachaux Thibaut Lavril Jenya Lee Diana Liskovich
Yinghai Lu Yuning Mao Xavier Martinet Todor Mihaylov Pushkar Mishra
Igor Molybog Yixin Nie Andrew Poulton Jeremy Reizenstein Rashi Rungta
Kalyan Saladi Alan
Node ID: 5fa0508f-3ca7-4425-a48c-a9cde1112060
Text: Lu Yuning Mao Xavier Martinet Todor Mihaylov Pushkar Mishra Igor
Molybog Yixin Nie Andrew Poulton Jeremy Reizenstein Rashi Rungta
Kalyan Saladi Alan Schelten Ruan Silva Eric Michael Smith Ranjan
Subramanian Xiaoqing Ellen Tan Binh Tang Ross
Node ID: ee895744-82ae-49ad-94c0-8368c0790dea
Text: Molybog Yixin Nie Andrew Poulton Jeremy Reizenstein Rashi Rungta
Kalyan Saladi Alan Schelten Ruan Silva Eric Michael Smith Ranjan
Subramanian Xiaoqing Ellen Tan Binh Tang Ross Taylor Adina Williams
Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov
Node ID: 0138d1ac-6358-47c3-8f52-d0054b80a568
Text: Kalyan Saladi Alan Schelten Ruan Silva Eric Michael Smith Ranjan
Subramanian Xiaoqing Ellen Tan Binh Tang Ross Taylor Adina Williams
Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang Angela
Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert
Node ID: 73429f67-589e-442d-89af-1513be6bb88f
Text: Xiaoqing Ellen Tan Binh Tang Ross Taylor Adina Williams Jian
Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang Angela Fan
Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic
Sergey Edunov Thomas Scialom? GenAI,
Node ID: 9d37dbc7-16ef-4036-bb8f-5f3091dba100
Text: Xu Zheng Yan Iliyan Zarov Yuchen Zhang Angela Fan Melanie
Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic Sergey Edunov
Thomas Scialom? GenAI, Meta Abstract In this work, we develop and
release Llama 2, a collection of pretrained and
Node ID: 87459f44-90eb-43c8-9e4c-329777efe449
Text: Sharan Narang Aurelien Rodriguez Robert Stojnic Sergey Edunov
Thomas Scialom? GenAI, Meta Abstract In this work, we develop and
release Llama 2, a collection of pretrained and fine-tuned large
language models (LLMs) ranging in scale from 7 billion to
Node ID: 8f8a5262-03a3-4e0d-9c5f-f6830bd2cf27
Text: Scialom? GenAI, Meta Abstract In this work, we develop and
release Llama 2, a collection of pretrained and fine-tuned large
language models (LLMs) ranging in scale from 7 billion to 70 billion
parameters. Our fine-tuned LLMs, calledLlama
Node ID: fc5b70c8-4e78-40bf-9d77-911391e5ea1c
Text: we develop and release Llama 2, a collection of pretrained and
fine-tuned large language models (LLMs) ranging in scale from 7
billion to 70 billion parameters. Our fine-tuned LLMs, calledLlama
2-Chat, are optimized for dialogue use cases. Our models outperform
open-source chat models
Node ID: 6d36968c-de01-4afa-97dc-7987a192bf30
Text: models (LLMs) ranging in scale from 7 billion to 70 billion
parameters. Our fine-tuned LLMs, calledLlama 2-Chat, are optimized for
dialogue use cases. Our models outperform open-source chat models on
most benchmarks we tested, and based on our human evaluations for
helpfulness and safety, may
Node ID: e7ef4823-6937-4d80-824e-693bd81dc149
Text: LLMs, calledLlama 2-Chat, are optimized for dialogue use cases.
Our models outperform open-source chat models on most benchmarks we
tested, and based on our human evaluations for helpfulness and safety,
may be a suitable substitute for closed- source models. We provide a
detailed description of our approach to fine-tuning
Node ID: d1c044a6-02e6-4d88-a92f-a96b65da6de0
Text: outperform open-source chat models on most benchmarks we tested,
and based on our human evaluations for helpfulness and safety, may be
a suitable substitute for closed- source models. We provide a detailed
description of our approach to fine-tuning and safety improvements
ofLlama 2-Chatin order to enable the community to build on our
Node ID: de25c573-8d6f-4d81-8cc0-2df0b8706f6d
Text: helpfulness and safety, may be a suitable substitute for closed-
source models. We provide a detailed description of our approach to
fine-tuning and safety improvements ofLlama 2-Chatin order to enable
the community to build on our work and contribute to the responsible
development of LLMs. ?Equal contribution, corresponding
Node ID: 366723b1-8227-4c8e-9cdf-fb6596c6b607
Text: description of our approach to fine-tuning and safety
improvements ofLlama 2-Chatin order to enable the community to build
on our work and contribute to the responsible development of LLMs.
?Equal contribution, corresponding authors: {tscialom,
htouvron}@meta.com ?Second
Node ID: b1d0a429-f88c-4eaf-a5e2-438395063dd1
Text: 2-Chatin order to enable the community to build on our work and
contribute to the responsible development of LLMs. ?Equal
contribution, corresponding authors: {tscialom, htouvron}@meta.com
?Second author Contributions for all the authors can be found in
Section
Node ID: 3bd1cf71-efdd-476c-ae63-f6813cd426ff
Text: work and contribute to the responsible development of LLMs.
?Equal contribution, corresponding authors: {tscialom,
htouvron}@meta.com ?Second author Contributions for all the authors
can be found in Section A.1. arXiv:2307.09288v2  [cs.CL]
Node ID: a419b2de-57a0-4d13-8316-89ca3c388038
Text: authors: {tscialom, htouvron}@meta.com ?Second author
Contributions for all the authors can be found in Section A.1.
arXiv:2307.09288v2  [cs.CL]  19 Jul 2023
Node ID: c14bc803-2b66-4791-acf1-40c3c3824248
Text: Contents 1 Introduction 3 2 Pretraining 5 2.1 Pretraining Data .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . 5 2.2
Node ID: cd88caab-f720-44d0-b60f-52d3f5a8c47a
Text: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . 5 2.2 Training Details . . . . . . . . . . . .
. . . . . .
Node ID: 060e2c44-e137-48fc-997a-fef9e1253af1
Text: . . . . . . . . . . . . . . . . . . . . . . . . 5 2.2 Training
Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . .
Node ID: 84a3b614-fa9a-4a10-933f-4ea49eb42da5
Text: . . . . 5 2.2 Training Details . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.3 Llama
2Pretrained Model
Node ID: d147d85b-a6f9-4c26-b7d7-61e897c67c67
Text: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . 5 2.3 Llama 2Pretrained Model Evaluation . . . . . . . . . .
. . . . . . . . .
Node ID: 74ecb943-d7b1-479e-b280-8a94be4c97f5
Text: . . . . . . . . . . . . . . . . . 5 2.3 Llama 2Pretrained Model
Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . 7 3 Fine-tuning
Node ID: d6860cfa-8486-41c4-80bd-4bc53788b13a
Text: Llama 2Pretrained Model Evaluation . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . 7 3 Fine-tuning 8 3.1 Supervised
Fine-Tuning (SFT) . . . . . . . .
Node ID: 3bd146c0-a554-4e4d-94a9-24cba2a3dbc2
Text: . . . . . . . . . . . . . . . . . . . . 7 3 Fine-tuning 8 3.1
Supervised Fine-Tuning (SFT) . . . . . . . . . . . . . . . . . . . . .
. . . . . . .
Node ID: b84e1605-0ac9-4c9f-9ff5-
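Notice in the output above how each node repeats the tail of the previous one; that is the chunk_overlap at work. Stripped of tokenization details, the sliding window can be sketched in plain Python (splitting on words here, whereas TokenTextSplitter counts real tokens):

```python
def chunk_words(words, chunk_size, overlap):
    """Slide a window of chunk_size words, stepping by chunk_size - overlap."""
    step = chunk_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    return [words[i:i + chunk_size]
            for i in range(0, len(words), step)
            if words[i:i + chunk_size]]


words = [f"w{i}" for i in range(10)]
for c in chunk_words(words, chunk_size=4, overlap=2):
    print(" ".join(c))
# Each chunk shares its first 2 words with the tail of the previous one:
# w0 w1 w2 w3 / w2 w3 w4 w5 / w4 w5 w6 w7 / w6 w7 w8 w9 / w8 w9
```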
