05-28 Tuesday: Computing TTFT, ITL, and TGS, and Debugging the LLaMA-2 Inference Code

| Date | Version | Author | Description |
| --- | --- | --- | --- |
| 2024-05-28 15:03:49 | V0.1 | 宋全恒 | Document created |

Introduction

This article works through several metrics of the LLM inference process: TTFT (time to first token), ITL (inter-token latency), and TGS (the overall per-token generation time, as computed below).


Code Snippet

```python
import os

data_dir = "/workspace/models/"
model_name = "Llama-2-7b-hf"
data_dir = data_dir + model_name

import transformers
from transformers import AutoTokenizer, AutoModel
from transformers import LlamaForCausalLM, LlamaTokenizer
import time
import torch
# from ixformer.inference.models.chatglm2_6b import ChatGLMForConditionalGeneration
# from torch.cuda import profiler
import argparse
import pickle
# from thop import profile
import json
from datetime import datetime


def main():
    tokenizer = AutoTokenizer.from_pretrained(data_dir, trust_remote_code=True, device_map="auto")
    model = transformers.AutoModelForCausalLM.from_pretrained(data_dir, trust_remote_code=True, device_map='auto')
    INPUT_LEN = [32, 64, 128, 256, 512, 1024, 2048]
    # INPUT_LEN = [1024, 2048]
    current_time = datetime.now().strftime("%Y%m%d%H%M%S")
    res_file = "result_" + model_name + "_fp16_" + current_time + ".txt"
    print(f"res_file {res_file}")
    with open(res_file, "w") as f_result:
        with open("input_request_list", "rb") as f:
            input_request_list = pickle.load(f)
            for input_request in input_request_list:
                print(input_request)
                test_len = input_request[1]
                if test_len not in INPUT_LEN:
                    continue
                print("testing len:{}...".format(test_len))
                query, prompt_len, output_len = input_request
                inputs = tokenizer(query, return_tensors='pt').to('cuda')
                # warmup run; its result is discarded
                geneate_ids = model.generate(inputs.input_ids, max_new_tokens=1, max_length=None, do_sample=False)
                # response, _ = model.chat(tokenizer, query, max_new_tokens=1, do_sample=False, history=[])
                # torch.cuda.synchronize()

                print("start TTFT test...")
                TTFT_list = []
                for _ in range(2):
                    start_time = time.time()
                    geneate_ids = model.generate(inputs.input_ids, max_new_tokens=1, max_length=None, do_sample=False)
                    # response, _ = model.chat(tokenizer, query, do_sample=False, max_new_tokens=1, max_length=None, history=[])
                    # torch.cuda.synchronize()
                    end_time = time.time()
                    TTFT = (end_time - start_time) * 1000
                    print(TTFT)
                    TTFT_list.append(TTFT)
                TTFT = sum(TTFT_list) / len(TTFT_list)
                print("time to first token:{:2f} ms".format(TTFT))

                print("start ITL test...")
                ITL_list = []
                out_tokens_num = 0
                for _ in range(2):
                    start_time = time.time()
                    geneate_ids = model.generate(inputs.input_ids, max_new_tokens=50, max_length=None, do_sample=False)
                    outputs = geneate_ids.tolist()[0][len(inputs["input_ids"][0]):]
                    # response, _ = model.chat(tokenizer, query, max_new_tokens=50, do_sample=False, history=[])
                    # torch.cuda.synchronize()
                    end_time = time.time()
                    # out_tokens_num = len(tokenizer(response).input_ids)
                    out_tokens_num = len(outputs)
                    print("out_tokens_num:{}".format(out_tokens_num))
                    ITL = ((end_time - start_time) * 1000 - TTFT) / out_tokens_num
                    print(ITL)
                    ITL_list.append(ITL)
                ITL = sum(ITL_list) / len(ITL_list)
                print("inter-token latency:{:2f} ms".format(ITL))

                f_result.write("In len:{}\n".format(test_len))
                f_result.write("Out len:{}\n".format(out_tokens_num))
                f_result.write("TTFT:{:.2f}\n".format(TTFT))
                f_result.write("ITL:{:.2f}\n".format(ITL))
                f_result.write("\n")
                f_result.flush()


if __name__ == "__main__":
    main()
```

Debugging Process

Configuring the debugger in VS Code

For details, see "05-16 Thursday: setting up a remote debugging environment with VS Code".

launch.json configuration

```json
{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "LLaMa2 inference",
            "type": "debugpy",
            "request": "launch",
            "program": "/workspace/infer/test_dql_fp16.py",
            "console": "integratedTerminal",
            "cwd": "/workspace/infer/",
            "args": []
        }
    ]
}
```

Execution Log

(python38_torch201_cuda) root@node-01:/workspace/infer/#  cd /workspace/infer/ ; /usr/bin/env /root/miniconda/envs/python38_torch201_cuda/bin/python /root/.vscode-server/extensions/ms-python.debugpy-2024.0.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher 52003 -- /workspace/infer/test_dql_fp16.py 
XCCL /workspace/tools/xccl_rdma-ubuntu_x86_64/so/libbkcl.so loaded
[15:02:54][node-01][6791][WARN][BKCL][globals.cpp:127] set BKCL BLOCK SIZE to 0
SYMBOL_REWRITE torch success
SYMBOL_REWRITE torchvision success
[2024-05-28 15:02:56,781] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
MODULE_REPLACE apex success
MODULE_REPLACE fused_lamb success
WARNING: hook error!  No module named 'megatron'
SYMBOL_REWRITE deepspeed success
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:16<00:00,  8.30s/it]
res_file {res_file}
['I have an interview about product speccing with the company Weekend Health. Give me an example of a question they might ask with regards about a new feature', 32, 39]
testing len:32...
/root/miniconda/envs/python38_torch201_cuda/lib/python3.8/site-packages/transformers/generation/configuration_utils.py:362: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.warnings.warn(
/root/miniconda/envs/python38_torch201_cuda/lib/python3.8/site-packages/transformers/generation/configuration_utils.py:367: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.warnings.warn(
Both `max_new_tokens` (=1) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
2024-05-28 15:04:24.257995: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
start TTFT test...
/root/miniconda/envs/python38_torch201_cuda/lib/python3.8/site-packages/transformers/generation/configuration_utils.py:362: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.warnings.warn(
/root/miniconda/envs/python38_torch201_cuda/lib/python3.8/site-packages/transformers/generation/configuration_utils.py:367: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.warnings.warn(
Both `max_new_tokens` (=1) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
80.51776885986328
Both `max_new_tokens` (=1) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
75.45685768127441
time to first token:77.987313 ms
start ITL test...
Both `max_new_tokens` (=50) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
out_tokens_num:50
65.30064821243286
Both `max_new_tokens` (=50) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
out_tokens_num:50
65.08305311203003
inter-token latency:65.191851 ms
['In Java, I want to replace string like "This is a new {object} at {place}" with a Map, {object: "student", "point 3, 4"}, and get a result "This is a new student at point 3, 4". How can I do?', 64, 494]
testing len:64...
Both `max_new_tokens` (=1) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
start TTFT test...
Both `max_new_tokens` (=1) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
82.96585083007812
Both `max_new_tokens` (=1) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
83.1289291381836
time to first token:83.047390 ms
start ITL test...
Both `max_new_tokens` (=50) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
out_tokens_num:50
67.13986396789551
Both `max_new_tokens` (=50) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
out_tokens_num:50
64.81363773345947
inter-token latency:65.976751 ms
["MK runs out of the palace and opens the gates, allowing Macaque and the freedom fighters to come in and begin taking out Wukong's guards. The dark-haired soldiers who Wukong had forced into his army cheer at the sight of their true king and begin turning on the golden-haired soldiers. MK just barely manages to shout a warning to Macaque that Wukong is on his way before Wukong appears, leaping down from a cloud and knocking Macaque out of the sky and slamming him hard into the ground. Let's write that whole scene with details and dialogue.", 128, 701]
testing len:128...
Both `max_new_tokens` (=1) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
start TTFT test...
Both `max_new_tokens` (=1) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
102.16498374938965
Both `max_new_tokens` (=1) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
102.17642784118652
time to first token:102.170706 ms
start ITL test...
Both `max_new_tokens` (=50) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
out_tokens_num:50
64.69708442687988
Both `max_new_tokens` (=50) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
out_tokens_num:50
67.01707363128662
inter-token latency:65.857079 ms
['this is one specification for the customer, can you explain it \n\nusing System;\nusing System.Linq;\nusing Ardalis.GuardClauses;\nusing Ardalis.Specification;\nusing uInvoice.Core.Entities;\n\nnamespace uInvoice.Core.Specifications\n{\n public class CustomerByIdWithIncludesSpec : Specification, ISingleResultSpecification\n {\n public CustomerByIdWithIncludesSpec(Guid customerId)\n {\n Guard.Against.NullOrEmpty(customerId, nameof(customerId));\n\n Query.Where(customer => customer.CustomerId == customerId && customer.IsActive == true)\n .OrderBy(customer => customer.CustomerCode)\n .Include(f => f.CustomerAddresses)\n .Include(f => f.CustomerEmailAddresses)\n .Include(f => f.CustomerPhoneNumbers)\n .Include(f => f.CustomerPriceLists)\n .Include(f => f.Invoices)\n .AsNoTracking();\n }\n }\n}', 256, 376]
testing len:256...
Both `max_new_tokens` (=1) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
start TTFT test...
Both `max_new_tokens` (=1) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
130.12409210205078
Both `max_new_tokens` (=1) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
129.8387050628662
time to first token:129.981399 ms
start ITL test...
Both `max_new_tokens` (=50) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
out_tokens_num:50
66.0555100440979
Both `max_new_tokens` (=50) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
out_tokens_num:50
64.68253374099731
inter-token latency:65.369022 ms
['Nezha nods in agreement and turns his chariot towards the city. As they fly over the rooftops, he asks MK about Pigsy and how he came to be with Wukong.\n\nMK tells Nezha about how he had been kidnapped by Wukong and taken to the Mountain of Flowers and Fruit, where he was put in the alcove cage. He explains how Macaque had helped him escape and how they had been hiding out at Pigsy\'s noodle shop ever since.\n\nNezha listens intently, his expression grave. "I\'m sorry you had to go through that," he says. "But I\'m glad you\'re safe now."\n\nAs they approach the noodle shop, Nezha gently sets the chariot down in the street. MK hops out and runs inside. Pigsy is there, cooking noodles as usual. When he sees MK, he drops his ladle and runs over to give him a big hug.\n\n"Thank goodness you\'re safe!" Pigsy says, tears in his eyes. "I was so worried about you."\n\nMK tells Pigsy about everything that happened, about how Macaque had saved him and how Nezha had helped them. Pigsy listens in amazement, his eyes widening with each new detail.\n\nNezha steps inside the shop and nods at Pigsy. "I\'m glad to see you\'re both doing well," he says.\n\nPigsy looks at Nezha with a mixture of fear and awe. "Thank you for everything you\'ve done," he says. "I don\'t know how to repay you."\n\nNezha smiles. "You don\'t need to repay me. I did what I did because it was the right thing to do."\n\nPigsy nods, still looking a bit overwhelmed. "I understand. Thank you again."\n\nMK looks up at Nezha, his eyes shining. "Thank you for bringing me home," he says.\n\nNezha ruffles MK\'s hair affectionately. "Of course. Take care of yourself, okay?"\n\nWith that, Nezha steps back into his chariot and takes off into the sky. Pigsy and MK watch him go, their hearts filled with gratitude and wonder at the amazing events that had just transpired.', 512, 38]
testing len:512...
Both `max_new_tokens` (=1) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
start TTFT test...
Both `max_new_tokens` (=1) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
285.5086326599121
Both `max_new_tokens` (=1) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
279.91533279418945
time to first token:282.711983 ms
start ITL test...
Both `max_new_tokens` (=50) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
out_tokens_num:50
67.791428565979
Both `max_new_tokens` (=50) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
out_tokens_num:50
65.05216598510742
inter-token latency:66.421797 ms
['Yes, I can definitely help you with that!\n\nThe Python script would need to perform the following functions:\n\n1. Initialize the GPIO pins and relay modules to control the solenoids\n2. Monitor the pressure switch for input to start the sequence\n3. Start the timed sequences for each solenoid, adjusting the duration based on user input from the touchscreen\n4. Log data for each sequence, including the start and end times, solenoid used, and location data from the GPS module\n5. Upload the data to the cloud through WiFi or Bluetooth\n\nHere\'s an example script to get you started:\n```python\nimport time\nimport RPi.GPIO as GPIO\nimport pygame\nimport gpsd\n\n# Initialize the GPIO pins and relay modules\nGPIO.setmode(GPIO.BCM)\nrelay_pins = [1, 2, 3, 4, 5, 6, 7, 8, 9]\nfor pin in relay_pins:\n    GPIO.setup(pin, GPIO.OUT)\n\n# Initialize the GPS module\ngpsd.connect()\n\n# Initialize the Pygame GUI\npygame.init()\nscreen = pygame.display.set_mode((800, 480))\nfont = pygame.font.SysFont("Arial", 24)\n\n# Function to start a timed sequence for a solenoid\ndef run_sequence(solenoid, duration):\n    GPIO.output(relay_pins[solenoid], GPIO.HIGH)\n    time.sleep(duration)\n    GPIO.output(relay_pins[solenoid], GPIO.LOW)\n\n# Main loop\nwhile True:\n    # Get input from the pressure switch to start the sequence\n    if GPIO.input(pressure_switch_pin):\n        # Display the GUI on the touchscreen\n        screen.fill((255, 255, 255))\n        label = font.render("Select sequence duration for each solenoid:", True, (0, 0, 0))\n        screen.blit(label, (50, 50))\n        solenoid_durations = [0] * 9\n        for i in range(9):\n            label = font.render("Solenoid " + str(i + 1) + " duration (seconds):", True, (0, 0, 0))\n            screen.blit(label, (50, 100 + i * 50))\n            pygame.draw.rect(screen, (0, 0, 255), (350, 100 + i * 50, 100, 30))\n            pygame.draw.rect(screen, (255, 0, 0), (460, 100 + i * 50, 100, 30))\n        
pygame.display.update()\n\n        # Wait for user input on the touchscreen\n        running = True\n        while running:\n            for event in pygame.event.get():\n                if event.type == pygame.QUIT:\n                    running = False\n                elif event.type == pygame.MOUSEBUTTONDOWN:\n                    pos = pygame.mouse.get_pos()\n                    for i in range(9):\n                        if pos[0] >= 350 and pos[0] <= 450 and pos[1] >= 100 + i * 50 and pos[1] <= 130 + i * 50:\n                            solenoid_durations[i] += 1\n                            pygame.draw.rect(screen, (0, 255, 0), (350, 100 + i * 50, solenoid_durations[i] * 10, 30))\n                            pygame.display.update()\n                        elif pos[0] >= 460 and pos[0] <= 560 and pos[1] >= 100 + i * 50 and pos[1] <= 130 + i * 50:\n                            solenoid_durations[i] -= 1\n                            if solenoid_durations[i] <\n```', 1024, 18]
testing len:1024...
Both `max_new_tokens` (=1) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
start TTFT test...
Both `max_new_tokens` (=1) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
526.5488624572754
Both `max_new_tokens` (=1) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
537.6203060150146
time to first token:532.084584 ms
start ITL test...
Both `max_new_tokens` (=50) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
out_tokens_num:50
65.85144281387329
Both `max_new_tokens` (=50) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
out_tokens_num:50
67.71608114242554
inter-token latency:66.783762 ms
['Annie the Ant: En kl?nete og glemsk maur som ved et uhell snubler over et mykorrhizanettverk mens hun leter etter mat.\nWoody the Tree: Et klokt og t?lmodig tre som l?rer de andre karakterene om fordelene med mykorrhizale forhold og hvordan de hjelper tr?r ? vokse.\nBuzzy the Bee: En hyperaktiv og energisk bie som l?rer om mykorrhisering mens hun pollinerer blomster i et mykorrhizalt ?kosystem.\nSammy the Soil: En kl?nete og vennlig haug med jord som fungerer som en guide for de andre karakterene, og viser dem mykorrhiseringens underverk og hvordan det bidrar til sunn jord.\nBella the Bacteria: En frekk og selvsikker bakterie som vet alt om den viktige rollen til bakterier i mykorrhiza-forhold og hjelper de andre karakterene ? forst? dette ogs?.\nPippin the Plant: En nysgjerrig og eventyrlysten plante som oppdager fordelene med mykorrhisering mens du utforsker skogbunnen.\nSammy the Soil, Pippin the Plant og vennene deres var fast bestemt p? ? finne en m?te ? hjelpe den forurensede Sammy i hagen deres. Med hjelp av Dr. Baltazar l?rte de mer om mycoremediation og hvordan sopp som ?sters sopp kan bidra til ? rydde opp forurenset jord.\nSammen bestemte de seg for ? pr?ve et eksperiment med ?sters soppmycelium for ? rydde opp petroleum i Sammys jord. Pippin plantet noen ?sters sopp gyte i forurenset jord, og de ventet ? se hva som ville skje.\nEtter noen uker kunne de se myceliet spre seg gjennom jorda og bryte ned oljen. Snart begynte sm? ?sters sopp ? vokse, og jorda begynte ? se sunnere ut.\ntil hverandre. En god lesning for elevene deres, og en fin avslutning p?¥ kurssl?pet deres.\nSammy the Soil ble overrasket over kraften til mycorrhizal sopp og hvordan de kunne bidra til ? rydde opp forurenset jord. Han f?lte seg stolt over ? v?re en del av et s? utrolig undergrunnssamfunn.\nDa de fortsatte ? l?re mer om mycorrhizal sopp, oppdaget gruppen at disse sm? organismer ogs? var viktige for plantevekst. 
Woody the Tree forklarte at mycorrhizal sopp danner et gjensidig fordelaktig forhold til plantens r?tter, og hjelper dem med ? absorbere n?ringsstoffer fra jorda.\nBuzzy the Bee var spent p? ? h?re at mykorrhizal sopp ogs? var viktig for blomstene hun bes?kte, og at de bidro til ? gj?re nektar og pollen mer n?ringsrik for henne og hennes andre bier.\nAnnie the Ant var fascinert av det komplekse nettverket av mycorrhizal sopp som koblet forskjellige planter og tr?r i skogen. Hun inns? at disse soppene var som skogens internett, slik at forskjellige organismer kunne kommunisere og dele ressurser.\nSammen fortsatte vennegjengen ? utforske den utrolige verden av mykorrhizasopp, og de ble overrasket over hvor mye de m?tte l?re. De oppdaget at disse sm? organismene ikke bare var viktige for jordhelsen, men for helsen til hele ?kosystemer.\nVed oljehullet er Pueblo t?rrlandbruksteknikker utstilt i en offentlig park i sentrum. Hagen, designet og Pippin the Planed av den urfolksledede organisasjonen millj?agewntenne hvordan mat og medisiner kan dyrkes i et milj? Men de har et problem: Sammy-ene er giftige.\nPetroleum fra en n?rliggende parkeringsplass siver inn i Sammy n?r det regner sko ?sterssopp vil rydde opp i rotet.\nDet har fungert f?r.\nMykolog Fungo the Fung forklarer at i en prosess som kalles mycoremediation, har sopp evnen til ? fjerne kjemikalier fra Sammy the - og tungmetaller fra vann - gjennom myceliet.\n“De er p? en m?te naturens st?rste nedbrytere, demonterende, langt bedre enn og kraftigere enn bakteriene Bella, dyrene og Pippin the Plans,” sa McCoy. “De bryter ned alle slags ting.”\nSopp har bidratt til ? fjerne petroleum fra Sammy the Sammy overalt fra Orleans, California, hvor de ryddet opp i et lite motorolje- og dieseldrivstoffs?l p? et samfunnssenter, til den ecmediation for ? rydde opp i tungmetallene. De vet enn? ikke om anbefalingene deres vil f?re til strengere oppryddingskrav. I mellomtiden gj?r sammy de kan for ? 
rydde opp i petroleum ved Food Oasis.begravde pippin murstein inokulert med ?sterssoppmycel. De planlegger ? teste Sammy the Sammy v?ren for ? se hvordan utbedringen fungerte.\n“Jeg tror det vanlige synet er at disse stedene har g?tt tapt for oss fordi de er forurenset,” sa sammy a. "Men for meg er det som om du ikke bare ville forlatt din syke sammy jorden sin for ? lide alene.\n"Det er slik vi f?ler om disse stedene. De er syke. De trenger helbredelse. De trenger v?r kj?rlighet og oppmerksomhet m varianter vil vokse p? mycelet til de oljespisende soppene til det meste av eller all oljen er borte.? Saken er at disse mycelene, nettverket av filamenter som soppen vokser p?, de kan ikke se forskjellen mellom petroleum og n?ringsstoffer som glukose, s? de er enn noen gang.?\nDe kan bryte ned stort sett alt bortsett fra noen ganske hardcore kjemiske substrater som plast og noen tungmetaller, inkludert gull og s?lv," sa McCoy. Akkurat n? er Sammies dekket av gullfarget ?sterssopp fordi gull ogs? er giftig. Til slutt vil soppen bli brun n?r de bryter ned petroleumenuadorianske Amazonas, hvor de brukes til ? rense opp det st?rste landbaserte oljeutslippet i historien.\nBeata Tsosie-Pe?a fra Santa Clara Pueblo er programkoordinator hos Tewa Women United. Hun sa at hennes eldste opplevde sykdom, sykdom og spontanabort som et resultat av forurensning i omr?det.\nDet er ikke bare Sammy the s i hagen som er forurenset. Ved det n?rliggende Los Alamos National Laboratory siver seksverdig krom, et tungmetall og kjent kreftfremkallende stoff inn i vannforsyningen.\nEn koalisjon inkludert pippin tar til orde for, gjennom offentlig vitnesbyrd i en prosess for bruker mycore', 2048, 597]
testing len:2048...
Both `max_new_tokens` (=1) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
start TTFT test...
Both `max_new_tokens` (=1) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
1192.781686782837
Both `max_new_tokens` (=1) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
1187.046766281128
time to first token:1189.914227 ms
start ITL test...
Both `max_new_tokens` (=50) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
out_tokens_num:50
67.58917331695557
Both `max_new_tokens` (=50) and `max_length`(=None) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
out_tokens_num:50
69.20413970947266
inter-token latency:68.396657 ms

Code Walkthrough

input_request_list

```python
input_request_list = pickle.load(f)
print(f"len(input_request_list): {len(input_request_list)}")
for input_request in input_request_list:
```

Stepping through with a breakpoint shows the contents of input_request_list.


The program prints each input_request, for example:

['I have an interview about product speccing with the company Weekend Health. Give me an example of a question they might ask with regards about a new feature', 32, 39]

The key is to understand these three statements:

```python
query, prompt_len, output_len = input_request
inputs = tokenizer(query, return_tensors='pt').to('cuda')
geneate_ids = model.generate(inputs.input_ids, max_new_tokens=1, max_length=None, do_sample=False)
```

In the code above, the key is input_request: it is a tuple of three elements.

  • query: the prompt text.
  • prompt_len: the prompt length. Note that this is not the length of the string but the number of tokens produced by the tokenizer.
  • output_len: the intended output length. During inference the output length cannot be fixed in advance because of generation randomness, and this variable is never read by the code.

The code tokenizes query with the already-initialized tokenizer to obtain inputs, which holds the token information for query, namely inputs.input_ids.

model.generate is then given the token sequence of query and returns geneate_ids, the token sequence of the generated response. The first model.generate call serves only as a warmup, so its result has no effect on the measurements.

As for the relationship between query and 32: tokenizing query with the LLaMA-2 tokenizer yields a sequence of exactly 32 tokens; this length is dictated by the product test requirements.
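To make the tuple structure concrete, here is a minimal sketch that unpacks one request the way the benchmark does; the sample values are copied from the first entry printed in the execution log above.

```python
# One entry of input_request_list: (query, prompt_len, output_len).
# The values mirror the first request shown in the execution log.
input_request = (
    "I have an interview about product speccing with the company Weekend Health. "
    "Give me an example of a question they might ask with regards about a new feature",
    32,   # prompt length in tokens, not characters
    39,   # intended output length; never read by the benchmark
)

query, prompt_len, output_len = input_request

# The prompt length counts tokens, so it is far smaller than the character count.
print(len(query), prompt_len)
```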

Understanding the Token Sequence

🎈 Why do a query and its token sequence have different lengths?

In the LLM setting, the length of a query and the length of its token sequence differ because text must first be converted into a form the model can process, which normally involves tokenization.

Tokenization splits raw text into a sequence of independent units (tokens), each a meaningful unit the model can handle. Depending on the task, these units may be characters, subwords, or whole words.

Because tokenization applies different rules and strategies to different text, and may account for domain-specific terminology, a query is turned into a series of tokens whose count generally differs from the character count of the original string (for English text it is usually much smaller, as in the 32-token example above).

Other preprocessing steps, such as stopword removal or case normalization, can also affect the difference between the token-sequence length and the raw query length.

In short, to help the model understand and learn from text, inputs are preprocessed into a form it can consume, so it is perfectly normal for a query's length and its token sequence's length to differ.

🎈 Is the token sequence produced by the tokenizer ordered?

Yes. Tokenizing a query normally produces an ordered token sequence: the tokenizer splits the original string into units and arranges them in the order they appear in the string.

Preserving order ensures the model can correctly interpret the text and retain its syntactic and semantic structure, so it can understand what the query expresses and perform downstream tasks (retrieval, classification, and so on).

So in the vast majority of cases, the token sequence respects the order in which words or subwords occur in the original query.
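As an illustration, here is a toy example using a naive whitespace tokenizer (standing in for LLaMA-2's real subword tokenizer): the token count differs from the character count, and token order follows the text.

```python
def toy_tokenize(text):
    """Naive whitespace tokenizer, a stand-in for a real subword tokenizer."""
    return text.split()

query = "Give me an example of a question they might ask"
tokens = toy_tokenize(query)

# Token count and character count differ: 47 characters vs. 10 tokens.
print(len(query), len(tokens))

# The sequence is ordered: joining the tokens reproduces the text.
assert " ".join(tokens) == query
```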

Measuring TTFT

```python
print("start TTFT test...")
TTFT_list = []
for _ in range(2):
    start_time = time.time()
    geneate_ids = model.generate(inputs.input_ids, max_new_tokens=1, max_length=None, do_sample=False)
    # response, _ = model.chat(tokenizer, query, do_sample=False, max_new_tokens=1, max_length=None, history=[])
    # torch.cuda.synchronize()
    end_time = time.time()
    TTFT = (end_time - start_time) * 1000
    print(TTFT)
    TTFT_list.append(TTFT)
TTFT = sum(TTFT_list) / len(TTFT_list)
print("time to first token:{:2f} ms".format(TTFT))
```

The code above uses the time module to measure how long generating the first token takes, appends each measurement to TTFT_list, and averages the list outside the loop. The important detail is max_new_tokens=1: limiting generation to a single new token makes the elapsed time the time-to-first-token.
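The measurement pattern can be sketched without a GPU by stubbing out model.generate; fake_generate and its ~5 ms sleep below are made up purely for illustration. (On a real GPU, the commented-out torch.cuda.synchronize() matters: without it, the clock may be read before all kernels have finished.)

```python
import time

def fake_generate(delay_s=0.005):
    """Stand-in for model.generate(..., max_new_tokens=1); just burns ~5 ms."""
    time.sleep(delay_s)

ttft_list = []
for _ in range(2):
    start = time.time()
    fake_generate()
    end = time.time()
    ttft_list.append((end - start) * 1000)  # milliseconds, as in the benchmark

ttft = sum(ttft_list) / len(ttft_list)
print("time to first token:{:.2f} ms".format(ttft))
```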

Measuring ITL and TGS

print("start ITL test...")
ITL_list = []
TGS_list = []
out_tokens_num = 0
for _ in range(10):
    start_time = time.time()
    geneate_ids = model.generate(inputs.input_ids, max_new_tokens=50, max_length=None, do_sample=False)
    outputs = geneate_ids.tolist()[0][len(inputs["input_ids"][0]):]
    # response, _ = model.chat(tokenizer, query, max_new_tokens=50, do_sample=False, history=[])
    # torch.cuda.synchronize()
    end_time = time.time()
    # out_tokens_num = len(tokenizer(response).input_ids)
    out_tokens_num = len(outputs)
    print("out_tokens_num:{}".format(out_tokens_num))
    ITL = ((end_time - start_time) * 1000 - TTFT) / out_tokens_num
    TGS = ((end_time - start_time) * 1000) / out_tokens_num
    print(f"ITL: {ITL}")
    print(f"TGS: {TGS}")
    ITL_list.append(ITL)
    TGS_list.append(TGS)
ITL = sum(ITL_list) / len(ITL_list)
TGS = sum(TGS_list) / len(TGS_list)
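The two metrics computed inside the loop reduce to simple arithmetic over three measured quantities. The helper below is an illustrative restatement of the article's formulas, not part of the original script:

```python
def itl_tgs(total_ms, ttft_ms, out_tokens):
    """ITL: average gap between tokens after the first, per the article's
    formula (total time minus TTFT, divided by tokens generated).
    TGS: total generation time per output token."""
    itl = (total_ms - ttft_ms) / out_tokens
    tgs = total_ms / out_tokens
    return itl, tgs

# e.g. a 50-token generation that took 1050 ms total, with a 50 ms TTFT:
itl, tgs = itl_tgs(total_ms=1050.0, ttft_ms=50.0, out_tokens=50)
print(itl, tgs)  # 20.0 21.0
```

One caveat worth noting: some benchmarks divide the post-TTFT time by `out_tokens - 1` (the number of inter-token gaps) rather than `out_tokens`; the article's formula uses the token count itself.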

?The core of the code above is:

    geneate_ids = model.generate(inputs.input_ids, max_new_tokens=50, max_length=None, do_sample=False)
    outputs = geneate_ids.tolist()[0][len(inputs["input_ids"][0]):]

?First, the model generates the response token sequence for the query, with max_new_tokens=50 capping the number of newly generated tokens at 50; the actual length of the generated sequence is not fixed and may be shorter.


The purpose of this code is to generate a new token sequence with the given model and store the result in the outputs variable. First, the model's generate method is called: inputs.input_ids is the token sequence fed into the model for inference, max_new_tokens caps how many new tokens may be generated (50 here), max_length is the overall length limit (None here), and do_sample controls whether sampling is used (False here, i.e. greedy decoding). The tolist() method then converts the generated tensor to a Python list: geneate_ids.tolist() returns a two-dimensional list whose first dimension indexes the samples in the batch and whose second dimension holds each sample's token sequence.
Next, the first sample is taken from that list, and the slice [len(inputs["input_ids"][0]):] strips the input portion, which generate echoes at the start of its output. The result is stored in outputs. This assumes a batch size of 1, so only that single sample is processed.

?The actual length of the generated token sequence is stored in out_tokens_num, which is then used to compute the inter-token latency.

geneate_ids.tolist()[0][len(inputs["input_ids"][0]):]

[0] takes the first sample because the batch size is 1, and [len(inputs["input_ids"][0]):] is needed because the output token sequence includes the input sequence, so the prompt's token length is sliced off. What remains is exactly the number of newly generated tokens.
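The slicing can be demonstrated with plain lists standing in for the tensors. The ids below are hypothetical, chosen only to show that `generate`-style output is prompt ids followed by new ids:

```python
# Plain-list stand-in for generate() output: the returned sequence is the
# prompt ids followed by the newly generated ids, so slicing off the prompt
# length leaves only the new tokens.
input_ids = [101, 2054, 2003]                  # hypothetical prompt ids
generate_ids = [[101, 2054, 2003, 7592, 999]]  # batch of 1: prompt + 2 new ids
outputs = generate_ids[0][len(input_ids):]
print(outputs)       # [7592, 999]
print(len(outputs))  # 2 generated tokens
```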

Summary

?LLMs always seemed mysterious to me, but the logic that drives LLM inference is ordinary program flow: for loops, branches, and sequential code. With years of development experience, that part is easy to follow. What I still have not grasped, despite reading about it many times, is the model architecture itself and how the transformer actually works, and that continues to weigh on me.

?Making a little progress every day may be the happiest thing there is. Recognize yesterday's mistakes, improve a little each day — I hope the rest of my life can be like this.

