【Binaryen】partiallyPrecompute函數梳理

在Binaryen中有一個優化名為Precompute,作用是進行一些提前計算,類似于LLVM中的常量折疊類似的操作。
涉及的提交文件在此。

首先放一下全部的代碼:

// To partially precompute selects we walk up the stack from them, like this:////  (A//    (B//      (select//        (C)//        (D)//        (condition)//      )//    )//  )//// First we try to apply B to C and D. If that works, we arrive at this:////  (A//    (select//      (constant result of B(C))//      (constant result of B(D))//      (condition)//    )//  )//// We can then proceed to perhaps apply A. However, even if we failed to apply// B then we can try to apply A and B together, because that combination may// succeed where incremental work fails, for example:////  (global $C//    (struct.new    ;; outer//      (struct.new  ;; inner//        (i32.const 10)//      )//    )//  )////  (struct.get    ;; outer//    (struct.get  ;; inner//      (select//        (global.get $C)//        (global.get $D)//        (condition)//      )//    )//  )//// Applying the inner struct.get to $C leads us to the inner struct.new, but// that is an interior pointer in the global - it is not something we can// refer to using a global.get, so precomputing it fails. However, when we// apply both struct.gets at once we arrive at the outer struct.new, which is// in fact the global $C, and we succeed.void partiallyPrecompute(Function* func) {if (!canPartiallyPrecompute || partiallyPrecomputable.empty()) {// Nothing to do.return;}// Walk the function to find the parent stacks of the promising selects. We// copy the stacks and process them later. We do it like this because if we// wanted to process stacks as we reached them then we'd trip over// ourselves: when we optimize we replace a parent, but that parent is an// expression we'll reach later in the walk, so modifying it is unsafe.struct StackFinder : public ExpressionStackWalker<StackFinder> {Precompute& parent;StackFinder(Precompute& parent) : parent(parent) {}// We will later iterate on this in the order of insertion, which keeps// things deterministic, and also usually lets us do consecutive work// like a select nested in another select's condition, simply because we// will traverse the selects in postorder (however, because we cannot// always succeed in an incremental manner - see the comment on this// function - it is possible in theory that some work can happen only in a// later execution of the pass).InsertOrderedMap<Select*, ExpressionStack> stackMap;void visitSelect(Select* curr) {if (parent.partiallyPrecomputable.count(curr)) {stackMap[curr] = expressionStack;}}} stackFinder(*this);stackFinder.walkFunction(func);// Note which expressions we've modified as we go, as it is invalid to// modify more than once. This could happen in theory in a situation like// this:////  (ternary.f32.max  ;; fictional instruction for explanatory purposes//    (select ..)//    (select ..)//    (f32.infinity)//  )//// When we consider the first select we can see that the computation result// is always infinity, so we can optimize here and replace the ternary. Then// the same thing happens with the second select, causing the ternary to be// replaced again, which is unsafe because it no longer exists after we// precomputed it the first time. (Note that in this example the result is// the same either way, but at least in theory an instruction could exist// for whom there was a difference.) In practice it does not seem that wasm// has instructions capable of this atm but this code is still useful to// guard against future problems, and as a minor speedup (quickly skip code// if it was already modified).std::unordered_set<Expression*> modified;for (auto& [select, stack] : stackFinder.stackMap) {// Each stack ends in the select itself, and contains more than the select// itself (otherwise we'd have ignored the select), i.e., the select has a// parent that we can try to optimize into the arms.assert(stack.back() == select);assert(stack.size() >= 2);Index selectIndex = stack.size() - 1;assert(selectIndex >= 1);if (modified.count(select)) {// This select was modified; go to the next one.continue;}// Go up through the parents, until we can't do any more work. At each// parent we'll try to execute it and all intermediate parents into the// select arms.for (Index parentIndex = selectIndex - 1; parentIndex != Index(-1);parentIndex--) {auto* parent = stack[parentIndex];if (modified.count(parent)) {// This parent was modified; exit the loop on parents as no upper// parent is valid to try either.break;}// If the parent lacks a concrete type then we can't move it into the// select: the select needs a concrete (and non-tuple) type. For example// if the parent is a drop or is unreachable, those are things we don't// want to handle, and we stop here (once we see one such parent we// can't expect to make any more progress).if (!parent->type.isConcrete() || parent->type.isTuple()) {break;}// We are precomputing the select arms, but leaving the condition as-is.// If the condition breaks to the parent, then we can't move the parent// into the select arms:////  (block $name ;; this must stay outside of the select//    (select//      (B)//      (C)//      (block ;; condition//        (br_if $target//// Ignore all control flow for simplicity, as they aren't interesting// for us, and other passes should have removed them anyhow.if (Properties::isControlFlowStructure(parent)) {break;}// This looks promising, so try to precompute here. What we do is// precompute twice, once with the select replaced with the left arm,// and once with the right. If both succeed then we can create a new// select (with the same condition as before) whose arms are the// precomputed values.auto isValidPrecomputation = [&](const Flow& flow) {// For now we handle simple concrete values. We could also handle// breaks in principle TODOreturn canEmitConstantFor(flow.values) && !flow.breaking() &&flow.values.isConcrete();};// Find the pointer to the select in its immediate parent so that we can// replace it first with one arm and then the other.auto** pointerToSelect =getChildPointerInImmediateParent(stack, selectIndex, func);*pointerToSelect = select->ifTrue;auto ifTrue = precomputeExpression(parent);if (isValidPrecomputation(ifTrue)) {*pointerToSelect = select->ifFalse;auto ifFalse = precomputeExpression(parent);if (isValidPrecomputation(ifFalse)) {// Wonderful, we can precompute here! The select can now contain the// computed values in its arms.select->ifTrue = ifTrue.getConstExpression(*getModule());select->ifFalse = ifFalse.getConstExpression(*getModule());select->finalize();// The parent of the select is now replaced by the select.auto** pointerToParent =getChildPointerInImmediateParent(stack, parentIndex, func);*pointerToParent = select;// Update state for further iterations: Mark everything modified and// move the select to the parent's location.for (Index i = parentIndex; i <= selectIndex; i++) {modified.insert(stack[i]);}selectIndex = parentIndex;stack[selectIndex] = select;stack.resize(selectIndex + 1);}}// Whether we succeeded to precompute here or not, restore the parent's// pointer to its original state (if we precomputed, the parent is no// longer in use, but there is no harm in modifying it).*pointerToSelect = select;}}}

以下各部分將詳細解釋:

if (!canPartiallyPrecompute || partiallyPrecomputable.empty()) {// Nothing to do.return;}

首先判斷是否需要進行partiallyPrecompute。
而如果存在select指令的話,partiallyPrecomputable中會存放這個元素。

然后是定義一個StackFinder,此StackFinder可以找到select對應的棧式結構。
stackFinder.walkFunction(func);開始分析一個func,

void walkFunction(Function* func) {setFunction(func);static_cast<SubType*>(this)->doWalkFunction(func);static_cast<SubType*>(this)->visitFunction(func);setFunction(nullptr);}

在開始和末尾分別清除了func,重點關注中間兩行。
首先doWalkFunction會對func->body進行walk。

  void walk(Expression*& root) {assert(stack.size() == 0);pushTask(SubType::scan, &root);while (stack.size() > 0) {auto task = popTask();replacep = task.currp;assert(*task.currp);task.func(static_cast<SubType*>(this), task.currp);}}

首先向stack中壓棧,但stack的定義類型是SmallVector<Task, 10> stack;,所以是一個任務棧,用來記錄需要執行的任務,你可以理解為棧中的內容都是函數指針。Task的定義如下,一個TaskFunc記錄需要執行的函數,currp指向當前的Expression,可以理解為AST的一個結點 。

  struct Task {TaskFunc func;Expression** currp;Task() {}Task(TaskFunc func, Expression** currp) : func(func), currp(currp) {}};

上述操作最終會形成一個stack,存儲Select指令的各個分支和condition。

然后是一系列的判斷條件,例如是否該select指令發生了修改,其parent結點是否不滿足某些情況。

// Find the pointer to the select in its immediate parent so that we can// replace it first with one arm and then the other.auto** pointerToSelect =getChildPointerInImmediateParent(stack, selectIndex, func);*pointerToSelect = select->ifTrue;auto ifTrue = precomputeExpression(parent);

上面代碼調用了precomputeExpression,定義看下方,其實就是判斷是否能夠生成常量。

// This looks promising, so try to precompute here. What we do is// precompute twice, once with the select replaced with the left arm,// and once with the right. If both succeed then we can create a new// select (with the same condition as before) whose arms are the// precomputed values.auto isValidPrecomputation = [&](const Flow& flow) {// For now we handle simple concrete values. We could also handle// breaks in principle TODOreturn canEmitConstantFor(flow.values) && !flow.breaking() &&flow.values.isConcrete();};

如果True分支和False分支都可以precompute的話,修改select到一個新的select:

// Wonderful, we can precompute here! The select can now contain the// computed values in its arms.select->ifTrue = ifTrue.getConstExpression(*getModule());select->ifFalse = ifFalse.getConstExpression(*getModule());select->finalize();// The parent of the select is now replaced by the select.auto** pointerToParent =getChildPointerInImmediateParent(stack, parentIndex, func);*pointerToParent = select;// Update state for further iterations: Mark everything modified and// move the select to the parent's location.for (Index i = parentIndex; i <= selectIndex; i++) {modified.insert(stack[i]);}selectIndex = parentIndex;stack[selectIndex] = select;stack.resize(selectIndex + 1);

其實就是新建一個select,true分支變為新的true分支,false分支變為新的false分支,condition不變,然后將其放入stack中,同時添加到modified中。

本文來自互聯網用戶投稿,該文觀點僅代表作者本人,不代表本站立場。本站僅提供信息存儲空間服務,不擁有所有權,不承擔相關法律責任。
如若轉載,請注明出處:http://www.pswp.cn/bicheng/22165.shtml
繁體地址,請注明出處:http://hk.pswp.cn/bicheng/22165.shtml
英文地址,請注明出處:http://en.pswp.cn/bicheng/22165.shtml

如若內容造成侵權/違法違規/事實不符,請聯系多彩編程網進行投訴反饋email:809451989@qq.com,一經查實,立即刪除!

相關文章

vs - vs2013/vs2019工程文件的區別

文章目錄 vs - vs2013/vs2019工程文件的區別概述筆記sln文件的區別VisualStudioVersion vcxproj文件的區別ToolsVersionPlatformToolset 備注更方便的方法END vs - vs2013/vs2019工程文件的區別 概述 為了避免安裝UCRT的問題&#xff0c;想將手頭的vs2019工程降級為vs2013工程…

VLM MobileVLM 部署筆記

目錄 模型是自動下載的 在1060顯卡上能跑 LLaMA Meta MobileVLM V2 MobileLLaMA-1.4B 調錯 開源項目地址&#xff1a; GitHub - Meituan-AutoML/MobileVLM: Strong and Open Vision Language Assistant for Mobile Devices 模型是自動下載的 路徑&#xff1a; C:\User…

解決Mac ~/.bash_profile 配置的環境變量重啟終端后失效問題

在Mac系統中&#xff0c;配置環境變量通常是在~/.bash_profile文件中進行。然而&#xff0c;有時會遇到配置的環境變量在重啟終端后失效的問題。 解決辦法&#xff1a; 在~/.zshrc文件最后或最前面&#xff0c;增加一行 source ~/.bash_profile

SARscape雷達圖像處理軟件簡介

合成孔徑雷達&#xff08;SAR&#xff09;擁有獨特的技術魅力和優勢&#xff0c;漸成為國際上的研究熱點之一&#xff0c;其應用領域越來越廣泛。SAR數據可以全天候對研究區域進行量測、分析以及獲取目標信息。高級雷達圖像處理工具SARscape&#xff0c;能讓您輕松將原始SAR數據…

Leetcode 第 131 場雙周賽題解

Leetcode 第 131 場雙周賽題解 Leetcode 第 131 場雙周賽題解題目1&#xff1a;3158. 求出出現兩次數字的 XOR 值思路代碼復雜度分析 題目2&#xff1a;3159. 查詢數組中元素的出現位置思路代碼復雜度分析 題目3&#xff1a;3160. 所有球里面不同顏色的數目思路代碼復雜度分析 …

AI 時代,產品經理該如何進化

前言 傳統的互聯網業務或者游戲業務&#xff0c;產品或者業務輸出需求&#xff0c;技術人員只需要指哪打哪就好了。而人工智能發展到當下這個尷尬的階段&#xff0c;仿佛它能干很多事&#xff0c;但是真把它往業務里擱就發現&#xff0c;這個叛逆的小東西不一定勝任的了這些有…

AI大模型學習筆記之四:生成式人工智能是如何工作的?

OpenAI 發布 ChatGPT 已經1年多了&#xff0c;生成式人工智能&#xff08;AIGC&#xff09;也已經廣為人知&#xff0c;我們常常津津樂道于 ChatGPT 和 Claude 這樣的人工智能系統能夠神奇地生成文本與我們對話&#xff0c;并且能夠記憶上下文情境。 GPT-4多模態分析對話 Midj…

數字機頂盒、顯示器方案DCDC芯片OC5816 2A,18V同步降壓DC-DC

概述 OC5816 是一款 2A 的高集成度、高效率同步整流降壓轉換器。在一個相當寬的輸出電流負載范圍內&#xff0c;OC5816 可以高效工作。 OC5816 的兩種工作模式&#xff0c;固定頻率PWM 峰值電流控制和輕載 PFM 開關模式&#xff0c;允許系統高效工作在一個相當寬的輸出電流…

i 人 聊 天 手 冊(e人禁止入內)

在之前的讀書筆記-《蔡康永的說話之道》中&#xff0c;作者給大家分享了很多具體的要點&#xff0c;其更偏向于戰術層面&#xff0c;我更想要的是一個類似聊天手冊的東西&#xff0c;就讓我自己來總結下吧。 雖然在 MBTI 中&#xff0c;按照獲取能量的方式定義了 i 人、e 人&a…

【面試干貨】如何選擇MySQL數據庫存儲引擎(MyISAM 或 InnoDB)

【面試干貨】如何選擇MySQL數據庫存儲引擎(MyISAM 或 InnoDB&#xff09; &#x1f496;The Begin&#x1f496;點點關注&#xff0c;收藏不迷路&#x1f496; MySQL數據庫存儲引擎是一個 關鍵 的考慮因素。MySQL提供了多種存儲引擎&#xff0c;其中最常用的是 MyISAM 和 InnoD…

封裝一個頁面自適應方法

封裝一個頁面自適應方法 在 Vue 中&#xff0c;你可以封裝一個頁面自適應的方法來根據屏幕大小動態調整頁面的布局和樣式。以下是一個示例代碼&#xff1a; export const getPageSize () > {const { innerWidth, innerHeight } window;const width innerWidth > 192…

攻防世界---misc---a_good_idea

1、下載附件得到一張圖片&#xff0c;winhex分析&#xff0c;發現有壓縮包 2、在kali中用普通用戶對jpg進行binwalk 3、得到兩張圖片和一個文本&#xff0c;查看文本信息&#xff1a;提示試著找到像素的秘密 4、提到像素就想到了Stegsolve這個工具&#xff0c;將這兩張圖片用該…

rpm打包 postgres14.9 repmgr pgpool

rpm打包 postgres14.9 repmgr pgpool 上一篇講解了rpm打包的基礎知識之后&#xff0c;我們就可以根據實際業務自行打包了&#xff0c;需要注意的是依賴問題&#xff0c;需要提前講依賴準備好&#xff0c;對于各種系統需要的依賴的依賴也不一致&#xff0c;可以根據具體報錯去相…

Python項目開發實戰:二手房數據分析預測系統(案例教程)

一、項目背景與意義 在房地產市場日益繁榮的今天,二手房市場占據了重要地位。對于購房者、房地產中介和開發商來說,了解二手房市場的動態、價格趨勢以及潛在價值至關重要。因此,開發一個基于Python的二手房數據分析預測系統具有實際應用價值和商業意義。本項目旨在利用Pytho…

2024.05.21 校招 實習 內推 面經

綠*泡*泡VX&#xff1a; neituijunsir 交流*裙 &#xff0c;內推/實習/校招匯總表格 1、實習 | 云鯨智能暑期實習熱招崗位&#xff08;內推&#xff09; 實習 | 云鯨智能暑期實習熱招崗位&#xff08;內推&#xff09; 2、實習 | 亞馬遜實習生招聘倒計時&#xff01; 實習…

HOW - Lodash 使用指南和源碼學習

目錄 一、什么是 lodash二、為什么需要 Lodash三、API 分類ArrayCollectionDateFunctionLangMathNumberObjectStringSeqUtil 我們經常在項目里遇到 Lodash 函數的引入&#xff0c;如&#xff1a; debounce(Function)cloneDeep(Lang)isNull(Lang)isUndefined(Lang)isNil(Lang)i…

106、python-第四階段-3-設計模式-單例模式

不是單例類&#xff0c;如下&#xff1a; class StrTools():pass str1StrTools() str2StrTools() print(str1) print(str2) 運用單例&#xff0c;先創建一個test.py class StrTools():pass str1StrTools()然后創建一個hello.py&#xff0c;在這個文件中引用test.py中的對象&a…

JVM-JAVA-雙親委派機制

雙親委派機制 雙親委派機制Tomcat打破雙親委派機制 雙親委派機制 雙親委派機制&#xff0c;加載某個類時會先委托父加載器尋找目標類&#xff0c;找不到再委托上層父加載器加載&#xff0c;如果所有父加載器在自己的加載類路徑下都找不到目標類&#xff0c;則在自己的類加載路徑…

網絡攻擊的常見形式

開篇 本篇文章來自于《網絡安全 ——技術與實踐》的學習整理筆記。 正篇 口令竊取 相比于利用系統缺陷破壞網絡系統&#xff0c;最容易的方法還是通過竊取用戶的口令進入系統。因為人們傾向于選擇很糟糕的口令作為登錄密碼&#xff0c;所以口令猜測很容易成功。通常&#xff0…

C語言:基礎知識

創作不易&#xff0c;友友們給個三連吧 一、C語?的基本概念與發展歷史 1.1 人和計算機進行交流的語言 通常&#xff0c;我們使用英語、中文等語言來進行兩個人之間的交流。這意味著當我們想要和他人進行交流時&#xff0c;我們需要一種語言來表達自己的感受。同樣的&#xf…