Building the Deep Learning Edifice from Scratch, Part 2: Convolutional Neural Network Basics (5-9)

(1) I challenged myself to hand-write code to verify the theory, gaining insights and questions that AI tools cannot provide; for the questions I cannot answer, please advise in the comments;

(2) This series involves many details worth pinning down, but considering readers' absorption and article length, only the key points are shared; if anything is unclear or explained incorrectly, please point it out in the comments;

(3) The report itself was written in English and is not translated into Chinese here; thanks for your understanding;

(4) This series is based on Mu Li's textbook Dive into Deep Learning (《動手學深度學習》); website:

《動手學深度學習》 — 動手學深度學習 2.0.0 documentation

(5) Since the amount of code is large, it is uploaded to my personal space as a free resource so that readers can run and use it.

Note: AlexNet is provided in both Pytorch and MxNet implementations; LeNet is provided only in the MxNet framework.

The model parameters trained by the original papers' authors can also be downloaded directly through the deep-learning frameworks, but this experiment aims to explore the theoretical foundations and implementation ideas of CNNs, so different versions of "LeNet" and "AlexNet" are trained from scratch.

Different code-implementation schemes and analysis approaches are also proposed. For models that are expensive to train, the free compute platform Google Colaboratory is recommended; it is essentially an Ubuntu-based server pre-configured with deep-learning frameworks such as Pytorch and Tensorflow.

This article mainly analyzes:

[1] The working principles of the convolutional, pooling, batch-normalization, activation, and dropout layers in a CNN.

The next article will mainly analyze:

[2] The composition of the time cost when training on a single CPU core, with experimental verification, and the speed-up from library function interfaces;

[3] Tuning methods for hyperparameters such as the learning rate, optimization method, batch size, and activation function;

[4] The performance of the convolutional neural network (LeNet, 1998) and the deep convolutional neural network (AlexNet, 2012) on the MNIST, Fashion_MNIST, and CIFAR100 datasets, and a possibly feasible method for adaptively adjusting parameter size;

[5] Visualization of CNN activation-layer features, intuitive comparison with the filtering effects of hand-designed kernels, and understanding of the CNN information-extraction process;

[6] Analysis of the role of the confusion matrix, and plotting a custom confusion matrix.

Part 3 of this series: 從零開始搭建深度學習大廈系列-3.卷積神經網絡基礎(5-9) - CSDN博客: https://blog.csdn.net/2302_80464577/article/details/149260898

A Quick Look

LeNet (based on mxnet; textbook: 2019 + GPU, max-pooling; mine: max-pooling + 2 or 5 prefetching processes)

2 prefetching processes

5 prefetching processes

AlexNet (based on pytorch; textbook: original; mine: parameter size nearly 1/256 of the original design)

2 prefetching processes (batch size = 64)

5 prefetching processes (batch size = 64/32, initial learning rate = 0.01/0.03)

Figure 1 Results of the textbook's models vs mine

Contents

Environment Setting
Experiment Goals
1. Edge Detection
1.1 Basic Principle
1.2 Function Design
1.3 Carrying-out Result
2. Shape of Layers and Kernels in a CNN
2.1 Basic Theories
2.2 Code Implementation (numpy, mxnet.gluon.nn, mxnet.nd)
2.3 Result
3. 1x1 Convolution
3.1 Basic Theory
3.2 Code Implementation (3 lines)
3.3 Result
4-5 CNN Architecture Implementation and Evaluation
About Data Loaders
About num_workers and Prefetching Processes
4. LeNet Implementation (MxNet based)
4.1 Basic Theories
4.2 Code Implementation
4.3 Model Evaluation on the Fashion-MNIST Dataset
4.3.1 Pooling: Maximum-Pooling vs Average-Pooling
4.3.2 Optimization: SGD vs SGD + Momentum (NAG)
4.3.3 Activation Function: ReLU vs Sigmoid
4.3.4 Normalization Layer: Batch Normalization vs None
4.3.5 Batch Size: 64 vs 128
4.3.6 Textbook Result (Batch Normalization) & Running Snapshot
4.4 LeNet Evaluation on the MNIST Dataset
4.5 Evaluating LeNet on CIFAR100
4.5.1 Coarse Classification (20 classes)
4.5.2 Fine Classification (100 classes)
4.5.3 Running Snapshot
5. AlexNet Architecture
5.1 Code Implementation
5.2 Fashion_MNIST Dataset (Mxnet vs Pytorch)
5.3 MNIST Dataset (Pytorch only)
5.4 CIFAR100 (100 Classes, Fine Labels), Pytorch Only
5.4.1 Learning Rate Setting
6. CNN Activation-Layer Characteristics Visualization
6.1 MNIST Dataset
6.2 Fashion_MNIST Dataset
7. Confusion Matrix
7.1 MNIST
7.2 Fashion_MNIST
References

Environment Setting

All four experiments are carried out in a virtual environment based on the Python 3.7.0 interpreter. The main packages are the deep-learning package mxnet 1.7.0.post2 (CPU version), the visualization package matplotlib.pyplot, the image-processing package opencv-python, and the array-manipulation package numpy.

Experiment Goals

  1. Design appropriate kernels with fixed parameters and detect edges of horizontal, vertical, and diagonal orientation separately;
  2. Derive the shape-transformation formula for the forward propagation of a CNN (Convolutional Neural Network) and verify the result both by fundamental coding and by calling library scripts;
  3. Understand the effect and principle of 1x1 kernels, then explore different implementations of 1x1 convolution in the 2-dimensional plane, such as cross-correlation calculation and matrix multiplication;
  4. Construct LeNet [2] by hand using mxnet.gluon.nn and explore how different hyperparameter settings impact the training result and model performance;
  5. Construct AlexNet [3] by hand using torch.nn and explore how different hyperparameter settings impact the training result and model performance.

1. Edge Detection

1.1 Basic Principle

According to the corresponding theory in DIP (Digital Image Processing), first-order difference operators, such as the Prewitt and Sobel kernels in their horizontal, vertical, and two diagonal design versions, can be used to detect edges in gray-scale images.

These kernels filter out the transitions between different objects, or between parts of an object, because the intensity level of the pixels distributed along both sides of an edge changes rapidly.

In addition, a composite-orientation step combines the information from all directions; see the 'combimg' implementation for details.

1.2 Function Design

This section employs two tool functions: get_data(input_dir) for image loading (similar to building a dataset), and edge_detect(input_dir) for cross-correlation calculation under different settings of kernel shape and layer shape.

Figure 2 Code implementation
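Since the actual code appears only as a screenshot in Figure 2, here is a minimal sketch of what edge_detect could look like. The Prewitt kernel values are the standard textbook ones, while the function body and the maximum-magnitude combination rule are assumptions for illustration, not the post's exact code.

```python
import cv2
import numpy as np

# Standard Prewitt kernels in four orientations.
PREWITT = {
    "horizontal": np.array([[-1, -1, -1], [0, 0, 0], [1, 1, 1]], dtype=np.float32),
    "vertical":   np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=np.float32),
    "diag_main":  np.array([[0, 1, 1], [-1, 0, 1], [-1, -1, 0]], dtype=np.float32),
    "diag_anti":  np.array([[1, 1, 0], [1, 0, -1], [0, -1, -1]], dtype=np.float32),
}

def edge_detect(img_gray):
    """Cross-correlate a gray-scale image with each Prewitt kernel and
    combine all orientations into a single response map ('combimg')."""
    responses = {name: cv2.filter2D(img_gray.astype(np.float32), -1, k)
                 for name, k in PREWITT.items()}
    # Combine directions by taking the pixel-wise maximum gradient magnitude.
    combimg = np.max(np.stack([np.abs(r) for r in responses.values()]), axis=0)
    return responses, combimg
```

Note that cv2.filter2D performs correlation rather than true convolution, which matches the cross-correlation operations used throughout this report.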

1.3 Carrying-out Result

Six scenery photos with obvious edge information, posted by professional photographers on websites, are chosen as a mini-dataset.

Figure 3 Mini-dataset in Mission 1

Only the 'combimg' outputs are saved. 'canyon.jpg' illustrates the directional attribute of the Prewitt kernel vividly.

Figure 4 canyon.jpg

Other examples follow; the orientation of the textures more or less verifies the DIP theory.

Figure 5 Galaxy; notice the small dot with interesting behavior (the dot in the combined image has a black circle within it, while the others contain only rectangular-line-like shapes)

Figure 6 Bungalow lying in the embrace of lake and mountains

Figure 7 Grassland and night sky in an estate

Figure 8 Clouds

2. Shape of layers and kernels in a CNN

2.1 Basic Theories

Figure 9 Kernel and Layer in a CNN

Unlike a multi-layer perceptron (MLP), which consists of hidden neurons (intermediate outputs) and the lines (weights) fully connecting them, a CNN is mainly characterized by kernels (analogous to MLP weights) and feature maps (analogous to MLP nodes); activation functions, normalization layers, and some other designs together complete the architecture. Kernels can also be understood as components of certain CNN layers.

Kernels exist mainly to reduce the otherwise overwhelming parameter size and to reuse parameters scientifically, according to the locality and adjacency principles of spatial distribution. Input images are transformed into different feature maps by the convolution, or cross-correlation, operations of the kernels. Note that kernels can have either learnable parameters (convolutional kernels) or fixed ones (pooling kernels).
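To make the parameter-reduction claim concrete (the numbers below are illustrative, not taken from the original text), the parameter count of a 2D convolutional layer is

\[
\text{params} = C_o \cdot C_i \cdot K_h \cdot K_w + C_o ,
\]

so a 3-to-4-channel 3x3 layer needs only \(4 \cdot 3 \cdot 3 \cdot 3 + 4 = 112\) parameters regardless of the spatial size \(H \times W\), whereas fully connecting two feature volumes of sizes \(3 \times 360 \times 480\) and \(4 \times 360 \times 480\) would require roughly \(3.6 \times 10^{11}\) weights.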

These feature maps can encode implicit information such as object edges. Part 1 of this experiment demonstrates the effect of human-designed edge-detection kernels. For layers near the top of deeper neural networks, the feature maps may capture rather global information (sometimes nothing can be learned, possibly due to small input images and excessive depth, which is why ResNet was born), with AlexNet and LeNet as examples.

Figure 10 Characteristics visualization & understanding [1]

Figure 11 Primitive CNN architectures proposed (1998, 2012)

A common design problem is to estimate the parameter size (storage amount) and training time (measured in CPU/GPU hours) of a 2D-CNN architecture. The size of a feature map is fixed in 'NCHW' format (or 'NHWC'), while the size of a kernel is denoted 'CoCiKhKw' (or 'KhKwCiCo'). See Figure 9 for a graphic explanation.

Figure 12 Cross-correlation calculation at a 2D convolutional layer

According to the academic design and the textbook, NCHW and CoCiKhKw should satisfy C == Ci. When Co == 1, Ci different kernels perform convolution (equivalent to cross-correlation operations in implementation) separately, one per input channel; each kernel targets one feature map of size N x 1 x H x W.

The result is obtained by pixel-wise summation over the Ci different N x 1 x H2 x W2 responses to get one composite N x 1 x H2 x W2 feature map with richer information. Repeating this process Co times produces the final output of shape N x Co x H2 x W2. Kernel size, padding, and stride are the three basic settings of a convolution operation; they determine the mappings H -> H2 and W -> W2 given below.
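Under the flooring convention noted in Figure 13, and writing \(p_h, p_w\) for the padding added on each side (matching the later setting ph = pw = 1 per side), the mapping is:

\[
H_2 = \left\lfloor \frac{H + 2p_h - K_h}{s_h} \right\rfloor + 1,
\qquad
W_2 = \left\lfloor \frac{W + 2p_w - K_w}{s_w} \right\rfloor + 1 .
\]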

A pooling layer has kernels with no learnable parameters; pooling is generally divided into max-pooling and average-pooling.
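As a concrete illustration of a parameter-free pooling kernel, here is a minimal numpy sketch of max-pooling (an assumption for exposition, not the post's code):

```python
import numpy as np

def max_pool2d(x, k=2, s=2):
    """Naive max-pooling over one feature map x of shape (H, W).
    The kernel has no learnable parameters: it only selects values."""
    H, W = x.shape
    H2, W2 = (H - k) // s + 1, (W - k) // s + 1
    out = np.empty((H2, W2), dtype=x.dtype)
    for i in range(H2):
        for j in range(W2):
            out[i, j] = x[i*s:i*s+k, j*s:j*s+k].max()  # average-pooling would use .mean()
    return out
```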

2.2 Code Implementation (numpy, mxnet.gluon.nn, mxnet.nd)

Two approaches are used for verification: direct hand-coding and package calls.

The input images are random values generated by numpy, simulating noise; they serve to verify the shapes of the feature maps at the current layer. Kernels vary across the Ci channels and are identical across the Co channels. Five nested loops accomplish the computation, as sketched below.
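A sketch of the five-nested-loop verification follows; the function name convsize_verify comes from the post, but its signature and body here are reconstructions, not the original code.

```python
import numpy as np

def convsize_verify(x, kernels, ph=1, pw=1, sh=1, sw=1):
    """Hand-coded cross-correlation with 5 explicit loops (N, Co, H2, W2, Ci),
    used to verify output shapes; x: (N, Ci, H, W), kernels: (Co, Ci, Kh, Kw)."""
    N, Ci, H, W = x.shape
    Co, _, Kh, Kw = kernels.shape
    xp = np.pad(x, ((0, 0), (0, 0), (ph, ph), (pw, pw)))  # pad ph rows / pw cols per side
    H2 = (H + 2 * ph - Kh) // sh + 1
    W2 = (W + 2 * pw - Kw) // sw + 1
    out = np.zeros((N, Co, H2, W2))
    for n in range(N):
        for co in range(Co):
            for i in range(H2):
                for j in range(W2):
                    for ci in range(Ci):  # per-channel products summed pixel-wise
                        out[n, co, i, j] += (xp[n, ci, i*sh:i*sh+Kh, j*sw:j*sw+Kw]
                                             * kernels[co, ci]).sum()
    return out

# Shape check (smaller H, W than the simulation, to keep the pure-Python loops quick):
x = np.random.rand(2, 3, 36, 48)
k = np.random.rand(4, 3, 3, 3)
assert convsize_verify(x, k).shape == (2, 4, 36, 48)
```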

2.3 Result

These hand-coded kernels can actually be interpreted as smoothing filters with small variance because of the k_base setting in the code block, whereas the parameters in nn.Conv2D are initialized randomly and carry no meaning at the start of training. Incidentally, the network layer is not initialized in this section because doing so is unnecessary.

nn.Conv2D can detect in_channels automatically, in which case the layer undergoes deferred (delayed) initialization. Re-initializing, or assigning in_channels by hand, avoids the deferred initialization.
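A minimal illustration of the deferred initialization, assuming the mxnet.gluon API (the shapes mirror the simulation below, scaled down):

```python
from mxnet import nd
from mxnet.gluon import nn

conv = nn.Conv2D(channels=4, kernel_size=3, padding=1)  # in_channels left unspecified
conv.initialize()                 # parameters are not allocated yet (deferred)
y = conv(nd.random.normal(shape=(2, 3, 36, 48)))  # first forward pass infers in_channels=3
print(conv.weight.shape)          # -> (4, 3, 3, 3), known only after the forward pass

# Assigning in_channels by hand avoids deferred initialization entirely:
conv2 = nn.Conv2D(channels=4, kernel_size=3, padding=1, in_channels=3)
conv2.initialize()                # weights can be allocated immediately
```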

In the simulation, N = 2, Ci = 3, Co = 4, H = 360, W = 480, Kh = Kw = 3, ph = pw = 1 (per side), and sh = sw = 1. The result shows that the shape formula is correct.
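Plugging the simulation's settings into the shape formula gives a quick sanity check consistent with the reported result:

\[
H_2 = \left\lfloor \frac{360 + 2 \cdot 1 - 3}{1} \right\rfloor + 1 = 360,
\qquad
W_2 = \left\lfloor \frac{480 + 2 \cdot 1 - 3}{1} \right\rfloor + 1 = 480,
\]

so the output shape is \(2 \times 4 \times 360 \times 480\).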

Figure 13 H2 and W2 should be floored to integers [1]

3. 1x1 Convolution

3.1 Basic Theory

1x1 convolution is used specifically to compress the channel dimension C of feature maps and thereby contract the number of parameters needed. In this case, ph = pw = 0, sh = sw = 1, and Co < Ci.

3.2 Code Implementation (3 lines)

The NHWC format is used in the matrix-multiplication implementation of 1x1 convolution.

A much slower implementation of 1x1 convolution is simply adjusting the parameters and calling the hand-coded convsize_verify().
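A sketch of how the three-line NHWC matrix-multiplication version could look (the function name and signature are mine, for illustration):

```python
import numpy as np

def conv1x1_matmul(x_nhwc, w):
    """1x1 convolution as one matrix multiplication; x: (N, H, W, Ci), w: (Ci, Co)."""
    N, H, W, Ci = x_nhwc.shape
    y = x_nhwc.reshape(-1, Ci) @ w          # (N*H*W, Ci) @ (Ci, Co): mix channels per pixel
    return y.reshape(N, H, W, w.shape[1])   # back to (N, H, W, Co)
```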

3.3 Result


The result indicates that mxnet.gluon.nn implements convolution in the form of matrix multiplication. A similar method can be generalized to convolution with general kernel sizes:

Given a layer of N feature maps, first divide the input feature maps into M = H2 x W2 flattened patch vectors (length Kh x Kw per channel, Ci x Kh x Kw overall);

then take the dot product with the kernels flattened along the same two dimensions;

finally, reshape the output to obtain the resulting feature maps.

A more detailed explanation is as follows.
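A minimal numpy sketch of this im2col-style generalization (an illustration under the shape conventions above, with no padding for brevity; not the library's actual implementation):

```python
import numpy as np

def conv2d_im2col(x, k, sh=1, sw=1):
    """General convolution as matrix multiplication (im2col), no padding.
    x: (N, Ci, H, W), k: (Co, Ci, Kh, Kw) -> output (N, Co, H2, W2)."""
    N, Ci, H, W = x.shape
    Co, _, Kh, Kw = k.shape
    H2, W2 = (H - Kh) // sh + 1, (W - Kw) // sw + 1
    # 1) Flatten every Kh x Kw patch into a row vector of length Ci*Kh*Kw.
    cols = np.empty((N, H2 * W2, Ci * Kh * Kw))
    for i in range(H2):
        for j in range(W2):
            patch = x[:, :, i*sh:i*sh+Kh, j*sw:j*sw+Kw]   # (N, Ci, Kh, Kw)
            cols[:, i * W2 + j, :] = patch.reshape(N, -1)
    # 2) Dot product with the kernels flattened the same way.
    out = cols @ k.reshape(Co, -1).T                      # (N, H2*W2, Co)
    # 3) Reshape back into output feature maps.
    return out.transpose(0, 2, 1).reshape(N, Co, H2, W2)
```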
