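ncnn operator descriptions

Descriptions of the ncnn operators. For the authoritative reference, see

ncnn/docs/developer-guide/operators.md at master · Tencent/ncnn · GitHub

Everything below was copied from that page as a backup.

Details follow. For some operators I added a PyTorch example for comparison; corrections are welcome.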

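Contents

1.AbsVal: computes the absolute value of every element of the input tensor
2.ArgMax: finds the largest element(s) of the input tensor and returns their indices
3.BatchNorm: batch normalization (per-channel normalization with a stored mean and variance)
4.Bias: adds a bias term to each channel
5.BinaryOp: binary (two-operand) operations
6.BNLL: applies the BNLL activation function
7.Cast: type conversion
8.CELU: applies the CELU activation function
9.Clip: clamps the elements of the input tensor to a given range
10.Concat: concatenates several input tensors along a given axis
11.Convolution: convolution
12.Convolution1D: 1-D convolution
13.Convolution3D: 3-D convolution
14.ConvolutionDepthWise: depthwise (grouped) convolution
15.ConvolutionDepthWise1D: depthwise convolution on 1-D data
16.ConvolutionDepthWise3D: depthwise convolution on 3-D data
17.CopyTo: copies the input data to a given position
18.Crop: cropping
19.CumulativeSum: computes the cumulative sum of the input data
20.Deconvolution: deconvolution (transposed convolution)
21.Deconvolution1D: 1-D deconvolution
22.Deconvolution3D: 3-D deconvolution
23.DeconvolutionDepthWise: depthwise deconvolution
24.DeconvolutionDepthWise1D: depthwise deconvolution on 1-D data
25.DeconvolutionDepthWise3D: depthwise deconvolution on 3-D data
26.DeformableConv2D: deformable convolution, which lets the kernel's sampling positions shift spatially
27.Dequantize: converts quantized data back to floating point
28.Diag: creates a diagonal matrix
29.Dropout: dropout
30.Eltwise: element-wise operations
31.ELU: applies the exponential linear unit (ELU) activation function
32.Embed: maps indices to dense vectors
33.Exp: computes the exponential of the input data
34.Flatten: flattens the input data to one dimension
35.Fold: fold operation
36.GELU: applies the Gaussian error linear unit (GELU) activation function
37.GLU: applies the gated linear unit (GLU) activation function
38.Gemm: performs matrix multiplication
39.GridSample: samples the input at the positions given by a grid
40.GroupNorm: applies group normalization to feature maps
41.GRU: gated recurrent unit (GRU) layer
42.HardSigmoid: applies the hard sigmoid activation function
43.HardSwish: applies the hard swish activation function
44.InnerProduct: fully connected layer
45.Input: network input layer
46.InstanceNorm: instance normalization
47.Interp: interpolation
48.LayerNorm: layer normalization
49.Log: computes the natural logarithm of the input data
50.LRN: local response normalization
51.LSTM: long short-term memory (LSTM) layer
52.MemoryData: stores constant data and outputs it as a blob
53.Mish: applies the Mish activation function
54.MultiHeadAttention: multi-head attention
55.MVN: mean-variance normalization
56.Noop: no operation
57.Normalize: normalization
58.Packing: packing
59.Padding: padding
60.Permute: permutes dimensions
61.PixelShuffle: pixel shuffle
62.Pooling: pooling
63.Pooling1D: 1-D pooling
64.Pooling3D: 3-D pooling
65.Power: power function
66.PReLU: parametric rectified linear unit
67.Quantize: quantization
68.Reduction: reduces a tensor along its dimensions
69.ReLU: applies the rectified linear unit (ReLU) activation function
70.Reorg: channel rearrangement
71.Requantize: requantization
72.Reshape: reshaping
73.RNN: recurrent neural network (RNN) layer
74.Scale: scaling
75.SELU: applies the scaled exponential linear unit (SELU) activation function
76.Shrink: applies a shrink operation to the input data
77.ShuffleChannel: channel shuffle
78.Sigmoid: applies the sigmoid activation function
79.Slice: slicing
80.Softmax: applies the softmax function, typically used for classification
81.Softplus: applies the softplus activation function
82.Split: splits the input into multiple outputs
83.Swish: swish activation function
84.TanH: tanh activation function
85.Threshold: thresholding
86.Tile: repeats (tiles) the input
87.UnaryOp: applies a unary operation to the input
88.Unfold: applies the unfold operation to the input data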

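1.AbsVal: computes the absolute value of every element of the input tensor.

y = abs(x)

one_blob_only: the layer takes exactly one input blob and produces one output blob
support_inplace: the layer may overwrite its input blob, i.e. y = abs(y)

import torch

input_tensor = torch.tensor([-1, 2, -3, 4, -5])
output_tensor = torch.abs(input_tensor)
print(output_tensor)
# tensor([1, 2, 3, 4, 5])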

2.ArgMax: finds the largest element(s) of the input tensor and returns their indices.

y = argmax(x, out_max_val, topk)

one_blob_only

param id	name	type	default	description
0	out_max_val	int	0
1	topk	int	1

import torch

input_tensor = torch.tensor([10, 5, 8, 20, 15])
output_index = torch.argmax(input_tensor)
print(output_index)
# tensor(3)

3.BatchNorm: batch normalization, applied per channel with a stored mean and variance.

y = (x - mean) / sqrt(var + eps) * slope + bias

one_blob_only
support_inplace

param id	name	type	default	description
0	channels	int	0
1	eps	float	0.f

weight	type	shape
slope_data	float	[channels]
mean_data	float	[channels]
var_data	float	[channels]
bias_data	float	[channels]

import torch
import torch.nn as nn

batch_norm_layer = nn.BatchNorm1d(3)
input_tensor = torch.randn(2, 3, 4)  # batch size 2, 3 channels, sequence length 4
output_tensor = batch_norm_layer(input_tensor)
print(output_tensor)
# tensor([[[-0.5624,  0.9015, -0.9183,  0.3030],
#          [ 0.4668,  1.0430, -2.0182,  0.7149],
#          [-1.5960,  0.5437,  0.8771, -0.1269]],
# 
#         [[-0.1101, -1.4983,  1.9178, -0.0333],
#          [-0.1873, -1.1687,  0.7301,  0.4194],
#          [ 1.2667,  0.7976, -1.4188, -0.3434]]],
#        grad_fn=<NativeBatchNormBackward0>)

4.Bias: adds a bias term to each channel.

y = x + bias

one_blob_only
support_inplace

param id	name	type	default	description
0	bias_data_size	int	0

weight	type	shape
bias_data	float	[channels]

import torch

input_tensor = torch.randn(3, 4)
bias = torch.randn(4)
output_tensor = input_tensor + bias
print('output_tensor:',output_tensor,'\nshape:',output_tensor.shape)
# tensor([[-0.1874,  1.2358,  1.9006,  0.4483],
#         [-1.1005,  1.6844, -0.3991, -0.4538],
#         [ 0.4519,  2.2752,  1.6041, -1.2463]])
# shape: torch.Size([3, 4])

5.BinaryOp: binary (two-operand) operations

Applies a chosen binary operation, such as addition or subtraction, to two inputs.

This operation is used for binary computation, and the calculation rule depends on the broadcasting rule.

C = binaryop(A, B)

if with_scalar = 1:

one_blob_only
support_inplace

param id	name	type	default	description
0	op_type	int	0	Operation type as follows
1	with_scalar	int	0	with_scalar=0 B is a matrix, with_scalar=1 B is a scalar
2	b	float	0.f	When B is a scalar, B = b

Operation type:

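0 = ADD (addition)
1 = SUB (subtraction)
2 = MUL (multiplication)
3 = DIV (division)
4 = MAX (maximum)
5 = MIN (minimum)
6 = POW (power)
7 = RSUB (right operand minus left operand)
8 = RDIV (right operand divided by left operand)
9 = RPOW (right operand raised to the power of the left operand)
10 = ATAN2 (two-argument arctangent, atan2(A, B))
11 = RATAN2 (two-argument arctangent with the operands swapped, atan2(B, A))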
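The operation table can be sketched in plain Python. This is only an illustration on Python lists, not ncnn code: `BINARY_OPS` and `binary_op` are hypothetical names, and broadcasting is reduced to the scalar case that `with_scalar=1` describes.

```python
import math

# Hypothetical lookup table mirroring the op_type values listed above.
BINARY_OPS = {
    0: lambda a, b: a + b,              # ADD
    1: lambda a, b: a - b,              # SUB
    2: lambda a, b: a * b,              # MUL
    3: lambda a, b: a / b,              # DIV
    4: max,                             # MAX
    5: min,                             # MIN
    6: lambda a, b: a ** b,             # POW
    7: lambda a, b: b - a,              # RSUB
    8: lambda a, b: b / a,              # RDIV
    9: lambda a, b: b ** a,             # RPOW
    10: math.atan2,                     # ATAN2
    11: lambda a, b: math.atan2(b, a),  # RATAN2
}


def binary_op(a, b, op_type):
    """Apply op_type element-wise; b is a list of the same length or a scalar."""
    bs = b if isinstance(b, list) else [b] * len(a)
    return [BINARY_OPS[op_type](x, y) for x, y in zip(a, bs)]


a = [1.0, 2.0, 4.0]
print(binary_op(a, [4.0, 2.0, 1.0], 1))  # SUB: A - B
print(binary_op(a, 2.0, 7))              # RSUB with a scalar: b - A
```

The R-prefixed operations only swap the operand order, which matters when B is a scalar and the scalar has to act as the left operand.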

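6.BNLL: applies the BNLL activation function

BNLL (binomial normal log likelihood) is the softplus function, evaluated piecewise so that exp never overflows:

f(x) = log(1 + exp(x))

y = x + log(1 + e^(-x)),  x > 0
y = log(1 + e^x),         x <= 0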

one_blob_only
support_inplace
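A minimal sketch of evaluating f(x) = log(1 + exp(x)) without overflow, assuming plain Python floats. For x > 0 it uses the algebraically equivalent form x + log(1 + exp(-x)), so exp is only ever called on a non-positive argument. The name `bnll` and the use of `math.log1p` are choices made for this sketch.

```python
import math


def bnll(x: float) -> float:
    """Softplus log(1 + exp(x)), computed so that exp() never overflows."""
    if x > 0:
        # log(1 + e^x) = x + log(1 + e^-x)
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


for v in (-2.0, 0.0, 2.0, 1000.0):
    print(v, bnll(v))
```

Calling `math.exp(1000.0)` directly raises OverflowError, while `bnll(1000.0)` simply returns 1000.0.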

7.Cast: type conversion

Converts the input data from one data type to another.

y = cast(x)

one_blob_only
support_packing

param id	name	type	default	description
0	type_from	int	0
1	type_to	int	0

Element type:

0 = auto
1 = float32
2 = float16
3 = int8
4 = bfloat16

import torch

input_tensor = torch.tensor([1.5, 2.3, 3.7])
output_tensor = input_tensor.type(torch.int)
print(output_tensor)
# tensor([1, 2, 3], dtype=torch.int32)

8.CELU: applies the CELU activation function.

if x < 0    y = (exp(x / alpha) - 1.f) * alpha
else        y = x

one_blob_only
support_inplace

param id	name	type	default	description
0	alpha	float	1.f

import torch
import torch.nn.functional as F

input_tensor = torch.randn(3, 4)
# F.celu uses alpha=1.0 by default, matching the default in the table above
output_tensor = F.celu(input_tensor)
print('output_tensor:',output_tensor,'\nshape:',output_tensor.shape)
# output_tensor: tensor([[-0.5924,  0.7810,  1.1752,  0.8274],
#         [-0.6871,  0.0466,  0.9411, -0.7082],
#         [-0.8632, -0.1801, -0.8730,  0.9515]]) 
# shape: torch.Size([3, 4])

9.Clip: clamps the elements of the input tensor to a given range.

y = clamp(x, min, max)

one_blob_only
support_inplace

param id	name	type	default	description
0	min	float	-FLT_MAX
1	max	float	FLT_MAX

import torch

input_tensor = torch.randn(2, 3)
output_tensor = torch.clamp(input_tensor, min=-0.5, max=0.5)
print(output_tensor)
# tensor([[-0.5000, -0.5000, -0.5000],
#         [ 0.5000, -0.4091, -0.5000]])

10.Concat: concatenates several input tensors along a given axis.

y = concat(x0, x1, x2, ...) by axis

param id	name	type	default	description
0	axis	int	0

import torch

input_tensor1 = torch.randn(2, 3)
input_tensor2 = torch.randn(2, 3)
output_tensor = torch.cat((input_tensor1, input_tensor2), dim=1)
print('output_tensor:',output_tensor,'\nshape:',output_tensor.shape)
# output_tensor: tensor([[-2.4431, -0.6428,  0.4434,  1.2216, -1.1874, -1.1327],
#         [-0.8082, -0.3552,  0.9945, -0.7679,  0.6547, -1.0401]]) 
# shape: torch.Size([2, 6])

11.Convolution: convolution

Extracts features from the input data by convolving it with learned kernels.

x2 = pad(x, pads, pad_value)
x3 = conv(x2, weight, kernel, stride, dilation) + bias
y = activation(x3, act_type, act_params)

one_blob_only

param id	name	type	default
0	num_output	int	0
1	kernel_w	int	0
2	dilation_w	int	1
3	stride_w	int	1
4	pad_left	int	0
5	bias_term	int	0
6	weight_data_size	int	0
8	int8_scale_term	int	0
9	activation_type	int	0
10	activation_params	array	[ ]
11	kernel_h	int	kernel_w
12	dilation_h	int	dilation_w
13	stride_h	int	stride_w
14	pad_top	int	pad_left
15	pad_right	int	pad_left
16	pad_bottom	int	pad_top
18	pad_value	float	0.f
19	dynamic_weight	int	0

weight	type	shape
weight_data	float/fp16/int8	[kernel_w, kernel_h, num_input, num_output]
bias_data	float	[num_output]
weight_data_int8_scales	float	[num_output]
bottom_blob_int8_scales	float	[1]
top_blob_int8_scales	float	[1]

import torch
import torch.nn as nn

conv_layer = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
input_tensor = torch.randn(1, 3, 32, 32)
output_tensor = conv_layer(input_tensor)
print(output_tensor.shape)
# torch.Size([1, 16, 32, 32])
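The output shape follows from the pad, kernel, stride and dilation parameters. Below is a minimal sketch of the standard output-size formula along one axis; `conv_out_size` is a hypothetical helper, and the arithmetic is the usual convolution formula rather than code taken from ncnn.

```python
def conv_out_size(in_size, kernel, stride=1, dilation=1, pad_before=0, pad_after=0):
    """Output length along one axis of a padded, strided, dilated convolution."""
    # A dilated kernel covers dilation * (kernel - 1) + 1 input positions.
    kernel_extent = dilation * (kernel - 1) + 1
    return (in_size + pad_before + pad_after - kernel_extent) // stride + 1


# Same settings as the Conv2d example: 32 inputs, 3x3 kernel, padding 1.
print(conv_out_size(32, kernel=3, stride=1, pad_before=1, pad_after=1))  # 32
# Without padding, a 3-wide kernel trims one element from each side.
print(conv_out_size(32, kernel=3))  # 30
```

The same formula applies independently to the width, height and depth axes, which is why the 1-D and 3-D variants only add or drop per-axis parameters.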

12.Convolution1D: 1-D convolution

Applies convolution to 1-D data.

x2 = pad(x, pads, pad_value)
x3 = conv1d(x2, weight, kernel, stride, dilation) + bias
y = activation(x3, act_type, act_params)

one_blob_only

param id	name	type	default
0	num_output	int	0
1	kernel_w	int	0
2	dilation_w	int	1
3	stride_w	int	1
4	pad_left	int	0
5	bias_term	int	0
6	weight_data_size	int	0
9	activation_type	int	0
10	activation_params	array	[ ]
15	pad_right	int	pad_left
18	pad_value	float	0.f
19	dynamic_weight	int	0

weight	type	shape
weight_data	float/fp16/int8	[kernel_w, num_input, num_output]
bias_data	float	[num_output]

import torch
import torch.nn as nn

conv_layer = nn.Conv1d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
input_tensor = torch.randn(1, 3, 32)
output_tensor = conv_layer(input_tensor)
print(output_tensor.shape)
# torch.Size([1, 16, 32])

13.Convolution3D: 3-D convolution

Applies convolution to 3-D data.

x2 = pad(x, pads, pad_value)
x3 = conv3d(x2, weight, kernel, stride, dilation) + bias
y = activation(x3, act_type, act_params)

one_blob_only

param id	name	type	default
0	num_output	int	0
1	kernel_w	int	0
2	dilation_w	int	1
3	stride_w	int	1
4	pad_left	int	0
5	bias_term	int	0
6	weight_data_size	int	0
9	activation_type	int	0
10	activation_params	array	[ ]
11	kernel_h	int	kernel_w
12	dilation_h	int	dilation_w
13	stride_h	int	stride_w
14	pad_top	int	pad_left
15	pad_right	int	pad_left
16	pad_bottom	int	pad_top
17	pad_behind	int	pad_front
18	pad_value	float	0.f
21	kernel_d	int	kernel_w
22	dilation_d	int	dilation_w
23	stride_d	int	stride_w
24	pad_front	int	pad_left

weight	type	shape
weight_data	float/fp16/int8	[kernel_w, kernel_h, kernel_d, num_input, num_output]
bias_data	float	[num_output]

import torch
import torch.nn as nn

conv_layer = nn.Conv3d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
input_tensor = torch.randn(1, 3, 32, 32, 32)
output_tensor = conv_layer(input_tensor)
print(output_tensor.shape)
# torch.Size([1, 16, 32, 32, 32])

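14.ConvolutionDepthWise: depthwise (grouped) convolution

Applies an independent kernel to each input channel, or to each group of channels when group is smaller than the channel count.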

x2 = pad(x, pads, pad_value)
x3 = conv(x2, weight, kernel, stride, dilation, group) + bias
y = activation(x3, act_type, act_params)

one_blob_only

param id	name	type	default
0	num_output	int	0
1	kernel_w	int	0
2	dilation_w	int	1
3	stride_w	int	1
4	pad_left	int	0
5	bias_term	int	0
6	weight_data_size	int	0
7	group	int	1
8	int8_scale_term	int	0
9	activation_type	int	0
10	activation_params	array	[ ]
11	kernel_h	int	kernel_w
12	dilation_h	int	dilation_w
13	stride_h	int	stride_w
14	pad_top	int	pad_left
15	pad_right	int	pad_left
16	pad_bottom	int	pad_top
18	pad_value	float	0.f
19	dynamic_weight	int	0

weight	type	shape
weight_data	float/fp16/int8	[kernel_w, kernel_h, num_input / group, num_output / group, group]
bias_data	float	[num_output]
weight_data_int8_scales	float	[group]
bottom_blob_int8_scales	float	[1]
top_blob_int8_scales	float	[1]

import torch
import torch.nn as nn

conv_dw_layer = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, groups=3)
input_tensor = torch.randn(1, 3, 32, 32)
output_tensor = conv_dw_layer(input_tensor)
print(output_tensor.shape)
# torch.Size([1, 3, 30, 30])

15.ConvolutionDepthWise1D: applies depthwise convolution to 1-D data.

x2 = pad(x, pads, pad_value)
x3 = conv1d(x2, weight, kernel, stride, dilation, group) + bias
y = activation(x3, act_type, act_params)

one_blob_only

param id	name	type	default
0	num_output	int	0
1	kernel_w	int	0
2	dilation_w	int	1
3	stride_w	int	1
4	pad_left	int	0
5	bias_term	int	0
6	weight_data_size	int	0
7	group	int	1
9	activation_type	int	0
10	activation_params	array	[ ]
15	pad_right	int	pad_left
18	pad_value	float	0.f
19	dynamic_weight	int	0

weight	type	shape
weight_data	float/fp16/int8	[kernel_w, num_input / group, num_output / group, group]
bias_data	float	[num_output]

import torch
import torch.nn as nn

# Define a 1-D depthwise convolution layer
conv_dw_layer = nn.Conv1d(in_channels=3, out_channels=3, kernel_size=3, groups=3)
# Create a random input tensor
input_tensor = torch.randn(1, 3, 10)  # (batch_size, channels, sequence_length)
# Pass the input through the depthwise convolution layer
output_tensor = conv_dw_layer(input_tensor)
print(output_tensor.shape)
# torch.Size([1, 3, 8])

16.ConvolutionDepthWise3D: applies depthwise convolution to 3-D data.

x2 = pad(x, pads, pad_value)
x3 = conv3d(x2, weight, kernel, stride, dilation, group) + bias
y = activation(x3, act_type, act_params)

one_blob_only

param id	name	type	default
0	num_output	int	0
1	kernel_w	int	0
2	dilation_w	int	1
3	stride_w	int	1
4	pad_left	int	0
5	bias_term	int	0
6	weight_data_size	int	0
7	group	int	1
9	activation_type	int	0
10	activation_params	array	[ ]
15	pad_right	int	pad_left
18	pad_value	float	0.f
19	dynamic_weight	int	0

weight	type	shape
weight_data	float/fp16/int8	[kernel_w, num_input / group, num_output / group, group]
bias_data	float	[num_output]

17.CopyTo: copies the input data to a given position

self[offset] = src

one_blob_only

param id	name	type	default
0	woffset	int	0
1	hoffset	int	0
13	doffset	int	0
2	coffset	int	0
9	starts	array	[ ]
11	axes	array	[ ]

18.Crop: cropping

Crops the input data, keeping only the region of interest.

y = crop(x)

one_blob_only

param id	name	type	default
0	woffset	int	0
1	hoffset	int	0
13	doffset	int	0
2	coffset	int	0
3	outw	int	0
4	outh	int	0
14	outd	int	0
5	outc	int	0
6	woffset2	int	0
7	hoffset2	int	0
15	doffset2	int	0
8	coffset2	int	0
9	starts	array	[ ]
10	ends	array	[ ]
11	axes	array	[ ]

import torch

# Create a 3x3 tensor
tensor = torch.tensor([[1, 2, 3],[4, 5, 6],[7, 8, 9]])
# Crop it, keeping only part of the region
cropped_tensor = tensor[1:, 1:]
print(cropped_tensor)
# tensor([[5, 6],
#         [8, 9]])

19.CumulativeSum: computes the cumulative sum of the input data.

If axis < 0, we use axis = x.dims + axis

It implements torch.cumsum (see the PyTorch 2.3 documentation).

one_blob_only
support_inplace

param id	name	type	default	description
0	axis	int	0
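A minimal sketch of the cumulative sum with negative-axis handling, written for 2-D nested lists so that it runs without any framework. `cumulative_sum` is a hypothetical helper; only the rule axis = x.dims + axis comes from the text above.

```python
from itertools import accumulate


def cumulative_sum(x, axis=0):
    """Cumulative sum of a 2-D nested list along axis (negative axis allowed)."""
    dims = 2
    if axis < 0:
        axis = dims + axis  # e.g. axis=-1 becomes axis=1
    if axis == 1:
        # Accumulate within each row.
        return [list(accumulate(row)) for row in x]
    # axis == 0: accumulate down each column, then transpose back.
    cols = [list(accumulate(col)) for col in zip(*x)]
    return [list(row) for row in zip(*cols)]


x = [[1, 2, 3], [4, 5, 6]]
print(cumulative_sum(x, axis=0))   # [[1, 2, 3], [5, 7, 9]]
print(cumulative_sum(x, axis=-1))  # [[1, 3, 6], [4, 9, 15]]
```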

20.Deconvolution: deconvolution (transposed convolution)

Used in tasks such as image generation and semantic segmentation.

x2 = deconv(x, weight, kernel, stride, dilation) + bias
x3 = depad(x2, pads, pad_value)
y = activation(x3, act_type, act_params)

one_blob_only

param id	name	type	default
0	num_output	int	0
1	kernel_w	int	0
2	dilation_w	int	1
3	stride_w	int	1
4	pad_left	int	0
5	bias_term	int	0
6	weight_data_size	int	0
9	activation_type	int	0
10	activation_params	array	[ ]
11	kernel_h	int	kernel_w
12	dilation_h	int	dilation_w
13	stride_h	int	stride_w
14	pad_top	int	pad_left
15	pad_right	int	pad_left
16	pad_bottom	int	pad_top
18	output_pad_right	int	0
19	output_pad_bottom	int	output_pad_right
20	output_w	int	0
21	output_h	int	output_w
28	dynamic_weight	int	0

weight	type	shape
weight_data	float/fp16	[kernel_w, kernel_h, num_input, num_output]
bias_data	float	[num_output]
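Deconvolution grows the spatial size instead of shrinking it: the kernel is applied first and the padding is then removed (the depad step above). A minimal sketch of the resulting output length along one axis, assuming the usual transposed-convolution arithmetic; `deconv_out_size` and its argument names are hypothetical.

```python
def deconv_out_size(in_size, kernel, stride=1, dilation=1,
                    pad_before=0, pad_after=0, output_pad=0):
    """Output length along one axis of a transposed convolution."""
    kernel_extent = dilation * (kernel - 1) + 1
    # Size produced by the deconvolution itself, before padding is cut away.
    full = (in_size - 1) * stride + kernel_extent + output_pad
    return full - pad_before - pad_after


# A common 2x upsampling setup: kernel 4, stride 2, padding 1 on each side.
print(deconv_out_size(16, kernel=4, stride=2, pad_before=1, pad_after=1))  # 32
```

This is the inverse of the convolution size formula: a convolution with the same kernel, stride and padding maps 32 back to 16.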

21.Deconvolution1D: 1-D deconvolution

Applies deconvolution to 1-D data.

x2 = deconv1d(x, weight, kernel, stride, dilation) + bias
x3 = depad(x2, pads, pad_value)
y = activation(x3, act_type, act_params)

one_blob_only

param id	name	type	default
0	num_output	int	0
1	kernel_w	int	0
2	dilation_w	int	1
3	stride_w	int	1
4	pad_left	int	0
5	bias_term	int	0
6	weight_data_size	int	0
9	activation_type	int	0
10	activation_params	array	[ ]
15	pad_right	int	pad_left
18	output_pad_right	int	0
20	output_w	int	0
28	dynamic_weight	int	0

weight	type	shape
weight_data	float/fp16	[kernel_w, num_input, num_output]
bias_data	float	[num_output]

22.Deconvolution3D: 3-D deconvolution

Applies deconvolution to 3-D data.

x2 = deconv3d(x, weight, kernel, stride, dilation) + bias
x3 = depad(x2, pads, pad_value)
y = activation(x3, act_type, act_params)

one_blob_only

param id	name	type	default
0	num_output	int	0
1	kernel_w	int	0
2	dilation_w	int	1
3	stride_w	int	1
4	pad_left	int	0
5	bias_term	int	0
6	weight_data_size	int	0
9	activation_type	int	0
10	activation_params	array	[ ]
11	kernel_h	int	kernel_w
12	dilation_h	int	dilation_w
13	stride_h	int	stride_w
14	pad_top	int	pad_left
15	pad_right	int	pad_left
16	pad_bottom	int	pad_top
17	pad_behind	int	pad_front
18	output_pad_right	int	0
19	output_pad_bottom	int	output_pad_right
20	output_pad_behind	int	output_pad_right
21	kernel_d	int	kernel_w
22	dilation_d	int	dilation_w
23	stride_d	int	stride_w
24	pad_front	int	pad_left
25	output_w	int	0
26	output_h	int	output_w
27	output_d	int	output_w

weight	type	shape
weight_data	float/fp16	[kernel_w, kernel_h, kernel_d, num_input, num_output]
bias_data	float	[num_output]

23.DeconvolutionDepthWise: depthwise (grouped) deconvolution.

x2 = deconv(x, weight, kernel, stride, dilation, group) + bias
x3 = depad(x2, pads, pad_value)
y = activation(x3, act_type, act_params)

one_blob_only

param id	name	type	default
0	num_output	int	0
1	kernel_w	int	0
2	dilation_w	int	1
3	stride_w	int	1
4	pad_left	int	0
5	bias_term	int	0
6	weight_data_size	int	0
7	group	int	1
9	activation_type	int	0
10	activation_params	array	[ ]
11	kernel_h	int	kernel_w
12	dilation_h	int	dilation_w
13	stride_h	int	stride_w
14	pad_top	int	pad_left
15	pad_right	int	pad_left
16	pad_bottom	int	pad_top
18	output_pad_right	int	0
19	output_pad_bottom	int	output_pad_right
20	output_w	int	0
21	output_h	int	output_w
28	dynamic_weight	int	0

weight	type	shape
weight_data	float/fp16	[kernel_w, kernel_h, num_input / group, num_output / group, group]
bias_data	float	[num_output]

24.DeconvolutionDepthWise1D: applies depthwise deconvolution to 1-D data.

x2 = deconv1d(x, weight, kernel, stride, dilation, group) + bias
x3 = depad(x2, pads, pad_value)
y = activation(x3, act_type, act_params)

one_blob_only

param id	name	type	default
0	num_output	int	0
1	kernel_w	int	0
2	dilation_w	int	1
3	stride_w	int	1
4	pad_left	int	0
5	bias_term	int	0
6	weight_data_size	int	0
7	group	int	1
9	activation_type	int	0
10	activation_params	array	[ ]
15	pad_right	int	pad_left
18	output_pad_right	int	0
20	output_w	int	0
28	dynamic_weight	int	0

weight	type	shape
weight_data	float/fp16	[kernel_w, num_input / group, num_output / group, group]
bias_data	float	[num_output]

25.DeconvolutionDepthWise3D: 3-D depthwise deconvolution

Applies depthwise deconvolution to 3-D data.

x2 = deconv3d(x, weight, kernel, stride, dilation, group) + bias
x3 = depad(x2, pads, pad_value)
y = activation(x3, act_type, act_params)

one_blob_only

param id	name	type	default
0	num_output	int	0
1	kernel_w	int	0
2	dilation_w	int	1
3	stride_w	int	1
4	pad_left	int	0
5	bias_term	int	0
6	weight_data_size	int	0
7	group	int	1
9	activation_type	int	0
10	activation_params	array	[ ]
11	kernel_h	int	kernel_w
12	dilation_h	int	dilation_w
13	stride_h	int	stride_w
14	pad_top	int	pad_left
15	pad_right	int	pad_left
16	pad_bottom	int	pad_top
17	pad_behind	int	pad_front
18	output_pad_right	int	0
19	output_pad_bottom	int	output_pad_right
20	output_pad_behind	int	output_pad_right
21	kernel_d	int	kernel_w
22	dilation_d	int	dilation_w
23	stride_d	int	stride_w
24	pad_front	int	pad_left
25	output_w	int	0
26	output_h	int	output_w
27	output_d	int	output_w

weight	type	shape
weight_data	float/fp16	[kernel_w, kernel_h, kernel_d, num_input / group, num_output / group, group]
bias_data	float	[num_output]

26.DeformableConv2D: deformable convolution, which lets the kernel's sampling positions shift spatially.

x2 = deformableconv2d(x, offset, mask, weight, kernel, stride, dilation) + bias
y = activation(x2, act_type, act_params)

param id	name	type	default
0	num_output	int	0
1	kernel_w	int	0
2	dilation_w	int	1
3	stride_w	int	1
4	pad_left	int	0
5	bias_term	int	0
6	weight_data_size	int	0
9	activation_type	int	0
10	activation_params	array	[ ]
11	kernel_h	int	kernel_w
12	dilation_h	int	dilation_w
13	stride_h	int	stride_w
14	pad_top	int	pad_left
15	pad_right	int	pad_left
16	pad_bottom	int	pad_top

weight	type	shape
weight_data	float/fp16/int8	[kernel_w, kernel_h, num_input, num_output]
bias_data	float	[num_output]

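27.Dequantize: converts quantized data back to floating point.

Dequantization restores quantized data to its original floating-point form. It is typically applied to quantized activations or weights so that later computation can run in floating point.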

y = x * scale + bias

one_blob_only
support_inplace

param id	name	type	default	description
0	scale_data_size	int	1
1	bias_data_size	int	0

weight	type	shape
scale_data	float	[scale_data_size]
bias_data	float	[bias_data_size]

# Dequantizing activations
import torch

# quantized_tensor holds the quantized values (8-bit unsigned integers here)
quantized_tensor = torch.tensor([0, 1, 2, 3], dtype=torch.uint8)
# Dequantize by converting the integers to floating point (scale 1, no bias)
dequantized_tensor = quantized_tensor.float()
print(dequantized_tensor)
# tensor([0., 1., 2., 3.])

# Dequantizing weights
import torch

# quantized_weights holds the quantized weights (8-bit signed integers here)
quantized_weights = torch.tensor([-1, 0, 1, 2], dtype=torch.int8)
scale = 0.01  # quantization scale
# Multiply the integers by the scale factor to recover floating-point values
dequantized_weights = quantized_weights.float() * scale
print(dequantized_weights)
# tensor([-0.0100,  0.0000,  0.0100,  0.0200])

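28.Diag: creates a diagonal matrix.

A diagonal matrix has zeros everywhere off the main diagonal; the entries on the main diagonal may be zero or nonzero.

The operation is: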

y = diag(x, diagonal)

one_blob_only

param id	name	type	default	description
0	diagonal	int	0

import torch

# Build a diagonal matrix whose diagonal elements are [1, 2, 3]
diagonal_elements = torch.tensor([1, 2, 3])
diagonal_matrix = torch.diag(diagonal_elements)
print(diagonal_matrix)
# tensor([[1, 0, 0],
#         [0, 2, 0],
#         [0, 0, 3]])

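29.Dropout: dropout

During training, dropout randomly zeroes activations to prevent overfitting. At inference time only the scaling below is applied.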

y = x * scale

one_blob_only

param id	name	type	default	description
0	scale	float	1.f

import torch
import torch.nn as nn

# A small network with two fully connected layers and one Dropout layer
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.fc1 = nn.Linear(10, 5)
        self.dropout = nn.Dropout(p=0.5)  # p is the probability of zeroing an element
        self.fc2 = nn.Linear(5, 2)

    def forward(self, x):
        x = self.fc1(x)
        x = self.dropout(x)  # apply Dropout to the output of fc1
        x = torch.relu(x)
        x = self.fc2(x)
        return x

# Create a model instance
model = MyModel()
# model.train() enables Dropout; model.eval() disables it
model.train()
# Example input
input_data = torch.randn(1, 10)  # tensor of shape (1, 10)
# Forward pass
output = model(input_data)
print(output)
# tensor([[0.7759, 0.4466]], grad_fn=<AddmmBackward0>)

30.Eltwise: element-wise operations

Applies an element-wise operation, such as product, sum or maximum, to the inputs.

y = elementwise_op(x0, x1, ...)

param id	name	type	default	description
0	op_type	int	0
1	coeffs	array	[ ]

Operation type:

0 = PROD
1 = SUM
2 = MAX

import torch

# Create two tensors
a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])
# 0 = PROD: element-wise product
prod_result = torch.mul(a, b)
print("Elementwise product result:", prod_result)
# Elementwise product result: tensor([ 4, 10, 18])
# 1 = SUM: element-wise sum
sum_result = torch.add(a, b)
print("Elementwise sum result:", sum_result)
# Elementwise sum result: tensor([5, 7, 9])
# 2 = MAX: element-wise maximum
max_result = torch.maximum(a, b)
print("Elementwise max result:", max_result)
# Elementwise max result: tensor([4, 5, 6])

31.ELU: applies the exponential linear unit (ELU) activation function.

if x < 0    y = (exp(x) - 1) * alpha
else        y = x

one_blob_only
support_inplace

param id	name	type	default	description
0	alpha	float	0.1f

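32.Embed: maps indices to dense vectors.

Think word vectors: almost anything can be embedded.

Embedding encodes high-dimensional sparse data as low-dimensional dense vectors. It is typically used to map discrete categorical data (such as words or product IDs) into a continuous real-valued vector space.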

y = embedding(x)

param id	name	type
0	num_output	int
1	input_dim	int
2	bias_term	int
3	weight_data_size	int

weight	type	shape
weight_data	float	[weight_data_size]
bias_term	float	[num_output]

import torch
import torch.nn as nn

# Suppose there are 10 distinct words, each mapped to a 5-dimensional dense vector
vocab_size = 10
embedding_dim = 5
# Create an Embedding layer
embedding = nn.Embedding(num_embeddings=vocab_size, embedding_dim=embedding_dim)
# Input: look up the vectors for the words with IDs 3 and 7
input_ids = torch.LongTensor([3, 7])
# Get the corresponding vectors from the Embedding layer
output = embedding(input_ids)
print(output)
# tensor([[-0.4583,  2.2385,  1.1503,  0.4575, -0.5081],
#         [ 2.1852, -1.2893,  0.6631,  0.1552,  1.6735]],
#        grad_fn=<EmbeddingBackward0>)

33.Exp: computes the exponential of the input data.

if base == -1   y = exp(shift + x * scale)
else            y = pow(base, (shift + x * scale))

one_blob_only
support_inplace

param id	name	type	default
0	base	float	-1.f
1	scale	float	1.f
2	shift	float	0.f
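A minimal scalar sketch of the two branches of the Exp definition, assuming Python floats. `ncnn_exp` is a hypothetical name; the defaults mirror the parameter table, with base = -1 selecting the natural exponential.

```python
import math


def ncnn_exp(x, base=-1.0, scale=1.0, shift=0.0):
    """y = exp(shift + x * scale) if base == -1, else base ** (shift + x * scale)."""
    t = shift + x * scale
    if base == -1.0:
        return math.exp(t)
    return math.pow(base, t)


print(ncnn_exp(1.0))            # natural exponential, e
print(ncnn_exp(3.0, base=2.0))  # 2 ** 3 = 8.0
```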

34.Flatten: flattens the input data to one dimension.

Reshape blob to 1 dimension

one_blob_only

import torch

# Create a 3-D tensor of shape (2, 3, 4)
input_tensor = torch.randn(2, 3, 4)
# Flatten the tensor with torch.flatten()
output_tensor1 = torch.flatten(input_tensor, start_dim=0)
# Flatten the tensor with view()
output_tensor2 = input_tensor.view(2*3*4)
print("Input Tensor shape:", input_tensor.shape)
print("Flattened Tensor shape:", output_tensor1.shape)
print("view Tensor shape:", output_tensor2.shape)
# Input Tensor shape: torch.Size([2, 3, 4])
# Flattened Tensor shape: torch.Size([24])
# view Tensor shape: torch.Size([24])

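35.Fold: fold operation

Combines an array of sliding local blocks into a larger tensor (col2im). It is the inverse of Unfold rather than of Flatten.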

y = fold(x)

one_blob_only

param id	name	type	default
0	num_output	int	0
1	kernel_w	int	0
2	dilation_w	int	1
3	stride_w	int	1
4	pad_left	int	0
11	kernel_h	int	kernel_w
12	dilation_h	int	dilation_w
13	stride_h	int	stride_w
14	pad_top	int	pad_left
15	pad_right	int	pad_left
16	pad_bottom	int	pad_top
20	output_w	int	0
21	output_h	int	output_w

import torch

# Create a 4x4 tensor
x = torch.arange(1, 17).view(4, 4)
print("Original tensor:")
print(x)
# Original tensor:
# tensor([[ 1,  2,  3,  4],
#         [ 5,  6,  7,  8],
#         [ 9, 10, 11, 12],
#         [13, 14, 15, 16]])
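# Regroup the 16 elements with view(): 4x4 could become 2x8, 8x2, 1x16, 2x2x2x2, and so on.
# Note: this is only a reshape. The sliding-block fold itself is torch.nn.functional.fold.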
folded_tensor = x.view(2,2,2,2)
print("Folded tensor:")
print(folded_tensor)
# Folded tensor:
# tensor([[[[ 1,  2],
#           [ 3,  4]],
# 
#          [[ 5,  6],
#           [ 7,  8]]],
# 
# 
#         [[[ 9, 10],
#           [11, 12]],
# 
#          [[13, 14],
#           [15, 16]]]])

36.GELU: applies the Gaussian error linear unit (GELU) activation function.

if fast_gelu == 1   y = 0.5 * x * (1 + tanh(0.79788452 * (x + 0.044715 * x * x * x)));
else                y = 0.5 * x * erfc(-0.70710678 * x)

one_blob_only
support_inplace

param id	name	type	default	description
0	fast_gelu	int	0	use approximation
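Both GELU branches can be compared on scalars with the standard library. This is only a sketch: `gelu` is a hypothetical name, and the constants are copied from the formulas above (0.79788452 is approximately sqrt(2/pi), 0.70710678 is approximately 1/sqrt(2)).

```python
import math


def gelu(x: float, fast_gelu: int = 0) -> float:
    """GELU via erfc (exact form) or the tanh approximation (fast_gelu=1)."""
    if fast_gelu == 1:
        return 0.5 * x * (1.0 + math.tanh(0.79788452 * (x + 0.044715 * x * x * x)))
    return 0.5 * x * math.erfc(-0.70710678 * x)


for v in (-2.0, -0.5, 0.0, 1.0, 3.0):
    print(v, gelu(v), gelu(v, fast_gelu=1))
```

The two branches agree to about three decimal places over typical activation ranges, which is why the approximation is offered as a cheaper option.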

37.GLU: applies the gated linear unit (GLU) activation function.

If axis < 0, we use axis = x.dims + axis

GLU(a, b) = a ⊗ σ(b)

where a is the first half of the input matrix and b is the second half.

axis specifies the dimension to split the input

one_blob_only

param id	name	type	default	description
0	axis	int	0
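A minimal sketch of GLU on a 1-D list, where the split axis is the only axis. `glu` is a hypothetical helper; the first half of the input is a, the second half is b, and the output has half the input length.

```python
import math


def glu(x):
    """GLU on a 1-D list: first half gated by the sigmoid of the second half."""
    half = len(x) // 2
    a, b = x[:half], x[half:]
    return [ai / (1.0 + math.exp(-bi)) for ai, bi in zip(a, b)]


# With b = 0 the gate is sigmoid(0) = 0.5, so each a is halved.
print(glu([1.0, 2.0, 0.0, 0.0]))  # [0.5, 1.0]
```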

38.Gemm: performs matrix multiplication.

a = transA ? transpose(x0) : x0
b = transB ? transpose(x1) : x1
c = x2
y = (gemm(a, b) + c * beta) * alpha

param id	name	type	default
0	alpha	float	1.f
1	beta	float	1.f
2	transA	int	0
3	transB	int	0
4	constantA	int	0
5	constantB	int	0
6	constantC	int	0
7	constantM	int	0
8	constantN	int	0
9	constantK	int	0
10	constant_broadcast_type_C	int	0
11	output_N1M	int	0
12	output_elempack	int	0
13	output_elemtype	int	0
14	output_transpose	int	0
20	constant_TILE_M	int	0
21	constant_TILE_N	int	0
22	constant_TILE_K	int	0

weight	type	shape
A_data	float	[M, K] or [K, M]
B_data	float	[N, K] or [K, N]
C_data	float	[1], [M] or [N] or [1, M] or [N,1] or [N, M]

import torch

# Create two matrices
A = torch.tensor([[1, 2], [3, 4]])
B = torch.tensor([[5, 6], [7, 8]])
# Matrix multiplication
C = torch.matmul(A, B)
print("Matrix A:")
print(A)
print("Matrix B:")
print(B)
print("Result of Matrix Multiplication:")
print(C)
# Matrix A:
# tensor([[1, 2],
#         [3, 4]])
# Matrix B:
# tensor([[5, 6],
#         [7, 8]])
# Result of Matrix Multiplication:
# tensor([[19, 22],
#         [43, 50]])
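The PyTorch example covers only the plain product. The full Gemm definition also involves the transpose flags, alpha, beta and C, which can be sketched on nested lists. This is a simplified illustration: `transpose` and `gemm` are hypothetical helpers, and C is restricted to a scalar, which is only one of the broadcast shapes listed in the weight table.

```python
def transpose(m):
    """Transpose a 2-D nested list."""
    return [list(row) for row in zip(*m)]


def gemm(x0, x1, c=0.0, alpha=1.0, beta=1.0, transA=0, transB=0):
    """y = (a @ b + c * beta) * alpha, with c a scalar in this sketch."""
    a = transpose(x0) if transA else x0
    b = transpose(x1) if transB else x1
    cols = transpose(b)  # columns of b, for row-by-column dot products
    return [[(sum(p * q for p, q in zip(row, col)) + c * beta) * alpha
             for col in cols] for row in a]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(gemm(A, B))                              # plain product
print(gemm(A, B, c=1.0, alpha=0.5, beta=2.0))  # (A @ B + 2) * 0.5
```

Note that alpha scales the whole sum, including the C term, exactly as the formula is written.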

39.GridSample: samples the input at the positions given by a grid.

Samples the input tensor at the coordinates specified by the sampling grid and outputs the interpolated values.

Given an input and a flow-field grid, computes the output using input values and pixel locations from grid. For each output location output[:, h2, w2], the size-2 vector grid[h2, w2, 2] specifies input pixel[:, h1, w1] locations x and y, which are used to interpolate the output value output[:, h2, w2]. This function is often used in conjunction with affine_grid() to build Spatial Transformer Networks.

param id	name	type	default	description
0	sample_type	int	1
1	padding_mode	int	1
2	align_corner	int	0
3	permute_fusion	int	0	fuse with permute

Sample type:

1 = Nearest
2 = Bilinear
3 = Bicubic

Padding mode:

1 = zeros
2 = border
3 = reflection

# Reference: https://www.cnblogs.com/yanghailin/p/17747266.html
import torch
from torch.nn import functional as F

inp = torch.randint(10, 20, (1, 1, 20, 20)).float()
print('inp.shape:', inp.shape)
# inp is a 20x20 tensor; it is sampled onto a 40x40 output
out_h = 40
out_w = 40
# Build the grid points
grid_h = torch.linspace(-1, 1, out_h).view(1, -1, 1).expand(1, out_h, out_w)
grid_w = torch.linspace(-1, 1, out_w).view(1, 1, -1).expand(1, out_h, out_w)
# grid has shape [1, 40, 40, 2]. grid_sample reads the last dimension as (x, y),
# so stacking (grid_h, grid_w) in this order samples the transposed image.
grid = torch.stack((grid_h, grid_w), dim=3)
outp = F.grid_sample(inp, grid=grid, mode='bilinear')
print(outp.shape)  # torch.Size([1, 1, 40, 40])
print("Input tensor:")
print(inp)
print("Output tensor after grid sampling:")
print(outp)
# inp.shape: torch.Size([1, 1, 20, 20])
# torch.Size([1, 1, 40, 40])
# Input tensor:
# tensor([[[[16., 17., 16., 10., 16., 11., 13., 17., 16., 15., 10., 10., 13., 17.,
#            11., 19., 12., 11., 10., 12.],
#           [12., 15., 17., 16., 13., 13., 16., 19., 18., 10., 11., 13., 19., 14.,
#            14., 18., 14., 11., 10., 15.],
#           [12., 11., 18., 10., 15., 15., 17., 10., 10., 14., 18., 15., 12., 16.,
#            10., 18., 16., 16., 10., 16.],
#           [17., 17., 12., 11., 16., 16., 10., 16., 17., 16., 13., 10., 18., 18.,
#            17., 17., 17., 10., 16., 19.],
#           [14., 15., 16., 19., 12., 12., 11., 10., 16., 12., 16., 10., 17., 10.,
#            12., 18., 19., 13., 13., 16.],
#           [15., 19., 17., 18., 15., 16., 15., 10., 19., 15., 11., 16., 18., 14.,
#            19., 10., 13., 16., 18., 19.],
#           [13., 13., 14., 11., 15., 13., 18., 14., 10., 13., 13., 11., 17., 13.,
#            17., 13., 10., 12., 14., 10.],
#           [12., 10., 17., 16., 17., 10., 18., 15., 14., 13., 13., 10., 17., 16.,
#            19., 13., 14., 10., 17., 12.],
#           [12., 14., 18., 15., 16., 14., 13., 14., 13., 13., 17., 11., 15., 18.,
#            19., 14., 12., 14., 12., 14.],
#           [12., 13., 17., 14., 18., 16., 14., 16., 14., 15., 19., 13., 19., 17.,
#            12., 18., 15., 12., 16., 11.],
#           [10., 19., 12., 13., 12., 17., 14., 13., 19., 19., 12., 13., 17., 17.,
#            14., 17., 11., 14., 18., 12.],
#           [10., 19., 19., 11., 16., 16., 15., 17., 10., 13., 16., 10., 17., 10.,
#            15., 11., 11., 17., 15., 17.],
#           [13., 12., 10., 11., 11., 16., 16., 16., 10., 10., 13., 19., 14., 13.,
#            18., 15., 12., 19., 14., 16.],
#           [16., 13., 11., 11., 12., 16., 12., 16., 10., 16., 11., 19., 19., 12.,
#            11., 15., 11., 15., 12., 17.],
#           [17., 12., 17., 10., 15., 12., 13., 16., 14., 15., 19., 17., 17., 12.,
#            10., 18., 19., 12., 15., 13.],
#           [10., 15., 16., 10., 13., 19., 17., 19., 18., 18., 12., 14., 13., 12.,
#            18., 17., 12., 17., 14., 17.],
#           [13., 10., 15., 19., 19., 14., 11., 14., 11., 13., 19., 10., 10., 13.,
#            16., 11., 15., 13., 18., 15.],
#           [19., 10., 15., 15., 13., 13., 15., 13., 15., 18., 13., 10., 14., 10.,
#            13., 14., 16., 12., 17., 12.],
#           [12., 10., 17., 15., 19., 12., 19., 11., 14., 19., 16., 11., 17., 14.,
#            15., 12., 12., 14., 18., 15.],
#           [12., 15., 14., 18., 19., 19., 17., 11., 11., 12., 13., 19., 17., 19.,
#            10., 17., 15., 18., 14., 10.]]]])
# Output tensor after grid sampling:
# tensor([[[[ 4.0000,  7.9744,  6.9487,  ...,  6.0000,  6.0000,  3.0000],
#           [ 8.0064, 15.9619, 13.9237,  ..., 12.0048, 12.0376,  6.0192],
#           [ 8.2628, 16.4878, 14.9757,  ..., 12.1954, 13.5432,  6.7885],
#           ...,
#           [ 5.4744, 10.9670, 11.6967,  ..., 14.4545, 12.1599,  6.0513],
#           [ 5.9872, 12.0123, 13.5311,  ..., 12.6727, 10.1152,  5.0256],
#           [ 3.0000,  6.0192,  6.7885,  ...,  6.3141,  5.0321,  2.5000]]]])

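40.GroupNorm: applies group normalization to feature maps.

The feature channels are divided into groups, each containing a fixed number of channels, and the channels within each group are normalized independently of the other groups.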

split x along channel axis into group x0, x1 ...
l2 normalize for each group x0, x1 ...
y = x * gamma + beta

one_blob_only
support_inplace

param id	name	type	default	description
0	group	int	1
1	channels	int	0
2	eps	float	0.001f	x = x / sqrt(var + eps)
3	affine	int	1

weight	type	shape
gamma_data	float	[channels]
beta_data	float	[channels]

import torch
import torch.nn as nn

# Define an input tensor
input_tensor = torch.randn(1, 6, 4, 4)  # (batch_size, num_channels, height, width)
# Use GroupNorm with 2 groups
num_groups = 2
group_norm = nn.GroupNorm(num_groups, 6)  # num_groups groups over 6 input channels
# Apply GroupNorm to the input tensor
output = group_norm(input_tensor)
# Print the input and output shapes
print("Input shape:", input_tensor.shape)
print("Output shape after GroupNorm:", output.shape)
# Input shape: torch.Size([1, 6, 4, 4])
# Output shape after GroupNorm: torch.Size([1, 6, 4, 4])

41.GRU: gated recurrent unit (GRU) layer.

A widely used recurrent neural network (RNN) variant for sequence data. Compared with a plain RNN, the GRU adds a gating mechanism, which helps it capture long-term dependencies.

Apply a single-layer GRU to a feature sequence of T timesteps. The input blob shape is [w=input_size, h=T] and the output blob shape is [w=num_output, h=T].

y = gru(x)
y0, hidden y1 = gru(x0, hidden x1)

one_blob_only if bidirectional

param id	name	type	description
0	num_output	int	hidden size of output
1	weight_data_size	int	total size of weight matrix
2	direction	int	0=forward, 1=reverse, 2=bidirectional

weight	type	shape
weight_xc_data	float/fp16/int8	[input_size, num_output * 3, num_directions]
bias_c_data	float/fp16/int8	[num_output, 4, num_directions]
weight_hc_data	float/fp16/int8	[num_output, num_output * 3, num_directions]

Direction flag:

0 = forward only
1 = reverse only
2 = bidirectional

import torch
import torch.nn as nn

# Assume an input size of 3 and 4 hidden units
input_size = 3
hidden_size = 4

# Define a GRU layer; num_layers is not given, so it defaults to a single layer
gru = nn.GRU(input_size, hidden_size)

# Input sequence of length 2 with batch size 1: (seq_len, batch_size, input_size)
input_seq = torch.randn(2, 1, input_size)

# Initial hidden state: (num_layers, batch_size, hidden_size)
hidden = torch.zeros(1, 1, hidden_size)

# Pass the input sequence through the GRU layer
output, hidden = gru(input_seq, hidden)

# Print the shapes of the output and the hidden state
print("Output shape:", output.shape)  # (seq_len, batch_size, num_directions * hidden_size)
print("Hidden state shape:", hidden.shape)  # (num_layers * num_directions, batch_size, hidden_size)
# Output shape: torch.Size([2, 1, 4])
# Hidden state shape: torch.Size([1, 1, 4])

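42.HardSigmoid: applies the hard sigmoid activation function.

Used in neural networks to bound the activation range of a neuron. Compared with the standard sigmoid, HardSigmoid is a simpler and cheaper piecewise-linear approximation, usually chosen to reduce computation cost.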

y = clamp(x * alpha + beta, 0, 1)

one_blob_only
support_inplace

param id	name	type	default	description
0	alpha	float	0.2f
1	beta	float	0.5f

import torch
import torch.nn.functional as F

# Define an input tensor of size 3x4
input_tensor = torch.randn(3, 4)

# Apply HardSigmoid. PyTorch computes clip(x / 6 + 0.5, 0, 1), i.e. alpha = 1/6,
# while the ncnn default above is alpha = 0.2; the values below follow the PyTorch slope.
output = F.hardsigmoid(input_tensor)

# Print the input and output tensors
print("Input tensor:")
print(input_tensor)
# Input tensor:
# tensor([[ 0.5026,  0.6612, -0.0961,  1.9332],
#         [-0.8780, -0.4930, -0.2804, -0.0440],
#         [ 1.2866, -1.9575,  0.7738, -0.8340]])
print("\nOutput tensor after HardSigmoid:")
print(output)
# Output tensor after HardSigmoid:
# tensor([[0.5838, 0.6102, 0.4840, 0.8222],
#         [0.3537, 0.4178, 0.4533, 0.4927],
#         [0.7144, 0.1738, 0.6290, 0.3610]])
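
The rule y = clamp(x * alpha + beta, 0, 1) can also be checked without any framework. The following is a minimal pure-Python sketch; the name `hard_sigmoid` is an illustrative choice and not an ncnn or PyTorch API. It evaluates the same inputs with the ncnn default alpha = 0.2 and with alpha = 1/6, the slope that PyTorch's `F.hardsigmoid` uses, to show where the two conventions differ.

```python
def hard_sigmoid(x, alpha=0.2, beta=0.5):
    """y = clamp(x * alpha + beta, 0, 1); defaults follow the parameter table above."""
    return min(max(x * alpha + beta, 0.0), 1.0)


xs = [-4.0, -1.0, 0.0, 1.0, 4.0]
ncnn_style = [hard_sigmoid(x) for x in xs]                    # alpha = 0.2
torch_style = [hard_sigmoid(x, alpha=1.0 / 6.0) for x in xs]  # alpha = 1/6

for x, a, b in zip(xs, ncnn_style, torch_style):
    print(f"x={x:+.1f}  alpha=0.2 -> {a:.4f}  alpha=1/6 -> {b:.4f}")
```

Both variants agree at x = 0 and once the output saturates at 0 or 1; they differ only on the linear segment in between.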

43.HardSwish: applies the hard Swish activation function.

y = x * clamp(x * alpha + beta, 0, 1)

one_blob_only
support_inplace

param id	name	type	default	description
0	alpha	float	0.2f
1	beta	float	0.5f

import torch
import torch.nn.functional as F

# HardSwish is x * hardsigmoid(x). F.hardsigmoid already computes relu6(x + 3) / 6,
# so no extra "+ 3" shift is applied here; the result equals F.hardswish(x).
def hardswish(x):
    return x * F.hardsigmoid(x)

# Fixed input tensor, chosen so the result can be checked by hand
input_tensor = torch.tensor([[-1.5, 0.0, 1.5, 3.0]])

# Apply the HardSwish activation function
output = hardswish(input_tensor)

# Print the input and output tensors
print("Input tensor:")
print(input_tensor)
print("\nOutput tensor after HardSwish:")
print(output)
# Input tensor:
# tensor([[-1.5000,  0.0000,  1.5000,  3.0000]])
#
# Output tensor after HardSwish:
# tensor([[-0.3750,  0.0000,  1.1250,  3.0000]])

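44.InnerProduct: fully connected layer.

Every input feature is connected to every neuron of the output layer, so each neuron is connected to all neurons of the previous layer.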

x2 = innerproduct(x, weight) + bias
y = activation(x2, act_type, act_params)

one_blob_only

param id	name	type	default
0	num_output	int	0
1	bias_term	int	0
2	weight_data_size	int	0
8	int8_scale_term	int	0
9	activation_type	int	0
10	activation_params	array	[ ]

weight	type	shape
weight_data	float/fp16/int8	[num_input, num_output]
bias_data	float	[num_output]
weight_data_int8_scales	float	[num_output]
bottom_blob_int8_scales	float	[1]

import torch
import torch.nn as nn

class InnerProduct(nn.Module):
    def __init__(self, in_features, out_features):
        super(InnerProduct, self).__init__()
        self.fc = nn.Linear(in_features, out_features)

    def forward(self, x):
        return self.fc(x)

# Create an InnerProduct layer with 100 input features and 200 output features
inner_product_layer = InnerProduct(100, 200)

# Input data: 1 sample with 100 features
input_data = torch.randn(1, 100)

# Run the InnerProduct layer
output = inner_product_layer(input_data)
print(output.shape)  # shape of the output features
# torch.Size([1, 200])

45.Input: the input layer of the network

y = input

support_inplace

param id	name	type
0	w	int
1	h	int
11	d	int
2	c	int

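46.InstanceNorm: instance normalization

A normalization technique applied as a layer in neural networks. Instance normalization standardizes the features of each sample independently instead of using statistics of the whole batch. This can help the model learn feature representations and converge faster.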

split x along channel axis into instance x0, x1 ...
l2 normalize for each channel instance x0, x1 ...
y = x * gamma + beta

one_blob_only
support_inplace

param id	name	type	default	description
0	channels	int	0
1	eps	float	0.001f	x = x / sqrt(var + eps)
2	affine	int	1

weight	type	shape
gamma_data	float	[channels]
beta_data	float	[channels]

import torch
import torch.nn as nn

# Create an instance normalization layer for 3 channels
instance_norm_layer = nn.InstanceNorm2d(3)

# Random feature map as input: 1 sample, 3 channels, 224x224 image
input_data = torch.randn(1, 3, 224, 224)

# Run the instance normalization layer
output = instance_norm_layer(input_data)
print(output.shape)  # shape of the output features
# torch.Size([1, 3, 224, 224])

47.Interp: interpolation (resize)

In computer vision, interpolation is typically used to resize an image: enlarging it, shrinking it, or changing its resolution.

if dynamic_target_size == 0     y = resize(x) by fixed size or scale
else                            y = resize(x0, size(x1))

one_blob_only if dynamic_target_size == 0

param id	name	type	default
0	resize_type	int	0
1	height_scale	float	1.f
2	width_scale	float	1.f
3	output_height	int	0
4	output_width	int	0
5	dynamic_target_size	int	0
6	align_corner	int	0

Resize type:

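1 = Nearest: nearest-neighbor interpolation sets each pixel of the target image to the value of the closest pixel in the source image. It is a simple pixel-level mapping, but it can produce jagged edges.
2 = Bilinear: bilinear interpolation linearly interpolates between the four source pixels nearest to each target position, which gives a smoother result than nearest-neighbor interpolation.
3 = Bicubic: bicubic interpolation computes each new pixel as a weighted average of the 16 source pixels around the target position. It preserves more image detail at a higher computational cost.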

import torch
import torch.nn.functional as F

# Random feature map as input: 1 sample, 3 channels, 224x224 image
input_data = torch.randn(1, 3, 224, 224)

# Resize the image to 300x300 with bilinear interpolation
output = F.interpolate(input_data, size=(300, 300), mode='bilinear', align_corners=False)
print(output.shape)  # shape of the output features
# torch.Size([1, 3, 300, 300])

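48.LayerNorm: layer normalization

A normalization technique for neural networks. Unlike Batch Normalization, Layer Normalization standardizes the features of each individual sample rather than statistics of the whole batch. It helps reduce internal covariate shift, which speeds up training and can improve generalization.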

split x along outmost axis into part x0, x1 ...
l2 normalize for each part x0, x1 ...
y = x * gamma + beta by elementwise

one_blob_only
support_inplace

param id	name	type	default	description
0	affine_size	int	0
1	eps	float	0.001f	x = x / sqrt(var + eps)
2	affine	int	1

weight	type	shape
gamma_data	float	[affine_size]
beta_data	float	[affine_size]

import torch
import torch.nn as nn

# Create a layer normalization module for features of size 256
layer_norm = nn.LayerNorm(256)

# Random features as input: 4 samples, each with 256 features
input_data = torch.randn(4, 256)

# Run the layer normalization module
output = layer_norm(input_data)
print(output.shape)  # shape of the output features
# torch.Size([4, 256])

49.Log: computes the logarithm of the input data (the natural logarithm by default, when base == -1).

if base == -1   y = log(shift + x * scale)
else            y = log(shift + x * scale) / log(base)

one_blob_only
support_inplace

param id	name	type	default
0	base	float	-1.f
1	scale	float	1.f
2	shift	float	0.f
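
The two branches of the Log rule can be sketched in a few lines of plain Python. This is only an illustration of the formula above; the function name `log_layer` is hypothetical, and the parameter defaults are taken from the table.

```python
import math


def log_layer(x, base=-1.0, scale=1.0, shift=0.0):
    """base == -1 selects the natural log; any other base divides by log(base)."""
    value = math.log(shift + x * scale)
    if base == -1.0:
        return value
    return value / math.log(base)


print(log_layer(math.e))                      # natural logarithm of e
print(log_layer(8.0, base=2.0))               # logarithm of 8 to base 2
print(log_layer(5.0, base=10.0, scale=2.0))   # base-10 logarithm of 0 + 5 * 2
```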

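50.LRN: local response normalization layer.

A local normalization method used in some deep learning models, intended to mimic lateral inhibition in biological neurons. LRN is mainly used to improve generalization and to reduce overfitting.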

if region_type == ACROSS_CHANNELS   square_sum = sum of channel window of local_size
if region_type == WITHIN_CHANNEL    square_sum = sum of spatial window of local_size
y = x * pow(bias + alpha * square_sum / (local_size * local_size), -beta)

one_blob_only
support_inplace

param id	name	type	default
0	region_type	int	0
1	local_size	int	5
2	alpha	float	1.f
3	beta	float	0.75f
4	bias	float	1.f

Region type:

0 = ACROSS_CHANNELS
1 = WITHIN_CHANNEL

import torch
import torch.nn as nn

class LRN(nn.Module):
    def __init__(self, size=5, alpha=1e-4, beta=0.75, k=1.0):
        super(LRN, self).__init__()
        self.size = size
        self.alpha = alpha
        self.beta = beta
        self.k = k

    def forward(self, x):
        # Average of squares over a size x size spatial window (the WITHIN_CHANNEL case)
        squared = x.pow(2)
        pool = nn.functional.avg_pool2d(squared, self.size, stride=1, padding=self.size // 2)
        denom = self.k + self.alpha * pool
        output = x / denom.pow(self.beta)
        return output

# Create an LRN module instance
lrn = LRN(size=3, alpha=1e-4, beta=0.75, k=1.0)

# Random feature map as input: 1 sample, 3 channels, 224x224 image
input_data = torch.randn(1, 3, 224, 224)

# Run the LRN module
output = lrn(input_data)
print(output.shape)  # shape of the output features
# torch.Size([1, 3, 224, 224])

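51.LSTM: long short-term memory (LSTM) layer.

A widely used recurrent neural network (RNN) variant designed to address the long-term dependency problem of traditional RNNs. Its design lets it capture and use dependencies across long sequences, which makes it suitable for time series, natural language processing and similar tasks.

Apply a single-layer LSTM to a feature sequence of T timesteps. The input blob shape is [w=input_size, h=T] and the output blob shape is [w=num_output, h=T].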

y = lstm(x)
y0, hidden y1, cell y2 = lstm(x0, hidden x1, cell x2)

one_blob_only if bidirectional

param id	name	type	default	description
0	num_output	int	0	output size of output
1	weight_data_size	int	0	total size of IFOG weight matrix
2	direction	int	0	0=forward, 1=reverse, 2=bidirectional
3	hidden_size	int	num_output	hidden size

weight	type	shape
weight_xc_data	float/fp16/int8	[input_size, hidden_size * 4, num_directions]
bias_c_data	float/fp16/int8	[hidden_size, 4, num_directions]
weight_hc_data	float/fp16/int8	[num_output, hidden_size * 4, num_directions]
weight_hr_data	float/fp16/int8	[hidden_size, num_output, num_directions]

Direction flag:

0 = forward only
1 = reverse only
2 = bidirectional

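52.MemoryData: holds a constant data blob.

Defines a fixed-size block of data inside the model. A MemoryData layer usually stores fixed parameters or constant intermediate data so that they can be used during forward inference.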

y = data

param id	name	type	default	description
0	w	int	0
1	h	int	0
11	d	int	0
2	c	int	0
21	load_type	int	1	1=fp32

weight	type	shape
data	float	[w, h, d, c]

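53.Mish: applies the Mish activation function.

The form of Mish is fairly simple. Because it combines the hyperbolic tangent with the softplus function, it can to some extent mitigate problems seen with other common activation functions, such as vanishing and exploding gradients.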

y = x * tanh(log(exp(x) + 1))

one_blob_only
support_inplace
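
The Mish formula can be evaluated directly with the standard library. The sketch below is an illustration only; the helper names `softplus` and `mish` are chosen here, and the softplus is written in an overflow-safe form, which is an implementation choice of this example rather than something the operator description specifies.

```python
import math


def softplus(x):
    """log(exp(x) + 1), rearranged so large positive x does not overflow exp()."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """y = x * tanh(log(exp(x) + 1))"""
    return x * math.tanh(softplus(x))


for x in (-2.0, 0.0, 2.0):
    print(f"mish({x:+.1f}) = {mish(x):.4f}")
```

For large positive inputs the result approaches x itself, and for large negative inputs it approaches 0 from below.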

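54.MultiHeadAttention: multi-head attention.

Multi-head attention extends the attention mechanism by using several "heads" (independent subspaces) that attend to and represent the input sequence in different ways. Each head learns to focus on different parts of the sequence, so different features and relations in the sequence are captured better.

split q k v into num_head part q0, k0, v0, q1, k1, v1 ...
for each num_head part
    xq = affine(q) / (embed_dim / num_head)
    xk = affine(k)
    xv = affine(v)
    xqk = xq * xk
    xqk = xqk + attn_mask if attn_mask exists
    softmax_inplace(xqk)
    xqkv = xqk * xv
    merge xqkv to out
y = affine(out)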

param id	name	type	default
0	embed_dim	int	0
1	num_heads	int	1
2	weight_data_size	int	0
3	kdim	int	embed_dim
4	vdim	int	embed_dim
5	attn_mask	int	0

weight	type	shape
q_weight_data	float/fp16/int8	[weight_data_size]
q_bias_data	float	[embed_dim]
k_weight_data	float/fp16/int8	[embed_dim * kdim]
k_bias_data	float	[embed_dim]
v_weight_data	float/fp16/int8	[embed_dim * vdim]
v_bias_data	float	[embed_dim]
out_weight_data	float/fp16/int8	[weight_data_size]
out_bias_data	float	[embed_dim]

55.MVN: mean-variance normalization.

if normalize_variance == 1 && across_channels == 1      y = (x - mean) / (sqrt(var) + eps) of whole blob
if normalize_variance == 1 && across_channels == 0      y = (x - mean) / (sqrt(var) + eps) of each channel
if normalize_variance == 0 && across_channels == 1      y = x - mean of whole blob
if normalize_variance == 0 && across_channels == 0      y = x - mean of each channel

one_blob_only

param id	name	type	default	description
0	normalize_variance	int	0
1	across_channels	int	0
2	eps	float	0.0001f	x = x / (sqrt(var) + eps)
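
The four MVN cases above can be sketched in plain Python. This is an illustrative model only: the blob is represented as a list of channels, each a flat list of floats, the function name `mvn` is hypothetical, and the variance is assumed to be the population variance (divide by the element count).

```python
import math


def mvn(blob, normalize_variance=0, across_channels=0, eps=0.0001):
    """Return a normalized copy of blob (a list of channels, each a flat list)."""

    def normalize(channels):
        # All channels passed in here share one mean and one variance.
        flat = [v for ch in channels for v in ch]
        mean = sum(flat) / len(flat)
        if normalize_variance:
            var = sum((v - mean) ** 2 for v in flat) / len(flat)
            denom = math.sqrt(var) + eps
        else:
            denom = 1.0
        return [[(v - mean) / denom for v in ch] for ch in channels]

    if across_channels:
        return normalize(blob)                   # statistics of the whole blob
    return [normalize([ch])[0] for ch in blob]   # statistics of each channel


blob = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
print(mvn(blob))                          # mean removed per channel
print(mvn(blob, across_channels=1))       # mean of the whole blob removed
print(mvn(blob, normalize_variance=1))    # per-channel mean and variance
```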

56.Noop: no operation

A no-op: the input is passed through unchanged.

y = x

57.Normalize: normalization

Normalizes the input data.

if across_spatial == 1 && across_channel == 1      x2 = normalize(x) of whole blob
if across_spatial == 1 && across_channel == 0      x2 = normalize(x) of each channel
if across_spatial == 0 && across_channel == 1      x2 = normalize(x) of each position
y = x2 * scale

one_blob_only
support_inplace

param id	name	type	default	description
0	across_spatial	int	0
1	channel_shared	int	0
2	eps	float	0.0001f	see eps mode
3	scale_data_size	int	0
4	across_channel	int	0
9	eps_mode	int	0

weight	type	shape
scale_data	float	[scale_data_size]

Eps Mode:

0 = caffe/mxnet x = x / sqrt(var + eps)
1 = pytorch x = x / max(sqrt(var), eps)
2 = tensorflow x = x / sqrt(max(var, eps))
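
The three eps modes differ mainly when var is close to zero. The sketch below only transcribes the three denominators listed above; the function name `normalize_denominator` is hypothetical, and `var` is taken as a given number because the list does not spell out how it is computed.

```python
import math


def normalize_denominator(var, eps=0.0001, eps_mode=0):
    """Return the value x is divided by under each eps mode."""
    if eps_mode == 0:    # caffe/mxnet: sqrt(var + eps)
        return math.sqrt(var + eps)
    if eps_mode == 1:    # pytorch: max(sqrt(var), eps)
        return max(math.sqrt(var), eps)
    if eps_mode == 2:    # tensorflow: sqrt(max(var, eps))
        return math.sqrt(max(var, eps))
    raise ValueError("eps_mode must be 0, 1 or 2")


for var in (0.0, 4.0):
    print(var, [normalize_denominator(var, eps_mode=m) for m in (0, 1, 2)])
```

With var = 0 the PyTorch mode divides by eps itself, while the other two divide by sqrt(eps); with a large var all three are nearly identical.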

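58.Packing: packing operation

Changes the element packing of a blob (see out_elempack) so that image tensor data can be processed efficiently.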

y = wrap_packing(x)

one_blob_only

param id	name	type	default
0	out_elempack	int	1
1	use_padding	int	0
2	cast_type_from	int	0
3	cast_type_to	int	0
4	storage_type_from	int	0
5	storage_type_to	int	0

59.Padding: padding operation

Pads the input data.

y = pad(x, pads)

param id	name	type	default
0	top	int	0
1	bottom	int	0
2	left	int	0
3	right	int	0
4	type	int	0
5	value	float	0
6	per_channel_pad_data_size	int	0
7	front	int	0
8	behind	int	front

weight	type	shape
per_channel_pad_data	float	[per_channel_pad_data_size]

Padding type:

0 = CONSTANT
1 = REPLICATE
2 = REFLECT
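
The three padding types are easiest to compare on a single row. The following is a minimal one-dimensional sketch; the function name `pad_1d` is hypothetical, and REFLECT is assumed to mirror around the edge element without repeating it, which is a common convention but is not stated in the table above.

```python
def pad_1d(row, left, right, pad_type=0, value=0.0):
    """Pad a list on both sides; REFLECT assumes left and right are < len(row)."""
    n = len(row)
    out = []
    for i in range(-left, n + right):
        if 0 <= i < n:
            out.append(row[i])                      # inside the original data
        elif pad_type == 0:                         # CONSTANT: fill with value
            out.append(value)
        elif pad_type == 1:                         # REPLICATE: clamp to the nearest edge
            out.append(row[min(max(i, 0), n - 1)])
        elif pad_type == 2:                         # REFLECT: mirror around the edge
            out.append(row[-i if i < 0 else 2 * (n - 1) - i])
        else:
            raise ValueError("unknown padding type")
    return out


row = [1, 2, 3, 4]
print(pad_1d(row, 2, 2, pad_type=0))  # [0.0, 0.0, 1, 2, 3, 4, 0.0, 0.0]
print(pad_1d(row, 2, 2, pad_type=1))  # [1, 1, 1, 2, 3, 4, 4, 4]
print(pad_1d(row, 2, 2, pad_type=2))  # [3, 2, 1, 2, 3, 4, 3, 2]
```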

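60.Permute: permutation

Reorders the dimensions of the input data.

Permuting rearranges the dimensions of a tensor to change its axis order. This restructures the data to fit what a different model or algorithm expects, and it can also be used to move a particular dimension when processing sequence data.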

y = reorder(x)

param id	name	type	default	description
0	order_type	int	0

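Order type: the available orders are listed below (W = width, H = height, C = channel, D = depth)

Default: NCHW layout (DCHW)

In each row, a two-letter entry such as WH applies to 2-D data, a three-letter entry such as WHC to 3-D data, and a four-letter entry such as WHDC to 4-D data. At most 4 and at least 2 dimensions are supported.

0 = WH WHC WHDC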
1 = HW HWC HWDC
2 = WCH WDHC
3 = CWH DWHC
4 = HCW HDWC
5 = CHW DHWC
6 = WHCD
7 = HWCD
8 = WCHD
9 = CWHD
10 = HCWD
11 = CHWD
12 = WDCH
13 = DWCH
14 = WCDH
15 = CWDH
16 = DCWH
17 = CDWH
18 = HDCW
19 = DHCW
20 = HCDW
21 = CHDW
22 = DCHW
23 = CDHW

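61.PixelShuffle: pixel shuffle

Rearranges pixels from the channel dimension into the spatial dimensions. The operation is commonly used in super-resolution reconstruction and image generation.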

if mode == 0    y = depth_to_space(x) where x channel order is sw-sh-outc
if mode == 1    y = depth_to_space(x) where x channel order is outc-sw-sh

one_blob_only

param id	name	type	default	description
0	upscale_factor	int	1
1	mode	int	0

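PixelShuffle splits the channels of the input tensor into groups and rearranges the pixels inside each group to increase the image resolution. Within a group, several low-resolution channels are recombined into one high-resolution channel.

The main advantage of PixelShuffle is that it increases the resolution without introducing additional parameters, which is why it works well in networks for tasks such as image super-resolution.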

62.Pooling: pooling

Applies pooling to reduce the spatial size of the feature map.

x2 = pad(x, pads)
x3 = pooling(x2, kernel, stride)

param id	name	type	default
0	pooling_type	int	0
1	kernel_w	int	0
2	stride_w	int	1
3	pad_left	int	0
4	global_pooling	int	0
5	pad_mode	int	0
6	avgpool_count_include_pad	int	0
7	adaptive_pooling	int	0
8	out_w	int	0
11	kernel_h	int	kernel_w
12	stride_h	int	stride_w
13	pad_top	int	pad_left
14	pad_right	int	pad_left
15	pad_bottom	int	pad_top
18	out_h	int	out_w

Pooling type:

0 = MAX
1 = AVG

Pad mode:

0 = full padding
1 = valid padding
2 = tensorflow padding=SAME or onnx padding=SAME_UPPER
3 = onnx padding=SAME_LOWER

63.Pooling1D: 1-D pooling

Applies pooling to one-dimensional data.

x2 = pad(x, pads)
x3 = pooling1d(x2, kernel, stride)

param id	name	type	default
0	pooling_type	int	0
1	kernel_w	int	0
2	stride_w	int	1
3	pad_left	int	0
4	global_pooling	int	0
5	pad_mode	int	0
6	avgpool_count_include_pad	int	0
7	adaptive_pooling	int	0
8	out_w	int	0
14	pad_right	int	pad_left

Pooling type:

0 = MAX
1 = AVG

Pad mode:

0 = full padding
1 = valid padding
2 = tensorflow padding=SAME or onnx padding=SAME_UPPER
3 = onnx padding=SAME_LOWER

64.Pooling3D: 3-D pooling

Applies pooling to three-dimensional data.

x2 = pad(x, pads)
x3 = pooling3d(x2, kernel, stride)

param id	name	type	default
0	pooling_type	int	0
1	kernel_w	int	0
2	stride_w	int	1
3	pad_left	int	0
4	global_pooling	int	0
5	pad_mode	int	0
6	avgpool_count_include_pad	int	0
7	adaptive_pooling	int	0
8	out_w	int	0
11	kernel_h	int	kernel_w
12	stride_h	int	stride_w
13	pad_top	int	pad_left
14	pad_right	int	pad_left
15	pad_bottom	int	pad_top
16	pad_behind	int	pad_front
18	out_h	int	out_w
21	kernel_d	int	kernel_w
22	stride_d	int	stride_w
23	pad_front	int	pad_left
28	out_d	int	out_w

Pooling type:

0 = MAX
1 = AVG

Pad mode:

0 = full padding
1 = valid padding
2 = tensorflow padding=SAME or onnx padding=SAME_UPPER
3 = onnx padding=SAME_LOWER

65.Power: power operation

Raises the scaled and shifted input to a power.

y = pow((shift + x * scale), power)

one_blob_only
support_inplace

param id	name	type	default
0	power	float	1.f
1	scale	float	1.f
2	shift	float	0.f

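66.PReLU: parametric rectified linear unit

With the plain ReLU, the output is always 0 when the input is below 0. With PReLU, a negative input is instead multiplied by a small slope, and that slope is a learnable, non-zero parameter rather than a fixed 0.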

if x < 0    y = x * slope
else        y = x

one_blob_only
support_inplace

param id	name	type	default	description
0	num_slope	int	0

weight	type	shape
slope_data	float	[num_slope]

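67.Quantize: quantization

Quantization converts the parameters and/or activations of a network from a higher precision (for example 32-bit floating point) to a lower precision (for example 8-bit integers). This reduces the storage and compute cost of the model and can, to some extent, make it run faster.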

y = float2int8(x * scale)

one_blob_only

param id	name	type	default	description
0	scale_data_size	int	1

weight	type	shape
scale_data	float	[scale_data_size]
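
The rule y = float2int8(x * scale) can be sketched as follows. The helper below is an assumption made for illustration: it rounds to the nearest integer and saturates to the symmetric range [-127, 127], and it uses Python's `round()`, whose handling of exact .5 ties may differ from the real implementation. A single scale is used for every element, which corresponds to scale_data_size = 1.

```python
def float2int8(v):
    """Assumed behaviour: round to nearest, then saturate to [-127, 127]."""
    r = int(round(v))
    return max(-127, min(127, r))


def quantize(values, scale):
    """y = float2int8(x * scale) with one scale shared by all elements."""
    return [float2int8(v * scale) for v in values]


print(quantize([0.1, -0.5, 1.0, 2.0], scale=100.0))  # [10, -50, 100, 127]
```

The last value shows saturation: 2.0 * 100 = 200 does not fit into int8 and is clamped to 127.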

68.Reduction: reduces a tensor along its dimensions

Performs an aggregation (reduction) operation.

y = reduce_op(x * coeff)

one_blob_only

param id	name	type	default	description
0	operation	int	0
1	reduce_all	int	1
2	coeff	float	1.f
3	axes	array	[ ]
4	keepdims	int	0
5	fixbug0	int	0	hack for bug fix, should be 1

Operation type (with reduce_all = 1 the result is a single scalar; otherwise the reduction runs over the given axes):

0 = SUM: sum of the elements.
1 = ASUM: sum of the absolute values of the elements.
2 = SUMSQ: sum of the squares of the elements.
3 = MEAN: arithmetic mean of the elements.
4 = MAX: largest element.
5 = MIN: smallest element.
6 = PROD: product of the elements.
7 = L1: L1 norm, i.e. the sum of the absolute values.
8 = L2: L2 norm, i.e. the square root of the sum of squares.
9 = LogSum: logarithm of the sum of the elements, log(sum(x)).
10 = LogSumExp: logarithm of the sum of the exponentials of the elements, log(sum(exp(x))).
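
The eleven operation types can be written as a small dispatch table. This is a sketch for the reduce_all = 1 case over a flat list of values; the names `REDUCE_OPS` and `reduce_all` are hypothetical, and coeff is applied before the reduction, following the formula y = reduce_op(x * coeff) above.

```python
import math

REDUCE_OPS = {
    0: lambda v: sum(v),                                   # SUM
    1: lambda v: sum(abs(e) for e in v),                   # ASUM
    2: lambda v: sum(e * e for e in v),                    # SUMSQ
    3: lambda v: sum(v) / len(v),                          # MEAN
    4: lambda v: max(v),                                   # MAX
    5: lambda v: min(v),                                   # MIN
    6: lambda v: math.prod(v),                             # PROD
    7: lambda v: sum(abs(e) for e in v),                   # L1
    8: lambda v: math.sqrt(sum(e * e for e in v)),         # L2
    9: lambda v: math.log(sum(v)),                         # LogSum
    10: lambda v: math.log(sum(math.exp(e) for e in v)),   # LogSumExp
}


def reduce_all(values, operation=0, coeff=1.0):
    """Reduce every element of values to one scalar."""
    return REDUCE_OPS[operation]([e * coeff for e in values])


x = [1.0, -2.0, 3.0]
for op in range(11):
    print(op, reduce_all(x, op))
```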

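69.ReLU: applies the rectified linear unit (ReLU) activation function.

ReLU outputs zero when the input is below zero and passes the input through unchanged when it is above zero. With a non-zero slope, negative inputs are multiplied by slope instead of being set to zero.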

if x < 0    y = x * slope
else        y = x

one_blob_only
support_inplace

param id	name	type	default	description
0	slope	float	0.f

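70.Reorg: channel reorganization

Rearranges the input tensor by moving spatial blocks into the channel dimension, to meet the requirements of particular network structures. The operation usually changes the channel count, height and width of the tensor while the values themselves stay the same.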

if mode == 0    y = space_to_depth(x) where x channel order is sw-sh-outc
if mode == 1    y = space_to_depth(x) where x channel order is outc-sw-sh

one_blob_only

param id	name	type	default	description
0	stride	int	1
1	mode	int	0

71.Requantize: requantization

Quantizes already quantized data again. Typically Quantize converts from fp32 to int8, whereas Requantize converts from int32 to int8.

x2 = x * scale_in + bias
x3 = activation(x2)
y = float2int8(x3 * scale_out)

one_blob_only

param id	name	type	default
0	scale_in_data_size	int	1
1	scale_out_data_size	int	1
2	bias_data_size	int	0
3	activation_type	int	0
4	activation_params	array	[ ]

weight	type	shape
scale_in_data	float	[scale_in_data_size]
scale_out_data	float	[scale_out_data_size]
bias_data	float	[bias_data_size]

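72.Reshape: reshape operation

Changes the shape of the input data.

Typically used to adjust the shape of the input and output tensors of a layer, either to match what the next layer expects or to change the number of dimensions of the data.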

if permute == 1     y = hwc2chw(reshape(chw2hwc(x)))
else                y = reshape(x)

one_blob_only

param id	name	type	default
0	w	int	-233
1	h	int	-233
11	d	int	-233
2	c	int	-233
3	permute	int	0

Reshape flag:

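0 = copy from bottom (a value of 0 copies the size of the corresponding dimension from the bottom blob, i.e. the input's size along that dimension is kept)
-1 = remaining (a value of -1 lets this dimension take whatever is left: its size is computed from the total element count and the other specified dimensions)
-233 = drop this dim (default) (a value of -233 drops this dimension, so the output has one dimension fewer)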
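
How the three flags interact can be illustrated with a small shape resolver. This is a simplified model written for this note, not ncnn code: shapes are plain lists in the same axis order for input and target, the function name `resolve_shape` is hypothetical, and a 0 entry is assumed to refer to the input dimension at the same position.

```python
def resolve_shape(in_shape, target):
    """Resolve 0 (copy), -1 (infer) and -233 (drop) into a concrete output shape."""
    total = 1
    for s in in_shape:
        total *= s

    out = []
    for i, t in enumerate(target):
        if t == -233:
            continue                  # drop this dimension
        if t == 0:
            out.append(in_shape[i])   # copy from the bottom blob
        else:
            out.append(t)             # explicit size, or -1 as a placeholder

    if out.count(-1) > 1:
        raise ValueError("at most one dimension can be -1")
    if -1 in out:
        known = 1
        for s in out:
            if s != -1:
                known *= s
        out[out.index(-1)] = total // known   # the remaining elements
    return out


# Input with w=4, h=6, c=2 (48 elements); targets are given as [w, h, c]
print(resolve_shape([4, 6, 2], [0, -1, -233]))     # [4, 12]
print(resolve_shape([4, 6, 2], [-1, -233, -233]))  # [48]
print(resolve_shape([4, 6, 2], [8, -1, 3]))        # [8, 2, 3]
```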

73.RNN: recurrent neural network (RNN) layer.

Apply a single-layer RNN to a feature sequence of T timesteps. The input blob shape is [w=input_size, h=T] and the output blob shape is [w=num_output, h=T].

y = rnn(x)
y0, hidden y1 = rnn(x0, hidden x1)

one_blob_only if bidirectional

param id	name	type	description
0	num_output	int	hidden size of output
1	weight_data_size	int	total size of weight matrix
2	direction	int	0=forward, 1=reverse, 2=bidirectional

weight	type	shape
weight_xc_data	float/fp16/int8	[input_size, num_output, num_directions]
bias_c_data	float/fp16/int8	[num_output, 1, num_directions]
weight_hc_data	float/fp16/int8	[num_output, num_output, num_directions]

Direction flag:

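0 = forward only (the sequence is processed from the first timestep to the last)
1 = reverse only (the sequence is processed from the last timestep to the first)
2 = bidirectional (the sequence is processed in both directions)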

74.Scale: scaling

Multiplies the input by a scale factor and optionally adds a bias. It is typically used to adjust the magnitude of values such as feature maps.

if scale_data_size == -233  y = x0 * x1
else                        y = x * scale + bias

one_blob_only if scale_data_size != -233
support_inplace

param id	name	type	default	description
0	scale_data_size	int	0
1	bias_term	int	0

weight	type	shape
scale_data	float	[scale_data_size]
bias_data	float	[scale_data_size]

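75.SELU: applies the self-normalizing (scaled exponential linear unit) activation function

An activation function introduced in 2017 by Klambauer, Unterthiner, Mayr and Hochreiter for the hidden layers of neural networks. Compared with other activation functions (such as ReLU, sigmoid and tanh), SELU has some distinctive properties.

$\lambda$ = 1.0507 and $\alpha$ = 1.67326

SELU has the following characteristics:

Self-normalizing: under certain conditions, a network using SELU normalizes itself, which helps against vanishing or exploding gradients and makes training more stable.
Nonlinearity: SELU introduces a nonlinearity, which helps the network learn complex patterns and features.
Stability and robustness: SELU responds fairly smoothly to changes of the input, which adds some robustness to the network.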

if x < 0    y = (exp(x) - 1.f) * alpha * lambda
else        y = x * lambda

one_blob_only
support_inplace

param id	name	type	default	description
0	alpha	float	1.67326324f
1	lambda	float	1.050700987f
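
The two branches of SELU can be checked with plain Python. This is a minimal sketch; the function name `selu` and the argument name `lam` (used because `lambda` is a Python keyword) are choices made here, and the default values are copied from the parameter table.

```python
import math


def selu(x, alpha=1.67326324, lam=1.050700987):
    """x < 0: (exp(x) - 1) * alpha * lambda, otherwise x * lambda."""
    if x < 0:
        return (math.exp(x) - 1.0) * alpha * lam
    return x * lam


for x in (-100.0, -1.0, 0.0, 1.0):
    print(f"selu({x:+.1f}) = {selu(x):.6f}")
```

For very negative inputs the output levels off at -alpha * lambda, roughly -1.7581, while positive inputs are simply scaled by lambda.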

76.Shrink: shrinks the input values

An element-wise operation: values whose magnitude exceeds lambd are shifted toward zero by bias, following the rule below.

if x < -lambd y = x + bias
if x >  lambd y = x - bias
else          y = x

one_blob_only
support_inplace

param id	name	type	default	description
0	bias	float	0.0f
1	lambd	float	0.5f

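77.ShuffleChannel: channel shuffle

Rearranges the channels of the input tensor to change their order:

Split the channels of the input tensor into groups according to a fixed rule.
Rearrange these channel groups.
Recombine the rearranged channels into the final output tensor.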

if reverse == 0     y = shufflechannel(x) by group
if reverse == 1     y = shufflechannel(x) by channel / group

one_blob_only

param id	name	type	default	description
0	group	int	1
1	reverse	int	0
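
The split, rearrange and recombine steps can be sketched on a list of channel labels. The code below follows the usual ShuffleNet-style shuffle (view the channels as a [group, channels / group] grid, transpose it, flatten it); treating reverse = 1 as the same shuffle with channels / group groups is read off the formula above. The function name `shuffle_channel` is hypothetical.

```python
def shuffle_channel(channels, group, reverse=0):
    """Return the channels in shuffled order; channels is any list of per-channel items."""
    n = len(channels)
    if n % group != 0:
        raise ValueError("channel count must be divisible by group")
    g = n // group if reverse else group   # number of groups actually used
    per_group = n // g
    # Read the [g, per_group] grid column by column (i.e. transpose and flatten).
    return [channels[j * per_group + i] for i in range(per_group) for j in range(g)]


labels = list(range(6))
shuffled = shuffle_channel(labels, group=2)
print(shuffled)                                        # [0, 3, 1, 4, 2, 5]
print(shuffle_channel(shuffled, group=2, reverse=1))   # [0, 1, 2, 3, 4, 5]
```

Applying the reverse shuffle with the same group value restores the original channel order.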

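78.Sigmoid: applies the sigmoid activation function

Maps any real number to a real number in the range 0 to 1.

Sigmoid used to be widely used as the activation function of hidden layers, but because of its vanishing-gradient and saturation problems it has gradually been replaced by ReLU and similar functions.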

y = 1 / (1 + exp(-x))

one_blob_only
support_inplace

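79.Slice: slice (split) operation

Typically used to take sub-tensors out of the input tensor.

Slice cuts the input along one axis into several consecutive sub-tensors, each emitted as a separate output. Every sub-tensor is a subset of the original tensor and can be processed further by a specific layer or module of the network.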

split x along axis into slices, each part slice size is based on slices array

param id	name	type	default	description
0	slices	array	[ ]	slice sizes
1	axis	int	0	axis
2	indices	array	[ ]

80.Softmax: applies the Softmax activation function, usually for classification tasks.

Converts the raw outputs of a model into a probability distribution.

softmax(x, axis)

one_blob_only
support_inplace

param id	name	type	default	description
0	axis	int	0
1	fixbug0	int	0	hack for bug fix, should be 1

import torch
import torch.nn.functional as F

# Define an example tensor of raw outputs (logits)
logits = torch.tensor([2.0, 1.0, 0.1])

# Apply Softmax with torch.nn.functional.softmax
probabilities = F.softmax(logits, dim=0)

# Print the resulting probability distribution
print("Softmax output probabilities:")
print(probabilities)
# Softmax output probabilities:
# tensor([0.6590, 0.2424, 0.0986])
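
The same numbers can be reproduced with the standard library alone. The sketch below subtracts the maximum before exponentiating, a common trick for numerical stability that does not change the result; the function name `softmax` is defined here for illustration.

```python
import math


def softmax(values):
    """Softmax over a flat list, computed in a numerically stable way."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 4) for p in probs])  # [0.659, 0.2424, 0.0986]
```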

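81.Softplus: applies the Softplus activation function.

softplus(x) = log(1 + e^x)

Softplus maps any real input to a real number greater than zero.

The function approaches 0 for negative inputs and keeps growing for positive inputs. Like ReLU, Softplus is nonlinear, which adds to the expressive power of a network.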

y = log(exp(x) + 1)

one_blob_only
support_inplace

import torch
import torch.nn.functional as F

# Define an example input tensor
x = torch.tensor([-2.0, 0.0, 2.0])

# Apply Softplus with torch.nn.functional.softplus
output = F.softplus(x)

# Print the output of the Softplus function
print("Softplus output:")
print(output)
# Softplus output:
# tensor([0.1269, 0.6931, 2.1269])

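82.Split: copies the input to several outputs.

The input is simply duplicated into multiple outputs; presumably each output is a reference to the same data rather than a deep copy.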

y0, y1 ... = x

83.Swish: Swish activation function

Applies the Swish activation function.

y = x / (1 + exp(-x))

one_blob_only
support_inplace

84.TanH: TanH activation function

Applies the hyperbolic tangent (tanh) activation function.

y = tanh(x)

one_blob_only
support_inplace

85.Threshold: threshold operation

Applies a threshold to the input data.

if x > threshold    y = 1
else                y = 0

one_blob_only
support_inplace

param id	name	type	default	description
0	threshold	float	0.f

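86.Tile: repetition

Repeats the contents of a tensor along its dimensions to enlarge it. Tiling copies the data along the specified dimension, which increases the size of that dimension.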

y = repeat tiles along axis for x

one_blob_only

param id	name	type	default	description
0	axis	int	0	axis
1	tiles	int	1	number of repetitions
2	repeats	array	[ ]

import torch

# Create an example tensor
x = torch.tensor([[1, 2], [3, 4]])

# Define the parameters
params = {"axis": 0, "tiles": 2, "repeats": [2, 1]}

# Read the parameter values
axis = params["axis"]
tiles = params["tiles"]
repeats = params["repeats"]

# Repeat the tensor contents along the chosen axis
y = x.repeat(repeats[0] if axis == 0 else 1, repeats[1] if axis == 1 else 1)

# Print the result
print(y)
# tensor([[1, 2],
#         [3, 4],
#         [1, 2],
#         [3, 4]])

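87.UnaryOp: applies a unary operation to the input.

A unary operation transforms a single input or extracts information from it; it does not combine several inputs.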

y = unaryop(x)

one_blob_only
support_inplace

param id	name	type	default	description
0	op_type	int	0	Operation type as follows

Operation type:

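0 = ABS (absolute value): returns the absolute value of the input.
1 = NEG (negation): returns the negated input.
2 = FLOOR (round down): returns the largest integer not greater than the input.
3 = CEIL (round up): returns the smallest integer not less than the input.
4 = SQUARE (square): returns the square of the input.
5 = SQRT (square root): returns the square root of the input.
6 = RSQ (reciprocal square root): returns 1 / sqrt(x).
7 = EXP (exponential): returns e raised to the power of the input.
8 = LOG (logarithm): returns the natural logarithm of the input.
9 = SIN (sine): returns the sine of the input.
10 = COS (cosine): returns the cosine of the input.
11 = TAN (tangent): returns the tangent of the input.
12 = ASIN (arcsine): returns the arcsine of the input.
13 = ACOS (arccosine): returns the arccosine of the input.
14 = ATAN (arctangent): returns the arctangent of the input.
15 = RECIPROCAL (reciprocal): returns 1 / x.
16 = TANH (hyperbolic tangent): returns the hyperbolic tangent of the input.
17 = LOG10 (base-10 logarithm): returns the base-10 logarithm of the input.
18 = ROUND (rounding): returns the input rounded to the nearest integer.
19 = TRUNC (truncation): returns the integer part of the input.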
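
The op_type list maps naturally onto a lookup table of functions. The following sketch uses only the Python standard library; the names `UNARY_OPS` and `unaryop` are hypothetical. ROUND (18) is left out on purpose, because how exact .5 ties are rounded is not stated above and Python's own `round()` uses round-half-to-even.

```python
import math

UNARY_OPS = {
    0: abs,                             # ABS
    1: lambda x: -x,                    # NEG
    2: math.floor,                      # FLOOR
    3: math.ceil,                       # CEIL
    4: lambda x: x * x,                 # SQUARE
    5: math.sqrt,                       # SQRT
    6: lambda x: 1.0 / math.sqrt(x),    # RSQ
    7: math.exp,                        # EXP
    8: math.log,                        # LOG
    9: math.sin,                        # SIN
    10: math.cos,                       # COS
    11: math.tan,                       # TAN
    12: math.asin,                      # ASIN
    13: math.acos,                      # ACOS
    14: math.atan,                      # ATAN
    15: lambda x: 1.0 / x,              # RECIPROCAL
    16: math.tanh,                      # TANH
    17: math.log10,                     # LOG10
    19: math.trunc,                     # TRUNC
}


def unaryop(values, op_type=0):
    """Apply the selected unary operation to every element."""
    return [UNARY_OPS[op_type](v) for v in values]


print(unaryop([-1.5, 2.0], op_type=0))   # [1.5, 2.0]
print(unaryop([4.0], op_type=6))         # [0.5]
print(unaryop([-1.7], op_type=19))       # [-1]
```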

88.Unfold: unfolds the input data.

Extracts sliding local blocks from the input tensor.

y = unfold(x)

one_blob_only

param id	name	type	default
0	num_output	int	0
1	kernel_w	int	0
2	dilation_w	int	1
3	stride_w	int	1
4	pad_left	int	0
11	kernel_h	int	kernel_w
12	dilation_h	int	dilation_w
13	stride_h	int	stride_w
14	pad_top	int	pad_left
15	pad_right	int	pad_left
16	pad_bottom	int	pad_top

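import torch

# Create a 3x3 tensor as example input
input_tensor = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Slide a window of size 2 with step 1 along dimension 0.
# Note: Tensor.unfold windows a single dimension; the block extraction described by the
# kernel/stride/pad parameters above corresponds more closely to torch.nn.Unfold.
unfolded_tensor = input_tensor.unfold(0, 2, 1)

print('Input Tensor:\n', input_tensor)
# tensor([[1, 2, 3],
#         [4, 5, 6],
#         [7, 8, 9]])
print('Unfolded Tensor:\n', unfolded_tensor, "\nshape:", unfolded_tensor.shape)
# tensor([[[1, 4],
#          [2, 5],
#          [3, 6]],
#
#         [[4, 7],
#          [5, 8],
#          [6, 9]]])
# shape: torch.Size([2, 3, 2])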
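
The kernel and stride parameters in the table describe a two-dimensional sliding window, which is usually implemented as an im2col-style unfold. The sketch below shows that idea in plain Python for a single-channel image without padding or dilation; the function name `unfold_2d` and the output layout (one row per kernel offset, one column per window position) are assumptions made for this illustration.

```python
def unfold_2d(image, kernel_h, kernel_w, stride=1):
    """im2col for one channel: rows are kernel offsets, columns are window positions."""
    h, w = len(image), len(image[0])
    out_h = (h - kernel_h) // stride + 1
    out_w = (w - kernel_w) // stride + 1
    rows = []
    for ky in range(kernel_h):
        for kx in range(kernel_w):
            # Collect the element at offset (ky, kx) of every window.
            rows.append([image[oy * stride + ky][ox * stride + kx]
                         for oy in range(out_h) for ox in range(out_w)])
    return rows


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
for row in unfold_2d(image, 2, 2):
    print(row)
# [1, 2, 4, 5]
# [2, 3, 5, 6]
# [4, 5, 7, 8]
# [5, 6, 8, 9]
```

Each column of the result is one flattened 2x2 window; for example the first column 1, 2, 4, 5 is the top-left block of the image.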