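For research reasons I switched from TensorFlow to PyTorch, and honestly PyTorch is much easier to pick up. Easy enough that I re-implemented several of my earlier TensorFlow algorithms in PyTorch, and once I have time over the holidays I plan to get them all running on a Raspberry Pi. This post is updated regularly with small PyTorch tips, mainly because I have a bad memory and want somewhere to look things up later.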

Main text

Testing the installation

import torch
torch.__version__  # check the PyTorch version
# check CUDA
torch.cuda.is_available()
torch.version.cuda
# check cuDNN
torch.backends.cudnn.is_available()
torch.backends.cudnn.version()
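The checks above can be gathered into one helper so the whole environment is reported in a single call. This is a minimal sketch: the function name `describe_torch_install` and the dict keys are made up for illustration, and the function returns early when torch is not installed.

```python
import importlib.util


def describe_torch_install():
    """Collect the version checks into one dict; safe to call without torch installed."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch

    cudnn_ok = torch.backends.cudnn.is_available()
    return {
        "installed": True,
        "torch": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
        "cuda_version": torch.version.cuda,  # None on CPU-only builds
        "cudnn_available": cudnn_ok,
        "cudnn_version": torch.backends.cudnn.version() if cudnn_ok else None,
    }


info = describe_torch_install()
for key, value in info.items():
    print(f"{key}: {value}")
```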

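Freezing a model

Freezing the model and its parameters into a single file lets you load it directly on other platforms. My original plan was to freeze the face-recognition model I built earlier, put it on a Raspberry Pi and link it with the camera at home as a smart security system, but I never found the time.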
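1. Imports

import torch

2. Freezing the network and its parameters

net = Net()  # create the network
# load the parameters (assumes the .pth file holds a state_dict)
net.load_state_dict(torch.load('./model/Resnet50_scriptmodule.pth', map_location='cpu'))
net.eval()  # switch to inference mode before tracing
net = torch.jit.trace(net, torch.rand(1, 3, 256, 256))  # convert to TorchScript
net.save('scriptmodule.pt')  # save

3. Loading

net = torch.jit.load('scriptmodule.pt')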
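To check that a frozen file really reproduces the original network, one option is to trace a small model, save it, reload it and compare the outputs. The sketch below uses a made-up `TinyNet` as a stand-in for the real network and writes to a temporary directory; both are assumptions for illustration only.

```python
import os
import tempfile

import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the real network."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 4, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        x = torch.relu(self.conv(x))
        return self.fc(self.pool(x).flatten(1))


net = TinyNet().eval()                     # inference mode before tracing
example = torch.rand(1, 3, 32, 32)         # example input that fixes the traced shapes
traced = torch.jit.trace(net, example)     # record the forward pass as TorchScript

path = os.path.join(tempfile.mkdtemp(), "tiny_scriptmodule.pt")
traced.save(path)                          # one file with structure and weights
loaded = torch.jit.load(path)              # no Python class definition needed here

with torch.no_grad():
    same = torch.allclose(net(example), loaded(example), atol=1e-6)
print("outputs match:", same)
```

Tracing only records the operations executed for the example input, so data-dependent control flow in `forward` would not be captured.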

torchvision

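While working on an image classification project recently, I found that torchvision already ships many commonly used models, pretrained and very convenient to call, so I am noting it down.

The official site (may need a VPN to access) describes it as: The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision.
Put simply: common datasets + common models + common image augmentation methods.

torchvision is mainly made up of these subpackages: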

torchvision.datasets
torchvision.models
torchvision.transforms

torchvision.datasets

Contains a large number of datasets; see the official site for details.

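CAM maps

Deep learning is a "black box" system. It works end-to-end and the intermediate process is opaque, so visualizing intermediate features gives some explanation of what the model does with its data. The earliest feature visualization (CAM) applies global average pooling after the model's last conv layer and uses a single fully connected layer as the classifier. The weights of that layer on the pooled values determine the weight of each feature map, and the weighted maps are summed to produce the visualization. A series of later methods obtain gradients by back-propagating from a specific class label, such as Grad-CAM. This is only a brief usage note; see the original author's README for details.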
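The weighted sum described above can be sketched in a few lines of plain Python. The feature maps and class weights below are made-up numbers, and clipping negatives and scaling to [0, 1] are display conveniences assumed here rather than part of the original description.

```python
def class_activation_map(feature_maps, class_weights):
    """Sum the feature maps weighted per channel, then clip and scale for display."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * fmap[i][j]

    cam = [[max(value, 0.0) for value in row] for row in cam]  # drop negative evidence
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]  # scale to [0, 1]
    return cam


# Two 2x2 feature maps from the last conv layer (hypothetical values)
feature_maps = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
class_weights = [0.5, 1.0]  # classifier weights for the class of interest

cam = class_activation_map(feature_maps, class_weights)
print(cam)
```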

pip install grad-cam

Imports

import pprint  # used to print the model structure
import cv2  # used to save the CAM image
import torch
from pytorch_grad_cam import GradCAM, ScoreCAM, GradCAMPlusPlus, AblationCAM, XGradCAM, EigenCAM
from pytorch_grad_cam.utils.image import show_cam_on_image, \
                                         deprocess_image, \
                                         preprocess_image
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

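You can add the CAM code to the test script to inspect the gradient activations.
First print the model structure to decide which layer to hook:

pprint.pprint(net)

Then set the layer whose activation map you want to output:

target_layer = [net.module.swin_unet.norm_up]

For the currently popular transformers, you need to write your own reshape_transform function: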

def reshape_transform(tensor, height=128, width=128): 
    result = tensor.reshape(tensor.size(0),
        height, width, tensor.size(2))

    # Bring the channels to the first dimension,
    # like in CNNs.
    result = result.transpose(2, 3).transpose(1, 2)
    return result

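The height and width arguments here depend on the transformer's configuration. A practical approach is to start with 4 or 8; if that raises an error, work out the correct values from the element count reported in the error message.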
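Working out height and width from the token count is simple arithmetic when the token grid is square and there is no extra class token. Both are assumptions of this sketch, and the helper names are made up for illustration.

```python
import math


def infer_square_grid(num_tokens):
    """Return (height, width) for a square token grid, or raise if that is impossible."""
    side = math.isqrt(num_tokens)
    if side * side != num_tokens:
        raise ValueError(f"{num_tokens} tokens do not form a square grid")
    return side, side


def reshaped_shape(batch, num_tokens, channels):
    """Shape produced by reshape_transform for a (batch, tokens, channels) tensor."""
    height, width = infer_square_grid(num_tokens)
    return (batch, channels, height, width)


print(infer_square_grid(16384))
print(reshaped_shape(1, 49, 768))
```

If the layer also emits a class token, it would have to be sliced off before reshaping, which this sketch does not cover.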

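# use_cuda is accepted by older grad-cam releases; newer ones dropped it and take the device from the model
with GradCAM(model=net, target_layers=target_layer, use_cuda=True, reshape_transform=reshape_transform) as cam:  # transformer version
with GradCAM(model=net, target_layers=target_layer, use_cuda=True) as cam:  # CNN version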

Output by class

targets = [SemanticSegmentationTarget(1, cls_mask_float)]

SemanticSegmentationTarget is a class written for segmentation problems, as follows:

class SemanticSegmentationTarget:
    def __init__(self, category, mask):
        self.category = category # 1
        self.mask = torch.from_numpy(mask)
        if torch.cuda.is_available():
            self.mask = self.mask.cuda()

    def __call__(self, model_output):
        return (model_output[self.category, :, :] * self.mask).sum()

Finally, compute the CAM image and save it:

grayscale_cam = cam(input_tensor=input, targets=targets)
cam_image = show_cam_on_image(rgb_img, grayscale_cam[0,:], use_rgb=True,colormap=9)
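# show_cam_on_image(use_rgb=True) returns RGB, while cv2.imwrite expects BGR
cv2.imwrite(test_save_path + '/' + case_name + '.png', cv2.cvtColor(cam_image, cv2.COLOR_RGB2BGR))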

Parallel execution with a process pool and progress visualization

First, the required packages:

from tqdm import tqdm
from multiprocessing import Pool
from functools import partial

Example

def test(a,b):
    if a>b:
        print(a)
if __name__ == '__main__':
    a_s = [1,2,3,4,5,6,7]
    b = 5
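    with Pool(10) as p:  # the number is the process count; best not to exceed the number of CPU cores
        r = list(tqdm(p.imap(partial(test, b=b), a_s), total=len(a_s)))  # total lets tqdm show a percentage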
        p.close()
        p.join()

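The pool must be created under the if __name__ == '__main__': guard
list() is required: it consumes the iterator, otherwise no progress is shown
Use imap rather than map: map only returns once everything has finished, so the progress bar cannot update
partial fixes the arguments that are not iterated over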
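The roles of `imap` and `partial` in the notes above can be seen without tqdm by counting results as they arrive. To keep the sketch dependency-free and runnable without a `__main__` guard, it swaps in `multiprocessing.pool.ThreadPool`, which exposes the same `imap`/`map` interface as `Pool` but uses threads. The function names are made up for illustration.

```python
from functools import partial
from multiprocessing.pool import ThreadPool  # same imap/map API as Pool, thread-based


def exceeds(a, b):
    """Return True when a is greater than the fixed threshold b."""
    return a > b


def run_with_progress(values, threshold, workers=4):
    """Apply exceeds() to every value and record the completed fraction after each result."""
    check = partial(exceeds, b=threshold)  # b is fixed, only a is iterated over
    results, progress = [], []
    with ThreadPool(workers) as pool:
        # imap yields results one at a time, in input order, as they become available
        for done, flag in enumerate(pool.imap(check, values), start=1):
            results.append(flag)
            progress.append(done / len(values))
    return results, progress


results, progress = run_with_progress([1, 2, 3, 4, 5, 6, 7], threshold=5)
print(results)
print(progress[-1])
```

With `pool.map` the loop body would only start after every task had finished, which is why a progress bar wrapped around it jumps straight to the end.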

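PyTorch: loading a multi-GPU model and fixing incorrect test results after loading

When a model is spread across multiple GPUs, you typically use

model = torch.nn.DataParallel(model, device_ids=[1, 2, 3])

So if you then save the model directly with

torch.save(model.state_dict(), save_dict_path)

every parameter name in the saved state dict carries a module. prefix.
If you load it into an unwrapped model with strict=False, the mismatched keys are skipped without an error, so the test results will very likely differ:

model.load_state_dict(weights_dict, strict=False)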
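Why strict=False is risky here becomes clear when the two key sets are compared: with the prefix in place, no key matches, so nothing is loaded and the model keeps its initial weights. The sketch below does the comparison on plain lists of names; the key names are hypothetical. In PyTorch itself, the object returned by `load_state_dict` exposes `missing_keys` and `unexpected_keys` with the same information.

```python
def compare_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected): keys the model wants but the checkpoint lacks, and vice versa."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Hypothetical parameter names: plain model vs. checkpoint saved from DataParallel
model_keys = ["conv1.weight", "fc.bias"]
checkpoint_keys = ["module.conv1.weight", "module.fc.bias"]

missing, unexpected = compare_keys(model_keys, checkpoint_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

Every model key ending up in `missing` is the sign that the checkpoint was effectively ignored.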

Solution 1:
Save the model through model.module:

torch.save(model.module.state_dict(), save_dict)

It can then be loaded directly:

checkpoint = torch.load('./weight/Best.pth', map_location='cpu')
model.load_state_dict(checkpoint)
model.cuda()

Solution 2:

If the model has already been saved, replace module. in the state dict keys with an empty string when loading:

checkpoint = torch.load(weight_pth, map_location='cpu')
model.load_state_dict({k.replace('module.', ''): v for k, v in checkpoint.items()})
model = model.cuda()

PS: do not use model.load_state_dict(weights_dict, strict=False) for this; it changes the prediction results.
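One caveat with `str.replace` in solution 2 is that it removes every occurrence of `module.`, including one inside a longer name. A slightly more careful variant strips only a leading prefix. The helper name and the example keys below are assumptions for illustration; the function works on any mapping, so it can be applied to a loaded checkpoint before `load_state_dict`.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict whose keys have one leading prefix removed."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Hypothetical checkpoint; the values stand in for tensors
checkpoint = OrderedDict([
    ("module.conv1.weight", "w0"),
    ("module.submodule.fc.bias", "b0"),
])

print(list(strip_module_prefix(checkpoint)))
print([k.replace("module.", "") for k in checkpoint])  # also mangles 'submodule.'
```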

Counting the parameters and operations of a torch model

Counting parameters with PyTorch

def count_para(model):  # parameter count (via PyTorch)
    total = sum([param.nelement() for param in model.parameters()])
    print("parameter: %fM"%(total/1e6))
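A quick way to sanity-check the number printed by count_para is to compute a few layers by hand. The sketch below covers only plain 2D convolutions (square kernel, groups=1) and linear layers; the helper names are made up, and the layer sizes are simply example values.

```python
def conv2d_params(in_channels, out_channels, kernel_size, bias=True):
    """Weights plus optional bias of a square-kernel conv with groups=1."""
    weights = out_channels * in_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


def linear_params(in_features, out_features, bias=True):
    """Weights plus optional bias of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


stem = conv2d_params(3, 64, 7, bias=False)  # 7x7 conv, 3 -> 64 channels
head = linear_params(2048, 1000)            # 2048 -> 1000 classifier
print("conv: %d, linear: %d, total: %fM" % (stem, head, (stem + head) / 1e6))
```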

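Counting with thop

from thop import profile

def count_para_flops(input, model):  # single-input model
    # thop's README calls the first return value MACs; the original name flops is kept here
    flops, params = profile(model, inputs=(input,))
    print("FLOPS=",str(flops/1e9)+'{}'.format("G"))
    print("params=",str(params/1e6)+'{}'.format("M"))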
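To get a feel for the operation count such a profiler reports, the multiply-accumulate count of a single conv layer can be worked out by hand: each output element needs in_channels x k x k multiply-adds. This is a sketch for a square kernel with groups=1 and no bias; the function name and the layer sizes are example assumptions.

```python
def conv2d_macs(in_channels, out_channels, kernel_size, out_height, out_width):
    """Multiply-accumulate operations of a square-kernel conv with groups=1, bias ignored."""
    per_output = in_channels * kernel_size * kernel_size   # work for one output element
    return out_channels * out_height * out_width * per_output


# 7x7 conv, 3 -> 64 channels, producing a 112x112 output map
macs = conv2d_macs(3, 64, 7, 112, 112)
print("MACs = %.3fG" % (macs / 1e9))
```

One multiply-accumulate is often counted as two floating-point operations, so it is worth checking which convention a reported number follows before comparing models.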

When the model takes several inputs, wrap it in a class that exposes a single input:

import numpy as np
import torch
import torch.nn as nn

class func(nn.Module):  # wrap a multi-input model as a single-input one
    def __init__(self, model):
        super(func, self).__init__()
        self.model = model

    def forward(self, a):
        # a is only a placeholder for thop; the real inputs are created here
        x0 = torch.randn(1, 1, 32, 96, 96)
        x1 = torch.randn(1, 1, 32, 96, 96)
        x2 = torch.randn(1, 1, 32, 96, 96)
        x3 = torch.randn(1, 1, 32, 96, 96)
        x4 = torch.randn(1, 1, 32, 96, 96)
        x5 = torch.randn(1, 1, 32, 96, 96)
        TS_all = np.array([1, 1, 1, 1, 1, 1]).astype(np.float32)
        TS = torch.from_numpy(np.repeat(TS_all[np.newaxis,:],x0.shape[0],0))
        out = self.model(x0,x1,x2,x3,x4,x5,TS)
        return out

def Complex_count_para_flops():  # multi-input model
    trans_param = {'hidden_size': 768, 'MLP_dim': 2048, 'Num_heads': 12, \
    'Dropout_rate': 0.1, 'Attention_dropout_rate':0.0, 'Trans_num_layers':12}
    model = TranRUnet(1,1,16,trans_param)
    use_model = func(model)
    input = torch.randn(1, 1, 32, 96, 96)
    flops, params = profile(use_model, inputs=(input,))  # inputs must be a tuple
    print("FLOPS=",str(flops/1e9)+'{}'.format("G"))
    print("params=",str(params/1e6)+'{}'.format("M"))

Unless otherwise stated, all articles on this blog are licensed under CC BY-SA 4.0. Please credit the source when reposting.
