A Guide to Using PaddleOCR


First install PaddlePaddle, then install paddleocr.
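
PaddlePaddle itself can also be installed via pip; a minimal CPU install (see the official PaddlePaddle install guide for GPU builds) is

pip install paddlepaddle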

pip install "paddleocr>=2.0.1"

Recognizing an image:

from paddleocr import PaddleOCR, draw_ocr
from PIL import Image

if __name__ == '__main__':

    ocr = PaddleOCR(use_angle_cls=True, lang='ch')
    img_path = 'demo/demo_kie.jpeg'
    result = ocr.ocr(img_path, cls=True)
    for line in result:
        print(line)

    image = Image.open(img_path).convert('RGB')
    boxes = [line[0] for line in result]
    txts = [line[1][0] for line in result]
    scores = [line[1][1] for line in result]
    im_show = draw_ocr(image, boxes, txts, scores, font_path='data/chineseocr/labels/font.TTF')
    im_show = Image.fromarray(im_show)
    im_show.save('output/result5.jpg')

In PaddleOCR(use_angle_cls=True, lang='ch'), `lang` can be one of many languages, e.g. `ch`, `en`, `fr`, `german`, `korean`, `japan`.

This pipeline includes both text detection and text recognition; a typical result looks like the following.

But if the image is a simple piece of text, such as the one below,

then we only need recognition, no detection:

from paddleocr import PaddleOCR, draw_ocr

if __name__ == '__main__':

    ocr = PaddleOCR(use_angle_cls=True, lang='en')
    img_path = 'demo/demo_text_recog.jpg'
    result = ocr.ocr(img_path, cls=True, det=False)
    for line in result:
        print(line)

Partial output:

('STAR', 0.8838256597518921)

The PaddleOCR framework can be downloaded from: GitHub - PaddlePaddle/PaddleOCR: Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)

Model Training

We again use the Kaggle captcha text-recognition task as the example. PaddleOCR's dataset format differs a little from MMOCR's: it requires the training-set and test-set images to be placed in two different folders, roughly like this:

Since all the images were previously in one folder, here is a script to split them apart:

import shutil

if __name__ == '__main__':

    # each line of test_label.txt is "<filename>\t<label>"; move those images out of train/
    with open('data/toy_dataset/test_label.txt', 'r') as f:
        for line in f:
            filename = line.split('\t')[0]
            shutil.move('data/toy_dataset/train/' + filename, 'data/toy_dataset/test/' + filename)

Note also that the fields in its label files are separated by a tab character (\t), whereas MMOCR uses spaces.

2wc38.png	2wc38
y5n6d.png	y5n6d
men4f.png	men4f
57b27.png	57b27
x3deb.png	x3deb
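
If your labels started out space-separated in the MMOCR style, a one-off conversion is straightforward; a minimal sketch (the source file name train_label_mmocr.txt is an assumption):

# convert space-separated labels (MMOCR style) to tab-separated (PaddleOCR style)
with open('data/toy_dataset/train_label_mmocr.txt', 'r') as src, \
        open('data/toy_dataset/train_label.txt', 'w') as dst:
    for line in src:
        filename, label = line.rstrip('\n').split(' ', 1)
        dst.write(filename + '\t' + label + '\n')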

We again train with the SAR model. Modify configs/rec/rec_r31_sar.yml under the PaddleOCR root directory; this is only one of the available recognition configs, but we take it as the example. The modified parts are as follows:

Global:
  use_gpu: true
  epoch_num: 200
  log_smooth_window: 20
  print_batch_step: 20
  save_model_dir: ./sar_rec
  save_epoch_step: 30
  # evaluation is run every 1000 iterations
  eval_batch_step: [0, 1000]
  cal_metric_during_train: True
  pretrained_model:
  checkpoints: 
  save_inference_dir:
  use_visualdl: False
  infer_img: data/toy_dataset/test/2en7g.png
  # for data or label process
#  character_dict_path: ppocr/utils/dict90.txt
  character_dict_path: ppocr/utils/en_dict.txt
  max_text_length: 30
  infer_mode: False
  use_space_char: False
  rm_symbol: True
  save_res_path: ./output/rec/predicts_sar.txt

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Piecewise
    decay_epochs: [3, 4]
#    values: [0.001, 0.0001, 0.00001]
    values: [0.001, 0.001, 0.001]
  regularizer:
    name: 'L2'
    factor: 0

Train:
  dataset:
    name: SimpleDataSet
    label_file_list: ["./data/toy_dataset/train_label.txt"]
    data_dir: ./data/toy_dataset/train/
    ratio_list: 1.0
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - SARLabelEncode: # Class handling label
      - SARRecResizeImg:
          image_shape: [3, 48, 48, 160] # h:48 w:[48,160]
          width_downsample_ratio: 0.25
      - KeepKeys:
          keep_keys: ['image', 'label', 'valid_ratio'] # dataloader will return list in this order
  loader:
    shuffle: True
    batch_size_per_card: 32
    drop_last: True
    num_workers: 8
    use_shared_memory: False

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./data/toy_dataset/test/
    label_file_list: ["./data/toy_dataset/test_label.txt"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - SARLabelEncode: # Class handling label
      - SARRecResizeImg:
          image_shape: [3, 48, 48, 160]
          width_downsample_ratio: 0.25
      - KeepKeys:
          keep_keys: ['image', 'label', 'valid_ratio'] # dataloader will return list in this order
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 32
    num_workers: 4
    use_shared_memory: False

The meaning of each entry is documented in PaddleOCR/config.md at release/2.5 · PaddlePaddle/PaddleOCR · GitHub

Copy train.py from the tools folder into the PaddleOCR root directory and add the argument

--config=configs/rec/rec_r31_sar.yml

Run it to start training.
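
Equivalently, you can skip the copy and launch the script in place from the repository root; the same pattern applies to eval.py and infer_rec.py below:

python tools/train.py -c configs/rec/rec_r31_sar.yml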

Partial output:

[2022/08/08 16:31:30] ppocr INFO: epoch: [80/200], global_step: 1980, lr: 0.001000, acc: 0.593750, norm_edit_dis: 0.904241, loss: 0.240176, avg_reader_cost: 0.04908 s, avg_batch_cost: 0.08742 s, avg_samples: 8.0, ips: 91.51111 samples/s, eta: 0:09:36
[2022/08/08 16:31:33] ppocr INFO: epoch: [80/200], global_step: 2000, lr: 0.001000, acc: 0.593750, norm_edit_dis: 0.911458, loss: 0.244095, avg_reader_cost: 0.00006 s, avg_batch_cost: 0.15064 s, avg_samples: 32.0, ips: 212.42641 samples/s, eta: 0:09:31
eval model:: 100%|██████████| 9/9 [00:03<00:00,  2.34it/s]
[2022/08/08 16:31:37] ppocr INFO: cur metric, acc: 0.9888475468829909, norm_edit_dis: 0.9977695168115421, fps: 72.34870468201763
[2022/08/08 16:31:38] ppocr INFO: save best model is to ./sar_rec/best_accuracy
[2022/08/08 16:31:38] ppocr INFO: best metric, acc: 0.9888475468829909, norm_edit_dis: 0.9977695168115421, fps: 72.34870468201763, best_epoch: 80

Model Evaluation

Copy tools/eval.py into the PaddleOCR root directory and add the arguments

-c configs/rec/rec_r31_sar.yml -o Global.checkpoints=sar_rec/best_accuracy Global.character_dict_path=ppocr/utils/en_dict.txt

Then run eval.py.

Output:

[2022/08/08 16:42:55] ppocr INFO: resume from sar_rec/best_accuracy
[2022/08/08 16:42:55] ppocr INFO: metric in ckpt ***************
[2022/08/08 16:42:55] ppocr INFO: acc:0.992565018863754
[2022/08/08 16:42:55] ppocr INFO: norm_edit_dis:0.9985130112076948
[2022/08/08 16:42:55] ppocr INFO: fps:72.37282298826126
[2022/08/08 16:42:55] ppocr INFO: best_epoch:200
[2022/08/08 16:42:55] ppocr INFO: start_epoch:201
eval model:: 100%|██████████| 9/9 [00:04<00:00,  1.81it/s]
[2022/08/08 16:43:00] ppocr INFO: metric eval ***************
[2022/08/08 16:43:00] ppocr INFO: acc:0.992565018863754
[2022/08/08 16:43:00] ppocr INFO: norm_edit_dis:0.9985130112076948
[2022/08/08 16:43:00] ppocr INFO: fps:55.0351114820872

Prediction

Copy tools/infer_rec.py into the PaddleOCR root directory and add the arguments

-c configs/rec/rec_r31_sar.yml -o Global.checkpoints=sar_rec/best_accuracy Global.character_dict_path=ppocr/utils/en_dict.txt

Run infer_rec.py.

Output:

[2022/08/08 16:45:17] ppocr INFO: resume from sar_rec/best_accuracy
[2022/08/08 16:45:17] ppocr INFO: infer_img: data/toy_dataset/test/2en7g.png
[2022/08/08 16:45:18] ppocr INFO: 	 result: 2en7g	1.0
[2022/08/08 16:45:18] ppocr INFO: success!

CRNN Text Recognition

Besides SAR, CRNN is another network model for text recognition. Suppose we have a scenario in which the text has already been detected; what remains is to recognize it.

RNN-based text recognition algorithms mainly come in two frameworks:

  1. CNN+RNN+CTC(CRNN+CTC)
  2. CNN+Seq2Seq+Attention

Here we use the first framework as the example. The basic CRNN network structure:

The input image has shape 32*100*3, where 32 is the height, 100 the width, and 3 the number of channels. The Convolutional Layers are an ordinary CNN used to extract the image's feature maps, of size 1*25*512. The Recurrent Layers are an RNN, a deep bidirectional LSTM that extracts character-sequence features on top of the convolutional features.

Since the CNN outputs a feature map of size (1, 25, 512), the maximum time length of the RNN is T=25 (i.e. 25 time-step inputs, each column vector x_t having dimension D=512). This differs from SAR, where the height of the CNN's feature map is not 1 and each column of the feature map is fed into the subsequent RNN.
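
In other words, the CNN feature map is read column by column as a sequence. A tiny standalone shape check illustrates the conversion (a sketch, not PaddleOCR code):

import paddle

feat = paddle.rand((1, 512, 1, 25))  # CNN output: [B, C, H=1, W=25]
seq = feat.squeeze(axis=2).transpose([0, 2, 1])
print(seq.shape)  # [1, 25, 512] -> T=25 time steps, each a D=512 column vector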

To feed the features into the Recurrent Layers, the input is processed as follows:

  1. The image is first scaled, with fixed aspect ratio, to size 32×W×3 (W is an arbitrary width)
  2. After the CNN it becomes 1×(W/4)×512
  3. Setting T=(W/4) for the LSTM, the features can then be fed into the LSTM.

So when preprocessing an input image, it is advisable to scale the height to 32 while preserving the aspect ratio, which keeps the text details in the image as intact as possible (you could also scale to a fixed width, but distorting the text's shape would certainly hurt accuracy). To keep things simple, the code below pads up to a fixed width.

import cv2
import math
import numpy as np
import matplotlib.pyplot as plt

def resize_norm_img(img):
    '''
    Resize and normalize the image
    :param img: input image
    :return:
    '''
    # default input size
    img_c = 3
    img_h = 32
    img_w = 320
    # actual height and width of the image
    h, w = img.shape[:2]
    # actual aspect ratio
    ratio = w / float(h)
    # scale proportionally
    if math.ceil(img_h * ratio) > img_w:
        # wider than the default width: clamp to img_w
        resized_w = img_w
    else:
        # otherwise keep the image's own scaled width
        resized_w = int(math.ceil(img_h * ratio))
    # resize
    resized_image = cv2.resize(img, (resized_w, img_h))
    resized_image = resized_image.astype('float32')
    # normalize to [-1, 1]
    resized_image = resized_image.transpose((2, 0, 1)) / 255
    resized_image -= 0.5
    resized_image /= 0.5
    # zero-pad positions where the width falls short
    padding_im = np.zeros((img_c, img_h, img_w), dtype=np.float32)
    padding_im[:, :, 0:resized_w] = resized_image
    # transpose the padded image back for visualization
    draw_img = padding_im.transpose((1, 2, 0))
    return padding_im, draw_img

if __name__ == '__main__':

    raw_img = cv2.imread('word_1.png')
    padding_im, draw_img = resize_norm_img(raw_img)
    # show the original above and the padded, normalized image below
    plt.figure()
    plt.subplot(2, 1, 1)
    plt.imshow(raw_img)
    plt.subplot(2, 1, 2)
    plt.imshow(draw_img)
    plt.show()

Output:

CNN Backbone

Here MobileNet V3 serves as the backbone. For MobileNet V3 itself, see the MobileNet V3 section of the earlier post on improving and tuning deep network models.

import paddle
import paddle.nn as nn
import paddle.nn.functional as F

class ConvBNLayer(nn.Layer):

    def __init__(self, in_channels, out_channels, kernel_size, stride, padding, groups=1,
                 if_act=True, act=None):
        '''
        Conv + BN layer
        :param in_channels: number of input channels
        :param out_channels: number of output channels
        :param kernel_size: convolution kernel size
        :param stride: stride
        :param padding: padding size
        :param groups: number of groups of the 2-D convolution
        :param if_act: whether to apply an activation
        :param act: activation function
        '''
        super(ConvBNLayer, self).__init__()
        self.if_act = if_act
        self.act = act
        self.conv = nn.Conv2D(in_channels, out_channels, kernel_size, stride=stride, padding=padding,
                              groups=groups, bias_attr=False)
        self.bn = nn.BatchNorm(num_channels=out_channels, act=None)

    def forward(self, x):
        out = self.conv(x)
        out = self.bn(out)
        if self.if_act:
            if self.act == "relu":
                out = F.relu(out)
            elif self.act == "hardswish":
                out = F.hardswish(out)
            else:
                print("The activation function({}) is selected incorrectly".format(self.act))
                exit()
        return out

class SEModule(nn.Layer):

    def __init__(self, in_channels, reduction=4):
        '''
        SE (Squeeze-and-Excitation) module
        :param in_channels: number of input channels
        :param reduction: channel reduction ratio
        '''
        super(SEModule, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2D(1)
        self.conv1 = nn.Conv2D(in_channels, in_channels // reduction, 1, stride=1, padding=0)
        self.conv2 = nn.Conv2D(in_channels // reduction, in_channels, 1, stride=1, padding=0)

    def forward(self, x):
        out = self.avg_pool(x)
        out = self.conv1(out)
        out = F.relu(out)
        out = self.conv2(out)
        out = F.hardsigmoid(out, slope=0.2, offset=0.5)
        return x * out

class ResidualUnit(nn.Layer):

    def __init__(self, in_channels, mid_channels, out_channels, kernel_size, stride,
                 use_se, act=None):
        '''
        Residual unit
        :param in_channels: number of input channels
        :param mid_channels: number of intermediate channels
        :param out_channels: number of output channels
        :param kernel_size: convolution kernel size
        :param stride: stride
        :param use_se: whether to use the SE module
        :param act: activation function
        '''
        super(ResidualUnit, self).__init__()
        self.if_shortcut = stride == 1 and in_channels == out_channels
        self.if_se = use_se

        self.expand_conv = ConvBNLayer(in_channels, mid_channels, 1, 1, 0, if_act=True, act=act)
        self.bottleneck_conv = ConvBNLayer(mid_channels, mid_channels, kernel_size, stride,
                                           int((kernel_size - 1) // 2), groups=mid_channels,
                                           if_act=True, act=act)
        if self.if_se:
            self.mid_se = SEModule(mid_channels)
        self.linear_conv = ConvBNLayer(mid_channels, out_channels, 1, 1, 0, if_act=False, act=None)

    def forward(self, x):
        out = self.expand_conv(x)
        out = self.bottleneck_conv(out)
        if self.if_se:
            out = self.mid_se(out)
        out = self.linear_conv(out)
        if self.if_shortcut:
            out = paddle.add(x, out)
        return out

def make_divisible(v, divisor=8, min_value=None):
    '''
    Round channel counts to a multiple of divisor (8 by default)
    '''
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

class MobileNetV3(nn.Layer):

    def __init__(self, in_channels=3, model_name='small', scale=0.5, small_stride=None,
                 disable_se=False, **kwargs):
        super(MobileNetV3, self).__init__()
        self.disable_se = disable_se
        if small_stride is None:
            small_stride = [1, 2, 2, 2]
        if model_name == "small":
            cfg = [
                [3, 16, 16, True, 'relu', (small_stride[0], 1)],
                [3, 72, 24, False, 'relu', (small_stride[1], 1)],
                [3, 88, 24, False, 'relu', 1],
                [5, 96, 40, True, 'hardswish', (small_stride[2], 1)],
                [5, 240, 40, True, 'hardswish', 1],
                [5, 240, 40, True, 'hardswish', 1],
                [5, 120, 48, True, 'hardswish', 1],
                [5, 144, 48, True, 'hardswish', 1],
                [5, 288, 96, True, 'hardswish', (small_stride[3], 1)],
                [5, 576, 96, True, 'hardswish', 1],
                [5, 576, 96, True, 'hardswish', 1]
            ]
            cls_ch_squeeze = 576
        else:
            raise NotImplementedError("model[" + model_name + "_model] is not implemented!")

        supported_scale = [0.35, 0.5, 0.75, 1.0, 1.25]
        assert scale in supported_scale, "supported scales are {} but input scale is {}".format(
            supported_scale, scale)

        inplanes = 16
        self.conv1 = ConvBNLayer(in_channels, make_divisible(inplanes * scale), 3, 2, 1, groups=1,
                                 if_act=True, act='hardswish')
        i = 0
        block_list = []
        inplanes = make_divisible(inplanes * scale)
        for (k, exp, c, se, nl, s) in cfg:
            se = se and not self.disable_se
            block_list.append(
                ResidualUnit(inplanes, make_divisible(scale * exp), make_divisible(scale * c), k, s,
                             se, act=nl))
            inplanes = make_divisible(scale * c)
            i += 1
        self.blocks = nn.Sequential(*block_list)

        self.conv2 = ConvBNLayer(inplanes, make_divisible(scale * cls_ch_squeeze), 1, 1, 0,
                                 groups=1, if_act=True, act='hardswish')
        self.pool = nn.MaxPool2D(2, stride=2, padding=0)
        self.out_channels = make_divisible(scale * cls_ch_squeeze)

    def forward(self, x):
        out = self.conv1(x)
        out = self.blocks(out)
        out = self.conv2(out)
        out = self.pool(out)
        return out

if __name__ == '__main__':

    IMAGE_SHAPE_C = 3
    IMAGE_SHAPE_H = 32
    IMAGE_SHAPE_W = 100

    paddle.summary(MobileNetV3(), [(1, IMAGE_SHAPE_C, IMAGE_SHAPE_H, IMAGE_SHAPE_W)])
    img = paddle.rand((1, IMAGE_SHAPE_C, IMAGE_SHAPE_H, IMAGE_SHAPE_W), dtype='float32')
    net = MobileNetV3()
    out = net(img)
    print(out.shape)

Output:

-------------------------------------------------------------------------------
   Layer (type)         Input Shape          Output Shape         Param #    
===============================================================================
     Conv2D-1        [[1, 3, 32, 100]]      [1, 8, 16, 50]          216      
    BatchNorm-1       [[1, 8, 16, 50]]      [1, 8, 16, 50]          32       
   ConvBNLayer-1     [[1, 3, 32, 100]]      [1, 8, 16, 50]           0       
     Conv2D-2         [[1, 8, 16, 50]]      [1, 8, 16, 50]          64       
    BatchNorm-2       [[1, 8, 16, 50]]      [1, 8, 16, 50]          32       
   ConvBNLayer-2      [[1, 8, 16, 50]]      [1, 8, 16, 50]           0       
     Conv2D-3         [[1, 8, 16, 50]]      [1, 8, 16, 50]          72       
    BatchNorm-3       [[1, 8, 16, 50]]      [1, 8, 16, 50]          32       
   ConvBNLayer-3      [[1, 8, 16, 50]]      [1, 8, 16, 50]           0       
AdaptiveAvgPool2D-1   [[1, 8, 16, 50]]       [1, 8, 1, 1]            0       
     Conv2D-4          [[1, 8, 1, 1]]        [1, 2, 1, 1]           18       
     Conv2D-5          [[1, 2, 1, 1]]        [1, 8, 1, 1]           24       
    SEModule-1        [[1, 8, 16, 50]]      [1, 8, 16, 50]           0       
     Conv2D-6         [[1, 8, 16, 50]]      [1, 8, 16, 50]          64       
    BatchNorm-4       [[1, 8, 16, 50]]      [1, 8, 16, 50]          32       
   ConvBNLayer-4      [[1, 8, 16, 50]]      [1, 8, 16, 50]           0       
  ResidualUnit-1      [[1, 8, 16, 50]]      [1, 8, 16, 50]           0       
     Conv2D-7         [[1, 8, 16, 50]]     [1, 40, 16, 50]          320      
    BatchNorm-5      [[1, 40, 16, 50]]     [1, 40, 16, 50]          160      
   ConvBNLayer-5      [[1, 8, 16, 50]]     [1, 40, 16, 50]           0       
     Conv2D-8        [[1, 40, 16, 50]]      [1, 40, 8, 50]          360      
    BatchNorm-6       [[1, 40, 8, 50]]      [1, 40, 8, 50]          160      
   ConvBNLayer-6     [[1, 40, 16, 50]]      [1, 40, 8, 50]           0       
     Conv2D-9         [[1, 40, 8, 50]]      [1, 16, 8, 50]          640      
    BatchNorm-7       [[1, 16, 8, 50]]      [1, 16, 8, 50]          64       
   ConvBNLayer-7      [[1, 40, 8, 50]]      [1, 16, 8, 50]           0       
  ResidualUnit-2      [[1, 8, 16, 50]]      [1, 16, 8, 50]           0       
     Conv2D-10        [[1, 16, 8, 50]]      [1, 48, 8, 50]          768      
    BatchNorm-8       [[1, 48, 8, 50]]      [1, 48, 8, 50]          192      
   ConvBNLayer-8      [[1, 16, 8, 50]]      [1, 48, 8, 50]           0       
     Conv2D-11        [[1, 48, 8, 50]]      [1, 48, 8, 50]          432      
    BatchNorm-9       [[1, 48, 8, 50]]      [1, 48, 8, 50]          192      
   ConvBNLayer-9      [[1, 48, 8, 50]]      [1, 48, 8, 50]           0       
     Conv2D-12        [[1, 48, 8, 50]]      [1, 16, 8, 50]          768      
   BatchNorm-10       [[1, 16, 8, 50]]      [1, 16, 8, 50]          64       
  ConvBNLayer-10      [[1, 48, 8, 50]]      [1, 16, 8, 50]           0       
  ResidualUnit-3      [[1, 16, 8, 50]]      [1, 16, 8, 50]           0       
     Conv2D-13        [[1, 16, 8, 50]]      [1, 48, 8, 50]          768      
   BatchNorm-11       [[1, 48, 8, 50]]      [1, 48, 8, 50]          192      
  ConvBNLayer-11      [[1, 16, 8, 50]]      [1, 48, 8, 50]           0       
     Conv2D-14        [[1, 48, 8, 50]]      [1, 48, 4, 50]         1,200     
   BatchNorm-12       [[1, 48, 4, 50]]      [1, 48, 4, 50]          192      
  ConvBNLayer-12      [[1, 48, 8, 50]]      [1, 48, 4, 50]           0       
AdaptiveAvgPool2D-2   [[1, 48, 4, 50]]      [1, 48, 1, 1]            0       
     Conv2D-15        [[1, 48, 1, 1]]       [1, 12, 1, 1]           588      
     Conv2D-16        [[1, 12, 1, 1]]       [1, 48, 1, 1]           624      
    SEModule-2        [[1, 48, 4, 50]]      [1, 48, 4, 50]           0       
     Conv2D-17        [[1, 48, 4, 50]]      [1, 24, 4, 50]         1,152     
   BatchNorm-13       [[1, 24, 4, 50]]      [1, 24, 4, 50]          96       
  ConvBNLayer-13      [[1, 48, 4, 50]]      [1, 24, 4, 50]           0       
  ResidualUnit-4      [[1, 16, 8, 50]]      [1, 24, 4, 50]           0       
     Conv2D-18        [[1, 24, 4, 50]]     [1, 120, 4, 50]         2,880     
   BatchNorm-14      [[1, 120, 4, 50]]     [1, 120, 4, 50]          480      
  ConvBNLayer-14      [[1, 24, 4, 50]]     [1, 120, 4, 50]           0       
     Conv2D-19       [[1, 120, 4, 50]]     [1, 120, 4, 50]         3,000     
   BatchNorm-15      [[1, 120, 4, 50]]     [1, 120, 4, 50]          480      
  ConvBNLayer-15     [[1, 120, 4, 50]]     [1, 120, 4, 50]           0       
AdaptiveAvgPool2D-3  [[1, 120, 4, 50]]      [1, 120, 1, 1]           0       
     Conv2D-20        [[1, 120, 1, 1]]      [1, 30, 1, 1]          3,630     
     Conv2D-21        [[1, 30, 1, 1]]       [1, 120, 1, 1]         3,720     
    SEModule-3       [[1, 120, 4, 50]]     [1, 120, 4, 50]           0       
     Conv2D-22       [[1, 120, 4, 50]]      [1, 24, 4, 50]         2,880     
   BatchNorm-16       [[1, 24, 4, 50]]      [1, 24, 4, 50]          96       
  ConvBNLayer-16     [[1, 120, 4, 50]]      [1, 24, 4, 50]           0       
  ResidualUnit-5      [[1, 24, 4, 50]]      [1, 24, 4, 50]           0       
     Conv2D-23        [[1, 24, 4, 50]]     [1, 120, 4, 50]         2,880     
   BatchNorm-17      [[1, 120, 4, 50]]     [1, 120, 4, 50]          480      
  ConvBNLayer-17      [[1, 24, 4, 50]]     [1, 120, 4, 50]           0       
     Conv2D-24       [[1, 120, 4, 50]]     [1, 120, 4, 50]         3,000     
   BatchNorm-18      [[1, 120, 4, 50]]     [1, 120, 4, 50]          480      
  ConvBNLayer-18     [[1, 120, 4, 50]]     [1, 120, 4, 50]           0       
AdaptiveAvgPool2D-4  [[1, 120, 4, 50]]      [1, 120, 1, 1]           0       
     Conv2D-25        [[1, 120, 1, 1]]      [1, 30, 1, 1]          3,630     
     Conv2D-26        [[1, 30, 1, 1]]       [1, 120, 1, 1]         3,720     
    SEModule-4       [[1, 120, 4, 50]]     [1, 120, 4, 50]           0       
     Conv2D-27       [[1, 120, 4, 50]]      [1, 24, 4, 50]         2,880     
   BatchNorm-19       [[1, 24, 4, 50]]      [1, 24, 4, 50]          96       
  ConvBNLayer-19     [[1, 120, 4, 50]]      [1, 24, 4, 50]           0       
  ResidualUnit-6      [[1, 24, 4, 50]]      [1, 24, 4, 50]           0       
     Conv2D-28        [[1, 24, 4, 50]]      [1, 64, 4, 50]         1,536     
   BatchNorm-20       [[1, 64, 4, 50]]      [1, 64, 4, 50]          256      
  ConvBNLayer-20      [[1, 24, 4, 50]]      [1, 64, 4, 50]           0       
     Conv2D-29        [[1, 64, 4, 50]]      [1, 64, 4, 50]         1,600     
   BatchNorm-21       [[1, 64, 4, 50]]      [1, 64, 4, 50]          256      
  ConvBNLayer-21      [[1, 64, 4, 50]]      [1, 64, 4, 50]           0       
AdaptiveAvgPool2D-5   [[1, 64, 4, 50]]      [1, 64, 1, 1]            0       
     Conv2D-30        [[1, 64, 1, 1]]       [1, 16, 1, 1]          1,040     
     Conv2D-31        [[1, 16, 1, 1]]       [1, 64, 1, 1]          1,088     
    SEModule-5        [[1, 64, 4, 50]]      [1, 64, 4, 50]           0       
     Conv2D-32        [[1, 64, 4, 50]]      [1, 24, 4, 50]         1,536     
   BatchNorm-22       [[1, 24, 4, 50]]      [1, 24, 4, 50]          96       
  ConvBNLayer-22      [[1, 64, 4, 50]]      [1, 24, 4, 50]           0       
  ResidualUnit-7      [[1, 24, 4, 50]]      [1, 24, 4, 50]           0       
     Conv2D-33        [[1, 24, 4, 50]]      [1, 72, 4, 50]         1,728     
   BatchNorm-23       [[1, 72, 4, 50]]      [1, 72, 4, 50]          288      
  ConvBNLayer-23      [[1, 24, 4, 50]]      [1, 72, 4, 50]           0       
     Conv2D-34        [[1, 72, 4, 50]]      [1, 72, 4, 50]         1,800     
   BatchNorm-24       [[1, 72, 4, 50]]      [1, 72, 4, 50]          288      
  ConvBNLayer-24      [[1, 72, 4, 50]]      [1, 72, 4, 50]           0       
AdaptiveAvgPool2D-6   [[1, 72, 4, 50]]      [1, 72, 1, 1]            0       
     Conv2D-35        [[1, 72, 1, 1]]       [1, 18, 1, 1]          1,314     
     Conv2D-36        [[1, 18, 1, 1]]       [1, 72, 1, 1]          1,368     
    SEModule-6        [[1, 72, 4, 50]]      [1, 72, 4, 50]           0       
     Conv2D-37        [[1, 72, 4, 50]]      [1, 24, 4, 50]         1,728     
   BatchNorm-25       [[1, 24, 4, 50]]      [1, 24, 4, 50]          96       
  ConvBNLayer-25      [[1, 72, 4, 50]]      [1, 24, 4, 50]           0       
  ResidualUnit-8      [[1, 24, 4, 50]]      [1, 24, 4, 50]           0       
     Conv2D-38        [[1, 24, 4, 50]]     [1, 144, 4, 50]         3,456     
   BatchNorm-26      [[1, 144, 4, 50]]     [1, 144, 4, 50]          576      
  ConvBNLayer-26      [[1, 24, 4, 50]]     [1, 144, 4, 50]           0       
     Conv2D-39       [[1, 144, 4, 50]]     [1, 144, 2, 50]         3,600     
   BatchNorm-27      [[1, 144, 2, 50]]     [1, 144, 2, 50]          576      
  ConvBNLayer-27     [[1, 144, 4, 50]]     [1, 144, 2, 50]           0       
AdaptiveAvgPool2D-7  [[1, 144, 2, 50]]      [1, 144, 1, 1]           0       
     Conv2D-40        [[1, 144, 1, 1]]      [1, 36, 1, 1]          5,220     
     Conv2D-41        [[1, 36, 1, 1]]       [1, 144, 1, 1]         5,328     
    SEModule-7       [[1, 144, 2, 50]]     [1, 144, 2, 50]           0       
     Conv2D-42       [[1, 144, 2, 50]]      [1, 48, 2, 50]         6,912     
   BatchNorm-28       [[1, 48, 2, 50]]      [1, 48, 2, 50]          192      
  ConvBNLayer-28     [[1, 144, 2, 50]]      [1, 48, 2, 50]           0       
  ResidualUnit-9      [[1, 24, 4, 50]]      [1, 48, 2, 50]           0       
     Conv2D-43        [[1, 48, 2, 50]]     [1, 288, 2, 50]        13,824     
   BatchNorm-29      [[1, 288, 2, 50]]     [1, 288, 2, 50]         1,152     
  ConvBNLayer-29      [[1, 48, 2, 50]]     [1, 288, 2, 50]           0       
     Conv2D-44       [[1, 288, 2, 50]]     [1, 288, 2, 50]         7,200     
   BatchNorm-30      [[1, 288, 2, 50]]     [1, 288, 2, 50]         1,152     
  ConvBNLayer-30     [[1, 288, 2, 50]]     [1, 288, 2, 50]           0       
AdaptiveAvgPool2D-8  [[1, 288, 2, 50]]      [1, 288, 1, 1]           0       
     Conv2D-45        [[1, 288, 1, 1]]      [1, 72, 1, 1]         20,808     
     Conv2D-46        [[1, 72, 1, 1]]       [1, 288, 1, 1]        21,024     
    SEModule-8       [[1, 288, 2, 50]]     [1, 288, 2, 50]           0       
     Conv2D-47       [[1, 288, 2, 50]]      [1, 48, 2, 50]        13,824     
   BatchNorm-31       [[1, 48, 2, 50]]      [1, 48, 2, 50]          192      
  ConvBNLayer-31     [[1, 288, 2, 50]]      [1, 48, 2, 50]           0       
  ResidualUnit-10     [[1, 48, 2, 50]]      [1, 48, 2, 50]           0       
     Conv2D-48        [[1, 48, 2, 50]]     [1, 288, 2, 50]        13,824     
   BatchNorm-32      [[1, 288, 2, 50]]     [1, 288, 2, 50]         1,152     
  ConvBNLayer-32      [[1, 48, 2, 50]]     [1, 288, 2, 50]           0       
     Conv2D-49       [[1, 288, 2, 50]]     [1, 288, 2, 50]         7,200     
   BatchNorm-33      [[1, 288, 2, 50]]     [1, 288, 2, 50]         1,152     
  ConvBNLayer-33     [[1, 288, 2, 50]]     [1, 288, 2, 50]           0       
AdaptiveAvgPool2D-9  [[1, 288, 2, 50]]      [1, 288, 1, 1]           0       
     Conv2D-50        [[1, 288, 1, 1]]      [1, 72, 1, 1]         20,808     
     Conv2D-51        [[1, 72, 1, 1]]       [1, 288, 1, 1]        21,024     
    SEModule-9       [[1, 288, 2, 50]]     [1, 288, 2, 50]           0       
     Conv2D-52       [[1, 288, 2, 50]]      [1, 48, 2, 50]        13,824     
   BatchNorm-34       [[1, 48, 2, 50]]      [1, 48, 2, 50]          192      
  ConvBNLayer-34     [[1, 288, 2, 50]]      [1, 48, 2, 50]           0       
  ResidualUnit-11     [[1, 48, 2, 50]]      [1, 48, 2, 50]           0       
     Conv2D-53        [[1, 48, 2, 50]]     [1, 288, 2, 50]        13,824     
   BatchNorm-35      [[1, 288, 2, 50]]     [1, 288, 2, 50]         1,152     
  ConvBNLayer-35      [[1, 48, 2, 50]]     [1, 288, 2, 50]           0       
    MaxPool2D-1      [[1, 288, 2, 50]]     [1, 288, 1, 25]           0       
===============================================================================
Total params: 259,056
Trainable params: 246,736
Non-trainable params: 12,320
-------------------------------------------------------------------------------
Input size (MB): 0.04
Forward/backward pass size (MB): 13.88
Params size (MB): 0.99
Estimated Total Size (MB): 14.91
-------------------------------------------------------------------------------

[1, 288, 1, 25]

As the result shows, the final feature map has height 1, width 25, and 288 channels.

RNN Neck

The neck converts the visual feature map produced by the backbone into a 1-D sequence of vectors and feeds it into an LSTM network, outputting sequence features.

import paddle
import paddle.nn as nn
from backbone import MobileNetV3

class Im2Seq(nn.Layer):

    def __init__(self, in_channels, **kwargs):
        '''
        Convert image features into sequence features
        :param in_channels: number of input channels
        '''
        super(Im2Seq, self).__init__()
        self.out_channels = in_channels

    def forward(self, x):
        B, C, H, W = x.shape
        assert H == 1
        x = x.squeeze(axis=2)
        x = x.transpose([0, 2, 1])
        return x

class SequenceEncoder(nn.Layer):

    def __init__(self, in_channels, hidden_size=48, **kwargs):
        '''
        Sequence encoder
        :param in_channels: number of input channels
        :param hidden_size: hidden-layer size
        '''
        super(SequenceEncoder, self).__init__()
        self.encoder_reshape = Im2Seq(in_channels)

        self.encoder = EncoderWithRNN(self.encoder_reshape.out_channels, hidden_size)
        self.out_channels = self.encoder.out_channels

    def forward(self, x):
        x = self.encoder_reshape(x)
        x = self.encoder(x)
        return x


class EncoderWithRNN(nn.Layer):

    def __init__(self, in_channels, hidden_size):
        super(EncoderWithRNN, self).__init__()
        self.out_channels = hidden_size * 2
        # bidirectional LSTM
        self.lstm = nn.LSTM(in_channels, hidden_size, direction='bidirectional', num_layers=2)

    def forward(self, x):
        x, _ = self.lstm(x)
        return x

if __name__ == '__main__':

    IMAGE_SHAPE_C = 3
    IMAGE_SHAPE_H = 32
    IMAGE_SHAPE_W = 100

    img = paddle.rand((1, IMAGE_SHAPE_C, IMAGE_SHAPE_H, IMAGE_SHAPE_W), dtype='float32')
    net = MobileNetV3()
    out = net(img)

    neck = SequenceEncoder(in_channels=288)
    sequence = neck(out)
    print(sequence.shape)

Output:

[1, 25, 96]
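
For completeness, the head of a CTC-based CRNN is essentially a per-time-step linear projection of these sequence features onto the character set. Below is a minimal sketch, not PaddleOCR's actual head; the class count of 37 (36 characters plus one CTC blank) is an assumption:

import paddle
import paddle.nn as nn
import paddle.nn.functional as F

class CTCHead(nn.Layer):

    def __init__(self, in_channels, n_class=37):
        '''
        CTC head: project each time step onto the character classes
        :param in_channels: feature size per time step (96 from the neck above)
        :param n_class: number of characters plus one CTC blank (assumed)
        '''
        super(CTCHead, self).__init__()
        self.fc = nn.Linear(in_channels, n_class)

    def forward(self, x):
        # x: [B, T, C] sequence features from the neck
        predicts = self.fc(x)  # [B, T, n_class]
        return F.softmax(predicts, axis=2)

if __name__ == '__main__':

    head = CTCHead(in_channels=96)
    out = head(paddle.rand((1, 25, 96)))
    print(out.shape)  # [1, 25, 37]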

Model Inference and Deployment

Using a pretrained model

Go to the official PaddleOCR GitHub page and download the models.

Select the inference-model downloads (all three are needed), extract the downloaded .tar model files, and place them in the inference directory under the PaddleOCR root.

Copy predict_system.py from the /tools/infer directory into the PaddleOCR root and add the execution arguments

--image_dir=doc/imgs/00018069.jpg --det_model_dir=inference/ch_PP-OCRv3_det_infer/ --cls_model_dir=inference/ch_ppocr_mobile_v2.0_cls_infer/ --rec_model_dir=inference/ch_PP-OCRv3_rec_infer/ --use_angle_cls=True

The image we use here is the following.

After execution, an inference_results folder is created containing the images of the corresponding results.

Besides saving the prediction visualization above, we can also crop out the detected regions. Modify the execution arguments:

--image_dir=doc/imgs/00018069.jpg --det_model_dir=inference/ch_PP-OCRv3_det_infer/ --cls_model_dir=inference/ch_ppocr_mobile_v2.0_cls_infer/ --rec_model_dir=inference/ch_PP-OCRv3_rec_infer/ --use_angle_cls=True --save_crop_res=True

After running, an extra output folder appears containing the cropped images.

Converting a training model into an inference model

A model produced by training generally looks like the figure below.

We need to convert it into the format of the pretrained inference models downloaded earlier.

Copy tools/export_model.py into the PaddleOCR root directory and add the execution arguments

-c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.pretrained_model="./output/ch_PP-OCR_v3_rec/best_accuracy" Global.save_inference_dir="./inference/rec_ppocrv3/"

After execution, the following inference files appear; the one we will use is the model inside the Student folder.

Model inference

Copy predict_rec.py from the tools/infer directory into the PaddleOCR root and add the execution arguments

--image_dir="train_data/handwrite/HWDB2.0Test_images/104-P16_4.jpg" --rec_model_dir="./inference/rec_ppocrv3/Student"

Mobile deployment

Environment requirements

  1. gcc, g++ == 8.2.0
  2. CMake >= 3.10
  3. git, make, wget, python

Environment setup commands

Linux

sudo apt update
sudo apt-get install -y --no-install-recommends gcc g++ git make wget python unzip curl
wget -c https://mms-res.cdn.bcebos.com/cmake-3.10.3-Linux-x86_64.tar.gz
tar xzf cmake-3.10.3-Linux-x86_64.tar.gz
sudo mv cmake-3.10.3-Linux-x86_64 /opt/cmake-3.10
sudo ln -s /opt/cmake-3.10/bin/cmake /usr/bin/cmake
sudo ln -s /opt/cmake-3.10/bin/ccmake /usr/bin/ccmake

macOS

brew install curl gcc git make unzip wget
wget https://cmake.org/files/v3.10/cmake-3.10.2-Darwin-x86_64.tar.gz
tar -xzvf cmake-3.10.2-Darwin-x86_64.tar.gz
mv cmake-3.10.2-Darwin-x86_64 3.10.2
sudo ln -s /usr/local/Cellar/cmake/3.10.2/bin/cmake /usr/local/cmake

Build process

Linux

git clone https://github.com/PaddlePaddle/Paddle-Lite.git
cd Paddle-Lite/
git checkout develop
rm -rf third-party/
./lite/tools/build_linux.sh --arch=x86

macOS

git clone https://github.com/PaddlePaddle/Paddle-Lite.git
cd Paddle-Lite/
git checkout develop
rm -rf third-party/
./lite/tools/build_macos.sh x86
  • Getting the prediction library

git checkout release/v2.10

To install the Android NDK, see the model conversion and quantization-acceleration section of the deployment post; if the download link there doesn't work, you can download it from https://www.androiddevtools.cn/ instead.

On a Mac, simply download and extract it directly:

curl -O https://dl.google.com/android/repository/android-ndk-r17c-darwin-x86_64.zip

Once installed as above, add the following to /etc/profile. Here /Users/admin/Documents/android-ndk-r17c is my install location; change it to your actual one.

export ANDROID_NDK=/Users/admin/Documents/android-ndk-r17c
export PATH=$PATH:$ANDROID_NDK
export NDK_ROOT=$ANDROID_NDK

Then source /etc/profile.

Next install the JDK. Since logging in to Oracle's site is a bit of a hassle, here is a download link: https://pan.baidu.com/s/17XOussWis0w56kh4k0A_0Q (extraction code: 1ucb). Configure /etc/profile; here /dev/ndk/jdk1.8.0_161 is my install directory, so fill in your own:

export JAVA_HOME=/dev/ndk/jdk1.8.0_161
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
export JAVA_PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin
export PATH=$PATH:${JAVA_PATH}

Then source /etc/profile. Next, run the following in the Paddle-Lite directory.

Linux

./lite/tools/build_android.sh  --arch=armv8  --with_cv=ON --with_extra=ON

macOS

./lite/tools/build_android.sh

If the macOS build fails, prebuilt libraries can be downloaded directly from https://paddle-lite.readthedocs.io/zh/release-v2.10_a/quick_start/release_lib.html

The compiled prediction library sits under the Paddle-Lite folder at build.lite.android.armv8.gcc/inference_lite_lib.android.armv8; in my case that is

/home/user/Paddle-Lite/build.lite.android.armv8.gcc/inference_lite_lib.android.armv8

  • Model optimization

Install the Python version of Paddle-Lite:

pip install paddlelite==2.10 -i https://pypi.tuna.tsinghua.edu.cn/simple

We downloaded the three pretrained inference models earlier; now convert them into the .nb format that Paddle-Lite requires. Enter the inference directory under the PaddleOCR root and run the following commands:

paddle_lite_opt --model_file=./ch_PP-OCRv3_det_infer/inference.pdmodel --param_file=./ch_PP-OCRv3_det_infer/inference.pdiparams --optimize_out=./ch_PP-OCRv3_det_out --valid_targets=arm --optimize_out_type=naive_buffer
paddle_lite_opt --model_file=./ch_PP-OCRv3_rec_infer/inference.pdmodel --param_file=./ch_PP-OCRv3_rec_infer/inference.pdiparams --optimize_out=./ch_PP-OCRv3_rec_out --valid_targets=arm --optimize_out_type=naive_buffer
paddle_lite_opt --model_file=./ch_ppocr_mobile_v2.0_cls_infer/inference.pdmodel --param_file=./ch_ppocr_mobile_v2.0_cls_infer/inference.pdiparams --optimize_out=./ch_ppocr_mobile_v2.0_cls_out --valid_targets=arm --optimize_out_type=naive_buffer

This produces three .nb files.
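
The pip package also exposes the converter as a Python API, so the conversion can be scripted instead; a sketch using the Opt class, mirroring the det command above:

from paddlelite.lite import Opt

# same settings as the paddle_lite_opt command line for the det model
opt = Opt()
opt.set_model_file('./ch_PP-OCRv3_det_infer/inference.pdmodel')
opt.set_param_file('./ch_PP-OCRv3_det_infer/inference.pdiparams')
opt.set_valid_places('arm')
opt.set_model_type('naive_buffer')
opt.set_optimize_out('./ch_PP-OCRv3_det_out')
opt.run()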

  • Debugging on the phone

Prepare the prediction library files, the test image, and the dictionary file, and place them in the demo/cxx/ocr folder of the prediction library. First enter the deploy/lite folder under the PaddleOCR root and run

sh prepare.sh /home/user/Paddle-Lite/build.lite.android.armv8.gcc/inference_lite_lib.android.armv8

The argument to this command is the folder path of the compiled prediction library.

Enter the generated OCR demo directory:

cd /home/user/Paddle-Lite/build.lite.android.armv8.gcc/inference_lite_lib.android.armv8/demo/cxx/ocr/

Copy the C++ prediction shared library (.so) into the debug folder:

cp ../../../cxx/lib/libpaddle_light_api_shared.so ./debug/

Copy the three generated .nb files into the debug folder:

cp /home/user/下载/PaddleOCR-release-2.5/inference/*.nb ./debug/

The debug folder now contains the files shown below.

Here ppocr_keys_v1.txt is the Chinese dictionary. If the .nb model you use is for English/digits or another language, replace it with the dictionary for that language. PaddleOCR ships a number of dictionaries under ppocr/utils/, including:

dict/french_dict.txt     # French dictionary
dict/german_dict.txt     # German dictionary
ic15_dict.txt       # English dictionary
dict/japan_dict.txt      # Japanese dictionary
dict/korean_dict.txt     # Korean dictionary
ppocr_keys_v1.txt   # Chinese dictionary
...

config.txt contains the hyperparameters of the detector, classifier, and recognizer:

max_side_len  960         # if the longer side of the input image exceeds 960, scale it proportionally so the longest side is 960
det_db_thresh  0.3        # threshold for binarizing the DB prediction map; values in the 0.1-0.3 range have little visible effect on results
det_db_box_thresh  0.5    # post-processing threshold for filtering boxes; reduce it if the detector misses boxes
det_db_unclip_ratio  1.6  # how tightly the text box hugs the text; smaller values give tighter boxes
use_direction_classify  0  # whether to use the direction classifier: 0 = no, 1 = yes
rec_image_height  32      # input image height of the recognition model; set 48 for PP-OCRv3 and 32 for PP-OCRv2

Install AutoLog; run the following in the ocr folder:

git clone https://gitee.com/Double_V/AutoLog
cd AutoLog/
pip install -r requirements.txt
python setup.py bdist_wheel
pip install ./dist/auto_log-1.2.0-py3-none-any.whl

Go back to the ocr folder:

cd ..
make -j
# run make twice: the first run also fetches dependencies such as OpenCV
make -j
mv ocr_db_crnn ./debug/
adb push debug /data/local/tmp/
adb shell
cd /data/local/tmp/debug 
export LD_LIBRARY_PATH=${PWD}:$LD_LIBRARY_PATH
./ocr_db_crnn ch_PP-OCRv3_det_out.nb ch_PP-OCRv3_rec_out.nb ch_ppocr_mobile_v2.0_cls_out.nb ./11.jpg ppocr_keys_v1.txt

Paddle2ONNX

Environment preparation

python -m pip install paddle2onnx
python -m pip install onnxruntime==1.9.0
pip install onnx

Model conversion: use Paddle2ONNX to convert the Paddle models into ONNX models. From the PaddleOCR root directory:

paddle2onnx --model_dir ./inference/ch_PP-OCRv3_det_infer --model_filename inference.pdmodel --params_filename inference.pdiparams --save_file ./inference/det_onnx/model.onnx --opset_version 10 --input_shape_dict="{'x':[-1,3,-1,-1]}" --enable_onnx_checker True
paddle2onnx --model_dir ./inference/ch_PP-OCRv3_rec_infer --model_filename inference.pdmodel --params_filename inference.pdiparams --save_file ./inference/rec_onnx/model.onnx --opset_version 10 --input_shape_dict="{'x':[-1,3,-1,-1]}" --enable_onnx_checker True
paddle2onnx --model_dir ./inference/ch_ppocr_mobile_v2.0_cls_infer --model_filename inference.pdmodel --params_filename inference.pdiparams --save_file ./inference/cls_onnx/model.onnx --opset_version 10 --input_shape_dict="{'x':[-1,3,-1,-1]}" --enable_onnx_checker True

This generates the three .onnx files.
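
To sanity-check an exported model, you can load it with onnxruntime and feed it a dummy input; a minimal sketch (the 1x3x960x960 input is an arbitrary size permitted by the dynamic shape, not a required value):

import numpy as np
import onnxruntime as ort

# load the exported detection model and run a random input through it
sess = ort.InferenceSession('./inference/det_onnx/model.onnx')
input_name = sess.get_inputs()[0].name
dummy = np.random.rand(1, 3, 960, 960).astype('float32')
outputs = sess.run(None, {input_name: dummy})
print([o.shape for o in outputs])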

Note: for OCR models the conversion must use dynamic shapes, i.e. include the option --input_shape_dict="{'x': [-1, 3, -1, -1]}"; otherwise predictions may differ slightly from running them directly with Paddle. Also, the following models cannot currently be converted to ONNX: NRTR, SAR, RARE, SRN.

  • Inference

Run the predict_system.py that was copied into the PaddleOCR root earlier, adding the execution arguments

--use_gpu=False --use_onnx=True --det_model_dir=./inference/det_onnx/model.onnx --rec_model_dir=./inference/rec_onnx/model.onnx --cls_model_dir=./inference/cls_onnx/model.onnx --image_dir=./deploy/lite/imgs/lite_demo.png

The results are written to the inference_results directory.

Terminal output:

[2022/08/16 11:40:04] ppocr INFO: In PP-OCRv3, rec_image_shape parameter defaults to '3, 48, 320', if you are using recognition model with PP-OCRv2 or an older version, please set --rec_image_shape='3,32,320
[2022/08/16 11:40:04] ppocr DEBUG: dt_boxes num : 42, elapse : 0.022621631622314453
[2022/08/16 11:40:05] ppocr DEBUG: rec_res num  : 42, elapse : 0.7882494926452637
[2022/08/16 11:40:05] ppocr DEBUG: 0  Predict time of ./deploy/lite/imgs/lite_demo.png: 0.813s
[2022/08/16 11:40:05] ppocr DEBUG: visualizedimagesavedin./vis.jipg, 0.917
[2022/08/16 11:40:05] ppocr DEBUG: rhedetection, 0.882
[2022/08/16 11:40:05] ppocr DEBUG: 纯臻营养护发素0.993604, 0.914
[2022/08/16 11:40:05] ppocr DEBUG: 产品信息/参数, 0.873
[2022/08/16 11:40:05] ppocr DEBUG: 0.992728, 0.926
[2022/08/16 11:40:05] ppocr DEBUG: (45元/每公斤,100公斤起订), 0.926
[2022/08/16 11:40:05] ppocr DEBUG: 0.97417, 0.937
[2022/08/16 11:40:05] ppocr DEBUG: m, 0.799
[2022/08/16 11:40:05] ppocr DEBUG: 每瓶22元,1000瓶起订), 0.912
[2022/08/16 11:40:05] ppocr DEBUG: 0.993976, 0.909
[2022/08/16 11:40:05] ppocr DEBUG: 【品牌】:代加工方式/OEMODM, 0.896
[2022/08/16 11:40:05] ppocr DEBUG: 0.985133, 0.936
[2022/08/16 11:40:05] ppocr DEBUG: 【品名】:纯臻营养护发素, 0.986
[2022/08/16 11:40:05] ppocr DEBUG: 0.995007, 0.992
[2022/08/16 11:40:05] ppocr DEBUG: 6, 0.888
[2022/08/16 11:40:05] ppocr DEBUG: 【产品编号】:YM-X-30110.96899, 0.965
[2022/08/16 11:40:05] ppocr DEBUG: 【净含量】:220ml, 0.963
[2022/08/16 11:40:05] ppocr DEBUG: 0.996577, 0.928
[2022/08/16 11:40:05] ppocr DEBUG: 8, 0.997
[2022/08/16 11:40:05] ppocr DEBUG: 【适用人群】:适合所有肤质, 0.976
[2022/08/16 11:40:05] ppocr DEBUG: 0.995842, 0.908
[2022/08/16 11:40:05] ppocr DEBUG: 9, 0.874
[2022/08/16 11:40:05] ppocr DEBUG: 【主要成分】:鲸蜡硬脂醇、, 0.969
[2022/08/16 11:40:05] ppocr DEBUG: 蒸麦B-葡聚, 0.895
[2022/08/16 11:40:05] ppocr DEBUG: 0.961928, 0.995
[2022/08/16 11:40:05] ppocr DEBUG: 10, 0.994
[2022/08/16 11:40:05] ppocr DEBUG: 糖、椰油酰胺丙基甜菜碱、泛醒, 0.957
[2022/08/16 11:40:05] ppocr DEBUG: 0.925898, 0.976
[2022/08/16 11:40:05] ppocr DEBUG: 11, 0.998
[2022/08/16 11:40:05] ppocr DEBUG: (成品包材), 0.892
[2022/08/16 11:40:05] ppocr DEBUG: 0.972573, 0.843
[2022/08/16 11:40:05] ppocr DEBUG: 12, 0.999
[2022/08/16 11:40:05] ppocr DEBUG: 【主要功能】:可紧致头发磷层,从而达到, 0.949
[2022/08/16 11:40:05] ppocr DEBUG: 0.994448, 0.992
[2022/08/16 11:40:05] ppocr DEBUG: 13, 1.000
[2022/08/16 11:40:05] ppocr DEBUG: 即时持久改善头发光泽的效果,给干燥的头, 0.953
[2022/08/16 11:40:05] ppocr DEBUG: 0.990198, 0.963
[2022/08/16 11:40:05] ppocr DEBUG: 14, 1.000
[2022/08/16 11:40:05] ppocr DEBUG: 发足够的滋养, 0.966
[2022/08/16 11:40:05] ppocr DEBUG: 0.997668, 0.939
[2022/08/16 11:40:05] ppocr DEBUG: 花费了0.457335秒, 0.868
[2022/08/16 11:40:05] ppocr DEBUG: The visualized image saved in ./inference_results/lite_demo.png
[2022/08/16 11:40:05] ppocr INFO: The predict total time is 0.8480799198150635

Deploying on a (Huawei) NPU with Paddle-Lite

Enter the Paddle-Lite source root directory.

wget https://paddlelite-demo.bj.bcebos.com/devices/huawei/kirin/hiai_ddk_lib_510.tar.gz
tar -xzvf hiai_ddk_lib_510.tar.gz

Build:

rm -rf third-party/
./lite/tools/build_android.sh --toolchain=clang --android_stl=c++_shared --with_extra=ON --with_log=ON --with_nnadapter=ON --nnadapter_with_huawei_kirin_npu=ON --nnadapter_huawei_kirin_npu_sdk_root=$(pwd)/hiai_ddk_lib_510 full_publish
  • Building opt

./lite/tools/build.sh build_optimize_tool

The build output is at Paddle-Lite/build.opt/lite/api/opt, where opt is an executable file.

Set up the opt environment: open /etc/profile with vim and add the following:

export OPT=/home/user/Paddle-Lite/build.opt/lite/api
export PATH=$PATH:$OPT

source /etc/profile, enter the inference directory of PaddleOCR, and run:

opt --model_file=./ch_PP-OCRv3_det_infer/inference.pdmodel --param_file=./ch_PP-OCRv3_det_infer/inference.pdiparams --optimize_out=./ch_PP-OCRv3_det_npu_out --valid_targets=huawei_kirin_npu,arm --optimize_out_type=naive_buffer
opt --model_file=./ch_PP-OCRv3_rec_infer/inference.pdmodel --param_file=./ch_PP-OCRv3_rec_infer/inference.pdiparams --optimize_out=./ch_PP-OCRv3_rec_npu_out --valid_targets=huawei_kirin_npu,arm --optimize_out_type=naive_buffer
opt --model_file=./ch_ppocr_mobile_v2.0_cls_infer/inference.pdmodel --param_file=./ch_ppocr_mobile_v2.0_cls_infer/inference.pdiparams --optimize_out=./ch_PP-OCRv3_cls_npu_out --valid_targets=huawei_kirin_npu,arm --optimize_out_type=naive_buffer

This produces three .nb files.

Building the Android demo

The deploy directory of PaddleOCR contains an android_demo project; copy it out separately and open it with Android Studio.

Add the NDK path in local.properties; mine is

ndk.dir=/Users/admin/Documents/android-ndk-r17c

Add the NDK version number.

In the module-level build.gradle, set the NDK version and restrict the ABI to arm64-v8a:

ndkVersion '17.2.4988734'
android {
    compileSdk 29
    defaultConfig {
        applicationId "com.baidu.paddle.lite.demo.ocr"
        minSdkVersion 23
        targetSdkVersion 29
        versionCode 2
        versionName "2.0"
        testInstrumentationRunner "android.support.test.runner.AndroidJUnitRunner"
        ndk {
            abiFilters 'arm64-v8a'
        }
        externalNativeBuild {
            cmake {
                cppFlags "-std=c++11 -frtti -fexceptions -Wno-format"
                arguments '-DANDROID_PLATFORM=android-23', '-DANDROID_STL=c++_shared' ,"-DANDROID_ARM_NEON=TRUE"
            }
        }
    }
}

Modify app/OpenCV/sdk/jni/OpenCVConfig.cmake, setting

set(OpenCV_FOUND TRUE)

This was originally False. This step requires the C++ version of OpenCV to be installed; see the section on installing the C++ version of OpenCV in the deployment post.

  • Swapping in your own model

Simply replace the model files in the app/src/main/assets/models/ch_PP-OCRv2 directory with your own. They must be .nb model files produced by Paddle-Lite, renamed to the same names as the original files; nothing else needs to change.

  • Accelerating the Android app with the Huawei NPU

Assume the .nb files substituted above were converted for the Huawei NPU. Replace the library files with the NPU build; from the Paddle-Lite root directory:

cp -rf build.lite.android.armv8.clang/inference_lite_lib.android.armv8.nnadapter/cxx/include/* /Users/admin/Documents/android_demo/app/PaddleLite/cxx/include/

Data Synthesis Tool

Sometimes our data has a fairly uniform background and a single language, and we need to mix in more varied data so the model generalizes better.

Enter the styleText directory under the PaddleOCR root, then download and extract the models:

wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/style_text/style_text_models.zip
unzip style_text_models.zip

Now pick a style image to serve as the background of the OCR image; here we choose the following.

The corpus text we feed in is PaddleOCR. Add the execution parameters to synth_image.py in the styleText/tools directory:

-c configs/config.yml --style_image examples/style_images/2.jpg --text_corpus PaddleOCR --language en

and set the working directory to styleText.

Let's look at the final output.

In addition, the program generates and saves intermediate results. fake_bg.jpg is the background of the style image with its text removed;

fake_text.jpg is the supplied string rendered on a gray background, imitating the style of the text in the style image.

Batch synthesis

Download the style images from https://paddleocr.bj.bcebos.com/dygraph_v2.0/style_text/chkoen_5w.tar and place the downloaded archive in the styleText directory.

Modify styleText/configs/dataset_config.yml as follows:

Global:
  output_num: 10
  output_dir: output_data
  use_gpu: false
  image_height: 32
  image_width: 320
  standard_font: fonts/ch_standard.ttf
TextDrawer:
  fonts:
    en: fonts/en_standard.ttf
    ch: fonts/ch_standard.ttf
    ko: fonts/ko_standard.ttf
StyleSampler:
  method: DatasetSampler
  image_home: chkoen_5w
  label_file: chkoen_5w/label.txt
  with_label: false
CorpusGenerator:
  method: FileCorpus
  language: ch
  corpus_file: examples/corpus/example.txt
Predictor:
  method: StyleTextRecPredictor
  algorithm: StyleTextRec
  scale: 0.00392156862745098
  mean:
  - 0.5
  - 0.5
  - 0.5
  std:
  - 0.5
  - 0.5
  - 0.5
  expand_result: false
  bg_generator:
    pretrain: style_text_models/bg_generator
    module_name: bg_generator
    generator_type: BgGeneratorWithMask
    encode_dim: 64
    norm_layer: null
    conv_block_num: 4
    conv_block_dropout: false
    conv_block_dilation: true
    output_factor: 1.05
  text_generator:
    pretrain: style_text_models/text_generator
    module_name: text_generator
    generator_type: TextGenerator
    encode_dim: 64
    norm_layer: InstanceNorm2D
    conv_block_num: 4
    conv_block_dropout: false
    conv_block_dilation: true
  fusion_generator:
    pretrain: style_text_models/fusion_generator
    module_name: fusion_generator
    generator_type: FusionGeneratorSimple
    encode_dim: 64
    norm_layer: null
    conv_block_num: 4
    conv_block_dropout: false
    conv_block_dilation: true
Writer:
  method: SimpleWriter

