pip install "paddleocr>=2.0.1"
Recognizing text in an image
from paddleocr import PaddleOCR, draw_ocr
from PIL import Image
if __name__ == '__main__':
    ocr = PaddleOCR(use_angle_cls=True, lang='ch')
    img_path = 'demo/demo_kie.jpeg'
    result = ocr.ocr(img_path, cls=True)
    for line in result:
        print(line)
    image = Image.open(img_path).convert('RGB')
    boxes = [line[0] for line in result]
    txts = [line[1][0] for line in result]
    scores = [line[1][1] for line in result]
    im_show = draw_ocr(image, boxes, txts, scores, font_path='data/chineseocr/labels/font.TTF')
    im_show = Image.fromarray(im_show)
    im_show.save('output/result5.jpg')
The lang argument of PaddleOCR(use_angle_cls=True, lang='ch') supports many languages, e.g. `ch`, `en`, `fr`, `german`, `korean`, `japan`.
This runs both text detection and text recognition; typical results look like the following.
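Each element of result pairs the detected box with the recognized text and its confidence. A single entry might look roughly like this (the coordinates, text, and score below are illustrative, not actual output):

# one entry of `result`: [box, (text, score)]
# box: the four corner points of the detected text region
# (text, score): the recognized string and its confidence
[[[28.0, 37.0], [302.0, 39.0], [302.0, 72.0], [27.0, 70.0]], ('纯臻营养护发素', 0.96)]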
But if the image is just a simple piece of text, such as the one below, we only need recognition, not detection:
from paddleocr import PaddleOCR

if __name__ == '__main__':
    ocr = PaddleOCR(use_angle_cls=True, lang='en')
    img_path = 'demo/demo_text_recog.jpg'
    result = ocr.ocr(img_path, cls=True, det=False)
    for line in result:
        print(line)
Partial output:
('STAR', 0.8838256597518921)
PaddleOCR repository: GitHub - PaddlePaddle/PaddleOCR: Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
Model training
Here we again use the Kaggle captcha text-recognition task as the example. PaddleOCR's dataset format differs a little from MMOCR's: the training and test images must live in two separate folders, roughly like this.
Since all images were previously in one folder, here is a script to split them:
import shutil
if __name__ == '__main__':
    with open('data/toy_dataset/test_label.txt', 'r') as f:
        for line in f:
            # the first whitespace-separated token is the image file name
            filename = line.split()[0]
            shutil.move('data/toy_dataset/train/' + filename, 'data/toy_dataset/test/' + filename)
Also note that the fields of its label file are separated by a tab (\t), whereas MMOCR uses spaces; a small conversion sketch follows the sample below.
2wc38.png 2wc38
y5n6d.png y5n6d
men4f.png men4f
57b27.png 57b27
x3deb.png x3deb
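If your labels are still space-separated in the MMOCR style, a script along these lines rewrites them with tabs (a sketch; the paths follow this tutorial's layout):

# rewrite space-separated labels as filename<TAB>label
with open('data/toy_dataset/train_label.txt', 'r') as f:
    lines = [line.rstrip('\n').split(' ', 1) for line in f if line.strip()]
with open('data/toy_dataset/train_label.txt', 'w') as f:
    for filename, label in lines:
        f.write(filename + '\t' + label + '\n')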
We again use the SAR model for training; it is only one of the available recognition algorithms, and we take it as our example. Modify configs/rec/rec_r31_sar.yml under the PaddleOCR root; the changed parts are as follows:
Global:
  use_gpu: true
  epoch_num: 200
  log_smooth_window: 20
  print_batch_step: 20
  save_model_dir: ./sar_rec
  save_epoch_step: 30
  # evaluation is run every 1000 iterations
  eval_batch_step: [0, 1000]
  cal_metric_during_train: True
  pretrained_model:
  checkpoints:
  save_inference_dir:
  use_visualdl: False
  infer_img: data/toy_dataset/test/2en7g.png
  # for data or label process
  # character_dict_path: ppocr/utils/dict90.txt
  character_dict_path: ppocr/utils/en_dict.txt
  max_text_length: 30
  infer_mode: False
  use_space_char: False
  rm_symbol: True
  save_res_path: ./output/rec/predicts_sar.txt

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Piecewise
    decay_epochs: [3, 4]
    # values: [0.001, 0.0001, 0.00001]
    values: [0.001, 0.001, 0.001]
  regularizer:
    name: 'L2'
    factor: 0

Train:
  dataset:
    name: SimpleDataSet
    label_file_list: ["./data/toy_dataset/train_label.txt"]
    data_dir: ./data/toy_dataset/train/
    ratio_list: 1.0
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - SARLabelEncode: # Class handling label
      - SARRecResizeImg:
          image_shape: [3, 48, 48, 160] # h:48 w:[48,160]
          width_downsample_ratio: 0.25
      - KeepKeys:
          keep_keys: ['image', 'label', 'valid_ratio'] # dataloader will return list in this order
  loader:
    shuffle: True
    batch_size_per_card: 32
    drop_last: True
    num_workers: 8
    use_shared_memory: False

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./data/toy_dataset/test/
    label_file_list: ["./data/toy_dataset/test_label.txt"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - SARLabelEncode: # Class handling label
      - SARRecResizeImg:
          image_shape: [3, 48, 48, 160]
          width_downsample_ratio: 0.25
      - KeepKeys:
          keep_keys: ['image', 'label', 'valid_ratio'] # dataloader will return list in this order
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 32
    num_workers: 4
    use_shared_memory: False
The meaning of each field is documented in PaddleOCR/config.md at release/2.5 · PaddlePaddle/PaddleOCR · GitHub.
Copy train.py from the tools folder into the PaddleOCR root and add the parameter
--config=configs/rec/rec_r31_sar.yml
Run it to start training.
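Equivalently, from a shell in the PaddleOCR root:
python train.py --config=configs/rec/rec_r31_sar.yml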
Partial output:
[2022/08/08 16:31:30] ppocr INFO: epoch: [80/200], global_step: 1980, lr: 0.001000, acc: 0.593750, norm_edit_dis: 0.904241, loss: 0.240176, avg_reader_cost: 0.04908 s, avg_batch_cost: 0.08742 s, avg_samples: 8.0, ips: 91.51111 samples/s, eta: 0:09:36
[2022/08/08 16:31:33] ppocr INFO: epoch: [80/200], global_step: 2000, lr: 0.001000, acc: 0.593750, norm_edit_dis: 0.911458, loss: 0.244095, avg_reader_cost: 0.00006 s, avg_batch_cost: 0.15064 s, avg_samples: 32.0, ips: 212.42641 samples/s, eta: 0:09:31
eval model:: 100%|██████████| 9/9 [00:03<00:00, 2.34it/s]
[2022/08/08 16:31:37] ppocr INFO: cur metric, acc: 0.9888475468829909, norm_edit_dis: 0.9977695168115421, fps: 72.34870468201763
[2022/08/08 16:31:38] ppocr INFO: save best model is to ./sar_rec/best_accuracy
[2022/08/08 16:31:38] ppocr INFO: best metric, acc: 0.9888475468829909, norm_edit_dis: 0.9977695168115421, fps: 72.34870468201763, best_epoch: 80
Model evaluation
Copy tools/eval.py into the PaddleOCR root and add the parameters
-c configs/rec/rec_r31_sar.yml -o Global.checkpoints=sar_rec/best_accuracy Global.character_dict_path=ppocr/utils/en_dict.txt
Then run eval.py.
Output:
[2022/08/08 16:42:55] ppocr INFO: resume from sar_rec/best_accuracy
[2022/08/08 16:42:55] ppocr INFO: metric in ckpt ***************
[2022/08/08 16:42:55] ppocr INFO: acc:0.992565018863754
[2022/08/08 16:42:55] ppocr INFO: norm_edit_dis:0.9985130112076948
[2022/08/08 16:42:55] ppocr INFO: fps:72.37282298826126
[2022/08/08 16:42:55] ppocr INFO: best_epoch:200
[2022/08/08 16:42:55] ppocr INFO: start_epoch:201
eval model:: 100%|██████████| 9/9 [00:04<00:00, 1.81it/s]
[2022/08/08 16:43:00] ppocr INFO: metric eval ***************
[2022/08/08 16:43:00] ppocr INFO: acc:0.992565018863754
[2022/08/08 16:43:00] ppocr INFO: norm_edit_dis:0.9985130112076948
[2022/08/08 16:43:00] ppocr INFO: fps:55.0351114820872
Prediction
Copy tools/infer_rec.py into the PaddleOCR root and add the parameters
-c configs/rec/rec_r31_sar.yml -o Global.checkpoints=sar_rec/best_accuracy Global.character_dict_path=ppocr/utils/en_dict.txt
Run infer_rec.py.
Output:
[2022/08/08 16:45:17] ppocr INFO: resume from sar_rec/best_accuracy
[2022/08/08 16:45:17] ppocr INFO: infer_img: data/toy_dataset/test/2en7g.png
[2022/08/08 16:45:18] ppocr INFO: result: 2en7g 1.0
[2022/08/08 16:45:18] ppocr INFO: success!
CRNN text recognition
Besides SAR, CRNN is another network model for text recognition. Suppose the text regions have already been detected; what remains is to recognize the text.
RNN-based text recognition algorithms mainly come in two flavors:
- CNN+RNN+CTC(CRNN+CTC)
- CNN+Seq2Seq+Attention
We take the first one as our example. The basic CRNN network structure:
The input image has shape 32*100*3: height 32, width 100, 3 channels. The Convolutional Layers are an ordinary CNN that extracts feature maps from the image, of size 1*25*512. The Recurrent Layers are a deep bidirectional LSTM network that extracts text-sequence features on top of the convolutional features.
Since the CNN outputs a feature map of size (1, 25, 512), the maximum time length for the RNN is T=25 (25 time-step inputs, each input column vector x_t having dimension D=512). This differs from SAR: the feature map produced by SAR's CNN does not have height 1, and SAR feeds each column of the feature map into the subsequent RNN.
To feed the features into the Recurrent Layers, the following processing is applied:
- First, the image is scaled to 32×W×3 while keeping its aspect ratio (W is an arbitrary width).
- After the CNN this becomes 1×(W/4)×512.
- Setting T=(W/4) for the LSTM, the features can be fed straight into it.
So when preprocessing the input image, it is best to scale the height to 32 while preserving the aspect ratio, which disturbs the text details in the image as little as possible (you could instead resize to a fixed width, but distorting the text shape would certainly cost accuracy). To keep things simple, the code below pads to a fixed width.
import cv2
import math
import numpy as np
import matplotlib.pyplot as plt
def resize_norm_img(img):
    '''
    Resize and normalize an image
    :param img: input image
    :return: padded CHW tensor and an HWC copy for visualization
    '''
    # default input size
    img_c = 3
    img_h = 32
    img_w = 320
    # actual height and width of the image
    h, w = img.shape[:2]
    # actual aspect ratio
    ratio = w / float(h)
    # scale proportionally
    if math.ceil(img_h * ratio) > img_w:
        # wider than the default width: clamp to img_w
        resized_w = img_w
    else:
        # otherwise use the image's proportional width
        resized_w = int(math.ceil(img_h * ratio))
    # resize
    resized_image = cv2.resize(img, (resized_w, img_h))
    resized_image = resized_image.astype('float32')
    # normalize to [-1, 1]
    resized_image = resized_image.transpose((2, 0, 1)) / 255
    resized_image -= 0.5
    resized_image /= 0.5
    # zero-pad the positions beyond the resized width
    padding_im = np.zeros((img_c, img_h, img_w), dtype=np.float32)
    padding_im[:, :, 0:resized_w] = resized_image
    # transpose the padded image back for visualization
    draw_img = padding_im.transpose((1, 2, 0))
    return padding_im, draw_img


if __name__ == '__main__':
    raw_img = cv2.imread('word_1.png')
    plt.figure()
    plt.subplot(2, 1, 1)
    plt.imshow(raw_img)
    plt.show()
    padding_im, draw_img = resize_norm_img(raw_img)
    plt.subplot(2, 1, 1)
    plt.imshow(draw_img)
    plt.show()
Output:
CNN Backbone
Here we use MobileNet V3 as the backbone; for details on MobileNet V3, see the MobileNet V3 part of the post on improving and tuning deep network models.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
class ConvBNLayer(nn.Layer):
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding, groups=1,
                 if_act=True, act=None):
        '''
        Conv + BatchNorm layer
        :param in_channels: number of input channels
        :param out_channels: number of output channels
        :param kernel_size: convolution kernel size
        :param stride: stride
        :param padding: padding
        :param groups: number of groups of the 2-D convolution
        :param if_act: whether to apply an activation
        :param act: activation function
        '''
        super(ConvBNLayer, self).__init__()
        self.if_act = if_act
        self.act = act
        self.conv = nn.Conv2D(in_channels, out_channels, kernel_size, stride=stride, padding=padding,
                              groups=groups, bias_attr=False)
        self.bn = nn.BatchNorm(num_channels=out_channels, act=None)

    def forward(self, x):
        out = self.conv(x)
        out = self.bn(out)
        if self.if_act:
            if self.act == "relu":
                out = F.relu(out)
            elif self.act == "hardswish":
                out = F.hardswish(out)
            else:
                print("The activation function({}) is selected incorrectly".format(self.act))
                exit()
        return out


class SEModule(nn.Layer):
    def __init__(self, in_channels, reduction=4):
        '''
        Squeeze-and-Excitation module
        :param in_channels: number of input channels
        :param reduction: channel reduction ratio
        '''
        super(SEModule, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2D(1)
        self.conv1 = nn.Conv2D(in_channels, in_channels // reduction, 1, stride=1, padding=0)
        self.conv2 = nn.Conv2D(in_channels // reduction, in_channels, 1, stride=1, padding=0)

    def forward(self, x):
        out = self.avg_pool(x)
        out = self.conv1(out)
        out = F.relu(out)
        out = self.conv2(out)
        out = F.hardsigmoid(out, slope=0.2, offset=0.5)
        return x * out


class ResidualUnit(nn.Layer):
    def __init__(self, in_channels, mid_channels, out_channels, kernel_size, stride,
                 use_se, act=None):
        '''
        Residual block
        :param in_channels: number of input channels
        :param mid_channels: number of intermediate (expanded) channels
        :param out_channels: number of output channels
        :param kernel_size: convolution kernel size
        :param stride: stride
        :param use_se: whether to use the SE module
        :param act: activation function
        '''
        super(ResidualUnit, self).__init__()
        self.if_shortcut = stride == 1 and in_channels == out_channels
        self.if_se = use_se
        self.expand_conv = ConvBNLayer(in_channels, mid_channels, 1, 1, 0, if_act=True, act=act)
        self.bottleneck_conv = ConvBNLayer(mid_channels, mid_channels, kernel_size, stride,
                                           int((kernel_size - 1) // 2), groups=mid_channels,
                                           if_act=True, act=act)
        if self.if_se:
            self.mid_se = SEModule(mid_channels)
        self.linear_conv = ConvBNLayer(mid_channels, out_channels, 1, 1, 0, if_act=False, act=None)

    def forward(self, x):
        out = self.expand_conv(x)
        out = self.bottleneck_conv(out)
        if self.if_se:
            out = self.mid_se(out)
        out = self.linear_conv(out)
        if self.if_shortcut:
            out = paddle.add(x, out)
        return out


def make_divisible(v, divisor=8, min_value=None):
    '''
    Round channel counts so they are divisible by 8
    '''
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


class MobileNetV3(nn.Layer):
    def __init__(self, in_channels=3, model_name='small', scale=0.5, small_stride=None,
                 disable_se=False, **kwargs):
        super(MobileNetV3, self).__init__()
        self.disable_se = disable_se
        # strides apply to the height axis only; the width is preserved
        # until the final pooling so that it maps to time steps
        if small_stride is None:
            small_stride = [1, 2, 2, 2]
        if model_name == "small":
            # k, exp, c, se, nl, s
            cfg = [
                [3, 16, 16, True, 'relu', (small_stride[0], 1)],
                [3, 72, 24, False, 'relu', (small_stride[1], 1)],
                [3, 88, 24, False, 'relu', 1],
                [5, 96, 40, True, 'hardswish', (small_stride[2], 1)],
                [5, 240, 40, True, 'hardswish', 1],
                [5, 240, 40, True, 'hardswish', 1],
                [5, 120, 48, True, 'hardswish', 1],
                [5, 144, 48, True, 'hardswish', 1],
                [5, 288, 96, True, 'hardswish', (small_stride[3], 1)],
                [5, 576, 96, True, 'hardswish', 1],
                [5, 576, 96, True, 'hardswish', 1]
            ]
            cls_ch_squeeze = 576
        else:
            raise NotImplementedError("model[" + model_name + "_model] is not implemented!")
        supported_scale = [0.35, 0.5, 0.75, 1.0, 1.25]
        assert scale in supported_scale, "supported scales are {} but input scale is {}".format(
            supported_scale, scale)
        inplanes = 16
        self.conv1 = ConvBNLayer(in_channels, make_divisible(inplanes * scale), 3, 2, 1, groups=1,
                                 if_act=True, act='hardswish')
        block_list = []
        inplanes = make_divisible(inplanes * scale)
        for (k, exp, c, se, nl, s) in cfg:
            se = se and not self.disable_se
            block_list.append(
                ResidualUnit(inplanes, make_divisible(scale * exp), make_divisible(scale * c), k, s,
                             se, act=nl))
            inplanes = make_divisible(scale * c)
        self.blocks = nn.Sequential(*block_list)
        self.conv2 = ConvBNLayer(inplanes, make_divisible(scale * cls_ch_squeeze), 1, 1, 0,
                                 groups=1, if_act=True, act='hardswish')
        self.pool = nn.MaxPool2D(2, stride=2, padding=0)
        self.out_channels = make_divisible(scale * cls_ch_squeeze)

    def forward(self, x):
        out = self.conv1(x)
        out = self.blocks(out)
        out = self.conv2(out)
        out = self.pool(out)
        return out


if __name__ == '__main__':
    IMAGE_SHAPE_C = 3
    IMAGE_SHAPE_H = 32
    IMAGE_SHAPE_W = 100
    paddle.summary(MobileNetV3(), [(1, IMAGE_SHAPE_C, IMAGE_SHAPE_H, IMAGE_SHAPE_W)])
    img = paddle.rand((1, IMAGE_SHAPE_C, IMAGE_SHAPE_H, IMAGE_SHAPE_W), dtype='float32')
    net = MobileNetV3()
    out = net(img)
    print(out.shape)
Output:
-------------------------------------------------------------------------------
Layer (type) Input Shape Output Shape Param #
===============================================================================
Conv2D-1 [[1, 3, 32, 100]] [1, 8, 16, 50] 216
BatchNorm-1 [[1, 8, 16, 50]] [1, 8, 16, 50] 32
ConvBNLayer-1 [[1, 3, 32, 100]] [1, 8, 16, 50] 0
Conv2D-2 [[1, 8, 16, 50]] [1, 8, 16, 50] 64
BatchNorm-2 [[1, 8, 16, 50]] [1, 8, 16, 50] 32
ConvBNLayer-2 [[1, 8, 16, 50]] [1, 8, 16, 50] 0
Conv2D-3 [[1, 8, 16, 50]] [1, 8, 16, 50] 72
BatchNorm-3 [[1, 8, 16, 50]] [1, 8, 16, 50] 32
ConvBNLayer-3 [[1, 8, 16, 50]] [1, 8, 16, 50] 0
AdaptiveAvgPool2D-1 [[1, 8, 16, 50]] [1, 8, 1, 1] 0
Conv2D-4 [[1, 8, 1, 1]] [1, 2, 1, 1] 18
Conv2D-5 [[1, 2, 1, 1]] [1, 8, 1, 1] 24
SEModule-1 [[1, 8, 16, 50]] [1, 8, 16, 50] 0
Conv2D-6 [[1, 8, 16, 50]] [1, 8, 16, 50] 64
BatchNorm-4 [[1, 8, 16, 50]] [1, 8, 16, 50] 32
ConvBNLayer-4 [[1, 8, 16, 50]] [1, 8, 16, 50] 0
ResidualUnit-1 [[1, 8, 16, 50]] [1, 8, 16, 50] 0
Conv2D-7 [[1, 8, 16, 50]] [1, 40, 16, 50] 320
BatchNorm-5 [[1, 40, 16, 50]] [1, 40, 16, 50] 160
ConvBNLayer-5 [[1, 8, 16, 50]] [1, 40, 16, 50] 0
Conv2D-8 [[1, 40, 16, 50]] [1, 40, 8, 50] 360
BatchNorm-6 [[1, 40, 8, 50]] [1, 40, 8, 50] 160
ConvBNLayer-6 [[1, 40, 16, 50]] [1, 40, 8, 50] 0
Conv2D-9 [[1, 40, 8, 50]] [1, 16, 8, 50] 640
BatchNorm-7 [[1, 16, 8, 50]] [1, 16, 8, 50] 64
ConvBNLayer-7 [[1, 40, 8, 50]] [1, 16, 8, 50] 0
ResidualUnit-2 [[1, 8, 16, 50]] [1, 16, 8, 50] 0
Conv2D-10 [[1, 16, 8, 50]] [1, 48, 8, 50] 768
BatchNorm-8 [[1, 48, 8, 50]] [1, 48, 8, 50] 192
ConvBNLayer-8 [[1, 16, 8, 50]] [1, 48, 8, 50] 0
Conv2D-11 [[1, 48, 8, 50]] [1, 48, 8, 50] 432
BatchNorm-9 [[1, 48, 8, 50]] [1, 48, 8, 50] 192
ConvBNLayer-9 [[1, 48, 8, 50]] [1, 48, 8, 50] 0
Conv2D-12 [[1, 48, 8, 50]] [1, 16, 8, 50] 768
BatchNorm-10 [[1, 16, 8, 50]] [1, 16, 8, 50] 64
ConvBNLayer-10 [[1, 48, 8, 50]] [1, 16, 8, 50] 0
ResidualUnit-3 [[1, 16, 8, 50]] [1, 16, 8, 50] 0
Conv2D-13 [[1, 16, 8, 50]] [1, 48, 8, 50] 768
BatchNorm-11 [[1, 48, 8, 50]] [1, 48, 8, 50] 192
ConvBNLayer-11 [[1, 16, 8, 50]] [1, 48, 8, 50] 0
Conv2D-14 [[1, 48, 8, 50]] [1, 48, 4, 50] 1,200
BatchNorm-12 [[1, 48, 4, 50]] [1, 48, 4, 50] 192
ConvBNLayer-12 [[1, 48, 8, 50]] [1, 48, 4, 50] 0
AdaptiveAvgPool2D-2 [[1, 48, 4, 50]] [1, 48, 1, 1] 0
Conv2D-15 [[1, 48, 1, 1]] [1, 12, 1, 1] 588
Conv2D-16 [[1, 12, 1, 1]] [1, 48, 1, 1] 624
SEModule-2 [[1, 48, 4, 50]] [1, 48, 4, 50] 0
Conv2D-17 [[1, 48, 4, 50]] [1, 24, 4, 50] 1,152
BatchNorm-13 [[1, 24, 4, 50]] [1, 24, 4, 50] 96
ConvBNLayer-13 [[1, 48, 4, 50]] [1, 24, 4, 50] 0
ResidualUnit-4 [[1, 16, 8, 50]] [1, 24, 4, 50] 0
Conv2D-18 [[1, 24, 4, 50]] [1, 120, 4, 50] 2,880
BatchNorm-14 [[1, 120, 4, 50]] [1, 120, 4, 50] 480
ConvBNLayer-14 [[1, 24, 4, 50]] [1, 120, 4, 50] 0
Conv2D-19 [[1, 120, 4, 50]] [1, 120, 4, 50] 3,000
BatchNorm-15 [[1, 120, 4, 50]] [1, 120, 4, 50] 480
ConvBNLayer-15 [[1, 120, 4, 50]] [1, 120, 4, 50] 0
AdaptiveAvgPool2D-3 [[1, 120, 4, 50]] [1, 120, 1, 1] 0
Conv2D-20 [[1, 120, 1, 1]] [1, 30, 1, 1] 3,630
Conv2D-21 [[1, 30, 1, 1]] [1, 120, 1, 1] 3,720
SEModule-3 [[1, 120, 4, 50]] [1, 120, 4, 50] 0
Conv2D-22 [[1, 120, 4, 50]] [1, 24, 4, 50] 2,880
BatchNorm-16 [[1, 24, 4, 50]] [1, 24, 4, 50] 96
ConvBNLayer-16 [[1, 120, 4, 50]] [1, 24, 4, 50] 0
ResidualUnit-5 [[1, 24, 4, 50]] [1, 24, 4, 50] 0
Conv2D-23 [[1, 24, 4, 50]] [1, 120, 4, 50] 2,880
BatchNorm-17 [[1, 120, 4, 50]] [1, 120, 4, 50] 480
ConvBNLayer-17 [[1, 24, 4, 50]] [1, 120, 4, 50] 0
Conv2D-24 [[1, 120, 4, 50]] [1, 120, 4, 50] 3,000
BatchNorm-18 [[1, 120, 4, 50]] [1, 120, 4, 50] 480
ConvBNLayer-18 [[1, 120, 4, 50]] [1, 120, 4, 50] 0
AdaptiveAvgPool2D-4 [[1, 120, 4, 50]] [1, 120, 1, 1] 0
Conv2D-25 [[1, 120, 1, 1]] [1, 30, 1, 1] 3,630
Conv2D-26 [[1, 30, 1, 1]] [1, 120, 1, 1] 3,720
SEModule-4 [[1, 120, 4, 50]] [1, 120, 4, 50] 0
Conv2D-27 [[1, 120, 4, 50]] [1, 24, 4, 50] 2,880
BatchNorm-19 [[1, 24, 4, 50]] [1, 24, 4, 50] 96
ConvBNLayer-19 [[1, 120, 4, 50]] [1, 24, 4, 50] 0
ResidualUnit-6 [[1, 24, 4, 50]] [1, 24, 4, 50] 0
Conv2D-28 [[1, 24, 4, 50]] [1, 64, 4, 50] 1,536
BatchNorm-20 [[1, 64, 4, 50]] [1, 64, 4, 50] 256
ConvBNLayer-20 [[1, 24, 4, 50]] [1, 64, 4, 50] 0
Conv2D-29 [[1, 64, 4, 50]] [1, 64, 4, 50] 1,600
BatchNorm-21 [[1, 64, 4, 50]] [1, 64, 4, 50] 256
ConvBNLayer-21 [[1, 64, 4, 50]] [1, 64, 4, 50] 0
AdaptiveAvgPool2D-5 [[1, 64, 4, 50]] [1, 64, 1, 1] 0
Conv2D-30 [[1, 64, 1, 1]] [1, 16, 1, 1] 1,040
Conv2D-31 [[1, 16, 1, 1]] [1, 64, 1, 1] 1,088
SEModule-5 [[1, 64, 4, 50]] [1, 64, 4, 50] 0
Conv2D-32 [[1, 64, 4, 50]] [1, 24, 4, 50] 1,536
BatchNorm-22 [[1, 24, 4, 50]] [1, 24, 4, 50] 96
ConvBNLayer-22 [[1, 64, 4, 50]] [1, 24, 4, 50] 0
ResidualUnit-7 [[1, 24, 4, 50]] [1, 24, 4, 50] 0
Conv2D-33 [[1, 24, 4, 50]] [1, 72, 4, 50] 1,728
BatchNorm-23 [[1, 72, 4, 50]] [1, 72, 4, 50] 288
ConvBNLayer-23 [[1, 24, 4, 50]] [1, 72, 4, 50] 0
Conv2D-34 [[1, 72, 4, 50]] [1, 72, 4, 50] 1,800
BatchNorm-24 [[1, 72, 4, 50]] [1, 72, 4, 50] 288
ConvBNLayer-24 [[1, 72, 4, 50]] [1, 72, 4, 50] 0
AdaptiveAvgPool2D-6 [[1, 72, 4, 50]] [1, 72, 1, 1] 0
Conv2D-35 [[1, 72, 1, 1]] [1, 18, 1, 1] 1,314
Conv2D-36 [[1, 18, 1, 1]] [1, 72, 1, 1] 1,368
SEModule-6 [[1, 72, 4, 50]] [1, 72, 4, 50] 0
Conv2D-37 [[1, 72, 4, 50]] [1, 24, 4, 50] 1,728
BatchNorm-25 [[1, 24, 4, 50]] [1, 24, 4, 50] 96
ConvBNLayer-25 [[1, 72, 4, 50]] [1, 24, 4, 50] 0
ResidualUnit-8 [[1, 24, 4, 50]] [1, 24, 4, 50] 0
Conv2D-38 [[1, 24, 4, 50]] [1, 144, 4, 50] 3,456
BatchNorm-26 [[1, 144, 4, 50]] [1, 144, 4, 50] 576
ConvBNLayer-26 [[1, 24, 4, 50]] [1, 144, 4, 50] 0
Conv2D-39 [[1, 144, 4, 50]] [1, 144, 2, 50] 3,600
BatchNorm-27 [[1, 144, 2, 50]] [1, 144, 2, 50] 576
ConvBNLayer-27 [[1, 144, 4, 50]] [1, 144, 2, 50] 0
AdaptiveAvgPool2D-7 [[1, 144, 2, 50]] [1, 144, 1, 1] 0
Conv2D-40 [[1, 144, 1, 1]] [1, 36, 1, 1] 5,220
Conv2D-41 [[1, 36, 1, 1]] [1, 144, 1, 1] 5,328
SEModule-7 [[1, 144, 2, 50]] [1, 144, 2, 50] 0
Conv2D-42 [[1, 144, 2, 50]] [1, 48, 2, 50] 6,912
BatchNorm-28 [[1, 48, 2, 50]] [1, 48, 2, 50] 192
ConvBNLayer-28 [[1, 144, 2, 50]] [1, 48, 2, 50] 0
ResidualUnit-9 [[1, 24, 4, 50]] [1, 48, 2, 50] 0
Conv2D-43 [[1, 48, 2, 50]] [1, 288, 2, 50] 13,824
BatchNorm-29 [[1, 288, 2, 50]] [1, 288, 2, 50] 1,152
ConvBNLayer-29 [[1, 48, 2, 50]] [1, 288, 2, 50] 0
Conv2D-44 [[1, 288, 2, 50]] [1, 288, 2, 50] 7,200
BatchNorm-30 [[1, 288, 2, 50]] [1, 288, 2, 50] 1,152
ConvBNLayer-30 [[1, 288, 2, 50]] [1, 288, 2, 50] 0
AdaptiveAvgPool2D-8 [[1, 288, 2, 50]] [1, 288, 1, 1] 0
Conv2D-45 [[1, 288, 1, 1]] [1, 72, 1, 1] 20,808
Conv2D-46 [[1, 72, 1, 1]] [1, 288, 1, 1] 21,024
SEModule-8 [[1, 288, 2, 50]] [1, 288, 2, 50] 0
Conv2D-47 [[1, 288, 2, 50]] [1, 48, 2, 50] 13,824
BatchNorm-31 [[1, 48, 2, 50]] [1, 48, 2, 50] 192
ConvBNLayer-31 [[1, 288, 2, 50]] [1, 48, 2, 50] 0
ResidualUnit-10 [[1, 48, 2, 50]] [1, 48, 2, 50] 0
Conv2D-48 [[1, 48, 2, 50]] [1, 288, 2, 50] 13,824
BatchNorm-32 [[1, 288, 2, 50]] [1, 288, 2, 50] 1,152
ConvBNLayer-32 [[1, 48, 2, 50]] [1, 288, 2, 50] 0
Conv2D-49 [[1, 288, 2, 50]] [1, 288, 2, 50] 7,200
BatchNorm-33 [[1, 288, 2, 50]] [1, 288, 2, 50] 1,152
ConvBNLayer-33 [[1, 288, 2, 50]] [1, 288, 2, 50] 0
AdaptiveAvgPool2D-9 [[1, 288, 2, 50]] [1, 288, 1, 1] 0
Conv2D-50 [[1, 288, 1, 1]] [1, 72, 1, 1] 20,808
Conv2D-51 [[1, 72, 1, 1]] [1, 288, 1, 1] 21,024
SEModule-9 [[1, 288, 2, 50]] [1, 288, 2, 50] 0
Conv2D-52 [[1, 288, 2, 50]] [1, 48, 2, 50] 13,824
BatchNorm-34 [[1, 48, 2, 50]] [1, 48, 2, 50] 192
ConvBNLayer-34 [[1, 288, 2, 50]] [1, 48, 2, 50] 0
ResidualUnit-11 [[1, 48, 2, 50]] [1, 48, 2, 50] 0
Conv2D-53 [[1, 48, 2, 50]] [1, 288, 2, 50] 13,824
BatchNorm-35 [[1, 288, 2, 50]] [1, 288, 2, 50] 1,152
ConvBNLayer-35 [[1, 48, 2, 50]] [1, 288, 2, 50] 0
MaxPool2D-1 [[1, 288, 2, 50]] [1, 288, 1, 25] 0
===============================================================================
Total params: 259,056
Trainable params: 246,736
Non-trainable params: 12,320
-------------------------------------------------------------------------------
Input size (MB): 0.04
Forward/backward pass size (MB): 13.88
Params size (MB): 0.99
Estimated Total Size (MB): 14.91
-------------------------------------------------------------------------------
[1, 288, 1, 25]
As the output shows, the final feature map has height 1, width 25, and 288 channels.
RNN neck
The neck converts the visual feature map output by the backbone into a sequence of 1-D vectors fed into the LSTM network, which outputs sequence features.
import paddle
import paddle.nn as nn
from backbone import MobileNetV3
class Im2Seq(nn.Layer):
    def __init__(self, in_channels, **kwargs):
        '''
        Convert image features into sequence features
        :param in_channels: number of input channels
        '''
        super(Im2Seq, self).__init__()
        self.out_channels = in_channels

    def forward(self, x):
        B, C, H, W = x.shape
        assert H == 1
        x = x.squeeze(axis=2)
        x = x.transpose([0, 2, 1])  # NCW -> NWC: one feature vector per time step
        return x


class SequenceEncoder(nn.Layer):
    def __init__(self, in_channels, hidden_size=48, **kwargs):
        '''
        Sequence encoder
        :param in_channels: number of input channels
        :param hidden_size: LSTM hidden size
        '''
        super(SequenceEncoder, self).__init__()
        self.encoder_reshape = Im2Seq(in_channels)
        self.encoder = EncoderWithRNN(self.encoder_reshape.out_channels, hidden_size)
        self.out_channels = self.encoder.out_channels

    def forward(self, x):
        x = self.encoder_reshape(x)
        x = self.encoder(x)
        return x


class EncoderWithRNN(nn.Layer):
    def __init__(self, in_channels, hidden_size):
        super(EncoderWithRNN, self).__init__()
        self.out_channels = hidden_size * 2
        # bidirectional LSTM
        self.lstm = nn.LSTM(in_channels, hidden_size, direction='bidirectional', num_layers=2)

    def forward(self, x):
        x, _ = self.lstm(x)
        return x


if __name__ == '__main__':
    IMAGE_SHAPE_C = 3
    IMAGE_SHAPE_H = 32
    IMAGE_SHAPE_W = 100
    img = paddle.rand((1, IMAGE_SHAPE_C, IMAGE_SHAPE_H, IMAGE_SHAPE_W), dtype='float32')
    net = MobileNetV3()
    out = net(img)
    neck = SequenceEncoder(in_channels=288)
    sequence = neck(out)
    print(sequence.shape)
Output:
[1, 25, 96]
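To complete the CRNN pipeline, a head projects each of the 25 time steps onto the character set, and the result is decoded with CTC. PaddleOCR implements this as a CTC head; the sketch below is a simplified stand-in (the 37-way output, 36 characters plus one CTC blank class, is an assumption for illustration):

import paddle
import paddle.nn as nn
import paddle.nn.functional as F

class CTCHead(nn.Layer):
    def __init__(self, in_channels, out_channels):
        '''
        Project per-time-step sequence features onto the character set
        :param in_channels: feature size coming out of the neck (96 here)
        :param out_channels: alphabet size + 1 for the CTC blank
        '''
        super(CTCHead, self).__init__()
        self.fc = nn.Linear(in_channels, out_channels)

    def forward(self, x):
        # x: [batch, T, in_channels] -> [batch, T, out_channels]
        predicts = self.fc(x)
        if not self.training:
            # at inference time, per-time-step character probabilities
            predicts = F.softmax(predicts, axis=2)
        return predicts

if __name__ == '__main__':
    sequence = paddle.rand((1, 25, 96), dtype='float32')  # neck output shape
    head = CTCHead(in_channels=96, out_channels=37)
    print(head(sequence).shape)  # [1, 25, 37]

During training the [batch, T, num_classes] logits go into a CTC loss; at inference a greedy decode collapses repeated characters and removes blanks.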
Model inference and deployment
Using pretrained models
Go to the PaddleOCR GitHub page and download the models.
Choose the inference-model downloads (all three are needed), unpack the downloaded .tar files, and put them in the inference directory under the PaddleOCR root.
Copy predict_system.py from the tools/infer directory into the PaddleOCR root and add the execution parameters
--image_dir=doc/imgs/00018069.jpg --det_model_dir=inference/ch_PP-OCRv3_det_infer/ --cls_model_dir=inference/ch_ppocr_mobile_v2.0_cls_infer/ --rec_model_dir=inference/ch_PP-OCRv3_rec_infer/ --use_angle_cls=True
The image we use here is shown below.
After it finishes, an inference_results folder is created containing images of the results.
Besides saving the predictions above, we can also crop out the detected regions; change the execution parameters to
--image_dir=doc/imgs/00018069.jpg --det_model_dir=inference/ch_PP-OCRv3_det_infer/ --cls_model_dir=inference/ch_ppocr_mobile_v2.0_cls_infer/ --rec_model_dir=inference/ch_PP-OCRv3_rec_infer/ --use_angle_cls=True --save_crop_res=True
After running, an additional output folder appears containing the cropped images.
Converting a training model into an inference model
A model we train ourselves typically looks like the figure below.
We need to convert it into the same format as the pretrained inference models downloaded earlier.
Copy tools/export_model.py into the PaddleOCR root and add the execution parameters
-c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.pretrained_model="./output/ch_PP-OCR_v3_rec/best_accuracy" Global.save_inference_dir="./inference/rec_ppocrv3/"
Running it produces the inference files shown below; we will use the model inside the Student folder.
Model inference
Copy predict_rec.py from the tools/infer directory into the PaddleOCR root and add the execution parameters
--image_dir="train_data/handwrite/HWDB2.0Test_images/104-P16_4.jpg" --rec_model_dir="./inference/rec_ppocrv3/Student"
Mobile deployment
- Installing Paddle-Lite
Requirements:
- gcc, g++ == 8.2.0
- CMake >= 3.10
- git, make, wget, python
Setup commands
Linux:
sudo apt update
sudo apt-get install -y --no-install-recommends gcc g++ git make wget python unzip curl
wget -c https://mms-res.cdn.bcebos.com/cmake-3.10.3-Linux-x86_64.tar.gz
tar xzf cmake-3.10.3-Linux-x86_64.tar.gz
sudo mv cmake-3.10.3-Linux-x86_64 /opt/cmake-3.10
sudo ln -s /opt/cmake-3.10/bin/cmake /usr/bin/cmake
sudo ln -s /opt/cmake-3.10/bin/ccmake /usr/bin/ccmake
macOS:
brew install curl gcc git make unzip wget
wget https://cmake.org/files/v3.10/cmake-3.10.2-Darwin-x86_64.tar.gz
tar -xzvf cmake-3.10.2-Darwin-x86_64.tar.gz
mv cmake-3.10.2-Darwin-x86_64 3.10.2
sudo ln -s /usr/local/Cellar/cmake/3.10.2/bin/cmake /usr/local/cmake
Build
Linux:
git clone https://github.com/PaddlePaddle/Paddle-Lite.git
cd Paddle-Lite/
git checkout develop
rm -rf third-party/
./lite/tools/build_linux.sh --arch=x86
macOS:
git clone https://github.com/PaddlePaddle/Paddle-Lite.git
cd Paddle-Lite/
git checkout develop
rm -rf third-party/
./lite/tools/build_macos.sh x86
- Getting the prediction library
git checkout release/v2.10
Install the Android NDK; see the model conversion and quantization acceleration part of the model-deployment post. If the official download link does not work, you can download it from https://www.androiddevtools.cn/ instead.
On a Mac, just download and unzip directly:
curl -O https://dl.google.com/android/repository/android-ndk-r17c-darwin-x86_64.zip
After installing as above, add the following to /etc/profile. Here /Users/admin/Documents/android-ndk-r17c is my install location; change it to yours.
export ANDROID_NDK=/Users/admin/Documents/android-ndk-r17c
export PATH=$PATH:$ANDROID_NDK
export NDK_ROOT=$ANDROID_NDK
Then run source /etc/profile.
Next install the JDK. Since signing in to Oracle's site is a bit of a hassle, here is a JDK download link: https://pan.baidu.com/s/17XOussWis0w56kh4k0A_0Q
Extraction code: 1ucb. Configure /etc/profile; here /dev/ndk/jdk1.8.0_161 is my install directory, fill in your own.
export JAVA_HOME=/dev/ndk/jdk1.8.0_161
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
export JAVA_PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin
export PATH=$PATH:${JAVA_PATH}
Run source /etc/profile again, then execute the following in the Paddle-Lite directory.
Linux:
./lite/tools/build_android.sh --arch=armv8 --with_cv=ON --with_extra=ON
macOS:
./lite/tools/build_android.sh
If the macOS build fails, a prebuilt library can be downloaded directly from https://paddle-lite.readthedocs.io/zh/release-v2.10_a/quick_start/release_lib.html.
The compiled prediction library is under build.lite.android.armv8.gcc/inference_lite_lib.android.armv8 inside the Paddle-Lite folder; in my case that is
/home/user/Paddle-Lite/build.lite.android.armv8.gcc/inference_lite_lib.android.armv8
- Model optimization
Install the Python version of Paddle-Lite:
pip install paddlelite==2.10 -i https://pypi.tuna.tsinghua.edu.cn/simple
We downloaded the three pretrained inference models earlier; now we convert them into the .nb format that Paddle-Lite needs. Enter the inference directory under the PaddleOCR root and run the following commands:
paddle_lite_opt --model_file=./ch_PP-OCRv3_det_infer/inference.pdmodel --param_file=./ch_PP-OCRv3_det_infer/inference.pdiparams --optimize_out=./ch_PP-OCRv3_det_out --valid_targets=arm --optimize_out_type=naive_buffer
paddle_lite_opt --model_file=./ch_PP-OCRv3_rec_infer/inference.pdmodel --param_file=./ch_PP-OCRv3_rec_infer/inference.pdiparams --optimize_out=./ch_PP-OCRv3_rec_out --valid_targets=arm --optimize_out_type=naive_buffer
paddle_lite_opt --model_file=./ch_ppocr_mobile_v2.0_cls_infer/inference.pdmodel --param_file=./ch_ppocr_mobile_v2.0_cls_infer/inference.pdiparams --optimize_out=./ch_ppocr_mobile_v2.0_cls_out --valid_targets=arm --optimize_out_type=naive_buffer
This produces three .nb files.
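The same conversion can also be scripted through paddlelite's Python API; this is a sketch assuming the Opt class exposed by paddlelite==2.10 (one model shown, the other two are analogous):

# programmatic equivalent of the paddle_lite_opt command above
from paddlelite.lite import Opt

opt = Opt()
opt.set_model_file('./ch_PP-OCRv3_det_infer/inference.pdmodel')
opt.set_param_file('./ch_PP-OCRv3_det_infer/inference.pdiparams')
opt.set_valid_places('arm')
opt.set_model_type('naive_buffer')
opt.set_optimize_out('./ch_PP-OCRv3_det_out')
opt.run()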
- Debugging on the phone
Prepare the prediction library files, test image, and dictionary file, and place them in the demo/cxx/ocr folder of the prediction library. First enter the deploy/lite folder under the PaddleOCR root and run
sh prepare.sh /home/user/Paddle-Lite/build.lite.android.armv8.gcc/inference_lite_lib.android.armv8
The argument to this command is the folder of the compiled prediction library.
Enter the generated OCR demo directory:
cd /home/user/Paddle-Lite/build.lite.android.armv8.gcc/inference_lite_lib.android.armv8/demo/cxx/ocr/
Copy the C++ prediction shared library (.so) into the debug folder:
cp ../../../cxx/lib/libpaddle_light_api_shared.so ./debug/
Copy the three generated .nb files into the debug folder:
cp /home/user/下载/PaddleOCR-release-2.5/inference/*.nb ./debug/
At this point the debug folder looks like the figure below.
Here ppocr_keys_v1.txt is the Chinese dictionary file. If the .nb recognition model is for English/digits or another language, it must be replaced with the dictionary for that language. PaddleOCR ships several dictionaries under ppocr/utils/, including:
dict/french_dict.txt # French dictionary
dict/german_dict.txt # German dictionary
ic15_dict.txt # English dictionary
dict/japan_dict.txt # Japanese dictionary
dict/korean_dict.txt # Korean dictionary
ppocr_keys_v1.txt # Chinese dictionary
...
config.txt contains the hyperparameters of the detector, classifier, and recognizer:
max_side_len 960 # if the longer side of the input image exceeds 960, resize proportionally so the longest side is 960
det_db_thresh 0.3 # threshold for binarizing the DB prediction map; values up to 0.3 have little visible effect on results
det_db_box_thresh 0.5 # post-processing threshold for filtering boxes; lower it if text boxes are being missed
det_db_unclip_ratio 1.6 # how tight the text boxes are; smaller values hug the text more closely
use_direction_classify 0 # whether to use the direction classifier; 0 = no, 1 = yes
rec_image_height 32 # input image height of the recognition model; set to 48 for PP-OCRv3 and 32 for PP-OCRv2
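For quick experiments it can help to read these values from Python before pushing to the device; below is a minimal parser for this whitespace-separated key/value format (a hypothetical helper, not part of PaddleOCR; it assumes # starts a comment):

def load_ocr_config(path):
    # each line: "key value  # optional comment"
    config = {}
    with open(path, 'r') as f:
        for line in f:
            line = line.split('#', 1)[0].strip()  # drop comments
            if not line:
                continue
            key, value = line.split(None, 1)
            config[key] = value.strip()
    return config

print(load_ocr_config('debug/config.txt').get('rec_image_height'))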
Install AutoLog; run the following in the ocr folder:
git clone https://gitee.com/Double_V/AutoLog
cd AutoLog/
pip install -r requirements.txt
python setup.py bdist_wheel
pip install ./dist/auto_log-1.2.0-py3-none-any.whl
Go back to the ocr folder:
cd ..
make -j
mv ocr_db_crnn ./debug/
adb push debug /data/local/tmp/
adb shell
cd /data/local/tmp/debug
export LD_LIBRARY_PATH=${PWD}:$LD_LIBRARY_PATH
./ocr_db_crnn ch_PP-OCRv3_det_out.nb ch_PP-OCRv3_rec_out.nb ch_ppocr_mobile_v2.0_cls_out.nb ./11.jpg ppocr_keys_v1.txt
Paddle2ONNX
Environment setup
python -m pip install paddle2onnx
python -m pip install onnxruntime==1.9.0
pip install onnx
Model conversion: use Paddle2ONNX to convert the Paddle models into ONNX models. From the PaddleOCR root, run:
paddle2onnx --model_dir ./inference/ch_PP-OCRv3_det_infer --model_filename inference.pdmodel --params_filename inference.pdiparams --save_file ./inference/det_onnx/model.onnx --opset_version 10 --input_shape_dict="{'x':[-1,3,-1,-1]}" --enable_onnx_checker True
paddle2onnx --model_dir ./inference/ch_PP-OCRv3_rec_infer --model_filename inference.pdmodel --params_filename inference.pdiparams --save_file ./inference/rec_onnx/model.onnx --opset_version 10 --input_shape_dict="{'x':[-1,3,-1,-1]}" --enable_onnx_checker True
paddle2onnx --model_dir ./inference/ch_ppocr_mobile_v2.0_cls_infer --model_filename inference.pdmodel --params_filename inference.pdiparams --save_file ./inference/cls_onnx/model.onnx --opset_version 10 --input_shape_dict="{'x':[-1,3,-1,-1]}" --enable_onnx_checker True
This generates three ONNX files.
Note: for OCR models, the conversion must use dynamic shapes, i.e. keep the option --input_shape_dict="{'x': [-1, 3, -1, -1]}"; otherwise the predictions may differ slightly from running inference with Paddle directly. In addition, the following models cannot currently be converted to ONNX: NRTR, SAR, RARE, SRN.
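To sanity-check an exported model outside PaddleOCR's tooling, it can be opened directly with onnxruntime; a minimal sketch for the detection model, feeding a dummy NCHW input named 'x' (the input name from --input_shape_dict above; the 640x640 size is arbitrary since H and W are dynamic):

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession('./inference/det_onnx/model.onnx',
                            providers=['CPUExecutionProvider'])
dummy = np.random.rand(1, 3, 640, 640).astype('float32')
outputs = sess.run(None, {'x': dummy})
print([o.shape for o in outputs])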
- Inference
Run the predict_system.py that was copied into the PaddleOCR root earlier, with the execution parameters
--use_gpu=False --use_onnx=True --det_model_dir=./inference/det_onnx/model.onnx --rec_model_dir=./inference/rec_onnx/model.onnx --cls_model_dir=./inference/cls_onnx/model.onnx --image_dir=./deploy/lite/imgs/lite_demo.png
The results are written to the inference_results directory.
Terminal output:
[2022/08/16 11:40:04] ppocr INFO: In PP-OCRv3, rec_image_shape parameter defaults to '3, 48, 320', if you are using recognition model with PP-OCRv2 or an older version, please set --rec_image_shape='3,32,320
[2022/08/16 11:40:04] ppocr DEBUG: dt_boxes num : 42, elapse : 0.022621631622314453
[2022/08/16 11:40:05] ppocr DEBUG: rec_res num : 42, elapse : 0.7882494926452637
[2022/08/16 11:40:05] ppocr DEBUG: 0 Predict time of ./deploy/lite/imgs/lite_demo.png: 0.813s
[2022/08/16 11:40:05] ppocr DEBUG: visualizedimagesavedin./vis.jipg, 0.917
[2022/08/16 11:40:05] ppocr DEBUG: rhedetection, 0.882
[2022/08/16 11:40:05] ppocr DEBUG: 纯臻营养护发素0.993604, 0.914
[2022/08/16 11:40:05] ppocr DEBUG: 产品信息/参数, 0.873
[2022/08/16 11:40:05] ppocr DEBUG: 0.992728, 0.926
[2022/08/16 11:40:05] ppocr DEBUG: (45元/每公斤,100公斤起订), 0.926
[2022/08/16 11:40:05] ppocr DEBUG: 0.97417, 0.937
[2022/08/16 11:40:05] ppocr DEBUG: m, 0.799
[2022/08/16 11:40:05] ppocr DEBUG: 每瓶22元,1000瓶起订), 0.912
[2022/08/16 11:40:05] ppocr DEBUG: 0.993976, 0.909
[2022/08/16 11:40:05] ppocr DEBUG: 【品牌】:代加工方式/OEMODM, 0.896
[2022/08/16 11:40:05] ppocr DEBUG: 0.985133, 0.936
[2022/08/16 11:40:05] ppocr DEBUG: 【品名】:纯臻营养护发素, 0.986
[2022/08/16 11:40:05] ppocr DEBUG: 0.995007, 0.992
[2022/08/16 11:40:05] ppocr DEBUG: 6, 0.888
[2022/08/16 11:40:05] ppocr DEBUG: 【产品编号】:YM-X-30110.96899, 0.965
[2022/08/16 11:40:05] ppocr DEBUG: 【净含量】:220ml, 0.963
[2022/08/16 11:40:05] ppocr DEBUG: 0.996577, 0.928
[2022/08/16 11:40:05] ppocr DEBUG: 8, 0.997
[2022/08/16 11:40:05] ppocr DEBUG: 【适用人群】:适合所有肤质, 0.976
[2022/08/16 11:40:05] ppocr DEBUG: 0.995842, 0.908
[2022/08/16 11:40:05] ppocr DEBUG: 9, 0.874
[2022/08/16 11:40:05] ppocr DEBUG: 【主要成分】:鲸蜡硬脂醇、, 0.969
[2022/08/16 11:40:05] ppocr DEBUG: 蒸麦B-葡聚, 0.895
[2022/08/16 11:40:05] ppocr DEBUG: 0.961928, 0.995
[2022/08/16 11:40:05] ppocr DEBUG: 10, 0.994
[2022/08/16 11:40:05] ppocr DEBUG: 糖、椰油酰胺丙基甜菜碱、泛醒, 0.957
[2022/08/16 11:40:05] ppocr DEBUG: 0.925898, 0.976
[2022/08/16 11:40:05] ppocr DEBUG: 11, 0.998
[2022/08/16 11:40:05] ppocr DEBUG: (成品包材), 0.892
[2022/08/16 11:40:05] ppocr DEBUG: 0.972573, 0.843
[2022/08/16 11:40:05] ppocr DEBUG: 12, 0.999
[2022/08/16 11:40:05] ppocr DEBUG: 【主要功能】:可紧致头发磷层,从而达到, 0.949
[2022/08/16 11:40:05] ppocr DEBUG: 0.994448, 0.992
[2022/08/16 11:40:05] ppocr DEBUG: 13, 1.000
[2022/08/16 11:40:05] ppocr DEBUG: 即时持久改善头发光泽的效果,给干燥的头, 0.953
[2022/08/16 11:40:05] ppocr DEBUG: 0.990198, 0.963
[2022/08/16 11:40:05] ppocr DEBUG: 14, 1.000
[2022/08/16 11:40:05] ppocr DEBUG: 发足够的滋养, 0.966
[2022/08/16 11:40:05] ppocr DEBUG: 0.997668, 0.939
[2022/08/16 11:40:05] ppocr DEBUG: 花费了0.457335秒, 0.868
[2022/08/16 11:40:05] ppocr DEBUG: The visualized image saved in ./inference_results/lite_demo.png
[2022/08/16 11:40:05] ppocr INFO: The predict total time is 0.8480799198150635
Deploying with Paddle-Lite on a Huawei NPU
Enter the Paddle-Lite source root:
wget https://paddlelite-demo.bj.bcebos.com/devices/huawei/kirin/hiai_ddk_lib_510.tar.gz
tar -xzvf hiai_ddk_lib_510.tar.gz
Build:
rm -rf third-party/
./lite/tools/build_android.sh --toolchain=clang --android_stl=c++_shared --with_extra=ON --with_log=ON --with_nnadapter=ON --nnadapter_with_huawei_kirin_npu=ON --nnadapter_huawei_kirin_npu_sdk_root=$(pwd)/hiai_ddk_lib_510 full_publish
- Building the opt tool
./lite/tools/build.sh build_optimize_tool
The build output is located at Paddle-Lite/build.opt/lite/api/opt, where opt is an executable.
Set up the environment for opt: open /etc/profile with vim and add
export OPT=/home/user/Paddle-Lite/build.opt/lite/api
export PATH=$PATH:$OPT
Run source /etc/profile, then enter PaddleOCR's inference directory and execute
opt --model_file=./ch_PP-OCRv3_det_infer/inference.pdmodel --param_file=./ch_PP-OCRv3_det_infer/inference.pdiparams --optimize_out=./ch_PP-OCRv3_det_npu_out --valid_targets=huawei_kirin_npu,arm --optimize_out_type=naive_buffer
opt --model_file=./ch_PP-OCRv3_rec_infer/inference.pdmodel --param_file=./ch_PP-OCRv3_rec_infer/inference.pdiparams --optimize_out=./ch_PP-OCRv3_rec_npu_out --valid_targets=huawei_kirin_npu,arm --optimize_out_type=naive_buffer
opt --model_file=./ch_ppocr_mobile_v2.0_cls_infer/inference.pdmodel --param_file=./ch_ppocr_mobile_v2.0_cls_infer/inference.pdiparams --optimize_out=./ch_PP-OCRv3_cls_npu_out --valid_targets=huawei_kirin_npu,arm --optimize_out_type=naive_buffer
This again produces three .nb files.
Building the Android demo
The deploy directory of PaddleOCR contains an android_demo project; copy it out and open it in Android Studio.
Add the NDK path to local.properties; mine is
ndk.dir=/Users/admin/Documents/android-ndk-r17c
Add the NDK version number.
In the module-level build.gradle, set the NDK version and restrict the ABIs to arm64-v8a:
ndkVersion '17.2.4988734'
android {
    compileSdk 29
    defaultConfig {
        applicationId "com.baidu.paddle.lite.demo.ocr"
        minSdkVersion 23
        targetSdkVersion 29
        versionCode 2
        versionName "2.0"
        testInstrumentationRunner "android.support.test.runner.AndroidJUnitRunner"
        ndk {
            abiFilters 'arm64-v8a'
        }
        externalNativeBuild {
            cmake {
                cppFlags "-std=c++11 -frtti -fexceptions -Wno-format"
                arguments '-DANDROID_PLATFORM=android-23', '-DANDROID_STL=c++_shared', "-DANDROID_ARM_NEON=TRUE"
            }
        }
    }
}
Then modify app/OpenCV/sdk/jni/OpenCVConfig.cmake, setting
set(OpenCV_FOUND TRUE)
This was originally False. Doing so requires the C++ version of OpenCV to be installed; see "Installing the C++ version of OpenCV" in the model-deployment post.
- Swapping in your own models
Just replace the model files under app/src/main/assets/models/ch_PP-OCRv2 with your own. They must be .nb model files produced by Paddle-Lite, renamed to match the original file names; nothing else needs to change.
- Accelerating the Android app with the Huawei NPU
Assuming the models swapped in above are .nb files converted for the Huawei NPU, replace the library files with the NPU build; from the Paddle-Lite root:
cp -rf build.lite.android.armv8.clang/inference_lite_lib.android.armv8.nnadapter/cxx/include/* /Users/admin/Documents/android_demo/app/PaddleLite/cxx/include/
Data synthesis tool
Sometimes our data has a uniform background and a single language; blending in synthesized data gives the model better generalization.
Enter the styleText directory under the PaddleOCR root, then download and unzip the models:
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/style_text/style_text_models.zip
unzip style_text_models.zip
Now we pick a style image to serve as the background of the OCR image; here we choose the following.
The input corpus text is PaddleOCR. Add the following execution parameters to synth_image.py in the styleText/tools directory
-c configs/config.yml --style_image examples/style_images/2.jpg --text_corpus PaddleOCR --language en
and set the working directory to styleText.
Let's look at the final output.
Besides that, the program also generates and saves two intermediate results: fake_bg.jpg, the background of the style image with the text removed, and fake_text.jpg, the supplied string rendered on a gray background in the style of the text in the style image.
Batch synthesis
Download the style images from https://paddleocr.bj.bcebos.com/dygraph_v2.0/style_text/chkoen_5w.tar and put them in the styleText directory.
Edit styleText/configs/dataset_config.yml as follows:
Global:
  output_num: 10
  output_dir: output_data
  use_gpu: false
  image_height: 32
  image_width: 320
  standard_font: fonts/ch_standard.ttf
TextDrawer:
  fonts:
    en: fonts/en_standard.ttf
    ch: fonts/ch_standard.ttf
    ko: fonts/ko_standard.ttf
StyleSampler:
  method: DatasetSampler
  image_home: chkoen_5w
  label_file: chkoen_5w/label.txt
  with_label: false
CorpusGenerator:
  method: FileCorpus
  language: ch
  corpus_file: examples/corpus/example.txt
Predictor:
  method: StyleTextRecPredictor
  algorithm: StyleTextRec
  scale: 0.00392156862745098
  mean:
    - 0.5
    - 0.5
    - 0.5
  std:
    - 0.5
    - 0.5
    - 0.5
  expand_result: false
  bg_generator:
    pretrain: style_text_models/bg_generator
    module_name: bg_generator
    generator_type: BgGeneratorWithMask
    encode_dim: 64
    norm_layer: null
    conv_block_num: 4
    conv_block_dropout: false
    conv_block_dilation: true
    output_factor: 1.05
  text_generator:
    pretrain: style_text_models/text_generator
    module_name: text_generator
    generator_type: TextGenerator
    encode_dim: 64
    norm_layer: InstanceNorm2D
    conv_block_num: 4
    conv_block_dropout: false
    conv_block_dilation: true
  fusion_generator:
    pretrain: style_text_models/fusion_generator
    module_name: fusion_generator
    generator_type: FusionGeneratorSimple
    encode_dim: 64
    norm_layer: null
    conv_block_num: 4
    conv_block_dropout: false
    conv_block_dilation: true
Writer:
  method: SimpleWriter
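With the config in place, batch synthesis is started from the styleText directory via the bundled entry script (per the StyleText README):
python tools/synth_dataset.py -c configs/dataset_config.yml
The synthesized images and their labels are written to the output_dir configured above (output_data).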