Implementing the Classic Face Detection Algorithm PyramidBox with PaddlePaddle

2020/04/13 18:05

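1. Project Background

Face detection is a classic computer vision task. Detecting small, blurred, and occluded faces in uncontrolled scenes is the most challenging problem in this area. PyramidBox is a single-stage face detector based on SSD that uses contextual information to detect hard faces. The PyramidBox model shows robust detection performance on the example image below: the image contains one thousand faces, and the model detects 880 of them.

Figure: PyramidBox face detection performance.

Paper: https://arxiv.org/pdf/1803.07737.pdf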

Installation commands

## Install command for the CPU version
pip install -f https://paddlepaddle.org.cn/pip/oschina/cpu paddlepaddle

## Install command for the GPU version
pip install -f https://paddlepaddle.org.cn/pip/oschina/gpu paddlepaddle-gpu

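2. Dataset

This article uses the WIDER FACE dataset for model training and testing; the official website gives a detailed description of the data.

WIDER FACE contains 32,203 images with 393,703 labeled faces, which vary widely in scale, pose, and occlusion. The dataset is organized into 61 scene (event) categories; within each category, 40% of the images are randomly selected for training, 10% for validation, and 50% for testing.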
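
The per-category 40% / 10% / 50% split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_by_category`, the toy category names, and the file names are hypothetical, and the official split files shipped with the dataset should be used for real experiments.

```python
import random


def split_by_category(images_by_category, seed=0):
    """Split each category's images 40% / 10% / 50% into train / val / test."""
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for category, images in sorted(images_by_category.items()):
        shuffled = list(images)
        rng.shuffle(shuffled)
        # Integer arithmetic avoids floating-point rounding surprises.
        n_train = len(shuffled) * 4 // 10
        n_val = len(shuffled) // 10
        splits["train"] += shuffled[:n_train]
        splits["val"] += shuffled[n_train:n_train + n_val]
        splits["test"] += shuffled[n_train + n_val:]
    return splits


# Toy data: two hypothetical categories with 10 and 20 images.
toy = {
    "Parade": ["Parade_%d.jpg" % i for i in range(10)],
    "Sports_Fan": ["Sports_Fan_%d.jpg" % i for i in range(20)],
}
splits = split_by_category(toy)
print({name: len(items) for name, items in splits.items()})
```

Splitting inside each category, rather than over the whole pool, keeps every scene represented in all three subsets.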

The WIDER FACE dataset is available in AI Studio and can be added directly when creating a project. Extract it as follows:

In[3]
# Extract the annotation files
!unzip data/data4336/wider_face_split.zip -d data/data4336
# Training set
!unzip data/data4336/WIDER_train.zip -d data/data4336
# Validation set
!unzip data/data4336/WIDER_val.zip -d data/data4336
# Test set
!unzip data/data4336/WIDER_test.zip -d data/data4336
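
The `wider_face_split` archive holds the ground-truth lists such as `wider_face_train_bbx_gt.txt`, which the training script reads later. The sketch below shows one way such a file could be parsed. It assumes the commonly documented layout (an image path, a face count, then one row per face starting with `x y w h`) and assumes that an image with zero faces is still followed by one placeholder row. The function name and the sample rows are made up for illustration; the project's own `reader.py` remains the authoritative parser.

```python
def parse_wider_annotations(lines):
    """Parse ground-truth lines into {image_path: [(x, y, w, h), ...]}."""
    lines = [line for line in lines if line.strip()]
    annotations = {}
    i = 0
    while i < len(lines):
        image_path = lines[i].strip()
        num_faces = int(lines[i + 1])
        # Assumption: an image with 0 faces is still followed by one placeholder row.
        num_rows = max(num_faces, 1)
        rows = lines[i + 2:i + 2 + num_rows]
        boxes = []
        for row in rows[:num_faces]:
            x, y, w, h = [int(value) for value in row.split()[:4]]
            boxes.append((x, y, w, h))
        annotations[image_path] = boxes
        i += 2 + num_rows
    return annotations


# Made-up sample rows in the assumed layout.
sample = """0--Parade/sample_a.jpg
2
449 330 122 149 0 0 0 0 0 0
10 20 30 40 0 0 0 0 0 0
0--Parade/sample_b.jpg
0
0 0 0 0 0 0 0 0 0 0
""".splitlines()

parsed = parse_wider_annotations(sample)
for path, boxes in parsed.items():
    print(path, boxes)
```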
 

Once the data is ready, the data/data4336 directory looks like this:

 

3. Building the Network

Figure: network structure diagram.

For an explanation of the network structure, see this blog post: https://blog.csdn.net/Xingyb14/article/details/81253129#31__53

 

4. Model Training

4.1 Download the pretrained model

In[2]
# Download the pretrained model
!wget http://paddlemodels.bj.bcebos.com/vgg_ilsvrc_16_fc_reduced.tar.gz
!tar -xf vgg_ilsvrc_16_fc_reduced.tar.gz && rm -f vgg_ilsvrc_16_fc_reduced.tar.gz
 

4.2 Install the ujson library

In[4]
# Install the ujson library
!pip install ujson
Looking in indexes: https://pypi.mirrors.ustc.edu.cn/simple/
Collecting ujson
  Downloading https://mirrors.tuna.tsinghua.edu.cn/pypi/web/packages/16/c4/79f3409bc710559015464e5f49b9879430d8f87498ecdc335899732e5377/ujson-1.35.tar.gz (192kB)
    100% |████████████████████████████████| 194kB 7.7MB/s ta 0:00:011
Building wheels for collected packages: ujson
  Running setup.py bdist_wheel for ujson ... done
  Stored in directory: /home/aistudio/.cache/pip/wheels/a1/a1/04/335f51b9e097d459e05744a94ab146fc1f689fd9b78f0709f0
Successfully built ujson
Installing collected packages: ujson
Successfully installed ujson-1.35
 

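4.3 Training

  • To change the training hyperparameters, edit add_arg and train_parameters in the code below.
  • Training uses the GPU by default, and trained models are saved to the out folder by default.
  • Set export CUDA_VISIBLE_DEVICES=0,1,2,3 to choose which GPUs (and therefore how many) to use. The official default batch_size is 12 or 16 (this notebook uses 2).
  • Official reference training time (with the official default settings): the model converges after more than 150 epochs. With 4 Nvidia Tesla P40 GPUs in parallel and batch_size=16, each epoch takes about 40 minutes, for a total of roughly 100 hours.
  • Data augmentation used in training: data reading is defined in reader.py, and all images are resized to 640x640. During training, images are also augmented with random distortion, flipping, cropping, and so on, similar to the augmentation in the SSD object detection algorithm. In addition, Data-anchor-sampling is applied. Scale transformation (Data-anchor-sampling): the image is randomly rescaled to a size within a certain range, which greatly increases the scale variation of faces. Specifically, from the height and width of a randomly selected face, compute $v=\sqrt{width \times height}$ and determine which interval of the scale list $[16,32,64,128,256,512]$ contains $v$. For example, if $v=45$, then $32<v<64$, and one value from $[16,32,64]$ is picked with uniform probability. If $64$ is picked, the target size of that face is then drawn from the range $[64/2, \min(v \times 2, 64 \times 2)]$.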
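
The Data-anchor-sampling rule in the last bullet can be sketched as follows. This is a minimal reading of the description above, not the code in reader.py: the function names are hypothetical, the handling of faces smaller than 16 or larger than 512 pixels is an assumption, and so is the `max(...)` guard for very small faces.

```python
import math
import random

ANCHOR_SCALES = [16, 32, 64, 128, 256, 512]


def candidate_scales(face_size):
    """Return the anchor scales up to the upper end of the interval holding face_size."""
    for index, scale in enumerate(ANCHOR_SCALES):
        if face_size <= scale:
            return ANCHOR_SCALES[:index + 1]
    return list(ANCHOR_SCALES)  # assumption: faces above 512 may pick any scale


def sample_target_size(width, height, rng=random):
    """Draw a target face size following the Data-anchor-sampling description."""
    v = math.sqrt(width * height)
    chosen = rng.choice(candidate_scales(v))      # uniform pick among the candidates
    low = chosen / 2
    high = max(low, min(v * 2, chosen * 2))       # guard for very small faces (assumption)
    return v, rng.uniform(low, high)


v, target = sample_target_size(width=45, height=45, rng=random.Random(0))
scale_factor = target / v  # the whole image would be resized by this factor
print(candidate_scales(v), scale_factor)
```

For a 45x45 face the candidates are 16, 32, and 64, so the target size always falls between 8 and 90 pixels. Small faces are therefore produced more often than a plain random resize would give.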
In[2]
# Training code
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import shutil
import numpy as np
import time
import argparse
import functools


def set_paddle_flags(**kwargs):
    for key, value in kwargs.items():
        if os.environ.get(key, None) is None:
            os.environ[key] = str(value)


# These flags must be set before paddle is imported; otherwise they have no effect
set_paddle_flags(
    FLAGS_eager_delete_tensor_gb=0,  # enable garbage collection to save memory
)

import paddle
import paddle.fluid as fluid
from pyramidbox import PyramidBox
import reader
from utility import add_arguments, print_arguments, check_cuda
# Argument parsing. The original source uses it to accept command-line arguments; in this notebook, edit the defaults directly.
parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)


# Argument settings
# Whether to use multiple GPUs/threads
add_arg('parallel',         bool,  True,            "Whether use multi-GPU/threads or not.")
add_arg('learning_rate',    float, 0.001,           "The start learning rate.")
add_arg('batch_size',       int,   2,              "Minibatch size.")
add_arg('epoc_num',         int,   1,             "Epoch number.")
add_arg('use_gpu',          bool,  True,            "Whether use GPU.")
add_arg('use_pyramidbox',   bool,  True,            "Whether use PyramidBox model.")
# Where to save the model
add_arg('model_save_dir',   str,   'out',        "The path to save model.")
# Image height and width used during training
add_arg('resize_h',         int,   640,             "The resized image height.")
add_arg('resize_w',         int,   640,             "The resized image width.")
# Per-channel mean values (B, G, R) subtracted from the image
add_arg('mean_BGR',         str,   '104., 117., 123.', "Mean value for B,G,R channel which will be subtracted.")
# Location of the pretrained model
add_arg('pretrained_model', str,   './vgg_ilsvrc_16_fc_reduced/', "The init model path.")
# Location of the training data
add_arg('data_dir',         str,   'data/data4336',          "The base dir of dataset")
# Whether to use multiple processes for data preprocessing
add_arg('use_multiprocess', bool,  True,            "Whether use multi-process for data preprocessing.")
parser.add_argument('--enable_ce', action='store_true', help='If set, run the task with continuous evaluation logs.')
parser.add_argument('--batch_num', type=int, help="batch num for ce")
parser.add_argument('--num_devices', type=int, default=1, help='Number of GPU devices')

# Training parameter settings
train_parameters = {
    "train_images": 12880,
    "image_shape": [3, 640, 640],
    "class_num": 2,
    "batch_size": 16,
    "lr": 0.001,
    "lr_epochs": [99, 124, 149],
    "lr_decay": [1, 0.1, 0.01, 0.001],
    "epoc_num": 160,
    "optimizer_method": "momentum",
    "use_pyramidbox": True
}

# Optimizer settings
def optimizer_setting(train_params):
    batch_size = train_params["batch_size"]
    iters = train_params["train_images"] // batch_size
    lr = train_params["lr"]
    optimizer_method = train_params["optimizer_method"]
    boundaries = [i * iters for i in train_params["lr_epochs"]]
    values = [i * lr for i in train_params["lr_decay"]]

    if optimizer_method == "momentum":
        optimizer = fluid.optimizer.Momentum(
            learning_rate=fluid.layers.piecewise_decay(boundaries, values),
            momentum=0.9,
            regularization=fluid.regularizer.L2Decay(0.0005),
        )
    else:
        optimizer = fluid.optimizer.RMSProp(
            learning_rate=fluid.layers.piecewise_decay(boundaries, values),
            regularization=fluid.regularizer.L2Decay(0.0005),
        )
    return optimizer

def build_program(train_params, main_prog, startup_prog, args):
    use_pyramidbox = train_params["use_pyramidbox"]
    image_shape = train_params["image_shape"]
    class_num = train_params["class_num"]
    with fluid.program_guard(main_prog, startup_prog):
        py_reader = fluid.layers.py_reader(
            capacity=8,
            shapes=[[-1] + image_shape, [-1, 4], [-1, 4], [-1, 1]],
            lod_levels=[0, 1, 1, 1],
            dtypes=["float32", "float32", "float32", "int32"],
            use_double_buffer=True)
        with fluid.unique_name.guard():
            image, face_box, head_box, gt_label = fluid.layers.read_file(py_reader)
            fetches = []
            network = PyramidBox(image=image,
                                 face_box=face_box,
                                 head_box=head_box,
                                 gt_label=gt_label,
                                 sub_network=use_pyramidbox)
            if use_pyramidbox:
                face_loss, head_loss, loss = network.train()
                fetches = [face_loss, head_loss]
            else:
                loss = network.vgg_ssd_loss()
                fetches = [loss]
            optimizer = optimizer_setting(train_params)
            optimizer.minimize(loss)
    return py_reader, fetches, loss

# Training function
def train(args, config, train_params, train_file_list):
    batch_size = train_params["batch_size"]         # batch size
    epoc_num = train_params["epoc_num"]             # number of epochs
    optimizer_method = train_params["optimizer_method"]     # optimization method
    use_pyramidbox = train_params["use_pyramidbox"]         # whether to use the PyramidBox model

    use_gpu = args.use_gpu              # whether to use the GPU
    model_save_dir = args.model_save_dir  
    pretrained_model = args.pretrained_model
    
    devices = os.getenv("CUDA_VISIBLE_DEVICES") or ""       # visible GPU devices
    devices_num = len(devices.split(","))                   # number of GPUs
    # Derived quantities
    batch_size_per_device = batch_size // devices_num
    iters_per_epoc = train_params["train_images"] // batch_size
    num_workers = 8
    is_shuffle = True

    startup_prog = fluid.Program()
    train_prog = fluid.Program()

    #only for ce
    if args.enable_ce:
        SEED = 102
        startup_prog.random_seed = SEED
        train_prog.random_seed = SEED
        num_workers = 1
        pretrained_model = ""
        if args.batch_num != None:
            iters_per_epoc = args.batch_num

    train_py_reader, fetches, loss = build_program(
        train_params = train_params,
        main_prog = train_prog,
        startup_prog = startup_prog,
        args=args)
    
    # Use GPU or CPU
    place = fluid.CUDAPlace(0) if use_gpu else fluid.CPUPlace()
    exe = fluid.Executor(place)
    exe.run(startup_prog)

    start_epoc = 0
    # Load the pretrained model
    if pretrained_model:
        if pretrained_model.isdigit():
            start_epoc = int(pretrained_model) + 1
            pretrained_model = os.path.join(model_save_dir, pretrained_model)
            print("Resume from %s " %(pretrained_model))

        if not os.path.exists(pretrained_model):
            raise ValueError("The pre-trained model path [%s] does not exist." %
                             (pretrained_model))
        def if_exist(var):
            return os.path.exists(os.path.join(pretrained_model, var.name))
        fluid.io.load_vars(
            exe, pretrained_model, main_program=train_prog, predicate=if_exist)
    
    # Start training: set up the data reader
    train_reader = reader.train(config,
                                train_file_list,
                                batch_size_per_device,
                                shuffle = is_shuffle,
                                use_multiprocess=args.use_multiprocess,
                                num_workers=num_workers)
    train_py_reader.decorate_paddle_reader(train_reader)

    # Multi-GPU
    if args.parallel:
        train_exe = fluid.ParallelExecutor(
            main_program = train_prog,
            use_cuda=use_gpu,
            loss_name=loss.name)

    # Save the model
    def save_model(postfix, program):
        model_path = os.path.join(model_save_dir, postfix)
        if os.path.isdir(model_path):
            shutil.rmtree(model_path)

        print('save models to %s' % (model_path))
        fluid.io.save_persistables(exe, model_path, main_program=program)

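    # Train batch by batch and print the results every ten batches.
    # Each printed line contains:
    #   the epoch (pass) index,
    #   the batch index,
    #   the mean face and head losses,
    #   the time per batch.
    # The total time is accumulated separately.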
    total_time = 0.0
    epoch_idx = 0
    face_loss = 0
    head_loss = 0
    for pass_id in range(start_epoc, epoc_num):
        epoch_idx += 1
        start_time = time.time()
        prev_start_time = start_time
        end_time = 0
        batch_id = 0
        train_py_reader.start()
        while True:
            try:
                prev_start_time = start_time
                start_time = time.time()
                if args.parallel:
                    fetch_vars = train_exe.run(fetch_list=
                        [v.name for v in fetches])
                else:
                    fetch_vars = exe.run(train_prog, fetch_list=fetches)
                end_time = time.time()
                fetch_vars = [np.mean(np.array(v)) for v in fetch_vars]
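                face_loss = fetch_vars[0]
                # The plain VGG-SSD branch fetches a single loss, so there is no head loss
                head_loss = fetch_vars[1] if len(fetch_vars) > 1 else 0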
                if batch_id % 10 == 0:
                    if not args.use_pyramidbox:
                        print("Pass {:d}, batch {:d}, loss {:.6f}, time {:.5f}".format(
                            pass_id, batch_id, face_loss,
                            start_time - prev_start_time))
                    else:
                        print("Pass {:d}, batch {:d}, face loss {:.6f}, " \
                              "head loss {:.6f}, " \
                              "time {:.5f}".format(pass_id,
                               batch_id, face_loss, head_loss,
                               start_time - prev_start_time))
                batch_id += 1
            except (fluid.core.EOFException, StopIteration):
                train_py_reader.reset()
                break
        epoch_end_time = time.time()
        total_time += epoch_end_time - start_time
        save_model(str(pass_id), train_prog)

    # only for ce
    if args.enable_ce:
        gpu_num = get_cards(args)
        print("kpis\teach_pass_duration_card%s\t%s" %
                (gpu_num, total_time / epoch_idx))
        print("kpis\ttrain_face_loss_card%s\t%s" %
                (gpu_num, face_loss))
        print("kpis\ttrain_head_loss_card%s\t%s" %
                (gpu_num, head_loss))

def get_cards(args):
    if args.enable_ce:
        cards = os.environ.get('CUDA_VISIBLE_DEVICES', '')  # fall back to '' when the variable is unset
        num = len(cards.split(","))
        return num
    else:
        return args.num_devices


if __name__ == '__main__':
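    # Load the hyperparameters
    args = parser.parse_args(args=[])   # pass an empty list: in a notebook, sys.argv holds the kernel's own arguments, which argparse would reject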
    print_arguments(args)
    check_cuda(args.use_gpu)
    
    mean_BGR = [float(m) for m in args.mean_BGR.split(",")]      # per-channel mean values (B, G, R)
    image_shape = [3, int(args.resize_h), int(args.resize_w)]    # image shape
    train_parameters["image_shape"] = image_shape
    train_parameters["use_pyramidbox"] = args.use_pyramidbox
    train_parameters["batch_size"] = args.batch_size
    train_parameters["lr"] = args.learning_rate
    train_parameters["epoc_num"] = args.epoc_num
    
    # Data loading
    data_dir = os.path.join(args.data_dir, 'WIDER_train/images/')
    train_file_list = os.path.join(args.data_dir,
        'wider_face_split/wider_face_train_bbx_gt.txt')

    # Training configuration
    config = reader.Settings(
        data_dir=data_dir,
        resize_h=image_shape[1],
        resize_w=image_shape[2],
        apply_distort=True,
        apply_expand=False,
        mean_value=mean_BGR,
        ap_version='11point')
    print('开始训练')
    train(args, config, train_parameters, train_file_list)
    print('finish')
-----------  Configuration Arguments -----------
batch_num: None
batch_size: 2
data_dir: data/data4336
enable_ce: False
epoc_num: 1
learning_rate: 0.001
mean_BGR: 104., 117., 123.
model_save_dir: out
num_devices: 1
parallel: True
pretrained_model: ./vgg_ilsvrc_16_fc_reduced/
resize_h: 640
resize_w: 640
use_gpu: True
use_multiprocess: True
use_pyramidbox: True
------------------------------------------------
开始训练
WARNING:root:
     You can try our memory optimize feature to save your memory usage:
         # create a build_strategy variable to set memory optimize option
         build_strategy = compiler.BuildStrategy()
         build_strategy.enable_inplace = True
         build_strategy.memory_optimize = True
         
         # pass the build_strategy to with_data_parallel API
         compiled_prog = compiler.CompiledProgram(main).with_data_parallel(
             loss_name=loss.name, build_strategy=build_strategy)
      
     !!! Memory optimize is our experimental feature !!!
         some variables may be removed/reused internal to save memory usage, 
         in order to fetch the right value of the fetch_list, please set the 
         persistable property to true for each variable in fetch_list

         # Sample
         conv1 = fluid.layers.conv2d(data, 4, 5, 1, act=None) 
         # if you need to fetch conv1, then:
         conv1.persistable = True
Pass 0, batch 0, face loss 9.779743, head loss 8.900367, time 0.00051
Pass 0, batch 10, face loss 7.977530, head loss 11.644043, time 0.31490
Pass 0, batch 20, face loss 8.761090, head loss 6.781384, time 0.31145
Pass 0, batch 30, face loss 6.564213, head loss 7.319831, time 0.29078
Pass 0, batch 40, face loss 7.336867, head loss 6.687093, time 0.49785
Pass 0, batch 50, face loss 11.479609, head loss 7.921889, time 0.29177
Pass 0, batch 60, face loss 9.714411, head loss 9.740385, time 0.40002
Pass 0, batch 70, face loss 8.429513, head loss 7.406289, time 0.40068
Pass 0, batch 80, face loss 7.554870, head loss 5.238661, time 0.38369
Pass 0, batch 90, face loss 6.647611, head loss 7.927995, time 0.30237
Pass 0, batch 100, face loss 7.626194, head loss 7.476327, time 0.30053
Pass 0, batch 110, face loss 6.561839, head loss 5.359453, time 0.27778
Pass 0, batch 120, face loss 8.526546, head loss 8.009300, time 0.28608
Pass 0, batch 130, face loss 7.376438, head loss 7.418943, time 0.68068
Pass 0, batch 140, face loss 7.876499, head loss 7.551691, time 0.30833
Pass 0, batch 150, face loss 7.038351, head loss 6.874320, time 0.27617
Pass 0, batch 160, face loss 5.992482, head loss 6.690054, time 0.27540
Pass 0, batch 170, face loss 7.605720, head loss 7.591462, time 0.27874
Pass 0, batch 180, face loss 7.876195, head loss 7.656796, time 0.27855
Pass 0, batch 190, face loss 8.026948, head loss 9.820858, time 0.27617
Pass 0, batch 200, face loss 6.139144, head loss 5.475175, time 0.77825
Pass 0, batch 210, face loss 9.695359, head loss 7.184970, time 0.27668
Pass 0, batch 220, face loss 11.177995, head loss 6.948171, time 0.27677
Pass 0, batch 230, face loss 7.307765, head loss 6.161407, time 0.27716
Pass 0, batch 240, face loss 8.474988, head loss 7.739111, time 0.30256
Pass 0, batch 250, face loss 8.438314, head loss 7.262356, time 0.29460
Pass 0, batch 260, face loss 5.441424, head loss 5.481203, time 0.27773
Pass 0, batch 270, face loss 8.069559, head loss 6.246784, time 0.27706
Pass 0, batch 280, face loss 6.295203, head loss 5.652632, time 0.48682
Pass 0, batch 290, face loss 5.960318, head loss 6.215240, time 0.28262
Pass 0, batch 300, face loss 12.801303, head loss 7.689561, time 0.27648
Pass 0, batch 310, face loss 5.958226, head loss 5.518826, time 0.28061
Pass 0, batch 320, face loss 7.571748, head loss 6.239631, time 0.27963
Pass 0, batch 330, face loss 7.988011, head loss 6.390769, time 0.27753
Pass 0, batch 340, face loss 7.835019, head loss 6.609488, time 0.27996
Pass 0, batch 350, face loss 6.619651, head loss 6.107749, time 0.28046
Pass 0, batch 360, face loss 8.492857, head loss 6.346087, time 0.27744
Pass 0, batch 370, face loss 6.896113, head loss 7.644141, time 0.27724
Pass 0, batch 380, face loss 7.813636, head loss 5.252610, time 0.32740
Pass 0, batch 390, face loss 6.630651, head loss 5.668124, time 0.28359
Pass 0, batch 400, face loss 5.923609, head loss 6.352117, time 0.27878
Pass 0, batch 410, face loss 7.661426, head loss 6.187863, time 0.27616
Pass 0, batch 420, face loss 6.323527, head loss 4.648950, time 0.28002
Pass 0, batch 430, face loss 7.008544, head loss 6.418988, time 0.30782
Pass 0, batch 440, face loss 4.369812, head loss 4.931898, time 0.28045
Pass 0, batch 450, face loss 10.715254, head loss 7.752547, time 0.29711
Pass 0, batch 460, face loss 7.980027, head loss 8.120670, time 0.28499
Pass 0, batch 470, face loss 7.023969, head loss 5.038123, time 0.27471
Pass 0, batch 480, face loss 4.648422, head loss 5.033396, time 0.28636
Pass 0, batch 490, face loss 5.617349, head loss 4.613503, time 0.27748
Pass 0, batch 500, face loss 6.688055, head loss 6.495238, time 0.27893
Pass 0, batch 510, face loss 4.241967, head loss 4.302329, time 0.28027
Pass 0, batch 520, face loss 7.575843, head loss 6.653645, time 0.29019
Pass 0, batch 530, face loss 6.486510, head loss 6.643053, time 0.27945
Pass 0, batch 540, face loss 7.088201, head loss 5.747246, time 0.27881
Pass 0, batch 550, face loss 7.135513, head loss 6.466057, time 0.27966
Pass 0, batch 560, face loss 5.067999, head loss 4.355090, time 1.32942
Pass 0, batch 570, face loss 5.647346, head loss 5.196908, time 0.27653
Pass 0, batch 580, face loss 5.504908, head loss 4.933537, time 0.28012
Pass 0, batch 590, face loss 6.981771, head loss 6.778963, time 0.27844
Pass 0, batch 600, face loss 8.531418, head loss 6.565226, time 0.28835
Pass 0, batch 610, face loss 8.745155, head loss 6.477547, time 0.27858
Pass 0, batch 620, face loss 6.019545, head loss 6.184611, time 0.27947
Pass 0, batch 630, face loss 7.596415, head loss 6.025851, time 0.27779
Pass 0, batch 640, face loss 6.479803, head loss 5.122132, time 0.27635
Pass 0, batch 650, face loss 5.509667, head loss 4.720405, time 0.29149
Pass 0, batch 660, face loss 5.652287, head loss 3.880599, time 0.28030
Pass 0, batch 670, face loss 7.232968, head loss 6.510226, time 0.28461
Pass 0, batch 680, face loss 5.256457, head loss 5.510109, time 0.28104
Pass 0, batch 690, face loss 6.860471, head loss 5.810205, time 0.28205
Pass 0, batch 700, face loss 4.175430, head loss 4.239882, time 0.28485
Pass 0, batch 710, face loss 7.678154, head loss 4.785631, time 0.28141
Pass 0, batch 720, face loss 7.250123, head loss 6.579511, time 0.27916
Pass 0, batch 730, face loss 9.304526, head loss 7.101081, time 0.27784
Pass 0, batch 740, face loss 7.048196, head loss 5.352265, time 0.27819
Pass 0, batch 750, face loss 7.183696, head loss 6.876764, time 0.27973
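
The piecewise learning-rate schedule that optimizer_setting builds from train_parameters can be checked in plain Python, without PaddlePaddle. The sketch below uses the default values from train_parameters (batch size 16); the run logged above overrides the batch size to 2, which changes the boundaries. The helper `learning_rate_at` is hypothetical and only mimics a piecewise-constant schedule.

```python
train_images = 12880
batch_size = 16
base_lr = 0.001
lr_epochs = [99, 124, 149]
lr_decay = [1, 0.1, 0.01, 0.001]

iters_per_epoch = train_images // batch_size
boundaries = [epoch * iters_per_epoch for epoch in lr_epochs]
values = [factor * base_lr for factor in lr_decay]


def learning_rate_at(step):
    """Piecewise-constant schedule: values[i] applies until boundaries[i] is reached."""
    for boundary, value in zip(boundaries, values):
        if step < boundary:
            return value
    return values[-1]


print(iters_per_epoch)   # 805
print(boundaries)        # [79695, 99820, 119945]

# Rough wall-clock estimate from the quoted figures: 150 epochs at about 40 minutes each.
print(150 * 40 / 60)     # 100.0 hours
```

The learning rate therefore stays at 0.001 for the first 99 epochs and drops by a factor of ten at epochs 99, 124, and 149.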
 

  • The dataset is large and training is slow. To see results quickly, download the official trained model directly with the code below.
In[6]
# Download the official model
!wget http://paddlemodels.bj.bcebos.com/PyramidBox_WiderFace.tar.gz
!tar -zxvf PyramidBox_WiderFace.tar.gz && rm -f PyramidBox_WiderFace.tar.gz
 

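5. Model Prediction

The official code provides both model prediction and model evaluation; this article focuses on prediction.

  • As with training, just modify add_arg; each add_arg parameter is documented in the program. Note that to run prediction on an image, the infer parameter must be set to True; then provide image_path, the path of the image to predict, and model_dir, the path of the trained model.
  • Model evaluation requires Matlab; see https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/face_detection. Because training takes a long time and Matlab was not available, the model here was trained only briefly and its predictions are not very good. To train a model that performs well, use the official hyperparameter settings (Section 4.3 of this article).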
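
Besides editing the defaults, arguments can also be overridden by passing an explicit list to parse_args, which is convenient inside a notebook. The sketch below uses plain argparse because the project's add_arguments helper is not shown here; `str2bool` is a hypothetical stand-in for its boolean handling, and `my_photo.jpg` is a placeholder path.

```python
import argparse


def str2bool(text):
    """Hypothetical helper: turn 'True'/'False'-style strings into booleans."""
    return str(text).lower() in ("true", "1", "yes")


parser = argparse.ArgumentParser()
parser.add_argument("--infer", type=str2bool, default=True)
parser.add_argument("--image_path", type=str, default="")
parser.add_argument("--model_dir", type=str, default="PyramidBox_WiderFace/")

# An empty list keeps every default and ignores the notebook kernel's own sys.argv.
defaults = parser.parse_args(args=[])

# An explicit list overrides selected defaults without editing the definitions.
custom = parser.parse_args(args=["--image_path", "my_photo.jpg", "--infer", "True"])
print(defaults.infer, defaults.model_dir)
print(custom.image_path)
```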
In[1]
# Model prediction and visualization

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import time
import numpy as np
import argparse
import functools
from PIL import Image

import paddle.fluid as fluid
import reader
from pyramidbox import PyramidBox
from visualize import draw_bboxes
from utility import add_arguments, print_arguments
parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)

# yapf: disable
# Whether to use the GPU
add_arg('use_gpu',         bool,  True,                              "Whether use GPU or not.")
# Whether to use the PyramidBox model
add_arg('use_pyramidbox',  bool,  True,                              "Whether use PyramidBox model.")
# Used for evaluation only; ignore for prediction
add_arg('data_dir',        str,   '',          "The validation dataset path.")
# Location of the trained model
add_arg('model_dir',       str,   'PyramidBox_WiderFace/',                                "The model path.")
# Used for evaluation only; ignore for prediction
add_arg('pred_dir',        str,   'pred',                            "The path to save the evaluation results.")
add_arg('file_list',       str,   'data/data4336/wider_face_split/wider_face_val_bbx_gt.txt', "The validation dataset path.")
# Set to True for prediction, False for model evaluation
add_arg('infer',           bool,  True,                             "Whether do infer or eval.")
# Confidence threshold
add_arg('confs_threshold', float, 0.15,                              "Confidence threshold to draw bbox.")
# Path of the image to predict
add_arg('image_path',      str,   'data/data4336/WIDER_test/images/28--Sports_Fan/28_Sports_Fan_Sports_Fan_28_740.jpg', "The image used to inference and visualize.")
# yapf: enable

def infer(args, config):
    model_dir = args.model_dir
    pred_dir = args.pred_dir
    if not os.path.exists(model_dir):
        raise ValueError("The model path [%s] does not exist." % (model_dir))

    if args.infer:
        image_path = args.image_path
        image = Image.open(image_path)
        if image.mode == 'L':
            image = image.convert('RGB')
        shrink, max_shrink = get_shrink(image.size[1], image.size[0])

        det0 = detect_face(image, shrink)
        if args.use_gpu:
            det1 = flip_test(image, shrink)
            [det2, det3] = multi_scale_test(image, max_shrink)
            det4 = multi_scale_test_pyramid(image, max_shrink)
            det = np.row_stack((det0, det1, det2, det3, det4))
            dets = bbox_vote(det)
        else:
            # when infer on cpu, use a simple case
            dets = det0

        keep_index = np.where(dets[:, 4] >= args.confs_threshold)[0]
        dets = dets[keep_index, :]
        draw_bboxes(image_path, dets[:, 0:4])
    else:
        test_reader = reader.test(config, args.file_list)
        for image, image_path in test_reader():
            shrink, max_shrink = get_shrink(image.size[1], image.size[0])

            det0 = detect_face(image, shrink)
            det1 = flip_test(image, shrink)
            [det2, det3] = multi_scale_test(image, max_shrink)
            det4 = multi_scale_test_pyramid(image, max_shrink)
            det = np.row_stack((det0, det1, det2, det3, det4))
            dets = bbox_vote(det)

            save_widerface_bboxes(image_path, dets, pred_dir)

        print("Finish evaluation.")

def save_widerface_bboxes(image_path, bboxes_scores, output_dir):
    """
    Save predicted results, including bbox and score into text file.
    Args:
        image_path (string): file name.
        bboxes_scores (np.array|list): the predicted bboxed and scores, layout
            is (xmin, ymin, xmax, ymax, score)
        output_dir (string): output directory.
    """
    image_name = image_path.split('/')[-1]
    image_class = image_path.split('/')[-2]

    odir = os.path.join(output_dir, image_class)
    if not os.path.exists(odir):
        os.makedirs(odir)

    ofname = os.path.join(odir, '%s.txt' % (image_name[:-4]))
    f = open(ofname, 'w')
    f.write('{:s}\n'.format(image_class + '/' + image_name))
    f.write('{:d}\n'.format(bboxes_scores.shape[0]))
    for box_score in bboxes_scores:
        xmin, ymin, xmax, ymax, score = box_score
        f.write('{:.1f} {:.1f} {:.1f} {:.1f} {:.3f}\n'.format(xmin, ymin, (
            xmax - xmin + 1), (ymax - ymin + 1), score))
    f.close()
    print("The predicted result is saved as {}".format(ofname))

def detect_face(image, shrink):
    image_shape = [3, image.size[1], image.size[0]]
    if shrink != 1:
        h, w = int(image_shape[1] * shrink), int(image_shape[2] * shrink)
        image = image.resize((w, h), Image.LANCZOS)  # same filter as the ANTIALIAS alias, which newer Pillow releases removed
        image_shape = [3, h, w]

    img = np.array(image)
    img = reader.to_chw_bgr(img)
    mean = [104., 117., 123.]
    scale = 0.007843
    img = img.astype('float32')
    img -= np.array(mean)[:, np.newaxis, np.newaxis].astype('float32')
    img = img * scale
    img = [img]
    img = np.array(img)

    detection, = exe.run(infer_program,
                         feed={'image': img},
                         fetch_list=fetches,
                         return_numpy=False)
    detection = np.array(detection)
    # layout: xmin, ymin, xmax, ymax, score
    if np.prod(detection.shape) == 1:
        print("No face detected")
        return np.array([[0, 0, 0, 0, 0]])
    det_conf = detection[:, 1]
    det_xmin = image_shape[2] * detection[:, 2] / shrink
    det_ymin = image_shape[1] * detection[:, 3] / shrink
    det_xmax = image_shape[2] * detection[:, 4] / shrink
    det_ymax = image_shape[1] * detection[:, 5] / shrink

    det = np.column_stack((det_xmin, det_ymin, det_xmax, det_ymax, det_conf))
    return det

def bbox_vote(det):
    order = det[:, 4].ravel().argsort()[::-1]
    det = det[order, :]
    if det.shape[0] == 0:
        dets = np.array([[10, 10, 20, 20, 0.002]])
        det = np.empty(shape=[0, 5])
    while det.shape[0] > 0:
        # IOU
        area = (det[:, 2] - det[:, 0] + 1) * (det[:, 3] - det[:, 1] + 1)
        xx1 = np.maximum(det[0, 0], det[:, 0])
        yy1 = np.maximum(det[0, 1], det[:, 1])
        xx2 = np.minimum(det[0, 2], det[:, 2])
        yy2 = np.minimum(det[0, 3], det[:, 3])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        o = inter / (area[0] + area[:] - inter)

        # nms
        merge_index = np.where(o >= 0.3)[0]
        det_accu = det[merge_index, :]
        det = np.delete(det, merge_index, 0)
        if merge_index.shape[0] <= 1:
            if det.shape[0] == 0:
                try:
                    dets = np.row_stack((dets, det_accu))
                except:
                    dets = det_accu
            continue
        det_accu[:, 0:4] = det_accu[:, 0:4] * np.tile(det_accu[:, -1:], (1, 4))
        max_score = np.max(det_accu[:, 4])
        det_accu_sum = np.zeros((1, 5))
        det_accu_sum[:, 0:4] = np.sum(det_accu[:, 0:4],
                                      axis=0) / np.sum(det_accu[:, -1:])
        det_accu_sum[:, 4] = max_score
        try:
            dets = np.row_stack((dets, det_accu_sum))
        except:
            dets = det_accu_sum
    dets = dets[0:750, :]
    return dets

def flip_test(image, shrink):
    img = image.transpose(Image.FLIP_LEFT_RIGHT)
    det_f = detect_face(img, shrink)
    det_t = np.zeros(det_f.shape)
    # image.size: [width, height]
    det_t[:, 0] = image.size[0] - det_f[:, 2]
    det_t[:, 1] = det_f[:, 1]
    det_t[:, 2] = image.size[0] - det_f[:, 0]
    det_t[:, 3] = det_f[:, 3]
    det_t[:, 4] = det_f[:, 4]
    return det_t

def multi_scale_test(image, max_shrink):
    # Shrink detecting is only used to detect big faces
    st = 0.5 if max_shrink >= 0.75 else 0.5 * max_shrink
    det_s = detect_face(image, st)
    index = np.where(
        np.maximum(det_s[:, 2] - det_s[:, 0] + 1, det_s[:, 3] - det_s[:, 1] + 1)
        > 30)[0]
    det_s = det_s[index, :]
    # Enlarge one times
    bt = min(2, max_shrink) if max_shrink > 1 else (st + max_shrink) / 2
    det_b = detect_face(image, bt)

    # Enlarge small image x times for small faces
    if max_shrink > 2:
        bt *= 2
        while bt < max_shrink:
            det_b = np.row_stack((det_b, detect_face(image, bt)))
            bt *= 2
        det_b = np.row_stack((det_b, detect_face(image, max_shrink)))

    # Enlarged images are only used to detect small faces.
    if bt > 1:
        index = np.where(
            np.minimum(det_b[:, 2] - det_b[:, 0] + 1,
                       det_b[:, 3] - det_b[:, 1] + 1) < 100)[0]
        det_b = det_b[index, :]
    # Shrinked images are only used to detect big faces.
    else:
        index = np.where(
            np.maximum(det_b[:, 2] - det_b[:, 0] + 1,
                       det_b[:, 3] - det_b[:, 1] + 1) > 30)[0]
        det_b = det_b[index, :]
    return det_s, det_b

def multi_scale_test_pyramid(image, max_shrink):
    # Use image pyramids to detect faces
    det_b = detect_face(image, 0.25)
    index = np.where(
        np.maximum(det_b[:, 2] - det_b[:, 0] + 1, det_b[:, 3] - det_b[:, 1] + 1)
        > 30)[0]
    det_b = det_b[index, :]

    st = [0.75, 1.25, 1.5, 1.75]
    for i in range(len(st)):
        if (st[i] <= max_shrink):
            det_temp = detect_face(image, st[i])
            # Enlarged images are only used to detect small faces.
            if st[i] > 1:
                index = np.where(
                    np.minimum(det_temp[:, 2] - det_temp[:, 0] + 1,
                               det_temp[:, 3] - det_temp[:, 1] + 1) < 100)[0]
                det_temp = det_temp[index, :]
            # Shrinked images are only used to detect big faces.
            else:
                index = np.where(
                    np.maximum(det_temp[:, 2] - det_temp[:, 0] + 1,
                               det_temp[:, 3] - det_temp[:, 1] + 1) > 30)[0]
                det_temp = det_temp[index, :]
            det_b = np.row_stack((det_b, det_temp))
    return det_b

def get_shrink(height, width):
    """
    Args:
        height (int): image height.
        width (int): image width.
    """
    # avoid out of memory
    max_shrink_v1 = (0x7fffffff / 577.0 / (height * width))**0.5
    max_shrink_v2 = ((678 * 1024 * 2.0 * 2.0) / (height * width))**0.5

    def get_round(x, loc):
        str_x = str(x)
        if '.' in str_x:
            str_before, str_after = str_x.split('.')
            len_after = len(str_after)
            if len_after >= 3:
                str_final = str_before + '.' + str_after[0:loc]
                return float(str_final)
            else:
                return x

    max_shrink = get_round(min(max_shrink_v1, max_shrink_v2), 2) - 0.3
    if max_shrink >= 1.5 and max_shrink < 2:
        max_shrink = max_shrink - 0.1
    elif max_shrink >= 2 and max_shrink < 3:
        max_shrink = max_shrink - 0.2
    elif max_shrink >= 3 and max_shrink < 4:
        max_shrink = max_shrink - 0.3
    elif max_shrink >= 4 and max_shrink < 5:
        max_shrink = max_shrink - 0.4
    elif max_shrink >= 5:
        max_shrink = max_shrink - 0.5

    shrink = max_shrink if max_shrink < 1 else 1
    return shrink, max_shrink

if __name__ == '__main__':
    # Load the hyperparameters
    args = parser.parse_args(args=[])
    print_arguments(args)
    config = reader.Settings(data_dir=args.data_dir)
    # Choose the prediction device: GPU or CPU
    place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
    exe = fluid.Executor(place)
    
    main_program = fluid.Program()
    startup_program = fluid.Program()
    image_shape = [3, 1024, 1024]
    with fluid.program_guard(main_program, startup_program):
        # Build the network
        network = PyramidBox(
            data_shape=image_shape,
            sub_network=args.use_pyramidbox,
            is_infer=True)
        infer_program, nmsed_out = network.infer(main_program)
        fetches = [nmsed_out]
        fluid.io.load_persistables(             # load the persistable variables
            exe, args.model_dir, main_program=infer_program)
        # save model and program
        #fluid.io.save_inference_model('pyramidbox_model',
        #    ['image'], [nmsed_out], exe, main_program=infer_program,
        #    model_filename='model', params_filename='params')
    # Run prediction
    infer(args, config)
-----------  Configuration Arguments -----------
confs_threshold: 0.15
data_dir: 
file_list: data/data4336/wider_face_split/wider_face_val_bbx_gt.txt
image_path: data/data4336/WIDER_test/images/28--Sports_Fan/28_Sports_Fan_Sports_Fan_28_740.jpg
infer: True
model_dir: PyramidBox_WiderFace/
pred_dir: pred
use_gpu: True
use_pyramidbox: True
------------------------------------------------
The image with bbox is saved as 28_Sports_Fan_Sports_Fan_28_740.jpg
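
The bbox_vote function above merges overlapping detections from the different test scales by score-weighted averaging. A small worked example with NumPy makes that step concrete. The helper names and the three boxes are made up for illustration; the 0.3 IoU threshold and the `+ 1` pixel convention follow the code above.

```python
import numpy as np


def iou_with_first(boxes):
    """IoU between boxes[0] and every row of boxes (xmin, ymin, xmax, ymax, score)."""
    area = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    xx1 = np.maximum(boxes[0, 0], boxes[:, 0])
    yy1 = np.maximum(boxes[0, 1], boxes[:, 1])
    xx2 = np.minimum(boxes[0, 2], boxes[:, 2])
    yy2 = np.minimum(boxes[0, 3], boxes[:, 3])
    inter = np.maximum(0.0, xx2 - xx1 + 1) * np.maximum(0.0, yy2 - yy1 + 1)
    return inter / (area[0] + area - inter)


def vote(boxes):
    """Merge one group of overlapping boxes by score-weighted averaging."""
    weights = boxes[:, 4:5]
    merged = np.zeros(5)
    merged[:4] = (boxes[:, :4] * weights).sum(axis=0) / weights.sum()
    merged[4] = boxes[:, 4].max()   # keep the highest score of the group
    return merged


# Made-up detections, already sorted by descending score.
dets = np.array([
    [10.0, 10.0, 50.0, 50.0, 0.9],      # strongest detection
    [200.0, 200.0, 240.0, 240.0, 0.8],  # a separate face
    [12.0, 12.0, 52.0, 52.0, 0.6],      # overlaps the strongest detection
])
overlaps = iou_with_first(dets)
group = dets[overlaps >= 0.3]
merged = vote(group)
print(merged)
```

Unlike plain non-maximum suppression, which discards the weaker box, voting lets lower-scoring boxes nudge the final coordinates in proportion to their scores.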
 

The figure below visualizes the model's predictions:

 

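6. Summary

  • The paper proposes PyramidBox, a new context-assisted single-shot face detector, to address the problem of detecting faces in unconstrained settings. It designs a new context anchor, called PyramidAnchor, to supervise the face detector in learning features from the context around the face. In addition, it modifies the feature pyramid network into a low-level feature pyramid network (LFPN), which combines high-level features with high-resolution features and helps detect smaller faces. It also proposes a wider and deeper prediction module to make full use of the combined features. Beyond that, it uses Data-anchor-sampling to augment the training data and increase the diversity of smaller faces. Experiments show that PyramidBox achieves state-of-the-art results on common face detection benchmarks, especially for hard-to-detect faces.

  1. Value: a face detection algorithm from Baidu that took three first places on WIDER FACE in 2018. The algorithm focuses on hard-to-detect faces, especially small-scale ones, and reaches 88.9% and 88.7% mAP on the hard subsets of the WIDER FACE validation and test sets. Speed is not considered.

  2. Single shot: this can be understood as the one-stage side of the one-stage versus two-stage distinction. The extra stage in two-stage methods is proposing candidate windows. The term probably comes from SSD. SSD, S^3FD, and PyramidBox can be seen as one family.

  3. Anchor: an anchor is a point, namely the center of a sliding window on a high-level feature map, and this point corresponds to several candidate boxes with different areas and aspect ratios in the original image. In practice, however, the candidate box regions corresponding to the anchor point are often also called anchors, which makes the term harder to understand. Traditional anchors can handle the multi-scale problem but pay no attention to context information. The anchor concept comes from Faster R-CNN.

  4. High level and low level: high level refers to deeper, later layers, which have larger receptive fields. High-level features carry rich semantic information, but target locations are coarse and resolution is low. Low-level features carry less semantic information, but target locations are accurate and resolution is high. High-level features suit detecting larger faces, while low-level features suit detecting smaller faces.

  5. Feature pyramid network (FPN): before FPN, object detection algorithms mostly used only the top-level features for prediction. Some algorithms did fuse multi-scale features, but they generally predicted from the fused features, whereas in FPN prediction is performed independently on different feature levels. FPN has a bottom-up pathway and a top-down pathway. The bottom-up pathway is simply the forward pass of the network. During the forward pass, the feature map size changes after some layers and stays the same after others; the authors group the layers that do not change the feature map size into one stage, so the features extracted each time are the output of the last layer of each stage, which forms the feature pyramid. The top-down pathway performs upsampling, and the lateral connections fuse the upsampled result with the bottom-up feature map of the same size. After fusion, a 3*3 convolution is applied to each fused result to remove the aliasing effect of upsampling. The resulting feature maps are denoted P2, P3, P4, P5 and correspond one-to-one to the original bottom-up convolution outputs C2, C3, C4, C5.

  6. Position of the $1 \times 1$ convolution: an LFPN block is not exactly the same as an FPN block; the position of the $1 \times 1$ convolution layer differs. A $1 \times 1$ kernel serves to reduce dimensionality or to add non-linearity; here its role is presumably to reduce the number of feature channels.

Reference blog: https://blog.csdn.net/Xingyb14/article/details/81253129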
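
The top-down fusion and the channel-reducing role of the 1x1 convolution described in items 5 and 6 can be illustrated with NumPy. This is a generic FPN-style sketch under stated assumptions, not the LFPN block of PyramidBox: the channel counts, map sizes, random weights, and the use of element-wise addition are illustrative choices, and the 3*3 smoothing convolution is left out.

```python
import numpy as np


def conv1x1(feature, weight):
    """1x1 convolution on a (C_in, H, W) map with a (C_out, C_in) weight matrix."""
    return np.einsum("oc,chw->ohw", weight, feature)


def upsample2x(feature):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return feature.repeat(2, axis=1).repeat(2, axis=2)


rng = np.random.default_rng(0)
c4 = rng.standard_normal((512, 8, 8))     # lower-level map: higher resolution
c5 = rng.standard_normal((1024, 4, 4))    # higher-level map: lower resolution

# 1x1 convolutions bring both maps to the same, smaller channel count.
lateral_w = rng.standard_normal((256, 512)) * 0.01
top_w = rng.standard_normal((256, 1024)) * 0.01

p5 = conv1x1(c5, top_w)
p4 = conv1x1(c4, lateral_w) + upsample2x(p5)   # lateral connection + top-down path
print(p5.shape, p4.shape)
```

Because a 1x1 convolution mixes channels independently at each position, it changes the channel count without touching the spatial resolution, which is what allows the two maps to be fused.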

 


Click the link to try this project hands-on in AI Studio: https://aistudio.baidu.com/aistudio/projectdetail/169468
