Machine Translation with Transformer


Note

This project contains multiple code files. Fork it and run it in a GPU environment to see the complete code and run it correctly. Also check that related settings such as use_gpu and fluid.CUDAPlace(0) are configured correctly.

Model Overview

Machine translation (MT) uses computers to convert text in one natural language (the source language) into another natural language (the target language): the input is a source-language sentence and the output is the corresponding target-language sentence. This example implements and introduces Transformer, a mainstream machine translation model. The paper is: Attention Is All You Need

Figure 1. Transformer network architecture

Figure 2. Multi-Head Attention

The Encoder is a stack of identical layers, each composed of two sub-layers: Multi-Head Attention and a fully connected Feed-Forward network.

  • Multi-Head Attention is used here to implement self-attention. Compared with a plain attention mechanism, it applies several linear projections to the input, computes attention over each projection in parallel, concatenates all the results, and applies one more linear projection to produce the output. See Figure 2: the attention itself is dot-product attention, scaled after the dot product so that large values do not push softmax into its saturated region. (A minimal sketch follows this list.)
  • The Feed-Forward network applies the same computation to every position in the sequence (position-wise): two linear transformations with a ReLU activation in between.
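
To make the attention sub-layer concrete, here is a minimal NumPy sketch of scaled dot-product attention and its multi-head extension. It is illustrative only, assuming single-sequence 2-D inputs; the project's actual implementation lives in model.py, and all names below are hypothetical:

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    # q, k, v: [seq_len, d_k]; dividing by sqrt(d_k) keeps large dot
    # products from pushing softmax into its saturated region.
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)   # [seq_len, seq_len]
    return softmax(scores) @ v        # [seq_len, d_k]

def multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads):
    # Project the input, split the projections into heads, attend in
    # each head independently, concatenate, and project once more.
    d_k = x.shape[-1] // n_heads
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    heads = [scaled_dot_product_attention(q[:, i*d_k:(i+1)*d_k],
                                          k[:, i*d_k:(i+1)*d_k],
                                          v[:, i*d_k:(i+1)*d_k])
             for i in range(n_heads)]
    return np.concatenate(heads, axis=-1) @ w_o

# Tiny usage example with random weights: 5 tokens, d_model=16, 4 heads.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))
w_q, w_k, w_v, w_o = (rng.normal(size=(16, 16)) for _ in range(4))
print(multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads=4).shape)  # (5, 16)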

In addition, each sub-layer is followed by a residual connection and layer normalization, which help gradient propagation and model convergence.
The Decoder has a structure similar to the Encoder's, except that each Decoder layer contains one more Multi-Head Attention sub-layer, which attends over the Encoder output. This encoder-decoder attention also exists in other Seq2Seq models.
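
Putting the pieces above together, one encoder layer can be sketched as follows. Again this is an illustrative NumPy version under the same assumptions, not the wiring in model.py:

import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each position's feature vector to zero mean, unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def position_wise_ffn(x, w1, b1, w2, b2):
    # Two linear transformations with ReLU in between, applied
    # identically at every position in the sequence.
    return np.maximum(x @ w1 + b1, 0) @ w2 + b2

def encoder_layer(x, self_attn, ffn):
    # Each sub-layer's output is added back to its input (residual
    # connection) and then layer-normalized.
    x = layer_norm(x + self_attn(x))
    return layer_norm(x + ffn(x))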
For a deeper understanding of the model, see the blog post 深入学习Google Transformer模型网络结构 (an in-depth study of the Google Transformer network architecture).

Code Structure

The directory layout of this example is as follows:

├── trained_models       # pretrained models
├── gen_data             # dataset
├── trained_ckpts        # checkpoints saved during training
├── model.py             # the Transformer model
├── desc.py              # model parameters
├── dist_utils.py        # multi-GPU training utilities
├── config.py            # training, inference, and model parameter configuration
├── infer.py             # inference script
├── base_reader.py       # basic data-reading interface
├── data_reader.py       # readers such as batch_reader built on base_reader
├── train.py             # training script
├── freeze.py            # freezes the trained model parameters
├── freeze_infer.py      # inference with the frozen model parameters
└── gen_data.sh          # data generation script

Note: the data in gen_data can be obtained in two ways:

  1. Run the gen_data.sh script to download and preprocess the WMT'16 EN-DE dataset. Preprocessing mainly consists of tokenization and byte-pair encoding (BPE).
  2. Download the preprocessed WMT'16 EN-DE data: download (it contains the BPE data and vocabulary needed for training, plus the BPE and tokenized data needed for inference and evaluation).
    This example uses the downloaded, preprocessed WMT'16 EN-DE data directly, since option 1 takes much longer to run.

Data Format

  • This example supports data in which the source and target sentences of each pair are separated by a tab (\t) and the tokens within a sentence are separated by spaces. To use BPE encoding, you can also start from data in the raw WMT'16 EN-DE format and process it following gen_data.sh.
    For example, the BPE-encoded data used in this example looks like this:
They are not even 100 metres apart : On Tuesday , the new B 33 pedestrian lights in Dorf@@ park@@ platz in Gut@@ ach became operational - within view of the existing Town Hall traffic lights .  "\t"  Sie stehen keine 100 Meter voneinander entfernt : Am Dienstag ist in Gut@@ ach die neue B 3@@ 3-@@ Fußgänger@@ amp@@ el am Dorf@@ park@@ platz in Betrieb genommen worden - in Sicht@@ weite der älteren Ra@@ th@@ aus@@ amp@@ el .

For readability, the '\t' is shown explicitly in the example above.
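
As a minimal sketch of reading this format (the project's real readers are in base_reader.py and data_reader.py; the function name read_pairs is hypothetical):

def read_pairs(path, token_delimiter=' '):
    # Each line: source sentence, a tab, then the target sentence;
    # tokens within a sentence are separated by token_delimiter.
    with open(path, encoding='utf-8') as f:
        for line in f:
            src, trg = line.rstrip('\n').split('\t')
            yield src.split(token_delimiter), trg.split(token_delimiter)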

Results

After training Transformer models in the Base and Big configurations on the public WMT'16 EN-DE dataset, evaluation on the corresponding test sets gives the following BLEU scores:

Test set  newstest2014  newstest2015  newstest2016
Base      26.35         29.07         33.30
Big       27.07         30.09         34.38
In[2]
# Unpack the data
!mkdir gen_data
!tar xzf data/data9711/wmt16_ende_data_bpe_clean.tar.gz -C gen_data/
In[3]
# Download and unpack the pretrained model
!wget https://transformer-res.bj.bcebos.com/base_model.tar.gz
!mkdir trained_models
!tar xzf base_model.tar.gz -C trained_models/
!rm base_model.tar.gz
--2019-08-12 16:40:05--  https://transformer-res.bj.bcebos.com/base_model.tar.gz
Resolving transformer-res.bj.bcebos.com (transformer-res.bj.bcebos.com)... 220.181.33.44, 220.181.33.43
Connecting to transformer-res.bj.bcebos.com (transformer-res.bj.bcebos.com)|220.181.33.44|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 234161337 (223M) [application/x-gzip]
Saving to: ‘base_model.tar.gz’

base_model.tar.gz   100%[===================>] 223.31M  25.6MB/s    in 9.9s    

2019-08-12 16:40:16 (22.6 MB/s) - ‘base_model.tar.gz’ saved [234161337/234161337]
In[15]
# Run python train.py --help for more details
# More model-training parameters are defined in `ModelHyperParams` and `TrainTaskConfig` in `config.py`
# `ModelHyperParams` defines model hyperparameters such as the embedding dimension
# `TrainTaskConfig` defines training parameters such as the number of warmup steps
# These parameters default to the base-model configuration in the Transformer paper; edit that script to adjust them
# They can also be set on the training command line; options passed in are merged with, and override, the settings in `config.py`
# Because there is a lot of training data, training is slow: with batch_size 2048, 20000 steps take about an hour, which is still less than one epoch
# Running this project on CPU is therefore not recommended
!python -u train.py \
  --src_vocab_fpath gen_data/wmt16_ende_data_bpe_clean/vocab_all.bpe.32000 \
  --trg_vocab_fpath gen_data/wmt16_ende_data_bpe_clean/vocab_all.bpe.32000 \
  --special_token '<s>' '<e>' '<unk>' \
  --train_file_pattern gen_data/wmt16_ende_data_bpe_clean/train.tok.clean.bpe.32000.en-de \
  --token_delimiter ' ' \
  --use_token_batch True \
  --batch_size 2048 \
  --sort_type pool \
  --pool_size 2000
[2019-09-05 16:10:11,670 INFO train.py:431] Namespace(batch_size=2048, device='GPU', enable_ce=False, fetch_steps=100, local=True, opts=[], pool_size=2000, shuffle=True, shuffle_batch=True, sort_type='pool', special_token=['<s>', '<e>', '<unk>'], src_vocab_fpath='gen_data/wmt16_ende_data_bpe_clean/vocab_all.bpe.32000', sync=True, token_delimiter=' ', train_file_pattern='gen_data/wmt16_ende_data_bpe_clean/train.tok.clean.bpe.32000.en-de', trg_vocab_fpath='gen_data/wmt16_ende_data_bpe_clean/vocab_all.bpe.32000', update_method='pserver', use_mem_opt=True, use_py_reader=False, use_token_batch=True, val_file_pattern=None)
[2019-09-05 16:10:12,048 INFO train.py:481] before adam
[2019-09-05 16:10:12,791 INFO train.py:496] local start_up:
1
[2019-09-05 16:10:12,791 INFO train.py:285] init fluid.framework.default_startup_program
W0905 16:10:13.675068   593 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
W0905 16:10:13.678527   593 device_context.cc:267] device: 0, cuDNN Version: 7.3.
[2019-09-05 16:10:13,712 INFO train.py:288] begin reader
[2019-09-05 16:12:41,177 INFO train.py:315] begin executor
[2019-09-05 16:12:41,255 WARNING compiler.py:239] 
     You can try our memory optimize feature to save your memory usage:
         # create a build_strategy variable to set memory optimize option
         build_strategy = compiler.BuildStrategy()
         build_strategy.enable_inplace = True
         build_strategy.memory_optimize = True
         
         # pass the build_strategy to with_data_parallel API
         compiled_prog = compiler.CompiledProgram(main).with_data_parallel(
             loss_name=loss.name, build_strategy=build_strategy)
      
     !!! Memory optimize is our experimental feature !!!
         some variables may be removed/reused internal to save memory usage, 
         in order to fetch the right value of the fetch_list, please set the 
         persistable property to true for each variable in fetch_list

         # Sample
         conv1 = fluid.layers.conv2d(data, 4, 5, 1, act=None) 
         # if you need to fetch conv1, then:
         conv1.persistable = True

                 
I0905 16:12:41.296064   593 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0905 16:12:41.663770   593 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
[2019-09-05 16:12:41,751 INFO train.py:337] begin train
1
[2019-09-05 16:12:58,289 INFO train.py:372] step_idx: 0, epoch: 0, batch: 0, avg loss: 11.071126, normalized loss: 9.695386, ppl: 64287.851562
[2019-09-05 16:13:08,746 INFO train.py:381] step_idx: 100, epoch: 0, batch: 100, avg loss: 9.614654, normalized loss: 8.238913, ppl: 14982.731445, speed: 9.56 step/s
[2019-09-05 16:13:24,768 INFO train.py:381] step_idx: 200, epoch: 0, batch: 200, avg loss: 9.040323, normalized loss: 7.664583, ppl: 8436.503906, speed: 6.24 step/s
[2019-09-05 16:13:40,750 INFO train.py:381] step_idx: 300, epoch: 0, batch: 300, avg loss: 8.752187, normalized loss: 7.376446, ppl: 6324.503418, speed: 6.26 step/s
[2019-09-05 16:13:56,802 INFO train.py:381] step_idx: 400, epoch: 0, batch: 400, avg loss: 8.220883, normalized loss: 6.845143, ppl: 3717.785156, speed: 6.23 step/s
[2019-09-05 16:14:12,780 INFO train.py:381] step_idx: 500, epoch: 0, batch: 500, avg loss: 8.172359, normalized loss: 6.796619, ppl: 3541.690674, speed: 6.26 step/s
[2019-09-05 16:14:28,721 INFO train.py:381] step_idx: 600, epoch: 0, batch: 600, avg loss: 7.568666, normalized loss: 6.192926, ppl: 1936.555176, speed: 6.27 step/s
[2019-09-05 16:14:44,731 INFO train.py:381] step_idx: 700, epoch: 0, batch: 700, avg loss: 7.726439, normalized loss: 6.350698, ppl: 2267.512207, speed: 6.25 step/s
[2019-09-05 16:15:00,755 INFO train.py:381] step_idx: 800, epoch: 0, batch: 800, avg loss: 7.733466, normalized loss: 6.357726, ppl: 2283.503418, speed: 6.24 step/s
[2019-09-05 16:15:16,870 INFO train.py:381] step_idx: 900, epoch: 0, batch: 900, avg loss: 7.637717, normalized loss: 6.261977, ppl: 2075.001709, speed: 6.21 step/s
[2019-09-05 16:15:32,850 INFO train.py:381] step_idx: 1000, epoch: 0, batch: 1000, avg loss: 7.468491, normalized loss: 6.092750, ppl: 1751.960327, speed: 6.26 step/s
[2019-09-05 16:15:48,804 INFO train.py:381] step_idx: 1100, epoch: 0, batch: 1100, avg loss: 7.492769, normalized loss: 6.117029, ppl: 1795.015991, speed: 6.27 step/s
[2019-09-05 16:16:04,741 INFO train.py:381] step_idx: 1200, epoch: 0, batch: 1200, avg loss: 7.181555, normalized loss: 5.805815, ppl: 1314.951782, speed: 6.27 step/s
[2019-09-05 16:16:20,663 INFO train.py:381] step_idx: 1300, epoch: 0, batch: 1300, avg loss: 7.425134, normalized loss: 6.049394, ppl: 1677.624634, speed: 6.28 step/s
[2019-09-05 16:16:36,611 INFO train.py:381] step_idx: 1400, epoch: 0, batch: 1400, avg loss: 7.263412, normalized loss: 5.887672, ppl: 1427.118286, speed: 6.27 step/s
[2019-09-05 16:16:52,512 INFO train.py:381] step_idx: 1500, epoch: 0, batch: 1500, avg loss: 7.011061, normalized loss: 5.635321, ppl: 1108.830566, speed: 6.29 step/s
[2019-09-05 16:17:08,377 INFO train.py:381] step_idx: 1600, epoch: 0, batch: 1600, avg loss: 7.222647, normalized loss: 5.846906, ppl: 1370.110596, speed: 6.30 step/s
[2019-09-05 16:17:24,412 INFO train.py:381] step_idx: 1700, epoch: 0, batch: 1700, avg loss: 6.902591, normalized loss: 5.526850, ppl: 994.848816, speed: 6.24 step/s
[2019-09-05 16:17:40,315 INFO train.py:381] step_idx: 1800, epoch: 0, batch: 1800, avg loss: 6.438662, normalized loss: 5.062922, ppl: 625.569275, speed: 6.29 step/s
[2019-09-05 16:17:56,339 INFO train.py:381] step_idx: 1900, epoch: 0, batch: 1900, avg loss: 6.615817, normalized loss: 5.240076, ppl: 746.814331, speed: 6.24 step/s
[2019-09-05 16:18:12,173 INFO train.py:381] step_idx: 2000, epoch: 0, batch: 2000, avg loss: 6.905449, normalized loss: 5.529709, ppl: 997.696289, speed: 6.32 step/s
[2019-09-05 16:18:28,106 INFO train.py:381] step_idx: 2100, epoch: 0, batch: 2100, avg loss: 6.192332, normalized loss: 4.816591, ppl: 488.984985, speed: 6.28 step/s
[2019-09-05 16:18:43,939 INFO train.py:381] step_idx: 2200, epoch: 0, batch: 2200, avg loss: 6.941187, normalized loss: 5.565447, ppl: 1033.996704, speed: 6.32 step/s
[2019-09-05 16:18:59,869 INFO train.py:381] step_idx: 2300, epoch: 0, batch: 2300, avg loss: 5.660133, normalized loss: 4.284393, ppl: 287.186951, speed: 6.28 step/s
[2019-09-05 16:19:15,763 INFO train.py:381] step_idx: 2400, epoch: 0, batch: 2400, avg loss: 5.564558, normalized loss: 4.188818, ppl: 261.009827, speed: 6.29 step/s
[2019-09-05 16:19:31,611 INFO train.py:381] step_idx: 2500, epoch: 0, batch: 2500, avg loss: 6.237848, normalized loss: 4.862108, ppl: 511.755920, speed: 6.31 step/s
[2019-09-05 16:19:47,582 INFO train.py:381] step_idx: 2600, epoch: 0, batch: 2600, avg loss: 6.154101, normalized loss: 4.778361, ppl: 470.643494, speed: 6.26 step/s
[2019-09-05 16:20:03,873 INFO train.py:381] step_idx: 2700, epoch: 0, batch: 2700, avg loss: 5.174521, normalized loss: 3.798781, ppl: 176.712021, speed: 6.14 step/s
[2019-09-05 16:20:19,834 INFO train.py:381] step_idx: 2800, epoch: 0, batch: 2800, avg loss: 6.595821, normalized loss: 5.220081, ppl: 732.029541, speed: 6.27 step/s
[2019-09-05 16:20:35,784 INFO train.py:381] step_idx: 2900, epoch: 0, batch: 2900, avg loss: 6.381345, normalized loss: 5.005604, ppl: 590.721558, speed: 6.27 step/s
[2019-09-05 16:20:51,694 INFO train.py:381] step_idx: 3000, epoch: 0, batch: 3000, avg loss: 6.197413, normalized loss: 4.821673, ppl: 491.475922, speed: 6.29 step/s
[2019-09-05 16:21:07,604 INFO train.py:381] step_idx: 3100, epoch: 0, batch: 3100, avg loss: 6.186893, normalized loss: 4.811153, ppl: 486.332947, speed: 6.29 step/s
[2019-09-05 16:21:23,492 INFO train.py:381] step_idx: 3200, epoch: 0, batch: 3200, avg loss: 5.934240, normalized loss: 4.558500, ppl: 377.752930, speed: 6.29 step/s
[2019-09-05 16:21:39,344 INFO train.py:381] step_idx: 3300, epoch: 0, batch: 3300, avg loss: 5.725294, normalized loss: 4.349553, ppl: 306.523254, speed: 6.31 step/s
[2019-09-05 16:21:55,279 INFO train.py:381] step_idx: 3400, epoch: 0, batch: 3400, avg loss: 6.016361, normalized loss: 4.640621, ppl: 410.083679, speed: 6.28 step/s
[2019-09-05 16:22:11,214 INFO train.py:381] step_idx: 3500, epoch: 0, batch: 3500, avg loss: 5.962931, normalized loss: 4.587191, ppl: 388.747925, speed: 6.28 step/s
[2019-09-05 16:22:27,179 INFO train.py:381] step_idx: 3600, epoch: 0, batch: 3600, avg loss: 6.757975, normalized loss: 5.382235, ppl: 860.897217, speed: 6.26 step/s
[2019-09-05 16:22:43,028 INFO train.py:381] step_idx: 3700, epoch: 0, batch: 3700, avg loss: 5.346803, normalized loss: 3.971063, ppl: 209.936096, speed: 6.31 step/s
[2019-09-05 16:22:58,899 INFO train.py:381] step_idx: 3800, epoch: 0, batch: 3800, avg loss: 6.103039, normalized loss: 4.727299, ppl: 447.214905, speed: 6.30 step/s
[2019-09-05 16:23:14,747 INFO train.py:381] step_idx: 3900, epoch: 0, batch: 3900, avg loss: 5.559483, normalized loss: 4.183742, ppl: 259.688446, speed: 6.31 step/s
[2019-09-05 16:23:30,544 INFO train.py:381] step_idx: 4000, epoch: 0, batch: 4000, avg loss: 5.627350, normalized loss: 4.251610, ppl: 277.924591, speed: 6.33 step/s
[2019-09-05 16:23:46,386 INFO train.py:381] step_idx: 4100, epoch: 0, batch: 4100, avg loss: 5.427331, normalized loss: 4.051591, ppl: 227.541229, speed: 6.31 step/s
[2019-09-05 16:24:02,320 INFO train.py:381] step_idx: 4200, epoch: 0, batch: 4200, avg loss: 5.468938, normalized loss: 4.093198, ppl: 237.208115, speed: 6.28 step/s
[2019-09-05 16:24:18,175 INFO train.py:381] step_idx: 4300, epoch: 0, batch: 4300, avg loss: 5.850169, normalized loss: 4.474428, ppl: 347.292969, speed: 6.31 step/s
[2019-09-05 16:24:33,998 INFO train.py:381] step_idx: 4400, epoch: 0, batch: 4400, avg loss: 5.891147, normalized loss: 4.515406, ppl: 361.819916, speed: 6.32 step/s
[2019-09-05 16:24:49,812 INFO train.py:381] step_idx: 4500, epoch: 0, batch: 4500, avg loss: 3.850545, normalized loss: 2.474805, ppl: 47.018677, speed: 6.32 step/s
[2019-09-05 16:25:05,681 INFO train.py:381] step_idx: 4600, epoch: 0, batch: 4600, avg loss: 4.959150, normalized loss: 3.583410, ppl: 142.472626, speed: 6.30 step/s
[2019-09-05 16:25:21,507 INFO train.py:381] step_idx: 4700, epoch: 0, batch: 4700, avg loss: 5.061464, normalized loss: 3.685724, ppl: 157.821365, speed: 6.32 step/s
[2019-09-05 16:25:37,441 INFO train.py:381] step_idx: 4800, epoch: 0, batch: 4800, avg loss: 5.410656, normalized loss: 4.034916, ppl: 223.778336, speed: 6.28 step/s
[2019-09-05 16:25:53,284 INFO train.py:381] step_idx: 4900, epoch: 0, batch: 4900, avg loss: 4.713914, normalized loss: 3.338174, ppl: 111.487717, speed: 6.31 step/s
[2019-09-05 16:26:09,143 INFO train.py:381] step_idx: 5000, epoch: 0, batch: 5000, avg loss: 4.792452, normalized loss: 3.416712, ppl: 120.596695, speed: 6.31 step/s
[2019-09-05 16:26:25,281 INFO train.py:381] step_idx: 5100, epoch: 0, batch: 5100, avg loss: 4.913872, normalized loss: 3.538131, ppl: 136.165604, speed: 6.20 step/s
[2019-09-05 16:26:41,135 INFO train.py:381] step_idx: 5200, epoch: 0, batch: 5200, avg loss: 4.692264, normalized loss: 3.316524, ppl: 109.099915, speed: 6.31 step/s
[2019-09-05 16:26:56,987 INFO train.py:381] step_idx: 5300, epoch: 0, batch: 5300, avg loss: 4.983935, normalized loss: 3.608195, ppl: 146.047928, speed: 6.31 step/s
[2019-09-05 16:27:12,791 INFO train.py:381] step_idx: 5400, epoch: 0, batch: 5400, avg loss: 5.191663, normalized loss: 3.815923, ppl: 179.767303, speed: 6.33 step/s
[2019-09-05 16:27:28,515 INFO train.py:381] step_idx: 5500, epoch: 0, batch: 5500, avg loss: 4.504713, normalized loss: 3.128973, ppl: 90.442390, speed: 6.36 step/s
[2019-09-05 16:27:44,440 INFO train.py:381] step_idx: 5600, epoch: 0, batch: 5600, avg loss: 4.741674, normalized loss: 3.365934, ppl: 114.625977, speed: 6.28 step/s
[2019-09-05 16:28:00,252 INFO train.py:381] step_idx: 5700, epoch: 0, batch: 5700, avg loss: 4.921169, normalized loss: 3.545429, ppl: 137.162903, speed: 6.32 step/s
[2019-09-05 16:28:16,124 INFO train.py:381] step_idx: 5800, epoch: 0, batch: 5800, avg loss: 4.510664, normalized loss: 3.134924, ppl: 90.982208, speed: 6.30 step/s
[2019-09-05 16:28:31,883 INFO train.py:381] step_idx: 5900, epoch: 0, batch: 5900, avg loss: 4.585212, normalized loss: 3.209472, ppl: 98.023987, speed: 6.35 step/s
[2019-09-05 16:28:48,050 INFO train.py:381] step_idx: 6000, epoch: 0, batch: 6000, avg loss: 4.820763, normalized loss: 3.445022, ppl: 124.059669, speed: 6.19 step/s
[2019-09-05 16:29:04,107 INFO train.py:381] step_idx: 6100, epoch: 0, batch: 6100, avg loss: 5.065711, normalized loss: 3.689971, ppl: 158.493088, speed: 6.23 step/s
[2019-09-05 16:29:20,022 INFO train.py:381] step_idx: 6200, epoch: 0, batch: 6200, avg loss: 4.828879, normalized loss: 3.453139, ppl: 125.070724, speed: 6.28 step/s
[2019-09-05 16:29:35,784 INFO train.py:381] step_idx: 6300, epoch: 0, batch: 6300, avg loss: 5.157671, normalized loss: 3.781931, ppl: 173.759293, speed: 6.34 step/s
[2019-09-05 16:29:51,539 INFO train.py:381] step_idx: 6400, epoch: 0, batch: 6400, avg loss: 5.091738, normalized loss: 3.715997, ppl: 162.672302, speed: 6.35 step/s
[2019-09-05 16:30:07,404 INFO train.py:381] step_idx: 6500, epoch: 0, batch: 6500, avg loss: 4.781703, normalized loss: 3.405963, ppl: 119.307358, speed: 6.30 step/s
[2019-09-05 16:30:23,262 INFO train.py:381] step_idx: 6600, epoch: 0, batch: 6600, avg loss: 5.277546, normalized loss: 3.901806, ppl: 195.888565, speed: 6.31 step/s
[2019-09-05 16:30:39,116 INFO train.py:381] step_idx: 6700, epoch: 0, batch: 6700, avg loss: 4.341463, normalized loss: 2.965722, ppl: 76.819817, speed: 6.31 step/s
[2019-09-05 16:30:55,002 INFO train.py:381] step_idx: 6800, epoch: 0, batch: 6800, avg loss: 4.645447, normalized loss: 3.269706, ppl: 104.109871, speed: 6.29 step/s
[2019-09-05 16:31:10,843 INFO train.py:381] step_idx: 6900, epoch: 0, batch: 6900, avg loss: 5.015174, normalized loss: 3.639434, ppl: 150.682343, speed: 6.31 step/s
[2019-09-05 16:31:26,656 INFO train.py:381] step_idx: 7000, epoch: 0, batch: 7000, avg loss: 4.424605, normalized loss: 3.048865, ppl: 83.479820, speed: 6.32 step/s
[2019-09-05 16:31:42,851 INFO train.py:381] step_idx: 7100, epoch: 0, batch: 7100, avg loss: 4.803339, normalized loss: 3.427598, ppl: 121.916763, speed: 6.17 step/s
[2019-09-05 16:31:58,704 INFO train.py:381] step_idx: 7200, epoch: 0, batch: 7200, avg loss: 4.239784, normalized loss: 2.864044, ppl: 69.392876, speed: 6.31 step/s
[2019-09-05 16:32:14,573 INFO train.py:381] step_idx: 7300, epoch: 0, batch: 7300, avg loss: 4.575620, normalized loss: 3.199880, ppl: 97.088234, speed: 6.30 step/s
[2019-09-05 16:32:30,435 INFO train.py:381] step_idx: 7400, epoch: 0, batch: 7400, avg loss: 3.984411, normalized loss: 2.608670, ppl: 53.753605, speed: 6.30 step/s
[2019-09-05 16:32:46,264 INFO train.py:381] step_idx: 7500, epoch: 0, batch: 7500, avg loss: 4.149014, normalized loss: 2.773274, ppl: 63.371487, speed: 6.32 step/s
[2019-09-05 16:33:02,087 INFO train.py:381] step_idx: 7600, epoch: 0, batch: 7600, avg loss: 4.444090, normalized loss: 3.068350, ppl: 85.122375, speed: 6.32 step/s
[2019-09-05 16:33:18,013 INFO train.py:381] step_idx: 7700, epoch: 0, batch: 7700, avg loss: 4.581161, normalized loss: 3.205420, ppl: 97.627632, speed: 6.28 step/s
[2019-09-05 16:33:34,119 INFO train.py:381] step_idx: 7800, epoch: 0, batch: 7800, avg loss: 4.950149, normalized loss: 3.574408, ppl: 141.195938, speed: 6.21 step/s
[2019-09-05 16:33:49,976 INFO train.py:381] step_idx: 7900, epoch: 0, batch: 7900, avg loss: 4.171795, normalized loss: 2.796055, ppl: 64.831711, speed: 6.31 step/s
[2019-09-05 16:34:05,861 INFO train.py:381] step_idx: 8000, epoch: 0, batch: 8000, avg loss: 4.461265, normalized loss: 3.085525, ppl: 86.596992, speed: 6.30 step/s
[2019-09-05 16:34:21,737 INFO train.py:381] step_idx: 8100, epoch: 0, batch: 8100, avg loss: 4.925384, normalized loss: 3.549644, ppl: 137.742233, speed: 6.30 step/s
[2019-09-05 16:34:37,667 INFO train.py:381] step_idx: 8200, epoch: 0, batch: 8200, avg loss: 3.980042, normalized loss: 2.604302, ppl: 53.519279, speed: 6.28 step/s
[2019-09-05 16:34:53,525 INFO train.py:381] step_idx: 8300, epoch: 0, batch: 8300, avg loss: 4.367057, normalized loss: 2.991317, ppl: 78.811333, speed: 6.31 step/s
[2019-09-05 16:35:09,381 INFO train.py:381] step_idx: 8400, epoch: 0, batch: 8400, avg loss: 4.598736, normalized loss: 3.222996, ppl: 99.358673, speed: 6.31 step/s
[2019-09-05 16:35:25,259 INFO train.py:381] step_idx: 8500, epoch: 0, batch: 8500, avg loss: 4.545544, normalized loss: 3.169803, ppl: 94.211632, speed: 6.30 step/s
[2019-09-05 16:35:41,160 INFO train.py:381] step_idx: 8600, epoch: 0, batch: 8600, avg loss: 4.464136, normalized loss: 3.088395, ppl: 86.845932, speed: 6.29 step/s
[2019-09-05 16:35:57,110 INFO train.py:381] step_idx: 8700, epoch: 0, batch: 8700, avg loss: 4.671291, normalized loss: 3.295551, ppl: 106.835617, speed: 6.27 step/s
[2019-09-05 16:36:12,952 INFO train.py:381] step_idx: 8800, epoch: 0, batch: 8800, avg loss: 4.651101, normalized loss: 3.275360, ppl: 104.700157, speed: 6.31 step/s
[2019-09-05 16:36:28,773 INFO train.py:381] step_idx: 8900, epoch: 0, batch: 8900, avg loss: 4.040067, normalized loss: 2.664326, ppl: 56.830135, speed: 6.32 step/s
[2019-09-05 16:36:44,652 INFO train.py:381] step_idx: 9000, epoch: 0, batch: 9000, avg loss: 4.399907, normalized loss: 3.024167, ppl: 81.443306, speed: 6.30 step/s
[2019-09-05 16:37:00,514 INFO train.py:381] step_idx: 9100, epoch: 0, batch: 9100, avg loss: 4.596371, normalized loss: 3.220631, ppl: 99.123962, speed: 6.30 step/s
[2019-09-05 16:37:16,446 INFO train.py:381] step_idx: 9200, epoch: 0, batch: 9200, avg loss: 4.630662, normalized loss: 3.254922, ppl: 102.581993, speed: 6.28 step/s
[2019-09-05 16:37:32,336 INFO train.py:381] step_idx: 9300, epoch: 0, batch: 9300, avg loss: 4.434281, normalized loss: 3.058541, ppl: 84.291489, speed: 6.29 step/s
[2019-09-05 16:37:48,145 INFO train.py:381] step_idx: 9400, epoch: 0, batch: 9400, avg loss: 4.850261, normalized loss: 3.474520, ppl: 127.773697, speed: 6.33 step/s
[2019-09-05 16:38:04,030 INFO train.py:381] step_idx: 9500, epoch: 0, batch: 9500, avg loss: 4.214247, normalized loss: 2.838507, ppl: 67.643227, speed: 6.30 step/s
[2019-09-05 16:38:19,874 INFO train.py:381] step_idx: 9600, epoch: 0, batch: 9600, avg loss: 4.649561, normalized loss: 3.273821, ppl: 104.539078, speed: 6.31 step/s
[2019-09-05 16:38:35,678 INFO train.py:381] step_idx: 9700, epoch: 0, batch: 9700, avg loss: 4.538639, normalized loss: 3.162899, ppl: 93.563377, speed: 6.33 step/s
[2019-09-05 16:38:51,553 INFO train.py:381] step_idx: 9800, epoch: 0, batch: 9800, avg loss: 3.885924, normalized loss: 2.510183, ppl: 48.711914, speed: 6.30 step/s
[2019-09-05 16:39:07,417 INFO train.py:381] step_idx: 9900, epoch: 0, batch: 9900, avg loss: 4.629916, normalized loss: 3.254175, ppl: 102.505424, speed: 6.30 step/s
[2019-09-05 16:39:23,270 INFO train.py:381] step_idx: 10000, epoch: 0, batch: 10000, avg loss: 4.950545, normalized loss: 3.574805, ppl: 141.251907, speed: 6.31 step/s
[2019-09-05 16:39:39,153 INFO train.py:381] step_idx: 10100, epoch: 0, batch: 10100, avg loss: 4.636306, normalized loss: 3.260566, ppl: 103.162544, speed: 6.30 step/s
[2019-09-05 16:39:54,949 INFO train.py:381] step_idx: 10200, epoch: 0, batch: 10200, avg loss: 3.933517, normalized loss: 2.557777, ppl: 51.086357, speed: 6.33 step/s
[2019-09-05 16:40:10,680 INFO train.py:381] step_idx: 10300, epoch: 0, batch: 10300, avg loss: 4.710368, normalized loss: 3.334628, ppl: 111.093056, speed: 6.36 step/s
[2019-09-05 16:40:26,764 INFO train.py:381] step_idx: 10400, epoch: 0, batch: 10400, avg loss: 4.254222, normalized loss: 2.878482, ppl: 70.402016, speed: 6.22 step/s
[2019-09-05 16:40:42,607 INFO train.py:381] step_idx: 10500, epoch: 0, batch: 10500, avg loss: 4.134883, normalized loss: 2.759143, ppl: 62.482304, speed: 6.31 step/s
[2019-09-05 16:40:58,471 INFO train.py:381] step_idx: 10600, epoch: 0, batch: 10600, avg loss: 3.953880, normalized loss: 2.578140, ppl: 52.137283, speed: 6.30 step/s
[2019-09-05 16:41:14,365 INFO train.py:381] step_idx: 10700, epoch: 0, batch: 10700, avg loss: 4.106468, normalized loss: 2.730727, ppl: 60.731815, speed: 6.29 step/s
[2019-09-05 16:41:30,193 INFO train.py:381] step_idx: 10800, epoch: 0, batch: 10800, avg loss: 4.502720, normalized loss: 3.126980, ppl: 90.262344, speed: 6.32 step/s
[2019-09-05 16:41:45,915 INFO train.py:381] step_idx: 10900, epoch: 0, batch: 10900, avg loss: 4.643319, normalized loss: 3.267579, ppl: 103.888596, speed: 6.36 step/s
[2019-09-05 16:42:01,725 INFO train.py:381] step_idx: 11000, epoch: 0, batch: 11000, avg loss: 6.191985, normalized loss: 4.816245, ppl: 488.815521, speed: 6.32 step/s
[2019-09-05 16:42:17,500 INFO train.py:381] step_idx: 11100, epoch: 0, batch: 11100, avg loss: 4.250278, normalized loss: 2.874537, ppl: 70.124870, speed: 6.34 step/s
[2019-09-05 16:42:33,404 INFO train.py:381] step_idx: 11200, epoch: 0, batch: 11200, avg loss: 4.404364, normalized loss: 3.028624, ppl: 81.807106, speed: 6.29 step/s
[2019-09-05 16:42:49,218 INFO train.py:381] step_idx: 11300, epoch: 0, batch: 11300, avg loss: 5.055368, normalized loss: 3.679628, ppl: 156.862244, speed: 6.32 step/s
[2019-09-05 16:43:04,965 INFO train.py:381] step_idx: 11400, epoch: 0, batch: 11400, avg loss: 3.847073, normalized loss: 2.471332, ppl: 46.855698, speed: 6.35 step/s
[2019-09-05 16:43:20,804 INFO train.py:381] step_idx: 11500, epoch: 0, batch: 11500, avg loss: 3.526150, normalized loss: 2.150410, ppl: 33.992844, speed: 6.31 step/s
[2019-09-05 16:43:36,640 INFO train.py:381] step_idx: 11600, epoch: 0, batch: 11600, avg loss: 4.486849, normalized loss: 3.111109, ppl: 88.841095, speed: 6.31 step/s
[2019-09-05 16:43:52,643 INFO train.py:381] step_idx: 11700, epoch: 0, batch: 11700, avg loss: 3.743704, normalized loss: 2.367963, ppl: 42.254192, speed: 6.25 step/s
[2019-09-05 16:44:08,416 INFO train.py:381] step_idx: 11800, epoch: 0, batch: 11800, avg loss: 4.408619, normalized loss: 3.032879, ppl: 82.155922, speed: 6.34 step/s
[2019-09-05 16:44:24,221 INFO train.py:381] step_idx: 11900, epoch: 0, batch: 11900, avg loss: 3.673965, normalized loss: 2.298225, ppl: 39.407856, speed: 6.33 step/s
[2019-09-05 16:44:39,971 INFO train.py:381] step_idx: 12000, epoch: 0, batch: 12000, avg loss: 3.559720, normalized loss: 2.183979, ppl: 35.153336, speed: 6.35 step/s
[2019-09-05 16:44:55,768 INFO train.py:381] step_idx: 12100, epoch: 0, batch: 12100, avg loss: 4.457419, normalized loss: 3.081679, ppl: 86.264610, speed: 6.33 step/s
[2019-09-05 16:45:11,558 INFO train.py:381] step_idx: 12200, epoch: 0, batch: 12200, avg loss: 3.557376, normalized loss: 2.181635, ppl: 35.071037, speed: 6.33 step/s
[2019-09-05 16:45:27,352 INFO train.py:381] step_idx: 12300, epoch: 0, batch: 12300, avg loss: 4.488514, normalized loss: 3.112774, ppl: 88.989105, speed: 6.33 step/s
[2019-09-05 16:45:43,424 INFO train.py:381] step_idx: 12400, epoch: 0, batch: 12400, avg loss: 4.505509, normalized loss: 3.129769, ppl: 90.514397, speed: 6.22 step/s
[2019-09-05 16:45:59,195 INFO train.py:381] step_idx: 12500, epoch: 0, batch: 12500, avg loss: 3.975097, normalized loss: 2.599357, ppl: 53.255291, speed: 6.34 step/s
[2019-09-05 16:46:14,954 INFO train.py:381] step_idx: 12600, epoch: 0, batch: 12600, avg loss: 3.900831, normalized loss: 2.525091, ppl: 49.443542, speed: 6.35 step/s
[2019-09-05 16:46:30,799 INFO train.py:381] step_idx: 12700, epoch: 0, batch: 12700, avg loss: 3.989999, normalized loss: 2.614259, ppl: 54.054825, speed: 6.31 step/s
[2019-09-05 16:46:46,674 INFO train.py:381] step_idx: 12800, epoch: 0, batch: 12800, avg loss: 3.497594, normalized loss: 2.121854, ppl: 33.035885, speed: 6.30 step/s
[2019-09-05 16:47:02,484 INFO train.py:381] step_idx: 12900, epoch: 0, batch: 12900, avg loss: 3.074656, normalized loss: 1.698916, ppl: 21.642435, speed: 6.33 step/s
[2019-09-05 16:47:18,225 INFO train.py:381] step_idx: 13000, epoch: 0, batch: 13000, avg loss: 3.461395, normalized loss: 2.085655, ppl: 31.861393, speed: 6.35 step/s
[2019-09-05 16:47:34,194 INFO train.py:381] step_idx: 13100, epoch: 0, batch: 13100, avg loss: 4.184169, normalized loss: 2.808429, ppl: 65.638924, speed: 6.26 step/s
[2019-09-05 16:47:50,080 INFO train.py:381] step_idx: 13200, epoch: 0, batch: 13200, avg loss: 4.442073, normalized loss: 3.066333, ppl: 84.950851, speed: 6.29 step/s
[2019-09-05 16:48:05,863 INFO train.py:381] step_idx: 13300, epoch: 0, batch: 13300, avg loss: 4.630250, normalized loss: 3.254510, ppl: 102.539696, speed: 6.34 step/s
[2019-09-05 16:48:21,846 INFO train.py:381] step_idx: 13400, epoch: 0, batch: 13400, avg loss: 3.766955, normalized loss: 2.391215, ppl: 43.248169, speed: 6.26 step/s
[2019-09-05 16:48:37,725 INFO train.py:381] step_idx: 13500, epoch: 0, batch: 13500, avg loss: 3.976989, normalized loss: 2.601249, ppl: 53.356140, speed: 6.30 step/s
[2019-09-05 16:48:53,563 INFO train.py:381] step_idx: 13600, epoch: 0, batch: 13600, avg loss: 3.967489, normalized loss: 2.591748, ppl: 52.851643, speed: 6.31 step/s
[2019-09-05 16:49:09,388 INFO train.py:381] step_idx: 13700, epoch: 0, batch: 13700, avg loss: 4.166954, normalized loss: 2.791213, ppl: 64.518600, speed: 6.32 step/s
[2019-09-05 16:49:25,177 INFO train.py:381] step_idx: 13800, epoch: 0, batch: 13800, avg loss: 3.623917, normalized loss: 2.248176, ppl: 37.484093, speed: 6.33 step/s
[2019-09-05 16:49:40,925 INFO train.py:381] step_idx: 13900, epoch: 0, batch: 13900, avg loss: 3.945514, normalized loss: 2.569774, ppl: 51.702904, speed: 6.35 step/s
[2019-09-05 16:49:56,683 INFO train.py:381] step_idx: 14000, epoch: 0, batch: 14000, avg loss: 4.418413, normalized loss: 3.042673, ppl: 82.964531, speed: 6.35 step/s
[2019-09-05 16:50:12,516 INFO train.py:381] step_idx: 14100, epoch: 0, batch: 14100, avg loss: 3.971454, normalized loss: 2.595714, ppl: 53.061634, speed: 6.32 step/s
[2019-09-05 16:50:28,511 INFO train.py:381] step_idx: 14200, epoch: 0, batch: 14200, avg loss: 4.465334, normalized loss: 3.089594, ppl: 86.950104, speed: 6.25 step/s
[2019-09-05 16:50:44,420 INFO train.py:381] step_idx: 14300, epoch: 0, batch: 14300, avg loss: 3.604965, normalized loss: 2.229225, ppl: 36.780403, speed: 6.29 step/s
[2019-09-05 16:51:00,245 INFO train.py:381] step_idx: 14400, epoch: 0, batch: 14400, avg loss: 3.812929, normalized loss: 2.437189, ppl: 45.282898, speed: 6.32 step/s
[2019-09-05 16:51:16,111 INFO train.py:381] step_idx: 14500, epoch: 0, batch: 14500, avg loss: 3.931731, normalized loss: 2.555991, ppl: 50.995186, speed: 6.30 step/s
[2019-09-05 16:51:32,078 INFO train.py:381] step_idx: 14600, epoch: 0, batch: 14600, avg loss: 4.105995, normalized loss: 2.730254, ppl: 60.703094, speed: 6.26 step/s
[2019-09-05 16:51:48,047 INFO train.py:381] step_idx: 14700, epoch: 0, batch: 14700, avg loss: 3.640428, normalized loss: 2.264688, ppl: 38.108135, speed: 6.26 step/s
[2019-09-05 16:52:03,865 INFO train.py:381] step_idx: 14800, epoch: 0, batch: 14800, avg loss: 3.436378, normalized loss: 2.060638, ppl: 31.074219, speed: 6.32 step/s
[2019-09-05 16:52:19,679 INFO train.py:381] step_idx: 14900, epoch: 0, batch: 14900, avg loss: 3.696231, normalized loss: 2.320491, ppl: 40.295151, speed: 6.32 step/s
[2019-09-05 16:52:35,493 INFO train.py:381] step_idx: 15000, epoch: 0, batch: 15000, avg loss: 3.339145, normalized loss: 1.963405, ppl: 28.195021, speed: 6.32 step/s
[2019-09-05 16:52:51,277 INFO train.py:381] step_idx: 15100, epoch: 0, batch: 15100, avg loss: 3.807851, normalized loss: 2.432111, ppl: 45.053528, speed: 6.34 step/s
[2019-09-05 16:53:07,148 INFO train.py:381] step_idx: 15200, epoch: 0, batch: 15200, avg loss: 4.188531, normalized loss: 2.812791, ppl: 65.925873, speed: 6.30 step/s
[2019-09-05 16:53:23,060 INFO train.py:381] step_idx: 15300, epoch: 0, batch: 15300, avg loss: 3.793076, normalized loss: 2.417335, ppl: 44.392723, speed: 6.28 step/s
[2019-09-05 16:53:39,185 INFO train.py:381] step_idx: 15400, epoch: 0, batch: 15400, avg loss: 4.013710, normalized loss: 2.637970, ppl: 55.351845, speed: 6.20 step/s
[2019-09-05 16:53:55,265 INFO train.py:381] step_idx: 15500, epoch: 0, batch: 15500, avg loss: 4.550488, normalized loss: 3.174748, ppl: 94.678642, speed: 6.22 step/s
[2019-09-05 16:54:11,164 INFO train.py:381] step_idx: 15600, epoch: 0, batch: 15600, avg loss: 3.797143, normalized loss: 2.421402, ppl: 44.573635, speed: 6.29 step/s
[2019-09-05 16:54:26,952 INFO train.py:381] step_idx: 15700, epoch: 0, batch: 15700, avg loss: 4.050385, normalized loss: 2.674645, ppl: 57.419559, speed: 6.33 step/s
[2019-09-05 16:54:42,922 INFO train.py:381] step_idx: 15800, epoch: 0, batch: 15800, avg loss: 3.952860, normalized loss: 2.577120, ppl: 52.084133, speed: 6.26 step/s
[2019-09-05 16:54:58,758 INFO train.py:381] step_idx: 15900, epoch: 0, batch: 15900, avg loss: 4.094244, normalized loss: 2.718504, ppl: 59.993996, speed: 6.32 step/s
[2019-09-05 16:55:14,603 INFO train.py:381] step_idx: 16000, epoch: 0, batch: 16000, avg loss: 3.783485, normalized loss: 2.407745, ppl: 43.969025, speed: 6.31 step/s
[2019-09-05 16:55:30,475 INFO train.py:381] step_idx: 16100, epoch: 0, batch: 16100, avg loss: 3.909370, normalized loss: 2.533630, ppl: 49.867546, speed: 6.30 step/s
[2019-09-05 16:55:46,424 INFO train.py:381] step_idx: 16200, epoch: 0, batch: 16200, avg loss: 3.508461, normalized loss: 2.132720, ppl: 33.396816, speed: 6.27 step/s
[2019-09-05 16:56:02,299 INFO train.py:381] step_idx: 16300, epoch: 0, batch: 16300, avg loss: 4.103420, normalized loss: 2.727679, ppl: 60.546993, speed: 6.30 step/s
[2019-09-05 16:56:18,179 INFO train.py:381] step_idx: 16400, epoch: 0, batch: 16400, avg loss: 4.270930, normalized loss: 2.895190, ppl: 71.588203, speed: 6.30 step/s
[2019-09-05 16:56:34,027 INFO train.py:381] step_idx: 16500, epoch: 0, batch: 16500, avg loss: 4.148382, normalized loss: 2.772641, ppl: 63.331429, speed: 6.31 step/s
[2019-09-05 16:56:50,071 INFO train.py:381] step_idx: 16600, epoch: 0, batch: 16600, avg loss: 3.825178, normalized loss: 2.449438, ppl: 45.840977, speed: 6.23 step/s
[2019-09-05 16:57:05,930 INFO train.py:381] step_idx: 16700, epoch: 0, batch: 16700, avg loss: 3.397982, normalized loss: 2.022241, ppl: 29.903683, speed: 6.31 step/s
[2019-09-05 16:57:21,843 INFO train.py:381] step_idx: 16800, epoch: 0, batch: 16800, avg loss: 4.203436, normalized loss: 2.827696, ppl: 66.915855, speed: 6.28 step/s
[2019-09-05 16:57:37,722 INFO train.py:381] step_idx: 16900, epoch: 0, batch: 16900, avg loss: 3.616197, normalized loss: 2.240457, ppl: 37.195839, speed: 6.30 step/s
[2019-09-05 16:57:53,595 INFO train.py:381] step_idx: 17000, epoch: 0, batch: 17000, avg loss: 4.340471, normalized loss: 2.964731, ppl: 76.743698, speed: 6.30 step/s
[2019-09-05 16:58:09,434 INFO train.py:381] step_idx: 17100, epoch: 0, batch: 17100, avg loss: 3.761116, normalized loss: 2.385376, ppl: 42.996384, speed: 6.31 step/s
[2019-09-05 16:58:25,265 INFO train.py:381] step_idx: 17200, epoch: 0, batch: 17200, avg loss: 3.962679, normalized loss: 2.586938, ppl: 52.598030, speed: 6.32 step/s
[2019-09-05 16:58:41,183 INFO train.py:381] step_idx: 17300, epoch: 0, batch: 17300, avg loss: 3.622895, normalized loss: 2.247155, ppl: 37.445827, speed: 6.28 step/s
[2019-09-05 16:58:57,080 INFO train.py:381] step_idx: 17400, epoch: 0, batch: 17400, avg loss: 3.538991, normalized loss: 2.163251, ppl: 34.432167, speed: 6.29 step/s
[2019-09-05 16:59:12,906 INFO train.py:381] step_idx: 17500, epoch: 0, batch: 17500, avg loss: 3.728656, normalized loss: 2.352916, ppl: 41.623131, speed: 6.32 step/s
[2019-09-05 16:59:28,890 INFO train.py:381] step_idx: 17600, epoch: 0, batch: 17600, avg loss: 3.360714, normalized loss: 1.984974, ppl: 28.809753, speed: 6.26 step/s
[2019-09-05 16:59:44,853 INFO train.py:381] step_idx: 17700, epoch: 0, batch: 17700, avg loss: 3.765436, normalized loss: 2.389696, ppl: 43.182526, speed: 6.26 step/s
[2019-09-05 17:00:00,608 INFO train.py:381] step_idx: 17800, epoch: 0, batch: 17800, avg loss: 3.850994, normalized loss: 2.475254, ppl: 47.039803, speed: 6.35 step/s
[2019-09-05 17:00:16,425 INFO train.py:381] step_idx: 17900, epoch: 0, batch: 17900, avg loss: 3.832510, normalized loss: 2.456770, ppl: 46.178310, speed: 6.32 step/s
[2019-09-05 17:00:32,343 INFO train.py:381] step_idx: 18000, epoch: 0, batch: 18000, avg loss: 4.167870, normalized loss: 2.792130, ppl: 64.577759, speed: 6.28 step/s
[2019-09-05 17:00:48,170 INFO train.py:381] step_idx: 18100, epoch: 0, batch: 18100, avg loss: 3.728332, normalized loss: 2.352591, ppl: 41.609627, speed: 6.32 step/s
[2019-09-05 17:01:03,962 INFO train.py:381] step_idx: 18200, epoch: 0, batch: 18200, avg loss: 3.643733, normalized loss: 2.267992, ppl: 38.234283, speed: 6.33 step/s
[2019-09-05 17:01:19,778 INFO train.py:381] step_idx: 18300, epoch: 0, batch: 18300, avg loss: 3.645209, normalized loss: 2.269468, ppl: 38.290760, speed: 6.32 step/s
[2019-09-05 17:01:35,503 INFO train.py:381] step_idx: 18400, epoch: 0, batch: 18400, avg loss: 3.539849, normalized loss: 2.164108, ppl: 34.461700, speed: 6.36 step/s
[2019-09-05 17:01:51,312 INFO train.py:381] step_idx: 18500, epoch: 0, batch: 18500, avg loss: 4.130910, normalized loss: 2.755170, ppl: 62.234524, speed: 6.33 step/s
[2019-09-05 17:02:07,170 INFO train.py:381] step_idx: 18600, epoch: 0, batch: 18600, avg loss: 3.790271, normalized loss: 2.414531, ppl: 44.268410, speed: 6.31 step/s
[2019-09-05 17:02:23,047 INFO train.py:381] step_idx: 18700, epoch: 0, batch: 18700, avg loss: 3.749206, normalized loss: 2.373466, ppl: 42.487347, speed: 6.30 step/s
[2019-09-05 17:02:38,948 INFO train.py:381] step_idx: 18800, epoch: 0, batch: 18800, avg loss: 4.241145, normalized loss: 2.865405, ppl: 69.487381, speed: 6.29 step/s
[2019-09-05 17:02:54,851 INFO train.py:381] step_idx: 18900, epoch: 0, batch: 18900, avg loss: 3.374077, normalized loss: 1.998337, ppl: 29.197332, speed: 6.29 step/s
[2019-09-05 17:03:10,646 INFO train.py:381] step_idx: 19000, epoch: 0, batch: 19000, avg loss: 4.071568, normalized loss: 2.695828, ppl: 58.648880, speed: 6.33 step/s
[2019-09-05 17:03:26,440 INFO train.py:381] step_idx: 19100, epoch: 0, batch: 19100, avg loss: 4.016469, normalized loss: 2.640729, ppl: 55.504772, speed: 6.33 step/s
[2019-09-05 17:03:42,306 INFO train.py:381] step_idx: 19200, epoch: 0, batch: 19200, avg loss: 3.889569, normalized loss: 2.513829, ppl: 48.889801, speed: 6.30 step/s
[2019-09-05 17:03:58,424 INFO train.py:381] step_idx: 19300, epoch: 0, batch: 19300, avg loss: 4.036683, normalized loss: 2.660943, ppl: 56.638168, speed: 6.20 step/s
[2019-09-05 17:04:14,490 INFO train.py:381] step_idx: 19400, epoch: 0, batch: 19400, avg loss: 4.561982, normalized loss: 3.186241, ppl: 95.773087, speed: 6.22 step/s
[2019-09-05 17:04:30,282 INFO train.py:381] step_idx: 19500, epoch: 0, batch: 19500, avg loss: 3.578715, normalized loss: 2.202975, ppl: 35.827469, speed: 6.33 step/s
[2019-09-05 17:04:46,110 INFO train.py:381] step_idx: 19600, epoch: 0, batch: 19600, avg loss: 3.527187, normalized loss: 2.151446, ppl: 34.028099, speed: 6.32 step/s
[2019-09-05 17:05:01,898 INFO train.py:381] step_idx: 19700, epoch: 0, batch: 19700, avg loss: 3.935607, normalized loss: 2.559867, ppl: 51.193237, speed: 6.33 step/s
[2019-09-05 17:05:17,691 INFO train.py:381] step_idx: 19800, epoch: 0, batch: 19800, avg loss: 3.755292, normalized loss: 2.379552, ppl: 42.746708, speed: 6.33 step/s
[2019-09-05 17:05:33,480 INFO train.py:381] step_idx: 19900, epoch: 0, batch: 19900, avg loss: 3.985263, normalized loss: 2.609523, ppl: 53.799442, speed: 6.33 step/s
[2019-09-05 17:05:49,314 INFO train.py:381] step_idx: 20000, epoch: 0, batch: 20000, avg loss: 3.893937, normalized loss: 2.518197, ppl: 49.103821, speed: 6.32 step/s
[2019-09-05 17:06:05,092 INFO train.py:381] step_idx: 20100, epoch: 0, batch: 20100, avg loss: 4.029266, normalized loss: 2.653526, ppl: 56.219624, speed: 6.34 step/s
[2019-09-05 17:06:20,951 INFO train.py:381] step_idx: 20200, epoch: 0, batch: 20200, avg loss: 4.583488, normalized loss: 3.207748, ppl: 97.855118, speed: 6.31 step/s
^C
In[17]
# With the trained base_model and the preprocessed data, run the following to translate; the results are saved to the file given by --output_file
# Each line of that file is the target-language translation of the corresponding line in test_file_pattern, in BPE-encoded form; decode it as needed (see the sketch after this cell)
# Run python infer.py --help for more details
!python -u infer.py \
--src_vocab_fpath gen_data/wmt16_ende_data_bpe_clean/vocab_all.bpe.32000 \
--trg_vocab_fpath gen_data/wmt16_ende_data_bpe_clean/vocab_all.bpe.32000 \
--special_token '<s>' '<e>' '<unk>' \
--test_file_pattern gen_data/wmt16_ende_data_bpe_clean/newstest2014.tok.bpe.32000.en-de \
--token_delimiter ' ' \
--batch_size 32 \
--output_file "predict.txt" \
model_path trained_ckpts/latest.checkpoint \
beam_size 5 \
max_out_len 255
memory_optimize is deprecated. Use CompiledProgram and Executor
W0905 17:08:27.187067   690 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
W0905 17:08:27.191167   690 device_context.cc:267] device: 0, cuDNN Version: 7.3.
I0905 17:08:27.416724   690 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0905 17:08:27.425316   690 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
The translation result has been output to predict.txt
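
Since the output is BPE-encoded, the usual post-processing is to strip the '@@ ' continuation markers. A minimal sketch (the output file name predict.tok.txt is hypothetical):

# Remove the BPE continuation markers from the translation output.
with open('predict.txt', encoding='utf-8') as fin, \
        open('predict.tok.txt', 'w', encoding='utf-8') as fout:
    for line in fin:
        fout.write(line.replace('@@ ', ''))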
In[14]
# Freeze the trained model parameters
!python freeze.py
[u'src_word', u'src_pos', u'src_slf_attn_bias', u'trg_word', u'init_score', u'init_idx', u'trg_src_attn_bias']
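
For reference, "freezing" in fluid 1.x generally means saving the inference program together with its trained parameters via fluid.io.save_inference_model. The toy sketch below only shows the idea on a stand-in network; it is not the project's freeze.py (whose actual feed list is printed above), and the directory name frozen_toy_model is hypothetical:

import paddle.fluid as fluid

# A stand-in graph; freeze.py does the equivalent for the Transformer
# inference graph.
x = fluid.layers.data(name='x', shape=[8], dtype='float32')
y = fluid.layers.fc(input=x, size=4)

exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())

# Persist the program structure and parameters together, so later
# inference (as in freeze_infer.py) needs no Python-side model definition.
fluid.io.save_inference_model(
    dirname='frozen_toy_model',
    feeded_var_names=['x'],
    target_vars=[y],
    executor=exe)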
In[13]
# Predict with the frozen model
# Since the freeze step reads data synchronously, this inference requires the use_py_reader parameter to be set to False
# Each line of the output file is the target-language translation of the corresponding line in test_file_pattern, in BPE-encoded form; decode it as needed
!python -u freeze_infer.py \
--src_vocab_fpath gen_data/wmt16_ende_data_bpe_clean/vocab_all.bpe.32000 \
--trg_vocab_fpath gen_data/wmt16_ende_data_bpe_clean/vocab_all.bpe.32000 \
--special_token '<s>' '<e>' '<unk>' \
--test_file_pattern gen_data/wmt16_ende_data_bpe_clean/newstest2014.tok.bpe.32000.en-de \
--token_delimiter ' ' \
--batch_size 32 \
model_path trained_ckpts/latest.checkpoint \
beam_size 5 \
max_out_len 255
W0905 16:03:17.123710   504 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
W0905 16:03:17.127701   504 device_context.cc:267] device: 0, cuDNN Version: 7.3.
memory_optimize is deprecated. Use CompiledProgram and Executor
I0905 16:03:18.605345   504 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0905 16:03:18.612524   504 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
The translation result has been output to predict.txt

Click the link to try this project hands-on on AI Studio: https://aistudio.baidu.com/aistudio/projectdetail/122281

Download and installation commands

## CPU installation command
pip install -f https://paddlepaddle.org.cn/pip/oschina/cpu paddlepaddle

## GPU installation command
pip install -f https://paddlepaddle.org.cn/pip/oschina/gpu paddlepaddle-gpu

>> Visit the PaddlePaddle official website for more information
