
Understanding TimeDistributed and TimeDistributedDense in Keras


From the official code:

class TimeDistributed(Wrapper):
    """This wrapper applies a layer to every temporal slice of an input.
    The input should be at least 3D, and the dimension of index one
    will be considered to be the temporal dimension.
    Consider a batch of 32 samples,
    where each sample is a sequence of 10 vectors of 16 dimensions.
    The batch input shape of the layer is then `(32, 10, 16)`,
    and the `input_shape`, not including the samples dimension, is `(10, 16)`.
    You can then use `TimeDistributed` to apply a `Dense` layer
    to each of the 10 timesteps, independently:
    ```python
        # as the first layer in a model
        model = Sequential()
        model.add(TimeDistributed(Dense(8), input_shape=(10, 16)))
        # now model.output_shape == (None, 10, 8)
    ```
    The output will then have shape `(32, 10, 8)`.
    In subsequent layers, there is no need for the `input_shape`:
    ```python
        model.add(TimeDistributed(Dense(32)))
        # now model.output_shape == (None, 10, 32)
    ```
    The output will then have shape `(32, 10, 32)`.
    `TimeDistributed` can be used with arbitrary layers, not just `Dense`,
    for instance with a `Conv2D` layer:
    ```python
        model = Sequential()
        model.add(TimeDistributed(Conv2D(64, (3, 3)),
                                  input_shape=(10, 299, 299, 3)))
    ```
    # Arguments
        layer: a layer instance.
    """

So, basically, TimeDistributedDense was introduced first, in early versions of Keras, in order to apply a Dense layer stepwise to sequences. TimeDistributed is a Keras wrapper which makes it possible to take any static (non-sequential) layer and apply it in a sequential manner. An example of such usage might be applying a pretrained convolutional layer to a short video clip via TimeDistributed(conv_layer), where conv_layer is applied to each frame of the clip. It produces a sequence of outputs which can then be consumed by a subsequent recurrent or TimeDistributed layer.
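
A minimal sketch of that video-clip usage, assuming a toy clip of 10 frames of 224×224 RGB and made-up layer sizes (none of these numbers come from the original text):

```python
from keras.models import Sequential
from keras.layers import TimeDistributed, Conv2D, GlobalAveragePooling2D, LSTM, Dense

model = Sequential()
# The same Conv2D weights are applied independently to each of the 10 frames.
model.add(TimeDistributed(Conv2D(64, (3, 3), activation='relu'),
                          input_shape=(10, 224, 224, 3)))
# Collapse each frame's spatial dimensions into one feature vector per timestep.
model.add(TimeDistributed(GlobalAveragePooling2D()))
# The resulting sequence of per-frame features can be consumed by a recurrent layer.
model.add(LSTM(32))
model.add(Dense(5, activation='softmax'))  # e.g. a 5-class clip classifier
```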

It's good to know that usage of TimeDistributedDense is deprecated and it's better to use TimeDistributed(Dense).

TimeDistributed 

RNNs are capable of a number of different types of input/output combinations, as seen below.

[Figure: RNN input/output types]

The TimeDistributedDense layer allows you to build models with the one-to-many and many-to-many architectures. This is because the output function for each of the "many" outputs must be the same function applied to each timestep. The TimeDistributedDense layer lets you apply that Dense function across every output over time. This matters because it needs to be the same Dense function applied at every timestep.

If you did not use this, you would only have one final output, so you would use a normal Dense layer. That means you are building either a one-to-one or a many-to-one network, since there will only be one Dense layer for the output.
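
As a rough illustration (the layer sizes and input shape below are assumed toy values), the difference comes down to whether the recurrent layer returns its full sequence and whether the Dense layer is wrapped:

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

timesteps, features = 10, 16  # assumed toy dimensions

# Many-to-many: return_sequences=True + TimeDistributed(Dense)
# -> one output per timestep, output shape (None, 10, 1)
seq_model = Sequential()
seq_model.add(LSTM(32, return_sequences=True, input_shape=(timesteps, features)))
seq_model.add(TimeDistributed(Dense(1)))

# Many-to-one: return_sequences=False (the default) + a plain Dense
# -> a single output for the whole sequence, output shape (None, 1)
one_model = Sequential()
one_model.add(LSTM(32, input_shape=(timesteps, features)))
one_model.add(Dense(1))
```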

 ======================================================

As fchollet said,
TimeDistributedDense applies the same Dense (fully-connected) operation to every timestep of a 3D tensor.

But I think you still don't catch the point. The most common scenario for using TimeDistributedDense is using a recurrent NN for a tagging task, e.g. POS labeling or slot filling.

In this kind of task:
For each sample, the input is a sequence (a1, a2, a3, a4, ..., aN) and the output is a sequence (b1, b2, b3, b4, ..., bN) of the same length; bi can be viewed as the label of ai.
Push a1 into the recurrent NN to get output b1. Then push a2, together with the hidden output from a1, to get b2, and so on.

If you want to model this in Keras, you just need to use a TimeDistributedDense after an RNN or LSTM layer (with return_sequences=True) so that the cost function is calculated over the outputs at all timesteps. If you don't use TimeDistributedDense and set return_sequences=False on the RNN, then the cost is calculated on the last timestep's output and you can only get the last bN.

I am also new to Keras, but I am trying to use it for sequence labeling and I find this can only be done with TimeDistributedDense. If I got something wrong, please correct me.
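
A minimal sketch of such a tagging model, using the modern TimeDistributed(Dense) form; the vocabulary size, tag count and sequence length are made-up placeholders:

```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, TimeDistributed, Dense

vocab_size, num_tags, maxlen = 5000, 20, 50  # assumed toy sizes

model = Sequential()
model.add(Embedding(vocab_size, 64, input_length=maxlen))
# return_sequences=True so the LSTM emits one hidden output per input token (a1..aN).
model.add(LSTM(128, return_sequences=True))
# The same Dense classifier is applied at every timestep to predict each label bi.
model.add(TimeDistributed(Dense(num_tags, activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer='adam')
# model.output_shape == (None, 50, 20): one tag distribution per token.
```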

 ======================================================

It's quite easy to understand. Let's not think in terms of tensors and such for a second.

It all depends on the return_sequences parameter of the LSTM layer.
If return_sequences=False (the default), we get the LSTM output corresponding only to THE LAST TIMESTEP.
Now, applying model.add(Dense(...)) connects only the LSTM output at the last timestep to the Dense layer. (This approach encodes the overall sequence into a compact vector:
given a sequence of 50 words, my LSTM will produce only a single output.)

Q) WHEN NOT TO USE TIMEDISTRIBUTED?
A) In my experience, for encoder-decoder models.
If you want to squeeze all your input information into a single vector, you DON'T use TimeDistributed.
Only the final unrolled step of the LSTM layer will be the output. This final state holds the compact information of the whole input sequence, which is useful for tasks like classification, summarization, etc.
----------------------------------------------------------- However! -----------------------------------------------------------------------

If return_sequences is set to True, the LSTM outputs at every timestep. So I must use TimeDistributed to ensure that the Dense layer is connected to the LSTM output at each timestep; otherwise an error occurs!
Also keep in mind that, just like the LSTM is unrolled, so is the Dense layer, i.e. the Dense layer at each timestep is the same one; it's not as if there are 50 different Dense layers for 50 timesteps.
There's nothing to get confused about.
This time the model will generate a sequence whose length matches the number of timesteps. So, given 50 input words, the LSTM will output 50 output words.

Q) WHEN TO USE TIMEDISTRIBUTED?
A) For word-generation tasks (like generating Shakespeare), where, given a sequence of words, we train the model to predict the next set of words.
EXAMPLE: if the nth training input to the LSTM network is 'I want to' and the output of the network is 'want to eat', then each of the words ['want', 'to', 'eat'] is the LSTM output at one timestep.
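
To make that input/target alignment concrete, here is a tiny sketch; the vocabulary and word indices are invented for illustration:

```python
import numpy as np

# Hypothetical toy vocabulary and one training sentence.
word_to_id = {'I': 0, 'want': 1, 'to': 2, 'eat': 3}
sentence = ['I', 'want', 'to', 'eat']
ids = np.array([word_to_id[w] for w in sentence])

# Next-word prediction: the target at each timestep is the input shifted by one.
x = ids[:-1]  # [0, 1, 2] -> 'I want to'
y = ids[1:]   # [1, 2, 3] -> 'want to eat'
# With return_sequences=True + TimeDistributed(Dense(vocab_size)), the model emits
# one prediction per timestep, so x and y have the same length.
```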

 ======================================================

Let's say you have time-series data with N rows and 700 columns which you want to feed to a SimpleRNN(200, return_sequences=True) layer in Keras. Before you feed it to the RNN, you need to reshape the data to a 3D tensor, so it becomes N×700×1.
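
A minimal sketch of that reshape and layer setup (N and the random data below are placeholders):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import SimpleRNN

# Assumed toy data: N samples, each a time series of 700 values.
N = 8
data = np.random.rand(N, 700)

# The RNN expects a 3D tensor (samples, timesteps, channels), so add a channel axis.
data_3d = data.reshape(N, 700, 1)  # shape: (N, 700, 1)

model = Sequential()
model.add(SimpleRNN(200, return_sequences=True, input_shape=(700, 1)))
# The output is now (None, 700, 200): one 200-dimensional vector per timestep.
```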

 

 The image is taken from https://colah.github.io/posts/2015-08-Understanding-LSTMs

In the RNN, your columns (the "700 columns") are the timesteps. Your data is processed from t=1 to t=700. After feeding the data to the RNN, it now has 700 outputs, h1 to h700, not h1 to h200. Remember that the shape of your data is now N×700×200, which is samples (the rows) × timesteps (the columns) × channels.

And then, when you apply a TimeDistributedDense, you're applying a Dense layer on each timestep, which means you're applying a Dense layer on each of h1, h2, ..., h700 respectively. In other words, you're actually applying the fully-connected operation on each of the channel vectors (the "200" one) separately, from h1 to h700: the 1st "1×1×200" up to the 700th "1×1×200".

Why are we doing this? Because you don't want to flatten the RNN output.

Why not flatten the RNN output? Because you want to keep each timestep's values separate.

Why keep each timestep's values separate? Because:

  • you only want the values to interact within their own timestep, and
  • you don't want a random interaction between different timesteps and channels.

A TimeDistributed(Dense) layer gives exactly this per-timestep behaviour, as sketched below.
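
A small sketch of the contrast, continuing the assumed N×700×1 example from above:

```python
from keras.models import Sequential
from keras.layers import SimpleRNN, TimeDistributed, Dense

model = Sequential()
model.add(SimpleRNN(200, return_sequences=True, input_shape=(700, 1)))
# TimeDistributed(Dense(1)) applies the same 200 -> 1 fully-connected map to each of
# the 700 timesteps separately: output shape (None, 700, 1).
model.add(TimeDistributed(Dense(1)))

# By contrast, flattening first would mix all 700*200 values into one long vector
# before the Dense layer, losing the per-timestep separation:
#     model.add(Flatten()); model.add(Dense(1))   # output shape (None, 1)
```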

 

 

References:

https://datascience.stackexchange.com/questions/10836/the-difference-between-dense-and-timedistributeddense-of-keras

https://github.com/keras-team/keras/blob/master/keras/layers/wrappers.py#L43

https://github.com/keras-team/keras/issues/1029

https://stackoverflow.com/questions/42398645/timedistributed-vs-timedistributeddense-keras
