Building an RNN in TensorFlow from scratch (full code included)!

tensorflow RNN

Author: 三川

2017/04/28 16:23

What is an RNN?

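A recurrent neural network (RNN) is useful when data is processed in sequence and the order of the data points matters. What's more, the sequence can be of arbitrary length.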

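The most straightforward example is probably a time series of numbers, where the task is to predict the next value from the previous ones. At each time step, the RNN's input is the current value together with a state vector that represents what the network has "seen" at earlier time steps. This state vector is the RNN's encoded memory, and it is initialized to zero.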


Illustration of how an RNN processes sequential data
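To make the idea of a state vector concrete, here is a minimal NumPy sketch (not from the original article) of how the state is threaded through a sequence. It assumes the same tanh update used later in this tutorial, with small random weights; `rnn_forward`, `W_in` and `W_state` are hypothetical names chosen for illustration.

```python
import numpy as np

def rnn_forward(inputs, W_in, W_state, b):
    """Run a vanilla RNN over a 1-D sequence of scalars.

    inputs:  shape (T,), one scalar per time step
    W_in:    shape (1, H), input-to-state weights
    W_state: shape (H, H), state-to-state weights
    b:       shape (H,), bias
    Returns a list of T states, each of shape (H,).
    """
    state = np.zeros(W_state.shape[0])  # the memory starts at zero
    states = []
    for x_t in inputs:
        # the new state depends on the current input and on everything seen so far
        state = np.tanh(x_t * W_in[0] + state @ W_state + b)
        states.append(state)
    return states

rng = np.random.default_rng(0)
H = 4
W_in = rng.normal(size=(1, H))
W_state = rng.normal(size=(H, H))
b = np.zeros(H)

states = rnn_forward(np.array([1.0, 0.0, 1.0, 1.0]), W_in, W_state, b)
print(len(states), states[-1].shape)  # 4 (4,)
```

Because the state starts at zero, the first state depends only on the first input; every later state mixes in the history through `W_state`.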

Setup

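We will build a simple Echo-RNN that remembers the input data and echoes it back a few time steps later. First, let's set some constants we'll need; their meaning is explained below.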

from __future__ import print_function, division
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

num_epochs = 100
total_series_length = 50000
truncated_backprop_length = 15
state_size = 4
num_classes = 2
echo_step = 3
batch_size = 5
num_batches = total_series_length//batch_size//truncated_backprop_length
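With these settings the data layout works out as follows. The short computation below (an illustration, not part of the original program) shows how many time steps each row holds and where `num_batches` comes from:

```python
total_series_length = 50000
truncated_backprop_length = 15
batch_size = 5

# the series is split into batch_size rows of equal length
steps_per_row = total_series_length // batch_size
# each training step consumes truncated_backprop_length columns of every row
num_batches = total_series_length // batch_size // truncated_backprop_length
# columns at the end of each row that no training step reaches
leftover = steps_per_row - num_batches * truncated_backprop_length

print(steps_per_row, num_batches, leftover)  # 10000 666 10
```

So each epoch runs 666 training steps, and the last 10 columns of every row are never used.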

Generating data

Now we generate the training data. The input is essentially a random binary vector. The output is the "echo" of the input, shifted echo_step steps to the right.

def generateData():
    x = np.array(np.random.choice(2, total_series_length, p=[0.5, 0.5]))
    y = np.roll(x, echo_step)
    y[0:echo_step] = 0

    x = x.reshape((batch_size, -1))  # The first index changing slowest, subseries as rows
    y = y.reshape((batch_size, -1))

    return (x, y)
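To see what `np.roll` does here, the toy example below (a hypothetical 10-step series with `echo_step = 3`, not the tutorial's real data) builds the target exactly the way `generateData` does, before the reshape:

```python
import numpy as np

echo_step = 3
x = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])

y = np.roll(x, echo_step)   # shift right by 3; the last 3 values wrap around to the front
y[0:echo_step] = 0          # overwrite the wrapped-around values with 0

print(y)  # [0 0 0 1 0 1 1 0 0 1]
```

For every `t >= echo_step`, `y[t]` equals `x[t - echo_step]`, which is exactly the behavior the network has to learn.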

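Note the data-reshaping step, which packs the data into a matrix with batch_size rows. A neural network is trained by approximating the gradient of the loss function with respect to the neuron weights, using only a small subset of the data at a time, a so-called mini-batch. The reshaping puts the whole dataset into a matrix, which is later split into these mini-batches.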

Illustration of the reshaped data matrix. Curved arrows mark adjacent time steps that wrap onto the next row. Light gray represents 0, dark gray represents 1.
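A quick way to see this layout is to reshape a series of time indices instead of random bits. In this illustration (sizes chosen arbitrarily, not the tutorial's values) each row is a contiguous subseries, and the last element of one row is followed in time by the first element of the next row:

```python
import numpy as np

batch_size = 3
series = np.arange(12)      # stand-in for the data: each value is its own time index
matrix = series.reshape((batch_size, -1))

print(matrix)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
```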

Building the computational graph

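TensorFlow works by first building a computational graph that specifies which operations will be performed. The inputs and outputs of the graph are typically multidimensional arrays, known as tensors. The graph, or parts of it, is then executed iteratively, on a CPU, a GPU, or even on a remote server.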

Variables and placeholders

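The two most basic TensorFlow data structures used in this tutorial are variables and placeholders. On each run, the batch data is fed into placeholders, which are the "starting nodes" of the computational graph. The RNN state output by the previous run is also supplied through a placeholder.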

batchX_placeholder = tf.placeholder(tf.float32, [batch_size, truncated_backprop_length])
batchY_placeholder = tf.placeholder(tf.int32, [batch_size, truncated_backprop_length])

init_state = tf.placeholder(tf.float32, [batch_size, state_size])

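The weights and biases of the network are declared as TensorFlow variables. This makes them persist across runs and lets them be updated incrementally with each batch.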

W = tf.Variable(np.random.rand(state_size+1, state_size), dtype=tf.float32)
b = tf.Variable(np.zeros((1,state_size)), dtype=tf.float32)

W2 = tf.Variable(np.random.rand(state_size, num_classes),dtype=tf.float32)
b2 = tf.Variable(np.zeros((1,num_classes)), dtype=tf.float32)

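The figure below shows the input data matrix, with the current batch, batchX_placeholder, inside the dashed rectangle. As we will see later, this "batch window" slides truncated_backprop_length steps to the right on each run, which is what the arrow indicates. In the example below, batch_size = 3, truncated_backprop_length = 3, and total_series_length = 36. Note that these numbers are for visualization only; the values in the code are different. On some of the data points, the series-order index is shown as a number.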
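The sliding window can be reproduced with the figure's toy sizes. The NumPy sketch below (an illustration, using the same slicing as the training loop later in the tutorial) prints the series indices covered by each window:

```python
import numpy as np

# Toy sizes from the figure, not the values used in the code
batch_size = 3
truncated_backprop_length = 3
total_series_length = 36

indices = np.arange(total_series_length).reshape((batch_size, -1))
num_batches = total_series_length // batch_size // truncated_backprop_length

for batch_idx in range(num_batches):
    start_idx = batch_idx * truncated_backprop_length
    end_idx = start_idx + truncated_backprop_length
    window = indices[:, start_idx:end_idx]   # same slicing as batchX in the training loop
    print(batch_idx, window.tolist())
```

The second window (`batch_idx = 1`) covers indices 3-5, 15-17 and 27-29 counting from 0, which is steps 4-6, 16-18 and 28-30 counting from 1.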


Unpacking

In this step we build the part of the graph that resembles the actual RNN computation. First, we want to split the batch data into adjacent time steps.

# Unpack columns (tf.unstack was called tf.unpack before TensorFlow 1.0)
inputs_series = tf.unstack(batchX_placeholder, axis=1)
labels_series = tf.unstack(batchY_placeholder, axis=1)

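As the figure below shows, this is done by unpacking the columns (axis = 1) of the batch into a Python list. The RNN trains on different parts of the time series simultaneously: in the current batch example, steps 4-6, 16-18 and 28-30. The variables are named with a plural "_series" suffix to emphasize that each one is a list representing a time series with multiple entries at each time step.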


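Illustration of the current batch split into columns. The number on each data point is its order index in the series, and the arrows indicate adjacent time steps.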
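In NumPy terms, unstacking along axis 1 turns a `(batch_size, truncated_backprop_length)` array into a list of `truncated_backprop_length` column vectors, one per time step. A small sketch with a hypothetical 3x3 batch (the figure's window, using 0-based indices); the list comprehension is only a rough stand-in for `tf.unstack`:

```python
import numpy as np

batchX = np.array([[3, 4, 5],
                   [15, 16, 17],
                   [27, 28, 29]])

# Rough NumPy counterpart of tf.unstack(batchX, axis=1): one entry per time step
inputs_series = [batchX[:, t] for t in range(batchX.shape[1])]

print(inputs_series[0])  # [ 3 15 27]
```

Each list element holds one time step for all rows at once, which is why the network advances every row of the batch in lockstep.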

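In our time series, training is done at three places simultaneously (one per row of the batch in the figure's example). This requires keeping three state instances during the forward pass. That has already been taken into account: as you can see, the init_state placeholder has batch_size rows.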

Forward pass

Next, we build the part of the graph that performs the actual RNN computation.

# Forward pass
current_state = init_state
states_series = []
for current_input in inputs_series:
    current_input = tf.reshape(current_input, [batch_size, 1])
    # TensorFlow >= 1.0 signature: tf.concat(values, axis)
    input_and_state_concatenated = tf.concat([current_input, current_state], 1)  # Increasing number of columns

    next_state = tf.tanh(tf.matmul(input_and_state_concatenated, W) + b)  # Broadcasted addition
    states_series.append(next_state)
    current_state = next_state

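Note the concatenation on the tf.concat line. What we actually want to compute is the sum of two affine transforms, current_input * Wa + current_state * Wb, shown in the figure below. By concatenating the two tensors, only one matrix multiplication is needed. The addition of the bias b is broadcast over all samples in the batch.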


Schematic of the matrix computation on the next_state line of the code above; the tanh nonlinearity is omitted.
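The equivalence is easy to check numerically. The sketch below uses NumPy with random values, taking `Wa` and `Wb` as the first row and the remaining rows of `W` (an assumption that matches the input column coming first in the concatenation), and confirms that one product on the concatenated tensor equals the two separate affine terms:

```python
import numpy as np

rng = np.random.default_rng(1)
batch_size, state_size = 5, 4

current_input = rng.random((batch_size, 1))
current_state = rng.random((batch_size, state_size))
W = rng.random((state_size + 1, state_size))
b = np.zeros((1, state_size))

Wa, Wb = W[:1, :], W[1:, :]   # rows for the input column / rows for the state columns

# one matmul on the concatenation ...
fused = np.concatenate([current_input, current_state], axis=1) @ W + b
# ... equals the sum of the two affine transforms
separate = current_input @ Wa + current_state @ Wb + b

print(np.allclose(fused, separate))  # True
```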

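You may be wondering what the name truncated_backprop_length means. When an RNN is trained, it is in fact treated as a special case of a deep neural network in which every layer shares the same, recurring weights. These layers are not unrolled all the way back to the beginning, because that would be far too computationally expensive; instead, the number of time steps is truncated to a finite number. In the figures above, the error is backpropagated three steps within the batch.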
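Truncation limits how far gradients flow back, but it does not change the forward computation, because the final state of each window is fed back in as the next window's init_state. The NumPy sketch below (forward pass only, hypothetical random weights, no training) runs the tutorial's tanh update once over a whole series and once in windows of truncated_backprop_length steps, carrying the state across, and checks that the states agree:

```python
import numpy as np

rng = np.random.default_rng(2)
batch_size, state_size = 5, 4
truncated_backprop_length, steps = 15, 60

x = rng.integers(0, 2, size=(batch_size, steps)).astype(float)
W = rng.normal(size=(state_size + 1, state_size))
b = np.zeros((1, state_size))

def run(inputs, state):
    """Apply the tutorial's update to each column of inputs; return all states."""
    states = []
    for t in range(inputs.shape[1]):
        concat = np.concatenate([inputs[:, t:t + 1], state], axis=1)
        state = np.tanh(concat @ W + b)
        states.append(state)
    return states

# one unbroken pass over all 60 steps
full = run(x, np.zeros((batch_size, state_size)))

# windowed passes, feeding each window's last state into the next window
windowed, state = [], np.zeros((batch_size, state_size))
for start in range(0, steps, truncated_backprop_length):
    chunk = run(x[:, start:start + truncated_backprop_length], state)
    windowed.extend(chunk)
    state = chunk[-1]

print(all(np.allclose(a, c) for a, c in zip(full, windowed)))  # True
```

What the windowing does change is training: gradients are only computed through the steps inside one window.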

Calculating the loss

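This is the final part of the graph: a fully connected softmax layer from the state to the output, which makes the classes one-hot encoded, followed by calculating the loss of the batch.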

logits_series = [tf.matmul(state, W2) + b2 for state in states_series] #Broadcasted addition
predictions_series = [tf.nn.softmax(logits) for logits in logits_series]

# TensorFlow >= 1.0 requires named logits= / labels= arguments here
losses = [tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels)
          for logits, labels in zip(logits_series, labels_series)]
total_loss = tf.reduce_mean(losses)

train_step = tf.train.AdagradOptimizer(0.3).minimize(total_loss)

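The last line adds the training operation. TensorFlow performs backpropagation automatically: the graph is executed once for each mini-batch, and the network weights are updated incrementally.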
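For reference, here is a minimal NumPy sketch of the Adagrad update rule, which gives each parameter its own step size that shrinks as squared gradients accumulate. It is an illustration under assumptions, not TensorFlow's implementation: the initial accumulator value of 0.1 is assumed to match the TF 1.x default of `AdagradOptimizer`, and `adagrad_step` and the toy quadratic loss are hypothetical.

```python
import numpy as np

def adagrad_step(param, grad, accum, learning_rate=0.3):
    """One Adagrad update; returns the new parameter and accumulator."""
    accum = accum + grad ** 2                              # running sum of squared gradients
    param = param - learning_rate * grad / np.sqrt(accum)  # per-parameter scaled step
    return param, accum

# Toy problem: minimize f(w) = sum((w - 3)^2), whose gradient is 2 * (w - 3)
w = np.zeros(2)
accum = np.full_like(w, 0.1)     # assumed initial accumulator value
for _ in range(500):
    w, accum = adagrad_step(w, 2 * (w - 3), accum)

print(np.round(w, 3))
```

The learning rate 0.3 mirrors the value passed to `AdagradOptimizer` above; the loop plays the role that repeated `sess.run(train_step, ...)` calls play in TensorFlow.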

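Note the API call sparse_softmax_cross_entropy_with_logits: it computes the softmax internally and then computes the cross-entropy. In our example the classes are mutually exclusive (each label is either 0 or 1), which is why the "sparse" softmax variant is used: it takes each label as an integer class index instead of a one-hot vector. You can read more about it in the API documentation.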
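What that op computes can be written out in a few lines of NumPy. This sketch is an illustration, not TensorFlow's numerically tuned kernel, and `sparse_softmax_xent` is a hypothetical name. It takes integer labels directly and gives the same per-example loss as a dense cross-entropy against one-hot labels:

```python
import numpy as np

def sparse_softmax_xent(logits, labels):
    """Per-example cross-entropy from raw logits and integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]      # pick the true class per row

logits = np.array([[2.0, -1.0],
                   [0.5, 0.5],
                   [-3.0, 1.0]])
labels = np.array([0, 1, 1])        # integer labels, like batchY_placeholder

# same result using one-hot labels and an explicit softmax
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
one_hot = np.eye(2)[labels]
dense = -(one_hot * np.log(probs)).sum(axis=1)

print(np.allclose(sparse_softmax_xent(logits, labels), dense))  # True
```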

Visualizing the training

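There is a visualization function so we can see what is going on in the network while it trains. It continuously plots the loss over time and shows the training input, the training output, and the network's current predictions on different sample series in a training batch.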

def plot(loss_list, predictions_series, batchX, batchY):
    # top-left panel: the loss curve so far
    plt.subplot(2, 3, 1)
    plt.cla()
    plt.plot(loss_list)

    # one panel per series in the batch; the 2x3 grid has room for 5
    for batch_series_idx in range(5):
        one_hot_output_series = np.array(predictions_series)[:, batch_series_idx, :]
        # turn each softmax output into a hard 0/1 prediction
        single_output_series = np.array([(1 if out[0] < 0.5 else 0) for out in one_hot_output_series])

        plt.subplot(2, 3, batch_series_idx + 2)
        plt.cla()
        plt.axis([0, truncated_backprop_length, 0, 2])
        left_offset = range(truncated_backprop_length)
        plt.bar(left_offset, batchX[batch_series_idx, :], width=1, color="blue")
        plt.bar(left_offset, batchY[batch_series_idx, :] * 0.5, width=1, color="red")
        plt.bar(left_offset, single_output_series * 0.3, width=1, color="green")

    plt.draw()
    plt.pause(0.0001)
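The line that computes `single_output_series` turns each two-element softmax output into a hard 0/1 prediction. For two classes whose probabilities sum to 1, that test agrees with taking the argmax, as this small NumPy check on hypothetical probabilities suggests:

```python
import numpy as np

# softmax outputs for one series: each row is [P(class 0), P(class 1)]
one_hot_output_series = np.array([[0.9, 0.1],
                                  [0.2, 0.8],
                                  [0.5, 0.5],
                                  [0.45, 0.55]])

# rule used in plot(): predict 1 when P(class 0) < 0.5
single_output_series = np.array([(1 if out[0] < 0.5 else 0) for out in one_hot_output_series])
argmax_output = one_hot_output_series.argmax(axis=1)

print(single_output_series, argmax_output)  # [0 1 0 1] [0 1 0 1]
```

A tie at exactly 0.5 goes to class 0 under both rules, because `argmax` returns the first maximum.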

Running the training session

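It's time to wrap it all up and train the network. In TensorFlow, the graph is executed in a session. New data is generated in each epoch (not the usual way to do it, but it works in this case because everything is predictable).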

with tf.Session() as sess:
    # tf.global_variables_initializer() replaces the deprecated tf.initialize_all_variables()
    sess.run(tf.global_variables_initializer())
    plt.ion()
    plt.figure()
    plt.show()
    loss_list = []

    for epoch_idx in range(num_epochs):
        x, y = generateData()
        _current_state = np.zeros((batch_size, state_size))  # reset the memory for new data

        print("New data, epoch", epoch_idx)

        for batch_idx in range(num_batches):
            start_idx = batch_idx * truncated_backprop_length
            end_idx = start_idx + truncated_backprop_length

            batchX = x[:, start_idx:end_idx]
            batchY = y[:, start_idx:end_idx]

            # the last state of this window is fed back in as init_state for the next one
            _total_loss, _train_step, _current_state, _predictions_series = sess.run(
                [total_loss, train_step, current_state, predictions_series],
                feed_dict={
                    batchX_placeholder: batchX,
                    batchY_placeholder: batchY,
                    init_state: _current_state
                })

            loss_list.append(_total_loss)

            if batch_idx % 100 == 0:
                print("Step", batch_idx, "Loss", _total_loss)
                plot(loss_list, _predictions_series, batchX, batchY)

plt.ioff()
plt.show()

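As you can see, we move the window forward truncated_backprop_length steps on each iteration (the start_idx/end_idx computation and the slicing of batchX and batchY), but other step sizes are possible. The downside of doing it this way is that truncated_backprop_length needs to be significantly larger than the time dependencies (three steps in our case) for a window to contain the relevant training data. Otherwise there can be many "misses", as shown in the figure below.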


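A time series drawn as squares: the raised black squares are the echo output, activated three steps after the echo input (black squares). The sliding batch window also moves three steps each time, which in this example means no batch ever contains a complete dependency, so the network cannot learn it.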
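The trade-off can be quantified with a quick count. Assuming windows of length `L` that start at multiples of `L` (as in the training loop), the sketch below counts, for a given `echo_step`, what share of output positions in a window have their source input inside the same window. `fraction_inside` is a hypothetical helper written for this illustration:

```python
def fraction_inside(window_length, echo_step):
    """Share of positions t in a window whose input t - echo_step is in the same window."""
    inside = sum(1 for t in range(window_length) if t - echo_step >= 0)
    return inside / window_length

print(fraction_inside(15, 3))  # 0.8, the settings used in the code
print(fraction_inside(3, 3))   # 0.0, the figure: no dependency fits inside a window
```

With the code's settings, 80% of the dependencies fit in a window. The remaining ones still influence the network through the carried-over state, but backpropagation cannot reach back to their inputs.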

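This is only a simple example that explains how an RNN works; this functionality could easily be programmed in just a few lines of code. The network will be able to learn the echo behavior exactly, so there is no need for test data.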

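The program updates the plot as training progresses, as shown in the figure below. Blue bars denote the training input signal (binary), red bars the echo in the training output, and green bars the echo the network produces. The different bar plots show different sample series in the current batch.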

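Our algorithm learns the task quite quickly. The graph in the top-left corner shows the loss over time, but what are the spikes in it? Think about it for a moment; the answer is below.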


Visualization of the loss, the input and output training data (blue, red), and the predictions (green).

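The spikes appear because we are starting a new epoch and generating new data. Since the matrix is reshaped, the first element of each row is adjacent in time to the last element of the previous row. The first few elements of every row except the first have dependencies that are not included in the state (which is reset to zero at the start of each epoch), so the network always performs badly on the first batch.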
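This can be checked directly on the generated data. The NumPy sketch below repeats the generation steps of `generateData` with a smaller, arbitrary series length and confirms that the first `echo_step` targets of every row after the first are echoes of inputs at the end of the previous row, which a zeroed initial state cannot have seen:

```python
import numpy as np

rng = np.random.default_rng(3)
total_series_length, batch_size, echo_step = 60, 5, 3   # small sizes for illustration

x = rng.choice(2, total_series_length)
y = np.roll(x, echo_step)
y[0:echo_step] = 0
x = x.reshape((batch_size, -1))
y = y.reshape((batch_size, -1))

# For rows 1..4, the first echo_step targets come from the previous row's last inputs
for row in range(1, batch_size):
    assert (y[row, :echo_step] == x[row - 1, -echo_step:]).all()
print("rows 1-4 start with echoes of the previous row")
```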

The whole system

Below is the complete runnable program; just copy, paste, and run it.

from __future__ import print_function, division
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

num_epochs = 100
total_series_length = 50000
truncated_backprop_length = 15
state_size = 4
num_classes = 2
echo_step = 3
batch_size = 5
num_batches = total_series_length//batch_size//truncated_backprop_length

def generateData():
    x = np.array(np.random.choice(2, total_series_length, p=[0.5, 0.5]))
    y = np.roll(x, echo_step)
    y[0:echo_step] = 0

    x = x.reshape((batch_size, -1))  # The first index changing slowest, subseries as rows
    y = y.reshape((batch_size, -1))

    return (x, y)

batchX_placeholder = tf.placeholder(tf.float32, [batch_size, truncated_backprop_length])
batchY_placeholder = tf.placeholder(tf.int32, [batch_size, truncated_backprop_length])

init_state = tf.placeholder(tf.float32, [batch_size, state_size])

W = tf.Variable(np.random.rand(state_size+1, state_size), dtype=tf.float32)
b = tf.Variable(np.zeros((1,state_size)), dtype=tf.float32)

W2 = tf.Variable(np.random.rand(state_size, num_classes),dtype=tf.float32)
b2 = tf.Variable(np.zeros((1,num_classes)), dtype=tf.float32)

# Unpack columns (tf.unstack was called tf.unpack before TensorFlow 1.0)
inputs_series = tf.unstack(batchX_placeholder, axis=1)
labels_series = tf.unstack(batchY_placeholder, axis=1)

# Forward pass
current_state = init_state
states_series = []
for current_input in inputs_series:
    current_input = tf.reshape(current_input, [batch_size, 1])
    # TensorFlow >= 1.0 signature: tf.concat(values, axis)
    input_and_state_concatenated = tf.concat([current_input, current_state], 1)  # Increasing number of columns

    next_state = tf.tanh(tf.matmul(input_and_state_concatenated, W) + b)  # Broadcasted addition
    states_series.append(next_state)
    current_state = next_state

logits_series = [tf.matmul(state, W2) + b2 for state in states_series] #Broadcasted addition
predictions_series = [tf.nn.softmax(logits) for logits in logits_series]

# TensorFlow >= 1.0 requires named logits= / labels= arguments here
losses = [tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels)
          for logits, labels in zip(logits_series, labels_series)]
total_loss = tf.reduce_mean(losses)

train_step = tf.train.AdagradOptimizer(0.3).minimize(total_loss)

def plot(loss_list, predictions_series, batchX, batchY):
    # top-left panel: the loss curve so far
    plt.subplot(2, 3, 1)
    plt.cla()
    plt.plot(loss_list)

    # one panel per series in the batch; the 2x3 grid has room for 5
    for batch_series_idx in range(5):
        one_hot_output_series = np.array(predictions_series)[:, batch_series_idx, :]
        # turn each softmax output into a hard 0/1 prediction
        single_output_series = np.array([(1 if out[0] < 0.5 else 0) for out in one_hot_output_series])

        plt.subplot(2, 3, batch_series_idx + 2)
        plt.cla()
        plt.axis([0, truncated_backprop_length, 0, 2])
        left_offset = range(truncated_backprop_length)
        plt.bar(left_offset, batchX[batch_series_idx, :], width=1, color="blue")
        plt.bar(left_offset, batchY[batch_series_idx, :] * 0.5, width=1, color="red")
        plt.bar(left_offset, single_output_series * 0.3, width=1, color="green")

    plt.draw()
    plt.pause(0.0001)

with tf.Session() as sess:
    # tf.global_variables_initializer() replaces the deprecated tf.initialize_all_variables()
    sess.run(tf.global_variables_initializer())
    plt.ion()
    plt.figure()
    plt.show()
    loss_list = []

    for epoch_idx in range(num_epochs):
        x, y = generateData()
        _current_state = np.zeros((batch_size, state_size))  # reset the memory for new data

        print("New data, epoch", epoch_idx)

        for batch_idx in range(num_batches):
            start_idx = batch_idx * truncated_backprop_length
            end_idx = start_idx + truncated_backprop_length

            batchX = x[:, start_idx:end_idx]
            batchY = y[:, start_idx:end_idx]

            # the last state of this window is fed back in as init_state for the next one
            _total_loss, _train_step, _current_state, _predictions_series = sess.run(
                [total_loss, train_step, current_state, predictions_series],
                feed_dict={
                    batchX_placeholder: batchX,
                    batchY_placeholder: batchY,
                    init_state: _current_state
                })

            loss_list.append(_total_loss)

            if batch_idx % 100 == 0:
                print("Step", batch_idx, "Loss", _total_loss)
                plot(loss_list, _predictions_series, batchX, batchY)

plt.ioff()
plt.show()

via Medium; original author Erik Hallström; translated by Leiphone (雷锋网)
