Generating Text with Deep Learning
Training Setup (part 2)

At this point we need to fill out the `1`s in each vector. We can loop over each English-Spanish pair in our training sample, using the features dictionaries to add a `1` for the token in question. For example, the dog sentence (`["the", "dog", "licked", "me"]`) would be converted into the following matrix of one-hot vectors:

```
[
  [1, 0, 0, 0],  # timestep 0 => "the"
  [0, 1, 0, 0],  # timestep 1 => "dog"
  [0, 0, 1, 0],  # timestep 2 => "licked"
  [0, 0, 0, 1],  # timestep 3 => "me"
]
```

You’ll notice the vectors have timesteps; we use these to track where we are in a given document (sentence).

To build out a three-dimensional NumPy matrix of one-hot vectors, we can assign a value of 1 for a given word at a given timestep in a given line:

```
matrix_name[line, timestep, features_dict[token]] = 1.
```
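
Before any `1`s can be assigned, each matrix starts out as a block of zeros. Here is a minimal sketch with placeholder dimensions and a hypothetical features dictionary; in practice these values come from counting your sentence pairs, your longest sentence, and your vocabulary:

```
import numpy as np

# Placeholder dimensions: number of lines (sentence pairs), longest
# sentence length in timesteps, and number of unique tokens.
num_lines, max_timesteps, num_tokens = 10, 4, 50
matrix_name = np.zeros((num_lines, max_timesteps, num_tokens), dtype="float32")

# Hypothetical features dictionary mapping tokens to feature indices.
features_dict = {"the": 0, "dog": 1, "licked": 2, "me": 3}

# Mark "dog" at timestep 1 of line 0:
matrix_name[0, 1, features_dict["dog"]] = 1.
```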

Keras will fit (train) the seq2seq model using these three matrices of one-hot vectors, as sketched after this list:

- the encoder input data
- the decoder input data
- the decoder target data
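
Once the matrices are filled, the training call itself is short. This is a hedged sketch, assuming a compiled Keras `model` built with the functional API that takes two inputs and produces one output; the `batch_size`, `epochs`, and `validation_split` values are illustrative, not the lesson's exact settings:

```
# Assumes `model` is a compiled seq2seq model with two inputs
# (encoder input, decoder input) and one output (decoder targets).
model.fit(
    [encoder_input_data, decoder_input_data],
    decoder_target_data,
    batch_size=50,        # illustrative value
    epochs=100,           # illustrative value
    validation_split=0.2  # hold out some pairs for validation
)
```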

Hang on a second, why build two matrices of decoder data? Aren’t we just encoding and decoding?

The reason has to do with a technique known as teacher forcing that most seq2seq models employ during training. Here’s the idea: at each timestep, we feed the decoder the Spanish token from the previous timestep as input, which helps the model learn to predict the current timestep’s target token.

### Instructions

1. Inside the first nested `for` loop, assign `1.` for the current `line`, `timestep`, and `token` in `encoder_input_data`.

2. Inside the second nested `for` loop, assign `1.` for the current `line`, `timestep`, and `token` in `decoder_input_data`.

3. Inside the second nested `for` loop, if `timestep` is greater than `0`, assign `1.` for the current `line`, the previous timestep (at `timestep - 1`), and `token` in `decoder_target_data`.
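
Putting the three steps together, the fill loop might look like the following sketch. The names here (`pairs`, `input_features_dict`, `target_features_dict`) are assumed stand-ins for whatever your setup code defines, and the tiny example data exists only to make the snippet runnable:

```
import numpy as np

# Hypothetical stand-ins for the lesson's setup variables.
pairs = [("the dog licked me", "el perro me lamió")]
input_features_dict = {"the": 0, "dog": 1, "licked": 2, "me": 3}
target_features_dict = {"el": 0, "perro": 1, "me": 2, "lamió": 3}

encoder_input_data = np.zeros((len(pairs), 4, len(input_features_dict)))
decoder_input_data = np.zeros((len(pairs), 4, len(target_features_dict)))
decoder_target_data = np.zeros((len(pairs), 4, len(target_features_dict)))

for line, (input_doc, target_doc) in enumerate(pairs):
    # First nested loop: mark each English token in the encoder input.
    for timestep, token in enumerate(input_doc.split()):
        encoder_input_data[line, timestep, input_features_dict[token]] = 1.
    # Second nested loop: mark each Spanish token in the decoder input...
    for timestep, token in enumerate(target_doc.split()):
        decoder_input_data[line, timestep, target_features_dict[token]] = 1.
        # ...and, shifted back one timestep, in the decoder target.
        # This offset is the teacher forcing: the target at timestep t
        # is the decoder input at timestep t + 1.
        if timestep > 0:
            decoder_target_data[line, timestep - 1, target_features_dict[token]] = 1.
```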