FeedForwardNetwork

class FeedForwardNetwork(config)[source]
Based on the paper, each layer has two sublayers:

A multi-headed attention mechanism and a position-wise fully connected feed-forward network

Each sublayer employs a residual connection, y = f(x) + id(x) = f(x) + x, followed by layer normalization. This file defines the position-wise fully connected feed-forward network:
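The residual-plus-normalization wrapper described above can be sketched as follows. This is a minimal NumPy illustration, not the module's actual implementation; the `layer_norm` and `sublayer` helpers (and the toy `f`) are hypothetical names introduced here for clarity.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Hypothetical helper: normalize over the last (feature) dimension
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def sublayer(x, f):
    # y = LayerNorm(f(x) + x): residual connection followed by layer normalization
    return layer_norm(f(x) + x)

x = np.arange(8.0).reshape(2, 4)
y = sublayer(x, lambda t: 2.0 * t)  # toy f standing in for attention or FFN
print(y.shape)                       # output keeps the input's shape
```

After the wrapper, each feature row has (approximately) zero mean and unit variance, which is what lets sublayers be stacked deeply without the activations drifting in scale.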

A two-layer feed-forward module: FFN(x) = max(0, x W_1 + b_1) W_2 + b_2

forward(x)[source]

Computes FFN(x) = max(0, x W_1 + b_1) W_2 + b_2, followed by the residual connection y = f(x) + id(x) = f(x) + x
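A minimal NumPy sketch of this two-layer module, assuming illustrative `d_model`/`d_ff` sizes and random weight initialization (the real class takes its sizes from `config`, which is not shown here):

```python
import numpy as np

class FeedForwardNetwork:
    """Sketch of FFN(x) = max(0, x W1 + b1) W2 + b2 with a residual connection.

    NumPy stand-in for the documented module; the weight shapes and
    initialization below are assumptions, not the actual config values.
    """

    def __init__(self, d_model, d_ff, seed=0):
        rng = np.random.default_rng(seed)
        # Two affine maps: project d_model -> d_ff, then back to d_model
        self.w1 = 0.02 * rng.standard_normal((d_model, d_ff))
        self.b1 = np.zeros(d_ff)
        self.w2 = 0.02 * rng.standard_normal((d_ff, d_model))
        self.b2 = np.zeros(d_model)

    def forward(self, x):
        # Inner layer with ReLU (the max(0, .)), then the output projection
        hidden = np.maximum(0.0, x @ self.w1 + self.b1)
        # Residual connection: y = FFN(x) + x
        return hidden @ self.w2 + self.b2 + x

ffn = FeedForwardNetwork(d_model=4, d_ff=8)
x = np.ones((2, 4))        # (sequence length, d_model)
y = ffn.forward(x)
print(y.shape)             # same shape as the input, as the residual requires
```

Because the residual adds x to the output, the second projection must map back to `d_model`; only the inner dimension `d_ff` is free to be wider (the paper uses d_ff = 4 × d_model).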