FeedForwardNetwork
class FeedForwardNetwork(config)
Based on the paper, each layer has two sublayers:
A multi-head attention mechanism and a position-wise fully connected feed-forward network.
Each sublayer employs a residual connection, y = f(x) + id(x) = f(x) + x, followed by layer normalization.
This Python file defines the position-wise fully connected feed-forward network:
A two-layer feed-forward module, FFN(x) = max(0, x * w_1 + b_1) * w_2 + b_2
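
The formula and the residual-plus-normalization wrapper described above can be sketched for a single position in plain Python. This is only an illustration under stated assumptions, not the class's actual implementation: the helper names (`affine`, `ffn`, `layer_norm`, `ffn_sublayer`), the toy weights, and the omission of learnable layer-norm gain and bias are all hypothetical, and the fields of `config` are not shown in this page, so none are used.

```python
import math


def affine(x, w, b):
    """Return x * w + b for a vector x (length d_in) and a d_in x d_out matrix w."""
    return [
        sum(x[i] * w[i][j] for i in range(len(x))) + b[j]
        for j in range(len(b))
    ]


def ffn(x, w_1, b_1, w_2, b_2):
    """FFN(x) = max(0, x * w_1 + b_1) * w_2 + b_2, applied to one position."""
    hidden = [max(0.0, h) for h in affine(x, w_1, b_1)]  # ReLU
    return affine(hidden, w_2, b_2)


def layer_norm(x, eps=1e-6):
    """Normalize x to zero mean and unit variance (no learnable gain/bias here)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def ffn_sublayer(x, w_1, b_1, w_2, b_2):
    """Residual connection y = f(x) + x, followed by layer normalization."""
    y = [f + xi for f, xi in zip(ffn(x, w_1, b_1, w_2, b_2), x)]
    return layer_norm(y)


# Toy example (assumed sizes): model dimension 2, hidden dimension 3.
x = [1.0, -2.0]
w_1 = [[1.0, 0.0, -1.0], [0.0, 1.0, 1.0]]
b_1 = [0.0, 0.0, 0.0]
w_2 = [[2.0, 0.5], [1.0, 1.0], [0.0, 3.0]]
b_2 = [0.5, -0.5]

ffn_out = ffn(x, w_1, b_1, w_2, b_2)
sublayer_out = ffn_sublayer(x, w_1, b_1, w_2, b_2)
print(ffn_out)
print(sublayer_out)
```

The same weights are applied independently at every position of a sequence, which is what "position-wise" refers to; a real implementation would typically express the two affine maps as batched matrix multiplications in a tensor library.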