20. DIAGRAMS
Complete Model Diagram
The complete flow of the Llama 3.1 8B-Instruct model. This diagram aims to reveal as many of the details as possible:
STAGE 1: Tokenization
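The tokenization stage turns the prompt string into a sequence of token IDs. Llama 3 uses a tiktoken-based BPE tokenizer; the minimal sketch below substitutes tiktoken's public `cl100k_base` encoding purely for illustration, since the real Llama 3 vocabulary and special tokens require the model's tokenizer file.

```python
# A minimal tokenization sketch. "cl100k_base" is a stand-in vocabulary;
# the actual Llama 3 tokenizer has a different vocabulary and special tokens.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
prompt = "Hello, how are you?"
tokens = enc.encode(prompt)      # list of token IDs, e.g. [9906, 11, 1268, ...]
print(tokens)
print(enc.decode(tokens))        # round-trips back to the prompt string
```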
STAGE 2: Creating tokens tensor
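A minimal sketch of building the tokens tensor, modeled on Meta's reference generation code: a `(batch, total_len)` tensor is pre-filled with a pad marker, and each prompt's token IDs are copied into its leading positions. The token IDs and lengths below are made-up examples.

```python
import torch

pad_id = -1                                   # assumed pad marker, as in Meta's reference code
prompt_tokens = [[128000, 9906, 11, 1268]]    # example token IDs, batch of 1
max_gen_len = 10
total_len = max_gen_len + max(len(t) for t in prompt_tokens)

tokens = torch.full((len(prompt_tokens), total_len), pad_id, dtype=torch.long)
for i, t in enumerate(prompt_tokens):
    tokens[i, : len(t)] = torch.tensor(t, dtype=torch.long)
print(tokens)   # prompt IDs followed by pad markers to be filled during generation
```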
STAGE 3: Looping through sequence length
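The generation loop walks positions from the end of the prompt up to `total_len`, feeding the model only the not-yet-processed slice so the KV cache can be reused. A sketch under assumptions: `DummyModel` is a hypothetical stand-in that returns random logits, and greedy `argmax` sampling replaces temperature/top-p sampling.

```python
import torch

vocab_size = 128256   # Llama 3 vocabulary size

class DummyModel:
    """Hypothetical stand-in for the real transformer; returns random logits."""
    def forward(self, tokens: torch.Tensor, start_pos: int) -> torch.Tensor:
        return torch.randn(tokens.shape[0], tokens.shape[1], vocab_size)

model = DummyModel()
min_prompt_len, total_len = 4, 16
tokens = torch.full((1, total_len), -1, dtype=torch.long)
tokens[0, :min_prompt_len] = torch.tensor([128000, 9906, 11, 1268])

prev_pos = 0
for cur_pos in range(min_prompt_len, total_len):
    # Only the slice [prev_pos:cur_pos] is fed; earlier positions live in the KV cache.
    logits = model.forward(tokens[:, prev_pos:cur_pos], prev_pos)
    next_token = torch.argmax(logits[:, -1], dim=-1)   # greedy decoding for simplicity
    tokens[:, cur_pos] = next_token
    prev_pos = cur_pos
```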
STAGE 6: Forward Pass Through Attention Pre-normalization
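Pre-normalization in Llama is RMSNorm: the input is scaled by the reciprocal root-mean-square of its features, then multiplied by a learned per-feature weight. A minimal sketch with random input (dim=4096 for the 8B model):

```python
import torch

class RMSNorm(torch.nn.Module):
    # RMSNorm as used by Llama: x / sqrt(mean(x^2) + eps) * weight
    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.weight = torch.nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        norm = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return norm * self.weight

x = torch.randn(1, 4, 4096)     # (batch, seqlen, dim)
attention_norm = RMSNorm(4096)
h = attention_norm(x)           # normalized input fed into the attention module
```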
STAGE 7: Forward Pass Through Attention Module
STAGE 8: Forward Pass Through Attention Module - Calculating xq, xk, and xv
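xq, xk, and xv come from three bias-free linear projections of the normalized input. Because Llama 3.1 8B uses grouped-query attention (32 query heads but only 8 KV heads, with head_dim=128), the key and value projections are 4x narrower than the query projection. A sketch with random weights:

```python
import torch

dim, n_heads, n_kv_heads, head_dim = 4096, 32, 8, 128   # Llama 3.1 8B sizes

wq = torch.nn.Linear(dim, n_heads * head_dim, bias=False)     # 4096 -> 4096
wk = torch.nn.Linear(dim, n_kv_heads * head_dim, bias=False)  # 4096 -> 1024
wv = torch.nn.Linear(dim, n_kv_heads * head_dim, bias=False)  # 4096 -> 1024

x = torch.randn(1, 4, dim)   # output of the attention pre-normalization
xq, xk, xv = wq(x), wk(x), wv(x)
print(xq.shape, xk.shape, xv.shape)   # (1, 4, 4096) (1, 4, 1024) (1, 4, 1024)
```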
STAGE 9: Forward Pass Through Attention Module - Do reshapings
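The reshaping step splits the flat projection outputs into per-head slices, producing `(batch, seqlen, n_heads, head_dim)` tensors. A sketch with random stand-in tensors:

```python
import torch

bsz, seqlen = 1, 4
n_heads, n_kv_heads, head_dim = 32, 8, 128

xq = torch.randn(bsz, seqlen, n_heads * head_dim)
xk = torch.randn(bsz, seqlen, n_kv_heads * head_dim)
xv = torch.randn(bsz, seqlen, n_kv_heads * head_dim)

# Split the last dimension into (heads, head_dim) slices.
xq = xq.view(bsz, seqlen, n_heads, head_dim)      # (1, 4, 32, 128)
xk = xk.view(bsz, seqlen, n_kv_heads, head_dim)   # (1, 4, 8, 128)
xv = xv.view(bsz, seqlen, n_kv_heads, head_dim)   # (1, 4, 8, 128)
```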
STAGE 10: Forward Pass Through Attention Module - Apply Rotary Embeddings
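Rotary embeddings encode positions by rotating consecutive pairs of query/key features in the complex plane. The sketch below follows Meta's reference approach (`view_as_complex` plus a precomputed `freqs_cis` table); Llama 3.1 additionally rescales the frequencies for long context, which is omitted here for brevity.

```python
import torch

def precompute_freqs_cis(head_dim: int, seqlen: int, theta: float = 500000.0):
    # Llama 3.1 uses rope theta 500000 (its extra frequency-scaling step is omitted).
    freqs = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
    t = torch.arange(seqlen).float()
    freqs = torch.outer(t, freqs)
    return torch.polar(torch.ones_like(freqs), freqs)   # complex64, (seqlen, head_dim//2)

def apply_rotary_emb(xq, xk, freqs_cis):
    # View float pairs as complex numbers and rotate them by freqs_cis.
    xq_ = torch.view_as_complex(xq.float().reshape(*xq.shape[:-1], -1, 2))
    xk_ = torch.view_as_complex(xk.float().reshape(*xk.shape[:-1], -1, 2))
    fc = freqs_cis.view(1, xq_.shape[1], 1, xq_.shape[-1])   # broadcast over batch/heads
    xq_out = torch.view_as_real(xq_ * fc).flatten(3)
    xk_out = torch.view_as_real(xk_ * fc).flatten(3)
    return xq_out.type_as(xq), xk_out.type_as(xk)

xq = torch.randn(1, 4, 32, 128)
xk = torch.randn(1, 4, 8, 128)
freqs_cis = precompute_freqs_cis(128, 4)
xq, xk = apply_rotary_emb(xq, xk, freqs_cis)
```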
STAGE 11: Forward Pass Through Attention Module - Update KV cache
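The KV cache stores keys and values for every position processed so far, so each decoding step only computes projections for the new tokens. A sketch with illustrative sizes; the real cache is preallocated per layer up to the maximum context length:

```python
import torch

bsz, max_seq_len = 1, 64
n_kv_heads, head_dim = 8, 128
start_pos, seqlen = 0, 4

# Per-layer cache tensors, preallocated up to the maximum context length.
cache_k = torch.zeros(bsz, max_seq_len, n_kv_heads, head_dim)
cache_v = torch.zeros(bsz, max_seq_len, n_kv_heads, head_dim)

xk = torch.randn(bsz, seqlen, n_kv_heads, head_dim)
xv = torch.randn(bsz, seqlen, n_kv_heads, head_dim)

# Write the new keys/values at their positions, then read back everything so far.
cache_k[:bsz, start_pos : start_pos + seqlen] = xk
cache_v[:bsz, start_pos : start_pos + seqlen] = xv
keys = cache_k[:bsz, : start_pos + seqlen]      # (1, 4, 8, 128)
values = cache_v[:bsz, : start_pos + seqlen]
```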
STAGE 12: Forward Pass Through Attention Module - Do transposes
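Before the score computation, the KV heads are duplicated so their count matches the 32 query heads (grouped-query attention), and the heads dimension is moved in front of seqlen. Meta's reference does the duplication with an expand/reshape helper (`repeat_kv`); `repeat_interleave` below is an equivalent shorthand.

```python
import torch

n_heads, n_kv_heads, head_dim = 32, 8, 128
n_rep = n_heads // n_kv_heads   # each KV head serves 4 query heads

xq = torch.randn(1, 4, n_heads, head_dim)
keys = torch.randn(1, 4, n_kv_heads, head_dim)
values = torch.randn(1, 4, n_kv_heads, head_dim)

# Duplicate KV heads so their count matches the query heads.
keys = torch.repeat_interleave(keys, n_rep, dim=2)      # (1, 4, 32, 128)
values = torch.repeat_interleave(values, n_rep, dim=2)

# Move heads before seqlen: (bsz, n_heads, seqlen, head_dim).
xq = xq.transpose(1, 2)
keys = keys.transpose(1, 2)
values = values.transpose(1, 2)
```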
STAGE 13: Forward Pass Through Attention Module - Calculate scores
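Scores are the scaled dot products of queries and keys, with a causal mask added before softmax so each position attends only to itself and earlier positions. A sketch with random tensors:

```python
import math
import torch

head_dim, seqlen = 128, 4
xq = torch.randn(1, 32, seqlen, head_dim)
keys = torch.randn(1, 32, seqlen, head_dim)

# Scaled dot-product: (bsz, n_heads, seqlen, seqlen)
scores = torch.matmul(xq, keys.transpose(2, 3)) / math.sqrt(head_dim)

# Causal mask: -inf above the diagonal blocks attention to future positions.
mask = torch.full((seqlen, seqlen), float("-inf")).triu(diagonal=1)
scores = scores + mask

scores = torch.nn.functional.softmax(scores, dim=-1)
```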
STAGE 14: Forward Pass Through Attention Module - Calculate output
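The attention output is the score-weighted sum of the values, transposed back to `(batch, seqlen, heads * head_dim)` and passed through the output projection `wo`. A sketch:

```python
import torch

bsz, n_heads, seqlen, head_dim, dim = 1, 32, 4, 128, 4096
scores = torch.softmax(torch.randn(bsz, n_heads, seqlen, seqlen), dim=-1)
values = torch.randn(bsz, n_heads, seqlen, head_dim)
wo = torch.nn.Linear(n_heads * head_dim, dim, bias=False)

output = torch.matmul(scores, values)                               # (1, 32, 4, 128)
output = output.transpose(1, 2).contiguous().view(bsz, seqlen, -1)  # (1, 4, 4096)
output = wo(output)                                                 # final attention output
```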
STAGE 15: Add attention module output and current tensor
STAGE 16: Forward Pass Through Feed-Forward Pre-normalization
STAGE 17: Forward Pass Through Feed-Forward Module
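The Llama feed-forward module is a SwiGLU: `w1` produces a gate passed through SiLU, `w3` produces the "up" projection, and their elementwise product is projected back down by `w2`. For the 8B model the hidden size is 14336. A minimal sketch:

```python
import torch
import torch.nn.functional as F

dim, hidden_dim = 4096, 14336   # Llama 3.1 8B feed-forward sizes

w1 = torch.nn.Linear(dim, hidden_dim, bias=False)   # gate projection
w3 = torch.nn.Linear(dim, hidden_dim, bias=False)   # up projection
w2 = torch.nn.Linear(hidden_dim, dim, bias=False)   # down projection

x = torch.randn(1, 4, dim)        # output of the feed-forward pre-normalization
out = w2(F.silu(w1(x)) * w3(x))   # SwiGLU: silu(w1(x)) gates w3(x)
```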
STAGE 18: Add Feed-Forward module output and current tensor
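Stages 15, 16, and 18 glue the pieces together: each transformer block wraps its attention and feed-forward modules in pre-normalization plus a residual addition. The sketch below composes one block with identity functions as hypothetical stand-ins for the modules built in the earlier stages, just to show where the two additions happen:

```python
import torch

dim = 4096
x = torch.randn(1, 4, dim)

# Identity stand-ins for the modules from the previous stages (hypothetical).
attention_norm = lambda t: t    # stage 6: RMSNorm before attention
attention = lambda t: t         # stages 7-14: the attention module
ffn_norm = lambda t: t          # stage 16: RMSNorm before feed-forward
feed_forward = lambda t: t      # stage 17: the SwiGLU feed-forward module

h = x + attention(attention_norm(x))     # stage 15: first residual addition
out = h + feed_forward(ffn_norm(h))      # stage 18: second residual addition
```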