20. DIAGRAMS
Complete Model Diagram
The diagrams below walk through the whole flow of the Llama 3.1 8B-Instruct model, stage by stage, in as much detail as possible:

STAGE 1: Tokenization
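
The prompt text is converted into a sequence of integer token ids by a BPE tokenizer. A minimal sketch, using tiktoken's cl100k_base encoding purely as a stand-in for Llama 3's own 128,256-entry tiktoken vocabulary (the real ids will differ):

```python
import tiktoken

# Stand-in: Llama 3 ships its own tiktoken BPE file; cl100k_base is used
# here only so the snippet runs without the model's tokenizer file.
enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("Hello, world!")
print(tokens)  # a list of integer token ids
```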

STAGE 2: Creating tokens tensor
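
The token ids are placed at the front of a fixed-length tensor, with the remaining positions filled by a padding sentinel. A sketch under assumed names (pad_id and total_len are illustrative):

```python
import torch

prompt_tokens = [128000, 9906, 11, 1917, 0]  # illustrative token ids
pad_id, total_len = -1, 10                   # assumed: pad sentinel, prompt + generation budget

tokens = torch.full((total_len,), pad_id, dtype=torch.long)
tokens[: len(prompt_tokens)] = torch.tensor(prompt_tokens)
# tokens: [128000, 9906, 11, 1917, 0, -1, -1, -1, -1, -1]
```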

STAGE 3: Looping through sequence length
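
Generation proceeds position by position: each iteration feeds only the not-yet-processed tokens (the KV cache holds the rest), picks the next token from the logits, and stops at a stop token. A sketch with a stub standing in for the real forward pass:

```python
import torch

def model_forward(inp: torch.Tensor, start_pos: int) -> torch.Tensor:
    """Stub standing in for the transformer forward pass (random logits)."""
    return torch.randn(inp.shape[0], inp.shape[1], 128256)

prompt_tokens = [128000, 9906, 11, 1917, 0]  # illustrative token ids
total_len, pad_id = 10, -1
tokens = torch.full((total_len,), pad_id, dtype=torch.long)
tokens[: len(prompt_tokens)] = torch.tensor(prompt_tokens)

stop_tokens = {128009}  # assumed id of <|eot_id|>
prev_pos = 0
for cur_pos in range(len(prompt_tokens), total_len):
    # only the new tokens are fed in; the KV cache keeps earlier positions
    logits = model_forward(tokens[prev_pos:cur_pos].unsqueeze(0), prev_pos)
    next_token = torch.argmax(logits[0, -1]).item()  # greedy selection
    tokens[cur_pos] = next_token
    prev_pos = cur_pos
    if next_token in stop_tokens:
        break
```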



STAGE 6: Forward Pass Through Attention Pre-normalization
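
Before the attention module, the input is normalized with RMSNorm using the layer's attention_norm weights. A minimal sketch (Llama 3 uses eps = 1e-5; shapes follow the 8B model's dim = 4096):

```python
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # scale by the reciprocal root-mean-square of the features,
    # then apply the learned per-feature gain
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps) * weight

x = torch.randn(1, 5, 4096)   # (batch, seq_len, dim)
weight = torch.ones(4096)     # attention_norm.weight, loaded from the checkpoint
normed = rms_norm(x, weight)
```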

STAGE 7: Forward Pass Through Attention Module

STAGE 8: Forward Pass Through Attention Module - Calculating xq, xk, and xv
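
The normalized input is projected into queries, keys, and values with the wq, wk, and wv weights. Note the grouped-query layout of the 8B model: 32 query heads but only 8 KV heads. A sketch with random weights standing in for checkpoint tensors:

```python
import torch
import torch.nn.functional as F

dim, n_heads, n_kv_heads = 4096, 32, 8     # Llama 3.1 8B configuration
head_dim = dim // n_heads                  # 128

x = torch.randn(1, 5, dim)                 # normalized input (batch, seq_len, dim)
wq = torch.randn(n_heads * head_dim, dim)  # stand-ins for checkpoint weights
wk = torch.randn(n_kv_heads * head_dim, dim)
wv = torch.randn(n_kv_heads * head_dim, dim)

xq = F.linear(x, wq)                       # (1, 5, 4096)
xk = F.linear(x, wk)                       # (1, 5, 1024) - only 8 KV heads (GQA)
xv = F.linear(x, wv)                       # (1, 5, 1024)
```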

STAGE 9: Forward Pass Through Attention Module - Do reshapings
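
The flat projection outputs are reshaped so each head gets its own 128-dimensional slice. Continuing the shapes from the previous sketch:

```python
import torch

bsz, seqlen = 1, 5
n_heads, n_kv_heads, head_dim = 32, 8, 128

xq = torch.randn(bsz, seqlen, n_heads * head_dim)     # from stage 8
xk = torch.randn(bsz, seqlen, n_kv_heads * head_dim)
xv = torch.randn(bsz, seqlen, n_kv_heads * head_dim)

# split the flat projections into per-head chunks
xq = xq.view(bsz, seqlen, n_heads, head_dim)      # (1, 5, 32, 128)
xk = xk.view(bsz, seqlen, n_kv_heads, head_dim)   # (1, 5, 8, 128)
xv = xv.view(bsz, seqlen, n_kv_heads, head_dim)   # (1, 5, 8, 128)
```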

STAGE 10: Forward Pass Through Attention Module - Apply Rotary Embeddings
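
Rotary position embeddings rotate each (even, odd) pair of xq and xk features by a position-dependent angle, implemented as complex multiplication. A sketch following the reference-style formulation (Llama 3.1 uses rope_theta = 500000 and additionally rescales the frequencies for long context, which is omitted here):

```python
import torch

def apply_rotary_emb(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    # pair up the last dimension as complex numbers, rotate by the
    # position-dependent unit complex numbers in freqs_cis, then unpack
    x_c = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    x_c = x_c * freqs_cis.view(1, x_c.shape[1], 1, x_c.shape[-1])
    return torch.view_as_real(x_c).flatten(3).type_as(x)

head_dim, seqlen, base = 128, 5, 500000.0
freqs = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
t = torch.arange(seqlen)
freqs_cis = torch.polar(torch.ones(seqlen, head_dim // 2), torch.outer(t, freqs))

xq = torch.randn(1, seqlen, 32, head_dim)
xq = apply_rotary_emb(xq, freqs_cis)  # xk is rotated the same way
```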

STAGE 11: Forward Pass Through Attention Module - Update KV cache
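
The freshly computed keys and values are written into the layer's KV cache at the current position, and everything up to that position is read back, so earlier tokens are never recomputed. A sketch with an assumed cache length:

```python
import torch

max_seq_len, n_kv_heads, head_dim = 8192, 8, 128  # assumed cache capacity
cache_k = torch.zeros(1, max_seq_len, n_kv_heads, head_dim)
cache_v = torch.zeros(1, max_seq_len, n_kv_heads, head_dim)

start_pos, seqlen = 0, 5
xk = torch.randn(1, seqlen, n_kv_heads, head_dim)
xv = torch.randn(1, seqlen, n_kv_heads, head_dim)

# write the new keys/values, then read back all cached positions
cache_k[:, start_pos : start_pos + seqlen] = xk
cache_v[:, start_pos : start_pos + seqlen] = xv
keys = cache_k[:, : start_pos + seqlen]
values = cache_v[:, : start_pos + seqlen]
```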

STAGE 12: Forward Pass Through Attention Module - Do transposes
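
Two layout steps remain before the matrix multiplications: the 8 KV heads are replicated to match the 32 query heads (grouped-query attention), and the head axis is moved in front of the sequence axis. A sketch:

```python
import torch

n_heads, n_kv_heads, head_dim = 32, 8, 128
cache_len = 5                                  # positions present in the KV cache

xq = torch.randn(1, cache_len, n_heads, head_dim)
keys = torch.randn(1, cache_len, n_kv_heads, head_dim)
values = torch.randn(1, cache_len, n_kv_heads, head_dim)

# grouped-query attention: replicate each KV head n_rep times
n_rep = n_heads // n_kv_heads                  # 32 / 8 = 4
keys = torch.repeat_interleave(keys, n_rep, dim=2)
values = torch.repeat_interleave(values, n_rep, dim=2)

# move the head axis in front of the sequence axis for batched matmuls
xq = xq.transpose(1, 2)            # (1, 32, 5, 128)
keys = keys.transpose(1, 2)        # (1, 32, 5, 128)
values = values.transpose(1, 2)    # (1, 32, 5, 128)
```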

STAGE 13: Forward Pass Through Attention Module - Calculate scores
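
Attention scores are the scaled dot products of queries and keys, masked so each position only attends to itself and earlier positions, then softmaxed. (When generating one token at a time, seqlen is 1 and no mask is needed.) A sketch:

```python
import math
import torch

head_dim, seqlen = 128, 5
xq = torch.randn(1, 32, seqlen, head_dim)      # queries, heads in front
keys = torch.randn(1, 32, seqlen, head_dim)    # keys read from the cache

scores = torch.matmul(xq, keys.transpose(2, 3)) / math.sqrt(head_dim)
# causal mask: -inf above the diagonal blocks attention to future positions
mask = torch.triu(torch.full((seqlen, seqlen), float("-inf")), diagonal=1)
scores = torch.softmax(scores + mask, dim=-1)  # (1, 32, 5, 5)
```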

STAGE 14: Forward Pass Through Attention Module - Calculate output
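
The softmaxed scores weight the values; the per-head results are stitched back into a single 4096-wide tensor and passed through the wo output projection. A sketch with stand-in tensors:

```python
import torch
import torch.nn.functional as F

dim, n_heads, head_dim, seqlen = 4096, 32, 128, 5
scores = torch.softmax(torch.randn(1, n_heads, seqlen, seqlen), dim=-1)
values = torch.randn(1, n_heads, seqlen, head_dim)
wo = torch.randn(dim, dim)                               # output projection weight

output = torch.matmul(scores, values)                    # (1, 32, 5, 128)
output = output.transpose(1, 2).reshape(1, seqlen, -1)   # back to (1, 5, 4096)
output = F.linear(output, wo)                            # attention module output
```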

STAGE 15: Add attention module output and current tensor
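
The attention output is added back onto the tensor that entered the block, i.e. the input from before the stage 6 pre-normalization, forming the first residual connection:

```python
import torch

x = torch.randn(1, 5, 4096)         # block input (before attention_norm)
attn_out = torch.randn(1, 5, 4096)  # stand-in for the attention module output
h = x + attn_out                    # first residual (skip) connection
```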

STAGE 16: Forward Pass Through Feed-Forward Pre-normalization
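
This is the same RMSNorm as stage 6, only with the layer's ffn_norm weights. Reusing the helper sketched there:

```python
import torch

def rms_norm(x, weight, eps=1e-5):
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps) * weight

h = torch.randn(1, 5, 4096)
ffn_norm_weight = torch.ones(4096)  # ffn_norm.weight, loaded from the checkpoint
h_normed = rms_norm(h, ffn_norm_weight)
```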

STAGE 17: Forward Pass Through Feed-Forward Module
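
The feed-forward module is a SwiGLU: the silu-activated w1 branch gates the w3 branch, and w2 projects back down to the model dimension. A sketch using the 8B sizes (hidden dim 14336):

```python
import torch
import torch.nn.functional as F

dim, hidden_dim = 4096, 14336         # Llama 3.1 8B feed-forward sizes
w1 = torch.randn(hidden_dim, dim)     # gate projection (stand-in weights)
w3 = torch.randn(hidden_dim, dim)     # up projection
w2 = torch.randn(dim, hidden_dim)     # down projection

h_normed = torch.randn(1, 5, dim)
# SwiGLU: silu(w1 x) gates the w3 x branch, then w2 projects back down
ffn_out = F.linear(F.silu(F.linear(h_normed, w1)) * F.linear(h_normed, w3), w2)
```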

STAGE 18: Add Feed-Forward module output and current tensor
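
A second residual connection adds the feed-forward output back onto the stage 15 result, producing the transformer block's output:

```python
import torch

h = torch.randn(1, 5, 4096)        # result of the stage 15 residual add
ffn_out = torch.randn(1, 5, 4096)  # stand-in for the feed-forward output
out = h + ffn_out                  # second residual closes the transformer block
```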

