20. DIAGRAMS

Complete Model Diagram

This diagram shows the whole flow of the Llama 3.1 8B-Instruct model in as much detail as possible:

Complete Model Diagram

STAGE 1: Tokenization

STAGE 1: Tokenization Diagram

STAGE 2: Creating tokens tensor

STAGE 2: Creating tokens tensor Diagram

STAGE 3: Looping through sequence length

STAGE 3: Looping through sequence length Diagram

STAGE 4: Creating inputTensor

STAGE 4: Creating inputTensor Diagram

STAGE 5: Forward Pass Through Each Transformer Block

STAGE 5: Forward Pass Through Each Transformer Block Diagram

STAGE 6: Forward Pass Through Attention Pre-normalization

STAGE 6: Forward Pass Through Attention Pre-normalization Diagram

STAGE 7: Forward Pass Through Attention Module

STAGE 7: Forward Pass Through Attention Module Diagram

STAGE 8: Forward Pass Through Attention Module - Calculating xq, xk, and xv

STAGE 8: Forward Pass Through Attention Module - Calculating xq, xk, and xv Diagram

STAGE 9: Forward Pass Through Attention Module - Do reshapings

STAGE 9: Forward Pass Through Attention Module - Do reshapings Diagram

STAGE 10: Forward Pass Through Attention Module - Apply Rotary Embeddings

STAGE 10: Forward Pass Through Attention Module - Apply Rotary Embeddings Diagram

STAGE 11: Forward Pass Through Attention Module - Update KV cache

STAGE 11: Forward Pass Through Attention Module - Update KV cache Diagram

STAGE 12: Forward Pass Through Attention Module - Do transposes

STAGE 12: Forward Pass Through Attention Module - Do transposes Diagram

STAGE 13: Forward Pass Through Attention Module - Calculate scores

STAGE 13: Forward Pass Through Attention Module - Calculate scores Diagram

STAGE 14: Forward Pass Through Attention Module - Calculate output

STAGE 14: Forward Pass Through Attention Module - Calculate output Diagram

STAGE 15: Add attention module output and current tensor

STAGE 15: Add attention module output and current tensor Diagram

STAGE 16: Forward Pass Through Feed-Forward Pre-normalization

STAGE 16: Forward Pass Through Feed-Forward Pre-normalization Diagram

STAGE 17: Forward Pass Through Feed-Forward Module

STAGE 17: Forward Pass Through Feed-Forward Module Diagram

STAGE 18: Add Feed-Forward module output and current tensor

STAGE 18: Add Feed-Forward module output and current tensor Diagram

STAGE 19: Forward Pass Through Output of The Transformer

STAGE 19: Forward Pass Through Output of The Transformer Diagram