20. DIAGRAMS
Complete Model Diagram
The complete flow of the Llama 3.1 8B-Instruct model. This diagram aims to reveal as many of the details as possible:
STAGE 1: Tokenization
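The tokenization stage turns the prompt string into a sequence of token IDs. Llama 3 uses a tiktoken-based BPE tokenizer; the minimal sketch below substitutes tiktoken's public `cl100k_base` encoding purely for illustration, since the real Llama 3 vocabulary and special tokens require the model's tokenizer file.

```python
# A minimal tokenization sketch. "cl100k_base" is a stand-in vocabulary;
# the actual Llama 3 tokenizer has a different vocabulary and special tokens.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
prompt = "Hello, how are you?"
tokens = enc.encode(prompt)      # list of token IDs, e.g. [9906, 11, 1268, ...]
print(tokens)
print(enc.decode(tokens))        # round-trips back to the prompt string
```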
STAGE 2: Creating tokens tensor
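A minimal sketch of building the tokens tensor, modeled on Meta's reference generation code: a `(batch, total_len)` tensor is pre-filled with a pad marker, and each prompt's token IDs are copied into its leading positions. The token IDs and lengths below are made-up examples.

```python
import torch

pad_id = -1                                   # assumed pad marker, as in Meta's reference code
prompt_tokens = [[128000, 9906, 11, 1268]]    # example token IDs, batch of 1
max_gen_len = 10
total_len = max_gen_len + max(len(t) for t in prompt_tokens)

tokens = torch.full((len(prompt_tokens), total_len), pad_id, dtype=torch.long)
for i, t in enumerate(prompt_tokens):
    tokens[i, : len(t)] = torch.tensor(t, dtype=torch.long)
print(tokens)   # prompt IDs followed by pad markers to be filled during generation
```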
STAGE 3: Looping through sequence length
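The generation loop walks positions from the end of the prompt up to `total_len`, feeding the model only the not-yet-processed slice so the KV cache can be reused. A sketch under assumptions: `DummyModel` is a hypothetical stand-in that returns random logits, and greedy `argmax` sampling replaces temperature/top-p sampling.

```python
import torch

vocab_size = 128256   # Llama 3 vocabulary size

class DummyModel:
    """Hypothetical stand-in for the real transformer; returns random logits."""
    def forward(self, tokens: torch.Tensor, start_pos: int) -> torch.Tensor:
        return torch.randn(tokens.shape[0], tokens.shape[1], vocab_size)

model = DummyModel()
min_prompt_len, total_len = 4, 16
tokens = torch.full((1, total_len), -1, dtype=torch.long)
tokens[0, :min_prompt_len] = torch.tensor([128000, 9906, 11, 1268])

prev_pos = 0
for cur_pos in range(min_prompt_len, total_len):
    # Only the slice [prev_pos:cur_pos] is fed; earlier positions live in the KV cache.
    logits = model.forward(tokens[:, prev_pos:cur_pos], prev_pos)
    next_token = torch.argmax(logits[:, -1], dim=-1)   # greedy decoding for simplicity
    tokens[:, cur_pos] = next_token
    prev_pos = cur_pos
```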
STAGE 6: Forward Pass Through Attention Pre-normalization
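Pre-normalization in Llama is RMSNorm: the input is scaled by the reciprocal root-mean-square of its features, then multiplied by a learned per-feature weight. A minimal sketch with random input (dim=4096 for the 8B model):

```python
import torch

class RMSNorm(torch.nn.Module):
    # RMSNorm as used by Llama: x / sqrt(mean(x^2) + eps) * weight
    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.weight = torch.nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        norm = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return norm * self.weight

x = torch.randn(1, 4, 4096)     # (batch, seqlen, dim)
attention_norm = RMSNorm(4096)
h = attention_norm(x)           # normalized input fed into the attention module
```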
STAGE 7: Forward Pass Through Attention Module
STAGE 8: Forward Pass Through Attention Module - Calculating xq, xk, and xv
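xq, xk, and xv come from three bias-free linear projections of the normalized input. Because Llama 3.1 8B uses grouped-query attention (32 query heads but only 8 KV heads, with head_dim=128), the key and value projections are 4x narrower than the query projection. A sketch with random weights:

```python
import torch

dim, n_heads, n_kv_heads, head_dim = 4096, 32, 8, 128   # Llama 3.1 8B sizes

wq = torch.nn.Linear(dim, n_heads * head_dim, bias=False)     # 4096 -> 4096
wk = torch.nn.Linear(dim, n_kv_heads * head_dim, bias=False)  # 4096 -> 1024
wv = torch.nn.Linear(dim, n_kv_heads * head_dim, bias=False)  # 4096 -> 1024

x = torch.randn(1, 4, dim)   # output of the attention pre-normalization
xq, xk, xv = wq(x), wk(x), wv(x)
print(xq.shape, xk.shape, xv.shape)   # (1, 4, 4096) (1, 4, 1024) (1, 4, 1024)
```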
STAGE 9: Forward Pass Through Attention Module - Do reshapings
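The reshaping step splits the flat projection outputs into per-head slices, producing `(batch, seqlen, n_heads, head_dim)` tensors. A sketch with random stand-in tensors:

```python
import torch

bsz, seqlen = 1, 4
n_heads, n_kv_heads, head_dim = 32, 8, 128

xq = torch.randn(bsz, seqlen, n_heads * head_dim)
xk = torch.randn(bsz, seqlen, n_kv_heads * head_dim)
xv = torch.randn(bsz, seqlen, n_kv_heads * head_dim)

# Split the last dimension into (heads, head_dim) slices.
xq = xq.view(bsz, seqlen, n_heads, head_dim)      # (1, 4, 32, 128)
xk = xk.view(bsz, seqlen, n_kv_heads, head_dim)   # (1, 4, 8, 128)
xv = xv.view(bsz, seqlen, n_kv_heads, head_dim)   # (1, 4, 8, 128)
```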
STAGE 10: Forward Pass Through Attention Module - Apply Rotary Embeddings
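Rotary embeddings encode positions by rotating consecutive pairs of query/key features in the complex plane. The sketch below follows Meta's reference approach (`view_as_complex` plus a precomputed `freqs_cis` table); Llama 3.1 additionally rescales the frequencies for long context, which is omitted here for brevity.

```python
import torch

def precompute_freqs_cis(head_dim: int, seqlen: int, theta: float = 500000.0):
    # Llama 3.1 uses rope theta 500000 (its extra frequency-scaling step is omitted).
    freqs = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
    t = torch.arange(seqlen).float()
    freqs = torch.outer(t, freqs)
    return torch.polar(torch.ones_like(freqs), freqs)   # complex64, (seqlen, head_dim//2)

def apply_rotary_emb(xq, xk, freqs_cis):
    # View float pairs as complex numbers and rotate them by freqs_cis.
    xq_ = torch.view_as_complex(xq.float().reshape(*xq.shape[:-1], -1, 2))
    xk_ = torch.view_as_complex(xk.float().reshape(*xk.shape[:-1], -1, 2))
    fc = freqs_cis.view(1, xq_.shape[1], 1, xq_.shape[-1])   # broadcast over batch/heads
    xq_out = torch.view_as_real(xq_ * fc).flatten(3)
    xk_out = torch.view_as_real(xk_ * fc).flatten(3)
    return xq_out.type_as(xq), xk_out.type_as(xk)

xq = torch.randn(1, 4, 32, 128)
xk = torch.randn(1, 4, 8, 128)
freqs_cis = precompute_freqs_cis(128, 4)
xq, xk = apply_rotary_emb(xq, xk, freqs_cis)
```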
STAGE 11: Forward Pass Through Attention Module - Update KV cache
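The KV cache stores keys and values for every position processed so far, so each decoding step only computes projections for the new tokens. A sketch with illustrative sizes; the real cache is preallocated per layer up to the maximum context length:

```python
import torch

bsz, max_seq_len = 1, 64
n_kv_heads, head_dim = 8, 128
start_pos, seqlen = 0, 4

# Per-layer cache tensors, preallocated up to the maximum context length.
cache_k = torch.zeros(bsz, max_seq_len, n_kv_heads, head_dim)
cache_v = torch.zeros(bsz, max_seq_len, n_kv_heads, head_dim)

xk = torch.randn(bsz, seqlen, n_kv_heads, head_dim)
xv = torch.randn(bsz, seqlen, n_kv_heads, head_dim)

# Write the new keys/values at their positions, then read back everything so far.
cache_k[:bsz, start_pos : start_pos + seqlen] = xk
cache_v[:bsz, start_pos : start_pos + seqlen] = xv
keys = cache_k[:bsz, : start_pos + seqlen]      # (1, 4, 8, 128)
values = cache_v[:bsz, : start_pos + seqlen]
```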
STAGE 12: Forward Pass Through Attention Module - Do transposes
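Before the score computation, the KV heads are duplicated so their count matches the 32 query heads (grouped-query attention), and the heads dimension is moved in front of seqlen. Meta's reference does the duplication with an expand/reshape helper (`repeat_kv`); `repeat_interleave` below is an equivalent shorthand.

```python
import torch

n_heads, n_kv_heads, head_dim = 32, 8, 128
n_rep = n_heads // n_kv_heads   # each KV head serves 4 query heads

xq = torch.randn(1, 4, n_heads, head_dim)
keys = torch.randn(1, 4, n_kv_heads, head_dim)
values = torch.randn(1, 4, n_kv_heads, head_dim)

# Duplicate KV heads so their count matches the query heads.
keys = torch.repeat_interleave(keys, n_rep, dim=2)      # (1, 4, 32, 128)
values = torch.repeat_interleave(values, n_rep, dim=2)

# Move heads before seqlen: (bsz, n_heads, seqlen, head_dim).
xq = xq.transpose(1, 2)
keys = keys.transpose(1, 2)
values = values.transpose(1, 2)
```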
STAGE 13: Forward Pass Through Attention Module - Calculate scores
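Scores are the scaled dot products of queries and keys, with a causal mask added before softmax so each position attends only to itself and earlier positions. A sketch with random tensors:

```python
import math
import torch

head_dim, seqlen = 128, 4
xq = torch.randn(1, 32, seqlen, head_dim)
keys = torch.randn(1, 32, seqlen, head_dim)

# Scaled dot-product: (bsz, n_heads, seqlen, seqlen)
scores = torch.matmul(xq, keys.transpose(2, 3)) / math.sqrt(head_dim)

# Causal mask: -inf above the diagonal blocks attention to future positions.
mask = torch.full((seqlen, seqlen), float("-inf")).triu(diagonal=1)
scores = scores + mask

scores = torch.nn.functional.softmax(scores, dim=-1)
```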
STAGE 14: Forward Pass Through Attention Module - Calculate output
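The attention output is the score-weighted sum of the values, transposed back to `(batch, seqlen, heads * head_dim)` and passed through the output projection `wo`. A sketch:

```python
import torch

bsz, n_heads, seqlen, head_dim, dim = 1, 32, 4, 128, 4096
scores = torch.softmax(torch.randn(bsz, n_heads, seqlen, seqlen), dim=-1)
values = torch.randn(bsz, n_heads, seqlen, head_dim)
wo = torch.nn.Linear(n_heads * head_dim, dim, bias=False)

output = torch.matmul(scores, values)                               # (1, 32, 4, 128)
output = output.transpose(1, 2).contiguous().view(bsz, seqlen, -1)  # (1, 4, 4096)
output = wo(output)                                                 # final attention output
```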
STAGE 15: Add attention module output and current tensor
STAGE 16: Forward Pass Through Feed-Forward Pre-normalization
STAGE 17: Forward Pass Through Feed-Forward Module
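The Llama feed-forward module is a SwiGLU: `w1` produces a gate passed through SiLU, `w3` produces the "up" projection, and their elementwise product is projected back down by `w2`. For the 8B model the hidden size is 14336. A minimal sketch:

```python
import torch
import torch.nn.functional as F

dim, hidden_dim = 4096, 14336   # Llama 3.1 8B feed-forward sizes

w1 = torch.nn.Linear(dim, hidden_dim, bias=False)   # gate projection
w3 = torch.nn.Linear(dim, hidden_dim, bias=False)   # up projection
w2 = torch.nn.Linear(hidden_dim, dim, bias=False)   # down projection

x = torch.randn(1, 4, dim)        # output of the feed-forward pre-normalization
out = w2(F.silu(w1(x)) * w3(x))   # SwiGLU: silu(w1(x)) gates w3(x)
```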
STAGE 18: Add Feed-Forward module output and current tensor
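Stages 15, 16, and 18 glue the pieces together: each transformer block wraps its attention and feed-forward modules in pre-normalization plus a residual addition. The sketch below composes one block with identity functions as hypothetical stand-ins for the modules built in the earlier stages, just to show where the two additions happen:

```python
import torch

dim = 4096
x = torch.randn(1, 4, dim)

# Identity stand-ins for the modules from the previous stages (hypothetical).
attention_norm = lambda t: t    # stage 6: RMSNorm before attention
attention = lambda t: t         # stages 7-14: the attention module
ffn_norm = lambda t: t          # stage 16: RMSNorm before feed-forward
feed_forward = lambda t: t      # stage 17: the SwiGLU feed-forward module

h = x + attention(attention_norm(x))     # stage 15: first residual addition
out = h + feed_forward(ffn_norm(h))      # stage 18: second residual addition
```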