20. DIAGRAMS
Complete Model Diagram
This diagram shows the whole flow of the Llama 3.1 8B-Instruct model in as much detail as possible:
data:image/s3,"s3://crabby-images/58a67/58a6765d26ad29b77f36aec65d5c9548bbfc72d2" alt="Complete Model Diagram"
STAGE 1: Tokenization
data:image/s3,"s3://crabby-images/4dc3e/4dc3e47fa1017d51d2810a997ccc58e49e688009" alt="STAGE 1: Tokenization Diagram"
STAGE 2: Creating tokens tensor
data:image/s3,"s3://crabby-images/55cc5/55cc539ac21fd15e3ee6cac386e3f10706e23982" alt="STAGE 2: Creating tokens tensor Diagram"
STAGE 3: Looping through sequence length
data:image/s3,"s3://crabby-images/4de14/4de14225dbe8ec9e6b04e325ba29d1e43b05f290" alt="STAGE 3: Looping through sequence length Diagram"
STAGE 4: Creating inputTensor
data:image/s3,"s3://crabby-images/843b1/843b146849f550a5c8c4a8eb1b8d1dd5c7eadb6a" alt="STAGE 4: Creating inputTensor Diagram"
STAGE 5: Forward Pass Through Each Transformer Block
data:image/s3,"s3://crabby-images/46b7b/46b7b61e43b0cdc5313b6fb0e229c5f386785585" alt="STAGE 5: Forward Pass Through Each Transformer Block Diagram"
STAGE 6: Forward Pass Through Attention Pre-normalization
data:image/s3,"s3://crabby-images/26e57/26e57e3123576a78a64f01d94dcbc88e38c41397" alt="STAGE 6: Forward Pass Through Attention Pre-normalization Diagram"
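As a rough illustration of what this pre-normalization stage computes, here is a minimal pure-Python sketch of RMS normalization applied to a single token vector. The function name `rms_norm`, the `eps` value, and the toy weights are assumptions made for illustration; they are not taken from the project's code.

```python
import math

def rms_norm(x, weight, eps=1e-5):
    """Scale x by the reciprocal of its root mean square, then by a per-channel weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

x = [1.0, -2.0, 3.0, -4.0]
weight = [1.0, 1.0, 1.0, 1.0]  # illustrative; real weights come from the model checkpoint
normed = rms_norm(x, weight)
```

With unit weights, the mean of the squared outputs is close to 1, which is the property the normalization is meant to enforce before the attention module sees the tensor.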
STAGE 7: Forward Pass Through Attention Module
data:image/s3,"s3://crabby-images/36837/368375b03675c8741ffa2e9f7c4cd0f102a3cb86" alt="STAGE 7: Forward Pass Through Attention Module Diagram"
STAGE 8: Forward Pass Through Attention Module - Calculating xq, xk, and xv
data:image/s3,"s3://crabby-images/b5f2c/b5f2c7f2f3abb20d503a7036e7ca35d28ed213e4" alt="STAGE 8: Forward Pass Through Attention Module - Calculating xq, xk, and xv Diagram"
STAGE 9: Forward Pass Through Attention Module - Do reshapings
data:image/s3,"s3://crabby-images/89e76/89e761a1b2f7a55c92b0ac93d52dbecb2a1de798" alt="STAGE 9: Forward Pass Through Attention Module - Do reshapings Diagram"
STAGE 10: Forward Pass Through Attention Module - Apply Rotary Embeddings
data:image/s3,"s3://crabby-images/6dc39/6dc390146295ff0256d99c57bb006478f0cfc5b9" alt="STAGE 10: Forward Pass Through Attention Module - Apply Rotary Embeddings Diagram"
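The rotary embedding step can be sketched for a single head vector as below. This is a simplified pure-Python version under stated assumptions: the helper name `apply_rotary` is hypothetical, `theta=500000.0` is assumed as the base frequency, adjacent elements are treated as pairs, and the additional frequency scaling used by Llama 3.1 is omitted.

```python
import math

def apply_rotary(vec, position, theta=500000.0):
    """Rotate each adjacent pair (vec[i], vec[i+1]) by the angle position * theta**(-i/d)."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        angle = position * theta ** (-i / d)  # assumed base; frequency scaling omitted
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        a, b = vec[i], vec[i + 1]
        out.extend([a * cos_a - b * sin_a, a * sin_a + b * cos_a])
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.5, -0.75, 1.0]

# The dot product depends only on the distance between the two positions.
near = dot(apply_rotary(q, 3), apply_rotary(k, 1))
far = dot(apply_rotary(q, 5), apply_rotary(k, 3))
```

Both `near` and `far` use a position difference of 2, so they agree up to floating-point error. This relative-position property is why the rotation is applied to xq and xk before the scores are calculated.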
STAGE 11: Forward Pass Through Attention Module - Update KV cache
data:image/s3,"s3://crabby-images/ff284/ff284294dacda3d23b1d25a0ba42e9d3259ab51b" alt="STAGE 11: Forward Pass Through Attention Module - Update KV cache Diagram"
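The idea behind the cache update can be sketched with plain Python lists standing in for tensors. The function `update_kv_cache`, the `start_pos` parameter name, and the one-element toy vectors are assumptions for illustration only.

```python
def update_kv_cache(cache_k, cache_v, xk, xv, start_pos):
    """Write the new keys/values at start_pos and return the filled prefix of each cache."""
    for offset, (k, v) in enumerate(zip(xk, xv)):
        cache_k[start_pos + offset] = k
        cache_v[start_pos + offset] = v
    end = start_pos + len(xk)
    return cache_k[:end], cache_v[:end]

max_seq_len = 8  # illustrative capacity
cache_k = [None] * max_seq_len
cache_v = [None] * max_seq_len

# First pass: the three prompt positions are written at once.
keys, values = update_kv_cache(
    cache_k, cache_v, [[1.0], [2.0], [3.0]], [[10.0], [20.0], [30.0]], start_pos=0
)
# Next pass: only the single new token is written, at position 3.
keys, values = update_kv_cache(cache_k, cache_v, [[4.0]], [[40.0]], start_pos=3)
```

After the second call, `keys` and `values` hold all four positions, so the attention step can use earlier tokens without recomputing their keys and values.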
STAGE 12: Forward Pass Through Attention Module - Do transposes
data:image/s3,"s3://crabby-images/ff118/ff118dca18d58543b69078e31fb4bca7a4a10498" alt="STAGE 12: Forward Pass Through Attention Module - Do transposes Diagram"
STAGE 13: Forward Pass Through Attention Module - Calculate scores
data:image/s3,"s3://crabby-images/95fc8/95fc85eca34a73f4382d0eb376eff20857149571" alt="STAGE 13: Forward Pass Through Attention Module - Calculate scores Diagram"
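A minimal sketch of the score calculation for one attention head follows: scaled dot products, a causal mask, and a row-wise softmax. The function `attention_scores` and the toy inputs are hypothetical, and nested lists stand in for tensors.

```python
import math

def attention_scores(queries, keys, start_pos=0):
    """Return softmax(q . k / sqrt(head_dim)) per query row, masking out future keys."""
    head_dim = len(queries[0])
    rows = []
    for qi, q in enumerate(queries):
        q_pos = start_pos + qi
        logits = []
        for k_pos, k in enumerate(keys):
            if k_pos > q_pos:
                logits.append(float("-inf"))  # causal mask: a token cannot see later tokens
            else:
                logits.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(head_dim))
        peak = max(logits)  # subtract the maximum for numerical stability
        exps = [math.exp(l - peak) for l in logits]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows

queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
scores = attention_scores(queries, keys)
```

Each row of `scores` sums to 1, and the first query places zero weight on the second key because that key lies in its future.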
STAGE 14: Forward Pass Through Attention Module - Calculate output
data:image/s3,"s3://crabby-images/4893a/4893a85359569bac2655d677913c13f2407094cb" alt="STAGE 14: Forward Pass Through Attention Module - Calculate output Diagram"
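The core of the output calculation is a weighted sum of value vectors, sketched below for one head. The function `attention_output` and the toy numbers are assumptions; merging the heads back together and applying the output projection are left out of this sketch.

```python
def attention_output(weights, values):
    """For each query row, sum the value vectors weighted by that row's attention scores."""
    width = len(values[0])
    return [
        [sum(w * v[j] for w, v in zip(row, values)) for j in range(width)]
        for row in weights
    ]

weights = [[1.0, 0.0], [0.5, 0.5]]  # illustrative attention scores, each row sums to 1
values = [[2.0, 4.0], [6.0, 8.0]]   # one value vector per cached position
output = attention_output(weights, values)
```

The first row copies the first value vector unchanged, and the second row is the average of both value vectors.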
STAGE 15: Add attention module output and current tensor
data:image/s3,"s3://crabby-images/1773a/1773a24eaf9a367ecb6596939b80b35b313af0c2" alt="STAGE 15: Add attention module output and current tensor Diagram"
STAGE 16: Forward Pass Through Feed-Forward Pre-normalization
data:image/s3,"s3://crabby-images/0b440/0b4405780e09a574020dddba7b2e61cf4820baa1" alt="STAGE 16: Forward Pass Through Feed-Forward Pre-normalization Diagram"
STAGE 17: Forward Pass Through Feed-Forward Module
data:image/s3,"s3://crabby-images/0e5e3/0e5e3f419916a97d73982b68109717b5793426f7" alt="STAGE 17: Forward Pass Through Feed-Forward Module Diagram"
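The feed-forward module in Llama-family models is a gated (SwiGLU-style) network, which can be sketched in pure Python as below. The weight names `w1`, `w2`, and `w3`, the helper functions, and the tiny matrices are assumptions for illustration; real weights are far larger and loaded from the checkpoint.

```python
import math

def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def matvec(matrix, vec):
    """Multiply a row-major matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def feed_forward(x, w1, w2, w3):
    """Compute w2 @ (silu(w1 @ x) * (w3 @ x)) for one token vector."""
    gate = [silu(v) for v in matvec(w1, x)]
    up = matvec(w3, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w2, hidden)

x = [1.0, 2.0]
w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # dim 2 -> hidden 3
w3 = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]  # dim 2 -> hidden 3
w2 = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]     # hidden 3 -> dim 2
y = feed_forward(x, w1, w2, w3)
```

The output `y` has the same width as the input `x`, which is what lets the next stage add it back onto the current tensor.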
STAGE 18: Add Feed-Forward module output and current tensor
data:image/s3,"s3://crabby-images/a4973/a49736751dcf86d5aa3b29ebc065771b44a70f47" alt="STAGE 18: Add Feed-Forward module output and current tensor Diagram"
STAGE 19: Forward Pass Through Output of The Transformer
data:image/s3,"s3://crabby-images/832f0/832f059561ddd0ef6f0a0b80328a6baf232d66b6" alt="STAGE 19: Forward Pass Through Output of The Transformer Diagram"