FiveTech Support Forums

FiveWin / Harbour / xBase community
Transformers diagram
Posted: Tue Sep 09, 2025 07:33 PM
graph TD
subgraph Input
Tokens --> Embedding
Embedding --> Positional_Encoding
end
subgraph Encoder
Positional_Encoding --> MultiHead_Attention1
MultiHead_Attention1 --> AddNorm1
AddNorm1 --> FeedForward1
FeedForward1 --> AddNorm2
AddNorm2 --> Encoder_Output
end
subgraph Decoder
Target_Embedding --> Masked_MultiHead_Attention
Masked_MultiHead_Attention --> AddNorm3
AddNorm3 --> MultiHead_Attention2
Encoder_Output --> MultiHead_Attention2
MultiHead_Attention2 --> AddNorm4
AddNorm4 --> FeedForward2
FeedForward2 --> AddNorm5
AddNorm5 --> Linear
Linear --> Softmax
Softmax --> Output
end
subgraph Attention
Query --> Scaled_Dot_Product
Key --> Scaled_Dot_Product
Value --> Scaled_Dot_Product
Scaled_Dot_Product --> Mask[Optional Mask]
Mask --> Softmax_Attention
Softmax_Attention --> Weighted_Sum
end
subgraph MultiHead
Weighted_Sum --> Concat
Concat --> Linear_Projection
end

%% Backpropagation (dotted, reverse direction)
%% From the loss at the output
Output -. grad .-> Softmax
Softmax -. dL/dlogits .-> Linear
%% Decoder backward
Linear -. grad .-> AddNorm5
AddNorm5 -. grad .-> FeedForward2
FeedForward2 -. grad .-> AddNorm4
AddNorm4 -. grad .-> MultiHead_Attention2
MultiHead_Attention2 -. grad to Q,K,V,W .-> AddNorm3
AddNorm3 -. grad .-> Masked_MultiHead_Attention
Masked_MultiHead_Attention -. grad .-> Target_Embedding
%% Encoder backward (via cross-attention)
MultiHead_Attention2 -. grad .-> Encoder_Output
Encoder_Output -. grad .-> AddNorm2
AddNorm2 -. grad .-> FeedForward1
FeedForward1 -. grad .-> AddNorm1
AddNorm1 -. grad .-> MultiHead_Attention1
MultiHead_Attention1 -. grad .-> Positional_Encoding
Positional_Encoding -. dL/dEmb .-> Embedding
Embedding -. grad .-> Tokens
%% Attention internals (backprop through attention)
%% Note: typical gradients inside the attention sublayer are shown
Softmax_Attention -. dL/dAttn .-> Mask
Mask -. grad .-> Scaled_Dot_Product
Scaled_Dot_Product -. grad Q,K,V .-> Query
Scaled_Dot_Product -. grad Q,K,V .-> Key
Scaled_Dot_Product -. grad Q,K,V .-> Value
Weighted_Sum -. grad .-> Softmax_Attention
Concat -. grad .-> Weighted_Sum
Linear_Projection -. grad .-> Concat
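
To connect the Attention subgraph to working code, here is a minimal NumPy sketch of scaled dot-product attention with the optional mask. The function and argument names are mine, picked to match the diagram nodes; it is an illustration, not a reference implementation.

import numpy as np

def scaled_dot_product_attention(query, key, value, mask=None):
    # Scaled_Dot_Product: Q K^T / sqrt(d_k)
    d_k = query.shape[-1]
    scores = query @ key.swapaxes(-2, -1) / np.sqrt(d_k)
    # Optional Mask: blocked positions get a large negative score
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    # Softmax_Attention: normalize scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted_Sum: mix the value vectors by the attention weights
    return weights @ value

# usage: one sequence of 4 tokens with d_k = 8
q = k = v = np.random.randn(4, 8)
causal = np.tril(np.ones((4, 4), dtype=bool))  # decoder-style mask
out = scaled_dot_product_attention(q, k, v, causal)
print(out.shape)  # (4, 8)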
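And the dotted backward edges are what an autograd engine gives you for free. A short PyTorch sketch, again with my own variable names, showing that a single backward() call fills in exactly the "grad Q,K,V" arrows drawn above:

import torch

q = torch.randn(4, 8, requires_grad=True)
k = torch.randn(4, 8, requires_grad=True)
v = torch.randn(4, 8, requires_grad=True)

# forward: Scaled_Dot_Product -> Softmax_Attention -> Weighted_Sum
scores = q @ k.transpose(-2, -1) / (8 ** 0.5)
weights = torch.softmax(scores, dim=-1)
out = weights @ v

# a dummy scalar loss standing in for the real cross entropy
loss = out.sum()
loss.backward()

# the dotted "grad Q,K,V" edges of the diagram
print(q.grad.shape, k.grad.shape, v.grad.shape)  # torch.Size([4, 8]) x 3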
regards, saludos

Antonio Linares
www.fivetechsoft.com
Re: Transformers diagram
Posted: Tue Sep 16, 2025 10:50 AM
flowchart TD
    A[Tech Tree] --> B[Python]

    %% --- Dataset ---
    subgraph DS[Dataset]
        C[Basic Dataset] --> R[Data Collection]
        R --> S[Data Cleaning]
        S --> T[Tokenisation]
        T --> U[Byte Pair Encoding]
        U --> V[Causal Language Modelling]
        V --> W[Positional Encoding]
        W --> X[Rotary Positional Encoding]
    end

    %% --- Basic Operations ---
    subgraph OP[Basic Operations]
        D[Tensor]
        E[Basic Operations]
        Y[Linear Layer]
        Z[ReLU]
        AA[Swish]
        AB[GeLU]
        AC[SiLU/Swish]
        AD[Attention]
        AE[Multi-Head Attention]
    end

    %% --- Training ---
    subgraph TR[Training]
        F[Autograd]
        G[Mean Squared Error]
        H[Cross Entropy]
        I[Backpropagation]
        J[Stochastic Gradient Descent]
        K[Weight Decay]
        L[Momentum]
        M[Learning rate]
        N[Gradient Clipping]
        O[AdamW]
        AN[Hyperparameter Tuning]
        AO[μ-Scaling Laws]
    end

    %% --- Architecture ---
    subgraph ARQ[Architecture]
        AF[Multi-Layer Perceptron]
        Q[Transformer Block]
        AG[Transformer]
        AH[Small Language Model]
        AI[Modern Language Model]
        AJ[Quantisation]
        AK[Batch Norm]
        AL[Layer Norm]
        AM[RMSNorm]
    end

    %% Main relationships
    B --> C
    B --> D
    B --> E

    D --> F
    F --> G
    F --> H
    F --> I
    I --> J

    J --> K
    J --> L
    J --> M
    J --> N
    J --> O
    O --> Q

    E --> Y
    Y --> Z
    Y --> AA
    Y --> AB
    AA --> AC
    Y --> AD
    AD --> AE

    Z --> AF
    AA --> AF
    AB --> AF
    AC --> AF

    AF --> Q
    AE --> Q
    I --> Y

    Q --> AG
    AG --> AH
    AH --> AI
    AI --> AJ

    AH --> AK
    AH --> AL
    AH --> AM

    K --> Q
    L --> Q
    M --> Q
    N --> Q
    AN --> AO --> Q
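
As a small worked example for the Byte Pair Encoding node of the tree: a toy sketch of BPE merge rounds, counting the most frequent adjacent pair of symbols and fusing it. The corpus and the helper names are invented for the demo.

from collections import Counter

def most_frequent_pair(words):
    # count adjacent symbol pairs across the corpus
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    # fuse every occurrence of the pair into a single symbol
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# toy corpus: word -> frequency, words pre-split into characters
words = {tuple("lower"): 2, tuple("lowest"): 1, tuple("low"): 5}
for _ in range(3):  # three merge rounds
    pair = most_frequent_pair(words)
    words = merge_pair(words, pair)
    print(pair, "->", list(words))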
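Several nodes of the Training subgraph (Momentum, Weight Decay, Learning rate, Gradient Clipping) fit in a single update rule. A NumPy sketch with made-up hyperparameter values, meant to show how the pieces compose rather than to be tuned:

import numpy as np

def sgd_step(theta, grad, velocity, lr=0.01, momentum=0.9,
             weight_decay=1e-4, clip_norm=1.0):
    # Gradient Clipping: rescale the gradient if its norm is too large
    norm = np.linalg.norm(grad)
    if norm > clip_norm:
        grad = grad * (clip_norm / norm)
    # Weight Decay: pull the parameters toward zero
    grad = grad + weight_decay * theta
    # Momentum: accumulate a velocity from past gradients
    velocity = momentum * velocity + grad
    # Learning rate: step size along the update direction
    return theta - lr * velocity, velocity

theta = np.random.randn(10)
velocity = np.zeros_like(theta)
for step in range(100):
    grad = 2 * theta          # gradient of a toy quadratic loss ||theta||^2
    theta, velocity = sgd_step(theta, grad, velocity)
print(np.linalg.norm(theta))  # shrinks toward 0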
regards, saludos

Antonio Linares
www.fivetechsoft.com
