https://medium.com/data-science-in-your-pocket/googles-mixture-of-recursions-end-of-transformers-b8de0fe9c83b
https://youtu.be/GWqXCgd7Hnc?si=0hV8wnGRT7slVYtQ
Mixture-of-Recursions (MoR) is a recursive Transformer, but instead of looping the same number of times for every token, MoR assigns each token its own recursion depth depending on how much "thinking" it needs. A tiny neural-net router makes this decision, token by token, during both training and inference.
https://youtu.be/qUtxkMk7_AE?si=rVImHs-eeeR9bwTw
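The idea above can be sketched in a few lines of numpy. This is a toy illustration, not the paper's actual architecture: the "shared block" is a stand-in linear layer, and the router is just a single sigmoid-scored linear projection that maps each token to a depth in {1, ..., MAX_DEPTH}; all names and sizes here are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 16          # hidden size (toy value)
MAX_DEPTH = 3   # maximum recursion depth

# Weights shared (tied) across every recursion step -- toy stand-ins
W_block = rng.normal(scale=0.1, size=(D, D))
w_router = rng.normal(scale=0.1, size=D)  # "tiny neural net" router: one linear scorer

def shared_block(h):
    """One application of the shared block (toy linear + ReLU, with a residual)."""
    return np.maximum(h @ W_block, 0.0) + h

def assign_depths(h):
    """Router scores each token; a higher score means more recursion steps."""
    scores = 1.0 / (1.0 + np.exp(-(h @ w_router)))  # sigmoid, in (0, 1)
    # Bucket the score into an integer depth in {1, ..., MAX_DEPTH}
    return 1 + np.floor(scores * MAX_DEPTH).astype(int).clip(0, MAX_DEPTH - 1)

def mor_forward(h):
    """Apply the shared block a per-token number of times."""
    depths = assign_depths(h)
    out = h.copy()
    for step in range(1, MAX_DEPTH + 1):
        active = depths >= step            # tokens that still need "thinking"
        out[active] = shared_block(out[active])
    return out, depths

tokens = rng.normal(size=(5, D))           # a batch of 5 token states
out, depths = mor_forward(tokens)
print(depths)                              # each token gets its own depth
```

The key point the sketch captures is parameter sharing: the same block is reused at every step, and easy tokens exit the loop early while hard tokens keep iterating, which is where MoR's compute savings come from.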