Encoder - understands (encodes) the input text into an internal representation
Decoder - uses the encoder's understanding to generate a new piece of text from it

The decoder generates output autoregressively: each token it produces is appended to the sequence and fed back in as input for the next step
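
A minimal conceptual sketch of that feedback loop. `encode` and `decode_step` are hypothetical placeholders standing in for a real model's encoder and a single decoder step; they are not a real API.

```python
def generate(encode, decode_step, input_tokens, eos_token, max_new_tokens=50):
    """Autoregressive generation: each new token is appended to the
    output and fed back into the decoder for the next step."""
    context = encode(input_tokens)  # the encoder's understanding of the input
    output = []
    for _ in range(max_new_tokens):
        # each step sees the encoder output plus everything generated so far
        next_token = decode_step(context, output)
        if next_token == eos_token:  # stop when the model signals it is done
            break
        output.append(next_token)   # feed the new token back in
    return output
```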

Memory - Each token in the context consumes memory (e.g. cached attention keys and values), so as the token count increases the memory required increases, and on long sequences it is eventually exhausted (see the arithmetic sketch after the Compute note)

Compute - Each additional token requires more computation, since it must attend over all the tokens before it; the longer the sequence, the more compute each step costs
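
Illustrative arithmetic for the memory point: a decoder typically caches a key and a value vector per token, per layer (the "KV cache"), so memory grows linearly with token count. The model dimensions below are assumptions for illustration, not any specific model's real config.

```python
def kv_cache_bytes(num_tokens, num_layers=32, hidden_dim=4096, bytes_per_value=2):
    # 2x for the key vector and the value vector at every layer, every token
    return 2 * num_layers * hidden_dim * bytes_per_value * num_tokens

for n in (1_000, 10_000, 100_000):
    gib = kv_cache_bytes(n) / 2**30
    print(f"{n:>7} tokens -> ~{gib:.1f} GiB of KV cache")
# 1k tokens -> ~0.5 GiB, 10k -> ~4.9 GiB, 100k -> ~48.8 GiB:
# memory grows linearly and eventually exceeds what the hardware has
```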

Models offered as a service often impose a limit on the combined number of input and output tokens
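
One way to respect a combined limit before calling a hosted model is to count tokens up front. This sketch uses the tiktoken tokenizer library; the 8192 limit and the reserved output budget are assumed values for illustration, not any particular provider's real numbers.

```python
import tiktoken

CONTEXT_LIMIT = 8192   # combined input + output tokens (assumed)
OUTPUT_BUDGET = 1024   # tokens reserved for the model's reply (assumed)

enc = tiktoken.get_encoding("cl100k_base")

def fits(prompt: str) -> bool:
    # the input must leave room for the output within the combined limit
    input_tokens = len(enc.encode(prompt))
    return input_tokens + OUTPUT_BUDGET <= CONTEXT_LIMIT

print(fits("Summarise the following document: ..."))
```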

As a consumer, it's better to keep conversations short and roll over by summarising the current session into the next one, so that the token context stays small
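
A sketch of that summarise-and-restart strategy. `ask_model` is a hypothetical stand-in for whatever chat API is being used; the point is that the new session starts from a short summary rather than the full history.

```python
def rollover_session(ask_model, history: list[str]) -> list[str]:
    """Compress a long conversation into a seed for a fresh session."""
    transcript = "\n".join(history)
    summary = ask_model(
        "Summarise the key facts and decisions in this conversation "
        "so it can continue in a fresh session:\n" + transcript
    )
    # The next session carries only the compact summary, not every turn,
    # so the token context starts small again
    return [f"Summary of previous session: {summary}"]
```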