Abstract: This paper examines the Transformer architecture at the intersection of philosophy and technology, offering a systematic critique of the fundamental limitations of large language models (LLMs) built upon it. It argues that the Transformer is essentially a "prisoner of experience," whose capabilities are strictly confined to the "past" and the "known" as defined by its training data. The critique proceeds along three dimensions. First, at the epistemological level, its learning paradigm based on maximum likelihood estimation is an extreme form of empiricism: it cannot reach a priori reason or logical necessity, and it remains ensnared by the problem of induction. Second, at the ontological level, its word embeddings and attention mechanisms operate within a closed symbolic system that lacks intentionality toward the real world, and its tokenization process yields a fragmented understanding of concepts. Finally, at the level of the philosophy of mind, its nature as a deterministic function approximator makes it a scaled-up version of the "Chinese room" thought experiment, lacking belief, intention, and genuine understanding. The paper concludes that while the Transformer is an excellent piece of engineering, the architecture itself cannot lead to artificial general intelligence (AGI), and that future breakthroughs will require a new paradigm transcending pure empiricism.
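The abstract's epistemological claim can be made concrete with a minimal sketch. The toy bigram model below (all names and the three-sentence corpus are hypothetical, and a bigram model stands in for a full Transformer only to isolate the training objective) shows the two properties under critique: maximum likelihood estimation can only lower its loss on sequences resembling the "past" it was trained on, assigning zero probability to the unseen, and greedy decoding from the fitted distribution is fully deterministic.

```python
import math
from collections import Counter, defaultdict

# Toy corpus: the model's entire "experience" of the world.
corpus = ["the cat sat", "the cat ran", "the dog sat"]

# Maximum likelihood estimation for a bigram next-token model:
# P(next | prev) = count(prev, next) / count(prev, *).
counts = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1

def prob(prev, nxt):
    """MLE estimate of the next-token distribution."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

def nll(sentence):
    """Negative log-likelihood: the quantity MLE training minimizes."""
    tokens = sentence.split()
    return -sum(math.log(prob(p, n)) for p, n in zip(tokens, tokens[1:]))

def greedy_next(prev):
    """Deterministic decoding: the same past always yields the same output."""
    return counts[prev].most_common(1)[0][0]

print(round(nll("the cat sat"), 3))  # finite loss on a seen sequence
print(greedy_next("the"))            # always the same continuation: "cat"
print(prob("the", "bird"))           # unseen pairing gets probability 0.0
```

The last line is the "prisoner of experience" in miniature: nothing in the objective assigns any likelihood to what lies outside the training data, which is the formal face of the problem of induction discussed in Section 1.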
Keywords: Transformer, Large Language Models, Empiricism, Induction, Intentionality
Introduction
1. The Epistemological Prisoner: A Technological Realization of Extreme Empiricism
2. Ontology and the Lost Symbol: Word Embeddings and Attention Mechanisms without "Worldness"
3. The Paradox of the Philosophy of Mind: A "Mindless" Device as a Deterministic Function Approximator
Conclusion and Future Outlook

