Appendix B.1 -- Is Artificial Intelligence New?
-----------------------------------------------

No. As early as the last century, there was already AI software, such as
computer algebra systems [maxima]. Their intelligence comes from a set of
human-designed rules, which works well for some well-defined problems. After
many years of development, people gradually managed to design AI software
that can recognize handwritten digits [mnist]. As computer hardware capacity
grew, people then managed to build AI software that can recognize human
faces [facenet] and objects in images [imagenet, resnet]. As the training
data and compute continued to scale up, and with the architecture revolution
brought by the transformer [transformer], a group of people created ChatGPT
[instructgpt] and eventually ignited the current AI hype.

[maxima]: https://maxima.sourceforge.io/
[mnist]: https://en.wikipedia.org/wiki/MNIST_database
[facenet]: https://arxiv.org/abs/1503.03832
[imagenet]: https://arxiv.org/abs/1409.0575
[resnet]: https://arxiv.org/abs/1512.03385
[transformer]: a.k.a. "Attention Is All You Need" https://arxiv.org/abs/1706.03762
[instructgpt]: https://arxiv.org/abs/2203.02155

If you want to learn more about the history of AI and the recent AI
advances, please refer to other materials such as Wikipedia. For
professional reading on the recent advances, my personal recommendation is
the Deep Learning book [dlbook]. The arXiv papers mentioned above are simply
milestones on the road of AI development. In particular, the popular Large
Language Models (LLMs) are mostly decoder-only transformers [transformer].

[dlbook]: https://www.deeplearningbook.org/

Appendix B.2 -- What is (modern) AI Software?
---------------------------------------------

First, it is still software, but of a particular type. In the current
context, the term refers to software that is able to learn from data, and to
make predictions or decisions based on the learned knowledge. The trained AI
models can be distributed along with the inference software for deployment,
to achieve a certain functionality.

From the file system point of view, the creation and distribution of AI
software involves several parts:

(1) Training data, or a training simulator. The training data can be, for
example, a large dataset of images along with their annotations
[imagenet, coco], for AI focusing on computer vision. In the natural
language processing (or computational linguistics) area, the training data
can be the Wikipedia dumps. A training simulator is not quite "data": it can
be Grand Theft Auto [gta] or Minecraft [minecraft], which are used to train
AI agents to play. Physics simulators and Atari games are also used for this
purpose.

(2) Training software. This is just software, written in a familiar language
like Python or C++. It defines the model architecture, the algorithm
parameters, and the training process. You run the training software on the
prepared training data and obtain a trained model. This model is also called
a "pre-trained" model in the later steps.

(3) Pre-trained model. Technically, you can think of such a model as a set
of matrices and vectors -- and on disk, they are indeed stored as matrices
and vectors. The model is the trickiest part; we will expand on this later.

(4) Inference software. Inference is the stage where you use an already
trained (i.e., pre-trained) model to make predictions. The training software
itself is not enough if you want to use a pre-trained model. That's because
the inference of an AI model can be very different from its training: for
example, dropout layers and diffusion probabilistic models behave
differently between training and inference. So, while it is possible to
guess how to run inference with a model based on its training software, it
is not always easy.
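To make points (3) and (4) concrete, here is a minimal sketch, assuming
PyTorch as the framework (any similar framework would do): the same dropout
layer is active in training mode but disabled in evaluation mode, so the
inference code must switch modes explicitly; and the "model" is, in the end,
just a collection of named tensors.

```python
import torch

# A tiny model containing a dropout layer.
model = torch.nn.Sequential(
    torch.nn.Linear(4, 4),
    torch.nn.Dropout(p=0.5),  # randomly zeroes activations during training
)
x = torch.ones(1, 4)

model.train()                    # training mode: dropout is active
y1, y2 = model(x), model(x)      # y1 and y2 generally differ

model.eval()                     # inference mode: dropout is a no-op
with torch.no_grad():            # gradients are not needed for inference
    y3, y4 = model(x), model(x)  # y3 and y4 are identical

# Point (3): on disk, the pre-trained model is essentially these tensors.
for name, tensor in model.state_dict().items():
    print(name, tuple(tensor.shape))
```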
[coco]: https://cocodataset.org/
[gta]: https://en.wikipedia.org/wiki/Grand_Theft_Auto
[minecraft]: https://en.wikipedia.org/wiki/Minecraft

Appendix B.3 -- Common Practice of Handling Pre-trained AI Models
-----------------------------------------------------------------

The common practice in the ecosystem is to release the pre-trained model
together with the inference software, so that the AI software can be used by
end users. For instance, Huggingface [huggingface] is a popular platform for
sharing pre-trained models, based on git-lfs.

Pre-trained AI models are not easy to handle, because some state-of-the-art
models are giant blobs that can easily go beyond 400+ GB [deepseek-r1].
Unlike artistic creations such as photos and videos, which need no further
editing after the final cut, AI models are nowadays frequently updated. So
it is not wise to embed the AI models into the code repository and
distribute them together. Instead, the common practice is to store the model
separately, on dedicated servers, cloud storage, or Huggingface. The AI
software upstream then writes code for automatically downloading the model
from the internet (a minimal sketch of this is given after Appendix B.4), or
at least provides instructions in the readme telling users how to download
the model and prepare it for use [llama.cpp, ollama, torchvision,
transformers].

[huggingface]: https://huggingface.co/
[deepseek-r1]: https://huggingface.co/deepseek-ai/DeepSeek-R1
[llama.cpp]: https://github.com/ggerganov/llama.cpp
[ollama]: https://ollama.com/
[torchvision]: https://pytorch.org/vision/stable/index.html
[transformers]: https://huggingface.co/docs/transformers/index

We can take a closer look at some recently popular AI models, in particular
Large Language Models (LLMs), based on ollama's listing [ollama-listing]:

* Deepseek-R1: the model (or say, the model weights) is released under the
  MIT license. https://ollama.com/library/deepseek-r1
* Llama 3.3: the model is released under a custom non-free license (see, for
  example, the "Additional Commercial Terms" section).
  https://ollama.com/library/llama3.3/blobs/bc371a43ce90
* Phi-4 (Microsoft): the model is released under the MIT license.
  https://ollama.com/library/phi4/blobs/fa8235e5b48f
* Mistral-7B: the model is released under the Apache-2.0 license.
  https://ollama.com/library/mistral/blobs/43070e2d4e53

Indeed, you can see a trend: the ecosystem appreciates open source licenses,
and sharing knowledge benefits the whole ecosystem.

[ollama-listing]: https://ollama.com/search

Appendix B.4 -- "Reproducibility" of AI Models
----------------------------------------------

Unlike compiled software, AI models cannot be rebuilt deterministically. The
training process is stochastic, and many factors (such as non-deterministic
GPU kernels and floating-point summation order) can still affect the
training result even if all random seeds are fixed. So "reproducibility" in
the AI context usually does not mean byte-to-byte reproducibility, but model
performance reproducibility: the reproduced model can reach a similar
performance or effectiveness as the original model being reproduced.
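As an illustration, below is a minimal sketch of the usual seed-fixing
ritual, again assuming PyTorch; even with all of this, operations without a
deterministic implementation can still introduce run-to-run differences.

```python
import os
import random

import numpy as np
import torch

def fix_seeds(seed: int = 0) -> None:
    """Fix the common sources of randomness in a PyTorch training script."""
    random.seed(seed)        # Python's built-in RNG
    np.random.seed(seed)     # NumPy RNG
    torch.manual_seed(seed)  # PyTorch RNG (CPU and CUDA)
    # Prefer deterministic kernels; PyTorch raises an error when an
    # operation has no deterministic implementation.
    torch.use_deterministic_algorithms(True)
    # cuBLAS needs this environment variable (set before CUDA is first
    # used) to behave deterministically on GPU.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    # Disable cuDNN autotuning, which picks kernels non-deterministically.
    torch.backends.cudnn.benchmark = False

fix_seeds(42)
```

Even with such a function, multi-GPU communication, data-loader worker
scheduling, and library version differences can still change the resulting
weights, which is why the performance-level notion of reproducibility above
is the practical one.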
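Relatedly, the automatic model download mentioned in Appendix B.3 often
reduces to a few lines of code. Below is a minimal sketch using the
huggingface_hub library; the repository and file names are illustrative
placeholders, not a recommendation.

```python
from huggingface_hub import hf_hub_download

# Download one model file from the Huggingface Hub into the local cache
# (~/.cache/huggingface by default) and return its on-disk path.
# "some-org/some-model" and "model.safetensors" are placeholders.
path = hf_hub_download(
    repo_id="some-org/some-model",
    filename="model.safetensors",
)
print(path)
```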