Appendix B.1 -- Is Artificial Intelligence New?
-----------------------------------------------

No. As early as the last century, there was already AI software, such as
computer algebra systems [maxima]. Their intelligence comes from a set of
human-designed rules, which works well for some well-defined problems. After
many years of development, people gradually managed to design AI software
that can recognize handwritten digits [mnist]. As computer hardware capacity
grew, people then managed to build AI software that can recognize human
faces [facenet] and objects in images [imagenet, resnet]. As the training
data and compute continued to scale up, and with the architecture revolution
brought by the transformer [transformer], a group of people created ChatGPT
[instructgpt] and eventually ignited the current AI hype.

[maxima]: https://maxima.sourceforge.io/
[mnist]: https://en.wikipedia.org/wiki/MNIST_database
[facenet]: https://arxiv.org/abs/1503.03832
[imagenet]: https://arxiv.org/abs/1409.0575
[resnet]: https://arxiv.org/abs/1512.03385
[transformer]: a.k.a. "Attention Is All You Need" https://arxiv.org/abs/1706.03762
[instructgpt]: https://arxiv.org/abs/2203.02155

If you want to learn more about the history of AI and the recent AI
advances, please refer to other materials such as Wikipedia. For
professional reading on the recent advances, my personal recommendation is
the Deep Learning book [dlbook]. The arXiv papers mentioned above are simply
milestones on the road of AI development. In particular, the popular Large
Language Models (LLMs) are mostly decoder-only transformers [transformer].

[dlbook]: https://www.deeplearningbook.org/

Appendix B.2 -- What is (modern) AI Software?
---------------------------------------------

First, it is still software, but of a particular type. In the current
context, the term refers to software that is able to learn from data, and to
make predictions or decisions based on the learned knowledge. The trained AI
models can be distributed along with the inference software for deployment,
to achieve a certain functionality.

From the file system point of view, the creation and distribution of AI
software involves several parts:

(1) Training data, or a training simulator. The training data can be, for
example, a large dataset of images along with their annotations
[imagenet, coco], for AI focusing on computer vision. In the natural
language processing (or computational linguistics) area, the training data
can be the Wikipedia dumps. A training simulator is not quite "data": it can
be Grand Theft Auto [gta] or Minecraft [minecraft], which are used to train
AI agents to play. Physics simulators and Atari games are also used for this
purpose.

(2) Training software. This is just software, written in a familiar language
like Python or C++. It defines the model architecture, the algorithm
parameters, and the training process. You run the training software on the
prepared training data and obtain a trained model. This model is also called
a "pre-trained" model in the later steps.

(3) Pre-trained model. Technically, you can think of such a model as a set
of matrices and vectors -- and on disk, they are indeed stored as matrices
and vectors. The model is the trickiest part; we will expand on this later.

(4) Inference software. Inference is the stage where you use an already
trained (i.e., pre-trained) model to make predictions. The training software
itself is not enough if you want to use a pre-trained model. That's because
the inference of an AI model can be very different from its training: for
example, dropout layers and diffusion probabilistic models behave
differently between training and inference. So, while it is possible to
guess how to run inference with a model based on its training software, it
is not always easy.
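To make points (3) and (4) concrete, here is a minimal sketch, assuming
PyTorch as the framework (any similar framework would do): the same dropout
layer is active in training mode but disabled in evaluation mode, so the
inference code must switch modes explicitly; and the "model" is, in the end,
just a collection of named tensors.

```python
import torch

# A tiny model containing a dropout layer.
model = torch.nn.Sequential(
    torch.nn.Linear(4, 4),
    torch.nn.Dropout(p=0.5),  # randomly zeroes activations during training
)
x = torch.ones(1, 4)

model.train()                    # training mode: dropout is active
y1, y2 = model(x), model(x)      # y1 and y2 generally differ

model.eval()                     # inference mode: dropout is a no-op
with torch.no_grad():            # gradients are not needed for inference
    y3, y4 = model(x), model(x)  # y3 and y4 are identical

# Point (3): on disk, the pre-trained model is essentially these tensors.
for name, tensor in model.state_dict().items():
    print(name, tuple(tensor.shape))
```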
[coco]: https://cocodataset.org/
[gta]: https://en.wikipedia.org/wiki/Grand_Theft_Auto
[minecraft]: https://en.wikipedia.org/wiki/Minecraft

Appendix B.3 -- Common Practice of Handling Pre-trained AI Models
-----------------------------------------------------------------

The common practice in the ecosystem is to release the pre-trained model
together with the inference software, so that the AI software can be used by
end users. For instance, Huggingface [huggingface] is a popular platform for
sharing pre-trained models, based on git-lfs.

Pre-trained AI models are not easy to handle, because some state-of-the-art
models are giant blobs that can easily go beyond 400+ GB [deepseek-r1].
Unlike artistic creations such as photos and videos, which need no further
editing after the final cut, AI models are nowadays frequently updated. So
it is not wise to embed the AI models into the code repository and
distribute them together. Instead, the common practice is to store the model
separately, on dedicated servers, cloud storage, or Huggingface. The AI
software upstream then writes code for automatically downloading the model
from the internet (a minimal sketch of this is given after Appendix B.4), or
at least provides instructions in the readme telling users how to download
the model and prepare it for use [llama.cpp, ollama, torchvision,
transformers].

[huggingface]: https://huggingface.co/
[deepseek-r1]: https://huggingface.co/deepseek-ai/DeepSeek-R1
[llama.cpp]: https://github.com/ggerganov/llama.cpp
[ollama]: https://ollama.com/
[torchvision]: https://pytorch.org/vision/stable/index.html
[transformers]: https://huggingface.co/docs/transformers/index

We can take a closer look at some recently popular AI models, in particular
Large Language Models (LLMs), based on ollama's listing [ollama-listing]:

* Deepseek-R1: the model (or say, the model weights) is released under the
  MIT license. https://ollama.com/library/deepseek-r1
* Llama 3.3: the model is released under a custom non-free license (see, for
  example, the "Additional Commercial Terms" section).
  https://ollama.com/library/llama3.3/blobs/bc371a43ce90
* Phi-4 (Microsoft): the model is released under the MIT license.
  https://ollama.com/library/phi4/blobs/fa8235e5b48f
* Mistral-7B: the model is released under the Apache-2.0 license.
  https://ollama.com/library/mistral/blobs/43070e2d4e53

Indeed, you can see a trend: the ecosystem appreciates open source licenses,
and sharing knowledge benefits the whole ecosystem.

[ollama-listing]: https://ollama.com/search

Appendix B.4 -- "Reproducibility" of AI Models
----------------------------------------------

Unlike compiled software, AI models cannot be rebuilt deterministically. The
training process is stochastic, and many factors (such as non-deterministic
GPU kernels and floating-point summation order) can still affect the
training result even if all random seeds are fixed. So "reproducibility" in
the AI context usually does not mean byte-to-byte reproducibility, but model
performance reproducibility: the reproduced model can reach a similar
performance or effectiveness as the original model being reproduced.
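As an illustration, below is a minimal sketch of the usual seed-fixing
ritual, again assuming PyTorch; even with all of this, operations without a
deterministic implementation can still introduce run-to-run differences.

```python
import os
import random

import numpy as np
import torch

def fix_seeds(seed: int = 0) -> None:
    """Fix the common sources of randomness in a PyTorch training script."""
    random.seed(seed)        # Python's built-in RNG
    np.random.seed(seed)     # NumPy RNG
    torch.manual_seed(seed)  # PyTorch RNG (CPU and CUDA)
    # Prefer deterministic kernels; PyTorch raises an error when an
    # operation has no deterministic implementation.
    torch.use_deterministic_algorithms(True)
    # cuBLAS needs this environment variable (set before CUDA is first
    # used) to behave deterministically on GPU.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    # Disable cuDNN autotuning, which picks kernels non-deterministically.
    torch.backends.cudnn.benchmark = False

fix_seeds(42)
```

Even with such a function, multi-GPU communication, data-loader worker
scheduling, and library version differences can still change the resulting
weights, which is why the performance-level notion of reproducibility above
is the practical one.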
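Relatedly, the automatic model download mentioned in Appendix B.3 often
reduces to a few lines of code. Below is a minimal sketch using the
huggingface_hub library; the repository and file names are illustrative
placeholders, not a recommendation.

```python
from huggingface_hub import hf_hub_download

# Download one model file from the Huggingface Hub into the local cache
# (~/.cache/huggingface by default) and return its on-disk path.
# "some-org/some-model" and "model.safetensors" are placeholders.
path = hf_hub_download(
    repo_id="some-org/some-model",
    filename="model.safetensors",
)
print(path)
```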