Exploring code in the era of large language models
Reading code is often boring, and working out what a given snippet is for can take a long time. Here we examine a piece of code (a Python file) that is actually not terribly hard to understand, just to offer an alternative point of view. Of course… it's just an example and may not generalize to every situation, but it should still be useful.
Example file
We use Visual Studio Code (just for convenience; other editors are fine too). The file we are going to explore is model.py from the llama2.c repository by A. Karpathy. Let's assume some familiarity with neural networks, transformers and large language models. The code is 367 lines long and implements the model behind a tiny Llama 2 implementation. Llama 2 is an open source large language model that can be used for natural language processing tasks such as sentiment analysis, topic modeling, named entity recognition, and more.
What is this code for?
Many code files carry a description of what the code does more or less at the beginning of the file (usually as comments marked with quotes, slashes or hashtags). That is not the case in our example (the file model.py), but the file name is self-explanatory enough: it is the code for the neural network model (based on the transformer architecture). We will not go into what a transformer or an attention mechanism is, because that would be beyond the scope of this post.
At the beginning of the program you can see the lines that import modules or libraries, identified by the initial import keyword. Note that the ML framework used is PyTorch (import torch).
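As a reference, the import block at the top of the file looks roughly like the sketch below (abbreviated and partly reconstructed from memory; check model.py in the repository for the exact list):

```python
# Typical imports at the top of a PyTorch model file such as model.py
import math
from dataclasses import dataclass  # used for the ModelArgs container

import torch
import torch.nn.functional as F
from torch import nn
```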
Sometimes it is useful to have a bird's-eye view of the code. So, start collapsing each class by clicking the fold arrow next to the class definition, as shown below.
You can do the same with function definitions. After collapsing, you are left with the following three blocks of code.
First block: defines the class holding the model arguments (ModelArgs) and the class RMSNorm, which implements Root Mean Square Layer Normalization, used in the TransformerBlock and Transformer classes below. Don't worry if you don't know this AI material; we're not here to discuss it. It's the connections between code components that we care about!
Second block: defines some auxiliary functions we'll see in detail later.
Third block: the place where the elementary components that make up the transformer are defined: Attention and FeedForward are used to define TransformerBlock, and Transformer is built from a stack of TransformerBlock modules, as the sketch below shows.
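To make the relationships concrete, here is a minimal structural sketch. The internals are deliberately simplified stand-ins, not the repository's actual implementation; only the composition pattern matches model.py:

```python
import torch
from torch import nn

class Attention(nn.Module):
    """Stand-in for the real self-attention module."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)  # placeholder for the actual attention math
    def forward(self, x):
        return self.proj(x)

class FeedForward(nn.Module):
    """Stand-in for the position-wise feed-forward network."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 4 * dim), nn.SiLU(), nn.Linear(4 * dim, dim))
    def forward(self, x):
        return self.net(x)

class TransformerBlock(nn.Module):
    """One block = attention + feed-forward (residual connections kept, norms omitted)."""
    def __init__(self, dim: int):
        super().__init__()
        self.attention = Attention(dim)
        self.feed_forward = FeedForward(dim)
    def forward(self, x):
        x = x + self.attention(x)
        return x + self.feed_forward(x)

class Transformer(nn.Module):
    """The full model is a stack of TransformerBlock modules."""
    def __init__(self, dim: int, n_layers: int):
        super().__init__()
        self.layers = nn.ModuleList(TransformerBlock(dim) for _ in range(n_layers))
    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x
```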
A little help from ChatGPT
Now, as an example, let's investigate the functions from the second block. Suppose we want to find out what the precompute_freqs_cis function is all about. Let's ask ChatGPT! You shouldn't trust ChatGPT's answers too much; after all, it is not something that thinks. Turning to it might seem like a last-ditch effort, given how misleading and unreliable it can be… but it works quite well when asked to sift through a piece of code.
A prototype request for examining a piece of code could be as follows:
examine the following code: ```*LANGUAGE HERE (optional)* *CODE HERE* ```
Actually, prefixing the request with "examine the following code" is overkill: you can simply cut and paste the code into the ChatGPT prompt window and get much the same answer. Anyway, let's ask ChatGPT about the precompute_freqs_cis function.
examine the following code:
```
def precompute_freqs_cis(dim: int, end: int, theta: float = 10000.0):
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2)[: (dim // 2)].float() / dim))
    t = torch.arange(end, device=freqs.device)  # type: ignore
    freqs = torch.outer(t, freqs).float()  # type: ignore
    freqs_cis = torch.polar(torch.ones_like(freqs), freqs)  # complex64
    return freqs_cis
```
Here is ChatGPT’s response:
The provided code snippet is a Python function named precompute_freqs_cis that calculates and returns a complex-valued tensor named freqs_cis. The function appears to be related to precomputing frequency values for some purpose, possibly for signal processing or mathematical computations. Let's break down the code step by step:
[CUT]
Detailed code description in 6 steps
[end CUT]
Overall, this function seems to be used to precompute complex frequency values based on given parameters. It's important to note that this code snippet relies on the torch library, likely from PyTorch, as indicated by the usage of functions like torch.arange, torch.outer, and torch.polar.
Seems fine. Where are these frequencies used? Let's find out by examining the next function, reshape_for_broadcast, with a similar request:
examine the following code:
```
def reshape_for_broadcast(freqs_cis: torch.Tensor, x: torch.Tensor):
    ndim = x.ndim
    assert 0 <= 1 < ndim
    assert freqs_cis.shape == (x.shape[1], x.shape[-1])
    shape = [d if i == 1 or i == ndim - 1 else 1 for i, d in enumerate(x.shape)]
    return freqs_cis.view(*shape)
```
ChatGPT replies:
“The provided code snippet defines a function named reshape_for_broadcast that reshapes a given complex tensor (freqs_cis) to match the shape of another tensor (x) for broadcasting purposes. Broadcasting is a technique used in array-based computations to perform operations on arrays of different shapes without explicitly replicating the data.”
And again:
“In summary, the purpose of the reshape_for_broadcast function is to reshape the freqs_cis tensor in a way that allows it to be broadcasted (element-wise operations) with the tensor x. Broadcasting enables efficient computation between tensors of different shapes, and this function helps ensure that the broadcasting is performed correctly by reshaping the freqs_cis tensor to match the shape of the x tensor.”
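A small demo makes the reshaping visible. The shape of x here mimics the (batch, seq_len, n_heads, head_dim // 2) complex tensor used in the model, and the stand-in freqs_cis is built directly with torch.polar rather than reusing the earlier function:

```python
import torch

def reshape_for_broadcast(freqs_cis: torch.Tensor, x: torch.Tensor):
    ndim = x.ndim
    assert 0 <= 1 < ndim
    assert freqs_cis.shape == (x.shape[1], x.shape[-1])
    shape = [d if i == 1 or i == ndim - 1 else 1 for i, d in enumerate(x.shape)]
    return freqs_cis.view(*shape)

x = torch.zeros(2, 5, 3, 4, dtype=torch.complex64)    # (batch, seq, heads, head_dim // 2)
fc = torch.polar(torch.ones(5, 4), torch.rand(5, 4))  # stand-in freqs_cis: (seq, head_dim // 2)
print(reshape_for_broadcast(fc, x).shape)             # torch.Size([1, 5, 1, 4]): broadcasts against x
```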
Without knowing much about the context, ChatGPT gives an accurate and sound answer: these complex frequencies are actually used in the apply_rotary_emb function, which implements rotary embeddings, a common technique for injecting positional information into the input data when working with self-attention mechanisms such as those found in transformer models (for example, to encode the position of a word in a sentence).
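The idea behind rotary embeddings can be seen in a toy example (illustrative only, not the repository code, with a made-up position and base angle): multiplying a complex-valued feature pair by a unit-magnitude complex number rotates it by a position-dependent angle while leaving its magnitude untouched:

```python
import torch

pos, angle_per_pos = 3, 0.5                               # hypothetical position and base angle
feat = torch.view_as_complex(torch.tensor([[1.0, 2.0]]))  # one (real, imag) feature pair
rot = torch.polar(torch.ones(1), torch.tensor([pos * angle_per_pos]))
rotated = feat * rot                                      # phase shifted by pos * angle_per_pos
print(feat.abs().item(), rotated.abs().item())            # magnitudes match: information preserved
```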
Conclusions
This post is just meant to be an example of how to make code easier to read. If you want to read and understand what a piece of code does more easily, the features offered by editing tools (such as modern IDEs) together with current language models provide great advantages and possibilities. In this example we used ChatGPT, but similar tools such as h2oGPT, Open Assistant and so on work just as well. Of course the best thing is to try and experiment, because without practice you won't get anywhere.
Useful links
llama2.c – A. Karpathy’s GitHub page (link)
Introducing Llama 2 – Meta AI (link)
What Is a Transformer Model? – NVIDIA blog (link)
Rotary Embeddings: A Relative Revolution – EleutherAI (link)
ChatGPT login page (link)