Exploring code in the era of large language models
Reading code is often boring, and working out what a given snippet is for can take a long time. Here we examine a piece of code (a Python file) that is actually not terribly hard to understand, just to offer an alternative point of view. Of course… it's just an example and may not generalize to every situation, but it should still be useful.
Example file
We use Visual Studio Code (just for convenience; other editors are fine too). The file we are going to explore is model.py from the llama2.c repository by A. Karpathy. Let's assume some familiarity with neural networks, transformers and large language models. The code is 367 lines long and implements the model behind a tiny Llama 2 implementation. Llama 2 is an open source large language model that can be used for natural language processing tasks such as sentiment analysis, topic modeling, named entity recognition, and more.
What is this code for?
Many code files carry a description of what the code does more or less at the beginning of the file (usually as comments marked with quotes, slashes or hashtags). That is not the case in our example (the file model.py), but the file name is self-explanatory enough: it is the code for the neural network model (based on the transformer architecture). We will not go into what a transformer or an attention mechanism is, because that would be beyond the scope of this post.
At the beginning of the program you can see the lines that import modules or libraries, identified by the initial import keyword. Note that the ML framework used is PyTorch (import torch).
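As a reference, the import block at the top of the file looks roughly like the sketch below (abbreviated and partly reconstructed from memory; check model.py in the repository for the exact list):

```python
# Typical imports at the top of a PyTorch model file such as model.py
import math
from dataclasses import dataclass  # used for the ModelArgs container

import torch
import torch.nn.functional as F
from torch import nn
```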
Sometimes it is useful to have a bird's-eye view of the code. So, start collapsing each class by clicking the fold arrow next to the class definition, as shown below.
You can do the same with function definitions. After collapsing, you are left with the following three blocks of code.
First block: defines the class holding the model arguments (ModelArgs) and the class RMSNorm, which implements Root Mean Square Layer Normalization, used in the TransformerBlock and Transformer classes below. Don't worry if you don't know this AI material; we're not here to discuss it. It's the connections between code components that we care about!
Second block: defines some auxiliary functions we'll see in detail later.
Third block: the place where the elementary components that make up the transformer are defined: Attention and FeedForward are used to define TransformerBlock, and Transformer is built from a stack of TransformerBlock modules, as the sketch below shows.
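To make the relationships concrete, here is a minimal structural sketch. The internals are deliberately simplified stand-ins, not the repository's actual implementation; only the composition pattern matches model.py:

```python
import torch
from torch import nn

class Attention(nn.Module):
    """Stand-in for the real self-attention module."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)  # placeholder for the actual attention math
    def forward(self, x):
        return self.proj(x)

class FeedForward(nn.Module):
    """Stand-in for the position-wise feed-forward network."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 4 * dim), nn.SiLU(), nn.Linear(4 * dim, dim))
    def forward(self, x):
        return self.net(x)

class TransformerBlock(nn.Module):
    """One block = attention + feed-forward (residual connections kept, norms omitted)."""
    def __init__(self, dim: int):
        super().__init__()
        self.attention = Attention(dim)
        self.feed_forward = FeedForward(dim)
    def forward(self, x):
        x = x + self.attention(x)
        return x + self.feed_forward(x)

class Transformer(nn.Module):
    """The full model is a stack of TransformerBlock modules."""
    def __init__(self, dim: int, n_layers: int):
        super().__init__()
        self.layers = nn.ModuleList(TransformerBlock(dim) for _ in range(n_layers))
    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x
```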
A little help from ChatGPT
Now, as an example, let's investigate the functions from the second block. Suppose we want to find out what the precompute_freqs_cis function is all about. Let's ask ChatGPT! You shouldn't trust ChatGPT's answers too much; after all, it is not something that thinks. Turning to it might seem like a last-ditch effort, given how misleading and unreliable it can be… but it works quite well when asked to sift through a piece of code.
A prototype request for examining a piece of code could be as follows:
examine the following code: ```*LANGUAGE HERE (optional)* *CODE HERE* ```
Actually, prefixing the request with "examine the following code" is overkill: you can simply cut and paste the code into the ChatGPT prompt window and get much the same answer. Anyway, let's ask ChatGPT about the precompute_freqs_cis function.
examine the following code:
```
def precompute_freqs_cis(dim: int, end: int, theta: float = 10000.0):
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2)[: (dim // 2)].float() / dim))
    t = torch.arange(end, device=freqs.device)  # type: ignore
    freqs = torch.outer(t, freqs).float()  # type: ignore
    freqs_cis = torch.polar(torch.ones_like(freqs), freqs)  # complex64
    return freqs_cis
```
Here is ChatGPT’s response:
The provided code snippet is a Python function named precompute_freqs_cis that calculates and returns a complex-valued tensor named freqs_cis. The function appears to be related to precomputing frequency values for some purpose, possibly for signal processing or mathematical computations. Let's break down the code step by step:
[CUT]
Detailed code description in 6 steps
[end CUT]
Overall, this function seems to be used to precompute complex frequency values based on given parameters. It's important to note that this code snippet relies on the torch library, likely from PyTorch, as indicated by the usage of functions like torch.arange, torch.outer, and torch.polar.
Seems fine. Where are these frequencies used? Let's find out by examining the next function, reshape_for_broadcast, with a similar request:
examine the following code:
```
def reshape_for_broadcast(freqs_cis: torch.Tensor, x: torch.Tensor):
    ndim = x.ndim
    assert 0 <= 1 < ndim
    assert freqs_cis.shape == (x.shape[1], x.shape[-1])
    shape = [d if i == 1 or i == ndim - 1 else 1 for i, d in enumerate(x.shape)]
    return freqs_cis.view(*shape)
```
ChatGPT replies:
“The provided code snippet defines a function named reshape_for_broadcast that reshapes a given complex tensor (freqs_cis) to match the shape of another tensor (x) for broadcasting purposes. Broadcasting is a technique used in array-based computations to perform operations on arrays of different shapes without explicitly replicating the data.”
And again:
“In summary, the purpose of the reshape_for_broadcast function is to reshape the freqs_cis tensor in a way that allows it to be broadcasted (element-wise operations) with the tensor x. Broadcasting enables efficient computation between tensors of different shapes, and this function helps ensure that the broadcasting is performed correctly by reshaping the freqs_cis tensor to match the shape of the x tensor.”
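A small demo makes the reshaping visible. The shape of x here mimics the (batch, seq_len, n_heads, head_dim // 2) complex tensor used in the model, and the stand-in freqs_cis is built directly with torch.polar rather than reusing the earlier function:

```python
import torch

def reshape_for_broadcast(freqs_cis: torch.Tensor, x: torch.Tensor):
    ndim = x.ndim
    assert 0 <= 1 < ndim
    assert freqs_cis.shape == (x.shape[1], x.shape[-1])
    shape = [d if i == 1 or i == ndim - 1 else 1 for i, d in enumerate(x.shape)]
    return freqs_cis.view(*shape)

x = torch.zeros(2, 5, 3, 4, dtype=torch.complex64)    # (batch, seq, heads, head_dim // 2)
fc = torch.polar(torch.ones(5, 4), torch.rand(5, 4))  # stand-in freqs_cis: (seq, head_dim // 2)
print(reshape_for_broadcast(fc, x).shape)             # torch.Size([1, 5, 1, 4]): broadcasts against x
```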
Without knowing much about the context, ChatGPT gives an accurate and sound answer: these complex frequencies are actually used in the apply_rotary_emb function, which implements rotary embeddings, a common technique for injecting positional information into the input data when working with self-attention mechanisms such as those found in transformer models (for example, to encode the position of a word in a sentence).
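The idea behind rotary embeddings can be seen in a toy example (illustrative only, not the repository code, with a made-up position and base angle): multiplying a complex-valued feature pair by a unit-magnitude complex number rotates it by a position-dependent angle while leaving its magnitude untouched:

```python
import torch

pos, angle_per_pos = 3, 0.5                               # hypothetical position and base angle
feat = torch.view_as_complex(torch.tensor([[1.0, 2.0]]))  # one (real, imag) feature pair
rot = torch.polar(torch.ones(1), torch.tensor([pos * angle_per_pos]))
rotated = feat * rot                                      # phase shifted by pos * angle_per_pos
print(feat.abs().item(), rotated.abs().item())            # magnitudes match: information preserved
```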
Conclusions
This post is just meant to be an example of how to make code easier to read. If you want to read and understand what a piece of code does more easily, the features offered by editing tools (such as modern IDEs) together with current language models provide great advantages and possibilities. In this example we used ChatGPT, but similar tools such as h2oGPT, Open Assistant and so on work just as well. Of course the best thing is to try and experiment, because without practice you won't get anywhere.
Useful links
llama2.c – A. Karpathy’s GitHub page (link)
Introducing Llama 2 – Meta AI (link)
What Is a Transformer Model? – NVIDIA blog (link)
Rotary Embeddings: A Relative Revolution – EleutherAI (link)
ChatGPT login page (link)