Computers AI Large Language Model On this page
Resources
Understanding large language models
(HN )
Top 10-ish papers to understand the design, constraints and evolution of
LLMs
Development of LLMs: Attention weighted encodings, transformer, BERT, GPT,
BART
Improving the efficiency of LLMs: FlashAttention, Cramming, finetuning
methods, Chinchilla model, InstructGPT, and more on reinforcement learning
with human feedback (RLHF)
What we know about LLMs (Primer) | Will Thompson
A simple explainer of what is considered an LLM, what we knew about LLMs and
what are the ongoing research
Includes a lot of links to other resources. A few concepts introduced
include LLMs' capability to generalize knowledge, power law in LLMs'
performance, reinforcement learning via human feedback (RLHF), etc.
LLM Visualization
3D graphics visualizing parameters of a LLM model at each stage from
tokenization to the output
LLM Course | GitHub @mlabonne
Resources from mathematics, to Python, to neural networks, to NLP
Spreadsheets are all you need
(HN )
Understand GPT with Excel Spreadsheet
Links
Understanding ChatGPT | Atmosera
(HN )
Understand how things went from RRN to LSTM to Transformer, to BERT, to GPT
Contains a brief explanation of each advancement and links to all the
important papers
Web LLM | GitHub @mlc-ai — Running LLM
directly in the browser
How Replit train their own Large Language Models
Data processing (Databricks) → Custom tokenization → Model training
(MosaicML) → Evaluation (HumanEval framework)
All the Hard Stuff Nobody Talks About when Building Products with LLMs | Honeycomb.io
It's hard to build a real product backed by an LLM
Limited context windows, LLMs are slow and chaining is impractical, prompt
engineering is weird, prompt injection, etc
Understanding GPT tokenizers | Simon Willison
Optimizations by including leading space in the token
The tokenization is biased towards English words
Glitch tokens: words that have no meaning but got tokenized, and get near 0
weight after training lead to weird glitch
The history of open-source LLMs | Deep (Learning) Focus
Nice graphs and tables visualizing the performances of different LLMs
Explains the evolution from lower-quality LLMs (BLOOM and OPT) to recent
powerful models (LLaMA and MPT)
10 open challenges in LLM research | Chip Huyen
(HN )
Reduce & measure hallucinations, optimize context construction, multimodal
inputs, faster & cheaper, new architecture, GPU alternatives, agents acting
on behalf of LLM, human preference, chat interface, non-English language
Asking 60+ LLMs a set of 20 questions
(HN )
Benchmarking LLMs with some reflexion, knowledge, code, instructions and
creativity questions
More "realistic" benchmarks then those exams because it's likely it's
outside the training set
How transformers work
Nice graphics explaining concepts like embeddings, self-attention mechanism,
beam search and hallucination
Decomposing Language Models Into Understandable Components
(HN )
A single neuron does not have consistent meaning, but a group of neurons
does, called "features"
Artificially activating features can steer the output of models, improving
security and our understanding of LLMs
ChatGPT system prompts
(HN )
Training great LLMs entirely from ground zero in the wilderness as a startup
Technology used, difficulties for startup "in the wild" (i.e. outside
Google)
Some comparison of training in startup v.s. training with Google infra
LLM Generality is a Timeline Crux | LessWrong
Limitation exists, scaling, scaffolding and tooling can't fully overcome
Narrative jailbreaking
Not the "Ignore previous instructions", but slowly nudge the narrative of
the LLM because they are just next token generators, and they try to be
internally consistent
The distinction between the assistant and user role sometimes is weak