Large Language Model
Resources
Understanding large language models (HN)
Top 10-ish papers to understand the design, constraints and evolution of
LLMs
Development of LLMs: attention-weighted encodings, the transformer, BERT, GPT,
BART
Improving the efficiency of LLMs: FlashAttention, Cramming, finetuning
methods, the Chinchilla model, InstructGPT, and more on reinforcement learning
from human feedback (RLHF)
What we know about LLMs (Primer) | Will Thompson
A simple explainer of what is considered an LLM, what we currently know about
LLMs, and what research is still ongoing
Includes a lot of links to other resources. A few concepts introduced
include LLMs' ability to generalize knowledge, power-law scaling of LLM
performance, reinforcement learning from human feedback (RLHF), etc.
LLM Visualization
3D graphics visualizing the parameters of an LLM at each stage, from
tokenization to the output
LLM Course | GitHub @mlabonne
Resources from mathematics, to Python, to neural networks, to NLP
Spreadsheets are all you need (HN)
Understand GPT with an Excel spreadsheet
Links
Understanding ChatGPT | Atmosera (HN)
Understand how things went from RNN to LSTM, to Transformer, to BERT, to GPT
Contains a brief explanation of each advancement and links to all the
important papers
Web LLM | GitHub @mlc-ai — Running LLMs
directly in the browser
How Replit trains their own Large Language Models
Data processing (Databricks) → Custom tokenization (see the sketch below) →
Model training (MosaicML) → Evaluation (HumanEval framework)
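For the custom tokenization step, a minimal sketch using the Hugging Face `tokenizers` library (an illustrative assumption; the post does not tie Replit to this library, and the vocabulary size, special tokens and corpus file below are hypothetical):

```python
# Sketch: train a byte-level BPE tokenizer on a preprocessed corpus.
# Library choice, vocab size and file names are illustrative assumptions.
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import ByteLevel

tokenizer = Tokenizer(BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = ByteLevel()  # GPT-style byte-level pre-tokenization

trainer = BpeTrainer(
    vocab_size=32_000,                          # hypothetical vocabulary size
    special_tokens=["<unk>", "<|endoftext|>"],  # hypothetical special tokens
)
tokenizer.train(files=["processed_corpus.txt"], trainer=trainer)  # hypothetical corpus file
tokenizer.save("custom_tokenizer.json")

print(tokenizer.encode("def hello_world():").tokens)  # inspect how code gets split
```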
All the Hard Stuff Nobody Talks About when Building Products with LLMs | Honeycomb.io
It's hard to build a real product backed by an LLM
Limited context windows, LLMs are slow and chaining is impractical, prompt
engineering is weird, prompt injection, etc
Understanding GPT tokenizers | Simon Willison
Optimization: the leading space is folded into the token itself (see the sketch after this item)
The tokenization is biased towards English words
Glitch tokens: strings with no real meaning that still ended up in the
vocabulary, get near-zero weights during training, and lead to weird glitches
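A quick way to see the first two behaviors is OpenAI's `tiktoken` library (a minimal sketch, not code from the article):

```python
# Minimal sketch with tiktoken: leading spaces are part of tokens,
# and non-English text usually costs more tokens than English.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by newer GPT models

# " world" (with a leading space) is a different, single token than "world".
print(enc.encode("world"))
print(enc.encode(" world"))

# English vs. a non-English translation of the same sentence: token counts differ.
english = "The quick brown fox jumps over the lazy dog."
spanish = "El rápido zorro marrón salta sobre el perro perezoso."
print(len(enc.encode(english)), len(enc.encode(spanish)))
```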
The history of open-source LLMs | Deep (Learning) Focus
Nice graphs and tables visualizing the performances of different LLMs
Explains the evolution from lower-quality LLMs (BLOOM and OPT) to recent
powerful models (LLaMA and MPT)
10 open challenges in LLM research | Chip Huyen (HN)
Reduce & measure hallucinations, optimize context construction, multimodal
inputs, faster & cheaper, new architecture, GPU alternatives, agents acting
on behalf of LLM, human preference, chat interface, non-English language
Asking 60+ LLMs a set of 20 questions (HN)
Benchmarking LLMs with a set of reflection, knowledge, code, instruction-following
and creativity questions
More "realistic" benchmarks than those exams, because the questions are likely
outside the training set
How transformers work
Nice graphics explaining concepts like embeddings, the self-attention mechanism,
beam search and hallucination (a minimal self-attention sketch follows below)
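To make the self-attention part concrete, a generic NumPy sketch of scaled dot-product self-attention (not code from the article; the sizes and random weights are illustrative):

```python
# Scaled dot-product self-attention over n token embeddings of dimension d.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (n, d) embeddings; Wq, Wk, Wv: (d, d_k) projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (n, n) similarity of every token pair
    weights = softmax(scores, axis=-1)       # each row: how much a token attends to the others
    return weights @ V                       # (n, d_k) context-mixed representations

rng = np.random.default_rng(0)
n, d, d_k = 4, 8, 8                          # illustrative sizes
X = rng.normal(size=(n, d))                  # stand-in for token embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d_k)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)   # (4, 8)
```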
Decomposing Language Models Into Understandable Components (HN)
A single neuron does not have a consistent meaning, but groups of neurons,
called "features", do
Artificially activating features can steer a model's output, improving
security and our understanding of LLMs (a toy sketch of the idea follows below)
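A toy PyTorch sketch of the underlying idea: a sparse autoencoder that decomposes hidden activations into an overcomplete set of features. The sizes, L1 weight and random "activations" here are assumptions for illustration, not the paper's actual setup:

```python
# Toy sparse autoencoder over model activations (illustrative, not the paper's method).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)  # activations -> feature coefficients
        self.decoder = nn.Linear(d_features, d_model)  # features -> reconstructed activations

    def forward(self, acts):
        features = torch.relu(self.encoder(acts))      # non-negative, ideally sparse codes
        recon = self.decoder(features)
        return recon, features

d_model, d_features = 512, 4096                        # hypothetical sizes
sae = SparseAutoencoder(d_model, d_features)
acts = torch.randn(64, d_model)                        # stand-in for real hidden activations

recon, features = sae(acts)
l1_weight = 1e-3                                       # assumed sparsity penalty strength
loss = ((recon - acts) ** 2).mean() + l1_weight * features.abs().mean()
loss.backward()                                        # gradients for one training step
print(loss.item(), (features > 0).float().mean().item())  # loss and fraction of active features
```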
ChatGPT system prompts (HN)
Training great LLMs entirely from ground zero in the wilderness as a startup
Technology used, and difficulties for a startup "in the wild" (i.e. outside
Google)
Some comparison of training at a startup vs. training with Google infrastructure
LLM Generality is a Timeline Crux | LessWrong
Limitations exist that scaling, scaffolding and tooling can't fully overcome