Productize AI

AI as a product.

Links

Production AI Systems are really hard | Kevin Fischer (HN)
- Uses radiologists as an example of why it is hard to build production AI systems that can replace an occupation
- The top HN comment offers a useful insight: sometimes the market that AI companies focus on is the wrong one
Patterns for Building LLM-based Systems & Products (HN)
- A long read. Covers patterns for the following seven topics:
  - Evaluations: to measure the performance of the models
  - Retrieval-Augmented Generation (RAG): to provide richer context to the model
  - Fine-tuning: to get better at specific tasks, usually with a domain-specific dataset
  - Caching: to reduce latency and cost for semantically similar requests
  - Guardrails: to ensure output quality (syntactically and factually correct, free from harmful content)
  - Defensive UX: guide user behaviour, avert possible misuse & handle errors gracefully
  - Collect user feedback: incorporate user feedback into the UX design to build a data flywheel
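The caching pattern above (reuse a response when a new request is semantically similar to an earlier one) can be sketched roughly as follows. This is a minimal illustration, not the article's implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, and `SemanticCache`, `answer`, `call_model`, and the `0.8` threshold are all hypothetical names and values chosen for the example.

```python
import math
from collections import Counter
from typing import Callable, List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Hypothetical cache that matches on similarity instead of exact keys."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # illustrative value, tune per application
        self.entries: List[Tuple[Counter, str]] = []

    def get(self, query: str) -> Optional[str]:
        q = embed(query)
        # Linear scan for the most similar stored query.
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, query: str, response: str) -> None:
        self.entries.append((embed(query), response))


def answer(query: str, cache: SemanticCache, call_model: Callable[[str], str]) -> str:
    """Serve from the cache when a similar query was seen, else call the model."""
    cached = cache.get(query)
    if cached is not None:
        return cached
    response = call_model(query)
    cache.put(query, response)
    return response
```

In a real system the linear scan would typically be replaced by a vector index, and the threshold trades cost savings against the risk of returning an answer to a different question.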
The pain points of building a copilot system
- Trial and error in prompting, difficult to orchestrate data sources & prompts, flaky tests
- Unclear and evolving best practices, safety, privacy, compliance, poor developer experience (DevEx)
AI Design Patterns
- Four AI design patterns for deployment & training, e.g.:
  - AI Router: route requests of a recognized type to a small language model
  - Proxy: clean the query and the answer for the AI
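The AI Router idea could look something like the sketch below. It assumes a hypothetical keyword-based `classify` function as the recognizer and treats models as plain callables; `make_router`, the task-type names, and both model functions are invented for illustration and do not come from the linked article.

```python
from typing import Callable, Set

Model = Callable[[str], str]


def classify(query: str) -> str:
    """Hypothetical recognizer: map a query to a task type, or 'unknown'."""
    q = query.strip().lower()
    if q.startswith("translate"):
        return "translation"
    if q.startswith(("summarize", "summarise")):
        return "summarization"
    return "unknown"


def make_router(small: Model, large: Model, small_types: Set[str]) -> Model:
    """Build a router that sends recognized task types to the small model."""

    def route(query: str) -> str:
        task = classify(query)
        # Recognized, simple task types go to the small model;
        # everything else falls back to the large model.
        model = small if task in small_types else large
        return model(query)

    return route


# Usage with stand-in models (real ones would call an inference API).
router = make_router(
    small=lambda q: "small-model answer",
    large=lambda q: "large-model answer",
    small_types={"translation", "summarization"},
)
```

Falling back to the large model for unrecognized requests keeps quality safe by default; the recognizer only has to be confident about the cases it offloads.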
What we learned in 6 months of working on a CodeGen dev tool - GPT Pilot
- Spec writer for better initial description
- Iterative process and allowing agents to review themselves
- LLMs work best when focused on one problem, on small files
- Asking the LLM to write code with verbose logs helps it debug that code
Why Google failed to make GPT-3 + Why Multimodal Agents are the path to AGI
- Google had the Transformer, so why did OpenAI, not Google, end up with GPT-1/2/3?
- Internal processes slowed researchers down
What we learned from a year building with LLMs
- Part 1: tactical practices on prompting, RAG, tuning, caching, evaluation & monitoring
Secrets of the ChatGPT Linux system
- On poking around the sandboxed environment where ChatGPT runs code
How to make LLMs shut up (HN)
- Prompting and an external judge did not work
- Clustering worked: use user ratings to form clusters over comment embeddings, then use the clusters to decide whether a comment is useful
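One simple way to picture the clustering approach is a nearest-centroid check: average the embeddings of comments users rated useful and those rated not useful, then only surface a new comment if it sits closer to the useful group. This is an assumed simplification rather than the method described in the linked post, and the function names and the tiny 2-D vectors standing in for real embeddings are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_centroids(rated: List[Tuple[Vector, bool]]) -> Tuple[List[float], List[float]]:
    """Split rated comment embeddings by user rating and average each group."""
    useful = [vec for vec, was_useful in rated if was_useful]
    not_useful = [vec for vec, was_useful in rated if not was_useful]
    return centroid(useful), centroid(not_useful)


def should_post(embedding: Vector, useful_c: Vector, not_useful_c: Vector) -> bool:
    """Surface the comment only if it is closer to the 'useful' cluster."""
    return cosine(embedding, useful_c) > cosine(embedding, not_useful_c)


# Toy data: (embedding, user rated it useful?). Real embeddings would come
# from an embedding model and have hundreds of dimensions.
rated_comments = [
    ([1.0, 0.1], True),
    ([0.9, 0.2], True),
    ([0.1, 1.0], False),
    ([0.2, 0.9], False),
]
useful_centroid, not_useful_centroid = build_centroids(rated_comments)
```

The appeal of this kind of filter is that it learns from actual user ratings instead of relying on the model to judge its own output.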
