What is a Large Language Model?
LLMs are neural networks trained on enormous amounts of text to predict the next token in a sequence. Through this seemingly simple objective, they learn grammar, reasoning, factual knowledge, and task-following behaviour, all from statistical patterns in language.
LLMs operate on sub-word tokens, not whole words. "unbelievable" might be split into ["un", "believ", "able"]. A typical LLM vocabulary has 50,000–100,000 tokens. This matters for understanding context limits and API costs.
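To make the sub-word idea concrete, here is a toy greedy longest-match tokeniser. The tiny vocabulary is invented for illustration; real LLMs use learned BPE or unigram vocabularies of tens of thousands of entries, so treat this as a sketch of the principle, not of any production tokeniser.

```python
def tokenise(word, vocab):
    """Greedily match the longest vocabulary entry at each position."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try longest substring first
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:                               # no match: fall back to one character
            tokens.append(word[i])
            i += 1
    return tokens

# Hypothetical mini-vocabulary for demonstration only
vocab = {"un", "believ", "able"}
print(tokenise("unbelievable", vocab))      # ['un', 'believ', 'able']
```

Note that an unfamiliar word still tokenises (character by character), which is why rare words and foreign text consume more tokens and therefore more context budget and API cost.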
The Transformer: Self-Attention
The key innovation in modern LLMs is the self-attention mechanism (Vaswani et al., 2017). It allows every token to directly attend to every other token in the context, capturing long-range dependencies that recurrent networks struggled with.
Multi-head attention runs this operation \(h\) times in parallel with different learned projections, allowing the model to simultaneously attend to syntax, coreference, semantic similarity, and other relationship types.
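The core operation behind all of this is scaled dot-product attention, \( \mathrm{softmax}(QK^\top / \sqrt{d_k})\,V \). A minimal pure-Python sketch of a single attention head (real implementations are batched matrix operations on a GPU, and multi-head attention adds the learned projections described above):

```python
import math

def softmax(xs):
    m = max(xs)                              # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    Q, K, V are lists of d_k-dimensional vectors, one per token."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)            # how strongly this token attends to each other token
        # output = attention-weighted mix of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

Because every query is scored against every key, each token can draw information directly from any other token in the context, which is exactly the long-range-dependency property the paragraph above describes.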
Practical Deployment: What Actually Matters
For business applications, the most important concepts are context window (how much text the model processes at once), prompt engineering (structuring inputs for consistent outputs), and RAG (Retrieval-Augmented Generation: connecting models to your own data).
| Approach | When to use | Cost | Customisation |
|---|---|---|---|
| Prompt engineering | General tasks, quick exploration | Low | Low |
| RAG | Domain Q&A over your own documents | Medium | Medium |
| Fine-tuning | Consistent style/format, specialised tasks | High | High |
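The RAG row above can be sketched end to end: retrieve the most relevant documents, then build a prompt that grounds the model in them. Everything here is a deliberately naive illustration; the function names are invented, and production RAG systems rank by embedding similarity in a vector store rather than by word overlap.

```python
def retrieve(query, documents, k=2):
    """Rank documents by naive word overlap with the query.
    (Real RAG uses embedding similarity, not word overlap.)"""
    q_words = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, documents):
    """Assemble a grounded prompt from the top-k retrieved documents."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return (f"Answer using ONLY the context below.\n"
            f"Context:\n{context}\n\n"
            f"Question: {query}")
```

The prompt string would then be sent to whichever model API you use; the instruction to answer only from the supplied context is what ties the model's output back to your own data.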
⚠️ LLMs hallucinate. They generate plausible-sounding text, not guaranteed facts. Always validate outputs for high-stakes decisions, and use RAG with verifiable sources when factual accuracy is critical.
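One cheap, automatable validation step is to check that specific figures in a model's answer actually appear in the source material it was given. This is a crude sketch of the idea (substring matching on numbers only); real fact-checking pipelines need far richer claim matching, and the function name is ours, not any library's.

```python
import re

def unsupported_numbers(answer, sources):
    """Return numbers in a model answer that appear in none of the source texts.
    A crude hallucination check: it flags fabricated figures but
    cannot verify wording, causality, or non-numeric claims."""
    source_text = " ".join(sources)
    nums = re.findall(r"\d+(?:\.\d+)?", answer)
    return [n for n in nums if n not in source_text]

print(unsupported_numbers("Revenue was 500 in 2023",
                          ["Revenue in 2023 was 400"]))   # ['500']
```

A non-empty result is a signal to route the output to a human reviewer rather than acting on it directly.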
LLMs in Business Workflows
The most impactful applications we see in small business contexts are: document summarisation and extraction, customer-facing chatbots grounded in company data, automated report generation, and email/comms drafting. The key is identifying tasks that are language-heavy, repetitive, and currently done manually.
Beyond single-turn responses, LLMs can be embedded in multi-step agent loops: reading data, calling tools, checking outputs, and iterating autonomously. This is where the real productivity gains come from, but also where careful design and guardrails become essential.
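The loop described above can be sketched in a few lines. Here `call_model` is a hard-coded stub standing in for a real LLM API call, and the single `word_count` tool is invented for the example; the point is the control flow: model proposes an action, the runtime executes it, the result is fed back, and a step limit acts as a basic guardrail.

```python
def call_model(history):
    """Stand-in for an LLM API call: a real call would send `history`
    to a model and parse its structured reply."""
    if not any(msg.startswith("TOOL:") for msg in history):
        return {"action": "use_tool", "tool": "word_count", "arg": history[0]}
    return {"action": "finish", "answer": history[-1]}

# Hypothetical tool registry; real agents expose search, code execution, etc.
TOOLS = {"word_count": lambda text: f"TOOL: {len(text.split())} words"}

def run_agent(task, max_steps=5):
    history = [task]
    for _ in range(max_steps):               # guardrail: bound the iterations
        step = call_model(history)
        if step["action"] == "finish":
            return step["answer"]
        result = TOOLS[step["tool"]](step["arg"])   # execute the requested tool
        history.append(result)               # feed the tool output back to the model
    return "stopped: step limit reached"
```

Even in this toy form, the essential guardrails are visible: a whitelist of callable tools and a hard cap on iterations, both of which matter far more once the stub is replaced by a real model.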