# multi-task learning

> form of machine learning where a model learns multiple tasks

**Wikidata**: [Q6934509](https://www.wikidata.org/wiki/Q6934509)  
**Wikipedia**: [English](https://en.wikipedia.org/wiki/Multi-task_learning)  
**Source**: https://4ort.xyz/entity/multi-task-learning

## Summary
Multi-task learning is a machine-learning paradigm in which a single model is trained to solve several related tasks simultaneously, leveraging shared representations to improve generalization on every task. By forcing the network to optimize for multiple objectives, it extracts common structure that boosts performance compared with training a separate model for each task.

## Key Facts
- Subclass of: machine learning (Wikidata Q2539)
- Learning setting: simultaneous optimization across ≥2 tasks
- Shared mechanism: common hidden layers/parameters among tasks
- Goal: better generalization through inductive transfer

## FAQs
### Q: How does multi-task learning differ from single-task learning?
A: Single-task learning trains one model per objective. Multi-task learning trains one model on many objectives at once, letting gradients from each task update shared weights and thereby improve low-resource tasks with data from high-resource ones.

### Q: When should you use multi-task learning?
A: Use it when you have several related prediction problems and limited labeled data for some of them; the shared representation can raise accuracy on all tasks while cutting total parameter count.

### Q: Does multi-task learning always improve performance?
A: No. If tasks are unrelated or optimization is imbalanced, gradients can conflict and degrade accuracy. Task weighting, architecture design, and gradient-surgery techniques mitigate negative transfer.

## Why It Matters
Multi-task learning turns data scarcity into an advantage: tasks with abundant labels regularize rarer ones through shared parameters, yielding higher accuracy with fewer examples. It reduces deployment costs because one compact network replaces many specialized models, cutting memory and latency in production systems. The paradigm underpins transfer-learning breakthroughs such as BERT and T5, where masked-language-modeling and downstream objectives are jointly optimized. By embedding linguistic knowledge into shared layers, multi-task learning has become the default route to state-of-the-art language understanding, computer-vision backbones, and reinforcement-learning agents.

## Notable For
- Enables parameter sharing across tasks, shrinking model size versus ensembles
- Serves as precursor to modern pre-training/fine-tuning pipelines
- Provides built-in data augmentation via auxiliary tasks

## Body
### Formal Definition
Multi-task learning minimizes a joint loss L = Σᵢ wᵢ Lᵢ where Lᵢ is the loss of task i and wᵢ is its weight. Shared layers learn representations common to all tasks; task-specific heads map these representations to individual outputs.

### Architectures
Hard parameter sharing keeps the first k layers identical across tasks and branches only the final layers. Soft parameter sharing gives each task its own network but adds regularization terms that encourage similarity between corresponding parameters.

### Optimization Challenges
Gradient magnitude imbalance causes some tasks to dominate training. Solutions include gradient normalization, uncertainty-based weighting, and dynamic average weighting that rescales losses according to their relative rates of change.

### Evaluation
Performance is reported as the average or per-task metric versus single-task baselines. Significance is tested with paired bootstrap or t-tests across multiple random seeds.

## Schema Markup
```json
{
  "@context": "https://schema.org",
  "@type": "Thing",
  "name": "multi-task learning",
  "description": "A form of machine learning where a model learns multiple tasks simultaneously.",
  "sameAs": ["https://www.wikidata.org/wiki/Q2539"]
}

## References

1. [OpenAlex](https://docs.openalex.org/download-snapshot/snapshot-data-format)