32 papers

DeepSeek-V3 Technical Report
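The Efficiency Manifesto. Scaled up Multi-Head Latent Attention (MLA) and DeepSeekMoE, both introduced in DeepSeek-V2, and reported that a 671B-parameter model competitive with leading closed models was trained in about 2.788M H800 GPU hours, roughly $5.6M at rental prices for the final run alone.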
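
The headline cost is plain arithmetic over figures stated in the report: about 2.788M H800 GPU hours at an assumed rental price of $2 per GPU hour. A minimal sketch of that calculation follows; the function and variable names are illustrative, not taken from the report.

```python
# Figures as stated in the DeepSeek-V3 technical report. The $2/hour rental
# price is the report's own assumption, not a measured hardware cost.
GPU_HOURS = 2_788_000        # H800 GPU hours for the final training run
PRICE_PER_GPU_HOUR = 2.00    # assumed rental price in USD


def training_cost_usd(gpu_hours: float, price_per_hour: float) -> float:
    """Return the rental-equivalent cost of a training run in USD."""
    return gpu_hours * price_per_hour


cost = training_cost_usd(GPU_HOURS, PRICE_PER_GPU_HOUR)
print(f"${cost / 1e6:.3f}M")  # prints "$5.576M"
```

The figure covers the final training run only; it excludes earlier research, ablation experiments, and the cost of owning the hardware.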


Mamba: Linear-Time Sequence Modeling with Selective State Spaces
The Transformer Challenger. Proposed a selective State Space Model (SSM) architecture whose compute scales linearly with sequence length, influencing new "hybrid" architectures.
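
To make "linear scaling" concrete, here is a toy one-dimensional selective scan. It is a sketch under strong simplifying assumptions: scalar state, scalar weights (`w_delta`, `w_b`, `w_c` are hypothetical names), and a plain Python loop in place of the paper's hardware-aware parallel scan. It only illustrates that each token is processed once against a fixed-size state, with step size and gates that depend on the input.

```python
import math


def selective_scan(xs, w_delta=1.0, w_b=1.0, w_c=1.0, a=-1.0):
    """Toy selective SSM: one pass over the sequence, constant-size state.

    h_t = a_bar_t * h_{t-1} + b_bar_t * x_t
    y_t = c_t * h_t
    where a_bar_t, b_bar_t and c_t all depend on the current input x_t.
    """
    h = 0.0
    ys = []
    for x in xs:
        delta = math.log1p(math.exp(w_delta * x))  # softplus: positive step size
        a_bar = math.exp(delta * a)                # decay in (0, 1) because a < 0
        b_bar = delta * (w_b * x)                  # input-dependent write gate
        h = a_bar * h + b_bar * x                  # update the fixed-size state
        ys.append((w_c * x) * h)                   # input-dependent readout
    return ys


outputs = selective_scan([0.5, -1.0, 2.0, 0.0])
print(len(outputs))  # one output per input token
```

Work grows with the number of tokens rather than with its square, which is the property the blurb refers to; attention, by contrast, compares every token with every other token.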


Direct Preference Optimization (DPO)
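Killed PPO. Simplified alignment by showing mathematically that the RLHF objective can be optimized directly on human preference pairs with a simple classification-style loss, without training a separate Reward Model or running a reinforcement learning loop.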
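
The core of the method is a single loss over a preferred and a rejected response. The sketch below computes it for one preference pair from summed log-probabilities; the function name, argument names, and the default `beta=0.1` are illustrative assumptions rather than anything prescribed by this list.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one pair: -log sigmoid(beta * (chosen gain - rejected gain)).

    Each argument is the log-probability of a full response given the prompt,
    under either the trainable policy or the frozen reference model.
    """
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_gain - rejected_gain)
    return softplus(-logits)  # identical to -log(sigmoid(logits))


# A policy equal to the reference gives log(2); preferring the chosen
# response more than the reference does pushes the loss below that.
at_reference = dpo_loss(-5.0, -7.0, -5.0, -7.0)
improved = dpo_loss(-4.0, -8.0, -5.0, -7.0)
print(at_reference > improved)
```

No reward model appears anywhere: the log-probability ratios against the reference model play that role implicitly.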


QLoRA: Efficient Finetuning of Quantized LLMs
The Democratizer. Combined 4-bit quantization with LoRA, making it possible to finetune a 65B-parameter model on a single 48 GB GPU.
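
The combination can be sketched in a few lines: the base weights are frozen and stored at 4-bit precision, while a small low-rank update is kept in full precision and trained. The example below is a toy under stated assumptions. It uses simple uniform absmax quantization per row, not the NF4 data type, double quantization, or paged optimizers from the paper, and every function name here is hypothetical.

```python
def quantize_4bit(row):
    """Absmax-quantize a row of floats to integers in [-7, 7] plus one scale."""
    scale = max(abs(v) for v in row) / 7 or 1.0  # fall back to 1.0 for an all-zero row
    return [round(v / scale) for v in row], scale


def dequantize(q_row, scale):
    """Recover approximate float weights from the 4-bit codes."""
    return [q * scale for q in q_row]


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def qlora_forward(q_weight, x, lora_a, lora_b, alpha=16):
    """y = dequant(W_q) x + (alpha / r) * B (A x), with only A and B trainable."""
    rank = len(lora_a)
    base = matvec([dequantize(q, s) for q, s in q_weight], x)
    update = matvec(lora_b, matvec(lora_a, x))
    return [b + (alpha / rank) * u for b, u in zip(base, update)]


weights = [[0.7, -0.35], [0.14, 0.07]]        # frozen base weights (2 x 2)
q_weights = [quantize_4bit(row) for row in weights]
lora_a = [[0.3, -0.2]]                         # rank-1 down-projection (1 x 2)
lora_b = [[0.0], [0.0]]                        # up-projection starts at zero (2 x 1)
y = qlora_forward(q_weights, [1.0, 2.0], lora_a, lora_b)
print(len(y))
```

Because `lora_b` starts at zero, the adapted layer initially reproduces the quantized base layer exactly, and training only has to store gradients and optimizer state for the two small matrices.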


Voyager: An Open-Ended Embodied Agent with Large Language Models
The Agent Blueprint. One of the first papers to successfully use an LLM to write code, execute it in Minecraft, fail, and self-correct via a feedback loop.


Segment Anything (SAM)
Meta's promptable foundation model for image segmentation that generalizes zero-shot to objects and image types it was not trained on.


LLaMA: Open and Efficient Foundation Language Models
Meta's release that kickstarted the open-weight LLM race by showing that smaller models trained on more tokens can rival much larger ones.


Adding Conditional Control to Text-to-Image Diffusion Models (ControlNet)
Enabled precise structural control (edges, pose, depth) over pretrained text-to-image diffusion models.

