NVIDIA: Llama 3.1 Nemotron Nano 8B v1

nvidia/llama-3.1-nemotron-nano-8b-v1

Llama-3.1-Nemotron-Nano-8B-v1 is a compact large language model (LLM) derived from Meta's Llama-3.1-8B-Instruct, specifically optimized for reasoning tasks, conversational interactions, retrieval-augmented generation (RAG), and tool-calling applications. It balances accuracy and efficiency, fitting comfortably onto a single consumer-grade RTX GPU for local deployment. The model supports extended context lengths of up to 128K tokens.

Note: you must include detailed thinking on in the system prompt to enable reasoning. Please see Usage Recommendations(opens in new tab) for more.

Modalities

Context

131K

Knowledge Cutoff

Dec 31, 2023

Recent activity on Llama 3.1 Nemotron Nano 8B v1

Total usage per day on OpenRouter

Not enough data to display yet.

NVIDIA: Llama 3.1 Nemotron Nano 8B v1