An efficient fine-tuning method for LLMs that combines 4-bit quantization with LoRA, sharply reducing memory and compute requirements.
Detailed Explanation
QLoRA (Quantized Low-Rank Adaptation) is an efficient fine-tuning technique for large language models that combines quantization with Low-Rank Adaptation (LoRA). The pretrained base model is frozen and quantized to 4-bit precision (typically the NormalFloat, NF4, data type), while small trainable low-rank adapter matrices are attached to selected layers and kept in higher precision; gradients flow through the frozen quantized weights but update only the adapters. This cuts memory and compute requirements dramatically, enabling effective customization of large models on limited hardware, often a single GPU, while closely matching the quality of full-precision fine-tuning.
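As a concrete illustration, the following is a minimal sketch of a QLoRA-style setup using the Hugging Face transformers, peft, and bitsandbytes libraries. The model id and all hyperparameter values (rank, alpha, target modules) are illustrative assumptions, not values prescribed by QLoRA itself.

```python
# Minimal QLoRA-style setup sketch. Assumes transformers, peft, and
# bitsandbytes are installed; the model id and hyperparameters are
# illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder; any causal LM works

# 1. Quantize the frozen base model to 4-bit NF4 with double quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4, as in the QLoRA paper
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # dequantized matmuls run in bf16
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 2. Prepare the quantized model for k-bit training (casts norm layers,
#    enables input gradient hooks for checkpointing).
model = prepare_model_for_kbit_training(model)

# 3. Attach small trainable low-rank adapters; only these receive gradients.
lora_config = LoraConfig(
    r=16,                                  # adapter rank (illustrative)
    lora_alpha=32,                         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections (model-specific)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```

From here the model can be trained with an ordinary loop or transformers.Trainer; only the adapter weights are updated, and they can be saved separately with model.save_pretrained, keeping checkpoints small.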
Use Cases
• Enable fine-tuning of large language models on edge devices and other memory-constrained hardware, where full-precision fine-tuning would not fit; a rough memory estimate follows below.
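To make the memory claim concrete, here is back-of-envelope arithmetic for a hypothetical 7B-parameter model. It is illustrative only: it counts weight storage and ignores activations, the KV cache, adapter optimizer state, and per-block quantization constants.

```python
# Back-of-envelope weight-memory comparison for a hypothetical 7B-parameter model.
# Illustrative only: ignores activations, KV cache, optimizer state, and
# quantization-constant overhead.
n_params = 7e9

fp16_gb = n_params * 2 / 2**30    # 2 bytes per weight in fp16
int4_gb = n_params * 0.5 / 2**30  # 0.5 bytes per weight at 4-bit

# Assumed LoRA shape: rank-16 adapters on two projections per layer,
# 32 layers, hidden dim 4096; each adapter is A (d x r) plus B (r x d).
adapter_params = 32 * 2 * (2 * 4096 * 16)
adapter_gb = adapter_params * 2 / 2**30  # adapters trained in 16-bit

print(f"fp16 base weights : {fp16_gb:5.1f} GiB")  # ~13.0 GiB
print(f"4-bit base weights: {int4_gb:5.1f} GiB")  # ~3.3 GiB
print(f"LoRA adapters     : {adapter_gb:.3f} GiB "
      f"({adapter_params / n_params:.2%} of params)")
```

Under these assumptions the quantized weights take roughly a quarter of the fp16 footprint, and the trainable adapters add only a few tens of megabytes, which is what makes single-GPU and on-device fine-tuning feasible.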