Fine-Tuning Qwen3-1.7B with LoRA + SFTTrainer (Production-Level Guide)
Tech3Space, 15 Jun 2026
Description
In this tutorial, we build a complete parameter-efficient fine-tuning pipeline for the Qwen3-1.7B model using LoRA (Low-Rank Adaptation) and Hugging Face's SFTTrainer. This setup is optimized for real-world usage: low VRAM, stable training, fast convergence, and deployment-ready model merging.
You'll learn how to:
- Load the Qwen3 model efficiently
- Apply LoRA for memory-efficient training
- Format chat datasets properly
- Train using SFTTrainer
- Save + merge LoRA adapters for deployment
Full Tutorial: LoRA Fine-Tuning Qwen3-1.7B
1. Project Setup
We start by importing the required libraries:
```python
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import LoraConfig, get_peft_model
from trl import SFTTrainer, SFTConfig
```
Why these libraries?
- transformers → model and tokenizer loading
- datasets → efficient dataset pipeline
- peft → LoRA implementation
- trl → supervised fine-tuning (SFTTrainer)
2. Configuration Setup
```python
MODEL_NAME = "./Qwen/Qwen3-1.7B"
DATASET_PATH = "./dataset/train.jsonl"
OUTPUT_DIR = "./qwen3_lora_sft_pro"

MAX_LENGTH = 512
```
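Key ideas:
- MODEL_NAME points to a local copy of the Qwen model (the Hub ID "Qwen/Qwen3-1.7B" also works and downloads it)
- DATASET_PATH is a JSONL file with one training example per line
- MAX_LENGTH caps the tokens per training sequence, which bounds GPU memory use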
3. Tokenizer Setup
```python
tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME,
    trust_remote_code=True,
)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
```
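Why this matters:
- trust_remote_code=True lets the model repository run its own Python code; recent transformers releases support Qwen3 natively, so here it is only a safeguard
- Batching sequences of different lengths requires a pad token
- Reusing the EOS token as padding is a common fallback when the tokenizer defines none, and avoids padding errors when batches are collated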
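4. Load Model (Optimized for GPU Training)
```python
# Prefer bfloat16 where the GPU supports it, otherwise fall back to float16.
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    trust_remote_code=True,
    dtype=dtype,  # recent transformers releases use `dtype`; older ones expect `torch_dtype`
)
```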
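Optimization choices:
- bfloat16 → keeps float32's exponent range, so it rarely overflows; natively supported on Ampere and newer GPUs (A100, H100, RTX 30/40 series)
- float16 → fallback for older GPUs without bf16 support (e.g. V100, T4)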
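Loading in a 16-bit dtype roughly halves the memory the weights need compared with float32. The arithmetic is just parameter count × bytes per parameter. This sketch assumes about 1.7 billion parameters (the nominal size from the model name, not an exact count) and covers the weights alone, not activations, gradients, or optimizer state.

```python
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gb(num_params, dtype):
    """Memory for the model weights only (no activations, gradients or optimizer state)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


NUM_PARAMS = 1.7e9  # assumed nominal size; the exact count differs slightly

for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: ~{weight_memory_gb(NUM_PARAMS, dtype):.1f} GB")
```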
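5. Memory Optimization Tricks
```python
# The KV cache only speeds up generation; turn it off for training.
model.config.use_cache = False
# Recompute activations during the backward pass instead of storing them all.
model.gradient_checkpointing_enable()
```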
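Why this is important:
- Gradient checkpointing discards most intermediate activations in the forward pass and recomputes them during backpropagation, trading extra compute for lower GPU memory
- use_cache=False disables the KV cache, which is only useful for generation and is incompatible with gradient checkpointing during training
- Together they make it possible to train on GPUs with limited VRAM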
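The recompute-instead-of-store idea can be shown on a toy network. This is a pure-Python sketch with scalar "layers" `y = tanh(w * x)`, written only to illustrate the technique. It is not how PyTorch implements it; transformers applies the same idea roughly per decoder layer. Only every `every`-th layer input is saved in the forward pass. Each segment is then re-run from its checkpoint during backpropagation. The gradient matches ordinary backprop while fewer activations are held at once.

```python
import math


def layer(w, x):
    """One toy 'layer': y = tanh(w * x)."""
    return math.tanh(w * x)


def layer_grad(w, x):
    """dy/dx for the toy layer; it needs the layer's input x."""
    t = math.tanh(w * x)
    return w * (1.0 - t * t)


def grad_full(weights, x):
    """Standard backprop: store every layer input during the forward pass."""
    saved = []
    for w in weights:
        saved.append(x)
        x = layer(w, x)
    g = 1.0
    for w, xi in zip(reversed(weights), reversed(saved)):
        g *= layer_grad(w, xi)
    return g, len(saved)


def grad_checkpointed(weights, x, every):
    """Store only every `every`-th input, then recompute each segment during backward."""
    checkpoints = {}
    for i, w in enumerate(weights):
        if i % every == 0:
            checkpoints[i] = x
        x = layer(w, x)

    g = 1.0
    peak_saved = len(checkpoints)
    for start in sorted(checkpoints, reverse=True):
        end = min(start + every, len(weights))
        # Re-run the forward pass for this segment only, starting from its checkpoint.
        segment, xi = [], checkpoints[start]
        for i in range(start, end):
            segment.append(xi)
            xi = layer(weights[i], xi)
        peak_saved = max(peak_saved, len(checkpoints) + len(segment))
        # Backpropagate through the segment, last layer first.
        for i in range(end - 1, start - 1, -1):
            g *= layer_grad(weights[i], segment[i - start])
    return g, peak_saved


weights = [0.9 + 0.01 * i for i in range(16)]
g_full, saved_full = grad_full(weights, 0.5)
g_ckpt, saved_ckpt = grad_checkpointed(weights, 0.5, every=4)
print(saved_full, saved_ckpt)  # 16 8
```

The gradient is identical, but at most 8 of the 16 layer inputs are held at once. The price is one extra forward pass through each segment.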
6. LoRA Configuration (Core Idea)
```python
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj",
        "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```
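Explanation:
LoRA freezes each targeted weight matrix W and learns a low-rank update ΔW = BA, which is added to the frozen layer's output:
- r=16 → rank of A and B; a higher rank gives more adaptation capacity and more trainable parameters
- lora_alpha=32 → scaling factor; the update is multiplied by lora_alpha / r = 2
- lora_dropout=0.05 → dropout on the LoRA branch input, for regularization
- target_modules → the attention projections (q_proj, k_proj, v_proj, o_proj) and the MLP projections (gate_proj, up_proj, down_proj)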
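The forward pass of a LoRA-adapted linear layer is h = W0·x + (lora_alpha / r)·B·(A·x). This pure-Python sketch uses toy sizes, with alpha / r = 2 chosen to match the ratio in the config above. B starts at zero, which is PEFT's default initialisation. So before any training the adapted layer returns exactly what the base layer would.

```python
import random


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(W0, A, B, x, alpha, r):
    """h = W0 x + (alpha / r) * B (A x); W0 stays frozen, only A and B are trained."""
    base = matvec(W0, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


d_in, d_out, r, alpha = 4, 3, 2, 4  # toy sizes; alpha / r = 2 as in the config above
random.seed(0)
W0 = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(r)]  # r x d_in
B = [[0.0] * r for _ in range(d_out)]                                 # d_out x r, zero-initialised
x = [1.0, -2.0, 0.5, 3.0]

# With B = 0 the adapter adds nothing, so training starts from the base model's behaviour.
print(lora_forward(W0, A, B, x, alpha, r) == matvec(W0, x))  # True
```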
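This makes training:
- Much cheaper, because gradients and optimizer state are kept only for the small adapter matrices, not for the full model
- Quicker to iterate on, with cheaper optimizer steps and adapter checkpoints of only a few megabytes to tens of megabytes
- Lighter on GPU memory, although the frozen base weights still have to fit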
7. Apply LoRA to Model
```python
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```
Result:
Only a small fraction of the parameters (roughly 1-5%, depending on r and target_modules) is trainable, instead of the full model.
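The percentage can be checked by hand. An adapted linear layer mapping d_in → d_out gets an A matrix (r × d_in) and a B matrix (d_out × r), which adds r·(d_in + d_out) parameters. This sketch assumes the dimensions from Qwen3-1.7B's published config.json: hidden size 2048, MLP size 6144, 28 layers, and 16 query heads plus 8 key/value heads of size 128. Check these against your local copy.

```python
# Assumed Qwen3-1.7B dimensions (from its published config.json).
HIDDEN = 2048
INTERMEDIATE = 6144
NUM_LAYERS = 28
NUM_Q_HEADS = 16
NUM_KV_HEADS = 8
HEAD_DIM = 128


def lora_params(d_in, d_out, r):
    """A is r x d_in and B is d_out x r."""
    return r * d_in + d_out * r


def lora_params_per_layer(r):
    q_out = NUM_Q_HEADS * HEAD_DIM    # 2048
    kv_out = NUM_KV_HEADS * HEAD_DIM  # 1024
    shapes = {                        # (d_in, d_out) for each target module
        "q_proj": (HIDDEN, q_out),
        "k_proj": (HIDDEN, kv_out),
        "v_proj": (HIDDEN, kv_out),
        "o_proj": (q_out, HIDDEN),
        "gate_proj": (HIDDEN, INTERMEDIATE),
        "up_proj": (HIDDEN, INTERMEDIATE),
        "down_proj": (INTERMEDIATE, HIDDEN),
    }
    return sum(lora_params(d_in, d_out, r) for d_in, d_out in shapes.values())


total = NUM_LAYERS * lora_params_per_layer(r=16)
print(f"LoRA trainable parameters: {total:,}")  # 17,432,576
print(f"Share of ~1.7B base parameters: {total / 1.7e9:.2%}")
```

With r=16 this comes to about 17.4 million trainable parameters, roughly 1% of the base model. That is at the low end of the range quoted above.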
8. Load Dataset
```python
dataset = load_dataset(
    "json",
    data_files=DATASET_PATH,
    split="train",
)
```
Expected format (one JSON object per line in train.jsonl; pretty-printed here for readability):
```json
{
  "messages": [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi! How can I help?"}
  ]
}
```
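A malformed line is much cheaper to catch before training starts. Below is a minimal standard-library validator for the format above. The helper name `validate_chat_jsonl` is hypothetical. The rule that each conversation should end with an assistant turn is an assumption about typical SFT data, not a requirement of `load_dataset`.

```python
import json


def validate_chat_jsonl(path):
    """Return a list of (line_number, problem) tuples; an empty list means no problems found."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((lineno, f"invalid JSON: {exc.msg}"))
                continue
            messages = record.get("messages") if isinstance(record, dict) else None
            if not isinstance(messages, list) or not messages:
                problems.append((lineno, "missing or empty 'messages' list"))
                continue
            for i, msg in enumerate(messages):
                if not (isinstance(msg, dict)
                        and isinstance(msg.get("role"), str)
                        and isinstance(msg.get("content"), str)):
                    problems.append((lineno, f"message {i} needs string 'role' and 'content'"))
            last = messages[-1]
            if isinstance(last, dict) and last.get("role") != "assistant":
                problems.append((lineno, "last message is not from the assistant"))
    return problems
```

Call `validate_chat_jsonl(DATASET_PATH)` before `load_dataset` and fix any reported lines.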
9. Dataset Preprocessing
```python
def preprocess(example):
    # Render the conversation into a single string using the model's chat template.
    text = tokenizer.apply_chat_template(
        example["messages"],
        tokenize=False,
        add_generation_prompt=False,
    )
    return {"text": text}

dataset = dataset.map(preprocess, remove_columns=dataset.column_names)
```
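Why the chat template matters:
- Converts the structured conversation → a single training string
- Uses the same role markers and special tokens (such as <|im_start|> and <|im_end|>) that Qwen3 sees at inference time, so training data and real prompts are formatted consistently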
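To see roughly what `apply_chat_template` produces, here is a simplified ChatML renderer. The function `to_chatml` is a hypothetical approximation written for illustration, not Qwen's actual template. The real Qwen3 template also handles system prompts, tool calls and `<think>` blocks, and may insert an empty `<think></think>` section into the final assistant turn. Print `tokenizer.apply_chat_template(example["messages"], tokenize=False)` on a real sample to confirm the exact string.

```python
def to_chatml(messages):
    """Simplified ChatML: one <|im_start|>role ... <|im_end|> block per message."""
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )


example = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi! How can I help?"},
]
print(to_chatml(example))
```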
10. Training Configuration (SFTConfig)
```python
training_args = SFTConfig(
    output_dir=OUTPUT_DIR,
    max_length=MAX_LENGTH,
    packing=True,

    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,

    num_train_epochs=3,

    learning_rate=2e-4,
    warmup_ratio=0.03,
    lr_scheduler_type="cosine",

    weight_decay=0.05,

    logging_steps=10,
    save_steps=500,
    save_total_limit=3,

    bf16=torch.cuda.is_bf16_supported(),
    fp16=not torch.cuda.is_bf16_supported(),

    report_to="none",
)
```
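Key training insights:
- packing=True → concatenates several short examples into each MAX_LENGTH-token sequence, so less compute is wasted on padding
- gradient_accumulation_steps=8 → accumulates gradients over 8 micro-batches before each optimizer step, giving an effective batch size of 8 per GPU at the memory cost of batch size 1
- lr_scheduler_type="cosine" → decays the learning rate smoothly toward zero after warmup
- warmup_ratio=0.03 → ramps the learning rate up linearly over the first 3% of steps to stabilize early training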
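The batch and schedule settings reduce to simple arithmetic. This sketch assumes a hypothetical 8,000 packed sequences on a single GPU. The step count is approximate, because the Trainer's exact figure also depends on dataloader rounding. The warmup steps are derived by rounding `warmup_ratio × total_steps` up, which is assumed to match recent transformers releases. The `cosine_lr` function follows the usual linear-warmup-then-cosine-decay formula.

```python
import math


def effective_batch_size(per_device, grad_accum, num_devices=1):
    return per_device * grad_accum * num_devices


def approx_total_steps(num_sequences, eff_batch, epochs):
    # Approximation: the Trainer's exact count also depends on dataloader rounding.
    return math.ceil(num_sequences / eff_batch) * epochs


def cosine_lr(step, total_steps, warmup_steps, peak_lr):
    """Linear warmup to peak_lr, then cosine decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


NUM_PACKED_SEQUENCES = 8000  # hypothetical dataset size after packing
eff = effective_batch_size(per_device=1, grad_accum=8)           # 8
total = approx_total_steps(NUM_PACKED_SEQUENCES, eff, epochs=3)  # 3000
warmup = math.ceil(0.03 * total)                                 # 90

for step in (0, warmup, total // 2, total):
    print(step, f"{cosine_lr(step, total, warmup, 2e-4):.2e}")
```

The learning rate peaks at 2e-4 after the warmup steps, falls to half of that midway through the decay phase, and reaches zero at the final step.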
11. Initialize Trainer
```python
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
```
12. Start Training
```python
trainer.train()
```
During training:
- Only the LoRA A and B matrices receive gradient updates, so they learn the task-specific behavior
- The base model weights remain frozen
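13. Save LoRA Adapter
```python
# For a PEFT model this saves only the adapter weights and adapter config, not the base model.
trainer.save_model(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)
```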
14. Merge Model for Deployment
The complete script below runs the whole pipeline end to end. The merge happens in the final section, where merge_and_unload() folds the LoRA weights into the base model and the result is saved as a standalone checkpoint.
```python
#!/usr/bin/env python3

import os
import torch

from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
)

from peft import (
    LoraConfig,
    get_peft_model,
)

from trl import (
    SFTTrainer,
    SFTConfig,
)


# ============================================================
# CONFIG
# ============================================================

MODEL_NAME = "./Qwen/Qwen3-1.7B"
DATASET_PATH = "./dataset/train.jsonl"

OUTPUT_DIR = "./qwen3_lora_sft_pro"
MERGED_DIR = OUTPUT_DIR + "_merged"

MAX_LENGTH = 512

os.makedirs(OUTPUT_DIR, exist_ok=True)


# ============================================================
# TOKENIZER
# ============================================================

print("Loading tokenizer...")

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME,
    trust_remote_code=True,
)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

tokenizer.padding_side = "right"


# ============================================================
# MODEL
# ============================================================

print("Loading model...")

dtype = (
    torch.bfloat16
    if torch.cuda.is_available()
    and torch.cuda.is_bf16_supported()
    else torch.float16
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    trust_remote_code=True,
    torch_dtype=dtype,
    device_map="auto",
)

model.config.use_cache = False
model.gradient_checkpointing_enable()


# ============================================================
# LORA CONFIG
# ============================================================

print("Applying LoRA...")

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
)

model = get_peft_model(
    model,
    lora_config,
)

model.print_trainable_parameters()


# ============================================================
# DATASET
# ============================================================

print("Loading dataset...")

dataset = load_dataset(
    "json",
    data_files=DATASET_PATH,
    split="train",
)


def preprocess(example):
    text = tokenizer.apply_chat_template(
        example["messages"],
        tokenize=False,
        add_generation_prompt=False,
    )

    return {
        "text": text
    }


dataset = dataset.map(
    preprocess,
    remove_columns=dataset.column_names,
)

print(dataset)


# ============================================================
# TRAINING CONFIG
# ============================================================

training_args = SFTConfig(
    output_dir=OUTPUT_DIR,

    max_length=MAX_LENGTH,
    packing=True,

    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,

    num_train_epochs=3,

    learning_rate=2e-4,
    warmup_ratio=0.03,

    lr_scheduler_type="cosine",
    weight_decay=0.05,

    logging_steps=10,

    save_steps=500,
    save_total_limit=3,

    bf16=torch.cuda.is_available()
    and torch.cuda.is_bf16_supported(),

    fp16=not (
        torch.cuda.is_available()
        and torch.cuda.is_bf16_supported()
    ),

    report_to="none",

    dataset_text_field="text",
)


# ============================================================
# TRAINER
# ============================================================

print("Initializing trainer...")

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
)


# ============================================================
# TRAIN
# ============================================================

print("Starting training...")

trainer.train()

print("Training completed.")


# ============================================================
# SAVE LORA ADAPTER
# ============================================================

print("Saving LoRA adapter...")

trainer.save_model(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)


# ============================================================
# MERGE LORA
# ============================================================

print("Merging LoRA into base model...")

# Fold the LoRA matrices into the base weights and drop the PEFT wrappers.
merged_model = model.merge_and_unload()

# The KV cache was disabled for training; re-enable it so the saved config suits inference.
merged_model.config.use_cache = True

merged_model.save_pretrained(
    MERGED_DIR,
    safe_serialization=True,
)

tokenizer.save_pretrained(
    MERGED_DIR,
)

print(f"Merged model saved to: {MERGED_DIR}")

print("Done.")
```
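Why merging matters:
- Removes the runtime dependency on PEFT and separate adapter loading
- Produces a standalone model with the same architecture as the base model
- Makes deployment easier with vLLM, Transformers, or API servers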
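Merging works because the LoRA update is linear. W_merged = W0 + (lora_alpha / r)·B·A yields the same output as running the frozen layer and the adapter branch side by side. This pure-Python sketch uses toy 2×3 matrices with alpha / r = 2 to check that equivalence. `merge_lora` is a hypothetical helper that illustrates what a merge computes, not PEFT's code.

```python
def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(X, Y):
    """Multiply an n x k matrix X by a k x m matrix Y."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def lora_forward(W0, A, B, x, alpha, r):
    """Unmerged: frozen layer plus the scaled adapter branch."""
    scale = alpha / r
    return [b + scale * d for b, d in zip(matvec(W0, x), matvec(B, matvec(A, x)))]


def merge_lora(W0, A, B, alpha, r):
    """Merged: fold the adapter into one weight matrix, W = W0 + (alpha / r) * B @ A."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W0, BA)]


W0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # d_out x d_in
A = [[1.0, 1.0, 1.0]]                    # r x d_in (r = 1)
B = [[0.5], [0.25]]                      # d_out x r
x = [1.0, 2.0, 3.0]

W_merged = merge_lora(W0, A, B, alpha=2, r=1)
print(lora_forward(W0, A, B, x, alpha=2, r=1))  # [7.0, 5.0]
print(matvec(W_merged, x))                      # [7.0, 5.0]
```

After merging, a single matrix multiply replaces the two-branch computation, so inference runs at base-model speed.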
Final Output
After training, you get:
- The LoRA adapter (in OUTPUT_DIR)
- The full merged model (in MERGED_DIR)
- Tokenizer files (saved alongside both)
- A deployment-ready checkpoint