different types of dataset examples
csemachine learning
Perfect — you’re asking for different types of dataset examples for LLM / NLP (especially text-to-text, reasoning, code, chat). I’ll give you a complete structured set 🔥 so you can directly use them.
🧠 1. Basic Text-to-Text (Simple NLP)
👉 Direct input → output
{"input": "Translate Hello to Hindi", "output": "नमस्ते"}
{"input": "Summarize: AI is growing very fast.", "output": "AI is growing quickly."}
✔️ Use for:
- translation
- summarization
🧠 2. Instruction-Based Dataset (Most Important 🔥)
{"instruction": "Explain concept", "input": "SQL Injection", "output": "SQL Injection is a vulnerability where attackers inject malicious queries."}
{"instruction": "Define term", "input": "Machine Learning", "output": "Machine learning is a method where systems learn from data."}
✔️ Used in:
- modern LLMs
- instruction tuning
🧠 3. Question Answering (QA)
{"question": "What is Python?", "answer": "Python is a programming language."}
{"question": "What is NLP?", "answer": "Natural Language Processing is a field of AI."}
✔️ Simple but powerful
🧠 4. Conversational (ChatGPT Style 🔥)
{
"messages": [
{"role": "user", "content": "What is SQL Injection?"},
{"role": "assistant", "content": "SQL Injection is a vulnerability..."}
]
}
🧠 5. Multi-turn Conversation (Advanced Chat)
{
"messages": [
{"role": "user", "content": "What is XSS?"},
{"role": "assistant", "content": "XSS is a web vulnerability."},
{"role": "user", "content": "Give example"},
{"role": "assistant", "content": "<script>alert(1)</script>"}
]
}
🧠 6. Code Generation Dataset 🔥
{"instruction": "Write Python code for factorial", "output": "def fact(n): return 1 if n==0 else n*fact(n-1)"}
{"instruction": "Write JavaScript fetch API example", "output": "fetch('url').then(res=>res.json())"}
🧠 7. Reasoning (Chain-of-Thought) 🔥
{
"instruction": "Check if number is prime",
"output": "Step 1: Check divisibility\nStep 2: Loop from 2 to n\nStep 3: If divisible return false\nFinal code: def is_prime(n): ..."
}