Detailed Phase 1: Python Foundations for Data Science

Goal: Master Python fundamentals + core data libraries (Pandas, NumPy) to clean, explore, and analyze real datasets like a pro.


Week-by-Week Breakdown

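| Week | Focus | Hours |
|------|-------|-------|
| 1 | Python Basics | 25 |
| 2 | Control Flow + Functions | 25 |
| 3 | Data Structures Deep Dive | 25 |
| 4 | File I/O + Error Handling | 20 |
| 5 | NumPy Mastery | 25 |
| 6 | Pandas Core | 30 |
| 7 | Data Cleaning & EDA | 30 |
| 8 | Mini-Project + GitHub | 20 |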

Week 1: Python Basics

Topics

| Topic | Details |
|-------|---------|
| Variables | int, float, str, bool |
| Basic Operations | + - * / // % ** |
| Type Conversion | int(), float(), str() |
| Strings | Indexing, slicing, .split(), .join(), f-strings |
| Print & Input | print(), input() |
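
The rows above can be tried out in a few lines. This is a minimal sketch; the sample values (`raw`, `word`, `csv_line`) are made up for illustration.

```python
# Type conversion: input() always returns a str, so convert before doing math.
raw = "42"                      # stand-in for something typed by the user
number = int(raw)               # str -> int
ratio = float("3.5")            # str -> float
label = str(number)             # int -> str

# Floor division, remainder, and power.
quotient = 17 // 5
remainder = 17 % 5
power = 2 ** 10

# Indexing and slicing.
word = "Python"
first = word[0]                 # first character
last = word[-1]                 # last character
middle = word[1:4]              # characters at positions 1, 2, 3

# split() breaks a string into a list; join() glues a list back together.
csv_line = "alice,bob,carol"
names = csv_line.split(",")
joined = " | ".join(names)

# f-strings can format numbers inline (here: two decimal places).
print(f"{word} has {len(word)} letters; ratio = {ratio:.2f}; names = {joined}")
```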

Practice (Daily)

# Day 1
name = input("Enter name: ")
age = int(input("Age: "))
print(f"Hello {name}, you will be {age + 5} in 5 years!")

Mini-Task: Build a tip calculator

Input: bill, tip %, people → Output: each person pays $X.XX
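
One possible shape for the tip calculator, as a sketch rather than the reference solution. The function name `split_bill` and the sample numbers are assumptions chosen for illustration.

```python
def split_bill(bill: float, tip_percent: float, people: int) -> float:
    """Return the amount each person pays, tip included."""
    if people <= 0:
        raise ValueError("people must be at least 1")
    total = bill * (1 + tip_percent / 100)   # add the tip to the bill
    return total / people                    # split evenly


share = split_bill(bill=120.00, tip_percent=15, people=4)
print(f"Each person pays ${share:.2f}")      # prints: Each person pays $34.50
```

To make it interactive, replace the hard-coded arguments with `float(input(...))` and `int(input(...))` calls, as in the Day 1 snippet.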


Week 2: Control Flow & Functions

| Topic | Syntax |
|-------|--------|
| if/elif/else | if x > 0: ... |
| Loops | for i in range(10):, while x < 5: |
| List Comprehensions | [x**2 for x in range(5)] |
| Functions | def func_name(params): |
| *args, **kwargs | Optional; can wait until later |

Practice

def grade_score(score):
    if score >= 90: return "A"
    elif score >= 80: return "B"
    # ...

Project: FizzBuzz + Prime Checker

Write two functions:

  1. fizzbuzz(n) → prints 1 to n with rules
  2. is_prime(n) → returns True/False
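
A sketch of both functions follows. The text does not spell out the FizzBuzz rules, so the conventional ones are assumed here (multiples of 3 print "Fizz", multiples of 5 print "Buzz", multiples of both print "FizzBuzz"). The prime check uses plain trial division, which is one simple approach among several.

```python
def fizzbuzz(n: int) -> None:
    """Print 1..n using the conventional FizzBuzz rules (an assumption)."""
    for i in range(1, n + 1):
        if i % 15 == 0:          # divisible by both 3 and 5
            print("FizzBuzz")
        elif i % 3 == 0:
            print("Fizz")
        elif i % 5 == 0:
            print("Buzz")
        else:
            print(i)


def is_prime(n: int) -> bool:
    """Return True if n is prime, using trial division up to sqrt(n)."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


fizzbuzz(15)
primes = [n for n in range(20) if is_prime(n)]   # list comprehension + function
```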

Week 3: Data Structures Deep Dive

| Structure | Use Case |
|-----------|----------|
| list | Ordered, mutable |
| tuple | Immutable, faster |
| dict | Key-value pairs |
| set | Unique, unordered |

Key Methods

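# List
lst = [1, 2, 3]
lst.append(4)   # add to the end -> [1, 2, 3, 4]
lst.pop()       # remove and return the last item (4)
lst[1:3]        # slice: items at positions 1 and 2 -> [2, 3]

# Dict
d = {"name": "Alex", "age": 25}
d.keys(), d.values(), d.items()   # views of keys, values, and (key, value) pairs

# Set
a = {1, 2, 3}
b = {3, 4, 5}
a & b  # intersection -> {3}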

Practice

# Count word frequency
text = "the cat and the dog and the bird"
words = text.split()
freq = {}
for w in words:
    freq[w] = freq.get(w, 0) + 1
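
For comparison, the standard library's `collections.Counter` does the same counting in one call. The sketch below repeats the manual loop, checks it against `Counter`, and adds two set operations; the second sentence (`other_words`) is made up for illustration.

```python
from collections import Counter

text = "the cat and the dog and the bird"
words = text.split()

# Manual counting with dict.get, as in the practice snippet.
freq = {}
for w in words:
    freq[w] = freq.get(w, 0) + 1

# Counter counts the same way and adds helpers such as most_common().
counts = Counter(words)
top_two = counts.most_common(2)

# Set operations on the vocabulary of two sentences.
other_words = set("the fox and the hound".split())
shared = set(words) & other_words        # words present in both
only_first = set(words) - other_words    # words only in the first text
```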

Project: To-Do List CLI App

Add, remove, list tasks → save to .txt


Week 4: File Handling + Error Handling

| Topic | Code |
|-------|------|
| Read/Write | with open('file.txt', 'r') as f: |
| CSV | import csv |
| JSON | import json |
| Try/Except | try: ... except ValueError: |
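
The JSON and try/except rows can be combined in a short sketch. The helper name `parse_age` is hypothetical; the record reuses the "Alex" example from Week 3.

```python
import json


def parse_age(raw: str):
    """Convert text to int; return None when it is not a valid integer."""
    try:
        return int(raw)
    except ValueError:
        return None


record = {"name": "Alex", "age": parse_age("25")}
encoded = json.dumps(record)        # dict -> JSON string
decoded = json.loads(encoded)       # JSON string -> dict
bad = parse_age("twenty-five")      # invalid input is caught, not raised
```

Catching only `ValueError` keeps unrelated bugs visible instead of silently swallowing them.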

Example: Read CSV

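import csv
# newline='' lets the csv module handle line endings itself
with open('data.csv', 'r', newline='') as f:
    reader = csv.DictReader(f)   # each row becomes a dict keyed by the header
    for row in reader:
        print(row['name'], row['age'])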

Mini-Project: Student Gradebook

Read grades.csv → calculate average → write summary.txt
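
A rough sketch of the gradebook logic, assuming grades.csv has `name` and `grade` columns (the text does not specify the layout). To stay runnable without a file on disk, it reads from an in-memory stand-in; `summarize_grades` is a hypothetical helper name.

```python
import csv
import io


def summarize_grades(csv_file) -> str:
    """Read rows with 'name' and 'grade' columns; return a text summary."""
    grades = []
    skipped = 0
    for row in csv.DictReader(csv_file):
        try:
            grades.append(float(row["grade"]))
        except (KeyError, TypeError, ValueError):
            skipped += 1              # missing or non-numeric grade
    if not grades:
        return "No valid grades found.\n"
    average = sum(grades) / len(grades)
    return (
        f"Students graded: {len(grades)}\n"
        f"Rows skipped: {skipped}\n"
        f"Average grade: {average:.2f}\n"
    )


# In-memory stand-in for grades.csv (invented rows).
sample = io.StringIO("name,grade\nAna,90\nBen,80\nCy,n/a\n")
summary = summarize_grades(sample)
print(summary)
```

With real files, pass the handle from `open('grades.csv', newline='')` to the function and write the returned string to summary.txt with `open('summary.txt', 'w')`.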


Week 5: NumPy – Numerical Python

| Concept | Code |
|---------|------|
| Arrays | np.array([1,2,3]) |
| Shape | .shape, .reshape() |
| Math | np.mean(), np.std() |
| Indexing | Boolean, fancy |
| Broadcasting | arr + 5 |
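
Each row of the table maps to a line or two of code. A small sketch on a 12-element array (the array contents and `row_offsets` are arbitrary):

```python
import numpy as np

arr = np.arange(12)                 # 0, 1, ..., 11
grid = arr.reshape(3, 4)            # 3 rows x 4 columns; same data, new shape

# Boolean indexing: keep elements where the mask is True.
evens = arr[arr % 2 == 0]

# Fancy indexing: pick elements by a list of positions.
picked = arr[[0, 3, 10]]

# Broadcasting: the scalar 5 is applied to every element.
shifted = arr + 5

# Broadcasting a row of 4 values across all 3 rows of the grid.
row_offsets = np.array([100, 200, 300, 400])
offset_grid = grid + row_offsets

col_means = grid.mean(axis=0)       # mean of each column
```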

Practice

import numpy as np
arr = np.random.randn(1000)
print(f"Mean: {arr.mean():.2f}, Std: {arr.std():.2f}")

Task:

Generate 1000 random heights (normal dist: μ=170, σ=10) → find % > 180 cm
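
One way to approach the task, sketched with NumPy's `default_rng` generator. The seed value is arbitrary and only there to make runs repeatable.

```python
import numpy as np

rng = np.random.default_rng(seed=42)                  # arbitrary fixed seed
heights = rng.normal(loc=170, scale=10, size=1000)    # mean 170 cm, std 10 cm

taller = heights > 180                   # boolean mask, one entry per height
percent_taller = taller.mean() * 100     # mean of True/False = fraction of True

print(f"Mean: {heights.mean():.1f} cm, Std: {heights.std():.1f} cm")
print(f"Taller than 180 cm: {percent_taller:.1f}%")
```

Since 180 cm sits one standard deviation above the mean, the result should land near the theoretical value of about 15.9%, with some sampling noise.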


Week 6: Pandas – Data Manipulation

| Core Object | Use |
|-------------|-----|
| Series | 1D labeled array |
| DataFrame | 2D table |

Essential Methods

| Task | Code |
|------|------|
| Read CSV | pd.read_csv('file.csv') |
| View | .head(), .info(), .describe() |
| Select | df['col'], df.loc[], df.iloc[] |
| Filter | df[df['age'] > 30] |
| GroupBy | df.groupby('city').mean(numeric_only=True) |
| Merge | pd.merge(df1, df2, on='id') |

Example

import pandas as pd
df = pd.read_csv("titanic.csv")
df = df.dropna(subset=['Age'])
adults = df[df['Age'] > 18]
survival_rate = adults['Survived'].mean()
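
The select, filter, group, and merge methods can also be practiced without downloading anything. The sketch below uses two tiny DataFrames whose names and values are invented purely for illustration.

```python
import pandas as pd

# Toy data, invented for illustration only.
people = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "city": ["Oslo", "Oslo", "Lima", "Lima"],
    "age": [25, 40, 35, 31],
})
orders = pd.DataFrame({
    "id": [1, 2, 2, 4],
    "amount": [10.0, 20.0, 5.0, 8.0],
})

over_30 = people[people["age"] > 30]                         # boolean filter
first_two = people.iloc[:2]                                  # select by position
lima_ages = people.loc[people["city"] == "Lima", "age"]      # select by mask + label

# Mean age per city; picking the column first avoids averaging text columns.
mean_age = people.groupby("city")["age"].mean()

# Inner join on the shared 'id' column; id 3 has no orders and drops out.
merged = pd.merge(people, orders, on="id")
spend_by_city = merged.groupby("city")["amount"].sum()
```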

Practice Dataset: Titanic


Week 7: Data Cleaning & EDA

| Task | Code |
|------|------|
| Missing Values | df.isnull().sum(), df.fillna(), df.dropna() |
| Duplicates | df.duplicated(), df.drop_duplicates() |
| Outliers | Z-score or IQR method |
| Type Fix | df['age'] = df['age'].astype(int) (fill or drop missing values first; astype(int) fails on NaN) |
| New Columns | df['family_size'] = df['sibsp'] + df['parch'] + 1 |
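
The outlier row names two methods without showing them. Here is a minimal sketch of both on a toy Series; the fare values are invented, and the 1.5 × IQR fence and the Z-score threshold are conventional choices, not something the text prescribes.

```python
import pandas as pd

# Toy fares, invented for illustration; 512.0 is the planted outlier.
fares = pd.Series([7.25, 8.05, 13.0, 26.0, 30.0, 35.5, 52.0, 512.0])

# IQR method: flag values more than 1.5 * IQR outside the quartiles.
q1, q3 = fares.quantile(0.25), fares.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = fares[(fares < lower) | (fares > upper)]

# Z-score method: flag values far from the mean, measured in standard deviations.
# 3.0 is a common threshold on larger samples; 2.0 is used here because
# a sample of only 8 values cannot produce a Z-score that high.
threshold = 2.0
z_scores = (fares - fares.mean()) / fares.std()
z_outliers = fares[z_scores.abs() > threshold]
```

The IQR method is usually the safer default on skewed columns such as fares, because a single extreme value inflates the mean and standard deviation that the Z-score relies on.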

EDA Checklist

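import seaborn as sns

df.describe()
df['column'].value_counts()
df.corr(numeric_only=True)   # numeric_only skips text columns (needed on pandas 2.x)
sns.heatmap(df.corr(numeric_only=True), annot=True)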

Project: Titanic Survival Analysis

Clean data → EDA → answer:

  • Survival rate by gender?
  • Did age affect survival?
  • Fare vs survival?
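
Each of the three questions reduces to a groupby. The sketch below assumes Kaggle-style column names (`Sex`, `Age`, `Fare`, `Survived`) and runs on a handful of invented rows, so the numbers it produces say nothing about the real dataset. The age bins are also an arbitrary choice.

```python
import pandas as pd

# Invented rows with assumed Kaggle-style column names; not real passenger data.
df = pd.DataFrame({
    "Sex": ["female", "male", "female", "male", "male", "female"],
    "Age": [29, 35, 4, 54, 8, 41],
    "Fare": [71.3, 8.1, 16.7, 51.9, 21.1, 7.9],
    "Survived": [1, 0, 1, 0, 1, 0],
})

# Survival rate by gender: the mean of a 0/1 column is the share of 1s.
by_gender = df.groupby("Sex")["Survived"].mean()

# Age effect: bucket ages into (0, 12] and (12, 60], then compare the buckets.
df["AgeGroup"] = pd.cut(df["Age"], bins=[0, 12, 60], labels=["child", "adult"])
by_age = df.groupby("AgeGroup", observed=True)["Survived"].mean()

# Fare vs survival: compare the median fare for each outcome.
fare_by_outcome = df.groupby("Survived")["Fare"].median()
```

On the real data, `by_gender.plot(kind="bar")` would turn the first result into the survival-by-gender chart, provided matplotlib is installed.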

Week 8: Mini-Project + GitHub

Final Project: Titanic Data Explorer

Deliverables:

  1. Jupyter Notebook: titanic_analysis.ipynb
  2. Cleaned dataset: titanic_clean.csv
  3. GitHub Repo: yourname/titanic-ds
  4. README.md with:
    • Problem statement
    • Key findings (3 bullet points)
    • Charts (embed or link)
    • How to run

GitHub Setup

git init
git add .
git commit -m "Titanic EDA complete"
git branch -M main   # make sure the local branch is named main before pushing
git remote add origin https://github.com/yourname/titanic-ds.git
git push -u origin main

README Template

# Titanic Survival Analysis

## Key Insights
- Women survived at 74% vs men at 19%
- 1st class: 63% survival
- Children (<12) had highest survival

## How to Run
```bash
pip install pandas matplotlib seaborn jupyter
jupyter notebook titanic_analysis.ipynb
```

## Visualizations
Survival by Gender (embed or link the chart here)


Tools to Install (Week 1)

```bash
# Anaconda (recommended): download the installer from
# https://www.anaconda.com/products/distribution

# Or via pip
pip install pandas numpy matplotlib seaborn jupyter
```

Daily Learning Template (60 mins)

| Time | Activity |
|------|----------|
| 10 min | Review yesterday |
| 30 min | Watch/read new topic |
| 15 min | Code along |
| 5 min | Write notes (Notion/Obsidian) |

Assessment: Can You Do This?

| Task | Yes/No |
|------|--------|
| Read CSV into DataFrame | |
| Filter passengers > 30 years | |
| Group by class and compute mean fare | |
| Plot survival rate by gender | |
| Save cleaned data to new CSV | |

If all Yes → You passed Phase 1!


Next: Phase 2 – Statistics & Math

“Garbage in, garbage out.” Learn why models work.


Free Resources Cheat Sheet

| Resource | Link |
|----------|------|
| Automate the Boring Stuff | automatetheboringstuff.com |
| Kaggle Python | kaggle.com/learn/python |
| Kaggle Pandas | kaggle.com/learn/pandas |
| Pandas 10min | pandas.pydata.org/10min |
| NumPy Quickstart | numpy.org/quickstart |

Pro Tip: Build a “Cheat Sheet”

Create python_cheat_sheet.md:

# Python for DS

## Pandas
df.head() → first 5 rows
df['col'].mean()
df.groupby('cat').size()

Update daily.


Start Now:

  1. Open terminal
  2. jupyter notebook
  3. Create week1_day1.ipynb
  4. Write: print("I will be a Data Scientist")

Tag me on LinkedIn when you push your first repo!
Let’s make Phase 1 legendary.