Narrowest: Deep Learning (DL)

Auto-learns hierarchical representations from raw data. End-to-end, no hand-crafted features.

Subset of AI: Machine Learning (ML)

Learns f(x)→y from data. Involves feature design, training, and evaluation.
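
As a rough illustration of the three steps named above (feature design, training, evaluation), here is a minimal sketch in plain Python. The brightness feature, the nearest-centroid rule, and the toy images are all assumptions made for this example; they are not taken from the lecture.

```python
def brightness(image):
    """Feature design: mean pixel intensity of a 2-D grayscale image."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)

def train(images, labels):
    """Training: store the mean feature value (centroid) of each class."""
    sums, counts = {}, {}
    for img, lab in zip(images, labels):
        sums[lab] = sums.get(lab, 0.0) + brightness(img)
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: sums[lab] / counts[lab] for lab in sums}

def predict(centroids, image):
    """f(x) -> y: choose the class whose centroid is closest to the feature."""
    x = brightness(image)
    return min(centroids, key=lambda lab: abs(centroids[lab] - x))

def accuracy(centroids, images, labels):
    """Evaluation: fraction of held-out images labeled correctly."""
    hits = sum(predict(centroids, img) == lab for img, lab in zip(images, labels))
    return hits / len(labels)

# Hypothetical 2x2 "images" used only to exercise the functions.
train_images = [[[10, 20], [30, 20]], [[200, 220], [210, 230]],
                [[15, 5], [10, 10]], [[240, 250], [230, 245]]]
train_labels = ["dark", "bright", "dark", "bright"]
model = train(train_images, train_labels)

test_images = [[[40, 50], [30, 60]], [[180, 190], [200, 170]]]
test_labels = ["dark", "bright"]
print(accuracy(model, test_images, test_labels))
```

A deep learning model would replace the hand-written `brightness` function with representations learned from the raw pixels.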

Broadest: Artificial Intelligence (AI)

Systems that perceive, reason, and act to achieve goals.

šŸ‘ Seeing → Computer Vision
šŸ‘‚ Hearing → Sound Recognition
šŸ—£ Talking → Speech Synthesis
šŸ’¬ Language → NLP
šŸ¤– Movement → Robotics
🧠 Reasoning → Auto. Reasoning
This course = Image Processing (signal-level operations) + ML/DL (decision-making) = complete vision pipelines.
1960s–70s: Early attempts
Vision seen as a "simple" problem (wrong). Limited compute & memory. Synthetic scenes, hand-crafted rules and heuristics.
No data | No compute
1980s–90s: Geometry & statistics
Mathematical rigor. Edge & corner detection, feature extraction. Statistical pattern recognition. Face recognition emerges.
Sobel | Canny | Harris
2000s: Data-driven & practical
Larger labeled datasets. More compute. Video analysis and real-time applications grow.
SIFT | SURF | Real-time
Today: Deep learning era
CNNs and, more recently, Transformers dominate. End-to-end learning. Significant gains in recognition, detection, segmentation. Classical methods still matter for efficiency & geometry.
CNNs | Transformers | E2E
Progress needs 3 things simultaneously
Sufficient data | Enough compute | Sound math models

Ventral Stream

"What" pathway

Task: Object identity & appearance
CV equiv.: Classification / Recognition
Q it answers: What am I looking at?

Dorsal Stream

"Where / How" pathway

Task: Spatial location & motion
CV equiv.: Detection / Tracking
Q it answers: Where is it?
Key insight
Detection = Recognition (what) + Localization (where). Detection is the harder task because it requires both.

Image Processing

Does: Transforms images
Level: Pixel-level operations
Output: An image
Meaning: No semantic meaning
Examples: Denoise, blur, resize

Computer Vision

Does: Interprets images
Level: Object/scene reasoning
Output: Information / decisions
Meaning: High-level understanding
Examples: Detect cars, classify faces
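
The difference in output type can be made concrete with a small sketch: one function returns an image, the other returns a decision. Both functions, the threshold, and the minimum pixel count are hypothetical choices for illustration, and images are plain lists of lists so the code runs without extra libraries.

```python
def box_blur(image):
    """Image processing: image in, image out (3x3 mean, borders clamped)."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp to valid rows
                    xx = min(max(x + dx, 0), w - 1)  # clamp to valid columns
                    total += image[yy][xx]
            out[y][x] = total / 9.0
    return out

def contains_bright_object(image, threshold=128, min_pixels=4):
    """Computer vision (toy version): image in, decision out."""
    count = sum(1 for row in image for p in row if p > threshold)
    return count >= min_pixels

# 5x5 dark image with a 2x2 bright block.
scene = [[0] * 5 for _ in range(5)]
for y in (1, 2):
    for x in (1, 2):
        scene[y][x] = 255

blurred = box_blur(scene)               # still an image, no meaning attached
found = contains_bright_object(scene)   # a yes/no statement about the scene
```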
Every step introduces bias/artifacts: optics, sensor noise, exposure, color processing, denoising, compression
Environment
Illumination changes, occlusion, clutter — same scene looks totally different.
Sensors & Data
Motion blur, noise, dataset bias — models trained on curated data fail in deployment.
Deployment
Real-time constraints, limited memory, low power. High benchmark accuracy ≠ robustness in the wild.
Image Enhancement
Noise reduction, contrast adjustment, sharpening. Improves visual appearance.
Image Restoration
Denoising, deblurring, correcting sensor-level artifacts. Reverses degradation.
Geometric Transforms
Scaling, rotation, alignment, warping. Manipulates spatial coordinates.
Representation Change
Color space conversions, filtering, and frequency-domain analysis.
What is present?
Object and scene recognition (Image Classification).
Where is it?
Spatial localization (Object Detection or Segmentation).
What is the geometry?
Depth, scale, pose, and 3D scene structure.
How does it change?
Motion estimation, object tracking, and optical flow over time.

Everyday Systems

Auto Driving: Lane/pedestrian detection, collision avoidance (Tesla, Waymo, Mobileye)
Face ID: Identity verification & authentication (Apple Face ID, Android Face Unlock)
Augmented Reality: Face filters, environment tracking (Snapchat, Instagram, TikTok)

Professional Systems

Healthcare: Tumor detection, X-ray & MRI analysis (Google Health, Aidoc)
Security: Person tracking, anomaly detection, crowd surveillance
Interaction: Gesture recognition, body tracking (Microsoft Kinect, Meta Quest)
Raw Image / Video → Preprocessing (denoise, resize, normalize) → Feature Extraction (corners, edges, descriptors) → ML / DL Model (inference) → Output (labels / boxes / masks)
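
The pipeline stages can be sketched as a chain of functions, each consuming the previous stage's output. Every function name, the two features, and the 0.2 threshold below are illustrative assumptions; in particular, `toy_model` is a hand-set rule standing in for a trained ML/DL model.

```python
def preprocess(image):
    """Preprocessing: normalize 8-bit intensities to the range [0, 1]."""
    return [[p / 255.0 for p in row] for row in image]

def extract_features(image):
    """Feature extraction: mean intensity and mean absolute horizontal gradient."""
    h, w = len(image), len(image[0])
    mean = sum(sum(row) for row in image) / (h * w)
    grad = sum(abs(row[x + 1] - row[x]) for row in image for x in range(w - 1))
    return [mean, grad / (h * (w - 1))]

def toy_model(features):
    """Inference: stand-in rule, not a learned model."""
    _, grad = features
    return "textured" if grad > 0.2 else "flat"

def run_pipeline(image):
    """Raw image -> preprocessing -> features -> model -> output label."""
    return toy_model(extract_features(preprocess(image)))

flat = [[100] * 4 for _ in range(4)]
stripes = [[0, 255, 0, 255] for _ in range(4)]
print(run_pipeline(flat), run_pipeline(stripes))
```

In an end-to-end deep learning system, the feature extraction and model stages are typically merged into a single learned network.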
Optics (lens, aperture) → Sensor + AFE (+ CFA) → ISP (demosaic, WB, denoise) → Post-capture (enhance, compress)
CFA = Color Filter Array  |  AFE = Analog Front-End  |  WB = White Balance  |  ISP = Image Signal Processor
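
To give one ISP step a concrete shape, here is a minimal white balance sketch. It uses the gray-world assumption (the average scene color should be neutral gray), which is one simple technique chosen for illustration; the lecture does not say which method a given ISP uses, and the function name and pixel format are hypothetical.

```python
def gray_world(pixels):
    """Scale each channel so that all three channel means become equal.

    `pixels` is a list of (r, g, b) tuples. Assumes no channel mean is zero.
    """
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3.0
    gains = [gray / m for m in means]  # per-channel correction gains
    return [tuple(min(255.0, p[c] * gains[c]) for c in range(3)) for p in pixels]

# Two pixels with a blue cast: blue is twice as strong as red and green.
cast = [(100, 100, 200), (50, 50, 100)]
balanced = gray_world(cast)
```

After correction, both pixels are neutral (equal R, G, B), which is what the gray-world assumption predicts for a scene that should average to gray.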
Sampling & Quantization
Sampling rate, resolution, intensity quantization, aliasing
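
A minimal sketch of intensity quantization and spatial sampling, assuming 8-bit grayscale images stored as lists of lists. The function names are hypothetical. The subsampling deliberately skips any low-pass prefilter so that aliasing shows up.

```python
def quantize(image, levels):
    """Map 8-bit intensities onto `levels` evenly spaced gray values."""
    step = 256 / levels
    # Each pixel is replaced by the midpoint of the bin it falls into.
    return [[int(int(p / step) * step + step / 2) for p in row] for row in image]

def subsample(image, factor):
    """Keep every `factor`-th pixel in both directions (no prefilter)."""
    return [row[::factor] for row in image[::factor]]

ramp = [[0, 100, 127, 128, 255]]
two_level = quantize(ramp, 2)    # only two gray values survive
four_level = quantize(ramp, 4)

# One-pixel-wide stripes: sampling every 2nd pixel misses every bright stripe.
stripes = [[0, 255] * 4 for _ in range(4)]
aliased = subsample(stripes, 2)
```

The striped image comes out uniformly black after subsampling: the pattern changes faster than the new sampling rate can represent, which is the aliasing effect listed above.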
Transformation
Geometric (scale, rotate, warp), intensity (gamma, histogram eq.)
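
The two intensity transforms named here can be sketched in a few lines of plain Python. This assumes 8-bit integer grayscale images as lists of lists. The equalization uses the common CDF-based mapping and is a simplified illustration, not a reference implementation.

```python
def gamma_correct(image, gamma):
    """Apply out = 255 * (in / 255) ** gamma; gamma < 1 brightens mid-tones."""
    return [[round(255 * (p / 255) ** gamma) for p in row] for row in image]

def equalize(image):
    """Histogram equalization: remap intensities through the normalized CDF."""
    pixels = [p for row in image for p in row]
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)  # CDF value at the darkest used level
    n = len(pixels)
    if n == cdf_min:                          # constant image: nothing to spread
        return [row[:] for row in image]
    return [[round((cdf[p] - cdf_min) / (n - cdf_min) * 255) for p in row]
            for row in image]

brighter = gamma_correct([[0, 64, 255]], 0.5)
low_contrast = [[100, 101], [102, 103]]
stretched = equalize(low_contrast)
```

Gamma leaves black and white fixed while lifting the mid-tones. Equalization spreads the four nearly identical gray levels across the full 0 to 255 range.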
Filtering
Linear & non-linear filters, denoising, edge enhancement, frequency domain
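
A small sketch contrasting a linear filter (3x3 mean) with a non-linear one (3x3 median) on an image containing a single "salt" pixel. The helper name `filter3x3` is hypothetical, and border pixels are simply copied unchanged to keep the example short.

```python
from statistics import median

def mean(values):
    return sum(values) / len(values)

def filter3x3(image, reducer):
    """Replace each interior pixel by reducer(3x3 neighborhood)."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]  # borders keep their original values
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [image[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = reducer(window)
    return out

# Uniform gray image with one outlier pixel in the middle.
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 255

mean_out = filter3x3(noisy, mean)      # linear: outlier is smeared over neighbors
median_out = filter3x3(noisy, median)  # non-linear: outlier is removed
```

The mean filter spreads the outlier into its neighbors, while the median discards it entirely. This is why median filtering is a common choice for impulse noise.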
Analysis
Pixel stats, region properties, low-level descriptors
Corner detection
High local intensity variation → Harris, Shi-Tomasi
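
A minimal sketch of the Harris corner response, R = det(M) - k * trace(M)^2, where M is the structure tensor summed over a local window. The simplifications here are assumptions for brevity: central-difference gradients, an unweighted 3x3 window instead of a Gaussian, and k = 0.04.

```python
def harris_response(image, k=0.04):
    """Return the Harris response map; border pixels are left at 0."""
    h, w = len(image), len(image[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (image[y][x + 1] - image[y][x - 1]) / 2.0  # horizontal gradient
            iy[y][x] = (image[y + 1][x] - image[y - 1][x]) / 2.0  # vertical gradient
    response = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            sxx = syy = sxy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            response[y][x] = det - k * trace * trace
    return response

# 8x8 image: bright lower-right quadrant, so (row 4, col 4) is a corner.
img = [[100 if (y >= 4 and x >= 4) else 0 for x in range(8)] for y in range(8)]
R = harris_response(img)
```

On this test image the response is positive at the corner, negative along the straight edge, and zero in the flat region. The Shi-Tomasi variant scores the same matrix M by its smaller eigenvalue instead.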
Edge detection
Strong intensity gradients at boundaries → Sobel, Canny
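
The Sobel operator estimates the intensity gradient with two 3x3 kernels, and the gradient magnitude peaks at edges. A minimal sketch in plain Python, with border pixels left at zero for simplicity (an assumption of this example):

```python
import math

# Standard Sobel kernels for horizontal (KX) and vertical (KY) derivatives.
KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_magnitude(image):
    """Return the gradient magnitude sqrt(gx^2 + gy^2) at each interior pixel."""
    h, w = len(image), len(image[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(KX[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(KY[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            mag[y][x] = math.hypot(gx, gy)
    return mag

# Vertical step edge between columns 2 and 3.
step = [[0, 0, 0, 100, 100, 100] for _ in range(5)]
edges = sobel_magnitude(step)
```

The magnitude is large only in the two columns next to the step and zero in the flat regions. Canny builds on gradients like these, adding smoothing, non-maximum suppression, and hysteresis thresholding.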
Feature matching
Local descriptors for correspondence across images → SIFT, SURF, ORB
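
Once descriptors have been computed, correspondence is often found by nearest-neighbor search. The sketch below is a brute-force matcher with a ratio test, which rejects a match when the best and second-best candidates are nearly equally close. The ratio test, the 0.75 threshold, and the tiny 3-D descriptors are assumptions for illustration; real SIFT, SURF, or ORB descriptors are far longer, and ORB's binary descriptors are compared with Hamming distance.

```python
import math

def match_descriptors(desc_a, desc_b, ratio=0.75):
    """Return (index_a, index_b) pairs that pass the ratio test.

    Requires at least two descriptors in desc_b.
    """
    matches = []
    for i, da in enumerate(desc_a):
        # Distances from this descriptor to every candidate, nearest first.
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        best, second = dists[0], dists[1]
        if best[0] < ratio * second[0]:  # keep only unambiguous matches
            matches.append((i, best[1]))
    return matches

desc_a = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.5, 0.0, 0.5]]
desc_b = [[1.0, 0.1, 0.0], [0.0, 0.1, 1.0]]
pairs = match_descriptors(desc_a, desc_b)
```

The third descriptor in `desc_a` is equally far from both candidates, so it is discarded as ambiguous.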
Figures Extracted from the Original Lecture Document
These figures are preserved in their original document order as a complete visual reference. Captions identify the source part and figure number; explanatory text remains in the study-guide sections.
Lecture 1 — original figure 1
Lecture 1 — original figure 2
Lecture 1 — original figure 3
Lecture 1 — original figure 4