Lectures 7 & 8: Feature and Corner Detection
Moravec SSD Function
Measures the intensity change when a window $W$ centered at pixel $(m, n)$ is shifted by a discrete offset $(x, y)$.
\[E_{m,n}(x,y) = \sum_{(u,v) \in W} \big[I(m+u, n+v) - I(m+x+u, n+y+v)\big]^2\]
Moravec Corner Response
Takes the minimum Sum of Squared Differences (SSD) over the set $D$ of evaluated shift directions, so a pixel scores high only if every shift changes the window.
\[F_{m,n} = \min_{(x,y) \in D} E_{m,n}(x,y)\]
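A minimal sketch of the two Moravec formulas above, in plain Python. The 8-neighbor shift set, the $3 \times 3$ window, the `img[row][col]` indexing, and the toy image are assumptions made for illustration, not part of the lecture material.

```python
# Shift set D: the 8 unit shifts (an assumed choice; some variants use only 4).
SHIFTS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def moravec_ssd(img, m, n, x, y, half=1):
    """E_{m,n}(x, y): SSD between the window at (m, n) and its copy shifted by (x, y)."""
    total = 0
    for u in range(-half, half + 1):
        for v in range(-half, half + 1):
            diff = img[m + u][n + v] - img[m + x + u][n + y + v]
            total += diff * diff
    return total

def moravec_response(img, m, n, half=1):
    """F_{m,n}: minimum SSD over all shifts in D."""
    return min(moravec_ssd(img, m, n, x, y, half) for x, y in SHIFTS)

# Toy image: dark background with a bright square in the lower-right quadrant.
img = [[1 if i >= 5 and j >= 5 else 0 for j in range(10)] for i in range(10)]
flat = moravec_response(img, 2, 2)     # inside the dark region
edge = moravec_response(img, 7, 5)     # on the left edge of the square
corner = moravec_response(img, 5, 5)   # top-left corner of the square
```

Of the three sampled pixels, only the corner gets a nonzero response: in the flat region every shift leaves the window unchanged, and on the edge a shift along the edge does.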
Image Gradients
Measures horizontal and vertical intensity changes per pixel.
\[I_x = \dfrac{\partial I}{\partial x}, \qquad I_y = \dfrac{\partial I}{\partial y}\]
Gradient Vector & Direction
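Gives the direction of the strongest local intensity change as the angle $\theta_g$ of the gradient vector; the strength of that change is the gradient magnitude $\lVert \nabla f \rVert$.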
\[\nabla f = \left(\dfrac{\partial f}{\partial x}, \dfrac{\partial f}{\partial y}\right), \qquad \theta_g = \text{atan2}\!\left(\dfrac{\partial f}{\partial y}, \dfrac{\partial f}{\partial x}\right)\]
Structure Tensor Matrix ($M$)
Summarizes local intensity variation using gradients in a neighborhood weighted by $w(x, y)$.
\[M = \sum_{(x,y) \in W} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}\]
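As a rough illustration, the structure tensor can be accumulated over a small window as sketched here. Central-difference gradients, uniform weights $w(x, y) = 1$, the $3 \times 3$ window, and the convention that $x$ is the column index are all assumptions of this sketch.

```python
def structure_tensor(img, m, n, half=1):
    """Return (A, B, C) such that M = [[A, B], [B, C]] for the window centered at (m, n)."""
    a = b = c = 0.0
    for i in range(m - half, m + half + 1):
        for j in range(n - half, n + half + 1):
            ix = (img[i][j + 1] - img[i][j - 1]) / 2.0  # I_x: horizontal change
            iy = (img[i + 1][j] - img[i - 1][j]) / 2.0  # I_y: vertical change
            a += ix * ix   # sum of I_x^2
            b += ix * iy   # sum of I_x I_y
            c += iy * iy   # sum of I_y^2
    return a, b, c

# Toy image: dark background with a bright square in the lower-right quadrant.
img = [[1 if i >= 5 and j >= 5 else 0 for j in range(10)] for i in range(10)]
m_flat = structure_tensor(img, 2, 2)
m_edge = structure_tensor(img, 7, 5)
m_corner = structure_tensor(img, 5, 5)
```

In this toy case the flat window gives an all-zero tensor, the edge window has gradient energy in only one direction, and the corner window has it in both.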
Harris SSD Taylor Approximation
Approximates the SSD for a small window shift, here written $(u, v)$, by a quadratic form in the structure tensor $M$, using a first-order Taylor expansion of the image.
\[E(u,v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}\]
Eigenvector & Characteristic Equations
Used to explicitly compute eigenvalues $\lambda_1, \lambda_2$ from the structure tensor matrix.
\[(M - \lambda I)e = 0, \qquad \det(M - \lambda I) = 0\]
Determinant & Trace
Both are computed directly from the entries of the structure tensor, so the characteristic quadratic never has to be solved for $\lambda_1, \lambda_2$.
\[\det(M) = \lambda_1 \lambda_2, \qquad \text{trace}(M) = \lambda_1 + \lambda_2\]
Harris Response Function
Scores corner strength using a sensitivity coefficient $\alpha$ (typically $0.04$ to $0.06$): $R$ is large and positive at corners, negative along edges, and near zero in flat regions.
\[R = \det(M) - \alpha \cdot \text{trace}(M)^2\]
Shi-Tomasi Response
Directly selects the smaller eigenvalue as the corner strength metric.
\[R = \min(\lambda_1, \lambda_2)\]
Noble Response
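An alternative response function that needs no sensitivity coefficient $\alpha$; the small constant $\epsilon$ keeps the division numerically stable where $\text{trace}(M)$ is near zero.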
\[R = \dfrac{\det(M)}{\text{trace}(M) + \epsilon}\]
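The three response functions above can be compared on the same tensor with a short sketch like this one. The function names, the default $\alpha = 0.05$ and $\epsilon = 10^{-8}$, and the two example tensors are illustrative assumptions.

```python
import math

def eigenvalues_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], largest first."""
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return mean + radius, mean - radius

def corner_responses(a, b, c, alpha=0.05, eps=1e-8):
    """Return (Harris, Shi-Tomasi, Noble) responses for M = [[a, b], [b, c]]."""
    det = a * c - b * b          # lambda_1 * lambda_2
    trace = a + c                # lambda_1 + lambda_2
    harris = det - alpha * trace ** 2
    shi_tomasi = eigenvalues_2x2(a, b, c)[1]   # the smaller eigenvalue
    noble = det / (trace + eps)
    return harris, shi_tomasi, noble

# Example tensors (assumed values): one corner-like, one edge-like.
corner_like = corner_responses(1.0, 0.25, 1.0)
edge_like = corner_responses(1.5, 0.0, 0.0)
```

For the edge-like tensor one eigenvalue is zero, so the Harris response is negative while the Shi-Tomasi and Noble responses are zero.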
Lecture 9: Feature Descriptors (SIFT)
SIFT Descriptor Vector Dimensions
Formed by combining $4 \times 4 = 16$ spatial grid cells, each storing an 8-bin gradient orientation histogram.
\[16 \text{ cells} \times 8 \text{ bins} = 128 \text{ dimensions}\]
Lecture 10 I: Hough Transform
Slope-Intercept Representation
Each edge point $(x_i, y_i)$ maps to a line in $(m, b)$ Hough space: for every candidate slope $m$, the intercept is
\[b = -x_i \cdot m + y_i\]
Polar Line Representation
Normal (polar) form with bounded parameters $(\rho, \theta)$, which avoids the infinite slope that vertical lines have in slope-intercept form.
\[\rho = x\cos\theta + y\sin\theta\]
Accumulator Bin Indexing
Maps continuous line parameters into discrete parameter bins.
\[\rho_i = \text{round}\!\left(\dfrac{\rho}{\Delta\rho}\right), \qquad \theta_j = \text{round}\!\left(\dfrac{\theta}{\Delta\theta}\right)\]
Accumulator Increment
Adds one vote to accumulator cell $(\rho_i, \theta_j)$ in parameter space.
\[H(\rho_i, \theta_j) \leftarrow H(\rho_i, \theta_j) + 1\]
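The polar representation, bin indexing, and increment steps above fit together roughly as follows. The dictionary accumulator, the bin sizes, and sampling $\theta$ over $[0, \pi)$ are assumptions of this sketch.

```python
import math

def hough_lines(points, d_rho=1.0, n_theta=180):
    """Vote in a (rho, theta) accumulator stored as {(rho_i, theta_j): votes}."""
    d_theta = math.pi / n_theta
    acc = {}
    for x, y in points:
        for j in range(n_theta):                 # theta_j is the bin index of theta
            theta = j * d_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            i = round(rho / d_rho)               # rho_i: discrete bin for rho
            acc[(i, j)] = acc.get((i, j), 0) + 1
    return acc

# Ten points on the vertical line x = 3, which has no finite slope.
points = [(3, y) for y in range(10)]
acc = hough_lines(points)
votes_vertical = acc[(3, 0)]   # bin for rho = 3, theta = 0
```

All ten points vote for the bin at $\rho = 3$, $\theta = 0$, which is the vertical line that slope-intercept form cannot represent. Neighboring bins may collect similar counts, so practical detectors usually apply peak finding on top of this.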
Circle Representation
Base analytical equation for circle boundary points.
\[(x-a)^2 + (y-b)^2 = r^2\]
Gradient-Guided Circle Centers
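For a known radius $r$, restricts voting to the two candidate centers along the gradient direction $\theta_g$ at edge point $(x, y)$, instead of a full circle of candidates, which speeds up computation.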
\[a = x \pm r\cos\theta_g, \qquad b = y \pm r\sin\theta_g\]
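A small sketch of gradient-guided center voting for a single known radius. The input format `(x, y, theta_g)`, rounding to integer cells, and the synthetic test circle are assumptions for illustration.

```python
import math

def vote_circle_centers(edge_points, r):
    """edge_points: iterable of (x, y, theta_g). Returns {(a, b): votes}."""
    acc = {}
    for x, y, theta_g in edge_points:
        for sign in (+1, -1):   # the +/- in the formula: center may lie on either side
            a = round(x + sign * r * math.cos(theta_g))
            b = round(y + sign * r * math.sin(theta_g))
            acc[(a, b)] = acc.get((a, b), 0) + 1
    return acc

# Synthetic circle: center (10, 10), radius 5, gradient pointing along the radius.
angles = [k * math.pi / 4 for k in range(8)]
edge_points = [(10 + 5 * math.cos(t), 10 + 5 * math.sin(t), t) for t in angles]
acc = vote_circle_centers(edge_points, r=5)
best_center = max(acc, key=acc.get)
```

Each edge point casts only two votes, and the votes on the inner side all land on the true center.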
Lecture 10 II: K-Means, BoVW & SVM
K-Means Objective (SSD)
Sum of squared distances between each data point and the centroid $c_i$ of its cluster, over all $K$ clusters; K-Means minimizes this quantity.
\[\text{SSD} = \sum_{i=1}^{K} \sum_{p_j \in c_i} \lVert p_j - c_i \rVert^2\]
K-Means Point Assignment
Assigns a point $p$ to the nearest cluster centroid $c_i$.
\[\lVert p - c_i \rVert < \lVert p - c_j \rVert \quad \text{for all } j \neq i\]
K-Means Cluster Center Update
Recomputes the cluster centroid as the average of its $N_i$ assigned points.
\[c_i = \dfrac{1}{N_i} \sum_{p_j \in c_i} p_j\]
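The assignment and update steps above alternate until the centroids stop moving. The sketch below runs a fixed number of iterations instead; that choice, the tie-breaking rule, and the handling of empty clusters are assumptions.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centroids, iterations=10):
    """Return (centroids, clusters, ssd) after a fixed number of iterations."""
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (ties go to the lower index).
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its points (kept as-is if empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    ssd = sum(sq_dist(p, c) for cluster, c in zip(clusters, centroids) for p in cluster)
    return centroids, clusters, ssd

points = [(0, 0), (0, 2), (10, 0), (10, 2)]
centroids, clusters, ssd = kmeans(points, centroids=[(0, 0), (10, 0)])
```

With these four points the centroids settle midway between each pair, and every point contributes a squared distance of 1 to the objective.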
Term Frequency (TF)
Measures the proportional occurrence frequency of visual word $v_i$ inside an image.
\[\text{TF}(v_i) = \dfrac{\text{count}(v_i \text{ in image})}{\text{total visual words}}\]
Inverse Document Frequency (IDF)
Measures the rarity of visual word $v_i$ across a dataset of $N$ images, where $df_i$ is the number of images that contain $v_i$.
\[\text{IDF}(v_i) = \log\!\left(\dfrac{N}{df_i}\right)\]
TF-IDF Weighting
Combines word frequency and document rarity to weight features.
\[\text{TF-IDF}(v_i) = \text{TF}(v_i) \times \text{IDF}(v_i)\]
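A compact sketch of TF, IDF, and their product for bag-of-visual-words histograms. Representing each image as a list of visual-word ids and using the natural logarithm are assumptions; the formulas above do not fix the log base.

```python
import math
from collections import Counter

def tf_idf(images):
    """images: list of lists of visual-word ids. Returns one {word: weight} dict per image."""
    n = len(images)
    # df[w]: number of images in which word w appears at least once.
    df = Counter(word for words in images for word in set(words))
    weights = []
    for words in images:
        counts = Counter(words)
        total = len(words)
        weights.append({
            w: (count / total) * math.log(n / df[w])   # TF * IDF
            for w, count in counts.items()
        })
    return weights

images = [[0, 0, 1], [0, 2], [0, 1, 1, 3]]
weights = tf_idf(images)
```

Word `0` appears in every image, so its IDF and therefore its weight is zero everywhere, while rarer words keep a positive weight.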
SVM Decision Function
Predicts the class label from the sign of a linear score in feature space, with weight vector $w$, bias $b$, and feature map $\varphi$.
\[f(x) = \text{sign}\big(w^T \varphi(x) + b\big)\]
Lecture 11: Deep Learning in Computer Vision
CNN Output Spatial Size (without padding)
Computes output dimension for input size $N$, filter size $F$, and stride $S$.
\[\text{Output size} = \left\lfloor \dfrac{N-F}{S} \right\rfloor + 1\]
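The formula is easy to check numerically. The helper name `conv_output_size` and the sample sizes are illustrative.

```python
def conv_output_size(n, f, s):
    """Output spatial size for input size n, filter size f, stride s, no padding."""
    return (n - f) // s + 1   # integer division implements the floor

size_a = conv_output_size(32, 5, 1)   # 32x32 input, 5x5 filter, stride 1
size_b = conv_output_size(7, 3, 2)    # 7x7 input, 3x3 filter, stride 2
size_c = conv_output_size(7, 3, 3)    # stride 3 does not divide evenly; floor drops the remainder
```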
Sigmoid Activation
Squashes inputs to the range $(0, 1)$, so outputs can be read as probabilities.
\[\sigma(x) = \dfrac{1}{1 + e^{-x}}\]
tanh Activation
Outputs range $(-1, 1)$, centered at zero.
\[\tanh(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}\]
ReLU Activation
Keeps positive values and zeroes out negative ones.
\[\max(0, x)\]
Leaky ReLU Activation
Keeps a small slope of $0.1$ for negative inputs instead of zeroing them out.
\[\max(0.1x, x)\]
Maxout Activation
Evaluates the maximum of two parallel linear transformations of the input.
\[\max(w_1^T x + b_1,\ w_2^T x + b_2)\]
ELU Activation
Smooth curve for negative inputs based on hyperparameter $\alpha$.
\[\begin{cases} x & \text{if } x \ge 0 \\ \alpha(e^x - 1) & \text{if } x < 0 \end{cases}\]
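The activation functions above, written as scalar Python functions for quick experimentation. The function names and the default $\alpha = 1.0$ for ELU are assumptions; the leaky slope of $0.1$ follows the formula above.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def relu(x):
    return max(0.0, x)

def leaky_relu(x, slope=0.1):
    return max(slope * x, x)

def maxout(x, w1, b1, w2, b2):
    """x, w1, w2 are equal-length vectors; returns the larger of the two linear scores."""
    z1 = sum(w * xi for w, xi in zip(w1, x)) + b1
    z2 = sum(w * xi for w, xi in zip(w2, x)) + b2
    return max(z1, z2)

def elu(x, alpha=1.0):
    return x if x >= 0 else alpha * (math.exp(x) - 1.0)
```

For example, `relu(-2.0)` is zero while `leaky_relu(-2.0)` and `elu(-2.0)` stay negative, which is what lets gradients flow for negative inputs.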
Categorical Cross-Entropy Loss
Computes negative log probability of the true target class.
\[L = -\log(P_{\text{true}})\]
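A minimal sketch of the loss for a single sample. Obtaining the class probabilities through a softmax over raw scores is an assumption added here for completeness; the formula above only requires the probability of the true class.

```python
import math

def softmax(scores):
    """Convert raw class scores to probabilities (assumed preprocessing step)."""
    shift = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_class):
    """L = -log(P_true) for one sample."""
    return -math.log(probs[true_class])

loss_confident = cross_entropy([0.05, 0.9, 0.05], true_class=1)
loss_unsure = cross_entropy([0.25, 0.5, 0.25], true_class=1)
```

The loss shrinks toward zero as the predicted probability of the true class approaches 1, and grows without bound as it approaches 0.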
Lecture 12: Evaluation Metrics
Accuracy
Overall fraction of correct predictions.
\[\text{Accuracy} = \dfrac{TP+TN}{TP+TN+FP+FN}\]
Precision
Measures prediction purity (what fraction of predicted positives are actually positive).
\[\text{Precision} = \dfrac{TP}{TP+FP}\]
Recall
Measures detection completeness (what fraction of actual positives are found).
\[\text{Recall} = \dfrac{TP}{TP+FN}\]
F1-Score
Harmonic mean balancing precision and recall.
\[F_1 = \dfrac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision}+\text{Recall}} = \dfrac{2TP}{2TP+FP+FN}\]
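The four metrics above can be computed from confusion-matrix counts as sketched below. Returning 0 when a denominator is zero is a convention assumed here, and the example counts are made up.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else 0.0
    return accuracy, precision, recall, f1

# Imbalanced example (made-up counts): few positives, many negatives.
accuracy, precision, recall, f1 = classification_metrics(tp=8, tn=80, fp=2, fn=10)
```

In this example accuracy looks high because negatives dominate, while recall shows that more than half of the actual positives were missed.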
Intersection over Union (IoU)
Measures bounding box or pixel mask overlap.
\[\text{IoU} = \dfrac{\text{Intersection}}{\text{Union}}\]
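For axis-aligned bounding boxes, IoU reduces to a few min/max operations. The `(x1, y1, x2, y2)` corner format is an assumption of this sketch.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

iou_partial = box_iou((0, 0, 2, 2), (1, 1, 3, 3))    # overlap area 1, union area 7
iou_disjoint = box_iou((0, 0, 1, 1), (2, 2, 3, 3))
iou_same = box_iou((0, 0, 2, 2), (0, 0, 2, 2))
```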
mean Average Precision (mAP)
Mean over all $n$ classes of the per-class Average Precision $AP_k$, the area under that class's precision-recall curve.
\[\text{mAP} = \dfrac{AP_1+AP_2+\dots+AP_n}{n}\]
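One way to sketch mAP in code is shown below. Benchmarks differ in how they compute per-class AP (for example, interpolated versus non-interpolated curves); this version assumes the non-interpolated form, averaging the precision at each rank where a true positive occurs. The input format is also an assumption.

```python
def average_precision(ranked_hits, n_positives):
    """ranked_hits: 1/0 flags for detections sorted by descending confidence.

    n_positives: number of ground-truth positives for the class.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_hit in enumerate(ranked_hits, start=1):
        if is_hit:
            hits += 1
            precision_sum += hits / rank   # precision at this recall step
    return precision_sum / n_positives if n_positives else 0.0

def mean_average_precision(per_class):
    """per_class: list of (ranked_hits, n_positives) pairs, one per class."""
    aps = [average_precision(hits, n) for hits, n in per_class]
    return sum(aps) / len(aps)

ap_1 = average_precision([1, 0, 1], n_positives=2)
ap_2 = average_precision([0, 1], n_positives=1)
m_ap = mean_average_precision([([1, 0, 1], 2), ([0, 1], 1)])
```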