Multi-Class Classification
Introduction to Multi-Class Linear Classification
Rather than decomposing a multi-class problem into multiple binary classification tasks, we can construct a unified model that handles all $K$ classes simultaneously. This approach yields a more elegant formulation and possesses desirable theoretical properties, including the natural emergence of convex decision regions.
Comparison: Binary vs Multi-Class Classification
Fundamental Difference for 2 Classes:
Binary Classification Approach:
• Uses a SINGLE weight vector $\mathbf{w}$
• One discriminant: $y(\mathbf{x}) = \mathbf{w}^\top\mathbf{x} + b$
• Decision: Class 1 if $y(\mathbf{x}) > 0$, Class 2 if $y(\mathbf{x}) < 0$
Multi-Class Approach (even for K=2):
• Uses TWO weight vectors: $\mathbf{w}_1$ and $\mathbf{w}_2$
• Two discriminants: $y_1(\mathbf{x}) = \mathbf{w}_1^\top\mathbf{x} + b_1$ and $y_2(\mathbf{x}) = \mathbf{w}_2^\top\mathbf{x} + b_2$
• Decision: Class 1 if $y_1(\mathbf{x}) > y_2(\mathbf{x})$, Class 2 if $y_2(\mathbf{x}) > y_1(\mathbf{x})$
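The equivalence of the two formulations for $K=2$ is easy to check numerically: comparing $y_1(\mathbf{x}) > y_2(\mathbf{x})$ is the same as testing the sign of a single discriminant built from $\mathbf{w} = \mathbf{w}_1 - \mathbf{w}_2$ and $b = b_1 - b_2$. A minimal sketch (the weight values are arbitrary assumptions, not taken from the text):

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration.
w1, b1 = np.array([2.0, 1.0]), 0.5
w2, b2 = np.array([-1.0, 2.0]), 1.0

def predict_multiclass(x):
    """Winner-takes-all over two discriminants."""
    y1 = w1 @ x + b1
    y2 = w2 @ x + b2
    return 1 if y1 > y2 else 2

def predict_binary(x):
    """Equivalent single-discriminant rule: w = w1 - w2, b = b1 - b2."""
    w, b = w1 - w2, b1 - b2
    return 1 if w @ x + b > 0 else 2

# The two rules agree everywhere.
rng = np.random.default_rng(0)
for x in rng.normal(size=(100, 2)):
    assert predict_multiclass(x) == predict_binary(x)
```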
Why This Changes the Interpretation:
In Binary: The single $\mathbf{w}$ is perpendicular to the decision boundary and points from Class 2 toward Class 1. The boundary is defined by $\mathbf{w}^\top\mathbf{x} + b = 0$.
In Multi-Class: Each $\mathbf{w}_k$ represents the “direction of preference” for class $C_k$. Neither $\mathbf{w}_1$ nor $\mathbf{w}_2$ is perpendicular to the boundary! Instead:
• $\mathbf{w}_1$ points in the direction where $y_1$ increases (favoring Class 1)
• $\mathbf{w}_2$ points in the direction where $y_2$ increases (favoring Class 2)
• The boundary between classes occurs where $y_1(\mathbf{x}) = y_2(\mathbf{x})$
• The normal to this boundary is $(\mathbf{w}_1 - \mathbf{w}_2)$, NOT $\mathbf{w}_1$ or $\mathbf{w}_2$ alone
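The claim that $(\mathbf{w}_1 - \mathbf{w}_2)$, rather than either weight vector alone, is normal to the boundary can be verified numerically: take two points where $y_1 = y_2$ and check that the displacement between them is orthogonal to the difference vector but not to $\mathbf{w}_1$ itself. A sketch with assumed weights:

```python
import numpy as np

# Hypothetical weights, for illustration only.
w1, b1 = np.array([2.0, 1.0]), 0.5
w2, b2 = np.array([-1.0, 2.0]), 1.0

n = w1 - w2                   # candidate normal to the boundary
c = b1 - b2
x0 = -c * n / (n @ n)         # a point with y1(x0) = y2(x0)
t = np.array([-n[1], n[0]])   # direction orthogonal to n
xa, xb = x0 + 1.7 * t, x0 - 0.4 * t  # two points on the boundary

# Both points lie on the boundary: the discriminants agree there ...
for x in (xa, xb):
    assert np.isclose(w1 @ x + b1, w2 @ x + b2)

# ... the displacement along the boundary is orthogonal to w1 - w2,
# but NOT to w1 alone.
assert np.isclose(n @ (xa - xb), 0.0)
assert not np.isclose(w1 @ (xa - xb), 0.0)
```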
Geometric Properties of Decision Boundaries: The decision boundary separating classes $C_k$ and $C_j$ corresponds to the hyperplane where the two discriminant functions are equal, i.e., where $y_k(\mathbf{x}) = y_j(\mathbf{x})$. This can be expressed in standard hyperplane form as:
$(\mathbf{w}_k - \mathbf{w}_j)^\top \mathbf{x} + (b_k - b_j) = 0$
The decision region associated with each class is defined as the intersection of $K-1$ half-spaces, thereby forming a convex polyhedron. This constitutes a significant advantage over alternative approaches such as one-versus-rest or one-versus-one decomposition: the multi-class discriminant formulation inherently produces convex, singly-connected decision regions, ensuring a geometrically well-behaved partition of the input space.
Mathematical Setup
Understanding the Weight Vectors
The weight vector $\mathbf{w}_1$ indicates the direction in which the discriminant $y_1$ increases most rapidly, rather than directly specifying the location of class $C_1$ in the input space.
Motion along the direction of $\mathbf{w}_1$ increases the value of $y_1$, but the classification decision depends on the relative comparison between $y_1$ and $y_2$, not on the absolute value of either discriminant.
The difference vector $(\mathbf{w}_1 - \mathbf{w}_2)$ represents the normal to the decision boundary, oriented from region $C_2$ toward region $C_1$.
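The distinction between "increasing $y_1$" and "being classified as $C_1$" can be made concrete: below, a point deep inside the $C_2$ region is moved a small step along $\mathbf{w}_1$, which raises $y_1$ without changing the predicted class (the weights are illustrative assumptions, not values from the text):

```python
import numpy as np

# Hypothetical weights, for illustration.
w1, b1 = np.array([2.0, 1.0]), 0.0
w2, b2 = np.array([-1.0, 2.0]), 1.0

def discriminants(x):
    return np.array([w1 @ x + b1, w2 @ x + b2])

x = np.array([-2.0, 4.0])                  # a point well inside region C2
step = x + 0.1 * w1 / np.linalg.norm(w1)   # small move along w1

y_before, y_after = discriminants(x), discriminants(step)
assert y_after[0] > y_before[0]   # y1 increased, as expected ...
assert np.argmax(y_before) == 1   # ... yet the point is classified C2
assert np.argmax(y_after) == 1    # both before and after the move
```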
Example (two-class setup): $\mathbf{w}_2 = [-1, 2]$, $b_2 = 1$; predict $C_2$ wherever $y_2(\mathbf{x}) > y_1(\mathbf{x})$.
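A runnable version of this decision rule, using the $\mathbf{w}_2$, $b_2$ values above. Since $\mathbf{w}_1$ and $b_1$ are not specified here, the values below are assumptions for illustration only:

```python
import numpy as np

# w2 and b2 are taken from the example above.
w2, b2 = np.array([-1.0, 2.0]), 1.0
# w1 and b1 are NOT given in the text; assumed values for illustration.
w1, b1 = np.array([2.0, 1.0]), 0.0

def predict(x):
    """Predict C2 exactly when y2(x) > y1(x)."""
    return 2 if (w2 @ x + b2) > (w1 @ x + b1) else 1

assert predict(np.array([0.0, 3.0])) == 2   # y1 = 3, y2 = 7
assert predict(np.array([3.0, 0.0])) == 1   # y1 = 6, y2 = -2
```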
Three Classes (K=3)
Three-Class Setup
$\mathbf{w}_1 = [1.5, 0.5]$ (implied by the pairwise normals below)
$\mathbf{w}_2 = [-0.8, 1.8]$, $b_2 = -0.5$
$\mathbf{w}_3 = [-0.7, -2.0]$, $b_3 = 3.5$
$C_1$ vs $C_2$: Where $y_1(\mathbf{x}) = y_2(\mathbf{x})$
Normal: $\mathbf{w}_1 - \mathbf{w}_2 = [2.3, -1.3]$
$C_1$ vs $C_3$: Where $y_1(\mathbf{x}) = y_3(\mathbf{x})$
Normal: $\mathbf{w}_1 - \mathbf{w}_3 = [2.2, 2.5]$
$C_2$ vs $C_3$: Where $y_2(\mathbf{x}) = y_3(\mathbf{x})$
Normal: $\mathbf{w}_2 - \mathbf{w}_3 = [-0.1, 3.8]$
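These normals can be verified in a few lines of NumPy. The value of $\mathbf{w}_1$ is not listed explicitly with the other weights, but it is implied by the stated differences (e.g. $\mathbf{w}_2 + [2.3, -1.3] = [1.5, 0.5]$):

```python
import numpy as np

# w2 and w3 are from the three-class setup; w1 is inferred from the
# listed normals (w2 + (w1 - w2) = [1.5, 0.5]).
w1 = np.array([1.5, 0.5])
w2 = np.array([-0.8, 1.8])
w3 = np.array([-0.7, -2.0])

# Each pairwise boundary normal is a difference of weight vectors.
assert np.allclose(w1 - w2, [2.3, -1.3])   # C1 vs C2
assert np.allclose(w1 - w3, [2.2, 2.5])    # C1 vs C3
assert np.allclose(w2 - w3, [-0.1, 3.8])   # C2 vs C3

# The three normals are linearly dependent:
# (w1 - w3) = (w1 - w2) + (w2 - w3).
assert np.allclose((w1 - w2) + (w2 - w3), w1 - w3)
```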
Convex Decision Regions in Multi-Class Linear Classification
For a classification problem involving $K$ classes with affine discriminant functions $y_k(\mathbf{x}) = \mathbf{w}_k^\top \mathbf{x} + b_k$, the decision rule assigns each point $\mathbf{x}$ to the class $C_i$ for which $y_i(\mathbf{x}) > y_j(\mathbf{x})$ holds for all $j \neq i$. This “winner-takes-all” strategy partitions the feature space $\mathbb{R}^D$ into $K$ convex regions.
Theorem (Convexity): The decision region $R_k = \{\mathbf{x} : y_k(\mathbf{x}) \geq y_j(\mathbf{x}) \text{ for all } j \neq k\}$ corresponding to any class $C_k$ forms a convex polyhedron. This result follows from the observation that $R_k$ can be expressed as the intersection of $K-1$ half-spaces, and since the intersection of convex sets is convex, $R_k$ must be convex.
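The convexity result can also be probed empirically: sample points, keep those assigned to one region, and confirm that the midpoint of any pair never leaves that region. The sketch below reuses the weights from the three-class example; $b_1$ is not given there, so $b_1 = 0$ is an assumption:

```python
import numpy as np

# Rows: w1, w2, w3 from the three-class example; b1 = 0 is assumed.
W = np.array([[1.5, 0.5], [-0.8, 1.8], [-0.7, -2.0]])
b = np.array([0.0, -0.5, 3.5])

def region(x):
    """Index of the class whose discriminant dominates at x."""
    return int(np.argmax(W @ x + b))

rng = np.random.default_rng(42)
pts = rng.uniform(-10, 10, size=(2000, 2))
in_r0 = [x for x in pts if region(x) == 0]

# Convexity check: every midpoint of two points in R_0 stays in R_0.
for xa in in_r0[:50]:
    for xb in in_r0[:50]:
        assert region(0.5 * (xa + xb)) == 0
```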
Decision Boundaries: The boundary separating classes $C_i$ and $C_j$ is characterized by the hyperplane where $y_i(\mathbf{x}) = y_j(\mathbf{x})$, or equivalently $(\mathbf{w}_i - \mathbf{w}_j)^\top \mathbf{x} + (b_i - b_j) = 0$. The normal vector to this hyperplane is $(\mathbf{w}_i - \mathbf{w}_j)$, oriented from region $C_j$ toward region $C_i$.
Mathematical Property (Concurrency): In the three-class setting, the boundaries defined by $y_1=y_2$, $y_1=y_3$, and $y_2=y_3$ are linearly dependent, as the relation $(y_1-y_3) = (y_1-y_2) + (y_2-y_3)$ holds identically. Consequently, these three hyperplanes must intersect at a single point (under non-degenerate conditions), which is the unique solution to the system:
$\begin{bmatrix} (\mathbf{w}_1-\mathbf{w}_2)^\top \\ (\mathbf{w}_1-\mathbf{w}_3)^\top \end{bmatrix} \mathbf{x} = -\begin{bmatrix} b_1-b_2 \\ b_1-b_3 \end{bmatrix}$

Class Regions: Each class occupies the region where its discriminant function dominates all others:
• $C_1$: $\{\mathbf{x} : y_1(\mathbf{x}) \geq y_2(\mathbf{x}) \text{ AND } y_1(\mathbf{x}) \geq y_3(\mathbf{x})\}$
• $C_2$: $\{\mathbf{x} : y_2(\mathbf{x}) \geq y_1(\mathbf{x}) \text{ AND } y_2(\mathbf{x}) \geq y_3(\mathbf{x})\}$
• $C_3$: $\{\mathbf{x} : y_3(\mathbf{x}) \geq y_1(\mathbf{x}) \text{ AND } y_3(\mathbf{x}) \geq y_2(\mathbf{x})\}$
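A sketch of the concurrency property, solving the $2 \times 2$ system above for the common intersection point with the three-class example weights ($b_1 = 0$ is assumed, as it is not given in the example):

```python
import numpy as np

# Weights from the three-class example; b1 = 0.0 is an assumption.
w1, b1 = np.array([1.5, 0.5]), 0.0
w2, b2 = np.array([-0.8, 1.8]), -0.5
w3, b3 = np.array([-0.7, -2.0]), 3.5

# Solve for the common point:
# [ (w1 - w2)^T ]       [ b1 - b2 ]
# [ (w1 - w3)^T ] x = - [ b1 - b3 ]
A = np.stack([w1 - w2, w1 - w3])
rhs = -np.array([b1 - b2, b1 - b3])
x_star = np.linalg.solve(A, rhs)

# All three discriminants agree at x_star, so the third boundary
# y2 = y3 necessarily passes through it as well.
ys = np.array([w1 @ x_star + b1, w2 @ x_star + b2, w3 @ x_star + b3])
assert np.allclose(ys, ys[0])
```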