Hierarchical Mixture of Gaussians

Hierarchical Mixture of Gaussians (HMoG) models.

This module provides concrete implementations of hierarchical Gaussian models that combine linear Gaussian dimensionality reduction with Gaussian mixture clustering, enabling joint learning of latent factor representations and cluster assignments.

Model structure: HMoG models have two levels:

Lower harmonium: Maps observations \(X \in \mathbb{R}^p\) to first-level latent factors \(Y \in \mathbb{R}^d\) using a linear Gaussian relationship (factor analysis/PCA)
Upper harmonium: Models a mixture of Gaussians over the latent space \(Y\)

The joint distribution factors as:

\[p(X, Y, Z) = p(Z) \cdot p(Y | Z) \cdot p(X | Y)\]

where \(Z \in \{1,\ldots,K\}\) are discrete cluster assignments.
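The factorization above corresponds to ancestral sampling: draw a cluster, then a latent factor, then an observation. The following is a minimal NumPy sketch of that process, not the library's API. The function `sample_hmog` and all of its argument names are hypothetical, the observation noise is assumed diagonal, and clusters are indexed from 0 rather than 1.

```python
import numpy as np

def sample_hmog(rng, n, weights, means, covs, loading, bias, noise_var):
    """Draw n samples from p(Z) p(Y|Z) p(X|Y) for a toy HMoG.

    All names are illustrative; they are not the library's parameterization.
    """
    # Z ~ Categorical(weights), indexed 0..K-1
    z = rng.choice(len(weights), size=n, p=weights)
    # Y | Z=k ~ N(means[k], covs[k])
    y = np.stack([rng.multivariate_normal(means[k], covs[k]) for k in z])
    # X | Y ~ N(loading @ Y + bias, diag(noise_var))
    noise = rng.normal(size=(n, loading.shape[0])) * np.sqrt(noise_var)
    x = y @ loading.T + bias + noise
    return x, y, z

rng = np.random.default_rng(0)
weights = np.array([0.5, 0.5])                 # p(Z), K = 2
means = np.array([[-2.0, 0.0], [2.0, 0.0]])    # component means in Y-space, d = 2
covs = np.stack([np.eye(2), np.eye(2)])        # component covariances
loading = rng.normal(size=(5, 2))              # p = 5 observed dimensions
bias = np.zeros(5)
noise_var = np.full(5, 0.1)

x, y, z = sample_hmog(rng, 100, weights, means, covs, loading, bias, noise_var)
```

Here `x` has shape `(100, 5)`, `y` has shape `(100, 2)`, and `z` holds one cluster index per sample.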

Variants: Three implementations with different analytical properties:

DifferentiableHMoG: Supports gradient-based optimization; uses a restricted (e.g., diagonal) posterior covariance for efficiency
SymmetricHMoG: Uses the same structure for posterior and prior and offers additional functionality such as join_conjugated, but is slower because it operates on full covariance matrices
AnalyticHMoG: Fully analytic; enables closed-form EM and bidirectional parameter conversion

Factory functions (differentiable_hmog, symmetric_hmog, analytic_hmog) provide convenient construction for common configurations.

Class Hierarchy

Core Classes

class DifferentiableHMoG(lwr_hrm: LowerHarmonium, pst_upr_hrm: PstUpperHarmonium, prr_upr_hrm: PrrUpperHarmonium)

Bases: DifferentiableHierarchical[NormalLGM, AnalyticMixture[Normal], Mixture[FullNormal]], Generic

Differentiable Hierarchical Mixture of Gaussians.

This model combines:

1. A linear Gaussian model (factor analysis) mapping observations to latents
2. A Gaussian mixture model over the latent space

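Supports gradient-based optimization of the log-likelihood (gradient ascent on the log-likelihood, or equivalently descent on its negative). The prior over the latent space uses full covariance Gaussians, while the posterior uses a restricted covariance structure.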

Posterior vs Prior Structure: The posterior latent mixture (pst_upr_hrm) uses an AnalyticMixture with a restricted covariance structure for computational efficiency. The prior latent mixture (prr_upr_hrm) embeds the restricted structure into full covariance for conjugation parameter computation.

whiten_prior(means: Array) → Array

Reparameterize the latent Y-space to have zero mean and identity covariance.

Preserves p(x) by updating both:

- The lower LGM interaction (loading matrix + observable bias adjustment)
- Each GMM component (via the existing Normal.whiten relative to the GMM marginal)
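To illustrate why whitening can leave p(x) unchanged, here is a minimal NumPy sketch in ordinary mean/covariance terms. It does not use the library's mean coordinates, and the functions `mixture_moments` and `whiten_latent` are hypothetical names. It assumes the marginal covariance S of the latent mixture is positive definite with Cholesky factor L, and substitutes Y = L Y' + m into the linear Gaussian model.

```python
import numpy as np

def mixture_moments(weights, means, covs):
    """Mean and covariance of the marginal of a Gaussian mixture."""
    mean = weights @ means
    second = (np.einsum("k,kij->ij", weights, covs)
              + np.einsum("k,ki,kj->ij", weights, means, means))
    return mean, second - np.outer(mean, mean)

def whiten_latent(weights, means, covs, loading, bias):
    """Reparameterize Y so the mixture marginal has zero mean, identity covariance."""
    m, s = mixture_moments(weights, means, covs)
    chol = np.linalg.cholesky(s)           # s = chol @ chol.T
    inv = np.linalg.inv(chol)
    new_means = (means - m) @ inv.T        # mu_k' = L^{-1} (mu_k - m)
    new_covs = inv @ covs @ inv.T          # Sigma_k' = L^{-1} Sigma_k L^{-T}
    new_loading = loading @ chol           # W' = W L
    new_bias = bias + loading @ m          # b' = b + W m
    return new_means, new_covs, new_loading, new_bias

weights = np.array([0.3, 0.7])
means = np.array([[-1.0, 0.5], [2.0, 1.0]])
covs = np.stack([np.eye(2), np.array([[2.0, 0.3], [0.3, 0.5]])])
loading = np.array([[1.0, 0.0], [0.5, -1.0], [0.0, 2.0]])
bias = np.array([0.1, -0.2, 0.3])

new_means, new_covs, new_loading, new_bias = whiten_latent(
    weights, means, covs, loading, bias
)
```

Each component's image in observable space, with mean W μ_k + b and covariance W Σ_k Wᵀ, is the same before and after the change of variables, so the mixture weights and the observation noise can stay as they are.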

posterior_categorical(params: Array, x: Array) → Array

Compute posterior categorical distribution p(Z|x) in natural coordinates.

Returns the natural parameters of the categorical distribution over mixture components in the latent space given an observation.

Parameters:

params – Model parameters (natural coordinates)
x – Observable data point

Returns:

Array of shape (n_components-1,) with categorical natural parameters
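A categorical distribution over K components has only K-1 free natural parameters, which is why the returned array is one entry shorter than the number of components. The sketch below shows one way such parameters map to probabilities. It assumes the common minimal convention in which the first component is the reference with an implicit logit of zero; the library's actual convention is not stated here, and `categorical_natural_to_probs` is a hypothetical helper.

```python
import math

def categorical_natural_to_probs(natural):
    """Map K-1 natural parameters to K probabilities.

    Assumes the first component is the reference category (logit 0).
    """
    logits = [0.0, *natural]
    shift = max(logits)                       # subtract the max for stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Logits (0, log 2, 0) give probabilities of approximately (0.25, 0.5, 0.25).
probs = categorical_natural_to_probs([math.log(2.0), 0.0])
```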

posterior_soft_assignments(params: Array, x: Array) → Array

Compute posterior assignment probabilities p(Z|x).

Returns the posterior probability distribution over mixture components in the latent space given an observation.

Parameters:

params – Model parameters (natural coordinates)
x – Observable data point

Returns:

Array of shape (n_components,) giving p(z_k|x) for each component k
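For intuition, these probabilities can also be written in standard mean/covariance form. Marginalizing Y gives x | z=k ~ N(W μ_k + b, W Σ_k Wᵀ + Ψ), so p(z_k | x) is proportional to π_k times that density. The NumPy sketch below follows this formula directly; it is not how the library computes the result (which works in natural coordinates), and `log_normal_pdf`, `soft_assignments`, and their arguments are hypothetical names.

```python
import numpy as np

def log_normal_pdf(x, mean, cov):
    """Log density of a multivariate normal at x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + quad)

def soft_assignments(x, weights, means, covs, loading, bias, noise_cov):
    """p(z_k | x) with the latent factor Y marginalized out in closed form."""
    log_joint = np.array([
        np.log(w) + log_normal_pdf(
            x, loading @ m + bias, loading @ c @ loading.T + noise_cov
        )
        for w, m, c in zip(weights, means, covs)
    ])
    log_joint -= log_joint.max()              # stabilize before exponentiating
    probs = np.exp(log_joint)
    return probs / probs.sum()

weights = np.array([0.5, 0.5])
means = np.array([[-2.0, 0.0], [2.0, 0.0]])
covs = np.stack([np.eye(2), np.eye(2)])
loading = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
bias = np.zeros(3)
noise_cov = 0.1 * np.eye(3)

# A point halfway between the two clusters, and one at the second cluster's mean.
p_mid = soft_assignments(np.zeros(3), weights, means, covs, loading, bias, noise_cov)
p_near = soft_assignments(loading @ means[1], weights, means, covs, loading, bias, noise_cov)
```

In this symmetric setup the midpoint is assigned equally to both components, while the point at the second cluster's mean is assigned almost entirely to the second component.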

posterior_hard_assignment(params: Array, x: Array) → Array

Compute the hard posterior assignment, i.e. the component k that maximizes p(z_k|x).

Returns the index of the most probable mixture component in the latent space given an observation.

Parameters:

params – Model parameters (natural coordinates)
x – Observable data point

Returns:

Integer index of the most probable component
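Because normalizing logits into probabilities preserves their order, the most probable component can be read off the categorical natural parameters without normalizing. The sketch below assumes a K-1 parameterization in which the first component is the reference with logit zero; that convention and the helper name `hard_assignment` are assumptions, not taken from the library.

```python
def hard_assignment(natural):
    """Index of the most probable component from K-1 natural parameters.

    Assumes the first component is the reference category (logit 0).
    """
    logits = [0.0, *natural]
    return max(range(len(logits)), key=logits.__getitem__)

first = hard_assignment([-0.5, -2.0])   # reference component wins: index 0
second = hard_assignment([0.7, -1.2])   # logit 0.7 is largest: index 1
```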

class SymmetricHMoG(lwr_hrm: LowerHarmonium, upr_hrm: UpperHarmonium)

Bases: SymmetricHierarchical[NormalAnalyticLGM, Mixture[FullNormal]], Generic

Symmetric Hierarchical Mixture of Gaussians.

This model supports gradient-based optimization with additional functionality (e.g., join_conjugated) not available in DifferentiableHMoG.

The symmetric structure means the posterior and conjugated latent spaces are the same, enabling bidirectional parameter transformations.

Trade-off: Matrix inversions happen in the space of full covariance matrices over the latent space, which can be slower than DifferentiableHMoG.

posterior_categorical(params: Array, x: Array) → Array

Compute posterior categorical distribution p(Z|x) in natural coordinates.

Returns the natural parameters of the categorical distribution over mixture components in the latent space given an observation.

Parameters:

params – Model parameters (natural coordinates)
x – Observable data point

Returns:

Array of shape (n_components-1,) with categorical natural parameters

posterior_assignments(params: Array, x: Array) → Array

Compute posterior assignment probabilities p(Z|x).

Returns the posterior probability distribution over mixture components in the latent space given an observation.

Parameters:

params – Model parameters (natural coordinates)
x – Observable data point

Returns:

Array of shape (n_components,) giving p(z_k|x) for each component k

class AnalyticHMoG(lwr_hrm: LowerHarmonium, upr_hrm: UpperHarmonium)

Bases: AnalyticHierarchical[NormalAnalyticLGM, AnalyticMixture[FullNormal]], Generic

Analytic Hierarchical Mixture of Gaussians.

This model enables:

- Closed-form EM algorithm for learning (from AnalyticConjugated)
- Bidirectional parameter conversion (mean <-> natural)
- Full analytical tractability

Requires full covariance Gaussians in the latent space.
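As a small illustration of what conversion between mean and natural coordinates involves, consider a univariate Gaussian with sufficient statistics (x, x²). Its natural parameters are (μ/σ², -1/(2σ²)) and its mean parameters are (μ, μ² + σ²). The sketch below implements both directions for this one-dimensional case only. The function names are hypothetical, and the library's own parameterization for multivariate normals and mixtures may scale or order the coordinates differently.

```python
def natural_to_mean(theta1, theta2):
    """(mu / var, -1 / (2 var)) -> (E[x], E[x^2]) for a 1-D Gaussian."""
    var = -1.0 / (2.0 * theta2)
    mu = theta1 * var
    return mu, mu * mu + var

def mean_to_natural(eta1, eta2):
    """(E[x], E[x^2]) -> (mu / var, -1 / (2 var)) for a 1-D Gaussian."""
    var = eta2 - eta1 * eta1
    return eta1 / var, -1.0 / (2.0 * var)

# mu = 1, var = 4 has mean coordinates (1, 5) and natural coordinates (0.25, -0.125).
theta = mean_to_natural(1.0, 5.0)
eta = natural_to_mean(*theta)
```

Closed-form EM relies on this kind of round trip: the E-step produces expected sufficient statistics (mean coordinates), and the M-step maps them back to natural coordinates.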

expectation_maximization(params: Array, xs: Array) → Array

Perform a single iteration of EM with latent-prior whitening.

HMoG has the same latent-space non-identifiability as FA/PCA. After the E-step, whiten the latent prior in mean coordinates before mapping back to natural coordinates.

whiten_prior(means: Array) → Array

Reparameterize the latent Y-space to have zero mean and identity covariance.

Preserves p(x) by updating both:

- The lower LGM interaction (loading matrix + observable bias adjustment)
- Each GMM component (via the existing Normal.whiten relative to the GMM marginal)

posterior_categorical(params: Array, x: Array) → Array

Compute posterior categorical distribution p(Z|x) in natural coordinates.

Returns the natural parameters of the categorical distribution over mixture components in the latent space given an observation.

Parameters:

params – Model parameters (natural coordinates)
x – Observable data point

Returns:

Array of shape (n_components-1,) with categorical natural parameters

posterior_assignments(params: Array, x: Array) → Array

Compute posterior assignment probabilities p(Z|x).

Returns the posterior probability distribution over mixture components in the latent space given an observation.

Parameters:

params – Model parameters (natural coordinates)
x – Observable data point

Returns:

Array of shape (n_components,) giving p(z_k|x) for each component k

Factory Functions

differentiable_hmog(obs_dim: int, obs_rep: ObsRep, lat_dim: int, pst_rep: PstRep, n_components: int) → DifferentiableHMoG

Create a differentiable hierarchical mixture of Gaussians model.

This function constructs a hierarchical model combining:

1. A bottom layer with a linear Gaussian model reducing observables to first-level latents
2. A top layer with a Gaussian mixture model for modelling the latent distribution

This model supports optimization via gradient ascent on the log-likelihood. The prior over the latent space uses full covariance Gaussians; the posterior covariance structure is restricted (see DifferentiableHMoG).

symmetric_hmog(obs_dim: int, obs_rep: ObsRep, lat_dim: int, lat_rep: PositiveDefinite, n_components: int) → SymmetricHMoG

Create a symmetric hierarchical mixture of Gaussians model.

Supports gradient-based optimization of the log-likelihood, with additional functionality (e.g., join_conjugated) not available in DifferentiableHMoG. The symmetric structure means the posterior and prior use the same latent parameterization.

Trade-off: Matrix inversions happen in the space of full covariance matrices over the latent space, which can be slower than DifferentiableHMoG.

analytic_hmog(obs_dim: int, obs_rep: ObsRep, lat_dim: int, n_components: int) → AnalyticHMoG

Create an analytic hierarchical mixture of Gaussians model.

Enables closed-form expectation-maximization for learning and bidirectional parameter conversion between natural and mean coordinates. Requires full covariance Gaussians in the latent space for complete analytical tractability.