Hierarchical Mixture of Gaussians
Hierarchical Mixture of Gaussians (HMoG) models.
This module provides concrete implementations of hierarchical Gaussian models that combine linear Gaussian dimensionality reduction with Gaussian mixture clustering, enabling joint learning of latent factor representations and cluster assignments.
Model structure: HMoG models have two levels:
Lower harmonium: Maps observations \(X \in \mathbb{R}^p\) to first-level latent factors \(Y \in \mathbb{R}^d\) using a linear Gaussian relationship (factor analysis/PCA)
Upper harmonium: Models a mixture of Gaussians over the latent space \(Y\)
The joint distribution factors as:
\[p(x, y, z) = p(x \mid y)\, p(y \mid z)\, p(z)\]
where \(Z \in \{1,\ldots,K\}\) are discrete cluster assignments.
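The two-level structure can be illustrated by ancestral sampling: draw a cluster, then a latent factor, then an observation. The following is a minimal NumPy sketch, not the library's API. The function `sample_hmog` and its parameters (`weights`, `mus`, `sigmas`, `W`, `b`, `psi_diag`) are hypothetical names, and the diagonal observation noise is an assumption.

```python
import numpy as np

def sample_hmog(rng, n, weights, mus, sigmas, W, b, psi_diag):
    """Draw n samples (x, y, z) from p(z) p(y|z) p(x|y) (illustrative only)."""
    n_components, lat_dim = mus.shape
    # z ~ Categorical(weights)
    z = rng.choice(n_components, size=n, p=weights)
    # y | z ~ N(mu_z, Sigma_z), via per-component Cholesky factors
    chols = np.linalg.cholesky(sigmas)
    eps_y = rng.standard_normal((n, lat_dim))
    y = mus[z] + np.einsum("nij,nj->ni", chols[z], eps_y)
    # x | y ~ N(W y + b, diag(psi_diag))
    eps_x = rng.standard_normal((n, W.shape[0]))
    x = y @ W.T + b + eps_x * np.sqrt(psi_diag)
    return x, y, z

rng = np.random.default_rng(0)
weights = np.array([0.5, 0.3, 0.2])
mus = np.array([[-2.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
sigmas = np.stack([np.eye(2) * s for s in (0.5, 0.3, 0.4)])
W = rng.standard_normal((5, 2))   # loading matrix, obs_dim=5, lat_dim=2
b = np.zeros(5)
psi_diag = np.full(5, 0.1)
x, y, z = sample_hmog(rng, 1000, weights, mus, sigmas, W, b, psi_diag)
```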
Variants: Three implementations with different analytical properties:
DifferentiableHMoG: Gradient-based optimization, uses restricted posterior covariance for efficiency (e.g., diagonal)
SymmetricHMoG: Symmetric posterior/prior structure, additional functionality like join_conjugated, but slower due to full covariance matrix operations
AnalyticHMoG: Fully analytic, enables closed-form EM and bidirectional parameter conversion
Factory functions (differentiable_hmog, symmetric_hmog, analytic_hmog) provide convenient construction for common configurations.
Class Hierarchy
Core Classes
- class DifferentiableHMoG(lwr_hrm: LowerHarmonium, pst_upr_hrm: PstUpperHarmonium, prr_upr_hrm: PrrUpperHarmonium)
Bases: DifferentiableHierarchical[NormalLGM, AnalyticMixture[Normal], Mixture[FullNormal]], Generic
Differentiable Hierarchical Mixture of Gaussians.
This model combines:
1. A linear Gaussian model (factor analysis) mapping observations to latents
2. A Gaussian mixture model over the latent space
Supports gradient-based optimization of the log-likelihood. The prior latent mixture uses full covariance Gaussians in the latent space.
Posterior vs Prior Structure: The posterior latent mixture (pst_upr_hrm) uses an AnalyticMixture with a restricted covariance structure for computational efficiency. The prior latent mixture (prr_upr_hrm) embeds the restricted structure into full covariance for conjugation parameter computation.
- whiten_prior(means: Array) -> Array
Reparameterize the latent Y-space to have zero mean and identity covariance.
Preserves p(x) by updating both:
- The lower LGM interaction (loading matrix + observable bias adjustment)
- Each GMM component (via the existing Normal.whiten relative to GMM marginal)
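The idea behind whitening can be sketched in plain NumPy with explicit moment parameters. This is an illustration under stated assumptions, not the library's implementation: `whiten_latent_sketch` and `gmm_moments` are hypothetical helpers, and the actual method operates on a packed `means` array. With GMM marginal mean \(m\) and covariance \(S = LL^\top\), the change of variables \(y' = L^{-1}(y - m)\) is absorbed into the loading matrix and observable bias.

```python
import numpy as np

def gmm_moments(weights, mus, sigmas):
    """Mean and covariance of the GMM marginal over y."""
    mean = weights @ mus
    second = np.einsum(
        "k,kij->ij", weights, sigmas + np.einsum("ki,kj->kij", mus, mus)
    )
    return mean, second - np.outer(mean, mean)

def whiten_latent_sketch(weights, mus, sigmas, W, b):
    """Map y -> L^{-1}(y - m) and adjust (W, b) so that p(x) is unchanged."""
    m, S = gmm_moments(weights, mus, sigmas)
    L = np.linalg.cholesky(S)
    L_inv = np.linalg.inv(L)
    new_mus = (mus - m) @ L_inv.T          # component means in whitened space
    new_sigmas = L_inv @ sigmas @ L_inv.T  # component covariances, shape (K, d, d)
    new_W = W @ L                          # loading matrix absorbs the scaling
    new_b = b + W @ m                      # bias absorbs the shift
    return new_mus, new_sigmas, new_W, new_b

rng = np.random.default_rng(0)
weights = np.array([0.6, 0.4])
mus = np.array([[1.0, 2.0], [-1.0, 0.5]])
sigmas = np.array([[[1.0, 0.2], [0.2, 0.5]], [[0.3, 0.0], [0.0, 0.8]]])
W = rng.standard_normal((4, 2))
b = np.ones(4)
new_mus, new_sigmas, new_W, new_b = whiten_latent_sketch(weights, mus, sigmas, W, b)
```

Each component's observable mean \(W\mu_k + b\) and covariance contribution \(W\Sigma_k W^\top\) are the same before and after the transformation, which is why p(x) is preserved.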
- posterior_categorical(params: Array, x: Array) -> Array
Compute posterior categorical distribution p(Z|x) in natural coordinates.
Returns the natural parameters of the categorical distribution over mixture components in the latent space given an observation.
- Parameters:
params – Model parameters (natural coordinates)
x – Observable data point
- Returns:
Array of shape (n_components-1,) with categorical natural parameters
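The output has n_components-1 entries because a minimal categorical exponential family needs only log-odds relative to a reference component. A small sketch of that convention follows, assuming the first component is the reference. The helper names are hypothetical and the library's ordering may differ.

```python
import numpy as np

def categorical_probs_to_natural(p):
    """Log-odds of components 1..K-1 relative to component 0."""
    return np.log(p[1:]) - np.log(p[0])

def categorical_natural_to_probs(theta):
    """Softmax over the natural parameters with an implicit 0 for component 0."""
    logits = np.concatenate([[0.0], theta])
    logits = logits - logits.max()  # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()

probs = np.array([0.2, 0.5, 0.3])
theta = categorical_probs_to_natural(probs)
recovered = categorical_natural_to_probs(theta)
```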
- posterior_soft_assignments(params: Array, x: Array) -> Array
Compute posterior assignment probabilities p(Z|x).
Returns the posterior probability distribution over mixture components in the latent space given an observation.
- Parameters:
params – Model parameters (natural coordinates)
x – Observable data point
- Returns:
Array of shape (n_components,) giving p(z_k|x) for each component k
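Conceptually, p(z_k|x) follows from marginalizing out the latent factor: under the linear Gaussian lower level, \(x \mid z = k\) is Gaussian with mean \(W\mu_k + b\) and covariance \(W\Sigma_k W^\top + \Psi\). The sketch below computes the assignments directly from these quantities. It is an assumption-laden illustration with hypothetical names, not how the library evaluates the posterior from natural parameters.

```python
import numpy as np

def log_normal_density(x, mean, cov):
    """Log density of N(mean, cov) at x, using a Cholesky factor of cov."""
    diff = x - mean
    chol = np.linalg.cholesky(cov)
    sol = np.linalg.solve(chol, diff)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (x.shape[0] * np.log(2.0 * np.pi) + log_det + sol @ sol)

def soft_assignments(x, weights, mus, sigmas, W, b, psi):
    """p(z_k | x) with the latent factor y integrated out."""
    log_post = np.array([
        np.log(w) + log_normal_density(x, W @ mu + b, W @ sig @ W.T + psi)
        for w, mu, sig in zip(weights, mus, sigmas)
    ])
    log_post = log_post - log_post.max()  # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

weights = np.array([0.5, 0.5])
mus = np.array([[-3.0, 0.0], [3.0, 0.0]])
sigmas = np.stack([np.eye(2), np.eye(2)])
W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.zeros(3)
psi = 0.1 * np.eye(3)
x = W @ mus[1] + b  # an observation at the centre of component 1
post = soft_assignments(x, weights, mus, sigmas, W, b, psi)
```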
- posterior_hard_assignment(params: Array, x: Array) -> Array
Compute the hard posterior assignment, i.e. the argmax over k of p(z_k|x).
Returns the index of the most probable mixture component in the latent space given an observation.
- Parameters:
params – Model parameters (natural coordinates)
x – Observable data point
- Returns:
Integer index of the most probable component
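Since normalization does not change the ordering of components, the most probable component can be read off the categorical natural parameters without computing probabilities. A brief sketch, assuming the first component is the reference with an implicit natural parameter of 0 (a hypothetical convention that may differ from the library's):

```python
import numpy as np

def hard_assignment_from_natural(theta):
    """Index of the most probable component given K-1 natural parameters."""
    logits = np.concatenate([[0.0], theta])  # reference component has logit 0
    return int(np.argmax(logits))

soft = np.array([0.1, 0.7, 0.2])
theta = np.log(soft[1:]) - np.log(soft[0])
index_from_natural = hard_assignment_from_natural(theta)
index_from_soft = int(np.argmax(soft))
```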
- class SymmetricHMoG(lwr_hrm: LowerHarmonium, upr_hrm: UpperHarmonium)
Bases: SymmetricHierarchical[NormalAnalyticLGM, Mixture[FullNormal]], Generic
Symmetric Hierarchical Mixture of Gaussians.
This model supports gradient-based optimization with additional functionality (e.g., join_conjugated) not available in DifferentiableHMoG.
The symmetric structure means the posterior and conjugated latent spaces are the same, enabling bidirectional parameter transformations.
Trade-off: Matrix inversions happen in the space of full covariance matrices over the latent space, which can be slower than DifferentiableHMoG.
- posterior_categorical(params: Array, x: Array) -> Array
Compute posterior categorical distribution p(Z|x) in natural coordinates.
Returns the natural parameters of the categorical distribution over mixture components in the latent space given an observation.
- Parameters:
params – Model parameters (natural coordinates)
x – Observable data point
- Returns:
Array of shape (n_components-1,) with categorical natural parameters
- posterior_assignments(params: Array, x: Array) -> Array
Compute posterior assignment probabilities p(Z|x).
Returns the posterior probability distribution over mixture components in the latent space given an observation.
- Parameters:
params – Model parameters (natural coordinates)
x – Observable data point
- Returns:
Array of shape (n_components,) giving p(z_k|x) for each component k
- class AnalyticHMoG(lwr_hrm: LowerHarmonium, upr_hrm: UpperHarmonium)
Bases: AnalyticHierarchical[NormalAnalyticLGM, AnalyticMixture[FullNormal]], Generic
Analytic Hierarchical Mixture of Gaussians.
This model enables:
- Closed-form EM algorithm for learning (from AnalyticConjugated)
- Bidirectional parameter conversion (mean <-> natural)
- Full analytical tractability
Requires full covariance Gaussians in the latent space.
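As an illustration of what mean <-> natural conversion involves, the sketch below converts a single full-covariance Gaussian between the two coordinate systems. It assumes the standard exponential-family convention with natural parameters \((\Sigma^{-1}\mu, -\tfrac{1}{2}\Sigma^{-1})\) and mean parameters \((\mu, \Sigma + \mu\mu^\top)\). The function names are hypothetical, and the library's parameter packing and scaling may differ.

```python
import numpy as np

def normal_mean_to_natural(eta1, eta2):
    """(E[y], E[y y^T]) -> (Sigma^{-1} mu, -1/2 Sigma^{-1})."""
    cov = eta2 - np.outer(eta1, eta1)
    precision = np.linalg.inv(cov)
    return precision @ eta1, -0.5 * precision

def normal_natural_to_mean(theta1, theta2):
    """(Sigma^{-1} mu, -1/2 Sigma^{-1}) -> (E[y], E[y y^T])."""
    cov = np.linalg.inv(-2.0 * theta2)
    mu = cov @ theta1
    return mu, cov + np.outer(mu, mu)

mu = np.array([1.0, -2.0])
cov = np.array([[2.0, 0.3], [0.3, 1.0]])
eta = (mu, cov + np.outer(mu, mu))
theta = normal_mean_to_natural(*eta)
eta_back = normal_natural_to_mean(*theta)
```

Both directions require inverting a full covariance or precision matrix, which is consistent with the requirement of full covariance Gaussians in the latent space.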
- expectation_maximization(params: Array, xs: Array) -> Array
Perform a single iteration of EM with latent-prior whitening.
HMoG has the same latent-space non-identifiability as FA/PCA. After the E-step, whiten the latent prior in mean coordinates before mapping back to natural coordinates.
- whiten_prior(means: Array) -> Array
Reparameterize the latent Y-space to have zero mean and identity covariance.
Preserves p(x) by updating both:
- The lower LGM interaction (loading matrix + observable bias adjustment)
- Each GMM component (via the existing Normal.whiten relative to GMM marginal)
- posterior_categorical(params: Array, x: Array) -> Array
Compute posterior categorical distribution p(Z|x) in natural coordinates.
Returns the natural parameters of the categorical distribution over mixture components in the latent space given an observation.
- Parameters:
params – Model parameters (natural coordinates)
x – Observable data point
- Returns:
Array of shape (n_components-1,) with categorical natural parameters
- posterior_assignments(params: Array, x: Array) -> Array
Compute posterior assignment probabilities p(Z|x).
Returns the posterior probability distribution over mixture components in the latent space given an observation.
- Parameters:
params – Model parameters (natural coordinates)
x – Observable data point
- Returns:
Array of shape (n_components,) giving p(z_k|x) for each component k
Factory Functions
- differentiable_hmog(obs_dim: int, obs_rep: ObsRep, lat_dim: int, pst_rep: PstRep, n_components: int) -> DifferentiableHMoG
Create a differentiable hierarchical mixture of Gaussians model.
This function constructs a hierarchical model combining:
1. A bottom layer with a linear Gaussian model reducing observables to first-level latents
2. A top layer with a Gaussian mixture model for modelling the latent distribution
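This model supports gradient-based optimization of the log-likelihood. The prior latent mixture uses full covariance Gaussians in the latent space, while pst_rep sets the covariance structure of the posterior latent mixture.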
- symmetric_hmog(obs_dim: int, obs_rep: ObsRep, lat_dim: int, lat_rep: PositiveDefinite, n_components: int) -> SymmetricHMoG
Create a symmetric hierarchical mixture of Gaussians model.
Supports gradient-based optimization of the log-likelihood, with additional functionality (e.g., join_conjugated) not available in DifferentiableHMoG. The symmetric structure means the posterior and prior use the same latent parameterization.
Trade-off: Matrix inversions happen in the space of full covariance matrices over the latent space, which can be slower than DifferentiableHMoG.
- analytic_hmog(obs_dim: int, obs_rep: ObsRep, lat_dim: int, n_components: int) -> AnalyticHMoG
Create an analytic hierarchical mixture of Gaussians model.
Enables closed-form expectation-maximization for learning and bidirectional parameter conversion between natural and mean coordinates. Requires full covariance Gaussians in the latent space for complete analytical tractability.