Goal: Geometric OptimizAtion Libraries
(G)eometric (O)ptimiz(A)tion (L)ibraries
A JAX framework for statistical modeling grounded in information geometry and exponential families.
Overview
Goal provides machine learning algorithms that operate on statistical manifolds — spaces where every point is a probability distribution. By embedding the mathematics of information geometry directly into its type system, Goal ensures that operations respect geometric constraints and enables efficient, composable algorithms for inference, learning, and model evaluation.
Mathematical Foundation
An exponential family is a collection of distributions with densities of the form

\[p(x; \theta) = \mu(x) \exp\big(\theta \cdot \mathbf{s}(x) - \psi(\theta)\big)\]
where \(\theta\) are the natural parameters, \(\mathbf{s}(x)\) the sufficient statistic, \(\mu(x)\) the base measure, and \(\psi(\theta)\) the log-partition function. These families carry a dually flat Riemannian structure: the natural parameters \(\theta\) and mean parameters \(\eta = \nabla\psi(\theta)\) form dual coordinate systems connected by Legendre duality.
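The duality between natural and mean parameters can be checked numerically. The following is a minimal JAX sketch for the Poisson family, not Goal's own API: the function names are illustrative assumptions, the base measure is handled in log form as \(-\log(k!)\), and the convex conjugate of \(\psi\) is written out by hand as \(\eta \log \eta - \eta\).

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import gammaln

# Hypothetical helper names for illustration; these are not Goal functions.
def sufficient_statistic(k):
    # Poisson sufficient statistic: the count itself.
    return jnp.asarray(k, dtype=jnp.float32)

def log_base_measure(k):
    # log mu(k) = -log(k!)
    return -gammaln(jnp.asarray(k, dtype=jnp.float32) + 1.0)

def log_partition(theta):
    # psi(theta) = exp(theta) for the Poisson family.
    return jnp.exp(theta)

def log_density(theta, k):
    return theta * sufficient_statistic(k) + log_base_measure(k) - log_partition(theta)

def conjugate(eta):
    # Legendre dual of psi for the Poisson family: eta * log(eta) - eta.
    return eta * jnp.log(eta) - eta

theta = jnp.log(3.0)                      # natural parameter for rate 3
eta = jax.grad(log_partition)(theta)      # mean parameter: eta = grad psi(theta)
theta_back = jax.grad(conjugate)(eta)     # inverse map via the dual potential

# Check the density on a truncated support (the tail beyond 60 is negligible here).
ks = jnp.arange(0, 60)
probs = jnp.exp(log_density(theta, ks))
total = probs.sum()
mean_from_density = (ks * probs).sum()
```

Here `eta` should agree with the expected sufficient statistic `mean_from_density`, and `theta_back` should recover `theta`, which is the Legendre duality described above in one dimension.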
Library Structure
The library is organized in two packages: geometry (abstract mathematical machinery) and models (concrete statistical distributions).
Geometry
The geometry package provides two layers:
Manifold Subpackage — Geometric primitives: manifolds, matrix representations, embeddings, and linear maps. A manifold is a stateless object that pairs a dimension with operations on flat JAX arrays.
Exponential Family Subpackage — Statistical manifolds with increasing capabilities:
- ExponentialFamily: sufficient statistics and base measure
- Generative: additionally supports sampling
- Differentiable: analytic log-partition function, enabling gradient-based optimization
- Analytic: analytic negative entropy, enabling closed-form algorithms (e.g. EM)
Composed models (harmoniums, graphical models) recapitulate this hierarchy, mixing their own structure with these capability levels.
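One way to picture the capability levels is as a chain of abstract classes, each adding one requirement. The sketch below is an assumption-laden illustration rather than Goal's actual class definitions: every `Toy*` class and every method name is hypothetical, and the Bernoulli family is used only because its log-partition function and negative entropy have simple closed forms.

```python
from abc import ABC, abstractmethod

import jax
import jax.numpy as jnp

# All Toy* classes and method names are hypothetical, not Goal's API.
class ToyExponentialFamily(ABC):
    """Stateless object: a dimension plus operations on flat arrays."""

    @property
    @abstractmethod
    def dim(self) -> int: ...

    @abstractmethod
    def sufficient_statistic(self, x): ...

    @abstractmethod
    def log_base_measure(self, x): ...

class ToyGenerative(ToyExponentialFamily):
    @abstractmethod
    def sample(self, key, natural, n): ...

class ToyDifferentiable(ToyGenerative):
    @abstractmethod
    def log_partition(self, natural): ...

    def to_mean(self, natural):
        # Mean parameters are the gradient of the log-partition function.
        return jax.grad(self.log_partition)(natural)

    def log_density(self, natural, x):
        s = self.sufficient_statistic(x)
        return jnp.dot(natural, s) + self.log_base_measure(x) - self.log_partition(natural)

class ToyAnalytic(ToyDifferentiable):
    @abstractmethod
    def negative_entropy(self, mean): ...

    def to_natural(self, mean):
        # Natural parameters are the gradient of the negative entropy.
        return jax.grad(self.negative_entropy)(mean)

class ToyBernoulli(ToyAnalytic):
    @property
    def dim(self) -> int:
        return 1

    def sufficient_statistic(self, x):
        return jnp.atleast_1d(jnp.asarray(x, dtype=jnp.float32))

    def log_base_measure(self, x):
        return 0.0

    def sample(self, key, natural, n):
        p = jax.nn.sigmoid(natural[0])
        return jax.random.bernoulli(key, p, shape=(n,)).astype(jnp.float32)

    def log_partition(self, natural):
        return jax.nn.softplus(natural[0])

    def negative_entropy(self, mean):
        p = mean[0]
        return p * jnp.log(p) + (1.0 - p) * jnp.log(1.0 - p)

model = ToyBernoulli()
natural = jnp.array([0.5])
mean = model.to_mean(natural)          # sigmoid(0.5)
recovered = model.to_natural(mean)     # maps back to the natural parameter
total_prob = jnp.exp(model.log_density(natural, 0)) + jnp.exp(model.log_density(natural, 1))
samples = model.sample(jax.random.PRNGKey(0), natural, 5)
```

Note that the model object holds no parameters: `natural` and `mean` are plain flat arrays passed in by the caller, which matches the stateless design described above.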
Models
The models package builds concrete distributions on this foundation:
Base Distributions Subpackage — Fundamental exponential family distributions:
| Distribution | Sufficient Statistic | Log Base Measure |
| --- | --- | --- |
| Bernoulli | \(x \in \{0,1\}\) | \(0\) |
| Categorical | One-hot encoding for \(k > 0\) | \(0\) |
| Binomial | Count \(x\) | \(\log \binom{n}{x}\) |
| Poisson | Count \(k\) | \(-\log(k!)\) |
| CoM-Poisson | \((k, \log(k!))\) | \(0\) |
| von Mises | \((\cos(\theta), \sin(\theta))\) | \(-\log(2\pi)\) |
| Normal | \((x, x \otimes x)\) | \(-\frac{d}{2}\log(2\pi)\) |
| Boltzmann | \(x \otimes x\) | \(0\) |
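A few of the sufficient statistics listed above can be written directly as functions on JAX arrays. This is a sketch under stated assumptions, not Goal's implementation: the function names are hypothetical, the Categorical encoding is read as treating category 0 as the reference (all zeros), and the Normal statistic stores the full flattened outer product, whereas a real library may use a more compact symmetric representation.

```python
import jax
import jax.numpy as jnp

# Hypothetical function names; storage layouts are assumptions.
def categorical_sufficient_statistic(k, n_categories):
    # One-hot over categories 1..n-1; category 0 maps to the zero vector.
    return (jnp.arange(1, n_categories) == k).astype(jnp.float32)

def von_mises_sufficient_statistic(angle):
    return jnp.array([jnp.cos(angle), jnp.sin(angle)])

def normal_sufficient_statistic(x):
    # (x, x outer x), with the outer product flattened row by row.
    return jnp.concatenate([x, jnp.outer(x, x).ravel()])

# Averaging sufficient statistics over data estimates the mean parameters.
data = jnp.array([[1.0, 2.0], [3.0, 4.0]])
mean_estimate = jax.vmap(normal_sufficient_statistic)(data).mean(axis=0)
```

Because the mean parameters are the expected sufficient statistics, the sample average in `mean_estimate` is the natural estimator in mean coordinates.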
Harmonium Models Subpackage — Conjugate latent-variable models (harmoniums): mixtures, linear Gaussian models, and Poisson mixtures.
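To make the conjugate structure concrete, the sketch below writes a three-component Poisson mixture in harmonium form, where the joint density is proportional to \(\exp(\theta_x x + \theta_z \cdot \mathbf{s}(z) + x\, \Theta \cdot \mathbf{s}(z))\) times the Poisson base measure, and checks that the posterior over components is a Categorical with natural parameters \(\theta_z + x\,\Theta\). The variable names and the parameter conversion are illustrative assumptions, not Goal's API.

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import gammaln

# Conventional mixture parameters: component weights and Poisson rates.
weights = jnp.array([0.5, 0.3, 0.2])
rates = jnp.array([1.0, 4.0, 9.0])

# Harmonium-form parameters (names are hypothetical). Component 0 is the reference.
theta_x = jnp.log(rates[0])                                          # observable bias
theta_z = jnp.log(weights[1:] / weights[0]) - (rates[1:] - rates[0])  # latent bias
interaction = jnp.log(rates[1:] / rates[0])                           # interaction terms

def posterior_harmonium(x):
    # Posterior natural parameters are affine in the observable statistic.
    post_natural = theta_z + x * interaction
    logits = jnp.concatenate([jnp.zeros(1), post_natural])
    return jax.nn.softmax(logits)

def posterior_bayes(x):
    # Direct Bayes rule on the conventional parameterization, for comparison.
    log_joint = jnp.log(weights) + x * jnp.log(rates) - rates - gammaln(x + 1.0)
    return jax.nn.softmax(log_joint)

x = jnp.float32(5.0)
post_h = posterior_harmonium(x)
post_b = posterior_bayes(x)
```

The two posteriors should agree up to floating-point error. In the harmonium form the update is a single affine map on natural parameters, which is what makes closed-form algorithms such as EM convenient for these models.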
Graphical Models Subpackage — Multi-level hierarchical models composing harmoniums, e.g. mixture of factor analyzers.
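As an illustration of the hierarchical structure, a mixture of factor analyzers can be sampled level by level: first a component, then a continuous latent factor, then the observation. The following is a minimal sketch with hypothetical names and toy parameter values; it shows the generative process only and does not reflect how Goal represents or fits such models.

```python
import jax
import jax.numpy as jnp

def sample_mfa(key, log_weights, loadings, means, noise_std, n):
    """Ancestral sampling from a mixture of factor analyzers (illustrative).

    loadings has shape (components, obs_dim, latent_dim),
    means has shape (components, obs_dim), noise_std has shape (obs_dim,).
    """
    key_z, key_y, key_e = jax.random.split(key, 3)
    _, obs_dim, latent_dim = loadings.shape
    # Top level: choose a mixture component for each sample.
    z = jax.random.categorical(key_z, log_weights, shape=(n,))
    # Middle level: standard normal latent factors.
    y = jax.random.normal(key_y, (n, latent_dim))
    # Bottom level: linear Gaussian observation for the chosen component.
    noise = noise_std * jax.random.normal(key_e, (n, obs_dim))
    x = jnp.einsum("nij,nj->ni", loadings[z], y) + means[z] + noise
    return z, x

key = jax.random.PRNGKey(0)
log_weights = jnp.log(jnp.array([0.6, 0.4]))
loadings = jnp.stack([jnp.ones((3, 1)), -jnp.ones((3, 1))])  # shape (2, 3, 1)
means = jnp.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]])
noise_std = jnp.array([0.1, 0.1, 0.1])
z, x = sample_mfa(key, log_weights, loadings, means, noise_std, n=8)
```

Each level here corresponds to one of the building blocks named above: a Categorical mixture on top of a linear Gaussian model.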