Online variational Bayes for latent Dirichlet allocation

References: Hoffman, Matthew D., Blei, David M., and Bach, Francis R. "Online Learning for Latent Dirichlet Allocation." Paper presented at NIPS, 2010.

Ilya Yaroshenko
struct LdaHoffman(F) if (isFloatingPoint!F);
Online variational Bayes for LDA, processing documents in mini-batches. Setting kappa to 0 gives batch variational Bayes.
this(size_t K, size_t W, size_t D, F alpha, F eta, F tau0, F kappa, F eps = 1e-05, TaskPool tp = taskPool());
size_t K topic count
size_t W dictionary size
size_t D approximate total number of documents in the collection.
F alpha Dirichlet document-topic prior (0.1)
F eta Dirichlet word-topic prior (0.1)
F tau0 𝞽0 ≧ 0 slows down the early iterations of the algorithm.
F kappa 𝞳 ∈ (0.5, 1] controls the rate at which old values of 𝝺 are forgotten: 𝝺 = (1 - 𝞀(𝞽)) 𝝺 + 𝞀(𝞽) 𝝺', where 𝞀(𝞽) = (𝞽0 + 𝞽)^(-𝞳). Use 𝞳 = 0 for batch variational Bayes LDA.
F eps Stop iterations if ||𝝺 - 𝝺'||_l1 < s * eps, where s is the number of documents in the batch.
TaskPool tp task pool
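The update rule and stopping test described for kappa and eps can be sketched in plain D. This is a minimal illustration under stated assumptions, not the library's implementation: the helper names learningRate, blend, and l1Distance are hypothetical, 𝝺 is flattened to a one-dimensional array for brevity, and the numeric values are arbitrary.

```d
import std.math : abs, pow;
import std.stdio : writefln;

/// Hypothetical helper: rho(tau) = (tau0 + tau)^(-kappa).
/// Note that tau0 + tau must be positive when kappa > 0.
double learningRate(double tau0, double kappa, double tau)
{
    return pow(tau0 + tau, -kappa);
}

/// Hypothetical helper: lambda = (1 - rho) * lambda + rho * lambdaNew, in place.
void blend(double[] lambda, const(double)[] lambdaNew, double rho)
{
    assert(lambda.length == lambdaNew.length);
    foreach (i, ref l; lambda)
        l = (1 - rho) * l + rho * lambdaNew[i];
}

/// Hypothetical helper: L1 distance between two equally sized arrays.
double l1Distance(const(double)[] a, const(double)[] b)
{
    assert(a.length == b.length);
    double sum = 0;
    foreach (i, x; a)
        sum += abs(x - b[i]);
    return sum;
}

void main()
{
    // Illustrative values only (assumptions, not library defaults).
    immutable double tau0 = 1024, kappa = 0.75, eps = 1e-5;
    immutable size_t s = 64; // documents in the mini-batch

    double[] lambda = [1.0, 2.0, 3.0];
    const double[] lambdaNew = [2.0, 2.0, 1.0];

    // Stopping test from the eps description: ||lambda - lambda'||_l1 < s * eps.
    immutable bool converged = l1Distance(lambda, lambdaNew) < s * eps;
    writefln("converged: %s", converged);

    // The rate shrinks as tau grows, so later batches move lambda less.
    foreach (tau; [0.0, 1_000.0, 100_000.0])
        writefln("tau = %s, rho = %s", tau, learningRate(tau0, kappa, tau));

    // With kappa = 0 the rate is exactly 1, so lambda is replaced by lambdaNew.
    blend(lambda, lambdaNew, learningRate(tau0, 0, 0));
    writefln("lambda after kappa = 0 update: %s", lambda);
}
```

With kappa = 0 the rate is 1 for every tau, so each update discards the old 𝝺 entirely, which is why that setting corresponds to the batch algorithm.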
void updateBeta();
@property Slice!(Contiguous, [2], F*) beta();
Posterior over the topics.
@property Slice!(Contiguous, [2], F*) lambda();
Parameterized posterior over the topics.
const @property F tau();

@property void tau(F v);
Count of already seen documents. Slows down the iterations of the algorithm.
size_t putBatch(SliceKind kind, C, I, J)(Slice!(kind, [1], FieldIterator!(CompressedField!(C, I, J))) n, size_t maxIterations);
Accepts a mini-batch and performs multiple E-step iterations for each document, followed by a single M-step.
This implementation is optimized for sparse documents, which contain far fewer unique words than the dictionary.
Slice!(kind, [1], FieldIterator!(CompressedField!(C, I, J))) n mini-batch, a collection of compressed documents.
size_t maxIterations maximum number of E-step iterations for a single document in the batch.