plsa.algorithms.result module

class plsa.algorithms.result.PlsaResult(topic_given_doc: numpy.ndarray, word_given_topic: numpy.ndarray, topic_given_word: numpy.ndarray, topic: numpy.ndarray, kl_divergences: List[float], corpus: plsa.corpus.Corpus, tf_idf: bool)

Bases: object

Container for the results generated by a (conditional) PLSA run.

Parameters:

topic_given_doc (ndarray) – The conditional probability p(t|d) as \(n_{topics}\times n_{docs}\) array.
word_given_topic (ndarray) – The conditional probability p(w|t) as \(n_{words}\times n_{topics}\) array.
topic_given_word (ndarray) – The conditional probability p(t|w) as \(n_{topics}\times n_{words}\) array.
topic (ndarray) – The marginal probability p(t).
kl_divergences (list of float) – The Kullback-Leibler divergences between the original document-word probability p(d, w) and its approximation, one value per iteration.
corpus (Corpus) – The original corpus the PLSA model was trained on.
tf_idf (bool) – Whether to weight the document-word matrix with the inverse document frequencies or not.
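
To make the meaning of kl_divergences more concrete, here is a minimal sketch of how a Kullback-Leibler divergence between a joint document-word distribution p(d, w) and an approximation of it could be computed. It is not taken from the plsa package: the function name kl_divergence, the toy distributions, and the use of the natural logarithm are all assumptions made for illustration.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[Sequence[float]],
                  q: Sequence[Sequence[float]]) -> float:
    """KL divergence D(p || q) of two joint distributions given as nested lists.

    Hypothetical helper, not part of the plsa package.
    """
    total = 0.0
    for p_row, q_row in zip(p, q):
        for p_dw, q_dw in zip(p_row, q_row):
            if p_dw > 0.0:  # entries with p(d, w) = 0 contribute nothing
                total += p_dw * math.log(p_dw / q_dw)
    return total


# Toy "true" joint distribution over 2 documents x 2 words (entries sum to 1).
p_true = [[0.4, 0.1], [0.1, 0.4]]

# Made-up approximations standing in for an early and a later iteration.
q_early = [[0.25, 0.25], [0.25, 0.25]]
q_late = [[0.38, 0.12], [0.12, 0.38]]

# One divergence value per iteration, analogous in spirit to kl_divergences.
kl_per_iteration = [kl_divergence(p_true, q) for q in (q_early, q_late)]
```

In this toy setting the second value is smaller than the first, which is the behaviour one would expect from a list of divergences recorded while a model improves.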

convergence: The convergence of the Kullback-Leibler divergence.

kl_divergence: KL-divergence of approximate and true document-word probability.

n_topics: The number of latent topics identified.

predict(doc: str) → Tuple[numpy.ndarray, int, Tuple[str, ...]]

Predict the relative importance of latent topics in a new document.

Parameters:	doc (str) – A new document given as a single string.
Returns:
ndarray – A 1-D array with the relative importance of latent topics.
int – The number of words in the new document that were not present in the corpus the PLSA model was trained on.
tuple of str – Those words in the new document that were not present in the corpus the PLSA model was trained on.
Raises:	ValueError – If the document to predict on is an empty string, if there are no words left after preprocessing the document, or if there are no known words in the document.
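
The return triple and the three error cases described above can be illustrated with a small stand-in function. This is only a sketch under stated assumptions, not the package's implementation: predict_sketch is a hypothetical name, lower-casing and whitespace splitting replace the real preprocessing, and topic weights are obtained by simply summing p(t|w) over the known words and normalizing, which may differ from what predict actually does.

```python
from typing import Dict, List, Tuple


def predict_sketch(
    doc: str, topic_given_word: Dict[str, List[float]]
) -> Tuple[List[float], int, Tuple[str, ...]]:
    """Hypothetical stand-in mirroring the documented return shape of predict."""
    if not doc:
        raise ValueError("Document is an empty string.")
    words = doc.lower().split()  # crude placeholder for real preprocessing
    if not words:
        raise ValueError("No words left after preprocessing.")
    known = [w for w in words if w in topic_given_word]
    unknown = tuple(w for w in words if w not in topic_given_word)
    if not known:
        raise ValueError("No known words in the document.")
    n_topics = len(next(iter(topic_given_word.values())))
    weights = [0.0] * n_topics
    for word in known:
        for t, p_t_given_w in enumerate(topic_given_word[word]):
            weights[t] += p_t_given_w
    total = sum(weights)
    weights = [weight / total for weight in weights]  # normalize to sum to 1
    return weights, len(unknown), unknown


# Made-up p(t|w) values for a three-word vocabulary and two topics.
toy_topic_given_word = {
    "cat": [0.9, 0.1],
    "dog": [0.8, 0.2],
    "stock": [0.1, 0.9],
}

weights, n_unknown, unknown_words = predict_sketch(
    "Cat dog zebra", toy_topic_given_word
)
```

Here "zebra" is not in the toy vocabulary, so it is counted and reported as an unknown word, while the topic weights are derived from "cat" and "dog" only.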

tf_idf: Whether inverse document frequencies were used to weight the document-word counts.

topic: The relative importance of latent topics.

topic_given_doc

The relative importance of latent topics in each document.

Dimensions are \(n_{docs} \times n_{topics}\).

word_given_topic

The words in each latent topic and their relative importance.

Results are presented as a tuple of 2-tuples (word, word importance).
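
As an illustration of the (word, word importance) format, the following sketch turns a small p(w|t) table into one tuple of 2-tuples per topic. The helper words_by_topic, the toy vocabulary, and the choice to sort each topic by descending importance are assumptions made for this example. They are not confirmed details of the package.

```python
from typing import Sequence, Tuple

WordWeights = Tuple[Tuple[str, float], ...]


def words_by_topic(
    vocabulary: Sequence[str],
    word_given_topic: Sequence[Sequence[float]],
) -> Tuple[WordWeights, ...]:
    """Build (word, importance) pairs per topic from an n_words x n_topics table.

    Hypothetical helper, not part of the plsa package.
    """
    n_topics = len(word_given_topic[0])
    topics = []
    for t in range(n_topics):
        pairs = [(word, row[t]) for word, row in zip(vocabulary, word_given_topic)]
        # Assumed ordering: most important word first.
        pairs.sort(key=lambda pair: pair[1], reverse=True)
        topics.append(tuple(pairs))
    return tuple(topics)


# Made-up p(w|t) values; each column (topic) sums to 1.
vocabulary = ("cat", "dog", "stock")
p_word_given_topic = [
    [0.5, 0.1],  # cat
    [0.4, 0.1],  # dog
    [0.1, 0.8],  # stock
]

topics = words_by_topic(vocabulary, p_word_given_topic)
```

With these toy numbers, the first topic is led by "cat" and the second by "stock".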