plsa.algorithms.result module
-
class plsa.algorithms.result.PlsaResult(topic_given_doc: numpy.ndarray, word_given_topic: numpy.ndarray, topic_given_word: numpy.ndarray, topic: numpy.ndarray, kl_divergences: List[float], corpus: plsa.corpus.Corpus, tf_idf: bool)
Bases: object
Container for the results generated by a (conditional) PLSA run.
Parameters:
- topic_given_doc (ndarray) – The conditional probability p(t|d) as \(n_{topics}\times n_{docs}\) array.
- word_given_topic (ndarray) – The conditional probability p(w|t) as \(n_{words}\times n_{topics}\) array.
- topic_given_word (ndarray) – The conditional probability p(t|w) as \(n_{topics}\times n_{words}\) array.
- topic (ndarray) – The marginal probability p(t).
- kl_divergences (list of float) – The Kullback-Leibler divergences between the original document-word probability p(d, w) and its approximation, one value per iteration.
- corpus (Corpus) – The original corpus the PLSA model was trained on.
- tf_idf (bool) – Whether the document-word matrix was weighted with the inverse document frequencies.
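The probability arrays above are tied together by Bayes' rule, p(t|w) = p(w|t) p(t) / p(w), with p(w) = Σ_t p(w|t) p(t). The following is a minimal sketch of that relation on made-up numbers, using plain nested lists in place of ndarrays. The helper name `topic_given_word_from` is hypothetical and is not part of the plsa package, and the sketch makes no claim about how the library computes these arrays internally.

```python
# Toy model with 2 topics and 3 words; all numbers are invented for illustration.
topic = [0.6, 0.4]            # p(t), length n_topics
word_given_topic = [          # p(w|t), n_words x n_topics (each column sums to 1)
    [0.5, 0.1],
    [0.3, 0.2],
    [0.2, 0.7],
]


def topic_given_word_from(word_given_topic, topic):
    """Return p(t|w) as an n_topics x n_words nested list via Bayes' rule.

    Hypothetical helper, not part of the plsa package.
    """
    n_words, n_topics = len(word_given_topic), len(topic)
    # Marginal word probability: p(w) = sum over t of p(w|t) * p(t).
    word = [
        sum(word_given_topic[w][t] * topic[t] for t in range(n_topics))
        for w in range(n_words)
    ]
    # Bayes' rule: p(t|w) = p(w|t) * p(t) / p(w).
    return [
        [word_given_topic[w][t] * topic[t] / word[w] for w in range(n_words)]
        for t in range(n_topics)
    ]


topic_given_word = topic_given_word_from(word_given_topic, topic)
```

For every word, the entries of the resulting array sum to one over the topics, matching the \(n_{topics}\times n_{words}\) layout given for topic_given_word.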
-
convergence
The convergence of the Kullback-Leibler divergence.
-
kl_divergence
KL-divergence of approximate and true document-word probability.
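For reference, the Kullback-Leibler divergence between two joint distributions can be sketched as below. This assumes the direction D(p ‖ q) = Σ p(d, w) log(p(d, w) / q(d, w)), with p the true and q the approximate document-word probability. The direction, the function name, and the toy matrices are assumptions for illustration, not taken from the plsa package.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) for two joint distributions given as nested lists.

    Hypothetical helper; assumes q > 0 wherever p > 0.
    """
    total = 0.0
    for p_row, q_row in zip(p, q):
        for p_dw, q_dw in zip(p_row, q_row):
            if p_dw > 0.0:  # terms with p = 0 contribute nothing to the sum
                total += p_dw * math.log(p_dw / q_dw)
    return total


# Invented 2 documents x 2 words joint probabilities, each summing to 1.
original = [[0.4, 0.1], [0.1, 0.4]]
approximation = [[0.25, 0.25], [0.25, 0.25]]

divergence = kl_divergence(original, approximation)
```

The divergence is zero when both distributions coincide and positive otherwise, so a sequence of values that decreases over iterations indicates an improving approximation.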
-
n_topics
The number of latent topics identified.
-
predict(doc: str) → Tuple[numpy.ndarray, int, Tuple[str, ...]]
Predict the relative importance of latent topics in a new document.
Parameters:
- doc (str) – A new document given as a single string.
Returns:
- ndarray – A 1-D array with the relative importance of latent topics.
- int – The number of words in the new document that were not present in the corpus the PLSA model was trained on.
- tuple of str – Those words in the new document that were not present in the corpus the PLSA model was trained on.
Raises:
- ValueError – If the document to predict on is an empty string, if there are no words left after preprocessing the document, or if there are no known words in the document.
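A caller would typically unpack the three return values and guard against the ValueError. The sketch below relies only on the signature documented above. The helper `dominant_topic` and the stand-in class `FakeResult` are hypothetical and exist only so the example runs without a trained model; in practice `result` would be a PlsaResult instance.

```python
def dominant_topic(result, doc):
    """Return the index of the most important topic in `doc`, or None.

    Hypothetical helper built on the documented predict() signature.
    """
    try:
        topics, n_unknown, unknown_words = result.predict(doc)
    except ValueError:
        # Raised for empty documents or documents without any known words.
        return None
    if n_unknown:
        print(f"Ignored {n_unknown} unknown word(s): {', '.join(unknown_words)}")
    # Index of the largest entry in the 1-D importance array.
    return max(range(len(topics)), key=lambda t: topics[t])


class FakeResult:
    """Hypothetical stand-in for PlsaResult, used only to exercise the helper."""

    def predict(self, doc):
        if not doc:
            raise ValueError("empty document")
        # Invented topic importances, one unknown word.
        return [0.2, 0.7, 0.1], 1, ("zyzzyva",)


best = dominant_topic(FakeResult(), "a short example document")
```

Returning None on failure is one possible design choice; re-raising the exception may be preferable when an unpredictable document should stop the pipeline.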
-
tf_idf
Whether inverse document frequency was used to weight the document-word counts.
-
topic
The relative importance of latent topics.
-
topic_given_doc
The relative importance of latent topics in each document.
Dimensions are \(n_{docs} \times n_{topics}\).
-
word_given_topic
The words in each latent topic and their relative importance.
Results are presented as a tuple of 2-tuples (word, word importance).
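Given that (word, word importance) layout, the most characteristic words of a topic can be picked out as sketched below. The example assumes that the entry for a single topic is itself a tuple of 2-tuples. That reading, the helper name `top_words`, and the sample data are assumptions for illustration only.

```python
def top_words(words_with_importance, k=3):
    """Return the k words with the highest importance, most important first.

    Hypothetical helper operating on one topic's tuple of (word, importance).
    """
    ordered = sorted(words_with_importance, key=lambda pair: pair[1], reverse=True)
    return tuple(word for word, _ in ordered[:k])


# Invented entry for a single topic.
topic_words = (("engine", 0.05), ("car", 0.30), ("road", 0.12), ("wheel", 0.20))

leading = top_words(topic_words, k=2)
```

Sorting explicitly makes no assumption about whether the entries are already ordered by importance.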