Gensim: How do I save the topics produced by an LDA model to a readable format (csv, txt, etc.)?

The last part of the code:

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2)
print(lda)

bash output:

INFO : adding document #0 to Dictionary(0 unique tokens)
INFO : built Dictionary(18 unique tokens) from 5 documents (total  20 corpus positions)
INFO : using serial LDA version on this node
INFO : running online LDA training, 2 topics, 1 passes over the supplied corpus of 5 documents, updating model once every 5 documents
WARNING : too few updates, training might not converge; consider increasing the number of passes to improve accuracy
INFO : PROGRESS: iteration 0, at document #5/5
INFO : 2/5 documents converged within 50 iterations
INFO : topic #0: 0.079*cute + 0.076*broccoli + 0.070*adopted + 0.069*yesterday + 0.069*eat + 0.069*sister + 0.068*kitten + 0.068*kittens + 0.067*bananas + 0.067*chinchillas
INFO : topic #1: 0.082*broccoli + 0.079*cute + 0.071*piece + 0.070*munching + 0.069*spinach + 0.068*hamster + 0.068*ate + 0.067*banana + 0.066*breakfast + 0.066*smoothie
INFO : topic diff=0.470477, rho=1.000000
<gensim.models.ldamodel.LdaModel object at 0x10f1f4050>

So I'm wondering: is there a way to save the processed topics it generates to a readable format? I've tried the .save() method, but it always outputs something unreadable.

Have you tried regex? I ran into the same thing and noticed that each element behaves like a string.
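
If only the formatted strings are available (for example from the log output above), a regular expression can split them into word/weight pairs. The following is a minimal sketch, not something from the original thread: `parse_topic_string` is a hypothetical helper, not part of gensim, and it assumes the `weight*word + weight*word` layout shown in the log, with optional double quotes around each word as newer gensim releases appear to print them.

```python
import re

# Matches one term such as 0.079*cute or 0.079*"cute"
TERM_RE = re.compile(r'(\d*\.?\d+(?:[eE][-+]?\d+)?)\*"?([^"+\s]+)"?')


def parse_topic_string(topic):
    """Return a list of (word, weight) pairs parsed from a formatted topic string."""
    return [(word, float(weight)) for weight, word in TERM_RE.findall(topic)]


# Terms copied from the "topic #0" log line in the question
line = "0.079*cute + 0.076*broccoli + 0.070*adopted"
for word, weight in parse_topic_string(line):
    print(word, weight)
```

Parsing strings is a workaround; when the model object itself is at hand, asking it for structured output is usually less fragile.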

Source author: jeremy.ting | 2013-06-27

3

You just need to use lda.show_topics(num_topics=-1), or any number of topics you want (num_topics=10, num_topics=15, num_topics=1000, ...). Older gensim releases called these parameters topics and topn; current releases use num_topics and num_words. I usually just do:
```
# append the formatted topics to a text file
with open('.../yourfile.txt', 'a') as logfile:
    print(lda.show_topics(num_topics=-1, num_words=10), file=logfile)
```
All of these parameters, and others, are described in the gensim documentation.
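
For a CSV file specifically, the unformatted output can be written row by row. This is a minimal sketch rather than part of the original answer: `write_topics_csv` is a hypothetical helper, and it assumes the shape that `show_topics(..., formatted=False)` returns in current gensim releases, namely a list of `(topic_id, [(word, probability), ...])` tuples. The sample data is hand-written from the log in the question so the snippet runs without a trained model.

```python
import csv


def write_topics_csv(topics, path):
    """Write one CSV row per (topic, word) pair.

    `topics` is assumed to look like the result of
    lda.show_topics(num_topics=-1, num_words=10, formatted=False):
    a list of (topic_id, [(word, probability), ...]) tuples.
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["topic", "word", "probability"])
        for topic_id, terms in topics:
            for word, probability in terms:
                # float() turns numpy scalars into plain Python floats
                writer.writerow([topic_id, word, float(probability)])


# Hand-written sample in the assumed shape; with a trained model, pass
# lda.show_topics(num_topics=-1, num_words=10, formatted=False) instead.
sample = [
    (0, [("cute", 0.079), ("broccoli", 0.076)]),
    (1, [("broccoli", 0.082), ("cute", 0.079)]),
]
write_topics_csv(sample, "topics.csv")
```

One row per word keeps the file easy to filter or pivot in a spreadsheet.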

Source author: Everst

Here is how to save a gensim LDA model:

from gensim import corpora, models, similarities

# create corpus and dictionary
corpus = ...
dictionary = ...

# train model, this might take time
model = models.ldamodel.LdaModel(corpus=corpus, id2word=dictionary, num_topics=200, passes=5, alpha='auto')
# save model to disk (no need to use the pickle module)
model.save('lda.model')

To print the topics, here are a few options:

# later on, load trained model from file
model = models.LdaModel.load('lda.model')

# print all topics (older gensim releases: topics=200, topn=20)
print(model.show_topics(num_topics=200, num_words=20))

# print topic 109
print(model.print_topic(109, topn=20))

# another way
for i in range(model.num_topics):
    print(model.print_topic(i))

# and another way, only prints top words
# (current gensim returns (word, probability) pairs; older releases returned (probability, word))
for t in range(model.num_topics):
    print('topic {}: '.format(t) + ', '.join(word for word, _ in model.show_topic(t, topn=20)))

Source author: Renaud

3

You can use the pickle module.
```
import pickle
# your code
# pickle data is binary, so open the file in 'wb' / 'rb' mode
with open(filename, 'wb') as f:
    pickle.dump(lda, f)
# you may load it back again
with open(filename, 'rb') as f:
    lda_copy = pickle.load(f)
```
Note that the file pickle writes is not meant for human reading: even the text-based protocol 0 output, while technically readable, is hard to make sense of.
Argh. Yes, I just saw the results. Do you know a way to pull just the topics out of the package, so that the resulting text file is easier to scrub?
Sorry, I don't know of any such way.
Pickle won't work here, because you are saving the entire model, not the topic words...
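
One way to pull out only the topic words, rather than the whole pickled model, is to collect them into plain Python structures and dump those as JSON. This is a minimal sketch under stated assumptions: the model object is assumed to expose `num_topics` and a `show_topic(topic_id, topn=...)` method returning `(word, probability)` pairs, as current gensim releases do. `topics_to_json` and the stand-in `FakeModel` are hypothetical names used only so the snippet runs without training anything.

```python
import json


def topics_to_json(model, path, topn=10):
    """Dump {topic_id: [[word, probability], ...]} for every topic to a JSON file."""
    topics = {
        topic_id: [
            # float() is needed because numpy float32 values are not JSON serializable
            [word, float(prob)]
            for word, prob in model.show_topic(topic_id, topn=topn)
        ]
        for topic_id in range(model.num_topics)
    }
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(topics, handle, indent=2)
    return topics


class FakeModel:
    """Stand-in with the assumed interface; replace with a real LdaModel."""

    num_topics = 2
    _terms = {
        0: [("cute", 0.079), ("broccoli", 0.076)],
        1: [("broccoli", 0.082), ("cute", 0.079)],
    }

    def show_topic(self, topic_id, topn=10):
        return self._terms[topic_id][:topn]


topics_to_json(FakeModel(), "topics.json")
```

Unlike the pickle, the resulting file holds nothing but topic ids, words, and weights, and it can be read from other languages as well.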

Source author: Nik
0

.save() stores the model itself, not the topics (hence the unreadable output).

Use:
```
with open('topic_file', 'w') as topic_file:
    topics = lda_model.top_topics(corpus)
    topic_file.write('\n'.join('%s %s' % topic for topic in topics))
```
This gives you a readable file with the words of every topic and their associated probabilities, each followed by the score that top_topics reports for that topic.
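
The `'%s %s'` formatting writes the Python repr of each list. If a flatter layout is easier to post-process, the pairs can be formatted explicitly. This is a minimal sketch, assuming `top_topics` returns a list of `(terms, score)` tuples in which `terms` is a list of `(probability, word)` pairs, which is my understanding of current gensim. `format_top_topics` is a hypothetical helper, and the score in the sample is a made-up placeholder, not a real result.

```python
def format_top_topics(top_topics):
    """Return one tab-separated line per topic: score, then word=probability terms."""
    lines = []
    for terms, score in top_topics:
        words = " ".join("%s=%.3f" % (word, prob) for prob, word in terms)
        lines.append("%.4f\t%s" % (score, words))
    return lines


# Probabilities taken from the log in the question; the score -1.25 is invented.
sample = [([(0.079, "cute"), (0.076, "broccoli")], -1.25)]

with open("topic_file.txt", "w", encoding="utf-8") as topic_file:
    topic_file.write("\n".join(format_top_topics(sample)))
```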

Source author: Jane Illario
