Why are the signs of my topic weights changing from run to run?

169 views Asked by mudstick At 22 September 2020 at 22:43

I'm running the LSI program from Gensim's Topics and Transformations tutorial, and the signs of the topic weights keep switching between positive and negative from run to run. For example, this is what I get when I print the results with:

for doc, as_text in zip(corpus_lsi, documents):
    print(doc, as_text)

Run 1
[(0, 0.066007833960900791), (1, 0.52007033063618491), (2, -0.37649581219168904)]
[(0, 0.196675928591421), (1, 0.7609563167700063), (2, 0.5080674581001664)]
[(0, 0.089926399724459982), (1, 0.72418606267525132), (2, -0.408989731553764)]
[(0, 0.075858476521777865), (1, 0.63205515860034334), (2, -0.53935336057339001)]
[(0, 0.10150299184979866), (1, 0.57373084830029653), (2, 0.67093385852959075)]
[(0, 0.70321089393783254), (1, -0.1611518021402539), (2, -0.18266089635241448)]
[(0, 0.87747876731198449), (1, -0.16758906864658912), (2, -0.10880822642632856)]
[(0, 0.90986246868185872), (1, -0.14086553628718496), (2, 0.00087117874886860625)]
[(0, 0.61658253505692762), (1, 0.053929075663897361), (2, 0.25568697959599318)]

Run 2
[(0, 0.066007833960908563), (1, -0.52007033063618446), (2, -0.37649581219168959)]
[(0, 0.19667592859143226), (1, -0.76095631677000253), (2, 0.50806745810016629)]
[(0, 0.089926399724470751), (1, -0.72418606267525032), (2, -0.40898973155376284)]
[(0, 0.075858476521787177), (1, -0.63205515860034223), (2, -0.5393533605733889)]
[(0, 0.10150299184980684), (1, -0.57373084830029419), (2, 0.67093385852959098)]
[(0, 0.70321089393782976), (1, 0.16115180214026417), (2, -0.18266089635241456)]
[(0, 0.87747876731198149), (1, 0.16758906864660211), (2, -0.10880822642632891)]
[(0, 0.90986246868185627), (1, 0.14086553628719861), (2, 0.00087117874886795399)]
[(0, 0.61658253505692828), (1, -0.053929075663887563), (2, 0.25568697959599251)]

Run 3
[(0, 0.066007833960902929), (1, -0.52007033063618535), (2, 0.37649581219168821)]
[(0, 0.19667592859142491), (1, -0.76095631677000497), (2, -0.50806745810016662)]
[(0, 0.089926399724463771), (1, -0.7241860626752511), (2, 0.40898973155376317)]
[(0, 0.075858476521781085), (1, -0.63205515860034334), (2, 0.5393533605733889)]
[(0, 0.10150299184980124), (1, -0.57373084830029542), (2, -0.67093385852959064)]
[(0, 0.70321089393783143), (1, 0.16115180214025732), (2, 0.18266089635241564)]
[(0, 0.87747876731198304), (1, 0.16758906864659326), (2, 0.10880822642632952)]
[(0, 0.90986246868185761), (1, 0.1408655362871892), (2, -0.00087117874886778746)]
[(0, 0.61658253505692784), (1, -0.053929075663894419), (2, -0.25568697959599318)]

I am running Python 3.5.2 on a PC, coding in IntelliJ.

Has anyone encountered this problem, with the Gensim library or elsewhere?

There are 2 answers

sophros On 23 September 2020 at 09:07

There are a number of possibilities:

The order of the topics, or the vocabulary, can differ between runs. If you rebuild everything from scratch each time (including vocabulary generation), the topics or the vocabulary you end up with may change from run to run, which could explain the differences.
The calculations are numerically unstable. This can happen when a value close to 0.0 is rounded to either -0.0 or +0.0 (depending on the order of calculation, which can vary), and that rounding affects the sign of the result. It may stem from a generic numerical bug or from a particular software/hardware combination.
Some other reason not yet identified.
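
One way to tell these possibilities apart is to check whether two runs agree in magnitude and differ only by a single sign per topic. The sketch below does that for the first two documents of each run in the question. The helper name `per_topic_signs` and the tolerance are assumptions made for illustration; nothing here is part of Gensim.

```python
import math

def per_topic_signs(run_a, run_b, tol=1e-9):
    """Return {topic_id: +1 or -1} if run_b equals run_a up to one sign
    per topic, otherwise None.

    Each run is a list of documents, and each document is a list of
    (topic_id, weight) pairs, as printed in the question.
    """
    signs = {}
    for doc_a, doc_b in zip(run_a, run_b):
        for (topic_a, w_a), (topic_b, w_b) in zip(doc_a, doc_b):
            if topic_a != topic_b or not math.isclose(abs(w_a), abs(w_b), abs_tol=tol):
                return None  # magnitudes differ, so more than the sign changed
            if abs(w_a) <= tol:
                continue  # effectively zero: the sign carries no information
            sign = 1 if w_a * w_b > 0 else -1
            if signs.setdefault(topic_a, sign) != sign:
                return None  # the topic flipped for some documents only
    return signs

# First two documents of each run, copied from the question.
run1 = [
    [(0, 0.066007833960900791), (1, 0.52007033063618491), (2, -0.37649581219168904)],
    [(0, 0.196675928591421), (1, 0.7609563167700063), (2, 0.5080674581001664)],
]
run2 = [
    [(0, 0.066007833960908563), (1, -0.52007033063618446), (2, -0.37649581219168959)],
    [(0, 0.19667592859143226), (1, -0.76095631677000253), (2, 0.50806745810016629)],
]
run3 = [
    [(0, 0.066007833960902929), (1, -0.52007033063618535), (2, 0.37649581219168821)],
    [(0, 0.19667592859142491), (1, -0.76095631677000497), (2, -0.50806745810016662)],
]

print(per_topic_signs(run1, run2))  # {0: 1, 1: -1, 2: 1}
print(per_topic_signs(run1, run3))  # {0: 1, 1: -1, 2: -1}
```

For these rows the magnitudes match to within floating-point noise and each topic flips as a whole, which points to a consistent per-topic sign change rather than to different topics or a different vocabulary.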

**mujjiga** · Accepted Answer · 2020-09-23T09:39:27+00:00

Under the hood, the LSI model is just an implementation of fast truncated SVD. SVD computes singular vectors (which are eigenvectors of the input matrix multiplied by its own transpose), and these vectors correspond to the topics. However, an eigenvector is still an eigenvector after being multiplied by -1, so the sign may flip depending on how the algorithm is implemented. This is in fact the case with the SVD implementation in the popular LAPACK library, and even with the NumPy implementation.

The sign really does not matter here: an eigenvector multiplied by -1 is still an eigenvector.
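
The sign ambiguity can be seen directly with NumPy. This is a minimal sketch on a small random matrix, not on Gensim's internals: it flips one pair of singular vectors, confirms that the product still reconstructs the same matrix, and then applies one possible sign convention so that both versions agree. The helper `canonical_signs` is a name invented for this example, similar in spirit to scikit-learn's `svd_flip` helper.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 4))

# Thin SVD: A == U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Flip the sign of the second singular pair (column of U, row of Vt).
U_flipped, Vt_flipped = U.copy(), Vt.copy()
U_flipped[:, 1] *= -1
Vt_flipped[1, :] *= -1

# Both factorizations reconstruct A equally well.
print(np.allclose(U @ np.diag(s) @ Vt, A))
print(np.allclose(U_flipped @ np.diag(s) @ Vt_flipped, A))

def canonical_signs(U, Vt):
    """Flip each singular pair so that the largest-magnitude entry in
    each column of U is positive (an arbitrary but repeatable convention)."""
    rows = np.argmax(np.abs(U), axis=0)
    signs = np.sign(U[rows, np.arange(U.shape[1])])
    signs[signs == 0] = 1  # leave all-zero columns untouched
    return U * signs, Vt * signs[:, None]

U_a, Vt_a = canonical_signs(U, Vt)
U_b, Vt_b = canonical_signs(U_flipped, Vt_flipped)

# After normalization, the two versions are identical.
print(np.allclose(U_a, U_b) and np.allclose(Vt_a, Vt_b))
```

If stable signs matter for comparing runs, a convention like this can be applied as a post-processing step; the results are mathematically equivalent either way.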
