The classic example of measuring similarity as a distance is Word Mover's Distance (WMD), as shown for example here: https://markroxor.github.io/gensim/static/notebooks/WMD_tutorial.html. It uses a word2vec model loaded from GoogleNews-vectors-negative300.bin and the sentences D1="Obama speaks to the media in Illinois", D2="The president greets the press in Chicago", D3="Oranges are my favorite fruit". The computed WMD distances are distance(D1,D2)=3.3741 and distance(D1,D3)=4.3802, so we understand that (D1,D2) are more similar than (D1,D3).

What is the threshold value of the WMD distance for deciding that two sentences actually contain almost the same information? Maybe for sentences D1 and D2 the value 3.3741 is too large, and in reality these sentences are different? Such decisions need to be made, for example, when there is a question, a sample of the correct answer, and a student's answer.

Addition after the answer by gojomo: Let's postpone the identification and automatic understanding of logic for later. Consider the case where each of two sentences enumerates objects, or properties and actions of one object, in a positive (non-negated) way, and we need to evaluate how similar the content of the two sentences is.
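For reference, distances like the ones quoted in the question could be computed roughly as in the sketch below. It assumes gensim is installed (its `wmdistance` method also needs an extra optimal-transport dependency, which one depends on the gensim version) and that the GoogleNews vectors file is available locally. The small stopword list and the helper names `preprocess` and `wmd_distances` are stand-ins invented for this illustration, so the exact numbers may differ from those in the tutorial.

```python
# Sentences from the question.
D1 = "Obama speaks to the media in Illinois"
D2 = "The president greets the press in Chicago"
D3 = "Oranges are my favorite fruit"

# Tiny illustrative stopword list (an assumption, not the tutorial's list).
STOPWORDS = {"the", "to", "in", "are", "my", "a", "an", "of"}


def preprocess(sentence):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in sentence.lower().split() if w not in STOPWORDS]


def wmd_distances(model_path="GoogleNews-vectors-negative300.bin"):
    """Return (distance(D1, D2), distance(D1, D3)) using gensim's WMD."""
    # Imported lazily so preprocess() can be used without gensim installed.
    from gensim.models import KeyedVectors

    # Loading the full GoogleNews model takes several GB of memory.
    model = KeyedVectors.load_word2vec_format(model_path, binary=True)
    d1, d2, d3 = preprocess(D1), preprocess(D2), preprocess(D3)
    return model.wmdistance(d1, d2), model.wmdistance(d1, d3)
```

Calling `wmd_distances()` only tells you which pair is closer. It does not say whether either value is "small enough", which is the point of the question.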
Decision that texts or sentences are equivalent in content
Asked by vmk
There is 1 answer
I don't believe there's any absolute threshold that could be used as you wish.
The "Word Mover's Distance" can offer some impressive results in finding highly-similar texts, especially in relative comparison to other candidate texts.
However, its magnitude may be affected by the lengths of the texts, and it has no understanding of grammar or rigorous semantics. Thus subtle negations or contrasts, or statements that a native speaker would consider nonsense, won't be highlighted as very "different" from other statements.
For example, the two phrases "Many historians agree Obama is absolutely positively the best President of the 21st century", and "Many historians agree Obama is absolutely positively not the best President of the 21st century", will be noted as incredibly similar by most measures based on word-statistics, such as Word Mover's Distance. Yet, the insertion of one word means they convey somewhat opposite ideas.
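The effect can be seen in a minimal, dependency-free sketch. Note what it is and is not: it computes the *relaxed* Word Mover's Distance (a cheap lower bound on the exact WMD, where each word simply moves to its nearest word in the other text), and the 2-D word vectors are invented purely for illustration rather than taken from any trained model. The sentences are shortened variants of the ones above.

```python
import math

# Hypothetical 2-D "embeddings", made up for this illustration only.
TOY_VECTORS = {
    "obama": (1.0, 0.2), "president": (0.9, 0.3), "is": (0.5, 0.5),
    "the": (0.5, 0.6), "best": (0.6, 0.9), "not": (0.4, 0.4),
    "oranges": (-1.0, 0.8), "are": (0.4, 0.5), "my": (-0.2, 0.1),
    "favorite": (0.5, 1.0), "fruit": (-0.9, 0.9),
}


def one_way_cost(src, dst):
    """Average distance from each distinct source word to its nearest target word."""
    src_words, dst_words = set(src), set(dst)
    total = 0.0
    for w in src_words:
        total += min(math.dist(TOY_VECTORS[w], TOY_VECTORS[v]) for v in dst_words)
    return total / len(src_words)


def relaxed_wmd(a, b):
    """Relaxed WMD: the larger of the two one-directional relaxations."""
    return max(one_way_cost(a, b), one_way_cost(b, a))


POSITIVE = "obama is the best president".split()
NEGATED = "obama is not the best president".split()
UNRELATED = "oranges are my favorite fruit".split()

d_negated = relaxed_wmd(POSITIVE, NEGATED)
d_unrelated = relaxed_wmd(POSITIVE, UNRELATED)
print(d_negated < d_unrelated)  # the negated sentence scores as far "closer"
```

Every word the two sentences share moves at zero cost, so only the single word "not" contributes to the distance, and its contribution is further diluted by the sentence length. No threshold on such a score can separate "same content" from "opposite claim".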