Why does pandas DataFrame.duplicated() return True when row values are duplicated?

Question

160 views Asked by user19023975 At 26 November 2023 at 04:02

To my knowledge, "ValueError: cannot reindex on an axis with duplicate labels" means that two or more index labels (or column labels) share the same name, so pandas cannot decide which rows or columns to use.

However, when I create a DataFrame whose rows all hold the same values, duplicates still seem to be reported even though the labels are unique.

import numpy as np
import pandas as pd

test = pd.DataFrame(data=np.arange(12).reshape(4, 3), index=np.arange(4), columns=np.arange(3))
test.duplicated()

returns False for every row,

while

test = pd.DataFrame(data=np.zeros(12).reshape(4, 3), index=np.arange(4), columns=np.arange(3))
test.duplicated()

returns True for every row except the first.

What am I misunderstanding about the behavior of a pandas DataFrame?

Thanks.

I want to know my misunderstanding ^_^


There is 1 answer

**Pawan Tolani** · Accepted Answer · 2023-11-27T08:39:59+00:00

duplicated() compares the values in each row, not the index or column labels, so it is unrelated to the "duplicate labels" ValueError. By default (keep='first'), the first occurrence of a repeated row is marked False and every later occurrence is marked True. In other words, the first occurrence is treated as the original and all the others as duplicates.

It returns False for every row in the first example because no row is repeated. In the second example every row is the same row of zeros, so the first row is the original (hence False) and all the others are duplicates (hence True).
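To make the distinction concrete, here is a minimal sketch that reuses the two frames from the question. The variable names (distinct, zeros, and the *_flags results) are only illustrative, and it assumes a reasonably recent pandas. It shows how the keep argument changes which rows are flagged, and that checking labels is a separate call on the index.

```python
import numpy as np
import pandas as pd

# Four distinct rows with unique default labels 0..3.
distinct = pd.DataFrame(np.arange(12).reshape(4, 3))
# Four identical rows of zeros with the same unique labels.
zeros = pd.DataFrame(np.zeros((4, 3)))

# DataFrame.duplicated() compares row values only.
distinct_flags = distinct.duplicated().tolist()
zeros_first = zeros.duplicated().tolist()             # keep="first" is the default
zeros_last = zeros.duplicated(keep="last").tolist()   # last occurrence is the original
zeros_all = zeros.duplicated(keep=False).tolist()     # flag every repeated row

# Duplicate labels are checked on the index itself.
label_flags = zeros.index.duplicated().tolist()

print(distinct_flags)  # [False, False, False, False]
print(zeros_first)     # [False, True, True, True]
print(zeros_last)      # [True, True, True, False]
print(zeros_all)       # [True, True, True, True]
print(label_flags)     # [False, False, False, False]
```

The last line is the check that relates to the "duplicate labels" error: the labels 0 to 3 are all unique, so nothing is flagged there even though the row values repeat.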
