Given I have this multiindexed dataframe:
>>> import pandas as p
>>> import numpy as np
...
>>> arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo']),
... np.array(['one', 'two', 'one', 'two', 'one', 'two'])]
...
>>> s = p.Series(np.random.randn(6), index=arrays)
>>> s
bar  one   -1.046752
     two    2.035839
baz  one    1.192775
     two    1.774266
foo  one   -1.716643
     two    1.158605
dtype: float64
How should I eliminate the index label bar?
I tried drop:
>>> s1 = s.drop('bar')
>>> s1
baz  one    1.192775
     two    1.774266
foo  one   -1.716643
     two    1.158605
dtype: float64
This looks right, but bar is still there in some odd way:
>>> s1.index
MultiIndex(levels=[[u'bar', u'baz', u'foo'], [u'one', u'two']],
           labels=[[1, 1, 2, 2], [0, 1, 0, 1]])
>>> s1['bar']
Series([], dtype: float64)
>>>
How can I get rid of any residue of this index label?
Definitely looks like a bug.
s1.index.tolist() returns the expected value, without "bar".
s1["bar"] returns an empty Series.
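A minimal sketch of this discrepancy, assuming a recent pandas rather than the version in the question's session. It uses fixed values instead of np.random.randn so the output is reproducible, and builds the index explicitly with `pd.MultiIndex.from_arrays`:

```python
import pandas as pd

# Same labels as the question, with deterministic values.
idx = pd.MultiIndex.from_arrays([["bar", "bar", "baz", "baz", "foo", "foo"],
                                 ["one", "two", "one", "two", "one", "two"]])
s = pd.Series(range(6), index=idx, dtype=float)
s1 = s.drop("bar")

# The index values no longer mention "bar" ...
values = s1.index.tolist()
print(values)  # [('baz', 'one'), ('baz', 'two'), ('foo', 'one'), ('foo', 'two')]

# ... but the first level still lists it as a possible label.
level0 = list(s1.index.levels[0])
print(level0)  # ['bar', 'baz', 'foo']
```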
The standard methods to override this don't seem to work either:
However, as expected, trying to grab a key that was never in the index raises a KeyError:
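A short sketch of that contrast, assuming a recent pandas; "qux" is an arbitrary label chosen here because it never appears in any level:

```python
import pandas as pd

# Same shape as the question's Series, with deterministic values.
idx = pd.MultiIndex.from_arrays([["bar", "bar", "baz", "baz", "foo", "foo"],
                                 ["one", "two", "one", "two", "one", "two"]])
s1 = pd.Series(range(6), index=idx, dtype=float).drop("bar")

# "qux" is not in any level of the index, so the lookup fails outright.
raised = False
try:
    s1["qux"]
except KeyError:
    raised = True
print(raised)  # True
```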
The main difference shows up when you look at the source code for the two in pandas.core.index.py.
So index.tolist() and _labels aren't reading the same piece of shared information; in fact, they aren't even close.
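In current pandas the private `_labels` shown above is exposed as the public `MultiIndex.codes` (renamed from `labels` in pandas 0.24). A sketch, under that assumption, that puts the codes, the levels and `tolist()` side by side:

```python
import pandas as pd

idx = pd.MultiIndex.from_arrays([["bar", "bar", "baz", "baz", "foo", "foo"],
                                 ["one", "two", "one", "two", "one", "two"]])
s1 = pd.Series(range(6), index=idx, dtype=float).drop("bar")

# Codes are integer positions into the levels; position 0 of level 0 is "bar".
level0_codes = [int(c) for c in s1.index.codes[0]]
print(level0_codes)              # [1, 1, 2, 2]
print(list(s1.index.levels[0]))  # ['bar', 'baz', 'foo']

# tolist() resolves each code to its label, so only labels that are
# actually referenced by a code show up.
print(s1.index.tolist())
```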
We can use this to update the resulting index manually.
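One way to do that, sketched here as an assumption rather than the exact code from this answer, is to rebuild the MultiIndex from the tuples that `tolist()` returns, so the levels are derived only from labels that are present:

```python
import pandas as pd

idx = pd.MultiIndex.from_arrays([["bar", "bar", "baz", "baz", "foo", "foo"],
                                 ["one", "two", "one", "two", "one", "two"]])
s1 = pd.Series(range(6), index=idx, dtype=float).drop("bar")

# from_tuples factorizes each position again, so unused labels are not carried over.
s1.index = pd.MultiIndex.from_tuples(s1.index.tolist(), names=s1.index.names)
print(list(s1.index.levels[0]))  # ['baz', 'foo']
print(s1.tolist())               # [2.0, 3.0, 4.0, 5.0]
```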
If we compare this to the initial MultiIndex, we get:
So the _levels attribute isn't updated, while the values are.
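A sketch of that comparison, assuming a recent pandas and using the public `levels` attribute in place of `_levels`:

```python
import pandas as pd

idx = pd.MultiIndex.from_arrays([["bar", "bar", "baz", "baz", "foo", "foo"],
                                 ["one", "two", "one", "two", "one", "two"]])
s = pd.Series(range(6), index=idx, dtype=float)
s1 = s.drop("bar")

# The levels are carried over unchanged from the original index ...
same_levels = all(a.equals(b) for a, b in zip(s.index.levels, s1.index.levels))
# ... while the index values shrink from six tuples to four.
print(same_levels, len(s.index), len(s1.index))  # True 6 4
```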
EDIT: Overriding it wasn't as easy as I thought.
EDIT: I wrote a custom function to fix this behavior.
Example usage:
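The original function is not reproduced here. As a hedged sketch, a hypothetical `drop_label` helper could drop the label and then prune unused levels with `MultiIndex.remove_unused_levels()`, which exists in pandas 0.20 and later:

```python
import pandas as pd

def drop_label(obj, label, level=None):
    """Hypothetical helper: drop `label`, then prune levels that no code refers to."""
    result = obj.drop(label, level=level)
    if isinstance(result.index, pd.MultiIndex):
        # remove_unused_levels returns a new MultiIndex; `obj` is left untouched.
        result.index = result.index.remove_unused_levels()
    return result

idx = pd.MultiIndex.from_arrays([["bar", "bar", "baz", "baz", "foo", "foo"],
                                 ["one", "two", "one", "two", "one", "two"]])
s = pd.Series(range(6), index=idx, dtype=float)

s1 = drop_label(s, "bar")
print(list(s1.index.levels[0]))  # ['baz', 'foo']
```

The same helper also accepts a DataFrame, since both Series.drop and DataFrame.drop take the label positionally and drop rows by default.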
EDIT: This seems to be specific to DataFrames/Series with a MultiIndex, as the standard pandas.core.index.Index class does not have the same limitation. I would recommend filing a bug report.
Consider the same series with a standard index:
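A minimal sketch of the flat-index case, assuming a recent pandas and using fixed values:

```python
import pandas as pd

s2 = pd.Series([1.0, 2.0, 3.0], index=["bar", "baz", "foo"])
s3 = s2.drop("bar")

# A flat Index has no separate levels, so the dropped label is simply gone.
print("bar" in s3.index)  # False

raised = False
try:
    s3["bar"]
except KeyError:
    raised = True
print(raised)  # True
```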
The same is true for a DataFrame:
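A sketch of the DataFrame case with a MultiIndex on the rows, assuming a recent pandas; the column names "A" and "B" are arbitrary:

```python
import pandas as pd

idx = pd.MultiIndex.from_arrays([["bar", "bar", "baz", "baz", "foo", "foo"],
                                 ["one", "two", "one", "two", "one", "two"]])
df = pd.DataFrame({"A": range(6), "B": range(6, 12)}, index=idx)
df1 = df.drop("bar")

# The dropped label survives in the row levels, exactly as with the Series.
before = list(df1.index.levels[0])
print(before)  # ['bar', 'baz', 'foo']

# remove_unused_levels (pandas 0.20+) prunes it.
df1.index = df1.index.remove_unused_levels()
after = list(df1.index.levels[0])
print(after)   # ['baz', 'foo']
```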