How can I convert a list to a numpy array for filtering elements?


I have a list of floats and I would like to convert it to a numpy array so that I can use numpy.where() to get the indices of the elements that are greater than 0.0 (not zero).

I tried this, but with no luck:

import numpy as np

arr = np.asarray(enumerate(grade_list))
g_indices = np.where(arr[1] > 0)[0]

Edit:

is dtype=float needed?

There are 6 answers

2
Martin Thoma On BEST ANSWER

You don't need numpy arrays to filter lists.

List comprehensions

List comprehensions are a really powerful tool to write readable and short code:

grade_list = [1, 2, 3, 4, 4, 5, 4, 3, 1, 6, 0, -1, 6, 3]
indices = [index for index, grade in enumerate(grade_list) if grade > 0.0]
print(indices)

gives [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 13]. This is a standard Python list. This list can be converted to a numpy array afterwards, if necessary.

Numpy

If you really want to use numpy.where, you should skip the enumerate:

import numpy
grade_list = [1, 2, 3, 4, 4, 5, 4, 3, 1, 6, 0, -1, 6, 3]
grade_list_np = numpy.array(grade_list)
indices = numpy.where(grade_list_np > 0.0)[0]
print(indices)

gives [ 0  1  2  3  4  5  6  7  8  9 12 13].
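The trailing [0] is there because numpy.where, when called with only a condition, returns a tuple with one index array per dimension of the input. A minimal sketch of that, reusing the list above (the names result and indices are only illustrative):

```python
import numpy

grade_list = [1, 2, 3, 4, 4, 5, 4, 3, 1, 6, 0, -1, 6, 3]
grade_list_np = numpy.array(grade_list)

# With only a condition, numpy.where returns a tuple holding
# one index array per dimension of the input.
result = numpy.where(grade_list_np > 0.0)
print(type(result), len(result))

# For a 1-D input the tuple has exactly one entry, hence the [0].
indices = result[0]
print(indices.tolist())
```

For a 2-D input the tuple would hold a row-index array and a column-index array, so [0] alone would only give the rows.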

Performance comparison

If you only need this for a small list (e.g. fewer than 100 elements), the list comprehension is the fastest way to do it. For a list of length 1000, numpy.where is significantly faster than building the indices with a list comprehension and then converting them to a numpy array, and slightly faster than the plain list comprehension:

numpy.where (|L| = 1000): 13.5045940876
list_comprehension_np (|L| = 1000): 27.2982738018
list_comprehension (|L| = 1000): 15.2280910015

These results (total seconds for 100000 calls each, as reported by timeit) were created with the following script:

#! /usr/bin/env python
# -*- coding: utf-8 -*-

import random
import timeit
import numpy


def filtered_list_comprehension(grade_list):
    return [index for index, grade in enumerate(grade_list) if grade > 0.3]


def filtered_list_comprehension_np(grade_list):
    return numpy.array([index for index, grade in enumerate(grade_list)
                        if grade > 0.3])


def filtered_numpy(grade_list):
    grade_list_np = numpy.array(grade_list)
    return numpy.where(grade_list_np > 0.3)[0]

list_elements = 1000  # list length used for the results shown above
grade_list = [random.random() for i in range(list_elements)]

res = timeit.timeit('filtered_numpy(grade_list)',
                    number=100000,
                    setup="from __main__ import grade_list, filtered_numpy")
print("numpy.where (|L| = %i): %s" % (list_elements, str(res)))
res = timeit.timeit('filtered_list_comprehension_np(grade_list)',
                    number=100000,
                    setup="from __main__ import grade_list, filtered_list_comprehension_np")
print("list_comprehension_np (|L| = %i): %s" % (list_elements, str(res)))
res = timeit.timeit('filtered_list_comprehension(grade_list)',
                    number=100000,
                    setup="from __main__ import grade_list, filtered_list_comprehension")
print("list_comprehension (|L| = %i): %s" % (list_elements, str(res)))
0
camz On

Try converting the enumerate object to a list first.

I did:

np.asarray(list(enumerate([1, 2, 3])))
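To illustrate why the list() call matters, here is a rough sketch with made-up sample data (the names wrapped, pairs and the sample values are assumptions, not from the question). It also shows one way to get the indices out of the resulting two-column array:

```python
import numpy as np

grade_list = [0.5, 0.0, 1.5, -2.0]  # hypothetical sample data

# An enumerate object is an iterator, so NumPy wraps it whole
# in a 0-dimensional object array instead of iterating over it.
wrapped = np.asarray(enumerate(grade_list))
print(wrapped.ndim, wrapped.dtype)

# After list(), NumPy sees (index, value) pairs and builds an (n, 2) array.
pairs = np.asarray(list(enumerate(grade_list)))
print(pairs.shape)

# Column 1 holds the values, column 0 the indices (stored as floats here).
g_indices = pairs[pairs[:, 1] > 0, 0].astype(int)
print(g_indices.tolist())
```

Note that arr[1] on such an array is the second (index, value) pair, not the column of values, which is why the sketch slices with [:, 1].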
0
yvespeirsman On

You don't need the enumerate():

arr = np.asarray(grade_list)
g_indices = np.where(arr > 0)[0]
0
farhawa On

You are over-complicating it:

import numpy as np

grade_list_as_array = np.array(grade_list)
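Regarding the edit about dtype=float: NumPy infers the dtype from the list contents, so for a list of floats it should not be needed. A small sketch with made-up sample values (float_grades and int_grades are hypothetical names):

```python
import numpy as np

float_grades = [0.5, 0.0, 1.5]  # hypothetical sample data
int_grades = [1, 0, 2]          # hypothetical sample data

# A list of Python floats is inferred as float64 without dtype=float.
float_arr = np.array(float_grades)
print(float_arr.dtype)

# A list of Python ints gets an integer dtype by default ...
int_arr = np.array(int_grades)
print(int_arr.dtype.kind)

# ... and dtype=float forces a float array if that is what you want.
forced_arr = np.array(int_grades, dtype=float)
print(forced_arr.dtype)
```

The comparison arr > 0 behaves the same for integer and float arrays, so the dtype only matters if later code relies on float values.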
0
tmdavison On

np.array works here (np.asarray would too for a plain list); the actual problem is the enumerate, which you don't need:

import numpy as np

grade_list = [0, 1, 2, 3, 2, 1, 2, 3, 1, 0, 2, 4]
arr = np.array(grade_list)

g_indices = np.where(arr > 0)[0]

print(g_indices)

>>> [ 1  2  3  4  5  6  7  8 10 11]
0
rjonnal On

The enumerate is superfluous. If you truly have a list of floats, this will work:

import numpy as np 
arr = np.array(grade_list)
g_indices = np.where(arr > 0)[0]

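Since 0.0 evaluates to False in a boolean context, np.where(arr)[0] without the > 0 returns the indices of all nonzero elements. That matches arr > 0 only if the list contains no negative values, so keep the comparison if negatives can occur.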

But if you have a nested list, or a list of tuples, it won't work as shown. We may need to know more about your list.
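To make both caveats concrete, a minimal sketch with made-up data (the sample values and the names nonzero_idx, positive_idx, nested and row_idx are assumptions for illustration):

```python
import numpy as np

grade_list = [0.5, 0.0, -1.0, 2.0]  # hypothetical sample data
arr = np.array(grade_list)

# Without a comparison, where() reports every nonzero element,
# so the negative value at index 2 is included.
nonzero_idx = np.where(arr)[0]
print(nonzero_idx.tolist())

# With the comparison, only strictly positive values are reported.
positive_idx = np.where(arr > 0)[0]
print(positive_idx.tolist())

# A list of tuples becomes a 2-D array; [0] then yields row indices,
# repeated once per matching element in that row.
nested = np.array([(1.0, 2.0), (0.0, 3.0)])
row_idx = np.where(nested > 0)[0]
print(row_idx.tolist())
```

In the nested case the result is no longer one index per list element, so the list would need to be flattened or indexed by column first.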