Asked  7 Months ago    Answers:  5   Viewed   32 times

Is there a numpy-thonic way, e.g. function, to find the nearest value in an array?

Example:

np.find_nearest( array, value )

 Answers

51
import numpy as np
def find_nearest(array, value):
    array = np.asarray(array)
    idx = (np.abs(array - value)).argmin()
    return array[idx]

array = np.random.random(10)
print(array)
# [ 0.21069679  0.61290182  0.63425412  0.84635244  0.91599191  0.00213826
#   0.17104965  0.56874386  0.57319379  0.28719469]

value = 0.5

print(find_nearest(array, value))
# 0.568743859261
Tuesday, June 1, 2021
 
JakeGR
answered 7 Months ago
90

scipy.spatial also has a k-d tree implementation: scipy.spatial.KDTree.

The approach is generally to first use the point data to build up a k-d tree. The computational complexity of that is on the order of N log N, where N is the number of data points. Range queries and nearest neighbour searches can then be done with log N complexity. This is much more efficient than simply cycling through all points (complexity N).

Thus, if you have repeated range or nearest neighbor queries, a k-d tree is highly recommended.

Thursday, June 3, 2021
 
francadaval
answered 7 Months ago
35

EDIT: Have adjusted the queries below to convert to using long arithmetic, so that we avoid overflow issues.

I would probably use MoreLINQ's MinBy method:

var nearest = array.MinBy(x => Math.Abs((long) x - targetNumber));

Or you could just use:

var nearest = array.OrderBy(x => Math.Abs((long) x - targetNumber)).First();

... but that will sort the whole collection, which you really don't need. It won't make much difference for a small array, admittedly... but it just doesn't feel quite right, compared with describing what you're actually trying to do: find the element with the minimum value according to some function.

Note that both of these will fail if the array is empty, so you should check for that first.

Wednesday, June 9, 2021
 
Gersom
answered 6 Months ago
12

Some example data:

# some example data
np.random.seed(0)
n_values = 1000000
n_items = 100000
values = np.random.rand(n_values)
items = np.random.rand(n_items)
values.sort()
items.sort()

Your original code snippet as well as an implementation of @PeterE's suggestion:

def original(values, items):
    l = np.empty(items.size, dtype=np.int32)
    k, K = 0, len(values)
    for i, item in enumerate(items):
        while k < K and values[k] < item:
            k += 1
        l[i] = k
    return l

def peter_e(values, items):
    l = np.empty(items.size, dtype=np.int32)
    last_idx = 0
    for i, item in enumerate(items):
        last_idx += values[last_idx:].searchsorted(item)
        l[i] = last_idx
    return l

Test for correctness against naive np.searchsorted:

ss = values.searchsorted(items)

print(all(original(values, items) == ss))
# True

print(all(peter_e(values, items) == ss))
# True

Timings:

In [1]: %timeit original(values, items)
10 loops, best of 3: 115 ms per loop

In [2]: %timeit peter_e(values, items)
10 loops, best of 3: 79.8 ms per loop

In [3]: %timeit values.searchsorted(items)
100 loops, best of 3: 4.09 ms per loop

So for inputs of this size, naive use of np.searchsorted handily beats your original code, as well as PeterE's suggestion.

Update

To avoid any caching effects that might skew the timings, we can generate a new set of random input arrays for each iteration of the benchmark:

In [1]: %%timeit values = np.random.randn(n_values); items = np.random.randn(n_items); values.sort(); items.sort();
original(values, items)
   .....: 
10 loops, best of 3: 115 ms per loop

In [2]: %%timeit values = np.random.randn(n_values); items = np.random.randn(n_items); values.sort(); items.sort();
peter_e(values, items)
   .....: 
10 loops, best of 3: 79.9 ms per loop

In [3]: %%timeit values = np.random.randn(n_values); items = np.random.randn(n_items); values.sort(); items.sort();
values.searchsorted(items)
   .....: 
100 loops, best of 3: 4.08 ms per loop

Update 2

It's not that hard to write a Cython function that will beat np.searchsorted for the case where both values and items are sorted.

search_doubly_sorted.pyx:

import numpy as np
cimport numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def search_doubly_sorted(values, items):

    cdef:
        double[:] _values = values.astype(np.double)
        double[:] _items = items.astype(np.double)
        long n_items = items.shape[0]
        long n_values = values.shape[0]
        long[:] out = np.empty(n_items, dtype=np.int64)
        long ii, jj, last_idx

    last_idx = 0
    for ii in range(n_items):
        for jj in range(last_idx, n_values):
             if _items[ii] <= _values[jj]:
                break
        last_idx = jj
        out[ii] = last_idx

    return out.base

Test for correctness:

In [1]: from search_doubly_sorted import search_doubly_sorted

In [2]: print(all(search_doubly_sorted(values, items) == values.searchsorted(items)))                     
# True

Benchmark:

In [3]: %timeit values.searchsorted(items)
100 loops, best of 3: 4.07 ms per loop

In [4]: %timeit search_doubly_sorted(values, items)
1000 loops, best of 3: 1.44 ms per loop

The performance improvement is fairly marginal, though. Unless this is a serious bottleneck in your code then you should probably stick with np.searchsorted.

Friday, August 6, 2021
 
noir
answered 4 Months ago
40

How about this solution:

1) Identify all the locations of the upper-right left-hand element of the small array in the big array.

2) Check if the slice of the big array that corresponds to a every given element is exactly the same as the small array.

Say if the upper left-hand corner element of the slice is 5, we would find locations of 5 in the big array, and then go check if a slice of the big array to the bottom-left of 5 is the same as small array.

import numpy as np

a = np.array([[0,1,5,6,7],
              [0,4,5,6,8],
              [2,3,5,7,9]])

b = np.array([[5,6],
              [5,7]])

b2 = np.array([[6,7],
               [6,8],
               [7,9]])

def check(a, b, upper_left):
    ul_row = upper_left[0]
    ul_col = upper_left[1]
    b_rows, b_cols = b.shape
    a_slice = a[ul_row : ul_row + b_rows, :][:, ul_col : ul_col + b_cols]
    if a_slice.shape != b.shape:
        return False
    return (a_slice == b).all()

def find_slice(big_array, small_array):
    upper_left = np.argwhere(big_array == small_array[0,0])
    for ul in upper_left:
        if check(big_array, small_array, ul):
            return True
    else:
        return False

Result:

>>> find_slice(a, b)
True
>>> find_slice(a, b2)
True
>>> find_slice(a, np.array([[5,6], [5,8]]))
False
Sunday, October 3, 2021
 
clean_coding
answered 2 Months ago
Only authorized users can answer the question. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :  
Share