# Find nearest value in numpy array

Is there a numpy-thonic way, e.g. a function, to find the nearest value in an array?

Example:

``````
np.find_nearest(array, value)
``````

``````
import numpy as np

def find_nearest(array, value):
    array = np.asarray(array)
    idx = (np.abs(array - value)).argmin()
    return array[idx]

array = np.random.random(10)
print(array)
# [ 0.21069679  0.61290182  0.63425412  0.84635244  0.91599191  0.00213826
#   0.17104965  0.56874386  0.57319379  0.28719469]

value = 0.5

print(find_nearest(array, value))
# 0.568743859261
``````
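If the array is already sorted, a bisection-based variant (a sketch using `np.searchsorted`, not part of the answer above) finds the nearest value in O(log n) rather than scanning every element:

```python
import numpy as np

def find_nearest_sorted(array, value):
    # `array` must be sorted ascending; np.searchsorted bisects in O(log n)
    idx = np.searchsorted(array, value)
    # clamp so that both neighbours idx-1 and idx are valid indices
    idx = np.clip(idx, 1, len(array) - 1)
    left, right = array[idx - 1], array[idx]
    # return whichever neighbour is closer to `value`
    return left if value - left < right - value else right
```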
Tuesday, June 1, 2021


`scipy.spatial` also has a k-d tree implementation: `scipy.spatial.KDTree`.

The approach is generally to first use the point data to build up a k-d tree. The computational complexity of that is on the order of N log N, where N is the number of data points. Range queries and nearest neighbour searches can then be done with log N complexity. This is much more efficient than simply cycling through all points (complexity N).

Thus, if you have repeated range or nearest neighbor queries, a k-d tree is highly recommended.
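As a minimal sketch of that workflow (the point set and query below are made up for illustration):

```python
import numpy as np
from scipy.spatial import KDTree

# build the tree once from the point data: O(N log N)
points = np.array([[0.0, 0.0],
                   [1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])
tree = KDTree(points)

# each nearest-neighbour query is then O(log N)
dist, idx = tree.query([0.9, 0.8])
print(points[idx])  # the point nearest to (0.9, 0.8)
```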

Thursday, June 3, 2021


EDIT: I've adjusted the queries below to use `long` arithmetic, so that we avoid overflow issues.

I would probably use MoreLINQ's `MinBy` method:

``````
var nearest = array.MinBy(x => Math.Abs((long) x - targetNumber));
``````

Or you could just use:

``````
var nearest = array.OrderBy(x => Math.Abs((long) x - targetNumber)).First();
``````

... but that will sort the whole collection, which you really don't need. It won't make much difference for a small array, admittedly... but it just doesn't feel quite right, compared with describing what you're actually trying to do: find the element with the minimum value according to some function.

Note that both of these will fail if the array is empty, so you should check for that first.

Wednesday, June 9, 2021


Some example data:

``````
# some example data
np.random.seed(0)
n_values = 1000000
n_items = 100000
values = np.random.rand(n_values)
items = np.random.rand(n_items)
values.sort()
items.sort()
``````

Your original code snippet as well as an implementation of @PeterE's suggestion:

``````
def original(values, items):
    l = np.empty(items.size, dtype=np.int32)
    k, K = 0, len(values)
    for i, item in enumerate(items):
        while k < K and values[k] < item:
            k += 1
        l[i] = k
    return l

def peter_e(values, items):
    l = np.empty(items.size, dtype=np.int32)
    last_idx = 0
    for i, item in enumerate(items):
        last_idx += values[last_idx:].searchsorted(item)
        l[i] = last_idx
    return l
``````

Test for correctness against naive `np.searchsorted`:

``````
ss = values.searchsorted(items)

print(all(original(values, items) == ss))
# True

print(all(peter_e(values, items) == ss))
# True
``````

Timings:

``````
In [1]: %timeit original(values, items)
10 loops, best of 3: 115 ms per loop

In [2]: %timeit peter_e(values, items)
10 loops, best of 3: 79.8 ms per loop

In [3]: %timeit values.searchsorted(items)
100 loops, best of 3: 4.09 ms per loop
``````

So for inputs of this size, naive use of `np.searchsorted` handily beats your original code, as well as PeterE's suggestion.

## Update

To avoid any caching effects that might skew the timings, we can generate a new set of random input arrays for each iteration of the benchmark:

``````
In [1]: %%timeit values = np.random.randn(n_values); items = np.random.randn(n_items); values.sort(); items.sort();
original(values, items)
.....:
10 loops, best of 3: 115 ms per loop

In [2]: %%timeit values = np.random.randn(n_values); items = np.random.randn(n_items); values.sort(); items.sort();
peter_e(values, items)
.....:
10 loops, best of 3: 79.9 ms per loop

In [3]: %%timeit values = np.random.randn(n_values); items = np.random.randn(n_items); values.sort(); items.sort();
values.searchsorted(items)
.....:
100 loops, best of 3: 4.08 ms per loop
``````

## Update 2

It's not that hard to write a Cython function that will beat `np.searchsorted` for the case where both `values` and `items` are sorted.

`search_doubly_sorted.pyx`:

``````
import numpy as np
cimport numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def search_doubly_sorted(values, items):

    cdef:
        double[:] _values = values.astype(np.double)
        double[:] _items = items.astype(np.double)
        long n_items = items.shape[0]
        long n_values = values.shape[0]
        long[:] out = np.empty(n_items, dtype=np.int64)
        long ii, jj, last_idx

    last_idx = 0
    for ii in range(n_items):
        for jj in range(last_idx, n_values):
            if _items[ii] <= _values[jj]:
                break
        last_idx = jj
        out[ii] = last_idx

    return out.base
``````
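For completeness, the extension can be compiled with a minimal build script (a sketch; it assumes Cython is installed and uses the `.pyx` file name above):

```python
# setup.py -- build the Cython extension in-place with:
#   python setup.py build_ext --inplace
from setuptools import setup
from Cython.Build import cythonize
import numpy as np

setup(
    ext_modules=cythonize("search_doubly_sorted.pyx"),
    include_dirs=[np.get_include()],  # headers for `cimport numpy`
)
```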

Test for correctness:

``````
In [1]: from search_doubly_sorted import search_doubly_sorted

In [2]: print(all(search_doubly_sorted(values, items) == values.searchsorted(items)))
# True
``````

Benchmark:

``````
In [3]: %timeit values.searchsorted(items)
100 loops, best of 3: 4.07 ms per loop

In [4]: %timeit search_doubly_sorted(values, items)
1000 loops, best of 3: 1.44 ms per loop
``````

The performance improvement is fairly marginal, though. Unless this is a serious bottleneck in your code then you should probably stick with `np.searchsorted`.

Friday, August 6, 2021


1) Identify all the locations of the upper left-hand element of the small array within the big array.

2) For each such location, check whether the slice of the big array anchored there matches the small array exactly.

Say the upper left-hand element of the small array is 5: we find every 5 in the big array, then check whether the slice extending down and to the right of each 5 is the same as the small array.

``````
import numpy as np

a = np.array([[0, 1, 5, 6, 7],
              [0, 4, 5, 6, 8],
              [2, 3, 5, 7, 9]])

b = np.array([[5, 6],
              [5, 7]])

b2 = np.array([[6, 7],
               [6, 8],
               [7, 9]])

def check(a, b, upper_left):
    ul_row = upper_left[0]
    ul_col = upper_left[1]
    b_rows, b_cols = b.shape
    a_slice = a[ul_row : ul_row + b_rows, ul_col : ul_col + b_cols]
    if a_slice.shape != b.shape:
        return False
    return (a_slice == b).all()

def find_slice(big_array, small_array):
    upper_left = np.argwhere(big_array == small_array[0, 0])
    for ul in upper_left:
        if check(big_array, small_array, ul):
            return True
    return False
``````

Result:

``````
>>> find_slice(a, b)
True
>>> find_slice(a, b2)
True
>>> find_slice(a, np.array([[5,6], [5,8]]))
False
``````
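On NumPy 1.20+, the same sub-array search can also be sketched without an explicit Python loop, using `sliding_window_view` (an alternative approach, not part of the answer above):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def find_slice_windows(big, small):
    # bail out early if `small` cannot fit inside `big` at all
    if big.shape[0] < small.shape[0] or big.shape[1] < small.shape[1]:
        return False
    # every possible small-shaped window of `big`, as a 4-D view (no copy)
    windows = sliding_window_view(big, small.shape)
    # a match is any window that equals `small` element-wise
    return bool((windows == small).all(axis=(2, 3)).any())
```

Note this compares every window rather than only those whose corner matches `small[0, 0]`, so it trades extra comparisons for vectorisation.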
Sunday, October 3, 2021