Is there a numpy-thonic way, e.g. function, to find the *nearest value* in an array?

Example:

```
np.find_nearest( array, value )
```


51

Tuesday, June 1, 2021

answered 7 Months ago

JakeGR

90

`scipy.spatial` also has a k-d tree implementation: `scipy.spatial.KDTree`.

The approach is generally to first use the point data to build up a k-d tree. The computational complexity of that is on the order of N log N, where N is the number of data points. Range queries and nearest neighbour searches can then be done with log N complexity. This is much more efficient than simply cycling through all points (complexity N).

Thus, if you have repeated range or nearest neighbor queries, a k-d tree is highly recommended.
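A minimal sketch of that build-once, query-many workflow (the data and query points here are illustrative, assuming 2-D points; `KDTree` works in any dimension):

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(0)
points = rng.random((1000, 2))     # 1000 random points in 2-D

tree = KDTree(points)              # build once: O(N log N)

# nearest-neighbour query: O(log N) per lookup
dist, idx = tree.query([0.5, 0.5])
nearest_point = points[idx]

# range query: indices of all points within radius 0.1 of the query
inside = tree.query_ball_point([0.5, 0.5], r=0.1)
```

Building the tree dominates the cost, so this only pays off when the same tree serves many queries.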

---

EDIT: Have adjusted the queries below to use `long` arithmetic, so that we avoid overflow issues.

I would probably use MoreLINQ's `MinBy` method:

```
var nearest = array.MinBy(x => Math.Abs((long) x - targetNumber));
```

Or you *could* just use:

```
var nearest = array.OrderBy(x => Math.Abs((long) x - targetNumber)).First();
```

... but that will sort the whole collection, which you really don't need. It won't make much difference for a *small* array, admittedly... but it just doesn't feel quite right, compared with describing what you're *actually* trying to do: find the element with the minimum value according to some function.

Note that both of these will fail if the array is empty, so you should check for that first.
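For what it's worth, the numpy analogue of this min-by idea answers the original question in one line: take the argmin of the absolute differences. A sketch, assuming a 1-D array; `find_nearest` is a made-up name echoing the question, not an actual numpy function:

```python
import numpy as np

def find_nearest(array, value):
    """Return the element of `array` closest to `value`."""
    array = np.asarray(array)
    idx = np.abs(array - value).argmin()  # index of the minimal |difference|
    return array[idx]

# pure-Python equivalent of the MinBy approach:
# nearest = min(array, key=lambda x: abs(x - value))
```

Like the LINQ versions, this scans the whole array (O(N)) and fails on an empty input.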

---

Some example data:

```
# some example data
np.random.seed(0)
n_values = 1000000
n_items = 100000
values = np.random.rand(n_values)
items = np.random.rand(n_items)
values.sort()
items.sort()
```

Your original code snippet as well as an implementation of @PeterE's suggestion:

```
def original(values, items):
    l = np.empty(items.size, dtype=np.int32)
    k, K = 0, len(values)
    for i, item in enumerate(items):
        while k < K and values[k] < item:
            k += 1
        l[i] = k
    return l

def peter_e(values, items):
    l = np.empty(items.size, dtype=np.int32)
    last_idx = 0
    for i, item in enumerate(items):
        last_idx += values[last_idx:].searchsorted(item)
        l[i] = last_idx
    return l
```

Test for correctness against naive `np.searchsorted`:

```
ss = values.searchsorted(items)
print(all(original(values, items) == ss))
# True
print(all(peter_e(values, items) == ss))
# True
```

Timings:

```
In [1]: %timeit original(values, items)
10 loops, best of 3: 115 ms per loop
In [2]: %timeit peter_e(values, items)
10 loops, best of 3: 79.8 ms per loop
In [3]: %timeit values.searchsorted(items)
100 loops, best of 3: 4.09 ms per loop
```

So for inputs of this size, naive use of `np.searchsorted` handily beats your original code, as well as PeterE's suggestion.

To avoid any caching effects that might skew the timings, we can generate a new set of random input arrays for each iteration of the benchmark:

```
In [1]: %%timeit values = np.random.randn(n_values); items = np.random.randn(n_items); values.sort(); items.sort();
original(values, items)
.....:
10 loops, best of 3: 115 ms per loop
In [2]: %%timeit values = np.random.randn(n_values); items = np.random.randn(n_items); values.sort(); items.sort();
peter_e(values, items)
.....:
10 loops, best of 3: 79.9 ms per loop
In [3]: %%timeit values = np.random.randn(n_values); items = np.random.randn(n_items); values.sort(); items.sort();
values.searchsorted(items)
.....:
100 loops, best of 3: 4.08 ms per loop
```

It's not that hard to write a Cython function that will beat `np.searchsorted` for the case where both `values` and `items` are sorted.

`search_doubly_sorted.pyx`:

```
import numpy as np
cimport numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def search_doubly_sorted(values, items):
    cdef:
        double[:] _values = values.astype(np.double)
        double[:] _items = items.astype(np.double)
        long n_items = items.shape[0]
        long n_values = values.shape[0]
        long[:] out = np.empty(n_items, dtype=np.int64)
        long ii, jj, last_idx

    last_idx = 0
    for ii in range(n_items):
        for jj in range(last_idx, n_values):
            if _items[ii] <= _values[jj]:
                break
        last_idx = jj
        out[ii] = last_idx
    return out.base
```

Test for correctness:

```
In [1]: from search_doubly_sorted import search_doubly_sorted
In [2]: print(all(search_doubly_sorted(values, items) == values.searchsorted(items)))
# True
```

Benchmark:

```
In [3]: %timeit values.searchsorted(items)
100 loops, best of 3: 4.07 ms per loop
In [4]: %timeit search_doubly_sorted(values, items)
1000 loops, best of 3: 1.44 ms per loop
```

The performance improvement is fairly marginal, though. Unless this is a serious bottleneck in your code, you should probably stick with `np.searchsorted`.
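To tie this back to the original nearest-value question: when the array is sorted, `np.searchsorted` also gives an O(log N) nearest lookup once you compare the two candidate neighbours. A sketch; `nearest_sorted` is an illustrative helper, assuming a sorted 1-D array with at least two elements:

```python
import numpy as np

def nearest_sorted(values, x):
    """Nearest element of the sorted 1-D array `values` to scalar `x`."""
    idx = np.searchsorted(values, x)        # insertion point for x
    idx = np.clip(idx, 1, len(values) - 1)  # keep both neighbours in range
    left, right = values[idx - 1], values[idx]
    return left if x - left <= right - x else right
```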

---

How about this solution:

1) Identify all the locations of the upper left-hand element of the small array in the big array.

2) Check whether the slice of the big array anchored at each of those locations is exactly equal to the small array.

Say the upper left-hand corner element of the small array is 5: we find the locations of 5 in the big array, then check whether the slice of the big array extending down and to the right of each 5 matches the small array.

```
import numpy as np

a = np.array([[0, 1, 5, 6, 7],
              [0, 4, 5, 6, 8],
              [2, 3, 5, 7, 9]])

b = np.array([[5, 6],
              [5, 7]])

b2 = np.array([[6, 7],
               [6, 8],
               [7, 9]])

def check(a, b, upper_left):
    ul_row, ul_col = upper_left
    b_rows, b_cols = b.shape
    a_slice = a[ul_row : ul_row + b_rows, ul_col : ul_col + b_cols]
    if a_slice.shape != b.shape:  # slice runs off the edge of `a`
        return False
    return (a_slice == b).all()

def find_slice(big_array, small_array):
    upper_left = np.argwhere(big_array == small_array[0, 0])
    for ul in upper_left:
        if check(big_array, small_array, ul):
            return True
    return False
```

Result:

```
>>> find_slice(a, b)
True
>>> find_slice(a, b2)
True
>>> find_slice(a, np.array([[5,6], [5,8]]))
False
```
