Most efficient way to find the mode in a numpy array

I have a 2D array containing integers (both positive and negative). Each row represents the values over time for a particular spatial site, whereas each column represents the values for the various spatial sites at a given time.

So if the array looks like:

1 3 4 2 2 7
5 2 2 1 4 1
3 3 2 2 1 1

the result should be

1 3 2 2 2 1

Note that when several values qualify as the mode, any one of them (chosen at random) may be reported as the mode.

I can iterate over the columns and find the mode one column at a time, but I was hoping numpy might have a built-in function for this, or that there is a trick to do it efficiently without looping.

Source: question by Nik | 2013-05-02
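
For reference, the column-by-column loop mentioned above could look like the following minimal sketch. It assumes `np.unique` with `return_counts=True` (NumPy 1.9 or later), and the helper name `column_modes` is hypothetical. Ties are resolved in favour of the smallest value, which is one valid choice under the rule stated above.

```python
import numpy as np

def column_modes(a):
    """Return the mode of each column of a 2D array (hypothetical helper)."""
    a = np.asarray(a)
    modes = np.empty(a.shape[1], dtype=a.dtype)
    for j in range(a.shape[1]):
        # np.unique returns the sorted distinct values and their counts
        values, counts = np.unique(a[:, j], return_counts=True)
        # argmax picks the first maximum, i.e. the smallest value among ties
        modes[j] = values[np.argmax(counts)]
    return modes

a = np.array([[1, 3, 4, 2, 2, 7],
              [5, 2, 2, 1, 4, 1],
              [3, 3, 2, 2, 1, 1]])
print(column_modes(a))  # [1 3 2 2 1 1]
```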

2d mode numpy python

Check out scipy.stats.mode() (inspired by @tom10's comment):

import numpy as np
from scipy import stats

a = np.array([[1, 3, 4, 2, 2, 7],
              [5, 2, 2, 1, 4, 1],
              [3, 3, 2, 2, 1, 1]])

m = stats.mode(a)
print(m)

Output:

ModeResult(mode=array([[1, 3, 2, 2, 1, 1]]), count=array([[1, 2, 2, 2, 1, 2]]))

As you can see, it returns both the mode and the counts. You can select the modes directly via m[0]:

print(m[0])

Output:

[[1 3 2 2 1 1]]
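
The shape of the returned arrays depends on the SciPy version: older releases keep the reduced axis, as in the output above, while newer ones drop it by default. The following is a minimal sketch, not part of the original answer, that passes `axis=0` explicitly and flattens the result so it behaves the same either way:

```python
import numpy as np
from scipy import stats

a = np.array([[1, 3, 4, 2, 2, 7],
              [5, 2, 2, 1, 4, 1],
              [3, 3, 2, 2, 1, 1]])

# axis=0 computes one mode per column
result = stats.mode(a, axis=0)

# np.ravel gives a flat array whether or not the reduced axis was kept
modes = np.ravel(result.mode)
counts = np.ravel(result.count)
print(modes)   # [1 3 2 2 1 1]
print(counts)  # [1 2 2 2 1 2]
```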

Source: answer by fgb

This is a tricky problem, since there is not much out there for calculating the mode along an axis. The solution is straightforward for 1-D arrays, where numpy.bincount is handy, along with numpy.unique with the return_counts argument set to True. The most common n-dimensional function I see is scipy.stats.mode, although it is prohibitively slow, especially for large arrays with many unique values. As a solution, I developed this function and use it heavily:

import numpy

def mode(ndarray, axis=0):
    # Check inputs
    ndarray = numpy.asarray(ndarray)
    ndim = ndarray.ndim
    if ndarray.size == 1:
        return (ndarray[0], 1)
    elif ndarray.size == 0:
        raise Exception('Cannot compute mode on empty array')
    try:
        axis = range(ndarray.ndim)[axis]
    except (IndexError, TypeError):
        raise Exception('Axis "{}" incompatible with the {}-dimension array'.format(axis, ndim))

    # If array is 1-D and numpy version is >= 1.9 numpy.unique will suffice
    # (compare as a tuple so that e.g. numpy 2.0 is handled correctly)
    version = tuple(int(part) for part in numpy.__version__.split('.')[:2])
    if ndim == 1 and version >= (1, 9):
        modals, counts = numpy.unique(ndarray, return_counts=True)
        index = numpy.argmax(counts)
        return modals[index], counts[index]

    # Sort array
    sort = numpy.sort(ndarray, axis=axis)
    # Create array to transpose along the axis and get padding shape
    transpose = numpy.roll(numpy.arange(ndim)[::-1], axis)
    shape = list(sort.shape)
    shape[axis] = 1
    # Create a boolean array along strides of unique values
    strides = numpy.concatenate([numpy.zeros(shape=shape, dtype='bool'),
                                 numpy.diff(sort, axis=axis) == 0,
                                 numpy.zeros(shape=shape, dtype='bool')],
                                axis=axis).transpose(transpose).ravel()
    # Count the stride lengths
    counts = numpy.cumsum(strides)
    counts[~strides] = numpy.concatenate([[0], numpy.diff(counts[~strides])])
    counts[strides] = 0
    # Get shape of padded counts and slice to return to the original shape
    shape = numpy.array(sort.shape)
    shape[axis] += 1
    shape = shape[transpose]
    slices = [slice(None)] * ndim
    slices[axis] = slice(1, None)
    # Reshape and compute final counts
    # (index with a tuple; a list of slices is not accepted by recent numpy)
    counts = counts.reshape(shape).transpose(transpose)[tuple(slices)] + 1

    # Find maximum counts and return modals/counts
    slices = [slice(None, i) for i in sort.shape]
    del slices[axis]
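    index = list(numpy.ogrid[tuple(slices)])
    index.insert(axis, numpy.argmax(counts, axis=axis))
    # Convert to a tuple, as recent numpy requires for multi-axis indexing
    index = tuple(index)
    return sort[index], counts[index]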

Result:

In [2]: a = numpy.array([[1, 3, 4, 2, 2, 7],
                         [5, 2, 2, 1, 4, 1],
                         [3, 3, 2, 2, 1, 1]])

In [3]: mode(a)
Out[3]: (array([1, 3, 2, 2, 1, 1]), array([1, 2, 2, 2, 1, 2]))
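
The core idea of the function above (sort, then measure the lengths of runs of equal values) may be easier to follow in one dimension. The following is only an illustrative sketch of that idea, not the answer's implementation; the helper name `mode_1d_sorted` is hypothetical and the input is assumed to be non-empty.

```python
import numpy as np

def mode_1d_sorted(x):
    """Mode of a 1-D array via sorting and run lengths (illustrative only)."""
    s = np.sort(np.asarray(x))
    # Indices where a new run of equal values starts
    starts = np.flatnonzero(np.concatenate(([True], s[1:] != s[:-1])))
    # Run lengths are the gaps between consecutive run starts
    lengths = np.diff(np.concatenate((starts, [s.size])))
    longest = np.argmax(lengths)
    return s[starts[longest]], lengths[longest]

value, count = mode_1d_sorted([3, -1, 3, 7, -1, 3])
print(value, count)  # 3 3
```

Because it relies only on sorting and comparisons, this approach also works for floating-point data, where the number of distinct values is typically large.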

Some benchmarks:

In [4]: import scipy.stats

In [5]: a = numpy.random.randint(1,10,(1000,1000))

In [6]: %timeit scipy.stats.mode(a)
10 loops, best of 3: 41.6 ms per loop

In [7]: %timeit mode(a)
10 loops, best of 3: 46.7 ms per loop

In [8]: a = numpy.random.randint(1,500,(1000,1000))

In [9]: %timeit scipy.stats.mode(a)
1 loops, best of 3: 1.01 s per loop

In [10]: %timeit mode(a)
10 loops, best of 3: 80 ms per loop

In [11]: a = numpy.random.random((200,200))

In [12]: %timeit scipy.stats.mode(a)
1 loops, best of 3: 3.26 s per loop

In [13]: %timeit mode(a)
1000 loops, best of 3: 1.75 ms per loop

EDIT: Provided more background and modified the approach to be more memory-efficient.
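
For the 1-D case mentioned at the start of this answer, here is a minimal sketch of both approaches. `numpy.bincount` only accepts non-negative integers, so the sketch shifts the data by its minimum to handle negative values; that shift and the helper names `mode_bincount` and `mode_unique` are assumptions for illustration, not part of the function above.

```python
import numpy

def mode_bincount(x):
    """1-D integer mode via numpy.bincount (hypothetical helper)."""
    x = numpy.asarray(x)
    offset = x.min()
    # Shift so the smallest value maps to 0, since bincount rejects negatives
    counts = numpy.bincount(x - offset)
    index = numpy.argmax(counts)
    return index + offset, counts[index]

def mode_unique(x):
    """1-D mode via numpy.unique; works for any sortable dtype."""
    values, counts = numpy.unique(x, return_counts=True)
    index = numpy.argmax(counts)
    return values[index], counts[index]

data = numpy.array([-2, 4, -2, 0, 4, -2])
print(mode_bincount(data))  # value -2 with count 3
print(mode_unique(data))    # value -2 with count 3
```

Note that `bincount` allocates one counter per integer between the minimum and maximum, so it is only a good fit when the value range is small.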

Source: answer by Devin Cairns


Expanding on this method: it can be applied to finding the mode of data where you may also need the index of the value in the original array, for example to see how far the value lies from the center of the distribution.
```
(_, idx, counts) = np.unique(a, return_index=True, return_counts=True)
index = idx[np.argmax(counts)]
mode = a[index]
```
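Remember to discard the mode when it is not unique, that is, when more than one value reaches the maximum count. Note that np.argmax(counts) returns only the first such index, so check this separately, for example with np.sum(counts == counts.max()) > 1. To validate whether the mode is actually representative of the central distribution of your data, you can also check whether it falls inside your standard deviation interval.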
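
A minimal sketch of these two checks for 1-D data follows. The sample array is made up for illustration, and reading "standard deviation interval" as "within one standard deviation of the mean" is an assumption.

```python
import numpy as np

a = np.array([2, 7, 7, 3, 7, 4, 3, 9])  # made-up sample data

values, idx, counts = np.unique(a, return_index=True, return_counts=True)
best = np.argmax(counts)
index = idx[best]   # position of the first occurrence of the mode in a
mode = a[index]

# The mode is ambiguous if several values share the maximum count
is_unique = np.count_nonzero(counts == counts[best]) == 1

# Assumption: "representative" means within one standard deviation of the mean
within_one_std = abs(mode - a.mean()) <= a.std()

print(mode, index, is_unique, within_one_std)
```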

Source: answer by Lean Bravo

I think a very simple way would be to use the Counter class. You can then use the most_common() method of the Counter instance.

For 1-d arrays:

import numpy as np
from collections import Counter

nparr = np.arange(10) 
nparr[2] = 6 
nparr[3] = 6 #6 is now the mode
mode = Counter(nparr).most_common(1)
# mode will be [(6,3)] to give the count of the most occurring value, so ->
print(mode[0][0])

For multi-dimensional arrays (little difference):

import numpy as np
from collections import Counter

nparr = np.arange(10) 
nparr[2] = 6 
nparr[3] = 6 
nparr = nparr.reshape((2, 5))       # same thing, but reshaped into a 2-D array (10 elements fit a 2x5 shape)
mode = Counter(nparr.flatten()).most_common(1)  # just use .flatten() method

# mode will be [(6,3)] to give the count of the most occurring value, so ->
print(mode[0][0])

This may or may not be an efficient implementation, but it is convenient.
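
The examples above compute a single mode over all elements. To get one mode per column, as the question asks, the same idea could be applied column by column. This is a sketch only, not part of the original answer; which value wins a tie depends on how `Counter` orders elements with equal counts.

```python
import numpy as np
from collections import Counter

a = np.array([[1, 3, 4, 2, 2, 7],
              [5, 2, 2, 1, 4, 1],
              [3, 3, 2, 2, 1, 1]])

# a.T iterates over the columns of a; tolist() yields plain Python ints
modes = [Counter(column.tolist()).most_common(1)[0][0] for column in a.T]
print(modes)
```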

Source: answer by Ali_Ayub
