# numpy: most efficient frequency counts for unique values in an array

## Question or problem about Python programming:

In numpy / scipy, is there an efficient way to get frequency counts for unique values in an array?

Something along these lines:

```
x = np.array([1, 1, 1, 2, 2, 2, 5, 25, 1, 1])
y = freq_count(x)
print(y)

>> [[1, 5], [2, 3], [5, 1], [25, 1]]
```

(For the R users out there: I'm basically looking for the `table()` function.)
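For reference, a minimal pure-Python equivalent of R's `table()` can be sketched with `collections.Counter` (`freq_count` here is just the hypothetical function name from the question, not a real NumPy API):

```python
from collections import Counter

def freq_count(x):
    # Count occurrences of each value and return [value, count]
    # pairs sorted by value, matching the desired output above.
    return [[v, c] for v, c in sorted(Counter(x).items())]

x = [1, 1, 1, 2, 2, 2, 5, 25, 1, 1]
print(freq_count(x))  # [[1, 5], [2, 3], [5, 1], [25, 1]]
```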

## How to solve the problem:

### Solution 1:

Take a look at `np.bincount`:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.bincount.html

```
import numpy as np
x = np.array([1, 1, 1, 2, 2, 2, 5, 25, 1, 1])
y = np.bincount(x)
ii = np.nonzero(y)[0]
```

And then:

```
list(zip(ii, y[ii]))
# [(1, 5), (2, 3), (5, 1), (25, 1)]
```

or:

```
np.vstack((ii, y[ii])).T
# array([[ 1,  5],
#        [ 2,  3],
#        [ 5,  1],
#        [25,  1]])
```

or however you want to combine the counts and the unique values.
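One caveat worth keeping in mind (my addition, not part of the original answer): `np.bincount` only accepts non-negative integers and allocates one slot per value up to the maximum, so it can waste memory when a few values are very large; for negative or floating-point data, `np.unique` is the fallback. A small sketch:

```python
import numpy as np

# np.bincount allocates one slot per value from 0 to x.max(),
# so 26 slots here even though only 4 distinct values occur.
x = np.array([1, 1, 1, 2, 2, 2, 5, 25, 1, 1])
print(len(np.bincount(x)))  # 26

# Negative (or non-integer) data raises an error in bincount;
# np.unique handles it directly.
neg = np.array([-1, -1, 0, 2])
values, counts = np.unique(neg, return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))  # {-1: 2, 0: 1, 2: 1}
```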

### Solution 2:

As of NumPy 1.9, the easiest and fastest method is to simply use `numpy.unique`, which now has a `return_counts` keyword argument:

```
import numpy as np

x = np.array([1, 1, 1, 2, 2, 2, 5, 25, 1, 1])
unique, counts = np.unique(x, return_counts=True)

print(np.asarray((unique, counts)).T)
```

Which gives:

```
[[ 1  5]
 [ 2  3]
 [ 5  1]
 [25  1]]
```

A quick comparison with `scipy.stats.itemfreq`:

```
In [4]: x = np.random.random_integers(0, 100, 1e6)

In [5]: %timeit unique, counts = np.unique(x, return_counts=True)
10 loops, best of 3: 31.5 ms per loop

In [6]: %timeit scipy.stats.itemfreq(x)
10 loops, best of 3: 170 ms per loop
```
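If a value-to-count mapping is more convenient than a two-column array, the pair of arrays returned by `np.unique` can be zipped into a plain dict (a small sketch):

```python
import numpy as np

x = np.array([1, 1, 1, 2, 2, 2, 5, 25, 1, 1])
unique, counts = np.unique(x, return_counts=True)

# .tolist() converts NumPy scalars to plain Python ints for the dict.
freq = dict(zip(unique.tolist(), counts.tolist()))
print(freq)  # {1: 5, 2: 3, 5: 1, 25: 1}
```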

### Solution 3:

Update: the method mentioned in the original answer (`scipy.stats.itemfreq`) is deprecated; use `np.unique` with `return_counts=True` instead:

```
>>> import numpy as np
>>> x = [1, 1, 1, 2, 2, 2, 5, 25, 1, 1]
>>> np.array(np.unique(x, return_counts=True)).T
array([[ 1,  5],
       [ 2,  3],
       [ 5,  1],
       [25,  1]])
```

Original answer: you can use `scipy.stats.itemfreq`, which now emits a deprecation warning:

```
>>> from scipy.stats import itemfreq
>>> x = [1, 1, 1, 2, 2, 2, 5, 25, 1, 1]
>>> itemfreq(x)
/usr/local/bin/python:1: DeprecationWarning: `itemfreq` is deprecated! `itemfreq` is deprecated and will be removed in a future version. Use instead `np.unique(..., return_counts=True)`
array([[  1.,   5.],
       [  2.,   3.],
       [  5.,   1.],
       [ 25.,   1.]])
```
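Note that `itemfreq` promoted the integer input to floats in the output above, whereas `np.unique` preserves an integer dtype (a quick check; the exact integer width is platform-dependent):

```python
import numpy as np

x = [1, 1, 1, 2, 2, 2, 5, 25, 1, 1]
out = np.array(np.unique(x, return_counts=True)).T
# dtype.kind 'i' means a signed integer dtype, unlike itemfreq's float output.
print(out.dtype.kind)
```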

### Solution 4:

I was also interested in this, so I did a little performance comparison (using perfplot, a pet project of mine). Result:

```
y = np.bincount(a)
ii = np.nonzero(y)[0]
out = np.vstack((ii, y[ii])).T
```

is by far the fastest. (Note the log-scaling.)

Code to generate the plot:

```
import numpy as np
import pandas as pd
import perfplot
from scipy.stats import itemfreq


def bincount(a):
    y = np.bincount(a)
    ii = np.nonzero(y)[0]
    return np.vstack((ii, y[ii])).T


def unique(a):
    unique, counts = np.unique(a, return_counts=True)
    return np.asarray((unique, counts)).T


def unique_count(a):
    unique, inverse = np.unique(a, return_inverse=True)
    count = np.zeros(len(unique), dtype=int)
    # Accumulate one count per occurrence via the inverse indices.
    np.add.at(count, inverse, 1)
    return np.vstack((unique, count)).T


def pandas_value_counts(a):
    out = pd.value_counts(pd.Series(a))
    out.sort_index(inplace=True)
    out = np.stack([out.keys().values, out.values]).T
    return out


perfplot.show(
    setup=lambda n: np.random.randint(0, 1000, n),
    kernels=[bincount, unique, itemfreq, unique_count, pandas_value_counts],
    n_range=[2 ** k for k in range(26)],
    logx=True,
    logy=True,
    xlabel="len(a)",
)
```
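Related to the winning `bincount` approach (an aside, not part of the original benchmark): when the value range is known up front, the `minlength` parameter pads the result with zeros, so count vectors computed from different arrays stay aligned. A sketch:

```python
import numpy as np

a = np.array([0, 1, 1, 3])
b = np.array([2, 2, 3])

# minlength guarantees a fixed output length, so the two
# count vectors can be compared or subtracted element-wise.
ca = np.bincount(a, minlength=4)
cb = np.bincount(b, minlength=4)
print(ca)  # [1 2 0 1]
print(cb)  # [0 0 2 1]
```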

### Solution 5:

Using pandas module:

```
>>> import pandas as pd
>>> import numpy as np
>>> x = np.array([1, 1, 1, 2, 2, 2, 5, 25, 1, 1])
>>> pd.value_counts(x)
1     5
2     3
25    1
5     1
dtype: int64
```
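If you need the same two-column NumPy layout as the other answers, the pandas Series can be converted back (a sketch; `value_counts` sorts by count by default, so `sort_index` restores value order):

```python
import numpy as np
import pandas as pd

x = np.array([1, 1, 1, 2, 2, 2, 5, 25, 1, 1])

# Series.value_counts avoids the deprecated top-level pd.value_counts.
counts = pd.Series(x).value_counts().sort_index()
out = np.column_stack([counts.index.to_numpy(), counts.to_numpy()])
print(out)
# [[ 1  5]
#  [ 2  3]
#  [ 5  1]
#  [25  1]]
```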