Question or problem about Python programming:
I would like to normalize a NumPy array to a unit vector. More specifically, I am looking for an equivalent version of this function:
def normalize(v):
    norm = np.linalg.norm(v)
    if norm == 0:
        return v
    return v / norm
Is there something like that in sklearn or numpy?
This function also works when v is the zero vector.
How to solve the problem:
Solution 1:
If you’re using scikit-learn, you can use sklearn.preprocessing.normalize:
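import numpy as np
from sklearn.preprocessing import normalize

x = np.random.rand(1000) * 10
norm1 = x / np.linalg.norm(x)
# normalize expects a 2-D array, so add an axis and flatten afterwards
norm2 = normalize(x[:, np.newaxis], axis=0).ravel()
# compare with a tolerance rather than exact floating-point equality
print(np.allclose(norm1, norm2))  # True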
Solution 2:
I agree that it would be nice if such a function were part of the included batteries. As far as I know, it isn’t. Here is a version that works along arbitrary axes and stays fully vectorized.
import numpy as np

def normalized(a, axis=-1, order=2):
    # norm of each slice along the given axis
    l2 = np.atleast_1d(np.linalg.norm(a, order, axis))
    # leave all-zero slices unchanged instead of dividing by zero
    l2[l2 == 0] = 1
    return a / np.expand_dims(l2, axis)

A = np.random.randn(3, 3, 3)
print(normalized(A, 0))
print(normalized(A, 1))
print(normalized(A, 2))

print(normalized(np.arange(3)[:, None]))
print(normalized(np.arange(3)))
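One detail worth checking in the version above: for 1-D input, the atleast_1d / expand_dims combination broadcasts the result to shape (1, n). The following is only a sketch of a variant, not part of the original answer, that asks np.linalg.norm for keepdims=True so the output keeps the input's shape. The name normalized_keepdims and the conversion to float are assumptions made for illustration.

```python
import numpy as np

def normalized_keepdims(a, axis=-1, order=2):
    # Hypothetical variant: convert to float so integer input divides cleanly.
    a = np.asarray(a, dtype=float)
    # keepdims=True keeps the reduced axis with size 1, so broadcasting
    # against `a` works without atleast_1d / expand_dims.
    norms = np.linalg.norm(a, ord=order, axis=axis, keepdims=True)
    norms[norms == 0] = 1  # leave all-zero slices unchanged
    return a / norms

v = normalized_keepdims(np.arange(3))
print(v.shape)             # (3,)
print(np.linalg.norm(v))   # close to 1.0

A = np.random.default_rng(0).standard_normal((3, 4, 5))
unit = normalized_keepdims(A, axis=1)
print(np.allclose(np.linalg.norm(unit, axis=1), 1.0))  # True
```

The zero-vector behavior matches the question's function: an all-zero slice comes back as zeros.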
Solution 3:
You can specify ord to get the L1 norm.
To avoid division by zero I use eps, though that may not be ideal:
import numpy as np

def normalize(v):
    norm = np.linalg.norm(v, ord=1)
    if norm == 0:
        # fall back to machine epsilon for v's floating-point dtype
        norm = np.finfo(v.dtype).eps
    return v / norm
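One reason the eps fallback may not be ideal is that np.finfo only accepts floating-point dtypes, so an integer array that happens to be all zeros would raise an error. A minimal sketch of an alternative, assuming float output is acceptable, is to return the zero vector unchanged, as the function in the question does. The name normalize_l1 is hypothetical.

```python
import numpy as np

def normalize_l1(v):
    # Assumption: float output is wanted, so integer input is converted first.
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v, ord=1)  # sum of absolute values
    if norm == 0:
        return v  # zero vector: nothing to scale
    return v / norm

w = normalize_l1([1, -3, 4])
print(w)                 # [ 0.125 -0.375  0.5  ]
print(np.abs(w).sum())   # 1.0
```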
Solution 4:
This might also work for you:
import numpy as np
normalized_v = v / np.sqrt(np.sum(v**2))
but it fails when v has length (norm) 0, because the division 0 / 0 yields NaN.
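To make the zero-length case concrete, here is a small sketch that contrasts the one-liner with a guarded version. The guard (returning v unchanged) is an assumption borrowed from the function in the question, not part of this answer.

```python
import numpy as np

v = np.zeros(3)

# Unguarded: 0 / 0 produces NaN (NumPy also emits a RuntimeWarning,
# silenced here so the example runs quietly).
with np.errstate(divide="ignore", invalid="ignore"):
    unguarded = v / np.sqrt(np.sum(v**2))

# Guarded: only divide when the norm is non-zero.
norm = np.sqrt(np.sum(v**2))
guarded = v / norm if norm > 0 else v

print(np.isnan(unguarded).all())  # True
print(guarded)                    # [0. 0. 0.]
```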
Solution 5:
If you have multidimensional data and want each axis normalized to its max or its sum:
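import numpy as np

def normalize(_d, to_sum=True, copy=True):
    # _d is an (n x dimension) NumPy array of floats
    d = _d if not copy else np.copy(_d)
    # shift each column so its minimum is 0
    d -= np.min(d, axis=0)
    # scale each column by its sum, or by its range (max - min)
    d /= (np.sum(d, axis=0) if to_sum else np.ptp(d, axis=0))
    return d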
This uses NumPy's peak-to-peak function, np.ptp.
a = np.random.random((5, 3))
# with copy=False the input is modified in place, so b and c are the same array as a
b = normalize(a, copy=False)
b.sum(axis=0)  # array([1., 1., 1.]), each column sums to 1
c = normalize(a, to_sum=False, copy=False)
c.max(axis=0)  # array([1., 1., 1.]), the max of each column is 1