## Question or problem about Python programming:

np.where has the semantics of a vectorized if/else (similar to Apache Spark’s when/otherwise DataFrame method). I know that I can use np.where on pandas Series, but pandas often defines its own API to use instead of raw numpy functions, which is usually more convenient with pd.Series/pd.DataFrame.

Sure enough, I found pandas.DataFrame.where. However, at first glance, it has completely different semantics. I could not find a way to rewrite the most basic example of np.where using pandas where:

    # df is a pd.DataFrame
    # how to write this using df.where?
    df['C'] = np.where((df['A']<0) | (df['B']>0), df['A']+df['B'], df['A']/df['B'])

Am I missing something obvious? Or is pandas where intended for a completely different use case, despite having the same name as np.where?

## How to solve the problem:

Try:

(df['A'] + df['B']).where((df['A'] < 0) | (df['B'] > 0), df['A'] / df['B'])

The difference between the `numpy` `where` and the `DataFrame` `where` is that the default values (the ones kept wherever the condition is true) are supplied by the `DataFrame` that the `where` method is being called on (docs).

I.e.

np.where(m, A, B)

is roughly equivalent to

A.where(m, B)
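To make the "roughly" concrete, here is a minimal sketch on a small made-up DataFrame (the sample values and the names `via_numpy` / `via_pandas` are assumptions for illustration only). Both calls pick the same values, but `np.where` hands back a plain array while the pandas method keeps the Series and its index:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
df = pd.DataFrame({"A": [-1.0, 2.0, 3.0], "B": [2.0, -4.0, 6.0]})

m = (df["A"] < 0) | (df["B"] > 0)  # condition
A = df["A"] + df["B"]              # values used where m is True
B = df["A"] / df["B"]              # values used where m is False

via_numpy = np.where(m, A, B)      # plain ndarray, the index is dropped
via_pandas = A.where(m, B)         # pd.Series, the index is kept

print(type(via_numpy).__name__, type(via_pandas).__name__)
print(np.allclose(via_numpy, via_pandas.to_numpy()))
```

Assigning either result to `df['C']` should give the same column here, since the index of `via_pandas` matches the index of `df`.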

If you wanted a similar call signature using pandas, you could take advantage of the way method calls work in Python:

pd.Series.where(cond=(df['A'] < 0) | (df['B'] > 0), self=df['A'] + df['B'], other=df['A'] / df['B'])

or without kwargs (note that the positional order of arguments is different from the `numpy` `where` argument order):

pd.Series.where(df['A'] + df['B'], (df['A'] < 0) | (df['B'] > 0), df['A'] / df['B'])
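If the goal is simply to keep the `numpy` argument order, another option is a thin wrapper around the method call. This is only a sketch: `where_np_order` is a hypothetical helper name, not part of pandas, and the sample data is made up for illustration.

```python
import pandas as pd

def where_np_order(cond, if_true, if_false):
    """Hypothetical helper: pandas `where` with np.where's argument order."""
    # if_true supplies the values kept where cond is True;
    # if_false fills in the positions where cond is False.
    return if_true.where(cond, if_false)

# Hypothetical sample data, chosen only for illustration.
df = pd.DataFrame({"A": [-1.0, 2.0, 3.0], "B": [2.0, -4.0, 6.0]})

df["C"] = where_np_order(
    (df["A"] < 0) | (df["B"] > 0),
    df["A"] + df["B"],
    df["A"] / df["B"],
)
print(df)
```

Unlike `np.where`, this assumes `if_true` is already a pandas object (Series or DataFrame), since it relies on that object's own `where` method.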