Find out dataframe index value under a certain condition
The simple code would be: df.index[condition] so something like this: df.index[df['column1'] == True].tolist() The .tolist() call turns the matching index labels into a plain Python list, which is handy when multiple rows match the condition.
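Here is a minimal sketch of the index lookup on a toy dataframe. The data and the index labels are made up for illustration; only the df.index[condition] pattern comes from the tip above.

```python
import pandas as pd

# Hypothetical sample data: column1 holds booleans, index labels are letters
df = pd.DataFrame(
    {"column1": [True, False, True, False]},
    index=["a", "b", "c", "d"],
)

# The boolean mask selects the index labels where the condition holds
matching = df.index[df["column1"] == True].tolist()
print(matching)  # ['a', 'c']
```

Without .tolist() you get a pandas Index object back, which is fine for further pandas work but less convenient for plain Python loops.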
.item() will do the job: p1 = df[df["column2"] == df.column2.max()].column1.item() print(p1) This way you can extract the actual value from a pandas dataframe and store it in a variable for later use.
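A small runnable sketch of the same idea, assuming a toy dataframe with made-up values. One thing worth knowing: .item() only works when exactly one row matches, and raises ValueError otherwise, so the idxmax() variant shown after it may be safer when the maximum can be tied.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"column1": ["x", "y", "z"], "column2": [3, 9, 5]})

# Filter to the row holding the maximum of column2, then pull out the scalar
p1 = df[df["column2"] == df["column2"].max()]["column1"].item()
print(p1)  # y

# Alternative: idxmax() returns the label of the first maximum,
# so this does not fail when several rows share the maximum
p2 = df.loc[df["column2"].idxmax(), "column1"]
print(p2)  # y
```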
Here we calculate the 0.9 quantile (the 90th percentile) of each column in our dataframe: q = 0.9 for column in df: qr = df[column].quantile(q) print(f"{q*100}% are lower than {qr}") Here's a good example to understand quantiles.
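To make the loop concrete, here is a sketch on a toy dataframe. The column names and values are invented, and collecting the results into a dict is my own addition so they can be reused later.

```python
import pandas as pd

# Hypothetical sample data: two numeric columns with 11 evenly spaced values
df = pd.DataFrame({"a": range(11), "b": range(0, 101, 10)})

q = 0.9
quantiles = {}
for column in df:  # iterating over a dataframe yields its column names
    qr = df[column].quantile(q)
    quantiles[column] = qr
    print(f"{q:.0%} of values in {column} are at or below {qr}")
```

If you don't need the loop, df.quantile(q) computes the same thing for all numeric columns in one call.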
import dask.dataframe as dd
n = ["column1", "column2", "column3", "column4"]
df = dd.read_csv('D:/BigData/data1.csv', assume_missing=True, names=n)
print(df.head())
Unfortunately you can't use the ternary operator (a if x > y else b) in pandas dataframe logic. With that said, you can use numpy.where instead: df['result'] = np.where(df['col1'] > df['col2'], 1, 0) There you go. It's also much faster than applying a Python function row by row.
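A quick sketch showing both sides: the ternary failing on whole columns and numpy.where doing the job element-wise. The dataframe contents are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"col1": [5, 2, 7], "col2": [3, 4, 7]})

# The plain ternary fails: a Series comparison has no single truth value
try:
    result = 1 if df["col1"] > df["col2"] else 0
except ValueError as exc:
    print(f"ternary failed: {exc}")

# np.where evaluates the condition per row: 1 where it holds, 0 elsewhere
df["result"] = np.where(df["col1"] > df["col2"], 1, 0)
print(df)
```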
Use double square brackets around 'Number', i.e.: df.groupby(['Name', 'Fruit'])[['Number']].agg('sum')
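For context, here is a sketch of what the double brackets change, using invented data. With single brackets the aggregation returns a Series; with double brackets it returns a DataFrame that keeps 'Number' as a named column.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Name": ["Ann", "Ann", "Bob"],
    "Fruit": ["apple", "apple", "pear"],
    "Number": [1, 2, 5],
})

# Single brackets select one column, so the result is a Series
as_series = df.groupby(["Name", "Fruit"])["Number"].agg("sum")

# Double brackets select a list of columns, so the result is a DataFrame
as_frame = df.groupby(["Name", "Fruit"])[["Number"]].agg("sum")

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame)
```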
Hello, I'm back! 😎 Now the traditional method is this for numeric values: df.loc[df['column_name'] == value] and this for strings: df.loc[df['column_name'] == 'string'] While it always works with numeric values, for string values it sometimes doesn't. It returns an empty dataframe. I guess it's something to do with the encoding of the source where …
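Here is a sketch of the two filters on made-up data. The empty result is reproduced here with trailing whitespace in the string column, which is only one possible cause and purely my assumption for the demo; the str.strip() workaround is likewise not necessarily the fix for encoding-related cases.

```python
import pandas as pd

# Hypothetical sample data: note the trailing space in "Paris "
df = pd.DataFrame({"city": ["Paris ", "Paris ", "Rome"], "visits": [3, 3, 8]})

# Numeric filter behaves as expected
numeric_match = df.loc[df["visits"] == 3]
print(len(numeric_match))  # 2

# String filter silently returns an empty dataframe: "Paris " != "Paris"
string_match = df.loc[df["city"] == "Paris"]
print(len(string_match))  # 0

# Assumed workaround: normalize whitespace before comparing
normalized_match = df.loc[df["city"].str.strip() == "Paris"]
print(len(normalized_match))  # 2
```

When a string filter comes back empty, printing df["city"].unique() or repr() of a single value is a quick way to spot invisible characters.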