dataframe - Texxl

Find out dataframe index value under a certain condition

February 3, 2022 by admin

The simple form is df.index[condition], so something like this:

    df.index[df['column1'] == True].tolist()

.tolist() converts the result to a plain Python list, which is useful when more than one row matches the condition.
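As a minimal sketch of the lookup described here, the snippet below builds a small made-up dataframe (the data and the index labels are assumptions for illustration only) and pulls out the index labels of the rows where the condition holds.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the post.
df = pd.DataFrame(
    {"column1": [True, False, True]},
    index=["a", "b", "c"],
)

# Boolean mask: True for every row that satisfies the condition.
mask = df["column1"] == True

# Index labels of the matching rows, converted to a plain Python list.
matching = df.index[mask].tolist()
print(matching)
```

For a column that is already boolean, df["column1"] on its own can serve as the mask.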

Get actual value from pandas dataframe instead of object with index

February 2, 2022 by admin

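.item() will do the job:

    p1 = df[df["column2"] == df.column2.max()].column1.item()
    print(p1)

This way you can extract the actual value from a pandas dataframe and store it in a variable for later use. Note that .item() only works when the selection contains exactly one element; otherwise it raises a ValueError.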
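A small sketch of the difference between the filtered Series and the unwrapped value, using invented sample data (the values are assumptions, only the column names follow the post):

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"column1": ["x", "y", "z"], "column2": [10, 30, 20]})

# Filtering gives back a one-element Series (value plus index), not a bare value.
selected = df[df["column2"] == df["column2"].max()]["column1"]

# .item() unwraps the single element into a plain Python object.
p1 = selected.item()
print(type(selected).__name__, type(p1).__name__, p1)
```

If the maximum appears in more than one row, the selection has several elements and .item() fails, so something like selected.iloc[0] may be the safer choice in that case.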

How to calculate percentile (quantile) for each column in pandas dataframe

December 7, 2021 by admin

Here we calculate the 0.9 quantile (the 90th percentile) of each column in our dataframe:

    q = 0.9
    for column in df:
        qr = df[column].quantile(q)
        print(f"{q*100}% are lower than {qr}")

Here’s a good example to understand quantiles.
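To make the loop concrete, here is a sketch on a tiny made-up dataframe (the column names and numbers are assumptions). It also shows that DataFrame.quantile can return the same per-column values in one call.

```python
import pandas as pd

# Hypothetical numeric data.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [10, 20, 30, 40, 50]})

q = 0.9

# Column by column, as in the loop from the post.
per_column = {}
for column in df:
    per_column[column] = df[column].quantile(q)

# The same thing in one call: a Series indexed by column name.
all_at_once = df.quantile(q)

print(per_column)
print(all_at_once)
```

pandas interpolates linearly between neighbouring values by default, so the result does not have to be a value that actually occurs in the column.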

Read large csv files with dask dataframe quickly

November 23, 2021 by admin

    import dask.dataframe as dd

    n = ["column1", "column2", "column3", "column4"]
    df = dd.read_csv('D:/BigData/data1.csv', assume_missing=True, names=n)
    print(df.head())

Ternary operator on pandas dataframe

November 19, 2021 by admin

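Unfortunately you can't use a ternary operator like a if x > y else b on pandas dataframe logic, because the condition is a whole Series rather than a single True or False. With that said, you can use numpy.where instead:

    import numpy as np
    df['result'] = np.where(df['col1'] > df['col2'], 1, 0)

There you go. It’s also vectorized, so it is typically much faster than a row-by-row apply.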
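The sketch below, on invented data, shows both halves of this: the plain ternary raising an error on a Series comparison, and numpy.where producing the element-wise result.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"col1": [5, 1, 3], "col2": [2, 4, 3]})

# A plain ternary needs a single True/False, but the comparison yields a Series.
try:
    bad = 1 if df["col1"] > df["col2"] else 0
except ValueError as exc:
    print("plain ternary fails:", exc)

# np.where evaluates the condition element by element:
# 1 where col1 > col2, 0 everywhere else.
df["result"] = np.where(df["col1"] > df["col2"], 1, 0)
print(df)
```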

Return dataframe object from pandas groupby() instead of series or groupby object

November 15, 2021 by admin

Use double square brackets around 'Number', so that the selection stays a dataframe instead of becoming a series, i.e.:

    df.groupby(['Name', 'Fruit'])[['Number']].agg('sum')
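A short sketch comparing single and double brackets on made-up data (the rows are assumptions; the column names follow the post):

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {
        "Name": ["Ann", "Ann", "Bob"],
        "Fruit": ["apple", "apple", "pear"],
        "Number": [1, 2, 5],
    }
)

# Single brackets select one column, so the aggregate is a Series.
as_series = df.groupby(["Name", "Fruit"])["Number"].agg("sum")

# Double brackets select a list of columns, so the aggregate is a DataFrame.
as_frame = df.groupby(["Name", "Fruit"])[["Number"]].agg("sum")

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(as_frame)
```

If the group keys should be ordinary columns rather than the index, calling .reset_index() on the result is one way to get there.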

Select dataframe rows by specific column string values in Pandas

October 3, 2021 | July 31, 2021 by admin

Hello, I’m back! 😎 Now the traditional method is this, for numeric values:

    df.loc[df['column_name'] == value]

And for string values this:

    df.loc[df['column_name'] == 'string']

While it always works with numeric values, for string values sometimes it doesn’t work. It returns an empty dataframe. I guess it’s something to do with encoding of the source where …
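The excerpt is cut off before the author's explanation, so the following is only a sketch of the two selections on invented data. The empty-result case is reproduced here with a stray trailing space, which is an assumption about one common cause and not necessarily the one the post goes on to describe.

```python
import pandas as pd

# Hypothetical sample data; note the trailing space in "pear ".
df = pd.DataFrame({"name": ["apple", "pear ", "apple"], "qty": [3, 7, 3]})

# Numeric comparison.
numeric_rows = df.loc[df["qty"] == 3]

# String comparison that matches exactly.
string_rows = df.loc[df["name"] == "apple"]

# String comparison that silently matches nothing because of the extra space.
empty = df.loc[df["name"] == "pear"]

# One possible workaround (an assumption): normalise whitespace before comparing.
stripped = df.loc[df["name"].str.strip() == "pear"]

print(len(numeric_rows), len(string_rows), len(empty), len(stripped))
```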