Read large csv files with dask dataframe quickly
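import dask.dataframe as dd

# names= supplies column names for a file without a header row.
# assume_missing=True reads integer-looking columns as float64, so missing
# values further down the file do not break dtype inference.
n = ["column1", "column2", "column3", "column4"]
df = dd.read_csv("D:/BigData/data1.csv", assume_missing=True, names=n)

# head() only reads the first partition, so it returns quickly even for huge files.
print(df.head())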
import pandas as pd
import plotly
import plotly.graph_objects as go

# Apple OHLC price history from the plotly sample datasets
df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv")

fig = go.Figure(data=[go.Candlestick(
    x=df["Date"],
    open=df["AAPL.Open"],
    high=df["AAPL.High"],
    low=df["AAPL.Low"],
    close=df["AAPL.Close"],
)])

# fig.show()
# Write the chart to a standalone HTML file and open it in the browser
plotly.offline.plot(fig)
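Unfortunately you can't use a ternary operator like a if x > y else b on pandas DataFrame columns: comparing two columns gives a boolean Series, and Python cannot reduce a whole Series to a single True/False (pandas raises "ValueError: The truth value of a Series is ambiguous"). With that said, you can use numpy.where instead, which evaluates the condition element by element:
import numpy as np
df['result'] = np.where(df['col1'] > df['col2'], 1, 0)
There you go. It's also vectorized, so it is much faster than looping over the rows in Python.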
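A small worked example with made-up data (the column names col1, col2 and the extra column larger are illustrative). It also shows that the two branches of np.where do not have to be constants; they can be whole columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [5, 1, 3], "col2": [2, 4, 3]})

# Element-wise equivalent of "1 if col1 > col2 else 0"
df["result"] = np.where(df["col1"] > df["col2"], 1, 0)

# Branches can be columns too: pick col1 where it is bigger, else col2
df["larger"] = np.where(df["col1"] > df["col2"], df["col1"], df["col2"])

print(df)
```

Note that the third row has equal values, so the condition is False there and the "else" branch is used.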
How to concatenate dataframes and give the columns new names? Pass one key per DataFrame; with axis=1 each key becomes the top level of a column MultiIndex. Here, look:
df3 = pd.concat([df1, df2], keys=['x', 'y'], axis=1)
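A runnable sketch with two hypothetical frames. The last step, flattening the MultiIndex into names like x_a, is an optional extra that is not part of the original tip:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [5, 6], "b": [7, 8]})

# One key per input frame; with axis=1 the keys form the outer column level
df3 = pd.concat([df1, df2], keys=["x", "y"], axis=1)
print(df3.columns.tolist())  # [('x', 'a'), ('x', 'b'), ('y', 'a'), ('y', 'b')]

# Optional: flatten the two column levels into single names such as "x_a"
df3.columns = [f"{outer}_{inner}" for outer, inner in df3.columns]
print(df3.columns.tolist())  # ['x_a', 'x_b', 'y_a', 'y_b']
```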
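.describe() in Python
For numeric data, .describe() reports the count, mean, standard deviation, minimum, the 25th, 50th (median) and 75th percentiles, and the maximum. The p-th percentile is the value below which roughly p% of the observations fall.
[Figure: illustration of percentiles (image not included)]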
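A quick way to see the percentiles in action, using a hypothetical Series of the numbers 1 to 10. By default pandas interpolates linearly between neighbouring values, which is why the 25% value below falls between 3 and 4. The percentiles= argument requests other cut points; the median is always included:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])  # hypothetical data
summary = s.describe()
print(summary)  # count 10, mean 5.5, 25% 3.25, 50% 5.5, 75% 7.75, ...

# Ask for the 10th and 90th percentiles instead of the quartiles
custom = s.describe(percentiles=[0.1, 0.9])
print(custom)
```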
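import pandas as pd

# Show up to 500 rows and 500 columns before pandas truncates the output with "..."
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
# Allow printed lines up to 1000 characters wide before wrapping
pd.set_option('display.width', 1000)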
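If you only want the wider display for one printout, pd.option_context applies the same options temporarily and restores the previous values when the with-block ends. A minimal sketch:

```python
import pandas as pd

before = pd.get_option("display.max_rows")

# Options set here apply only inside the with-block
with pd.option_context("display.max_rows", 500,
                       "display.max_columns", 500,
                       "display.width", 1000):
    inside = pd.get_option("display.max_rows")
    print(inside)  # 500

after = pd.get_option("display.max_rows")
print(before == after)  # True: the old value is back
```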
# Integer position of a column, given its name
print(df.columns.get_loc("column_name"))
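A small example with a hypothetical DataFrame, showing that the returned position can be used with position-based indexing such as iloc. With unique column names get_loc returns a plain int; if the name appears more than once it returns a slice or boolean mask instead.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "column_name": [10, 20], "other": [3, 4]})

pos = df.columns.get_loc("column_name")
print(pos)  # 1

# Use the position with iloc to select the same column
print(df.iloc[:, pos].tolist())  # [10, 20]
```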
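You need to use .dt.strftime to format a datetime column as text. The column must already have a datetime dtype; if it holds strings, convert it with pd.to_datetime first:
df["new_time"] = df["time"].dt.strftime("%d/%m/%Y %H:%M")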
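A self-contained sketch, assuming the times start out as strings (the sample timestamps are made up):

```python
import pandas as pd

df = pd.DataFrame({"time": ["2021-03-05 14:30:00", "2021-12-31 08:05:00"]})

# Convert strings to datetime64 first; the .dt accessor needs datetime-like values
df["time"] = pd.to_datetime(df["time"])

# Day/month/year hour:minute, returned as strings
df["new_time"] = df["time"].dt.strftime("%d/%m/%Y %H:%M")
print(df["new_time"].tolist())  # ['05/03/2021 14:30', '31/12/2021 08:05']
```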
Use double square brackets around 'Number' so the result is a DataFrame (with a 'Number' column) rather than a Series:
df.groupby(['Name', 'Fruit'])[['Number']].agg('sum')
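The difference is easy to check with a tiny hypothetical dataset: single brackets select one column and give a Series, double brackets select a list of columns and keep a DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ann", "Ann", "Bob"],
    "Fruit": ["apple", "apple", "pear"],
    "Number": [1, 2, 5],
})

single = df.groupby(["Name", "Fruit"])["Number"].agg("sum")    # Series
double = df.groupby(["Name", "Fruit"])[["Number"]].agg("sum")  # DataFrame

print(type(single).__name__, type(double).__name__)  # Series DataFrame
print(double)
```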
items = ['a', 'b', 'c', 'd']  # avoid naming a variable "list": it shadows the built-in
for i, val in enumerate(items):
    print(i, val)
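enumerate also takes a start argument, which is handy for 1-based numbering. A short sketch using the same illustrative list:

```python
items = ["a", "b", "c", "d"]

# start=1 makes the counter begin at 1 instead of 0
for i, val in enumerate(items, start=1):
    print(i, val)

pairs = list(enumerate(items, start=1))
print(pairs)  # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
```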