Operators¶
xarray¶
We have based our library on xarray, an open source project and Python package that makes working with labelled multi-dimensional arrays simple and efficient. The full documentation can be found at https://xarray.pydata.org/en/stable/.
The basic data structure we use is an xarray.DataArray, a labelled multi-dimensional array whose key properties are:
values: a numpy.ndarray holding the array’s values;
dims: dimension names for each axis;
coords: a dict-like container of arrays (coordinates) that label each point (e.g., 1-dimensional arrays of numbers, datetime objects or strings);
attrs: a dict to hold arbitrary metadata (attributes).
Let us consider a specific example:
import qnt.data as qndata
futures = qndata.futures.load_data(min_date="2006-01-01")
futures.dims
The output is a tuple:
('field', 'time', 'asset')
The most common operation is to select a specific field as follows:
close_price = futures.sel(field='close')
which will return a structure similar to a pandas DataFrame: a two-by-two matrix with the time coordinate on the y-axis, in ascending order, and the values of the close for all assets on the x-axis.
These data structures can be used for building indicators.
Arithmetic operations with a single xarray.DataArray automatically vectorize (like numpy) over all array values:
close_price_100 = close_price/100.0
You can also use any of numpy’s or scipy’s many ufunc functions directly on a DataArray:
import numpy
numpy.log(close_price)
The file qnt/xr_talib.py contains many technical indicators, for example:
import qnt.xr_talib as talib
close_price_sma= talib.SMA(close_price, 2)
Optimized version of the indicators based on numba can be found in the qnt/ta folder, for example:
import qnt.ta as qnta
close_price_sma= qnta.sma(close_price, 2)
pandas¶
Here we describe how to work with pandas data structures.
The first step consists in converting the sliced xarray.DataArray into a pandas.DataFrame:
import qnt.data as qntdata
data = qntdata.futures.load_data(tail=365*15)
close= data.sel(field="close").to_pandas()
We can then compute an indicator using standard pandas methods:
close_sma = ((close-close.shift(10))/close.shift(10)).rolling(30).mean()
and define our normalized weights to be:
norm = abs(close_sma).sum(axis=1)
weights= close_sma.div(norm, axis=0)
The final conversion to an xarray.DataArray can be performed simply with:
final_weights = weights.unstack().to_xarray()
In the following table we show some useful wrapper functions for working with pandas structures:
Operator | Python |
---|---|
ts_sum(df, window) |
def ts_sum(df, window=20):
"""
Computes the sum of the values on a rolling basis.
:param df: pandas.DataFrame.
:param window: the rolling window used for the computation.
:return: a pandas.DataFrame with the sum of the values over the past 'window' days.
"""
return df.rolling(window).sum()
|
sma(df, window) |
def sma(df, window=20):
"""
Computes the simple moving average.
:param df: pandas.DataFrame.
:param window: the rolling window used for the computation.
:return: a pandas.DataFrame with the sma over the past 'window' days.
"""
return df.rolling(window).mean()
|
stddev(df, window) |
def stddev(df, window=20):
"""
Computes the standard deviation on a rolling basis.
:param df: pandas.DataFrame.
:param window: the rolling window used for the computation.
:return: a pandas.DataFrame with the stddev over the past 'window' days.
"""
return df.rolling(window).std()
|
correlation(x, y, window) |
def correlation(x, y, window=20):
"""
Computes correlation on a rolling basis.
:params x,y: pandas.DataFrames.
:param window: the rolling window used for the computation.
:return: a pandas.DataFrame with the time-series of the column-wise correlation between x and y over the past 'window' days.
"""
return x.rolling(window).corr(y)
|
covariance(x, y, window) |
def covariance(x, y, window=20):
"""
Computes covariance on a rolling basis.
:params x,y: pandas.DataFrames.
:param window: the rolling window used for the computation.
:return: a pandas.DataFrame with the time-series of the column-wise covariance between x and y over the past 'window' days.
"""
return x.rolling(window).cov(y)
|
rolling_rank(na) |
def rolling_rank(na):
"""
Auxiliary function for ts_rank.
:param na: numpy array.
:return: The rank of the last value in the array.
"""
import scipy.stats
return scipy.stats.rankdata(na)[-1]
|
ts_rank(df, window) |
def ts_rank(df, window=20):
"""
Computes the rank on a rolling basis.
:param df: a pandas.DataFrame.
:param window: the rolling window used for the computation.
:return: a pandas.DataFrame with the rank over the past window days.
"""
return df.rolling(window).apply(rolling_rank)
|
rolling_prod(na) |
def rolling_prod(na):
"""
Auxiliary function for ts_prod.
:param na: numpy array.
:return: The product of the values in the array.
"""
import numpy
return numpy.prod(na)
|
product(df, window) |
def product(df, window=20):
"""
Computes the product on a rolling basis.
:param df: a pandas.DataFrame.
:param window: the rolling window used for the computation.
:return: a pandas DataFrame with the product over the past 'window' days.
"""
return df.rolling(window).apply(rolling_prod)
|
ts_min(df, window) |
def ts_min(df, window=20):
"""
Computes the minimum on a rolling basis.
:param df: a pandas.DataFrame.
:param window: the rolling window.
:return: a pandas DataFrame with the minimum over the past 'window' days.
"""
return df.rolling(window).min()
|
ts_max(df, window) |
def ts_max(df, window=20):
"""
Computes the maximum on a rolling basis.
:param df: a pandas.DataFrame.
:param window: the rolling window.
:return: a pandas DataFrame with the maximum over the past 'window' days.
"""
return df.rolling(window).max()
|
delta(df, period) |
def delta(df, period=1):
"""
Computes the difference.
:param df: a pandas.DataFrame.
:param period: the difference.
:return: a pandas DataFrame with today’s value minus the value 'period' days ago.
"""
return df.diff(period)
|
delay(df, period) |
def delay(df, period=1):
"""
Computes lagged value.
:param df: a pandas.DataFrame.
:param period: the lag grade.
:return: a pandas DataFrame with the lagged values of the time series.
"""
return df.shift(period)
|
rank(df) |
def rank(df):
"""
Cross sectional rank.
:param df: a pandas.DataFrame.
:return: a pandas DataFrame with rank along columns (percentiles).
"""
return df.rank(axis=1, pct=True)
|
scale(df, k) |
def scale(df, k=1):
"""
Scaled time serie.
:param df: a pandas.DataFrame.
:param k: scaling factor.
:return: a pandas.DataFrame rescaled such that sum(abs(df)) = k
"""
import numpy
return df.mul(k).div(numpy.abs(df).sum())
|
ts_argmax(df, window) |
def ts_argmax(df, window=20):
"""
Computes on which day ts_max(df, window) occurred on.
:param df: a pandas.DataFrame.
:param window: the rolling window.
:return: number of days ago condition occurred.
"""
return df.rolling(window).apply(np.argmax) + 1
|
ts_argmin(df, window) |
def ts_argmin(df, window=20):
"""
Computes on which day ts_min(df, window) occurred on.
:param df: a pandas.DataFrame.
:param window: the rolling window.
:return: number of days ago condition occurred.
"""
return df.rolling(window).apply(np.argmin) + 1
|