Operators

xarray

We have based our library on xarray, an open source project and Python package that makes working with labelled multi-dimensional arrays simple and efficient. The full documentation can be found at https://xarray.pydata.org/en/stable/.

The basic data structure we use is an xarray.DataArray, a labelled multi-dimensional array whose key properties are:

values: a numpy.ndarray holding the array’s values;
dims: dimension names for each axis;
coords: a dict-like container of arrays (coordinates) that label each point (e.g., 1-dimensional arrays of numbers, datetime objects or strings);
attrs: a dict to hold arbitrary metadata (attributes).
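To make these four properties concrete, here is a minimal sketch that builds a small DataArray by hand. The asset names, dates, prices and the "units" attribute are invented for illustration and are not part of the library's data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy data: 3 days x 2 assets (names and prices are invented).
prices = xr.DataArray(
    np.array([[10.0, 20.0], [11.0, 19.0], [12.0, 21.0]]),
    dims=("time", "asset"),
    coords={
        "time": pd.date_range("2020-01-01", periods=3),
        "asset": ["A", "B"],
    },
    attrs={"units": "USD"},
)

print(type(prices.values))  # the underlying numpy.ndarray
print(prices.dims)          # dimension names, one per axis
print(list(prices.coords))  # names of the coordinates labelling the axes
print(prices.attrs)         # arbitrary metadata
```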

Let us consider a specific example:

import qnt.data as qndata
futures = qndata.futures.load_data(min_date="2006-01-01")
futures.dims

The output is a tuple:

('field', 'time', 'asset')

The most common operation is to select a specific field as follows:

close_price = futures.sel(field='close')

which returns a structure similar to a pandas DataFrame: a two-dimensional array with the time coordinate along the rows, in ascending order, and the close values of all assets along the columns.
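The effect of selecting a field can be checked on a small synthetic array. This is only a sketch: the array below is a hypothetical stand-in with the same dimension order as the loaded futures data, and its values are arbitrary.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the loaded data: 2 fields x 3 days x 2 assets.
toy = xr.DataArray(
    np.arange(12, dtype=float).reshape(2, 3, 2),
    dims=("field", "time", "asset"),
    coords={
        "field": ["open", "close"],
        "time": pd.date_range("2020-01-01", periods=3),
        "asset": ["A", "B"],
    },
)

# Selecting a single label drops the 'field' dimension.
toy_close = toy.sel(field="close")
print(toy_close.dims)
print(toy_close.shape)
```

Only the time and asset dimensions remain after the selection, which is why the result behaves like a two-dimensional table.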

These data structures can be used for building indicators.

Arithmetic operations with a single xarray.DataArray automatically vectorize (like numpy) over all array values:

close_price_100 = close_price/100.0

You can also apply many of numpy’s or scipy’s universal functions (ufuncs) directly to a DataArray:

import numpy
numpy.log(close_price)
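The following sketch shows, on invented values, that both arithmetic and numpy ufuncs return a new DataArray that keeps the original dimensions and coordinates. The array and its asset labels are hypothetical.

```python
import numpy as np
import xarray as xr

# Hypothetical toy prices chosen so that the base-10 logarithm is easy to read.
toy = xr.DataArray(
    np.array([[1.0, 10.0], [100.0, 1000.0]]),
    dims=("time", "asset"),
    coords={"asset": ["A", "B"]},
)

scaled = toy / 100.0    # element-wise division
logged = np.log10(toy)  # numpy ufunc applied element-wise

print(type(logged))     # still an xarray.DataArray
print(logged.dims)      # labels are preserved
```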

The file qnt/xr_talib.py contains many technical indicators, for example:

import qnt.xr_talib as talib
close_price_sma = talib.SMA(close_price, 2)

Optimized versions of the indicators, based on numba, can be found in the qnt/ta folder, for example:

import qnt.ta as qnta
close_price_sma = qnta.sma(close_price, 2)
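For readers who want to see what a simple moving average computes without the library installed, here is a sketch that uses xarray's own rolling window on invented data. It is not the implementation behind the qnt functions, which may treat missing values and the first few rows differently.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical one-asset price series along the time dimension.
toy = xr.DataArray(
    np.array([1.0, 2.0, 3.0, 4.0]),
    dims=("time",),
    coords={"time": pd.date_range("2020-01-01", periods=4)},
)

# Mean over a window of 2 consecutive time steps.
# The first entry is NaN because a full window is not yet available.
sma2 = toy.rolling(time=2).mean()
print(sma2.values)
```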

pandas

Here we describe how to work with pandas data structures.

The first step consists of converting the sliced xarray.DataArray into a pandas.DataFrame:

import qnt.data as qntdata
data = qntdata.futures.load_data(tail=365*15)
close = data.sel(field="close").to_pandas()

We can then compute an indicator using standard pandas methods:

close_sma = ((close-close.shift(10))/close.shift(10)).rolling(30).mean()

and define our normalized weights to be:

norm = abs(close_sma).sum(axis=1)
weights = close_sma.div(norm, axis=0)

The final conversion to an xarray.DataArray can be performed simply with:

final_weights = weights.unstack().to_xarray()
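The normalization and the conversion back to xarray can be tried on a small synthetic table. In this sketch the DataFrame, its values, and the index names "time" and "asset" are assumptions made for illustration. The real data obtained from to_pandas() should carry its own index names.

```python
import numpy as np
import pandas as pd

# Hypothetical indicator values: 3 days x 2 assets.
signal = pd.DataFrame(
    [[1.0, -1.0], [3.0, 1.0], [-2.0, 2.0]],
    index=pd.date_range("2020-01-01", periods=3, name="time"),
    columns=pd.Index(["A", "B"], name="asset"),
)

# Divide each row by the sum of its absolute values,
# so that the absolute weights add up to 1 on every day.
norm = signal.abs().sum(axis=1)
weights = signal.div(norm, axis=0)

# unstack() turns the table into a Series indexed by (asset, time);
# to_xarray() then builds a DataArray with one dimension per index level.
final_weights = weights.unstack().to_xarray()
print(final_weights.dims)
```

Note that the dimensions of the result are named after the index levels of the DataFrame, so the names must be set on the DataFrame before converting.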

Below are some useful wrapper functions for working with pandas structures. Each operator is listed by its signature, followed by its Python implementation:

ts_sum(df, window)

def ts_sum(df, window=20):
    """
    Computes the sum of the values on a rolling basis.
    :param df: pandas.DataFrame.
    :param window: the rolling window used for the computation.
    :return: a pandas.DataFrame with the sum of the values over the past 'window' days.
    """
    return df.rolling(window).sum()

sma(df, window)

def sma(df, window=20):
    """
    Computes the simple moving average.
    :param df: pandas.DataFrame.
    :param window: the rolling window used for the computation.
    :return: a pandas.DataFrame with the sma over the past 'window' days.
    """
    return df.rolling(window).mean()

stddev(df, window)

def stddev(df, window=20):
    """
    Computes the standard deviation on a rolling basis.
    :param df: pandas.DataFrame.
    :param window: the rolling window used for the computation.
    :return: a pandas.DataFrame with the stddev over the past 'window' days.
    """
    return df.rolling(window).std()

correlation(x, y, window)

def correlation(x, y, window=20):
    """
    Computes correlation on a rolling basis.
    :params x,y: pandas.DataFrames.
    :param window: the rolling window used for the computation.
    :return: a pandas.DataFrame with the time-series of the column-wise correlation between x and y over the past 'window' days.
    """
    return x.rolling(window).corr(y)

covariance(x, y, window)

def covariance(x, y, window=20):
    """
    Computes covariance on a rolling basis.
    :params x,y: pandas.DataFrames.
    :param window: the rolling window used for the computation.
    :return: a pandas.DataFrame with the time-series of the column-wise covariance between x and y over the past 'window' days.
    """
    return x.rolling(window).cov(y)

rolling_rank(na)

def rolling_rank(na):
    """
    Auxiliary function for ts_rank.
    :param na: numpy array.
    :return: the rank of the last value in the array.
    """
    import scipy.stats
    return scipy.stats.rankdata(na)[-1]

ts_rank(df, window)

def ts_rank(df, window=20):
    """
    Computes the rank on a rolling basis.
    :param df: a pandas.DataFrame.
    :param window: the rolling window used for the computation.
    :return: a pandas.DataFrame with the rank over the past 'window' days.
    """
    # raw=True passes each window to rolling_rank as a numpy array
    return df.rolling(window).apply(rolling_rank, raw=True)

rolling_prod(na)

def rolling_prod(na):
    """
    Auxiliary function for product.
    :param na: numpy array.
    :return: the product of the values in the array.
    """
    import numpy
    return numpy.prod(na)

product(df, window)

def product(df, window=20):
    """
    Computes the product on a rolling basis.
    :param df: a pandas.DataFrame.
    :param window: the rolling window used for the computation.
    :return: a pandas.DataFrame with the product over the past 'window' days.
    """
    # raw=True passes each window to rolling_prod as a numpy array
    return df.rolling(window).apply(rolling_prod, raw=True)

ts_min(df, window)

def ts_min(df, window=20):
    """
    Computes the minimum on a rolling basis.
    :param df: a pandas.DataFrame.
    :param window: the rolling window.
    :return: a pandas.DataFrame with the minimum over the past 'window' days.
    """
    return df.rolling(window).min()

ts_max(df, window)

def ts_max(df, window=20):
    """
    Computes the maximum on a rolling basis.
    :param df: a pandas.DataFrame.
    :param window: the rolling window.
    :return: a pandas.DataFrame with the maximum over the past 'window' days.
    """
    return df.rolling(window).max()

delta(df, period)

def delta(df, period=1):
    """
    Computes the difference.
    :param df: a pandas.DataFrame.
    :param period: the lag, in days, used for the difference.
    :return: a pandas.DataFrame with today’s value minus the value 'period' days ago.
    """
    return df.diff(period)

delay(df, period)

def delay(df, period=1):
    """
    Computes the lagged value.
    :param df: a pandas.DataFrame.
    :param period: the lag, in days.
    :return: a pandas.DataFrame with the lagged values of the time series.
    """
    return df.shift(period)

rank(df)

def rank(df):
    """
    Cross-sectional rank.
    :param df: a pandas.DataFrame.
    :return: a pandas.DataFrame with the percentile rank of each value across the columns of its row.
    """
    return df.rank(axis=1, pct=True)

scale(df, k)

def scale(df, k=1):
    """
    Scaled time series.
    :param df: a pandas.DataFrame.
    :param k: scaling factor.
    :return: a pandas.DataFrame rescaled such that sum(abs(df)) = k
    """
    import numpy
    return df.mul(k).div(numpy.abs(df).sum())

ts_argmax(df, window)

def ts_argmax(df, window=20):
    """
    Computes the position at which ts_max(df, window) occurred.
    :param df: a pandas.DataFrame.
    :param window: the rolling window.
    :return: a pandas.DataFrame with the 1-based position of the maximum inside the window (1 is the oldest day, 'window' is the most recent day).
    """
    import numpy
    return df.rolling(window).apply(numpy.argmax, raw=True) + 1

ts_argmin(df, window)

def ts_argmin(df, window=20):
    """
    Computes the position at which ts_min(df, window) occurred.
    :param df: a pandas.DataFrame.
    :param window: the rolling window.
    :return: a pandas.DataFrame with the 1-based position of the minimum inside the window (1 is the oldest day, 'window' is the most recent day).
    """
    import numpy
    return df.rolling(window).apply(numpy.argmin, raw=True) + 1
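As a usage sketch, the snippet below redefines four of the wrappers in compact form (docstrings omitted) and applies them to an invented DataFrame. The column names, dates and values are hypothetical, and the way the wrappers are combined is only an illustration, not a recommended strategy.

```python
import numpy as np
import pandas as pd

def sma(df, window=20):
    return df.rolling(window).mean()

def delta(df, period=1):
    return df.diff(period)

def rank(df):
    return df.rank(axis=1, pct=True)

def ts_argmax(df, window=20):
    return df.rolling(window).apply(np.argmax, raw=True) + 1

# Hypothetical prices: 4 days x 2 assets.
toy = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0], "B": [4.0, 3.0, 5.0, 1.0]},
    index=pd.date_range("2020-01-01", periods=4),
)

momentum = delta(sma(toy, window=2), period=1)  # day-over-day change of the 2-day average
cross_rank = rank(toy)                          # percentile rank across assets, per day
peak_pos = ts_argmax(toy, window=3)             # position of the 3-day maximum in each window

print(momentum)
print(cross_rank)
print(peak_pos)
```

Rows for which a full window is not yet available are NaN. In peak_pos, a value equal to the window length means that the maximum is the most recent observation.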