Introduction

Quantiacs offers historical data for major financial markets, including stocks, futures (like Bitcoin futures), and cryptocurrencies. This section provides an overview of the data:

  • Stocks: Market data for NASDAQ-listed and S&P 500-listed companies, both past and present.

  • Futures: Market data for liquid global futures contracts with various underlying assets.

  • Cryptocurrencies: Market data for top cryptocurrencies by market capitalization.

Additional Datasets:

Working with Data

The Quantiacs Python library provides the qnt.data module, which simplifies loading and working with financial data. The following functions load the different data types:

import qnt.data as qndata

# Load daily stock data for the Q22 S&P500 contest
stocks = qndata.stocks.load_spx_data(min_date="2005-06-01")

# Load daily stock data for the Q20 Nasdaq-100 contest
stocks_nasdaq = qndata.stocks.load_ndx_data(min_date="2005-06-01")

# Load cryptocurrency daily data for the Q16/Q17 contests
cryptodaily = qndata.cryptodaily.load_data(min_date="2005-06-01")

# Load futures data for the Q15 contest
futures = qndata.futures.load_data(min_date="2005-06-01")

# Load BTC futures data for the Q15 contest
crypto_futures = qndata.cryptofutures.load_data(min_date="2005-06-01")

print(stocks, stocks_nasdaq, cryptodaily, futures, crypto_futures)

All datasets have the same structure. Each data field, such as open, close, high and low prices or trading volume, is selected by name with sel:

import qnt.data as qndata

data = qndata.stocks.load_spx_data(min_date="2005-06-01")

price_open = data.sel(field="open")
price_close = data.sel(field="close")
price_high = data.sel(field="high")
price_low = data.sel(field="low")
volume_day = data.sel(field="vol")
is_liquid = data.sel(field="is_liquid")

| Data field | Description | Code example |
| --- | --- | --- |
| open | Daily open price, i.e. the price at which a security trades when the exchange opens. | price_open = data.sel(field="open") |
| close | Daily close price. | price_close = data.sel(field="close") |
| high | Daily high price. | price_high = data.sel(field="high") |
| low | Daily low price. | price_low = data.sel(field="low") |
| vol | Daily number of traded shares. | volume_day = data.sel(field="vol") |
| divs | Dividends from shares. | dividends = data.sel(field="divs") |
| split | Indicates a stock split. Split = 2.0 means that on this day there was a split of shares 2 to 1. | split = data.sel(field="split") |
| split_cumprod | The product of split values from the very beginning. It can be used for restoring original prices. | split_cumprod = data.sel(field="split_cumprod") |
| is_liquid | Determines whether this stock is liquid enough for trading. A stock is liquid if it is part of the top 500 stocks. | is_liquid = data.sel(field="is_liquid") |

Frequently used functions

View a list of all tickers:

data.asset.to_pandas().to_list()

See which fields are available:

data.field.to_pandas().to_list()

Load specific tickers:

data = qndata.stocks.load_spx_data(min_date="2005-06-01", assets=["NAS:AAPL", "NAS:AMZN"])

Select specific tickers after loading all data:

def get_data_filter(data, assets):
    filtered = data.sel(asset=assets)
    return filtered

get_data_filter(data, ["NAS:AAPL", "NAS:AMZN"])

Load a list of NASDAQ-listed stocks:

stocks_list = qndata.stocks.load_ndx_list(min_date='2006-01-01')

Load a list of available futures contracts:

future_list = qndata.futures.load_list()

List the sectors:

sectors = [x['sector'] for x in stocks_list]

Filter the list of asset IDs for a specified sector:

assets_for_sector = [x['id'] for x in stocks_list if x['sector'] == "Energy"]

Load specific tickers for a sector:

data = qndata.stocks.load_spx_data(min_date="2005-06-01", assets=assets_for_sector)
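
The sector filtering above works on a plain list of dictionaries, so it can be tried without loading any data. The following is a minimal sketch, assuming each entry carries at least the 'id' and 'sector' keys used above. The entries are made-up placeholders, not actual output of load_ndx_list.

```python
# Hypothetical stand-in for the list returned by qndata.stocks.load_ndx_list();
# only the 'id' and 'sector' keys used above are assumed, and the values are made up.
stocks_list = [
    {"id": "NAS:AAPL", "sector": "Technology"},
    {"id": "NAS:AMZN", "sector": "Consumer Cyclical"},
    {"id": "NAS:XXXX", "sector": "Energy"},
    {"id": "NAS:YYYY", "sector": "Energy"},
]

# Unique sector names, sorted for stable output. A plain list comprehension
# keeps one entry per stock and therefore contains duplicates.
sectors = sorted({x["sector"] for x in stocks_list})

# Asset IDs that belong to a single sector.
assets_for_sector = [x["id"] for x in stocks_list if x["sector"] == "Energy"]

print(sectors)
print(assets_for_sector)
```

Collecting the sectors into a set first is a small design choice: it makes the list usable as a menu of valid values for the filter.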

Xarray

Quantiacs data is stored in xarray DataArray format, which provides a powerful and flexible way to work with labeled multi-dimensional arrays. You can perform arithmetic operations, mathematical functions, broadcasting, aggregations, slicing and selecting data, combining DataArrays, and applying custom functions on DataArray objects.

import qnt.data as qndata

# Load data
stocks = qndata.stocks.load_spx_data(min_date="2005-06-01")
print(stocks.dims)

# Example output: ('field', 'time', 'asset')

The main data structure in xarray is the DataArray. It has the following properties:

  • values: a numpy.ndarray containing the array’s values.

  • dims: dimension names for each axis.

  • coords: a container of arrays (coordinates) that label each point.

  • attrs: a dictionary for storing arbitrary metadata (attributes).
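
These four properties can be inspected on a small synthetic array. The sketch below builds a DataArray with the same dimension names as the loaded data. The tickers, dates and numbers are made up for illustration and do not come from Quantiacs.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for loaded data: 2 fields x 3 days x 2 assets.
values = np.arange(12, dtype=float).reshape(2, 3, 2)
data = xr.DataArray(
    values,
    dims=("field", "time", "asset"),
    coords={
        "field": ["open", "close"],
        "time": pd.date_range("2021-01-01", periods=3, freq="D"),
        "asset": ["NAS:AAA", "NAS:BBB"],  # hypothetical tickers
    },
    attrs={"source": "synthetic example"},
)

print(type(data.values))  # the underlying numpy.ndarray
print(data.dims)          # dimension names, one per axis
print(list(data.coords))  # names of the coordinate arrays
print(data.attrs)         # free-form metadata dictionary

# Label-based selection of one field drops the 'field' dimension.
close = data.sel(field="close")
print(close.dims)
```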

import qnt.data as qndata
import numpy as np
import qnt.ta as qnta
import qnt.xr_talib as talib

# Load data
data = qndata.stocks.load_spx_data(min_date="2005-06-01")

# Select close prices
price_close = data.sel(field="close")

# Scale close prices by a constant (element-wise arithmetic)
price_close_100 = price_close / 100.0

# Calculate log of close prices
log_price = np.log(price_close)

# Select close prices for a specific time range
slice_price = price_close.sel(time=slice("2006-01-01", "2006-12-31"))

# Calculate simple moving average (SMA) using qnt.ta
close_price_sma = qnta.sma(price_close, 2)

# Calculate SMA using qnt.xr_talib
close_price_sma_talib = talib.SMA(price_close, 2)
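
To make the moving-average step concrete, the sketch below computes a 2-period SMA on made-up prices with xarray's built-in rolling window. It only illustrates what an SMA is. It is not a description of how qnta.sma or talib.SMA are implemented, and their handling of the first rows may differ.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up close prices for two hypothetical tickers over four days.
price_close = xr.DataArray(
    np.array([[10.0, 100.0], [12.0, 98.0], [14.0, 102.0], [13.0, 104.0]]),
    dims=("time", "asset"),
    coords={
        "time": pd.date_range("2021-01-01", periods=4, freq="D"),
        "asset": ["NAS:AAA", "NAS:BBB"],
    },
)

# 2-period simple moving average along time:
# the mean of the current and the previous value for each asset.
# The first row is NaN because a full window is not yet available.
sma_2 = price_close.rolling(time=2).mean()

print(sma_2.sel(asset="NAS:AAA").values)
```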

Pandas

In addition to xarray, you can work with pandas data structures: convert the xarray DataArray into a pandas DataFrame, perform computations using standard pandas methods, and then convert the result back into an xarray DataArray.

Example 1

import qnt.data as qntdata

# Load data
data = qntdata.stocks.load_spx_data(min_date="2005-06-01")

# Calculate percentage change of close prices
def get_price_pct_change(prices):
    # Convert to a DataFrame (rows: time, columns: assets)
    prices_pandas = prices.to_pandas()
    # pct_change works column-wise, so all assets are processed at once
    return prices_pandas.pct_change()

prices = data.sel(field="close") * 1.0
prices_pct_change = get_price_pct_change(prices).unstack().to_xarray()
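
The round trip in Example 1 can be checked on a small synthetic array. This is a sketch with made-up prices and hypothetical tickers. It also shows a detail that is easy to miss: unstack() puts the asset level first, so the converted array comes back with dims ('asset', 'time') unless it is transposed.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up close prices with the (time, asset) layout of data.sel(field="close").
prices = xr.DataArray(
    np.array([[100.0, 50.0], [110.0, 45.0], [121.0, 54.0]]),
    dims=("time", "asset"),
    coords={
        "time": pd.date_range("2021-01-01", periods=3, freq="D"),
        "asset": ["NAS:AAA", "NAS:BBB"],
    },
)

# xarray -> pandas: rows are dates, columns are assets.
prices_pandas = prices.to_pandas()

# Column-wise percentage change; fill_method=None leaves missing values untouched.
pct = prices_pandas.pct_change(fill_method=None)

# pandas -> xarray: unstack() yields a Series indexed by (asset, time),
# so transpose to restore the original dimension order.
pct_xr = pct.unstack().to_xarray().transpose("time", "asset")

print(pct_xr.dims)
print(pct_xr.sel(asset="NAS:AAA").values)
```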

Example 2

import qnt.data as qntdata

# Load data
data = qntdata.stocks.load_spx_data(min_date="2005-06-01")

# Convert close prices to pandas DataFrame
close = data.sel(field="close").to_pandas()

# Calculate the 30-day rolling mean of the 10-day rate of change of close prices
close_sma = ((close - close.shift(10)) / close.shift(10)).rolling(30).mean()

# Normalize so that the absolute values sum to 1 across assets on each day
norm = abs(close_sma).sum(axis=1)
weights = close_sma.div(norm, axis=0)

# Convert weights back to xarray DataArray
final_weights = weights.unstack().to_xarray()
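
The normalization step in Example 2 can be illustrated in isolation. The sketch below uses a made-up signal table. Setting rows with a zero total to 0 is an assumption added here, as one possible way to avoid NaN weights; it is not part of the example above.

```python
import numpy as np
import pandas as pd

# Made-up signal values for two hypothetical tickers over three days.
signal = pd.DataFrame(
    {"NAS:AAA": [0.02, -0.01, 0.0], "NAS:BBB": [-0.06, 0.03, 0.0]},
    index=pd.date_range("2021-01-01", periods=3, freq="D"),
)

# Total absolute signal per day.
norm = signal.abs().sum(axis=1)

# Divide each row by its total. A zero total gives 0/0 = NaN,
# so such rows are replaced with 0 here (assumed choice: hold no position).
weights = signal.div(norm, axis=0).fillna(0.0)

print(weights)
print(weights.abs().sum(axis=1).values)
```

After normalization the absolute weights on each day sum to 1, or to 0 on days without any signal.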

How to create a dataset yourself

You can also build a DataArray with the same fields from your own data, for example for testing. The example below creates a three-day sample for a single asset named BTC:

import pandas as pd
import xarray as xr
import numpy as np


class Fields:
    OPEN = "open"
    LOW = "low"
    HIGH = "high"
    CLOSE = "close"
    VOL = "vol"
    DIVS = "divs"
    SPLIT = "split"
    SPLIT_CUMPROD = "split_cumprod"
    IS_LIQUID = 'is_liquid'


class Dimensions:
    TIME = 'time'
    FIELD = 'field'
    ASSET = 'asset'


def get_base_df():
    sample_data = {"schema": {"fields": [{"name": "time", "type": "datetime"}, {"name": "open", "type": "integer"},
                                         {"name": "close", "type": "integer"}, {"name": "low", "type": "integer"},
                                         {"name": "high", "type": "integer"}, {"name": "vol", "type": "integer"},
                                         {"name": "divs", "type": "integer"}, {"name": "split", "type": "integer"},
                                         {"name": "split_cumprod", "type": "integer"},
                                         {"name": "is_liquid", "type": "integer"}], "primaryKey": ["time"],
                              "pandas_version": "0.20.0"}, "data": [
        {"time": "2021-01-30T00:00:00.000Z", "open": 1, "close": 2, "low": 1, "high": 2, "vol": 1000, "divs": 0,
         "split": 0, "split_cumprod": 0, "is_liquid": 1},
        {"time": "2021-01-31T00:00:00.000Z", "open": 2, "close": 3, "low": 2, "high": 3, "vol": 1000, "divs": 0,
         "split": 0, "split_cumprod": 0, "is_liquid": 1},
        {"time": "2021-02-01T00:00:00.000Z", "open": 3, "close": 4, "low": 3, "high": 4, "vol": 1000, "divs": 0,
         "split": 0, "split_cumprod": 0, "is_liquid": 1},
    ]}
    columns = [Dimensions.TIME, Fields.OPEN, Fields.CLOSE, Fields.LOW, Fields.HIGH, Fields.VOL, Fields.DIVS,
               Fields.SPLIT, Fields.SPLIT_CUMPROD, Fields.IS_LIQUID]
    rows = sample_data['data']
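    for r in rows:
        # Parse the ISO timestamp and drop the UTC marker to get a timezone-naive time index
        r['time'] = pd.Timestamp(r['time']).tz_localize(None)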
    df = pd.DataFrame(columns=columns, data=rows)
    df.set_index(Dimensions.TIME, inplace=True)
    prices_array = df.to_xarray().to_array(Dimensions.FIELD)
    prices_array.name = "BTC_TEST"
    prices_array_r = xr.concat([prices_array], pd.Index(['BTC'], name='asset'))
    return prices_array_r


dataset = get_base_df()
print(dataset)
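
A more direct route is to pass a NumPy array straight to the xarray.DataArray constructor. The sketch below is an alternative to the function above, not a Quantiacs API. It reuses the same made-up numbers for a subset of the fields and reorders the dimensions to ('field', 'time', 'asset'), the order shown for loaded data earlier.

```python
import numpy as np
import pandas as pd
import xarray as xr

fields = ["open", "close", "low", "high", "vol"]
times = pd.date_range("2021-01-30", periods=3, freq="D")
assets = ["BTC"]

# One (time x field) table for the single asset; values are made up.
btc = np.array(
    [
        [1, 2, 1, 2, 1000],
        [2, 3, 2, 3, 1000],
        [3, 4, 3, 4, 1000],
    ],
    dtype=float,
)

dataset = xr.DataArray(
    btc[np.newaxis, :, :],  # add a leading asset axis
    dims=("asset", "time", "field"),
    coords={"asset": assets, "time": times, "field": fields},
    name="BTC_TEST",
).transpose("field", "time", "asset")

print(dataset.dims)
print(dataset.sel(field="close", asset="BTC").values)
```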