strategy

Futures - BLS Macro Data

This template uses data from the Bureau of Labor Statistics for trading futures contracts.

You can clone and edit this example there (tab Examples).


The U.S. Bureau of Labor Statistics is the principal agency for the U.S. government in the field of labor economics and statistics. It provides macroeconomic data in several interesting categories: prices, employment and unemployment, compensation and working conditions and productivity.

Quantiacs has implemented these datasets on its cloud and makes them also available for local use on your machine.

In this template we show how to use the BLS data for creating a trading algorithm.

Need help? Check the Documentation and find solutions/report problems in the Forum section.

More help with Jupyter? Check the official Jupyter page.

Check the BLS documentation on the Quantiacs macroeconomics help page.

Once you are done, click on Submit to the contest and take part to our competitions.

API reference:

  • data: check how to work with data;

  • backtesting: read how to run the simulation and check the results.

Need to use the optimizer function to automate tedious tasks?

  • optimization: read more on our article.
In [1]:
import pandas as pd
import numpy as np

import qnt.data as qndata
In [2]:
%%javascript
window.IPython && (IPython.OutputArea.prototype._should_scroll = function(lines) { return false; })
// disable widget scrolling

First of all we list the 34 available datasets and inspect them:

In [3]:
dbs = qndata.blsgov.load_db_list()

display(pd.DataFrame(dbs)) # convert to pandas for better formatting
100% (3935 of 3935) |####################| Elapsed Time: 0:00:00 Time:  0:00:00
id modified name
0 EN 2019-10-15T11:01:00 Quarterly Census of Employment and Wages
1 CS 2022-02-15T12:49:00 Nonfatal cases involving days away from work: ...
2 OE 2022-03-31T10:09:00 Occupational Employment Statistics
3 FM 2022-04-20T10:04:00 Marital and family labor force statistics from...
4 TU 2022-06-23T11:29:00 American Time Use
5 EP 2022-09-08T10:03:00 Employment Projections by Industry
6 NB 2022-09-22T10:02:00 National Compensation Survey-Benefits
7 CX 2022-10-25T12:00:00 Consumer Expenditure Survey
8 IS 2022-11-09T10:00:00 Occupational injuries and illnesses industry data
9 OR 2022-11-17T10:00:00 Occupational Requirements
10 MP 2022-11-18T10:00:00 Major Sector Multifactor Productivity
11 WM 2022-12-08T10:00:00 Wage Modeling
12 CM 2022-12-15T10:00:00 Employer Costs for Employee Compensation
13 FW 2022-12-16T10:00:00 Census of Fatal Occupational Injuries (2011 fo...
14 IP 2023-01-05T10:00:00 Industry Productivity
15 WS 2023-01-10T10:00:00 Work Stoppage Data
16 AP 2023-01-12T08:30:00 Consumer Price Index - Average Price Data
17 CU 2023-01-12T08:30:00 Consumer Price Index - All Urban Consumers
18 CW 2023-01-12T08:30:00 Consumer Price Index - Urban Wage Earners and ...
19 SU 2023-01-12T08:30:00 Consumer Price Index - Chained Consumer Price ...
20 EI 2023-01-13T08:30:00 Import/Export Price Indexes
21 ND 2023-01-18T08:30:00 Producer Price Index Industry Data
22 PC 2023-01-18T08:30:00 Producer Price Index Industry Data
23 WD 2023-01-18T08:30:00 Producer Price Index Commodity-Discontinued Se...
24 WP 2023-01-18T08:30:00 Producer Price Index-Commodities
25 LE 2023-01-19T10:00:00 Weekly and hourly earnings data from the Curre...
26 LU 2023-01-19T10:00:00 Union affiliation data from the Current Popula...
27 SM 2023-01-24T10:00:00 State and Area Employment, Hours, and Earnings
28 BD 2023-01-25T10:00:00 Business Employment Dynamics
29 CI 2023-01-31T08:30:00 Employment Cost Index
30 JT 2023-02-01T10:00:00 Job Openings and Labor Turnover Survey
31 LA 2023-02-01T10:00:00 Local Area Unemployment Statistics
32 PR 2023-02-02T08:30:00 Major Sector Productivity and Costs
33 CE 2023-02-03T08:30:00 Employment, Hours, and Earnings from the Curre...
34 LN 2023-02-03T08:30:00 Labor Force Statistics from the Current Popula...

For each dataset you can see the identifier, the name and the date of the last available update. Each dataset contains several time series which can be used as indicators.

In this example we use AP. Average consumer Prices are calculated for household fuel, motor fuel and food items from prices collected for the Consumer Price Index (CPI). The full description is available in the metadata.

Let us load and display the time series contained in the AP dataset:

In [4]:
series_list = list(qndata.blsgov.load_series_list('AP'))

display(pd.DataFrame(series_list).set_index('id')) # convert to pandas for better formatting
100% (478963 of 478963) |################| Elapsed Time: 0:00:00 Time:  0:00:00
100% (2 of 2) |##########################| Elapsed Time: 0:00:00 Time:  0:00:00
area_code item_code series_title footnote_codes begin_year begin_period end_year end_period
id
APU0000701111 0000 701111 Flour, white, all purpose, per lb. (453.6 gm) ... 1980 M01 2022 M12
APU0000701311 0000 701311 Rice, white, long grain, precooked (cost per p... 1980 M01 1981 M12
APU0000701312 0000 701312 Rice, white, long grain, uncooked, per lb. (45... 1980 M01 2022 M12
APU0000701321 0000 701321 Spaghetti (cost per pound/453.6 grams) in U.S.... 1980 M01 1981 M03
APU0000701322 0000 701322 Spaghetti and macaroni, per lb. (453.6 gm) in ... 1984 M01 2022 M12
... ... ... ... ... ... ... ... ...
APUS49G74713 S49G 74713 Gasoline, leaded premium (cost per gallon/3.8 ... 1978 M01 1981 M04
APUS49G74714 S49G 74714 Gasoline, unleaded regular, per gallon/3.785 l... 1978 M01 2022 M12
APUS49G74715 S49G 74715 Gasoline, unleaded midgrade, per gallon/3.785 ... 2021 M06 2022 M12
APUS49G74716 S49G 74716 Gasoline, unleaded premium, per gallon/3.785 l... 1981 M09 2022 M12
APUS49G7471A S49G 7471A Gasoline, all types, per gallon/3.785 liters i... 1978 M01 2022 M12

1482 rows × 8 columns

As you see, the AP Average Price Data dataset contains 1479 time series.

Let us see how we can learn the meaning of the 8 columns. Some of them are obvious, like series_title, begin_year or end_year, but others are not, like area_code, item_code, begin_period, end_period.

Inspect the metadata

The Quantiacs toolbox allows you to inspect the meaning of all fields:

In [5]:
meta = qndata.blsgov.load_db_meta('AP')

for k in meta.keys():
    print('### ' + k + " ###")
    m = meta[k]
    
    if type(m) == str:
        # Show only the first line if this is a text entry.
        print(m.split('\n')[0])
        print('...')
        # Uncomment the next line to see the full text. It will give you more details about the database.
        # print(m) 

    if type(m) == dict:
        # convert dictionaries to pandas DataFrame for better formatting:
        df = pd.DataFrame(meta[k].values())
        df = df.set_index(np.array(list(meta[k].keys())))
        display(df)
100% (26925 of 26925) |##################| Elapsed Time: 0:00:00 Time:  0:00:00
### area ###
0
0000 U.S. city average
0100 Northeast
0110 New England
0120 Middle Atlantic
0200 Midwest
... ...
S49C Riverside-San Bernardino-Ontario, CA
S49D Seattle-Tacoma-Bellevue WA
S49E San Diego-Carlsbad, CA
S49F Urban Hawaii
S49G Urban Alaska

74 rows × 1 columns

### footnote ###
0
footnote_code footnote_text
### item ###
0
701111 Flour, white, all purpose, per lb. (453.6 gm)
701311 Rice, white, long grain, precooked (cost per p...
701312 Rice, white, long grain, uncooked, per lb. (45...
701321 Spaghetti (cost per pound/453.6 grams)
701322 Spaghetti and macaroni, per lb. (453.6 gm)
... ...
FJ4101 Yogurt, per 8 oz. (226.8 gm)
FL2101 Lettuce, romaine, per lb. (453.6 gm)
FN1101 All soft drinks, per 2 liters (67.6 oz)
FN1102 All soft drinks, 12 pk, 12 oz., cans, per 12 o...
FS1101 Butter, stick, per lb. (453.6 gm)

160 rows × 1 columns

### period ###
period period_abbr period_name
M01 M01 JAN January
M02 M02 FEB February
M03 M03 MAR March
M04 M04 APR April
M05 M05 MAY May
M06 M06 JUN June
M07 M07 JUL July
M08 M08 AUG August
M09 M09 SEP September
M10 M10 OCT October
M11 M11 NOV November
M12 M12 DEC December
M13 M13 AN AV Annual Average
### seasonal ###
0
S Seasonally Adjusted
U Not Seasonally Adjusted
### contacts ###
Consumer Price Indexes Contacts
...
### txt ###
				Average Price Data (AP)
...

These tables allows you to quickly understand the meaning of the fields for each times series in the Average Price Data.

The area_code column reflects the U.S. area connected to the time series, for example 0000 for the entire U.S.

Let us select only time series related to the entire U.S.:

In [6]:
us_series_list = [s for s in series_list if s['area_code'] == '0000']

display(pd.DataFrame(us_series_list).set_index('id')) # convert to pandas for better formatting
area_code item_code series_title footnote_codes begin_year begin_period end_year end_period
id
APU0000701111 0000 701111 Flour, white, all purpose, per lb. (453.6 gm) ... 1980 M01 2022 M12
APU0000701311 0000 701311 Rice, white, long grain, precooked (cost per p... 1980 M01 1981 M12
APU0000701312 0000 701312 Rice, white, long grain, uncooked, per lb. (45... 1980 M01 2022 M12
APU0000701321 0000 701321 Spaghetti (cost per pound/453.6 grams) in U.S.... 1980 M01 1981 M03
APU0000701322 0000 701322 Spaghetti and macaroni, per lb. (453.6 gm) in ... 1984 M01 2022 M12
... ... ... ... ... ... ... ... ...
APU0000FJ4101 0000 FJ4101 Yogurt, per 8 oz. (226.8 gm) in U.S. city aver... 2018 M04 2022 M12
APU0000FL2101 0000 FL2101 Lettuce, romaine, per lb. (453.6 gm) in U.S. c... 2006 M01 2022 M12
APU0000FN1101 0000 FN1101 All soft drinks, per 2 liters (67.6 oz) in U.S... 2018 M04 2022 M12
APU0000FN1102 0000 FN1102 All soft drinks, 12 pk, 12 oz., cans, per 12 o... 2018 M04 2022 M12
APU0000FS1101 0000 FS1101 Butter, stick, per lb. (453.6 gm) in U.S. city... 2018 M04 2022 M12

160 rows × 8 columns

We have 160 time series out of the original 1479. These are global U.S. time series which are more relevant for forecasting global financial markets. Let us select time series which are currently being updated and have at least 20 years of history:

In [7]:
actual_us_series_list = [s for s in us_series_list if s['begin_year'] <= '2000' and s['end_year'] == '2021' ]

display(pd.DataFrame(actual_us_series_list).set_index('id')) # convert to pandas for better formatting
area_code item_code series_title footnote_codes begin_year begin_period end_year end_period
id
APU0000711417 0000 711417 Grapes, Thompson Seedless, per lb. (453.6 gm) ... 1980 M07 2021 M09
In [8]:
len(actual_us_series_list)
Out[8]:
1

We have 55 time series whose history is long enough for our purpose. Now we can load one of these series and use it for our strategy. Let us focus on energy markets. We consider fuel oil APU000072511 on a monthly basis:

In [9]:
series_data = qndata.blsgov.load_series_data('APU000072511', tail = 30*365)

# convert to pandas.DataFrame
series_data = pd.DataFrame(series_data)
series_data = series_data.set_index('pub_date')

# remove yearly average data, see period dictionary
series_data = series_data[series_data['period'] != 'M13']

series_data
100% (37710 of 37710) |##################| Elapsed Time: 0:00:00 Time:  0:00:00
Out[9]:
year period footnote_codes value
pub_date
1994-10-14 1994 M09 [] 0.894
1994-11-14 1994 M10 [] 0.890
1994-12-14 1994 M11 [] 0.894
1995-01-14 1994 M12 [] 0.900
1995-02-14 1995 M01 [] 0.913
... ... ... ... ...
2022-09-14 2022 M08 [] 4.953
2022-10-14 2022 M09 [] 4.815
2022-11-14 2022 M10 [] 5.786
2022-12-14 2022 M11 [] 5.240
2023-01-14 2022 M12 [] 4.344

340 rows × 4 columns

Next, let us consider Futures contracts in the Energy sector:

In [10]:
futures_list = qndata.futures_load_list()

energy_futures_list = [f for f in futures_list if f['sector'] == 'Energy']

pd.DataFrame(energy_futures_list)
100% (7168 of 7168) |####################| Elapsed Time: 0:00:00 Time:  0:00:00
Out[10]:
id name sector point_value
0 F_BC Crude Oil Brent Energy $1,000
1 F_BG Gasoil Low Sulphur Energy $100
2 F_HO Heating Oil Energy $42,000
3 F_NG UK Natural Gas Energy GBP 1,000
4 F_RB JPX Gasoline Energy JPY 50
5 F_CL United States Oil Fund Energy 1

We consider Brent Crude Oil, F_BC, and define a strategy using a multi-pass approach:

In [11]:
import xarray as xr
import numpy as np
import pandas as pd

import qnt.ta as qnta
import qnt.backtester as qnbt
import qnt.data as qndata


def load_data(period):
    
    futures = qndata.futures_load_data(assets=['F_BC'], tail=period, dims=('time','field','asset'))
    
    ap = qndata.blsgov.load_series_data('APU000072511', tail=period)
    
    # convert to pandas.DataFrame
    ap = pd.DataFrame(ap) 
    ap = ap.set_index('pub_date') 

    # remove yearly average data, see period dictionary
    ap = ap[ap['period'] != 'M13']
    
    # convert to xarray
    ap = ap['value'].to_xarray().rename(pub_date='time').assign_coords(time=pd.to_datetime(ap.index.values))
    
    # return both time series
    return dict(ap=ap, futures=futures), futures.time.values


def window(data, max_date: np.datetime64, lookback_period: int):
    # the window function isolates data which are needed for one iteration
    # of the backtester call
    
    min_date = max_date - np.timedelta64(lookback_period, 'D')
    
    return dict(
        futures = data['futures'].sel(time=slice(min_date, max_date)),
        ap = data['ap'].sel(time=slice(min_date, max_date))
    )


def strategy(data, state):
    
    close = data['futures'].sel(field='close')
    ap = data['ap']
    
    # the strategy complements indicators based on the Futures price with macro data
    # and goes long/short or takes no exposure:
    
    if ap.isel(time=-1) > ap.isel(time=-2) \
            and close.isel(time=-1) > close.isel(time=-20):
        return xr.ones_like(close.isel(time=-1)), 1
    
    elif ap.isel(time=-1) < ap.isel(time=-2) \
            and ap.isel(time=-2) < ap.isel(time=-3) \
            and ap.isel(time=-3) < ap.isel(time=-4) \
            and close.isel(time=-1) < close.isel(time=-40):
        return -xr.ones_like(close.isel(time=-1)), 1 
    
    # When the state is None, we are in the beginning and no weights were generated.
    # We use buy'n'hold to fill these first days.
    elif state is None: 
        return xr.ones_like(close.isel(time=-1)), None
    
    else:
        return xr.zeros_like(close.isel(time=-1)), 1


weights, state = qnbt.backtest(
    competition_type='futures',
    load_data=load_data,
    window=window,
    lookback_period=365,
    start_date="2006-01-01",
    strategy=strategy,
    analyze=True,
    build_plots=True
)
Run last pass...
Load data...
100% (35952132 of 35952132) |############| Elapsed Time: 0:00:00 Time:  0:00:00
100% (2 of 2) |##########################| Elapsed Time: 0:00:00 Time:  0:00:00
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-11-dfba7cfcdb6d> in <module>
     75     strategy=strategy,
     76     analyze=True,
---> 77     build_plots=True
     78 )

/usr/local/lib/python3.7/site-packages/qnt/backtester.py in backtest(competition_type, strategy, load_data, lookback_period, test_period, start_date, end_date, window, step, analyze, build_plots, collect_all_states)
    273     log_info("Run last pass...")
    274     log_info("Load data...")
--> 275     data = load_data(lookback_period)
    276     try:
    277         if data.name == 'stocks' and competition_type != 'stocks' and competition_type != 'stocks_long'\

<ipython-input-11-dfba7cfcdb6d> in load_data(period)
     16     # convert to pandas.DataFrame
     17     ap = pd.DataFrame(ap)
---> 18     ap = ap.set_index('pub_date')
     19 
     20     # remove yearly average data, see period dictionary

/usr/local/lib/python3.7/site-packages/pandas/core/frame.py in set_index(self, keys, drop, append, inplace, verify_integrity)
   4725 
   4726         if missing:
-> 4727             raise KeyError(f"None of {missing} are in the columns")
   4728 
   4729         if inplace:

KeyError: "None of ['pub_date'] are in the columns"