Test Anomaly Modules Using Matrix Profile¶

Both New York taxi data and steam generator data are used
Demonstrate a couple of functions of matrix profile:
- normal: calculate matrix profile of each time series
- approx: compute an approximate matrix profile
- incremental: compute incremental matrix profile for streaming time series data
- kdp: identify the best K out of N time series that have detected anomalies at the same time slot using k-dimensional profile

Anomaly types and matrix profile can convert different types fo anomalies into outliers¶

Set up paths and load matrix profile module¶

[1]:

import numpy as np
import pandas as pd
import os
import sys
import warnings
warnings.filterwarnings("ignore")
# set up DACKAR path
cwd = os.getcwd()
frameworkDir = os.path.abspath(os.path.join(cwd, os.pardir, 'src'))
sys.path.append(frameworkDir)
# set up data path
data_path = os.path.abspath(os.path.join(cwd, os.pardir, 'data'))
# Load MatrixProfile module for anomaly detection
from dackar.anomalies.MatrixProfile import MatrixProfile

Calculate the matrix profiles for NY taxi data¶

Anomalies occur during Columbus day, Daylight savings and Thanksgiving

Link: https://stumpy.readthedocs.io/en/latest/Tutorial_STUMPY_Basics.html

Load NY taxi data

[2]:

# Load taxi data
taxi_data_file = os.path.abspath(os.path.join(data_path, 'nyc_taxi_passengers.csv'))
taxi_df = pd.read_csv(taxi_data_file, index_col='timestamp')
taxi_df['value'] = taxi_df['value'].astype(np.float64)
taxi_df.index = pd.to_datetime(taxi_df.index, errors='ignore')
taxi_df.head()

[2]:

	value
timestamp
2014-10-01 00:00:00	12751.0
2014-10-01 00:30:00	8767.0
2014-10-01 01:00:00	7005.0
2014-10-01 01:30:00	5257.0
2014-10-01 02:00:00	4189.0

Compute the Matrix Profile for NY taxi data

[3]:

# set up sliding window side to 48 hours
m = 48
mpObj = MatrixProfile(m, normalize='robust', method='normal')
mpObj.fit(taxi_df)
fig = mpObj.plot()

OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.

Calculate matrix profile for steam generator data¶

Load data

[4]:

steam_gen_data_file = os.path.abspath(os.path.join(data_path, 'Steamgen.csv'))
steam_df = pd.read_csv(steam_gen_data_file)
steam_df.head()

[4]:

	drum pressure	excess oxygen	water level	steam flow
0	320.08239	2.506774	0.032701	9.302970
1	321.71099	2.545908	0.284799	9.662621
2	320.91331	2.360562	0.203652	10.990955
3	325.00252	0.027054	0.326187	12.430107
4	326.65276	0.285649	0.753776	13.681666

Compute matrix profile for steam generator data

[5]:

m = 48
mpObj = MatrixProfile(m, normalize='robust', method='normal')
mpObj.fit(steam_df)
fig = mpObj.plot()

Testing ‘approx’ method to compute matrix profile¶

[6]:

m = 48
mpObj = MatrixProfile(m, normalize='robust', method='approx')
mpObj.fit(steam_df)
fig = mpObj.plot()

Enable Streaming, use ‘evaluate’ function for streaming data¶

Compute matrix profile for first 1000 data points

[7]:

m = 48
mpObj = MatrixProfile(m, normalize='robust', method='incremental')
mpObj.fit(taxi_df.iloc[0:1000])
fig = mpObj.plot()

Evaluate more data points to demonstrate data streaming

[8]:

mpObj.evaluate(taxi_df.iloc[1000:])
fig = mpObj.plot()

Test different data structure¶

Data structure: numpy.ndarray

[9]:

m = 48
mpObj = MatrixProfile(m, normalize='robust', method='normal')
mpObj.fit(steam_df.to_numpy())
fig = mpObj.plot()

Data structure: 1-D array data

[10]:

m = 48
mpObj = MatrixProfile(m, normalize='robust', method='normal')
mpObj.fit(steam_df.to_numpy()[:,0])
fig = mpObj.plot()

Data structure: pandas.Series data

[11]:

m = 48
mpObj = MatrixProfile(m, normalize='robust', method='normal')
mpObj.fit(steam_df['drum pressure'])
fig = mpObj.plot()

Test Multi-Dimensional Anomaly Detection: Identify Best K out of N Anomalies¶

[12]:

m = 48
mpObj = MatrixProfile(m, normalize='robust', method='normal', kdp=True)
mpObj.fit(steam_df)
fig = mpObj.plot_kdp()