Time Series ARIMA Model: Finding the Best-Fit pdq/s by AIC with a Grid Search

DataScience Deep Dive
3 min read · Dec 26, 2020


In a basic ARIMA model, we have to supply the p, d, and q values ourselves. We usually derive them with statistical techniques: differencing the series to remove non-stationarity, then plotting the ACF and PACF graphs to read off the remaining terms.
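To make the differencing step concrete, here is a minimal sketch in plain Python. The helper names `difference` and `autocorrelation` and the toy series are my own, made up for illustration; on real data you would normally reach for a library's differencing and ACF plotting tools instead.

```python
# Minimal sketch: first-order differencing and a sample autocorrelation,
# written in plain Python so it runs without extra packages.

def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every valid t."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return 0.0  # a constant series has no variation to correlate
    covariance = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return covariance / variance


# Hypothetical toy series: a linear trend plus a small alternating wiggle.
trended = [2 * t + (1 if t % 2 == 0 else -1) for t in range(20)]
differenced = difference(trended)

print(autocorrelation(trended, 1))      # large and positive: the trend dominates
print(autocorrelation(differenced, 1))  # negative: only the alternating wiggle is left
```

A lag-1 autocorrelation that stays high is a hint that the series still carries a trend and may need differencing, which is what the d term in ARIMA accounts for.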

With Auto ARIMA, this search is automated: the model picks the p, d, and q values that best suit the data set by minimizing an information criterion such as AIC, which should lead to better forecasts. This can be done using the pmdarima package.

For steps on how to run the Auto ARIMA model, this link lays them out well:

Running pmdarima required me not only to install the package but also to upgrade my version of pip.

! pip install pmdarima

from pmdarima.arima import auto_arima

Note: this package depends on several other Python packages.

A workaround that avoids the pmdarima library is to run a grid search function (the code is at the end of this post).

Calling this function prints the pdq(s) parameters with the lowest AIC value.

In my run, the pdq for ARIMA modeling was (0,1,1) and the seasonal pdqs for SARIMAX modeling was (1,1,1,12), with the lowest calculated AIC at 466.665.
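For anyone wondering why the lowest AIC wins, here is a small sketch of the standard definition, AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. The two candidate models and their log-likelihoods below are hypothetical numbers chosen for illustration, not output from my project.

```python
# Minimal sketch of the AIC trade-off between fit and model size.

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 * ln(L)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical candidates: (label, log-likelihood, number of estimated parameters)
candidates = [
    ("model A", -230.0, 3),
    ("model B", -228.5, 5),
]

scores = {name: aic(ll, k) for name, ll, k in candidates}
best = min(scores, key=scores.get)

print(scores)
print(best)
```

Model B fits slightly better (higher log-likelihood), but its two extra parameters cost more than the fit gains, so model A ends up with the lower AIC. That penalty is what keeps a grid search from always preferring the largest model.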

The time series project I used this for, with the complete steps for modeling and analyzing Zillow house price data, is available on my GitHub:

https://github.com/Sue-Mir/Module4_Project_Time_Series_Modelling/blob/master/time-series/TimeSeries.ipynb

Here’s the code for the grid search function:

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

import itertools

import pandas as pd
import statsmodels.api as sm


def AIC_PDQS(df):
    '''
    Runs a grid search over permutations of pdq/s values in range(0, 2),
    prints the combination with the lowest AIC and returns all results.

    df - DataFrame to analyze for best pdq/s permutation
    '''

    # Let p, d and q each take the value 0 or 1 (range(0, 2) excludes 2)
    p = d = q = range(0, 2)

    # Auto-Regressive (p) -> Number of autoregressive terms.
    # Integrated (d) -> Number of nonseasonal differences needed for stationarity.
    # Moving Average (q) -> Number of lagged forecast errors in the prediction equation.

    # Generate all different combinations of p, d and q triplets
    pdq = list(itertools.product(p, d, q))

    # Generate all different combinations of seasonal p, d and q triplets (seasonal period 12)
    pdqs = [(x[0], x[1], x[2], 12) for x in itertools.product(p, d, q)]

    # Fit a model for every pdq / seasonal pdq pair and record its AIC
    ans = []
    for comb in pdq:
        for combs in pdqs:
            try:
                mod = sm.tsa.statespace.SARIMAX(df, order=comb, seasonal_order=combs,
                                                enforce_stationarity=False,
                                                enforce_invertibility=False)

                output = mod.fit()
                ans.append([comb, combs, output.aic])
                print('ARIMA {} x {} : AIC Calculated = {}'.format(comb, combs, output.aic))
            except Exception:
                # Skip combinations that fail to fit
                continue

    # Find the parameters with minimal AIC value
    ans_df = pd.DataFrame(ans, columns=['pdq', 'pdqs', 'aic'])
    print(ans_df.loc[ans_df['aic'].idxmin()])
    return ans_df

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
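As a quick sanity check on how much work the grid search does, this short sketch rebuilds just the candidate lists (no model fitting) and counts them. The variable `grid` is my own name for the full set of pairs; the function above loops over the same pairs with two nested for loops.

```python
import itertools

# Same candidate values as the grid search: each of p, d, q is 0 or 1
p = d = q = range(0, 2)

# Non-seasonal (p, d, q) triplets
pdq = list(itertools.product(p, d, q))

# Seasonal (P, D, Q, 12) tuples with a fixed seasonal period of 12
pdqs = [(P, D, Q, 12) for P, D, Q in itertools.product(p, d, q)]

# Every pairing of a non-seasonal order with a seasonal order
grid = list(itertools.product(pdq, pdqs))

print(len(pdq), len(pdqs), len(grid))  # 8 8 64
```

So one call fits up to 64 SARIMAX models. Widening the range to range(0, 3) would raise that to 27 x 27 = 729 fits, which is worth keeping in mind before expanding the search.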
