Time Series ARIMA Model: Finding the Best-Fit pdq/s by AIC with Grid Search
In a basic ARIMA model, we have to supply the p, d, and q values ourselves. These are usually chosen with statistical techniques: differencing the series to remove non-stationarity, then plotting and reading the ACF and PACF graphs.
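As a rough illustration of that manual route, the sketch below differences a series once and computes its sample autocorrelations in plain Python. The helper names `difference` and `sample_acf` and the toy series are assumptions made up for this example; in practice the plots usually come from statsmodels' `plot_acf` and `plot_pacf`.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: y[t] - y[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sample_acf(series, max_lag):
    """Return the sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    acf = []
    for k in range(max_lag + 1):
        num = sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n))
        acf.append(num / denom)
    return acf


# Toy series (made up for illustration): a linear trend plus a bump every 4th step
raw = [2 * t + (3 if t % 4 == 0 else 0) for t in range(24)]
diffed = difference(raw)  # one round of differencing, i.e. d = 1

acf_raw = sample_acf(raw, 4)
acf_diffed = sample_acf(diffed, 4)
print([round(r, 2) for r in acf_raw])
print([round(r, 2) for r in acf_diffed])
```

An ACF that decays slowly on the raw series is a common hint that differencing is needed, while spikes that remain after differencing help suggest candidate orders.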
In Auto ARIMA, the procedure itself searches for the p, d, and q values that best suit the data set, with the aim of better forecasting. This can be done using the pmdarima package.
For steps on how to run the Auto ARIMA model, this link lays them out well:
Running pmdarima required me not only to install the package but also to upgrade my version of pip.
! pip install pmdarima
from pmdarima.arima import auto_arima
Note: this package depends on several other Python packages.
A workaround that avoids the pmdarima library is to run a grid search function; see the example code:
Calling this function reports the best pdq(s) parameters, meaning those with the lowest AIC value:
In the above example, the pdq order for ARIMA modeling would be (0,1,1) and the seasonal order for SARIMAX modeling (1,1,1,12), given that the lowest AIC calculated was 466.665.
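To make the selection criterion concrete: AIC is defined as 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood, so a lower value indicates a better trade-off between fit and complexity. The sketch below applies that formula to a few candidates and picks the minimum. The log-likelihoods and parameter counts are made-up illustrative numbers, not results from the Zillow data.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical candidates: (order, seasonal_order, log-likelihood, parameter count)
candidates = [
    ((0, 1, 1), (0, 1, 1, 12), -240.0, 3),
    ((0, 1, 1), (1, 1, 1, 12), -230.0, 4),
    ((1, 1, 1), (1, 1, 1, 12), -229.8, 5),
]

# Score every candidate, then keep the one with the smallest AIC
scored = [(order, seasonal, aic(llf, k)) for order, seasonal, llf, k in candidates]
best = min(scored, key=lambda row: row[2])
print(best)
```

Note that the third candidate has the highest likelihood yet loses on AIC, because its extra parameter costs more than the small gain in fit.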
The example time series project I used this for, with the complete modeling steps for a Zillow house price data analysis, is available on my GitHub:
Here’s the code for the grid search function:
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
import itertools

import pandas as pd
import statsmodels.api as sm


def AIC_PDQS(df):
    '''
    Runs a grid search over pdq/s combinations and prints the one with the lowest AIC.
    Each of p, d and q takes the values in range(0, 2), i.e. 0 or 1.
    df - DataFrame to analyze for the best pdq/s combination
    Returns a DataFrame of every combination that could be fit, with its AIC.
    '''
    # Define the p, d and q parameters to take the value 0 or 1
    # (range(0, 2) excludes 2; use range(0, 3) to include it)
    p = d = q = range(0, 2)
    # Auto-Regressive (p) -> Number of autoregressive terms.
    # Integrated (d) -> Number of nonseasonal differences needed for stationarity.
    # Moving Average (q) -> Number of lagged forecast errors in the prediction equation.

    # Generate all different combinations of p, d and q triplets
    pdq = list(itertools.product(p, d, q))
    # Generate all seasonal (P, D, Q, s) combinations with a period of 12
    pdqs = [(x[0], x[1], x[2], 12) for x in pdq]

    # Fit a SARIMAX model for every pdq x pdqs pair and record its AIC
    ans = []
    for comb in pdq:
        for combs in pdqs:
            try:
                mod = sm.tsa.statespace.SARIMAX(df, order=comb, seasonal_order=combs,
                                                enforce_stationarity=False,
                                                enforce_invertibility=False)
                output = mod.fit(disp=False)  # disp=False silences optimizer output
                ans.append([comb, combs, output.aic])
                print('ARIMA {} x {} : AIC Calculated = {}'.format(comb, combs, output.aic))
            except Exception:
                # Skip combinations that fail to fit
                continue

    # Find the parameters with minimal AIC value
    ans_df = pd.DataFrame(ans, columns=['pdq', 'pdqs', 'aic'])
    print(ans_df.loc[ans_df['aic'].idxmin()])
    return ans_df
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —