Efficient Predictive Model for Determining Critical Factors Affecting Commodity Price: The Case of Coffee in Ethiopian Commodity Exchange (ECX)

Authors: Worku Abebe Degife, Dr.-Ing. Abiot Sinamo

Journal: International Journal of Information Engineering and Electronic Business

Issue: Vol. 11, No. 6, 2019

Open access

In this paper, we apply data mining techniques to market data in order to establish meaningful relationships and patterns that identify the critical factors determining commodity price. The data were taken from the Ethiopian Commodity Exchange (ECX), and 18,141 records (instances) were used. The dataset contains all the main trading information. A hybrid methodology was followed to explore the application of data mining to the market dataset. Data cleaning and data transformation were used to preprocess the data. Classification algorithms in the WEKA 3.8.1 data mining tool were applied to address the research problem. The classification task used the J48 decision tree algorithm, and several experiments were conducted, with and without pruning, using all attributes. The developed models were evaluated using the standard metrics of accuracy and ROC area. The most effective model for identifying the critical determinant factors of commodity price achieved an accuracy of 88.35%, which is an encouraging result.

Commodity Price, Predictive Model, J48 decision tree

Short URL: https://sciup.org/15017071

IDR: 15017071 | DOI: 10.5815/ijieeb.2019.06.05

Text of the Article

Published Online November 2019 in MECS DOI: 10.5815/ijieeb.2019.06.05

I. Introduction

Modern commodity exchanges date back to the trading of rice futures in Osaka, Japan, in the 17th century [1]. With the liberalization of agricultural trade in many countries and the withdrawal of government support for agricultural producers, a new need has arisen for price discovery and even for physical trading mechanisms, a need that commodity exchanges can often meet. As a result, new commodity exchanges have been created rapidly, and existing ones have expanded, over the past decade.

At present, major commodity exchanges operate around the world, and a large number of new exchanges have been created in developing countries during the past decade [1].

Although the major task of ECX is to protect its customers through a modern trading system, controlling the factors that affect commodity prices should also be considered in order to ensure smooth and healthy exchange between customers. A number of factors can be regarded as determinants, but no clear indicator can be inferred directly from the available commodity exchange data. Therefore, as one potential intervention, data mining research can be conducted to uncover hidden patterns that may be useful for regulating commodity exchange trends.

II. Literature Review

2.1. Data Mining

Data mining is the process of discovering interesting knowledge from large amounts of data [3]. The data stored in market databases are growing rapidly, so it is necessary to analyse this large volume of data and extract useful information from it. The goal of the data mining process is to extract knowledge from an existing data set and transform it into a human-understandable form for further use, by analysing the data from different perspectives and summarizing it into useful information. There is no restriction on the type of data that can be analysed: the data may reside in a relational database, a data warehouse, a web server log, or a simple text file. Analysing data effectively requires an understanding of the appropriate data mining techniques [3,4,5,6]. In addition, data mining technology can generate new business opportunities by automating the search for predictive information in large databases and the discovery of previously unknown patterns.

Producing information requires a large collection of data. Retrieving data alone is not enough; we also need a means to automate the aggregation of data, the extraction of information, and the recognition of patterns in the source data. Files, databases, and other repositories contain huge amounts of data, so it is necessary to develop powerful tools for analysing and interpreting data and for extracting interesting knowledge to support decision making. Data mining can address all of these tasks [7].

Data mining extracts previously unknown predictive information from large databases. It is a widespread technology that helps organizations focus on the most important information in their data repositories [7,8,9,10]. Data analysis tools predict future trends and behaviour, helping organizations make proactive, knowledge-driven decisions [6,11,12]. Intelligent data analysis tools search databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. The tasks of data mining are varied and distinct because a large database contains many kinds of patterns, and different methods and techniques are needed to find different kinds of patterns [10,12,13].

2.2. Related Works

The first work in this regard is that of Ticlavilca et al. [14], who applied a multivariate relevance vector machine (MVRVM) model to develop multiple-time-ahead predictions, with confidence intervals, of monthly agricultural commodity prices. The predictions cover cattle, hog, and corn prices one, two, and three months ahead. The MVRVM is a regression tool that extends the RVM model to produce multivariate outputs. The statistical test results indicate good overall performance of the model for the one- and two-month predictions of all the commodity prices, while performance decreased for the three-month prediction of all three prices. The MVRVM model outperforms the artificial neural network (ANN) most of the time, with the exception of corn price prediction two and three months ahead. However, the bootstrap histograms of the MVRVM model show narrower confidence bounds than those of the ANN model for all three commodity price forecasts. On this basis, the study concludes that the MVRVM is more robust.

In [15], the author notes that forecasting the stock market has always been challenging work for business analysts. The paper attempts to use large volumes of chaotic data to predict stock market indices, and argues that combining these chaotic data with numeric time series analysis improves prediction accuracy. Investors can use the prediction model to make trading decisions by observing market behaviour. Proposed enhancements of the system focus on improving predictability regardless of how chaotic the stock market data may be.

Santoso and Rusdianto [16] proposed a "Hybrid Clustering Method for Stock Price and Commodity Price". They found that combining K-Means clustering with Principal Component Analysis (PCA) gives better analysis and classification, and that using the two methods together yields dimension-reduced (compact) clusters. The result is supported by observations from news and daily reports on company conditions. In summary, the study showed that a hybrid of K-Means clustering and PCA is an effective way to cluster stock and commodity prices. The authors report that the result of the PCA can directly affect the number and type of patterns revealed in the data, and that reducing the dimension of the clusters is an important task that PCA can perform. For further research, they suggested combining the K-Means algorithm, PCA, and a neural network to obtain better solutions: applying the neural network to the established clusters would show how to identify the information in every cluster.

III. Statement of the Problem

Knowing how an industry is influenced by market trends is essential for staying competitive and meeting consumers' needs. To keep a company ahead of the competition, it is also important to use market trend analysis, the process of evaluating changes in a given market. As a pioneering, modern marketing platform in Ethiopia [2], ECX should also determine the factors that affect commodity prices. Although the global market plays a significant role in this respect, there is no clear-cut rule that explicitly lists the determinant factors and indicates the relationships among them.

To the researcher's knowledge, no previous work has studied the determinant factors of the prices of commodities traded on ECX. The study will help the organization (ECX) act on the discovered knowledge, and experts and suppliers can also use it for decision making.

IV. Experimentation Design

In this research, a number of experiments with J48 were conducted, and high accuracy values were recorded. A total of 18,141 instances with seven independent attributes and one dependent variable were used in all experiments. Once the experimental setup is established, the next task is to build the model using the parameters that govern the model generation process.

4.1. Model Building Using J48 Decision Tree

Table 1 Sample values of the final attributes after preprocessing

Row	Difference	Season	CommodityName	Description	Origin	Volume	Warehouse	Closing_Price
2	One	Winter	Export_Coffee	Rank?	Kaffa	X	Bedelle	Low
3	One	Spring	Export_Coffee	Rank?	Wolega	Y	Gimbi	Low
4	One	Spring	Specialty_Coffee	Rank1	Yirga_chefe	?	Dila_Hawassa	High
5	One	Spring	Specialty_Coffee	Rank2	Gelana_Ab	X	Dila_Hawassa	High
6	One	Spring	Specialty_Coffee	Rank1	Yirga_chefe	X	Dila_Hawassa	Medium
7	One	Spring	Specialty_Coffee	Rank1	Sidama	X	Dila_Hawassa	High
8	One	Summer	Export_Coffee	Rank?	Harar	W	DireDawa	High
9	One	Summer	Export_Coffee	Rank1	Harar	W	DireDawa	High
10	One	Spring	Export_Coffee	Rank?	Harar	X	DireDawa	High
11	One	Spring	Export_Coffee	Rank?	Sidama	X	Dila_Hawassa	Medium
12	One	Spring	Specialty_Coffee	Rank?	Sidama	W	Dila_Hawassa	High
13	One	Spring	Export_Coffee	Rank?	Kaffa	Y	Bonga	Low
14	Same_year	Autumn	Export_Coffee	Rank?	Sidama	Z	Dila_Hawassa	High
15	Same_year	Autumn	Export_Coffee	Rank?	Harar	X	DireDawa	
16	One	Spring	Export_Coffee	Rank?	Kaffa	W	Dila_Hawassa	Low
17	One	Winter	Export_Coffee	Rank?	Kaffa	X	Bedelle	Medium
18	One	Spring	Export_Coffee	Rank?	Sidama	W	Dila_Hawassa	Low
19	Same_year	Winter	Specialty_Coffee	Rank2	Sidama	X	Dila_Hawassa	Medium
20	One	Summer	Specialty_Coffee	Rank1	Sidama	X	Dila_Hawassa	High
21	One	Autumn	Export_Coffee	Rank?	Sidama	W	Dila_Hawassa	High
22	One	Spring	Export_Coffee	Rank?	Wolega	?	Gimbi	Low
23	?	Winter	Export_Coffee	Rank?	Harar	W	DireDawa	High

Note: "?" marks a character that is illegible in the source table.

Model building is an iterative process, so it is important to conduct different experiments to find the model that best addresses the problem. In this study, different experiments were conducted by altering the parameters of the J48 decision tree, but only those that scored high accuracy are presented here [17,18]. (The experiments are compared with each other, and the same scenario is used for the other selected algorithms.) The J48 algorithm has parameters that can be changed to further improve classification accuracy. Initially, the classification model is built with the default parameter values of the J48 algorithm. Table 2 summarizes these parameters and their default values.

Table 2 Some of the J48 algorithm parameters and their default values

Parameter	Description	Default value
ConfidenceFactor	The confidence factor used for pruning (smaller values incur more pruning)	0.25
MinNumObj	The minimum number of instances per leaf	2
Unpruned	Whether to build an unpruned tree (no pruning)	False

The decision tree model-building experiments were carried out by changing the default parameter values of the J48 algorithm.

Table 3 Values of parameters used for J48 algorithm

Experiment	Pruned	Confidence factor	MinNumObj	Test option
Experiment #1	True	0.25	2	10-fold cross-validation
Experiment #2	True	0.5	5	10-fold cross-validation
Experiment #3	True	0.5	2	80% percentage split
Experiment #4	False	0.25	2	10-fold cross-validation
Experiment #5	False	0.25	5	10-fold cross-validation
Experiment #6	False	0.5	5	80% percentage split
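The six configurations in Table 3 can be written down as data and turned into WEKA command lines. The sketch below is illustrative only: it reads the first parameter column of Table 3 as a pruned/unpruned flag (as the later discussion of Experiment #2 suggests), and the file name `ecx_coffee.arff`, the jar path `weka.jar`, and the names `J48Run` and `weka_args` are hypothetical. It is not necessarily how the authors ran WEKA. The J48 options `-C` (confidence factor), `-M` (minimum instances per leaf), and `-U` (unpruned), and the evaluation options `-t`, `-x`, and `-split-percentage`, are standard WEKA command-line options. The confidence factor is passed only for pruned runs, since it only influences pruning.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class J48Run:
    """One row of Table 3 (field names are illustrative, not WEKA's)."""
    name: str
    pruned: bool
    confidence: float
    min_num_obj: int
    test_option: str  # "cv10" = 10-fold cross-validation, "split80" = 80% split


# Rows transcribed from Table 3.
RUNS = [
    J48Run("Experiment #1", True, 0.25, 2, "cv10"),
    J48Run("Experiment #2", True, 0.5, 5, "cv10"),
    J48Run("Experiment #3", True, 0.5, 2, "split80"),
    J48Run("Experiment #4", False, 0.25, 2, "cv10"),
    J48Run("Experiment #5", False, 0.25, 5, "cv10"),
    J48Run("Experiment #6", False, 0.5, 5, "split80"),
]


def weka_args(run: J48Run, arff_path: str, weka_jar: str = "weka.jar") -> List[str]:
    """Build a WEKA command line for one run (both paths are hypothetical)."""
    args = ["java", "-cp", weka_jar, "weka.classifiers.trees.J48", "-t", arff_path]
    if run.test_option == "cv10":
        args += ["-x", "10"]                  # number of cross-validation folds
    elif run.test_option == "split80":
        args += ["-split-percentage", "80"]   # train on 80%, test on the rest
    else:
        raise ValueError(f"unknown test option: {run.test_option}")
    if run.pruned:
        args += ["-C", str(run.confidence)]   # confidence factor for pruning
    else:
        args += ["-U"]                        # unpruned tree; -C is not passed
    args += ["-M", str(run.min_num_obj)]      # minimum instances per leaf
    return args


for run in RUNS:
    print(run.name, " ".join(weka_args(run, "ecx_coffee.arff")))
```

Keeping the runs as data makes it easy to add further settings, such as the larger MinNumObj values tried later, without touching the command-building logic.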

The performance measures (accuracy, ROC area, and other effectiveness measures) of the six J48 decision tree experiments above are summarized in Table 4 below.

Table 4 Experimental results of the J48 algorithm

Performance measure	#1	#2	#3	#4	#5	#6
Accuracy (%)	88.31	88.35	88.00	88.34	88.32	87.95
Number of leaves	77	142	197	197	134	134
Size of tree	98	182	253	253	172	172
Time taken	0.08	0.06	0.08	0.08	0.06	0.05
ROC area	0.919	0.969	0.972	0.972	0.970	0.972
CCI	16022	16031	3193	16029	16023	3191
ICCI	2119	2110	435	2112	2118	437

Key: CCI: correctly classified instances; ICCI: incorrectly classified instances; ROC: receiver operating characteristic curve.
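The accuracy figures in Table 4 follow from the instance counts: accuracy is CCI divided by CCI plus ICCI. The short check below recomputes it for three of the experiments and confirms the size of the test set in each case. It is a sketch for cross-checking the table, not part of the original study. The function names are hypothetical, and `holdout_size` assumes that the training share of a percentage split is rounded to the nearest instance. The recomputed accuracies agree with the reported ones to within a few hundredths of a percentage point.

```python
TOTAL_INSTANCES = 18141  # size of the full dataset, as stated in the paper

# (CCI, ICCI, reported accuracy in %) copied from Table 4.
TABLE4 = {
    "#1": (16022, 2119, 88.31),  # 10-fold cross-validation
    "#2": (16031, 2110, 88.35),  # 10-fold cross-validation
    "#3": (3193, 435, 88.00),    # 80% percentage split
}


def accuracy_percent(cci: int, icci: int) -> float:
    """Accuracy = correctly classified / all classified instances, in percent."""
    return 100.0 * cci / (cci + icci)


def holdout_size(total: int, train_percent: float) -> int:
    """Test-set size for a percentage split (rounding rule is an assumption)."""
    return total - round(total * train_percent / 100.0)


for name, (cci, icci, reported) in TABLE4.items():
    print(name, cci + icci, round(accuracy_percent(cci, icci), 2), reported)
```

In cross-validation every instance is tested exactly once, so CCI + ICCI equals the full 18,141 instances. For the 80% split only the held-out 20% is tested, which is why Experiment #3 sums to 3,628.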

Table 4 presents the result of each experiment. The experiments were designed to evaluate the performance of the J48 classifier with pruned and unpruned trees. A pruned J48 tree achieved the highest accuracy of the six experiments: Experiment #2 (a pruned decision tree tested with 10-fold cross-validation) is the best in terms of accuracy, at 88.35%, with 16,031 of 18,141 instances correctly classified. Its tree has 142 leaves and a size of 182.

Therefore, out of the six experiments, the researcher selected Experiment #2 (a pruned J48 decision tree with 10-fold cross-validation) for comparison with other classification algorithms.

In addition, the researcher tried further values for the parameters listed in Table 2 in order to increase the accuracy of the model and to reduce the number of leaves and the size of the tree. To this end, MinNumObj (the minimum number of objects in a leaf) was tried with values of 10, 15, and 20, and the confidence factor with values of 0.6, 0.7, ... The results did not improve much compared with Experiment #2. As MinNumObj increases, the number of leaves and the size of the tree decrease, but so do the accuracy and performance of the model. Conversely, as the confidence factor increases, the number of leaves and the size of the tree increase, because smaller values incur more pruning, while the accuracy and performance of the model decrease.
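The link between the confidence factor and the amount of pruning can be illustrated with the pessimistic error estimate that C4.5-style pruning is based on. The sketch below uses the textbook form of that estimate (an upper confidence bound on the error rate at a node, as given by Witten and Frank); WEKA's J48 may compute the bound slightly differently, and the function name and the example node (2 errors in 6 training instances) are illustrative assumptions. A smaller confidence factor gives a larger z value and therefore a more pessimistic error estimate, which makes replacing a subtree with a leaf more attractive.

```python
from math import sqrt
from statistics import NormalDist


def pessimistic_error(errors: int, n: int, confidence: float) -> float:
    """Upper confidence bound on a node's error rate (textbook C4.5 form).

    errors     -- training instances misclassified at the node
    n          -- training instances reaching the node
    confidence -- confidence factor, e.g. 0.25 (the J48 default)
    """
    z = NormalDist().inv_cdf(1.0 - confidence)  # one-sided normal quantile
    f = errors / n                              # observed error rate
    numerator = f + z * z / (2 * n) + z * sqrt(
        f / n - f * f / n + z * z / (4 * n * n)
    )
    return numerator / (1 + z * z / n)


# Hypothetical node: 2 errors out of 6 training instances.
for cf in (0.1, 0.25, 0.5):
    print(cf, round(pessimistic_error(2, 6, cf), 3))
```

At a confidence factor of 0.5 the bound collapses to the observed error rate, so the estimate carries no pessimism and little is pruned. Lowering the factor raises the estimate for small nodes, which is the sense in which smaller values incur more pruning.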

V. Conclusions

In this study, data mining (DM) techniques were used to identify the critical determinant factors of commodity prices. A hybrid process model was followed during experimentation and discussion. The data set used in this study was taken from ECX market data. The data were then preprocessed and prepared in a format suitable for the DM tasks.

The study used a classification technique, namely the decision tree, for model building. The J48 experiments were conducted using two test options (10-fold cross-validation and an 80% percentage split). Various experiments were made by adjusting the modeling parameters in order to obtain meaningful results. The best J48 model achieved an accuracy of 88.35%. The results of this study can contribute to encouraging and supporting the decision-making process of marketing organizations and marketers. To conclude, the study showed that the problem of fluctuating commodity prices can be addressed using data mining techniques.

VI. Recommendations

This study has provided a predictive model for determining the critical factors of commodity price in ECX. Based on the results and findings of the study, the researcher puts forward the following recommendations and potential future work.

- The output of this research is helpful to the domain experts of ECX and to suppliers for identifying the determinant factors of commodity prices. This can help to solve problems that arise during trading with regard to price fluctuations.

- The model developed in this research generates various patterns and rules that can be used by ECX branches in different locations across the country.

- Market policy makers, suppliers, and planners can gain useful insight for future planning and special intervention programs based on the findings of the study.

- The scope of this study was limited to one commodity, coffee. Further study can consider different commodities, including local coffee.

- This study can be used as input for developing a full-fledged decision support system for price prediction, with a graphical user interface as a prototype of the proposed model.

- Other methodologies or classification algorithms (neural networks and support vector machines) can also be used in further research.

Acknowledgement

We would like to thank the higher management of the Ethiopian Commodity Exchange (ECX) for their willingness to provide access to the data and to assign experts. We also thank the ECX experts for their support and valuable comments.

References

[1] C. J. Santana-Boado, Leonela; Brading, 'Commodity exchanges in a globalized economy', no. September, pp. 1–8, 2000.
[2] M. A. Hernandez, S. Lemma, and S. Rashid, 'The Ethiopian commodity exchange and the coffee market: Are local prices more integrated to global markets?', no. April 2008, 2015.
[3] H. Patel and D. Patel, 'A Brief survey of Data Mining Techniques Applied to Agricultural Data', vol. 95, no. 9, pp. 6–8, 2014.
[4] T. Edition and T. C. Corporation, Introduction to Data Mining and. .
[5] M. Gandhi and G. Vishwavidyalaya, 'Data mining Techniques for Predicting Crop Productivity – A review article', vol. 4333, pp. 98–100, 2011.
[6] G. Marketos, K. Pediaditakis, and Y. Theodoridis, 'Intelligent Stock Market Assistant using Temporal Data Mining', pp. 1–11.
[7] D. Das and M. S. Uddin, 'The Stock Market', vol. 4, no. 1, pp. 117–127, 2013.
[8] D. V. Setty, 'A Review on Data Mining Applications to the Performance of Stock Marketing', vol. 1, no. 3, pp. 33–43, 2010.
[9] S. Tiwari and A. Gulati, 'Prediction of Stock Market from Stream Data Time Series Pattern using Neural Network and Decision Tree', vol. 7109, pp. 99–102, 2011.
[10] S. Džeroski, 'Relational data mining', in Data Mining and Knowledge Discovery Handbook, Springer, 2009, pp. 887–911.
[11] K. V. Nesbitt and S. Barrass, 'Patterns in Stock', pp. 45–55, 2004.
[12] N. Gandhi and L. Armstrong, 'Applying Data Mining Techniques to predict yield of Rice in Humid Subtropical Climatic Zone of India', 2016.
[13] A. T. M. S. Ahamed, N. T. Mahmood, N. Hossain, M. T. Kabir, K. Das, F. Rahman, and R. M. Rahman, 'Applying Data Mining Techniques to Predict Annual Yield of Major Crops and Recommend Planting Different Crops in Different Districts in Bangladesh', in IEEE SNPD 2015, June 1–3, 2015.
[14] A. M. Ticlavilca, D. M. Feuz, and M. McKee, 'Forecasting Agricultural Commodity Prices Using Multivariate Bayesian Machine Learning Regression', 2010.
[15] M. P. Naeini, H. Taremian, and H. B. Hashemi, 'Stock market value prediction using neural networks', in Computer Information Systems and Industrial Management Applications (CISIM), 2010 International Conference on, 2010, pp. 132–136.
[16] H. Pan, C. Tilakaratne, and J. Yearwood, 'Predicting Australian stock market index using neural networks exploiting dynamical swings and intermarket influences', J. Res. Pract. Inf. Technol., vol. 37, no. 1, pp. 43–56, 2005.
[17] M. Paul, S. K. Vishwakarma, and A. Verma, 'Analysis of Soil Behaviour and Prediction of Crop Yield using Data Mining Approach', in 2015 International Conference on Computational Intelligence and Communication Networks, 2015.
[18] R. A. Medar and V. S. Rajpurohit, 'A survey on Data Mining Techniques for crop yield prediction', International Journal of Advance Research in Computer Science and Management Studies, vol. 2, no. 9, Sept. 2014.
