Gestão & Produção
Original Article

Evaluation of classification techniques for identifying fake reviews about products and services on the internet

Andrey Schmidt dos Santos; Luis Felipe Riehs Camargo; Daniel Pacheco Lacerda



Abstract: With the growth of e-commerce, more people are buying products over the internet. To increase customer satisfaction, merchants provide spaces for product and service reviews. Products with positive reviews attract customers, while products with negative reviews lose them. Exploiting this effect, some individuals and corporations write fake reviews to promote their own products and services or to defame their competitors. The difficulty in finding these reviews lies in the large amount of information available. One solution is to use data mining techniques and tools, such as the classification function. In this context, the present work evaluates classification techniques for identifying fake reviews about products and services on the internet. The research also presents a systematic literature review on fake reviews. The research used 8 classification algorithms, which were trained and tested on a hotel database. The CONCENSO algorithm presented the best result, with 88% on the precision indicator. After this first test, the algorithms classified reviews from a second hotel database. To compare the results of this new classification, the Review Skeptic algorithm was used. The SVM and GLMNET algorithms showed the highest agreement with the Review Skeptic algorithm, assigning the same class to 83% of the reviews. The research contributes by demonstrating the algorithms' ability to recognize consumers' real reviews of products and services on the internet. Another contribution is that it pioneers the investigation of fake reviews in Brazil and in production engineering.


Keywords: Fake reviews, Text classification, Knowledge discovery in databases, Text mining
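
The two indicators named in the abstract, precision and the share of reviews on which two classifiers agree, can be illustrated with a minimal sketch. It assumes the standard definition of precision (correct positive predictions divided by all positive predictions), which may differ from the exact indicator used in the study. The function names and the toy labels are hypothetical and are not taken from the article's data.

```python
def precision(y_true, y_pred, positive="fake"):
    """Share of reviews predicted as `positive` that truly are `positive`."""
    # True labels of the reviews that the classifier flagged as positive
    flagged = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not flagged:
        return 0.0  # no positive predictions, precision reported as 0 here
    return sum(t == positive for t in flagged) / len(flagged)


def agreement(pred_a, pred_b):
    """Share of reviews to which two classifiers assign the same label."""
    if len(pred_a) != len(pred_b):
        raise ValueError("both classifiers must label the same reviews")
    return sum(a == b for a, b in zip(pred_a, pred_b)) / len(pred_a)


# Hypothetical toy labels, for illustration only
y_true = ["fake", "real", "fake", "real", "fake"]
clf_a = ["fake", "fake", "fake", "real", "real"]
clf_b = ["fake", "fake", "real", "real", "real"]

precision_a = precision(y_true, clf_a)  # needs labeled data
agreement_ab = agreement(clf_a, clf_b)  # needs no ground-truth labels
```

Agreement between classifiers requires no ground-truth labels, which is why it suits a second database where the true class of each review is unknown.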


