A Multilabel Text Classification and Association Rule Mining Framework for Data-Driven Skincare Product Name Generation in E-Commerce
Keywords:
Multilabel Classification, Association, Apriori, Skincare Product Name, E-commerce Data Mining
Abstract
The rapid expansion of e-commerce has produced large volumes of unstructured product-name data, particularly in the skincare sector, where a single name often conveys multiple functions. This study proposes an integrated framework that combines multilabel text classification with association rule mining to extract structured knowledge from skincare product names and generate data-driven naming recommendations. Product names scraped from Tokopedia were preprocessed through text cleaning, TF-IDF vectorization, and multilabel binarization. Three multilabel classification strategies (Binary Relevance, Classifier Chain, and Label Powerset) were evaluated with Support Vector Machine (SVM), Logistic Regression, Random Forest, and K-Nearest Neighbor algorithms under 60:40, 75:25, and 80:20 train–test splits. Binary Relevance achieved the best performance of the three strategies: at the 80:20 split, the SVM model reached a test F1 score of 0.925, followed by Random Forest (0.912) and Logistic Regression (0.883). Classifier Chain produced consistent and robust results across all evaluated settings, though slightly lower than Binary Relevance, while Label Powerset obtained the lowest scores because of label sparsity. The predicted labels were then mined with the Apriori algorithm, which revealed strong co-occurrence patterns; in particular, moisturizers were a dominant component of product bundles. High-confidence rules (0.99) with strong lift values (2.73) indicate strong associations in the dataset. These insights formed the basis for proposing new naming themes such as Daily Radiance, Glow Hydration, UV Protection, and Clean & Fresh. Overall, the study demonstrates that combining multilabel classification with association rule mining is effective for uncovering naming structures in skincare e-commerce data.
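To make the association step concrete, the following is a minimal sketch of Apriori-style mining over predicted label sets, using only the Python standard library. It is an illustration, not the study's implementation: the `transactions` data, the function names (`support`, `apriori`, `rules`), and the thresholds are all hypothetical, and the study may well have used a library instead. Confidence is computed as support(A ∪ B) / support(A) and lift as confidence / support(B), so a lift above 1 means the labels co-occur more often than they would if independent.

```python
from itertools import combinations

# Hypothetical predicted label sets, one per product name (NOT the study's data).
transactions = [
    {"moisturizer", "sunscreen"},
    {"moisturizer", "serum"},
    {"moisturizer", "sunscreen", "serum"},
    {"cleanser"},
    {"moisturizer", "sunscreen"},
    {"cleanser", "toner"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori(transactions, min_support):
    """Level-wise search returning {frozenset: support} for frequent itemsets."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates that meet the support threshold.
        level = {}
        for candidate in current:
            s = support(candidate, transactions)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        ]
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence, lift) tuples."""
    found = []
    for itemset, s in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                antecedent = frozenset(antecedent)
                consequent = itemset - antecedent
                confidence = s / frequent[antecedent]
                if confidence >= min_confidence:
                    lift = confidence / frequent[consequent]
                    found.append((antecedent, consequent, s, confidence, lift))
    return found


frequent_itemsets = apriori(transactions, min_support=0.3)
for ante, cons, s, conf, lift in rules(frequent_itemsets, min_confidence=0.8):
    print(sorted(ante), "->", sorted(cons),
          f"support={s:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

In this toy data, the surviving rules all point to `moisturizer` as the consequent, which mirrors the kind of pattern the abstract reports; the specific numbers here carry no meaning beyond the illustration.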
License
Copyright (c) 2025 Naffa Nur Fauziah (Author)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

