This page provides the WDC-222 Gold Standard for hierarchical product categorization for public download and reports the results of various categorization experiments using the gold standard. The gold standard consists of 2,984 product offers from different e-shops which were selected from the Web Data Commons product corpus. The offers are assigned to 222 leaf node categories of the Icecat product categorization hierarchy. The experiments compare the performance of hierarchical and flat, as well as traditional and neural network-based classification methods using the gold standard. The experimental results are set in relation to results on the Rakuten SIGIR2018 dataset and the Open Icecat dataset, which both only contain offers from a single source and are thus less heterogeneous.
Contents
- 1. Motivation
- 2. Datasets
- 3. Methodology
- 4. Results
- 5. Related Work
- 6. Download
- 7. Feedback
- 8. References
1. Motivation
Automatically classifying product offers into a large hierarchy of product categories is a major challenge in e-commerce. The existing benchmark datasets for large-scale product categorization, such as the Rakuten SIGIR2018 dataset, only contain data from a single provider. Product offers gathered from many different e-shops on the public web are more heterogeneous and thus more difficult to categorize. In order to understand how product categorization methods perform for such heterogeneous data, we have gathered a test dataset consisting of 2,984 product offers originating from different e-shops from the Web Data Commons product corpus. The test set uses the product hierarchy of the Open Icecat catalog as classification target. We use the test set together with a training set consisting of 489,902 product descriptions from the Open Icecat catalog which are categorized into the same hierarchy.
The sections below provide details about the datasets, the methodology of the experiments, as well as their results.
For researchers who are interested in flat product categorization, we also offer the WDC-25 Product Categorization Gold Standard, which consists of 24,000 product offers from different e-shops that are manually assigned to a flat categorization schema consisting of 25 product categories.
2. Datasets
This section describes the WDC-222 Gold Standard as well as the two other datasets (Icecat and Rakuten) that were used in the experiments. The datasets are provided for download at the end of this page.
WDC-222 Gold Standard
The WDC-222 Gold Standard is a manually reviewed subset of the WDC Training Dataset for Large-scale Product Matching (WDC-LSPM). Only products that occur in the Icecat dataset were selected from the WDC-LSPM dataset for inclusion into the WDC-222 Gold Standard. To do this, the "Global Trade Item Number" (gtin), a unique product identifier, was used. First, all unique gtins occurring in the Icecat dataset were collected. Next, the WDC-LSPM dataset was scanned for products matching any of these gtins. However, identifiers were often used inconsistently: vendor-scoped properties such as "sku" or "mpn" frequently held globally-scoped identifier values such as gtins. Consequently, all identifier properties were searched for matching gtins.
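The identifier-based matching can be sketched as follows (a simplified illustration; the field names `gtin`, `sku`, and `mpn` stand in for the actual schema.org identifier properties, and the record structure is an assumption):

```python
def match_offers_by_gtin(icecat_records, wdc_offers, id_fields=("gtin", "sku", "mpn")):
    """Match WDC offers to Icecat products via GTIN. All identifier
    fields are checked because vendor-scoped properties such as sku or
    mpn sometimes hold globally-scoped GTIN values."""
    icecat_gtins = {r["gtin"] for r in icecat_records if r.get("gtin")}
    matched = []
    for offer in wdc_offers:
        for field in id_fields:
            if offer.get(field) in icecat_gtins:
                matched.append((offer, offer[field]))
                break  # one identifier match is enough
    return matched
```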
As part of the matching process, a number of additional columns in the WDC-222 Gold Standard were created: the title-column corresponds to the name-property and the desc-column to the description-property of the schema.org_properties-column in the WDC-LSPM dataset. Additionally, the columns CategoryID and CategoryName of the corresponding Icecat dataset instance were added along with columns for the full path representation (category-IDs as well as -names) from the root node to the leaf node category of each instance. Table 1 shows the additional columns. Descriptions of the used WDC-LSPM columns can be found at the website for the WDC-LSPM.
Column name | Description |
---|---|
CategoryID | CategoryID from the Icecat hierarchy |
gtin | Global Trade Item Number |
CategoryName | CategoryName from the Icecat hierarchy |
desc | Description-property of the schema.org properties-column in the WDC-LSPM dataset |
title | Name-property of the schema.org properties-column in the WDC-LSPM dataset |
tokenmatch | Indicator whether two or more tokens matched with the Icecat product texts |
pathlist_ids | CategoryID path from the root to the leaf node in the Icecat hierarchy |
pathlist_names | CategoryName path from the root to the leaf node in the Icecat hierarchy |
The matching of the Icecat dataset and the WDC-LSPM dataset resulted in around 19,500 examples. Since some identifiers might be incorrect, e.g. due to manual data entry, another cleansing step was applied to the matched WDC dataset. The examples were scanned a second time and only instances that shared at least two tokens with their corresponding example in the Icecat dataset were kept. For this, the columns Title, Description.LongDesc, and SummaryDescription.LongSummaryDescription on the one side (Icecat dataset) and title and desc on the other side (WDC-222 Gold Standard) were used. This way, 1,200 instances were removed.
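The two-token cleansing check described above amounts to a simple set intersection (a minimal sketch; the actual implementation compared several Icecat text columns against title and desc):

```python
def shares_two_tokens(wdc_text, icecat_text):
    """Keep a matched pair only if its texts share at least two tokens."""
    wdc_tokens = set(wdc_text.lower().split())
    icecat_tokens = set(icecat_text.lower().split())
    return len(wdc_tokens & icecat_tokens) >= 2
```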
Due to its origin being thousands of different websites from the Common Crawl, the WDC-222 Gold Standard can be considered rather "dirty" and heterogeneous. Therefore, it serves as an interesting counterpart to the cleaner Icecat and Rakuten datasets, which both originate from a single source. After duplicate removal and further cleansing, the WDC-222 Gold Standard has 2,984 instances with 222 distinct leaf node labels (Icecat product categories). Similar to the Icecat dataset, it is imbalanced. The columns used (title and desc) had prevalences of 100% and 77%, respectively. Considering Computers & Electronics as the root node, the minimum depth of the WDC hierarchy is 2, the maximum 3, and the average 2.47, which is almost identical to the Icecat hierarchy. The WDC-222 Gold Standard was used in the experiments as a test set only.
Icecat Dataset
Icecat is a worldwide publisher and syndicator of multilingual, standardized product data sheets from various domains that are consumed by, e.g., online shops, ERP systems, comparison sites, purchase systems, or rating portals. Icecat currently works together with 80,000 e-commerce companies and provides two catalogs of product data: the "Full Icecat" and the "Open Icecat". The Full Icecat contains around 6.5M product data sheets from 24,000 vendors but is only available for purchase. The Open Icecat contains around 1M product data sheets from 340 vendors and is available for free download; we therefore used the Open Icecat for our experiments. The underlying hierarchy was also downloaded from the Icecat website. We removed duplicate products and kept only the ones belonging to the largest first-level category, "Computers & Electronics". Furthermore, only categories with 25 or more instances were kept to ensure enough training data per category.
Icecat works together directly with its sponsoring brands and online channel partners. The product data sheets are fully approved by the partnering companies and controlled as well as standardized by Icecat’s editorial team in most cases. Therefore, this dataset can be considered relatively "clean" and thorough. In total, there are 765,473 examples in the final Icecat dataset with 370 distinct labels which are highly imbalanced. The attributes of a product used for classification were title, description, and brand. The title, brand, and a summarizing description are present for every instance. A longer description, however, is available for only 70% of the products. As for the depth of the tree, the minimum path length from the root node (if "Computers & Electronics" is considered the root instead of an artificial root) to any leaf node is 2, the maximum path length is 3, and the average path length from the root to a leaf node is 2.44. The final Icecat dataset as well as the original download of the Open Icecat can be found in the download section.
Rakuten Dataset
The Rakuten dataset originates from the 2018 SIGIR eCom Data Challenge (SeDC) organized by Rakuten Institute of Technology Boston, an R&D organization for the Rakuten group. The task was, similar to the experiments we conducted, to predict a product's category from a pre-defined taxonomy given the product's title (available for 100% of the products in the Rakuten dataset).
For this large-scale taxonomy classification task, Rakuten released 1M product titles and the corresponding category paths from their catalog. The categories are masked by random integers to preserve the anonymity of the taxonomy. The full Rakuten catalog was de-duplicated using leaf node label and product title tuples as keys and then a random sample of 1M instances was drawn leading to 3,008 distinct labels. Subsequently, the data was split into 0.8M training and 0.2M testing examples by category-wise stratified sampling. Whether the data, i.e. the quality of product titles and correct label assignment, was controlled or standardized in any way could not be determined. Just like the Icecat and WDC datasets, the Rakuten dataset is imbalanced consisting of a few large categories and a significant number of categories with infrequent occurrences.
The minimum depth of the hierarchy is 1, the maximum 8 and the average 4.15. Therefore, the Rakuten hierarchy is considerably deeper than the hierarchies of Icecat and WDC datasets. Like for the Icecat and WDC datasets, the number of nodes tends to be the highest towards the middle of the hierarchy and the average number of children decreases going down from top to bottom.
3. Methodology of the Experiments
This section describes the methodology of the product categorization experiments.
Table 2 gives a high-level overview of the main aspects/methods that have been used and combined in the experiments.
Process Step | Method |
---|---|
Data Preprocessing | Lowercased |
 | Non-alphanumeric characters removed |
 | Additional whitespace removed |
 | HTML text extracted |
 | Contractions resolved |
Feature Engineering | Title |
 | Description |
 | Title+Description (concatenated) |
 | Title+Brand+Description (concatenated) |
 | Count vectors |
 | Tf-idf vectors |
 | Pre-trained word2vec embeddings |
 | Pre-trained GloVe embeddings |
 | Pre-trained fastText embeddings |
 | Self-trained word2vec embeddings |
 | Self-trained fastText embeddings |
Model Training | Dictionary-based approach |
 | Naive Bayes |
 | Linear Support Vector Machine |
 | Random Forest |
 | FastText classifier |
 | Feedforward neural network |
 | Modified Kim convolutional neural network |
Model Evaluation | Weighted F1 |
 | Hierarchical F1 |
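The preprocessing steps listed in Table 2 could be sketched as a single pipeline function (an illustrative sketch; the contraction list and the regular expressions are assumptions, not the exact rules used in the experiments):

```python
import re

# tiny illustrative contraction table; the real mapping would be larger
CONTRACTIONS = {"won't": "will not", "can't": "cannot", "n't": " not"}

def preprocess(text):
    """Lowercase, extract text from HTML, resolve contractions, remove
    non-alphanumeric characters, and normalise whitespace."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)            # extract text from HTML
    for contraction, expansion in CONTRACTIONS.items():
        text = text.replace(contraction, expansion)  # resolve contractions
    text = re.sub(r"[^a-z0-9]+", " ", text)         # drop non-alphanumerics,
    return text.strip()                             # collapsing whitespace
```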
The goal of each experiment was to find the classification system with the highest performance measures within the respective experiment and then to use these findings in subsequent experiments. This means that not every aspect was used in every experiment (if the aspect proved to be unrewarding in previous experiments it was discarded). To make these decisions and report final scores, validation and testing sets were needed (see Table 3).
Dataset | Leafs | Training | Validation | Testing | Total |
---|---|---|---|---|---|
Icecat | 370 | 489,902 | 122,476 | 153,095 | 765,473 |
WDC | 222 | - | - | 2,984 | 2,984 |
Rakuten | 3,008 | 640,003 | 159,997 | 200,000 | 1,000,000 |
To assess system performance and to decide which aspects to further pursue in the course of the experiments, two measures were deemed most important: weighted F1 (wF) and hierarchical F1 (hF) [1].
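For hierarchical F1 [1], each leaf label (predicted and true) is expanded to the set of categories on its path from the root before computing micro-averaged precision and recall. A minimal sketch:

```python
def hierarchical_f1(true_paths, pred_paths):
    """Micro-averaged hierarchical F1: precision and recall are computed
    over the intersections of the ancestor sets of true and predicted
    labels, summed over all examples."""
    inter = pred_total = true_total = 0
    for true_path, pred_path in zip(true_paths, pred_paths):
        t, p = set(true_path), set(pred_path)
        inter += len(t & p)        # shared ancestors (partial credit)
        pred_total += len(p)
        true_total += len(t)
    h_precision = inter / pred_total
    h_recall = inter / true_total
    return 2 * h_precision * h_recall / (h_precision + h_recall)
```

A prediction that is wrong only at the leaf but correct higher up in the hierarchy thus still receives partial credit, unlike with flat F1.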
The classification methods that were compared in the experiments are described below:
Dictionary-based Classification
The system in this experiment is based on overlapping words between a product's title and the list of possible categories (labels). The initial overlap with the raw product titles was small, which is plausible since syntactic variations and synonyms make exact matches rare. Therefore, synonyms were obtained for every leaf node category; since only leaf nodes are considered, this is a flat rather than a hierarchical approach. To increase the number of matches further, stemming and lemmatization were added. Classification in this approach worked as follows: every category name (leaf node) was split into individual words (some category names consist of multiple words). Then, overlaps of the product title's words (optionally lemmatized and/or stemmed, depending on the setting) with the category's words were counted. The result was divided by the number of words in the category name, because a category consisting of multiple words is likely to accumulate a higher count, which would distort the results. Finally, if any category had a non-zero count, the (first) category with the highest count was selected as the prediction for the respective product title. If there were no matches, either the most frequent category in the dataset or a fallback classifier (depending on the setting) was used to make a prediction. The fallback classifier was multinomial Naive Bayes.
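The scoring scheme described above can be sketched as follows (synonym expansion, stemming, and lemmatization are omitted for brevity; the fallback strategy is passed in as a function):

```python
def dictionary_classify(title, category_names, fallback):
    """Pick the category whose name words overlap most with the title;
    the overlap count is normalised by the category name's length."""
    title_words = set(title.lower().split())
    best_category, best_score = None, 0.0
    for category in category_names:
        words = category.lower().split()
        overlap = sum(1 for w in words if w in title_words)
        score = overlap / len(words)     # normalise by category length
        if score > best_score:           # first category wins on ties
            best_category, best_score = category, score
    if best_category is None:            # no match at all
        return fallback(title)           # most frequent class or Naive Bayes
    return best_category
```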
Flat Traditional Classification
In this experiment, the three traditional classifiers Naive Bayes, linear SVM, and Random Forest were used. In addition, different attributes (title, brand, and description), feature selection methods, feature generation methods (frequency- and prediction-based), and balancing techniques were explored and assessed regarding their suitability for further experiments. Further parameter tuning was done via grid search; altogether, 96 combinations of parameters were tested. Based on the identified optimal parameters, it was explored whether pre- and self-trained embeddings could further increase the performance. For this, word2vec (pre- and self-trained), GloVe (pre-trained), and fastText (pre- and self-trained) corpora were chosen.
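A minimal sketch of such a flat traditional setup, assuming scikit-learn and combining tf-idf features with a linear SVM tuned via grid search (the toy data and the heavily reduced parameter grid are illustrative, not the 96 combinations actually tested):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# toy product titles standing in for the real training data
titles = ["usb cable 2m", "usb c charging cable", "gaming laptop 15 inch",
          "ultrabook laptop 13 inch", "wireless mouse", "optical mouse usb"]
labels = ["cables", "cables", "notebooks", "notebooks", "mice", "mice"]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),  # word uni- and bigrams
    ("svm", LinearSVC()),
])
# a much reduced grid of the kind tuned in the experiments
grid = GridSearchCV(pipeline, {"svm__C": [0.1, 1.0]}, cv=2)
grid.fit(titles, labels)
print(grid.predict(["blue usb cable"])[0])
```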
Hierarchical Traditional Classification
In this experiment, traditional classifiers were employed in a local classifier per parent node (LCPN) setup, i.e. a separate flat classifier was trained at each parent node of the respective hierarchy. Furthermore, the "selective classifier" [2] was used to potentially boost performance by using the best classifier out of a selection at each parent node. For the implementation of the selective approach, a linear SVM was trained additionally since it was the second-best performing algorithm in the flat traditional experiment. Our implementation of both approaches, regular LCPN and selective classifier, can be applied to arbitrary datasets and tree-structured hierarchies and may therefore be reused in future work.
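The LCPN scheme can be sketched independently of the underlying flat classifier (the MajorityClassifier below is a trivial stand-in for the SVM and Random Forest models actually used; the data format is an assumption):

```python
class MajorityClassifier:
    """Trivial stand-in for the flat classifiers used in the experiments."""
    def fit(self, X, y):
        self.memory = dict(zip(X, y))
        self.default = max(set(y), key=list(y).count)
    def predict(self, X):
        return [self.memory.get(x, self.default) for x in X]

def train_lcpn(examples, make_classifier, root="ROOT"):
    """examples: list of (features, path); path lists the category nodes
    from the first hierarchy level down to the leaf. One classifier is
    trained per parent node, predicting the next node on the path."""
    per_parent = {}
    for features, path in examples:
        full = [root] + list(path)
        for parent, child in zip(full, full[1:]):
            per_parent.setdefault(parent, []).append((features, child))
    classifiers = {}
    for parent, data in per_parent.items():
        X, y = zip(*data)
        clf = make_classifier()
        clf.fit(list(X), list(y))
        classifiers[parent] = clf
    return classifiers

def predict_lcpn(classifiers, features, root="ROOT"):
    """Walk from the root down the hierarchy until a leaf is reached."""
    node = root
    while node in classifiers:
        node = classifiers[node].predict([features])[0]
    return node

examples = [("usb charging cable", ["Cables", "USB Cables"]),
            ("gaming laptop", ["Computers", "Notebooks"])]
classifiers = train_lcpn(examples, MajorityClassifier)
print(predict_lcpn(classifiers, "usb charging cable"))  # USB Cables
```

The selective variant would additionally try several `make_classifier` candidates per parent node and keep the best one on validation data.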
Flat FastText Classification
Facebook Research's fastText library not only offers learning of word representations but also a text classifier. To find optimized training parameters for the fastText classifier, grid search was used on the learning rate, the number of epochs, and the n-gram size, as they appeared to be the most influential parameters. Additionally, fastText provides a built-in automatic hyperparameter optimization on a specified validation file (beta functionality), which proved inferior to regular grid search.
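The grid search over fastText's main hyperparameters can be sketched generically; with the real library, `train_and_score` would call `fasttext.train_supervised(input=..., **params)` and evaluate the resulting model on the validation file (the value ranges below are illustrative):

```python
import itertools

def grid_search(train_and_score, grid):
    """Exhaustive search over a hyperparameter grid. train_and_score(params)
    trains a classifier with the given parameters and returns its
    validation score (e.g. weighted F1)."""
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = train_and_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# the parameters found most influential in the experiments
grid = {"lr": [0.1, 0.5, 1.0], "epoch": [5, 25], "wordNgrams": [1, 2, 3]}
```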
Hierarchical FastText Classification
In conformity with the previously described experiments using traditional classifiers, which were employed in a flat as well as a hierarchical setup, the same was done for the fastText classifier. Therefore, in this experiment, fastText classifiers were used in an LCPN setup, i.e. a separate fastText classifier was trained for each parent node of the respective hierarchy. Parameters found in the grid search process of the flat fastText experiment were reused for the hierarchical system.
Flat Neural Network Classification
The first group of neural networks used in this experiment were regular feedforward neural networks (FNN) with two hidden layers. We used the Adam optimizer and categorical cross-entropy as the loss function. Besides training a network with tf-idf vectors, networks with an embedding layer receiving GloVe embeddings pre-trained on the Common Crawl were built, since these performed best among the pre-trained embeddings in previous experiments. The networks were trained both with a static and with a non-static embedding layer; in the non-static case, the embedding weights are updated during training and thus become even more task-specific.
The second group of neural networks used in this experiment were convolutional neural networks (CNN). In particular, a modified version of the popular Kim-CNN [3] was implemented and named MK-CNN (see Figure 1). By adding multiple convolutional layers, this network first extracts expressive features of varying length before applying classic fully connected layers at the end of the network. This design is similar to the idea of combining different word n-grams, which showed good performance in the non-deep-learning experiments.
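The core of the Kim-style design, convolutional filters of several widths followed by max-over-time pooling, can be illustrated in a few lines of numpy (a conceptual sketch of the feature extraction step only, not the trained MK-CNN):

```python
import numpy as np

def kim_features(embedded, filters_by_width):
    """For each filter width, slide each filter over the token embeddings
    and keep only the maximum activation (max-over-time pooling). The
    pooled values from all filters are concatenated into one vector that
    feeds the fully connected layers."""
    pooled = []
    for width, filters in filters_by_width.items():
        for f in filters:
            activations = [float(np.sum(embedded[i:i + width] * f))
                           for i in range(len(embedded) - width + 1)]
            pooled.append(max(activations))  # max over all positions
    return np.array(pooled)

# 4 tokens with 2-dimensional embeddings; one filter of width 2, one of width 3
tokens = np.arange(8.0).reshape(4, 2)
features = kim_features(tokens, {2: [np.ones((2, 2))], 3: [np.ones((3, 2))]})
```

Different filter widths play the role of different word n-gram sizes: a width-2 filter responds to bigram-like patterns, a width-3 filter to trigram-like ones.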

Hierarchical Neural Network + Traditional Classification
Analysis of the prediction paths produced by hierarchical systems in previous experiments revealed that the first misclassification mostly occurred at the root node classifier. Therefore, the approach in this experiment tried to tackle this problem by employing an MK-CNN instead of a traditional or fastText classifier at the root node. The networks trained in this experiment were the same architectures used in the previous flat neural network experiment. In the first step, they were trained and evaluated exclusively on the root node's children. In order to create the training and evaluation datasets for the root node network, all products were re-labeled with their respective first-level category. Based on the results, the best network was then fused with the selective classifier approach by replacing the traditional root node classifier with the neural network.
Hierarchical Neural Network + FastText Classification
In this experiment, the same MK-CNN as before was used as the root node classifier while at every other non-leaf node, a fastText classifier was trained. The fastText classifiers were substantially larger than the Random Forest and linear SVM classifiers, which already produced difficulties in the hierarchical fastText experiment, even when hundreds of gigabytes of memory were available. Therefore, the hierarchical implementation was altered in a way that the training and classification process was done on a per-level basis, so that classifiers of past levels could be discarded to free up memory. To be precise, the approach still involved training a separate fastText classifier per parent node, only the process was done per level to discard classifiers once training and classification were past the respective level. This way, the implementation only requires a fraction of the memory compared to what was needed before. This makes it more suitable for future experiments as well.
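The level-wise training-and-discard loop can be sketched as follows (`train_node` and the per-item state tracking are simplifications of the actual implementation):

```python
def classify_per_level(levels, train_node, test_items, root="ROOT"):
    """Train and apply LCPN classifiers one hierarchy level at a time,
    discarding each level's models before the next one is trained, so
    that at most one level of (large) fastText models is in memory."""
    # state[i] tracks the hierarchy node each test item currently sits at
    state = [root] * len(test_items)
    for level in levels:                      # top level first
        models = {node: train_node(node) for node in level}
        for i, item in enumerate(test_items):
            if state[i] in models:            # advance one level down
                state[i] = models[state[i]].predict([item])[0]
        del models                            # free memory before next level
    return state
```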
4. Results
Table 4 summarizes the results of the experiments, reporting weighted precision (wP), weighted recall (wR), weighted F1 (wF), and hierarchical F1 (hF). We see that the results of all methods on the WDC-222 Gold Standard are significantly lower than the results on the cleaner Icecat dataset. The results on the Rakuten dataset show that the performance of the tested methods is in the same range as the performance of other state-of-the-art methods for hierarchical product categorization [14].
Classification System | Icecat | | | | WDC-222 Gold Standard | | | | Rakuten | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
 | wP | wR | wF | hF | wP | wR | wF | hF | wP | wR | wF | hF |
Dictionary-based approach | 0.79 | 0.43 | 0.48 | 0.55 | 0.72 | 0.61 | 0.62 | 0.71 | - | - | - | - |
Flat traditional classifier | 0.98 | 0.98 | 0.98 | 0.99 | 0.80 | 0.72 | 0.73 | 0.79 | 0.77 | 0.77 | 0.76 | 0.83 |
LCPN traditional classifier | 0.98 | 0.98 | 0.98 | 0.99 | 0.82 | 0.73 | 0.74 | 0.81 | 0.78 | 0.78 | 0.77 | 0.83 |
Flat FastText classifier | 0.99 | 0.98 | 0.99 | 0.99 | 0.86 | 0.73 | 0.76 | 0.83 | 0.82 | 0.79 | 0.80 | 0.85 |
LCPN FastText classifier | 0.99 | 0.98 | 0.98 | 0.99 | 0.85 | 0.76 | 0.78 | 0.84 | 0.81 | 0.81 | 0.80 | 0.87 |
Flat neural network | 0.98 | 0.98 | 0.98 | 0.99 | 0.84 | 0.76 | 0.78 | 0.84 | 0.77 | 0.78 | 0.77 | 0.85 |
Neural network + LCPN traditional classifier | 0.98 | 0.98 | 0.98 | 0.99 | 0.84 | 0.75 | 0.76 | 0.84 | 0.80 | 0.80 | 0.79 | 0.86 |
Neural network + LCPN FastText classifier | 0.98 | 0.98 | 0.98 | 0.99 | 0.84 | 0.78 | 0.78 | 0.85 | 0.80 | 0.80 | 0.80 | 0.86 |
5. Related Work
The table below gives an overview of related work on hierarchical product classification. The table categorizes the approaches from two perspectives: whether a system uses a flat (F) or a hierarchical (H) approach and whether traditional machine learning algorithms (T) or neural networks (NN) are used, or both (B). The table also contains statistics about the size of the employed product hierarchy as well as the number of product instances used in the evaluation.
Paper | Approach | Algorithm | Embeddings | Levels | Leafs | Instances |
---|---|---|---|---|---|---|
[4] | H | T | no | n/a | 7,960 | 0.5M |
[5] | H | T | no | 4 | 319 | 0.4M |
[6] | H | NN | yes | 8 | 3,008 | 1M |
[7] | H | B | yes | 8 | 3,008 | 1M |
[8] | H | B | yes | 5 | 26,223 | 172M* |
[9] | H | B | no | 5 | 28,338 | 150M |
[10] | F | T | yes | 6 | 319 | 0.5M |
[11] | F | T | yes | 3 | 303 | 8,362 |
[12] | F | T | no | n/a | 29 | 17M |
[13] | F | T | no | 8 | 3,008 | 1M |
[14] | F | T | no | 8 | 3,008 | 1M |
[15] | F | T | no | 7 | 18,000 | 18.3M |
[16] | F | T | no | 6 | 20,000 | 14M |
[17] | F | T | no | 3 | 37 | 9,414 |
[18] | F | T | no | 1 | 9 | 1.9M |
[19] | F | NN | yes | n/a | 122 | 0.1M |
[20] | F | NN | yes | n/a | 21,819 | 100M* |
[21] | F | NN | yes | n/a | 13,234 | 1.5M |
[22] | F | NN | yes | n/a | 4,100 | 95M |
[23] | F | NN | yes | n/a | 2,890 | 1.2M |
[24] | F | NN | yes | n/a | 2,598 | 0.4M |
[25] | F | NN | yes | 8 | 3,008 | 1M |
[26] | F | NN | yes | 8 | 3,008 | 1M |
[27] | F | NN | yes | 8 | 3,008 | 1M |
[28] | F | NN | yes | 8 | 3,008 | 1M |
[29] | F | NN | yes | 8 | 3,008 | 1M |
[30] | F | NN | yes | 8 | 3,008 | 1M |
[31] | F | NN | yes | 3 | 303 | 8,362 |
[32] | F | NN | yes | 1 | 35 | 20M |
[33] | F | B | no | n/a | 468 | 28,000* |
[34] | B | T | yes | 6 | n/a | 1M |
[35] | B | T | no | 6 | 21,337 | 83M |
[36] | B | T | no | 4 | 421 | 40,000 |
[37] | B | NN | yes | 8 | 3,008 | 1M |
[38] | B | NN | yes | 8 | 3,008 | 1M |
[39] | B | B | yes | n/a | 6,000 | 25M |
[40] | B | B | yes | 8 | 3,008 | 1M |
6. Download
This section provides the WDC-222 Gold Standard as well as the Icecat dataset for download in JSON format. Table 1 documents the content of the different fields.
File | Size | Download |
WDC-222 Gold Standard | 4 MB | wdc_data_full.json |
Icecat Training Set | 1.2 GB | icecat_data_train.json |
Icecat Validation Set | 318 MB | icecat_data_validate.json |
Icecat Test Set | 389 MB | icecat_data_test.json |
In addition, we also provide the complete datasets for replicating all experiments as Python pickle files together with Python code for running the experiments.
File | Size | Download |
Data of all experiments | 5.81 GB | data.zip |
Code to run the experiments | 7 MB | src.zip |
Files/folders descriptions | 100 KB | Files and folders descriptions upload.pdf |
7. Feedback
Please send questions and feedback to the Web Data Commons Google Group.
More information about Web Data Commons is found here.
8. References
- Svetlana Kiritchenko, Stan Matwin, Richard Nock, and Fazel Famili. Learning and Evaluation in the Presence of Class Hierarchies: Application to Text Categorization. In Advances in Artificial Intelligence, volume 4013, pages 395–406, June 2006.
- Andrew D. Secker, Matthew N. Davies, Alex A. Freitas, Jon Timmis, Miguel Mendao, and Darren R. Flower. An experimental comparison of classification algorithms for hierarchical prediction of protein function. Expert Update (Magazine of the British Computer Society's Specialist Group on AI), 9:17–22, November 2007.
- Yoon Kim. Convolutional Neural Networks for Sentence Classification. arXiv:1408.5882 [cs], September 2014.
- Young-gon Kim, Taehee Lee, Jonghoon Chun, and Sang-goo Lee. Modified Naïve Bayes Classifier for E-Catalog Classification. In Juhnyoung Lee, Junho Shim, Sang-goo Lee, Christoph Bussler, and Simon Shim, editors, Data Engineering Issues in E-Commerce and Services, Lecture Notes in Computer Science, pages 246–257, Berlin, Heidelberg, 2006. Springer.
- Damir Vandic, Flavius Frasincar, and Uzay Kaymak. A Framework for Product Description Classification in e-Commerce. J. Web Eng., 17(1-2):1–27, March 2018.
- Angshuman Ghosh, Vineet John, and Rahul Iyer. TeamWaterloo at the SIGIR e-Commerce Data Challenge. In eCOM@SIGIR, 2018.
- Quentin Labernia, Yashio Kabashima, Michimasa Irie, Toshiyuki Oike, Kohei Asano, Jinhee Chun, and Takeshi Tokuyama. Large-Scale Taxonomy Problem: A Mixed Machine Learning Approach. page 6, 2018.
- Pradipto Das, Yandi Xia, Aaron Levine, Giuseppe Di Fabbrizio, and Ankur Datta. Web-Scale Language-Independent Cataloging of Noisy Product Listings for E-Commerce. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 969–979, Valencia, Spain, April 2017. Association for Computational Linguistics.
- Ali Cevahir and Koji Murakami. Large-scale Multi-class and Hierarchical Product Categorization for an E-commerce Giant. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 525–535, Osaka, Japan, December 2016. The COLING 2016 Organizing Committee.
- Zornitsa Kozareva. Everyone Likes Shopping! Multi-class Product Categorization for e-Commerce. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1329–1333, Denver, Colorado, May 2015. Association for Computational Linguistics.
- Petar Ristoski, Petar Petrovski, Peter Mika, and Heiko Paulheim. A machine learning approach for product matching and categorization: Use case: Enriching product ads with semantic structured data. Semantic Web, 9:1–22, August 2018.
- Hsiang-Fu Yu, Chia-Hua Ho, Prakash Arunachalam, Manas Somaiya, and Chih-Jen Lin. Product Title Classification versus Text Classification. 2012.
- Haohao Hu, Runjie Zhu, Yuqi Wang, Wenying Feng, Xing Tan, and Jimmy Xiangji Huang. A Best Match KNN-based Approach for Large-scale Product Categorization. page 6, 2018.
- Timothy Chappell, Shlomo Geva, and Lawrence Buckingham. TopSig at the SIGIR’eCom 2018 Rakuten Data Challenge. In eCOM@SIGIR, 2018.
- Dan Shen, Jean-David Ruvini, Rajyashree Mukherjee, and Neel Sundaresan. A Study of Smoothing Algorithms for Item Categorization on e-Commerce Sites. In 2010 Ninth International Conference on Machine Learning and Applications, pages 23–28, December 2010.
- Dan Shen, Jean-David Ruvini, Manas Somaiya, and Neel Sundaresan. Item categorization in the e-Commerce domain. In International Conference on Information and Knowledge Management, Proceedings, pages 1921–1924, October 2011.
- Robert Meusel, Anna Primpeli, Christian Meilicke, Heiko Paulheim, and Christian Bizer. Exploiting Microdata Annotations to Consistently Categorize Product Offers at Web Scale. In Heiner Stuckenschmidt and Dietmar Jannach, editors, E-Commerce and Web Technologies, Lecture Notes in Business Information Processing, pages 83–99, Cham, 2015. Springer International Publishing.
- Petar Petrovski, Volha Bryl, and Christian Bizer. Integrating Product Data from Websites Offering Microdata Markup. In Proceedings of the 23rd International Conference on World Wide Web, WWW ’14 Companion, pages 1299–1304, Seoul, Korea, 2014. ACM.
- Pasawee Wirojwatanakul and Artit Wangperawong. Multi-Label Product Categorization Using Multi-Modal Fusion Models. arXiv:1907.00420 [cs, stat], September 2019.
- Maggie Li, Stanley Kok, and Liling Tan. Don’t Classify, Translate: Multi- Level E-Commerce Product Categorization Via Machine Translation. December 2018.
- Wei-Cheng Chang, Hsiang-Fu Yu, Kai Zhong, Yiming Yang, and Inderjit Dhillon. X-BERT: eXtreme Multi-label Text Classification using Bidirectional Encoder Representations from Transformers. December 2019.
- Jung-Woo Ha, Hyuna Pyo, and Jeonghee Kim. Large-Scale Item Categorization in e-Commerce Using Multiple Recurrent Neural Networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16, pages 107–115, San Francisco, California, USA, 2016. ACM.
- Tom Zahavy, Alessandro Magnani, Abhinandan Krishnan, and Shie Mannor. Is a picture worth a thousand words? A Deep Multi-Modal Fusion Architecture for Product Classification in e-commerce. arXiv:1611.09534 [cs], November 2016.
- Hu Xu, Bing Liu, Lei Shu, and P. Yu. Open-world Learning and Application to Product Classification. September 2018.
- Shogo Suzuki, Yohei Iseki, Hiroaki Shiino, Hongwei Zhang, Aya Iwamoto, and Fumihiko Takahashi. Convolutional Neural Network and Bidirectional LSTM Based Taxonomy Classification Using External Dataset at SIGIR eCom Data Challenge. In eCOM@SIGIR, January 2018.
- Michael Skinner. Product Categorization with LSTMs and Balanced Pooling Views. In eCOM@SIGIR, 2018.
- Yugang Jia, Xin Wang, Hanqing Cao, Boshu Ru, and Tianzhong Yang. An Empirical Study of Using An Ensemble Model in e-Commerce Taxonomy Classification Challenge. In eCOM@SIGIR, 2018.
- Makoto Hiramatsu and Kei Wakabayashi. Encoder-Decoder neural networks for taxonomy classification. page 4, 2018.
- Maggie Yundi Li, Liling Tan, Stanley Kok, and Ewa Szymanska. Unconstrained Product Categorization with Sequence-to-Sequence Models. page 6, 2018.
- Hang Gao and Tim Oates. Large Scale Taxonomy Classification using BiLSTM with Self-Attention. page 5, 2018.
- Ziqi Zhang and Monica Paramita. Product Classification Using Microdata Annotations. In Chiara Ghidini, Olaf Hartig, Maria Maleshkova, Vojtěch Svátek, Isabel Cruz, Aidan Hogan, Jie Song, Maxime Lefrançois, and Fabien Gandon, editors, The Semantic Web – ISWC 2019, Lecture Notes in Computer Science, pages 716–732, Cham, 2019. Springer International Publishing.
- Yandi Xia, Aaron Levine, Pradipto Das, Giuseppe Di Fabbrizio, Keiji Shinzato, and Ankur Datta. Large-Scale Categorization of Japanese Product Titles Using Neural Attention Models. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 663–668, Valencia, Spain, April 2017. Association for Computational Linguistics.
- Chanawee Chavaltada, Kitsuchart Pasupa, and David R. Hardoon. A Comparative Study of Machine Learning Techniques for Automatic Product Categorisation. In Fengyu Cong, Andrew Leung, and Qinglai Wei, editors, Advances in Neural Networks - ISNN 2017, Lecture Notes in Computer Science, pages 10–17, Cham, 2017. Springer International Publishing.
- Vivek Gupta, Harish Karnick, Ashendra Bansal, and Pradhuman Jhala. Product Classification in E-Commerce using Distributional Semantics. arXiv:1606.06083 [cs], July 2016.
- Dan Shen, Jean-David Ruvini, and Badrul Sarwar. Large-scale item categorization for e-commerce. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM ’12, pages 595–604, Maui, Hawaii, USA, 2012. ACM.
- Ying Ding, M. Korotkiy, B. Omelayenko, V. Kartseva, V. Zykov, Michel Klein, and E. Schulten. GoldenBullet: Automated Classification of Product Data in E-commerce. June 2002.
- Yiu-Chang Lin, Pradipto Das, and Ankur Datta. Overview of the SIGIR 2018 eCom Rakuten Data Challenge. In eCOM@SIGIR, 2018.
- Wenhu Yu, Zhiqiang Sun, Haifeng Liu, Zhipeng Li, and Zhitong Zheng. Multi-level Deep Learning based E-commerce Product Categorization. page 6, 2018.
- Abhinandan Krishnan and Abilash Amarthaluri. Large Scale Product Categorization using Structured and Unstructured Attributes. arXiv:1903.04254 [cs, stat], March 2019.
- Sylvain Goumy and Mohamed-Amine Mejri. Ecommerce Product Title Classification. page 4, 2018.