計算機科學必讀書籍
Product categorization/product classification is the organization of products into their respective departments or categories. As well, a large part of the process is the design of the product taxonomy as a whole.
產品分類/產品分類是將產品組織到各自部門或類別中。 同樣,整個過程的很大一部分是整個產品分類的設計。
Product categorization was initially a text classification task that analyzed the product’s title to choose the appropriate category. However, numerous methods have been developed which take into account the product title, description, images, and other available metadata. The following papers on product categorization represent essential reading in the field and offer novel approaches to product classification tasks.
產品分類最初是一個文本分類任務,用于分析產品標題以選擇適當的類別。 但是,已經開發出許多方法來考慮產品標題,描述,圖像和其他可用的元數據。 以下有關產品分類的論文代表了該領域的重要閱讀內容,并為產品分類任務提供了新穎的方法。
1.不要分類,翻譯 (1. Don’t Classify, Translate)
In this paper, researchers from the National University of Singapore and the Rakuten Institute of Technology propose and explain a novel machine translation approach to product categorization. The experiment uses the Rakuten Data Challenge and Rakuten Ichiba datasets. Their method translates or converts a product’s description into a sequence of tokens which represent a root-to-leaf path to the correct category. Using this method, they are also able to propose meaningful new paths in the taxonomy.
在本文中,新加坡國立大學和樂天技術學院的研究人員提出并解釋了一種新穎的機器翻譯方法來進行產品分類。 該實驗使用了Rakuten Data Challenge和Rakuten Ichiba數據集。 他們的方法將產品的描述轉換或轉換為一系列標記,這些標記代表從根到葉的正確類別路徑。 使用這種方法,他們還能夠在分類法中提出有意義的新路徑。
The researchers state that their method outperforms many of the existing classification algorithms commonly used in machine learning today.
研究人員指出,他們的方法優于當今機器學習中常用的許多現有分類算法。
Published/Last Updated — Dec. 14, 2018
發布/最新更新— 2018年12月14日
Authors and Contributors — Maggie Yundi Li (National University of Singapore), Stanley Kok (National University of Singapore), and Liling Tan (Rakuten Institute of Technology)
作者和撰稿人:李Mag(新加坡國立大學),斯坦利·科克(新加坡國立大學)和譚麗玲(樂天技術學院)
Read Now
現在讀
2.使用神經注意模型對日本商品名稱進行大規模分類 (2. Large-Scale Categorization of Japanese Product Titles Using Neural Attention Models)
The authors of this paper propose attention convolutional neural network (ACNN) models over baseline convolutional neural network (CNN) models and gradient boosted tree (GBT) classifiers. The study uses Japanese product titles taken from Rakuten Ichiba as training data. Using this data, the authors compare the performance of the three methods (ACNN, CNN, and GBT) for large-scale product categorization. While differences in accuracy can be less than 5%, even minor improvements in accuracy can result in millions of additional correct categorizations.
本文的作者提出了關注卷積神經網絡(ACNN)模型,而不是基線卷積神經網絡(CNN)模型和梯度提升樹(GBT)分類器。 該研究使用從Rakuten Ichiba獲得的日語產品標題作為培訓數據。 利用這些數據,作者比較了三種方法(ACNN,CNN和GBT)用于大規模產品分類的性能。 盡管精度差異可以小于5%,但即使精度略有提高,也可以導致數百萬種其他正確的分類。
Lastly, the authors explain how an ensemble of ACNN and GBT models can further minimize false categorizations.
最后,作者解釋了ACNN和GBT模型的集成如何進一步減少錯誤分類。
Published/Last Updated — April, 2017 for EACL 2017
已發布/最新更新— 2017年4月,適用于EACL 2017
Authors and Contributors — From the Rakuten Institute of Technology: Yandi Xia, Aaron Levine, Pradipto Das Giuseppe Di Fabbrizio, Keiji Shinzato and Ankur Datta
作者和撰稿人—來自樂天技術學院:夏彥迪,亞倫·萊文,Pradipto Das Giuseppe Di Fabbrizio,京急新zato和安庫·達塔
Read Now
現在讀
3.地圖集:電子商務服裝產品分類的數據集和基準 (3. Atlas: A Dataset and Benchmark for Ecommerce Clothing Product Classification)

Researchers at the University of Colorado and Ericsson Research (Chennai, India) have created a large product dataset known as Atlas. In this paper, the team presents their dataset which includes over 186,000 images of clothing products along with their product titles. Furthermore, they introduce related work in the field that has influenced their study. Finally, they test their dataset using a Resnet34 classification model and a Seq to Seq model to categorize the products. The data is taken from Indian ecommerce stores, so some of the categories used may not be applicable to Western markets. However, the dataset has been open-sourced and is available on Github.
科羅拉多大學和愛立信研究公司(印度金奈)的研究人員創建了一個名為Atlas的大型產品數據集。 在本文中,研究小組展示了他們的數據集,其中包括超過186,000種服裝產品的圖像以及產品標題。 此外,他們介紹了影響他們的研究領域的相關工作。 最后,他們使用Resnet34分類模型和Seq to Seq模型對產品進行測試,以對產品進行分類。 數據來自印度的電子商務商店,因此使用的某些類別可能不適用于西方市場。 但是,該數據集已經開源,可以在Github上使用。
Published/Last Updated — Aug. 19, 2019
發布/最后更新— 2019年8月19日
Authors and Contributors — Venkatesh Umaashankar (Ericsson Research), Girish Shanmugam (Ericsson Research), and Aditi Prakash (University of Colorado)
作者和撰稿人— Venkatesh Umaashankar(愛立信研究中心),Girish Shanmugam(愛立信研究中心)和Aditi Prakash(科羅拉多大學)
Read Now
現在讀
4.使用結構化和非結構化屬性的大規模產品分類 (4. Large Scale Product Categorization using Structured and Unstructured Attributes)
In this study, a team at WalmartLabs compares hierarchical models to flat models for product categorization.
在這項研究中,沃爾瑪實驗室的一個團隊將層次模型與平面模型進行了比較,以進行產品分類。
The researchers employ deep-learning based models which extract features from each product to create a product signature. In the paper, the researchers describe a multi-LSTM and multi-CNN based approach to this extreme classification task. Furthermore, they present a novel way to use structured attributes. The team states that their methods can be scaled to take into account any number of product attributes during categorization.
研究人員采用了基于深度學習的模型,該模型從每個產品中提取功能以創建產品簽名。 在論文中,研究人員描述了一種基于多LSTM和多CNN的方法來完成這種極端分類任務。 此外,它們提供了一種使用結構化屬性的新穎方法。 該團隊指出,他們的方法可以擴展,以在分類過程中考慮任何數量的產品屬性。
Published/Last Updated — Mar. 1, 2019
已發布/最新更新— 2019年3月1日
Authors and Contributors — From WalmartLabs: Abhinandan Krishnan and Abilash Amarthaluri
作者和貢獻者—來自沃爾瑪實驗室:Abhinandan Krishnan和Abilash Amarthaluri
Read Now
現在讀
5.使用多模式融合模型進行多標簽產品分類 (5. Multi-Label Product Categorization Using Multi-Modal Fusion Models)
In this paper, researchers from New York University and U.S. Bank investigate multi-modal approaches to categorize products on Amazon. Their approach utilizes multiple classifiers trained on each type of input data from the product listings. Using a dataset of 9.4 million Amazon products, they developed a tri-modal model for product classification based on product images, titles, and descriptions. Their tri-modal late fusion model retains an F1 score of 88.2%.
在本文中,來自紐約大學和美國銀行的研究人員研究了多模式方法來對亞馬遜上的產品進行分類。 他們的方法利用了針對產品列表中每種輸入數據類型進行訓練的多個分類器。 他們使用940萬個Amazon產品的數據集,開發了一種基于產品圖像,標題和描述的產品分類的三峰模型。 他們的三峰后期融合模型保留了88.2%的F1分數。
The findings of their study demonstrate that increasing the number of modalities could improve performance in multi-label product categorization.
他們研究的結果表明,增加模式數量可以改善多標簽產品分類的性能。
Published/Last Updated — June 30, 2019
發布/最新更新— 2019年6月30日
Authors and Contributors — Pasawee Wirojwatanakul (New York University) and Artit Wangperawong (U.S. Bank)
作者和貢獻者— Pasawee Wirojwatanakul(紐約大學)和Artit Wangperawong(美國銀行)
Read Now
現在讀
In the papers on product categorization above, the researchers trained their models on open datasets which included millions of products. However, if you are building a product categorization model for commercial use, many open datasets may not be available to you.
在上面有關產品分類的論文中,研究人員在包含數百萬種產品的開放數據集上訓練了他們的模型。 但是,如果您要構建用于商業用途的產品分類模型,則可能無法使用許多開放數據集。
Looking for training data for your product classification model? Check out this training data guide and these open datasets.
尋找針對您的產品分類模型的培訓數據? 查閱本培訓數據指南和這些開放的數據集 。
翻譯自: https://medium.com/analytics-vidhya/5-must-read-papers-on-product-categorization-for-data-scientists-19c98421cef3
計算機科學必讀書籍
本文來自互聯網用戶投稿,該文觀點僅代表作者本人,不代表本站立場。本站僅提供信息存儲空間服務,不擁有所有權,不承擔相關法律責任。 如若轉載,請注明出處:http://www.pswp.cn/news/389410.shtml 繁體地址,請注明出處:http://hk.pswp.cn/news/389410.shtml 英文地址,請注明出處:http://en.pswp.cn/news/389410.shtml
如若內容造成侵權/違法違規/事實不符,請聯系多彩編程網進行投訴反饋email:809451989@qq.com,一經查實,立即刪除!