A Machine Readable Sense Inventory for Emoji

Here you will find links to download the EmoSim508 dataset and our emoji embedding results for measuring emoji similarity.

Please note that this work is currently under review.

EmoSim508 Dataset - Emoji Pairs

Description: This dataset consists of 508 emoji pairs. We used co-occurrence frequency to select the pairs: we took the top-k most frequently co-occurring emoji pairs, which together cover 25% of our Twitter corpus. These 508 emoji pairs contain 158 unique emoji.
URL: http://emojinet.org/emojisimilarity/emojipairs508.html
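The pair-selection step can be sketched as follows. This is a minimal illustration under stated assumptions and may differ from how EmoSim508 was actually built: each tweet is assumed to be given as a list of emoji, an unordered pair is counted once per tweet in which both emoji appear, and "cover 25%" is interpreted as taking the most frequent pairs until their counts add up to 25% of all pair co-occurrences. The function names are hypothetical, and the plain names ("joy", "heart", ...) stand in for emoji characters.

```python
from collections import Counter
from itertools import combinations


def count_pair_cooccurrences(tweets):
    """Count, for each unordered pair of distinct emoji, the number of tweets containing both."""
    counts = Counter()
    for emoji_in_tweet in tweets:
        # Deduplicate and sort so ("a", "b") and ("b", "a") map to the same key.
        for pair in combinations(sorted(set(emoji_in_tweet)), 2):
            counts[pair] += 1
    return counts


def select_top_pairs(tweets, coverage=0.25):
    """Take the most frequent pairs until their counts reach `coverage` of all pair counts (assumed reading)."""
    counts = count_pair_cooccurrences(tweets)
    target = coverage * sum(counts.values())
    selected, covered = [], 0
    # most_common() with no argument lists every pair by descending frequency.
    for pair, freq in counts.most_common():
        if covered >= target:
            break
        selected.append(pair)
        covered += freq
    return selected


def count_unique_emoji(pairs):
    """Number of distinct emoji appearing in a list of pairs."""
    return len({emoji for pair in pairs for emoji in pair})


tweets = [["joy", "heart"], ["joy", "heart"], ["joy", "sob"], ["heart", "smile"]]
pairs = select_top_pairs(tweets, coverage=0.75)
print(pairs)                      # [('heart', 'joy'), ('joy', 'sob')]
print(count_unique_emoji(pairs))  # 3
```

Applied to a real corpus, `count_unique_emoji` on the selected pairs corresponds to the "158 unique emoji" figure reported for the 508 pairs.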

EmoSim508 Dataset - User Ratings

Description: We used human annotators to assign a similarity score to each emoji pair in the EmoSim508 dataset. Ten annotators in total annotated the dataset.
URL: http://emojinet.org/emojisimilarity/emojipairs508_userstudy.html
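One simple way to turn several annotators' scores into a single value per pair is to average them. The sketch below assumes the ratings are numeric and stored per pair as a list with one score per annotator; the averaging step, the example scores, and the function names are assumptions for illustration, not a description of how the released ratings were aggregated.

```python
from statistics import mean


def aggregate_ratings(ratings):
    """Average each pair's annotator scores (assumed aggregation); pairs with no scores are skipped."""
    return {pair: mean(scores) for pair, scores in ratings.items() if scores}


def rank_by_score(scores):
    """Return pairs ordered from most to least similar."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical scores from three annotators per pair.
ratings = {
    ("heart", "joy"): [3, 2, 4],
    ("joy", "sob"): [1, 2, 0],
}
human = aggregate_ratings(ratings)
print(rank_by_score(human))  # [('heart', 'joy'), ('joy', 'sob')]
```

The mean is only one choice; a median would be less sensitive to a single outlying annotator.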

Semantic Similarity of Emoji - Results

Description: We learned emoji embeddings from Sense_Desc, Sense_Label, Sense_Def, and Sense_All using the Twitter and Google corpora. We then used each embedding model to rank the emoji pairs by similarity.
Download URL: http://emojinet.org/emojisimilarity/emojipairs508_userstudy_embedding.html
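Ranking pairs with an embedding model can be sketched as below. It assumes each model maps an emoji to a dense vector and scores a pair by cosine similarity, then compares the model's scores with human ratings using Spearman's rank correlation (computed as the Pearson correlation of average ranks, so ties are handled). Cosine similarity, Spearman's rho, the `embeddings` vectors, the human scores, and all function names are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_pairs(pairs, embeddings):
    """Score each pair with one embedding model and sort from most to least similar."""
    scored = [(pair, cosine(embeddings[pair[0]], embeddings[pair[1]])) for pair in pairs]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks (NaN if either side is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        return float("nan")
    return cov / (sd_x * sd_y)


# Hypothetical 2-d vectors and mean human ratings.
embeddings = {"heart": [0.9, 0.1], "joy": [0.8, 0.3], "sob": [-0.2, 0.9]}
human = {("heart", "joy"): 3.0, ("joy", "sob"): 0.5, ("heart", "sob"): 1.0}
pairs = [("heart", "joy"), ("joy", "sob"), ("heart", "sob")]

ranked = rank_pairs(pairs, embeddings)
model_scores = [score for _, score in ranked]
human_scores = [human[pair] for pair, _ in ranked]  # aligned with the model's order
print([pair for pair, _ in ranked])  # [('heart', 'joy'), ('joy', 'sob'), ('heart', 'sob')]
print(round(spearman(model_scores, human_scores), 3))  # 0.5
```

Running the same comparison once per model (Sense_Desc, Sense_Label, Sense_Def, Sense_All, on each corpus) would give one correlation per model; a higher value means that model's ranking agrees more closely with the annotators.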