A Machine Readable Sense Inventory for Emoji


This page links to the EmoSim508 dataset and to our results on measuring emoji similarity with emoji embeddings.

Please note that this work is currently under review.

EmoSim508 Dataset - Emoji Pairs

Description: This dataset consists of 508 emoji pairs. We used co-occurrence frequency to select them: we took the top-k most frequently co-occurring emoji pairs that cover 25% of our Twitter corpus. These 508 pairs contain 158 unique emoji.
URL: http://emojinet.org/emojisimilarity/emojipairs508.html
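The selection step can be sketched as follows. This is a minimal illustration, not our released pipeline: it assumes each tweet has already been reduced to a list of the emoji it contains, counts unordered pairs of distinct emoji per tweet, and interprets "cover 25%" as the smallest set of most frequent pairs whose counts reach 25% of all pair co-occurrences. The function names `count_pairs` and `select_top_k` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_pairs(tweets):
    """Count unordered pairs of distinct emoji that appear in the same tweet.

    Assumption: each tweet is a list of emoji extracted beforehand.
    """
    counts = Counter()
    for emoji in tweets:
        unique = sorted(set(emoji))  # ignore repeats, fix pair order
        for a, b in combinations(unique, 2):
            counts[(a, b)] += 1
    return counts


def select_top_k(counts, coverage=0.25):
    """Take pairs in descending frequency until they cover `coverage` of all co-occurrences."""
    total = sum(counts.values())
    selected, running = [], 0
    for pair, n in counts.most_common():
        if running >= coverage * total:
            break
        selected.append(pair)
        running += n
    return selected


tweets = [["😂", "❤"], ["😂", "❤"], ["😂", "😭"], ["❤", "😍"]]
pairs = select_top_k(count_pairs(tweets))
unique_emoji = {e for pair in pairs for e in pair}
```

On the full corpus, `len(pairs)` would play the role of k and `unique_emoji` the set of distinct emoji in the selected pairs.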

EmoSim508 Dataset - User Ratings

Description: We used human annotators to assign a similarity score to each emoji pair in the EmoSim508 dataset. A total of ten annotators rated the dataset.
URL: http://emojinet.org/emojisimilarity/emojipairs508_userstudy.html
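One common way to turn the ten individual ratings into a single gold score per pair is to average them. The sketch below assumes the ratings are numeric and stored as a mapping from pair to a list of scores; both the layout and the function name `aggregate_ratings` are hypothetical, and the published file may be organized differently.

```python
from statistics import mean


def aggregate_ratings(ratings):
    """Average the scores each emoji pair received from its annotators.

    Assumption: `ratings` maps (emoji_a, emoji_b) to a list of numeric
    scores, one per annotator.
    """
    return {pair: mean(scores) for pair, scores in ratings.items()}


gold = aggregate_ratings({("😂", "😭"): [4, 3, 4, 3, 4, 4, 3, 4, 4, 3]})
```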

Semantic Similarity of Emoji - Results

Description: We learned emoji embeddings from Sense_Desc, Sense_Label, Sense_Def, and Sense_All using Twitter and Google corpora. We then used each embedding model to rank the emoji pairs by similarity.
Download URL: http://emojinet.org/emojisimilarity/emojipairs508_userstudy_embedding.html
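A minimal sketch of ranking pairs with one embedding model and comparing that ranking to the human ratings is shown below. It assumes cosine similarity between emoji vectors and uses Spearman rank correlation as the comparison; both are illustrative choices rather than a statement of how our results were computed. The vectors stand in for whichever model (Sense_Desc, Sense_Label, Sense_Def, or Sense_All) is being evaluated, and all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rank_pairs(pairs, vectors):
    """Sort emoji pairs from most to least similar under one embedding model."""
    return sorted(pairs, key=lambda p: cosine(vectors[p[0]], vectors[p[1]]), reverse=True)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

In use, one would compute `cosine` for every pair in EmoSim508 under a given model, pass those scores and the averaged annotator scores (in the same pair order) to `spearman`, and repeat for each embedding model.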