Emoji Translator
Two models · one referee

English in, emoji out — and a measurable answer to whether the translation is any good.

Why three meters

The research behind the metrics

Most emoji translators just emit emoji and stop. The hard part is knowing whether the translation is any good — and the academic literature shows why that's genuinely difficult.

01 · Approaches

Two ways machines turn text into emoji

Supervised sequence-to-sequence

EmojiLM (Peng et al., 2023) is a distilled BART/T5 encoder–decoder for bidirectional text ↔ emoji. It's trained on Text2Emoji — a ~503K-pair English→emoji corpus that was itself synthesized by prompting an LLM (gpt-3.5-turbo) across 19 domains, with a 2.3K-emoji vocabulary (over 100× the 20 classes of the older TweetEval benchmark).

EmojiLM — arXiv:2311.01751 ↗

Embedding / semantics-grounded

emoji2vec (Eisner et al., 2016) maps every Unicode emoji into the same 300-d word2vec space by training each emoji's vector against the summed word vectors of its Unicode name and keywords — giving usable vectors even for rare emoji.

emoji2vec — arXiv:1609.08359 ↗
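The training idea can be sketched in a few lines. This is a toy illustration, not the authors' code: the 3-d word vectors, the shortened keyword lists, and the helper names (`summed`, `train`, `match_probability`) are all made up for the example, and every other emoji's description is used as a negative example, which is a simplification of sampling negatives.

```python
import math
import random

# Toy 3-d word vectors with made-up values; the paper uses 300-d word2vec.
WORD_VECS = {
    "face": [0.9, 0.1, 0.0], "joy": [0.8, 0.3, 0.1], "tears": [0.6, 0.2, 0.3],
    "fire": [0.0, 0.9, 0.2], "flame": [0.1, 0.8, 0.3], "hot": [0.0, 0.7, 0.4],
}
# Simplified name/keyword lists, for illustration only.
DESCRIPTIONS = {"😂": ["face", "tears", "joy"], "🔥": ["fire", "flame", "hot"]}
DIMS = 3


def summed(words):
    """Sum the word vectors of a description, one dimension at a time."""
    return [sum(WORD_VECS[w][d] for w in words) for d in range(DIMS)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(descriptions, epochs=200, lr=0.1, seed=0):
    """Fit one vector per emoji with a logistic loss on sigmoid(x . v)."""
    rng = random.Random(seed)
    targets = {e: summed(ws) for e, ws in descriptions.items()}
    vecs = {e: [rng.uniform(-0.1, 0.1) for _ in range(DIMS)] for e in descriptions}
    for _ in range(epochs):
        for emoji, x in vecs.items():
            for other, v in targets.items():
                # Own description is a positive pair, any other is a negative.
                label = 1.0 if other == emoji else 0.0
                err = sigmoid(dot(x, v)) - label  # gradient of log loss w.r.t. the logit
                for d in range(DIMS):
                    x[d] -= lr * err * v[d]
    return vecs, targets


def match_probability(vecs, targets, emoji, description_of):
    """Modelled probability that `emoji` matches the description of `description_of`."""
    return sigmoid(dot(vecs[emoji], targets[description_of]))


vecs, targets = train(DESCRIPTIONS)
```

Because the emoji vectors live in the same space as the word vectors, an emoji with only a name and a few keywords still ends up with a usable position, which is the property the paragraph above relies on.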

02 · The core problem

Why you can't grade emoji like a spelling test

One sentence maps to many valid emoji renderings, and one emoji carries many senses (😂 ≠ "crying"). So token-overlap metrics like BLEU, and exact match against a single reference, are unreliable here: rendering "I'm happy" as 🎉 or as 😄 is correct either way, yet the two outputs share no tokens.
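A small sketch makes the failure concrete. It assumes each emoji is a single code point and treats each code point as one token; real emoji sequences (skin tones, ZWJ sequences) would need proper grapheme segmentation. `exact_match` and `unigram_precision` are illustrative helpers, not part of any metrics library, and the second is only the 1-gram building block of BLEU rather than full BLEU.

```python
from collections import Counter


def exact_match(candidate: str, reference: str) -> bool:
    """True only when the two emoji strings are identical."""
    return candidate == reference


def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision over code points (whitespace ignored)."""
    cand = [ch for ch in candidate if not ch.isspace()]
    if not cand:
        return 0.0
    ref_counts = Counter(ch for ch in reference if not ch.isspace())
    cand_counts = Counter(cand)
    # Each candidate token is credited at most as often as it occurs in the reference.
    clipped = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    return clipped / len(cand)


reference = "🎉"   # the single reference rendering of "I'm happy"
candidate = "😄"   # an equally valid rendering

print(exact_match(candidate, reference))        # False
print(unigram_precision(candidate, reference))  # 0.0
```

Both metrics give the valid alternative the lowest possible score, which is why neither can serve as the only referee.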

That splits "is it good?" into two different questions: Fidelity (does the emoji preserve the meaning?) and Naturalness (would a real person actually text this?). A string of 12 literal emoji can be perfectly faithful yet completely inhuman.

03 · Measurement

How the literature measures it → how this tool measures it

In the literature → On this page

Forced-choice human preference vs. a reference: EmojiLM's human study has annotators pick the better of model-vs-corpus emoji. Measures naturalness, crudely.
→ Motivates our Naturalness score.

Back-translation / cloze test: Emojinize (2024) asks whether a third party can recover the original meaning from the emoji alone. An objective fidelity probe. arXiv:2403.03857 ↗
→ This is exactly our Fidelity score: we back-translate the emoji and check how well the meaning survives ("reads back as …").

Single-emoji prediction with macro-F1: SemEval-2018 Task 2. Useful, but single-emoji and exact-match only. ACL S18-1003 ↗
→ Context for why exact match alone isn't enough.

Semantics-preserving evaluation (2024): instead of exact match, check whether attributes like sentiment, emotion, and stance are preserved. arXiv:2409.10760 ↗
→ Motivates our Tone-match check.

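The back-translation idea behind the Fidelity score can be sketched as follows. This is a minimal illustration under stated assumptions, not this page's implementation: `back_translate` and `similarity` are pluggable callables, the lookup table `TOY_GLOSSES` stands in for a real back-translation model, and word-set Jaccard overlap stands in for a real semantic similarity measure.

```python
from typing import Callable, Tuple

# Hypothetical glosses; a real system would call a model here instead.
TOY_GLOSSES = {"🙋": "i", "😍": "love", "🍕": "pizza"}


def toy_back_translate(emoji: str) -> str:
    """Read an emoji string back as words, one code point at a time."""
    return " ".join(TOY_GLOSSES.get(ch, "?") for ch in emoji if not ch.isspace())


def jaccard(a: str, b: str) -> float:
    """Word-set overlap in [0, 1]; a crude stand-in for semantic similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


def fidelity(
    source: str,
    emoji: str,
    back_translate: Callable[[str], str] = toy_back_translate,
    similarity: Callable[[str, str], float] = jaccard,
) -> Tuple[float, str]:
    """Return (score, recovered text): how much meaning survives the round trip."""
    recovered = back_translate(emoji)
    return similarity(source, recovered), recovered


full_score, full_text = fidelity("I love pizza", "🙋😍🍕")
short_score, short_text = fidelity("I love pizza", "🍕")
print(full_score, "reads back as", full_text)
print(short_score, "reads back as", short_text)
```

Keeping the back-translator and the similarity function as parameters means the same check works whether the "third party" is a lookup table, a model, or a human transcript.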
04 · Open gaps

What's still missing

That gap is why this page runs two models head-to-head and scores both with the same fixed referee.
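The head-to-head setup can be sketched as a small harness. Everything here is hypothetical: `literal_model`, `terse_model`, and `toy_referee` are toy stand-ins, not the models or the scoring used on this page. The only point is the shape: two translators, one shared scoring function.

```python
from typing import Callable, Dict

Translator = Callable[[str], str]      # phrase -> emoji string
Referee = Callable[[str, str], float]  # (phrase, emoji) -> score

# Hypothetical word-to-emoji glossary shared by the toy models and referee.
GLOSS = {"pizza": "🍕", "love": "😍"}


def literal_model(phrase: str) -> str:
    """Toy model A: translate every word it knows, in order."""
    return "".join(GLOSS.get(w, "") for w in phrase.lower().split())


def terse_model(phrase: str) -> str:
    """Toy model B: emit a single emoji."""
    return "🍕" if "pizza" in phrase.lower() else "❓"


def toy_referee(phrase: str, emoji: str) -> float:
    """Fraction of source words whose gloss appears in the output."""
    words = phrase.lower().split()
    if not words:
        return 0.0
    covered = sum(1 for w in words if w in GLOSS and GLOSS[w] in emoji)
    return covered / len(words)


def head_to_head(phrase: str, models: Dict[str, Translator], referee: Referee) -> dict:
    """Run every model on the phrase and score all outputs with the same referee."""
    outputs = {name: model(phrase) for name, model in models.items()}
    scores = {name: referee(phrase, out) for name, out in outputs.items()}
    winner = max(scores, key=scores.get)
    return {"outputs": outputs, "scores": scores, "winner": winner}


result = head_to_head(
    "I love pizza",
    {"literal": literal_model, "terse": terse_model},
    toy_referee,
)
print(result["winner"], result["scores"])
```

Because the referee is passed in once and applied to both outputs, neither model can be favoured by a scoring rule the other never sees.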

References

  1. EmojiLM: Modeling the New Emoji Language. Peng et al., 2023. arXiv:2311.01751 ↗
  2. emoji2vec: Learning Emoji Representations from their Description. Eisner et al., 2016. arXiv:1609.08359 ↗
  3. Emojinize: Enriching Any Text with Emoji Translations. 2024. arXiv:2403.03857 ↗
  4. SemEval-2018 Task 2: Multilingual Emoji Prediction. Barbieri et al., 2018. ACL S18-1003 ↗
  5. Semantics-Preserving Evaluation of Text-to-Emoji Translation. 2024. arXiv:2409.10760 ↗