Code Points

repo: Codepoints/awesome-codepoints
category: Miscellaneous


Awesome Code Points Awesome

This is a curated list of characters in Unicode, that have interesting (and
maybe not widely known) features or are awesome in some other way.

Table of Contents

  1. Standalone code points
  2. Code points that affect others
    1. Breaking and Gluing other characters
  3. Record holders and extremes
  4. For funsies
    1. Games
  5. Other Lists of Code Points
  6. Contributing
  7. License

Standalone Code Points

  • The code points of the Unicode blocks Box
    Drawing
    (U+2500 to U+257F) and Block
    Elements
    (U+2580 to U+259F) cover
    most of your monospace command-line visualization needs.

      ╭───────╮
      │Unicode│
      │rules! │
      ╰┬─────┬╯
    
  • U+2E2E REVERSED QUESTION MARK - the “irony
    mark” to express irony/sarcasm. A useful character⸮

  • U+D800 to
    U+DFFF - surrogate code points. They are
    only reserved to ease UTF-16
    encoding
    .

  • U+FEFF ZERO WIDTH NO-BREAK SPACE - it’s name
    suggests, that it can be used like U+2060 WORD JOINER. And in fact the
    latter was introduced to inherit its semantics. This is because U+FEFF had
    become a special beacon called the byte order
    mark
    , that was placed on
    the beginning of some UTF-8 files. In complying software (including many
    text editors) this character is stripped from the start of a file and
    handled as metadata. In non-complying software (like the PHP interpreter)
    this leads to all sorts of fun behaviour.

  • U+FFFD REPLACEMENT CHARACTER - when a
    character cannot be displayed (e.g., decoding an erroneous UTF-8 sequency),
    this code point steps into the breach.

  • U+1D455 is missing. It would be an italic
    small “h”. It was not encoded, because it would be identical to the Planck
    constant ℎ (U+210E).

  • U+FF03 FULLWIDTH NUMBER SIGN - it is the
    "Japanese Hashtag" . Sites like Twitter accept it as equivalent to the
    regular # (U+0023).

Code Points that Affect Others

  • U+202D and
    U+202E - change the text direction.
    Relevant XKCD:

  • U+FE0E VARIATION SELECTOR-15 - force
    black-&-white emoji. If this code point follows an emoji, an explicit
    monochrome rendering of the emoji is requested (if the client supports it).

  • U+FE0F VARIATION SELECTOR-16 - force
    colorful emoji. If this code point follows an emoji, an explicit colorful
    rendering of the emoji is requested (if the client supports it).

  • Diacritics and combining marks: There is a host of
    characters
    , that add
    to the characters before. Those are called Combining Marks. Unicode
    provides a handy FAQ on the
    details, but in a nutshell: If you add one after a character, it is placed
    on top of that previous one. So, a + ̊ = å. This may lead to all kinds
    of funny problems, because for some combinations there are pre-composed
    characters. Our little å here can also be encoded as U+00E5. You might
    note, that while this has a length of one character, the combination of a
    and combining ring has a length of two characters.

    Of course, one can also do fun things with those characters like
    this answer on StackOverflow.

  • The Regional Indicator Symbols
    U+1F1E6 to U+1F1FF resemble the 26 latin characters. They are used to
    create flag emoji. Since the Unicode consortium didn’t feel like getting on
    board with international politics, the solution to flags is to combine
    these 26 characters to the respective ISO code for a country. Examples:

    Country ISO Code Code Points Emoji (if supported)
    USA US U+1F1FA + U+1F1F8 🇺🇸
    Germany DE U+1F1E9 + U+1F1EA 🇩🇪
    China CN U+1F1E8 + U+1F1F3 🇨🇳
  • Skin color of emoji: There are five code points, that control the skin color
    of emoji, U+1F3FB to U+1F3FF.
    They are called “Emoji Modifier Fitzpatrick Type” 1 to 6, with 1 the palest
    and 6 the darkest. If one of these characters follows an emoji, that emoji
    is meant to be rendered in the appropriate skin color of the Fitzpatrick
    scale
    . If no such
    modifier is added, the skin tone should be unnatural, e. g., bright yellow.
    Fun fact: Since the Fitzpatrick modifiers are normal code points, emoji
    with such skin colors have the length 2, which Twitter users noticed first.
    Here is a comparison chart directly from the
    specification
    :

    Code Name Samples
    U+1F3FB EMOJI MODIFIER FITZPATRICK TYPE-1-2 <img src="http://www.unicode.org/reports/tr51/images/other/swatch-type-1-2.png" alt="" height="20" width="auto"> <img src="http://www.unicode.org/reports/tr51/images/other/swatch-type-1-2-bw.png" alt="" height="20" width="auto">
    U+1F3FC EMOJI MODIFIER FITZPATRICK TYPE-3 <img src="http://www.unicode.org/reports/tr51/images/other/swatch-type-3.png" alt="" height="20" width="auto"> <img src="http://www.unicode.org/reports/tr51/images/other/swatch-type-3-bw.png" alt="" height="20" width="auto">
    U+1F3FD EMOJI MODIFIER FITZPATRICK TYPE-4 <img src="http://www.unicode.org/reports/tr51/images/other/swatch-type-4.png" alt="" height="20" width="auto"> <img src="http://www.unicode.org/reports/tr51/images/other/swatch-type-4-bw.png" alt="" height="20" width="auto">
    U+1F3FE EMOJI MODIFIER FITZPATRICK TYPE-5 <img src="http://www.unicode.org/reports/tr51/images/other/swatch-type-5.png" alt="" height="20" width="auto"> <img src="http://www.unicode.org/reports/tr51/images/other/swatch-type-5-bw.png" alt="" height="20" width="auto">
    U+1F3FF EMOJI MODIFIER FITZPATRICK TYPE-6 <img src="http://www.unicode.org/reports/tr51/images/other/swatch-type-6.png" alt="" height="20" width="auto"> <img src="http://www.unicode.org/reports/tr51/images/other/swatch-type-6-bw.png" alt="" height="20" width="auto">

Breaking and Gluing other characters

  • U+00A0 NO-BREAK SPACE - force adjacent
    characters to stick together. Well known as &nbsp; in HTML.
  • U+00AD SOFT HYPHEN - (in HTML: &shy;)
    like ZERO WIDTH SPACE, but show a hyphen if (and only if) a break occurs.
  • U+200B ZERO WIDTH SPACE - the inverse to
    U+00A0: create no space, but allow word breaking.
  • U+200D ZERO WIDTH JOINER - force adjacent
    characters to be joined together (e.g., arabic characters or supported
    emoji). Apple uses this to compose some emoji like different families.
  • U+2060 WORD JOINER - the same as
    U+00A0, but completely invisible. Good for writing @font-face on Twitter.

For better comparison of which code point has which effect, consult this
table:

| U+00A0 | U+00AD | U+200B | U+200D | U+2060
---------------|--------|--------|--------|--------|--------
create space | ✓ | ✗ | ✗ | ✗ | ✗
allow breaking | ✗ | ✓ | ✓ | ✗ | ✗
possible change| ✗ | ✓ | ✗ | ✓ | ✗

Smashing Magazine featured a comprehensive
article
on the
different types of whitespace.

Record Holders and Extremes

  • U+0000 <control> - first code point.
  • U+10FFFF (non-character) - last code
    point. The whole rest of its plane apart from U+10FFFE, the code points
    in the 0x10000-0x10FFFD range, are private use characters, guaranteed to
    be never filled by a future Unicode standard.
  • U+1F402 OX - shortest name.
  • U+1FBA8 BOX DRAWINGS LIGHT DIAGONAL UPPER CENTRE TO MIDDLE LEFT AND MIDDLE RIGHT TO LOWER CENTRE
    and
    U+1FBA9 BOX DRAWINGS LIGHT DIAGONAL UPPER CENTRE TO MIDDLE RIGHT AND MIDDLE LEFT TO LOWER CENTRE - longest name: 88
    characters each.
  • U+FDFA ARABIC LIGATURE SALLALLAHOU ALAYHE
    WASALLAM - longest decomposition form: 18 characters.
  • U+5146 and
    U+16B61 - code points that represent the
    highest “single-digit” number. In both cases that’s 1,000,000,000,000, a
    trillion.
  • U+0F33 TIBETAN DIGIT HALF ZERO - code point that
    represents the lowest “single-digit” number and at the same time the
    only negative one, -½.
  • The trophy for most useless code points goes to
    U+0080,
    U+0081 and
    U+0099. These so-called C1 control
    characters are more or less unspecified. They got into Unicode, because
    they were present in the very first version of what should later become ISO
    10646, the ISO-standardized version of Unicode. They were meant to be part
    of an upgrade to ISO
    2022
    , that
    never came to be.
  • A close second place in this regard goes to the CJK unified ideographs
    ,
    ,
    ,
    ,
    ,
    ,
    ,
    ,
    ,
    ,
    , and
    . These so-called “ghost characters”
    came to Unicode via the Japanese JIS standard, where they were added, because
    they were mis-read or misinterpreted from other signs, when JIS was compiled
    from original printed text sources.
  • U+006F LATIN SMALL LETTER O - leads the list
    of characters with confusable shapes. Of all the possible mappings in the
    list of confusable
    characters
    , the small “o”
    leads with a whopping 73 entries of similar looking glyphs, followed by
    U+006C LATIN SMALL LETTER L with 70
    entries.
  • U+1F4C0 DVD - only code point name without any vowel (source)
  • U+3106C CJK UNIFIED IDEOGRAPH 3106C - the
    character with the most
    strokes: 84.
    Take your time to write this one!

For Funsies

  • U+1680 OGHAM SPACE MARK - a space that looks
    like a dash. Great to bring programmers close to madness: 1 +  2 === 3.
  • U+037E GREEK QUESTION MARK - a look-alike to
    the semicolon. Also a fun way to annoy developers.
  • U+1DD2 COMBINING US ABOVE - this is the most
    romantic code point.
  • U+F8FF PRIVATE USE CODEPOINT - this private
    use code point is rendered as Apple logo on many Apple devices.
  • U+1F574 MAN IN BUSINESS SUIT LEVITATING -
    A rather curious character, that only made it into Unicode for its
    appearance in the Webdings font (for reasons of backwards compatibility).
  • U+1F596 RAISED HAND WITH PART BETWEEN
    MIDDLE AND RING FINGERS - the Vulcan salute. Live long and prosper!
    🖖
  • U+1F918 SIGN OF THE HORNS - Rock on!
    🤘
  • U+2800 BRAILLE PATTERN BLANK - A Braille pattern that has zero of its six or eight dots filled in. According to the standard: “* while this character is imaged as a fixed-width blank in many fonts, it does not act as a space” Essentially it is rendered as white-space, but since it is designated as not white-space it isn't matched by white-space-validating regular expressions. This can be used to bypass all kinds of validation that disallows or trims white-space.

Games

For plain-text gaming, Unicode is well equipped with several complete sets:

Other Lists of Code Points

Contributing Your Code Points

See the contribution guide for details.

License

CC0

To the extent possible under law, the
contributors

have waived all copyright and related or neighboring rights to this work. See
the license file for details.

[[curator]]
I'm the Curator. I can help you navigate, organize, and curate this wiki. What would you like to do?