Restricted Text

The Restricted Text format is a subset of the Basic Text format. It incorporates several restrictions which reduce the expressiveness of the format in order to reduce visual ambiguity.

This format is entirely hypothetical at this time. It's formed from a loose collection of ideas from a variety of sources, and is not yet ready for any practical purpose.

This format does not define lossy conversion from Basic Text or other less restrictive formats, as that may cause meaning to be silently lost. Instead, errors should be reported when content not meeting these restrictions is encountered in any context where restricted text is expected. See Basic Text for an unrestricted alternative.

Definitions

A string is in Restricted Text form iff:

A stream is in Restricted Text form iff:

  • it is a stream in Basic Text form, and
  • it consists entirely of a string in Restricted Text form.

Note that even though this excludes U+34F (COMBINING GRAPHEME JOINER), the Stream-Safe Text Format is still required; content must simply avoid excessively long sequences of non-starters.
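As a rough illustration of the note above, the following sketch checks the non-starter limit without relying on U+34F. It assumes the limit of 30 consecutive non-starters, measured on the NFKD form, as described for the Stream-Safe Text Format in UAX #15. The function names `longest_nonstarter_run` and `is_stream_safe` are hypothetical and not part of this format.

```python
import unicodedata

# Assumption: the Stream-Safe Text Format limit of 30 non-starters (UAX #15).
MAX_NONSTARTERS = 30


def longest_nonstarter_run(text: str) -> int:
    """Return the longest run of non-starters in the NFKD form of `text`."""
    longest = 0
    current = 0
    for ch in unicodedata.normalize("NFKD", text):
        # A non-starter is a scalar with a non-zero canonical combining class.
        if unicodedata.combining(ch) != 0:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def is_stream_safe(text: str) -> bool:
    """Report whether `text` stays within the non-starter limit."""
    return longest_nonstarter_run(text) <= MAX_NONSTARTERS
```

Content that fails this check would be reported as an error rather than repaired, since the usual repair inserts U+34F, which this format excludes.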

Sequence Table

Sequence                        Error
[U+FE00–U+FE0F]                 "Variation selectors are not always visually distinct"
[U+E0100–U+E01EF]               "Variation selectors are not always visually distinct"
Default Ignorable Code Points   "Default Ignorable Code Points are not visually distinct"
Old Hangul Jamo                 "Conjoining Hangul Jamo are restricted in RFC5892"
Tag Characters                  "Tag Characters are not permitted"
Private-Use Characters          "Private-use characters depend on private agreements"
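The table can be read as a list of checks that report errors instead of rewriting content. The sketch below is a partial, non-authoritative illustration covering only the rows that Python's standard library can express directly: the two variation selector ranges, tag characters (assumed here to mean the whole Tags block, U+E0000–U+E007F), and private-use characters (general category Co). The Default Ignorable Code Points and Old Hangul Jamo rows need Unicode property data that `unicodedata` does not expose, so they are omitted. The name `restricted_text_errors` is hypothetical.

```python
import unicodedata

# Range-based rows from the table. The Tags block bounds are an assumption
# about what "Tag Characters" covers.
RANGE_ERRORS = [
    (0xFE00, 0xFE0F, "Variation selectors are not always visually distinct"),
    (0xE0100, 0xE01EF, "Variation selectors are not always visually distinct"),
    (0xE0000, 0xE007F, "Tag Characters are not permitted"),
]
PRIVATE_USE_ERROR = "Private-use characters depend on private agreements"


def restricted_text_errors(text: str) -> list[tuple[int, str]]:
    """Return (index, message) pairs for scalars matching a covered table row."""
    errors = []
    for index, ch in enumerate(text):
        cp = ord(ch)
        for first, last, message in RANGE_ERRORS:
            if first <= cp <= last:
                errors.append((index, message))
                break
        else:
            # No range matched; check the private-use general category.
            if unicodedata.category(ch) == "Co":
                errors.append((index, PRIVATE_USE_ERROR))
    return errors
```

An empty result only means that none of the covered rows matched; it does not by itself establish that a string is in Restricted Text form.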

Conversion

From Basic Text string to Restricted Text string

To convert a Basic Text string into a Restricted Text string in a manner that never loses information but may fail:

From Basic Text stream to Restricted Text stream

To convert a Basic Text stream into a Restricted Text stream in a manner that never loses information but may fail:

TODO

TODO: "Moderately Restricted" isn't stable over time.

TODO: Mixed-Number Detection

TODO: Unicode Security Mechanisms also specifies some Optional Detection rules.

TODO: U+2126 (OHM SIGN) normalizes to U+3A9 (GREEK CAPITAL LETTER OMEGA); does "Moderately Restricted" permit this Greek letter to be mixed with otherwise Latin script?

TODO: Several Braille scalars have visual similarities with other scalars, such as U+2800 and U+20, U+2802 and U+B7, and so on.

TODO: Several scalars such as U+1160, U+2062, U+FFA0, U+115F, U+16FE4, and possibly others, may display as whitespace despite not being categorized as whitespace. Can we constrain them with a mixed-script constraint, or some other mechanism?

TODO: Implicit Directional Marks have no display.