Unicode

Here, the Unicode format is just a sequence of Unicode Scalar Values.

Unicode permits control codes and other non-textual content; see Basic Text for a subset focused on textual content.

Currently Basic Text is based on Unicode 15.0.

Definitions

A string is in Unicode form iff:

  • it consists entirely of Unicode Scalar Values

A stream is in Unicode form iff:

  • it consists entirely of a string in Unicode form
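
As a rough illustration of the definitions above, the following Python sketch checks whether a string is in Unicode form. It relies on the standard definition of a Unicode Scalar Value: any code point from U+0000 to U+10FFFF except the surrogate range U+D800 to U+DFFF. The function names `is_scalar_value` and `is_unicode_form` are hypothetical and chosen only for this example.

```python
def is_scalar_value(code_point: int) -> bool:
    """Return True if code_point is a Unicode Scalar Value."""
    # Scalar values are the code points U+0000..U+10FFFF, excluding the
    # surrogate range U+D800..U+DFFF.
    in_range = 0 <= code_point <= 0x10FFFF
    is_surrogate = 0xD800 <= code_point <= 0xDFFF
    return in_range and not is_surrogate


def is_unicode_form(s: str) -> bool:
    """Return True if s consists entirely of Unicode Scalar Values.

    A Python str can hold lone surrogates, so this check is not trivially
    true for every str.
    """
    return all(is_scalar_value(ord(ch)) for ch in s)
```

Under this sketch, a string containing control codes such as U+0000 is still in Unicode form, while a string containing a lone surrogate such as U+D800 is not.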

Conversion

From byte sequence to Unicode string

To convert a byte sequence into a Unicode string in a manner that always succeeds but potentially loses information about invalid encodings: