Unicode
Here, the Unicode format is just a sequence of Unicode Scalar Values.
Unicode permits control codes and other non-textual content; see Basic Text for a subset focused on textual content.
Currently Basic Text is based on Unicode 15.0.
Definitions
A string is in Unicode form iff:
- it encodes a sequence of Unicode Scalar Values.
A stream is in Unicode form iff:
- it consists entirely of a string in Unicode form
Conversion
From byte sequence to Unicode string
To convert a byte sequence into a Unicode string in a manner that always succeeds but potentially loses information about invalid encodings: