CHARACTER types in f18
The f18 compiler and runtime support three kinds of the intrinsic CHARACTER type of Fortran 2018. The default (CHARACTER(KIND=1)) holds 8-bit character codes; CHARACTER(KIND=2) holds 16-bit character codes; and CHARACTER(KIND=4) holds 32-bit character codes.
We assume that code values 0 through 127 correspond to the 7-bit ASCII character set (ISO-646) in every kind of CHARACTER. This is a valid assumption for Unicode (UCS == ISO/IEC-10646), ISO-8859, and many legacy character sets and interchange formats.
CHARACTER data in memory and unformatted files are not in an interchange representation (like UTF-8, Shift-JIS, EUC-JP, or a JIS X encoding). Each character's code in memory occupies a 1-, 2-, or 4-byte word, and substrings can be indexed with simple arithmetic. In formatted I/O, however, CHARACTER data may be assumed to use the UTF-8 variable-length encoding when it is selected with OPEN(ENCODING='UTF-8').
CHARACTER(KIND=1) literal constants in Fortran source files, Hollerith constants, and formatted I/O with ENCODING='DEFAULT' are not translated.
For the purposes of non-default-kind CHARACTER constants in Fortran source files, formatted I/O with ENCODING='UTF-8' or non-default-kind CHARACTER value, and conversions between kinds of CHARACTER, by default: CHARACTER(KIND=1) is assumed to be ISO-8859-1 (Latin-1), CHARACTER(KIND=2) is assumed to be UCS-2 (16-bit Unicode), and CHARACTER(KIND=4) is assumed to be UCS-4 (full Unicode in a 32-bit word).
In particular, conversions between kinds are assumed to be simple zero-extensions or truncation, not table look-ups.
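The zero-extension and truncation rule can be illustrated with a minimal C++ sketch. The function names WidenKind1ToKind4 and NarrowKind4ToKind1 are hypothetical and are not part of the f18 runtime; the sketch only assumes that KIND=1 maps to char and KIND=4 maps to char32_t.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not an f18 runtime routine): KIND=1 -> KIND=4.
// Each 8-bit code is zero-extended to 32 bits; no table look-up is involved.
std::u32string WidenKind1ToKind4(const std::string &in) {
  std::u32string out;
  out.reserve(in.size());
  for (char ch : in) {
    // Go through unsigned char so that codes above 127 are zero-extended
    // rather than sign-extended on platforms where char is signed.
    out.push_back(static_cast<char32_t>(static_cast<unsigned char>(ch)));
  }
  assert(out.size() == in.size());  // length in characters is preserved
  return out;
}

// Hypothetical helper (not an f18 runtime routine): KIND=4 -> KIND=1.
// Each 32-bit code is truncated to its low 8 bits.
std::string NarrowKind4ToKind1(const std::u32string &in) {
  std::string out;
  out.reserve(in.size());
  for (char32_t ch : in) {
    out.push_back(static_cast<char>(static_cast<unsigned char>(ch)));
  }
  return out;
}
```

Under the Latin-1 assumption, widening followed by narrowing round-trips every KIND=1 value, while narrowing a code above 255 silently discards its upper bits.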
We might want to support one or more environment variables to change these assumptions, especially for KIND=1 users of ISO-8859 character sets besides Latin-1.
Allocatable CHARACTER objects in Fortran may defer the specification of their lengths until the time of their allocation or whole (non-substring) assignment. Non-allocatable objects (and non-deferred-length allocatables) have lengths that are fixed or assumed from an actual argument, or, in the case of assumed-length CHARACTER functions, their local declaration in the calling scope.
The elements of CHARACTER arrays have the same length.
Assignments to targets that are not deferred-length allocatables will truncate or pad the assigned value to the length of the left-hand side of the assignment.
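A minimal sketch of the truncate-or-pad behavior for a fixed-length KIND=1 target follows. AssignFixedLength is a hypothetical name used only for illustration, and lengths here are in bytes, which equal characters for KIND=1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not an f18 runtime routine), KIND=1 only.
// Copies rhs into a fixed-length lhs: truncates when rhs is longer,
// pads with blanks when rhs is shorter.
void AssignFixedLength(char *lhs, std::size_t lhsLength,
                       const char *rhs, std::size_t rhsLength) {
  assert(lhs != nullptr && rhs != nullptr);
  std::size_t copied = std::min(lhsLength, rhsLength);
  // memmove tolerates overlap between the two sides of the assignment.
  std::memmove(lhs, rhs, copied);
  // Fill whatever remains of the left-hand side with blanks.
  std::memset(lhs + copied, ' ', lhsLength - copied);
}
```

For a five-character target, assigning 'abc' leaves 'abc' followed by two blanks, and assigning 'abcdefgh' leaves 'abcde'.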
Lengths and offsets that are used by or exposed to Fortran programs via declarations, substring bounds, and the LEN() intrinsic function are always represented in units of characters, not bytes. In generated code, assumed-length arguments, the runtime support library, and in the elem_len field of the interoperable descriptor cdesc_t, lengths are always in units of bytes. The distinction matters only for kinds other than the default.
Fortran substrings are rather like subscript triplets into a hidden “zero” dimension of a scalar CHARACTER value, but they cannot have strides.
Fortran has one CHARACTER-valued intrinsic operator, //, which concatenates its operands (10.1.5.3). The operands must have the same kind type parameter. One or both of the operands may be arrays; if both are arrays, their shapes must be identical. The effective length of the result is the sum of the lengths of the operands. Parentheses may be ignored, so any CHARACTER-valued expression may be “flattened” into a single sequence of concatenations.
The result of // may be used in many contexts, for example as an operand of a CHARACTER relation, as the SOURCE= or MOLD= of an ALLOCATE statement, as the selector or case-expr of an ASSOCIATE or SELECT construct, as the NAME= of a BIND(C) attribute, or as the stop code of a STOP statement.
The f18 compiler has a general (but slow) means of implementing concatenation and a specialized (fast) option to optimize the most common case.
In the most general case, the f18 compiler's generated code and runtime support library represent the result as a deferred-length allocatable CHARACTER temporary scalar or array variable that is initialized as a zero-length array by AllocatableInitCharacter() and then progressively augmented in place by the values of each of the operands of the concatenation sequence in turn with calls to CharacterConcatenate(). Conformability errors are fatal, since Fortran has no means by which a program may recover from them. The result is then used as any other deferred-length allocatable array or scalar would be, and finally deallocated like any other allocatable.
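The general scheme can be modelled with a small C++ sketch. It does not reproduce the signatures or descriptors of AllocatableInitCharacter() or CharacterConcatenate(); the names CharacterTemp, InitTemp, and ConcatenateInPlace are hypothetical, the sketch is restricted to KIND=1, arrays are treated as flat sequences of elements, and a scalar is modelled as a single element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical stand-in for a deferred-length allocatable CHARACTER(KIND=1)
// temporary. All elements share one length; a scalar is a single element.
struct CharacterTemp {
  std::vector<std::string> elements;
  bool isScalar;
};

// Starts the temporary as a zero-length scalar (an assumption of this sketch).
CharacterTemp InitTemp() { return CharacterTemp{{std::string{}}, true}; }

// Appends one operand of the flattened concatenation sequence in place.
void ConcatenateInPlace(CharacterTemp &temp,
                        const std::vector<std::string> &operand,
                        bool operandIsScalar) {
  assert(!operandIsScalar || operand.size() == 1);
  if (temp.isScalar && !operandIsScalar) {
    // The accumulated scalar is broadcast to the shape of the array operand.
    std::string accumulated = temp.elements.front();
    temp.elements.assign(operand.size(), accumulated);
    temp.isScalar = false;
  } else if (!temp.isScalar && !operandIsScalar &&
             temp.elements.size() != operand.size()) {
    std::abort();  // nonconforming arrays: fatal, with no recovery
  }
  for (std::size_t i = 0; i < temp.elements.size(); ++i) {
    temp.elements[i] += operandIsScalar ? operand.front() : operand[i];
  }
}
```

Evaluating 'ab' // ['x', 'y'] // '!' in this model takes three calls to ConcatenateInPlace and yields the two elements 'abx!' and 'aby!', each of length 4, the sum of the operand lengths.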
The runtime routine CharacterAssign() takes care of truncating, padding, or replicating the value(s) assigned to the left-hand side, as well as reallocating a nonconforming or deferred-length allocatable left-hand side. It takes the descriptors of the left- and right-hand sides of a CHARACTER assignment as its arguments.
When the left-hand side of a CHARACTER assignment is a deferred-length allocatable and the right-hand side is a temporary, use of the runtime's MoveAlloc() subroutine instead can save an allocation and a copy.
Scalar CHARACTER(KIND=1) expressions evaluated as the right-hand sides of assignments to independent substrings or whole variables that are not deferred-length allocatables can be optimized into a sequence of calls to the runtime support library that do not allocate temporary memory.
The routine CharacterAppend() copies data from the right-hand side value to the remaining space, if any, in the left-hand side object, and returns the new offset of the reduced remaining space. It is essentially memcpy(lhs + offset, rhs, min(lhsLength - offset, rhsLength)). It does nothing when offset > lhsLength.
void CharacterPad() adds any necessary trailing blank characters.
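The append-then-pad sequence can be sketched as follows. The parameter lists are assumptions, since the text gives only the memcpy expression and the void return type of CharacterPad(); AppendSketch and PadSketch are hypothetical names, and returning the offset unchanged when no space remains is also an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical model of the append step (KIND=1, lengths in bytes).
// Copies as much of rhs as fits after offset and returns the new offset.
std::size_t AppendSketch(char *lhs, std::size_t lhsLength, std::size_t offset,
                         const char *rhs, std::size_t rhsLength) {
  assert(lhs != nullptr && rhs != nullptr);
  if (offset >= lhsLength) {
    return offset;  // no remaining space: nothing is copied
  }
  std::size_t count = std::min(lhsLength - offset, rhsLength);
  // memcpy is acceptable because the left-hand side is independent of the
  // right-hand side in the optimized case.
  std::memcpy(lhs + offset, rhs, count);
  return offset + count;
}

// Hypothetical model of the padding step: blank-fills the unused tail.
void PadSketch(char *lhs, std::size_t lhsLength, std::size_t offset) {
  if (offset < lhsLength) {
    std::memset(lhs + offset, ' ', lhsLength - offset);
  }
}
```

An assignment such as x = a // b to a fixed-length x then becomes one append per operand, threading the returned offset through the calls, followed by a single padding call, with no temporary allocated.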