The [transcode] command


The [transcode] command lets you perform encoding conversions. It supports more than 100 different encodings and can convert data from and to any of them. It is built on the Core Foundation API of Mac OS X. Since Core Foundation uses Unicode (UTF-16) internally, all converters convert between UTF-16 (in the byte order of the current platform) and another encoding. For more information about Core Foundation, see Apple's Core Foundation Design Concepts documentation.
A given encoding can have different names depending on the platform or the environment where it is used. For instance, the UTF-8 encoding is sometimes written Utf8, Utf-8 or UTF8; the [transcode] command supports all these variants. It also supports prettified names and IANA (Internet Assigned Numbers Authority) names, as explained under the [transcode alias] subcommand. See also the table in the Encoding names section below.


The formal syntax of the [transcode] command is:
    transcode subcommand ?options?
The possible subcommands are described below. Depending on the subcommand, different arguments and options can be specified.

The [alias] subcommand

This command lets you obtain the different forms of an encoding name. An encoding can be designated by a standard name, a prettified name or a canonical name as defined by the IANA. Encoding names are case insensitive.
The syntax of this subcommand is:
    transcode alias (-pretty|-IANA|-standard) encoding
The encoding argument can be specified in any form. The command returns the form designated by the option: -pretty, -IANA or -standard. If the encoding is not supported, or has no name in the requested naming system, the command raises an error.
Specifying an empty string as the encoding selects the system encoding (see the [transcode system] command).
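As an illustration, the following sketch collects the three forms of a given encoding name in a dictionary. It assumes it is run from the Tcl shell in Alpha, where [transcode] is available. The procedure name encodingNameForms is hypothetical, and it records n/a (the convention used in the Encoding names table) whenever [transcode alias] raises an error.

```tcl
# Hypothetical helper: return a dict mapping each naming system
# (-standard, -pretty, -IANA) to the corresponding form of $encoding.
# Assumes the [transcode] command, which is available in Alpha.
proc encodingNameForms {encoding} {
    set forms [dict create]
    foreach option {-standard -pretty -IANA} {
        # [transcode alias] raises an error when the encoding is unknown
        # or has no name in the requested system: record "n/a" instead.
        if {[catch {transcode alias $option $encoding} name]} {
            set name "n/a"
        }
        dict set forms $option $name
    }
    return $forms
}

# Example (in Alpha); any accepted spelling of the name can be passed:
#   encodingNameForms "utf-8"
```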

The [from] subcommand

This command lets you convert data from the specified encoding to Unicode (UTF-16) encoding.
The syntax of this subcommand is:
    transcode from encoding data
This command has the same meaning as the Tcl command [encoding convertfrom], except that it correctly handles characters whose Unicode code point lies above the Basic Multilingual Plane (BMP). In particular, when converting from UTF-8, a valid 4-byte UTF-8 sequence produces a surrogate pair in UTF-16.
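To make the surrogate pair concrete, the following pure-Tcl sketch decodes one 4-byte UTF-8 sequence by hand and splits the resulting code point into a UTF-16 high and low surrogate. It only illustrates the arithmetic and does not call [transcode]; the procedure name utf8ToSurrogates is hypothetical, and validation is kept minimal.

```tcl
# Hypothetical helper: decode one 4-byte UTF-8 sequence (a list of four
# byte values) and return {codePoint highSurrogate lowSurrogate}.
proc utf8ToSurrogates {bytes} {
    lassign $bytes b0 b1 b2 b3
    # A 4-byte sequence starts with 11110xxx, followed by three 10xxxxxx bytes.
    if {($b0 & 0xF8) != 0xF0} {
        error "not the lead byte of a 4-byte UTF-8 sequence"
    }
    foreach b [list $b1 $b2 $b3] {
        if {($b & 0xC0) != 0x80} { error "invalid continuation byte" }
    }
    set cp [expr {(($b0 & 0x07) << 18) | (($b1 & 0x3F) << 12)
                  | (($b2 & 0x3F) << 6) | ($b3 & 0x3F)}]
    if {$cp < 0x10000 || $cp > 0x10FFFF} { error "code point out of range" }
    # Subtract 0x10000 and split the remaining 20 bits into two 10-bit halves.
    set v [expr {$cp - 0x10000}]
    set high [expr {0xD800 | ($v >> 10)}]
    set low  [expr {0xDC00 | ($v & 0x3FF)}]
    return [list $cp $high $low]
}

# U+1F600 is encoded in UTF-8 as F0 9F 98 80.
puts [format "U+%X -> %04X %04X" {*}[utf8ToSurrogates {0xF0 0x9F 0x98 0x80}]]
# prints: U+1F600 -> D83D DE00
```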
The characters of the data argument are treated as binary data: the lower 8 bits of each character are taken as a single byte, and the resulting sequence of bytes is interpreted as a string in the specified encoding.
If the encoding argument is empty, the system encoding is used (see the [transcode system] command).
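The rule that only the lower 8 bits of each character are kept can be mimicked in plain Tcl. The sketch below turns an arbitrary string into the list of byte values that such a conversion would see. It is an illustration of the rule only and does not call [transcode]; the procedure name lowBytes is hypothetical.

```tcl
# Hypothetical helper: return the byte values obtained by keeping only
# the lower 8 bits of each character of $data.
proc lowBytes {data} {
    set result {}
    foreach ch [split $data ""] {
        scan $ch %c code
        lappend result [expr {$code & 0xFF}]
    }
    return $result
}

# "A" (U+0041) gives 65, and so does U+0141, whose high byte is dropped.
puts [lowBytes "A\u0141\xE9"]   ;# prints: 65 65 233
```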

The [names] subcommand

This command lets you list the available encodings.
The syntax of this subcommand is:
    transcode names ?(-pretty|-IANA|-standard)?
With no option, or with the -standard option, this command returns a list of the standard names of the available encodings. With the -pretty option, the names are returned in a prettified form more suitable for display. With the -IANA option, the names are returned as defined by the IANA: see the IANA character-sets page.
Important: not all of the supported encodings have an IANA name. Consequently, when the -IANA option is specified, some encodings may be missing from the returned list (see the table in the Encoding names section). Use the [transcode alias] command to convert encoding names between the different naming systems.
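The following sketch lists the standard names that have no IANA name, by trying [transcode alias -IANA] on each name returned by [transcode names -standard]. It assumes it is run from the Tcl shell in Alpha; the procedure name encodingsWithoutIanaName is hypothetical.

```tcl
# Hypothetical helper: return the standard names of the encodings for
# which [transcode alias -IANA] fails, i.e. those without an IANA name.
# Assumes the [transcode] command, which is available in Alpha.
proc encodingsWithoutIanaName {} {
    set missing {}
    foreach name [transcode names -standard] {
        if {[catch {transcode alias -IANA $name}]} {
            lappend missing $name
        }
    }
    return $missing
}

# Example (in Alpha):
#   puts [encodingsWithoutIanaName]
```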

The [to] subcommand

This command lets you convert a string in Unicode (UTF-16) encoding to the specified encoding. The syntax is:
    transcode to ?-external? ?-subst str? encoding string
This command has the same meaning as the Tcl command [encoding convertto], except that it handles surrogate pairs correctly. In particular, when converting to UTF-8, a valid surrogate pair produces a 4-byte sequence.
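Conversely, the following pure-Tcl sketch shows how a high and a low surrogate combine into a single code point, and how that code point is then written as a 4-byte UTF-8 sequence. It illustrates the arithmetic only and does not call [transcode]; the procedure name surrogatesToUtf8 is hypothetical.

```tcl
# Hypothetical helper: combine a UTF-16 surrogate pair into a code point
# and return the four UTF-8 byte values that encode it.
proc surrogatesToUtf8 {high low} {
    if {$high < 0xD800 || $high > 0xDBFF || $low < 0xDC00 || $low > 0xDFFF} {
        error "not a valid surrogate pair"
    }
    # Each surrogate carries 10 bits of (code point - 0x10000).
    set cp [expr {0x10000 + (($high - 0xD800) << 10) + ($low - 0xDC00)}]
    return [list \
        [expr {0xF0 | ($cp >> 18)}] \
        [expr {0x80 | (($cp >> 12) & 0x3F)}] \
        [expr {0x80 | (($cp >> 6) & 0x3F)}] \
        [expr {0x80 | ($cp & 0x3F)}]]
}

# The pair D83D DE00 (U+1F600) becomes F0 9F 98 80.
puts [format "%02X %02X %02X %02X" {*}[surrogatesToUtf8 0xD83D 0xDE00]]
# prints: F0 9F 98 80
```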
The result is a sequence of bytes representing the converted string. If the encoding argument is empty, the system encoding is used (see the [transcode system] command). The following options can be specified:
-external: this option is only useful with UTF-16. It inserts a BOM (Byte Order Mark, \uFEFF) at the beginning of the output.
-subst: this option lets you specify the character used as a substitution when a Unicode character has no equivalent in the target charset. By default, the substitution character is the question mark (\x3F). The character is assumed to be already in the target encoding: no conversion is performed on it.
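The effect of these two options can be approximated in plain Tcl, without [transcode]. The hypothetical procedure toLatin1WithSubst encodes a string into ISO-8859-1 by hand, writing a substitution string, taken as-is, for each character outside that charset. The hypothetical procedure utf16BomOrder reports which byte order a leading BOM announces in a UTF-16 byte string. Both are illustrations under these assumptions, not the command's implementation.

```tcl
# Hypothetical helper: encode $string as ISO-8859-1 bytes, writing $subst
# (inserted as-is, without conversion) for every character above U+00FF.
proc toLatin1WithSubst {string {subst "?"}} {
    set out ""
    foreach ch [split $string ""] {
        scan $ch %c code
        if {$code <= 0xFF} {
            append out $ch
        } else {
            append out $subst
        }
    }
    return $out
}

# Hypothetical helper: report the byte order announced by a leading BOM.
proc utf16BomOrder {bytes} {
    set head [string range $bytes 0 1]
    if {$head eq "\xFE\xFF"} { return big-endian }
    if {$head eq "\xFF\xFE"} { return little-endian }
    return none
}

# In Alpha, the corresponding calls would look like:
#   transcode to -subst * ISOLatin1 $string
#   transcode to -external UTF16 $string
puts [toLatin1WithSubst "10\u20AC" *]          ;# prints: 10*
puts [utf16BomOrder [binary format S 0xFEFF]]  ;# prints: big-endian
```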

The [system] subcommand

This command returns the default encoding for the system.
The syntax is:
    transcode system 
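The following sketch displays the system encoding together with its prettified name. It assumes it is run from the Tcl shell in Alpha; the procedure name systemEncodingLabel is hypothetical. Since the form of the name returned by [transcode system] is not specified here, the code simply passes it on to [transcode alias], which accepts any form.

```tcl
# Hypothetical helper: describe the system encoding, adding its pretty
# name when [transcode alias] knows one. Assumes Alpha's [transcode].
proc systemEncodingLabel {} {
    set name [transcode system]
    if {[catch {transcode alias -pretty $name} pretty]} {
        return $name
    }
    return "$name: $pretty"
}

# Because an empty encoding argument means "the system encoding",
# these two calls are expected to produce the same bytes in Alpha:
#   transcode to "" $text
#   transcode to [transcode system] $text
```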

Encoding names

Standard encoding names are case insensitive and any dashes, underscores or spaces are ignored. Thus the strings "UTF-8", "utf_8", "u*T@f8" and "Utf 8" are equivalent.
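The matching rule for standard names can be sketched as a normalization step: lowercase the name, drop dashes, underscores and spaces, then compare. The procedures normalizeEncodingName and sameEncodingName are hypothetical illustrations of the rule as stated, not the code [transcode] actually uses.

```tcl
# Hypothetical helper: normalize an encoding name by ignoring case and
# removing dashes, underscores and spaces.
proc normalizeEncodingName {name} {
    return [string map {- "" _ "" " " ""} [string tolower $name]]
}

# Two names designate the same encoding if their normalized forms match.
proc sameEncodingName {a b} {
    expr {[normalizeEncodingName $a] eq [normalizeEncodingName $b]}
}

foreach n {UTF-8 utf_8 "Utf 8" UTF8} {
    puts "$n -> [normalizeEncodingName $n]"   ;# each line ends with "-> utf8"
}
```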
The following table gives the various forms of encoding names. Not all of the encodings have an IANA name; missing names are marked as n/a. To learn more about IANA names, see the IANA character-sets page.
The [transcode alias] subcommand converts between these representations.
    Standard names      Pretty names                        IANA names
    --------------      ------------                        ----------
    MacRoman            Western (Mac OS Roman)              MACINTOSH
    MacJapanese         Japanese (Mac OS)                   X-MAC-JAPANESE
    MacChineseTrad      Traditional Chinese (Mac OS)        X-MAC-TRAD-CHINESE
    MacKorean           Korean (Mac OS)                     X-MAC-KOREAN
    MacArabic           Arabic (Mac OS)                     X-MAC-ARABIC
    MacHebrew           Hebrew (Mac OS)                     X-MAC-HEBREW
    MacGreek            Greek (Mac OS)                      X-MAC-GREEK
    MacCyrillic         Cyrillic (Mac OS)                   X-MAC-CYRILLIC
    MacDevanagari       Devanagari (Mac OS)                 X-MAC-DEVANAGARI
    MacGurmukhi         Gurmukhi (Mac OS)                   X-MAC-GURMUKHI
    MacGujarati         Gujarati (Mac OS)                   X-MAC-GUJARATI
    MacThai             Thai (Mac OS)                       X-MAC-THAI
    MacChineseSimp      Simplified Chinese (Mac OS)         X-MAC-SIMP-CHINESE
    MacTibetan          Tibetan (Mac OS)                    X-MAC-TIBETAN
    MacCentralEurRoman  Central European (Mac OS)           X-MAC-CENTRALEURROMAN
    MacSymbol           Symbol (Mac OS)                     X-MAC-SYMBOL
    MacDingbats         Dingbats (Mac OS)                   X-MAC-DINGBATS
    MacTurkish          Turkish (Mac OS)                    X-MAC-TURKISH
    MacCroatian         Croatian (Mac OS)                   X-MAC-CROATIAN
    MacIcelandic        Icelandic (Mac OS)                  X-MAC-ICELANDIC
    MacRomanian         Romanian (Mac OS)                   X-MAC-ROMANIAN
    MacCeltic           Celtic (Mac OS)                     n/a
    MacGaelic           Gaelic (Mac OS)                     n/a
    MacFarsi            Farsi (Mac OS)                      X-MAC-FARSI
    MacUkrainian        Cyrillic (Mac OS Ukrainian)         X-MAC-UKRAINIAN
    MacInuit            Inuit (Mac OS)                      n/a
    MacHFS              Western (Mac OS Roman)              macintosh
    ISOLatin1           Western (ISO Latin 1)               ISO-8859-1
    ISOLatin2           Central European (ISO Latin 2)      ISO-8859-2
    ISOLatin3           Western (ISO Latin 3)               ISO-8859-3
    ISOLatin4           Central European (ISO Latin 4)      ISO-8859-4
    ISOLatinCyrillic    Cyrillic (ISO 8859-5)               ISO-8859-5
    ISOLatinArabic      Arabic (ISO 8859-6)                 ISO-8859-6
    ISOLatinGreek       Greek (ISO 8859-7)                  ISO-8859-7
    ISOLatinHebrew      Hebrew (ISO 8859-8)                 ISO-8859-8
    ISOLatin5           Turkish (ISO Latin 5)               ISO-8859-9
    ISOLatin6           Nordic (ISO Latin 6)                ISO-8859-10
    ISOLatinThai        Thai (ISO 8859-11)                  ISO-8859-11
    ISOLatin7           Baltic Rim (ISO Latin 7)            ISO-8859-13
    ISOLatin8           Celtic (ISO Latin 8)                ISO-8859-14
    ISOLatin9           Western (ISO Latin 9)               ISO-8859-15
    ISOLatin10          Romanian (ISO Latin 10)             ISO-8859-16
    DOSLatinUS          Latin-US (DOS)                      CP437
    DOSGreek            Greek (DOS)                         CP737
    DOSBalticRim        Baltic Rim (DOS)                    CP775
    DOSLatin1           Western (DOS Latin 1)               CP850
    DOSGreek1           Greek (DOS Greek 1)                 n/a
    DOSLatin2           Central European (DOS Latin 2)      CP852
    DOSCyrillic         Cyrillic (DOS)                      n/a
    DOSTurkish          Turkish (DOS)                       CP857
    DOSPortuguese       Portuguese (DOS)                    n/a
    DOSIcelandic        Icelandic (DOS)                     CP861
    DOSHebrew           Hebrew (DOS)                        DOS-862
    DOSCanadianFrench   Canadian French (DOS)               n/a
    DOSArabic           Arabic (DOS)                        cp864
    DOSNordic           Nordic (DOS)                        n/a
    DOSRussian          Russian (DOS)                       CP866
    DOSGreek2           Greek (DOS Greek 2)                 IBM869
    DOSThai             Thai (Windows, DOS)                 CP874
    DOSJapanese         Japanese (Windows, DOS)             CP932
    DOSChineseSimplif   Simplified Chinese (Windows, DOS)   CP936
    DOSKorean           Korean (Windows, DOS)               CP949
    DOSChineseTrad      Traditional Chinese (Windows, DOS)  CP950
    WindowsLatin1       Western (Windows Latin 1)           WINDOWS-1252
    WindowsLatin2       Central European (Windows Latin 2)  WINDOWS-1250
    WindowsCyrillic     Cyrillic (Windows)                  WINDOWS-1251
    WindowsGreek        Greek (Windows)                     WINDOWS-1253
    WindowsLatin5       Turkish (Windows Latin 5)           WINDOWS-1254
    WindowsHebrew       Hebrew (Windows)                    WINDOWS-1255
    WindowsArabic       Arabic (Windows)                    WINDOWS-1256
    WindowsBalticRim    Baltic Rim (Windows)                WINDOWS-1257
    WindowsVietnamese   Vietnamese (Windows)                WINDOWS-1258
    ShiftJIS_X0213      Japanese (Shift JIS X0213)          Shift_JIS
    GBK_95              Chinese (GBK)                       GBK
    GB_18030_2000       Chinese (GB 18030)                  GB18030
    ISO_2022_JP         Japanese (ISO 2022-JP)              ISO-2022-JP
    ISO_2022_KR         Korean (ISO 2022-KR)                ISO-2022-KR
    EUC_CN              Simplified Chinese (EUC)            GB2312
    EUC_TW              Traditional Chinese (EUC)           EUC-TW
    ShiftJIS            Japanese (Shift JIS)                SHIFT_JIS
    KOI8_R              Cyrillic (KOI8-R)                   KOI8-R
    Big5                Traditional Chinese (Big 5)         BIG5
    MacRomanLatin1      Western (Mac Mail)                  X-MAC-ROMAN-LATIN1
    HZ_GB_2312          Simplified Chinese (HZ GB 2312)     HZ-GB-2312
    Big5_HKSCS_1999     Traditional Chinese (Big 5 HKSCS)   Big5-HKSCS
    KOI8_U              Ukrainian (KOI8-U)                  KOI8-U
    Big5_E              Traditional Chinese (Big 5-E)       n/a
    NextStepLatin       Western (NextStep)                  X-NEXTSTEP
    EBCDIC_CP037        Western (EBCDIC Latin 1)            IBM037
    UTF8                Unicode (UTF-8)                     UTF-8
    UTF16               Unicode (UTF-16)                    UTF-16
    UTF16BE             Unicode (UTF-16BE)                  UTF-16BE
    UTF16LE             Unicode (UTF-16LE)                  UTF-16LE
    UTF32               Unicode (UTF-32)                    UTF-32
    UTF32BE             Unicode (UTF-32BE)                  UTF-32BE
    UTF32LE             Unicode (UTF-32LE)                  UTF-32LE
    NonLossyASCII       Non-lossy ASCII                     n/a


Basic examples

The following commands can be executed from the Tcl shell in Alpha:
    transcode names
    transcode names -pretty
    transcode names -IANA
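A conversion round trip can also be tried from the same shell. The sketch below converts a string with [transcode to] and back with [transcode from], and reports whether the original string was recovered. The procedure name roundTrips is hypothetical. A round trip is expected to fail when some characters had no equivalent in the target encoding and were replaced by the substitution character.

```tcl
# Hypothetical helper: convert $text to $encoding and back, and return 1
# if the original string was recovered, 0 otherwise.
# Assumes the [transcode] command, which is available in Alpha.
proc roundTrips {encoding text} {
    set bytes [transcode to $encoding $text]
    set back  [transcode from $encoding $bytes]
    return [expr {$back eq $text}]
}

# Examples (in Alpha):
#   roundTrips UTF8 "caf\xE9"       ;# UTF-8 can represent every character
#   roundTrips ISOLatin1 "\u20AC"   ;# the euro sign is not in ISO Latin 1
```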