The [transcode] command.

Introduction

The transcode command performs encoding conversions. It supports more than 100 different encodings and can convert from and to each of them. It is built on the Core Foundation API of Mac OS X. Since the Core Foundation framework uses Unicode (UTF-16) internally, all converters convert between UTF-16 (with the endianness of the current platform) and another encoding. For more information about Core Foundation, see the Core Foundation Design Concepts.
A particular encoding can have different names depending on the platform or the environment where it is used. For instance, the UTF-8 encoding is sometimes written Utf8, Utf-8 or UTF8. The [transcode] command supports all these variants. It also supports prettified names and IANA (Internet Assigned Numbers Authority) names, as explained under the [transcode alias] subcommand. See also the table in the Encoding names section below.

Synopsis

The formal syntax of the [transcode] command is:

    transcode subcommand ?options?

The possible subcommands are described below. Depending on the subcommand, different arguments and options can be specified.

The [alias] subcommand

This command returns the different forms of an encoding name. Encodings can be designated by a standard name, a prettified name or a canonical name as defined by the IANA (Internet Assigned Numbers Authority). Encoding names are case insensitive.
The syntax of this subcommand is:

    transcode alias (-pretty|-IANA|-standard) encoding

The encoding argument can be specified in any form. The command will return the form specified by the option: -pretty, -IANA, or -standard. If the encoding is not supported or does not have a name in the specified system, the command raises an error.
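
As an illustration, the following lines ask for the three name forms of one encoding and guard against the error case. The values noted in the comments are read from the table in the Encoding names section, not from an actual run, so treat them as expectations.

```tcl
# Ask for each form of the same encoding; the argument may be in any form.
transcode alias -standard "utf-8"   ;# the table lists UTF8
transcode alias -pretty utf8        ;# the table lists Unicode (UTF-8)
transcode alias -IANA utf8          ;# the table lists UTF-8

# MacCeltic has no IANA name in the table, so an error is expected here.
if {[catch {transcode alias -IANA MacCeltic} res]} {
    puts "no IANA name available: $res"
}
```

Wrapping the call in [catch] is a reasonable habit whenever the encoding name comes from user input or when the -IANA form is requested.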

The [from] subcommand

This command lets you convert data from the specified encoding to Unicode (UTF-16) encoding.
The syntax of this subcommand is:

    transcode from encoding data

This command has the same meaning as the Tcl command [encoding convertfrom]. The difference is that it correctly handles characters whose Unicode code point lies outside the BMP (Basic Multilingual Plane). In particular, in a conversion from UTF-8, a valid 4-byte UTF-8 sequence produces a pair of surrogates in UTF-16.
The characters in the data argument are treated as binary data: the lower 8 bits of each character are taken as a single byte. The resulting sequence of bytes is interpreted as a string in the specified encoding.
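
The sketch below feeds a 4-byte UTF-8 sequence to the command and prints the code units of the result. The character U+1F600 is chosen here only as an example; if the conversion behaves as described above, the loop should show the two surrogates of that character (D83D and DE00).

```tcl
# U+1F600 encoded in UTF-8 is the four bytes F0 9F 98 80.
set bytes "\xF0\x9F\x98\x80"
set str [transcode from utf8 $bytes]

# Print each code unit of the converted string in hexadecimal.
foreach ch [split $str ""] {
    puts [format "U+%04X" [scan $ch %c]]
}
```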

The [names] subcommand

This command lets you list the available encodings.
The syntax of this subcommand is:

    transcode names ?(-pretty|-IANA|-standard)?

With no option or with the -standard option, this command returns a list of standard names for the available encodings. If the -pretty option is specified, the encoding names are returned in a pretty form more suitable for display. If the -IANA option is specified, the encoding names are returned as defined by the IANA (Internet Assigned Numbers Authority): see the IANA character-sets page.
Important: not all of the supported encodings have an IANA name. As a consequence, if the -IANA option is specified, some encodings may be missing from the returned list. See the table in the Encoding names section. Use the [transcode alias] command to convert encoding names between the different systems.
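
Because some encodings lack an IANA name, one possible way to pair each standard name with its IANA name is to start from the standard list and catch the error raised by [transcode alias]. This is only a sketch; the n/a marker mirrors the convention of the table in the Encoding names section.

```tcl
# Build a list of {standard IANA} pairs, marking missing IANA names as n/a.
set pairs {}
foreach name [transcode names] {
    if {[catch {transcode alias -IANA $name} iana]} {
        set iana "n/a"
    }
    lappend pairs [list $name $iana]
}

# Print one tab-separated pair per line.
foreach pair $pairs {
    puts [join $pair "\t"]
}
```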

The [to] subcommand

This command lets you convert a string in Unicode (UTF-16) encoding to the specified encoding. The syntax is:

    transcode to ?-external? ?-subst str? encoding string

This command has the same meaning as the Tcl command [encoding convertto]. The difference is that it handles surrogate pairs correctly. In particular, when converting to UTF-8, a valid pair of surrogates produces a 4-byte sequence.
The result is a sequence of bytes representing the converted string.
The following options can be specified:

-external: the -external option is useful only for UTF-16. It inserts a BOM (Byte Order Mark, \uFEFF) at the beginning of the output string.
-subst: the -subst option lets you specify a substitution character, used whenever a Unicode character has no equivalent in the target charset. By default, the substitution character is the question mark (\x3F). The character is assumed to be already in the target encoding: no conversion is performed on it.
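
The options above can be tried as follows. The showHex procedure is a hypothetical helper written for this example (it is not part of Alpha), and the sample string is arbitrary. The sketch assumes that the returned byte sequence can be passed directly to [binary scan].

```tcl
# Hypothetical helper: show a byte string as hexadecimal digits.
proc showHex {bytes} {
    binary scan $bytes H* hex
    return $hex
}

# Sample text containing two characters that ASCII cannot represent.
set s "caf\u00e9 \u20ac"

# UTF-8 can represent every character of the string.
puts [showHex [transcode to utf8 $s]]

# Use an underscore instead of the default question mark as substitute.
puts [showHex [transcode to -subst _ ascii $s]]

# UTF-16 output with a leading BOM.
puts [showHex [transcode to -external utf16 $s]]
```

Printing the bytes in hexadecimal avoids any reinterpretation of the result by the shell when it displays the string.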

Encoding names

Standard encoding names are case insensitive, and any dashes, underscores or spaces are ignored; as the third example below shows, other punctuation characters are ignored as well. Thus the strings "UTF-8", "utf_8", "u*T@f8" and "Utf 8" are equivalent.
The following table gives the various forms of encoding names. Not all of the encodings have an IANA name. To learn more about the IANA names, see the IANA character-sets page. Missing names are marked as n/a.
One can use the [transcode alias] subcommand to convert between these representations.

Standard names	Pretty names	IANA names
MacRoman	Western (Mac OS Roman)	MACINTOSH
MacJapanese	Japanese (Mac OS)	X-MAC-JAPANESE
MacChineseTrad	Traditional Chinese (Mac OS)	X-MAC-TRAD-CHINESE
MacKorean	Korean (Mac OS)	X-MAC-KOREAN
MacArabic	Arabic (Mac OS)	X-MAC-ARABIC
MacHebrew	Hebrew (Mac OS)	X-MAC-HEBREW
MacGreek	Greek (Mac OS)	X-MAC-GREEK
MacCyrillic	Cyrillic (Mac OS)	X-MAC-CYRILLIC
MacDevanagari	Devanagari (Mac OS)	X-MAC-DEVANAGARI
MacGurmukhi	Gurmukhi (Mac OS)	X-MAC-GURMUKHI
MacGujarati	Gujarati (Mac OS)	X-MAC-GUJARATI
MacThai	Thai (Mac OS)	X-MAC-THAI
MacChineseSimp	Simplified Chinese (Mac OS)	X-MAC-SIMP-CHINESE
MacTibetan	Tibetan (Mac OS)	X-MAC-TIBETAN
MacCentralEurRoman	Central European (Mac OS)	X-MAC-CENTRALEURROMAN
MacSymbol	Symbol (Mac OS)	X-MAC-SYMBOL
MacDingbats	Dingbats (Mac OS)	X-MAC-DINGBATS
MacTurkish	Turkish (Mac OS)	X-MAC-TURKISH
MacCroatian	Croatian (Mac OS)	X-MAC-CROATIAN
MacIcelandic	Icelandic (Mac OS)	X-MAC-ICELANDIC
MacRomanian	Romanian (Mac OS)	X-MAC-ROMANIAN
MacCeltic	Celtic (Mac OS)	n/a
MacGaelic	Gaelic (Mac OS)	n/a
MacFarsi	Farsi (Mac OS)	X-MAC-FARSI
MacUkrainian	Cyrillic (Mac OS Ukrainian)	X-MAC-UKRAINIAN
MacInuit	Inuit (Mac OS)	n/a
MacHFS	Western (Mac OS Roman)	macintosh
ISOLatin1	Western (ISO Latin 1)	ISO-8859-1
ISOLatin2	Central European (ISO Latin 2)	ISO-8859-2
ISOLatin3	Western (ISO Latin 3)	ISO-8859-3
ISOLatin4	Central European (ISO Latin 4)	ISO-8859-4
ISOLatinCyrillic	Cyrillic (ISO 8859-5)	ISO-8859-5
ISOLatinArabic	Arabic (ISO 8859-6)	ISO-8859-6
ISOLatinGreek	Greek (ISO 8859-7)	ISO-8859-7
ISOLatinHebrew	Hebrew (ISO 8859-8)	ISO-8859-8
ISOLatin5	Turkish (ISO Latin 5)	ISO-8859-9
ISOLatin6	Nordic (ISO Latin 6)	ISO-8859-10
ISOLatinThai	Thai (ISO 8859-11)	ISO-8859-11
ISOLatin7	Baltic Rim (ISO Latin 7)	ISO-8859-13
ISOLatin8	Celtic (ISO Latin 8)	ISO-8859-14
ISOLatin9	Western (ISO Latin 9)	ISO-8859-15
ISOLatin10	Romanian (ISO Latin 10)	ISO-8859-16
DOSLatinUS	Latin-US (DOS)	CP437
DOSGreek	Greek (DOS)	CP737
DOSBalticRim	Baltic Rim (DOS)	CP775
DOSLatin1	Western (DOS Latin 1)	CP850
DOSGreek1	Greek (DOS Greek 1)	n/a
DOSLatin2	Central European (DOS Latin 2)	CP852
DOSCyrillic	Cyrillic (DOS)	n/a
DOSTurkish	Turkish (DOS)	CP857
DOSPortuguese	Portuguese (DOS)	n/a
DOSIcelandic	Icelandic (DOS)	CP861
DOSHebrew	Hebrew (DOS)	DOS-862
DOSCanadianFrench	Canadian French (DOS)	n/a
DOSArabic	Arabic (DOS)	cp864
DOSNordic	Nordic (DOS)	n/a
DOSRussian	Russian (DOS)	CP866
DOSGreek2	Greek (DOS Greek 2)	IBM869
DOSThai	Thai (Windows, DOS)	CP874
DOSJapanese	Japanese (Windows, DOS)	CP932
DOSChineseSimplif	Simplified Chinese (Windows, DOS)	CP936
DOSKorean	Korean (Windows, DOS)	CP949
DOSChineseTrad	Traditional Chinese (Windows, DOS)	CP950
WindowsLatin1	Western (Windows Latin 1)	WINDOWS-1252
WindowsLatin2	Central European (Windows Latin 2)	WINDOWS-1250
WindowsCyrillic	Cyrillic (Windows)	WINDOWS-1251
WindowsGreek	Greek (Windows)	WINDOWS-1253
WindowsLatin5	Turkish (Windows Latin 5)	WINDOWS-1254
WindowsHebrew	Hebrew (Windows)	WINDOWS-1255
WindowsArabic	Arabic (Windows)	WINDOWS-1256
WindowsBalticRim	Baltic Rim (Windows)	WINDOWS-1257
WindowsVietnamese	Vietnamese (Windows)	WINDOWS-1258
ASCII	Western (ASCII)	US-ASCII
ShiftJIS_X0213	Japanese (Shift JIS X0213)	Shift_JIS
GBK_95	Chinese (GBK)	GBK
GB_18030_2000	Chinese (GB 18030)	GB18030
ISO_2022_JP	Japanese (ISO 2022-JP)	ISO-2022-JP
ISO_2022_KR	Korean (ISO 2022-KR)	ISO-2022-KR
EUC_JP	Japanese (EUC)	EUC-JP
EUC_CN	Simplified Chinese (EUC)	GB2312
EUC_TW	Traditional Chinese (EUC)	EUC-TW
EUC_KR	Korean (EUC)	EUC-KR
ShiftJIS	Japanese (Shift JIS)	SHIFT_JIS
KOI8_R	Cyrillic (KOI8-R)	KOI8-R
Big5	Traditional Chinese (Big 5)	BIG5
MacRomanLatin1	Western (Mac Mail)	X-MAC-ROMAN-LATIN1
HZ_GB_2312	Simplified Chinese (HZ GB 2312)	HZ-GB-2312
Big5_HKSCS_1999	Traditional Chinese (Big 5 HKSCS)	Big5-HKSCS
KOI8_U	Ukrainian (KOI8-U)	KOI8-U
Big5_E	Traditional Chinese (Big 5-E)	n/a
NextStepLatin	Western (NextStep)	X-NEXTSTEP
EBCDIC_US	Western (EBCDIC US)	n/a
EBCDIC_CP037	Western (EBCDIC Latin 1)	IBM037
UTF8	Unicode (UTF-8)	UTF-8
UTF16	Unicode (UTF-16)	UTF-16
UTF16BE	Unicode (UTF-16BE)	UTF-16BE
UTF16LE	Unicode (UTF-16LE)	UTF-16LE
UTF32	Unicode (UTF-32)	UTF-32
UTF32BE	Unicode (UTF-32BE)	UTF-32BE
UTF32LE	Unicode (UTF-32LE)	UTF-32LE
NonLossyASCII	Non-lossy ASCII	n/a

The Tcl encoding names (see the [encoding] command) are also supported.
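
The matching rule for standard names stated at the start of this section can be approximated in plain Tcl. The procedure below is a hypothetical helper, not the code used by [transcode]: it assumes that every non-alphanumeric character is dropped and that case is folded before names are compared.

```tcl
# Hypothetical helper: reduce an encoding name to a comparison key by
# dropping every non-alphanumeric character and folding case.
proc normalizeEncodingName {name} {
    regsub -all {[^[:alnum:]]} $name "" key
    return [string tolower $key]
}

# The four spellings quoted above all reduce to the same key.
foreach name {"UTF-8" "utf_8" "u*T@f8" "Utf 8"} {
    puts [normalizeEncodingName $name]
}
```

Under that assumption, each of the four spellings yields the key utf8.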

Examples

Here are instructions which can be executed one by one in the Tcl Shell (⌘Y) in Alpha:

    transcode names
    transcode names -pretty
    transcode names -IANA
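
A round trip through [transcode to] and [transcode from] is another quick check that can be run in the same shell. The sample string is arbitrary; the last line should print 1 if the conversion to UTF-8 and back is lossless.

```tcl
# Convert a Unicode string to UTF-8 bytes, then back again.
set original "na\u00efve \u2192 test"
set bytes [transcode to utf8 $original]
set back  [transcode from utf8 $bytes]

# Compare the result with the original string.
puts [string equal $original $back]
```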