Typography List Archive
Message: Re: FW: Localising Unicode character names

(Alain LaBonté) - Wednesday, 9 December 1998

Subject:    Re: FW: Localising Unicode character names
Date:    Wed, 09 Dec 1998 15:50:12 -0500
From:    Alain LaBonté  <alb@xxxxxxxxxxxxxx>

At 14:03 98-11-27 -0000, Breen McInerney wrote:
>Alain,
>
>Murray gave me your name as a contact and someone who may be able to help
>with the query below.
>
>Thanks.
>Breen
>
>> -----Original Message-----
>> From:	Murray Sargent 
>> Sent:	Wednesday, November 25, 1998 9:00 PM
>> To:	Breen McInerney
>> Subject:	RE: Localising Unicode symbol descriptions ...
>> 
>> I don't believe anyone on the Unicode Technical Committee is interested in
>> localised Unicode names.  But Alain LaBonté (Alain [alb@xxxxxxxxxxxxxx])
>> has dealt with the issue for French, and might have some answers for your
>> questions.
>> 
>> Thanks
>> Murray
>> 
>> -----Original Message-----
>> From:	Breen McInerney 
>> Sent:	Wednesday, November 25, 1998 3:08 AM
>> To:	Murray Sargent
>> Subject:	Localising Unicode symbol descriptions ...
>> 
>> Hi Murray,
>> 
>> I have seen your name on various Unicode mailing lists and hope you know
>> someone who may be able to advise on the question below. I am the program
>> manager working on the localisation of Spanish Windows 2000. In the
>> product there are 6543 entries which describe the various Unicode symbols;
>> the strings can be seen in charmap. This industry standard is not translated
>> officially into Spanish although many of the symbols are found in
>> dictionaries and reference books. 
>> 
>> Based on the availability of that terminology, it was decided to localise
>> this subset of the Unicode range: C0 Controls and Basic Latin; C1 Controls
>> and Latin-1 Supplement, Latin Extended-A, Latin Extended-B, IPA
>> Extensions, Spacing Modifier Letters, Combining Diacritical Marks, Greek,
>> Cyrillic, Hebrew, Arabic, Latin Extended Additional, Greek Extended,
>> General Punctuation, Superscripts and Subscripts, Currency Symbols,
>> Combining Diacritical Marks for Symbols, Letterlike Symbols, Number Forms,
>> Arrows, Mathematical Operators, Miscellaneous Technical, Control Pictures,
>> Optical Character Recognition, Enclosed Alphanumerics, Box Drawing, Block
>> Elements, Geometric Shapes, Miscellaneous Symbols, Dingbats, Alphabetic
>> Presentation Forms, Arabic Presentation Forms-A, Combining Half Marks,
>> Small Form Variants, Arabic Presentation Forms-B, Halfwidth and Fullwidth
>> Forms, Specials
>> 
>> For practical reasons it was also decided not to translate the Asian and
>> some Middle East(?) ones: Armenian, Bengali, Bopomofo, Circled Katakana,
>> Coptic, Devanagari, Georgian, Gurmukhi, Gujarati, Halfwidth Hangul,
>> Halfwidth Katakana, Hangul, Hiragana, Kannada, Katakana, Lao, Malayalam,
>> Oriya, Tamil, Telugu, Thai, and Tibetan.
>> 
>> On the part that has been translated, localisers/IQA are finding it
>> difficult to assign the best translation and often are not sure if they
>> have the correct one.
>> Quote from IQA (Internal Language Quality Assurance) "After reviewing the
>> symbols in the Character Map myself, I found quite a few changes, I was
>> using a mixture of my knowledge of phonetic symbols, ancient Greek
>> alphabet, Latin metrical, and quite a lot of books on printing, but still
>> I couldn't manage to solve Arabic, Cyrillic, and other alphabet symbols."
>> 
>> The French team have been able to reference a previously translated
>> standard for the French language which was a big help. Nothing equivalent
>> seems to exist for Spanish.
>> By localising for NT5 we are creating our own Microsoft standard, which may
>> not agree with the work of others who may have done this before for Spanish.
>> Do you have any contacts in the Unicode organisation who may be able to
>> advise on the above, who know whether a standard already exists, or who
>> would be interested in reviewing the localisation done already?
>> 
>> Any help/advice much appreciated.
>> Breen

[Alain]
This topic is important to us. You have probably already looked at the
web site:
http://babel.alis.com:8080/codage/iso10646/index.html

which lists the French character names of the 1993 edition of the UCS.
Since then we have updated the list up to amendment 5 of the UCS (Unicode 2
is at the amendment 7 level, if I remember correctly), and we were hoping to
publish the French version of ISO/IEC 10646 in synchronization with the
next English version, to be "crystallized" next March after
twenty-something amendments. Some people in AFNOR, however, have
inappropriately put on the brakes: since CEN had refused to adopt the UCS
as a European standard (they are making subsets using UCS coding), they
took this to mean that the UCS was rejected in Europe. That was a cold
shower, hopefully only a temporary one (lasting until they understand that
the UCS is indeed wanted in Europe), because in the opinion of many this
work is very important for users, and your project gives an idea of how it
could also be commercially important. So if AFNOR is not fast enough, we
intend to do it in Québec anyway (the majority of the names are already on
the web).

Story:

In 1995, Canada and France (backed by Ireland) campaigned to make sure
that [English] names would not be *the* machine identifiers, since ISO
standards can be published in English, French and Russian. We succeeded in
having the idea adopted that only numerical UCS ids (of the form
"U[xyyy]zzzz") are the mandatory *anchors* for matching character names and
codes across different coding standards, across versions of coding
standards in different languages, and across different private character
sets.

That said, ISO/IEC 14755 (Input methods to enter UCS characters with the
help of any keyboard) recommends that, whenever character names need to be
presented to users (which is itself highly recommended as feedback to
end-users, in particular for different characters displayed with the same
shape), they be presented in the user's language. For French, this is
already possible. I understand that Sweden has also developed a limited
list of Swedish UCS character names, and that Ireland has done the same in
Gaelic.

It would be interesting to have a multilingual database of those names.
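
As a minimal sketch of what such a database could look like, the Python
fragment below keys every entry on the numeric UCS id and treats the names
in each language as attributes of that id. Everything in it is an
assumption made for illustration: the NAMES table, the language tags, the
helpers ucs_id() and name_for(), the fallback order, and the reading of
"U[xyyy]zzzz" as "U" followed by four or eight hexadecimal digits. The
sample French names are illustrative and have not been checked against any
published list.

```python
import unicodedata

# Hypothetical sample data: code point -> {language tag -> character name}.
# The French strings are illustrative only, not taken from a published list.
NAMES = {
    0x00E9: {
        "en": "LATIN SMALL LETTER E WITH ACUTE",
        "fr": "LETTRE MINUSCULE LATINE E ACCENT AIGU",
    },
    0x20AC: {
        "en": "EURO SIGN",
        "fr": "SYMBOLE EURO",
    },
}


def ucs_id(code_point: int) -> str:
    """Format a code point as a numeric id: 'U' plus 4 or 8 hex digits.

    This is one possible reading of the "U[xyyy]zzzz" form, assumed here.
    """
    width = 4 if code_point <= 0xFFFF else 8
    return "U" + format(code_point, f"0{width}X")


def name_for(code_point: int, lang: str, fallback: str = "en") -> str:
    """Return the name in the requested language, anchored on the code point.

    Falls back to the fallback language, then to Python's built-in English
    Unicode name, and finally to the bare numeric id.
    """
    entry = NAMES.get(code_point, {})
    if lang in entry:
        return entry[lang]
    if fallback in entry:
        return entry[fallback]
    return unicodedata.name(chr(code_point), ucs_id(code_point))


if __name__ == "__main__":
    for cp in (0x00E9, 0x20AC, 0x0041):
        print(ucs_id(cp), name_for(cp, "fr"), sep="\t")
```

Because the lookup is anchored on the number rather than on the English
name, adding a Spanish, Swedish or Gaelic column would only mean adding
another language tag to each entry; no existing name needs to change.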

So far, as I said, the full list of French names is pretty well
established, although still unpublished as an international standard (in
practice, though, many users on the web refer to it). The names are also
used normatively in recent French versions of the ISO/IEC 8859 series (in
particular parts 14 and 15, which are about to be published as
international standards; part 15 is similar to Latin 1 except for 8
characters, including the EURO SIGN).

I have to slightly correct one piece of information given to you by Murray,
to the effect that no UTC members are interested in localized character
names: Michel Suignard (Microsoft Redmond), whom you might know, is very
active in ISO/IEC JTC1 SC2 and is a member of UTC. He pioneered the first
list of French character names of the UCS when he was working for Microsoft
in France, and did so on a volunteer basis. This may not be widely known
in UTC, but in fairness to him I have to set the record straight, as he has
contributed considerably to this dossier.

If I can be of any help by providing data files for what we have that is
already public, and if you can help improve this list by any means, we
would certainly be very pleased to collaborate with you [provided that no
fee-bearing copyright is placed on any name]. We are also very interested
in a multilingual list of character names, although outside of French,
Swedish and Gaelic the work has not begun, to my knowledge. Don't hesitate
to contact me again if I have left out any details you need for your
request.

With my best regards,

Alain LaBonté
Québec

cc interested parties (hoping that they will agree with this goodwill
                       offer of collaboration)