FN-FORUM: Japanese encoding/charsets in RTF files
date posted 20th March 2006 13:50
Hi
I have some code that tracks hits to websites and picks up search terms
from search engines, etc. One of our clients gets a lot of hits from
Japan. This is not a problem for display in the browser, as I set the
correct content-type and urldecode the search term in PHP, and all is
well. However, one of the features that our client likes is the ability
to receive monthly reports via email in RTF format.
The problem is that the RTF encoding doesn't come out correctly.
The search terms come in from the search engine looking like this:
%E3%83%AA%E3%83%90%E3%83%97%E3%83%BC%E3%83%AB%E5%A4%A7%E5%AD%A6
because the Japanese characters have been URL-encoded.
So we unencode the string and get something like:
ãƒªãƒãƒ—ãƒ¼ãƒ«å¤§å­¦
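To make the symptom concrete, here is a minimal sketch (in Python rather than our PHP, purely for illustration) of what seems to be happening: the percent-decoded bytes are valid UTF-8, and the garbage appears only when those bytes are read as a single-byte charset. Windows-1252 as the fallback charset is an assumption on my part.

```python
from urllib.parse import unquote_to_bytes

raw = "%E3%83%AA%E3%83%90%E3%83%97%E3%83%BC%E3%83%AB%E5%A4%A7%E5%AD%A6"

# Percent-decoding yields raw bytes; nothing in them says which charset they use.
data = unquote_to_bytes(raw)

# Read as UTF-8, the bytes are the intended Japanese text.
as_utf8 = data.decode("utf-8")

# Read as Windows-1252 (assumed fallback), every UTF-8 byte turns into its own
# Latin character. errors="replace" is needed because byte 0x90 is unassigned
# in Python's cp1252 codec.
as_cp1252 = data.decode("cp1252", errors="replace")

print(as_utf8)
print(as_cp1252)
```

So the decoded string itself is fine; whatever displays it has to be told, or has to be able to understand, that it is UTF-8.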
To get around this in the browser we set the charset to UTF-8 and all is
well, but in RTF documents you are limited to the charsets listed below.
I have tried a number of them, but nothing translates the characters
correctly; instead the text displays as Korean, which is wrong!
I've tried:
ANSI
Default
And the one that should work, Shift-JIS, just converts the text to
Korean.
/*
ANSI = 0
Default = 1
Symbol = 2
Invalid = 3
Mac = 77
Shift Jis = 128
Hangul = 129
Johab = 130
GB2312 = 134
Big5 = 136
Greek = 161
Turkish = 162
Vietnamese = 163
Hebrew = 177
Arabic = 178
Arabic Traditional = 179
Arabic user = 180
Hebrew user = 181
Baltic = 186
Russian = 204
Thai = 222
Eastern European = 238
PC 437 = 254
OEM = 255
*/
So does anyone know of an effective way to render foreign characters
correctly in RTF documents, or can anyone point me in the right
direction?
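One direction that may be worth testing, offered as an unverified sketch rather than a confirmed fix: the RTF specification defines a `\uN` control word that carries a Unicode code unit as a signed 16-bit decimal number, followed by a fallback character for readers that do not understand it. That would avoid picking a charset number at all. The helper name `rtf_escape` is hypothetical, the sketch is in Python rather than PHP, and whether a given RTF reader then picks a font with Japanese glyphs is an assumption I have not checked.

```python
def rtf_escape(text: str) -> str:
    """Escape text for an RTF body, using \\uN for non-ASCII characters."""
    out = []
    for ch in text:
        if ch in "\\{}":
            # RTF's own syntax characters must be backslash-escaped.
            out.append("\\" + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # Emit one \uN per UTF-16 code unit, so characters outside the
            # Basic Multilingual Plane become a surrogate pair.
            units = ch.encode("utf-16-le")
            for i in range(0, len(units), 2):
                unit = int.from_bytes(units[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536  # \uN takes a signed 16-bit value
                out.append(f"\\u{unit}?")  # "?" is the fallback character
    return "".join(out)


term = "リバプール大学"
# \uc1 declares that exactly one fallback character follows each \uN.
doc = "{\\rtf1\\ansi\\uc1 " + rtf_escape(term) + "}"
print(doc)
```

In the report generator this would be applied to the search term after urldecoding it as UTF-8, and before writing it into the RTF body.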
Many thanks and kind regards
Ash