OK, I found a workaround, although it is not entirely logical, which makes me
suspect that something is wrong somewhere...
1) Extract the data from Oracle. getString returns a UTF-8 encoded string...
OK, DONE!
2) MySQL needs ISO-8859-1, so we convert the UTF-8 string to ISO-8859-1...
OK, DONE!
3) Insert the string into MySQL... OK, DONE!
4a) Now we extract the string from MySQL, which should be ISO-8859-1 and needs
to be converted to UTF-8... NO, this never works. Once you call getString on
the caucho MySQL driver, whatever it returns can never be converted to UTF-8;
it always produces garbage.
4b) So instead we call getBytes on the ResultSet and convert that to UTF-8...
OK, DONE!
Now both the original data and the MySQL output are in UTF-8 and display
correctly.
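A minimal Java sketch of steps 1 to 4b, assuming the driver writes each char of the ISO-8859-1 string as one byte and that getBytes hands those same bytes back. No real database is involved: a byte[] stands in for the MySQL column, and the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch of the workaround. A byte[] stands in for the MySQL column;
// the class and method names are hypothetical.
class LatinCarrierWorkaround {

    // Step 2: reinterpret the UTF-8 bytes of the Oracle string as
    // ISO-8859-1, giving one char (0-255) per byte.
    static String toLatin1Carrier(String fromOracle) {
        return new String(fromOracle.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
    }

    // Step 3, simulated: assumes the driver stores each char as one
    // ISO-8859-1 byte.
    static byte[] storeInColumn(String carrier) {
        return carrier.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Step 4b: decode the raw column bytes (what ResultSet.getBytes would
    // return under the assumption above) as UTF-8.
    static String fromColumnBytes(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Gr\u00fc\u00dfe, \u65e5\u672c\u8a9e";
        byte[] column = storeInColumn(toLatin1Carrier(original));

        System.out.println(fromColumnBytes(column).equals(original)); // true
        // The column ends up holding exactly the UTF-8 encoding of the text.
        System.out.println(Arrays.equals(column,
                original.getBytes(StandardCharsets.UTF_8)));          // true
    }
}
```

Under these assumptions the round trip is lossless because ISO-8859-1 maps each byte to exactly one char and back.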
The big question is what is going on in the caucho MySQL driver. When I call
getString, what is it returning, and why can that not be converted to UTF-8
when the getBytes output can?
Does anyone know whether some setting is needed to fix this in the MySQL
driver, or is this just how it is supposed to work?
I thought there might be an issue with the Java default encoding being
CP1252, but using CP1252 doesn't work at all; only ISO-8859-1 seems to work.
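One possible explanation for why ISO-8859-1 works as the carrier and CP1252 does not: ISO-8859-1 maps every byte value 0-255 to a character and back, while Java's windows-1252 decoder has no mapping for a few byte values such as 0x81, which then come back as '?' after re-encoding. Any UTF-8 sequence containing such a byte (for example U+00C1, encoded as 0xC3 0x81) is damaged. The sketch below checks this; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class: compares ISO-8859-1 and windows-1252 as
// "carrier" charsets for UTF-8 bytes.
class CarrierCharsetCheck {

    // UTF-8 bytes -> carrier string -> carrier bytes -> UTF-8 string.
    // Gives back the original text only if the carrier preserved its bytes.
    static String roundTrip(String text, Charset carrier) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        String carried = new String(utf8, carrier);
        return new String(carried.getBytes(carrier), StandardCharsets.UTF_8);
    }

    // True if all 256 byte values survive a decode/encode cycle in cs.
    static boolean preservesAllBytes(Charset cs) {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return Arrays.equals(all, new String(all, cs).getBytes(cs));
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        String text = "\u00c1"; // A with acute accent; UTF-8 bytes 0xC3 0x81

        System.out.println("ISO-8859-1 round trip ok: "
                + roundTrip(text, StandardCharsets.ISO_8859_1).equals(text));
        System.out.println("windows-1252 round trip ok: "
                + roundTrip(text, cp1252).equals(text));
        System.out.println("ISO-8859-1 keeps all bytes: "
                + preservesAllBytes(StandardCharsets.ISO_8859_1));
        System.out.println("windows-1252 keeps all bytes: "
                + preservesAllBytes(cp1252));
    }
}
```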
Rob
-----Original Message-----
From: owner-resin-interest@xxx.com
[mailto:owner-resin-interest@xxx.com]On Behalf Of Robert Edgar
Sent: Sunday, December 09, 2001 2:56 AM
To: resin-interest@xxx.com
Subject: RE: Urgent Charset/Encoding Problem
Sadly, I tried this and it doesn't work; it just outputs really screwed-up text.
-----Original Message-----
From: owner-resin-interest@xxx.com
[mailto:owner-resin-interest@xxx.com]On Behalf Of Manush Dodunekov
Sent: Sunday, December 09, 2001 2:28 AM
To: resin-interest@xxx.com
Subject: Re: Urgent Charset/Encoding Problem
On Sat, 8 Dec 2001, Robert Edgar wrote:
> We have had Resin set up and working for a long time, pulling UTF-8 data
> out of an Oracle DB.
>
> No problem at all: we just set <web-app character-encoding="UTF-8">, and in
> our servlets I use res.setContentType("text/html; charset=utf-8"); and
> everything works fine.
>
> OK, now we want to run a small MySQL DB on the web server, pull data from
> Oracle, then stuff it into MySQL and then display it on a web page...
>
> So:
> 1) I execute a query to extract data from Oracle
> 2) Output it to the page
> 3) Execute an update to MySQL, passing the string retrieved from Oracle
> 4) Execute a query to retrieve the string from MySQL
> 5) Display the string in the same page as above
>
> In theory I get two identical strings, but I don't...
>
> The problems arising are all related to encoding.
>
> 1) If I remove setContentType, both strings appear as identical garbage.
> 2) If I include setContentType, the string from Oracle is OK but the
> MySQL string is garbage.
> 3) If I remove setContentType and use
> new String(rs.getString("EVENT_DESCRIPTION").getBytes("UTF-8"), "ISO-8859-1")
> to save the string as ISO-8859-1 into MySQL, then the output of the MySQL
> string is OK and the Oracle string is garbage.
>
> To get both strings output correctly, I repeat 3 above, then after
> outputting the MySQL string I call setContentType and output the Oracle
> string. OK, then I have both strings displayed correctly.
>
> BUT clearly this is impractical...
>
> Can someone tell me how I should set this up so it will work logically and
> simply.
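One rough way to read the three cases above, assuming the servlet writer falls back to ISO-8859-1 when no charset is set (the servlet default) and uses UTF-8 after setContentType("text/html; charset=utf-8"), and assuming the converted string comes back from MySQL unchanged: the converted string written as ISO-8859-1 yields exactly the UTF-8 bytes of the original, while the same string written as UTF-8 gets encoded a second time. The sketch only compares byte arrays; it does not model what a particular browser displays, and the names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Compares the bytes a response writer would emit for each string.
// Assumption: ISO-8859-1 without setContentType, UTF-8 with it.
class ResponseBytes {

    // The case 3 conversion: one ISO-8859-1 char per UTF-8 byte.
    static String carrier(String s1) {
        return new String(s1.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
    }

    // Bytes emitted when s is written with the given response charset.
    static byte[] emitted(String s, Charset responseCharset) {
        return s.getBytes(responseCharset);
    }

    public static void main(String[] args) {
        String s1 = "caf\u00e9 \u65e5\u672c"; // as read from Oracle
        String s2 = carrier(s1);              // assumed to come back from MySQL unchanged
        byte[] wanted = s1.getBytes(StandardCharsets.UTF_8);

        // Oracle string, UTF-8 response: correct bytes.
        System.out.println(Arrays.equals(emitted(s1, StandardCharsets.UTF_8), wanted));      // true
        // Converted string, ISO-8859-1 response: the very same bytes.
        System.out.println(Arrays.equals(emitted(s2, StandardCharsets.ISO_8859_1), wanted)); // true
        // Converted string, UTF-8 response: encoded a second time.
        System.out.println(Arrays.equals(emitted(s2, StandardCharsets.UTF_8), wanted));      // false
        // Oracle string, ISO-8859-1 response: the CJK chars become '?'.
        System.out.println(Arrays.equals(emitted(s1, StandardCharsets.ISO_8859_1), wanted)); // false
    }
}
```

This would also explain why switching the content type halfway through the page appears to work: each string happens to be written with the one charset that produces its UTF-8 bytes.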
How about:
1. Get string from Oracle (s1)
2. Convert string to iso-8859-1 (s2)
3. Store s2 in mysql
4. Get string from mysql (s3)
5. Convert it to utf-8 (s4)
Use only s1 and s4 for output, assuming you set the content type to
"text/html; charset=utf-8".
Hope this helps,
Manush
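Manush's steps can be written as two small Java helpers. This is only a sketch: the database round trip (steps 3 and 4) is simulated by passing s2 straight through, i.e. it assumes getString returns exactly the ISO-8859-1 string that was stored. According to Rob's follow-up at the top of this thread, that assumption did not hold for the caucho driver, which is why he fell back to getBytes. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helpers for the s1 -> s2 and s3 -> s4 conversions.
class ManushRecipe {

    // Step 2: s1 -> s2, one ISO-8859-1 char per UTF-8 byte of s1.
    static String toIso(String s1) {
        return new String(s1.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
    }

    // Step 5: s3 -> s4, recover the UTF-8 bytes and decode them.
    static String fromIso(String s3) {
        return new String(s3.getBytes(StandardCharsets.ISO_8859_1),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s1 = "caf\u00e9 \u65e5\u672c"; // step 1: from Oracle
        String s2 = toIso(s1);                // step 2
        String s3 = s2;                       // steps 3-4, simulated (no real MySQL)
        String s4 = fromIso(s3);              // step 5
        System.out.println(s4.equals(s1));    // true
    }
}
```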
Received on Sat 08 Dec 2001 22:36:55 -0800