Question:
Hello, I have a small problem with a Java application I am writing.
It connects to MySQL via JDBC. Records are saved correctly (if I type "México" in a JTextField and send it to the database, it is stored as "México"); but when reading them back with the ResultSet.getString()
method, the "special" characters (accents and "ñ") come out wrong ("México" is read back as "MÃ©xico").
I think it has something to do with the character encoding, but I don't know exactly what. The MySQL database uses the utf8_spanish_ci collation, and Charset.defaultCharset() returns UTF-8.
My specific question, then: how can I get strings read from MySQL that contain special characters (and were stored correctly) to display correctly in the Java application?
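To illustrate what I mean by "wrong", here is a minimal standalone sketch (not my actual application code) that reproduces the effect by decoding UTF-8 bytes with the wrong charset:

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        String original = "México";
        // UTF-8 encodes "é" as the two bytes 0xC3 0xA9.
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        // Decoding those bytes as ISO-8859-1 turns each byte into one
        // character, so "é" becomes "Ã©" -- the classic mojibake pattern.
        String garbled = new String(utf8Bytes, StandardCharsets.ISO_8859_1);
        System.out.println(garbled); // prints "MÃ©xico"
    }
}
```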
Update (partial fix):
After searching a bit more, I found this question, whose answer helped me. In short: when opening the connection, you have to specify the character set that will be used; in my case:
DriverManager.getConnection(
    "jdbc:mysql://" + host + "/" + dbName
    + "?useUnicode=true&characterEncoding=UTF-8",
    user, pass);
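One detail worth stressing: characterEncoding expects a Java charset name (such as UTF-8), not a MySQL collation name like utf8_spanish_ci. A small sketch of building the URL (host and database name here are placeholders, not my real values):

```java
public class JdbcUrlDemo {
    public static void main(String[] args) {
        // Placeholder values -- substitute your own host and schema name.
        String host = "localhost";
        String dbName = "mi_db";
        // characterEncoding takes a Java charset name (UTF-8),
        // not a MySQL collation such as utf8_spanish_ci.
        String url = "jdbc:mysql://" + host + "/" + dbName
                + "?useUnicode=true&characterEncoding=UTF-8";
        System.out.println(url);
    }
}
```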
However, this only partially fixes the problem: data read from VARCHAR
fields now comes back correctly, but values read from JSON
fields that contain special characters still come out garbled.
Update (final):
The problem has to do with the encoding MySQL uses to store JSON data; the answer below shows the procedure I followed to fix it.
Answer:
After digging into the problem a bit longer, I found this passage in the MySQL reference manual:
MySQL handles strings used in JSON context using the utf8mb4 character set and utf8mb4_bin collation. Strings in other character sets are converted to utf8mb4 as necessary.
So no matter what character encoding I use, MySQL automatically converts JSON strings to utf8mb4 … which is not a big problem when saving, but it is when reading back 🙁
My solution (and I suspect it is not the best) was the following: write the conversion to the required encoding inside the query itself. Something like this:
strSQL = "select convert(cast(a.json_data as char) using utf8) as json_data "
       + "from mi_tabla a "
       + "where id = ?";
try (PreparedStatement ps = conn.prepareStatement(strSQL)) {
    // My code to read the table
}
With this tweak, the data is read perfectly, with all accented characters intact.
I suppose there may be a simpler way to fix this, but so far this solution has served me well.
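For completeness, there is also a client-side workaround I considered (it is not the fix described above, just an alternative): when a string comes back garbled because its UTF-8 bytes were decoded as ISO-8859-1, it can be repaired in Java by round-tripping the bytes. Note this only works when the garbling is exactly that single mis-decoding:

```java
import java.nio.charset.StandardCharsets;

public class FixMojibake {
    // Repairs a string whose UTF-8 bytes were mistakenly decoded as
    // ISO-8859-1 (e.g. turns "MÃ©xico" back into "México").
    static String fix(String garbled) {
        // Recover the original raw bytes, then decode them correctly.
        byte[] rawBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(fix("MÃ©xico")); // prints "México"
    }
}
```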