最近在开发过程中碰到数据库写入失败的情况。
“Incorrect string value: ‘\xF0\x9F\x91\x91M\xE3…” for column ‘nick_name’ at row 1”
排查发现是由于该用户的昵称含有 emoji
表情包导致的,含有表情包和普通字符串有什么区别呢?
原来 MySQL 的默认 utf8
(也称 utf8mb3
) 只支持三个字节的字符。而 emoji
表情包需要四个字节表示。所以存储就会报错。不过 MySQL 在 5.5.3 之后增加了这个 utf8mb4
的编码,用来兼容四字节的 Unicode。
为了解决这个问题,需要修改数据库的字符编码。
> show variables like '%char%';
+---------------------------+----------------------------+
| Variable_name | Value |
+---------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /opt/mysql/share/charsets/ |
| ft_query_extra_word_chars | OFF |
+---------------------------+----------------------------+
> show create database xxx;
+-------------------------+----------------------------------------------------------------------------------------------------------------+
| Database | Create Database |
+-------------------------+----------------------------------------------------------------------------------------------------------------+
| xxx | CREATE DATABASE `xxx` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci */ |
+-------------------------+----------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
> show create table user;
| user | CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` varchar(80) COLLATE utf8mb4_unicode_ci NOT NULL COMMENT '用户ID',
`nick_name` varchar(120) COLLATE utf8mb4_unicode_ci NOT NULL COMMENT '昵称',
`avatar` varchar(300) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '头像数据',
`update_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '记录更新时间',
PRIMARY KEY (`id`),
UNIQUE KEY `user_id` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=157 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
通过查看数据库、表的字符编码发现已经是 utf8mb4
,那么问题可能出在客户端。查看 Flask 项目的配置文件,发现客户端配置的编码为 utf8
。将编码格式修改为 utf8mb4
即可解决问题。
class devConfig(baseConfig):
SQLALCHEMY_DATABASE_URI = 'mysql+pymysql://root:xxxx@localhost/ncov?charset=utf8mb4'
UTF-8 is a variable-length encoding. In the case of UTF-8, this means that storing one code point requires one to four bytes. However, MySQL’s encoding called “utf8” (alias of “utf8mb3”) only stores a maximum of three bytes per code point.
So the character set “utf8”/“utf8mb3” cannot store all Unicode code points: it only supports the range 0x000 to 0xFFFF, which is called the “Basic Multilingual Plane”. See also Comparison of Unicode encodings.
参考:What is the difference between utf8mb4 and utf8 charsets in MySQL?
三个字节的 uft8 支持编码的 Unicode 字符范围是 0x000-0xffff,称之为 Unicode 的基本多文种平面(BMP),不在基本多文种平面的 Unicode 字符都无法使用 MySQL 中的 utf8 字符集存储。
通过以下命令可直接将数据库字符集变更为 utf8mb4
编码。
# 修改数据库:
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
# 修改表:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# 修改表字段:
ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;