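Background
Last week I ran into a bizarre replication error, error 1048. It looks like a simple NOT NULL violation, but why can the master execute the statement while the slave cannot? And why does a 5.1 slave work while a 5.6 slave does not? With these questions in mind, I set out to dig in.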
[ERROR] Slave SQL: Error 'Column 'type_id' cannot be null' on query. Default database: ''. Query: 'insert into if_dw_stats.da_upload_nh_score_rank_result(city_id,city_name,comm_id,region_name,paid,comm_name_nh,region_id_num,region_id,subregion_id_num,subregion_id,vcuv,vcuv_z,call_vcuv,call_vcuv_z,orders_vcuv,orders_vcuv_z,peitao,peitao_z,result_score,rank,type_id,type_name,pinyin,cal_dt) values (N), where N > 9000;
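Here is a summary of the errors I have run into. They fall into three scenarios. All of them are caused by NULL, but error 1048 is the focus.
- timestamp column: why does the statement succeed on the master but fail when replicated to the slave?
- int column, 5.1 (master) <--- 5.6 (slave): why does replication fail?
- int column, 5.6 (master) <--- 5.6 (slave): why does replication fail?
With that, let's get to the point.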
Scenario 1
- explicit_defaults_for_timestamp: what to watch out for with timestamp columns
* DB architecture: Master (5.1) <-- Slave (5.6)
* Table structure:
dbadmin:abc> desc lc_time;
+-------+-----------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------+------+-----+-------------------+-----------------------------+
| id | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+-------+-----------+------+-----+-------------------+-----------------------------+
1 row in set (0.00 sec)
* master
dbadmin:abc> select @@global.explicit_defaults_for_timestamp;
+------------------------------------------+
| @@global.explicit_defaults_for_timestamp |
+------------------------------------------+
| 0 |
+------------------------------------------+
1 row in set (0.00 sec)
dbadmin:abc> insert into lc_time values(null);
Query OK, 1 row affected (0.02 sec)
dbadmin:abc> select * from lc_time;
+---------------------+
| id |
+---------------------+
| 2014-11-25 13:02:14 |
+---------------------+
1 row in set (0.00 sec)
* slave
dbadmin:abc> select @@global.explicit_defaults_for_timestamp;
+------------------------------------------+
| @@global.explicit_defaults_for_timestamp |
+------------------------------------------+
| 1 |
+------------------------------------------+
1 row in set (0.00 sec)
dbadmin:abc> insert into lc_time values(null);
ERROR 1048 (23000): Column 'id' cannot be null
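- Conclusion: this error occurs when explicit_defaults_for_timestamp=0 on the master and explicit_defaults_for_timestamp=1 on the slave. With the variable off, inserting NULL into a NOT NULL timestamp column stores the current timestamp; with it on, the NULL is rejected.
Solutions:
- Keep explicit_defaults_for_timestamp consistent between master and slave.
- Filter out NULL values in the application before they reach the database.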
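One way to act on the first solution is to compare a handful of server variables captured from each server. The following is only a sketch in Java: the class name VariableDiff and its method are hypothetical, and the two maps are assumed to have been filled beforehand (for example from SHOW GLOBAL VARIABLES on each server); no database connection is made here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: compares selected server variables captured from a master and a slave. */
final class VariableDiff {

    /**
     * Returns one message per variable whose value differs between the two maps.
     * A variable missing from a map (for instance unknown to an older server)
     * is reported as "<absent>".
     */
    static List<String> mismatches(Map<String, String> master,
                                   Map<String, String> slave,
                                   List<String> names) {
        List<String> out = new ArrayList<>();
        for (String name : names) {
            String m = master.getOrDefault(name, "<absent>");
            String s = slave.getOrDefault(name, "<absent>");
            if (!m.equals(s)) {
                out.add(name + ": master=" + m + ", slave=" + s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Values copied from the two sessions shown in this scenario.
        Map<String, String> master = Map.of("explicit_defaults_for_timestamp", "0", "sql_mode", "");
        Map<String, String> slave = Map.of("explicit_defaults_for_timestamp", "1", "sql_mode", "");
        for (String line : mismatches(master, slave,
                List.of("explicit_defaults_for_timestamp", "sql_mode"))) {
            System.out.println(line);
        }
    }
}
```

Running such a check before adding a slave of a different version would have flagged the mismatch before replication broke.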
Scenario 2
* DB architecture:
              master (5.1)
                   |
     |-------------------------|
     |                         |
slave A (5.1)            slave B (5.6)
* Table structure:
dbadmin:abc> show create table abc;
+-------+-----------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+-----------------------------------------------------------------------------------------------------------------------------+
| abc | CREATE TABLE `abc` (`id` int(11) DEFAULT NULL,`id2` int(11) NOT NULL DEFAULT '6'
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+-------+-----------------------------------------------------------------------------------------------------------------------------+
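1 row in set (0.00 sec)
* Key parameter: sql_mode is '' on the master and on both slave A and slave B.
* Symptom: the SQL statement insert into abc values(1,0),(1,null); is executed on the master. Slave A is fine, but Slave B reports error 1048, Error 'Column 'id2' cannot be null' on query. Why?
Question 1: why does insert into abc values(1,null) fail while insert into abc values(1,0),(1,null); succeeds?
Question 2: why does the 5.1 slave work while the 5.6 slave does not?
Question 3: why does the same insert succeed when executed manually on slave B?
If you already know why, you can skip the analysis below.
* Analysis: attentive readers will have noticed that the answer to the first question is already in the sql_mode documentation: in non-strict mode a single-row insert of NULL into a NOT NULL column is an error, whereas a multi-row insert stores the implicit default for the column type and raises a warning.
Next, testing showed that insert into abc values(1,0),(1,null); succeeds on both 5.1 and 5.6 when sql_mode=''. So only one suspect is left: something is wrong with sql_mode. Looking at the master binlog, there is an extra statement right before the insert: SET @@session.sql_mode=2097152. Let's take a look:
dbadmin:abc> SET @@session.sql_mode=2097152;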
Query OK, 0 rows affected (0.00 sec)
dbadmin:abc> select @@session.sql_mode;
+---------------------+
| @@session.sql_mode |
+---------------------+
| STRICT_TRANS_TABLES |
+---------------------+
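1 row in set (0.00 sec)
Now we seem to have found a clue, which raises the next question. Where does SET @@session.sql_mode=2097152; come from? Is it written by the application, or does MySQL add it itself?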
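As an aside on the number itself: the value written to the binlog is a bitmask, and 2097152 is 2^21. The sketch below decodes such a mask in Java. The class name SqlModeBits is hypothetical and the table is deliberately partial: only the STRICT_TRANS_TABLES entry is confirmed by the session above, and the other two bit values are assumptions that should be checked against your server version.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: decodes a numeric sql_mode from the binlog into flag names. */
final class SqlModeBits {

    // Partial bit table. Only STRICT_TRANS_TABLES is confirmed by the session above;
    // the other two entries are assumptions to verify against your server version.
    private static final Map<Long, String> KNOWN = new LinkedHashMap<>();
    static {
        KNOWN.put(1L << 21, "STRICT_TRANS_TABLES");     // 2097152
        KNOWN.put(1L << 22, "STRICT_ALL_TABLES");       // 4194304 (assumed)
        KNOWN.put(1L << 30, "NO_ENGINE_SUBSTITUTION");  // 1073741824 (assumed)
    }

    /** Returns the names of the known flags set in mask, plus a marker for uncovered bits. */
    static List<String> decode(long mask) {
        List<String> names = new ArrayList<>();
        long rest = mask;
        for (Map.Entry<Long, String> e : KNOWN.entrySet()) {
            long bit = e.getKey();
            if ((mask & bit) != 0) {
                names.add(e.getValue());
                rest &= ~bit;
            }
        }
        if (rest != 0) {
            names.add("UNKNOWN(" + rest + ")"); // bits not covered by the partial table
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(decode(2097152L));
    }
}
```

On a live server the SET plus select shown above remains the authoritative way to decode the value; the sketch is only convenient when reading binlogs offline.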
After some digging, I traced this SQL to the Java JDBC driver. The following code is excerpted from ConnectionImpl.java:
private void setupServerForTruncationChecks() throws SQLException {
    if (getJdbcCompliantTruncation()) {
        if (versionMeetsMinimum(5, 0, 2)) {
            String currentSqlMode = this.serverVariables.get("sql_mode");
            boolean strictTransTablesIsSet = StringUtils.indexOfIgnoreCase(currentSqlMode, "STRICT_TRANS_TABLES") != -1;
            if (currentSqlMode == null || currentSqlMode.length() == 0 || !strictTransTablesIsSet) {
                StringBuffer commandBuf = new StringBuffer("SET sql_mode='");
                if (currentSqlMode != null && currentSqlMode.length() > 0) {
                    commandBuf.append(currentSqlMode);
                    commandBuf.append(",");
                }
                commandBuf.append("STRICT_TRANS_TABLES'");
                execSQL(null, commandBuf.toString(), -1, null, DEFAULT_RESULT_SET_TYPE, DEFAULT_RESULT_SET_CONCURRENCY, false, this.database, null, false);
                setJdbcCompliantTruncation(false); // server's handling this for us now
            } else if (strictTransTablesIsSet) {
                // We didn't set it, but someone did, so we piggy back on it
                setJdbcCompliantTruncation(false); // server's handling this for us now
            }
        }
    }
}
Roughly speaking: if sql_mode is '' (more precisely, if it does not already contain STRICT_TRANS_TABLES), the Java driver raises the sql_mode level by appending it: commandBuf.append("STRICT_TRANS_TABLES'");
OK, so now we know this SET comes from Java, which raises yet another question. Even with STRICT_TRANS_TABLES set, if the statement were a problem, the master would have reported an error. Why is the master fine and Slave A fine, while Slave B fails to replicate? The answer is already fairly obvious: because Slave B is 5.6. To put it plainly:
In strict mode, the statement can be executed on 5.1 but not on 5.6. Should this count as a new safety feature of 5.6?
Interested readers can test this themselves.
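The driver logic quoted above can be condensed into a small pure function, which makes it easier to see which inputs lead to a SET statement. This is a simplified model, not the driver's code: the class and method names are hypothetical, the server version check is left out, and a null return simply means "no statement issued".

```java
import java.util.Locale;

/** Hypothetical, simplified model of the sql_mode adjustment quoted above. */
final class TruncationCheckModel {

    /**
     * Returns the SET statement the quoted logic would issue, or null if none is needed.
     * serverSqlMode is the sql_mode reported by the server; it may be null or empty.
     */
    static String sqlModeStatement(boolean jdbcCompliantTruncation, String serverSqlMode) {
        if (!jdbcCompliantTruncation) {
            return null; // feature disabled: the session sql_mode is left alone
        }
        String current = serverSqlMode == null ? "" : serverSqlMode;
        if (current.toUpperCase(Locale.ROOT).contains("STRICT_TRANS_TABLES")) {
            return null; // strict mode was already set by someone else
        }
        return current.isEmpty()
                ? "SET sql_mode='STRICT_TRANS_TABLES'"
                : "SET sql_mode='" + current + ",STRICT_TRANS_TABLES'";
    }

    public static void main(String[] args) {
        // The situation in this scenario: sql_mode is '' on the server.
        System.out.println(sqlModeStatement(true, ""));
        // With the truncation check disabled, nothing is sent.
        System.out.println(sqlModeStatement(false, ""));
    }
}
```

The first branch is the interesting one for this incident: since the quoted method only acts when getJdbcCompliantTruncation() is true, turning that setting off (Connector/J exposes it as the jdbcCompliantTruncation connection property) should leave sql_mode untouched. Treat that as something to confirm against your driver version.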
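Solutions:
- Configure the Java driver, or modify the Java source, so that it does not change MySQL's sql_mode.
- Temporary workaround: insert ignore xxx;
- Standardize sql_mode across servers.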
Scenario 3
* DB architecture: Master (5.6) <--- Slave (5.6)
* sql_mode is '' on both.
* The error:
Replicate_Wild_Ignore_Table: mysql.%,test.%
Last_Errno: 1048
Last_Error: Error 'Column 'referer' cannot be null' on query. Default database: 'action_db'. Query: 'insert into oplogin_log(`cityId`,`userId`,`userName`,`uri`,`referer`,`logType`,`logDate`,`ip`,`status`)values('','','kyqxmxyt','/login.php?rtn=1','http://xx.com:80/login.php?rtn=' RLIKE (SELECT (CASE WHEN (ORD(MID((SELECT IFNULL(CAST(COUNT(DISTINCT(schema_name)) AS CHAR),0x20) FROM INFORMATION_SCHEMA.SCHEMATA),1,1))>50) THEN 0x687474703a2f2f6f70746f6f6c732e616e6a756b652e636f6d3a38302f6c6f67696e2e7068703f72746e3d ELSE 0x28 END)) AND 'aeWZ'='aeWZ','1','1416020259','114.242.250.192','2') #v1:checklogin@login.php (15) 1416020259'
Let me roughly translate this bizarre yet impressive SQL: take the distinct schema names from INFORMATION_SCHEMA.SCHEMATA; if the first character of the result is 1, return 0, otherwise return NULL. The key point is that the inserted value depends on data local to each server, so it can differ between master and slave. Here is the same kind of SQL reduced to something simpler:
master:abc> desc abc;
+-------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
| id2 | int(11) | NO | | 6 | |
+-------+---------+------+-----+---------+-------+
2 rows in set (0.00 sec)
master:abc> select * from abc;
+------+-----+
| id | id2 |
+------+-----+
| 1 | 0 |
| 1 | 0 |
| 2 | 0 |
| 2 | 0 |
| 1 | 1 |
| 1 | 0 |
+------+-----+
6 rows in set (0.00 sec)
master:abc> select * from lc;
Empty set (0.00 sec)
master:abc> insert into abc values('1', case when (select count(*) from lc) < 1 then 1 else NULL end );
Query OK, 1 row affected (0.00 sec)
The master's binlog looks like this:
*binlog*
# at 1109
#141125 12:44:51 server id 101082106 end_log_pos 1271 CRC32 0x9ec0ca94 Query thread_id=28 exec_time=0 error_code=0
SET TIMESTAMP=1416890691/*!*/;
insert into abc values('1', case when (select count(*) from lc) < 1 then 1 else NULL end )
/*!*/;
slave:abc> select * from abc;
+------+-----+
| id | id2 |
+------+-----+
| 1 | 0 |
| 1 | 0 |
| 2 | 0 |
| 2 | 0 |
| 1 | 0 |
| 1 | 0 |
+------+-----+
6 rows in set (0.00 sec)
slave:abc> select * from lc;
+------+
| id |
+------+
| 1 |
| 2 |
| 3 |
+------+
3 rows in set (0.00 sec)
*slave status*
Last_SQL_Errno: 1048
Last_SQL_Error: Error 'Column 'id2' cannot be null' on query. Default database: 'abc'. Query: 'insert into abc values('1', case when (select count(*) from lc) < 1 then 1 else NULL end )'
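Conclusions:
- The binlog here is not row-based (RBR), so the slave re-executes the statement against its own data. The lc table is empty on the master but has rows on the slave, so the CASE expression evaluates to NULL there and the insert fails.
- Temporary workaround: insert ignore xxx, then repair the data with pt-table-checksum and pt-table-sync.
- Prohibit CASE WHEN statements of this kind.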
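The first conclusion can be illustrated with a toy model in Java of what statement-based replication does with the simplified insert: the same expression is evaluated twice, once against each server's lc table. The class and method names are hypothetical, and the row counts (0 on the master, 3 on the slave) are taken from the sessions above.

```java
/** Hypothetical toy model of statement-based replay for the simplified INSERT above. */
final class StatementReplayModel {

    /** Models: case when (select count(*) from lc) < 1 then 1 else NULL end */
    static Integer caseValue(int rowsInLc) {
        return rowsInLc < 1 ? Integer.valueOf(1) : null;
    }

    /** Stores the value in the NOT NULL column id2, rejecting NULL the way error 1048 does. */
    static int insertId2(Integer value) {
        if (value == null) {
            throw new IllegalStateException("Column 'id2' cannot be null");
        }
        return value;
    }

    public static void main(String[] args) {
        // Master: lc is empty, so the expression yields 1 and the insert succeeds.
        System.out.println("master stores id2=" + insertId2(caseValue(0)));
        // Slave: lc has 3 rows, so the replayed statement yields NULL and fails.
        try {
            insertId2(caseValue(3));
        } catch (IllegalStateException e) {
            System.out.println("slave fails: " + e.getMessage());
        }
    }
}
```

With row-based logging the slave would receive the row the master actually wrote rather than the expression, so the second evaluation would not take place.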
Reference: MySQL 5.6 Reference Manual, error codes