背景:
楼主所在公司,因双11流量较大,现有数据库版本存在Bug且性能存在问题,故在双11前,切换到mysql5.7版本且服务器规格大幅度提升,线上顺利切换。
问题:
数据库切换后,应用的sql监控发现,新增了一条高频的sql:
select @@session.tx_read_only
同时数据库的监控发现:事物回滚数特别多。(因数据敏感,监控图不贴出来)
排查:
针对第一个问题:新增高频sql
查阅官方文档发现此属性为5.6.5版本后新增的。
https://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_tx_read_only
同时该属性在5.7.20后被废弃,在5.8之后直接被删除。
https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_tx_read_only
排查对应驱动的实现:
ConnectionImpl类中,如下实现:
/** * Tests to see if the connection is in Read Only Mode. Note that prior to 5.6, * we cannot really put the database in read only mode, but we pretend we can by * returning the value of the readOnly flag * * @param useSessionStatus * in some cases, for example when restoring connection with autoReconnect=true, * we can rely only on saved readOnly state, so use useSessionStatus=false in that case * * @return true if the connection is read only * @exception SQLException * if a database access error occurs */ public boolean isReadOnly(boolean useSessionStatus) throws SQLException { if (useSessionStatus && !this.isClosed && versionMeetsMinimum(5, 6, 5) && !getUseLocalSessionState() && getReadOnlyPropagatesToServer()) { java.sql.Statement stmt = null; java.sql.ResultSet rs = null; try { try { stmt = getMetadataSafeStatement(); rs = stmt.executeQuery("select @@session.tx_read_only"); if (rs.next()) { return rs.getInt(1) != 0; // mysql has a habit of tri+ state booleans } } catch (SQLException ex1) { if (ex1.getErrorCode() != MysqlErrorNumbers.ER_MUST_CHANGE_PASSWORD || getDisconnectOnExpiredPasswords()) { throw SQLError.createSQLException("Could not retrieve transation read-only status server", SQLError.SQL_STATE_GENERAL_ERROR, ex1, getExceptionInterceptor()); } } } finally { if (rs != null) { try { rs.close(); } catch (Exception ex) { // ignore } rs = null; } if (stmt != null) { try { stmt.close(); } catch (Exception ex) { // ignore } stmt = null; } } } return this.readOnly; }
具体在什么场景调用此方法的,后面在介绍,如果不熟悉JDBC的,可参考下此文章:
https://www.cnblogs.com/jianshu/p/6023098.html
针对第二个问题:事物回滚数多的问题,有以下猜想
第一猜想:应用事物代码编写存在问题,导致开启的事物存在大量的回滚操作。(系统在十来年前编写的,事物管理都是自己编码实现的,未使用事物管理框架)
而后分析了应用服务器的代码,事物存在回滚的数量原低于开启事物的数量,故否定这一问题导致的。
第二猜想:既然应用代码没有任何问题,那么问题很难定位到,潜在可能为数据库连接池导致的(我们使用的是dbcp 1.4)。
如何验证第二猜想呢?
验证猜想二
1、本地搭建一个mysql5.7。
使用docker快速创建一个Mysql5.7的实例(语法不介绍):
docker run --name mysql5.7 -p 3306:3306 -e MYSQL_ROOT_PASSWORD=123456 -d mysql:5.7
同时开启mysql的generate log来抓取数据库日志
( 具体介绍可参照官方文档:https://dev.mysql.com/doc/refman/5.7/en/query-log.html )。
2、本地编写简单的一个事物管理代码
import java.sql.Connection; import java.sql.PreparedStatement; import java.sql.SQLException; import javax.annotation.Resource; import javax.sql.DataSource; import org.junit.Test; import org.junit.runner.RunWith; import org.springframework.test.context.ContextConfiguration; import org.springframework.test.context.junit4.SpringJUnit4ClassRunner; /** * * @author hzwumsh * 创建时间 2018-03-16 14:04 * */ @RunWith(SpringJUnit4ClassRunner.class) @ContextConfiguration(locations = "classpath:spring-test.xml") public class TestDb { @Resource private DataSource dataSource; @Test public void test1() throws SQLException { for(int i = 1;i < 10;i++) { Connection connection = dataSource.getConnection(); connection.setTransactionIsolation(2);//设置隔离级别会读提交 PreparedStatement preparedStatement = connection.prepareStatement("INSERT INTO user (test1) VALUE (?)"); preparedStatement.setString(1, "1"); preparedStatement.executeUpdate(); preparedStatement.close(); connection.close(); } for(int i = 1;i < 10;i++) { Connection connection = dataSource.getConnection(); connection.setTransactionIsolation(2);//设置隔离级别会读提交 connection.setAutoCommit(false); PreparedStatement preparedStatement = connection.prepareStatement("INSERT INTO user (test1) VALUE (?)"); preparedStatement.setString(1, "1"); preparedStatement.executeUpdate(); connection.commit(); preparedStatement.close(); connection.close(); } for(int i = 1;i < 10;i++) { Connection connection = dataSource.getConnection(); connection.setTransactionIsolation(2);//设置隔离级别会读提交 PreparedStatement preparedStatement = connection.prepareStatement("INSERT INTO user (test1) VALUE (?)"); preparedStatement.setString(1, "1"); preparedStatement.executeUpdate(); preparedStatement.close(); connection.close(); } } }
然后执行代码,观察generate log,以下为一次操作的完整日志:
event_time user_host thread_id server_id command_type argument 2018-03-16 06:34:59.121870 testdb[testdb] @ [172.17.0.1] 14 0 Query SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED 2018-03-16 06:34:59.122509 testdb[testdb] @ [172.17.0.1] 14 0 Query SET autocommit=0 2018-03-16 06:34:59.123312 testdb[testdb] @ [172.17.0.1] 14 0 Query INSERT INTO user (test1) VALUE ('1') 2018-03-16 06:34:59.123858 testdb[testdb] @ [172.17.0.1] 14 0 Query commit 2018-03-16 06:34:59.126554 testdb[testdb] @ [172.17.0.1] 14 0 Query select @@session.tx_read_only 2018-03-16 06:34:59.127432 testdb[testdb] @ [172.17.0.1] 14 0 Query rollback 2018-03-16 06:34:59.128008 testdb[testdb] @ [172.17.0.1] 14 0 Query SET autocommit=1
发现每次执行完事物操作后,均会有一次查询服务器的事物是否只读操作,然后做回滚,然后设置事物自动提交。(目前已经重现线上的此类问题)
然后debug跟进代码,同时观察数据库的日志,发现结果如下:
1、第一个不显式开启事物的,默认事物自动提交的情况下,不会出现最后三条sql。
2、第二个显式开启事物的,默认事物不自动提交的情况下,出现上述问题。
跟进代码为:
PoolableConnection的关闭方法:
/** * Returns me to my pool. */ public synchronized void close() throws SQLException { if (_closed) { // already closed return; } boolean isUnderlyingConectionClosed; try { isUnderlyingConectionClosed = _conn.isClosed(); } catch (SQLException e) { try { _pool.invalidateObject(this); // XXX should be guarded to happen at most once } catch(IllegalStateException ise) { // pool is closed, so close the connection passivate(); getInnermostDelegate().close(); } catch (Exception ie) { // DO NOTHING the original exception will be rethrown } throw (SQLException) new SQLException("Cannot close connection (isClosed check failed)").initCause(e); } if (!isUnderlyingConectionClosed) { // Normal close: underlying connection is still open, so we // simply need to return this proxy to the pool try { _pool.returnObject(this); // XXX should be guarded to happen at most once } catch(IllegalStateException e) { // pool is closed, so close the connection passivate(); getInnermostDelegate().close(); } catch(SQLException e) { throw e; } catch(RuntimeException e) { throw e; } catch(Exception e) { throw (SQLException) new SQLException("Cannot close connection (return to pool failed)").initCause(e); } } else { // Abnormal close: underlying connection closed unexpectedly, so we // must destroy this proxy try { _pool.invalidateObject(this); // XXX should be guarded to happen at most once } catch(IllegalStateException e) { // pool is closed, so close the connection passivate(); getInnermostDelegate().close(); } catch (Exception ie) { // DO NOTHING, "Already closed" exception thrown below } throw new SQLException("Already closed."); } }
会将本次的连接还回数据库连接池中,重点为:
_pool.returnObject(this);
dbcp默认的pool为:GenericObjectPool或者AbandonedObjectPool(继承自GenericObjectPool)
其中pool的returnObject的实现如下:
/** * <p>Returns an object instance to the pool.</p> * * <p>If {@link #getMaxIdle() maxIdle} is set to a positive value and the number of idle instances * has reached this value, the returning instance is destroyed.</p> * * <p>If {@link #getTestOnReturn() testOnReturn} == true, the returning instance is validated before being returned * to the idle instance pool. In this case, if validation fails, the instance is destroyed.</p> * * <p><strong>Note: </strong> There is no guard to prevent an object * being returned to the pool multiple times. Clients are expected to * discard references to returned objects and ensure that an object is not * returned to the pool multiple times in sequence (i.e., without being * borrowed again between returns). Violating this contract will result in * the same object appearing multiple times in the pool and pool counters * (numActive, numIdle) returning incorrect values.</p> * * @param obj instance to return to the pool */ public void returnObject(Object obj) throws Exception { try { addObjectToPool(obj, true); } catch (Exception e) { if (_factory != null) { try { _factory.destroyObject(obj); } catch (Exception e2) { // swallowed } // TODO: Correctness here depends on control in addObjectToPool. // These two methods should be refactored, removing the // "behavior flag", decrementNumActive, from addObjectToPool. synchronized(this) { _numActive--; allocate(); } } } }
/** * <p>Adds an object to the pool.</p> * * <p>Validates the object if testOnReturn == true and passivates it before returning it to the pool. * if validation or passivation fails, or maxIdle is set and there is no room in the pool, the instance * is destroyed.</p> * * <p>Calls {@link #allocate()} on successful completion</p> * * @param obj instance to add to the pool * @param decrementNumActive whether or not to decrement the active count * @throws Exception */ private void addObjectToPool(Object obj, boolean decrementNumActive) throws Exception { boolean success = true; if(_testOnReturn && !(_factory.validateObject(obj))) { success = false; } else { _factory.passivateObject(obj); } boolean shouldDestroy = !success; // Add instance to pool if there is room and it has passed validation // (if testOnreturn is set) synchronized (this) { if (isClosed()) { shouldDestroy = true; } else { if((_maxIdle >= 0) && (_pool.size() >= _maxIdle)) { shouldDestroy = true; } else if(success) { // borrowObject always takes the first element from the queue, // so for LIFO, push on top, FIFO add to end if (_lifo) { _pool.addFirst(new ObjectTimestampPair(obj)); } else { _pool.addLast(new ObjectTimestampPair(obj)); } if (decrementNumActive) { _numActive--; } allocate(); } } } // Destroy the instance if necessary if(shouldDestroy) { try { _factory.destroyObject(obj); } catch(Exception e) { // ignored } // Decrement active count *after* destroy if applicable if (decrementNumActive) { synchronized(this) { _numActive--; allocate(); } } } }
其中pool中的默认工厂为:PoolableConnectionFactory。
在还会连接到连接池中,会调用工厂的如下方法:
public void passivateObject(Object obj) throws Exception { if(obj instanceof Connection) { Connection conn = (Connection)obj; if(!conn.getAutoCommit() && !conn.isReadOnly()) { conn.rollback(); } conn.clearWarnings(); if(!conn.getAutoCommit()) { conn.setAutoCommit(true); } } if(obj instanceof DelegatingConnection) { ((DelegatingConnection)obj).passivate(); } }
从而发现,最外层将连接关闭时,连接会还回工厂同时进行如上判断。
自动提交在显式事物开启是false,同时服务器在5.6.5时,isReadOnly返回为false,所以这儿固定会有一次rollback的操作,同时设置事物自动提交。
至此问题基本找到,是否可以通过升级连接池的版本来解决此问题呢?
对比发现dbcp 2.x开始新增了一个属性来判断是否需要执行此操作:
public void passivateObject(final PooledObject<PoolableConnection> p) throws Exception { validateLifetime(p); final PoolableConnection conn = p.getObject(); Boolean connAutoCommit = null; if (rollbackOnReturn) { connAutoCommit = Boolean.valueOf(conn.getAutoCommit()); if(!connAutoCommit.booleanValue() && !conn.isReadOnly()) { conn.rollback(); } } conn.clearWarnings(); // DBCP-97 / DBCP-399 / DBCP-351 Idle connections in the pool should // have autoCommit enabled if (enableAutoCommitOnReturn) { if (connAutoCommit == null) { connAutoCommit = Boolean.valueOf(conn.getAutoCommit()); } if(!connAutoCommit.booleanValue()) { conn.setAutoCommit(true); } } conn.passivate(); }
代码:
https://github.com/apache/commons-dbcp/blob/master/src/main/java/org/apache/commons/dbcp2/PoolableConnectionFactory.java#L373
但是此操作存在潜在的风险,如果应用代码中,存在事物已经开启了,但是因异常等特殊场景,导致事物未提交,连接复用后,导致两个事物合并提交等问题。
调研阿里开源的数据库连接池发现,可解决此问题。
https://github.com/alibaba/druid/blob/master/src/main/java/com/alibaba/druid/pool/DruidPooledConnection.java#L233
@Override public void close() throws SQLException { if (this.disable) { return; } DruidConnectionHolder holder = this.holder; if (holder == null) { if (dupCloseLogEnable) { LOG.error("dup close"); } return; } DruidAbstractDataSource dataSource = holder.getDataSource(); boolean isSameThread = this.getOwnerThread() == Thread.currentThread(); if (!isSameThread) { dataSource.setAsyncCloseConnectionEnable(true); } if (dataSource.isAsyncCloseConnectionEnable()) { syncClose(); return; } for (ConnectionEventListener listener : holder.getConnectionEventListeners()) { listener.connectionClosed(new ConnectionEvent(this)); } List<Filter> filters = dataSource.getProxyFilters(); if (filters.size() > 0) { FilterChainImpl filterChain = new FilterChainImpl(dataSource); filterChain.dataSource_recycle(this); } else { recycle(); } this.disable = true; }
会执行recycle方法:
public void recycle() throws SQLException { if (this.disable) { return; } DruidConnectionHolder holder = this.holder; if (holder == null) { if (dupCloseLogEnable) { LOG.error("dup close"); } return; } if (!this.abandoned) { DruidAbstractDataSource dataSource = holder.getDataSource(); dataSource.recycle(this); } this.holder = null; conn = null; transactionInfo = null; closed = true; }
会执行数据源的recycle方法:
https://github.com/alibaba/druid/blob/master/src/main/java/com/alibaba/druid/pool/DruidDataSource.java#L1641
/** * 回收连接 */ protected void recycle(DruidPooledConnection pooledConnection) throws SQLException { final DruidConnectionHolder holder = pooledConnection.holder; if (holder == null) { LOG.warn("connectionHolder is null"); return; } if (logDifferentThread // && (!isAsyncCloseConnectionEnable()) // && pooledConnection.ownerThread != Thread.currentThread()// ) { LOG.warn("get/close not same thread"); } final Connection physicalConnection = holder.conn; if (pooledConnection.traceEnable) { Object oldInfo = null; activeConnectionLock.lock(); try { if (pooledConnection.traceEnable) { oldInfo = activeConnections.remove(pooledConnection); pooledConnection.traceEnable = false; } } finally { activeConnectionLock.unlock(); } if (oldInfo == null) { if (LOG.isWarnEnabled()) { LOG.warn("remove abandonded failed. activeConnections.size " + activeConnections.size()); } } } final boolean isAutoCommit = holder.underlyingAutoCommit; final boolean isReadOnly = holder.underlyingReadOnly; final boolean testOnReturn = this.testOnReturn; try { // check need to rollback? if ((!isAutoCommit) && (!isReadOnly)) { pooledConnection.rollback(); } // reset holder, restore default settings, clear warnings boolean isSameThread = pooledConnection.ownerThread == Thread.currentThread(); if (!isSameThread) { final ReentrantLock lock = pooledConnection.lock; lock.lock(); try { holder.reset(); } finally { lock.unlock(); } } else { holder.reset(); } if (holder.discard) { return; } if (physicalConnection.isClosed()) { lock.lock(); try { activeCount--; closeCount++; } finally { lock.unlock(); } return; } if (testOnReturn) { boolean validate = testConnectionInternal(holder, physicalConnection); if (!validate) { JdbcUtils.close(physicalConnection); destroyCount.incrementAndGet(); lock.lock(); try { activeCount--; closeCount++; } finally { lock.unlock(); } return; } } if (!enable) { discardConnection(holder.conn); return; } boolean result; final long lastActiveTimeMillis = System.currentTimeMillis(); lock.lock(); try { activeCount--; closeCount++; result = putLast(holder, lastActiveTimeMillis); recycleCount++; } finally { lock.unlock(); } if (!result) { JdbcUtils.close(holder.conn); LOG.info("connection recyle failed."); } } catch (Throwable e) { holder.clearStatementCache(); if (!holder.discard) { this.discardConnection(physicalConnection); holder.discard = true; } LOG.error("recyle error", e); recycleErrorCountUpdater.incrementAndGet(this); } }
我们发现druid也和dbcp一样,有读取事物是否自动提交和是否只读的操作,为何druid可以解决这个问题呢?
原因为:
@Override public void rollback() throws SQLException { if (transactionInfo == null) { return; } if (holder == null) { return; } DruidAbstractDataSource dataSource = holder.getDataSource(); dataSource.incrementRollbackCount(); try { conn.rollback(); } catch (SQLException ex) { handleException(ex, null); } finally { handleEndTransaction(dataSource, null); } }
druid的数据连接,会保存本次操作的事物信息,如果事物已经进行提交,那么会清空当前的事物信息,从而判断是否需要回滚时,可以得知当前没有任何操作需要进行回滚操作,这个就是比dbcp高级的地方。
至此这个问题已经分析完毕,楼主目前使用druid解决此问题。