记线上数据库升级后,数据库事物回滚过多的问题排查

原创
2018/03/25 11:20
阅读数 2.1K

背景:

楼主所在公司,因双11流量较大,现有数据库版本存在Bug且性能存在问题,故在双11前,切换到mysql5.7版本且服务器规格大幅度提升,线上顺利切换。

问题:

数据库切换后,应用的sql监控发现,新增了一条高频的sql:

select @@session.tx_read_only

同时数据库的监控发现:事物回滚数特别多。(因数据敏感,监控图不贴出来)

排查:

针对第一个问题:新增高频sql

查阅官方文档发现此属性为5.6.5版本后新增的。

https://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_tx_read_only

同时该属性在5.7.20后被废弃,在5.8之后直接被删除。

https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_tx_read_only

排查对应驱动的实现:

ConnectionImpl类中,如下实现:

/**
     * Tests to see if the connection is in Read Only Mode. Note that prior to 5.6,
     * we cannot really put the database in read only mode, but we pretend we can by
     * returning the value of the readOnly flag
     * 
     * @param useSessionStatus
     *            in some cases, for example when restoring connection with autoReconnect=true,
     *            we can rely only on saved readOnly state, so use useSessionStatus=false in that case
     * 
     * @return true if the connection is read only
     * @exception SQLException
     *                if a database access error occurs
     */
    public boolean isReadOnly(boolean useSessionStatus) throws SQLException {
        if (useSessionStatus && !this.isClosed && versionMeetsMinimum(5, 6, 5) && !getUseLocalSessionState() && getReadOnlyPropagatesToServer()) {
            java.sql.Statement stmt = null;
            java.sql.ResultSet rs = null;

            try {
                try {
                    stmt = getMetadataSafeStatement();

                    rs = stmt.executeQuery("select @@session.tx_read_only");
                    if (rs.next()) {
                        return rs.getInt(1) != 0; // mysql has a habit of tri+ state booleans
                    }
                } catch (SQLException ex1) {
                    if (ex1.getErrorCode() != MysqlErrorNumbers.ER_MUST_CHANGE_PASSWORD || getDisconnectOnExpiredPasswords()) {
                        throw SQLError.createSQLException("Could not retrieve transation read-only status server", SQLError.SQL_STATE_GENERAL_ERROR, ex1,
                                getExceptionInterceptor());
                    }
                }

            } finally {
                if (rs != null) {
                    try {
                        rs.close();
                    } catch (Exception ex) {
                        // ignore
                    }

                    rs = null;
                }

                if (stmt != null) {
                    try {
                        stmt.close();
                    } catch (Exception ex) {
                        // ignore
                    }

                    stmt = null;
                }
            }
        }

        return this.readOnly;
    }

具体在什么场景调用此方法的,后面在介绍,如果不熟悉JDBC的,可参考下此文章:

https://www.cnblogs.com/jianshu/p/6023098.html

针对第二个问题:事物回滚数多的问题,有以下猜想

第一猜想:应用事物代码编写存在问题,导致开启的事物存在大量的回滚操作。(系统在十来年前编写的,事物管理都是自己编码实现的,未使用事物管理框架)

而后分析了应用服务器的代码,事物存在回滚的数量原低于开启事物的数量,故否定这一问题导致的。

第二猜想:既然应用代码没有任何问题,那么问题很难定位到,潜在可能为数据库连接池导致的(我们使用的是dbcp 1.4)。

如何验证第二猜想呢?

验证猜想二

1、本地搭建一个mysql5.7。

使用docker快速创建一个Mysql5.7的实例(语法不介绍):

docker run --name mysql5.7 -p 3306:3306 -e MYSQL_ROOT_PASSWORD=123456 -d mysql:5.7

同时开启mysql的generate log来抓取数据库日志

( 具体介绍可参照官方文档:https://dev.mysql.com/doc/refman/5.7/en/query-log.html )。

2、本地编写简单的一个事物管理代码

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

import javax.annotation.Resource;
import javax.sql.DataSource;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;

/**
 *
 * @author hzwumsh
 * 创建时间 2018-03-16 14:04
 *
 */
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(locations = "classpath:spring-test.xml")
public class TestDb
{
	@Resource
	private DataSource dataSource;

	@Test
	public void test1() throws SQLException
	{
		for(int i = 1;i < 10;i++)
		{
			Connection connection = dataSource.getConnection();
			connection.setTransactionIsolation(2);//设置隔离级别会读提交
			PreparedStatement preparedStatement = connection.prepareStatement("INSERT INTO user (test1) VALUE (?)");
			preparedStatement.setString(1, "1");
			preparedStatement.executeUpdate();
			preparedStatement.close();
			connection.close();
		}

		for(int i = 1;i < 10;i++)
		{
			Connection connection = dataSource.getConnection();
			connection.setTransactionIsolation(2);//设置隔离级别会读提交
			connection.setAutoCommit(false);
			PreparedStatement preparedStatement = connection.prepareStatement("INSERT INTO user (test1) VALUE (?)");
			preparedStatement.setString(1, "1");
			preparedStatement.executeUpdate();
			connection.commit();
			preparedStatement.close();
			connection.close();
		}

		for(int i = 1;i < 10;i++)
		{
			Connection connection = dataSource.getConnection();
			connection.setTransactionIsolation(2);//设置隔离级别会读提交
			PreparedStatement preparedStatement = connection.prepareStatement("INSERT INTO user (test1) VALUE (?)");
			preparedStatement.setString(1, "1");
			preparedStatement.executeUpdate();
			preparedStatement.close();
			connection.close();
		}
	}
}

然后执行代码,观察generate log,以下为一次操作的完整日志:

event_time	user_host	thread_id	server_id	command_type	argument
2018-03-16 06:34:59.121870	testdb[testdb] @  [172.17.0.1]	14	0	Query	SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED
2018-03-16 06:34:59.122509	testdb[testdb] @  [172.17.0.1]	14	0	Query	SET autocommit=0
2018-03-16 06:34:59.123312	testdb[testdb] @  [172.17.0.1]	14	0	Query	INSERT INTO user (test1) VALUE ('1')
2018-03-16 06:34:59.123858	testdb[testdb] @  [172.17.0.1]	14	0	Query	commit
2018-03-16 06:34:59.126554	testdb[testdb] @  [172.17.0.1]	14	0	Query	select @@session.tx_read_only
2018-03-16 06:34:59.127432	testdb[testdb] @  [172.17.0.1]	14	0	Query	rollback
2018-03-16 06:34:59.128008	testdb[testdb] @  [172.17.0.1]	14	0	Query	SET autocommit=1

发现每次执行完事物操作后,均会有一次查询服务器的事物是否只读操作,然后做回滚,然后设置事物自动提交。(目前已经重现线上的此类问题)

然后debug跟进代码,同时观察数据库的日志,发现结果如下:

1、第一个不显式开启事物的,默认事物自动提交的情况下,不会出现最后三条sql。

2、第二个显式开启事物的,默认事物不自动提交的情况下,出现上述问题。

跟进代码为:

 

PoolableConnection的关闭方法:

 

 
/**
 * Returns me to my pool.
 */
 public synchronized void close() throws SQLException {
    if (_closed) {
        // already closed
        return;
    }

    boolean isUnderlyingConectionClosed;
    try {
        isUnderlyingConectionClosed = _conn.isClosed();
    } catch (SQLException e) {
        try {
            _pool.invalidateObject(this); // XXX should be guarded to happen at most once
        } catch(IllegalStateException ise) {
            // pool is closed, so close the connection
            passivate();
            getInnermostDelegate().close();
        } catch (Exception ie) {
            // DO NOTHING the original exception will be rethrown
        }
        throw (SQLException) new SQLException("Cannot close connection (isClosed check failed)").initCause(e);
    }

    if (!isUnderlyingConectionClosed) {
        // Normal close: underlying connection is still open, so we
        // simply need to return this proxy to the pool
        try {
            _pool.returnObject(this); // XXX should be guarded to happen at most once
        } catch(IllegalStateException e) {
            // pool is closed, so close the connection
            passivate();
            getInnermostDelegate().close();
        } catch(SQLException e) {
            throw e;
        } catch(RuntimeException e) {
            throw e;
        } catch(Exception e) {
            throw (SQLException) new SQLException("Cannot close connection (return to pool failed)").initCause(e);
        }
    } else {
        // Abnormal close: underlying connection closed unexpectedly, so we
        // must destroy this proxy
        try {
            _pool.invalidateObject(this); // XXX should be guarded to happen at most once
        } catch(IllegalStateException e) {
            // pool is closed, so close the connection
            passivate();
            getInnermostDelegate().close();
        } catch (Exception ie) {
            // DO NOTHING, "Already closed" exception thrown below
        }
        throw new SQLException("Already closed.");
    }
}

会将本次的连接还回数据库连接池中,重点为:

 

_pool.returnObject(this);

dbcp默认的pool为:GenericObjectPool或者AbandonedObjectPool(继承自GenericObjectPool)

其中pool的returnObject的实现如下:

/**
     * <p>Returns an object instance to the pool.</p>
     * 
     * <p>If {@link #getMaxIdle() maxIdle} is set to a positive value and the number of idle instances
     * has reached this value, the returning instance is destroyed.</p>
     * 
     * <p>If {@link #getTestOnReturn() testOnReturn} == true, the returning instance is validated before being returned
     * to the idle instance pool.  In this case, if validation fails, the instance is destroyed.</p>
     * 
     * <p><strong>Note: </strong> There is no guard to prevent an object
     * being returned to the pool multiple times. Clients are expected to
     * discard references to returned objects and ensure that an object is not
     * returned to the pool multiple times in sequence (i.e., without being
     * borrowed again between returns). Violating this contract will result in
     * the same object appearing multiple times in the pool and pool counters
     * (numActive, numIdle) returning incorrect values.</p>
     * 
     * @param obj instance to return to the pool
     */
    public void returnObject(Object obj) throws Exception {
        try {
            addObjectToPool(obj, true);
        } catch (Exception e) {
            if (_factory != null) {
                try {
                    _factory.destroyObject(obj);
                } catch (Exception e2) {
                    // swallowed
                }
                // TODO: Correctness here depends on control in addObjectToPool.
                // These two methods should be refactored, removing the
                // "behavior flag", decrementNumActive, from addObjectToPool.
                synchronized(this) {
                    _numActive--;
                    allocate();
                }
            }
        }
    }
/**
     * <p>Adds an object to the pool.</p>
     * 
     * <p>Validates the object if testOnReturn == true and passivates it before returning it to the pool.
     * if validation or passivation fails, or maxIdle is set and there is no room in the pool, the instance
     * is destroyed.</p>
     * 
     * <p>Calls {@link #allocate()} on successful completion</p>
     * 
     * @param obj instance to add to the pool
     * @param decrementNumActive whether or not to decrement the active count
     * @throws Exception
     */
    private void addObjectToPool(Object obj, boolean decrementNumActive) throws Exception {
        boolean success = true;
        if(_testOnReturn && !(_factory.validateObject(obj))) {
            success = false;
        } else {
            _factory.passivateObject(obj);
        }

        boolean shouldDestroy = !success;

        // Add instance to pool if there is room and it has passed validation
        // (if testOnreturn is set)
        synchronized (this) {
            if (isClosed()) {
                shouldDestroy = true;
            } else {
                if((_maxIdle >= 0) && (_pool.size() >= _maxIdle)) {
                    shouldDestroy = true;
                } else if(success) {
                    // borrowObject always takes the first element from the queue,
                    // so for LIFO, push on top, FIFO add to end
                    if (_lifo) {
                        _pool.addFirst(new ObjectTimestampPair(obj));
                    } else {
                        _pool.addLast(new ObjectTimestampPair(obj));
                    }
                    if (decrementNumActive) {
                        _numActive--;
                    }
                    allocate();
                }
            }
        }

        // Destroy the instance if necessary
        if(shouldDestroy) {
            try {
                _factory.destroyObject(obj);
            } catch(Exception e) {
                // ignored
            }
            // Decrement active count *after* destroy if applicable
            if (decrementNumActive) {
                synchronized(this) {
                    _numActive--;
                    allocate();
                }
            }
        }

    }

其中pool中的默认工厂为:PoolableConnectionFactory。

在还会连接到连接池中,会调用工厂的如下方法:

 
public void passivateObject(Object obj) throws Exception {
    if(obj instanceof Connection) {
        Connection conn = (Connection)obj;
        if(!conn.getAutoCommit() && !conn.isReadOnly()) {
            conn.rollback();
        }
        conn.clearWarnings();
        if(!conn.getAutoCommit()) {
            conn.setAutoCommit(true);
        }
    }
    if(obj instanceof DelegatingConnection) {
        ((DelegatingConnection)obj).passivate();
    }
}

 

从而发现,最外层将连接关闭时,连接会还回工厂同时进行如上判断。

自动提交在显式事物开启是false,同时服务器在5.6.5时,isReadOnly返回为false,所以这儿固定会有一次rollback的操作,同时设置事物自动提交。

至此问题基本找到,是否可以通过升级连接池的版本来解决此问题呢?

对比发现dbcp 2.x开始新增了一个属性来判断是否需要执行此操作:

public void passivateObject(final PooledObject<PoolableConnection> p)
            throws Exception {

        validateLifetime(p);

        final PoolableConnection conn = p.getObject();
        Boolean connAutoCommit = null;
        if (rollbackOnReturn) {
            connAutoCommit = Boolean.valueOf(conn.getAutoCommit());
            if(!connAutoCommit.booleanValue() && !conn.isReadOnly()) {
                conn.rollback();
            }
        }

        conn.clearWarnings();

        // DBCP-97 / DBCP-399 / DBCP-351 Idle connections in the pool should
        // have autoCommit enabled
        if (enableAutoCommitOnReturn) {
            if (connAutoCommit == null) {
                connAutoCommit = Boolean.valueOf(conn.getAutoCommit());
            }
            if(!connAutoCommit.booleanValue()) {
                conn.setAutoCommit(true);
            }
        }

        conn.passivate();
    }

 

代码:

https://github.com/apache/commons-dbcp/blob/master/src/main/java/org/apache/commons/dbcp2/PoolableConnectionFactory.java#L373

但是此操作存在潜在的风险,如果应用代码中,存在事物已经开启了,但是因异常等特殊场景,导致事物未提交,连接复用后,导致两个事物合并提交等问题。

调研阿里开源的数据库连接池发现,可解决此问题。

https://github.com/alibaba/druid/blob/master/src/main/java/com/alibaba/druid/pool/DruidPooledConnection.java#L233

@Override
    public void close() throws SQLException {
        if (this.disable) {
            return;
        }

        DruidConnectionHolder holder = this.holder;
        if (holder == null) {
            if (dupCloseLogEnable) {
                LOG.error("dup close");
            }
            return;
        }

        DruidAbstractDataSource dataSource = holder.getDataSource();
        boolean isSameThread = this.getOwnerThread() == Thread.currentThread();
        
        if (!isSameThread) {
            dataSource.setAsyncCloseConnectionEnable(true);
        }
        
        if (dataSource.isAsyncCloseConnectionEnable()) {
            syncClose();
            return;
        }

        for (ConnectionEventListener listener : holder.getConnectionEventListeners()) {
            listener.connectionClosed(new ConnectionEvent(this));
        }

        
        List<Filter> filters = dataSource.getProxyFilters();
        if (filters.size() > 0) {
            FilterChainImpl filterChain = new FilterChainImpl(dataSource);
            filterChain.dataSource_recycle(this);
        } else {
            recycle();
        }

        this.disable = true;
    }

会执行recycle方法:

public void recycle() throws SQLException {
        if (this.disable) {
            return;
        }

        DruidConnectionHolder holder = this.holder;
        if (holder == null) {
            if (dupCloseLogEnable) {
                LOG.error("dup close");
            }
            return;
        }

        if (!this.abandoned) {
            DruidAbstractDataSource dataSource = holder.getDataSource();
            dataSource.recycle(this);
        }

        this.holder = null;
        conn = null;
        transactionInfo = null;
        closed = true;
    }

会执行数据源的recycle方法:

https://github.com/alibaba/druid/blob/master/src/main/java/com/alibaba/druid/pool/DruidDataSource.java#L1641

/**
     * 回收连接
     */
    protected void recycle(DruidPooledConnection pooledConnection) throws SQLException {
        final DruidConnectionHolder holder = pooledConnection.holder;

        if (holder == null) {
            LOG.warn("connectionHolder is null");
            return;
        }

        if (logDifferentThread //
            && (!isAsyncCloseConnectionEnable()) //
            && pooledConnection.ownerThread != Thread.currentThread()//
        ) {
            LOG.warn("get/close not same thread");
        }

        final Connection physicalConnection = holder.conn;

        if (pooledConnection.traceEnable) {
            Object oldInfo = null;
            activeConnectionLock.lock();
            try {
                if (pooledConnection.traceEnable) {
                    oldInfo = activeConnections.remove(pooledConnection);
                    pooledConnection.traceEnable = false;
                }
            } finally {
                activeConnectionLock.unlock();
            }
            if (oldInfo == null) {
                if (LOG.isWarnEnabled()) {
                    LOG.warn("remove abandonded failed. activeConnections.size " + activeConnections.size());
                }
            }
        }

        final boolean isAutoCommit = holder.underlyingAutoCommit;
        final boolean isReadOnly = holder.underlyingReadOnly;
        final boolean testOnReturn = this.testOnReturn;

        try {
            // check need to rollback?
            if ((!isAutoCommit) && (!isReadOnly)) {
                pooledConnection.rollback();
            }

            // reset holder, restore default settings, clear warnings
            boolean isSameThread = pooledConnection.ownerThread == Thread.currentThread();
            if (!isSameThread) {
                final ReentrantLock lock = pooledConnection.lock;
                lock.lock();
                try {
                    holder.reset();
                } finally {
                    lock.unlock();
                }
            } else {
                holder.reset();
            }

            if (holder.discard) {
                return;
            }

            if (physicalConnection.isClosed()) {
                lock.lock();
                try {
                    activeCount--;
                    closeCount++;
                } finally {
                    lock.unlock();
                }
                return;
            }

            if (testOnReturn) {
                boolean validate = testConnectionInternal(holder, physicalConnection);
                if (!validate) {
                    JdbcUtils.close(physicalConnection);

                    destroyCount.incrementAndGet();

                    lock.lock();
                    try {
                        activeCount--;
                        closeCount++;
                    } finally {
                        lock.unlock();
                    }
                    return;
                }
            }

            if (!enable) {
                discardConnection(holder.conn);
                return;
            }

            boolean result;
            final long lastActiveTimeMillis = System.currentTimeMillis();
            lock.lock();
            try {
                activeCount--;
                closeCount++;

                result = putLast(holder, lastActiveTimeMillis);
                recycleCount++;
            } finally {
                lock.unlock();
            }

            if (!result) {
                JdbcUtils.close(holder.conn);
                LOG.info("connection recyle failed.");
            }
        } catch (Throwable e) {
            holder.clearStatementCache();

            if (!holder.discard) {
                this.discardConnection(physicalConnection);
                holder.discard = true;
            }

            LOG.error("recyle error", e);
            recycleErrorCountUpdater.incrementAndGet(this);
        }
    }

我们发现druid也和dbcp一样,有读取事物是否自动提交和是否只读的操作,为何druid可以解决这个问题呢?

原因为:

@Override
    public void rollback() throws SQLException {
        if (transactionInfo == null) {
            return;
        }

        if (holder == null) {
            return;
        }

        DruidAbstractDataSource dataSource = holder.getDataSource();
        dataSource.incrementRollbackCount();

        try {
            conn.rollback();
        } catch (SQLException ex) {
            handleException(ex, null);
        } finally {
            handleEndTransaction(dataSource, null);
        }
    }

druid的数据连接,会保存本次操作的事物信息,如果事物已经进行提交,那么会清空当前的事物信息,从而判断是否需要回滚时,可以得知当前没有任何操作需要进行回滚操作,这个就是比dbcp高级的地方。

 

至此这个问题已经分析完毕,楼主目前使用druid解决此问题。

展开阅读全文
打赏
0
3 收藏
分享
加载中
更多评论
打赏
0 评论
3 收藏
0
分享
返回顶部
顶部