Hystrix Semaphore timeout

2018/10/03 22:17
阅读数 1.8K

When to use semaphore

For Thread isolation. there is a thread context switch cost. but this is almost can be ignored in most application. see Thread pool

For circuits that wrap very low-latency requests (such as those that primarily hit in-memory caches) the overhead can be too high and in those cases you can use another method such as tryable semaphores which, while they do not allow for timeouts, provide most of the resilience benefits without the overhead. The overhead in general, however, is small enough that Netflix in practice usually prefers the isolation benefits of a separate thread over such techniques.


It does not allow for timing out and walking away.

why? show me the code.

     * Semaphore that only supports tryAcquire and never blocks and that supports a dynamic permit count.
     * <p>
     * Using AtomicInteger increment/decrement instead of java.util.concurrent.Semaphore since we don't need blocking and need a custom implementation to get the dynamic permit count and since
     * AtomicInteger achieves the same behavior and performance without the more complex implementation of the actual Semaphore class using AbstractQueueSynchronizer.
    /* package */static class TryableSemaphoreActual implements TryableSemaphore {
        protected final HystrixProperty<Integer> numberOfPermits;
        private final AtomicInteger count = new AtomicInteger(0);

        public TryableSemaphoreActual(HystrixProperty<Integer> numberOfPermits) {
            this.numberOfPermits = numberOfPermits;

        public boolean tryAcquire() {
            int currentCount = count.incrementAndGet();
            if (currentCount > numberOfPermits.get()) {
                return false;
            } else {
                return true;

        public void release() {

        public int getNumberOfPermitsUsed() {
            return count.get();


通过计数的形式实现信号量机制。 Since1.4.4 Hystrix 提供了信号量的timeOut 机制,但是timeout 不会中断原来的线程,但是在timeout 发生的时候,把这个信息反馈到熔断器上,这样能做到更加实时的熔断机制。timeout 也是通过Timer 来实现的。

       Observable<R> execution;
        if (properties.executionTimeoutEnabled().get()) {
            execution = executeCommandWithSpecifiedIsolation(_cmd)
                    .lift(new HystrixObservableTimeoutOperator<R>(_cmd));
        } else {
            execution = executeCommandWithSpecifiedIsolation(_cmd);
		TimerListener listener = new TimerListener() {

                public void tick() {
                    // if we can go from NOT_EXECUTED to TIMED_OUT then we do the timeout codepath
                    // otherwise it means we lost a race and the run() execution completed or did not start
                    if (originalCommand.isCommandTimedOut.compareAndSet(TimedOutStatus.NOT_EXECUTED, TimedOutStatus.TIMED_OUT)) {
                        // report timeout failure
                        originalCommand.eventNotifier.markEvent(HystrixEventType.TIMEOUT, originalCommand.commandKey);

                        // shut down the original request

                        final HystrixContextRunnable timeoutRunnable = new HystrixContextRunnable(originalCommand.concurrencyStrategy, hystrixRequestContext, new Runnable() {

                            public void run() {
                                child.onError(new HystrixTimeoutException());

                        //if it did not start, then we need to mark a command start for concurrency metrics, and then issue the timeout

                public int getIntervalTimeInMilliseconds() {
                    return originalCommand.properties.executionTimeoutInMilliseconds().get();

Note: if a dependency is isolated with a semaphore and then becomes latent, the parent threads will remain blocked until the underlying network calls timeout. Semaphore rejection will start once the limit is hit but the threads filling the semaphore can not walk away.



ab -n 200 -c 200 http://localhost:8990/coke/block
Percentage of the requests served within a certain time (ms)
  50%   3057
  66%   3238
  75%   3274
  80%   3321
  90%   3510
  95%   3516
  98%   3523
  99%   3524
 100%   3526 (longest request)


Zuul and Hystrix timeout

As we know Zuul default use HystrixCommand with in ribbon http client.Default Hystrix isolation pattern (ExecutionIsolationStrategy) for all routes is SEMAPHORE.

Zuul 默认使用信号量做隔离(因为Zuul主要做请求转发已经是线程隔离的了,所以没有必要再使用一次线程隔离),超时由HttpClient的timeout 设置,当请求timeout之后抛出异常,然后才会触发对应的熔断降级。 如果使用线程池做隔离,则超时实践如下:

(ribbon.ConnectTimeout + ribbon.ReadTimeout) * (ribbon.MaxAutoRetries + 1) * (ribbon.MaxAutoRetriesNextServer + 1)

Zuul default use serviceId as commandKey, default semophore is 100.


If you trust the client and you only want load shedding, you could use this approach.(Semaphore)

当我们的调用方有可能出现延迟,并且qps很高的时候(如果这个时候使用线程池,你可能需要创建很大数量的线程池比如,qps2000,响应时间1秒,这个时候就需要 2000大小的线程,会额外带来大量的线程切换开销),这个时候我们可以使用alibaba/Sentinel 来做熔断降级。


Sentinel 与 Hystrix 的对比

0 收藏
Zuul如果使用semaphore隔离,zuul server接收请求和执行转发调用的都是同一个线程吧? 我感觉zuul之所以默认用信号量隔离,是因为网关是多扇出的,后端可能有几十个微服务,每个都有一个线程池的话,线程会很多
2018/11/21 18:16
1 评论
0 收藏