Problem description
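We run several Tomcat instances in production, and at some point every request suddenly stopped getting a response. Our web server is nginx, which reverse-proxies requests to Tomcat, so my first suspicion was that nginx was not receiving the requests at all. The logs said otherwise: nginx was logging a large number of 499 responses (the client closed the connection before nginx could reply), which meant the problem was on the Tomcat side after all.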
Troubleshooting
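My first thought was that the CPU might be maxed out. No CPU alert had fired, but I instinctively ran top to check the system load anyway. The load was only 0.x, and CPU and memory usage were both normal.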
Since the CPU looked normal, GC was unlikely to be the cause, but I checked the GC log anyway. As expected, GC was fine too.
At this point it was time to bring in jstack. Sure enough, the dump showed a large number of threads in the WAITING state:
"http-nio-127.0.0.1-801-exec-498" daemon prio=10 tid=0x00002ada7c14f800 nid=0x16a6 waiting on condition [0x00002ada9c905000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00000007873e6990> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
        at org.apache.http.pool.PoolEntryFuture.await(PoolEntryFuture.java:133)
        at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:282)
        at org.apache.http.pool.AbstractConnPool.access$000(AbstractConnPool.java:64)
        at org.apache.http.pool.AbstractConnPool$2.getPoolEntry(AbstractConnPool.java:177)
        at org.apache.http.pool.AbstractConnPool$2.getPoolEntry(AbstractConnPool.java:170)
        at org.apache.http.pool.PoolEntryFuture.get(PoolEntryFuture.java:102)
        at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:240)
        at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:227)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:173)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:195)
        at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:85)
        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:108)
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
        at com.weimai.utils.HttpClientUtil.doGet(HttpClientUtil.java:105)
        at com.weimai.utils.HttpClientUtil.doGet(HttpClientUtil.java:87)
        at com.weimai.utils.WeiBoUtil.checkUser(WeiBoUtil.java:214)
        at com.weimai.web.UserInfoController.newWeiboLogin(UserInfoController.java:1223)
        at sun.reflect.GeneratedMethodAccessor390.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
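When a dump contains hundreds of threads, it can help to tally them by state before reading individual stacks. The following is a minimal sketch of that idea, not a tool used in the original investigation: the class name ThreadStateTally and the sample dump in main are made up for illustration, and the only assumption about the input is that each thread has a "java.lang.Thread.State: ..." line, as in the trace above.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts threads per state in the text of a jstack dump.
class ThreadStateTally {
    // Matches lines such as "java.lang.Thread.State: WAITING (parking)".
    private static final Pattern STATE =
            Pattern.compile("java\\.lang\\.Thread\\.State: (\\w+)");

    /** Returns a map from thread state to the number of threads in that state. */
    static Map<String, Integer> tally(String dump) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = STATE.matcher(dump);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample; in practice this would be the saved output of jstack.
        String sample =
              "\"exec-1\" daemon prio=10\n"
            + "   java.lang.Thread.State: WAITING (parking)\n"
            + "\"exec-2\" daemon prio=10\n"
            + "   java.lang.Thread.State: WAITING (parking)\n"
            + "\"exec-3\" daemon prio=10\n"
            + "   java.lang.Thread.State: RUNNABLE\n";
        System.out.println(tally(sample)); // prints {RUNNABLE=1, WAITING=2}
    }
}
```

A tally dominated by WAITING threads that all park in the same frame, here the connection pool's getPoolEntryBlocking, is a strong hint that they are queuing for one shared resource.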
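That made it clear the problem was in the HTTP connections. I immediately used netstat to check the connection states on port 801 and, sure enough, found many of them in CLOSE_WAIT. A quick explanation of CLOSE_WAIT: if our client program is in CLOSE_WAIT, its socket was closed passively, that is, the peer closed first. The whole sequence goes like this: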
When the server side actively drops the connection, the two sides need four packets in total to close the TCP connection:
server -> FIN -> client
server <- ACK <- client
At this point the server is in FIN_WAIT_2 and our program is in CLOSE_WAIT.
server <- FIN <- client
Now the client sends its own FIN to the server and moves to LAST_ACK.
server -> ACK -> client
Only after the server replies with an ACK is the client's socket actually set to CLOSED.
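The passive-close half of this sequence can be reproduced on the loopback interface. The sketch below is illustrative only (the class name PassiveCloseDemo is made up): a throwaway server closes the connection first, and the client's read() returning -1 shows that the peer's FIN has arrived. From that moment until the client calls close(), the operating system keeps the client socket in CLOSE_WAIT; Java has no API to read that state, so it would have to be observed with netstat while the program is paused.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo of a passive close on the client side.
class PassiveCloseDemo {
    /**
     * Connects to a loopback server that closes immediately and returns what
     * the client's read() reports. -1 (end of stream) means the FIN arrived.
     */
    static int readAfterPeerClose() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            // The server side closes first: server -> FIN -> client.
            server.accept().close();
            // read() blocks until data or the peer's FIN arrives.
            int result = client.getInputStream().read();
            // Between here and the close() below, the client socket sits in CLOSE_WAIT.
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } // try-with-resources closes the client here, which sends the client's FIN.
    }

    public static void main(String[] args) {
        System.out.println("read() returned " + readAfterPeerClose());
    }
}
```

If the client never reaches its close(), for example because an exception skipped the cleanup code, the socket stays in CLOSE_WAIT for as long as the process holds it.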
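Our connections were in CLOSE_WAIT rather than LAST_ACK, which means our side had not yet sent its FIN to the server. From there it was simple: look at how HttpClientUtil handles its connections. Sure enough, the code did not call abort on HTTP connections that ended abnormally, so those connections were presumably never handed back to the pool, which matches the threads parked in the pool's getPoolEntryBlocking in the stack trace above. After filling in the try/catch/finally blocks properly, the problem was solved.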
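The original HttpClientUtil code is not shown, so the following is only a toy model of the failure and the fix, not the actual change. ToyPool is a hypothetical stand-in for the HTTP connection pool, built on a Semaphore so the example runs with the standard library alone. In the real code the release step would be whatever returns the connection to HttpClient's pool, such as the abort mentioned above, placed in a finally block.

```java
import java.util.concurrent.Semaphore;

// Hypothetical model: a leaked lease on the error path versus release in finally.
class LeaseReleaseDemo {
    /** Toy stand-in for a connection pool: a fixed number of permits. */
    static final class ToyPool {
        private final Semaphore permits;
        ToyPool(int size) { permits = new Semaphore(size); }
        boolean lease() { return permits.tryAcquire(); } // non-blocking for the demo
        void release() { permits.release(); }
        int available() { return permits.availablePermits(); }
    }

    /** Leaky variant: when the call fails, the connection is never returned. */
    static void leakyCall(ToyPool pool) {
        if (!pool.lease()) throw new IllegalStateException("pool exhausted");
        simulateFailure();  // throws, so the release below is skipped
        pool.release();
    }

    /** Fixed variant: the connection is returned on every path. */
    static void safeCall(ToyPool pool) {
        if (!pool.lease()) throw new IllegalStateException("pool exhausted");
        try {
            simulateFailure();
        } finally {
            pool.release();
        }
    }

    private static void simulateFailure() {
        throw new RuntimeException("remote call failed");
    }

    /** Runs the given number of failing calls and reports how many connections remain. */
    static int availableAfterFailures(boolean safe, int poolSize, int failures) {
        ToyPool pool = new ToyPool(poolSize);
        for (int i = 0; i < failures; i++) {
            try {
                if (safe) safeCall(pool); else leakyCall(pool);
            } catch (RuntimeException expected) {
                // The caller survives the failure; only the pool state matters here.
            }
        }
        return pool.available();
    }

    public static void main(String[] args) {
        System.out.println("leaky: " + availableAfterFailures(false, 2, 3) + " of 2 left"); // leaky: 0 of 2 left
        System.out.println("safe:  " + availableAfterFailures(true, 2, 3) + " of 2 left");  // safe:  2 of 2 left
    }
}
```

The toy pool fails fast when it is empty. A pool whose lease call blocks without a timeout would instead park the calling thread, which is the behaviour the WAITING threads in the dump suggest.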