
Friday, October 31, 2014

stuck threads

Hogging threads are candidates for stuck threads: threads that “might” get stuck. A hogging thread is declared “stuck” after StuckThreadMaxTime seconds, which by default is 600 seconds.
If the request releases the thread before this timeout, it is no longer counted as a hogging thread and goes back to the thread pool.
Hogging threads that have taken too much time can be assumed to never come back.
The hogging thread count also helps us take decisions: if many threads are hogging, the server may decide to create new threads for the next cycle.
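For reference, the stuck-thread timeout can be checked from WLST. A minimal sketch follows; the credentials, URL and server name are placeholders, and the attribute names are taken from the ServerMBean, so verify them against your WebLogic version:

# Read the stuck-thread tuning attributes of a server (placeholder names).
connect('weblogic', 'weblogic1', 't3://localhost:7001')
cd('/Servers/AdminServer')
print cmo.getStuckThreadMaxTime()        # seconds before a hogging thread is declared stuck (default 600)
print cmo.getStuckThreadTimerInterval()  # how often WebLogic scans for stuck threads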
Understanding Thread States in WebLogic Server:
ACTIVE
STUCK
STANDBY
A live thread that is ready to process requests is in the ACTIVE state; a newly created thread starts out in this state. WebLogic Server starts the server instance with 1 ACTIVE thread, and the thread count grows to the minimum size if one is specified; otherwise the pool self-tunes according to the request load.
A thread might wait for another thread to release a resource. This can happen because of application variables: variables are of two kinds, thread-safe and thread-unsafe. Local variables inside methods are thread-safe; variables defined at class level are not, which can cause memory leaks. Threads waiting in this state are known as hogging threads. WebLogic identifies a thread as a hog by a time interval: if a thread waits for more than 600 seconds it will be treated as a hog. The stuck thread interval can be tuned as per the project's need.
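To illustrate the thread-safe versus thread-unsafe point outside WebLogic, here is a plain Python sketch; the class and variable names are made up for the example:

import threading

class RequestHandler:
    shared_total = 0                          # class-level variable: shared by every thread (not thread-safe)

    def handle(self):
        local_total = 0                       # local variable: each thread gets its own copy (thread-safe)
        for _ in range(100000):
            local_total += 1
            RequestHandler.shared_total += 1  # unsynchronized read-modify-write; updates can be lost

handler = RequestHandler()
workers = [threading.Thread(target=handler.handle) for _ in range(4)]
for w in workers: w.start()
for w in workers: w.join()
print "shared_total (often below 400000 because of lost updates):", RequestHandler.shared_total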
If the HoggingThreadCount keeps increasing, the server health is in danger. That is the time you can take a thread dump.
After the threads increase to maximum utilization, the extra threads will be in the STANDBY state.
WebLogic Server Health Status can be one of the following:
HEALTH_OK
HEALTH_WARN
HEALTH_FAILED
HEALTH_CRITICAL
LOW_MEMORY_REASON
HEALTH_OVERLOADED
OK indicates everything is fine, no worries!!
WARN is raised when there are a few stuck threads in the server instance.
LOW_MEMORY_REASON tells you that a JVM crash can be expected. You can configure the managed server to ‘Exit’ on low memory conditions with the help of the NodeManager and Work Manager.
CRITICAL is raised when multiple stuck threads occur and the thread pool count reaches an unusual number. In this case you need to suspect trouble with the network, JDBC or back-end connectivity.
FAILED happens when a new deployment fails. The NodeManager should not restart this managed server.
OVERLOADED is set when the server is overloaded. The NodeManager needs to act on this state and bounce such a WebLogic instance. This is a feature of WebLogic 9.x and later versions for detecting, avoiding and recovering from an overload condition on a WebLogic managed server. Overload protection can be used to throttle Work Managers and thread pools for performance. You can configure the Work Manager or the application to shut down on stuck threads when the count crosses more than 5, or set your own threshold.
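These overload-protection settings can also be read with WLST. In the sketch below the credentials and server name are placeholders, the bean-instance names are assumed to match the server name, and the MBean attributes (OverloadProtection and ServerFailureTrigger) should be verified against your release:

# Read the overload-protection settings of a server (placeholder names).
connect('weblogic', 'weblogic1', 't3://localhost:7001')
cd('/Servers/AdminServer/OverloadProtection/AdminServer')
print cmo.getFailureAction()           # action taken when the server is marked failed
print cmo.getPanicAction()             # action taken on a panic condition such as low memory
cd('ServerFailureTrigger/AdminServer')
print cmo.getMaxStuckThreadTime()      # seconds after which a thread counts as stuck
print cmo.getStuckThreadCount()        # number of stuck threads that mark the server as failed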

Wednesday, October 29, 2014

Server Monitoring script

def reportDomainHealth(usern, passw, url):
    # Connect to the admin server and print one status line per server in the domain.
    print ""
    print "===================================================================================="
    connect(usern, passw, url)
    # Switch to the domain runtime tree so that every server's runtime MBeans are visible.
    domainRuntime()

    print "Found Servers: "
    serverList = ls('ServerRuntimes')
    serverList = serverList.split()
    print "%15s %15s %20s %15s %15s %40s" % ("Server", "Threads", "HoggingThreads", "ServerState", "Heap_Free", "HealthState")
    print "----------------------------------------------------------------------------------------------------------------------------"
    for i in range(len(serverList)):
        # ls() output also contains permission tokens such as 'dr--'; skip those and keep the server names.
        if serverList[i] != 'dr--':
            server_st   = get('ServerRuntimes/' + serverList[i] + '/HealthState')
            server_tc   = get('ServerRuntimes/' + serverList[i] + '/ThreadPoolRuntime/ThreadPoolRuntime/ExecuteThreadTotalCount')
            server_hog  = get('ServerRuntimes/' + serverList[i] + '/ThreadPoolRuntime/ThreadPoolRuntime/HoggingThreadCount')
            server_ql   = get('ServerRuntimes/' + serverList[i] + '/State')
            server_hpfp = get('ServerRuntimes/' + serverList[i] + '/JVMRuntime/' + serverList[i] + '/HeapFreePercent')
            print "%15s %15s %20s %15s %15s %40s" % (serverList[i], str(server_tc), str(server_hog), str(server_ql), str(server_hpfp) + "%", str(server_st))
    print "===================================================================================="


reportDomainHealth('weblogic','weblogic1','t3://localhost:7001')
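The script can be saved to a file (for example domainHealth.py, a name used here just for illustration) and run with the WLST launcher after sourcing setWLSEnv.sh (under the server/bin directory of your WebLogic installation) so that weblogic.jar is on the classpath:

java weblogic.WLST domainHealth.py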

Thursday, October 23, 2014

Delete files older than n days in a folder

find * -mtime +n -exec rm {} \;

n=1 deletes files last modified more than 1 day ago

n=2 deletes files last modified more than 2 days ago

n=7 deletes files last modified more than 7 days ago

Wednesday, October 8, 2014

Vi Search and Replace Commands

Vi: Search and Replace

Change to normal mode with <ESC>.
Search (Wrapped around at end of file):
  Search STRING forward :   / STRING.
  Search STRING backward:   ? STRING.

  Repeat search:   n
  Repeat search in opposite direction:   (SHIFT-n)

Replace: Same as with sed, Replace OLD with NEW:

 First occurrence on current line:      :s/OLD/NEW
  
 Globally (all) on current line:        :s/OLD/NEW/g 

 Between two lines #,#:                 :#,#s/OLD/NEW/g
  
 Every occurrence in file:              :%s/OLD/NEW/g 

Sunday, October 5, 2014

Time out while waiting for a managed process to stop HTTP_Server

Sometimes we find the OHS server in stopped mode even though opmnctl startall reports that it is starting:


opmnctl startall: starting opmn and all managed processes...
oracle@localhost [/l01/apps/oracle/middleware/Oracle_WT1/instances/ohs1/bin] opmnctl status

Processes in Instance: ohs1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status
---------------------------------+--------------------+---------+---------

ohs1                             | OHS                |   26508 | Stop


When we try to start this process normally, we get:



oracle@localhost [/l01/apps/oracle/middleware/Oracle_WT1/instances/ohs1/bin] opmnctl startproc ias-component=ohs1
opmnctl startproc: starting opmn managed processes...
================================================================================
opmn id=uslx148:6701
  0 of 0 processes started.
  Processes are already started: ohs1~ohs1~OHS~OHS


When we face this problem, we need to follow the process below:


Step 1:

ps -ef | grep ohs

Step 2:

Kill all related processes:

kill -9 <process-id>

Step 3:


oracle@localhost [/l01/apps/oracle/middleware/Oracle_WT1/instances/ohs1/bin] opmnctl startall
opmnctl startall: starting opmn and all managed processes...
oracle@localhost [/l01/apps/oracle/middleware/Oracle_WT1/instances/ohs1/bin] opmnctl status