Please help me understand the possible reasons for a kernel panic, and how to sort out the problem.
SonaliDive786
<sonalidive786@gmail.com>
How to Troubleshoot Linux Kernel Panics?
Problem Description:
Kernel panics on Linux are hard to identify and
troubleshoot. Troubleshooting kernel panics often
requires reproducing a situation that occurs rarely
and collecting data that is difficult to gather.
Technical Discussion:
What is a kernel panic?
As the name implies, the Linux kernel gets into a situation where it doesn't know what to do next. When this happens, the kernel reports as much information as it can about the cause; how much it can report depends on what caused the panic.
There are two main kinds of kernel panics:
1. Hard panic - also known as "Aieee!"
2. Soft panic - also known as "Oops"
What can cause a kernel panic?
Only modules and other code running in kernel space can directly cause the kernel to panic. To see which modules are dynamically loaded, run lsmod, which lists all dynamically loaded modules (Dialogic drivers, LiS, SCSI driver, filesystem, etc.). In addition to these dynamically loaded modules, components that are built into the kernel (memory map, etc.) can cause a panic.
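Because lsmod prints a plain-text table, its output is easy to post-process, for example to check whether any Dialogic streams-xxxxDriver modules are loaded. The following is a minimal Python sketch, assuming the usual lsmod layout (module name, size and use count in the first three columns); parse_lsmod and modules_with_prefix are hypothetical helper names, not part of any tool.

```python
def parse_lsmod(text):
    """Parse lsmod-style text into (name, size, use_count) tuples.

    Assumes the header line starts with "Module" and that each module line
    begins with: module name, size in bytes, use count.
    """
    modules = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module  Size  Used by" header.
        if len(fields) < 3 or fields[0] == "Module":
            continue
        name, size, used = fields[:3]
        # Ignore anything that does not look like a module line.
        if not (size.isdigit() and used.isdigit()):
            continue
        modules.append((name, int(size), int(used)))
    return modules


def modules_with_prefix(modules, prefix="streams-"):
    """Names of loaded modules starting with `prefix` (e.g. streams-dlgnDriver)."""
    return [name for name, _size, _used in modules if name.startswith(prefix)]
```

To use it on a live system, capture the output of lsmod (for example with subprocess.run(["lsmod"], capture_output=True, text=True).stdout) and pass that string to parse_lsmod.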
Since hard panics and soft panics are different in
nature, we will discuss how to deal with each
separately.
How to Troubleshoot a Hard Kernel Panic
Hard panics - symptoms:
1. The machine is completely locked up and unusable.
2. The Num Lock / Caps Lock / Scroll Lock lights usually blink.
3. If in console mode, a dump is displayed on the monitor (including the phrase "Aieee!").
4. Similar to the Windows Blue Screen.
Hard panics - causes:
The most common cause of a hard kernel panic is a driver crashing within an interrupt handler, usually because it dereferenced a null pointer there. When this happens, that driver cannot handle any new interrupts and eventually the system crashes. This is not exclusive to Dialogic drivers.
Hard panics - information to collect:
Depending on the nature of the panic, the kernel will
log all information it can prior to locking up. Since
a kernel panic is a drastic failure, it is uncertain
how much information will be logged. Below are key
pieces of information to collect. It is important to
collect as many of these as possible, but there is no
guarantee that all of them will be available,
especially the first time a panic is seen.
1. /var/log/messages: sometimes the entire kernel panic stack trace will be logged there
2. Application / library logs (RTF, cheetah, etc.): may show what was happening before the panic
3. Other information about what happened just prior to the panic, or how to reproduce it
4. Screen dump from the console. Since the OS is locked, you cannot cut and paste from the screen. There are two common ways to get this info:
o Digital picture of the screen (preferred, since it's quicker and easier)
o Copying the screen with pen and paper or typing it into another computer
If the dump is not available either in /var/log/messages or on the screen, follow these tips to get a dump:
1. If in GUI mode, switch to full console mode: no dump info is passed to the GUI (not even to a GUI shell).
2. Make sure the screen stays on during the full test run: if a screen saver kicks in, the screen won't return after a kernel panic. Use these settings to ensure the screen stays on:
o setterm -blank 0
o setterm -powerdown 0
o setvesablank off
3. From the console, copy the dump from the screen (see above).
Hard panics - troubleshooting when a full trace is available
The stack trace is the most important piece of information for troubleshooting a kernel panic. A full stack trace is often crucial, and it may not be available if only a screen dump is provided: the top of the stack may scroll off the screen, leaving only a partial stack trace. If a full trace is available, it is usually sufficient to isolate the root cause. To check whether you have a large enough stack trace, look for a line with EIP, which shows the function call and module that caused the panic. In the example below, this is the following line:
EIP is at _dlgn_setevmask [streams-dlgnDriver ] 0xe
If the culprit is a Dialogic driver you will see a
module name with:
streams-xxxxDriver (xxxx = dlgn, dvbm, mercd, etc.)
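The EIP check described above is easy to automate once the trace has been saved as text. A minimal Python sketch, assuming the "EIP is at function [module] offset" layout shown in the example that follows; find_eip and dialogic_tag are hypothetical helpers, not part of ksymoops or any Dialogic tool.

```python
import re

# Matches e.g. "EIP is at _dlgn_setevmask [streams-dlgnDriver ] 0xe".
# Groups: function, module (stray spaces inside the brackets tolerated), hex offset.
EIP_RE = re.compile(r"EIP is at\s+(\S+)\s+\[\s*([^\]\s]+)\s*\]\s+(0x[0-9a-fA-F]+)")

# Dialogic driver modules: streams-xxxxDriver (xxxx = dlgn, dvbm, mercd, ...).
# The underscore form seen in some traces (streams_dlgnDriver) is accepted too.
DIALOGIC_RE = re.compile(r"^streams[-_](\w+)Driver$")


def find_eip(trace):
    """Return (function, module, offset) from the first 'EIP is at' line, or None."""
    for line in trace.splitlines():
        m = EIP_RE.search(line)
        if m:
            func, module, offset = m.groups()
            return func, module, int(offset, 16)
    return None  # no EIP line: probably only a partial trace


def dialogic_tag(module):
    """Return the xxxx part of a streams-xxxxDriver module name, or None."""
    m = DIALOGIC_RE.match(module)
    return m.group(1) if m else None
```

A None from find_eip corresponds to the partial-trace case discussed further on.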
Hard panic - full trace example:
Unable to handle kernel NULL pointer dereference at virtual address 0000000c
printing eip:
f89e568a
*pde = 32859001
*pte = 00000000
Oops: 0000
Kernel 2.4.9-31enterprise
CPU: 1
EIP: 0010:[] Tainted: PF
EFLAGS: 00010096
EIP is at _dlgn_setevmask [streams-dlgnDriver ] 0xe
eax: 00000000 ebx: f65f5410 ecx: f5e16710 edx: f65f5410
esi: 00001ea0 edi: f5e23c30 ebp: f65f5410 esp: f1cf7e78
ds: 0018 es: 0018 ss: 0018
Process pwcallmgr (pid: 10334, stackpage=f1cf7000)
Stack: 00000000 c01067fa 00000086 f1cf7ec0 00001ea0 f5e23c30 f65f5410 f89e53ec
f89fcd60 f5e16710 f65f5410 f65f5410 f8a54420 f1cf7ec0 f8a4d73a 0000139e
f5e16710 f89fcd60 00000086 f5e16710 f5e16754 f65f5410 0000034a f894e648
Call Trace: [setup_sigcontext+218/288]
setup_sigcontext [kernel] 0xda
Call Trace: [] setup_sigcontext [kernel] 0xda
[] dlgnwput [streams-dlgnDriver ] 0xe8
[] Sm_Handle [streams-dlgnDriver ] 0x1ea0
[] intdrv_lock [streams-dlgnDriver ] 0x0
[] Gn_Maxpm [streams-dlgnDriver ] 0x8ba
[] Sm_Handle [streams-dlgnDriver ] 0x1ea0
[] lis_safe_putnext [streams] 0x168
[] __insmod_streams-dvbmDriver_S.bss_L117376 [streams-dvbmDriver ] 0xab8
[] dvbmwput [streams-dvbmDriver ] 0x6f5
[] dvwinit [streams-dvbmDriver ] 0x2c0
[] lis_safe_putnext [streams] 0x168
[] lis_strputpmsg [streams] 0x54c
[] __insmod_streams_S.rodata_L35552 [streams] 0x182e
[] sys_putpmsg [streams] 0x6f
[system_call+51/56] system_call [kernel] 0x33
[] system_call [kernel] 0x33
Nov 28 12:17:58 talus kernel:
Nov 28 12:17:58 talus kernel:
Code: 8b 70 0c 8b 06 83 f8 20 8b 54 24 20 8b 6c 24 24 76 1c 89 5c
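Beyond the EIP line, it can help to list which modules appear in the call trace and in what order. A minimal Python sketch, assuming decoded frames of the form "symbol [module] 0xoffset" as in the example above; frames whose brackets hold only a raw symbol+offset (such as "[setup_sigcontext+218/288]") are not matched, and parse_frames / modules_in_order are hypothetical names.

```python
import re

# One decoded frame: symbol, "[module]" (stray spaces tolerated), hex offset.
FRAME_RE = re.compile(r"(\S+)\s+\[\s*([^\]\s]+)\s*\]\s+(0x[0-9a-fA-F]+)")


def parse_frames(trace):
    """Return (symbol, module, offset) tuples for every decoded frame in `trace`."""
    frames = []
    for line in trace.splitlines():
        if "EIP is at" in line:
            continue  # the faulting function is reported separately
        for m in FRAME_RE.finditer(line):
            symbol, module, offset = m.groups()
            frames.append((symbol, module, int(offset, 16)))
    return frames


def modules_in_order(frames):
    """Modules in order of first appearance in the trace, without duplicates."""
    seen = []
    for _symbol, module, _offset in frames:
        if module not in seen:
            seen.append(module)
    return seen
```

On the example above, this would show the Dialogic streams modules sitting between the system-call entry and the faulting driver function.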
Hard panics - troubleshooting when a full trace is not available
If only a partial stack trace is available, it can be tricky to isolate the root cause, since there is no explicit information about which module or function call caused the panic. Instead, a partial stack trace shows only the calls leading up to the final one. In this case, it is very important to collect as much information as possible about what happened leading up to the kernel panic (application logs, library traces, steps to reproduce, etc.).
Hard panic - partial trace example (note there is no line with EIP information):
[] ip_rcv [kernel] 0x357
[] sramintr [streams_dlgnDriver ] 0x32d
[] lis_spin_lock_irqsave_fcn [streams] 0x7d
[] inthw_lock [streams_dlgnDriver ] 0x1c
[] pwswtbl [streams_dlgnDriver ] 0x0
[] dlgnintr [streams_dlgnDriver ] 0x4b
[] Gn_Maxpm [streams_dlgnDriver ] 0x7ae
[] __run_timers [kernel] 0xd1
[] handle_IRQ_event [kernel] 0x5e
[] do_IRQ [kernel] 0xa4
[] default_idle [kernel] 0x0
[] default_idle [kernel] 0x0
[] call_do_IRQ [kernel] 0x5
[] default_idle [kernel] 0x0
[] default_idle [kernel] 0x0
[] default_idle [kernel] 0x2d
[] cpu_idle [kernel] 0x2d
[] __call_console_drivers [kernel] 0x4b
[] call_console_drivers [kernel] 0xeb
Code: 8b 50 0c 85 d2 74 31 f6 42 0a 02 74 04 89 44 24 08 31 f6 0f
<0> Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
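The two clues used above, whether an EIP line is present and whether the kernel reports dying in an interrupt handler, can be checked mechanically on a saved dump. A minimal Python sketch; classify_dump is a hypothetical helper that does plain substring matching on the message texts shown in this article, so other kernel versions may word things differently.

```python
def classify_dump(text):
    """Summarize a captured dump using two clues from this article.

    has_eip:      an 'EIP is at' line is present, i.e. the trace is full enough
                  to name the faulting function and module.
    in_interrupt: the kernel says it died inside an interrupt handler, the
                  typical hard-panic ("Aiee") situation.
    """
    return {
        "has_eip": "EIP is at" in text,
        "in_interrupt": ("killing interrupt handler" in text
                         or "In interrupt handler" in text),
    }
```

For the partial example above, this reports no EIP line and an interrupt-handler panic, which is the case where KDB or extra application logs become necessary.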
Hard panics - using the kernel debugger (KDB)
If only a partial trace is available and the supporting information is not sufficient to isolate the root cause, it may be useful to use KDB. KDB is a debugger compiled into the kernel that makes the kernel break into a debugger shell, rather than lock up, when a panic occurs. This lets you collect additional information about the panic, which is often useful in determining the root cause.
Some important things to note about using KDB:
1. If this is a potential Dialogic issue, contact technical support before using KDB.
2. You must use a base kernel, i.e. the 2.4.18 kernel instead of 2.4.18-5 from Red Hat. This is because KDB is only available for the base kernels, not for the builds created by Red Hat. While this does create a slight deviation from the original configuration, it usually does not interfere with root cause analysis.
3. You need Dialogic drivers compiled for that specific kernel.
How to Troubleshoot a Soft Kernel Panic
Soft panics - symptoms:
1. Much less severe than a hard panic.
2. Usually results in a segmentation fault.
3. An oops message can be seen: search /var/log/messages for the string 'Oops'.
4. The machine is still somewhat usable (but should be rebooted after information is collected).
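The search in item 3 can be done with grep, or with a short Python sketch like the one below, which also keeps the lines that follow each match, since that is where the register dump and call trace normally appear. find_oops and its 20-line window are assumptions for illustration.

```python
def find_oops(lines, context=20):
    """Return (line_number, excerpt) pairs for every log line containing 'Oops'.

    `lines` is a list of log lines, e.g. open("/var/log/messages").readlines().
    Each excerpt holds the matching line plus up to `context` following lines.
    Line numbers are 1-based, like those shown by grep -n.
    """
    hits = []
    for index, line in enumerate(lines):
        if "Oops" in line:
            window = lines[index:index + context + 1]
            hits.append((index + 1, [entry.rstrip("\n") for entry in window]))
    return hits
```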
Soft panics - causes:
Almost anything that causes a module to crash when it is not within an interrupt handler can cause a soft panic. In this case, the driver itself will crash, but it will not cause a catastrophic system failure, since it was not executing in an interrupt handler. The possible causes of soft panics are the same as for hard panics (e.g. dereferencing a null pointer at runtime).
Soft panics - information to collect:
When a soft panic occurs, the kernel generates a dump that is logged in /var/log/messages. To begin troubleshooting, use the ksymoops utility to decode the kernel symbols and addresses in that dump into meaningful data.
To generate a ksymoops file:
1. Create a new file from the text of the stack trace found in /var/log/messages. Make sure to strip off the timestamps, otherwise ksymoops will fail.
2. Run ksymoops on the new stack trace file:
Generic: ksymoops -o [location of Dialogic drivers] filename
Example: ksymoops -o /lib/modules/2.4.18-5/misc ksymoops.log
All other defaults should work fine.
For a man page on ksymoops, see the following webpage:
http://gd.tuwien.ac.at/linuxcommand.org/man_pages/ksymoops8.html
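Step 1 (stripping the timestamps) and the command in step 2 can be scripted. A minimal Python sketch, assuming the classic syslog prefix "Mon DD HH:MM:SS host kernel: " seen in the trace earlier; strip_timestamps and ksymoops_command are hypothetical helpers, and the module directory is whatever applies on your system (the example above uses /lib/modules/2.4.18-5/misc).

```python
import re

# Assumed syslog prefix, e.g. "Nov 28 12:17:58 talus kernel: "
# (month, day, time, host name, then "kernel:").
SYSLOG_PREFIX = re.compile(
    r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+kernel:\s?"
)


def strip_timestamps(lines):
    """Drop the syslog prefix from each line so ksymoops sees only the oops text."""
    return [SYSLOG_PREFIX.sub("", line.rstrip("\n")) for line in lines]


def ksymoops_command(module_dir, trace_file):
    """Argument list for: ksymoops -o <module_dir> <trace_file>."""
    return ["ksymoops", "-o", module_dir, trace_file]
```

The sketch does not run ksymoops itself: write the stripped lines to a file, then pass the list from ksymoops_command to subprocess.run().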
##########################################################################
So you're trying to start Linux for the first time and... wham! You get messages like:
· Unable to mount root device.
· Kernel panic - not syncing.
What do I do now? Oh, how I love Windows...
Here's the scoop...
(1) The first part of the system that starts running is the "boot loader," usually grub. This is the program that loads Linux, and/or Windows if you so desire. (The "master boot record," or MBR, enables the computer to load grub.)
(2) The first thing that Grub needs to know is: "where is the kernel?" It gets this from the /boot/grub/grub.conf file. The way you specify the drive and partition in Grub, like "(hd0,0)", is a little different from what you use in ordinary Linux. The kernel will be in some file named "vmlinuz-..."
(3) Once Grub has loaded the kernel into memory, the first thing that the kernel needs to know is, "where is the root filesystem?" The root= parameter is passed to the kernel to provide this information. Notice that now you are talking "to Linux," and you identify devices "in Linux's terms," like "/dev/hda2".
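Since a wrong root= value is the most common mistake at this step, it can help to pull the value out of the kernel command line and compare it with the device you expect. On a running system, the command line the kernel was booted with can be read from /proc/cmdline; in grub.conf it is the argument list on the kernel line. A minimal Python sketch; kernel_param is a hypothetical helper that simply splits on whitespace, so quoted arguments containing spaces are not handled.

```python
def kernel_param(cmdline, name):
    """Return the value of `name=` in a kernel command line, or None if absent.

    Only the first '=' separates key from value, so "root=LABEL=/" yields
    "LABEL=/". If the parameter appears more than once, this sketch returns
    the last occurrence.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == name:
            value = val
    return value
```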
(4) Given this information, Linux is going to try to mount the root filesystem, that is, prepare it for use. The most common mistake at this point is that you've specified the wrong device in step #3. Unfortunately, the message that results is rather nasty looking.
When Linux doesn't know how to proceed, as in this case, it says "kernel panic" and it stops. A panicking kernel may skip writing out anything that hasn't been written to disk yet (an operation called "syncing", for some darn-fool reason), and "not syncing" is its way of saying it skipped that step. What's totally misleading about this message combination is that it implies, incorrectly, that the reason for the panic is "not syncing," when actually the reason for the panic will be found in the preceding few lines.
You might see the message "Attempted to kill init!" That really means that a program called init died, which it is never allowed to do. init is a very special program in Linux: the first program created when the machine starts.
So, basically, when you get these messages on startup, the situation looks a lot more dreadful than it actually is. You have probably just made a "tpyo" when entering the information in grub.conf. (Another common place to make a typo is /etc/fstab, which tells Linux where all the other drives are.)
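To catch a typo in /etc/fstab from a rescue system, you can list the entries whose device path does not exist. A minimal Python sketch with hypothetical helper names; it only checks /dev paths (LABEL= or UUID= entries are skipped), and device names seen from a rescue environment may differ from those on the installed system, so treat a hit as a hint rather than proof.

```python
import os


def fstab_entries(text):
    """Parse fstab text into (device, mount_point, fs_type) tuples, skipping comments."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[0], fields[1], fields[2]))
    return entries


def missing_devices(entries, exists=os.path.exists):
    """Entries whose device is a /dev path that `exists` reports as missing."""
    return [e for e in entries if e[0].startswith("/dev/") and not exists(e[0])]
```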
So what do you do? If you're doing a first-time install, you can just start over. Otherwise, you need to boot from a separate CD-ROM (such as a rescue or live CD), which gives you a stand-alone Linux environment from which you can edit the offending files.
Explained: "kernel panic - not syncing - attempted to kill init"
When the kernel gets into a situation where it does not know how to proceed (most often during booting, but sometimes at other times), it issues a kernel panic by calling the panic(msg) routine defined in kernel/panic.c. (Good name, huh?) This is a call from which No One Ever Returns.
The panic() routine adds text to the front of the message, telling you more about what the system was actually doing when the panic occurred: basically, how big and bad the trail of debris in the filesystem is likely to be. This is where the "not syncing" part comes from: it tells you that buffered data was not pushed out to the hard disks before the system went down, so anything not yet written may be lost.
The second part of the message is what was provided by the original call to panic(). For example, we find panic("Attempted to kill init!") in kernel/exit.c.
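Separating the two parts makes it easier to search for the text that was actually passed to panic(). A minimal Python sketch, assuming the two prefix forms seen in this thread ("Kernel panic - not syncing: ..." and the older "Kernel panic: ..." from the trace earlier); panic_reason is a hypothetical helper.

```python
import re

# Accepts both "Kernel panic - not syncing: <msg>" and "Kernel panic: <msg>".
PANIC_RE = re.compile(r"Kernel panic(?: - not syncing)?: (.*)")


def panic_reason(line):
    """Return the message that was passed to panic(), or None for other lines."""
    m = PANIC_RE.search(line)
    return m.group(1).strip() if m else None
```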
So, what does this actually mean? Well, in this case it really doesn't mean that someone tried to kill the magical init process (process #1), but simply that it tried to die. This process is not allowed to die or to be killed.
When you see this message, it's almost always at boot-time, and the real messages, the cause of the actual failure, will be found in the startup messages immediately preceding this one. This is often the case with kernel panics. init encountered something "really bad," and it didn't know what to do, so it died, so the kernel died too.
BTW, the kernel-panic code is rather cute. It can
blink lights and beep the system-speaker in Morse
code. It can reboot the system automagically.
Obviously the people who wrote this stuff encountere
Linux शूटर
<shabbathster@gmail.com>