Recently I experienced an SRX crash. An SRX 240H crashed and rebooted itself twice before it finally came back to normal. While checking the system log, I unfortunately could not find any details or clues about the crash. The only clues came from the console screen, which showed the following dump messages:

roo@SRX> NMI Exception on core:0
Watchdog status, core 0: 0xfffe6fffffb
FPA INT Summery: 0x0
Err EPC: 0x80745f50
Trapframe Register Dump:
        zero: 00000000  at: 00000001    v0: 00000001    v1: 0000000e
        a0: 000003e8    a1: 00000001    a2: ffff8010    a3: ffffffffd6f23176
        t0: 00000208    t1: 8001070000000208    t2: ffffffff80010700    t3: 00000208
        t4: 00000000    t5: 00000000    t6: 00000000    t7: 00000001
        t8: 23c34600    t9: 0006c48b    s0: 00018853    s1: 0379255d
        s2: 000927c0    s3: ffffffffc1be6b80    s4: ffffffff80b00000    s5: ffffffffd6f230d0
        s6: fffffffffffffffe    s7: 00003fff    k0: 3480100034  k1: 8010003400000080
        gp: ffffffff80af9040    sp: ffffffffd6f23060    s8: 00000000    ra: ffffffff80745f58
        sr: 50c808e5    mullo: 05a0d200 mulhi: 09600000 badvaddr: ffffffffc1bd76fc
        cause: 40008400 pc: ffffffff80745f58
        ErrPC: 00000840
Current ticks/softticks 172842/160779, curproc [1153] rcp

PCPU dump:
cpuid        = 0
curthread    = 0xc1d2b840: pid 1153 “rcp”
ipis         = 0x0
cpuid        = 1
curthread    = 0xc1beb210: pid 21 “idle: cpu1”
ipis         = 0x0
cpuid        = 2
curthread    = 0xc1beb000: pid 20 “idle: cpu2”
ipis         = 0x0
cpuid        = 3
curthread    = 0xc1be7c60: pid 19 “idle: cpu3”
ipis         = 0x0
cpuid        = 4
curthread    = none
ipis         = 0x0
cpuid        = 5
curthread    = none
ipis         = 0x0
cpuid        = 6
curthread    = none
ipis         = 0x0
cpuid        = 7
curthread    = none
ipis         = 0x0
cpuid        = 8
curthread    = none
ipis         = 0x0
cpuid        = 9
curthread    = none
ipis         = 0x0
cpuid        = 10
curthread    = none
ipis         = 0x0
cpuid        = 11
curthread    = none
ipis         = 0x0
Memory dump of 1024 words starting at 0x80000000
0x80000000: 08258e23 401a4000 00000000 80055a7c
0x80000010: 80055ae0 80071cd0 aaaaaaaa aaaaaaaa
0x80000020: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000030: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000040: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000050: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000060: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000070: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000080: 08258e23 401a4000 00000000 aaaaaaaa
0x80000090: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x800000a0: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x800000b0: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x800000c0: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x800000d0: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x800000e0: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x800000f0: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000100: 3c1a8096 275a4b8c 3c1b1fff 377bffff
0x80000110: 035bd024 3c1ba000 035bd025 03400008
0x80000120: 00000000 aaaaaaaa aaaaaaaa aaaaaaaa
0x80000130: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000140: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000150: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000160: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000170: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000180: 401a6000 401b6800 335a0010 001ad0c0
0x80000190: 337b007c 037ad825 3c1a80ae 275a74b8
0x800001a0: 035bd021 8f5a0000 00000000 03400008
0x800001b0: 00000000 aaaaaaaa aaaaaaaa aaaaaaaa
0x800001c0: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x800001d0: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x800001e0: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x800001f0: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000200: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000210: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000220: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000230: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000240: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000250: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000260: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000270: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000280: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000290: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x800002a0: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x800002b0: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x800002c0: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x800002d0: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x800002e0: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x800002f0: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000300: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000310: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000320: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000330: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000340: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000350: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000360: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000370: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000380: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x80000390: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x800003a0: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x800003b0: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x800003c0: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x800003d0: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x800003e0: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
0x800003f0: aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa
Stack trace:
DELAY+0x4c (3e8,1,ffff8010,d6f23176) ra 80115438 sz 32
xpt_polled_action+0x64 (3e8,1,ffff8010,d6f23176) ra 80119608 sz 48
dashutdown+0xa0 (3e8,1,ffff8010,d6f23176) ra 802284bc sz 664
boot+0x6f8 (3e8,1,ffff8010,d6f23176) ra 80229028 sz 64
panic+0x608 (3e8,1,80010700,508008e1) ra 8075b324 sz 48
panic_on_watchdog_timeout+0x78 (3e8,1,80010700,508008e1) ra 8077a3f0 sz 32
re_srxsme_watchdog_intr+0x14c (3e8,1,80010700,508008e1) ra 80735fb0 sz 24
mips_handle_this_interrupt+0x8c (3e8,1,80010700,508008e1) ra 80736044 sz 40
mips_handle_interrupts+0x60 (3e8,1,80010700,508008e1) ra 80736464 sz 48
mips_interrupt+0x22c (3e8,1,80010700,508008e1) ra 809641cc sz 32
MipsKernIntr+0x144 (3e8,1,ffff8010,d6f236e6) ra 80745f58 sz 360
DELAY+0x54 (3e8,1,ffff8010,d6f236e6) ra 80115438 sz 32
xpt_polled_action+0x64 (3e8,1,ffff8010,d6f236e6) ra 80119608 sz 48
dashutdown+0xa0 (3e8,1,ffff8010,d6f236e6) ra 802284bc sz 664
boot+0x6f8 (3e8,1,ffff8010,d6f236e6) ra 80229028 sz 64
panic+0x608 (3e8,c1d89600,57d3,0) ra 806e1770 sz 48
ufs_dirbad+0x3c (3e8,c1d89600,57d3,0) ra 806e26d4 sz 32
ufs_lookup+0x280 (d6f23a00,100c044,57d3,0) ra 8095ffac sz 144
VOP_CACHEDLOOKUP_APV+0x64 (d6f23a00,100c044,57d3,0) ra 802c4280 sz 24
vfs_cache_lookup+0xf4 (d6f23a00,100c044,57d3,0) ra 80962260 sz 64
VOP_LOOKUP_APV+0x74 (d6f23a00,100c044,57d3,0) ra 802cbb90 sz 32
lookup+0x750 (d6f23a00,100c044,57d3,0) ra 802cd008 sz 104
namei+0x774 (d6f23a00,100c044,57d3,0) ra 802e1688 sz 120
kern_stat+0x4c (c1d2b840,3ffe8ff8,0,0) ra 802e1810 sz 224
stat+0x28 (c1d2b840,3ffe8ff8,0,0) ra 80741a8c sz 128
trap+0x15dc (c1d2b840,3ffe8ff8,0,0) ra 80963e74 sz 144
OcteonNMIException+0x350 (2f,3ffe8ff8,3ffe6f80,0) ra 3ffeb4f0 sz 360
PC 0x3ffeb4f0: not in kernel
uart_z8530_class+0x3ffeb4f0 (2f,3ffe8ff8,3ffe6f80,0) ra 0 sz 0
pid 1153, process: rcp
Resetting the  system now…

U-Boot 1.1.6 (Build time: Oct  9 2009 – 11:16:01)

SRX_240_HIGHMEM board revision major:1, minor:40, serial #: ******
OCTEON CN5230R-SCP pass 2.0, Core clock: 600 MHz, DDR clock: 333 MHz (666 Mhz data rate)
DRAM:  1024 MB
Starting Memory POST…
Checking datalines… OK
Checking address lines… OK
Checking 512K memory for U-Boot… OK.
Running U-Boot CRC Test… OK.
Flash:  4 MB
USB:   scanning bus for devices…
Root Hub 0: 3 USB Device(s) found
Root Hub 1: 1 USB Device(s) found
       scanning bus for storage devices… 1 Storage Device(s) found
Clearing DRAM…….. done
BIST check passed.
1:00:00.0 Vendor/Device ID = 0x811210b5
1:01:07.0 Vendor/Device ID = 0xc72414e4
Net:   octeth0
POST Passed
Press SPACE to abort autoboot in 1 seconds
ELF file is 32 bit
Loading .text @ 0x8f000078 (241008 bytes)
Loading .rodata @ 0x8f03ade8 (13908 bytes)
Loading .rodata.str1.4 @ 0x8f03e43c (15972 bytes)
Loading set_Xcommand_set @ 0x8f0422a0 (96 bytes)
Loading .rodata.cst4 @ 0x8f042300 (20 bytes)
Loading .data @ 0x8f043000 (5572 bytes)
Loading .data.rel.ro @ 0x8f0445c4 (120 bytes)
Loading .data.rel @ 0x8f04463c (136 bytes)
Clearing .bss @ 0x8f0446c8 (8304 bytes)
## Starting application at 0x8f000078 …
Consoles: U-Boot console 
Found compatible API, ver. 1.5

FreeBSD/MIPS U-Boot bootstrap loader, Revision 1.5
([email protected], Fri Oct  9 10:55:15 UTC 2009)
Memory: 1024MB
[0]Booting from nand-flash slice 1
Un-Protected 1 sectors
writing to flash…
Protected 1 sectors
Loading /boot/defaults/loader.conf
/kernel data=0x9f5c18+0xd8d68 syms=[0x4+0x7e350+0x4+0xb49c7]

Hit [Enter] to boot immediately, or space bar for command prompt.
Booting [/kernel]…              
Kernel entry at 0x801000d8 …
getbootinfo: magic 0x0 md 0x80d03000 memsize 0x0
getbootinfo: boothowto 0x1000 kernend 0x80e00000 memsize 1024MB kernelname /kernel

Platform Starting
init regular console
Initializing octeon watchdog
GDB: debug ports: uart
GDB: current port: uart
KDB: debugger backends: ddb gdb
KDB: current backend: ddb
getmemsize: msgbufp[size=32768] = 0x8000cfe4
Copyright (c) 1996-2011, Juniper Networks, Inc.
All rights reserved.
Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
JUNOS 10.4R6.5 #0: 2011-07-23 11:18:23 UTC
    [email protected]:/volume/build/junos/10.4/release/10.4R6.5/obj-octeon/bsd/sys/compile/JSRXNLE
JUNOS 10.4R6.5 #0: 2011-07-23 11:18:23 UTC
    [email protected]:/volume/build/junos/10.4/release/10.4R6.5/obj-octeon/bsd/sys/compile/JSRXNLE
real memory  = 1073741824 (1024MB)
avail memory = 527036416 (502MB)
cpuid: 0, btlb_cpumap:0xffffffff
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
Initializing watchdog interupt

Loading RT Fifo module…..
Loaded RT Fifo module
pmap_helper loaded (interface version 6, syscall 210)
cpu0 on motherboard
: CAVIUM’s Octeon CPU Rev. 0.8 with no FPU implemented
        L1 Cache: I size 32kb(128 line), D size 8kb(128 line), sixty four way.
        L2 Cache: Size 128kb, ? way
obio0 on motherboard
uart0: <Octeon-16550 channel 0> on obio0
uart0: console (9600,n,8,1)
twsi0 on obio0
dwc0: <Synopsis DWC OTG Controller Driver> on obio0
usb0: DWC OTG Controller
Using DMA mode
Init: Port Power? op_state=1
Init: Power Port (0)
usb0: <USB Bus for DWC OTG Controller> on dwc0
usb0: USB revision 2.0
uhub0: vendor 0x0000 DWC OTG root hub, class 9/0, rev 2.00/1.00, addr 1
uhub0: 1 port with 1 removable, self powered
uhub1: vendor 0x0409 product 0x005a, class 9/0, rev 2.00/1.00, addr 2
uhub1: single transaction translator
uhub1: 3 ports with 2 removable, self powered
umass0: STMicroelectronics ST72682  High Speed Mode, rev 2.00/2.10, addr 3
dwc1: <Synopsis DWC OTG Controller Driver> on obio0
usb1: DWC OTG Controller
Using DMA mode
Init: Port Power? op_state=1
Init: Power Port (0)
usb1: <USB Bus for DWC OTG Controller> on dwc1
usb1: USB revision 2.0
uhub2: vendor 0x0000 DWC OTG root hub, class 9/0, rev 2.00/1.00, addr 1
uhub2: 1 port with 1 removable, self powered
pcib1: <Cavium on-chip PCIe HOST bridge> on obio0
Disabling Octeon big bar support
PCIe: Waiting for port 0 to finish reset
PCIe: Port 0 link active, 2 lanes
PCIe: Waiting for port 1 to finish reset
PCIe: Port 1 link active, 1 lanes
pcib1: Initialized controller
pci0: <PCI bus> on pcib1
pcib2: <PCI-PCI bridge> irq 0 at device 0.0 on pci0
pci1: <PCI bus> on pcib2
pci1: <serial bus, USB> at device 2.0 (no driver attached)
pci1: <network> at device 7.0 (no driver attached)
pcib0: <Cavium on-chip PCIe HOST bridge> on obio0
pci2: <PCI bus> on pcib0
pci2: <processor> at device 0.0 (no driver attached)
cpld0 on obio0
gblmem0 on obio0
octpkt0: <Octeon RGMII> on obio0
cfi0: <AMD/Fujitsu – 4MB> on obio0
platform_cookie_read not implemented
Timecounter “mips” frequency 600000000 Hz quality 0
Timecounters tick every 1.000 msec
Loading the NETPFE ethernet module
Loading E1/T1/J1 driver
Loading the DS1/E1 Media Layer; Attaching to media services layer
Loading common multilink module.
Loading the NETPFE PPPoE module
Loading the netpfe services driver
Loading the NETPFE docsis module
Loading DS0 driver
Loading the DS0 Media Layer; Attaching to media services layer
Loading the XDSL Media Layer; Attaching to media services layer
Loading the IPSec driver
 Loading the PTM driver
Loading the ISDN driver

Loading the ISDN BRI Media Layer; Attaching to media services layer
Loading Link Services PICs module.
IPsec: Initialized Security Association Processing.
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #3 Launched!
da0 at umass-sim0 bus 0 target 0 lun 0
da0: <ST ST72682 2.10> Removable Direct Access SCSI-2 device
da0: 40.000MB/s transfers
da0: 1000MB (2048000 512 byte sectors: 64H 32S/T 1000C)
Trying to create bootdev, rootpartition da0s1a
Trying to mount root from ufs:/dev/da0s1a
WARNING: / was not properly dismounted
Attaching /cf/packages/junos via /dev/mdctl…
Mounted junos package on /dev/md0…

Media check on da0
Zone 06 Block 0965 Addr 1bc500 : Bad read
Recovering block
Automatic reboot in progress…
** /dev/da0s1a
** Last Mounted on /
** Root file system
** Phase 1 – Check Blocks and Sizes
** Phase 2 – Check Pathnames
** Phase 3 – Check Connectivity
** Phase 4 – Check Reference Counts
** Phase 5 – Check Cyl groups
134 files, 109199 used, 40835 free (27 frags, 5101 blocks, 0.0% fragmentation)

***** FILE SYSTEM MARKED CLEAN *****
Verified junos signed by PackageProduction_10_4_0
Verified jboot signed by PackageProduction_10_4_0
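
The stack trace in the crash dump above lists the most recent frame first, so the call chain reads from the bottom up. As a minimal sketch (not a Juniper tool), the Python below pulls the function names out of such lines to make the chain easier to follow. The names `parse_frames` and `callers_of` and the regular expression are assumptions for illustration, based only on the line format seen in this dump.

```python
import re

# Matches lines such as:
#   ufs_dirbad+0x3c (3e8,c1d89600,57d3,0) ra 806e1770 sz 32
# The pattern is an assumption derived from the dump shown in this post.
FRAME_RE = re.compile(
    r"^(?P<func>\w+)\+0x(?P<offset>[0-9a-f]+)\s+"
    r"\((?P<args>[^)]*)\)\s+ra\s+(?P<ra>[0-9a-f]+)\s+sz\s+(?P<size>\d+)$"
)


def parse_frames(text):
    """Return (function, offset) tuples for every stack-frame line in text."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((match.group("func"), int(match.group("offset"), 16)))
    return frames


def callers_of(frames, target):
    """Return the functions listed directly below target, i.e. its callers."""
    return [
        frames[i + 1][0]
        for i in range(len(frames) - 1)
        if frames[i][0] == target
    ]


# A few lines copied from the stack trace above.
sample = """\
boot+0x6f8 (3e8,1,ffff8010,d6f236e6) ra 80229028 sz 64
panic+0x608 (3e8,c1d89600,57d3,0) ra 806e1770 sz 48
ufs_dirbad+0x3c (3e8,c1d89600,57d3,0) ra 806e26d4 sz 32
ufs_lookup+0x280 (d6f23a00,100c044,57d3,0) ra 8095ffac sz 144
PC 0x3ffeb4f0: not in kernel
"""

frames = parse_frames(sample)
for func, offset in frames:
    print(f"{func} (+{offset:#x})")
print("panic called from:", callers_of(frames, "panic"))
```

Run against the full trace, this shows panic being reached once from ufs_dirbad and once from panic_on_watchdog_timeout. Reading that as a file system problem on da0 followed by a watchdog timeout during shutdown is only an interpretation of the function names, not something confirmed by JTAC.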

Following a suggestion from JTAC support, one option is to enable the kernel debugger so that the box enters it when a crash happens.

Configure the following two commands on your SRX:
set system debugger-on-panic
set system debugger-on-break

With these set, the box drops into the debugger when the kernel crashes or panics. Please note that it will not boot up and give you CLI access until you type reset on the console.
When the box drops to the db> prompt, we can collect the following information:
db> bt
db> show reg
db> show msgbuf
db> x/s version
db> ps
db> show page
db> show intrcnt
db> x/x ticks
db> x/x softticks
db> show tlb
db> show pcpu
db> show allpcpu
db> show allvms
db> show threads
db> show files
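
Because the debugger output appears only on the console, it helps to log the console session to a file while running the commands above. As a sketch under that assumption, the Python below splits such a capture into one section per db> command so each piece can be sent to support separately. The function name `split_ddb_capture` is hypothetical, and the sample output lines are placeholders, not real debugger output.

```python
def split_ddb_capture(text, prompt="db> "):
    """Map each command typed at the prompt to the output lines that followed it.

    If the same command was typed twice, only the last output is kept.
    """
    sections = {}
    current = None
    for line in text.splitlines():
        if line.startswith(prompt):
            # A new command starts a new section.
            current = line[len(prompt):].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


# Placeholder capture: the output lines are invented stand-ins.
capture = """\
db> bt
<backtrace line 1>
<backtrace line 2>
db> show pcpu
<pcpu line 1>
"""

sections = split_ddb_capture(capture)
for command, output in sections.items():
    print(f"{command}: {len(output)} line(s)")
```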

Good luck, and hopefully this will never happen to you.

By Jon
