Sunday, February 17, 2008

oracle is back to life after a long time, 36h give or take.

procedure was:

  1. connect sqlplus sys/******* as sysdba
  2. shutdown immediate
  3. startup nomount
  4. restore control file
  5. shutdown abort
  6. startup mount
  7. exit
  8. export MALLOC_CHECK_=0
  9. rman catalog rman/******@BACKUP target sys/******
  10. restore database
  11. recover database
  12. alter database open resetlogs;
a simple 12 step procedure once you have it all figured out, and all just because a ******** barge had to cut the cables at the sacramento river and took me out of business.
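
For next time, here is a rough sketch of how steps 8 to 12 from the list above could be wrapped in a small shell script. It is not what I actually ran: the function names, the DRY_RUN switch and the RMAN_CATALOG_CONNECT / RMAN_TARGET_CONNECT variables are made up for illustration, and feeding the commands to rman over stdin is an assumption about how you would want to drive it. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch only, covering steps 8-12 of the list above.
# RMAN_CATALOG_CONNECT and RMAN_TARGET_CONNECT are hypothetical variables
# holding the connect strings (e.g. user/password@BACKUP and sys/password).

# Steps 10-12: the commands typed at the RMAN prompt.
rman_commands() {
  cat <<'EOF'
restore database;
recover database;
alter database open resetlogs;
EOF
}

run_recovery() {
  # Step 8: relax the glibc heap check for everything started from here.
  export MALLOC_CHECK_=0

  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run (the default): only show the setting and the commands.
    echo "MALLOC_CHECK_=$MALLOC_CHECK_"
    rman_commands
  else
    # Step 9: connect to catalog and target, then feed in steps 10-12.
    rman_commands | rman catalog "$RMAN_CATALOG_CONNECT" target "$RMAN_TARGET_CONNECT"
  fi
}
```

Calling `run_recovery` as is prints the setting and the three RMAN commands; setting `DRY_RUN=0` would hand them to rman for real, so the database has to be mounted already (steps 1 to 7).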

Helpful links: this

or like a wise man said:

boss: how long do you need to fix the problem?
wise man: 10 seconds after I found the solution

fighting with oracle for 16 hours now because of a stupid bug.

Copyright (c) 1982, 2005, Oracle. All rights reserved.

connected to target database: BINBASE (DBID=3083741697, not open)
connected to recovery catalog database

RMAN> recover datafile 1;

Starting recover at 15-FEB-08
Starting implicit crosscheck backup at 15-FEB-08
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=155 devtype=DISK
Crosschecked 103 objects
Finished implicit crosscheck backup at 15-FEB-08

Starting implicit crosscheck copy at 15-FEB-08
using channel ORA_DISK_1
Crosschecked 10 objects
Finished implicit crosscheck copy at 15-FEB-08

searching for all files in the recovery area
cataloging files...
*** glibc detected *** double free or corruption (!prev): 0x0cc2f3a8 ***

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 02/15/2008 23:46:02
ORA-03113: end-of-file on communication channel
RMAN-00571: ===========================================================


basically it is some glibc mojo: the glibc library on centos kills your application if it thinks it has found a bug...

it is pretty well described here


In short:

The version of glibc provided with CentOS 4.3 performs additional internal
sanity checks to prevent and detect data corruption as early as possible. By
default, should corruption be detected, a message similar to the following will
be displayed on standard error (or logged via syslog if stderr is not open):

*** glibc detected *** double free or corruption: 0x0937d008 ***

By default, the program that generated this error will also be killed; however,
this (and whether or not an error message is generated) can be controlled via
the MALLOC_CHECK_ environment variable. The following settings are supported:

0 Do not generate an error message, and do not kill the program
1 Generate an error message, but do not kill the program
2 Do not generate an error message, but kill the program
3 Generate an error message and kill the program
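
To keep the four settings straight, here is a tiny shell sketch. The helper name describe_malloc_check is made up (it is not part of glibc or oracle) and just repeats the table quoted above. The last line shows one way to scope the variable to a single command instead of exporting it for the whole session, which might be the safer habit.

```shell
#!/bin/sh
# Lookup of the four settings quoted above; describe_malloc_check is a
# hypothetical helper name used only for illustration.
describe_malloc_check() {
  case "$1" in
    0) echo "no error message, program keeps running" ;;
    1) echo "error message, program keeps running" ;;
    2) echo "no error message, program is killed" ;;
    3) echo "error message, program is killed" ;;
    *) echo "not covered by the quoted notes" ;;
  esac
}

# An assignment placed in front of a command changes the environment of
# that one child process only; the current shell stays untouched.
MALLOC_CHECK_=0 sh -c 'echo "child sees MALLOC_CHECK_=$MALLOC_CHECK_"'
```

The same prefix form should work in front of the rman call, so only that one process runs with the check switched off.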

After I set this variable I made some progress.

...need to take more pictures...
...need more spare time...
...need to figure out how to get a new visa...
...dentist, need to go to dentist...
...need more mojitos...