Article delegate-en/3946 of [1-5169] on the server localhost:119
[Reference:<_A3945@delegate-en.ML_>]
Newsgroups: mail-lists.delegate-en

[DeleGate-En] Re: intermittent 'abort: caught SIGPIPE' during startup
02 Apr 2008 07:13:31 GMT feedback@delegate.org (Yutaka Sato)
The DeleGate Project


Hi,

On 04/02/08(04:19) you Brent Beck <pjyhqbdyi-ytjem465gilr.ml@ml.delegate.org> wrote
in <_A3945@delegate-en.ML_>
 |Since upgrading to the 9.x.x version of DeleGate, we have been having
 |startup failures intermittently.  
 |
 |When it happens, the parent process dies immediately after completing
 |initialization.  Trying again generally succeeds.  Once running, it will
 |stay running without any problems.  
 |
 |I can sometimes reproduce it once or twice, but then further attempts to
 |restart are all successful, making it difficult to troubleshoot.  
...
 |04/01 13:35:18.70 [14576] 0+0: #{TR}# START accepting SIGCHLD
...
 |04/01 13:35:18.72 [14576] 0+0: --INITIALIZATION DONE-00000000--00X:
 |9.7.7 on Linux/2.4.9-e.12smp--
 |04/01 13:35:18.75 [14576] 0+0: abort: caught SIGPIPE

What do you see after this line when DeleGate starts normally?
I think it would look something like the following:

  04/02 15:54:59.69 [31536] 0+0: --INITIALIZATION DONE-08040215+0900: 9.8.2-pre21 on Linux/2.4.20-8--
  04/02 15:54:59.71 [31535] 0+0: --beDaemon: ready=1, stat=0
  04/02 15:54:59.71 [31535] 0+0: --beDaemon: going background ...
  04/02 15:54:59.71 [31535] 0+0: --beDaemon: going background
  04/02 15:54:59.72 [31536] 0+0: ## left connected but dead [10]
  04/02 15:54:59.72 [31536] 0+0: --beDaemon:[10]0 wcc=1 err=0 rdy=1 1/1

Your case looks like the problem that I fixed in 9.7.[36] for Solaris8,
which could also occur on other descendants of SysV.

  [CHANGES]
  9.7.6 071025 fix iotimeout.c: SEGV<-SIGPIPE on Solaris<=8 (9.6.3-pre4)
  9.7.3 070927 fix delegated.c: killed by SIGPIPE on Solaris8 (9.4.3)

The cause of the problem might be a "delayed SIGPIPE" (on SysV), that is,
a SIGPIPE delivered only after the handler has been restored, raised in
src/delegated.c:_main() by code like the following:

  signal(SIGPIPE,SIG_IGN);
  write(pipe,data,size);
  signal(SIGPIPE,sigPIPE);

The code in ver.9.7.7 is as follows:

  6405		if( 0 <= dmsync ){
  6406			char dmstat = 0;
  6407			if( RESOLV_UNKNOWN ) dmstat |= 1;
  6408			if( SCRIPT_UNKNOWN ) dmstat |= 2;
  6409			signal(SIGPIPE,SIG_IGN);
  6410			if( PollIn(dmsync,10) ){
  6411				/* to suppress SIGPIPE */
  6412				sv1log("--beDaemon:[%d]%d parent=%d/%d\n",
  6413					dmsync,IsAlive(dmsync),
  6414					getppid(),procIsAlive(getppid()));
  6415			}else{
  6416				/* try to catch delayed SIGPIPE on SVR4? */
  6417				int rdy1,wcc,err1;
  6418				wcc =
  6419			write(dmsync,&dmstat,1);
  6420				err1 = errno;
  6421				rdy1 = PollIn(dmsync,50);
  6422				sv1log("--beDaemon:[%d]%d wcc=%d err=%d rdy=%d %d/%d\n",
  6423					dmsync,IsAlive(dmsync),wcc,err1,rdy1,
  6424					getppid(),procIsAlive(getppid()));
  6425			}
  6426			signal(SIGPIPE,sigPIPE);
  6427		}
  6428		if( 0 <= dmsync ) close(dmsync);

You might be able to avoid the problem with one of these workarounds:
1) a longer timeout at 6410, like this:
  6410			if( PollIn(dmsync,100) ){
2) or a longer timeout at 6421, like this:
  6421				rdy1 = PollIn(dmsync,100);
3) or close() instead of write() at 6419:
  6419			close(dmsync);
and so on.
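
To make the idea behind workarounds 1) and 2) more concrete, here is a minimal sketch of "poll before writing" with an adjustable timeout. It is only an illustration under assumptions: poll_in() is a hypothetical stand-in for DeleGate's PollIn(), whose real implementation is not shown in this thread, and the demo assumes the sync descriptor behaves like one end of a socketpair(), where a closed peer makes the descriptor readable. A longer timeout gives the peer's close more time to become visible before the write is attempted, but it cannot close the race completely.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for PollIn(): 1 if fd becomes readable or hung up
 * within timeout_ms, 0 on timeout, -1 on error. */
int poll_in(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int n;

    assert(timeout_ms >= 0);
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    n = poll(&pfd, 1, timeout_ms);
    if (n < 0)
        return -1;
    return n > 0 ? 1 : 0;
}

/* Send one status byte only if the peer still looks alive.
 * Returns 1 if the byte was written, 0 if the write was skipped because
 * the descriptor was already readable (EOF or hang-up), -1 on error. */
int notify_if_peer_alive(int fd, char status, int timeout_ms)
{
    int rdy = poll_in(fd, timeout_ms);

    if (rdy < 0)
        return -1;
    if (rdy > 0)
        return 0;                   /* peer is gone: do not write */
    return write(fd, &status, 1) == 1 ? 1 : -1;
}

/* Demo: returns 10 when the first call writes (peer alive) and the second
 * call skips the write (peer closed); -1 if socketpair() failed. */
int demo_notify(void)
{
    int sv[2];
    int alive, dead;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    alive = notify_if_peer_alive(sv[0], 0, 10);
    close(sv[1]);                   /* the peer goes away */
    dead = notify_if_peer_alive(sv[0], 0, 10);
    close(sv[0]);
    return alive * 10 + dead;
}
```

The timeout argument plays the role of the 10, 50, or 100 passed to PollIn() in the listing above, so the effect of a longer wait can be tried without touching the DeleGate source.
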

Cheers,
Yutaka
--
  9 9   Yutaka Sato <y.sato@delegate.org> http://delegate.org/y.sato/
 ( ~ )  National Institute of Advanced Industrial Science and Technology
_<   >_ 1-1-4 Umezono, Tsukuba, Ibaraki, 305-8568 Japan
Do the more with the less -- B. Fuller

