Hi,

On 04/02/08(04:19) you Brent Beck <pjyhqbdyi-ytjem465gilr.ml@ml.delegate.org> wrote
in <_A3945@delegate-en.ML_>
 |Since upgrading to the 9.x.x version of DeleGate, we have been having
 |startup failures intermittently.
 |
 |When it happens, the parent process dies immediately after completing
 |initialization. Trying again generally succeeds. Once running, it will
 |stay running without any problems.
 |
 |I can sometimes reproduce it once or twice, but then further attempts to
 |restart are all successful, making it difficult to troubleshoot.
 ...
 |04/01 13:35:18.70 [14576] 0+0: #{TR}# START accepting SIGCHLD
 ...
 |04/01 13:35:18.72 [14576] 0+0: --INITIALIZATION DONE-00000000--00X:
 |9.7.7 on Linux/2.4.9-e.12smp--
 |04/01 13:35:18.75 [14576] 0+0: abort: caught SIGPIPE

What do you see after this line when DeleGate starts normally?
I think it might look like the following:

  04/02 15:54:59.69 [31536] 0+0: --INITIALIZATION DONE-08040215+0900: 9.8.2-pre21 on Linux/2.4.20-8--
  04/02 15:54:59.71 [31535] 0+0: --beDaemon: ready=1, stat=0
  04/02 15:54:59.71 [31535] 0+0: --beDaemon: going background
  ...
  04/02 15:54:59.71 [31535] 0+0: --beDaemon: going background
  04/02 15:54:59.72 [31536] 0+0: ## left connected but dead [10]
  04/02 15:54:59.72 [31536] 0+0: --beDaemon:[10]0 wcc=1 err=0 rdy=1 1/1

Your case looks like the problem that I fixed in 9.7.[36] for Solaris8,
which could also occur on other descendants of SysV.

[CHANGES]
9.7.6 071025 fix iotimeout.c: SEGV<-SIGPIPE on Solaris<=8 (9.6.3-pre4)
9.7.3 070927 fix delegated.c: killed by SIGPIPE on Solaris8 (9.4.3)

The cause of the problem might be a "delayed SIGPIPE" (by SysV) raised in
src/delegated.c:_main() as follows:

  signal(SIGPIPE,SIG_IGN);
  write(pipe,data,size);
  signal(SIGPIPE,sigPIPE);

The code in ver.9.7.7 is as follows:

  6405  if( 0 <= dmsync ){
  6406      char dmstat = 0;
  6407      if( RESOLV_UNKNOWN ) dmstat |= 1;
  6408      if( SCRIPT_UNKNOWN ) dmstat |= 2;
  6409      signal(SIGPIPE,SIG_IGN);
  6410      if( PollIn(dmsync,10) ){
  6411          /* to suppress SIGPIPE */
  6412          sv1log("--beDaemon:[%d]%d parent=%d/%d\n",
  6413              dmsync,IsAlive(dmsync),
  6414              getppid(),procIsAlive(getppid()));
  6415      }else{
  6416          /* try to catch delayed SIGPIPE on SVR4? */
  6417          int rdy1,wcc,err1;
  6418          wcc =
  6419          write(dmsync,&dmstat,1);
  6420          err1 = errno;
  6421          rdy1 = PollIn(dmsync,50);
  6422          sv1log("--beDaemon:[%d]%d wcc=%d err=%d rdy=%d %d/%d\n",
  6423              dmsync,IsAlive(dmsync),wcc,err1,rdy1,
  6424              getppid(),procIsAlive(getppid()));
  6425      }
  6426      signal(SIGPIPE,sigPIPE);
  6427  }
  6428  if( 0 <= dmsync ) close(dmsync);

You might be able to avoid the problem with one of these workarounds:

1) a longer timeout at 6410, like this:

  6410      if( PollIn(dmsync,100) ){

2) or, a longer timeout at 6421, like this:

  6421          rdy1 = PollIn(dmsync,100);

3) or, close() instead of write():

  6419          close(dmsync);

and so on.

Cheers,
Yutaka
--
  9 9   Yutaka Sato <y.sato@delegate.org> http://delegate.org/y.sato/
 ( ~ )  National Institute of Advanced Industrial Science and Technology
_<   >_ 1-1-4 Umezono, Tsukuba, Ibaraki, 305-8568 Japan
Do the more with the less -- B. Fuller