Article delegate-en/3947 of [1-5169] on the server localhost:119
[Reference:<_A3946@delegate-en.ML_>]
Newsgroups: mail-lists.delegate-en

[DeleGate-En] Re: intermittent 'abort: caught SIGPIPE' during startup
02 Apr 2008 15:58:30 GMT Brent Beck <pjyhqbdyi-q4vsjhr2irvr.ml@ml.delegate.org>
Cox Communications


Thanks for the fast response.  

I have seen the problem in both 9.5.6 and 9.7.7, too.  


## log excerpt using normal logging...  

04/02 11:43:13.54 [27691] 0+0: --INITIALIZATION DONE-00000000--00X: 9.7.7 on Linux/2.4.9-e.12smp--
04/02 11:43:13.55 [27691] 0+0: ## left connected but dead [8]
04/02 11:43:13.55 [27691] 0+0: --beDaemon:[8]0 parent=1/1
04/02 11:43:13.55 [27691] 0+0: abort: caught SIGPIPE


## log excerpt using debug/verbose logging...  

04/02 11:05:12.76 [27421] 0+0: --INITIALIZATION DONE-00000000--00X: 9.7.7 on Linux/2.4.9-e.12smp--
04/02 11:05:12.76 [27421] 0+0: PollIn.POLLHUP (8) errno=0
04/02 11:05:12.76 [27421] 0+0: PollIn.POLLHUP (8) errno=0
04/02 11:05:12.76 [27421] 0+0: ## left connected but dead [8]
04/02 11:05:12.76 [27421] 0+0: --beDaemon:[8]0 parent=1/1
04/02 11:05:12.76 [27421] 0+0: dirfopen(/var/spool/delegate-nobody/adm/svstats9/_xxxxsrpxy01%3A80_,r+): 83409d8 [8]
04/02 11:05:12.76 [27421] 0+0: 
### HTMLCONV configuration:
HTMLCONV=deent
URICONV=defelem:{A,APPLET,AREA,BASE,BLOCKQUOTE,BODY,DEL,EMBED,FORM,FRAME,HEAD,IFRAME,IMG,INPUT,INS,LINK,OBJECT,Q,SCRIPT,Header,?xml,META,!--#echo,!--#include,!--#fsize,!--#flastmod,!--#exec,!--#config,ssitags,TABLE,TR,TD}
URICONV=defattr:{ACTION,ARCHIVE,BACKGROUND,CITE,CLASSID,CODE,CODEBASE,DATA,HREF,LONGDESC,PROFILE,SRC,USEMAP,IMAGEMAP,SCRIPT,URL,Location,Content-Location,Set-Cookie,-,encoding,HTTP-EQUIV,var,file,virtual,cgi,cmd,timefmt,sizefmt}
URICONV=full:ACTION/FORM
URICONV=full:ARCHIVE/OBJECT
URICONV=full:BACKGROUND/BODY
URICONV=full:CITE/BLOCKQUOTE
URICONV=full:CITE/DEL
URICONV=full:CITE/INS
URICONV=full:CITE/Q
URICONV=full:CLASSID/OBJECT
URICONV=full:CODEBASE/APPLET
URICONV=full:CODEBASE/OBJECT
URICONV=full:DATA/OBJECT
URICONV=full:LONGDESC/FRAME
URICONV=full:LONGDESC/IFRAME
URICONV=full:LONGDESC/IMG
URICONV=full:PROFILE/HEAD
URICONV=full:SRC/IMG
URICONV=full:SRC/FRAME
URICONV=full:SRC/IFRAME
URICONV=full:SRC/INPUT
URICONV=full:SRC/SCRIPT
URICONV=full:USEMAP/IMG
URICONV=full:USEMAP/INPUT
URICONV=full:USEMAP/OBJECT
URICONV=full:IMAGEMAP/*
URICONV=full:SCRIPT/*
URICONV=full:URL/*
URICONV=full:Content-Location/Header
URICONV=full:Location/Header
URICONV=full:Set-Cookie/Header
URICONV=full:encoding/?xml
URICONV=full:var/!--#echo
URICONV=full:file/!--#include
URICONV=full:virtual/!--#include
URICONV=full:file/!--#fsize
URICONV=full:virtual/!--#fsize
URICONV=full:file/!--#flastmod
URICONV=full:virtual/!--#flastmod
URICONV=full:virtual/!--#exec
URICONV=full:cmd/!--#exec
URICONV=full:cgi/!--#exec
URICONV=full:var/!--#config
URICONV=full:timefmt/!--#config
URICONV=full:sizefmt/!--#config
URICONV=full:CODE/APPLET
URICONV=full:-/APPLET
URICONV=full:CODE/OBJECT
URICONV=full:-/OBJECT
URICONV=full:CODE/EMBED
URICONV=full:-/EMBED
URICONV=full:BACKGROUND/TABLE
04/02 11:05:12.76 [27421] 0+0: dirfopen(/var/log/stdout.log,a): 0 [-1]
04/02 11:05:12.76 [27421] 0+0: abort: caught SIGPIPE
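
For reference, the PollIn.POLLHUP lines in the verbose log presumably come from poll() reporting that the other end of descriptor 8 has gone away. Below is a minimal sketch of that kind of check. The names `peer_hung_up` and `demo_closed_writer` are hypothetical and are not DeleGate functions, and the demo uses a plain pipe, which may not be what DeleGate uses for its sync descriptor. On Linux, the read end of a pipe reports POLLHUP once every writer has closed; other systems may behave differently.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Returns 1 if the peer of fd has hung up (or fd is in an error state),
 * 0 if neither was reported within timeout_ms, -1 if poll() itself failed.
 * Hypothetical helper for illustration only. */
int peer_hung_up(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    assert(fd >= 0);
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;

    /* POLLHUP and POLLERR are reported even though only POLLIN was requested. */
    return (pfd.revents & (POLLHUP | POLLERR)) ? 1 : 0;
}

/* Demo: create a pipe, close its write end, then poll the read end. */
int demo_closed_writer(void)
{
    int fds[2], result;

    if (pipe(fds) != 0)
        return -1;
    close(fds[1]);                    /* the "peer" goes away */
    result = peer_hung_up(fds[0], 10);
    close(fds[0]);
    return result;
}
```

A descriptor in this state is "connected but dead" in the sense of the log message: it is still open locally, but nothing is left on the other side to read what is written to it.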



On Wednesday, 02 April 2008 at 16:13 +0900, Yutaka Sato wrote:
> Hi,
> 
> On 04/02/08(04:19) you Brent Beck <pjyhqbdyi-q4vsjhr2irvr.ml@ml.delegate.org> wrote
> in <_A3945@delegate-en.ML_>
>  |Since upgrading to the 9.x.x version of DeleGate, we have been having
>  |startup failures intermittently.  
>  |
>  |When it happens, the parent process dies immediately after completing
>  |initialization.  Trying again generally succeeds.  Once running, it will
>  |stay running without any problems.  
>  |
>  |I can sometimes reproduce it once or twice, but then further attempts to
>  |restart are all successful, making it difficult to troubleshoot.  
> ...
>  |04/01 13:35:18.70 [14576] 0+0: #{TR}# START accepting SIGCHLD
> ...
>  |04/01 13:35:18.72 [14576] 0+0: --INITIALIZATION DONE-00000000--00X:
>  |9.7.7 on Linux/2.4.9-e.12smp--
>  |04/01 13:35:18.75 [14576] 0+0: abort: caught SIGPIPE
> 
> What do you see after this line when DeleGate starts normally?
> I think it might look like the following:
> 
>   04/02 15:54:59.69 [31536] 0+0: --INITIALIZATION DONE-08040215+0900: 9.8..2-pre21 on Linux/2.4.20-8--
>   04/02 15:54:59.71 [31535] 0+0: --beDaemon: ready=1, stat=0
>   04/02 15:54:59.71 [31535] 0+0: --beDaemon: going background ...
>   04/02 15:54:59.71 [31535] 0+0: --beDaemon: going background
>   04/02 15:54:59.72 [31536] 0+0: ## left connected but dead [10]
>   04/02 15:54:59.72 [31536] 0+0: --beDaemon:[10]0 wcc=1 err=0 rdy=1 1/1
> 
> Your case looks like the problem that I fixed in 9.7.[36] for Solaris8,
> which could also occur on other descendants of SysV.
> 
>   [CHANGES]
>   9.7.6 071025 fix iotimeout.c: SEGV<-SIGPIPE on Solaris<=8 (9.6.3-pre4)
>   9.7.3 070927 fix delegated.c: killed by SIGPIPE on Solaris8 (9.4.3)
> 
> The cause of the problem might be a "delayed SIGPIPE" (on SysV) arising in
> src/delegated.c:_main(), around code like this:
> 
>   signal(SIGPIPE,SIG_IGN);
>   write(pipe,data,size);
>   signal(SIGPIPE,sigPIPE);
> 
> The code in ver.9.7.7 is as follows:
> 
>   6405		if( 0 <= dmsync ){
>   6406			char dmstat = 0;
>   6407			if( RESOLV_UNKNOWN ) dmstat |= 1;
>   6408			if( SCRIPT_UNKNOWN ) dmstat |= 2;
>   6409			signal(SIGPIPE,SIG_IGN);
>   6410			if( PollIn(dmsync,10) ){
>   6411				/* to suppress SIGPIPE */
>   6412				sv1log("--beDaemon:[%d]%d parent=%d/%d\n",
>   6413					dmsync,IsAlive(dmsync),
>   6414					getppid(),procIsAlive(getppid()));
>   6415			}else{
>   6416				/* try to catch delayed SIGPIPE on SVR4? */
>   6417				int rdy1,wcc,err1;
>   6418				wcc =
>   6419			write(dmsync,&dmstat,1);
>   6420				err1 = errno;
>   6421				rdy1 = PollIn(dmsync,50);
>   6422				sv1log("--beDaemon:[%d]%d wcc=%d err=%d rdy=%d %d/%d\n",
>   6423					dmsync,IsAlive(dmsync),wcc,err1,rdy1,
>   6424					getppid(),procIsAlive(getppid()));
>   6425			}
>   6426			signal(SIGPIPE,sigPIPE);
>   6427		}
>   6428		if( 0 <= dmsync ) close(dmsync);
> 
> You might be able to avoid the problem with one of these workarounds:
> 1) a longer timeout at 6410, like this:
>   6410			if( PollIn(dmsync,100) ){
> 2) or a longer timeout at 6421, like this:
>   6421				rdy1 = PollIn(dmsync,100);
> 3) or close() instead of write():
>   6419			close(dmsync);
> and so on.
> 
> Cheers,
> Yutaka
> --
>   9 9   Yutaka Sato <y.sato@delegate.org> http://delegate.org/y.sato/
>  ( ~ )  National Institute of Advanced Industrial Science and Technology
> _<   >_ 1-1-4 Umezono, Tsukuba, Ibaraki, 305-8568 Japan
> Do the more with the less -- B. Fuller
> 
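
A note on the pattern quoted above: the signal(SIGPIPE,SIG_IGN) / write() / signal(SIGPIPE,sigPIPE) sequence only protects the write if the signal is raised before the handler is restored. One alternative, which is not what DeleGate does, is to block SIGPIPE around the write and consume it while it is still blocked. The sketch below makes several assumptions: the name `write_nosigpipe` is hypothetical, the process is single-threaded (a threaded program would use pthread_sigmask instead), and the SIGPIPE is already pending when write() returns. That last point holds on Linux but may not hold on a platform that really delivers the signal late.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Write to fd without letting SIGPIPE reach the process.
 * Returns what write() returned; on a broken pipe that is -1 with
 * errno == EPIPE. Hypothetical helper, for illustration only. */
ssize_t write_nosigpipe(int fd, const void *buf, size_t len)
{
    sigset_t pipe_set, old_mask, pending;
    int was_pending, saved_errno;
    ssize_t wcc;

    assert(buf != NULL);

    sigemptyset(&pipe_set);
    sigaddset(&pipe_set, SIGPIPE);

    /* Block SIGPIPE so that one raised by write() stays pending. */
    sigprocmask(SIG_BLOCK, &pipe_set, &old_mask);

    /* Remember whether a SIGPIPE was already pending before our write,
     * so that we do not swallow a signal that is not ours. */
    sigpending(&pending);
    was_pending = sigismember(&pending, SIGPIPE);

    wcc = write(fd, buf, len);
    saved_errno = errno;

    if (wcc < 0 && saved_errno == EPIPE && !was_pending) {
        /* Consume the SIGPIPE our write generated; zero timeout, no waiting. */
        struct timespec zero = {0, 0};
        while (sigtimedwait(&pipe_set, NULL, &zero) < 0 && errno == EINTR)
            ;
    }

    /* Restore the caller's signal mask and the errno from write(). */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    errno = saved_errno;
    return wcc;
}
```

In the quoted code this would stand in for the write(dmsync,&dmstat,1) call at line 6419. Whether it helps with the failure reported in this thread is untested.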
