Article delegate-en/3936 of [1-5169] on the server localhost:119
[Reference:<_A3932@delegate-en.ML_>]
Newsgroups: mail-lists.delegate-en

[DeleGate-En] Re: Question about URICONV=where:any ...
19 Mar 2008 01:16:35 GMT feedback@delegate.org (Yutaka Sato)
The DeleGate Project


Hi,

On 03/14/08(14:10) you "Jette Benoit" <pqyeabdyi-fjxi26au6qlr.ml@ml.delegate.org> wrote
in <_A3932@delegate-en.ML_> (nntp://127.0.0.1:7119/~/Mail/drafts/12113):
 |We're using delegate as a reverse proxy, and we have a strange thing
 |happening. Since there's some shockwave file on this site, we add the
 |URICONV="where:any" in the proxy configuration. It's working fine with
 |those, but since then a simple web page gave us a bad web page.
 | 
 |Here's the original web page (contacting the real web server) or without
 |the URICONV="where:any" [good web page]:
...
 |			<p><input type="hidden" value="/test/"></p>
...
 |And here's the web page after adding the URICONV="where:any" [bad web
 |page]:
...
 |	                                           <p><input
 |type="hidden" value="http://publicsite.com/test/"></p>
...
 |I've tried removing all conversion with URICONV="+" URICONV="-*/*", but
 |if URICONV="where:any" was there, it still gave us the bad web page.
 |Is there something we're missing to do what we want? Is the "where:any"
 |parameter doing conversion on all HTML tags as well as CSS,  XML,
 |Javascript, and shockwave flash?

DeleGate does URL rewriting in these phases:
 0) pre-scan to see the existence of URLs
 1) scan URLs and rewrite them to full URLs
 2) scan URLs and rewrite them based on MOUNT parameters
 3) scan URLs and rewrite them to partial URLs based on the Host: field
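
As a rough illustration of phases 1) and 3) only, the following C sketch
expands a partial URL to a full URL and then reduces it again when the
prefix matches. It is not DeleGate's code: the function names and the
"base" string are assumptions made for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Phase 1 (sketch): expand a site-relative path to a full URL.
 * "base" is a hypothetical "scheme://host:port" string.
 * Returns 0 on success, -1 if the result does not fit in "out". */
int to_full_url(const char *base, const char *path, char *out, size_t size)
{
	int n;

	assert(base != NULL && path != NULL && out != NULL);
	n = snprintf(out, size, "%s%s", base, path);
	return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Phase 3 (sketch): reduce a full URL to a partial URL when its
 * "scheme://host:port" prefix equals the base derived from the Host:
 * field. If the prefix does not match, the full URL is left as is. */
const char *to_partial_url(const char *base, const char *url)
{
	size_t len;

	assert(base != NULL && url != NULL);
	len = strlen(base);
	if (strncmp(url, base, len) == 0 && url[len] == '/')
		return url + len;
	return url;
}
```

With the base "http://publicsite.com:80", the path "/test/" becomes
"http://publicsite.com:80/test/" and is reduced back to "/test/". With a
base that differs in any part (for example one without the port), the
reduction does not apply and the full URL stays in the page.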

With the "-v2" option, you can observe this in the LOGFILE as follows:

 03/19 09:29:26.92 [6428] 1+1: HTTP/1.0 200 Content-{Type:text/html Encoding:[/] Leng:243} Server:DeleGate/9.8.1
 URL in #B# ScriptATTR <inpu  value="/test/"></p>
 URL in #B# ScriptATTR <inpu  value="/test/"></p>
 URL in #B# ScriptATTR <inpu  value="http://publicsite.com:80/test/"></p>
 URL in #B# ScriptATTR <inpu  value="http://publicsite.com:80/test/"></p>

If you specify "fullurl" or disable "partial" with URICONV, phase 3)
is suppressed and you get full URLs.
If you have no such configuration, something is going wrong in the
matching of protocol, host or port by the "hostcmp_lexical()" function
in "src/url.c".
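
To show what kind of mismatch is meant, here is a small C sketch that
compares protocol, host and port of two URLs. It is NOT the code of
"hostcmp_lexical()"; the names "parse_origin" and "same_origin" and the
default-port rule (80 for http, 443 for https) are assumptions of this
example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

struct origin {
	char scheme[16];
	char host[128];
	int port;
};

/* Copy n bytes of src to dst in lower case and terminate dst. */
static void copy_lower(char *dst, const char *src, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		dst[i] = (char)tolower((unsigned char)src[i]);
	dst[n] = '\0';
}

/* Split "scheme://host[:port]/..." into its parts; -1 on a parse error. */
int parse_origin(const char *url, struct origin *o)
{
	const char *sep, *host;
	size_t n;

	assert(url != NULL && o != NULL);
	sep = strstr(url, "://");
	if (sep == NULL || sep == url || (size_t)(sep - url) >= sizeof o->scheme)
		return -1;
	copy_lower(o->scheme, url, (size_t)(sep - url));

	host = sep + 3;
	n = strcspn(host, ":/");
	if (n == 0 || n >= sizeof o->host)
		return -1;
	copy_lower(o->host, host, n);

	if (host[n] == ':')
		o->port = atoi(host + n + 1);	/* explicit port */
	else if (strcmp(o->scheme, "https") == 0)
		o->port = 443;			/* assumed default */
	else if (strcmp(o->scheme, "http") == 0)
		o->port = 80;			/* assumed default */
	else
		o->port = 0;			/* unknown scheme */
	return 0;
}

/* 1 if both URLs have the same protocol, host and port, else 0. */
int same_origin(const char *a, const char *b)
{
	struct origin x, y;

	if (parse_origin(a, &x) != 0 || parse_origin(b, &y) != 0)
		return 0;
	return strcmp(x.scheme, y.scheme) == 0
	    && strcmp(x.host, y.host) == 0
	    && x.port == y.port;
}
```

In this sketch "http://publicsite.com/test/" and
"http://PublicSite.com:80/" are treated as the same site, while a
different port or protocol is not. If any of the three parts is judged
different, the reduction to a partial URL is skipped.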

Well, the real problem with URICONV="where:any" is the difficulty of finding
URLs embedded in JavaScript.  Since it is so difficult, the current
implementation is left very simple :p
In your case, the string '="/' in the VALUE attribute of the INPUT tag is
taken to be some kind of manipulation of a URL fragment in JavaScript.
At the least, this matching should be applied only to attributes
that may contain JavaScript.  To cope with your case, the enclosed patch
excludes the VALUE attribute of INPUT tags from the matching.
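
The effect of the enclosed patch can be tried in isolation with the
following C sketch. It is a simplified stand-in, not the code in
"src/url.c": "has_prefix_nocase" plays the role of strncaseeq(), and
"script_url_at" is a hypothetical, much cruder replacement for isJSop()
and isURLinJavaScript() that only recognizes an operator followed by a
quote and a slash.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive test whether s begins with prefix. */
int has_prefix_nocase(const char *s, const char *prefix)
{
	for (; *prefix; s++, prefix++) {
		if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
			return 0;
	}
	return 1;
}

/* Hypothetical heuristic: '=' or '+' followed by a quote and '/'. */
int script_url_at(const char *p)
{
	return (p[0] == '=' || p[0] == '+')
	    && (p[1] == '"' || p[1] == '\'')
	    && p[2] == '/';
}

/* 1 if the attribute text "ref" in "tag" looks like it holds a URL
 * handled by JavaScript. VALUE= of an INPUT tag is never scanned. */
int attr_has_script_url(const char *tag, const char *ref)
{
	const char *p;

	assert(ref != NULL);
	if (tag && has_prefix_nocase(tag, "<INPUT")
	    && has_prefix_nocase(ref, "VALUE="))
		return 0;	/* the case excluded by the patch */

	for (p = ref; *p; p++) {
		if (isspace((unsigned char)*p) || *p == '"' || *p == '\''
		    || *p == '>' || *p == '<')
			break;
		if (script_url_at(p))
			return 1;
	}
	return 0;
}
```

Without the guard at the top, value="/test/" matches the heuristic
because of the '="/' sequence; with the guard, it is skipped for INPUT
tags while other attributes are still scanned.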

Cheers,
Yutaka
--
  9 9   Yutaka Sato <y.sato@delegate.org> http://delegate.org/y.sato/
 ( ~ )  National Institute of Advanced Industrial Science and Technology
_<   >_ 1-1-4 Umezono, Tsukuba, Ibaraki, 305-8568 Japan
Do the more with the less -- B. Fuller

*** dist/src/delegate9.8.2-pre18/src/url.c	Wed Feb  6 12:33:47 2008
--- src/url.c	Wed Mar 19 09:07:06 2008
***************
*** 249,262 ****
--- 249,265 ----
  					goto FOUND;
  				}
  			}
  		}
  	}
  
  	if( URL_SEARCH & URL_IN_ATTR_SCRIPT )
+ 	if( tag && strncaseeq(tag,"<INPUT",6) && strncaseeq(ref,"VALUE=",6) ){
+ 		uritrace("#B# NOT-ScriptATTR",tag,ref);
+ 	}else
  	for( p = ref; ch = *p; p++ ){
  		if( isspace(ch) || ch=='"' || ch=='\'' || ch=='>' || ch=='<' )
  			break;
  		if( isJSop(ch) ){
  			if( up = isURLinJavaScript(p,&qch) ){
  				uritrace("#B# ScriptATTR",tag,ref);
  				goto FOUND;
