| author | Jakub Narebski <jnareb@gmail.com> | 2011-12-18 23:00:58 +0100 | 
|---|---|---|
| committer | Junio C Hamano <gitster@pobox.com> | 2011-12-19 12:25:43 -0800 | 
| commit | b13e3eacefc0fb6f4f89738f74ba5ef14437bed5 (patch) | |
| tree | 79ceb6d975c95b8548e431e479136fb67c9e96c1 /gitweb/gitweb.perl | |
| parent | 57cf4ad6e82af6aaa38bb215ea35ea9c465c6045 (diff) | |
| download | git-b13e3eacefc0fb6f4f89738f74ba5ef14437bed5.tar.gz | |
gitweb: Fix fallback mode of to_utf8 subroutine (jn/maint-gitweb-utf8-fix)

e5d3de5 (gitweb: use Perl built-in utf8 function for UTF-8 decoding.,
2007-12-04) was meant to make gitweb faster by using Perl's internals
(see subsection "Messing with Perl's Internals" in Encode(3pm) manpage).
A simple benchmark confirms that (old = 00f429a, new = this version):

        old  new
  old    -- -65%
  new  189%   --

Unfortunately, it broke the fallback mode of to_utf8...  except for
the default value 'latin1' of $fallback_encoding ('latin1' is Perl's
native encoding), which is why it went unnoticed for such a long time.

utf8::valid(STRING) is an internal function that tests whether STRING
is in a _consistent state_ regarding UTF-8.  It returns true if STRING
is well-formed UTF-8 and has the UTF-8 flag on _*or*_ if the string is
held as bytes (both these states are 'consistent').  For gitweb the
second option was true, as output from git commands is opened without
the ':utf8' layer.
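
As a minimal sketch of the behaviour described above (the sample strings are illustrative assumptions, not taken from gitweb), utf8::valid reports true for any plain byte string, whether or not the bytes form well-formed UTF-8:

```perl
use strict;
use warnings;

# Bytes as they would arrive from a filehandle opened without ':utf8'.
# Both sample strings are hypothetical inputs chosen for illustration.
my $latin1_bytes = "caf\xE9";        # 'cafe' with e-acute in ISO-8859-1; not valid UTF-8
my $utf8_bytes   = "caf\xC3\xA9";    # the same word encoded as UTF-8

# Neither string has the UTF-8 flag set, so both are "held as bytes"
# and therefore both count as being in a consistent state.
my $valid_latin1 = utf8::valid($latin1_bytes) ? 1 : 0;
my $valid_utf8   = utf8::valid($utf8_bytes)   ? 1 : 0;

print "latin1 bytes: valid=$valid_latin1\n";
print "utf8 bytes:   valid=$valid_utf8\n";
```

Because the check cannot tell the two inputs apart, it gives no signal about when the fallback encoding should be used.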

What made it work at all for STRING in 'latin1' encoding is the fact
that utf8::decode(STRING) turns on the UTF-8 flag only if the source
string is valid UTF-8 and contains multi-byte UTF-8 characters... and
that if a string doesn't have the UTF-8 flag set it is treated as being
in the native Perl encoding, i.e. 'latin1' / 'iso-8859-1' (unless the
native encoding is EBCDIC ;-)).  It was the ':utf8' layer that actually
converted 'latin1' (no UTF-8 flag == native == 'latin1') to 'utf8'.
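
The following sketch illustrates that behaviour of utf8::decode on two hypothetical byte strings (the inputs are assumptions made for this example): decoding succeeds and sets the UTF-8 flag for valid multi-byte UTF-8, and fails, leaving the string as bytes, for input that is not valid UTF-8.

```perl
use strict;
use warnings;

# Hypothetical inputs chosen for illustration.
my $utf8_bytes   = "caf\xC3\xA9";   # valid UTF-8 containing a multi-byte character
my $latin1_bytes = "caf\xE9";       # a trailing lone 0xE9 byte is not valid UTF-8

# utf8::decode works in place and returns false for malformed input.
my $ok_utf8   = utf8::decode($utf8_bytes)   ? 1 : 0;
my $ok_latin1 = utf8::decode($latin1_bytes) ? 1 : 0;

# The flag is set only on the string that was successfully decoded.
my $flag_utf8   = utf8::is_utf8($utf8_bytes)   ? 1 : 0;
my $flag_latin1 = utf8::is_utf8($latin1_bytes) ? 1 : 0;

# After decoding, length counts characters: 5 bytes became 4 characters.
my $len_utf8   = length($utf8_bytes);
my $len_latin1 = length($latin1_bytes);   # unchanged: still 4 bytes

print "utf8:   ok=$ok_utf8 flag=$flag_utf8 length=$len_utf8\n";
print "latin1: ok=$ok_latin1 flag=$flag_latin1 length=$len_latin1\n";
```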

Let's make use of the fact that utf8::decode(STRING) returns false if
STRING is invalid as UTF-8 to check whether to enable fallback mode.

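
A standalone sketch of the resulting logic follows. The subroutine body mirrors the patched to_utf8 from this commit; the non-default 'iso-8859-2' fallback setting and the sample inputs are assumptions added only to exercise the fallback path outside gitweb.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption for this example: a non-default fallback encoding, since
# the default 'latin1' happened to work even before the fix.
our $fallback_encoding = 'iso-8859-2';

sub to_utf8 {
	my $str = shift;
	return undef unless defined $str;

	# Already a character string, or bytes that decode cleanly as UTF-8.
	if (utf8::is_utf8($str) || utf8::decode($str)) {
		return $str;
	} else {
		# Not valid UTF-8: interpret the bytes in the fallback encoding.
		return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
	}
}

# Hypothetical inputs: the same character (U+0142) in two encodings.
my $from_utf8   = to_utf8("\xC5\x82");   # UTF-8 bytes for U+0142
my $from_latin2 = to_utf8("\xB3");       # ISO-8859-2 byte for U+0142

printf "from UTF-8:      U+%04X\n", ord($from_utf8);
printf "from ISO-8859-2: U+%04X\n", ord($from_latin2);
```

Both calls should yield the same single character, which is the case the earlier utf8::valid check could not handle for fallback encodings other than 'latin1'.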
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'gitweb/gitweb.perl')
| -rwxr-xr-x | gitweb/gitweb.perl | 4 | 
1 files changed, 2 insertions, 2 deletions
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index dc2ad9d4a4..874023a33e 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1442,8 +1442,8 @@ sub validate_refname {
 sub to_utf8 {
 	my $str = shift;
 	return undef unless defined $str;
-	if (utf8::valid($str)) {
-		utf8::decode($str);
+
+	if (utf8::is_utf8($str) || utf8::decode($str)) {
 		return $str;
 	} else {
 		return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
