Merge remote-tracking branch 'origin/5.8' into 5.9

Change-Id: I9cf7f04769944935d7b836453c7982839857a909
author: Liang Qi <liang.qi@qt.io> 2017-03-06 14:33:16 +0100
committer: Liang Qi <liang.qi@qt.io> 2017-03-06 14:33:34 +0100
commit: f2dbc67c2b032a5f27d0224e020fb6dfcd3fd142 (patch)
tree: c5c195998f538fd7b9aa1382ced53caf2dec3926 /src
parent: d2306d74850986692c02b70df0d7a6a6e933d0dc (diff)
parent: 03e507c8f3816053257b3c351d79c494b6f9174d (diff)
download: qtimageformats-f2dbc67c2b032a5f27d0224e020fb6dfcd3fd142.tar.gz
116 files changed, 5670 insertions, 2293 deletions
diff --git a/src/3rdparty/libtiff/ChangeLog b/src/3rdparty/libtiff/ChangeLog
index 5b77723..9b9d397 100644
--- a/src/3rdparty/libtiff/ChangeLog
+++ b/src/3rdparty/libtiff/ChangeLog
@@ -1,3 +1,485 @@
+2016-11-19  Bob Friesenhahn  <bfriesen@simple.dallas.tx.us>
+
+	* libtiff 4.0.7 released.
+
+	* configure.ac: Update for 4.0.7 release.
+
+	* tools/tiffdump.c (ReadDirectory): Remove uint32 cast to
+	_TIFFmalloc() argument which resulted in Coverity report.  Added
+	more mutiplication overflow checks.
+
+2016-11-18 Even Rouault <even.rouault at spatialys.com>
+
+	* tools/tiffcrop.c: Fix memory leak in (recent) error code path.
+	Fixes Coverity 1394415.
+
+2016-11-17  Bob Friesenhahn  <bfriesen@simple.dallas.tx.us>
+
+	* libtiff/tif_getimage.c: Fix some benign warnings which appear in
+	64-bit compilation under Microsoft Visual Studio of the form
+	"Arithmetic overflow: 32-bit value is shifted, then cast to 64-bit
+	value.  Results might not be an expected value.".  Problem was
+	reported on November 16, 2016 on the tiff mailing list.
+
+2016-11-16 Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_dirread.c: in TIFFFetchNormalTag(), do not dereference
+	NULL pointer when values of tags with TIFF_SETGET_C16_ASCII / TIFF_SETGET_C32_ASCII
+	access are 0-byte arrays.
+	Fixes http://bugzilla.maptools.org/show_bug.cgi?id=2593 (regression introduced
+	by previous fix done on 2016-11-11 for CVE-2016-9297).
+	Reported by Henri Salo. Assigned as CVE-2016-9448
+
+2016-11-12  Bob Friesenhahn  <bfriesen@simple.dallas.tx.us>
+
+	* tools/tiffinfo.c (TIFFReadContigTileData): Fix signed/unsigned
+	comparison warning.
+	(TIFFReadSeparateTileData): Fix signed/unsigned comparison
+	warning.
+
+	* tools/tiffcrop.c (readContigTilesIntoBuffer): Fix
+	signed/unsigned comparison warning.
+
+	* html/v4.0.7.html: Add a file to document the pending 4.0.7
+	release.
+
+2016-11-11 Even Rouault <even.rouault at spatialys.com>
+
+	* tools/tiff2pdf.c: avoid undefined behaviour related to overlapping
+	of source and destination buffer in memcpy() call in
+	t2p_sample_rgbaa_to_rgb()
+	Fixes http://bugzilla.maptools.org/show_bug.cgi?id=2577
+
+2016-11-11 Even Rouault <even.rouault at spatialys.com>
+
+	* tools/tiff2pdf.c: fix potential integer overflows on 32 bit builds
+	in t2p_read_tiff_size()
+	Fixes http://bugzilla.maptools.org/show_bug.cgi?id=2576
+
+2016-11-11 Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_aux.c: fix crash in TIFFVGetFieldDefaulted()
+	when requesting Predictor tag and that the zip/lzw codec is not
+	configured.
+	Fixes http://bugzilla.maptools.org/show_bug.cgi?id=2591
+
+2016-11-11 Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_dirread.c: in TIFFFetchNormalTag(), make sure that
+	values of tags with TIFF_SETGET_C16_ASCII / TIFF_SETGET_C32_ASCII
+	access are null terminated, to avoid potential read outside buffer
+	in _TIFFPrintField().
+	Fixes http://bugzilla.maptools.org/show_bug.cgi?id=2590 (CVE-2016-9297)
+
+2016-11-11 Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_dirread.c: reject images with OJPEG compression that
+	have no TileOffsets/StripOffsets tag, when OJPEG compression is
+	disabled. Prevent null pointer dereference in TIFFReadRawStrip1()
+	and other functions that expect td_stripbytecount to be non NULL.
+	Fixes http://bugzilla.maptools.org/show_bug.cgi?id=2585
+
+2016-11-11 Even Rouault <even.rouault at spatialys.com>
+
+	* tools/tiffcrop.c: fix multiple uint32 overflows in
+	writeBufferToSeparateStrips(), writeBufferToContigTiles() and
+	writeBufferToSeparateTiles() that could cause heap buffer overflows.
+	Reported by Henri Salo from Nixu Corporation.
+	Fixes http://bugzilla.maptools.org/show_bug.cgi?id=2592
+
+2016-11-10 Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_strip.c: make TIFFNumberOfStrips() return the td->td_nstrips
+	value when it is non-zero, instead of recomputing it. This is needed in
+	TIFF_STRIPCHOP mode where td_nstrips is modified. Fixes a read outsize of
+	array in tiffsplit (or other utilities using TIFFNumberOfStrips()).
+	Fixes http://bugzilla.maptools.org/show_bug.cgi?id=2587 (CVE-2016-9273)
+
+2016-11-04 Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_predic.c: fix memory leaks in error code paths added in
+	previous commit (fix for MSVR 35105)
+
+2016-10-31 Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_predict.h, libtiff/tif_predict.c:
+	Replace assertions by runtime checks to avoid assertions in debug mode,
+	or buffer overflows in release mode. Can happen when dealing with
+	unusual tile size like YCbCr with subsampling. Reported as MSVR 35105
+	by Axel Souchet	& Vishal Chauhan from the MSRC Vulnerabilities & Mitigations
+	team.
+
+2016-10-26 Even Rouault <even.rouault at spatialys.com>
+
+	* tools/fax2tiff.c: fix segfault when specifying -r without
+	argument. Patch by Yuriy M. Kaminskiy.
+	Fixes http://bugzilla.maptools.org/show_bug.cgi?id=2572
+
+2016-10-25 Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_dir.c: discard values of SMinSampleValue and
+	SMaxSampleValue when they have been read and the value of
+	SamplesPerPixel is changed afterwards (like when reading a
+	OJPEG compressed image with a missing SamplesPerPixel tag,
+	and whose photometric is RGB or YCbCr, forcing SamplesPerPixel
+	being 3). Otherwise when rewriting the directory (for example
+	with tiffset, we will expect 3 values whereas the array had been
+	allocated with just one), thus causing a out of bound read access.
+	Fixes http://bugzilla.maptools.org/show_bug.cgi?id=2500
+	(CVE-2014-8127, duplicate: CVE-2016-3658)
+	
+	* libtiff/tif_dirwrite.c: avoid null pointer dereference on td_stripoffset
+	when writing directory, if FIELD_STRIPOFFSETS was artificially set
+	for a hack case	in OJPEG case.
+	Fixes http://bugzilla.maptools.org/show_bug.cgi?id=2500
+	(CVE-2014-8127, duplicate: CVE-2016-3658)
+
+2016-10-25 Even Rouault <even.rouault at spatialys.com>
+
+	* tools/tiffinfo.c: fix out-of-bound read on some tiled images.
+	(http://bugzilla.maptools.org/show_bug.cgi?id=2517)
+
+	* libtiff/tif_compress.c: make TIFFNoDecode() return 0 to indicate an
+	error and make upper level read routines treat it accordingly.
+	(linked to the test case of http://bugzilla.maptools.org/show_bug.cgi?id=2517)
+
+2016-10-14 Even Rouault <even.rouault at spatialys.com>
+
+	* tools/tiffcrop.c: fix out-of-bound read of up to 3 bytes in
+	readContigTilesIntoBuffer(). Reported as MSVR 35092 by Axel Souchet
+	& Vishal Chauhan from the MSRC Vulnerabilities & Mitigations team.
+
+2016-10-09 Even Rouault <even.rouault at spatialys.com>
+
+	* tools/tiff2pdf.c: fix write buffer overflow of 2 bytes on JPEG
+	compressed images. Reported by Tyler Bohan of Cisco Talos as
+	TALOS-CAN-0187 / CVE-2016-5652.
+	Also prevents writing 2 extra uninitialized bytes to the file stream.
+
+2016-10-08 Even Rouault <even.rouault at spatialys.com>
+
+	* tools/tiffcp.c: fix out-of-bounds write on tiled images with odd
+	tile width vs image width. Reported as MSVR 35103
+	by Axel Souchet and Vishal Chauhan from the MSRC Vulnerabilities &
+	Mitigations team.
+
+2016-10-08 Even Rouault <even.rouault at spatialys.com>
+
+	* tools/tiff2pdf.c: fix read -largely- outsize of buffer in
+	t2p_readwrite_pdf_image_tile(), causing crash, when reading a
+	JPEG compressed image with TIFFTAG_JPEGTABLES length being one.
+	Reported as MSVR 35101 by Axel Souchet and Vishal Chauhan from
+	the MSRC Vulnerabilities & Mitigations team. CVE-2016-9453
+
+2016-10-08 Even Rouault <even.rouault at spatialys.com>
+
+	* tools/tiffcp.c: fix read of undefined variable in case of missing
+	required tags. Found on test case of MSVR 35100.
+	* tools/tiffcrop.c: fix read of undefined buffer in
+	readContigStripsIntoBuffer() due to uint16 overflow. Probably not a
+	security issue but I can be wrong. Reported as MSVR 35100 by Axel
+	Souchet from the MSRC Vulnerabilities & Mitigations team.
+
+2016-09-25  Bob Friesenhahn  <bfriesen@simple.dallas.tx.us>
+
+	* html: Change as many remotesensing.org broken links to a working
+	URL as possible.
+
+2016-09-24  Bob Friesenhahn  <bfriesen@simple.dallas.tx.us>
+
+	* libtiff/tif_getimage.c (TIFFRGBAImageOK): Reject attempts to
+	read floating point images.
+
+	* libtiff/tif_predict.c (PredictorSetup): Enforce bits-per-sample
+	requirements of floating point predictor (3).  Fixes CVE-2016-3622
+	"Divide By Zero in the tiff2rgba tool."
+
+2016-09-23 Even Rouault <even.rouault at spatialys.com>
+
+	* tools/tiffcrop.c: fix various out-of-bounds write vulnerabilities
+	in heap or stack allocated buffers. Reported as MSVR 35093,
+	MSVR 35096 and MSVR 35097. Discovered by Axel Souchet and Vishal
+	Chauhan from the MSRC Vulnerabilities & Mitigations team.
+	* tools/tiff2pdf.c: fix out-of-bounds write vulnerabilities in
+	heap allocate buffer in t2p_process_jpeg_strip(). Reported as MSVR
+	35098. Discovered by Axel Souchet and Vishal Chauhan from the MSRC
+	Vulnerabilities & Mitigations team.
+	* libtiff/tif_pixarlog.c: fix out-of-bounds write vulnerabilities
+	in heap allocated buffers. Reported as MSVR 35094. Discovered by
+	Axel Souchet and Vishal Chauhan from the MSRC Vulnerabilities &
+	Mitigations team.
+	* libtiff/tif_write.c: fix issue in error code path of TIFFFlushData1()
+	that didn't reset the tif_rawcc and tif_rawcp members. I'm not
+	completely sure if that could happen in practice outside of the odd
+	behaviour of t2p_seekproc() of tiff2pdf). The report points that a
+	better fix could be to check the return value of TIFFFlushData1() in
+	places where it isn't done currently, but it seems this patch is enough.
+	Reported as MSVR 35095. Discovered by Axel Souchet & Vishal Chauhan &
+	Suha Can from the MSRC Vulnerabilities & Mitigations team.
+
+2016-09-20  Bob Friesenhahn  <bfriesen@simple.dallas.tx.us>
+
+	* html/man/index.html: Comment out links to documentation for
+	abandoned utilities.
+
+2016-09-17 Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_lzma.c: typo fix in comment
+
+2016-09-04 Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/*.c: fix warnings raised by clang 3.9 -Wcomma
+
+2016-09-03 Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_dirwrite.c, libtiff/tif_color.c: fix warnings raised
+	by GCC 5 / clang -Wfloat-conversion
+
+2016-08-16 Even Rouault <even.rouault at spatialys.com>
+
+	* tools/tiffcrop.c: fix C99'ism.
+
+2016-08-15 Even Rouault <even.rouault at spatialys.com>
+
+	* tools/tiff2bw.c: fix weight computation that could result of color
+	value overflow (no security implication). Fix bugzilla #2550.
+	Patch by Frank Freudenberg.
+
+2016-08-15 Even Rouault <even.rouault at spatialys.com>
+
+	* tools/rgb2ycbcr.c: validate values of -v and -h parameters to
+	avoid potential divide by zero. Fixes CVE-2016-3623 (bugzilla #2569)
+
+2016-08-15 Even Rouault <even.rouault at spatialys.com>
+
+	* tools/tiffcrop.c: Fix out-of-bounds write in loadImage().
+	From patch libtiff-CVE-2016-3991.patch from
+	libtiff-4.0.3-25.el7_2.src.rpm by Nikola Forro (bugzilla #2543)
+
+2016-08-15 Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_pixarlog.c: Fix write buffer overflow in PixarLogEncode
+	if more input samples are provided than expected by PixarLogSetupEncode.
+	Idea based on libtiff-CVE-2016-3990.patch from
+	libtiff-4.0.3-25.el7_2.src.rpm by Nikola Forro, but with different and
+	simpler check. (bugzilla #2544)
+
+2016-08-15 Even Rouault <even.rouault at spatialys.com>
+
+	* tools/tiff2rgba.c: Fix integer overflow in size of allocated
+	buffer, when -b mode is enabled, that could result in out-of-bounds
+	write. Based initially on patch tiff-CVE-2016-3945.patch from
+	libtiff-4.0.3-25.el7_2.src.rpm by Nikola Forro, with correction for
+	invalid tests that rejected valid files. (bugzilla #2545)
+
+2016-07-11 Even Rouault <even.rouault at spatialys.com>
+
+	* tools/tiffcrop.c: Avoid access outside of stack allocated array
+	on a tiled separate TIFF with more than 8 samples per pixel.
+	Reported by Kaixiang Zhang of the Cloud Security Team, Qihoo 360
+	(CVE-2016-5321 / CVE-2016-5323 , bugzilla #2558 / #2559)
+
+2016-07-10 Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_read.c: Fix out-of-bounds read on
+	memory-mapped files in TIFFReadRawStrip1() and TIFFReadRawTile1()
+	when stripoffset is beyond tmsize_t max value (reported by
+	Mathias Svensson)
+
+2016-07-10 Even Rouault <even.rouault at spatialys.com>
+
+	* tools/tiffdump.c: fix a few misaligned 64-bit reads warned
+	by -fsanitize
+
+2016-07-03 Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_read.c: make TIFFReadEncodedStrip() and
+	TIFFReadEncodedTile() directly use user provided buffer when
+	no compression (and other conditions) to save a memcpy().
+
+	* libtiff/tif_write.c: make TIFFWriteEncodedStrip() and
+	TIFFWriteEncodedTile() directly use user provided buffer when
+	no compression to save a memcpy().
+
+2016-07-01  Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_luv.c: validate that for COMPRESSION_SGILOG and
+	PHOTOMETRIC_LOGL, there is only one sample per pixel. Avoid
+	potential invalid memory write on corrupted/unexpected images when
+	using the TIFFRGBAImageBegin() interface (reported by
+	Clay Wood)
+
+2016-06-28  Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_pixarlog.c: fix potential buffer write overrun in
+	PixarLogDecode() on corrupted/unexpected images (reported by Mathias Svensson)
+	(CVE-2016-5875)
+
+2016-06-15  Bob Friesenhahn  <bfriesen@simple.dallas.tx.us>
+
+	* libtiff/libtiff.def: Added _TIFFMultiply32 and _TIFFMultiply64
+	to libtiff.def
+
+2016-06-05  Bob Friesenhahn  <bfriesen@simple.dallas.tx.us>
+
+	* tools/Makefile.am: The libtiff tools bmp2tiff, gif2tiff,
+	ras2tiff, sgi2tiff, sgisv, and ycbcr are completely removed from
+	the distribution.  The libtiff tools rgb2ycbcr and thumbnail are
+	only built in the build tree for testing.  Old files are put in
+	new 'archive' subdirectory of the source repository, but not in
+	distribution archives.  These changes are made in order to lessen
+	the maintenance burden.
+
+2016-05-10  Bob Friesenhahn  <bfriesen@simple.dallas.tx.us>
+
+	* libtiff/tif_config.vc.h (HAVE_SNPRINTF): Add a '1' to the
+	HAVE_SNPRINTF definition.'
+
+2016-05-09  Bob Friesenhahn  <bfriesen@simple.dallas.tx.us>
+
+	* libtiff/tif_config.vc.h (HAVE_SNPRINTF): Applied patch by Edward
+	Lam to define HAVE_SNPRINTF for Visual Studio 2015.
+
+2016-04-27  Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_dirread.c: when compiled with DEFER_STRILE_LOAD,
+	fix regression, introduced on 2014-12-23, when reading a one-strip
+	file without a StripByteCounts tag. GDAL #6490
+
+2016-04-07  Bob Friesenhahn  <bfriesen@simple.dallas.tx.us>
+
+	* html/bugs.html: Replace Andrey Kiselev with Bob Friesenhahn for
+	purposes of security issue reporting.
+
+2016-01-23  Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/*: upstream typo fixes (mostly contributed by Kurt Schwehr)
+	coming from GDAL internal libtiff
+
+2016-01-09  Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_fax3.h: make Param member of TIFFFaxTabEnt structure
+	a uint16 to reduce size of the binary.
+
+2016-01-03  Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_read.c, tif_dirread.c: fix indentation issues raised
+	by GCC 6 -Wmisleading-indentation
+
+2015-12-27  Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_pixarlog.c: avoid zlib error messages to pass a NULL
+	string to %s formatter, which is undefined behaviour in sprintf().
+
+2015-12-27  Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_next.c: fix potential out-of-bound write in NeXTDecode()
+	triggered by http://lcamtuf.coredump.cx/afl/vulns/libtiff5.tif
+	(bugzilla #2508)
+
+2015-12-27  Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_luv.c: fix potential out-of-bound writes in decode
+	functions in non debug builds by replacing assert()s by regular if
+	checks (bugzilla #2522).
+	Fix potential out-of-bound reads in case of short input data.
+
+2015-12-26  Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_getimage.c: fix out-of-bound reads in TIFFRGBAImage
+	interface in case of unsupported values of SamplesPerPixel/ExtraSamples
+	for LogLUV / CIELab. Add explicit call to TIFFRGBAImageOK() in
+	TIFFRGBAImageBegin(). Fix CVE-2015-8665 reported by limingxing and
+	CVE-2015-8683 reported by zzf of Alibaba.
+
+2015-12-21  Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_dirread.c: workaround false positive warning of Clang Static
+	Analyzer about null pointer dereference in TIFFCheckDirOffset().
+
+2015-12-19  Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_fax3.c: remove dead assignment in Fax3PutEOLgdal(). Found
+	by Clang Static Analyzer
+
+2015-12-18  Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_dirwrite.c: fix truncation to 32 bit of file offsets in
+	TIFFLinkDirectory() and TIFFWriteDirectorySec() when aligning directory
+	offsets on a even offset (affects BigTIFF). This was a regression of the
+	changeset of 2015-10-19.
+
+2015-12-12  Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_write.c: TIFFWriteEncodedStrip() and TIFFWriteEncodedTile()
+	should return -1 in case of failure of tif_encodestrip() as documented
+	* libtiff/tif_dumpmode.c: DumpModeEncode() should return 0 in case of
+	failure so that the above mentionned functions detect the error.
+
+2015-12-06  Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/uvcode.h: const'ify uv_code array
+
+2015-12-06  Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_dirinfo.c: const'ify tiffFields, exifFields,
+	tiffFieldArray and exifFieldArray arrays
+
+2015-12-06  Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_print.c: constify photoNames and orientNames arrays
+
+2015-12-06  Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_close.c, libtiff/tif_extension.c : rename link
+	variable to avoid -Wshadow warnings
+
+2015-11-22  Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/*.c: fix typos in comments (patch by Kurt Schwehr)
+ 
+2015-11-22  Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/*.c: fix MSVC warnings related to cast shortening and
+	assignment within conditional expression
+
+2015-11-18  Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/*.c: fix clang -Wshorten-64-to-32 warnings
+
+2015-11-18  Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_dirread.c: initialize double* data at line 3693 to NULL
+	to please MSVC 2013
+
+2015-11-17  Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_dirread.c: prevent reading ColorMap or TransferFunction
+	if BitsPerPixel > 24, so as to avoid huge memory allocation and file
+	read attempts
+
+2015-11-02  Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_dirread.c: remove duplicated assignment (reported by
+	Clang static analyzer)
+
+2015-10-28  Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_dir.c, libtiff/tif_dirinfo.c, libtiff/tif_compress.c,
+	libtiff/tif_jpeg_12.c: suppress warnings about 'no previous
+	declaration/prototype'
+
+2015-10-19  Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tiffiop.h, libtiff/tif_dirwrite.c: suffix constants by U to fix 
+	'warning: negative integer implicitly converted to unsigned type' warning
+	(part of -Wconversion)
+
+2015-10-17  Even Rouault <even.rouault at spatialys.com>
+
+	* libtiff/tif_dir.c, libtiff/tif_dirread.c, libtiff/tif_getimage.c,
+	  libtiff/tif_print.c: fix -Wshadow warnings (only in libtiff/)
+
 2015-09-12  Bob Friesenhahn  <bfriesen@simple.dallas.tx.us>
 
 	* libtiff 4.0.6 released.
diff --git a/src/3rdparty/libtiff/RELEASE-DATE b/src/3rdparty/libtiff/RELEASE-DATE
index ae758a7..fb9e3f6 100644
--- a/src/3rdparty/libtiff/RELEASE-DATE
+++ b/src/3rdparty/libtiff/RELEASE-DATE
@@ -1 +1 @@
-20150912
+20161119
diff --git a/src/3rdparty/libtiff/VERSION b/src/3rdparty/libtiff/VERSION
index d13e837..43beb40 100644
--- a/src/3rdparty/libtiff/VERSION
+++ b/src/3rdparty/libtiff/VERSION
@@ -1 +1 @@
-4.0.6
+4.0.7
diff --git a/src/3rdparty/libtiff/libtiff/libtiff.def b/src/3rdparty/libtiff/libtiff/libtiff.def
index 98228f1..acf8609 100644
--- a/src/3rdparty/libtiff/libtiff/libtiff.def
+++ b/src/3rdparty/libtiff/libtiff/libtiff.def
@@ -162,3 +162,5 @@ EXPORTS	TIFFAccessTagMethods
 	_TIFFmemcpy
 	_TIFFmemset
 	_TIFFrealloc
+        _TIFFMultiply32
+        _TIFFMultiply64
diff --git a/src/3rdparty/libtiff/libtiff/tif_aux.c b/src/3rdparty/libtiff/libtiff/tif_aux.c
index 927150a..3d35ba9 100644
--- a/src/3rdparty/libtiff/libtiff/tif_aux.c
+++ b/src/3rdparty/libtiff/libtiff/tif_aux.c
@@ -1,4 +1,4 @@
-/* $Id: tif_aux.c,v 1.26 2010-07-01 15:33:28 dron Exp $ */
+/* $Id: tif_aux.c,v 1.29 2016-11-11 20:45:53 erouault Exp $ */
 
 /*
  * Copyright (c) 1991-1997 Sam Leffler
@@ -100,7 +100,8 @@ TIFFDefaultTransferFunction(TIFFDirectory* td)
 
 	n = ((tmsize_t)1)<<td->td_bitspersample;
 	nbytes = n * sizeof (uint16);
-	if (!(tf[0] = (uint16 *)_TIFFmalloc(nbytes)))
+        tf[0] = (uint16 *)_TIFFmalloc(nbytes);
+	if (tf[0] == NULL)
 		return 0;
 	tf[0][0] = 0;
 	for (i = 1; i < n; i++) {
@@ -109,10 +110,12 @@ TIFFDefaultTransferFunction(TIFFDirectory* td)
 	}
 
 	if (td->td_samplesperpixel - td->td_extrasamples > 1) {
-		if (!(tf[1] = (uint16 *)_TIFFmalloc(nbytes)))
+                tf[1] = (uint16 *)_TIFFmalloc(nbytes);
+		if(tf[1] == NULL)
 			goto bad;
 		_TIFFmemcpy(tf[1], tf[0], nbytes);
-		if (!(tf[2] = (uint16 *)_TIFFmalloc(nbytes)))
+                tf[2] = (uint16 *)_TIFFmalloc(nbytes);
+		if (tf[2] == NULL)
 			goto bad;
 		_TIFFmemcpy(tf[2], tf[0], nbytes);
 	}
@@ -134,7 +137,8 @@ TIFFDefaultRefBlackWhite(TIFFDirectory* td)
 {
 	int i;
 
-	if (!(td->td_refblackwhite = (float *)_TIFFmalloc(6*sizeof (float))))
+        td->td_refblackwhite = (float *)_TIFFmalloc(6*sizeof (float));
+	if (td->td_refblackwhite == NULL)
 		return 0;
         if (td->td_photometric == PHOTOMETRIC_YCBCR) {
 		/*
@@ -163,7 +167,7 @@ TIFFDefaultRefBlackWhite(TIFFDirectory* td)
  * value if the tag is not present in the directory.
  *
  * NB:	We use the value in the directory, rather than
- *	explcit values so that defaults exist only one
+ *	explicit values so that defaults exist only one
  *	place in the library -- in TIFFDefaultDirectory.
  */
 int
@@ -208,11 +212,18 @@ TIFFVGetFieldDefaulted(TIFF* tif, uint32 tag, va_list ap)
 		*va_arg(ap, uint16 *) = td->td_resolutionunit;
 		return (1);
 	case TIFFTAG_PREDICTOR:
-                {
-			TIFFPredictorState* sp = (TIFFPredictorState*) tif->tif_data;
-			*va_arg(ap, uint16*) = (uint16) sp->predictor;
-			return 1;
-                }
+    {
+        TIFFPredictorState* sp = (TIFFPredictorState*) tif->tif_data;
+        if( sp == NULL )
+        {
+            TIFFErrorExt(tif->tif_clientdata, tif->tif_name,
+                         "Cannot get \"Predictor\" tag as plugin is not configured");
+            *va_arg(ap, uint16*) = 0;
+            return 0;
+        }
+        *va_arg(ap, uint16*) = (uint16) sp->predictor;
+        return 1;
+    }
 	case TIFFTAG_DOTRANGE:
 		*va_arg(ap, uint16 *) = 0;
 		*va_arg(ap, uint16 *) = (1<<td->td_bitspersample)-1;
diff --git a/src/3rdparty/libtiff/libtiff/tif_close.c b/src/3rdparty/libtiff/libtiff/tif_close.c
index 13d2bab..a0cb661 100644
--- a/src/3rdparty/libtiff/libtiff/tif_close.c
+++ b/src/3rdparty/libtiff/libtiff/tif_close.c
@@ -1,4 +1,4 @@
-/* $Id: tif_close.c,v 1.19 2010-03-10 18:56:48 bfriesen Exp $ */
+/* $Id: tif_close.c,v 1.21 2016-01-23 21:20:34 erouault Exp $ */
 
 /*
  * Copyright (c) 1988-1997 Sam Leffler
@@ -36,7 +36,7 @@
 
 /**
  * Auxiliary function to free the TIFF structure. Given structure will be
- * completetly freed, so you should save opened file handle and pointer
+ * completely freed, so you should save opened file handle and pointer
  * to the close procedure in external variables before calling
  * _TIFFCleanup(), if you will need these ones to close the file.
  * 
@@ -62,11 +62,11 @@ TIFFCleanup(TIFF* tif)
          */
 	while( tif->tif_clientinfo )
 	{
-		TIFFClientInfoLink *link = tif->tif_clientinfo;
+		TIFFClientInfoLink *psLink = tif->tif_clientinfo;
 
-		tif->tif_clientinfo = link->next;
-		_TIFFfree( link->name );
-		_TIFFfree( link );
+		tif->tif_clientinfo = psLink->next;
+		_TIFFfree( psLink->name );
+		_TIFFfree( psLink );
 	}
 
 	if (tif->tif_rawdata && (tif->tif_flags&TIFF_MYBUFFER))
diff --git a/src/3rdparty/libtiff/libtiff/tif_color.c b/src/3rdparty/libtiff/libtiff/tif_color.c
index be4850c..89194c2 100644
--- a/src/3rdparty/libtiff/libtiff/tif_color.c
+++ b/src/3rdparty/libtiff/libtiff/tif_color.c
@@ -1,4 +1,4 @@
-/* $Id: tif_color.c,v 1.19 2010-12-14 02:22:42 faxguy Exp $ */
+/* $Id: tif_color.c,v 1.22 2016-09-04 21:32:56 erouault Exp $ */
 
 /*
  * Copyright (c) 1988-1997 Sam Leffler
@@ -126,37 +126,37 @@ TIFFCIELabToRGBInit(TIFFCIELabToRGB* cielab,
 		    const TIFFDisplay *display, float *refWhite)
 {
 	int i;
-	double gamma;
+	double dfGamma;
 
 	cielab->range = CIELABTORGB_TABLE_RANGE;
 
 	_TIFFmemcpy(&cielab->display, display, sizeof(TIFFDisplay));
 
 	/* Red */
-	gamma = 1.0 / cielab->display.d_gammaR ;
+	dfGamma = 1.0 / cielab->display.d_gammaR ;
 	cielab->rstep =
 		(cielab->display.d_YCR - cielab->display.d_Y0R)	/ cielab->range;
 	for(i = 0; i <= cielab->range; i++) {
 		cielab->Yr2r[i] = cielab->display.d_Vrwr
-		    * ((float)pow((double)i / cielab->range, gamma));
+		    * ((float)pow((double)i / cielab->range, dfGamma));
 	}
 
 	/* Green */
-	gamma = 1.0 / cielab->display.d_gammaG ;
+	dfGamma = 1.0 / cielab->display.d_gammaG ;
 	cielab->gstep =
 	    (cielab->display.d_YCR - cielab->display.d_Y0R) / cielab->range;
 	for(i = 0; i <= cielab->range; i++) {
 		cielab->Yg2g[i] = cielab->display.d_Vrwg
-		    * ((float)pow((double)i / cielab->range, gamma));
+		    * ((float)pow((double)i / cielab->range, dfGamma));
 	}
 
 	/* Blue */
-	gamma = 1.0 / cielab->display.d_gammaB ;
+	dfGamma = 1.0 / cielab->display.d_gammaB ;
 	cielab->bstep =
 	    (cielab->display.d_YCR - cielab->display.d_Y0R) / cielab->range;
 	for(i = 0; i <= cielab->range; i++) {
 		cielab->Yb2b[i] = cielab->display.d_Vrwb
-		    * ((float)pow((double)i / cielab->range, gamma));
+		    * ((float)pow((double)i / cielab->range, dfGamma));
 	}
 
 	/* Init reference white point */
@@ -175,7 +175,7 @@ TIFFCIELabToRGBInit(TIFFCIELabToRGB* cielab,
 #define	SHIFT			16
 #define	FIX(x)			((int32)((x) * (1L<<SHIFT) + 0.5))
 #define	ONE_HALF		((int32)(1<<(SHIFT-1)))
-#define	Code2V(c, RB, RW, CR)	((((c)-(int32)(RB))*(float)(CR))/(float)(((RW)-(RB)) ? ((RW)-(RB)) : 1))
+#define	Code2V(c, RB, RW, CR)	((((c)-(int32)(RB))*(float)(CR))/(float)(((RW)-(RB)!=0) ? ((RW)-(RB)) : 1))
 #define	CLAMP(f,min,max)	((f)<(min)?(min):(f)>(max)?(max):(f))
 #define HICLAMP(f,max)		((f)>(max)?(max):(f))
 
@@ -186,7 +186,9 @@ TIFFYCbCrtoRGB(TIFFYCbCrToRGB *ycbcr, uint32 Y, int32 Cb, int32 Cr,
 	int32 i;
 
 	/* XXX: Only 8-bit YCbCr input supported for now */
-	Y = HICLAMP(Y, 255), Cb = CLAMP(Cb, 0, 255), Cr = CLAMP(Cr, 0, 255);
+	Y = HICLAMP(Y, 255);
+	Cb = CLAMP(Cb, 0, 255);
+	Cr = CLAMP(Cr, 0, 255);
 
 	i = ycbcr->Y_tab[Y] + ycbcr->Cr_r_tab[Cr];
 	*r = CLAMP(i, 0, 255);
diff --git a/src/3rdparty/libtiff/libtiff/tif_compress.c b/src/3rdparty/libtiff/libtiff/tif_compress.c
index 20e72fd..b571d19 100644
--- a/src/3rdparty/libtiff/libtiff/tif_compress.c
+++ b/src/3rdparty/libtiff/libtiff/tif_compress.c
@@ -1,4 +1,4 @@
-/* $Id: tif_compress.c,v 1.22 2010-03-10 18:56:48 bfriesen Exp $ */
+/* $Id: tif_compress.c,v 1.25 2016-10-25 20:04:22 erouault Exp $ */
 
 /*
  * Copyright (c) 1988-1997 Sam Leffler
@@ -82,10 +82,10 @@ TIFFNoDecode(TIFF* tif, const char* method)
 		TIFFErrorExt(tif->tif_clientdata, tif->tif_name,
 			     "Compression scheme %u %s decoding is not implemented",
 			     tif->tif_dir.td_compression, method);
-	return (-1);
+	return (0);
 }
 
-int
+static int
 _TIFFNoFixupTags(TIFF* tif)
 {
 	(void) tif;
@@ -227,7 +227,7 @@ TIFFUnRegisterCODEC(TIFFCodec* c)
 	codec_t* cd;
 	codec_t** pcd;
 
-	for (pcd = &registeredCODECS; (cd = *pcd); pcd = &cd->next)
+	for (pcd = &registeredCODECS; (cd = *pcd) != NULL; pcd = &cd->next)
 		if (cd->info == c) {
 			*pcd = cd->next;
 			_TIFFfree(cd);
diff --git a/src/3rdparty/libtiff/libtiff/tif_config.vc.h b/src/3rdparty/libtiff/libtiff/tif_config.vc.h
index 7e6c5c2..5cebfa0 100644
--- a/src/3rdparty/libtiff/libtiff/tif_config.vc.h
+++ b/src/3rdparty/libtiff/libtiff/tif_config.vc.h
@@ -107,6 +107,8 @@
 /* Visual Studio 2015 / VC 14 / MSVC 19.00 finally has snprintf() */
 #if defined(_MSC_VER) && _MSC_VER < 1900
 #define snprintf _snprintf
+#else
+#define HAVE_SNPRINTF 1
 #endif
 
 /* Define to 1 if your processor stores words with the most significant byte
diff --git a/src/3rdparty/libtiff/libtiff/tif_dir.c b/src/3rdparty/libtiff/libtiff/tif_dir.c
index 73212c0..ad21655 100644
--- a/src/3rdparty/libtiff/libtiff/tif_dir.c
+++ b/src/3rdparty/libtiff/libtiff/tif_dir.c
@@ -1,4 +1,4 @@
-/* $Id: tif_dir.c,v 1.121 2015-05-31 23:11:43 bfriesen Exp $ */
+/* $Id: tif_dir.c,v 1.127 2016-10-25 21:35:15 erouault Exp $ */
 
 /*
  * Copyright (c) 1988-1997 Sam Leffler
@@ -43,8 +43,10 @@
 static void
 setByteArray(void** vpp, void* vp, size_t nmemb, size_t elem_size)
 {
-	if (*vpp)
-		_TIFFfree(*vpp), *vpp = 0;
+	if (*vpp) {
+		_TIFFfree(*vpp);
+		*vpp = 0;
+	}
 	if (vp) {
 		tmsize_t bytes = (tmsize_t)(nmemb * elem_size);
 		if (elem_size && bytes / elem_size == nmemb)
@@ -57,13 +59,13 @@ void _TIFFsetByteArray(void** vpp, void* vp, uint32 n)
     { setByteArray(vpp, vp, n, 1); }
 void _TIFFsetString(char** cpp, char* cp)
     { setByteArray((void**) cpp, (void*) cp, strlen(cp)+1, 1); }
-void _TIFFsetNString(char** cpp, char* cp, uint32 n)
+static void _TIFFsetNString(char** cpp, char* cp, uint32 n)
     { setByteArray((void**) cpp, (void*) cp, n, 1); }
 void _TIFFsetShortArray(uint16** wpp, uint16* wp, uint32 n)
     { setByteArray((void**) wpp, (void*) wp, n, sizeof (uint16)); }
 void _TIFFsetLongArray(uint32** lpp, uint32* lp, uint32 n)
     { setByteArray((void**) lpp, (void*) lp, n, sizeof (uint32)); }
-void _TIFFsetLong8Array(uint64** lpp, uint64* lp, uint32 n)
+static void _TIFFsetLong8Array(uint64** lpp, uint64* lp, uint32 n)
     { setByteArray((void**) lpp, (void*) lp, n, sizeof (uint64)); }
 void _TIFFsetFloatArray(float** fpp, float* fp, uint32 n)
     { setByteArray((void**) fpp, (void*) fp, n, sizeof (float)); }
@@ -170,7 +172,7 @@ _TIFFVSetField(TIFF* tif, uint32 tag, va_list ap)
 	 * We want to force the custom code to be used for custom
 	 * fields even if the tag happens to match a well known 
 	 * one - important for reinterpreted handling of standard
-	 * tag values in custom directories (ie. EXIF) 
+	 * tag values in custom directories (i.e. EXIF) 
 	 */
 	if (fip->field_bit == FIELD_CUSTOM) {
 		standard_tag = 0;
@@ -254,6 +256,28 @@ _TIFFVSetField(TIFF* tif, uint32 tag, va_list ap)
 		v = (uint16) va_arg(ap, uint16_vap);
 		if (v == 0)
 			goto badvalue;
+        if( v != td->td_samplesperpixel )
+        {
+            /* See http://bugzilla.maptools.org/show_bug.cgi?id=2500 */
+            if( td->td_sminsamplevalue != NULL )
+            {
+                TIFFWarningExt(tif->tif_clientdata,module,
+                    "SamplesPerPixel tag value is changing, "
+                    "but SMinSampleValue tag was read with a different value. Cancelling it");
+                TIFFClrFieldBit(tif,FIELD_SMINSAMPLEVALUE);
+                _TIFFfree(td->td_sminsamplevalue);
+                td->td_sminsamplevalue = NULL;
+            }
+            if( td->td_smaxsamplevalue != NULL )
+            {
+                TIFFWarningExt(tif->tif_clientdata,module,
+                    "SamplesPerPixel tag value is changing, "
+                    "but SMaxSampleValue tag was read with a different value. Cancelling it");
+                TIFFClrFieldBit(tif,FIELD_SMAXSAMPLEVALUE);
+                _TIFFfree(td->td_smaxsamplevalue);
+                td->td_smaxsamplevalue = NULL;
+            }
+        }
 		td->td_samplesperpixel = (uint16) v;
 		break;
 	case TIFFTAG_ROWSPERSTRIP:
@@ -402,7 +426,7 @@ _TIFFVSetField(TIFF* tif, uint32 tag, va_list ap)
 		if ((tif->tif_flags & TIFF_INSUBIFD) == 0) {
 			td->td_nsubifd = (uint16) va_arg(ap, uint16_vap);
 			_TIFFsetLong8Array(&td->td_subifd, (uint64*) va_arg(ap, uint64*),
-			    (long) td->td_nsubifd);
+			    (uint32) td->td_nsubifd);
 		} else {
 			TIFFErrorExt(tif->tif_clientdata, module,
 				     "%s: Sorry, cannot nest SubIFDs",
@@ -421,7 +445,7 @@ _TIFFVSetField(TIFF* tif, uint32 tag, va_list ap)
 		v = (td->td_samplesperpixel - td->td_extrasamples) > 1 ? 3 : 1;
 		for (i = 0; i < v; i++)
 			_TIFFsetShortArray(&td->td_transferfunction[i],
-			    va_arg(ap, uint16*), 1L<<td->td_bitspersample);
+			    va_arg(ap, uint16*), 1U<<td->td_bitspersample);
 		break;
 	case TIFFTAG_REFERENCEBLACKWHITE:
 		/* XXX should check for null range */
@@ -579,10 +603,10 @@ _TIFFVSetField(TIFF* tif, uint32 tag, va_list ap)
 				   handled this way ... likely best if we move it into
 				   the directory structure with an explicit field in 
 				   libtiff 4.1 and assign it a FIELD_ value */
-				uint16 v[2];
-				v[0] = (uint16)va_arg(ap, int);
-				v[1] = (uint16)va_arg(ap, int);
-				_TIFFmemcpy(tv->value, &v, 4);
+				uint16 v2[2];
+				v2[0] = (uint16)va_arg(ap, int);
+				v2[1] = (uint16)va_arg(ap, int);
+				_TIFFmemcpy(tv->value, &v2, 4);
 			}
 
 			else if (fip->field_passcount
@@ -600,66 +624,66 @@ _TIFFVSetField(TIFF* tif, uint32 tag, va_list ap)
 				case TIFF_BYTE:
 				case TIFF_UNDEFINED:
 					{
-						uint8 v = (uint8)va_arg(ap, int);
-						_TIFFmemcpy(val, &v, tv_size);
+						uint8 v2 = (uint8)va_arg(ap, int);
+						_TIFFmemcpy(val, &v2, tv_size);
 					}
 					break;
 				case TIFF_SBYTE:
 					{
-						int8 v = (int8)va_arg(ap, int);
-						_TIFFmemcpy(val, &v, tv_size);
+						int8 v2 = (int8)va_arg(ap, int);
+						_TIFFmemcpy(val, &v2, tv_size);
 					}
 					break;
 				case TIFF_SHORT:
 					{
-						uint16 v = (uint16)va_arg(ap, int);
-						_TIFFmemcpy(val, &v, tv_size);
+						uint16 v2 = (uint16)va_arg(ap, int);
+						_TIFFmemcpy(val, &v2, tv_size);
 					}
 					break;
 				case TIFF_SSHORT:
 					{
-						int16 v = (int16)va_arg(ap, int);
-						_TIFFmemcpy(val, &v, tv_size);
+						int16 v2 = (int16)va_arg(ap, int);
+						_TIFFmemcpy(val, &v2, tv_size);
 					}
 					break;
 				case TIFF_LONG:
 				case TIFF_IFD:
 					{
-						uint32 v = va_arg(ap, uint32);
-						_TIFFmemcpy(val, &v, tv_size);
+						uint32 v2 = va_arg(ap, uint32);
+						_TIFFmemcpy(val, &v2, tv_size);
 					}
 					break;
 				case TIFF_SLONG:
 					{
-						int32 v = va_arg(ap, int32);
-						_TIFFmemcpy(val, &v, tv_size);
+						int32 v2 = va_arg(ap, int32);
+						_TIFFmemcpy(val, &v2, tv_size);
 					}
 					break;
 				case TIFF_LONG8:
 				case TIFF_IFD8:
 					{
-						uint64 v = va_arg(ap, uint64);
-						_TIFFmemcpy(val, &v, tv_size);
+						uint64 v2 = va_arg(ap, uint64);
+						_TIFFmemcpy(val, &v2, tv_size);
 					}
 					break;
 				case TIFF_SLONG8:
 					{
-						int64 v = va_arg(ap, int64);
-						_TIFFmemcpy(val, &v, tv_size);
+						int64 v2 = va_arg(ap, int64);
+						_TIFFmemcpy(val, &v2, tv_size);
 					}
 					break;
 				case TIFF_RATIONAL:
 				case TIFF_SRATIONAL:
 				case TIFF_FLOAT:
 					{
-						float v = (float)va_arg(ap, double);
-						_TIFFmemcpy(val, &v, tv_size);
+						float v2 = (float)va_arg(ap, double);
+						_TIFFmemcpy(val, &v2, tv_size);
 					}
 					break;
 				case TIFF_DOUBLE:
 					{
-						double v = va_arg(ap, double);
-						_TIFFmemcpy(val, &v, tv_size);
+						double v2 = va_arg(ap, double);
+						_TIFFmemcpy(val, &v2, tv_size);
 					}
 					break;
 				default:
@@ -672,9 +696,9 @@ _TIFFVSetField(TIFF* tif, uint32 tag, va_list ap)
 	}
 	}
 	if (status) {
-		const TIFFField* fip=TIFFFieldWithTag(tif,tag);
-		if (fip)                
-			TIFFSetFieldBit(tif, fip->field_bit);
+		const TIFFField* fip2=TIFFFieldWithTag(tif,tag);
+		if (fip2)                
+			TIFFSetFieldBit(tif, fip2->field_bit);
 		tif->tif_flags |= TIFF_DIRTYDIRECT;
 	}
 
@@ -683,31 +707,31 @@ end:
 	return (status);
 badvalue:
         {
-		const TIFFField* fip=TIFFFieldWithTag(tif,tag);
+		const TIFFField* fip2=TIFFFieldWithTag(tif,tag);
 		TIFFErrorExt(tif->tif_clientdata, module,
 		     "%s: Bad value %u for \"%s\" tag",
 		     tif->tif_name, v,
-		     fip ? fip->field_name : "Unknown");
+		     fip2 ? fip2->field_name : "Unknown");
 		va_end(ap);
         }
 	return (0);
 badvalue32:
         {
-		const TIFFField* fip=TIFFFieldWithTag(tif,tag);
+		const TIFFField* fip2=TIFFFieldWithTag(tif,tag);
 		TIFFErrorExt(tif->tif_clientdata, module,
 		     "%s: Bad value %u for \"%s\" tag",
 		     tif->tif_name, v32,
-		     fip ? fip->field_name : "Unknown");
+		     fip2 ? fip2->field_name : "Unknown");
 		va_end(ap);
         }
 	return (0);
 badvaluedouble:
         {
-        const TIFFField* fip=TIFFFieldWithTag(tif,tag);
+        const TIFFField* fip2=TIFFFieldWithTag(tif,tag);
         TIFFErrorExt(tif->tif_clientdata, module,
              "%s: Bad value %f for \"%s\" tag",
              tif->tif_name, dblval,
-             fip ? fip->field_name : "Unknown");
+             fip2 ? fip2->field_name : "Unknown");
         va_end(ap);
         }
     return (0);
@@ -834,7 +858,7 @@ _TIFFVGetField(TIFF* tif, uint32 tag, va_list ap)
 	 * We want to force the custom code to be used for custom
 	 * fields even if the tag happens to match a well known 
 	 * one - important for reinterpreted handling of standard
-	 * tag values in custom directories (ie. EXIF) 
+	 * tag values in custom directories (i.e. EXIF) 
 	 */
 	if (fip->field_bit == FIELD_CUSTOM) {
 		standard_tag = 0;
@@ -885,7 +909,7 @@ _TIFFVGetField(TIFF* tif, uint32 tag, va_list ap)
 				*va_arg(ap, double**) = td->td_sminsamplevalue;
 			else
 			{
-				/* libtiff historially treats this as a single value. */
+				/* libtiff historically treats this as a single value. */
 				uint16 i;
 				double v = td->td_sminsamplevalue[0];
 				for (i=1; i < td->td_samplesperpixel; ++i)
@@ -899,7 +923,7 @@ _TIFFVGetField(TIFF* tif, uint32 tag, va_list ap)
 				*va_arg(ap, double**) = td->td_smaxsamplevalue;
 			else
 			{
-				/* libtiff historially treats this as a single value. */
+				/* libtiff historically treats this as a single value. */
 				uint16 i;
 				double v = td->td_smaxsamplevalue[0];
 				for (i=1; i < td->td_samplesperpixel; ++i)
diff --git a/src/3rdparty/libtiff/libtiff/tif_dirinfo.c b/src/3rdparty/libtiff/libtiff/tif_dirinfo.c
index 7db4bdb..23ad002 100644
--- a/src/3rdparty/libtiff/libtiff/tif_dirinfo.c
+++ b/src/3rdparty/libtiff/libtiff/tif_dirinfo.c
@@ -1,4 +1,4 @@
-/* $Id: tif_dirinfo.c,v 1.121 2014-05-07 01:58:46 bfriesen Exp $ */
+/* $Id: tif_dirinfo.c,v 1.126 2016-11-18 02:52:13 bfriesen Exp $ */
 
 /*
  * Copyright (c) 1988-1997 Sam Leffler
@@ -38,14 +38,22 @@
  * NOTE: The second field (field_readcount) and third field (field_writecount)
  *       sometimes use the values TIFF_VARIABLE (-1), TIFF_VARIABLE2 (-3)
  *       and TIFF_SPP (-2). The macros should be used but would throw off
- *       the formatting of the code, so please interprete the -1, -2 and -3
+ *       the formatting of the code, so please interpret the -1, -2 and -3
  *       values accordingly.
  */
 
-static TIFFFieldArray tiffFieldArray;
-static TIFFFieldArray exifFieldArray;
-
-static TIFFField
+/* const object should be initialized */
+#ifdef _MSC_VER
+#pragma warning( push )
+#pragma warning( disable : 4132 )
+#endif
+static const TIFFFieldArray tiffFieldArray;
+static const TIFFFieldArray exifFieldArray;
+#ifdef _MSC_VER
+#pragma warning( pop )
+#endif
+
+static const TIFFField
 tiffFields[] = {
 	{ TIFFTAG_SUBFILETYPE, 1, 1, TIFF_LONG, 0, TIFF_SETGET_UINT32, TIFF_SETGET_UNDEFINED, FIELD_SUBFILETYPE, 1, 0, "SubfileType", NULL },
 	{ TIFFTAG_OSUBFILETYPE, 1, 1, TIFF_SHORT, 0, TIFF_SETGET_UNDEFINED, TIFF_SETGET_UNDEFINED, FIELD_SUBFILETYPE, 1, 0, "OldSubfileType", NULL },
@@ -95,7 +103,7 @@ tiffFields[] = {
 	{ TIFFTAG_TILELENGTH, 1, 1, TIFF_LONG, 0, TIFF_SETGET_UINT32, TIFF_SETGET_UNDEFINED, FIELD_TILEDIMENSIONS, 0, 0, "TileLength", NULL },
 	{ TIFFTAG_TILEOFFSETS, -1, 1, TIFF_LONG8, 0, TIFF_SETGET_UNDEFINED, TIFF_SETGET_UNDEFINED, FIELD_STRIPOFFSETS, 0, 0, "TileOffsets", NULL },
 	{ TIFFTAG_TILEBYTECOUNTS, -1, 1, TIFF_LONG8, 0, TIFF_SETGET_UNDEFINED, TIFF_SETGET_UNDEFINED, FIELD_STRIPBYTECOUNTS, 0, 0, "TileByteCounts", NULL },
-	{ TIFFTAG_SUBIFD, -1, -1, TIFF_IFD8, 0, TIFF_SETGET_C16_IFD8, TIFF_SETGET_UNDEFINED, FIELD_SUBIFD, 1, 1, "SubIFD", &tiffFieldArray },
+	{ TIFFTAG_SUBIFD, -1, -1, TIFF_IFD8, 0, TIFF_SETGET_C16_IFD8, TIFF_SETGET_UNDEFINED, FIELD_SUBIFD, 1, 1, "SubIFD", (TIFFFieldArray*) &tiffFieldArray },
 	{ TIFFTAG_INKSET, 1, 1, TIFF_SHORT, 0, TIFF_SETGET_UINT16, TIFF_SETGET_UNDEFINED, FIELD_CUSTOM, 0, 0, "InkSet", NULL },
 	{ TIFFTAG_INKNAMES, -1, -1, TIFF_ASCII, 0, TIFF_SETGET_C16_ASCII, TIFF_SETGET_UNDEFINED, FIELD_INKNAMES, 1, 1, "InkNames", NULL },
 	{ TIFFTAG_NUMBEROFINKS, 1, 1, TIFF_SHORT, 0, TIFF_SETGET_UINT16, TIFF_SETGET_UNDEFINED, FIELD_CUSTOM, 1, 0, "NumberOfInks", NULL },
@@ -134,7 +142,7 @@ tiffFields[] = {
 	/* end Pixar tags */
 	{ TIFFTAG_RICHTIFFIPTC, -3, -3, TIFF_LONG, 0, TIFF_SETGET_C32_UINT32, TIFF_SETGET_UNDEFINED, FIELD_CUSTOM, 0, 1, "RichTIFFIPTC", NULL },
 	{ TIFFTAG_PHOTOSHOP, -3, -3, TIFF_BYTE, 0, TIFF_SETGET_C32_UINT8, TIFF_SETGET_UNDEFINED, FIELD_CUSTOM, 0, 1, "Photoshop", NULL },
-	{ TIFFTAG_EXIFIFD, 1, 1, TIFF_IFD8, 0, TIFF_SETGET_IFD8, TIFF_SETGET_UNDEFINED, FIELD_CUSTOM, 0, 0, "EXIFIFDOffset", &exifFieldArray },
+	{ TIFFTAG_EXIFIFD, 1, 1, TIFF_IFD8, 0, TIFF_SETGET_IFD8, TIFF_SETGET_UNDEFINED, FIELD_CUSTOM, 0, 0, "EXIFIFDOffset", (TIFFFieldArray*) &exifFieldArray },
 	{ TIFFTAG_ICCPROFILE, -3, -3, TIFF_UNDEFINED, 0, TIFF_SETGET_C32_UINT8, TIFF_SETGET_UNDEFINED, FIELD_CUSTOM, 0, 1, "ICC Profile", NULL },
 	{ TIFFTAG_GPSIFD, 1, 1, TIFF_IFD8, 0, TIFF_SETGET_IFD8, TIFF_SETGET_UNDEFINED, FIELD_CUSTOM, 0, 0, "GPSIFDOffset", NULL },
 	{ TIFFTAG_FAXRECVPARAMS, 1, 1, TIFF_LONG, 0, TIFF_SETGET_UINT32, TIFF_SETGET_UINT32, FIELD_CUSTOM, TRUE, FALSE, "FaxRecvParams", NULL },
@@ -211,7 +219,7 @@ tiffFields[] = {
 	/* begin pseudo tags */
 };
 
-static TIFFField
+static const TIFFField
 exifFields[] = {
 	{ EXIFTAG_EXPOSURETIME, 1, 1, TIFF_RATIONAL, 0, TIFF_SETGET_DOUBLE, TIFF_SETGET_UNDEFINED, FIELD_CUSTOM, 1, 0, "ExposureTime", NULL },
 	{ EXIFTAG_FNUMBER, 1, 1, TIFF_RATIONAL, 0, TIFF_SETGET_DOUBLE, TIFF_SETGET_UNDEFINED, FIELD_CUSTOM, 1, 0, "FNumber", NULL },
@@ -271,13 +279,13 @@ exifFields[] = {
 	{ EXIFTAG_IMAGEUNIQUEID, 33, 33, TIFF_ASCII, 0, TIFF_SETGET_ASCII, TIFF_SETGET_UNDEFINED, FIELD_CUSTOM, 1, 0, "ImageUniqueID", NULL }
 };
 
-static TIFFFieldArray
-tiffFieldArray = { tfiatImage, 0, TIFFArrayCount(tiffFields), tiffFields };
-static TIFFFieldArray
-exifFieldArray = { tfiatExif, 0, TIFFArrayCount(exifFields), exifFields };
+static const TIFFFieldArray
+tiffFieldArray = { tfiatImage, 0, TIFFArrayCount(tiffFields), (TIFFField*) tiffFields };
+static const TIFFFieldArray
+exifFieldArray = { tfiatExif, 0, TIFFArrayCount(exifFields), (TIFFField*) exifFields };
 
 /*
- *  We have our own local lfind() equivelent to avoid subtle differences
+ *  We have our own local lfind() equivalent to avoid subtle differences
  *  in types passed to lfind() on different systems. 
  */
 
@@ -521,7 +529,7 @@ TIFFFindField(TIFF* tif, uint32 tag, TIFFDataType dt)
 	return tif->tif_foundfield = (ret ? *ret : NULL);
 }
 
-const TIFFField*
+static const TIFFField*
 _TIFFFindFieldByName(TIFF* tif, const char *field_name, TIFFDataType dt)
 {
 	TIFFField key = {0, 0, 0, TIFF_NOTYPE, 0, 0, 0, 0, 0, 0, NULL, NULL};
@@ -713,7 +721,7 @@ _TIFFCreateAnonField(TIFF *tif, uint32 tag, TIFFDataType field_type)
 	 * note that this name is a special sign to TIFFClose() and
 	 * _TIFFSetupFields() to free the field
 	 */
-	snprintf(fld->field_name, 32, "Tag %d", (int) tag);
+	(void) snprintf(fld->field_name, 32, "Tag %d", (int) tag);
 
 	return fld;    
 }
diff --git a/src/3rdparty/libtiff/libtiff/tif_dirread.c b/src/3rdparty/libtiff/libtiff/tif_dirread.c
index a0dc68b..01070f2 100644
--- a/src/3rdparty/libtiff/libtiff/tif_dirread.c
+++ b/src/3rdparty/libtiff/libtiff/tif_dirread.c
@@ -1,4 +1,4 @@
-/* $Id: tif_dirread.c,v 1.191 2015-09-05 20:31:41 bfriesen Exp $ */
+/* $Id: tif_dirread.c,v 1.204 2016-11-16 15:14:15 erouault Exp $ */
 
 /*
  * Copyright (c) 1988-1997 Sam Leffler
@@ -166,6 +166,8 @@ static int TIFFFetchSubjectDistance(TIFF*, TIFFDirEntry*);
 static void ChopUpSingleUncompressedStrip(TIFF*);
 static uint64 TIFFReadUInt64(const uint8 *value);
 
+static int _TIFFFillStrilesInternal( TIFF *tif, int loadStripByteCount );
+
 typedef union _UInt64Aligned_t
 {
         double d;
@@ -3457,12 +3459,12 @@ TIFFReadDirectory(TIFF* tif)
 	 * the fields to check type and tag information,
 	 * and to extract info required to size data
 	 * structures.  A second pass is made afterwards
-	 * to read in everthing not taken in the first pass.
+	 * to read in everything not taken in the first pass.
 	 * But we must process the Compression tag first
 	 * in order to merge in codec-private tag definitions (otherwise
 	 * we may get complaints about unknown tags).  However, the
 	 * Compression tag may be dependent on the SamplesPerPixel
-	 * tag value because older TIFF specs permited Compression
+	 * tag value because older TIFF specs permitted Compression
 	 * to be written as a SamplesPerPixel-count tag entry.
 	 * Thus if we don't first figure out the correct SamplesPerPixel
 	 * tag value then we may end up ignoring the Compression tag
@@ -3626,6 +3628,7 @@ TIFFReadDirectory(TIFF* tif)
 	if (tif->tif_dir.td_planarconfig == PLANARCONFIG_SEPARATE)
 		tif->tif_dir.td_stripsperimage /= tif->tif_dir.td_samplesperpixel;
 	if (!TIFFFieldSet(tif, FIELD_STRIPOFFSETS)) {
+#ifdef OJPEG_SUPPORT
 		if ((tif->tif_dir.td_compression==COMPRESSION_OJPEG) &&
 		    (isTiled(tif)==0) &&
 		    (tif->tif_dir.td_nstrips==1)) {
@@ -3638,7 +3641,9 @@ TIFFReadDirectory(TIFF* tif)
 			 * JpegInterchangeFormat stream.
 			 */
 			TIFFSetFieldBit(tif, FIELD_STRIPOFFSETS);
-		} else {
+		} else
+#endif
+        {
 			MissingRequired(tif,
 				isTiled(tif) ? "TileOffsets" : "StripOffsets");
 			goto bad;
@@ -3663,7 +3668,7 @@ TIFFReadDirectory(TIFF* tif)
 				 * DataType and SampleFormat tags are supposed to be
 				 * written as one value/sample, but some vendors
 				 * incorrectly write one value only -- so we accept
-				 * that as well (yech). Other vendors write correct
+				 * that as well (yuck). Other vendors write correct
 				 * value for NumberOfSamples, but incorrect one for
 				 * BitsPerSample and friends, and we will read this
 				 * too.
@@ -3690,7 +3695,7 @@ TIFFReadDirectory(TIFF* tif)
 			case TIFFTAG_SMAXSAMPLEVALUE:
 				{
 
-					double *data;
+					double *data = NULL;
 					enum TIFFReadDirEntryErr err;
 					uint32 saved_flags;
 					int m;
@@ -3741,7 +3746,7 @@ TIFFReadDirectory(TIFF* tif)
 					uint32 countrequired;
 					uint32 incrementpersample;
 					uint16* value=NULL;
-                    /* It would be dangerous to instanciate those tag values */
+                    /* It would be dangerous to instantiate those tag values */
                     /* since if td_bitspersample has not yet been read (due to */
                     /* unordered tags), it could be read afterwards with a */
                     /* values greater than the default one (1), which may cause */
@@ -3754,7 +3759,19 @@ TIFFReadDirectory(TIFF* tif)
                                        fip ? fip->field_name : "unknown tagname");
                         continue;
                     }
-					countpersample=(1L<<tif->tif_dir.td_bitspersample);
+					/* ColorMap or TransferFunction for high bit */
+					/* depths do not make much sense and could be */
+					/* used as a denial of service vector */
+					if (tif->tif_dir.td_bitspersample > 24)
+					{
+					    fip = TIFFFieldWithTag(tif,dp->tdir_tag);
+					    TIFFWarningExt(tif->tif_clientdata,module,
+						"Ignoring %s because BitsPerSample=%d>24",
+						fip ? fip->field_name : "unknown tagname",
+						tif->tif_dir.td_bitspersample);
+					    continue;
+					}
+					countpersample=(1U<<tif->tif_dir.td_bitspersample);
 					if ((dp->tdir_tag==TIFFTAG_TRANSFERFUNCTION)&&(dp->tdir_count==(uint64)countpersample))
 					{
 						countrequired=countpersample;
@@ -4142,7 +4159,7 @@ TIFFReadDirectoryFindFieldInfo(TIFF* tif, uint16 tagid, uint32* fii)
 }
 
 /*
- * Read custom directory from the arbitarry offset.
+ * Read custom directory from the arbitrary offset.
  * The code is very similar to TIFFReadDirectory().
  */
 int
@@ -4269,8 +4286,9 @@ EstimateStripByteCounts(TIFF* tif, TIFFDirEntry* dir, uint16 dircount)
 	TIFFDirectory *td = &tif->tif_dir;
 	uint32 strip;
 
-    if( !_TIFFFillStriles( tif ) )
-        return -1;
+    /* Do not try to load stripbytecount as we will compute it */
+        if( !_TIFFFillStrilesInternal( tif, 0 ) )
+            return -1;
 
 	if (td->td_stripbytecount)
 		_TIFFfree(td->td_stripbytecount);
@@ -4292,7 +4310,7 @@ EstimateStripByteCounts(TIFF* tif, TIFFDirEntry* dir, uint16 dircount)
 		/* calculate amount of space used by indirect values */
 		for (dp = dir, n = dircount; n > 0; n--, dp++)
 		{
-			uint32 typewidth = TIFFDataWidth((TIFFDataType) dp->tdir_type);
+			uint32 typewidth;
 			uint64 datasize;
 			typewidth = TIFFDataWidth((TIFFDataType) dp->tdir_type);
 			if (typewidth == 0) {
@@ -4382,7 +4400,7 @@ TIFFCheckDirOffset(TIFF* tif, uint64 diroff)
 
 	tif->tif_dirnumber++;
 
-	if (tif->tif_dirnumber > tif->tif_dirlistsize) {
+	if (tif->tif_dirlist == NULL || tif->tif_dirnumber > tif->tif_dirlistsize) {
 		uint64* new_dirlist;
 
 		/*
@@ -4572,7 +4590,6 @@ TIFFFetchDirectory(TIFF* tif, uint64 diroff, TIFFDirEntry** pdir,
 		}
 		else
 		{
-			tmsize_t m;
 			uint64 dircount64;
 			m=off+sizeof(uint64);
 			if ((m<off)||(m<(tmsize_t)sizeof(uint64))||(m>tif->tif_size)) {
@@ -4983,6 +5000,11 @@ TIFFFetchNormalTag(TIFF* tif, TIFFDirEntry* dp, int recover)
 					if (err==TIFFReadDirEntryErrOk)
 					{
 						int m;
+                        if( dp->tdir_count > 0 && data[dp->tdir_count-1] != '\0' )
+                        {
+                            TIFFWarningExt(tif->tif_clientdata,module,"ASCII value for tag \"%s\" does not end in null byte. Forcing it to be null",fip->field_name);
+                            data[dp->tdir_count-1] = '\0';
+                        }
 						m=TIFFSetField(tif,dp->tdir_tag,(uint16)(dp->tdir_count),data);
 						if (data!=0)
 							_TIFFfree(data);
@@ -5155,6 +5177,11 @@ TIFFFetchNormalTag(TIFF* tif, TIFFDirEntry* dp, int recover)
 				if (err==TIFFReadDirEntryErrOk)
 				{
 					int m;
+                    if( dp->tdir_count > 0 && data[dp->tdir_count-1] != '\0' )
+                    {
+                        TIFFWarningExt(tif->tif_clientdata,module,"ASCII value for tag \"%s\" does not end in null byte. Forcing it to be null",fip->field_name);
+                        data[dp->tdir_count-1] = '\0';
+                    }
 					m=TIFFSetField(tif,dp->tdir_tag,(uint32)(dp->tdir_count),data);
 					if (data!=0)
 						_TIFFfree(data);
@@ -5558,6 +5585,11 @@ ChopUpSingleUncompressedStrip(TIFF* tif)
 
 int _TIFFFillStriles( TIFF *tif )
 {
+    return _TIFFFillStrilesInternal( tif, 1 );
+}
+
+static int _TIFFFillStrilesInternal( TIFF *tif, int loadStripByteCount )
+{
 #if defined(DEFER_STRILE_LOAD)
         register TIFFDirectory *td = &tif->tif_dir;
         int return_value = 1;
@@ -5574,7 +5606,8 @@ int _TIFFFillStriles( TIFF *tif )
                 return_value = 0;
         }
 
-        if (!TIFFFetchStripThing(tif,&(td->td_stripbytecount_entry),
+        if (loadStripByteCount &&
+            !TIFFFetchStripThing(tif,&(td->td_stripbytecount_entry),
                                  td->td_nstrips,&td->td_stripbytecount))
         {
                 return_value = 0;
@@ -5599,6 +5632,7 @@ int _TIFFFillStriles( TIFF *tif )
         return return_value;
 #else /* !defined(DEFER_STRILE_LOAD) */
         (void) tif;
+        (void) loadStripByteCount;
         return 1;
 #endif 
 }
diff --git a/src/3rdparty/libtiff/libtiff/tif_dirwrite.c b/src/3rdparty/libtiff/libtiff/tif_dirwrite.c
index a0fd8dc..d34f6f6 100644
--- a/src/3rdparty/libtiff/libtiff/tif_dirwrite.c
+++ b/src/3rdparty/libtiff/libtiff/tif_dirwrite.c
@@ -1,4 +1,4 @@
-/* $Id: tif_dirwrite.c,v 1.78 2015-05-31 00:38:46 bfriesen Exp $ */
+/* $Id: tif_dirwrite.c,v 1.83 2016-10-25 21:35:15 erouault Exp $ */
 
 /*
  * Copyright (c) 1988-1997 Sam Leffler
@@ -542,8 +542,20 @@ TIFFWriteDirectorySec(TIFF* tif, int isimage, int imagedone, uint64* pdiroff)
 			{
 				if (!isTiled(tif))
 				{
-					if (!TIFFWriteDirectoryTagLongLong8Array(tif,&ndir,dir,TIFFTAG_STRIPOFFSETS,tif->tif_dir.td_nstrips,tif->tif_dir.td_stripoffset))
-						goto bad;
+                    /* td_stripoffset might be NULL in an odd OJPEG case. See
+                     *  tif_dirread.c around line 3634.
+                     * XXX: OJPEG hack.
+                     * If a) compression is OJPEG, b) it's not a tiled TIFF,
+                     * and c) the number of strips is 1,
+                     * then we tolerate the absence of stripoffsets tag,
+                     * because, presumably, all required data is in the
+                     * JpegInterchangeFormat stream.
+                     * We can get here when using tiffset on such a file.
+                     * See http://bugzilla.maptools.org/show_bug.cgi?id=2500
+                    */
+                    if (tif->tif_dir.td_stripoffset != NULL &&
+                        !TIFFWriteDirectoryTagLongLong8Array(tif,&ndir,dir,TIFFTAG_STRIPOFFSETS,tif->tif_dir.td_nstrips,tif->tif_dir.td_stripoffset))
+                        goto bad;
 				}
 				else
 				{
@@ -645,7 +657,7 @@ TIFFWriteDirectorySec(TIFF* tif, int isimage, int imagedone, uint64* pdiroff)
 									assert(o->field_passcount==0);
 									TIFFGetField(tif,o->field_tag,&pb);
 									pa=(uint32)(strlen(pb));
-									if (!TIFFWriteDirectoryTagAscii(tif,&ndir,dir,o->field_tag,pa,pb))
+									if (!TIFFWriteDirectoryTagAscii(tif,&ndir,dir,(uint16)o->field_tag,pa,pb))
 										goto bad;
 								}
 								break;
@@ -656,7 +668,7 @@ TIFFWriteDirectorySec(TIFF* tif, int isimage, int imagedone, uint64* pdiroff)
 									assert(o->field_readcount==1);
 									assert(o->field_passcount==0);
 									TIFFGetField(tif,o->field_tag,&p);
-									if (!TIFFWriteDirectoryTagShort(tif,&ndir,dir,o->field_tag,p))
+									if (!TIFFWriteDirectoryTagShort(tif,&ndir,dir,(uint16)o->field_tag,p))
 										goto bad;
 								}
 								break;
@@ -667,7 +679,7 @@ TIFFWriteDirectorySec(TIFF* tif, int isimage, int imagedone, uint64* pdiroff)
 									assert(o->field_readcount==1);
 									assert(o->field_passcount==0);
 									TIFFGetField(tif,o->field_tag,&p);
-									if (!TIFFWriteDirectoryTagLong(tif,&ndir,dir,o->field_tag,p))
+									if (!TIFFWriteDirectoryTagLong(tif,&ndir,dir,(uint16)o->field_tag,p))
 										goto bad;
 								}
 								break;
@@ -679,7 +691,7 @@ TIFFWriteDirectorySec(TIFF* tif, int isimage, int imagedone, uint64* pdiroff)
 									assert(o->field_readcount==TIFF_VARIABLE2);
 									assert(o->field_passcount==1);
 									TIFFGetField(tif,o->field_tag,&pa,&pb);
-									if (!TIFFWriteDirectoryTagUndefinedArray(tif,&ndir,dir,o->field_tag,pa,pb))
+									if (!TIFFWriteDirectoryTagUndefinedArray(tif,&ndir,dir,(uint16)o->field_tag,pa,pb))
 										goto bad;
 								}
 								break;
@@ -693,70 +705,72 @@ TIFFWriteDirectorySec(TIFF* tif, int isimage, int imagedone, uint64* pdiroff)
 		}
 		for (m=0; m<(uint32)(tif->tif_dir.td_customValueCount); m++)
 		{
+                        uint16 tag = (uint16)tif->tif_dir.td_customValues[m].info->field_tag;
+                        uint32 count = tif->tif_dir.td_customValues[m].count;
 			switch (tif->tif_dir.td_customValues[m].info->field_type)
 			{
 				case TIFF_ASCII:
-					if (!TIFFWriteDirectoryTagAscii(tif,&ndir,dir,tif->tif_dir.td_customValues[m].info->field_tag,tif->tif_dir.td_customValues[m].count,tif->tif_dir.td_customValues[m].value))
+					if (!TIFFWriteDirectoryTagAscii(tif,&ndir,dir,tag,count,tif->tif_dir.td_customValues[m].value))
 						goto bad;
 					break;
 				case TIFF_UNDEFINED:
-					if (!TIFFWriteDirectoryTagUndefinedArray(tif,&ndir,dir,tif->tif_dir.td_customValues[m].info->field_tag,tif->tif_dir.td_customValues[m].count,tif->tif_dir.td_customValues[m].value))
+					if (!TIFFWriteDirectoryTagUndefinedArray(tif,&ndir,dir,tag,count,tif->tif_dir.td_customValues[m].value))
 						goto bad;
 					break;
 				case TIFF_BYTE:
-					if (!TIFFWriteDirectoryTagByteArray(tif,&ndir,dir,tif->tif_dir.td_customValues[m].info->field_tag,tif->tif_dir.td_customValues[m].count,tif->tif_dir.td_customValues[m].value))
+					if (!TIFFWriteDirectoryTagByteArray(tif,&ndir,dir,tag,count,tif->tif_dir.td_customValues[m].value))
 						goto bad;
 					break;
 				case TIFF_SBYTE:
-					if (!TIFFWriteDirectoryTagSbyteArray(tif,&ndir,dir,tif->tif_dir.td_customValues[m].info->field_tag,tif->tif_dir.td_customValues[m].count,tif->tif_dir.td_customValues[m].value))
+					if (!TIFFWriteDirectoryTagSbyteArray(tif,&ndir,dir,tag,count,tif->tif_dir.td_customValues[m].value))
 						goto bad;
 					break;
 				case TIFF_SHORT:
-					if (!TIFFWriteDirectoryTagShortArray(tif,&ndir,dir,tif->tif_dir.td_customValues[m].info->field_tag,tif->tif_dir.td_customValues[m].count,tif->tif_dir.td_customValues[m].value))
+					if (!TIFFWriteDirectoryTagShortArray(tif,&ndir,dir,tag,count,tif->tif_dir.td_customValues[m].value))
 						goto bad;
 					break;
 				case TIFF_SSHORT:
-					if (!TIFFWriteDirectoryTagSshortArray(tif,&ndir,dir,tif->tif_dir.td_customValues[m].info->field_tag,tif->tif_dir.td_customValues[m].count,tif->tif_dir.td_customValues[m].value))
+					if (!TIFFWriteDirectoryTagSshortArray(tif,&ndir,dir,tag,count,tif->tif_dir.td_customValues[m].value))
 						goto bad;
 					break;
 				case TIFF_LONG:
-					if (!TIFFWriteDirectoryTagLongArray(tif,&ndir,dir,tif->tif_dir.td_customValues[m].info->field_tag,tif->tif_dir.td_customValues[m].count,tif->tif_dir.td_customValues[m].value))
+					if (!TIFFWriteDirectoryTagLongArray(tif,&ndir,dir,tag,count,tif->tif_dir.td_customValues[m].value))
 						goto bad;
 					break;
 				case TIFF_SLONG:
-					if (!TIFFWriteDirectoryTagSlongArray(tif,&ndir,dir,tif->tif_dir.td_customValues[m].info->field_tag,tif->tif_dir.td_customValues[m].count,tif->tif_dir.td_customValues[m].value))
+					if (!TIFFWriteDirectoryTagSlongArray(tif,&ndir,dir,tag,count,tif->tif_dir.td_customValues[m].value))
 						goto bad;
 					break;
 				case TIFF_LONG8:
-					if (!TIFFWriteDirectoryTagLong8Array(tif,&ndir,dir,tif->tif_dir.td_customValues[m].info->field_tag,tif->tif_dir.td_customValues[m].count,tif->tif_dir.td_customValues[m].value))
+					if (!TIFFWriteDirectoryTagLong8Array(tif,&ndir,dir,tag,count,tif->tif_dir.td_customValues[m].value))
 						goto bad;
 					break;
 				case TIFF_SLONG8:
-					if (!TIFFWriteDirectoryTagSlong8Array(tif,&ndir,dir,tif->tif_dir.td_customValues[m].info->field_tag,tif->tif_dir.td_customValues[m].count,tif->tif_dir.td_customValues[m].value))
+					if (!TIFFWriteDirectoryTagSlong8Array(tif,&ndir,dir,tag,count,tif->tif_dir.td_customValues[m].value))
 						goto bad;
 					break;
 				case TIFF_RATIONAL:
-					if (!TIFFWriteDirectoryTagRationalArray(tif,&ndir,dir,tif->tif_dir.td_customValues[m].info->field_tag,tif->tif_dir.td_customValues[m].count,tif->tif_dir.td_customValues[m].value))
+					if (!TIFFWriteDirectoryTagRationalArray(tif,&ndir,dir,tag,count,tif->tif_dir.td_customValues[m].value))
 						goto bad;
 					break;
 				case TIFF_SRATIONAL:
-					if (!TIFFWriteDirectoryTagSrationalArray(tif,&ndir,dir,tif->tif_dir.td_customValues[m].info->field_tag,tif->tif_dir.td_customValues[m].count,tif->tif_dir.td_customValues[m].value))
+					if (!TIFFWriteDirectoryTagSrationalArray(tif,&ndir,dir,tag,count,tif->tif_dir.td_customValues[m].value))
 						goto bad;
 					break;
 				case TIFF_FLOAT:
-					if (!TIFFWriteDirectoryTagFloatArray(tif,&ndir,dir,tif->tif_dir.td_customValues[m].info->field_tag,tif->tif_dir.td_customValues[m].count,tif->tif_dir.td_customValues[m].value))
+					if (!TIFFWriteDirectoryTagFloatArray(tif,&ndir,dir,tag,count,tif->tif_dir.td_customValues[m].value))
 						goto bad;
 					break;
 				case TIFF_DOUBLE:
-					if (!TIFFWriteDirectoryTagDoubleArray(tif,&ndir,dir,tif->tif_dir.td_customValues[m].info->field_tag,tif->tif_dir.td_customValues[m].count,tif->tif_dir.td_customValues[m].value))
+					if (!TIFFWriteDirectoryTagDoubleArray(tif,&ndir,dir,tag,count,tif->tif_dir.td_customValues[m].value))
 						goto bad;
 					break;
 				case TIFF_IFD:
-					if (!TIFFWriteDirectoryTagIfdArray(tif,&ndir,dir,tif->tif_dir.td_customValues[m].info->field_tag,tif->tif_dir.td_customValues[m].count,tif->tif_dir.td_customValues[m].value))
+					if (!TIFFWriteDirectoryTagIfdArray(tif,&ndir,dir,tag,count,tif->tif_dir.td_customValues[m].value))
 						goto bad;
 					break;
 				case TIFF_IFD8:
-					if (!TIFFWriteDirectoryTagIfdIfd8Array(tif,&ndir,dir,tif->tif_dir.td_customValues[m].info->field_tag,tif->tif_dir.td_customValues[m].count,tif->tif_dir.td_customValues[m].value))
+					if (!TIFFWriteDirectoryTagIfdIfd8Array(tif,&ndir,dir,tag,count,tif->tif_dir.td_customValues[m].value))
 						goto bad;
 					break;
 				default:
@@ -778,7 +792,7 @@ TIFFWriteDirectorySec(TIFF* tif, int isimage, int imagedone, uint64* pdiroff)
 				goto bad;
 		}
 		else
-			tif->tif_diroff=(TIFFSeekFile(tif,0,SEEK_END)+1)&(~1);
+			tif->tif_diroff=(TIFFSeekFile(tif,0,SEEK_END)+1)&(~((toff_t)1));
 		if (pdiroff!=NULL)
 			*pdiroff=tif->tif_diroff;
 		if (!(tif->tif_flags&TIFF_BIGTIFF))
@@ -828,7 +842,7 @@ TIFFWriteDirectorySec(TIFF* tif, int isimage, int imagedone, uint64* pdiroff)
 		uint32 nTmp;
 		TIFFDirEntry* o;
 		n=dirmem;
-		*(uint16*)n=ndir;
+		*(uint16*)n=(uint16)ndir;
 		if (tif->tif_flags&TIFF_SWAB)
 			TIFFSwabShort((uint16*)n);
 		n+=2;
@@ -2141,13 +2155,13 @@ TIFFWriteDirectoryTagCheckedRationalArray(TIFF* tif, uint32* ndir, TIFFDirEntry*
 		}
 		else if (*na<1.0)
 		{
-			nb[0]=(uint32)((*na)*0xFFFFFFFF);
+			nb[0]=(uint32)((double)(*na)*0xFFFFFFFF);
 			nb[1]=0xFFFFFFFF;
 		}
 		else
 		{
 			nb[0]=0xFFFFFFFF;
-			nb[1]=(uint32)(0xFFFFFFFF/(*na));
+			nb[1]=(uint32)((double)0xFFFFFFFF/(*na));
 		}
 	}
 	if (tif->tif_flags&TIFF_SWAB)
@@ -2184,13 +2198,13 @@ TIFFWriteDirectoryTagCheckedSrationalArray(TIFF* tif, uint32* ndir, TIFFDirEntry
 			}
 			else if (*na>-1.0)
 			{
-				nb[0]=-(int32)((-*na)*0x7FFFFFFF);
+				nb[0]=-(int32)((double)(-*na)*0x7FFFFFFF);
 				nb[1]=0x7FFFFFFF;
 			}
 			else
 			{
 				nb[0]=-0x7FFFFFFF;
-				nb[1]=(int32)(0x7FFFFFFF/(-*na));
+				nb[1]=(int32)((double)0x7FFFFFFF/(-*na));
 			}
 		}
 		else
@@ -2202,13 +2216,13 @@ TIFFWriteDirectoryTagCheckedSrationalArray(TIFF* tif, uint32* ndir, TIFFDirEntry
 			}
 			else if (*na<1.0)
 			{
-				nb[0]=(int32)((*na)*0x7FFFFFFF);
+				nb[0]=(int32)((double)(*na)*0x7FFFFFFF);
 				nb[1]=0x7FFFFFFF;
 			}
 			else
 			{
 				nb[0]=0x7FFFFFFF;
-				nb[1]=(int32)(0x7FFFFFFF/(*na));
+				nb[1]=(int32)((double)0x7FFFFFFF/(*na));
 			}
 		}
 	}
@@ -2368,7 +2382,7 @@ TIFFLinkDirectory(TIFF* tif)
 {
 	static const char module[] = "TIFFLinkDirectory";
 
-	tif->tif_diroff = (TIFFSeekFile(tif,0,SEEK_END)+1) &~ 1;
+	tif->tif_diroff = (TIFFSeekFile(tif,0,SEEK_END)+1) & (~((toff_t)1));
 
 	/*
 	 * Handle SubIFDs
diff --git a/src/3rdparty/libtiff/libtiff/tif_dumpmode.c b/src/3rdparty/libtiff/libtiff/tif_dumpmode.c
index a94cf0b..a6a94c0 100644
--- a/src/3rdparty/libtiff/libtiff/tif_dumpmode.c
+++ b/src/3rdparty/libtiff/libtiff/tif_dumpmode.c
@@ -1,4 +1,4 @@
-/* $Header: /cvs/maptools/cvsroot/libtiff/libtiff/tif_dumpmode.c,v 1.14 2011-04-02 20:54:09 bfriesen Exp $ */
+/* $Header: /cvs/maptools/cvsroot/libtiff/libtiff/tif_dumpmode.c,v 1.15 2015-12-12 18:04:26 erouault Exp $ */
 
 /*
  * Copyright (c) 1988-1997 Sam Leffler
@@ -66,7 +66,7 @@ DumpModeEncode(TIFF* tif, uint8* pp, tmsize_t cc, uint16 s)
 		cc -= n;
 		if (tif->tif_rawcc >= tif->tif_rawdatasize &&
 		    !TIFFFlushData1(tif))
-			return (-1);
+			return (0);
 	}
 	return (1);
 }
diff --git a/src/3rdparty/libtiff/libtiff/tif_extension.c b/src/3rdparty/libtiff/libtiff/tif_extension.c
index 10afd41..39fab4c 100644
--- a/src/3rdparty/libtiff/libtiff/tif_extension.c
+++ b/src/3rdparty/libtiff/libtiff/tif_extension.c
@@ -1,4 +1,4 @@
-/* $Header: /cvs/maptools/cvsroot/libtiff/libtiff/tif_extension.c,v 1.7 2010-03-10 18:56:48 bfriesen Exp $ */
+/* $Header: /cvs/maptools/cvsroot/libtiff/libtiff/tif_extension.c,v 1.8 2015-12-06 11:13:43 erouault Exp $ */
 
 /*
  * Copyright (c) 1988-1997 Sam Leffler
@@ -66,13 +66,13 @@ TIFFTagMethods *TIFFAccessTagMethods( TIFF *tif )
 void *TIFFGetClientInfo( TIFF *tif, const char *name )
 
 {
-    TIFFClientInfoLink *link = tif->tif_clientinfo;
+    TIFFClientInfoLink *psLink = tif->tif_clientinfo;
 
-    while( link != NULL && strcmp(link->name,name) != 0 )
-        link = link->next;
+    while( psLink != NULL && strcmp(psLink->name,name) != 0 )
+        psLink = psLink->next;
 
-    if( link != NULL )
-        return link->data;
+    if( psLink != NULL )
+        return psLink->data;
     else
         return NULL;
 }
@@ -80,18 +80,18 @@ void *TIFFGetClientInfo( TIFF *tif, const char *name )
 void TIFFSetClientInfo( TIFF *tif, void *data, const char *name )
 
 {
-    TIFFClientInfoLink *link = tif->tif_clientinfo;
+    TIFFClientInfoLink *psLink = tif->tif_clientinfo;
 
     /*
     ** Do we have an existing link with this name?  If so, just
     ** set it.
     */
-    while( link != NULL && strcmp(link->name,name) != 0 )
-        link = link->next;
+    while( psLink != NULL && strcmp(psLink->name,name) != 0 )
+        psLink = psLink->next;
 
-    if( link != NULL )
+    if( psLink != NULL )
     {
-        link->data = data;
+        psLink->data = data;
         return;
     }
 
@@ -99,15 +99,15 @@ void TIFFSetClientInfo( TIFF *tif, void *data, const char *name )
     ** Create a new link.
     */
 
-    link = (TIFFClientInfoLink *) _TIFFmalloc(sizeof(TIFFClientInfoLink));
-    assert (link != NULL);
-    link->next = tif->tif_clientinfo;
-    link->name = (char *) _TIFFmalloc((tmsize_t)(strlen(name)+1));
-    assert (link->name != NULL);
-    strcpy(link->name, name);
-    link->data = data;
+    psLink = (TIFFClientInfoLink *) _TIFFmalloc(sizeof(TIFFClientInfoLink));
+    assert (psLink != NULL);
+    psLink->next = tif->tif_clientinfo;
+    psLink->name = (char *) _TIFFmalloc((tmsize_t)(strlen(name)+1));
+    assert (psLink->name != NULL);
+    strcpy(psLink->name, name);
+    psLink->data = data;
 
-    tif->tif_clientinfo = link;
+    tif->tif_clientinfo = psLink;
 }
 /*
  * Local Variables:
diff --git a/src/3rdparty/libtiff/libtiff/tif_fax3.c b/src/3rdparty/libtiff/libtiff/tif_fax3.c
index bbe7255..16bb4d3 100644
--- a/src/3rdparty/libtiff/libtiff/tif_fax3.c
+++ b/src/3rdparty/libtiff/libtiff/tif_fax3.c
@@ -1,4 +1,4 @@
-/* $Id: tif_fax3.c,v 1.75 2015-08-30 20:49:55 erouault Exp $ */
+/* $Id: tif_fax3.c,v 1.78 2016-09-04 21:32:56 erouault Exp $ */
 
 /*
  * Copyright (c) 1990-1997 Sam Leffler
@@ -644,7 +644,8 @@ putspan(TIFF* tif, int32 span, const tableentry* tab)
 
 	while (span >= 2624) {
 		const tableentry* te = &tab[63 + (2560>>6)];
-		code = te->code, length = te->length;
+		code = te->code;
+		length = te->length;
 #ifdef FAX3_DEBUG
 		DEBUG_PRINT("MakeUp", te->runlen);
 #endif
@@ -654,14 +655,16 @@ putspan(TIFF* tif, int32 span, const tableentry* tab)
 	if (span >= 64) {
 		const tableentry* te = &tab[63 + (span>>6)];
 		assert(te->runlen == 64*(span>>6));
-		code = te->code, length = te->length;
+		code = te->code;
+		length = te->length;
 #ifdef FAX3_DEBUG
 		DEBUG_PRINT("MakeUp", te->runlen);
 #endif
 		_PutBits(tif, code, length);
 		span -= te->runlen;
 	}
-	code = tab[span].code, length = tab[span].length;
+	code = tab[span].code;
+	length = tab[span].length;
 #ifdef FAX3_DEBUG
 	DEBUG_PRINT("  Term", tab[span].runlen);
 #endif
@@ -697,14 +700,16 @@ Fax3PutEOL(TIFF* tif)
 				align = sp->bit + (8 - align);
 			else
 				align = sp->bit - align;
-			code = 0;
 			tparm=align; 
 			_PutBits(tif, 0, tparm);
 		}
 	}
-	code = EOL, length = 12;
-	if (is2DEncoding(sp))
-		code = (code<<1) | (sp->tag == G3_1D), length++;
+	code = EOL;
+	length = 12;
+	if (is2DEncoding(sp)) {
+		code = (code<<1) | (sp->tag == G3_1D);
+		length++;
+	}
 	_PutBits(tif, code, length);
 
 	sp->data = data;
@@ -815,7 +820,7 @@ find0span(unsigned char* bp, int32 bs, int32 be)
 	/*
 	 * Check partial byte on lhs.
 	 */
-	if (bits > 0 && (n = (bs & 7))) {
+	if (bits > 0 && (n = (bs & 7)) != 0) {
 		span = zeroruns[(*bp << n) & 0xff];
 		if (span > 8-n)		/* table value too generous */
 			span = 8-n;
@@ -835,12 +840,14 @@ find0span(unsigned char* bp, int32 bs, int32 be)
 		while (!isAligned(bp, long)) {
 			if (*bp != 0x00)
 				return (span + zeroruns[*bp]);
-			span += 8, bits -= 8;
+			span += 8;
+			bits -= 8;
 			bp++;
 		}
 		lp = (long*) bp;
 		while ((bits >= (int32)(8 * sizeof(long))) && (0 == *lp)) {
-			span += 8*sizeof (long), bits -= 8*sizeof (long);
+			span += 8*sizeof (long);
+			bits -= 8*sizeof (long);
 			lp++;
 		}
 		bp = (unsigned char*) lp;
@@ -851,7 +858,8 @@ find0span(unsigned char* bp, int32 bs, int32 be)
 	while (bits >= 8) {
 		if (*bp != 0x00)	/* end of run */
 			return (span + zeroruns[*bp]);
-		span += 8, bits -= 8;
+		span += 8;
+		bits -= 8;
 		bp++;
 	}
 	/*
@@ -874,7 +882,7 @@ find1span(unsigned char* bp, int32 bs, int32 be)
 	/*
 	 * Check partial byte on lhs.
 	 */
-	if (bits > 0 && (n = (bs & 7))) {
+	if (bits > 0 && (n = (bs & 7)) != 0) {
 		span = oneruns[(*bp << n) & 0xff];
 		if (span > 8-n)		/* table value too generous */
 			span = 8-n;
@@ -894,12 +902,14 @@ find1span(unsigned char* bp, int32 bs, int32 be)
 		while (!isAligned(bp, long)) {
 			if (*bp != 0xff)
 				return (span + oneruns[*bp]);
-			span += 8, bits -= 8;
+			span += 8;
+			bits -= 8;
 			bp++;
 		}
 		lp = (long*) bp;
 		while ((bits >= (int32)(8 * sizeof(long))) && (~0 == *lp)) {
-			span += 8*sizeof (long), bits -= 8*sizeof (long);
+			span += 8*sizeof (long);
+			bits -= 8*sizeof (long);
 			lp++;
 		}
 		bp = (unsigned char*) lp;
@@ -910,7 +920,8 @@ find1span(unsigned char* bp, int32 bs, int32 be)
 	while (bits >= 8) {
 		if (*bp != 0xff)	/* end of run */
 			return (span + oneruns[*bp]);
-		span += 8, bits -= 8;
+		span += 8;
+		bits -= 8;
 		bp++;
 	}
 	/*
@@ -1094,8 +1105,10 @@ Fax3Close(TIFF* tif)
 		unsigned int length = 12;
 		int i;
 
-		if (is2DEncoding(sp))
-			code = (code<<1) | (sp->tag == G3_1D), length++;
+		if (is2DEncoding(sp)) {
+			code = (code<<1) | (sp->tag == G3_1D);
+			length++;
+		}
 		for (i = 0; i < 6; i++)
 			Fax3PutBits(tif, code, length);
 		Fax3FlushBits(tif, sp);
@@ -1182,7 +1195,7 @@ Fax3VSetField(TIFF* tif, uint32 tag, va_list ap)
 		return (*sp->vsetparent)(tif, tag, ap);
 	}
 	
-	if ((fip = TIFFFieldWithTag(tif, tag)))
+	if ((fip = TIFFFieldWithTag(tif, tag)) != NULL)
 		TIFFSetFieldBit(tif, fip->field_bit);
 	else
 		return 0;
@@ -1241,10 +1254,14 @@ Fax3PrintDir(TIFF* tif, FILE* fd, long flags)
 		} else {
 
 			fprintf(fd, "  Group 3 Options:");
-			if (sp->groupoptions & GROUP3OPT_2DENCODING)
-				fprintf(fd, "%s2-d encoding", sep), sep = "+";
-			if (sp->groupoptions & GROUP3OPT_FILLBITS)
-				fprintf(fd, "%sEOL padding", sep), sep = "+";
+			if (sp->groupoptions & GROUP3OPT_2DENCODING) {
+				fprintf(fd, "%s2-d encoding", sep);
+				sep = "+";
+			}
+			if (sp->groupoptions & GROUP3OPT_FILLBITS) {
+				fprintf(fd, "%sEOL padding", sep);
+				sep = "+";
+			}
 			if (sp->groupoptions & GROUP3OPT_UNCOMPRESSED)
 				fprintf(fd, "%suncompressed data", sep);
 		}
diff --git a/src/3rdparty/libtiff/libtiff/tif_fax3.h b/src/3rdparty/libtiff/libtiff/tif_fax3.h
index b0f46c9..e0b2ca6 100644
--- a/src/3rdparty/libtiff/libtiff/tif_fax3.h
+++ b/src/3rdparty/libtiff/libtiff/tif_fax3.h
@@ -1,4 +1,4 @@
-/* $Id: tif_fax3.h,v 1.9 2011-03-10 20:23:07 fwarmerdam Exp $ */
+/* $Id: tif_fax3.h,v 1.11 2016-01-23 21:20:34 erouault Exp $ */
 
 /*
  * Copyright (c) 1990-1997 Sam Leffler
@@ -39,7 +39,7 @@
 
 /*
  * To override the default routine used to image decoded
- * spans one can use the pseduo tag TIFFTAG_FAXFILLFUNC.
+ * spans one can use the pseudo tag TIFFTAG_FAXFILLFUNC.
  * The routine must have the type signature given below;
  * for example:
  *
@@ -84,7 +84,7 @@ extern void _TIFFFax3fillruns(unsigned char*, uint32*, uint32*, uint32);
 typedef struct {                /* state table entry */
 	unsigned char State;    /* see above */
 	unsigned char Width;    /* width of code in bits */
-	uint32 Param;           /* unsigned 32-bit run length in bits */
+	uint16 Param;           /* unsigned 16-bit run length in bits */
 } TIFFFaxTabEnt;
 
 extern const TIFFFaxTabEnt TIFFFaxMainTable[];
diff --git a/src/3rdparty/libtiff/libtiff/tif_getimage.c b/src/3rdparty/libtiff/libtiff/tif_getimage.c
index f49b73f..0f5e932 100644
--- a/src/3rdparty/libtiff/libtiff/tif_getimage.c
+++ b/src/3rdparty/libtiff/libtiff/tif_getimage.c
@@ -1,4 +1,4 @@
-/* $Id: tif_getimage.c,v 1.90 2015-06-17 01:34:08 bfriesen Exp $ */
+/* $Id: tif_getimage.c,v 1.98 2016-11-18 02:47:45 bfriesen Exp $ */
 
 /*
  * Copyright (c) 1991-1997 Sam Leffler
@@ -95,6 +95,10 @@ TIFFRGBAImageOK(TIFF* tif, char emsg[1024])
 			    td->td_bitspersample);
 			return (0);
 	}
+        if (td->td_sampleformat == SAMPLEFORMAT_IEEEFP) {
+                sprintf(emsg, "Sorry, can not handle images with IEEE floating-point samples");
+                return (0);
+        }
 	colorchannels = td->td_samplesperpixel - td->td_extrasamples;
 	if (!TIFFGetField(tif, TIFFTAG_PHOTOMETRIC, &photometric)) {
 		switch (colorchannels) {
@@ -182,25 +186,25 @@ TIFFRGBAImageOK(TIFF* tif, char emsg[1024])
 				    "Planarconfiguration", td->td_planarconfig);
 				return (0);
 			}
-			if( td->td_samplesperpixel != 3 )
-            {
-                sprintf(emsg,
-                        "Sorry, can not handle image with %s=%d",
-                        "Samples/pixel", td->td_samplesperpixel);
-                return 0;
-            }
+			if ( td->td_samplesperpixel != 3 || colorchannels != 3 ) {
+                                sprintf(emsg,
+                                        "Sorry, can not handle image with %s=%d, %s=%d",
+                                        "Samples/pixel", td->td_samplesperpixel,
+                                        "colorchannels", colorchannels);
+                                return 0;
+                        }
 			break;
 		case PHOTOMETRIC_CIELAB:
-            if( td->td_samplesperpixel != 3 || td->td_bitspersample != 8 )
-            {
-                sprintf(emsg,
-                        "Sorry, can not handle image with %s=%d and %s=%d",
-                        "Samples/pixel", td->td_samplesperpixel,
-                        "Bits/sample", td->td_bitspersample);
-                return 0;
-            }
+                        if ( td->td_samplesperpixel != 3 || colorchannels != 3 || td->td_bitspersample != 8 ) {
+                                sprintf(emsg,
+                                        "Sorry, can not handle image with %s=%d, %s=%d and %s=%d",
+                                        "Samples/pixel", td->td_samplesperpixel,
+                                        "colorchannels", colorchannels,
+                                        "Bits/sample", td->td_bitspersample);
+                                return 0;
+                        }
 			break;
-		default:
+                default:
 			sprintf(emsg, "Sorry, can not handle image with %s=%d",
 			    photoTag, photometric);
 			return (0);
@@ -211,20 +215,34 @@ TIFFRGBAImageOK(TIFF* tif, char emsg[1024])
 void
 TIFFRGBAImageEnd(TIFFRGBAImage* img)
 {
-	if (img->Map)
-		_TIFFfree(img->Map), img->Map = NULL;
-	if (img->BWmap)
-		_TIFFfree(img->BWmap), img->BWmap = NULL;
-	if (img->PALmap)
-		_TIFFfree(img->PALmap), img->PALmap = NULL;
-	if (img->ycbcr)
-		_TIFFfree(img->ycbcr), img->ycbcr = NULL;
-	if (img->cielab)
-		_TIFFfree(img->cielab), img->cielab = NULL;
-	if (img->UaToAa)
-		_TIFFfree(img->UaToAa), img->UaToAa = NULL;
-	if (img->Bitdepth16To8)
-		_TIFFfree(img->Bitdepth16To8), img->Bitdepth16To8 = NULL;
+	if (img->Map) {
+		_TIFFfree(img->Map);
+		img->Map = NULL;
+	}
+	if (img->BWmap) {
+		_TIFFfree(img->BWmap);
+		img->BWmap = NULL;
+	}
+	if (img->PALmap) {
+		_TIFFfree(img->PALmap);
+		img->PALmap = NULL;
+	}
+	if (img->ycbcr) {
+		_TIFFfree(img->ycbcr);
+		img->ycbcr = NULL;
+	}
+	if (img->cielab) {
+		_TIFFfree(img->cielab);
+		img->cielab = NULL;
+	}
+	if (img->UaToAa) {
+		_TIFFfree(img->UaToAa);
+		img->UaToAa = NULL;
+	}
+	if (img->Bitdepth16To8) {
+		_TIFFfree(img->Bitdepth16To8);
+		img->Bitdepth16To8 = NULL;
+	}
 
 	if( img->redcmap ) {
 		_TIFFfree( img->redcmap );
@@ -255,6 +273,9 @@ TIFFRGBAImageBegin(TIFFRGBAImage* img, TIFF* tif, int stop, char emsg[1024])
 	int colorchannels;
 	uint16 *red_orig, *green_orig, *blue_orig;
 	int n_color;
+	
+	if( !TIFFRGBAImageOK(tif, emsg) )
+		return 0;
 
 	/* Initialize to normal values */
 	img->row_offset = 0;
@@ -338,7 +359,7 @@ TIFFRGBAImageBegin(TIFFRGBAImage* img, TIFF* tif, int stop, char emsg[1024])
 			}
 
 			/* copy the colormaps so we can modify them */
-			n_color = (1L << img->bitspersample);
+			n_color = (1U << img->bitspersample);
 			img->redcmap = (uint16 *) _TIFFmalloc(sizeof(uint16)*n_color);
 			img->greencmap = (uint16 *) _TIFFmalloc(sizeof(uint16)*n_color);
 			img->bluecmap = (uint16 *) _TIFFmalloc(sizeof(uint16)*n_color);
@@ -351,7 +372,7 @@ TIFFRGBAImageBegin(TIFFRGBAImage* img, TIFF* tif, int stop, char emsg[1024])
 			_TIFFmemcpy( img->greencmap, green_orig, n_color * 2 );
 			_TIFFmemcpy( img->bluecmap, blue_orig, n_color * 2 );
 
-			/* fall thru... */
+			/* fall through... */
 		case PHOTOMETRIC_MINISWHITE:
 		case PHOTOMETRIC_MINISBLACK:
 			if (planarconfig == PLANARCONFIG_CONTIG
@@ -509,7 +530,7 @@ TIFFReadRGBAImageOriented(TIFF* tif,
     int ok;
 
 	if (TIFFRGBAImageOK(tif, emsg) && TIFFRGBAImageBegin(&img, tif, stop, emsg)) {
-		img.req_orientation = orientation;
+		img.req_orientation = (uint16)orientation;
 		/* XXX verify rwidth and rheight against width and height */
 		ok = TIFFRGBAImageGet(&img, raster+(rheight-img.height)*rwidth,
 			rwidth, img.height);
@@ -696,7 +717,8 @@ gtTileContig(TIFFRGBAImage* img, uint32* raster, uint32 w, uint32 h)
 			    uint32 temp = *left;
 			    *left = *right;
 			    *right = temp;
-			    left++, right--;
+			    left++;
+				right--;
 		    }
 	    }
     }
@@ -729,7 +751,7 @@ gtTileSeparate(TIFFRGBAImage* img, uint32* raster, uint32 w, uint32 h)
 	int alpha = img->alpha;
 	uint32 nrow;
 	int ret = 1, flip;
-        int colorchannels;
+        uint16 colorchannels;
 	uint32 this_tw, tocol;
 	int32 this_toskew, leftmost_toskew;
 	int32 leftmost_fromskew;
@@ -863,7 +885,8 @@ gtTileSeparate(TIFFRGBAImage* img, uint32* raster, uint32 w, uint32 h)
 				uint32 temp = *left;
 				*left = *right;
 				*right = temp;
-				left++, right--;
+				left++;
+				right--;
 			}
 		}
 	}
@@ -953,7 +976,8 @@ gtStripContig(TIFFRGBAImage* img, uint32* raster, uint32 w, uint32 h)
 				uint32 temp = *left;
 				*left = *right;
 				*right = temp;
-				left++, right--;
+				left++;
+				right--;
 			}
 		}
 	}
@@ -984,7 +1008,8 @@ gtStripSeparate(TIFFRGBAImage* img, uint32* raster, uint32 w, uint32 h)
 	tmsize_t bufsize;
 	int32 fromskew, toskew;
 	int alpha = img->alpha;
-	int ret = 1, flip, colorchannels;
+	int ret = 1, flip;
+        uint16 colorchannels;
 
 	stripsize = TIFFStripSize(tif);  
 	bufsize = TIFFSafeMultiply(tmsize_t,alpha?4:3,stripsize);
@@ -1086,7 +1111,8 @@ gtStripSeparate(TIFFRGBAImage* img, uint32* raster, uint32 w, uint32 h)
 				uint32 temp = *left;
 				*left = *right;
 				*right = temp;
-				left++, right--;
+				left++;
+				right--;
 			}
 		}
 	}
@@ -1414,7 +1440,7 @@ DECLAREContigPutFunc(putRGBUAcontig8bittile)
 		uint8* m;
 		for (x = w; x-- > 0;) {
 			a = pp[3];
-			m = img->UaToAa+(a<<8);
+			m = img->UaToAa+((size_t) a<<8);
 			r = m[pp[0]];
 			g = m[pp[1]];
 			b = m[pp[2]];
@@ -1485,7 +1511,7 @@ DECLAREContigPutFunc(putRGBUAcontig16bittile)
 		uint8* m;
 		for (x = w; x-- > 0;) {
 			a = img->Bitdepth16To8[wp[3]];
-			m = img->UaToAa+(a<<8);
+			m = img->UaToAa+((size_t) a<<8);
 			r = m[img->Bitdepth16To8[wp[0]]];
 			g = m[img->Bitdepth16To8[wp[1]]];
 			b = m[img->Bitdepth16To8[wp[2]]];
@@ -1616,7 +1642,7 @@ DECLARESepPutFunc(putRGBUAseparate8bittile)
 		uint8* m;
 		for (x = w; x-- > 0;) {
 			av = *a++;
-			m = img->UaToAa+(av<<8);
+			m = img->UaToAa+((size_t) av<<8);
 			rv = m[*r++];
 			gv = m[*g++];
 			bv = m[*b++];
@@ -1678,15 +1704,15 @@ DECLARESepPutFunc(putRGBUAseparate16bittile)
 	uint16 *wa = (uint16*) a;
 	(void) img; (void) y;
 	while (h-- > 0) {
-		uint32 r,g,b,a;
+		uint32 r2,g2,b2,a2;
 		uint8* m;
 		for (x = w; x-- > 0;) {
-			a = img->Bitdepth16To8[*wa++];
-			m = img->UaToAa+(a<<8);
-			r = m[img->Bitdepth16To8[*wr++]];
-			g = m[img->Bitdepth16To8[*wg++]];
-			b = m[img->Bitdepth16To8[*wb++]];
-			*cp++ = PACK4(r,g,b,a);
+			a2 = img->Bitdepth16To8[*wa++];
+			m = img->UaToAa+((size_t) a2<<8);
+			r2 = m[img->Bitdepth16To8[*wr++]];
+			g2 = m[img->Bitdepth16To8[*wg++]];
+			b2 = m[img->Bitdepth16To8[*wb++]];
+			*cp++ = PACK4(r2,g2,b2,a2);
 		}
 		SKEW4(wr, wg, wb, wa, fromskew);
 		cp += toskew;
@@ -1841,10 +1867,16 @@ DECLAREContigPutFunc(putcontig8bitYCbCr44tile)
                 YCbCrtoRGB(cp3[2], pp[14]);
                 YCbCrtoRGB(cp3[3], pp[15]);
 
-                cp += 4, cp1 += 4, cp2 += 4, cp3 += 4;
+                cp += 4;
+                cp1 += 4;
+                cp2 += 4;
+                cp3 += 4;
                 pp += 18;
             } while (--x);
-            cp += incr, cp1 += incr, cp2 += incr, cp3 += incr;
+            cp += incr;
+            cp1 += incr;
+            cp2 += incr;
+            cp3 += incr;
             pp += fromskew;
         }
     } else {
@@ -1895,7 +1927,10 @@ DECLAREContigPutFunc(putcontig8bitYCbCr44tile)
             if (h <= 4)
                 break;
             h -= 4;
-            cp += incr, cp1 += incr, cp2 += incr, cp3 += incr;
+            cp += incr;
+            cp1 += incr;
+            cp2 += incr;
+            cp3 += incr;
             pp += fromskew;
         }
     }
@@ -1927,10 +1962,12 @@ DECLAREContigPutFunc(putcontig8bitYCbCr42tile)
                 YCbCrtoRGB(cp1[2], pp[6]);
                 YCbCrtoRGB(cp1[3], pp[7]);
                 
-                cp += 4, cp1 += 4;
+                cp += 4;
+                cp1 += 4;
                 pp += 10;
             } while (--x);
-            cp += incr, cp1 += incr;
+            cp += incr;
+            cp1 += incr;
             pp += fromskew;
         }
     } else {
@@ -1973,7 +2010,8 @@ DECLAREContigPutFunc(putcontig8bitYCbCr42tile)
             if (h <= 2)
                 break;
             h -= 2;
-            cp += incr, cp1 += incr;
+            cp += incr;
+            cp1 += incr;
             pp += fromskew;
         }
     }
@@ -2362,7 +2400,8 @@ setupMap(TIFFRGBAImage* img)
 	if (!makebwmap(img))
 	    return (0);
 	/* no longer need Map, free it */
-	_TIFFfree(img->Map), img->Map = NULL;
+	_TIFFfree(img->Map);
+	img->Map = NULL;
     }
     return (1);
 }
@@ -2470,7 +2509,7 @@ buildMap(TIFFRGBAImage* img)
     case PHOTOMETRIC_SEPARATED:
 	if (img->bitspersample == 8)
 	    break;
-	/* fall thru... */
+	/* fall through... */
     case PHOTOMETRIC_MINISBLACK:
     case PHOTOMETRIC_MINISWHITE:
 	if (!setupMap(img))
@@ -2508,29 +2547,33 @@ PickContigCase(TIFFRGBAImage* img)
 		case PHOTOMETRIC_RGB:
 			switch (img->bitspersample) {
 				case 8:
-					if (img->alpha == EXTRASAMPLE_ASSOCALPHA)
+					if (img->alpha == EXTRASAMPLE_ASSOCALPHA &&
+						img->samplesperpixel >= 4)
 						img->put.contig = putRGBAAcontig8bittile;
-					else if (img->alpha == EXTRASAMPLE_UNASSALPHA)
+					else if (img->alpha == EXTRASAMPLE_UNASSALPHA &&
+							 img->samplesperpixel >= 4)
 					{
 						if (BuildMapUaToAa(img))
 							img->put.contig = putRGBUAcontig8bittile;
 					}
-					else
+					else if( img->samplesperpixel >= 3 )
 						img->put.contig = putRGBcontig8bittile;
 					break;
 				case 16:
-					if (img->alpha == EXTRASAMPLE_ASSOCALPHA)
+					if (img->alpha == EXTRASAMPLE_ASSOCALPHA &&
+						img->samplesperpixel >=4 )
 					{
 						if (BuildMapBitdepth16To8(img))
 							img->put.contig = putRGBAAcontig16bittile;
 					}
-					else if (img->alpha == EXTRASAMPLE_UNASSALPHA)
+					else if (img->alpha == EXTRASAMPLE_UNASSALPHA &&
+							 img->samplesperpixel >=4 )
 					{
 						if (BuildMapBitdepth16To8(img) &&
 						    BuildMapUaToAa(img))
 							img->put.contig = putRGBUAcontig16bittile;
 					}
-					else
+					else if( img->samplesperpixel >=3 )
 					{
 						if (BuildMapBitdepth16To8(img))
 							img->put.contig = putRGBcontig16bittile;
@@ -2539,7 +2582,7 @@ PickContigCase(TIFFRGBAImage* img)
 			}
 			break;
 		case PHOTOMETRIC_SEPARATED:
-			if (buildMap(img)) {
+			if (img->samplesperpixel >=4 && buildMap(img)) {
 				if (img->bitspersample == 8) {
 					if (!img->Map)
 						img->put.contig = putRGBcontig8bitCMYKtile;
@@ -2635,7 +2678,7 @@ PickContigCase(TIFFRGBAImage* img)
 			}
 			break;
 		case PHOTOMETRIC_CIELAB:
-			if (buildMap(img)) {
+			if (img->samplesperpixel == 3 && buildMap(img)) {
 				if (img->bitspersample == 8)
 					img->put.contig = initCIELabConversion(img);
 				break;
@@ -2736,7 +2779,7 @@ BuildMapUaToAa(TIFFRGBAImage* img)
 	for (na=0; na<256; na++)
 	{
 		for (nv=0; nv<256; nv++)
-			*m++=(nv*na+127)/255;
+			*m++=(uint8)((nv*na+127)/255);
 	}
 	return(1);
 }
@@ -2756,7 +2799,7 @@ BuildMapBitdepth16To8(TIFFRGBAImage* img)
 	}
 	m=img->Bitdepth16To8;
 	for (n=0; n<65536; n++)
-		*m++=(n+128)/257;
+		*m++=(uint8)((n+128)/257);
 	return(1);
 }
 
diff --git a/src/3rdparty/libtiff/libtiff/tif_jpeg.c b/src/3rdparty/libtiff/libtiff/tif_jpeg.c
index a970eb1..70b72d8 100644
--- a/src/3rdparty/libtiff/libtiff/tif_jpeg.c
+++ b/src/3rdparty/libtiff/libtiff/tif_jpeg.c
@@ -1,4 +1,4 @@
-/* $Id: tif_jpeg.c,v 1.119 2015-08-15 20:13:07 bfriesen Exp $ */
+/* $Id: tif_jpeg.c,v 1.123 2016-01-23 21:20:34 erouault Exp $ */
 
 /*
  * Copyright (c) 1994-1997 Sam Leffler
@@ -58,7 +58,7 @@ int TIFFReInitJPEG_12( TIFF *tif, int scheme, int is_encode );
   Libjpeg's jmorecfg.h defines INT16 and INT32, but only if XMD_H is
   not defined.  Unfortunately, the MinGW and Borland compilers include
   a typedef for INT32, which causes a conflict.  MSVC does not include
-  a conficting typedef given the headers which are included.
+  a conflicting typedef given the headers which are included.
 */
 #if defined(__BORLANDC__) || defined(__MINGW32__)
 # define XMD_H 1
@@ -937,7 +937,7 @@ JPEGFixupTagsSubsamplingSkip(struct JPEGFixupTagsSubsamplingData* data, uint16 s
 	else
 	{
 		uint16 m;
-		m=skiplength-data->bufferbytesleft;
+		m=(uint16)(skiplength-data->bufferbytesleft);
 		if (m<=data->filebytesleft)
 		{
 			data->bufferbytesleft=0;
@@ -1283,7 +1283,7 @@ JPEGDecode(TIFF* tif, uint8* buf, tmsize_t cc, uint16 s)
                        if( line_work_buf != NULL )
                        {
                                /*
-                                * In the MK1 case, we aways read into a 16bit
+                                * In the MK1 case, we always read into a 16bit
                                 * buffer, and then pack down to 12bit or 8bit.
                                 * In 6B case we only read into 16 bit buffer
                                 * for 12bit data, which we need to repack.
@@ -1303,10 +1303,10 @@ JPEGDecode(TIFF* tif, uint8* buf, tmsize_t cc, uint16 s)
                                                        ((unsigned char *) buf) + iPair * 3;
                                                JSAMPLE *in_ptr = line_work_buf + iPair * 2;
 
-                                               out_ptr[0] = (in_ptr[0] & 0xff0) >> 4;
-                                               out_ptr[1] = ((in_ptr[0] & 0xf) << 4)
-                                                       | ((in_ptr[1] & 0xf00) >> 8);
-                                               out_ptr[2] = ((in_ptr[1] & 0xff) >> 0);
+                                               out_ptr[0] = (unsigned char)((in_ptr[0] & 0xff0) >> 4);
+                                               out_ptr[1] = (unsigned char)(((in_ptr[0] & 0xf) << 4)
+                                                       | ((in_ptr[1] & 0xf00) >> 8));
+                                               out_ptr[2] = (unsigned char)(((in_ptr[1] & 0xff) >> 0));
                                        }
                                }
                                else if( sp->cinfo.d.data_precision == 8 )
@@ -1367,7 +1367,7 @@ JPEGDecodeRaw(TIFF* tif, uint8* buf, tmsize_t cc, uint16 s)
 	(void) s;
 
 	/* data is expected to be read in multiples of a scanline */
-	if ( (nrows = sp->cinfo.d.image_height) ) {
+	if ( (nrows = sp->cinfo.d.image_height) != 0 ) {
 
 		/* Cb,Cr both have sampling factors 1, so this is correct */
 		JDIMENSION clumps_per_line = sp->cinfo.d.comp_info[1].downsampled_width;            
@@ -1467,10 +1467,10 @@ JPEGDecodeRaw(TIFF* tif, uint8* buf, tmsize_t cc, uint16 s)
 					{
 						unsigned char *out_ptr = ((unsigned char *) buf) + iPair * 3;
 						JSAMPLE *in_ptr = (JSAMPLE *) (tmpbuf + iPair * 2);
-						out_ptr[0] = (in_ptr[0] & 0xff0) >> 4;
-						out_ptr[1] = ((in_ptr[0] & 0xf) << 4)
-							| ((in_ptr[1] & 0xf00) >> 8);
-						out_ptr[2] = ((in_ptr[1] & 0xff) >> 0);
+						out_ptr[0] = (unsigned char)((in_ptr[0] & 0xff0) >> 4);
+						out_ptr[1] = (unsigned char)(((in_ptr[0] & 0xf) << 4)
+							| ((in_ptr[1] & 0xf00) >> 8));
+						out_ptr[2] = (unsigned char)(((in_ptr[1] & 0xff) >> 0));
 					}
 				}
 			}
@@ -1893,7 +1893,7 @@ JPEGEncode(TIFF* tif, uint8* buf, tmsize_t cc, uint16 s)
 
         if( sp->cinfo.c.data_precision == 12 )
         {
-            line16_count = (sp->bytesperline * 2) / 3;
+            line16_count = (int)((sp->bytesperline * 2) / 3);
             line16 = (short *) _TIFFmalloc(sizeof(short) * line16_count);
             if (!line16)
             {
@@ -2138,8 +2138,7 @@ JPEGVSetField(TIFF* tif, uint32 tag, va_list ap)
 			/* XXX */
 			return (0);
 		}
-		_TIFFsetByteArray(&sp->jpegtables, va_arg(ap, void*),
-		    (long) v32);
+		_TIFFsetByteArray(&sp->jpegtables, va_arg(ap, void*), v32);
 		sp->jpegtables_length = v32;
 		TIFFSetFieldBit(tif, FIELD_JPEGTABLES);
 		break;
@@ -2168,7 +2167,7 @@ JPEGVSetField(TIFF* tif, uint32 tag, va_list ap)
 		return (*sp->vsetparent)(tif, tag, ap);
 	}
 
-	if ((fip = TIFFFieldWithTag(tif, tag))) {
+	if ((fip = TIFFFieldWithTag(tif, tag)) != NULL) {
 		TIFFSetFieldBit(tif, fip->field_bit);
 	} else {
 		return (0);
diff --git a/src/3rdparty/libtiff/libtiff/tif_jpeg_12.c b/src/3rdparty/libtiff/libtiff/tif_jpeg_12.c
index 87aaa19..8499e64 100644
--- a/src/3rdparty/libtiff/libtiff/tif_jpeg_12.c
+++ b/src/3rdparty/libtiff/libtiff/tif_jpeg_12.c
@@ -5,6 +5,9 @@
 
 #  define TIFFInitJPEG TIFFInitJPEG_12
 
+int
+TIFFInitJPEG_12(TIFF* tif, int scheme);
+
 #  include LIBJPEG_12_PATH
 
 #  include "tif_jpeg.c"
diff --git a/src/3rdparty/libtiff/libtiff/tif_luv.c b/src/3rdparty/libtiff/libtiff/tif_luv.c
index 4e328ba..ca08f30 100644
--- a/src/3rdparty/libtiff/libtiff/tif_luv.c
+++ b/src/3rdparty/libtiff/libtiff/tif_luv.c
@@ -1,4 +1,4 @@
-/* $Id: tif_luv.c,v 1.40 2015-06-21 01:09:09 bfriesen Exp $ */
+/* $Id: tif_luv.c,v 1.43 2016-09-04 21:32:56 erouault Exp $ */
 
 /*
  * Copyright (c) 1997 Greg Ward Larson
@@ -202,7 +202,11 @@ LogL16Decode(TIFF* tif, uint8* op, tmsize_t occ, uint16 s)
 	if (sp->user_datafmt == SGILOGDATAFMT_16BIT)
 		tp = (int16*) op;
 	else {
-		assert(sp->tbuflen >= npixels);
+		if(sp->tbuflen < npixels) {
+			TIFFErrorExt(tif->tif_clientdata, module,
+						 "Translation buffer too short");
+			return (0);
+		}
 		tp = (int16*) sp->tbuf;
 	}
 	_TIFFmemset((void*) tp, 0, npixels*sizeof (tp[0]));
@@ -211,9 +215,11 @@ LogL16Decode(TIFF* tif, uint8* op, tmsize_t occ, uint16 s)
 	cc = tif->tif_rawcc;
 	/* get each byte string */
 	for (shft = 2*8; (shft -= 8) >= 0; ) {
-		for (i = 0; i < npixels && cc > 0; )
+		for (i = 0; i < npixels && cc > 0; ) {
 			if (*bp >= 128) {		/* run */
-				rc = *bp++ + (2-128);   /* TODO: potential input buffer overrun when decoding corrupt or truncated data */
+				if( cc < 2 )
+					break;
+				rc = *bp++ + (2-128);
 				b = (int16)(*bp++ << shft);
 				cc -= 2;
 				while (rc-- && i < npixels)
@@ -223,6 +229,7 @@ LogL16Decode(TIFF* tif, uint8* op, tmsize_t occ, uint16 s)
 				while (--cc && rc-- && i < npixels)
 					tp[i++] |= (int16)*bp++ << shft;
 			}
+		}
 		if (i != npixels) {
 #if defined(__WIN32__) && (defined(_MSC_VER) || defined(__MINGW32__))
 			TIFFErrorExt(tif->tif_clientdata, module,
@@ -268,13 +275,17 @@ LogLuvDecode24(TIFF* tif, uint8* op, tmsize_t occ, uint16 s)
 	if (sp->user_datafmt == SGILOGDATAFMT_RAW)
 		tp = (uint32 *)op;
 	else {
-		assert(sp->tbuflen >= npixels);
+		if(sp->tbuflen < npixels) {
+			TIFFErrorExt(tif->tif_clientdata, module,
+						 "Translation buffer too short");
+			return (0);
+		}
 		tp = (uint32 *) sp->tbuf;
 	}
 	/* copy to array of uint32 */
 	bp = (unsigned char*) tif->tif_rawcp;
 	cc = tif->tif_rawcc;
-	for (i = 0; i < npixels && cc > 0; i++) {
+	for (i = 0; i < npixels && cc >= 3; i++) {
 		tp[i] = bp[0] << 16 | bp[1] << 8 | bp[2];
 		bp += 3;
 		cc -= 3;
@@ -325,7 +336,11 @@ LogLuvDecode32(TIFF* tif, uint8* op, tmsize_t occ, uint16 s)
 	if (sp->user_datafmt == SGILOGDATAFMT_RAW)
 		tp = (uint32*) op;
 	else {
-		assert(sp->tbuflen >= npixels);
+		if(sp->tbuflen < npixels) {
+			TIFFErrorExt(tif->tif_clientdata, module,
+						 "Translation buffer too short");
+			return (0);
+		}
 		tp = (uint32*) sp->tbuf;
 	}
 	_TIFFmemset((void*) tp, 0, npixels*sizeof (tp[0]));
@@ -334,11 +349,13 @@ LogLuvDecode32(TIFF* tif, uint8* op, tmsize_t occ, uint16 s)
 	cc = tif->tif_rawcc;
 	/* get each byte string */
 	for (shft = 4*8; (shft -= 8) >= 0; ) {
-		for (i = 0; i < npixels && cc > 0; )
+		for (i = 0; i < npixels && cc > 0; ) {
 			if (*bp >= 128) {		/* run */
+				if( cc < 2 )
+					break;
 				rc = *bp++ + (2-128);
 				b = (uint32)*bp++ << shft;
-				cc -= 2;                /* TODO: potential input buffer overrun when decoding corrupt or truncated data */
+				cc -= 2;
 				while (rc-- && i < npixels)
 					tp[i++] |= b;
 			} else {			/* non-run */
@@ -346,6 +363,7 @@ LogLuvDecode32(TIFF* tif, uint8* op, tmsize_t occ, uint16 s)
 				while (--cc && rc-- && i < npixels)
 					tp[i++] |= (uint32)*bp++ << shft;
 			}
+		}
 		if (i != npixels) {
 #if defined(__WIN32__) && (defined(_MSC_VER) || defined(__MINGW32__))
 			TIFFErrorExt(tif->tif_clientdata, module,
@@ -383,8 +401,10 @@ LogLuvDecodeStrip(TIFF* tif, uint8* bp, tmsize_t cc, uint16 s)
                 return 0;
 
 	assert(cc%rowlen == 0);
-	while (cc && (*tif->tif_decoderow)(tif, bp, rowlen, s))
-		bp += rowlen, cc -= rowlen;
+	while (cc && (*tif->tif_decoderow)(tif, bp, rowlen, s)) {
+		bp += rowlen;
+		cc -= rowlen;
+	}
 	return (cc == 0);
 }
 
@@ -402,8 +422,10 @@ LogLuvDecodeTile(TIFF* tif, uint8* bp, tmsize_t cc, uint16 s)
                 return 0;
 
 	assert(cc%rowlen == 0);
-	while (cc && (*tif->tif_decoderow)(tif, bp, rowlen, s))
-		bp += rowlen, cc -= rowlen;
+	while (cc && (*tif->tif_decoderow)(tif, bp, rowlen, s)) {
+		bp += rowlen;
+		cc -= rowlen;
+	}
 	return (cc == 0);
 }
 
@@ -413,6 +435,7 @@ LogLuvDecodeTile(TIFF* tif, uint8* bp, tmsize_t cc, uint16 s)
 static int
 LogL16Encode(TIFF* tif, uint8* bp, tmsize_t cc, uint16 s)
 {
+	static const char module[] = "LogL16Encode";
 	LogLuvState* sp = EncoderState(tif);
 	int shft;
 	tmsize_t i;
@@ -433,7 +456,11 @@ LogL16Encode(TIFF* tif, uint8* bp, tmsize_t cc, uint16 s)
 		tp = (int16*) bp;
 	else {
 		tp = (int16*) sp->tbuf;
-		assert(sp->tbuflen >= npixels);
+		if(sp->tbuflen < npixels) {
+			TIFFErrorExt(tif->tif_clientdata, module,
+						 "Translation buffer too short");
+			return (0);
+		}
 		(*sp->tfunc)(sp, bp, npixels);
 	}
 	/* compress each byte string */
@@ -506,6 +533,7 @@ LogL16Encode(TIFF* tif, uint8* bp, tmsize_t cc, uint16 s)
 static int
 LogLuvEncode24(TIFF* tif, uint8* bp, tmsize_t cc, uint16 s)
 {
+	static const char module[] = "LogLuvEncode24";
 	LogLuvState* sp = EncoderState(tif);
 	tmsize_t i;
 	tmsize_t npixels;
@@ -521,7 +549,11 @@ LogLuvEncode24(TIFF* tif, uint8* bp, tmsize_t cc, uint16 s)
 		tp = (uint32*) bp;
 	else {
 		tp = (uint32*) sp->tbuf;
-		assert(sp->tbuflen >= npixels);
+		if(sp->tbuflen < npixels) {
+			TIFFErrorExt(tif->tif_clientdata, module,
+						 "Translation buffer too short");
+			return (0);
+		}
 		(*sp->tfunc)(sp, bp, npixels);
 	}
 	/* write out encoded pixels */
@@ -553,6 +585,7 @@ LogLuvEncode24(TIFF* tif, uint8* bp, tmsize_t cc, uint16 s)
 static int
 LogLuvEncode32(TIFF* tif, uint8* bp, tmsize_t cc, uint16 s)
 {
+	static const char module[] = "LogLuvEncode32";
 	LogLuvState* sp = EncoderState(tif);
 	int shft;
 	tmsize_t i;
@@ -574,7 +607,11 @@ LogLuvEncode32(TIFF* tif, uint8* bp, tmsize_t cc, uint16 s)
 		tp = (uint32*) bp;
 	else {
 		tp = (uint32*) sp->tbuf;
-		assert(sp->tbuflen >= npixels);
+		if(sp->tbuflen < npixels) {
+			TIFFErrorExt(tif->tif_clientdata, module,
+						 "Translation buffer too short");
+			return (0);
+		}
 		(*sp->tfunc)(sp, bp, npixels);
 	}
 	/* compress each byte string */
@@ -654,8 +691,10 @@ LogLuvEncodeStrip(TIFF* tif, uint8* bp, tmsize_t cc, uint16 s)
                 return 0;
 
 	assert(cc%rowlen == 0);
-	while (cc && (*tif->tif_encoderow)(tif, bp, rowlen, s) == 1)
-		bp += rowlen, cc -= rowlen;
+	while (cc && (*tif->tif_encoderow)(tif, bp, rowlen, s) == 1) {
+		bp += rowlen;
+		cc -= rowlen;
+	}
 	return (cc == 0);
 }
 
@@ -672,8 +711,10 @@ LogLuvEncodeTile(TIFF* tif, uint8* bp, tmsize_t cc, uint16 s)
                 return 0;
 
 	assert(cc%rowlen == 0);
-	while (cc && (*tif->tif_encoderow)(tif, bp, rowlen, s) == 1)
-		bp += rowlen, cc -= rowlen;
+	while (cc && (*tif->tif_encoderow)(tif, bp, rowlen, s) == 1) {
+		bp += rowlen;
+		cc -= rowlen;
+	}
 	return (cc == 0);
 }
 
@@ -1243,6 +1284,14 @@ LogL16InitState(TIFF* tif)
 	assert(sp != NULL);
 	assert(td->td_photometric == PHOTOMETRIC_LOGL);
 
+	if( td->td_samplesperpixel != 1 )
+	{
+		TIFFErrorExt(tif->tif_clientdata, module,
+		             "Sorry, can not handle LogL image with %s=%d",
+			     "Samples/pixel", td->td_samplesperpixel);
+		return 0;
+	}
+
 	/* for some reason, we can't do this in TIFFInitLogL16 */
 	if (sp->user_datafmt == SGILOGDATAFMT_UNKNOWN)
 		sp->user_datafmt = LogL16GuessDataFmt(td);
@@ -1565,17 +1614,21 @@ LogLuvVSetField(TIFF* tif, uint32 tag, va_list ap)
 		 */
 		switch (sp->user_datafmt) {
 		case SGILOGDATAFMT_FLOAT:
-			bps = 32, fmt = SAMPLEFORMAT_IEEEFP;
+			bps = 32;
+			fmt = SAMPLEFORMAT_IEEEFP;
 			break;
 		case SGILOGDATAFMT_16BIT:
-			bps = 16, fmt = SAMPLEFORMAT_INT;
+			bps = 16;
+			fmt = SAMPLEFORMAT_INT;
 			break;
 		case SGILOGDATAFMT_RAW:
-			bps = 32, fmt = SAMPLEFORMAT_UINT;
+			bps = 32;
+			fmt = SAMPLEFORMAT_UINT;
 			TIFFSetField(tif, TIFFTAG_SAMPLESPERPIXEL, 1);
 			break;
 		case SGILOGDATAFMT_8BIT:
-			bps = 8, fmt = SAMPLEFORMAT_UINT;
+			bps = 8;
+			fmt = SAMPLEFORMAT_UINT;
 			break;
 		default:
 			TIFFErrorExt(tif->tif_clientdata, tif->tif_name,
diff --git a/src/3rdparty/libtiff/libtiff/tif_lzma.c b/src/3rdparty/libtiff/libtiff/tif_lzma.c
index dedf1d9..80fc394 100644
--- a/src/3rdparty/libtiff/libtiff/tif_lzma.c
+++ b/src/3rdparty/libtiff/libtiff/tif_lzma.c
@@ -1,4 +1,4 @@
-/* $Id: tif_lzma.c,v 1.4 2011-12-22 00:29:29 bfriesen Exp $ */
+/* $Id: tif_lzma.c,v 1.6 2016-09-17 09:18:59 erouault Exp $ */
 
 /*
  * Copyright (c) 2010, Andrey Kiselev <dron@ak4719.spb.edu>
@@ -95,7 +95,7 @@ LZMAStrerror(lzma_ret ret)
 		case LZMA_PROG_ERROR:
 		    return "programming error";
 		default:
-		    return "unindentified liblzma error";
+		    return "unidentified liblzma error";
 	}
 }
 
@@ -490,6 +490,6 @@ bad:
 		     "No space for LZMA2 state block");
 	return 0;
 }
-#endif /* LZMA_SUPORT */
+#endif /* LZMA_SUPPORT */
 
 /* vim: set ts=8 sts=8 sw=8 noet: */
diff --git a/src/3rdparty/libtiff/libtiff/tif_lzw.c b/src/3rdparty/libtiff/libtiff/tif_lzw.c
index 9b76dd0..240e19c 100644
--- a/src/3rdparty/libtiff/libtiff/tif_lzw.c
+++ b/src/3rdparty/libtiff/libtiff/tif_lzw.c
@@ -1,4 +1,4 @@
-/* $Id: tif_lzw.c,v 1.49 2015-08-30 21:07:44 erouault Exp $ */
+/* $Id: tif_lzw.c,v 1.52 2016-09-04 21:32:56 erouault Exp $ */
 
 /*
  * Copyright (c) 1988-1997 Sam Leffler
@@ -240,8 +240,8 @@ LZWSetupDecode(TIFF* tif)
 		 */
 		code = 255;
 		do {
-			sp->dec_codetab[code].value = code;
-			sp->dec_codetab[code].firstchar = code;
+			sp->dec_codetab[code].value = (unsigned char)code;
+			sp->dec_codetab[code].firstchar = (unsigned char)code;
 			sp->dec_codetab[code].length = 1;
 			sp->dec_codetab[code].next = NULL;
 		} while (code--);
@@ -411,14 +411,15 @@ LZWDecode(TIFF* tif, uint8* op0, tmsize_t occ0, uint16 s)
 		/*
 		 * Residue satisfies only part of the decode request.
 		 */
-		op += residue, occ -= residue;
+		op += residue;
+		occ -= residue;
 		tp = op;
 		do {
 			int t;
 			--tp;
 			t = codep->value;
 			codep = codep->next;
-			*tp = t;
+			*tp = (char)t;
 		} while (--residue && codep);
 		sp->dec_restart = 0;
 	}
@@ -454,7 +455,8 @@ LZWDecode(TIFF* tif, uint8* op0, tmsize_t occ0, uint16 s)
 					     tif->tif_row);
 				return (0);
 			}
-			*op++ = (char)code, occ--;
+			*op++ = (char)code;
+			occ--;
 			oldcodep = sp->dec_codetab + code;
 			continue;
 		}
@@ -532,16 +534,19 @@ LZWDecode(TIFF* tif, uint8* op0, tmsize_t occ0, uint16 s)
 				--tp;
 				t = codep->value;
 				codep = codep->next;
-				*tp = t;
+				*tp = (char)t;
 			} while (codep && tp > op);
 			if (codep) {
 			    codeLoop(tif, module);
 			    break;
 			}
 			assert(occ >= len);
-			op += len, occ -= len;
-		} else
-			*op++ = (char)code, occ--;
+			op += len;
+			occ -= len;
+		} else {
+			*op++ = (char)code;
+			occ--;
+		}
 	}
 
 	tif->tif_rawcp = (uint8*) bp;
@@ -635,7 +640,8 @@ LZWDecodeCompat(TIFF* tif, uint8* op0, tmsize_t occ0, uint16 s)
 		/*
 		 * Residue satisfies only part of the decode request.
 		 */
-		op += residue, occ -= residue;
+		op += residue;
+		occ -= residue;
 		tp = op;
 		do {
 			*--tp = codep->value;
@@ -675,7 +681,8 @@ LZWDecodeCompat(TIFF* tif, uint8* op0, tmsize_t occ0, uint16 s)
 					     tif->tif_row);
 				return (0);
 			}
-			*op++ = code, occ--;
+			*op++ = (char)code;
+			occ--;
 			oldcodep = sp->dec_codetab + code;
 			continue;
 		}
@@ -741,17 +748,20 @@ LZWDecodeCompat(TIFF* tif, uint8* op0, tmsize_t occ0, uint16 s)
 				break;
 			}
 			assert(occ >= codep->length);
-			op += codep->length, occ -= codep->length;
+			op += codep->length;
+			occ -= codep->length;
 			tp = op;
 			do {
 				*--tp = codep->value;
 			} while( (codep = codep->next) != NULL );
-		} else
-			*op++ = code, occ--;
+		} else {
+			*op++ = (char)code;
+			occ--;
+		}
 	}
 
 	tif->tif_rawcp = (uint8*) bp;
-	sp->lzw_nbits = nbits;
+	sp->lzw_nbits = (unsigned short)nbits;
 	sp->lzw_nextdata = nextdata;
 	sp->lzw_nextbits = nextbits;
 	sp->dec_nbitsmask = nbitsmask;
@@ -900,7 +910,7 @@ LZWEncode(TIFF* tif, uint8* bp, tmsize_t cc, uint16 s)
 	nbits = sp->lzw_nbits;
 	op = tif->tif_rawcp;
 	limit = sp->enc_rawlimit;
-	ent = sp->enc_oldcode;
+	ent = (hcode_t)sp->enc_oldcode;
 
 	if (ent == (hcode_t) -1 && cc > 0) {
 		/*
@@ -936,7 +946,7 @@ LZWEncode(TIFF* tif, uint8* bp, tmsize_t cc, uint16 s)
 				disp = 1;
 			do {
 				/*
-				 * Avoid pointer arithmetic 'cuz of
+				 * Avoid pointer arithmetic because of
 				 * wraparound problems with segments.
 				 */
 				if ((h -= disp) < 0)
@@ -963,8 +973,8 @@ LZWEncode(TIFF* tif, uint8* bp, tmsize_t cc, uint16 s)
 			op = tif->tif_rawdata;
 		}
 		PutNextCode(op, ent);
-		ent = c;
-		hp->code = free_ent++;
+		ent = (hcode_t)c;
+		hp->code = (hcode_t)(free_ent++);
 		hp->hash = fcode;
 		if (free_ent == CODE_MAX-1) {
 			/* table is full, emit clear code and reset */
@@ -1021,9 +1031,9 @@ LZWEncode(TIFF* tif, uint8* bp, tmsize_t cc, uint16 s)
 	sp->enc_oldcode = ent;
 	sp->lzw_nextdata = nextdata;
 	sp->lzw_nextbits = nextbits;
-	sp->lzw_free_ent = free_ent;
-	sp->lzw_maxcode = maxcode;
-	sp->lzw_nbits = nbits;
+	sp->lzw_free_ent = (unsigned short)free_ent;
+	sp->lzw_maxcode = (unsigned short)maxcode;
+	sp->lzw_nbits = (unsigned short)nbits;
 	tif->tif_rawcp = op;
 	return (1);
 }
diff --git a/src/3rdparty/libtiff/libtiff/tif_next.c b/src/3rdparty/libtiff/libtiff/tif_next.c
index 17e0311..0821178 100644
--- a/src/3rdparty/libtiff/libtiff/tif_next.c
+++ b/src/3rdparty/libtiff/libtiff/tif_next.c
@@ -1,4 +1,4 @@
-/* $Id: tif_next.c,v 1.16 2014-12-29 12:09:11 erouault Exp $ */
+/* $Id: tif_next.c,v 1.19 2016-09-04 21:32:56 erouault Exp $ */
 
 /*
  * Copyright (c) 1988-1997 Sam Leffler
@@ -37,7 +37,7 @@
 	case 0:	op[0]  = (unsigned char) ((v) << 6); break;	\
 	case 1:	op[0] |= (v) << 4; break;	\
 	case 2:	op[0] |= (v) << 2; break;	\
-	case 3:	*op++ |= (v);	   break;	\
+	case 3:	*op++ |= (v);	   op_offset++; break;	\
 	}					\
 }
 
@@ -72,7 +72,8 @@ NeXTDecode(TIFF* tif, uint8* buf, tmsize_t occ, uint16 s)
 		return (0);
 	}
 	for (row = buf; cc > 0 && occ > 0; occ -= scanline, row += scanline) {
-		n = *bp++, cc--;
+		n = *bp++;
+		cc--;
 		switch (n) {
 		case LITERALROW:
 			/*
@@ -103,6 +104,7 @@ NeXTDecode(TIFF* tif, uint8* buf, tmsize_t occ, uint16 s)
 		}
 		default: {
 			uint32 npixels = 0, grey;
+			tmsize_t op_offset = 0;
 			uint32 imagewidth = tif->tif_dir.td_imagewidth;
             if( isTiled(tif) )
                 imagewidth = tif->tif_dir.td_tilewidth;
@@ -122,13 +124,19 @@ NeXTDecode(TIFF* tif, uint8* buf, tmsize_t occ, uint16 s)
 				 * bounds, potentially resulting in a security
 				 * issue.
 				 */
-				while (n-- > 0 && npixels < imagewidth)
+				while (n-- > 0 && npixels < imagewidth && op_offset < scanline)
 					SETPIXEL(op, grey);
 				if (npixels >= imagewidth)
 					break;
+                if (op_offset >= scanline ) {
+                    TIFFErrorExt(tif->tif_clientdata, module, "Invalid data for scanline %ld",
+                        (long) tif->tif_row);
+                    return (0);
+                }
 				if (cc == 0)
 					goto bad;
-				n = *bp++, cc--;
+				n = *bp++;
+				cc--;
 			}
 			break;
 		}
diff --git a/src/3rdparty/libtiff/libtiff/tif_ojpeg.c b/src/3rdparty/libtiff/libtiff/tif_ojpeg.c
index 23d4062..30a1812 100644
--- a/src/3rdparty/libtiff/libtiff/tif_ojpeg.c
+++ b/src/3rdparty/libtiff/libtiff/tif_ojpeg.c
@@ -1,4 +1,4 @@
-/* $Id: tif_ojpeg.c,v 1.60 2015-05-31 00:38:46 bfriesen Exp $ */
+/* $Id: tif_ojpeg.c,v 1.65 2016-09-04 21:32:56 erouault Exp $ */
 
 /* WARNING: The type of JPEG encapsulation defined by the TIFF Version 6.0
    specification is now totally obsolete and deprecated for new applications and
@@ -75,7 +75,7 @@
    OJPEGSubsamplingCorrect, making no note of any other data, reporting no warnings
    or errors, up to the point where either these values are read, or it's clear they
    aren't there. This means that some of the data is read twice, but we feel speed
-   in correcting these values is important enough to warrant this sacrifice. Allthough
+   in correcting these values is important enough to warrant this sacrifice. Although
    there is currently no define or other configuration mechanism to disable this behaviour,
    the actual header scanning is build to robustly respond with error report if it
    should encounter an uncorrected mismatch of subsampling values. See
@@ -84,7 +84,7 @@
    The restart interval and restart markers are the most tricky part... The restart
    interval can be specified in a tag. It can also be set inside the input JPEG stream.
    It can be used inside the input JPEG stream. If reading from strile data, we've
-   consistenly discovered the need to insert restart markers in between the different
+   consistently discovered the need to insert restart markers in between the different
    striles, as is also probably the most likely interpretation of the original TIFF 6.0
    specification. With all this setting of interval, and actual use of markers that is not
    predictable at the time of valid JPEG header assembly, the restart thing may turn
@@ -113,7 +113,7 @@
    planarconfig is not separate (vast majority). We may one day use that to build
    converters to JPEG, and/or to new-style JPEG compression inside TIFF.
 
-   A dissadvantage is the lack of random access to the individual striles. This is the
+   A disadvantage is the lack of random access to the individual striles. This is the
    reason for much of the complicated restart-and-position stuff inside OJPEGPreDecode.
    Applications would do well accessing all striles in order, as this will result in
    a single sequential scan of the input stream, and no restarting of LibJpeg decoding
@@ -135,7 +135,7 @@
  * 	The default mode, without JPEG_ENCAP_EXTERNAL, implements the call encapsulators
  * 	here, internally, with normal longjump.
  * SETJMP, LONGJMP, JMP_BUF: On some machines/environments a longjump equivalent is
- * 	conviniently available, but still it may be worthwhile to use _setjmp or sigsetjmp
+ * 	conveniently available, but still it may be worthwhile to use _setjmp or sigsetjmp
  * 	in place of plain setjmp. These macros will make it easier. It is useless
  * 	to fiddle with these if you define JPEG_ENCAP_EXTERNAL.
  * OJPEG_BUFFER: Define the size of the desired buffer here. Should be small enough so as to guarantee
@@ -200,7 +200,7 @@ static const TIFFField ojpegFields[] = {
   Libjpeg's jmorecfg.h defines INT16 and INT32, but only if XMD_H is
   not defined.  Unfortunately, the MinGW and Borland compilers include
   a typedef for INT32, which causes a conflict.  MSVC does not include
-  a conficting typedef given the headers which are included.
+  a conflicting typedef given the headers which are included.
 */
 #if defined(__BORLANDC__) || defined(__MINGW32__)
 # define XMD_H 1
@@ -1081,7 +1081,7 @@ OJPEGReadHeaderInfo(TIFF* tif)
 			TIFFErrorExt(tif->tif_clientdata,module,"Incompatible vertical subsampling and image strip/tile length");
 			return(0);
 		}
-		sp->restart_interval=((sp->strile_width+sp->subsampling_hor*8-1)/(sp->subsampling_hor*8))*(sp->strile_length/(sp->subsampling_ver*8));
+		sp->restart_interval=(uint16)(((sp->strile_width+sp->subsampling_hor*8-1)/(sp->subsampling_hor*8))*(sp->strile_length/(sp->subsampling_ver*8)));
 	}
 	if (OJPEGReadHeaderInfoSec(tif)==0)
 		return(0);
@@ -1103,7 +1103,7 @@ OJPEGReadSecondarySos(TIFF* tif, uint16 s)
 	assert(s<3);
 	assert(sp->sos_end[0].log!=0);
 	assert(sp->sos_end[s].log==0);
-	sp->plane_sample_offset=s-1;
+	sp->plane_sample_offset=(uint8)(s-1);
 	while(sp->sos_end[sp->plane_sample_offset].log==0)
 		sp->plane_sample_offset--;
 	sp->in_buffer_source=sp->sos_end[sp->plane_sample_offset].in_buffer_source;
@@ -1381,7 +1381,8 @@ OJPEGReadHeaderInfoSec(TIFF* tif)
 static int
 OJPEGReadHeaderInfoSecStreamDri(TIFF* tif)
 {
-	/* this could easilly cause trouble in some cases... but no such cases have occured sofar */
+	/* This could easily cause trouble in some cases... but no such cases have
+           occurred so far */
 	static const char module[]="OJPEGReadHeaderInfoSecStreamDri";
 	OJPEGState* sp=(OJPEGState*)tif->tif_data;
 	uint16 m;
@@ -1779,7 +1780,7 @@ OJPEGReadHeaderInfoSecTablesQTable(TIFF* tif)
 			ob[sizeof(uint32)+3]=67;
 			ob[sizeof(uint32)+4]=m;
 			TIFFSeekFile(tif,sp->qtable_offset[m],SEEK_SET); 
-			p=TIFFReadFile(tif,&ob[sizeof(uint32)+5],64);
+			p=(uint32)TIFFReadFile(tif,&ob[sizeof(uint32)+5],64);
 			if (p!=64)
 				return(0);
 			sp->qtable[m]=ob;
@@ -1822,7 +1823,7 @@ OJPEGReadHeaderInfoSecTablesDcTable(TIFF* tif)
 				}
 			}
 			TIFFSeekFile(tif,sp->dctable_offset[m],SEEK_SET);
-			p=TIFFReadFile(tif,o,16);
+			p=(uint32)TIFFReadFile(tif,o,16);
 			if (p!=16)
 				return(0);
 			q=0;
@@ -1838,12 +1839,12 @@ OJPEGReadHeaderInfoSecTablesDcTable(TIFF* tif)
 			*(uint32*)rb=ra;
 			rb[sizeof(uint32)]=255;
 			rb[sizeof(uint32)+1]=JPEG_MARKER_DHT;
-			rb[sizeof(uint32)+2]=((19+q)>>8);
+			rb[sizeof(uint32)+2]=(uint8)((19+q)>>8);
 			rb[sizeof(uint32)+3]=((19+q)&255);
 			rb[sizeof(uint32)+4]=m;
 			for (n=0; n<16; n++)
 				rb[sizeof(uint32)+5+n]=o[n];
-			p=TIFFReadFile(tif,&(rb[sizeof(uint32)+21]),q);
+			p=(uint32)TIFFReadFile(tif,&(rb[sizeof(uint32)+21]),q);
 			if (p!=q)
 				return(0);
 			sp->dctable[m]=rb;
@@ -1886,7 +1887,7 @@ OJPEGReadHeaderInfoSecTablesAcTable(TIFF* tif)
 				}
 			}
 			TIFFSeekFile(tif,sp->actable_offset[m],SEEK_SET);  
-			p=TIFFReadFile(tif,o,16);
+			p=(uint32)TIFFReadFile(tif,o,16);
 			if (p!=16)
 				return(0);
 			q=0;
@@ -1902,12 +1903,12 @@ OJPEGReadHeaderInfoSecTablesAcTable(TIFF* tif)
 			*(uint32*)rb=ra;
 			rb[sizeof(uint32)]=255;
 			rb[sizeof(uint32)+1]=JPEG_MARKER_DHT;
-			rb[sizeof(uint32)+2]=((19+q)>>8);
+			rb[sizeof(uint32)+2]=(uint8)((19+q)>>8);
 			rb[sizeof(uint32)+3]=((19+q)&255);
 			rb[sizeof(uint32)+4]=(16|m);
 			for (n=0; n<16; n++)
 				rb[sizeof(uint32)+5+n]=o[n];
-			p=TIFFReadFile(tif,&(rb[sizeof(uint32)+21]),q);
+			p=(uint32)TIFFReadFile(tif,&(rb[sizeof(uint32)+21]),q);
 			if (p!=q)
 				return(0);
 			sp->actable[m]=rb;
@@ -2269,10 +2270,10 @@ OJPEGWriteStreamSof(TIFF* tif, void** mem, uint32* len)
 	/* P */
 	sp->out_buffer[4]=8;
 	/* Y */
-	sp->out_buffer[5]=(sp->sof_y>>8);
+	sp->out_buffer[5]=(uint8)(sp->sof_y>>8);
 	sp->out_buffer[6]=(sp->sof_y&255);
 	/* X */
-	sp->out_buffer[7]=(sp->sof_x>>8);
+	sp->out_buffer[7]=(uint8)(sp->sof_x>>8);
 	sp->out_buffer[8]=(sp->sof_x&255);
 	/* Nf */
 	sp->out_buffer[9]=sp->samples_per_pixel_per_plane;
@@ -2385,7 +2386,12 @@ OJPEGWriteStreamEoi(TIFF* tif, void** mem, uint32* len)
 static int
 jpeg_create_decompress_encap(OJPEGState* sp, jpeg_decompress_struct* cinfo)
 {
-	return(SETJMP(sp->exit_jmpbuf)?0:(jpeg_create_decompress(cinfo),1));
+	if( SETJMP(sp->exit_jmpbuf) )
+		return 0;
+	else {
+		jpeg_create_decompress(cinfo);
+		return 1;
+	}
 }
 #endif
 
@@ -2393,7 +2399,12 @@ jpeg_create_decompress_encap(OJPEGState* sp, jpeg_decompress_struct* cinfo)
 static int
 jpeg_read_header_encap(OJPEGState* sp, jpeg_decompress_struct* cinfo, uint8 require_image)
 {
-	return(SETJMP(sp->exit_jmpbuf)?0:(jpeg_read_header(cinfo,require_image),1));
+	if( SETJMP(sp->exit_jmpbuf) )
+		return 0;
+	else {
+		jpeg_read_header(cinfo,require_image);
+		return 1;
+	}
 }
 #endif
 
@@ -2401,7 +2412,12 @@ jpeg_read_header_encap(OJPEGState* sp, jpeg_decompress_struct* cinfo, uint8 requ
 static int
 jpeg_start_decompress_encap(OJPEGState* sp, jpeg_decompress_struct* cinfo)
 {
-	return(SETJMP(sp->exit_jmpbuf)?0:(jpeg_start_decompress(cinfo),1));
+	if( SETJMP(sp->exit_jmpbuf) )
+		return 0;
+	else {
+		jpeg_start_decompress(cinfo);
+		return 1;
+	}
 }
 #endif
 
@@ -2409,7 +2425,12 @@ jpeg_start_decompress_encap(OJPEGState* sp, jpeg_decompress_struct* cinfo)
 static int
 jpeg_read_scanlines_encap(OJPEGState* sp, jpeg_decompress_struct* cinfo, void* scanlines, uint32 max_lines)
 {
-	return(SETJMP(sp->exit_jmpbuf)?0:(jpeg_read_scanlines(cinfo,scanlines,max_lines),1));
+	if( SETJMP(sp->exit_jmpbuf) )
+		return 0;
+	else {
+		jpeg_read_scanlines(cinfo,scanlines,max_lines);
+		return 1;
+	}
 }
 #endif
 
@@ -2417,7 +2438,12 @@ jpeg_read_scanlines_encap(OJPEGState* sp, jpeg_decompress_struct* cinfo, void* s
 static int
 jpeg_read_raw_data_encap(OJPEGState* sp, jpeg_decompress_struct* cinfo, void* data, uint32 max_lines)
 {
-	return(SETJMP(sp->exit_jmpbuf)?0:(jpeg_read_raw_data(cinfo,data,max_lines),1));
+	if( SETJMP(sp->exit_jmpbuf) )
+		return 0;
+	else {
+		jpeg_read_raw_data(cinfo,data,max_lines);
+		return 1;
+	}
 }
 #endif
 
@@ -2479,6 +2505,10 @@ OJPEGLibjpegJpegSourceMgrSkipInputData(jpeg_decompress_struct* cinfo, long num_b
 	jpeg_encap_unwind(tif);
 }
 
+#ifdef _MSC_VER
+#pragma warning( push )
+#pragma warning( disable : 4702 ) /* unreachable code */
+#endif
 static boolean
 OJPEGLibjpegJpegSourceMgrResyncToRestart(jpeg_decompress_struct* cinfo, int desired)
 {
@@ -2488,6 +2518,9 @@ OJPEGLibjpegJpegSourceMgrResyncToRestart(jpeg_decompress_struct* cinfo, int desi
 	jpeg_encap_unwind(tif);
 	return(0);
 }
+#ifdef _MSC_VER
+#pragma warning( pop ) 
+#endif
 
 static void
 OJPEGLibjpegJpegSourceMgrTermSource(jpeg_decompress_struct* cinfo)
diff --git a/src/3rdparty/libtiff/libtiff/tif_open.c b/src/3rdparty/libtiff/libtiff/tif_open.c
index 8c88328..5c9036e 100644
--- a/src/3rdparty/libtiff/libtiff/tif_open.c
+++ b/src/3rdparty/libtiff/libtiff/tif_open.c
@@ -1,4 +1,4 @@
-/* $Id: tif_open.c,v 1.46 2010-12-06 16:54:54 faxguy Exp $ */
+/* $Id: tif_open.c,v 1.47 2016-01-23 21:20:34 erouault Exp $ */
 
 /*
  * Copyright (c) 1988-1997 Sam Leffler
@@ -168,7 +168,7 @@ TIFFClientOpen(
 	 * The following flags may be used to control intrinsic library
 	 * behaviour that may or may not be desirable (usually for
 	 * compatibility with some application that claims to support
-	 * TIFF but only supports some braindead idea of what the
+	 * TIFF but only supports some brain dead idea of what the
 	 * vendor thinks TIFF is):
 	 *
 	 * 'l' use little-endian byte order for creating a file
@@ -198,8 +198,8 @@ TIFFClientOpen(
 	 * The 'L', 'B', and 'H' flags are intended for applications
 	 * that can optimize operations on data by using a particular
 	 * bit order.  By default the library returns data in MSB2LSB
-	 * bit order for compatibiltiy with older versions of this
-	 * library.  Returning data in the bit order of the native cpu
+	 * bit order for compatibility with older versions of this
+	 * library.  Returning data in the bit order of the native CPU
 	 * makes the most sense but also requires applications to check
 	 * the value of the FillOrder tag; something they probably do
 	 * not do right now.
diff --git a/src/3rdparty/libtiff/libtiff/tif_packbits.c b/src/3rdparty/libtiff/libtiff/tif_packbits.c
index 9e77190..d2a0165 100644
--- a/src/3rdparty/libtiff/libtiff/tif_packbits.c
+++ b/src/3rdparty/libtiff/libtiff/tif_packbits.c
@@ -1,4 +1,4 @@
-/* $Id: tif_packbits.c,v 1.22 2012-06-20 05:25:33 fwarmerdam Exp $ */
+/* $Id: tif_packbits.c,v 1.24 2016-09-04 21:32:56 erouault Exp $ */
 
 /*
  * Copyright (c) 1988-1997 Sam Leffler
@@ -38,7 +38,8 @@ PackBitsPreEncode(TIFF* tif, uint16 s)
 {
 	(void) s;
 
-	if (!(tif->tif_data = (uint8*)_TIFFmalloc(sizeof(tmsize_t))))
+        tif->tif_data = (uint8*)_TIFFmalloc(sizeof(tmsize_t));
+	if (tif->tif_data == NULL)
 		return (0);
 	/*
 	 * Calculate the scanline/tile-width size in bytes.
@@ -81,7 +82,9 @@ PackBitsEncode(TIFF* tif, uint8* buf, tmsize_t cc, uint16 s)
 		/*
 		 * Find the longest string of identical bytes.
 		 */
-		b = *bp++, cc--, n = 1;
+		b = *bp++;
+		cc--;
+		n = 1;
 		for (; cc > 0 && b == *bp; cc--, bp++)
 			n++;
 	again:
@@ -222,7 +225,8 @@ PackBitsDecode(TIFF* tif, uint8* op, tmsize_t occ, uint16 s)
 	bp = (char*) tif->tif_rawcp;
 	cc = tif->tif_rawcc;
 	while (cc > 0 && occ > 0) {
-		n = (long) *bp++, cc--;
+		n = (long) *bp++;
+		cc--;
 		/*
 		 * Watch out for compilers that
 		 * don't sign extend chars...
@@ -241,7 +245,8 @@ PackBitsDecode(TIFF* tif, uint8* op, tmsize_t occ, uint16 s)
 				n = (long)occ;
 			}
 			occ -= n;
-			b = *bp++, cc--;
+			b = *bp++;
+			cc--;
 			while (n-- > 0)
 				*op++ = (uint8) b;
 		} else {		/* copy next n+1 bytes literally */
diff --git a/src/3rdparty/libtiff/libtiff/tif_pixarlog.c b/src/3rdparty/libtiff/libtiff/tif_pixarlog.c
index 044c411..f4af2ba 100644
--- a/src/3rdparty/libtiff/libtiff/tif_pixarlog.c
+++ b/src/3rdparty/libtiff/libtiff/tif_pixarlog.c
@@ -1,4 +1,4 @@
-/* $Id: tif_pixarlog.c,v 1.39 2012-12-10 17:27:13 tgl Exp $ */
+/* $Id: tif_pixarlog.c,v 1.48 2016-09-23 22:12:18 erouault Exp $ */
 
 /*
  * Copyright (c) 1996-1997 Sam Leffler
@@ -45,15 +45,15 @@
  * input is assumed to be unsigned linear color values that represent
  * the range 0-1.  In the case of IEEE values, the 0-1 range is assumed to
  * be the normal linear color range, in addition over 1 values are
- * accepted up to a value of about 25.0 to encode "hot" hightlights and such.
+ * accepted up to a value of about 25.0 to encode "hot" highlights and such.
  * The encoding is lossless for 8-bit values, slightly lossy for the
  * other bit depths.  The actual color precision should be better
  * than the human eye can perceive with extra room to allow for
  * error introduced by further image computation.  As with any quantized
  * color format, it is possible to perform image calculations which
  * expose the quantization error. This format should certainly be less 
- * susceptable to such errors than standard 8-bit encodings, but more
- * susceptable than straight 16-bit or 32-bit encodings.
+ * susceptible to such errors than standard 8-bit encodings, but more
+ * susceptible than straight 16-bit or 32-bit encodings.
  *
  * On reading the internal format is converted to the desired output format.
  * The program can request which format it desires by setting the internal
@@ -296,33 +296,35 @@ horizontalAccumulate16(uint16 *wp, int n, int stride, uint16 *op,
 static void
 horizontalAccumulate11(uint16 *wp, int n, int stride, uint16 *op)
 {
-    register unsigned int  cr, cg, cb, ca, mask;
+    register unsigned int cr, cg, cb, ca, mask;
 
     if (n >= stride) {
 	mask = CODE_MASK;
 	if (stride == 3) {
-	    op[0] = cr = wp[0];  op[1] = cg = wp[1];  op[2] = cb = wp[2];
+	    op[0] = wp[0];  op[1] = wp[1];  op[2] = wp[2];
+            cr = wp[0];  cg = wp[1];  cb = wp[2];
 	    n -= 3;
 	    while (n > 0) {
 		wp += 3;
 		op += 3;
 		n -= 3;
-		op[0] = (cr += wp[0]) & mask;
-		op[1] = (cg += wp[1]) & mask;
-		op[2] = (cb += wp[2]) & mask;
+		op[0] = (uint16)((cr += wp[0]) & mask);
+		op[1] = (uint16)((cg += wp[1]) & mask);
+		op[2] = (uint16)((cb += wp[2]) & mask);
 	    }
 	} else if (stride == 4) {
-	    op[0] = cr = wp[0];  op[1] = cg = wp[1];
-	    op[2] = cb = wp[2];  op[3] = ca = wp[3];
+	    op[0] = wp[0];  op[1] = wp[1];
+	    op[2] = wp[2];  op[3] = wp[3];
+            cr = wp[0]; cg = wp[1]; cb = wp[2]; ca = wp[3];
 	    n -= 4;
 	    while (n > 0) {
 		wp += 4;
 		op += 4;
 		n -= 4;
-		op[0] = (cr += wp[0]) & mask;
-		op[1] = (cg += wp[1]) & mask;
-		op[2] = (cb += wp[2]) & mask;
-		op[3] = (ca += wp[3]) & mask;
+		op[0] = (uint16)((cr += wp[0]) & mask);
+		op[1] = (uint16)((cg += wp[1]) & mask);
+		op[2] = (uint16)((cb += wp[2]) & mask);
+		op[3] = (uint16)((ca += wp[3]) & mask);
 	    } 
 	} else {
 	    REPEAT(stride, *op = *wp&mask; wp++; op++)
@@ -457,6 +459,7 @@ horizontalAccumulate8abgr(uint16 *wp, int n, int stride, unsigned char *op,
 typedef	struct {
 	TIFFPredictorState	predict;
 	z_stream		stream;
+	tmsize_t		tbuf_size; /* only set/used on reading for now */
 	uint16			*tbuf; 
 	uint16			stride;
 	int			state;
@@ -556,7 +559,7 @@ PixarLogMakeTables(PixarLogState *sp)
     for (i = 0; i < lt2size; i++)  {
 	if ((i*linstep)*(i*linstep) > ToLinearF[j]*ToLinearF[j+1])
 	    j++;
-	FromLT2[i] = j;
+	FromLT2[i] = (uint16)j;
     }
 
     /*
@@ -568,14 +571,14 @@ PixarLogMakeTables(PixarLogState *sp)
     for (i = 0; i < 16384; i++)  {
 	while ((i/16383.)*(i/16383.) > ToLinearF[j]*ToLinearF[j+1])
 	    j++;
-	From14[i] = j;
+	From14[i] = (uint16)j;
     }
 
     j = 0;
     for (i = 0; i < 256; i++)  {
 	while ((i/255.)*(i/255.) > ToLinearF[j]*ToLinearF[j+1])
 	    j++;
-	From8[i] = j;
+	From8[i] = (uint16)j;
     }
 
     Fltsize = (float)(lt2size/2);
@@ -692,6 +695,7 @@ PixarLogSetupDecode(TIFF* tif)
 	sp->tbuf = (uint16 *) _TIFFmalloc(tbuf_size);
 	if (sp->tbuf == NULL)
 		return (0);
+	sp->tbuf_size = tbuf_size;
 	if (sp->user_datafmt == PIXARLOGDATAFMT_UNKNOWN)
 		sp->user_datafmt = PixarLogGuessDataFmt(td);
 	if (sp->user_datafmt == PIXARLOGDATAFMT_UNKNOWN) {
@@ -702,7 +706,7 @@ PixarLogSetupDecode(TIFF* tif)
 	}
 
 	if (inflateInit(&sp->stream) != Z_OK) {
-		TIFFErrorExt(tif->tif_clientdata, module, "%s", sp->stream.msg);
+		TIFFErrorExt(tif->tif_clientdata, module, "%s", sp->stream.msg ? sp->stream.msg : "(null)");
 		return (0);
 	} else {
 		sp->state |= PLSTATE_INIT;
@@ -725,7 +729,7 @@ PixarLogPreDecode(TIFF* tif, uint16 s)
 	assert(sizeof(sp->stream.avail_in)==4);  /* if this assert gets raised,
 	    we need to simplify this code to reflect a ZLib that is likely updated
 	    to deal with 8byte memory sizes, though this code will respond
-	    apropriately even before we simplify it */
+	    appropriately even before we simplify it */
 	sp->stream.avail_in = (uInt) tif->tif_rawcc;
 	if ((tmsize_t)sp->stream.avail_in != tif->tif_rawcc)
 	{
@@ -774,13 +778,19 @@ PixarLogDecode(TIFF* tif, uint8* op, tmsize_t occ, uint16 s)
 	assert(sizeof(sp->stream.avail_out)==4);  /* if this assert gets raised,
 	    we need to simplify this code to reflect a ZLib that is likely updated
 	    to deal with 8byte memory sizes, though this code will respond
-	    apropriately even before we simplify it */
+	    appropriately even before we simplify it */
 	sp->stream.avail_out = (uInt) (nsamples * sizeof(uint16));
 	if (sp->stream.avail_out != nsamples * sizeof(uint16))
 	{
 		TIFFErrorExt(tif->tif_clientdata, module, "ZLib cannot deal with buffers this size");
 		return (0);
 	}
+	/* Check that we will not fill more than what was allocated */
+	if ((tmsize_t)sp->stream.avail_out > sp->tbuf_size)
+	{
+		TIFFErrorExt(tif->tif_clientdata, module, "sp->stream.avail_out > sp->tbuf_size");
+		return (0);
+	}
 	do {
 		int state = inflate(&sp->stream, Z_PARTIAL_FLUSH);
 		if (state == Z_STREAM_END) {
@@ -789,14 +799,14 @@ PixarLogDecode(TIFF* tif, uint8* op, tmsize_t occ, uint16 s)
 		if (state == Z_DATA_ERROR) {
 			TIFFErrorExt(tif->tif_clientdata, module,
 			    "Decoding error at scanline %lu, %s",
-			    (unsigned long) tif->tif_row, sp->stream.msg);
+			    (unsigned long) tif->tif_row, sp->stream.msg ? sp->stream.msg : "(null)");
 			if (inflateSync(&sp->stream) != Z_OK)
 				return (0);
 			continue;
 		}
 		if (state != Z_OK) {
 			TIFFErrorExt(tif->tif_clientdata, module, "ZLib error: %s",
-			    sp->stream.msg);
+			    sp->stream.msg ? sp->stream.msg : "(null)");
 			return (0);
 		}
 	} while (sp->stream.avail_out > 0);
@@ -898,7 +908,7 @@ PixarLogSetupEncode(TIFF* tif)
 	}
 
 	if (deflateInit(&sp->stream, sp->quality) != Z_OK) {
-		TIFFErrorExt(tif->tif_clientdata, module, "%s", sp->stream.msg);
+		TIFFErrorExt(tif->tif_clientdata, module, "%s", sp->stream.msg ? sp->stream.msg : "(null)");
 		return (0);
 	} else {
 		sp->state |= PLSTATE_INIT;
@@ -921,8 +931,8 @@ PixarLogPreEncode(TIFF* tif, uint16 s)
 	assert(sizeof(sp->stream.avail_out)==4);  /* if this assert gets raised,
 	    we need to simplify this code to reflect a ZLib that is likely updated
 	    to deal with 8byte memory sizes, though this code will respond
-	    apropriately even before we simplify it */
-	sp->stream.avail_out = tif->tif_rawdatasize;
+	    appropriately even before we simplify it */
+	sp->stream.avail_out = (uInt)tif->tif_rawdatasize;
 	if ((tmsize_t)sp->stream.avail_out != tif->tif_rawdatasize)
 	{
 		TIFFErrorExt(tif->tif_clientdata, module, "ZLib cannot deal with buffers this size");
@@ -953,9 +963,9 @@ horizontalDifferenceF(float *ip, int n, int stride, uint16 *wp, uint16 *FromLT2)
 		n -= 3;
 		wp += 3;
 		ip += 3;
-		r1 = (int32) CLAMP(ip[0]); wp[0] = (r1-r2) & mask; r2 = r1;
-		g1 = (int32) CLAMP(ip[1]); wp[1] = (g1-g2) & mask; g2 = g1;
-		b1 = (int32) CLAMP(ip[2]); wp[2] = (b1-b2) & mask; b2 = b1;
+		r1 = (int32) CLAMP(ip[0]); wp[0] = (uint16)((r1-r2) & mask); r2 = r1;
+		g1 = (int32) CLAMP(ip[1]); wp[1] = (uint16)((g1-g2) & mask); g2 = g1;
+		b1 = (int32) CLAMP(ip[2]); wp[2] = (uint16)((b1-b2) & mask); b2 = b1;
 	    }
 	} else if (stride == 4) {
 	    r2 = wp[0] = (uint16) CLAMP(ip[0]);
@@ -967,23 +977,20 @@ horizontalDifferenceF(float *ip, int n, int stride, uint16 *wp, uint16 *FromLT2)
 		n -= 4;
 		wp += 4;
 		ip += 4;
-		r1 = (int32) CLAMP(ip[0]); wp[0] = (r1-r2) & mask; r2 = r1;
-		g1 = (int32) CLAMP(ip[1]); wp[1] = (g1-g2) & mask; g2 = g1;
-		b1 = (int32) CLAMP(ip[2]); wp[2] = (b1-b2) & mask; b2 = b1;
-		a1 = (int32) CLAMP(ip[3]); wp[3] = (a1-a2) & mask; a2 = a1;
+		r1 = (int32) CLAMP(ip[0]); wp[0] = (uint16)((r1-r2) & mask); r2 = r1;
+		g1 = (int32) CLAMP(ip[1]); wp[1] = (uint16)((g1-g2) & mask); g2 = g1;
+		b1 = (int32) CLAMP(ip[2]); wp[2] = (uint16)((b1-b2) & mask); b2 = b1;
+		a1 = (int32) CLAMP(ip[3]); wp[3] = (uint16)((a1-a2) & mask); a2 = a1;
 	    }
 	} else {
-	    ip += n - 1;	/* point to last one */
-	    wp += n - 1;	/* point to last one */
-	    n -= stride;
-	    while (n > 0) {
-		REPEAT(stride, wp[0] = (uint16) CLAMP(ip[0]);
-				wp[stride] -= wp[0];
-				wp[stride] &= mask;
-				wp--; ip--)
-		n -= stride;
-	    }
-	    REPEAT(stride, wp[0] = (uint16) CLAMP(ip[0]); wp--; ip--)
+        REPEAT(stride, wp[0] = (uint16) CLAMP(ip[0]); wp++; ip++)
+        n -= stride;
+        while (n > 0) {
+            REPEAT(stride,
+                wp[0] = (uint16)(((int32)CLAMP(ip[0])-(int32)CLAMP(ip[-stride])) & mask);
+                wp++; ip++)
+            n -= stride;
+        }
 	}
     }
 }
@@ -1008,9 +1015,9 @@ horizontalDifference16(unsigned short *ip, int n, int stride,
 		n -= 3;
 		wp += 3;
 		ip += 3;
-		r1 = CLAMP(ip[0]); wp[0] = (r1-r2) & mask; r2 = r1;
-		g1 = CLAMP(ip[1]); wp[1] = (g1-g2) & mask; g2 = g1;
-		b1 = CLAMP(ip[2]); wp[2] = (b1-b2) & mask; b2 = b1;
+		r1 = CLAMP(ip[0]); wp[0] = (uint16)((r1-r2) & mask); r2 = r1;
+		g1 = CLAMP(ip[1]); wp[1] = (uint16)((g1-g2) & mask); g2 = g1;
+		b1 = CLAMP(ip[2]); wp[2] = (uint16)((b1-b2) & mask); b2 = b1;
 	    }
 	} else if (stride == 4) {
 	    r2 = wp[0] = CLAMP(ip[0]);  g2 = wp[1] = CLAMP(ip[1]);
@@ -1020,23 +1027,20 @@ horizontalDifference16(unsigned short *ip, int n, int stride,
 		n -= 4;
 		wp += 4;
 		ip += 4;
-		r1 = CLAMP(ip[0]); wp[0] = (r1-r2) & mask; r2 = r1;
-		g1 = CLAMP(ip[1]); wp[1] = (g1-g2) & mask; g2 = g1;
-		b1 = CLAMP(ip[2]); wp[2] = (b1-b2) & mask; b2 = b1;
-		a1 = CLAMP(ip[3]); wp[3] = (a1-a2) & mask; a2 = a1;
+		r1 = CLAMP(ip[0]); wp[0] = (uint16)((r1-r2) & mask); r2 = r1;
+		g1 = CLAMP(ip[1]); wp[1] = (uint16)((g1-g2) & mask); g2 = g1;
+		b1 = CLAMP(ip[2]); wp[2] = (uint16)((b1-b2) & mask); b2 = b1;
+		a1 = CLAMP(ip[3]); wp[3] = (uint16)((a1-a2) & mask); a2 = a1;
 	    }
 	} else {
-	    ip += n - 1;	/* point to last one */
-	    wp += n - 1;	/* point to last one */
+        REPEAT(stride, wp[0] = CLAMP(ip[0]); wp++; ip++)
 	    n -= stride;
 	    while (n > 0) {
-		REPEAT(stride, wp[0] = CLAMP(ip[0]);
-				wp[stride] -= wp[0];
-				wp[stride] &= mask;
-				wp--; ip--)
-		n -= stride;
-	    }
-	    REPEAT(stride, wp[0] = CLAMP(ip[0]); wp--; ip--)
+            REPEAT(stride,
+                wp[0] = (uint16)((CLAMP(ip[0])-CLAMP(ip[-stride])) & mask);
+                wp++; ip++)
+            n -= stride;
+        }
 	}
     }
 }
@@ -1059,9 +1063,9 @@ horizontalDifference8(unsigned char *ip, int n, int stride,
 	    n -= 3;
 	    while (n > 0) {
 		n -= 3;
-		r1 = CLAMP(ip[3]); wp[3] = (r1-r2) & mask; r2 = r1;
-		g1 = CLAMP(ip[4]); wp[4] = (g1-g2) & mask; g2 = g1;
-		b1 = CLAMP(ip[5]); wp[5] = (b1-b2) & mask; b2 = b1;
+		r1 = CLAMP(ip[3]); wp[3] = (uint16)((r1-r2) & mask); r2 = r1;
+		g1 = CLAMP(ip[4]); wp[4] = (uint16)((g1-g2) & mask); g2 = g1;
+		b1 = CLAMP(ip[5]); wp[5] = (uint16)((b1-b2) & mask); b2 = b1;
 		wp += 3;
 		ip += 3;
 	    }
@@ -1071,26 +1075,23 @@ horizontalDifference8(unsigned char *ip, int n, int stride,
 	    n -= 4;
 	    while (n > 0) {
 		n -= 4;
-		r1 = CLAMP(ip[4]); wp[4] = (r1-r2) & mask; r2 = r1;
-		g1 = CLAMP(ip[5]); wp[5] = (g1-g2) & mask; g2 = g1;
-		b1 = CLAMP(ip[6]); wp[6] = (b1-b2) & mask; b2 = b1;
-		a1 = CLAMP(ip[7]); wp[7] = (a1-a2) & mask; a2 = a1;
+		r1 = CLAMP(ip[4]); wp[4] = (uint16)((r1-r2) & mask); r2 = r1;
+		g1 = CLAMP(ip[5]); wp[5] = (uint16)((g1-g2) & mask); g2 = g1;
+		b1 = CLAMP(ip[6]); wp[6] = (uint16)((b1-b2) & mask); b2 = b1;
+		a1 = CLAMP(ip[7]); wp[7] = (uint16)((a1-a2) & mask); a2 = a1;
 		wp += 4;
 		ip += 4;
 	    }
 	} else {
-	    wp += n + stride - 1;	/* point to last one */
-	    ip += n + stride - 1;	/* point to last one */
-	    n -= stride;
-	    while (n > 0) {
-		REPEAT(stride, wp[0] = CLAMP(ip[0]);
-				wp[stride] -= wp[0];
-				wp[stride] &= mask;
-				wp--; ip--)
-		n -= stride;
-	    }
-	    REPEAT(stride, wp[0] = CLAMP(ip[0]); wp--; ip--)
-	}
+        REPEAT(stride, wp[0] = CLAMP(ip[0]); wp++; ip++)
+        n -= stride;
+        while (n > 0) {
+            REPEAT(stride,
+                wp[0] = (uint16)((CLAMP(ip[0])-CLAMP(ip[-stride])) & mask);
+                wp++; ip++)
+            n -= stride;
+        }
+    }
     }
 }
 
@@ -1131,6 +1132,13 @@ PixarLogEncode(TIFF* tif, uint8* bp, tmsize_t cc, uint16 s)
 	}
 
 	llen = sp->stride * td->td_imagewidth;
+    /* Check against the number of elements (of size uint16) of sp->tbuf */
+    if( n > (tmsize_t)(td->td_rowsperstrip * llen) )
+    {
+        TIFFErrorExt(tif->tif_clientdata, module,
+                     "Too many input bytes provided");
+        return 0;
+    }
 
 	for (i = 0, up = sp->tbuf; i < n; i += llen, up += llen) {
 		switch (sp->user_datafmt)  {
@@ -1161,7 +1169,7 @@ PixarLogEncode(TIFF* tif, uint8* bp, tmsize_t cc, uint16 s)
 	assert(sizeof(sp->stream.avail_in)==4);  /* if this assert gets raised,
 	    we need to simplify this code to reflect a ZLib that is likely updated
 	    to deal with 8byte memory sizes, though this code will respond
-	    apropriately even before we simplify it */
+	    appropriately even before we simplify it */
 	sp->stream.avail_in = (uInt) (n * sizeof(uint16));
 	if ((sp->stream.avail_in / sizeof(uint16)) != (uInt) n)
 	{
@@ -1173,7 +1181,7 @@ PixarLogEncode(TIFF* tif, uint8* bp, tmsize_t cc, uint16 s)
 	do {
 		if (deflate(&sp->stream, Z_NO_FLUSH) != Z_OK) {
 			TIFFErrorExt(tif->tif_clientdata, module, "Encoder error: %s",
-			    sp->stream.msg);
+			    sp->stream.msg ? sp->stream.msg : "(null)");
 			return (0);
 		}
 		if (sp->stream.avail_out == 0) {
@@ -1215,7 +1223,7 @@ PixarLogPostEncode(TIFF* tif)
 		    break;
 		default:
 			TIFFErrorExt(tif->tif_clientdata, module, "ZLib error: %s",
-			sp->stream.msg);
+			sp->stream.msg ? sp->stream.msg : "(null)");
 		    return (0);
 		}
 	} while (state != Z_STREAM_END);
@@ -1227,7 +1235,7 @@ PixarLogClose(TIFF* tif)
 {
 	TIFFDirectory *td = &tif->tif_dir;
 
-	/* In a really sneaky (and really incorrect, and untruthfull, and
+	/* In a really sneaky (and really incorrect, and untruthful, and
 	 * troublesome, and error-prone) maneuver that completely goes against
 	 * the spirit of TIFF, and breaks TIFF, on close, we covertly
 	 * modify both bitspersample and sampleformat in the directory to
@@ -1285,7 +1293,7 @@ PixarLogVSetField(TIFF* tif, uint32 tag, va_list ap)
 			if (deflateParams(&sp->stream,
 			    sp->quality, Z_DEFAULT_STRATEGY) != Z_OK) {
 				TIFFErrorExt(tif->tif_clientdata, module, "ZLib error: %s",
-					sp->stream.msg);
+					sp->stream.msg ? sp->stream.msg : "(null)");
 				return (0);
 			}
 		}
diff --git a/src/3rdparty/libtiff/libtiff/tif_predict.c b/src/3rdparty/libtiff/libtiff/tif_predict.c
index 1388dde..0b185d2 100644
--- a/src/3rdparty/libtiff/libtiff/tif_predict.c
+++ b/src/3rdparty/libtiff/libtiff/tif_predict.c
@@ -1,4 +1,4 @@
-/* $Id: tif_predict.c,v 1.35 2015-08-31 15:05:57 erouault Exp $ */
+/* $Id: tif_predict.c,v 1.40 2016-11-04 09:19:13 erouault Exp $ */
 
 /*
  * Copyright (c) 1988-1997 Sam Leffler
@@ -34,18 +34,18 @@
 
 #define	PredictorState(tif)	((TIFFPredictorState*) (tif)->tif_data)
 
-static void horAcc8(TIFF* tif, uint8* cp0, tmsize_t cc);
-static void horAcc16(TIFF* tif, uint8* cp0, tmsize_t cc);
-static void horAcc32(TIFF* tif, uint8* cp0, tmsize_t cc);
-static void swabHorAcc16(TIFF* tif, uint8* cp0, tmsize_t cc);
-static void swabHorAcc32(TIFF* tif, uint8* cp0, tmsize_t cc);
-static void horDiff8(TIFF* tif, uint8* cp0, tmsize_t cc);
-static void horDiff16(TIFF* tif, uint8* cp0, tmsize_t cc);
-static void horDiff32(TIFF* tif, uint8* cp0, tmsize_t cc);
-static void swabHorDiff16(TIFF* tif, uint8* cp0, tmsize_t cc);
-static void swabHorDiff32(TIFF* tif, uint8* cp0, tmsize_t cc);
-static void fpAcc(TIFF* tif, uint8* cp0, tmsize_t cc);
-static void fpDiff(TIFF* tif, uint8* cp0, tmsize_t cc);
+static int horAcc8(TIFF* tif, uint8* cp0, tmsize_t cc);
+static int horAcc16(TIFF* tif, uint8* cp0, tmsize_t cc);
+static int horAcc32(TIFF* tif, uint8* cp0, tmsize_t cc);
+static int swabHorAcc16(TIFF* tif, uint8* cp0, tmsize_t cc);
+static int swabHorAcc32(TIFF* tif, uint8* cp0, tmsize_t cc);
+static int horDiff8(TIFF* tif, uint8* cp0, tmsize_t cc);
+static int horDiff16(TIFF* tif, uint8* cp0, tmsize_t cc);
+static int horDiff32(TIFF* tif, uint8* cp0, tmsize_t cc);
+static int swabHorDiff16(TIFF* tif, uint8* cp0, tmsize_t cc);
+static int swabHorDiff32(TIFF* tif, uint8* cp0, tmsize_t cc);
+static int fpAcc(TIFF* tif, uint8* cp0, tmsize_t cc);
+static int fpDiff(TIFF* tif, uint8* cp0, tmsize_t cc);
 static int PredictorDecodeRow(TIFF* tif, uint8* op0, tmsize_t occ0, uint16 s);
 static int PredictorDecodeTile(TIFF* tif, uint8* op0, tmsize_t occ0, uint16 s);
 static int PredictorEncodeRow(TIFF* tif, uint8* bp, tmsize_t cc, uint16 s);
@@ -80,6 +80,15 @@ PredictorSetup(TIFF* tif)
 				    td->td_sampleformat);
 				return 0;
 			}
+                        if (td->td_bitspersample != 16
+                            && td->td_bitspersample != 24
+                            && td->td_bitspersample != 32
+                            && td->td_bitspersample != 64) { /* Should 64 be allowed? */
+                                TIFFErrorExt(tif->tif_clientdata, module,
+                                             "Floating point \"Predictor\" not supported with %d-bit samples",
+                                             td->td_bitspersample);
+				return 0;
+                            }
 			break;
 		default:
 			TIFFErrorExt(tif->tif_clientdata, module,
@@ -174,7 +183,7 @@ PredictorSetupDecode(TIFF* tif)
 		}
 		/*
 		 * Allocate buffer to keep the decoded bytes before
-		 * rearranging in the ight order
+		 * rearranging in the right order
 		 */
 	}
 
@@ -213,7 +222,7 @@ PredictorSetupEncode(TIFF* tif)
                 /*
                  * If the data is horizontally differenced 16-bit data that
                  * requires byte-swapping, then it must be byte swapped after
-                 * the differenciation step.  We do this with a special-purpose
+                 * the differentiation step.  We do this with a special-purpose
                  * routine and override the normal post decoding logic that
                  * the library setup when the directory was read.
                  */
@@ -264,13 +273,19 @@ PredictorSetupEncode(TIFF* tif)
 /* - when storing into the byte stream, we explicitly mask with 0xff so */
 /*   as to make icc -check=conversions happy (not necessary by the standard) */
 
-static void
+static int
 horAcc8(TIFF* tif, uint8* cp0, tmsize_t cc)
 {
 	tmsize_t stride = PredictorState(tif)->stride;
 
 	unsigned char* cp = (unsigned char*) cp0;
-	assert((cc%stride)==0);
+    if((cc%stride)!=0)
+    {
+        TIFFErrorExt(tif->tif_clientdata, "horAcc8",
+                     "%s", "(cc%stride)!=0");
+        return 0;
+    }
+
 	if (cc > stride) {
 		/*
 		 * Pipeline the most common cases.
@@ -312,26 +327,32 @@ horAcc8(TIFF* tif, uint8* cp0, tmsize_t cc)
 			} while (cc>0);
 		}
 	}
+	return 1;
 }
 
-static void
+static int
 swabHorAcc16(TIFF* tif, uint8* cp0, tmsize_t cc)
 {
 	uint16* wp = (uint16*) cp0;
 	tmsize_t wc = cc / 2;
 
         TIFFSwabArrayOfShort(wp, wc);
-        horAcc16(tif, cp0, cc);
+        return horAcc16(tif, cp0, cc);
 }
 
-static void
+static int
 horAcc16(TIFF* tif, uint8* cp0, tmsize_t cc)
 {
 	tmsize_t stride = PredictorState(tif)->stride;
 	uint16* wp = (uint16*) cp0;
 	tmsize_t wc = cc / 2;
 
-	assert((cc%(2*stride))==0);
+    if((cc%(2*stride))!=0)
+    {
+        TIFFErrorExt(tif->tif_clientdata, "horAcc16",
+                     "%s", "cc%(2*stride))!=0");
+        return 0;
+    }
 
 	if (wc > stride) {
 		wc -= stride;
@@ -340,26 +361,32 @@ horAcc16(TIFF* tif, uint8* cp0, tmsize_t cc)
 			wc -= stride;
 		} while (wc > 0);
 	}
+	return 1;
 }
 
-static void
+static int
 swabHorAcc32(TIFF* tif, uint8* cp0, tmsize_t cc)
 {
 	uint32* wp = (uint32*) cp0;
 	tmsize_t wc = cc / 4;
 
         TIFFSwabArrayOfLong(wp, wc);
-	horAcc32(tif, cp0, cc);
+	return horAcc32(tif, cp0, cc);
 }
 
-static void
+static int
 horAcc32(TIFF* tif, uint8* cp0, tmsize_t cc)
 {
 	tmsize_t stride = PredictorState(tif)->stride;
 	uint32* wp = (uint32*) cp0;
 	tmsize_t wc = cc / 4;
 
-	assert((cc%(4*stride))==0);
+    if((cc%(4*stride))!=0)
+    {
+        TIFFErrorExt(tif->tif_clientdata, "horAcc32",
+                     "%s", "cc%(4*stride))!=0");
+        return 0;
+    }
 
 	if (wc > stride) {
 		wc -= stride;
@@ -368,12 +395,13 @@ horAcc32(TIFF* tif, uint8* cp0, tmsize_t cc)
 			wc -= stride;
 		} while (wc > 0);
 	}
+	return 1;
 }
 
 /*
  * Floating point predictor accumulation routine.
  */
-static void
+static int
 fpAcc(TIFF* tif, uint8* cp0, tmsize_t cc)
 {
 	tmsize_t stride = PredictorState(tif)->stride;
@@ -381,12 +409,18 @@ fpAcc(TIFF* tif, uint8* cp0, tmsize_t cc)
 	tmsize_t wc = cc / bps;
 	tmsize_t count = cc;
 	uint8 *cp = (uint8 *) cp0;
-	uint8 *tmp = (uint8 *)_TIFFmalloc(cc);
+	uint8 *tmp;
 
-	assert((cc%(bps*stride))==0);
+    if(cc%(bps*stride)!=0)
+    {
+        TIFFErrorExt(tif->tif_clientdata, "fpAcc",
+                     "%s", "cc%(bps*stride))!=0");
+        return 0;
+    }
 
+    tmp = (uint8 *)_TIFFmalloc(cc);
 	if (!tmp)
-		return;
+		return 0;
 
 	while (count > stride) {
 		REPEAT4(stride, cp[stride] =
@@ -408,6 +442,7 @@ fpAcc(TIFF* tif, uint8* cp0, tmsize_t cc)
 		}
 	}
 	_TIFFfree(tmp);
+    return 1;
 }
 
 /*
@@ -423,8 +458,7 @@ PredictorDecodeRow(TIFF* tif, uint8* op0, tmsize_t occ0, uint16 s)
 	assert(sp->decodepfunc != NULL);  
 
 	if ((*sp->decoderow)(tif, op0, occ0, s)) {
-		(*sp->decodepfunc)(tif, op0, occ0);
-		return 1;
+		return (*sp->decodepfunc)(tif, op0, occ0);
 	} else
 		return 0;
 }
@@ -447,10 +481,16 @@ PredictorDecodeTile(TIFF* tif, uint8* op0, tmsize_t occ0, uint16 s)
 	if ((*sp->decodetile)(tif, op0, occ0, s)) {
 		tmsize_t rowsize = sp->rowsize;
 		assert(rowsize > 0);
-		assert((occ0%rowsize)==0);
+		if((occ0%rowsize) !=0)
+        {
+            TIFFErrorExt(tif->tif_clientdata, "PredictorDecodeTile",
+                         "%s", "occ0%rowsize != 0");
+            return 0;
+        }
 		assert(sp->decodepfunc != NULL);
 		while (occ0 > 0) {
-			(*sp->decodepfunc)(tif, op0, rowsize);
+			if( !(*sp->decodepfunc)(tif, op0, rowsize) )
+                return 0;
 			occ0 -= rowsize;
 			op0 += rowsize;
 		}
@@ -459,14 +499,19 @@ PredictorDecodeTile(TIFF* tif, uint8* op0, tmsize_t occ0, uint16 s)
 		return 0;
 }
 
-static void
+static int
 horDiff8(TIFF* tif, uint8* cp0, tmsize_t cc)
 {
 	TIFFPredictorState* sp = PredictorState(tif);
 	tmsize_t stride = sp->stride;
 	unsigned char* cp = (unsigned char*) cp0;
 
-	assert((cc%stride)==0);
+    if((cc%stride)!=0)
+    {
+        TIFFErrorExt(tif->tif_clientdata, "horDiff8",
+                     "%s", "(cc%stride)!=0");
+        return 0;
+    }
 
 	if (cc > stride) {
 		cc -= stride;
@@ -504,9 +549,10 @@ horDiff8(TIFF* tif, uint8* cp0, tmsize_t cc)
 			} while ((cc -= stride) > 0);
 		}
 	}
+	return 1;
 }
 
-static void
+static int
 horDiff16(TIFF* tif, uint8* cp0, tmsize_t cc)
 {
 	TIFFPredictorState* sp = PredictorState(tif);
@@ -514,7 +560,12 @@ horDiff16(TIFF* tif, uint8* cp0, tmsize_t cc)
 	uint16 *wp = (uint16*) cp0;
 	tmsize_t wc = cc/2;
 
-	assert((cc%(2*stride))==0);
+    if((cc%(2*stride))!=0)
+    {
+        TIFFErrorExt(tif->tif_clientdata, "horDiff8",
+                     "%s", "(cc%(2*stride))!=0");
+        return 0;
+    }
 
 	if (wc > stride) {
 		wc -= stride;
@@ -524,20 +575,23 @@ horDiff16(TIFF* tif, uint8* cp0, tmsize_t cc)
 			wc -= stride;
 		} while (wc > 0);
 	}
+	return 1;
 }
 
-static void
+static int
 swabHorDiff16(TIFF* tif, uint8* cp0, tmsize_t cc)
 {
     uint16* wp = (uint16*) cp0;
     tmsize_t wc = cc / 2;
 
-    horDiff16(tif, cp0, cc);
+    if( !horDiff16(tif, cp0, cc) )
+        return 0;
 
     TIFFSwabArrayOfShort(wp, wc);
+    return 1;
 }
 
-static void
+static int
 horDiff32(TIFF* tif, uint8* cp0, tmsize_t cc)
 {
 	TIFFPredictorState* sp = PredictorState(tif);
@@ -545,7 +599,12 @@ horDiff32(TIFF* tif, uint8* cp0, tmsize_t cc)
 	uint32 *wp = (uint32*) cp0;
 	tmsize_t wc = cc/4;
 
-	assert((cc%(4*stride))==0);
+    if((cc%(4*stride))!=0)
+    {
+        TIFFErrorExt(tif->tif_clientdata, "horDiff32",
+                     "%s", "(cc%(4*stride))!=0");
+        return 0;
+    }
 
 	if (wc > stride) {
 		wc -= stride;
@@ -555,23 +614,26 @@ horDiff32(TIFF* tif, uint8* cp0, tmsize_t cc)
 			wc -= stride;
 		} while (wc > 0);
 	}
+	return 1;
 }
 
-static void
+static int
 swabHorDiff32(TIFF* tif, uint8* cp0, tmsize_t cc)
 {
     uint32* wp = (uint32*) cp0;
     tmsize_t wc = cc / 4;
 
-    horDiff32(tif, cp0, cc);
+    if( !horDiff32(tif, cp0, cc) )
+        return 0;
 
     TIFFSwabArrayOfLong(wp, wc);
+    return 1;
 }
 
 /*
  * Floating point predictor differencing routine.
  */
-static void
+static int
 fpDiff(TIFF* tif, uint8* cp0, tmsize_t cc)
 {
 	tmsize_t stride = PredictorState(tif)->stride;
@@ -579,12 +641,18 @@ fpDiff(TIFF* tif, uint8* cp0, tmsize_t cc)
 	tmsize_t wc = cc / bps;
 	tmsize_t count;
 	uint8 *cp = (uint8 *) cp0;
-	uint8 *tmp = (uint8 *)_TIFFmalloc(cc);
+	uint8 *tmp;
 
-	assert((cc%(bps*stride))==0);
+    if((cc%(bps*stride))!=0)
+    {
+        TIFFErrorExt(tif->tif_clientdata, "fpDiff",
+                     "%s", "(cc%(bps*stride))!=0");
+        return 0;
+    }
 
+    tmp = (uint8 *)_TIFFmalloc(cc);
 	if (!tmp)
-		return;
+		return 0;
 
 	_TIFFmemcpy(tmp, cp0, cc);
 	for (count = 0; count < wc; count++) {
@@ -604,6 +672,7 @@ fpDiff(TIFF* tif, uint8* cp0, tmsize_t cc)
 	cp += cc - stride - 1;
 	for (count = cc; count > stride; count -= stride)
 		REPEAT4(stride, cp[stride] = (unsigned char)((cp[stride] - cp[0])&0xff); cp--)
+    return 1;
 }
 
 static int
@@ -616,7 +685,8 @@ PredictorEncodeRow(TIFF* tif, uint8* bp, tmsize_t cc, uint16 s)
 	assert(sp->encoderow != NULL);
 
 	/* XXX horizontal differencing alters user's data XXX */
-	(*sp->encodepfunc)(tif, bp, cc);
+	if( !(*sp->encodepfunc)(tif, bp, cc) )
+        return 0;
 	return (*sp->encoderow)(tif, bp, cc, s);
 }
 
@@ -651,7 +721,13 @@ PredictorEncodeTile(TIFF* tif, uint8* bp0, tmsize_t cc0, uint16 s)
 
 	rowsize = sp->rowsize;
 	assert(rowsize > 0);
-	assert((cc0%rowsize)==0);
+	if((cc0%rowsize)!=0)
+    {
+        TIFFErrorExt(tif->tif_clientdata, "PredictorEncodeTile",
+                     "%s", "(cc0%rowsize)!=0");
+        _TIFFfree( working_copy );
+        return 0;
+    }
 	while (cc > 0) {
 		(*sp->encodepfunc)(tif, bp, rowsize);
 		cc -= rowsize;
@@ -700,7 +776,7 @@ PredictorVGetField(TIFF* tif, uint32 tag, va_list ap)
 
 	switch (tag) {
 	case TIFFTAG_PREDICTOR:
-		*va_arg(ap, uint16*) = sp->predictor;
+		*va_arg(ap, uint16*) = (uint16)sp->predictor;
 		break;
 	default:
 		return (*sp->vgetparent)(tif, tag, ap);
diff --git a/src/3rdparty/libtiff/libtiff/tif_predict.h b/src/3rdparty/libtiff/libtiff/tif_predict.h
index dc7144c..6c68e21 100644
--- a/src/3rdparty/libtiff/libtiff/tif_predict.h
+++ b/src/3rdparty/libtiff/libtiff/tif_predict.h
@@ -1,4 +1,4 @@
-/* $Id: tif_predict.h,v 1.8 2010-03-10 18:56:49 bfriesen Exp $ */
+/* $Id: tif_predict.h,v 1.9 2016-10-31 17:24:26 erouault Exp $ */
 
 /*
  * Copyright (c) 1995-1997 Sam Leffler
@@ -30,6 +30,8 @@
  * ``Library-private'' Support for the Predictor Tag
  */
 
+typedef int (*TIFFEncodeDecodeMethod)(TIFF* tif, uint8* buf, tmsize_t size);
+
 /*
  * Codecs that want to support the Predictor tag must place
  * this structure first in their private state block so that
@@ -43,12 +45,12 @@ typedef struct {
 	TIFFCodeMethod  encoderow;	/* parent codec encode/decode row */
 	TIFFCodeMethod  encodestrip;	/* parent codec encode/decode strip */
 	TIFFCodeMethod  encodetile;	/* parent codec encode/decode tile */ 
-	TIFFPostMethod  encodepfunc;	/* horizontal differencer */
+	TIFFEncodeDecodeMethod  encodepfunc;	/* horizontal differencer */
 
 	TIFFCodeMethod  decoderow;	/* parent codec encode/decode row */
 	TIFFCodeMethod  decodestrip;	/* parent codec encode/decode strip */
 	TIFFCodeMethod  decodetile;	/* parent codec encode/decode tile */ 
-	TIFFPostMethod  decodepfunc;	/* horizontal accumulator */
+	TIFFEncodeDecodeMethod  decodepfunc;	/* horizontal accumulator */
 
 	TIFFVGetMethod  vgetparent;	/* super-class method */
 	TIFFVSetMethod  vsetparent;	/* super-class method */
diff --git a/src/3rdparty/libtiff/libtiff/tif_print.c b/src/3rdparty/libtiff/libtiff/tif_print.c
index 7b1a422..186f2ee 100644
--- a/src/3rdparty/libtiff/libtiff/tif_print.c
+++ b/src/3rdparty/libtiff/libtiff/tif_print.c
@@ -1,4 +1,4 @@
-/* $Id: tif_print.c,v 1.62 2015-08-19 02:31:04 bfriesen Exp $ */
+/* $Id: tif_print.c,v 1.64 2015-12-06 22:19:56 erouault Exp $ */
 
 /*
  * Copyright (c) 1988-1997 Sam Leffler
@@ -37,7 +37,7 @@
 static void
 _TIFFprintAsciiBounded(FILE* fd, const char* cp, size_t max_chars);
 
-static const char *photoNames[] = {
+static const char * const photoNames[] = {
     "min-is-white",				/* PHOTOMETRIC_MINISWHITE */
     "min-is-black",				/* PHOTOMETRIC_MINISBLACK */
     "RGB color",				/* PHOTOMETRIC_RGB */
@@ -52,7 +52,7 @@ static const char *photoNames[] = {
 };
 #define	NPHOTONAMES	(sizeof (photoNames) / sizeof (photoNames[0]))
 
-static const char *orientNames[] = {
+static const char * const orientNames[] = {
     "0 (0x0)",
     "row 0 top, col 0 lhs",			/* ORIENTATION_TOPLEFT */
     "row 0 top, col 0 rhs",			/* ORIENTATION_TOPRIGHT */
@@ -237,7 +237,6 @@ TIFFPrintDirectory(TIFF* tif, FILE* fd, long flags)
 {
 	TIFFDirectory *td = &tif->tif_dir;
 	char *sep;
-	uint16 i;
 	long l, n;
 
 #if defined(__WIN32__) && (defined(_MSC_VER) || defined(__MINGW32__))
@@ -365,6 +364,7 @@ TIFFPrintDirectory(TIFF* tif, FILE* fd, long flags)
 		}
 	}
 	if (TIFFFieldSet(tif,FIELD_EXTRASAMPLES) && td->td_extrasamples) {
+		uint16 i;
 		fprintf(fd, "  Extra Samples: %u<", td->td_extrasamples);
 		sep = "";
 		for (i = 0; i < td->td_extrasamples; i++) {
@@ -389,6 +389,7 @@ TIFFPrintDirectory(TIFF* tif, FILE* fd, long flags)
 	}
 	if (TIFFFieldSet(tif,FIELD_INKNAMES)) {
 		char* cp;
+		uint16 i;
 		fprintf(fd, "  Ink Names: ");
 		i = td->td_samplesperpixel;
 		sep = "";
@@ -481,6 +482,7 @@ TIFFPrintDirectory(TIFF* tif, FILE* fd, long flags)
 	if (TIFFFieldSet(tif,FIELD_MAXSAMPLEVALUE))
 		fprintf(fd, "  Max Sample Value: %u\n", td->td_maxsamplevalue);
 	if (TIFFFieldSet(tif,FIELD_SMINSAMPLEVALUE)) {
+		int i;
 		int count = (tif->tif_flags & TIFF_PERSAMPLE) ? td->td_samplesperpixel : 1;
 		fprintf(fd, "  SMin Sample Value:");
 		for (i = 0; i < count; ++i)
@@ -488,6 +490,7 @@ TIFFPrintDirectory(TIFF* tif, FILE* fd, long flags)
 		fprintf(fd, "\n");
 	}
 	if (TIFFFieldSet(tif,FIELD_SMAXSAMPLEVALUE)) {
+		int i;
 		int count = (tif->tif_flags & TIFF_PERSAMPLE) ? td->td_samplesperpixel : 1;
 		fprintf(fd, "  SMax Sample Value:");
 		for (i = 0; i < count; ++i)
@@ -527,6 +530,7 @@ TIFFPrintDirectory(TIFF* tif, FILE* fd, long flags)
 			fprintf(fd, "(present)\n");
 	}
 	if (TIFFFieldSet(tif,FIELD_REFBLACKWHITE)) {
+		int i;
 		fprintf(fd, "  Reference Black/White:\n");
 		for (i = 0; i < 3; i++)
 		fprintf(fd, "    %2d: %5g %5g\n", i,
@@ -539,6 +543,7 @@ TIFFPrintDirectory(TIFF* tif, FILE* fd, long flags)
 			fprintf(fd, "\n");
 			n = 1L<<td->td_bitspersample;
 			for (l = 0; l < n; l++) {
+				uint16 i;
 				fprintf(fd, "    %2lu: %5u",
 				    l, td->td_transferfunction[0][l]);
 				for (i = 1; i < td->td_samplesperpixel; i++)
@@ -550,6 +555,7 @@ TIFFPrintDirectory(TIFF* tif, FILE* fd, long flags)
 			fprintf(fd, "(present)\n");
 	}
 	if (TIFFFieldSet(tif, FIELD_SUBIFD) && (td->td_subifd)) {
+		uint16 i;
 		fprintf(fd, "  SubIFD Offsets:");
 		for (i = 0; i < td->td_nsubifd; i++)
 #if defined(__WIN32__) && (defined(_MSC_VER) || defined(__MINGW32__))
diff --git a/src/3rdparty/libtiff/libtiff/tif_read.c b/src/3rdparty/libtiff/libtiff/tif_read.c
index 5cb419b..8003592 100644
--- a/src/3rdparty/libtiff/libtiff/tif_read.c
+++ b/src/3rdparty/libtiff/libtiff/tif_read.c
@@ -1,4 +1,4 @@
-/* $Id: tif_read.c,v 1.45 2015-06-07 22:35:40 bfriesen Exp $ */
+/* $Id: tif_read.c,v 1.49 2016-07-10 18:00:21 erouault Exp $ */
 
 /*
  * Copyright (c) 1988-1997 Sam Leffler
@@ -31,6 +31,9 @@
 #include "tiffiop.h"
 #include <stdio.h>
 
+#define TIFF_SIZE_T_MAX ((size_t) ~ ((size_t)0))
+#define TIFF_TMSIZE_T_MAX (tmsize_t)(TIFF_SIZE_T_MAX >> 1)
+
 int TIFFFillStrip(TIFF* tif, uint32 strip);
 int TIFFFillTile(TIFF* tif, uint32 tile);
 static int TIFFStartStrip(TIFF* tif, uint32 strip);
@@ -38,6 +41,8 @@ static int TIFFStartTile(TIFF* tif, uint32 tile);
 static int TIFFCheckRead(TIFF*, int);
 static tmsize_t
 TIFFReadRawStrip1(TIFF* tif, uint32 strip, void* buf, tmsize_t size,const char* module);
+static tmsize_t
+TIFFReadRawTile1(TIFF* tif, uint32 tile, void* buf, tmsize_t size, const char* module);
 
 #define NOSTRIP ((uint32)(-1))       /* undefined state */
 #define NOTILE ((uint32)(-1))         /* undefined state */
@@ -343,13 +348,31 @@ TIFFReadEncodedStrip(TIFF* tif, uint32 strip, void* buf, tmsize_t size)
 		rowsperstrip=td->td_imagelength;
 	stripsperplane=((td->td_imagelength+rowsperstrip-1)/rowsperstrip);
 	stripinplane=(strip%stripsperplane);
-	plane=(strip/stripsperplane);
+	plane=(uint16)(strip/stripsperplane);
 	rows=td->td_imagelength-stripinplane*rowsperstrip;
 	if (rows>rowsperstrip)
 		rows=rowsperstrip;
 	stripsize=TIFFVStripSize(tif,rows);
 	if (stripsize==0)
 		return((tmsize_t)(-1));
+
+    /* shortcut to avoid an extra memcpy() */
+    if( td->td_compression == COMPRESSION_NONE &&
+        size!=(tmsize_t)(-1) && size >= stripsize &&
+        !isMapped(tif) &&
+        ((tif->tif_flags&TIFF_NOREADRAW)==0) )
+    {
+        if (TIFFReadRawStrip1(tif, strip, buf, stripsize, module) != stripsize)
+            return ((tmsize_t)(-1));
+
+        if (!isFillOrder(tif, td->td_fillorder) &&
+            (tif->tif_flags & TIFF_NOBITREV) == 0)
+            TIFFReverseBits(buf,stripsize);
+
+        (*tif->tif_postdecode)(tif,buf,stripsize);
+        return (stripsize);
+    }
+
 	if ((size!=(tmsize_t)(-1))&&(size<stripsize))
 		stripsize=size;
 	if (!TIFFFillStrip(tif,strip))
@@ -401,7 +424,7 @@ TIFFReadRawStrip1(TIFF* tif, uint32 strip, void* buf, tmsize_t size,
 		tmsize_t n;
 		ma=(tmsize_t)td->td_stripoffset[strip];
 		mb=ma+size;
-		if (((uint64)ma!=td->td_stripoffset[strip])||(ma>tif->tif_size))
+		if ((td->td_stripoffset[strip] > (uint64)TIFF_TMSIZE_T_MAX)||(ma>tif->tif_size))
 			n=0;
 		else if ((mb<ma)||(mb<size)||(mb>tif->tif_size))
 			n=tif->tif_size-ma;
@@ -492,9 +515,9 @@ TIFFFillStrip(TIFF* tif, uint32 strip)
 	static const char module[] = "TIFFFillStrip";
 	TIFFDirectory *td = &tif->tif_dir;
 
-    if (!_TIFFFillStriles( tif ) || !tif->tif_dir.td_stripbytecount)
-        return 0;
-        
+        if (!_TIFFFillStriles( tif ) || !tif->tif_dir.td_stripbytecount)
+            return 0;
+
 	if ((tif->tif_flags&TIFF_NOREADRAW)==0)
 	{
 		uint64 bytecount = td->td_stripbytecount[strip];
@@ -661,6 +684,24 @@ TIFFReadEncodedTile(TIFF* tif, uint32 tile, void* buf, tmsize_t size)
 		    (unsigned long) tile, (unsigned long) td->td_nstrips);
 		return ((tmsize_t)(-1));
 	}
+
+    /* shortcut to avoid an extra memcpy() */
+    if( td->td_compression == COMPRESSION_NONE &&
+        size!=(tmsize_t)(-1) && size >= tilesize &&
+        !isMapped(tif) &&
+        ((tif->tif_flags&TIFF_NOREADRAW)==0) )
+    {
+        if (TIFFReadRawTile1(tif, tile, buf, tilesize, module) != tilesize)
+            return ((tmsize_t)(-1));
+
+        if (!isFillOrder(tif, td->td_fillorder) &&
+            (tif->tif_flags & TIFF_NOBITREV) == 0)
+            TIFFReverseBits(buf,tilesize);
+
+        (*tif->tif_postdecode)(tif,buf,tilesize);
+        return (tilesize);
+    }
+
 	if (size == (tmsize_t)(-1))
 		size = tilesize;
 	else if (size > tilesize)
@@ -717,7 +758,7 @@ TIFFReadRawTile1(TIFF* tif, uint32 tile, void* buf, tmsize_t size, const char* m
 		tmsize_t n;
 		ma=(tmsize_t)td->td_stripoffset[tile];
 		mb=ma+size;
-		if (((uint64)ma!=td->td_stripoffset[tile])||(ma>tif->tif_size))
+		if ((td->td_stripoffset[tile] > (uint64)TIFF_TMSIZE_T_MAX)||(ma>tif->tif_size))
 			n=0;
 		else if ((mb<ma)||(mb<size)||(mb>tif->tif_size))
 			n=tif->tif_size-ma;
@@ -795,9 +836,9 @@ TIFFFillTile(TIFF* tif, uint32 tile)
 	static const char module[] = "TIFFFillTile";
 	TIFFDirectory *td = &tif->tif_dir;
 
-    if (!_TIFFFillStriles( tif ) || !tif->tif_dir.td_stripbytecount)
-        return 0;
-        
+        if (!_TIFFFillStriles( tif ) || !tif->tif_dir.td_stripbytecount)
+            return 0;
+
 	if ((tif->tif_flags&TIFF_NOREADRAW)==0)
 	{
 		uint64 bytecount = td->td_stripbytecount[tile];
@@ -957,8 +998,8 @@ TIFFStartStrip(TIFF* tif, uint32 strip)
 {
 	TIFFDirectory *td = &tif->tif_dir;
 
-    if (!_TIFFFillStriles( tif ) || !tif->tif_dir.td_stripbytecount)
-        return 0;
+        if (!_TIFFFillStriles( tif ) || !tif->tif_dir.td_stripbytecount)
+            return 0;
 
 	if ((tif->tif_flags & TIFF_CODERSETUP) == 0) {
 		if (!(*tif->tif_setupdecode)(tif))
diff --git a/src/3rdparty/libtiff/libtiff/tif_strip.c b/src/3rdparty/libtiff/libtiff/tif_strip.c
index 6cac71d..b6098dd 100644
--- a/src/3rdparty/libtiff/libtiff/tif_strip.c
+++ b/src/3rdparty/libtiff/libtiff/tif_strip.c
@@ -1,4 +1,4 @@
-/* $Id: tif_strip.c,v 1.36 2015-06-07 22:35:40 bfriesen Exp $ */
+/* $Id: tif_strip.c,v 1.37 2016-11-09 23:00:49 erouault Exp $ */
 
 /*
  * Copyright (c) 1991-1997 Sam Leffler
@@ -63,6 +63,15 @@ TIFFNumberOfStrips(TIFF* tif)
 	TIFFDirectory *td = &tif->tif_dir;
 	uint32 nstrips;
 
+    /* If the value was already computed and store in td_nstrips, then return it,
+       since ChopUpSingleUncompressedStrip might have altered and resized the
+       since the td_stripbytecount and td_stripoffset arrays to the new value
+       after the initial affectation of td_nstrips = TIFFNumberOfStrips() in
+       tif_dirread.c ~line 3612.
+       See http://bugzilla.maptools.org/show_bug.cgi?id=2587 */
+    if( td->td_nstrips )
+        return td->td_nstrips;
+
 	nstrips = (td->td_rowsperstrip == (uint32) -1 ? 1 :
 	     TIFFhowmany_32(td->td_imagelength, td->td_rowsperstrip));
 	if (td->td_planarconfig == PLANARCONFIG_SEPARATE)
diff --git a/src/3rdparty/libtiff/libtiff/tif_swab.c b/src/3rdparty/libtiff/libtiff/tif_swab.c
index f37e33f..211dc57 100644
--- a/src/3rdparty/libtiff/libtiff/tif_swab.c
+++ b/src/3rdparty/libtiff/libtiff/tif_swab.c
@@ -1,4 +1,4 @@
-/* $Id: tif_swab.c,v 1.13 2010-03-10 18:56:49 bfriesen Exp $ */
+/* $Id: tif_swab.c,v 1.14 2016-09-04 21:32:56 erouault Exp $ */
 
 /*
  * Copyright (c) 1988-1997 Sam Leffler
@@ -296,8 +296,10 @@ TIFFReverseBits(uint8* cp, tmsize_t n)
 		cp[7] = TIFFBitRevTable[cp[7]];
 		cp += 8;
 	}
-	while (n-- > 0)
-		*cp = TIFFBitRevTable[*cp], cp++;
+	while (n-- > 0) {
+		*cp = TIFFBitRevTable[*cp];
+		cp++;
+	}
 }
 
 /* vim: set ts=8 sts=8 sw=8 noet: */
diff --git a/src/3rdparty/libtiff/libtiff/tif_thunder.c b/src/3rdparty/libtiff/libtiff/tif_thunder.c
index 390891c..183199d 100644
--- a/src/3rdparty/libtiff/libtiff/tif_thunder.c
+++ b/src/3rdparty/libtiff/libtiff/tif_thunder.c
@@ -1,4 +1,4 @@
-/* $Id: tif_thunder.c,v 1.12 2011-04-02 20:54:09 bfriesen Exp $ */
+/* $Id: tif_thunder.c,v 1.13 2016-09-04 21:32:56 erouault Exp $ */
 
 /*
  * Copyright (c) 1988-1997 Sam Leffler
@@ -100,7 +100,8 @@ ThunderDecode(TIFF* tif, uint8* op, tmsize_t maxpixels)
 	while (cc > 0 && npixels < maxpixels) {
 		int n, delta;
 
-		n = *bp++, cc--;
+		n = *bp++;
+		cc--;
 		switch (n & THUNDER_CODE) {
 		case THUNDER_RUN:		/* pixel run */
 			/*
diff --git a/src/3rdparty/libtiff/libtiff/tif_write.c b/src/3rdparty/libtiff/libtiff/tif_write.c
index 7996c31..34c4d81 100644
--- a/src/3rdparty/libtiff/libtiff/tif_write.c
+++ b/src/3rdparty/libtiff/libtiff/tif_write.c
@@ -1,4 +1,4 @@
-/* $Id: tif_write.c,v 1.42 2015-06-07 23:00:23 bfriesen Exp $ */
+/* $Id: tif_write.c,v 1.45 2016-09-23 22:12:18 erouault Exp $ */
 
 /*
  * Copyright (c) 1988-1997 Sam Leffler
@@ -258,6 +258,23 @@ TIFFWriteEncodedStrip(TIFF* tif, uint32 strip, void* data, tmsize_t cc)
     tif->tif_rawcp = tif->tif_rawdata;
 
 	tif->tif_flags &= ~TIFF_POSTENCODE;
+
+    /* shortcut to avoid an extra memcpy() */
+    if( td->td_compression == COMPRESSION_NONE )
+    {
+        /* swab if needed - note that source buffer will be altered */
+        tif->tif_postdecode( tif, (uint8*) data, cc );
+
+        if (!isFillOrder(tif, td->td_fillorder) &&
+            (tif->tif_flags & TIFF_NOBITREV) == 0)
+            TIFFReverseBits((uint8*) data, cc);
+
+        if (cc > 0 &&
+            !TIFFAppendToStrip(tif, strip, (uint8*) data, cc))
+            return ((tmsize_t) -1);
+        return (cc);
+    }
+
 	sample = (uint16)(strip / td->td_stripsperimage);
 	if (!(*tif->tif_preencode)(tif, sample))
 		return ((tmsize_t) -1);
@@ -266,7 +283,7 @@ TIFFWriteEncodedStrip(TIFF* tif, uint32 strip, void* data, tmsize_t cc)
 	tif->tif_postdecode( tif, (uint8*) data, cc );
 
 	if (!(*tif->tif_encodestrip)(tif, (uint8*) data, cc, sample))
-		return (0);
+		return ((tmsize_t) -1);
 	if (!(*tif->tif_postencode)(tif))
 		return ((tmsize_t) -1);
 	if (!isFillOrder(tif, td->td_fillorder) &&
@@ -431,9 +448,7 @@ TIFFWriteEncodedTile(TIFF* tif, uint32 tile, void* data, tmsize_t cc)
 		tif->tif_flags |= TIFF_CODERSETUP;
 	}
 	tif->tif_flags &= ~TIFF_POSTENCODE;
-	sample = (uint16)(tile/td->td_stripsperimage);
-	if (!(*tif->tif_preencode)(tif, sample))
-		return ((tmsize_t)(-1));
+
 	/*
 	 * Clamp write amount to the tile size.  This is mostly
 	 * done so that callers can pass in some large number
@@ -442,11 +457,30 @@ TIFFWriteEncodedTile(TIFF* tif, uint32 tile, void* data, tmsize_t cc)
 	if ( cc < 1 || cc > tif->tif_tilesize)
 		cc = tif->tif_tilesize;
 
+    /* shortcut to avoid an extra memcpy() */
+    if( td->td_compression == COMPRESSION_NONE )
+    {
+        /* swab if needed - note that source buffer will be altered */
+        tif->tif_postdecode( tif, (uint8*) data, cc );
+
+        if (!isFillOrder(tif, td->td_fillorder) &&
+            (tif->tif_flags & TIFF_NOBITREV) == 0)
+            TIFFReverseBits((uint8*) data, cc);
+
+        if (cc > 0 &&
+            !TIFFAppendToStrip(tif, tile, (uint8*) data, cc))
+            return ((tmsize_t) -1);
+        return (cc);
+    }
+
+    sample = (uint16)(tile/td->td_stripsperimage);
+    if (!(*tif->tif_preencode)(tif, sample))
+        return ((tmsize_t)(-1));
         /* swab if needed - note that source buffer will be altered */
 	tif->tif_postdecode( tif, (uint8*) data, cc );
 
 	if (!(*tif->tif_encodetile)(tif, (uint8*) data, cc, sample))
-		return (0);
+		return ((tmsize_t) -1);
 	if (!(*tif->tif_postencode)(tif))
 		return ((tmsize_t)(-1));
 	if (!isFillOrder(tif, td->td_fillorder) &&
@@ -764,7 +798,14 @@ TIFFFlushData1(TIFF* tif)
 		if (!TIFFAppendToStrip(tif,
 		    isTiled(tif) ? tif->tif_curtile : tif->tif_curstrip,
 		    tif->tif_rawdata, tif->tif_rawcc))
+        {
+            /* We update those variables even in case of error since there's */
+            /* code that doesn't really check the return code of this */
+            /* function */
+            tif->tif_rawcc = 0;
+            tif->tif_rawcp = tif->tif_rawdata;
 			return (0);
+        }
 		tif->tif_rawcc = 0;
 		tif->tif_rawcp = tif->tif_rawdata;
 	}
diff --git a/src/3rdparty/libtiff/libtiff/tif_zip.c b/src/3rdparty/libtiff/libtiff/tif_zip.c
index 22e9f35..8c35aea 100644
--- a/src/3rdparty/libtiff/libtiff/tif_zip.c
+++ b/src/3rdparty/libtiff/libtiff/tif_zip.c
@@ -1,4 +1,4 @@
-/* $Id: tif_zip.c,v 1.33 2014-12-25 18:29:11 erouault Exp $ */
+/* $Id: tif_zip.c,v 1.36 2016-11-12 16:48:28 erouault Exp $ */
 
 /*
  * Copyright (c) 1995-1997 Sam Leffler
@@ -135,7 +135,7 @@ ZIPPreDecode(TIFF* tif, uint16 s)
 	assert(sizeof(sp->stream.avail_in)==4);  /* if this assert gets raised,
 	    we need to simplify this code to reflect a ZLib that is likely updated
 	    to deal with 8byte memory sizes, though this code will respond
-	    apropriately even before we simplify it */
+	    appropriately even before we simplify it */
 	sp->stream.avail_in = (uInt) tif->tif_rawcc;
 	if ((tmsize_t)sp->stream.avail_in != tif->tif_rawcc)
 	{
@@ -162,7 +162,7 @@ ZIPDecode(TIFF* tif, uint8* op, tmsize_t occ, uint16 s)
 	assert(sizeof(sp->stream.avail_out)==4);  /* if this assert gets raised,
 	    we need to simplify this code to reflect a ZLib that is likely updated
 	    to deal with 8byte memory sizes, though this code will respond
-	    apropriately even before we simplify it */
+	    appropriately even before we simplify it */
 	sp->stream.avail_out = (uInt) occ;
 	if ((tmsize_t)sp->stream.avail_out != occ)
 	{
@@ -239,8 +239,8 @@ ZIPPreEncode(TIFF* tif, uint16 s)
 	assert(sizeof(sp->stream.avail_out)==4);  /* if this assert gets raised,
 	    we need to simplify this code to reflect a ZLib that is likely updated
 	    to deal with 8byte memory sizes, though this code will respond
-	    apropriately even before we simplify it */
-	sp->stream.avail_out = tif->tif_rawdatasize;
+	    appropriately even before we simplify it */
+	sp->stream.avail_out = (uInt)tif->tif_rawdatasize;
 	if ((tmsize_t)sp->stream.avail_out != tif->tif_rawdatasize)
 	{
 		TIFFErrorExt(tif->tif_clientdata, module, "ZLib cannot deal with buffers this size");
@@ -266,7 +266,7 @@ ZIPEncode(TIFF* tif, uint8* bp, tmsize_t cc, uint16 s)
 	assert(sizeof(sp->stream.avail_in)==4);  /* if this assert gets raised,
 	    we need to simplify this code to reflect a ZLib that is likely updated
 	    to deal with 8byte memory sizes, though this code will respond
-	    apropriately even before we simplify it */
+	    appropriately even before we simplify it */
 	sp->stream.avail_in = (uInt) cc;
 	if ((tmsize_t)sp->stream.avail_in != cc)
 	{
@@ -460,7 +460,7 @@ bad:
 		     "No space for ZIP state block");
 	return (0);
 }
-#endif /* ZIP_SUPORT */
+#endif /* ZIP_SUPPORT */
 
 /* vim: set ts=8 sts=8 sw=8 noet: */
 /*
diff --git a/src/3rdparty/libtiff/libtiff/tiff.h b/src/3rdparty/libtiff/libtiff/tiff.h
index bc46acd..fb39634 100644
--- a/src/3rdparty/libtiff/libtiff/tiff.h
+++ b/src/3rdparty/libtiff/libtiff/tiff.h
@@ -1,4 +1,4 @@
-/* $Id: tiff.h,v 1.69 2014-04-02 17:23:06 fwarmerdam Exp $ */
+/* $Id: tiff.h,v 1.70 2016-01-23 21:20:34 erouault Exp $ */
 
 /*
  * Copyright (c) 1988-1997 Sam Leffler
@@ -279,7 +279,7 @@ typedef enum {
 #define     PREDICTOR_FLOATINGPOINT	3	/* floating point predictor */
 #define	TIFFTAG_WHITEPOINT		318	/* image white point */
 #define	TIFFTAG_PRIMARYCHROMATICITIES	319	/* !primary chromaticities */
-#define	TIFFTAG_COLORMAP		320	/* RGB map for pallette image */
+#define	TIFFTAG_COLORMAP		320	/* RGB map for palette image */
 #define	TIFFTAG_HALFTONEHINTS		321	/* !highlight+shadow info */
 #define	TIFFTAG_TILEWIDTH		322	/* !tile width in pixels */
 #define	TIFFTAG_TILELENGTH		323	/* !tile height in pixels */
@@ -358,7 +358,7 @@ typedef enum {
 #define	TIFFTAG_JPEGRESTARTINTERVAL	515	/* !restart interval length */
 #define	TIFFTAG_JPEGLOSSLESSPREDICTORS	517	/* !lossless proc predictor */
 #define	TIFFTAG_JPEGPOINTTRANSFORM	518	/* !lossless point transform */
-#define	TIFFTAG_JPEGQTABLES		519	/* !Q matrice offsets */
+#define	TIFFTAG_JPEGQTABLES		519	/* !Q matrix offsets */
 #define	TIFFTAG_JPEGDCTABLES		520	/* !DCT table offsets */
 #define	TIFFTAG_JPEGACTABLES		521	/* !AC coefficient offsets */
 #define	TIFFTAG_YCBCRCOEFFICIENTS	529	/* !RGB -> YCbCr transform */
diff --git a/src/3rdparty/libtiff/libtiff/tiffio.h b/src/3rdparty/libtiff/libtiff/tiffio.h
index 038b670..6a84d80 100644
--- a/src/3rdparty/libtiff/libtiff/tiffio.h
+++ b/src/3rdparty/libtiff/libtiff/tiffio.h
@@ -1,4 +1,4 @@
-/* $Id: tiffio.h,v 1.91 2012-07-29 15:45:29 tgl Exp $ */
+/* $Id: tiffio.h,v 1.92 2016-01-23 21:20:34 erouault Exp $ */
 
 /*
  * Copyright (c) 1988-1997 Sam Leffler
@@ -208,7 +208,7 @@ struct _TIFFRGBAImage {
 	uint16 orientation;                     /* image orientation */
 	uint16 req_orientation;                 /* requested orientation */
 	uint16 photometric;                     /* image photometric interp */
-	uint16* redcmap;                        /* colormap pallete */
+	uint16* redcmap;                        /* colormap palette */
 	uint16* greencmap;
 	uint16* bluecmap;
 	/* get image data routine */
@@ -225,7 +225,7 @@ struct _TIFFRGBAImage {
 	TIFFYCbCrToRGB* ycbcr;                  /* YCbCr conversion state */
 	TIFFCIELabToRGB* cielab;                /* CIE L*a*b conversion state */
 
-	uint8* UaToAa;                          /* Unassociated alpha to associated alpha convertion LUT */
+	uint8* UaToAa;                          /* Unassociated alpha to associated alpha conversion LUT */
 	uint8* Bitdepth16To8;                   /* LUT for conversion from 16bit to 8bit values */
 
 	int row_offset;
diff --git a/src/3rdparty/libtiff/libtiff/tiffiop.h b/src/3rdparty/libtiff/libtiff/tiffiop.h
index e5d862b..0b4e590 100644
--- a/src/3rdparty/libtiff/libtiff/tiffiop.h
+++ b/src/3rdparty/libtiff/libtiff/tiffiop.h
@@ -1,4 +1,4 @@
-/* $Id: tiffiop.h,v 1.87 2015-08-23 17:49:01 bfriesen Exp $ */
+/* $Id: tiffiop.h,v 1.89 2016-01-23 21:20:34 erouault Exp $ */
 
 /*
  * Copyright (c) 1988-1997 Sam Leffler
@@ -90,7 +90,7 @@ typedef struct client_info {
 
 /*
  * Typedefs for ``method pointers'' used internally.
- * these are depriciated and provided only for backwards compatibility
+ * these are deprecated and provided only for backwards compatibility.
  */
 typedef unsigned char tidataval_t;    /* internal image data value type */
 typedef tidataval_t* tidata_t;        /* reference to internal image data */
@@ -109,33 +109,33 @@ struct tiff {
 	int                  tif_fd;           /* open file descriptor */
 	int                  tif_mode;         /* open mode (O_*) */
 	uint32               tif_flags;
-	#define TIFF_FILLORDER   0x00003 /* natural bit fill order for machine */
-	#define TIFF_DIRTYHEADER 0x00004 /* header must be written on close */
-	#define TIFF_DIRTYDIRECT 0x00008 /* current directory must be written */
-	#define TIFF_BUFFERSETUP 0x00010 /* data buffers setup */
-	#define TIFF_CODERSETUP  0x00020 /* encoder/decoder setup done */
-	#define TIFF_BEENWRITING 0x00040 /* written 1+ scanlines to file */
-	#define TIFF_SWAB        0x00080 /* byte swap file information */
-	#define TIFF_NOBITREV    0x00100 /* inhibit bit reversal logic */
-	#define TIFF_MYBUFFER    0x00200 /* my raw data buffer; free on close */
-	#define TIFF_ISTILED     0x00400 /* file is tile, not strip- based */
-	#define TIFF_MAPPED      0x00800 /* file is mapped into memory */
-	#define TIFF_POSTENCODE  0x01000 /* need call to postencode routine */
-	#define TIFF_INSUBIFD    0x02000 /* currently writing a subifd */
-	#define TIFF_UPSAMPLED   0x04000 /* library is doing data up-sampling */
-	#define TIFF_STRIPCHOP   0x08000 /* enable strip chopping support */
-	#define TIFF_HEADERONLY  0x10000 /* read header only, do not process the first directory */
-	#define TIFF_NOREADRAW   0x20000 /* skip reading of raw uncompressed image data */
-	#define TIFF_INCUSTOMIFD 0x40000 /* currently writing a custom IFD */
-	#define TIFF_BIGTIFF     0x80000 /* read/write bigtiff */
-        #define TIFF_BUF4WRITE  0x100000 /* rawcc bytes are for writing */
-        #define TIFF_DIRTYSTRIP 0x200000 /* stripoffsets/stripbytecount dirty*/
-        #define TIFF_PERSAMPLE  0x400000 /* get/set per sample tags as arrays */
-        #define TIFF_BUFFERMMAP 0x800000 /* read buffer (tif_rawdata) points into mmap() memory */
+	#define TIFF_FILLORDER   0x00003U /* natural bit fill order for machine */
+	#define TIFF_DIRTYHEADER 0x00004U /* header must be written on close */
+	#define TIFF_DIRTYDIRECT 0x00008U /* current directory must be written */
+	#define TIFF_BUFFERSETUP 0x00010U /* data buffers setup */
+	#define TIFF_CODERSETUP  0x00020U /* encoder/decoder setup done */
+	#define TIFF_BEENWRITING 0x00040U /* written 1+ scanlines to file */
+	#define TIFF_SWAB        0x00080U /* byte swap file information */
+	#define TIFF_NOBITREV    0x00100U /* inhibit bit reversal logic */
+	#define TIFF_MYBUFFER    0x00200U /* my raw data buffer; free on close */
+	#define TIFF_ISTILED     0x00400U /* file is tile, not strip- based */
+	#define TIFF_MAPPED      0x00800U /* file is mapped into memory */
+	#define TIFF_POSTENCODE  0x01000U /* need call to postencode routine */
+	#define TIFF_INSUBIFD    0x02000U /* currently writing a subifd */
+	#define TIFF_UPSAMPLED   0x04000U /* library is doing data up-sampling */
+	#define TIFF_STRIPCHOP   0x08000U /* enable strip chopping support */
+	#define TIFF_HEADERONLY  0x10000U /* read header only, do not process the first directory */
+	#define TIFF_NOREADRAW   0x20000U /* skip reading of raw uncompressed image data */
+	#define TIFF_INCUSTOMIFD 0x40000U /* currently writing a custom IFD */
+	#define TIFF_BIGTIFF     0x80000U /* read/write bigtiff */
+        #define TIFF_BUF4WRITE  0x100000U /* rawcc bytes are for writing */
+        #define TIFF_DIRTYSTRIP 0x200000U /* stripoffsets/stripbytecount dirty*/
+        #define TIFF_PERSAMPLE  0x400000U /* get/set per sample tags as arrays */
+        #define TIFF_BUFFERMMAP 0x800000U /* read buffer (tif_rawdata) points into mmap() memory */
 	uint64               tif_diroff;       /* file offset of current directory */
 	uint64               tif_nextdiroff;   /* file offset of following directory */
 	uint64*              tif_dirlist;      /* list of offsets to already seen directories to prevent IFD looping */
-	uint16               tif_dirlistsize;  /* number of entires in offset list */
+	uint16               tif_dirlistsize;  /* number of entries in offset list */
 	uint16               tif_dirnumber;    /* number of already seen directories */
 	TIFFDirectory        tif_dir;          /* internal rep of current directory */
 	TIFFDirectory        tif_customdir;    /* custom IFDs are separated from the main ones */
diff --git a/src/3rdparty/libtiff/libtiff/tiffvers.h b/src/3rdparty/libtiff/libtiff/tiffvers.h
index e965814..fe55c72 100644
--- a/src/3rdparty/libtiff/libtiff/tiffvers.h
+++ b/src/3rdparty/libtiff/libtiff/tiffvers.h
@@ -1,4 +1,4 @@
-#define TIFFLIB_VERSION_STR "LIBTIFF, Version 4.0.6\nCopyright (c) 1988-1996 Sam Leffler\nCopyright (c) 1991-1996 Silicon Graphics, Inc."
+#define TIFFLIB_VERSION_STR "LIBTIFF, Version 4.0.7\nCopyright (c) 1988-1996 Sam Leffler\nCopyright (c) 1991-1996 Silicon Graphics, Inc."
 /*
  * This define can be used in code that requires
  * compilation-related definitions specific to a
@@ -6,4 +6,4 @@
  * version checking should be done based on the
  * string returned by TIFFGetVersion.
  */
-#define TIFFLIB_VERSION 20150912
+#define TIFFLIB_VERSION 20161119
diff --git a/src/3rdparty/libtiff/libtiff/uvcode.h b/src/3rdparty/libtiff/libtiff/uvcode.h
index 50f11d7..6286cfb 100644
--- a/src/3rdparty/libtiff/libtiff/uvcode.h
+++ b/src/3rdparty/libtiff/libtiff/uvcode.h
@@ -3,7 +3,7 @@
 #define UV_NDIVS	16289
 #define UV_VSTART	(float)0.016940
 #define UV_NVS		163
-static struct {
+static const struct {
 	float	ustart;
 	short	nus, ncum;
 }	uv_row[UV_NVS] = {
diff --git a/src/3rdparty/libtiff/qt_attribution.json b/src/3rdparty/libtiff/qt_attribution.json
index 8a00a8f..e330f93 100644
--- a/src/3rdparty/libtiff/qt_attribution.json
+++ b/src/3rdparty/libtiff/qt_attribution.json
@@ -6,7 +6,7 @@
 
     "Description": "",
     "Homepage": "http://www.simplesystems.org/libtiff/",
-    "Version": "4.0.6",
+    "Version": "4.0.7",
     "License": "libtiff License",
     "LicenseId": "libtiff",
     "LicenseFile": "COPYRIGHT",
diff --git a/src/3rdparty/libwebp.pri b/src/3rdparty/libwebp.pri
index 1fc719b..f21ba6d 100644
--- a/src/3rdparty/libwebp.pri
+++ b/src/3rdparty/libwebp.pri
@@ -36,6 +36,7 @@ SOURCES += \
     $$PWD/libwebp/src/dsp/cpu.c \
     $$PWD/libwebp/src/dsp/dec.c \
     $$PWD/libwebp/src/dsp/dec_mips_dsp_r2.c \
+    $$PWD/libwebp/src/dsp/dec_msa.c \
     $$PWD/libwebp/src/dsp/dec_sse2.c \
     $$PWD/libwebp/src/dsp/dec_sse41.c \
     $$PWD/libwebp/src/dsp/enc.c \
diff --git a/src/3rdparty/libwebp/AUTHORS b/src/3rdparty/libwebp/AUTHORS
index ea6e21f..0f382da 100644
--- a/src/3rdparty/libwebp/AUTHORS
+++ b/src/3rdparty/libwebp/AUTHORS
@@ -10,10 +10,13 @@ Contributors:
 - Lode Vandevenne (lode at google dot com)
 - Lou Quillio (louquillio at google dot com)
 - Mans Rullgard (mans at mansr dot com)
+- Marcin Kowalczyk (qrczak at google dot com)
 - Martin Olsson (mnemo at minimum dot se)
 - Mikołaj Zalewski (mikolajz at google dot com)
 - Mislav Bradac (mislavm at google dot com)
+- Nico Weber (thakis at chromium dot org)
 - Noel Chromium (noel at chromium dot org)
+- Parag Salasakar (img dot mips1 at gmail dot com)
 - Pascal Massimino (pascal dot massimino at gmail dot com)
 - Paweł Hajdan, Jr (phajdan dot jr at chromium dot org)
 - Pierre Joye (pierre dot php at gmail dot com)
diff --git a/src/3rdparty/libwebp/ChangeLog b/src/3rdparty/libwebp/ChangeLog
index 2f8def2..99fb3c0 100644
--- a/src/3rdparty/libwebp/ChangeLog
+++ b/src/3rdparty/libwebp/ChangeLog
@@ -1,3 +1,179 @@
+deb54d9 Clarify the expected 'config' lifespan in WebPIDecode()
+c7e2d24 update ChangeLog (tag: v0.5.1-rc5)
+c7eb06f Fix corner case in CostManagerInit.
+ab7937a gif2webp: normalize the number of .'s in the help message
+3cdec84 vwebp: normalize the number of .'s in the help message
+bdf6241 cwebp: normalize the number of .'s in the help message
+06a38c7 fix rescaling bug: alpha plane wasn't filled with 0xff
+319e37b Improve lossless compression.
+447adbc 'our bug tracker' -> 'the bug tracker'
+97b9e64 normalize the number of .'s in the help message
+bb50bf4 pngdec,ReadFunc: throw an error on invalid read
+38063af decode.h,WebPGetInfo: normalize function comment
+9e8e1b7 Inline GetResidual for speed.
+7d58d1b Speed-up uniform-region processing.
+23e29cb Merge "Fix a boundary case in BackwardReferencesHashChainDistanceOnly." into 0.5.1
+0bb23b2 free -> WebPSafeFree()
+e7b9177 Merge "DecodeImageData(): change the incorrect assert" into 0.5.1
+2abfa54 DecodeImageData(): change the incorrect assert
+5a48fcd Merge "configure: test for -Wfloat-conversion"
+0174d18 Fix a boundary case in BackwardReferencesHashChainDistanceOnly.
+6a9c262 Merge "Added MSA optimized transform functions"
+cfbcc5e Make sure to consider small distances in LZ77.
+5e60c42 Added MSA optimized transform functions
+3dc28d7 configure: test for -Wfloat-conversion
+f2a0946 add some asserts to delimit the perimeter of CostManager's operation
+9a583c6 fix invalid-write bug for alpha-decoding
+f66512d make gradlew executable
+6fda58f backward_references: quiet double->int warning
+a48cc9d Merge "Fix a compression regression for images with long uniform regions." into 0.5.1
+cc2720c Merge "Revert an LZ77 boundary constant." into 0.5.1
+059aab4 Fix a compression regression for images with long uniform regions.
+b0c7e49 Check more backward matches with higher quality.
+a361151 Revert an LZ77 boundary constant.
+8190374 README: fix typo
+7551db4 update NEWS
+0fb2269 bump version to 0.5.1
+f453761 update AUTHORS & .mailmap
+3259571 Refactor GetColorPalette method.
+1df5e26 avoid using tmp histogram in PreparePair()
+7685123 fix comment typos
+a246b92 Speedup backward references.
+76d73f1 Merge "CostManager: introduce a free-list of ~10 intervals"
+eab39d8 CostManager: introduce a free-list of ~10 intervals
+4c59aac Merge "mips msa webp configuration"
+043c33f Merge "Improve speed and compression in backward reference for lossless."
+71be9b8 Merge "clarify variable names in HistogramRemap()"
+0ba7fd7 Improve speed and compression in backward reference for lossless.
+0481d42 CostManager: cache one interval and re-use it when possible
+41b7e6b Merge "histogram: fix bin calculation"
+96c3d62 histogram: fix bin calculation
+fe9e31e clarify variable names in HistogramRemap()
+ce3c824 disable near-lossless quantization if palette is used
+e11da08 mips msa webp configuration
+5f8f998 mux: Presence of unknown chunks should trigger VP8X chunk output.
+cadec0b Merge "Sync mips32 and dsp_r2 YUV->RGB code with C verison"
+d963775 Compute the hash chain once and for all for lossless compression.
+50a4866 Sync mips32 and dsp_r2 YUV->RGB code with C verison
+eee788e Merge "introduce a common signature for all image reader function"
+d77b877 introduce a common signature for all image reader function
+ca8d951 remove some obsolete TODOs
+ae2a722 collect all decoding utilities from examples/ in libexampledec.a
+0b8ae85 Merge "Move DitherCombine8x8 to dsp/dec.c"
+77cad88 Merge "ReadWebP: avoid conversion to ARGB if final format is YUVA"
+ab8d669 ReadWebP: avoid conversion to ARGB if final format is YUVA
+f8b7ce9 Merge "test pointer to NULL explicitly"
+5df6f21 test pointer to NULL explicitly
+77f21c9 Move DitherCombine8x8 to dsp/dec.c
+c9e6d86 Add gradle support
+c65f41e Revert "Add gradle support"
+bf731ed Add gradle support
+08333b8 WebPAnimEncoder: Detect when canvas is modified, restore only when needed.
+0209d7e Merge "speed-up MapToPalette() with binary search"
+fdd29a3 speed-up MapToPalette() with binary search
+cf4a651 Revert "Refactor GetColorPalette method."
+0a27aca Merge changes Idfa8ce83,I19adc9c4
+f25c440 WebPAnimEncoder: Restore original canvas between multiple encodes.
+169004b Refactor GetColorPalette method.
+576362a VP8LDoFillBitWindow: support big-endian in fast path
+ac49e4e bit_reader.c: s/VP8L_USE_UNALIGNED_LOAD/VP8L_USE_FAST_LOAD/
+d39ceb5 VP8LDoFillBitWindow: remove stale TODO
+2ec2de1 Merge "Speed-up BackwardReferencesHashChainDistanceOnly."
+3e023c1 Speed-up BackwardReferencesHashChainDistanceOnly.
+f2e1efb Improve near lossless compression when a prediction filter is used.
+e15afbc dsp.h: fix ubsan macro name
+e53c9cc dsp.h: add WEBP_UBSAN_IGNORE_UNSIGNED_OVERFLOW
+af81fdb utils.h: quiet -fsanitize=undefined warnings
+ea0be35 dsp.h: remove utils.h include
+cd276ae utils/*.c: ../utils/utils.h -> ./utils.h
+c892713 utils/Makefile.am: add some missing headers
+ea24e02 Merge "dsp.h: add WEBP_UBSAN_IGNORE_UNDEF"
+369e264 dsp.h: add WEBP_UBSAN_IGNORE_UNDEF
+0d020a7 Merge "add runtime NEON detection"
+5ee2136 Merge "add VP8LAddPixels() to lossless.h"
+47435a6 add VP8LAddPixels() to lossless.h
+8fa6ac6 remove two ubsan warnings
+74fb56f add runtime NEON detection
+4154a83 MIPS update to new Unfilter API
+c80b9fc Merge "cherry-pick decoder fix for 64-bit android devices"
+6235147 cherry-pick decoder fix for 64-bit android devices
+d41b8c4 configure: test for -Wformat-* w/-Wformat present
+5f95589 Fix WEBP_ALIGN in case the argument is a pointer to a type larger than a byte.
+2309fd5 replace num_parts_ by num_parts_minus_one_ (unsigned)
+9629f4b SimplifySegments: quiet -Warray-bounds warning
+de47492 Merge "update the Unfilter API in dsp to process one row independently"
+2102ccd update the Unfilter API in dsp to process one row independently
+e3912d5 WebPAnimEncoder: Restore canvas before evaluating blending possibility.
+6e12e1e WebPAnimEncoder: Fix for single-frame optimization.
+602f344 Merge changes I1d03acac,Ifcb64219
+95ecccf only apply color-mapping for alpha on the cropped area
+47dd070 anim_diff: Add an experimental option for max inter-frame diff.
+aa809cf only allocate alpha_plane_ up to crop_bottom row
+31f2b8d WebPAnimEncoder: FlattenSimilarPixels(): look for similar
+774dfbd perform alpha filtering within the decoding loop
+a4cae68 lossless decoding: only process decoded row up to last_row
+238cdcd Only call WebPDequantizeLevels() on cropped area
+cf6c713 alpha: preparatory cleanup
+b95ac0a Merge "VP8GetHeaders(): initialize VP8Io with sane value for crop/scale dimensions"
+8923139 VP8GetHeaders(): initialize VP8Io with sane value for crop/scale dimensions
+5828e19 use_8b_decode -> use_8b_decode_
+8dca024 fix bug in alpha.c that was triggering a memory error in incremental mode
+9a950c5 WebPAnimEncoder: Disable filtering when blending is used with lossy encoding.
+eb42390 WebPAnimEncoder: choose max diff for framerect based on quality.
+ff0a94b WebPAnimEncoder lossy: ignore small pixel differences for frame rectangles.
+f804008 gif2webp: Remove the 'prev_to_prev_canvas' buffer.
+6d8c07d Merge "WebPDequantizeLevels(): use stride in CountLevels()"
+d96fe5e WebPDequantizeLevels(): use stride in CountLevels()
+ec1b240 WebPPictureImport*: check output pointer
+c076876 Merge "Revert "Re-enable encoding of alpha plane with color cache for next release.""
+41f14bc WebPPictureImport*: check src pointer
+64eed38 Pass stride parameter to WebPDequantizeLevels()
+97934e2 Revert "Re-enable encoding of alpha plane with color cache for next release."
+e88c4ca fix -m 2 mode-cost evaluation (causing partition0 overflow)
+4562e83 Merge "add extra meaning to WebPDecBuffer::is_external_memory"
+abdb109 add extra meaning to WebPDecBuffer::is_external_memory
+875aec7 enc_neon,cosmetics: break long comment
+71e856c GetMBSSIM,cosmetics: fix alignment
+a90edff fix missing 'extern' for SSIM function in dsp/
+423ecaf move some SSIM-accumulation function for dsp/
+f08e662 Merge "Fix FindClosestDiscretized in near lossless:"
+0d40cc5 enc_neon,Disto4x4: remove an unnecessary transpose
+e8feb20 Fix FindClosestDiscretized in near lossless:
+8200643 anim_util: quiet static analysis warning
+a6f23c4 Merge "AnimEncoder: Support progress hook and user data."
+a519377 Merge "Near lossless feature: fix some comments."
+da98d31 AnimEncoder: Support progress hook and user data.
+3335713 Near lossless feature: fix some comments.
+0beed01 cosmetics: fix indent after 2f5e898
+6753f35 Merge "FTransformWHT optimization."
+6583bb1 Improve SSE4.1 implementation of TTransform.
+7561d0c FTransformWHT optimization.
+7ccdb73 fix indentation after patch #328220
+6ec0d2a clarify the logic of the error path when decoding fails.
+8aa352b Merge "Remove an unnecessary transposition in TTransform."
+db86088 Merge "remove useless #include"
+9960c31 Remove an unnecessary transposition in TTransform.
+6e36b51 Small speedup in FTransform.
+9dbd4aa Merge "fix C and SIMD flags completion."
+e60853e Add missing common_sse2.h file to makefile.unix
+696eb2b fix C and SIMD flags completion.
+2b4fe33 Merge "fix multiple allocation for transform buffer"
+2f5e898 fix multiple allocation for transform buffer
+bf2b4f1 Regroup common SSE code + optimization.
+4ed650a force "-pass 6" if -psnr or -size is used but -pass isn't.
+3ef1ce9 yuv_sse2: fix -Wconstant-conversion warning
+a7a03e9 Merge changes I4852d18f,I51ccb85d
+5e122bd gif2webp: set enc_options.verbose = 0 w/-quiet
+ab3c258 anim_encode,DefaultEncoderOptions: init verbose
+8f0dee7 Merge "configure: fix builtin detection w/-Werror"
+4a7b85a cmake: fix builtin detection w/-Werror
+b74657f configure: fix builtin detection w/-Werror
+3661b98 Add a CMakeLists.txt
+75f4af4 remove useless #include
+6c1d763 avoid Yoda style for comparison
+8ce975a SSE optimization for vector mismatch.
+7db5383 Merge tag 'v0.5.0'
+37f0494 update ChangeLog (tag: v0.5.0-rc1, tag: v0.5.0, origin/0.5.0, 0.5.0)
 7e7b6cc faster rgb565/rgb4444/argb output
 4c7f565 update NEWS
 1f62b6b update AUTHORS
@@ -140,7 +316,7 @@ f7c507a Merge "remove unnecessary #include "yuv.h""
 7861578 for ReadXXXX() image-readers, use the value of pic->use_argb
 14e4043 remove unnecessary #include "yuv.h"
 469ba2c vwebp: fix incorrect clipping w/NO_BLEND
-4b9186b update issue tracker url (master)
+4b9186b update issue tracker url
 d64d376 change WEBP_ALIGN_CST value to 31
 f717b82 vp8l.c, cosmetics: fix indent after 95509f9
 927ccdc Merge "fix alignment of allocated memory in AllocateTransformBuffer"
diff --git a/src/3rdparty/libwebp/NEWS b/src/3rdparty/libwebp/NEWS
index a72f179..30554bf 100644
--- a/src/3rdparty/libwebp/NEWS
+++ b/src/3rdparty/libwebp/NEWS
@@ -1,3 +1,17 @@
+- 6/14/2016: version 0.5.1
+  This is a binary compatible release.
+  * miscellaneous bug fixes (issues #280, #289)
+  * reverted alpha plane encoding with color cache for compatibility with
+    libwebp 0.4.0->0.4.3 (issues #291, #298)
+  * lossless encoding performance improvements
+  * memory reduction in both lossless encoding and decoding
+  * force mux output to be in the extended format (VP8X) when undefined chunks
+    are present (issue #294)
+  * gradle, cmake build support
+  * workaround for compiler bug causing 64-bit decode failures on android
+    devices using clang-3.8 in the r11c NDK
+  * various WebPAnimEncoder improvements
+
 - 12/17/2015: version 0.5.0
   * miscellaneous bug & build fixes (issues #234, #258, #274, #275, #278)
   * encoder & decoder speed-ups on x86/ARM/MIPS for lossy & lossless
diff --git a/src/3rdparty/libwebp/README b/src/3rdparty/libwebp/README
index 381b927..90f8f10 100644
--- a/src/3rdparty/libwebp/README
+++ b/src/3rdparty/libwebp/README
@@ -4,7 +4,7 @@
           \__\__/\____/\_____/__/ ____  ___
                 / _/ /    \    \ /  _ \/ _/
                /  \_/   / /   \ \   __/  \__
-               \____/____/\_____/_____/____/v0.5.0
+               \____/____/\_____/_____/____/v0.5.1
 
 Description:
 ============
@@ -20,7 +20,7 @@ https://chromium.googlesource.com/webm/libwebp
 
 It is released under the same license as the WebM project.
 See http://www.webmproject.org/license/software/ or the
-file "COPYING" file for details. An additional intellectual
+"COPYING" file for details. An additional intellectual
 property rights grant can be found in the file PATENTS.
 
 Building:
@@ -84,6 +84,73 @@ be installed independently using a minor modification in the corresponding
 Makefile.am configure files (see comments there). See './configure --help' for
 more options.
 
+Building for MIPS Linux:
+------------------------
+MIPS Linux toolchain stable available releases can be found at:
+https://community.imgtec.com/developers/mips/tools/codescape-mips-sdk/available-releases/
+
+# Add toolchain to PATH
+export PATH=$PATH:/path/to/toolchain/bin
+
+# 32-bit build for mips32r5 (p5600)
+HOST=mips-mti-linux-gnu
+MIPS_CFLAGS="-O3 -mips32r5 -mabi=32 -mtune=p5600 -mmsa -mfp64 \
+  -msched-weight -mload-store-pairs -fPIE"
+MIPS_LDFLAGS="-mips32r5 -mabi=32 -mmsa -mfp64 -pie"
+
+# 64-bit build for mips64r6 (i6400)
+HOST=mips-img-linux-gnu
+MIPS_CFLAGS="-O3 -mips64r6 -mabi=64 -mtune=i6400 -mmsa -mfp64 \
+  -msched-weight -mload-store-pairs -fPIE"
+MIPS_LDFLAGS="-mips64r6 -mabi=64 -mmsa -mfp64 -pie"
+
+./configure --host=${HOST} --build=`config.guess` \
+  CC="${HOST}-gcc -EL" \
+  CFLAGS="$MIPS_CFLAGS" \
+  LDFLAGS="$MIPS_LDFLAGS"
+make
+make install
+
+CMake:
+------
+The support for CMake is minimal: it only helps you compile libwebp, cwebp and
+dwebp.
+
+Prerequisites:
+A compiler (e.g., gcc with autotools) and CMake.
+On a Debian-like system the following should install everything you need for a
+minimal build:
+$ sudo apt-get install build-essential cmake
+
+When building from git sources, you will need to run cmake to generate the
+configure script.
+
+mkdir build && cd build && cmake ../
+make
+make install
+
+If you also want cwebp or dwebp, you will need to enable them through CMake:
+
+cmake -DWEBP_BUILD_CWEBP=ON -DWEBP_BUILD_DWEBP=ON ../
+
+or through your favorite interface (like ccmake or cmake-qt-gui).
+
+Gradle:
+-------
+The support for Gradle is minimal: it only helps you compile libwebp, cwebp and
+dwebp and webpmux_example.
+
+Prerequisites:
+A compiler (e.g., gcc with autotools) and gradle.
+On a Debian-like system the following should install everything you need for a
+minimal build:
+$ sudo apt-get install build-essential gradle
+
+When building from git sources, you will need to run the Gradle wrapper with the
+appropriate target, e.g. :
+
+./gradlew buildAllExecutables
+
 SWIG bindings:
 --------------
 
@@ -151,8 +218,8 @@ If input size (-s) for an image is not specified, it is
 assumed to be a PNG, JPEG, TIFF or WebP file.
 
 Options:
-  -h / -help  ............ short help
-  -H / -longhelp  ........ long help
+  -h / -help ............. short help
+  -H / -longhelp ......... long help
   -q <float> ............. quality factor (0:small..100:big)
   -alpha_q <int> ......... transparency-compression quality (0..100)
   -preset <string> ....... preset setting, one of:
@@ -274,7 +341,7 @@ Use following options to convert into alternate image formats:
   -yuv ......... save the raw YUV samples in flat layout
 
  Other options are:
-  -version  .... print version number and exit
+  -version ..... print version number and exit
   -nofancy ..... don't use the fancy YUV420 upscaler
   -nofilter .... disable in-loop filtering
   -nodither .... disable dithering
@@ -286,8 +353,8 @@ Use following options to convert into alternate image formats:
   -flip ........ flip the output vertically
   -alpha ....... only save the alpha plane
   -incremental . use incremental decoding (useful for tests)
-  -h     ....... this help message
-  -v     ....... verbose (e.g. print encoding/decoding times)
+  -h ........... this help message
+  -v ........... verbose (e.g. print encoding/decoding times)
   -quiet ....... quiet mode, don't print anything
   -noasm ....... disable all assembly optimizations
 
@@ -303,7 +370,7 @@ Usage: vwebp in_file [options]
 
 Decodes the WebP image file and visualize it using OpenGL
 Options are:
-  -version  .... print version number and exit
+  -version ..... print version number and exit
   -noicc ....... don't use the icc profile if present
   -nofancy ..... don't use the fancy YUV420 upscaler
   -nofilter .... disable in-loop filtering
@@ -311,7 +378,7 @@ Options are:
   -noalphadither disable alpha plane dithering
   -mt .......... use multi-threading
   -info ........ print info
-  -h     ....... this help message
+  -h ........... this help message
 
 Keyboard shortcuts:
   'c' ................ toggle use of color profile
@@ -353,7 +420,7 @@ vwebp.
 Usage:
  gif2webp [options] gif_file -o webp_file
 Options:
-  -h / -help  ............ this help
+  -h / -help ............. this help
   -lossy ................. encode image using lossy compression
   -mixed ................. for each frame in the image, pick lossy
                            or lossless compression heuristically
@@ -637,7 +704,7 @@ an otherwise too-large picture. Some CPU can be saved too, incidentally.
 Bugs:
 =====
 
-Please report all bugs to our issue tracker:
+Please report all bugs to the issue tracker:
     https://bugs.chromium.org/p/webp
 Patches welcome! See this page to get started:
     http://www.webmproject.org/code/contribute/submitting-patches/
diff --git a/src/3rdparty/libwebp/qt_attribution.json b/src/3rdparty/libwebp/qt_attribution.json
index 6084e7a..825cfea 100644
--- a/src/3rdparty/libwebp/qt_attribution.json
+++ b/src/3rdparty/libwebp/qt_attribution.json
@@ -6,7 +6,7 @@
 
     "Description": "WebP is a new image format that provides lossless and lossy compression for images on the web.",
     "Homepage": "https://developers.google.com/speed/webp/",
-    "Version": "0.5.0",
+    "Version": "0.5.1",
     "License": "BSD 3-clause \"New\" or \"Revised\" License",
     "LicenseId": "BSD-3-Clause",
     "LicenseFile": "COPYING",
diff --git a/src/3rdparty/libwebp/src/dec/alpha.c b/src/3rdparty/libwebp/src/dec/alpha.c
index 52216fc..028eb3d 100644
--- a/src/3rdparty/libwebp/src/dec/alpha.c
+++ b/src/3rdparty/libwebp/src/dec/alpha.c
@@ -23,12 +23,14 @@
 //------------------------------------------------------------------------------
 // ALPHDecoder object.
 
-ALPHDecoder* ALPHNew(void) {
+// Allocates a new alpha decoder instance.
+static ALPHDecoder* ALPHNew(void) {
   ALPHDecoder* const dec = (ALPHDecoder*)WebPSafeCalloc(1ULL, sizeof(*dec));
   return dec;
 }
 
-void ALPHDelete(ALPHDecoder* const dec) {
+// Clears and deallocates an alpha decoder instance.
+static void ALPHDelete(ALPHDecoder* const dec) {
   if (dec != NULL) {
     VP8LDelete(dec->vp8l_dec_);
     dec->vp8l_dec_ = NULL;
@@ -44,17 +46,21 @@ void ALPHDelete(ALPHDecoder* const dec) {
 // Returns false in case of error in alpha header (data too short, invalid
 // compression method or filter, error in lossless header data etc).
 static int ALPHInit(ALPHDecoder* const dec, const uint8_t* data,
-                    size_t data_size, int width, int height, uint8_t* output) {
+                    size_t data_size, const VP8Io* const src_io,
+                    uint8_t* output) {
   int ok = 0;
   const uint8_t* const alpha_data = data + ALPHA_HEADER_LEN;
   const size_t alpha_data_size = data_size - ALPHA_HEADER_LEN;
   int rsrv;
+  VP8Io* const io = &dec->io_;
 
-  assert(width > 0 && height > 0);
-  assert(data != NULL && output != NULL);
+  assert(data != NULL && output != NULL && src_io != NULL);
 
-  dec->width_ = width;
-  dec->height_ = height;
+  VP8FiltersInit();
+  dec->output_ = output;
+  dec->width_ = src_io->width;
+  dec->height_ = src_io->height;
+  assert(dec->width_ > 0 && dec->height_ > 0);
 
   if (data_size <= ALPHA_HEADER_LEN) {
     return 0;
@@ -72,14 +78,28 @@ static int ALPHInit(ALPHDecoder* const dec, const uint8_t* data,
     return 0;
   }
 
+  // Copy the necessary parameters from src_io to io
+  VP8InitIo(io);
+  WebPInitCustomIo(NULL, io);
+  io->opaque = dec;
+  io->width = src_io->width;
+  io->height = src_io->height;
+
+  io->use_cropping = src_io->use_cropping;
+  io->crop_left = src_io->crop_left;
+  io->crop_right = src_io->crop_right;
+  io->crop_top = src_io->crop_top;
+  io->crop_bottom = src_io->crop_bottom;
+  // No need to copy the scaling parameters.
+
   if (dec->method_ == ALPHA_NO_COMPRESSION) {
     const size_t alpha_decoded_size = dec->width_ * dec->height_;
     ok = (alpha_data_size >= alpha_decoded_size);
   } else {
     assert(dec->method_ == ALPHA_LOSSLESS_COMPRESSION);
-    ok = VP8LDecodeAlphaHeader(dec, alpha_data, alpha_data_size, output);
+    ok = VP8LDecodeAlphaHeader(dec, alpha_data, alpha_data_size);
   }
-  VP8FiltersInit();
+
   return ok;
 }
 
@@ -90,15 +110,30 @@ static int ALPHInit(ALPHDecoder* const dec, const uint8_t* data,
 static int ALPHDecode(VP8Decoder* const dec, int row, int num_rows) {
   ALPHDecoder* const alph_dec = dec->alph_dec_;
   const int width = alph_dec->width_;
-  const int height = alph_dec->height_;
-  WebPUnfilterFunc unfilter_func = WebPUnfilters[alph_dec->filter_];
-  uint8_t* const output = dec->alpha_plane_;
+  const int height = alph_dec->io_.crop_bottom;
   if (alph_dec->method_ == ALPHA_NO_COMPRESSION) {
-    const size_t offset = row * width;
-    const size_t num_pixels = num_rows * width;
-    assert(dec->alpha_data_size_ >= ALPHA_HEADER_LEN + offset + num_pixels);
-    memcpy(dec->alpha_plane_ + offset,
-           dec->alpha_data_ + ALPHA_HEADER_LEN + offset, num_pixels);
+    int y;
+    const uint8_t* prev_line = dec->alpha_prev_line_;
+    const uint8_t* deltas = dec->alpha_data_ + ALPHA_HEADER_LEN + row * width;
+    uint8_t* dst = dec->alpha_plane_ + row * width;
+    assert(deltas <= &dec->alpha_data_[dec->alpha_data_size_]);
+    if (alph_dec->filter_ != WEBP_FILTER_NONE) {
+      assert(WebPUnfilters[alph_dec->filter_] != NULL);
+      for (y = 0; y < num_rows; ++y) {
+        WebPUnfilters[alph_dec->filter_](prev_line, deltas, dst, width);
+        prev_line = dst;
+        dst += width;
+        deltas += width;
+      }
+    } else {
+      for (y = 0; y < num_rows; ++y) {
+        memcpy(dst, deltas, width * sizeof(*dst));
+        prev_line = dst;
+        dst += width;
+        deltas += width;
+      }
+    }
+    dec->alpha_prev_line_ = prev_line;
   } else {  // alph_dec->method_ == ALPHA_LOSSLESS_COMPRESSION
     assert(alph_dec->vp8l_dec_ != NULL);
     if (!VP8LDecodeAlphaImageStream(alph_dec, row + num_rows)) {
@@ -106,62 +141,92 @@ static int ALPHDecode(VP8Decoder* const dec, int row, int num_rows) {
     }
   }
 
-  if (unfilter_func != NULL) {
-    unfilter_func(width, height, width, row, num_rows, output);
+  if (row + num_rows >= height) {
+    dec->is_alpha_decoded_ = 1;
   }
+  return 1;
+}
 
-  if (row + num_rows == dec->pic_hdr_.height_) {
-    dec->is_alpha_decoded_ = 1;
+static int AllocateAlphaPlane(VP8Decoder* const dec, const VP8Io* const io) {
+  const int stride = io->width;
+  const int height = io->crop_bottom;
+  const uint64_t alpha_size = (uint64_t)stride * height;
+  assert(dec->alpha_plane_mem_ == NULL);
+  dec->alpha_plane_mem_ =
+      (uint8_t*)WebPSafeMalloc(alpha_size, sizeof(*dec->alpha_plane_));
+  if (dec->alpha_plane_mem_ == NULL) {
+    return 0;
   }
+  dec->alpha_plane_ = dec->alpha_plane_mem_;
+  dec->alpha_prev_line_ = NULL;
   return 1;
 }
 
+void WebPDeallocateAlphaMemory(VP8Decoder* const dec) {
+  assert(dec != NULL);
+  WebPSafeFree(dec->alpha_plane_mem_);
+  dec->alpha_plane_mem_ = NULL;
+  dec->alpha_plane_ = NULL;
+  ALPHDelete(dec->alph_dec_);
+  dec->alph_dec_ = NULL;
+}
+
 //------------------------------------------------------------------------------
 // Main entry point.
 
 const uint8_t* VP8DecompressAlphaRows(VP8Decoder* const dec,
+                                      const VP8Io* const io,
                                       int row, int num_rows) {
-  const int width = dec->pic_hdr_.width_;
-  const int height = dec->pic_hdr_.height_;
+  const int width = io->width;
+  const int height = io->crop_bottom;
+
+  assert(dec != NULL && io != NULL);
 
   if (row < 0 || num_rows <= 0 || row + num_rows > height) {
     return NULL;    // sanity check.
   }
 
-  if (row == 0) {
-    // Initialize decoding.
-    assert(dec->alpha_plane_ != NULL);
-    dec->alph_dec_ = ALPHNew();
-    if (dec->alph_dec_ == NULL) return NULL;
-    if (!ALPHInit(dec->alph_dec_, dec->alpha_data_, dec->alpha_data_size_,
-                  width, height, dec->alpha_plane_)) {
-      ALPHDelete(dec->alph_dec_);
-      dec->alph_dec_ = NULL;
-      return NULL;
-    }
-    // if we allowed use of alpha dithering, check whether it's needed at all
-    if (dec->alph_dec_->pre_processing_ != ALPHA_PREPROCESSED_LEVELS) {
-      dec->alpha_dithering_ = 0;  // disable dithering
-    } else {
-      num_rows = height;          // decode everything in one pass
+  if (!dec->is_alpha_decoded_) {
+    if (dec->alph_dec_ == NULL) {    // Initialize decoder.
+      dec->alph_dec_ = ALPHNew();
+      if (dec->alph_dec_ == NULL) return NULL;
+      if (!AllocateAlphaPlane(dec, io)) goto Error;
+      if (!ALPHInit(dec->alph_dec_, dec->alpha_data_, dec->alpha_data_size_,
+                    io, dec->alpha_plane_)) {
+        goto Error;
+      }
+      // if we allowed use of alpha dithering, check whether it's needed at all
+      if (dec->alph_dec_->pre_processing_ != ALPHA_PREPROCESSED_LEVELS) {
+        dec->alpha_dithering_ = 0;   // disable dithering
+      } else {
+        num_rows = height - row;     // decode everything in one pass
+      }
     }
-  }
 
-  if (!dec->is_alpha_decoded_) {
-    int ok = 0;
     assert(dec->alph_dec_ != NULL);
-    ok = ALPHDecode(dec, row, num_rows);
-    if (ok && dec->alpha_dithering_ > 0) {
-      ok = WebPDequantizeLevels(dec->alpha_plane_, width, height,
-                                dec->alpha_dithering_);
-    }
-    if (!ok || dec->is_alpha_decoded_) {
+    assert(row + num_rows <= height);
+    if (!ALPHDecode(dec, row, num_rows)) goto Error;
+
+    if (dec->is_alpha_decoded_) {   // finished?
       ALPHDelete(dec->alph_dec_);
       dec->alph_dec_ = NULL;
+      if (dec->alpha_dithering_ > 0) {
+        uint8_t* const alpha = dec->alpha_plane_ + io->crop_top * width
+                             + io->crop_left;
+        if (!WebPDequantizeLevels(alpha,
+                                  io->crop_right - io->crop_left,
+                                  io->crop_bottom - io->crop_top,
+                                  width, dec->alpha_dithering_)) {
+          goto Error;
+        }
+      }
     }
-    if (!ok) return NULL;  // Error.
   }
 
   // Return a pointer to the current decoded row.
   return dec->alpha_plane_ + row * width;
+
+ Error:
+  WebPDeallocateAlphaMemory(dec);
+  return NULL;
 }
diff --git a/src/3rdparty/libwebp/src/dec/alphai.h b/src/3rdparty/libwebp/src/dec/alphai.h
index 5fa230c..69dd7c0 100644
--- a/src/3rdparty/libwebp/src/dec/alphai.h
+++ b/src/3rdparty/libwebp/src/dec/alphai.h
@@ -32,19 +32,18 @@ struct ALPHDecoder {
   int pre_processing_;
   struct VP8LDecoder* vp8l_dec_;
   VP8Io io_;
-  int use_8b_decode;  // Although alpha channel requires only 1 byte per
-                      // pixel, sometimes VP8LDecoder may need to allocate
-                      // 4 bytes per pixel internally during decode.
+  int use_8b_decode_;  // Although alpha channel requires only 1 byte per
+                       // pixel, sometimes VP8LDecoder may need to allocate
+                       // 4 bytes per pixel internally during decode.
+  uint8_t* output_;
+  const uint8_t* prev_line_;   // last output row (or NULL)
 };
 
 //------------------------------------------------------------------------------
 // internal functions. Not public.
 
-// Allocates a new alpha decoder instance.
-ALPHDecoder* ALPHNew(void);
-
-// Clears and deallocates an alpha decoder instance.
-void ALPHDelete(ALPHDecoder* const dec);
+// Deallocate memory associated to dec->alpha_plane_ decoding
+void WebPDeallocateAlphaMemory(VP8Decoder* const dec);
 
 //------------------------------------------------------------------------------
 
diff --git a/src/3rdparty/libwebp/src/dec/buffer.c b/src/3rdparty/libwebp/src/dec/buffer.c
index 9ed2b3f..547e69b 100644
--- a/src/3rdparty/libwebp/src/dec/buffer.c
+++ b/src/3rdparty/libwebp/src/dec/buffer.c
@@ -92,7 +92,7 @@ static VP8StatusCode AllocateBuffer(WebPDecBuffer* const buffer) {
     return VP8_STATUS_INVALID_PARAM;
   }
 
-  if (!buffer->is_external_memory && buffer->private_memory == NULL) {
+  if (buffer->is_external_memory <= 0 && buffer->private_memory == NULL) {
     uint8_t* output;
     int uv_stride = 0, a_stride = 0;
     uint64_t uv_size = 0, a_size = 0, total_size;
@@ -227,7 +227,7 @@ int WebPInitDecBufferInternal(WebPDecBuffer* buffer, int version) {
 
 void WebPFreeDecBuffer(WebPDecBuffer* buffer) {
   if (buffer != NULL) {
-    if (!buffer->is_external_memory) {
+    if (buffer->is_external_memory <= 0) {
       WebPSafeFree(buffer->private_memory);
     }
     buffer->private_memory = NULL;
@@ -256,5 +256,45 @@ void WebPGrabDecBuffer(WebPDecBuffer* const src, WebPDecBuffer* const dst) {
   }
 }
 
-//------------------------------------------------------------------------------
+VP8StatusCode WebPCopyDecBufferPixels(const WebPDecBuffer* const src_buf,
+                                      WebPDecBuffer* const dst_buf) {
+  assert(src_buf != NULL && dst_buf != NULL);
+  assert(src_buf->colorspace == dst_buf->colorspace);
+
+  dst_buf->width = src_buf->width;
+  dst_buf->height = src_buf->height;
+  if (CheckDecBuffer(dst_buf) != VP8_STATUS_OK) {
+    return VP8_STATUS_INVALID_PARAM;
+  }
+  if (WebPIsRGBMode(src_buf->colorspace)) {
+    const WebPRGBABuffer* const src = &src_buf->u.RGBA;
+    const WebPRGBABuffer* const dst = &dst_buf->u.RGBA;
+    WebPCopyPlane(src->rgba, src->stride, dst->rgba, dst->stride,
+                  src_buf->width * kModeBpp[src_buf->colorspace],
+                  src_buf->height);
+  } else {
+    const WebPYUVABuffer* const src = &src_buf->u.YUVA;
+    const WebPYUVABuffer* const dst = &dst_buf->u.YUVA;
+    WebPCopyPlane(src->y, src->y_stride, dst->y, dst->y_stride,
+                  src_buf->width, src_buf->height);
+    WebPCopyPlane(src->u, src->u_stride, dst->u, dst->u_stride,
+                  (src_buf->width + 1) / 2, (src_buf->height + 1) / 2);
+    WebPCopyPlane(src->v, src->v_stride, dst->v, dst->v_stride,
+                  (src_buf->width + 1) / 2, (src_buf->height + 1) / 2);
+    if (WebPIsAlphaMode(src_buf->colorspace)) {
+      WebPCopyPlane(src->a, src->a_stride, dst->a, dst->a_stride,
+                    src_buf->width, src_buf->height);
+    }
+  }
+  return VP8_STATUS_OK;
+}
 
+int WebPAvoidSlowMemory(const WebPDecBuffer* const output,
+                        const WebPBitstreamFeatures* const features) {
+  assert(output != NULL);
+  return (output->is_external_memory >= 2) &&
+         WebPIsPremultipliedMode(output->colorspace) &&
+         (features != NULL && features->has_alpha);
+}
+
+//------------------------------------------------------------------------------
diff --git a/src/3rdparty/libwebp/src/dec/frame.c b/src/3rdparty/libwebp/src/dec/frame.c
index b882133..22d291d 100644
--- a/src/3rdparty/libwebp/src/dec/frame.c
+++ b/src/3rdparty/libwebp/src/dec/frame.c
@@ -316,6 +316,9 @@ static void PrecomputeFilterStrengths(VP8Decoder* const dec) {
 //------------------------------------------------------------------------------
 // Dithering
 
+// minimal amp that will provide a non-zero dithering effect
+#define MIN_DITHER_AMP 4
+
 #define DITHER_AMP_TAB_SIZE 12
 static const int kQuantToDitherAmp[DITHER_AMP_TAB_SIZE] = {
   // roughly, it's dqm->uv_mat_[1]
@@ -356,27 +359,14 @@ void VP8InitDithering(const WebPDecoderOptions* const options,
   }
 }
 
-// minimal amp that will provide a non-zero dithering effect
-#define MIN_DITHER_AMP 4
-#define DITHER_DESCALE 4
-#define DITHER_DESCALE_ROUNDER (1 << (DITHER_DESCALE - 1))
-#define DITHER_AMP_BITS 8
-#define DITHER_AMP_CENTER (1 << DITHER_AMP_BITS)
-
+// Convert to range: [-2,2] for dither=50, [-4,4] for dither=100
 static void Dither8x8(VP8Random* const rg, uint8_t* dst, int bps, int amp) {
-  int i, j;
-  for (j = 0; j < 8; ++j) {
-    for (i = 0; i < 8; ++i) {
-      // TODO: could be made faster with SSE2
-      const int bits =
-          VP8RandomBits2(rg, DITHER_AMP_BITS + 1, amp) - DITHER_AMP_CENTER;
-      // Convert to range: [-2,2] for dither=50, [-4,4] for dither=100
-      const int delta = (bits + DITHER_DESCALE_ROUNDER) >> DITHER_DESCALE;
-      const int v = (int)dst[i] + delta;
-      dst[i] = (v < 0) ? 0 : (v > 255) ? 255u : (uint8_t)v;
-    }
-    dst += bps;
+  uint8_t dither[64];
+  int i;
+  for (i = 0; i < 8 * 8; ++i) {
+    dither[i] = VP8RandomBits2(rg, VP8_DITHER_AMP_BITS + 1, amp);
   }
+  VP8DitherCombine8x8(dither, dst, bps);
 }
 
 static void DitherRow(VP8Decoder* const dec) {
@@ -462,7 +452,7 @@ static int FinishRow(VP8Decoder* const dec, VP8Io* const io) {
     if (dec->alpha_data_ != NULL && y_start < y_end) {
       // TODO(skal): testing presence of alpha with dec->alpha_data_ is not a
       // good idea.
-      io->a = VP8DecompressAlphaRows(dec, y_start, y_end - y_start);
+      io->a = VP8DecompressAlphaRows(dec, io, y_start, y_end - y_start);
       if (io->a == NULL) {
         return VP8SetError(dec, VP8_STATUS_BITSTREAM_ERROR,
                            "Could not decode alpha data.");
diff --git a/src/3rdparty/libwebp/src/dec/idec.c b/src/3rdparty/libwebp/src/dec/idec.c
index e0cf0c9..8de1319 100644
--- a/src/3rdparty/libwebp/src/dec/idec.c
+++ b/src/3rdparty/libwebp/src/dec/idec.c
@@ -70,7 +70,9 @@ struct WebPIDecoder {
   VP8Io io_;
 
   MemBuffer mem_;          // input memory buffer.
-  WebPDecBuffer output_;   // output buffer (when no external one is supplied)
+  WebPDecBuffer output_;   // output buffer (when no external one is supplied,
+                           // or if the external one has slow-memory)
+  WebPDecBuffer* final_output_;  // Slow-memory output to copy to eventually.
   size_t chunk_size_;      // Compressed VP8/VP8L size extracted from Header.
 
   int last_mb_y_;          // last row reached for intra-mode decoding
@@ -118,9 +120,9 @@ static void DoRemap(WebPIDecoder* const idec, ptrdiff_t offset) {
   if (idec->dec_ != NULL) {
     if (!idec->is_lossless_) {
       VP8Decoder* const dec = (VP8Decoder*)idec->dec_;
-      const int last_part = dec->num_parts_ - 1;
+      const uint32_t last_part = dec->num_parts_minus_one_;
       if (offset != 0) {
-        int p;
+        uint32_t p;
         for (p = 0; p <= last_part; ++p) {
           VP8RemapBitReader(dec->parts_ + p, offset);
         }
@@ -132,7 +134,6 @@ static void DoRemap(WebPIDecoder* const idec, ptrdiff_t offset) {
       }
       {
         const uint8_t* const last_start = dec->parts_[last_part].buf_;
-        assert(last_part >= 0);
         VP8BitReaderSetBuffer(&dec->parts_[last_part], last_start,
                               mem->buf_ + mem->end_ - last_start);
       }
@@ -249,10 +250,16 @@ static VP8StatusCode FinishDecoding(WebPIDecoder* const idec) {
 
   idec->state_ = STATE_DONE;
   if (options != NULL && options->flip) {
-    return WebPFlipBuffer(output);
-  } else {
-    return VP8_STATUS_OK;
+    const VP8StatusCode status = WebPFlipBuffer(output);
+    if (status != VP8_STATUS_OK) return status;
+  }
+  if (idec->final_output_ != NULL) {
+    WebPCopyDecBufferPixels(output, idec->final_output_);  // do the slow-copy
+    WebPFreeDecBuffer(&idec->output_);
+    *output = *idec->final_output_;
+    idec->final_output_ = NULL;
   }
+  return VP8_STATUS_OK;
 }
 
 //------------------------------------------------------------------------------
@@ -457,19 +464,20 @@ static VP8StatusCode DecodeRemaining(WebPIDecoder* const idec) {
     }
     for (; dec->mb_x_ < dec->mb_w_; ++dec->mb_x_) {
       VP8BitReader* const token_br =
-          &dec->parts_[dec->mb_y_ & (dec->num_parts_ - 1)];
+          &dec->parts_[dec->mb_y_ & dec->num_parts_minus_one_];
       MBContext context;
       SaveContext(dec, token_br, &context);
       if (!VP8DecodeMB(dec, token_br)) {
         // We shouldn't fail when MAX_MB data was available
-        if (dec->num_parts_ == 1 && MemDataSize(&idec->mem_) > MAX_MB_SIZE) {
+        if (dec->num_parts_minus_one_ == 0 &&
+            MemDataSize(&idec->mem_) > MAX_MB_SIZE) {
           return IDecError(idec, VP8_STATUS_BITSTREAM_ERROR);
         }
         RestoreContext(&context, dec, token_br);
         return VP8_STATUS_SUSPENDED;
       }
       // Release buffer only if there is only one partition
-      if (dec->num_parts_ == 1) {
+      if (dec->num_parts_minus_one_ == 0) {
         idec->mem_.start_ = token_br->buf_ - idec->mem_.buf_;
         assert(idec->mem_.start_ <= idec->mem_.end_);
       }
@@ -575,9 +583,10 @@ static VP8StatusCode IDecode(WebPIDecoder* idec) {
 }
 
 //------------------------------------------------------------------------------
-// Public functions
+// Internal constructor
 
-WebPIDecoder* WebPINewDecoder(WebPDecBuffer* output_buffer) {
+static WebPIDecoder* NewDecoder(WebPDecBuffer* const output_buffer,
+                                const WebPBitstreamFeatures* const features) {
   WebPIDecoder* idec = (WebPIDecoder*)WebPSafeCalloc(1ULL, sizeof(*idec));
   if (idec == NULL) {
     return NULL;
@@ -593,25 +602,46 @@ WebPIDecoder* WebPINewDecoder(WebPDecBuffer* output_buffer) {
   VP8InitIo(&idec->io_);
 
   WebPResetDecParams(&idec->params_);
-  idec->params_.output = (output_buffer != NULL) ? output_buffer
-                                                 : &idec->output_;
+  if (output_buffer == NULL || WebPAvoidSlowMemory(output_buffer, features)) {
+    idec->params_.output = &idec->output_;
+    idec->final_output_ = output_buffer;
+    if (output_buffer != NULL) {
+      idec->params_.output->colorspace = output_buffer->colorspace;
+    }
+  } else {
+    idec->params_.output = output_buffer;
+    idec->final_output_ = NULL;
+  }
   WebPInitCustomIo(&idec->params_, &idec->io_);  // Plug the I/O functions.
 
   return idec;
 }
 
+//------------------------------------------------------------------------------
+// Public functions
+
+WebPIDecoder* WebPINewDecoder(WebPDecBuffer* output_buffer) {
+  return NewDecoder(output_buffer, NULL);
+}
+
 WebPIDecoder* WebPIDecode(const uint8_t* data, size_t data_size,
                           WebPDecoderConfig* config) {
   WebPIDecoder* idec;
+  WebPBitstreamFeatures tmp_features;
+  WebPBitstreamFeatures* const features =
+      (config == NULL) ? &tmp_features : &config->input;
+  memset(&tmp_features, 0, sizeof(tmp_features));
 
   // Parse the bitstream's features, if requested:
-  if (data != NULL && data_size > 0 && config != NULL) {
-    if (WebPGetFeatures(data, data_size, &config->input) != VP8_STATUS_OK) {
+  if (data != NULL && data_size > 0) {
+    if (WebPGetFeatures(data, data_size, features) != VP8_STATUS_OK) {
       return NULL;
     }
   }
+
   // Create an instance of the incremental decoder
-  idec = WebPINewDecoder(config ? &config->output : NULL);
+  idec = (config != NULL) ? NewDecoder(&config->output, features)
+                          : NewDecoder(NULL, features);
   if (idec == NULL) {
     return NULL;
   }
@@ -645,11 +675,11 @@ void WebPIDelete(WebPIDecoder* idec) {
 
 WebPIDecoder* WebPINewRGB(WEBP_CSP_MODE mode, uint8_t* output_buffer,
                           size_t output_buffer_size, int output_stride) {
-  const int is_external_memory = (output_buffer != NULL);
+  const int is_external_memory = (output_buffer != NULL) ? 1 : 0;
   WebPIDecoder* idec;
 
   if (mode >= MODE_YUV) return NULL;
-  if (!is_external_memory) {    // Overwrite parameters to sane values.
+  if (is_external_memory == 0) {    // Overwrite parameters to sane values.
     output_buffer_size = 0;
     output_stride = 0;
   } else {  // A buffer was passed. Validate the other params.
@@ -671,11 +701,11 @@ WebPIDecoder* WebPINewYUVA(uint8_t* luma, size_t luma_size, int luma_stride,
                            uint8_t* u, size_t u_size, int u_stride,
                            uint8_t* v, size_t v_size, int v_stride,
                            uint8_t* a, size_t a_size, int a_stride) {
-  const int is_external_memory = (luma != NULL);
+  const int is_external_memory = (luma != NULL) ? 1 : 0;
   WebPIDecoder* idec;
   WEBP_CSP_MODE colorspace;
 
-  if (!is_external_memory) {    // Overwrite parameters to sane values.
+  if (is_external_memory == 0) {    // Overwrite parameters to sane values.
     luma_size = u_size = v_size = a_size = 0;
     luma_stride = u_stride = v_stride = a_stride = 0;
     u = v = a = NULL;
@@ -783,6 +813,9 @@ static const WebPDecBuffer* GetOutputBuffer(const WebPIDecoder* const idec) {
   if (idec->state_ <= STATE_VP8_PARTS0) {
     return NULL;
   }
+  if (idec->final_output_ != NULL) {
+    return NULL;   // not yet slow-copied
+  }
   return idec->params_.output;
 }
 
@@ -792,7 +825,7 @@ const WebPDecBuffer* WebPIDecodedArea(const WebPIDecoder* idec,
   const WebPDecBuffer* const src = GetOutputBuffer(idec);
   if (left != NULL) *left = 0;
   if (top != NULL) *top = 0;
-  if (src) {
+  if (src != NULL) {
     if (width != NULL) *width = src->width;
     if (height != NULL) *height = idec->params_.last_y;
   } else {
diff --git a/src/3rdparty/libwebp/src/dec/io.c b/src/3rdparty/libwebp/src/dec/io.c
index 13e469a..8d5c43f 100644
--- a/src/3rdparty/libwebp/src/dec/io.c
+++ b/src/3rdparty/libwebp/src/dec/io.c
@@ -119,6 +119,14 @@ static int EmitFancyRGB(const VP8Io* const io, WebPDecParams* const p) {
 
 //------------------------------------------------------------------------------
 
+static void FillAlphaPlane(uint8_t* dst, int w, int h, int stride) {
+  int j;
+  for (j = 0; j < h; ++j) {
+    memset(dst, 0xff, w * sizeof(*dst));
+    dst += stride;
+  }
+}
+
 static int EmitAlphaYUV(const VP8Io* const io, WebPDecParams* const p,
                         int expected_num_lines_out) {
   const uint8_t* alpha = io->a;
@@ -137,10 +145,7 @@ static int EmitAlphaYUV(const VP8Io* const io, WebPDecParams* const p,
     }
   } else if (buf->a != NULL) {
     // the user requested alpha, but there is none, set it to opaque.
-    for (j = 0; j < mb_h; ++j) {
-      memset(dst, 0xff, mb_w * sizeof(*dst));
-      dst += buf->a_stride;
-    }
+    FillAlphaPlane(dst, mb_w, mb_h, buf->a_stride);
   }
   return 0;
 }
@@ -269,8 +274,8 @@ static int EmitRescaledYUV(const VP8Io* const io, WebPDecParams* const p) {
 
 static int EmitRescaledAlphaYUV(const VP8Io* const io, WebPDecParams* const p,
                                 int expected_num_lines_out) {
+  const WebPYUVABuffer* const buf = &p->output->u.YUVA;
   if (io->a != NULL) {
-    const WebPYUVABuffer* const buf = &p->output->u.YUVA;
     uint8_t* dst_y = buf->y + p->last_y * buf->y_stride;
     const uint8_t* src_a = buf->a + p->last_y * buf->a_stride;
     const int num_lines_out = Rescale(io->a, io->width, io->mb_h, &p->scaler_a);
@@ -280,6 +285,11 @@ static int EmitRescaledAlphaYUV(const VP8Io* const io, WebPDecParams* const p,
       WebPMultRows(dst_y, buf->y_stride, src_a, buf->a_stride,
                    p->scaler_a.dst_width, num_lines_out, 1);
     }
+  } else if (buf->a != NULL) {
+    // the user requested alpha, but there is none, set it to opaque.
+    assert(p->last_y + expected_num_lines_out <= io->scaled_height);
+    FillAlphaPlane(buf->a + p->last_y * buf->a_stride,
+                   io->scaled_width, expected_num_lines_out, buf->a_stride);
   }
   return 0;
 }
diff --git a/src/3rdparty/libwebp/src/dec/vp8.c b/src/3rdparty/libwebp/src/dec/vp8.c
index d89eb1c..336680c 100644
--- a/src/3rdparty/libwebp/src/dec/vp8.c
+++ b/src/3rdparty/libwebp/src/dec/vp8.c
@@ -50,7 +50,7 @@ VP8Decoder* VP8New(void) {
     SetOk(dec);
     WebPGetWorkerInterface()->Init(&dec->worker_);
     dec->ready_ = 0;
-    dec->num_parts_ = 1;
+    dec->num_parts_minus_one_ = 0;
   }
   return dec;
 }
@@ -194,8 +194,8 @@ static VP8StatusCode ParsePartitions(VP8Decoder* const dec,
   size_t last_part;
   size_t p;
 
-  dec->num_parts_ = 1 << VP8GetValue(br, 2);
-  last_part = dec->num_parts_ - 1;
+  dec->num_parts_minus_one_ = (1 << VP8GetValue(br, 2)) - 1;
+  last_part = dec->num_parts_minus_one_;
   if (size < 3 * last_part) {
     // we can't even read the sizes with sz[]! That's a failure.
     return VP8_STATUS_NOT_ENOUGH_DATA;
@@ -303,15 +303,22 @@ int VP8GetHeaders(VP8Decoder* const dec, VP8Io* const io) {
 
     dec->mb_w_ = (pic_hdr->width_ + 15) >> 4;
     dec->mb_h_ = (pic_hdr->height_ + 15) >> 4;
+
     // Setup default output area (can be later modified during io->setup())
     io->width = pic_hdr->width_;
     io->height = pic_hdr->height_;
-    io->use_scaling  = 0;
+    // IMPORTANT! use some sane dimensions in crop_* and scaled_* fields.
+    // So they can be used interchangeably without always testing for
+    // 'use_cropping'.
     io->use_cropping = 0;
     io->crop_top  = 0;
     io->crop_left = 0;
     io->crop_right  = io->width;
     io->crop_bottom = io->height;
+    io->use_scaling  = 0;
+    io->scaled_width = io->width;
+    io->scaled_height = io->height;
+
     io->mb_w = io->width;   // sanity check
     io->mb_h = io->height;  // ditto
 
@@ -579,7 +586,7 @@ static int ParseFrame(VP8Decoder* const dec, VP8Io* io) {
   for (dec->mb_y_ = 0; dec->mb_y_ < dec->br_mb_y_; ++dec->mb_y_) {
     // Parse bitstream for this row.
     VP8BitReader* const token_br =
-        &dec->parts_[dec->mb_y_ & (dec->num_parts_ - 1)];
+        &dec->parts_[dec->mb_y_ & dec->num_parts_minus_one_];
     if (!VP8ParseIntraModeRow(&dec->br_, dec)) {
       return VP8SetError(dec, VP8_STATUS_NOT_ENOUGH_DATA,
                          "Premature end-of-partition0 encountered.");
@@ -649,8 +656,7 @@ void VP8Clear(VP8Decoder* const dec) {
     return;
   }
   WebPGetWorkerInterface()->End(&dec->worker_);
-  ALPHDelete(dec->alph_dec_);
-  dec->alph_dec_ = NULL;
+  WebPDeallocateAlphaMemory(dec);
   WebPSafeFree(dec->mem_);
   dec->mem_ = NULL;
   dec->mem_size_ = 0;
@@ -659,4 +665,3 @@ void VP8Clear(VP8Decoder* const dec) {
 }
 
 //------------------------------------------------------------------------------
-
diff --git a/src/3rdparty/libwebp/src/dec/vp8i.h b/src/3rdparty/libwebp/src/dec/vp8i.h
index 0104f25..00da02b 100644
--- a/src/3rdparty/libwebp/src/dec/vp8i.h
+++ b/src/3rdparty/libwebp/src/dec/vp8i.h
@@ -32,7 +32,7 @@ extern "C" {
 // version numbers
 #define DEC_MAJ_VERSION 0
 #define DEC_MIN_VERSION 5
-#define DEC_REV_VERSION 0
+#define DEC_REV_VERSION 1
 
 // YUV-cache parameters. Cache is 32-bytes wide (= one cacheline).
 // Constraints are: We need to store one 16x16 block of luma samples (y),
@@ -209,8 +209,8 @@ struct VP8Decoder {
   int tl_mb_x_, tl_mb_y_;  // top-left MB that must be in-loop filtered
   int br_mb_x_, br_mb_y_;  // last bottom-right MB that must be decoded
 
-  // number of partitions.
-  int num_parts_;
+  // number of partitions minus one.
+  uint32_t num_parts_minus_one_;
   // per-partition boolean decoders.
   VP8BitReader parts_[MAX_NUM_PARTITIONS];
 
@@ -258,9 +258,11 @@ struct VP8Decoder {
   struct ALPHDecoder* alph_dec_;  // alpha-plane decoder object
   const uint8_t* alpha_data_;     // compressed alpha data (if present)
   size_t alpha_data_size_;
-  int is_alpha_decoded_;  // true if alpha_data_ is decoded in alpha_plane_
-  uint8_t* alpha_plane_;  // output. Persistent, contains the whole data.
-  int alpha_dithering_;   // derived from decoding options (0=off, 100=full).
+  int is_alpha_decoded_;      // true if alpha_data_ is decoded in alpha_plane_
+  uint8_t* alpha_plane_mem_;  // memory allocated for alpha_plane_
+  uint8_t* alpha_plane_;      // output. Persistent, contains the whole data.
+  const uint8_t* alpha_prev_line_;  // last decoded alpha row (or NULL)
+  int alpha_dithering_;       // derived from decoding options (0=off, 100=full)
 };
 
 //------------------------------------------------------------------------------
@@ -306,6 +308,7 @@ int VP8DecodeMB(VP8Decoder* const dec, VP8BitReader* const token_br);
 
 // in alpha.c
 const uint8_t* VP8DecompressAlphaRows(VP8Decoder* const dec,
+                                      const VP8Io* const io,
                                       int row, int num_rows);
 
 //------------------------------------------------------------------------------
diff --git a/src/3rdparty/libwebp/src/dec/vp8l.c b/src/3rdparty/libwebp/src/dec/vp8l.c
index a76ad6a..cb2e317 100644
--- a/src/3rdparty/libwebp/src/dec/vp8l.c
+++ b/src/3rdparty/libwebp/src/dec/vp8l.c
@@ -714,34 +714,22 @@ static void ApplyInverseTransforms(VP8LDecoder* const dec, int num_rows,
   }
 }
 
-// Special method for paletted alpha data.
-static void ApplyInverseTransformsAlpha(VP8LDecoder* const dec, int num_rows,
-                                        const uint8_t* const rows) {
-  const int start_row = dec->last_row_;
-  const int end_row = start_row + num_rows;
-  const uint8_t* rows_in = rows;
-  uint8_t* rows_out = (uint8_t*)dec->io_->opaque + dec->io_->width * start_row;
-  VP8LTransform* const transform = &dec->transforms_[0];
-  assert(dec->next_transform_ == 1);
-  assert(transform->type_ == COLOR_INDEXING_TRANSFORM);
-  VP8LColorIndexInverseTransformAlpha(transform, start_row, end_row, rows_in,
-                                      rows_out);
-}
-
 // Processes (transforms, scales & color-converts) the rows decoded after the
 // last call.
 static void ProcessRows(VP8LDecoder* const dec, int row) {
   const uint32_t* const rows = dec->pixels_ + dec->width_ * dec->last_row_;
   const int num_rows = row - dec->last_row_;
 
-  if (num_rows <= 0) return;  // Nothing to be done.
-  ApplyInverseTransforms(dec, num_rows, rows);
-
-  // Emit output.
-  {
+  assert(row <= dec->io_->crop_bottom);
+  // We can't process more than NUM_ARGB_CACHE_ROWS at a time (that's the size
+  // of argb_cache_), but we currently don't need more than that.
+  assert(num_rows <= NUM_ARGB_CACHE_ROWS);
+  if (num_rows > 0) {    // Emit output.
     VP8Io* const io = dec->io_;
     uint8_t* rows_data = (uint8_t*)dec->argb_cache_;
     const int in_stride = io->width * sizeof(uint32_t);  // in unit of RGBA
+
+    ApplyInverseTransforms(dec, num_rows, rows);
     if (!SetCropWindow(io, dec->last_row_, row, &rows_data, in_stride)) {
       // Nothing to output (this time).
     } else {
@@ -786,14 +774,46 @@ static int Is8bOptimizable(const VP8LMetadata* const hdr) {
   return 1;
 }
 
-static void ExtractPalettedAlphaRows(VP8LDecoder* const dec, int row) {
-  const int num_rows = row - dec->last_row_;
-  const uint8_t* const in =
-      (uint8_t*)dec->pixels_ + dec->width_ * dec->last_row_;
-  if (num_rows > 0) {
-    ApplyInverseTransformsAlpha(dec, num_rows, in);
+static void AlphaApplyFilter(ALPHDecoder* const alph_dec,
+                             int first_row, int last_row,
+                             uint8_t* out, int stride) {
+  if (alph_dec->filter_ != WEBP_FILTER_NONE) {
+    int y;
+    const uint8_t* prev_line = alph_dec->prev_line_;
+    assert(WebPUnfilters[alph_dec->filter_] != NULL);
+    for (y = first_row; y < last_row; ++y) {
+      WebPUnfilters[alph_dec->filter_](prev_line, out, out, stride);
+      prev_line = out;
+      out += stride;
+    }
+    alph_dec->prev_line_ = prev_line;
   }
-  dec->last_row_ = dec->last_out_row_ = row;
+}
+
+static void ExtractPalettedAlphaRows(VP8LDecoder* const dec, int last_row) {
+  // For vertical and gradient filtering, we need to decode the part above the
+  // crop_top row, in order to have the correct spatial predictors.
+  ALPHDecoder* const alph_dec = (ALPHDecoder*)dec->io_->opaque;
+  const int top_row =
+      (alph_dec->filter_ == WEBP_FILTER_NONE ||
+       alph_dec->filter_ == WEBP_FILTER_HORIZONTAL) ? dec->io_->crop_top
+                                                    : dec->last_row_;
+  const int first_row = (dec->last_row_ < top_row) ? top_row : dec->last_row_;
+  assert(last_row <= dec->io_->crop_bottom);
+  if (last_row > first_row) {
+    // Special method for paletted alpha data. We only process the cropped area.
+    const int width = dec->io_->width;
+    uint8_t* out = alph_dec->output_ + width * first_row;
+    const uint8_t* const in =
+      (uint8_t*)dec->pixels_ + dec->width_ * first_row;
+    VP8LTransform* const transform = &dec->transforms_[0];
+    assert(dec->next_transform_ == 1);
+    assert(transform->type_ == COLOR_INDEXING_TRANSFORM);
+    VP8LColorIndexInverseTransformAlpha(transform, first_row, last_row,
+                                        in, out);
+    AlphaApplyFilter(alph_dec, first_row, last_row, out, width);
+  }
+  dec->last_row_ = dec->last_out_row_ = last_row;
 }
 
 //------------------------------------------------------------------------------
@@ -922,14 +942,14 @@ static int DecodeAlphaData(VP8LDecoder* const dec, uint8_t* const data,
   int col = dec->last_pixel_ % width;
   VP8LBitReader* const br = &dec->br_;
   VP8LMetadata* const hdr = &dec->hdr_;
-  const HTreeGroup* htree_group = GetHtreeGroupForPos(hdr, col, row);
   int pos = dec->last_pixel_;         // current position
   const int end = width * height;     // End of data
   const int last = width * last_row;  // Last pixel to decode
   const int len_code_limit = NUM_LITERAL_CODES + NUM_LENGTH_CODES;
   const int mask = hdr->huffman_mask_;
-  assert(htree_group != NULL);
-  assert(pos < end);
+  const HTreeGroup* htree_group =
+      (pos < last) ? GetHtreeGroupForPos(hdr, col, row) : NULL;
+  assert(pos <= end);
   assert(last_row <= height);
   assert(Is8bOptimizable(hdr));
 
@@ -939,6 +959,7 @@ static int DecodeAlphaData(VP8LDecoder* const dec, uint8_t* const data,
     if ((col & mask) == 0) {
       htree_group = GetHtreeGroupForPos(hdr, col, row);
     }
+    assert(htree_group != NULL);
     VP8LFillBitWindow(br);
     code = ReadSymbol(htree_group->htrees[GREEN], br);
     if (code < NUM_LITERAL_CODES) {  // Literal
@@ -948,7 +969,7 @@ static int DecodeAlphaData(VP8LDecoder* const dec, uint8_t* const data,
       if (col >= width) {
         col = 0;
         ++row;
-        if (row % NUM_ARGB_CACHE_ROWS == 0) {
+        if (row <= last_row && (row % NUM_ARGB_CACHE_ROWS == 0)) {
           ExtractPalettedAlphaRows(dec, row);
         }
       }
@@ -971,7 +992,7 @@ static int DecodeAlphaData(VP8LDecoder* const dec, uint8_t* const data,
       while (col >= width) {
         col -= width;
         ++row;
-        if (row % NUM_ARGB_CACHE_ROWS == 0) {
+        if (row <= last_row && (row % NUM_ARGB_CACHE_ROWS == 0)) {
           ExtractPalettedAlphaRows(dec, row);
         }
       }
@@ -985,7 +1006,7 @@ static int DecodeAlphaData(VP8LDecoder* const dec, uint8_t* const data,
     assert(br->eos_ == VP8LIsEndOfStream(br));
   }
   // Process the remaining rows corresponding to last row-block.
-  ExtractPalettedAlphaRows(dec, row);
+  ExtractPalettedAlphaRows(dec, row > last_row ? last_row : row);
 
  End:
   if (!ok || (br->eos_ && pos < end)) {
@@ -1025,7 +1046,6 @@ static int DecodeImageData(VP8LDecoder* const dec, uint32_t* const data,
   int col = dec->last_pixel_ % width;
   VP8LBitReader* const br = &dec->br_;
   VP8LMetadata* const hdr = &dec->hdr_;
-  HTreeGroup* htree_group = GetHtreeGroupForPos(hdr, col, row);
   uint32_t* src = data + dec->last_pixel_;
   uint32_t* last_cached = src;
   uint32_t* const src_end = data + width * height;     // End of data
@@ -1036,8 +1056,9 @@ static int DecodeImageData(VP8LDecoder* const dec, uint32_t* const data,
   VP8LColorCache* const color_cache =
       (hdr->color_cache_size_ > 0) ? &hdr->color_cache_ : NULL;
   const int mask = hdr->huffman_mask_;
-  assert(htree_group != NULL);
-  assert(src < src_end);
+  const HTreeGroup* htree_group =
+      (src < src_last) ? GetHtreeGroupForPos(hdr, col, row) : NULL;
+  assert(dec->last_row_ < last_row);
   assert(src_last <= src_end);
 
   while (src < src_last) {
@@ -1049,7 +1070,10 @@ static int DecodeImageData(VP8LDecoder* const dec, uint32_t* const data,
     // Only update when changing tile. Note we could use this test:
     // if "((((prev_col ^ col) | prev_row ^ row)) > mask)" -> tile changed
     // but that's actually slower and needs storing the previous col/row.
-    if ((col & mask) == 0) htree_group = GetHtreeGroupForPos(hdr, col, row);
+    if ((col & mask) == 0) {
+      htree_group = GetHtreeGroupForPos(hdr, col, row);
+    }
+    assert(htree_group != NULL);
     if (htree_group->is_trivial_code) {
       *src = htree_group->literal_arb;
       goto AdvanceByOne;
@@ -1080,8 +1104,10 @@ static int DecodeImageData(VP8LDecoder* const dec, uint32_t* const data,
       if (col >= width) {
         col = 0;
         ++row;
-        if ((row % NUM_ARGB_CACHE_ROWS == 0) && (process_func != NULL)) {
-          process_func(dec, row);
+        if (process_func != NULL) {
+          if (row <= last_row && (row % NUM_ARGB_CACHE_ROWS == 0)) {
+            process_func(dec, row);
+          }
         }
         if (color_cache != NULL) {
           while (last_cached < src) {
@@ -1108,8 +1134,10 @@ static int DecodeImageData(VP8LDecoder* const dec, uint32_t* const data,
       while (col >= width) {
         col -= width;
         ++row;
-        if ((row % NUM_ARGB_CACHE_ROWS == 0) && (process_func != NULL)) {
-          process_func(dec, row);
+        if (process_func != NULL) {
+          if (row <= last_row && (row % NUM_ARGB_CACHE_ROWS == 0)) {
+            process_func(dec, row);
+          }
         }
       }
       // Because of the check done above (before 'src' was incremented by
@@ -1140,7 +1168,7 @@ static int DecodeImageData(VP8LDecoder* const dec, uint32_t* const data,
   } else if (!br->eos_) {
     // Process the remaining rows corresponding to last row-block.
     if (process_func != NULL) {
-      process_func(dec, row);
+      process_func(dec, row > last_row ? last_row : row);
     }
     dec->status_ = VP8_STATUS_OK;
     dec->last_pixel_ = (int)(src - data);  // end-of-scan marker
@@ -1438,46 +1466,51 @@ static int AllocateInternalBuffers8b(VP8LDecoder* const dec) {
 //------------------------------------------------------------------------------
 
 // Special row-processing that only stores the alpha data.
-static void ExtractAlphaRows(VP8LDecoder* const dec, int row) {
-  const int num_rows = row - dec->last_row_;
-  const uint32_t* const in = dec->pixels_ + dec->width_ * dec->last_row_;
-
-  if (num_rows <= 0) return;  // Nothing to be done.
-  ApplyInverseTransforms(dec, num_rows, in);
-
-  // Extract alpha (which is stored in the green plane).
-  {
+static void ExtractAlphaRows(VP8LDecoder* const dec, int last_row) {
+  int cur_row = dec->last_row_;
+  int num_rows = last_row - cur_row;
+  const uint32_t* in = dec->pixels_ + dec->width_ * cur_row;
+
+  assert(last_row <= dec->io_->crop_bottom);
+  while (num_rows > 0) {
+    const int num_rows_to_process =
+        (num_rows > NUM_ARGB_CACHE_ROWS) ? NUM_ARGB_CACHE_ROWS : num_rows;
+    // Extract alpha (which is stored in the green plane).
+    ALPHDecoder* const alph_dec = (ALPHDecoder*)dec->io_->opaque;
+    uint8_t* const output = alph_dec->output_;
     const int width = dec->io_->width;      // the final width (!= dec->width_)
-    const int cache_pixs = width * num_rows;
-    uint8_t* const dst = (uint8_t*)dec->io_->opaque + width * dec->last_row_;
+    const int cache_pixs = width * num_rows_to_process;
+    uint8_t* const dst = output + width * cur_row;
     const uint32_t* const src = dec->argb_cache_;
     int i;
+    ApplyInverseTransforms(dec, num_rows_to_process, in);
     for (i = 0; i < cache_pixs; ++i) dst[i] = (src[i] >> 8) & 0xff;
-  }
-  dec->last_row_ = dec->last_out_row_ = row;
+    AlphaApplyFilter(alph_dec,
+                     cur_row, cur_row + num_rows_to_process, dst, width);
+    num_rows -= num_rows_to_process;
+    in += num_rows_to_process * dec->width_;
+    cur_row += num_rows_to_process;
+  }
+  assert(cur_row == last_row);
+  dec->last_row_ = dec->last_out_row_ = last_row;
 }
 
 int VP8LDecodeAlphaHeader(ALPHDecoder* const alph_dec,
-                          const uint8_t* const data, size_t data_size,
-                          uint8_t* const output) {
+                          const uint8_t* const data, size_t data_size) {
   int ok = 0;
-  VP8LDecoder* dec;
-  VP8Io* io;
+  VP8LDecoder* dec = VP8LNew();
+
+  if (dec == NULL) return 0;
+
   assert(alph_dec != NULL);
-  alph_dec->vp8l_dec_ = VP8LNew();
-  if (alph_dec->vp8l_dec_ == NULL) return 0;
-  dec = alph_dec->vp8l_dec_;
+  alph_dec->vp8l_dec_ = dec;
 
   dec->width_ = alph_dec->width_;
   dec->height_ = alph_dec->height_;
   dec->io_ = &alph_dec->io_;
-  io = dec->io_;
-
-  VP8InitIo(io);
-  WebPInitCustomIo(NULL, io);  // Just a sanity Init. io won't be used.
-  io->opaque = output;
-  io->width = alph_dec->width_;
-  io->height = alph_dec->height_;
+  dec->io_->opaque = alph_dec;
+  dec->io_->width = alph_dec->width_;
+  dec->io_->height = alph_dec->height_;
 
   dec->status_ = VP8_STATUS_OK;
   VP8LInitBitReader(&dec->br_, data, data_size);
@@ -1492,11 +1525,11 @@ int VP8LDecodeAlphaHeader(ALPHDecoder* const alph_dec,
   if (dec->next_transform_ == 1 &&
       dec->transforms_[0].type_ == COLOR_INDEXING_TRANSFORM &&
       Is8bOptimizable(&dec->hdr_)) {
-    alph_dec->use_8b_decode = 1;
+    alph_dec->use_8b_decode_ = 1;
     ok = AllocateInternalBuffers8b(dec);
   } else {
     // Allocate internal buffers (note that dec->width_ may have changed here).
-    alph_dec->use_8b_decode = 0;
+    alph_dec->use_8b_decode_ = 0;
     ok = AllocateInternalBuffers32b(dec, alph_dec->width_);
   }
 
@@ -1515,12 +1548,12 @@ int VP8LDecodeAlphaImageStream(ALPHDecoder* const alph_dec, int last_row) {
   assert(dec != NULL);
   assert(last_row <= dec->height_);
 
-  if (dec->last_pixel_ == dec->width_ * dec->height_) {
+  if (dec->last_row_ >= last_row) {
     return 1;  // done
   }
 
   // Decode (with special row processing).
-  return alph_dec->use_8b_decode ?
+  return alph_dec->use_8b_decode_ ?
       DecodeAlphaData(dec, (uint8_t*)dec->pixels_, dec->width_, dec->height_,
                       last_row) :
       DecodeImageData(dec, dec->pixels_, dec->width_, dec->height_,
@@ -1611,7 +1644,7 @@ int VP8LDecodeImage(VP8LDecoder* const dec) {
 
   // Decode.
   if (!DecodeImageData(dec, dec->pixels_, dec->width_, dec->height_,
-                       dec->height_, ProcessRows)) {
+                       io->crop_bottom, ProcessRows)) {
     goto Err;
   }
 
diff --git a/src/3rdparty/libwebp/src/dec/vp8li.h b/src/3rdparty/libwebp/src/dec/vp8li.h
index 8886e47..9313bdc 100644
--- a/src/3rdparty/libwebp/src/dec/vp8li.h
+++ b/src/3rdparty/libwebp/src/dec/vp8li.h
@@ -100,8 +100,7 @@ struct ALPHDecoder;  // Defined in dec/alphai.h.
 // Decodes image header for alpha data stored using lossless compression.
 // Returns false in case of error.
 int VP8LDecodeAlphaHeader(struct ALPHDecoder* const alph_dec,
-                          const uint8_t* const data, size_t data_size,
-                          uint8_t* const output);
+                          const uint8_t* const data, size_t data_size);
 
 // Decodes *at least* 'last_row' rows of alpha. If some of the initial rows are
 // already decoded in previous call(s), it will resume decoding from where it
diff --git a/src/3rdparty/libwebp/src/dec/webp.c b/src/3rdparty/libwebp/src/dec/webp.c
index 952178f..d0b912f 100644
--- a/src/3rdparty/libwebp/src/dec/webp.c
+++ b/src/3rdparty/libwebp/src/dec/webp.c
@@ -415,7 +415,8 @@ static VP8StatusCode ParseHeadersInternal(const uint8_t* data,
 }
 
 VP8StatusCode WebPParseHeaders(WebPHeaderStructure* const headers) {
-  VP8StatusCode status;
+  // status is marked volatile as a workaround for a clang-3.8 (aarch64) bug
+  volatile VP8StatusCode status;
   int has_animation = 0;
   assert(headers != NULL);
   // fill out headers, ignore width/height/has_alpha.
@@ -512,10 +513,12 @@ static VP8StatusCode DecodeInto(const uint8_t* const data, size_t data_size,
 
   if (status != VP8_STATUS_OK) {
     WebPFreeDecBuffer(params->output);
-  }
-
-  if (params->options != NULL && params->options->flip) {
-    status = WebPFlipBuffer(params->output);
+  } else {
+    if (params->options != NULL && params->options->flip) {
+      // This restores the original stride values if options->flip was used
+      // during the call to WebPAllocateDecBuffer above.
+      status = WebPFlipBuffer(params->output);
+    }
   }
   return status;
 }
@@ -758,9 +761,24 @@ VP8StatusCode WebPDecode(const uint8_t* data, size_t data_size,
   }
 
   WebPResetDecParams(&params);
-  params.output = &config->output;
   params.options = &config->options;
-  status = DecodeInto(data, data_size, &params);
+  params.output = &config->output;
+  if (WebPAvoidSlowMemory(params.output, &config->input)) {
+    // decoding to slow memory: use a temporary in-mem buffer to decode into.
+    WebPDecBuffer in_mem_buffer;
+    WebPInitDecBuffer(&in_mem_buffer);
+    in_mem_buffer.colorspace = config->output.colorspace;
+    in_mem_buffer.width = config->input.width;
+    in_mem_buffer.height = config->input.height;
+    params.output = &in_mem_buffer;
+    status = DecodeInto(data, data_size, &params);
+    if (status == VP8_STATUS_OK) {  // do the slow-copy
+      status = WebPCopyDecBufferPixels(&in_mem_buffer, &config->output);
+    }
+    WebPFreeDecBuffer(&in_mem_buffer);
+  } else {
+    status = DecodeInto(data, data_size, &params);
+  }
 
   return status;
 }
@@ -809,7 +827,7 @@ int WebPIoInitFromOptions(const WebPDecoderOptions* const options,
   }
 
   // Filter
-  io->bypass_filtering = options && options->bypass_filtering;
+  io->bypass_filtering = (options != NULL) && options->bypass_filtering;
 
   // Fancy upsampler
 #ifdef FANCY_UPSAMPLING
@@ -826,4 +844,3 @@ int WebPIoInitFromOptions(const WebPDecoderOptions* const options,
 }
 
 //------------------------------------------------------------------------------
-
diff --git a/src/3rdparty/libwebp/src/dec/webpi.h b/src/3rdparty/libwebp/src/dec/webpi.h
index c75a2e4..991b194 100644
--- a/src/3rdparty/libwebp/src/dec/webpi.h
+++ b/src/3rdparty/libwebp/src/dec/webpi.h
@@ -45,11 +45,20 @@ struct WebPDecParams {
   OutputFunc emit;               // output RGB or YUV samples
   OutputAlphaFunc emit_alpha;    // output alpha channel
   OutputRowFunc emit_alpha_row;  // output one line of rescaled alpha values
+
+  WebPDecBuffer* final_output;   // In case the user supplied a slow-memory
+                                 // output, we decode image in temporary buffer
+                                 // (this::output) and copy it here.
+  WebPDecBuffer tmp_buffer;      // this::output will point to this one in case
+                                 // of slow memory.
 };
 
 // Should be called first, before any use of the WebPDecParams object.
 void WebPResetDecParams(WebPDecParams* const params);
 
+// Delete all memory (after an error occurred, for instance)
+void WebPFreeDecParams(WebPDecParams* const params);
+
 //------------------------------------------------------------------------------
 // Header parsing helpers
 
@@ -107,13 +116,23 @@ VP8StatusCode WebPAllocateDecBuffer(int width, int height,
 VP8StatusCode WebPFlipBuffer(WebPDecBuffer* const buffer);
 
 // Copy 'src' into 'dst' buffer, making sure 'dst' is not marked as owner of the
-// memory (still held by 'src').
+// memory (still held by 'src'). No pixels are copied.
 void WebPCopyDecBuffer(const WebPDecBuffer* const src,
                        WebPDecBuffer* const dst);
 
 // Copy and transfer ownership from src to dst (beware of parameter order!)
 void WebPGrabDecBuffer(WebPDecBuffer* const src, WebPDecBuffer* const dst);
 
+// Copy pixels from 'src' into a *preallocated* 'dst' buffer. Returns
+// VP8_STATUS_INVALID_PARAM if the 'dst' is not set up correctly for the copy.
+VP8StatusCode WebPCopyDecBufferPixels(const WebPDecBuffer* const src,
+                                      WebPDecBuffer* const dst);
+
+// Returns true if decoding will be slow with the current configuration
+// and bitstream features.
+int WebPAvoidSlowMemory(const WebPDecBuffer* const output,
+                        const WebPBitstreamFeatures* const features);
+
 //------------------------------------------------------------------------------
 
 #ifdef __cplusplus
diff --git a/src/3rdparty/libwebp/src/dsp/common_sse2.h b/src/3rdparty/libwebp/src/dsp/common_sse2.h
new file mode 100644
index 0000000..7cea13f
--- /dev/null
+++ b/src/3rdparty/libwebp/src/dsp/common_sse2.h
@@ -0,0 +1,109 @@
+// Copyright 2016 Google Inc. All Rights Reserved.
+//
+// Use of this source code is governed by a BSD-style license
+// that can be found in the COPYING file in the root of the source
+// tree. An additional intellectual property rights grant can be found
+// in the file PATENTS. All contributing project authors may
+// be found in the AUTHORS file in the root of the source tree.
+// -----------------------------------------------------------------------------
+//
+// SSE2 code common to several files.
+//
+// Author: Vincent Rabaud (vrabaud@google.com)
+
+#ifndef WEBP_DSP_COMMON_SSE2_H_
+#define WEBP_DSP_COMMON_SSE2_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#if defined(WEBP_USE_SSE2)
+
+#include <emmintrin.h>
+
+//------------------------------------------------------------------------------
+// Quite useful macro for debugging. Left here for convenience.
+
+#if 0
+#include <stdio.h>
+static WEBP_INLINE void PrintReg(const __m128i r, const char* const name,
+                                 int size) {
+  int n;
+  union {
+    __m128i r;
+    uint8_t i8[16];
+    uint16_t i16[8];
+    uint32_t i32[4];
+    uint64_t i64[2];
+  } tmp;
+  tmp.r = r;
+  fprintf(stderr, "%s\t: ", name);
+  if (size == 8) {
+    for (n = 0; n < 16; ++n) fprintf(stderr, "%.2x ", tmp.i8[n]);
+  } else if (size == 16) {
+    for (n = 0; n < 8; ++n) fprintf(stderr, "%.4x ", tmp.i16[n]);
+  } else if (size == 32) {
+    for (n = 0; n < 4; ++n) fprintf(stderr, "%.8x ", tmp.i32[n]);
+  } else {
+    for (n = 0; n < 2; ++n) fprintf(stderr, "%.16lx ", tmp.i64[n]);
+  }
+  fprintf(stderr, "\n");
+}
+#endif
+
+//------------------------------------------------------------------------------
+// Math functions.
+
+// Return the sum of all the 8b in the register.
+static WEBP_INLINE int VP8HorizontalAdd8b(const __m128i* const a) {
+  const __m128i zero = _mm_setzero_si128();
+  const __m128i sad8x2 = _mm_sad_epu8(*a, zero);
+  // sum the two sads: sad8x2[0:1] + sad8x2[8:9]
+  const __m128i sum = _mm_add_epi32(sad8x2, _mm_shuffle_epi32(sad8x2, 2));
+  return _mm_cvtsi128_si32(sum);
+}
+
+// Transpose two 4x4 16b matrices horizontally stored in registers.
+static WEBP_INLINE void VP8Transpose_2_4x4_16b(
+    const __m128i* const in0, const __m128i* const in1,
+    const __m128i* const in2, const __m128i* const in3, __m128i* const out0,
+    __m128i* const out1, __m128i* const out2, __m128i* const out3) {
+  // Transpose the two 4x4.
+  // a00 a01 a02 a03   b00 b01 b02 b03
+  // a10 a11 a12 a13   b10 b11 b12 b13
+  // a20 a21 a22 a23   b20 b21 b22 b23
+  // a30 a31 a32 a33   b30 b31 b32 b33
+  const __m128i transpose0_0 = _mm_unpacklo_epi16(*in0, *in1);
+  const __m128i transpose0_1 = _mm_unpacklo_epi16(*in2, *in3);
+  const __m128i transpose0_2 = _mm_unpackhi_epi16(*in0, *in1);
+  const __m128i transpose0_3 = _mm_unpackhi_epi16(*in2, *in3);
+  // a00 a10 a01 a11   a02 a12 a03 a13
+  // a20 a30 a21 a31   a22 a32 a23 a33
+  // b00 b10 b01 b11   b02 b12 b03 b13
+  // b20 b30 b21 b31   b22 b32 b23 b33
+  const __m128i transpose1_0 = _mm_unpacklo_epi32(transpose0_0, transpose0_1);
+  const __m128i transpose1_1 = _mm_unpacklo_epi32(transpose0_2, transpose0_3);
+  const __m128i transpose1_2 = _mm_unpackhi_epi32(transpose0_0, transpose0_1);
+  const __m128i transpose1_3 = _mm_unpackhi_epi32(transpose0_2, transpose0_3);
+  // a00 a10 a20 a30 a01 a11 a21 a31
+  // b00 b10 b20 b30 b01 b11 b21 b31
+  // a02 a12 a22 a32 a03 a13 a23 a33
+  // b02 b12 a22 b32 b03 b13 b23 b33
+  *out0 = _mm_unpacklo_epi64(transpose1_0, transpose1_1);
+  *out1 = _mm_unpackhi_epi64(transpose1_0, transpose1_1);
+  *out2 = _mm_unpacklo_epi64(transpose1_2, transpose1_3);
+  *out3 = _mm_unpackhi_epi64(transpose1_2, transpose1_3);
+  // a00 a10 a20 a30   b00 b10 b20 b30
+  // a01 a11 a21 a31   b01 b11 b21 b31
+  // a02 a12 a22 a32   b02 b12 b22 b32
+  // a03 a13 a23 a33   b03 b13 b23 b33
+}
+
+#endif  // WEBP_USE_SSE2
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // WEBP_DSP_COMMON_SSE2_H_
diff --git a/src/3rdparty/libwebp/src/dsp/cpu.c b/src/3rdparty/libwebp/src/dsp/cpu.c
index 8844cb4..cbb08db 100644
--- a/src/3rdparty/libwebp/src/dsp/cpu.c
+++ b/src/3rdparty/libwebp/src/dsp/cpu.c
@@ -13,6 +13,11 @@
 
 #include "./dsp.h"
 
+#if defined(WEBP_HAVE_NEON_RTCD)
+#include <stdio.h>
+#include <string.h>
+#endif
+
 #if defined(WEBP_ANDROID_NEON)
 #include <cpu-features.h>
 #endif
@@ -142,13 +147,33 @@ VP8CPUInfo VP8GetCPUInfo = AndroidCPUInfo;
 // define a dummy function to enable turning off NEON at runtime by setting
 // VP8DecGetCPUInfo = NULL
 static int armCPUInfo(CPUFeature feature) {
-  (void)feature;
+  if (feature != kNEON) return 0;
+#if defined(__linux__) && defined(WEBP_HAVE_NEON_RTCD)
+  {
+    int has_neon = 0;
+    char line[200];
+    FILE* const cpuinfo = fopen("/proc/cpuinfo", "r");
+    if (cpuinfo == NULL) return 0;
+    while (fgets(line, sizeof(line), cpuinfo)) {
+      if (!strncmp(line, "Features", 8)) {
+        if (strstr(line, " neon ") != NULL) {
+          has_neon = 1;
+          break;
+        }
+      }
+    }
+    fclose(cpuinfo);
+    return has_neon;
+  }
+#else
   return 1;
+#endif
 }
 VP8CPUInfo VP8GetCPUInfo = armCPUInfo;
-#elif defined(WEBP_USE_MIPS32) || defined(WEBP_USE_MIPS_DSP_R2)
+#elif defined(WEBP_USE_MIPS32) || defined(WEBP_USE_MIPS_DSP_R2) || \
+      defined(WEBP_USE_MSA)
 static int mipsCPUInfo(CPUFeature feature) {
-  if ((feature == kMIPS32) || (feature == kMIPSdspR2)) {
+  if ((feature == kMIPS32) || (feature == kMIPSdspR2) || (feature == kMSA)) {
     return 1;
   } else {
     return 0;
diff --git a/src/3rdparty/libwebp/src/dsp/dec.c b/src/3rdparty/libwebp/src/dsp/dec.c
index a787206..e92d693 100644
--- a/src/3rdparty/libwebp/src/dsp/dec.c
+++ b/src/3rdparty/libwebp/src/dsp/dec.c
@@ -13,6 +13,7 @@
 
 #include "./dsp.h"
 #include "../dec/vp8i.h"
+#include "../utils/utils.h"
 
 //------------------------------------------------------------------------------
 
@@ -654,6 +655,23 @@ static void HFilter8i(uint8_t* u, uint8_t* v, int stride,
 
 //------------------------------------------------------------------------------
 
+static void DitherCombine8x8(const uint8_t* dither, uint8_t* dst,
+                             int dst_stride) {
+  int i, j;
+  for (j = 0; j < 8; ++j) {
+    for (i = 0; i < 8; ++i) {
+      const int delta0 = dither[i] - VP8_DITHER_AMP_CENTER;
+      const int delta1 =
+          (delta0 + VP8_DITHER_DESCALE_ROUNDER) >> VP8_DITHER_DESCALE;
+      dst[i] = clip_8b((int)dst[i] + delta1);
+    }
+    dst += dst_stride;
+    dither += 8;
+  }
+}
+
+//------------------------------------------------------------------------------
+
 VP8DecIdct2 VP8Transform;
 VP8DecIdct VP8TransformAC3;
 VP8DecIdct VP8TransformUV;
@@ -673,11 +691,15 @@ VP8SimpleFilterFunc VP8SimpleHFilter16;
 VP8SimpleFilterFunc VP8SimpleVFilter16i;
 VP8SimpleFilterFunc VP8SimpleHFilter16i;
 
+void (*VP8DitherCombine8x8)(const uint8_t* dither, uint8_t* dst,
+                            int dst_stride);
+
 extern void VP8DspInitSSE2(void);
 extern void VP8DspInitSSE41(void);
 extern void VP8DspInitNEON(void);
 extern void VP8DspInitMIPS32(void);
 extern void VP8DspInitMIPSdspR2(void);
+extern void VP8DspInitMSA(void);
 
 static volatile VP8CPUInfo dec_last_cpuinfo_used =
     (VP8CPUInfo)&dec_last_cpuinfo_used;
@@ -734,6 +756,8 @@ WEBP_TSAN_IGNORE_FUNCTION void VP8DspInit(void) {
   VP8PredChroma8[5] = DC8uvNoLeft;
   VP8PredChroma8[6] = DC8uvNoTopLeft;
 
+  VP8DitherCombine8x8 = DitherCombine8x8;
+
   // If defined, use CPUInfo() to overwrite some pointers with faster versions.
   if (VP8GetCPUInfo != NULL) {
 #if defined(WEBP_USE_SSE2)
@@ -761,6 +785,11 @@ WEBP_TSAN_IGNORE_FUNCTION void VP8DspInit(void) {
       VP8DspInitMIPSdspR2();
     }
 #endif
+#if defined(WEBP_USE_MSA)
+    if (VP8GetCPUInfo(kMSA)) {
+      VP8DspInitMSA();
+    }
+#endif
   }
   dec_last_cpuinfo_used = VP8GetCPUInfo;
 }
diff --git a/src/3rdparty/libwebp/src/dsp/dec_msa.c b/src/3rdparty/libwebp/src/dsp/dec_msa.c
new file mode 100644
index 0000000..f76055c
--- /dev/null
+++ b/src/3rdparty/libwebp/src/dsp/dec_msa.c
@@ -0,0 +1,172 @@
+// Copyright 2016 Google Inc. All Rights Reserved.
+//
+// Use of this source code is governed by a BSD-style license
+// that can be found in the COPYING file in the root of the source
+// tree. An additional intellectual property rights grant can be found
+// in the file PATENTS. All contributing project authors may
+// be found in the AUTHORS file in the root of the source tree.
+// -----------------------------------------------------------------------------
+//
+// MSA version of dsp functions
+//
+// Author(s):  Prashant Patil   (prashant.patil@imgtec.com)
+
+
+#include "./dsp.h"
+
+#if defined(WEBP_USE_MSA)
+
+#include "./msa_macro.h"
+
+//------------------------------------------------------------------------------
+// Transforms
+
+#define IDCT_1D_W(in0, in1, in2, in3, out0, out1, out2, out3) {  \
+  v4i32 a1_m, b1_m, c1_m, d1_m;                                  \
+  v4i32 c_tmp1_m, c_tmp2_m, d_tmp1_m, d_tmp2_m;                  \
+  const v4i32 cospi8sqrt2minus1 = __msa_fill_w(20091);           \
+  const v4i32 sinpi8sqrt2 = __msa_fill_w(35468);                 \
+                                                                 \
+  a1_m = in0 + in2;                                              \
+  b1_m = in0 - in2;                                              \
+  c_tmp1_m = (in1 * sinpi8sqrt2) >> 16;                          \
+  c_tmp2_m = in3 + ((in3 * cospi8sqrt2minus1) >> 16);            \
+  c1_m = c_tmp1_m - c_tmp2_m;                                    \
+  d_tmp1_m = in1 + ((in1 * cospi8sqrt2minus1) >> 16);            \
+  d_tmp2_m = (in3 * sinpi8sqrt2) >> 16;                          \
+  d1_m = d_tmp1_m + d_tmp2_m;                                    \
+  BUTTERFLY_4(a1_m, b1_m, c1_m, d1_m, out0, out1, out2, out3);   \
+}
+#define MULT1(a) ((((a) * 20091) >> 16) + (a))
+#define MULT2(a) (((a) * 35468) >> 16)
+
+static void TransformOne(const int16_t* in, uint8_t* dst) {
+  v8i16 input0, input1;
+  v4i32 in0, in1, in2, in3, hz0, hz1, hz2, hz3, vt0, vt1, vt2, vt3;
+  v4i32 res0, res1, res2, res3;
+  const v16i8 zero = { 0 };
+  v16i8 dest0, dest1, dest2, dest3;
+
+  LD_SH2(in, 8, input0, input1);
+  UNPCK_SH_SW(input0, in0, in1);
+  UNPCK_SH_SW(input1, in2, in3);
+  IDCT_1D_W(in0, in1, in2, in3, hz0, hz1, hz2, hz3);
+  TRANSPOSE4x4_SW_SW(hz0, hz1, hz2, hz3, hz0, hz1, hz2, hz3);
+  IDCT_1D_W(hz0, hz1, hz2, hz3, vt0, vt1, vt2, vt3);
+  SRARI_W4_SW(vt0, vt1, vt2, vt3, 3);
+  TRANSPOSE4x4_SW_SW(vt0, vt1, vt2, vt3, vt0, vt1, vt2, vt3);
+  LD_SB4(dst, BPS, dest0, dest1, dest2, dest3);
+  ILVR_B4_SW(zero, dest0, zero, dest1, zero, dest2, zero, dest3,
+             res0, res1, res2, res3);
+  ILVR_H4_SW(zero, res0, zero, res1, zero, res2, zero, res3,
+             res0, res1, res2, res3);
+  ADD4(res0, vt0, res1, vt1, res2, vt2, res3, vt3, res0, res1, res2, res3);
+  CLIP_SW4_0_255(res0, res1, res2, res3);
+  PCKEV_B2_SW(res0, res1, res2, res3, vt0, vt1);
+  res0 = (v4i32)__msa_pckev_b((v16i8)vt0, (v16i8)vt1);
+  ST4x4_UB(res0, res0, 3, 2, 1, 0, dst, BPS);
+}
+
+static void TransformTwo(const int16_t* in, uint8_t* dst, int do_two) {
+  TransformOne(in, dst);
+  if (do_two) {
+    TransformOne(in + 16, dst + 4);
+  }
+}
+
+static void TransformWHT(const int16_t* in, int16_t* out) {
+  v8i16 input0, input1;
+  const v8i16 mask0 = { 0, 1, 2, 3, 8, 9, 10, 11 };
+  const v8i16 mask1 = { 4, 5, 6, 7, 12, 13, 14, 15 };
+  const v8i16 mask2 = { 0, 4, 8, 12, 1, 5, 9, 13 };
+  const v8i16 mask3 = { 3, 7, 11, 15, 2, 6, 10, 14 };
+  v8i16 tmp0, tmp1, tmp2, tmp3;
+  v8i16 out0, out1;
+
+  LD_SH2(in, 8, input0, input1);
+  input1 = SLDI_SH(input1, input1, 8);
+  tmp0 = input0 + input1;
+  tmp1 = input0 - input1;
+  VSHF_H2_SH(tmp0, tmp1, tmp0, tmp1, mask0, mask1, tmp2, tmp3);
+  out0 = tmp2 + tmp3;
+  out1 = tmp2 - tmp3;
+  VSHF_H2_SH(out0, out1, out0, out1, mask2, mask3, input0, input1);
+  tmp0 = input0 + input1;
+  tmp1 = input0 - input1;
+  VSHF_H2_SH(tmp0, tmp1, tmp0, tmp1, mask0, mask1, tmp2, tmp3);
+  tmp0 = tmp2 + tmp3;
+  tmp1 = tmp2 - tmp3;
+  ADDVI_H2_SH(tmp0, 3, tmp1, 3, out0, out1);
+  SRAI_H2_SH(out0, out1, 3);
+  out[0] = __msa_copy_s_h(out0, 0);
+  out[16] = __msa_copy_s_h(out0, 4);
+  out[32] = __msa_copy_s_h(out1, 0);
+  out[48] = __msa_copy_s_h(out1, 4);
+  out[64] = __msa_copy_s_h(out0, 1);
+  out[80] = __msa_copy_s_h(out0, 5);
+  out[96] = __msa_copy_s_h(out1, 1);
+  out[112] = __msa_copy_s_h(out1, 5);
+  out[128] = __msa_copy_s_h(out0, 2);
+  out[144] = __msa_copy_s_h(out0, 6);
+  out[160] = __msa_copy_s_h(out1, 2);
+  out[176] = __msa_copy_s_h(out1, 6);
+  out[192] = __msa_copy_s_h(out0, 3);
+  out[208] = __msa_copy_s_h(out0, 7);
+  out[224] = __msa_copy_s_h(out1, 3);
+  out[240] = __msa_copy_s_h(out1, 7);
+}
+
+static void TransformDC(const int16_t* in, uint8_t* dst) {
+  const int DC = (in[0] + 4) >> 3;
+  const v8i16 tmp0 = __msa_fill_h(DC);
+  ADDBLK_ST4x4_UB(tmp0, tmp0, tmp0, tmp0, dst, BPS);
+}
+
+static void TransformAC3(const int16_t* in, uint8_t* dst) {
+  const int a = in[0] + 4;
+  const int c4 = MULT2(in[4]);
+  const int d4 = MULT1(in[4]);
+  const int in2 = MULT2(in[1]);
+  const int in3 = MULT1(in[1]);
+  v4i32 tmp0 = { 0 };
+  v4i32 out0 = __msa_fill_w(a + d4);
+  v4i32 out1 = __msa_fill_w(a + c4);
+  v4i32 out2 = __msa_fill_w(a - c4);
+  v4i32 out3 = __msa_fill_w(a - d4);
+  v4i32 res0, res1, res2, res3;
+  const v4i32 zero = { 0 };
+  v16u8 dest0, dest1, dest2, dest3;
+
+  INSERT_W4_SW(in3, in2, -in2, -in3, tmp0);
+  ADD4(out0, tmp0, out1, tmp0, out2, tmp0, out3, tmp0,
+       out0, out1, out2, out3);
+  SRAI_W4_SW(out0, out1, out2, out3, 3);
+  LD_UB4(dst, BPS, dest0, dest1, dest2, dest3);
+  ILVR_B4_SW(zero, dest0, zero, dest1, zero, dest2, zero, dest3,
+             res0, res1, res2, res3);
+  ILVR_H4_SW(zero, res0, zero, res1, zero, res2, zero, res3,
+             res0, res1, res2, res3);
+  ADD4(res0, out0, res1, out1, res2, out2, res3, out3, res0, res1, res2, res3);
+  CLIP_SW4_0_255(res0, res1, res2, res3);
+  PCKEV_B2_SW(res0, res1, res2, res3, out0, out1);
+  res0 = (v4i32)__msa_pckev_b((v16i8)out0, (v16i8)out1);
+  ST4x4_UB(res0, res0, 3, 2, 1, 0, dst, BPS);
+}
+
+//------------------------------------------------------------------------------
+// Entry point
+
+extern void VP8DspInitMSA(void);
+
+WEBP_TSAN_IGNORE_FUNCTION void VP8DspInitMSA(void) {
+  VP8TransformWHT = TransformWHT;
+  VP8Transform = TransformTwo;
+  VP8TransformDC = TransformDC;
+  VP8TransformAC3 = TransformAC3;
+}
+
+#else  // !WEBP_USE_MSA
+
+WEBP_DSP_INIT_STUB(VP8DspInitMSA)
+
+#endif  // WEBP_USE_MSA
diff --git a/src/3rdparty/libwebp/src/dsp/dec_sse2.c b/src/3rdparty/libwebp/src/dsp/dec_sse2.c
index 935bf02..f0a8ddc 100644
--- a/src/3rdparty/libwebp/src/dsp/dec_sse2.c
+++ b/src/3rdparty/libwebp/src/dsp/dec_sse2.c
@@ -21,7 +21,9 @@
 // #define USE_TRANSFORM_AC3
 
 #include <emmintrin.h>
+#include "./common_sse2.h"
 #include "../dec/vp8i.h"
+#include "../utils/utils.h"
 
 //------------------------------------------------------------------------------
 // Transforms (Paragraph 14.4)
@@ -102,34 +104,7 @@ static void Transform(const int16_t* in, uint8_t* dst, int do_two) {
     const __m128i tmp3 = _mm_sub_epi16(a, d);
 
     // Transpose the two 4x4.
-    // a00 a01 a02 a03   b00 b01 b02 b03
-    // a10 a11 a12 a13   b10 b11 b12 b13
-    // a20 a21 a22 a23   b20 b21 b22 b23
-    // a30 a31 a32 a33   b30 b31 b32 b33
-    const __m128i transpose0_0 = _mm_unpacklo_epi16(tmp0, tmp1);
-    const __m128i transpose0_1 = _mm_unpacklo_epi16(tmp2, tmp3);
-    const __m128i transpose0_2 = _mm_unpackhi_epi16(tmp0, tmp1);
-    const __m128i transpose0_3 = _mm_unpackhi_epi16(tmp2, tmp3);
-    // a00 a10 a01 a11   a02 a12 a03 a13
-    // a20 a30 a21 a31   a22 a32 a23 a33
-    // b00 b10 b01 b11   b02 b12 b03 b13
-    // b20 b30 b21 b31   b22 b32 b23 b33
-    const __m128i transpose1_0 = _mm_unpacklo_epi32(transpose0_0, transpose0_1);
-    const __m128i transpose1_1 = _mm_unpacklo_epi32(transpose0_2, transpose0_3);
-    const __m128i transpose1_2 = _mm_unpackhi_epi32(transpose0_0, transpose0_1);
-    const __m128i transpose1_3 = _mm_unpackhi_epi32(transpose0_2, transpose0_3);
-    // a00 a10 a20 a30 a01 a11 a21 a31
-    // b00 b10 b20 b30 b01 b11 b21 b31
-    // a02 a12 a22 a32 a03 a13 a23 a33
-    // b02 b12 a22 b32 b03 b13 b23 b33
-    T0 = _mm_unpacklo_epi64(transpose1_0, transpose1_1);
-    T1 = _mm_unpackhi_epi64(transpose1_0, transpose1_1);
-    T2 = _mm_unpacklo_epi64(transpose1_2, transpose1_3);
-    T3 = _mm_unpackhi_epi64(transpose1_2, transpose1_3);
-    // a00 a10 a20 a30   b00 b10 b20 b30
-    // a01 a11 a21 a31   b01 b11 b21 b31
-    // a02 a12 a22 a32   b02 b12 b22 b32
-    // a03 a13 a23 a33   b03 b13 b23 b33
+    VP8Transpose_2_4x4_16b(&tmp0, &tmp1, &tmp2, &tmp3, &T0, &T1, &T2, &T3);
   }
 
   // Horizontal pass and subsequent transpose.
@@ -164,34 +139,8 @@ static void Transform(const int16_t* in, uint8_t* dst, int do_two) {
     const __m128i shifted3 = _mm_srai_epi16(tmp3, 3);
 
     // Transpose the two 4x4.
-    // a00 a01 a02 a03   b00 b01 b02 b03
-    // a10 a11 a12 a13   b10 b11 b12 b13
-    // a20 a21 a22 a23   b20 b21 b22 b23
-    // a30 a31 a32 a33   b30 b31 b32 b33
-    const __m128i transpose0_0 = _mm_unpacklo_epi16(shifted0, shifted1);
-    const __m128i transpose0_1 = _mm_unpacklo_epi16(shifted2, shifted3);
-    const __m128i transpose0_2 = _mm_unpackhi_epi16(shifted0, shifted1);
-    const __m128i transpose0_3 = _mm_unpackhi_epi16(shifted2, shifted3);
-    // a00 a10 a01 a11   a02 a12 a03 a13
-    // a20 a30 a21 a31   a22 a32 a23 a33
-    // b00 b10 b01 b11   b02 b12 b03 b13
-    // b20 b30 b21 b31   b22 b32 b23 b33
-    const __m128i transpose1_0 = _mm_unpacklo_epi32(transpose0_0, transpose0_1);
-    const __m128i transpose1_1 = _mm_unpacklo_epi32(transpose0_2, transpose0_3);
-    const __m128i transpose1_2 = _mm_unpackhi_epi32(transpose0_0, transpose0_1);
-    const __m128i transpose1_3 = _mm_unpackhi_epi32(transpose0_2, transpose0_3);
-    // a00 a10 a20 a30 a01 a11 a21 a31
-    // b00 b10 b20 b30 b01 b11 b21 b31
-    // a02 a12 a22 a32 a03 a13 a23 a33
-    // b02 b12 a22 b32 b03 b13 b23 b33
-    T0 = _mm_unpacklo_epi64(transpose1_0, transpose1_1);
-    T1 = _mm_unpackhi_epi64(transpose1_0, transpose1_1);
-    T2 = _mm_unpacklo_epi64(transpose1_2, transpose1_3);
-    T3 = _mm_unpackhi_epi64(transpose1_2, transpose1_3);
-    // a00 a10 a20 a30   b00 b10 b20 b30
-    // a01 a11 a21 a31   b01 b11 b21 b31
-    // a02 a12 a22 a32   b02 b12 b22 b32
-    // a03 a13 a23 a33   b03 b13 b23 b33
+    VP8Transpose_2_4x4_16b(&shifted0, &shifted1, &shifted2, &shifted3, &T0, &T1,
+                        &T2, &T3);
   }
 
   // Add inverse transform to 'dst' and store.
diff --git a/src/3rdparty/libwebp/src/dsp/dec_sse41.c b/src/3rdparty/libwebp/src/dsp/dec_sse41.c
index 224c6f8..8d6aed1 100644
--- a/src/3rdparty/libwebp/src/dsp/dec_sse41.c
+++ b/src/3rdparty/libwebp/src/dsp/dec_sse41.c
@@ -17,6 +17,7 @@
 
 #include <smmintrin.h>
 #include "../dec/vp8i.h"
+#include "../utils/utils.h"
 
 static void HE16(uint8_t* dst) {     // horizontal
   int j;
diff --git a/src/3rdparty/libwebp/src/dsp/dsp.h b/src/3rdparty/libwebp/src/dsp/dsp.h
index 95f1ce0..1faac27 100644
--- a/src/3rdparty/libwebp/src/dsp/dsp.h
+++ b/src/3rdparty/libwebp/src/dsp/dsp.h
@@ -14,8 +14,11 @@
 #ifndef WEBP_DSP_DSP_H_
 #define WEBP_DSP_DSP_H_
 
+#ifdef HAVE_CONFIG_H
+#include "../webp/config.h"
+#endif
+
 #include "../webp/types.h"
-#include "../utils/utils.h"
 
 #ifdef __cplusplus
 extern "C" {
@@ -72,7 +75,8 @@ extern "C" {
 // The intrinsics currently cause compiler errors with arm-nacl-gcc and the
 // inline assembly would need to be modified for use with Native Client.
 #if (defined(__ARM_NEON__) || defined(WEBP_ANDROID_NEON) || \
-     defined(__aarch64__)) && !defined(__native_client__)
+     defined(__aarch64__) || defined(WEBP_HAVE_NEON)) && \
+    !defined(__native_client__)
 #define WEBP_USE_NEON
 #endif
 
@@ -92,6 +96,10 @@ extern "C" {
 #endif
 #endif
 
+#if defined(__mips_msa) && defined(__mips_isa_rev) && (__mips_isa_rev >= 5)
+#define WEBP_USE_MSA
+#endif
+
 // This macro prevents thread_sanitizer from reporting known concurrent writes.
 #define WEBP_TSAN_IGNORE_FUNCTION
 #if defined(__has_feature)
@@ -101,6 +109,27 @@ extern "C" {
 #endif
 #endif
 
+#define WEBP_UBSAN_IGNORE_UNDEF
+#define WEBP_UBSAN_IGNORE_UNSIGNED_OVERFLOW
+#if !defined(WEBP_FORCE_ALIGNED) && defined(__clang__) && \
+    defined(__has_attribute)
+#if __has_attribute(no_sanitize)
+// This macro prevents the undefined behavior sanitizer from reporting
+// failures. This is only meant to silence unaligned loads on platforms that
+// are known to support them.
+#undef WEBP_UBSAN_IGNORE_UNDEF
+#define WEBP_UBSAN_IGNORE_UNDEF \
+  __attribute__((no_sanitize("undefined")))
+
+// This macro prevents the undefined behavior sanitizer from reporting
+// failures related to unsigned integer overflows. This is only meant to
+// silence cases where this well defined behavior is expected.
+#undef WEBP_UBSAN_IGNORE_UNSIGNED_OVERFLOW
+#define WEBP_UBSAN_IGNORE_UNSIGNED_OVERFLOW \
+  __attribute__((no_sanitize("unsigned-integer-overflow")))
+#endif
+#endif
+
 typedef enum {
   kSSE2,
   kSSE3,
@@ -109,7 +138,8 @@ typedef enum {
   kAVX2,
   kNEON,
   kMIPS32,
-  kMIPSdspR2
+  kMIPSdspR2,
+  kMSA
 } CPUFeature;
 // returns true if the CPU supports the feature.
 typedef int (*VP8CPUInfo)(CPUFeature feature);
@@ -151,6 +181,8 @@ typedef int (*VP8Metric)(const uint8_t* pix, const uint8_t* ref);
 extern VP8Metric VP8SSE16x16, VP8SSE16x8, VP8SSE8x8, VP8SSE4x4;
 typedef int (*VP8WMetric)(const uint8_t* pix, const uint8_t* ref,
                           const uint16_t* const weights);
+// The weights for VP8TDisto4x4 and VP8TDisto16x16 contain a row-major
+// 4 by 4 symmetric matrix.
 extern VP8WMetric VP8TDisto4x4, VP8TDisto16x16;
 
 typedef void (*VP8BlockCopy)(const uint8_t* src, uint8_t* dst);
@@ -214,6 +246,35 @@ extern VP8GetResidualCostFunc VP8GetResidualCost;
 void VP8EncDspCostInit(void);
 
 //------------------------------------------------------------------------------
+// SSIM utils
+
+// struct for accumulating statistical moments
+typedef struct {
+  double w;              // sum(w_i) : sum of weights
+  double xm, ym;         // sum(w_i * x_i), sum(w_i * y_i)
+  double xxm, xym, yym;  // sum(w_i * x_i * x_i), etc.
+} VP8DistoStats;
+
+#define VP8_SSIM_KERNEL 3   // total size of the kernel: 2 * VP8_SSIM_KERNEL + 1
+typedef void (*VP8SSIMAccumulateClippedFunc)(const uint8_t* src1, int stride1,
+                                             const uint8_t* src2, int stride2,
+                                             int xo, int yo,  // center position
+                                             int W, int H,    // plane dimension
+                                             VP8DistoStats* const stats);
+
+// This version is called with the guarantee that you can load 8 bytes and
+// 8 rows at offset src1 and src2
+typedef void (*VP8SSIMAccumulateFunc)(const uint8_t* src1, int stride1,
+                                      const uint8_t* src2, int stride2,
+                                      VP8DistoStats* const stats);
+
+extern VP8SSIMAccumulateFunc VP8SSIMAccumulate;         // unclipped / unchecked
+extern VP8SSIMAccumulateClippedFunc VP8SSIMAccumulateClipped;   // with clipping
+
+// must be called before using any of the above directly
+void VP8SSIMDspInit(void);
+
+//------------------------------------------------------------------------------
 // Decoding
 
 typedef void (*VP8DecIdct)(const int16_t* coeffs, uint8_t* dst);
@@ -265,6 +326,15 @@ extern VP8LumaFilterFunc VP8HFilter16i;
 extern VP8ChromaFilterFunc VP8VFilter8i;  // filtering u and v altogether
 extern VP8ChromaFilterFunc VP8HFilter8i;
 
+// Dithering. Combines dithering values (centered around 128) with dst[],
+// according to: dst[] = clip(dst[] + (((dither[]-128) + 8) >> 4)
+#define VP8_DITHER_DESCALE 4
+#define VP8_DITHER_DESCALE_ROUNDER (1 << (VP8_DITHER_DESCALE - 1))
+#define VP8_DITHER_AMP_BITS 7
+#define VP8_DITHER_AMP_CENTER (1 << VP8_DITHER_AMP_BITS)
+extern void (*VP8DitherCombine8x8)(const uint8_t* dither, uint8_t* dst,
+                                   int dst_stride);
+
 // must be called before anything using the above
 void VP8DspInit(void);
 
@@ -472,8 +542,10 @@ typedef enum {     // Filter types.
 
 typedef void (*WebPFilterFunc)(const uint8_t* in, int width, int height,
                                int stride, uint8_t* out);
-typedef void (*WebPUnfilterFunc)(int width, int height, int stride,
-                                 int row, int num_rows, uint8_t* data);
+// In-place un-filtering.
+// Warning! 'prev_line' pointer can be equal to 'cur_line' or 'preds'.
+typedef void (*WebPUnfilterFunc)(const uint8_t* prev_line, const uint8_t* preds,
+                                 uint8_t* cur_line, int width);
 
 // Filter the given data using the given predictor.
 // 'in' corresponds to a 2-dimensional pixel array of size (stride * height)
diff --git a/src/3rdparty/libwebp/src/dsp/enc.c b/src/3rdparty/libwebp/src/dsp/enc.c
index 8899d50..f639f55 100644
--- a/src/3rdparty/libwebp/src/dsp/enc.c
+++ b/src/3rdparty/libwebp/src/dsp/enc.c
@@ -69,7 +69,7 @@ static void CollectHistogram(const uint8_t* ref, const uint8_t* pred,
 
     // Convert coefficients to bin.
     for (k = 0; k < 16; ++k) {
-      const int v = abs(out[k]) >> 3;  // TODO(skal): add rounding?
+      const int v = abs(out[k]) >> 3;
       const int clipped_value = clip_max(v, MAX_COEFF_THRESH);
       ++distribution[clipped_value];
     }
@@ -559,6 +559,7 @@ static int SSE4x4(const uint8_t* a, const uint8_t* b) {
 
 // Hadamard transform
 // Returns the weighted sum of the absolute value of transformed coefficients.
+// w[] contains a row-major 4 by 4 symmetric matrix.
 static int TTransform(const uint8_t* in, const uint16_t* w) {
   int sum = 0;
   int tmp[16];
@@ -636,7 +637,7 @@ static int QuantizeBlock(int16_t in[16], int16_t out[16],
       int level = QUANTDIV(coeff, iQ, B);
       if (level > MAX_LEVEL) level = MAX_LEVEL;
       if (sign) level = -level;
-      in[j] = level * Q;
+      in[j] = level * (int)Q;
       out[n] = level;
       if (level) last = n;
     } else {
@@ -670,7 +671,7 @@ static int QuantizeBlockWHT(int16_t in[16], int16_t out[16],
       int level = QUANTDIV(coeff, iQ, B);
       if (level > MAX_LEVEL) level = MAX_LEVEL;
       if (sign) level = -level;
-      in[j] = level * Q;
+      in[j] = level * (int)Q;
       out[n] = level;
       if (level) last = n;
     } else {
@@ -702,6 +703,68 @@ static void Copy16x8(const uint8_t* src, uint8_t* dst) {
 }
 
 //------------------------------------------------------------------------------
+
+static void SSIMAccumulateClipped(const uint8_t* src1, int stride1,
+                                  const uint8_t* src2, int stride2,
+                                  int xo, int yo, int W, int H,
+                                  VP8DistoStats* const stats) {
+  const int ymin = (yo - VP8_SSIM_KERNEL < 0) ? 0 : yo - VP8_SSIM_KERNEL;
+  const int ymax = (yo + VP8_SSIM_KERNEL > H - 1) ? H - 1
+                                                  : yo + VP8_SSIM_KERNEL;
+  const int xmin = (xo - VP8_SSIM_KERNEL < 0) ? 0 : xo - VP8_SSIM_KERNEL;
+  const int xmax = (xo + VP8_SSIM_KERNEL > W - 1) ? W - 1
+                                                  : xo + VP8_SSIM_KERNEL;
+  int x, y;
+  src1 += ymin * stride1;
+  src2 += ymin * stride2;
+  for (y = ymin; y <= ymax; ++y, src1 += stride1, src2 += stride2) {
+    for (x = xmin; x <= xmax; ++x) {
+      const int s1 = src1[x];
+      const int s2 = src2[x];
+      stats->w   += 1;
+      stats->xm  += s1;
+      stats->ym  += s2;
+      stats->xxm += s1 * s1;
+      stats->xym += s1 * s2;
+      stats->yym += s2 * s2;
+    }
+  }
+}
+
+static void SSIMAccumulate(const uint8_t* src1, int stride1,
+                           const uint8_t* src2, int stride2,
+                           VP8DistoStats* const stats) {
+  int x, y;
+  for (y = 0; y <= 2 * VP8_SSIM_KERNEL; ++y, src1 += stride1, src2 += stride2) {
+    for (x = 0; x <= 2 * VP8_SSIM_KERNEL; ++x) {
+      const int s1 = src1[x];
+      const int s2 = src2[x];
+      stats->w   += 1;
+      stats->xm  += s1;
+      stats->ym  += s2;
+      stats->xxm += s1 * s1;
+      stats->xym += s1 * s2;
+      stats->yym += s2 * s2;
+    }
+  }
+}
+
+VP8SSIMAccumulateFunc VP8SSIMAccumulate;
+VP8SSIMAccumulateClippedFunc VP8SSIMAccumulateClipped;
+
+static volatile VP8CPUInfo ssim_last_cpuinfo_used =
+    (VP8CPUInfo)&ssim_last_cpuinfo_used;
+
+WEBP_TSAN_IGNORE_FUNCTION void VP8SSIMDspInit(void) {
+  if (ssim_last_cpuinfo_used == VP8GetCPUInfo) return;
+
+  VP8SSIMAccumulate = SSIMAccumulate;
+  VP8SSIMAccumulateClipped = SSIMAccumulateClipped;
+
+  ssim_last_cpuinfo_used = VP8GetCPUInfo;
+}
+
+//------------------------------------------------------------------------------
 // Initialization
 
 // Speed-critical function pointers. We have to initialize them to the default
diff --git a/src/3rdparty/libwebp/src/dsp/enc_mips_dsp_r2.c b/src/3rdparty/libwebp/src/dsp/enc_mips_dsp_r2.c
index 7c814fa..7ab96f6 100644
--- a/src/3rdparty/libwebp/src/dsp/enc_mips_dsp_r2.c
+++ b/src/3rdparty/libwebp/src/dsp/enc_mips_dsp_r2.c
@@ -1393,8 +1393,6 @@ static void FTransformWHT(const int16_t* in, int16_t* out) {
   "absq_s.ph  %[temp1],  %[temp1]                      \n\t"                   \
   "absq_s.ph  %[temp2],  %[temp2]                      \n\t"                   \
   "absq_s.ph  %[temp3],  %[temp3]                      \n\t"                   \
-  /* TODO(skal): add rounding ? shra_r.ph : shra.ph */                         \
-  /*             for following 4 instructions       */                         \
   "shra.ph    %[temp0],  %[temp0],    3                \n\t"                   \
   "shra.ph    %[temp1],  %[temp1],    3                \n\t"                   \
   "shra.ph    %[temp2],  %[temp2],    3                \n\t"                   \
diff --git a/src/3rdparty/libwebp/src/dsp/enc_neon.c b/src/3rdparty/libwebp/src/dsp/enc_neon.c
index c2aef58..46f6bf9 100644
--- a/src/3rdparty/libwebp/src/dsp/enc_neon.c
+++ b/src/3rdparty/libwebp/src/dsp/enc_neon.c
@@ -560,21 +560,6 @@ static void FTransformWHT(const int16_t* src, int16_t* out) {
 // a 26ae, b 26ae
 // a 37bf, b 37bf
 //
-static WEBP_INLINE uint8x8x4_t DistoTranspose4x4U8(uint8x8x4_t d4_in) {
-  const uint8x8x2_t d2_tmp0 = vtrn_u8(d4_in.val[0], d4_in.val[1]);
-  const uint8x8x2_t d2_tmp1 = vtrn_u8(d4_in.val[2], d4_in.val[3]);
-  const uint16x4x2_t d2_tmp2 = vtrn_u16(vreinterpret_u16_u8(d2_tmp0.val[0]),
-                                        vreinterpret_u16_u8(d2_tmp1.val[0]));
-  const uint16x4x2_t d2_tmp3 = vtrn_u16(vreinterpret_u16_u8(d2_tmp0.val[1]),
-                                        vreinterpret_u16_u8(d2_tmp1.val[1]));
-
-  d4_in.val[0] = vreinterpret_u8_u16(d2_tmp2.val[0]);
-  d4_in.val[2] = vreinterpret_u8_u16(d2_tmp2.val[1]);
-  d4_in.val[1] = vreinterpret_u8_u16(d2_tmp3.val[0]);
-  d4_in.val[3] = vreinterpret_u8_u16(d2_tmp3.val[1]);
-  return d4_in;
-}
-
 static WEBP_INLINE int16x8x4_t DistoTranspose4x4S16(int16x8x4_t q4_in) {
   const int16x8x2_t q2_tmp0 = vtrnq_s16(q4_in.val[0], q4_in.val[1]);
   const int16x8x2_t q2_tmp1 = vtrnq_s16(q4_in.val[2], q4_in.val[3]);
@@ -589,41 +574,40 @@ static WEBP_INLINE int16x8x4_t DistoTranspose4x4S16(int16x8x4_t q4_in) {
   return q4_in;
 }
 
-static WEBP_INLINE int16x8x4_t DistoHorizontalPass(const uint8x8x4_t d4_in) {
+static WEBP_INLINE int16x8x4_t DistoHorizontalPass(const int16x8x4_t q4_in) {
   // {a0, a1} = {in[0] + in[2], in[1] + in[3]}
   // {a3, a2} = {in[0] - in[2], in[1] - in[3]}
-  const int16x8_t q_a0 = vreinterpretq_s16_u16(vaddl_u8(d4_in.val[0],
-                                                        d4_in.val[2]));
-  const int16x8_t q_a1 = vreinterpretq_s16_u16(vaddl_u8(d4_in.val[1],
-                                                        d4_in.val[3]));
-  const int16x8_t q_a3 = vreinterpretq_s16_u16(vsubl_u8(d4_in.val[0],
-                                                        d4_in.val[2]));
-  const int16x8_t q_a2 = vreinterpretq_s16_u16(vsubl_u8(d4_in.val[1],
-                                                        d4_in.val[3]));
+  const int16x8_t q_a0 = vaddq_s16(q4_in.val[0], q4_in.val[2]);
+  const int16x8_t q_a1 = vaddq_s16(q4_in.val[1], q4_in.val[3]);
+  const int16x8_t q_a3 = vsubq_s16(q4_in.val[0], q4_in.val[2]);
+  const int16x8_t q_a2 = vsubq_s16(q4_in.val[1], q4_in.val[3]);
   int16x8x4_t q4_out;
   // tmp[0] = a0 + a1
   // tmp[1] = a3 + a2
   // tmp[2] = a3 - a2
   // tmp[3] = a0 - a1
   INIT_VECTOR4(q4_out,
-               vaddq_s16(q_a0, q_a1), vaddq_s16(q_a3, q_a2),
-               vsubq_s16(q_a3, q_a2), vsubq_s16(q_a0, q_a1));
+               vabsq_s16(vaddq_s16(q_a0, q_a1)),
+               vabsq_s16(vaddq_s16(q_a3, q_a2)),
+               vabdq_s16(q_a3, q_a2), vabdq_s16(q_a0, q_a1));
   return q4_out;
 }
 
-static WEBP_INLINE int16x8x4_t DistoVerticalPass(int16x8x4_t q4_in) {
-  const int16x8_t q_a0 = vaddq_s16(q4_in.val[0], q4_in.val[2]);
-  const int16x8_t q_a1 = vaddq_s16(q4_in.val[1], q4_in.val[3]);
-  const int16x8_t q_a2 = vsubq_s16(q4_in.val[1], q4_in.val[3]);
-  const int16x8_t q_a3 = vsubq_s16(q4_in.val[0], q4_in.val[2]);
+static WEBP_INLINE int16x8x4_t DistoVerticalPass(const uint8x8x4_t q4_in) {
+  const int16x8_t q_a0 = vreinterpretq_s16_u16(vaddl_u8(q4_in.val[0],
+                                                        q4_in.val[2]));
+  const int16x8_t q_a1 = vreinterpretq_s16_u16(vaddl_u8(q4_in.val[1],
+                                                        q4_in.val[3]));
+  const int16x8_t q_a2 = vreinterpretq_s16_u16(vsubl_u8(q4_in.val[1],
+                                                        q4_in.val[3]));
+  const int16x8_t q_a3 = vreinterpretq_s16_u16(vsubl_u8(q4_in.val[0],
+                                                        q4_in.val[2]));
+  int16x8x4_t q4_out;
 
-  q4_in.val[0] = vaddq_s16(q_a0, q_a1);
-  q4_in.val[1] = vaddq_s16(q_a3, q_a2);
-  q4_in.val[2] = vabdq_s16(q_a3, q_a2);
-  q4_in.val[3] = vabdq_s16(q_a0, q_a1);
-  q4_in.val[0] = vabsq_s16(q4_in.val[0]);
-  q4_in.val[1] = vabsq_s16(q4_in.val[1]);
-  return q4_in;
+  INIT_VECTOR4(q4_out,
+               vaddq_s16(q_a0, q_a1), vaddq_s16(q_a3, q_a2),
+               vsubq_s16(q_a3, q_a2), vsubq_s16(q_a0, q_a1));
+  return q4_out;
 }
 
 static WEBP_INLINE int16x4x4_t DistoLoadW(const uint16_t* w) {
@@ -667,6 +651,7 @@ static WEBP_INLINE int32x2_t DistoSum(const int16x8x4_t q4_in,
 
 // Hadamard transform
 // Returns the weighted sum of the absolute value of transformed coefficients.
+// w[] contains a row-major 4 by 4 symmetric matrix.
 static int Disto4x4(const uint8_t* const a, const uint8_t* const b,
                     const uint16_t* const w) {
   uint32x2_t d_in_ab_0123 = vdup_n_u32(0);
@@ -691,18 +676,19 @@ static int Disto4x4(const uint8_t* const a, const uint8_t* const b,
                vreinterpret_u8_u32(d_in_ab_cdef));
 
   {
-    // horizontal pass
-    const uint8x8x4_t d4_t = DistoTranspose4x4U8(d4_in);
-    const int16x8x4_t q4_h = DistoHorizontalPass(d4_t);
+    // Vertical pass first to avoid a transpose (vertical and horizontal passes
+    // are commutative because w/kWeightY is symmetric) and subsequent
+    // transpose.
+    const int16x8x4_t q4_v = DistoVerticalPass(d4_in);
     const int16x4x4_t d4_w = DistoLoadW(w);
-    // vertical pass
-    const int16x8x4_t q4_t = DistoTranspose4x4S16(q4_h);
-    const int16x8x4_t q4_v = DistoVerticalPass(q4_t);
-    int32x2_t d_sum = DistoSum(q4_v, d4_w);
+    // horizontal pass
+    const int16x8x4_t q4_t = DistoTranspose4x4S16(q4_v);
+    const int16x8x4_t q4_h = DistoHorizontalPass(q4_t);
+    int32x2_t d_sum = DistoSum(q4_h, d4_w);
 
     // abs(sum2 - sum1) >> 5
     d_sum = vabs_s32(d_sum);
-    d_sum  = vshr_n_s32(d_sum, 5);
+    d_sum = vshr_n_s32(d_sum, 5);
     return vget_lane_s32(d_sum, 0);
   }
 }
diff --git a/src/3rdparty/libwebp/src/dsp/enc_sse2.c b/src/3rdparty/libwebp/src/dsp/enc_sse2.c
index 2333d2b..4a2e3ce 100644
--- a/src/3rdparty/libwebp/src/dsp/enc_sse2.c
+++ b/src/3rdparty/libwebp/src/dsp/enc_sse2.c
@@ -17,39 +17,11 @@
 #include <stdlib.h>  // for abs()
 #include <emmintrin.h>
 
+#include "./common_sse2.h"
 #include "../enc/cost.h"
 #include "../enc/vp8enci.h"
 
 //------------------------------------------------------------------------------
-// Quite useful macro for debugging. Left here for convenience.
-
-#if 0
-#include <stdio.h>
-static void PrintReg(const __m128i r, const char* const name, int size) {
-  int n;
-  union {
-    __m128i r;
-    uint8_t i8[16];
-    uint16_t i16[8];
-    uint32_t i32[4];
-    uint64_t i64[2];
-  } tmp;
-  tmp.r = r;
-  fprintf(stderr, "%s\t: ", name);
-  if (size == 8) {
-    for (n = 0; n < 16; ++n) fprintf(stderr, "%.2x ", tmp.i8[n]);
-  } else if (size == 16) {
-    for (n = 0; n < 8; ++n) fprintf(stderr, "%.4x ", tmp.i16[n]);
-  } else if (size == 32) {
-    for (n = 0; n < 4; ++n) fprintf(stderr, "%.8x ", tmp.i32[n]);
-  } else {
-    for (n = 0; n < 2; ++n) fprintf(stderr, "%.16lx ", tmp.i64[n]);
-  }
-  fprintf(stderr, "\n");
-}
-#endif
-
-//------------------------------------------------------------------------------
 // Transforms (Paragraph 14.4)
 
 // Does one or two inverse transforms.
@@ -131,34 +103,7 @@ static void ITransform(const uint8_t* ref, const int16_t* in, uint8_t* dst,
     const __m128i tmp3 = _mm_sub_epi16(a, d);
 
     // Transpose the two 4x4.
-    // a00 a01 a02 a03   b00 b01 b02 b03
-    // a10 a11 a12 a13   b10 b11 b12 b13
-    // a20 a21 a22 a23   b20 b21 b22 b23
-    // a30 a31 a32 a33   b30 b31 b32 b33
-    const __m128i transpose0_0 = _mm_unpacklo_epi16(tmp0, tmp1);
-    const __m128i transpose0_1 = _mm_unpacklo_epi16(tmp2, tmp3);
-    const __m128i transpose0_2 = _mm_unpackhi_epi16(tmp0, tmp1);
-    const __m128i transpose0_3 = _mm_unpackhi_epi16(tmp2, tmp3);
-    // a00 a10 a01 a11   a02 a12 a03 a13
-    // a20 a30 a21 a31   a22 a32 a23 a33
-    // b00 b10 b01 b11   b02 b12 b03 b13
-    // b20 b30 b21 b31   b22 b32 b23 b33
-    const __m128i transpose1_0 = _mm_unpacklo_epi32(transpose0_0, transpose0_1);
-    const __m128i transpose1_1 = _mm_unpacklo_epi32(transpose0_2, transpose0_3);
-    const __m128i transpose1_2 = _mm_unpackhi_epi32(transpose0_0, transpose0_1);
-    const __m128i transpose1_3 = _mm_unpackhi_epi32(transpose0_2, transpose0_3);
-    // a00 a10 a20 a30 a01 a11 a21 a31
-    // b00 b10 b20 b30 b01 b11 b21 b31
-    // a02 a12 a22 a32 a03 a13 a23 a33
-    // b02 b12 a22 b32 b03 b13 b23 b33
-    T0 = _mm_unpacklo_epi64(transpose1_0, transpose1_1);
-    T1 = _mm_unpackhi_epi64(transpose1_0, transpose1_1);
-    T2 = _mm_unpacklo_epi64(transpose1_2, transpose1_3);
-    T3 = _mm_unpackhi_epi64(transpose1_2, transpose1_3);
-    // a00 a10 a20 a30   b00 b10 b20 b30
-    // a01 a11 a21 a31   b01 b11 b21 b31
-    // a02 a12 a22 a32   b02 b12 b22 b32
-    // a03 a13 a23 a33   b03 b13 b23 b33
+    VP8Transpose_2_4x4_16b(&tmp0, &tmp1, &tmp2, &tmp3, &T0, &T1, &T2, &T3);
   }
 
   // Horizontal pass and subsequent transpose.
@@ -193,34 +138,8 @@ static void ITransform(const uint8_t* ref, const int16_t* in, uint8_t* dst,
     const __m128i shifted3 = _mm_srai_epi16(tmp3, 3);
 
     // Transpose the two 4x4.
-    // a00 a01 a02 a03   b00 b01 b02 b03
-    // a10 a11 a12 a13   b10 b11 b12 b13
-    // a20 a21 a22 a23   b20 b21 b22 b23
-    // a30 a31 a32 a33   b30 b31 b32 b33
-    const __m128i transpose0_0 = _mm_unpacklo_epi16(shifted0, shifted1);
-    const __m128i transpose0_1 = _mm_unpacklo_epi16(shifted2, shifted3);
-    const __m128i transpose0_2 = _mm_unpackhi_epi16(shifted0, shifted1);
-    const __m128i transpose0_3 = _mm_unpackhi_epi16(shifted2, shifted3);
-    // a00 a10 a01 a11   a02 a12 a03 a13
-    // a20 a30 a21 a31   a22 a32 a23 a33
-    // b00 b10 b01 b11   b02 b12 b03 b13
-    // b20 b30 b21 b31   b22 b32 b23 b33
-    const __m128i transpose1_0 = _mm_unpacklo_epi32(transpose0_0, transpose0_1);
-    const __m128i transpose1_1 = _mm_unpacklo_epi32(transpose0_2, transpose0_3);
-    const __m128i transpose1_2 = _mm_unpackhi_epi32(transpose0_0, transpose0_1);
-    const __m128i transpose1_3 = _mm_unpackhi_epi32(transpose0_2, transpose0_3);
-    // a00 a10 a20 a30 a01 a11 a21 a31
-    // b00 b10 b20 b30 b01 b11 b21 b31
-    // a02 a12 a22 a32 a03 a13 a23 a33
-    // b02 b12 a22 b32 b03 b13 b23 b33
-    T0 = _mm_unpacklo_epi64(transpose1_0, transpose1_1);
-    T1 = _mm_unpackhi_epi64(transpose1_0, transpose1_1);
-    T2 = _mm_unpacklo_epi64(transpose1_2, transpose1_3);
-    T3 = _mm_unpackhi_epi64(transpose1_2, transpose1_3);
-    // a00 a10 a20 a30   b00 b10 b20 b30
-    // a01 a11 a21 a31   b01 b11 b21 b31
-    // a02 a12 a22 a32   b02 b12 b22 b32
-    // a03 a13 a23 a33   b03 b13 b23 b33
+    VP8Transpose_2_4x4_16b(&shifted0, &shifted1, &shifted2, &shifted3, &T0, &T1,
+                        &T2, &T3);
   }
 
   // Add inverse transform to 'ref' and store.
@@ -373,42 +292,42 @@ static void FTransformPass2(const __m128i* const v01, const __m128i* const v32,
 
 static void FTransform(const uint8_t* src, const uint8_t* ref, int16_t* out) {
   const __m128i zero = _mm_setzero_si128();
-
-  // Load src and convert to 16b.
+  // Load src.
   const __m128i src0 = _mm_loadl_epi64((const __m128i*)&src[0 * BPS]);
   const __m128i src1 = _mm_loadl_epi64((const __m128i*)&src[1 * BPS]);
   const __m128i src2 = _mm_loadl_epi64((const __m128i*)&src[2 * BPS]);
   const __m128i src3 = _mm_loadl_epi64((const __m128i*)&src[3 * BPS]);
-  const __m128i src_0 = _mm_unpacklo_epi8(src0, zero);
-  const __m128i src_1 = _mm_unpacklo_epi8(src1, zero);
-  const __m128i src_2 = _mm_unpacklo_epi8(src2, zero);
-  const __m128i src_3 = _mm_unpacklo_epi8(src3, zero);
-  // Load ref and convert to 16b.
+  // 00 01 02 03 *
+  // 10 11 12 13 *
+  // 20 21 22 23 *
+  // 30 31 32 33 *
+  // Shuffle.
+  const __m128i src_0 = _mm_unpacklo_epi16(src0, src1);
+  const __m128i src_1 = _mm_unpacklo_epi16(src2, src3);
+  // 00 01 10 11 02 03 12 13 * * ...
+  // 20 21 30 31 22 22 32 33 * * ...
+
+  // Load ref.
   const __m128i ref0 = _mm_loadl_epi64((const __m128i*)&ref[0 * BPS]);
   const __m128i ref1 = _mm_loadl_epi64((const __m128i*)&ref[1 * BPS]);
   const __m128i ref2 = _mm_loadl_epi64((const __m128i*)&ref[2 * BPS]);
   const __m128i ref3 = _mm_loadl_epi64((const __m128i*)&ref[3 * BPS]);
-  const __m128i ref_0 = _mm_unpacklo_epi8(ref0, zero);
-  const __m128i ref_1 = _mm_unpacklo_epi8(ref1, zero);
-  const __m128i ref_2 = _mm_unpacklo_epi8(ref2, zero);
-  const __m128i ref_3 = _mm_unpacklo_epi8(ref3, zero);
-  // Compute difference. -> 00 01 02 03 00 00 00 00
-  const __m128i diff0 = _mm_sub_epi16(src_0, ref_0);
-  const __m128i diff1 = _mm_sub_epi16(src_1, ref_1);
-  const __m128i diff2 = _mm_sub_epi16(src_2, ref_2);
-  const __m128i diff3 = _mm_sub_epi16(src_3, ref_3);
-
-  // Unpack and shuffle
-  // 00 01 02 03   0 0 0 0
-  // 10 11 12 13   0 0 0 0
-  // 20 21 22 23   0 0 0 0
-  // 30 31 32 33   0 0 0 0
-  const __m128i shuf01 = _mm_unpacklo_epi32(diff0, diff1);
-  const __m128i shuf23 = _mm_unpacklo_epi32(diff2, diff3);
+  const __m128i ref_0 = _mm_unpacklo_epi16(ref0, ref1);
+  const __m128i ref_1 = _mm_unpacklo_epi16(ref2, ref3);
+
+  // Convert both to 16 bit.
+  const __m128i src_0_16b = _mm_unpacklo_epi8(src_0, zero);
+  const __m128i src_1_16b = _mm_unpacklo_epi8(src_1, zero);
+  const __m128i ref_0_16b = _mm_unpacklo_epi8(ref_0, zero);
+  const __m128i ref_1_16b = _mm_unpacklo_epi8(ref_1, zero);
+
+  // Compute the difference.
+  const __m128i row01 = _mm_sub_epi16(src_0_16b, ref_0_16b);
+  const __m128i row23 = _mm_sub_epi16(src_1_16b, ref_1_16b);
   __m128i v01, v32;
 
   // First pass
-  FTransformPass1(&shuf01, &shuf23, &v01, &v32);
+  FTransformPass1(&row01, &row23, &v01, &v32);
 
   // Second pass
   FTransformPass2(&v01, &v32, out);
@@ -463,8 +382,7 @@ static void FTransform2(const uint8_t* src, const uint8_t* ref, int16_t* out) {
 }
 
 static void FTransformWHTRow(const int16_t* const in, __m128i* const out) {
-  const __m128i kMult1 = _mm_set_epi16(0, 0, 0, 0, 1, 1, 1, 1);
-  const __m128i kMult2 = _mm_set_epi16(0, 0, 0, 0, -1, 1, -1, 1);
+  const __m128i kMult = _mm_set_epi16(-1, 1, -1, 1, 1, 1, 1, 1);
   const __m128i src0 = _mm_loadl_epi64((__m128i*)&in[0 * 16]);
   const __m128i src1 = _mm_loadl_epi64((__m128i*)&in[1 * 16]);
   const __m128i src2 = _mm_loadl_epi64((__m128i*)&in[2 * 16]);
@@ -473,33 +391,38 @@ static void FTransformWHTRow(const int16_t* const in, __m128i* const out) {
   const __m128i A23 = _mm_unpacklo_epi16(src2, src3);  // A2 A3 | ...
   const __m128i B0 = _mm_adds_epi16(A01, A23);    // a0 | a1 | ...
   const __m128i B1 = _mm_subs_epi16(A01, A23);    // a3 | a2 | ...
-  const __m128i C0 = _mm_unpacklo_epi32(B0, B1);  // a0 | a1 | a3 | a2
-  const __m128i C1 = _mm_unpacklo_epi32(B1, B0);  // a3 | a2 | a0 | a1
-  const __m128i D0 = _mm_madd_epi16(C0, kMult1);  // out0, out1
-  const __m128i D1 = _mm_madd_epi16(C1, kMult2);  // out2, out3
-  *out = _mm_unpacklo_epi64(D0, D1);
+  const __m128i C0 = _mm_unpacklo_epi32(B0, B1);  // a0 | a1 | a3 | a2 | ...
+  const __m128i C1 = _mm_unpacklo_epi32(B1, B0);  // a3 | a2 | a0 | a1 | ...
+  const __m128i D = _mm_unpacklo_epi64(C0, C1);   // a0 a1 a3 a2 a3 a2 a0 a1
+  *out = _mm_madd_epi16(D, kMult);
 }
 
 static void FTransformWHT(const int16_t* in, int16_t* out) {
+  // Input is 12b signed.
   __m128i row0, row1, row2, row3;
+  // Rows are 14b signed.
   FTransformWHTRow(in + 0 * 64, &row0);
   FTransformWHTRow(in + 1 * 64, &row1);
   FTransformWHTRow(in + 2 * 64, &row2);
   FTransformWHTRow(in + 3 * 64, &row3);
 
   {
+    // The a* are 15b signed.
     const __m128i a0 = _mm_add_epi32(row0, row2);
     const __m128i a1 = _mm_add_epi32(row1, row3);
     const __m128i a2 = _mm_sub_epi32(row1, row3);
     const __m128i a3 = _mm_sub_epi32(row0, row2);
-    const __m128i b0 = _mm_srai_epi32(_mm_add_epi32(a0, a1), 1);
-    const __m128i b1 = _mm_srai_epi32(_mm_add_epi32(a3, a2), 1);
-    const __m128i b2 = _mm_srai_epi32(_mm_sub_epi32(a3, a2), 1);
-    const __m128i b3 = _mm_srai_epi32(_mm_sub_epi32(a0, a1), 1);
-    const __m128i out0 = _mm_packs_epi32(b0, b1);
-    const __m128i out1 = _mm_packs_epi32(b2, b3);
-    _mm_storeu_si128((__m128i*)&out[0], out0);
-    _mm_storeu_si128((__m128i*)&out[8], out1);
+    const __m128i a0a3 = _mm_packs_epi32(a0, a3);
+    const __m128i a1a2 = _mm_packs_epi32(a1, a2);
+
+    // The b* are 16b signed.
+    const __m128i b0b1 = _mm_add_epi16(a0a3, a1a2);
+    const __m128i b3b2 = _mm_sub_epi16(a0a3, a1a2);
+    const __m128i tmp_b2b3 = _mm_unpackhi_epi64(b3b2, b3b2);
+    const __m128i b2b3 = _mm_unpacklo_epi64(tmp_b2b3, b3b2);
+
+    _mm_storeu_si128((__m128i*)&out[0], _mm_srai_epi16(b0b1, 1));
+    _mm_storeu_si128((__m128i*)&out[8], _mm_srai_epi16(b2b3, 1));
   }
 }
 
@@ -692,12 +615,10 @@ static WEBP_INLINE void TrueMotion(uint8_t* dst, const uint8_t* left,
 
 static WEBP_INLINE void DC8uv(uint8_t* dst, const uint8_t* left,
                               const uint8_t* top) {
-  const __m128i zero = _mm_setzero_si128();
   const __m128i top_values = _mm_loadl_epi64((const __m128i*)top);
   const __m128i left_values = _mm_loadl_epi64((const __m128i*)left);
-  const __m128i sum_top = _mm_sad_epu8(top_values, zero);
-  const __m128i sum_left = _mm_sad_epu8(left_values, zero);
-  const int DC = _mm_cvtsi128_si32(sum_top) + _mm_cvtsi128_si32(sum_left) + 8;
+  const __m128i combined = _mm_unpacklo_epi64(top_values, left_values);
+  const int DC = VP8HorizontalAdd8b(&combined) + 8;
   Put8x8uv(DC >> 4, dst);
 }
 
@@ -735,27 +656,16 @@ static WEBP_INLINE void DC8uvMode(uint8_t* dst, const uint8_t* left,
 
 static WEBP_INLINE void DC16(uint8_t* dst, const uint8_t* left,
                              const uint8_t* top) {
-  const __m128i zero = _mm_setzero_si128();
   const __m128i top_row = _mm_load_si128((const __m128i*)top);
   const __m128i left_row = _mm_load_si128((const __m128i*)left);
-  const __m128i sad8x2 = _mm_sad_epu8(top_row, zero);
-  // sum the two sads: sad8x2[0:1] + sad8x2[8:9]
-  const __m128i sum_top = _mm_add_epi16(sad8x2, _mm_shuffle_epi32(sad8x2, 2));
-  const __m128i sad8x2_left = _mm_sad_epu8(left_row, zero);
-  // sum the two sads: sad8x2[0:1] + sad8x2[8:9]
-  const __m128i sum_left =
-      _mm_add_epi16(sad8x2_left, _mm_shuffle_epi32(sad8x2_left, 2));
-  const int DC = _mm_cvtsi128_si32(sum_top) + _mm_cvtsi128_si32(sum_left) + 16;
+  const int DC =
+      VP8HorizontalAdd8b(&top_row) + VP8HorizontalAdd8b(&left_row) + 16;
   Put16(DC >> 5, dst);
 }
 
 static WEBP_INLINE void DC16NoLeft(uint8_t* dst, const uint8_t* top) {
-  const __m128i zero = _mm_setzero_si128();
   const __m128i top_row = _mm_load_si128((const __m128i*)top);
-  const __m128i sad8x2 = _mm_sad_epu8(top_row, zero);
-  // sum the two sads: sad8x2[0:1] + sad8x2[8:9]
-  const __m128i sum = _mm_add_epi16(sad8x2, _mm_shuffle_epi32(sad8x2, 2));
-  const int DC = _mm_cvtsi128_si32(sum) + 8;
+  const int DC = VP8HorizontalAdd8b(&top_row) + 8;
   Put16(DC >> 4, dst);
 }
 
@@ -1142,15 +1052,15 @@ static int SSE4x4(const uint8_t* a, const uint8_t* b) {
 // reconstructed samples.
 
 // Hadamard transform
-// Returns the difference between the weighted sum of the absolute value of
-// transformed coefficients.
+// Returns the weighted sum of the absolute value of transformed coefficients.
+// w[] contains a row-major 4 by 4 symmetric matrix.
 static int TTransform(const uint8_t* inA, const uint8_t* inB,
                       const uint16_t* const w) {
   int32_t sum[4];
   __m128i tmp_0, tmp_1, tmp_2, tmp_3;
   const __m128i zero = _mm_setzero_si128();
 
-  // Load, combine and transpose inputs.
+  // Load and combine inputs.
   {
     const __m128i inA_0 = _mm_loadl_epi64((const __m128i*)&inA[BPS * 0]);
     const __m128i inA_1 = _mm_loadl_epi64((const __m128i*)&inA[BPS * 1]);
@@ -1162,37 +1072,22 @@ static int TTransform(const uint8_t* inA, const uint8_t* inB,
     const __m128i inB_3 = _mm_loadl_epi64((const __m128i*)&inB[BPS * 3]);
 
     // Combine inA and inB (we'll do two transforms in parallel).
-    const __m128i inAB_0 = _mm_unpacklo_epi8(inA_0, inB_0);
-    const __m128i inAB_1 = _mm_unpacklo_epi8(inA_1, inB_1);
-    const __m128i inAB_2 = _mm_unpacklo_epi8(inA_2, inB_2);
-    const __m128i inAB_3 = _mm_unpacklo_epi8(inA_3, inB_3);
-    // a00 b00 a01 b01 a02 b03 a03 b03   0 0 0 0 0 0 0 0
-    // a10 b10 a11 b11 a12 b12 a13 b13   0 0 0 0 0 0 0 0
-    // a20 b20 a21 b21 a22 b22 a23 b23   0 0 0 0 0 0 0 0
-    // a30 b30 a31 b31 a32 b32 a33 b33   0 0 0 0 0 0 0 0
-
-    // Transpose the two 4x4, discarding the filling zeroes.
-    const __m128i transpose0_0 = _mm_unpacklo_epi8(inAB_0, inAB_2);
-    const __m128i transpose0_1 = _mm_unpacklo_epi8(inAB_1, inAB_3);
-    // a00 a20  b00 b20  a01 a21  b01 b21  a02 a22  b02 b22  a03 a23  b03 b23
-    // a10 a30  b10 b30  a11 a31  b11 b31  a12 a32  b12 b32  a13 a33  b13 b33
-    const __m128i transpose1_0 = _mm_unpacklo_epi8(transpose0_0, transpose0_1);
-    const __m128i transpose1_1 = _mm_unpackhi_epi8(transpose0_0, transpose0_1);
-    // a00 a10 a20 a30  b00 b10 b20 b30  a01 a11 a21 a31  b01 b11 b21 b31
-    // a02 a12 a22 a32  b02 b12 b22 b32  a03 a13 a23 a33  b03 b13 b23 b33
-
-    // Convert to 16b.
-    tmp_0 = _mm_unpacklo_epi8(transpose1_0, zero);
-    tmp_1 = _mm_unpackhi_epi8(transpose1_0, zero);
-    tmp_2 = _mm_unpacklo_epi8(transpose1_1, zero);
-    tmp_3 = _mm_unpackhi_epi8(transpose1_1, zero);
-    // a00 a10 a20 a30   b00 b10 b20 b30
-    // a01 a11 a21 a31   b01 b11 b21 b31
-    // a02 a12 a22 a32   b02 b12 b22 b32
-    // a03 a13 a23 a33   b03 b13 b23 b33
+    const __m128i inAB_0 = _mm_unpacklo_epi32(inA_0, inB_0);
+    const __m128i inAB_1 = _mm_unpacklo_epi32(inA_1, inB_1);
+    const __m128i inAB_2 = _mm_unpacklo_epi32(inA_2, inB_2);
+    const __m128i inAB_3 = _mm_unpacklo_epi32(inA_3, inB_3);
+    tmp_0 = _mm_unpacklo_epi8(inAB_0, zero);
+    tmp_1 = _mm_unpacklo_epi8(inAB_1, zero);
+    tmp_2 = _mm_unpacklo_epi8(inAB_2, zero);
+    tmp_3 = _mm_unpacklo_epi8(inAB_3, zero);
+    // a00 a01 a02 a03   b00 b01 b02 b03
+    // a10 a11 a12 a13   b10 b11 b12 b13
+    // a20 a21 a22 a23   b20 b21 b22 b23
+    // a30 a31 a32 a33   b30 b31 b32 b33
   }
 
-  // Horizontal pass and subsequent transpose.
+  // Vertical pass first to avoid a transpose (vertical and horizontal passes
+  // are commutative because w/kWeightY is symmetric) and subsequent transpose.
   {
     // Calculate a and b (two 4x4 at once).
     const __m128i a0 = _mm_add_epi16(tmp_0, tmp_2);
@@ -1209,33 +1104,10 @@ static int TTransform(const uint8_t* inA, const uint8_t* inB,
     // a30 a31 a32 a33   b30 b31 b32 b33
 
     // Transpose the two 4x4.
-    const __m128i transpose0_0 = _mm_unpacklo_epi16(b0, b1);
-    const __m128i transpose0_1 = _mm_unpacklo_epi16(b2, b3);
-    const __m128i transpose0_2 = _mm_unpackhi_epi16(b0, b1);
-    const __m128i transpose0_3 = _mm_unpackhi_epi16(b2, b3);
-    // a00 a10 a01 a11   a02 a12 a03 a13
-    // a20 a30 a21 a31   a22 a32 a23 a33
-    // b00 b10 b01 b11   b02 b12 b03 b13
-    // b20 b30 b21 b31   b22 b32 b23 b33
-    const __m128i transpose1_0 = _mm_unpacklo_epi32(transpose0_0, transpose0_1);
-    const __m128i transpose1_1 = _mm_unpacklo_epi32(transpose0_2, transpose0_3);
-    const __m128i transpose1_2 = _mm_unpackhi_epi32(transpose0_0, transpose0_1);
-    const __m128i transpose1_3 = _mm_unpackhi_epi32(transpose0_2, transpose0_3);
-    // a00 a10 a20 a30 a01 a11 a21 a31
-    // b00 b10 b20 b30 b01 b11 b21 b31
-    // a02 a12 a22 a32 a03 a13 a23 a33
-    // b02 b12 a22 b32 b03 b13 b23 b33
-    tmp_0 = _mm_unpacklo_epi64(transpose1_0, transpose1_1);
-    tmp_1 = _mm_unpackhi_epi64(transpose1_0, transpose1_1);
-    tmp_2 = _mm_unpacklo_epi64(transpose1_2, transpose1_3);
-    tmp_3 = _mm_unpackhi_epi64(transpose1_2, transpose1_3);
-    // a00 a10 a20 a30   b00 b10 b20 b30
-    // a01 a11 a21 a31   b01 b11 b21 b31
-    // a02 a12 a22 a32   b02 b12 b22 b32
-    // a03 a13 a23 a33   b03 b13 b23 b33
+    VP8Transpose_2_4x4_16b(&b0, &b1, &b2, &b3, &tmp_0, &tmp_1, &tmp_2, &tmp_3);
   }
 
-  // Vertical pass and difference of weighted sums.
+  // Horizontal pass and difference of weighted sums.
   {
     // Load all inputs.
     const __m128i w_0 = _mm_loadu_si128((const __m128i*)&w[0]);
diff --git a/src/3rdparty/libwebp/src/dsp/enc_sse41.c b/src/3rdparty/libwebp/src/dsp/enc_sse41.c
index 65c01ae..a178390 100644
--- a/src/3rdparty/libwebp/src/dsp/enc_sse41.c
+++ b/src/3rdparty/libwebp/src/dsp/enc_sse41.c
@@ -17,6 +17,7 @@
 #include <smmintrin.h>
 #include <stdlib.h>  // for abs()
 
+#include "./common_sse2.h"
 #include "../enc/vp8enci.h"
 
 //------------------------------------------------------------------------------
@@ -67,55 +68,45 @@ static void CollectHistogram(const uint8_t* ref, const uint8_t* pred,
 // reconstructed samples.
 
 // Hadamard transform
-// Returns the difference between the weighted sum of the absolute value of
-// transformed coefficients.
+// Returns the weighted sum of the absolute value of transformed coefficients.
+// w[] contains a row-major 4 by 4 symmetric matrix.
 static int TTransform(const uint8_t* inA, const uint8_t* inB,
                       const uint16_t* const w) {
+  int32_t sum[4];
   __m128i tmp_0, tmp_1, tmp_2, tmp_3;
 
-  // Load, combine and transpose inputs.
+  // Load and combine inputs.
   {
-    const __m128i inA_0 = _mm_loadl_epi64((const __m128i*)&inA[BPS * 0]);
-    const __m128i inA_1 = _mm_loadl_epi64((const __m128i*)&inA[BPS * 1]);
-    const __m128i inA_2 = _mm_loadl_epi64((const __m128i*)&inA[BPS * 2]);
+    const __m128i inA_0 = _mm_loadu_si128((const __m128i*)&inA[BPS * 0]);
+    const __m128i inA_1 = _mm_loadu_si128((const __m128i*)&inA[BPS * 1]);
+    const __m128i inA_2 = _mm_loadu_si128((const __m128i*)&inA[BPS * 2]);
+    // In SSE4.1, with gcc 4.8 at least (maybe other versions),
+    // _mm_loadu_si128 is faster than _mm_loadl_epi64. But for the last lump
+    // of inA and inB, _mm_loadl_epi64 is still used not to have an out of
+    // bound read.
     const __m128i inA_3 = _mm_loadl_epi64((const __m128i*)&inA[BPS * 3]);
-    const __m128i inB_0 = _mm_loadl_epi64((const __m128i*)&inB[BPS * 0]);
-    const __m128i inB_1 = _mm_loadl_epi64((const __m128i*)&inB[BPS * 1]);
-    const __m128i inB_2 = _mm_loadl_epi64((const __m128i*)&inB[BPS * 2]);
+    const __m128i inB_0 = _mm_loadu_si128((const __m128i*)&inB[BPS * 0]);
+    const __m128i inB_1 = _mm_loadu_si128((const __m128i*)&inB[BPS * 1]);
+    const __m128i inB_2 = _mm_loadu_si128((const __m128i*)&inB[BPS * 2]);
     const __m128i inB_3 = _mm_loadl_epi64((const __m128i*)&inB[BPS * 3]);
 
     // Combine inA and inB (we'll do two transforms in parallel).
-    const __m128i inAB_0 = _mm_unpacklo_epi8(inA_0, inB_0);
-    const __m128i inAB_1 = _mm_unpacklo_epi8(inA_1, inB_1);
-    const __m128i inAB_2 = _mm_unpacklo_epi8(inA_2, inB_2);
-    const __m128i inAB_3 = _mm_unpacklo_epi8(inA_3, inB_3);
-    // a00 b00 a01 b01 a02 b03 a03 b03   0 0 0 0 0 0 0 0
-    // a10 b10 a11 b11 a12 b12 a13 b13   0 0 0 0 0 0 0 0
-    // a20 b20 a21 b21 a22 b22 a23 b23   0 0 0 0 0 0 0 0
-    // a30 b30 a31 b31 a32 b32 a33 b33   0 0 0 0 0 0 0 0
-
-    // Transpose the two 4x4, discarding the filling zeroes.
-    const __m128i transpose0_0 = _mm_unpacklo_epi8(inAB_0, inAB_2);
-    const __m128i transpose0_1 = _mm_unpacklo_epi8(inAB_1, inAB_3);
-    // a00 a20  b00 b20  a01 a21  b01 b21  a02 a22  b02 b22  a03 a23  b03 b23
-    // a10 a30  b10 b30  a11 a31  b11 b31  a12 a32  b12 b32  a13 a33  b13 b33
-    const __m128i transpose1_0 = _mm_unpacklo_epi8(transpose0_0, transpose0_1);
-    const __m128i transpose1_1 = _mm_unpackhi_epi8(transpose0_0, transpose0_1);
-    // a00 a10 a20 a30  b00 b10 b20 b30  a01 a11 a21 a31  b01 b11 b21 b31
-    // a02 a12 a22 a32  b02 b12 b22 b32  a03 a13 a23 a33  b03 b13 b23 b33
-
-    // Convert to 16b.
-    tmp_0 = _mm_cvtepu8_epi16(transpose1_0);
-    tmp_1 = _mm_cvtepu8_epi16(_mm_srli_si128(transpose1_0, 8));
-    tmp_2 = _mm_cvtepu8_epi16(transpose1_1);
-    tmp_3 = _mm_cvtepu8_epi16(_mm_srli_si128(transpose1_1, 8));
-    // a00 a10 a20 a30   b00 b10 b20 b30
-    // a01 a11 a21 a31   b01 b11 b21 b31
-    // a02 a12 a22 a32   b02 b12 b22 b32
-    // a03 a13 a23 a33   b03 b13 b23 b33
+    const __m128i inAB_0 = _mm_unpacklo_epi32(inA_0, inB_0);
+    const __m128i inAB_1 = _mm_unpacklo_epi32(inA_1, inB_1);
+    const __m128i inAB_2 = _mm_unpacklo_epi32(inA_2, inB_2);
+    const __m128i inAB_3 = _mm_unpacklo_epi32(inA_3, inB_3);
+    tmp_0 = _mm_cvtepu8_epi16(inAB_0);
+    tmp_1 = _mm_cvtepu8_epi16(inAB_1);
+    tmp_2 = _mm_cvtepu8_epi16(inAB_2);
+    tmp_3 = _mm_cvtepu8_epi16(inAB_3);
+    // a00 a01 a02 a03   b00 b01 b02 b03
+    // a10 a11 a12 a13   b10 b11 b12 b13
+    // a20 a21 a22 a23   b20 b21 b22 b23
+    // a30 a31 a32 a33   b30 b31 b32 b33
   }
 
-  // Horizontal pass and subsequent transpose.
+  // Vertical pass first to avoid a transpose (vertical and horizontal passes
+  // are commutative because w/kWeightY is symmetric) and subsequent transpose.
   {
     // Calculate a and b (two 4x4 at once).
     const __m128i a0 = _mm_add_epi16(tmp_0, tmp_2);
@@ -132,33 +123,10 @@ static int TTransform(const uint8_t* inA, const uint8_t* inB,
     // a30 a31 a32 a33   b30 b31 b32 b33
 
     // Transpose the two 4x4.
-    const __m128i transpose0_0 = _mm_unpacklo_epi16(b0, b1);
-    const __m128i transpose0_1 = _mm_unpacklo_epi16(b2, b3);
-    const __m128i transpose0_2 = _mm_unpackhi_epi16(b0, b1);
-    const __m128i transpose0_3 = _mm_unpackhi_epi16(b2, b3);
-    // a00 a10 a01 a11   a02 a12 a03 a13
-    // a20 a30 a21 a31   a22 a32 a23 a33
-    // b00 b10 b01 b11   b02 b12 b03 b13
-    // b20 b30 b21 b31   b22 b32 b23 b33
-    const __m128i transpose1_0 = _mm_unpacklo_epi32(transpose0_0, transpose0_1);
-    const __m128i transpose1_1 = _mm_unpacklo_epi32(transpose0_2, transpose0_3);
-    const __m128i transpose1_2 = _mm_unpackhi_epi32(transpose0_0, transpose0_1);
-    const __m128i transpose1_3 = _mm_unpackhi_epi32(transpose0_2, transpose0_3);
-    // a00 a10 a20 a30 a01 a11 a21 a31
-    // b00 b10 b20 b30 b01 b11 b21 b31
-    // a02 a12 a22 a32 a03 a13 a23 a33
-    // b02 b12 a22 b32 b03 b13 b23 b33
-    tmp_0 = _mm_unpacklo_epi64(transpose1_0, transpose1_1);
-    tmp_1 = _mm_unpackhi_epi64(transpose1_0, transpose1_1);
-    tmp_2 = _mm_unpacklo_epi64(transpose1_2, transpose1_3);
-    tmp_3 = _mm_unpackhi_epi64(transpose1_2, transpose1_3);
-    // a00 a10 a20 a30   b00 b10 b20 b30
-    // a01 a11 a21 a31   b01 b11 b21 b31
-    // a02 a12 a22 a32   b02 b12 b22 b32
-    // a03 a13 a23 a33   b03 b13 b23 b33
+    VP8Transpose_2_4x4_16b(&b0, &b1, &b2, &b3, &tmp_0, &tmp_1, &tmp_2, &tmp_3);
   }
 
-  // Vertical pass and difference of weighted sums.
+  // Horizontal pass and difference of weighted sums.
   {
     // Load all inputs.
     const __m128i w_0 = _mm_loadu_si128((const __m128i*)&w[0]);
@@ -195,11 +163,9 @@ static int TTransform(const uint8_t* inA, const uint8_t* inB,
 
     // difference of weighted sums
     A_b2 = _mm_sub_epi32(A_b0, B_b0);
-    // cascading summation of the differences
-    B_b0 = _mm_hadd_epi32(A_b2, A_b2);
-    B_b2 = _mm_hadd_epi32(B_b0, B_b0);
-    return _mm_cvtsi128_si32(B_b2);
+    _mm_storeu_si128((__m128i*)&sum[0], A_b2);
   }
+  return sum[0] + sum[1] + sum[2] + sum[3];
 }
 
 static int Disto4x4(const uint8_t* const a, const uint8_t* const b,
diff --git a/src/3rdparty/libwebp/src/dsp/filters.c b/src/3rdparty/libwebp/src/dsp/filters.c
index 5c30f2e..9f04faf 100644
--- a/src/3rdparty/libwebp/src/dsp/filters.c
+++ b/src/3rdparty/libwebp/src/dsp/filters.c
@@ -184,19 +184,40 @@ static void GradientFilter(const uint8_t* data, int width, int height,
 
 //------------------------------------------------------------------------------
 
-static void VerticalUnfilter(int width, int height, int stride, int row,
-                             int num_rows, uint8_t* data) {
-  DoVerticalFilter(data, width, height, stride, row, num_rows, 1, data);
+static void HorizontalUnfilter(const uint8_t* prev, const uint8_t* in,
+                               uint8_t* out, int width) {
+  uint8_t pred = (prev == NULL) ? 0 : prev[0];
+  int i;
+  for (i = 0; i < width; ++i) {
+    out[i] = pred + in[i];
+    pred = out[i];
+  }
 }
 
-static void HorizontalUnfilter(int width, int height, int stride, int row,
-                               int num_rows, uint8_t* data) {
-  DoHorizontalFilter(data, width, height, stride, row, num_rows, 1, data);
+static void VerticalUnfilter(const uint8_t* prev, const uint8_t* in,
+                             uint8_t* out, int width) {
+  if (prev == NULL) {
+    HorizontalUnfilter(NULL, in, out, width);
+  } else {
+    int i;
+    for (i = 0; i < width; ++i) out[i] = prev[i] + in[i];
+  }
 }
 
-static void GradientUnfilter(int width, int height, int stride, int row,
-                             int num_rows, uint8_t* data) {
-  DoGradientFilter(data, width, height, stride, row, num_rows, 1, data);
+static void GradientUnfilter(const uint8_t* prev, const uint8_t* in,
+                             uint8_t* out, int width) {
+  if (prev == NULL) {
+    HorizontalUnfilter(NULL, in, out, width);
+  } else {
+    uint8_t top = prev[0], top_left = top, left = top;
+    int i;
+    for (i = 0; i < width; ++i) {
+      top = prev[i];  // need to read this first, in case prev==out
+      left = in[i] + GradientPredictor(left, top, top_left);
+      top_left = top;
+      out[i] = left;
+    }
+  }
 }
 
 //------------------------------------------------------------------------------
diff --git a/src/3rdparty/libwebp/src/dsp/filters_mips_dsp_r2.c b/src/3rdparty/libwebp/src/dsp/filters_mips_dsp_r2.c
index 8134af5..1d82e3c 100644
--- a/src/3rdparty/libwebp/src/dsp/filters_mips_dsp_r2.c
+++ b/src/3rdparty/libwebp/src/dsp/filters_mips_dsp_r2.c
@@ -33,10 +33,6 @@
   assert(row >= 0 && num_rows > 0 && row + num_rows <= height);                \
   (void)height;  // Silence unused warning.
 
-// if INVERSE
-//   preds == &dst[-1] == &src[-1]
-// else
-//   preds == &src[-1] != &dst[-1]
 #define DO_PREDICT_LINE(SRC, DST, LENGTH, INVERSE) do {                        \
     const uint8_t* psrc = (uint8_t*)(SRC);                                     \
     uint8_t* pdst = (uint8_t*)(DST);                                           \
@@ -45,27 +41,28 @@
     __asm__ volatile (                                                         \
       ".set      push                                   \n\t"                  \
       ".set      noreorder                              \n\t"                  \
-      "srl       %[temp0],    %[length],    0x2         \n\t"                  \
+      "srl       %[temp0],    %[length],    2           \n\t"                  \
       "beqz      %[temp0],    4f                        \n\t"                  \
-      " andi     %[temp6],    %[length],    0x3         \n\t"                  \
+      " andi     %[temp6],    %[length],    3           \n\t"                  \
     ".if " #INVERSE "                                   \n\t"                  \
-      "lbu       %[temp1],    -1(%[src])                \n\t"                  \
     "1:                                                 \n\t"                  \
+      "lbu       %[temp1],    -1(%[dst])                \n\t"                  \
       "lbu       %[temp2],    0(%[src])                 \n\t"                  \
       "lbu       %[temp3],    1(%[src])                 \n\t"                  \
       "lbu       %[temp4],    2(%[src])                 \n\t"                  \
       "lbu       %[temp5],    3(%[src])                 \n\t"                  \
+      "addu      %[temp1],    %[temp1],     %[temp2]    \n\t"                  \
+      "addu      %[temp2],    %[temp1],     %[temp3]    \n\t"                  \
+      "addu      %[temp3],    %[temp2],     %[temp4]    \n\t"                  \
+      "addu      %[temp4],    %[temp3],     %[temp5]    \n\t"                  \
+      "sb        %[temp1],    0(%[dst])                 \n\t"                  \
+      "sb        %[temp2],    1(%[dst])                 \n\t"                  \
+      "sb        %[temp3],    2(%[dst])                 \n\t"                  \
+      "sb        %[temp4],    3(%[dst])                 \n\t"                  \
       "addiu     %[src],      %[src],       4           \n\t"                  \
       "addiu     %[temp0],    %[temp0],     -1          \n\t"                  \
-      "addu      %[temp2],    %[temp2],     %[temp1]    \n\t"                  \
-      "addu      %[temp3],    %[temp3],     %[temp2]    \n\t"                  \
-      "addu      %[temp4],    %[temp4],     %[temp3]    \n\t"                  \
-      "addu      %[temp1],    %[temp5],     %[temp4]    \n\t"                  \
-      "sb        %[temp2],    -4(%[src])                \n\t"                  \
-      "sb        %[temp3],    -3(%[src])                \n\t"                  \
-      "sb        %[temp4],    -2(%[src])                \n\t"                  \
       "bnez      %[temp0],    1b                        \n\t"                  \
-      " sb       %[temp1],    -1(%[src])                \n\t"                  \
+      " addiu    %[dst],      %[dst],       4           \n\t"                  \
     ".else                                              \n\t"                  \
     "1:                                                 \n\t"                  \
       "ulw       %[temp1],    -1(%[src])                \n\t"                  \
@@ -81,16 +78,16 @@
       "beqz      %[temp6],    3f                        \n\t"                  \
       " nop                                             \n\t"                  \
     "2:                                                 \n\t"                  \
-      "lbu       %[temp1],    -1(%[src])                \n\t"                  \
       "lbu       %[temp2],    0(%[src])                 \n\t"                  \
-      "addiu     %[src],      %[src],       1           \n\t"                  \
     ".if " #INVERSE "                                   \n\t"                  \
+      "lbu       %[temp1],    -1(%[dst])                \n\t"                  \
       "addu      %[temp3],    %[temp1],     %[temp2]    \n\t"                  \
-      "sb        %[temp3],    -1(%[src])                \n\t"                  \
     ".else                                              \n\t"                  \
+      "lbu       %[temp1],    -1(%[src])                \n\t"                  \
       "subu      %[temp3],    %[temp1],     %[temp2]    \n\t"                  \
-      "sb        %[temp3],    0(%[dst])                 \n\t"                  \
     ".endif                                             \n\t"                  \
+      "addiu     %[src],      %[src],       1           \n\t"                  \
+      "sb        %[temp3],    0(%[dst])                 \n\t"                  \
       "addiu     %[temp6],    %[temp6],     -1          \n\t"                  \
       "bnez      %[temp6],    2b                        \n\t"                  \
       " addiu    %[dst],      %[dst],       1           \n\t"                  \
@@ -105,12 +102,8 @@
   } while (0)
 
 static WEBP_INLINE void PredictLine(const uint8_t* src, uint8_t* dst,
-                                    int length, int inverse) {
-  if (inverse) {
-    DO_PREDICT_LINE(src, dst, length, 1);
-  } else {
-    DO_PREDICT_LINE(src, dst, length, 0);
-  }
+                                    int length) {
+  DO_PREDICT_LINE(src, dst, length, 0);
 }
 
 #define DO_PREDICT_LINE_VERTICAL(SRC, PRED, DST, LENGTH, INVERSE) do {         \
@@ -172,16 +165,12 @@ static WEBP_INLINE void PredictLine(const uint8_t* src, uint8_t* dst,
     );                                                                         \
   } while (0)
 
-#define PREDICT_LINE_ONE_PASS(SRC, PRED, DST, INVERSE) do {                    \
+#define PREDICT_LINE_ONE_PASS(SRC, PRED, DST) do {                             \
     int temp1, temp2, temp3;                                                   \
     __asm__ volatile (                                                         \
       "lbu       %[temp1],   0(%[src])               \n\t"                     \
       "lbu       %[temp2],   0(%[pred])              \n\t"                     \
-    ".if " #INVERSE "                                \n\t"                     \
-      "addu      %[temp3],   %[temp1],   %[temp2]    \n\t"                     \
-    ".else                                           \n\t"                     \
       "subu      %[temp3],   %[temp1],   %[temp2]    \n\t"                     \
-    ".endif                                          \n\t"                     \
       "sb        %[temp3],   0(%[dst])               \n\t"                     \
       : [temp1]"=&r"(temp1), [temp2]"=&r"(temp2), [temp3]"=&r"(temp3)          \
       : [pred]"r"((PRED)), [dst]"r"((DST)), [src]"r"((SRC))                    \
@@ -192,10 +181,10 @@ static WEBP_INLINE void PredictLine(const uint8_t* src, uint8_t* dst,
 //------------------------------------------------------------------------------
 // Horizontal filter.
 
-#define FILTER_LINE_BY_LINE(INVERSE) do {                                      \
+#define FILTER_LINE_BY_LINE do {                                               \
     while (row < last_row) {                                                   \
-      PREDICT_LINE_ONE_PASS(in, preds - stride, out, INVERSE);                 \
-      DO_PREDICT_LINE(in + 1, out + 1, width - 1, INVERSE);                    \
+      PREDICT_LINE_ONE_PASS(in, preds - stride, out);                          \
+      DO_PREDICT_LINE(in + 1, out + 1, width - 1, 0);                          \
       ++row;                                                                   \
       preds += stride;                                                         \
       in += stride;                                                            \
@@ -206,19 +195,19 @@ static WEBP_INLINE void PredictLine(const uint8_t* src, uint8_t* dst,
 static WEBP_INLINE void DoHorizontalFilter(const uint8_t* in,
                                            int width, int height, int stride,
                                            int row, int num_rows,
-                                           int inverse, uint8_t* out) {
+                                           uint8_t* out) {
   const uint8_t* preds;
   const size_t start_offset = row * stride;
   const int last_row = row + num_rows;
   SANITY_CHECK(in, out);
   in += start_offset;
   out += start_offset;
-  preds = inverse ? out : in;
+  preds = in;
 
   if (row == 0) {
     // Leftmost pixel is the same as input for topmost scanline.
     out[0] = in[0];
-    PredictLine(in + 1, out + 1, width - 1, inverse);
+    PredictLine(in + 1, out + 1, width - 1);
     row = 1;
     preds += stride;
     in += stride;
@@ -226,31 +215,21 @@ static WEBP_INLINE void DoHorizontalFilter(const uint8_t* in,
   }
 
   // Filter line-by-line.
-  if (inverse) {
-    FILTER_LINE_BY_LINE(1);
-  } else {
-    FILTER_LINE_BY_LINE(0);
-  }
+  FILTER_LINE_BY_LINE;
 }
-
 #undef FILTER_LINE_BY_LINE
 
 static void HorizontalFilter(const uint8_t* data, int width, int height,
                              int stride, uint8_t* filtered_data) {
-  DoHorizontalFilter(data, width, height, stride, 0, height, 0, filtered_data);
-}
-
-static void HorizontalUnfilter(int width, int height, int stride, int row,
-                               int num_rows, uint8_t* data) {
-  DoHorizontalFilter(data, width, height, stride, row, num_rows, 1, data);
+  DoHorizontalFilter(data, width, height, stride, 0, height, filtered_data);
 }
 
 //------------------------------------------------------------------------------
 // Vertical filter.
 
-#define FILTER_LINE_BY_LINE(INVERSE) do {                                      \
+#define FILTER_LINE_BY_LINE do {                                               \
     while (row < last_row) {                                                   \
-      DO_PREDICT_LINE_VERTICAL(in, preds, out, width, INVERSE);                \
+      DO_PREDICT_LINE_VERTICAL(in, preds, out, width, 0);                      \
       ++row;                                                                   \
       preds += stride;                                                         \
       in += stride;                                                            \
@@ -260,21 +239,20 @@ static void HorizontalUnfilter(int width, int height, int stride, int row,
 
 static WEBP_INLINE void DoVerticalFilter(const uint8_t* in,
                                          int width, int height, int stride,
-                                         int row, int num_rows,
-                                         int inverse, uint8_t* out) {
+                                         int row, int num_rows, uint8_t* out) {
   const uint8_t* preds;
   const size_t start_offset = row * stride;
   const int last_row = row + num_rows;
   SANITY_CHECK(in, out);
   in += start_offset;
   out += start_offset;
-  preds = inverse ? out : in;
+  preds = in;
 
   if (row == 0) {
     // Very first top-left pixel is copied.
     out[0] = in[0];
     // Rest of top scan-line is left-predicted.
-    PredictLine(in + 1, out + 1, width - 1, inverse);
+    PredictLine(in + 1, out + 1, width - 1);
     row = 1;
     in += stride;
     out += stride;
@@ -284,24 +262,13 @@ static WEBP_INLINE void DoVerticalFilter(const uint8_t* in,
   }
 
   // Filter line-by-line.
-  if (inverse) {
-    FILTER_LINE_BY_LINE(1);
-  } else {
-    FILTER_LINE_BY_LINE(0);
-  }
+  FILTER_LINE_BY_LINE;
 }
-
 #undef FILTER_LINE_BY_LINE
-#undef DO_PREDICT_LINE_VERTICAL
 
 static void VerticalFilter(const uint8_t* data, int width, int height,
                            int stride, uint8_t* filtered_data) {
-  DoVerticalFilter(data, width, height, stride, 0, height, 0, filtered_data);
-}
-
-static void VerticalUnfilter(int width, int height, int stride, int row,
-                             int num_rows, uint8_t* data) {
-  DoVerticalFilter(data, width, height, stride, row, num_rows, 1, data);
+  DoVerticalFilter(data, width, height, stride, 0, height, filtered_data);
 }
 
 //------------------------------------------------------------------------------
@@ -321,10 +288,10 @@ static WEBP_INLINE int GradientPredictor(uint8_t a, uint8_t b, uint8_t c) {
   return temp0;
 }
 
-#define FILTER_LINE_BY_LINE(INVERSE, PREDS, OPERATION) do {                    \
+#define FILTER_LINE_BY_LINE(PREDS, OPERATION) do {                             \
     while (row < last_row) {                                                   \
       int w;                                                                   \
-      PREDICT_LINE_ONE_PASS(in, PREDS - stride, out, INVERSE);                 \
+      PREDICT_LINE_ONE_PASS(in, PREDS - stride, out);                          \
       for (w = 1; w < width; ++w) {                                            \
         const int pred = GradientPredictor(PREDS[w - 1],                       \
                                            PREDS[w - stride],                  \
@@ -339,20 +306,19 @@ static WEBP_INLINE int GradientPredictor(uint8_t a, uint8_t b, uint8_t c) {
 
 static WEBP_INLINE void DoGradientFilter(const uint8_t* in,
                                          int width, int height, int stride,
-                                         int row, int num_rows,
-                                         int inverse, uint8_t* out) {
+                                         int row, int num_rows, uint8_t* out) {
   const uint8_t* preds;
   const size_t start_offset = row * stride;
   const int last_row = row + num_rows;
   SANITY_CHECK(in, out);
   in += start_offset;
   out += start_offset;
-  preds = inverse ? out : in;
+  preds = in;
 
   // left prediction for top scan-line
   if (row == 0) {
     out[0] = in[0];
-    PredictLine(in + 1, out + 1, width - 1, inverse);
+    PredictLine(in + 1, out + 1, width - 1);
     row = 1;
     preds += stride;
     in += stride;
@@ -360,25 +326,49 @@ static WEBP_INLINE void DoGradientFilter(const uint8_t* in,
   }
 
   // Filter line-by-line.
-  if (inverse) {
-    FILTER_LINE_BY_LINE(1, out, +);
-  } else {
-    FILTER_LINE_BY_LINE(0, in, -);
-  }
+  FILTER_LINE_BY_LINE(in, -);
 }
-
 #undef FILTER_LINE_BY_LINE
 
 static void GradientFilter(const uint8_t* data, int width, int height,
                            int stride, uint8_t* filtered_data) {
-  DoGradientFilter(data, width, height, stride, 0, height, 0, filtered_data);
+  DoGradientFilter(data, width, height, stride, 0, height, filtered_data);
 }
 
-static void GradientUnfilter(int width, int height, int stride, int row,
-                             int num_rows, uint8_t* data) {
-  DoGradientFilter(data, width, height, stride, row, num_rows, 1, data);
+//------------------------------------------------------------------------------
+
+static void HorizontalUnfilter(const uint8_t* prev, const uint8_t* in,
+                               uint8_t* out, int width) {
+ out[0] = in[0] + (prev == NULL ? 0 : prev[0]);
+ DO_PREDICT_LINE(in + 1, out + 1, width - 1, 1);
 }
 
+static void VerticalUnfilter(const uint8_t* prev, const uint8_t* in,
+                             uint8_t* out, int width) {
+  if (prev == NULL) {
+    HorizontalUnfilter(NULL, in, out, width);
+  } else {
+    DO_PREDICT_LINE_VERTICAL(in, prev, out, width, 1);
+  }
+}
+
+static void GradientUnfilter(const uint8_t* prev, const uint8_t* in,
+                             uint8_t* out, int width) {
+  if (prev == NULL) {
+    HorizontalUnfilter(NULL, in, out, width);
+  } else {
+    uint8_t top = prev[0], top_left = top, left = top;
+    int i;
+    for (i = 0; i < width; ++i) {
+      top = prev[i];  // need to read this first, in case prev==dst
+      left = in[i] + GradientPredictor(left, top, top_left);
+      top_left = top;
+      out[i] = left;
+    }
+  }
+}
+
+#undef DO_PREDICT_LINE_VERTICAL
 #undef PREDICT_LINE_ONE_PASS
 #undef DO_PREDICT_LINE
 #undef SANITY_CHECK
@@ -389,13 +379,13 @@ static void GradientUnfilter(int width, int height, int stride, int row,
 extern void VP8FiltersInitMIPSdspR2(void);
 
 WEBP_TSAN_IGNORE_FUNCTION void VP8FiltersInitMIPSdspR2(void) {
-  WebPFilters[WEBP_FILTER_HORIZONTAL] = HorizontalFilter;
-  WebPFilters[WEBP_FILTER_VERTICAL] = VerticalFilter;
-  WebPFilters[WEBP_FILTER_GRADIENT] = GradientFilter;
-
   WebPUnfilters[WEBP_FILTER_HORIZONTAL] = HorizontalUnfilter;
   WebPUnfilters[WEBP_FILTER_VERTICAL] = VerticalUnfilter;
   WebPUnfilters[WEBP_FILTER_GRADIENT] = GradientUnfilter;
+
+  WebPFilters[WEBP_FILTER_HORIZONTAL] = HorizontalFilter;
+  WebPFilters[WEBP_FILTER_VERTICAL] = VerticalFilter;
+  WebPFilters[WEBP_FILTER_GRADIENT] = GradientFilter;
 }
 
 #else  // !WEBP_USE_MIPS_DSP_R2
diff --git a/src/3rdparty/libwebp/src/dsp/filters_sse2.c b/src/3rdparty/libwebp/src/dsp/filters_sse2.c
index bf93342..67f7799 100644
--- a/src/3rdparty/libwebp/src/dsp/filters_sse2.c
+++ b/src/3rdparty/libwebp/src/dsp/filters_sse2.c
@@ -33,82 +33,39 @@
   (void)height;  // Silence unused warning.
 
 static void PredictLineTop(const uint8_t* src, const uint8_t* pred,
-                           uint8_t* dst, int length, int inverse) {
+                           uint8_t* dst, int length) {
   int i;
   const int max_pos = length & ~31;
   assert(length >= 0);
-  if (inverse) {
-    for (i = 0; i < max_pos; i += 32) {
-      const __m128i A0 = _mm_loadu_si128((const __m128i*)&src[i +  0]);
-      const __m128i A1 = _mm_loadu_si128((const __m128i*)&src[i + 16]);
-      const __m128i B0 = _mm_loadu_si128((const __m128i*)&pred[i +  0]);
-      const __m128i B1 = _mm_loadu_si128((const __m128i*)&pred[i + 16]);
-      const __m128i C0 = _mm_add_epi8(A0, B0);
-      const __m128i C1 = _mm_add_epi8(A1, B1);
-      _mm_storeu_si128((__m128i*)&dst[i +  0], C0);
-      _mm_storeu_si128((__m128i*)&dst[i + 16], C1);
-    }
-    for (; i < length; ++i) dst[i] = src[i] + pred[i];
-  } else {
-    for (i = 0; i < max_pos; i += 32) {
-      const __m128i A0 = _mm_loadu_si128((const __m128i*)&src[i +  0]);
-      const __m128i A1 = _mm_loadu_si128((const __m128i*)&src[i + 16]);
-      const __m128i B0 = _mm_loadu_si128((const __m128i*)&pred[i +  0]);
-      const __m128i B1 = _mm_loadu_si128((const __m128i*)&pred[i + 16]);
-      const __m128i C0 = _mm_sub_epi8(A0, B0);
-      const __m128i C1 = _mm_sub_epi8(A1, B1);
-      _mm_storeu_si128((__m128i*)&dst[i +  0], C0);
-      _mm_storeu_si128((__m128i*)&dst[i + 16], C1);
-    }
-    for (; i < length; ++i) dst[i] = src[i] - pred[i];
+  for (i = 0; i < max_pos; i += 32) {
+    const __m128i A0 = _mm_loadu_si128((const __m128i*)&src[i +  0]);
+    const __m128i A1 = _mm_loadu_si128((const __m128i*)&src[i + 16]);
+    const __m128i B0 = _mm_loadu_si128((const __m128i*)&pred[i +  0]);
+    const __m128i B1 = _mm_loadu_si128((const __m128i*)&pred[i + 16]);
+    const __m128i C0 = _mm_sub_epi8(A0, B0);
+    const __m128i C1 = _mm_sub_epi8(A1, B1);
+    _mm_storeu_si128((__m128i*)&dst[i +  0], C0);
+    _mm_storeu_si128((__m128i*)&dst[i + 16], C1);
   }
+  for (; i < length; ++i) dst[i] = src[i] - pred[i];
 }
 
 // Special case for left-based prediction (when preds==dst-1 or preds==src-1).
-static void PredictLineLeft(const uint8_t* src, uint8_t* dst, int length,
-                            int inverse) {
+static void PredictLineLeft(const uint8_t* src, uint8_t* dst, int length) {
   int i;
-  if (length <= 0) return;
-  if (inverse) {
-    const int max_pos = length & ~7;
-    __m128i last = _mm_set_epi32(0, 0, 0, dst[-1]);
-    for (i = 0; i < max_pos; i += 8) {
-      const __m128i A0 = _mm_loadl_epi64((const __m128i*)(src + i));
-      const __m128i A1 = _mm_add_epi8(A0, last);
-      const __m128i A2 = _mm_slli_si128(A1, 1);
-      const __m128i A3 = _mm_add_epi8(A1, A2);
-      const __m128i A4 = _mm_slli_si128(A3, 2);
-      const __m128i A5 = _mm_add_epi8(A3, A4);
-      const __m128i A6 = _mm_slli_si128(A5, 4);
-      const __m128i A7 = _mm_add_epi8(A5, A6);
-      _mm_storel_epi64((__m128i*)(dst + i), A7);
-      last = _mm_srli_epi64(A7, 56);
-    }
-    for (; i < length; ++i) dst[i] = src[i] + dst[i - 1];
-  } else {
-    const int max_pos = length & ~31;
-    for (i = 0; i < max_pos; i += 32) {
-      const __m128i A0 = _mm_loadu_si128((const __m128i*)(src + i +  0    ));
-      const __m128i B0 = _mm_loadu_si128((const __m128i*)(src + i +  0 - 1));
-      const __m128i A1 = _mm_loadu_si128((const __m128i*)(src + i + 16    ));
-      const __m128i B1 = _mm_loadu_si128((const __m128i*)(src + i + 16 - 1));
-      const __m128i C0 = _mm_sub_epi8(A0, B0);
-      const __m128i C1 = _mm_sub_epi8(A1, B1);
-      _mm_storeu_si128((__m128i*)(dst + i +  0), C0);
-      _mm_storeu_si128((__m128i*)(dst + i + 16), C1);
-    }
-    for (; i < length; ++i) dst[i] = src[i] - src[i - 1];
-  }
-}
-
-static void PredictLineC(const uint8_t* src, const uint8_t* pred,
-                         uint8_t* dst, int length, int inverse) {
-  int i;
-  if (inverse) {
-    for (i = 0; i < length; ++i) dst[i] = src[i] + pred[i];
-  } else {
-    for (i = 0; i < length; ++i) dst[i] = src[i] - pred[i];
+  const int max_pos = length & ~31;
+  assert(length >= 0);
+  for (i = 0; i < max_pos; i += 32) {
+    const __m128i A0 = _mm_loadu_si128((const __m128i*)(src + i +  0    ));
+    const __m128i B0 = _mm_loadu_si128((const __m128i*)(src + i +  0 - 1));
+    const __m128i A1 = _mm_loadu_si128((const __m128i*)(src + i + 16    ));
+    const __m128i B1 = _mm_loadu_si128((const __m128i*)(src + i + 16 - 1));
+    const __m128i C0 = _mm_sub_epi8(A0, B0);
+    const __m128i C1 = _mm_sub_epi8(A1, B1);
+    _mm_storeu_si128((__m128i*)(dst + i +  0), C0);
+    _mm_storeu_si128((__m128i*)(dst + i + 16), C1);
   }
+  for (; i < length; ++i) dst[i] = src[i] - src[i - 1];
 }
 
 //------------------------------------------------------------------------------
@@ -117,21 +74,18 @@ static void PredictLineC(const uint8_t* src, const uint8_t* pred,
 static WEBP_INLINE void DoHorizontalFilter(const uint8_t* in,
                                            int width, int height, int stride,
                                            int row, int num_rows,
-                                           int inverse, uint8_t* out) {
-  const uint8_t* preds;
+                                           uint8_t* out) {
   const size_t start_offset = row * stride;
   const int last_row = row + num_rows;
   SANITY_CHECK(in, out);
   in += start_offset;
   out += start_offset;
-  preds = inverse ? out : in;
 
   if (row == 0) {
     // Leftmost pixel is the same as input for topmost scanline.
     out[0] = in[0];
-    PredictLineLeft(in + 1, out + 1, width - 1, inverse);
+    PredictLineLeft(in + 1, out + 1, width - 1);
     row = 1;
-    preds += stride;
     in += stride;
     out += stride;
   }
@@ -139,10 +93,9 @@ static WEBP_INLINE void DoHorizontalFilter(const uint8_t* in,
   // Filter line-by-line.
   while (row < last_row) {
     // Leftmost pixel is predicted from above.
-    PredictLineC(in, preds - stride, out, 1, inverse);
-    PredictLineLeft(in + 1, out + 1, width - 1, inverse);
+    out[0] = in[0] - in[-stride];
+    PredictLineLeft(in + 1, out + 1, width - 1);
     ++row;
-    preds += stride;
     in += stride;
     out += stride;
   }
@@ -153,34 +106,27 @@ static WEBP_INLINE void DoHorizontalFilter(const uint8_t* in,
 
 static WEBP_INLINE void DoVerticalFilter(const uint8_t* in,
                                          int width, int height, int stride,
-                                         int row, int num_rows,
-                                         int inverse, uint8_t* out) {
-  const uint8_t* preds;
+                                         int row, int num_rows, uint8_t* out) {
   const size_t start_offset = row * stride;
   const int last_row = row + num_rows;
   SANITY_CHECK(in, out);
   in += start_offset;
   out += start_offset;
-  preds = inverse ? out : in;
 
   if (row == 0) {
     // Very first top-left pixel is copied.
     out[0] = in[0];
     // Rest of top scan-line is left-predicted.
-    PredictLineLeft(in + 1, out + 1, width - 1, inverse);
+    PredictLineLeft(in + 1, out + 1, width - 1);
     row = 1;
     in += stride;
     out += stride;
-  } else {
-    // We are starting from in-between. Make sure 'preds' points to prev row.
-    preds -= stride;
   }
 
   // Filter line-by-line.
   while (row < last_row) {
-    PredictLineTop(in, preds, out, width, inverse);
+    PredictLineTop(in, in - stride, out, width);
     ++row;
-    preds += stride;
     in += stride;
     out += stride;
   }
@@ -219,49 +165,10 @@ static void GradientPredictDirect(const uint8_t* const row,
   }
 }
 
-static void GradientPredictInverse(const uint8_t* const in,
-                                   const uint8_t* const top,
-                                   uint8_t* const row, int length) {
-  if (length > 0) {
-    int i;
-    const int max_pos = length & ~7;
-    const __m128i zero = _mm_setzero_si128();
-    __m128i A = _mm_set_epi32(0, 0, 0, row[-1]);   // left sample
-    for (i = 0; i < max_pos; i += 8) {
-      const __m128i tmp0 = _mm_loadl_epi64((const __m128i*)&top[i]);
-      const __m128i tmp1 = _mm_loadl_epi64((const __m128i*)&top[i - 1]);
-      const __m128i B = _mm_unpacklo_epi8(tmp0, zero);
-      const __m128i C = _mm_unpacklo_epi8(tmp1, zero);
-      const __m128i tmp2 = _mm_loadl_epi64((const __m128i*)&in[i]);
-      const __m128i D = _mm_unpacklo_epi8(tmp2, zero);   // base input
-      const __m128i E = _mm_sub_epi16(B, C);  // unclipped gradient basis B - C
-      __m128i out = zero;                     // accumulator for output
-      __m128i mask_hi = _mm_set_epi32(0, 0, 0, 0xff);
-      int k = 8;
-      while (1) {
-        const __m128i tmp3 = _mm_add_epi16(A, E);        // delta = A + B - C
-        const __m128i tmp4 = _mm_min_epi16(tmp3, mask_hi);
-        const __m128i tmp5 = _mm_max_epi16(tmp4, zero);  // clipped delta
-        const __m128i tmp6 = _mm_add_epi16(tmp5, D);     // add to in[] values
-        A = _mm_and_si128(tmp6, mask_hi);                // 1-complement clip
-        out = _mm_or_si128(out, A);                      // accumulate output
-        if (--k == 0) break;
-        A = _mm_slli_si128(A, 2);                        // rotate left sample
-        mask_hi = _mm_slli_si128(mask_hi, 2);            // rotate mask
-      }
-      A = _mm_srli_si128(A, 14);       // prepare left sample for next iteration
-      _mm_storel_epi64((__m128i*)&row[i], _mm_packus_epi16(out, zero));
-    }
-    for (; i < length; ++i) {
-      row[i] = in[i] + GradientPredictorC(row[i - 1], top[i], top[i - 1]);
-    }
-  }
-}
-
 static WEBP_INLINE void DoGradientFilter(const uint8_t* in,
                                          int width, int height, int stride,
                                          int row, int num_rows,
-                                         int inverse, uint8_t* out) {
+                                         uint8_t* out) {
   const size_t start_offset = row * stride;
   const int last_row = row + num_rows;
   SANITY_CHECK(in, out);
@@ -271,7 +178,7 @@ static WEBP_INLINE void DoGradientFilter(const uint8_t* in,
   // left prediction for top scan-line
   if (row == 0) {
     out[0] = in[0];
-    PredictLineLeft(in + 1, out + 1, width - 1, inverse);
+    PredictLineLeft(in + 1, out + 1, width - 1);
     row = 1;
     in += stride;
     out += stride;
@@ -279,13 +186,8 @@ static WEBP_INLINE void DoGradientFilter(const uint8_t* in,
 
   // Filter line-by-line.
   while (row < last_row) {
-    if (inverse) {
-      PredictLineC(in, out - stride, out, 1, inverse);  // predict from above
-      GradientPredictInverse(in + 1, out + 1 - stride, out + 1, width - 1);
-    } else {
-      PredictLineC(in, in - stride, out, 1, inverse);
-      GradientPredictDirect(in + 1, in + 1 - stride, out + 1, width - 1);
-    }
+    out[0] = in[0] - in[-stride];
+    GradientPredictDirect(in + 1, in + 1 - stride, out + 1, width - 1);
     ++row;
     in += stride;
     out += stride;
@@ -298,36 +200,112 @@ static WEBP_INLINE void DoGradientFilter(const uint8_t* in,
 
 static void HorizontalFilter(const uint8_t* data, int width, int height,
                              int stride, uint8_t* filtered_data) {
-  DoHorizontalFilter(data, width, height, stride, 0, height, 0, filtered_data);
+  DoHorizontalFilter(data, width, height, stride, 0, height, filtered_data);
 }
 
 static void VerticalFilter(const uint8_t* data, int width, int height,
                            int stride, uint8_t* filtered_data) {
-  DoVerticalFilter(data, width, height, stride, 0, height, 0, filtered_data);
+  DoVerticalFilter(data, width, height, stride, 0, height, filtered_data);
 }
 
-
 static void GradientFilter(const uint8_t* data, int width, int height,
                            int stride, uint8_t* filtered_data) {
-  DoGradientFilter(data, width, height, stride, 0, height, 0, filtered_data);
+  DoGradientFilter(data, width, height, stride, 0, height, filtered_data);
 }
 
-
 //------------------------------------------------------------------------------
+// Inverse transforms
 
-static void VerticalUnfilter(int width, int height, int stride, int row,
-                             int num_rows, uint8_t* data) {
-  DoVerticalFilter(data, width, height, stride, row, num_rows, 1, data);
+static void HorizontalUnfilter(const uint8_t* prev, const uint8_t* in,
+                               uint8_t* out, int width) {
+  int i;
+  __m128i last;
+  out[0] = in[0] + (prev == NULL ? 0 : prev[0]);
+  if (width <= 1) return;
+  last = _mm_set_epi32(0, 0, 0, out[0]);
+  for (i = 1; i + 8 <= width; i += 8) {
+    const __m128i A0 = _mm_loadl_epi64((const __m128i*)(in + i));
+    const __m128i A1 = _mm_add_epi8(A0, last);
+    const __m128i A2 = _mm_slli_si128(A1, 1);
+    const __m128i A3 = _mm_add_epi8(A1, A2);
+    const __m128i A4 = _mm_slli_si128(A3, 2);
+    const __m128i A5 = _mm_add_epi8(A3, A4);
+    const __m128i A6 = _mm_slli_si128(A5, 4);
+    const __m128i A7 = _mm_add_epi8(A5, A6);
+    _mm_storel_epi64((__m128i*)(out + i), A7);
+    last = _mm_srli_epi64(A7, 56);
+  }
+  for (; i < width; ++i) out[i] = in[i] + out[i - 1];
 }
 
-static void HorizontalUnfilter(int width, int height, int stride, int row,
-                               int num_rows, uint8_t* data) {
-  DoHorizontalFilter(data, width, height, stride, row, num_rows, 1, data);
+static void VerticalUnfilter(const uint8_t* prev, const uint8_t* in,
+                             uint8_t* out, int width) {
+  if (prev == NULL) {
+    HorizontalUnfilter(NULL, in, out, width);
+  } else {
+    int i;
+    const int max_pos = width & ~31;
+    assert(width >= 0);
+    for (i = 0; i < max_pos; i += 32) {
+      const __m128i A0 = _mm_loadu_si128((const __m128i*)&in[i +  0]);
+      const __m128i A1 = _mm_loadu_si128((const __m128i*)&in[i + 16]);
+      const __m128i B0 = _mm_loadu_si128((const __m128i*)&prev[i +  0]);
+      const __m128i B1 = _mm_loadu_si128((const __m128i*)&prev[i + 16]);
+      const __m128i C0 = _mm_add_epi8(A0, B0);
+      const __m128i C1 = _mm_add_epi8(A1, B1);
+      _mm_storeu_si128((__m128i*)&out[i +  0], C0);
+      _mm_storeu_si128((__m128i*)&out[i + 16], C1);
+    }
+    for (; i < width; ++i) out[i] = in[i] + prev[i];
+  }
 }
 
-static void GradientUnfilter(int width, int height, int stride, int row,
-                             int num_rows, uint8_t* data) {
-  DoGradientFilter(data, width, height, stride, row, num_rows, 1, data);
+static void GradientPredictInverse(const uint8_t* const in,
+                                   const uint8_t* const top,
+                                   uint8_t* const row, int length) {
+  if (length > 0) {
+    int i;
+    const int max_pos = length & ~7;
+    const __m128i zero = _mm_setzero_si128();
+    __m128i A = _mm_set_epi32(0, 0, 0, row[-1]);   // left sample
+    for (i = 0; i < max_pos; i += 8) {
+      const __m128i tmp0 = _mm_loadl_epi64((const __m128i*)&top[i]);
+      const __m128i tmp1 = _mm_loadl_epi64((const __m128i*)&top[i - 1]);
+      const __m128i B = _mm_unpacklo_epi8(tmp0, zero);
+      const __m128i C = _mm_unpacklo_epi8(tmp1, zero);
+      const __m128i D = _mm_loadl_epi64((const __m128i*)&in[i]);  // base input
+      const __m128i E = _mm_sub_epi16(B, C);  // unclipped gradient basis B - C
+      __m128i out = zero;                     // accumulator for output
+      __m128i mask_hi = _mm_set_epi32(0, 0, 0, 0xff);
+      int k = 8;
+      while (1) {
+        const __m128i tmp3 = _mm_add_epi16(A, E);           // delta = A + B - C
+        const __m128i tmp4 = _mm_packus_epi16(tmp3, zero);  // saturate delta
+        const __m128i tmp5 = _mm_add_epi8(tmp4, D);         // add to in[]
+        A = _mm_and_si128(tmp5, mask_hi);                   // 1-complement clip
+        out = _mm_or_si128(out, A);                         // accumulate output
+        if (--k == 0) break;
+        A = _mm_slli_si128(A, 1);                        // rotate left sample
+        mask_hi = _mm_slli_si128(mask_hi, 1);            // rotate mask
+        A = _mm_unpacklo_epi8(A, zero);                  // convert 8b->16b
+      }
+      A = _mm_srli_si128(A, 7);       // prepare left sample for next iteration
+      _mm_storel_epi64((__m128i*)&row[i], out);
+    }
+    for (; i < length; ++i) {
+      row[i] = in[i] + GradientPredictorC(row[i - 1], top[i], top[i - 1]);
+    }
+  }
+}
+
+static void GradientUnfilter(const uint8_t* prev, const uint8_t* in,
+                             uint8_t* out, int width) {
+  if (prev == NULL) {
+    HorizontalUnfilter(NULL, in, out, width);
+  } else {
+    out[0] = in[0] + prev[0];  // predict from above
+    GradientPredictInverse(in + 1, prev + 1, out + 1, width - 1);
+  }
 }
 
 //------------------------------------------------------------------------------
diff --git a/src/3rdparty/libwebp/src/dsp/lossless.c b/src/3rdparty/libwebp/src/dsp/lossless.c
index 71ae9d4..af913ef 100644
--- a/src/3rdparty/libwebp/src/dsp/lossless.c
+++ b/src/3rdparty/libwebp/src/dsp/lossless.c
@@ -28,9 +28,7 @@
 
 // In-place sum of each component with mod 256.
 static WEBP_INLINE void AddPixelsEq(uint32_t* a, uint32_t b) {
-  const uint32_t alpha_and_green = (*a & 0xff00ff00u) + (b & 0xff00ff00u);
-  const uint32_t red_and_blue = (*a & 0x00ff00ffu) + (b & 0x00ff00ffu);
-  *a = (alpha_and_green & 0xff00ff00u) | (red_and_blue & 0x00ff00ffu);
+  *a = VP8LAddPixels(*a, b);
 }
 
 static WEBP_INLINE uint32_t Average2(uint32_t a0, uint32_t a1) {
diff --git a/src/3rdparty/libwebp/src/dsp/lossless.h b/src/3rdparty/libwebp/src/dsp/lossless.h
index e063bdd..9f0d7a2 100644
--- a/src/3rdparty/libwebp/src/dsp/lossless.h
+++ b/src/3rdparty/libwebp/src/dsp/lossless.h
@@ -158,7 +158,8 @@ void VP8LCollectColorBlueTransforms_C(const uint32_t* argb, int stride,
 
 void VP8LResidualImage(int width, int height, int bits, int low_effort,
                        uint32_t* const argb, uint32_t* const argb_scratch,
-                       uint32_t* const image, int exact);
+                       uint32_t* const image, int near_lossless, int exact,
+                       int used_subtract_green);
 
 void VP8LColorSpaceTransform(int width, int height, int bits, int quality,
                              uint32_t* const argb, uint32_t* image);
@@ -172,6 +173,17 @@ static WEBP_INLINE uint32_t VP8LSubSampleSize(uint32_t size,
   return (size + (1 << sampling_bits) - 1) >> sampling_bits;
 }
 
+// Converts near lossless quality into max number of bits shaved off.
+static WEBP_INLINE int VP8LNearLosslessBits(int near_lossless_quality) {
+  //    100 -> 0
+  // 80..99 -> 1
+  // 60..79 -> 2
+  // 40..59 -> 3
+  // 20..39 -> 4
+  //  0..19 -> 5
+  return 5 - near_lossless_quality / 20;
+}
+
 // -----------------------------------------------------------------------------
 // Faster logarithm for integers. Small values use a look-up table.
 
@@ -262,6 +274,11 @@ extern VP8LHistogramAddFunc VP8LHistogramAdd;
 // -----------------------------------------------------------------------------
 // PrefixEncode()
 
+typedef int (*VP8LVectorMismatchFunc)(const uint32_t* const array1,
+                                      const uint32_t* const array2, int length);
+// Returns the first index where array1 and array2 are different.
+extern VP8LVectorMismatchFunc VP8LVectorMismatch;
+
 static WEBP_INLINE int VP8LBitsLog2Ceiling(uint32_t n) {
   const int log_floor = BitsLog2Floor(n);
   if (n == (n & ~(n - 1)))  // zero or a power of two.
@@ -324,7 +341,14 @@ static WEBP_INLINE void VP8LPrefixEncode(int distance, int* const code,
   }
 }
 
-// In-place difference of each component with mod 256.
+// Sum of each component, mod 256.
+static WEBP_INLINE uint32_t VP8LAddPixels(uint32_t a, uint32_t b) {
+  const uint32_t alpha_and_green = (a & 0xff00ff00u) + (b & 0xff00ff00u);
+  const uint32_t red_and_blue = (a & 0x00ff00ffu) + (b & 0x00ff00ffu);
+  return (alpha_and_green & 0xff00ff00u) | (red_and_blue & 0x00ff00ffu);
+}
+
+// Difference of each component, mod 256.
 static WEBP_INLINE uint32_t VP8LSubPixels(uint32_t a, uint32_t b) {
   const uint32_t alpha_and_green =
       0x00ff00ffu + (a & 0xff00ff00u) - (b & 0xff00ff00u);
diff --git a/src/3rdparty/libwebp/src/dsp/lossless_enc.c b/src/3rdparty/libwebp/src/dsp/lossless_enc.c
index 2eafa3d..256f6f5 100644
--- a/src/3rdparty/libwebp/src/dsp/lossless_enc.c
+++ b/src/3rdparty/libwebp/src/dsp/lossless_enc.c
@@ -382,6 +382,7 @@ static float FastLog2Slow(uint32_t v) {
 
 // Mostly used to reduce code size + readability
 static WEBP_INLINE int GetMin(int a, int b) { return (a > b) ? b : a; }
+static WEBP_INLINE int GetMax(int a, int b) { return (a < b) ? b : a; }
 
 //------------------------------------------------------------------------------
 // Methods to calculate Entropy (Shannon).
@@ -551,18 +552,204 @@ static WEBP_INLINE uint32_t Predict(VP8LPredictorFunc pred_func,
   }
 }
 
+static int MaxDiffBetweenPixels(uint32_t p1, uint32_t p2) {
+  const int diff_a = abs((int)(p1 >> 24) - (int)(p2 >> 24));
+  const int diff_r = abs((int)((p1 >> 16) & 0xff) - (int)((p2 >> 16) & 0xff));
+  const int diff_g = abs((int)((p1 >> 8) & 0xff) - (int)((p2 >> 8) & 0xff));
+  const int diff_b = abs((int)(p1 & 0xff) - (int)(p2 & 0xff));
+  return GetMax(GetMax(diff_a, diff_r), GetMax(diff_g, diff_b));
+}
+
+static int MaxDiffAroundPixel(uint32_t current, uint32_t up, uint32_t down,
+                              uint32_t left, uint32_t right) {
+  const int diff_up = MaxDiffBetweenPixels(current, up);
+  const int diff_down = MaxDiffBetweenPixels(current, down);
+  const int diff_left = MaxDiffBetweenPixels(current, left);
+  const int diff_right = MaxDiffBetweenPixels(current, right);
+  return GetMax(GetMax(diff_up, diff_down), GetMax(diff_left, diff_right));
+}
+
+static uint32_t AddGreenToBlueAndRed(uint32_t argb) {
+  const uint32_t green = (argb >> 8) & 0xff;
+  uint32_t red_blue = argb & 0x00ff00ffu;
+  red_blue += (green << 16) | green;
+  red_blue &= 0x00ff00ffu;
+  return (argb & 0xff00ff00u) | red_blue;
+}
+
+static void MaxDiffsForRow(int width, int stride, const uint32_t* const argb,
+                           uint8_t* const max_diffs, int used_subtract_green) {
+  uint32_t current, up, down, left, right;
+  int x;
+  if (width <= 2) return;
+  current = argb[0];
+  right = argb[1];
+  if (used_subtract_green) {
+    current = AddGreenToBlueAndRed(current);
+    right = AddGreenToBlueAndRed(right);
+  }
+  // max_diffs[0] and max_diffs[width - 1] are never used.
+  for (x = 1; x < width - 1; ++x) {
+    up = argb[-stride + x];
+    down = argb[stride + x];
+    left = current;
+    current = right;
+    right = argb[x + 1];
+    if (used_subtract_green) {
+      up = AddGreenToBlueAndRed(up);
+      down = AddGreenToBlueAndRed(down);
+      right = AddGreenToBlueAndRed(right);
+    }
+    max_diffs[x] = MaxDiffAroundPixel(current, up, down, left, right);
+  }
+}
+
+// Quantize the difference between the actual component value and its prediction
+// to a multiple of quantization, working modulo 256, taking care not to cross
+// a boundary (inclusive upper limit).
+static uint8_t NearLosslessComponent(uint8_t value, uint8_t predict,
+                                     uint8_t boundary, int quantization) {
+  const int residual = (value - predict) & 0xff;
+  const int boundary_residual = (boundary - predict) & 0xff;
+  const int lower = residual & ~(quantization - 1);
+  const int upper = lower + quantization;
+  // Resolve ties towards a value closer to the prediction (i.e. towards lower
+  // if value comes after prediction and towards upper otherwise).
+  const int bias = ((boundary - value) & 0xff) < boundary_residual;
+  if (residual - lower < upper - residual + bias) {
+    // lower is closer to residual than upper.
+    if (residual > boundary_residual && lower <= boundary_residual) {
+      // Halve quantization step to avoid crossing boundary. This midpoint is
+      // on the same side of boundary as residual because midpoint >= residual
+      // (since lower is closer than upper) and residual is above the boundary.
+      return lower + (quantization >> 1);
+    }
+    return lower;
+  } else {
+    // upper is closer to residual than lower.
+    if (residual <= boundary_residual && upper > boundary_residual) {
+      // Halve quantization step to avoid crossing boundary. This midpoint is
+      // on the same side of boundary as residual because midpoint <= residual
+      // (since upper is closer than lower) and residual is below the boundary.
+      return lower + (quantization >> 1);
+    }
+    return upper & 0xff;
+  }
+}
+
+// Quantize every component of the difference between the actual pixel value and
+// its prediction to a multiple of a quantization (a power of 2, not larger than
+// max_quantization which is a power of 2, smaller than max_diff). Take care if
+// value and predict have undergone subtract green, which means that red and
+// blue are represented as offsets from green.
+static uint32_t NearLossless(uint32_t value, uint32_t predict,
+                             int max_quantization, int max_diff,
+                             int used_subtract_green) {
+  int quantization;
+  uint8_t new_green = 0;
+  uint8_t green_diff = 0;
+  uint8_t a, r, g, b;
+  if (max_diff <= 2) {
+    return VP8LSubPixels(value, predict);
+  }
+  quantization = max_quantization;
+  while (quantization >= max_diff) {
+    quantization >>= 1;
+  }
+  if ((value >> 24) == 0 || (value >> 24) == 0xff) {
+    // Preserve transparency of fully transparent or fully opaque pixels.
+    a = ((value >> 24) - (predict >> 24)) & 0xff;
+  } else {
+    a = NearLosslessComponent(value >> 24, predict >> 24, 0xff, quantization);
+  }
+  g = NearLosslessComponent((value >> 8) & 0xff, (predict >> 8) & 0xff, 0xff,
+                            quantization);
+  if (used_subtract_green) {
+    // The green offset will be added to red and blue components during decoding
+    // to obtain the actual red and blue values.
+    new_green = ((predict >> 8) + g) & 0xff;
+    // The amount by which green has been adjusted during quantization. It is
+    // subtracted from red and blue for compensation, to avoid accumulating two
+    // quantization errors in them.
+    green_diff = (new_green - (value >> 8)) & 0xff;
+  }
+  r = NearLosslessComponent(((value >> 16) - green_diff) & 0xff,
+                            (predict >> 16) & 0xff, 0xff - new_green,
+                            quantization);
+  b = NearLosslessComponent((value - green_diff) & 0xff, predict & 0xff,
+                            0xff - new_green, quantization);
+  return ((uint32_t)a << 24) | ((uint32_t)r << 16) | ((uint32_t)g << 8) | b;
+}
+
+// Returns the difference between the pixel and its prediction. In case of a
+// lossy encoding, updates the source image to avoid propagating the deviation
+// further to pixels which depend on the current pixel for their predictions.
+static WEBP_INLINE uint32_t GetResidual(int width, int height,
+                                        uint32_t* const upper_row,
+                                        uint32_t* const current_row,
+                                        const uint8_t* const max_diffs,
+                                        int mode, VP8LPredictorFunc pred_func,
+                                        int x, int y, int max_quantization,
+                                        int exact, int used_subtract_green) {
+  const uint32_t predict = Predict(pred_func, x, y, current_row, upper_row);
+  uint32_t residual;
+  if (max_quantization == 1 || mode == 0 || y == 0 || y == height - 1 ||
+      x == 0 || x == width - 1) {
+    residual = VP8LSubPixels(current_row[x], predict);
+  } else {
+    residual = NearLossless(current_row[x], predict, max_quantization,
+                            max_diffs[x], used_subtract_green);
+    // Update the source image.
+    current_row[x] = VP8LAddPixels(predict, residual);
+    // x is never 0 here so we do not need to update upper_row like below.
+  }
+  if (!exact && (current_row[x] & kMaskAlpha) == 0) {
+    // If alpha is 0, cleanup RGB. We can choose the RGB values of the residual
+    // for best compression. The prediction of alpha itself can be non-zero and
+    // must be kept though. We choose RGB of the residual to be 0.
+    residual &= kMaskAlpha;
+    // Update the source image.
+    current_row[x] = predict & ~kMaskAlpha;
+    // The prediction for the rightmost pixel in a row uses the leftmost pixel
+    // in that row as its top-right context pixel. Hence if we change the
+    // leftmost pixel of current_row, the corresponding change must be applied
+    // to upper_row as well where top-right context is being read from.
+    if (x == 0 && y != 0) upper_row[width] = current_row[0];
+  }
+  return residual;
+}
+
 // Returns best predictor and updates the accumulated histogram.
+// If max_quantization > 1, assumes that near lossless processing will be
+// applied, quantizing residuals to multiples of quantization levels up to
+// max_quantization (the actual quantization level depends on smoothness near
+// the given pixel).
 static int GetBestPredictorForTile(int width, int height,
                                    int tile_x, int tile_y, int bits,
                                    int accumulated[4][256],
-                                   const uint32_t* const argb_scratch,
-                                   int exact) {
+                                   uint32_t* const argb_scratch,
+                                   const uint32_t* const argb,
+                                   int max_quantization,
+                                   int exact, int used_subtract_green) {
   const int kNumPredModes = 14;
-  const int col_start = tile_x << bits;
-  const int row_start = tile_y << bits;
+  const int start_x = tile_x << bits;
+  const int start_y = tile_y << bits;
   const int tile_size = 1 << bits;
-  const int max_y = GetMin(tile_size, height - row_start);
-  const int max_x = GetMin(tile_size, width - col_start);
+  const int max_y = GetMin(tile_size, height - start_y);
+  const int max_x = GetMin(tile_size, width - start_x);
+  // Whether there exist columns just outside the tile.
+  const int have_left = (start_x > 0);
+  const int have_right = (max_x < width - start_x);
+  // Position and size of the strip covering the tile and adjacent columns if
+  // they exist.
+  const int context_start_x = start_x - have_left;
+  const int context_width = max_x + have_left + have_right;
+  // The width of upper_row and current_row is one pixel larger than image width
+  // to allow the top right pixel to point to the leftmost pixel of the next row
+  // when at the right edge.
+  uint32_t* upper_row = argb_scratch;
+  uint32_t* current_row = upper_row + width + 1;
+  uint8_t* const max_diffs = (uint8_t*)(current_row + width + 1);
   float best_diff = MAX_DIFF_COST;
   int best_mode = 0;
   int mode;
@@ -571,28 +758,46 @@ static int GetBestPredictorForTile(int width, int height,
   // Need pointers to be able to swap arrays.
   int (*histo_argb)[256] = histo_stack_1;
   int (*best_histo)[256] = histo_stack_2;
-
   int i, j;
+
   for (mode = 0; mode < kNumPredModes; ++mode) {
-    const uint32_t* current_row = argb_scratch;
     const VP8LPredictorFunc pred_func = VP8LPredictors[mode];
     float cur_diff;
-    int y;
+    int relative_y;
     memset(histo_argb, 0, sizeof(histo_stack_1));
-    for (y = 0; y < max_y; ++y) {
-      int x;
-      const int row = row_start + y;
-      const uint32_t* const upper_row = current_row;
-      current_row = upper_row + width;
-      for (x = 0; x < max_x; ++x) {
-        const int col = col_start + x;
-        const uint32_t predict =
-            Predict(pred_func, col, row, current_row, upper_row);
-        uint32_t residual = VP8LSubPixels(current_row[col], predict);
-        if (!exact && (current_row[col] & kMaskAlpha) == 0) {
-          residual &= kMaskAlpha;  // See CopyTileWithPrediction.
-        }
-        UpdateHisto(histo_argb, residual);
+    if (start_y > 0) {
+      // Read the row above the tile which will become the first upper_row.
+      // Include a pixel to the left if it exists; include a pixel to the right
+      // in all cases (wrapping to the leftmost pixel of the next row if it does
+      // not exist).
+      memcpy(current_row + context_start_x,
+             argb + (start_y - 1) * width + context_start_x,
+             sizeof(*argb) * (max_x + have_left + 1));
+    }
+    for (relative_y = 0; relative_y < max_y; ++relative_y) {
+      const int y = start_y + relative_y;
+      int relative_x;
+      uint32_t* tmp = upper_row;
+      upper_row = current_row;
+      current_row = tmp;
+      // Read current_row. Include a pixel to the left if it exists; include a
+      // pixel to the right in all cases except at the bottom right corner of
+      // the image (wrapping to the leftmost pixel of the next row if it does
+      // not exist in the current row).
+      memcpy(current_row + context_start_x,
+             argb + y * width + context_start_x,
+             sizeof(*argb) * (max_x + have_left + (y + 1 < height)));
+      if (max_quantization > 1 && y >= 1 && y + 1 < height) {
+        MaxDiffsForRow(context_width, width, argb + y * width + context_start_x,
+                       max_diffs + context_start_x, used_subtract_green);
+      }
+
+      for (relative_x = 0; relative_x < max_x; ++relative_x) {
+        const int x = start_x + relative_x;
+        UpdateHisto(histo_argb,
+                    GetResidual(width, height, upper_row, current_row,
+                                max_diffs, mode, pred_func, x, y,
+                                max_quantization, exact, used_subtract_green));
       }
     }
     cur_diff = PredictionCostSpatialHistogram(
@@ -615,71 +820,82 @@ static int GetBestPredictorForTile(int width, int height,
   return best_mode;
 }
 
+// Converts pixels of the image to residuals with respect to predictions.
+// If max_quantization > 1, applies near lossless processing, quantizing
+// residuals to multiples of quantization levels up to max_quantization
+// (the actual quantization level depends on smoothness near the given pixel).
 static void CopyImageWithPrediction(int width, int height,
                                     int bits, uint32_t* const modes,
                                     uint32_t* const argb_scratch,
                                     uint32_t* const argb,
-                                    int low_effort, int exact) {
+                                    int low_effort, int max_quantization,
+                                    int exact, int used_subtract_green) {
   const int tiles_per_row = VP8LSubSampleSize(width, bits);
   const int mask = (1 << bits) - 1;
-  // The row size is one pixel longer to allow the top right pixel to point to
-  // the leftmost pixel of the next row when at the right edge.
-  uint32_t* current_row = argb_scratch;
-  uint32_t* upper_row = argb_scratch + width + 1;
+  // The width of upper_row and current_row is one pixel larger than image width
+  // to allow the top right pixel to point to the leftmost pixel of the next row
+  // when at the right edge.
+  uint32_t* upper_row = argb_scratch;
+  uint32_t* current_row = upper_row + width + 1;
+  uint8_t* current_max_diffs = (uint8_t*)(current_row + width + 1);
+  uint8_t* lower_max_diffs = current_max_diffs + width;
   int y;
-  VP8LPredictorFunc pred_func =
-      low_effort ? VP8LPredictors[kPredLowEffort] : NULL;
+  int mode = 0;
+  VP8LPredictorFunc pred_func = NULL;
 
   for (y = 0; y < height; ++y) {
     int x;
-    uint32_t* tmp = upper_row;
+    uint32_t* const tmp32 = upper_row;
     upper_row = current_row;
-    current_row = tmp;
-    memcpy(current_row, argb + y * width, sizeof(*current_row) * width);
-    current_row[width] = (y + 1 < height) ? argb[(y + 1) * width] : ARGB_BLACK;
+    current_row = tmp32;
+    memcpy(current_row, argb + y * width,
+           sizeof(*argb) * (width + (y + 1 < height)));
 
     if (low_effort) {
       for (x = 0; x < width; ++x) {
-        const uint32_t predict =
-            Predict(pred_func, x, y, current_row, upper_row);
+        const uint32_t predict = Predict(VP8LPredictors[kPredLowEffort], x, y,
+                                         current_row, upper_row);
         argb[y * width + x] = VP8LSubPixels(current_row[x], predict);
       }
     } else {
+      if (max_quantization > 1) {
+        // Compute max_diffs for the lower row now, because that needs the
+        // contents of argb for the current row, which we will overwrite with
+        // residuals before proceeding with the next row.
+        uint8_t* const tmp8 = current_max_diffs;
+        current_max_diffs = lower_max_diffs;
+        lower_max_diffs = tmp8;
+        if (y + 2 < height) {
+          MaxDiffsForRow(width, width, argb + (y + 1) * width, lower_max_diffs,
+                         used_subtract_green);
+        }
+      }
       for (x = 0; x < width; ++x) {
-        uint32_t predict, residual;
         if ((x & mask) == 0) {
-          const int mode =
-              (modes[(y >> bits) * tiles_per_row + (x >> bits)] >> 8) & 0xff;
+          mode = (modes[(y >> bits) * tiles_per_row + (x >> bits)] >> 8) & 0xff;
           pred_func = VP8LPredictors[mode];
         }
-        predict = Predict(pred_func, x, y, current_row, upper_row);
-        residual = VP8LSubPixels(current_row[x], predict);
-        if (!exact && (current_row[x] & kMaskAlpha) == 0) {
-          // If alpha is 0, cleanup RGB. We can choose the RGB values of the
-          // residual for best compression. The prediction of alpha itself can
-          // be non-zero and must be kept though. We choose RGB of the residual
-          // to be 0.
-          residual &= kMaskAlpha;
-          // Update input image so that next predictions use correct RGB value.
-          current_row[x] = predict & ~kMaskAlpha;
-          if (x == 0 && y != 0) upper_row[width] = current_row[x];
-        }
-        argb[y * width + x] = residual;
+        argb[y * width + x] = GetResidual(
+            width, height, upper_row, current_row, current_max_diffs, mode,
+            pred_func, x, y, max_quantization, exact, used_subtract_green);
       }
     }
   }
 }
 
+// Finds the best predictor for each tile, and converts the image to residuals
+// with respect to predictions. If near_lossless_quality < 100, applies
+// near lossless processing, shaving off more bits of residuals for lower
+// qualities.
 void VP8LResidualImage(int width, int height, int bits, int low_effort,
                        uint32_t* const argb, uint32_t* const argb_scratch,
-                       uint32_t* const image, int exact) {
-  const int max_tile_size = 1 << bits;
+                       uint32_t* const image, int near_lossless_quality,
+                       int exact, int used_subtract_green) {
   const int tiles_per_row = VP8LSubSampleSize(width, bits);
   const int tiles_per_col = VP8LSubSampleSize(height, bits);
-  uint32_t* const upper_row = argb_scratch;
-  uint32_t* const current_tile_rows = argb_scratch + width;
   int tile_y;
   int histo[4][256];
+  const int max_quantization = 1 << VP8LNearLosslessBits(near_lossless_quality);
   if (low_effort) {
     int i;
     for (i = 0; i < tiles_per_row * tiles_per_col; ++i) {
@@ -688,26 +904,19 @@ void VP8LResidualImage(int width, int height, int bits, int low_effort,
   } else {
     memset(histo, 0, sizeof(histo));
     for (tile_y = 0; tile_y < tiles_per_col; ++tile_y) {
-      const int tile_y_offset = tile_y * max_tile_size;
-      const int this_tile_height =
-          (tile_y < tiles_per_col - 1) ? max_tile_size : height - tile_y_offset;
       int tile_x;
-      if (tile_y > 0) {
-        memcpy(upper_row, current_tile_rows + (max_tile_size - 1) * width,
-               width * sizeof(*upper_row));
-      }
-      memcpy(current_tile_rows, &argb[tile_y_offset * width],
-             this_tile_height * width * sizeof(*current_tile_rows));
       for (tile_x = 0; tile_x < tiles_per_row; ++tile_x) {
         const int pred = GetBestPredictorForTile(width, height, tile_x, tile_y,
-            bits, (int (*)[256])histo, argb_scratch, exact);
+            bits, histo, argb_scratch, argb, max_quantization, exact,
+            used_subtract_green);
         image[tile_y * tiles_per_row + tile_x] = ARGB_BLACK | (pred << 8);
       }
     }
   }
 
-  CopyImageWithPrediction(width, height, bits,
-                          image, argb_scratch, argb, low_effort, exact);
+  CopyImageWithPrediction(width, height, bits, image, argb_scratch, argb,
+                          low_effort, max_quantization, exact,
+                          used_subtract_green);
 }
 
 void VP8LSubtractGreenFromBlueAndRed_C(uint32_t* argb_data, int num_pixels) {
@@ -1053,6 +1262,17 @@ void VP8LColorSpaceTransform(int width, int height, int bits, int quality,
 }
 
 //------------------------------------------------------------------------------
+
+static int VectorMismatch(const uint32_t* const array1,
+                          const uint32_t* const array2, int length) {
+  int match_len = 0;
+
+  while (match_len < length && array1[match_len] == array2[match_len]) {
+    ++match_len;
+  }
+  return match_len;
+}
+
 // Bundles multiple (1, 2, 4 or 8) pixels into a single pixel.
 void VP8LBundleColorMap(const uint8_t* const row, int width,
                         int xbits, uint32_t* const dst) {
@@ -1149,6 +1369,8 @@ GetEntropyUnrefinedHelperFunc VP8LGetEntropyUnrefinedHelper;
 
 VP8LHistogramAddFunc VP8LHistogramAdd;
 
+VP8LVectorMismatchFunc VP8LVectorMismatch;
+
 extern void VP8LEncDspInitSSE2(void);
 extern void VP8LEncDspInitSSE41(void);
 extern void VP8LEncDspInitNEON(void);
@@ -1181,6 +1403,8 @@ WEBP_TSAN_IGNORE_FUNCTION void VP8LEncDspInit(void) {
 
   VP8LHistogramAdd = HistogramAdd;
 
+  VP8LVectorMismatch = VectorMismatch;
+
   // If defined, use CPUInfo() to overwrite some pointers with faster versions.
   if (VP8GetCPUInfo != NULL) {
 #if defined(WEBP_USE_SSE2)
diff --git a/src/3rdparty/libwebp/src/dsp/lossless_enc_sse2.c b/src/3rdparty/libwebp/src/dsp/lossless_enc_sse2.c
index e8c9834..7c894e7 100644
--- a/src/3rdparty/libwebp/src/dsp/lossless_enc_sse2.c
+++ b/src/3rdparty/libwebp/src/dsp/lossless_enc_sse2.c
@@ -325,6 +325,57 @@ static float CombinedShannonEntropy(const int X[256], const int Y[256]) {
 #undef ANALYZE_XY
 
 //------------------------------------------------------------------------------
+
+static int VectorMismatch(const uint32_t* const array1,
+                          const uint32_t* const array2, int length) {
+  int match_len;
+
+  if (length >= 12) {
+    __m128i A0 = _mm_loadu_si128((const __m128i*)&array1[0]);
+    __m128i A1 = _mm_loadu_si128((const __m128i*)&array2[0]);
+    match_len = 0;
+    do {
+      // Loop unrolling and early load both provide a speedup of 10% for the
+      // current function. Also, max_limit can be MAX_LENGTH=4096 at most.
+      const __m128i cmpA = _mm_cmpeq_epi32(A0, A1);
+      const __m128i B0 =
+          _mm_loadu_si128((const __m128i*)&array1[match_len + 4]);
+      const __m128i B1 =
+          _mm_loadu_si128((const __m128i*)&array2[match_len + 4]);
+      if (_mm_movemask_epi8(cmpA) != 0xffff) break;
+      match_len += 4;
+
+      {
+        const __m128i cmpB = _mm_cmpeq_epi32(B0, B1);
+        A0 = _mm_loadu_si128((const __m128i*)&array1[match_len + 4]);
+        A1 = _mm_loadu_si128((const __m128i*)&array2[match_len + 4]);
+        if (_mm_movemask_epi8(cmpB) != 0xffff) break;
+        match_len += 4;
+      }
+    } while (match_len + 12 < length);
+  } else {
+    match_len = 0;
+    // Unroll the potential first two loops.
+    if (length >= 4 &&
+        _mm_movemask_epi8(_mm_cmpeq_epi32(
+            _mm_loadu_si128((const __m128i*)&array1[0]),
+            _mm_loadu_si128((const __m128i*)&array2[0]))) == 0xffff) {
+      match_len = 4;
+      if (length >= 8 &&
+          _mm_movemask_epi8(_mm_cmpeq_epi32(
+              _mm_loadu_si128((const __m128i*)&array1[4]),
+              _mm_loadu_si128((const __m128i*)&array2[4]))) == 0xffff)
+        match_len = 8;
+    }
+  }
+
+  while (match_len < length && array1[match_len] == array2[match_len]) {
+    ++match_len;
+  }
+  return match_len;
+}
+
+//------------------------------------------------------------------------------
 // Entry point
 
 extern void VP8LEncDspInitSSE2(void);
@@ -336,6 +387,7 @@ WEBP_TSAN_IGNORE_FUNCTION void VP8LEncDspInitSSE2(void) {
   VP8LCollectColorRedTransforms = CollectColorRedTransforms;
   VP8LHistogramAdd = HistogramAdd;
   VP8LCombinedShannonEntropy = CombinedShannonEntropy;
+  VP8LVectorMismatch = VectorMismatch;
 }
 
 #else  // !WEBP_USE_SSE2
diff --git a/src/3rdparty/libwebp/src/dsp/msa_macro.h b/src/3rdparty/libwebp/src/dsp/msa_macro.h
new file mode 100644
index 0000000..5c707f4
--- /dev/null
+++ b/src/3rdparty/libwebp/src/dsp/msa_macro.h
@@ -0,0 +1,555 @@
+// Copyright 2016 Google Inc. All Rights Reserved.
+//
+// Use of this source code is governed by a BSD-style license
+// that can be found in the COPYING file in the root of the source
+// tree. An additional intellectual property rights grant can be found
+// in the file PATENTS. All contributing project authors may
+// be found in the AUTHORS file in the root of the source tree.
+// -----------------------------------------------------------------------------
+//
+// MSA common macros
+//
+// Author(s):  Prashant Patil   (prashant.patil@imgtec.com)
+
+#ifndef WEBP_DSP_MSA_MACRO_H_
+#define WEBP_DSP_MSA_MACRO_H_
+
+#include <stdint.h>
+#include <msa.h>
+
+#if defined(__clang__)
+  #define CLANG_BUILD
+#endif
+
+#ifdef CLANG_BUILD
+  #define ADDVI_H(a, b)  __msa_addvi_h((v8i16)a, b)
+  #define SRAI_H(a, b)  __msa_srai_h((v8i16)a, b)
+  #define SRAI_W(a, b)  __msa_srai_w((v4i32)a, b)
+#else
+  #define ADDVI_H(a, b)  (a + b)
+  #define SRAI_H(a, b)  (a >> b)
+  #define SRAI_W(a, b)  (a >> b)
+#endif
+
+#define LD_B(RTYPE, psrc) *((RTYPE*)(psrc))
+#define LD_UB(...) LD_B(v16u8, __VA_ARGS__)
+#define LD_SB(...) LD_B(v16i8, __VA_ARGS__)
+
+#define LD_H(RTYPE, psrc) *((RTYPE*)(psrc))
+#define LD_UH(...) LD_H(v8u16, __VA_ARGS__)
+#define LD_SH(...) LD_H(v8i16, __VA_ARGS__)
+
+#define LD_W(RTYPE, psrc) *((RTYPE*)(psrc))
+#define LD_UW(...) LD_W(v4u32, __VA_ARGS__)
+#define LD_SW(...) LD_W(v4i32, __VA_ARGS__)
+
+#define ST_B(RTYPE, in, pdst) *((RTYPE*)(pdst)) = in
+#define ST_UB(...) ST_B(v16u8, __VA_ARGS__)
+#define ST_SB(...) ST_B(v16i8, __VA_ARGS__)
+
+#define ST_H(RTYPE, in, pdst) *((RTYPE*)(pdst)) = in
+#define ST_UH(...) ST_H(v8u16, __VA_ARGS__)
+#define ST_SH(...) ST_H(v8i16, __VA_ARGS__)
+
+#define ST_W(RTYPE, in, pdst) *((RTYPE*)(pdst)) = in
+#define ST_UW(...) ST_W(v4u32, __VA_ARGS__)
+#define ST_SW(...) ST_W(v4i32, __VA_ARGS__)
+
+#define MSA_LOAD_FUNC(TYPE, INSTR, FUNC_NAME)             \
+  static inline TYPE FUNC_NAME(const void* const psrc) {  \
+    const uint8_t* const psrc_m = (const uint8_t*)psrc;   \
+    TYPE val_m;                                           \
+    asm volatile (                                        \
+      "" #INSTR " %[val_m], %[psrc_m]  \n\t"              \
+      : [val_m] "=r" (val_m)                              \
+      : [psrc_m] "m" (*psrc_m));                          \
+    return val_m;                                         \
+  }
+
+#define MSA_LOAD(psrc, FUNC_NAME)  FUNC_NAME(psrc)
+
+#define MSA_STORE_FUNC(TYPE, INSTR, FUNC_NAME)               \
+  static inline void FUNC_NAME(TYPE val, void* const pdst) { \
+    uint8_t* const pdst_m = (uint8_t*)pdst;                  \
+    TYPE val_m = val;                                        \
+    asm volatile (                                           \
+      " " #INSTR "  %[val_m],  %[pdst_m]  \n\t"              \
+      : [pdst_m] "=m" (*pdst_m)                              \
+      : [val_m] "r" (val_m));                                \
+  }
+
+#define MSA_STORE(val, pdst, FUNC_NAME)  FUNC_NAME(val, pdst)
+
+#if (__mips_isa_rev >= 6)
+  MSA_LOAD_FUNC(uint16_t, lh, msa_lh);
+  #define LH(psrc)  MSA_LOAD(psrc, msa_lh)
+  MSA_LOAD_FUNC(uint32_t, lw, msa_lw);
+  #define LW(psrc)  MSA_LOAD(psrc, msa_lw)
+  #if (__mips == 64)
+    MSA_LOAD_FUNC(uint64_t, ld, msa_ld);
+    #define LD(psrc)  MSA_LOAD(psrc, msa_ld)
+  #else  // !(__mips == 64)
+    #define LD(psrc)  ((((uint64_t)MSA_LOAD(psrc + 4, msa_lw)) << 32) | \
+                       MSA_LOAD(psrc, msa_lw))
+  #endif  // (__mips == 64)
+
+  MSA_STORE_FUNC(uint16_t, sh, msa_sh);
+  #define SH(val, pdst)  MSA_STORE(val, pdst, msa_sh)
+  MSA_STORE_FUNC(uint32_t, sw, msa_sw);
+  #define SW(val, pdst)  MSA_STORE(val, pdst, msa_sw)
+  MSA_STORE_FUNC(uint64_t, sd, msa_sd);
+  #define SD(val, pdst)  MSA_STORE(val, pdst, msa_sd)
+#else  // !(__mips_isa_rev >= 6)
+  MSA_LOAD_FUNC(uint16_t, ulh, msa_ulh);
+  #define LH(psrc)  MSA_LOAD(psrc, msa_ulh)
+  MSA_LOAD_FUNC(uint32_t, ulw, msa_ulw);
+  #define LW(psrc)  MSA_LOAD(psrc, msa_ulw)
+  #if (__mips == 64)
+    MSA_LOAD_FUNC(uint64_t, uld, msa_uld);
+    #define LD(psrc)  MSA_LOAD(psrc, msa_uld)
+  #else  // !(__mips == 64)
+    #define LD(psrc)  ((((uint64_t)MSA_LOAD(psrc + 4, msa_ulw)) << 32) | \
+                        MSA_LOAD(psrc, msa_ulw))
+  #endif  // (__mips == 64)
+
+  MSA_STORE_FUNC(uint16_t, ush, msa_ush);
+  #define SH(val, pdst)  MSA_STORE(val, pdst, msa_ush)
+  MSA_STORE_FUNC(uint32_t, usw, msa_usw);
+  #define SW(val, pdst)  MSA_STORE(val, pdst, msa_usw)
+  #define SD(val, pdst) {                                                  \
+    uint8_t* const pdst_sd_m = (uint8_t*)(pdst);                           \
+    const uint32_t val0_m = (uint32_t)(val & 0x00000000FFFFFFFF);          \
+    const uint32_t val1_m = (uint32_t)((val >> 32) & 0x00000000FFFFFFFF);  \
+    SW(val0_m, pdst_sd_m);                                                 \
+    SW(val1_m, pdst_sd_m + 4);                                             \
+  }
+#endif  // (__mips_isa_rev >= 6)
+
+/* Description : Load 4 words with stride
+ * Arguments   : Inputs  - psrc, stride
+ *               Outputs - out0, out1, out2, out3
+ * Details     : Load word in 'out0' from (psrc)
+ *               Load word in 'out1' from (psrc + stride)
+ *               Load word in 'out2' from (psrc + 2 * stride)
+ *               Load word in 'out3' from (psrc + 3 * stride)
+ */
+#define LW4(psrc, stride, out0, out1, out2, out3) {  \
+  const uint8_t* ptmp = (const uint8_t*)psrc;        \
+  out0 = LW(ptmp);                                   \
+  ptmp += stride;                                    \
+  out1 = LW(ptmp);                                   \
+  ptmp += stride;                                    \
+  out2 = LW(ptmp);                                   \
+  ptmp += stride;                                    \
+  out3 = LW(ptmp);                                   \
+}
+
+/* Description : Store 4 words with stride
+ * Arguments   : Inputs - in0, in1, in2, in3, pdst, stride
+ * Details     : Store word from 'in0' to (pdst)
+ *               Store word from 'in1' to (pdst + stride)
+ *               Store word from 'in2' to (pdst + 2 * stride)
+ *               Store word from 'in3' to (pdst + 3 * stride)
+ */
+#define SW4(in0, in1, in2, in3, pdst, stride) {  \
+  uint8_t* ptmp = (uint8_t*)pdst;                \
+  SW(in0, ptmp);                                 \
+  ptmp += stride;                                \
+  SW(in1, ptmp);                                 \
+  ptmp += stride;                                \
+  SW(in2, ptmp);                                 \
+  ptmp += stride;                                \
+  SW(in3, ptmp);                                 \
+}
+
+/* Description : Load vectors with 16 byte elements with stride
+ * Arguments   : Inputs  - psrc, stride
+ *               Outputs - out0, out1
+ *               Return Type - as per RTYPE
+ * Details     : Load 16 byte elements in 'out0' from (psrc)
+ *               Load 16 byte elements in 'out1' from (psrc + stride)
+ */
+#define LD_B2(RTYPE, psrc, stride, out0, out1) {  \
+  out0 = LD_B(RTYPE, psrc);                       \
+  out1 = LD_B(RTYPE, psrc + stride);              \
+}
+#define LD_UB2(...) LD_B2(v16u8, __VA_ARGS__)
+#define LD_SB2(...) LD_B2(v16i8, __VA_ARGS__)
+
+#define LD_B4(RTYPE, psrc, stride, out0, out1, out2, out3) {  \
+  LD_B2(RTYPE, psrc, stride, out0, out1);                     \
+  LD_B2(RTYPE, psrc + 2 * stride , stride, out2, out3);       \
+}
+#define LD_UB4(...) LD_B4(v16u8, __VA_ARGS__)
+#define LD_SB4(...) LD_B4(v16i8, __VA_ARGS__)
+
+/* Description : Load vectors with 8 halfword elements with stride
+ * Arguments   : Inputs  - psrc, stride
+ *               Outputs - out0, out1
+ * Details     : Load 8 halfword elements in 'out0' from (psrc)
+ *               Load 8 halfword elements in 'out1' from (psrc + stride)
+ */
+#define LD_H2(RTYPE, psrc, stride, out0, out1) {  \
+  out0 = LD_H(RTYPE, psrc);                       \
+  out1 = LD_H(RTYPE, psrc + stride);              \
+}
+#define LD_UH2(...) LD_H2(v8u16, __VA_ARGS__)
+#define LD_SH2(...) LD_H2(v8i16, __VA_ARGS__)
+
+/* Description : Store 4x4 byte block to destination memory from input vector
+ * Arguments   : Inputs - in0, in1, pdst, stride
+ * Details     : 'Idx0' word element from input vector 'in0' is copied to the
+ *               GP register and stored to (pdst)
+ *               'Idx1' word element from input vector 'in0' is copied to the
+ *               GP register and stored to (pdst + stride)
+ *               'Idx2' word element from input vector 'in0' is copied to the
+ *               GP register and stored to (pdst + 2 * stride)
+ *               'Idx3' word element from input vector 'in0' is copied to the
+ *               GP register and stored to (pdst + 3 * stride)
+ */
+#define ST4x4_UB(in0, in1, idx0, idx1, idx2, idx3, pdst, stride) {  \
+  uint8_t* const pblk_4x4_m = (uint8_t*)pdst;                       \
+  const uint32_t out0_m = __msa_copy_s_w((v4i32)in0, idx0);         \
+  const uint32_t out1_m = __msa_copy_s_w((v4i32)in0, idx1);         \
+  const uint32_t out2_m = __msa_copy_s_w((v4i32)in1, idx2);         \
+  const uint32_t out3_m = __msa_copy_s_w((v4i32)in1, idx3);         \
+  SW4(out0_m, out1_m, out2_m, out3_m, pblk_4x4_m, stride);          \
+}
+
+/* Description : Immediate number of elements to slide
+ * Arguments   : Inputs  - in0, in1, slide_val
+ *               Outputs - out
+ *               Return Type - as per RTYPE
+ * Details     : Byte elements from 'in1' vector are slid into 'in0' by
+ *               value specified in the 'slide_val'
+ */
+#define SLDI_B(RTYPE, in0, in1, slide_val)                      \
+        (RTYPE)__msa_sldi_b((v16i8)in0, (v16i8)in1, slide_val)  \
+
+#define SLDI_UB(...) SLDI_B(v16u8, __VA_ARGS__)
+#define SLDI_SB(...) SLDI_B(v16i8, __VA_ARGS__)
+#define SLDI_SH(...) SLDI_B(v8i16, __VA_ARGS__)
+
+/* Description : Shuffle halfword vector elements as per mask vector
+ * Arguments   : Inputs  - in0, in1, in2, in3, mask0, mask1
+ *               Outputs - out0, out1
+ *               Return Type - as per RTYPE
+ * Details     : halfword elements from 'in0' & 'in1' are copied selectively to
+ *               'out0' as per control vector 'mask0'
+ */
+#define VSHF_H2(RTYPE, in0, in1, in2, in3, mask0, mask1, out0, out1) {  \
+  out0 = (RTYPE)__msa_vshf_h((v8i16)mask0, (v8i16)in1, (v8i16)in0);     \
+  out1 = (RTYPE)__msa_vshf_h((v8i16)mask1, (v8i16)in3, (v8i16)in2);     \
+}
+#define VSHF_H2_UH(...) VSHF_H2(v8u16, __VA_ARGS__)
+#define VSHF_H2_SH(...) VSHF_H2(v8i16, __VA_ARGS__)
+
+/* Description : Clips all signed halfword elements of input vector
+ *               between 0 & 255
+ * Arguments   : Input/output  - val
+ *               Return Type - signed halfword
+ */
+#define CLIP_SH_0_255(val) {                      \
+  const v8i16 max_m = __msa_ldi_h(255);           \
+  val = __msa_maxi_s_h((v8i16)val, 0);            \
+  val = __msa_min_s_h(max_m, (v8i16)val);         \
+}
+#define CLIP_SH2_0_255(in0, in1) {  \
+  CLIP_SH_0_255(in0);               \
+  CLIP_SH_0_255(in1);               \
+}
+
+/* Description : Clips all signed word elements of input vector
+ *               between 0 & 255
+ * Arguments   : Input/output  - val
+ *               Return Type - signed word
+ */
+#define CLIP_SW_0_255(val) {                      \
+  const v4i32 max_m = __msa_ldi_w(255);           \
+  val = __msa_maxi_s_w((v4i32)val, 0);            \
+  val = __msa_min_s_w(max_m, (v4i32)val);         \
+}
+#define CLIP_SW4_0_255(in0, in1, in2, in3) {  \
+  CLIP_SW_0_255(in0);                         \
+  CLIP_SW_0_255(in1);                         \
+  CLIP_SW_0_255(in2);                         \
+  CLIP_SW_0_255(in3);                         \
+}
+
+/* Description : Set element n input vector to GPR value
+ * Arguments   : Inputs - in0, in1, in2, in3
+ *               Output - out
+ *               Return Type - as per RTYPE
+ * Details     : Set element 0 in vector 'out' to value specified in 'in0'
+ */
+#define INSERT_W2(RTYPE, in0, in1, out) {           \
+  out = (RTYPE)__msa_insert_w((v4i32)out, 0, in0);  \
+  out = (RTYPE)__msa_insert_w((v4i32)out, 1, in1);  \
+}
+#define INSERT_W2_UB(...) INSERT_W2(v16u8, __VA_ARGS__)
+#define INSERT_W2_SB(...) INSERT_W2(v16i8, __VA_ARGS__)
+
+#define INSERT_W4(RTYPE, in0, in1, in2, in3, out) {  \
+  out = (RTYPE)__msa_insert_w((v4i32)out, 0, in0);   \
+  out = (RTYPE)__msa_insert_w((v4i32)out, 1, in1);   \
+  out = (RTYPE)__msa_insert_w((v4i32)out, 2, in2);   \
+  out = (RTYPE)__msa_insert_w((v4i32)out, 3, in3);   \
+}
+#define INSERT_W4_UB(...) INSERT_W4(v16u8, __VA_ARGS__)
+#define INSERT_W4_SB(...) INSERT_W4(v16i8, __VA_ARGS__)
+#define INSERT_W4_SW(...) INSERT_W4(v4i32, __VA_ARGS__)
+
+/* Description : Interleave right half of byte elements from vectors
+ * Arguments   : Inputs  - in0, in1, in2, in3
+ *               Outputs - out0, out1
+ *               Return Type - as per RTYPE
+ * Details     : Right half of byte elements of 'in0' and 'in1' are interleaved
+ *               and written to out0.
+ */
+#define ILVR_B2(RTYPE, in0, in1, in2, in3, out0, out1) {  \
+  out0 = (RTYPE)__msa_ilvr_b((v16i8)in0, (v16i8)in1);     \
+  out1 = (RTYPE)__msa_ilvr_b((v16i8)in2, (v16i8)in3);     \
+}
+#define ILVR_B2_UB(...) ILVR_B2(v16u8, __VA_ARGS__)
+#define ILVR_B2_SB(...) ILVR_B2(v16i8, __VA_ARGS__)
+#define ILVR_B2_UH(...) ILVR_B2(v8u16, __VA_ARGS__)
+#define ILVR_B2_SH(...) ILVR_B2(v8i16, __VA_ARGS__)
+#define ILVR_B2_SW(...) ILVR_B2(v4i32, __VA_ARGS__)
+
+#define ILVR_B4(RTYPE, in0, in1, in2, in3, in4, in5, in6, in7,  \
+                out0, out1, out2, out3) {                       \
+  ILVR_B2(RTYPE, in0, in1, in2, in3, out0, out1);               \
+  ILVR_B2(RTYPE, in4, in5, in6, in7, out2, out3);               \
+}
+#define ILVR_B4_UB(...) ILVR_B4(v16u8, __VA_ARGS__)
+#define ILVR_B4_SB(...) ILVR_B4(v16i8, __VA_ARGS__)
+#define ILVR_B4_UH(...) ILVR_B4(v8u16, __VA_ARGS__)
+#define ILVR_B4_SH(...) ILVR_B4(v8i16, __VA_ARGS__)
+#define ILVR_B4_SW(...) ILVR_B4(v4i32, __VA_ARGS__)
+
+/* Description : Interleave right half of halfword elements from vectors
+ * Arguments   : Inputs  - in0, in1, in2, in3
+ *               Outputs - out0, out1
+ *               Return Type - as per RTYPE
+ * Details     : Right half of halfword elements of 'in0' and 'in1' are
+ *               interleaved and written to 'out0'.
+ */
+#define ILVR_H2(RTYPE, in0, in1, in2, in3, out0, out1) {  \
+  out0 = (RTYPE)__msa_ilvr_h((v8i16)in0, (v8i16)in1);     \
+  out1 = (RTYPE)__msa_ilvr_h((v8i16)in2, (v8i16)in3);     \
+}
+#define ILVR_H2_UB(...) ILVR_H2(v16u8, __VA_ARGS__)
+#define ILVR_H2_SH(...) ILVR_H2(v8i16, __VA_ARGS__)
+#define ILVR_H2_SW(...) ILVR_H2(v4i32, __VA_ARGS__)
+
+#define ILVR_H4(RTYPE, in0, in1, in2, in3, in4, in5, in6, in7,  \
+                out0, out1, out2, out3) {                       \
+  ILVR_H2(RTYPE, in0, in1, in2, in3, out0, out1);               \
+  ILVR_H2(RTYPE, in4, in5, in6, in7, out2, out3);               \
+}
+#define ILVR_H4_UB(...) ILVR_H4(v16u8, __VA_ARGS__)
+#define ILVR_H4_SH(...) ILVR_H4(v8i16, __VA_ARGS__)
+#define ILVR_H4_SW(...) ILVR_H4(v4i32, __VA_ARGS__)
+
+/* Description : Interleave right half of double word elements from vectors
+ * Arguments   : Inputs  - in0, in1, in2, in3
+ *               Outputs - out0, out1
+ *               Return Type - as per RTYPE
+ * Details     : Right half of double word elements of 'in0' and 'in1' are
+ *               interleaved and written to 'out0'.
+ */
+#define ILVR_D2(RTYPE, in0, in1, in2, in3, out0, out1) {  \
+  out0 = (RTYPE)__msa_ilvr_d((v2i64)in0, (v2i64)in1);     \
+  out1 = (RTYPE)__msa_ilvr_d((v2i64)in2, (v2i64)in3);     \
+}
+#define ILVR_D2_UB(...) ILVR_D2(v16u8, __VA_ARGS__)
+#define ILVR_D2_SB(...) ILVR_D2(v16i8, __VA_ARGS__)
+#define ILVR_D2_SH(...) ILVR_D2(v8i16, __VA_ARGS__)
+
+#define ILVRL_H2(RTYPE, in0, in1, out0, out1) {        \
+  out0 = (RTYPE)__msa_ilvr_h((v8i16)in0, (v8i16)in1);  \
+  out1 = (RTYPE)__msa_ilvl_h((v8i16)in0, (v8i16)in1);  \
+}
+#define ILVRL_H2_UB(...) ILVRL_H2(v16u8, __VA_ARGS__)
+#define ILVRL_H2_SB(...) ILVRL_H2(v16i8, __VA_ARGS__)
+#define ILVRL_H2_SH(...) ILVRL_H2(v8i16, __VA_ARGS__)
+#define ILVRL_H2_SW(...) ILVRL_H2(v4i32, __VA_ARGS__)
+#define ILVRL_H2_UW(...) ILVRL_H2(v4u32, __VA_ARGS__)
+
+#define ILVRL_W2(RTYPE, in0, in1, out0, out1) {        \
+  out0 = (RTYPE)__msa_ilvr_w((v4i32)in0, (v4i32)in1);  \
+  out1 = (RTYPE)__msa_ilvl_w((v4i32)in0, (v4i32)in1);  \
+}
+#define ILVRL_W2_UB(...) ILVRL_W2(v16u8, __VA_ARGS__)
+#define ILVRL_W2_SH(...) ILVRL_W2(v8i16, __VA_ARGS__)
+#define ILVRL_W2_SW(...) ILVRL_W2(v4i32, __VA_ARGS__)
+
+/* Description : Pack even byte elements of vector pairs
+ *  Arguments   : Inputs  - in0, in1, in2, in3
+ *                Outputs - out0, out1
+ *                Return Type - as per RTYPE
+ *  Details     : Even byte elements of 'in0' are copied to the left half of
+ *                'out0' & even byte elements of 'in1' are copied to the right
+ *                half of 'out0'.
+ */
+#define PCKEV_B2(RTYPE, in0, in1, in2, in3, out0, out1) {  \
+  out0 = (RTYPE)__msa_pckev_b((v16i8)in0, (v16i8)in1);     \
+  out1 = (RTYPE)__msa_pckev_b((v16i8)in2, (v16i8)in3);     \
+}
+#define PCKEV_B2_SB(...) PCKEV_B2(v16i8, __VA_ARGS__)
+#define PCKEV_B2_UB(...) PCKEV_B2(v16u8, __VA_ARGS__)
+#define PCKEV_B2_SH(...) PCKEV_B2(v8i16, __VA_ARGS__)
+#define PCKEV_B2_SW(...) PCKEV_B2(v4i32, __VA_ARGS__)
+
+/* Description : Arithmetic immediate shift right all elements of word vector
+ * Arguments   : Inputs  - in0, in1, shift
+ *               Outputs - in place operation
+ *               Return Type - as per input vector RTYPE
+ * Details     : Each element of vector 'in0' is right shifted by 'shift' and
+ *               the result is written in-place. 'shift' is a GP variable.
+ */
+#define SRAI_W2(RTYPE, in0, in1, shift_val) {  \
+  in0 = (RTYPE)SRAI_W(in0, shift_val);         \
+  in1 = (RTYPE)SRAI_W(in1, shift_val);         \
+}
+#define SRAI_W2_SW(...) SRAI_W2(v4i32, __VA_ARGS__)
+#define SRAI_W2_UW(...) SRAI_W2(v4u32, __VA_ARGS__)
+
+#define SRAI_W4(RTYPE, in0, in1, in2, in3, shift_val) {  \
+  SRAI_W2(RTYPE, in0, in1, shift_val);                   \
+  SRAI_W2(RTYPE, in2, in3, shift_val);                   \
+}
+#define SRAI_W4_SW(...) SRAI_W4(v4i32, __VA_ARGS__)
+#define SRAI_W4_UW(...) SRAI_W4(v4u32, __VA_ARGS__)
+
+/* Description : Arithmetic shift right all elements of half-word vector
+ * Arguments   : Inputs  - in0, in1, shift
+ *               Outputs - in place operation
+ *               Return Type - as per input vector RTYPE
+ * Details     : Each element of vector 'in0' is right shifted by 'shift' and
+ *               the result is written in-place. 'shift' is a GP variable.
+ */
+#define SRAI_H2(RTYPE, in0, in1, shift_val) {  \
+  in0 = (RTYPE)SRAI_H(in0, shift_val);         \
+  in1 = (RTYPE)SRAI_H(in1, shift_val);         \
+}
+#define SRAI_H2_SH(...) SRAI_H2(v8i16, __VA_ARGS__)
+#define SRAI_H2_UH(...) SRAI_H2(v8u16, __VA_ARGS__)
+
+/* Description : Arithmetic rounded shift right all elements of word vector
+ * Arguments   : Inputs  - in0, in1, shift
+ *               Outputs - in place operation
+ *               Return Type - as per input vector RTYPE
+ * Details     : Each element of vector 'in0' is right shifted by 'shift' and
+ *               the result is written in-place. 'shift' is a GP variable.
+ */
+#define SRARI_W2(RTYPE, in0, in1, shift) {        \
+  in0 = (RTYPE)__msa_srari_w((v4i32)in0, shift);  \
+  in1 = (RTYPE)__msa_srari_w((v4i32)in1, shift);  \
+}
+#define SRARI_W2_SW(...) SRARI_W2(v4i32, __VA_ARGS__)
+
+#define SRARI_W4(RTYPE, in0, in1, in2, in3, shift) {  \
+  SRARI_W2(RTYPE, in0, in1, shift);                   \
+  SRARI_W2(RTYPE, in2, in3, shift);                   \
+}
+#define SRARI_W4_SH(...) SRARI_W4(v8i16, __VA_ARGS__)
+#define SRARI_W4_UW(...) SRARI_W4(v4u32, __VA_ARGS__)
+#define SRARI_W4_SW(...) SRARI_W4(v4i32, __VA_ARGS__)
+
+/* Description : Addition of 2 pairs of half-word vectors
+ * Arguments   : Inputs  - in0, in1, in2, in3
+ *               Outputs - out0, out1
+ * Details     : Each element in 'in0' is added to 'in1' and result is written
+ *               to 'out0'.
+ */
+#define ADDVI_H2(RTYPE, in0, in1, in2, in3, out0, out1) {  \
+  out0 = (RTYPE)ADDVI_H(in0, in1);                         \
+  out1 = (RTYPE)ADDVI_H(in2, in3);                         \
+}
+#define ADDVI_H2_SH(...) ADDVI_H2(v8i16, __VA_ARGS__)
+#define ADDVI_H2_UH(...) ADDVI_H2(v8u16, __VA_ARGS__)
+
+/* Description : Addition of 2 pairs of vectors
+ * Arguments   : Inputs  - in0, in1, in2, in3
+ *               Outputs - out0, out1
+ * Details     : Each element in 'in0' is added to 'in1' and result is written
+ *               to 'out0'.
+ */
+#define ADD2(in0, in1, in2, in3, out0, out1) {  \
+  out0 = in0 + in1;                             \
+  out1 = in2 + in3;                             \
+}
+#define ADD4(in0, in1, in2, in3, in4, in5, in6, in7,  \
+             out0, out1, out2, out3) {                \
+  ADD2(in0, in1, in2, in3, out0, out1);               \
+  ADD2(in4, in5, in6, in7, out2, out3);               \
+}
+
+/* Description : Sign extend halfword elements from input vector and return
+ *               the result in pair of vectors
+ * Arguments   : Input   - in            (halfword vector)
+ *               Outputs - out0, out1   (sign extended word vectors)
+ *               Return Type - signed word
+ * Details     : Sign bit of halfword elements from input vector 'in' is
+ *               extracted and interleaved right with same vector 'in0' to
+ *               generate 4 signed word elements in 'out0'
+ *               Then interleaved left with same vector 'in0' to
+ *               generate 4 signed word elements in 'out1'
+ */
+#define UNPCK_SH_SW(in, out0, out1) {                 \
+  const v8i16 tmp_m = __msa_clti_s_h((v8i16)in, 0);   \
+  ILVRL_H2_SW(tmp_m, in, out0, out1);                 \
+}
+
+/* Description : Butterfly of 4 input vectors
+ * Arguments   : Inputs  - in0, in1, in2, in3
+ *               Outputs - out0, out1, out2, out3
+ * Details     : Butterfly operation
+ */
+#define BUTTERFLY_4(in0, in1, in2, in3, out0, out1, out2, out3) {  \
+  out0 = in0 + in3;                                                \
+  out1 = in1 + in2;                                                \
+  out2 = in1 - in2;                                                \
+  out3 = in0 - in3;                                                \
+}
+
+/* Description : Transpose 4x4 block with word elements in vectors
+ * Arguments   : Inputs  - in0, in1, in2, in3
+ *                Outputs - out0, out1, out2, out3
+ *                Return Type - as per RTYPE
+ */
+#define TRANSPOSE4x4_W(RTYPE, in0, in1, in2, in3, out0, out1, out2, out3) {  \
+  v4i32 s0_m, s1_m, s2_m, s3_m;                                              \
+  ILVRL_W2_SW(in1, in0, s0_m, s1_m);                                         \
+  ILVRL_W2_SW(in3, in2, s2_m, s3_m);                                         \
+  out0 = (RTYPE)__msa_ilvr_d((v2i64)s2_m, (v2i64)s0_m);                      \
+  out1 = (RTYPE)__msa_ilvl_d((v2i64)s2_m, (v2i64)s0_m);                      \
+  out2 = (RTYPE)__msa_ilvr_d((v2i64)s3_m, (v2i64)s1_m);                      \
+  out3 = (RTYPE)__msa_ilvl_d((v2i64)s3_m, (v2i64)s1_m);                      \
+}
+#define TRANSPOSE4x4_SW_SW(...) TRANSPOSE4x4_W(v4i32, __VA_ARGS__)
+
+/* Description : Add block 4x4
+ * Arguments   : Inputs - in0, in1, in2, in3, pdst, stride
+ * Details     : Least significant 4 bytes from each input vector are added to
+ *               the destination bytes, clipped between 0-255 and stored.
+ */
+#define ADDBLK_ST4x4_UB(in0, in1, in2, in3, pdst, stride) {     \
+  uint32_t src0_m, src1_m, src2_m, src3_m;                      \
+  v8i16 inp0_m, inp1_m, res0_m, res1_m;                         \
+  v16i8 dst0_m = { 0 };                                         \
+  v16i8 dst1_m = { 0 };                                         \
+  const v16i8 zero_m = { 0 };                                   \
+  ILVR_D2_SH(in1, in0, in3, in2, inp0_m, inp1_m);               \
+  LW4(pdst, stride, src0_m, src1_m, src2_m, src3_m);            \
+  INSERT_W2_SB(src0_m, src1_m, dst0_m);                         \
+  INSERT_W2_SB(src2_m, src3_m, dst1_m);                         \
+  ILVR_B2_SH(zero_m, dst0_m, zero_m, dst1_m, res0_m, res1_m);   \
+  ADD2(res0_m, inp0_m, res1_m, inp1_m, res0_m, res1_m);         \
+  CLIP_SH2_0_255(res0_m, res1_m);                               \
+  PCKEV_B2_SB(res0_m, res0_m, res1_m, res1_m, dst0_m, dst1_m);  \
+  ST4x4_UB(dst0_m, dst1_m, 0, 1, 0, 1, pdst, stride);           \
+}
+
+#endif  /* WEBP_DSP_MSA_MACRO_H_ */
diff --git a/src/3rdparty/libwebp/src/dsp/rescaler_sse2.c b/src/3rdparty/libwebp/src/dsp/rescaler_sse2.c
index 5ea4ddb..5b97028 100644
--- a/src/3rdparty/libwebp/src/dsp/rescaler_sse2.c
+++ b/src/3rdparty/libwebp/src/dsp/rescaler_sse2.c
@@ -18,6 +18,7 @@
 
 #include <assert.h>
 #include "../utils/rescaler.h"
+#include "../utils/utils.h"
 
 //------------------------------------------------------------------------------
 // Implementations of critical functions ImportRow / ExportRow
diff --git a/src/3rdparty/libwebp/src/dsp/upsampling_mips_dsp_r2.c b/src/3rdparty/libwebp/src/dsp/upsampling_mips_dsp_r2.c
index d4ccbe0..ed2eb74 100644
--- a/src/3rdparty/libwebp/src/dsp/upsampling_mips_dsp_r2.c
+++ b/src/3rdparty/libwebp/src/dsp/upsampling_mips_dsp_r2.c
@@ -14,9 +14,7 @@
 
 #include "./dsp.h"
 
-// Code is disabled for now, in favor of the plain-C version
-// TODO(djordje.pesut): adapt the code to reflect the C-version.
-#if 0 // defined(WEBP_USE_MIPS_DSP_R2)
+#if defined(WEBP_USE_MIPS_DSP_R2)
 
 #include <assert.h>
 #include "./yuv.h"
@@ -24,21 +22,21 @@
 #if !defined(WEBP_YUV_USE_TABLE)
 
 #define YUV_TO_RGB(Y, U, V, R, G, B) do {                                      \
-    const int t1 = kYScale * Y;                                                \
-    const int t2 = kVToG * V;                                                  \
-    R = kVToR * V;                                                             \
-    G = kUToG * U;                                                             \
-    B = kUToB * U;                                                             \
+    const int t1 = MultHi(Y, 19077);                                           \
+    const int t2 = MultHi(V, 13320);                                           \
+    R = MultHi(V, 26149);                                                      \
+    G = MultHi(U, 6419);                                                       \
+    B = MultHi(U, 33050);                                                      \
     R = t1 + R;                                                                \
     G = t1 - G;                                                                \
     B = t1 + B;                                                                \
-    R = R + kRCst;                                                             \
-    G = G - t2 + kGCst;                                                        \
-    B = B + kBCst;                                                             \
+    R = R - 14234;                                                             \
+    G = G - t2 + 8708;                                                         \
+    B = B - 17685;                                                             \
     __asm__ volatile (                                                         \
-      "shll_s.w         %[" #R "],      %[" #R "],        9          \n\t"     \
-      "shll_s.w         %[" #G "],      %[" #G "],        9          \n\t"     \
-      "shll_s.w         %[" #B "],      %[" #B "],        9          \n\t"     \
+      "shll_s.w         %[" #R "],      %[" #R "],        17         \n\t"     \
+      "shll_s.w         %[" #G "],      %[" #G "],        17         \n\t"     \
+      "shll_s.w         %[" #B "],      %[" #B "],        17         \n\t"     \
       "precrqu_s.qb.ph  %[" #R "],      %[" #R "],        $zero      \n\t"     \
       "precrqu_s.qb.ph  %[" #G "],      %[" #G "],        $zero      \n\t"     \
       "precrqu_s.qb.ph  %[" #B "],      %[" #B "],        $zero      \n\t"     \
@@ -279,6 +277,6 @@ WEBP_DSP_INIT_STUB(WebPInitYUV444ConvertersMIPSdspR2)
 
 #endif  // WEBP_USE_MIPS_DSP_R2
 
-#if 1  // !(defined(FANCY_UPSAMPLING) && defined(WEBP_USE_MIPS_DSP_R2))
+#if !(defined(FANCY_UPSAMPLING) && defined(WEBP_USE_MIPS_DSP_R2))
 WEBP_DSP_INIT_STUB(WebPInitUpsamplersMIPSdspR2)
 #endif
diff --git a/src/3rdparty/libwebp/src/dsp/yuv_mips32.c b/src/3rdparty/libwebp/src/dsp/yuv_mips32.c
index b8fe512..e61aac5 100644
--- a/src/3rdparty/libwebp/src/dsp/yuv_mips32.c
+++ b/src/3rdparty/libwebp/src/dsp/yuv_mips32.c
@@ -14,8 +14,7 @@
 
 #include "./dsp.h"
 
-// Code is disabled for now, in favor of the plain-C version
-#if 0  // defined(WEBP_USE_MIPS32)
+#if defined(WEBP_USE_MIPS32)
 
 #include "./yuv.h"
 
@@ -29,19 +28,19 @@ static void FUNC_NAME(const uint8_t* y,                                        \
   int i, r, g, b;                                                              \
   int temp0, temp1, temp2, temp3, temp4;                                       \
   for (i = 0; i < (len >> 1); i++) {                                           \
-    temp1 = kVToR * v[0];                                                      \
-    temp3 = kVToG * v[0];                                                      \
-    temp2 = kUToG * u[0];                                                      \
-    temp4 = kUToB * u[0];                                                      \
-    temp0 = kYScale * y[0];                                                    \
-    temp1 += kRCst;                                                            \
-    temp3 -= kGCst;                                                            \
+    temp1 = MultHi(v[0], 26149);                                               \
+    temp3 = MultHi(v[0], 13320);                                               \
+    temp2 = MultHi(u[0], 6419);                                                \
+    temp4 = MultHi(u[0], 33050);                                               \
+    temp0 = MultHi(y[0], 19077);                                               \
+    temp1 -= 14234;                                                            \
+    temp3 -= 8708;                                                             \
     temp2 += temp3;                                                            \
-    temp4 += kBCst;                                                            \
+    temp4 -= 17685;                                                            \
     r = VP8Clip8(temp0 + temp1);                                               \
     g = VP8Clip8(temp0 - temp2);                                               \
     b = VP8Clip8(temp0 + temp4);                                               \
-    temp0 = kYScale * y[1];                                                    \
+    temp0 = MultHi(y[1], 19077);                                               \
     dst[R] = r;                                                                \
     dst[G] = g;                                                                \
     dst[B] = b;                                                                \
@@ -59,15 +58,15 @@ static void FUNC_NAME(const uint8_t* y,                                        \
     dst += 2 * XSTEP;                                                          \
   }                                                                            \
   if (len & 1) {                                                               \
-    temp1 = kVToR * v[0];                                                      \
-    temp3 = kVToG * v[0];                                                      \
-    temp2 = kUToG * u[0];                                                      \
-    temp4 = kUToB * u[0];                                                      \
-    temp0 = kYScale * y[0];                                                    \
-    temp1 += kRCst;                                                            \
-    temp3 -= kGCst;                                                            \
+    temp1 = MultHi(v[0], 26149);                                               \
+    temp3 = MultHi(v[0], 13320);                                               \
+    temp2 = MultHi(u[0], 6419);                                                \
+    temp4 = MultHi(u[0], 33050);                                               \
+    temp0 = MultHi(y[0], 19077);                                               \
+    temp1 -= 14234;                                                            \
+    temp3 -= 8708;                                                             \
     temp2 += temp3;                                                            \
-    temp4 += kBCst;                                                            \
+    temp4 -= 17685;                                                            \
     r = VP8Clip8(temp0 + temp1);                                               \
     g = VP8Clip8(temp0 - temp2);                                               \
     b = VP8Clip8(temp0 + temp4);                                               \
diff --git a/src/3rdparty/libwebp/src/dsp/yuv_mips_dsp_r2.c b/src/3rdparty/libwebp/src/dsp/yuv_mips_dsp_r2.c
index dea0fdb..1720d41 100644
--- a/src/3rdparty/libwebp/src/dsp/yuv_mips_dsp_r2.c
+++ b/src/3rdparty/libwebp/src/dsp/yuv_mips_dsp_r2.c
@@ -14,8 +14,7 @@
 
 #include "./dsp.h"
 
-// Code is disabled for now, in favor of the plain-C version
-#if 0  // defined(WEBP_USE_MIPS_DSP_R2)
+#if defined(WEBP_USE_MIPS_DSP_R2)
 
 #include "./yuv.h"
 
@@ -31,10 +30,10 @@
   "mul              %[temp2],   %[t_con_3],     %[temp4]        \n\t"          \
   "mul              %[temp4],   %[t_con_4],     %[temp4]        \n\t"          \
   "mul              %[temp0],   %[t_con_5],     %[temp0]        \n\t"          \
-  "addu             %[temp1],   %[temp1],       %[t_con_6]      \n\t"          \
+  "subu             %[temp1],   %[temp1],       %[t_con_6]      \n\t"          \
   "subu             %[temp3],   %[temp3],       %[t_con_7]      \n\t"          \
   "addu             %[temp2],   %[temp2],       %[temp3]        \n\t"          \
-  "addu             %[temp4],   %[temp4],       %[t_con_8]      \n\t"          \
+  "subu             %[temp4],   %[temp4],       %[t_con_8]      \n\t"          \
 
 #define ROW_FUNC_PART_2(R, G, B, K)                                            \
   "addu             %[temp5],   %[temp0],       %[temp1]        \n\t"          \
@@ -43,12 +42,12 @@
 ".if " #K "                                                     \n\t"          \
   "lbu              %[temp0],   1(%[y])                         \n\t"          \
 ".endif                                                         \n\t"          \
-  "shll_s.w         %[temp5],   %[temp5],       9               \n\t"          \
-  "shll_s.w         %[temp6],   %[temp6],       9               \n\t"          \
+  "shll_s.w         %[temp5],   %[temp5],       17              \n\t"          \
+  "shll_s.w         %[temp6],   %[temp6],       17              \n\t"          \
 ".if " #K "                                                     \n\t"          \
   "mul              %[temp0],   %[t_con_5],     %[temp0]        \n\t"          \
 ".endif                                                         \n\t"          \
-  "shll_s.w         %[temp7],   %[temp7],       9               \n\t"          \
+  "shll_s.w         %[temp7],   %[temp7],       17              \n\t"          \
   "precrqu_s.qb.ph  %[temp5],   %[temp5],       $zero           \n\t"          \
   "precrqu_s.qb.ph  %[temp6],   %[temp6],       $zero           \n\t"          \
   "precrqu_s.qb.ph  %[temp7],   %[temp7],       $zero           \n\t"          \
@@ -75,14 +74,14 @@ static void FUNC_NAME(const uint8_t* y,                                        \
                       uint8_t* dst, int len) {                                 \
   int i;                                                                       \
   uint32_t temp0, temp1, temp2, temp3, temp4, temp5, temp6, temp7;             \
-  const int t_con_1 = kVToR;                                                   \
-  const int t_con_2 = kVToG;                                                   \
-  const int t_con_3 = kUToG;                                                   \
-  const int t_con_4 = kUToB;                                                   \
-  const int t_con_5 = kYScale;                                                 \
-  const int t_con_6 = kRCst;                                                   \
-  const int t_con_7 = kGCst;                                                   \
-  const int t_con_8 = kBCst;                                                   \
+  const int t_con_1 = 26149;                                                   \
+  const int t_con_2 = 13320;                                                   \
+  const int t_con_3 = 6419;                                                    \
+  const int t_con_4 = 33050;                                                   \
+  const int t_con_5 = 19077;                                                   \
+  const int t_con_6 = 14234;                                                   \
+  const int t_con_7 = 8708;                                                    \
+  const int t_con_8 = 17685;                                                   \
   for (i = 0; i < (len >> 1); i++) {                                           \
     __asm__ volatile (                                                         \
       ROW_FUNC_PART_1()                                                        \
diff --git a/src/3rdparty/libwebp/src/dsp/yuv_sse2.c b/src/3rdparty/libwebp/src/dsp/yuv_sse2.c
index f72fe32..e19bddf 100644
--- a/src/3rdparty/libwebp/src/dsp/yuv_sse2.c
+++ b/src/3rdparty/libwebp/src/dsp/yuv_sse2.c
@@ -33,7 +33,8 @@ static void ConvertYUV444ToRGB(const __m128i* const Y0,
   const __m128i k19077 = _mm_set1_epi16(19077);
   const __m128i k26149 = _mm_set1_epi16(26149);
   const __m128i k14234 = _mm_set1_epi16(14234);
-  const __m128i k33050 = _mm_set1_epi16(33050);
+  // 33050 doesn't fit in a signed short: only use this with unsigned arithmetic
+  const __m128i k33050 = _mm_set1_epi16((short)33050);
   const __m128i k17685 = _mm_set1_epi16(17685);
   const __m128i k6419  = _mm_set1_epi16(6419);
   const __m128i k13320 = _mm_set1_epi16(13320);
diff --git a/src/3rdparty/libwebp/src/enc/alpha.c b/src/3rdparty/libwebp/src/enc/alpha.c
index 3c970b0..03e3ad0 100644
--- a/src/3rdparty/libwebp/src/enc/alpha.c
+++ b/src/3rdparty/libwebp/src/enc/alpha.c
@@ -79,7 +79,11 @@ static int EncodeLossless(const uint8_t* const data, int width, int height,
   config.quality = 8.f * effort_level;
   assert(config.quality >= 0 && config.quality <= 100.f);
 
-  ok = (VP8LEncodeStream(&config, &picture, bw) == VP8_ENC_OK);
+  // TODO(urvang): Temporary fix to avoid generating images that trigger
+  // a decoder bug related to alpha with color cache.
+  // See: https://code.google.com/p/webp/issues/detail?id=239
+  // Need to re-enable this later.
+  ok = (VP8LEncodeStream(&config, &picture, bw, 0 /*use_cache*/) == VP8_ENC_OK);
   WebPPictureFree(&picture);
   ok = ok && !bw->error_;
   if (!ok) {
@@ -118,7 +122,6 @@ static int EncodeAlphaInternal(const uint8_t* const data, int width, int height,
   assert(method >= ALPHA_NO_COMPRESSION);
   assert(method <= ALPHA_LOSSLESS_COMPRESSION);
   assert(sizeof(header) == ALPHA_HEADER_LEN);
-  // TODO(skal): have a common function and #define's to validate alpha params.
 
   filter_func = WebPFilters[filter];
   if (filter_func != NULL) {
diff --git a/src/3rdparty/libwebp/src/enc/backward_references.c b/src/3rdparty/libwebp/src/enc/backward_references.c
index c39437d..136a24a 100644
--- a/src/3rdparty/libwebp/src/enc/backward_references.c
+++ b/src/3rdparty/libwebp/src/enc/backward_references.c
@@ -27,11 +27,19 @@
 #define MAX_ENTROPY    (1e30f)
 
 // 1M window (4M bytes) minus 120 special codes for short distances.
-#define WINDOW_SIZE ((1 << 20) - 120)
+#define WINDOW_SIZE_BITS 20
+#define WINDOW_SIZE ((1 << WINDOW_SIZE_BITS) - 120)
 
 // Bounds for the match length.
 #define MIN_LENGTH 2
-#define MAX_LENGTH 4096
+// If you change this, you need MAX_LENGTH_BITS + WINDOW_SIZE_BITS <= 32 as it
+// is used in VP8LHashChain.
+#define MAX_LENGTH_BITS 12
+// We want the max value to be attainable and stored in MAX_LENGTH_BITS bits.
+#define MAX_LENGTH ((1 << MAX_LENGTH_BITS) - 1)
+#if MAX_LENGTH_BITS + WINDOW_SIZE_BITS > 32
+#error "MAX_LENGTH_BITS + WINDOW_SIZE_BITS > 32"
+#endif
 
 // -----------------------------------------------------------------------------
 
@@ -57,32 +65,19 @@ static int DistanceToPlaneCode(int xsize, int dist) {
   return dist + 120;
 }
 
-// Returns the exact index where array1 and array2 are different if this
-// index is strictly superior to best_len_match. Otherwise, it returns 0.
+// Returns the exact index where array1 and array2 are different. For an index
+// inferior or equal to best_len_match, the return value just has to be strictly
+// inferior to best_len_match. The current behavior is to return 0 if this index
+// is best_len_match, and the index itself otherwise.
 // If no two elements are the same, it returns max_limit.
 static WEBP_INLINE int FindMatchLength(const uint32_t* const array1,
                                        const uint32_t* const array2,
-                                       int best_len_match,
-                                       int max_limit) {
-  int match_len;
-
+                                       int best_len_match, int max_limit) {
   // Before 'expensive' linear match, check if the two arrays match at the
   // current best length index.
   if (array1[best_len_match] != array2[best_len_match]) return 0;
 
-#if defined(WEBP_USE_SSE2)
-  // Check if anything is different up to best_len_match excluded.
-  // memcmp seems to be slower on ARM so it is disabled for now.
-  if (memcmp(array1, array2, best_len_match * sizeof(*array1))) return 0;
-  match_len = best_len_match + 1;
-#else
-  match_len = 0;
-#endif
-
-  while (match_len < max_limit && array1[match_len] == array2[match_len]) {
-    ++match_len;
-  }
-  return match_len;
+  return VP8LVectorMismatch(array1, array2, max_limit);
 }
 
 // -----------------------------------------------------------------------------
@@ -194,31 +189,24 @@ int VP8LBackwardRefsCopy(const VP8LBackwardRefs* const src,
 // -----------------------------------------------------------------------------
 // Hash chains
 
-// initialize as empty
-static void HashChainReset(VP8LHashChain* const p) {
-  assert(p != NULL);
-  // Set the int32_t arrays to -1.
-  memset(p->chain_, 0xff, p->size_ * sizeof(*p->chain_));
-  memset(p->hash_to_first_index_, 0xff,
-         HASH_SIZE * sizeof(*p->hash_to_first_index_));
-}
-
 int VP8LHashChainInit(VP8LHashChain* const p, int size) {
   assert(p->size_ == 0);
-  assert(p->chain_ == NULL);
+  assert(p->offset_length_ == NULL);
   assert(size > 0);
-  p->chain_ = (int*)WebPSafeMalloc(size, sizeof(*p->chain_));
-  if (p->chain_ == NULL) return 0;
+  p->offset_length_ =
+      (uint32_t*)WebPSafeMalloc(size, sizeof(*p->offset_length_));
+  if (p->offset_length_ == NULL) return 0;
   p->size_ = size;
-  HashChainReset(p);
+
   return 1;
 }
 
 void VP8LHashChainClear(VP8LHashChain* const p) {
   assert(p != NULL);
-  WebPSafeFree(p->chain_);
+  WebPSafeFree(p->offset_length_);
+
   p->size_ = 0;
-  p->chain_ = NULL;
+  p->offset_length_ = NULL;
 }
 
 // -----------------------------------------------------------------------------
@@ -234,18 +222,10 @@ static WEBP_INLINE uint32_t GetPixPairHash64(const uint32_t* const argb) {
   return key;
 }
 
-// Insertion of two pixels at a time.
-static void HashChainInsert(VP8LHashChain* const p,
-                            const uint32_t* const argb, int pos) {
-  const uint32_t hash_code = GetPixPairHash64(argb);
-  p->chain_[pos] = p->hash_to_first_index_[hash_code];
-  p->hash_to_first_index_[hash_code] = pos;
-}
-
 // Returns the maximum number of hash chain lookups to do for a
-// given compression quality. Return value in range [6, 86].
-static int GetMaxItersForQuality(int quality, int low_effort) {
-  return (low_effort ? 6 : 8) + (quality * quality) / 128;
+// given compression quality. Return value in range [8, 86].
+static int GetMaxItersForQuality(int quality) {
+  return 8 + (quality * quality) / 128;
 }
 
 static int GetWindowSizeForHashChain(int quality, int xsize) {
@@ -261,63 +241,120 @@ static WEBP_INLINE int MaxFindCopyLength(int len) {
   return (len < MAX_LENGTH) ? len : MAX_LENGTH;
 }
 
-static void HashChainFindOffset(const VP8LHashChain* const p, int base_position,
-                                const uint32_t* const argb, int len,
-                                int window_size, int* const distance_ptr) {
-  const uint32_t* const argb_start = argb + base_position;
-  const int min_pos =
-      (base_position > window_size) ? base_position - window_size : 0;
+int VP8LHashChainFill(VP8LHashChain* const p, int quality,
+                      const uint32_t* const argb, int xsize, int ysize) {
+  const int size = xsize * ysize;
+  const int iter_max = GetMaxItersForQuality(quality);
+  const int iter_min = iter_max - quality / 10;
+  const uint32_t window_size = GetWindowSizeForHashChain(quality, xsize);
   int pos;
-  assert(len <= MAX_LENGTH);
-  for (pos = p->hash_to_first_index_[GetPixPairHash64(argb_start)];
-       pos >= min_pos;
-       pos = p->chain_[pos]) {
-    const int curr_length =
-        FindMatchLength(argb + pos, argb_start, len - 1, len);
-    if (curr_length == len) break;
+  uint32_t base_position;
+  int32_t* hash_to_first_index;
+  // Temporarily use the p->offset_length_ as a hash chain.
+  int32_t* chain = (int32_t*)p->offset_length_;
+  assert(p->size_ != 0);
+  assert(p->offset_length_ != NULL);
+
+  hash_to_first_index =
+      (int32_t*)WebPSafeMalloc(HASH_SIZE, sizeof(*hash_to_first_index));
+  if (hash_to_first_index == NULL) return 0;
+
+  // Set the int32_t array to -1.
+  memset(hash_to_first_index, 0xff, HASH_SIZE * sizeof(*hash_to_first_index));
+  // Fill the chain linking pixels with the same hash.
+  for (pos = 0; pos < size - 1; ++pos) {
+    const uint32_t hash_code = GetPixPairHash64(argb + pos);
+    chain[pos] = hash_to_first_index[hash_code];
+    hash_to_first_index[hash_code] = pos;
   }
-  *distance_ptr = base_position - pos;
-}
-
-static int HashChainFindCopy(const VP8LHashChain* const p,
-                             int base_position,
-                             const uint32_t* const argb, int max_len,
-                             int window_size, int iter_max,
-                             int* const distance_ptr,
-                             int* const length_ptr) {
-  const uint32_t* const argb_start = argb + base_position;
-  int iter = iter_max;
-  int best_length = 0;
-  int best_distance = 0;
-  const int min_pos =
-      (base_position > window_size) ? base_position - window_size : 0;
-  int pos;
-  int length_max = 256;
-  if (max_len < length_max) {
-    length_max = max_len;
-  }
-  for (pos = p->hash_to_first_index_[GetPixPairHash64(argb_start)];
-       pos >= min_pos;
-       pos = p->chain_[pos]) {
-    int curr_length;
-    int distance;
-    if (--iter < 0) {
-      break;
+  WebPSafeFree(hash_to_first_index);
+
+  // Find the best match interval at each pixel, defined by an offset to the
+  // pixel and a length. The right-most pixel cannot match anything to the right
+  // (hence a best length of 0) and the left-most pixel nothing to the left
+  // (hence an offset of 0).
+  p->offset_length_[0] = p->offset_length_[size - 1] = 0;
+  for (base_position = size - 2 < 0 ? 0 : size - 2; base_position > 0;) {
+    const int max_len = MaxFindCopyLength(size - 1 - base_position);
+    const uint32_t* const argb_start = argb + base_position;
+    int iter = iter_max;
+    int best_length = 0;
+    uint32_t best_distance = 0;
+    const int min_pos =
+        (base_position > window_size) ? base_position - window_size : 0;
+    const int length_max = (max_len < 256) ? max_len : 256;
+    uint32_t max_base_position;
+
+    for (pos = chain[base_position]; pos >= min_pos; pos = chain[pos]) {
+      int curr_length;
+      if (--iter < 0) {
+        break;
+      }
+      assert(base_position > (uint32_t)pos);
+
+      curr_length =
+          FindMatchLength(argb + pos, argb_start, best_length, max_len);
+      if (best_length < curr_length) {
+        best_length = curr_length;
+        best_distance = base_position - pos;
+        // Stop if we have reached the maximum length. Otherwise, make sure
+        // we have executed a minimum number of iterations depending on the
+        // quality.
+        if ((best_length == MAX_LENGTH) ||
+            (curr_length >= length_max && iter < iter_min)) {
+          break;
+        }
+      }
     }
-
-    curr_length = FindMatchLength(argb + pos, argb_start, best_length, max_len);
-    if (best_length < curr_length) {
-      distance = base_position - pos;
-      best_length = curr_length;
-      best_distance = distance;
-      if (curr_length >= length_max) {
+    // We have the best match but in case the two intervals continue matching
+    // to the left, we have the best matches for the left-extended pixels.
+    max_base_position = base_position;
+    while (1) {
+      assert(best_length <= MAX_LENGTH);
+      assert(best_distance <= WINDOW_SIZE);
+      p->offset_length_[base_position] =
+          (best_distance << MAX_LENGTH_BITS) | (uint32_t)best_length;
+      --base_position;
+      // Stop if we don't have a match or if we are out of bounds.
+      if (best_distance == 0 || base_position == 0) break;
+      // Stop if we cannot extend the matching intervals to the left.
+      if (base_position < best_distance ||
+          argb[base_position - best_distance] != argb[base_position]) {
         break;
       }
+      // Stop if we are matching at its limit because there could be a closer
+      // matching interval with the same maximum length. Then again, if the
+      // matching interval is as close as possible (best_distance == 1), we will
+      // never find anything better so let's continue.
+      if (best_length == MAX_LENGTH && best_distance != 1 &&
+          base_position + MAX_LENGTH < max_base_position) {
+        break;
+      }
+      if (best_length < MAX_LENGTH) {
+        ++best_length;
+        max_base_position = base_position;
+      }
     }
   }
-  *distance_ptr = best_distance;
-  *length_ptr = best_length;
-  return (best_length >= MIN_LENGTH);
+  return 1;
+}
+
+static WEBP_INLINE int HashChainFindOffset(const VP8LHashChain* const p,
+                                           const int base_position) {
+  return p->offset_length_[base_position] >> MAX_LENGTH_BITS;
+}
+
+static WEBP_INLINE int HashChainFindLength(const VP8LHashChain* const p,
+                                           const int base_position) {
+  return p->offset_length_[base_position] & ((1U << MAX_LENGTH_BITS) - 1);
+}
+
+static WEBP_INLINE void HashChainFindCopy(const VP8LHashChain* const p,
+                                          int base_position,
+                                          int* const offset_ptr,
+                                          int* const length_ptr) {
+  *offset_ptr = HashChainFindOffset(p, base_position);
+  *length_ptr = HashChainFindLength(p, base_position);
 }
 
 static WEBP_INLINE void AddSingleLiteral(uint32_t pixel, int use_color_cache,
@@ -384,84 +421,62 @@ static int BackwardReferencesRle(int xsize, int ysize,
 
 static int BackwardReferencesLz77(int xsize, int ysize,
                                   const uint32_t* const argb, int cache_bits,
-                                  int quality, int low_effort,
-                                  VP8LHashChain* const hash_chain,
+                                  const VP8LHashChain* const hash_chain,
                                   VP8LBackwardRefs* const refs) {
   int i;
+  int i_last_check = -1;
   int ok = 0;
   int cc_init = 0;
   const int use_color_cache = (cache_bits > 0);
   const int pix_count = xsize * ysize;
   VP8LColorCache hashers;
-  int iter_max = GetMaxItersForQuality(quality, low_effort);
-  const int window_size = GetWindowSizeForHashChain(quality, xsize);
-  int min_matches = 32;
 
   if (use_color_cache) {
     cc_init = VP8LColorCacheInit(&hashers, cache_bits);
     if (!cc_init) goto Error;
   }
   ClearBackwardRefs(refs);
-  HashChainReset(hash_chain);
-  for (i = 0; i < pix_count - 2; ) {
+  for (i = 0; i < pix_count;) {
     // Alternative#1: Code the pixels starting at 'i' using backward reference.
     int offset = 0;
     int len = 0;
-    const int max_len = MaxFindCopyLength(pix_count - i);
-    HashChainFindCopy(hash_chain, i, argb, max_len, window_size,
-                      iter_max, &offset, &len);
-    if (len > MIN_LENGTH || (len == MIN_LENGTH && offset <= 512)) {
-      int offset2 = 0;
-      int len2 = 0;
-      int k;
-      min_matches = 8;
-      HashChainInsert(hash_chain, &argb[i], i);
-      if ((len < (max_len >> 2)) && !low_effort) {
-        // Evaluate Alternative#2: Insert the pixel at 'i' as literal, and code
-        // the pixels starting at 'i + 1' using backward reference.
-        HashChainFindCopy(hash_chain, i + 1, argb, max_len - 1,
-                          window_size, iter_max, &offset2,
-                          &len2);
-        if (len2 > len + 1) {
-          AddSingleLiteral(argb[i], use_color_cache, &hashers, refs);
-          i++;  // Backward reference to be done for next pixel.
-          len = len2;
-          offset = offset2;
+    int j;
+    HashChainFindCopy(hash_chain, i, &offset, &len);
+    if (len > MIN_LENGTH + 1) {
+      const int len_ini = len;
+      int max_reach = 0;
+      assert(i + len < pix_count);
+      // Only start from what we have not checked already.
+      i_last_check = (i > i_last_check) ? i : i_last_check;
+      // We know the best match for the current pixel but we try to find the
+      // best matches for the current pixel AND the next one combined.
+      // The naive method would use the intervals:
+      // [i,i+len) + [i+len, length of best match at i+len)
+      // while we check if we can use:
+      // [i,j) (where j<=i+len) + [j, length of best match at j)
+      for (j = i_last_check + 1; j <= i + len_ini; ++j) {
+        const int len_j = HashChainFindLength(hash_chain, j);
+        const int reach =
+            j + (len_j > MIN_LENGTH + 1 ? len_j : 1);  // 1 for single literal.
+        if (reach > max_reach) {
+          len = j - i;
+          max_reach = reach;
         }
       }
-      BackwardRefsCursorAdd(refs, PixOrCopyCreateCopy(offset, len));
-      if (use_color_cache) {
-        for (k = 0; k < len; ++k) {
-          VP8LColorCacheInsert(&hashers, argb[i + k]);
-        }
-      }
-      // Add to the hash_chain (but cannot add the last pixel).
-      if (offset >= 3 && offset != xsize) {
-        const int last = (len < pix_count - 1 - i) ? len : pix_count - 1 - i;
-        for (k = 2; k < last - 8; k += 2) {
-          HashChainInsert(hash_chain, &argb[i + k], i + k);
-        }
-        for (; k < last; ++k) {
-          HashChainInsert(hash_chain, &argb[i + k], i + k);
-        }
-      }
-      i += len;
     } else {
+      len = 1;
+    }
+    // Go with literal or backward reference.
+    assert(len > 0);
+    if (len == 1) {
       AddSingleLiteral(argb[i], use_color_cache, &hashers, refs);
-      HashChainInsert(hash_chain, &argb[i], i);
-      ++i;
-      --min_matches;
-      if (min_matches <= 0) {
-        AddSingleLiteral(argb[i], use_color_cache, &hashers, refs);
-        HashChainInsert(hash_chain, &argb[i], i);
-        ++i;
+    } else {
+      BackwardRefsCursorAdd(refs, PixOrCopyCreateCopy(offset, len));
+      if (use_color_cache) {
+        for (j = i; j < i + len; ++j) VP8LColorCacheInsert(&hashers, argb[j]);
       }
     }
-  }
-  while (i < pix_count) {
-    // Handle the last pixel(s).
-    AddSingleLiteral(argb[i], use_color_cache, &hashers, refs);
-    ++i;
+    i += len;
   }
 
   ok = !refs->error_;
@@ -482,7 +497,7 @@ typedef struct {
 
 static int BackwardReferencesTraceBackwards(
     int xsize, int ysize, const uint32_t* const argb, int quality,
-    int cache_bits, VP8LHashChain* const hash_chain,
+    int cache_bits, const VP8LHashChain* const hash_chain,
     VP8LBackwardRefs* const refs);
 
 static void ConvertPopulationCountTableToBitEstimates(
@@ -558,16 +573,14 @@ static WEBP_INLINE double GetDistanceCost(const CostModel* const m,
   return m->distance_[code] + extra_bits;
 }
 
-static void AddSingleLiteralWithCostModel(
-    const uint32_t* const argb, VP8LHashChain* const hash_chain,
-    VP8LColorCache* const hashers, const CostModel* const cost_model, int idx,
-    int is_last, int use_color_cache, double prev_cost, float* const cost,
-    uint16_t* const dist_array) {
+static void AddSingleLiteralWithCostModel(const uint32_t* const argb,
+                                          VP8LColorCache* const hashers,
+                                          const CostModel* const cost_model,
+                                          int idx, int use_color_cache,
+                                          double prev_cost, float* const cost,
+                                          uint16_t* const dist_array) {
   double cost_val = prev_cost;
   const uint32_t color = argb[0];
-  if (!is_last) {
-    HashChainInsert(hash_chain, argb, idx);
-  }
   if (use_color_cache && VP8LColorCacheContains(hashers, color)) {
     const double mul0 = 0.68;
     const int ix = VP8LColorCacheGetIndex(hashers, color);
@@ -583,30 +596,598 @@ static void AddSingleLiteralWithCostModel(
   }
 }
 
+// -----------------------------------------------------------------------------
+// CostManager and interval handling
+
+// Empirical value to avoid high memory consumption but good for performance.
+#define COST_CACHE_INTERVAL_SIZE_MAX 100
+
+// To perform backward reference every pixel at index index_ is considered and
+// the cost for the MAX_LENGTH following pixels computed. Those following pixels
+// at index index_ + k (k from 0 to MAX_LENGTH) have a cost of:
+//     distance_cost_ at index_ + GetLengthCost(cost_model, k)
+//            (named cost)            (named cached cost)
+// and the minimum value is kept. GetLengthCost(cost_model, k) is cached in an
+// array of size MAX_LENGTH.
+// Instead of performing MAX_LENGTH comparisons per pixel, we keep track of the
+// minimal values using intervals, for which lower_ and upper_ bounds are kept.
+// An interval is defined by the index_ of the pixel that generated it and
+// is only useful in a range of indices from start_ to end_ (exclusive), i.e.
+// it contains the minimum value for pixels between start_ and end_.
+// Intervals are stored in a linked list and ordered by start_. When a new
+// interval has a better minimum, old intervals are split or removed.
+typedef struct CostInterval CostInterval;
+struct CostInterval {
+  double lower_;
+  double upper_;
+  int start_;
+  int end_;
+  double distance_cost_;
+  int index_;
+  CostInterval* previous_;
+  CostInterval* next_;
+};
+
+// The GetLengthCost(cost_model, k) part of the costs is also bounded for
+// efficiency in a set of intervals of a different type.
+// If those intervals are small enough, they are not used for comparison and
+// written into the costs right away.
+typedef struct {
+  double lower_;  // Lower bound of the interval.
+  double upper_;  // Upper bound of the interval.
+  int start_;
+  int end_;       // Exclusive.
+  int do_write_;  // If !=0, the interval is saved to cost instead of being kept
+                  // for comparison.
+} CostCacheInterval;
+
+// This structure is in charge of managing intervals and costs.
+// It caches the different CostCacheInterval, caches the different
+// GetLengthCost(cost_model, k) in cost_cache_ and the CostInterval's (whose
+// count_ is limited by COST_CACHE_INTERVAL_SIZE_MAX).
+#define COST_MANAGER_MAX_FREE_LIST 10
+typedef struct {
+  CostInterval* head_;
+  int count_;  // The number of stored intervals.
+  CostCacheInterval* cache_intervals_;
+  size_t cache_intervals_size_;
+  double cost_cache_[MAX_LENGTH];  // Contains the GetLengthCost(cost_model, k).
+  double min_cost_cache_;          // The minimum value in cost_cache_[1:].
+  double max_cost_cache_;          // The maximum value in cost_cache_[1:].
+  float* costs_;
+  uint16_t* dist_array_;
+  // Most of the time, we only need few intervals -> use a free-list, to avoid
+  // fragmentation with small allocs in most common cases.
+  CostInterval intervals_[COST_MANAGER_MAX_FREE_LIST];
+  CostInterval* free_intervals_;
+  // These are regularly malloc'd remains. This list can't grow larger than than
+  // size COST_CACHE_INTERVAL_SIZE_MAX - COST_MANAGER_MAX_FREE_LIST, note.
+  CostInterval* recycled_intervals_;
+  // Buffer used in BackwardReferencesHashChainDistanceOnly to store the ends
+  // of the intervals that can have impacted the cost at a pixel.
+  int* interval_ends_;
+  int interval_ends_size_;
+} CostManager;
+
+static int IsCostCacheIntervalWritable(int start, int end) {
+  // 100 is the length for which we consider an interval for comparison, and not
+  // for writing.
+  // The first intervals are very small and go in increasing size. This constant
+  // helps merging them into one big interval (up to index 150/200 usually from
+  // which intervals start getting much bigger).
+  // This value is empirical.
+  return (end - start + 1 < 100);
+}
+
+static void CostIntervalAddToFreeList(CostManager* const manager,
+                                      CostInterval* const interval) {
+  interval->next_ = manager->free_intervals_;
+  manager->free_intervals_ = interval;
+}
+
+static int CostIntervalIsInFreeList(const CostManager* const manager,
+                                    const CostInterval* const interval) {
+  return (interval >= &manager->intervals_[0] &&
+          interval <= &manager->intervals_[COST_MANAGER_MAX_FREE_LIST - 1]);
+}
+
+static void CostManagerInitFreeList(CostManager* const manager) {
+  int i;
+  manager->free_intervals_ = NULL;
+  for (i = 0; i < COST_MANAGER_MAX_FREE_LIST; ++i) {
+    CostIntervalAddToFreeList(manager, &manager->intervals_[i]);
+  }
+}
+
+static void DeleteIntervalList(CostManager* const manager,
+                               const CostInterval* interval) {
+  while (interval != NULL) {
+    const CostInterval* const next = interval->next_;
+    if (!CostIntervalIsInFreeList(manager, interval)) {
+      WebPSafeFree((void*)interval);
+    }  // else: do nothing
+    interval = next;
+  }
+}
+
+static void CostManagerClear(CostManager* const manager) {
+  if (manager == NULL) return;
+
+  WebPSafeFree(manager->costs_);
+  WebPSafeFree(manager->cache_intervals_);
+  WebPSafeFree(manager->interval_ends_);
+
+  // Clear the interval lists.
+  DeleteIntervalList(manager, manager->head_);
+  manager->head_ = NULL;
+  DeleteIntervalList(manager, manager->recycled_intervals_);
+  manager->recycled_intervals_ = NULL;
+
+  // Reset pointers, count_ and cache_intervals_size_.
+  memset(manager, 0, sizeof(*manager));
+  CostManagerInitFreeList(manager);
+}
+
+static int CostManagerInit(CostManager* const manager,
+                           uint16_t* const dist_array, int pix_count,
+                           const CostModel* const cost_model) {
+  int i;
+  const int cost_cache_size = (pix_count > MAX_LENGTH) ? MAX_LENGTH : pix_count;
+  // This constant is tied to the cost_model we use.
+  // Empirically, differences between intervals is usually of more than 1.
+  const double min_cost_diff = 0.1;
+
+  manager->costs_ = NULL;
+  manager->cache_intervals_ = NULL;
+  manager->interval_ends_ = NULL;
+  manager->head_ = NULL;
+  manager->recycled_intervals_ = NULL;
+  manager->count_ = 0;
+  manager->dist_array_ = dist_array;
+  CostManagerInitFreeList(manager);
+
+  // Fill in the cost_cache_.
+  manager->cache_intervals_size_ = 1;
+  manager->cost_cache_[0] = 0;
+  for (i = 1; i < cost_cache_size; ++i) {
+    manager->cost_cache_[i] = GetLengthCost(cost_model, i);
+    // Get an approximation of the number of bound intervals.
+    if (fabs(manager->cost_cache_[i] - manager->cost_cache_[i - 1]) >
+        min_cost_diff) {
+      ++manager->cache_intervals_size_;
+    }
+    // Compute the minimum of cost_cache_.
+    if (i == 1) {
+      manager->min_cost_cache_ = manager->cost_cache_[1];
+      manager->max_cost_cache_ = manager->cost_cache_[1];
+    } else if (manager->cost_cache_[i] < manager->min_cost_cache_) {
+      manager->min_cost_cache_ = manager->cost_cache_[i];
+    } else if (manager->cost_cache_[i] > manager->max_cost_cache_) {
+      manager->max_cost_cache_ = manager->cost_cache_[i];
+    }
+  }
+
+  // With the current cost models, we have 15 intervals, so we are safe by
+  // setting a maximum of COST_CACHE_INTERVAL_SIZE_MAX.
+  if (manager->cache_intervals_size_ > COST_CACHE_INTERVAL_SIZE_MAX) {
+    manager->cache_intervals_size_ = COST_CACHE_INTERVAL_SIZE_MAX;
+  }
+  manager->cache_intervals_ = (CostCacheInterval*)WebPSafeMalloc(
+      manager->cache_intervals_size_, sizeof(*manager->cache_intervals_));
+  if (manager->cache_intervals_ == NULL) {
+    CostManagerClear(manager);
+    return 0;
+  }
+
+  // Fill in the cache_intervals_.
+  {
+    double cost_prev = -1e38f;  // unprobably low initial value
+    CostCacheInterval* prev = NULL;
+    CostCacheInterval* cur = manager->cache_intervals_;
+    const CostCacheInterval* const end =
+        manager->cache_intervals_ + manager->cache_intervals_size_;
+
+    // Consecutive values in cost_cache_ are compared and if a big enough
+    // difference is found, a new interval is created and bounded.
+    for (i = 0; i < cost_cache_size; ++i) {
+      const double cost_val = manager->cost_cache_[i];
+      if (i == 0 ||
+          (fabs(cost_val - cost_prev) > min_cost_diff && cur + 1 < end)) {
+        if (i > 1) {
+          const int is_writable =
+              IsCostCacheIntervalWritable(cur->start_, cur->end_);
+          // Merge with the previous interval if both are writable.
+          if (is_writable && cur != manager->cache_intervals_ &&
+              prev->do_write_) {
+            // Update the previous interval.
+            prev->end_ = cur->end_;
+            if (cur->lower_ < prev->lower_) {
+              prev->lower_ = cur->lower_;
+            } else if (cur->upper_ > prev->upper_) {
+              prev->upper_ = cur->upper_;
+            }
+          } else {
+            cur->do_write_ = is_writable;
+            prev = cur;
+            ++cur;
+          }
+        }
+        // Initialize an interval.
+        cur->start_ = i;
+        cur->do_write_ = 0;
+        cur->lower_ = cost_val;
+        cur->upper_ = cost_val;
+      } else {
+        // Update the current interval bounds.
+        if (cost_val < cur->lower_) {
+          cur->lower_ = cost_val;
+        } else if (cost_val > cur->upper_) {
+          cur->upper_ = cost_val;
+        }
+      }
+      cur->end_ = i + 1;
+      cost_prev = cost_val;
+    }
+    manager->cache_intervals_size_ = cur + 1 - manager->cache_intervals_;
+  }
+
+  manager->costs_ = (float*)WebPSafeMalloc(pix_count, sizeof(*manager->costs_));
+  if (manager->costs_ == NULL) {
+    CostManagerClear(manager);
+    return 0;
+  }
+  // Set the initial costs_ high for every pixel as we will keep the minimum.
+  for (i = 0; i < pix_count; ++i) manager->costs_[i] = 1e38f;
+
+  // The cost at pixel is influenced by the cost intervals from previous pixels.
+  // Let us take the specific case where the offset is the same (which actually
+  // happens a lot in case of uniform regions).
+  // pixel i contributes to j>i a cost of: offset cost + cost_cache_[j-i]
+  // pixel i+1 contributes to j>i a cost of: 2*offset cost + cost_cache_[j-i-1]
+  // pixel i+2 contributes to j>i a cost of: 3*offset cost + cost_cache_[j-i-2]
+  // and so on.
+  // A pixel i influences the following length(j) < MAX_LENGTH pixels. What is
+  // the value of j such that pixel i + j cannot influence any of those pixels?
+  // This value is such that:
+  //               max of cost_cache_ < j*offset cost + min of cost_cache_
+  // (pixel i + j 's cost cannot beat the worst cost given by pixel i).
+  // This value will be used to optimize the cost computation in
+  // BackwardReferencesHashChainDistanceOnly.
+  {
+    // The offset cost is computed in GetDistanceCost and has a minimum value of
+    // the minimum in cost_model->distance_. The case where the offset cost is 0
+    // will be dealt with differently later so we are only interested in the
+    // minimum non-zero offset cost.
+    double offset_cost_min = 0.;
+    int size;
+    for (i = 0; i < NUM_DISTANCE_CODES; ++i) {
+      if (cost_model->distance_[i] != 0) {
+        if (offset_cost_min == 0.) {
+          offset_cost_min = cost_model->distance_[i];
+        } else if (cost_model->distance_[i] < offset_cost_min) {
+          offset_cost_min = cost_model->distance_[i];
+        }
+      }
+    }
+    // In case all the cost_model->distance_ is 0, the next non-zero cost we
+    // can have is from the extra bit in GetDistanceCost, hence 1.
+    if (offset_cost_min < 1.) offset_cost_min = 1.;
+
+    size = 1 + (int)ceil((manager->max_cost_cache_ - manager->min_cost_cache_) /
+                         offset_cost_min);
+    // Empirically, we usually end up with a value below 100.
+    if (size > MAX_LENGTH) size = MAX_LENGTH;
+
+    manager->interval_ends_ =
+        (int*)WebPSafeMalloc(size, sizeof(*manager->interval_ends_));
+    if (manager->interval_ends_ == NULL) {
+      CostManagerClear(manager);
+      return 0;
+    }
+    manager->interval_ends_size_ = size;
+  }
+
+  return 1;
+}
+
+// Given the distance_cost for pixel 'index', update the cost at pixel 'i' if it
+// is smaller than the previously computed value.
+static WEBP_INLINE void UpdateCost(CostManager* const manager, int i, int index,
+                                   double distance_cost) {
+  int k = i - index;
+  double cost_tmp;
+  assert(k >= 0 && k < MAX_LENGTH);
+  cost_tmp = distance_cost + manager->cost_cache_[k];
+
+  if (manager->costs_[i] > cost_tmp) {
+    manager->costs_[i] = (float)cost_tmp;
+    manager->dist_array_[i] = k + 1;
+  }
+}
+
+// Given the distance_cost for pixel 'index', update the cost for all the pixels
+// between 'start' and 'end' excluded.
+static WEBP_INLINE void UpdateCostPerInterval(CostManager* const manager,
+                                              int start, int end, int index,
+                                              double distance_cost) {
+  int i;
+  for (i = start; i < end; ++i) UpdateCost(manager, i, index, distance_cost);
+}
+
+// Given two intervals, make 'prev' be the previous one of 'next' in 'manager'.
+static WEBP_INLINE void ConnectIntervals(CostManager* const manager,
+                                         CostInterval* const prev,
+                                         CostInterval* const next) {
+  if (prev != NULL) {
+    prev->next_ = next;
+  } else {
+    manager->head_ = next;
+  }
+
+  if (next != NULL) next->previous_ = prev;
+}
+
+// Pop an interval in the manager.
+static WEBP_INLINE void PopInterval(CostManager* const manager,
+                                    CostInterval* const interval) {
+  CostInterval* const next = interval->next_;
+
+  if (interval == NULL) return;
+
+  ConnectIntervals(manager, interval->previous_, next);
+  if (CostIntervalIsInFreeList(manager, interval)) {
+    CostIntervalAddToFreeList(manager, interval);
+  } else {  // recycle regularly malloc'd intervals too
+    interval->next_ = manager->recycled_intervals_;
+    manager->recycled_intervals_ = interval;
+  }
+  --manager->count_;
+  assert(manager->count_ >= 0);
+}
+
+// Update the cost at index i by going over all the stored intervals that
+// overlap with i.
+static WEBP_INLINE void UpdateCostPerIndex(CostManager* const manager, int i) {
+  CostInterval* current = manager->head_;
+
+  while (current != NULL && current->start_ <= i) {
+    if (current->end_ <= i) {
+      // We have an outdated interval, remove it.
+      CostInterval* next = current->next_;
+      PopInterval(manager, current);
+      current = next;
+    } else {
+      UpdateCost(manager, i, current->index_, current->distance_cost_);
+      current = current->next_;
+    }
+  }
+}
+
+// Given a current orphan interval and its previous interval, before
+// it was orphaned (which can be NULL), set it at the right place in the list
+// of intervals using the start_ ordering and the previous interval as a hint.
+static WEBP_INLINE void PositionOrphanInterval(CostManager* const manager,
+                                               CostInterval* const current,
+                                               CostInterval* previous) {
+  assert(current != NULL);
+
+  if (previous == NULL) previous = manager->head_;
+  while (previous != NULL && current->start_ < previous->start_) {
+    previous = previous->previous_;
+  }
+  while (previous != NULL && previous->next_ != NULL &&
+         previous->next_->start_ < current->start_) {
+    previous = previous->next_;
+  }
+
+  if (previous != NULL) {
+    ConnectIntervals(manager, current, previous->next_);
+  } else {
+    ConnectIntervals(manager, current, manager->head_);
+  }
+  ConnectIntervals(manager, previous, current);
+}
+
+// Insert an interval in the list contained in the manager by starting at
+// interval_in as a hint. The intervals are sorted by start_ value.
+static WEBP_INLINE void InsertInterval(CostManager* const manager,
+                                       CostInterval* const interval_in,
+                                       double distance_cost, double lower,
+                                       double upper, int index, int start,
+                                       int end) {
+  CostInterval* interval_new;
+
+  if (IsCostCacheIntervalWritable(start, end) ||
+      manager->count_ >= COST_CACHE_INTERVAL_SIZE_MAX) {
+    // Write down the interval if it is too small.
+    UpdateCostPerInterval(manager, start, end, index, distance_cost);
+    return;
+  }
+  if (manager->free_intervals_ != NULL) {
+    interval_new = manager->free_intervals_;
+    manager->free_intervals_ = interval_new->next_;
+  } else if (manager->recycled_intervals_ != NULL) {
+    interval_new = manager->recycled_intervals_;
+    manager->recycled_intervals_ = interval_new->next_;
+  } else {   // malloc for good
+    interval_new = (CostInterval*)WebPSafeMalloc(1, sizeof(*interval_new));
+    if (interval_new == NULL) {
+      // Write down the interval if we cannot create it.
+      UpdateCostPerInterval(manager, start, end, index, distance_cost);
+      return;
+    }
+  }
+
+  interval_new->distance_cost_ = distance_cost;
+  interval_new->lower_ = lower;
+  interval_new->upper_ = upper;
+  interval_new->index_ = index;
+  interval_new->start_ = start;
+  interval_new->end_ = end;
+  PositionOrphanInterval(manager, interval_new, interval_in);
+
+  ++manager->count_;
+}
+
+// When an interval has its start_ or end_ modified, it needs to be
+// repositioned in the linked list.
+static WEBP_INLINE void RepositionInterval(CostManager* const manager,
+                                           CostInterval* const interval) {
+  if (IsCostCacheIntervalWritable(interval->start_, interval->end_)) {
+    // Maybe interval has been resized and is small enough to be removed.
+    UpdateCostPerInterval(manager, interval->start_, interval->end_,
+                          interval->index_, interval->distance_cost_);
+    PopInterval(manager, interval);
+    return;
+  }
+
+  // Early exit if interval is at the right spot.
+  if ((interval->previous_ == NULL ||
+       interval->previous_->start_ <= interval->start_) &&
+      (interval->next_ == NULL ||
+       interval->start_ <= interval->next_->start_)) {
+    return;
+  }
+
+  ConnectIntervals(manager, interval->previous_, interval->next_);
+  PositionOrphanInterval(manager, interval, interval->previous_);
+}
+
+// Given a new cost interval defined by its start at index, its last value and
+// distance_cost, add its contributions to the previous intervals and costs.
+// If handling the interval or one of its subintervals becomes to heavy, its
+// contribution is added to the costs right away.
+static WEBP_INLINE void PushInterval(CostManager* const manager,
+                                     double distance_cost, int index,
+                                     int last) {
+  size_t i;
+  CostInterval* interval = manager->head_;
+  CostInterval* interval_next;
+  const CostCacheInterval* const cost_cache_intervals =
+      manager->cache_intervals_;
+
+  for (i = 0; i < manager->cache_intervals_size_ &&
+              cost_cache_intervals[i].start_ < last;
+       ++i) {
+    // Define the intersection of the ith interval with the new one.
+    int start = index + cost_cache_intervals[i].start_;
+    const int end = index + (cost_cache_intervals[i].end_ > last
+                                 ? last
+                                 : cost_cache_intervals[i].end_);
+    const double lower_in = cost_cache_intervals[i].lower_;
+    const double upper_in = cost_cache_intervals[i].upper_;
+    const double lower_full_in = distance_cost + lower_in;
+    const double upper_full_in = distance_cost + upper_in;
+
+    if (cost_cache_intervals[i].do_write_) {
+      UpdateCostPerInterval(manager, start, end, index, distance_cost);
+      continue;
+    }
+
+    for (; interval != NULL && interval->start_ < end && start < end;
+         interval = interval_next) {
+      const double lower_full_interval =
+          interval->distance_cost_ + interval->lower_;
+      const double upper_full_interval =
+          interval->distance_cost_ + interval->upper_;
+
+      interval_next = interval->next_;
+
+      // Make sure we have some overlap
+      if (start >= interval->end_) continue;
+
+      if (lower_full_in >= upper_full_interval) {
+        // When intervals are represented, the lower, the better.
+        // [**********************************************************]
+        // start                                                    end
+        //                   [----------------------------------]
+        //                   interval->start_       interval->end_
+        // If we are worse than what we already have, add whatever we have so
+        // far up to interval.
+        const int start_new = interval->end_;
+        InsertInterval(manager, interval, distance_cost, lower_in, upper_in,
+                       index, start, interval->start_);
+        start = start_new;
+        continue;
+      }
+
+      // We know the two intervals intersect.
+      if (upper_full_in >= lower_full_interval) {
+        // There is no clear cut on which is best, so let's keep both.
+        // [*********[*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*]***********]
+        // start     interval->start_     interval->end_         end
+        // OR
+        // [*********[*-*-*-*-*-*-*-*-*-*-*-]----------------------]
+        // start     interval->start_     end          interval->end_
+        const int end_new = (interval->end_ <= end) ? interval->end_ : end;
+        InsertInterval(manager, interval, distance_cost, lower_in, upper_in,
+                       index, start, end_new);
+        start = end_new;
+      } else if (start <= interval->start_ && interval->end_ <= end) {
+        //                   [----------------------------------]
+        //                   interval->start_       interval->end_
+        // [**************************************************************]
+        // start                                                        end
+        // We can safely remove the old interval as it is fully included.
+        PopInterval(manager, interval);
+      } else {
+        if (interval->start_ <= start && end <= interval->end_) {
+          // [--------------------------------------------------------------]
+          // interval->start_                                  interval->end_
+          //                     [*****************************]
+          //                     start                       end
+          // We have to split the old interval as it fully contains the new one.
+          const int end_original = interval->end_;
+          interval->end_ = start;
+          InsertInterval(manager, interval, interval->distance_cost_,
+                         interval->lower_, interval->upper_, interval->index_,
+                         end, end_original);
+        } else if (interval->start_ < start) {
+          // [------------------------------------]
+          // interval->start_        interval->end_
+          //                     [*****************************]
+          //                     start                       end
+          interval->end_ = start;
+        } else {
+          //              [------------------------------------]
+          //              interval->start_        interval->end_
+          // [*****************************]
+          // start                       end
+          interval->start_ = end;
+        }
+
+        // The interval has been modified, we need to reposition it or write it.
+        RepositionInterval(manager, interval);
+      }
+    }
+    // Insert the remaining interval from start to end.
+    InsertInterval(manager, interval, distance_cost, lower_in, upper_in, index,
+                   start, end);
+  }
+}
+
 static int BackwardReferencesHashChainDistanceOnly(
-    int xsize, int ysize, const uint32_t* const argb,
-    int quality, int cache_bits, VP8LHashChain* const hash_chain,
+    int xsize, int ysize, const uint32_t* const argb, int quality,
+    int cache_bits, const VP8LHashChain* const hash_chain,
     VP8LBackwardRefs* const refs, uint16_t* const dist_array) {
   int i;
   int ok = 0;
   int cc_init = 0;
   const int pix_count = xsize * ysize;
   const int use_color_cache = (cache_bits > 0);
-  float* const cost =
-      (float*)WebPSafeMalloc(pix_count, sizeof(*cost));
   const size_t literal_array_size = sizeof(double) *
       (NUM_LITERAL_CODES + NUM_LENGTH_CODES +
        ((cache_bits > 0) ? (1 << cache_bits) : 0));
   const size_t cost_model_size = sizeof(CostModel) + literal_array_size;
   CostModel* const cost_model =
-      (CostModel*)WebPSafeMalloc(1ULL, cost_model_size);
+      (CostModel*)WebPSafeCalloc(1ULL, cost_model_size);
   VP8LColorCache hashers;
   const int skip_length = 32 + quality;
   const int skip_min_distance_code = 2;
-  int iter_max = GetMaxItersForQuality(quality, 0);
-  const int window_size = GetWindowSizeForHashChain(quality, xsize);
+  CostManager* cost_manager =
+      (CostManager*)WebPSafeMalloc(1ULL, sizeof(*cost_manager));
 
-  if (cost == NULL || cost_model == NULL) goto Error;
+  if (cost_model == NULL || cost_manager == NULL) goto Error;
 
   cost_model->literal_ = (double*)(cost_model + 1);
   if (use_color_cache) {
@@ -618,34 +1199,91 @@ static int BackwardReferencesHashChainDistanceOnly(
     goto Error;
   }
 
-  for (i = 0; i < pix_count; ++i) cost[i] = 1e38f;
+  if (!CostManagerInit(cost_manager, dist_array, pix_count, cost_model)) {
+    goto Error;
+  }
 
   // We loop one pixel at a time, but store all currently best points to
   // non-processed locations from this point.
   dist_array[0] = 0;
-  HashChainReset(hash_chain);
   // Add first pixel as literal.
-  AddSingleLiteralWithCostModel(argb + 0, hash_chain, &hashers, cost_model, 0,
-                                0, use_color_cache, 0.0, cost, dist_array);
+  AddSingleLiteralWithCostModel(argb + 0, &hashers, cost_model, 0,
+                                use_color_cache, 0.0, cost_manager->costs_,
+                                dist_array);
+
   for (i = 1; i < pix_count - 1; ++i) {
-    int offset = 0;
-    int len = 0;
-    double prev_cost = cost[i - 1];
-    const int max_len = MaxFindCopyLength(pix_count - i);
-    HashChainFindCopy(hash_chain, i, argb, max_len, window_size,
-                      iter_max, &offset, &len);
+    int offset = 0, len = 0;
+    double prev_cost = cost_manager->costs_[i - 1];
+    HashChainFindCopy(hash_chain, i, &offset, &len);
     if (len >= MIN_LENGTH) {
       const int code = DistanceToPlaneCode(xsize, offset);
-      const double distance_cost =
-          prev_cost + GetDistanceCost(cost_model, code);
-      int k;
-      for (k = 1; k < len; ++k) {
-        const double cost_val = distance_cost + GetLengthCost(cost_model, k);
-        if (cost[i + k] > cost_val) {
-          cost[i + k] = (float)cost_val;
-          dist_array[i + k] = k + 1;
+      const double offset_cost = GetDistanceCost(cost_model, code);
+      const int first_i = i;
+      int j_max = 0, interval_ends_index = 0;
+      const int is_offset_zero = (offset_cost == 0.);
+
+      if (!is_offset_zero) {
+        j_max = (int)ceil(
+            (cost_manager->max_cost_cache_ - cost_manager->min_cost_cache_) /
+            offset_cost);
+        if (j_max < 1) {
+          j_max = 1;
+        } else if (j_max > cost_manager->interval_ends_size_ - 1) {
+          // This could only happen in the case of MAX_LENGTH.
+          j_max = cost_manager->interval_ends_size_ - 1;
+        }
+      }  // else j_max is unused anyway.
+
+      // Instead of considering all contributions from a pixel i by calling:
+      //         PushInterval(cost_manager, prev_cost + offset_cost, i, len);
+      // we optimize these contributions in case offset_cost stays the same for
+      // consecutive pixels. This describes a set of pixels similar to a
+      // previous set (e.g. constant color regions).
+      for (; i < pix_count - 1; ++i) {
+        int offset_next, len_next;
+        prev_cost = cost_manager->costs_[i - 1];
+
+        if (is_offset_zero) {
+          // No optimization can be made so we just push all of the
+          // contributions from i.
+          PushInterval(cost_manager, prev_cost, i, len);
+        } else {
+          // j_max is chosen as the smallest j such that:
+          //       max of cost_cache_ < j*offset cost + min of cost_cache_
+          // Therefore, the pixel influenced by i-j_max, cannot be influenced
+          // by i. Only the costs after the end of what i contributed need to be
+          // updated. cost_manager->interval_ends_ is a circular buffer that
+          // stores those ends.
+          const double distance_cost = prev_cost + offset_cost;
+          int j = cost_manager->interval_ends_[interval_ends_index];
+          if (i - first_i <= j_max ||
+              !IsCostCacheIntervalWritable(j, i + len)) {
+            PushInterval(cost_manager, distance_cost, i, len);
+          } else {
+            for (; j < i + len; ++j) {
+              UpdateCost(cost_manager, j, i, distance_cost);
+            }
+          }
+          // Store the new end in the circular buffer.
+          assert(interval_ends_index < cost_manager->interval_ends_size_);
+          cost_manager->interval_ends_[interval_ends_index] = i + len;
+          if (++interval_ends_index > j_max) interval_ends_index = 0;
         }
+
+        // Check whether i is the last pixel to consider, as it is handled
+        // differently.
+        if (i + 1 >= pix_count - 1) break;
+        HashChainFindCopy(hash_chain, i + 1, &offset_next, &len_next);
+        if (offset_next != offset) break;
+        len = len_next;
+        UpdateCostPerIndex(cost_manager, i);
+        AddSingleLiteralWithCostModel(argb + i, &hashers, cost_model, i,
+                                      use_color_cache, prev_cost,
+                                      cost_manager->costs_, dist_array);
       }
+      // Submit the last pixel.
+      UpdateCostPerIndex(cost_manager, i + 1);
+
       // This if is for speedup only. It roughly doubles the speed, and
       // makes compression worse by .1 %.
       if (len >= skip_length && code <= skip_min_distance_code) {
@@ -653,53 +1291,55 @@ static int BackwardReferencesHashChainDistanceOnly(
         // lookups for better copies.
         // 1) insert the hashes.
         if (use_color_cache) {
+          int k;
           for (k = 0; k < len; ++k) {
             VP8LColorCacheInsert(&hashers, argb[i + k]);
           }
         }
-        // 2) Add to the hash_chain (but cannot add the last pixel)
+        // 2) jump.
         {
-          const int last = (len + i < pix_count - 1) ? len + i
-                                                     : pix_count - 1;
-          for (k = i; k < last; ++k) {
-            HashChainInsert(hash_chain, &argb[k], k);
-          }
+          const int i_next = i + len - 1;  // for loop does ++i, thus -1 here.
+          for (; i <= i_next; ++i) UpdateCostPerIndex(cost_manager, i + 1);
+          i = i_next;
         }
-        // 3) jump.
-        i += len - 1;  // for loop does ++i, thus -1 here.
         goto next_symbol;
       }
-      if (len != MIN_LENGTH) {
+      if (len > MIN_LENGTH) {
         int code_min_length;
         double cost_total;
-        HashChainFindOffset(hash_chain, i, argb, MIN_LENGTH, window_size,
-                            &offset);
+        offset = HashChainFindOffset(hash_chain, i);
         code_min_length = DistanceToPlaneCode(xsize, offset);
         cost_total = prev_cost +
             GetDistanceCost(cost_model, code_min_length) +
             GetLengthCost(cost_model, 1);
-        if (cost[i + 1] > cost_total) {
-          cost[i + 1] = (float)cost_total;
+        if (cost_manager->costs_[i + 1] > cost_total) {
+          cost_manager->costs_[i + 1] = (float)cost_total;
           dist_array[i + 1] = 2;
         }
       }
+    } else {    // len < MIN_LENGTH
+      UpdateCostPerIndex(cost_manager, i + 1);
     }
-    AddSingleLiteralWithCostModel(argb + i, hash_chain, &hashers, cost_model, i,
-                                  0, use_color_cache, prev_cost, cost,
-                                  dist_array);
+
+    AddSingleLiteralWithCostModel(argb + i, &hashers, cost_model, i,
+                                  use_color_cache, prev_cost,
+                                  cost_manager->costs_, dist_array);
+
  next_symbol: ;
   }
   // Handle the last pixel.
   if (i == (pix_count - 1)) {
-    AddSingleLiteralWithCostModel(argb + i, hash_chain, &hashers, cost_model, i,
-                                  1, use_color_cache, cost[pix_count - 2], cost,
-                                  dist_array);
+    AddSingleLiteralWithCostModel(
+        argb + i, &hashers, cost_model, i, use_color_cache,
+        cost_manager->costs_[pix_count - 2], cost_manager->costs_, dist_array);
   }
+
   ok = !refs->error_;
  Error:
   if (cc_init) VP8LColorCacheClear(&hashers);
+  CostManagerClear(cost_manager);
   WebPSafeFree(cost_model);
-  WebPSafeFree(cost);
+  WebPSafeFree(cost_manager);
   return ok;
 }
 
@@ -723,18 +1363,14 @@ static void TraceBackwards(uint16_t* const dist_array,
 }
 
 static int BackwardReferencesHashChainFollowChosenPath(
-    int xsize, int ysize, const uint32_t* const argb,
-    int quality, int cache_bits,
+    const uint32_t* const argb, int cache_bits,
     const uint16_t* const chosen_path, int chosen_path_size,
-    VP8LHashChain* const hash_chain,
-    VP8LBackwardRefs* const refs) {
-  const int pix_count = xsize * ysize;
+    const VP8LHashChain* const hash_chain, VP8LBackwardRefs* const refs) {
   const int use_color_cache = (cache_bits > 0);
   int ix;
   int i = 0;
   int ok = 0;
   int cc_init = 0;
-  const int window_size = GetWindowSizeForHashChain(quality, xsize);
   VP8LColorCache hashers;
 
   if (use_color_cache) {
@@ -743,25 +1379,17 @@ static int BackwardReferencesHashChainFollowChosenPath(
   }
 
   ClearBackwardRefs(refs);
-  HashChainReset(hash_chain);
   for (ix = 0; ix < chosen_path_size; ++ix) {
-    int offset = 0;
     const int len = chosen_path[ix];
     if (len != 1) {
       int k;
-      HashChainFindOffset(hash_chain, i, argb, len, window_size, &offset);
+      const int offset = HashChainFindOffset(hash_chain, i);
       BackwardRefsCursorAdd(refs, PixOrCopyCreateCopy(offset, len));
       if (use_color_cache) {
         for (k = 0; k < len; ++k) {
           VP8LColorCacheInsert(&hashers, argb[i + k]);
         }
       }
-      {
-        const int last = (len < pix_count - 1 - i) ? len : pix_count - 1 - i;
-        for (k = 0; k < last; ++k) {
-          HashChainInsert(hash_chain, &argb[i + k], i + k);
-        }
-      }
       i += len;
     } else {
       PixOrCopy v;
@@ -774,9 +1402,6 @@ static int BackwardReferencesHashChainFollowChosenPath(
         v = PixOrCopyCreateLiteral(argb[i]);
       }
       BackwardRefsCursorAdd(refs, v);
-      if (i + 1 < pix_count) {
-        HashChainInsert(hash_chain, &argb[i], i);
-      }
       ++i;
     }
   }
@@ -787,11 +1412,10 @@ static int BackwardReferencesHashChainFollowChosenPath(
 }
 
 // Returns 1 on success.
-static int BackwardReferencesTraceBackwards(int xsize, int ysize,
-                                            const uint32_t* const argb,
-                                            int quality, int cache_bits,
-                                            VP8LHashChain* const hash_chain,
-                                            VP8LBackwardRefs* const refs) {
+static int BackwardReferencesTraceBackwards(
+    int xsize, int ysize, const uint32_t* const argb, int quality,
+    int cache_bits, const VP8LHashChain* const hash_chain,
+    VP8LBackwardRefs* const refs) {
   int ok = 0;
   const int dist_array_size = xsize * ysize;
   uint16_t* chosen_path = NULL;
@@ -808,8 +1432,7 @@ static int BackwardReferencesTraceBackwards(int xsize, int ysize,
   }
   TraceBackwards(dist_array, dist_array_size, &chosen_path, &chosen_path_size);
   if (!BackwardReferencesHashChainFollowChosenPath(
-      xsize, ysize, argb, quality, cache_bits, chosen_path, chosen_path_size,
-      hash_chain, refs)) {
+          argb, cache_bits, chosen_path, chosen_path_size, hash_chain, refs)) {
     goto Error;
   }
   ok = 1;
@@ -897,7 +1520,7 @@ static double ComputeCacheEntropy(const uint32_t* argb,
 // Returns 0 in case of memory error.
 static int CalculateBestCacheSize(const uint32_t* const argb,
                                   int xsize, int ysize, int quality,
-                                  VP8LHashChain* const hash_chain,
+                                  const VP8LHashChain* const hash_chain,
                                   VP8LBackwardRefs* const refs,
                                   int* const lz77_computed,
                                   int* const best_cache_bits) {
@@ -917,8 +1540,8 @@ static int CalculateBestCacheSize(const uint32_t* const argb,
     // Local color cache is disabled.
     return 1;
   }
-  if (!BackwardReferencesLz77(xsize, ysize, argb, cache_bits_low, quality, 0,
-                              hash_chain, refs)) {
+  if (!BackwardReferencesLz77(xsize, ysize, argb, cache_bits_low, hash_chain,
+                              refs)) {
     return 0;
   }
   // Do a binary search to find the optimal entropy for cache_bits.
@@ -983,13 +1606,12 @@ static int BackwardRefsWithLocalCache(const uint32_t* const argb,
 }
 
 static VP8LBackwardRefs* GetBackwardReferencesLowEffort(
-    int width, int height, const uint32_t* const argb, int quality,
-    int* const cache_bits, VP8LHashChain* const hash_chain,
+    int width, int height, const uint32_t* const argb,
+    int* const cache_bits, const VP8LHashChain* const hash_chain,
     VP8LBackwardRefs refs_array[2]) {
   VP8LBackwardRefs* refs_lz77 = &refs_array[0];
   *cache_bits = 0;
-  if (!BackwardReferencesLz77(width, height, argb, 0, quality,
-                              1 /* Low effort. */, hash_chain, refs_lz77)) {
+  if (!BackwardReferencesLz77(width, height, argb, 0, hash_chain, refs_lz77)) {
     return NULL;
   }
   BackwardReferences2DLocality(width, refs_lz77);
@@ -998,7 +1620,7 @@ static VP8LBackwardRefs* GetBackwardReferencesLowEffort(
 
 static VP8LBackwardRefs* GetBackwardReferences(
     int width, int height, const uint32_t* const argb, int quality,
-    int* const cache_bits, VP8LHashChain* const hash_chain,
+    int* const cache_bits, const VP8LHashChain* const hash_chain,
     VP8LBackwardRefs refs_array[2]) {
   int lz77_is_useful;
   int lz77_computed;
@@ -1021,8 +1643,8 @@ static VP8LBackwardRefs* GetBackwardReferences(
       }
     }
   } else {
-    if (!BackwardReferencesLz77(width, height, argb, *cache_bits, quality,
-                                0 /* Low effort. */, hash_chain, refs_lz77)) {
+    if (!BackwardReferencesLz77(width, height, argb, *cache_bits, hash_chain,
+                                refs_lz77)) {
       goto Error;
     }
   }
@@ -1081,11 +1703,11 @@ static VP8LBackwardRefs* GetBackwardReferences(
 
 VP8LBackwardRefs* VP8LGetBackwardReferences(
     int width, int height, const uint32_t* const argb, int quality,
-    int low_effort, int* const cache_bits, VP8LHashChain* const hash_chain,
-    VP8LBackwardRefs refs_array[2]) {
+    int low_effort, int* const cache_bits,
+    const VP8LHashChain* const hash_chain, VP8LBackwardRefs refs_array[2]) {
   if (low_effort) {
-    return GetBackwardReferencesLowEffort(width, height, argb, quality,
-                                          cache_bits, hash_chain, refs_array);
+    return GetBackwardReferencesLowEffort(width, height, argb, cache_bits,
+                                          hash_chain, refs_array);
   } else {
     return GetBackwardReferences(width, height, argb, quality, cache_bits,
                                  hash_chain, refs_array);
diff --git a/src/3rdparty/libwebp/src/enc/backward_references.h b/src/3rdparty/libwebp/src/enc/backward_references.h
index daa084d..0cadb11 100644
--- a/src/3rdparty/libwebp/src/enc/backward_references.h
+++ b/src/3rdparty/libwebp/src/enc/backward_references.h
@@ -115,11 +115,12 @@ static WEBP_INLINE uint32_t PixOrCopyDistance(const PixOrCopy* const p) {
 
 typedef struct VP8LHashChain VP8LHashChain;
 struct VP8LHashChain {
-  // Stores the most recently added position with the given hash value.
-  int32_t hash_to_first_index_[HASH_SIZE];
-  // chain_[pos] stores the previous position with the same hash value
-  // for every pixel in the image.
-  int32_t* chain_;
+  // The 20 most significant bits contain the offset at which the best match
+  // is found. These 20 bits are the limit defined by GetWindowSizeForHashChain
+  // (through WINDOW_SIZE = 1<<20).
+  // The lower 12 bits contain the length of the match. The 12 bit limit is
+  // defined in MaxFindCopyLength with MAX_LENGTH=4096.
+  uint32_t* offset_length_;
   // This is the maximum size of the hash_chain that can be constructed.
   // Typically this is the pixel count (width x height) for a given image.
   int size_;
@@ -127,6 +128,9 @@ struct VP8LHashChain {
 
 // Must be called first, to set size.
 int VP8LHashChainInit(VP8LHashChain* const p, int size);
+// Pre-compute the best matches for argb.
+int VP8LHashChainFill(VP8LHashChain* const p, int quality,
+                      const uint32_t* const argb, int xsize, int ysize);
 void VP8LHashChainClear(VP8LHashChain* const p);  // release memory
 
 // -----------------------------------------------------------------------------
@@ -192,8 +196,8 @@ static WEBP_INLINE void VP8LRefsCursorNext(VP8LRefsCursor* const c) {
 // refs[0] or refs[1].
 VP8LBackwardRefs* VP8LGetBackwardReferences(
     int width, int height, const uint32_t* const argb, int quality,
-    int low_effort, int* const cache_bits, VP8LHashChain* const hash_chain,
-    VP8LBackwardRefs refs[2]);
+    int low_effort, int* const cache_bits,
+    const VP8LHashChain* const hash_chain, VP8LBackwardRefs refs[2]);
 
 #ifdef __cplusplus
 }
diff --git a/src/3rdparty/libwebp/src/enc/filter.c b/src/3rdparty/libwebp/src/enc/filter.c
index 41813cf..e8ea8b4 100644
--- a/src/3rdparty/libwebp/src/enc/filter.c
+++ b/src/3rdparty/libwebp/src/enc/filter.c
@@ -107,10 +107,9 @@ static void DoFilter(const VP8EncIterator* const it, int level) {
 //------------------------------------------------------------------------------
 // SSIM metric
 
-enum { KERNEL = 3 };
 static const double kMinValue = 1.e-10;  // minimal threshold
 
-void VP8SSIMAddStats(const DistoStats* const src, DistoStats* const dst) {
+void VP8SSIMAddStats(const VP8DistoStats* const src, VP8DistoStats* const dst) {
   dst->w   += src->w;
   dst->xm  += src->xm;
   dst->ym  += src->ym;
@@ -119,32 +118,7 @@ void VP8SSIMAddStats(const DistoStats* const src, DistoStats* const dst) {
   dst->yym += src->yym;
 }
 
-static void VP8SSIMAccumulate(const uint8_t* src1, int stride1,
-                              const uint8_t* src2, int stride2,
-                              int xo, int yo, int W, int H,
-                              DistoStats* const stats) {
-  const int ymin = (yo - KERNEL < 0) ? 0 : yo - KERNEL;
-  const int ymax = (yo + KERNEL > H - 1) ? H - 1 : yo + KERNEL;
-  const int xmin = (xo - KERNEL < 0) ? 0 : xo - KERNEL;
-  const int xmax = (xo + KERNEL > W - 1) ? W - 1 : xo + KERNEL;
-  int x, y;
-  src1 += ymin * stride1;
-  src2 += ymin * stride2;
-  for (y = ymin; y <= ymax; ++y, src1 += stride1, src2 += stride2) {
-    for (x = xmin; x <= xmax; ++x) {
-      const int s1 = src1[x];
-      const int s2 = src2[x];
-      stats->w   += 1;
-      stats->xm  += s1;
-      stats->ym  += s2;
-      stats->xxm += s1 * s1;
-      stats->xym += s1 * s2;
-      stats->yym += s2 * s2;
-    }
-  }
-}
-
-double VP8SSIMGet(const DistoStats* const stats) {
+double VP8SSIMGet(const VP8DistoStats* const stats) {
   const double xmxm = stats->xm * stats->xm;
   const double ymym = stats->ym * stats->ym;
   const double xmym = stats->xm * stats->ym;
@@ -165,7 +139,7 @@ double VP8SSIMGet(const DistoStats* const stats) {
   return (fden != 0.) ? fnum / fden : kMinValue;
 }
 
-double VP8SSIMGetSquaredError(const DistoStats* const s) {
+double VP8SSIMGetSquaredError(const VP8DistoStats* const s) {
   if (s->w > 0.) {
     const double iw2 = 1. / (s->w * s->w);
     const double sxx = s->xxm * s->w - s->xm * s->xm;
@@ -177,34 +151,66 @@ double VP8SSIMGetSquaredError(const DistoStats* const s) {
   return kMinValue;
 }
 
+#define LIMIT(A, M)  ((A) > (M) ? (M) : (A))
+static void VP8SSIMAccumulateRow(const uint8_t* src1, int stride1,
+                                 const uint8_t* src2, int stride2,
+                                 int y, int W, int H,
+                                 VP8DistoStats* const stats) {
+  int x = 0;
+  const int w0 = LIMIT(VP8_SSIM_KERNEL, W);
+  for (x = 0; x < w0; ++x) {
+    VP8SSIMAccumulateClipped(src1, stride1, src2, stride2, x, y, W, H, stats);
+  }
+  for (; x <= W - 8 + VP8_SSIM_KERNEL; ++x) {
+    VP8SSIMAccumulate(
+        src1 + (y - VP8_SSIM_KERNEL) * stride1 + (x - VP8_SSIM_KERNEL), stride1,
+        src2 + (y - VP8_SSIM_KERNEL) * stride2 + (x - VP8_SSIM_KERNEL), stride2,
+        stats);
+  }
+  for (; x < W; ++x) {
+    VP8SSIMAccumulateClipped(src1, stride1, src2, stride2, x, y, W, H, stats);
+  }
+}
+
 void VP8SSIMAccumulatePlane(const uint8_t* src1, int stride1,
                             const uint8_t* src2, int stride2,
-                            int W, int H, DistoStats* const stats) {
+                            int W, int H, VP8DistoStats* const stats) {
   int x, y;
-  for (y = 0; y < H; ++y) {
+  const int h0 = LIMIT(VP8_SSIM_KERNEL, H);
+  const int h1 = LIMIT(VP8_SSIM_KERNEL, H - VP8_SSIM_KERNEL);
+  for (y = 0; y < h0; ++y) {
+    for (x = 0; x < W; ++x) {
+      VP8SSIMAccumulateClipped(src1, stride1, src2, stride2, x, y, W, H, stats);
+    }
+  }
+  for (; y < h1; ++y) {
+    VP8SSIMAccumulateRow(src1, stride1, src2, stride2, y, W, H, stats);
+  }
+  for (; y < H; ++y) {
     for (x = 0; x < W; ++x) {
-      VP8SSIMAccumulate(src1, stride1, src2, stride2, x, y, W, H, stats);
+      VP8SSIMAccumulateClipped(src1, stride1, src2, stride2, x, y, W, H, stats);
     }
   }
 }
+#undef LIMIT
 
 static double GetMBSSIM(const uint8_t* yuv1, const uint8_t* yuv2) {
   int x, y;
-  DistoStats s = { .0, .0, .0, .0, .0, .0 };
+  VP8DistoStats s = { .0, .0, .0, .0, .0, .0 };
 
   // compute SSIM in a 10 x 10 window
-  for (x = 3; x < 13; x++) {
-    for (y = 3; y < 13; y++) {
-      VP8SSIMAccumulate(yuv1 + Y_OFF_ENC, BPS, yuv2 + Y_OFF_ENC, BPS,
-                        x, y, 16, 16, &s);
+  for (y = VP8_SSIM_KERNEL; y < 16 - VP8_SSIM_KERNEL; y++) {
+    for (x = VP8_SSIM_KERNEL; x < 16 - VP8_SSIM_KERNEL; x++) {
+      VP8SSIMAccumulateClipped(yuv1 + Y_OFF_ENC, BPS, yuv2 + Y_OFF_ENC, BPS,
+                               x, y, 16, 16, &s);
     }
   }
   for (x = 1; x < 7; x++) {
     for (y = 1; y < 7; y++) {
-      VP8SSIMAccumulate(yuv1 + U_OFF_ENC, BPS, yuv2 + U_OFF_ENC, BPS,
-                        x, y, 8, 8, &s);
-      VP8SSIMAccumulate(yuv1 + V_OFF_ENC, BPS, yuv2 + V_OFF_ENC, BPS,
-                        x, y, 8, 8, &s);
+      VP8SSIMAccumulateClipped(yuv1 + U_OFF_ENC, BPS, yuv2 + U_OFF_ENC, BPS,
+                               x, y, 8, 8, &s);
+      VP8SSIMAccumulateClipped(yuv1 + V_OFF_ENC, BPS, yuv2 + V_OFF_ENC, BPS,
+                               x, y, 8, 8, &s);
     }
   }
   return VP8SSIMGet(&s);
@@ -222,6 +228,7 @@ void VP8InitFilter(VP8EncIterator* const it) {
         (*it->lf_stats_)[s][i] = 0;
       }
     }
+    VP8SSIMDspInit();
   }
 }
 
diff --git a/src/3rdparty/libwebp/src/enc/histogram.c b/src/3rdparty/libwebp/src/enc/histogram.c
index 869882d..395372b 100644
--- a/src/3rdparty/libwebp/src/enc/histogram.c
+++ b/src/3rdparty/libwebp/src/enc/histogram.c
@@ -386,29 +386,27 @@ static void UpdateHistogramCost(VP8LHistogram* const h) {
 }
 
 static int GetBinIdForEntropy(double min, double max, double val) {
-  const double range = max - min + 1e-6;
-  const double delta = val - min;
-  return (int)(NUM_PARTITIONS * delta / range);
+  const double range = max - min;
+  if (range > 0.) {
+    const double delta = val - min;
+    return (int)((NUM_PARTITIONS - 1e-6) * delta / range);
+  } else {
+    return 0;
+  }
 }
 
-static int GetHistoBinIndexLowEffort(
-    const VP8LHistogram* const h, const DominantCostRange* const c) {
-  const int bin_id = GetBinIdForEntropy(c->literal_min_, c->literal_max_,
-                                        h->literal_cost_);
+static int GetHistoBinIndex(const VP8LHistogram* const h,
+                            const DominantCostRange* const c, int low_effort) {
+  int bin_id = GetBinIdForEntropy(c->literal_min_, c->literal_max_,
+                                  h->literal_cost_);
   assert(bin_id < NUM_PARTITIONS);
-  return bin_id;
-}
-
-static int GetHistoBinIndex(
-    const VP8LHistogram* const h, const DominantCostRange* const c) {
-  const int bin_id =
-      GetBinIdForEntropy(c->blue_min_, c->blue_max_, h->blue_cost_) +
-      NUM_PARTITIONS * GetBinIdForEntropy(c->red_min_, c->red_max_,
-                                          h->red_cost_) +
-      NUM_PARTITIONS * NUM_PARTITIONS * GetBinIdForEntropy(c->literal_min_,
-                                                           c->literal_max_,
-                                                           h->literal_cost_);
-  assert(bin_id < BIN_SIZE);
+  if (!low_effort) {
+    bin_id = bin_id * NUM_PARTITIONS
+           + GetBinIdForEntropy(c->red_min_, c->red_max_, h->red_cost_);
+    bin_id = bin_id * NUM_PARTITIONS
+           + GetBinIdForEntropy(c->blue_min_, c->blue_max_, h->blue_cost_);
+    assert(bin_id < BIN_SIZE);
+  }
   return bin_id;
 }
 
@@ -469,16 +467,13 @@ static void HistogramAnalyzeEntropyBin(VP8LHistogramSet* const image_histo,
   // bin-hash histograms on three of the dominant (literal, red and blue)
   // symbol costs.
   for (i = 0; i < histo_size; ++i) {
-    int num_histos;
-    VP8LHistogram* const histo = histograms[i];
-    const int16_t bin_id = low_effort ?
-        (int16_t)GetHistoBinIndexLowEffort(histo, &cost_range) :
-        (int16_t)GetHistoBinIndex(histo, &cost_range);
+    const VP8LHistogram* const histo = histograms[i];
+    const int bin_id = GetHistoBinIndex(histo, &cost_range, low_effort);
     const int bin_offset = bin_id * bin_depth;
     // bin_map[n][0] for every bin 'n' maintains the counter for the number of
     // histograms in that bin.
     // Get and increment the num_histos in that bin.
-    num_histos = ++bin_map[bin_offset];
+    const int num_histos = ++bin_map[bin_offset];
     assert(bin_offset + num_histos < bin_depth * BIN_SIZE);
     // Add histogram i'th index at num_histos (last) position in the bin_map.
     bin_map[bin_offset + num_histos] = i;
@@ -636,8 +631,11 @@ static void UpdateQueueFront(HistoQueue* const histo_queue) {
 // -----------------------------------------------------------------------------
 
 static void PreparePair(VP8LHistogram** histograms, int idx1, int idx2,
-                        HistogramPair* const pair,
-                        VP8LHistogram* const histos) {
+                        HistogramPair* const pair) {
+  VP8LHistogram* h1;
+  VP8LHistogram* h2;
+  double sum_cost;
+
   if (idx1 > idx2) {
     const int tmp = idx2;
     idx2 = idx1;
@@ -645,15 +643,17 @@ static void PreparePair(VP8LHistogram** histograms, int idx1, int idx2,
   }
   pair->idx1 = idx1;
   pair->idx2 = idx2;
-  pair->cost_diff =
-      HistogramAddEval(histograms[idx1], histograms[idx2], histos, 0);
-  pair->cost_combo = histos->bit_cost_;
+  h1 = histograms[idx1];
+  h2 = histograms[idx2];
+  sum_cost = h1->bit_cost_ + h2->bit_cost_;
+  pair->cost_combo = 0.;
+  GetCombinedHistogramEntropy(h1, h2, sum_cost, &pair->cost_combo);
+  pair->cost_diff = pair->cost_combo - sum_cost;
 }
 
 // Combines histograms by continuously choosing the one with the highest cost
 // reduction.
-static int HistogramCombineGreedy(VP8LHistogramSet* const image_histo,
-                                  VP8LHistogram* const histos) {
+static int HistogramCombineGreedy(VP8LHistogramSet* const image_histo) {
   int ok = 0;
   int image_histo_size = image_histo->size;
   int i, j;
@@ -672,8 +672,7 @@ static int HistogramCombineGreedy(VP8LHistogramSet* const image_histo,
     clusters[i] = i;
     for (j = i + 1; j < image_histo_size; ++j) {
       // Initialize positions array.
-      PreparePair(histograms, i, j, &histo_queue.queue[histo_queue.size],
-                  histos);
+      PreparePair(histograms, i, j, &histo_queue.queue[histo_queue.size]);
       UpdateQueueFront(&histo_queue);
     }
   }
@@ -715,7 +714,7 @@ static int HistogramCombineGreedy(VP8LHistogramSet* const image_histo,
     for (i = 0; i < image_histo_size; ++i) {
       if (clusters[i] != idx1) {
         PreparePair(histograms, idx1, clusters[i],
-                    &histo_queue.queue[histo_queue.size], histos);
+                    &histo_queue.queue[histo_queue.size]);
         UpdateQueueFront(&histo_queue);
       }
     }
@@ -736,11 +735,10 @@ static int HistogramCombineGreedy(VP8LHistogramSet* const image_histo,
   return ok;
 }
 
-static VP8LHistogram* HistogramCombineStochastic(
-    VP8LHistogramSet* const image_histo,
-    VP8LHistogram* tmp_histo,
-    VP8LHistogram* best_combo,
-    int quality, int min_cluster_size) {
+static void HistogramCombineStochastic(VP8LHistogramSet* const image_histo,
+                                       VP8LHistogram* tmp_histo,
+                                       VP8LHistogram* best_combo,
+                                       int quality, int min_cluster_size) {
   int iter;
   uint32_t seed = 0;
   int tries_with_no_success = 0;
@@ -800,7 +798,6 @@ static VP8LHistogram* HistogramCombineStochastic(
     }
   }
   image_histo->size = image_histo_size;
-  return best_combo;
 }
 
 // -----------------------------------------------------------------------------
@@ -808,24 +805,23 @@ static VP8LHistogram* HistogramCombineStochastic(
 
 // Find the best 'out' histogram for each of the 'in' histograms.
 // Note: we assume that out[]->bit_cost_ is already up-to-date.
-static void HistogramRemap(const VP8LHistogramSet* const orig_histo,
-                           const VP8LHistogramSet* const image_histo,
+static void HistogramRemap(const VP8LHistogramSet* const in,
+                           const VP8LHistogramSet* const out,
                            uint16_t* const symbols) {
   int i;
-  VP8LHistogram** const orig_histograms = orig_histo->histograms;
-  VP8LHistogram** const histograms = image_histo->histograms;
-  const int orig_histo_size = orig_histo->size;
-  const int image_histo_size = image_histo->size;
-  if (image_histo_size > 1) {
-    for (i = 0; i < orig_histo_size; ++i) {
+  VP8LHistogram** const in_histo = in->histograms;
+  VP8LHistogram** const out_histo = out->histograms;
+  const int in_size = in->size;
+  const int out_size = out->size;
+  if (out_size > 1) {
+    for (i = 0; i < in_size; ++i) {
       int best_out = 0;
-      double best_bits =
-          HistogramAddThresh(histograms[0], orig_histograms[i], MAX_COST);
+      double best_bits = MAX_COST;
       int k;
-      for (k = 1; k < image_histo_size; ++k) {
+      for (k = 0; k < out_size; ++k) {
         const double cur_bits =
-            HistogramAddThresh(histograms[k], orig_histograms[i], best_bits);
-        if (cur_bits < best_bits) {
+            HistogramAddThresh(out_histo[k], in_histo[i], best_bits);
+        if (k == 0 || cur_bits < best_bits) {
           best_bits = cur_bits;
           best_out = k;
         }
@@ -833,20 +829,20 @@ static void HistogramRemap(const VP8LHistogramSet* const orig_histo,
       symbols[i] = best_out;
     }
   } else {
-    assert(image_histo_size == 1);
-    for (i = 0; i < orig_histo_size; ++i) {
+    assert(out_size == 1);
+    for (i = 0; i < in_size; ++i) {
       symbols[i] = 0;
     }
   }
 
   // Recompute each out based on raw and symbols.
-  for (i = 0; i < image_histo_size; ++i) {
-    HistogramClear(histograms[i]);
+  for (i = 0; i < out_size; ++i) {
+    HistogramClear(out_histo[i]);
   }
 
-  for (i = 0; i < orig_histo_size; ++i) {
+  for (i = 0; i < in_size; ++i) {
     const int idx = symbols[i];
-    VP8LHistogramAdd(orig_histograms[i], histograms[idx], histograms[idx]);
+    VP8LHistogramAdd(in_histo[i], out_histo[idx], out_histo[idx]);
   }
 }
 
@@ -920,11 +916,10 @@ int VP8LGetHistoImageSymbols(int xsize, int ysize,
     const float x = quality / 100.f;
     // cubic ramp between 1 and MAX_HISTO_GREEDY:
     const int threshold_size = (int)(1 + (x * x * x) * (MAX_HISTO_GREEDY - 1));
-    cur_combo = HistogramCombineStochastic(image_histo,
-                                           tmp_histos->histograms[0],
-                                           cur_combo, quality, threshold_size);
+    HistogramCombineStochastic(image_histo, tmp_histos->histograms[0],
+                               cur_combo, quality, threshold_size);
     if ((image_histo->size <= threshold_size) &&
-        !HistogramCombineGreedy(image_histo, cur_combo)) {
+        !HistogramCombineGreedy(image_histo)) {
       goto Error;
     }
   }
diff --git a/src/3rdparty/libwebp/src/enc/near_lossless.c b/src/3rdparty/libwebp/src/enc/near_lossless.c
index 9bc0f0e..f4ab91f 100644
--- a/src/3rdparty/libwebp/src/enc/near_lossless.c
+++ b/src/3rdparty/libwebp/src/enc/near_lossless.c
@@ -14,6 +14,7 @@
 // Author: Jyrki Alakuijala (jyrki@google.com)
 // Converted to C by Aleksander Kramarz (akramarz@google.com)
 
+#include <assert.h>
 #include <stdlib.h>
 
 #include "../dsp/lossless.h"
@@ -23,42 +24,14 @@
 #define MIN_DIM_FOR_NEAR_LOSSLESS 64
 #define MAX_LIMIT_BITS             5
 
-// Computes quantized pixel value and distance from original value.
-static void GetValAndDistance(int a, int initial, int bits,
-                              int* const val, int* const distance) {
-  const int mask = ~((1 << bits) - 1);
-  *val = (initial & mask) | (initial >> (8 - bits));
-  *distance = 2 * abs(a - *val);
-}
-
-// Clamps the value to range [0, 255].
-static int Clamp8b(int val) {
-  const int min_val = 0;
-  const int max_val = 0xff;
-  return (val < min_val) ? min_val : (val > max_val) ? max_val : val;
-}
-
-// Quantizes values {a, a+(1<<bits), a-(1<<bits)} and returns the nearest one.
+// Quantizes the value up or down to a multiple of 1<<bits (or to 255),
+// choosing the closer one, resolving ties using bankers' rounding.
 static int FindClosestDiscretized(int a, int bits) {
-  int best_val = a, i;
-  int min_distance = 256;
-
-  for (i = -1; i <= 1; ++i) {
-    int candidate, distance;
-    const int val = Clamp8b(a + i * (1 << bits));
-    GetValAndDistance(a, val, bits, &candidate, &distance);
-    if (i != 0) {
-      ++distance;
-    }
-    // Smallest distance but favor i == 0 over i == -1 and i == 1
-    // since that keeps the overall intensity more constant in the
-    // images.
-    if (distance < min_distance) {
-      min_distance = distance;
-      best_val = candidate;
-    }
-  }
-  return best_val;
+  const int mask = (1 << bits) - 1;
+  const int biased = a + (mask >> 1) + ((a >> bits) & 1);
+  assert(bits > 0);
+  if (biased > 0xff) return 0xff;
+  return biased & ~mask;
 }
 
 // Applies FindClosestDiscretized to all channels of pixel.
@@ -124,22 +97,11 @@ static void NearLossless(int xsize, int ysize, uint32_t* argb,
   }
 }
 
-static int QualityToLimitBits(int quality) {
-  // quality mapping:
-  //  0..19 -> 5
-  //  0..39 -> 4
-  //  0..59 -> 3
-  //  0..79 -> 2
-  //  0..99 -> 1
-  //  100   -> 0
-  return MAX_LIMIT_BITS - quality / 20;
-}
-
 int VP8ApplyNearLossless(int xsize, int ysize, uint32_t* argb, int quality) {
   int i;
   uint32_t* const copy_buffer =
       (uint32_t*)WebPSafeMalloc(xsize * 3, sizeof(*copy_buffer));
-  const int limit_bits = QualityToLimitBits(quality);
+  const int limit_bits = VP8LNearLosslessBits(quality);
   assert(argb != NULL);
   assert(limit_bits >= 0);
   assert(limit_bits <= MAX_LIMIT_BITS);
diff --git a/src/3rdparty/libwebp/src/enc/picture.c b/src/3rdparty/libwebp/src/enc/picture.c
index 26679a7..d9befbc 100644
--- a/src/3rdparty/libwebp/src/enc/picture.c
+++ b/src/3rdparty/libwebp/src/enc/picture.c
@@ -237,6 +237,8 @@ static size_t Encode(const uint8_t* rgba, int width, int height, int stride,
   WebPMemoryWriter wrt;
   int ok;
 
+  if (output == NULL) return 0;
+
   if (!WebPConfigPreset(&config, WEBP_PRESET_DEFAULT, quality_factor) ||
       !WebPPictureInit(&pic)) {
     return 0;  // shouldn't happen, except if system installation is broken
diff --git a/src/3rdparty/libwebp/src/enc/picture_csp.c b/src/3rdparty/libwebp/src/enc/picture_csp.c
index 0ef5f9e..607a624 100644
--- a/src/3rdparty/libwebp/src/enc/picture_csp.c
+++ b/src/3rdparty/libwebp/src/enc/picture_csp.c
@@ -1125,32 +1125,44 @@ static int Import(WebPPicture* const picture,
 
 int WebPPictureImportRGB(WebPPicture* picture,
                          const uint8_t* rgb, int rgb_stride) {
-  return (picture != NULL) ? Import(picture, rgb, rgb_stride, 3, 0, 0) : 0;
+  return (picture != NULL && rgb != NULL)
+             ? Import(picture, rgb, rgb_stride, 3, 0, 0)
+             : 0;
 }
 
 int WebPPictureImportBGR(WebPPicture* picture,
                          const uint8_t* rgb, int rgb_stride) {
-  return (picture != NULL) ? Import(picture, rgb, rgb_stride, 3, 1, 0) : 0;
+  return (picture != NULL && rgb != NULL)
+             ? Import(picture, rgb, rgb_stride, 3, 1, 0)
+             : 0;
 }
 
 int WebPPictureImportRGBA(WebPPicture* picture,
                           const uint8_t* rgba, int rgba_stride) {
-  return (picture != NULL) ? Import(picture, rgba, rgba_stride, 4, 0, 1) : 0;
+  return (picture != NULL && rgba != NULL)
+             ? Import(picture, rgba, rgba_stride, 4, 0, 1)
+             : 0;
 }
 
 int WebPPictureImportBGRA(WebPPicture* picture,
                           const uint8_t* rgba, int rgba_stride) {
-  return (picture != NULL) ? Import(picture, rgba, rgba_stride, 4, 1, 1) : 0;
+  return (picture != NULL && rgba != NULL)
+             ? Import(picture, rgba, rgba_stride, 4, 1, 1)
+             : 0;
 }
 
 int WebPPictureImportRGBX(WebPPicture* picture,
                           const uint8_t* rgba, int rgba_stride) {
-  return (picture != NULL) ? Import(picture, rgba, rgba_stride, 4, 0, 0) : 0;
+  return (picture != NULL && rgba != NULL)
+             ? Import(picture, rgba, rgba_stride, 4, 0, 0)
+             : 0;
 }
 
 int WebPPictureImportBGRX(WebPPicture* picture,
                           const uint8_t* rgba, int rgba_stride) {
-  return (picture != NULL) ? Import(picture, rgba, rgba_stride, 4, 1, 0) : 0;
+  return (picture != NULL && rgba != NULL)
+             ? Import(picture, rgba, rgba_stride, 4, 1, 0)
+             : 0;
 }
 
 //------------------------------------------------------------------------------
diff --git a/src/3rdparty/libwebp/src/enc/picture_psnr.c b/src/3rdparty/libwebp/src/enc/picture_psnr.c
index 40214ef..81ab1b5 100644
--- a/src/3rdparty/libwebp/src/enc/picture_psnr.c
+++ b/src/3rdparty/libwebp/src/enc/picture_psnr.c
@@ -27,7 +27,7 @@
 
 static void AccumulateLSIM(const uint8_t* src, int src_stride,
                            const uint8_t* ref, int ref_stride,
-                           int w, int h, DistoStats* stats) {
+                           int w, int h, VP8DistoStats* stats) {
   int x, y;
   double total_sse = 0.;
   for (y = 0; y < h; ++y) {
@@ -71,11 +71,13 @@ static float GetPSNR(const double v) {
 
 int WebPPictureDistortion(const WebPPicture* src, const WebPPicture* ref,
                           int type, float result[5]) {
-  DistoStats stats[5];
+  VP8DistoStats stats[5];
   int w, h;
 
   memset(stats, 0, sizeof(stats));
 
+  VP8SSIMDspInit();
+
   if (src == NULL || ref == NULL ||
       src->width != ref->width || src->height != ref->height ||
       src->use_argb != ref->use_argb || result == NULL) {
diff --git a/src/3rdparty/libwebp/src/enc/quant.c b/src/3rdparty/libwebp/src/enc/quant.c
index dd6885a..549ad26 100644
--- a/src/3rdparty/libwebp/src/enc/quant.c
+++ b/src/3rdparty/libwebp/src/enc/quant.c
@@ -30,8 +30,6 @@
 #define SNS_TO_DQ 0.9     // Scaling constant between the sns value and the QP
                           // power-law modulation. Must be strictly less than 1.
 
-#define I4_PENALTY 14000  // Rate-penalty for quick i4/i16 decision
-
 // number of non-zero coeffs below which we consider the block very flat
 // (and apply a penalty to complex predictions)
 #define FLATNESS_LIMIT_I16 10      // I16 mode
@@ -236,6 +234,8 @@ static int ExpandMatrix(VP8Matrix* const m, int type) {
   return (sum + 8) >> 4;
 }
 
+static void CheckLambdaValue(int* const v) { if (*v < 1) *v = 1; }
+
 static void SetupMatrices(VP8Encoder* enc) {
   int i;
   const int tlambda_scale =
@@ -245,7 +245,7 @@ static void SetupMatrices(VP8Encoder* enc) {
   for (i = 0; i < num_segments; ++i) {
     VP8SegmentInfo* const m = &enc->dqm_[i];
     const int q = m->quant_;
-    int q4, q16, quv;
+    int q_i4, q_i16, q_uv;
     m->y1_.q_[0] = kDcTable[clip(q + enc->dq_y1_dc_, 0, 127)];
     m->y1_.q_[1] = kAcTable[clip(q,                  0, 127)];
 
@@ -255,21 +255,33 @@ static void SetupMatrices(VP8Encoder* enc) {
     m->uv_.q_[0] = kDcTable[clip(q + enc->dq_uv_dc_, 0, 117)];
     m->uv_.q_[1] = kAcTable[clip(q + enc->dq_uv_ac_, 0, 127)];
 
-    q4  = ExpandMatrix(&m->y1_, 0);
-    q16 = ExpandMatrix(&m->y2_, 1);
-    quv = ExpandMatrix(&m->uv_, 2);
-
-    m->lambda_i4_          = (3 * q4 * q4) >> 7;
-    m->lambda_i16_         = (3 * q16 * q16);
-    m->lambda_uv_          = (3 * quv * quv) >> 6;
-    m->lambda_mode_        = (1 * q4 * q4) >> 7;
-    m->lambda_trellis_i4_  = (7 * q4 * q4) >> 3;
-    m->lambda_trellis_i16_ = (q16 * q16) >> 2;
-    m->lambda_trellis_uv_  = (quv *quv) << 1;
-    m->tlambda_            = (tlambda_scale * q4) >> 5;
+    q_i4  = ExpandMatrix(&m->y1_, 0);
+    q_i16 = ExpandMatrix(&m->y2_, 1);
+    q_uv  = ExpandMatrix(&m->uv_, 2);
+
+    m->lambda_i4_          = (3 * q_i4 * q_i4) >> 7;
+    m->lambda_i16_         = (3 * q_i16 * q_i16);
+    m->lambda_uv_          = (3 * q_uv * q_uv) >> 6;
+    m->lambda_mode_        = (1 * q_i4 * q_i4) >> 7;
+    m->lambda_trellis_i4_  = (7 * q_i4 * q_i4) >> 3;
+    m->lambda_trellis_i16_ = (q_i16 * q_i16) >> 2;
+    m->lambda_trellis_uv_  = (q_uv * q_uv) << 1;
+    m->tlambda_            = (tlambda_scale * q_i4) >> 5;
+
+    // none of these constants should be < 1
+    CheckLambdaValue(&m->lambda_i4_);
+    CheckLambdaValue(&m->lambda_i16_);
+    CheckLambdaValue(&m->lambda_uv_);
+    CheckLambdaValue(&m->lambda_mode_);
+    CheckLambdaValue(&m->lambda_trellis_i4_);
+    CheckLambdaValue(&m->lambda_trellis_i16_);
+    CheckLambdaValue(&m->lambda_trellis_uv_);
+    CheckLambdaValue(&m->tlambda_);
 
     m->min_disto_ = 10 * m->y1_.q_[0];   // quantization-aware min disto
     m->max_edge_  = 0;
+
+    m->i4_penalty_ = 1000 * q_i4 * q_i4;
   }
 }
 
@@ -348,7 +360,12 @@ static int SegmentsAreEquivalent(const VP8SegmentInfo* const S1,
 
 static void SimplifySegments(VP8Encoder* const enc) {
   int map[NUM_MB_SEGMENTS] = { 0, 1, 2, 3 };
-  const int num_segments = enc->segment_hdr_.num_segments_;
+  // 'num_segments_' is previously validated and <= NUM_MB_SEGMENTS, but an
+  // explicit check is needed to avoid a spurious warning about 'i' exceeding
+  // array bounds of 'dqm_' with some compilers (noticed with gcc-4.9).
+  const int num_segments = (enc->segment_hdr_.num_segments_ < NUM_MB_SEGMENTS)
+                               ? enc->segment_hdr_.num_segments_
+                               : NUM_MB_SEGMENTS;
   int num_final_segments = 1;
   int s1, s2;
   for (s1 = 1; s1 < num_segments; ++s1) {    // find similar segments
@@ -1127,19 +1144,29 @@ static void RefineUsingDistortion(VP8EncIterator* const it,
                                   int try_both_modes, int refine_uv_mode,
                                   VP8ModeScore* const rd) {
   score_t best_score = MAX_COST;
-  score_t score_i4 = (score_t)I4_PENALTY;
-  int16_t tmp_levels[16][16];
-  uint8_t modes_i4[16];
   int nz = 0;
   int mode;
   int is_i16 = try_both_modes || (it->mb_->type_ == 1);
 
+  const VP8SegmentInfo* const dqm = &it->enc_->dqm_[it->mb_->segment_];
+  // Some empiric constants, of approximate order of magnitude.
+  const int lambda_d_i16 = 106;
+  const int lambda_d_i4 = 11;
+  const int lambda_d_uv = 120;
+  score_t score_i4 = dqm->i4_penalty_;
+  score_t i4_bit_sum = 0;
+  const score_t bit_limit = it->enc_->mb_header_limit_;
+
   if (is_i16) {   // First, evaluate Intra16 distortion
     int best_mode = -1;
     const uint8_t* const src = it->yuv_in_ + Y_OFF_ENC;
     for (mode = 0; mode < NUM_PRED_MODES; ++mode) {
       const uint8_t* const ref = it->yuv_p_ + VP8I16ModeOffsets[mode];
-      const score_t score = VP8SSE16x16(src, ref);
+      const score_t score = VP8SSE16x16(src, ref) * RD_DISTO_MULT
+                          + VP8FixedCostsI16[mode] * lambda_d_i16;
+      if (mode > 0 && VP8FixedCostsI16[mode] > bit_limit) {
+        continue;
+      }
       if (score < best_score) {
         best_mode = mode;
         best_score = score;
@@ -1159,25 +1186,28 @@ static void RefineUsingDistortion(VP8EncIterator* const it,
       int best_i4_mode = -1;
       score_t best_i4_score = MAX_COST;
       const uint8_t* const src = it->yuv_in_ + Y_OFF_ENC + VP8Scan[it->i4_];
+      const uint16_t* const mode_costs = GetCostModeI4(it, rd->modes_i4);
 
       VP8MakeIntra4Preds(it);
       for (mode = 0; mode < NUM_BMODES; ++mode) {
         const uint8_t* const ref = it->yuv_p_ + VP8I4ModeOffsets[mode];
-        const score_t score = VP8SSE4x4(src, ref);
+        const score_t score = VP8SSE4x4(src, ref) * RD_DISTO_MULT
+                            + mode_costs[mode] * lambda_d_i4;
         if (score < best_i4_score) {
           best_i4_mode = mode;
           best_i4_score = score;
         }
       }
-      modes_i4[it->i4_] = best_i4_mode;
+      i4_bit_sum += mode_costs[best_i4_mode];
+      rd->modes_i4[it->i4_] = best_i4_mode;
       score_i4 += best_i4_score;
-      if (score_i4 >= best_score) {
+      if (score_i4 >= best_score || i4_bit_sum > bit_limit) {
         // Intra4 won't be better than Intra16. Bail out and pick Intra16.
         is_i16 = 1;
         break;
       } else {  // reconstruct partial block inside yuv_out2_ buffer
         uint8_t* const tmp_dst = it->yuv_out2_ + Y_OFF_ENC + VP8Scan[it->i4_];
-        nz |= ReconstructIntra4(it, tmp_levels[it->i4_],
+        nz |= ReconstructIntra4(it, rd->y_ac_levels[it->i4_],
                                 src, tmp_dst, best_i4_mode) << it->i4_;
       }
     } while (VP8IteratorRotateI4(it, it->yuv_out2_ + Y_OFF_ENC));
@@ -1185,8 +1215,7 @@ static void RefineUsingDistortion(VP8EncIterator* const it,
 
   // Final reconstruction, depending on which mode is selected.
   if (!is_i16) {
-    VP8SetIntra4Mode(it, modes_i4);
-    memcpy(rd->y_ac_levels, tmp_levels, sizeof(tmp_levels));
+    VP8SetIntra4Mode(it, rd->modes_i4);
     SwapOut(it);
     best_score = score_i4;
   } else {
@@ -1200,7 +1229,8 @@ static void RefineUsingDistortion(VP8EncIterator* const it,
     const uint8_t* const src = it->yuv_in_ + U_OFF_ENC;
     for (mode = 0; mode < NUM_PRED_MODES; ++mode) {
       const uint8_t* const ref = it->yuv_p_ + VP8UVModeOffsets[mode];
-      const score_t score = VP8SSE16x8(src, ref);
+      const score_t score = VP8SSE16x8(src, ref) * RD_DISTO_MULT
+                          + VP8FixedCostsUV[mode] * lambda_d_uv;
       if (score < best_uv_score) {
         best_mode = mode;
         best_uv_score = score;
diff --git a/src/3rdparty/libwebp/src/enc/vp8enci.h b/src/3rdparty/libwebp/src/enc/vp8enci.h
index b2cc8d1..c1fbd76 100644
--- a/src/3rdparty/libwebp/src/enc/vp8enci.h
+++ b/src/3rdparty/libwebp/src/enc/vp8enci.h
@@ -22,10 +22,6 @@
 #include "../utils/utils.h"
 #include "../webp/encode.h"
 
-#ifdef WEBP_EXPERIMENTAL_FEATURES
-#include "./vp8li.h"
-#endif  // WEBP_EXPERIMENTAL_FEATURES
-
 #ifdef __cplusplus
 extern "C" {
 #endif
@@ -36,7 +32,7 @@ extern "C" {
 // version numbers
 #define ENC_MAJ_VERSION 0
 #define ENC_MIN_VERSION 5
-#define ENC_REV_VERSION 0
+#define ENC_REV_VERSION 1
 
 enum { MAX_LF_LEVELS = 64,       // Maximum loop filter level
        MAX_VARIABLE_LEVEL = 67,  // last (inclusive) level with variable cost
@@ -200,6 +196,9 @@ typedef struct {
   int lambda_i16_, lambda_i4_, lambda_uv_;
   int lambda_mode_, lambda_trellis_, tlambda_;
   int lambda_trellis_i16_, lambda_trellis_i4_, lambda_trellis_uv_;
+
+  // lambda values for distortion-based evaluation
+  score_t i4_penalty_;   // penalty for using Intra4
 } VP8SegmentInfo;
 
 // Handy transient struct to accumulate score and info during RD-optimization
@@ -395,6 +394,7 @@ struct VP8Encoder {
   int method_;               // 0=fastest, 6=best/slowest.
   VP8RDLevel rd_opt_level_;  // Deduced from method_.
   int max_i4_header_bits_;   // partition #0 safeness factor
+  int mb_header_limit_;      // rough limit for header bits per MB
   int thread_level_;         // derived from config->thread_level
   int do_search_;            // derived from config->target_XXX
   int use_tokens_;           // if true, use token buffer
@@ -477,17 +477,12 @@ int VP8EncFinishAlpha(VP8Encoder* const enc);   // finalize compressed data
 int VP8EncDeleteAlpha(VP8Encoder* const enc);   // delete compressed data
 
   // in filter.c
-
-// SSIM utils
-typedef struct {
-  double w, xm, ym, xxm, xym, yym;
-} DistoStats;
-void VP8SSIMAddStats(const DistoStats* const src, DistoStats* const dst);
+void VP8SSIMAddStats(const VP8DistoStats* const src, VP8DistoStats* const dst);
 void VP8SSIMAccumulatePlane(const uint8_t* src1, int stride1,
                             const uint8_t* src2, int stride2,
-                            int W, int H, DistoStats* const stats);
-double VP8SSIMGet(const DistoStats* const stats);
-double VP8SSIMGetSquaredError(const DistoStats* const stats);
+                            int W, int H, VP8DistoStats* const stats);
+double VP8SSIMGet(const VP8DistoStats* const stats);
+double VP8SSIMGetSquaredError(const VP8DistoStats* const stats);
 
 // autofilter
 void VP8InitFilter(VP8EncIterator* const it);
diff --git a/src/3rdparty/libwebp/src/enc/vp8l.c b/src/3rdparty/libwebp/src/enc/vp8l.c
index db94e78..c16e256 100644
--- a/src/3rdparty/libwebp/src/enc/vp8l.c
+++ b/src/3rdparty/libwebp/src/enc/vp8l.c
@@ -126,54 +126,8 @@ static int AnalyzeAndCreatePalette(const WebPPicture* const pic,
                                    int low_effort,
                                    uint32_t palette[MAX_PALETTE_SIZE],
                                    int* const palette_size) {
-  int i, x, y, key;
-  int num_colors = 0;
-  uint8_t in_use[MAX_PALETTE_SIZE * 4] = { 0 };
-  uint32_t colors[MAX_PALETTE_SIZE * 4];
-  static const uint32_t kHashMul = 0x1e35a7bd;
-  const uint32_t* argb = pic->argb;
-  const int width = pic->width;
-  const int height = pic->height;
-  uint32_t last_pix = ~argb[0];   // so we're sure that last_pix != argb[0]
-
-  for (y = 0; y < height; ++y) {
-    for (x = 0; x < width; ++x) {
-      if (argb[x] == last_pix) {
-        continue;
-      }
-      last_pix = argb[x];
-      key = (kHashMul * last_pix) >> PALETTE_KEY_RIGHT_SHIFT;
-      while (1) {
-        if (!in_use[key]) {
-          colors[key] = last_pix;
-          in_use[key] = 1;
-          ++num_colors;
-          if (num_colors > MAX_PALETTE_SIZE) {
-            return 0;
-          }
-          break;
-        } else if (colors[key] == last_pix) {
-          // The color is already there.
-          break;
-        } else {
-          // Some other color sits there.
-          // Do linear conflict resolution.
-          ++key;
-          key &= (MAX_PALETTE_SIZE * 4 - 1);  // key mask for 1K buffer.
-        }
-      }
-    }
-    argb += pic->argb_stride;
-  }
-
-  // TODO(skal): could we reuse in_use[] to speed up EncodePalette()?
-  num_colors = 0;
-  for (i = 0; i < (int)(sizeof(in_use) / sizeof(in_use[0])); ++i) {
-    if (in_use[i]) {
-      palette[num_colors] = colors[i];
-      ++num_colors;
-    }
-  }
+  const int num_colors = WebPGetColorPalette(pic, palette);
+  if (num_colors > MAX_PALETTE_SIZE) return 0;
   *palette_size = num_colors;
   qsort(palette, num_colors, sizeof(*palette), PaletteCompareColorsForQsort);
   if (!low_effort && PaletteHasNonMonotonousDeltas(palette, num_colors)) {
@@ -336,7 +290,7 @@ static int AnalyzeEntropy(const uint32_t* argb,
         }
       }
     }
-    free(histo);
+    WebPSafeFree(histo);
     return 1;
   } else {
     return 0;
@@ -761,6 +715,10 @@ static WebPEncodingError EncodeImageNoHuffman(VP8LBitWriter* const bw,
   }
 
   // Calculate backward references from ARGB image.
+  if (VP8LHashChainFill(hash_chain, quality, argb, width, height) == 0) {
+    err = VP8_ENC_ERROR_OUT_OF_MEMORY;
+    goto Error;
+  }
   refs = VP8LGetBackwardReferences(width, height, argb, quality, 0, &cache_bits,
                                    hash_chain, refs_array);
   if (refs == NULL) {
@@ -824,7 +782,8 @@ static WebPEncodingError EncodeImageInternal(VP8LBitWriter* const bw,
                                              VP8LHashChain* const hash_chain,
                                              VP8LBackwardRefs refs_array[2],
                                              int width, int height, int quality,
-                                             int low_effort, int* cache_bits,
+                                             int low_effort,
+                                             int use_cache, int* cache_bits,
                                              int histogram_bits,
                                              size_t init_byte_position,
                                              int* const hdr_size,
@@ -856,10 +815,14 @@ static WebPEncodingError EncodeImageInternal(VP8LBitWriter* const bw,
     goto Error;
   }
 
-  *cache_bits = MAX_COLOR_CACHE_BITS;
+  *cache_bits = use_cache ? MAX_COLOR_CACHE_BITS : 0;
   // 'best_refs' is the reference to the best backward refs and points to one
   // of refs_array[0] or refs_array[1].
   // Calculate backward references from ARGB image.
+  if (VP8LHashChainFill(hash_chain, quality, argb, width, height) == 0) {
+    err = VP8_ENC_ERROR_OUT_OF_MEMORY;
+    goto Error;
+  }
   best_refs = VP8LGetBackwardReferences(width, height, argb, quality,
                                         low_effort, cache_bits, hash_chain,
                                         refs_array);
@@ -1007,14 +970,19 @@ static void ApplySubtractGreen(VP8LEncoder* const enc, int width, int height,
 static WebPEncodingError ApplyPredictFilter(const VP8LEncoder* const enc,
                                             int width, int height,
                                             int quality, int low_effort,
+                                            int used_subtract_green,
                                             VP8LBitWriter* const bw) {
   const int pred_bits = enc->transform_bits_;
   const int transform_width = VP8LSubSampleSize(width, pred_bits);
   const int transform_height = VP8LSubSampleSize(height, pred_bits);
+  // we disable near-lossless quantization if palette is used.
+  const int near_lossless_strength = enc->use_palette_ ? 100
+                                   : enc->config_->near_lossless;
 
   VP8LResidualImage(width, height, pred_bits, low_effort, enc->argb_,
                     enc->argb_scratch_, enc->transform_data_,
-                    enc->config_->exact);
+                    near_lossless_strength, enc->config_->exact,
+                    used_subtract_green);
   VP8LPutBits(bw, TRANSFORM_PRESENT, 1);
   VP8LPutBits(bw, PREDICTOR_TRANSFORM, 2);
   assert(pred_bits >= 2);
@@ -1114,6 +1082,12 @@ static WebPEncodingError WriteImage(const WebPPicture* const pic,
 
 // -----------------------------------------------------------------------------
 
+static void ClearTransformBuffer(VP8LEncoder* const enc) {
+  WebPSafeFree(enc->transform_mem_);
+  enc->transform_mem_ = NULL;
+  enc->transform_mem_size_ = 0;
+}
+
 // Allocates the memory for argb (W x H) buffer, 2 rows of context for
 // prediction and transform data.
 // Flags influencing the memory allocated:
@@ -1122,43 +1096,48 @@ static WebPEncodingError WriteImage(const WebPPicture* const pic,
 static WebPEncodingError AllocateTransformBuffer(VP8LEncoder* const enc,
                                                  int width, int height) {
   WebPEncodingError err = VP8_ENC_OK;
-  if (enc->argb_ == NULL) {
-    const int tile_size = 1 << enc->transform_bits_;
-    const uint64_t image_size = width * height;
-    // Ensure enough size for tiles, as well as for two scanlines and two
-    // extra pixels for CopyImageWithPrediction.
-    const uint64_t argb_scratch_size =
-        enc->use_predict_ ? tile_size * width + width + 2 : 0;
-    const int transform_data_size =
-        (enc->use_predict_ || enc->use_cross_color_)
-            ? VP8LSubSampleSize(width, enc->transform_bits_) *
-              VP8LSubSampleSize(height, enc->transform_bits_)
-            : 0;
-    const uint64_t total_size =
-        image_size + WEBP_ALIGN_CST +
-        argb_scratch_size + WEBP_ALIGN_CST +
-        (uint64_t)transform_data_size;
-    uint32_t* mem = (uint32_t*)WebPSafeMalloc(total_size, sizeof(*mem));
+  const uint64_t image_size = width * height;
+  // VP8LResidualImage needs room for 2 scanlines of uint32 pixels with an extra
+  // pixel in each, plus 2 regular scanlines of bytes.
+  // TODO(skal): Clean up by using arithmetic in bytes instead of words.
+  const uint64_t argb_scratch_size =
+      enc->use_predict_
+          ? (width + 1) * 2 +
+            (width * 2 + sizeof(uint32_t) - 1) / sizeof(uint32_t)
+          : 0;
+  const uint64_t transform_data_size =
+      (enc->use_predict_ || enc->use_cross_color_)
+          ? VP8LSubSampleSize(width, enc->transform_bits_) *
+                VP8LSubSampleSize(height, enc->transform_bits_)
+          : 0;
+  const uint64_t max_alignment_in_words =
+      (WEBP_ALIGN_CST + sizeof(uint32_t) - 1) / sizeof(uint32_t);
+  const uint64_t mem_size =
+      image_size + max_alignment_in_words +
+      argb_scratch_size + max_alignment_in_words +
+      transform_data_size;
+  uint32_t* mem = enc->transform_mem_;
+  if (mem == NULL || mem_size > enc->transform_mem_size_) {
+    ClearTransformBuffer(enc);
+    mem = (uint32_t*)WebPSafeMalloc(mem_size, sizeof(*mem));
     if (mem == NULL) {
       err = VP8_ENC_ERROR_OUT_OF_MEMORY;
       goto Error;
     }
-    enc->argb_ = mem;
-    mem = (uint32_t*)WEBP_ALIGN(mem + image_size);
-    enc->argb_scratch_ = mem;
-    mem = (uint32_t*)WEBP_ALIGN(mem + argb_scratch_size);
-    enc->transform_data_ = mem;
-    enc->current_width_ = width;
+    enc->transform_mem_ = mem;
+    enc->transform_mem_size_ = (size_t)mem_size;
   }
+  enc->argb_ = mem;
+  mem = (uint32_t*)WEBP_ALIGN(mem + image_size);
+  enc->argb_scratch_ = mem;
+  mem = (uint32_t*)WEBP_ALIGN(mem + argb_scratch_size);
+  enc->transform_data_ = mem;
+
+  enc->current_width_ = width;
  Error:
   return err;
 }
 
-static void ClearTransformBuffer(VP8LEncoder* const enc) {
-  WebPSafeFree(enc->argb_);
-  enc->argb_ = NULL;
-}
-
 static WebPEncodingError MakeInputImageCopy(VP8LEncoder* const enc) {
   WebPEncodingError err = VP8_ENC_OK;
   const WebPPicture* const picture = enc->pic_;
@@ -1178,8 +1157,35 @@ static WebPEncodingError MakeInputImageCopy(VP8LEncoder* const enc) {
 
 // -----------------------------------------------------------------------------
 
-static void MapToPalette(const uint32_t palette[], int num_colors,
+static int SearchColor(const uint32_t sorted[], uint32_t color, int hi) {
+  int low = 0;
+  if (sorted[low] == color) return low;  // loop invariant: sorted[low] != color
+  while (1) {
+    const int mid = (low + hi) >> 1;
+    if (sorted[mid] == color) {
+      return mid;
+    } else if (sorted[mid] < color) {
+      low = mid;
+    } else {
+      hi = mid;
+    }
+  }
+}
+
+// Sort palette in increasing order and prepare an inverse mapping array.
+static void PrepareMapToPalette(const uint32_t palette[], int num_colors,
+                                uint32_t sorted[], int idx_map[]) {
+  int i;
+  memcpy(sorted, palette, num_colors * sizeof(*sorted));
+  qsort(sorted, num_colors, sizeof(*sorted), PaletteCompareColorsForQsort);
+  for (i = 0; i < num_colors; ++i) {
+    idx_map[SearchColor(sorted, palette[i], num_colors)] = i;
+  }
+}
+
+static void MapToPalette(const uint32_t sorted_palette[], int num_colors,
                          uint32_t* const last_pix, int* const last_idx,
+                         const int idx_map[],
                          const uint32_t* src, uint8_t* dst, int width) {
   int x;
   int prev_idx = *last_idx;
@@ -1187,14 +1193,8 @@ static void MapToPalette(const uint32_t palette[], int num_colors,
   for (x = 0; x < width; ++x) {
     const uint32_t pix = src[x];
     if (pix != prev_pix) {
-      int i;
-      for (i = 0; i < num_colors; ++i) {
-        if (pix == palette[i]) {
-          prev_idx = i;
-          prev_pix = pix;
-          break;
-        }
-      }
+      prev_idx = idx_map[SearchColor(sorted_palette, pix, num_colors)];
+      prev_pix = pix;
     }
     dst[x] = prev_idx;
   }
@@ -1241,11 +1241,16 @@ static WebPEncodingError ApplyPalette(const uint32_t* src, uint32_t src_stride,
     }
   } else {
     // Use 1 pixel cache for ARGB pixels.
-    uint32_t last_pix = palette[0];
-    int last_idx = 0;
+    uint32_t last_pix;
+    int last_idx;
+    uint32_t sorted[MAX_PALETTE_SIZE];
+    int idx_map[MAX_PALETTE_SIZE];
+    PrepareMapToPalette(palette, palette_size, sorted, idx_map);
+    last_pix = palette[0];
+    last_idx = 0;
     for (y = 0; y < height; ++y) {
-      MapToPalette(palette, palette_size, &last_pix, &last_idx,
-                   src, tmp_row, width);
+      MapToPalette(sorted, palette_size, &last_pix, &last_idx,
+                   idx_map, src, tmp_row, width);
       VP8LBundleColorMap(tmp_row, width, xbits, dst);
       src += src_stride;
       dst += dst_stride;
@@ -1378,7 +1383,7 @@ static void VP8LEncoderDelete(VP8LEncoder* enc) {
 
 WebPEncodingError VP8LEncodeStream(const WebPConfig* const config,
                                    const WebPPicture* const picture,
-                                   VP8LBitWriter* const bw) {
+                                   VP8LBitWriter* const bw, int use_cache) {
   WebPEncodingError err = VP8_ENC_OK;
   const int quality = (int)config->quality;
   const int low_effort = (config->method == 0);
@@ -1405,7 +1410,8 @@ WebPEncodingError VP8LEncodeStream(const WebPConfig* const config,
   }
 
   // Apply near-lossless preprocessing.
-  use_near_lossless = !enc->use_palette_ && (config->near_lossless < 100);
+  use_near_lossless =
+      (config->near_lossless < 100) && !enc->use_palette_ && !enc->use_predict_;
   if (use_near_lossless) {
     if (!VP8ApplyNearLossless(width, height, picture->argb,
                               config->near_lossless)) {
@@ -1457,7 +1463,7 @@ WebPEncodingError VP8LEncodeStream(const WebPConfig* const config,
 
     if (enc->use_predict_) {
       err = ApplyPredictFilter(enc, enc->current_width_, height, quality,
-                               low_effort, bw);
+                               low_effort, enc->use_subtract_green_, bw);
       if (err != VP8_ENC_OK) goto Error;
     }
 
@@ -1474,8 +1480,8 @@ WebPEncodingError VP8LEncodeStream(const WebPConfig* const config,
   // Encode and write the transformed image.
   err = EncodeImageInternal(bw, enc->argb_, &enc->hash_chain_, enc->refs_,
                             enc->current_width_, height, quality, low_effort,
-                            &enc->cache_bits_, enc->histo_bits_, byte_position,
-                            &hdr_size, &data_size);
+                            use_cache, &enc->cache_bits_, enc->histo_bits_,
+                            byte_position, &hdr_size, &data_size);
   if (err != VP8_ENC_OK) goto Error;
 
   if (picture->stats != NULL) {
@@ -1560,7 +1566,7 @@ int VP8LEncodeImage(const WebPConfig* const config,
   if (!WebPReportProgress(picture, 5, &percent)) goto UserAbort;
 
   // Encode main image stream.
-  err = VP8LEncodeStream(config, picture, &bw);
+  err = VP8LEncodeStream(config, picture, &bw, 1 /*use_cache*/);
   if (err != VP8_ENC_OK) goto Error;
 
   // TODO(skal): have a fine-grained progress report in VP8LEncodeStream().
diff --git a/src/3rdparty/libwebp/src/enc/vp8li.h b/src/3rdparty/libwebp/src/enc/vp8li.h
index 6b6db12..371e276 100644
--- a/src/3rdparty/libwebp/src/enc/vp8li.h
+++ b/src/3rdparty/libwebp/src/enc/vp8li.h
@@ -25,14 +25,17 @@ extern "C" {
 #endif
 
 typedef struct {
-  const WebPConfig* config_;    // user configuration and parameters
-  const WebPPicture* pic_;      // input picture.
+  const WebPConfig* config_;      // user configuration and parameters
+  const WebPPicture* pic_;        // input picture.
 
-  uint32_t* argb_;              // Transformed argb image data.
-  uint32_t* argb_scratch_;      // Scratch memory for argb rows
-                                // (used for prediction).
-  uint32_t* transform_data_;    // Scratch memory for transform data.
-  int       current_width_;     // Corresponds to packed image width.
+  uint32_t* argb_;                // Transformed argb image data.
+  uint32_t* argb_scratch_;        // Scratch memory for argb rows
+                                  // (used for prediction).
+  uint32_t* transform_data_;      // Scratch memory for transform data.
+  uint32_t* transform_mem_;       // Currently allocated memory.
+  size_t    transform_mem_size_;  // Currently allocated memory size.
+
+  int       current_width_;       // Corresponds to packed image width.
 
   // Encoding parameters derived from quality parameter.
   int histo_bits_;
@@ -64,9 +67,10 @@ int VP8LEncodeImage(const WebPConfig* const config,
                     const WebPPicture* const picture);
 
 // Encodes the main image stream using the supplied bit writer.
+// If 'use_cache' is false, disables the use of color cache.
 WebPEncodingError VP8LEncodeStream(const WebPConfig* const config,
                                    const WebPPicture* const picture,
-                                   VP8LBitWriter* const bw);
+                                   VP8LBitWriter* const bw, int use_cache);
 
 //------------------------------------------------------------------------------
 
diff --git a/src/3rdparty/libwebp/src/enc/webpenc.c b/src/3rdparty/libwebp/src/enc/webpenc.c
index fece736..a7d04ea 100644
--- a/src/3rdparty/libwebp/src/enc/webpenc.c
+++ b/src/3rdparty/libwebp/src/enc/webpenc.c
@@ -105,6 +105,10 @@ static void MapConfigToTools(VP8Encoder* const enc) {
       256 * 16 * 16 *                 // upper bound: up to 16bit per 4x4 block
       (limit * limit) / (100 * 100);  // ... modulated with a quadratic curve.
 
+  // partition0 = 512k max.
+  enc->mb_header_limit_ =
+      (score_t)256 * 510 * 8 * 1024 / (enc->mb_w_ * enc->mb_h_);
+
   enc->thread_level_ = config->thread_level;
 
   enc->do_search_ = (config->target_size > 0 || config->target_PSNR > 0);
diff --git a/src/3rdparty/libwebp/src/mux/anim_encode.c b/src/3rdparty/libwebp/src/mux/anim_encode.c
index fa86eaa..53e2906 100644
--- a/src/3rdparty/libwebp/src/mux/anim_encode.c
+++ b/src/3rdparty/libwebp/src/mux/anim_encode.c
@@ -12,7 +12,9 @@
 
 #include <assert.h>
 #include <limits.h>
+#include <math.h>    // for pow()
 #include <stdio.h>
+#include <stdlib.h>  // for abs()
 
 #include "../utils/utils.h"
 #include "../webp/decode.h"
@@ -49,8 +51,10 @@ struct WebPAnimEncoder {
 
   FrameRect prev_rect_;               // Previous WebP frame rectangle.
   WebPConfig last_config_;            // Cached in case a re-encode is needed.
-  WebPConfig last_config2_;           // 2nd cached config; only valid if
-                                      // 'options_.allow_mixed' is true.
+  WebPConfig last_config_reversed_;   // If 'last_config_' uses lossless, then
+                                      // this config uses lossy and vice versa;
+                                      // only valid if 'options_.allow_mixed'
+                                      // is true.
 
   WebPPicture* curr_canvas_;          // Only pointer; we don't own memory.
 
@@ -173,6 +177,7 @@ static void DefaultEncoderOptions(WebPAnimEncoderOptions* const enc_options) {
   enc_options->minimize_size = 0;
   DisableKeyframes(enc_options);
   enc_options->allow_mixed = 0;
+  enc_options->verbose = 0;
 }
 
 int WebPAnimEncoderOptionsInitInternal(WebPAnimEncoderOptions* enc_options,
@@ -185,7 +190,8 @@ int WebPAnimEncoderOptionsInitInternal(WebPAnimEncoderOptions* enc_options,
   return 1;
 }
 
-#define TRANSPARENT_COLOR   0x00ffffff
+// This starting value is more fit to WebPCleanupTransparentAreaLossless().
+#define TRANSPARENT_COLOR   0x00000000
 
 static void ClearRectangle(WebPPicture* const picture,
                            int left, int top, int width, int height) {
@@ -338,11 +344,16 @@ static EncodedFrame* GetFrame(const WebPAnimEncoder* const enc,
   return &enc->encoded_frames_[enc->start_ + position];
 }
 
-// Returns true if 'length' number of pixels in 'src' and 'dst' are identical,
+typedef int (*ComparePixelsFunc)(const uint32_t*, int, const uint32_t*, int,
+                                 int, int);
+
+// Returns true if 'length' number of pixels in 'src' and 'dst' are equal,
 // assuming the given step sizes between pixels.
-static WEBP_INLINE int ComparePixels(const uint32_t* src, int src_step,
-                                     const uint32_t* dst, int dst_step,
-                                     int length) {
+// 'max_allowed_diff' is unused and only there to allow function pointer use.
+static WEBP_INLINE int ComparePixelsLossless(const uint32_t* src, int src_step,
+                                             const uint32_t* dst, int dst_step,
+                                             int length, int max_allowed_diff) {
+  (void)max_allowed_diff;
   assert(length > 0);
   while (length-- > 0) {
     if (*src != *dst) {
@@ -354,15 +365,62 @@ static WEBP_INLINE int ComparePixels(const uint32_t* src, int src_step,
   return 1;
 }
 
+// Helper to check if each channel in 'src' and 'dst' is at most off by
+// 'max_allowed_diff'.
+static WEBP_INLINE int PixelsAreSimilar(uint32_t src, uint32_t dst,
+                                        int max_allowed_diff) {
+  const int src_a = (src >> 24) & 0xff;
+  const int src_r = (src >> 16) & 0xff;
+  const int src_g = (src >> 8) & 0xff;
+  const int src_b = (src >> 0) & 0xff;
+  const int dst_a = (dst >> 24) & 0xff;
+  const int dst_r = (dst >> 16) & 0xff;
+  const int dst_g = (dst >> 8) & 0xff;
+  const int dst_b = (dst >> 0) & 0xff;
+
+  return (abs(src_r * src_a - dst_r * dst_a) <= (max_allowed_diff * 255)) &&
+         (abs(src_g * src_a - dst_g * dst_a) <= (max_allowed_diff * 255)) &&
+         (abs(src_b * src_a - dst_b * dst_a) <= (max_allowed_diff * 255)) &&
+         (abs(src_a - dst_a) <= max_allowed_diff);
+}
+
+// Returns true if 'length' number of pixels in 'src' and 'dst' are within an
+// error bound, assuming the given step sizes between pixels.
+static WEBP_INLINE int ComparePixelsLossy(const uint32_t* src, int src_step,
+                                          const uint32_t* dst, int dst_step,
+                                          int length, int max_allowed_diff) {
+  assert(length > 0);
+  while (length-- > 0) {
+    if (!PixelsAreSimilar(*src, *dst, max_allowed_diff)) {
+      return 0;
+    }
+    src += src_step;
+    dst += dst_step;
+  }
+  return 1;
+}
+
 static int IsEmptyRect(const FrameRect* const rect) {
   return (rect->width_ == 0) || (rect->height_ == 0);
 }
 
+static int QualityToMaxDiff(float quality) {
+  const double val = pow(quality / 100., 0.5);
+  const double max_diff = 31 * (1 - val) + 1 * val;
+  return (int)(max_diff + 0.5);
+}
+
 // Assumes that an initial valid guess of change rectangle 'rect' is passed.
 static void MinimizeChangeRectangle(const WebPPicture* const src,
                                     const WebPPicture* const dst,
-                                    FrameRect* const rect) {
+                                    FrameRect* const rect,
+                                    int is_lossless, float quality) {
   int i, j;
+  const ComparePixelsFunc compare_pixels =
+      is_lossless ? ComparePixelsLossless : ComparePixelsLossy;
+  const int max_allowed_diff_lossy = QualityToMaxDiff(quality);
+  const int max_allowed_diff = is_lossless ? 0 : max_allowed_diff_lossy;
+
   // Sanity checks.
   assert(src->width == dst->width && src->height == dst->height);
   assert(rect->x_offset_ + rect->width_ <= dst->width);
@@ -374,8 +432,8 @@ static void MinimizeChangeRectangle(const WebPPicture* const src,
         &src->argb[rect->y_offset_ * src->argb_stride + i];
     const uint32_t* const dst_argb =
         &dst->argb[rect->y_offset_ * dst->argb_stride + i];
-    if (ComparePixels(src_argb, src->argb_stride, dst_argb, dst->argb_stride,
-                      rect->height_)) {
+    if (compare_pixels(src_argb, src->argb_stride, dst_argb, dst->argb_stride,
+                       rect->height_, max_allowed_diff)) {
       --rect->width_;  // Redundant column.
       ++rect->x_offset_;
     } else {
@@ -390,8 +448,8 @@ static void MinimizeChangeRectangle(const WebPPicture* const src,
         &src->argb[rect->y_offset_ * src->argb_stride + i];
     const uint32_t* const dst_argb =
         &dst->argb[rect->y_offset_ * dst->argb_stride + i];
-    if (ComparePixels(src_argb, src->argb_stride, dst_argb, dst->argb_stride,
-                      rect->height_)) {
+    if (compare_pixels(src_argb, src->argb_stride, dst_argb, dst->argb_stride,
+                       rect->height_, max_allowed_diff)) {
       --rect->width_;  // Redundant column.
     } else {
       break;
@@ -405,7 +463,8 @@ static void MinimizeChangeRectangle(const WebPPicture* const src,
         &src->argb[j * src->argb_stride + rect->x_offset_];
     const uint32_t* const dst_argb =
         &dst->argb[j * dst->argb_stride + rect->x_offset_];
-    if (ComparePixels(src_argb, 1, dst_argb, 1, rect->width_)) {
+    if (compare_pixels(src_argb, 1, dst_argb, 1, rect->width_,
+                       max_allowed_diff)) {
       --rect->height_;  // Redundant row.
       ++rect->y_offset_;
     } else {
@@ -420,7 +479,8 @@ static void MinimizeChangeRectangle(const WebPPicture* const src,
         &src->argb[j * src->argb_stride + rect->x_offset_];
     const uint32_t* const dst_argb =
         &dst->argb[j * dst->argb_stride + rect->x_offset_];
-    if (ComparePixels(src_argb, 1, dst_argb, 1, rect->width_)) {
+    if (compare_pixels(src_argb, 1, dst_argb, 1, rect->width_,
+                       max_allowed_diff)) {
       --rect->height_;  // Redundant row.
     } else {
       break;
@@ -445,20 +505,46 @@ static WEBP_INLINE void SnapToEvenOffsets(FrameRect* const rect) {
   rect->y_offset_ &= ~1;
 }
 
+typedef struct {
+  int should_try_;               // Should try this set of parameters.
+  int empty_rect_allowed_;       // Frame with empty rectangle can be skipped.
+  FrameRect rect_ll_;            // Frame rectangle for lossless compression.
+  WebPPicture sub_frame_ll_;     // Sub-frame pic for lossless compression.
+  FrameRect rect_lossy_;         // Frame rectangle for lossy compression.
+                                 // Could be smaller than rect_ll_ as pixels
+                                 // with small diffs can be ignored.
+  WebPPicture sub_frame_lossy_;  // Sub-frame pic for lossless compression.
+} SubFrameParams;
+
+static int SubFrameParamsInit(SubFrameParams* const params,
+                              int should_try, int empty_rect_allowed) {
+  params->should_try_ = should_try;
+  params->empty_rect_allowed_ = empty_rect_allowed;
+  if (!WebPPictureInit(&params->sub_frame_ll_) ||
+      !WebPPictureInit(&params->sub_frame_lossy_)) {
+    return 0;
+  }
+  return 1;
+}
+
+static void SubFrameParamsFree(SubFrameParams* const params) {
+  WebPPictureFree(&params->sub_frame_ll_);
+  WebPPictureFree(&params->sub_frame_lossy_);
+}
+
 // Given previous and current canvas, picks the optimal rectangle for the
-// current frame. The initial guess for 'rect' will be the full canvas.
+// current frame based on 'is_lossless' and other parameters. Assumes that the
+// initial guess 'rect' is valid.
 static int GetSubRect(const WebPPicture* const prev_canvas,
                       const WebPPicture* const curr_canvas, int is_key_frame,
                       int is_first_frame, int empty_rect_allowed,
-                      FrameRect* const rect, WebPPicture* const sub_frame) {
-  rect->x_offset_ = 0;
-  rect->y_offset_ = 0;
-  rect->width_ = curr_canvas->width;
-  rect->height_ = curr_canvas->height;
+                      int is_lossless, float quality, FrameRect* const rect,
+                      WebPPicture* const sub_frame) {
   if (!is_key_frame || is_first_frame) {  // Optimize frame rectangle.
     // Note: This behaves as expected for first frame, as 'prev_canvas' is
     // initialized to a fully transparent canvas in the beginning.
-    MinimizeChangeRectangle(prev_canvas, curr_canvas, rect);
+    MinimizeChangeRectangle(prev_canvas, curr_canvas, rect,
+                            is_lossless, quality);
   }
 
   if (IsEmptyRect(rect)) {
@@ -477,6 +563,29 @@ static int GetSubRect(const WebPPicture* const prev_canvas,
                          rect->width_, rect->height_, sub_frame);
 }
 
+// Picks optimal frame rectangle for both lossless and lossy compression. The
+// initial guess for frame rectangles will be the full canvas.
+static int GetSubRects(const WebPPicture* const prev_canvas,
+                       const WebPPicture* const curr_canvas, int is_key_frame,
+                       int is_first_frame, float quality,
+                       SubFrameParams* const params) {
+  // Lossless frame rectangle.
+  params->rect_ll_.x_offset_ = 0;
+  params->rect_ll_.y_offset_ = 0;
+  params->rect_ll_.width_ = curr_canvas->width;
+  params->rect_ll_.height_ = curr_canvas->height;
+  if (!GetSubRect(prev_canvas, curr_canvas, is_key_frame, is_first_frame,
+                  params->empty_rect_allowed_, 1, quality,
+                  &params->rect_ll_, &params->sub_frame_ll_)) {
+    return 0;
+  }
+  // Lossy frame rectangle.
+  params->rect_lossy_ = params->rect_ll_;  // seed with lossless rect.
+  return GetSubRect(prev_canvas, curr_canvas, is_key_frame, is_first_frame,
+                    params->empty_rect_allowed_, 0, quality,
+                    &params->rect_lossy_, &params->sub_frame_lossy_);
+}
+
 static void DisposeFrameRectangle(int dispose_method,
                                   const FrameRect* const rect,
                                   WebPPicture* const curr_canvas) {
@@ -490,9 +599,9 @@ static uint32_t RectArea(const FrameRect* const rect) {
   return (uint32_t)rect->width_ * rect->height_;
 }
 
-static int IsBlendingPossible(const WebPPicture* const src,
-                              const WebPPicture* const dst,
-                              const FrameRect* const rect) {
+static int IsLosslessBlendingPossible(const WebPPicture* const src,
+                                      const WebPPicture* const dst,
+                                      const FrameRect* const rect) {
   int i, j;
   assert(src->width == dst->width && src->height == dst->height);
   assert(rect->x_offset_ + rect->width_ <= dst->width);
@@ -512,88 +621,66 @@ static int IsBlendingPossible(const WebPPicture* const src,
   return 1;
 }
 
-#define MIN_COLORS_LOSSY     31  // Don't try lossy below this threshold.
-#define MAX_COLORS_LOSSLESS 194  // Don't try lossless above this threshold.
-#define MAX_COLOR_COUNT     256  // Power of 2 greater than MAX_COLORS_LOSSLESS.
-#define HASH_SIZE (MAX_COLOR_COUNT * 4)
-#define HASH_RIGHT_SHIFT     22  // 32 - log2(HASH_SIZE).
-
-// TODO(urvang): Also used in enc/vp8l.c. Move to utils.
-// If the number of colors in the 'pic' is at least MAX_COLOR_COUNT, return
-// MAX_COLOR_COUNT. Otherwise, return the exact number of colors in the 'pic'.
-static int GetColorCount(const WebPPicture* const pic) {
-  int x, y;
-  int num_colors = 0;
-  uint8_t in_use[HASH_SIZE] = { 0 };
-  uint32_t colors[HASH_SIZE];
-  static const uint32_t kHashMul = 0x1e35a7bd;
-  const uint32_t* argb = pic->argb;
-  const int width = pic->width;
-  const int height = pic->height;
-  uint32_t last_pix = ~argb[0];   // so we're sure that last_pix != argb[0]
-
-  for (y = 0; y < height; ++y) {
-    for (x = 0; x < width; ++x) {
-      int key;
-      if (argb[x] == last_pix) {
-        continue;
-      }
-      last_pix = argb[x];
-      key = (kHashMul * last_pix) >> HASH_RIGHT_SHIFT;
-      while (1) {
-        if (!in_use[key]) {
-          colors[key] = last_pix;
-          in_use[key] = 1;
-          ++num_colors;
-          if (num_colors >= MAX_COLOR_COUNT) {
-            return MAX_COLOR_COUNT;  // Exact count not needed.
-          }
-          break;
-        } else if (colors[key] == last_pix) {
-          break;  // The color is already there.
-        } else {
-          // Some other color sits here, so do linear conflict resolution.
-          ++key;
-          key &= (HASH_SIZE - 1);  // Key mask.
-        }
+static int IsLossyBlendingPossible(const WebPPicture* const src,
+                                   const WebPPicture* const dst,
+                                   const FrameRect* const rect,
+                                   float quality) {
+  const int max_allowed_diff_lossy = QualityToMaxDiff(quality);
+  int i, j;
+  assert(src->width == dst->width && src->height == dst->height);
+  assert(rect->x_offset_ + rect->width_ <= dst->width);
+  assert(rect->y_offset_ + rect->height_ <= dst->height);
+  for (j = rect->y_offset_; j < rect->y_offset_ + rect->height_; ++j) {
+    for (i = rect->x_offset_; i < rect->x_offset_ + rect->width_; ++i) {
+      const uint32_t src_pixel = src->argb[j * src->argb_stride + i];
+      const uint32_t dst_pixel = dst->argb[j * dst->argb_stride + i];
+      const uint32_t dst_alpha = dst_pixel >> 24;
+      if (dst_alpha != 0xff &&
+          !PixelsAreSimilar(src_pixel, dst_pixel, max_allowed_diff_lossy)) {
+        // In this case, if we use blending, we can't attain the desired
+        // 'dst_pixel' value for this pixel. So, blending is not possible.
+        return 0;
       }
     }
-    argb += pic->argb_stride;
   }
-  return num_colors;
+  return 1;
 }
 
-#undef MAX_COLOR_COUNT
-#undef HASH_SIZE
-#undef HASH_RIGHT_SHIFT
-
 // For pixels in 'rect', replace those pixels in 'dst' that are same as 'src' by
 // transparent pixels.
-static void IncreaseTransparency(const WebPPicture* const src,
-                                 const FrameRect* const rect,
-                                 WebPPicture* const dst) {
+// Returns true if at least one pixel gets modified.
+static int IncreaseTransparency(const WebPPicture* const src,
+                                const FrameRect* const rect,
+                                WebPPicture* const dst) {
   int i, j;
+  int modified = 0;
   assert(src != NULL && dst != NULL && rect != NULL);
   assert(src->width == dst->width && src->height == dst->height);
   for (j = rect->y_offset_; j < rect->y_offset_ + rect->height_; ++j) {
     const uint32_t* const psrc = src->argb + j * src->argb_stride;
     uint32_t* const pdst = dst->argb + j * dst->argb_stride;
     for (i = rect->x_offset_; i < rect->x_offset_ + rect->width_; ++i) {
-      if (psrc[i] == pdst[i]) {
+      if (psrc[i] == pdst[i] && pdst[i] != TRANSPARENT_COLOR) {
         pdst[i] = TRANSPARENT_COLOR;
+        modified = 1;
       }
     }
   }
+  return modified;
 }
 
 #undef TRANSPARENT_COLOR
 
 // Replace similar blocks of pixels by a 'see-through' transparent block
 // with uniform average color.
-static void FlattenSimilarBlocks(const WebPPicture* const src,
-                                 const FrameRect* const rect,
-                                 WebPPicture* const dst) {
+// Assumes lossy compression is being used.
+// Returns true if at least one pixel gets modified.
+static int FlattenSimilarBlocks(const WebPPicture* const src,
+                                const FrameRect* const rect,
+                                WebPPicture* const dst, float quality) {
+  const int max_allowed_diff_lossy = QualityToMaxDiff(quality);
   int i, j;
+  int modified = 0;
   const int block_size = 8;
   const int y_start = (rect->y_offset_ + block_size) & ~(block_size - 1);
   const int y_end = (rect->y_offset_ + rect->height_) & ~(block_size - 1);
@@ -615,11 +702,12 @@ static void FlattenSimilarBlocks(const WebPPicture* const src,
           const uint32_t src_pixel = psrc[x + y * src->argb_stride];
           const int alpha = src_pixel >> 24;
           if (alpha == 0xff &&
-              src_pixel == pdst[x + y * dst->argb_stride]) {
-              ++cnt;
-              avg_r += (src_pixel >> 16) & 0xff;
-              avg_g += (src_pixel >>  8) & 0xff;
-              avg_b += (src_pixel >>  0) & 0xff;
+              PixelsAreSimilar(src_pixel, pdst[x + y * dst->argb_stride],
+                               max_allowed_diff_lossy)) {
+            ++cnt;
+            avg_r += (src_pixel >> 16) & 0xff;
+            avg_g += (src_pixel >> 8) & 0xff;
+            avg_b += (src_pixel >> 0) & 0xff;
           }
         }
       }
@@ -635,9 +723,11 @@ static void FlattenSimilarBlocks(const WebPPicture* const src,
             pdst[x + y * dst->argb_stride] = color;
           }
         }
+        modified = 1;
       }
     }
   }
+  return modified;
 }
 
 static int EncodeFrame(const WebPConfig* const config, WebPPicture* const pic,
@@ -662,9 +752,10 @@ typedef struct {
 // Generates a candidate encoded frame given a picture and metadata.
 static WebPEncodingError EncodeCandidate(WebPPicture* const sub_frame,
                                          const FrameRect* const rect,
-                                         const WebPConfig* const config,
+                                         const WebPConfig* const encoder_config,
                                          int use_blending,
                                          Candidate* const candidate) {
+  WebPConfig config = *encoder_config;
   WebPEncodingError error_code = VP8_ENC_OK;
   assert(candidate != NULL);
   memset(candidate, 0, sizeof(*candidate));
@@ -682,7 +773,13 @@ static WebPEncodingError EncodeCandidate(WebPPicture* const sub_frame,
   // Encode picture.
   WebPMemoryWriterInit(&candidate->mem_);
 
-  if (!EncodeFrame(config, sub_frame, &candidate->mem_)) {
+  if (!config.lossless && use_blending) {
+    // Disable filtering to avoid blockiness in reconstructed frames at the
+    // time of decoding.
+    config.autofilter = 0;
+    config.filter_strength = 0;
+  }
+  if (!EncodeFrame(&config, sub_frame, &candidate->mem_)) {
     error_code = sub_frame->error_code;
     goto Err;
   }
@@ -698,6 +795,8 @@ static WebPEncodingError EncodeCandidate(WebPPicture* const sub_frame,
 static void CopyCurrentCanvas(WebPAnimEncoder* const enc) {
   if (enc->curr_canvas_copy_modified_) {
     WebPCopyPixels(enc->curr_canvas_, &enc->curr_canvas_copy_);
+    enc->curr_canvas_copy_.progress_hook = enc->curr_canvas_->progress_hook;
+    enc->curr_canvas_copy_.user_data = enc->curr_canvas_->user_data;
     enc->curr_canvas_copy_modified_ = 0;
   }
 }
@@ -710,12 +809,15 @@ enum {
   CANDIDATE_COUNT
 };
 
-// Generates candidates for a given dispose method given pre-filled 'rect'
-// and 'sub_frame'.
+#define MIN_COLORS_LOSSY     31  // Don't try lossy below this threshold.
+#define MAX_COLORS_LOSSLESS 194  // Don't try lossless above this threshold.
+
+// Generates candidates for a given dispose method given pre-filled sub-frame
+// 'params'.
 static WebPEncodingError GenerateCandidates(
     WebPAnimEncoder* const enc, Candidate candidates[CANDIDATE_COUNT],
     WebPMuxAnimDispose dispose_method, int is_lossless, int is_key_frame,
-    const FrameRect* const rect, WebPPicture* sub_frame,
+    SubFrameParams* const params,
     const WebPConfig* const config_ll, const WebPConfig* const config_lossy) {
   WebPEncodingError error_code = VP8_ENC_OK;
   const int is_dispose_none = (dispose_method == WEBP_MUX_DISPOSE_NONE);
@@ -727,16 +829,24 @@ static WebPEncodingError GenerateCandidates(
   WebPPicture* const curr_canvas = &enc->curr_canvas_copy_;
   const WebPPicture* const prev_canvas =
       is_dispose_none ? &enc->prev_canvas_ : &enc->prev_canvas_disposed_;
-  const int use_blending =
+  int use_blending_ll;
+  int use_blending_lossy;
+
+  CopyCurrentCanvas(enc);
+  use_blending_ll =
+      !is_key_frame &&
+      IsLosslessBlendingPossible(prev_canvas, curr_canvas, &params->rect_ll_);
+  use_blending_lossy =
       !is_key_frame &&
-      IsBlendingPossible(prev_canvas, curr_canvas, rect);
+      IsLossyBlendingPossible(prev_canvas, curr_canvas, &params->rect_lossy_,
+                              config_lossy->quality);
 
   // Pick candidates to be tried.
   if (!enc->options_.allow_mixed) {
     candidate_ll->evaluate_ = is_lossless;
     candidate_lossy->evaluate_ = !is_lossless;
   } else {  // Use a heuristic for trying lossless and/or lossy compression.
-    const int num_colors = GetColorCount(sub_frame);
+    const int num_colors = WebPGetColorPalette(&params->sub_frame_ll_, NULL);
     candidate_ll->evaluate_ = (num_colors < MAX_COLORS_LOSSLESS);
     candidate_lossy->evaluate_ = (num_colors >= MIN_COLORS_LOSSY);
   }
@@ -744,23 +854,26 @@ static WebPEncodingError GenerateCandidates(
   // Generate candidates.
   if (candidate_ll->evaluate_) {
     CopyCurrentCanvas(enc);
-    if (use_blending) {
-      IncreaseTransparency(prev_canvas, rect, curr_canvas);
-      enc->curr_canvas_copy_modified_ = 1;
+    if (use_blending_ll) {
+      enc->curr_canvas_copy_modified_ =
+          IncreaseTransparency(prev_canvas, &params->rect_ll_, curr_canvas);
     }
-    error_code = EncodeCandidate(sub_frame, rect, config_ll, use_blending,
-                                 candidate_ll);
+    error_code = EncodeCandidate(&params->sub_frame_ll_, &params->rect_ll_,
+                                 config_ll, use_blending_ll, candidate_ll);
     if (error_code != VP8_ENC_OK) return error_code;
   }
   if (candidate_lossy->evaluate_) {
     CopyCurrentCanvas(enc);
-    if (use_blending) {
-      FlattenSimilarBlocks(prev_canvas, rect, curr_canvas);
-      enc->curr_canvas_copy_modified_ = 1;
+    if (use_blending_lossy) {
+      enc->curr_canvas_copy_modified_ =
+          FlattenSimilarBlocks(prev_canvas, &params->rect_lossy_, curr_canvas,
+                               config_lossy->quality);
     }
-    error_code = EncodeCandidate(sub_frame, rect, config_lossy, use_blending,
-                                 candidate_lossy);
+    error_code =
+        EncodeCandidate(&params->sub_frame_lossy_, &params->rect_lossy_,
+                        config_lossy, use_blending_lossy, candidate_lossy);
     if (error_code != VP8_ENC_OK) return error_code;
+    enc->curr_canvas_copy_modified_ = 1;
   }
   return error_code;
 }
@@ -918,13 +1031,16 @@ static WebPEncodingError SetFrame(WebPAnimEncoder* const enc,
   const int is_lossless = config->lossless;
   const int is_first_frame = enc->is_first_frame_;
 
-  int try_dispose_none = 1;  // Default.
-  FrameRect rect_none;
-  WebPPicture sub_frame_none;
   // First frame cannot be skipped as there is no 'previous frame' to merge it
   // to. So, empty rectangle is not allowed for the first frame.
   const int empty_rect_allowed_none = !is_first_frame;
 
+  // Even if there is exact pixel match between 'disposed previous canvas' and
+  // 'current canvas', we can't skip current frame, as there may not be exact
+  // pixel match between 'previous canvas' and 'current canvas'. So, we don't
+  // allow empty rectangle in this case.
+  const int empty_rect_allowed_bg = 0;
+
   // If current frame is a key-frame, dispose method of previous frame doesn't
   // matter, so we don't try dispose to background.
   // Also, if key-frame insertion is on, and previous frame could be picked as
@@ -933,19 +1049,20 @@ static WebPEncodingError SetFrame(WebPAnimEncoder* const enc,
   // background.
   const int dispose_bg_possible =
       !is_key_frame && !enc->prev_candidate_undecided_;
-  int try_dispose_bg = 0;  // Default.
-  FrameRect rect_bg;
-  WebPPicture sub_frame_bg;
+
+  SubFrameParams dispose_none_params;
+  SubFrameParams dispose_bg_params;
 
   WebPConfig config_ll = *config;
   WebPConfig config_lossy = *config;
   config_ll.lossless = 1;
   config_lossy.lossless = 0;
   enc->last_config_ = *config;
-  enc->last_config2_ = config->lossless ? config_lossy : config_ll;
+  enc->last_config_reversed_ = config->lossless ? config_lossy : config_ll;
   *frame_skipped = 0;
 
-  if (!WebPPictureInit(&sub_frame_none) || !WebPPictureInit(&sub_frame_bg)) {
+  if (!SubFrameParamsInit(&dispose_none_params, 1, empty_rect_allowed_none) ||
+      !SubFrameParamsInit(&dispose_bg_params, 0, empty_rect_allowed_bg)) {
     return VP8_ENC_ERROR_INVALID_CONFIGURATION;
   }
 
@@ -954,10 +1071,14 @@ static WebPEncodingError SetFrame(WebPAnimEncoder* const enc,
   }
 
   // Change-rectangle assuming previous frame was DISPOSE_NONE.
-  GetSubRect(prev_canvas, curr_canvas, is_key_frame, is_first_frame,
-             empty_rect_allowed_none, &rect_none, &sub_frame_none);
+  if (!GetSubRects(prev_canvas, curr_canvas, is_key_frame, is_first_frame,
+                   config_lossy.quality, &dispose_none_params)) {
+    error_code = VP8_ENC_ERROR_INVALID_CONFIGURATION;
+    goto Err;
+  }
 
-  if (IsEmptyRect(&rect_none)) {
+  if ((is_lossless && IsEmptyRect(&dispose_none_params.rect_ll_)) ||
+      (!is_lossless && IsEmptyRect(&dispose_none_params.rect_lossy_))) {
     // Don't encode the frame at all. Instead, the duration of the previous
     // frame will be increased later.
     assert(empty_rect_allowed_none);
@@ -971,36 +1092,43 @@ static WebPEncodingError SetFrame(WebPAnimEncoder* const enc,
     WebPCopyPixels(prev_canvas, prev_canvas_disposed);
     DisposeFrameRectangle(WEBP_MUX_DISPOSE_BACKGROUND, &enc->prev_rect_,
                           prev_canvas_disposed);
-    // Even if there is exact pixel match between 'disposed previous canvas' and
-    // 'current canvas', we can't skip current frame, as there may not be exact
-    // pixel match between 'previous canvas' and 'current canvas'. So, we don't
-    // allow empty rectangle in this case.
-    GetSubRect(prev_canvas_disposed, curr_canvas, is_key_frame, is_first_frame,
-               0 /* empty_rect_allowed */, &rect_bg, &sub_frame_bg);
-    assert(!IsEmptyRect(&rect_bg));
+
+    if (!GetSubRects(prev_canvas_disposed, curr_canvas, is_key_frame,
+                     is_first_frame, config_lossy.quality,
+                     &dispose_bg_params)) {
+      error_code = VP8_ENC_ERROR_INVALID_CONFIGURATION;
+      goto Err;
+    }
+    assert(!IsEmptyRect(&dispose_bg_params.rect_ll_));
+    assert(!IsEmptyRect(&dispose_bg_params.rect_lossy_));
 
     if (enc->options_.minimize_size) {  // Try both dispose methods.
-      try_dispose_bg = 1;
-      try_dispose_none = 1;
-    } else if (RectArea(&rect_bg) < RectArea(&rect_none)) {
-      try_dispose_bg = 1;  // Pick DISPOSE_BACKGROUND.
-      try_dispose_none = 0;
+      dispose_bg_params.should_try_ = 1;
+      dispose_none_params.should_try_ = 1;
+    } else if ((is_lossless &&
+                RectArea(&dispose_bg_params.rect_ll_) <
+                    RectArea(&dispose_none_params.rect_ll_)) ||
+               (!is_lossless &&
+                RectArea(&dispose_bg_params.rect_lossy_) <
+                    RectArea(&dispose_none_params.rect_lossy_))) {
+      dispose_bg_params.should_try_ = 1;  // Pick DISPOSE_BACKGROUND.
+      dispose_none_params.should_try_ = 0;
     }
   }
 
-  if (try_dispose_none) {
+  if (dispose_none_params.should_try_) {
     error_code = GenerateCandidates(
         enc, candidates, WEBP_MUX_DISPOSE_NONE, is_lossless, is_key_frame,
-        &rect_none, &sub_frame_none, &config_ll, &config_lossy);
+        &dispose_none_params, &config_ll, &config_lossy);
     if (error_code != VP8_ENC_OK) goto Err;
   }
 
-  if (try_dispose_bg) {
+  if (dispose_bg_params.should_try_) {
     assert(!enc->is_first_frame_);
     assert(dispose_bg_possible);
     error_code = GenerateCandidates(
         enc, candidates, WEBP_MUX_DISPOSE_BACKGROUND, is_lossless, is_key_frame,
-        &rect_bg, &sub_frame_bg, &config_ll, &config_lossy);
+        &dispose_bg_params, &config_ll, &config_lossy);
     if (error_code != VP8_ENC_OK) goto Err;
   }
 
@@ -1016,8 +1144,8 @@ static WebPEncodingError SetFrame(WebPAnimEncoder* const enc,
   }
 
  End:
-  WebPPictureFree(&sub_frame_none);
-  WebPPictureFree(&sub_frame_bg);
+  SubFrameParamsFree(&dispose_none_params);
+  SubFrameParamsFree(&dispose_bg_params);
   return error_code;
 }
 
@@ -1163,6 +1291,7 @@ static int FlushFrames(WebPAnimEncoder* const enc) {
 int WebPAnimEncoderAdd(WebPAnimEncoder* enc, WebPPicture* frame, int timestamp,
                        const WebPConfig* encoder_config) {
   WebPConfig config;
+  int ok;
 
   if (enc == NULL) {
     return 0;
@@ -1212,6 +1341,10 @@ int WebPAnimEncoderAdd(WebPAnimEncoder* enc, WebPPicture* frame, int timestamp,
   }
 
   if (encoder_config != NULL) {
+    if (!WebPValidateConfig(encoder_config)) {
+      MarkError(enc, "ERROR adding frame: Invalid WebPConfig");
+      return 0;
+    }
     config = *encoder_config;
   } else {
     WebPConfigInit(&config);
@@ -1222,17 +1355,14 @@ int WebPAnimEncoderAdd(WebPAnimEncoder* enc, WebPPicture* frame, int timestamp,
   assert(enc->curr_canvas_copy_modified_ == 1);
   CopyCurrentCanvas(enc);
 
-  if (!CacheFrame(enc, &config)) {
-    return 0;
-  }
+  ok = CacheFrame(enc, &config) && FlushFrames(enc);
 
-  if (!FlushFrames(enc)) {
-    return 0;
-  }
   enc->curr_canvas_ = NULL;
   enc->curr_canvas_copy_modified_ = 1;
-  enc->prev_timestamp_ = timestamp;
-  return 1;
+  if (ok) {
+    enc->prev_timestamp_ = timestamp;
+  }
+  return ok;
 }
 
 // -----------------------------------------------------------------------------
@@ -1278,7 +1408,7 @@ static int FrameToFullCanvas(WebPAnimEncoder* const enc,
   GetEncodedData(&mem1, full_image);
 
   if (enc->options_.allow_mixed) {
-    if (!EncodeFrame(&enc->last_config_, canvas_buf, &mem2)) goto Err;
+    if (!EncodeFrame(&enc->last_config_reversed_, canvas_buf, &mem2)) goto Err;
     if (mem2.size < mem1.size) {
       GetEncodedData(&mem2, full_image);
       WebPMemoryWriterClear(&mem1);
diff --git a/src/3rdparty/libwebp/src/mux/muxedit.c b/src/3rdparty/libwebp/src/mux/muxedit.c
index b27663f..9bbed42 100644
--- a/src/3rdparty/libwebp/src/mux/muxedit.c
+++ b/src/3rdparty/libwebp/src/mux/muxedit.c
@@ -558,8 +558,8 @@ static WebPMuxError CreateVP8XChunk(WebPMux* const mux) {
     height = mux->canvas_height_;
   }
 
-  if (flags == 0) {
-    // For Simple Image, VP8X chunk should not be added.
+  if (flags == 0 && mux->unknown_ == NULL) {
+    // For simple file format, VP8X chunk should not be added.
     return WEBP_MUX_OK;
   }
 
diff --git a/src/3rdparty/libwebp/src/mux/muxi.h b/src/3rdparty/libwebp/src/mux/muxi.h
index 5e8ba2e..d4d5cba 100644
--- a/src/3rdparty/libwebp/src/mux/muxi.h
+++ b/src/3rdparty/libwebp/src/mux/muxi.h
@@ -28,7 +28,7 @@ extern "C" {
 
 #define MUX_MAJ_VERSION 0
 #define MUX_MIN_VERSION 3
-#define MUX_REV_VERSION 0
+#define MUX_REV_VERSION 1
 
 // Chunk object.
 typedef struct WebPChunk WebPChunk;
diff --git a/src/3rdparty/libwebp/src/utils/bit_reader.c b/src/3rdparty/libwebp/src/utils/bit_reader.c
index 45198e1..50ffb74 100644
--- a/src/3rdparty/libwebp/src/utils/bit_reader.c
+++ b/src/3rdparty/libwebp/src/utils/bit_reader.c
@@ -16,6 +16,7 @@
 #endif
 
 #include "./bit_reader_inl.h"
+#include "../utils/utils.h"
 
 //------------------------------------------------------------------------------
 // VP8BitReader
@@ -119,11 +120,10 @@ int32_t VP8GetSignedValue(VP8BitReader* const br, int bits) {
 
 #define VP8L_LOG8_WBITS 4  // Number of bytes needed to store VP8L_WBITS bits.
 
-#if !defined(WEBP_FORCE_ALIGNED) && \
-    (defined(__arm__) || defined(_M_ARM) || defined(__aarch64__) || \
-     defined(__i386__) || defined(_M_IX86) || \
-     defined(__x86_64__) || defined(_M_X64))
-#define VP8L_USE_UNALIGNED_LOAD
+#if defined(__arm__) || defined(_M_ARM) || defined(__aarch64__) || \
+    defined(__i386__) || defined(_M_IX86) || \
+    defined(__x86_64__) || defined(_M_X64)
+#define VP8L_USE_FAST_LOAD
 #endif
 
 static const uint32_t kBitMask[VP8L_MAX_NUM_BIT_READ + 1] = {
@@ -191,15 +191,11 @@ static void ShiftBytes(VP8LBitReader* const br) {
 
 void VP8LDoFillBitWindow(VP8LBitReader* const br) {
   assert(br->bit_pos_ >= VP8L_WBITS);
-  // TODO(jzern): given the fixed read size it may be possible to force
-  //              alignment in this block.
-#if defined(VP8L_USE_UNALIGNED_LOAD)
+#if defined(VP8L_USE_FAST_LOAD)
   if (br->pos_ + sizeof(br->val_) < br->len_) {
     br->val_ >>= VP8L_WBITS;
     br->bit_pos_ -= VP8L_WBITS;
-    // The expression below needs a little-endian arch to work correctly.
-    // This gives a large speedup for decoding speed.
-    br->val_ |= (vp8l_val_t)WebPMemToUint32(br->buf_ + br->pos_) <<
+    br->val_ |= (vp8l_val_t)HToLE32(WebPMemToUint32(br->buf_ + br->pos_)) <<
                 (VP8L_LBITS - VP8L_WBITS);
     br->pos_ += VP8L_LOG8_WBITS;
     return;
diff --git a/src/3rdparty/libwebp/src/utils/bit_reader_inl.h b/src/3rdparty/libwebp/src/utils/bit_reader_inl.h
index 3721570..99ed313 100644
--- a/src/3rdparty/libwebp/src/utils/bit_reader_inl.h
+++ b/src/3rdparty/libwebp/src/utils/bit_reader_inl.h
@@ -55,7 +55,8 @@ void VP8LoadFinalBytes(VP8BitReader* const br);
 // Inlined critical functions
 
 // makes sure br->value_ has at least BITS bits worth of data
-static WEBP_INLINE void VP8LoadNewBytes(VP8BitReader* const br) {
+static WEBP_UBSAN_IGNORE_UNDEF WEBP_INLINE
+void VP8LoadNewBytes(VP8BitReader* const br) {
   assert(br != NULL && br->buf_ != NULL);
   // Read 'BITS' bits at a time if possible.
   if (br->buf_ < br->buf_max_) {
diff --git a/src/3rdparty/libwebp/src/utils/color_cache.c b/src/3rdparty/libwebp/src/utils/color_cache.c
index f9ff4b5..c34b2e7 100644
--- a/src/3rdparty/libwebp/src/utils/color_cache.c
+++ b/src/3rdparty/libwebp/src/utils/color_cache.c
@@ -15,7 +15,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include "./color_cache.h"
-#include "../utils/utils.h"
+#include "./utils.h"
 
 //------------------------------------------------------------------------------
 // VP8LColorCache.
diff --git a/src/3rdparty/libwebp/src/utils/huffman.c b/src/3rdparty/libwebp/src/utils/huffman.c
index d57376a..36e5502 100644
--- a/src/3rdparty/libwebp/src/utils/huffman.c
+++ b/src/3rdparty/libwebp/src/utils/huffman.c
@@ -15,7 +15,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include "./huffman.h"
-#include "../utils/utils.h"
+#include "./utils.h"
 #include "../webp/format_constants.h"
 
 // Huffman data read via DecodeImageStream is represented in two (red and green)
diff --git a/src/3rdparty/libwebp/src/utils/huffman_encode.c b/src/3rdparty/libwebp/src/utils/huffman_encode.c
index 6421c2b..4e5ef6b 100644
--- a/src/3rdparty/libwebp/src/utils/huffman_encode.c
+++ b/src/3rdparty/libwebp/src/utils/huffman_encode.c
@@ -15,7 +15,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include "./huffman_encode.h"
-#include "../utils/utils.h"
+#include "./utils.h"
 #include "../webp/format_constants.h"
 
 // -----------------------------------------------------------------------------
diff --git a/src/3rdparty/libwebp/src/utils/quant_levels_dec.c b/src/3rdparty/libwebp/src/utils/quant_levels_dec.c
index 5b8b8b4..ee0a3fe 100644
--- a/src/3rdparty/libwebp/src/utils/quant_levels_dec.c
+++ b/src/3rdparty/libwebp/src/utils/quant_levels_dec.c
@@ -44,6 +44,7 @@ static const uint8_t kOrderedDither[DSIZE][DSIZE] = {
 
 typedef struct {
   int width_, height_;  // dimension
+  int stride_;          // stride in bytes
   int row_;             // current input row being processed
   uint8_t* src_;        // input pointer
   uint8_t* dst_;        // output pointer
@@ -99,7 +100,7 @@ static void VFilter(SmoothParams* const p) {
   // We replicate edges, as it's somewhat easier as a boundary condition.
   // That's why we don't update the 'src' pointer on top/bottom area:
   if (p->row_ >= 0 && p->row_ < p->height_ - 1) {
-    p->src_ += p->width_;
+    p->src_ += p->stride_;
   }
 }
 
@@ -149,7 +150,7 @@ static void ApplyFilter(SmoothParams* const p) {
 #endif
     }
   }
-  p->dst_ += w;  // advance output pointer
+  p->dst_ += p->stride_;  // advance output pointer
 }
 
 //------------------------------------------------------------------------------
@@ -178,17 +179,20 @@ static void InitCorrectionLUT(int16_t* const lut, int min_dist) {
   lut[0] = 0;
 }
 
-static void CountLevels(const uint8_t* const data, int size,
-                        SmoothParams* const p) {
-  int i, last_level;
+static void CountLevels(SmoothParams* const p) {
+  int i, j, last_level;
   uint8_t used_levels[256] = { 0 };
+  const uint8_t* data = p->src_;
   p->min_ = 255;
   p->max_ = 0;
-  for (i = 0; i < size; ++i) {
-    const int v = data[i];
-    if (v < p->min_) p->min_ = v;
-    if (v > p->max_) p->max_ = v;
-    used_levels[v] = 1;
+  for (j = 0; j < p->height_; ++j) {
+    for (i = 0; i < p->width_; ++i) {
+      const int v = data[i];
+      if (v < p->min_) p->min_ = v;
+      if (v > p->max_) p->max_ = v;
+      used_levels[v] = 1;
+    }
+    data += p->stride_;
   }
   // Compute the mininum distance between two non-zero levels.
   p->min_level_dist_ = p->max_ - p->min_;
@@ -208,7 +212,7 @@ static void CountLevels(const uint8_t* const data, int size,
 }
 
 // Initialize all params.
-static int InitParams(uint8_t* const data, int width, int height,
+static int InitParams(uint8_t* const data, int width, int height, int stride,
                       int radius, SmoothParams* const p) {
   const int R = 2 * radius + 1;  // total size of the kernel
 
@@ -233,6 +237,7 @@ static int InitParams(uint8_t* const data, int width, int height,
 
   p->width_ = width;
   p->height_ = height;
+  p->stride_ = stride;
   p->src_ = data;
   p->dst_ = data;
   p->radius_ = radius;
@@ -240,7 +245,7 @@ static int InitParams(uint8_t* const data, int width, int height,
   p->row_ = -radius;
 
   // analyze the input distribution so we can best-fit the threshold
-  CountLevels(data, width * height, p);
+  CountLevels(p);
 
   // correction table
   p->correction_ = ((int16_t*)mem) + LUT_SIZE;
@@ -253,7 +258,7 @@ static void CleanupParams(SmoothParams* const p) {
   WebPSafeFree(p->mem_);
 }
 
-int WebPDequantizeLevels(uint8_t* const data, int width, int height,
+int WebPDequantizeLevels(uint8_t* const data, int width, int height, int stride,
                          int strength) {
   const int radius = 4 * strength / 100;
   if (strength < 0 || strength > 100) return 0;
@@ -261,7 +266,7 @@ int WebPDequantizeLevels(uint8_t* const data, int width, int height,
   if (radius > 0) {
     SmoothParams p;
     memset(&p, 0, sizeof(p));
-    if (!InitParams(data, width, height, radius, &p)) return 0;
+    if (!InitParams(data, width, height, stride, radius, &p)) return 0;
     if (p.num_levels_ > 2) {
       for (; p.row_ < p.height_; ++p.row_) {
         VFilter(&p);  // accumulate average of input
diff --git a/src/3rdparty/libwebp/src/utils/quant_levels_dec.h b/src/3rdparty/libwebp/src/utils/quant_levels_dec.h
index 9aab068..59a1349 100644
--- a/src/3rdparty/libwebp/src/utils/quant_levels_dec.h
+++ b/src/3rdparty/libwebp/src/utils/quant_levels_dec.h
@@ -21,11 +21,11 @@ extern "C" {
 #endif
 
 // Apply post-processing to input 'data' of size 'width'x'height' assuming that
-// the source was quantized to a reduced number of levels.
+// the source was quantized to a reduced number of levels. 'stride' is in bytes.
 // Strength is in [0..100] and controls the amount of dithering applied.
 // Returns false in case of error (data is NULL, invalid parameters,
 // malloc failure, ...).
-int WebPDequantizeLevels(uint8_t* const data, int width, int height,
+int WebPDequantizeLevels(uint8_t* const data, int width, int height, int stride,
                          int strength);
 
 #ifdef __cplusplus
diff --git a/src/3rdparty/libwebp/src/utils/utils.c b/src/3rdparty/libwebp/src/utils/utils.c
index d8e3093..2602ca3 100644
--- a/src/3rdparty/libwebp/src/utils/utils.c
+++ b/src/3rdparty/libwebp/src/utils/utils.c
@@ -15,6 +15,7 @@
 #include <string.h>  // for memcpy()
 #include "../webp/decode.h"
 #include "../webp/encode.h"
+#include "../webp/format_constants.h"  // for MAX_PALETTE_SIZE
 #include "./utils.h"
 
 // If PRINT_MEM_INFO is defined, extra info (like total memory used, number of
@@ -237,3 +238,68 @@ void WebPCopyPixels(const WebPPicture* const src, WebPPicture* const dst) {
 }
 
 //------------------------------------------------------------------------------
+
+#define MAX_COLOR_COUNT         MAX_PALETTE_SIZE
+#define COLOR_HASH_SIZE         (MAX_COLOR_COUNT * 4)
+#define COLOR_HASH_RIGHT_SHIFT  22  // 32 - log2(COLOR_HASH_SIZE).
+
+int WebPGetColorPalette(const WebPPicture* const pic, uint32_t* const palette) {
+  int i;
+  int x, y;
+  int num_colors = 0;
+  uint8_t in_use[COLOR_HASH_SIZE] = { 0 };
+  uint32_t colors[COLOR_HASH_SIZE];
+  static const uint32_t kHashMul = 0x1e35a7bdU;
+  const uint32_t* argb = pic->argb;
+  const int width = pic->width;
+  const int height = pic->height;
+  uint32_t last_pix = ~argb[0];   // so we're sure that last_pix != argb[0]
+  assert(pic != NULL);
+  assert(pic->use_argb);
+
+  for (y = 0; y < height; ++y) {
+    for (x = 0; x < width; ++x) {
+      int key;
+      if (argb[x] == last_pix) {
+        continue;
+      }
+      last_pix = argb[x];
+      key = (kHashMul * last_pix) >> COLOR_HASH_RIGHT_SHIFT;
+      while (1) {
+        if (!in_use[key]) {
+          colors[key] = last_pix;
+          in_use[key] = 1;
+          ++num_colors;
+          if (num_colors > MAX_COLOR_COUNT) {
+            return MAX_COLOR_COUNT + 1;  // Exact count not needed.
+          }
+          break;
+        } else if (colors[key] == last_pix) {
+          break;  // The color is already there.
+        } else {
+          // Some other color sits here, so do linear conflict resolution.
+          ++key;
+          key &= (COLOR_HASH_SIZE - 1);  // Key mask.
+        }
+      }
+    }
+    argb += pic->argb_stride;
+  }
+
+  if (palette != NULL) {  // Fill the colors into palette.
+    num_colors = 0;
+    for (i = 0; i < COLOR_HASH_SIZE; ++i) {
+      if (in_use[i]) {
+        palette[num_colors] = colors[i];
+        ++num_colors;
+      }
+    }
+  }
+  return num_colors;
+}
+
+#undef MAX_COLOR_COUNT
+#undef COLOR_HASH_SIZE
+#undef COLOR_HASH_RIGHT_SHIFT
+
+//------------------------------------------------------------------------------
diff --git a/src/3rdparty/libwebp/src/utils/utils.h b/src/3rdparty/libwebp/src/utils/utils.h
index f506d66..e0a8112 100644
--- a/src/3rdparty/libwebp/src/utils/utils.h
+++ b/src/3rdparty/libwebp/src/utils/utils.h
@@ -21,6 +21,7 @@
 
 #include <assert.h>
 
+#include "../dsp/dsp.h"
 #include "../webp/types.h"
 
 #ifdef __cplusplus
@@ -51,7 +52,7 @@ WEBP_EXTERN(void) WebPSafeFree(void* const ptr);
 // Alignment
 
 #define WEBP_ALIGN_CST 31
-#define WEBP_ALIGN(PTR) ((uintptr_t)((PTR) + WEBP_ALIGN_CST) & ~WEBP_ALIGN_CST)
+#define WEBP_ALIGN(PTR) (((uintptr_t)(PTR) + WEBP_ALIGN_CST) & ~WEBP_ALIGN_CST)
 
 #if defined(WEBP_FORCE_ALIGNED)
 #include <string.h>
@@ -65,10 +66,12 @@ static WEBP_INLINE void WebPUint32ToMem(uint8_t* const ptr, uint32_t val) {
   memcpy(ptr, &val, sizeof(val));
 }
 #else
-static WEBP_INLINE uint32_t WebPMemToUint32(const uint8_t* const ptr) {
+static WEBP_UBSAN_IGNORE_UNDEF WEBP_INLINE
+uint32_t WebPMemToUint32(const uint8_t* const ptr) {
   return *(const uint32_t*)ptr;
 }
-static WEBP_INLINE void WebPUint32ToMem(uint8_t* const ptr, uint32_t val) {
+static WEBP_UBSAN_IGNORE_UNDEF WEBP_INLINE
+void WebPUint32ToMem(uint8_t* const ptr, uint32_t val) {
   *(uint32_t*)ptr = val;
 }
 #endif
@@ -158,6 +161,19 @@ WEBP_EXTERN(void) WebPCopyPixels(const struct WebPPicture* const src,
                                  struct WebPPicture* const dst);
 
 //------------------------------------------------------------------------------
+// Unique colors.
+
+// Returns count of unique colors in 'pic', assuming pic->use_argb is true.
+// If the unique color count is more than MAX_COLOR_COUNT, returns
+// MAX_COLOR_COUNT+1.
+// If 'palette' is not NULL and number of unique colors is less than or equal to
+// MAX_COLOR_COUNT, also outputs the actual unique colors into 'palette'.
+// Note: 'palette' is assumed to be an array already allocated with at least
+// MAX_COLOR_COUNT elements.
+WEBP_EXTERN(int) WebPGetColorPalette(const struct WebPPicture* const pic,
+                                     uint32_t* const palette);
+
+//------------------------------------------------------------------------------
 
 #ifdef __cplusplus
 }    // extern "C"
diff --git a/src/3rdparty/libwebp/src/webp/config.h b/src/3rdparty/libwebp/src/webp/config.h
index 4ea0737..118ac38 100644
--- a/src/3rdparty/libwebp/src/webp/config.h
+++ b/src/3rdparty/libwebp/src/webp/config.h
@@ -79,7 +79,7 @@
 #define PACKAGE_NAME "libwebp"
 
 /* Define to the full name and version of this package. */
-#define PACKAGE_STRING "libwebp 0.5.0"
+#define PACKAGE_STRING "libwebp 0.5.1"
 
 /* Define to the one symbol short name of this package. */
 #define PACKAGE_TARNAME "libwebp"
@@ -88,7 +88,7 @@
 #define PACKAGE_URL "http://developers.google.com/speed/webp"
 
 /* Define to the version of this package. */
-#define PACKAGE_VERSION "0.5.0"
+#define PACKAGE_VERSION "0.5.1"
 
 /* Define to necessary symbol if this constant uses a non-standard name on
    your system. */
@@ -98,7 +98,7 @@
 /* #undef STDC_HEADERS */
 
 /* Version number of package */
-#define VERSION "0.5.0"
+#define VERSION "0.5.1"
 
 /* Enable experimental code */
 /* #undef WEBP_EXPERIMENTAL_FEATURES */
@@ -118,12 +118,21 @@
 /* Set to 1 if JPEG library is installed */
 /* #undef WEBP_HAVE_JPEG */
 
+/* Set to 1 if NEON is supported */
+/* #undef WEBP_HAVE_NEON */
+
+/* Set to 1 if runtime detection of NEON is enabled */
+/* #undef WEBP_HAVE_NEON_RTCD */
+
 /* Set to 1 if PNG library is installed */
 /* #undef WEBP_HAVE_PNG */
 
 /* Set to 1 if SSE2 is supported */
 /* #undef WEBP_HAVE_SSE2 */
 
+/* Set to 1 if SSE4.1 is supported */
+/* #undef WEBP_HAVE_SSE41 */
+
 /* Set to 1 if TIFF library is installed */
 /* #undef WEBP_HAVE_TIFF */
 
@@ -132,6 +141,15 @@
 
 /* Define WORDS_BIGENDIAN to 1 if your processor stores words with the most
    significant byte first (like Motorola and SPARC, unlike Intel). */
+/* #if defined AC_APPLE_UNIVERSAL_BUILD */
+/* # if defined __BIG_ENDIAN__ */
+/* #  define WORDS_BIGENDIAN 1 */
+/* # endif */
+/* #else */
+/* # ifndef WORDS_BIGENDIAN */
+/* /* #  undef WORDS_BIGENDIAN */
+/* # endif */
+/* #endif */
 #if (Q_BYTE_ORDER == Q_BIG_ENDIAN)
 #define WORDS_BIGENDIAN 1
 #endif
diff --git a/src/3rdparty/libwebp/src/webp/decode.h b/src/3rdparty/libwebp/src/webp/decode.h
index 143e4fb..7a3bed9 100644
--- a/src/3rdparty/libwebp/src/webp/decode.h
+++ b/src/3rdparty/libwebp/src/webp/decode.h
@@ -39,8 +39,8 @@ typedef struct WebPDecoderConfig WebPDecoderConfig;
 WEBP_EXTERN(int) WebPGetDecoderVersion(void);
 
 // Retrieve basic header information: width, height.
-// This function will also validate the header and return 0 in
-// case of formatting error.
+// This function will also validate the header, returning true on success,
+// false otherwise. '*width' and '*height' are only valid on successful return.
 // Pointers 'width' and 'height' can be passed NULL if deemed irrelevant.
 WEBP_EXTERN(int) WebPGetInfo(const uint8_t* data, size_t data_size,
                              int* width, int* height);
@@ -197,7 +197,10 @@ struct WebPYUVABuffer {              // view as YUVA
 struct WebPDecBuffer {
   WEBP_CSP_MODE colorspace;  // Colorspace.
   int width, height;         // Dimensions.
-  int is_external_memory;    // If true, 'internal_memory' pointer is not used.
+  int is_external_memory;    // If non-zero, 'internal_memory' pointer is not
+                             // used. If value is '2' or more, the external
+                             // memory is considered 'slow' and multiple
+                             // read/write will be avoided.
   union {
     WebPRGBABuffer RGBA;
     WebPYUVABuffer YUVA;
@@ -205,7 +208,7 @@ struct WebPDecBuffer {
   uint32_t       pad[4];     // padding for later use
 
   uint8_t* private_memory;   // Internally allocated memory (only when
-                             // is_external_memory is false). Should not be used
+                             // is_external_memory is 0). Should not be used
                              // externally, but accessed via the buffer union.
 };
 
@@ -269,7 +272,7 @@ typedef enum VP8StatusCode {
 // that of the returned WebPIDecoder object.
 // The supplied 'output_buffer' content MUST NOT be changed between calls to
 // WebPIAppend() or WebPIUpdate() unless 'output_buffer.is_external_memory' is
-// set to 1. In such a case, it is allowed to modify the pointers, size and
+// not set to 0. In such a case, it is allowed to modify the pointers, size and
 // stride of output_buffer.u.RGBA or output_buffer.u.YUVA, provided they remain
 // within valid bounds.
 // All other fields of WebPDecBuffer MUST remain constant between calls.
@@ -468,16 +471,18 @@ static WEBP_INLINE int WebPInitDecoderConfig(WebPDecoderConfig* config) {
 // parameter, in which case the features will be parsed and stored into
 // config->input. Otherwise, 'data' can be NULL and no parsing will occur.
 // Note that 'config' can be NULL too, in which case a default configuration
-// is used.
+// is used. If 'config' is not NULL, it must outlive the WebPIDecoder object
+// as some references to its fields will be used. No internal copy of 'config'
+// is made.
 // The return WebPIDecoder object must always be deleted calling WebPIDelete().
 // Returns NULL in case of error (and config->status will then reflect
-// the error condition).
+// the error condition, if available).
 WEBP_EXTERN(WebPIDecoder*) WebPIDecode(const uint8_t* data, size_t data_size,
                                        WebPDecoderConfig* config);
 
 // Non-incremental version. This version decodes the full data at once, taking
 // 'config' into account. Returns decoding status (which should be VP8_STATUS_OK
-// if the decoding was successful).
+// if the decoding was successful). Note that 'config' cannot be NULL.
 WEBP_EXTERN(VP8StatusCode) WebPDecode(const uint8_t* data, size_t data_size,
                                       WebPDecoderConfig* config);
 
diff --git a/src/3rdparty/libwebp/src/webp/encode.h b/src/3rdparty/libwebp/src/webp/encode.h
index c382ea7..9291b71 100644
--- a/src/3rdparty/libwebp/src/webp/encode.h
+++ b/src/3rdparty/libwebp/src/webp/encode.h
@@ -134,8 +134,8 @@ struct WebPConfig {
   int thread_level;       // If non-zero, try and use multi-threaded encoding.
   int low_memory;         // If set, reduce memory usage (but increase CPU use).
 
-  int near_lossless;      // Near lossless encoding [0 = off(default) .. 100].
-                          // This feature is experimental.
+  int near_lossless;      // Near lossless encoding [0 = max loss .. 100 = off
+                          // (default)].
   int exact;              // if non-zero, preserve the exact RGB values under
                           // transparent area. Otherwise, discard this invisible
                           // RGB information for better compression. The default
diff --git a/src/plugins/imageformats/icns/main.cpp b/src/plugins/imageformats/icns/main.cpp
index d25d884..70dd034 100644
--- a/src/plugins/imageformats/icns/main.cpp
+++ b/src/plugins/imageformats/icns/main.cpp
@@ -38,10 +38,10 @@
 **
 ****************************************************************************/
 
-#ifndef QT_NO_IMAGEFORMATPLUGIN
-
+#include <QtGui/qimageiohandler.h>
 #include "qicnshandler_p.h"
 
+#ifndef QT_NO_IMAGEFORMATPLUGIN
 #ifndef QT_NO_DATASTREAM
 
 QT_BEGIN_NAMESPACE
author	Liang Qi <liang.qi@qt.io>	2017-03-06 14:33:16 +0100
committer	Liang Qi <liang.qi@qt.io>	2017-03-06 14:33:34 +0100
commit	f2dbc67c2b032a5f27d0224e020fb6dfcd3fd142 (patch)
tree	c5c195998f538fd7b9aa1382ced53caf2dec3926 /src
parent	d2306d74850986692c02b70df0d7a6a6e933d0dc (diff)
parent	03e507c8f3816053257b3c351d79c494b6f9174d (diff)
download	qtimageformats-f2dbc67c2b032a5f27d0224e020fb6dfcd3fd142.tar.gz