delta/ffmpeg.git - git.ffmpeg.org: ffmpeg.git

	Commit message	Author	Age	Files	Lines
*	aarch64/opusdsp: do not clobber register v8	Lynne	2019-08-15	1	-4/+4
\| \| \| \|	A part of v8-v15 needs to be preserved across calls.
*	aarch64/asm-offsets: remove old CELT offsets	Lynne	2019-05-14	1	-8/+0
\| \| \| \|	They're not used and they're incorrect.
*	aarch64/opusdsp: implement NEON accelerated postfilter and deemphasis	Lynne	2019-04-10	3	-0/+150
\|	153372 UNITS in postfilter_c, 65536 runs, 0 skips
\|	73164 UNITS in postfilter_neon, 65536 runs, 0 skips -> 2.1x speedup
\|
\|	80591 UNITS in deemphasis_c, 131072 runs, 0 skips
\|	43969 UNITS in deemphasis_neon, 131072 runs, 0 skips -> 1.83x speedup
\|
\|	Total decoder speedup: ~15% on a Raspberry Pi 3 (from 28.1x to 33.5x realtime)
\|
\|	Deemphasis SIMD based on the following unrolling:
\|
\|	const float c1 = CELT_EMPH_COEFF, c2 = c1*c1, c3 = c2*c1, c4 = c3*c1;
\|	float state = coeff;
\|
\|	for (int i = 0; i < len; i += 4) {
\|	    y[0] = x[0] + c1*state;
\|	    y[1] = x[1] + c2*state + c1*x[0];
\|	    y[2] = x[2] + c3*state + c1*x[1] + c2*x[0];
\|	    y[3] = x[3] + c4*state + c1*x[2] + c2*x[1] + c3*x[0];
\|
\|	    state = y[3];
\|	    y += 4;
\|	    x += 4;
\|	}
\|
\|	Unlike the x86 version, duplication is used instead of pslldq so the structure and tables are different.
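The unrolling quoted in the commit message above is the first-order recurrence y[i] = x[i] + c*y[i-1] expanded four samples at a time, so that each block of four outputs depends only on the four inputs and the single carried-over state. The following is a minimal C sketch that checks this equivalence numerically. The coefficient value DEMO_EMPH_COEFF, the function names, and the test data are assumptions made for illustration; the commit itself only names CELT_EMPH_COEFF and shows the loop body.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder coefficient for illustration only; the commit refers to
 * CELT_EMPH_COEFF without stating its value. */
#define DEMO_EMPH_COEFF 0.85f

/* Scalar reference: y[i] = x[i] + c * y[i-1], with "state" holding y[i-1]. */
float deemph_scalar(float *y, const float *x, float state, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        state = x[i] + DEMO_EMPH_COEFF * state;
        y[i] = state;
    }
    return state;
}

/* Four-way unrolled form, following the loop in the commit message.
 * Each y[k] is written in terms of x[0..k] and the incoming state only,
 * which is what makes the block suitable for SIMD. */
float deemph_unrolled4(float *y, const float *x, float state, size_t len)
{
    const float c1 = DEMO_EMPH_COEFF, c2 = c1 * c1, c3 = c2 * c1, c4 = c3 * c1;

    assert(len % 4 == 0); /* this sketch handles whole blocks of four only */
    for (size_t i = 0; i < len; i += 4) {
        y[0] = x[0] + c1 * state;
        y[1] = x[1] + c2 * state + c1 * x[0];
        y[2] = x[2] + c3 * state + c1 * x[1] + c2 * x[0];
        y[3] = x[3] + c4 * state + c1 * x[2] + c2 * x[1] + c3 * x[0];

        state = y[3];
        y += 4;
        x += 4;
    }
    return state;
}

/* Runs both forms on the same made-up input and returns the largest
 * absolute difference between their outputs. */
float deemph_max_diff(void)
{
    float x[16], a[16], b[16], max = 0.0f;

    for (int i = 0; i < 16; i++)
        x[i] = (float)((i * 7) % 5) - 2.0f;

    deemph_scalar(a, x, 0.5f, 16);
    deemph_unrolled4(b, x, 0.5f, 16);

    for (int i = 0; i < 16; i++) {
        float d = a[i] - b[i];
        if (d < 0.0f)
            d = -d;
        if (d > max)
            max = d;
    }
    return max;
}
```

The two forms are not bit-identical in floating point because the unrolled version multiplies by precomputed powers of the coefficient, so the comparison should use a small tolerance rather than exact equality.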
*	Merge commit '186bd30aa3b6c2b29b4dbf18278700b572068b1e'	James Almer	2019-03-14	2	-6/+42
\|\ \| \| \| \| \| \| \| \| \| \| \| \|	* commit '186bd30aa3b6c2b29b4dbf18278700b572068b1e': h264/arm64: implement missing 4:2:2 chroma loop filter neon functions Merged-by: James Almer <jamrial@gmail.com>
\| *	h264/arm64: implement missing 4:2:2 chroma loop filter neon functions	Janne Grunau	2019-02-27	2	-8/+46
\| \|
* \|	Merge commit '7e42d5f0ab2aeac811fd01e122627c9198b13f01'	James Almer	2019-03-14	1	-24/+25
\|\ \ \| \|/ \| \| \| \| \| \| \| \| \| \|	* commit '7e42d5f0ab2aeac811fd01e122627c9198b13f01': aarch64: vp8: Optimize vp8_idct_add_neon for aarch64 Merged-by: James Almer <jamrial@gmail.com>
\| *	aarch64: vp8: Optimize vp8_idct_add_neon for aarch64	Martin Storsjö	2019-02-19	1	-24/+25
\| \|	The previous version was a pretty exact translation of the arm version. This version does do some unnecessary arithmetic (it does more operations on vectors that are only half filled; it does 4 uaddw and 4 sqxtun instead of 2 of each), but it reduces the overhead of packing data together (which could be done for free in the arm version).
\| \|
\| \|	This gives a decent speedup on Cortex A53, a minor speedup on A72 and a very minor slowdown on Cortex A73.
\| \|
\| \|	Before:                  Cortex A53     A72     A73
\| \|	vp8_idct_add_neon:             79.7    67.5    65.0
\| \|	After:
\| \|	vp8_idct_add_neon:             67.7    64.8    66.7
\| \|
\| \|	Signed-off-by: Martin Storsjö <martin@martin.st>
* \|	Merge commit '49f9c4272c4029b57ff300d908ba03c6332fc9c4'	James Almer	2019-03-14	1	-4/+4
\|\ \ \| \|/ \| \| \| \| \| \| \| \| \| \|	* commit '49f9c4272c4029b57ff300d908ba03c6332fc9c4': aarch64: vp8: Skip saturating in shrn in ff_vp8_idct_add_neon Merged-by: James Almer <jamrial@gmail.com>
\| *	aarch64: vp8: Skip saturating in shrn in ff_vp8_idct_add_neon	Martin Storsjö	2019-02-19	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The original arm version didn't do saturation here. This probably doesn't make any difference for performance, but reduces the differences. Signed-off-by: Martin Storsjö <martin@martin.st>
* \|	Merge commit '37394ef01b040605f8e1c98e73aa12b1c0bcba07'	James Almer	2019-03-14	1	-24/+10
\|\ \ \| \|/ \| \| \| \| \| \| \| \| \| \|	* commit '37394ef01b040605f8e1c98e73aa12b1c0bcba07': aarch64: vp8: Optimize put_epel16_h6v6 with vp8_epel8_v6_y2 Merged-by: James Almer <jamrial@gmail.com>
\| *	aarch64: vp8: Optimize put_epel16_h6v6 with vp8_epel8_v6_y2	Martin Storsjö	2019-02-19	1	-24/+10
\| \|	This makes it similar to put_epel16_v6, and gives a large speedup on Cortex A53, a minor speedup on A72 and a very minor slowdown on A73.
\| \|
\| \|	Before:                   Cortex A53     A72     A73
\| \|	vp8_put_epel16_h6v6_neon:     2211.4  1586.5  1431.7
\| \|	After:
\| \|	vp8_put_epel16_h6v6_neon:     1736.9  1522.0  1448.1
\| \|
\| \|	Signed-off-by: Martin Storsjö <martin@martin.st>
* \|	Merge commit 'e39a9212ab37a55b346801c77487d8a47b6f9fe2'	James Almer	2019-03-14	3	-0/+329
\|\ \ \| \|/ \| \| \| \| \| \| \| \| \| \|	* commit 'e39a9212ab37a55b346801c77487d8a47b6f9fe2': aarch64: vp8: Port bilin functions from arm version Merged-by: James Almer <jamrial@gmail.com>
\| *	aarch64: vp8: Port bilin functions from arm version	Martin Storsjö	2019-02-19	3	-0/+329
\| \|	                         Cortex A53     A72     A73
\| \|	vp8_put_bilin4_h_c:           303.8   102.2   161.8
\| \|	vp8_put_bilin4_h_neon:        100.0    40.9    41.2
\| \|	vp8_put_bilin4_hv_c:          322.8   201.0   305.9
\| \|	vp8_put_bilin4_hv_neon:       156.8    72.6    77.0
\| \|	vp8_put_bilin4_v_c:           304.7   101.7   166.5
\| \|	vp8_put_bilin4_v_neon:         82.7    41.2    33.0
\| \|	vp8_put_bilin8_h_c:          1192.7   352.5   623.8
\| \|	vp8_put_bilin8_h_neon:        213.5    70.2    87.8
\| \|	vp8_put_bilin8_hv_c:         1098.6   769.2  1041.9
\| \|	vp8_put_bilin8_hv_neon:       324.0   123.5   146.0
\| \|	vp8_put_bilin8_v_c:          1193.9   350.4   617.7
\| \|	vp8_put_bilin8_v_neon:        183.9    60.7    64.7
\| \|	vp8_put_bilin16_h_c:         2353.1   671.2  1223.3
\| \|	vp8_put_bilin16_h_neon:       261.9   140.7   145.0
\| \|	vp8_put_bilin16_hv_c:        2453.2  1470.9  2355.2
\| \|	vp8_put_bilin16_hv_neon:      383.9   196.0   217.0
\| \|	vp8_put_bilin16_v_c:         2349.3   669.8  1251.2
\| \|	vp8_put_bilin16_v_neon:       202.9   110.7    96.2
\| \|
\| \|	Signed-off-by: Martin Storsjö <martin@martin.st>
* \|	Merge commit '58d154922707bfeb873cb3a7476e0f94b17463dd'	James Almer	2019-03-14	2	-0/+294
\|\ \ \| \|/ \| \| \| \| \| \| \| \| \| \|	* commit '58d154922707bfeb873cb3a7476e0f94b17463dd': aarch64: vp8: Port epel4 functions from arm version Merged-by: James Almer <jamrial@gmail.com>
\| *	aarch64: vp8: Port epel4 functions from arm version	Martin Storsjö	2019-02-19	2	-0/+294
\| \|	                         Cortex A53     A72     A73
\| \|	vp8_put_epel4_h4_c:           631.4   291.7   367.8
\| \|	vp8_put_epel4_h4_neon:        241.0   131.0   155.7
\| \|	vp8_put_epel4_h4v4_c:         967.5   529.3   667.7
\| \|	vp8_put_epel4_h4v4_neon:      429.3   241.8   279.7
\| \|	vp8_put_epel4_h4v6_c:        1374.7   657.5   864.5
\| \|	vp8_put_epel4_h4v6_neon:      515.5   295.5   334.7
\| \|	vp8_put_epel4_h6_c:           851.0   421.0   486.0
\| \|	vp8_put_epel4_h6_neon:        321.5   195.0   217.7
\| \|	vp8_put_epel4_h6v4_c:        1111.3   621.1   781.2
\| \|	vp8_put_epel4_h6v4_neon:      539.2   328.0   365.3
\| \|	vp8_put_epel4_h6v6_c:        1561.3   763.3   999.7
\| \|	vp8_put_epel4_h6v6_neon:      645.5   401.0   434.7
\| \|	vp8_put_epel4_v4_c:           663.8   298.3   357.0
\| \|	vp8_put_epel4_v4_neon:        116.0    81.5    72.5
\| \|	vp8_put_epel4_v6_c:           870.5   437.0   507.4
\| \|	vp8_put_epel4_v6_neon:        147.7   108.8    92.0
\| \|
\| \|	Signed-off-by: Martin Storsjö <martin@martin.st>
* \|	Merge commit 'cc7ba00c35faf0478f1f56215e926f70ccb31282'	James Almer	2019-03-14	2	-0/+91
\|\ \ \| \|/ \| \| \| \| \| \| \| \| \| \|	* commit 'cc7ba00c35faf0478f1f56215e926f70ccb31282': aarch64: vp8: Port missing epel8 functions from arm version Merged-by: James Almer <jamrial@gmail.com>
\| *	aarch64: vp8: Port missing epel8 functions from arm version	Martin Storsjö	2019-02-19	2	-0/+91
\| \|	                         Cortex A53     A72     A73
\| \|	vp8_put_epel8_h4_c:          2594.8  1159.6  1374.8
\| \|	vp8_put_epel8_h4_neon:        506.4   244.2   314.0
\| \|	vp8_put_epel8_h6_c:          3445.8  1677.1  1811.3
\| \|	vp8_put_epel8_h6_neon:        634.4   371.7   433.0
\| \|	vp8_put_epel8_v4_c:          2614.0  1174.8  1378.0
\| \|	vp8_put_epel8_v4_neon:        321.0   221.7   235.8
\| \|	vp8_put_epel8_v6_c:          3635.5  1703.0  2079.2
\| \|	vp8_put_epel8_v6_neon:        416.9   317.0   295.5
\| \|
\| \|	Signed-off-by: Martin Storsjö <martin@martin.st>
* \|	Merge commit '52c9b0a6c0d02cff6caebcf6989e565e05b55200'	James Almer	2019-03-14	2	-0/+112
\|\ \ \| \|/ \| \| \| \| \| \| \| \| \| \|	* commit '52c9b0a6c0d02cff6caebcf6989e565e05b55200': aarch64: vp8: Port vp8_luma_dc_wht and vp8_idct_dc_add4uv from arm version Merged-by: James Almer <jamrial@gmail.com>
\| *	aarch64: vp8: Port vp8_luma_dc_wht and vp8_idct_dc_add4uv from arm version	Martin Storsjö	2019-02-19	2	-0/+112
\| \|	                         Cortex A53     A72     A73
\| \|	vp8_luma_dc_wht_c:            115.7    75.7    90.7
\| \|	vp8_luma_dc_wht_neon:          60.7    41.2    45.7
\| \|	vp8_idct_dc_add4uv_c:         376.1   262.9   282.5
\| \|	vp8_idct_dc_add4uv_neon:       52.0    29.0    37.0
\| \|
\| \|	Signed-off-by: Martin Storsjö <martin@martin.st>
* \|	Merge commit 'c513fcd7d235aa4cef45a6c3125bd4dcc03bf276'	James Almer	2019-03-14	1	-1/+1
\|\ \ \| \|/ \| \| \| \| \| \| \| \| \| \|	* commit 'c513fcd7d235aa4cef45a6c3125bd4dcc03bf276': aarch64: vp8: Fix a typo in a comment Merged-by: James Almer <jamrial@gmail.com>
\| *	aarch64: vp8: Fix a typo in a comment	Martin Storsjö	2019-02-19	1	-1/+1
\| \| \| \| \| \| \| \|	Signed-off-by: Martin Storsjö <martin@martin.st>
* \|	Merge commit 'f1011ea28a4048ddec97794ca3e9901474fe055f'	James Almer	2019-03-14	1	-4/+4
\|\ \ \| \|/ \| \| \| \| \| \| \| \| \| \|	* commit 'f1011ea28a4048ddec97794ca3e9901474fe055f': aarch64: vp8: Reorder the function pointer inits to match the arm original Merged-by: James Almer <jamrial@gmail.com>
\| *	aarch64: vp8: Reorder the function pointer inits to match the arm original	Martin Storsjö	2019-02-19	1	-4/+4
\| \| \| \| \| \| \| \|	Signed-off-by: Martin Storsjö <martin@martin.st>
\| *	aarch64: vp8: Move the vp8dsp makefile entries to the right places	Martin Storsjö	2019-02-19	1	-2/+2
\| \|	Even if NEON would be disabled, the init functions should be built as they are called as long as ARCH_AARCH64 is set.
\| \|
\| \|	These functions are part of a generic DSP subsystem, not tied directly to one decoder. (They should be built if the vp7 decoder is enabled, even if the vp8 decoder is disabled.)
\| \|
\| \|	Signed-off-by: Martin Storsjö <martin@martin.st>
\| *	aarch64: vp8: Remove superfluous includes	Martin Storsjö	2019-02-19	1	-4/+0
\| \| \| \| \| \| \| \| \| \| \| \|	This fixes building with MSVC, which lacks unistd.h. Signed-off-by: Martin Storsjö <martin@martin.st>
* \|	Merge commit '85bfaa4949f4afcde19061def3e8a18988964858'	James Almer	2019-03-14	1	-14/+14
\|\ \ \| \|/ \| \| \| \| \| \| \| \| \| \|	* commit '85bfaa4949f4afcde19061def3e8a18988964858': aarch64: vp8: Use the proper aarch64 form for conditional branches Merged-by: James Almer <jamrial@gmail.com>
\| *	aarch64: vp8: Use the proper aarch64 form for conditional branches	Martin Storsjö	2019-02-19	1	-14/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The previous form also does seem to assemble on current tools, but I think it might fail on some older aarch64 tools. Signed-off-by: Martin Storsjö <martin@martin.st>
\| *	aarch64: vp8: Fix assembling with armasm64	Martin Storsjö	2019-02-19	1	-1/+1
\| \| \| \| \| \| \| \|	Signed-off-by: Martin Storsjö <martin@martin.st>
\| *	aarch64: vp8: Fix assembling with clang	Martin Storsjö	2019-02-19	1	-69/+69
\| \|	This also partially fixes assembling with MS armasm64 (via gas-preprocessor).
\| \|
\| \|	The movrel macro invocations need to pass the offset via a separate parameter. Mach-o and COFF relocations don't allow a negative offset to a symbol, which is handled properly if the offset is passed via the parameter. If no offset parameter is given, the macro evaluates to something like "adrp x17, subpel_filters-16+(0)", which older clang versions also fail to parse (the older clang versions only support one single offset term, although it can be a parenthesis).
\| \|
\| \|	Signed-off-by: Martin Storsjö <martin@martin.st>
* \|	Merge commit '0801853e640624537db386727b36fa97aa6258e7'	James Almer	2019-03-14	1	-4/+2
\|\ \ \| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* commit '0801853e640624537db386727b36fa97aa6258e7': libavcodec: vp8 neon optimizations for aarch64 See 833fed5253617924c41132e0ab261c1d8c076360 Merged-by: James Almer <jamrial@gmail.com>
\| *	libavcodec: vp8 neon optimizations for aarch64	Magnus Röös	2019-02-19	4	-0/+1182
\| \|	Partial port of the ARM Neon for aarch64.
\| \|
\| \|	Benchmarks from fate:
\| \|
\| \|	benchmarking with Linux Perf Monitoring API
\| \|	nop: 58.6
\| \|	checkasm: using random seed 1760970128
\| \|	NEON:
\| \|	 - vp8dsp.idct [OK]
\| \|	 - vp8dsp.mc [OK]
\| \|	 - vp8dsp.loopfilter [OK]
\| \|	checkasm: all 21 tests passed
\| \|	vp8_idct_add_c: 201.6
\| \|	vp8_idct_add_neon: 83.1
\| \|	vp8_idct_dc_add_c: 107.6
\| \|	vp8_idct_dc_add_neon: 33.8
\| \|	vp8_idct_dc_add4y_c: 426.4
\| \|	vp8_idct_dc_add4y_neon: 59.4
\| \|	vp8_loop_filter8uv_h_c: 688.1
\| \|	vp8_loop_filter8uv_h_neon: 216.3
\| \|	vp8_loop_filter8uv_inner_h_c: 649.3
\| \|	vp8_loop_filter8uv_inner_h_neon: 195.3
\| \|	vp8_loop_filter8uv_inner_v_c: 544.8
\| \|	vp8_loop_filter8uv_inner_v_neon: 131.3
\| \|	vp8_loop_filter8uv_v_c: 706.1
\| \|	vp8_loop_filter8uv_v_neon: 141.1
\| \|	vp8_loop_filter16y_h_c: 668.8
\| \|	vp8_loop_filter16y_h_neon: 242.8
\| \|	vp8_loop_filter16y_inner_h_c: 647.3
\| \|	vp8_loop_filter16y_inner_h_neon: 224.6
\| \|	vp8_loop_filter16y_inner_v_c: 647.8
\| \|	vp8_loop_filter16y_inner_v_neon: 128.8
\| \|	vp8_loop_filter16y_v_c: 721.8
\| \|	vp8_loop_filter16y_v_neon: 154.3
\| \|	vp8_loop_filter_simple_h_c: 387.8
\| \|	vp8_loop_filter_simple_h_neon: 187.6
\| \|	vp8_loop_filter_simple_v_c: 384.1
\| \|	vp8_loop_filter_simple_v_neon: 78.6
\| \|	vp8_put_epel8_h4v4_c: 3971.1
\| \|	vp8_put_epel8_h4v4_neon: 855.1
\| \|	vp8_put_epel8_h4v6_c: 5060.1
\| \|	vp8_put_epel8_h4v6_neon: 989.6
\| \|	vp8_put_epel8_h6v4_c: 4320.8
\| \|	vp8_put_epel8_h6v4_neon: 1007.3
\| \|	vp8_put_epel8_h6v6_c: 5449.3
\| \|	vp8_put_epel8_h6v6_neon: 1158.1
\| \|	vp8_put_epel16_h6_c: 6683.8
\| \|	vp8_put_epel16_h6_neon: 831.8
\| \|	vp8_put_epel16_h6v6_c: 11110.8
\| \|	vp8_put_epel16_h6v6_neon: 2214.8
\| \|	vp8_put_epel16_v6_c: 7024.8
\| \|	vp8_put_epel16_v6_neon: 799.6
\| \|	vp8_put_pixels8_c: 112.8
\| \|	vp8_put_pixels8_neon: 78.1
\| \|	vp8_put_pixels16_c: 131.3
\| \|	vp8_put_pixels16_neon: 129.8
\| \|
\| \|	This contains a fix to include guards by Carl Eugen Hoyos.
\| \|
\| \|	Signed-off-by: Martin Storsjö <martin@martin.st>
* \|	lavc/aarch64/h264dsp_init: Only use neon horizontal intra loopfilter for 4:2:0.	Carl Eugen Hoyos	2019-02-20	1	-4/+5
\| \|
* \|	aarch64/h264dsp: change loop filter stride argument to ptrdiff_t	James Almer	2019-02-20	1	-9/+9
\| \| \| \| \| \| \| \| \| \| \| \|	This was missed in d5d699ab6e6f8a8290748d107416fd5c19757a1b Signed-off-by: James Almer <jamrial@gmail.com>
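The stride commits in this part of the log (changing the argument to ptrdiff_t, and sign-extending an int stride in the asm) both concern how a 32-bit stride reaches 64-bit pointer arithmetic. Under the AArch64 procedure call standard, the upper 32 bits of a register carrying an int argument are not guaranteed to hold its sign, so assembly that uses the full 64-bit register must extend the value itself. The following is a minimal C sketch of the difference. The function names are hypothetical, and modelling the unextended case as zero-extension is an assumption for illustration, since the real upper bits are unspecified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models using the 64-bit register without extending a 32-bit int.
 * Assumption: the upper 32 bits happen to be zero. */
int64_t stride_zero_extended(int stride)
{
    return (int64_t)(uint64_t)(uint32_t)stride;
}

/* Models an explicit sign extension (what "sxtw" does in AArch64 asm). */
int64_t stride_sign_extended(int stride)
{
    return (int64_t)stride;
}

/* With a ptrdiff_t parameter the caller already passes a pointer-width
 * value, so no extension is needed on the callee side. */
const uint8_t *advance_row(const uint8_t *pix, ptrdiff_t stride)
{
    assert(pix != NULL);
    return pix + stride;
}
```

For a negative stride of -64, the zero-extended value is 4294967232 rather than -64, which would move the pointer almost 4 GiB forward instead of one row back.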
* \|	Merge commit '28a8b5413b64b831dfb8650208bccd8b78360484'	James Almer	2019-02-20	2	-0/+313
\|\ \ \| \|/ \| \| \| \| \| \| \| \| \| \|	* commit '28a8b5413b64b831dfb8650208bccd8b78360484': h264/aarch64: add intra loop filter neon asm Merged-by: James Almer <jamrial@gmail.com>
\| *	h264/aarch64: add intra loop filter neon asm	Janne Grunau	2019-01-26	2	-0/+313
\| \|	Add my neon asm from x264 relicensed under the LGPL 2.1 or later. Ported (x264 uses nv12 chroma) and optimized.
\| \|
\| \|	Cycle count for checkasm --bench on a Snapdragon 820e:
\| \|	h264_h_loop_filter_luma_intra_8bpp_c: 60.0
\| \|	h264_h_loop_filter_luma_intra_8bpp_neon: 54.2
\| \|	h264_v_loop_filter_luma_intra_8bpp_c: 148.3
\| \|	h264_v_loop_filter_luma_intra_8bpp_neon: 73.8
\| \|	h264_h_loop_filter_chroma_intra_8bpp_c: 27.8
\| \|	h264_h_loop_filter_chroma_intra_8bpp_neon: 21.4
\| \|	h264_h_loop_filter_chroma_mbaff_intra_8bpp_c: 15.8
\| \|	h264_h_loop_filter_chroma_mbaff_intra_8bpp_neon: 15.7
\| \|	h264_v_loop_filter_chroma_intra_8bpp_c: 45.8
\| \|	h264_v_loop_filter_chroma_intra_8bpp_neon: 17.3
* \|	Merge commit '846c3d6aca5484904e60946c4fe8b8833bc07f92'	James Almer	2019-02-20	1	-14/+19
\|\ \ \| \|/ \| \| \| \| \| \| \| \| \| \|	* commit '846c3d6aca5484904e60946c4fe8b8833bc07f92': h264/aarch64: optimize neon loop filter Merged-by: James Almer <jamrial@gmail.com>
\| *	h264/aarch64: optimize neon loop filter	Janne Grunau	2019-01-26	1	-14/+19
\| \|	Exit as soon as possible if no filtering will be done.
\| \|
\| \|	Improves the checkasm --bench cycle count on a Snapdragon 820e:
\| \|	h264_h_loop_filter_luma_8bpp_c: 72.4 -> 72.5
\| \|	h264_h_loop_filter_luma_8bpp_neon: 97.1 -> 56.3
\| \|	h264_v_loop_filter_luma_8bpp_c: 174.0 -> 173.5
\| \|	h264_v_loop_filter_luma_8bpp_neon: 62.9 -> 60.9
\| \|	h264_h_loop_filter_chroma_8bpp_c: 30.2 -> 30.3
\| \|	h264_h_loop_filter_chroma_8bpp_neon: 51.6 -> 25.7
\| \|	h264_v_loop_filter_chroma_8bpp_c: 57.3 -> 57.3
\| \|	h264_v_loop_filter_chroma_8bpp_neon: 28.0 -> 24.0
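The commit message above does not say which condition triggers the early exit, so the following is only a sketch of the general idea in C, not the NEON code. It assumes the standard H.264 per-sample test for a normal (non-intra) edge, where a sample position is filtered only if |p0-q0| < alpha, |p1-p0| < beta and |q1-q0| < beta, and returns as soon as it is known whether any position qualifies. The function name and parameter layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if at least one of the n sample positions along the edge
 * passes the alpha/beta thresholds, 0 if the whole edge can be skipped.
 * pix points at q0; xstride steps across the edge, ystride along it. */
int edge_needs_filtering(const uint8_t *pix, ptrdiff_t xstride,
                         ptrdiff_t ystride, int n, int alpha, int beta)
{
    assert(n > 0);
    for (int i = 0; i < n; i++, pix += ystride) {
        const int p1 = pix[-2 * xstride];
        const int p0 = pix[-1 * xstride];
        const int q0 = pix[0];
        const int q1 = pix[xstride];

        if (abs(p0 - q0) < alpha &&
            abs(p1 - p0) < beta &&
            abs(q1 - q0) < beta)
            return 1; /* at least one position would be modified */
    }
    return 0; /* nothing to do: the caller can return immediately */
}
```

A SIMD version would typically compute the same comparison as a mask over all positions at once and branch out when the mask is entirely zero, which is the kind of check that pays off most when edges are frequently left untouched.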
* \|	Merge commit 'bb515e3a735f526ccb1068031e289eb5aeb69e22'	James Almer	2019-02-20	1	-0/+3
\|\ \ \| \|/ \| \| \| \| \| \| \| \| \| \|	* commit 'bb515e3a735f526ccb1068031e289eb5aeb69e22': h264/aarch64: sign extend int stride in loop filter asm Merged-by: James Almer <jamrial@gmail.com>
\| *	h264/aarch64: sign extend int stride in loop filter asm	Janne Grunau	2019-01-26	1	-0/+3
\| \|
* \|	aarch64: vp8: Move the vp8dsp makefile entries to the right places	Martin Storsjö	2019-02-19	1	-2/+2
\| \|	Even if NEON would be disabled, the init functions should be built as they are called as long as ARCH_AARCH64 is set.
\| \|
\| \|	These functions are part of a generic DSP subsystem, not tied directly to one decoder. (They should be built if the vp7 decoder is enabled, even if the vp8 decoder is disabled.)
\| \|
\| \|	Signed-off-by: Martin Storsjö <martin@martin.st>
\| \|	(cherry picked from commit b4b27dce95a6d40bfcd78043d3abec7d80dae143)
* \|	aarch64: vp8: Remove superfluous includes	Martin Storsjö	2019-02-19	1	-4/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes building with MSVC, which lacks unistd.h. Signed-off-by: Martin Storsjö <martin@martin.st> (cherry picked from commit ad32f7b1264dbc614f0db1c443d5361420e9e07e)
* \|	aarch64: vp8: Fix assembling with armasm64	Martin Storsjö	2019-02-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Martin Storsjö <martin@martin.st> (cherry picked from commit 2eeac79936e83c4495cbe5905064ab797e9b45ff)
* \|	aarch64: vp8: Fix assembling with clang	Martin Storsjö	2019-02-19	1	-69/+69
\| \|	This also partially fixes assembling with MS armasm64 (via gas-preprocessor).
\| \|
\| \|	The movrel macro invocations need to pass the offset via a separate parameter. Mach-o and COFF relocations don't allow a negative offset to a symbol, which is handled properly if the offset is passed via the parameter. If no offset parameter is given, the macro evaluates to something like "adrp x17, subpel_filters-16+(0)", which older clang versions also fail to parse (the older clang versions only support one single offset term, although it can be a parenthesis).
\| \|
\| \|	Signed-off-by: Martin Storsjö <martin@martin.st>
\| \|	(cherry picked from commit 26d7af4c381ee3c7b13b032b3817168b84b98ca6)
* \|	lavc/aarch64/vp8dsp: Fix the include guard.	Carl Eugen Hoyos	2019-01-31	1	-3/+3
\| \| \| \| \| \| \| \|	Fixes fate-source.
* \|	libavcodec: vp8 neon optimizations for aarch64	Magnus Röös	2019-01-31	4	-0/+1184
\| \|	Partial port of the ARM Neon for aarch64.
\| \|
\| \|	Benchmarks from fate:
\| \|
\| \|	benchmarking with Linux Perf Monitoring API
\| \|	nop: 58.6
\| \|	checkasm: using random seed 1760970128
\| \|	NEON:
\| \|	 - vp8dsp.idct [OK]
\| \|	 - vp8dsp.mc [OK]
\| \|	 - vp8dsp.loopfilter [OK]
\| \|	checkasm: all 21 tests passed
\| \|	vp8_idct_add_c: 201.6
\| \|	vp8_idct_add_neon: 83.1
\| \|	vp8_idct_dc_add_c: 107.6
\| \|	vp8_idct_dc_add_neon: 33.8
\| \|	vp8_idct_dc_add4y_c: 426.4
\| \|	vp8_idct_dc_add4y_neon: 59.4
\| \|	vp8_loop_filter8uv_h_c: 688.1
\| \|	vp8_loop_filter8uv_h_neon: 216.3
\| \|	vp8_loop_filter8uv_inner_h_c: 649.3
\| \|	vp8_loop_filter8uv_inner_h_neon: 195.3
\| \|	vp8_loop_filter8uv_inner_v_c: 544.8
\| \|	vp8_loop_filter8uv_inner_v_neon: 131.3
\| \|	vp8_loop_filter8uv_v_c: 706.1
\| \|	vp8_loop_filter8uv_v_neon: 141.1
\| \|	vp8_loop_filter16y_h_c: 668.8
\| \|	vp8_loop_filter16y_h_neon: 242.8
\| \|	vp8_loop_filter16y_inner_h_c: 647.3
\| \|	vp8_loop_filter16y_inner_h_neon: 224.6
\| \|	vp8_loop_filter16y_inner_v_c: 647.8
\| \|	vp8_loop_filter16y_inner_v_neon: 128.8
\| \|	vp8_loop_filter16y_v_c: 721.8
\| \|	vp8_loop_filter16y_v_neon: 154.3
\| \|	vp8_loop_filter_simple_h_c: 387.8
\| \|	vp8_loop_filter_simple_h_neon: 187.6
\| \|	vp8_loop_filter_simple_v_c: 384.1
\| \|	vp8_loop_filter_simple_v_neon: 78.6
\| \|	vp8_put_epel8_h4v4_c: 3971.1
\| \|	vp8_put_epel8_h4v4_neon: 855.1
\| \|	vp8_put_epel8_h4v6_c: 5060.1
\| \|	vp8_put_epel8_h4v6_neon: 989.6
\| \|	vp8_put_epel8_h6v4_c: 4320.8
\| \|	vp8_put_epel8_h6v4_neon: 1007.3
\| \|	vp8_put_epel8_h6v6_c: 5449.3
\| \|	vp8_put_epel8_h6v6_neon: 1158.1
\| \|	vp8_put_epel16_h6_c: 6683.8
\| \|	vp8_put_epel16_h6_neon: 831.8
\| \|	vp8_put_epel16_h6v6_c: 11110.8
\| \|	vp8_put_epel16_h6v6_neon: 2214.8
\| \|	vp8_put_epel16_v6_c: 7024.8
\| \|	vp8_put_epel16_v6_neon: 799.6
\| \|	vp8_put_pixels8_c: 112.8
\| \|	vp8_put_pixels8_neon: 78.1
\| \|	vp8_put_pixels16_c: 131.3
\| \|	vp8_put_pixels16_neon: 129.8
\| \|
\| \|	Signed-off-by: Magnus Röös <mla2.roos@gmail.com>
* \|	libavcodec: Remove dynamic relocs from aarch64/h264idct_neon.S	Manoj Gupta	2019-01-03	1	-8/+12
\| \|	Some of the assembly functions, e.g. ff_h264_idct_dc_add_neon, have code like:
\| \|
\| \|	    movrel x14, X(ff_h264_idct_add_neon)
\| \|
\| \|	The linker cannot resolve them fully at link time and emits dynamic relocations. Use explicit labels instead so that no dynamic relocations are needed at all. This avoids lld complaining about text relocations.
\| \|
\| \|	For background, see https://crbug.com/917919
\| \|
\| \|	Signed-off-by: Manoj Gupta <manojgupta@chromium.org>
\| \|	Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* \|	lavc/aarch64/h264dsp_init_aarch64: Fix weight function prototypes.	Carl Eugen Hoyos	2018-07-13	1	-6/+6
\| \|	Fixes the following warnings:
\| \|	libavcodec/aarch64/h264dsp_init_aarch64.c: In function ‘ff_h264dsp_init_aarch64’:
\| \|	libavcodec/aarch64/h264dsp_init_aarch64.c:84:38: warning: assignment from incompatible pointer type [enabled by default]
\| \|	    c->weight_h264_pixels_tab[0] = ff_weight_h264_pixels_16_neon;
\| \|	                                 ^
\| \|	libavcodec/aarch64/h264dsp_init_aarch64.c:85:38: warning: assignment from incompatible pointer type [enabled by default]
\| \|	    c->weight_h264_pixels_tab[1] = ff_weight_h264_pixels_8_neon;
\| \|	                                 ^
\| \|	libavcodec/aarch64/h264dsp_init_aarch64.c:86:38: warning: assignment from incompatible pointer type [enabled by default]
\| \|	    c->weight_h264_pixels_tab[2] = ff_weight_h264_pixels_4_neon;
\| \|	                                 ^
\| \|	libavcodec/aarch64/h264dsp_init_aarch64.c:88:40: warning: assignment from incompatible pointer type [enabled by default]
\| \|	    c->biweight_h264_pixels_tab[0] = ff_biweight_h264_pixels_16_neon;
\| \|	                                   ^
\| \|	libavcodec/aarch64/h264dsp_init_aarch64.c:89:40: warning: assignment from incompatible pointer type [enabled by default]
\| \|	    c->biweight_h264_pixels_tab[1] = ff_biweight_h264_pixels_8_neon;
\| \|	                                   ^
\| \|	libavcodec/aarch64/h264dsp_init_aarch64.c:90:40: warning: assignment from incompatible pointer type [enabled by default]
\| \|	    c->biweight_h264_pixels_tab[2] = ff_biweight_h264_pixels_4_neon;
\| \|	                                   ^
* \|	lavc/aarch64/sbrdsp_neon: fix build on old binutils	Rodger Combs	2018-01-26	1	-1/+1
\| \|
* \|	Merge commit '732510636e597585a79be7d111c88b3f7e174fe7'	James Almer	2017-11-11	1	-2/+2
\|\ \ \| \|/ \| \| \| \| \| \| \| \| \| \|	* commit '732510636e597585a79be7d111c88b3f7e174fe7': aarch64: Remove a dot from a label Merged-by: James Almer <jamrial@gmail.com>
\| *	aarch64: Remove a dot from a label	Martin Storsjö	2017-10-18	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|	This fixes building with armasm64 (when run through gas-preprocessor). Signed-off-by: Martin Storsjö <martin@martin.st>