delta/netsurf/libnsgif.git - git.netsurf-browser.org: libnsgif.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	GIF: Don't error for a final partial frame.	Michael Drake	2022-02-26	1	-3/+2
\|
*	API: Consistent typedef naming convention.	Michael Drake	2022-02-26	1	-14/+16
\|
*	API: Add public access function for frame information.	Michael Drake	2022-02-26	1	-53/+68
\|
*	GIF: Rename gif source data buffer members.	Michael Drake	2022-02-26	1	-26/+27
\|
*	GIF: Minimum gif delay shouldn't be public.	Michael Drake	2022-02-26	1	-2/+4
\|
*	GIF: First frame establishes a redraw rectangle.	Michael Drake	2022-02-26	1	-1/+0
\|
*	API: Clean up error codes and add nsgif_strerror function.	Michael Drake	2022-02-26	1	-94/+118
\|
*	API: Clean up client bitmap interface.	Michael Drake	2022-02-24	1	-4/+4
\|
*	GIF: Don't need previous frame size now.	Michael Drake	2022-02-24	1	-21/+4
\| \| \| \|	The image dimensions are constant now.
*	GIF: Avoid two colour table allocations.	Michael Drake	2022-02-24	1	-28/+9
\| \| \| \| \|	Just having the arrays in the nsgif object reduces allocations and improves cache locality.
*	API: Rework library API to keep internal workings private.	Michael Drake	2022-02-23	1	-95/+384
\|
*	API: Rename nsgif_animation to nsgif.	Michael Drake	2022-02-23	1	-27/+27
\|
*	API: Split out redraw rectangle.	Michael Drake	2022-02-23	1	-12/+12
\|
*	GIF: Remove redundant comment.	Michael Drake	2022-02-23	1	-1/+0
\|
*	GIF: Scope reduce frame interlace mask.	Michael Drake	2022-02-23	1	-2/+5
\|
*	GIF: Scope reduce graphic control extension masks.	Michael Drake	2022-02-23	1	-4/+7
\|
*	GIF: Scope reduce gif extension identifier magic values.	Michael Drake	2022-02-23	1	-9/+11
\|
*	LIB: Rename public header and source file.	Michael Drake	2022-02-23	2	-2/+2
\| \| \| \| \|	This allows the library header to have a name consistent with the library soname.
*	LIB: Use consistent nsgif_ / NSGIF_ namespace.	Michael Drake	2022-02-23	1	-320/+319
\|
*	GIF: Rename flag for whether the frame image has been decoded.	Michael Drake	2022-02-21	1	-3/+3
\|
*	gif: Fix graphic control extension handler comment.	Michael Drake	2022-02-21	1	-1/+1
\|
*	gif: Frame parse: Only grow image to accommodate first frame.	Michael Drake	2022-02-21	1	-3/+9
\| \| \| \|	This matches the behaviour of other browsers.
*	gif: Background restoration: Add support for clipped frames.	Michael Drake	2022-02-21	1	-1/+4
\|
*	gif: decode: Add support for clipped frames.	Michael Drake	2022-02-21	1	-4/+75
\|
*	GIF: Reorder #includes.	Michael Drake	2021-11-24	1	-4/+4
\|
*	LOG: Remove unused logging code.	Michael Drake	2021-11-24	2	-24/+1
\|
*	API: Use stdint types.	Michael Drake	2021-11-24	1	-1/+1
\|
*	GIF: Style: Use double underscore for remaining static functions.	Michael Drake	2021-11-24	1	-8/+14
\|
*	GIF: Update code documentation.	Michael Drake	2021-11-24	1	-7/+14
\|
*	GIF: Delay LZW creation until frame proccessing.	Michael Drake	2021-11-20	1	-8/+8
\|
*	GIF: Simplify check for no frame data.	Michael Drake	2021-11-20	1	-9/+4
\|
*	GIF: Split out logical screen descriptor extraction.	Michael Drake	2021-11-19	1	-27/+46
\|
*	LZW: Remove double blank lines.	Michael Drake	2021-11-19	2	-8/+0
\|
*	GIF: Remove double blank lines.	Michael Drake	2021-11-19	1	-3/+0
\|
*	GIF: Initilisation: Make frame initialisation loop more readable.	Michael Drake	2021-11-19	1	-1/+3
\|
*	GIF: Unify insufficient data error codes.	Michael Drake	2021-11-19	1	-33/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is no difference in what the client needs to do. If there are displayable frames, they can display them. Otherwise more data is needed. Internally only `GIF_INSUFFICIENT_DATA` is used now. To remove the `GIF_INSUFFICIENT_FRAME_DATA` could make existing client applications fail to compile, so it is left as an alias to the same value. At some point the API will be changed drastically, but for now I want existing applications to still build.
*	GIF: Use common colour table code for global colour table.	Michael Drake	2021-11-19	1	-22/+12
\|
*	GIF: Split out colour table extraction.	Michael Drake	2021-11-19	1	-21/+48
\| \| \| \| \|	This will allow the local and global colour table extraction to share code.
*	GIF: Split header reader out into helper function.	Michael Drake	2021-11-19	1	-16/+44
\|
*	Add myself to the copyright headers.	Michael Drake	2021-11-19	2	-0/+2
\|
*	GIF: Initialisation: Remove misleading comment.	Michael Drake	2021-11-18	1	-1/+0
\|
*	GIF: Optimise deinterlacing of frame rows.	Michael Drake	2021-11-08	1	-23/+60
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The original interlacing handling looked like this: ```c static inline uint32_t f1(int height, int y) { if ((y << 3) < height) { return (y << 3); /* Pass 1 / } y -= ((height + 7) >> 3); if ((y << 3) < (height - 4)) { return (y << 3) + 4; / Pass 2 / } y -= ((height + 3) >> 3); if ((y << 2) < (height - 2)) { return (y << 2) + 2; / Pass 3 / } y -= ((height + 1) >> 2); return (y << 1) + 1; / Pass 4 / } ``` Annotated with comments to show the passes. Thought it should be possible to do the job without all the calculations and subtractions from `y`, and I also didn't like that it was not super obvious how it aligned with the spec or how it dealt with pass 4 last, when it is the most common case, happening for 50% of image rows. See the following excerpt from from the GIF89a Specification. > The rows of an Interlaced images are arranged in the following order: > > Group 1 : Every 8th. row, starting with row 0. (Pass 1) > Group 2 : Every 8th. row, starting with row 4. (Pass 2) > Group 3 : Every 4th. row, starting with row 2. (Pass 3) > Group 4 : Every 2nd. row, starting with row 1. (Pass 4) > > The Following example illustrates how the rows of an interlaced image are > ordered. > > Row Number Interlace Pass > > 0 ----------------------------------------- 1 > 1 ----------------------------------------- 4 > 2 ----------------------------------------- 3 > 3 ----------------------------------------- 4 > 4 ----------------------------------------- 2 > 5 ----------------------------------------- 4 > 6 ----------------------------------------- 3 > 7 ----------------------------------------- 4 > 8 ----------------------------------------- 1 > 9 ----------------------------------------- 4 > 10 ----------------------------------------- 3 > 11 ----------------------------------------- 4 > 12 ----------------------------------------- 2 > 13 ----------------------------------------- 4 > 14 ----------------------------------------- 3 > 15 ----------------------------------------- 4 > 16 ----------------------------------------- 1 > 17 ----------------------------------------- 4 > 18 ----------------------------------------- 3 > 19 ----------------------------------------- 4 -- https://www.w3.org/Graphics/GIF/spec-gif89a.txt To test performance I ran it for all possible GIF heights: ```c for (unsigned height = 1; height < UINT16_MAX; height++) { for (uint32_t y = 0; y < height; y++) { use_val(f1(height, y)); } } ``` For the original implementation (`f1`), I got the following performance: \| Compiler \| Time \| \| -------- \| ------ \| \| Clang \| 3.830s \| \| GCC \| 4.737s \| All tests used GCC 11.2 and Clang 13.0, with `-O2` optimisation. Next I rewrote it to more closely resemble my reading of the spec. The compiler can convert the multiplies to shifts and I found this more readable than `f1`, and more closely matching the spec. The `height / N N` bits all use powers of two for N so they boil down the a simple bitwise AND with an appropriate mask to round down to the nearest multiple of N. ```c static inline uint32_t f2(uint32_t height, uint32_t y) { height--; if (y * 8 <= height) { return y * 8; /* Pass 1 / } if (y 4 <= height) { return y * 8 - height / 8 * 8 - 4; /* Pass 2 / } if (y 2 <= height) { return y * 4 - height / 4 * 4 - 2; /* Pass 3 / } return y 2 - height / 2 * 2 - 1; /* Pass 4 / } ``` For `f2`, I got the following performance: \| Compiler \| Time \| \| -------- \| ------ \| \| Clang \| 2.758s \| \| GCC \| 3.909s \| So it's faster than `f1`, and Clang is still doing better than GCC. After that I reversed the order, so that pass 4 would be handled first. ```c static inline uint32_t f3(uint32_t height, uint32_t y) { height--; if (y 2 > height) { return y * 2 - height / 2 * 2 - 1; } if (y * 4 > height) { return y * 4 - height / 4 * 4 - 2; } if (y * 8 > height) { return y * 8 - height / 8 * 8 - 4; } return y * 8; } ``` For `f3`, I got the following performance: \| Compiler \| Time \| \| -------- \| ------ \| \| Clang \| 3.329s \| \| GCC \| 3.630s \| Interestingly this was consistently faster than `f2` with GCC, but slower than `f2` when built with Clang. (It's still faster than the original `f1` with both compilers.) I thought it might be because the branches in `f2` are more predictable than in `f3`. For example the first branch in `f2` is only true one eighth of the time, while all the branches in `f3` are 50:50. However, running under `perf` showed this was not the case. Actually the faster `f2` for Clang had more branch misses than the Clang `f3`. \| Compiler \| Implementation \| Branches \| Branch misses \| \| -------- \| -------------- \| -------------- \| ------------- \| \| Clang \| `f2` \| 13,690,378,148 \| 656,517 \| \| Clang \| `f3` \| 10,738,688,711 \| 275,147 \| \| GCC \| `f2` \| 10,738,659,995 \| 336,992 \| \| GCC \| `f3` \| 10,201,842,339 \| 1,314,445 \| After this I tried changing the implementation completely. All of the previous implementations were based on the idea of mapping from an uninterlaced row number to an interlaced one. ```c static inline bool f4(uint32_t height, uint32_t y, uint8_t pass) { y += pass == 0 ? 8 : 16 >> pass; if (y < height) return true; switch (pass) { case 0: pass += 1; y = 4; if (y < height) return true; case 1: pass += 1; y = 2; if (y < height) return true; case 2: pass += 1; y = 1; return true; } return false; } ``` With `f4` we simply step to the next line in interlaced order. The `16 >> pass` generates the correct step size for all but the first pass, so the first pass is special cased. The switch statement is for handling advancing from one pass to the next, so it is reached a maximum of four times per frame. The rest of the time, we return on the `if (y < height) return true;` above. The switch falls through to handle the case where the frame height is less than the start row for the given pass. For `f4`, I got the following performance: \| Compiler \| Time \| \| -------- \| ------ \| \| Clang \| 2.315s \| \| GCC \| 3.479s \| Again, this was tested for all possible GIF heights, but the `for` loop for rows changes to a do-while loop: ```c for (unsigned height = 1; height < UINT16_MAX; height++) { uint8_t pass = 0; uint32_t y = 0; do { use_val(y); } while (f4(height, &y, &pass)); } ``` This is the fastest yet for both compilers, with Clang still performing best. One thing I didn't like about `f4` is that it calculates the y step size every call, although the step only actually changes three times, when we change to the next pass. So for `f5` the switch was updated to set the step size. ```c static inline bool f5(uint32_t height, uint32_t y, uint8_t step) { y += step & 0xf; if (y < height) return true; switch (step) { case 24: y = 4; step = 8; if (y < height) return true; case 8: y = 2; step = 4; if (y < height) return true; case 4: y = 1; *step = 2; return true; } return false; } ``` To avoid the caller having to maintain extra state, the pass number is no longer used, and we use the step size to keep track of which pass we are on. However, the step size for the first two passes is the same, so that led to the strange use of 24 for the initial step size, and the masking of step to ensure only the bottom four bits are used. For `f5`, I got the following performance: \| Compiler \| Time \| \| -------- \| ------ \| \| Clang \| 2.364s \| \| GCC \| 2.891s \| This is the fastest yet for both compilers, with Clang still winning easily. Note that when the functions are not allowed to inline, `f2` and `f3` are the fastest.
*	GIF: Constify raw source data.	Michael Drake	2021-11-04	1	-14/+14
\|
*	GIF: Don't write into the GIF source data to handle errors.	Michael Drake	2021-11-04	1	-18/+6
\|
*	GIF: Use same function for frame initialisation and decoding.	Michael Drake	2021-11-04	1	-83/+42
\| \| \| \| \|	Now there aren't two entirely separate code paths that must be kept in sync.
*	GIF: Pass data position into the decoders.	Michael Drake	2021-11-04	1	-69/+65
\| \| \| \| \|	This avoids storing the state on in the gif structure, and having to save and restore it when processing frames.
*	GIF: Frame initialisation: Don't need to calculate gif_bytes.	Michael Drake	2021-11-04	1	-5/+3
\|
*	GIF: Use frame rather than indexing frames array.	Michael Drake	2021-11-04	1	-4/+4
\|
*	GIF: Remove redundant check and comment from frame initialiser.	Michael Drake	2021-11-04	1	-14/+0
\|
*	GIF: Use image parsing frunction for frame initialisation.	Michael Drake	2021-11-04	1	-59/+4
\|