Diffstat (limited to 'doc/rewrap.txt')
-rw-r--r-- doc/rewrap.txt 448
1 files changed, 448 insertions, 0 deletions
diff --git a/doc/rewrap.txt b/doc/rewrap.txt
new file mode 100644
index 00000000..35bc2c8b
--- /dev/null
+++ b/doc/rewrap.txt
@@ -0,0 +1,448 @@
+╔════════════════╗
+║ VTE rewrapping ║
+╚════════════════╝
+
+as per the feature request and discussions at
+https://bugzilla.gnome.org/show_bug.cgi?id=336238
+
+by Egmont Koblinger and Behdad Esfahbod
+
+
+Overview
+════════
+
+It is a really cool feature if the terminal rewraps long lines when the window
+is resized.
+
+In order to implement this, we need to remember for each line whether we
+advanced to the next because a newline (a.k.a. linefeed) was printed, or
+because the end of line was reached. VTE and most other terminals already
+remember this (even if they don't support rewrap) for copy-paste purposes.
+
+Let's use the following terminology:
+
+A "line" or "row" (these two words are used interchangeably in this document)
+refers to a physical line of the terminal.
+
+A line is "hard wrapped" if it was terminated by an explicit newline.
+Conversely, a line is "soft wrapped" if the text overflowed to the next line.
+
+It's not clear by this definition whether the last line should be defined as
+hard or soft wrapped. It should be irrelevant. The definition also gets
+unclear as soon as we start printing escape codes that move the cursor. E.g.
+should positioning the cursor to the beginning of a previous line and printing
+something there affect the soft or hard wrapped state of the preceding line?
+
+A "paragraph" is one or more lines enclosed between two hard line breaks. That
+is, the line preceding the paragraph is hard wrapped (or we're at the
+beginning of the buffer), all lines of the paragraph except the last are soft
+wrapped, and the last line is hard wrapped (or we're at the end of the buffer,
+in which case it can also be soft wrapped).
+
+
+Specification
+═════════════
+
+Content after rewrapping
+────────────────────────
+
+The basic goal is that if an application prints some continuous stream of text
+(with no cursor positioning escape codes) then after resizing the terminal the
+text should look just as if it was originally printed at the new terminal
+width.
+
+Rewrapping paragraphs containing single width and combining characters only
+should be obvious.
+
+Double width (CJK) characters should not be cut in half. If they don't fit at
+the end of the row, they should overflow to the next, leaving one empty cell
+at the end of the previous line. That empty cell should not be considered when
+copy-pasting the text, nor when rewrapping the text again. This is the same as
+when the CJK text is originally printed.
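+
+As a rough illustration of this rule (a minimal Python sketch, not VTE's
+actual code), a paragraph can be modelled as a list of per-character cell
+widths: 1 for ordinary characters, 2 for CJK, 0 for combining characters. The
+helper name wrap_cells is hypothetical.
+

```python
def wrap_cells(widths, columns):
    """Split a paragraph, given as per-character cell widths, into rows.

    Returns a list of rows, each a list of indices into `widths`.
    A character that does not fit in the remaining cells of a row moves to
    the next row; for a double-width character this leaves one empty cell
    at the end of the previous row. Zero-width (combining) characters never
    trigger a wrap, so they stay with the character they follow.
    """
    rows, current, used = [], [], 0
    for index, width in enumerate(widths):
        if used > 0 and used + width > columns:
            rows.append(current)
            current, used = [], 0
        current.append(index)
        used += width
    rows.append(current)
    return rows
```

+
+With widths [1] * 11 + [2] (the "blabla12345伀" example further below), 13
+columns give a single row, while 12 columns push the CJK character to a
+second row and leave the 12th cell of the first row empty.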
+
+TAB characters are a nightmare. Even without rewrapping, their behavior is
+weird. You can print an arbitrary number of tabs and the cursor doesn't
+advance beyond the last column. Then you can print a letter, and the cursor
+stays just beyond the last cell, and yet again you can print an arbitrary
+number of tabs which do nothing. Then the next letter wraps to the next line.
+So, even without rewrapping, copy-pasting tabs around EOL doesn't reproduce
+the exact text that was printed by the application: tab characters can get
+dropped. In order to "fix" this, we'd need to remember two numbers per line
+(the number of tabs at EOL before the last character, and the number of tabs
+at EOL after the last character). It's definitely not worth it. Furthermore,
+there are dynamic tab stop positions, and the very last thing we'd want to do
+is to remember for each tab character where the tab stops were when it was
+printed. So when
+rewrapping, we don't try to rewrap to the state exactly as if the application
+originally printed the text at the new width. If we do anything that's not
+obviously horribly broken then we're okay. (In other words, in this respect
+we're safe to say that tab is a cursor positioning code rather than a
+printable character.)
+
+
+Other generic expectations
+──────────────────────────
+
+Window managers can be configured to resize applications (and hence the VTE
+widget) either only once, for the final size, or continuously during the
+resize. It's expected that these two should lead to the same result (as much
+as possible).
+
+Some terminal emulators scroll to the bottom on resize. VTE has traditionally
+been cleverer: it kept the scroll position. I believe it's a nice feature and
+we should try to keep it the same.
+
+It is expected that a small difference in the way you resize the terminal
+shouldn't lead to a big difference in behavior. This is very hard to lay down
+as an exact specification; these are rather "common sense" expectations, but
+I'll try to demonstrate them via a couple of examples. If you change the width
+but all paragraphs were and still are shorter than the width, rewrapping
+shouldn't change the scroll offset. If there was only 1 paragraph that needed
+to be rewrapped from one line to two lines, the content shouldn't scroll by
+more than 1 line anywhere on the screen. If you change the height only, the
+behavior should be the same as with old non-rewrapping VTE. In this case the
+rewrapping code is actually skipped (because it's an expensive operation), but
+even if it was executed, the behavior should remain the same.
+
+
+Normal vs alternate screen
+──────────────────────────
+
+The normal screen should always be resized and rewrapped, even if the
+alternate screen is visible (bug 415277). This can occur immediately on each
+resize, or once when returning from the alternate screen. Probably resizing
+immediately gives a better user experience (main bug comment 34), since
+resizing is a heavyweight user-initiated event, while returning from the
+alternate screen is not where the user would expect the terminal to hang for
+some time.
+
+The alternate screen should not be rewrapped. It is used by applications that
+have full control over the entire area and they will repaint it themselves.
+Rewrapping by vte would cause ugly artifacts after vte rewraps but before the
+application catches up, e.g. characters aligned below each other would become
+arranged diagonally for a short while. (Moreover, with current VTE design,
+rewrapping the alternate screen would require many new fds to be used: main
+bug comment 60).
+
+
+Cursor position after rewrapping
+────────────────────────────────
+
+Both the active cursor and the saved cursor should be updated when rewrapping.
+(The saved cursor might be important e.g. when returning from alternate
+screen.)
+
+The cursor should ideally stay over the same character (whenever possible), or
+as "close" to that as possible. If it is over the second cell of a CJK, or in
+the middle of a Tab, it should remain so.
+
+If rewrapping is disabled, the cursor can be anywhere to the right, even
+beyond the right end of the screen. This can occur easily when the window is
+narrowed. But even with rewrapping enabled, there is 1 more valid position
+than the number of columns. E.g. with 80 columns, the cursor can be over the
+1st character, ..., over the 80th character, or beyond the 80th character,
+which are 81 valid horizontal positions; in the latter case the cursor is not
+over a character. We need to distinguish all these positions and keep them
+during rewrap whenever possible.
+
+Let's assume the cursor's old position is not above a character, but at EOL or
+beyond. After rewrapping, we should try to maintain this position, so we
+should walk to the right from the corresponding character if possible.
+However, we should not walk into text that got joined with this line during
+rewrapping a paragraph, nor should we wrap to the next line.
+
+Here are a couple of examples. Imagine the cursor stands in the underlined
+cell (although it's technically an "upper one eighth block" character in the
+cell below in this document). The text printed by applications doesn't contain
+space characters in these examples.
+
+- The cursor is far to the right in a hard wrapped line. Keep that position,
+ no matter if visible or not:
+
+ ▏width 13 ▏ ▏width 20 ▏
+ paragraphend. <-> paragraphend.
+ Newparagraph ▔ Newparagraph ▔
+
+- The cursor is far to the right in a soft wrapped line. That position cannot
+ be maintained, so jump to a character:
+
+ ▏width 11 ▏ ▏width 10 ▏ ▏width 12 ▏
+ blabla12345 -> blabla1234 or blabla123456
+ 67890 ▔ 567890 7890 ▔
+ ▔
+- The cursor is far to the right in a soft wrapped line. That position can be
+ maintained because the next CJK doesn't fit:
+
+ ▏width 11 ▏ ▏width 12 ▏
+ blabla12345 <-> blabla12345
+ 伀 ▔ 伀 ▔
+
+- Wrapping a CJK leaves an empty cell. Also, keep the cursor under the second
+ half:
+
+ ▏width 13 ▏ ▏width 12 ▏
+ blabla12345伀 <-> blabla12345
+ ▔ 伀
+ ▔
+
+Shell prompt
+────────────
+
+If you resize the terminal to be narrower than your shell prompt (plus the
+command you're entering) while the shell is waiting for your command, you see
+weird behavior there. This is not a bug in rewrapping: it's because the shell
+redisplays its prompt (and command line) on every resize. There's not much VTE
+could do here.
+
+As a long term goal, maybe readline could have an option where it knows that
+the terminal rewraps its contents so that it doesn't redisplay the prompt and
+the command line, just expects the terminal to do this correctly. It's a bit
+risky, since probably all terminals that support rewrapping do this a little
+bit differently.
+
+
+Scroll position, cutting lines from the bottom
+──────────────────────────────────────────────
+
+A very tricky question is to figure out the scroll position after a resize.
+First, let's ignore bug 708213's requirements.
+
+Normally the scrollbar is at the bottom. If this is the case, it should remain
+so.
+
+How to position the scroll offset if the scrollbar is somewhere in the middle?
+Playing with various possibilities suggested that probably the best behavior
+is to try to keep the bottom visible paragraph at the bottom. (After all,
+in terminals the bottom is far more important than the top.) The behavior is
+not exactly specified when the bottom of the viewport cuts a paragraph in two,
+but even then we try to keep it approximately there.
+
+The exact implemented behavior is: we look at the character at the cell just
+under the viewport's bottom left corner, keep track of where this character
+moves during rewrapping, and position the scrollbar so that this character is
+again just under the viewport.
+
+As an exception, I personally found a "snap to top" feature useful: if the
+scrollbar was all the way at the top, it should stay there.
+
+Now let's address bug 708213.
+
+This breaks the expectation that changing the terminal height back and forth
+should be a no-op. To match XTerm's behavior, when the window height is
+reduced and there are lines under the cursor then those lines should be
+dropped for good.
+
+It is very hard to figure out the desired behavior when this is combined with
+rewrapping. E.g. in one step you decrease the height and would expect lines to
+be dropped from the bottom, but in the very same step you increase the width
+which causes some previously wrapped paragraphs to fit in a single line (this
+could be above or below the cursor or just in the cursor's line, or all of
+these) which makes room for previously undisplayed lines. What to do then?
+
+The total number of rows, the number of rows above the cursor, and the number
+of rows below the cursor can all increase/decrease/stay pretty much
+independently of each other; almost all combinations are possible when
+resizing diagonally with rewrapping enabled. The behavior should also be sane
+when the cursor's paragraph starts wrapping.
+
+As an additional requirement, I had the aforementioned shell prompt feature in
+mind. One of the most typical use cases when the cursor is not in the bottom
+row is when you edit a multiline shell command and move the cursor back. In
+this case, shrinking the terminal shouldn't cut lines from the bottom.
+
+My best idea which reasonably covers all the possible cases is that we drop
+the lines (if necessary) after rewrapping, but before computing the new
+scrollbar offsets, and we drop the highest number of lines that satisfies all
+these three conditions:
+
+ - We shouldn't drop more lines than necessary to fit the content without
+ scrollbars.
+
+ - We should only drop data that's below the cursor's paragraph. (We don't
+ drop data that is under the cursor's row, but belongs to the same
+ paragraph).
+
+ - We track the character cell that immediately follows the cursor's
+ paragraph (that is, the line after this paragraph, first column), and see
+ how much closer it would get to the top of the window (assuming the viewport
+ is scrolled to the bottom). The original bug is that the cursor shouldn't
+ get closer to the top; with rewrapping I found that it's probably not the
+ cursor but the end of the cursor's paragraph that makes sense to track. We
+ shouldn't drop more lines than the amount by which this point would get
+ closer to the top.
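+
+The three conditions can be read as taking a minimum. The Python sketch below
+is only one possible reading of the list above, not VTE's code; the function
+and all parameter names are hypothetical, and all values are row counts
+measured after rewrapping (except old_distance_from_top, which is measured
+before the resize).
+

```python
def lines_to_drop(total_rows, new_height, rows_below_paragraph,
                  old_distance_from_top, new_distance_from_top):
    """Largest number of bottom rows that satisfies all three conditions."""
    # 1. No more than needed to fit the content without scrollbars.
    overflow = total_rows - new_height
    # 2. Only rows below the cursor's paragraph may be dropped.
    below = rows_below_paragraph
    # 3. No more than the amount by which the tracked cell (the one just
    #    after the cursor's paragraph) would get closer to the window's top.
    closer = old_distance_from_top - new_distance_from_top
    return max(0, min(overflow, below, closer))
```

+
+If any of the three limits is zero or negative (for example when the content
+already fits in the new height), nothing is dropped.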
+
+
+Implementation
+══════════════
+
+Storing lines
+─────────────
+
+Vte's ring was designed with rewrapping in mind, nevertheless it operates with
+rows. Changing it to work on paragraphs would require heavy refactoring, and
+would cause all sorts of troubles with overlong paragraphs. As the main
+features of terminals (showing content, scrolling etc.) are all built around
+rows, such a change for rewrapping only doesn't sound feasible. It's even
+unclear which approach would be better for a terminal built from scratch. So
+we decided to keep Vte operating on rows. Rewrapping is an expensive operation
+that builds up the notion of paragraphs from rows, and then cuts them to rows
+again.
+
+The scrollback buffer also remains defined in terms of lines, rather than
+paragraphs or memory. This also guarantees that the scrollbar's length cannot
+fluctuate.
+
+
+Ring
+────
+
+The ring contains some of the bottom rows in thawed state, while most of the
+scrollback buffer is frozen. Rewrapping is very complicated so we don't want
+the code to be duplicated. It is also computationally heavy and we should try
+to be as fast as possible. Hence we work on the frozen data structure, in
+which most of the data lies, and we freeze all the rows for this purpose.
+
+The frozen text is stored in UTF-8. Care should be taken that the number of
+visual cells, number of Unicode characters, and number of bytes are three
+different values.
+
+The buffer is stored in three streams: text_stream contains the raw text
+encoded in UTF-8, with '\n' characters at paragraph boundaries; attr_stream
+contains records for each continuous run of identical attributes (same colors,
+character width, etc.) of text_stream (with the exception of '\n' where the
+attribute is ignored, e.g. it can be even embedded in a continuous run of
+double-width CJK characters); and row_stream consists of pointers into
+attr_stream and text_stream for every row. Out of these three, only row_stream
+needs to be regenerated.
+
+We start building up the new row stream beginning at new row number 0. We
+could make it any other arbitrary number, but we wouldn't be able to keep any
+of the old numbers unchanged (neither ring->start because lines can be dropped
+from the scrollback's top when narrowing the window, nor ring->end because we
+have no clue at the beginning how many rows we'll have), so there's no point
+even trying.
+
+
+Rewrapping
+──────────
+
+For higher performance, for each row we store whether it consists of ASCII
+32..126 characters only (excluding tabs too). (The flag can err on the safe
+side: it can be false even if the row's text is ASCII only.) If a paragraph
+consists solely of such rows, we can rewrap it without looking at text_stream,
+since we know that all characters are stored as a single byte and all occupy a
+single cell.
+
+If it's not the case, we need to look at text_stream to be able to wrap the
+paragraph.
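+
+For the ASCII-only fast path, the new row boundaries follow from the
+paragraph's byte length alone, since one byte is one character is one cell.
+A minimal Python sketch (the helper name is hypothetical, this is not VTE's
+code):
+

```python
def ascii_row_starts(paragraph_bytes, columns):
    """Byte offsets, relative to the paragraph start, where each row begins.

    Valid only when every character is a single byte occupying a single
    cell. An empty paragraph still occupies one (empty) row.
    """
    if paragraph_bytes == 0:
        return [0]
    return list(range(0, paragraph_bytes, columns))
```

+
+For a 28 byte paragraph this gives rows starting at offsets 0 and 24 at width
+24, and at offsets 0, 10 and 20 at width 10.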
+
+Other than this, rewrapping is long, boring, but straightforward code without
+any further tricks.
+
+
+Markers
+───────
+
+There are some cell positions (I call them markers) that we need to keep track
+of, and tell where they moved during rewrapping. Such markers are the cursor,
+the saved cursor, the cell under the viewport's bottom left corner (for
+computing the new scrollbar offset), the cell under the bottom left corner of
+the cursor's paragraph (for computing the number of lines to get dropped), and
+the boundaries of the highlighted region.
+
+A marker is a (row, column) pair where the row is either within the ring's
+range or in a further row, and the column is arbitrary.
+
+Before rewrapping, if the row is within the ring's range, the (row, column)
+pair is converted to a VteCellTextOffset which contains the text offset,
+fragment_cells denoting how many cells to walk from the first cell of a
+multicell character (i.e. 1 for the right half of a CJK), and eol_cells
+containing -1 if the cursor is over a character, 0 if the cursor is just after
+the last character, or more if the cursor is farther to the right. Example:
+
+ ▏width 24 ▏
+ Line 0 overflowing to LI
+ NE 1 ▔
+
+If the cursor is over 'I' then text_offset is 23, eol_cells is -1.
+If the cursor is just after the 'I' (as shown) then text_offset is 24,
+eol_cells is 0.
+If the cursor is n more cells further to the right then text_offset is 24,
+eol_cells is n.
+If the cursor is over 'N' then text_offset is 24 and eol_cells is -1.
+If the cursor is over 'E' then text_offset is 25 and eol_cells is -1.
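+
+The values above can be reproduced with a small Python sketch for the simple
+case of single-byte, single-cell characters (so fragment_cells is always 0 and
+is omitted). The helper name is hypothetical; row_start is the text offset of
+the row's first character and row_length is the number of characters in the
+row.
+

```python
def cell_to_offset(row_start, row_length, column):
    """Convert a column within a row to (text_offset, eol_cells).

    eol_cells is -1 when the column is over a character, otherwise the
    number of cells between the end of the row's text and the column.
    """
    if column < row_length:
        return row_start + column, -1
    return row_start + row_length, column - row_length
```

+
+In the example, row 0 has row_start 0 and row_length 24, and row 1 has
+row_start 24 and row_length 4.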
+
+If the row is beyond the range covered by the ring, then text_offset will be
+text_stream's head for the row immediately after the ring, one bigger for the
+row after that, and so on; eol_cells will be set to the desired column, and
+fragment_cells is 0. This is pretty much as if the ring continued with empty
+hard wrapped lines.
+
+After rewrapping, VteCellTextOffset is converted back to (row, column)
+according to the new width and new row numbering. This could be done solely
+based on VteCellTextOffset, but instead we update the row during rewrapping,
+and only compute the column afterwards. This is because we don't have a fast
+way of mapping text_offset to row number: this would require a binary search,
+and it's much easier to remember this data when we're there anyway while
+rewrapping.
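+
+For comparison, the binary search alternative mentioned above could look like
+the following Python sketch. It is not what VTE does, it only covers a marker
+that is over a single-cell character (eol_cells is -1), and the helper name
+and the row_starts list (text offset of the first character of each new row,
+in increasing order) are assumptions made for illustration.
+

```python
from bisect import bisect_right


def offset_to_cell(row_starts, text_offset):
    """Map a text offset to (row, column) given sorted row start offsets."""
    # The row containing the offset is the last one starting at or before it.
    row = bisect_right(row_starts, text_offset) - 1
    return row, text_offset - row_starts[row]
```

+
+With the 28 character paragraph from the example rewrapped to width 10
+(row_starts [0, 10, 20]), the 'I' at text offset 23 ends up in row 2,
+column 3.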
+
+
+Further optimization
+────────────────────
+
+In row_stream and attr_stream, along with the text offset we could similarly
+store the character offset (a counter that is increased by 1 on every Unicode
+character, in other words what the value of the text offset would be if we
+stored the text in UCS-4 rather than UTF-8).
+
+This, along with the fact that a cell's attribute contains the character
+width, and hence there is an attr change at every boundary where the character
+width changes, would enable us to compute the number of lines for each
+paragraph without looking at text_stream. This could be a huge win, since
+text_stream is by far the biggest of the three streams.
+
+The trick is however that we'd only know the number of lines for the
+paragraph, but not the text offsets for the inner lines. These would have to
+remain in a special uninitialized state in the new row_stream, and be computed
+lazily on demand. For storing that, streams would need to be writable at
+arbitrary positions, rather than just allowing appending of new data.
+
+Care should be taken that this "on demand" includes the case when they are
+being scrolled out from the scrollback buffer for good, because we'd still
+need to be able to tell the text offset for the remaining lines of the
+paragraph.
+
+
+Bugs
+════
+
+With the current design, the top of the scrollback buffer can easily contain a
+partial paragraph. After a subsequent resize, this might lead to the topmost
+row missing its first part. E.g. after executing "ls -l /bin" at width 40 and
+then widening the terminal, the first 40 characters of bash's paragraph can be
+cut off like this, because that used to form a row that got scrolled out:
+
+012 bash
+-rwxr-xr-x 3 root root 31152 Aug 3 2012 bunzip2
+-rwxr-xr-x 1 root root 1999912 Mar 13 2013 busybox
+
+With the current design I can't see any easy and clean workaround for this
+that wouldn't introduce other side effects or terribly complicated code. I'd
+say this is a small glitch we can easily live with.
+
+
+Caveats
+═══════
+
+With extremely large scrollback buffers (let's not forget: VTE supports
+infinite scrollback) rewrapping might become slow. On my computer (average
+laptop with Intel(R) Core(TM) i3 CPU, old-fashioned HDD) resizing 1 million
+lines takes about 0.2 seconds of wall clock time, which is close to the
+boundary of okay-ish speed. For this reason, rewrapping can be disabled with
+the vte_terminal_set_rewrap_on_resize() API call.
+
+Developers writing Vte-based multi-tab terminal emulators are encouraged to
+resize only the visible Vte; the hidden ones should be resized when they
+become visible. This keeps the time it takes to rewrap the buffer from being
+multiplied by the number of tabs, which would block the user for a long
+uninterrupted time when they resize the window. Developers are also encouraged
+to implement a user friendly way of disabling rewrapping if they allow giant
+scrollback buffers.
+