# High-level overview of Save-Page-As code

This document describes the code under `//content/browser/download`,
restricted to the code that handles Save-Page-As functionality
(i.e. leaving out other downloads-related code).
It focuses on the high-level overview and on aspects of the code that
span multiple compilation units (in the hope that individual compilation units
are described by their code comments or by their code structure).

## Classes overview

* SavePackage class
    * coordinates overall save-page-as request
    * created and owned by `WebContents`
      (ref-counted today, but it is unnecessary - see https://crbug.com/596953)
    * UI-thread object

* SaveFileCreateInfo::SaveFileSource enum
    * classifies `SaveItem` and `SaveFile` processing into 2 flavours:
        * `SAVE_FILE_FROM_NET` (see `SaveFileResourceHandler`)
        * `SAVE_FILE_FROM_DOM` (see "Complete HTML" section below)

* SaveItem class
    * tracks saving a single file
    * created and owned by `SavePackage`
    * UI-thread object

* SaveFileManager class
    * coordinates between the download sequence and the UI thread
        * Gets requests from `SavePackage` and communicates results back to
          `SavePackage` on the UI thread.
        * Shepherds data (received from the network OR from DOM) into
          the download sequence - via `SaveFileManager::UpdateSaveProgress`
    * created and owned by `BrowserMainLoop`
      (ref-counted today, but it is unnecessary - see https://crbug.com/596953)
    * The global instance can be retrieved by the Get method.

* SaveFile class
    * tracks saving a single file
    * created and owned by `SaveFileManager`
    * download sequence object

* SaveFileResourceHandler class
    * tracks network downloads + forwards their status into `SaveFileManager`
      (onto download sequence)
    * created by `ResourceDispatcherHostImpl::BeginSaveFile`
    * IO-thread object

* SaveFileCreateInfo POD struct
    * short-lived object holding data passed to callbacks handling start of
      saving a file.

* MHTMLGenerationManager class
    * singleton that manages progress of jobs responsible for saving individual
      MHTML files (represented by `MHTMLGenerationManager::Job`).


## Overview of the processing flow

Save-Page-As flow starts with `WebContents::OnSavePage`.
The flow is different depending on the save format chosen by the user
(each flow is described in a separate section below).

### Complete HTML

Very high-level flow of saving a page as "Complete HTML":

* Step 1: `SavePackage` asks all frames for "savable resources"
          and creates a `SaveItem` for each file that needs to be saved

* Step 2: `SavePackage` first processes `SAVE_FILE_FROM_NET`
          `SaveItem`s and asks `SaveFileManager` to save
          them.

* Step 3: `SavePackage` handles remaining `SAVE_FILE_FROM_DOM` `SaveItem`s and
          asks each frame to serialize its DOM/HTML (each frame gets from
          `SavePackage` a map covering local paths that need to be referenced by
          the frame).  Responses from frames get forwarded to `SaveFileManager`
          to be written to disk.
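The ordering implied by Steps 2 and 3 (network-backed items first, DOM-serialized items afterwards) can be sketched as a stable partition over the list of items. This is a simplified illustration, not the actual `SavePackage` logic: `ToySaveItem`, `ToySaveFileSource` and `OrderForProcessing` are hypothetical names, and the real code processes items asynchronously rather than by sorting a vector.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Toy stand-in for SaveFileCreateInfo::SaveFileSource.
enum class ToySaveFileSource { kFromNet, kFromDom };

// Toy stand-in for SaveItem: one file to be saved and how it is obtained.
struct ToySaveItem {
  std::string url;
  ToySaveFileSource source;
};

// Returns the items in the order the flow above handles them: every
// network-backed item first (Step 2), then the DOM-serialized ones (Step 3).
// std::stable_partition keeps the relative order inside each group.
std::vector<ToySaveItem> OrderForProcessing(std::vector<ToySaveItem> items) {
  std::stable_partition(items.begin(), items.end(),
                        [](const ToySaveItem& item) {
                          return item.source == ToySaveFileSource::kFromNet;
                        });
  return items;
}
```

One plausible reason for this order is that the frames need to know the local paths of the saved resources before they can rewrite links during serialization, which matches the map of local paths mentioned in Step 3.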


### MHTML

Very high-level flow of saving a page as MHTML:

* Step 1: `WebContents::GenerateMHTML` is called by either `SavePackage` (for
          Save-Page-As UI) or Extensions (via `chrome.pageCapture` extensions
          API) or by an embedder of `WebContents` (since this is public API of
          //content).

* Step 2: `MHTMLGenerationManager` creates a new instance of
          `MHTMLGenerationManager::Job` that coordinates generation of
          the MHTML file by sequentially (one-at-a-time) asking each
          frame to write its portion of MHTML to a file handle.  Other
          classes (i.e. `SavePackage` and/or `SaveFileManager`) are not
          used at this step at all.

* Step 3: When done, `MHTMLGenerationManager` destroys the
          `MHTMLGenerationManager::Job` instance and calls a completion
          callback, which in the case of Save-Page-As ends up in
          `SavePackage::OnMHTMLGenerated`.
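The one-at-a-time coordination in Steps 2 and 3 can be sketched with a small synchronous model. This is an assumption-laden illustration rather than the real `MHTMLGenerationManager::Job`: `ToyMhtmlJob`, `FrameWriter`, `Step` and `RunToCompletion` are hypothetical names, a `std::string` stands in for the file handle, and the real job is driven by asynchronous replies from renderer processes.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy model of a per-save job: each "frame" appends its part to a shared
// output (standing in for the file handle), strictly one frame at a time.
class ToyMhtmlJob {
 public:
  using FrameWriter = std::function<void(std::string& out)>;
  using DoneCallback = std::function<void(bool success)>;

  ToyMhtmlJob(std::vector<FrameWriter> frames, DoneCallback done)
      : frames_(std::move(frames)), done_(std::move(done)) {}

  // Asks the next pending frame to write its part. Once every frame has
  // written, the following call runs the completion callback exactly once.
  void Step() {
    if (finished_) return;
    if (next_ < frames_.size()) {
      frames_[next_++](output_);
      return;
    }
    finished_ = true;
    done_(true);
  }

  // Drives the job until the completion callback has run.
  void RunToCompletion() {
    while (!finished_) Step();
  }

  const std::string& output() const { return output_; }

 private:
  std::vector<FrameWriter> frames_;
  DoneCallback done_;
  std::size_t next_ = 0;
  bool finished_ = false;
  std::string output_;
};
```

Because only one frame writes at any moment, the parts land in the output in frame order without any locking around the shared handle.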

Note: The MHTML format is disabled by default in the Save-Page-As UI on Windows,
macOS and Linux (it is the default on Chrome OS), but for testing this can be
changed with the `--save-page-as-mhtml` command-line switch.


### HTML Only

Very high-level flow of saving a page as "HTML Only":

* `SavePackage` creates only a single `SaveItem` (always `SAVE_FILE_FROM_NET`)
  and asks `SaveFileManager` to process it
  (in the same way as an individual `SaveItem` in the Complete HTML flow above).


## Other relevant code

Pointers to related code outside of `//content/browser/download`:

* End-to-end tests:
    * `//chrome/browser/download/save_page_browsertest.cc`
    * `//chrome/test/data/save_page/...`

* Other tests:
    * `//content/browser/download/*test*.cc`
    * `//content/renderer/dom_serializer_browsertest.cc` - single process... :-/

* Elsewhere in `//content`:
    * `//content/renderer/savable_resources...`

* Blink:
    * `//third_party/blink/public/web/web_frame_serializer...`
    * `//third_party/blink/renderer/core/frame/web_frame_serializer_impl...`
      (used for Complete HTML today; should use `FrameSerializer` instead in
      the long-term - see https://crbug.com/328354).
    * `//third_party/blink/renderer/core/frame/frame_serializer...`
      (used for MHTML today)
    * `//third_party/blink/renderer/platform/mhtml/mhtml_archive...`