diff options
Diffstat (limited to 'doc/debuginfod.8')
-rw-r--r-- | doc/debuginfod.8 | 369 |
1 files changed, 369 insertions, 0 deletions
diff --git a/doc/debuginfod.8 b/doc/debuginfod.8 new file mode 100644 index 00000000..eb1d8910 --- /dev/null +++ b/doc/debuginfod.8 @@ -0,0 +1,369 @@ +'\"! tbl | nroff \-man +'\" t macro stdmacro + +.de SAMPLE +.br +.RS 0 +.nf +.nh +.. +.de ESAMPLE +.hy +.fi +.RE +.. + +.TH DEBUGINFOD 8 +.SH NAME +debuginfod \- debuginfo-related http file-server daemon + +.SH SYNOPSIS +.B debuginfod +[\fIOPTION\fP]... [\fIPATH\fP]... + +.SH DESCRIPTION +\fBdebuginfod\fP serves debuginfo-related artifacts over HTTP. It +periodically scans a set of directories for ELF/DWARF files and their +associated source code, as well as RPM files containing the above, to +build an index by their buildid. This index is used when remote +clients use the HTTP webapi, to fetch these files by the same buildid. + +If a debuginfod cannot service a given buildid artifact request +itself, and it is configured with information about upstream +debuginfod servers, it queries them for the same information, just as +\fBdebuginfod-find\fP would. If successful, it locally caches then +relays the file content to the original requester. + +If the \fB\-F\fP option is given, each listed PATH creates a thread to +scan for matching ELF/DWARF/source files under the given physical +directory. Source files are matched with DWARF files based on the +AT_comp_dir (compilation directory) attributes inside it. Duplicate +directories are ignored. You may use a file name for a PATH, but +source code indexing may be incomplete; prefer using a directory that +contains the binaries. Caution: source files listed in the DWARF may +be a path \fIanywhere\fP in the file system, and debuginfod will +readily serve their content on demand. (Imagine a doctored DWARF file +that lists \fI/etc/passwd\fP as a source file.) If this is a concern, +audit your binaries with tools such as: + +.SAMPLE +% eu-readelf -wline BINARY | sed -n '/^Directory.table/,/^File.name.table/p' +or +% eu-readelf -wline BINARY | sed -n '/^Directory.table/,/^Line.number/p' +or even use debuginfod itself: +% debuginfod -vvv -d :memory: -F BINARY 2>&1 | grep 'recorded.*source' +^C +.ESAMPLE + +If the \fB\-R\fP option is given each listed PATH creates a thread to +scan for ELF/DWARF/source files contained in matching RPMs under the +given physical directory. Duplicate directories are ignored. You may +use a file name for a PATH, but source code indexing may be +incomplete; prefer using a directory that contains normal RPMs +alongside debuginfo/debugsource RPMs. Because of complications such +as DWZ-compressed debuginfo, may require \fItwo\fP scan passes to +identify all source code. Source files for RPMs are only served +from other RPMs, so the caution for \-F does not apply. + +If no PATH is listed, or neither \-F nor \-R option is given, then +\fBdebuginfod\fP will simply serve content that it scanned into its +index in previous runs: the data is cumulative. + +File names must match extended regular expressions given by the \-I +option and not the \-X option (if any) in order to be considered. + + +.SH OPTIONS + +.TP +.B "\-F" +Activate ELF/DWARF file scanning threads. The default is off. + +.TP +.B "\-R" +Activate RPM file scanning threads. The default is off. + +.TP +.B "\-d FILE" "\-\-database=FILE" +Set the path of the sqlite database used to store the index. This +file is disposable in the sense that a later rescan will repopulate +data. It will contain absolute file path names, so it may not be +portable across machines. It may be frequently read/written, so it +should be on a fast filesytem. It should not be shared across +machines or users, to maximize sqlite locking performance. The +default database file is $HOME/.debuginfod.sqlite. + +.TP +.B "\-D SQL" "\-\-ddl=SQL" +Execute given sqlite statement after the database is opened and +initialized as extra DDL (SQL data definition language). This may be +useful to tune performance-related pragmas or indexes. May be +repeated. The default is nothing extra. + +.TP +.B "\-p NUM" "\-\-port=NUM" +Set the TCP port number on which debuginfod should listen, to service +HTTP requests. Both IPv4 and IPV6 sockets are opened, if possible. +The webapi is documented below. The default port number is 8002. + +.TP +.B "\-I REGEX" "\-\-include=REGEX" "\-X REGEX" "\-\-exclude=REGEX" +Govern the inclusion and exclusion of file names under the search +paths. The regular expressions are interpreted as unanchored POSIX +extended REs, thus may include alternation. They are evaluated +against the full path of each file, based on its \fBrealpath(3)\fP +canonicalization. By default, all files are included and none are +excluded. A file that matches both include and exclude REGEX is +excluded. (The \fIcontents\fP of RPM files are not subject to +inclusion or exclusion filtering: they are all processed.) + +.TP +.B "\-t SECONDS" "\-\-rescan\-time=SECONDS" +Set the rescan time for the file and RPM directories. This is the +amount of time the scanning threads will wait after finishing a scan, +before doing it again. A rescan for unchanged files is fast (because +the index also stores the file mtimes). A time of zero is acceptable, +and means that only one initial scan should performed. The default +rescan time is 300 seconds. Receiving a SIGUSR1 signal triggers a new +scan, independent of the rescan time (including if it was zero). + +.TP +.B "\-g SECONDS" "\-\-groom\-time=SECONDS" +Set the groom time for the index database. This is the amount of time +the grooming thread will wait after finishing a grooming pass before +doing it again. A groom operation quickly rescans all previously +scanned files, only to see if they are still present and current, so +it can deindex obsolete files. See also the \fIDATA MANAGEMENT\fP +section. The default groom time is 86400 seconds (1 day). A time of +zero is acceptable, and means that only one initial groom should be +performed. Receiving a SIGUSR2 signal triggers a new grooming pass, +independent of the groom time (including if it was zero). + +.TP +.B "\-G" +Run an extraordinary maximal-grooming pass at debuginfod startup. +This pass can take considerable time, because it tries to remove any +debuginfo-unrelated content from the RPM-related parts of the index. +It should not be run if any recent RPM-related indexing operations +were aborted early. It can take considerable space, because it +finishes up with an sqlite "vacuum" operation, which repacks the +database file by triplicating it temporarily. The default is not to +do maximal-grooming. See also the \fIDATA MANAGEMENT\fP section. + +.TP +.B "\-c NUM" "\-\-concurrency=NUM" +Set the concurrency limit for all the scanning threads. While many +threads may be spawned to cover all the given PATHs, only NUM may +concurrently do CPU-intensive operations like parsing an ELF file +or an RPM. The default is the number of processors on the system; +the minimum is 1. + +.TP +.B "\-v" +Increase verbosity of logging to the standard error file descriptor. +May be repeated to increase details. The default verbosity is 0. + +.SH WEBAPI + +.\" Much of the following text is duplicated with debuginfod-find.1 + +The debuginfod's webapi resembles ordinary file service, where a GET +request with a path containing a known buildid results in a file. +Unknown buildid / request combinations result in HTTP error codes. +This file service resemblance is intentional, so that an installation +can take advantage of standard HTTP management infrastructure. + +There are three requests. In each case, the buildid is encoded as a +lowercase hexadecimal string. For example, for a program \fI/bin/ls\fP, +look at the ELF note GNU_BUILD_ID: + +.SAMPLE +% readelf -n /bin/ls | grep -A4 build.id +Note section [ 4] '.note.gnu.buildid' of 36 bytes at offset 0x340: +Owner Data size Type +GNU 20 GNU_BUILD_ID +Build ID: 8713b9c3fb8a720137a4a08b325905c7aaf8429d +.ESAMPLE + +Then the hexadecimal BUILDID is simply: + +.SAMPLE +8713b9c3fb8a720137a4a08b325905c7aaf8429d +.ESAMPLE + +.SS /buildid/\fIBUILDID\fP/debuginfo + +If the given buildid is known to the server, this request will result +in a binary object that contains the customary \fB.*debug_*\fP +sections. This may be a split debuginfo file as created by +\fBstrip\fP, or it may be an original unstripped executable. + +.SS /buildid/\fIBUILDID\fP/executable + +If the given buildid is known to the server, this request will result +in a binary object that contains the normal executable segments. This +may be a executable stripped by \fBstrip\fP, or it may be an original +unstripped executable. \fBET_DYN\fP shared libraries are considered +to be a type of executable. + +.SS /buildid/\fIBUILDID\fP/source\fI/SOURCE/FILE\fP + +If the given buildid is known to the server, this request will result +in a binary object that contains the source file mentioned. The path +should be absolute. Relative path names commonly appear in the DWARF +file's source directory, but these paths are relative to +individual compilation unit AT_comp_dir paths, and yet an executable +is made up of multiple CUs. Therefore, to disambiguate, debuginfod +expects source queries to prefix relative path names with the CU +compilation-directory, followed by a mandatory "/". + +Note: contrary to RFC 3986, the client should not elide \fB../\fP or +\fB/./\fP or extraneous \fB///\fP sorts of path components in the +directory names, because if this is how those names appear in the +DWARF files, that is what debuginfod needs to see too. + +For example: +.TS +l l. +#include <stdio.h> /buildid/BUILDID/source/usr/include/stdio.h +/path/to/foo.c /buildid/BUILDID/source/path/to/foo.c +\../bar/foo.c AT_comp_dir=/zoo/ /buildid/BUILDID/source/zoo//../bar/foo.c +.TE + +.SH DATA MANAGEMENT + +debuginfod stores its index in an sqlite database in a densely packed +set of interlinked tables. While the representation is as efficient +as we have been able to make it, it still takes a considerable amount +of data to record all debuginfo-related data of potentially a great +many files. This section offers some advice about the implications. + +As a general explanation for size, consider that debuginfod indexes +ELF/DWARF files, it stores their names and referenced source file +names, and buildids will be stored. When indexing RPMs, it stores +every file name \fIof or in\fP an RPM, every buildid, plus every +source file name referenced from a DWARF file. (Indexing RPMs takes +more space because the source files often reside in separate +subpackages that may not be indexed at the same pass, so extra +metadata has to be kept.) + +Getting down to numbers, in the case of Fedora RPMs (essentially, +gzip-compressed cpio files), the sqlite index database tends to be +from 0.5% to 3% of their size. It's larger for binaries that are +assembled out of a great many source files, or packages that carry +much debuginfo-unrelated content. It may be even larger during the +indexing phase due to temporary sqlite write-ahead-logging files; +these are checkpointed (cleaned out and removed) at shutdown. It may +be helpful to apply tight \-I or \-X regular-expression constraints to +exclude files from scanning that you know have no debuginfo-relevant +content. + +As debuginfod runs, it periodically rescans its target directories, +and any new content found is added to the database. Old content, such +as data for files that have disappeared or that have been replaced +with newer versions is removed at a periodic \fIgrooming\fP pass. +This means that the sqlite files grow fast during initial indexing, +slowly during index rescans, and periodically shrink during grooming. +There is also an optional one-shot \fImaximal grooming\fP pass is +available. It removes information debuginfo-unrelated data from the +RPM content index such as file names found in RPMs ("rpm sdef" +records) that are not referred to as source files from any binaries +find in RPMs ("rpm sref" records). This can save considerable disk +space. However, it is slow and temporarily requires up to twice the +database size as free space. Worse: it may result in missing +source-code info if the RPM traversals were interrupted, so the not +all source file references were known. Use it rarely to polish a +complete index. + +You should ensure that ample disk space remains available. (The flood +of error messages on -ENOSPC is ugly and nagging. But, like for most +other errors, debuginfod will resume when resources permit.) If +necessary, debuginfod can be stopped, the database file moved or +removed, and debuginfod restarted. + +sqlite offers several performance-related options in the form of +pragmas. Some may be useful to fine-tune the defaults plus the +debuginfod extras. The \-D option may be useful to tell debuginfod to +execute the given bits of SQL after the basic schema creation +commands. For example, the "synchronous", "cache_size", +"auto_vacuum", "threads", "journal_mode" pragmas may be fun to tweak +via \-D, if you're searching for peak performance. The "optimize", +"wal_checkpoint" pragmas may be useful to run periodically, outside +debuginfod. The default settings are performance- rather than +reliability-oriented, so a hardware crash might corrupt the database. +In these cases, it may be necessary to manually delete the sqlite +database and start over. + +As debuginfod changes in the future, we may have no choice but to +change the database schema in an incompatible manner. If this +happens, new versions of debuginfod will issue SQL statements to +\fIdrop\fP all prior schema & data, and start over. So, disk space +will not be wasted for retaining a no-longer-useable dataset. + +In summary, if your system can bear a 0.5%-3% index-to-RPM-dataset +size ratio, and slow growth afterwards, you should not need to +worry about disk space. If a system crash corrupts the database, +or you want to force debuginfod to reset and start over, simply +erase the sqlite file before restarting debuginfod. + + +.SH SECURITY + +debuginfod \fBdoes not\fP include any particular security features. +While it is robust with respect to inputs, some abuse is possible. It +forks a new thread for each incoming HTTP request, which could lead to +a denial-of-service in terms of RAM, CPU, disk I/O, or network I/O. +If this is a problem, users are advised to install debuginfod with a +HTTPS reverse-proxy front-end that enforces site policies for +firewalling, authentication, integrity, authorization, and load +control. + +When relaying queries to upstream debuginfods, debuginfod \fBdoes not\fP +include any particular security features. It trusts that the binaries +returned by the debuginfods are accurate. Therefore, the list of +servers should include only trustworthy ones. If accessed across HTTP +rather than HTTPS, the network should be trustworthy. Authentication +information through the internal \fIlibcurl\fP library is not currently +enabled. + + +.SH "ENVIRONMENT VARIABLES" + +.TP 21 +.B DEBUGINFOD_URLS +This environment variable contains a list of URL prefixes for trusted +debuginfod instances. Alternate URL prefixes are separated by space. +Avoid referential loops that cause a server to contact itself, directly +or indirectly - the results would be hilarious. + +.TP 21 +.B DEBUGINFOD_TIMEOUT +This environment variable governs the timeout for each debuginfod HTTP +connection. A server that fails to respond within this many seconds +is skipped. The default is 5. + +.TP 21 +.B DEBUGINFOD_CACHE_PATH +This environment variable governs the location of the cache where +downloaded files are kept. It is cleaned periodically as this +program is reexecuted. The default is $HOME/.debuginfod_client_cache. +.\" XXX describe cache eviction policy + +.SH FILES +.LP +.PD .1v +.TP 20 +.B $HOME/.debuginfod.sqlite +Default database file. +.PD + +.TP 20 +.B $HOME/.debuginfod_client_cache +Default cache directory for content from upstream debuginfods. +.PD + + +.SH "SEE ALSO" +.I "debuginfod-find(1)" +.I "sqlite3(1)" +.I \%https://prometheus.io/docs/instrumenting/exporters/ |