path: root/lib/urlapi.c
Commit message | Author | Age | Files | Lines
* urlapi: provide more detailed return codes [bagder/urlapi-returncodes] | Daniel Stenberg | 2021-11-25 | 1 | -56/+80
| | | | | | | | | | | | | | | | | | | | Previously, the return code CURLUE_MALFORMED_INPUT was used for almost 30 different URL format violations. This made it hard for users to understand why a particular URL was not acceptable. Since the API cannot point out a specific position within the URL for the problem, this now instead introduces a number of additional and more fine-grained error codes to allow the API to return more exactly in what "part" or section of the URL a problem was detected. Also bug-fixes curl_url_get() with CURLUPART_ZONEID, which previously returned CURLUE_OK even if no zoneid existed. Test cases in 1560 have been adjusted and extended. Tests 1538 and 1559 have been updated. Updated libcurl-errors.3 and curl_url_strerror() accordingly. Closes #8049
* urlapi: make Curl_is_absolute_url always use MAX_SCHEME_LEN | Daniel Stenberg | 2021-11-25 | 1 | -8/+12
| | | | | | | | | | | | Instead of having all callers pass in the maximum length, always use it. The passed in length is instead used only as the length of the target buffer for storing the scheme name in, if used. Added the scheme max length restriction to the curl_url_set.3 man page. Follow-up to 45bcb2eaa78c79 Closes #8047
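The split between "maximum scheme length" and "target buffer length" described above can be sketched as follows. This is a minimal illustration, not curl's code: the function name `is_absolute_url`, its signature, and the rule that a used buffer must hold MAX_SCHEME_LEN bytes plus a terminator are all assumptions. It only checks the RFC 3986 scheme grammar, so a string like "localhost:8080" would also pass; curl's own checks may well be stricter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SCHEME_LEN 40 /* the limit named in the log; value from "40 bytes" */

static bool is_alpha(char c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) per RFC 3986 */
static bool is_scheme_char(char c)
{
  return is_alpha(c) || (c >= '0' && c <= '9') ||
         c == '+' || c == '-' || c == '.';
}

/* Hypothetical sketch: returns true if 'url' starts with "scheme:".
   The scheme length is always capped at MAX_SCHEME_LEN; 'buflen' only
   describes the optional output buffer. 'buf' is always left defined. */
static bool is_absolute_url(const char *url, char *buf, size_t buflen)
{
  size_t i;

  assert(!buf || buflen > MAX_SCHEME_LEN); /* assumed caller contract */
  if(buf)
    buf[0] = '\0';
  if(!is_alpha(url[0]))
    return false;
  /* stops at the terminator, at a non-scheme byte, or at the length cap */
  for(i = 1; i < MAX_SCHEME_LEN && is_scheme_char(url[i]); i++)
    ;
  if(url[i] != ':')
    return false;
  if(buf) {
    size_t j;
    for(j = 0; j < i; j++) {
      char c = url[j];
      buf[j] = (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
    }
    buf[i] = '\0'; /* store the scheme lowercased */
  }
  return true;
}
```

Passing NULL for `buf` only answers the yes/no question; passing a buffer also yields the lowercased scheme name.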
* urlapi: reject short file URLs | Daniel Stenberg | 2021-11-23 | 1 | -0/+4
| | | | | | | | file URLs that are 6 bytes or shorter are not complete. Return CURLUE_MALFORMED_INPUT for those. Extended test 1560 to verify. Triggered by #8041 Closes #8042
* urlapi: cleanup scheme parsing | Stefan Eissing | 2021-11-22 | 1 | -16/+22
| | | | | | | Make Curl_is_absolute_url() always leave a defined 'buf' and avoid copying on URLs that do not start with a scheme. Closes #8043
* urlapi: skip a strlen(), pass in zero | Daniel Stenberg | 2021-10-15 | 1 | -2/+1
| | | | | | | | | ... to let curl_easy_escape() itself do the strlen. This avoids a (false positive) Coverity warning and it avoids us having to store the strlen() return value in an int variable. Reviewed-by: Daniel Gustafsson Closes #7862
* urlapi: URL decode percent-encoded host names | Daniel Stenberg | 2021-10-11 | 1 | -19/+90
| | | | | | | | | | | | | | | | | | | | | | The host name is stored decoded and can be encoded when used to extract the full URL. By default when extracting the URL, the host name will not be URL encoded to work as similar as possible as before. When not URL encoding the host name, the '%' character will however still be encoded. Getting the URL with the CURLU_URLENCODE flag set will percent encode the host name part. As a bonus, setting the host name part with curl_url_set() no longer accepts a name that contains space, CR or LF. Test 1560 has been extended to verify percent encodings. Reported-by: Noam Moshe Reported-by: Sharon Brizinov Reported-by: Raul Onitza-Klugman Reported-by: Kirill Efimov Fixes #7830 Closes #7834
* lib: avoid fallthrough cases in switch statements | Daniel Gustafsson | 2021-09-29 | 1 | -25/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit b5a434f7f0ee4d64857f8592eced5b9007d83620 inhibits the warning on implicit fallthrough cases, since the current coding of indicating fallthrough with comments is falling out of fashion with new compilers. This attempts to make the issue smaller by rewriting fallthroughs to no longer fallthrough, via either breaking the cases or turning switch statements into if statements. lib/content_encoding.c: the fallthrough codepath is simply copied into the case as it's a single line. lib/http_ntlm.c: the fallthrough case skips a state in the state- machine and fast-forwards to NTLMSTATE_LAST. Do this before the switch statement instead to set up the states that we actually want. lib/http_proxy.c: the fallthrough is just falling into exiting the switch statement which can be done easily enough in the case. lib/mime.c: switch statement rewritten as if statement. lib/pop3.c: the fallthrough case skips to the next state in the statemachine, do this explicitly instead. lib/urlapi.c: switch statement rewritten as if statement. lib/vssh/wolfssh.c: the fallthrough cases fast-forwards the state machine, do this by running another iteration of the switch statement instead. lib/vtls/gtls.c: switch statement rewritten as if statement. lib/vtls/nss.c: the fallthrough codepath is simply copied into the case as it's a single line. Also twiddle a comment to not be inside a non-brace if statement. Closes: #7322 See-also: #7295 Reviewed-by: Daniel Stenberg <daniel@haxx.se>
* urlapi: support UNC paths in file: URLs on Windows | Sergey Markelov | 2021-09-27 | 1 | -6/+34
| | | | | | | | | | - file://host.name/path/file.txt is a valid UNC path \\host.name\path\file.txt to a non-local file transformed into URI (RFC 8089 Appendix E.3) - UNC paths on other OSs must be smb: URLs Closes #7366
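The mapping between a file: URL with a host part and a UNC path can be illustrated with a small standalone sketch. The helper name `file_url_to_unc` is hypothetical and this is not how curl implements it; it assumes a lowercase "file://" prefix, performs no percent-decoding, and does not special-case "localhost".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: turn "file://host/share/file" into "\\host\share\file".
   Returns false for URLs without a host part or if 'out' is too small. */
static bool file_url_to_unc(const char *url, char *out, size_t outlen)
{
  static const char prefix[] = "file://";
  const size_t plen = sizeof(prefix) - 1;
  const char *rest;
  size_t len;
  size_t i;

  if(strncmp(url, prefix, plen) != 0)
    return false;
  rest = url + plen;
  if(*rest == '/' || *rest == '\0')
    return false; /* empty host: a local file, not a UNC path */
  len = strlen(rest);
  if(len + 3 > outlen)
    return false; /* two leading backslashes plus the terminator */
  out[0] = '\\';
  out[1] = '\\';
  for(i = 0; i < len; i++)
    out[i + 2] = (rest[i] == '/') ? '\\' : rest[i];
  out[len + 2] = '\0';
  return true;
}
```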
* urlapi.c:seturl: assert URL instead of using if-check | Daniel Stenberg | 2021-08-23 | 1 | -2/+1
| | | | | | | There's no code flow possible where this can happen. The assert makes sure it also won't be introduced undetected in the future. Closes #7610
* lib: use %u instead of %ld for port number printf | Daniel Stenberg | 2021-06-30 | 1 | -2/+2
| | | | | | | Follow-up to 764c6bd3bf which changed the type of some port number fields. Detected by Coverity (CID 1486624) etc. Closes #7325
* curl_url_set: reject spaces in URLs w/o CURLU_ALLOW_SPACE | Daniel Stenberg | 2021-06-15 | 1 | -10/+10
| | | | | | | | | | | | | They were never officially allowed and slipped in only due to sloppy parsing. Spaces (ascii 32) should be correctly encoded (to %20) before being part of a URL. The new flag bit CURLU_ALLOW_SPACE when a full URL is set, makes libcurl allow spaces. Updated test 1560 to verify. Closes #7073
* urlapi: make sure no +/- signs are accepted in IPv4 numericals | Daniel Stenberg | 2021-04-21 | 1 | -1/+5
| | | | | | | | Follow-up to 56a037cc0ad1b2. Extends test 1560 to verify. Reported-by: Tuomas Siipola Fixes #6916 Closes #6917
* urlapi: "normalize" numerical IPv4 host names | Daniel Stenberg | 2021-04-19 | 1 | -2/+90
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the host name in a URL is given as an IPv4 numerical address, the address can be specified with dotted numericals in four different ways: a32, a.b24, a.b.c16 or a.b.c.d and each part can be specified in decimal, octal (0-prefixed) or hexadecimal (0x-prefixed). Instead of passing on the name as-is and leaving the handling to the underlying name functions, which made them not work with c-ares but work with getaddrinfo, this change now makes the curl URL API itself detect and "normalize" host names specified as IPv4 numericals. The WHATWG URL Spec says this is an okay way to specify a host name in a URL. RFC 3986 does not allow them, but curl didn't prevent them before and it seems other RFC 3986-using tools have not either. Host names used like this are widely supported by other tools as well due to the handling being done by getaddrinfo and friends. I decided to add the functionality into the URL API itself so that all users of these functions get the benefits, when for example wanting to compare two URLs. Also, it makes curl built to use c-ares now support them as well and make curl builds more consistent. The normalization makes HTTPS and virtual hosted HTTP work fine even when curl gets the address specified using one of the "obscure" formats. Test 1560 is extended to verify. Fixes #6863 Closes #6871
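The four forms described above (a, a.b, a.b.c and a.b.c.d, each part in decimal, octal or hexadecimal) can be illustrated with a standalone sketch. The function name `normalize_ipv4` is hypothetical and this is not curl's implementation; it leans on strtoul() with base 0 for the radix detection and rejects a leading sign, in the spirit of the follow-up commit listed above it.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: rewrite an IPv4 numerical such as "0x7f.1" as a
   dotted quad. Returns false, leaving 'out' untouched, if 'host' is not
   an IPv4 numerical or 'out' is too small. */
static bool normalize_ipv4(const char *host, char *out, size_t outlen)
{
  unsigned long parts[4];
  unsigned long addr = 0;
  const char *p = host;
  char tmp[16]; /* fits "255.255.255.255" plus the terminator */
  int n = 0;
  int i;
  int len;

  for(;;) {
    char *end;
    /* strtoul() skips whitespace and accepts a sign, so insist on a digit */
    if(*p < '0' || *p > '9')
      return false;
    errno = 0;
    parts[n] = strtoul(p, &end, 0); /* base 0: decimal, 0-octal or 0x-hex */
    if(errno)
      return false;
    n++;
    if(*end == '\0')
      break;
    if(*end != '.' || n == 4)
      return false;
    p = end + 1;
  }

  /* every part except the last is a single byte */
  for(i = 0; i < n - 1; i++) {
    if(parts[i] > 0xff)
      return false;
    addr |= parts[i] << (8 * (3 - i));
  }
  /* the last part fills all remaining bytes: 32, 24, 16 or 8 bits */
  if(parts[n - 1] > (0xffffffffUL >> (8 * (n - 1))))
    return false;
  addr |= parts[n - 1];

  len = snprintf(tmp, sizeof(tmp), "%lu.%lu.%lu.%lu",
                 (addr >> 24) & 0xff, (addr >> 16) & 0xff,
                 (addr >> 8) & 0xff, addr & 0xff);
  if(len < 0 || (size_t)len >= outlen)
    return false;
  memcpy(out, tmp, (size_t)len + 1);
  return true;
}
```

Normalizing before comparing means that "0x7f.1", "2130706433" and "127.0.0.1" all end up as the same host string.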
* misc: fix "warning: empty expression statement has no effect" | Daniel Stenberg | 2020-12-26 | 1 | -6/+8
| | | | | | | | | | Turned several macros into do-while(0) style to allow their use to work fine with a semicolon. Bug: https://github.com/curl/curl/commit/08e8455dddc5e48e58a12ade3815c01ae3da3b64#commitcomment-45433279 Follow-up to 08e8455dddc5e4 Reported-by: Gisle Vanem Closes #6376
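As a generic illustration of the do-while(0) idiom (the macro `SAFE_FREE` and the function `release_if` below are made up for this sketch and are not taken from the commit): wrapping a multi-statement macro this way makes it expand to exactly one statement, so a trailing semicolon is required rather than redundant.

```c
#include <stdlib.h>

/* Hypothetical macro: two statements wrapped so they act as one.
   With a bare { ... } body, "SAFE_FREE(p);" would leave an empty
   statement behind and break the if/else below. */
#define SAFE_FREE(p) do { free(p); (p) = NULL; } while(0)

/* Returns 1 if the buffer was released through the macro, 0 otherwise
   (assuming the allocation itself succeeded). */
static int release_if(int flag)
{
  char *buf = malloc(16);

  if(flag)
    SAFE_FREE(buf); /* one statement; the semicolon completes it */
  else
    flag = 0;       /* this 'else' still pairs with the 'if' above */

  if(buf) {
    free(buf);
    return 0;
  }
  return 1;
}
```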
* urlapi: don't accept blank port number field without scheme | Daniel Stenberg | 2020-12-07 | 1 | -4/+9
| | | | | | | | | | ... as it makes the URL parser accept "very-long-hostname://" as a valid host name and we don't want that. The parser now only accepts a blank (no digits) after the colon if the URL starts with a scheme. Reported-by: d4d on hackerone Closes #6283
* curl.se: new home | Daniel Stenberg | 2020-11-04 | 1 | -1/+1
| | | | Closes #6172
* urlapi: URL encode a '+' in the query part | Daniel Stenberg | 2020-10-15 | 1 | -20/+7
| | | | | | | | | ... when asked to with CURLU_URLENCODE. Extended test 1560 to verify. Reported-by: Dietmar Hauser Fixes #6086 Closes #6087
* urlapi: use more Curl_safefree | Emil Engler | 2020-09-17 | 1 | -4/+2
| | | | Closes #5968
* terminology: call them null-terminated strings | Daniel Stenberg | 2020-06-28 | 1 | -4/+4
| | | | | | | | | | | Updated terminology in docs, comments and phrases to refer to C strings as "null-terminated". Done to unify with how most other C oriented docs refer to them and what users in general seem to prefer (based on a single highly unscientific poll on twitter). Reported-by: coinhubs on github Fixes #5598 Closes #5608
* escape: make the URL decode able to reject only %00 bytes | Daniel Stenberg | 2020-06-25 | 1 | -1/+4
| | | | | | ... or all "control codes" or nothing. Assisted-by: Nicolas Sterchele
* urlapi: accept :: as a valid IPv6 address | Daniel Stenberg | 2020-05-08 | 1 | -1/+1
| | | | | | | | Test 1560 is extended to verify. Reported-by: Pavel Volgarev Fixes #5344 Closes #5351
* urlapi: guess scheme correct even with credentials given | Daniel Stenberg | 2020-01-28 | 1 | -31/+37
| | | | | | | | | In the "scheme-less" parsing case, we need to strip off credentials first before we guess scheme based on the host name! Assisted-by: Jay Satiro Fixes #4856 Closes #4857
* urlapi: fix use-after-free bug | Daniel Stenberg | 2019-10-03 | 1 | -35/+36
| | | | | | | | | | | Follow-up from 2c20109a9b5d04 Added test 663 to verify. Reported by OSS-Fuzz Bug: https://crbug.com/oss-fuzz/17954 Closes #4453
* urlapi: fix URL encoding when setting a full URL | Daniel Stenberg | 2019-10-02 | 1 | -1/+16
* urlapi: fix unused variable warning | Marcel Raad | 2019-10-01 | 1 | -0/+2
| | | | | | `dest` is only used with `ENABLE_IPV6`. Closes https://github.com/curl/curl/pull/4444
* urlapi: question mark within fragment is still fragment | Daniel Stenberg | 2019-09-24 | 1 | -4/+4
| | | | | | | | | | | The parser would check for a query part before fragment, which caused it to do wrong when the fragment contains a question mark. Extended test 1560 to verify. Reported-by: Alex Konev Fixes #4412 Closes #4413
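The ordering issue described above can be shown in a few lines: locate the fragment first, then search for the query only in what precedes it. The struct `url_tail` and the function `split_tail` are hypothetical names for this sketch, which operates on the part of a URL after the authority and is not curl's parser.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: pointers into the input, NULL when absent */
struct url_tail {
  const char *query;
  size_t query_len;
  const char *fragment;
  size_t fragment_len;
};

/* Split off the fragment first so a '?' inside it is not taken as the
   start of a query. */
static void split_tail(const char *s, struct url_tail *t)
{
  const char *frag = strchr(s, '#');
  size_t head_len = frag ? (size_t)(frag - s) : strlen(s);
  const char *q = memchr(s, '?', head_len); /* only look before the '#' */

  t->query = NULL;
  t->query_len = 0;
  t->fragment = NULL;
  t->fragment_len = 0;

  if(frag) {
    t->fragment = frag + 1;
    t->fragment_len = strlen(frag + 1);
  }
  if(q) {
    t->query = q + 1;
    t->query_len = head_len - (size_t)(q + 1 - s);
  }
}
```

A side effect of returning a pointer plus a length is that a bare trailing '?' yields a non-NULL query of length zero, which keeps "no query" and "empty query" apart.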
* urlapi: avoid index underflow for short ipv6 hostnames | Paul Dreik | 2019-09-21 | 1 | -0/+2
| If the input hostname is "[", hlen will underflow to max of size_t when it
| is subtracted with 2.
|
| hostname[hlen] will then cause a warning by ubsanitizer:
|
| runtime error: addition of unsigned offset to 0x<snip> overflowed to
| 0x<snip>
|
| I think that in practice, the generated code will work, and the output
| of hostname[hlen] will be the first character "[".
|
| This can be demonstrated by the following program (tested in both clang
| and gcc, with -O3)
|
|   #include <stdio.h>
|   #include <stdlib.h>
|   #include <string.h>
|
|   int main() {
|     char* hostname=strdup("[");
|     size_t hlen = strlen(hostname);
|
|     hlen-=2;    /* wraps around to the maximum size_t value */
|     hostname++;
|     printf("character is %d\n",+hostname[hlen]);
|     free(hostname-1);
|   }
|
| I found this through fuzzing, and even if it seems harmless, the proper
| thing is to return early with an error.
|
| Closes #4389
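The "return early with an error" approach can be sketched like this. The function name `ipv6_literal_body` is hypothetical, and the minimum length of 4 is an assumption based on "[::]" being the shortest bracketed literal (the log also lists a commit accepting :: as a valid address); the actual check in curl may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: find the text between '[' and ']' of a bracketed
   IPv6 host name. The length is validated BEFORE subtracting, so the
   unsigned subtraction below can never wrap around. */
static bool ipv6_literal_body(const char *hostname,
                              const char **body, size_t *len)
{
  size_t hlen = strlen(hostname);

  if(hostname[0] != '[')
    return false;
  if(hlen < 4) /* assumed minimum: "[::]" */
    return false;
  if(hostname[hlen - 1] != ']')
    return false;

  *body = hostname + 1;
  *len = hlen - 2; /* safe: hlen is at least 4 here */
  return true;
}
```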
* urlapi: Expression 'storep' is always true | Daniel Stenberg | 2019-09-20 | 1 | -1/+2
| | | | | Fixes warning detected by PVS-Studio Fixes #4374
* urlapi: 'scheme' is always true | Daniel Stenberg | 2019-09-20 | 1 | -16/+15
| | | | | Fixes warning detected by PVS-Studio Fixes #4374
* urlapi: part of conditional expression is always true: (relurl[0] == '/') | Daniel Stenberg | 2019-09-20 | 1 | -1/+1
| | | | | Fixes warning detected by PVS-Studio Fixes #4374
* urlapi: CURLU_NO_AUTHORITY allows empty authority/host part | Jens Finkhaeuser | 2019-09-19 | 1 | -11/+25
| | | | | | | CURLU_NO_AUTHORITY is intended for use with unknown schemes (i.e. not "file:///") to override cURL's default demand that an authority exists. Closes #4349
* urlapi: one colon is enough for the strspn() input (typo) | Daniel Stenberg | 2019-09-10 | 1 | -1/+1
* urlapi: verify the IPv6 numerical address | Daniel Stenberg | 2019-09-10 | 1 | -4/+13
| | | | | | | | | It needs to parse correctly. Otherwise it could be tricked into letting through a-f using host names that libcurl would then resolve. Like '[ab.be]'. Reported-by: Thomas Vegas Closes #4315
* urlapi: increase supported scheme length to 40 bytes | Omar Ramadan | 2019-05-20 | 1 | -2/+5
| | | | | | | The longest currently registered URI scheme at IANA is 36 bytes long. Closes #3905 Closes #3900
* lib: reduce variable scopes | Marcel Raad | 2019-05-20 | 1 | -2/+2
| | | | | | Fixes Codacy/CppCheck warnings. Closes https://github.com/curl/curl/pull/3872
* urlapi: require a non-zero host name length when parsing URL | Daniel Stenberg | 2019-05-14 | 1 | -0/+2
| | | | | | Updated test 1560 to verify. Closes #3880
* urlapi: add CURLUPART_ZONEID to set and get | Daniel Stenberg | 2019-05-05 | 1 | -13/+22
| | | | | | | | The zoneid can be used with IPv6 numerical addresses. Updated test 1560 to verify. Closes #3834
* urlapi: strip off scope id from numerical IPv6 addresses | Daniel Stenberg | 2019-05-03 | 1 | -9/+54
| | | | | | | | | | ... to make the host name "usable". Store the scope id and put it back when extracting a URL out of it. Also makes curl_url_set() syntax check CURLUPART_HOST. Fixes #3817 Closes #3822
* CURL_MAX_INPUT_LENGTH: largest acceptable string input size | Daniel Stenberg | 2019-04-29 | 1 | -0/+8
| | | | | | | | | | | | | | | | | This limits all accepted input strings passed to libcurl to be less than CURL_MAX_INPUT_LENGTH (8000000) bytes, for these API calls: curl_easy_setopt() and curl_url_set(). The 8000000 number is arbitrarily picked and is meant to detect mistakes or abuse, not to limit actual practical use cases. By limiting the acceptable string lengths we also reduce the risk of integer overflows all over. NOTE: This does not apply to `CURLOPT_POSTFIELDS`. Test 1559 verifies. Closes #3805
* urlapi: stricter CURLUPART_PORT parsing | Daniel Stenberg | 2019-04-13 | 1 | -2/+9
| | | | | | | | | | | Only allow well formed decimal numbers in the input. Document that the number MUST be between 1 and 65535. Add tests to test 1560 to verify the above. Ref: https://github.com/curl/curl/issues/3753 Closes #3762
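A strict check along the lines described above (decimal digits only, value between 1 and 65535) might look like the following. The function name `parse_port_strict` is hypothetical and this is a sketch rather than curl's code; in particular, accepting leading zeros such as "0080" is a choice made here, not something the log states.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: accept only a well formed decimal number in the
   range 1..65535. Signs, whitespace, hex prefixes and empty strings are
   all rejected because every byte must be an ASCII digit. */
static bool parse_port_strict(const char *s, unsigned short *port)
{
  unsigned long v = 0;

  if(!s || !*s)
    return false;
  for(; *s; s++) {
    if(*s < '0' || *s > '9')
      return false;
    v = v * 10 + (unsigned long)(*s - '0');
    if(v > 65535)
      return false; /* bail out early, which also rules out overflow */
  }
  if(v == 0)
    return false;
  *port = (unsigned short)v;
  return true;
}
```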
* urlapi: urlencode characters above 0x7f correctly | Jakub Zakrzewski | 2019-04-07 | 1 | -3/+3
| | | | | fixes #3741 Closes #3742
* cleanup: make local functions static | Daniel Stenberg | 2019-02-10 | 1 | -12/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | urlapi: turn three local-only functions into statics conncache: make conncache_find_first_connection static multi: make detach_connnection static connect: make getaddressinfo static curl_ntlm_core: make hmac_md5 static http2: make two functions static http: make http_setup_conn static connect: make tcpnodelay static tests: make UNITTEST a thing to mark functions with, so they can be static for normal builds and non-static for unit test builds ... and mark Curl_shuffle_addr accordingly. url: make up_free static setopt: make vsetopt static curl_endian: make write32_le static rtsp: make rtsp_connisdead static warnless: remove unused functions memdebug: remove one unused function, made another static
* urlapi: reduce variable scope, remove unreachable 'break' | Daniel Stenberg | 2019-02-09 | 1 | -10/+10
| | | | | | Both nits pointed out by codacy.com Closes #3540
* urlapi: fix parsing ipv6 with zone index | Daniel Gustafsson | 2018-12-30 | 1 | -2/+5
| | | | | | | | | | | | | | The previous fix for parsing IPv6 URLs with a zone index was a paddle short for URLs without an explicit port. This patch fixes that case and adds a unit test case. This bug was highlighted by issue #3408, and while it's not the full fix for the problem there it is an isolated bug that should be fixed regardless. Closes #3411 Reported-by: GitYuanQu on github Reviewed-by: Daniel Stenberg <daniel@haxx.se>
* urlapi: distinguish possibly empty query | Leonardo Taccari | 2018-12-13 | 1 | -3/+3
| | | | | | | | | | If just a `?' is passed to indicate the query, always store a zero length query instead of having a NULL query. This makes it possible to distinguish a URL with a trailing `?'. Fixes #3369 Closes #3370
* urlapi: Fix port parsing of eol colon | Daniel Gustafsson | 2018-12-12 | 1 | -16/+16
| | | | | | | | | A URL with a single colon without a port number should use the default port, discarding the colon. Fix, add a testcase and also do a little bit of comment wordsmithing. Closes #3365 Reviewed-by: Daniel Stenberg <daniel@haxx.se>
* tests: add urlapi unittest | Daniel Gustafsson | 2018-12-11 | 1 | -2/+8
| | | | | | | | | | This adds a new unittest intended to cover the internal functions in the urlapi code, starting with parse_port(). In order to avoid name collisions in debug builds, parse_port() is renamed Curl_parse_port() since it will be exported. Reviewed-by: Daniel Stenberg <daniel@haxx.se> Reviewed-by: Marcel Raad <Marcel.Raad@teamviewer.com>
* urlapi: fix portnumber parsing for ipv6 zone index | Daniel Gustafsson | 2018-12-11 | 1 | -6/+20
| | | | | | | | | | | | | | An IPv6 URL which contains a zone index includes a '%%25<zone id>' string before the ending ']' bracket. The parsing logic wasn't set up to cope with the zone index however, resulting in a malformed url error being returned. Fix by breaking the parsing into two stages to correctly handle the zone index. Closes #3355 Closes #3319 Reported-by: tonystz on Github Reviewed-by: Daniel Stenberg <daniel@haxx.se> Reviewed-by: Marcel Raad <Marcel.Raad@teamviewer.com>
* snprintf: renamed and we now only use msnprintf() | Daniel Stenberg | 2018-11-23 | 1 | -5/+5
| | | | | | | | | | | The function does not return the same value as snprintf() normally does, so readers may be misled into thinking the code works differently than it actually does. A different function name makes this easier to detect. Reported-by: Tomas Hoger Assisted-by: Daniel Gustafsson Fixes #3296 Closes #3297
* urlapi: only skip encoding the first '=' with APPENDQUERY set | Daniel Stenberg | 2018-11-07 | 1 | -1/+6
| | | | | | | | | APPENDQUERY + URLENCODE would skip all equals signs but now it only skips encoding the first to better allow "name=content" for any content. Reported-by: Alexey Melnichuk Fixes #3231 Closes #3231
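The "leave only the first '=' unencoded" rule can be sketched as below. The function name `encode_appended_query` is hypothetical, and the choices of lowercase hex digits and "%20" for a space are simplifying assumptions; curl's own encoder may produce different output for those bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* RFC 3986 unreserved characters pass through unencoded */
static bool is_unreserved(unsigned char c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
         (c >= '0' && c <= '9') || (c && strchr("-._~", c));
}

/* Hypothetical sketch: percent-encode a "name=content" pair, keeping only
   the FIRST '=' literal so that any '=' inside the content is encoded.
   Returns false if 'out' is too small. */
static bool encode_appended_query(const char *in, char *out, size_t outlen)
{
  static const char hex[] = "0123456789abcdef";
  bool seen_equals = false;
  size_t o = 0;

  for(; *in; in++) {
    unsigned char c = (unsigned char)*in;
    bool literal = is_unreserved(c);

    if(c == '=' && !seen_equals) {
      seen_equals = true; /* the name/content separator stays as-is */
      literal = true;
    }
    if(o + (literal ? 1 : 3) >= outlen)
      return false; /* always keep room for the terminator */
    if(literal)
      out[o++] = (char)c;
    else {
      out[o++] = '%';
      out[o++] = hex[c >> 4];
      out[o++] = hex[c & 0x0f];
    }
  }
  if(o >= outlen)
    return false;
  out[o] = '\0';
  return true;
}
```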