summaryrefslogtreecommitdiff
path: root/chromium/docs/website/site/developers/chromium-string-usage/index.md
diff options
context:
space:
mode:
Diffstat (limited to 'chromium/docs/website/site/developers/chromium-string-usage/index.md')
-rw-r--r--chromium/docs/website/site/developers/chromium-string-usage/index.md96
1 files changed, 0 insertions, 96 deletions
diff --git a/chromium/docs/website/site/developers/chromium-string-usage/index.md b/chromium/docs/website/site/developers/chromium-string-usage/index.md
deleted file mode 100644
index b84e46e1944..00000000000
--- a/chromium/docs/website/site/developers/chromium-string-usage/index.md
+++ /dev/null
@@ -1,96 +0,0 @@
----
-breadcrumbs:
-- - /developers
- - For Developers
-page_name: chromium-string-usage
-title: Chromium String usage
----
-
-Types of Strings
-In the Chromium code base, we use std::string and std::u16string. Blink uses
-WTF::String instead, which is patterned on std::string, but is a slightly
-different class (see the
-[docs](https://chromium.googlesource.com/chromium/src/+/HEAD/third_party/blink/renderer/platform/wtf/text/README.md)
-for their guidelines, we’ll only talk about Chromium here). We also have a
-StringPiece\[16\] class, which is basically a pointer to a string that is owned
-elsewhere with a length of how many characters from the other string form this
-“token”. Finally, there is also WebCString and WebString, which is used by the
-webkit glue layer.
-String Encodings
-We use a variety of encodings in the code base. UTF-8 is most common, but we
-also use ASCII and UTF-16.
-
-* ASCII is the older 7-bit encoding which includes 0-9, a-z, A-Z, and
- a few common punctuation characters, but not much else. ASCII text
- (common in HTML, CCS, and JavaScript) uses one byte per character.
-* UTF-8 is an encoding where characters are one or more bytes (up to
- 6) in length. Each byte indicates whether another byte follows. All
- ASCII characters are UTF-8 characters, so code that works correctly
- with UTF-8 will handle ASCII.
-* UTF-16 is an encoding where all characters are at least two bytes
- long. There are also four-byte UTF-16 characters (a pair of two
- 16-bit code units, called a "surrogate pair"). Common cases of
- 4-byte characters include most Emoji as well as some Chinese
- characters.
-* **UTF-32** is an encoding where all characters are four bytes long.
-
-When to use which encoding
-The most important rule here is the meta-rule, code in the style of the
-surrounding code. In the frontend, we use std::string/char for UTF-8 and
-std::u16string16/char16_t for UTF-16 on all platforms. Even though std::string
-is encoding agnostic, we only put UTF-8 into it. std::wstring/wchar_t is rarely
-used in cross-platform code (in part because it's differently-sized on different
-platforms), but common in Windows-specific code to interface with native APIs
-(which often take wchar_t\* or similar). Most UI strings are UTF-16. URLs are
-generally UTF-8. Strings in the webkit glue layer are typically UTF-16 with
-several exceptions. Chromium code does not use UTF-32.
-The GURL class and strings
-One common data type using strings is the GURL class. The constructor takes a
-std::string in UTF-8 for the URL itself. If you have a GURL, you can use the
-spec() method to get the std::string for the entire URL, or you can use
-component methods to get parsed parts, such as scheme(), host(), port(), path(),
-query(), and ref(), all of which return a std::string. All the parts of the GURL
-with the exception of the ref string will be pure ASCII. The ref string *may*
-have UTF-8 characters which are not also ASCII characters.
-Guidelines for string use in our codebase
-
-* Use std::string from the C++ standard library for normal use with
- strings
-* Length checking - if checking for empty, prefer “string.empty():” to
- “string.length() == 0”
-* When you make a string constant, use char\[\] (or char16_t\[\])
- instead of a std::string:
- * const char kFoo\[\] = “foo”;
- * const char16_t kBar\[\] = u"bar";
- * This is part of our style guidelines. It also makes code faster
- because there are no destructors, and more maintainable because
- there are no shutdown order dependencies.
-* There are many handy routines which operate on strings. You can use
- IntToString() if you want to do atoi(), and StringPrintf() if you
- need the full power of printf. You can use WriteInto() to make a C++
- string writeable by a C API. StringPiece makes it easy and efficient
- to write functions that take both C++ and C style strings.
-* For function input parameters, prefer to pass a string by const
- reference instead of making a new copy.
-* For function output parameters, it is OK to either return a new
- string or pass a pointer to a string. Performance wise, there isn’t
- much difference.
-* Often, efficiency is not paramount, but sometimes it is - when
- working in an inner loop, pay special attention to minimize the
- amount of string construction, and the number of temporary copies
- made.
- * When you use std::string, you can end up constructing lots of
- temporary string objects if you aren’t careful, or copying the
- string lots of times. Each copy makes a call to malloc, which
- needs a lock, and slows things down. Try to minimize how many
- temporaries get constructed.
- * When building a string, prefer “string1 += string2; string1 +=
- string3;” to “string1 = string1 + string2 + string3;” Better
- still, use base::StrCat().
-* For localization, we have the ICU library, with many useful helpers
- to do things like find word boundaries or convert to lowercase or
- uppercase correctly for the current locale.
-* We try to avoid repeated conversions between string encoding
- formats, as converting them is not cheap. It's generally OK to
- convert once, but if we have code that toggles the encoding six
- times as a string goes through some pipeline, that should be fixed. \ No newline at end of file