| Commit message | Author | Age | Files | Lines |
|
Add JS backend adapted from the GHCJS project by Luite Stegeman.
Some features haven't been ported or implemented yet. Tests for these
features have been disabled with an associated gitlab ticket.
Bump array submodule
Work funded by IOG.
Co-authored-by: Jeffrey Young <jeffrey.young@iohk.io>
Co-authored-by: Luite Stegeman <stegeman@gmail.com>
Co-authored-by: Josh Meredith <joshmeredith2008@gmail.com>
|
It may not always be a Unicode encoding
|
As noted in #17970, these (e.g. `getFileSystemEncoding` and
`setFileSystemEncoding`) previously had unfoldings, which would
break their global-ness.
While not strictly necessary, I also add a NOINLINE on
`initLocaleEncoding` since it is used in `System.IO`, ensuring that we
only query the system's locale encoding once.
Fixes #17970.
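The pattern at issue can be sketched as follows. This is a minimal illustration, not the code in `base`: `currentEncodingName`, `getEncodingName` and `setEncodingName` are hypothetical names standing in for the real file system encoding accessors. The point is that a top-level `IORef` created with `unsafePerformIO` needs a NOINLINE pragma; if its definition were inlined at each use site, every use could allocate its own `IORef` and the variable would no longer be global.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical global setting (an assumption for illustration only).
-- NOINLINE keeps a single shared IORef: without it, the optimiser may
-- duplicate the unsafePerformIO call and create several independent refs.
currentEncodingName :: IORef String
currentEncodingName = unsafePerformIO (newIORef "UTF-8")
{-# NOINLINE currentEncodingName #-}

-- Read the current value of the global.
getEncodingName :: IO String
getEncodingName = readIORef currentEncodingName

-- Overwrite the global; later reads anywhere in the program see the change.
setEncodingName :: String -> IO ()
setEncodingName = writeIORef currentEncodingName
```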
|
Character literals in Haddock should not be written as plain `'\n'` since
single quotes are for linking identifiers. Besides, since we want the
character literal to be monospaced, we really should use `@\'\\n\'@`.
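A small sketch of the markup described above. The function `splitOnNewline` is a hypothetical example, not something in `base`; only the form of the documentation comment matters here.

```haskell
-- The Haddock comment below writes the character literal as @\'\\n\'@:
-- the @...@ delimiters give monospace, and the escaped quotes and
-- backslash stop Haddock from treating the quotes as an identifier link.

-- | Split a string at each newline character @\'\\n\'@.
splitOnNewline :: String -> [String]
splitOnNewline = lines
```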
[skip ci]
|
This drastically cuts down on the number of Haddock warnings when making
docs for `base`. Plus this means more actual links end up in the docs!
Also fixed other small mostly markup issues in the documentation along
the way.
This is a docs-only change.
Reviewers: hvr, bgamari, thomie
Reviewed By: thomie
Subscribers: thomie, rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5055
|
Summary:
Get utf8 encoded arguments before we call hs_init and use them
instead of ignoring hs_init arguments. This reduces differing
behaviour of the RTS between windows and linux and simplifies
the code involved.
A few testcases were changed to expect the same result on windows
as on linux after the changes.
This fixes #13940.
Test Plan: ./validate
Reviewers: austin, hvr, bgamari, erikd, simonmar, Phyx
Subscribers: Phyx, rwbarton, thomie
GHC Trac Issues: #13940
Differential Revision: https://phabricator.haskell.org/D3739
|
This fixes test encoding005 on Windows (#10623).
Reviewed by: austin, bgamari
Differential Revision: https://phabricator.haskell.org/D2262
|
We already do this for UTF8/16/32, so it seems obvious to do the same
for the closely related and popular ISO 8859-1 encoding, and so avoid iconv
issues on some platforms (such as AIX, which bundles a broken
`libiconv` by default).
This fixes #11096.
|
D898 and D1059 implemented a fallback behavior to handle the case
that the end user's iconv installation is broken (typically due to
running inside a chroot in which the necessary locale files and/or
gconv modules have not been installed). In this case, if the
program requests an ASCII locale, GHC's char8 encoding is used
rather than the program failing.
However, silently mangling data like char8 does when the programmer
did not ask for it is poor behavior, for reasons described in D1059.
This commit implements an ASCII encoding and uses it in the fallback
case when iconv is unavailable and the user has requested ASCII.
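To make the "silent mangling" concrete, here is a minimal sketch that encodes the same string with `char8` and with `utf8` and returns the resulting bytes. `encodeWith` is a hypothetical helper written for this illustration; it relies on `GHC.Foreign.withCStringLen`, which takes an explicit `TextEncoding`.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (TextEncoding, char8, utf8)

-- Hypothetical helper: encode a String with the given TextEncoding and
-- copy the encoded bytes out of the temporary C buffer.
encodeWith :: TextEncoding -> String -> IO [Word8]
encodeWith enc s =
  GHC.withCStringLen enc s $ \(ptr, len) ->
    peekArray len (castPtr ptr)
```

With `char8`, the character U+263A is written as the single byte 0x3A (its low eight bits, i.e. a colon) and no error is raised, whereas `utf8` produces the three bytes 0xE2 0x98 0xBA.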
Test Plan:
Added tests for the encodings defined in Latin1.
Also, manually ran a statically-linked executable of that test
in a chroot and the tests passed (up to the ones that call
mkTextEncoding "LATIN1", since there is no fallback from iconv
for that case yet).
Reviewers: austin, hvr, hsyl20, bgamari
Reviewed By: hsyl20, bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1085
GHC Trac Issues: #7695, #10623
|
D898 was primarily intended to fix hangs in the event that iconv was
unavailable (namely #10298 and #7695). In addition to this fix, it also
introduced self-contained handling of ANSI terminals to allow compiled
executables to run in minimal environments lacking iconv.
However, the behavior that the patch introduced is highly suspicious.
Specifically, it gives the user a UTF-8 encoding even if they requested
ASCII.
This has the potential to break quite a lot of code. At the very least it
breaks GHC's Unicode terminal detection logic, which attempts to catch
an invalid character when encoding a pair of smart-quotes. Of course,
this exception will never be thrown if a UTF-8 encoder is used.
Here we use the `char8` encoding to handle requests for ASCII encodings
in the event that we find iconv to be non-functional.
Fixes #10623.
Test Plan: Validate with T8959a
Reviewers: rwbarton, hvr, austin, hsyl20
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1059
GHC Trac Issues: #10623
|
Summary:
This applies a patch from Reid Barton and Sylvain Henry, which fixes a
disastrous infinite loop when iconv fails to load locale files, as
specified in #10298.
The fix is a bit of a hack but should be fine; for the actual reasoning
behind it, see `Note [Disaster and iconv]`.
In addition to this fix, we also patch up the IO Encoding utilities to
recognize several variations of the 'ASCII' encoding (including its
aliases) directly so that GHC can do conversions without iconv. This
allows a static binary to sit in an initramfs.
Authored-by: Reid Barton <rwbarton@gmail.com>
Authored-by: Sylvain Henry <hsyl20@gmail.com>
Signed-off-by: Austin Seipp <austin@well-typed.com>
Test Plan: Eyeballed it.
Reviewers: rwbarton, hvr
Subscribers: bgamari, thomie
Differential Revision: https://phabricator.haskell.org/D898
GHC Trac Issues: #10298, #7695
|
Starting with Haddock 2.16 there is new built-in support for since-annotations.
Note: This exposes a bug in the `@since` implementation (see e.g. `Data.Bits`)
|
...several modules in `base` recently touched by me
|
This is preparatory refactoring for avoiding import cycles
when `Data.Traversable` will be imported by `Control.Monad` and
`Data.List` for implementing #9586
|
This is preparatory work for reintroducing SPECIALISEs that were lost
in d94de87252d0fe2ae97341d186b03a2fbe136b04
Differential Revision: https://phabricator.haskell.org/D214
|
This removes language pragmas from Haskell modules which are implicitly
active with `default-language: Haskell2010`. Specifically, the following
language extension pragmas are removed by this commit:
- PatternGuards
- ForeignFunctionInterface
- EmptyDataDecls
- NoBangPatterns
Signed-off-by: Herbert Valerio Riedel <hvr@gnu.org>
|
This commit retroactively adds `/Since: 4.4.0.0/` annotations to symbols
newly added/exposed in `base-4.4.0.0` (as shipped with GHC 7.2.1).
See also 6368362f which adds the respective annotation for symbols newly
added in `base-4.7.0.0` (that goes together with GHC 7.8.1).
Signed-off-by: Herbert Valerio Riedel <hvr@gnu.org>
|
This commit retroactively adds `/Since: 4.5.[01].0/` annotations to symbols
newly added/exposed in `base-4.5.[01].0` (as shipped with GHC 7.4.[12]).
See also 6368362f which adds the respective annotation for symbols newly
added in `base-4.7.0.0` (that goes together with GHC 7.8.1).
Signed-off-by: Herbert Valerio Riedel <hvr@gnu.org>
|
and correct the documentation for hSetBinaryMode which claimed that
it was using the latin1 encoding when in fact it was using an
unchecked modulo-256 version of it.
|
This replaces the previous scheme (which used lone surrogates). The reason is that
there is Haskell software in the wild (i.e. the text package) that chokes on Char values
that do not represent Unicode characters.
This new approach will not work correctly if the reserved private-use characters are
actually encountered in the input, but we expect this to be rare.
|
patch series fixes #5061, #1414, #3309, #3308, #3307, #4006 and #4855.
The major changes are:
1) Make Foreign.C.String.*CString use the locale encoding
This change follows the FFI specification in Haskell 98, which
has never actually been implemented before.
The functions exported from Foreign.C.String are partially-applied
versions of those from GHC.Foreign, which allows the user to supply
their own TextEncoding.
We also introduce foreignEncoding as the name of the text encoding
that follows the FFI appendix in that it transliterates encoding
errors.
2) I also changed the code so that mkTextEncoding always tries the
native-Haskell decoders in preference to those from iconv, even on
non-Windows. The motivation here is simply that it is better for
compatibility if we do this, and those are the ones you get for
the utf* and latin1* predefined TextEncodings anyway.
3) Implement surrogate-byte error handling mode for TextEncoding
This implements PEP383-like behaviour so that we are able to
roundtrip byte strings through Strings without loss of information.
The withFilePath function now uses this encoding to get to/from CStrings,
so any code that uses that will get the right PEP383 behaviour automatically.
4) Implement three other coding failure modes: ignore, throw error, transliterate
These mimic the behaviour of the GNU Iconv extensions.
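The failure modes in items 3 and 4 are selected by a suffix on the encoding name passed to `mkTextEncoding`. A minimal sketch, assuming the `//ROUNDTRIP` and `//IGNORE` suffixes: `decodeBytes`, `encodeString`, `roundTrip` and `decodeIgnoring` are hypothetical helpers built on `GHC.Foreign`, written only for this illustration.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (TextEncoding, mkTextEncoding)

-- Hypothetical helper: decode raw bytes into a String with the given encoding.
decodeBytes :: TextEncoding -> [Word8] -> IO String
decodeBytes enc bytes =
  withArrayLen bytes $ \len ptr ->
    GHC.peekCStringLen enc (castPtr ptr, len)

-- Hypothetical helper: encode a String back into raw bytes.
encodeString :: TextEncoding -> String -> IO [Word8]
encodeString enc s =
  GHC.withCStringLen enc s $ \(ptr, len) ->
    peekArray len (castPtr ptr)

-- PEP383-like mode: undecodable bytes become escape characters on the
-- way in and are turned back into the same bytes on the way out.
roundTrip :: [Word8] -> IO [Word8]
roundTrip bytes = do
  enc <- mkTextEncoding "UTF-8//ROUNDTRIP"
  decodeBytes enc bytes >>= encodeString enc

-- Ignore mode: undecodable bytes are dropped from the decoded String.
decodeIgnoring :: [Word8] -> IO String
decodeIgnoring bytes = do
  enc <- mkTextEncoding "UTF-8//IGNORE"
  decodeBytes enc bytes
```

For the input bytes 0x61 0xFF 0x62 (where 0xFF is never valid in UTF-8), `roundTrip` is expected to return the same three bytes, while `decodeIgnoring` keeps only the two valid characters.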
|
Add explicit {-# LANGUAGE xxx #-} pragmas to each module, that say
what extensions that module uses. This makes it clearer where
different extensions are used in the (large, variegated) base package.
Now base.cabal doesn't need any extensions field.
Thanks to Bas van Dijk for doing all the work.
|
We keep all of the code page tables in the module
GHC.IO.Encoding.CodePage.Table. That file was generated automatically
by running codepages/MakeTable.hs; more details are in the comments at the
start of that script.
Storing the lookup tables adds about 40KB to each statically linked executable;
this only increases the size of a "hello world" program by about 7%.
Currently we do not support double-byte encodings (Chinese/Japanese/Korean), since
including those codepages would increase the table size to 400KB. It will be
straightforward to implement them once the work on library DLLs is finished.
|
noting that "//IGNORE" and "//TRANSLIT" suffixes can be used with GNU
iconv.
|
as suggested during the discussion on the libraries list.
|
These unused imports are detected by the new unused-import code
|
Highlights:
* Unicode support for Handle I/O:
** Automatic encoding and decoding using a per-Handle encoding.
** The encoding defaults to the locale encoding (only on Unix
so far, perhaps Windows later).
** Built-in UTF-8, UTF-16 (BE/LE), and UTF-32 (BE/LE) codecs.
** iconv-based codec for other encodings on Unix
* Modularity: the low-level IO interface is exposed as a type class
(GHC.IO.IODevice) so you can build your own low-level IO providers and
make Handles from them.
* Newline translation: instead of being Windows-specific wired-in
magic, the translation from \r\n -> \n and back again is available
on all platforms and is configurable for reading/writing
independently.
Unicode-aware Handles
~~~~~~~~~~~~~~~~~~~~~
This is a significant restructuring of the Handle implementation with
the primary goal of supporting Unicode character encodings.
The only change to the existing behaviour is that by default, text IO
is done in the prevailing locale encoding of the system (except on
Windows [1]).
Handles created by openBinaryFile use the Latin-1 encoding, as do
Handles placed in binary mode using hSetBinaryMode.
We provide a way to change the encoding for an existing Handle:
    GHC.IO.Handle.hSetEncoding :: Handle -> TextEncoding -> IO ()
and various encodings (from GHC.IO.Encoding):
    latin1,
    utf8,
    utf16, utf16le, utf16be,
    utf32, utf32le, utf32be,
    localeEncoding,
and a way to look up other encodings:
    GHC.IO.Encoding.mkTextEncoding :: String -> IO TextEncoding
(it's system-dependent whether the requested encoding will be
available).
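A minimal usage sketch of a per-Handle encoding, assuming the names as they are exported from `System.IO` in current versions of `base`. `writeThenReadRaw` is a hypothetical helper: it writes a string through a Handle with the chosen encoding, then reopens the file in binary mode so that each byte comes back as one `Char` in the range 0 to 255.

```haskell
import System.IO

-- Hypothetical helper: write text with an explicit encoding, then read
-- the raw bytes back through a binary-mode Handle.
writeThenReadRaw :: FilePath -> TextEncoding -> String -> IO String
writeThenReadRaw path enc s = do
  withFile path WriteMode $ \h -> do
    hSetEncoding h enc                      -- per-Handle encoding
    hSetNewlineMode h noNewlineTranslation  -- keep '\n' as a single byte
    hPutStr h s
  withBinaryFile path ReadMode $ \h -> do
    contents <- hGetContents h
    -- Force the lazy contents before the Handle is closed.
    length contents `seq` return contents
```

Writing "caf\xE9" with `utf8` should give back the five characters "caf\xC3\xA9", i.e. the two-byte UTF-8 form of U+00E9.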
We may want to export these from somewhere more permanent; that's a
topic for a future library proposal.
Thanks to suggestions from Duncan Coutts, it's possible to call
hSetEncoding even on buffered read Handles, and the right thing
happens. So we can read from text streams that include multiple
encodings, such as an HTTP response or email message, without having
to turn buffering off (though there is a penalty for switching
encodings on a buffered Handle, as the IO system has to do some
re-decoding to figure out where it should start reading from again).
If there is a decoding error, it is reported when an attempt is made
to read the offending character from the Handle, as you would expect.
Performance varies. For "hGetContents >>= putStr" I found the new
library was faster on my x86_64 machine, but slower on an x86. On the
whole I'd expect things to be a bit slower due to the extra
decoding/encoding, but probably not noticeably. If performance is
critical for your app, then you should be using bytestring and text
anyway.
[1] Note: locale encoding is not currently implemented on Windows due
to the built-in Win32 APIs for encoding/decoding not being sufficient
for our purposes. Ask me for details. Offers of help gratefully
accepted.
Newline Translation
~~~~~~~~~~~~~~~~~~~
In the old IO library, text-mode Handles on Windows had automatic
translation from \r\n -> \n on input, and the opposite on output. It
was implemented using the underlying CRT functions, which meant that
there were certain odd restrictions, such as read/write text handles
needing to be unbuffered, and seeking not working at all on text
Handles.
In the rewrite, newline translation is now implemented in the upper
layers, as it needs to be since we have to perform Unicode decoding
before newline translation. This means that it is now available on
all platforms, which can be quite handy for writing portable code.
For now, I have left the behaviour as it was, namely \r\n -> \n on
Windows, and no translation on Unix. However, another reasonable
default (similar to what Python does) would be to do \r\n -> \n on
input, and convert to the platform-native representation (either \r\n
or \n) on output. This is called universalNewlineMode (below).
The API is as follows. (available from GHC.IO.Handle for now, again
this is something we will probably want to try to get into System.IO
at some point):
    -- | The representation of a newline in the external file or stream.
    data Newline = LF    -- ^ "\n"
                 | CRLF  -- ^ "\r\n"
                 deriving Eq

    -- | Specifies the translation, if any, of newline characters between
    -- internal Strings and the external file or stream. Haskell Strings
    -- are assumed to represent newlines with the '\n' character; the
    -- newline mode specifies how to translate '\n' on output, and what to
    -- translate into '\n' on input.
    data NewlineMode
      = NewlineMode { inputNL :: Newline,
                        -- ^ the representation of newlines on input
                      outputNL :: Newline
                        -- ^ the representation of newlines on output
                    }
                    deriving Eq

    -- | The native newline representation for the current platform
    nativeNewline :: Newline

    -- | Map "\r\n" into "\n" on input, and "\n" to the native newline
    -- representation on output. This mode can be used on any platform, and
    -- works with text files using any newline convention. The downside is
    -- that @readFile a >>= writeFile b@ might yield a different file.
    universalNewlineMode :: NewlineMode
    universalNewlineMode = NewlineMode { inputNL = CRLF,
                                         outputNL = nativeNewline }

    -- | Use the native newline representation on both input and output
    nativeNewlineMode :: NewlineMode
    nativeNewlineMode = NewlineMode { inputNL = nativeNewline,
                                      outputNL = nativeNewline }

    -- | Do no newline translation at all.
    noNewlineTranslation :: NewlineMode
    noNewlineTranslation = NewlineMode { inputNL = LF, outputNL = LF }

    -- | Change the newline translation mode on the Handle.
    hSetNewlineMode :: Handle -> NewlineMode -> IO ()
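The semantics of a `NewlineMode` can be modelled with two pure functions. This is only a sketch of the behaviour described above, not the buffer-level implementation in the IO library; `decodeNewlines` and `encodeNewlines` are hypothetical names, and the types are imported from `System.IO`, where they are exported in current versions of `base`.

```haskell
import System.IO
  ( Newline (..)
  , NewlineMode (..)
  , noNewlineTranslation
  , universalNewlineMode
  )

-- Model of input translation: with inputNL = CRLF every "\r\n" becomes
-- "\n"; with inputNL = LF the text is passed through unchanged.
decodeNewlines :: NewlineMode -> String -> String
decodeNewlines mode = case inputNL mode of
  LF   -> id
  CRLF -> go
  where
    go ('\r' : '\n' : rest) = '\n' : go rest
    go (c : rest)           = c : go rest
    go []                   = []

-- Model of output translation: with outputNL = CRLF every '\n' is
-- written as "\r\n"; with outputNL = LF nothing changes.
encodeNewlines :: NewlineMode -> String -> String
encodeNewlines mode = case outputNL mode of
  LF   -> id
  CRLF -> concatMap (\c -> if c == '\n' then "\r\n" else [c])
```

Note that a lone "\n" on input is left alone even when `inputNL` is CRLF, which is why `universalNewlineMode` accepts files using either convention.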
IO Devices
~~~~~~~~~~
The major change here is that the implementation of the Handle
operations is separated from the underlying IO device, using type
classes. File descriptors are just one IO provider; I have also
implemented memory-mapped files (good for random-access read/write)
and a Handle that pipes output to a Chan (useful for testing code that
writes to a Handle). New kinds of Handle can be implemented outside
the base package, for instance someone could write bytestringToHandle.
A Handle is made using mkFileHandle:
    -- | makes a new 'Handle'
    mkFileHandle :: (IODevice dev, BufferedIO dev, Typeable dev)
                 => dev -- ^ the underlying IO device, which must support
                        -- 'IODevice', 'BufferedIO' and 'Typeable'
                 -> FilePath
                        -- ^ a string describing the 'Handle', e.g. the file
                        -- path for a file. Used in error messages.
                 -> IOMode
                        -- ^ The mode in which the 'Handle' is to be used
                 -> Maybe TextEncoding
                        -- ^ text encoding to use, if any
                 -> NewlineMode
                        -- ^ newline translation mode
                 -> IO Handle
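A minimal sketch of building a Handle from an IO device by hand, using the file-descriptor device from `GHC.IO.FD`. It assumes `GHC.IO.FD.openFile` takes a path, an `IOMode` and a non-blocking flag and returns the `FD` together with its device type, and that `mkFileHandle` is exported from `GHC.IO.Handle`; `openUtf8ForWriting` and `readRawBytes` are hypothetical helpers for this illustration.

```haskell
import qualified GHC.IO.FD as FD
import GHC.IO.Handle (mkFileHandle)
import System.IO

-- Hypothetical helper: open a file descriptor device and wrap it in a
-- Handle that encodes text as UTF-8 and does no newline translation.
openUtf8ForWriting :: FilePath -> IO Handle
openUtf8ForWriting path = do
  -- The third argument asks for a blocking (False) file descriptor.
  (fd, _deviceType) <- FD.openFile path WriteMode False
  mkFileHandle fd path WriteMode (Just utf8) noNewlineTranslation

-- Hypothetical helper: read a file's bytes, one Char per byte.
readRawBytes :: FilePath -> IO String
readRawBytes path =
  withBinaryFile path ReadMode $ \h -> do
    contents <- hGetContents h
    -- Force the lazy contents before the Handle is closed.
    length contents `seq` return contents
```

The resulting Handle behaves like any other: `hPutStr` followed by `hClose` writes the UTF-8 bytes to the file. Any other type with the required instances could be passed in place of the `FD`.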
This also means that someone can write a completely new IO
implementation on Windows based on native Win32 HANDLEs, and
distribute it as a separate package (I really hope somebody does
this!).
This restructuring isn't as radical as previous designs. I haven't
made any attempt to make a separate binary I/O layer, for example
(although hGetBuf/hPutBuf do bypass the text encoding and newline
translation). The main goal here was to get Unicode support in, and
to allow others to experiment with making new kinds of Handle. We
could split up the layers further later.
API changes and Module structure
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NB. GHC.IOBase and GHC.Handle are now DEPRECATED (they are still
present, but are just re-exporting things from other modules now).
For 6.12 we'll want to bump base to version 5 and add a base4-compat.
For now I'm using #if __GLASGOW_HASKELL__ >= 611 to avoid deprecation
warnings.
I split modules into smaller parts in many places. For example, we
now have GHC.IORef, GHC.MVar and GHC.IOArray containing the
implementations of IORef, MVar and IOArray respectively. This was
necessary for untangling dependencies, but it also makes things easier
to follow.
The new module structure for the IO-related parts of the base
package is:
GHC.IO
    Implementation of the IO monad; unsafe*; throw/catch
GHC.IO.IOMode
    The IOMode type
GHC.IO.Buffer
    Buffers and operations on them
GHC.IO.Device
    The IODevice and RawIO classes.
GHC.IO.BufferedIO
    The BufferedIO class.
GHC.IO.FD
    The FD type, with instances of IODevice, RawIO and BufferedIO.
GHC.IO.Exception
    IO-related Exceptions
GHC.IO.Encoding
    The TextEncoding type; built-in TextEncodings; mkTextEncoding
GHC.IO.Encoding.Types
GHC.IO.Encoding.Iconv
GHC.IO.Encoding.Latin1
GHC.IO.Encoding.UTF8
GHC.IO.Encoding.UTF16
GHC.IO.Encoding.UTF32
    Implementation internals for GHC.IO.Encoding
GHC.IO.Handle
    The main API for GHC's Handle implementation, provides all the Handle
    operations + mkFileHandle + hSetEncoding.
GHC.IO.Handle.Types
GHC.IO.Handle.Internals
GHC.IO.Handle.Text
    Implementation of Handles and operations.
GHC.IO.Handle.FD
    Parts of the Handle API implemented by file-descriptors: openFile,
    stdin, stdout, stderr, fdToHandle etc.
|