summaryrefslogtreecommitdiff
path: root/compiler/nativeGen
Commit message (Collapse)AuthorAgeFilesLines
...
* Dwarf: Produce .dwarf_aranges sectionBen Gamari2015-08-293-21/+77
| | | | | | | | | | Test Plan: Check with `readelf --debug-dump=ranges` Reviewers: scpmw, austin Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1174
* Dwarf: Produce {low,high}_pc attributes for compilation unitsBen Gamari2015-08-292-2/+13
| | | | | | | | | | | | | Some libraries (e.g. elfutils) need these otherwise they ignore our DWARF annotations. Test Plan: Test with elfutils' `readelf --debug-dump=cu_index` Reviewers: scpmw, austin Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1173
* Dwarf: Fix DW_AT_use_UTF8 attributeBen Gamari2015-08-292-5/+6
| | | | | | | | | | | | | Previously this was given in the body but not in the abbreviation table. Who knows what sort of havoc this was wrecking. Test Plan: Verify against DWARF4 specification Reviewers: scpmw, austin Subscribers: Tarrasch, thomie Differential Revision: https://phabricator.haskell.org/D1172
* Refactor: delete most of the module FastTypesThomas Miedema2015-08-215-92/+85
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverses some of the work done in #1405, and goes back to the assumption that the bootstrap compiler understands GHC-haskell. In particular: * use MagicHash instead of _ILIT and _CLIT * pattern matching on I# if possible, instead of using iUnbox unnecessarily * use Int#/Char#/Addr# instead of the following type synonyms: - type FastInt = Int# - type FastChar = Char# - type FastPtr a = Addr# * inline the following functions: - iBox = I# - cBox = C# - fastChr = chr# - fastOrd = ord# - eqFastChar = eqChar# - shiftLFastInt = uncheckedIShiftL# - shiftR_FastInt = uncheckedIShiftRL# - shiftRLFastInt = uncheckedIShiftRL# * delete the following unused functions: - minFastInt - maxFastInt - uncheckedIShiftRA# - castFastPtr - panicDocFastInt and pprPanicFastInt * rename panicFastInt back to panic# These functions remain, since they actually do something: * iUnbox * bitAndFastInt * bitOrFastInt Test Plan: validate Reviewers: austin, bgamari Subscribers: rwbarton Differential Revision: https://phabricator.haskell.org/D1141 GHC Trac Issues: #1405
* Delete FastBoolThomas Miedema2015-08-219-27/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverses some of the work done in Trac #1405, and assumes GHC is smart enough to do its own unboxing of booleans now. I would like to do some more performance measurements, but the code changes can be reviewed already. Test Plan: With a perf build: ./inplace/bin/ghc-stage2 nofib/spectral/simple/Main.hs -fforce-recomp +RTS -t --machine-readable before: ``` [("bytes allocated", "1300744864") ,("num_GCs", "302") ,("average_bytes_used", "8811118") ,("max_bytes_used", "24477464") ,("num_byte_usage_samples", "9") ,("peak_megabytes_allocated", "64") ,("init_cpu_seconds", "0.001") ,("init_wall_seconds", "0.001") ,("mutator_cpu_seconds", "2.833") ,("mutator_wall_seconds", "4.283") ,("GC_cpu_seconds", "0.960") ,("GC_wall_seconds", "0.961") ] ``` after: ``` [("bytes allocated", "1301088064") ,("num_GCs", "310") ,("average_bytes_used", "8820253") ,("max_bytes_used", "24539904") ,("num_byte_usage_samples", "9") ,("peak_megabytes_allocated", "64") ,("init_cpu_seconds", "0.001") ,("init_wall_seconds", "0.001") ,("mutator_cpu_seconds", "2.876") ,("mutator_wall_seconds", "4.474") ,("GC_cpu_seconds", "0.965") ,("GC_wall_seconds", "0.979") ] ``` CPU time seems to be up a bit, but I'm not sure. Unfortunately CPU time measurements are rather noisy. Reviewers: austin, bgamari, rwbarton Subscribers: nomeata Differential Revision: https://phabricator.haskell.org/D1143 GHC Trac Issues: #1405
* Fix misspelled function name in a commentBartosz Nitka2015-07-301-1/+1
| | | | | | | | | | | | Test Plan: none Reviewers: austin, simonmar, bgamari Reviewed By: simonmar, bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1112
* Fix todo in compiler/nativeGen: Rename Size to Formatmarkus2015-07-0721-935/+944
| | | | | | | | | | | | | | | | | | This commit renames the Size module in the native code generator to Format, as proposed by a todo, as well as adjusting parameter names in other modules that use it. Test Plan: validate Reviewers: austin, simonmar, bgamari Reviewed By: simonmar, bgamari Subscribers: bgamari, simonmar, thomie Projects: #ghc Differential Revision: https://phabricator.haskell.org/D865
* Implement PowerPC 64-bit native code backend for LinuxPeter Trommler2015-07-0310-238/+840
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extend the PowerPC 32-bit native code generator for "64-bit PowerPC ELF Application Binary Interface Supplement 1.9" by Ian Lance Taylor and "Power Architecture 64-Bit ELF V2 ABI Specification -- OpenPOWER ABI for Linux Supplement" by IBM. The latter ABI is mainly used on POWER7/7+ and POWER8 Linux systems running in little-endian mode. The code generator supports both static and dynamic linking. PowerPC 64-bit code for ELF ABI 1.9 and 2 is mostly position independent anyway, and thus so is all the code emitted by the code generator. In other words, -fPIC does not make a difference. rts/stg/SMP.h support is implemented. Following the spirit of the introductory comment in PPC/CodeGen.hs, the rest of the code is a straightforward extension of the 32-bit implementation. Limitations: * Code is generated only in the medium code model, which is also gcc's default * Local symbols are not accessed directly, which seems to also be the case for 32-bit * LLVM does not work, but this does not work on 32-bit either * Must use the system runtime linker in GHCi, because the GHC linker for "static" object files (rts/Linker.c) for PPC 64-bit is not implemented. The system runtime (dynamic) linker works. * The handling of the system stack (register 1) is not ELF- compliant so stack traces break. Instead of allocating a new stack frame, spill code should use the "official" spill area in the current stack frame and deallocation code should restore the back chain * DWARF support is missing Fixes #9863 Test Plan: validate (on powerpc, too) Reviewers: simonmar, trofi, erikd, austin Reviewed By: trofi Subscribers: bgamari, arnons1, kgardas, thomie Differential Revision: https://phabricator.haskell.org/D629 GHC Trac Issues: #9863
* Encode alignment in MO_Memcpy and friendsBen Gamari2015-06-163-48/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Alignment needs to be a compile-time constant. Previously the code generators had to jump through hoops to ensure this was the case as the alignment was passed as a CmmExpr in the arguments list. Now we take care of this up front. This fixes #8131. Authored-by: Reid Barton <rwbarton@gmail.com> Dusted-off-by: Ben Gamari <ben@smart-cactus.org> Tests for T8131 Test Plan: Validate Reviewers: rwbarton, austin Reviewed By: rwbarton, austin Subscribers: bgamari, carter, thomie Differential Revision: https://phabricator.haskell.org/D624 GHC Trac Issues: #8131
* Fix DWARF generation for MinGW (#10468)Peter Wortmann2015-06-113-12/+16
| | | | | | | Fortunately this is relatively straightforward - all we need to do is switch to a non-ELF-specific way of specifying object file sections and make sure that section-relative addresses work correctly. This is enough to make "gdb" work on MinGW builds.
* Greatly speed up nativeCodeGen/seqBlocksJoachim Breitner2015-05-161-18/+35
| | | | | | | | | When working on #10397, I noticed that "reorder" in nativeCodeGen/seqBlocks took more than 60% of the time. With this refactoring, it does not even show up in the profile any more. This fixes #10422. Differential Revision: https://phabricator.haskell.org/D893
* Restore unwind information generationPeter Wortmann2015-04-031-2/+2
| | | | | | | | | | | | | | | | While we want to reduce the amount of information generated into debug_info, it really doesn't make sense to remove block information for unwinding. This is a simple oversight introduced in 4ab57024, which severly reduces the usefulness of generated unwind data. Thanks to bitonic for spotting this! Reviewed By: austin Differential Revision: https://phabricator.haskell.org/D792 GHC Trac Issues: #10236
* Refactor the story around switches (#10137)Joachim Breitner2015-03-303-17/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This re-implements the code generation for case expressions at the Stg → Cmm level, both for data type cases as well as for integral literal cases. (Cases on float are still treated as before). The goal is to allow for fancier strategies in implementing them, for a cleaner separation of the strategy from the gritty details of Cmm, and to run this later than the Common Block Optimization, allowing for one way to attack #10124. The new module CmmSwitch contains a number of notes explaining this changes. For example, it creates larger consecutive jump tables than the previous code, if possible. nofib shows little significant overall improvement of runtime. The rather large wobbling comes from changes in the code block order (see #8082, not much we can do about it). But the decrease in code size alone makes this worthwhile. ``` Program Size Allocs Runtime Elapsed TotalMem Min -1.8% 0.0% -6.1% -6.1% -2.9% Max -0.7% +0.0% +5.6% +5.7% +7.8% Geometric Mean -1.4% -0.0% -0.3% -0.3% +0.0% ``` Compilation time increases slightly: ``` -1 s.d. ----- -2.0% +1 s.d. ----- +2.5% Average ----- +0.3% ``` The test case T783 regresses a lot, but it is the only one exhibiting any regression. The cause is the changed order of branches in an if-then-else tree, which makes the hoople data flow analysis traverse the blocks in a suboptimal order. Reverting that gets rid of this regression, but has a consistent, if only very small (+0.2%), negative effect on runtime. So I conclude that this test is an extreme outlier and no reason to change the code. Differential Revision: https://phabricator.haskell.org/D720
* Replace .lhs with .hs in compiler commentsYuri de Wit2015-02-093-4/+4
| | | | | | | | | | | | | | Summary: It looks like during .lhs -> .hs switch the comments were not updated. So doing exactly that. Reviewers: austin, jstolarek, hvr, goldfire Reviewed By: austin, jstolarek Subscribers: thomie, goldfire Differential Revision: https://phabricator.haskell.org/D621 GHC Trac Issues: #9986
* Dwarf generation fixed pt 2Peter Wortmann2015-01-133-15/+28
| | | | | | | | | - Don't bracket HsTick expression uneccessarily - Generate debug information in UTF8 - Reduce amount of information generated - we do not currently need block information, for example. Special thanks to slyfox for the reports!
* Remove redundant constraints in the compiler itself, found by ↵Simon Peyton Jones2015-01-063-10/+7
| | | | -fwarn-redundant-constraints
* Some Dwarf generation fixesPeter Wortmann2014-12-182-4/+18
| | | | | | | | - Make abbrev offset absolute on Non-Mac systems - Add another termination byte at the end of the abbrev section (readelf complains) - Scope combination was wrong for the simpler cases - Shouldn't have a "global/" in front of all scopes
* Generate DWARF unwind informationPeter Wortmann2014-12-164-4/+347
| | | | | | | | | | | | | | | | | | | | | | | | | | This tells debuggers such as GDB how to "unwind" a program state, which allows them to walk the stack up. Notes: * The code is quite general, perhaps unnecessarily so. Unless we get more unwind information, only the first case of pprSetUnwind will get used - and pprUnwindExpr and pprUndefUnwind will never be called. It just so happens that this is a point where we can get a lot of features cheaply, even if we don't use them. * When determining what location to show for a return address, most debuggers check the map for "rip-1", assuming that's where the "call" instruction is. For tables-next-to-code, that happens to always be the end of an info table. We therefore cheat a bit here by shifting .debug_frame information so it covers the end of the info table, as well as generating a .loc directive for the info table data. Debuggers will still show the wrong label for the return address, though. Haven't found a way around that one yet. (From Phabricator D396)
* Generate DWARF info sectionPeter Wortmann2014-12-165-33/+493
| | | | | | | | | | | | | | | | This is where we actually make GHC emit DWARF code. The info section contains all the general meta information bits as well as an entry for every block of native code. Notes: * We need quite a few new labels in order to properly address starts and ends of blocks. * Thanks to Nathan Howell for taking the iniative to get our own Haskell language ID for DWARF! (From Phabricator D396)
* Generate .loc/.file directives from source ticksPeter Wortmann2014-12-166-57/+137
| | | | | | | | | | | | | | | | | | | | | | | | This generates DWARF, albeit indirectly using the assembler. This is the easiest (and, apparently, quite standard) method of generating the .debug_line DWARF section. Notes: * Note we have to make sure that .file directives appear correctly before the respective .loc. Right now we ppr them manually, which makes them absent from dumps. Fixing this would require .file to become a native instruction. * We have to pass a lot of things around the native code generator. I know Ian did quite a bit of refactoring already, but having one common monad could *really* simplify things here... * To support SplitObjcs, we need to emit/reset all DWARF data at every split. We use the occassion to move split marker generation to cmmNativeGenStream as well, so debug data extraction doesn't have to choke on it. (From Phabricator D396)
* Debug data extraction (NCG support)Peter Wortmann2014-12-161-50/+85
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The purpose of the Debug module is to collect all required information to generate debug information (DWARF etc.) in the back-ends. Our main data structure is the "debug block", which carries all information we have about a block of code that is going to get produced. Notes: * Debug blocks are arranged into a tree according to tick scopes. This makes it easier to reason about inheritance rules. Note however that tick scopes are not guaranteed to form a tree, which requires us to "copy" ticks to not lose them. * This is also where we decide what source location we regard as representing a code block the "best". The heuristic is basically that we want the most specific source reference that comes from the same file we are currently compiling. This seems to be the most useful choice in my experience. * We are careful to not be too lazy so we don't end up breaking streaming. Debug data will be kept alive until the end of codegen, after all. * We change native assembler dumps to happen right away for every Cmm group. This simplifies the code somewhat and is consistent with how pretty much all of GHC handles dumps with respect to streamed code. (From Phabricator D169)
* Add unwind information to CmmPeter Wortmann2014-12-163-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Unwind information allows the debugger to discover more information about a program state, by allowing it to "reconstruct" other states of the program. In practice, this means that we explain to the debugger how to unravel stack frames, which comes down mostly to explaining how to find their Sp and Ip register values. * We declare yet another new constructor for CmmNode - and this time there's actually little choice, as unwind information can and will change mid-block. We don't actually make use of these capabilities, and back-end support would be tricky (generate new labels?), but it feels like the right way to do it. * Even though we only use it for Sp so far, we allow CmmUnwind to specify unwind information for any register. This is pretty cheap and could come in useful in future. * We allow full CmmExpr expressions for specifying unwind values. The advantage here is that we don't have to make up new syntax, and can e.g. use the WDS macro directly. On the other hand, the back-end will now have to simplify the expression until it can sensibly be converted into DWARF byte code - a process which might fail, yielding NCG panics. On the other hand, when you're writing Cmm by hand you really ought to know what you're doing. (From Phabricator D169)
* Tick scopesPeter Wortmann2014-12-163-3/+6
| | | | | | | | | | | | | | | | | | | | | | This patch solves the scoping problem of CmmTick nodes: If we just put CmmTicks into blocks we have no idea what exactly they are meant to cover. Here we introduce tick scopes, which allow us to create sub-scopes and merged scopes easily. Notes: * Given that the code often passes Cmm around "head-less", we have to make sure that its intended scope does not get lost. To keep the amount of passing-around to a minimum we define a CmmAGraphScoped type synonym here that just bundles the scope with a portion of Cmm to be assembled later. * We introduce new scopes at somewhat random places, aligning with getCode calls. This works surprisingly well, but we might have to add new scopes into the mix later on if we find things too be too coarse-grained. (From Phabricator D169)
* Source notes (Cmm support)Peter Wortmann2014-12-163-0/+3
| | | | | | | | | | | | | | | | | | | | This patch adds CmmTick nodes to Cmm code. This is relatively straight-forward, but also not very useful, as many blocks will simply end up with no annotations whatosever. Notes: * We use this design over, say, putting ticks into the entry node of all blocks, as it seems to work better alongside existing optimisations. Now granted, the reason for this is that currently GHC's main Cmm optimisations seem to mainly reorganize and merge code, so this might change in the future. * We have the Cmm parser generate a few source notes as well. This is relatively easy to do - worst part is that it complicates the CmmParse implementation a bit. (From Phabricator D169)
* powerpc: fix and enable shared libraries by default on linuxSergei Trofimovich2014-12-145-32/+88
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: And fix things all the way down to it. Namely: - remove 'r30' from free registers, it's an .LCTOC1 register for gcc. generated .plt stubs expect it to be initialised. - fix PicBase computation, which originally forgot to use 'tmp' reg in 'initializePicBase_ppc.fetchPC' - mark 'ForeighTarget's as implicitly using 'PicBase' register (see comment for details) - add 64-bit MO_Sub and test on alloclimit3/4 regtests - fix dynamic label offsets to match with .LCTOC1 offset Signed-off-by: Sergei Trofimovich <siarheit@google.com> Test Plan: validate passes equal amount of vanilla/dyn tests Reviewers: simonmar, erikd, austin Reviewed By: erikd, austin Subscribers: carter, thomie Differential Revision: https://phabricator.haskell.org/D560 GHC Trac Issues: #8024, #9831
* Unlit AsmCodeGen.lhsHerbert Valerio Riedel2014-11-301-4/+0
| | | | Fwiw, this wasn't really a proper .lhs to begin with...
* Allow -dead_strip linking on platforms with .subsections_via_symbolsMoritz Angermann2014-11-193-18/+3
| | | | | | | | | | | | | | | | | Summary: This allows to link objects produced with the llvm code generator to be linked with -dead_strip. This applies to at least the iOS cross compiler and OS X compiler. Signed-off-by: Moritz Angermann <moritz@lichtzwerge.de> Test Plan: Create a ffi library and link it with -dead_strip. If the resulting binary does not crash, the patch works as advertised. Reviewers: rwbarton, simonmar, hvr, dterei, mzero, ezyang, austin Reviewed By: dterei, ezyang, austin Subscribers: thomie, mzero, simonmar, ezyang, carter Differential Revision: https://phabricator.haskell.org/D206
* Per-thread allocation counters and limitsSimon Marlow2014-11-123-21/+60
| | | | | | | | This reverts commit f0fcc41d755876a1b02d1c7c79f57515059f6417. New changes: now works on 32-bit platforms too. I added some basic support for 64-bit subtraction and comparison operations to the x86 NCG.
* Some refactoring around endPass and debug dumpingSimon Peyton Jones2014-11-041-4/+4
| | | | | I forget all the details, but I spent some time trying to understand the current setup, and tried to simplify it a bit
* Revert "Place static closures in their own section."Edward Z. Yang2014-10-203-6/+0
| | | | | | | | | | This reverts commit b23ba2a7d612c6b466521399b33fe9aacf5c4f75. Conflicts: compiler/cmm/PprCmmDecl.hs compiler/nativeGen/PPC/Ppr.hs compiler/nativeGen/SPARC/Ppr.hs compiler/nativeGen/X86/Ppr.hs
* Indentation and non-semantic changes only.Edward Z. Yang2014-10-193-76/+80
| | | | | | | | | | | | | | | Summary: Get these lines fitting in 80 columns, and replace ptext (sLit ...) with text Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu> Test Plan: validate Reviewers: simonmar, austin Subscribers: thomie, carter, ezyang, simonmar Differential Revision: https://phabricator.haskell.org/D342
* Implement optimized NCG `MO_Ctz W64` op for i386 (#9340)Herbert Valerio Riedel2014-10-181-9/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is an optimization to the CTZ primops introduced for #9340 Previously we called out to `hs_ctz64`, but we can actually generate better hand-tuned code while avoiding the FFI ccall. With this patch, the code {-# LANGUAGE MagicHash #-} module TestClz0 where import GHC.Prim ctz64 :: Word64# -> Word# ctz64 x = ctz64# x results in the following assembler generated by NCG on i386: TestClz.ctz64_info: movl (%ebp),%eax movl 4(%ebp),%ecx movl %ecx,%edx orl %eax,%edx movl $64,%edx je _nAO bsf %ecx,%ecx addl $32,%ecx bsf %eax,%eax cmovne %eax,%ecx movl %ecx,%edx _nAO: movl %edx,%esi addl $8,%ebp jmp *(%ebp) For comparision, here's what LLVM 3.4 currently generates: 000000fc <TestClzz_ctzz64_info>: fc: 0f bc 45 04 bsf 0x4(%ebp),%eax 100: b9 20 00 00 00 mov $0x20,%ecx 105: 0f 45 c8 cmovne %eax,%ecx 108: 83 c1 20 add $0x20,%ecx 10b: 8b 45 00 mov 0x0(%ebp),%eax 10e: 8b 55 08 mov 0x8(%ebp),%edx 111: 0f bc f0 bsf %eax,%esi 114: 85 c0 test %eax,%eax 116: 0f 44 f1 cmove %ecx,%esi 119: 83 c5 08 add $0x8,%ebp 11c: ff e2 jmp *%edx Reviewed By: austin Auditors: simonmar Differential Revision: https://phabricator.haskell.org/D163
* Code size micro-optimizations in the X86 backendReid Barton2014-10-071-1/+34
| | | | | | | | | | | | | | | | | | | | | Summary: Carter Schonwald suggested looking for opportunities to replace instructions in GHC's output by equivalent ones that are shorter, as recommended by the Intel optimization manuals. This patch reduces the module sizes as reported by nofib by about 1.5% on x86_64. Test Plan: Built an i386 cross-compiler and ran the test suite; the same (rather large) set of tests failed before and after this commit. Will let Harbormaster validate on x86_64. Reviewers: austin Subscribers: thomie, carter, ezyang, simonmar Differential Revision: https://phabricator.haskell.org/D320
* Place static closures in their own section.Edward Z. Yang2014-10-013-0/+6
| | | | | | | | | | | | | | | | | | | | | | | Summary: The primary reason for doing this is assisting debuggability: if static closures are all in the same section, they are guaranteed to be adjacent to one another. This will help later when we add some code that takes section start/end and uses this to sanity-check the sections. Part of remove HEAP_ALLOCED patch set (#8199) Signed-off-by: Edward Z. Yang <ezyang@mit.edu> Test Plan: validate Reviewers: simonmar, austin Subscribers: simonmar, ezyang, carter, thomie Differential Revision: https://phabricator.haskell.org/D263 GHC Trac Issues: #8199
* Stop exporting, and stop using, functions marked as deprecatedThomas Miedema2014-09-275-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | Don't export `getUs` and `getUniqueUs`. `UniqSM` has a `MonadUnique` instance: instance MonadUnique UniqSM where getUniqueSupplyM = getUs getUniqueM = getUniqueUs getUniquesM = getUniquesUs Commandline-fu used: git grep -l 'getUs\>' | grep -v compiler/basicTypes/UniqSupply.lhs | xargs sed -i 's/getUs/getUniqueSupplyM/g git grep -l 'getUniqueUs\>' | grep -v combiler/basicTypes/UniqSupply.lhs | xargs sed -i 's/getUniqueUs/getUniqueM/g' Follow up on b522d3a3f970a043397a0d6556ca555648e7a9c3 Reviewed By: austin, hvr Differential Revision: https://phabricator.haskell.org/D220
* Delete hack that was once needed to fix the buildThomas Miedema2014-09-252-0/+2
| | | | | | | | | | | | | | | | | Summary: Introduced in 6c7b41cc2b24f533697a62bf1843507ae043fc97. I checked the rest of that commit, and this is all that was left to revert. Test Plan: x Reviewers: ezyang, austin Reviewed By: ezyang, austin Subscribers: simonmar, ezyang, carter, thomie Differential Revision: https://phabricator.haskell.org/D241
* Make Applicative a superclass of MonadAustin Seipp2014-09-093-5/+11
| | | | | | | | | | | | | | | | | | | | | Summary: This includes pretty much all the changes needed to make `Applicative` a superclass of `Monad` finally. There's mostly reshuffling in the interests of avoid orphans and boot files, but luckily we can resolve all of them, pretty much. The only catch was that Alternative/MonadPlus also had to go into Prelude to avoid this. As a result, we must update the hsc2hs and haddock submodules. Signed-off-by: Austin Seipp <austin@well-typed.com> Test Plan: Build things, they might not explode horribly. Reviewers: hvr, simonmar Subscribers: simonmar Differential Revision: https://phabricator.haskell.org/D13
* `M-x delete-trailing-whitespace` & `M-x untabify`...Herbert Valerio Riedel2014-08-311-2/+2
| | | | | | ...some files more or less recently touched by me [ci skip]
* Add MO_AddIntC, MO_SubIntC MachOps and implement in X86 backendReid Barton2014-08-235-0/+29
| | | | | | | | | | | | | | | | | | | | | | | | Summary: These MachOps are used by addIntC# and subIntC#, which in turn are used in integer-gmp when adding or subtracting small Integers. The following benchmark shows a ~6% speedup after this commit on x86_64 (building GHC with BuildFlavour=perf). {-# LANGUAGE MagicHash #-} import GHC.Exts import Criterion.Main count :: Int -> Integer count (I# n#) = go n# 0 where go :: Int# -> Integer -> Integer go 0# acc = acc go n# acc = go (n# -# 1#) $! acc + 1 main = defaultMain [bgroup "count" [bench "100" $ whnf count 100]] Differential Revision: https://phabricator.haskell.org/D140
* Implement new CLZ and CTZ primops (re #9340)Herbert Valerio Riedel2014-08-144-0/+89
| | | | | | | | | | | | | | | | | | | | | | | | This implements the new primops clz#, clz32#, clz64#, ctz#, ctz32#, ctz64# which provide efficient implementations of the popular count-leading-zero and count-trailing-zero respectively (see testcase for a pure Haskell reference implementation). On x86, NCG as well as LLVM generates code based on the BSF/BSR instructions (which need extra logic to make the 0-case well-defined). Test Plan: validate and succesful tests on i686 and amd64 Reviewers: rwbarton, simonmar, ezyang, austin Subscribers: simonmar, relrod, ezyang, carter Differential Revision: https://phabricator.haskell.org/D144 GHC Trac Issues: #9340
* x86: zero extend the result of 16-bit popcnt instructions (#9435)Reid Barton2014-08-121-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The 'popcnt r16, r/m16' instruction only writes the low 16 bits of the destination register, so we have to zero-extend the result to a full word as popCnt16# is supposed to return a Word#. For popCnt8# we could instead zero-extend the input to 32 bits and then do a 32-bit popcnt, and not have to zero-extend the result. LLVM produces the 16-bit popcnt sequence with two zero extensions, though, and who am I to argue? Test Plan: - ran "make TEST=cgrun071 EXTRA_HC_OPTS=-msse42" - then ran again adding "WAY=optasm", and verified that the popcnt sequences we generate match the ones produced by LLVM for its @llvm.ctpop.* intrinsics Reviewers: austin, hvr, tibbe Reviewed By: austin, hvr, tibbe Subscribers: phaskell, hvr, simonmar, relrod, ezyang, carter Differential Revision: https://phabricator.haskell.org/D147 GHC Trac Issues: #9435
* Add CMOVcc insns to x86 NCGHerbert Valerio Riedel2014-08-122-0/+18
| | | | | | | | | | | | | | This is a pre-requisite for implementing count-{leading,trailing}-zero prim-ops (re #9340) and may be useful to NCG to help turn some code into branch-less code sequences. Test Plan: Compiles and validates in combination with clz/ctz primop impl Reviewers: ezyang, rwbarton, simonmar, austin Subscribers: simonmar, relrod, ezyang, carter Differential Revision: https://phabricator.haskell.org/D141
* Add bit scan {forward,reverse} insns to x86 NCGHerbert Valerio Riedel2014-08-122-2/+10
| | | | | | | | | | | This is a pre-requisite for implementing count-{leading,trailing}-zero prim-ops (re #9340) Reviewers: ezyang, rwbarton, simonmar, austin Subscribers: simonmar, relrod, ezyang, carter Differential Revision: https://phabricator.haskell.org/D141
* x86: Always generate add instruction in MO_Add2 (#9013)Reid Barton2014-08-113-2/+14
| | | | | | | | | | | | | Test Plan: - ran validate - ran T9013 test with all ways - ran CarryOverflow test with all ways, for good measure Reviewers: austin, simonmar Reviewed By: simonmar Differential Revision: https://phabricator.haskell.org/D137
* Eliminate some code duplication in x86 backend (genCCall32/64)Reid Barton2014-08-101-101/+13
| | | | | | | | | | | | | | | | | | | | | | | | Summary: No functional changes except in panic messages. These functions were identical except for - x87 operations in genCCall32 - the fallback to genCCall32'/64' - "32" vs "64" in panic messages (one case was wrong!) - minor syntactic or otherwise non-functional differences. Test Plan: Ran "validate --no-dph --slow" before and after the change. Only differences were two tests that failed before the change but not after, further investigation revealed that those tests are in fact erratic. Reviewers: simonmar, austin Reviewed By: austin Subscribers: phaskell, simonmar, relrod, ezyang, carter Differential Revision: https://phabricator.haskell.org/D139
* refactor to fix 80column overflowSimon Marlow2014-08-011-16/+20
|
* Allow multiple entry points when allocating recursive groups (#9303)Simon Marlow2014-07-312-35/+39
| | | | | | | | | | | | | | | | | Summary: In this example we ended up with some code that was only reachable via an info table, because a branch had been optimised away by the native code generator. The register allocator then got confused because it was only considering the first block of the proc to be an entry point, when actually any of the info tables are entry points. Test Plan: validate Reviewers: simonpj, austin Subscribers: simonmar, relrod, carter Differential Revision: https://phabricator.haskell.org/D88
* Add missing memory fence to atomicWriteIntArray#Johan Tibell2014-07-233-1/+7
|
* X86 codegen: make LOCK a real instruction prefixJohan Tibell2014-07-233-12/+8
| | | | | | | | | | Before LOCK was a separate instruction and this led to the register allocator separating it from the instruction it was supposed to be a prefix of, leading to illegal assembly such as lock mov Fix contributed by PÁLI Gábor János.
* Rename PackageId to PackageKey, distinguishing it from Cabal's PackageId.Edward Z. Yang2014-07-212-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Previously, both Cabal and GHC defined the type PackageId, and we expected them to be roughly equivalent (but represented differently). This refactoring separates these two notions. A package ID is a user-visible identifier; it's the thing you write in a Cabal file, e.g. containers-0.9. The components of this ID are semantically meaningful, and decompose into a package name and a package vrsion. A package key is an opaque identifier used by GHC to generate linking symbols. Presently, it just consists of a package name and a package version, but pursuant to #9265 we are planning to extend it to record other information. Within a single executable, it uniquely identifies a package. It is *not* an InstalledPackageId, as the choice of a package key affects the ABI of a package (whereas an InstalledPackageId is computed after compilation.) Cabal computes a package key for the package and passes it to GHC using -package-name (now *extremely* misnamed). As an added bonus, we don't have to worry about shadowing anymore. As a follow on, we should introduce -current-package-key having the same role as -package-name, and deprecate the old flag. This commit is just renaming. The haddock submodule needed to be updated. Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu> Test Plan: validate Reviewers: simonpj, simonmar, hvr, austin Subscribers: simonmar, relrod, carter Differential Revision: https://phabricator.haskell.org/D79 Conflicts: compiler/main/HscTypes.lhs compiler/main/Packages.lhs utils/haddock