delta/linux.git - git.kernel.org: pub/scm/linux/kernel/git/torvalds/linux.git

	Commit message	Author	Age	Files	Lines
*	drm/vc4: Add the HVS's interrupt to the device tree.	Eric Anholt	2015-07-14	2	-0/+2
\| \| \| \|	Signed-off-by: Eric Anholt <eric@anholt.net>
*	drm/vc4: Reconfigure DISPCTRLn registers across mode sets.	Eric Anholt	2015-07-14	2	-0/+90
\| \| \| \| \| \| \| \| \|	This register sets the output width/height of the CRTC, which will be read in by the PV. This gets us to the point of mode sets to a new resolution working sometimes, though occasionally I end up with just a black screen. Signed-off-by: Eric Anholt <eric@anholt.net>
*	drm/vc4: Add support for async pageflips.	Eric Anholt	2015-06-30	5	-7/+140
\| \| \| \| \| \| \| \|	This should be the fastest screen update path we have, but it doesn't seem to have really improved performance with vblank_mode=0 fullscreen. Signed-off-by: Eric Anholt <eric@anholt.net>
*	drm/vc4: Add tracepoints for seqno waits.	Eric Anholt	2015-06-30	4	-0/+83
\| \| \| \| \| \| \|	They proved valuable on i915, and I expect them to do so on vc4 as well. Signed-off-by: Eric Anholt <eric@anholt.net>
*	drm/vc4: Implement async atomic modesets.	Eric Anholt	2015-06-29	4	-22/+129
\| \| \| \| \| \| \| \| \|	In order to support the pageflip ioctl, we have to be able to defer the body of the mode set to a workqueue, so that the pageflip ioctl can return immediately. Pageflipping gets us ~3x performance on fullscreen GL rendering in X. Signed-off-by: Eric Anholt <eric@anholt.net>
*	drm/vc4: When asked for a page flip event, send it at vblank time.	Eric Anholt	2015-06-29	3	-0/+62
\| \| \| \|	Signed-off-by: Eric Anholt <eric@anholt.net>
*	drm/vc4: Clear the vblank IRQ before doing the work to handle it.	Eric Anholt	2015-06-29	1	-1/+1
\| \| \| \| \| \| \| \|	It's unlikely that we'd hit another vblank during drm_crtc_handle_vblank() time, but we might as well handle it if we do. Signed-off-by: Eric Anholt <eric@anholt.net>
*	drm/vc4: Implement custom atomic commit in vc4.	Eric Anholt	2015-06-29	1	-1/+90
\| \| \| \| \| \| \| \| \| \| \|	This is just drm_atomic_helper_commit(), with wait_for_fence replaced with V3D seqno waits based on msm's atomic commit's wait loop. This means we now synchronize to our own rendering. The old wait_for_fence loop did nothing since we never attached a dmabuf fence to our plane state. Signed-off-by: Eric Anholt <eric@anholt.net>
*	drm/vc4: Add an option for uninterruptible V3D waits.	Eric Anholt	2015-06-29	3	-7/+9
\| \| \| \| \| \| \|	All of our current interfaces want to be interruptible, but modesetting doesn't want that. Signed-off-by: Eric Anholt <eric@anholt.net>
*	drm/vc4: Expose the atomic ioctl.	Eric Anholt	2015-06-29	1	-0/+1
\| \| \| \| \| \|	We want to have support for this from the beginning. Signed-off-by: Eric Anholt <eric@anholt.net>
*	drm/vc4: Add support for negative plane X/Y positions.	Eric Anholt	2015-06-29	1	-5/+22
\| \| \| \| \| \| \|	This fixes WARN_ONs and bad plane configuration when the X cursor hit the top/left of the screen. Signed-off-by: Eric Anholt <eric@anholt.net>
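One way to picture the negative-position handling is as a clip against the top/left edge: skip the hidden pixels in the source buffer and pin the plane origin to 0. The following is a minimal userspace sketch under that assumption; `struct plane_rect`, `clip_negative_origin()`, and the parameter names are hypothetical and are not taken from the vc4 source.

```c
#include <stdint.h>

/* Hypothetical plane description; field names are illustrative only. */
struct plane_rect {
	int32_t x, y;     /* position on the CRTC, may be negative */
	int32_t w, h;     /* visible size in pixels */
	uint32_t offset;  /* byte offset of the first scanned-out pixel */
};

/*
 * Clip a plane whose origin lies left of or above the screen.
 * pitch is bytes per row, cpp is bytes per pixel.
 * Assumes the caller has already rejected fully off-screen planes.
 */
void clip_negative_origin(struct plane_rect *r, uint32_t pitch, uint32_t cpp)
{
	if (r->x < 0) {
		r->offset += cpp * (uint32_t)(-r->x);   /* skip hidden columns */
		r->w += r->x;                           /* shrink visible width */
		r->x = 0;
	}
	if (r->y < 0) {
		r->offset += pitch * (uint32_t)(-r->y); /* skip hidden rows */
		r->h += r->y;                           /* shrink visible height */
		r->y = 0;
	}
}
```

For a 64x64 cursor at (-4, -2) with a 256-byte pitch and 4 bytes per pixel, the sketch yields a 60x62 plane at (0, 0) starting 528 bytes into the buffer.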
*	drm/vc4: Switch the driver from 2835 to 2708/2709.	Eric Anholt	2015-06-23	6	-35/+64
\|
*	ARM: BCM2709: Make the upstream-targeted mbox functions work on 2709 instead.	Eric Anholt	2015-06-23	3	-91/+21
\|
*	Merge remote-tracking branch 'vc4-kms-v3d' into vc4-kms-v3d-rpi2	Eric Anholt	2015-06-23	54	-113/+8359
\|\
\| *	drm/vc4: Make the kernel allocate the tile state/alloc buffers.	Eric Anholt	2015-06-17	4	-63/+66
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes a security hole of letting userspace map and modify the buffers, keeps each userspace client from needing to hang on to a set of them, reduces kernel validation overhead, and improves stability (possibly due to reducing the severity of an addressing issue in the hardware's tile state buffer). Signed-off-by: Eric Anholt <eric@anholt.net>
\| *	drm/vc4: Switch to generating RCLs from the kernel.	Eric Anholt	2015-06-17	6	-321/+561
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Validation was expensive, and there aren't enough variations of RCLs to warrant it. This also may let us cache generated RCLs, to reduce overhead even further. Signed-off-by: Eric Anholt <eric@anholt.net>
\| *	drm/vc4: Try even harder to succeed at allocations (wait for GPU).	Eric Anholt	2015-06-17	3	-20/+50
\| \| \| \| \| \| \| \|	Signed-off-by: Eric Anholt <eric@anholt.net>
\| *	drm/vc4: Clean up BO allocation from the cache a bit.	Eric Anholt	2015-06-17	1	-7/+5
\| \| \| \| \| \| \| \|	Signed-off-by: Eric Anholt <eric@anholt.net>
\| *	drm/vc4: Clean up V3D interrupt handling, and document how it works.	Eric Anholt	2015-06-17	1	-13/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	My surprise about having to set INTENA for OUTOMEM when I hadn't cleared anything in it now makes sense, having found that there is just one set of underlying enable bits behind the INTENA/INTDIS registers. Signed-off-by: Eric Anholt <eric@anholt.net>
\| *	drm/vc4: Allocate the correct size for our bin BO.	Eric Anholt	2015-06-17	2	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \|	The 256k was left over from early bringup. Signed-off-by: Eric Anholt <eric@anholt.net>
\| *	drm/vc4: Sync validation code from userspace.	Eric Anholt	2015-06-17	3	-111/+287
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We get validation that direct texturing doesn't address off the start of the array, some more symbolic #defines, and a fix for the dithering bits. Signed-off-by: Eric Anholt <eric@anholt.net>
\| *	drm/vc4: Don't forget to turn the hardware off on unbind.	Eric Anholt	2015-06-17	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This helps make sure that we don't have any state left around (like the binner overflow allocation). Signed-off-by: Eric Anholt <eric@anholt.net>
\| *	drm/vc4: Prevent clients from racing far ahead of execution.	Eric Anholt	2015-06-04	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is policy that we've often left up to userland in various DRM drivers. However, given how thoroughly memory-constrained we are on VC4, letting userspace run away with things and over-inflate their caches is definitely not helping. This hasn't reduced maximum throughput in my testing, since we're only waiting until the exec just before the one we queued is finished, so there will be a job to submit the moment the GPU sends us the frame done interrupt. We would only actually idle the GPU if assembling some scenes takes more CPU time than the time it takes to complete some scenes on the GPU. Signed-off-by: Eric Anholt <eric@anholt.net>
\| *	drm/vc4: Return the proper incremented seqno for our job.	Eric Anholt	2015-06-04	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	I was returning the seqno before the increment, so userspace's waits on job completion could be too early. Signed-off-by: Eric Anholt <eric@anholt.net>
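The effect of returning the pre-increment seqno can be shown with a small userspace model. This is only a sketch: `struct job_queue` and the function names are hypothetical, and the real driver code is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the driver's sequence-number bookkeeping. */
struct job_queue {
	uint64_t emit_seqno;      /* seqno of the most recently queued job */
	uint64_t finished_seqno;  /* seqno of the most recently completed job */
};

/* Buggy variant: hands back the value from before the increment,
 * which is the previous job's seqno. */
uint64_t queue_job_buggy(struct job_queue *q)
{
	return q->emit_seqno++;
}

/* Fixed variant: hands back the seqno that belongs to the new job. */
uint64_t queue_job_fixed(struct job_queue *q)
{
	return ++q->emit_seqno;
}

/* A wait on seqno may return once this is true. */
bool seqno_passed(const struct job_queue *q, uint64_t seqno)
{
	return q->finished_seqno >= seqno;
}
```

With the buggy variant, a wait on the returned value can succeed while the new job is still running, because that value was already reached by the preceding job.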
\| *	drm/vc4: Use the power-of-two divider round_up instead of roundup.	Eric Anholt	2015-06-04	1	-10/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This avoids generating a uidiv, and saves us another ~.6% in check_tex_size as a result of loadstore_tile_buffer_general. Signed-off-by: Eric Anholt <eric@anholt.net>
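The difference between the two rounding helpers can be sketched in plain C. The functions below are illustrative stand-ins written for this note, not the kernel's `roundup()` and `round_up()` macros themselves; the point is that the power-of-two form needs only an add and a mask, while the general form needs a division, which is costly on ARM cores without a hardware divider.

```c
#include <stdint.h>

/* General form: works for any nonzero y, but requires a division. */
uint32_t roundup_generic(uint32_t x, uint32_t y)
{
	return ((x + y - 1) / y) * y;
}

/* Power-of-two form: y MUST be a power of two; no division is needed. */
uint32_t round_up_pow2(uint32_t x, uint32_t y)
{
	return (x + y - 1) & ~(y - 1);
}
```

Both return 128 for `x = 65, y = 64`; only the generic form is valid for a divisor such as 3.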
\| *	drm/vc4: Restrict texture/fb sizes, to avoid a divide in validation.	Eric Anholt	2015-06-04	1	-11/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This avoids generating a uidiv, and saves us about a percent in check_tex_size as a result of loadstore_tile_buffer_general. Signed-off-by: Eric Anholt <eric@anholt.net>
\| *	drm/vc4: Stop using relocations for BRANCH_TO_SUBLIST.	Eric Anholt	2015-06-04	1	-7/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we're going to insist that the relocation target is the tile alloc BO that we saw during binner config, then just ignore the relocations and point at the tile alloc BO. Saves an immediate .6% or so of CPU, and will let us ditch the relocations we've been processing for it, too. Signed-off-by: Eric Anholt <eric@anholt.net>
\| *	drm/vc4: Optimize BRANCH_TO_SUBLIST validation.	Eric Anholt	2015-06-04	2	-7/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We typically get bin_tiles_x * bin_tiles_y of these, so it makes sense to do a tiny bit of up front work to make this easier (particularly, avoid a divide, since those are terrible). Saves about half a percent of the CPU. Signed-off-by: Eric Anholt <eric@anholt.net>
\| *	drm/vc4: Make userspace's infinite waits actually infinite.	Eric Anholt	2015-06-04	1	-7/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	We use ~0 (584 years) to indicate "wait forever", so don't actually set up a timer for 584 years from now. Saves about 1% CPU. Signed-off-by: Eric Anholt <eric@anholt.net>
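The "584 years" figure follows from reading an all-ones 64-bit value as a nanosecond count. Below is a small sketch of that arithmetic and of the sentinel check, assuming nanosecond units; `WAIT_FOREVER_NS` and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the all-ones "wait forever" sentinel. */
#define WAIT_FOREVER_NS (~0ull)

/* Only arm a timer when the caller asked for a finite timeout. */
bool wait_needs_timer(uint64_t timeout_ns)
{
	return timeout_ns != WAIT_FOREVER_NS;
}

/* Whole years represented by a nanosecond count (365.25-day years). */
uint64_t timeout_whole_years(uint64_t timeout_ns)
{
	const uint64_t ns_per_sec = 1000000000ull;
	const uint64_t sec_per_year = 31557600ull; /* 365.25 * 86400 */

	return timeout_ns / ns_per_sec / sec_per_year;
}
```

`timeout_whole_years(WAIT_FOREVER_NS)` evaluates to 584, matching the figure in the commit message.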
\| *	drm/vc4: Don't bother flushing the ARM CPU caches on GEM exec.	Eric Anholt	2015-06-04	1	-8/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	All of our BOs are mapped write-combining both in the kernel and userspace by drm_gem_cma_helper.c, so anything we've written will end up in main memory, anyway. The only coherency trouble we might have would be if we had content in the CPU's write combining buffers that hadn't been flushed, but a CPU cache flush is unlikely to help with that, anyway. Improves no-swapbuffers drawarrays-mode isosurf performance by around 20%. Signed-off-by: Eric Anholt <eric@anholt.net>
\| *	drm/vc4: Avoid repeatedly grabbing/dropping the GEM handle spinlock.	Eric Anholt	2015-06-04	1	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Improves no-swapbuffers drawarrays-mode isosurf performance by 0.219463% +/- 0.166978% (n=4/5) Signed-off-by: Eric Anholt <eric@anholt.net>
\| *	drm/vc4: Don't forget to irqsave/restore in the reset path.	Eric Anholt	2015-06-04	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \|	Fixes another deadlock error from lockdep. Signed-off-by: Eric Anholt <eric@anholt.net>
\| *	drm/vc4: Make sure we clean up GEM if probe fails.	Eric Anholt	2015-06-04	6	-4/+75
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We often return -EPROBE_DEFER from the various components, and we would have left a bunch of GEM stuff running and allocated. Cleans up a boot-time warning about how we leaked the 256k of overflow_mem. Signed-off-by: Eric Anholt <eric@anholt.net>
\| *	drm/vc4: Add a debugfs node for looking at BO allocation stats.	Eric Anholt	2015-06-04	3	-11/+68
\| \| \| \| \| \| \| \|	Signed-off-by: Eric Anholt <eric@anholt.net>
\| *	drm/vc4: Purge BO cache when we've failed an allocation, and retry.	Eric Anholt	2015-06-04	1	-4/+25
\| \| \| \| \| \| \| \| \| \| \| \|	This gets us a bit farther when running low on CMA memory. Signed-off-by: Eric Anholt <eric@anholt.net>
\| *	drm/vc4: Don't forget to lock around BO creation in the overflow work.	Eric Anholt	2015-06-04	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The overflow handler notably walks through our BO cache structs, which are protected by this lock. Signed-off-by: Eric Anholt <eric@anholt.net>
\| *	drm/vc4: Don't forget to disable IRQs during the job_done spinlock.	Eric Anholt	2015-06-04	2	-8/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The hard IRQ handler also takes this lock, so we need interrupts disabled or we'll deadlock. Signed-off-by: Eric Anholt <eric@anholt.net>
\| *	drm/vc4: Avoid race with IRQs at module load.	Eric Anholt	2015-06-04	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	vc4_gem_init() initializes the BO cache, but v3d probe was setting up the interrupt handler, and potentially taking an interrupt and trying to allocate an overflow BO out of the cache before then. Signed-off-by: Eric Anholt <eric@anholt.net>
\| *	drm/vc4: Return a correct IRQ status.	Eric Anholt	2015-06-04	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	If there were stray IRQs, the core kernel would never have had a chance to shut it down. Signed-off-by: Eric Anholt <eric@anholt.net>
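The general pattern behind this kind of fix is that a handler reports "handled" only when it actually serviced a status bit, so the IRQ core can detect and disable a line that keeps firing spuriously. The sketch below models that in userspace; the enum, the bit positions, and `handle_irq()` are assumptions made for illustration and do not reflect the real V3D register layout.

```c
#include <stdint.h>

/* Userspace stand-ins for the kernel's IRQ_NONE / IRQ_HANDLED results. */
enum irq_result {
	IRQ_RESULT_NONE = 0,
	IRQ_RESULT_HANDLED = 1,
};

/* Hypothetical status bits; positions are illustrative only. */
#define INT_FRAME_DONE     (1u << 0)
#define INT_OUT_OF_MEMORY  (1u << 1)

/* Report HANDLED only if a known status bit was set and serviced. */
enum irq_result handle_irq(uint32_t status)
{
	enum irq_result ret = IRQ_RESULT_NONE;

	if (status & INT_OUT_OF_MEMORY) {
		/* would schedule the overflow allocation here */
		ret = IRQ_RESULT_HANDLED;
	}
	if (status & INT_FRAME_DONE) {
		/* would complete the finished job here */
		ret = IRQ_RESULT_HANDLED;
	}
	return ret;
}
```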
\| *	drm/vc4: Fix off-by-one in branch target validation.	Eric Anholt	2015-06-04	1	-1/+1
\| \| \| \| \| \| \| \|	Signed-off-by: Eric Anholt <eric@anholt.net>
\| *	drm/vc4: Add support for jobs without a bin CL.	Eric Anholt	2015-06-04	3	-4/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is useful for doing blit operations using the render engine, which can perform reads from raster textures into a tiled output, when the texture fetch engine has proved incapable of fetching correctly. Signed-off-by: Eric Anholt <eric@anholt.net>
\| *	drm/vc4: Drop unnecessary restriction on render w/h vs bin w/h.	Eric Anholt	2015-06-04	1	-22/+3
\| \| \| \| \| \| \| \|	Signed-off-by: Eric Anholt <eric@anholt.net>
\| *	drm/vc4: Make alignment of raster texture widths more consistent.	Eric Anholt	2015-06-04	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is more like how the simulator writes the alignment, and is consistent with the other tiling formats. Signed-off-by: Eric Anholt <eric@anholt.net>
\| *	drm/vc4: Revert "drm/vc4: Evict user mappings of shaders while they're being executed."	Eric Anholt	2015-06-04	5	-118/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit cd95e2a4feb4cf7a5e47da3e320f9fc3336899e4. This appears to be broken on 3.18 on rpi2, and I haven't figured out why yet. It causes a system lockup with sometimes an angry RCU complaint beforehand.
\| *	drm/vc4: Add create and map BO ioctls.	Eric Anholt	2015-06-04	4	-2/+89
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	I've been relying on the dumb APIs until now, but that's against the rules. Signed-off-by: Eric Anholt <eric@anholt.net>
\| *	drm/vc4: Make sure that waits that get interrupted don't wait forever.	Eric Anholt	2015-06-04	1	-2/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	danvet's talk recommended avoiding relative waits entirely, but his example was of relative waits with units (vblank numbers) that were potentially longer than the interval between signals coming in, so you couldn't adjust the ioctl arguments appropriately. This change also has a wait unit that might be too large (jiffies), but given that the other argument (an emitted seqno) is guaranteed to eventually result in a nonblocking successful return, this isn't really a big deal. Signed-off-by: Eric Anholt <eric@anholt.net>
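The hazard with a relative timeout is that a wait interrupted by a signal and then restarted with the original argument starts the full timeout over again, so repeated signals could extend it indefinitely. One common remedy, sketched here under the assumption that the remaining time is written back before the call is restarted, is to subtract the time already spent. `remaining_timeout()` is a hypothetical helper, not a function from the driver.

```c
#include <stdint.h>

/*
 * Time left on a relative timeout after `elapsed` units have passed.
 * Clamps at zero so a restarted wait returns instead of wrapping around.
 */
uint64_t remaining_timeout(uint64_t timeout, uint64_t elapsed)
{
	if (elapsed >= timeout)
		return 0;
	return timeout - elapsed;
}
```

Each restart then waits only for what is left, so the total time waited stays bounded by the original timeout however often the call is interrupted.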
\| *	drm/vc4: Add a "flags" arg to submit_cl.	Eric Anholt	2015-06-04	2	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This was one of danvet's recommendations from his ioctls talk, and it makes sense. It also happens to occupy a 32-bit padding slot in the struct. Signed-off-by: Eric Anholt <eric@anholt.net>
\| *	drm/vc4: Follow danvet's alignment rules for ioctl struct ABI.	Eric Anholt	2015-06-04	2	-10/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a flag day for the kernel ABI, but it's a requirement for merging the code. Signed-off-by: Eric Anholt <eric@anholt.net>
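The alignment rules in question are commonly summarized as: use fixed-width types, carry pointers as 64-bit integers, keep 64-bit members naturally aligned, and leave no implicit padding, so 32-bit and 64-bit userspace see the same layout. The struct below is a made-up illustration of those rules and is not the real vc4 ioctl struct.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ioctl argument struct; NOT the actual vc4 layout. */
struct example_submit {
	uint64_t cl_ptr;   /* userspace pointer carried as a 64-bit integer */
	uint32_t cl_size;  /* size of the command list in bytes */
	uint32_t flags;    /* occupies what would otherwise be 32 bits of padding */
	uint64_t seqno;    /* naturally aligned at offset 16 */
};

/* Compile-time checks that the layout has no hidden padding. */
_Static_assert(sizeof(struct example_submit) == 24,
	       "struct size must be a multiple of 8 with no padding");
_Static_assert(offsetof(struct example_submit, seqno) % 8 == 0,
	       "64-bit members must be 8-byte aligned");
```

Placing a 32-bit `flags` member in a slot that would otherwise be padding also gives the interface a way to grow later without changing the struct size.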
\| *	drm/vc4: Evict user mappings of shaders while they're being executed.	Eric Anholt	2015-06-04	5	-3/+118
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the user could rewrite shader code while the GPU was executing it, then all of vc4_validate_shaders.c's safety checks would be invalidated, notably by using the direct-addressing TMU fetches to read arbitrary system memory. This hopefully closes our last remaining root hole. Signed-off-by: Eric Anholt <eric@anholt.net>
\| *	drm/vc4: Disallow using dmabuf BOs as shaders.	Eric Anholt	2015-06-04	4	-2/+53
\| \| \| \| \| \| \| \| \| \| \| \|	This is a step in eliminating the rewrite-the-shaders root hole. Signed-off-by: Eric Anholt <eric@anholt.net>