path: root/src/buildstream/_scheduler/jobs
Commit message / Author / Age / Files / Lines
* job.py: Simplify handling of messages through the parent-child pipe [bschubert/optimize-job] (Benjamin Schubert, 2020-12-05, 1 file, -49/+4)

  Now that the only type of message that goes through is a message for the messenger, we can remove the envelope and only ever handle the messenger's messages.
* job.py: Stop sending errors through the child-parent pipe, and set it directly (Benjamin Schubert, 2020-12-05, 1 file, -24/+1)

  Since we run in a single process, we do not need this distinction anymore.
* job.py: Stop sending the result from a job through the pipe (Benjamin Schubert, 2020-12-05, 1 file, -30/+8)

  This is not needed now that jobs run in the same process; we can just return the value from the method.
* job.py: Remove the ability to send child data to the parent (Benjamin Schubert, 2020-12-05, 2 files, -31/+0)

  This is currently only used by the ElementJob to send back information about the workspace, which we can now get directly since we run in the same process.

  * elementjob.py: Remove the returning of the workspace dict; this is directly available in the main thread.
  * queue.py: Use the workspace from the element directly instead of going through child data.
* scheduler.py: Use threads instead of processes for jobs (Benjamin Schubert, 2020-12-04, 3 files, -178/+133)

  This changes how the scheduler works and adapts all the code that needs adapting in order to be able to run in threads instead of in subprocesses, which helps with Windows support and will allow some simplifications in the main pipeline.

  This addresses the following issues:

  * Fix #810: All CAS calls are now made in the master process, and thus share the same connection to the CAS server.
  * Fix #93: We don't start as many child processes anymore, so the risk of starving the machine is much lower.
  * Fix #911: We now use `forkserver` for starting processes. We also don't use subprocesses for jobs, so we should be starting fewer subprocesses.

  And the following high-level changes were made:

  * cascache.py: Run the CasCacheUsageMonitor in a thread instead of a subprocess.
  * casdprocessmanager.py: Ensure start and stop of the process are thread safe.
  * job.py: Run the child in a thread instead of a process, and adapt how we stop a thread, since we can't use signals anymore.
  * _multiprocessing.py: Not needed anymore; we are not using `fork()`.
  * scheduler.py: Run the scheduler with a threadpool to run the child jobs in. Also adapt how our signal handling is done, since we no longer receive signals from our children and can't kill them the same way.
  * sandbox: Stop using blocking signals to wait on the process, and use timeouts all the time.
  * messenger.py: Use a thread-local context for the handler, to allow for multiple parameters in the same process.
  * _remote.py: Ensure the start of the connection is thread safe.
  * _signal.py: Allow blocking entry into the signal's context managers by setting an event. This ensures no thread runs long-running code while we have asked the scheduler to pause. This also ensures all the signal handlers are thread safe.
  * source.py: Change the check around saving the source's ref. We now run in the same process, so the ref will already have been changed.
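The core of the change above, running jobs in threads and returning results as plain values, can be illustrated with a minimal sketch. This is a hypothetical stand-in using `concurrent.futures`, not BuildStream's actual Job API:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def run_child_job(element_name):
    # Stand-in for a job's child action; runs in a worker thread.
    # The result is simply returned -- no pipe, no pickling.
    in_worker = threading.current_thread() is not threading.main_thread()
    return (element_name, in_worker)

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(run_child_job, name) for name in ("base", "app")]
    results = [f.result() for f in futures]
```

Because every job shares the scheduler's address space, state such as workspace data or CAS connections is directly visible to the main thread, which is what enabled the pipe removals in the commits above.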
* Restore task element name / element name distinction in UI (Tristan van Berkom, 2020-10-27, 1 file, -28/+21)

  This behavior regressed a while back when the messenger object was introduced in 0026e379 from merge request !1500.

  Main behavior change:

  - Messages in the master log always appear with the task element's element name and cache key, even if the element or plugin issuing the log line is not the primary task element.
  - Messages logged in the task-specific log retain the context of the element names and cache keys which are issuing the log lines.

  Changes include:

  * _message.py: Added the task element name & key members.
  * _messenger.py: Log the element key as well if it is provided.
  * _widget.py: Prefer the task name & key when logging; we fall back to the element name & key in case messages are being logged outside of any ongoing task (main process/context).
  * job.py: Unconditionally stamp messages with the task name & key. Also removed some unused parameters here, clearing up an XXX comment.
  * plugin.py: Add a new `_message_kwargs` instance property; it is the responsibility of the core base class to maintain the base keyword arguments which are to be used as kwargs for Message() instances created on behalf of the issuing plugin. Use this to construct messages in Plugin.__message() and to pass kwargs along to Messenger.timed_activity().
  * element.py: Update `_message_kwargs` when the cache key is updated.
  * tests/frontend/logging.py: Fix test to expect the cache key in the log line.
  * tests/frontend/artifact_log.py: Fix test to expect the cache key in the log line.

  Fixes #1393
* Adding _DisplayKey type (Tristan van Berkom, 2020-10-27, 1 file, -1/+1)

  Instead of passing around untyped tuples for cache keys, let's have a clearly typed object for this. This makes for more readable code, and additionally corrects the data model's statement of intent: rather than stating that some cache keys should be displayed as "dim", we inform the frontend whether the cache key is "strict" or not, allowing the frontend to decide how to display a strict or non-strict key.

  This patch does the following:

  * types.py: Add _DisplayKey
  * element.py: Return a _DisplayKey from Element._get_display_key()
  * Other sources: Updated to use the display key object
* _state.py: Use separate task identifier (Jürg Billeter, 2020-09-10, 1 file, -0/+7)

  `State.add_task()` required the job name to be unique in the session. However, the tuple `(action_name, full_name)` is not guaranteed to be unique. E.g., multiple `ArtifactElement` objects with the same element name may participate in a single session. Use a separate task identifier to fix this.
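The fix described above amounts to keying tasks on a generated identifier rather than on their names. A simplified sketch (the class and method names mirror the commit, but the implementation is illustrative only):

```python
import itertools

class State:
    """Tracks session tasks under a unique, generated identifier."""

    def __init__(self):
        self._task_ids = itertools.count()
        self._tasks = {}

    def add_task(self, action_name, full_name):
        # (action_name, full_name) may repeat; the counter never does.
        task_id = next(self._task_ids)
        self._tasks[task_id] = (action_name, full_name)
        return task_id

state = State()
a = state.add_task("build", "artifact.bst")
b = state.add_task("build", "artifact.bst")  # same names, distinct task id
```

Callers then address the task by its returned identifier, so two `ArtifactElement` objects with identical names no longer collide.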
* job.py: Remove ability of job classes to send custom messages [bschubert/remove-custom-sched-messages] (Benjamin Schubert, 2020-08-23, 1 file, -43/+0)

  We previously sent custom messages from child jobs to parent jobs, for example to report the cache size. This is no longer used by the current implementation; let's remove it entirely.
* _messenger.py: Make `timed_suspendable` public and use it in job.py [bschubert/timed-suspendable] (Benjamin Schubert, 2020-08-22, 1 file, -20/+9)

  This reduces the amount of code duplication.
* element.py: Move printing the build environment from elementjob.py (Abderrahim Kitouni, 2020-07-29, 1 file, -10/+0)
* scheduler.py: Remove 'Message' notification type, use the messenger (Benjamin Schubert, 2020-07-06, 1 file, -3/+4)

  The messenger should be the one receiving messages directly; we don't need this indirection.
* Completely abolish job pickling [tristan/nuke-pickle-jobber] (Tristan van Berkom, 2020-06-15, 2 files, -209/+1)
* _pluginfactory: Delegating the work of locating plugins to the PluginOrigin (Tristan van Berkom, 2020-05-28, 1 file, -1/+1)

  This way we split up the logic of how to load plugins from different origins into their respective classes.

  This commit also:

  o Introduces PluginType (which is currently either SOURCE or ELEMENT)
  o Reduces the complexity of the PluginFactory constructor
  o Kills the loaded_dependencies list and the all_loaded_plugins API, and replaces both of these with a new list_plugins() API. Consequently, jobpickler.py from the scheduler and widget.py from the frontend are updated to use list_plugins().
  o Splits up the PluginOrigin implementations into separate files. Instead of having all PluginOrigin classes in pluginorigin.py, split it into one base class and separate files for each implementation, which is more in line with BuildStream coding style. This has the unfortunate side effect of adding load_plugin_origin() to the __init__.py file, because keeping new_from_node() as a PluginOrigin class method cannot be done without introducing a cyclic dependency between PluginOrigin and its implementations.
* _scheduler: Fix order of launching jobs and sending notifications (Tristan Van Berkom, 2020-05-19, 1 file, -2/+6)

  Sending notifications causes potentially large bodies of code to run in the abstracted frontend codebase; we are not allowed to have knowledge of the frontend from this code.

  Previously, we were adding the job to the active jobs, sending the notification, and then starting the job. This meant that if a BuildStream frontend implementation crashed, we would handle the exception in an inconsistent state and try to kill jobs which were not running.

  In addition to ensuring that no code body runs in the danger window between adjusting the active_jobs list and starting the job, this patch also adds some fault tolerance and assertions around job termination so that:

  o Job.terminate() and Job.kill() do not crash with a None _process
  o Job.start() raises an assertion if started after being terminated

  This fixes the infinite looping aspects of frontend crashes at job_start() time described in #1312.
* plugin.py: Rework how deprecation warnings are configured (Tristan Van Berkom, 2020-05-04, 1 file, -1/+1)

  This is mostly a semantic change which defines how deprecation warnings are suppressed in a more consistent fashion, by declaring such suppressions in the plugin origin declarations rather than in the generic element/source configuration overrides section.

  Other side effects of this commit are that the warnings have been enhanced to include the provenance of where the deprecated plugins are used in the project, and that the custom deprecation message is now optional and appears in the message detail string rather than in the primary warning text, which now simply indicates that the plugin being used is deprecated.

  Documentation and test cases are updated. This fixes #1291.
* _pluginfactory/pluginfactory.py: Add provenance to missing plugin errors (Tristan Van Berkom, 2020-05-03, 1 file, -1/+1)

  So far we were only reporting "No Source plugin registered for kind 'foo'", without specifying which bst file, with line and column information; this commit fixes that. Additionally, this patch stores the provenance on the MetaSource to allow this to happen for sources.
* job.py: Use `_signals.terminator()` to handle `SIGTERM` (Jürg Billeter, 2020-04-09, 1 file, -9/+7)

  `Sandbox` subclasses use `_signals.terminator()` to gracefully terminate the running command and clean up the sandbox. Setting a `SIGTERM` handler in `job.py` breaks this.
* job.py: Do not call Process.close() (Jürg Billeter, 2019-12-19, 1 file, -1/+0)

  As we handle subprocess termination by pid with an asyncio child watcher, the multiprocessing.Process object does not get notified when the process terminates. And as the child watcher reaps the process, the pid is no longer valid and the Process object is unable to check whether the process is dead. This results in Process.close() raising a ValueError.

  Fixes: 9c23ce5c ("job.py: Replace message queue with pipe")
* job.py: Replace message queue with pipe [juerg/job-pipe] (Jürg Billeter, 2019-12-12, 1 file, -44/+40)

  A lightweight unidirectional pipe is sufficient to pass messages from the child job process to its parent. This also avoids the need to access the private `_reader` instance variable of `multiprocessing.Queue`.
* scheduler.py: Only run thread-safe code in callbacks from watchers [bschubert/stricter-asyncio-handling] (Benjamin Schubert, 2019-12-07, 1 file, -1/+7)

  Per https://docs.python.org/3/library/asyncio-policy.html#asyncio.AbstractChildWatcher.add_child_handler, the callback from a child handler must be thread safe. Not all of our callbacks were. This changes all our callbacks to schedule a call for the next loop iteration instead of executing it directly.
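The standard way to defer work from a foreign thread onto the event loop is `loop.call_soon_threadsafe()`, which is the pattern this commit applies. A minimal sketch (the handler name and payload are illustrative):

```python
import asyncio

results = []

def on_child_exit(loop, returncode):
    # May be invoked from any thread (as a child watcher's callback can be).
    # Do no real work here; just schedule it onto the loop.
    loop.call_soon_threadsafe(results.append, returncode)

async def main():
    loop = asyncio.get_running_loop()
    # Simulate the callback firing from a worker thread.
    await asyncio.to_thread(on_child_exit, loop, 0)
    await asyncio.sleep(0)  # let the scheduled callback run

asyncio.run(main())
```

Only `call_soon_threadsafe()` is documented as safe to call from outside the loop's thread; plain `call_soon()` or direct mutation of loop state from the watcher thread is not.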
* job.py: Only start new jobs in a `with watcher:` block (Benjamin Schubert, 2019-12-07, 1 file, -26/+5)

  The documentation (https://docs.python.org/3/library/asyncio-policy.html#asyncio.AbstractChildWatcher) is apparently missing this part, but the code mentions that new processes should only ever be started inside a with block: https://github.com/python/cpython/blob/99eb70a9eb9493602ff6ad8bb92df4318cf05a3e/Lib/asyncio/unix_events.py#L808
* job.py: Remove '_watcher' attribute, it is not needed (Benjamin Schubert, 2019-12-07, 1 file, -3/+2)

  We don't need to keep a reference to the watcher; let's remove it.
* Reformat code using Black (Chandan Singh, 2019-11-14, 3 files, -100/+71)

  As discussed on the mailing list, reformat code using Black. This is a one-off change to reformat our entire codebase. Moving forward, we shouldn't expect such blanket reformats; rather, we expect each change to already comply with the Black formatting style.
* job.py: Gracefully handle killed subprocesses (Benjamin Schubert, 2019-11-13, 1 file, -0/+8)

  This ensures that we don't show an unexpected error when we forcefully kill one of our workers.
* job.py: Handle SIGTERM gracefully (Benjamin Schubert, 2019-11-13, 1 file, -0/+14)

  This allows showing a nice message to the user and stops reporting an unexpected return code of 255.
* job.py: Don't use 'terminate_wait', as it uses waitpid() (Benjamin Schubert, 2019-11-13, 1 file, -16/+0)

  Using `join()` on the subprocess calls `waitpid()` under the hood, which breaks our child watcher. Instead, schedule a task for 20 seconds later that will effectively kill the tasks. Note that the task will only be called if we still have active jobs; otherwise, it will just be skipped and we won't wait that long.
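The "schedule a kill for later, skip it if the job already finished" idea maps onto `loop.call_later()`. A sketch with hypothetical job names and the 20-second grace period shortened for illustration:

```python
import asyncio

killed = []

def kill_if_still_active(job, active_jobs):
    # Only fires for jobs that are still active when the timer expires;
    # finished jobs were already removed from the set, so this is a no-op.
    if job in active_jobs:
        killed.append(job)

async def main():
    loop = asyncio.get_running_loop()
    active_jobs = {"stubborn-job"}  # "finished-job" already completed
    loop.call_later(0.01, kill_if_still_active, "stubborn-job", active_jobs)
    loop.call_later(0.01, kill_if_still_active, "finished-job", active_jobs)
    await asyncio.sleep(0.05)

asyncio.run(main())
```

Unlike `join()`, this never calls `waitpid()` itself, so the asyncio child watcher remains the only consumer of the child's exit status.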
* scheduler.py: Prevent the asyncio loop from leaking into subprocesses (Benjamin Schubert, 2019-11-11, 1 file, -11/+3)

  Having a running asyncio loop while forking a program is not supported in Python and doesn't work as expected. This leads to file descriptors leaking and to the subprocesses sharing the same loop as the parent. It also leads to the parent receiving all signals the children receive.

  This ensures we don't leak our asyncio loop into the workers we fork.
* _scheduler/jobs: Move pickle details into jobpickler (Angelos Evripiotis, 2019-10-29, 2 files, -60/+59)

  Move pickle_child_job and do_pickled_child_job into jobpickler.py, to keep details like saving and restoring global state out of job.py.
* job pickling: Also pickle global state in node.pyx (Angelos Evripiotis, 2019-10-29, 2 files, -19/+45)
* job pickling: Pickle first_pass_config factories (Angelos Evripiotis, 2019-10-25, 1 file, -2/+4)

  Note that for multiple-pass setups, i.e. where we have junctions, we also have to pickle things that belong to the 'first_pass_config'.
* job pickling: Plugins don't return their factories (Angelos Evripiotis, 2019-10-25, 1 file, -15/+21)

  Remove the need for plugins to find and return the factory they came from. Also take the opportunity to combine source and element pickling into a single 'plugin' pickling path. This will make it easier for us to later support pickling plugins from the 'first_pass_config' of projects.
* jobpickler: Also pickle DigestProto (Angelos Evripiotis, 2019-10-21, 1 file, -6/+21)

  This is now required by some code paths. Also make a generic routine for pickling / unpickling, as we may be doing more of this.
* _scheduler/jobs/job.py: Sort imports (Angelos Evripiotis, 2019-10-04, 1 file, -4/+4)
* scheduler.py: Notification for Message() propagation (Tom Pollard, 2019-09-10, 1 file, -3/+3)

  Add a notification for MESSAGE. Instead of the scheduler's Queues and Jobs directly calling the message handler that App has assigned to Context, the Message() is now sent over the notification handler, where it is then given to the Messenger's handler.
* Remove unnecessary _platform.multiprocessing [aevri/nomp] (Angelos Evripiotis, 2019-08-20, 1 file, -16/+27)

  It turns out we don't need to use multiprocessing.Manager() queues when using the 'spawn' method: the regular multiprocessing queues are also picklable, if passed as parameters to the new process. Thanks to @BenjaminSchubert for pointing this out.
* _scheduler: Remove cache size job (Jürg Billeter, 2019-08-20, 2 files, -49/+0)

  Cache size will be tracked by buildbox-casd.
* _scheduler: Remove cleanup job (Jürg Billeter, 2019-08-20, 2 files, -56/+0)

  Cache expiry will be managed by buildbox-casd.
* Support pickling jobs if the platform requires it (Angelos Evripiotis, 2019-08-16, 1 file, -6/+44)

  Add support for using `multiprocessing.Manager` and the associated queues. Downgrade the queue event callback guarantees accordingly; in later work we may be able to support callbacks in all scenarios. Pickle and unpickle the child job if the platform requires it.
* Abstract mp Queue usage, prep to spawn processes (Angelos Evripiotis, 2019-08-16, 1 file, -25/+14)

  Pave the way to supporting starting processes by the 'spawn' method, by abstracting our usage of `multiprocessing.Queue`. This means we can easily switch to using a multiprocessing.Manager() and associated queues when necessary.
* job.py: Report error when job process unexpectedly dies (#1089) [tmewett/report-weird-return-codes] (Tom Mewett, 2019-08-12, 1 file, -1/+5)
* _message.py: Use element_name & element_key instead of unique_id [tpollard/messageobject] (Tom Pollard, 2019-08-08, 2 files, -55/+74)

  Adding the element's full name and display key to all element-related messages removes the need to look up the plugin table via a plugin unique_id just to retrieve the same values for logging and widget frontend display. Relying on plugin table state is also incompatible with the frontend running in a different process, as it would exist in multiple states.

  The element's full name is now displayed instead of the unique_id, such as in the debugging widget. It is also displayed in place of 'name' (i.e. including any junction prepend) to be more informative.
* job: Fix exception caught from enum translation (Benjamin Schubert, 2019-07-31, 1 file, -1/+1)

  The exception was incorrectly marked as 'KeyError', but enums throw 'ValueError' instead.
* types: Add a 'FastEnum' implementation and replace Enum with it (Benjamin Schubert, 2019-07-29, 1 file, -8/+10)

  'Enum' has a big performance impact on the running code. Replacing it with a safe subset of its functionality removes much of this overhead without losing the benefits of using enums (safe comparisons, uniqueness).
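The idea behind a FastEnum-style type can be sketched as plain class attributes holding lightweight value objects: identity comparison and uniqueness are preserved, while `enum.Enum`'s metaclass and `__call__` lookup overhead is avoided. This is a hypothetical illustration, not BuildStream's actual `FastEnum` implementation:

```python
class FastEnumValue:
    """A lightweight, identity-compared enum member."""

    __slots__ = ("name", "value")

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __repr__(self):
        return self.name


class JobStatus:
    # Members are ordinary class attributes; comparisons are plain
    # identity checks, with no metaclass machinery on the hot path.
    OK = FastEnumValue("OK", 1)
    FAIL = FastEnumValue("FAIL", 2)
    SKIPPED = FastEnumValue("SKIPPED", 3)
```

Code compares members with `is`, e.g. `status is JobStatus.OK`, which is both fast and immune to accidental cross-enum equality.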
* job: Try pickling child jobs if BST_TEST_SUITE [aevri/pickle] (Angelos Evripiotis, 2019-07-24, 1 file, -0/+7)

  If we're running BuildStream tests, then pickle child jobs. This ensures that we keep things picklable, whilst we work towards being able to support platforms that need to use the 'spawn' method of starting processes.
* Make ChildJobs and friends picklable (Angelos Evripiotis, 2019-07-24, 1 file, -0/+132)

  Pave the way toward supporting the 'spawn' method of creating jobs, by adding support for pickling ChildJobs. Introduce a new 'jobpickler' module that provides an entry point for this functionality. This also makes replays of jobs possible, which has made the debugging of plugins much easier for me.
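The essence of the jobpickler approach, serializing a job into a buffer so it can be rebuilt and "replayed" elsewhere, can be shown generically. The class and function names below echo the commits (`pickle_child_job`, `do_pickled_child_job`) but the bodies are a simplified, hypothetical stand-in:

```python
import io
import pickle

class ChildJob:
    """Hypothetical stand-in for a picklable child job."""

    def __init__(self, action_name, element_name):
        self.action_name = action_name
        self.element_name = element_name

    def run(self):
        return f"{self.action_name}:{self.element_name}"

def pickle_child_job(job):
    # Serialize the job into an in-memory buffer in the parent.
    buf = io.BytesIO()
    pickle.dump(job, buf)
    buf.seek(0)
    return buf

def do_pickled_child_job(buf):
    # Rebuild the job from the buffer and execute it; with 'spawn',
    # this would happen in the freshly started child process.
    job = pickle.load(buf)
    return job.run()

result = do_pickled_child_job(pickle_child_job(ChildJob("build", "app.bst")))
```

Because the buffer fully describes the job, the same bytes can be loaded repeatedly, which is what makes job replays possible for debugging.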
* Store core state for the frontend separately (Jonathan Maw, 2019-07-09, 4 files, -0/+10)
* job: Only pass Messenger to child, not all of Context (Angelos Evripiotis, 2019-07-05, 1 file, -6/+8)

  Reduce the amount of context shared with child jobs, by only sending the messenger portion of it rather than the whole thing. Also send the logdir. This also means that we will need to pickle less when using the 'spawn' method of multiprocessing, as opposed to the 'fork' method.
* Refactor: Use context.messenger directly (Angelos Evripiotis, 2019-07-05, 1 file, -5/+5)

  Instead of having methods in Context forward calls on to the Messenger, have callers use the Messenger directly. Remove the forwarding methods in Context.
* Refactor: Message handlers take 'is_silenced' (Angelos Evripiotis, 2019-07-05, 1 file, -4/+4)

  Remove the need to pass the Context object to message handlers, by passing what is usually requested from the context instead. This paves the way to sharing less information with some child jobs: they won't need the whole context object, just the messenger.