| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
| |
Now that the only messages that go through are messages for the
messenger, we can remove the envelope and only ever handle the
messenger's messages.
|
|
|
|
|
| |
Since we run in a single process, we do not need this distinction
anymore
|
|
|
|
|
| |
This is not needed now that jobs run in the same process; we can just
return the value from the method.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is currently only used by the ElementJob to send back information
about the workspace, which we can now get directly since we run in the
same process.
* elementjob.py: Remove the returning of the workspace dict. This is
  directly available in the main thread.
* queue.py: Use the workspace from the element directly instead of going
  through child data
|
| |
This changes how the scheduler works, and adapts all the code that needs
adapting, in order to run in threads instead of subprocesses. This helps
with Windows support and will allow some simplifications in the main
pipeline.
This addresses the following issues:
* Fix #810: All CAS calls are now made in the master process, and thus
  share the same connection to the CAS server
* Fix #93: We no longer start as many child processes, so the risk of
  starving the machine is much lower
* Fix #911: We now use `forkserver` for starting processes. We also
  don't use subprocesses for jobs, so we should be starting fewer
  subprocesses
And the following high-level changes were made:
* cascache.py: Run the CasCacheUsageMonitor in a thread instead of a
  subprocess.
* casdprocessmanager.py: Ensure start and stop of the process are thread
  safe.
* job.py: Run the child in a thread instead of a process, and adapt how
  we stop a thread, since we can't use signals anymore.
* _multiprocessing.py: Not needed anymore; we are not using `fork()`.
* scheduler.py: Run the scheduler with a thread pool, to run the child
  jobs in. Also adapt how our signal handling is done, since we no
  longer receive signals from our children and can't kill them the
  same way.
* sandbox: Stop using blocking signals to wait on the process, and
  always use timeouts.
* messenger.py: Use a thread-local context for the handler, to allow for
  multiple parameters in the same process.
* _remote.py: Ensure the start of the connection is thread safe.
* _signal.py: Allow blocking entry into the signal context managers
  by setting an event. This ensures no thread runs long-running code
  after we have asked the scheduler to pause. It also ensures all the
  signal handlers are thread safe.
* source.py: Change the check around saving the source's ref. We now
  run in the same process, so the ref will already have been changed.
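The central idea, running child jobs in a thread pool and returning their results directly, can be sketched as follows (`run_job` and the element names are hypothetical, not BuildStream's actual API):

```python
from concurrent.futures import ThreadPoolExecutor

def run_job(element_name):
    # Hypothetical job body: since it runs in a thread of the main
    # process, it can return its result directly instead of sending
    # it back to the parent over a pipe or queue.
    return (element_name, "ok")

# The scheduler submits child jobs to a thread pool instead of
# forking a subprocess per job.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(run_job, name) for name in ("base.bst", "app.bst")]
    results = [f.result() for f in futures]
```

Sharing one process is also what lets all CAS calls reuse a single connection, as noted for #810.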
|
|
| |
This behavior regressed a while back when the messenger object was
introduced in 0026e379 from merge request !1500.
Main behavior change:
- Messages in the master log always appear with the task element's
  element name and cache key, even if the element or plugin issuing
  the log line is not the primary task element.
- Messages logged in the task-specific log retain the context of the
  element names and cache keys which are issuing the log lines.
Changes include:
* _message.py: Added the task element name & key members
* _messenger.py: Log the element key as well if it is provided
* _widget.py: Prefer the task name & key when logging, falling back
  to the element name & key in case messages are being logged outside
  of any ongoing task (main process/context)
* job.py: Unconditionally stamp messages with the task name & key
  Also removed some unused parameters here, clearing up an XXX comment
* plugin.py: Add a new `_message_kwargs` instance property; it is the
  responsibility of the core base class to maintain the base keyword
  arguments which are to be used as kwargs for Message() instances
  created on behalf of the issuing plugin.
  Use this property to construct messages in Plugin.__message() and to
  pass kwargs along to Messenger.timed_activity().
* element.py: Update the `_message_kwargs` when the cache key is updated
* tests/frontend/logging.py: Fix test to expect the cache key in the log line
* tests/frontend/artifact_log.py: Fix test to expect the cache key in the log line
Fixes #1393
|
| |
Instead of passing around untyped tuples for cache keys, let's have
a clearly typed object for this.
This makes for more readable code, and additionally corrects the
data model's statement of intent: rather than stating that some cache
keys should be displayed as "dim", we now inform the frontend about
whether the cache key is "strict" or not, allowing the frontend to
decide how to display a strict or non-strict key.
This patch does the following:
* types.py: Add _DisplayKey
* element.py: Return a _DisplayKey from Element._get_display_key()
* Other sources: Updated to use the display key object
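A minimal sketch of such a typed display key; the field names here are assumptions for illustration, not the actual `_DisplayKey` definition:

```python
from typing import NamedTuple

class DisplayKey(NamedTuple):
    # Hypothetical fields: the full cache key, a shortened form for
    # display, and whether the key is a strict key.
    full: str
    brief: str
    strict: bool

key = DisplayKey(full="a1b2c3d4e5f6", brief="a1b2c3", strict=False)
# The frontend decides the presentation, e.g. bracketing or dimming
# non-strict keys, instead of the data model prescribing "dim".
label = key.brief if key.strict else f"({key.brief})"
```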
|
|
|
|
|
|
|
|
| |
`State.add_task()` required the job name to be unique in the session.
However, the tuple `(action_name, full_name)` is not guaranteed to be
unique. E.g., multiple `ArtifactElement` objects with the same element
name may participate in a single session. Use a separate task identifier
to fix this.
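One way to sketch such a separate task identifier (hypothetical code, not the actual `State` implementation):

```python
import itertools

class State:
    def __init__(self):
        self._task_ids = itertools.count()
        self.tasks = {}

    def add_task(self, action_name, full_name):
        # (action_name, full_name) may repeat within a session, so key
        # each task by a monotonically increasing identifier instead.
        task_id = next(self._task_ids)
        self.tasks[task_id] = (action_name, full_name)
        return task_id

state = State()
first = state.add_task("build", "artifact.bst")
second = state.add_task("build", "artifact.bst")  # same names, distinct task
```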
|
|
|
|
|
|
|
|
| |
We previously were sending custom messages from child jobs to parent
jobs, for example for reporting the cache size.
This is not used anymore by the current implementation; let's remove
it entirely.
|
|
|
|
| |
This reduces the amount of code duplication
|
| |
|
|
|
|
|
| |
The messenger should be the one receiving messages directly; we don't
need this indirection.
|
| |
|
| |
This way we split up the logic of how to load plugins from different
origins into their respective classes.
This commit also:
o Introduces PluginType (which is currently either SOURCE or ELEMENT)
o Reduces the complexity of the PluginFactory constructor
o Kills the loaded_dependencies list and the all_loaded_plugins API,
and replaces both of these with a new list_plugins() API.
Consequently the jobpickler.py from the scheduler, and the
widget.py from the frontend, are updated to use list_plugins().
o Split up the PluginOrigin implementations into separate files
Instead of having all PluginOrigin classes in pluginorigin.py, split
it up into one base class and separate files for each implementation,
which is more in line with BuildStream coding style.
This has the unfortunate side effect of adding load_plugin_origin()
into the __init__.py file, because keeping new_from_node() as
a PluginOrigin class method cannot be done without introducing a
cyclic dependency with PluginOrigin and its implementations.
|
| |
Sending notifications causes potentially large bodies of code to run
in the abstracted frontend codebase, we are not allowed to have knowledge
of the frontend from this code.
Previously, we were adding the job to the active jobs, sending the
notification, and then starting the job. This means that if a BuildStream
frontend implementation crashes, we handle the exception in an inconsistent
state and try to kill jobs which are not running.
In addition to making sure that active_jobs list adjustment and
job starting does not have any code body run in the danger window
in between these, this patch also adds some fault tolerance and assertions
around job termination so that:
o Job.terminate() and Job.kill() do not crash with None _process
o Job.start() raises an assertion if started after being terminated
This fixes the infinite looping aspects of frontend crashes at
job_start() time described in #1312.
|
| |
This is mostly a semantic change which defines how deprecation warnings
are suppressed in a more consistent fashion, by declaring such suppressions
in the plugin origin declarations rather than on the generic element/source
configuration overrides section.
Other side effects of this commit are that the warnings have been enhanced
to include the provenance of whence the deprecated plugins have been used in
the project, and that the custom deprecation message is optional and will
appear in the message detail string rather than in the primary warning text,
which now simply indicates that the plugin being used is deprecated.
Documentation and test cases are updated.
This fixes #1291
|
|
|
|
|
|
|
|
|
| |
So far we were only reporting "No Source plugin registered for kind 'foo'",
without specifying which bst file the error comes from, or the line and
column information; this commit fixes that.
Additionally, this patch stores the provenance on the MetaSource to
allow this to happen for sources.
|
|
|
|
|
|
| |
`Sandbox` subclasses use `_signals.terminator()` to gracefully terminate
the running command and cleanup the sandbox. Setting a `SIGTERM` handler
in `job.py` breaks this.
|
|
|
|
|
|
|
|
|
|
|
| |
As we handle subprocess termination by pid with an asyncio child
watcher, the multiprocessing.Process object does not get notified when
the process terminates. And as the child watcher reaps the process, the
pid is no longer valid and the Process object is unable to check whether
the process is dead. This results in Process.close() raising a
ValueError.
Fixes: 9c23ce5c ("job.py: Replace message queue with pipe")
|
|
|
|
|
|
|
|
| |
A lightweight unidirectional pipe is sufficient to pass messages from
the child job process to its parent.
This also avoids the need to access the private `_reader` instance
variable of `multiprocessing.Queue`.
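The idea in miniature; a thread stands in for the child job here so the example is self-contained:

```python
import multiprocessing as mp
import threading

def child_job(send_conn):
    # The writer only ever sends; the reader only ever receives.
    send_conn.send(("message", "job finished"))
    send_conn.close()

# duplex=False gives a lightweight one-way pipe: a read end and a write end
recv_conn, send_conn = mp.Pipe(duplex=False)
worker = threading.Thread(target=child_job, args=(send_conn,))
worker.start()
kind, payload = recv_conn.recv()
worker.join()
```

The read end exposes a public `recv()`, so there is no need to reach into `multiprocessing.Queue`'s private `_reader`.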
|
|
|
|
|
|
|
|
| |
Per
https://docs.python.org/3/library/asyncio-policy.html#asyncio.AbstractChildWatcher.add_child_handler,
the callback from a child handler must be thread safe. Not all our
callbacks were. This changes all our callbacks to schedule a call for
the next loop iteration instead of executing it directly.
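A sketch of the pattern: instead of doing the work inside the (possibly foreign-thread) callback, hand it to the loop with `call_soon_threadsafe`:

```python
import asyncio

results = []

def handle_child_exit(pid, returncode):
    # The real work, which must run on the event loop's own thread
    results.append((pid, returncode))

def on_child_exit(loop, pid, returncode):
    # A child-watcher callback may fire from another thread; defer the
    # work to the next loop iteration instead of executing it directly.
    loop.call_soon_threadsafe(handle_child_exit, pid, returncode)

async def main():
    loop = asyncio.get_running_loop()
    on_child_exit(loop, 1234, 0)  # simulate the watcher firing
    await asyncio.sleep(0)  # let the scheduled call run

asyncio.run(main())
```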
|
|
|
|
|
|
|
|
| |
The documentation
(https://docs.python.org/3/library/asyncio-policy.html#asyncio.AbstractChildWatcher)
is apparently missing this part, but the code mentions that new
processes should only ever be called inside a with block:
https://github.com/python/cpython/blob/99eb70a9eb9493602ff6ad8bb92df4318cf05a3e/Lib/asyncio/unix_events.py#L808
|
|
|
|
| |
We don't need to keep a reference to the watcher, let's remove it.
|
|
|
|
|
|
|
| |
As discussed over the mailing list, reformat code using Black. This is a
one-off change to reformat all our codebase. Moving forward, we
shouldn't expect such blanket reformats. Rather, we expect each change
to already comply with the Black formatting style.
|
|
|
|
|
| |
This ensures that we don't show an unexpected error when we
forcefully kill one of our workers
|
|
|
|
|
| |
This allows showing a helpful message to the user instead of reporting
that the return code was an unexpected 255.
|
|
|
|
|
|
|
|
|
|
|
| |
Using `join()` on the subprocess calls `waitpid()` under the hood,
which breaks our child watcher.
Instead, schedule a task for 20 seconds later that will effectively
kill the tasks.
Note that the task will only run if we still have active jobs.
Otherwise, it is skipped and we won't wait that long.
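The scheduling pattern looks roughly like this; the names are hypothetical and a short grace period replaces the 20 seconds so the sketch runs quickly:

```python
import asyncio

killed = []

def kill_remaining_jobs(active_jobs):
    # Only has an effect if jobs are still active; otherwise it is a no-op
    for job in active_jobs:
        killed.append(job)

async def terminate(active_jobs, grace=0.01):
    loop = asyncio.get_running_loop()
    # Schedule a forced kill instead of join()-ing the subprocess,
    # which would call waitpid() and race with the child watcher.
    handle = loop.call_later(grace, kill_remaining_jobs, active_jobs)
    await asyncio.sleep(grace * 2)
    handle.cancel()  # harmless once the callback has already run

asyncio.run(terminate(["job-1", "job-2"]))
```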
|
|
|
|
|
|
|
|
|
|
|
| |
Having a running asyncio loop while forking a program is not supported
in Python and doesn't work as expected.
It leads to leaked file descriptors and to the subprocesses sharing the
same loop as the parent. It also leads to the parent receiving all
signals the children receive.
This ensures we don't leak our asyncio loop into the workers we fork.
|
|
|
|
|
| |
Move pickle_child_job and do_pickled_child_job into jobpickler.py, to
keep details like saving and restoring global state out of job.py.
|
| |
|
|
|
|
|
| |
Note that for multiple-pass setups, i.e. where we have junctions, we
also have to pickle things that belong to the 'first_pass_config'.
|
|
|
|
|
|
|
|
|
| |
Remove the need for plugins to find and return the factory they came
from. Also take the opportunity to combine source and element pickling
into a single 'plugin' pickling path.
This will make it easier for us to later support pickling plugins from
the 'first_pass_config' of projects.
|
|
|
|
|
| |
This is now required by some code paths. Also make a generic routine for
pickling / unpickling, as we may be doing more of this.
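Such a generic pickle/unpickle pair might look like this (hypothetical helper names, not the exact jobpickler API):

```python
import io
import pickle

def pickle_object(obj):
    # Serialize to an in-memory buffer, rewound and ready for reading
    buffer = io.BytesIO()
    pickle.dump(obj, buffer)
    buffer.seek(0)
    return buffer

def unpickle_object(buffer):
    return pickle.load(buffer)

# Round-trip an arbitrary picklable object
restored = unpickle_object(pickle_object({"kind": "element", "ref": "abc123"}))
```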
|
| |
|
|
|
|
|
|
|
| |
Add a notification for MESSAGE. Instead of scheduler's Queues and
Jobs directly calling the message handler that App has assigned to
Context, the Message() is now sent over the notification handler
where it is then given to Messenger's handler.
|
|
|
|
|
|
|
|
| |
It turns out we don't need to use multiprocessing.Manager() queues when
using the 'spawn' method - the regular multiprocessing queues are also
picklable, if passed as parameters to the new process.
Thanks to @BenjaminSchubert for pointing this out.
|
|
|
|
| |
Cache size will be tracked by buildbox-casd.
|
|
|
|
| |
Cache expiry will be managed by buildbox-casd.
|
|
|
|
|
|
|
|
| |
Add support for using `multiprocessing.Manager` and the associated
queues. Downgrade the queue event callback guarantees accordingly. In
later work we may be able to support callbacks in all scenarios.
Pickle and unpickle the child job if the platform requires it.
|
|
|
|
|
|
|
| |
Pave the way to supporting starting processes by the 'spawn' method, by
abstracting our usage of `multiprocessing.Queue`. This means we can
easily switch to using a multiprocessing.Manager() and associated queues
instead when necessary.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adding the element full name and display key into all element related
messages removes the need to look up the plugintable via a plugin
unique_id just to retrieve the same values for logging and widget
frontend display. Relying on plugintable state is also incompatible
if the frontend will be running in a different process, as it will
exist in multiple states.
The element full name is now displayed instead of the unique_id,
such as in the debugging widget. It is also displayed in place of
'name' (i.e. including any junction prepend) to be more informative.
|
|
|
|
|
| |
The exception was incorrectly marked as 'KeyError', but enums throw
'ValueError' instead.
|
|
|
|
|
|
|
| |
'Enum' has a big performance impact on the running code. Replacing
it with a safe subset of its functionality removes much of this
overhead without losing the benefits of using enums (safe comparisons,
uniqueness).
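The flavor of such a replacement, a plain class with unique member instances instead of `enum.Enum`'s metaclass machinery (a simplified sketch, not BuildStream's actual implementation):

```python
class FastEnum:
    # Each member is a unique instance, so comparisons stay safe:
    # equality is identity, just like enum.Enum members.
    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __eq__(self, other):
        return self is other

    def __hash__(self):
        return id(self)

class JobStatus:
    OK = FastEnum("OK", 0)
    FAIL = FastEnum("FAIL", 1)
    SKIPPED = FastEnum("SKIPPED", 2)
```

Attribute access and comparison are now ordinary operations with no descriptor or metaclass lookups on the hot path.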
|
|
|
|
|
|
|
|
| |
If we're running BuildStream tests then pickle child jobs.
This ensures that we keep things picklable, whilst we work towards being
able to support platforms that need to use the 'spawn' method of
starting processes.
|
|
|
|
|
|
|
|
|
| |
Pave the way toward supporting the 'spawn' method of creating jobs, by
adding support for pickling ChildJobs. Introduce a new 'jobpickler'
module that provides an entrypoint for this functionality.
This also makes replays of jobs possible, which has made the debugging
of plugins much easier for me.
|
| |
|
|
|
|
|
|
|
|
|
| |
Reduce the amount of context shared with child jobs, by only sending the
messenger portion of it rather than the whole thing. Also send the
logdir.
This also means that we will need to pickle less stuff when using the
'spawn' method of multi-processing, as opposed to the 'fork' method.
|
|
|
|
|
|
| |
Instead of having methods in Context forward calls on to the Messenger,
have folks call the Messenger directly. Remove the forwarding methods in
Context.
|
|
|
|
|
|
|
|
| |
Remove the need to pass the Context object to message handlers, by
passing what is usually requested from the context instead.
This paves the way to sharing less information with some child jobs -
they won't need the whole context object, just the messenger.
|