| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
| |
Change-Id: I992dc0c1d40f563ade56a833162d409b02be90a0
|
| |
|
|\
| |
| |
| |
| | |
Reviewed-By: Adam Coldrick <adam.coldrick@codethink.co.uk>
Reviewed-By: Richard Maw <richard.maw@codethink.co.uk>
|
| |
| |
| |
| |
| | |
This makes it easier to spot if an incomplete build was due to the user
cancelling, or if it represents a dropped connection or internal error.
|
| | |
This message was hundreds of kilobytes in size, as it contained a
recursive list of dependencies for each artifact in the build graph. It
was used in the initiator only to print this message:
Build steps in total: 592
This message is now gone. The 'Need to build %d artifacts'
build-progress message now indicates the total build steps instead:
Need to build 300 artifacts, of 592 total
This is a compatible change to the distbuild protocol: old initiators
will continue to work as normal with new controllers that don't send
the build-steps message.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
It gets messy having hundreds of build-step-xx.log files in the current
directory, and if two builds are run in parallel from the same directory
the logs for a given chunk will be mixed together in one file.
Now, a new directory named build-0, build-1, build-2, etc. is created for
each new build.
If the user passes --initiator-step-output-dir, the logs will be placed
in that directory instead. This behaviour is the same as before.
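The directory selection described above could be sketched roughly as follows. This is only an illustration written for Python 3: the function name `create_build_log_dir` and its parameters are hypothetical, not taken from the morph source.

```python
import os


def create_build_log_dir(parent, explicit_dir=None):
    """Return the directory that build-step logs should be written to.

    `explicit_dir` stands in for the value of --initiator-step-output-dir;
    when it is given, it is used as-is. Otherwise the first unused
    build-N directory under `parent` is created.
    (Hypothetical helper, for illustration only.)
    """
    if explicit_dir is not None:
        os.makedirs(explicit_dir, exist_ok=True)
        return explicit_dir

    n = 0
    while True:
        candidate = os.path.join(parent, 'build-%d' % n)
        try:
            # mkdir fails if the directory already exists, so two builds
            # started from the same directory cannot pick the same name.
            os.mkdir(candidate)
            return candidate
        except FileExistsError:
            n += 1
```

Relying on `os.mkdir` failing, rather than checking for existence first, avoids a race between two initiators started in parallel.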
|
| |
| |
| |
| |
| |
| |
| |
| | |
Users build sources, not artifacts. So the log files should be called
build-step-systemd.log and not build-step-systemd-misc.log.
Note that strata are a special case, so you will still see
build-step-foundation-runtime.log, build-step-foundation-devel.log etc.
|
| | |
For a while we have seen an issue where output from build A would end up
in the log file of some other random chunk.
The problem turns out to be that the WorkerConnection class in the
controller-daemon assumes cancellation is instantaneous. If a build was
cancelled, the WorkerConnection would send a cancel message for the job
it was running, and then start a new job. However, the worker-daemon
process would have a backlog of exec-output messages and a delayed
exec-response message from the old job. The controller would receive
these and would assume that they were for the new job, without checking
the job ID in the messages. Thus they would be sent to the wrong log
file.
To fix this, the WorkerConnection class now tracks jobs by job ID, and
the code should be generally more robust when unexpected messages are
received.
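A minimal sketch of the idea, assuming messages are dictionaries that carry the job ID in an `id` field and output in a `stdout` field (both field names are assumptions, as is the class name). Only the `exec-output` and `exec-response` message types come from the description above.

```python
class WorkerConnectionSketch:
    """Illustrative only: route worker messages by job ID."""

    def __init__(self):
        self.current_job_id = None
        self.logs = {}       # job ID -> list of output chunks
        self.ignored = []    # messages that arrived for stale jobs

    def start_job(self, job_id):
        self.current_job_id = job_id
        self.logs[job_id] = []

    def cancel_current_job(self):
        # Cancellation is not instantaneous: the worker may still send
        # a backlog of messages for this job after this point.
        self.current_job_id = None

    def handle_message(self, msg):
        """Return True if the message was for the current job."""
        job_id = msg.get('id')
        if self.current_job_id is None or job_id != self.current_job_id:
            # Late message from a cancelled job: never write it to the
            # log of whichever job happens to be running now.
            self.ignored.append(msg)
            return False
        if msg['type'] == 'exec-output':
            self.logs[job_id].append(msg['stdout'])
        elif msg['type'] == 'exec-response':
            self.current_job_id = None
        return True
```

With this check in place, output left over from a cancelled job is dropped (or could be logged as unexpected) instead of landing in another chunk's log file.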
|
| | |
|
|\ \
| | |
| | |
| | |
| | |
| | | |
Reviewed-By: Richard Maw <richard.maw@codethink.co.uk>
Reviewed-By: Francisco Redondo Marchena <francisco.marchena@codethink.co.uk>
Reviewed-By: Mike Smith <mike.smith@codethink.co.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The logic to handle a worker disconnecting was broken. The
WorkerConnection object would remove itself from the main loop as soon
as the worker disconnected. But it would not get removed from the list
of available workers that the WorkerBuildQueue maintains. So the
controller would continue sending messages to this dead connection, and
the builds it sent would hang forever waiting for a response.
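The message does not spell out the fix, but the bookkeeping it implies can be sketched like this. The class and method names are hypothetical; the only point illustrated is that a disconnected worker must also leave the list of available workers.

```python
class WorkerBuildQueueSketch:
    """Illustrative only: keep the available-worker list in sync."""

    def __init__(self):
        self.available_workers = []

    def worker_ready(self, worker):
        if worker not in self.available_workers:
            self.available_workers.append(worker)

    def worker_disconnected(self, worker):
        # Without this step, jobs could still be handed to a dead
        # connection and would then wait forever for a response.
        if worker in self.available_workers:
            self.available_workers.remove(worker)

    def assign(self, job):
        """Return (worker, job), or None if no worker is available."""
        if not self.available_workers:
            return None
        return (self.available_workers.pop(0), job)
```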
|
| | | |
|
|\ \ \
| |_|/
|/| |
| | |
| | |
| | |
| | | |
'lauren/baserock/lauren/distbuild-invalid-input-crash'
Reviewed-By: Richard Maw <richard.maw@codethink.co.uk>
Reviewed-By: Sam Thursfield <sam.thursfield@codethink.co.uk>
|
| | | |
|
| |/ |
|
| | |
|
| |
| |
| |
| |
| |
| | |
Let the end-user see the URL that distbuild was attempting to talk to,
so they can more easily spot configuration errors. It's kind of silly
to say 'HTTP request failed' without saying where the request was going.
|
| | |
The previous error looked like this by the time it had reached the
initiator's console:
ERROR: Failed to build baserock:baserock/definitions
c7292b7c81cdd7e5b9e85722406371748453c44f
systems/base-system-x86_64-generic.morph.frodsham: Failed to compute
build graph. Problem with serialise-artifact: ERROR: Couldn't find
morphology: systems/base-system-x86_64-generic.morph.frodsham
The new message is at least a bit simpler:
ERROR: Failed to build baserock:baserock/definitions
c7292b7c81cdd7e5b9e85722406371748453c44f
systems/base-system-x86_64-generic.morph.frodsham: ERROR: Couldn't
find morphology: systems/base-system-x86_64-generic.morph.frodsham
|
| |
| |
| |
| |
| |
| | |
If there's no distbuild-helper process running on the controller then
the controller would hang forever. This situation is unlikely, but it's
important to give the user feedback instead of silently hanging forever.
|
| |
| |
| |
| |
| |
| |
| | |
There's no need to handle failure differently at each stage of the
build. Simpler to use the BuildFailed message for all errors. This
then allows us to have a single self.fail() function that can be used
everywhere.
|
| | |
|
|/
|
|
|
|
|
| |
Knowing which worker built something is useful for debugging, and right
now that information is only present on the initiator's console. It's
good to have it in the build-step-xx.log file too so the information
doesn't get lost.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
The recent changes to the BuildCommand.build() function caused distbuild
to break, because I didn't make the same change to the
InitiatorBuildCommand.build() function but did change how it was called.
This commit adds the ability to have optional fields in distbuild
messages. This is used to add an optional 'original_ref' field, which
will get passed to `morph serialise-artifact` by new distbuild
controllers, and will be ignored by older ones.
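Optional message fields could be handled along these lines. This is a sketch under assumptions: the message type name `build-request` and the required field names are made up for illustration; only `original_ref` is taken from the description above.

```python
# Hypothetical field tables, for illustration only.
REQUIRED_FIELDS = {'build-request': ['id', 'repo', 'ref', 'morphology']}
OPTIONAL_FIELDS = {'build-request': ['original_ref']}


def make_message(msg_type, **fields):
    """Build a message dict, allowing optional fields to be omitted."""
    required = REQUIRED_FIELDS[msg_type]
    optional = OPTIONAL_FIELDS.get(msg_type, [])

    missing = [name for name in required if name not in fields]
    unknown = [name for name in fields
               if name not in required and name not in optional]
    if missing or unknown:
        raise ValueError('missing=%r unknown=%r' % (missing, unknown))

    msg = {'type': msg_type}
    msg.update(fields)
    return msg


def original_ref_of(msg):
    """Readers must tolerate the optional field being absent."""
    return msg.get('original_ref')
```

A receiver that reads the field with `.get()` works with senders that include it and with senders that do not.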
|
| |
|
| |
|
|
|
|
|
|
| |
JSON can only handle unicode strings, but commands can write anything to
stdout/stderr, so we use the same trick as for serialisation and
JSON-encode a YAML dump.
|
|
|
|
|
|
| |
The horrible json.dumps of a YAML dump is there because we need the
output to be both binary safe (which YAML gives us) and one line per
message (which JSON gives us).
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
You can bind to an ephemeral port by passing 0 as the port number.
To work out which port you actually got, you need to call getsockname().
To facilitate being able to spawn multiple copies of the daemons for
testing environments, you can pass a -file option, which will make the
daemon write which port it actually bound to.
If this path is a FIFO, the spawner process can read from it to
synchronise: services that require that port are spawned only once the
port is ready.
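The mechanism can be sketched with the standard `socket` module. The function name and the `port_file` parameter are hypothetical; the actual option name is not reproduced here.

```python
import socket


def listen_on_ephemeral_port(port_file, host='127.0.0.1'):
    """Bind to an OS-chosen port and record it in `port_file`.

    Returns (listening_socket, port). Illustrative sketch only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))      # port 0 asks the OS for an ephemeral port
    sock.listen(5)
    port = sock.getsockname()[1]   # find out which port we really got

    # If port_file is a FIFO, this open() blocks until a reader opens
    # the other end, which is what lets a spawner wait for readiness.
    with open(port_file, 'w') as f:
        f.write('%d\n' % port)
    return sock, port
```

Writing the port only after `listen()` has been called means that anything reading the file can connect straight away.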
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
The "unicode fix" worked for the subset of cases relevant, and only
broke distbuild because its tests have not been integrated with ./check,
so the fact that it broke for any string ending with a \ escaped notice,
if you will excuse the pun.
During json.load, the `encoding` option specifies the character
encoding of the file or string that is being loaded.
During json.dump, the `encoding` option specifies the encoding of `str`
keys and values.
The fact that it worked for the set of cases we cared about is a small
mystery, probably caused by the strings we happened to give it being
valid unicode-escape encoded `str`ings.
A full fix would require either converting all these cases to a
different format, such as YAML, which will handle input data not being
valid Unicode, or pre-processing the data that is passed to `json.dump`
to convert all `str` instances to an appropriately escaped `unicode`,
and converting back on `json.load`, but this is a quick fix to get the
distbuild code working again.
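The trailing-backslash failure can be reproduced with the `unicode_escape` codec directly. Note the assumption: this sketch uses the Python 3 codec on byte strings, not the Python 2 `json` module that the commit changed, so it only illustrates why such input cannot be unescaped.

```python
def unescape(data):
    """Decode backslash escapes in a byte string."""
    return data.decode('unicode_escape')


def fails_to_unescape(data):
    """Return True if `data` is not valid unicode-escape input."""
    try:
        unescape(data)
    except UnicodeDecodeError:
        return True
    return False


# A well-formed escape decodes as expected ...
assert unescape(b'caf\\xe9') == 'caf\xe9'
# ... but a lone backslash at the end of the string is rejected.
assert fails_to_unescape(b'ends with a \\')
```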
|
|\
| |
| |
| |
| | |
Reviewed-by: Lars Wirzenius
Reviewed-by: Pedro Alvarez
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
json only accepts unicode. Various APIs such as file paths and environment
variables allow binary data, so we need to support this properly.
This patch changes every[1] use of json.load or json.dump to escape
non-unicode data strings. The output appears exactly as it used to if
the input was valid unicode; if it isn't, \xabcd escapes are inserted in
place of the non-unicode data.
When loading back in, if json.load is told to unescape it with
`encoding='unicode-escape'` then it will convert it back correctly.
This change was primarily to support file paths that weren't valid
unicode, where this would choke and die. Now it works, but any tools
that parsed the metadata need to unescape the paths.
[1]: The interface to the remote repo cache uses json data, but I haven't
changed its json.load calls to unescape the data, since the repo
caches haven't been made to escape the data.
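For comparison, here is how a similar round trip might look in Python 3. This deliberately uses a different technique from the patch: the `surrogateescape` error handler rather than `encoding='unicode-escape'`. The function names and the `path` key are hypothetical.

```python
import json


def dump_path(raw):
    """Serialise a possibly non-UTF-8 byte path as one line of JSON."""
    # Undecodable bytes become lone surrogates, which json.dumps
    # writes as \udcXX escapes, so the output stays plain ASCII.
    text = raw.decode('utf-8', 'surrogateescape')
    return json.dumps({'path': text})


def load_path(line):
    """Recover the original bytes from the output of dump_path()."""
    text = json.loads(line)['path']
    return text.encode('utf-8', 'surrogateescape')
```

As with the escaping described above, valid input is stored unchanged, and any tool that reads the metadata has to apply the reverse step to get the original bytes back.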
|
|/
|
|
|
| |
This will make it easier to determine what is wrong if the controller
daemon is run with a bad controller host address.
|
| |
I found an issue in distbuild where the controller was stuck in a busy
loop where it was continually writing to a closed socket. With 'strace'
I saw write(), SIGPIPE, write(), SIGPIPE, ad infinitum. I got this much
of a Python backtrace using GDB:
distbuild.socketsrc.SocketEventSource.write()
distbuild.sockbuf.SocketBuffer._flush()
distbuild.sm.StateMachine.handle_event()
I didn't manage to get further. However, I suspect one of the state
machine transitions may be creating an event loop instead of correctly
handling the error.
The log file was quiet at this point, the last entries were:
2014-06-19 08:57:36 INFO There seems to be nothing to build
2014-06-19 08:57:36 INFO Requested artifact is built
2014-06-19 08:57:36 DEBUG InitiatorConnection: sent to 10.24.1.215:53818: {'message': 'Need to build 0 artifacts', 'type': 'build-progress', 'id': 790629564}
2014-06-19 08:57:36 DEBUG Notifying initiator of successful build
2014-06-19 08:57:36 DEBUG MainLoop.remove_state_machine: <BuildController at 0xb6c554c, request-id InitiatorConnection-93>
2014-06-19 08:57:36 DEBUG InitiatorConnection: sent to 10.24.1.215:53818: {'type': 'build-finished', 'id': 790629564, 'urls': [u'http://hawkdevtrove:8080/1.0/artifacts?filename=861f640923494ca3626bbd65655b350ce1bebea4c0bf7a57693bc06ed122cef4.system.devel-system-x86_32-chroot-rootfs']}
2014-06-19 08:57:36 DEBUG InitiatorConnection: 10.24.1.215:53818: closing: <JsonMachine at 0xc6cb22c: socket 10.24.1.164:7878 -> 10.24.1.215:53818, max_buffer 16384>
2014-06-19 08:57:36 DEBUG MainLoop.remove_state_machine: <InitiatorConnection at 0xc6cbcec: remote 10.24.1.215:53818>
2014-06-19 08:57:36 DEBUG MainLoop.remove_state_machine: <JsonMachine at 0xc6cb22c: socket 10.24.1.164:7878 -> 10.24.1.215:53818, max_buffer 16384>
2014-06-19 08:57:36 DEBUG MainLoop.remove_state_machine: <SocketBuffer at 0xc6cbe2c: socket None max_buffer 16384>
This commit should improve matters a little: in future the log file will show
the ID of the SocketEventSource object and the error we hit when calling its
write() function.
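The kind of logging described could look roughly like this. It is a sketch, not the distbuild code: `write_logged` is a hypothetical helper that wraps a plain socket rather than a SocketEventSource.

```python
import logging
import socket


def write_logged(sock, data):
    """Send `data`, logging the object ID and the error on failure.

    Returns the number of bytes sent, or None if the write failed.
    """
    try:
        return sock.send(data)
    except OSError as e:
        # Record which object failed and why, so that a repeated
        # write to a dead socket is visible in the log file.
        logging.error('socket 0x%x: write failed: %s', id(sock), e)
        return None
```

Returning a distinct value on failure gives the caller a chance to stop writing instead of retrying forever.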
|
| |
|
|
|
|
| |
This change is made just for consistency.
|
|
|
|
|
|
| |
The InitiatorConnectionMachine wraps the ConnectionMachine,
so we can continue to use ConnectionMachine without providing
it with an app.
|
|
|
|
|
|
|
| |
By default there is no limit on the number of reconnection attempts.
We make the reconnect_interval a parameter, but the default
interval remains 1 second.
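A minimal sketch of this retry policy, assuming a `connect` callable that raises OSError on failure. The names `connect_with_retries` and `max_retries` and the injectable `sleep` are hypothetical; only `reconnect_interval` and its 1 second default come from the description above.

```python
import time


def connect_with_retries(connect, max_retries=None, reconnect_interval=1,
                         sleep=time.sleep):
    """Call `connect` until it succeeds.

    max_retries=None means there is no limit on reconnection attempts.
    """
    retries = 0
    while True:
        try:
            return connect()
        except OSError:
            if max_retries is not None and retries >= max_retries:
                raise
            retries += 1
            # Wait before the next attempt rather than spinning.
            sleep(reconnect_interval)
```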
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the mainloop unintentionally busy-waits if
we can't connect to the controller.
We want the mainloop's select call to wait for the timeout,
not for this socket's descriptors (which are always ready).
We could just call stop_reading() and stop_writing() but since
we won't be needing this socket again we may as well close the
entire event source, which calls stop_reading(), stop_writing()
and then closes the socket.
|
| |
|
|
|
|
| |
We always want to warn if we attempt to remove a job that's not present.
|
| |
|
|\
| |
| |
| |
| |
| |
| | |
Reviewed by:
Sam Thursfield
Adam Coldrick
Richard Maw
|