| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
This will make it easier to determine what is wrong if the controller
daemon is run with a bad controller host address.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I found an issue in distbuild where the controller was stuck in a busy
loop where it was continually writing to a closed socket. With 'strace'
I saw write(), SIGPIPE, write(), SIGPIPE, ad infinitum. I got this much
of a Python backtrace using GDB:
distbuild.socketsrc.SocketEventSource.write()
distbuild.sockbuf.SocketBuffer._flush()
distbuild.sm.StateMachine.handle_event()
I didn't manage to get further. However, I suspect one of the state
machine transitions may be creating an event loop instead of correctly
handling the error.
The log file was quiet at this point, the last entries were:
2014-06-19 08:57:36 INFO There seems to be nothing to build
2014-06-19 08:57:36 INFO Requested artifact is built
2014-06-19 08:57:36 DEBUG InitiatorConnection: sent to 10.24.1.215:53818: {'mess
age': 'Need to build 0 artifacts', 'type': 'build-progress', 'id': 790629564}
2014-06-19 08:57:36 DEBUG Notifying initiator of successful build
2014-06-19 08:57:36 DEBUG MainLoop.remove_state_machine: <BuildController at 0xb
6c554c, request-id InitiatorConnection-93>
2014-06-19 08:57:36 DEBUG InitiatorConnection: sent to 10.24.1.215:53818: {'type
': 'build-finished', 'id': 790629564, 'urls': [u'http://hawkdevtrove:8080/1.0/ar
tifacts?filename=861f640923494ca3626bbd65655b350ce1bebea4c0bf7a57693bc06ed122cef
4.system.devel-system-x86_32-chroot-rootfs']}
2014-06-19 08:57:36 DEBUG InitiatorConnection: 10.24.1.215:53818: closing: <Json
Machine at 0xc6cb22c: socket 10.24.1.164:7878 -> 10.24.1.215:53818, max_buffer 1
6384>
2014-06-19 08:57:36 DEBUG MainLoop.remove_state_machine: <InitiatorConnection at 0xc6cbcec: remote 10.24.1.215:53818>
2014-06-19 08:57:36 DEBUG MainLoop.remove_state_machine: <JsonMachine at 0xc6cb22c: socket 10.24.1.164:7878 -> 10.24.1.215:53818, max_buffer 16384>
2014-06-19 08:57:36 DEBUG MainLoop.remove_state_machine: <SocketBuffer at 0xc6cbe2c: socket None max_buffer 16384>
This commit should improve matters a little: in future the log file will show
the ID of the SocketEventSource object and error we hit when calling its
write() function.
|
| |
|
|
|
|
| |
This change is made just for consistency.
|
|
|
|
|
|
| |
The InitiatorConnectionMachine wraps the ConnectionMachine,
so we can continue to use ConnectionMachine without providing
it with an app.
|
|
|
|
|
|
|
| |
By default there is no limit on the number of reconnection attempts.
We make the reconnect_interval a parameter, but the default
interval remains 1 second.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the mainloop unintentionally busy-waits if
we can't connect to the controller.
We want the mainloop's select call to wait for the timeout
not for this socket's descriptors (which are always ready).
We could just call stop_reading() and stop_writing() but since
we won't be needing this socket again we may as well close the
entire event source, which calls stop_reading(), stop_writing()
and then closes the socket.
|
| |
|
|
|
|
| |
We always want to warn if we attempt to remove a job that's not present
|
| |
|
|\
| |
| |
| |
| |
| |
| | |
Reviewed by:
Sam Thursfield
Adam Coldrick
Richard Maw
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If a new build request makes a request for an artifact that is currently being
cached then the artifact will be needlessly rebuilt.
To avoid this the new build request should wait for caching to finish.
We rename _ExecStarted, _ExecEnded, _ExecFailed to
_JobStarted, _JobFinished, _JobFailed
and Job's is_building attribute is renamed to running.
|
| |
| |
| |
| |
| | |
This fixes the bug that causes the distbuild controller
to crash when population of the artifact cache fails.
|
|/ |
|
|\
| |
| |
| |
| | |
Reviewed-By: Richard Ipsum <richard.ipsum@codethink.co.uk>
Reviewed-By: Lars Wirzenius <lars.wirzenius@codethink.co.uk>
|
| |
| |
| |
| |
| | |
Users need to be able to see logs of all builds, not just those that
failed.
|
|/ |
|
|
|
|
| |
To cancel jobs cleanly we need to know when a job has failed.
|
| |
|
| |
|
|
|
|
| |
add_initiator() isn't necessary given lists have a remove method.
|
| |
|
| |
|
| |
|
|\
| |
| |
| |
| |
| |
| | |
Reviewed by:
Sam Thursfield
Richard Maw
Lars Wirzenius
|
| | |
|
| | |
|
| | |
|
| |
| |
| |
| |
| |
| | |
The contents of the message has changed for several events,
event messages that need to be sent to several initiators have a list
of ids instead of a single id.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
There are two new messages:
WorkerBuildStepAlreadyStarted tells the initiator that the artifact
they want to build is already being built, e.g.
'eglibc-misc is already building on 172.17.1.37:3434'
WorkerBuildWaiting tells the initiator that the artifact they want
to build can't be built yet because there aren't any workers free, e.g.
'Ready to build eglibc-misc: waiting for a worker to become available'
|
| | |
|
| |
| |
| |
| |
| |
| |
| | |
Put our _exec_response_msg into WorkerBuildFinished event,
it's essentially the same as _finished_msg, just a different name
Get our artifact's cache key from the job
|
| |
| |
| |
| | |
Now we just get everything from the job object
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The exec_response_msg also needs to be sent to a number of initiators,
so we give it a list of ids not just one.
The exec_response_msg will be sent to the controller once the artifacts
have been cached successfully.
There's no longer any need to use a route map to retrieve
the id of the initiator, since this is stored with the job
|
| |
| |
| |
| |
| | |
msg now contains a list of initiator ids rather than a single one,
since BuiltOutput needs to be sent to a number of initiators
|
| |
| |
| |
| |
| |
| | |
Each job is given a unique id, so we don't need to generate
an id for each exec request this means we can remove use of route map
since we can use the job's id for the exec request
|
| |
| |
| |
| | |
This method no longer works, we will replace it soon.
|
| |
| |
| |
| |
| | |
The name change from BuildFailed -> JobFailed etc
was unintentionally merged into master, undo this.
|
| |
| |
| |
| |
| |
| | |
_job is the job this worker is carrying out
_exec_response_msg will contain the response the worker sends back to us
when it finishes the build.
|
| | |
|
| | |
|
| |
| |
| |
| | |
We need to be able to send this message to a number of initiators
|
| | |
|
|/ |
|
|
|
|
|
|
| |
Most of the time knowing the state of the build
graph isn't that useful for debugging distbuild,
but it may be useful in some situations.
|
|
|
|
|
|
| |
The controller should check the response event is
actually the response it was waiting for before
logging that there was a cache response
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Conflicts:
distbuild/build_controller.py
Reviewed by:
Lars Wirzenius
Daniel Silverstone
Sam Thursfield
|
| |
| |
| |
| | |
We now get the state of all artifacts with a single request.
|
| |
| |
| |
| | |
body and headers must now be specified for http-request message.
|
| | |
|