commit    44ca84166e5a909f1acd77323a444ff2e14a8db8 (patch)
Author:   Tim Beale <timbeale@catalyst.net.nz>   2017-08-11 13:53:31 +1200
Commit:   Garming Sam <garming@samba.org>        2017-09-18 05:51:25 +0200
Tree:     fd212e528bdf859b9dc9849ff5ac480ee8e73531 /source4/torture/drs
Parent:   278039ff78c97bda143b21318a59edbd18fa3825 (diff)
Download: samba-44ca84166e5a909f1acd77323a444ff2e14a8db8.tar.gz
replmd: Allow missing targets if GET_TGT has already been set
While running the selftests, I noticed a case where DC replication
unexpectedly sends a linked attribute for a deleted object (created in
the drs.ridalloc_exop tests). The problem is due to the
msDS-NC-Replica-Locations attribute, which is a (known) one-way link.
Because it is a one-way link, when the test demotes the DC and deletes
the link target, there is no backlink to delete the link from the source
object.
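The one-way-link problem above can be illustrated with a toy model (hypothetical Python, not Samba code, with invented names): a two-way link keeps a backlink on the target, so deleting the target can clean up the forward link; a one-way link such as msDS-NC-Replica-Locations has no backlink, so the forward link is left dangling.

```python
# Toy model (hypothetical, not Samba code) of one-way vs two-way links.
class Obj:
    def __init__(self, name):
        self.name = name
        self.links = set()      # forward links (attribute values)
        self.backlinks = set()  # objects pointing at us (two-way links only)

def add_link(source, target, two_way):
    source.links.add(target)
    if two_way:
        target.backlinks.add(source)

def delete(obj):
    # On deletion, only sources reachable via a backlink can be cleaned up
    for src in obj.backlinks:
        src.links.discard(obj)
    obj.backlinks.clear()

src1, src2 = Obj("src1"), Obj("src2")
tgt = Obj("dc-to-demote")
add_link(src1, tgt, two_way=True)   # e.g. member/memberOf
add_link(src2, tgt, two_way=False)  # e.g. msDS-NC-Replica-Locations

delete(tgt)
print(tgt in src1.links)  # False - backlink allowed cleanup
print(tgt in src2.links)  # True - dangling one-way link remains
```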
After much debate and head-scratching, we decided that there wasn't an
ideal way to resolve this problem. Any automated intervention could
potentially do the wrong thing, especially if the link spans partitions.
Running dbcheck will find this problem and is able to fix it (provided
the deleted object is still a tombstone). So the recommendation is to
run dbcheck on your DCs every 6 months (or more frequently if using a
lower tombstone lifetime setting).
However, it does highlight a problem with the current GET_TGT
implementation. If the tombstone object had been expunged and you
upgraded to 4.8, then you would be stuck - replication would fail
because the target object can't be resolved, even with GET_TGT, and
dbcheck would not be able to fix the hanging link. The solution is to
not fail the replication for an unknown target if GET_TGT has already
been set (i.e. the dsdb_repl_flags contains
DSDB_REPL_FLAG_TARGETS_UPTODATE).
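The decision above can be sketched in a few lines (a simplified Python sketch with illustrative names and an illustrative flag value; the real logic is C code in Samba's repl_meta_data module):

```python
# Simplified sketch (hypothetical helper, illustrative flag value) of the
# unknown-target decision described above.
DSDB_REPL_FLAG_TARGETS_UPTODATE = 0x1  # value is illustrative only

GET_TGT_RETRY = "retry-with-GET_TGT"
IGNORE_LINK = "ignore-link"

def handle_unknown_target(dsdb_repl_flags):
    """Decide what to do when a linked attribute's target can't be resolved."""
    if dsdb_repl_flags & DSDB_REPL_FLAG_TARGETS_UPTODATE:
        # GET_TGT was already used for this replication cycle, so retrying
        # cannot help - ignore the link rather than failing forever.
        return IGNORE_LINK
    # Otherwise restart the replication cycle with GET_TGT to fetch
    # up-to-date target information.
    return GET_TGT_RETRY

print(handle_unknown_target(0))                                # retry-with-GET_TGT
print(handle_unknown_target(DSDB_REPL_FLAG_TARGETS_UPTODATE))  # ignore-link
```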
It's debatable whether we should add a hanging link in this case or
ignore/drop the link. Some cases to consider:
- If you're talking to a DC that still sends all the links last, you
could still get object deletion between processing the source object's
links and sending the target (GET_TGT just restarts the replication
cycle from scratch). Adding a hanging link in this case would be
incorrect and would add spurious information to the DB.
- Suppose there's a bug in Samba that incorrectly results in an object
disappearing. If other DCs then remove any links that pointed to that
object, it makes recovering from the problem harder. However, simply
ignoring the link shouldn't result in data loss, i.e. replication won't
remove the existing link information from other DCs. Data loss in this
case would only occur if a new DC were brought online, or if it were a
new link that was affected.
Based on this, I think ignoring the link does the least harm.
This problem also highlights that we should really be using the same
logic in both the unknown target and the deleted target cases.
Combining the logic and moving it into a common
replmd_allow_missing_target() function fixes the problem. (This also has
the side-effect of fixing another logic flaw - in the deleted object
case we would unnecessarily retry with GET_TGT if the target object was
in another partition. This is pointless work, because GET_TGT won't
resolve the target).
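The combined logic can be sketched as follows (a hedged Python sketch with an invented signature; the actual replmd_allow_missing_target() is C code in Samba):

```python
# Hedged sketch (invented signature, not the real C function) of the combined
# missing-target logic: both the unknown-target and deleted-target cases go
# through the same decision, including the cross-partition short-circuit.
def allow_missing_target(targets_uptodate, same_partition):
    """Return (allow_missing, retry_with_get_tgt)."""
    if targets_uptodate:
        # GET_TGT already set for this cycle: accept the missing target
        # (i.e. ignore/drop the link) instead of failing replication.
        return True, False
    if not same_partition:
        # Cross-partition target: GET_TGT can't resolve it, so retrying
        # would be pointless work - accept the missing target immediately.
        return True, False
    # Same partition and GET_TGT not yet tried: restart with GET_TGT.
    return False, True
```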
Signed-off-by: Tim Beale <timbeale@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
BUG: https://bugzilla.samba.org/show_bug.cgi?id=12972
Reviewed-by: Garming Sam <garming@catalyst.net.nz>
Diffstat (limited to 'source4/torture/drs')
-rw-r--r--  source4/torture/drs/python/getncchanges.py  24
1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/source4/torture/drs/python/getncchanges.py b/source4/torture/drs/python/getncchanges.py
index 9022f4a1909..b1a8f68c2c0 100644
--- a/source4/torture/drs/python/getncchanges.py
+++ b/source4/torture/drs/python/getncchanges.py
@@ -1042,8 +1042,30 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
         self.assertFalse("managedBy" in res[0],
                          "%s in DB still has managedBy attribute" % la_source)
 
+        # Check receiving a cross-partition link to a deleted target.
+        # Delete the target and make sure the deletion is sync'd between DCs
+        target_guid = self.get_object_guid(la_target)
+        self.test_ldb_dc.delete(la_target)
+        self.sync_DCs(nc_dn=self.config_dn)
+        self._disable_all_repl(self.dnsname_dc2)
+
+        # re-animate the target
+        self.restore_deleted_object(target_guid, la_target)
+        self.modify_object(la_source, "managedBy", la_target)
+
+        # now sync the link - because the target is in another partition, the
+        # peer DC receives a link for a deleted target, which it should accept
+        self.sync_DCs()
+        res = self.test_ldb_dc.search(ldb.Dn(self.ldb_dc1, la_source),
+                                      attrs=["managedBy"],
+                                      controls=['extended_dn:1:0'],
+                                      scope=ldb.SCOPE_BASE)
+        self.assertTrue("managedBy" in res[0],
+                        "%s in DB missing managedBy attribute" % la_source)
+
         # cleanup the server object we created in the Configuration partition
-        self.test_ldb_dc.delete(la_source)
+        self.test_ldb_dc.delete(la_target)
+        self._enable_all_repl(self.dnsname_dc2)
 
     def test_repl_get_tgt_multivalued_links(self):
         """Tests replication with multi-valued link attributes."""