summaryrefslogtreecommitdiff
path: root/migration/rdma.c
Commit message (Collapse)AuthorAgeFilesLines
* migration/rdma: Check for postcopy soonerJuan Quintela2023-05-051-12/+12
| | | | | | | | | | It makes no sense first try to see if there is an rdma error and then do nothing on postcopy stage. Change it so we check we are in postcopy before doing anything. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20230504114443.23891-6-quintela@redhat.com>
* migration/rdma: We can calculate the rioc from the QEMUFileJuan Quintela2023-05-051-3/+3
| | | | | | Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20230504114443.23891-4-quintela@redhat.com>
* migration/rdma: Don't pass the QIOChannelRDMA as an opaqueJuan Quintela2023-05-051-3/+3
| | | | | | | | We can calculate it from the QEMUFile like the caller. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20230503131847.11603-6-quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration/rdma: Unfold last user of acct_update_position()Juan Quintela2023-05-031-1/+3
| | | | | Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de>
* migration/rdma: Split the zero page case from acct_update_positionJuan Quintela2023-05-031-2/+5
| | | | | | | | Now that we have atomic counters, we can do it on the place that we need it, no need to do it inside ram.c. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de>
* migration: Create migrate_rdma_pin_all() functionJuan Quintela2023-04-241-3/+3
| | | | | | | | | Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> --- Fixed missing space after comma (fabiano)
* migration: Move migrate_use_return() to options.cJuan Quintela2023-04-241-3/+3
| | | | | | | | Once that we are there, we rename the function to migrate_return_path() to be consistent with all other capabilities. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
* migration: rename enabled_capabilities to capabilitiesJuan Quintela2023-04-241-2/+2
| | | | | | | | | It is clear from the context what that means, and such a long name with the extra long names of the capabilities make very difficilut to stay inside the 80 columns limit. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
* migration/rdma: Remove deprecated variable rdma_return_pathLi Zhijian2023-03-161-2/+1
| | | | | | | | | It's no longer needed since commit 44bcfd45e98 ("migration/rdma: destination: create the return patch after the first accept") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration/rdma: Fix return-path caseDr. David Alan Gilbert2023-03-161-3/+5
| | | | | | | | | | | | | | | | | The RDMA code has return-path handling code, but it's only enabled if postcopy is enabled; if the 'return-path' migration capability is enabled, the return path is NOT setup but the core migration code still tries to use it and breaks. Enable the RDMA return path if either postcopy or the return-path capability is enabled. bz: https://bugzilla.redhat.com/show_bug.cgi?id=2063615 Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* io: Add support for MSG_PEEK for socket channelmanish.mishra2023-02-061-0/+1
| | | | | | | | | | | | | | MSG_PEEK peeks at the channel, The data is treated as unread and the next read shall still return this data. This support is currently added only for socket class. Extra parameter 'flags' is added to io_readv calls to pass extra read flags like MSG_PEEK. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Suggested-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: manish.mishra <manish.mishra@nutanix.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration/rdma: fix return value for qio_channel_rdma_{readv,writev}Fiona Ebner2023-02-061-5/+10
| | | | | | | | | | | | | | | | | | | upon errors. As the documentation in include/io/channel.h states, only -1 and QIO_CHANNEL_ERR_BLOCK should be returned upon error. Other values have the potential to confuse the call sites. error_setg is used rather than error_setg_errno, because there are certain code paths where -1 (as a non-errno) is propagated up (e.g. starting from qemu_rdma_block_for_wrid or qemu_rdma_post_recv_control) all the way to qio_channel_rdma_{readv,writev}. Similar to a216ec85b7 ("migration/channel-block: fix return value for qio_channel_block_{readv,writev}"). Suggested-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: remove the QEMUFileOps abstractionDaniel P. Berrangé2022-06-231-3/+2
| | | | | | | | | | | Now that all QEMUFile callbacks are removed, the entire concept can be deleted. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: stop passing 'opaque' parameter to QEMUFile hooksDaniel P. Berrangé2022-06-221-9/+10
| | | | | | | | | | | The only user of the hooks is RDMA which provides a QIOChannel backed impl of QEMUFile. It can thus use the qemu_file_get_ioc() method. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: remove unreachble RDMA code in save_hook implDaniel P. Berrangé2022-06-221-99/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | The QEMUFile 'save_hook' callback has a 'size_t size' parameter. The RDMA impl of this has logic that takes different actions depending on whether the value is zero or non-zero. It has commented out logic that would have taken further actions if the value was negative. The only place where the 'save_hook' callback is invoked is the ram_control_save_page() method, which passes 'size' through from its caller. The only caller of this method is in turn control_save_page(). This method unconditionally passes the 'TARGET_PAGE_SIZE' constant for the 'size' parameter. IOW, the only scenario for 'size' that can execute in the qemu_rdma_save_page method is 'size > 0'. The remaining code has been unreachable since RDMA support was first introduced 9 years ago. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: Remove RDMA_UNREGISTRATION_EXAMPLEJuan Quintela2022-06-221-41/+0
| | | | | | | | | | Nobody has ever showed up to unregister individual pages, and another set of patches written by Daniel P. Berrangé <berrange@redhat.com> just remove qemu_rdma_signal_unregister() function needed here. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* QIOChannel: Add flags on io_writev and introduce io_flush callbackLeonardo Bras2022-05-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add flags to io_writev and introduce io_flush as optional callback to QIOChannelClass, allowing the implementation of zero copy writes by subclasses. How to use them: - Write data using qio_channel_writev*(...,QIO_CHANNEL_WRITE_FLAG_ZERO_COPY), - Wait write completion with qio_channel_flush(). Notes: As some zero copy write implementations work asynchronously, it's recommended to keep the write buffer untouched until the return of qio_channel_flush(), to avoid the risk of sending an updated buffer instead of the buffer state during write. As io_flush callback is optional, if a subclass does not implement it, then: - io_flush will return 0 without changing anything. Also, some functions like qio_channel_writev_full_all() were adapted to receive a flag parameter. That allows shared code between zero copy and non-zero copy writev, and also an easier implementation on new flags. Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20220513062836.965425-3-leobras@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration/rdma: set the REUSEADDR option for destinationJack Wang2022-03-021-0/+7
| | | | | | | | | | | | | | | | | | | We hit following error during testing RDMA transport: in case of migration error, mgmt daemon pick one migration port, incoming rdma:[::]:8089: RDMA ERROR: Error: could not rdma_bind_addr Then try another -incoming rdma:[::]:8103, sometime it worked, sometimes need another try with other ports number. Set the REUSEADDR option for destination, This allow address could be reused to avoid rdma_bind_addr error out. Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Message-Id: <20220208085640.19702-1-jinpu.wang@ionos.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: Fixed up some tabs
* aio-posix: split poll check from ready handlerStefan Hajnoczi2022-01-121-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adaptive polling measures the execution time of the polling check plus handlers called when a polled event becomes ready. Handlers can take a significant amount of time, making it look like polling was running for a long time when in fact the event handler was running for a long time. For example, on Linux the io_submit(2) syscall invoked when a virtio-blk device's virtqueue becomes ready can take 10s of microseconds. This can exceed the default polling interval (32 microseconds) and cause adaptive polling to stop polling. By excluding the handler's execution time from the polling check we make the adaptive polling calculation more accurate. As a result, the event loop now stays in polling mode where previously it would have fallen back to file descriptor monitoring. The following data was collected with virtio-blk num-queues=2 event_idx=off using an IOThread. Before: 168k IOPS, IOThread syscalls: 9837.115 ( 0.020 ms): IO iothread1/620155 io_submit(ctx_id: 140512552468480, nr: 16, iocbpp: 0x7fcb9f937db0) = 16 9837.158 ( 0.002 ms): IO iothread1/620155 write(fd: 103, buf: 0x556a2ef71b88, count: 8) = 8 9837.161 ( 0.001 ms): IO iothread1/620155 write(fd: 104, buf: 0x556a2ef71b88, count: 8) = 8 9837.163 ( 0.001 ms): IO iothread1/620155 ppoll(ufds: 0x7fcb90002800, nfds: 4, tsp: 0x7fcb9f1342d0, sigsetsize: 8) = 3 9837.164 ( 0.001 ms): IO iothread1/620155 read(fd: 107, buf: 0x7fcb9f939cc0, count: 512) = 8 9837.174 ( 0.001 ms): IO iothread1/620155 read(fd: 105, buf: 0x7fcb9f939cc0, count: 512) = 8 9837.176 ( 0.001 ms): IO iothread1/620155 read(fd: 106, buf: 0x7fcb9f939cc0, count: 512) = 8 9837.209 ( 0.035 ms): IO iothread1/620155 io_submit(ctx_id: 140512552468480, nr: 32, iocbpp: 0x7fca7d0cebe0) = 32 174k IOPS (+3.6%), IOThread syscalls: 9809.566 ( 0.036 ms): IO iothread1/623061 io_submit(ctx_id: 140539805028352, nr: 32, iocbpp: 0x7fd0cdd62be0) = 32 9809.625 ( 0.001 ms): IO iothread1/623061 write(fd: 103, buf: 0x5647cfba5f58, count: 8) = 8 9809.627 ( 0.002 ms): IO iothread1/623061 write(fd: 104, buf: 0x5647cfba5f58, count: 8) = 8 9809.663 ( 0.036 ms): IO iothread1/623061 io_submit(ctx_id: 140539805028352, nr: 32, iocbpp: 0x7fd0d0388b50) = 32 Notice that ppoll(2) and eventfd read(2) syscalls are eliminated because the IOThread stays in polling mode instead of falling back to file descriptor monitoring. As usual, polling is not implemented on Windows so this patch ignores the new io_poll_read() callback in aio-win32.c. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20211207132336.36627-2-stefanha@redhat.com [Fixed up aio_set_event_notifier() calls in tests/unit/test-fdmon-epoll.c added after this series was queued. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* migration/rdma: Fix out of order wridLi Zhijian2021-11-011-37/+101
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | destination: ../qemu/build/qemu-system-x86_64 -enable-kvm -netdev tap,id=hn0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device e1000,netdev=hn0,mac=50:52:54:00:11:22 -boot c -drive if=none,file=./Fedora-rdma-server-migration.qcow2,id=drive-virtio-disk0 -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -vga qxl -spice streaming-video=filter,port=5902,disable-ticketing -incoming rdma:192.168.22.23:8888 qemu-system-x86_64: -spice streaming-video=filter,port=5902,disable-ticketing: warning: short-form boolean option 'disable-ticketing' deprecated Please use disable-ticketing=on instead QEMU 6.0.50 monitor - type 'help' for more information (qemu) trace-event qemu_rdma_block_for_wrid_miss on (qemu) dest_init RDMA Device opened: kernel name rxe_eth0 uverbs device name uverbs2, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs2, infiniband class device path /sys/class/infiniband/rxe_eth0, transport: (2) Ethernet qemu_rdma_block_for_wrid_miss A Wanted wrid CONTROL SEND (2000) but got CONTROL RECV (4000) source: ../qemu/build/qemu-system-x86_64 -enable-kvm -netdev tap,id=hn0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device e1000,netdev=hn0,mac=50:52:54:00:11:22 -boot c -drive if=none,file=./Fedora-rdma-server.qcow2,id=drive-virtio-disk0 -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -vga qxl -spice streaming-video=filter,port=5901,disable-ticketing -S qemu-system-x86_64: -spice streaming-video=filter,port=5901,disable-ticketing: warning: short-form boolean option 'disable-ticketing' deprecated Please use disable-ticketing=on instead QEMU 6.0.50 monitor - type 'help' for more information (qemu) (qemu) trace-event qemu_rdma_block_for_wrid_miss on (qemu) migrate -d rdma:192.168.22.23:8888 source_resolve_host RDMA Device opened: kernel name rxe_eth0 uverbs device name uverbs2, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs2, infiniband class device path /sys/class/infiniband/rxe_eth0, transport: (2) Ethernet (qemu) qemu_rdma_block_for_wrid_miss A Wanted wrid WRITE RDMA (1) but got CONTROL RECV (4000) NOTE: we use soft RoCE as the rdma device. [root@iaas-rpma images]# rdma link show rxe_eth0/1 link rxe_eth0/1 state ACTIVE physical_state LINK_UP netdev eth0 This migration could not be completed when out of order(OOO) CQ event occurs. The send queue and receive queue shared a same completion queue, and qemu_rdma_block_for_wrid() will drop the CQs it's not interested in. But the dropped CQs by qemu_rdma_block_for_wrid() could be later CQs it wants. So in this case, qemu_rdma_block_for_wrid() will block forever. OOO cases will occur in both source side and destination side. And a forever blocking happens on only SEND and RECV are out of order. OOO between 'WRITE RDMA' and 'RECV' doesn't matter. below the OOO sequence: source destination rdma_write_one() qemu_rdma_registration_handle() 1. S1: post_recv X D1: post_recv Y 2. wait for recv CQ event X 3. D2: post_send X ---------------+ 4. wait for send CQ send event X (D2) | 5. recv CQ event X reaches (D2) | 6. +-S2: post_send Y | 7. | wait for send CQ event Y | 8. | recv CQ event Y (S2) (drop it) | 9. +-send CQ event Y reaches (S2) | 10. send CQ event X reaches (D2) -----+ 11. wait recv CQ event Y (dropped by (8)) Although a hardware IB works fine in my a hundred of runs, the IB specification doesn't guaratee the CQ order in such case. Here we introduce a independent send completion queue to distinguish ibv_post_send completion queue from the original mixed completion queue. It helps us to poll the specific CQE we are really interested in. Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration/rdma: advise prefetch write for ODP regionLi Zhijian2021-10-191-0/+42
| | | | | | | | | | | | | | | | The responder mr registering with ODP will sent RNR NAK back to the requester in the face of the page fault. --------- ibv_poll_cq wc.status=13 RNR retry counter exceeded! ibv_poll_cq wrid=WRITE RDMA! --------- ibv_advise_mr(3) helps to make pages present before the actual IO is conducted so that the responder does page fault as little as possible. Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration/rdma: Try to register On-Demand Paging memory regionLi Zhijian2021-10-191-20/+53
| | | | | | | | | | | | | | | Previously, for the fsdax mem-backend-file, it will register failed with Operation not supported. In this case, we can try to register it with On-Demand Paging[1] like what rpma_mr_reg() does on rpma[2]. [1]: https://community.mellanox.com/s/article/understanding-on-demand-paging--odp-x [2]: http://pmem.io/rpma/manpages/v0.9.0/rpma_mr_reg.3 CC: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration/rdma: prevent from double free the same mrLi Zhijian2021-07-131-0/+1
| | | | | | | | | | | | | | | | | | | | backtrace: '0x00007ffff5f44ec2 in __ibv_dereg_mr_1_1 (mr=0x7fff1007d390) at /home/lizhijian/rdma-core/libibverbs/verbs.c:478 478 void *addr = mr->addr; (gdb) bt #0 0x00007ffff5f44ec2 in __ibv_dereg_mr_1_1 (mr=0x7fff1007d390) at /home/lizhijian/rdma-core/libibverbs/verbs.c:478 #1 0x0000555555891fcc in rdma_delete_block (block=<optimized out>, rdma=0x7fff38176010) at ../migration/rdma.c:691 #2 qemu_rdma_cleanup (rdma=0x7fff38176010) at ../migration/rdma.c:2365 #3 0x00005555558925b0 in qio_channel_rdma_close_rcu (rcu=0x555556b8b6c0) at ../migration/rdma.c:3073 #4 0x0000555555d652a3 in call_rcu_thread (opaque=opaque@entry=0x0) at ../util/rcu.c:281 #5 0x0000555555d5edf9 in qemu_thread_start (args=0x7fffe88bb4d0) at ../util/qemu-thread-posix.c:541 #6 0x00007ffff54c73f9 in start_thread () at /lib64/libpthread.so.0 #7 0x00007ffff53f3b03 in clone () at /lib64/libc.so.6 ' Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <20210708144521.1959614-1-lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* misc: Remove redundant new line in perror()Li Zhijian2021-07-091-1/+1
| | | | | | | Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210706094433.1766952-1-lizhijian@cn.fujitsu.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
* migration/rdma: Use error_report to suppress errno messageLi Zhijian2021-07-051-2/+2
| | | | | | | | | | | | | | | Since the prior calls are successful, in this case a errno doesn't indicate a real error which would just make us confused. before: (qemu) migrate -d rdma:192.168.22.23:8888 source_resolve_host RDMA Device opened: kernel name rxe_eth0 uverbs device name uverbs2, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs2, infiniband class device path /sys/class/infiniband/rxe_eth0, transport: (2) Ethernet rdma_get_cm_event != EVENT_ESTABLISHED after rdma_connect: No space left on device Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <20210628071959.23455-1-lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration/rdma: Fix cm event use after freeLi Zhijian2021-06-081-3/+8
| | | | | | | Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <20210602023506.3821293-1-lizhijian@cn.fujitsu.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration/rdma: source: poll cm_event from return pathLi Zhijian2021-05-261-4/+38
| | | | | | | | | | | | | source side always blocks if postcopy is only enabled at source side. users are not able to cancel this migration in this case. Let source side have chance to cancel this migration Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <20210525080552.28259-4-lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Typo fix
* migration/rdma: destination: create the return patch after the first acceptLi Zhijian2021-05-261-11/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | destination side: $ build/qemu-system-x86_64 -enable-kvm -netdev tap,id=hn0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device e1000,netdev=hn0,mac=50:52:54:00:11:22 -boot c -drive if=none,file=./Fedora-rdma-server-migration.qcow2,id=drive-virtio-disk0 -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -vga qxl -spice streaming-video=filter,port=5902,disable-ticketing -incoming rdma:192.168.1.10:8888 (qemu) migrate_set_capability postcopy-ram on (qemu) dest_init RDMA Device opened: kernel name rocep1s0f0 uverbs device name uverbs0, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs0, infiniband class device path /sys/class/infiniband/rocep1s0f0, transport: (2) Ethernet Segmentation fault (core dumped) (gdb) bt #0 qemu_rdma_accept (rdma=0x0) at ../migration/rdma.c:3272 #1 rdma_accept_incoming_migration (opaque=0x0) at ../migration/rdma.c:3986 #2 0x0000563c9e51f02a in aio_dispatch_handler (ctx=ctx@entry=0x563ca0606010, node=0x563ca12b2150) at ../util/aio-posix.c:329 #3 0x0000563c9e51f752 in aio_dispatch_handlers (ctx=0x563ca0606010) at ../util/aio-posix.c:372 #4 aio_dispatch (ctx=0x563ca0606010) at ../util/aio-posix.c:382 #5 0x0000563c9e4f4d9e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at ../util/async.c:306 #6 0x00007fe96ef3fa9f in g_main_context_dispatch () at /lib64/libglib-2.0.so.0 #7 0x0000563c9e4ffeb8 in glib_pollfds_poll () at ../util/main-loop.c:231 #8 os_host_main_loop_wait (timeout=12188789) at ../util/main-loop.c:254 #9 main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:530 #10 0x0000563c9e3c7211 in qemu_main_loop () at ../softmmu/runstate.c:725 #11 0x0000563c9dfd46fe in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../softmmu/main.c:50 The rdma return path will not be created when qemu incoming is starting since migrate_copy() is false at that moment, then a NULL return path rdma was referenced if the user enabled postcopy later. Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <20210525080552.28259-3-lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration/rdma: Fix rdma_addrinfo res leaksLi Zhijian2021-05-261-0/+3
| | | | | | | | | rdma_freeaddrinfo() is the reverse operation of rdma_getaddrinfo() Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210525080552.28259-2-lizhijian@cn.fujitsu.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration/rdma: cleanup rdma in rdma_start_incoming_migration error pathLi Zhijian2021-05-261-2/+5
| | | | | | | | | the error path after calling qemu_rdma_dest_init() should do rdma cleanup Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <20210520081148.17001-1-lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration/rdma: Fix cm_event used before being initializedLi Zhijian2021-05-261-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | A segmentation fault was triggered when i try to abort a postcopy + rdma migration. since rdma_ack_cm_event releases a uninitialized cm_event in these case. like below: 2496 ret = rdma_get_cm_event(rdma->channel, &cm_event); 2497 if (ret) { 2498 perror("rdma_get_cm_event after rdma_connect"); 2499 ERROR(errp, "connecting to destination!"); 2500 rdma_ack_cm_event(cm_event); <<<< cause segmentation fault 2501 goto err_rdma_source_connect; 2502 } Refer to the rdma_get_cm_event() code, cm_event will be updated/changed only if rdma_get_cm_event() returns 0. So it's okey to remove the ack in error patch. Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <20210519064740.10828-1-lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: Delete redundant spacesBihong Yu2020-10-261-1/+1
| | | | | | | | | Signed-off-by: Bihong Yu <yubihong@huawei.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1603163448-27122-9-git-send-email-yubihong@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: Open brace '{' following function declarations go on the next lineBihong Yu2020-10-261-1/+2
| | | | | | | | Signed-off-by: Bihong Yu <yubihong@huawei.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1603163448-27122-8-git-send-email-yubihong@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: Do not use C99 // commentsBihong Yu2020-10-261-1/+1
| | | | | | | | | Signed-off-by: Bihong Yu <yubihong@huawei.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <1603163448-27122-2-git-send-email-yubihong@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* qemu/atomic.h: rename atomic_ to qatomic_Stefan Hajnoczi2020-09-231-17/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | clang's C11 atomic_fetch_*() functions only take a C11 atomic type pointer argument. QEMU uses direct types (int, etc) and this causes a compiler error when a QEMU code calls these functions in a source file that also included <stdatomic.h> via a system header file: $ CC=clang CXX=clang++ ./configure ... && make ../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int *' invalid) Avoid using atomic_*() names in QEMU's atomic.h since that namespace is used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h and <stdatomic.h> can co-exist. I checked /usr/include on my machine and searched GitHub for existing "qatomic_" users but there seem to be none. This patch was generated using: $ git grep -h -o '\<atomic\(64\)\?_[a-z0-9_]\+' include/qemu/atomic.h | \ sort -u >/tmp/changed_identifiers $ for identifier in $(</tmp/changed_identifiers); do sed -i "s%\<$identifier\>%q$identifier%g" \ $(git grep -I -l "\<$identifier\>") done I manually fixed line-wrap issues and misaligned rST tables. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200923105646.47864-1-stefanha@redhat.com>
* Merge remote-tracking branch ↵Peter Maydell2020-09-221-4/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'remotes/vivier2/tags/trivial-branch-for-5.2-pull-request' into staging Pull request trivial patches 20200919 # gpg: Signature made Sat 19 Sep 2020 19:43:35 BST # gpg: using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C # gpg: issuer "laurent@vivier.eu" # gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full] # gpg: aka "Laurent Vivier <laurent@vivier.eu>" [full] # gpg: aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full] # Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F 5173 F30C 38BD 3F2F BE3C * remotes/vivier2/tags/trivial-branch-for-5.2-pull-request: contrib/: fix some comment spelling errors qapi/: fix some comment spelling errors disas/: fix some comment spelling errors linux-user/: fix some comment spelling errors util/: fix some comment spelling errors scripts/: fix some comment spelling errors docs/: fix some comment spelling errors migration/: fix some comment spelling errors qemu/: fix some comment spelling errors scripts/git.orderfile: Display meson files along with buildsys ones hw/timer/hpet: Fix debug format strings hw/timer/hpet: Remove unused functions hpet_ram_readb, hpet_ram_readw meson: remove empty else and duplicated gio deps manual: escape backslashes in "parsed-literal" blocks ui/spice-input: Remove superfluous forward declaration hw/ppc/ppc4xx_pci: Replace magic value by the PCI_NUM_PINS definition hw/gpio/max7310: Remove impossible check Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * migration/: fix some comment spelling errorszhaolichang2020-09-171-4/+4
| | | | | | | | | | | | | | | | | | | | | | I found that there are many spelling errors in the comments of qemu, so I used the spellcheck tool to check the spelling errors and finally found some spelling errors in the migration folder. Signed-off-by: zhaolichang <zhaolichang@huawei.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20200917075029.313-3-zhaolichang@huawei.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
* | Use OBJECT_DECLARE_SIMPLE_TYPE when possibleEduardo Habkost2020-09-181-3/+1
|/ | | | | | | | | | | | | This converts existing DECLARE_INSTANCE_CHECKER usage to OBJECT_DECLARE_SIMPLE_TYPE when possible. $ ./scripts/codeconverter/converter.py -i \ --pattern=AddObjectDeclareSimpleType $(git grep -l '' -- '*.[ch]') Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Paul Durrant <paul@xen.org> Message-Id: <20200916182519.415636-6-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
* Use DECLARE_*CHECKER* macrosEduardo Habkost2020-09-091-2/+2
| | | | | | | | | | | | | | | Generated using: $ ./scripts/codeconverter/converter.py -i \ --pattern=TypeCheckMacro $(git grep -l '' -- '*.[ch]') Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20200831210740.126168-12-ehabkost@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20200831210740.126168-13-ehabkost@redhat.com> Message-Id: <20200831210740.126168-14-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
* Move QOM typedefs and add missing includesEduardo Habkost2020-09-091-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some typedefs and macros are defined after the type check macros. This makes it difficult to automatically replace their definitions with OBJECT_DECLARE_TYPE. Patch generated using: $ ./scripts/codeconverter/converter.py -i \ --pattern=QOMStructTypedefSplit $(git grep -l '' -- '*.[ch]') which will split "typdef struct { ... } TypedefName" declarations. Followed by: $ ./scripts/codeconverter/converter.py -i --pattern=MoveSymbols \ $(git grep -l '' -- '*.[ch]') which will: - move the typedefs and #defines above the type check macros - add missing #include "qom/object.h" lines if necessary Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20200831210740.126168-9-ehabkost@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20200831210740.126168-10-ehabkost@redhat.com> Message-Id: <20200831210740.126168-11-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
* Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into stagingPeter Maydell2020-07-071-2/+16
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | virtio,acpi: features, fixes, cleanups. vdpa support virtio-mem support a handy script for disassembling acpi tables misc fixes and cleanups Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Tue 07 Jul 2020 13:00:35 BST # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (41 commits) vhost-vdpa: introduce vhost-vdpa net client vhost-vdpa: introduce vhost-vdpa backend vhost_net: introduce set_config & get_config vhost: implement vhost_force_iommu method vhost: introduce new VhostOps vhost_force_iommu vhost: implement vhost_vq_get_addr method vhost: introduce new VhostOps vhost_vq_get_addr vhost: implement vhost_dev_start method vhost: introduce new VhostOps vhost_dev_start vhost: check the existence of vhost_set_iotlb_callback virtio-pci: implement queue_enabled method virtio-bus: introduce queue_enabled method vhost_net: use the function qemu_get_peer net: introduce qemu_get_peer MAINTAINERS: add VT-d entry docs: vhost-user: add Virtio status protocol feature tests/acpi: remove stale allowed tables numa: Auto-enable NUMA when any memory devices are possible virtio-mem: Exclude unplugged memory during migration virtio-mem: Add trace events ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> # Conflicts: # hw/arm/virt.c # hw/virtio/trace-events
| * migration/rdma: Use ram_block_discard_disable()David Hildenbrand2020-07-021-2/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RDMA will pin all guest memory (as documented in docs/rdma.txt). We want to disable RAM block discards - however, to keep it simple use ram_block_discard_is_required() instead of inhibiting. Note: It is not sufficient to limit disabling to pin_all. Even when only conditionally pinning 1 MB chunks, as soon as one page within such a chunk was discarded and one page not, the discarded pages will be pinned as well. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Juan Quintela <quintela@redhat.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20200626072248.78761-9-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* | migration/rdma: Plug memory leaks in qemu_rdma_registration_stop()Markus Armbruster2020-07-021-10/+9
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | qemu_rdma_registration_stop() uses the ERROR() macro to create, report to stderr, and store an Error object. The stored Error object is never used, and its memory is leaked. Even where ERROR() doesn't leak, it is ill-advised. The whole point of passing an Error to the caller is letting the caller handle the error. Error handling may report to stderr, to somewhere else, or not at all. Also reporting in the callee mixes up concerns that should be kept separate. Since I don't know what reporting to stderr is supposed to accomplish, I'm not touching it. Commit 2a1bc8bde7 "migration/rdma: rdma_accept_incoming_migration fix error handling" plugged the same leak in rdma_accept_incoming_migration(). Plug the memory leak the same way: keep the report part, delete the store part. The report part uses fprintf(). If it's truly an error, it should use error_report() instead. But I don't know, so I leave it alone, just like commit 2a1bc8bde7 did. Fixes: 2da776db4846eadcb808598a5d3484d149773c05 Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Juan Quintela <quintela@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200630090351.1247703-27-armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration/rdma: cleanup rdma context before g_free to avoid memleaksPan Nengyuan2020-06-011-3/+5
| | | | | | | | | | | When error happen in initializing 'rdma_return_path', we should cleanup rdma context before g_free(rdma) to avoid some memleaks. This patch fix that. Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com> Message-Id: <20200508100755.7875-3-pannengyuan@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration/rdma: fix potential nullptr access in rdma_start_incoming_migrationPan Nengyuan2020-06-011-1/+3
| | | | | | | | | | | | 'rdma' is NULL when taking the first error branch in rdma_start_incoming_migration. And it will cause a null pointer access in label 'err'. Fix that. Fixes: 59c59c67ee6b0327ae932deb303caa47919aeb1e Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com> Message-Id: <20200508100755.7875-2-pannengyuan@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Note this is CID 1428762
* migration/rdma: fix a memleak on error path in rdma_start_incoming_migrationPan Nengyuan2020-05-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | 'rdma->host' is malloced in qemu_rdma_data_init, but forgot to free on the error path in rdma_start_incoming_migration(), this patch fix that. The leak stack: Direct leak of 2 byte(s) in 1 object(s) allocated from: #0 0x7fb7add18ae8 in __interceptor_malloc (/lib64/libasan.so.5+0xefae8) #1 0x7fb7ad0df1d5 in g_malloc (/lib64/libglib-2.0.so.0+0x531d5) #2 0x7fb7ad0f8b32 in g_strdup (/lib64/libglib-2.0.so.0+0x6cb32) #3 0x55a0464a0f6f in qemu_rdma_data_init /mnt/sdb/qemu/migration/rdma.c:2647 #4 0x55a0464b0e76 in rdma_start_incoming_migration /mnt/sdb/qemu/migration/rdma.c:4020 #5 0x55a0463f898a in qemu_start_incoming_migration /mnt/sdb/qemu/migration/migration.c:365 #6 0x55a0458c75d3 in qemu_init /mnt/sdb/qemu/softmmu/vl.c:4438 #7 0x55a046a3d811 in main /mnt/sdb/qemu/softmmu/main.c:48 #8 0x7fb7a8417872 in __libc_start_main (/lib64/libc.so.6+0x23872) #9 0x55a04536b26d in _start (/mnt/sdb/qemu/build/x86_64-softmmu/qemu-system-x86_64+0x286926d) Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com> Message-Id: <20200420102727.17339-1-pannengyuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration/rdma: rdma_accept_incoming_migration fix error handlingDr. David Alan Gilbert2020-02-131-4/+7
| | | | | | | | | | | | | | rdma_accept_incoming_migration is called from an fd handler and can't return an Error * anywhere. Currently it's leaking Error's in errp/local_err - there's no point putting them in there unless we can report them. Turn most into fprintf's, and the last into an error_reportf_err where it's coming up from another function. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* multifd: Make multifd_load_setup() get an Error parameterJuan Quintela2020-01-291-1/+1
| | | | | | | We need to change the full chain to pass the Error parameter. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: Use automatic rcu_read unlock in rdma.cDr. David Alan Gilbert2019-10-111-46/+11
| | | | | | | | | Use the automatic read unlocker in migration/rdma.c. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20191007143642.301445-5-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration/rdma.c: Swap synchronize_rcu for call_rcuDr. David Alan Gilbert2019-09-251-12/+27
| | | | | | | | | | | | | | | | | | | | | This fixes a deadlock that can occur on the migration source after a failed RDMA migration; as the source tries to cleanup it clears a pair of pointers and uses synchronize_rcu to wait; this is happening on the main thread. With the CPUs running a CPU thread can be an rcu reader and attempt to grab the main lock (kvm_handle_io->address_space_write->flatview_write->flatview_write_continue-> prepare_mmio_access->qemu_mutex_lock_iothread_impl) Replace the synchronize_rcu with a call_rcu to postpone the freeing. Fixes: 74637e6f08fceda98806 ("migration: implement bi-directional RDMA QIOChannel") ( https://bugzilla.redhat.com/show_bug.cgi?id=1746787 ) Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190913163507.1403-3-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>