Man page of IOCTL-FIEXCHANGE_RANGE
IOCTL-FIEXCHANGE_RANGE
Section: Linux Programmer's Manual (2)
Updated: 2021-04-01
NAME
ioctl_fiexchange_range - exchange the contents of parts of two files
SYNOPSIS
#include <sys/ioctl.h>
#include <linux/fiexchange.h>
int ioctl(int file2_fd, FIEXCHANGE_RANGE, struct file_xchg_range *arg);
DESCRIPTION
Given a range of bytes in a first file
file1_fd
and a second range of bytes in a second file
file2_fd,
this
ioctl(2)
exchanges the contents of the two ranges.
Exchanges are atomic with regard to concurrent file operations, so no
userspace-level locks need to be taken to obtain consistent results.
Implementations must guarantee that readers see either the old contents or the
new contents in their entirety, even if the system fails.
The exchange parameters are conveyed in a structure of the following form:
struct file_xchg_range {
    __s64    file1_fd;
    __s64    file1_offset;
    __s64    file2_offset;
    __s64    length;
    __u64    flags;
    __s64    file2_ino;
    __s64    file2_mtime;
    __s64    file2_ctime;
    __s32    file2_mtime_nsec;
    __s32    file2_ctime_nsec;
    __u64    pad[6];
};
The field
pad
must be zero.
The fields
file1_fd, file1_offset, and length
define the first range of bytes to be exchanged.
The fields
file2_fd, file2_offset, and length
define the second range of bytes to be exchanged.
Both files must be from the same filesystem mount.
If the two file descriptors represent the same file, the byte ranges must not
overlap.
Most disk-based filesystems require that the starts of both ranges be
aligned to the file block size.
If this is the case, the ends of the ranges must also be so aligned unless the
FILE_XCHG_RANGE_TO_EOF
flag is set.
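The overlap and alignment rules above can be checked in userspace before issuing the call. The following is a minimal sketch, not part of the API: the helper names are hypothetical, the block size is passed in by the caller (the page does not say how to obtain it), and the check only covers the case where FILE_XCHG_RANGE_TO_EOF is not set. The kernel remains the final authority on whether a request is acceptable.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if [off1, off1+len) and [off2, off2+len) share at least one byte. */
static bool ranges_overlap(int64_t off1, int64_t off2, int64_t len)
{
    if (len <= 0)
        return false;
    return off1 < off2 + len && off2 < off1 + len;
}

/*
 * Hypothetical pre-flight check mirroring the constraints in DESCRIPTION
 * for a request without FILE_XCHG_RANGE_TO_EOF.
 *   same_file: both descriptors refer to the same file
 *   blksz:     file block size, supplied by the caller (assumption)
 */
static bool xchg_range_plausible(bool same_file, int64_t off1, int64_t off2,
                                 int64_t len, int64_t blksz)
{
    if (off1 < 0 || off2 < 0 || len < 0 || blksz <= 0)
        return false;
    /* Ranges within one file must not overlap. */
    if (same_file && ranges_overlap(off1, off2, len))
        return false;
    /* Starts of both ranges must be block aligned. */
    if (off1 % blksz != 0 || off2 % blksz != 0)
        return false;
    /* With aligned starts, the ends are aligned exactly when len is. */
    if (len % blksz != 0)
        return false;
    return true;
}
```

With a 4096-byte block size, exchanging 8192 bytes at offsets 0 and 8192 of the same file passes this check, while offsets 0 and 4096 are rejected because the ranges overlap.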
The field
flags
controls the behavior of the exchange operation.
- FILE_XCHG_RANGE_FILE2_FRESH
-
Check the freshness of
file2_fd
after locking the file but before exchanging the contents.
The supplied
file2_ino field
must match file2's inode number, and the supplied
file2_mtime, file2_mtime_nsec, file2_ctime, and file2_ctime_nsec
fields must match the modification time and change time of file2.
If they do not match,
EBUSY
will be returned.
- FILE_XCHG_RANGE_TO_EOF
-
Ignore the
length
parameter.
All bytes in
file1_fd
from
file1_offset
to EOF are moved to
file2_fd,
and file2's size is set to
(file2_offset+(file1_length-file1_offset)).
Meanwhile, all bytes in file2 from
file2_offset
to EOF are moved to file1 and file1's size is set to
(file1_offset+(file2_length-file2_offset)).
This option is not compatible with
FILE_XCHG_RANGE_FULL_FILES.
- FILE_XCHG_RANGE_FSYNC
-
Ensure that all modified in-core data in both file ranges and all metadata
updates pertaining to the exchange operation are flushed to persistent storage
before the call returns.
Opening either file descriptor with
O_SYNC or O_DSYNC
will have the same effect.
- FILE_XCHG_RANGE_SKIP_FILE1_HOLES
-
Skip sub-ranges of
file1_fd
that are known not to contain data.
This facility can be used to implement atomic scatter-gather writes of any
complexity for software-defined storage targets.
- FILE_XCHG_RANGE_DRY_RUN
-
Check the parameters and the feasibility of the operation, but do not change
anything.
- FILE_XCHG_RANGE_COMMIT
-
This flag is a combination of
FILE_XCHG_RANGE_FILE2_FRESH | FILE_XCHG_RANGE_FSYNC
and can be used to commit changes to
file2_fd
to persistent storage if and only if file2 has not changed.
- FILE_XCHG_RANGE_FULL_FILES
-
Require that
file1_offset and file2_offset
are zero, and that the
length
field matches the lengths of both files.
If not,
EDOM
will be returned.
This option is not compatible with
FILE_XCHG_RANGE_TO_EOF.
- FILE_XCHG_RANGE_NONATOMIC
-
This flag relaxes the requirement that readers see only the old contents or
the new contents in their entirety.
If the system fails before all modified in-core data and metadata updates
are persisted to disk, the contents of both file ranges after recovery are not
defined and may be a mix of both.
Do not use this flag unless the contents of both ranges are known to be
identical and there are no other writers.
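The size arithmetic given for FILE_XCHG_RANGE_TO_EOF in the list above can be written out as a small worked example. This is only an illustration of the two formulas; the struct and function names are hypothetical, and no system call is made.

```c
#include <stdint.h>

/* Resulting file sizes after a TO_EOF exchange (hypothetical helper type). */
struct xchg_sizes {
    int64_t file1_size;
    int64_t file2_size;
};

/*
 * Apply the formulas from the FILE_XCHG_RANGE_TO_EOF description:
 *   file2 size = file2_offset + (file1_length - file1_offset)
 *   file1 size = file1_offset + (file2_length - file2_offset)
 */
static struct xchg_sizes to_eof_sizes(int64_t file1_length, int64_t file1_offset,
                                      int64_t file2_length, int64_t file2_offset)
{
    struct xchg_sizes s;

    s.file2_size = file2_offset + (file1_length - file1_offset);
    s.file1_size = file1_offset + (file2_length - file2_offset);
    return s;
}
```

For a 10000-byte file1 exchanged from offset 4096 and a 20000-byte file2 exchanged from offset 8192, the formulas give 15904 bytes for file1 and 14096 bytes for file2. The combined size of the two files is unchanged, because the tails are swapped rather than copied.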
RETURN VALUE
On error, -1 is returned, and
errno
is set to indicate the error.
ERRORS
Error codes can be one of, but are not limited to, the following:
- EBADF
-
file1_fd
is not open for reading and writing or is open for append-only writes; or
file2_fd
is not open for reading and writing or is open for append-only writes.
- EBUSY
-
The inode number and timestamps supplied do not match
file2_fd
and
FILE_XCHG_RANGE_FILE2_FRESH
was set in
flags.
- EDOM
-
The ranges do not cover the entirety of both files, and
FILE_XCHG_RANGE_FULL_FILES
was set in
flags.
- EINVAL
-
The parameters are not correct for these files.
This error can also appear if either file descriptor represents
a device, FIFO, or socket.
Disk filesystems generally require the offset and length arguments
to be aligned to the fundamental block sizes of both files.
- EIO
-
An I/O error occurred.
- EISDIR
-
One of the files is a directory.
- ENOMEM
-
The kernel was unable to allocate sufficient memory to perform the
operation.
- ENOSPC
-
There is not enough free space in the filesystem to exchange the contents safely.
- EOPNOTSUPP
-
The filesystem does not support exchanging bytes between the two
files.
- EPERM
-
file1_fd or file2_fd
is immutable.
- ETXTBSY
-
One of the files is a swap file.
- EUCLEAN
-
The filesystem is corrupt.
- EXDEV
-
file1_fd and file2_fd
are not on the same mounted filesystem.
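A caller will usually want to react differently to different errors from the list above. The sketch below shows one possible grouping. The policy, the enum, and the function name are assumptions made for illustration and are not prescribed by this page; only the errno values come from the list.

```c
#include <errno.h>

/* Hypothetical caller-side reactions to a failed exchange. */
enum xchg_action {
    XCHG_RESAMPLE_AND_RETRY, /* file2 changed: redo the work and try again */
    XCHG_FALLBACK,           /* exchange not possible here: use another method */
    XCHG_FIX_REQUEST,        /* the request itself is wrong */
    XCHG_GIVE_UP             /* report the error to the user */
};

/* Map an errno value from a failed FIEXCHANGE_RANGE call to an action. */
static enum xchg_action xchg_classify_errno(int err)
{
    switch (err) {
    case EBUSY:        /* freshness check failed */
        return XCHG_RESAMPLE_AND_RETRY;
    case EOPNOTSUPP:   /* filesystem cannot exchange these files */
    case EXDEV:        /* files are on different mounts */
        return XCHG_FALLBACK;
    case EBADF:
    case EDOM:
    case EINVAL:
    case EISDIR:
        return XCHG_FIX_REQUEST;
    default:           /* EIO, ENOMEM, ENOSPC, EPERM, ETXTBSY, EUCLEAN, ... */
        return XCHG_GIVE_UP;
    }
}
```

Because the list of error codes is explicitly not exhaustive, the default branch treats anything unrecognized as fatal rather than guessing.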
CONFORMING TO
This API is Linux-specific.
USE CASES
Three use cases are envisioned for this ioctl.
The first is a filesystem defragmenter, which copies the contents of a file
into another file and wishes to exchange the space mappings of the two files,
provided that the original file has not changed. The flags
FILE_XCHG_RANGE_NONATOMIC and FILE_XCHG_RANGE_FILE2_FRESH
are recommended for this application.
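For the freshness check to be useful, a defragmenter has to record the state of the original file before it starts copying. One way to do that is sketched below, assuming the values reported by fstat(2) are the ones the kernel compares against. The structure is redeclared locally, copied from DESCRIPTION, so that the sketch compiles where <linux/fiexchange.h> is not installed; real code would include that header instead. The helper name is hypothetical, and setting FILE_XCHG_RANGE_FILE2_FRESH in flags is left to the caller.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for st_mtim and st_ctim */
#endif
#include <linux/types.h>
#include <sys/stat.h>

/* Local copy of the layout shown in DESCRIPTION (see lead-in). */
struct file_xchg_range {
    __s64 file1_fd;
    __s64 file1_offset;
    __s64 file2_offset;
    __s64 length;
    __u64 flags;
    __s64 file2_ino;
    __s64 file2_mtime;
    __s64 file2_ctime;
    __s32 file2_mtime_nsec;
    __s32 file2_ctime_nsec;
    __u64 pad[6];
};

/*
 * Record file2's inode number and timestamps in req.
 * Call this before reading file2's contents, so that any later
 * modification makes the recorded values stale.
 * Returns 0 on success, or -1 with errno set by fstat(2).
 */
static int xchg_sample_file2(int file2_fd, struct file_xchg_range *req)
{
    struct stat st;

    if (fstat(file2_fd, &st) < 0)
        return -1;

    req->file2_ino = (__s64)st.st_ino;
    req->file2_mtime = st.st_mtim.tv_sec;
    req->file2_mtime_nsec = (__s32)st.st_mtim.tv_nsec;
    req->file2_ctime = st.st_ctim.tv_sec;
    req->file2_ctime_nsec = (__s32)st.st_ctim.tv_nsec;
    return 0;
}
```

If the exchange later fails with EBUSY, the copy is out of date and the whole sample, copy, and exchange sequence would need to be repeated.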
The second is a data storage program that wants to commit non-contiguous updates
to a file atomically. This can be done by creating a temporary file, calling
FICLONE(2)
to share the contents, and staging the updates into the temporary file.
Either the
FILE_XCHG_RANGE_FULL_FILES or FILE_XCHG_RANGE_TO_EOF
flag is recommended, along with
FILE_XCHG_RANGE_FSYNC.
Depending on the application's locking design, the flags
FILE_XCHG_RANGE_FILE2_FRESH or FILE_XCHG_RANGE_COMMIT
may be applicable here.
The temporary file can be deleted or punched out afterwards.
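The clone, stage, and exchange sequence for this second use case might look like the sketch below. It assumes the temporary file is passed as file1_fd and the file being updated is the descriptor the ioctl is issued on, which matches the description of FILE_XCHG_RANGE_COMMIT. The staged_update type and the function name are hypothetical. If the header from the SYNOPSIS is not installed, the final step compiles to a stub that fails with EOPNOTSUPP.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for pwrite */
#endif
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/fs.h>           /* FICLONE */

#if defined(__has_include)
# if __has_include(<linux/fiexchange.h>)
#  include <linux/fiexchange.h> /* FIEXCHANGE_RANGE, struct file_xchg_range */
# endif
#endif

/* One pending update (hypothetical type for this sketch). */
struct staged_update {
    off_t offset;
    const void *data;
    size_t len;
};

/* Returns 0 on success, or -1 with errno set. */
static int commit_updates(int target_fd, int tmp_fd,
                          const struct staged_update *upd, size_t n)
{
    size_t i;

    /* Share the target's current contents with the temporary file. */
    if (ioctl(tmp_fd, FICLONE, target_fd) < 0)
        return -1;

    /* Stage every update in the temporary file; the target is untouched. */
    for (i = 0; i < n; i++) {
        ssize_t w = pwrite(tmp_fd, upd[i].data, upd[i].len, upd[i].offset);

        if (w < 0)
            return -1;
        if ((size_t)w != upd[i].len) {  /* treat a short write as failure */
            errno = EIO;
            return -1;
        }
    }

#ifdef FIEXCHANGE_RANGE
    {
        struct file_xchg_range req;

        memset(&req, 0, sizeof(req));   /* pad must be zero */
        req.file1_fd = tmp_fd;          /* offsets stay 0; length is ignored */
        req.flags = FILE_XCHG_RANGE_TO_EOF | FILE_XCHG_RANGE_FSYNC;
        return ioctl(target_fd, FIEXCHANGE_RANGE, &req);
    }
#else
    errno = EOPNOTSUPP;                 /* header not available at build time */
    return -1;
#endif
}
```

FILE_XCHG_RANGE_TO_EOF is used here rather than FILE_XCHG_RANGE_FULL_FILES so that the sketch also works when staging has extended the temporary file past the target's original length.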
The third is a software-defined storage host (e.g. a disk jukebox) which
implements an atomic scatter-gather write command.
Provided the exported disk's logical block size matches the file's allocation
unit size, this can be done by creating a temporary file and writing the data
at the appropriate offsets.
Use this call with the
FILE_XCHG_RANGE_SKIP_FILE1_HOLES
flag to exchange only the blocks involved in the write command.
The use of the
FILE_XCHG_RANGE_FSYNC
flag is recommended here.
The temporary file should be deleted or punched out completely before being
reused to stage another write.
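The staging step of this third use case can be sketched as follows, assuming each block of the write command is placed in the temporary file at the same byte offset it occupies on the exported disk, so that untouched regions stay as holes. Both helper names are hypothetical. The template path passed to open_staging_file() must name a location on the same filesystem mount as the file backing the disk, since both files must come from the same mount.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for pwrite and mkstemp */
#endif
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create an anonymous staging file from a mkstemp(3) template.
 * The name is unlinked at once, so the file disappears when the
 * descriptor is closed. Returns the descriptor, or -1 on error.
 */
static int open_staging_file(char *tmpl)
{
    int fd = mkstemp(tmpl);

    if (fd >= 0)
        unlink(tmpl);
    return fd;
}

/*
 * Write one logical block at block address lba into the staging file.
 * buf must hold blksz bytes. Returns 0 on success, -1 with errno set.
 */
static int stage_block(int tmp_fd, off_t lba, size_t blksz, const void *buf)
{
    off_t pos = lba * (off_t)blksz;
    size_t done = 0;

    while (done < blksz) {
        ssize_t w = pwrite(tmp_fd, (const char *)buf + done,
                           blksz - done, pos + (off_t)done);

        if (w < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: retry */
            return -1;
        }
        if (w == 0) {           /* no progress: avoid looping forever */
            errno = EIO;
            return -1;
        }
        done += (size_t)w;
    }
    return 0;
}
```

Once every block of the command has been staged, a single exchange covering the staged range would publish all of them together.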
NOTES
Some filesystems may limit the amount of data or the number of extents that can
be exchanged in a single call.
SEE ALSO
ioctl(2)