[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: Libfuse interoperability/ABI broken.



I’m back from holiday now and working on this properly.  Yes, the below is what I had in mind but haven’t had a chance to go through all the permutations yet and see if we can patch this up at run-time without potentially missing any calls.

The good news at least is once you detect one one incorrect/invalid option you can then make assumptions about what version the binary was with and then just apply the fix from that point out without further needing the run the heuristics to detect on every call.  The heuristics and what flags collied with which other flags depend on if we choose to revert to the older ABI or stick with the most recent one, my initial though was to revert on the basis that most distros haven’t updated past the change point yet but Bernd was arguing for keeping the new interface.  Being able to construct a set of heuristics which is safe in either direction could swing this decision.

One option we do have is that fuse_session_new passes in struct fuse_lowlevel_ops and it’s size, whilst it’s also a hack I could add padding to this structure to increase it’s size and then have a way of knowing conclusively at run-time that this bug and any work-arounds are not applicable.

Thank you everyone who's contributed to this discussion, hopefully we can find a path where we can detect this and patch it up entirely in the libfuse source.

Ashley.

> On 9 Mar 2024, at 02:46, Amir Goldstein <amir73il@gmail.com> wrote:
> 
> On Thu, Mar 7, 2024 at 8:47 PM Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
>> 
>> Hi all,
>> 
>> this is certainly not kind of the mail I was hoping for as a new libfuse
>> maintainer.
>> 
>> As you can see from the title and from discussion below (sorry this is
>> not typical ML discussion style), we have a bit of of problem with
>> libfuse ABI compatibility.
>> 
>> While scanning through git history, Ashley found a commit that was adding
>> members in the middle of struct - doesn't break API, but binary
>> compatibility. That commit already landed a year ago in these releases:
>> 
>> git tag --contains a5eb7f2a0117ab43119ef5724cf5f4f2f181804a
>> fuse-3.14.1
>> fuse-3.15.0
>> fuse-3.15.1
>> fuse-3.16.1
>> fuse-3.16.2
>> 
>> 
>> Obviously this needs improved testing for, but right now we wonder what
>> would be the preferred action to avoid issues.
>> 
>> a) We could fix it and move bits to the right place. Fixes everything
>> compiled with old versions, but breaks everything compiled with the
>> versions above
>> 
>> b) Increase the .so version - enforces recompilation of all binaries.
>> Intrusive, especially for distributions, but probably safest.
>> 
>> c) Other ideas?
>> 
> 
> Heuristically, you can detect most of the shifted flags at runtime
> because...
> 
>> 
>> 
>> I don't think there is anything in libfuse that would allow us to
>> detect which version of libfuse a library was linked to.
>> 
> 
> I think we do know for sure if fs was linked with libfuse < 3.12
> without fuse_loop_mt_312?
> so only left to detect  3.12..3.14 vs. 3.14..3.16
> 
>> 
>> The commit shifted these members in struct fuse_file_info {
>> 
>> struct fuse_file_info {
>> ...
>>        /** Can be filled by open/create, to allow parallel direct writes on this
>>         *  file */
>>        unsigned int parallel_direct_writes : 1; --> introduced the shift
> 
> Not expected in flush/release, so can be heuristically interpreted as flush
> 
>> 
>>        /** Indicates a flush operation.  Set in flush operation, also
>>            maybe set in highlevel lock operation and lowlevel release
>>            operation. */
>>        unsigned int flush : 1;
>> 
> 
> Not expected in open/create, so can be heuristically interpreted as nonseekable
> 
>>        /** Can be filled in by open, to indicate that the file is not
>>            seekable. */
>>        unsigned int nonseekable : 1;
>> 
> 
> Not expected in release, so can be heuristically interpreted as flock_release
> 
>>        /* Indicates that flock locks for this file should be
>>           released.  If set, lock_owner shall contain a valid value.
>>           May only be set in ->release(). */
>>        unsigned int flock_release : 1;
>> 
> 
> Not expected in opendir, so can be heuristically interpreted as cache_readdir
> 
>>        /** Can be filled in by opendir. It signals the kernel to
>>            enable caching of entries returned by readdir().  Has no
>>            effect when set in other contexts (in particular it does
>>            nothing when set by open()). */
>>        unsigned int cache_readdir : 1;
>> 
> 
> Ignored in open, but based on the comment above, it may be
> implied that some fs may set it in open() reply
> 
>>        /** Can be filled in by open, to indicate that flush is not needed
>>            on close. */
>>        unsigned int noflush : 1;
> 
> noflush is just an optimization, which the kernel ignores anyway
> when writeback cache is enabled, so no harm done if it is wrongly
> interpreted as readdir_cache in open() and ignored.
> It is also quite recent (3.11) so not very likely used.
> 
>> };
>> 
>> I.e. setting flush would actually set parallel_direct_writes
>> (note: with binaries compiled against older libfuse versions)
>> 
> 
> It would be suspicious if fs sets parallel_direct_writes flush at
> flush() or release(), so that can be used as a hint of old ABI
> interpreted as flush and issue a warning.
> 
> It would be pretty ugly to use these heuristics in the library forever,
> but perhaps as safety measure for binaries built with a range of
> libfuse version that is not likely to be found in the wild (3.14..3.16)
> it is an acceptable compromise?
> 
> and perhaps the next libfuse version can pass the header version
> fs was compiled with as argument to fuse_loop_mt_317()?
> 
> Thanks,
> Amir.


Reply to: