Re: [Libguestfs] [PATCH 1/2] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length

To: Wouter Verhelst <w@uter.be>
Cc: Eric Blake <eblake@redhat.com>, v.sementsov-og@mail.ru, qemu-block@nongnu.org, libguestfs@redhat.com, nbd@other.debian.org
Subject: Re: [Libguestfs] [PATCH 1/2] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length
From: "Richard W.M. Jones" <rjones@redhat.com>
Date: Fri, 8 Apr 2022 12:52:44 +0100
Message-id: <[🔎] 20220408115244.GL8021@redhat.com>
In-reply-to: <Yk/jTZcHpW5qWsvg@pc181009.grep.be>
References: <[🔎] 20220407213720.250252-1-eblake@redhat.com> <[🔎] 20220407213720.250252-2-eblake@redhat.com> <Yk/jTZcHpW5qWsvg@pc181009.grep.be>

On Fri, Apr 08, 2022 at 09:25:01AM +0200, Wouter Verhelst wrote:
> Hi Eric,
> 
> On Thu, Apr 07, 2022 at 04:37:19PM -0500, Eric Blake wrote:
> > The spec was silent on how many extents a server could reply with.
> > However, both qemu and nbdkit (the two server implementations known to
> > have implemented the NBD_CMD_BLOCK_STATUS extension) implement a hard
> > cap, and will truncate the amount of extents in a reply to avoid
> > sending a client a reply larger than the maximum NBD_CMD_READ response
> > they are willing to tolerate:
> > 
> > When qemu first implemented NBD_CMD_BLOCK_STATUS for the
> > base:allocation context (qemu commit e7b1948d51, Mar 2018), it behaved
> > as if NBD_CMD_FLAG_REQ_ONE were always passed by the client, and never
> > responded with more than one extent.  Later, when adding its
> > qemu:dirty-bitmap:XXX context extension (qemu commit 3d068aff16, Jun
> > 2018), it added a cap to 128k extents (1M+4 bytes), and that cap was
> > applied to base:allocation once qemu started sending multiple extents
> > for that context as well (qemu commit fb7afc797e, Jul 2018).  Qemu
> > extents are never smaller than 512 bytes (other than an exception at
> > the end of a file whose size is not aligned to 512), but even so, a
> > request for just under 4G of block status could produce 8M extents,
> > resulting in a reply of 64M if it were not capped smaller.
> > 
> > When nbdkit first implemented NBD_CMD_BLOCK_STATUS (nbdkit 4ca66f70a5,
> > Mar 2019), it did not impose any restriction on the number of extents
> > in the reply chunk.  But because it allows extents as small as one
> > byte, it is easy to write a server that can amplify a client's request
> > of status over 1M of the image into a reply over 8M in size, and it
> > was very easy to demonstrate that a hard cap was needed to avoid
> > crashing clients or otherwise killing the connection (a bad server
> > impacting the client negatively); unique to nbdkit's situation is the
> > fact that because it is designed for plugin server implementations,
> > not capping the length of extent also posed a problem to nbdkit as the
> > server (a client requesting large block status could cause the server
> > to run out of memory depending on the plugin providing the server
> > callbacks).  So nbdkit enforced a bound of 1M extents (8M+4 bytes,
> > nbdkit commit 6e0dc839ea, Jun 2019).
> > 
> > Since the limit chosen by these two implementations is different, and
> > since nbdkit has versions that were not limited, add this as a SHOULD
> > NOT instead of MUST NOT constraint on servers implementing block
> > status.  It does not matter that qemu picked a smaller limit that it
> > truncates to, since we have already documented that the server may
> > truncate for other reasons (such as it being inefficient to collect
> > that many extents in the first place).  But documenting the limit now
> > becomes even more important in the face of a future addition of 64-bit
> > requests, where a client's request is no longer bounded to 4G and
> > could thereby produce even more than 8M extents for the corner case
> > when every 512 bytes is a new extent, if it were not for this
> > recommendation.
> 
> It feels backwards to me to make this a restriction on the server side.
> You're saying there are server implementations that will be inefficient
> if there are more than 2^20 extents, and therefore no server should send
> more than those, even if it can do so efficiently.
> 
> Isn't it up to the server implementation to decide what can be done
> efficiently?
> 
> Perhaps we can make the language about possibly reducing length of
> extens a bit stronger; but I don't think adding explicit limits for a
> server's own protection is necessary.

I agree, but for a different reason.

I think Eric should add language that servers can consider limiting
response sizes in order to prevent possible amplification issues
and/or simply overwhelming the client with work (bad server DoS
attacks against clients are a thing!), but I don't think it's
necessarily a "SHOULD" issue.

BTW attached is an nbdkit plugin that creates an NBD server that
responds with massive numbers of byte-granularity extents, in case
anyone wants to test how nbdkit and/or clients respond:

$ chmod +x /var/tmp/lots-of-extents.py
$ /var/tmp/lots-of-extents.py -f

$ nbdinfo --map nbd://localhost | head
         0           1    3  hole,zero
         1           1    0  data
         2           1    3  hole,zero
         3           1    0  data
         4           1    3  hole,zero
         5           1    0  data
         6           1    3  hole,zero
         7           1    0  data
         8           1    3  hole,zero
         9           1    0  data
$ nbdinfo --map --totals nbd://localhost 
    524288  50.0%   0 data
    524288  50.0%   3 hole,zero

>From experimentation I found this really hurts my laptop :-(

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
nbdkit - Flexible, fast NBD server with plugins
https://gitlab.com/nbdkit/nbdkit

#!/usr/sbin/nbdkit python
API_VERSION = 2

def open(readonly):
    return 1

def get_size(h):
    return 1024*1024

def pread(h, buf, offset, flags):
    buf[:] = bytearray(len(buf))

def typ(offset):
    if offset % 2:
        return 0 # data
    else:
        return 3 # zero + hole

def extents(h, count, offset, flags):
    e = [(i, 1, typ(i)) for i in range(offset, offset + count + 1)]
    return e

Reply to:

Follow-Ups:
- Re: [Libguestfs] [PATCH 1/2] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length
  - From: "Richard W.M. Jones" <rjones@redhat.com>
- Re: [Libguestfs] [PATCH 1/2] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length
  - From: Nir Soffer <nsoffer@redhat.com>

References:
- [PATCH 0/2] More NBD spec prep-work before 64-bit headers
  - From: Eric Blake <eblake@redhat.com>
- [PATCH 1/2] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length
  - From: Eric Blake <eblake@redhat.com>
- Re: [PATCH 1/2] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length
  - From: Wouter Verhelst <w@uter.be>

Prev by Date: Re: [PATCH 1/2] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length
Next by Date: Re: [Libguestfs] [PATCH 1/2] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length
Previous by thread: Re: [PATCH 1/2] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length
Next by thread: Re: [Libguestfs] [PATCH 1/2] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length
Index(es):
- Date
- Thread