
Why writing files eats up memory: Part I: The Problem



So the behavior is that writing a large file (one N MB long) eats up N
MB of memory quickly.  Why?

Well, suppose dd is writing a big file, and getting data from
/dev/zero.  Or suppose ld is writing a big file and getting data from
some other in-core segment.  What happens?

Well, it basically copies the data into the memory object backed by
the file--as fast as it can.  (I'll call this object the "hosage
object".)  In both those cases (dd or ld) it can do it pretty darn
fast, because it isn't waiting on anything.  (If it's writing into
/dev/null, then it never gets copied into a hosage object, and so none
of what follows applies, which explains Roland's observation that it
matters whether dd is writing to a file or /dev/null.)
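
To make that concrete, here's a minimal C sketch of the kind of
writer that triggers this--it's roughly what `dd if=/dev/zero
of=bigfile bs=1M count=256' would do; the file name and sizes are
only for illustration:

    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Copy COUNT megabytes from /dev/zero into a regular file.  Every
       write lands first in the memory object backing the file (the
       "hosage object"), and this loop can fill it far faster than the
       disk can drain it.  Writing to /dev/null instead never touches
       a file-backed object, so none of the trouble below happens.  */
    int
    main (int argc, char **argv)
    {
      const size_t mb = 1024 * 1024;
      size_t count = argc > 1 ? strtoul (argv[1], NULL, 10) : 256;
      char *buf = calloc (1, mb);
      int in = open ("/dev/zero", O_RDONLY);
      int out = open ("bigfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);

      if (buf == NULL || in < 0 || out < 0)
        return 1;

      for (size_t i = 0; i < count; i++)
        {
          if (read (in, buf, mb) != (ssize_t) mb)
            return 1;
          if (write (out, buf, mb) != (ssize_t) mb)
            return 1;
        }
      return 0;
    }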

So now we have two processes: one which is filling the hosage object
as fast as it possibly can, and another (the filesystem's pager
threads) which is writing out the data.

As the writer proceeds to fill the hosage object, the pool of free
pages in the system plummets (*quickly*) and the pageout thread in the
kernel kicks in.  The kernel notices the sequential access pattern
(contrary to my previous statements it already does this; more on it
in a separate message, because there are still problems), and pretty
much all the pages in the hosage object are marked "inactive", and
therefore ripe for immediate pageout.
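
Here's a toy model of that heuristic (just an illustration, not the
actual kernel code): touch an object's pages in strictly ascending
order, and each page behind the current one gets deactivated, so
after one sequential pass nearly the whole object is sitting on the
inactive queue:

    #include <stdbool.h>
    #include <stdio.h>

    /* Toy model of the sequential-access heuristic (NOT the actual
       kernel code): when an object is touched in strictly ascending
       page order, each newly touched page gets the page just behind
       it deactivated, so after one sequential pass nearly the whole
       object is inactive and ripe for pageout.  */

    #define NPAGES 16

    static bool inactive[NPAGES];

    static void
    touch (int page, int *last)
    {
      if (*last >= 0 && page == *last + 1)
        inactive[*last] = true;   /* sequential: deactivate the previous page */
      *last = page;
    }

    int
    main (void)
    {
      int last = -1;

      for (int i = 0; i < NPAGES; i++)
        touch (i, &last);

      int n = 0;
      for (int i = 0; i < NPAGES; i++)
        n += inactive[i];
      printf ("%d of %d pages inactive after one sequential pass\n", n, NPAGES);
      return 0;
    }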

So the situation is this: a fairly small pool of active pages, a small
pool of "ordinary" inactive pages (idle pages from other things, ripe
for pageout), and a gajillion inactive pages belonging to the hosage
object.

And the kernel pageout thread looks at this situation.  As long as the
system decides memory is needed, the pageout thread tries to free
pages, and what happens depends on exactly how much memory is free
(there's a rough sketch of these bands in C after the list):

1) There are more than 15 pages free.  In that case, the pageout
   daemon hands off the first inactive pages it finds; odds are these
   are almost all pages from the hosage object, and so we toss a bunch
   at the filesystem pager.  Now the kernel is studiously careful to
   avoid swamping the filesystem pager, so it might not page out as
   many as it could, but given that there is a huge demand for pages
   (because dd is hosing them As Fast As It Can), we cannot reach a
   steady state here.  The number of free pages *will* drop below 15,
   because dd can hose pages much faster than the disk can write them
   out.

2) If there are between 10 and 15 pages free, the system stops
   letting most processes allocate pages.  dd stops now, and the
   filesystem makes only a little progress--not much--because the
   filesystem itself needs to allocate pages for things, including
   pageout.  The default pager, however, is still happy.  The
   pageout thread keeps taking pages and trying to send them to the
   filesystem pager, which is basically not able to run, and that
   queuing takes a little memory.  So while dd is no longer hosing
   new pages, the kernel uses up more memory queuing pages to the
   filesystem pagers, and probably does so faster than the default
   pager can write pages to disk.  If you have no default pager,
   then you lose totally at this point.

3) When memory drops below 10 pages free, the kernel gives up on the
   filesystem pager entirely.  Now pageout goes only to the default
   pager.  Almost all the inactive pages are in the hosage object,
   but the kernel is smart: it just pages them directly into the
   swap partition.  At this point there is nothing much allocating
   memory; only the default pager and the kernel are allowed to, and
   they don't take that much.

4) If memory drops below 5 pages, then the pageout thread stops
   entirely to let the default pager catch up.  We don't get here
   much.
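
Here's the rough sketch of those bands promised above.  It's a toy
model, not the real pageout thread; the thresholds 15, 10, and 5 are
the ones named above, and everything else is made up for
illustration:

    #include <stdio.h>

    /* Toy model of the four bands described above (NOT the real
       pageout thread).  The thresholds 15, 10, and 5 are the ones
       named in the text; the rest is just for illustration.  */

    enum action
      {
        SEND_TO_FILESYSTEM_PAGER,  /* more than 15 free: normal pageout    */
        BLOCK_MOST_ALLOCATORS,     /* 10..15 free: most allocation blocked */
        SEND_TO_DEFAULT_PAGER,     /* below 10 free: swap only             */
        PAUSE_PAGEOUT              /* below 5 free: let the default pager
                                      catch up                             */
      };

    static enum action
    pageout_policy (int free_pages)
    {
      if (free_pages > 15)
        return SEND_TO_FILESYSTEM_PAGER;
      else if (free_pages >= 10)
        return BLOCK_MOST_ALLOCATORS;
      else if (free_pages >= 5)
        return SEND_TO_DEFAULT_PAGER;
      else
        return PAUSE_PAGEOUT;
    }

    int
    main (void)
    {
      static const char *name[] =
        {
          "hand inactive pages to the filesystem pager",
          "block most allocators, keep queuing to the filesystem pager",
          "send pages straight to the default pager (swap)",
          "pause pageout until the default pager catches up",
        };

      for (int nfree = 20; nfree >= 0; nfree -= 4)
        printf ("%2d free pages -> %s\n", nfree, name[pageout_policy (nfree)]);
      return 0;
    }

Running it just prints which band each free-page count lands in.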

So the system quickly plummets to fewer than 10 free pages, at which
point the default pager pages out five.  Now there's memory to
allocate!  A little memory is quickly taken by the filesystem and the
hosage pageout process, and a little progress is made.  Memory drops.
Pages go to the swap partition.  The filesystem is now paging at a
crawl.
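
If you like, here's a toy simulation of that feedback loop (every
per-tick number is invented); it shows the shape of the thing: free
memory plummets while dd runs flat out, then hovers just around the
10-page band, inching along at the default pager's pace:

    #include <stdio.h>

    /* Toy simulation of the feedback loop (every per-tick number is
       invented): dd grabs pages far faster than the disk writes them
       out, so free memory plummets, and afterwards it hovers below
       the 15-page band where dd would be allowed to run again.  */

    int
    main (void)
    {
      int free_pages = 50;

      for (int tick = 0; tick < 15; tick++)
        {
          if (free_pages < 10)
            free_pages += 5;    /* default pager pushes 5 pages to swap */
          else if (free_pages <= 15)
            free_pages -= 1;    /* queuing for the filesystem pager eats a bit */
          else
            free_pages -= 7;    /* dd hoses pages As Fast As It Can */

          printf ("tick %2d: %2d pages free\n", tick, free_pages);
        }
      return 0;
    }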

Eventually, when dd or ld or whatever finishes writing, the default
pager frees up enough memory for the filesystem to run freely; it
does so, begins writing the file out at a reasonable speed, and
everything clears up.

Now what to do about this problem?  For that, see part 2.

Thomas

