Linux: Reviewing Suspend2

Nigel Cunningham submitted his suspend2 patches to the lkml for review and inclusion into Andrew Morton's -mm tree. Jens Axboe summarized the current roadblocks to merging suspend2: "Now I haven't followed the suspend2 vs swsusp debate very closely, but it seems to me that your biggest problem with getting this merged is getting consensus on where exactly this is going. Nobody wants two different suspend modules in the kernel. So there are two options - suspend2 is deemed the way to go, and it gets merged and replaces swsusp. Or the other way around - people like swsusp more, and you are doomed to maintain suspend2 outside the tree."

Greg KH pointed out that the current focus with swsusp is to move its functionality from the kernel into userspace, an effort called uswsusp: "Pavel and others have a working implementation and are slowly moving toward adding all of the 'bright and shiny' features that is in suspend2 to it (encryption, progress screens, abort by pressing a key, etc.) so that there is no loss of functionality." Nigel countered that only some of swsusp is being moved to userland, adding, "and there _is_ loss of functionality - uswsusp still doesn't support writing a full image of memory, writing to multiple swap devices (partitions or files), or writing to ordinary files. They're getting the low hanging fruit, but when it comes to these parts of the problem, they're going to require either smoke and very good mirrors (eg the swap prefetching trick), or simply refuse to implement them." Pavel Machek, maintainer of swsusp and uswsusp, replied item by item to Nigel's list of suspend2 advantages, noting that uswsusp now has or soon will have the same capabilities. Reviewers also noted that the submitted patches will need to be consolidated into logical pieces and resubmitted for proper review.

From: Nigel Cunningham [email blocked]
To: Linux Kernel Mailing List [email blocked], suspend2-devel@lists.suspend2.net
Subject: Suspend2 - Request for review & inclusion in -mm
Date:	Tue, 27 Jun 2006 01:47:16 +1000

Hi all.

I'd like, at long last, to submit Suspend2 for review and inclusion in -mm.

All going well, I'll shortly be sending a number of sets of patches, which 
together represent the whole of suspend2 as it stands at the moment. Those of 
you who've looked at Suspend2 code before will see that there are far fewer 
changes outside of kernel/power than there have been in the past. In some 
cases, this is because we were early adopters of some functionality that has 
now been merged, and in others because better, less intrusive ways have been 
found of doing some things.

Some of the advantages of suspend2 over swsusp and uswsusp are:

- Speed (Asynchronous I/O and readahead for synchronous I/O)
- Well tested in a wide range of configurations
- Supports multiple swap partitions and files
- Supports writing to ordinary files and raw devices.
- Userspace helpers for user interface and storage management.
- Support for cancelling the suspend at any point while the image is being 
written (can be disabled)
- Can be configured and reconfigured without rebooting.
- Scripting support

I'm very much part-time on this, so please accept my apologies in advance if 
I'm slow in replying to responses.

A git tree is now available on kernel.org:

http://www.kernel.org/git/?p=linux/kernel/git/nigelc/suspend2-2.6.git;a=summary

Regards,

Nigel
-- 
Nigel, Michelle and Alisdair Cunningham
5 Mitchell Street
Cobden 3266
Victoria, Australia



From: Pavel Machek [email blocked]
Subject: Re: Suspend2 - Request for review & inclusion in -mm
Date:	Tue, 27 Jun 2006 15:33:22 +0200

Hi!

> I'd like, at long last, to submit Suspend2 for review and inclusion in -mm.
> 
> All going well, I'll shortly be sending a number of sets of patches, which 
> together represent the whole of suspend2 as it stands at the moment. Those of 
> you who've looked at Suspend2 code before will see that there are far fewer 
> changes outside of kernel/power than there have been in the past. In some 
> cases, this is because we were early adopters of some functionality that has 
> now been merged, and in others because better, less intrusive ways have been 
> found of doing some things.
> 
> Some of the advantages of suspend2 over swsusp and uswsusp are:
> 
> - Speed (Asynchronous I/O and readahead for synchronous I/O)

uswsusp should be able to match suspend2's speed. It can do async I/O,
etc...

> - Well tested in a wide range of configurations
> - Supports multiple swap partitions and files

Doable in userspace with uswsusp.

> - Supports writing to ordinary files and raw devices.

Should be doable in userspace with uswsusp, too; I actually had raw
devices version at one point.

> - Userspace helpers for user interface and storage management.

Better put it completely in userspace :-).

> - Support for cancelling the suspend at any point while the image is being 
> written (can be disabled)

uswsusp does that... or did that at some point.

> - Can be configured and reconfigured without rebooting.

No problem for uswsusp.

> - Scripting support

What does that mean?
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



From: Jens Axboe [email blocked]
Subject: Re: [Suspend2][ 0/9] Extents support.
Date:	Tue, 27 Jun 2006 09:59:06 +0200

On Tue, Jun 27 2006, Nigel Cunningham wrote:
> Hi.
> 
> On Tuesday 27 June 2006 17:05, Jens Axboe wrote:
> > On Tue, Jun 27 2006, Nigel Cunningham wrote:
> > > On Tuesday 27 June 2006 15:36, Jens Axboe wrote:
> > > > On Tue, Jun 27 2006, Nigel Cunningham wrote:
> > > > > On Tuesday 27 June 2006 07:20, Rafael J. Wysocki wrote:
> > > > > > On Monday 26 June 2006 18:54, Nigel Cunningham wrote:
> > > > > > > Add Suspend2 extent support. Extents are used for storing the
> > > > > > > lists of blocks to which the image will be written, and are
> > > > > > > stored in the image header for use at resume time.
> > > > > >
> > > > > > Could you please put all of the changes in kernel/power/extents.c
> > > > > > into one patch?  It's quite difficult to review them now, at least
> > > > > > for me.
> > > > >
> > > > > I spent a long time splitting them up because I was asked in previous
> > > > > iterations to break them into manageable chunks. How about if I were
> > > > > to email you the individual files off line, so as to not send the
> > > > > same amount again?
> > > >
> > > > Manageable chunks means logical changes go together, one function per
> > > > diff is really extreme and unreviewable. Support for extents is one
> > > > logical change, so it's one patch. Unless of course you have to do some
> > > > preparatory patches, then you'd do those separately.
> > > >
> > > > I must admit I thought you were kidding when I read through this
> > > > extents patch series, having a single patch just adding includes!
> > >
> > > Sorry for fluffing it up. I'm pretty inexperienced, but I'm trying to
> > > follow CodingStyle and all the other advice. If I'd known I'd
> > > misunderstood what was wanted, I probably could have submitted this
> > > months ago. Oh well. Live and learn. What would you have me do at this
> > > point?
> >
> > Split up your patches differently, and not in so many steps. Ideally
> > each step should work and compile, with each introducing some sort of
> > functionality. Each patch should be reviewable on its own.
> 
> The difficulty I have there is that suspending to disk doesn't seem to
> me to be something where you can add a bit at a time like that. I do
> have proc entries that allow you to say "Just freeze the processes and
> prepare the metadata, then free it and exit" (freezer_test) and "Do
> everything but actually writing the image and doing the atomic copy,
> then exit" (test_filter_speed), for diagnosing problems and tuning the
> configuration, but if I were to start again with nothing, I'd
> only write the dynamic pageflags code to start with and submit it
> (giving you lib/dyn_pageflags.c and kernel/power/pageflags.c), then
> the refrigerator changes and the extent code and so on. I guess what
> I'm trying to say is that I'm not mutating swsusp into suspend2 here,
> and I don't think I can. Suspend2 is a reimplementation of swsusp, not
> a series of incremental modifications. It uses completely different
> methods for writing the image, storing the metadata and so on. Until
> recently, the only thing it shared with swsusp was the refrigerator
> and driver model calls, and even now the sharing of lowlevel code is
> only a tiny fraction of all that is done.

You can't split up what isn't composed of multiple things, of course. I
didn't review your patches (sorry), but if you have changes outside of
suspend2 itself, then you need to split these out. You could submit
those patches separately.

Now I haven't followed the suspend2 vs swsusp debate very closely, but
it seems to me that your biggest problem with getting this merged is
getting consensus on where exactly this is going. Nobody wants two
different suspend modules in the kernel. So there are two options -
suspend2 is deemed the way to go, and it gets merged and replaces
swsusp. Or the other way around - people like swsusp more, and you are
doomed to maintain suspend2 outside the tree.

> Could I ask what might be a dumb question in this regard - why isn't
> Reiser4 going through the same process? Is it an indication that I
> shouldn't have submitted these patches and should have just asked
> Andrew to take Suspend2 into mm, or is there something different
> between Reiser4 and Suspend2 that I'm missing?

That's not a dumb question at all. reiser4 hasn't been merged for years,
so you probably don't want to look at that as an example :-)

reiser4 is pretty much a separate entity, so it doesn't make sense to
split that up much for submission. Core kernel changes (as always) need
to be split, of course.

Sorry I cannot be of more help.

-- 
Jens Axboe



From: Greg KH [email blocked]
Subject: Re: [Suspend2][ 0/9] Extents support.
Date:	Tue, 27 Jun 2006 01:12:52 -0700

On Tue, Jun 27, 2006 at 09:59:06AM +0200, Jens Axboe wrote:
> Now I haven't followed the suspend2 vs swsusp debate very closely, but
> it seems to me that your biggest problem with getting this merged is
> getting consensus on where exactly this is going. Nobody wants two
> different suspend modules in the kernel. So there are two options -
> suspend2 is deemed the way to go, and it gets merged and replaces
> swsusp. Or the other way around - people like swsusp more, and you are
> doomed to maintain suspend2 outside the tree.

Actually, there's a third option that is looking like the way forward,
doing all of this from userspace and having no suspend-to-disk in the
kernel tree at all.

Pavel and others have a working implementation and are slowly moving
toward adding all of the "bright and shiny" features that is in suspend2
to it (encryption, progress screens, abort by pressing a key, etc.) so
that there is no loss of functionality.

So I don't really see the future of suspend2 because of this...

thanks,

greg k-h



From: Jens Axboe [email blocked]
Subject: Re: [Suspend2][ 0/9] Extents support.
Date:	Tue, 27 Jun 2006 10:22:42 +0200

On Tue, Jun 27 2006, Greg KH wrote:
> On Tue, Jun 27, 2006 at 09:59:06AM +0200, Jens Axboe wrote:
> > Now I haven't followed the suspend2 vs swsusp debate very closely, but
> > it seems to me that your biggest problem with getting this merged is
> > getting consensus on where exactly this is going. Nobody wants two
> > different suspend modules in the kernel. So there are two options -
> > suspend2 is deemed the way to go, and it gets merged and replaces
> > swsusp. Or the other way around - people like swsusp more, and you are
> > doomed to maintain suspend2 outside the tree.
> 
> Actually, there's a third option that is looking like the way forward,
> doing all of this from userspace and having no suspend-to-disk in the
> kernel tree at all.

Yeah, but isn't that already in progress and swsusp being migrated that
way? So really option #2.

> Pavel and others have a working implementation and are slowly moving
> toward adding all of the "bright and shiny" features that is in suspend2
> to it (encryption, progress screens, abort by pressing a key, etc.) so
> that there is no loss of functionality.
> 
> So I don't really see the future of suspend2 because of this...

Well, it sure looks slim..

-- 
Jens Axboe



From: Nigel Cunningham [email blocked]
Subject: Re: [Suspend2][ 0/9] Extents support.
Date:	Tue, 27 Jun 2006 18:58:17 +1000

Hi.

On Tuesday 27 June 2006 18:12, Greg KH wrote:
> On Tue, Jun 27, 2006 at 09:59:06AM +0200, Jens Axboe wrote:
> > Now I haven't followed the suspend2 vs swsusp debate very closely, but
> > it seems to me that your biggest problem with getting this merged is
> > getting consensus on where exactly this is going. Nobody wants two
> > different suspend modules in the kernel. So there are two options -
> > suspend2 is deemed the way to go, and it gets merged and replaces
> > swsusp. Or the other way around - people like swsusp more, and you are
> > doomed to maintain suspend2 outside the tree.
>
> Actually, there's a third option that is looking like the way forward,
> doing all of this from userspace and having no suspend-to-disk in the
> kernel tree at all.
>
> Pavel and others have a working implementation and are slowly moving
> toward adding all of the "bright and shiny" features that is in suspend2
> to it (encryption, progress screens, abort by pressing a key, etc.) so
> that there is no loss of functionality.
>
> So I don't really see the future of suspend2 because of this...

But what Rafael and Pavel are doing is really only moving the highest level of 
controlling logic to userspace (ok, and maybe compression and encryption 
too). Everything important (freezing other processes, atomic copy and the 
guts of the I/O) is still done by the kernel.

And there _is_ loss of functionality - uswsusp still doesn't support writing a 
full image of memory, writing to multiple swap devices (partitions or files), 
or writing to ordinary files. They're getting the low hanging fruit, but when 
it comes to these parts of the problem, they're going to require either smoke 
and very good mirrors (eg the swap prefetching trick), or simply refuse to 
implement them.

If we take the problem one step further, and begin to think about 
checkpointing, they're in even bigger trouble. I'll freely admit that I'd 
have to redesign the way I store data so that random parts of the image could 
be replaced, have hooks in mm to be able to learn what pages have 
changed and would also need filesystem support to handle that part of the 
problem, but I'd at least be working in the right domain.

I don't want to demean Rafael and Pavel's work for a moment. I've benefited 
from Pavel's help of Gabor in the beginning, and a little bit since he forked 
and merged in 2.5. But it seems to me that uswsusp is a short trip down a 
dead end road. It just doesn't have a future beyond being an interesting hack 
that proves you can safely run a program in userspace while snapshotting. 
Suspending to disk belongs in the kernel. That is shown clearly by the fact 
that uswsusp continues to use kernel code to do the really critical tasks, 
rather than being some super privileged userspace program that does them 
itself from userspace.

Regards,

Nigel


From: Greg KH [email blocked]
Subject: Re: [Suspend2][ 0/9] Extents support.
Date:	Tue, 27 Jun 2006 00:06:09 -0700

On Tue, Jun 27, 2006 at 03:39:26PM +1000, Nigel Cunningham wrote:
> Hi.
> 
> On Tuesday 27 June 2006 15:36, Jens Axboe wrote:
> > On Tue, Jun 27 2006, Nigel Cunningham wrote:
> > > Hi.
> > >
> > > On Tuesday 27 June 2006 07:20, Rafael J. Wysocki wrote:
> > > > On Monday 26 June 2006 18:54, Nigel Cunningham wrote:
> > > > > Add Suspend2 extent support. Extents are used for storing the lists
> > > > > of blocks to which the image will be written, and are stored in the
> > > > > image header for use at resume time.
> > > >
> > > > Could you please put all of the changes in kernel/power/extents.c into
> > > > one patch?  It's quite difficult to review them now, at least for me.
> > >
> > > I spent a long time splitting them up because I was asked in previous
> > > iterations to break them into manageable chunks. How about if I were to
> > > email you the individual files off line, so as to not send the same
> > > amount again?
> >
> > Manageable chunks means logical changes go together, one function per
> > diff is really extreme and unreviewable. Support for extents is one
> > logical change, so it's one patch. Unless of course you have to do some
> > preparatory patches, then you'd do those separately.
> >
> > I must admit I thought you were kidding when I read through this extents
> > patch series, having a single patch just adding includes!
> 
> Sorry for fluffing it up. I'm pretty inexperienced, but I'm trying to follow 
> CodingStyle and all the other advice. If I'd known I'd misunderstood what was 
> wanted, I probably could have submitted this months ago. Oh well. Live and 
> learn. What would you have me do at this point?

Please break things up into logical steps to solve the problem, and try
it again.

Oh, and as a meta-comment, why /proc?  You know that's not acceptable,
right?

thanks,

greg k-h



From: Nigel Cunningham [email blocked]
Subject: Re: [Suspend2][ 0/9] Extents support.
Date:	Tue, 27 Jun 2006 17:27:30 +1000

Hi.

On Tuesday 27 June 2006 17:06, Greg KH wrote:
> Oh, and as a meta-comment, why /proc?  You know that's not acceptable,
> right?

Partly because when I did consider switching to /sys, I found it to be 
incomprehensible (even with the LWN articles and Documentation/ files). 
Jonathan's articles and LCA presentation did help me start to get a better 
grip, but then it just didn't seem to be worth the effort. I have two 
relatively simple routines that handle all my proc entries at the moment, so 
that adding a new entry is just a matter of adding an element in an array of 
structs (saying what variable is being read/written, what type, min/max 
values and side effect routines, eg). It looked to me like changing to sysfs 
was going to require me to have a separate routine for every sysfs entry, 
even though they'd all have those same basic features. Maybe I'm just 
ignorant. Please tell me I am and point me in the right direction.

Regards,

Nigel
-- 
See http://www.suspend2.net for Howtos, FAQs, mailing
lists, wiki and bugzilla info.
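
The table-driven scheme Nigel describes (one array element per entry, naming the variable, its range, and an optional side-effect routine, with two generic routines serving every entry) might look roughly like the user-space C sketch below. This is an illustration only, not suspend2's code: `struct cfg_entry`, `cfg_find`, `cfg_read`, `cfg_write`, and the two example tunables are hypothetical names invented for the example, and the real routines would sit behind /proc read and write handlers rather than being called directly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one tunable: the variable it exposes,
 * its allowed range, and an optional hook run after a successful write. */
struct cfg_entry {
	const char *name;
	int *value;
	int min, max;
	void (*side_effect)(void);
};

/* Example variables (invented for this sketch). */
static int image_size_limit = 0;
static int enable_escape = 1;
static int limit_changes;	/* counts side-effect invocations */

static void limit_changed(void)
{
	limit_changes++;
}

/* Adding a new entry means adding one element here and nothing else. */
static struct cfg_entry entries[] = {
	{ "image_size_limit", &image_size_limit, 0, 4096, limit_changed },
	{ "enable_escape",    &enable_escape,    0, 1,    NULL },
};

#define N_ENTRIES (sizeof(entries) / sizeof(entries[0]))

static struct cfg_entry *cfg_find(const char *name)
{
	for (size_t i = 0; i < N_ENTRIES; i++)
		if (strcmp(entries[i].name, name) == 0)
			return &entries[i];
	return NULL;
}

/* Generic read routine shared by every entry. Returns 0 on success. */
static int cfg_read(const char *name, int *out)
{
	struct cfg_entry *e = cfg_find(name);

	if (!e)
		return -1;
	*out = *e->value;
	return 0;
}

/* Generic write routine: range-check, store, then run the side effect. */
static int cfg_write(const char *name, int val)
{
	struct cfg_entry *e = cfg_find(name);

	if (!e || val < e->min || val > e->max)
		return -1;
	*e->value = val;
	if (e->side_effect)
		e->side_effect();
	return 0;
}
```

Calling `cfg_write("image_size_limit", 512)` stores the value and runs the hook once, while `cfg_write("enable_escape", 2)` is rejected by the range check and leaves the variable untouched.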



From: Greg KH [email blocked]
Subject: Re: [Suspend2][ 0/9] Extents support.
Date:	Tue, 27 Jun 2006 00:53:41 -0700

On Tue, Jun 27, 2006 at 05:27:30PM +1000, Nigel Cunningham wrote:
> Hi.
> 
> On Tuesday 27 June 2006 17:06, Greg KH wrote:
> > Oh, and as a meta-comment, why /proc?  You know that's not acceptable,
> > right?
> 
> Partly because when I did consider switching to /sys, I found it to be 
> incomprehensible (even with the LWN articles and Documentation/ files). 
> Jonathan's articles and LCA presentation did help me start to get a better 
> grip, but then it just didn't seem to be worth the effort. I have two 
> relatively simple routines that handle all my proc entries at the moment, so 
> that adding a new entry is just a matter of adding an element in an array of 
> structs (saying what variable is being read/written, what type, min/max 
> values and side effect routines, eg). It looked to me like changing to sysfs 
> was going to require me to have a separate routine for every sysfs entry, 
> even though they'd all have those same basic features. Maybe I'm just 
> ignorant. Please tell me I am and point me in the right direction.

Well, as your stuff does not have anything to do with "processes",
putting it in /proc is not acceptable.

sysfs is one value per file, and if that matches up to what you need,
then it should be fine to use.

You do need to have some kind of function for every sysfs entry, but you
can group common ones together (as the hwmon drivers do.)

As you will not have a backing "device" to attach your files to, you
will probably need to deal with "raw" kobjects, and the learning curve
for how to create files in sysfs with them is unfortunately a bit steep.
But there is lots of working examples in the kernel that do this (block
devices, md, driver core, etc.), there's plenty of code to copy from to
get it to work.

And if that doesn't look like fun, you can always just create a new
filesystem (only 200 lines of code), or use debugfs.

good luck,

greg k-h
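
Greg's point that common sysfs handlers can be grouped together can be sketched in plain user-space C. The idea, loosely modelled on what hwmon-style drivers do, is to wrap a generic attribute in a structure that carries an index, so that a single show function serves many files and recovers its index from the wrapper. Everything here is a mock written for illustration: `struct attr`, `struct indexed_attr`, `show_temp`, and the `temps` array are hypothetical stand-ins, not the kernel's sysfs API, and the `container_of` macro is a simplified local definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for a generic attribute: a file name plus a show callback. */
struct attr {
	const char *name;
	int (*show)(struct attr *a, char *buf, size_t len);
};

/* Wrapper adding an index so one callback can serve many files. */
struct indexed_attr {
	struct attr attr;
	int index;
};

/* Simplified local version: recover the wrapper from its member. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Invented sample data; each file reports one element. */
static int temps[3] = { 41, 38, 52 };

/* One show routine shared by every temperature file. */
static int show_temp(struct attr *a, char *buf, size_t len)
{
	struct indexed_attr *ia = container_of(a, struct indexed_attr, attr);

	return snprintf(buf, len, "%d\n", temps[ia->index]);
}

/* One value per file, but only one function for the whole group. */
static struct indexed_attr temp_attrs[] = {
	{ { "temp1_input", show_temp }, 0 },
	{ { "temp2_input", show_temp }, 1 },
	{ { "temp3_input", show_temp }, 2 },
};
```

Reading the second file through its generic handle, `temp_attrs[1].attr.show(&temp_attrs[1].attr, buf, sizeof(buf))`, fills `buf` with that entry's value followed by a newline. Each file still exposes exactly one value, which is the property Greg asks for.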



From: Nigel Cunningham [email blocked]
Subject: Re: [Suspend2][ 0/9] Extents support.
Date:	Tue, 27 Jun 2006 19:08:46 +1000

Hi.

On Tuesday 27 June 2006 17:53, Greg KH wrote:
> Well, as your stuff does not have anything to do with "processes",
> putting it in /proc is not acceptable.
>
> sysfs is one value per file, and if that matches up to what you need,
> then it should be fine to use.

It does.

> You do need to have some kind of function for every sysfs entry, but you
> can group common ones together (as the hwmon drivers do.)

Ok. I'll take a look.

> As you will not have a backing "device" to attach your files to, you
> will probably need to deal with "raw" kobjects, and the learning curve
> for how to create files in sysfs with them is unfortunately a bit steep.
> But there is lots of working examples in the kernel that do this (block
> devices, md, driver core, etc.), there's plenty of code to copy from to
> get it to work.
>
> And if that doesn't look like fun, you can always just create a new
> filesystem (only 200 lines of code), or use debugfs.
>
> good luck,

Ok. I'll give it a go. Thanks for the pointers, Greg.

Regards,

Nigel
-- 
See http://www.suspend2.net for Howtos, FAQs, mailing
lists, wiki and bugzilla info.


From: Nigel Cunningham [email blocked]
Subject: [Suspend2][ 3/4] [Suspend2] Suspend2 internals document.
Date:	Tue, 27 Jun 2006 02:45:58 +1000

Add Documentation/power/internals.txt, describing how suspend2 works.

Signed-off-by: Nigel Cunningham [email blocked]

 Documentation/power/internals.txt |  362 +++++++++++++++++++++++++++++++++++++
 1 files changed, 362 insertions(+), 0 deletions(-)

diff --git a/Documentation/power/internals.txt b/Documentation/power/internals.txt
new file mode 100644
index 0000000..424cab0
--- /dev/null
+++ b/Documentation/power/internals.txt
@@ -0,0 +1,362 @@
+		Software Suspend 2.2 Internal Documentation.
+				Version 1
+
+1.  Introduction.
+
+    Software Suspend 2.2 is an addition to the Linux Kernel, designed to
+    allow the user to quickly shutdown and quickly boot a computer, without
+    needing to close documents or programs. It is equivalent to the
+    hibernate facility in some laptops. This implementation, however,
+    requires no special BIOS or hardware support.
+
+    The code in these files is based upon the original implementation
+    prepared by Gabor Kuti and additional work by Pavel Machek and a
+    host of others. This code has been substantially reworked by Nigel
+    Cunningham, again with the help and testing of many others, not the
+    least of whom is Michael Frank. At its heart, however, the operation is
+    essentially the same as Gabor's version.
+
+2.  Overview of operation.
+
+    The basic sequence of operations is as follows:
+
+	a. Quiesce all other activity.
+	b. Ensure enough memory and storage space are available, and attempt
+	   to free memory/storage if necessary.
+	c. Allocate the required memory and storage space.
+	d. Write the image.
+	e. Power down.
+
+    There are a number of complicating factors which mean that things are
+    not as simple as the above would imply, however...
+
+    o The activity of each process must be stopped at a point where it will
+    not be holding locks necessary for saving the image, or unexpectedly
+    restart operations due to something like a timeout and thereby make
+    our image inconsistent.
+
+    o It is desirable that we sync outstanding I/O to disk before calculating
+    image statistics. This reduces corruption if one should suspend but
+    then not resume, and also makes later parts of the operation safer (see
+    below).
+
+    o We need to get as close as we can to an atomic copy of the data.
+    Inconsistencies in the image will result in inconsistent memory contents at
+    resume time, and thus in instability of the system and/or file system
+    corruption. This would appear to imply a maximum image size of one half of
+    the amount of RAM, but we have a solution... (again, below).
+
+    o In 2.6, we choose to play nicely with the other suspend-to-disk
+    implementations.
+
+3.  Detailed description of internals.
+
+    a. Quiescing activity.
+
+    Safely quiescing the system is achieved using two methods.
+
+    First, we note that the vast majority of processes don't need to run during
+    suspend. They can be 'frozen'. We therefore implement a refrigerator
+    routine, which processes enter and in which they remain until the cycle is
+    complete. Processes enter the refrigerator via try_to_freeze() invocations
+    at appropriate places.  A process cannot be frozen in any old place. It
+    must not be holding locks that will be needed for writing the image or
+    freezing other processes. For this reason, userspace processes generally
+    enter the refrigerator via the signal handling code, and kernel threads at
+    the place in their event loops where they drop locks and yield to other
+    processes or sleep.
+
+    The second part of our method for quiescing the system involves freezing
+    the filesystems. We use the standard freeze_bdev and thaw_bdev functions to
+    ensure that all of the user's data is synced to disk before we begin to
+    write the image.
+
+    Quiescing the system works most quickly and reliably when we add one more
+    element to the algorithm: separating the freezing of userspace processes
+    from the freezing of kernel space processes, and doing the filesystem freeze
+    in between. The filesystem freeze needs to be done while kernel threads such
+    as kjournald can still run. At the same time, though, everything will be less
+    racy and run more quickly if we stop userspace submitting more I/O work
+    while we're trying to quiesce.
+
+    Quiescing the system is therefore done in three steps:
+	- Freeze userspace
+	- Freeze filesystems
+	- Freeze kernel threads
+
+    If we need to free memory, we thaw kernel threads and filesystems, but not
+    userspace. We can then free caches without worrying about deadlocks due to
+    swap files being on frozen filesystems or such like.
+
+    b. Ensure enough memory & storage are available.
+
+    We have a number of constraints to meet to be able to successfully suspend
+    and resume.
+
+    First, the image will be written in two parts, described below. One of these
+    parts needs to have an atomic copy made, which of course implies a maximum
+    size of one half of the amount of system memory. The other part ('pageset')
+    is not atomically copied, and can therefore be as large or small as desired.
+
+    Second, we have constraints on the amount of storage available. In these
+    calculations, we may also consider any compression that will be done. The
+    cryptoapi module allows the user to configure an expected compression ratio.
+   
+    Third, the user can specify an arbitrary limit on the image size, in
+    megabytes. This limit is treated as a soft limit, so that we don't fail the
+    attempt to suspend if we cannot meet this constraint.
+
+    c. Allocate the required memory and storage space.
+
+    Having done the initial freeze, we determine whether the above constraints
+    are met, and seek to allocate the metadata for the image. If the constraints
+    are not met, or we fail to allocate the required space for the metadata, we
+    seek to free the amount of memory that we calculate is needed and try again.
+    We allow up to four iterations of this loop before aborting the cycle. If we
+    do fail, it should only be because of a bug in Suspend's calculations.
+    
+    These steps are merged together in the prepare_image function, found in
+    prepare_image.c. The functions are merged because of the cyclical nature
+    of the problem of calculating how much memory and storage is needed. Since
+    the data structures containing the information about the image must
+    themselves take memory and use storage, the amount of memory and storage
+    required changes as we prepare the image. Since the changes are not large,
+    only one or two iterations will be required to achieve a solution.
+
+    d. Write the image.
+
+    We previously mentioned the need to create an atomic copy of the data, and
+    the half-of-memory limitation that is implied in this. This limitation is
+    circumvented by dividing the memory to be saved into two parts, called
+    pagesets.
+
+    Pageset2 contains the page cache - the pages on the active and inactive
+    lists. These pages are saved first and reloaded last. While saving these
+    pages, the swapwriter module carefully ensures that the work of writing
+    the pages doesn't make the image inconsistent. Pages added to the LRU
+    lists are immediately shot down, and careful accounting for available
+    memory aids debugging. No atomic copy of these pages needs to be made.
+
+    Writing the image requires memory, of course, and at this point we have
+    also not yet suspended the drivers. To avoid the possibility of remaining
+    activity corrupting the image, we allocate a special memory pool. Calls
+    to __alloc_pages and __free_pages_ok are then diverted to use our memory
+    pool. Pages in the memory pool are saved as part of pageset1 regardless of
+    whether or not they are used.
+
+    Once pageset2 has been saved, we suspend the drivers and save the CPU
+    context before making an atomic copy of pageset1, resuming the drivers
+    and saving the atomic copy. After saving the two pagesets, we just need to
+    save our metadata before powering down.
+
+    Having saved pageset2 pages, we can safely overwrite their contents with
+    the atomic copy of pageset1. This is how we manage to overcome the half of
+    memory limitation. Pageset2 is normally far larger than pageset1, and
+    pageset1 is normally much smaller than half of the memory, with the result
+    that pageset2 pages can be safely overwritten with the atomic copy of
+    pageset1. This is where we need to be careful about syncing, however.
+    Pageset2 will probably contain filesystem meta data. If this is overwritten
+    with pageset1 and then a sync occurs, the filesystem will be corrupted -
+    at least until resume time and another sync of the restored data. Since
+    there is a possibility that the user might not resume or (may it never be!)
+    that suspend might oops, we do our utmost to avoid syncing filesystems after
+    copying pageset1.
+
+    e. Power down.
+
+    Powering down uses standard kernel routines. Prior to this, however, we
+    suspend drivers again, ensuring that write caches are flushed.
+
+4.  The method of writing the image.
+
+    Suspend2 contains an internal API which is designed to simplify the
+    implementation of new methods of transforming the image to be written and
+    writing the image itself. In early versions of Suspend2, compression support
+    was inlined in the image writing code, and the data structures and code for
+    managing swap were intertwined with the rest of the code. A number of people
+    had expressed interest in implementing image encryption, and alternative
+    methods of storing the image. This internal API makes that possible by
+    implementing 'modules'.
+
+    A module is a single file which encapsulates the functionality needed
+    to transform a pageset of data (encryption or compression, for example),
+    or to write the pageset to a device. The former type of module is called
+    a 'page transformer', the latter a 'writer'.
+
+    Modules are linked together in pipeline fashion. There may be zero or more
+    page transformers in a pipeline, and there is always exactly one writer.
+    The pipeline follows this pattern:
+
+		---------------------------------
+		|          Suspend2 Core        |
+		---------------------------------
+				|
+				|
+		---------------------------------
+		|	Page transformer 1	|
+		---------------------------------
+				|
+				|
+		---------------------------------
+		|	Page transformer 2	|
+		---------------------------------
+				|
+				|
+		---------------------------------
+		|            Writer		|
+		---------------------------------
+
+    During the writing of an image, the core code feeds pages one at a time
+    to the first module. This module performs whatever transformations it
+    implements on the incoming data, completely consuming the incoming data and
+    feeding output in a similar manner to the next module. A module may buffer
+    its output.
+
+    During reading, the pipeline works in the reverse direction. The core code
+    calls the first module with the address of a buffer which should be filled.
+    (Note that the buffer size is always PAGE_SIZE at this time). This module
+    will in turn request data from the next module and so on down until the
+    writer is made to read from the stored image.
+
+    Part of the definition of the module structure thus looks like this:
+
+        int (*rw_init) (int rw, int stream_number);
+        int (*rw_cleanup) (int rw);
+        int (*write_chunk) (struct page *buffer_page);
+        int (*read_chunk) (struct page *buffer_page, int sync);
+
+    It should be noted that the rw_cleanup routine may be called before the
+    full stream of data has been read or written. While writing the image,
+    the user may (depending upon settings) choose to abort suspending, and
+    if we are in the midst of writing the last portion of the image, a portion
+    of the second pageset may be reread.
+
+    In addition to the above routines for writing the data, all modules have a
+    number of other fields and routines:
+
+    TYPE indicates whether the module is a page transformer or a writer.
+    #define TRANSFORMER_MODULE 1
+    #define WRITER_MODULE 2
+
+    NAME is the name of the module, used in generic messages.
+
+    MODULE_LIST is used to link the module into the list of all modules.
+
+    MEMORY_NEEDED returns the number of pages of memory required by the module
+    to do its work.
+
+    STORAGE_NEEDED returns the number of pages in the suspend header required
+    to store the module's configuration data.
+
+    PRINT_DEBUG_INFO fills a buffer with information to be displayed about the
+    operation or settings of the module.
+
+    SAVE_CONFIG_INFO returns a buffer of PAGE_SIZE or smaller (the size is the
+    return code), containing the module's configuration info. This information
+    will be written in the image header and restored at resume time. Since this
+    buffer is allocated after the atomic copy of the kernel is made, you don't
+    need to worry about the buffer being freed.
+
+    LOAD_CONFIG_INFO gives the module a pointer to the configuration info
+    which was saved during suspending. Once again, the module doesn't need to
+    worry about freeing the buffer. The kernel will be overwritten with the
+    original kernel, so no memory leak will occur.
+
+    OPS contains the operations specific to transformers and writers. These are
+    described below.
+
+    The complete definition of struct suspend_module_ops is:
+
+	struct suspend_module_ops {
+	        /* Functions common to all modules */
+	        int type;
+	        char *name;
+	        struct module *module;
+	        int disabled;
+	        struct list_head module_list;
+
+	        /* List of filters or writers */
+	        struct list_head list, type_list;
+
+	        /*
+	         * Requirements for memory and storage in
+	         * the image header..
+	         */
+	        unsigned long (*memory_needed) (void);
+	        unsigned long (*storage_needed) (void);
+
+	        /*
+	         * Debug info
+	         */
+	        int (*print_debug_info) (char *buffer, int size);
+	        int (*save_config_info) (char *buffer);
+	        void (*load_config_info) (char *buffer, int len);
+
+	        /*
+	         * Initialise & cleanup - general routines called
+	         * at the start and end of a cycle.
+	         */
+	        int (*initialise) (int starting_cycle);
+	        void (*cleanup) (int finishing_cycle);
+
+	        /*
+	         * Calls for allocating storage (writers only).
+	         *
+	         * Header space is allocated separately. Note that allocation
+	         * of space for the header might result in allocated space
+	         * being stolen from the main pool if there is no unallocated
+	         * space. We have to be able to allocate enough space for
+	         * the header. We can eat memory to ensure there is enough
+	         * for the main pool.
+	         */
+
+	        int (*storage_available) (void);
+	        int (*allocate_header_space) (int space_requested);
+	        int (*allocate_storage) (int space_requested);
+	        int (*storage_allocated) (void);
+	        int (*release_storage) (void);
+
+	        /*
+	         * Routines used in image I/O.
+	         */
+	        int (*rw_init) (int rw, int stream_number);
+	        int (*rw_cleanup) (int rw);
+	        int (*write_chunk) (struct page *buffer_page);
+	        int (*read_chunk) (struct page *buffer_page, int sync);
+
+	        /* Reset module if image exists but reading aborted */
+	        void (*noresume_reset) (void);
+
+	        /* Read and write the metadata */
+	        int (*write_header_init) (void);
+	        int (*write_header_cleanup) (void);
+
+	        int (*read_header_init) (void);
+	        int (*read_header_cleanup) (void);
+
+	        int (*rw_header_chunk) (int rw, char *buffer_start, int buffer_size);
+
+	        /* Attempt to parse an image location */
+	        int (*parse_sig_location) (char *buffer, int only_writer);
+
+	        /* Determine whether image exists that we can restore */
+	        int (*image_exists) (void);
+
+	        /* Mark the image as having tried to resume */
+	        void (*mark_resume_attempted) (void);
+
+	        /* Destroy image if one exists */
+	        int (*invalidate_image) (void);
+	};
+
+
+	Expected compression returns the expected ratio between the amount of
+	data sent to this module and the amount of data it passes to the next
+	module. The value is used by the core code to calculate the amount of
+	space required to write the image. If the ratio is not achieved, the
+	writer will complain when it runs out of space with data still to
+	write, and the core code will abort the suspend.
+
+	transformer_list links together page transformers, in the order in
+	which they register, which is in turn determined by order in the
+	Makefile.

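The module pipeline from section 4 of the internals document above can be
mimicked, very loosely, in userspace with an ordinary shell pipeline. This is
an analogy only and not Suspend2 code: rot13 (via tr) stands in for an
encryption transformer, gzip for a compression transformer, and a plain file
for the writer's storage. All function names are made up for illustration.

```shell
#!/bin/sh
# Userspace analogy only; none of this is Suspend2 code.

# Hypothetical "page transformer 1": rot13 stands in for encryption.
# rot13 is its own inverse, so one function serves both directions.
transformer1() { tr 'A-Za-z' 'N-ZA-Mn-za-m'; }

# Hypothetical "page transformer 2": gzip stands in for compression.
transformer2_write() { gzip -c; }
transformer2_read()  { gzip -dc; }

# Writing: core -> transformer 1 -> transformer 2 -> writer (a file).
write_image() {   # $1 = image file; data arrives on stdin
    transformer1 | transformer2_write > "$1"
}

# Reading: the writer supplies the stored data and each transformer
# undoes its own step on the way back up to the core.
read_image() {    # $1 = image file; data leaves on stdout
    transformer2_read < "$1" | transformer1
}

# Round trip: write an "image", then read it back.
image=$(mktemp)
printf 'pageset data\n' | write_image "$image"
read_image "$image"        # prints: pageset data
rm -f "$image"
```

As in the real pipeline, each stage knows only its own transformation; adding
or removing a transformer does not change the writer.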
--
Nigel Cunningham		nigel at suspend2 dot net



From: Nigel Cunningham [email blocked]
Subject: [Suspend2][ 4/4] [Suspend2] User documentation
Date:	Tue, 27 Jun 2006 02:46:01 +1000

Add documentation/power/suspend2.txt, containing information for users on
using Suspend2.

Signed-off-by: Nigel Cunningham [email blocked]

 Documentation/power/suspend2.txt |  673 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 673 insertions(+), 0 deletions(-)

diff --git a/Documentation/power/suspend2.txt b/Documentation/power/suspend2.txt
new file mode 100644
index 0000000..54b18a9
--- /dev/null
+++ b/Documentation/power/suspend2.txt
@@ -0,0 +1,673 @@
+	--- Suspend2, version 2.2 ---
+
+1.  What is it?
+2.  Why would you want it?
+3.  What do you need to use it?
+4.  Why not just use the version already in the kernel?
+5.  How do you use it?
+6.  What do all those entries in /proc/suspend2 do?
+7.  How do you get support?
+8.  I think I've found a bug. What should I do?
+9.  When will XXX be supported?
+10. How does it work?
+11. Who wrote Suspend2?
+
+1. What is it?
+
+   Imagine you're sitting at your computer, working away. For some reason, you
+   need to turn off your computer for a while - perhaps it's time to go home
+   for the day. When you come back to your computer next, you're going to want
+   to carry on where you left off. Now imagine that you could push a button and
+   have your computer store the contents of its memory to disk and power down.
+   Then, when you next start up your computer, it loads that image back into
+   memory and you can carry on from where you were, just as if you'd never
+   turned the computer off. Far less time to start up, no reopening
+   applications and finding what directory you put that file in yesterday.
+   That's what Suspend2 does.
+
+   Suspend2 has a long heritage. It began life as work by Gabor Kuti, who,
+   with some help from Pavel Machek, got an early version going in 1999. The
+   project was then taken over by Florent Chabaud while still in alpha version
+   numbers. Nigel Cunningham came on the scene when Florent was unable to
+   continue, moving the project into betas, then 1.0, 2.0 and so on up to
+   the present 2.2 series. Pavel Machek's swsusp code, which was merged around
+   2.5.17, retains the original name, and was essentially a fork of the beta
+   code until Rafael Wysocki came on the scene in 2005 and began to improve it
+   further.
+
+2. Why would you want it?
+
+   Why wouldn't you want it?
+   
+   Being able to save the state of your system and quickly restore it improves
+   your productivity - you get a useful system in far less time than through
+   the normal boot process.
+   
+3. What do you need to use it?
+
+   a. Kernel Support.
+
+   i) The Suspend2 patch.
+   
+   Suspend2 is part of the Linux Kernel. This version is not part of Linus's
+   2.6 tree at the moment, so you will need to download the kernel source and
+   apply the latest patch. Having done that, enable the appropriate options in
+   make [menu|x]config (under General Setup), compile and install your kernel.
+   Suspend2 works with SMP, Highmem, preemption, x86-32, PPC and x86_64.
+
+   Suspend2 patches are available from http://suspend2.net.
+
+   ii) Compression and encryption support.
+
+   Compression and encryption support are implemented via the
+   cryptoapi. You will therefore want to select any Cryptoapi transforms that
+   you want to use on your image from the Cryptoapi menu while configuring
+   your kernel.
+
+   You can also tell Suspend2 to write its image to an encrypted and/or
+   compressed filesystem/swap partition. In that case, you don't need to do
+   anything special for Suspend2 when it comes to kernel configuration.
+
+   iii) Configuring other options.
+
+   While you're configuring your kernel, try to configure as much as possible
+   to build as modules. We recommend this because there are a number of drivers
+   that are still in the process of implementing proper power management
+   support. In those cases, the best way to work around their current lack is
+   to build them as modules and remove the modules while suspending. You might
+   also bug the driver authors to get their support up to speed, or even help!
+
+   b. Storage.
+
+   i) Swap.
+
+   Suspend2 can store the suspend image in your swap partition, a swap file or
+   a combination thereof. Whichever combination you choose, you will probably
+   want to create enough swap space to store the largest image you could have,
+   plus the space you'd normally use for swap. A good rule of thumb would be
+   to calculate the amount of swap you'd want without using Suspend2, and then
+   add the amount of memory you have. This swapspace can be arranged in any way
+   you'd like. It can be in one partition or file, or spread over a number. The
+   only requirement is that they be active when you start a suspend cycle.
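+
+   As a worked example (the figures are hypothetical): a machine with 1024MB
+   of memory on which you would normally want 512MB of swap would get
+
+   echo $((512 + 1024))
+
+   that is, 1536MB of swap in total.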
+   
+   There is one exception to this requirement. Suspend2 has the ability to turn
+   on one swap file or partition at the start of suspending and turn it back off
+   at the end. If you want to ensure you have enough memory to store an image
+   when your memory is fully used, you might want to make one swap partition or
+   file for 'normal' use, and another for Suspend2 to activate & deactivate
+   automatically. (Further details below).
+
+   ii) Normal files.
+
+   Suspend2 includes a 'filewriter'. The filewriter can store
+   your image in a simple file. Since Linux has the idea of everything being
+   a file, this is more powerful than it initially sounds. If, for example,
+   you were to set up a network block device file, you could suspend to a
+   network server. This has been tested and works to a point, but nbd itself
+   isn't stateless enough for our purposes.
+
+   Take extra care when setting up the filewriter. If you just type commands
+   without thinking and then try to suspend, you could cause irreversible
+   corruption on your filesystems! Make sure you have backups. Also, because
+   the filewriter is comparatively new, it's not as well tested as the
+   swapwriter. Be aware that there may be bugs that could cause damage to your
+   data even if you are careful! You have been warned!
+
+   Most people will only want to suspend to a local file. To achieve that, do
+   something along the lines of:
+
+   echo "Suspend2" > /suspend-file
+   dd if=/dev/zero bs=1M count=512 >> /suspend-file
+
+   This will create a 512MB file called /suspend-file. To get Suspend2 to use
+   it:
+
+   echo /suspend-file > /proc/suspend2/filewriter_target
+
+   Then
+
+   cat /proc/suspend2/resume2
+
+   Put the results of this into your bootloader's configuration (see also step
+   c, below):
+
+   ---EXAMPLE-ONLY-DON'T-COPY-AND-PASTE---
+   # cat /proc/suspend2/resume2
+   file:/dev/hda2:0x1e001
+   
+   In this example, we would edit the append= line of our lilo.conf|menu.lst
+   so that it included:
+
+   resume2=file:/dev/hda2:0x1e001
+   ---EXAMPLE-ONLY-DON'T-COPY-AND-PASTE---
+ 
+   For those who are thinking 'Could I make the file sparse?', the answer is
+   'No!'. At the moment, there is no way for Suspend2 to fill in the holes in
+   a sparse file while suspending. In the longer term (post merge!), I'd like
+   to change things so that the file could be dynamically resized as needed.
+   Right now, however, that's not possible.
+
+   c. Bootloader configuration.
+   
+   Using Suspend2 also requires that you add an extra parameter to 
+   your lilo.conf or equivalent. Here's an example for a swap partition:
+
+   append="resume2=swap:/dev/hda1"
+
+   This would tell Suspend2 that /dev/hda1 is a swap partition you 
+   have. Suspend2 will use the swap signature of this partition as a
+   pointer to your data when you suspend. This means that (in this example)
+   /dev/hda1 doesn't need to be _the_ swap partition where all of your data
+   is actually stored. It just needs to be a swap partition that has a
+   valid signature.
+
+   You don't need to have a swap partition for this purpose. Suspend2
+   can also use a swap file, but usage is a little more complex. Having made
+   your swap file, turn it on and do 
+
+   cat /proc/suspend2/headerlocations
+
+   (this assumes you've already compiled your kernel with Suspend2
+   support and booted it). The results of the cat command will tell you
+   what you need to put in lilo.conf:
+
+   For swap partitions like /dev/hda1, simply use resume2=/dev/hda1.
+   For swapfile `swapfile`, use resume2=swap:/dev/hda2:0x242d@4096.
+
+   If the swapfile changes for any reason (it is moved to a different
+   location, it is deleted and recreated, or the filesystem is
+   defragmented) then you will have to check
+   /proc/suspend2/headerlocations for a new resume_block value.
+
+   Once you've compiled and installed the kernel, adjusted your lilo.conf
+   and rerun lilo, you should only need to reboot for the most basic part
+   of Suspend2 to be ready.
+
+   If you only compile in the swapwriter, or only compile in the filewriter,
+   you don't need to add the "swap:" part of the resume2= parameters above.
+   resume2=/dev/hda2:0x242d@4096 will work just as well.
+
+   d. The hibernate script.
+
+   Since the driver model in 2.6 kernels is still being developed, you may need
+   to do more, however. Users of Suspend2 usually start the process via a script
+   which prepares for the suspend, tells the kernel to do its stuff and then
+   restores things afterwards. This script might involve:
+
+   - Switching to a text console and back if X doesn't like the video card
+     status on resume.
+   - Un/reloading PCMCIA support since it doesn't play well with suspend.
+  
+   Note that you might not be able to unload some drivers if there are 
+   processes using them. You might have to kill off processes that hold
+   devices open. Hint: if your X server accesses a USB mouse, doing a
+   'chvt' to a text console releases the device and you can unload the
+   module.
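+
+   A much-simplified sketch of such a script follows. It is an illustration
+   only, not the real hibernate script: the module name is hypothetical, and
+   it assumes X runs on virtual terminal 7.
+
+   chvt 1                             # leave X so it releases its devices
+   modprobe -r some_driver            # hypothetical module without PM support
+   echo > /proc/suspend2/do_suspend   # returns after resume (or an abort)
+   modprobe some_driver               # reload the driver afterwards
+   chvt 7                             # switch back to X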
+
+   Check out the latest script (available on suspend2.net).
+   
+4. Why not just use the version already in the kernel?
+
+   The version in the vanilla kernel has a number of drawbacks. Among these:
+	- it has a maximum image size of 1/2 total memory.
+	- it doesn't allocate storage until after it has snapshotted memory.
+	  This means that you can't be sure suspending will work until you
+	  see it start to write the image.
+	- it performs all of its I/O synchronously.
+	- it does not allow you to press escape to cancel a cycle.
+	- it does not allow you to automatically swapon a file when
+	  starting a cycle.
+	- it does not allow you to use multiple swap partitions.
+	- it does not allow you to use swapfiles.
+	- it does not allow you to use ordinary files.
+	- it just invalidates an image and continues to boot if you
+	  accidentally boot the wrong kernel after suspending.
+	- it doesn't support any sort of nice display while suspending.
+	- it is moving toward requiring that you have an initrd/initramfs
+	  to ever have a hope of resuming.
+
+5. How do you use it?
+
+   Once your script is properly set up, you should just be able to start it
+   and everything should go like clockwork. Of course things aren't always
+   that easy out of the box.
+
+   Check out (in the kernel source tree) include/linux/suspend2.h for
+   settings you can use to get detailed information about what suspend is doing.
+   The kernel parameters suspend_act, suspend_dbg and suspend_lvl allow you to
+   set the action and debugging parameters prior to starting a suspend and/or
+   at the lilo prompt before resuming. There is also a nice little program that
+   should be available from suspend2.net which makes it easier to turn these
+   debugging settings on and off. Note that to get any debugging output, you
+   need to enable CONFIG_PM_DEBUG when compiling the kernel.
+
+   A neat feature of Suspend2 is that you can press Escape at any time
+   during suspending, and the process will be aborted.
+   
+   Due to the way suspend works, this means you'll have your system back and
+   perfectly usable almost instantly. The only exception is when it's at
+   the very end of writing the image. Then it will need to reload a small
+   (usually 4-50MB, depending upon the image characteristics) portion first.
+
+   If you run into problems with resuming, adding the "noresume2" option to
+   the kernel command line will let you skip the resume step and recover your
+   system.
+
+6. What do all those entries in /proc/suspend2 do?
+
+   /proc/suspend2 is the directory which contains files you can use to
+   tune and configure Suspend2 to your liking. The exact contents of
+   the directory will depend upon the version of Suspend2 you're
+   running and the options you selected at compile time. In the following
+   descriptions, names in brackets refer to compile time options.
+   (Note that they're all dependent upon you having selected CONFIG_SUSPEND2
+   in the first place!)
+
+   Since the values of these settings can open potential security risks, they
+   are usually accessible only to the root user. You can, however, enable a
+   compile time option which makes all of these files world-accessible. This
+   should only be done if you trust everyone with shell access to this
+   computer!
+  
+   - debug_info:
+  
+   This file returns information about your configuration that may be helpful
+   in diagnosing problems with suspending.
+
+   - debug_sections (CONFIG_PM_DEBUG):
+
+   This value, together with the console log level, controls what debugging
+   information is displayed. The console log level determines the level of
+   detail, and this value determines what detail is displayed. This value is
+   a bit vector, and the meaning of the bits can be found in the kernel tree
+   in include/linux/suspend2.h. It can be overridden using the kernel's
+   command line option suspend_dbg.
+
+   - default_console_level (CONFIG_PM_DEBUG):
+
+   This determines the value of the console log level at the start of a
+   suspend cycle. If debugging is compiled in, the console log level can be
+   changed during a cycle by pressing the digit keys. Meanings are:
+
+   0: Nice display.
+   1: Nice display plus numerical progress.
+   2: Errors only.
+   3: Low level debugging info.
+   4: Medium level debugging info.
+   5: High level debugging info.
+   6: Verbose debugging info.
+
+   This value can be overridden using the kernel command line option 
+   suspend_lvl.
+
+   - disable_*:
+
+   This option can be used to temporarily disable various parts of suspend.
+   Note that these flags can be set by restoring all_settings: If the saved
+   settings don't include any information about how a part of suspend should
+   be configured, that section will be disabled.
+
+   - do_resume:
+
+   When anything is written to this file suspend will attempt to read and
+   restore an image. If there is no image, it will return almost immediately.
+   If an image exists, the echo > will never return. Instead, the original
+   kernel context will be restored and the original echo > do_suspend will
+   return.
+
+   - do_suspend:
+
+   When anything is written to this file, the kernel side of Suspend2 will
+   begin to attempt to write an image to disk and power down. You'll normally
+   want to run the hibernate script instead, to get modules unloaded first.
+
+   - enable_escape:
+
+   Setting this to "1" will enable you to abort a suspend by
+   pressing escape; "0" (default) disables this feature. Note that enabling
+   this option means that you cannot initiate a suspend and then walk away
+   from your computer, expecting it to be secure. With this feature disabled,
+   you can validly have this expectation once Suspend begins to write the
+   image to disk. (Prior to this point, it is possible that Suspend might
+   abort because of failure to freeze all processes or because constraints
+   on its ability to save the image are not met).
+
+   - expected_compression:
+
+   This value allows you to set an expected compression ratio, which Suspend2
+   will use in calculating whether it meets constraints on the image
+   size. If this expected compression ratio is not attained, the suspend will
+   abort, so it is wise to allow some spare. You can see what compression
+   ratio is achieved in the logs after suspending.
+
+   - filewriter_target:
+
+   Read this value to get the current setting. Write to it to point Suspend
+   at a new storage location for the filewriter. See above for details of how
+   to set up the filewriter.
+
+   - headerlocations:
+
+   This option tells you the resume2= options to use for swap devices you
+   currently have activated. It is particularly useful when you only want to
+   use a swap file to store your image. See above for further details.
+
+   - image_exists:
+
+   Can be used in a script to determine whether a valid image exists at the
+   location currently pointed to by resume2=. Returns up to three lines.
+   The first is whether an image exists (-1 for unsure, otherwise 0 or 1).
+   If an image exists, additional lines will return the machine and version.
+   Echoing anything to this entry removes any current image.
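+
+   For example, a script might test the first line like this (a sketch that
+   assumes the first line holds just the number; the messages are
+   illustrative):
+
+   status=$(head -n 1 /proc/suspend2/image_exists)
+   case "$status" in
+       1) echo "An image exists." ;;
+       0) echo "No image exists." ;;
+       *) echo "Unsure whether an image exists." ;;
+   esac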
+
+   - image_size_limit:
+
+   The maximum size of suspend image written to disk, measured in megabytes
+   (1024*1024).
+
+   - interface_version:
+
+   The value returned by this file can be used by scripts and configuration
+   tools to determine what entries should be looked for. The value is
+   incremented whenever an entry in /proc/suspend2 is obsoleted or 
+   added.
+
+   - last_result:
+
+   The result of the last suspend, as defined in
+   include/linux/suspend-debug.h with the values SUSPEND_ABORTED to
+   SUSPEND_KEPT_IMAGE. This is a bitmask.
+
+   - log_everything (CONFIG_PM_DEBUG):
+
+   Setting this option results in all messages printed being logged. Normally,
+   only a subset are logged, so as to not slow the process and not clutter the
+   logs. Useful for debugging. It can be toggled during a cycle by pressing
+   'L'.
+
+   - pause_between_steps (CONFIG_PM_DEBUG):
+
+   This option is used during debugging, to make Suspend2 pause between
+   each step of the process. It is ignored when the nice display is on.
+
+   - powerdown_method:
+
+   Used to select a method by which Suspend2 should powerdown after writing the
+   image. Currently:
+
+   0: Don't use ACPI to power off.
+   3: Attempt to enter Suspend-to-ram.
+   4: Attempt to enter ACPI S4 mode.
+   5: Attempt to power down via ACPI S5 mode.
+
+   Note that these options are highly dependent upon your hardware & software:
+
+   3: When successful, your machine suspends-to-ram instead of powering off.
+      The advantage of using this mode is that it doesn't matter whether your
+      battery has enough charge to make it through to your next resume. If it
+      lasts, you will simply resume from suspend to ram (and the image on disk
+      will be discarded). If the battery runs out, you will resume from disk
+      instead. The disadvantage is that it takes longer than a normal
+      suspend-to-ram to enter the state, since the suspend-to-disk image needs
+      to be written first.
+   4/5: When successful, your machine will be off and consume (almost) no power.
+      But it might still react to some external events like opening the lid or
+      traffic on a network or usb device. For the bios, resume is then the same
+      as warm boot, similar to a situation where you used the command `reboot'
+      to reboot your machine. If your machine has problems on warm boot or if
+      you want to protect your machine with the bios password, this is probably
+      not the right choice. Mode 4 may be necessary on some machines where ACPI
+      wake up methods need to be run to properly reinitialise hardware after a
+      suspend-to-disk cycle.  
+   0: Switch the machine completely off. The only possible wakeup is the power
+      button. For the bios, resume is then the same as a cold boot, in
+      particular you would have to provide your bios boot password if your
+      machine uses that feature for booting.
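+
+   For example, to ask for suspend-to-ram once the image has been written:
+
+   echo 3 > /proc/suspend2/powerdown_method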
+
+   - progressbar_granularity_limit:
+
+   This option can be used to limit the granularity of the progress bar
+   displayed with a bootsplash screen. The value is the maximum number of
+   steps. That is, 10 will make the progress bar jump in 10% increments.
+
+   - reboot:
+
+   This option causes Suspend2 to reboot rather than powering down
+   at the end of saving an image. It can be toggled during a cycle by pressing
+   'R'.
+
+   - resume_commandline:
+
+   This entry can be read after resuming to see the commandline that was used
+   when resuming began. You might use this to set up two bootloader entries
+   that are the same apart from the fact that one includes an extra append=
+   argument "at_work=1". You could then grep resume_commandline in your
+   post-resume scripts and configure networking (for example) differently
+   depending upon whether you're at home or work. resume_commandline can be
+   set to arbitrary text if you wish to remove sensitive contents.
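+
+   A post-resume script could do this roughly as follows (a sketch; the echo
+   commands stand in for whatever network setup you actually need):
+
+   if grep -q "at_work=1" /proc/suspend2/resume_commandline; then
+       echo "Configuring networking for work"
+   else
+       echo "Configuring networking for home"
+   fi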
+
+   - swapfile:
+
+   This entry is used to specify the swapfile or partition that
+   Suspend2 will attempt to swapon/swapoff automatically. Thus, if
+   I normally use /dev/hda1 for swap, and want to use /dev/hda2 specifically
+   for my suspend image, I would
+  
+   echo /dev/hda2 > /proc/suspend2/swapfile
+
+   /dev/hda2 would then be automatically swapon'd and swapoff'd. Note that the
+   swapon and swapoff occur while other processes are frozen (including kswapd)
+   so this swap file will not be used up when attempting to free memory. The
+   partition/file is also given the highest priority, so other swapfiles/partitions
+   will only be used to save the image when this one is filled.
+
+   The value of this file is used by headerlocations along with any currently
+   activated swapfiles/partitions.
+
+   - toggle_process_nofreeze:
+
+   This entry can be used to toggle the NOFREEZE flag on a process, to allow it
+   to run during Suspending. It should be used with extreme caution. There are
+   strict limitations on what a process running during suspend can do. This is
+   really only intended for use by Suspend's helpers (userui in particular).
+
+   - userui_program:
+
+   This entry is used to tell Suspend what userspace program to use for
+   providing a user interface while suspending. The program uses a netlink
+   socket to pass messages to and from the kernel, allowing all of the
+   functions formerly implemented in the kernel user interface components.
+
+   - version:
+  
+   The version of suspend you have compiled into the currently running kernel.
+
+7. How do you get support?
+
+   Glad you asked. Suspend2 is being actively maintained and supported
+   by Nigel (the guy doing most of the kernel coding at the moment), Bernard
+   (who maintains the hibernate script and userspace user interface components)
+   and its users.
+
+   Resources available include HowTos, FAQs and a Wiki, all available via
+   suspend2.net.  You can find the mailing lists there.
+
+8. I think I've found a bug. What should I do?
+
+   By far and away, the most common problems people have with suspend2 are
+   related to drivers not having adequate power management support. In this
+   case, it is not a bug with suspend2, but we can still help you. As we
+   mentioned above, such issues can usually be worked around by building the
+   functionality as modules and unloading them while suspending. Please visit
+   the Wiki for up-to-date lists of known issues and workarounds.
+
+   If this information doesn't help, try running:
+
+   hibernate --bug-report
+
+   ..and sending the output to the users mailing list.
+
+   Good information on how to provide us with useful information from an
+   oops is found in the file REPORTING-BUGS, in the top level directory
+   of the kernel tree. If you get an oops, please especially note the
+   information about running what is printed on the screen through ksymoops.
+   The raw information is useless.
+
+9. When will XXX be supported?
+
+   If there's a feature missing from Suspend2 that you'd like, feel free to
+   ask. We try to be obliging, within reason.
+
+   Patches are welcome. Please send to the list.
+
+10. How does it work?
+
+   Suspend2 does its work in a number of steps.
+
+   a. Freezing system activity.
+
+   The first main stage in suspending is to stop all other activity. This is
+   achieved in stages. Processes are considered in four groups, which we will
+   describe in reverse order for clarity's sake: Threads with the PF_NOFREEZE
+   flag, kernel threads without this flag, userspace processes with the
+   PF_SYNCTHREAD flag and all other processes. The first set (PF_NOFREEZE) are
+   untouched by the refrigerator code. They are allowed to run during suspending
+   and resuming, and are used to support user interaction, storage access or the
+   like. Other kernel threads (those unneeded while suspending) are frozen last.
+   This leaves us with userspace processes that need to be frozen. When a
+   process enters one of the *_sync system calls, we set a PF_SYNCTHREAD flag on
+   that process for the duration of that call. Processes that have this flag are
+   frozen after processes without it, so that we can seek to ensure that dirty
+   data is synced to disk as quickly as possible in a situation where other
+   processes may be submitting writes at the same time. Freezing the processes
+   that are submitting data stops new I/O from being submitted. Syncthreads can
+   then cleanly finish their work. So the order is:
+
+   - Userspace processes without PF_SYNCTHREAD or PF_NOFREEZE;
+   - Userspace processes with PF_SYNCTHREAD (they won't have NOFREEZE);
+   - Kernel processes without PF_NOFREEZE.
+
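As a rough illustration of this ordering (a sketch, not code from the patch), the three freezing phases can be modelled in a few lines of Python. Only the flag names PF_NOFREEZE and PF_SYNCTHREAD come from the text; the `Task` class, the task names and the numeric flag values are assumptions made up for the example.

```python
from dataclasses import dataclass

# Flag names come from the text; the numeric values are arbitrary placeholders.
PF_NOFREEZE = 0x1
PF_SYNCTHREAD = 0x2


@dataclass
class Task:
    name: str
    is_kernel_thread: bool
    flags: int = 0


def freeze_phases(tasks):
    """Group task names into the three freezing phases, in freezing order.

    Tasks carrying PF_NOFREEZE are left out entirely: they keep running.
    """
    plain_user, sync_user, kernel = [], [], []
    for task in tasks:
        if task.flags & PF_NOFREEZE:
            continue  # untouched by the refrigerator
        if task.is_kernel_thread:
            kernel.append(task.name)
        elif task.flags & PF_SYNCTHREAD:
            sync_user.append(task.name)
        else:
            plain_user.append(task.name)
    return [plain_user, sync_user, kernel]


# Hypothetical task list, for illustration only.
tasks = [
    Task("kjournald", True),
    Task("suspend_ui", False, PF_NOFREEZE),
    Task("sync", False, PF_SYNCTHREAD),
    Task("firefox", False),
    Task("kblockd", True, PF_NOFREEZE),
]
phases = freeze_phases(tasks)
```

Here `phases[0]` holds the ordinary userspace processes, `phases[1]` the syncing ones and `phases[2]` the freezable kernel threads; the two PF_NOFREEZE tasks appear in none of them.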
+   b. Eating memory.
+
+   For a successful suspend, you need to have enough disk space to store the
+   image and enough memory for the various limitations of Suspend2's
+   algorithm. You can also specify a maximum image size. In order to meet
+   those constraints, Suspend2 may 'eat' memory. If, after freezing
+   processes, the constraints aren't met, Suspend2 will thaw all the
+   other processes and begin to eat memory until its calculations indicate
+   the constraints are met. It will then freeze processes again and recheck
+   its calculations.
+
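The freeze, check, thaw, eat, refreeze cycle described here might be sketched as below. This is a toy model under stated assumptions, not Suspend2's logic: `FakeSystem`, its page counts, the `step` size and the single `limit` standing in for all constraints are invented for the example.

```python
class FakeSystem:
    """Stand-in for the machine; sizes are in pages and purely illustrative."""

    def __init__(self, image_pages):
        self.image_pages = image_pages
        self.frozen = False
        self.log = []

    def freeze(self):
        self.frozen = True
        self.log.append("freeze")

    def thaw(self):
        self.frozen = False
        self.log.append("thaw")

    def eat(self, pages):
        # Models the text: memory is eaten after processes have been thawed.
        assert not self.frozen
        self.image_pages -= pages
        self.log.append("eat")


def shrink_to_fit(system, limit, step=50):
    """Freeze, check, and eat memory until the image fits within `limit`."""
    system.freeze()
    while system.image_pages > limit:
        system.thaw()
        while system.image_pages > limit:
            system.eat(min(step, system.image_pages - limit))
        system.freeze()  # loop back and recheck with processes frozen
    return system.image_pages


system = FakeSystem(image_pages=1120)
final = shrink_to_fit(system, limit=1000)
```

The outer loop matters because the recheck happens only after processes are frozen again; in this toy model nothing changes while thawed, so one pass is enough.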
+   c. Allocation of storage.
+
+   Next, Suspend2 allocates the storage that will be used to save
+   the image.
+
+   The core of Suspend2 knows nothing about how or where pages are stored. We
+   therefore request the active writer (remember you might have compiled in
+   more than one!) to allocate enough storage for our expected image size. If
+   this request cannot be fulfilled, we eat more memory and try again. If it
+   is fulfilled, we seek to allocate additional storage, just in case our
+   expected compression ratio (if any) isn't achieved. This time, however, we
+   just continue if we can't allocate enough storage.
+
+   If these calls to our writer change the characteristics of the image such
+   that we haven't allocated enough memory, we also loop. (The writer may well
+   need to allocate space for its storage information).
+
+   d. Write the first part of the image.
+
+   Suspend2 stores the image in two sets of pages called 'pagesets'.
+   Pageset 2 contains pages on the active and inactive lists; essentially
+   the page cache. Pageset 1 contains all other pages, including the kernel.
+   We use two pagesets for one important reason: We need to make an atomic copy
+   of the kernel to ensure consistency of the image. Without a second pageset,
+   that would limit us to an image that was at most half the amount of memory
+   available. Using two pagesets allows us to store a full image. Since pageset
+   2 pages won't be needed in saving pageset 1, we first save pageset 2 pages.
+   We can then make our atomic copy of the remaining pages using both pageset 2
+   pages and any other pages that are free. While saving both pagesets, we are
+   careful not to corrupt the image. Among other things, we use lowlevel block
+   I/O routines that don't change the pagecache contents.
+
+   The next step, then, is writing pageset 2.
+
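A small worked example may help show why saving pageset 2 first makes a full image possible. The numbers and function names below are made up for illustration; the only idea taken from the text is that, once pageset 2 is on disk, its pages can be reused alongside free pages to hold the atomic copy of pageset 1.

```python
def room_for_atomic_copy(pageset2_pages, free_pages, pageset2_saved):
    """Pages available to hold the atomic copy of pageset 1."""
    # Once pageset 2 is safely on disk, its pages may be overwritten.
    return free_pages + (pageset2_pages if pageset2_saved else 0)


def can_copy_atomically(pageset1_pages, pageset2_pages, free_pages, pageset2_saved):
    """True if the atomic copy of pageset 1 fits in the available room."""
    room = room_for_atomic_copy(pageset2_pages, free_pages, pageset2_saved)
    return pageset1_pages <= room


# A made-up machine with 1000 pages of memory in total.
pageset1, pageset2, free = 300, 600, 100

# One pageset: all 900 used pages must be copied into the 100 free pages.
one_pass = can_copy_atomically(pageset1 + pageset2, 0, free, False)

# Two pagesets: pageset 2 is written first, then its pages are reused.
two_pass = can_copy_atomically(pageset1, pageset2, free, True)
```

With a single pageset the copy only fits when at most half of memory is in use, which is the limit the text describes; with two pagesets the same machine can be saved in full.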
+   e. Suspending drivers and storing processor context.
+
+   Having written pageset2, Suspend2 calls the power management functions to
+   notify drivers of the suspend, and saves the processor state in preparation
+   for the atomic copy of memory we are about to make.
+
+   f. Atomic copy.
+
+   At this stage, everything else but the Suspend2 code is halted. Processes
+   are frozen or idling, drivers are quiesced and have stored (ideally and where
+   necessary) their configuration in memory we are about to atomically copy.
+   In our lowlevel architecture specific code, we have saved the CPU state.
+   We can therefore now do our atomic copy before resuming drivers etc.
+
+   g. Save the atomic copy (pageset 1).
+
+   Suspend2 can then write the atomic copy of the remaining pages. Since we
+   have copied the pages into other locations, we can continue to use the
+   normal block I/O routines without fear of corrupting our image.
+
+   h. Save the suspend header.
+
+   Nearly there! We save our settings and other parameters needed for
+   reloading pageset 1 in a 'suspend header'. We also tell our writer to
+   serialise its data at this stage, so that it can reread the image at resume
+   time. Note that the writer can write this data in any format - in the case
+   of the swapwriter, for example, it splits header pages in 4092 byte blocks,
+   using the last four bytes to link pages of data together. This is completely
+   transparent to the core.
+
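The swapwriter layout mentioned above (4092 bytes of data per page, with the last four bytes linking to the next page) could look roughly like the sketch below. The text does not say how the link is encoded, so several details are assumptions: a 4096-byte page, a little-endian 32-bit link holding the next page's location, and the `END_OF_CHAIN` sentinel. The function names are hypothetical too.

```python
import struct

PAGE_SIZE = 4096                       # assumed page size
LINK_SIZE = 4
PAYLOAD_SIZE = PAGE_SIZE - LINK_SIZE   # 4092 bytes, as in the text
END_OF_CHAIN = 0xFFFFFFFF              # hypothetical terminator value


def pack_header(data, locations):
    """Split `data` into linked pages stored at the given page `locations`.

    Returns a dict mapping location -> 4096-byte page.
    """
    chunks = [data[i:i + PAYLOAD_SIZE]
              for i in range(0, len(data), PAYLOAD_SIZE)] or [b""]
    if len(chunks) > len(locations):
        raise ValueError("not enough pages allocated for the header")
    pages = {}
    for i, chunk in enumerate(chunks):
        next_loc = locations[i + 1] if i + 1 < len(chunks) else END_OF_CHAIN
        payload = chunk.ljust(PAYLOAD_SIZE, b"\0")   # pad the last page
        pages[locations[i]] = payload + struct.pack("<I", next_loc)
    return pages


def unpack_header(pages, first, length):
    """Follow the links from `first` and return the original `length` bytes."""
    out = bytearray()
    loc = first
    while loc != END_OF_CHAIN:
        page = pages[loc]
        out += page[:PAYLOAD_SIZE]
        (loc,) = struct.unpack("<I", page[PAYLOAD_SIZE:])
    return bytes(out[:length])


header = bytes(range(256)) * 40        # 10240 bytes, needs three pages
pages = pack_header(header, locations=[17, 5, 42, 8])
restored = unpack_header(pages, first=17, length=len(header))
```

Because each page names its successor, the pages need not be contiguous on disk, and the core only ever sees the reassembled byte stream.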
+   i. Set the image header.
+
+   Finally, we edit the header at our resume2= location. The signature is
+   changed by the writer to reflect the fact that an image exists, and to point
+   to the start of that data if necessary (swapwriter).
+
+   j. Power down.
+
+   Or reboot if we're debugging and the appropriate option is selected.
+
+   Whew!
+
+   Reloading the image.
+   --------------------
+
+   Reloading the image is essentially the reverse of all the above. We load
+   our copy of pageset 1, being careful to choose locations that aren't going
+   to be overwritten as we copy it back (We start very early in the boot
+   process, so there are no other processes to quiesce here). We then copy
+   pageset 1 back to its original location in memory and restore the process
+   context. We are now running with the original kernel. Next, we reload the
+   pageset 2 pages, free the memory and swap used by Suspend2, restore
+   the pageset header and restart processes. Sounds easy in comparison to
+   suspending, doesn't it!
+
+   There is of course more to Suspend2 than this, but this explanation
+   should be a good start. If there's interest, I'll write further
+   documentation on range pages and the low level I/O.
+
+11. Who wrote Suspend2?
+
+   (Answer based on the writings of Florent Chabaud, credits in files and
+   Nigel's limited knowledge; apologies to anyone missed out!)
+
+   The main developers of Suspend2 have been...
+
+   Gabor Kuti
+   Pavel Machek
+   Florent Chabaud
+   Bernard Blackham
+   Nigel Cunningham
+
+   They have been aided in their efforts by a host of hundreds, if not thousands
+   of testers and people who have submitted bug fixes & suggestions. Of special
+   note are the efforts of Michael Frank, who had his computers repetitively
+   suspend and resume for literally tens of thousands of cycles and developed
+   scripts to stress the system and test Suspend2 far beyond the point
+   most of us (Nigel included!) would consider testing. His efforts have
+   contributed as much to Suspend2 as any of the names above.

--
Nigel Cunningham		nigel at suspend2 dot net

Related Links:

Archive of above thread

When is this joke going to end?

June 27, 2006 - 9:28am

Henk Poley

On one side we have a working hibernate implementation, outside of the main kernel development. On the other side we have one already in-kernel that supports less systems, and less architectures (AFAIK), and is in the position of a full rewrite.

Which one are we going to choose..?

*sigh*

Agreed, best jokes are short ones

June 27, 2006 - 1:08pm

renoX

*Sigh too* sometimes working in an enterprise has its advantages: the manager would lock Nigel Cunningham and Greg KH together in a room until they finally sorted out the mess.

FOSS development is usually slow, but this is glacial..

Even worse:

June 27, 2006 - 1:59pm

Anonymous (not verified)

We're talking about the -mm tree in this case, not the main line.

Um, we were "locked together

June 28, 2006 - 6:10am

Greg K-H (not verified)

Um, we were "locked together in a room" last year at the Linux power
management meeting, along with Pavel and the other kernel developers
in this area. And we all drilled into Nigel the importance of
working _with_ the kernel community, instead of ignoring it.

And this patch set is a direct result of that. Nigel is trying to
do the right thing here, and is slowly moving to understand the process
that the kernel is developed under (small incremental changes over time,
not large big changes all at once.)

Pavel's dog (in the manger)

June 29, 2006 - 3:35am

I see that Nigel *has* been the *only one* working with LKML to improve hibernate to a usable state. Pavel's dog, on the other hand, has managed to hold back progress on this for over 2 years now -- just sitting there poo-pooing things randomly, because *he* happens to have gotten there first with a vastly inferior implementation. Ugh. Makes me sick.

"small changes" = maintainer never changes?

June 30, 2006 - 11:05pm

Honestly, I can't see Pavel as anything but an obstruction in this project.

Look at where Nigel is listing the advantages of Suspend2. Pavel can only reply to these points with sheepish comments like "I had that at one point," "someone else could implement that," "should be able to, eventually."

What kind of lame comments are these? And who exactly would reject an obviously technically superior implementation for a half-baked one that may have worked at some point, for some small set of people?

I hope that the philosophy of "small changes at a time" isn't code for "whoever gets there first gets to be the maintainer, no matter how bad he is at it." Because that's a lot like what it looks like now.

Linux suspend is in a sorry state.

- just another dev

Yup, I agree with sorry-suspe

July 1, 2006 - 1:25am

Srinivas M (not verified)

Yup, I agree about the sorry state of suspend in Linux. I bought a Dell laptop in November 2003. I tried to fiddle with suspend, but found that it was not stable. I waited more than 2.5 years, and now I regularly use Windows on my laptop, as it gives me a hibernate facility. I still use Linux extensively in my office and on my home AMD machine, but not on the laptop. Yes, in the first year of my laptop days I religiously used only Linux on it. As I do not have a UPS (and we have occasional power cuts), and a better power-save/hibernate implementation is available in Windoze, I switched to Windoze for most jobs -- enough waiting for Linux's suspend solution. If Linux kernel developers want more people to test new features, they should not let end users wait years for some (not all) features.

So if Linux wants to go mainstream (whether desktop or laptop), a hibernation facility is one of the most important features.

I have been hearing about suspend/suspend2, and it seems we have enough bureaucracy in the Linux kernel development system to stifle it. This kind of bureaucracy has, over time, destroyed great companies and countries' economies around the world.

Over the last year, worldwide laptop sales (or at least their growth rate) have been rising faster than desktop sales.

Linux, as one of the major OSes in the world, is still debating the issue and waiting for some auspicious day/time for suspend :).

We (Linux admirers) still think that we will outshine Windoze :) in the desktop arena -- without all these facilities.

We? Whats your substantial co

June 28, 2006 - 10:11am

We? What's your substantial contribution to the problem? Did you help Nigel with his issues? A little bit more respect seems appropriate...

Why can't we have two of them

June 28, 2006 - 5:48pm

I do not understand the rationale behind letting only one of them into the kernel while the other stays out (or is not supported, or is supported only with extra difficulty).

Most open-source and commercial software follows Darwin's theory -- survival of the fittest. This case should be like that too.

So why can't we have an approach in the kernel that lets a variety of implementations of different strategies coexist and be chosen between? This exists for file systems (ext2, ext3, reiser, xfs, fat, etc.). Yes, it may be complicated for hibernate/SW-suspend, as they are more complicated than file systems.

IMHO, we should have a VFS-layer type of thingy (some interface) for the SW-suspend mechanism, rather than one hard-linked strategy available to the user. By default the mainline kernel may choose one, but users should have the option of choosing between them using some /etc, /sys or /proc files. That way end users can find the better implementation and give feedback to kernel developers, rather than kernel developers theorizing, forming coteries, arguing, procrastinating, etc. I say that because most of the time developers won't think from the end user's point of view. Software can only be successful if it finds acceptance with the end user (not the developer). Success with end users means lots of features, extensibility, ease of use, etc.

Stifling a better implementation or innovation with clauses like long-term maintenance, lots of theories, "we too can do it", red tape, or voting (not by end users, but by a few selected kernel developers) will discourage enthusiastic developers (or new wannabe kernel developers) from putting in any extra effort to add new features to the Linux kernel. After all, the Linux kernel is a place for innovative, bright developers to contribute great ideas that make it great.

But it seems that we are deviating from such ideals nowadays (for example, swsuspend, reiser4-fs, etc.) with some of the clauses mentioned above.

IMHO, let the different implementations fight on their own merits and win by survival of the fittest. Only in this way can we foster innovation and the urge to do things in a better way, and finally give end users more variety.

It is not my intention to start a flame war, but as a developer and end user, I have just voiced my opinion.

Agreed

June 29, 2006 - 6:48am

I have four different brands of laptops at home, and suspend2 is the only one that works on all of them. I find it frustrating that a broken implementation is being chosen over a working implementation because of what is essentially politics.

Thank goodness that Gentoo maintains a suspend2 kernel, and thank goodness that good developers like Nigel don't get discouraged by the process and continue struggling to provide us with quality software despite the kernel maintainers.

saving running process

June 28, 2006 - 12:10am

Nicolas Boulay (not verified)

This area seems very complex. But the heart of the problem is to capture the entire memory of a process and serialise it, so that you can later deserialise it and make it run again. That also implies capturing structures from the kernel (VM state, I/O state, etc.).

If only this could be made available in kernel space, suspend to disk could be done, and so could a lot of high-availability features (checkpoint and rerun on another node, process save and restore, ...).

If only this could be maid av

June 28, 2006 - 2:00am

XEN or Vmware/VMotion anyone ;-)

checkpoint does (partially) exist

June 28, 2006 - 4:44am

Hello,

There exists software related to openMosix to checkpoint programs and start them up again.

Maybe the following link can help: http://howto.x-tend.be/openMosix-Summer2004/x110.html

Greetings,

Michel

software suspend deux

June 29, 2006 - 8:53pm

spstarr (not verified)

Let's not remove the current implementation until #2 works. I have a Thinkpad T42 and suspend to ram/disk works.. wonderfully. let's not break it yet.

Suspend2 "Just Works"

July 7, 2006 - 8:54am

APz

I was never too lucky with the original suspend but then again Suspend2 has always 'just worked' making it the first thing to try on a new installation.

Being a fan of FBsplash, suspend2 also gives the tools to make it look good besides working great.

I am missing a democratic app

November 19, 2006 - 8:24am

I am missing a democratic approach in Linux? If we would let the users vote on this issue the vote would clearly go to suspend2! Just ask them!

Users know what works for them and what does not! Suspend2 used to work for me from the early days on, when I could not get swsusp to work at all on any of my systems, which were numerous.

I believe there should be something like a user-referendum in Linux. If you find a certain number of supporters kernel.org would start a poll and consider the result as the way to go.

In cases like suspend2, which are of such importance to the users, an OSDL developer should be assigned to assist the developer of the project in question in getting his work ready for incorporation into the mainline kernel.

This developer then could help him to get his sysfs hooks and such.

I hereby strongly encourage Nigel to proceed with his efforts to get suspend2 into the kernel! Nigel, you are doing a great job. The userbase loves your work. We are confident you will master the few obstacles that are still to overcome until your work will be accepted into the -mm and later into the main line kernel!

Thank you!
