Prevent zram LRU inversion with zswap and max_pool_percent = 100

The major disadvantage of using zram is LRU inversion:




    older pages get into the higher-priority zram and quickly fill it, while newer pages are swapped in and out of the slower [...] swap




The zswap documentation says that zswap does not suffer from this:




    Zswap receives pages for compression through the Frontswap API and is able to
    evict pages from its own compressed pool on an LRU basis and write them back to
    the backing swap device in the case that the compressed pool is full.




Could I have all the benefits of zram and a completely compressed RAM by setting max_pool_percent to 100?




    Zswap seeks to be simple in its policies. Sysfs attributes allow for one
    user controlled policy:

    * max_pool_percent - The maximum percentage of memory that the compressed
      pool can occupy.



No default max_pool_percent is specified here, but the Arch Wiki page says that it is 20.
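(For reference, the effective value can be read back at runtime; this is the standard sysfs location for the zswap module parameters:)

    cat /sys/module/zswap/parameters/max_pool_percent    # the compiled-in default is 20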



Apart from the performance implications of decompressing, is there any danger / downside in setting max_pool_percent to 100?



Would it behave like an improved swap-backed zram?










Tags: linux swap zram zswap

asked Nov 25 '17 at 6:37 by Tom Hale (edited Nov 25 '17 at 8:07)

1 Answer

To answer your question, I first ran a series of experiments. The final answers are at the end.

Experiments performed:

1) swap file, zswap disabled
2) swap file, zswap enabled, max_pool_percent = 20
3) swap file, zswap enabled, max_pool_percent = 70
4) swap file, zswap enabled, max_pool_percent = 100
5) zram swap, zswap disabled
6) zram swap, zswap enabled, max_pool_percent = 20
7) no swap
8) swap file, zswap enabled, max_pool_percent = 1
9) swap file (300 M), zswap enabled, max_pool_percent = 100
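
Experiments 5 and 6 use a zram swap device. For reproducibility, here is a minimal sketch of how such a device can be set up (the 500M size matches the ~499 MB swap total visible in the results below; the lz4 algorithm is an assumption, as the kernel default at the time was lzo):

    # Create a zram device and use it as swap.
    modprobe zram num_devices=1
    echo lz4 > /sys/block/zram0/comp_algorithm   # optional; must be set before disksize
    echo 500M > /sys/block/zram0/disksize
    mkswap /dev/zram0
    swapon -p 5 /dev/zram0                       # higher priority than any disk swap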


Setup before the experiment:

• VirtualBox 5.1.30
• Fedora 27, Xfce spin
• 512 MB RAM, 16 MB video RAM, 2 CPUs
• Linux kernel 4.13.13-300.fc27.x86_64
• default swappiness value (60)
• created an empty 512 MB swap file (300 MB in experiment 9) for possible use during some of the experiments (using dd, as sketched below) but did not swapon it yet
• disabled all dnf* systemd services and ran watch "killall -9 dnf" to make reasonably sure that dnf would not try to auto-update during an experiment and throw the results off
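
A minimal sketch of the swap-file preparation and the zswap knobs varied between experiments (the /swapfile path and the exact dd flags are assumptions; the sysfs paths are the standard zswap module parameters):

    # Create the swap file (count=300 for experiment 9).
    dd if=/dev/zero of=/swapfile bs=1M count=512
    chmod 600 /swapfile
    mkswap /swapfile
    # swapon /swapfile was deferred until the start of each experiment.

    # Toggle zswap and set the pool-size policy under test.
    echo 1   > /sys/module/zswap/parameters/enabled
    echo 100 > /sys/module/zswap/parameters/max_pool_percent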


State before the experiment:

    [root@user-vm user]# free -m ; vmstat ; vmstat -d
                  total        used        free      shared  buff/cache   available
    Mem:            485         280          72           8         132         153
    Swap:           511           0         511
    procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     0  0      0  74624   8648 127180    0    0  1377   526  275  428  3  2 94  1  0
    disk- ------------reads------------ ------------writes----------- -----IO------
            total merged   sectors      ms   total merged sectors      ms  cur  sec
    sda    102430    688   3593850   67603    3351   8000 1373336   17275    0   26
    sr0         0      0         0       0       0      0       0       0    0    0

The subsequent swapon operations, etc., leading to the different settings during the experiments, resulted in variations within about 2% of these values.



Each experiment consisted of:

• Run Firefox for the first time
• Wait about 40 seconds, or until network and disk activity ceases (whichever is longer)
• Record the state after the experiment (Firefox left running, except in experiments 7 and 9, where Firefox crashed)


State after the experiment:

1) swap file, zswap disabled

    [root@user-vm user]# free -m ; vmstat ; vmstat -d
                  total        used        free      shared  buff/cache   available
    Mem:            485         287           5          63         192          97
    Swap:           511         249         262
    procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     1  0 255488   5904   1892 195428   63  237  1729   743  335  492  3  2 93  2  0
    disk- ------------reads------------ ------------writes----------- -----IO------
            total merged   sectors      ms   total merged sectors      ms  cur  sec
    sda    134680  10706   4848594   95687    5127  91447 2084176   26205    0   38
    sr0         0      0         0       0       0      0       0       0    0    0


2) swap file, zswap enabled, max_pool_percent = 20

    [root@user-vm user]# free -m ; vmstat ; vmstat -d
                  total        used        free      shared  buff/cache   available
    Mem:            485         330           6          33         148          73
    Swap:           511         317         194
    procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     0  0 325376   7436    756 151144    3  110  1793   609  344  477  3  2 93  2  0
    disk- ------------reads------------ ------------writes----------- -----IO------
            total merged   sectors      ms   total merged sectors      ms  cur  sec
    sda    136046   1320   5150874  117469   10024  41988 1749440   53395    0   40
    sr0         0      0         0       0       0      0       0       0    0    0


3) swap file, zswap enabled, max_pool_percent = 70

    [root@user-vm user]# free -m ; vmstat ; vmstat -d
                  total        used        free      shared  buff/cache   available
    Mem:            485         342           8          32         134          58
    Swap:           511         393         118
    procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     0  0 403208   8116   1088 137180    4    8  3538   474  467  538  3  3 91  3  0
    disk- ------------reads------------ ------------writes----------- -----IO------
            total merged   sectors      ms   total merged sectors      ms  cur  sec
    sda    224321   1414  10910442  220138    7535   9571 1461088   42931    0   60
    sr0         0      0         0       0       0      0       0       0    0    0


4) swap file, zswap enabled, max_pool_percent = 100

    [root@user-vm user]# free -m ; vmstat ; vmstat -d
                  total        used        free      shared  buff/cache   available
    Mem:            485         345          10          32         129          56
    Swap:           511         410         101
    procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     1  0 420712  10916   2316 130520    1   11  3660   492  478  549  3  4 91  2  0
    disk- ------------reads------------ ------------writes----------- -----IO------
            total merged   sectors      ms   total merged sectors      ms  cur  sec
    sda    221920   1214  10922082  169369    8445   9570 1468552   28488    0   56
    sr0         0      0         0       0       0      0       0       0    0    0


5) zram swap, zswap disabled

    [root@user-vm user]# free -m ; vmstat ; vmstat -d
                  total        used        free      shared  buff/cache   available
    Mem:            485         333           4          34         147          72
    Swap:           499         314         185
    procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     5  0 324128   7256   1192 149444  153  365  1658   471  326  457  3  2 93  2  0
    disk- ------------reads------------ ------------writes----------- -----IO------
            total merged   sectors      ms   total merged sectors      ms  cur  sec
    sda    130703    884   5047298  112889    4197   9517 1433832   21037    0   37
    sr0         0      0         0       0       0      0       0       0    0    0
    zram0   58673      0    469384     271  138745      0 1109960     927    0    1


6) zram swap, zswap enabled, max_pool_percent = 20

    [root@user-vm user]# free -m ; vmstat ; vmstat -d
                  total        used        free      shared  buff/cache   available
    Mem:            485         338           5          32         141          65
    Swap:           499         355         144
    procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     1  0 364984   7584    904 143572   33  166  2052   437  354  457  3  3 93  2  0
    disk- ------------reads------------ ------------writes----------- -----IO------
            total merged   sectors      ms   total merged sectors      ms  cur  sec
    sda    166168    998   6751610  120911    4383   9543 1436080   18916    0   42
    sr0         0      0         0       0       0      0       0       0    0    0
    zram0   13819      0    110552      78   68164      0  545312     398    0    0


7) no swap

Note that Firefox was not running in this experiment at the time these stats were recorded (it had crashed).

    [root@user-vm user]# free -m ; vmstat ; vmstat -d
                  total        used        free      shared  buff/cache   available
    Mem:            485         289          68           8         127         143
    Swap:             0           0           0
    procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     2  0      0  70108  10660 119976    0    0 13503   286  607  618  2  5 88  5  0
    disk- ------------reads------------ ------------writes----------- -----IO------
            total merged   sectors      ms   total merged sectors      ms  cur  sec
    sda    748978   3511  66775042  595064    4263   9334 1413728   23421    0  164
    sr0         0      0         0       0       0      0       0       0    0    0


8) swap file, zswap enabled, max_pool_percent = 1

    [root@user-vm user]# free -m ; vmstat ; vmstat -d
                  total        used        free      shared  buff/cache   available
    Mem:            485         292           7          63         186          90
    Swap:           511         249         262
    procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     1  0 255488   7088   2156 188688   43  182  1417   606  298  432  3  2 94  2  0
    disk- ------------reads------------ ------------writes----------- -----IO------
            total merged   sectors      ms   total merged sectors      ms  cur  sec
    sda    132222   9573   4796802  114450   10171  77607 2050032  137961    0   41
    sr0         0      0         0       0       0      0       0       0    0    0


9) swap file (300 M), zswap enabled, max_pool_percent = 100

Firefox was stuck and the system still read furiously from disk.
The baseline for this experiment is different, since a new swap file had been written:

                  total        used        free      shared  buff/cache   available
    Mem:            485         280           8           8         196         153
    Swap:           299           0         299
    procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     0  0      0   8948   3400 198064    0    0  1186   653  249  388  2  2 95  1  0
    disk- ------------reads------------ ------------writes----------- -----IO------
            total merged   sectors      ms   total merged sectors      ms  cur  sec
    sda    103099    688   3610794   68253    3837   8084 1988936   20306    0   27
    sr0         0      0         0       0       0      0       0       0    0    0

Specifically, an extra 649384 sectors had been written as a result of this change.

State after the experiment:

    [root@user-vm user]# free -m ; vmstat ; vmstat -d
                  total        used        free      shared  buff/cache   available
    Mem:            485         335          32          47         118          53
    Swap:           299         277          22
    procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     7  1 283540  22912   2712 129132    0    0 83166   414 2387 1951  2 23 62 13  0
    disk- ------------reads------------ ------------writes----------- -----IO------
            total merged   sectors      ms   total merged sectors      ms  cur  sec
    sda   3416602  26605 406297938 4710584    4670   9025 2022272   33805    0  521
    sr0         0      0         0       0       0      0       0       0    0    0

Subtracting the extra 649384 written sectors from 2022272 gives 1372888. This is less than 1433000 (see below), probably because Firefox did not load fully.



I also ran a few experiments with low swappiness values (10 and 1); they all got stuck in a frozen state with excessive disk reads, preventing me from recording the final memory stats.



Observations:

• Subjectively, high max_pool_percent values resulted in sluggishness.
• Subjectively, the system in experiment 9 was so slow as to be unusable.
• High max_pool_percent values result in the fewest writes, whereas very low max_pool_percent values result in the most writes.
• Experiments 5 and 6 (zram swap) suggest that Firefox wrote data that resulted in about 62000 sectors written to disk. Anything above roughly 1433000 sectors was written due to swapping. See the table below, with a worked example after it.
• If we take the lowest number of read sectors among the experiments as the baseline, we can compare the experiments by how many extra sectors each one read due to swapping.


Written sectors as a direct consequence of swapping (approx.):

    650000    1) swap file, zswap disabled
    320000    2) swap file, zswap enabled, max_pool_percent = 20
     30000    3) swap file, zswap enabled, max_pool_percent = 70
     40000    4) swap file, zswap enabled, max_pool_percent = 100
         0    5) zram swap, zswap disabled
         0    6) zram swap, zswap enabled, max_pool_percent = 20
    -20000    7) no swap (firefox crashed)
    620000    8) swap file, zswap enabled, max_pool_percent = 1
    -60000    9) swap file (300 M), zswap enabled, max_pool_percent = 100 (firefox crashed)
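
For example, in experiment 1 the sda total was 2084176 sectors written; subtracting the ~1433000 sectors attributable to the baseline plus Firefox's own writes leaves roughly 650000 sectors written due to swapping.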


Extra read sectors as a direct consequence of swapping (approx.):

        51792    1) swap file, zswap disabled
       354072    2) swap file, zswap enabled, max_pool_percent = 20
      6113640    3) swap file, zswap enabled, max_pool_percent = 70
      6125280    4) swap file, zswap enabled, max_pool_percent = 100
       250496    5) zram swap, zswap disabled
      1954808    6) zram swap, zswap enabled, max_pool_percent = 20
     61978240    7) no swap
 0 (baseline)    8) swap file, zswap enabled, max_pool_percent = 1
    401501136    9) swap file (300 M), zswap enabled, max_pool_percent = 100
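
For example, experiment 1 read 4848594 sectors in total against experiment 8's baseline of 4796802, i.e. 51792 extra read sectors.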


Interpretation of results:

• This is subjective and specific to the use case at hand; behavior will vary with other use cases.
• Zswap's page pool takes away RAM that could otherwise be used by the system's page cache (for file-backed pages), which means the system repeatedly throws away file-backed pages and reads them back in when needed, resulting in excessive reads.
• The high number of reads in experiment 7 is caused by the same problem: the system's anonymous pages took up most of the RAM, and file-backed pages had to be repeatedly read from disk.
• Under certain circumstances it might be possible to drive the amount of data written to the swap disk to nearly zero using zswap, but zswap is evidently not suited to this task.
• It is not possible to have "completely compressed RAM", as the system needs a certain amount of non-swap pages resident in RAM to operate.


Personal opinions and anecdotes:

• The main improvement of zswap in terms of disk writes is not that it compresses pages, but that it has its own buffering and caching system which shrinks the page cache and effectively keeps more anonymous pages (in compressed form) in RAM. (Based on my subjective experience of daily Linux use, a system with swap and zswap at the default swappiness and max_pool_percent values always behaves better than no zswap with any swappiness value, or zswap with high max_pool_percent values.)
• Low swappiness values seem to make the system behave better, until the remaining page cache becomes so small that the system turns unusable due to excessive disk reads. The same happens with too high a max_pool_percent.
• Either use zram swap alone and limit the amount of anonymous pages you need to hold in memory, or use disk-backed swap with zswap at roughly the default values for swappiness and max_pool_percent (a sketch of the latter follows below).
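
A minimal sketch of the latter recommendation, assuming the stock sysfs/sysctl paths (the values shown simply reaffirm the defaults, and the /swapfile path is an assumption):

    # Disk-backed swap plus zswap with default policies.
    swapon /swapfile
    echo 1  > /sys/module/zswap/parameters/enabled
    echo 20 > /sys/module/zswap/parameters/max_pool_percent   # the default
    sysctl vm.swappiness=60                                   # the default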


EDIT:
Possible future work to answer the finer points of your question would be to find out, for your particular use case, how the zsmalloc allocator used in zram compares compression-wise with the zbud allocator used in zswap. I'm not going to do that, though; I'm just pointing out things to search for in the docs and on the internet.



EDIT 2:
Running

    echo zsmalloc > /sys/module/zswap/parameters/zpool

switches zswap's allocator from zbud to zsmalloc. Continuing with my test fixture for the above experiments and comparing zram against zswap+zsmalloc, it seems that as long as the swap memory needed is the same (as either zram swap or as zswap's max_pool_percent), the amount of reads and writes to disk is very similar between the two. My personal opinion, based on these facts: as long as the amount of zram swap I need is smaller than the amount of zram swap I can afford to actually keep in RAM, it is best to use zram alone; once I need more swap than I can actually keep in memory, it is best either to change my workload to avoid that, or to disable zram swap and use zswap with zsmalloc, setting max_pool_percent to the equivalent of what zram previously took in memory (size of zram * compression ratio). I currently don't have the time for a proper writeup of these additional tests, though.






Comments:

• Welcome and thanks! P.S. you may want to look at zpool=z3fold, as it allows 3 pages per compressed page rather than 2. – Tom Hale, Dec 2 '17 at 8:28

• I did try z3fold, and it slowed the computer tremendously while keeping the CPU load high, compared to zsmalloc. Maybe that is because I didn't try it on the latest kernel, which includes some crucial performance improvements to z3fold - elinux.org/images/d/d3/Z3fold.pdf, slide 28. – Jake F, Dec 2 '17 at 13:31

• Ooh, perhaps it's time to change to 4.14 LTS... :) Your article says that it's not the best-performing zpool. – Tom Hale, Dec 2 '17 at 13:55

• It's also possible to mix zram with disk-based swap without suffering LRU inversion. To achieve this, create 30 zram volumes at priority 5 and disk swap at priority 0, then write a cron job which runs swapoff/swapon for one of the volumes every 2 minutes. The zram contents keep getting paged back into memory while the disk's do not, so the least recently used pages eventually end up on disk when they overflow the zram. – Wil, Feb 5 at 17:40
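
A rough sketch of the rotation idea from the last comment (the device count, size, and 2-minute cadence are illustrative, and the reset/mkswap steps are my assumption about how to re-empty a zram device after swapoff):

    #!/bin/sh
    # Cycle through 30 zram swap devices, emptying one per run so that
    # cold pages eventually overflow to the lower-priority disk swap.
    i=$(( $(date +%s) / 120 % 30 ))          # pick a device based on the time
    dev=/dev/zram$i
    swapoff "$dev"                           # forces its pages out of this device
    echo 1 > "/sys/block/zram$i/reset"       # drop the now-stale compressed store
    echo 64M > "/sys/block/zram$i/disksize"  # size is illustrative
    mkswap "$dev" >/dev/null
    swapon -p 5 "$dev"                       # re-add above the priority-0 disk swap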











          Your Answer








          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "106"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: false,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f406925%2fprevent-zram-lru-inversion-with-zswap-and-max-pool-percent-100%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          7





          +50









          To answer your question, I first ran a series of experiments. The final answers are in bold at the end.



          Experiments performed:



          1) swap file, zswap disabled
          2) swap file, zswap enabled, max_pool_percent = 20
          3) swap file, zswap enabled, max_pool_percent = 70
          4) swap file, zswap enabled, max_pool_percent = 100
          5) zram swap, zswap disabled
          6) zram swap, zswap enabled, max_pool_percent = 20
          7) no swap
          8) swap file, zswap enabled, max_pool_percent = 1
          9) swap file (300 M), zswap enabled, max_pool_percent = 100


          Setup before the experiment:




          • VirtualBox 5.1.30

          • Fedora 27, xfce spin

          • 512 MB RAM, 16 MB video RAM, 2 CPUs

          • linux kernel 4.13.13-300.fc27.x86_64

          • default swappiness value (60)

          • created an empty 512 MB swap file (300 MB in experiment 9) for possible use during some of the experiments (using dd) but didn't swapon yet

          • disabled all dnf* systemd services, ran watch "killall -9 dnf" to be more sure that dnf won't try to auto-update during the experiment or something and throw the results off too far


          State before the experiment:



          [root@user-vm user]# free -m ; vmstat ; vmstat -d 
          total used free shared buff/cache available
          Mem: 485 280 72 8 132 153
          Swap: 511 0 511
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
          r b swpd free buff cache si so bi bo in cs us sy id wa st
          0 0 0 74624 8648 127180 0 0 1377 526 275 428 3 2 94 1 0
          disk- ------------reads------------ ------------writes----------- -----IO------
          total merged sectors ms total merged sectors ms cur sec
          sda 102430 688 3593850 67603 3351 8000 1373336 17275 0 26
          sr0 0 0 0 0 0 0 0 0 0 0


          The subsequent swapon operations, etc., leading to the different settings during the experiments, resulted in variances of within about 2% of these values.



          Experiment operation consisted of:




          • Run Firefox for the first time

          • Wait about 40 seconds or until network and disk activity ceases (whichever is longer)

          • Record the following state after the experiment (firefox left running, except for experiments 7 and 9 where firefox crashed)


          State after the experiment:



          1) swap file, zswap disabled



          [root@user-vm user]# free -m ; vmstat ; vmstat -d 
          total used free shared buff/cache available
          Mem: 485 287 5 63 192 97
          Swap: 511 249 262
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
          r b swpd free buff cache si so bi bo in cs us sy id wa st
          1 0 255488 5904 1892 195428 63 237 1729 743 335 492 3 2 93 2 0
          disk- ------------reads------------ ------------writes----------- -----IO------
          total merged sectors ms total merged sectors ms cur sec
          sda 134680 10706 4848594 95687 5127 91447 2084176 26205 0 38
          sr0 0 0 0 0 0 0 0 0 0 0


          2) swap file, zswap enabled, max_pool_percent = 20



          [root@user-vm user]# free -m ; vmstat ; vmstat -d 
          total used free shared buff/cache available
          Mem: 485 330 6 33 148 73
          Swap: 511 317 194
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
          r b swpd free buff cache si so bi bo in cs us sy id wa st
          0 0 325376 7436 756 151144 3 110 1793 609 344 477 3 2 93 2 0
          disk- ------------reads------------ ------------writes----------- -----IO------
          total merged sectors ms total merged sectors ms cur sec
          sda 136046 1320 5150874 117469 10024 41988 1749440 53395 0 40
          sr0 0 0 0 0 0 0 0 0 0 0


          3) swap file, zswap enabled, max_pool_percent = 70



          [root@user-vm user]# free -m ; vmstat ; vmstat -d 
          total used free shared buff/cache available
          Mem: 485 342 8 32 134 58
          Swap: 511 393 118
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
          r b swpd free buff cache si so bi bo in cs us sy id wa st
          0 0 403208 8116 1088 137180 4 8 3538 474 467 538 3 3 91 3 0
          disk- ------------reads------------ ------------writes----------- -----IO------
          total merged sectors ms total merged sectors ms cur sec
          sda 224321 1414 10910442 220138 7535 9571 1461088 42931 0 60
          sr0 0 0 0 0 0 0 0 0 0 0


          4) swap file, zswap enabled, max_pool_percent = 100



          [root@user-vm user]# free -m ; vmstat ; vmstat -d 
          total used free shared buff/cache available
          Mem: 485 345 10 32 129 56
          Swap: 511 410 101
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
          r b swpd free buff cache si so bi bo in cs us sy id wa st
          1 0 420712 10916 2316 130520 1 11 3660 492 478 549 3 4 91 2 0
          disk- ------------reads------------ ------------writes----------- -----IO------
          total merged sectors ms total merged sectors ms cur sec
          sda 221920 1214 10922082 169369 8445 9570 1468552 28488 0 56
          sr0 0 0 0 0 0 0 0 0 0 0


          5) zram swap, zswap disabled



          [root@user-vm user]# free -m ; vmstat ; vmstat -d 
          total used free shared buff/cache available
          Mem: 485 333 4 34 147 72
          Swap: 499 314 185
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
          r b swpd free buff cache si so bi bo in cs us sy id wa st
          5 0 324128 7256 1192 149444 153 365 1658 471 326 457 3 2 93 2 0
          disk- ------------reads------------ ------------writes----------- -----IO------
          total merged sectors ms total merged sectors ms cur sec
          sda 130703 884 5047298 112889 4197 9517 1433832 21037 0 37
          sr0 0 0 0 0 0 0 0 0 0 0
          zram0 58673 0 469384 271 138745 0 1109960 927 0 1


          6) zram swap, zswap enabled, max_pool_percent = 20



          [root@user-vm user]# free -m ; vmstat ; vmstat -d 
          total used free shared buff/cache available
          Mem: 485 338 5 32 141 65
          Swap: 499 355 144
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
          r b swpd free buff cache si so bi bo in cs us sy id wa st
          1 0 364984 7584 904 143572 33 166 2052 437 354 457 3 3 93 2 0
          disk- ------------reads------------ ------------writes----------- -----IO------
          total merged sectors ms total merged sectors ms cur sec
          sda 166168 998 6751610 120911 4383 9543 1436080 18916 0 42
          sr0 0 0 0 0 0 0 0 0 0 0
          zram0 13819 0 110552 78 68164 0 545312 398 0 0


          7) no swap



          Note that firefox is not running in this experiment at the time of recording these stats.



          [root@user-vm user]# free -m ; vmstat ; vmstat -d 
          total used free shared buff/cache available
          Mem: 485 289 68 8 127 143
          Swap: 0 0 0
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
          r b swpd free buff cache si so bi bo in cs us sy id wa st
          2 0 0 70108 10660 119976 0 0 13503 286 607 618 2 5 88 5 0
          disk- ------------reads------------ ------------writes----------- -----IO------
          total merged sectors ms total merged sectors ms cur sec
          sda 748978 3511 66775042 595064 4263 9334 1413728 23421 0 164
          sr0 0 0 0 0 0 0 0 0 0 0


          8) swap file, zswap enabled, max_pool_percent = 1



          [root@user-vm user]# free -m ; vmstat ; vmstat -d 
          total used free shared buff/cache available
          Mem: 485 292 7 63 186 90
          Swap: 511 249 262
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
          r b swpd free buff cache si so bi bo in cs us sy id wa st
          1 0 255488 7088 2156 188688 43 182 1417 606 298 432 3 2 94 2 0
          disk- ------------reads------------ ------------writes----------- -----IO------
          total merged sectors ms total merged sectors ms cur sec
          sda 132222 9573 4796802 114450 10171 77607 2050032 137961 0 41
          sr0 0 0 0 0 0 0 0 0 0 0


          9) swap file (300 M), zswap enabled, max_pool_percent = 100



          Firefox was stuck and the system still read from disk furiously.
          The baseline for this experiment is a different since a new swap file has been written:



                        total        used        free      shared  buff/cache   available
          Mem: 485 280 8 8 196 153
          Swap: 299 0 299
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
          r b swpd free buff cache si so bi bo in cs us sy id wa st
          0 0 0 8948 3400 198064 0 0 1186 653 249 388 2 2 95 1 0
          disk- ------------reads------------ ------------writes----------- -----IO------
          total merged sectors ms total merged sectors ms cur sec
          sda 103099 688 3610794 68253 3837 8084 1988936 20306 0 27
          sr0 0 0 0 0 0 0 0 0 0 0


          Specifically, extra 649384 sectors have been written as a result of this change.



          State after the experiment:



          [root@user-vm user]# free -m ; vmstat ; vmstat -d 
          total used free shared buff/cache available
          Mem: 485 335 32 47 118 53
          Swap: 299 277 22
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
          r b swpd free buff cache si so bi bo in cs us sy id wa st
          7 1 283540 22912 2712 129132 0 0 83166 414 2387 1951 2 23 62 13 0
          disk- ------------reads------------ ------------writes----------- -----IO------
          total merged sectors ms total merged sectors ms cur sec
          sda 3416602 26605 406297938 4710584 4670 9025 2022272 33805 0 521
          sr0 0 0 0 0 0 0 0 0 0 0


          Subtracting the extra 649384 written sectors from 2022272 results in 1372888. This is less than 1433000 (see later) which is probably because of firefox not loading fully.



          I also ran a few experiments with low swappiness values (10 and 1) and they all got stuck in a frozen state with excessive disk reads, preventing me from recording the final memory stats.



          Observations:




          • Subjectively, high max_pool_percent values resulted in sluggishness.

          • Subjectively, the system in experiment 9 was so slow as to be unusable.

          • High max_pool_percent values result in the least amount of writes whereas very low value of max_pool_percent results in the most number of writes.

          • Experiments 5 and 6 (zram swap) suggest that firefox wrote data that resulted in about 62000 sectors written to disk. Anything above about 1433000 are sectors written due to swapping. See the following table.

          • If we assume the lowest number of read sectors among the experiments to be the baseline, we can compare the experiments based on how much extra read sectors due to swapping they caused.


          Written sectors as a direct consequence of swapping (approx.):



          650000   1) swap file, zswap disabled
          320000 2) swap file, zswap enabled, max_pool_percent = 20
          30000 3) swap file, zswap enabled, max_pool_percent = 70
          40000 4) swap file, zswap enabled, max_pool_percent = 100
          0 5) zram swap, zswap disabled
          0 6) zram swap, zswap enabled, max_pool_percent = 20
          -20000 7) no swap (firefox crashed)
          620000 8) swap file, zswap enabled, max_pool_percent = 1
          -60000 9) swap file (300 M), zswap enabled, max_pool_percent = 100 (firefox crashed)


          Extra read sectors as a direct consequence of swapping (approx.):



              51792             1) swap file, zswap disabled
          354072 2) swap file, zswap enabled, max_pool_percent = 20
          6113640 3) swap file, zswap enabled, max_pool_percent = 70
          6125280 4) swap file, zswap enabled, max_pool_percent = 100
          250496 5) zram swap, zswap disabled
          1954808 6) zram swap, zswap enabled, max_pool_percent = 20
          61978240 7) no swap
          0 (baseline) 8) swap file, zswap enabled, max_pool_percent = 1
          401501136 9) swap file (300 M), zswap enabled, max_pool_percent = 100


          Interpretation of results:




          • This is subjective and also specific to the usecase at hand; behavior will vary in other usecases.

          • Zswap's page pool takes away space in RAM that can otherwise be used by system's page cache (for file-backed pages), which means that the system repeatedly throws away file-backed pages and reads them again when needed, resulting in excessive reads.

          • The high number of reads in experiment 7 is caused by the same problem - the system's anonymous pages took most of the RAM and file-backed pages had to be repeatedly read from disk.

          • It might be possible under certain circumstances to minimize the amount of data written to swap disk near zero using zswap but it is evidently not suited for this task.

          • It is not possible to have "completely compressed RAM" as the system needs a certain amount of non-swap pages to reside in RAM for operation.


          Personal opinions and anecdotes:




          • The main improvement of zswap in terms of disk writes is not the fact that it compresses the pages but the fact it has its own buffering & caching system that reduces the page cache and effectively keeps more anonymous pages (in compressed form) in RAM. (However, based on my subjective experience as I use Linux daily, a system with swap and zswap with the default values of swappiness and max_pool_percent always behaves better than any swappiness value and no zswap or zswap with high values of max_pool_percent.)

          • Low swappiness values seem to make the system behave better until the amount of page cache left is so small as to render the system unusable due to excessive disk reads. Similar with too high max_pool_percent.

          • Either use solely zram swap and limit the amount of anonymous pages you need to hold in memory, or use disk-backed swap with zswap with approximately default values for swappiness and max_pool_percent.


          EDIT:
          Possible future work to answer the finer points of your question would be to find out for your particular usecase how the the zsmalloc allocator used in zram compares compression-wise with the zbud allocator used in zswap. I'm not going to do that, though, just pointing out things to search for in docs/on the internet.



          EDIT 2:
          echo "zsmalloc" > /sys/module/zswap/parameters/zpool switches zswap's allocator from zbud to zsmalloc. Continuing with my test fixture for the above experiments and comparing zram with zswap+zsmalloc, it seems that as long as the swap memory needed is the same as either a zram swap or as zswap's max_pool_percent, the amount of reads and writes to disk is very similar between the two. In my personal opinion based on the facts, as long as the amount of zram swap I need is smaller than the amount of zram swap I can afford to actually keep in RAM, then it is best to use solely zram; and once I need more swap than I can actually keep in memory, it is best to either change my workload to avoid it or to disable zram swap and use zswap with zsmalloc and set max_pool_percent to the equivalent of what zram previously took in memory (size of zram * compression ratio). I currently don't have the time to do a proper writeup of these additional tests, though.






          share|improve this answer


























          • Welcome and thanks! P.S. you may want to look at zpool=z3fold as it allows 3 pages per compressed page, rather than 2.

            – Tom Hale
            Dec 2 '17 at 8:28











          • I did try z3fold and it slowed the computer tremendously while keeping the CPU load high, as compared to zsmalloc. Maybe because I didn't try it on the latest kernel that includes some crucial performance improvements to z3fold - elinux.org/images/d/d3/Z3fold.pdf slide 28.

            – Jake F
            Dec 2 '17 at 13:31











          • Ooh, perhaps its time to change to 4.14LTS... :) Your article says that it's not the best performing zpool.

            – Tom Hale
            Dec 2 '17 at 13:55











          • It's also possible to mix zram with disk-based swap without suffering LRU inversion. To achieve this, create 30 zram volumes at priority 5 and disk swap at priority 0. Write a cron job which runs swapoff / swapon for one of the volumes every 2 minutes. The zram will keep getting paged back into memory and the disk won't so LRU will eventually end up on the disk when they overflow the zram.

            – Wil
            Feb 5 at 17:40
















          7





          +50









          To answer your question, I first ran a series of experiments. The final answers are in bold at the end.



          Experiments performed:



          1) swap file, zswap disabled
          2) swap file, zswap enabled, max_pool_percent = 20
          3) swap file, zswap enabled, max_pool_percent = 70
          4) swap file, zswap enabled, max_pool_percent = 100
          5) zram swap, zswap disabled
          6) zram swap, zswap enabled, max_pool_percent = 20
          7) no swap
          8) swap file, zswap enabled, max_pool_percent = 1
          9) swap file (300 M), zswap enabled, max_pool_percent = 100


          Setup before the experiment:




          • VirtualBox 5.1.30

          • Fedora 27, xfce spin

          • 512 MB RAM, 16 MB video RAM, 2 CPUs

          • linux kernel 4.13.13-300.fc27.x86_64

          • default swappiness value (60)

          • created an empty 512 MB swap file (300 MB in experiment 9) for possible use during some of the experiments (using dd) but didn't swapon yet

          • disabled all dnf* systemd services, ran watch "killall -9 dnf" to be more sure that dnf won't try to auto-update during the experiment or something and throw the results off too far


          State before the experiment:



          [root@user-vm user]# free -m ; vmstat ; vmstat -d 
          total used free shared buff/cache available
          Mem: 485 280 72 8 132 153
          Swap: 511 0 511
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
          r b swpd free buff cache si so bi bo in cs us sy id wa st
          0 0 0 74624 8648 127180 0 0 1377 526 275 428 3 2 94 1 0
          disk- ------------reads------------ ------------writes----------- -----IO------
          total merged sectors ms total merged sectors ms cur sec
          sda 102430 688 3593850 67603 3351 8000 1373336 17275 0 26
          sr0 0 0 0 0 0 0 0 0 0 0


          The subsequent swapon operations, etc., leading to the different settings during the experiments, resulted in variances of within about 2% of these values.



          Experiment operation consisted of:




          • Run Firefox for the first time

          • Wait about 40 seconds or until network and disk activity ceases (whichever is longer)

          • Record the following state after the experiment (firefox left running, except for experiments 7 and 9 where firefox crashed)


          State after the experiment:



          1) swap file, zswap disabled



          [root@user-vm user]# free -m ; vmstat ; vmstat -d 
          total used free shared buff/cache available
          Mem: 485 287 5 63 192 97
          Swap: 511 249 262
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
          r b swpd free buff cache si so bi bo in cs us sy id wa st
          1 0 255488 5904 1892 195428 63 237 1729 743 335 492 3 2 93 2 0
          disk- ------------reads------------ ------------writes----------- -----IO------
          total merged sectors ms total merged sectors ms cur sec
          sda 134680 10706 4848594 95687 5127 91447 2084176 26205 0 38
          sr0 0 0 0 0 0 0 0 0 0 0


          2) swap file, zswap enabled, max_pool_percent = 20



          [root@user-vm user]# free -m ; vmstat ; vmstat -d 
          total used free shared buff/cache available
          Mem: 485 330 6 33 148 73
          Swap: 511 317 194
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
          r b swpd free buff cache si so bi bo in cs us sy id wa st
          0 0 325376 7436 756 151144 3 110 1793 609 344 477 3 2 93 2 0
          disk- ------------reads------------ ------------writes----------- -----IO------
          total merged sectors ms total merged sectors ms cur sec
          sda 136046 1320 5150874 117469 10024 41988 1749440 53395 0 40
          sr0 0 0 0 0 0 0 0 0 0 0


          3) swap file, zswap enabled, max_pool_percent = 70



          [root@user-vm user]# free -m ; vmstat ; vmstat -d 
          total used free shared buff/cache available
          Mem: 485 342 8 32 134 58
          Swap: 511 393 118
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
          r b swpd free buff cache si so bi bo in cs us sy id wa st
          0 0 403208 8116 1088 137180 4 8 3538 474 467 538 3 3 91 3 0
          disk- ------------reads------------ ------------writes----------- -----IO------
          total merged sectors ms total merged sectors ms cur sec
          sda 224321 1414 10910442 220138 7535 9571 1461088 42931 0 60
          sr0 0 0 0 0 0 0 0 0 0 0


          4) swap file, zswap enabled, max_pool_percent = 100



          [root@user-vm user]# free -m ; vmstat ; vmstat -d 
          total used free shared buff/cache available
          Mem: 485 345 10 32 129 56
          Swap: 511 410 101
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
          r b swpd free buff cache si so bi bo in cs us sy id wa st
          1 0 420712 10916 2316 130520 1 11 3660 492 478 549 3 4 91 2 0
          disk- ------------reads------------ ------------writes----------- -----IO------
          total merged sectors ms total merged sectors ms cur sec
          sda 221920 1214 10922082 169369 8445 9570 1468552 28488 0 56
          sr0 0 0 0 0 0 0 0 0 0 0


          5) zram swap, zswap disabled



          [root@user-vm user]# free -m ; vmstat ; vmstat -d 
          total used free shared buff/cache available
          Mem: 485 333 4 34 147 72
          Swap: 499 314 185
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
          r b swpd free buff cache si so bi bo in cs us sy id wa st
          5 0 324128 7256 1192 149444 153 365 1658 471 326 457 3 2 93 2 0
          disk- ------------reads------------ ------------writes----------- -----IO------
          total merged sectors ms total merged sectors ms cur sec
          sda 130703 884 5047298 112889 4197 9517 1433832 21037 0 37
          sr0 0 0 0 0 0 0 0 0 0 0
          zram0 58673 0 469384 271 138745 0 1109960 927 0 1


          6) zram swap, zswap enabled, max_pool_percent = 20



          [root@user-vm user]# free -m ; vmstat ; vmstat -d 
          total used free shared buff/cache available
          Mem: 485 338 5 32 141 65
          Swap: 499 355 144
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
          r b swpd free buff cache si so bi bo in cs us sy id wa st
          1 0 364984 7584 904 143572 33 166 2052 437 354 457 3 3 93 2 0
          disk- ------------reads------------ ------------writes----------- -----IO------
          total merged sectors ms total merged sectors ms cur sec
          sda 166168 998 6751610 120911 4383 9543 1436080 18916 0 42
          sr0 0 0 0 0 0 0 0 0 0 0
          zram0 13819 0 110552 78 68164 0 545312 398 0 0


          7) no swap



          Note that firefox is not running in this experiment at the time of recording these stats.



          [root@user-vm user]# free -m ; vmstat ; vmstat -d 
          total used free shared buff/cache available
          Mem: 485 289 68 8 127 143
          Swap: 0 0 0
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
          r b swpd free buff cache si so bi bo in cs us sy id wa st
          2 0 0 70108 10660 119976 0 0 13503 286 607 618 2 5 88 5 0
          disk- ------------reads------------ ------------writes----------- -----IO------
          total merged sectors ms total merged sectors ms cur sec
          sda 748978 3511 66775042 595064 4263 9334 1413728 23421 0 164
          sr0 0 0 0 0 0 0 0 0 0 0


          8) swap file, zswap enabled, max_pool_percent = 1



          [root@user-vm user]# free -m ; vmstat ; vmstat -d 
          total used free shared buff/cache available
          Mem: 485 292 7 63 186 90
          Swap: 511 249 262
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
          r b swpd free buff cache si so bi bo in cs us sy id wa st
          1 0 255488 7088 2156 188688 43 182 1417 606 298 432 3 2 94 2 0
          disk- ------------reads------------ ------------writes----------- -----IO------
          total merged sectors ms total merged sectors ms cur sec
          sda 132222 9573 4796802 114450 10171 77607 2050032 137961 0 41
          sr0 0 0 0 0 0 0 0 0 0 0


          9) swap file (300 M), zswap enabled, max_pool_percent = 100



          Firefox was stuck and the system still read from disk furiously.
          The baseline for this experiment is a different since a new swap file has been written:



                        total        used        free      shared  buff/cache   available
          Mem: 485 280 8 8 196 153
          Swap: 299 0 299
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
          r b swpd free buff cache si so bi bo in cs us sy id wa st
          0 0 0 8948 3400 198064 0 0 1186 653 249 388 2 2 95 1 0
          disk- ------------reads------------ ------------writes----------- -----IO------
          total merged sectors ms total merged sectors ms cur sec
          sda 103099 688 3610794 68253 3837 8084 1988936 20306 0 27
          sr0 0 0 0 0 0 0 0 0 0 0


          Specifically, extra 649384 sectors have been written as a result of this change.



          State after the experiment:



          [root@user-vm user]# free -m ; vmstat ; vmstat -d 
          total used free shared buff/cache available
          Mem: 485 335 32 47 118 53
          Swap: 299 277 22
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
          r b swpd free buff cache si so bi bo in cs us sy id wa st
          7 1 283540 22912 2712 129132 0 0 83166 414 2387 1951 2 23 62 13 0
          disk- ------------reads------------ ------------writes----------- -----IO------
          total merged sectors ms total merged sectors ms cur sec
          sda 3416602 26605 406297938 4710584 4670 9025 2022272 33805 0 521
          sr0 0 0 0 0 0 0 0 0 0 0


          Subtracting the extra 649384 written sectors from 2022272 results in 1372888. This is less than 1433000 (see later) which is probably because of firefox not loading fully.



          I also ran a few experiments with low swappiness values (10 and 1) and they all got stuck in a frozen state with excessive disk reads, preventing me from recording the final memory stats.



          Observations:




          • Subjectively, high max_pool_percent values resulted in sluggishness.

          • Subjectively, the system in experiment 9 was so slow as to be unusable.

          • High max_pool_percent values result in the least amount of writes whereas very low value of max_pool_percent results in the most number of writes.

          To answer your question, I first ran a series of experiments. The final answers are summarized at the end, under "Interpretation of results" and the recommendations that follow.



          Experiments performed:



          1) swap file, zswap disabled
          2) swap file, zswap enabled, max_pool_percent = 20
          3) swap file, zswap enabled, max_pool_percent = 70
          4) swap file, zswap enabled, max_pool_percent = 100
          5) zram swap, zswap disabled
          6) zram swap, zswap enabled, max_pool_percent = 20
          7) no swap
          8) swap file, zswap enabled, max_pool_percent = 1
          9) swap file (300 M), zswap enabled, max_pool_percent = 100


          Setup before the experiment:




          • VirtualBox 5.1.30

          • Fedora 27, xfce spin

          • 512 MB RAM, 16 MB video RAM, 2 CPUs

          • linux kernel 4.13.13-300.fc27.x86_64

          • default swappiness value (60)

          • created an empty 512 MB swap file (300 MB in experiment 9) with dd, for use in some of the experiments, but did not swapon it yet

          • disabled all dnf* systemd services and ran watch "killall -9 dnf" to make reasonably sure that dnf would not auto-update during an experiment and skew the results (see the setup sketch below)
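
          For reproducibility, the per-experiment swap and zswap setup can be expressed as shell commands. This is a minimal sketch: the sysfs paths are the standard zram/zswap parameter files, but the zram disksize is an assumption (chosen to match the ~499 MB of zram swap visible in free below), the zram compression algorithm is left at the kernel default, and max_pool_percent has to be adjusted per experiment.

          # disk-backed swap file (experiments 1-4, 8 and 9)
          dd if=/dev/zero of=/swapfile bs=1M count=512    # count=300 for experiment 9
          chmod 600 /swapfile
          mkswap /swapfile
          swapon /swapfile

          # zram swap (experiments 5 and 6); size assumed, not stated in the answer
          modprobe zram num_devices=1
          echo 499M > /sys/block/zram0/disksize
          mkswap /dev/zram0
          swapon /dev/zram0

          # zswap (experiments 2-4, 6, 8 and 9)
          echo 1  > /sys/module/zswap/parameters/enabled
          echo 20 > /sys/module/zswap/parameters/max_pool_percent    # 1, 70 or 100 in other runs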


          State before the experiment:



          [root@user-vm user]# free -m ; vmstat ; vmstat -d
                        total        used        free      shared  buff/cache   available
          Mem:            485         280          72           8         132         153
          Swap:           511           0         511
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
           r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
           0  0      0  74624   8648 127180    0    0  1377   526  275  428  3  2 94  1  0
          disk- ------------reads------------ ------------writes----------- -----IO------
                  total merged  sectors      ms   total merged  sectors      ms cur sec
          sda    102430    688  3593850   67603    3351   8000  1373336   17275   0  26
          sr0         0      0        0       0       0      0        0       0   0   0


          The subsequent swapon operations and other setup steps for the individual experiments changed these values by no more than about 2%.



          Experiment operation consisted of:




          • Run Firefox for the first time

          • Wait about 40 seconds or until network and disk activity ceases (whichever is longer)

          • Record the state below after the experiment (Firefox left running, except in experiments 7 and 9, where Firefox crashed)


          State after the experiment:



          1) swap file, zswap disabled



          [root@user-vm user]# free -m ; vmstat ; vmstat -d
                        total        used        free      shared  buff/cache   available
          Mem:            485         287           5          63         192          97
          Swap:           511         249         262
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
           r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
           1  0 255488   5904   1892 195428   63  237  1729   743  335  492  3  2 93  2  0
          disk- ------------reads------------ ------------writes----------- -----IO------
                  total merged  sectors      ms   total merged  sectors      ms cur sec
          sda    134680  10706  4848594   95687    5127  91447  2084176   26205   0  38
          sr0         0      0        0       0       0      0        0       0   0   0


          2) swap file, zswap enabled, max_pool_percent = 20



          [root@user-vm user]# free -m ; vmstat ; vmstat -d
                        total        used        free      shared  buff/cache   available
          Mem:            485         330           6          33         148          73
          Swap:           511         317         194
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
           r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
           0  0 325376   7436    756 151144    3  110  1793   609  344  477  3  2 93  2  0
          disk- ------------reads------------ ------------writes----------- -----IO------
                  total merged  sectors      ms   total merged  sectors      ms cur sec
          sda    136046   1320  5150874  117469   10024  41988  1749440   53395   0  40
          sr0         0      0        0       0       0      0        0       0   0   0


          3) swap file, zswap enabled, max_pool_percent = 70



          [root@user-vm user]# free -m ; vmstat ; vmstat -d
                        total        used        free      shared  buff/cache   available
          Mem:            485         342           8          32         134          58
          Swap:           511         393         118
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
           r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
           0  0 403208   8116   1088 137180    4    8  3538   474  467  538  3  3 91  3  0
          disk- ------------reads------------ ------------writes----------- -----IO------
                  total merged  sectors      ms   total merged  sectors      ms cur sec
          sda    224321   1414 10910442  220138    7535   9571  1461088   42931   0  60
          sr0         0      0        0       0       0      0        0       0   0   0


          4) swap file, zswap enabled, max_pool_percent = 100



          [root@user-vm user]# free -m ; vmstat ; vmstat -d
                        total        used        free      shared  buff/cache   available
          Mem:            485         345          10          32         129          56
          Swap:           511         410         101
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
           r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
           1  0 420712  10916   2316 130520    1   11  3660   492  478  549  3  4 91  2  0
          disk- ------------reads------------ ------------writes----------- -----IO------
                  total merged  sectors      ms   total merged  sectors      ms cur sec
          sda    221920   1214 10922082  169369    8445   9570  1468552   28488   0  56
          sr0         0      0        0       0       0      0        0       0   0   0


          5) zram swap, zswap disabled



          [root@user-vm user]# free -m ; vmstat ; vmstat -d
                        total        used        free      shared  buff/cache   available
          Mem:            485         333           4          34         147          72
          Swap:           499         314         185
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
           r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
           5  0 324128   7256   1192 149444  153  365  1658   471  326  457  3  2 93  2  0
          disk- ------------reads------------ ------------writes----------- -----IO------
                  total merged  sectors      ms   total merged  sectors      ms cur sec
          sda    130703    884  5047298  112889    4197   9517  1433832   21037   0  37
          sr0         0      0        0       0       0      0        0       0   0   0
          zram0   58673      0   469384     271  138745      0  1109960     927   0   1


          6) zram swap, zswap enabled, max_pool_percent = 20



          [root@user-vm user]# free -m ; vmstat ; vmstat -d
                        total        used        free      shared  buff/cache   available
          Mem:            485         338           5          32         141          65
          Swap:           499         355         144
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
           r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
           1  0 364984   7584    904 143572   33  166  2052   437  354  457  3  3 93  2  0
          disk- ------------reads------------ ------------writes----------- -----IO------
                  total merged  sectors      ms   total merged  sectors      ms cur sec
          sda    166168    998  6751610  120911    4383   9543  1436080   18916   0  42
          sr0         0      0        0       0       0      0        0       0   0   0
          zram0   13819      0   110552      78   68164      0   545312     398   0   0


          7) no swap



          Note that Firefox was no longer running in this experiment when these stats were recorded.



          [root@user-vm user]# free -m ; vmstat ; vmstat -d
                        total        used        free      shared  buff/cache   available
          Mem:            485         289          68           8         127         143
          Swap:             0           0           0
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
           r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
           2  0      0  70108  10660 119976    0    0 13503   286  607  618  2  5 88  5  0
          disk- ------------reads------------ ------------writes----------- -----IO------
                  total merged  sectors      ms   total merged  sectors      ms cur sec
          sda    748978   3511 66775042  595064    4263   9334  1413728   23421   0 164
          sr0         0      0        0       0       0      0        0       0   0   0


          8) swap file, zswap enabled, max_pool_percent = 1



          [root@user-vm user]# free -m ; vmstat ; vmstat -d
                        total        used        free      shared  buff/cache   available
          Mem:            485         292           7          63         186          90
          Swap:           511         249         262
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
           r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
           1  0 255488   7088   2156 188688   43  182  1417   606  298  432  3  2 94  2  0
          disk- ------------reads------------ ------------writes----------- -----IO------
                  total merged  sectors      ms   total merged  sectors      ms cur sec
          sda    132222   9573  4796802  114450   10171  77607  2050032  137961   0  41
          sr0         0      0        0       0       0      0        0       0   0   0


          9) swap file (300 M), zswap enabled, max_pool_percent = 100



          Firefox got stuck and the system kept reading furiously from disk.
          The baseline for this experiment is different, since a new swap file had been written:



                        total        used        free      shared  buff/cache   available
          Mem:            485         280           8           8         196         153
          Swap:           299           0         299
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
           r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
           0  0      0   8948   3400 198064    0    0  1186   653  249  388  2  2 95  1  0
          disk- ------------reads------------ ------------writes----------- -----IO------
                  total merged  sectors      ms   total merged  sectors      ms cur sec
          sda    103099    688  3610794   68253    3837   8084  1988936   20306   0  27
          sr0         0      0        0       0       0      0        0       0   0   0


          Specifically, an extra 649384 sectors were written as a result of this change.



          State after the experiment:



          [root@user-vm user]# free -m ; vmstat ; vmstat -d
                        total        used        free      shared  buff/cache   available
          Mem:            485         335          32          47         118          53
          Swap:           299         277          22
          procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
           r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
           7  1 283540  22912   2712 129132    0    0 83166   414 2387 1951  2 23 62 13  0
          disk- ------------reads------------ ------------writes----------- -----IO------
                   total merged    sectors       ms  total merged  sectors      ms cur sec
          sda    3416602  26605  406297938  4710584   4670   9025  2022272   33805   0 521
          sr0          0      0          0        0      0      0        0       0   0   0


          Subtracting the extra 649384 written sectors from 2022272 gives 1372888. This is below the approximately 1433000-sector baseline (see the tables below), probably because Firefox did not load fully.



          I also ran a few experiments with low swappiness values (10 and 1) and they all got stuck in a frozen state with excessive disk reads, preventing me from recording the final memory stats.
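
          For reference, a run's swappiness can be set with the standard sysctl. A one-line sketch (the experiments above otherwise used the default value of 60):

          sysctl vm.swappiness=10    # or vm.swappiness=1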



          Observations:




          • Subjectively, high max_pool_percent values resulted in sluggishness.

          • Subjectively, the system in experiment 9 was so slow as to be unusable.

          • High max_pool_percent values result in the fewest writes, whereas very low max_pool_percent values result in the most writes.

          • Experiments 5 and 6 (zram swap) suggest that Firefox itself wrote data amounting to about 62000 sectors on disk; anything above roughly 1433000 total written sectors is due to swapping. See the following table.

          • If we take the lowest number of read sectors among the experiments as the baseline, we can compare the experiments by how many extra sectors they read because of swapping (a sketch for computing these deltas follows the tables below).


          Written sectors as a direct consequence of swapping (approx.):



           650000   1) swap file, zswap disabled
           320000   2) swap file, zswap enabled, max_pool_percent = 20
            30000   3) swap file, zswap enabled, max_pool_percent = 70
            40000   4) swap file, zswap enabled, max_pool_percent = 100
                0   5) zram swap, zswap disabled
                0   6) zram swap, zswap enabled, max_pool_percent = 20
           -20000   7) no swap (firefox crashed)
           620000   8) swap file, zswap enabled, max_pool_percent = 1
           -60000   9) swap file (300 M), zswap enabled, max_pool_percent = 100 (firefox crashed)


          Extra read sectors as a direct consequence of swapping (approx.):



                51792   1) swap file, zswap disabled
               354072   2) swap file, zswap enabled, max_pool_percent = 20
              6113640   3) swap file, zswap enabled, max_pool_percent = 70
              6125280   4) swap file, zswap enabled, max_pool_percent = 100
               250496   5) zram swap, zswap disabled
              1954808   6) zram swap, zswap enabled, max_pool_percent = 20
             61978240   7) no swap
         0 (baseline)   8) swap file, zswap enabled, max_pool_percent = 1
            401501136   9) swap file (300 M), zswap enabled, max_pool_percent = 100
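
          Both tables come straight from the vmstat -d sector counters. A minimal sketch of the delta computation, using the baselines derived above (roughly 1433000 total written sectors including Firefox's own writes, and experiment 8's 4796802 read sectors as the read baseline):

          # run after an experiment; prints the swap-attributable sector deltas for sda
          vmstat -d | awk '$1 == "sda" {
              print "sectors written due to swapping: " ($8 - 1433000)
              print "extra sectors read due to swapping: " ($4 - 4796802)
          }'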


          Interpretation of results:




          • This is subjective and also specific to the usecase at hand; behavior will vary in other usecases.

          • Zswap's page pool takes away RAM that could otherwise be used by the system's page cache (for file-backed pages), which means the system repeatedly throws away file-backed pages and reads them back in when needed, resulting in excessive reads.

          • The high number of reads in experiment 7 is caused by the same problem - the system's anonymous pages took most of the RAM and file-backed pages had to be repeatedly read from disk.

          • Under certain circumstances zswap might reduce the amount of data written to the swap disk to near zero, but it is evidently not suited to this task.

          • It is not possible to have "completely compressed RAM" as the system needs a certain amount of non-swap pages to reside in RAM for operation.


          Personal opinions and anecdotes:




          • The main improvement of zswap in terms of disk writes is not that it compresses the pages but that it has its own buffering & caching system that reduces the page cache and effectively keeps more anonymous pages (in compressed form) in RAM. (Based on my subjective experience of daily Linux use, a system with swap and zswap at the default swappiness and max_pool_percent values always behaves better than any swappiness value with no zswap, or than zswap with high max_pool_percent values.)

          • Low swappiness values seem to make the system behave better until the remaining page cache is so small that the system becomes unusable due to excessive disk reads. The same happens with too high a max_pool_percent.

          • Either use solely zram swap and limit the amount of anonymous pages you need to hold in memory, or use disk-backed swap with zswap at approximately the default values for swappiness and max_pool_percent (a sketch for making this persistent follows).
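
          To make that recommended zswap configuration survive reboots, the module parameters can be set on the kernel command line. A minimal sketch: zswap.enabled and zswap.max_pool_percent are the standard boot-time spellings of the sysfs parameters used above; the bootloader file shown is an assumption (GRUB on Fedora) and varies by distribution.

          # /etc/default/grub, then regenerate: grub2-mkconfig -o /boot/grub2/grub.cfg
          GRUB_CMDLINE_LINUX="... zswap.enabled=1 zswap.max_pool_percent=20"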


          EDIT:
          Possible future work to answer the finer points of your question would be to find out, for your particular usecase, how the zsmalloc allocator used in zram compares compression-wise with the zbud allocator used in zswap. I'm not going to do that here, just pointing out things to search for in the docs and on the internet.



          EDIT 2:
          echo "zsmalloc" > /sys/module/zswap/parameters/zpool switches zswap's allocator from zbud to zsmalloc. Continuing with my test fixture for the above experiments and comparing zram with zswap+zsmalloc, it seems that as long as the swap memory needed is the same as either a zram swap or as zswap's max_pool_percent, the amount of reads and writes to disk is very similar between the two. In my personal opinion based on the facts, as long as the amount of zram swap I need is smaller than the amount of zram swap I can afford to actually keep in RAM, then it is best to use solely zram; and once I need more swap than I can actually keep in memory, it is best to either change my workload to avoid it or to disable zram swap and use zswap with zsmalloc and set max_pool_percent to the equivalent of what zram previously took in memory (size of zram * compression ratio). I currently don't have the time to do a proper writeup of these additional tests, though.






          answered Nov 26 '17 at 17:04, edited Nov 27 '17 at 6:05 – Jake F


          • Welcome and thanks! P.S. you may want to look at zpool=z3fold as it allows 3 pages per compressed page, rather than 2.

            – Tom Hale
            Dec 2 '17 at 8:28











          • I did try z3fold and it slowed the computer tremendously while keeping the CPU load high, as compared to zsmalloc. Maybe because I didn't try it on the latest kernel that includes some crucial performance improvements to z3fold - elinux.org/images/d/d3/Z3fold.pdf slide 28.

            – Jake F
            Dec 2 '17 at 13:31











          • Ooh, perhaps it's time to change to 4.14 LTS... :) Your article says that it's not the best-performing zpool.

            – Tom Hale
            Dec 2 '17 at 13:55











          • It's also possible to mix zram with disk-based swap without suffering LRU inversion. To achieve this, create 30 zram volumes at priority 5 and disk swap at priority 0, then write a cron job which runs swapoff / swapon for one of the volumes every 2 minutes. The zram pages keep getting paged back into memory while the disk pages don't, so the least-recently-used pages eventually end up on disk when they overflow the zram (a sketch of such a cron job follows).

            – Wil
            Feb 5 at 17:40
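
          A minimal sketch of the rotation described in that comment. Everything here is an assumption reconstructed from it: 30 pre-created zram swap devices /dev/zram0 .. /dev/zram29, one rotated per 2-minute cron tick (note that % must be escaped in crontab entries):

          # /etc/cron.d/zram-rotate: cycle one zram swap device every 2 minutes
          */2 * * * * root i=$(( $(date +\%s) / 120 \% 30 )); swapoff /dev/zram$i && swapon -p 5 /dev/zram$i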


















