What kind of metadata is primarily being loaded/evicted from ARC, in my ZFS system?
I'm trying to understand how my ZFS pool's ARC is being used, motivated by the question "do I need more RAM, even though it's expensive?"
I have 128GB of fast ECC RAM and NVMe SSDs for L2ARC, but the pool is still doing a lot of small-IO read thrashing on a mix of DDT and spacemap blocks. The system is specced for dedup and gets roughly a 3.5x to 4x dedup ratio, so please no "DDT is bad" replies; I know the tradeoffs, but I need dedup. What I'm working on is minimising the remaining read thrash by making sure that DDT and spacemap metadata stays almost entirely in ARC once the system is warm.
Here's how I expect RAM to be used: the DDT is about 35-40GB, and I've used sysctls to reserve 85GB of ARC for metadata. I've also set the spacemap block size to a larger value and defragmented the pool (by copying it to a new pool), which looks like it's helping a lot. But because I can't see metrics for how much of each type of metadata (DDT / spacemap / other) is loaded or evicted, and there are no tools to set the ZFS DDT block size or to preload DDT entries into ARC, I'm in the dark about the exact impact, whether more RAM will help, or whether there's a more systematic way to do better.
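For reference, here's the kind of lump-sum visibility I do have (sysctl/kstat names as on my FreeBSD/FreeNAS box, and "tank" is just a placeholder pool name, so adjust for your setup):

    # ARC metadata totals - a single lump sum, no per-type breakdown
    sysctl kstat.zfs.misc.arcstats.arc_meta_used \
           kstat.zfs.misc.arcstats.arc_meta_limit \
           vfs.zfs.arc_meta_limit

    # DDT summary for the pool: total entries, on-disk and in-core entry sizes
    zpool status -D tank | grep 'DDT entries'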
I've looked for solutions. zdb, arcstat and the like don't expose a breakdown of the metadata in ARC, just a lump sum for all metadata.
Is there a straightforward way to get a sense of what's going on, so I can assess whether more RAM will help? Even an imprecise breakdown of how much DDT / spacemap / "other" metadata is being loaded, cached (MRU/MFU) and evicted would be a big improvement.
Tags: performance, freebsd, zfs, freenas, dtrace
asked Feb 12 at 12:39 by Stilez
1 Answer
I don’t think there’s any built-in tool like arcstats for this purpose, especially since you’re on FreeBSD, which I’m guessing doesn’t have mdb (from illumos / Solaris).
The simplest solution would be to add a bit more RAM and see whether that helps. Of course that costs money, but it may well be cheaper than the time you’ll spend working out the answer (if you’re being paid for this work).
The next easiest thing to try is adjusting the ARC memory limits while running a test workload. The effect these tunables have is often unintuitive, so I’d start from all-default settings and then change one thing at a time. It’s possible, for example, that when you tried to set the minimum memory reserved for metadata you actually set the maximum by mistake, which would cause exactly the thrashing you describe, so read the docs for these settings carefully.
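As a quick sanity check (sysctl names as on recent FreeBSD ZFS; confirm them with sysctl -d, since not every version exposes all of them), you can verify which knob you actually changed and what is currently in use:

    # Floor vs. ceiling for metadata in ARC - make sure you raised the one you meant to
    sysctl -d vfs.zfs.arc_meta_min vfs.zfs.arc_meta_limit   # descriptions
    sysctl    vfs.zfs.arc_meta_min vfs.zfs.arc_meta_limit   # current values
    sysctl kstat.zfs.misc.arcstats.arc_meta_used            # metadata actually held in ARC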
Finally, if you’re a bit intrepid and fairly confident in your ability to analyze complicated data, you can also use DTrace for this. This probe fires each time the ARC takes a cache miss:
    DTRACE_PROBE4(arc__miss, arc_buf_hdr_t *, hdr, blkptr_t *, bp,
        uint64_t, lsize, zbookmark_phys_t *, zb);
So you could write a D script that listens for that probe (as an SDT probe the double underscore becomes a dash, so it shows up as sdt:::arc-miss) and uses args[0], args[1], and/or the backtrace to figure out what kinds of requests are missing the cache.
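For instance, something like the following confirms the probe exists on your kernel and gives a rough miss rate (the sdt:::arc-miss spelling is an assumption until dtrace -l confirms it):

    # Is the probe there, and what is it called exactly?
    dtrace -ln 'sdt:::arc-miss'

    # Rough ARC miss rate, printed once per second for 10 seconds
    dtrace -qn 'sdt:::arc-miss { @misses = count(); }
                tick-1s  { printa("%@d misses/s\n", @misses); clear(@misses); }
                tick-10s { exit(0); }'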
I suspect the easiest way to go is to look at the type of the blkptr_t in args[1]. Unfortunately that’s somewhat annoying to extract, because it’s packed into a bitfield. The block pointer is defined in sys/spa.h in the ZFS source; you want your DTrace script to output the same thing that BP_GET_TYPE(args[1]) would, and then interpret those values by comparing them against the dmu_object_type_t enum in sys/dmu.h.
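Here’s a sketch of that approach. To avoid depending on CTF type info for blkptr_t, it reads blk_prop at its fixed offset (the three 16-byte DVAs come first, so blk_prop is the uint64_t at byte offset 48) and extracts bits 48-55, which is what BP_GET_TYPE does. The probe spelling and the assumption that arg1 is the blkptr_t pointer should be checked against your kernel before trusting the numbers:

    #!/usr/sbin/dtrace -s
    /* Sketch: count ARC misses by DMU object type.
     * Assumes sdt:::arc-miss exists and arg1 is the blkptr_t * from
     * DTRACE_PROBE4(arc__miss, ...); check with `dtrace -lvn 'sdt:::arc-miss'`. */
    #pragma D option quiet

    sdt:::arc-miss
    /arg1 != 0/
    {
        /* blk_prop is the uint64_t at offset 48 in blkptr_t (after three 16-byte DVAs);
         * BP_GET_TYPE() reads bits 48..55 of it. */
        this->prop = *(uint64_t *)(arg1 + 48);
        @bytype[(this->prop >> 48) & 0xff] = count();
    }

    tick-30s
    {
        /* Map the numbers to names via the dmu_object_type_t enum in sys/dmu.h,
         * e.g. DMU_OT_SPACE_MAP, DMU_OT_DDT_ZAP, DMU_OT_PLAIN_FILE_CONTENTS. */
        printa("object type %3d : %@10d misses\n", @bytype);
        clear(@bytype);
    }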
Alternatively, I can suggest a script that’s simpler to write but potentially more involved to interpret: collect the backtrace every time the probe fires, then post-process the traces into a flame graph for easier reading. ZFS function names are generally descriptive (or at least use searchable acronyms, e.g. ddt for “dedup table”), so you can probably work out what the callers are doing that way.
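A minimal sketch of that pipeline, using Brendan Gregg’s FlameGraph scripts for the post-processing (the file names and the 60-second window are arbitrary):

    # Collect kernel stacks at every ARC miss for 60 seconds
    dtrace -x stackframes=100 -qn '
        sdt:::arc-miss { @[stack()] = count(); }
        tick-60s { exit(0); }' -o arc-miss.stacks

    # Fold and render with the FlameGraph tools (github.com/brendangregg/FlameGraph)
    ./stackcollapse.pl arc-miss.stacks > arc-miss.folded
    ./flamegraph.pl arc-miss.folded > arc-miss.svg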
If there’s a ton of values showing up that aren’t file or directory data, you probably need to keep more metadata in cache. You can do that either by dedicating more space to it using the tunables, or by giving the machine more RAM.
answered Feb 13 at 13:27 (edited Feb 13 at 13:39) by Dan
This is a fantastically promising answer. I'm happy to explore with dtrace (hence the tag). Rather than a long ramble in comments, would you be willing to join me in chat, to narrow it down and to ask a couple of technical questions from your reply? If so, I've set up a room at chat.stackexchange.com/rooms/89701/… and (time zones allowing) hope this is good for you too
– Stilez
Feb 13 at 23:09