Useless test instruction?

I got the assembly listing below as the result of JIT compilation of my Java program:

mov    0x14(%rsp),%r10d
inc    %r10d

mov    0x1c(%rsp),%r8d
inc    %r8d

test   %eax,(%r11)         ; <--- this instruction

mov    (%rsp),%r9
mov    0x40(%rsp),%r14d
mov    0x18(%rsp),%r11d
mov    %ebp,%r13d
mov    0x8(%rsp),%rbx
mov    0x20(%rsp),%rbp
mov    0x10(%rsp),%ecx
mov    0x28(%rsp),%rax

movzbl 0x18(%r9),%edi
movslq %r8d,%rsi

cmp    0x30(%rsp),%rsi
jge    0x00007fd3d27c4f17

My understanding is that the test instruction is useless here, because the main idea of test is:

The flags SF, ZF, PF are modified while the result of the AND is discarded.

and here we don't use any of these flags.

Is it a bug in the JIT, or am I missing something?
If it is a bug, where is the best place to report it?
Thanks!

Tags: java, assembly, jvm, jit, jvm-hotspot

edited 20 hours ago by Henrik Schumacher; asked yesterday by QIvan


This instruction does indeed seem useless.
– fuz
yesterday


FWIW, it implicitly checks that r11 contains a valid pointer, and raises an exception if not. Is that intentional? I don't know, out of context.
– another-dave
yesterday


Now that we know the answer, if the JVM had more time to analyze the surrounding code it could have used mov (%r11), %r9d because r9 is about to be written by another instruction. MOV is the same number of code bytes, but it's a pure load without an ALU uop. This is a minor optimization because ALU port pressure is almost certainly not a problem here, and modern x86 CPUs keep the load micro-fused into a single uop with the ALU instruction through most of the pipeline so it doesn't hurt front-end throughput.
– Peter Cordes
yesterday

But it does take an extra scheduler entry until the load is ready so the ALU uop can execute, and 2 ROB entries on Sandy Bridge and earlier Intel. Ivy Bridge and later have a fused-domain ROB, but SnB has an unfused-domain reorder buffer. Source: a row in Table 3 of this paper: publications.vpw.me/publications/2015_uop_flow_simulation.pdf. See also Understanding the impact of lfence on a loop with two long dependency chains, for increasing lengths.
– Peter Cordes
yesterday

@PeterCordes That's pretty counterintuitive and strange. I always thought micro-fused uops stay fused until they are dispatched to an execution port. I double-checked Agner Fog's manual, and it also says the uop stays fused in the RS. It even says on page 92 that saving a ROB entry has been an advantage of micro-fusion since the PM, which is quite reasonable. Are you sure the ROB is in the unfused domain until Ivy Bridge?
– liliscent
yesterday

1 Answer

That must be the thread-local handshake poll.
Look at where %r11 is loaded from. If it is read from some offset off %r15 (HotSpot keeps the current thread pointer in %r15, so this is thread-local storage), that's the guy. See the example here:

  0.31%  ↗  ...70: movzbl 0x94(%r9),%r10d
  0.19%  │  ...78: mov    0x108(%r15),%r11  ; read the thread-local page addr
 25.62%  │  ...7f: add    $0x1,%rbp
 35.10%  │  ...83: test   %eax,(%r11)       ; thread-local handshake poll
 34.91%  │  ...86: test   %r10d,%r10d
         ╰  ...89: je     ...70

It is not useless: once the guard page is marked non-readable, this load causes a SEGV, and that transfers control to the JVM's SEGV handler. This is part of the JVM's mechanism for bringing Java threads to a safepoint, e.g. for GC.

UPD: Hopefully, more details here.

edited yesterday; answered yesterday by Aleksey Shipilev
