Jumphost suddenly resetting first SSH MUX connection attempts

I have been using a Debian 9 SSH jumpbox host to run my scripts/Ansible playbooks for a while. The jumpbox talks mostly to Debian 9 servers and some Debian 8 servers. Most of the servers are VMs running under VMware Enterprise 5.5.

The SSH client on the jumpbox is configured for SSH MUX, and authentication is done with an RSA certificate file.

SSH has been working well for years. However, SSH connections suddenly started failing on the first try with the error ssh_exchange_identification: read: Connection reset by peer, several times a day, which obviously creates havoc with my scripts and those of our development team.

However, after the first try, connections are fine for a while. The misbehaving servers appear to be random at first, but there are some patterns/timeouts. For instance, if I send a command to all of the servers before running the intended script/playbook, a few will fail, but the next script will run on all of them.
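The warm-up pass described above can be sketched as a small shell wrapper. This is only an illustration of the workaround, not a fix for the underlying resets; the function names retry and warm_up and the variables RETRY_MAX and RETRY_DELAY are hypothetical, and the ssh options used (BatchMode, ConnectTimeout) are standard ssh_config options.

```shell
# retry CMD [ARGS...]: run a command up to RETRY_MAX times (default 3),
# sleeping RETRY_DELAY seconds (default 1) between failed attempts.
# Hypothetical helper, not part of OpenSSH or Ansible.
retry() {
    max=${RETRY_MAX:-3}
    delay=${RETRY_DELAY:-1}
    attempt=1
    while :; do
        "$@" && return 0
        [ "$attempt" -ge "$max" ] && return 1
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# warm_up HOST...: open (or reuse) the mux master for each host by running
# a no-op remote command, retrying when the first attempt is reset.
warm_up() {
    failed=0
    for host in "$@"; do
        retry ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true || {
            echo "warm-up failed: $host" >&2
            failed=1
        }
    done
    return "$failed"
}
```

Calling warm_up with the list of target hosts before the real script/playbook would absorb the failed first attempt, at the cost of hiding the symptom rather than explaining it.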

There haven't been any recent significant changes on the servers, except for security updates. The transition to Debian 9 happened a (significant) while ago.

I did find an MTU setting (or similar) that had been applied to several servers during an earlier malfunction and then forgotten; however, that was not the cause. I also reduced ControlPersist on the client side and ClientAliveInterval on the server side, both to 1h, and that did not improve the situation.

So at the moment, I am at a loss as to why this is happening. I am, however, more inclined towards a layer 7 issue than a network problem.

The SSH configuration on the client side (/etc/ssh/ssh_config, Debian 9) is:

Host *

SendEnv LANG LC_*

HashKnownHosts yes

GSSAPIAuthentication yes

GSSAPIDelegateCredentials no

ControlMaster auto

ControlPath /tmp/ssh_mux_%h_%p_%r

ControlPersist 1h

Compression no

UseRoaming no

The sshd configuration on the server side of several Debian servers:

Protocol 2

HostKey /etc/ssh/ssh_host_rsa_key

HostKey /etc/ssh/ssh_host_dsa_key

UsePrivilegeSeparation yes



SyslogFacility AUTH

LogLevel INFO

LoginGraceTime 120

PermitRootLogin forced-commands-only 

StrictModes yes

PubkeyAuthentication yes

IgnoreRhosts yes

HostbasedAuthentication no

PermitEmptyPasswords no

ChallengeResponseAuthentication no

PasswordAuthentication no



X11Forwarding no

X11DisplayOffset 10

PrintMotd no

PrintLastLog yes

TCPKeepAlive yes



AcceptEnv LANG LC_*

Subsystem sftp /usr/lib/openssh/sftp-server -l INFO

UsePAM yes

ClientAliveInterval 3600

ClientAliveCountMax 0

AddressFamily inet

SSH versions:

client -

$ssh -V 

OpenSSH_7.4p1 Debian-10+deb9u1, OpenSSL 1.0.2l  25 May 2017

server(s)

SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u1 (Debian 9)

SSH-2.0-OpenSSH_6.7p1 Debian-5+deb8u3  (Debian 8)

I have seen that error at least in situations where both machines are running the 4.9.0-0.bpo.1-amd64 kernel.

Below is the tcpdump from a misbehaving server; both machines are on the same network, without any firewall in between. I also monitor MAC addresses, and there is no log of a new machine with a duplicate MAC address in the last few years.

#tcpdump port 22

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode

listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes

19:42:25.462896 IP jumbox.40270 > server.ssh: Flags [S], seq 3882361678, win 23200, options [mss 1160,sackOK,TS val 354223428 ecr 0,nop,wscale 7], length 0

19:42:25.463289 IP server.ssh > jumbox.40270: Flags [S.], seq 405921081, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0

19:42:25.463306 IP jumbox.40270 > server.ssh: Flags [.], ack 1, win 182, length 0

19:42:25.481470 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0

19:42:25.481477 IP jumbox.40270 > server.ssh: Flags [.], ack 504902058, win 182, length 0

19:42:25.481490 IP server.ssh > jumbox.40270: Flags [R], seq 405921082, win 0, length 0

19:42:25.481494 IP server.ssh > jumbox.40270: Flags [P.], seq 504902058:504902097, ack 1, win 182, length 39

19:42:26.491536 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0

19:42:26.491551 IP jumbox.40270 > server.ssh: Flags [R], seq 3882361679, win 0, length 0

19:42:28.507528 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0

19:42:28.507552 IP jumbox.40270 > server.ssh: Flags [R], seq 3882361679, win 0, length 0

19:42:32.699540 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0

19:42:32.699556 IP jumbox.40270 > server.ssh: Flags [R], seq 3882361679, win 0, length 0

19:42:40.891490 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0

19:42:40.891514 IP jumbox.40270 > server.ssh: Flags [R], seq 3882361679, win 0, length 0

19:42:57.019511 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0

19:42:57.019534 IP jumbox.40270 > server.ssh: Flags [R], seq 3882361679, win 0, length 0
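In the capture above, the same SYN is answered by SYN-ACKs carrying two different sequence numbers (405921081 and 4195986320). A minimal sketch for spotting that pattern in longer captures follows; it assumes tcpdump's default one-line text output as shown here (not the two-line -v format), and the function name synack_isns is hypothetical. It only reports the pattern and says nothing about its cause.

```shell
# synack_isns: read tcpdump text output on stdin and print every
# "src > dst" flow whose SYN-ACK packets carry more than one distinct
# sequence number, followed by the sequence numbers seen.
synack_isns() {
    awk '
        $2 == "IP" && $6 == "Flags" && $7 == "[S.]," && $8 == "seq" {
            flow = $3 " > " $5
            sub(/:$/, "", flow)          # drop the colon after the destination
            isn = $9
            sub(/,$/, "", isn)           # drop the trailing comma
            if (!((flow, isn) in seen)) {
                seen[flow, isn] = 1
                count[flow]++
                list[flow] = list[flow] " " isn
            }
        }
        END {
            for (flow in count)
                if (count[flow] > 1)
                    print flow ":" list[flow]
        }
    '
}
```

For example, tcpdump -n port 22 | synack_isns (or feeding it a saved text capture) would list only the flows that show this behaviour.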

An ssh -v server log of a failed connection, with the reset error:

OpenSSH_7.4p1 Debian-10+deb9u1, OpenSSL 1.0.2l  25 May 2017

debug1: Reading configuration data /etc/ssh/ssh_config

debug1: /etc/ssh/ssh_config line 19: Applying options for *

debug1: /etc/ssh/ssh_config line 59: Deprecated option "useroaming"

debug1: auto-mux: Trying existing master

debug1: Control socket "/tmp/ssh_mux_fenix-storage_22_rui" does not exist

debug1: Connecting to fenix-storage [10.10.32.156] port 22.

debug1: Connection established.

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_rsa type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_rsa-cert type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_dsa type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_dsa-cert type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_ecdsa type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_ecdsa-cert type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_ed25519 type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_ed25519-cert type -1

debug1: Enabling compatibility mode for protocol 2.0

write: Connection reset by peer

An ssh -v server log of a successful connection:

OpenSSH_7.4p1 Debian-10+deb9u1, OpenSSL 1.0.2l  25 May 2017

debug1: Reading configuration data /etc/ssh/ssh_config

debug1: /etc/ssh/ssh_config line 19: Applying options for *

debug1: /etc/ssh/ssh_config line 59: Deprecated option "useroaming"

debug1: auto-mux: Trying existing master

debug1: Control socket "/tmp/ssh_mux_sql01_22_rui" does not exist

debug1: Connecting to sql01 [10.20.10.88] port 22.

debug1: Connection established.

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_rsa type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_rsa-cert type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_dsa type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_dsa-cert type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_ecdsa type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_ecdsa-cert type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_ed25519 type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_ed25519-cert type -1

debug1: Enabling compatibility mode for protocol 2.0

debug1: Local version string SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u1

debug1: Remote protocol version 2.0, remote software version OpenSSH_7.4p1 Debian-10+deb9u1

debug1: match: OpenSSH_7.4p1 Debian-10+deb9u1 pat OpenSSH* compat 0x04000000

debug1: Authenticating to sql01:22 as 'rui'

debug1: SSH2_MSG_KEXINIT sent

debug1: SSH2_MSG_KEXINIT received

debug1: kex: algorithm: curve25519-sha256

debug1: kex: host key algorithm: rsa-sha2-512

debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none

debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none

debug1: expecting SSH2_MSG_KEX_ECDH_REPLY

debug1: Server host key: ssh-rsa SHA256:6aJ+ipXRZJfbei5YbYtvqKXB01t1YO34O2ChdT/vk/4

debug1: Host 'sql01' is known and matches the RSA host key.

debug1: Found key in /home/rui/.ssh/known_hosts:315

debug1: rekey after 134217728 blocks

debug1: SSH2_MSG_NEWKEYS sent

debug1: expecting SSH2_MSG_NEWKEYS

debug1: SSH2_MSG_NEWKEYS received

debug1: rekey after 134217728 blocks

debug1: SSH2_MSG_EXT_INFO received

debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,ssh-rsa,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521>

debug1: SSH2_MSG_SERVICE_ACCEPT received

debug1: Authentications that can continue: publickey

debug1: Next authentication method: publickey

debug1: Offering RSA public key: /home/rui/.ssh/id_rsa

debug1: Server accepts key: pkalg ssh-rsa blen 277

debug1: Authentication succeeded (publickey).

Authenticated to sql01 ([10.20.10.88]:22).

debug1: setting up multiplex master socket

debug1: channel 0: new [/tmp/ssh_mux_sql01_22_rui]

debug1: control_persist_detach: backgrounding master process

debug1: forking to background

debug1: Entering interactive session.

debug1: pledge: id

debug1: multiplexing control connection

debug1: channel 1: new [mux-control]

debug1: channel 2: new [client-session]

debug1: client_input_global_request: rtype hostkeys-00@openssh.com want_reply 0

debug1: Sending environment.

debug1: Sending env LC_ALL = en_US.utf8

debug1: Sending env LANG = en_US.UTF-8

debug1: mux_client_request_session: master session id: 2

Interestingly enough, the behaviour can be reproduced with a telnet command:

$ telnet remote-server 22

Trying x.x.x.x...

Connected to remote-server

Escape character is '^]'.

Connection closed by foreign host.

$ telnet remote-server 22

Trying x.x.x.x...

Connected to remote-server

Escape character is '^]'.

SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u1



Protocol mismatch.

Connection closed by foreign host.
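The telnet test above can be scripted so that it is easy to repeat against many servers. The following is a rough sketch, assuming bash (for its /dev/tcp redirection) and the coreutils timeout command are available on the jumpbox; the function names probe_banner and probe_many are hypothetical.

```shell
# probe_banner HOST [PORT]: open a TCP connection and print the first line
# the server sends (the SSH identification string). Fails when the
# connection is refused, reset, or closed before a banner arrives.
probe_banner() {
    timeout 5 bash -c '
        exec 3<>"/dev/tcp/$1/$2" || exit 1
        IFS= read -r -t 4 line <&3 || exit 1
        printf "%s\n" "$line" | tr -d "\r"
    ' _ "$1" "${2:-22}" 2>/dev/null
}

# probe_many COUNT HOST [PORT]: repeat the probe COUNT times and report
# how many attempts received a banner.
probe_many() {
    count=$1
    shift
    ok=0
    i=0
    while [ "$i" -lt "$count" ]; do
        if probe_banner "$@" >/dev/null; then
            ok=$((ok + 1))
        fi
        i=$((i + 1))
    done
    echo "$ok/$count banners received"
}
```

Since the failure described here shows up on the first attempt after a quiet period, running probe_banner remote-server once per host after some idle time is likely to be more telling than many back-to-back probes.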

UPDATE:

Forced Protocol 2 in the /etc/ssh/ssh_config client configuration on the jumpbox. No change.

UPDATE2:

Changed the old key encrypted with DES-EDE3-CBC for a new key encrypted with AES-128-CBC. Again no visible change.

UPDATE3:

Interestingly enough, while the mux is active, the situation does not present itself.

UPDATE4:

I also have found a similar question at serverfault, however without a chosen answer: https://serverfault.com/questions/445045/ssh-connection-error-ssh-exchange-identification-read-connection-reset-by-pe

Tried regenerating the ssh host keys, and the suggestion of sshd: ALL without success.

UPDATE 5

Opened a console on the destination VM and saw something 'strange'.
In the tcpdump below, 1.1.1.1 is the jumpbox.

# tcpdump -n -vvv "host 1.1.1.1"

tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes

11:47:45.808273 IP (tos 0x0, ttl 64, id 38171, offset 0, flags [DF], proto TCP (6), length 60)

    1.1.1.1.37924 > 1.1.1.2.22: Flags [S], cksum 0xfc1f (correct), seq 3260568985, win 29200, options [mss 1460,sackOK,TS val 407355522 ecr 0,nop,wscale 7], length 0

11:47:45.808318 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)

    1.1.1.2.22 > 1.1.1.1.37924: Flags [S.], cksum 0x5508 (incorrect -> 0x68a8), seq 2881609759, ack 3260568986, win 28960, options [mss 1460,sackOK,TS val 561702650 ecr 407355522,nop,wscale 7], length 0

11:47:45.808525 IP (tos 0x0, ttl 64, id 38172, offset 0, flags [DF], proto TCP (6), length 52)

    1.1.1.1.37924 > 1.1.1.2.22: Flags [.], cksum 0x07b0 (correct), seq 1, ack 1, win 229, options [nop,nop,TS val 407355522 ecr 561702650], length 0

11:47:45.808917 IP (tos 0x0, ttl 64, id 38173, offset 0, flags [DF], proto TCP (6), length 92)

    1.1.1.1.37924 > 1.1.1.2.22: Flags [P.], cksum 0x6de0 (correct), seq 1:41, ack 1, win 229, options [nop,nop,TS val 407355522 ecr 561702650], length 40

11:47:45.808930 IP (tos 0x0, ttl 64, id 1754, offset 0, flags [DF], proto TCP (6), length 52)

    1.1.1.2.22 > 1.1.1.1.37924: Flags [.], cksum 0x5500 (incorrect -> 0x0789), seq 1, ack 41, win 227, options [nop,nop,TS val 561702651 ecr 407355522], length 0

11:47:45.822178 IP (tos 0x0, ttl 64, id 1755, offset 0, flags [DF], proto TCP (6), length 91)

    1.1.1.2.22 > 1.1.1.1.37924: Flags [P.], cksum 0x5527 (incorrect -> 0x70c1), seq 1:40, ack 41, win 227, options [nop,nop,TS val 561702654 ecr 407355522], length 39

11:47:45.822645 IP (tos 0x0, ttl 64, id 21666, offset 0, flags [DF], proto TCP (6), length 40)

    1.1.1.1.37924 > 1.1.1.2.22: Flags [R], cksum 0xaeb1 (correct), seq 3260569026, win 0, length 0

11:47:50.919752 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 1.1.1.2 tell 1.1.1.1, length 46

11:47:50.919773 ARP, Ethernet (len 6), IPv4 (len 4), Reply 1.1.1.2 is-at 00:50:56:b9:3d:2b, length 28

11:47:50.948732 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 1.1.1.1 tell 1.1.1.2, length 28

11:47:50.948916 ARP, Ethernet (len 6), IPv4 (len 4), Reply 1.1.1.1 is-at 00:50:56:80:57:1a, length 46

^C

11 packets captured

11 packets received by filter

0 packets dropped by kernel

UPDATE 6

Due to the checksum error, I disabled TCP/UDP checksum offloading to the NIC in the VM; however, it did not help.

$ sudo ethtool -K eth0 rx off

$ sudo ethtool -K eth0 tx off



iface eth0 inet static

   address 1.1.1.2

   netmask 255.255.255.0

   network 1.1.1.0

   broadcast 1.1.1.255

   gateway 1.1.1.254

   post-up /sbin/ethtool -K $IFACE rx off

   post-up /sbin/ethtool -K $IFACE tx off
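To confirm that the settings above actually took effect, the current offload state can be read back with ethtool -k (lowercase). A small sketch, with the hypothetical helper name offload_state, that filters the two relevant lines from that output:

```shell
# offload_state: read `ethtool -k IFACE` output on stdin and keep only
# the top-level rx/tx checksumming lines.
offload_state() {
    grep -E '^(rx|tx)-checksumming:'
}
```

Used as sudo ethtool -k eth0 | offload_state. Note that "incorrect" checksums on packets captured on the sending host are a common side effect of TX checksum offloading, because tcpdump sees the packet before the NIC fills in the checksum, so on their own they do not prove corruption on the wire.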

Understanding TCP Checksum Offloading (TCO) in a VMware Environment (2052904)

UPDATE 7

Disabled GSSAPIAuthentication in the ssh client on the jumpbox. Also tested with Compression yes. No change.

UPDATE 8

Testing filling up the checksum with iptables.

/sbin/iptables -A POSTROUTING -t mangle -p tcp -j CHECKSUM --checksum-fill

It did not improve the situation.

UPDATE 9:

Found an interesting test about limiting ciphers; will try it out. MTU problems do not seem to be the culprit, as in some cases I am having problems with server and client on the same network.

For now tested in the client side "ssh -c aes256-ctr", and the symptoms do not improve.

The mysterious case of broken SSH client (“connection reset by peer”)

UPDATE 10

Added this to /etc/ssh/ssh_config. No changes.

Ciphers aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc

SSH issues: Read from socket failed: Connection reset by peer

UPDATE 11

Configured the ssh service to listen on both port 22 and port 2222. It did not help.

UPDATE 12

I suspected a regression bug present in OpenSSH 7.4 that was corrected in OpenSSH 7.5.

Release notes from OpenSSH 7.5

sshd(8): Fix regression in OpenSSH 7.4 support for the
server-sig-algs extension, where SHA2 RSA signature methods were
not being correctly advertised. bz#2680

To use OpenSSH 7.5 on Debian 9/Stretch, I installed openssh-client and openssh-server from Debian testing/Buster.

No improvements on the situation.

UPDATE 13

Defined

Ciphers aes256-ctr
MACs hmac-sha1

Both at the client(s) and server side. No improvements.

UPDATE 14

Set:

UseDNS no

GSSAPIAuthentication no

GSSAPIKeyExchange no

No change.

UPDATE 15

Changed this in /etc/ssh/sshd_config:

TCPKeepAlive no

From How does tcp-keepalive work in ssh?

TCPKeepAlive operates on the TCP layer. It sends an empty TCP ACK
packet [from the SSH server to the client - Rui]. Firewalls can be configured to ignore these packets, so if you
go through a firewall that drops idle connections, these may not keep
the connection alive.

My guess is that TCPKeepAlive made the server send a packet that was being optimised away/ignored at some layer further down the stack, so that the remote SSH server somehow believed it was still connected to the TCP mux client when in fact the session had already been torn down; hence the TCP reset(s) on the first try.

So whilst some say that if you're using ClientAliveInterval you can disable TCPKeepAlive, it seems to be more that if you are using ClientAliveInterval, you ought to disable TCPKeepAlive.

It is clearly this option; as for the explanation, it is mainly conjecture, and I will have to double-check it against the source when and if I have time.

TCPKeepAlive apparently also has spoofing issues, so it is recommended that it should be turned off.

Nevertheless, the problem is still there.

edited Oct 10 '17 at 13:02

asked Sep 8 '17 at 9:35

Rui F Ribeiro


The RST packets are not normal, something between your machine and the server seems to be killing your TCP connection. It's hard to tell what that might be without a full packet dump.

– Satō Katsura
Sep 8 '17 at 10:01

@SatōKatsura Though better. That server and jumpbox in the tcpdump are both in the same network; I do have other servers that do routing via firewall

– Rui F Ribeiro
Sep 8 '17 at 11:24

Well, you need to find out where those RST come from. There could be any number of reasons for that. shrug

– Satō Katsura
Sep 8 '17 at 11:33

@SatōKatsura sure indeed. Will add another tcpdump when at work. The difficult part is that this is a bit random

– Rui F Ribeiro
Sep 8 '17 at 11:37


The SSH client in the jumbox is configured for doing SSH MUX, and the authentication is done by an RSA certificate file.

There havent been recent significant changes on the servers, except for security updates. The transition for Debian 9 has already some (significant) time.

So at the moment, I am at loss of why this is happening. I am however more inclined to a layer 7 issue than a network problem.

The SSH configuration on the client side /etc/ssh_config, Debian 9 is:

Host *

SendEnv LANG LC_*

HashKnownHosts yes

GSSAPIAuthentication yes

GSSAPIDelegateCredentials no

ControlMaster auto

ControlPath /tmp/ssh_mux_%h_%p_%r

ControlPersist 1h

Compression no

UseRoaming no

On SSH on the server side of several Debian servers:

Protocol 2

HostKey /etc/ssh/ssh_host_rsa_key

HostKey /etc/ssh/ssh_host_dsa_key

UsePrivilegeSeparation yes



SyslogFacility AUTH

LogLevel INFO

LoginGraceTime 120

PermitRootLogin forced-commands-only 

StrictModes yes

PubkeyAuthentication yes

IgnoreRhosts yes

HostbasedAuthentication no

PermitEmptyPasswords no

ChallengeResponseAuthentication no

PasswordAuthentication no



X11Forwarding no

X11DisplayOffset 10

PrintMotd no

PrintLastLog yes

TCPKeepAlive yes



AcceptEnv LANG LC_*

Subsystem sftp /usr/lib/openssh/sftp-server -l INFO

UsePAM yes

ClientAliveInterval 3600

ClientAliveCountMax 0

AddressFamily inet

SSH versions:

client -

$ssh -V 

OpenSSH_7.4p1 Debian-10+deb9u1, OpenSSL 1.0.2l  25 May 2017

server(s)

SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u1 (Debian 9)

SSH-2.0-OpenSSH_6.7p1 Debian-5+deb8u3  (Debian 8)

I have seen that error at least in situations with both servers with the 4.9.0-0.bpo.1-amd64 version.

#tcpdump port 22

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode

listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes

19:42:25.462896 IP jumbox.40270 > server.ssh: Flags [S], seq 3882361678, win 23200, options [mss 1160,sackOK,TS val 354223428 ecr 0,nop,wscale 7], length 0

19:42:25.463289 IP server.ssh > jumbox.40270: Flags [S.], seq 405921081, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0

19:42:25.463306 IP jumbox.40270 > server.ssh: Flags [.], ack 1, win 182, length 0

19:42:25.481470 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0

19:42:25.481477 IP jumbox.40270 > server.ssh: Flags [.], ack 504902058, win 182, length 0

19:42:25.481490 IP server.ssh > jumbox.40270: Flags [R], seq 405921082, win 0, length 0

19:42:25.481494 IP server.ssh > jumbox.40270: Flags [P.], seq 504902058:504902097, ack 1, win 182, length 39

19:42:26.491536 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0

19:42:26.491551 IP jumbox.40270 > server.ssh: Flags [R], seq 3882361679, win 0, length 0

19:42:28.507528 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0

19:42:28.507552 IP jumbox.40270 > server.ssh: Flags [R], seq 3882361679, win 0, length 0

19:42:32.699540 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0

19:42:32.699556 IP jumbox.40270 > server.ssh: Flags [R], seq 3882361679, win 0, length 0

19:42:40.891490 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0

19:42:40.891514 IP jumbox.40270 > server.ssh: Flags [R], seq 3882361679, win 0, length 0

19:42:57.019511 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0

19:42:57.019534 IP jumbox.40270 > server.ssh: Flags [R], seq 3882361679, win 0, length 0

An ssh -v server log of a failed connection, with the reset error:

OpenSSH_7.4p1 Debian-10+deb9u1, OpenSSL 1.0.2l  25 May 2017

debug1: Reading configuration data /etc/ssh/ssh_config

debug1: /etc/ssh/ssh_config line 19: Applying options for *

debug1: /etc/ssh/ssh_config line 59: Deprecated option "useroaming"

debug1: auto-mux: Trying existing master

debug1: Control socket "/tmp/ssh_mux_fenix-storage_22_rui" does not exist

debug1: Connecting to fenix-storage [10.10.32.156] port 22.

debug1: Connection established.

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_rsa type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_rsa-cert type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_dsa type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_dsa-cert type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_ecdsa type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_ecdsa-cert type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_ed25519 type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_ed25519-cert type -1

debug1: Enabling compatibility mode for protocol 2.0

write: Connection reset by peer

An ssh -v server of a successful connection:

OpenSSH_7.4p1 Debian-10+deb9u1, OpenSSL 1.0.2l  25 May 2017

debug1: Reading configuration data /etc/ssh/ssh_config

debug1: /etc/ssh/ssh_config line 19: Applying options for *

debug1: /etc/ssh/ssh_config line 59: Deprecated option "useroaming"

debug1: auto-mux: Trying existing master

debug1: Control socket "/tmp/ssh_mux_sql01_22_rui" does not exist

debug1: Connecting to sql01 [10.20.10.88] port 22.

debug1: Connection established.

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_rsa type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_rsa-cert type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_dsa type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_dsa-cert type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_ecdsa type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_ecdsa-cert type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_ed25519 type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_ed25519-cert type -1

debug1: Enabling compatibility mode for protocol 2.0

debug1: Local version string SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u1

debug1: Remote protocol version 2.0, remote software version OpenSSH_7.4p1 Debian-10+deb9u1

debug1: match: OpenSSH_7.4p1 Debian-10+deb9u1 pat OpenSSH* compat 0x04000000

debug1: Authenticating to sql01:22 as 'rui'

debug1: SSH2_MSG_KEXINIT sent

debug1: SSH2_MSG_KEXINIT received

debug1: kex: algorithm: curve25519-sha256

debug1: kex: host key algorithm: rsa-sha2-512

debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none

debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none

debug1: expecting SSH2_MSG_KEX_ECDH_REPLY

debug1: Server host key: ssh-rsa SHA256:6aJ+ipXRZJfbei5YbYtvqKXB01t1YO34O2ChdT/vk/4

debug1: Host 'sql01' is known and matches the RSA host key.

debug1: Found key in /home/rui/.ssh/known_hosts:315

debug1: rekey after 134217728 blocks

debug1: SSH2_MSG_NEWKEYS sent

debug1: expecting SSH2_MSG_NEWKEYS

debug1: SSH2_MSG_NEWKEYS received

debug1: rekey after 134217728 blocks

debug1: SSH2_MSG_EXT_INFO received

debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,ssh-rsa,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521>

debug1: SSH2_MSG_SERVICE_ACCEPT received

debug1: Authentications that can continue: publickey

debug1: Next authentication method: publickey

debug1: Offering RSA public key: /home/rui/.ssh/id_rsa

debug1: Server accepts key: pkalg ssh-rsa blen 277

debug1: Authentication succeeded (publickey).

Authenticated to sql01 ([10.20.10.88]:22).

debug1: setting up multiplex master socket

debug1: channel 0: new [/tmp/ssh_mux_sql01_22_rui]

debug1: control_persist_detach: backgrounding master process

debug1: forking to background

debug1: Entering interactive session.

debug1: pledge: id

debug1: multiplexing control connection

debug1: channel 1: new [mux-control]

debug1: channel 2: new [client-session]

debug1: client_input_global_request: rtype hostkeys-00@openssh.com want_reply 0

debug1: Sending environment.

debug1: Sending env LC_ALL = en_US.utf8

debug1: Sending env LANG = en_US.UTF-8

debug1: mux_client_request_session: master session id: 2

Interestingly enough, the behaviour can be reproduced with a telnet command:

$ telnet remote-server 22

Trying x.x.x.x...

Connected to remote-server

Escape character is '^]'.

Connection closed by foreign host.

$ telnet remote-server 22

Trying x.x.x.x...

Connected to remote-server

Escape character is '^]'.

SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u1



Protocol mismatch.

Connection closed by foreign host.

UPDATE:

Forced Protocol 2 in the /etc/ssh_client client configuration in the jumpbox. No change.

UPDATE2:

Changed the old key encrypted with DES-EDE3-CBC for a new key encrypted with AES-128-CBC. Again no visible change.

UPDATE3:

Interestingly enough, while the mux is active, the situation does not presents itself.

UPDATE4:

Tried regenerating the ssh host keys, and the suggestion of sshd: ALL without success.

UPDATE 5

Opened a console on the VM on the destination and saw something 'strange'.
tcpdump whereas 1.1.1.1 is the jumpbox.

# tcpdump -n -vvv "host 1.1.1.1"

tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes

11:47:45.808273 IP (tos 0x0, ttl 64, id 38171, offset 0, flags [DF], proto TCP (6), length 60)

    1.1.1.1.37924 > 1.1.1.2.22: Flags [S], cksum 0xfc1f (correct), seq 3260568985, win 29200, options [mss 1460,sackOK,TS val 407355522 ecr 0,nop,wscale 7], length 0

11:47:45.808318 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)

    1.1.1.2.22 > 1.1.1.1.37924: Flags [S.], cksum 0x5508 (incorrect -> 0x68a8), seq 2881609759, ack 3260568986, win 28960, options [mss 1460,sackOK,TS val 561702650 ecr 407355522,nop,wscale 7], length 0

11:47:45.808525 IP (tos 0x0, ttl 64, id 38172, offset 0, flags [DF], proto TCP (6), length 52)

    1.1.1.1.37924 > 1.1.1.2.22: Flags [.], cksum 0x07b0 (correct), seq 1, ack 1, win 229, options [nop,nop,TS val 407355522 ecr 561702650], length 0

11:47:45.808917 IP (tos 0x0, ttl 64, id 38173, offset 0, flags [DF], proto TCP (6), length 92)

    1.1.1.1.37924 > 1.1.1.2.22: Flags [P.], cksum 0x6de0 (correct), seq 1:41, ack 1, win 229, options [nop,nop,TS val 407355522 ecr 561702650], length 40

11:47:45.808930 IP (tos 0x0, ttl 64, id 1754, offset 0, flags [DF], proto TCP (6), length 52)

    1.1.1.2.22 > 1.1.1.1.37924: Flags [.], cksum 0x5500 (incorrect -> 0x0789), seq 1, ack 41, win 227, options [nop,nop,TS val 561702651 ecr 407355522], length 0

11:47:45.822178 IP (tos 0x0, ttl 64, id 1755, offset 0, flags [DF], proto TCP (6), length 91)

    1.1.1.2.22 > 1.1.1.1.37924: Flags [P.], cksum 0x5527 (incorrect -> 0x70c1), seq 1:40, ack 41, win 227, options [nop,nop,TS val 561702654 ecr 407355522], length 39

11:47:45.822645 IP (tos 0x0, ttl 64, id 21666, offset 0, flags [DF], proto TCP (6), length 40)

    1.1.1.1.37924 > 1.1.1.2.22: Flags [R], cksum 0xaeb1 (correct), seq 3260569026, win 0, length 0

11:47:50.919752 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 1.1.1.2 tell 1.1.1.1, length 46

11:47:50.919773 ARP, Ethernet (len 6), IPv4 (len 4), Reply 1.1.1.2 is-at 00:50:56:b9:3d:2b, length 28

11:47:50.948732 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 1.1.1.1 tell 1.1.1.2, length 28

11:47:50.948916 ARP, Ethernet (len 6), IPv4 (len 4), Reply 1.1.1.1 is-at 00:50:56:80:57:1a, length 46

^C

11 packets captured

11 packets received by filter

0 packets dropped by kernel

UPDATE 6

Due to the checkum error, I disabled the TCP/UDP checksum offloading to the NIC in the VM, however it did not help.

$sudo ethtool -K eth0 rx off

$sudo ethtool -K eth0 tx off



iface eth0 inet static

   address 1.1.1.2

   netmask 255.255.255.0

   network 1.1.1.0

   broadcast 1.11.1.255

   gateway 1.1.1.254

   post-up /sbin/ethtool -K $IFACE rx off

   post-up /sbin/ethtool -K $IFACE tx off

Understanding TCP Checksum Offloading (TCO) in a VMware Environment (2052904)

UPDATE 7

Disabled GSSAPIAuthentication in the ssh client in the jumpbox. Tested Enable Compression yes No change.

UPDATE 8

Testing filling up the checksum with iptables.

/sbin/iptables -A POSTROUTING -t mangle -p tcp -j CHECKSUM --checksum-fill

It did not improve the situation.

UPDATE 9:

Found an interesting test about limiting cyphers, will try it out. MTU problems does not seem the culprit as I am having problems in some cases with server and client in the same network.

For now tested in the client side "ssh -c aes256-ctr", and the symptoms do not improve.

The mysterious case of broken SSH client (“connection reset by peer”)

UPDATE 10

Added this to /etc/ssh/ssh_config. No changes.

Ciphers aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc

SSH issues: Read from socket failed: Connection reset by peer

UPDATE 11

Defined the ssh service in port 22 and port 2222. It did not help.

UPDATE 12

I suspect it being a regression bug present in OpenSSH 7.4 that was corrected with OpenSSH 7.5

Release notes from OpenSSH 7.5

sshd(8): Fix regression in OpenSSH 7.4 support for the
server-sig-algs extension, where SHA2 RSA signature methods were
not being correctly advertised. bz#2680

To use OpenSSH 7.5 on Debian 9/Stretch, I installed openssh-client and openssh-server from Debian testing/Buster.

No improvements on the situation.

UPDATE 13

Defined

Ciphers aes256-ctr
MACs hmac-sha1

Both at the client(s) and server side. No improvements.

UPDATE 14

Set the following:

UseDNS no

GSSAPIAuthentication no

GSSAPIKeyExchange no

No change.

UPDATE 15

Changed this in /etc/ssh/sshd_config:

TCPKeepAlive no

From How does tcp-keepalive work in ssh?

TCPKeepAlive operates on the TCP layer. It sends an empty TCP ACK
packet [from the SSH server to the client - Rui]. Firewalls can be configured to ignore these packets, so if you
go through a firewall that drops idle connections, these may not keep
the connection alive.

So whilst some say that if you're using ClientAliveInterval you can disable TCPKeepAlive, it seems to be more that if you are using ClientAliveInterval you ought to disable TCPKeepAlive.

It is clearly this option; as for the explanation, it is mainly conjecture, and I will have to double-check it against the source when and if I have time.

TCPKeepAlive apparently also has spoofing issues, so it is recommended that it should be turned off.
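To make the distinction concrete: TCPKeepAlive corresponds to the kernel-level SO_KEEPALIVE socket option, whereas ClientAliveInterval messages travel inside the encrypted SSH channel. The sketch below only demonstrates the socket option on an ordinary, unconnected socket; the helper name is an illustrative assumption and nothing here is taken from sshd's source.

```python
import socket


def enable_tcp_keepalive(sock: socket.socket) -> None:
    """Ask the kernel to send TCP keepalive probes when the connection is idle."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_tcp_keepalive(sock)
# A non-zero value means keepalive probing is enabled for this socket.
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

The probes are generated by the TCP stack itself and carry no application data, which is why a middlebox can ignore or forge them independently of the SSH session.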

Nevertheless, the problem persists.

edited Oct 10 '17 at 13:02

asked Sep 8 '17 at 9:35

Rui F Ribeiro


The RST packets are not normal, something between your machine and the server seems to be killing your TCP connection. It's hard to tell what that might be without a full packet dump.

– Satō Katsura
Sep 8 '17 at 10:01

@SatōKatsura Though better. That server and jumpbox in the tcpdump are both in the same network; I do have other servers that do routing via firewall

– Rui F Ribeiro
Sep 8 '17 at 11:24

Well, you need to find out where those RST come from. There could be any number of reasons for that. shrug

– Satō Katsura
Sep 8 '17 at 11:33

@SatōKatsura sure indeed. Will add another tcpdump when at work. The difficult part is that this is a bit random

– Rui F Ribeiro
Sep 8 '17 at 11:37



The SSH client in the jumbox is configured for doing SSH MUX, and the authentication is done by an RSA certificate file.

There havent been recent significant changes on the servers, except for security updates. The transition for Debian 9 has already some (significant) time.

So at the moment, I am at loss of why this is happening. I am however more inclined to a layer 7 issue than a network problem.

The SSH configuration on the client side /etc/ssh_config, Debian 9 is:

Host *

SendEnv LANG LC_*

HashKnownHosts yes

GSSAPIAuthentication yes

GSSAPIDelegateCredentials no

ControlMaster auto

ControlPath /tmp/ssh_mux_%h_%p_%r

ControlPersist 1h

Compression no

UseRoaming no

On SSH on the server side of several Debian servers:

Protocol 2

HostKey /etc/ssh/ssh_host_rsa_key

HostKey /etc/ssh/ssh_host_dsa_key

UsePrivilegeSeparation yes



SyslogFacility AUTH

LogLevel INFO

LoginGraceTime 120

PermitRootLogin forced-commands-only 

StrictModes yes

PubkeyAuthentication yes

IgnoreRhosts yes

HostbasedAuthentication no

PermitEmptyPasswords no

ChallengeResponseAuthentication no

PasswordAuthentication no



X11Forwarding no

X11DisplayOffset 10

PrintMotd no

PrintLastLog yes

TCPKeepAlive yes



AcceptEnv LANG LC_*

Subsystem sftp /usr/lib/openssh/sftp-server -l INFO

UsePAM yes

ClientAliveInterval 3600

ClientAliveCountMax 0

AddressFamily inet

SSH versions:

client -

$ssh -V 

OpenSSH_7.4p1 Debian-10+deb9u1, OpenSSL 1.0.2l  25 May 2017

server(s)

SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u1 (Debian 9)

SSH-2.0-OpenSSH_6.7p1 Debian-5+deb8u3  (Debian 8)

I have seen that error at least in situations with both servers with the 4.9.0-0.bpo.1-amd64 version.

#tcpdump port 22

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode

listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes

19:42:25.462896 IP jumbox.40270 > server.ssh: Flags [S], seq 3882361678, win 23200, options [mss 1160,sackOK,TS val 354223428 ecr 0,nop,wscale 7], length 0

19:42:25.463289 IP server.ssh > jumbox.40270: Flags [S.], seq 405921081, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0

19:42:25.463306 IP jumbox.40270 > server.ssh: Flags [.], ack 1, win 182, length 0

19:42:25.481470 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0

19:42:25.481477 IP jumbox.40270 > server.ssh: Flags [.], ack 504902058, win 182, length 0

19:42:25.481490 IP server.ssh > jumbox.40270: Flags [R], seq 405921082, win 0, length 0

19:42:25.481494 IP server.ssh > jumbox.40270: Flags [P.], seq 504902058:504902097, ack 1, win 182, length 39

19:42:26.491536 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0

19:42:26.491551 IP jumbox.40270 > server.ssh: Flags [R], seq 3882361679, win 0, length 0

19:42:28.507528 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0

19:42:28.507552 IP jumbox.40270 > server.ssh: Flags [R], seq 3882361679, win 0, length 0

19:42:32.699540 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0

19:42:32.699556 IP jumbox.40270 > server.ssh: Flags [R], seq 3882361679, win 0, length 0

19:42:40.891490 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0

19:42:40.891514 IP jumbox.40270 > server.ssh: Flags [R], seq 3882361679, win 0, length 0

19:42:57.019511 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0

19:42:57.019534 IP jumbox.40270 > server.ssh: Flags [R], seq 3882361679, win 0, length 0

An ssh -v server log of a failed connection, with the reset error:

OpenSSH_7.4p1 Debian-10+deb9u1, OpenSSL 1.0.2l  25 May 2017

debug1: Reading configuration data /etc/ssh/ssh_config

debug1: /etc/ssh/ssh_config line 19: Applying options for *

debug1: /etc/ssh/ssh_config line 59: Deprecated option "useroaming"

debug1: auto-mux: Trying existing master

debug1: Control socket "/tmp/ssh_mux_fenix-storage_22_rui" does not exist

debug1: Connecting to fenix-storage [10.10.32.156] port 22.

debug1: Connection established.

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_rsa type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_rsa-cert type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_dsa type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_dsa-cert type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_ecdsa type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_ecdsa-cert type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_ed25519 type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_ed25519-cert type -1

debug1: Enabling compatibility mode for protocol 2.0

write: Connection reset by peer

An ssh -v server of a successful connection:

OpenSSH_7.4p1 Debian-10+deb9u1, OpenSSL 1.0.2l  25 May 2017

debug1: Reading configuration data /etc/ssh/ssh_config

debug1: /etc/ssh/ssh_config line 19: Applying options for *

debug1: /etc/ssh/ssh_config line 59: Deprecated option "useroaming"

debug1: auto-mux: Trying existing master

debug1: Control socket "/tmp/ssh_mux_sql01_22_rui" does not exist

debug1: Connecting to sql01 [10.20.10.88] port 22.

debug1: Connection established.

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_rsa type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_rsa-cert type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_dsa type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_dsa-cert type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_ecdsa type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_ecdsa-cert type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_ed25519 type -1

debug1: key_load_public: No such file or directory

debug1: identity file /home/rui/.ssh/id_ed25519-cert type -1

debug1: Enabling compatibility mode for protocol 2.0

debug1: Local version string SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u1

debug1: Remote protocol version 2.0, remote software version OpenSSH_7.4p1 Debian-10+deb9u1

debug1: match: OpenSSH_7.4p1 Debian-10+deb9u1 pat OpenSSH* compat 0x04000000

debug1: Authenticating to sql01:22 as 'rui'

debug1: SSH2_MSG_KEXINIT sent

debug1: SSH2_MSG_KEXINIT received

debug1: kex: algorithm: curve25519-sha256

debug1: kex: host key algorithm: rsa-sha2-512

debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none

debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none

debug1: expecting SSH2_MSG_KEX_ECDH_REPLY

debug1: Server host key: ssh-rsa SHA256:6aJ+ipXRZJfbei5YbYtvqKXB01t1YO34O2ChdT/vk/4

debug1: Host 'sql01' is known and matches the RSA host key.

debug1: Found key in /home/rui/.ssh/known_hosts:315

debug1: rekey after 134217728 blocks

debug1: SSH2_MSG_NEWKEYS sent

debug1: expecting SSH2_MSG_NEWKEYS

debug1: SSH2_MSG_NEWKEYS received

debug1: rekey after 134217728 blocks

debug1: SSH2_MSG_EXT_INFO received

debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,ssh-rsa,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521>

debug1: SSH2_MSG_SERVICE_ACCEPT received

debug1: Authentications that can continue: publickey

debug1: Next authentication method: publickey

debug1: Offering RSA public key: /home/rui/.ssh/id_rsa

debug1: Server accepts key: pkalg ssh-rsa blen 277

debug1: Authentication succeeded (publickey).

Authenticated to sql01 ([10.20.10.88]:22).

debug1: setting up multiplex master socket

debug1: channel 0: new [/tmp/ssh_mux_sql01_22_rui]

debug1: control_persist_detach: backgrounding master process

debug1: forking to background

debug1: Entering interactive session.

debug1: pledge: id

debug1: multiplexing control connection

debug1: channel 1: new [mux-control]

debug1: channel 2: new [client-session]

debug1: client_input_global_request: rtype hostkeys-00@openssh.com want_reply 0

debug1: Sending environment.

debug1: Sending env LC_ALL = en_US.utf8

debug1: Sending env LANG = en_US.UTF-8

debug1: mux_client_request_session: master session id: 2

Interestingly enough, the behaviour can be reproduced with a telnet command:

$ telnet remote-server 22

Trying x.x.x.x...

Connected to remote-server

Escape character is '^]'.

Connection closed by foreign host.

$ telnet remote-server 22

Trying x.x.x.x...

Connected to remote-server

Escape character is '^]'.

SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u1



Protocol mismatch.

Connection closed by foreign host.

UPDATE:

Forced Protocol 2 in the /etc/ssh_client client configuration in the jumpbox. No change.

UPDATE2:

Changed the old key encrypted with DES-EDE3-CBC for a new key encrypted with AES-128-CBC. Again no visible change.

UPDATE3:

Interestingly enough, while the mux is active, the situation does not presents itself.

UPDATE4:

Tried regenerating the ssh host keys, and the suggestion of sshd: ALL without success.

UPDATE 5

Opened a console on the VM on the destination and saw something 'strange'.
tcpdump whereas 1.1.1.1 is the jumpbox.

# tcpdump -n -vvv "host 1.1.1.1"

tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes

11:47:45.808273 IP (tos 0x0, ttl 64, id 38171, offset 0, flags [DF], proto TCP (6), length 60)

    1.1.1.1.37924 > 1.1.1.2.22: Flags [S], cksum 0xfc1f (correct), seq 3260568985, win 29200, options [mss 1460,sackOK,TS val 407355522 ecr 0,nop,wscale 7], length 0

11:47:45.808318 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)

    1.1.1.2.22 > 1.1.1.1.37924: Flags [S.], cksum 0x5508 (incorrect -> 0x68a8), seq 2881609759, ack 3260568986, win 28960, options [mss 1460,sackOK,TS val 561702650 ecr 407355522,nop,wscale 7], length 0

11:47:45.808525 IP (tos 0x0, ttl 64, id 38172, offset 0, flags [DF], proto TCP (6), length 52)

    1.1.1.1.37924 > 1.1.1.2.22: Flags [.], cksum 0x07b0 (correct), seq 1, ack 1, win 229, options [nop,nop,TS val 407355522 ecr 561702650], length 0

11:47:45.808917 IP (tos 0x0, ttl 64, id 38173, offset 0, flags [DF], proto TCP (6), length 92)

    1.1.1.1.37924 > 1.1.1.2.22: Flags [P.], cksum 0x6de0 (correct), seq 1:41, ack 1, win 229, options [nop,nop,TS val 407355522 ecr 561702650], length 40

11:47:45.808930 IP (tos 0x0, ttl 64, id 1754, offset 0, flags [DF], proto TCP (6), length 52)

    1.1.1.2.22 > 1.1.1.1.37924: Flags [.], cksum 0x5500 (incorrect -> 0x0789), seq 1, ack 41, win 227, options [nop,nop,TS val 561702651 ecr 407355522], length 0

11:47:45.822178 IP (tos 0x0, ttl 64, id 1755, offset 0, flags [DF], proto TCP (6), length 91)

    1.1.1.2.22 > 1.1.1.1.37924: Flags [P.], cksum 0x5527 (incorrect -> 0x70c1), seq 1:40, ack 41, win 227, options [nop,nop,TS val 561702654 ecr 407355522], length 39

11:47:45.822645 IP (tos 0x0, ttl 64, id 21666, offset 0, flags [DF], proto TCP (6), length 40)

    1.1.1.1.37924 > 1.1.1.2.22: Flags [R], cksum 0xaeb1 (correct), seq 3260569026, win 0, length 0

11:47:50.919752 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 1.1.1.2 tell 1.1.1.1, length 46

11:47:50.919773 ARP, Ethernet (len 6), IPv4 (len 4), Reply 1.1.1.2 is-at 00:50:56:b9:3d:2b, length 28

11:47:50.948732 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 1.1.1.1 tell 1.1.1.2, length 28

11:47:50.948916 ARP, Ethernet (len 6), IPv4 (len 4), Reply 1.1.1.1 is-at 00:50:56:80:57:1a, length 46

^C

11 packets captured

11 packets received by filter

0 packets dropped by kernel

UPDATE 6

Due to the checkum error, I disabled the TCP/UDP checksum offloading to the NIC in the VM, however it did not help.

$sudo ethtool -K eth0 rx off

$sudo ethtool -K eth0 tx off



iface eth0 inet static

   address 1.1.1.2

   netmask 255.255.255.0

   network 1.1.1.0

   broadcast 1.11.1.255

   gateway 1.1.1.254

   post-up /sbin/ethtool -K $IFACE rx off

   post-up /sbin/ethtool -K $IFACE tx off

Understanding TCP Checksum Offloading (TCO) in a VMware Environment (2052904)

UPDATE 7

Disabled GSSAPIAuthentication in the ssh client in the jumpbox. Tested Enable Compression yes No change.

UPDATE 8

Testing filling up the checksum with iptables.

/sbin/iptables -A POSTROUTING -t mangle -p tcp -j CHECKSUM --checksum-fill

It did not improve the situation.

UPDATE 9:

Found an interesting test about limiting cyphers, will try it out. MTU problems does not seem the culprit as I am having problems in some cases with server and client in the same network.

For now tested in the client side "ssh -c aes256-ctr", and the symptoms do not improve.

The mysterious case of broken SSH client (“connection reset by peer”)

UPDATE 10

Added this to /etc/ssh/ssh_config. No changes.

Ciphers aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc

SSH issues: Read from socket failed: Connection reset by peer

UPDATE 11

Defined the ssh service in port 22 and port 2222. It did not help.

UPDATE 12

I suspect it being a regression bug present in OpenSSH 7.4 that was corrected with OpenSSH 7.5

Release notes from OpenSSH 7.5

sshd(8): Fix regression in OpenSSH 7.4 support for the
server-sig-algs extension, where SHA2 RSA signature methods were
not being correctly advertised. bz#2680

For using openSSH 7.5 in Debian 9/Stretch, I installed openssh-client and openssh-server from Debian testing/Buster.

No improvements on the situation.

UPDATE 13

Defined

Ciphers aes256-ctr
MACs hmac-sha1

Both at the client(s) and server side. No improvements.

UPDATE 14

Setup

UseDNS no

GSSAPIAuthentication no

GSSAPIKeyExchange no

No change.

UPDATE 15

Changed the following in /etc/ssh/sshd_config:

TCPKeepAlive no

From How does tcp-keepalive work in ssh?

TCPKeepAlive operates on the TCP layer. It sends an empty TCP ACK
packet [from the SSH server to the client - Rui]. Firewalls can be configured to ignore these packets, so if you
go through a firewall that drops idle connections, these may not keep
the connection alive.

So whilst some say that if you're using ClientAliveInterval you can disable TCPKeepAlive, it seems to be more that if you are using ClientAliveInterval you ought to disable TCPKeepAlive.

It is clearly this option; as for the explanation, it is mainly conjecture, and I will have to double-check it against the source when and if I have time.

TCPKeepAlive apparently also has spoofing issues, so it is recommended that it be turned off.

Nevertheless, the problem persists.

debian ssh vmware

edited Oct 10 '17 at 13:02

asked Sep 8 '17 at 9:35

Rui F Ribeiro

41.7k1483142


The RST packets are not normal, something between your machine and the server seems to be killing your TCP connection. It's hard to tell what that might be without a full packet dump.

– Satō Katsura
Sep 8 '17 at 10:01

@SatōKatsura Though better. That server and jumpbox in the tcpdump are both in the same network; I do have other servers that do routing via firewall

– Rui F Ribeiro
Sep 8 '17 at 11:24

Well, you need to find out where those RST come from. There could be any number of reasons for that. shrug

– Satō Katsura
Sep 8 '17 at 11:33

@SatōKatsura sure indeed. Will add another tcpdump when at work. The difficult part is that this is a bit random

– Rui F Ribeiro
Sep 8 '17 at 11:37


4 Answers

Your symptoms sound consistent with having a machine on the network using the same IP address as the SSH server. Check the MAC address of the RST packets.
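
One way to look at where the resets come from is to capture only RST segments and have tcpdump print the link-level header, so the source MAC address is visible. The following is a rough sketch, not a definitive recipe: rst_filter is a made-up helper name, and the interface eth0 and port 22 are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: build a pcap filter that matches TCP resets on a
# given port (defaults to 22 when no argument is passed).
rst_filter() {
    printf 'tcp port %s and (tcp[tcpflags] & tcp-rst != 0)' "${1:-22}"
}

# Usage on a real host (needs root; -e prints the link-level header, which
# includes MAC addresses, and -nn disables name and port resolution):
#   sudo tcpdump -e -nn -i eth0 "$(rst_filter 22)"
```

If the MAC address on the RST packets does not belong to the SSH server (or to the gateway, for routed traffic), something else is answering for that IP address.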

answered Sep 9 '17 at 6:57

user1998586

1613

I actually monitor MAC addresses on that netblock, and it seems not to be the case.

– Rui F Ribeiro
Sep 9 '17 at 8:20


Are you going through any FW or device attempting TCP optimisation? I've got the same experience over a network and it turned out to be a device doing TCP optimisation.

answered Oct 4 '18 at 17:41

Yusufk

1366

Most probably. Later on I solved a couple of bugs on that FW/router by changing the configuration at the Cisco level. But I never came back to this, and nowadays I am at another job.

– Rui F Ribeiro
Oct 25 '18 at 16:59


Found some systems with

net.ipv4.tcp_timestamps = 0

in /etc/sysctl.conf; the servers having the problem all have that setting.

I ended up removing this line from the affected systems and running on all systems:

sudo sysctl -w net.ipv4.tcp_timestamps=1

Waiting for further tests.
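
To find which hosts still carry the override, the sysctl configuration files can be searched for any line that sets the key. This is a sketch only: tcp_timestamps_overrides is a made-up helper name, and the file locations in the usage comment are the usual Debian ones, assumed rather than taken from the affected systems.

```shell
#!/bin/sh
# Hypothetical helper: print file:line:content for every non-comment line
# that sets net.ipv4.tcp_timestamps in the files given as arguments.
tcp_timestamps_overrides() {
    grep -HnE '^[[:space:]]*net\.ipv4\.tcp_timestamps[[:space:]]*=' "$@" 2>/dev/null
}

# Usage on a real host (paths are assumptions):
#   tcp_timestamps_overrides /etc/sysctl.conf /etc/sysctl.d/*.conf
#   sysctl -n net.ipv4.tcp_timestamps    # value currently in effect
```

Note that sysctl -w only changes the running kernel, so a leftover line in a configuration file would bring the old value back at the next boot.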

edited Oct 10 '17 at 13:26

answered Oct 6 '17 at 22:08

Rui F Ribeiro

41.7k1483142


In the end, I found out it was due to bugs in the Cisco 6059 FWSM core router and the ASA firewall being used.

Linux kernels v3 and v4 do not play well with TCP Sequence Randomization, which causes "random" problems when transferring big files, and other kinds of obscure problems in many connections, of which SSH was the most visible. Unfortunately, Windows, Mac and FreeBSD do play well with it, so it can be somewhat characterized as a Linux bug.

It was quite a nasty situation, as we had complaints about people not being able to download random files from our sites.

Each TCP connection has two ISNs: one generated by the client and one
generated by the server. The ASA randomizes the ISN of the TCP SYN
passing in both the inbound and outbound directions.

Randomizing the ISN of the protected host prevents an attacker from
predicting the next ISN for a new connection and potentially hijacking
the new session.

You can disable TCP initial sequence number randomization if
necessary, for example, because data is getting scrambled. For
example:

If another in-line firewall is also randomizing the initial sequence
numbers, there is no need for both firewalls to be performing this
action, even though this action does not affect the traffic.

I initially disabled Cisco randomization only in the internal core router; that was not enough. After randomization was disabled both in the border firewalls and in the core Cisco router/switch, the problem stopped happening.

To disable it, the configuration is something similar to:

policy-map global_policy

    class preserve-sq-no

        set connection random-sequence-number disable

See Cisco note Disable TCP Sequence Randomization
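
A way to check whether some device on the path is still rewriting sequence numbers is to capture the same SYN on both the client and the server with absolute sequence numbers (tcpdump -S) and compare the values. The helper below is a sketch under assumptions: syn_seq is a made-up name, eth0 and port 22 are assumed, and it expects tcpdump's usual one-line IPv4 output.

```shell
#!/bin/sh
# Hypothetical helper: from tcpdump text output on stdin, print
# "source destination sequence" for every pure SYN (SYN-ACKs are skipped).
syn_seq() {
    sed -n 's/.* IP \([0-9.]*\) > \([0-9.]*\): Flags \[S\], seq \([0-9]*\),.*/\1 \2 \3/p'
}

# Usage, run on both ends at the same time (needs root):
#   sudo tcpdump -l -nn -S -i eth0 'tcp port 22 and tcp[tcpflags] & tcp-syn != 0' | syn_seq
```

If the same connection (same source address and port) shows a different sequence number on the server than on the client, something in between is still randomizing the ISN.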

I also found an unrelated XLATE bug on the FWSM related to NAT optimization, which is enabled by default and was causing spurious communication problems. As the core router was not in charge of NAT, I disabled it with:

xlate-bypass

Enable xlate-bypass

In both of the examples above, the xlates are
created with the Ii flags. These flags indicate that the xlate is an
identity translation (I) that originated on a high security (i)
interface. By default, the FWSM will build these xlates for any
traffic that does not match an explicit NAT/PAT rule. In order to
disable this behavior, the xlate-bypass command can be enabled in FWSM
3.2(1) and later:

FWSM(config)# xlate-bypass

Beware that those are default configurations. We spent months tracking this down in an internal investigation, and several of the vendors involved, which I won't name here, were not able to pinpoint those configurations.

edited Mar 3 at 6:48

answered Mar 3 at 6:32

Rui F Ribeiro

41.7k1483142

The XLATE issue was found in another context, which may deserve its own question linked to this one.

– Rui F Ribeiro
Mar 3 at 6:48




