author     David S. Miller <davem@davemloft.net>  2017-02-07 13:07:56 -0500
committer  David S. Miller <davem@davemloft.net>  2017-02-07 13:07:56 -0500
commit     29ba6e7400a317725bdfb86a725d1824447dbcd7
tree       b009850c5a2e7c633a94eeacb71a25f91b4b64f0 /net/smc/smc_wr.c
parent     b08d46b01e995dd7b653b22d35bd1d958d6ee9b4
parent     51ce8bd4d17a761e1a90a34a1b5c9b762cce7553
Merge branch 'replace-dst_confirm'
Julian Anastasov says:

====================
net: dst_confirm replacement

This patchset addresses the problem of neighbour confirmation where
received replies from one nexthop can cause confirmation of a different
nexthop when using the same dst. Thanks to YueHaibing
<yuehaibing@huawei.com> for tracking down the dst->pending_confirm
problem.

Sockets can obtain a cached output route. Such routes can point to a
known nexthop (rt_gateway=IP) or be used simultaneously for different
nexthop IPs by different subnet prefixes (nh->nh_scope = RT_SCOPE_HOST,
rt_gateway=0).

At first glance, there are further problems:

- dst_confirm() sets the flag on dst and not on dst->path; as a result,
  the indication is lost when XFRM is used
- DNAT can change the nexthop, so the nexthop actually used is not
  confirmed

So the solution is to avoid using dst->pending_confirm.

The current dst_confirm() usage is as follows:

Protocols confirming dst on received packets:
- TCP (1 dst per socket)
- SCTP (1 dst per transport)
- CXGB*

Protocols supporting sendmsg with MSG_CONFIRM [ | MSG_PROBE ] to confirm
the neighbour:
- UDP IPv4/IPv6
- ICMPv4 PING
- RAW IPv4/IPv6
- L2TP/IPv6

MSG_CONFIRM for other purposes (fix not needed):
- CAN

Sending without locking the socket:
- UDP (when no cork)
- RAW (when hdrincl=1)

Redirects from old to new GW:
- rt6_do_redirect

The patchset includes the following changes:

1. sock: add sk_dst_pending_confirm flag
   - used only by TCP (with patch 4) to remember the received indication
     in sk->sk_dst_pending_confirm
2. net: add dst_pending_confirm flag to skbuff
   - skb->dst_pending_confirm will be used by all protocols in the
     following patches, via skb_{set,get}_dst_pending_confirm
3. sctp: add dst_pending_confirm flag
   - SCTP uses per-transport dsts and cannot use
     sk->sk_dst_pending_confirm like TCP
4. tcp: replace dst_confirm with sk_dst_confirm
5. net: add confirm_neigh method to dst_ops
   - IPv4 and IPv6 provision for slow neigh lookups for MSG_PROBE users.
     I decided to use neigh lookup only for this case because on
     MSG_PROBE the skb may pass MTU checks but it does not reach the
     neigh confirmation code. This patch will be used from patch 6.
   - xfrm_confirm_neigh: we use the last tunnel address, if present.
     When there are only transports, the original dest address is used.
6. net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP
   - dst_confirm conversion for UDP, RAW, ICMP and L2TP/IPv6
   - these protocols use MSG_CONFIRM propagated by ip*_append_data to
     skb->dst_pending_confirm. sk->sk_dst_pending_confirm is not used
     because some sending paths do not lock the socket. For MSG_PROBE
     we use the slow lookup (dst_confirm_neigh).
   - there are also 2 cases that need the slow lookup:
     __ip6_rt_update_pmtu and rt6_do_redirect. I hope
     &ipv6_hdr(skb)->saddr is the correct nexthop address to use here.
7. net: pending_confirm is not used anymore
   - I failed to understand the CXGB* code: I see dst_confirm() calls,
     but I'm not sure dst_neigh_output() was called. For now I just
     removed the dst->pending_confirm flag and left all dst_confirm()
     calls there. Any better idea?
   - Now maybe the old function neigh_output() should be restored
     instead of dst_neigh_output?
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
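To make the failure mode concrete, here is a toy model in C. It is a sketch under stated assumptions, not kernel code: the `model_*` types and functions are hypothetical and only mimic the roles of `dst->pending_confirm` (old scheme) and of `sk->sk_dst_pending_confirm` / `skb->dst_pending_confirm` (new scheme). Two sockets share one cached route but reach different nexthops. With a per-dst flag, a reply received on one socket can end up confirming the other socket's nexthop; with a per-socket flag copied into each outgoing packet, only the neighbour that packet actually uses gets confirmed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model; these are NOT the kernel's types. */
struct model_neigh {
	const char *addr;   /* nexthop address, for readability only */
	int confirmed;      /* number of confirmations this neighbour received */
};

struct model_dst {
	bool pending_confirm;  /* old scheme: one flag shared by all users of the route */
};

struct model_sock {
	struct model_dst *dst;        /* cached output route, possibly shared */
	struct model_neigh *nexthop;  /* nexthop this socket's packets actually use */
	bool dst_pending_confirm;     /* new scheme: per-socket flag */
};

struct model_skb {
	bool dst_pending_confirm;     /* new scheme: per-packet flag */
};

/* Old scheme: a reply received on any socket marks the shared route. */
void old_rx_reply(struct model_sock *sk)
{
	assert(sk->dst != NULL);
	sk->dst->pending_confirm = true;
}

/* Old scheme: the next sender on that route consumes the flag and confirms
 * its own nexthop, even if the reply came from a different one. */
void old_tx(struct model_sock *sk)
{
	if (sk->dst->pending_confirm) {
		sk->dst->pending_confirm = false;
		sk->nexthop->confirmed++;
	}
}

/* New scheme: remember the indication on the receiving socket only. */
void new_rx_reply(struct model_sock *sk)
{
	sk->dst_pending_confirm = true;
}

/* New scheme: move the socket flag into the outgoing packet, then confirm
 * the neighbour that this packet is sent to. */
void new_tx(struct model_sock *sk)
{
	struct model_skb skb = { .dst_pending_confirm = sk->dst_pending_confirm };

	sk->dst_pending_confirm = false;
	if (skb.dst_pending_confirm)
		sk->nexthop->confirmed++;
}
```

In this model, `old_rx_reply(&sa); old_tx(&sb);` credits socket B's nexthop with a confirmation that was really earned by socket A's peer, whereas `new_rx_reply(&sa); new_tx(&sb); new_tx(&sa);` confirms only socket A's nexthop. The actual patches also cover MSG_CONFIRM from sendmsg and the slow lookup for MSG_PROBE, which the model leaves out.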
Diffstat (limited to 'net/smc/smc_wr.c')
0 files changed, 0 insertions, 0 deletions
... of jiffies
stephen hemminger, 2 files, -4/+8

Jiffies is volatile, so read it once.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2017-02-07  bridge: remove unnecessary check for vtbegin in br_fill_vlan_tinfo_range
Roopa Prabhu, 1 file, -1/+1

vtbegin should not be NULL in this function; it's already checked by the
caller. This should silence the smatch complaint below:

    net/bridge/br_netlink_tunnel.c:144 br_fill_vlan_tinfo_range()
    error: we previously assumed 'vtbegin' could be null (see line 130)

    net/bridge/br_netlink_tunnel.c
       129
       130          if (vtbegin && vtend && (vtend->vid - vtbegin->vid) > 0) {
                        ^^^^^^^
    Check for NULL.

Fixes: efa5356b0d97 ("bridge: per vlan dst_metadata netlink support")
Reported-By: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2017-02-07  bridge: tunnel: fix attribute checks in br_parse_vlan_tunnel_info
Nikolay Aleksandrov, 1 file, -4/+4

These checks should go after the attributes have been parsed; otherwise
we're using tb uninitialized.

Fixes: efa5356b0d97 ("bridge: per vlan dst_metadata netlink support")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2017-02-07  net: bridge: remove redundant check to see if err is set
Colin Ian King, 1 file, -3/+0

The error check on err is redundant, as err has already been checked
each time it was updated. Remove this redundant check.

Detected with CoverityScan, CID#140030 ("Logically dead code")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2017-02-06  bridge: fdb: write to used and updated at most once per jiffy
Nikolay Aleksandrov, 2 files, -2/+4

Writing once per jiffy is enough to limit the bridge's false sharing.
After this change the bridge doesn't show up in the local load HitM
stats.

Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2017-02-06  bridge: move write-heavy fdb members in their own cache line
Nikolay Aleksandrov, 1 file, -4/+6

The fdb's used and updated fields are written to on every packet forward
and packet receive, respectively. Thus if we are receiving packets from
a particular fdb, they'll cause false sharing with everyone who has
looked it up (even if it didn't match, since mac/vid share a cache
line!). The "used" field is even worse since it is updated on every
packet forward to that fdb, so the standard config where X ports use a
single gateway results in 100% fdb false sharing. Note that this patch
does not prevent the last scenario, but it makes it better for other
bridge participants which are not using that fdb (and are only doing
lookups over it). The point is that with this move only the
communicating parties get the false sharing; a later patch will show how
to avoid that too.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2017-02-06  bridge: move to workqueue gc
Nikolay Aleksandrov, 10 files, -23/+29

Move the fdb garbage collector to a workqueue which fires at least 10
milliseconds apart and cleans chain by chain, allowing other tasks to
run in the meantime. When there are thousands of fdbs the system is much
more responsive. Most importantly, remove the need to check in
__br_fdb_get whether the matched entry has expired: that check causes
false sharing and is completely unnecessary if we clean up entries; at
worst we'll get 10ms of traffic for that entry before it gets deleted.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2017-02-06  bridge: modify bridge and port to have often accessed fields in one cache line
Nikolay Aleksandrov, 1 file, -23/+20

Rearrange net_bridge so the vlan fields are at the beginning, since
they're checked on every packet even if vlan filtering is disabled. For
the port, move flags & vlan group to the beginning so they're in the
same cache line as the port's state (both flags and state are checked on
each packet).

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2017-02-06  net: remove ndo_neigh_{construct, destroy} from stacked devices
Ido Schimmel, 1 file, -2/+0

In commit 18bfb924f000 ("net: introduce default neigh_construct/destroy
ndo calls for L2 upper devices") we added these ndos to stacked devices
such as team and bond, so that calls will be propagated to mlxsw.

However, the previous commit removed the reliance on these ndos, and no
new users of these ndos have appeared since the above-mentioned commit.
We can therefore safely remove this dead code.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2017-02-03  Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
David S. Miller, 3 files, -32/+49

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next
tree, they are:

1) Stash the ctinfo 3-bit field into the pointer to the nf_conntrack
   object from sk_buff so we only access one single cacheline in the
   conntrack hotpath. Patchset from Florian Westphal.

2) Don't leak pointers to internal structures when exporting the
   x_tables ruleset back to userspace, from Willem DeBruijn.
   This includes new helper functions to copy data to userspace, such as
   xt_data_to_user(), as well as conversions of our ip_tables, ip6_tables
   and arp_tables clients to use it. Not surprisingly, ebtables requires
   an ad-hoc update. There is also a new field in x_tables extensions to
   indicate the number of bytes that we copy to userspace.

3) Add nf_log_all_netns sysctl: this new knob allows you to enable
   logging via the nf_log infrastructure for all existing net
   namespaces. Given that the effort to provide pernet syslog has been
   discontinued, let's provide a way to restore logging using netfilter
   kernel logging facilities in trusted environments. Patch from Michal
   Kubecek.

4) Validate SCTP checksum from conntrack helper, from Davide Caratti.

5) Merge UDPlite conntrack and NAT helpers into UDP; this was mostly a
   copy&paste from the original helper, from Florian Westphal.

6) Reset netfilter state when duplicating packets, also from Florian.

7) Remove unnecessary check for broadcast in IPv6 in pkttype match and
   nft_meta, from Liping Zhang.

8) Add missing code to deal with loopback packets from nft_meta when
   used by the netdev family, also from Liping.

9) Several cleanups on nf_tables: one to remove an unnecessary check
   from the netlink control plane path to add table, set and stateful
   objects, and code consolidation when unregistering chain hooks, from
   Gao Feng.

10) Fix harmless reference counter underflow in IPVS that, however,
    results in problems with the introduction of the new refcount_t
    type, from David Windsor.

11) Enable LIBCRC32C from nf_ct_sctp instead of nf_nat_sctp, from
    Davide Caratti.

12) Missing documentation on nf_tables uapi header, from Liping Zhang.

13) Use rb_entry() helper in xt_connlimit, from Geliang Tang.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

2017-02-03  bridge: vlan dst_metadata hooks in ingress and egress paths
Roopa Prabhu, 6 files, -2/+82

- ingress hook:
  - if port is a tunnel port, use tunnel info in attached dst_metadata
    to map it to a local vlan
- egress hook:
  - if port is a tunnel port, use tunnel info attached to vlan to set
    dst_metadata on the skb

CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2017-02-03  bridge: per vlan dst_metadata netlink support
Roopa Prabhu, 7 files, -48/+641

This patch adds support to attach per vlan tunnel info dst metadata.
This enables the bridge driver to map vlan to tunnel_info at ingress and
egress. It uses the kernel dst_metadata infrastructure.

The initial use case is vlan to vni bridging, but the api is generic
enough to extend to any tunnel_info in the future:
- Uapi to configure/unconfigure/dump per vlan tunnel data
- netlink functions to configure vlan and tunnel_info mapping
- Introduces bridge port flag BR_LWT_VLAN to enable attach/detach of
  dst_metadata to bridged packets on ports. Off by default.
- changes to existing code are mainly a refactor of some existing vlan
  handling netlink code, plus hooks for the new vlan tunnel code
- I have kept the vlan tunnel code isolated in separate files.
- most of the netlink vlan tunnel code is handling of vlan-tunid ranges
  (it follows the vlan range handling code). To conserve space,
  vlan-tunid mappings are by default always dumped in ranges if
  applicable.

Use case:
example use for this is a vxlan bridging gateway or vtep which maps
vlans to vn-segments (or vnis).
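The range dumping mentioned in the last bullet can be pictured with a small C sketch. It assumes hypothetical types (`struct vlan_tun_map`, `struct vlan_tun_range`) and a hypothetical helper `coalesce_vlan_tun()`, not the bridge driver's own structures or functions: mappings sorted by VLAN id are merged into one range whenever both the vid and the tunnel id advance by exactly one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; not the bridge driver's structures. */
struct vlan_tun_map {
	uint16_t vid;    /* VLAN id */
	uint32_t tunid;  /* tunnel id, e.g. a VXLAN VNI */
};

struct vlan_tun_range {
	uint16_t vid_begin;
	uint16_t vid_end;
	uint32_t tunid_begin;  /* tunid of vid_begin; later vids map to tunid_begin + offset */
};

/* Merge mappings sorted by vid into ranges in which vid and tunid both
 * advance by exactly one per entry. Writes at most max_out ranges to out
 * and returns how many were written. */
size_t coalesce_vlan_tun(const struct vlan_tun_map *m, size_t n,
			 struct vlan_tun_range *out, size_t max_out)
{
	size_t nr = 0;

	assert(out != NULL || max_out == 0);
	for (size_t i = 0; i < n; i++) {
		if (nr > 0) {
			struct vlan_tun_range *r = &out[nr - 1];
			uint32_t len = (uint32_t)(r->vid_end - r->vid_begin) + 1;

			/* Extend the open range if this entry continues both sequences. */
			if (m[i].vid == r->vid_end + 1 &&
			    m[i].tunid == r->tunid_begin + len) {
				r->vid_end = m[i].vid;
				continue;
			}
		}
		if (nr == max_out)
			break;  /* no room left for a new range */
		out[nr].vid_begin = m[i].vid;
		out[nr].vid_end = m[i].vid;
		out[nr].tunid_begin = m[i].tunid;
		nr++;
	}
	return nr;
}
```

With the mapping from this commit's example (vlan 100 to vni 1000, vlan 101 to vni 1001), the two entries collapse into a single range covering vids 100-101 starting at tunnel id 1000; a gap in either the vid or the tunnel id starts a new range.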
iproute2 example (patched and pruned iproute2 output to just show the
relevant fdb entries): the example shows the same host mac learnt on two
vnis; vlan 100 maps to vni 1000 and vlan 101 maps to vni 1001.

before (netdev per vni):
$bridge fdb show | grep "00:02:00:00:00:03"
00:02:00:00:00:03 dev vxlan1001 vlan 101 master bridge
00:02:00:00:00:03 dev vxlan1001 dst 12.0.0.8 self
00:02:00:00:00:03 dev vxlan1000 vlan 100 master bridge
00:02:00:00:00:03 dev vxlan1000 dst 12.0.0.8 self

after this patch with collect metadata in bridged mode (single netdev):
$bridge fdb show | grep "00:02:00:00:00:03"
00:02:00:00:00:03 dev vxlan0 vlan 101 master bridge
00:02:00:00:00:03 dev vxlan0 src_vni 1001 dst 12.0.0.8 self
00:02:00:00:00:03 dev vxlan0 vlan 100 master bridge
00:02:00:00:00:03 dev vxlan0 src_vni 1000 dst 12.0.0.8 self

CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2017-02-02  netfilter: allow logging from non-init namespaces
Michal Kubeček, 1 file, -1/+1

Commit 69b34fb996b2 ("netfilter: xt_LOG: add net namespace support for
xt_LOG") disabled logging packets using the LOG target from non-init
namespaces. The motivation was to prevent containers from flooding the
kernel log of the host. The plan was to keep it that way until a syslog
namespace implementation allows containers to log in a safe way.

However, the work on syslog namespace seems to have hit a dead end
somewhere in 2013, and there are users who want to use xt_LOG in all
network namespaces. This patch allows doing so by setting
/proc/sys/net/netfilter/nf_log_all_netns to a nonzero value. This sysctl
is only accessible from init_net so that one cannot switch the behaviour
from inside a container.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
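Item 1 of the netfilter pull request in this log stashes the 3-bit ctinfo into the pointer to the nf_conntrack object held in the sk_buff, so the conntrack hot path touches a single cache line. The general technique is pointer tagging. The C sketch here is a generic illustration, not the kernel's implementation: the names (`struct conn`, `tag_pack()`, `tag_ptr()`, `tag_info()`, `INFO_MASK`, `PTR_MASK`) are hypothetical, and it assumes the object is at least 8-byte aligned so that the three low address bits are always zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the kernel defines its own field and masks. */
#define INFO_BITS 3u
#define INFO_MASK ((uintptr_t)((1u << INFO_BITS) - 1u))  /* 0x7: low 3 bits */
#define PTR_MASK  (~INFO_MASK)                          /* everything else */

/* Stand-in for a connection-tracking object. _Alignas(8) guarantees the
 * three low address bits are zero, leaving room for the tag. */
struct conn {
	_Alignas(8) int id;
};

/* Pack an object pointer and a 3-bit state value into one word. */
uintptr_t tag_pack(struct conn *ct, unsigned int info)
{
	assert(((uintptr_t)ct & INFO_MASK) == 0);  /* low bits must be free */
	assert(info <= INFO_MASK);                 /* value must fit in 3 bits */
	return (uintptr_t)ct | info;
}

/* Recover the object pointer by clearing the tag bits. */
struct conn *tag_ptr(uintptr_t word)
{
	return (struct conn *)(word & PTR_MASK);
}

/* Recover the 3-bit state value. */
unsigned int tag_info(uintptr_t word)
{
	return (unsigned int)(word & INFO_MASK);
}
```

Decoding is just two masks applied to one word, so reading both the object pointer and the state needs no second field; the price is that the object's alignment must keep the low bits free, which the `assert` in `tag_pack()` checks in debug builds.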