Bhyve setup for tcp testing

Posted: January 9th, 2017 | Filed under: FreeBSD, tcp, virtualization

Here is how I test simple FreeBSD TCP changes with dummynet on bhyve. I’ve already written down how I do dummynet, so I’ll focus on the bhyve part here.

Caution: the Handbook entry on bhyve is the true source; please refer to it for exact information. This post is super quick and may contain not-entirely-correct things. Also, I am lazy and all this config is just what I am using, so you may need to tweak a bit here and there.

Setup:
I’ll create 3 bhyve guests: client, router and server:

client              router            server
192.168.1.227       192.168.1.228     192.168.1.229
10.10.10.10  <--->  10.10.10.11
                    10.10.11.11 <---> 10.10.11.10

Here, the 192.* addresses are for ssh access and the 10.* addresses are for the guests to communicate among themselves.

First, create tap interfaces needed for all bhyve guests:

client has tap0 (ssh), tap1
router has tap2 (ssh), tap3, tap4
server has tap5 (ssh), tap6

ifconfig tap0 create
ifconfig tap1 create
ifconfig tap2 create
ifconfig tap3 create
ifconfig tap4 create
ifconfig tap5 create
ifconfig tap6 create

Now create bridge interfaces for the communication.

bridge0 contains re0, tap0, tap2, tap5
bridge1 contains tap1, tap3
bridge2 contains tap4, tap6

re0 is the host's physical interface here.

ifconfig bridge0 create
ifconfig bridge0 addm re0 addm tap0 addm tap2 addm tap5
ifconfig bridge0 up
ifconfig bridge1 create
ifconfig bridge1 addm tap1 addm tap3
ifconfig bridge1 up
ifconfig bridge2 create
ifconfig bridge2 addm tap4 addm tap6
ifconfig bridge2 up

bridge0 connects all the guests' management interfaces to re0 (the host interface) so they can all reach the outside network and we can ssh into them.

bridge1 connects client to router and bridge2 connects router to server.
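
If you want the taps and bridges to persist across host reboots, something like this in the host's /etc/rc.conf should do it (a sketch assuming the standard rc.conf cloned_interfaces handling; adjust interface names to your setup):

cloned_interfaces="tap0 tap1 tap2 tap3 tap4 tap5 tap6 bridge0 bridge1 bridge2"
ifconfig_bridge0="addm re0 addm tap0 addm tap2 addm tap5 up"
ifconfig_bridge1="addm tap1 addm tap3 up"
ifconfig_bridge2="addm tap4 addm tap6 up"

You can verify the members afterwards with 'ifconfig bridge0'.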

Now, let’s create the VMs. First, a disk image for each:

truncate -s 10G client.img
truncate -s 10G router.img
truncate -s 10G server.img

Set up/install the VMs:

sh /usr/share/examples/bhyve/vmrun.sh -c 2 -m 2048M -t tap0 -t tap1 -d client.img -i -I iso client
sh /usr/share/examples/bhyve/vmrun.sh -c 2 -m 2048M -t tap2 -t tap3 -t tap4 -d router.img -i -I iso router
sh /usr/share/examples/bhyve/vmrun.sh -c 2 -m 2048M -t tap5 -t tap6 -d server.img -i -I iso server

Here, ‘iso’ is the path to the ISO image that you want to install from, and the last arguments – client, router, server – are the VM names.
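
For example, assuming you have fetched the amd64 disc1 install image into the current directory (the ISO filename below is just an illustration):

sh /usr/share/examples/bhyve/vmrun.sh -c 2 -m 2048M -t tap0 -t tap1 -d client.img -i -I FreeBSD-11.0-RELEASE-amd64-disc1.iso client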

Start the VMs:

sh /usr/share/examples/bhyve/vmrun.sh -c 2 -m 2048M -t tap0 -t tap1 -d client.img client
sh /usr/share/examples/bhyve/vmrun.sh -c 2 -m 2048M -t tap2 -t tap3 -t tap4 -d router.img router
sh /usr/share/examples/bhyve/vmrun.sh -c 2 -m 2048M -t tap5 -t tap6 -d server.img server

Stop a VM:

bhyvectl --force-poweroff --vm=<name>
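
For example, to forcibly power off the client VM and then destroy it to release its resources:

bhyvectl --force-poweroff --vm=client
bhyvectl --destroy --vm=client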

To set up networking, you'd need the following in each guest's /etc/rc.conf:

client:
ifconfig_vtnet0="inet 192.168.1.227 netmask 255.255.255.0"
defaultrouter="192.168.1.1"
ifconfig_vtnet1="inet 10.10.10.10 netmask 255.255.255.0"
static_routes="inet1"
route_inet1="-host 10.10.11.10 10.10.10.11"

router:
ifconfig_vtnet0="inet 192.168.1.228 netmask 255.255.255.0"
defaultrouter="192.168.1.1"
ifconfig_vtnet1="inet 10.10.10.11 netmask 255.255.255.0"
ifconfig_vtnet2="inet 10.10.11.11 netmask 255.255.255.0"

server:
ifconfig_vtnet0="inet 192.168.1.229 netmask 255.255.255.0"
defaultrouter="192.168.1.1"
ifconfig_vtnet1="inet 10.10.11.10 netmask 255.255.255.0"
static_routes="inet1"
route_inet1="-host 10.10.10.10 10.10.11.11"

The static route entries make sure routes are set up correctly for the client and server to reach each other through the router.
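
Once a guest is up, you can sanity-check its routing table; on the client, for example, the server's address (10.10.11.10) should show up as a static host route (flags UGHS) via 10.10.10.11 on vtnet1:

netstat -rn -f inet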

The router also needs the following in /etc/sysctl.conf to be able to pass traffic between the client and the server:

net.inet.ip.forwarding=1
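
Alternatively, you can set this in the router's /etc/rc.conf so forwarding is enabled at every boot:

gateway_enable="YES"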

Try pinging the client from the server (or the other way around) to make sure networking is working:

root@server:~ # ping 10.10.10.10
PING 10.10.10.10 (10.10.10.10): 56 data bytes
64 bytes from 10.10.10.10: icmp_seq=0 ttl=63 time=0.718 ms
64 bytes from 10.10.10.10: icmp_seq=1 ttl=63 time=0.999 ms
64 bytes from 10.10.10.10: icmp_seq=2 ttl=63 time=0.553 ms
64 bytes from 10.10.10.10: icmp_seq=3 ttl=63 time=0.553 ms
^C
--- 10.10.10.10 ping statistics ---
4 packets transmitted, 4 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.553/0.706/0.999/0.182 ms

A working network setup on a guest looks something like this:

root@server:~ # ifconfig
vtnet0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=80028<VLAN_MTU,JUMBO_MTU,LINKSTATE>
        ether xx:xx:xx:xx:xx:xx
        inet 192.168.1.229 netmask 0xffffff00 broadcast 192.168.1.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet 10Gbase-T 
        status: active
vtnet1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=80028<VLAN_MTU,JUMBO_MTU,LINKSTATE>
        ether xx:xx:xx:xx:xx:xx
        inet 10.10.11.10 netmask 0xffffff00 broadcast 10.10.11.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet 10Gbase-T 
        status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        groups: lo

A working network setup on the host looks something like this:

tap0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=80000<LINKSTATE>
        ether xx:xx:xx:xx:xx:xx
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        groups: tap
        Opened by PID 26035
tap1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=80000<LINKSTATE>
        ether xx:xx:xx:xx:xx:xx
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        groups: tap
        Opened by PID 26035
tap2: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=80000<LINKSTATE>
        ether xx:xx:xx:xx:xx:xx
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        groups: tap
        Opened by PID 26093
tap3: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=80000<LINKSTATE>
        ether xx:xx:xx:xx:xx:xx
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        groups: tap
        Opened by PID 26093
tap4: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=80000<LINKSTATE>
        ether xx:xx:xx:xx:xx:xx
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        groups: tap
        Opened by PID 26093
tap5: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=80000<LINKSTATE>
        ether xx:xx:xx:xx:xx:xx
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        groups: tap
        Opened by PID 25977
tap6: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=80000<LINKSTATE>
        ether xx:xx:xx:xx:xx:xx
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        groups: tap
        Opened by PID 25977
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether xx:xx:xx:xx:xx:xx
        inet 192.168.1.224 netmask 0xffffff00 broadcast 192.168.1.255
        nd6 options=1<PERFORMNUD>
        groups: bridge
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: tap5 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 13 priority 128 path cost 2000000
        member: tap2 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 10 priority 128 path cost 2000000
        member: tap0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 8 priority 128 path cost 2000000
        member: re0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 5 priority 128 path cost 20000
bridge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether xx:xx:xx:xx:xx:xx
        nd6 options=1<PERFORMNUD>
        groups: bridge
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: tap3 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 11 priority 128 path cost 2000000
        member: tap1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 9 priority 128 path cost 2000000
bridge2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether xx:xx:xx:xx:xx:xx
        nd6 options=1<PERFORMNUD>
        groups: bridge
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: tap6 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 14 priority 128 path cost 2000000
        member: tap4 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 12 priority 128 path cost 2000000

Drop a packet

Posted: October 14th, 2015 | Filed under: FreeBSD, networking, tcp

A few months back, when I started looking into improving FreeBSD TCP’s response to packet loss, I looked around for traffic simulators that could do deterministic packet drops for me.

I had used dummynet(4) before, so I thought of using it, but the problem is that it only provides probabilistic drops: you can specify dropping 10% of the total packets, for example. I also came across the dpd work from CAIA, Swinburne University, but it was written for FreeBSD 7 and I couldn’t port it forward to FreeBSD 11 with reasonable time/effort, as ipfw/dummynet has changed quite a bit since.

So I decided to hack dummynet to provide deterministic drops. Here is the patch: drop.patch
(Yes, it’s a hack and it needs polishing.)

Here is how I use it:
Setup:

client              dummynet          server
10.10.10.10  <--->  10.10.10.11
                    10.10.11.11 <---> 10.10.11.12

Both client and server need their routing tables set up correctly so that they can reach each other.
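
If you configure them by hand rather than via rc.conf, the static routes would look something like this (mirroring the scheme from the bhyve post above):

On the client:
route add -host 10.10.11.12 10.10.10.11

On the server:
route add -host 10.10.10.10 10.10.11.11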

The dummynet node is the traffic-shaping node here. We need to enable forwarding between its interfaces:

sysctl net.inet.ip.forwarding=1

We need to set up links (called ‘pipes’) and their parameters on the dummynet node like this:

# ipfw add pipe 100 ip from 10.10.11.12 to 10.10.10.10 out 
# ipfw add pipe 101 ip from 10.10.10.10 to 10.10.11.12 out
# ipfw pipe 100 config mask proto TCP src-ip 10.10.11.12 dst-ip 10.10.10.10 pls 3,4,5 plsr 7
# ipfw pipe 101 config mask proto TCP src-ip 10.10.10.10 dst-ip 10.10.11.12

‘pls 3,4,5 plsr 7’ is the new configuration that the patch provides:
pls: packet loss sequence
plsr: repeat frequency for the loss pattern

In the example above, it configures pipe 100 to drop the 3rd, 4th and 5th packets going from server to client and to repeat this pattern every 7 packets. So it’d also drop the 10th, 11th and 12th packets, and so on and so forth.
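
You can list a pipe to double-check its configuration (though the stock ipfw output won't know about the pls/plsr fields unless the patch also touches the printing code):

# ipfw pipe 100 show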

Side note: delay, bw and queue depth are other very useful parameters that you can set to make the link behave however you want. For example, ‘delay 5ms bw 40Mbps queue 50Kbytes’ applied to both pipes would create a link with a 10ms RTT (5ms each way), 40Mbps bandwidth and 50Kbytes worth of queue depth/capacity. Queue depth is usually chosen based on the BDP (bandwidth-delay product) of the link; dummynet drops packets once the limit is reached.
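
As a concrete sketch, shaping both directions like this gives that 10ms RTT path, and the 50Kbytes queue matches the BDP: 40Mbit/s x 10ms = 400Kbits = 50Kbytes. (Keep the masks and the pls/plsr options from above in the same config line if you also want the deterministic drops.)

# ipfw pipe 100 config delay 5ms bw 40Mbit/s queue 50Kbytes
# ipfw pipe 101 config delay 5ms bw 40Mbit/s queue 50Kbytes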

For simulations, I run a lighttpd web server on the server that serves different-sized objects, and I request them via curl or wget from the client. I have tcpdump running on any/all of the four interfaces involved to observe traffic, and I can see the specified packets getting dropped by dummynet. The sysctl net.inet.ip.dummynet.io_pkt_drop counter is incremented with each packet that dummynet drops.
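
A typical run looks something like this; the object name is hypothetical, and the interface name (borrowed from the bhyve post above) depends on which hop you want to watch:

On the client (or any node on the path):
# tcpdump -i vtnet1 -w drop-test.pcap tcp and host 10.10.11.12

On the client, fetch an object:
# curl -o /dev/null http://10.10.11.12/obj-1m.bin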

Future work:
* Work on getting this patch committed into FreeBSD-head.
* sysctl net.inet.ip.dummynet.io_pkt_drop increments on any type of loss (including queue overflow and other random errors), so I am planning to add a more specific counter that counts only explicitly dropped packets.
* I’ve (unsuccessfully) tried adding deterministic delay to dummynet so that we can delay specific packets, which would be useful for simulating link delays and for debugging delay-based congestion control algorithms. It turns out it’s trickier than I thought; I’d like to resume working on it as time permits.


Improving FreeBSD’s transport layer

Posted: October 9th, 2015 | Filed under: FreeBSD, networking, tcp

The FreeBSD network stack is quite stable but lacks some of the improvements/features available in other OSes.

A bunch of us have started an effort to identify current problems and improve the transport layer (TCP, UDP, SCTP and others) in FreeBSD:

Transport Protocols wiki

Traditionally, freebsd-net has been the mailing list where networking problems get discussed, but some have complained that it is too spammy and too focused on NIC driver issues. So a new mailing list has been created specifically for transport-level protocols: transport@

We’ve also started creating a list of TCP-related RFCs and their support status in FreeBSD, to have a single point of reference.

The plan is to have a coordinated effort to improve TCP, UDP, etc., so if you are interested in any of those protocols, please join the mailing list and help FreeBSD. :-)