Llk QA
Linux 2.6 TCP/IP Receive Stack
Contents
Before process_backlog
Network device interface driver's ISR (interrupt service routine) calls netif_rx which passes the incoming packet to the input packet queing layer. Packets are stored in Softnet_data data structures. Queuing layer manages the softIRQs that process input packets taken directly from the device driver. Note that the term softIRQs refer to the two queue processing threads. The receive thread is named NET_RX_SOFTIRQ. This thread's action function, net_rx_action, removes the packet from the queue and passes it to the packet handlers (which provides flow control for incoming packets). The packet handler is the backlog device (blog_dev). The function that does the work in the backlog device is process_backlog.
[See Herbert pp. 184-192, and diagram on pg. 192]
process_backlog
net/core/dev.c:1819
This function loops through all the packets on the backlog device's input packet queue. This function is prevented from spinning on the processor by being time budgeted. It also disables hardware interrupts to protect the input packet queue because the ISR places the packet directly from the card to the queue.
[Herbert pp. 194-196]
netif_receive_skb
net/core/dev.c:1720
The protocol field of the skb (taken from the link layer header) is compared to the registered protocol handler values. If there is a match, the packet is passed to the matching protocol handler. If there is no match, the packet is dropped. The TPR_NET_START is located before the comparision and TPR_NET_END is located after.
Options that manifest themselves in this function: CONFIG_NETPOLL CONFIG_NET_CLS_ACT
[Herbert pp. 197-199]
ip_rcv
net/ipv4/ip_input.c:371
As the main input function for the IP protocol, ip_rcv takes an incoming packet (in the form of an sk_buff) from netif_receive_skb. This function checks the header checksum to ensure that the packet is IPV4 and that the checksum is valid. If either fails, the packet is discarded. The timer point TPR_IP_START is located at the beginning of this function.
[Herbert pp. 447-450, diagram pg. 448]
ip_rcv_finish
net/ipv4/ip_input.c:291
Internal and external routing is performed in the ip_rcv_finish function. After the destination of the packet is determined, the packet is passed to ip_local_deliver.
[Herbert pp. 450-453]
ip_local_deliver
net/ipv4/ip_input.c:271
This function reassembles the packet (if it was fragmented) and sends the packet to ip_local_deliver_finish. The TPR_IP_END timer point is hit after the packet is reformed.
[Herbert pg. 453]
ip_local_deliver_finish
net/ipv4/ip_input.c:201
The ip_local_deliver_finish function determines which higher level protocol will received the packets, how many protocols will receive the packet (in special cases), which protocols will received a clone of the packet, and it sends the packet to any open raw sockets. The packet is sent to a higher level protocol by passing the packet's data to the registered packet handling function (each protocol must register a handling function when it is being registered).
Instead of the TPR_IP_END timer point being in the ip_local_deliver function, it might be more appropriately placed in the ip_local_deliver_finish function in front of the call to the protocol packet handler.
[Herbert pp. 454-456]
TCP Queues
TCP has three queues: the receive queue, backlog queue, and prequeue. Normally (defined as the receive queue not being full and the user receive socket is not in use) the receive queue takes the packet and passes it to the socket. If the receive queue is full or the user task has the socket locked, the packets are placed in the backlog queue. The packet is sent to the prequeue via the tcp_prequeue function when both packet header prediction determines the packet packet is an in-order segment containing data and the socket is in the established state. The receive/backlog queue route is called the "slow" path and the prequeue route is called the "fast" path.
[Herbert pp. 478-483,diagram on pg. 473]
tcp_v4_rcv
net/ipv4/tcp_ipv4.c:1734
Among many other things, the tcp_v4_rcv function determines if the packet should take the fast or slow route through TCP. Other actions taken by this function are error checking, determining if the socket is able to accept the packet, and calling functions to handle TCP state. The TPR_IP_TCP timer point is located at the beginning of this function.
[Herbert pp. 472-478]
tcp_prequeue
include/net/tcp.h:1592
The tcp_prequeue function gives the packet to the socket and sets the ACK to be sent back (it is piggybacked on the next data segment). If the prequeue is full, the packet is sent to the TCP backlog device. If the prequeue is not full the packet is placed on in the prequeue.
[Herbert pp. 478-482]
tcp_v4_do_rcv
net/ipv4/tcp_ipv4.c:1683
As the backlog receive function for TCP, tcp_v4_do_rcv is called when the socket is unable to receive incoming packets. If the TCP state is ESTABLISHED at this point (as the header prediction suggested), the packet is sent to be fully processed by TCP via tcp_rcv_established. Other TCP state packets (ACK, SYN, etc) are determined and passed off to state handling functions.
[Herbert pp. 482-483]
tcp_rcv_established
net/ipv4/tcp_input.c:4214
The tcp_rcv_established function does the bulk of the work with packets that contain data (aka more than TCP state packets). The actual header prediction happens in this function. The final check for taking the fast path is also administered to the packet in this function. If the fast path is taken, the packet data is copied to user space. If the slow path is taken, the packet data is copied to the socket queue (which will then have to be copied again to user space - this is my guess as what they mean exactly by fast path). Other functions are called to send out ACKs if necessary. Timer point TPR_TCP_SOCK1 is located in the fast path after the copy to user memory but before the potential sendig of ACKs. The TPR_TCP_SOCK2 is similarly located in the slow path.
On line 4325, there is a tcp_rcv_rtt_measure_ts function. This might be interesting to poke at as it might have an alternate take on our timing information.
[Herbert pp. 493-501 , diagram pg. 494]
tcp_rcvmsg
net/ipv4/tcp.c:1213
When data is put on the socket queue, the user task receives a signal meaning that there is data to read. The user task uses system receive or read calls to open the socket. These system calls are translated to tcp_rcvmsg which copies the information stored in the socket to a user buffer. This function also takes care of the user closing the socket and sending the proper state packets.
[Herbert pp. 508-516]
udp_rcv
net/ipv4/udp.c:1130
[Herbert pp. 459-462]