Latest revision as of 01:20, 18 February 2006
Linux 2.6 TCP/IP Receive Stack
Todo for this doc:
- finish socket functions
- not all of our current timer points
Before process_backlog
The network device interface driver's ISR (interrupt service routine) calls netif_rx, which passes the incoming packet to the input packet queuing layer. Packets are stored in per-CPU softnet_data structures. The queuing layer manages the softirqs that process input packets taken directly from the device driver; here the term softirq refers to the two queue-processing threads. The receive thread is named NET_RX_SOFTIRQ. Its action function, net_rx_action, removes the packet from the queue and passes it to the packet handlers (which provide flow control for incoming packets). The packet handler is the backlog device (blog_dev), and the function that does the work in the backlog device is process_backlog.
[See Herbert pp. 184-192, and diagram on pg. 192]
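The hand-off described above can be sketched as a bounded per-CPU queue: the ISR enqueues, and a full queue means the packet is dropped. This is a toy user-space model, not kernel code; the names pkt_queue and toy_netif_rx are our own.

```c
#include <stddef.h>

#define MAX_BACKLOG 4   /* real limit is netdev_max_backlog, ~1000 */

struct pkt_queue {
    int pkts[MAX_BACKLOG];  /* stand-in for a list of sk_buffs */
    size_t len;
    unsigned long dropped;
};

/* Returns 0 if the packet was queued, -1 if it was dropped. */
int toy_netif_rx(struct pkt_queue *q, int pkt)
{
    if (q->len >= MAX_BACKLOG) {  /* queue full: drop and count it */
        q->dropped++;
        return -1;
    }
    q->pkts[q->len++] = pkt;      /* enqueue; the real code also raises
                                     NET_RX_SOFTIRQ here */
    return 0;
}
```

The point of the bounded queue is flow control: under overload, drops happen at the cheapest possible place, before any protocol processing is spent on the packet.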
process_backlog
net/core/dev.c:1819
This function loops through all the packets on the backlog device's input packet queue. It is prevented from spinning on the processor by a time budget. It also disables hardware interrupts while manipulating the input packet queue, because the ISR places packets directly from the card onto that queue.
[Herbert pp. 194-196]
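The budget idea can be modeled with a simple quota on how many packets one softirq run may drain (the real function also has a jiffies-based time budget, which this toy sketch omits; the function name is ours).

```c
/* Toy model of process_backlog's budget: drain at most 'quota' packets
 * per softirq invocation so one busy device cannot monopolize the CPU.
 * Returns the number of packets processed this run. */
int toy_process_backlog(int *pending, int quota)
{
    int done = 0;
    while (*pending > 0 && done < quota) {
        (*pending)--;   /* "process" one packet off the input queue */
        done++;
    }
    /* If work remains, the real code leaves the softirq scheduled so
     * the remainder is handled on a later run. */
    return done;
}
```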
netif_receive_skb
net/core/dev.c:1720
The protocol field of the skb (taken from the link-layer header) is compared against the registered protocol handler values. If there is a match, the packet is passed to the matching protocol handler; if there is no match, the packet is dropped. The TPR_NET_START timer point is located before the comparison and TPR_NET_END after it.
[Herbert pp. 197-199]
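The dispatch step amounts to a lookup in a table of registered handlers keyed by the link-layer protocol value. A toy sketch (ETH_P_IP's real value is 0x0800; the table and function names here are our simplification of the kernel's packet-type lists):

```c
#include <stddef.h>

#define ETH_P_IP 0x0800

typedef int (*pkt_handler)(int pkt);

struct ptype { int proto; pkt_handler func; };

/* Stand-in for ip_rcv: just transforms the packet so we can see it ran. */
static int ip_handler(int pkt) { return pkt + 1; }

static struct ptype ptype_table[] = { { ETH_P_IP, ip_handler } };

/* Returns the handler's result, or -1 ("dropped") if no protocol matches. */
int toy_netif_receive_skb(int proto, int pkt)
{
    for (size_t i = 0; i < sizeof(ptype_table) / sizeof(ptype_table[0]); i++)
        if (ptype_table[i].proto == proto)
            return ptype_table[i].func(pkt);
    return -1;
}
```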
ip_rcv
net/ipv4/ip_input.c:371
As the main input function for the IP protocol, ip_rcv takes an incoming packet (in the form of an sk_buff) from netif_receive_skb. It verifies that the packet is IPv4 and that the header checksum is valid; if either check fails, the packet is discarded. The timer point TPR_IP_START is located at the beginning of this function.
[Herbert pp. 447-450, diagram pg. 448]
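The checksum check is the standard RFC 791 one's-complement sum over the header, taken as 16-bit words: a valid header (including its checksum field) sums to 0xffff after folding the carries. A self-contained sketch (the kernel's real routine is the optimized ip_fast_csum):

```c
#include <stdint.h>
#include <stddef.h>

/* Returns 1 if the one's-complement sum over the header words
 * (checksum field included) folds to 0xffff, i.e. the header is valid. */
int ip_header_ok(const uint16_t *hdr, size_t words)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < words; i++)
        sum += hdr[i];
    while (sum >> 16)                       /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return sum == 0xffff;
}
```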
ip_rcv_finish
net/ipv4/ip_input.c:291
Internal and external routing is performed in the ip_rcv_finish function. After the destination of the packet is determined, the packet is passed to ip_local_deliver.
[Herbert pp. 450-453]
ip_local_deliver
net/ipv4/ip_input.c:271
This function reassembles the packet (if it was fragmented) and sends the packet to ip_local_deliver_finish. The TPR_IP_END timer point is hit after the packet is reformed.
[Herbert pg. 453]
ip_local_deliver_finish
net/ipv4/ip_input.c:201
The ip_local_deliver_finish function determines which higher-level protocol will receive the packet, how many protocols will receive it (in special cases), and which protocols will receive a clone of it; it also sends the packet to any open raw sockets. The packet is handed to a higher-level protocol by passing its data to the registered packet-handling function (each protocol must register a handling function when it registers itself).
Instead of the TPR_IP_END timer point being in the ip_local_deliver function, it might be more appropriately placed in the ip_local_deliver_finish function in front of the call to the protocol packet handler.
[Herbert pp. 454-456]
TCP Queues
TCP has three queues: the receive queue, the backlog queue, and the prequeue. Normally (that is, when the receive queue is not full and the user's receive socket is not in use) the receive queue takes the packet and passes it to the socket. If the receive queue is full or the user task has the socket locked, packets are placed in the backlog queue. The packet is sent to the prequeue via the tcp_prequeue function when header prediction determines that the packet is an in-order segment containing data and the socket is in the established state. The receive/backlog queue route is called the "slow" path and the prequeue route the "fast" path.
[Herbert pp. 478-483,diagram on pg. 473]
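The queue selection described above can be condensed into a small decision function. This follows the text's description rather than the exact kernel control flow, and the enum and function names are ours:

```c
enum rx_queue { RECEIVE_QUEUE, BACKLOG_QUEUE, PREQUEUE };

/* Pick a TCP receive queue per the rules above: a locked socket sends
 * the packet to the backlog; an established connection with an in-order
 * data segment takes the prequeue fast path; everything else goes
 * through the normal receive path. */
enum rx_queue choose_queue(int sock_locked, int established, int in_order_data)
{
    if (sock_locked)
        return BACKLOG_QUEUE;
    if (established && in_order_data)
        return PREQUEUE;
    return RECEIVE_QUEUE;
}
```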
tcp_v4_rcv
net/ipv4/tcp_ipv4.c:1734
Among many other things, the tcp_v4_rcv function determines if the packet should take the fast or slow route through TCP. Other actions taken by this function are error checking, determining if the socket is able to accept the packet, and calling functions to handle TCP state. The TPR_IP_TCP timer point is located at the beginning of this function.
[Herbert pp. 472-478]
tcp_prequeue
include/net/tcp.h:1592
The tcp_prequeue function gives the packet to the socket and sets the ACK to be sent back (it is piggybacked on the next data segment). If the prequeue is full, the packet is sent to the TCP backlog device; if not, the packet is placed in the prequeue.
[Herbert pp. 478-482]
tcp_v4_do_rcv
net/ipv4/tcp_ipv4.c:1683
As the backlog receive function for TCP, tcp_v4_do_rcv is called when the socket is unable to receive incoming packets directly. If the TCP state is ESTABLISHED at this point (as the header prediction suggested), the packet is sent to be fully processed by TCP via tcp_rcv_established. Packets for other TCP states (ACK, SYN, etc.) are identified and passed off to state-handling functions.
[Herbert pp. 482-483]
tcp_rcv_established
net/ipv4/tcp_input.c:4214
The tcp_rcv_established function does the bulk of the work for packets that contain data (i.e., more than just TCP state packets). The actual header prediction happens in this function, and the final check for taking the fast path is also applied to the packet here. If the fast path is taken, the packet data is copied directly to user space. If the slow path is taken, the packet data is copied to the socket queue (and will then have to be copied again to user space - this is my guess as to what exactly they mean by the fast path). Other functions are called to send out ACKs if necessary. Timer point TPR_TCP_SOCK1 is located in the fast path after the copy to user memory but before the potential sending of ACKs; TPR_TCP_SOCK2 is similarly located in the slow path.
On line 4325, there is a tcp_rcv_rtt_measure_ts function. This might be interesting to poke at as it might have an alternate take on our timing information.
[Herbert pp. 493-501 , diagram pg. 494]
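Header prediction itself is cheap: the kernel precomputes a word of expected flags (data offset, ACK bit, window) and fast-paths a segment only if its fourth header word matches that prediction and its sequence number is exactly the next one expected. A simplified sketch, with our own struct layout rather than the real tcphdr:

```c
#include <stdint.h>

struct toy_seg {
    uint32_t seq;         /* segment's sequence number */
    uint32_t flags_word;  /* 4th 32-bit word of the TCP header */
};

/* Returns 1 if the segment matches the precomputed prediction and is
 * exactly the next in-order segment (seq == rcv_nxt). */
int header_predicted(const struct toy_seg *s,
                     uint32_t pred_flags, uint32_t rcv_nxt)
{
    return s->flags_word == pred_flags && s->seq == rcv_nxt;
}
```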
tcp_data_queue
"Queues up data in the socket's normal receive queue" and "puts segments that are out of order on the out_of_order_queue."
[Herbert pg. 501]
tcp_recvmsg
net/ipv4/tcp.c:1213
When data is put on the socket queue, the user task receives a signal meaning that there is data to read. The user task uses system receive or read calls to open the socket. These system calls are translated to tcp_recvmsg, which copies the information stored in the socket to a user buffer. This function also takes care of the user closing the socket and sends the proper state packets.
[Herbert pp. 508-516]
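The core of the copy-out can be modeled as draining queued bytes into the caller's buffer until the buffer is full or the queue is empty. This toy sketch flattens the socket's segment list into one byte stream and uses memcpy where the kernel would use copy_to_user:

```c
#include <stddef.h>
#include <string.h>

struct toy_rcv_queue {
    const char *data;  /* queued socket data, flattened */
    size_t len;        /* total bytes queued */
    size_t read;       /* bytes already consumed */
};

/* Copy up to buflen bytes of unread data into buf; returns bytes copied. */
size_t toy_tcp_recvmsg(struct toy_rcv_queue *q, char *buf, size_t buflen)
{
    size_t avail = q->len - q->read;
    size_t n = avail < buflen ? avail : buflen;
    memcpy(buf, q->data + q->read, n);  /* kernel: copy_to_user */
    q->read += n;
    return n;
}
```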
tcp_read_sock
From the comment in net/ipv4/tcp.c:
/*
 * This routine provides an alternative to tcp_recvmsg() for routines
 * that would like to handle copying from skbuffs directly in 'sendfile'
 * fashion.
 */
udp_rcv
net/ipv4/udp.c:1130
This is the registered function that receives the packet from IP. The packet header, checksum, and length are checked for errors. If no errors are detected, the packet is placed on the UDP receive queue by a call to udp_queue_rcv_skb. The TPR_UDP_START timer point is placed before the error checking, and the TPR_UDP_END timer point after the call to udp_queue_rcv_skb.
[Herbert pp. 459-462]
udp_queue_rcv_skb
net/ipv4/udp.c:1002
This function deals with encapsulated packets, completes the packet checksum process, and determines if there is enough space in the socket for the packet. If there is not enough space, the packet is dropped. The sock_queue_rcv_skb function is called to place the packet information in the socket.
[Herbert pp. 466-467]
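The space check amounts to receive-buffer accounting: a packet is accepted only if the memory already charged to the socket plus this packet stays within the socket's receive-buffer limit. A toy sketch of that rule (field and function names are ours; the kernel's counterparts are sk_rmem_alloc and sk_rcvbuf):

```c
struct toy_sock {
    long rmem_alloc;  /* bytes already charged to the socket */
    long rcvbuf;      /* receive-buffer limit */
};

/* Returns 0 and charges the packet on success, -1 (drop) if the
 * packet would overrun the socket's receive buffer. */
int toy_queue_rcv(struct toy_sock *sk, long pkt_size)
{
    if (sk->rmem_alloc + pkt_size > sk->rcvbuf)
        return -1;
    sk->rmem_alloc += pkt_size;
    return 0;
}
```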
udp_recvmsg
When receiving UDP data, the socket calls the UDP socket receive function, udp_recvmsg. This function checks for socket errors on the error message queue, dequeues packets from the socket's receive queue, and calls skb_copy_datagram_iovec to copy the datagram into the user's buffer. This may be the best place for TPR_UDP_END and TPR_SOCKET_START.
[Herbert pp. 467-470]
Linux 2.6 TCP/IP Send Stack
sys_send
socket.c
sys_sendto
socket.c
sock_sendmsg
socket.c
__sock_sendmsg
socket.c
inet_sendmsg
af_net.c
UDP
udp_sendmsg
some routing is done here
udp.c
ip_append_data
ip_output.c
ip_generic_getfrag
ip_output.c
???
not sure how this path completes, but eventually we end up at ip_output (as with TCP packets)
TCP
tcp_sendmsg
tcp.c
tcp_transmit_skb
tcp_output.c
ip_queue_xmit
ip_output.c
???
ip_output
ip_output.c
ip_finish_output
ip_output.c
ip_finish_output2
calls hh->hh_output or dst->output; both are function pointers that lead to dev_queue_xmit
ip_output.c
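The hand-off through a function pointer can be sketched in miniature: once neighbour/hardware-header state is resolved, the output hook points at the device transmit routine. All names below are simplified stand-ins for the kernel's structures:

```c
struct toy_skb { int id; int transmitted; };

/* Stand-in for dev_queue_xmit: marks the packet as handed to the driver. */
static int toy_dev_queue_xmit(struct toy_skb *skb)
{
    skb->transmitted = 1;
    return 0;
}

struct toy_dst {
    int (*output)(struct toy_skb *);  /* like hh->hh_output / dst->output */
};

/* Toy ip_finish_output2: transmit via the dst's output hook. */
int toy_ip_finish_output2(struct toy_dst *dst, struct toy_skb *skb)
{
    return dst->output(skb);
}
```

The indirection is what lets the same IP output code work over any link layer: only the pointer target changes.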
dev_queue_xmit
dev.c
cf. Herbert pp. 107-110
hard_start_xmit
member of net_device struct (implemented by each network driver); see how we hook into it in ip_output.c
cf. Herbert p. 100