CodingBison - Socket Programming: TCP Sockets (Connection-oriented Sockets)

TCP sockets are connection-oriented sockets and hence, require an explicit association between the two socket endpoints. Two connected sockets learn the address of each other during the connection-setup and so they can send data to each other without having to specify the address of the other socket.

Having an explicit connection enables TCP to offer three important properties. First, if the network drops packets, then TCP retransmits the lost packets (reliability). Second, TCP adjusts its sending rate when the available network bandwidth changes (congestion control). Lastly, the TCP sender ensures that it does not send more data than the available receive buffer at the peer (the other endpoint) and thus does not overwhelm the receiver (flow control). If these services are critical for your application, then you should consider using TCP.

Depending upon who initiates the connection, a socket can be classified as a server socket or as a client socket. The server socket accepts incoming requests for new connections from client sockets. The client socket sends a request for a new connection. This section describes the APIs that allow us to create both types of sockets. Following that, we provide sample implementations as well.

TCP Socket Server

First off, let us discuss socket APIs that allow us to build a TCP server socket. Let us start with signature of these APIs (provided below), followed by their discussion and an implementation.

 int socket(int family, int type, int protocol); 
 int bind(int fd, const struct sockaddr *addr, socklen_t addrlen);
 int listen(int fd, int backlog);
 int accept(int fd, struct sockaddr *addr, socklen_t *addrlen);
 int close(int fd);

The socket() call creates a socket -- whether it is a server or a client -- and is usually the very first function in a socket application. This call accepts three parameters. The first parameter represents the address family, which can be AF_INET (for the IPv4 family of addresses), AF_INET6 (for the IPv6 family of addresses), AF_UNIX (for communicating with sockets on the local machine), or AF_PACKET (for sending/receiving packets directly to the network driver without having to go through the TCP/IP (or UDP/IP) stack). The second parameter is the socket type, which can be SOCK_STREAM (for TCP) or SOCK_DGRAM (for UDP). The last parameter is the transport protocol, which can be IPPROTO_TCP (for TCP) or IPPROTO_UDP (for UDP). Thus, if we want to create an IPv4 TCP socket, then these params would be AF_INET, SOCK_STREAM, and IPPROTO_TCP. For IPv6, they would be AF_INET6, SOCK_STREAM, and IPPROTO_TCP.

The return value of the socket() call is a file descriptor that is used to symbolically refer to this socket. All subsequent operations on this socket (like send or receive) must be done by passing this file descriptor. If this call runs into an error, then it returns a value of -1.
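
As a minimal sketch of the call and its error check, the helper below wraps socket() for a TCP socket. The name open_tcp_socket is our own choice for illustration and is not part of the socket API.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: create a TCP socket for the given address family
 * (AF_INET or AF_INET6). Returns the file descriptor, or -1 on error. */
int open_tcp_socket(int family) {
    int fd = socket(family, SOCK_STREAM, IPPROTO_TCP);
    if (fd == -1) {
        /* socket() returned no descriptor, so there is nothing to close */
        fprintf(stderr, "socket failed [%s]\n", strerror(errno));
    }
    return fd;
}
```

A caller would typically write fd = open_tcp_socket(AF_INET) and stop if the result is -1.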

The next three functions collectively characterize the steps needed to create a TCP server: bind(), listen(), and accept(). The bind() call allows the server to bind to a well-known port and an address so that clients can reach it. The listen() call allows the server to wait (or listen) for incoming connections from clients. Once a request arrives, the accept() call allows the server to retrieve a new connection.

Let us look at these three calls in more detail.

The bind() call takes three params: (a) the file descriptor of the socket that we need to bind, (b) a pointer to an address structure that holds the local port and local IP address, and (c) the length of the address buffer pointed by the address pointer. Upon success, the bind() call returns 0, else it returns -1.

Some thoughts on port numbers. Each host can have as many as 65,536 ports (numbered 0 to 65535) for each protocol and for each address family; typically, port numbers in the range of 0 to 1023 are well-known ports and are reserved for standard applications. Ports outside this range are usually available for general use. For a detailed list of well-known ports, please visit the Internet Assigned Numbers Authority (IANA) website.

Binding establishes the server socket uniquely in the entire Internet. Any remote socket client can send a connection request to this server using a combination of port, machine's IP address, and the protocol. The only requirement is that the combination of these three variables should be unique in the Internet. If we were to use the analogy of regular postal mail, then the bind() call would mean assigning (or using) a unique mailing address to a house. It is this uniqueness that allows the post office to deliver all the mail destined for this house correctly.
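
The uniqueness requirement can be observed on a single machine: a second socket that tries to bind to an address and port already in use gets an error. The sketch below assumes the default socket options (no SO_REUSEADDR); the helper name bind_loopback is hypothetical.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: bind a new TCP socket to 127.0.0.1:port.
 * Passing port 0 asks the kernel to choose a free port.
 * Returns the bound descriptor, or -1 with errno set. */
int bind_loopback(unsigned short port) {
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        int saved = errno;   /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

If one socket already holds a port, calling bind_loopback() again with that same port is expected to return -1 with errno set to EADDRINUSE.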

The next call, listen(), takes two parameters: (a) the file descriptor of the socket that needs to become a listener socket and (b) the maximum number of pending connections (the backlog). This call allows a socket to wait for new connections from remote client machines. Once a socket is in listen mode and it receives a request for a connection, then it completes TCP's 3-way handshake and enqueues the new connection in a queue that is reserved for pending connections. For our postal analogy, the listen() call means that the owner of the house puts up a mailbox so that the postman can deliver incoming mail. Upon success, the listen() call returns 0, else it returns -1.

The backlog limit is a good strategy to avoid Denial-Of-Service attacks. Without this limit, a malicious client can continuously (and at a fast rate) send connection requests. Since each connection requires both memory and CPU processing, this attack can seriously deplete memory and CPU resources at the server.

Even though having a backlog limit is a good thing, it is possible that the server can get busy and thus can take more time to dequeue new connections from the pending queue. For such cases, some of the genuine requests can also be dropped since the queue of pending connections might already be at the maximum backlog limit! We should not lose sleep over this because TCP has an inbuilt mechanism where the clients would retry and hopefully, with the next try, the server would have dequeued some of the pending connections.

One last thing about the backlog. The backlog value is bounded by an upper limit provided by the underlying operating system. On Linux, this limit is the net.core.somaxconn setting (readable at /proc/sys/net/core/somaxconn); its default was 128 for many years and is 4096 since Linux 5.4. Thus, if an application were to pass a value above this limit, then the underlying layer would silently reduce it to the limit.
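
To illustrate, here is a small sketch that puts a socket into listen mode with whatever backlog the caller asks for. An oversized value is not an error; the kernel applies its own cap. The helper name listen_with_backlog is hypothetical, and it binds to a kernel-chosen loopback port only to keep the sketch self-contained.

```c
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: create a TCP socket on 127.0.0.1 (kernel-chosen port)
 * and put it in listen mode with the requested backlog.
 * Returns the listening descriptor, or -1 on error. */
int listen_with_backlog(int backlog) {
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);                      /* any free port */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    /* listen() succeeds even when backlog exceeds the system limit */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        listen(fd, backlog) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Passing the constant SOMAXCONN from sys/socket.h is a common way to request a large backlog without hard-coding a number.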

The next step of accept() takes three params: (a) the file descriptor of the server socket, (b) a pointer to an address structure, and (c) a pointer to a socklen_t that holds the size of that structure on input and, on return, the actual length of the address stored in it. Upon success, the accept() call returns the file descriptor of the new (child) socket, else it returns -1.

The accept() call allows the application to retrieve a single connection from the queue of pending connections. The retrieval dequeues the first connection from the queue and hence, creates room for one additional future connection. This call returns a new file descriptor associated with the new connection. This file descriptor is different from that of the server and all socket operations for the new connection should be done using this descriptor. The accept() call is blocking because if there is no pending connection, then the calling thread must wait.

Even though we do not need to bind to any local address, the accept() call still takes "struct sockaddr" buffer as argument. The reason for passing an address buffer is that the accept call returns the address of the remote client (or peer) that initiates the connection. This way, the server application gets to know the IP address and the port number of the remote client socket. Not bad!
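
A sketch of how a server might read the peer address that accept() fills in is shown below. It assumes an IPv4 listener; the helper name accept_and_describe is hypothetical, and inet_ntop() from arpa/inet.h converts the binary address to text.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: accept one connection on listen_fd and write the
 * peer's "address:port" into out. Returns the new descriptor, or -1. */
int accept_and_describe(int listen_fd, char *out, size_t outlen) {
    struct sockaddr_in peer;
    socklen_t peerlen = sizeof(peer);   /* in: buffer size, out: bytes written */
    char ip[INET_ADDRSTRLEN];

    int new_fd = accept(listen_fd, (struct sockaddr *)&peer, &peerlen);
    if (new_fd == -1)
        return -1;

    if (inet_ntop(AF_INET, &peer.sin_addr, ip, sizeof(ip)) == NULL) {
        close(new_fd);
        return -1;
    }
    /* The port arrives in network byte order, so convert it with ntohs() */
    snprintf(out, outlen, "%s:%u", ip, (unsigned)ntohs(peer.sin_port));
    return new_fd;
}
```

For a client running on the same machine, the text written to out would start with 127.0.0.1 followed by the client's port.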

On error, all of these calls set the system errno variable, besides returning -1. We should make it a habit of printing the value of the error number, when we run into an error. The error information would prove handy during difficult times of debugging!
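
One way to make this a habit is to wrap each call so that a failure is logged with both the error number and its text. The sketch below does this for listen(); the helper name listen_or_report is hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: call listen() on fd and, on failure, log the errno
 * value and its text. Returns 0 on success, or the errno value on failure. */
int listen_or_report(int fd, int backlog) {
    if (listen(fd, backlog) == -1) {
        int saved = errno;  /* save first: later library calls may change errno */
        fprintf(stderr, "listen failed: errno %d [%s]\n", saved, strerror(saved));
        return saved;
    }
    return 0;
}
```

For instance, passing an invalid descriptor such as -1 makes listen() fail with EBADF, and the helper logs and returns that value.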

The following figure displays the above three steps of bind(), listen(), and accept() for server socket (socketA0) and its interaction with a client socket (socketB0); the client socket uses a connect() call and we will discuss that a little later. Note that the steps for client side may happen at the same machine.

Figure: Detailed Steps for TCP Connection Establishment

Once we are done with all the operations, we can call close() and pass the file descriptor of the socket that we wish to close. This call is essential since it releases all the resources held by the socket. For TCP, this call also informs the other end (the peer) that it is tearing down the connection. On success it returns 0, else it returns -1.

While the earlier calls have focused on creating a socket and setting up the connection, the next two calls allow a socket server to send and receive data. A client socket also uses the same functions to send and receive data, so these calls are actually common to both server and client sockets. Since sockets are bidirectional, both sides can send data and receive data simultaneously.

 ssize_t send(int fd, const void *buf, size_t bufsize, int flags);
 ssize_t recv(int fd, void *buf, size_t bufsize, int flags);

The send() call sends data to the other end of the connected socket pipe. This call takes four parameters: (a) the file descriptor of the sending socket, (b) a pointer to a buffer that contains the data we wish to send, (c) the number of bytes to send from that buffer, and (d) a set of flags that modify the call (0 for the default behavior). The underlying socket/TCP layer copies this data into the outgoing send buffer. Upon success, this call returns the number of bytes sent and on error, it returns -1.

If the data is larger than the space available in the send buffer, then the send() call can block until space frees up; the underlying socket/TCP layer maintains its own send buffer for each socket. However, if the socket is non-blocking and no space is available, then it would return -1 with the errno set to EAGAIN or EWOULDBLOCK.
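
Because send() reports how many bytes it actually took, and that number can be smaller than what we asked for, applications often wrap it in a loop. The following is a minimal sketch of that pattern for a blocking socket; the helper name send_all is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: keep calling send() until all len bytes are handed
 * to the socket layer. Returns 0 on success, -1 on error (errno is set). */
int send_all(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n == -1) {
            if (errno == EINTR)
                continue;      /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                /* skip the bytes that were accepted */
        len -= (size_t)n;
    }
    return 0;
}
```

A caller would use send_all(fd, msg, msg_len) wherever a single send() was used before.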

The recv() call allows an application to read received data. It takes four parameters: (a) the file descriptor of the connected socket (for our server, the descriptor returned by accept()), (b) a pointer to a buffer, (c) the length of the buffer storage, and (d) a set of flags (0 for the default behavior). We pass the buffer so that the underlying socket/TCP layer can copy received data (from the peer) into this buffer. Upon success, this call would return the number of bytes received and copied in the buffer. On error it would return -1. As a special case, if the recv() call returns zero, then that means the peer has (gracefully) closed the connection.

The recv() call is a blocking call by default. If there is no received data, then the calling thread will block till it receives some data or till the underlying connection is closed; like the send buffer, the underlying socket/TCP layer also maintains its own receive buffer for every socket. If the socket is non-blocking and if there is no received data, then recv() would return with -1 and the errno will be set to EAGAIN or EWOULDBLOCK. In that case, the application would have to retry later.
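
The non-blocking case can be sketched as follows. We assume the socket is switched to non-blocking mode with fcntl() and O_NONBLOCK; the helper names set_nonblocking and try_recv, and the -2 return code for "no data yet", are our own conventions for this illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: switch a descriptor to non-blocking mode.
 * Returns 0 on success, -1 on error. */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: try to read without blocking.
 * Returns bytes read (0 means the peer closed), -1 on a real error,
 * or -2 when no data is available yet and the caller should retry later. */
ssize_t try_recv(int fd, void *buf, size_t len) {
    ssize_t n = recv(fd, buf, len, 0);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;
    return n;
}
```

In a real program, the retry is usually driven by select(), poll(), or a similar readiness call rather than a busy loop.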

Depending upon the amount of data received by the TCP layer (let us say, k_rcvd), the size of the passed buffer (let us say, k_passed) may or may not be sufficient. If the passed buffer is at least as large as the data received (i.e. k_passed >= k_rcvd), then the TCP layer will return all the data received in the same buffer and the returned value would be k_rcvd. On the other hand, if the passed buffer is smaller than the data received (i.e. k_passed < k_rcvd), then TCP will only return k_passed bytes; the remaining k_rcvd - k_passed bytes stay in the receive buffer and are returned by subsequent recv() calls.
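
Since one recv() call may hand back only part of what the peer sent, code that expects a known number of bytes usually loops until it has them. Here is a minimal sketch of such a loop for a blocking socket; the helper name recv_n is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: read until want bytes have arrived or the peer
 * closes the connection. Returns the number of bytes read (which is less
 * than want only if the peer closed early), or -1 on error. */
ssize_t recv_n(int fd, void *buf, size_t want) {
    char *p = buf;
    size_t got = 0;
    while (got < want) {
        ssize_t n = recv(fd, p + got, want - got, 0);
        if (n == 0)
            break;             /* peer closed the connection */
        if (n == -1) {
            if (errno == EINTR)
                continue;      /* interrupted by a signal: retry */
            return -1;
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

For example, if the peer sends 10 bytes and we first read 4 of them with a 4-byte buffer, a following recv_n() call for 6 bytes collects the rest.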

The recv() call also accepts a flags argument that can be passed to tweak the behavior of the recv() call itself. Two of these flags are: MSG_DONTWAIT and MSG_PEEK. MSG_DONTWAIT specifies that if the underlying TCP has not received any data, then the call returns immediately with -1 (and the errno set to EAGAIN or EWOULDBLOCK) instead of blocking. MSG_PEEK means that the normal recv() call behavior would hold except that the TCP layer would not remove that much data from its receive buffer since the goal is only to peek; we would need a subsequent recv() call without the MSG_PEEK flag set to drain the data from the TCP receive buffer.

That completes our rather-long discussion on socket functions needed for a server. If you feel like grabbing a cup of coffee, I would understand! Let us move on and apply our skills to write a simple server. We present the example below and following that, we describe its various pieces.

 #include <stdio.h>
 #include <string.h>     /* strerror() */
 #include <errno.h>
 #include <unistd.h>     /* close() */
 #include <netinet/in.h>
 #include <sys/socket.h>

 #define DATA_BUFFER 5000

 int create_tcp_server_socket() {
     struct sockaddr_in saddr = {0};  /* start with all fields zeroed */
     int fd, ret_val;

     /* Step1: create a TCP socket */
     fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); 
     if (fd == -1) {
         /* Nothing to close: socket() did not return a valid descriptor */
         fprintf(stderr, "socket failed [%s]\n", strerror(errno));
         return -1;
     }
     printf("Created a socket with fd: %d\n", fd);

     /* Initialize the socket address structure */
     saddr.sin_family = AF_INET;         
     saddr.sin_port = htons(7000);     
     saddr.sin_addr.s_addr = htonl(INADDR_ANY); /* network byte order, like the port */

     /* Step2: bind the socket to port 7000 on the local host */
     ret_val = bind(fd, (struct sockaddr *)&saddr, sizeof(struct sockaddr_in));
     if (ret_val != 0) {
         fprintf(stderr, "bind failed [%s]\n", strerror(errno));
         close(fd);
         return -1;
     }

     /* Step3: listen for incoming connections */
     ret_val = listen(fd, 5);
     if (ret_val != 0) {
         fprintf(stderr, "listen failed [%s]\n", strerror(errno));
         close(fd);
         return -1;
     }
     return fd;
 }

 int main () {
     struct sockaddr_in new_client_addr;
     int fd, new_fd, ret_val;
     socklen_t addrlen = sizeof(new_client_addr); /* accept() reads this as the buffer size */
     char buf[DATA_BUFFER];

     /* Create the server socket */
     fd = create_tcp_server_socket(); 
     if (fd == -1) {
         fprintf(stderr, "Creating server failed [%s]\n", strerror(errno));
         return -1;
     }

     /* Accept a new connection */
     new_fd = accept(fd, (struct sockaddr*)&new_client_addr, &addrlen);
     if (new_fd == -1) {
         fprintf(stderr, "accept failed [%s]\n", strerror(errno));
         close(fd);
         return -1;
     }
     printf("Accepted a new connection with fd: %d\n", new_fd);

     /* Receive data */
     printf("Let us wait for the client to send some data\n");
     do {
         /* Leave room for a terminating NUL so that buf can be printed with %s */
         ret_val = recv(new_fd, buf, DATA_BUFFER - 1, 0);
         printf("Received data (len %d bytes)\n", ret_val);
         if (ret_val > 0) {
             buf[ret_val] = '\0';
             printf("Received data: %s\n", buf);
         }
         if (ret_val == -1) {
             printf("recv() failed [%s]\n", strerror(errno));
             break;
         }
     } while (ret_val != 0);

     /* Close the sockets */
     close(fd);
     close(new_fd);
     return 0;
 }

Now, let us describe various pieces of the above example.

The example includes "netinet/in.h" and "sys/socket.h" for definitions of socket address types, constants, and socket calls.

Next, the main() function calls the create_tcp_server_socket() function to create a TCP server socket. This function begins by using socket() call to create a TCP socket.

The create_tcp_server_socket() function binds the socket using a bind() call to a unique port number (7000). The function htons() converts the port number (port numbers are defined as unsigned short) from host byte order to network byte order. This is needed to handle different endianness of machines. For the address, it passes INADDR_ANY (equal to zero), which means that the server accepts connections arriving on any of the machine's local addresses.
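
To see what htons() does, we can look at the bytes it produces. Port 7000 is 0x1B58 in hexadecimal, so in network byte order (most significant byte first) it is stored as 0x1B followed by 0x58, whatever the endianness of the machine. The helper name port_wire_bytes below is hypothetical.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Hypothetical helper: copy the in-memory bytes of htons(port) into out.
 * In network byte order the most significant byte comes first. */
void port_wire_bytes(uint16_t port, unsigned char out[2]) {
    uint16_t net = htons(port);
    memcpy(out, &net, sizeof(net));
}
```

The reverse conversion, ntohs(), turns a port read from a socket address back into a host-order number.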

Following the bind() call, the create_tcp_server_socket() function uses the listen() call to make the server wait for incoming connections; it passes a backlog of 5. After that, it returns with the file descriptor of the server socket.

The main() issues an accept() call using the server file descriptor and waits for an incoming connection. Once the server receives an incoming request, the accept() call returns a new file descriptor (new_fd).

We should take a moment to note that the bind() and accept() calls take a pointer to "struct sockaddr" instead of "struct sockaddr_in". The reason for this is that "struct sockaddr" is an IPv4/IPv6 independent definition. For IPv6, we need to use "struct sockaddr_in6" as the socket address. However, for both sockaddr_in and sockaddr_in6, the first 2 bytes (which hold the address family) are common and are the same as the first 2 bytes of sockaddr. Thus, irrespective of sockaddr_in or sockaddr_in6, the first 2 bytes always identify the address family.

 struct sockaddr_in {
     short            sin_family;   // AF_INET 
     unsigned short   sin_port;     // port number in network byte order
     struct in_addr   sin_addr;     // IP address.
     char             sin_zero[8];  // zero this if you want to
 };
 struct sockaddr_in6 {
     u_int16_t       sin6_family;   // AF_INET6
     u_int16_t       sin6_port;     // port number in network byte order
     u_int32_t       sin6_flowinfo; // IPv6 flow information
     struct in6_addr sin6_addr;     // IPv6 address
     u_int32_t       sin6_scope_id; // Scope ID
 };
 struct sockaddr {
     unsigned short    sa_family;    // AF_INET or AF_INET6 
     char              sa_data[14];  // Protocol address
 };

When the bind() call receives a socket address, it uses the first 2 bytes to identify the address family. If the address family is AF_INET, then it uses sockaddr_in to interpret the rest of the fields. On the other hand, if the address family is AF_INET6, then it uses sockaddr_in6 to interpret the rest of the fields. Thus, this simple little trick allows us to retain the same function signature for both IPv4 and IPv6 families!
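
Application code can use the same trick. The sketch below takes a generic "struct sockaddr" pointer, checks sa_family, and only then casts to the matching structure. The helper name describe_sockaddr and the output format (brackets around IPv6 addresses) are our own choices for illustration.

```c
#include <stdio.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: format a generic socket address as text by first
 * inspecting sa_family and then casting to the matching structure.
 * Returns 0 on success, -1 for an unsupported family or conversion error. */
int describe_sockaddr(const struct sockaddr *sa, char *out, size_t outlen) {
    char ip[INET6_ADDRSTRLEN];

    if (sa->sa_family == AF_INET) {
        const struct sockaddr_in *v4 = (const struct sockaddr_in *)sa;
        if (inet_ntop(AF_INET, &v4->sin_addr, ip, sizeof(ip)) == NULL)
            return -1;
        snprintf(out, outlen, "%s:%u", ip, (unsigned)ntohs(v4->sin_port));
        return 0;
    }
    if (sa->sa_family == AF_INET6) {
        const struct sockaddr_in6 *v6 = (const struct sockaddr_in6 *)sa;
        if (inet_ntop(AF_INET6, &v6->sin6_addr, ip, sizeof(ip)) == NULL)
            return -1;
        snprintf(out, outlen, "[%s]:%u", ip, (unsigned)ntohs(v6->sin6_port));
        return 0;
    }
    return -1;  /* neither IPv4 nor IPv6 */
}
```

The same function thus handles both a sockaddr_in and a sockaddr_in6, as long as the caller casts the pointer to "struct sockaddr *" when passing it in.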

To keep our implementation simple, the server waits for only one connection. After a new connection is established, the server waits for data on the new connection using the recv() call; else, why would we go through all this trouble! Since recv() call returns zero when the connection is closed, we stay in the do-while loop as long as the return value of the recv() call is not zero. Once recv() call returns 0 bytes, we call it a day, close both sockets, and go home!

TCP Socket Client

Having described the implementation of the TCP server program, let us now look at the other side of the story. Whereas a TCP server waits for new connections, a TCP client is the one that actually issues these requests.

In terms of functions, some of the calls used by a client are the same as those used by the server socket: socket(), send(), recv(), and close(). Since they are already discussed above, we omit their discussion here.

However, the one call that differentiates a client from the server is a connect() call -- it is this call that allows the client to send a request to the server and thereby establish a new connection. Like the bind() call, the connect() call also takes an address as a parameter. However, the passed address is that of the remote TCP server. Here is its signature:

 int connect(int fd, const struct sockaddr *addr, socklen_t addrlen);

With that, let us write a simple TCP client program.

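 #include <stdio.h>
 #include <string.h>     /* strerror() */
 #include <errno.h>
 #include <unistd.h>     /* close(), sleep() */
 #include <sys/socket.h> /* socket(), connect(), send() */
 #include <netinet/in.h>
 #include <netdb.h>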

 #define DATA_BUFFER "Mona Lisa was painted by Leonardo da Vinci"

 int main () {
     struct sockaddr_in saddr;
     int fd, ret_val;
     struct hostent *local_host; /* need netdb.h for this */

     /* Step1: create a TCP socket */
     fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); 
     if (fd == -1) {
         fprintf(stderr, "socket failed [%s]\n", strerror(errno));
         return -1;
     }
     printf("Created a socket with fd: %d\n", fd);

     /* Let us initialize the server address structure */
     saddr.sin_family = AF_INET;         
     saddr.sin_port = htons(7000);     
     local_host = gethostbyname("127.0.0.1");
     if (local_host == NULL) {
         fprintf(stderr, "gethostbyname failed\n");
         close(fd);
         return -1;
     }
     saddr.sin_addr = *((struct in_addr *)local_host->h_addr);

     /* Step2: connect to the TCP server socket */
     ret_val = connect(fd, (struct sockaddr *)&saddr, sizeof(struct sockaddr_in));
     if (ret_val == -1) {
         fprintf(stderr, "connect failed [%s]\n", strerror(errno));
         close(fd);
         return -1;
     }
     printf("The Socket is now connected\n");

     printf("Let us sleep before we start sending data\n");
     sleep(5);

     /* Next step: send some data */
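     /* sizeof() counts the terminating NUL, so 43 bytes go out */
     ret_val = send(fd, DATA_BUFFER, sizeof(DATA_BUFFER), 0);
     if (ret_val == -1) {
         fprintf(stderr, "send failed [%s]\n", strerror(errno));
         close(fd);
         return -1;
     }
     printf("Successfully sent data (len %d bytes): %s\n", ret_val, DATA_BUFFER);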

     /* Last step: close the socket */
     close(fd);
     return 0;
 }

For the sake of simplicity, we create this socket on the same machine as that of the TCP server. Therefore, we use the loopback address of the localhost (127.0.0.1) for the connect() call. If the client were to run on a different machine, then this call would require the IP address of the box that houses the TCP server socket.
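
When the server address is already known as a numeric string, as with 127.0.0.1 here, an alternative to gethostbyname() is inet_pton() from arpa/inet.h, which converts the text directly and does no name lookup. The sketch below is one possible way to fill the address structure; the helper name make_ipv4_addr is hypothetical.

```c
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical helper: fill saddr for a dotted-quad IPv4 address and port.
 * Returns 0 on success, -1 if ip is not a valid IPv4 address string. */
int make_ipv4_addr(const char *ip, unsigned short port,
                   struct sockaddr_in *saddr) {
    memset(saddr, 0, sizeof(*saddr));
    saddr->sin_family = AF_INET;
    saddr->sin_port = htons(port);
    /* inet_pton() returns 1 only when the string was parsed successfully */
    return inet_pton(AF_INET, ip, &saddr->sin_addr) == 1 ? 0 : -1;
}
```

The filled structure can then be passed to connect() with the usual cast to "struct sockaddr *".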

You may have noticed that the bind() call is conspicuously absent from the scene for the client! This is because we don't need to explicitly bind the socket to do a connect(). If the socket is not bound, then during the connect() step, the underlying TCP layer automatically selects an available (ephemeral) port number and binds the client socket to it. This is acceptable since the client socket does not need to be bound to a well-known (or a pre-communicated) port number.
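
We can watch this automatic binding happen with getsockname(), which reports the local address of a socket. In the sketch below, the helper name local_port is hypothetical; for an IPv4 socket that has not been bound yet, the reported port is 0.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: return the local port of an IPv4 socket in host
 * byte order. Returns 0 if the socket is not bound yet, -1 on error. */
int local_port(int fd) {
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    if (getsockname(fd, (struct sockaddr *)&addr, &len) == -1)
        return -1;
    return ntohs(addr.sin_port);
}
```

Calling local_port(fd) right after socket() gives 0, while calling it after a successful connect() gives the port that the TCP layer picked for the client.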

After the connection is established, we use the send() call to send some data to the server. As explained before, since this is a connection-oriented socket, we do not have to specify the address of the remote server for every send() call! After sending data, we close the socket.

Making them talk!

Now that we have both the programs ready, let us run them together! For this task, we run the server on one terminal and the client on another. Further, we need to run the server first, since the server must be up and running before the client can send its request. Following that, we can run the client.
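
If the client is started while nothing is listening on the port, connect() fails, typically with ECONNREFUSED. The following sketch shows a connect attempt to the local machine that preserves errno for the caller; the helper name connect_loopback is hypothetical.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: try to connect to 127.0.0.1:port.
 * Returns the connected descriptor, or -1 with errno set
 * (ECONNREFUSED typically means that nothing is listening there). */
int connect_loopback(unsigned short port) {
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        int saved = errno;   /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

With our server running, connect_loopback(7000) would return a valid descriptor; without it, the call would be expected to return -1.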

We provide below the output for the server. In the output, the message indicating the return of the accept call appears only after we start the client. Note that the file descriptor for the server socket is 3 whereas the file descriptor for the newly accepted connection is 4. In the end, we close the client socket and accordingly, the server's recv() call returns with 0 bytes, indicating that the client has closed the connection.

 $ gcc tcp-server.c -o server
 $ 
 $ ./server 
 Created a socket with fd: 3
 Accepted a new connection with fd: 4
 Let us wait for the client to send some data
 Received data (len 43 bytes)
 Received data: Mona Lisa was painted by Leonardo da Vinci
 Received data (len 0 bytes)

Following is the output for the client.

 $ gcc tcp-client.c -o client
 $ 
 $ ./client
 Created a socket with fd: 3 
 The Socket is now connected 
 Let us sleep before we start sending data
 Successfully sent data (len 43 bytes): Mona Lisa was painted by Leonardo da Vinci

When debugging sockets, we can check the state of these sockets using the ss tool on Linux or the netstat tool on other systems (Windows, Mac OS, etc.). On Linux, the netstat tool is deprecated in favor of ss. If we were to run these tools after running the "server" program and before running the client, we would find an entry for a TCP socket sitting on port 7000 and with a state of LISTEN. To have a compact output, we pass a set of options to the ss tool (the Linux netstat tool accepts the same ones, while netstat on other platforms uses different options): "t" for tcp only sockets, "p" for printing associated programs, "l" for printing only listening sockets, and "n" for printing numeric addresses.

To see the program names for all sockets in this output, we must be logged in as root. We provide the output both as the root user and as a non-root user. The output shows our server program listening on port 7000 (the third entry in the output).

 Output when logged as a non-root user
 [user@codingbison]$ ss -tpln
 Recv-Q Send-Q  Local Address:Port Peer Address:Port 
 0      128     127.0.0.1:631      *:*     
 0      128     :::631             :::*     
 0      5       *:7000             *:*   users:(("server",30206,3))
 0      10      127.0.0.1:25       *:*     
 [user@codingbison]$

 Output when logged as the root
 [root@codingbison]# ss -tpln
 Recv-Q Send-Q  Local Address:Port Peer Address:Port 
 0      128     127.0.0.1:631      *:*   users:(("cupsd",1532,12))
 0      128     :::631             :::*  users:(("cupsd",1532,4),("systemd",1,18))
 0      5       *:7000             *:*   users:(("server",30206,3))
 0      10      127.0.0.1:25       *:*   users:(("sendmail",29022,4))
 [root@codingbison]#

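Here is the corresponding output with the netstat tool (this is the output format of the net-tools netstat found on Linux; netstat on other systems takes different options). Once again, we need to log in as root if we wish to see all processes.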

 [user@codingbison]$ netstat -tpln
 (Not all processes could be identified, non-owned process info
  will not be shown, you would have to be root to see it all.)
 Active Internet connections (only servers)
 Proto Recv-Q Send-Q Local Address  Foreign Address  State   PID/Program name   
 tcp        0      0 127.0.0.1:631  0.0.0.0:*        LISTEN  -                   
 tcp        0      0 0.0.0.0:7000   0.0.0.0:*        LISTEN  30206/./server      
 tcp        0      0 127.0.0.1:25   0.0.0.0:*        LISTEN  -                   
 tcp        0      0 :::631         :::*             LISTEN  -                   
 [user@codingbison]$

If we were to run this tool immediately after starting the client, such that the connection is established, then we can spot both the new connection and the existing listener; the sleep() call in the TCP client makes capturing this output more convenient, else the client program would send its data and exit too quickly for us to capture the output.

This time, we pass an "a" option which lists all connections (including listen and established ones) instead of "l" option. We also filter the output (using a "grep" command) for port 7000 since we are only interested in connections that are based on port 7000. The output shows that when the server is waiting for incoming data and when it is reading incoming data, the earlier listener socket continues to stay in the listen state. In addition, we see two records of TCP sockets that are in established state. Since both the sockets are on the same box, the two entries are the two ends of the same connection. Here is the output with ss tool (we are logged in as root).

 [root@codingbison]# ss -tpan | grep 7000
 LISTEN  0      5     *:7000             *:*              users:(("server",30206,3))
 ESTAB   0      0     127.0.0.1:52425    127.0.0.1:7000   users:(("client",30272,3))
 ESTAB   0      0     127.0.0.1:7000     127.0.0.1:52425  users:(("server",30206,4))
 [root@codingbison]#

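For the above output, the Recv-Q and the Send-Q (the second and third columns) show data in bytes for established connections. Thus, if we were to send 100 bytes from the client and if the server were not to call recv(), then the Recv-Q would show 100 bytes for the server's end of the established connection. For a listening socket, ss uses these columns differently: Recv-Q is the number of connections waiting to be accepted and Send-Q is the backlog limit, which is why the listener shows 5, the value we passed to listen().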

Here is the same view with the netstat tool:

 [root@codingbison]# netstat -tpan | grep 7000
 tcp        0      0 0.0.0.0:7000     0.0.0.0:*        LISTEN      30206/./server      
 tcp        0      0 127.0.0.1:52425  127.0.0.1:7000   ESTABLISHED 30272/./client      
 tcp        0      0 127.0.0.1:7000   127.0.0.1:52425  ESTABLISHED 30206/./server      
 [root@codingbison]#