CodingBison

A fair number of socket calls, like accept() and recv(), are blocking. This poses a problem for real-life network applications, where a socket server needs to handle a large number of clients. With a large number of clients, a server that blocks on one socket at a time would spend most of its time waiting and would hardly scale! The way around this problem is to use the select() call -- select() allows us to monitor a large number of sockets in one shot, without having to block on each socket individually.

You could argue that scalability can also be reached by using a large number of threads, with some of them reading incoming data, some of them writing outgoing data, and one or more of them accepting incoming connections. While this certainly allows us to handle additional load, a select() call would still be a better choice since it can monitor a large number of sockets (typically up to 1024) in one shot -- running 1024 threads, one per socket, would be costly on most platforms! However, the best bang for the buck comes when we combine these two approaches and use select() with multiple threads. In this approach, we can have one thread blocking on the select() call and other threads handling the read and write operations that select() identifies.

The select() method accepts as an input a bitmask (structure fd_set) where we set bits corresponding to the set of file descriptors. Do not worry -- the socket layer provides handy macros to set these bits and check for these bits, so we do not have to worry about dealing with bits at all. Isn't that a relief!

Once we pass an fd_set with bits set for each file descriptor, then the select() call monitors all of them. If there is an event on any of those descriptors, then it returns immediately and informs the application that a given fd has an event and the application can act accordingly. We can find the fd (or fds) with an event by checking if the corresponding bit is set for the passed file descriptors.
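To make the macros concrete, here is a minimal sketch of setting and checking bits in an fd_set. The helper names mark_two_fds() and count_marked_fds() are hypothetical and exist only for this illustration; FD_ZERO, FD_SET, and FD_ISSET are the standard macros.

```c
#include <sys/select.h>

/* Hypothetical helper: build a set holding exactly the two given fds. */
void mark_two_fds(fd_set *set, int fd_a, int fd_b) {
    FD_ZERO(set);        /* clear every bit first */
    FD_SET(fd_a, set);   /* turn on the bit for fd_a */
    FD_SET(fd_b, set);   /* turn on the bit for fd_b */
}

/* Hypothetical helper: count the descriptors in [0, nfds) that are set. */
int count_marked_fds(fd_set *set, int nfds) {
    int fd, count = 0;

    for (fd = 0; fd < nfds; fd++) {
        if (FD_ISSET(fd, set)) {
            count++;
        }
    }
    return count;
}
```

A matching FD_CLR(fd, &set) macro turns a single bit back off, which is handy when a connection goes away.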

Before we go any further, this would be a good time to see the signature of the select() call.

 int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, 
     struct timeval *timeout);

 struct timeval {
     long int tv_sec;
     long int tv_usec;
 };

The select() call takes five arguments. The first argument is the highest file descriptor plus one. Thus, if we pass two file descriptors with values 2 and 10, then the nfds parameter should be 10 + 1 or 11 and not 2. Every descriptor placed in an fd_set must be smaller than FD_SETSIZE (typically 1024), which puts an upper limit on the number of sockets that select() can monitor. For simpler programs, passing FD_SETSIZE as nfds is more than sufficient!
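For programs that would rather not pass FD_SETSIZE, the nfds value can be computed from the descriptors themselves. The following is a small sketch; compute_nfds() is a hypothetical helper name, and it rejects descriptors that would not fit in an fd_set.

```c
#include <sys/select.h>

/*
 * Hypothetical helper: return the nfds value for select(), that is,
 * the highest descriptor plus one. Returns 0 for an empty list and
 * -1 if any descriptor is negative or does not fit in an fd_set.
 */
int compute_nfds(const int *fds, int count) {
    int i, max_fd = -1;

    for (i = 0; i < count; i++) {
        if (fds[i] < 0 || fds[i] >= FD_SETSIZE) {
            return -1;
        }
        if (fds[i] > max_fd) {
            max_fd = fds[i];
        }
    }
    return max_fd + 1;
}
```

With the descriptors 2 and 10 from the text, this returns 11.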

The next three parameters represent the three different types of events monitored by select(): read, write, and exception events. A read event means that for a given fd, there is either some data to be read (so the application can call recv()) or a new connection is waiting (so the application can call accept()). A write event means that for a given fd, the local send buffer has free space and the application can send more data without blocking. An exception event means that there is an exceptional condition on the fd, such as the arrival of out-of-band data.

These three parameters are pointers to fd_set values, one for read, one for write, and the other for exception. An application does not necessarily have to pass all of these fd_sets. For example, if the application is only interested in monitoring read events, then it can pass only the read fd_set and pass the other two as NULL. The select() call monitors all the file descriptors specified in the three fd_set bitmasks.

The fifth and last argument to select() is a timeout value in the form of a pointer to a timeval structure. The first field, tv_sec, stores the whole seconds of the timeout. The second field, tv_usec, stores the rest of the timeout (a fraction of a second) in microseconds. If we pass NULL for this argument, then select() waits indefinitely for events. Otherwise, if we want select() to time out after a certain interval, then we need to pass a pointer to a timeval that holds that interval.

If a timeout does not occur and there are some events (read, write, or exception) on the file descriptors, then the return value from select() is the total number of file descriptors that are ready with read, write, or exception events. If the timeout expires before any event occurs, select() returns 0, and on error it returns -1 and sets errno. Also, when select() returns, it overwrites each of the three fd_sets with information about the descriptors that are ready for the corresponding operation. So, if we use select() in a loop, then before every call to select(), we need to reset the fd_sets with the file descriptors that we wish to monitor.

With that, let us now write a simple TCP server that demonstrates the need for a select() call. The example shows that select() can do several things seamlessly, like handling multiple existing connections and listening for newer connections. We provide the example below followed by its explanation.

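 #include <stdio.h>
 #include <sys/types.h>
 #include <sys/socket.h>   /* socket(), bind(), listen(), accept(), recv() */
 #include <sys/select.h>   /* select(), fd_set and the FD_* macros */
 #include <netinet/in.h> 
 #include <unistd.h> 
 #include <string.h>
 #include <errno.h>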

 #define DATA_BUFFER 5000
 #define MAX_CONNECTIONS 10 

 int create_tcp_server_socket() {
     struct sockaddr_in saddr;
     int fd, ret_val;

     /* Step1: create a TCP socket */
     fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); 
     if (fd == -1) {
         fprintf(stderr, "socket failed [%s]\n", strerror(errno));
         return -1;
     }
     printf("Created a socket with fd: %d\n", fd);

     /* Initialize the socket address structure */
     saddr.sin_family = AF_INET;         
     saddr.sin_port = htons(7000);     
     saddr.sin_addr.s_addr = INADDR_ANY; 

     /* Step2: bind the socket to port 7000 on the local host */
     ret_val = bind(fd, (struct sockaddr *)&saddr, sizeof(struct sockaddr_in));
     if (ret_val != 0) {
         fprintf(stderr, "bind failed [%s]\n", strerror(errno));
         close(fd);
         return -1;
     }

     /* Step3: listen for incoming connections */
     ret_val = listen(fd, 5);
     if (ret_val != 0) {
         fprintf(stderr, "listen failed [%s]\n", strerror(errno));
         close(fd);
         return -1;
     }
     return fd;
 }

 int main () {
     fd_set read_fd_set;
     struct sockaddr_in new_addr;
     int server_fd, new_fd, ret_val, i;
     socklen_t addrlen;
     char buf[DATA_BUFFER];
     int all_connections[MAX_CONNECTIONS];

     /* Get the socket server fd */
     server_fd = create_tcp_server_socket(); 
     if (server_fd == -1) {
       fprintf(stderr, "Failed to create a server\n");
       return -1; 
     }   

     /* Initialize all_connections and set the first entry to server fd */
     for (i=0;i < MAX_CONNECTIONS;i++) {
         all_connections[i] = -1;
     }
     all_connections[0] = server_fd;

     while (1) {
         FD_ZERO(&read_fd_set);
         /* Set the fd_set before passing it to the select call */
         for (i=0;i < MAX_CONNECTIONS;i++) {
             if (all_connections[i] >= 0) {
                 FD_SET(all_connections[i], &read_fd_set);
             }
         }

         /* Invoke select() and then wait! */
         printf("\nUsing select() to listen for incoming events\n");
         ret_val = select(FD_SETSIZE, &read_fd_set, NULL, NULL, NULL);

         /* select() woke up. Identify the fd that has events */
         if (ret_val >= 0 ) {
             printf("Select returned with %d\n", ret_val);
             /* Check if the fd with event is the server fd */
             if (FD_ISSET(server_fd, &read_fd_set)) { 
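                 /* accept the new connection */
                 printf("Returned fd is %d (server's fd)\n", server_fd);
                 addrlen = sizeof(new_addr); /* accept() reads this as the size of new_addr */
                 new_fd = accept(server_fd, (struct sockaddr*)&new_addr, &addrlen);
                 if (new_fd >= 0) {
                     printf("Accepted a new connection with fd: %d\n", new_fd);
                     for (i=0;i < MAX_CONNECTIONS;i++) {
                         if (all_connections[i] < 0) {
                             all_connections[i] = new_fd; 
                             break;
                         }
                     }
                     if (i == MAX_CONNECTIONS) {
                         /* No free slot: close the fd instead of leaking it */
                         fprintf(stderr, "Too many connections, closing fd: %d\n", new_fd);
                         close(new_fd);
                     }
                 } else {
                     fprintf(stderr, "accept failed [%s]\n", strerror(errno));
                 }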
                 ret_val--;
                 if (!ret_val) continue;
             } 

             /* Check if the fd with event is a non-server fd */
             for (i=1;i < MAX_CONNECTIONS;i++) {
                 if ((all_connections[i] >= 0) &&
                     (FD_ISSET(all_connections[i], &read_fd_set))) {
                     /* Keep the byte count separate from select()'s ret_val */
                     int nbytes;
                     printf("Returned fd is %d [index, i: %d]\n", all_connections[i], i);
                     /* read incoming data, leaving room for a terminating NUL */
                     nbytes = recv(all_connections[i], buf, DATA_BUFFER - 1, 0);
                     if (nbytes == 0) {
                         printf("Closing connection for fd:%d\n", all_connections[i]);
                         close(all_connections[i]);
                         all_connections[i] = -1; /* Connection is now closed */
                     } else if (nbytes > 0) { 
                         buf[nbytes] = '\0'; /* recv() does not NUL-terminate */
                         printf("Received data (len %d bytes, fd: %d): %s\n", 
                             nbytes, all_connections[i], buf);
                     } else {
                         printf("recv() failed for fd: %d [%s]\n", 
                             all_connections[i], strerror(errno));
                         /* Drop the failed connection so select() does not spin on it */
                         close(all_connections[i]);
                         all_connections[i] = -1;
                     }
                     /* One ready fd handled; stop scanning once none remain */
                     ret_val--;
                     if (!ret_val) break;
                 }
             } /* for-loop */
         } /* (ret_val >= 0) */
     } /* while(1) */

     /* Last step: Close all the sockets */
     for (i=0;i < MAX_CONNECTIONS;i++) {
         if (all_connections[i] >= 0) {
             close(all_connections[i]);
         }
     }
     return 0;
 }

The above example uses an all_connections array to store information about various sockets. This is needed since we are dealing with multiple sockets. Next, the example creates a server socket (using the socket(), bind(), and listen() call sequence) by calling the create_tcp_server_socket() and then stores the returned file descriptor as the first element in the all_connections array. Further, as and when we get new incoming connections, we store their fds in this array as well.

To help identify empty slots in the all_connections array, we initialize it with -1. And as and when a connection goes away, we reset its index back to -1. We use all_connections array to set bits in read_fd_set value by using the FD_SET() macro. For the sake of simplicity, we pass an fd_set only for read events and NULL for write and exception events.

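The above program passes NULL as the timeout value to the select() call. However, if an application wishes to time out, then it can define a timeout value as follows and then pass "&timeout" as the last argument to select(). On Linux, select() may modify the timeval to reflect the time that was left, so an application that calls select() in a loop should set the timeout again before each call. The following example sets the timeout value to 30.5 seconds.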

 struct timeval timeout;
 timeout.tv_sec = 30;      /* Value in seconds */
 timeout.tv_usec = 500000; /* Value in microseconds */

The program sits in a while() loop and for every pass, it begins by populating the read_fd_set with the connections present in the all_connections array. Next, the program blocks on select() and the waiting game begins! Once we have an incoming connection or some data on an existing connection, select() returns. Upon return, the select() call has updated the read_fd_set by clearing all the bits and then setting only the bits of those descriptors that have read events. We use the FD_ISSET macro to find out which connections have been set.

For a listener fd, which is the first element of the all_connections array, a read event signifies that there is a pending new connection. When that happens, we can call accept() and store the returned file descriptor of the new connection in the all_connections array. Since the return value of select() is the total number of file descriptors that are ready, we decrease ret_val by one after handling the listener. If ret_val is now zero, then the listener was the only ready descriptor, so we skip the scan of the accepted connections and go back to select().

Since all_connections can have two or more file descriptors (one listener and the other accepted connections), we should be able to listen to incoming read events for all of them -- a read event on the listener would mean a new connection and a read event on the accepted connection would mean there is new data to be read.

For a non-listener fd, if recv() returns 0, then the peer has closed the connection. We close our end as well and remove the fd from the all_connections array so that in the next pass, we would not be doing a select() on the closed connection.
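The three possible outcomes of recv() can be separated into a small helper, sketched below. The names conn_status and read_once() are hypothetical and are not part of the server above. The sketch assumes buf_size is at least 2, since it reserves one byte for a terminating NUL.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical classification of a single recv() call */
typedef enum { CONN_DATA, CONN_CLOSED, CONN_ERROR } conn_status;

/*
 * Hypothetical helper: read once from fd into buf (buf_size >= 2) and
 * NUL-terminate the data. The byte count from recv() is stored in
 * *nbytes: positive for data, 0 for a closed peer, -1 for an error.
 */
conn_status read_once(int fd, char *buf, size_t buf_size, ssize_t *nbytes) {
    *nbytes = recv(fd, buf, buf_size - 1, 0);
    if (*nbytes > 0) {
        buf[*nbytes] = '\0';  /* recv() does not terminate the data */
        return CONN_DATA;
    }
    if (*nbytes == 0) {
        return CONN_CLOSED;   /* orderly shutdown by the peer */
    }
    return CONN_ERROR;        /* errno holds the reason */
}
```

A caller would close the fd and free its slot for both CONN_CLOSED and CONN_ERROR, and print or process the buffer for CONN_DATA.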

It is worth pointing out that the above program uses two loops: one to traverse the array and set each file descriptor in the read_fd_set, and one to find the ready file descriptors after the select() call returns. Each of the two loops incurs a running time of O(n) -- clearly, we can do better than that. One way to improve the running time would be to use a hash table so that finding the entry for a given descriptor becomes faster. The average lookup time of a hash table is O(1) and so, for higher workloads, it would be a lot faster. If we want to make both the loops faster, then we would have to use a data structure that is optimized for both traversal and lookups. We would leave that as an exercise for the reader!
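As a partial answer to that exercise, here is a sketch of a different and simpler technique than the hash table: keep a long-lived "master" fd_set and copy it into a working set before each select() call. This replaces the first loop with a single structure copy. It also tracks the highest descriptor, so that select() can be given a tighter nfds than FD_SETSIZE. The conn_set type and its functions are hypothetical names used only for this illustration.

```c
#include <sys/select.h>

/* Hypothetical bookkeeping structure for the monitored descriptors */
typedef struct {
    fd_set master;  /* every fd we want select() to monitor */
    int max_fd;     /* highest fd in master, or -1 if it is empty */
} conn_set;

void conn_set_init(conn_set *cs) {
    FD_ZERO(&cs->master);
    cs->max_fd = -1;
}

/* Returns 0 on success, -1 if fd cannot be stored in an fd_set */
int conn_set_add(conn_set *cs, int fd) {
    if (fd < 0 || fd >= FD_SETSIZE) {
        return -1;
    }
    FD_SET(fd, &cs->master);
    if (fd > cs->max_fd) {
        cs->max_fd = fd;
    }
    return 0;
}

void conn_set_remove(conn_set *cs, int fd) {
    if (fd < 0 || fd >= FD_SETSIZE) {
        return;
    }
    FD_CLR(fd, &cs->master);
    /* Walk max_fd down to the next descriptor that is still set */
    while (cs->max_fd >= 0 && !FD_ISSET(cs->max_fd, &cs->master)) {
        cs->max_fd--;
    }
}

/*
 * Copy the master set into the working set that select() is allowed to
 * overwrite, and return the nfds value to pass along with it.
 */
int conn_set_prepare(const conn_set *cs, fd_set *working) {
    *working = cs->master;  /* plain structure assignment */
    return cs->max_fd + 1;
}
```

The server loop would then call conn_set_prepare() at the top of each pass, conn_set_add() after a successful accept(), and conn_set_remove() when a connection closes. The scan for ready descriptors after select() is still needed, but it only has to cover descriptors below the returned nfds.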

Now that we have the server ready, we would also need clients that can talk to it and test our select() code. For that we present a simple TCP client program. Once we write it, we would run multiple copies of this client to mimic a workload of multiple clients.

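 #include <stdio.h>
 #include <string.h>       /* strerror() */
 #include <errno.h>
 #include <unistd.h>       /* close(), sleep() */
 #include <sys/socket.h>   /* socket(), connect(), send() */
 #include <netinet/in.h> 
 #include <netdb.h> 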

 #define DATA_BUFFER "Mona Lisa was painted by Leonardo da Vinci"

 int main () {
     struct sockaddr_in saddr;
     int fd, ret_val;
     struct hostent *local_host; /* need netdb.h for this */

     /* Step1: create a TCP socket */
     fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); 
     if (fd == -1) {
         fprintf(stderr, "socket failed [%s]\n", strerror(errno));
         return -1;
     }
     printf("Created a socket with fd: %d\n", fd);

     /* Let us initialize the server address structure */
     saddr.sin_family = AF_INET;         
     saddr.sin_port = htons(7000);     
     local_host = gethostbyname("127.0.0.1");
     saddr.sin_addr = *((struct in_addr *)local_host->h_addr);

     /* Step2: connect to the TCP server socket */
     ret_val = connect(fd, (struct sockaddr *)&saddr, sizeof(struct sockaddr_in));
     if (ret_val == -1) {
         fprintf(stderr, "connect failed [%s]\n", strerror(errno));
         close(fd);
         return -1;
     }
     printf("The Socket is now connected\n");

     printf("Let us sleep before we start sending data\n");
     sleep(5);

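     /* Next step: send some data */
     ret_val = send(fd,DATA_BUFFER, sizeof(DATA_BUFFER), 0);
     if (ret_val == -1) {
         fprintf(stderr, "send failed [%s]\n", strerror(errno));
         close(fd);
         return -1;
     }
     printf("Successfully sent data (len %d bytes): %s\n", 
                 ret_val, DATA_BUFFER);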

     /* Last step: close the socket */
     close(fd);
     return 0;
 }

We now run the server along with two clients, each from a different terminal. The output for the server is below.

 $ gcc select-tcp-server.c -o select_tcp_server
 $ 
 $ ./select_tcp_server 
 Created a socket with fd: 3

 Using select() to listen for incoming events
 Select returned with 1
 Returned fd is 3 (server's fd)
 Accepted a new connection with fd: 4

 Using select() to listen for incoming events
 Select returned with 1
 Returned fd is 3 (server's fd)
 Accepted a new connection with fd: 5

 Using select() to listen for incoming events
 Select returned with 1
 Returned fd is 4 [index, i: 1]
 Received data (len 43 bytes, fd: 4): Mona Lisa was painted by Leonardo da Vinci

 Using select() to listen for incoming events
 Select returned with 1
 Returned fd is 4 [index, i: 1]
 Closing connection for fd:4

 Using select() to listen for incoming events
 Select returned with 1
 Returned fd is 5 [index, i: 2]
 Received data (len 43 bytes, fd: 5): Mona Lisa was painted by Leonardo da Vinci

 Using select() to listen for incoming events
 Select returned with 1
 Returned fd is 5 [index, i: 2]
 Closing connection for fd:5

 Using select() to listen for incoming events

 ^C
 $ 

From this output, we can see that a listener is created with a file descriptor of 3 and it then uses select() to wait for incoming connections. When the first client comes up, select() returns and we accept the first incoming connection, which gets a file descriptor of 4. After that we call select() and block again, only to be awakened by the next client's incoming connection, which gets an fd of 5. At this point, we call select() with 3 fds set in the fd_set: 3, 4, and 5. When we receive data on the new connections, select() returns and we read the data. Once a client closes its connection, select() returns again, the recv() call returns 0, and we close our end of the connection. After that, select() continues to wait for newer connections.

The output for both the clients is the same:

 $ ./tcp_client
 Created a socket with fd: 3                                                                                                       
 The Socket is now connected 
 Let us sleep before we start sending data
 Successfully sent data (len 43 bytes): Mona Lisa was painted by Leonardo da Vinci  

We should note that this example uses select() for a connection-oriented socket (TCP), but select() can also be used for connectionless sockets (UDP). If there are multiple UDP listeners (servers), then we can use select() to wait on all of them and, as and when we receive a read event, we can use recvfrom() to read the incoming data.




