CodingBison

C provides a set of library functions for manipulating both string and non-string data. These functions are declared in the "string.h" header file, which we should include in order to use them. The string functions take a pointer to char ("char *"), and the non-string (memory) functions take a pointer to void ("void *").

Before we go any further, it would do us good to recall that C represents a string as an array of char types. Each char type requires one byte of storage. Thus, a string is an array of one-byte elements. When C stores a string in an array, it uses '\0' (the NUL termination character) as the last character to mark the end of the string. The NUL character is a critical aspect of C strings, because without it, C would not be able to know where the string ends.

In this section, we focus on functions that copy data (string or non-string) from one location to another.

Overview of Functions

Let us begin by listing the copying functions; these functions help us copy both string and non-string data.

 char *strcpy(char *dest, const char *src);
 char *strncpy(char *dest, const char *src, size_t n);
 void *memcpy(void *dest, const void *src, size_t n);
 void *memmove(void *dest, const void *src, size_t n);

Function strcpy() copies a source string (src) into a destination string (dest), including the NUL character present at the end of the source string. The function strncpy() does the same thing, but it copies at most "n" bytes. If strncpy() reaches the NUL character (that marks the end of the src string) before reaching "n" bytes, then it stops reading src and fills the remaining bytes of dest with NUL characters, so it always writes exactly "n" bytes. If src has no NUL character within its first "n" bytes, then dest is not NUL-terminated. Hence, when passing "n", we should ensure that it includes the NUL character of the src string as well.

For both cases, it is equally important that the dest string has a buffer large enough to accommodate the src string being copied along with the NUL character; thus, the buffer size should be at least the length of the source string plus one. For strncpy(), the buffer must also hold at least "n" bytes, because that is how many bytes the call writes.

For strcpy() and strncpy() functions, if there is an overlap in the source and destination strings, then the behavior is undefined; for strncpy(), the overlap would occur if there is an overlap between the first n bytes of the source string and the first n bytes of the destination string.

The next function, memcpy(), copies exactly "n" bytes from the src buffer to the dest buffer; it does not stop at a NUL character. Since memcpy() takes the byte count "n" as an argument, it is more closely related to strncpy() than to strcpy(). Unfortunately, similar to strcpy() and strncpy(), if the buffers pointed to by dest and src overlap, then the behavior of memcpy() is also undefined!

The next function, memmove(), also copies "n" bytes from the source buffer (src) to the destination buffer (dest). This function brings the good news that we have been waiting for -- when the buffers pointed to by dest and src overlap, memmove() still copies the source buffer to the destination buffer correctly!

All four of the above functions return the dest pointer that was passed to them.

If we were to attempt to copy a string into a buffer that is too small to hold it, then the behavior is undefined; in simpler words, doing so would be plain wrong! Hence, when copying strings, it is wise to pay attention to two things: the destination string should have enough buffer, and the NUL character should be copied (added) at the end of the string.

Armed with a basic understanding of the above functions, let us now look at four examples to increase our understanding further. The first two examples focus on string-related functions. The last two examples focus on memory-related functions.

Examples: strcpy() and strncpy()

Our first example shows a simple usage of strcpy and strncpy functions. The example uses the built-in function strlen() to get the total number of characters present in a string. However, strlen() does not include the terminating NUL character. Accordingly, we pass an additional byte, as "len +1" to the strncpy() call, so that it also copies the terminating NUL character. Here is the example:

 #include <stdio.h>
 #include <string.h>

 #define STR_LONG "Starry Nights by Vincent Van Gogh"
 #define STR_SHORT "The Yellow House"

 #define LEN_STRING 50

 int main () {
     char var_str[LEN_STRING]; /* Deliberately left uninitialized */
     size_t len;

     /* Copy the short string using strcpy. Reading var_str before the copy
      * is undefined behavior; we do it only to show the garbage contents. */
     printf("[strcpy] Copying shorter string:\n");
     printf("[strcpy] Before copy: %-35s (len: %2zu)\n", var_str, strlen(var_str));
     strcpy(var_str, STR_SHORT);
     printf("[strcpy] After copy : %-35s (len: %2zu)\n", var_str, strlen(var_str));

     /* Copy the long string using strncpy; never exceed the buffer size */
     len = (strlen(STR_LONG) < LEN_STRING) ? strlen(STR_LONG) : (LEN_STRING - 1);
     printf("\n[strncpy] Copying longer string (%zu bytes):\n", len);
     printf("[strncpy] Before copy: %-35s (len: %2zu)\n", var_str, strlen(var_str));
     strncpy(var_str, STR_LONG, (len + 1)); /* Add 1 for NUL character */
     var_str[len] = '\0'; /* Needed only if the copy was truncated */
     printf("[strncpy] After copy : %-35s (len: %2zu)\n", var_str, strlen(var_str));
     return 0;
 }

Note that the above program shows two different ways of using a string. First, STR_LONG is merely a macro; wherever it appears, the preprocessor replaces it with the string literal it represents. The second style uses an array of chars (var_str) as a string variable. Both give us a pointer to the beginning of the string, and we can navigate it from there. There is yet another way of using strings, and that is by doing a malloc of a character array.

Let us now see the output of the above program to understand various parts better.

 $ gcc -g strcpy.c -o strcp
 $ 
 $ ./strcp 
 [strcpy] Copying shorter string:
 [strcpy] Before copy:                                      (len: 14)
 [strcpy] After copy : The Yellow House                     (len: 16)

 [strncpy] Copying longer string (33 bytes):
 [strncpy] Before copy: The Yellow House                    (len: 16)
 [strncpy] After copy : Starry Nights by Vincent Van Gogh   (len: 33)

As expected, both strcpy() and strncpy() copy the source string into the destination string. strcpy() automatically appends the NUL character at the end. The output shows that before we call strcpy(), var_str contains garbled text (not shown) -- this is because it is not initialized. Strictly speaking, reading an uninitialized array like this is undefined behavior, so the text and the length printed can vary from run to run. Hence, it is a good idea to initialize a string, when applicable. We can initialize strings using the memset() call that we will visit in a later section.

Our second example reveals a subtle pitfall of the strcpy()/strncpy() functions. These functions are not guaranteed to work correctly if the two strings (source string and destination string) overlap. This example intentionally calls strncpy() with a destination string that overlaps with the source string.

The example has a string, src (equal to "0123456789"), and a pointer, dest. The example makes dest point to src and then advances it by 4 characters. This way, dest becomes "456789". When we copy strlen(dest) bytes (that is, 6 bytes) from src to dest, due to the overlap, src ends up overwriting itself! Here is the example:

 #include <stdio.h>
 #include <string.h>

 int main () {
     char src[] = "0123456789";
     char *dest;

     dest = src;
     dest += 4; /* Move the pointer ahead by 4 bytes */

     printf("Before copy:  src: %-10s (len: %3zu)\n", src, strlen(src));
     printf("Before copy: dest: %-10s (len: %3zu)\n\n", dest, strlen(dest));

     /* Overlapping copy: undefined behavior, shown only for illustration */
     strncpy(dest, src, strlen(dest));

     printf("After copy :  src: %-10s (len: %3zu)\n", src, strlen(src));
     printf("After copy : dest: %-10s (len: %3zu)\n", dest, strlen(dest));

     return 0;
 }

When we run the program, the output is not what we might expect. We would have expected dest to be "012345" -- instead, it is "012301"!

 Before copy:  src: 0123456789 (len:  10)
 Before copy: dest: 456789     (len:   6)

 After copy :  src: 0123012301 (len:  10)
 After copy : dest: 012301     (len:   6)

So, what went wrong? Basically, when we try to copy 6 bytes from src to dest, strncpy() does not account for the overlap and unknowingly overwrites the storage space shared by both variables. As noted earlier, strcpy() and memcpy() also suffer from this flaw.

Let us use a figure to explain this oddity. For the sake of clarity, the figure shows strncpy() copying in two steps. In the first step, it shows copying of the first 4 bytes to show the initial overwrite (we choose 4 bytes because dest is ahead of src by 4 bytes). In the second step, we copy the remaining 2 bytes.



Figure: Overlapping bytes: copying 6 bytes from src to dest

As shown in the above figure, by the time the first 4 bytes are written, src ends up overwriting itself. Due to that, the value of character '4' gets overwritten by character '0', and so on. At the start of the second step, src is now "0123012389" and dest is now "012389". The second step does no good either and copies the 5th and 6th bytes ("01") back to itself leading to the final value of "0123012301"!

In case you have started to worry, there are two ways out of this problem. The first and simpler way is to use memmove(). The second way is to take the overlap into account and copy the bytes in the correct order.

The copy implementation should consider using the correct copying order. In the earlier figure we started copying with the first character and the order was: '0' first, followed by '1', '2', '3', '4', and '5'; this ordering leads to overwriting. If we were to copy in the reverse direction ('5' first, followed by '4', '3', '2', '1', and '0'), we would arrive at the correct output. We show this in the figure below.



Figure: Overlapping bytes: copying in reverse order provides correct copy

The ordering should be determined based on the starting address of the two strings. If there is an overlap and the starting character of the destination string lies after the starting character of the source string, then the reverse ordering is the right approach. On the other hand, if the starting character of the destination string lies before the starting character of the source string, then the normal ordering is the right approach.

Thus, if we suspect that there might be an overlap between the source and the destination string, then we should avoid using strcpy()/strncpy(). Note that undefined behavior does not mean that these two functions always misbehave; in some cases, they may still produce the correct result. But, if you have got luck like mine, then you are likely to see the incorrect behavior!

Examples: memcpy() and memmove()

Our last two examples go beyond the two string-related functions. They show how to use the two memory-related functions: memcpy() and memmove().

The following example shows usage of memcpy() and memmove(). It also demonstrates that memmove() is the right function to use when there is an overlap between the source string and the destination string.

 #include <stdio.h>
 #include <string.h>

 #define STR_MONALISA "Mona Lisa was painted in the year 1505."
 #define LEN_STRING 50
 #define STRING_NUM "0123456789"

 int main () {
     char src[LEN_STRING];
     char *str_temp, *dest;

     memcpy((void *)src, STR_MONALISA, strlen(STR_MONALISA) + 1);
     printf("[memcpy] src: %s \n\n", src);

     memcpy((void *)src, STRING_NUM, strlen(STRING_NUM) + 1);
     dest = src;
     dest += 4; /* Move the pointer a little bit ahead */
     printf("[memcpy] Before copy: src: %s, dest: %s\n", src, dest);
     /* Overlapping memcpy: undefined behavior, shown only for illustration */
     str_temp = (char *)memcpy((void *)dest, (void *)src, strlen(dest));
     printf("[memcpy] After copy : src: %s, dest: %s\n\n", src, dest);

     /* Reset src to hold STRING_NUM again */
     memcpy((void *)src, STRING_NUM, strlen(STRING_NUM) + 1);
     dest = src;
     dest += 4; /* Move the pointer a little bit ahead */

     printf("[memmove] Before copy: src: %s, dest: %s\n", src, dest);
     str_temp = memmove(dest, src, strlen(dest));
     printf("[memmove] After copy : src: %s, dest: %s\n", src, dest);

     return 0;
 }

The example starts with a string variable, src, and uses memcpy() to copy the string STR_MONALISA to src. Next, it uses STRING_NUM to demonstrate the case of overlap. Here is the output:

 $ gcc memcpy.c -o memcpy
 $ 
 $ ./memcpy 
 [memcpy] src: Mona Lisa was painted in the year 1505. 

 [memcpy] Before copy: src: 0123456789, dest: 456789
 [memcpy] After copy : src: 0123012345, dest: 012345

 [memmove] Before copy: src: 0123456789, dest: 456789
 [memmove] After copy : src: 0123012345, dest: 012345

The above output shows that for the case of overlapping strings, surprisingly, memcpy() also provides the correct output. As we noted earlier, when we copy "0123456789" to "456789", the result should be "0123012345". Undefined behavior does not mean that memcpy() always produces an incorrect result for overlapping buffers -- in some cases, it may still produce the correct one. I guess I got lucky! However, we should not rely on this, since the result can change with the compiler, the library, or the size of the copy. The memmove() variant, on the other hand, is guaranteed to produce the correct output.

The last example shows the unique advantage that memcpy() and memmove() enjoy over strcpy() and strncpy() -- they can copy data of non-string types as well. The example (provided below) defines a data structure, painting_frame, and uses memmove() to copy the values stored in one instance of the structure to another. Since the return value of memmove() is the same as the destination pointer, we ignore it using the "(void)" cast.

 #include <stdio.h>
 #include <string.h>

 #define BUFFER_SIZE 100
 #define STR_PAINTER "Leonardo da Vinci"

 typedef struct painting_frame {
     int painting_id;
     int width;
     int height;
     char painter[BUFFER_SIZE];
 } painting_frame_t;

 void print_painting (painting_frame_t* p_frame) {
     if (!p_frame) return;

     printf("Printing Painting Frame Info: \n");
     printf("\tID     : %5d\n", p_frame->painting_id);
     printf("\twidth  : %5d\n", p_frame->width);
     printf("\theight : %5d\n", p_frame->height);
     printf("\tpainter: %s\n\n", p_frame->painter);
 }

 int main () {
     painting_frame_t painting1 = {1001, 40, 100, STR_PAINTER};
     painting_frame_t painting2;

     print_painting(&painting1);

     (void) memmove((void *)&painting2, (void *)&painting1, sizeof(painting_frame_t));

     painting2.painting_id = 1002;
     print_painting(&painting2);

     return 0;
 }

Here is the output of the above code.

 $ gcc memstructcpy.c -o memstruct
 $ 
 $ ./memstruct
 Printing Painting Frame Info: 
 	ID     :  1001
 	width  :    40
 	height :   100
 	painter: Leonardo da Vinci

 Printing Painting Frame Info: 
 	ID     :  1002
 	width  :    40
 	height :   100
 	painter: Leonardo da Vinci



