Thursday, November 22, 2007

Port to ARM Processor

http://www.ddj.com/architect/184405435


Portability & the ARM Processor

"ANSI C" doesn't always mean "portability"


Trevor Harmon
When porting a Linux application to the StrongARM-based iPAQ handheld computer, Trevor found out that "ANSI C" and "portability" don't always go hand in hand.

Trevor is a graduate student at the University of California, Irvine. His research interests include real-time embedded devices and distributed real-time networks. He can be reached at trevor@vocaro.com.


While we like to think of C as a "write once, compile anywhere" language, a recent experience writing code for the ARM processor reminded me that this isn't always the case. I was porting a Linux application to the iPAQ, a handheld computer from HP (formerly Compaq), and assumed that because this desktop application was written in pure ANSI C, I'd have no problem. In this article, I show why I was wrong about this and share tips on making code more portable to the ARM and similar RISC processors. I've also included several short C programs illustrating portability problems when programming for the ARM (available electronically; see "Resource Center," page 5).

The ARM processor in the iPAQ comes from the lineage known as "StrongARM." This low-voltage RISC core was never manufactured by ARM Ltd., which instead licenses its embedded processor designs to manufacturers. This is what the company did in 1995 when it sold Digital Equipment Corp. (DEC) the rights to build an enhanced version of the ARM core, which quadrupled the clock rate of the ARM while preserving its low-power characteristics. However, DEC eventually sold its design to Intel as part of settling a massive legal dispute. Today, StrongARM processors can run at 233 MHz without heat sinks or other cooling methods, making them suitable for CPU-intensive embedded devices such as the iPAQ.

Like most technologies, StrongARM earns these benefits by sacrificing a little bit of backward compatibility. Perhaps the most fundamental of these compatibility issues is the processor's endianness, that is, the order in which it stores the bytes of an integer in memory. Fortunately, the ARM processor can configure itself, chameleon-like, as either Big-endian or Little-endian. And because most Linux distributions for the iPAQ switch the ARM into Little-endian mode by default, developers porting x86 code to the ARM don't need to worry about endianness. (All x86 processors are Little-endian.) There are, however, three potential hazards you should still keep in mind: signed versus unsigned chars, data alignment, and floating-point emulation.

Signed versus Unsigned Chars

Can you predict what Listing One will do? On my Pentium laptop, this snippet prints c<0, as expected. When recompiled for ARM and run on the iPAQ, the code mysteriously prints c>=0. The reason is hinted at in a warning on the third line, signaled only by the ARM compiler: "Comparison is always false due to limited range of data type."

So the question is: What's the range of the char data type? The answer is "implementation-defined." The ANSI C Standard specifies a minimum range only for signed and unsigned chars. Signed chars must cover at least -127 to 127, while unsigned chars must cover at least 0 to 255. As for plain chars, the Standard lets the compiler decide whether they are signed or unsigned.
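One way to see which choice a given compiler made is to look at CHAR_MIN from <limits.h>. The following is a minimal sketch; the function name plain_char_is_signed is made up for illustration and is not part of the original listings.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if plain char is signed on this
   compiler/target, 0 if it is unsigned.
   CHAR_MIN is 0 when char is unsigned and negative when it is signed. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

Calling this on an x86 Linux build and on an ARM Linux build would be expected to give different answers, which is exactly the difference Listing One trips over.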

This ambiguity exists because compilers often have to promote chars to ints in arithmetic operations—such as the comparison in Listing One—and on some machines, the fastest way to do that is with a sign-extend instruction. The PDP-11, for example, on which Dennis Ritchie implemented the first modern version of C in 1973, had the instruction SXT for this task, so for historical reasons, most C compilers make chars signed by default.

Fast forward 20 years and you'll find no single "load character from memory and sign extend" instruction in the ARM instruction set. That's why, for performance reasons, every compiler I'm aware of makes the default char type signed on x86, but unsigned on ARM. (A workaround for the GNU GCC compiler is the -fsigned-char parameter, which forces all plain chars to become signed.)

Of course, speed comes at the expense of portability. In Listing Two, the comparison is between EOF, defined as -1, and ch, of type char. On x86, the code dumps the contents of textfile to the console, but on ARM, it enters an infinite loop. What's happening here is that getc returns EOF as an int, and the assignment to ch keeps only the lowest 8 bits of -1. In two's complement notation, those bits are all on. When ch is then promoted back to int for the comparison, an unsigned 8-bit char yields 255 in decimal, which never equals -1; hence, the infinite loop.
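The round trip can be reproduced on any machine by spelling out the signedness explicitly. This is only a sketch: both function names are invented for illustration, and it assumes the usual case where EOF is -1 and a char is 8 bits wide.

```c
#include <stdio.h>

/* Mimics the ARM case: EOF is stored in an unsigned char, then promoted
   back to int. The promotion zero-extends, so the sign is lost. */
int eof_after_unsigned_char_roundtrip(void)
{
    unsigned char ch = (unsigned char)EOF; /* keeps only the low bits of EOF */
    return (int)ch;                        /* 255 when EOF is -1 and chars are 8 bits */
}

/* Mimics the x86 case: the same round trip through a signed char.
   The promotion sign-extends, so -1 comes back as -1. */
int eof_after_signed_char_roundtrip(void)
{
    signed char ch = (signed char)EOF;
    return (int)ch;
}
```

Under those assumptions only the signed version still compares equal to EOF, which is why the loop in Listing Two terminates on x86 and not on ARM.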

The easy solution is to declare the variable as int instead of char. If you take a close look at the stdio functions, you'll see that they were designed with this fix in mind. They all take and return ints, even though they work with characters.
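A possible rewrite of the loop from Listing Two with that fix applied is sketched below. The function name copy_stream and the byte count it returns are additions for illustration, not part of the original listing.

```c
#include <stdio.h>

/* Copies every byte from in to out and returns the number of bytes copied.
   ch is an int, so it can hold every unsigned char value plus EOF,
   regardless of whether plain char is signed on the target. */
long copy_stream(FILE *in, FILE *out)
{
    int ch;
    long count = 0;

    while ((ch = getc(in)) != EOF) {
        putc(ch, out);
        count++;
    }
    return count;
}
```

With an int, a data byte of 0xFF read from the file is returned as 255 and stays distinguishable from EOF, so the loop neither stops early on a signed-char target nor runs forever on an unsigned-char one.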

Depending on your point of view, the specification for the char type may be as hysterical as it is historical, but I think everyone can agree on some rules of thumb when following it. They apply not just to ARM developers, but to anyone who wants to write portable code in C:

  • Use signed char when you need small signed integers.
  • Use unsigned char when you need small unsigned integers or to treat a block of memory as a sequence of bytes.
  • Use plain char for ASCII characters and string manipulation only.

As usual, write portable code first and worry about optimizations later.

Data Alignment

Making assumptions about data alignment is another way to shoot yourself in the foot while programming for ARM. As a case in point, I recently wrote a program that sends data from an iPAQ to my PC over a serial cable; see Listing Three.

When I examined the two values of sensor data on the PC side, I discovered that the first one came over fine, but the second was corrupted. I didn't realize what was happening until I looked at the return value of sizeof. On the PC, the size of the SensorData struct was, just as I expected, 3 bytes (three chars of 1 byte each). On the iPAQ, however, the struct was 4 bytes.

The problem was that I assumed the compiler would lay out the fields of the struct without any space between them, when in fact, there is no such requirement in C. The compiler knew that the ARM, like other RISC processors, is more efficient when loading data from memory on 32-bit boundaries, and it realized that any data following my 24-bit struct wouldn't fall on such a boundary. So, it added an invisible 8 bits to make the struct reach the next 32-bit address. Those extra bits were being sent over the serial cable along with the sensor data, and that's what caused the corruption.

Essentially, the compiler makes a judgment call to trade data space for smaller, faster code. Otherwise, data lying across a 32-bit boundary may have to be loaded piecewise then shifted and ORed together, a slower alternative requiring more opcodes. That idea is foreign to programmers who grew up on the x86 architecture (like me) and are used to the CISC style where alignments don't matter. One way to fix the problem is to do the shifting and ORing yourself and pack the struct's fields into a string before sending them through the wire. Listing Four is a simpler solution using the __attribute__ keyword, a GNU GCC extension. With this change, sizeof(struct SensorData) returns three on both x86 and ARM. Figure 1 shows another example of how this __attribute__ keyword can eliminate structure padding differences.
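The first approach, packing the fields by hand, might look like the sketch below. The struct is the one from Listing Three; SENSOR_WIRE_SIZE and sensor_data_pack are hypothetical names introduced here. Because every field is a single byte, no shifting or ORing is needed in this particular case; wider fields would be split into bytes at this point.

```c
#include <stddef.h>

struct SensorData
{
    unsigned char x_position;
    unsigned char y_position;
    unsigned char sensorID;
};

#define SENSOR_WIRE_SIZE 3  /* hypothetical name: bytes per record on the wire */

/* Copies the three fields into buf in a fixed order and returns the number
   of bytes written. The result does not depend on sizeof(struct SensorData),
   so any padding the compiler adds never reaches the serial cable. */
size_t sensor_data_pack(const struct SensorData *s,
                        unsigned char buf[SENSOR_WIRE_SIZE])
{
    buf[0] = s->x_position;
    buf[1] = s->y_position;
    buf[2] = s->sensorID;
    return SENSOR_WIRE_SIZE;
}
```

The sender would then write exactly SENSOR_WIRE_SIZE bytes from buf instead of sizeof(struct SensorData) bytes from the struct itself. This is more typing than the __attribute__ approach, but it does not rely on a compiler extension.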

Unfortunately, the fix works only for structs, and data alignment bugs can waste your afternoon in other ways. Imagine networking code that packs a char and int into a 5-byte string, ships it across the network, then unpacks it at the other end. Listing Five is a mock-up of how the unpacking might work.

On x86, this code predictably prints 05040302, but on ARM, it prints 01040302. The discrepancy is due to the location of the int pointer, which lies on an odd-numbered address (buf+1). The x86 processors have no problem accessing words from odd addresses; there is merely a performance hit for accesses not aligned on a 2-, 4-, or 8-byte boundary. ARM processors, on the other hand, truncate the pointer to the nearest word-aligned address during a load. They will then rotate the data in a way that depends on the endian configuration and the offset of the address. The results are unpredictable.
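Two common ways to avoid the misaligned load are sketched below: assembling the value byte by byte, or copying it into a properly aligned variable with memcpy. Both function names are hypothetical, and the first assumes the sender wrote the integer in Little-endian order, as the x86 result in Listing Five implies.

```c
#include <string.h>

/* Reads a 32-bit Little-endian value from any address, one byte at a time.
   Only byte loads are used, so the alignment of p does not matter. */
unsigned long read_le32(const unsigned char *p)
{
    return  (unsigned long)p[0]
         | ((unsigned long)p[1] << 8)
         | ((unsigned long)p[2] << 16)
         | ((unsigned long)p[3] << 24);
}

/* Copies an int out of a possibly misaligned address. The bytes are
   interpreted in the host's own byte order, so this variant only suits
   data produced on a machine with the same endianness and int size. */
int read_int_unaligned(const void *p)
{
    int value;
    memcpy(&value, p, sizeof value);
    return value;
}
```

Applied to the buffer from Listing Five, read_le32((unsigned char *)buf + 1) yields 0x05040302 on both x86 and ARM, because the result no longer depends on how the processor treats an odd address.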

These frustrating data-alignment problems are certainly nothing new. They're so common that the comp.lang.c FAQ (http://www.eskimo.com/~scs/C-faq/top.html) has a section to address them. But it's not just beginners that step into the hole. Even experienced Linux kernel hackers sometimes produce nonportable code when they forget about structure padding. Listing Six is a struct from Version 2.4 of the Linux kernel's TCP/IP implementation. ETH_ALEN is 6, so the size of the struct is 14 on some architectures, but 16 on others. The alignment differences cause bugs in parts of the kernel that calculate offsets into network packets using sizeof(struct ethhdr). Luckily, Russell King, the maintainer of the ARM port of the Linux kernel, noticed the problem and submitted a patch that adds __attribute__ ((packed)) to the ethhdr struct, improving compatibility with ARM, SPARC, and other processors with strict alignment rules. The fix will be available in the 2.6 series of the Linux kernel.

The moral of this story is that you'll need sharp eyes to spot alignment errors when developing for ARM. If you plan on using structs for network routines or writing binary files, your code must be carefully crafted to avoid holes. Remember that the minimum structure alignment is 4 on the ARM compiler and 1 on x86. Be especially wary when porting legacy code from the x86 world, which is known to be sloppy in this area.
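Sharp eyes can be backed up by the compiler. The sketch below uses an old C trick, a typedef of an array whose size becomes negative when a condition is false, to stop the build if a structure's layout is not what the wire format expects. STATIC_CHECK and struct ethhdr_wire are hypothetical names; the struct mirrors Listing Six with the packed attribute applied, and the checks assume a 2-byte unsigned short and a GCC-compatible compiler.

```c
#include <stddef.h>

/* Hypothetical helper: compilation fails with a "negative array size"
   error if cond is false. */
#define STATIC_CHECK(name, cond) \
    typedef char static_check_##name[(cond) ? 1 : -1]

#ifndef ETH_ALEN
#define ETH_ALEN 6
#endif

struct ethhdr_wire   /* hypothetical stand-in mirroring Listing Six */
{
    unsigned char  h_dest[ETH_ALEN];
    unsigned char  h_source[ETH_ALEN];
    unsigned short h_proto;
} __attribute__ ((packed));

/* The header must occupy exactly 14 bytes, with h_proto at offset 12. */
STATIC_CHECK(ethhdr_size, sizeof(struct ethhdr_wire) == 14);
STATIC_CHECK(ethhdr_proto_offset, offsetof(struct ethhdr_wire, h_proto) == 12);
```

If the packed attribute were removed and the target padded the struct to 16 bytes, the first check would refuse to compile on that target instead of letting the size mismatch surface later as corrupted packets.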

Floating-Point Emulation

Although ARM Ltd. offers a floating-point coprocessor, it's not compatible with the StrongARM in the iPAQ. As in most embedded designs, however, power consumption, chip size, and cost take precedence over speed, so the iPAQ's designers probably would have left the floating-point unit out anyway. Instead, a software library emulates floating-point operations with integer arithmetic, with the expected performance penalty.

In Linux, the floating-point emulator for ARM is a child of NetWinder, a low-power Internet server running Linux on a StrongARM processor. The makers of this turn-key "Internet appliance" decided that the costs of licensing a third-party emulator were too steep and developed one on their own. They derived this emulator from SoftFloat (http://www.jhauser.us/arithmetic/SoftFloat.html), a freely available IEEE floating-point library by John Hauser, a student at the University of California at Berkeley, to which they added some ARM-specific inline assembly. When the iPAQ reaches a floating-point instruction, the StrongARM processor, having no FPU, throws an "undefined instruction" exception that the NetWinder emulator traps and reroutes to the appropriate SoftFloat algorithms.

The NetWinder emulator is now included in the official Linux kernel and licensed under GPL. This means that any Linux distribution for the iPAQ has floating-point support by default. Programs can simply define floats and doubles as usual and link in the libc math library for high-level functions such as sin or cos. (In fact, the NetWinder emulator contains only arithmetic, exp, and sqrt operations and lets libc handle the rest.) You should still be aware of speed limitations if you need floating point extensively on the ARM.

DDJ

Listing One

char c = -1;
if (c < 0)
    printf("c<0\n");
else
    printf("c>=0\n");

Listing Two

char ch;
FILE* file;
file = fopen("textfile", "r");
while ((ch = getc(file)) != EOF)
    putchar(ch);

Listing Three

struct SensorData
{
    unsigned char x_position;
    unsigned char y_position;
    unsigned char sensorID;
};
...
write(serial_port, sensor_data1,
      sizeof(struct SensorData));
write(serial_port, sensor_data2,
      sizeof(struct SensorData));

Listing Four

struct SensorData
{
    unsigned char x_position;
    unsigned char y_position;
    unsigned char sensorID;
} __attribute__ ((packed));

Listing Five

char buf[5];
int* i = (int*)(buf+1);
// Simulate data read from network
buf[0]=1; buf[1]=2; buf[2]=3;
buf[3]=4; buf[4]=5;
printf("%08x\n", *i);

Listing Six

struct ethhdr
{
    unsigned char h_dest[ETH_ALEN];
    unsigned char h_source[ETH_ALEN];
    unsigned short h_proto;
};

Figure 1: By adding GCC's __attribute__ keyword to a structure declaration, you can pack data tightly together at the expense of a performance hit.
