Shuva's blog
Evergreen issues with the End of Line (EOL) character
Sunday, January 27, 2008, 10:48 AM - Tips and Tricks
There are a bunch of irritating errors that crop up in the life of every software engineer, and we always learn about them the hard way over the years. In the end, the reason is usually not that complicated. It's all about the way systems choose to mark the end of a line.

The end-of-line character is one hell of a character. Different operating system developers chose different formats. There is an interesting history behind the differences, but it does not justify the pain that comes along with them.

The way the common OSes choose the End of Line (EOL) character is as follows:

DOS/Windows: uses two characters, CR+LF (in ASCII, hex 0D followed by 0A)
UNIX: uses only LF (in ASCII, hex 0A)
Macintosh: uses only CR (in ASCII, hex 0D). This applies to the classic Mac OS; Mac OS X uses LF like UNIX.

It's obvious that transferring text files between these operating systems causes problems. A file created in Windows opens properly in UNIX and Mac, except that an extra character may or may not be displayed at the end of every line. It's usually ^M in UNIX. I can't say how it looks on a Mac, as I have never worked on one. A file written on Mac/UNIX would probably be seen as a single long line in Windows.
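To find out which convention a file uses, you can count the three kinds of line endings in its bytes. Here is a minimal sketch in C; the names eol_counts and count_eols are my own and not from any library.

```c
#include <assert.h>
#include <stddef.h>

/* Line-ending counts for a text buffer (illustrative names). */
typedef struct {
    size_t crlf;  /* CR+LF pairs: DOS/Windows */
    size_t lf;    /* lone LF: UNIX */
    size_t cr;    /* lone CR: classic Macintosh */
} eol_counts;

eol_counts count_eols(const char *buf, size_t len)
{
    eol_counts c = {0, 0, 0};
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r') {
            if (i + 1 < len && buf[i + 1] == '\n') {
                c.crlf++;
                i++;            /* skip the LF that belongs to this pair */
            } else {
                c.cr++;
            }
        } else if (buf[i] == '\n') {
            c.lf++;
        }
    }
    return c;
}
```

A file with only crlf counts came from Windows, one with only lf counts from UNIX, and a mix usually means that somebody edited it on both.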

Here are a few of the problems caused by this anomaly, which I have faced in my computing experience.

quantum_of_solace.sh : line 42: syntax error: unexpected end of file
but the file only has 41 lines.
Reason: This script has CR+LF as its EOL, and we get this error when trying to run it on UNIX. The shell is confused by the stray <CR> characters, and so is its line counting. You need to run the dos2unix utility to convert every CR+LF to LF.
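The core of such a conversion is small. The following is only a sketch of the idea in C, not the actual dos2unix source; crlf_to_lf is a name I made up, and it works on a buffer that is already in memory.

```c
#include <assert.h>
#include <stddef.h>

/* Rewrite CR+LF as LF in place; lone CR bytes are left alone.
 * Returns the new length, which is the old length minus one byte
 * per converted line ending. */
size_t crlf_to_lf(char *buf, size_t len)
{
    size_t out = 0;
    assert(buf != NULL || len == 0);
    for (size_t in = 0; in < len; in++) {
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue;           /* drop the CR; the LF is copied on the next turn */
        buf[out++] = buf[in];
    }
    return out;
}
```

The caller has to write the shortened buffer back to the file using the returned length.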


end_of_life.sh: /bin/bash: bad interpreter: No such file or directory
Reason: Same as in case 1. The first line of the script is #!/bin/bash<CR>
So the "No such file or directory" is actually for "/bin/bash<CR>", and since <CR> is a non-displayable character, this message baffles all first-time users.
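A quick way to check a script for this before running it is to look at the byte just before the first newline. A small sketch in C, with a hypothetical function name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True when the buffer starts with "#!" and its first line ends in CR+LF,
 * which is the situation that makes the system look for "/bin/bash<CR>". */
bool shebang_has_cr(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len < 2 || buf[0] != '#' || buf[1] != '!')
        return false;
    for (size_t i = 2; i < len; i++) {
        if (buf[i] == '\n')
            return buf[i - 1] == '\r';
    }
    return false;   /* no newline found in the buffer */
}
```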



Have you ever heard somebody say "Always use bin mode when doing an FTP, it's faster"?
Explanation: If it is not faster, then why do we have two modes in FTP? The FTP program is smart enough to do the EOL conversion for you when you transfer text files between different operating systems. So if you FTP a text file in ASCII mode, it converts every "CR+LF" to "LF" when the transfer is from Windows to Linux. Binary mode skips this conversion and copies the bytes as they are. It's obvious that ASCII mode is invasive and can corrupt binary files transferred in it. Note: ZIP files are binary files, so if your text files are zipped and then FTPed, no such conversion takes place.

Sometimes I see ^M characters at the end of every line, where did they come from?
Reason: This happens when a file created in Windows is opened in UNIX. ^M represents the non-displayable character <CR>, the first half of the Windows EOL (CR+LF).

/CVSROOTccess /usr/local/cvsroot
No such file or directory

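This is the last (in this context) of the weird cases I have faced. Last Friday, Mahesh called me to decipher this message. We had a very good stare at it without getting any clues, tried everything, and ultimately hit Google search to get enlightened. The files in the CVS directories had CR+LF line endings, and a <CR> in the middle of a message sends the cursor back to the start of the line, so "/CVSROOT" got printed over the beginning of "cannot access". All that was needed was: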

dos2unix `find . -name Root`
dos2unix `find . -name Entries`
dos2unix `find . -name Repository`


Same file in Unix has a smaller size than in Windows ...EEEEEEEEEEhhhh!
Reason: Yes, this is expected when the EOL conversion from CR+LF to LF is done. If the file has N lines, then the file size is expected to shrink by N bytes. The conversion need not necessarily come from the dos2unix utility or the ASCII mode of FTP. Some source control programs do this conversion automatically, at check-in and check-out.

bash: ./configure: /bin/sh^M: bad interpreter: No such file or directory
or
$ ./configure
-bash: ./configure: /bin/sh: bad interpreter: Permission denied

Reason: The configure script contains the unwanted <CR> character at the end of its #! line. config.h and the makefiles are probably affected as well.


If I write a script (Perl/bash) in Windows Notepad, then I can't get it executed on Unix. If I write the same script in Linux I can. I FTPed this Windows-written file to my colleague. He said he was able to execute it. I am going nuts!

Reason: If you have read the reasoning so far, you know the answer to this: the FTP transfer to the colleague, presumably in ASCII mode, converted the line endings. This stuff can get really nasty at times and cause a total waste of time.

To add insult to injury, none of these errors even remotely hints that the issue is related to the EOL characters.

I can bet that programmers are going to face the same issues again and again. Putting an end to this is very difficult.

References:
EOL Wikipedia explanation
The Unicode New line guidelines


Happy EOLing.//

India's largest bank messes up the date format.
Wednesday, January 23, 2008, 03:05 PM - Just a thought
DD/MM/YYYY or MM/DD/YYYY? Having worked with American clients and colleagues, I already appreciate the confusion that can be caused when you write dates like this. As a personal preference I always use dd-mmm-yyyy (23-Jan-2008) in all my project documentation. If the day of the month is less than 13, it is one hell of a confusion. Something like 23/01/08 is very clear; it has to be DD/MM/YY. Something like 11/12/2008 is confusing. Considering that it's only the first 12 days of every month that cause this confusion, there are 144 days in the year to worry about (132 if you leave out dates such as 05/05, which read the same either way).
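The arithmetic is easy to verify with a few lines of C. This is just a sketch to count the days; the function names are mine. It counts every date whose day number could also be a month, and optionally skips dates like 05/05 where both readings give the same date.

```c
#include <assert.h>

static const int days_in_month[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};

/* Nonzero when a written date NN/NN can be read as two different dates. */
int is_ambiguous(int day, int month)
{
    assert(month >= 1 && month <= 12);
    assert(day >= 1 && day <= days_in_month[month - 1]);
    return day <= 12 && day != month;
}

/* Count the dates in a year whose day number is 12 or less. With
 * only_ambiguous set, dates like 05/05 are skipped. */
int count_dates(int only_ambiguous)
{
    int count = 0;
    for (int month = 1; month <= 12; month++) {
        for (int day = 1; day <= 12; day++) {
            if (!only_ambiguous || is_ambiguous(day, month))
                count++;
        }
    }
    return count;
}
```

count_dates(0) gives the 12 x 12 = 144 days, and count_dates(1) gives 132.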

Of course, someone might say "Apply common sense and put it in context, and it will be fairly non-confusing most of the time". Almost everything that is 100% Indian would be DD/MM/YY, and if it's American it would be MM/DD/YY. If you get a receipt (bill/check) from a restaurant in India, it would be DD/MM/YY. The same applies to other places in India, like the bank. Or so I thought common sense meant!

Today I went to a State Bank of India ATM for the first time in my life and withdrew some money (nothing to do with the plummeting stock market, though). Two bad experiences:

1. They have a yellow sunflower as the wallpaper on the touch screen, and they displayed my balance in a yellow font on top of it. Am I hearing "Whaaaaaaaaaaaaaaaat?".

2. In spite of being an Indian bank that has served India for 200 years, it's the only bank here that prints its dates in MM/DD/YY. Even the American Citibank prints DD/MM/YY at its ATMs located in India. Crazy!



With so much IT outsourcing from the US to India, I can imagine an American product having a mistake of this sort. I wonder whom SBI outsourced/sub-contracted their ATM software to? Or did they just pick some reusable components from someone else and not know how to configure them properly? Deployment and configuration issues?

NCR Systemedia, who handle SBI's ATM system, would probably know.

Happy date-formatting.//
Is blogging damaging information search? 
Friday, January 18, 2008, 04:08 PM - Ideas and Thoughts
I read in one of my bookmarked blogs today that the .NET Framework library code is now available to developers, allowing them to press the F11 key and dive into the unexplored ocean. I wanted to find out more about this and, as usual, I clicked the Home button on my browser and typed ".net framework library code available" into the search box. The first page of results was all blogs: John's blog, Peter's blog, Mike's blog, Bill's blog, and so on. There is so much blogging, the news gets duplicated so much, and these pages get hit so much, that even the best search engine ends up showing only blogs on the first page of results. Food for thought? I was expecting some link that has "www.microsoft" in it or something along those lines, something more official than a blog.

The closest thing I found with details on setting up the debugging environment was another blog. So the question arises.

Anyways Happy Blogging.//
Notes on C calling conventions 
Friday, January 18, 2008, 06:04 AM - Programming
Today's post is basically a bunch of stuff about C name decoration that I learned in the last 2 days, and I am writing it down here so that I don't forget it.

It's about the way C function names get decorated when you compile them. These decorated names become important when you are using a library in your program.

In C (on Windows compilers) there are 3 common calling conventions: __cdecl, __stdcall and __fastcall.

For example, one may declare a function in one of the following three ways:

int __cdecl    f (int x) { return 0; }
int __stdcall g (int y) { return 0; }
int __fastcall h (int z) { return 0; }


The compiler decorates them as:

_f    // This is the standard C calling convention (__cdecl). It is also what you get when nothing is specified in front of the function name, as we normally do. The decorated name simply contains an underscore (_) followed by the function name.
_g@4  // This is the PASCAL way of name decoration (__stdcall). The number 4 is the number of bytes in the argument list; sizeof(int) = 4.
@h@4  // And this is __fastcall, which is the same as above but with an @ prefixed instead of the underscore.
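The three rules above can be written down as a tiny function. This is only a sketch that follows the rules as listed here for 32-bit x86; decorate_name and call_conv are names I invented, and nothing here calls into the compiler or linker.

```c
#include <assert.h>
#include <stdio.h>

typedef enum { CONV_CDECL, CONV_STDCALL, CONV_FASTCALL } call_conv;

/* Write the decorated name for a C function into out.
 * arg_bytes is the total size of the argument list (ignored for cdecl).
 * Returns whatever snprintf returns: the length of the decorated name. */
int decorate_name(char *out, size_t out_size, call_conv conv,
                  const char *name, unsigned arg_bytes)
{
    assert(out != NULL && name != NULL);
    switch (conv) {
    case CONV_STDCALL:
        return snprintf(out, out_size, "_%s@%u", name, arg_bytes);
    case CONV_FASTCALL:
        return snprintf(out, out_size, "@%s@%u", name, arg_bytes);
    case CONV_CDECL:
    default:
        return snprintf(out, out_size, "_%s", name);
    }
}
```

Calling it with CONV_STDCALL, "g" and 4 writes _g@4 into the buffer, matching the second line of the list.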


You can see these decorated names by opening your DLL in a DLL viewer or Dependency Walker.

One other very important point to note is that we also use __declspec(dllexport) to export these functions from our DLL. This causes a slight change in the decorated name, and the rule is:

If __declspec(dllexport) is used with
1. __cdecl, the leading underscore (_) is stripped when the name is exported.
2. any other calling convention (for example, __stdcall), the decorated name is exported as is.

Here are a few examples:

1. __declspec(dllexport) int function(int i, int j);
2. __declspec(dllexport) int __cdecl function(int i, int j);
3. __declspec(dllexport) int __stdcall function(int i, int j);
4. __declspec(dllexport) int __fastcall function(int i, int j);

The names that get exported are:

1. function
2. function (defaults to __cdecl)
3. _function@8 (if sizeof(int) == 4)
4. @function@8

When you import these functions or call them, it's very important to use the same calling convention. If you get a header file for your DLL then you generally don't have to worry much. But if you are using function pointers, you should be very particular about the way you declare your function pointer. It needs to have the proper convention (__cdecl or __stdcall).
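Here is a sketch of what a carefully declared function pointer looks like. The macro DLL_CALL, the type add_fn and the function add_impl are all hypothetical; in a real program the pointer would come from the DLL rather than from a local stand-in. The macro expands to __stdcall only on 32-bit Windows, so the same source also compiles elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* The keyword only matters on 32-bit Windows; elsewhere this sketch
 * defines it away so the same source still compiles. */
#if defined(_WIN32) && !defined(_WIN64)
#define DLL_CALL __stdcall
#else
#define DLL_CALL
#endif

/* Pointer type for a hypothetical DLL export "int add(int, int)".
 * The calling convention is part of the pointer type. */
typedef int (DLL_CALL *add_fn)(int, int);

/* Stand-in for the function the DLL would export. */
int DLL_CALL add_impl(int i, int j) { return i + j; }

/* Call through the pointer. If the pointer type named the wrong
 * convention on 32-bit x86, caller and callee would disagree about
 * who cleans the stack. */
int call_add(add_fn fn, int i, int j)
{
    assert(fn != NULL);
    return fn(i, j);
}
```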

__stdcall is a very Windows thing. The reason for its existence is that applications in other languages like VB, PASCAL or Java understand only __stdcall and not __cdecl. So if you find __stdcall in your DLL or header files on Windows, remember that it is there for compatibility with other languages.

There is another subtle difference between __cdecl and __stdcall. As we all know, the function arguments are pushed onto the stack. When the function returns, these arguments need to be popped off the stack. Whose responsibility is it to pop them off? Is it the caller or the callee?

In the case of __cdecl it is the caller's responsibility. It's the caller who knows how many bytes were passed into the function, and as such the caller can pop that many bytes off the stack. And I think that's the precise reason why there is no @NN in the name decoration rules for __cdecl. (Remember the old K & R way of defining function arguments below the function name?)

In the case of __stdcall it's the callee's responsibility. The callee pops the exact number of bytes off the stack and cleans it. That's why we have to keep the @NN in the name decoration. This allows cross-language portability, as the code to clean the stack is in the DLL itself and not in the user of the DLL. The drawback is: how would you decorate the name of a function with a variable number of arguments? What would @NN be? The caller can call it with one or more arguments (for example, the printf() function). How would the callee know how many bytes to pop off the stack? The answer is that it can't, and that's why we don't have variable argument support with __stdcall.
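A small variadic function in plain C shows the same limitation from the callee's side. In this sketch (sum_ints is my own example, not a library function) the callee cannot discover how many arguments were passed; the caller has to tell it through the count parameter, just as printf() learns it from the format string.

```c
#include <assert.h>
#include <stdarg.h>

/* Sum `count` int arguments. The callee has no way to discover how many
 * arguments were pushed, so the caller must say so in `count`. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;
    assert(count >= 0);
    va_start(args, count);
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);
    va_end(args);
    return total;
}
```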

A passing note on the not so popular __fastcall: it tells the compiler to pass the arguments in registers instead of on the stack whenever possible.

Happy Declaring.//

New synchronization techniques in Vista for C/C++ developers, but I ain't using Vista yet
Tuesday, January 15, 2008, 11:42 AM - Programming
Today's post comes from me accidentally discovering that the Windows SDK so far did not have a native implementation of condition variables. Those who know me will tell you that I am a bit naive in Windows.

Anyway, I had this particular and popular problem of having a queue in my design, with consumers and producers each running on separate threads. That required synchronization of the shared data so that they don't corrupt the queue. Basically, what we want to make sure is that only one of them is updating the queue at a time. The solution appears very simple at first glance:

Create a mutex/semaphore/CRITICAL_SECTION to guard the queue. The queue has basically two update operations: Push() and Pop(). If you guard them with a mutex, then no two threads can be updating the queue at the same time. This is a simple solution.

In this solution we are assuming that the consumer would continuously poll the queue to find out if there are new entries, i.e., check for the not_empty condition. We also assume that the producer would always check that the queue is not full and only insert when it is not. It's the producer's responsibility to wait until the queue is not_full.

To solve this, many producers and consumers Sleep() until the condition (either _not_full or _not_empty) is met. Fair enough, but not good enough, because you never know how much Sleep is appropriate: you may either be using too much CPU polling or be wasting the power of your CPU by sleeping too much.
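To make the mutex-only design concrete, here is a minimal sketch in C. I am assuming POSIX threads instead of the Win32 primitives so that it stays short; polled_queue, pq_try_push and pq_try_pop are made-up names. Both operations return false instead of waiting, which is exactly why the callers end up polling and sleeping.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 4

typedef struct {
    int items[QUEUE_CAPACITY];
    int head, count;
    pthread_mutex_t lock;
} polled_queue;

void pq_init(polled_queue *q)
{
    assert(q != NULL);
    q->head = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
}

/* Returns false when the queue is full; the producer must retry later. */
bool pq_try_push(polled_queue *q, int value)
{
    bool ok = false;
    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_CAPACITY) {
        q->items[(q->head + q->count) % QUEUE_CAPACITY] = value;
        q->count++;
        ok = true;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Returns false when the queue is empty; the consumer must retry later. */
bool pq_try_pop(polled_queue *q, int *value)
{
    bool ok = false;
    pthread_mutex_lock(&q->lock);
    if (q->count > 0) {
        *value = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAPACITY;
        q->count--;
        ok = true;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}
```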

This problem is traditionally addressed by the Monitor Object design pattern, explained beautifully by Douglas C. Schmidt in the book Pattern-Oriented Software Architecture: Patterns for Concurrent and Networked Objects.

It's pretty elegant to solve this with two condition variables (_not_full and _not_empty). Condition variables are synchronization primitives that enable threads to wait until a particular condition occurs. But isn't that what a mutex or a Windows Event does?

It's a little hard to appreciate condition variables at first go, so let's take a simple case. Imagine we have a mutex which is un-flagged (non-signaled, or value = 0). When a thread waits on that mutex (using WaitForSingleObject, POSIX's sem_wait() or System V's semop()), it gets suspended until the mutex is flagged (signaled) by another thread. The thread then resumes its operations. However, if instead of just waiting on the mutex you want the thread to:

1. Acquire a mutex (the lock), and
2. Go into suspended mode if a particular condition (boolean expression) is false, and
3. Release the mutex and stay suspended until the boolean expression is true

and do all three of the above ATOMICALLY, then you need a condition variable. And yes, a condition variable is always used in conjunction with a "mutex".
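For comparison, this is roughly how the queue looks with two condition variables. It is a sketch under the assumption that POSIX threads are available; it uses neither the Vista API nor any Win32 emulation, and bounded_queue, bq_push and bq_pop are illustrative names. The three steps map to pthread_mutex_lock(), the while loop on the condition, and pthread_cond_wait(), which releases the mutex while the thread sleeps and takes it again before returning.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define BQ_CAPACITY 4

typedef struct {
    int items[BQ_CAPACITY];
    int head, count;
    pthread_mutex_t lock;
    pthread_cond_t not_full;    /* signaled after a pop */
    pthread_cond_t not_empty;   /* signaled after a push */
} bounded_queue;

void bq_init(bounded_queue *q)
{
    assert(q != NULL);
    q->head = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_full, NULL);
    pthread_cond_init(&q->not_empty, NULL);
}

/* Blocks while the queue is full, without polling. */
void bq_push(bounded_queue *q, int value)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == BQ_CAPACITY)                 /* re-check after every wake-up */
        pthread_cond_wait(&q->not_full, &q->lock);  /* unlocks, sleeps, relocks */
    q->items[(q->head + q->count) % BQ_CAPACITY] = value;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Blocks while the queue is empty, without polling. */
int bq_pop(bounded_queue *q)
{
    int value;
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    value = q->items[q->head];
    q->head = (q->head + 1) % BQ_CAPACITY;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return value;
}
```

No Sleep() call is needed anywhere: a waiting thread uses no CPU and is woken only when the other side signals the condition.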

When I tried to solve my queue problem using the Monitor Object design pattern, I realized that Windows does not support condition variables natively. Windows Vista has introduced them, which is good, but I ain't developing software for Vista yet.

Good news:
It is possible to use the native Win32 synchronization primitives to achieve the functionality of a condition variable. There is more than one approach, each having its own pros and cons. One paper that dives into the details of 4 possible solutions is
Strategies for Implementing POSIX Condition Variables on Win32 by Douglas C. Schmidt and Irfan Pyarali.

Resources:
Synchronization Primitives New To Windows Vista
POSIX Condition Variable -- tutorial

Happy Conditioning.//
