Sunday, January 27, 2008, 10:48 AM - Tips and Tricks
There are a bunch of irritating errors that crop up in the life of a software engineer, and we always learn about them the hard way over the years. In the end, the reason is often not so complicated, as usual. It's about the way systems choose the end-of-line character. The end-of-line character is one hell of a character. Different operating system developers chose different formats. The history behind the differences is interesting, but it does not justify the pain that comes along with them.
The common OSes represent the End of Line (EOL) as follows:
DOS/Windows: uses two characters, CR+LF (in ASCII, 0D followed by 0A)
UNIX: uses only LF (in ASCII, 0A)
Macintosh (classic Mac OS, before Mac OS X): uses only CR (in ASCII, 0D)
It’s obvious that transferring text files between these operating systems causes problems. A file created in Windows usually opens fine in UNIX, except that an extra character may or may not be displayed at the end of every line. It’s usually ^M in UNIX. I can’t say how it looks on a Mac; I have never worked on one so far. A file written in Mac/Unix would probably be seen as a single long line in Windows Notepad.
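To find out which convention a given file actually uses, counting the three terminators in its raw bytes is usually enough. Here is a minimal Python sketch. The function name count_eols is made up for this post, and it assumes the data was read in binary mode so that Python does no newline translation of its own.

```python
def count_eols(data: bytes) -> dict:
    """Count CR+LF, bare LF and bare CR terminators in raw file bytes."""
    crlf = data.count(b"\r\n")
    # Every CR+LF pair also contains one LF and one CR, so subtract the
    # pairs to get the terminators that stand alone.
    lone_lf = data.count(b"\n") - crlf
    lone_cr = data.count(b"\r") - crlf
    return {"CRLF": crlf, "LF": lone_lf, "CR": lone_cr}


if __name__ == "__main__":
    sample = b"windows line\r\nunix line\nold mac line\r"
    print(count_eols(sample))
```

For a real file, something like count_eols(open(path, "rb").read()) should do. A file that reports more than one non-zero count has mixed line endings, which is a mess of its own.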
Here are a few of the problems caused by this anomaly that I have faced in my computing experience.
quantum_of_solace.sh : line 42: syntax error: unexpected end of file
but the file only has 41 lines.
Reason: This script has CR+LF as EOL, and we get this error while trying to run it on UNIX. The shell is confused by the stray <CR> characters, and its line counting is thrown off as well. Run the dos2unix utility to convert every CR+LF to LF.
end_of_life.sh: /bin/bash: bad interpreter: No such file or directory
Reason: The cause is the same as in the first case. The first line of the script is #!/bin/bash<CR>
So the file that is not found is actually “/bin/bash<CR>”, and since <CR> is a non-displayable character, this message baffles all first-time users.
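The effect can be sketched in a few lines of Python. This is only a simplified model of how the first line gets read, not the system's actual parsing: the helper name interpreter_path is made up, and it assumes the #! line holds just a path with no arguments.

```python
def interpreter_path(script: bytes) -> bytes:
    """Return the interpreter path from a "#!" line, byte for byte.

    Simplified: assumes the line holds only a path, with no arguments.
    """
    first_line = script.split(b"\n", 1)[0]
    if not first_line.startswith(b"#!"):
        return b""
    # Only spaces and tabs are trimmed; a trailing CR stays in the path.
    return first_line[2:].strip(b" \t")


if __name__ == "__main__":
    dos_script = b"#!/bin/bash\r\necho hello\r\n"
    unix_script = b"#!/bin/bash\necho hello\n"
    print(interpreter_path(dos_script))   # path ends with a CR byte
    print(interpreter_path(unix_script))
```

Printing the bytes object makes the invisible character visible as \r, which is exactly the part the error message hides.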
Have you ever heard somebody say "Always use bin mode when doing an FTP, it's faster"?
Explanation: If it is not about speed, then why does FTP have two modes? The FTP program is smart enough to do the EOL conversion for you when you transfer text files between different operating systems. So if you FTP a text file in ASCII mode from a Windows machine to a Linux machine, every “CR+LF” is converted to just “LF”. That makes ASCII mode invasive, and it can corrupt binary files transferred in this mode. Note: ZIP files are binary files, so if your text files are zipped and then FTPed in binary mode, no such conversion takes place.
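Why ASCII mode is dangerous for binary files can be shown with a toy model. The sketch below is not how any real FTP client is implemented; ascii_mode_to_unix is a made-up name, and it only mimics the Windows-to-UNIX direction of the conversion described above.

```python
def ascii_mode_to_unix(data: bytes) -> bytes:
    """Simplified model of an ASCII-mode transfer to a UNIX host."""
    return data.replace(b"\r\n", b"\n")


if __name__ == "__main__":
    text = b"first line\r\nsecond line\r\n"
    # Made-up binary payload that happens to contain the bytes 0D 0A.
    binary = bytes([0x50, 0x4B, 0x0D, 0x0A, 0x00, 0xFF])

    # Text: the line endings are fixed, as intended.
    print(ascii_mode_to_unix(text))
    # Binary: the payload comes out one byte shorter, i.e. corrupted.
    print(len(binary), len(ascii_mode_to_unix(binary)))
```

The conversion cannot tell a line ending from a pair of data bytes that merely looks like one, which is why binary mode is the safe choice for anything that is not plain text.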
Sometimes I see ^M characters at the end of every line. Where did they come from?
Reason: This happens when a file created in Windows is opened in UNIX. ^M (Ctrl-M, ASCII 0D) is how the non-displayable character <CR> is shown. It is the first half of the Windows EOL (CR+LF).
/CVSROOTccess /usr/local/cvsroot
No such file or directory
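This is the last (in this context) of the weird cases I have faced. Last Friday, Mahesh called me to decipher this message. We stared at it for a long while without a clue, tried everything, and finally turned to a Google search to get enlightened. The CVS administrative files had picked up CR+LF line endings. Most likely the <CR> at the end of the repository path sent the cursor back to the start of the line, so the text that followed was printed over the beginning of the message. All that was needed was: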
dos2unix `find . -name Root`
dos2unix `find . -name Entries`
dos2unix `find . -name Repository`
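If dos2unix is not installed, roughly the same cleanup can be done in Python. This is a sketch under a few assumptions: the function name strip_cr_from_cvs_files is made up, it only replaces CR+LF pairs rather than doing everything dos2unix does, and like the find commands above it matches files by name anywhere under the starting directory.

```python
from pathlib import Path

CVS_FILES = {"Root", "Entries", "Repository"}


def strip_cr_from_cvs_files(top: str) -> list:
    """Rewrite CR+LF as LF in every Root, Entries and Repository file under top."""
    changed = []
    for path in sorted(Path(top).rglob("*")):
        if path.is_file() and path.name in CVS_FILES:
            original = path.read_bytes()
            converted = original.replace(b"\r\n", b"\n")
            # Only touch files that actually contain CR+LF.
            if converted != original:
                path.write_bytes(converted)
                changed.append(str(path))
    return changed
```

Calling strip_cr_from_cvs_files(".") from the top of the checked-out tree returns the list of files it rewrote. It edits files in place, so try it on a copy first.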
Same file in Unix has a smaller size than in Windows ...EEEEEEEEEEhhhh!
Reason: Yes, this is expected when the EOL is converted from CR+LF to LF only. If the file has N lines, the file size is expected to shrink by N bytes. The conversion need not come from the dos2unix utility or the ASCII mode of FTP. Some source control programs do this conversion for you automatically, at check-in and check-out.
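The arithmetic is easy to check in Python. A small sketch, with a made-up function name, assuming every line of the Windows copy ends with CR+LF, the last one included:

```python
def size_after_conversion(data: bytes) -> int:
    """Size in bytes once every CR+LF has been turned into a single LF."""
    return len(data) - data.count(b"\r\n")


if __name__ == "__main__":
    windows_copy = b"one\r\ntwo\r\nthree\r\n"   # N = 3 lines
    unix_copy = windows_copy.replace(b"\r\n", b"\n")
    # The Windows copy is N bytes larger than the UNIX copy.
    print(len(windows_copy), len(unix_copy), size_after_conversion(windows_copy))
```

If the last line has no terminator at all, the difference is N - 1 bytes instead of N.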
bash: ./configure: /bin/sh^M: bad interpreter: No such file or directory
or
$ ./configure
-bash: ./configure: /bin/sh: bad interpreter: Permission denied
Reason: The configure script contains the unwanted <CR> character at the end of its first line. config.h and the makefiles are probably affected as well.
If I write a script (Perl/bash) in Windows Notepad, I can't get it to execute on Unix. If I write the same script in Linux, I can. I FTPed this Windows-written file to my colleague, and he said he was able to execute it. I am going nuts!
Reason: If you have read the reasoning so far, you know the answer to this (most likely the FTP transfer ran in ASCII mode and converted the line endings on the way). This stuff can get really nasty at times and cause a total waste of time.
To add insult to injury, none of these errors even remotely hints that the issue is related to EOL characters.
I can bet that programmers are going to face the same issues again and again. Putting an end to this is very difficult.
Happy EOLing.








