Shuva's blog
Racism in programming 
Friday, February 1, 2008, 04:46 AM - Just a thought
With all the noise about racism in cricket, I got confused about what racism actually means. I looked up the literal meaning of the word, and it says that being racist means discriminating on the basis of race, color or religion.

I wonder: would I be considered a racist if I don't write my program to handle wide characters?

Happy? Happy what?.//
Reading lines from stdin in C++
Friday, February 1, 2008, 04:25 AM - Programming
While programming in C++, most people still seem to prefer scanf() and printf() for satisfying their program's hunger and excretion needs, rather than C++'s std::cin and std::cout. In my experience, most C++ programmers have very little command of the C++ IO routines. They go back to the C style of programming when dealing with IO (be it stdin/stdout or file reads/writes). FILE* fp appears more often than std::fstream.

After reading some articles on how mixing std::cout and printf() could produce unpredictable behavior, I am making a conscious effort to use only std::cout and std::cin in all my C++ programs. I agree with everyone who says that printf() is easier when you are writing small programs: writing std::endl is always a little more painful than just writing "\n", and getting an integer formatted in hex is just "%x" in C, whereas in C++ it means writing "<< std::hex <<". Formatting is much easier in C than in C++.

To illustrate,
printf(" %d : %s | The file, %s has %d lines and is located "
       "at offset 0x%X.\n", __LINE__, __FUNCTION__, filename,
       lines, offset);

is easier to write, and its output is easier to predict, than
std::cout << " " << __LINE__ << " : " << __FUNCTION__
          << " | The file, " << filename << " has " << lines
          << " lines and is located at offset 0x" << std::hex
          << offset << std::endl;

But when you need to do complex stuff, C++ can be quite effective, less prone to crashes and safer than C. That is an altogether different discussion; I mention it only so that I don't sound like I am vehemently rejecting its beauty.
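
Going back to the std::cout version for a moment: std::hex is sticky, so every integer sent to that stream afterwards is also printed in hex until std::dec is sent. Here is a minimal sketch of one way around that. The helper name to_hex is made up for this example and is not part of any library. It formats into a temporary std::ostringstream, so the flags of std::cout are never touched.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: format a value as "0x..." using uppercase hex digits.
// The manipulators only affect the local stream, not std::cout.
std::string to_hex(unsigned long value) {
    std::ostringstream out;
    out << "0x" << std::hex << std::uppercase << value;
    return out.str();
}
```

With this, the last part of the statement could be written as << " lines and is located at offset " << to_hex(offset) << std::endl; and later integers would still print in decimal.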

In today's diary, I am writing about reading characters from stdin and some common problems/issues that always seem to make me feel like I am reinventing the wheel.

When you do std::cin >> var; it does a pretty decent job of reading integers, floats, characters or words.

What if you want to read a line from stdin? Let's say I want to read my name, "Shuva Brata Deb", into a string. Just like scanf, std::cin >> skips leading whitespace by default and stops reading at the next space, tab or newline. So just saying std::cin >> str; reads only "Shuva".
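
This word-by-word behavior is easy to try without a keyboard by letting a std::istringstream stand in for std::cin. The sketch below is only an illustration, and the function name read_words is made up for it. Each >> picks up one whitespace-separated word, so the three parts of the name arrive as three separate reads.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads words with operator>> until the stream runs out.
// Each extraction skips leading whitespace and stops at the next whitespace.
std::vector<std::string> read_words(std::istream& in) {
    std::vector<std::string> words;
    std::string word;
    while (in >> word) {
        words.push_back(word);
    }
    return words;
}
```

Feeding it a stream built from "Shuva Brata Deb\n" gives three entries, the first of which is "Shuva".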

I did some searching in my quest for reading lines and, interestingly (as expected), there is more than one way of getting this done. Here are my notes:

1. Using the getline function.
#include <iostream>
#include <string>

int main() {
    std::string str;

    std::cout << "Name:";
    std::getline(std::cin, str);   // reads the whole line, spaces included
    std::cout << str;

    std::cout << std::endl << "Company:";
    std::getline(std::cin, str);
    std::cout << str;
}

So instead of using std::cin >> str; you use the getline function (std::getline, declared in <string>) to read the string. It is recommended that you don’t mix the two styles of reading input. The reason is that std::cin >> str; leaves the end-of-line character in the input buffer. If you then call getline(std::cin, str), it reads only up to that left-over EOL character and returns an empty string, which gives the impression that getline(std::cin, str) did nothing.

To illustrate:

#include <iostream>
#include <string>

int main() {
    std::string str;

    std::cout << "Name:";
    std::cin >> str;               // stops at the first whitespace
    std::cout << str;

    std::cout << std::endl << "Company:";
    std::getline(std::cin, str);   // picks up whatever is left on the first line
    std::cout << str << std::endl;
}


This gives different outputs, depending on whether you enter several words on the first line or just one word followed by Enter.

Here are the two outputs, which can be a bit confusing:
C:\readline.exe
Name:Shuva Brata Deb
Shuva
Company: Brata Deb
Press any key to continue . . .
<Here the company was mis-read, but this was expected>


C:\readline.exe
Name:Shuva
Shuva
Company:
Press any key to continue . . .
<Here you did not get a chance to enter the company>


The correct solution is, of course, to use only getline(). The point (as the second output shows) is that we should try not to mix the two.

But we know such recommendations have little value, because sometimes we do need to break them. The question is: how do you make a getline(std::cin, str) after a std::cin >> str; behave properly?

First solution (not a very good one):
Call getline(std::cin, str); twice. The first call eats the left-over end-of-line.
#include <iostream>
#include <string>

int main() {
    std::string str;

    std::cout << "Name:";
    std::cin >> str;
    std::cout << str;

    std::cout << std::endl << "Company:";
    std::getline(std::cin, str);   // eats the rest of the first line
    std::getline(std::cin, str);   // reads the actual company line
    std::cout << str << std::endl;
}


Second solution: Call std::cin.get() to read the left-over end of line.
#include <iostream>
#include <string>

int main() {
    std::string str;

    std::cout << "Name:";
    std::cin >> str;
    std::cout << str;

    std::cout << std::endl << "Company:";
    char eol(0);
    std::cin.get(eol);             // consumes exactly one character, the left-over EOL
    std::getline(std::cin, str);
    std::cout << str << std::endl;
}


Third solution: Check for the EOL char and ignore it.
#include <iostream>
#include <string>

int main() {
    std::string str;

    std::cout << "Name:";
    std::cin >> str;
    std::cout << str;

    std::cout << std::endl << "Company:";
    // skip the left-over newline, but only if it is the very next character
    if (std::cin.peek() == '\n') std::cin.ignore(1, '\n');
    std::getline(std::cin, str);
    std::cout << str << std::endl;
}
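
A variation on the third solution, offered as a sketch and not as the definitive fix: instead of checking for a single '\n', discard everything up to and including the next newline. That also covers trailing spaces or extra words typed after the first word. The helper name read_word_then_line is made up for this example, and it takes any std::istream so that it can be tried with a std::istringstream instead of std::cin.

```cpp
#include <istream>
#include <limits>
#include <sstream>
#include <string>
#include <utility>

// Reads one word with >>, throws away the rest of that line,
// then reads the next full line with std::getline.
std::pair<std::string, std::string> read_word_then_line(std::istream& in) {
    std::string word;
    std::string line;
    in >> word;
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    std::getline(in, line);
    return std::make_pair(word, line);
}
```

The trade-off is that with an input like "Shuva Brata Deb", the " Brata Deb" part is silently dropped and no longer ends up as the company.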


Using getline() is more robust than std::cin >>. You can use getline() with file streams too, and you can make it stop reading at a specific delimiter character, which by default is ‘\n’. You end up with a consistent code base if you stick to one style, preferably the one that is more robust and solves more problems.
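
As a small illustration of the delimiter argument, here is a sketch that splits a string on an arbitrary character by passing that character as the third argument of std::getline. The function name split is made up for this example. The same loop should work unchanged on a std::ifstream, since std::getline accepts any input stream.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits text into fields, using std::getline with a custom delimiter.
std::vector<std::string> split(const std::string& text, char delimiter) {
    std::vector<std::string> fields;
    std::istringstream in(text);
    std::string field;
    while (std::getline(in, field, delimiter)) {
        fields.push_back(field);
    }
    return fields;
}
```

For example, split("Shuva,Brata,Deb", ',') yields the three names as separate fields.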

Happy Getlining.//

Password changes every day
Tuesday, January 29, 2008, 04:42 AM - Ideas and Thoughts
While administering Aminus3, we had a support query from a user who said that her site login password changed every day. I had to read the email more than once to understand what was actually going on. The password was not supposed to change daily, and even if it did, how would the user know?

What was actually happening was that she would forget her password and click the "Forgot password" link on the site, expecting her original password to be emailed to her. Every time, the password reset email would contain a new password. It took a couple of such emails before she wrote to us to report the bug in the system.

Come to think of it, it is perfectly reasonable for a not-so-internet-savvy user to expect "Forgot password" to send her the original password. It's straightforward logic. I need to read that article on usability again.

Food for thought!

Happy thinking.//
Evergreen issues with the End of Line (EOL) character
Sunday, January 27, 2008, 10:48 AM - Tips and Tricks
There are a bunch of irritating errors that keep coming up in the life of software engineers, and we always learn about them the hard way over the years. In the end, the reason is often not so complicated, as usual. It's about the way systems choose the end-of-line character.

The end-of-line character is one hell of a character. Different operating system developers chose different formats. There is an interesting history behind the differences, but it does not justify the pain that comes along with them.

The way the common OSes choose the End of Line (EOL) character is as follows:

DOS/Windows: It uses two characters, CR+LF (in ASCII, hex 0D followed by 0A)
UNIX: It uses only LF (in ASCII, hex 0A)
Macintosh: It uses only CR (in ASCII, hex 0D). This is the classic Mac OS convention; Mac OS X uses LF like UNIX.

It’s obvious that transferring text files between these operating systems would cause problems. A file created in Windows would open properly in UNIX and Mac, except that an extra character may or may not be displayed at the end of every line. It’s usually shown as ^M in UNIX. I can't say how it looks on a Mac; I have never worked on one so far. A file written on a Mac or in UNIX would probably be seen as a single long line in Windows.
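
To make the three conventions concrete, here is a rough sketch that guesses which one a piece of text uses by looking at its first line break. The function name detect_eol and the returned labels are made up for this example. It assumes the text was read without any translation; on Windows that means opening the file with std::ios::binary, because text mode converts CR+LF to LF while reading.

```cpp
#include <string>

// Classifies the line-ending convention of a text buffer by its first
// line break. Returns "CRLF", "LF", "CR", or "none" if there is no break.
std::string detect_eol(const std::string& text) {
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (text[i] == '\r') {
            bool followed_by_lf = (i + 1 < text.size() && text[i + 1] == '\n');
            return followed_by_lf ? "CRLF" : "CR";
        }
        if (text[i] == '\n') {
            return "LF";
        }
    }
    return "none";
}
```

A file with mixed line endings would need a full scan; this sketch only reports the first break it finds.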

Here are a few of the problems caused by this anomaly, which I have faced in my computing experience.

quantum_of_solace.sh : line 42: syntax error: unexpected end of file
but the file only has 41 lines.
Reason: This script has CR+LF as EOL and we get this error while trying to run it on UNIX. The shell is confused by the end-of-line characters, and so is its line counting. You need to run the dos2unix utility to convert every CR+LF to LF.
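
The conversion itself is simple. The sketch below shows the idea in C++; it is not how dos2unix is actually implemented, and the function name crlf_to_lf is made up for this example. It works on a string already held in memory.

```cpp
#include <string>

// Returns a copy of text with every CR+LF pair replaced by a single LF.
// A lone CR (not followed by LF) is left untouched.
std::string crlf_to_lf(const std::string& text) {
    std::string result;
    result.reserve(text.size());
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n') {
            continue;  // drop the CR; the LF is copied on the next iteration
        }
        result += text[i];
    }
    return result;
}
```

Applied to "#!/bin/bash\r\necho hi\r\n", it returns "#!/bin/bash\necho hi\n".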


end_of_life.sh: /bin/bash: bad interpreter: No such file or directory
Reason: This has the same cause as case 1. The first line of the script is #!/bin/bash<CR>
So the "No such file or directory" is actually for “/bin/bash<CR>”, and since <CR> is a non-displayable character, this message baffles all first-time users.



Have you ever heard somebody say "Always use the bin mode while doing an FTP, it's faster"?
Explanation: If speed were the only difference, why would FTP have two modes? The FTP program is smart enough to do the EOL conversion for you when you transfer text files between different operating systems. If you FTP a text file in ASCII mode, it converts every “CR+LF” to “LF” when the transfer is from a Windows OS to a Linux OS. It’s obvious that ASCII mode is invasive and could be damaging if you use it for transferring binary files. Note: ZIP files are binary files, so if your text files are zipped and then FTPed, no such conversion is possible.

Sometimes I see ^M characters at the end of every line; where did they come from?
Reason: This happens when a file created in Windows is opened in UNIX. ^M represents the non-displayable character <CR>, which is the first half of the Windows EOL (CR+LF).
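
The same stray <CR> shows up inside C++ programs. On UNIX, std::getline splits on LF only, so every line read from a Windows-written file ends with '\r'. Here is a minimal sketch of a wrapper that hides the difference; the name getline_any_eol is made up for this example.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Reads one line and strips a trailing CR, so that CR+LF and LF files
// produce the same strings. Returns false when no line could be read.
bool getline_any_eol(std::istream& in, std::string& line) {
    if (!std::getline(in, line)) {
        return false;
    }
    if (!line.empty() && line[line.size() - 1] == '\r') {
        line.erase(line.size() - 1);
    }
    return true;
}
```

It can be used in a loop just like the plain call: while (getline_any_eol(file, line)) { ... }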

/CVSROOTccess /usr/local/cvsroot
No such file or directory

This is the last one (in context) of the weird cases that I have faced. Last Friday, Mahesh called me to decipher this message. We had a very good stare at it without finding any clues, tried everything, and ultimately hit Google search to get enlightened. All that needed to be done was:

dos2unix `find . -name Root`
dos2unix `find . -name Entries`
dos2unix `find . -name Repository`


Same file in Unix has a smaller size than in Windows ...EEEEEEEEEEhhhh!
Reason: Yes, this is expected when the EOL conversion is done from CR+LF to LF. If the file has N lines, the file size is expected to shrink by N bytes. The conversion need not come from the dos2unix utility or the ASCII mode of FTP. Some source control programs do this conversion automatically at check-in and check-out.
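
A quick worked example: the three lines "one", "two" and "three" take 17 bytes with CR+LF endings and 14 bytes with LF endings, a difference of 3 bytes for 3 lines. The sketch below counts the CR+LF pairs in a string, which is the number of bytes such a conversion would save. The function name count_crlf is made up for this example.

```cpp
#include <cstddef>
#include <string>

// Counts CR+LF pairs; each pair shrinks by one byte when converted to LF.
std::size_t count_crlf(const std::string& text) {
    std::size_t count = 0;
    std::string::size_type pos = text.find("\r\n");
    while (pos != std::string::npos) {
        ++count;
        pos = text.find("\r\n", pos + 2);
    }
    return count;
}
```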

bash: ./configure: /bin/sh^M: bad interpreter: No such file or directory
or
$ ./configure
-bash: ./configure: /bin/sh: bad interpreter: Permission denied

Reason: The configure script contains the unwanted <CR> character (the first message shows it as ^M right after /bin/sh). The makefiles are probably affected too.


If I write a script (Perl/bash) in Windows Notepad, then I can't get it executed on Unix. If I write the same script in Linux, I can. I FTPed this Windows-written file to my colleague. He said he was able to execute it. I am going nuts!

Reason: If you have read the reasoning so far, you know the answer to this: most likely the FTP transfer ran in ASCII mode and converted the line endings on the way. This stuff can get really nasty at times and cause a total waste of time.

To add insult to injury, none of these errors even remotely hints that the issue is related to the EOL characters.

I can bet that programmers are going to face the same issue again and again. Putting an end to this is very difficult.



Happy EOLing.//

India's largest bank messes up the date format.
Wednesday, January 23, 2008, 03:05 PM - Just a thought
DD/MM/YYYY or MM/DD/YYYY? Having worked with American clients and colleagues, I already appreciate the confusion that can be caused when you document dates like this. As a personal preference, I always use dd-mmm-yyyy (23-Jan-2008) in all my project documentation. If the day of the month is less than 13, it is one hell of a confusion. Something like 23/01/08 is very clear; it has to be DD/MM/YY. Something like 11/12/2008 is confusing. Considering that it's only the first 12 days of every month that cause this confusion, there are 144 days in the year to worry about.
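
The 144 comes from 12 months times 12 days. Strictly speaking, the 12 dates where day and month are equal (01/01, 02/02, ... 12/12) read the same either way, which leaves 132 that can genuinely be misread. A small sketch of that arithmetic follows; the function name count_ambiguous_dates is made up for this example.

```cpp
// Counts the (day, month) pairs in a year that can be read as either
// DD/MM or MM/DD, that is, both numbers lie between 1 and 12.
// When skip_identical is true, pairs such as 12/12 are excluded,
// because both readings give the same date.
int count_ambiguous_dates(bool skip_identical) {
    int count = 0;
    for (int month = 1; month <= 12; ++month) {
        for (int day = 1; day <= 12; ++day) {
            if (skip_identical && day == month) {
                continue;
            }
            ++count;
        }
    }
    return count;
}
```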

Of course, someone might argue: "Apply common sense and put it in context; it would be fairly unambiguous most of the time". Almost everything that is 100% Indian would be DD/MM/YY, and if it's American it would be MM/DD/YY. If you get a receipt (bill/check) from a restaurant in India, it would be DD/MM/YY. The same applies to other places in India, like the bank. Or so I thought common sense meant!

Today I went to a State Bank of India ATM for the first time in my life and withdrew some money (nothing to do with the plummeting stock market, though). Two bad experiences:

1. They have a yellow sunflower as the wallpaper on the touch screen, and they displayed my balance in a yellow font on top of it. Am I hearing "Whaaaaaaaaaaaaaaaat?".

2. In spite of being an Indian bank that has served India for 200 years, it's the only bank here that prints its dates in MM/DD/YY. Even the American Citibank prints DD/MM/YY at ATMs located in India. Crazy!



With so much IT outsourcing from the US to India, I can imagine an American product having a mistake of this sort. I wonder whom SBI outsourced/sub-contracted their ATM software to? Or did they just pick some reusable components from someone else and not know how to configure them properly? Deployment and configuration issues?

NCR Systemedia, who handle SBI's ATM system, would probably know.

Happy date-formatting.//