An invisible Unicode character that can waste a lot of your time
Look at these two strings, both of which are displayed as
#cool. They are the same, right?
Well, not exactly. If we count the characters by eye, we end up with 5 characters in both.
But if we let the computer count them, the first string has 6 characters and the second one has 5.
Dumping the character codes of the first string gives
[35, 99, 111, 8203, 111, 108] with this code
let charArray = Array.from(s1).map(c => c.charCodeAt(0));
Or, the same in hex:
['0023', '0063', '006F', '200B', '006F', '006C']. Now, if we inspect each of these Unicode characters, we will see that
200B is the intruder.
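The count and the two dumps above can be reproduced with a short sketch. The names s1, s2, codes and toHex are only illustrative, and the first string is written with an explicit \u200B escape so that the hidden character is visible in the source.

```javascript
// Both strings render as "#cool"; s1 hides a Zero Width Space (U+200B).
const s1 = '#co\u200Bol';
const s2 = '#cool';

// Illustrative helper: code points as 4-digit uppercase hex strings.
const toHex = (s) =>
  Array.from(s).map((c) =>
    c.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));

// Same idea as the snippet above, using codePointAt so that characters
// outside the Basic Multilingual Plane would also be reported correctly.
const codes = Array.from(s1).map((c) => c.codePointAt(0));

console.log(s1.length, s2.length); // 6 5
console.log(codes);                // [35, 99, 111, 8203, 111, 108]
console.log(toHex(s1));            // ['0023', '0063', '006F', '200B', '006F', '006C']
```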
Enter the Zero Width Space (U+200B). Wikipedia says it is used to mark word boundaries for text-processing systems without displaying a visible space.
Great, but it can also ruin your day. Let's say you call your API for some data by a primary key, which is a string, and get something completely different from what you see directly in the database. You debug the API and everything seems to be OK. Then, after checking the database, you realize that you have two different rows with what seems to be the same primary key. How can this be? Have you found a bug in your DB? Before you open a ticket for your DB system, you double-check by comparing the hash codes of the two seemingly identical primary keys. They are not the same. After inspecting the characters one by one and wasting a couple of hours, you find out about the Zero Width Space.
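When two keys look identical but do not compare as equal, a quick check like the following can shorten that hunt. It is only a sketch: firstDifference is a hypothetical helper, and the two keys are made-up examples rather than values from a real database.

```javascript
// Hypothetical helper: returns the first position at which two strings differ,
// with the code point found there in each string (null if that string has ended).
// Returns null when the strings are identical.
function firstDifference(a, b) {
  const ca = Array.from(a);
  const cb = Array.from(b);
  const hex = (c) =>
    c === undefined
      ? null
      : 'U+' + c.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
  const n = Math.max(ca.length, cb.length);
  for (let i = 0; i < n; i++) {
    if (ca[i] !== cb[i]) {
      return { index: i, left: hex(ca[i]), right: hex(cb[i]) };
    }
  }
  return null;
}

const keyA = 'ORD-1001\u200B'; // made-up key with a trailing Zero Width Space
const keyB = 'ORD-1001';       // made-up key without it

console.log(keyA === keyB);               // false
console.log(firstDifference(keyA, keyB)); // { index: 8, left: 'U+200B', right: null }
```

Reporting the code point rather than the character itself matters here, because printing the character would show nothing.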
Now, should you be removing all zero width space characters from your input text fields? Probably not. How can you be sure that a user didn't put one there intentionally?
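And even if you want to remove it, how can you do it?
If it is at the end of the string, do you think calling
trim on it will remove it? It will not: U+200B is not classified as white space, so trim leaves it in place. Try
let trimmed = ('#cool' + '\u200B').trim() and then dump the character codes to an array; 8203 is still the last entry.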
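The trim question is easy to check by running it. Note the parentheses around the concatenation: without them, trim is applied only to the '\u200B' literal and not to the whole string. The variable names below are illustrative.

```javascript
// trim() strips white space and line terminators. U+200B is a format
// character (Unicode category Cf), not white space, so it is left in place.
const padded = '#cool' + '\u200B';
const trimmed = padded.trim();

const trimmedCodes = Array.from(trimmed).map((c) => c.charCodeAt(0));

console.log(trimmed.length); // 6
console.log(trimmedCodes);   // [35, 99, 111, 111, 108, 8203]

// For contrast, an ordinary trailing space is removed.
console.log('#cool '.trim().length); // 5
```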
But it can be done easily with regular expressions. This one finds all white space, visible or not (spaces and tabs as well as the zero width characters)
[\s\u200B-\u200D\uFEFF]. Or this one will find only zero width spaces (yes, there is more than one)