$_GET variables with URL sensitive symbols? (like for search)

I realized a pretty obvious problem with my search, but I don't know how to fix it. Say someone searches for "Hello there"; it would of course come up as something like ?s=Hello+there in the URL.
However, how do I deal with people searching for something like "Hello & such"? The browser will read that query as ?s=Hello+&+such, which makes the search variable stop at "Hello". I have the same problem with the pound symbol (#). If someone searches for something containing it, everything after it is treated as a URL fragment rather than as part of the search query.
I can't seem to find information on how to handle this. Can anyone give me a hand?

This is where encoding and escaping come into play. For PHP, see urlencode().
However, given the nature of your problem, I think you are really looking for a JavaScript function:
Encode URL in JavaScript?

Searching for & will not break your search. If you're using a GET form to make that search, the browser automatically changes the & to %26 when it submits the form. The same goes for other reserved symbols.
If you build the URL yourself, escaping the value with urlencode() in PHP or encodeURIComponent() in JS should do the trick.
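As a minimal sketch of the manual route, assuming the search page lives at /search.php and reads the "s" parameter from the question (the helper name build_search_url is made up for illustration), the URL could be built like this:

```php
<?php
// Hypothetical helper: builds a search URL for the "s" parameter used in the question.
function build_search_url(string $base, string $term): string
{
    // http_build_query() percent-encodes reserved characters such as & and #.
    return $base . '?' . http_build_query(['s' => $term]);
}

$url = build_search_url('/search.php', 'Hello & such #1');
// $url is now "/search.php?s=Hello+%26+such+%231"

// On the receiving page PHP has already decoded the value, so $_GET['s']
// holds the original text. parse_str() performs the same decoding here:
parse_str(parse_url($url, PHP_URL_QUERY), $params);
echo htmlspecialchars($params['s'], ENT_QUOTES, 'UTF-8'), "\n";
```

The htmlspecialchars() call is only there because the value is echoed back into HTML; it is separate from URL encoding.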

Related

Should I be using html_entity_decode to escape a Google Analytics custom variable?

I'm working on a WordPress site with some other developers, and the code they wrote to set up custom variables for Google Analytics via _setCustomVar uses html_entity_decode. They pointed to the well-known and widely used Yoast plugin, which uses a similar technique. I can't figure out why you would use it that way, though.
At no point (that I can see) does the string get encoded, so the function doesn't do anything. WordPress delivers whole strings, even with accents on them, never anything encoded, so there aren't rogue encoded characters to worry about. In fact, the one thing you don't want to do is send Google Analytics a mess of HTML, right?
I've changed it because I'm pretty sure that html_entity_decode does not remove or escape single quotes. In a JS script where strings are delimited by single quotes, that means any variable containing an apostrophe breaks Google Analytics tracking entirely.
Instead, I'm cleaning strings using strip_tags and esc_js (a WordPress function).
I'm a little concerned because the linked script is very commonly used. It seems like I must be wrong about something and I don't want to screw up my own script because of it.
What am I missing?

The answer seems to be that Yoast uses that code as a 'just in case' measure for strings that might have encoded characters in them. It still doesn't seem to take care of quote marks though, which is a pretty big deal.
Here's the code I wrote to solve all the issues: https://gist.github.com/AramZS/8930496
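Outside WordPress, the same idea can be sketched in plain PHP. This is not the code from the gist: esc_js() is WordPress-only, so json_encode() with the JSON_HEX_* flags stands in for it here, and the helper name ga_safe_js_string and the 'Title' variable name are made up for illustration.

```php
<?php
// Hypothetical helper; json_encode() stands in for WordPress's esc_js().
function ga_safe_js_string(string $value): string
{
    // Decode any stray HTML entities first, then drop markup.
    $plain = strip_tags(html_entity_decode($value, ENT_QUOTES | ENT_HTML5, 'UTF-8'));
    // Emit a complete double-quoted JS string literal. The JSON_HEX_* flags
    // turn ', ", <, > and & into \uXXXX escapes.
    return json_encode(
        $plain,
        JSON_HEX_APOS | JSON_HEX_QUOT | JSON_HEX_TAG | JSON_HEX_AMP | JSON_INVALID_UTF8_SUBSTITUTE
    );
}

$literal = ga_safe_js_string("Rock &amp; Roll: <b>It's</b> here");
// The literal already carries its own quotes, so none are added around it.
echo "_gaq.push(['_setCustomVar', 1, 'Title', {$literal}, 3]);\n";
```

Because the apostrophe is emitted as an escape sequence, it cannot terminate a surrounding single-quoted JS string.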

How to get special characters in URL variables to output correctly using PHP and GET

I've just run into an issue (after the web app has already been up and running for a week!) that I can't seem to solve, and I'm rushing to fix it before it causes any more trouble.
I've coded a neat little Christmas card for a business and the user inputs his/her name and the name of someone else and then sends it, so the card says TO: JOHN, FROM: PAUL, basically.
This info is sent via URL vars and then, of course, I use GET to retrieve it and output the message.
Of course, it's a card anyone can send to anyone... but I tested it only in English (my bad). So when the first Martín or Sören comes around and uses the card, they get From: Martãn, From: Sã¶ren...
Obviously, that doesn't work.
So I'm guessing I need to find a way either to transform the special character from the input field into the URL or from the URL to the output message. (While we're on the subject: which would you recommend?)
However, I can't get it to work. I've tried finding which character codes work when sent through the URL.
I've noticed URLs usually substitute certain characters and especially white spaces with a % and something else (a coding method whose name I don't know; can anyone enlighten me on that, please?). But when I try %C3%AD, which, according to a website I found, is the code for í, as in my example Martín, I continue to get the ã, as these codes in the URL are automatically changed to their special character.
I've also tried the HTML entity forms of í (named and numeric), but to no avail!

You can try using rawurlencode; check out the examples in the manual. Hope this helps.
http://www.php.net/rawurlencode
http://www.php.net/manual/en/function.rawurldecode.php
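A minimal sketch of that suggestion follows. The parameter names "to" and "from", the path /card.php and the helper card_url are assumptions, since the question does not show its code. It also assumes the names are UTF-8 and that the card page declares UTF-8; a page served with a different charset is one possible cause of output like the garbled names above.

```php
<?php
// Hypothetical helper that builds the card link; "to" and "from" are assumed parameter names.
function card_url(string $base, string $to, string $from): string
{
    return $base . '?to=' . rawurlencode($to) . '&from=' . rawurlencode($from);
}

// "\u{ED}" is í and "\u{F6}" is ö, written as escapes so the example
// does not depend on how this file itself is saved.
$url = card_url('/card.php', "Mart\u{ED}n", "S\u{F6}ren");
// $url is "/card.php?to=Mart%C3%ADn&from=S%C3%B6ren"

// When rendering the card, declare the same encoding the bytes are in, e.g.
// header('Content-Type: text/html; charset=utf-8');
parse_str(parse_url($url, PHP_URL_QUERY), $params);
echo 'TO: ', htmlspecialchars($params['to'], ENT_QUOTES, 'UTF-8'), "\n";
```

PHP decodes %C3%AD back to the two UTF-8 bytes of í automatically when it fills $_GET, so no manual rawurldecode() call is needed on the receiving side.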

Nasty regex and strange string behavior

I've been struggling with this problem for quite some time now and I just can't seem to find a solution. I have the following regular expression for matching URLs which appears to work flawlessly until I post a bunch of links on new lines without spaces between them.
(http|ftp)+(s)?:(\/\/)((\w|\.|\-)+)(\/)?(\S)+
I tried this in a couple of regex testers and it seems to pick out URLs correctly, unlike the code in my application. That made me think there must be something wrong with the code, so I started debugging. Here is what I found when I echoed the string I'm applying the regular expression to:
http://www.google.com/\r\nhttp://www.google.com/\r\nhttp://www.google.com/
I have never seen newlines (\r\n) appear as text in the browser. This makes me think that something else is getting its hands on this string. I traced it back, and it turns out that the string comes straight from a textarea element into $_POST and is not manipulated anywhere.
What may be causing those \r\ns to appear as text, and how would I go about matching URLs that users may enter separated by newlines?
I'm getting pretty desperate here, and I would really appreciate your help, guys.

If you are seeing
http://www.google.com/\r\nhttp://www.google.com/\r\nhttp://www.google.com/
when you echo the string, that means the string actually contains literal backslashes. Written as a PHP double-quoted literal, it would be:
http://www.google.com/\\r\\nhttp://www.google.com/\\r\\nhttp://www.google.com/
i.e. the backslashes have been escaped, so each \r\n is four ordinary characters rather than a line break. With no whitespace between the URLs, (\S)+ runs straight through them, which is why your regex produces only a single match.
Check out this question: Why are $_POST variables getting escaped in PHP? for reasons why your requests may be getting escaped.
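If the escaping cannot be removed at its source, one workaround is to turn the literal sequences back into real line breaks before matching. The sketch below assumes exactly the input shown above; the helper name extract_urls and the simplified pattern are illustrative, not the original regex.

```php
<?php
// Hypothetical helper: extracts URLs from textarea input whose line breaks
// may have arrived as the literal text \r\n.
function extract_urls(string $input): array
{
    // In single quotes, '\r\n' is four literal characters, not a line break.
    $normalized = str_replace(['\r\n', '\n', '\r'], "\n", $input);
    // Simplified pattern (an assumption, not the original regex).
    preg_match_all('~(?:https?|ftps?)://[\w.\-]+(?:/\S*)?~', $normalized, $matches);
    return $matches[0];
}

$posted = 'http://www.google.com/\r\nhttp://www.google.com/\r\nhttp://www.google.com/';
$urls = extract_urls($posted);
echo count($urls), "\n"; // prints 3
```

Note that this would also split a URL that legitimately contains the text \n, so fixing whatever escapes the input is the cleaner option.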

Is it safe to use (strip_tags, stripslashes, trim) to clear variable that holds URLs

It's quite a pleasure to be posting my first question here :-)
I'm running a URL shortening / redirecting service written in PHP.
I aim to store and handle only valid URL data within my service, as far as possible.
I noticed that sometimes invalid URL data is being passed to the database, containing invalid characters (like spaces at the beginning or end of the URL).
I decided to make my URL-check mechanism apply trim, stripslashes and strip_tags to the values before storing them.
As far as I can tell, these functions will not remove valid characters that a URL may contain.
Kindly correct me or advise me if I'm going in the wrong direction.
Regards..

If you're already trimming the incoming variable, as well as filtering it with the other built-in PHP functions, and STILL running into issues, try changing the collation of your table to a UTF-8 one and see if that helps you get rid of the special characters you mention. (Could you paste a few examples to let us know?)
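As a complement to stripping characters, the value can be validated after trimming. The following is a rough sketch, assuming only http and https links should be stored; the helper name clean_url is made up for illustration.

```php
<?php
// Hypothetical helper: returns the trimmed URL, or null when it does not validate.
function clean_url(string $input): ?string
{
    $candidate = trim($input);
    $valid = filter_var($candidate, FILTER_VALIDATE_URL);
    if ($valid === false) {
        return null;
    }
    // Assumption: only http and https links are acceptable for a shortener.
    $scheme = strtolower((string) parse_url($valid, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true) ? $valid : null;
}

var_dump(clean_url("  http://example.com/a?b=1 "));
var_dump(clean_url('not a url'));
```

Rejecting a value that fails validation avoids silently rewriting what the user submitted, which stripslashes and strip_tags would do.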

Removing characters from a PHP String

I'm accepting a string from a feed for display on the screen that may or may not include some rubbish I want to filter out. I don't want to filter normal symbols at all.
The values I want to remove look like this: �
It is only this that I want removed. Relevant technology is PHP.
Suggestions appreciated.

This is an encoding problem; you shouldn't try to clean out those bogus characters but rather understand why you're receiving them scrambled.
Try to get your data as Unicode, or agree with your feed provider that you both use the same encoding.

Thanks for the responses, guys. Unfortunately, those submitted had the following problems:
This is wrong for obvious reasons:
ereg_replace("[^A-Za-z0-9]", "", $string);
This:
s/[\u00FF-\uFFFF]//
which was also suggested in the deprecated ereg form of regex, didn't work when I converted it to preg either, because the range was simply too large for the regex to handle. Also, there are holes in that range that would allow rubbish to seep through.
This suggestion:
This is an encoding problem; you shouldn't try to clean out those bogus characters but rather understand why you're receiving them scrambled.
while valid, is no good because I don't have any control over how the data I receive is encoded. It comes from an external source. Sometimes there's garbage in there and sometimes there is not.
So, the solution I came up with was relatively dirty, but in the absence of something more robust I'm just accepting all standard letters, numbers and symbols and discarding the rest.
This does seem to work for now. The solution is as follows:
$fixT = str_replace("£", "&pound;", $string);
$fixT = str_replace("€", "&euro;", $fixT);
$fixT = preg_replace("/[^a-zA-Z0-9\s\.\/:!\[\]\*\+\-\|\<\>##\$%\^&\(\)_=\';,'\?\\\{\}`~\"]/", "", $fixT);
If anyone has any better ideas I'm still keen to hear them. Cheers.

You are looking for characters that are outside the range of glyphs that your font can display. You can find the maximum Unicode value that your font can display, and then create a regex that replaces anything above that value with an empty string. An example would be
s/[\u00FF-\uFFFF]//
This would strip character 255 (ÿ) and everything above it, up to U+FFFF.
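In PHP's preg functions, code points are written as \x{...} together with the /u modifier; the \uXXXX notation is not supported. If the only unwanted character is the replacement character � (U+FFFD), a narrower sketch than a whole range, assuming the feed text is otherwise valid UTF-8, might look like this (the helper name is made up for illustration):

```php
<?php
// Hypothetical helper: removes only U+FFFD and leaves other symbols alone.
function strip_replacement_chars(string $text): string
{
    // preg_replace() returns null if $text is not valid UTF-8;
    // in that case the input is returned unchanged.
    $clean = preg_replace('/\x{FFFD}/u', '', $text);
    return $clean === null ? $text : $clean;
}

// "\u{A3}" is the pound sign and "\u{FFFD}" the replacement character.
echo strip_replacement_chars("Price: \u{A3}5 \u{FFFD}\u{FFFD}ok"), "\n";
```

This keeps characters such as £ and € intact, which a range starting at U+00FF would not do for €.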

That's going to be difficult for you to do, since you don't have a solid definition of what to filter and what to keep. Typically, characters that show up as empty squares are anything that the typeface you're using doesn't have a glyph for, so the definition of "stuff that shows up like this: �" is horribly inexact.
It would be much better for you to decide exactly what characters are valid (this is always a good approach anyway, with any kind of data cleanup) and discard everything that is not one of those. The PHP filter function is one possibility to do this, depending on the level of complexity and robustness you require.

If you can't resolve the issue with the data from the feed and need to filter the information, then this may help:
PHP 5's filter_input is very good for filtering input strings and allows a fair amount of flexibility:
filter_input(input_type, variable, filter, options)
You can also filter all of your form data in one line if it requires the same filtering :)
There are some good examples and more information about it here:
http://www.w3schools.com/PHP/func_filter_input.asp
The PHP site has more information on the options here: Validation Filters
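Since a feed string is not request input, the sibling function filter_var() is probably the closer fit than filter_input(). As a small sketch with a made-up sample string, FILTER_FLAG_STRIP_HIGH drops every byte above 127:

```php
<?php
// Assumed sample feed text containing £ and é, written with escapes.
$feed = "Price: \u{A3}5 caf\u{E9}";

// FILTER_UNSAFE_RAW leaves the string alone except for what the flag strips.
$asciiOnly = filter_var($feed, FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_HIGH);

echo $asciiOnly, "\n";
```

Note that this is a blunt tool: it removes legitimate non-ASCII characters such as £ and é along with any rubbish.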

Take a look at this question to get the value of each byte in your string. (This assumes that multibyte overloading is turned off.)
Once you have the bytes, you can use them to determine what these "rubbish" characters actually are. It's possible that they're a result of misinterpreting the encoding of the string, or displaying it in the wrong font, or something else. Post them here and people can help you further.
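One simple way to look at the bytes is to dump them as hex, sketched below with a hypothetical helper hex_bytes. If the rubbish shows up as ef bf bd, it is the UTF-8 encoding of the replacement character U+FFFD.

```php
<?php
// Hypothetical helper: returns the bytes of a string as space-separated hex pairs.
function hex_bytes(string $text): string
{
    return implode(' ', str_split(bin2hex($text), 2));
}

// "A" followed by the replacement character U+FFFD.
echo hex_bytes("A\u{FFFD}"), "\n"; // prints: 41 ef bf bd
```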

Try this:
Download a sample from the feed manually.
Open it in Notepad++ or another advanced text editor (KATE on Linux is good for this).
Try changing the encoding and converting from one encoding to another.
If you find a setting that makes the characters display properly, then you'll need to either encode your site in that encoding, or convert it from that encoding to whatever you use on your site.
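Once the source encoding is known, the conversion step could be sketched as follows. Windows-1252 is only an assumed example of what the editor experiment might reveal, and the helper name to_utf8 is made up for illustration.

```php
<?php
// Hypothetical helper; Windows-1252 is only an assumed source encoding.
function to_utf8(string $text, string $assumedSource = 'Windows-1252'): string
{
    // An empty pattern with the /u modifier matches only if the subject is valid UTF-8.
    if (preg_match('//u', $text) === 1) {
        return $text;
    }
    $converted = iconv($assumedSource, 'UTF-8', $text);
    // Fall back to the original bytes if the conversion fails.
    return $converted === false ? $text : $converted;
}

// "\xA3" is the pound sign as a single Windows-1252 byte.
echo to_utf8("\xA3" . "5"), "\n";
```

Text that is already valid UTF-8 passes through untouched, so the helper can be applied to a feed that is only sometimes mis-encoded.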

Hello Friends,
Try this regular expression to remove Unicode characters from the string:
/*\\u([0-9]|[a-fA-F])([0-9]|[a-fA-F])([0-9]|[a-fA-F])([0-9]|[a-fA-F])/
Thanks,
Chintu(prajapati.chintu.001#gmail.com)
