preg_replace replacing &not in string to funny character


For some reason, when preg_replace sees &not in a string it seems to replace it with ¬:
$url= "http://something?blah=2&you=3&rate=22&nothing=1";
echo preg_replace("/&rate=[0-9]*/", "", $url) . "<br/>";
But the output is as follows:
http://something?blah=2&you=3¬hing=1 // Current result
http://something?blah=2&you=3&nothing=1 // Expected result
Any ideas why this is happening and how to prevent it?

& has a special meaning when used in URIs. Your URI contains &not, which is a valid HTML entity on its own, even without the closing semicolon. It is being converted to ¬ when the page is rendered, which causes the trouble. Escape the ampersand properly as &amp; (so the text reads &amp;nothing) to avoid this problem. If your data is fetched from elsewhere, you can use htmlspecialchars() to do this automatically.
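A minimal sketch of that advice, assuming the result is echoed into an HTML page as in the question. It shows that preg_replace itself returns the expected string and that escaping at output time is what stops the browser from reading &not as an entity.

```php
<?php
$url = "http://something?blah=2&you=3&rate=22&nothing=1";

// preg_replace itself leaves "&nothing" untouched.
$clean = preg_replace("/&rate=[0-9]*/", "", $url);

// Escape only at output time, so the browser receives "&amp;nothing"
// and renders a literal "&nothing" instead of the "&not" entity.
$escaped = htmlspecialchars($clean, ENT_QUOTES, 'UTF-8');

echo $escaped . "<br/>";
// http://something?blah=2&amp;you=3&amp;nothing=1<br/>
```

Viewing the page source (rather than the rendered page) is a quick way to confirm that the raw string was never changed by PHP.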

Use &amp; in place of &,
because &not has a special meaning in HTML.
Use this URL:
http://something?blah=2&amp;you=3&amp;rate=22&amp;nothing=1
and then do your replace accordingly.

Related

PHP urlencode issue with a parameter in my string: "&notify_url" incorrectly returns "¬ify_url"

When I use PHP's urlencode on the following string, a problem comes up that I think is caused by a reserved PHP word, "&not".
The original string:
cancel_url=https://example.com/payment_cancelled&notify_url=https://example.com/order_notify
I get the following result using urlencode:
cancel_url=https%3A%2F%2Fexample.com%2Fpayment_cancelled¬ify_url=https%3A%2F%2Fexample.com%2Forder_notify
As you can see above, a special character '¬' is created (just after the word 'cancelled'). So it seems to me that the "&not" portion of "&notify_url" is treated as a reserved operator word ("&not" in PHP?).
I have tried PHP's str_replace function after url encoding as follows:
$paramUrlString = str_replace('¬', '&not', $paramUrlString);
$paramUrlString = str_replace('&#170', '&not', $paramUrlString);
(trying the ASCII code for that special character too)
I've run out of ideas now. Please assist, thank you.

urlencode does not usually replace &not at all, but does replace & with %26. See example here: http://sandbox.onlinephpfunctions.com/code/e9d62797d01f8162170e5ad5181e14fc339faa52
You could try replacing & with %26 before urlencode.
$urlString = str_replace('&', '%26', $urlString);

It's not that anything in PHP is replacing the string &not with ¬; it's that whatever you're using to view or display the data is doing that.
Given that the closing ; on the entity is not required, I would wager that you're putting the URL into XML without properly escaping the entities. While & is the character that most often conflicts between URLs and XML, there are others.
The simplest solution is if you're embedding a raw string in an XML document you need to call:
$string = htmlspecialchars($string, ENT_XML1 | ENT_COMPAT);
The best solution, on the other hand, is to not create XML documents by hand at all. Use a library like DOMDocument or XMLWriter. These handle not only the escaping and encoding of your data, but all of the other subtle complexities of creating proper XML documents.
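A short sketch of the DOMDocument route, assuming the URL ends up as the text of an XML element. The element name "notify" and the example URL are made up for illustration; the point is only that the library escapes the ampersand during serialization.

```php
<?php
// Hypothetical element name and URL, for illustration only.
$url = "https://example.com/pay?cancel=1&notify_url=https://example.com/order_notify";

$doc = new DOMDocument('1.0', 'UTF-8');
$root = $doc->createElement('notify');
// A text node is stored raw; "&" becomes "&amp;" when the document is serialized.
$root->appendChild($doc->createTextNode($url));
$doc->appendChild($root);

$xml = $doc->saveXML();
echo $xml;
```

Reading the document back with loadXML() returns the original URL unchanged, so no manual str_replace step is needed on either side.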

Does not display ampersand using $_GET in php [duplicate]

I am trying to send a GET message that contains strings with ampersands and can't figure how to escape the ampersand in the URL.
Example:
http://www.example.com?candy_name=M&M
result => candy_name = M
I also tried:
http://www.example.com?candy_name=M\&M
result => candy_name = M\\
I am using URLs manually, so I just need the correct characters.
I can't use any libraries. How can it be done?

They need to be percent-encoded:
> encodeURIComponent('&')
"%26"
So in your case, the URL would look like:
http://www.mysite.com?candy_name=M%26M

This does not only apply to the ampersand in URLs, but to all reserved characters. Some of which include:
# $ & + , / : ; = ? @ [ ]
The idea is the same as encoding an & in an HTML document, but the context has changed to be within the URI, in addition to being within the HTML document. So, the percent-encoding prevents issues with parsing inside of both contexts.
The place where this comes in handy a lot is when you need to put a URL inside of another URL. For example, if you want to post a status on Twitter:
http://www.twitter.com/intent/tweet?status=What%27s%20up%2C%20StackOverflow%3F(http%3A%2F%2Fwww.stackoverflow.com)
There are lots of reserved characters in my tweet, namely ?'():/, so I encoded the whole value of the status URL parameter. This is also helpful when using mailto: links that have a message body or subject, because you need to encode the body and subject parameters to keep line breaks, ampersands, etc. intact.
When a character from the reserved set (a "reserved character") has
special meaning (a "reserved purpose") in a certain context, and a URI
scheme says that it is necessary to use that character for some other
purpose, then the character must be percent-encoded. Percent-encoding
a reserved character involves converting the character to its
corresponding byte value in ASCII and then representing that value as
a pair of hexadecimal digits. The digits, preceded by a percent sign
("%") which is used as an escape character, are then used in the URI
in place of the reserved character. (For a non-ASCII character, it is
typically converted to its byte sequence in UTF-8, and then each byte
value is represented as above.) The reserved character "/", for
example, if used in the "path" component of a URI, has the special
meaning of being a delimiter between path segments. If, according to a
given URI scheme, "/" needs to be in a path segment, then the three
characters "%2F" or "%2f" must be used in the segment instead of a raw
"/".
http://en.wikipedia.org/wiki/Percent-encoding#Percent-encoding_reserved_characters
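The procedure in the quoted passage can be sketched by hand in PHP. The function name percentEncode is hypothetical and exists only to show the byte-to-hex step; in real code PHP's built-in rawurlencode() is the better choice.

```php
<?php
// Hypothetical helper: percent-encode every byte that is not an
// RFC 3986 "unreserved" character (letters, digits, "-", ".", "_", "~").
function percentEncode(string $value): string
{
    $out = '';
    // strlen() and [] work on bytes, so a UTF-8 character is encoded byte by byte.
    for ($i = 0, $n = strlen($value); $i < $n; $i++) {
        $byte = $value[$i];
        if (preg_match('/^[A-Za-z0-9\-._~]$/', $byte)) {
            $out .= $byte;
        } else {
            // Byte value as two uppercase hex digits, prefixed with "%".
            $out .= '%' . strtoupper(bin2hex($byte));
        }
    }
    return $out;
}

echo percentEncode('M&M'), "\n"; // M%26M
echo percentEncode('a/b'), "\n"; // a%2Fb
```

For these inputs the result matches what rawurlencode() returns, which is a reasonable sanity check for the sketch.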

Try using http://www.example.org?candy_name=M%26M.
See also this reference and some more information on Wikipedia.

I would like to add a minor comment to Blender's solution.
You can do the following:
var link = 'http://example.com?candy_name=' + encodeURIComponent('M&M');
That outputs:
http://example.com?candy_name=M%26M
The great thing about this is that it works not only for &, but for any special character.
For instance:
var link = 'http://example.com?candy_name=' + encodeURIComponent('M&M?><')
Outputs:
"http://example.com?candy_name=M%26M%3F%3E%3C"

You can use the % character to 'escape' characters that aren't allowed in URLs. See RFC 1738.
A table of ASCII values is given on the Wikipedia page.
You can see & is 26 in hexadecimal - so you need M%26M.

This may help if someone wants it in PHP:
$variable ="candy_name=M&M";
$variable = str_replace("&", "%26", $variable);
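As an alternative to replacing one character at a time, PHP's built-in urlencode() and http_build_query() encode every reserved character in a value. The sketch below is not part of the answer above; the "brand" parameter is a made-up second field used only to show several values at once.

```php
<?php
// Encode only the value, never the "name=" part or the separators.
$url = 'http://www.example.com/?candy_name=' . urlencode('M&M');
echo $url, "\n"; // http://www.example.com/?candy_name=M%26M

// http_build_query() encodes each value and joins the pairs with "&".
// The separator is passed explicitly so the result does not depend on php.ini.
$query = http_build_query(
    ['candy_name' => 'M&M', 'brand' => 'Dolce & Gabbana'],
    '',
    '&'
);
echo $query, "\n"; // candy_name=M%26M&brand=Dolce+%26+Gabbana
```

On the receiving side, $_GET (or parse_str()) decodes the values again, so 'M&M' arrives intact.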

If you can't use any libraries to encode the value,
http://www.urlencoder.org/ or http://www.urlencode-urldecode.com/ or ...
Just enter your value "M&M", not the full URL ;-)

You can instead pass your arguments through the encodeURIComponent function, so you don't have to worry about special characters:
data: "param1=getAccNos&param2=" + encodeURIComponent('Dolce & Gabbana')
or
var someValue = 'Dolce & Gabbana';
data: "param1=getAccNos&param2=" + encodeURIComponent(someValue)
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent

PHP Remove all non letters

I have several strings that look like this:
LasklÃ©
Jones & Jon
I am trying to send them via the foursquare API to be matched, however it is failing with these characters. Is there a way to sanitise these so they only include English letters i.e. the results would be:
Lasklé
Jones Jon
It appears that making file_get_contents requests with both the 'Ã©' and the '&' in the URL is causing issues.
I checked how the request was sent and realised that the '&' is unneeded and is causing the issues. Is it possible to remove all non-letters/numbers from the name?

What do the strings look like before you pass them? If your string looks like 'LasklÃ©' then I think you are using the wrong character set when reading the string, try using UTF-8.
If the string looks correct before you pass it on, you should try to urlencode the string first.

You can use the preg_replace() function to replace part of a string using a regex.
To keep only letters, you can use the following. It will also remove spaces (add \s to the expression to keep them):
preg_replace('/[^a-zA-Z]/','',$string);
To keep spaces, or any other character, add it inside the []:
preg_replace('/[^a-zA-Z\s]/','',$string);
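Note that [^a-zA-Z] also strips accented letters such as é, which the question wanted to keep. A possible variation, assuming the input is valid UTF-8, uses Unicode character classes instead. The helper name cleanName is hypothetical.

```php
<?php
// Hypothetical helper for this sketch.
function cleanName(string $name): string
{
    // \p{L} matches any Unicode letter and \p{N} any Unicode number;
    // the /u modifier makes PCRE read the subject as UTF-8.
    $kept = preg_replace('/[^\p{L}\p{N}\s]/u', '', $name);
    // Collapse the double space left behind by a removed "&".
    return trim(preg_replace('/\s+/u', ' ', $kept));
}

// "\u{E9}" is é, written as an escape so the file encoding does not matter.
echo rawurlencode(cleanName("Laskl\u{E9}")), "\n"; // Laskl%C3%A9
echo rawurlencode(cleanName("Jones & Jon")), "\n"; // Jones%20Jon
```

The rawurlencode() call at the end is what makes the cleaned name safe to place in the request URL.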

Use this to keep letters, digits, spaces, '/' and '-'. Good for making a custom URL:
$string=preg_replace("/[^A-Za-z0-9\s\/\-]/", '', $string);

Escape a url within a regular expression in a preg_replace

I am trying to redirect some tags to another page, passing its href as a url parameter. The code I'm using is something like this:
preg_replace(
"/<a(\s[^>]*)href=[\"\']??([^\" >]*?)[\"\']??([^>]*)>(.*)<\/a>/siU",
"<a$1href=\"".WWW."go.php?to=".urlencode("$2")."\"$3>$4</a>", $text
);
It is a modified version of the regexp found here. I use this code in this block:
$text = "<...some other tags...><a target=\"_blank\" href=\"http://www.google.com\" style=\"...\" class=\"...\">Google</a></...some other tags...>";
And it is correctly captured, but when using urlencode("$2"), the function receives the literal string "$2", and not the value stored in the preg variables (as I would expect). It is not limited to urlencode; the same happens when passing this as a parameter to any other function. So I would like not only to encode this (I can always extend the regexp a little more to accept URLs) but, more generally, to use these variables inside function calls.
Do you know any workaround to this? Thanks in advance.

This is totally normal: you are URL-encoding the literal string "$2", and the urlencoded string is then used for the replacement, so you end up with the same thing as writing
"<a$1href=\"".WWW."go.php?to=$2\"$3>$4</a>"
as the second parameter. If you want the urlencode to be evaluated you have to use the e (for eval) flag, like this (note that the e flag was deprecated in PHP 5.5 and removed in PHP 7.0):
preg_replace(
"/<a(\s[^>]*)href=[\"\']??([^\" >]*?)[\"\']??([^>]*)>(.*)<\/a>/seiU",
"'<a$1href=\"'.WWW.'go.php?to='.urlencode('$2').'\"$3>$4</a>'", $text
);
Another, preferable solution is to use preg_replace_callback, which avoids evaluating unknown strings.
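A sketch of the preg_replace_callback approach, reusing the pattern from the question. WWW is the asker's own constant, so the value defined here is only a placeholder, and the sample $text is shortened for illustration.

```php
<?php
// Placeholder value; the real WWW constant comes from the asker's code.
define('WWW', 'http://www.example.com/');

$text = '<p><a target="_blank" href="http://www.google.com" class="x">Google</a></p>';

$result = preg_replace_callback(
    "/<a(\s[^>]*)href=[\"\']??([^\" >]*?)[\"\']??([^>]*)>(.*)<\/a>/siU",
    function (array $m): string {
        // $m[2] holds the captured href, so urlencode() sees the real URL
        // instead of the literal string "$2".
        return '<a' . $m[1] . 'href="' . WWW . 'go.php?to=' . urlencode($m[2])
            . '"' . $m[3] . '>' . $m[4] . '</a>';
    },
    $text
);

echo $result, "\n";
```

Because the callback is ordinary PHP code, any function can be applied to the captured groups, and nothing from the input is ever evaluated as code.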

Automatic addition of trailing slash to urlencoded urls

I am very confused about the following:
echo("<a href='http://".urlencode("www.test.com/test.php?x=1&y=2")."'>test</a><br>");
echo("<a href='http://"."www.test.com/test.php?x=1&y=2"."'>test</a>");
The first link gets a trailing slash added (that's causing me problems)
The second link does not.
Can anyone help me to understand why.
Clearly it appears to be something to do with urlencode, but I can't find out what.
Thanks
c

You should not be using urlencode() to echo URLs, unless they contain some non standard characters.
The example provided doesn't contain anything unusual.
Example
$query = 'hello how are you?';
echo 'http://example.com/?q=' . urlencode($query);
// Outputs http://example.com/?q=hello+how+are+you%3F
See I used it because the $query variable may contain spaces, question marks, etc. I can not use the question mark because it denotes the start of a query string, e.g. index.php?page=1.
In fact, that example would be better off just being output rather than echo'd.
Also, when I tried your example code, I did not get a trailing slash; in fact, I got
<a href='http://www.test.com%2Ftest.php%3Fx%3D1%26y%3D2'>test</a>

string urlencode ( string $str )
This function is convenient when
encoding a string to be used in a
query part of a URL, as a convenient
way to pass variables to the next
page.
Your urlencode is not used properly in your case.
Plus, echo doesn't usually come with (); it should be echo "<a href='http [...]</a>";

You should use urlencode() for parameters only! Example:
echo 'http://example.com/index.php?some_link='.urlencode('some value containing special chars like whitespace');
You can use this to pass URLs, etc. to your URL.
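To illustrate, here is a minimal sketch in which the question's URL is passed as a parameter value. The go.php script and the "to" parameter name are assumptions made for the example. When urlencode() is applied to the whole address instead, the slashes become %2F, so the browser most likely treats everything after http:// as a host name, which would explain the trailing slash it adds.

```php
<?php
// The target URL is data here: it travels as the value of a hypothetical "to" parameter.
$target = 'http://www.test.com/test.php?x=1&y=2';

// urlencode() the parameter value only, then HTML-escape the whole attribute.
$href = 'http://example.com/go.php?to=' . urlencode($target);
echo "<a href='" . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . "'>test</a>\n";
```

On the receiving page, $_GET['to'] would contain the original URL again, already decoded.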
