Escape a url within a regular expression in a preg_replace - php

I am trying to redirect some tags to another page, passing its href as a url parameter. The code I'm using is something like this:
preg_replace(
"/<a(\s[^>]*)href=[\"\']??([^\" >]*?)[\"\']??([^>]*)>(.*)<\/a>/siU",
"<a$1href=\"".WWW."go.php?to=".urlencode("$2")."\"$3>$4</a>", $text
);
It is a modified version of the regexp found here. I use this code in this block:
$text = "<...some other tags...><a target=\"_blank\" href=\"http://www.google.com\" style=\"...\" class=\"...\">Google</a></...some other tags...>";
And it correctly gets captured, but when using urlencode("$2"), it recieves a "$2" string, and not the value stored in the preg variables (as I would). It is not limited to urlencode, but to passing this as a parameter to any other function. So I would not only want to encode this (I can always extend a little more the regexp to accept urls) but generally use variables inside methods.
Do you know any workaround to this? Thanks in advance.

this is totally normal as your are url encoding the string "$2" and then the urlencoded string is used for replacement so you end up with the same thing as writing
"<a$1href=\"".WWW."go.php?to=$2\"$3>$4</a>"
as second parameter. If you want the urlencode to be evaluated you have to use the e (for eval) flag like this:
preg_replace(
"/<a(\s[^>]*)href=[\"\']??([^\" >]*?)[\"\']??([^>]*)>(.*)<\/a>/seiU",
"'<a$1href=\"'.WWW.'go.php?to=\"'.urlencode('$2').'\"$3>$4</a>'", $text
);
another preferable solution may be to use preg_replace_callback to avoid relying on evaluating unknown strings

Related

PHP urlencode issue with a parameter in my string: "&notify_url" incorrectly returns "¬ify_url"

So when I use PHP's urlencode on the following string, there seems to be a technicality coming up which I think is on a reserved PHP word "&not".
The original string:
cancel_url=https://example.com/payment_cancelled&notify_url=https://example.com/order_notify
I get the following result using urlencode:
cancel_url=https%3A%2F%2Fexample.com%2Fpayment_cancelled¬ify_url=https%3A%2F%2Fexample.com%2Forder_notify
As you notice above, the '¬' special character it creates (just after the word 'cancelled'). So to me it seems the "&not" portion of "&notify_url" is an operator reserved operator word ("&not" in PHP?).
I have tried PHP's str_replace function after url encoding as follows:
$paramUrlString = str_replace('¬', '&not', $paramUrlString);
$paramUrlString = str_replace('&#170', '&not', $paramUrlString);
(trying the ASCII code for that special character too)
I've run out of ideas now. Please assist, thank you.
urlencode does not usually replace &not at all, but does replace & with %26. See example here: http://sandbox.onlinephpfunctions.com/code/e9d62797d01f8162170e5ad5181e14fc339faa52
You could try replacing & with %26 before urlencode.
$urlString = str_replace('&', '%26', $urlString);
It's not that anything in PHP is replacting the string &not with ¬, it's that whatever you're using to view/display the data is doing that.
Given that the closing ; on the entity is not required, I would wager that you're putting the URL into XML without properly escaping the entities. While & is the entity that conflicts between URLs and XML, there are more than that.
The simplest solution is if you're embedding a raw string in an XML document you need to call:
$string = htmlspecialchars($string, ENT_XML1 | ENT_COMPAT);
The best solution, on the other hand, is to not create XML documents by hand at all. Use a library like DOMDocument or XMLWriter. This handles not only the escaping/encoding of your data, but all of the other subtle complexities of creatings proper XML documents.

preg_match url reconise for ifstatement [duplicate]

I've made some regex to test for a YouTube embedded video:
/^(http:\/\/www\.youtube\.com\/embed\/)[^\/\s\\]+$/
It works for what I expect when I test it, but the problem though is that I need to pass that regex as a string to some function. Particularly I'm using htmlawed, where I pass a following string to a function:
func('iframe=-*,src(match="/^(http:\/\/www\.youtube\.com\/embed\/)[^\/\s\\]+$/")');
The problem is that the above regex sort of works, but it just ignores the slashes, and accepts anything in place of them.
That is why I suspect that there is a problem with escaping.
I would appreciate if you could advice some alternative ways of escaping these slashes and backslashes... there must be some way?
If you have a string, you will need to escape the backslashes (and quotes) for the string literal. Or, depending on how the function builds the regex from the string, you might not need to escape slashes at all (I don't think so here).
"iframe=-*,src(match=\"/^(http:\\/\\/www\\.youtube\\.com\\/embed\\/)[^\\/\\s\\\\]+$/\")"
In PHP, you can also use a different regex delimiter:
~^(http://www\.youtube\.com/embed/)[^/\s\\\\]+$~

PHP Remove all non letters

I have several strings that look like this:
Lasklé
Jones & Jon
I am trying to send them via the foursquare API to be matched, however it is failing with these characters. Is there a way to sanitise these so they only include English letters i.e. the results would be:
Lasklé
Jones Jon
As it appears using file_get_contents requests both with the 'é' and the '&' in the URL is causing issues.
I checked how the request was sent and realised that the '&' is uneeded and is causing the issues, is it possible to remove all non Letters/Numbers from the name?
What do the strings look like before you pass them? If your string looks like 'Lasklé' then I think you are using the wrong character set when reading the string, try using UTF-8.
If the string looks correct before you pass it on you should try urlencode the string first.
you can use preg_replace() function to replace the part of string using regex
to keep only letters you can use as follow it will also remove space( add \s from expression to keep space)
preg_replace('/[^a-zA-Z]/','',$string);
to keep space in the string or any character to keep you can add it in []
preg_replace('/[^a-zA-Z\s]/','',$string);
Use this to escape (space and '-'). Good for making a custom URL
$string=preg_replace("/[^A-Za-z0-9\s\/\-]/", '', $string);

Automatic addition of trailing slash to urlencoded urls

I am very confused about the following:
echo("<a href='http://".urlencode("www.test.com/test.php?x=1&y=2")."'>test</a><br>");
echo("<a href='http://"."www.test.com/test.php?x=1&y=2"."'>test</a>");
The first link gets a trailing slash added (that's causing me problems)
The second link does not.
Can anyone help me to understand why.
Clearly it appears to be something to do with urlencode, but I can't find out what.
Thanks
c
You should not be using urlencode() to echo URLs, unless they contain some non standard characters.
The example provided doesn't contain anything unusual.
Example
$query = 'hello how are you?';
echo 'http://example.com/?q=' . urlencode($query);
// Ouputs http://example.com/?q=hello+how+are+you%3F
See I used it because the $query variable may contain spaces, question marks, etc. I can not use the question mark because it denotes the start of a query string, e.g. index.php?page=1.
In fact, that example would be better off just being output rather than echo'd.
Also, when I tried your example code, I did not get a traling slash, in fact I got
<a href='http://www.test.com%2Ftest.php%3Fx%3D1%26y%3D2'>test</a>
string urlencode ( string $str )
This function is convenient when
encoding a string to be used in a
query part of a URL, as a convenient
way to pass variables to the next
page.
Your urlencode is not used properly in your case.
Plus, echo don't usually come with () it should be echo "<a href='http [...]</a>";
You should use urlencode() for parameters only! Example:
echo 'http://example.com/index.php?some_link='.urlencode('some value containing special chars like whitespace');
You can use this to pass URLs, etc. to your URL.

Regex to change spaces in images into entities

I'm having a lot of difficulty matching an image url with spaces.
I need to make this
http://site.com/site.com/files/images/img 2 (5).jpg
into a div like this:
.replace(/(http:\/\/([^\s]+\.(jpg|png|gif)))/ig, "<div style=\"background: url($1)\"></div>")
Here's the thread about that:
regex matching image url with spaces
Now I've decided to first make the spaces into entities so that the above regex will work.
But I'm really having a lot of difficulty doing so.
Something like this:
.replace(/http:\/\/(.*)\/([^\<\>?:;]*?) ([^\<\>?:;]*)(\.(jpe?g|png|gif))/ig, "http://$1/$2%20$3$4")
Replaces one space, but all the rest are still spaces.
I need to write a regex that says, make all spaces between http:// and an image extension (png|jpg|gif) into %20.
At this point, frankly not sure if it's even possible. Any help is appreciated, thanks.
Trying Paolo's escape:
.escape(/http:\/\/(.*)\/([^\<\>?:;]*?) ([^\<\>?:;]*)(\.(jpe?g|png|gif))/)
Another way I can do this is to escape serverside in PHP, and in PHP I can directly mess with the file name without having to match it in regex.
But as far as I know something like htmlentities do not apply to spaces. Any hints in this direction would be great as well.
Try the escape function:
>>> escape("test you");
test%20you
If you want to control the replacement character but don't want to use a regular expression, a simple...
$destName = str_replace(' ', '-', $sourceName);
..would probably be the more efficient solution.
Lets say you have the string variable urlWithSpaces which is set to a URL which contains spaces.
Simply go:
urlWithoutSpaces = escape(urlWithSpaces);
What about urlencode() - that may do what you want.
On the JS side you should be using encodeURI(), and escape() only as a fallback. The reason to use encodeURI() is that it uses UTF-8 for encoding, while escape() uses ISO Latin. Same problems applies for decoding.
encodeURI = encodeURI || escape;
alert(encodeURI('image name.png'));

Categories