preg_match removes escaping from json string

preg_match removes escaping from json string - php

I am trying to parse some json data using json_decode function of php. However, I need to remove certain leading and trailing characters from this long string before decoding. Therefore, I am using preg_match to remove those characters prior to decode. For some reason, preg_match is changing escaping when it encounters following substring (in the middle of the string)
{content: \\\"\\200B\\\"}
After preg_match the above string looks like this:
{content: \\"\200B\\"}
Because of this, json_decode fails.
FYI, the preg_match pattern looks like this:
(?<=remove_these_leading_char)(.*)(?=remove_these_trailing_char)
OK, so here is the additional information based on the questions being asked:
Why triple escaping? fix triple escpaing etc. The answer is that I don't have any control over it. It is not generated by my code.
The original string is not fully json compliant. It has several leading and trailing characters that need to be removed. Therefore I have to use regex. The format of that string is like this:
returnedHTMLdata({json_object},xx);
It looks like this behavior is not limited to preg_match only. Even substr also does this.

It looks like you've got some JSON with padding. To remove the function name and parenthesis, leaving the (unescaped) json object, you can do something like this:
$str = <<<'EOS'
returnedHTMLdata({content: \\\"\\200B\\\", foo: \\\"bar\\\", \"baz\": \\\"fez\\\"},xx);
EOS;
$str = preg_replace('/.+?({.+}).+/','$1', $str);
echo $str;
Output:
{content: \\\"\\200B\\\", foo: \\\"bar\\\", \"baz\": \\\"fez\\\"}
Please note that even if you manage to successfully unescape this string, json_decode requires that keys - e.g. "content" - are enclosed in double quotes, so you will need to modify the JSON string/object before calling that function. Or I guess you could instead use something like the old Services_JSON package to decode it, which I believe does not have that requirement.

Related

PHP urlencode issue with a parameter in my string: "&notify_url" incorrectly returns "¬ify_url"

So when I use PHP's urlencode on the following string, there seems to be a technicality coming up which I think is on a reserved PHP word "&not".
The original string:
cancel_url=https://example.com/payment_cancelled&notify_url=https://example.com/order_notify
I get the following result using urlencode:
cancel_url=https%3A%2F%2Fexample.com%2Fpayment_cancelled¬ify_url=https%3A%2F%2Fexample.com%2Forder_notify
As you notice above, the '¬' special character it creates (just after the word 'cancelled'). So to me it seems the "&not" portion of "&notify_url" is an operator reserved operator word ("&not" in PHP?).
I have tried PHP's str_replace function after url encoding as follows:
$paramUrlString = str_replace('¬', '&not', $paramUrlString);
$paramUrlString = str_replace('&#170', '&not', $paramUrlString);
(trying the ASCII code for that special character too)
I've run out of ideas now. Please assist, thank you.

urlencode does not usually replace &not at all, but does replace & with %26. See example here: http://sandbox.onlinephpfunctions.com/code/e9d62797d01f8162170e5ad5181e14fc339faa52
You could try replacing & with %26 before urlencode.
$urlString = str_replace('&', '%26', $urlString);

It's not that anything in PHP is replacting the string &not with ¬, it's that whatever you're using to view/display the data is doing that.
Given that the closing ; on the entity is not required, I would wager that you're putting the URL into XML without properly escaping the entities. While & is the entity that conflicts between URLs and XML, there are more than that.
The simplest solution is if you're embedding a raw string in an XML document you need to call:
$string = htmlspecialchars($string, ENT_XML1 | ENT_COMPAT);
The best solution, on the other hand, is to not create XML documents by hand at all. Use a library like DOMDocument or XMLWriter. This handles not only the escaping/encoding of your data, but all of the other subtle complexities of creatings proper XML documents.

preg_match url reconise for ifstatement [duplicate]

I've made some regex to test for a YouTube embedded video:
/^(http:\/\/www\.youtube\.com\/embed\/)[^\/\s\\]+$/
It works for what I expect when I test it, but the problem though is that I need to pass that regex as a string to some function. Particularly I'm using htmlawed, where I pass a following string to a function:
func('iframe=-*,src(match="/^(http:\/\/www\.youtube\.com\/embed\/)[^\/\s\\]+$/")');
The problem is that the above regex sort of works, but it just ignores the slashes, and accepts anything in place of them.
That is why I suspect that there is a problem with escaping.
I would appreciate if you could advice some alternative ways of escaping these slashes and backslashes... there must be some way?

If you have a string, you will need to escape the backslashes (and quotes) for the string literal. Or, depending on how the function builds the regex from the string, you might not need to escape slashes at all (I don't think so here).
"iframe=-*,src(match=\"/^(http:\\/\\/www\\.youtube\\.com\\/embed\\/)[^\\/\\s\\\\]+$/\")"

In PHP, you can also use a different regex delimiter:
~^(http://www\.youtube\.com/embed/)[^/\s\\\\]+$~

PHP Remove all non letters

I have several strings that look like this:
LasklÃ©
Jones & Jon
I am trying to send them via the foursquare API to be matched, however it is failing with these characters. Is there a way to sanitise these so they only include English letters i.e. the results would be:
Lasklé
Jones Jon
As it appears using file_get_contents requests both with the 'Ã©' and the '&' in the URL is causing issues.
I checked how the request was sent and realised that the '&' is uneeded and is causing the issues, is it possible to remove all non Letters/Numbers from the name?

What do the strings look like before you pass them? If your string looks like 'LasklÃ©' then I think you are using the wrong character set when reading the string, try using UTF-8.
If the string looks correct before you pass it on you should try urlencode the string first.

you can use preg_replace() function to replace the part of string using regex
to keep only letters you can use as follow it will also remove space( add \s from expression to keep space)
preg_replace('/[^a-zA-Z]/','',$string);
to keep space in the string or any character to keep you can add it in []
preg_replace('/[^a-zA-Z\s]/','',$string);

Use this to escape (space and '-'). Good for making a custom URL
$string=preg_replace("/[^A-Za-z0-9\s\/\-]/", '', $string);

Escape a url within a regular expression in a preg_replace

I am trying to redirect some tags to another page, passing its href as a url parameter. The code I'm using is something like this:
preg_replace(
"/<a(\s[^>]*)href=[\"\']??([^\" >]*?)[\"\']??([^>]*)>(.*)<\/a>/siU",
"<a$1href=\"".WWW."go.php?to=".urlencode("$2")."\"$3>$4</a>", $text
);
It is a modified version of the regexp found here. I use this code in this block:
$text = "<...some other tags...><a target=\"_blank\" href=\"http://www.google.com\" style=\"...\" class=\"...\">Google</a></...some other tags...>";
And it correctly gets captured, but when using urlencode("$2"), it recieves a "$2" string, and not the value stored in the preg variables (as I would). It is not limited to urlencode, but to passing this as a parameter to any other function. So I would not only want to encode this (I can always extend a little more the regexp to accept urls) but generally use variables inside methods.
Do you know any workaround to this? Thanks in advance.

this is totally normal as your are url encoding the string "$2" and then the urlencoded string is used for replacement so you end up with the same thing as writing
"<a$1href=\"".WWW."go.php?to=$2\"$3>$4</a>"
as second parameter. If you want the urlencode to be evaluated you have to use the e (for eval) flag like this:
preg_replace(
"/<a(\s[^>]*)href=[\"\']??([^\" >]*?)[\"\']??([^>]*)>(.*)<\/a>/seiU",
"'<a$1href=\"'.WWW.'go.php?to=\"'.urlencode('$2').'\"$3>$4</a>'", $text
);
another preferable solution may be to use preg_replace_callback to avoid relying on evaluating unknown strings

Replace characters in a string with their HTML coding

I need to replace characters in a string with their HTML coding.
Ex. The "quick" brown fox, jumps over the lazy (dog).
I need to replace the quotations with the & quot; and replace the brakets with & #40; and & #41;
I have tried str_replace, but I can only get 1 character to be replaced. Is there a way to replace multiple characters using str_replace? Or is there a better way to do this?
Thanks!

I suggest using the function htmlentities().
Have a look at the Manual.

PHP has a number of functions to deal with this sort of thing:
Firstly, htmlentities() and htmlspecialchars().
But as you already found out, they won't deal with ( and ) characters, because these are not characters that ever need to be rendered as entities in HTML. I guess the question is why you want to convert these specific characters to entities? I can't really see a good reason for doing it.
If you really do need to do it, str_replace() will do multiple string replacements, using arrays in both the search and replace paramters:
$output = str_replace(array('(',')'), array('&#40','&#41'), $input);
You can also use the strtr() function in a similar way:
$conversions = array('('=>'(', ')'=>')');
$output = strtr($conversions, $input);
Either of these would do the trick for you. Again, I don't know why you'd want to though, because there's nothing special about ( and ) brackets in this context.
While you're looking into the above, you might also want to look up get_html_translation_table(), which returns an array of entity conversions as used in htmlentities() or htmlspecialchars(), in a format suitable for use with strtr(). You could load that array and add the extra characters to it before running the conversion; this would allow you to convert all normal entity characters as well as the same time.
I would point out that if you serve your page with the UTF8 character set, you won't need to convert any characters to entities (except for the HTML reserved characters <, > and &). This may be an alternative solution for you.
You also asked in a separate comment about converting line feeds. These can be converted with PHP's nl2br() function, but could also be done using str_replace() or strtr(), so could be added to a conversion array with everything else.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

preg_match removes escaping from json string - php

Related

PHP urlencode issue with a parameter in my string: "&notify_url" incorrectly returns "¬ify_url"

preg_match url reconise for ifstatement [duplicate]

PHP Remove all non letters

Escape a url within a regular expression in a preg_replace

Replace characters in a string with their HTML coding

Categories

Resources