Why is rawurlencode() in PHP adding additional escape characters to ampersands? - php

I think I'm missing something obvious here but it is driving me crazy and I can't figure it out. I'm developing a WordPress plugin and part of it needs to take the WordPress post title and send that to a RESTful web service to do something else. So of course I want to rawurlencode() the post title since who knows what text might be in there. However, for some reason the output I'm getting has extra escape characters and I have no idea where they are coming from (and it's causing problems with the web service I'm calling obviously).
My code is fairly straightforward:
$topic = get_the_title($post_id);
$curl_post_fields = 'name=' . rawurlencode( $topic );
Yet when I print the output of those two strings I get:
topic=a & b
name=a%20%26%23038%3B%20b
Whereas I would expect the URL encoded string to be
name=a%20%26%20b
I have no idea where that extra %23038%3B could be coming from. If I'm reading the encoding on that correctly it translates to #038; but I still don't know where it's coming from.

There seems to be an HTML encoding step in between as well: instead of &, the encoded string contains &#038;. That is probably because & has to be escaped in HTML, and get_the_title() escapes it using htmlspecialchars() or something similar.
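If that is what is happening, one way around it is to decode the HTML entities before URL-encoding. This is only a minimal sketch: encode_title_for_url() is a made-up helper name, and the literal 'a &#038; b' stands in for whatever get_the_title() actually returns on your site.

```php
<?php
// Hypothetical helper (not a WordPress or PHP function):
// turn HTML entities back into plain characters, then URL-encode.
function encode_title_for_url(string $title): string
{
    // '&#038;' becomes '&' here, so rawurlencode() only sees the real character.
    $plain = html_entity_decode($title, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return rawurlencode($plain);
}

// Assumed input: the entity-encoded title from the question.
echo 'name=' . encode_title_for_url('a &#038; b'), "\n"; // name=a%20%26%20b
```

The decode step is only correct if the title really is HTML-encoded, which is what the %23038%3B in the output suggests.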

I had some problems with that when I used an older PHP version.

Related

PHP Escape a string if it hasn't already been escaped with entities

I'm using a 3rd party API that seems to return its data with the entity codes already in there, such as The Lion&#8217;s Pride.
If I print the string as-is from the API it renders just fine in the browser (in the example above it would put in an apostrophe). However, I can't trust that the API will always use the entities in the future, so I want to use something like htmlentities or htmlspecialchars myself before I print it. The problem with this is that it will encode the ampersand in the entity code again, and the end result will be The Lion&amp;#8217;s Pride in the HTML source, which doesn't render anything user friendly.
How can I use htmlentities or htmlspecialchars only if it hasn't already been used on the string? Is there a built-in way to detect if entities are already present in the string?
No one seems to be answering your actual question, so I will:
How can I use htmlentities or htmlspecialchars only if it hasn't already been used on the string? Is there a built-in way to detect if entities are already present in the string?
It's impossible. What if I'm making an educational post about HTML entities and I want to actually print this on the screen:
The Lion&#8217;s Pride
... it would need to be encoded as...
The Lion&amp;#8217;s Pride
But what if that was the actual string we wanted to print on the screen? ... and so on.
Bottom line is, you have to know what you've been given and work from there – which is where the advice from the other answers comes in – which is still just a workaround.
What if they give you double-encoded strings? What if they start wrapping the HTML-encoded strings in XML? And then wrap that in JSON? ... And then the JSON is converted to binary strings? The possibilities are endless.
It's not impossible for the API you depend on to suddenly switch the output type, but it's also a pretty big violation of the original contract with your users. To some extent, you have to put some trust in the API to do what it says it's going to do. Unit/Integration tests make up the rest of the trust.
And because you could never write a program that works for any possible change they could make, it's senseless to try to anticipate any change at all.
Decode the string, then re-encode the entities. (Using html_entity_decode())
$string = htmlspecialchars(html_entity_decode($string));
https://eval.in/662095
There is NO WAY to do what you ask for!
You must know what kind of data is the service giving back.
Anything else would be guessing.
Example:
What if the service is giving back &amp; but is not escaping?
You would guess it IS escaping, so you would wrongly interpret it as & while the correct value is &amp;
I think the best solution, is first to decode all html entities/special chars from the original string, and then html encode the string again.
That way you will end up with a correctly encoded string, no matter if the original string was encoded or not.
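A rough sketch of that decode-then-encode idea. normalize_entities() is a name I made up for illustration, and the sample strings are assumptions, not data from the API in question.

```php
<?php
// Hypothetical helper: decode whatever entities are present, then encode once.
function normalize_entities(string $s): string
{
    $decoded = html_entity_decode($s, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return htmlspecialchars($decoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Raw and already-encoded input end up as the same output.
echo normalize_entities('Tom & Jerry'), "\n";     // Tom &amp; Jerry
echo normalize_entities('Tom &amp; Jerry'), "\n"; // Tom &amp; Jerry
```

Note the trade-off: if the API ever sends unescaped text that is meant to show a literal "&amp;", this collapses it to "&" on screen, so it is still a guess about what the data is.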
You also have the option of using htmlspecialchars_decode();
$string = htmlspecialchars_decode($string);
It's already in htmlentities:
php > echo htmlentities('Hi&amp;mom', ENT_HTML5, ini_get('default_charset'), false);
Hi&amp;mom
php > echo htmlentities('Hi&amp;mom', ENT_HTML5, ini_get('default_charset'), true);
Hi&amp;amp;mom
Just set the optional 4th argument (double_encode) to false to NOT double-encode.

'&LTV' rendering as '<V' when I need it to literally say '&LTV' - php

I ran into a quirky syntax issue.
I am using PHP and cURL to pull in data from a web page. The link has several variables. One of them is '&LTV', but the resulting link keeps translating '&LTV' as '<V', reading '&LT' as the 'less than' symbol, when I need the literal text.
I have looked all over the place to figure out how to force php to read '&LTV' literally but have not found it.
Any ideas here would be appreciated.
Thanks.
You need to encode your HTML entities. Either use htmlentities or manually type out the string "&amp;LTV".
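Something along these lines, as a sketch only: the host, the parameter names other than LTV, and the values are all placeholders I invented. The point is that the URL handed to cURL stays raw, and only the copy printed into the page gets escaped.

```php
<?php
// Placeholder endpoint and values, for illustration only.
$params = ['id' => 42, 'LTV' => 80];
$url    = 'http://example.com/report.php?' . http_build_query($params, '', '&');

// $url is what you would pass to curl_setopt($ch, CURLOPT_URL, $url).
// Escape only when the URL is written into HTML output.
$html = '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">report</a>';

echo $url, "\n";  // http://example.com/report.php?id=42&LTV=80
echo $html, "\n"; // <a href="http://example.com/report.php?id=42&amp;LTV=80">report</a>
```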

Ampersand issue in w3c validator and search engine

I'm using more than one ampersand in my url, see my link below
http://www.theonlytutorials.com/video.php?cat=55&vid=3975&auth=many
When I try to validate it in the W3C validator, it shows hundreds of errors because of this & (ampersand).
After that I read some posts here and found the solution too.
Instead of using (&), if I use (&amp;) the W3C validator is happy.
But the problem now is with search engines. Instead of taking (&), they take (&amp;), as in the link below:
http://www.theonlytutorials.com/video.php?cat=55&amp;vid=3975&amp;auth=many
If you copy and paste the above link into the address bar it will take you to the wrong page! Please help, how can I solve it?
There must be an error in your code, but since we cannot see any of it I think the most important bit is to understand why the W3C validator complains about a raw &.
The HTML syntax contains two basic elements: tags (e.g. <strong>) and entities (e.g. &euro;). Everything else is displayed as-is.
Browsers are expected to ignore errors.
When you type unknown or invalid tags, the browser will do its best to guess and fix it (you are probably aware of that already):
<p>Hello <i>world</b>!</p>
... will render as:
<p>Hello <i>world</i>!</p>
But the same happens when you type an unknown or invalid entity. In your example, there are two invalid entities:
http://www.theonlytutorials.com/video.php?cat=55&vid=3975&auth=many
                                                ^^^^     ^^^^^
However, it works because the browser is clever enough to figure out the real URL. Only the validator complains because it is a tool specifically designed to find errors.
Now, imagine I want to use HTML to write an HTML tutorial and I want to explain the <strong> tag. If I just type <strong>example</strong>, the browser will display example in bold. I need to encode the < symbol so it no longer has a special meaning:
&lt;strong>example&lt;/strong>
Now the browser displays <strong>example</strong>, which is precisely the content I want to show.
The same happens with your URL. Since & is part of the entity syntax, when I want to insert a literal & I need to encode it as well:
Barnes &amp; Noble
... will render as Barnes & Noble. Please note that this is only a syntactic trick to insert plain text into an HTML document. Your document shows Barnes & Noble for all intents and purposes, no matter how you encode it. So when you replace & with &amp; in your URL, you are not changing your URL, you are just encoding it.
If search engines are spidering the wrong URL, that means you have actually changed your URL rather than just encoding it, so the source code is:
http://www.theonlytutorials.com/video.php?cat=55&amp;amp;vid=3975&amp;amp;auth=many
... and renders as:
http://www.theonlytutorials.com/video.php?cat=55&amp;vid=3975&amp;auth=many
This can happen, for instance, if you encode twice:
<?php
$url = 'http://www.theonlytutorials.com/video.php?cat=55&vid=3975&auth=many';
$url = htmlspecialchars($url);
$url = htmlspecialchars($url);
echo $url;
... or:
<?php
$url = 'http://www.theonlytutorials.com/video.php?cat=55&amp;vid=3975&amp;auth=many';
$url = htmlspecialchars($url); // Oops: URL is already encoded!
echo $url;
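To see why the doubly encoded version breaks, here is a small sketch that uses html_entity_decode() as a stand-in for the single decoding pass a browser applies to attribute values. The variable names are mine; the URL is the one from the question.

```php
<?php
$raw = 'http://www.theonlytutorials.com/video.php?cat=55&vid=3975&auth=many';

$once  = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');  // correct: encoded once
$twice = htmlspecialchars($once, ENT_QUOTES, 'UTF-8'); // bug: encoded twice

// A browser (or a crawler) decodes entities exactly once.
$seenOnce  = html_entity_decode($once, ENT_QUOTES, 'UTF-8');
$seenTwice = html_entity_decode($twice, ENT_QUOTES, 'UTF-8');

var_dump($seenOnce === $raw);  // bool(true)
var_dump($seenTwice === $raw); // bool(false)
echo $seenTwice, "\n";         // still contains &amp; between the parameters
```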
It seems that you made a typo: it must be &amp;, not &ampamp;

Ampersand problem in XML when creating a URL String

I am working with an XML feed that has, as one of its nodes, a URL string similar to the following:
http://aflite.co.uk/track/?aid=13414&mid=32532&dl=http://www.google.com/&aref=chris
I understand that ampersands cause a lot of problems in XML and should be escaped by using &amp; instead of a naked &. I therefore changed the PHP to read as follows:
<node><?php echo ('http://aflite.co.uk/track/?aid=13414&amp;mid=32532&amp;dl=http://www.google.com/&amp;aref=chris'); ?></node>
However, when this generates the XML feed, the string appears with the full &amp; and so the actual URL does not work. Apologies if this is a very basic misunderstanding, but some guidance would be great.
I've also tried using %26 instead of & but I'm still getting the same problem.
If you are inserting something into XML/HTML you should always use the htmlspecialchars function. This will escape your strings into correct XML syntax.
But you are running into a second problem: you have embedded a second URL inside the first one.
This also needs to be escaped, into URL syntax.
For this you need to use urlencode.
<node><?php echo htmlspecialchars('http://aflite.co.uk/track/?aid=13414&mid=32532&aref=chris&dl='.urlencode('http://www.google.com/')); ?></node>
&amp; is correct for escaping ampersands in an XML document. The example you've given should work.
You state that it doesn't work, but you haven't stated what application you're using, or in what way it doesn't work. What exactly happens when you click the link? Do the &amp; strings end up in the browser's URL field? If that's the case, it sounds like a fault with the software you're viewing the XML with. Have you tried looking at the XML in another application to see if the problem is consistent?
To answer the final part of your question: %26 would definitely not work for you. That is what you'd use if your URL parameters needed to contain ampersands. Say, for example, in aref=chris, if the name chris were to contain an ampersand (let's say the username was chris&bob), then that ampersand would need to be escaped using %26 so that the URL parser didn't see it as starting a new URL parameter.
Hope that helps.
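The %26 case described above can be sketched like this. The chris&bob username is the hypothetical one from the answer, and the parameter list is trimmed for brevity; http_build_query() does the percent-encoding of values for you.

```php
<?php
// Hypothetical parameter values; 'chris&bob' contains a literal ampersand.
$params = ['aid' => 13414, 'aref' => 'chris&bob'];

// Ampersands inside values become %26; ampersands between pairs stay as separators.
$url = 'http://aflite.co.uk/track/?' . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
echo $url, "\n"; // http://aflite.co.uk/track/?aid=13414&aref=chris%26bob

// Parsing the query string recovers the original value intact.
parse_str(parse_url($url, PHP_URL_QUERY), $decoded);
echo $decoded['aref'], "\n"; // chris&bob
```

When this URL is then written into the XML node, it would still go through htmlspecialchars once, as in the earlier answer.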

PHP wordpress string formatting error

I have a bit of PHP where I want to store a URL in a string.
The code itself seems fine, but for some reason, when I use the characters &sectionId=, it causes problems. In fact, it alters &sectionId= and changes it to §ionId=.
If I misspell it as &secionId then it works fine.
The full url SHOULD be:
http://url.com/file.php?appKey=$appkey&storeId=$storeid&sectionId=$sectionid&v=3
but when I do an echo $myURL; on it, it gives me:
http://url.com/file.php?appKey=$appkey&storeId=$storeid§ionId=$sectionid&v=3
Notice the §ionId= instead of &sectionId=.
Can anyone help me with this? It seems like basic PHP, but I don't understand why it just doesn't like those 4 or 5 characters in a row!!
Thanks.
Are you echoing it straight into HTML? Some over-helpful browsers will convert character references such as &sect even when they are not explicitly terminated with a semicolon. All you need to do is run it through htmlentities or replace all &s with &amp; and it will display correctly.
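For example, roughly like this. The three variable values are placeholders I made up; only the URL shape comes from the question.

```php
<?php
// Placeholder values, for illustration only.
$appkey    = 'KEY';
$storeid   = 1;
$sectionid = 2;

$myURL = "http://url.com/file.php?appKey=$appkey&storeId=$storeid&sectionId=$sectionid&v=3";

// Escape on output so the browser no longer sees "&sect" as the start of an entity.
$escaped = htmlspecialchars($myURL, ENT_QUOTES, 'UTF-8');
echo $escaped, "\n";
// http://url.com/file.php?appKey=KEY&amp;storeId=1&amp;sectionId=2&amp;v=3
```

The string in $myURL itself was never wrong; only the way the browser displayed the unescaped echo was.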
