I have a method that scrapes data from a URL and returns it as a string. The method works if I put in my own URL, but when I insert a generated URL it doesn't work.
e.g.
The following string is working if I insert it into a variable, and pass it:
http://www.rijkswaterstaat.nl/apps/geoservices/rwsnl/awd.php?mode=html&projecttype=windsnelheden_en_windstoten&category=1&loc=ZBWI&net=LMW
But this string is being generated by another source. The result of my attempt to fetch it is (var_dump()):
string(154) "http://www.rijkswaterstaat.nl/apps/geoservices/rwsnl/awd.php?mode=html&projecttype=windsnelheden_en_windstoten&category=1&loc=ZBWI&net=LMW"
The string is only 138 characters, yet var_dump() prints string(154). I think this has something to do with why it is not working, but I'm not even sure...
Does anyone have any idea how to clean this up? I have found other questions asking why var_dump() shows a different value than the length of the string, and that had something to do with invisible characters, but no real solution is given anywhere.
Thx
154-138 = 16
You have 4 & in the string
& HTML encoded is &amp;, which is 4 characters longer (4 x 4 = 16)
So your string seems to be HTML encoded - in the browser you don't see the encoding unless you "View Source".
You can use html_entity_decode() to decode the string or, if possible, make sure that you get a string that is not encoded for HTML output in the first place.
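As a minimal sketch of the html_entity_decode() suggestion, assuming the generated value arrives with every & encoded as &amp; (the variable names are illustrative, not from the question):

```php
<?php
// Assumed input: the generated URL as received, with each & HTML-encoded as &amp;
$generated = 'http://www.rijkswaterstaat.nl/apps/geoservices/rwsnl/awd.php?mode=html&amp;projecttype=windsnelheden_en_windstoten&amp;category=1&amp;loc=ZBWI&amp;net=LMW';

// Turn the entities back into plain characters before fetching the URL
$clean = html_entity_decode($generated, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Each &amp; shrinks to a single &, so 4 characters disappear per ampersand
$removed = strlen($generated) - strlen($clean);

echo $clean, PHP_EOL;
echo $removed, PHP_EOL; // 16
```

If you control the source that generates the URL, fetching the un-encoded value there avoids the decode step entirely.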
I am trying to decode this URL string using PHP's urldecode function:
urldecode("%3CR201810579707%3E%20%3C20180828%3E%20%3C20180912%3E%20%3C1033.00%3E%20%3CY%3E%20%3C0.00%21NA%3E");
This is supposed to output...
<R201810579707> <20180828> <20180912> <1033.00> <Y> <0.00!NA>
...but instead is outputting this
<20180828> <20180912> <1033.00> <0.00!NA>
I've tested the string in a php online decoder with great success, but can't seem to do this operation server side. Any ideas?
If you're printing the result on a web page, the angle brackets will be treated as tag delimiters. You can display it literally by calling htmlentities():
echo htmlentities(urldecode("%3CR201810579707%3E%20%3C20180828%3E%20%3C20180912%3E%20%3C1033.00%3E%20%3CY%3E%20%3C0.00%21NA%3E"));
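To see what the browser actually receives, here is a small sketch that compares the decoded value with its escaped form. It uses htmlspecialchars(), which is enough for angle brackets; the variable names are illustrative.

```php
<?php
$encoded = '%3CR201810579707%3E%20%3C20180828%3E%20%3C20180912%3E%20%3C1033.00%3E%20%3CY%3E%20%3C0.00%21NA%3E';

// urldecode() already returns the full string, angle brackets included
$decoded = urldecode($encoded);

// Escaping < and > makes the browser show them instead of parsing them as tags
$safe = htmlspecialchars($decoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $safe, PHP_EOL;
```

Viewing the page source, or running the script from the command line, is a quick way to confirm that the missing parts were hidden by the browser rather than dropped by urldecode().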
I'm using a 3rd party API that seems to return its data with the entity codes already in there. Such as The Lion&#8217;s Pride.
If I print the string as-is from the API it renders just fine in the browser (in the example above it would put in an apostrophe). However, I can't trust that the API will always use the entities in the future so I want to use something like htmlentities or htmlspecialchars myself before I print it. The problem with this is that it will encode the ampersand in the entity code again and the end result will be The Lion&amp;#8217;s Pride in the HTML source, which doesn't render anything user friendly.
How can I use htmlentities or htmlspecialchars only if it hasn't already been used on the string? Is there a built-in way to detect if entities are already present in the string?
No one seems to be answering your actual question, so I will.
How can I use htmlentities or htmlspecialchars only if it hasn't already been used on the string? Is there a built-in way to detect if entities are already present in the string?
It's impossible. What if I'm making an educational post about HTML entities and I want to actually print this on the screen:
The Lion&#8217;s Pride
... it would need to be encoded as...
The Lion&amp;#8217;s Pride
But what if that was the actual string we wanted to print on the screen? ... and so on.
Bottom line is, you have to know what you've been given and work from there – which is where the advice from the other answers comes in – which is still just a workaround.
What if they give you double-encoded strings? What if they start wrapping the HTML-encoded strings in XML? And then wrap that in JSON? ... And then the JSON is converted to binary strings? The possibilities are endless.
It's not impossible for the API you depend on to suddenly switch the output type, but it's also a pretty big violation of the original contract with your users. To some extent, you have to put some trust in the API to do what it says it's going to do. Unit/Integration tests make up the rest of the trust.
And because you could never write a program that works for any possible change they could make, it's senseless to try to anticipate any change at all.
Decode the string, then re-encode the entities. (Using html_entity_decode())
$string = htmlspecialchars(html_entity_decode($string));
https://eval.in/662095
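A hedged sketch of the decode-then-encode approach, wrapped in a hypothetical helper named normalizeForHtml() (not part of PHP or of the answer above). It also shows the limit discussed earlier: encoded and unencoded inputs collapse to the same output, so text that was meant to display a literal entity cannot be told apart.

```php
<?php
// Hypothetical helper: decode whatever entities are present, then escape once
function normalizeForHtml(string $text): string
{
    $decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return htmlspecialchars($decoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Already-encoded and raw input end up identical
echo normalizeForHtml('Fish & Chips'), PHP_EOL;
echo normalizeForHtml('Fish &amp; Chips'), PHP_EOL;

// The numeric entity becomes the real UTF-8 character, which needs no escaping
echo normalizeForHtml('The Lion&#8217;s Pride'), PHP_EOL;
```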
There is NO WAY to do what you ask for!
You must know what kind of data the service is giving back.
Anything else would be guessing.
Example:
What if the service is giving back &amp; but is not escaping?
You would guess it IS escaping, so you would wrongly interpret it as & while the correct value is &amp;
I think the best solution is to first decode all HTML entities/special chars in the original string, and then HTML-encode the string again.
That way you will end up with a correctly encoded string, no matter if the original string was encoded or not.
You also have the option of using htmlspecialchars_decode();
$string = htmlspecialchars_decode($string);
It's already in htmlentities:
php > echo htmlentities('Hi&amp;mom', ENT_HTML5, ini_get('default_charset'), false);
Hi&amp;mom
php > echo htmlentities('Hi&amp;mom', ENT_HTML5, ini_get('default_charset'), true);
Hi&amp;amp;mom
Just use the optional 4th argument (double_encode = false) to NOT double-encode.
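The same fourth argument exists on htmlspecialchars(). A small sketch applied to the API string from the question, using a hypothetical wrapper named escapeOnce() (an illustration only, not a built-in):

```php
<?php
// Hypothetical wrapper: escape for HTML but leave existing entities untouched
function escapeOnce(string $text): string
{
    // 4th argument is double_encode; false skips anything that already looks like an entity
    return htmlspecialchars($text, ENT_QUOTES | ENT_HTML5, 'UTF-8', false);
}

echo escapeOnce('The Lion&#8217;s Pride'), PHP_EOL; // entity is kept as-is
echo escapeOnce('Fish & Chips'), PHP_EOL;           // bare & is still escaped
echo escapeOnce('<b>Hi&amp;mom</b>'), PHP_EOL;      // tags escaped, &amp; kept
```

This still cannot tell whether an entity in the input was meant to be displayed literally; it only avoids the double-encoding case.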
Within the array that I retrieve from MySQL is a text field that contains an ellipsis as part of the entry. While mysqli will print out the array record properly, when I try to encode it to a JSON string (json_encode), I get an error... actually, nothing happens. At this point I know enough about json to be dangerous. Hopefully somebody has an answer to this. In the meantime I found the offending records and have changed the ellipsis (...) to colon-minus (:-), which seems to work. For presentation's sake, I'd like to include the ellipsis.
Thanks,
KCT3937
"At this point I know enough about json to be dangerous." as well so my suggestion is to work around the problem if you can't find a "proper" solution.
Replace the offending character with something else before encoding and replace it back to the ellipsis in the JavaScript that receives the response.
If you are using PHP you may also want to look into JSON_UNESCAPED_UNICODE. Check the json_encode online manual for more details.
Another thing to check is that your data is UTF-8 encoded.
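A minimal sketch of the UTF-8 check, under the assumption (not confirmed in the question) that the column holds Windows-1252 bytes, where the ellipsis is the single byte 0x85. The sample row is made up, and mb_convert_encoding() needs the mbstring extension.

```php
<?php
// Assumed sample row: 0x85 is the ellipsis in Windows-1252, but invalid on its own in UTF-8
$row = ['note' => "To be continued\x85"];

// json_encode() returns false on malformed UTF-8 instead of raising an error
$failed = json_encode($row);
$error  = json_last_error_msg();

// Convert every field to UTF-8 before encoding
$utf8Row = array_map(function ($value) {
    return mb_convert_encoding($value, 'UTF-8', 'Windows-1252');
}, $row);

// JSON_UNESCAPED_UNICODE keeps the ellipsis as a literal character instead of \u2026
$json = json_encode($utf8Row, JSON_UNESCAPED_UNICODE);

var_dump($failed);
echo $error, PHP_EOL;
echo $json, PHP_EOL;
```

If the data comes straight from mysqli, setting the connection charset with set_charset('utf8mb4') before querying may make the conversion step unnecessary.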
When connecting to PayPal I use a URL like this (I am using fake values here, but the structure is real):
https://www.paypal.com/cgi-bin/webscr?&business=ZDS346347&cmd=_xclick&amount=100&item_name=Test&no_note=1&no_shipping=1&rm=2&return=http://www.website.com/registration.php?paypal=1&classid=122&sessionid=264&studentid=2286
The problem is when I send this url, it truncates my return value query string from this:
paypal=1&classid=122&sessionid=264&studentid=2286
to this:
paypal=1
The ampersands in the return value are confusing it, but I need to use them so I can process those query string values on the return.
Is there some way I can pass that whole return string to PayPal so it won't truncate after the first ampersand it hits?
Thanks,
Chris
Wrap the passed URL with urlencode to turn the ampersands into PayPal-parsable characters, then when your URL gets called use urldecode to decode them.
This happens because PayPal simply splits everything after the ? into chunks at each & symbol. It doesn't know whether an ampersand is part of your website's URL or not. So PayPal receives classid=122 as its own key/value pair, not as part of your return URL. Encoding the URL this way should make it work correctly.
Edit: Referenced the wrong PHP functions. urlencode/urldecode are for GET parameter passing; htmlspecialchars is for escaping data for HTML output.
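A short sketch of the urlencode() advice using the fake values from the question. parse_str() stands in for whatever splits the query string on the receiving side; how PayPal actually parses it is an assumption here.

```php
<?php
// The return URL from the question, with its own query string
$returnUrl = 'http://www.website.com/registration.php?paypal=1&classid=122&sessionid=264&studentid=2286';

// Encode the whole return URL so its & and ? become %26 and %3F
$payPalUrl = 'https://www.paypal.com/cgi-bin/webscr?business=ZDS346347&cmd=_xclick'
    . '&amount=100&item_name=Test&no_note=1&no_shipping=1&rm=2'
    . '&return=' . urlencode($returnUrl);

// Simulate a receiver splitting the query string: return stays in one piece
parse_str(parse_url($payPalUrl, PHP_URL_QUERY), $params);

echo $payPalUrl, PHP_EOL;
echo $params['return'], PHP_EOL;
```

Because the inner ampersands travel as %26, classid, sessionid and studentid no longer show up as separate top-level parameters.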
I am working with an XML feed that has, as one of its nodes, a URL string similar to the following:
http://aflite.co.uk/track/?aid=13414&mid=32532&dl=http://www.google.com/&aref=chris
I understand that ampersands cause a lot of problems in XML and should be escaped by using &amp; instead of a naked &. I therefore changed the PHP to read as follows:
<node><?php echo ('http://aflite.co.uk/track/?aid=13414&amp;mid=32532&amp;dl=http://www.google.com/&amp;aref=chris'); ?></node>
However, when this generates the XML feed, the string appears with the full &amp;
and so the actual URL does not work. Apologies if this is a very basic misunderstanding but some guidance would be great.
I've also tried using %26 instead of & but still getting the same problem.
If you are inserting something into XML/HTML you should always use the htmlspecialchars function. This will escape your strings into correct XML syntax.
But you are running into a second problem.
You have added a second URL to the first one.
This also needs to be escaped into URL syntax.
For this you need to use urlencode.
<node><?php echo htmlspecialchars('http://aflite.co.uk/track/?aid=13414&mid=32532&aref=chris&dl='.urlencode('http://www.google.com/')); ?></node>
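To illustrate the two layers of escaping, here is a sketch that builds the node and then reads it back with SimpleXML (an assumption: the simplexml extension is available). The ENT_XML1 flag and the variable names are choices made for this example, not part of the answer above.

```php
<?php
$target = 'http://www.google.com/';

// Layer 1: URL-encode the nested URL so it is a single dl= parameter
$url = 'http://aflite.co.uk/track/?aid=13414&mid=32532&aref=chris&dl=' . urlencode($target);

// Layer 2: XML-escape the whole URL so each & is written as &amp; in the feed
$node = '<node>' . htmlspecialchars($url, ENT_XML1 | ENT_QUOTES, 'UTF-8') . '</node>';

// An XML parser undoes layer 2 and hands back the URL with plain ampersands
$readBack = (string) simplexml_load_string($node);

echo $node, PHP_EOL;
echo $readBack, PHP_EOL;
```

Seeing &amp; in the raw feed is therefore expected; a consumer that parses the XML properly gets the working URL.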
&amp; is correct for escaping ampersands in an XML document. The example you've given should work.
You state that it doesn't work, but you haven't stated what application you're using, or in what way it doesn't work. What exactly happens when you click the link? Do the &amp; strings end up in the browser's URL field? If that's the case, it sounds like a fault with the software you're viewing the XML with. Have you tried looking at the XML in another application to see if the problem is consistent?
To answer the final part of your question: %26 would definitely not work for you. It is what you'd use if your URL parameters themselves needed to contain ampersands. Say, for example, in aref=chris, the name chris were to contain an ampersand (let's say the username was chris&bob); then that ampersand would need to be escaped using %26 so that the URL parser didn't see it as starting a new URL parameter.
Hope that helps.
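The chris&bob case can be sketched as follows; parse_str() plays the role of the URL parser, and the variable names are illustrative.

```php
<?php
// Without encoding, the & inside the value starts a new parameter
parse_str('aid=13414&aref=chris&bob', $broken);

// With urlencode(), the & inside the value travels as %26
$query = 'aid=13414&aref=' . urlencode('chris&bob');
parse_str($query, $params);

echo $query, PHP_EOL;          // aid=13414&aref=chris%26bob
echo $broken['aref'], PHP_EOL; // chris
echo $params['aref'], PHP_EOL; // chris&bob
```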