I am trying to load the meta description of this website (which has a German character) via the following script in PHP:
$page_content = file_get_contents($uri);
$dom_obj = new \DOMDocument();
$dom_obj->loadHTML(mb_convert_encoding($page_content, 'HTML-ENTITIES', 'UTF-8'));
However, when I try to write it to the MySQL db, Laravel reports an error: incorrect string value "\xC3" (which is the first byte of the German character).
When I simply do the following, writing to the db works, but the character is not displayed correctly (Ã¼ instead of ü):
$dom_obj->loadHTML($page_content)
This problem only occurs with this website so far, others I tried with the same character do work. Can you think of a possible reason and fix? Thank you!
Edit:
It works fine when I use PHP's "utf8_decode" to decode the meta description that I get via $dom_obj without mb_convert_encoding. But when I do this, all the other sites that worked before lead to errors (like this: Incorrect string value: '\xE4t').
I found the error. I was using substr to shorten the description. substr counts bytes, not characters, so it cut one of those multibyte special characters in half, and this is why it wasn't working.
foreach ($dom_obj->getElementsByTagName('meta') as $meta) {
    if ($meta->getAttribute('name') == 'description') {
        // substr() works on bytes and can split a multibyte UTF-8 character
        substr($meta->getAttribute('content'), 0, 156);
    }
}
This is a workaround:
mb_substr($foo,0,156,"UTF-8");
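A minimal sketch of the difference, assuming the mbstring extension is available. The sample string is made up for illustration and is written with byte escapes so that the encoding of the script file does not matter:

```php
<?php
// Hypothetical sample: "Tür" in UTF-8, where "ü" is the two bytes 0xC3 0xBC.
$text = "T\xC3\xBCr";

// substr() counts bytes: the cut lands between 0xC3 and 0xBC.
$byte_cut = substr($text, 0, 2);

// mb_substr() counts characters: the result keeps "ü" whole.
$char_cut = mb_substr($text, 0, 2, 'UTF-8');

var_dump(mb_check_encoding($byte_cut, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($char_cut, 'UTF-8')); // bool(true)
```

The dangling 0xC3 byte left by substr() matches the "\xC3" that MySQL rejected in the error message.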
Related
I have a database where I need to display the records for the user. I am using htmlentities to make sure no malicious code is being echoed to the user like this:
function h($string) {
return htmlentities($string, ENT_SUBSTITUTE, "UTF-8");
}
and I call this function whenever I output any entries to the user. The problem is that I need to be able to show the Danish characters ÆØÅ, and these characters display as a question mark in a square. The site uses UTF-8 encoding as well.
I have tried all that is listed under htmlentities on php.net and tried finding some solution for creating exceptions or another work around, but I have been unable to find any.
Does anybody know a workaround for this issue?
The second comment answered it. Adding the charset in the connection solved the problem. So for my PDO connection I had to put it like this:
$dbh = new PDO('mysql:dbname=myName;charset=utf8;host=myHost', 'myUser', 'myPassw0rd');
Now everything displays properly.
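One plausible reading of the symptom, offered as an assumption rather than a confirmed diagnosis: without a charset on the connection, the driver may return single-byte (Latin-1) text, which is not valid UTF-8, and htmlentities() with ENT_SUBSTITUTE then replaces the invalid bytes with the replacement character U+FFFD. The byte strings below are made-up samples:

```php
<?php
function h($string) {
    return htmlentities($string, ENT_SUBSTITUTE, "UTF-8");
}

// Hypothetical sample data: "ÆØÅ" encoded two different ways.
$latin1 = "\xC6\xD8\xC5";             // ISO-8859-1 bytes, invalid as UTF-8
$utf8   = "\xC3\x86\xC3\x98\xC3\x85"; // the same letters in UTF-8

$broken = h($latin1); // invalid sequences are replaced with U+FFFD
$ok     = h($utf8);   // "&AElig;&Oslash;&Aring;"

var_dump(strpos($broken, "\xEF\xBF\xBD") !== false); // bool(true)
var_dump($ok);
```

With the charset set in the DSN, the strings arrive as UTF-8 and h() takes the second path.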
I'm trying to execute the following simple command, but it's not returning any result:
$val = substr($gesprek->gevormdnummer, 0, 2);
The $gesprek->gevormdnummer value is a phone number fetched from the database (stored as varchar(50)).
I'm using the Yii framework to get the data from the MS SQL Database.
I'm running PHP 5.5.8 NTS on IIS7.5.
If I echo the variable it returns for example 00497121212 or 511 depending if it's an internal or external number.
When I change $gesprek->gevormdnummer to for example '511', substr will work correctly. I've tried using mb_substr with UTF-8 encoding but that returns the same result.
Is there anybody who has an idea what the problem might be?
CBroe pointed me toward the answer to my problem.
The string was prefixed with spaces until there were 16 characters.
So substr was working correctly: I got back 2 spaces, which didn't show up in the echo.
var_dump($var) gave me the insight to find this out.
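A small sketch of what was going on, using a made-up padded value in place of the database field:

```php
<?php
// Hypothetical stand-in for $gesprek->gevormdnummer: left-padded to 16 characters.
$number = str_pad('511', 16, ' ', STR_PAD_LEFT);

$prefix  = substr($number, 0, 2);       // two spaces, invisible when echoed in HTML
$trimmed = substr(trim($number), 0, 2); // "51"

var_dump($number);  // reports the length 16, which reveals the padding
var_dump($prefix);
var_dump($trimmed);
```

Trimming the value before taking the prefix is one way around it; fixing the padding at the database level would be another.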
I'm storing HTML and text data in my database table in its raw form - however I am having a slight problem in getting it to output correctly. Here is some sample data stored in the table AS IS:
<p>Professional Freelance PHP & MySQL developer based in Manchester.
<br />Providing an unbeatable service at a competitive price.</p>
To output this data I do:
echo $row['details'];
And this outputs the data correctly, however when I do a W3C validator check it says:
character "&" is the first character of a delimiter but occurred as data
So I tried using htmlentities and htmlspecialchars, but this just causes the HTML tags to be output as text on the page.
What is the correct way of doing this?
Use &amp; instead of &.
What you want to do is use the php function htmlentities()...
It will convert your input into html entities, and then when it is outputted it will be interpreted as HTML and outputted as the result of that HTML...For example:
$mything = "<b>BOLD & BOLD</b>";
//normally would throw an error if not converted...
//lets convert!!
$mynewthing = htmlentities($mything);
Now, just insert $mynewthing to your database!!
htmlentities is basically a superset of htmlspecialchars, and htmlspecialchars also replaces < and >.
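A short sketch of why both functions make the tags show up as text, reusing the sample string from the answer above. The "café" string is an added illustration:

```php
<?php
$mything = "<b>BOLD & BOLD</b>";

// Both functions escape the markup itself, so the browser shows the tags literally.
$special  = htmlspecialchars($mything, ENT_QUOTES, 'UTF-8');
$entities = htmlentities($mything, ENT_QUOTES, 'UTF-8');

// They differ only for characters that have a named entity, such as "é".
$cafe          = "caf\xC3\xA9"; // "café" in UTF-8 (illustrative)
$cafe_special  = htmlspecialchars($cafe, ENT_QUOTES, 'UTF-8'); // unchanged
$cafe_entities = htmlentities($cafe, ENT_QUOTES, 'UTF-8');     // "caf&eacute;"

var_dump($special); // "&lt;b&gt;BOLD &amp; BOLD&lt;/b&gt;"
var_dump($special === $entities);
```

So neither function suits data that is meant to be rendered as HTML. They are for text that should appear literally.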
Actually, what you are trying to do is to fix invalid HTML code, and I think this needs an ad-hoc solution:
$row['details'] = preg_replace("/&(?![#0-9a-z]+;)/i", "&amp;", $row['details']);
This is not a perfect solution, since it will fail for strings like: someone&son; (with a trailing ;), but at least it won't break existing HTML entities.
However, if you have decision power over how the data is stored, please enforce that the HTML code stored in the database is correct.
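A quick check of that regular expression on a made-up string, assuming the intended replacement is &amp;. It escapes the bare ampersand, leaves existing entities alone, and shows the limitation mentioned above:

```php
<?php
// Hypothetical sample containing a bare "&", two real entities and the tricky case.
$details = '<p>PHP & MySQL &amp; more &#169; someone&son;</p>';

// Replace "&" only when it is not followed by something that looks like an entity.
$fixed = preg_replace('/&(?![#0-9a-z]+;)/i', '&amp;', $details);

var_dump($fixed);
// "<p>PHP &amp; MySQL &amp; more &#169; someone&son;</p>"
```

The "someone&son;" part is left untouched because it looks like an entity reference, which is the known weakness of this approach.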
In my projects I use an XSLT parser, so I had to change to (e.g.). But this is the safest way I found...
here is my code
$html = trim(addslashes(htmlspecialchars(
html_entity_decode($_POST['html'], ENT_QUOTES, 'UTF-8'),
ENT_QUOTES, 'UTF-8'
)));
And when you read from DB, don't forget to use stripslashes();
$html = stripslashes($mysq_row['html']);
We have a PHP site on Zend Framework with a backend Postgresql database. Our primary character encoding is UTF-8.
I just checked our error log and found a strange entry. My URL is as follows:
www.mydomain.com/schuhe-für-breite-füsse
however someone (or maybe a bot) has tried to access this URL as follows:
www.mydomain.com/schuhe-f\xc3\xbcr-breite-f\xc3\xbcsse/
It's the first time I've seen something like the above. Two things are happening on my page:
1) The above URL is queried against our CMS. This works fine for some reason; I think Postgresql realises it is byte-encoded and converts it back when trying to find this SEF URL in our database.
2) An Ajax request is made on the page, passing the same SEF URL. This fails. I believe the backslashes are causing a problem in JavaScript.
To avoid this I want to decode any URL that is encoded like this. However a quick test of the following code did not decode anything for me :(
$landing_sef_url = $this->_getParam('landing_sef_url');
$utf8=html_entity_decode($landing_sef_url);
$iso8859=utf8_decode($utf8);
$test3 = html_entity_decode($landing_sef_url, 1, "ISO-8859-1");
$test4 = urldecode($landing_sef_url);
echo utf8_decode("$landing_sef_url");
echo "<br/><br/>";
die($landing_sef_url . " -- $utf8 -- $iso8859 <br/>$test3<br/>$test4");
I found the above via various posts online but they all print back the same result - schuhe-f\xc3\xbcr-breite-f\xc3\xbcsse
Any help would be MUCH appreciated. Many thanks!
This method seems to do what you're looking for:
http://li.php.net/manual/en/function.stripcslashes.php
But if you're just looking to unescape \x## sequences, you could also do this with a fairly simple regular expression.
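A sketch of both suggestions, assuming the parameter really contains a literal backslash followed by "x" and two hex digits, as in the log entry. The variable names are illustrative:

```php
<?php
// Single quotes keep the backslashes literal, as they appear in the error log.
$raw = 'schuhe-f\xc3\xbcr-breite-f\xc3\xbcsse';

// Option 1: stripcslashes() understands C-style escapes, including \xHH.
$via_stripcslashes = stripcslashes($raw);

// Option 2: a regular expression that touches only \xHH sequences.
$via_regex = preg_replace_callback(
    '/\\\\x([0-9a-fA-F]{2})/',
    function ($m) {
        return chr(hexdec($m[1])); // turn the two hex digits into one byte
    },
    $raw
);

var_dump($via_stripcslashes === $via_regex); // bool(true)
var_dump($via_regex); // the UTF-8 bytes of "schuhe-für-breite-füsse"
```

The regex variant is narrower: stripcslashes() would also convert other escapes such as \n or octal sequences if they appeared in the URL.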
I'm writing PHP code that uses a database. To do so, I use an array as a hash-map.
Every time content is added or removed from my DB, I save it to file.
I'm forced by my DB structure to use this method and can't use mysql or any other standard DB (School project, so structure stays as is).
I built two functions:
function saveDB($db){
$json_db = json_encode($db);
file_put_contents("wordsDB.json", $json_db);
} // saveDB
function loadDB(){
$json_db = file_get_contents("wordsDB.json");
return json_decode($json_db, true);
} // loadDB
When I echo the string after encoding it or after loading it from the file, I get valid JSON (tested in a JSON viewer). But whenever I try to decode the string using json_decode(), I get null (tested with var_dump()).
The json string itself is very long (~200,000 characters, and that's just for testing).
I tried the following:
Replacing single/double-quotes with double/single-quotes (Without any backslashes, with one backslash and three backslashes. And any combination I could think of with a different number of backslashes in the original and replaced string), both manually and using str_replace().
Adding quotes before and after the json string.
Changing the page's encoding.
Decoding without saving to file (Right after encoding).
Checked for slashes and backslashes. None to be found.
Tried addslashes().
Tried using various "Escape String" variants.
json_last_error() doesn't work. I get no error number (Get null, not 0).
It's not my server, so I'm not sure what PHP version is used, and I can't upgrade/downgrade/install anything.
I believe the size has something to do with it, because small strings seem to work fine.
Thanks Everybody :)
In your JSON file change null to "null" and it will solve the problem.
Check if your file is UTF8 encoded. json_decode works with UTF8 encoded data only.
EDIT:
After I saw the uploaded JSON data, I did some digging and found that there is a bare 'null' key. Search for:
"exceeding":{"S01E01.html":{"2217":1}},null:{"S01E01.html":
Change that null to be valid property name and json_decode will do the job.
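Both causes named in this answer can be reproduced on small made-up strings. The sketch below assumes PHP 5.5 or later for json_last_error_msg(), and decodeOrExplain() is a hypothetical helper, not part of the original code:

```php
<?php
// Hypothetical helper: decode JSON, or return a message explaining the failure.
function decodeOrExplain($json) {
    $data = json_decode($json, true);
    if ($data === null && json_last_error() !== JSON_ERROR_NONE) {
        return 'decode failed: ' . json_last_error_msg();
    }
    return $data;
}

// Made-up samples, much shorter than the real file.
$bare_null_key = '{"a":{"x":1},null:{"y":2}}';   // unquoted key: not valid JSON
$quoted_key    = '{"a":{"x":1},"null":{"y":2}}'; // the same data with a valid key
$invalid_utf8  = "{\"a\":\"\xE4t\"}";            // a Latin-1 byte inside the string

var_dump(decodeOrExplain($bare_null_key)); // a syntax error message
var_dump(decodeOrExplain($quoted_key));    // the decoded array
var_dump(decodeOrExplain($invalid_utf8));  // a malformed UTF-8 message
```

On a server where json_last_error() is unavailable, searching the file for an unquoted null: would be the manual equivalent.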
I had a similar problem last week. My JSON was valid according to jsonlint.com.
My JSON string contained a # and a &, and those two made json_decode fail and return null.
By using var_dump(json_decode($myvar)), which stops right where it fails, I managed to figure out where the problem was coming from.
I suggest var_dumping and using the find function to look for these kinds of characters.
Just on the off chance.. and more for anyone hitting this thread rather than the OP's issue...I missed the following, someone had htmlentities($json) way above me in the call stack. Just ensure you haven't been bitten by the same and check the html source.
Kickself #124
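A small sketch of that pitfall with made-up data: once htmlentities() has run over the JSON text, the quotes are no longer quotes and json_decode() returns null.

```php
<?php
// Hypothetical data standing in for the real JSON.
$json = json_encode(['word' => 'test']); // {"word":"test"}

// Somewhere up the call stack the string gets HTML-escaped...
$escaped = htmlentities($json, ENT_QUOTES, 'UTF-8');

// ...and now it is no longer JSON.
$failed = json_decode($escaped, true);

// Undoing the escaping restores a decodable string.
$restored = json_decode(html_entity_decode($escaped, ENT_QUOTES, 'UTF-8'), true);

var_dump($escaped);  // {&quot;word&quot;:&quot;test&quot;}
var_dump($failed);   // NULL
var_dump($restored); // the original array
```

Viewing the HTML source, as suggested above, shows the &quot; sequences that a rendered page hides.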