PHP decoding square brackets href attr to html file - php

Saving an HTML file percent-encodes the square brackets in the href attribute.
// My string
$teaserTest = "<a href='[CLICK_URL]'><strong>testgerr</strong></a>";
// Calling the save function
saveFile($teaserTest);
// Save function
function saveFile($stringToAdd) {
    $doc = new DOMDocument();
    $doc->formatOutput = true;
    $doc->loadHTML('<html><head><title>Test</title></head><body>'.$stringToAdd.'</body></html>');
    $doc->saveHTMLFile("Campaigns/test.html");
}
The file results in: <a href="%5BCLICK_URL%5D">
I'm trying to keep the "[" unencoded.

Square brackets ([ and ]) are reserved characters in URLs. RFC 3986 reserves them for delimiting IPv6 address literals in the host part, for example: http://[::1]/example/
That is why percent-encoding them in an href is the expected behavior. If you need a placeholder in the attribute, use a different pattern that does not rely on reserved characters.
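If the literal [CLICK_URL] token has to survive in the saved file, one possible workaround (a sketch, not something the answer above prescribes) is to serialize with saveHTML() and restore only that one known token before writing the file. The target path in the comment is taken from the question; everything else here is an assumption for illustration.

```php
<?php
$teaserTest = "<a href='[CLICK_URL]'><strong>testgerr</strong></a>";

$doc = new DOMDocument();
$doc->loadHTML('<html><head><title>Test</title></head><body>' . $teaserTest . '</body></html>');

// Serialize to a string instead of writing the file directly.
$html = $doc->saveHTML();

// Restore only the known placeholder; other percent-encoded
// brackets in the document are left untouched.
$html = str_replace('%5BCLICK_URL%5D', '[CLICK_URL]', $html);

echo $html;
// To persist it: file_put_contents("Campaigns/test.html", $html);
```

Replacing one exact token rather than every %5B and %5D keeps genuinely encoded URLs elsewhere in the markup intact.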

Related

How to handle special HTML characters in DOMDocument?

Let's say I build an HTML fragment using the following code:
$dom = new DOMDocument();
$header = $dom->createElement("h2", "Lorem & Ipsum");
$dom->appendChild($header);
print($dom->saveHTML());
The raw HTML code printed contains the unescaped & symbol instead of the necessary HTML entity &amp;. The code also throws the following PHP error:
Warning: DOMDocument::createElement(): unterminated entity reference
What's the best way to handle this?
It appears that the PHP team is not willing to change this behavior (source), so we have to find a workaround instead.
One way is to simply do the encoding yourself in the PHP code, as such:
$header = $dom->createElement("h2", "Lorem &amp; Ipsum");
However, this isn't always convenient, as the text printed may be inside of a variable or contain other special characters besides &. So, you can use the htmlentities function.
$text = "Lorem & Ipsum";
$header = $dom->createElement("h2", htmlentities($text));
If this still is not an ideal solution, another workaround is to use the textContent property instead of the second argument in createElement.
In the code below, I've implemented this in a DOMDocument subclass, so you just have to use the BetterDOM subclass instead to fix this strange bug.
class BetterDOM extends DOMDocument {
public function createElement($tag, $text = null) {
$base = parent::createElement($tag);
$base->textContent = $text;
return $base;
}
}
// Correctly prints "<h2>Lorem &amp; Ipsum</h2>" with no errors
$dom = new BetterDOM();
$header = $dom->createElement("h2", "Lorem & Ipsum");
$dom->appendChild($header);
print($dom->saveHTML());

PHP How to convert strings from DomCrawler to UTF-8

I have some data that I collect with DomCrawler and store in an array, but it seems to fail when it comes to special characters like è, à, ï, etc.
As an example, I get Ã¨ instead of è when I echo the result.
When I store my results in a .json file I get this: \u00c3\u00a8
My goal is to save the special character in the .json file.
I've tried converting the encoding, but it doesn't give the result I want.
$html = file_get_contents($url);
$crawler = new Crawler($html);
$h1 = $crawler->filter('h1');
$title = $h1->text();
$title = mb_convert_encoding($title, "HTML-ENTITIES", "UTF-8");
Is there any way I can have my special characters shown?
Thanks a lot!
Because you pass the HTML to the constructor, the crawler assumes that it is in ISO-8859-1. You have to tell it explicitly that your DOM is in UTF-8 with the addHTMLContent method:
$html = file_get_contents($url);
$crawler = new Crawler;
$crawler->addHTMLContent($html, 'UTF-8');
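The \u00c3\u00a8 from the question is what a UTF-8 "è" turns into when its two bytes are read as ISO-8859-1. The following sketch reproduces that with plain PHP (no DomCrawler involved, and assuming the mbstring extension is available) and also shows that json_encode escapes non-ASCII characters by default unless JSON_UNESCAPED_UNICODE is passed. The variable names are illustrative only.

```php
<?php
// "è" (U+00E8) as a correctly decoded UTF-8 string.
$title = "\u{E8}";

// Simulate the bug: the UTF-8 bytes are treated as ISO-8859-1
// and converted to UTF-8 a second time.
$mojibake = mb_convert_encoding($title, 'UTF-8', 'ISO-8859-1');

echo json_encode($mojibake), "\n";                      // "\u00c3\u00a8"
echo json_encode($title), "\n";                         // "\u00e8"
echo json_encode($title, JSON_UNESCAPED_UNICODE), "\n"; // "è"
```

Note that "\u00e8" is still a valid JSON representation of è; the flag only matters if the literal character should appear in the .json file.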

replace img src with php

I would like to take a block of code stored in a variable and replace the src of any image tags in there without disturbing the rest of the code block.
For example : the block of code might read :
<img src="image1.jpg">
I would like to change that to (using PHP) :
<img src="altimage.jpg">
I am currently using a solution I found using the PHP DOM module to change the image tag but the function returns just the changed img tag HTML without the rest of the HTML.
The function I am calling is as follows :
function replace_img_src($original_img_tag, $new_src_url) {
$doc = new DOMDocument();
$doc->loadHTML($original_img_tag);
$tags = $doc->getElementsByTagName('img');
if(count($tags) > 0)
{
$tag = $tags->item(0);
$tag->setAttribute('src', $new_src_url);
return $doc->saveXML($tag);
}
return false;
}
This, of course, just returns the changed img tag but strips the other HTML (such as the A tag) - I am passing the entire block of code to the function.
(BTW - It's good for me to have the false return for no image tags as well).
What am I missing here please ?
Many thanks in advance for any help.
You need to use return $doc->saveXML(); instead of return $doc->saveXML($tag);. See the documentation of saveXML:
saveXML ([ DOMNode $node [, int $options ]] )
node: Use this parameter to output only a specific node without XML declaration rather than the entire document.
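Calling saveXML() without a node returns the whole document, including the html and body wrappers that loadHTML adds. If only the original block is wanted back, one option is to serialize the children of body one by one. The sketch below is a variation on the question's function, not the answer's exact fix: the function name is hypothetical, it wraps the input in its own html/body container, and it updates every img tag rather than just the first.

```php
<?php
// Hypothetical helper: returns the whole block with every img src replaced,
// or false when the block contains no img tag.
function replace_img_src_in_block($html, $new_src_url) {
    $doc = new DOMDocument();
    // Wrap the fragment so that <body> reliably contains exactly the input.
    $doc->loadHTML('<html><body>' . $html . '</body></html>');

    $tags = $doc->getElementsByTagName('img');
    if ($tags->length === 0) {
        return false;
    }
    foreach ($tags as $tag) {
        $tag->setAttribute('src', $new_src_url);
    }

    // Serialize each child of <body> to drop the wrapper elements.
    $body = $doc->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$block = '<a href="page.html"><img src="image1.jpg"></a>';
$result = replace_img_src_in_block($block, 'altimage.jpg');
echo $result;
```

Using saveHTML($child) instead of saveXML($child) keeps HTML-style output for void elements such as img.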

Data not saved properly into XML file

I am trying to save data to an XML file.
if (isset( $_POST['submit'])) {
$name = mysql_real_escape_string($_POST['name']);
$xml = new DOMDocument("1.0", "ISO-8859-1");
$xml->preserveWhiteSpace = false;
$xml->load('/var/www/Report/file.xml');
$element = $xml->getElementsByTagName('entries');
$newItem = $xml->createElement('reports');
$newItem->appendChild($xml->createElement('timestamp', date("F j, Y, g:i a",time())));
$newItem->appendChild($xml->createElement('name', $name));
$element -> item(0) -> appendChild($newItem);
$xml->formatOutput = true; // this adds spaces, new lines and makes the XML more readable format.
$xmlString = $xml->saveXML(); // $xmlString contains the entire String
$xml->save('/var/www/Report/file.xml');
}
Any time I use mysql_real_escape_string() to escape special characters in my string, or otherwise try to sanitize my data, my XML file looks like the image below.
I don't understand why the name start tag is missing in my XML file, or why my $name data isn't saved into the XML file either. How can I fix this? Any help would be appreciated. Thanks in advance.
1) Firefox does not show the XML byte for byte; it formats it itself. An empty element therefore appears as a single self-closing tag, no matter how it is written in the XML source. To see the exact XML output, open the file in a text editor.
2) mysql_real_escape_string escapes characters that are problematic for MySQL, not for XML: for example, it does not escape '<'. Use DOMDocument::createTextNode instead. So instead of
$newItem->appendChild($xml->createElement('name', $name));
use
$nameElement = $xml->createElement('name');
$nameElement->appendChild( $xml->createTextNode($name) );
$newItem->appendChild($nameElement);
3) As you can see, $name is empty. I can't tell what is wrong with your POST data from this code alone; maybe there is no <input name='name' /> in your form. I sometimes accidentally set an id attribute on inputs instead of the name attribute.
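To illustrate point 2, here is a small self-contained sketch showing that createTextNode escapes XML-significant characters on output and that the original string comes back unchanged when the document is parsed again. The sample value and the in-memory document are assumptions made for the demonstration; the element names follow the question.

```php
<?php
$xml = new DOMDocument('1.0', 'UTF-8');
$root = $xml->appendChild($xml->createElement('entries'));

// Sample input containing characters that are special in XML.
$name = 'Tom & Jerry <admin>';

$nameElement = $xml->createElement('name');
$nameElement->appendChild($xml->createTextNode($name));
$root->appendChild($nameElement);

// The serialized form contains entities instead of raw & and <.
$out = $xml->saveXML();
echo $out;

// Parsing the output again yields the original string.
$check = new DOMDocument();
$check->loadXML($out);
$roundTrip = $check->getElementsByTagName('name')->item(0)->textContent;
```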

php dom get image src path encode

I use the code below to extract image src paths, but there is a problem when the image filename contains special characters (e.g. ~ DQBTZ_UC(G#STWO_1R2U_Q4.gif): the output turns out like this: ~ 6Z6W4%255BO29FQ%255BA4YN_%255BFR9%2529M.gif
How can I fix this issue? Sorry for my poor English.
function _get_imagepath($content){
$doc = new DOMDocument();
$doc->loadHTML($content);
$imagepaths=array();
$imageTags = $doc->getElementsByTagName('img');
$folder=file_directory_path();
foreach($imageTags as $tag) {
$imagepaths[]=$tag->getAttribute('src');
}
if(!empty($imagepaths)){
return $imagepaths;
}else{
return FALSE;
}
}
It seems your filenames are URL encoded. Take a look at http://php.net/manual/en/function.urldecode.php
i.e:
foreach($imageTags as $tag) {
$imagepaths[]=urldecode($tag->getAttribute('src'));
}
You get the encoded URL.
You want to use urldecode:
Decodes any %## encoding in the given string. Plus symbols ('+') are
decoded to a space character.
urldecode() in PHP Manual
Because $imagepaths is an array and urldecode expects a string, replace
return $imagepaths;
with
return array_map('urldecode', $imagepaths);
to decode your image URLs.
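One detail in the question's output is worth noting: %255B is a double-encoded "[" (%25 is the encoding of "%"), so a single urldecode call only gets as far as %5B. A minimal sketch of decoding until the value stops changing follows; the helper name and the pass limit are hypothetical, and repeated decoding can over-decode a filename that legitimately contains a percent sign followed by two hex digits.

```php
<?php
// Hypothetical helper: decode repeatedly until the string is stable,
// with a pass limit as a safety net.
function decode_fully($value, $maxPasses = 5) {
    for ($i = 0; $i < $maxPasses; $i++) {
        $decoded = rawurldecode($value);
        if ($decoded === $value) {
            break;
        }
        $value = $decoded;
    }
    return $value;
}

$src = '6Z6W4%255BO29FQ%255BA4YN_%255BFR9%2529M.gif';

echo urldecode($src), "\n";    // 6Z6W4%5BO29FQ%5BA4YN_%5BFR9%29M.gif
echo decode_fully($src), "\n"; // 6Z6W4[O29FQ[A4YN_[FR9)M.gif
```

rawurldecode is used here so that a literal "+" in a filename is not turned into a space.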
