php extract piece of html with DOM and insert new html

I want to use an HTML snippet as a template with placeholders: load the template, fill it with content, and return the new HTML:
$html = '<table>
<tr id="extra">
<td>###EXTRATITLE###</td>
<td>###EXTRATOTAL###</td>
</tr>
</table>';
$temp = new DOMDocument();
$temp->loadHTML($html);
$str = $temp->saveHTML($temp->getElementById('extra'));
$dom = new DOMDocument();
$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$element = $dom->getElementById('extra');
$element->parentNode->removeChild($element);
$data = [
"key1" => "value1",
"key2" => "value2",
];
foreach ($data as $key => $row) {
$search = [ '###EXTRATITLE###', '###EXTRATOTAL###' ];
$replace = [ $key, $row ];
$el = $dom->createTextNode(str_replace($search, $replace, $str));
$foo = $dom->documentElement->firstChild;
$foo->appendChild($el);
}
echo preg_replace('~<(?:!DOCTYPE|/?(?:html|body))[^>]*>\s*~i', '', $dom->saveHTML());
The problems are the escaped entities and the wrong placement of the child nodes. Could anyone fix this?

Assuming you have a data mapping array like this:
$data = array(
'PLACEHOLDER1' => 'data 1',
'PLACEHOLDER2' => 'data 2',
);
Here is what you could do:
$html = '<table>
<tr id="myselector">
<td>###PLACEHOLDER1###</td>
<td>###PLACEHOLDER2###</td>
</tr>
</table>';
foreach( array_keys( $data ) as $key )
{
$html = str_replace( '###'.$key.'###', $data[ $key ], $html );
}
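To get one table row per data entry, as in the question, the same replacement idea can be wrapped in a small function and applied to a row template. This is only a sketch: fillTemplate() is a made-up helper name, and the values are passed through htmlspecialchars() on the assumption that they are plain text rather than markup.

```php
<?php
// Hypothetical helper: replaces ###KEY### placeholders with escaped values.
function fillTemplate(string $template, array $data): string
{
    $map = [];
    foreach ($data as $key => $value) {
        // Escape values so characters like & and < cannot break the markup
        $map['###' . $key . '###'] = htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
    }
    // strtr() replaces all placeholders in a single pass
    return strtr($template, $map);
}

$row = '<tr><td>###EXTRATITLE###</td><td>###EXTRATOTAL###</td></tr>';
$rows = '';
foreach (['key1' => 'value1', 'key2' => 'value2'] as $title => $total) {
    $rows .= fillTemplate($row, ['EXTRATITLE' => $title, 'EXTRATOTAL' => $total]);
}
echo '<table>' . $rows . '</table>';
```

Because the work happens on strings, no DOM parsing is involved, so there are no entity or node-placement surprises.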

Here is an alternative approach:
$html = '<table><tr id="myselector"></tr></table>';
$doc = new DOMDocument();
$doc->loadHTML($html);
$tr = $doc->getElementById('myselector');
foreach ($data as $row) {
$td = $doc->createElement('td', $row['value']);
$tr->appendChild($td);
//repeat as necessary
}
It does not use placeholders, but it should produce the desired result.
If the goal is to build a more complex templating system from scratch, it might make more sense to read the XPath documentation and use PHP's DOMXPath class.
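If you want to keep the placeholders and still stay inside the DOM, one possible approach is to clone the template row for every data entry and replace the placeholders in its text nodes. The sketch below assumes the placeholder names from the question; renderRows() is a hypothetical helper, not an existing API.

```php
<?php
// Hypothetical helper (not from the question): repeats a template row once per data entry.
function renderRows(string $html, string $rowId, array $data): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings quietly
    $dom->loadHTML($html);
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $template = $xpath->query('//*[@id="' . $rowId . '"]')->item(0);
    if ($template === null) {
        return $html; // template row not found, nothing to do
    }
    $parent = $template->parentNode;

    foreach ($data as $title => $total) {
        $map = ['###EXTRATITLE###' => (string) $title, '###EXTRATOTAL###' => (string) $total];
        $row = $template->cloneNode(true);
        $row->removeAttribute('id'); // avoid duplicate ids in the output
        $parent->insertBefore($row, $template);
        // Replace placeholders inside text nodes; the serializer escapes & and < on output
        foreach ($xpath->query('.//text()', $row) as $text) {
            $text->data = strtr($text->data, $map);
        }
    }
    $parent->removeChild($template); // drop the unfilled template row
    return $dom->saveHTML($parent);  // serialize only the table, without html/body wrappers
}

$html = '<table>
<tr id="extra">
<td>###EXTRATITLE###</td>
<td>###EXTRATOTAL###</td>
</tr>
</table>';
echo renderRows($html, 'extra', ['key1' => 'value1', 'key2' => 'value2']);
```

Passing the parent node to saveHTML() returns just that element, so the regex that strips the doctype, html and body tags is no longer needed.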

I'm wondering why you're using PHP for this instead of jQuery, which makes this very easy. I know you can do it in PHP, but I'm not sure you're doing yourself any favors.
I had a similar requirement where I wanted to use a template and fill it with server-side values. What I ended up doing was building a server-side array and converting it to JSON in PHP with json_encode($the_data_array);
Then I had a standing script portion in my template that used jQuery selectors to get and set the values. This was really clean and much easier to debug than trying to generate the entire HTML payload from PHP. It also meant I could more easily separate the data file from the template for better caching and updates later.
I can try to update this with a fiddle example if you're not familiar with jQuery and need some guidance. Just let me know.

Related

How to make it Short PHP?

I wrote code like this; how can I make it shorter? I mean, I don't want to use foreach every time I do a regex match. Thank you.
<?php
preg_match_all('#<article [^>]*>(.*?)<\/article>#sim', $content, $article);
foreach($article[1] as $posts) {
preg_match_all('#<img class="images" [^>]*>#si', $posts, $matches);
$img[] = $matches[0];
}
$result = array_filter($img);
foreach($result as $res) {
preg_match_all('#src="(.*?)" data-highres="(.*?)"#si', $res[0], $out);
$final[] = array(
'src' => $proxy.base64_encode($out[1][0]),
'highres' => $proxy.base64_encode($out[2][0])
);
}
?>

If you want robust code (that always works), avoid parsing HTML with regex, because HTML is more complicated and unpredictable than you think. Instead, use the built-in tools available for this particular task, i.e. the DOMxxx classes.
$dom = new DOMDocument;
$state = libxml_use_internal_errors(true);
$dom->loadHTML($content);
libxml_use_internal_errors($state);
$xp = new DOMXPath($dom);
$imgList = $xp->query('//article//img[@src][@data-highres]');
foreach($imgList as $img) {
$final[] = [
'src' => $proxy.base64_encode($img->getAttribute('src')),
'highres' => $proxy.base64_encode($img->getAttribute('data-highres'))
];
}

Array filter in PHP

I am using Simple HTML DOM to parse an HTML file.
I have a dynamic array called links2; it can be empty, or it may contain 4 or more elements, depending on the case.
<?php
include('simple_html_dom.php');
$url = 'http://www.example.com/';
$html = file_get_html($url);
$doc = new DOMDocument();
@$doc->loadHTML($html);
//////////////////////////////////////////////////////////////////////////////
foreach ($doc->getElementsByTagName('p') as $link)
{
$intro2 = $link->nodeValue;
$links2[] = array(
'value' => $link->textContent,
);
$su=count($links2);
}
$word = 'document.write(';
Assume that two elements in the links2 array contain $word. When I try to filter the array by removing the matching elements with
unset( $links2[array_search($word, $links2 )] );
print_r($links2);
the filter removes only one element, and array_diff doesn't solve the problem. Any suggestions?

Solved by skipping the unwanted paragraphs while building the array:
foreach ($doc->getElementsByTagName('p') as $link)
{
    $dont = $link->textContent;
    // keep only paragraphs that do not contain "document"
    if (strpos($dont, 'document') === false) {
        $links2[] = array(
            'value' => $link->textContent,
        );
    }
    $su=count($links2);
}
echo $su;
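If the array has already been built, the matching elements can also be removed afterwards with array_filter(), which is what the original unset()/array_search() attempt was aiming for. array_search() looks for an element equal to $word, and every element here is an array rather than a string, so it does not do the job. A minimal sketch, with made-up sample data standing in for the parsed paragraphs:

```php
<?php
// Sample data (hypothetical) in the same shape as $links2 from the question
$links2 = [
    ['value' => 'First paragraph'],
    ['value' => 'document.write("ad 1")'],
    ['value' => 'Second paragraph'],
    ['value' => 'document.write("ad 2")'],
];
$word = 'document.write(';

// Keep only the elements whose 'value' does not contain $word,
// then reindex so the keys run 0, 1, 2, ...
$filtered = array_values(array_filter($links2, function (array $link) use ($word) {
    return strpos($link['value'], $word) === false;
}));

print_r($filtered);
```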

Parsing HTML Table Data from XML with PHP

I am somewhat new to PHP and can't figure out what I am doing wrong here.
Problem: I am trying to get the href of a certain HTML element inside a string of HTML that sits in an XML element of a Reddit feed. If you visit this page, it would be the actual link of the video: the external YouTube link (or whatever the source is), not the Reddit link.
Here is my code so far (code updated):
Update: Loop-mania! Got all of the hrefs, but am now trying to store them inside a global array to access a random one outside of this function.
function getXMLFeed() {
echo "<h2>Reddit Items</h2><hr><br><br>";
//$feedURL = file_get_contents('https://www.reddit.com/r/videos/.xml?limit=200');
$feedURL = 'https://www.reddit.com/r/videos/.xml?limit=200';
$xml = simplexml_load_file($feedURL);
//define each xml entry from reddit as an item
foreach ($xml -> entry as $item ) {
foreach ($item -> content as $content) {
$newContent = (string)$content;
$html = str_get_html($newContent);
foreach($html->find('table') as $table) {
$links = $table->find('span', '0');
//echo $links;
foreach($links->find('a') as $link) {
echo $link->href;
}
}
}
}
}
XML Code:
http://pasted.co/0bcf49e8
I've also included JSON if it can be done this way; I just preferred XML:
http://pasted.co/f02180db
That is pretty much all of the code. Though, here is another piece I tried to use with DOMDocument (scrapped it).
foreach ($item -> content as $content) {
$dom = new DOMDocument();
$dom -> loadHTML($content);
$xpath = new DOMXPath($dom);
$classname = "/html/body/table[1]/tbody/tr/td[2]/span[1]/a";
foreach ($dom->getElementsByTagName('table') as $node) {
echo $dom->saveHtml($node), PHP_EOL;
//$originalURL = $node->getAttribute('href');
}
//$html = $dom->saveHTML();
}
I can parse the table fine, but when it comes to getting certain element's values (nothing has an ID or class), I can only seem to get ALL anchor tags or ALL table rows, etc.
Can anyone point me in the right direction? Let me know if there is anything else I can add here. Thanks!
Added HTML:
I am specifically trying to extract <span>[link]</span> from each table/item.
http://pastebin.com/QXa2i6qz

The following code extracts all the YouTube links from each entry's content.
function extract_youtube_link($xml) {
$entries = $xml['entry'];
$videos = [];
foreach($entries as $entry) {
$content = html_entity_decode($entry['content']);
preg_match_all('/<span><a href="([^"]*)">\[link\]/', $content, $matches);
if(!empty($matches[1][0])) {
$videos[] = array(
'entry_title' => $entry['title'],
'author' => preg_replace('/\/(.*)\//', '', $entry['author']['name']),
'author_reddit_url' => $entry['author']['uri'],
'video_url' => $matches[1][0]
);
}
}
return $videos;
}
$xml = simplexml_load_file('reddit.xml');
$xml = json_decode(json_encode($xml), true);
$videos = extract_youtube_link($xml);
foreach($videos as $video) {
echo "<p>Entry Title: {$video['entry_title']}</p>";
echo "<p>Author: {$video['author']}</p>";
echo "<p>Author URL: {$video['author_reddit_url']}</p>";
echo "<p>Video URL: {$video['video_url']}</p>";
echo "<br><br>";
}
The code returns a multidimensional array in which each element has the keys entry_title, author, author_reddit_url and video_url. Hope it helps!

If you're looking for a specific element, you don't need to parse the whole thing. One way to do it is to use the DOMXPath class and query the XML directly. The documentation should guide you through:
http://php.net/manual/es/class.domxpath.php
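As a rough illustration of the DOMXPath route, the sketch below picks the anchor whose text is "[link]" out of one entry's content HTML. The sample markup and the function name extractLinkHref() are assumptions made for the example; the real feed content may be structured differently.

```php
<?php
// Hypothetical helper: returns the href of the "[link]" anchor, or null if there is none.
function extractLinkHref(string $contentHtml): ?string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // feed HTML is rarely perfectly valid
    $dom->loadHTML($contentHtml);
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Match an <a> directly inside a <span> whose text is exactly "[link]"
    $anchor = $xpath->query('//span/a[normalize-space(.)="[link]"]')->item(0);

    return $anchor instanceof DOMElement ? $anchor->getAttribute('href') : null;
}

// Made-up sample resembling one decoded <content> element
$sample = '<table><tr><td><a href="https://www.reddit.com/r/videos/comments/abc/"><img src="t.jpg"></a></td>'
    . '<td>submitted by <a href="https://www.reddit.com/user/someone">/u/someone</a><br>'
    . '<span><a href="https://www.youtube.com/watch?v=XXXXXXXXXXX">[link]</a></span> '
    . '<span><a href="https://www.reddit.com/r/videos/comments/abc/">[comments]</a></span></td></tr></table>';

echo extractLinkHref($sample), PHP_EOL;
```

Selecting by the anchor text avoids depending on element positions such as table[1]/tr/td[2], which break as soon as the markup shifts.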

Recursively loop through the DOM tree and remove unwanted tags?

$tags = array(
"applet" => 1,
"script" => 1
);
$html = file_get_contents("test.html");
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$body = $xpath->query("//body")->item(0);
I want to loop through the body of the web page and remove all unwanted tags listed in the $tags array, but I can't find a way. How can I do it?

Have you considered HTML Purifier? Writing your own HTML sanitizer is reinventing the wheel, and it isn't easy to get right.
Furthermore, a blacklist approach is also bad; see SO/why-use-a-whitelist-for-html-sanitizing
You may also be interested in reading how to configure allowed tags & attributes, or in testing the HTML Purifier demo.

$tags = array(
    "applet" => 1,
    "script" => 1
);
$html = file_get_contents("test.html");
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// $tags is keyed by tag name, so iterate over its keys, not numeric indexes
foreach (array_keys($tags) as $tag) {
    // copy the matches into an array first, then remove each node from its parent
    foreach (iterator_to_array($xpath->query("//" . $tag)) as $node) {
        $node->parentNode->removeChild($node);
    }
}
// saveHTML() rather than saveXML(), since the document was loaded as HTML
$string = $dom->saveHTML();
Something like that.

convert associate array to XML in php

How do I convert an associative array to an XML string? I found this but get the error 'Call to a member function addChild() on a non-object' when running the line
$node = $xml->addChild($key);

Use the PHP Document Object Model:
$xml = new DOMDocument('1.0', 'utf-8');
$root = $xml->createElement('top');
$xml->appendChild($root);
foreach ($arr as $k => $v) {
    $node = $xml->createElement($k);
    $text = $xml->createTextNode($v);
    $node->appendChild($text);
    $root->appendChild($node);
}
echo $xml->saveXML();

Did you initialize the $xml object? That's probably your problem.
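For reference, a minimal sketch of what an initialized SimpleXML version could look like. The function name arrayToXml(), the root element name and the sample data are assumptions; the sketch also assumes every array key is a valid XML element name (numeric keys would not be).

```php
<?php
// Hypothetical helper: appends the array's entries to $xml, recursing into nested arrays.
function arrayToXml(array $data, SimpleXMLElement $xml): void
{
    foreach ($data as $key => $value) {
        if (is_array($value)) {
            // nested array: create a child element and fill it recursively
            arrayToXml($value, $xml->addChild($key));
        } else {
            // addChild() does not escape "&", so escape the value first
            $xml->addChild($key, htmlspecialchars((string) $value));
        }
    }
}

// The root element must exist before addChild() can be called on it
$xml = new SimpleXMLElement('<top/>');
arrayToXml(['name' => 'Tom & Jerry', 'address' => ['city' => 'Berlin']], $xml);
echo $xml->asXML();
```

The "non-object" error in the question usually means $xml was never assigned a SimpleXMLElement before the loop ran.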

It's pretty similar to how you would do something like this:
while($row = mysql_fetch_assoc($result))
You can't use $result as an array, but you can foreach or while through the different entries.

PEAR's XML_Serializer is pretty good if you want an easy solution. Doing the DOM manually is arguably faster.
