Inserting a Div through DOMDocument into content got from remote site - php

Sorry if the title is confusing; here is the explanation.
Say we have a remote page like remotesite.com/page1.html. We use file_get_contents to get its source, then use DOMDocument to edit that source before printing it on our page:
$url = "remotesite.com/page1.html";
$html = file_get_contents($url);
$doc = new DOMDocument(); // create DOMDocument
libxml_use_internal_errors(true);
$doc->loadHTML($html); // load HTML you can add $html
//here we do some edits to remove or add contents
I want to add the Div below to the content before printing it:
<div style="float: right; padding-right: 2px;"><a class="open_event_tab" target="_blank" href="some-hard-coded-text-here_'+content+'_title_'+lshtitle+'_event_'+id+'.html" >open event</a></div>
Here is what I have tried, but the dynamic part ('+content+'_title_'+lshtitle+'_event_'+id+') of the href is not working.
I know my code below may look stupid, but I don't have much PHP knowledge yet.
function createDivNode($doc) {
$divNode = $doc->createElement('div');
$divNode->setAttribute('style', 'float: right; padding-right: 2px;');
$aNode = $doc->createElement('a', 'openEvent');
$aNode->setAttribute('class', 'open_event_tab');
$aNode->setAttribute('target', '_blank');
$aNode->setAttribute('href', 'some-hard-coded-text-here_'+content+'_title_'+lshtitle+'_event_'+id+'.html');
$divNode->appendChild($aNode);
return $divNode;
}
I want to loop through the source fetched from the remote site, find every td that looks like the one below, and add the div just before its closing tag:
<td colspan="2">
<b>Video </b>
<span class="section">Sports</span><b>: </b>
<span id="category466" class="category">Motor Sports</span>
//here i want to add my div
</td>
After 6 hours of research I still can't figure this out, as I am still learning, so I decided to ask this helpful community.

I'm still a bit confused by your question, but I think you want to append a dynamic div inside every td. If that's the case, you may try this (at least you'll get an idea, and it's very clean):
var content = 'someContent', lshtitle = 'someTitle', id = 'anId';
var attr = {
'class' : 'open_event_tab',
'target' : '_blank',
'href' : 'some-hard-coded-text-here_' + content + '_title_' + lshtitle + '_event_' + id + '.html',
'text' : 'open event'
};
var link = $('<a/>', attr);
var div = $('<div/>', { 'style' : 'float:right;padding-right:2px;' }).append(link);
$('#myTable td').append(div);
Update: (the question was confusing, so an updated answer is given below)
Just download Simple HTML DOM Parser (see its documentation) and use it like this:
include('simple_html_dom.php');
$html = file_get_html('remotesite.com/page1.html');
foreach($html->find('table td') as $td) {
$td->innertext = $td->innertext . '<div>New Div</div>';
}
To modify only the tds that have class="category":
foreach($html->find('table td.category') as $td) {
$td->innertext = $td->innertext . '<div>New Div</div>';
}
And you are done. Note that <div>New Div</div> is just an example; adapt it to your needs.
Possible result of the example :
<table>
<tbody>
<tr>
<td colspan="2">
<b>Video </b> <span class="section">Sports</span><b>: </b> <span id="category466" class="category">Motor Sports</span>
<div>New Div</div>
</td>
</tr>
<tr>
<td colspan="2">
<b>Video </b> <span class="section">Sports</span><b>: </b> <span id="category466" class="category">Motor Sports</span>
<div>New Div</div>
</td>
</tr>
</tbody>
</table>
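For completeness, here is a minimal sketch that stays with DOMDocument, as the question originally asked. The original attempt fails because PHP concatenates strings with "." rather than "+", and variables need a leading "$". The sample markup and the three placeholder values (someContent, someTitle, anId) are assumptions made for illustration; the question does not say where content, lshtitle and id come from.

```php
<?php
// Hypothetical markup standing in for the page fetched with file_get_contents().
$html = '<table><tr><td colspan="2"><b>Video </b>'
      . '<span class="section">Sports</span><b>: </b>'
      . '<span id="category466" class="category">Motor Sports</span>'
      . '</td></tr></table>';

// Builds the <div><a>...</a></div> node. The three string parameters are
// assumptions: the question does not say where these values come from.
function createDivNode(DOMDocument $doc, string $content, string $lshtitle, string $id): DOMElement
{
    $divNode = $doc->createElement('div');
    $divNode->setAttribute('style', 'float: right; padding-right: 2px;');

    $aNode = $doc->createElement('a', 'open event');
    $aNode->setAttribute('class', 'open_event_tab');
    $aNode->setAttribute('target', '_blank');
    // PHP joins strings with "." (not "+").
    $aNode->setAttribute(
        'href',
        'some-hard-coded-text-here_' . $content . '_title_' . $lshtitle . '_event_' . $id . '.html'
    );

    $divNode->appendChild($aNode);
    return $divNode;
}

libxml_use_internal_errors(true); // real-world HTML is rarely valid
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Select every td that directly contains a span with class "category".
foreach ($xpath->query('//td[span[@class="category"]]') as $td) {
    // Placeholder values, hard-coded for illustration only.
    $td->appendChild(createDivNode($doc, 'someContent', 'someTitle', 'anId'));
}

echo $doc->saveHTML();
```

Because appendChild() adds the node as the last child, the div ends up just before the closing </td>.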

JQUERY
If you want to insert the div at the end of each td, use append():
$('td').append('<div>Test</div>');
To insert it at the beginning of each td, use prepend():
$('td').prepend('<div>Test</div>');
For more information, see the jQuery website.

Related

How to return in php DOMXPath object?

Now found query if '$NotXP->query' = query return string?!
How can I make the following code work?
$xp = new \DOMXPath(@\DOMDocument::loadHTMLFile($url));
$list = $xp->query('//table[@class="table-list quality series"] tbody');
$link = $list->query('//tr[@class="item"]');
$arr_links = [];
foreach ($link as $link_in_cycle) {
$link_quality = $link_in_cycle->query('//td[@class="column first video"]');
$link_audio = $link_in_cycle->query('//td[@class="column audio"]');
$link_size = $link_in_cycle->query('//td[@class="column size"]');
$link_seed = $link_in_cycle->query('//td[@class="column seed-leech"] span[@class="seed"]');
$link_download_url = $link_in_cycle->query('//td[@class="column last download"] a')->getAttribute("data-default");
HTML source, as requested by @nigel-ren.
This is the markup I need to extract the information from:
<tbody>
<tr class="item">
<td class="column first video">720x400</td>
<td class="column audio">mp3</td>
<td class="column size">5.70 Gb</td>
<td class="column seed-leech">
<span class="seed">15</span>
<span class="leech">26</span>
</td>
<td class="column updated">07.07.2017</td>
<td class="column consistence"></td>
<td class="column last download">
<a class="button middle rounded download zona-link"
data-type="download"
data-zona="0"
data-torrent=""
data-default="url_data"
data-not-installed=""
data-installed=""
data-metriks="{'eventType': 'click', 'data' : { 'type': 'show_download', 'id': '84358'}}"
title="text in title" href="javascript:void(0);" >Download</a> </td>
I've made a few changes to help me debug the code. The main issue is that your XPath expressions were invalid; you can always try a site like FreeFormatter, which lets you check your expressions against some example source.
$doc = new \DOMDocument();
$doc->loadHTMLFile($url);
$xp = new \DOMXPath($doc);
$list = $xp->query('//table[@class="table-list quality series"]//tr[@class="item"]');
$arr_links = [];
foreach ($list as $link_in_cycle) {
// The leading "." keeps each query relative to the current row; a bare "//" would search the whole document.
$link_quality = $xp->query('.//td[@class="column first video"]/text()', $link_in_cycle)[0]->wholeText;
$link_audio = $xp->query('.//td[@class="column audio"]/text()', $link_in_cycle)[0]->wholeText;
$link_size = $xp->query('.//td[@class="column size"]/text()', $link_in_cycle)[0]->wholeText;
$link_seed = $xp->query('.//td[@class="column seed-leech"]//span[@class="seed"]/text()', $link_in_cycle)[0]->wholeText;
$link_download_url = $xp->query('.//td[@class="column last download"]//a/@data-default', $link_in_cycle)[0]->value;
echo $link_quality.PHP_EOL;
echo $link_audio.PHP_EOL;
echo $link_size.PHP_EOL;
echo $link_seed.PHP_EOL;
echo $link_download_url.PHP_EOL;
}
The XPath expressions retrieve the text node in each element. Each query returns a list of matching nodes, so the code uses [0] to fetch the first one; it assumes there isn't any whitespace around the actual content. wholeText is just the actual content of the DOMText node.
With the sample content you gave (plus the surrounding bits I had to invent) it gives...
720x400
mp3
5.70 Gb
15
url_data
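One detail that is easy to trip over when passing a context node to query(): an expression that starts with // is still evaluated from the document root, while .// is evaluated relative to the context node. The sketch below shows the difference on a made-up two-row table (the markup and the "size" class are assumptions, not the original page).

```php
<?php
// Hypothetical two-row table, just enough to show the difference.
$html = '<table>'
      . '<tr class="item"><td class="size">1 Gb</td></tr>'
      . '<tr class="item"><td class="size">2 Gb</td></tr>'
      . '</table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xp = new DOMXPath($doc);

$absolute = [];
$relative = [];
foreach ($xp->query('//tr[@class="item"]') as $row) {
    // "//" ignores the context node and searches the whole document,
    // so the first match is always the cell from the first row.
    $absolute[] = $xp->query('//td[@class="size"]', $row)->item(0)->textContent;
    // ".//" searches only below the current row.
    $relative[] = $xp->query('.//td[@class="size"]', $row)->item(0)->textContent;
}

print_r($absolute); // both entries are "1 Gb"
print_r($relative); // "1 Gb", then "2 Gb"
```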

Get all strings between two other strings in html document in PHP

I'm creating a kind of crawler/proxy at the moment. It lets users navigate a website while remaining on my site. While loading a page, I would also like to extract all the links and data at the same time.
The website contains many "< tr>" elements (without the space), each of which contains a lot of other stuff.
Here is one example of the many on the website:
<tr>
<td class="vertTh">
<center>
Other
<br>
Document
</center>
</td>
<td>
<div class="Name">
Document Title Info
</div>
<a href="http://example.com/source/to/document/which%20can%20be%20very%20long%20and%20have%20weird%20characters" title="Source">
<img src="/static/img/icon-source.png" alt="Source">
</a>
<font class="Desc">Uploaded 03-24 14:02, Size 267.35 KB, ULed by <a class="Desc" href="/s/user/username/" title="Browse username">username</a></font>
</td>
<td align="right">67</td>
<td align="right">9</td>
</tr>
Users browse the proxy site, and while they do, it captures info from the original website.
I figured out how to get a string between two words, but I don't know how to turn this into a "foreach" loop or something similar.
So let's say I want to get the source link. Then I would do something like this:
$url = $_GET['url'];
$str = file_get_contents('https://database.com/' . $url);
$source = 'http://example.com/source/to/' . getStringBetween($str,'example.com/source/to/','" title="Source">'); // Output looking like this: http://example.com/source/to/document/which%20can%20be%20very%20long%20and%20have%20weird%20characters
function getStringBetween($str,$from,$to)
{
$sub = substr($str, strpos($str,$from)+strlen($from),strlen($str));
return substr($sub,0,strpos($sub,$to));
}
But I can't just do this, because there are multiple such strings. So I'm wondering whether there is a way to get the source, name, and size from all of them?
You might want to use preg_match_all so that you get a list of many matches. Then you can loop over it.
http://php.net/manual/en/function.preg-match-all.php
$html = '<tr>
<td class="vertTh">
<center>
Other
<br>
Document
</center>
</td>
<td>
<div class="Name">
Document Title Info
</div>
<a href="http://another-example.com/source/to/document/which%20can%20be%20very%20long%20and%20have%20weird%20characters" title="Source">
<img src="/static/img/icon-source.png" alt="Source">
</a>
<a href="http://example.com/source/to/document/which%20can%20be%20very%20long%20and%20have%20weird%20characters" title="Source">
<img src="/static/img/icon-source.png" alt="Source">
</a>
<font class="Desc">Uploaded 03-24 14:02, Size 267.35 KB, ULed by <a class="Desc" href="/s/user/username/" title="Browse username">username</a></font>
</td>
<td align="right">67</td>
<td align="right">9</td>
</tr>';
// use | as delimiter for pattern to make it a little cleaner
preg_match_all('|href="(http://.+?)" title="Source"|', $html, $matches);
// loop over $matches
var_dump($matches);
foreach ($matches[1] as $match) {
// $match == http://example.com/source/to/document/which%20can%20be%20very%20long%20and%20have%20weird%20characters
}
You can try this example at... http://phpfiddle.org/ or run it in a .php file locally. Good luck.
FYI: I added an extra anchor tag to illustrate finding another source.
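If you would rather not depend on the exact attribute order in the markup, a DOM-based variant is also possible. This is only a sketch under assumptions: the sample row is shortened from the question, and the "Size ..." text is pulled out of the description with a small regex whose format I am guessing from that single example.

```php
<?php
// Shortened, hypothetical version of one row from the question.
$html = '<table><tr><td>'
      . '<div class="Name">Document Title Info</div>'
      . '<a href="http://example.com/source/to/document" title="Source">'
      . '<img src="/static/img/icon-source.png" alt="Source"></a>'
      . '<font class="Desc">Uploaded 03-24 14:02, Size 267.35 KB, ULed by '
      . '<a class="Desc" href="/s/user/username/" title="Browse username">username</a></font>'
      . '</td></tr></table>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xp = new DOMXPath($doc);

$rows = [];
// Visit every row that contains a "Source" link.
foreach ($xp->query('//tr[.//a[@title="Source"]]') as $tr) {
    $name   = trim($xp->evaluate('string(.//div[@class="Name"])', $tr));
    $source = $xp->evaluate('string(.//a[@title="Source"]/@href)', $tr);
    $desc   = $xp->evaluate('string(.//font[@class="Desc"])', $tr);
    // Assumed format: "Size <number> <unit>", as in the single example given.
    $size   = preg_match('/Size\s+([\d.]+\s*\w+)/', $desc, $m) ? $m[1] : null;

    $rows[] = ['name' => $name, 'source' => $source, 'size' => $size];
}

print_r($rows);
```

Each entry of $rows then holds the name, source link and size for one row, so a plain foreach over $rows gives you everything per document.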

HTML DOM remove/replace between <tr> and </tr> tags

I've searched for a solution but I'm lost. I have to remove, or replace with blank, everything between <tr> tags. I'm loading an HTML file which contains many <tr> tags; my goal is to remove the <tr> with a specific id. My <tr> looks like this:
<tr id="ctl00_cphMain_DisplayRecords1_RepeaterResults_ctl03_trZSD">
<td id="ctl00_cphMain_DisplayRecords1_RepeaterResults_ctl03_tdZSD" class="td-zsd footable-visible footable-last-column footable-first-column" colspan="9">
<div id="divZSDBanners" class="table-banners-zsd clearfix">
<div>
<div class="medium-4 columns zsd-ext-ad">
<div>
<script type="text/javascript">
</script>
<script>
</script>
<div id="ctl00_cphMain_DisplayRecords1_RepeaterResults_ctl03_ctl00_divSpace1" class="adSpacer">
</div>
</div>
</div>
<script type="text/javascript">
</script>
</div>
</div>
</td>
</tr>
I'm using Simple HTML DOM. I've already tried $html->find('tr[id=tr_id]'), but I don't know how to replace everything in between, including the divs and script tags.
Any ideas?
Use ->innertext property:
$tr = $html->find( 'tr[id=tr_id]', 0 ); // Select first node (0)
$tr->innertext = '';
echo $html->save();
Output:
<tr id="tr_id"></tr>
Or:
$tr->innertext = '<td>New Content</td>';
echo $html->save();
Output:
<tr id="tr_id"><td>New Content</td></tr>
To remove the TR element itself via DOM, use the removeChild method of its parent node:
$tr->parentNode->removeChild($tr);
To remove the element's contents, either set its textContent property to an empty string '' (PHP 5.6.1+) or remove all child nodes one by one using the element's removeChild() method in a loop, e.g.:
while ($tr->lastChild) {
$tr->removeChild($tr->lastChild);
}
SimpleXMLElement object can be converted to DOMElement object using the dom_import_simplexml() function.
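Put together, a minimal sketch of the DOM approach could look like the following. It uses PHP's built-in DOMDocument rather than Simple HTML DOM, and the two-row table and the ids "keep" and "tr_id" are made up for the example.

```php
<?php
// Hypothetical table: one row to keep, one row to remove.
$html = '<table>'
      . '<tr id="keep"><td>A</td></tr>'
      . '<tr id="tr_id"><td><div><script>var x;</script></div></td></tr>'
      . '</table>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xp = new DOMXPath($doc);

// Find the row by id and detach it, together with everything inside it.
$tr = $xp->query('//tr[@id="tr_id"]')->item(0);
if ($tr !== null) {
    $tr->parentNode->removeChild($tr);
}

$result = $doc->saveHTML();
echo $result;
```

Removing the node also removes its nested divs and script tags, since they are its descendants.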

change an element from display none to display block simple html dom

So here is what I have, and it works perfectly:
include('simple_html_dom.php');
// Create DOM from URL or file
$html = file_get_html('http://localhost/index.html');
On this page there is a button called "phone number"; once you click it, it reveals a div:
<div class="phone" style="display: none;">
<span class="number"> 212-222-3453</span>
</div>
Is there a way to change it to display:block before I scrape the data?
Yes, use the below code.
include('simple_html_dom.php');
$html = file_get_html('index.html');
$phoneArray = $html->find('div[class=phone]');
$phoneArray[0]->style="display:block";
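Note that display: none only affects rendering in a browser. A server-side parser sees the element either way, so the number can be read without touching the style at all. Here is a small sketch with PHP's built-in DOMDocument (used instead of Simple HTML DOM only to keep the example self-contained):

```php
<?php
// The hidden div from the question, as a standalone fragment.
$html = '<div class="phone" style="display: none;">'
      . '<span class="number"> 212-222-3453</span>'
      . '</div>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xp = new DOMXPath($doc);

// The style attribute plays no role here; the text is in the DOM regardless.
$number = trim($xp->evaluate('string(//div[@class="phone"]/span[@class="number"])'));

echo $number, PHP_EOL;
```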

How can I remove the <p style="text-align:center"> (or left, right. The paragraph comes from database) using PHP?

I have this html code that came from my database (I used TinyMCE to save the data)
<p style="text-align: center;">Come on grab your friends</p>
<p style="text-align: center;">Go to very distant lands</p>
<p style="text-align: center;">Jake the Dog and Finn the Human</p>
<p style="text-align: center;">The fun never ends</p>
<p style="text-align: center;"><strong>Adventure Time!</strong></p>
How can I remove those <p></p> tags considering that there can be other styles applied when using TinyMCE?
To remove HTML tags from a string you can use strip_tags():
$str = '<p style="text-align: center;">Come and grab your friends</p>';
$str2 = strip_tags($str);
echo $str2; // "Come and grab your friends"
To keep certain tags, you can add an additional parameter:
$str = '<p style="text-align: center;"><strong>Adventure Time!</strong></p>';
$str2 = strip_tags($str, "<strong>"); // Preserve <strong> tags
echo $str2; // "<strong>Adventure Time!</strong>"
The second parameter is a string listing each tag you don't want stripped, for example:
$str2 = strip_tags($str, "<p><h1><h2>"); // Preserve <p>, <h1>, and <h2> tags
For more information, review the PHP documentation linked above.
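If the goal is to drop only the <p> wrappers and leave every other tag exactly as TinyMCE saved it, a regular expression is another option. This is just a sketch, and it assumes the <p> tags contain no ">" inside attribute values:

```php
<?php
$str = '<p style="text-align: center;">The fun never ends</p>'
     . '<p style="text-align: center;"><strong>Adventure Time!</strong></p>';

// Remove opening and closing <p> tags, whatever attributes they carry.
// \b stops the pattern from also matching tags such as <pre>.
$clean = preg_replace('~</?p\b[^>]*>~i', '', $str);

echo $clean;
```

Unlike strip_tags() with an allow list, this leaves any tag other than <p> untouched without you having to enumerate them.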
Although you mentioned that you don't use JS, I would strongly encourage you to start using it. You will find it extremely useful in plenty of cases to work on the client side rather than rely solely on server-side procedures (as PHP does). So, just for the record, here is my suggested jQuery solution:
<html>
<head>
<!-- your head content here -->
<script src="//code.jquery.com/jquery-1.11.0.min.js"></script>
</head>
<body>
<p style="text-align: center;">Come on grab your friends</p>
<p style="text-align: center;">Go to very distant lands</p>
<p style="text-align: center;">Jake the Dog and Finn the Human</p>
<p style="text-align: center;">The fun never ends</p>
<p style="text-align: center;"><strong>Adventure Time!</strong></p>
<div id="result"></div> <!-- here I have added an extra empty div to display the result -->
<script>
$(document).ready(function() {
$("p").each(function() {
var value = $(this).text();
$("#result").append(value+ "<br>");
$(this).css("display", "none");
});
});
</script>
</body>
</html>
Live example here: http://jsfiddle.net/Rykz9/1/
Hope you (and others) find it useful... Happy coding!
