Extract the content of the script with PHP Simple HTML DOM - php

I want to extract the content of the script on this page that has the ID __NEXT_DATA__, using PHP Simple HTML DOM. The code I wrote is this:
foreach($html_base->getElementsByTagName('script') as $element) {
if (isset($element->id)){
$id = $element->id;
if ($id == "__NEXT_DATA__"){
$f = $element->nodeValue;
echo $f;
break;
}
}
}
Unfortunately, it gives me the following error:
Undefined property: DOMElement::$id

You can consult the Simple HTML DOM documentation, but here is my suggestion:
$html = file_get_html("url");
$script = $html->find("script[id=__NEXT_DATA__]", 0)->innertext;
The second parameter, 0, is the index into the matched results; because only one script has this id, you can take the first result.
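The error in the question arises because getElementsByTagName() here returns DOMElement objects from PHP's built-in DOM extension, and (before PHP 8.3, as far as I know) those expose attributes through getAttribute() rather than as an id property. Below is a minimal sketch of that route, assuming the page is already in a string; the sample markup and JSON are made up for illustration.

```php
<?php
// Sample markup standing in for the fetched page (hypothetical content).
$page = '<html><head><script id="__NEXT_DATA__" type="application/json">'
      . '{"props":{"title":"Hello"}}</script></head><body></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($page);
libxml_clear_errors();

$json = null;
foreach ($doc->getElementsByTagName('script') as $script) {
    // getAttribute() returns an empty string when the attribute is absent.
    if ($script->getAttribute('id') === '__NEXT_DATA__') {
        $json = $script->textContent;
        break;
    }
}

// Decode the JSON payload into an associative array.
$data = $json !== null ? json_decode($json, true) : null;
echo $data['props']['title'] ?? 'not found', PHP_EOL;
```

With Simple HTML DOM, the one-line find() call above does the same job; this version is only useful if you stay with DOMDocument.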

Related

How to create array in foreach loop?

I am using PHP HTML DOM Parser to get data from another site. First I get the URLs of my trades on this site, and then I send another request to each trade URL to get the comments. I want to build an array of comments so I can sort them later. Why can't I create the array?
It looks like this:
include_once('simple_html_dom.php');
$result = array();
$html = file_get_html('http://csgolounge.com/profile?id='.$steamid);
foreach($html->find('div.tradepoll') as $trade)
{
$tradeid = $trade->find('.tradeheader')[0]->find('a')[0]->href;
$html = file_get_html('http://csgolounge.com/'.$tradeid);
foreach($html->find('div.message') as $message)
{
if($message->find('p',0)){}
else
{
$left = $message->find('.msgleft')[0];
$right = $message->find('.msgright')[0];
//information about comments
$time = trim(strip_tags_content($left->innertext));
$text = $left->find('.msgtxt')[0];
$result[$time]['time'] = $time;
$result[$time]['text'] = $text;
}
}
}
echo json_encode($result);
If I echo $time or $text, I always get the data successfully.
I found the problem.
Simple HTML DOM Parser does not free the DOM's memory each time file_get_html or str_get_html is called, so it has to be done explicitly whenever you have finished with the current DOM.
So I added $html->clear(); at the end of the loop.
Credits: electrictoolbox.com
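Separately from the memory issue, keying $result by $time means two comments with the same timestamp overwrite each other, and storing a node object rather than its text (for example via the plaintext property) may not encode cleanly as JSON. Here is a minimal sketch of collecting plain strings and sorting them afterwards; the rows and the date format are hypothetical stand-ins for the scraped values.

```php
<?php
// Hypothetical rows standing in for the scraped $time / $text values.
$scraped = [
    ['time' => '2015-03-02 14:05', 'text' => 'Second comment'],
    ['time' => '2015-03-01 09:30', 'text' => 'First comment'],
    ['time' => '2015-03-02 14:05', 'text' => 'Same minute as another'],
];

$result = [];
foreach ($scraped as $row) {
    // Append plain strings; [] never overwrites an earlier entry.
    $result[] = ['time' => $row['time'], 'text' => $row['text']];
}

// Sort oldest first; the parseable date format is an assumption.
usort($result, fn($a, $b) => strtotime($a['time']) <=> strtotime($b['time']));

echo json_encode($result), PHP_EOL;
```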

PHP: Simple HTML DOM Parser - how to get the element which has certain tag name?

In PHP I'm using the Simple HTML DOM Parser class.
I have an HTML file that contains many different tags.
In this HTML there is an element like this:
<a name="10418"><b> Hospitalist (Family Practitioner)</b></a>
So I would like to find the 'a' element which has name="10418".
I've tried the following with no luck; I only want to get that string:
$html_object = str_get_html($url);
$html_object=$html_object->find('a');
foreach ($html_object as $o) {
$a= $o->find("b");
echo $a[0];
}
Try:
$anchor = $html_object->find('a[name=10418]', 0);
echo $anchor->plaintext;
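If you would rather avoid an extra library, the same attribute lookup can be sketched with PHP's built-in DOMDocument and DOMXPath. The first anchor is the one from the question; the second one is made up so that the selector has something to skip.

```php
<?php
// Markup from the question, wrapped in a minimal document (second anchor is hypothetical).
$markup = '<html><body><a name="10418"><b> Hospitalist (Family Practitioner)</b></a>'
        . '<a name="10419"><b>Other</b></a></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($markup);
libxml_clear_errors();

$xpath  = new DOMXPath($doc);
$anchor = $xpath->query('//a[@name="10418"]')->item(0);

// item(0) is null when nothing matches, so guard before reading the text.
$title = $anchor !== null ? trim($anchor->textContent) : null;
echo $title ?? 'not found', PHP_EOL;
```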
Try another library called Tag Parse. It's simple and efficient.
$dom = new TagParse\TagDomRoot($html);
$a = $dom->find('a[name="10418"]');
I think it is faster and uses less memory than simple_html_dom.

Replace an element with Dom Document PHP

I load an HTML page with PHP's DOMDocument:
$doc = new DOMDocument();
@$doc->loadHTMLFile($url);
I search my page for all "a" elements, and if one meets my condition I need to replace, for example, <a href="…">My link is beautiful</a> with just My link is beautiful.
Here is my loop:
$liens = $div->getElementsByTagName('a');
foreach($liens as $lien){
if($lien->hasAttribute('href')){
if (preg_match("/metz2/i", $lien->getAttribute('href'))) {
//HERE I NEED TO REPLACE </a>
}
$cpt++;
}
}
Do you have any ideas or suggestions? Thanks :)
Every time I need to manipulate the DOM with PHP, I use a library called PHP Simple HTML DOM Parser.
It's very easy to use; something like this might work for you:
// Create DOM from URL or file
$html = file_get_html('http://www.page.com/');
// Find all links
foreach($html->find('a') as $element) {
// Put your custom logic here if needed. This example takes the inner contents of the <a> tag and replaces the whole element with them.
$inner = $element->innertext;
$element->outertext = $inner;
}
//To echo modified html again:
echo $html;
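Could be done with preg_replace as well (a non-greedy pattern keeps several links in the same string from being merged into a single match):
$sText = '<a href="#">Stackoverflow</a>';
$sText = preg_replace('/<a\b[^>]*>(.*?)<\/a>/is', '$1', $sText);
echo $sText;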
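Since the question itself uses DOMDocument, here is a minimal sketch of unwrapping the matching links with that API: each child of the <a> is moved in front of it, and the now empty <a> is removed. The markup is hypothetical; only the "metz2" condition comes from the question.

```php
<?php
// Hypothetical markup; only the first link matches the "metz2" condition.
$markup = '<div id="box"><a href="/metz2/page">My link is beautiful</a> '
        . '<a href="/other">Keep me</a></div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($markup);
libxml_clear_errors();

// Copy the live node list first: removing nodes while iterating it can skip items.
$links = iterator_to_array($doc->getElementsByTagName('a'));
foreach ($links as $link) {
    if (preg_match('/metz2/i', $link->getAttribute('href'))) {
        // Move each child in front of the <a>, then drop the empty <a>.
        while ($link->firstChild !== null) {
            $link->parentNode->insertBefore($link->firstChild, $link);
        }
        $link->parentNode->removeChild($link);
    }
}

// Serialise just the <div> to inspect the result.
$out = $doc->saveHTML($doc->getElementsByTagName('div')->item(0));
echo $out, PHP_EOL;
```

Moving the child nodes, rather than copying the text, keeps any nested markup inside the link intact.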

I want to load a specific div from another website in PHP

I am having trouble loading a specific div element and showing it on my page using PHP. My code right now is as follows:
<?php
$page = file_get_contents("http://www.bbc.co.uk/sport/football/results");
preg_match('/<div id="results-data" class="fixtures-table full-table-medium">(.*)<\/div>/is', $page, $matches);
var_dump($matches);
?>
I want it to load id="results-data" and show it on my page.
You won't be able to manipulate the URL to get only a portion of the page. Instead, fetch the page contents with the server-side language of your choice and then parse the HTML. From there you can pick out the specific DIV you are looking for and print it to the screen. You could also remove unwanted content at that stage.
With PHP you could use file_get_contents() to read the file you want to parse and then use DOMDocument to parse it and grab the DIV you want.
Here's the basic idea. This is untested but should point you in the right direction:
$page = file_get_contents('http://www.bbc.co.uk/sport/football/results');
$doc = new DOMDocument();
$doc->loadHTML($page);
$divs = $doc->getElementsByTagName('div');
foreach($divs as $div) {
// Loop through the DIVs looking for one with an id of "content" (use "results-data" for the page in the question)
// Then echo out its contents (pardon the pun)
if ($div->getAttribute('id') === 'content') {
echo $div->nodeValue;
}
}
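Note that nodeValue returns only the text inside the element. If the goal is to show the div with its markup, one option is to select it by id with DOMXPath and serialise it with saveHTML(). This is a minimal sketch on made-up markup; the id and class are taken from the question.

```php
<?php
// Sample markup standing in for the downloaded page (hypothetical content).
$page = '<html><body><div id="nav">Menu</div>'
      . '<div id="results-data" class="fixtures-table full-table-medium">'
      . '<p>Team A 2-1 Team B</p></div></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($page);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$div   = $xpath->query('//div[@id="results-data"]')->item(0);

// saveHTML($node) serialises the element together with its markup.
$fragment = $div !== null ? $doc->saveHTML($div) : '';
echo $fragment, PHP_EOL;
```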
You should use an HTML parser. Take a look at phpQuery; here is how you can do it:
require_once('phpQuery/phpQuery.php');
$html = file_get_contents('http://www.bbc.co.uk/sport/football/results');
phpQuery::newDocumentHTML($html);
$resultData = pq('div#results-data');
echo $resultData;
Check it out here:
http://code.google.com/p/phpquery
Also see their selectors' documentation.

Using PHP to get DOM Element

I'm struggling big time understanding how to use the DOMElement object in PHP. I found this code, but I'm not really sure it's applicable to me:
$dom = new DOMDocument();
$dom->loadHTML("index.php");
$div = $dom->getElementsByTagName('div');
foreach ($div->attributes as $attr) {
$name = $attr->nodeName;
$value = $attr->nodeValue;
echo "Attribute '$name' :: '$value'<br />";
}
Basically what I need is to search the DOM for an element with a particular id, after which point I need to extract a non-standard attribute (i.e. one that I made up and put on with JS) so I can see the value of that. The reason is I need one piece from the $_GET and one piece that is in the HTML based from a redirect. If someone could just explain how I use DOMDocument for this purpose, that would be helpful. I'm really struggling understanding what's going on and how to properly implement it, because I clearly am not doing it right.
EDIT (Where I'm at based on comment):
This is my code lines 4-26 for reference:
<div id="column_profile">
<?php
require_once($_SERVER["DOCUMENT_ROOT"] . "/peripheral/profile.php");
$searchResults = isset($_GET["s"]) ? performSearch($_GET["s"]) : "";
$dom = new DOMDocument();
$dom->load("index.php");
$divs = $dom->getElementsByTagName('div');
foreach ($divs as $div) {
foreach ($div->attributes as $attr) {
$name = $attr->nodeName;
$value = $attr->nodeValue;
echo "Attribute '$name' :: '$value'<br />";
}
}
$div = $dom->getElementById('currentLocation');
$attr = $div->getAttribute('srckey');
echo "<h1>{$attr}</a>";
?>
</div>
<div id="column_main">
Here is the error message I'm getting:
Warning: DOMDocument::load() [domdocument.load]: Extra content at the end of the document in ../public_html/index.php, line: 26 in ../public_html/index.php on line 10
Fatal error: Call to a member function getAttribute() on a non-object in ../public_html/index.php on line 21
getElementsByTagName returns you a list of elements, so first you need to loop through the elements, then through their attributes.
$divs = $dom->getElementsByTagName('div');
foreach ($divs as $div) {
foreach ($div->attributes as $attr) {
$name = $attr->nodeName;
$value = $attr->nodeValue;
echo "Attribute '$name' :: '$value'<br />";
}
}
In your case, you said you needed a specific ID. Those are supposed to be unique, so to do that, you can use (note getElementById might not work unless you call $dom->validate() first):
$div = $dom->getElementById('divID');
Then to get your attribute:
$attr = $div->getAttribute('customAttr');
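Put together, the lookup can be sketched as below on a hard-coded HTML string. The markup is hypothetical; currentLocation and srckey are the names from the question. The XPath fallback is only a precaution in case getElementById() does not resolve the id, and the null check avoids the "Call to a member function getAttribute() on a non-object" error.

```php
<?php
// Hypothetical markup; "srckey" is the custom attribute named in the question.
$markup = '<html><body><div id="currentLocation" srckey="abc123">Here</div></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($markup);
libxml_clear_errors();

// Fall back to XPath in case getElementById() returns null.
$div = $dom->getElementById('currentLocation')
    ?? (new DOMXPath($dom))->query('//*[@id="currentLocation"]')->item(0);

// Guard against a missing element before reading the attribute.
$attr = $div !== null ? $div->getAttribute('srckey') : null;
echo $attr ?? 'element not found', PHP_EOL;
```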
EDIT: Neither $dom->load("index.php") nor $dom->loadHTML() executes PHP: load() reads the raw file contents as XML, and loadHTML() expects a string of HTML rather than a filename. index.php won't be run this way. You might have to do something like:
$dom->loadHTML(file_get_contents('http://localhost/index.php'));
You won't have access to the HTML if the redirect is from an external server. Let me put it this way: the DOM does not exist at the point you are trying to parse it. What you can do is pass the text to a DOM parser and then manipulate the elements that way. Or the better way would be to add it as another GET variable.
EDIT: Are you also aware that the client can change the HTML and have it pass whatever they want? (Using a tool like Firebug)
