converting filterXPATH result to text [duplicate] - php

I am trying to extract the title text from an HTML page and insert it into an object. I am using Symfony and PHP. The result from filterXPath does not seem to be plain text; instead it appears to be the entire HTML page, and an error is thrown. I don't know why.
My code is:
$html = $this->file_get_contents_curl("http://www.google.com/");
$urlData = [];
$crawler = new Crawler($html);
$urlData->title = $crawler->filterXPath('//title')->extract('_text');
I see the title text if I do:
return $crawler->filterXPath('//title')->extract('_text');

Try this, which uses PHP's built-in DOM classes instead of the Crawler:
libxml_use_internal_errors(true);
$html = file_get_contents("http://www.google.com/");
$dom1 = new DOMDocument;
$dom1->preserveWhiteSpace = false;
$dom1->loadHTML($html);
$xp = new DOMXPath($dom1);
$xp->registerNamespace("php", "http://php.net/xpath");
$urlData = $xp->query('//title');
foreach ($urlData as $title) {
    echo $title->textContent;
}
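The question does not quote the error, so the following is only a guess at the cause: `$urlData` is created as an array (`[]`), yet a property is assigned with `->title`, and `extract('_text')` returns an array of strings rather than a single string. A minimal sketch of one way around both points, using the built-in DOM classes and an inline HTML string as a hypothetical stand-in for the downloaded page:

```php
<?php
// Hypothetical stand-in for the HTML returned by the download step.
$html = '<html><head><title>Google</title></head><body></body></html>';

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xp = new DOMXPath($dom);

// "->title" needs an object; with an array it would be $urlData['title'].
$urlData = new stdClass();

// string() yields the text of the first matching node, or '' when nothing matches.
$urlData->title = $xp->evaluate('string(//title)');

echo $urlData->title, PHP_EOL;
```

With the Symfony Crawler itself, `$crawler->filterXPath('//title')->text()` returns the text of the first matched node as a string, which is probably closer to what the question intended than `extract()`.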

Related

PHP: get parent node where child = 'value' [duplicate]

I have this xml file:
<friends>
<friend>
<name>xxx</name>
<pays>France</pays>
</friend>
<friend>
<name>yyy</name>
<country>France</country>
</friend>
<friend>
<name>zzz</name>
<country>USA</country>
</friend>
</friends>
To get my data, I am using this PHP code:
$xml = simplexml_load_file('friends.xml');
$friendsXML = $xml->friend;
Which works fine, but returns all of the friends.
Now I want to retrieve only the friends who are from France, i.e. those with country = 'France'.
Can anyone help me do that?
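I'd use XPath for things like this. Note that XPath string comparison is case-sensitive, so the value has to match the XML exactly ("France", not "france"). Try:
$res = $xml->xpath('friend[country = "France"]');
// xpath() returns an array of SimpleXMLElement objects; print the first match's name
echo $res[0]->name;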
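If the capitalisation of the country value varies, a case-insensitive comparison can be sketched with XPath 1.0's translate() function, since XPath 1.0 has no lower-case(). The XML below is inlined sample data modeled on the question (an assumption, so the sketch runs without friends.xml):

```php
<?php
// Sample data modeled on the question; values are illustrative only.
$xml = simplexml_load_string(
    '<friends>'
    . '<friend><name>xxx</name><country>France</country></friend>'
    . '<friend><name>yyy</name><country>france</country></friend>'
    . '<friend><name>zzz</name><country>USA</country></friend>'
    . '</friends>'
);

// translate() maps A-Z to a-z so the comparison ignores case.
$query = 'friend[translate(country, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", '
       . '"abcdefghijklmnopqrstuvwxyz") = "france"]';

$names = [];
foreach ($xml->xpath($query) as $friend) {
    // Cast to string: child access returns a SimpleXMLElement, not plain text.
    $names[] = (string) $friend->name;
}

echo implode(', ', $names), PHP_EOL;
```

Looping over the result, rather than reading only `$res[0]`, returns every matching friend.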

Get the table details from a URL [duplicate]

I want to extract the table details from an HTML page. For example, the URL is:
$url="https://www.centralbank.org.bz/rates-statistics/exchange-rates";
From this page, I need to get the currency rate table and remove all the unwanted data.
Please help. Many thanks.
Try this code:
$url = 'https://www.centralbank.org.bz/rates-statistics/exchange-rates';
$content = file_get_contents($url);
$first_step = explode( '<table id="currencyTable">' , $content );
$second_step = explode("</table>" , $first_step[1] );
echo $second_step[0];
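Splitting on literal strings breaks as soon as the table tag gains another attribute or different spacing. A more tolerant sketch, assuming the table really carries id="currencyTable" as in the snippet above, parses the markup with DOMDocument and collects the cell text row by row. The inline HTML and its values are hypothetical and stand in for the downloaded page:

```php
<?php
// Hypothetical sample standing in for the downloaded page.
$content = '<html><body><table id="currencyTable">'
    . '<tr><th>Currency</th><th>Rate</th></tr>'
    . '<tr><td>USD</td><td>2.00</td></tr>'
    . '<tr><td>EUR</td><td>2.15</td></tr>'
    . '</table></body></html>';

libxml_use_internal_errors(true);   // tolerate imperfect real-world HTML
$dom = new DOMDocument();
$dom->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$rows = [];
foreach ($xpath->query('//table[@id="currencyTable"]//tr') as $tr) {
    $cells = [];
    // Header and data cells, relative to the current row.
    foreach ($xpath->query('th|td', $tr) as $cell) {
        $cells[] = trim($cell->textContent);
    }
    $rows[] = $cells;
}

print_r($rows);
```

The result is a plain array of rows, which is easier to clean up further than a raw HTML fragment.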
You could use Simple HTML DOM instead.
An example that may be helpful:
<?php
include('simple_html_dom.php');
$url = 'https://www.phpbb.com/community/viewtopic.php?f=46&t=543171';
$html = file_get_html($url);
$links = array();
foreach ($html->find('a[class="postlink"]') as $a) {
    $links[] = $a->href;
}
print_r($links);
?>

how do you get the id and class of the elements parsed with php [duplicate]

I have many div tags pulled from a string through PHP, each with a unique id and a class. I am trying to get the id and class of each div, but I am not sure how to do this.
HTML:
<div id='x1y1' class = 'classname'></div><div id = 'x2y1' class = 'classname1'></div>
So far I have tried:
$html = new DOMDocument();
$html->loadHTML($boardDataStripSlashes);
$elements = $html->getElementsByTagName('div');
but I have not been able to find anything on how to get the actual ids and classes of the selected elements.
You need to use DOMElement::getAttribute to retrieve attributes of elements.
foreach ($elements as $element) {
    $id = $element->getAttribute('id');
    $className = $element->getAttribute('class');
    // ...
}
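Put together as a runnable sketch, with sample markup modeled on the question's snippet (the `$markup` and `$found` names are illustrative):

```php
<?php
// Sample markup modeled on the question's snippet.
$markup = "<div id='x1y1' class='classname'></div>"
        . "<div id='x2y1' class='classname1'></div>";

$doc = new DOMDocument();
$doc->loadHTML($markup);

$found = [];
foreach ($doc->getElementsByTagName('div') as $div) {
    // getAttribute() returns an empty string when the attribute is absent;
    // hasAttribute() can be checked first if that distinction matters.
    $found[$div->getAttribute('id')] = $div->getAttribute('class');
}

print_r($found);
```

This maps each div's id to its class, so the values can be looked up by id afterwards.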

How to get wrapping element of a string using regex [duplicate]

I want to get the element that wraps a specified string, for example:
$string = "My String";
$code = '<div class="string"><p class=\'text\'>My String</p></div>';
How can I get <p class='text'></p>, the element that wraps the string, by matching it with a regex pattern?
You can do this with PHP's DOM classes.
$html = new DOMDocument();
// load the HTML
$html->loadHTML('<div class="string"><p class=\'text\'>My String</p></div>');
// create the XPath object
$xpath = new DOMXPath($html);
// get a DOMNodeList of every element that has a text node equal to 'My String'
$list = $xpath->evaluate("//*[text() = 'My String']");
// let's grab the first item from the list
$element = $list->item(0);
Now we have the whole <p> element, but we still need to remove all of its child nodes. Here is a small function for that:
function remove_children($node) {
    while (($childnode = $node->firstChild) !== null) {
        remove_children($childnode);
        $node->removeChild($childnode);
    }
}
Let's use this function:
// remove all the child nodes (including the text 'My String')
remove_children($element);
// this will output '<p class="text"></p>'
echo $html->saveHTML($element);
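If the original document should stay untouched, a shallow clone is an alternative worth considering: cloneNode(false) copies the element and its attributes but none of its children. A minimal sketch under the same assumptions as the code above (the `$wrapper` and `$markup` names are illustrative):

```php
<?php
$html = new DOMDocument();
$html->loadHTML('<div class="string"><p class=\'text\'>My String</p></div>');

$xpath = new DOMXPath($html);
$element = $xpath->query("//*[text() = 'My String']")->item(0);

// Shallow clone: the tag and its attributes only, no child nodes.
$wrapper = $element->cloneNode(false);

// Serialize just the cloned element; $element itself still holds its text.
$markup = $html->saveHTML($wrapper);
echo $markup, PHP_EOL;
```

This avoids the recursive removal entirely and leaves `$element` usable afterwards.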

Where can I find a DOM parser? [duplicate]

I need to read some content from an HTML page.
I've tested simple_html_dom, but it simply isn't usable for what I need.
I need something like this (pseudo syntax based on simple_html_dom):
$html = file_get_contents($url);
$html_obj = parse_html($html);
$title = $html_obj->get('title');
$meta1 = $html_obj->get('meta[name=description]', 'innertext'); // text only
$meta2 = $html_obj->get('meta[name=keywords]', 'innertext'); // text only
$content = $html_obj->get('div[id=section_a]', 'outertext'); // html code
I've tested simple_html_dom in many ways, and only managed to get parts of what I need.
It simply isn't "simple".
I've also tested PHP's DOMDocument::loadHTML, but I run into problems dealing with inline <script>.
Are there any PHP libraries that make it as easy to get content as in jQuery?
Update
One of my problems is a piece of third-party JavaScript from an ad agency:
<script language="javascript" type="text/javascript">
<!--
if (window.adgroupid == undefined) {
window.adgroupid = Math.round(Math.random()*100000);
}
document.write('<scr'+'ipt language="javascript1.1" type="text/javascript" src="http://adserver.adtech.de/addyn|3.0|994|3159100|0|-1|size=980x150|ADTECH;loc=100;target=_blank;key=startside,kvinner, kvinnesak, bryllup, graviditet, mamma, kosmetikk, markedsplass, dagbok, feminisme;grp='+window.adgroupid+';misc='+new Date().getTime()+'"></scri'+'pt>');
//-->
</script>
Even if I change <scr'+'ipt to <script, it gives me invalid JavaScript code.
You can use DOMDocument with DOMXPath:
<?php
$DOMDocument = new DOMDocument();
// Collect parser warnings internally instead of printing them for imperfect HTML
libxml_use_internal_errors(true);
$DOMDocument->loadHTMLFile('http://www.iconfinder.com');
$XPath = new DOMXPath($DOMDocument);
$title = $DOMDocument->getElementsByTagName('title')->item(0)->nodeValue;
echo $title;
// In XPath, attributes are selected with @ and string values must be quoted
#$desc = $XPath->query('//meta[@name="description"]')->item(0)->getAttribute('content');
#$keywords = $XPath->query('//meta[@name="keywords"]')->item(0)->getAttribute('content');
#$content = $XPath->query('//div[@id="section_a"]')->item(0)->nodeValue;
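To make the mapping to the question's wish list concrete, here is a self-contained sketch that covers the title, the two meta tags and the outer HTML of a div. The page source is a hypothetical inline string and `first_attr()` is an illustrative helper, not part of any library:

```php
<?php
// Hypothetical page standing in for the fetched HTML, including an inline script.
$source = '<html><head><title>Example page</title>'
    . '<meta name="description" content="A short description">'
    . '<meta name="keywords" content="one, two">'
    . '<script type="text/javascript">var ok = 1 < 2;</script>'
    . '</head><body><div id="section_a"><p>Hello</p></div></body></html>';

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($source);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Illustrative helper: attribute of the first match, or null when nothing matches.
function first_attr(DOMXPath $xpath, string $query, string $attr): ?string
{
    $node = $xpath->query($query)->item(0);
    return $node instanceof DOMElement ? $node->getAttribute($attr) : null;
}

$title    = $xpath->evaluate('string(//title)');
$desc     = first_attr($xpath, '//meta[@name="description"]', 'content');
$keywords = first_attr($xpath, '//meta[@name="keywords"]', 'content');

// saveHTML($node) serializes the element with its markup (the "outer HTML").
$section = $xpath->query('//div[@id="section_a"]')->item(0);
$content = $section !== null ? $doc->saveHTML($section) : null;

echo $title, PHP_EOL, $desc, PHP_EOL, $keywords, PHP_EOL, $content, PHP_EOL;
```

Checking for a missing node before calling getAttribute() avoids a fatal error on pages that lack one of the tags.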
phpQuery (http://code.google.com/p/phpquery/) allows you to manipulate HTML through a jQuery-like syntax.
