This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Using DOMDocument to extract from HTML document by class
(3 answers)
Closed 9 years ago.
I'm scraping an HTML page that contains an unknown number of elements with class="page-title" inside a div element with id="row-1".
So we have something like:
<div id="row-1">
<div class="page-title">
<span><h4><a>text I want to grab</a></h4></span>
</div>
</div>
There could be 1, 2, 3, or 10 of these rows. Could anyone explain how I can grab every instance of the page title when there are multiple rows?
Whatever you do, don't use a regex! HE COMES
Instead, use a parser:
$dom = new DOMDocument();
$dom->loadHTML($your_html_source_here);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//*[@id='row-1']/div[@class='page-title']");
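To actually collect the text you still have to loop over the returned node list. A minimal sketch, assuming markup shaped like the snippet in the question (the two sample titles are made up for illustration):

```php
<?php
// Hypothetical sample markup, modeled on the snippet in the question.
$html = '<div id="row-1">'
      . '<div class="page-title"><span><h4><a>First title</a></h4></span></div>'
      . '<div class="page-title"><span><h4><a>Second title</a></h4></span></div>'
      . '</div>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//*[@id='row-1']/div[@class='page-title']");

// textContent returns the concatenated text of all descendants of each match.
$titles = [];
foreach ($nodes as $node) {
    $titles[] = trim($node->textContent);
}
print_r($titles);
```

Note that @class='page-title' only matches when the attribute is exactly that string; if the elements can carry additional classes, a contains() test would be needed instead.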
I'm trying to keep certain HTML elements inside a DOM that was loaded with DOMDocument and cURL.
The problem is that when I run an XPath query and retrieve nodeValue, it omits the HTML elements.
Below is the code. Is there a way to retrieve HTML for that particular node?
$location = $xpath->query("//div[@id='location']/label");
echo $location->item(0)->nodeValue."<br>";
$dom = new DOMDocument();
$dom->loadHTML('<html><div id="location"><label><h1>Hello <b>world</b></h1></label></div></html>');
$xpath = new DOMXPath($dom);
$location = $xpath->query("//div[@id='location']/label/*");
var_dump($dom->saveXML($location->item(0)));
Output:
string(27) "<h1>Hello <b>world</b></h1>"
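If the node can have several children, the same idea can be wrapped in a small helper that serializes each child in turn. This is only a sketch; innerHtml() is a hypothetical helper name, not a built-in DOM method:

```php
<?php
// Hypothetical helper: concatenates the serialized markup of every child node.
function innerHtml(DOMNode $node): string
{
    $markup = '';
    foreach ($node->childNodes as $child) {
        // saveXML() with a node argument serializes just that subtree.
        $markup .= $node->ownerDocument->saveXML($child);
    }
    return $markup;
}

$dom = new DOMDocument();
$dom->loadHTML('<html><div id="location"><label><h1>Hello <b>world</b></h1></label></div></html>');
$xpath = new DOMXPath($dom);

$label = $xpath->query("//div[@id='location']/label")->item(0);
$inner = innerHtml($label);
echo $inner;
```

Because saveXML() produces XML serialization, void elements such as <br> come out as <br/>.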
I want to get the text from a div in this page:
<html>
<head>Hello world!</head>
<body>
This is a test!<br>Hello man!<br>
<div class="special">
I want this text
</div>
</body>
</html>
I am using this code to get the content without any tags:
echo strip_tags(file_get_contents('http://website.com'));
However, I would like only to get the content from the
<div class="special">
from that page. Is that possible in PHP?
Use an HTML parser to parse the string you read with file_get_contents(). This question lists a number of parsers you could use.
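For instance, PHP's bundled DOM extension is one such parser. A rough sketch, assuming the page has already been fetched into a string (a hard-coded stand-in for the file_get_contents() result is used here, simplified from the page in the question):

```php
<?php
// Stand-in for file_get_contents('http://website.com').
$html = '<html><body>This is a test!<br>Hello man!<br>'
      . '<div class="special">I want this text</div></body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Match "special" as one class among possibly several on the div.
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' special ')]";
$nodes = $xpath->query($query);

// Take the first match, if any, and read its text without tags.
$text = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
echo $text;
```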
I wrote the following code to get the title of a webpage, but it doesn't work and outputs this error message: Object of class DOMElement could not be converted to string
$html = file_get_contents("http://mysmallwebpage.com/");
$dom = new DOMDocument;
@$dom->loadHTML($html);
$links = $dom->getElementsByTagName('title');
foreach ($links as $title)
{
echo (string)$title."<br>";
}
Could you please show me with an example?
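The error comes from casting the DOMElement itself to a string; reading its nodeValue (or textContent) property instead should give the title text. A minimal sketch, using an inline HTML string as a stand-in for the fetched page:

```php
<?php
// Stand-in for file_get_contents("http://mysmallwebpage.com/").
$html = '<html><head><title>My small web page</title></head><body><p>Hi</p></body></html>';

// Collect parser warnings internally rather than silencing them with @.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$titles = [];
foreach ($dom->getElementsByTagName('title') as $title) {
    // nodeValue is a string, so it can be echoed directly.
    $titles[] = $title->nodeValue;
    echo $title->nodeValue . "<br>";
}
```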
I have a script that finds all the anchor tags of a certain class in a DOMDocument. I am looking to echo the text that is contained within the <a>"....."</a> tags.
You can access the DOMText node directly using XPath:
$xpath = new DOMXPath($dom_document);
$node = $xpath->query('//a/text()')->item(0);
echo $node->textContent; // text
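Since the question is about all anchors of a certain class, the query can be narrowed and looped over. A small sketch, where the class name "external" and the sample links are assumptions made up for illustration:

```php
<?php
// Hypothetical sample markup; only two of the three anchors carry the class.
$html = '<p><a class="external" href="#one">First link</a> '
      . '<a href="#two">Skipped</a> '
      . '<a class="external" href="#three">Third link</a></p>';

$dom_document = new DOMDocument();
$dom_document->loadHTML($html);

$xpath = new DOMXPath($dom_document);

// Collect the text of every anchor whose class attribute is exactly "external".
$texts = [];
foreach ($xpath->query("//a[@class='external']") as $anchor) {
    $texts[] = $anchor->textContent;
}
print_r($texts);
```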
You can use preg_match(). Here is an example:
$link = '<a>www.CoursesWeb.net</a>';
if(preg_match('/\<a([^\>]*)\>(.*?)\<\/a\>/i', $link, $mc)) {
echo $mc[2]; // www.CoursesWeb.net
}
Is there any function like getElementByAttribute in PHP? If not, how do I create a workaround?
E.g.
<div class="foo">FOO!</div>
How do I match that element?
You can use XPath:
$xpath = new DOMXPath($document);
$results = $xpath->query("//*[@class='foo']");
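If you want something closer to the function asked about, the query can be wrapped in a helper. This is only a sketch: getElementsByAttribute() is a made-up name, and it assumes the attribute name is valid and the value contains no single quote:

```php
<?php
// Hypothetical helper; PHP's DOM extension has no built-in getElementByAttribute().
// Assumes $name is a valid attribute name and $value contains no single quote.
function getElementsByAttribute(DOMDocument $document, string $name, string $value)
{
    $xpath = new DOMXPath($document);
    // Matches any element whose attribute equals the value exactly.
    return $xpath->query("//*[@{$name}='{$value}']");
}

$document = new DOMDocument();
$document->loadHTML('<div class="foo">FOO!</div><div class="bar">BAR!</div>');

$matches = getElementsByAttribute($document, 'class', 'foo');
foreach ($matches as $element) {
    echo $element->textContent, "\n";
}
```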