I just try to create small simplephpdome
target is
<ul id=filter><li><a href="url1"></li><li><a href="url2"></li></ul>
<ul id=filter><li><a href="url3"></li><li><a href="url4"></li></ul>
How to get just first li result for every ul?
I have try this
$html = file_get_html($url);
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
$first_list_links = $xpath->evaluate('//ul[#id="filter"]/li/a');
foreach($first_list_links as $links) {
echo $dom->saveHTML($links);
}
but all li still included
You can achieve this using the PHP Simple HTML DOM Parser :
PHP
$html = file_get_html('<ul class="filter"><li><a href="url1"></li><li><a href="url2"></li></ul><ul class="filter"><li><a href="url3"></li><li><a href="url4"></li></ul>');
$urls = [];
foreach($html->find('.filter') as $element) {
$url = $element->firstChild()->find('a', 0)->href;
if (!in_array($url, $urls)) {
echo $url . "<br/>";
$urls[] = $url;
}
}
should output :
url1
url2
Related
I convert an atom feed into RSS using atom2rss.xsl. Works fine.
Then, using DOMDocument, I try to get the post title and URL:
$feed = new DOMDocument();
$feed->loadHTML('<?xml encoding="utf-8" ?>' . $html);
if (!empty($feed) && is_object($feed) ) {
foreach ($feed->getElementsByTagName("item") as $item){
echo 'url: '. $item->getElementsByTagName("link")->item(0)->nodeValue;
echo 'title'. $item->getElementsByTagName("title")->item(0)->nodeValue;
}
return;
}
But the post URL is empty.
See this eval which contains HTML. What am I doing wrong? I suspect I am not getting the link tag properly via $item->getElementsByTagName("link")->item(0)->nodeValue.
I think the problem is that there are several <link> elements in each item and the one (I think) your interested in is the one with rel="self" as an attribute. The quickest way (without messing around with XPath) is to loop over each <link> element checking for the right rel value and then take the href attribute from that...
if (!empty($feed) && is_object($feed) ) {
foreach ($feed->getElementsByTagName("item") as $item){
$url = "";
// Look for the 'right' link tag and extract URL from that
foreach ( $item->getElementsByTagName("link") as $link ) {
if ( $link->getAttribute("rel") == "self" ) {
$url = $link->getAttribute("href");
break;
}
}
echo 'url: '. $url;
echo 'title'. $item->getElementsByTagName("title")->item(0)->nodeValue;
}
return;
}
which gives...
url: https://www.blogger.com/feeds/2984353310628523257/posts/default/1947782625877709813titleExtraordinary Genius - Cp274
function get_links($link)
{
$ret = array();
$dom = new DOMDocument();
#$dom->loadHTML(file_get_contents($link));
$dom->preserveWhiteSpace = false;
$links = $dom->getElementsByTagName('a');
foreach ($links as $tag){
$ret[$tag->getAttribute('href')] = $tag->childNodes->item(0)->nodeValue;
}
return $ret;
}
print_r(get_links('http://www.google.com'));
OR u can use DOMXpath
$html = file_get_contents('http://www.google.com');
$dom = new DOMDocument();
#$dom->loadHTML($html);
// take all links
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");
for ($i = 0; $i < $hrefs->length; $i++) {
$href = $hrefs->item($i);
$url = $href->getAttribute('href');
echo $url.'
';
good day Sir/Maam.
I have a certain html attribute that I want to search from the external website
I want to get the a href value but the problem is the id or class or name is random.
<div class="static">
Dynamic
</div>
This code should display all the hrefs in http://example.com
In this case I use DOMDocument and XPath to select the elements you want to access because it's very flexible and easy to use.
<?php
$html = file_get_contents("http://example.com");
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DomXPath($doc);
$nodeList = $xpath->query("//a/#href");
print_r($nodeList);
// To access the values inside nodes
foreach($nodeList as $node){
echo "<p>" . $node->nodeValue . "</p>";
}
use jquery to get the value as follow:
var link = $(".static>a").attr("href");
You can use PHP DOMDocument:
<?php
$exampleurl = "http://YourDomain.com"; //set your url
$filterClass = "dynamicclass";
$dom = new DOMDocument('1.0');
#$dom->loadHTMLFile($exampleurl);
$anchors = $dom->getElementsByTagName('a');
foreach ($anchors as $element) {
$href = $element->getAttribute('href'); // all href
$class = $element->getAttribute('class');
if($class==$filterClass){
echo $href;
}
}
?>
i want to extract html inside <p class="js-swt-text" without simplehtmldom
bellow is the working example i did with simplehtmldom:
<?php
include('simple_html_dom.php');
$html = file_get_html("http://blabla.com");
foreach($html->find('.js-swt-text') as $ret) {
echo $ret;
}
?>
please help me to do with domdocument
<?php
$html = file_get_contents("http://blabla.com");
and echo html from every <p class="js-swt-text"
You can use:
$html = file_get_contents("http://blabla.com");
$dom = new DomDocument();
$dom->loadHtml($html);
$xpath = new DomXpath($dom);
$items = $xpath->query('//p[#class="js-swt-text"]');
foreach ($items as $item) {
echo $item->textContent . "\n";
}
I'm working with a DOM parser and I'm having issues. I'm basically trying to grab the href within the tag that only contain the class ID of 'thumbnail '. I've been trying to print the links on the screen and still get no results. Any help is appreciated. I also turned on error_reporting(E_ALL); and still nothing.
$html = file_get_contents('http://www.reddit.com/r/funny');
$dom = new DOMDocument();
#$dom->loadHTML($html);
$classId = "thumbnail ";
$div = $html->find('a#'.$classId);
echo $div;
I also tried this but still had the same result of NOTHING:
include('simple_html_dom.php');
$html = file_get_contents('http://www.reddit.com/r/funny');
$dom = new DOMDocument();
#$dom->loadHTML($html);
// grab all the on the page
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");
$ret = $html->find('a[class=thumbnail]');
echo $ret;
You were almost there:
<?php
$dom = new DOMDocument();
#$dom->loadHTMLFile('http://www.reddit.com/r/funny');
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a[contains(concat(' ',normalize-space(#class),' '),' thumbnail ')]");
var_dump($hrefs);
Gives:
class DOMNodeList#28 (1) {
public $length =>
int(25)
}
25 matches, I'd call it success.
This code would probably work:
$html = file_get_contents('http://www.reddit.com/r/funny');
$dom = new DOMDocument();
#$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$hyperlinks = $xpath->query('//a[#class="thumbnail"]');
foreach($hyperlinks as $hyperlink) {
echo $hyperlink->getAttribute('href'), '<br>;'
}
if you're using simple_html_dom, why are you doing all these superfluous things? It already wraps the resource in everything you need -- http://simplehtmldom.sourceforge.net/manual.htm
include('simple_html_dom.php');
// set up:
$html = new simple_html_dom();
// load from URL:
$html->load_file('http://www.reddit.com/r/funny');
// find those <a> elements:
$links = $html->find('a[class=thumbnail]');
// done.
echo $links;
Tested it and made some changes - this works perfect too.
<?php
// load the url and set up an array for the links
$dom = new DOMDocument();
#$dom->loadHTMLFile('http://www.reddit.com/r/funny');
$links = array();
// loop thru all the A elements found
foreach($dom->getElementsByTagName('a') as $link) {
$url = $link->getAttribute('href');
$class = $link->getAttribute('class');
// Check if the URL is not empty and if the class contains thumbnail
if(!empty($url) && strpos($class,'thumbnail') !== false) {
array_push($links, $url);
}
}
// Print results
print_r($links);
?>
I have to use crawlers for my project.
I've used simple dom class to get all the links from a page.
Now I want to filter only those links which are of the form "/questions/3904482/<title of the question".
Here's my attempt:
include_once('simple_html_dom.php');
$html = new simple_html_dom();
$html->load_file('http://stackoverflow.com/questions?sort=newest');
$pat='#^/question/([0-9]+)/#';
foreach($html->find('a') as $link)
{
echo preg_match($pat, $link->href);
{
echo $link->href."<br>";
}
}
All the links get filtered out.
you say the url is question*s* but your pattern shows no s
Also, it looks like you should be using if not echo
include_once('simple_html_dom.php');
$html = new simple_html_dom();
$html->load_file('http://stackoverflow.com/questions?sort=newest');
$pat='#^/questions/([0-9]+)/#';
foreach($html->find('a') as $link)
{
if ( preg_match($pat, $link->href) )
{
echo $link->href."<br>";
}
}
You can take advantage of DOM and XPath:
<?php
$dom = new DOMDocument;
#$dom->loadHTMLFile('http://stackoverflow.com/questions?sort=newest');
$xpath = new DOMXPath($dom);
$questions = $xpath->query("//a[contains(#href, '/questions/') and not(contains(#href, '/tagged/')) and not(contains(#href, '/ask'))]");
foreach ($questions as $question) {
print "{$question->getAttribute('href')} => {$question->nodeValue}";
}