I'm trying to echo the "Current Song" from a webpage in a PHP file using XPath.
When I echo the result of file_get_contents, it returns the webpage I'm trying to get the content from, but I can't seem to filter that content with XPath. It should echo only the current song.
This is what I've tried:
<?php
libxml_use_internal_errors(false);
$html = file_get_contents('http://185.40.20.83/radio/8000/');
$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXpath($doc);
$node = $xpath->query('/html/body/div/div[1]/div[2]/table/tbody/tr[10]/td[2]')->item(0);
echo $node->textContent;
?>
I've been trying this for over an hour and I'm losing hope because I can't see what the problem is...
Try changing your $node line to:
$node = $xpath->query('//table//tr[./td[text()="Current Song:"]]/td[2]')->item(0);
or
$node = $xpath->query('//table//tr[./td[text()="Current Song:"]]/td[2]');
echo $node[0]->nodeValue;
Output:
Chmst - Pump Up The Jam
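A common reason an absolute path copied from the browser returns nothing (so that item(0) gives null and reading textContent on it fails) is the tbody step: browsers insert tbody into tables when they build the DOM, so paths copied from developer tools often contain it, but libxml's HTML parser does not add it. The sketch below runs against a hypothetical inline stand-in for the status page (the markup and the "Stream Title:" row are assumptions, not the real page). It shows the browser-style path matching nothing, the same path without tbody matching, and the label-based query from the answer finding the song, with a null check before reading the text.

```php
<?php
// Hypothetical stand-in for the status page fetched in the question (markup is assumed).
$html = <<<HTML
<html><body><div><div></div><div>
<table>
<tr><td>Stream Title:</td><td>My Radio</td></tr>
<tr><td>Current Song:</td><td>Chmst - Pump Up The Jam</td></tr>
</table>
</div></div></body></html>
HTML;

libxml_use_internal_errors(true); // collect libxml warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Browser-style path with tbody: matches nothing, because libxml does not add tbody
$absolute = $xpath->query('/html/body/div/div[2]/table/tbody/tr[2]/td[2]');
// The same path without tbody does match
$withoutTbody = $xpath->query('/html/body/div/div[2]/table/tr[2]/td[2]');
// Label-based query from the answer: does not depend on the exact nesting
$node = $xpath->query('//table//tr[./td[text()="Current Song:"]]/td[2]')->item(0);

// Guard against a missing node before reading its text
$song = $node !== null ? $node->textContent : null;
echo $song ?? 'Current Song not found', "\n";
```

The label-based query is usually the sturdier choice, since it keeps working if the page gains or loses a wrapper div.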
I'm using DOMDocument to get information from a website. I'm trying to get the text inside the <p></p> tags. The code works, but the website has many <p> elements, so I get all of their text; I only want the text of the first <p>.
The <p> elements have no id or class, so I can't select by those. Please check the code and help me figure out how to get only the first <p>.
$html = file_get_contents('http://example.com');
$dom = new DOMDocument;
@$dom->loadHTML($html);
$links = $dom->getElementsByTagName('p');
foreach ($links as $link){
echo $link->nodeValue;
echo $link->getAttribute('') , '<br>';
$goal = $link->nodeValue;
}
The code works, but it shows all the text; I just need the first <p>, not all of them.
To get only the first paragraph element, you can do it like this:
$doc = new \DOMDocument();
$doc->loadHTML(file_get_contents('http://example.com'));
$paragraphs = $doc->getElementsByTagName('p');
echo "Content of first paragraph: {$paragraphs->item(0)->nodeValue}\n";
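One hedged extension of the answer: item(0) returns null when the page has no <p> at all, so it is worth checking before reading nodeValue. If you prefer XPath, note that //p[1] selects every <p> that is the first <p> child of its parent, which can be several elements, while (//p)[1] is the first <p> in document order. The sketch uses an assumed inline HTML string in place of the fetched page.

```php
<?php
// Inline HTML standing in for the fetched page (assumed markup)
$html = '<html><body><div><p>First paragraph.</p></div><p>Second paragraph.</p></body></html>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);

// DOM approach: item(0) is null if the page has no <p>, so check before use
$first = $doc->getElementsByTagName('p')->item(0);
$viaDom = $first !== null ? $first->nodeValue : '';

// XPath approach: (//p)[1] is the first <p> in document order;
// //p[1] would instead match every <p> that is the first <p> child of its parent
$xpath = new DOMXPath($doc);
$viaXpath = $xpath->evaluate('string((//p)[1])');

echo $viaDom, "\n", $viaXpath, "\n";
```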
I'm fetching some Wikipedia content in two different ways:
$html = file_get_contents('https://en.wikipedia.org/wiki/Sans-serif');
The first one gets the first paragraph:
$dom = new DomDocument();
@$dom->loadHTML($html);
$p = $dom->getElementsByTagName('p')->item(0)->nodeValue;
echo $p;
The second one gets the first paragraph inside the element with a specific $id:
$dom = new DOMDocument();
@$dom->loadHTML($html);
$p = $dom->getElementById($id)->getElementsByTagName('p')->item(0);
echo $p->nodeValue;
I'm looking for a third way: getting the whole first part of the article (the introduction).
My idea is to select all the <p> elements that come before the element with the id or class "toc", which is the table of contents.
Any idea how to do that?
If you're just looking for the intro in plain text, you can simply use Wikipedia's API:
https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=&explaintext=&titles=Sans-serif
If you want HTML formatting as well (excluding inner images and the like):
https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=&titles=Sans-serif
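A minimal sketch of the API route in PHP, assuming the JSON shape the query API returns for format=json (results under query.pages, keyed by page id). The response string below is a hypothetical placeholder so the example runs offline; in real use you would fetch $url instead. http_build_query() produces the same query string as the plain-text URL above.

```php
<?php
$params = [
    'format'      => 'json',
    'action'      => 'query',
    'prop'        => 'extracts',
    'exintro'     => '',
    'explaintext' => '',
    'titles'      => 'Sans-serif',
];
$url = 'https://en.wikipedia.org/w/api.php?' . http_build_query($params);

// In real use: $json = file_get_contents($url);
// Hypothetical placeholder response: the page id and extract text are made up.
$json = '{"batchcomplete":"","query":{"pages":{"12345":{"pageid":12345,"ns":0,"title":"Sans-serif","extract":"Placeholder intro text."}}}}';

$data = json_decode($json, true);
// "pages" is keyed by page id, which is not known in advance, so take the first entry
$pages = $data['query']['pages'] ?? [];
$page = reset($pages);
$intro = ($page !== false && isset($page['extract'])) ? $page['extract'] : '';

echo $url, "\n", $intro, "\n";
```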
You could use DOMDocument and DOMXPath with an XPath expression such as:
//div[@id="toc"]/preceding-sibling::p
$doc = new DOMDocument();
@$doc->loadHTMLFile("https://en.wikipedia.org/wiki/Sans-serif"); // load() expects XML, so use the HTML loader for a web page
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//div[@id="toc"]/preceding-sibling::p');
foreach ($nodes as $node) {
echo $node->nodeValue;
}
That gives you the content of the paragraphs that are preceding siblings of the div with id="toc".
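A self-contained sketch of the same preceding-sibling query, run against a small inline HTML string instead of the live article. The markup is an assumption, so check how the real page nests the toc element before relying on it. It uses libxml_use_internal_errors(true) instead of @ to keep parser warnings out of the output.

```php
<?php
// Assumed markup: intro paragraphs, then the toc, then the article body
$html = <<<HTML
<html><body><div id="content">
<p>Intro one.</p>
<p>Intro two.</p>
<div id="toc">Contents</div>
<p>Body paragraph.</p>
</div></body></html>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Only <p> siblings that come before the toc div are selected
$intro = [];
foreach ($xpath->query('//div[@id="toc"]/preceding-sibling::p') as $p) {
    $intro[] = $p->textContent;
}
echo implode("\n", $intro), "\n";
```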
I've exported my Firefox bookmarks as HTML so I can download my extensive music collection onto my phone. My problem is that there is no easy way to do this that I know of.
My intention is to use PHP to parse the HTML into an array of the URLs.
Here's what the HTML looks like:
<DT><A HREF="https://www.youtube.com/watch?v=Ue8PpA557Bc">Don Diablo - Knight Time (Official Music Video) - YouTube</A>
How would I do this?
If $html contains a valid HTML string, you can parse it with DOMDocument and select the href attributes with XPath.
<?php
$html = '<DT><A HREF="https://www.youtube.com/watch?v=Ue8PpA557Bc">Don Diablo - Knight Time (Official Music Video) - YouTube</A>';
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DomXPath($doc);
$nodeList = $xpath->query("//a/@href");
$links_array = [];
foreach($nodeList as $node){
$links_array[] = $node->nodeValue;
}
echo "<pre>";
print_r($links_array);
echo "</pre>";
The output here is:
Array
(
[0] => https://www.youtube.com/watch?v=Ue8PpA557Bc
)
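Alternatively, without XPath, loop over the <a> elements directly ($bookmarks holds the contents of the exported bookmarks file):
$doc = new DOMDocument();
@$doc->loadHTML($bookmarks);
$urls = [];
foreach ($doc->getElementsByTagName("a") as $node) {
$urls[] = $node->getAttribute("href");
}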
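Putting the second answer together as a runnable sketch: the excerpt below is a hypothetical, trimmed-down Firefox export (real files carry extra attributes such as ADD_DATE). libxml_use_internal_errors(true) keeps the warnings this not-quite-valid HTML produces out of the output, and the scheme check is an added assumption that you only want web links, not internal entries such as place: queries.

```php
<?php
// Hypothetical excerpt of a Firefox bookmarks export
$bookmarks = <<<HTML
<!DOCTYPE NETSCAPE-Bookmark-file-1>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
<TITLE>Bookmarks</TITLE>
<H1>Bookmarks Menu</H1>
<DL><p>
    <DT><A HREF="https://www.youtube.com/watch?v=Ue8PpA557Bc">Don Diablo - Knight Time (Official Music Video) - YouTube</A>
    <DT><A HREF="place:sort=8&maxResults=10">Most Visited</A>
</DL><p>
HTML;

libxml_use_internal_errors(true); // the export is not strictly valid HTML
$doc = new DOMDocument();
$doc->loadHTML($bookmarks);
libxml_clear_errors();

$urls = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $href = $a->getAttribute('href');
    // Keep only web links; skip internal "place:" queries and similar
    if (preg_match('#^https?://#i', $href)) {
        $urls[] = $href;
    }
}
print_r($urls);
```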
I'm building a PHP script to transfer selected contents of an XML file to an SQL database.
One of the hardcoded XML elements is formatted like this:
<visualURL>
id=18144083|img=http://upload.wikimedia.org/wikipedia/en/8/86/Holyrollernovacaine.jpg
</visualURL>
I'm looking for a way to get just the URL (all the text after img=).
$Image = $xpath->query("substring-after(/Playlist/PlaylistEntry[1]/visualURL[1]/text(), 'img=')", $element)->item(0)->nodeValue;
This gives a "property of non-object" error in my PHP output.
There must be a way to extract just the URL I want using XPath, right?
Any help would be greatly appreciated!
EDIT:
Here is a minimal example:
<?php
$xmlDoc = new DOMDocument();
$xmlDoc->loadXML('<Playlist>
<PlaylistEntry>
<visualURL>
id=12582194|img=http://upload.wikimedia.org/wikipedia/en/9/96/Sometime_around_midnight.jpg
</visualURL>
</PlaylistEntry>
</Playlist>');
$xpath = new DOMXpath($xmlDoc);
$elements = $xpath->query("/Playlist/PlaylistEntry[1]");
if (!is_null($elements))
foreach ($elements as $element)
$Image = $xpath->query("substring-after(/Playlist/PlaylistEntry[1]/visualURL[1]/text(), 'img=')", $element)->item(0)->nodeValue;
print "Finished Item: $Image";
?>
EDIT 2:
After some research, I believe I must use $xpath->evaluate instead of my current $xpath->query; see this question: "Same XPath query is working with Google docs but not PHP".
I'm not exactly sure how to do this yet, but I will investigate more in the morning. Again, any help would be appreciated.
You're on the right track. Use DOMXPath::evaluate() for XPath expressions that don't return nodes, such as substring-after(), which returns a string (as documented on the linked page). The following code prints the expected output:
$xmlDoc = new DOMDocument();
$xml = <<<XML
<Playlist>
<PlaylistEntry>
<visualURL>
id=12582194|img=http://upload.wikimedia.org/wikipedia/en/9/96/Sometime_around_midnight.jpg
</visualURL>
</PlaylistEntry>
</Playlist>
XML;
$xmlDoc->loadXML($xml);
$xpath = new DOMXpath($xmlDoc);
$elements = $xpath->query("/Playlist/PlaylistEntry");
foreach ($elements as $element) {
$Image = $xpath->evaluate("substring-after(visualURL, 'img=')", $element);
print "Finished Item: $Image <br>";
}
Output:
Finished Item: http://upload.wikimedia.org/wikipedia/en/9/96/Sometime_around_midnight.jpg
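The visualURL text in the sample starts and ends with a newline, so the evaluate() result above carries a trailing newline at the end of the URL. If you're storing it in a database, one option (a sketch, not the only way) is to wrap the node in normalize-space() inside the XPath expression; calling trim() on the PHP side would work too.

```php
<?php
$xml = <<<XML
<Playlist>
<PlaylistEntry>
<visualURL>
id=12582194|img=http://upload.wikimedia.org/wikipedia/en/9/96/Sometime_around_midnight.jpg
</visualURL>
</PlaylistEntry>
</Playlist>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$raw = [];
$clean = [];
foreach ($xpath->query('/Playlist/PlaylistEntry') as $entry) {
    // Plain substring-after(): keeps the newline that precedes </visualURL>
    $raw[] = $xpath->evaluate("substring-after(visualURL, 'img=')", $entry);
    // normalize-space() strips leading/trailing whitespace before substring-after() runs
    $clean[] = $xpath->evaluate("substring-after(normalize-space(visualURL), 'img=')", $entry);
}
print_r($clean);
```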
I'm trying to write a script that goes through a poorly coded webpage and returns the title element. However, the person who made the website I plan on scraping didn't use ANY classes, just plain div elements. Here's the relevant part of the source of the webpage I'm trying to scrape:
<tbody>
<tr>
<td style = "...">
<div style = "...">
<div style = "...">TEXT I WANT</div>
</div>
</td>
</tr>
</tbody>
When I copy the XPath in Chrome, I get this string:
/html/body/table/tbody/tr[2]/td[3]/table/tbody/tr[1]/td/div/div[3]
I'm having trouble figuring out how to use that string in an XPath query.
If XPath isn't the way to go, maybe I should use preg_match?
I tried this:
$location = '/html/body/table/tbody/tr[2]/td[3]/table/tbody/tr[1]/td/div/div[3]';
$html = file_get_contents($URL);
$doc = new DomDocument();
$doc->loadHtml($html);
$xpath = new DomXPath($doc);
// Now query the document:
foreach ($xpath->query($location) as $node) {
echo $node, "\n";
}
but nothing is printed to the page.
Thanks.
EDIT: Full source code here:
http://pastebin.com/K5tZ4dFH
EDIT2: Cleaner code screenshot: http://i.imgur.com/lWKheBy.png
From looking at your source, try the following:
$html = file_get_contents($URL);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$nodes = $xpath->query("//div[contains(@style, 'left:20px')]");
foreach ($nodes as $node) {
echo $node->textContent;
}
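A runnable version of this answer against a hypothetical inline copy of the markup (the style values, including left:20px, are assumptions standing in for the asker's pastebin source). It reads textContent rather than echoing the node: a DOMElement has no string conversion, so echo $node from the question would raise an error instead of printing the text.

```php
<?php
// Hypothetical markup modeled on the question; real style values are only in the asker's source
$html = <<<HTML
<html><body><table><tbody><tr>
<td style="width:300px">
<div style="position:relative">
<div style="position:absolute; left:20px; top:5px">TEXT I WANT</div>
</div>
</td>
</tr></tbody></table></body></html>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$texts = [];
foreach ($xpath->query("//div[contains(@style, 'left:20px')]") as $node) {
    // Read the element's text; echoing the DOMElement itself would fail
    $texts[] = trim($node->textContent);
}
echo implode("\n", $texts), "\n";
```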
It looks like you want the text just before the first </div>, so this regex will find that:
[^<>]+(?=<\/div>)
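The same lookahead regex applied from PHP with preg_match(), which returns only the first match; this sketch runs it on the snippet from the question. Regexes on HTML are brittle (any change in tags or attributes can break the match), so treat this as a quick fallback rather than a replacement for a parser.

```php
<?php
// Snippet from the question
$html = <<<HTML
<tbody>
<tr>
<td style = "...">
<div style = "...">
<div style = "...">TEXT I WANT</div>
</div>
</td>
</tr>
</tbody>
HTML;

// A run of characters with no angle brackets that is immediately followed by </div>
if (preg_match('/[^<>]+(?=<\/div>)/', $html, $m)) {
    $text = $m[0];
} else {
    $text = null;
}
echo $text, "\n";
```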