Get part of the HTML code from file_get_contents [duplicate] - php

This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 7 years ago.
I need part of the HTML code of a page that I fetch with file_get_contents(url).
This is what I do:
$variableee = file_get_contents("http://url.com/path/to/file");
echo $variableee;
Now $variableee holds all of the page's HTML.
There is one part of this code that I need: a table with the class name "table".
For example:
<div>text</div>
<span> text </span>
<table class="table">
<tr><td>Text that I need</td></tr>
</table>
How can I get it?
Sorry for my bad English.

If you want the data within PHP itself, use the built-in DOM parser:
<?php
$doc = new DOMDocument();
$doc->loadHTML($variableee);
$arr = $doc->getElementsByTagName("table"); // DOMNodeList Object
foreach($arr as $item) { // DOMElement Object
echo $item->nodeValue;
}
?>
EDIT: Parse using the class name with DOMXPath
$doc = new DOMDocument();
$doc->loadHTML($variableee);
$classname = 'table';
$a = new DOMXPath($doc);
$spans = $a->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");
foreach($spans as $item) { // DOMElement Object
echo $item->nodeValue;
}
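Since the question asks for the table itself rather than just its text, here is a minimal sketch that returns the matching markup. It assumes the sample HTML from the question stands in for the downloaded page; the helper name tablesByClass() is hypothetical, and the class name is assumed to contain no quote characters because it is interpolated into the XPath string.

```php
<?php
// Sample markup copied from the question; in practice this would be
// the string returned by file_get_contents().
$html = <<<HTML
<div>text</div>
<span> text </span>
<table class="table">
<tr><td>Text that I need</td></tr>
</table>
HTML;

/**
 * Hypothetical helper: returns the outer HTML of every <table> whose
 * class attribute contains the given class token.
 */
function tablesByClass(string $html, string $class): array
{
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $query = "//table[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";

    $found = [];
    foreach ($xpath->query($query) as $table) {
        // Passing a node to saveHTML() serializes only that node.
        $found[] = $doc->saveHTML($table);
    }
    return $found;
}

$tables = tablesByClass($html, 'table');
echo $tables[0], PHP_EOL;
```

Use nodeValue instead of saveHTML() if only the text inside the table is needed.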

Related

Scraping tags that have spaces in class names php [duplicate]

This question already has an answer here:
Simple HTML DOM spaces into class
(1 answer)
Closed 4 years ago.
Here I have an HTML div tag whose class attribute contains spaces.
<div data-marker="music-track-title" class="music-track__title text--single-line size-e"> No Vanguard Revival (Radio 1 Session, 16 May 2018) </div>
I tried getting information out of this tag with PHP cURL and DOM, but it just returns nothing.
Here is the code that I have written so far (not working):
<?php
include_once 'includes/db.inc.php';
include_once 'includes/simple_html_dom.php';
include_once 'includes/curl_init.php';
$yesterday = date("Y/m/d", strtotime( '-1 days' ) );
$a=NULL;
$html=curl_get('https://www.bbc.co.uk/music/tracks/find/radio1/'.$yesterday.'/12AM');
$dom = new DOMDocument();
$dom = str_get_html($html);
$myList=NULL;
$songs=$dom->find('.music-track__top music-track__top--list');
?>
How can I get information from a div tag that has spaces in its class attribute, using PHP cURL and DOM?
You should try this:
$finder = new DomXPath($dom);
$classname="music-track__title";
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");
$node = null;
foreach ($nodes as $element) {
if ($element->getAttribute('class') == "music-track__title text--single-line size-e") {
$node = $element;
break;
}
}
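The variable $nodes will contain every element whose class attribute includes music-track__title, and $node will hold the first one whose class attribute exactly matches the full string.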
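A space in a class attribute separates several class names, so each name can be tested on its own. The sketch below assumes the div from the question as input and requires two of its three class names without comparing the whole attribute string; the helper hasClass() is a hypothetical name, not part of the DOM extension.

```php
<?php
// Markup from the question, trimmed to the one element of interest.
$html = '<div data-marker="music-track-title" '
      . 'class="music-track__title text--single-line size-e"> '
      . 'No Vanguard Revival (Radio 1 Session, 16 May 2018) </div>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$finder = new DOMXPath($dom);

// Hypothetical helper: builds an XPath predicate that matches one class token.
function hasClass(string $class): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
}

// Require two of the three tokens; their order in the attribute does not matter.
$query = '//div[' . hasClass('music-track__title') . ' and ' . hasClass('size-e') . ']';

$titles = [];
foreach ($finder->query($query) as $div) {
    $titles[] = trim($div->textContent);
}
print_r($titles);
```

Matching per token keeps working if the site reorders the class names or adds another one, which an exact string comparison would not.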

How can I get a specific div from website? [duplicate]

This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 5 years ago.
I am trying to get a specific div element (i.e. the one with attribute id="vung_doc") from a website, but I get almost every element. Do you have any idea what's wrong?
$doc = new DOMDocument;
// We don't want to bother with white spaces
$doc->preserveWhiteSpace = false;
// Most HTML Developers are chimps and produce invalid markup...
$doc->strictErrorChecking = false;
$doc->recover = true;
$doc->loadHTMLFile('http://lightnovelgate.com/chapter/epoch_of_twilight/chapter_300');
$xpath = new DOMXPath($doc);
$query = "//*[@class='vung_doc']";
$entries = $xpath->query($query);
var_dump($entries->item(0)->textContent);
Actually, it appears that that one element, which has both id and class attributes with value vung_doc, has many paragraphs inside its text content. Perhaps you are thinking each paragraph should be in its own div element.
<div id="vung_doc" class="vung_doc" style="font-size: 18px;">
<p></p>
"Mayor song..."
In the screenshot at the bottom of this post, I added an outline style to that element, to show just how many paragraphs are within that element.
If you wanted to separate the paragraphs, you could use preg_split() to split on any new line characters:
$entries = $xpath->query($query);
foreach($entries as $entry) {
$paragraphs = preg_split("/[\r\n]+/s",$entry->textContent);
foreach($paragraphs as $paragraph) {
if (trim($paragraph)) {
echo '<b>paragraph:</b> '.$paragraph;
break;
}
}
}
See a demonstration of this in this playground example. Note that before loading the HTML file, libxml_use_internal_errors() is called, to suppress the XML errors:
libxml_use_internal_errors(true);
Screenshot of the target div element with outline added:
Change
$query = "//*[@class='vung_doc']";
to
$query = "//*[@id='vung_doc']";
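To see how the two queries differ, here is a small sketch on made-up markup (the second div is an assumption added for illustration, it is not from the site in the question). A class may be shared by several elements, while an id is expected to identify a single one.

```php
<?php
// Hypothetical markup: the id sits on one element, the class on two.
$html = '<div id="vung_doc" class="vung_doc">chapter text</div>'
      . '<div class="vung_doc">sidebar</div>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$byClass = $xpath->query("//*[@class='vung_doc']");
$byId = $xpath->query("//*[@id='vung_doc']");

echo $byClass->length, PHP_EOL;             // how many elements carry the class
echo $byId->length, PHP_EOL;                // how many elements carry the id
echo $byId->item(0)->textContent, PHP_EOL;  // text of the element selected by id
```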

String between php [duplicate]

This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 6 years ago.
I've got something like this:
$string = '<some code before><div class="abc">Something written here</div><some other code after>';
What I want is to get what is within the div and output it:
Something written here
How can I do that in php? Thanks in advance!
You would use the DOMDocument class.
// HTML document stored in a string
$html = '<strong><div class="abc">Something written here</div></strong>';
// Load the HTML document
$dom = new DOMDocument();
$dom->loadHTML($html);
// Find div with class 'abc'
$xpath = new DOMXPath($dom);
$result = $xpath->query('//div[@class="abc"]');
// Echo the results...
if($result->length > 0) {
foreach($result as $node) {
echo $node->nodeValue,"\n";
}
} else {
echo "Empty result set\n";
}
Read up on the expression syntax for XPath to customize your DOM searches.
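As a starting point for that reading, the sketch below runs a few common XPath variations against the same document and prints how many nodes each one selects. The extra div and p elements are assumptions added so that the expressions give different results.

```php
<?php
// The first div is from the question; the other two elements are
// hypothetical additions for comparison.
$html = '<div class="abc">Something written here</div>'
      . '<div class="abc extra">Second</div><p class="abc">Third</p>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$queries = [
    'exact attribute value'    => '//div[@class="abc"]',
    'substring of attribute'   => '//div[contains(@class, "abc")]',
    'any element, exact value' => '//*[@class="abc"]',
    'first matching div only'  => '(//div[contains(@class, "abc")])[1]',
];

$counts = [];
foreach ($queries as $label => $query) {
    $counts[$label] = $xpath->query($query)->length;
    echo $label, ': ', $counts[$label], PHP_EOL;
}
```

Note that the exact comparison skips the div with class="abc extra", because it compares the whole attribute string.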

Using cURL and dom to scrape data with php [duplicate]

This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 9 years ago.
Hi, I am using cURL to get data from a website. I need to get multiple items but cannot get them by tag name or id. I have managed to put together some code that gets one item by class name by passing it through a loop; I then pass the result through another loop to get the text from the element.
I have a few problems here. The first is that there must be a more convenient way of doing this. The second is that I will need to get multiple elements and stack them together, i.e. title, description, tags and a URL link.
# Create a DOM parser object and load HTML
$dom = new DOMDocument();
$result = $dom->loadHTML($html);
$finder = new DomXPath($dom);
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), 'classname')]");
$tmp_dom = new DOMDocument();
foreach ($nodes as $node)
{
$tmp_dom->appendChild($tmp_dom->importNode($node,true));
}
$innerHTML = trim($tmp_dom->saveHTML());
$buffdom = new DOMDocument();
$buffdom->loadHTML($innerHTML);
# Iterate over all the <a> tags
foreach ($buffdom->getElementsByTagName('a') as $link)
{
# Show the <a href>
echo $link->nodeValue, "<br />", PHP_EOL;
}
I want to stick with PHP only.
I wonder if your problem is in the line:
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), 'classname')]");
As it stands, this literally looks for nodes that belong to the class with the name 'classname' - where 'classname' is not a variable, it's the actual name. This looks like you might have copied an example from somewhere - or did you literally name your class that?
I imagine that the data you are looking for may not be in such nodes. If you could post a short piece of the actual HTML you are trying to parse, it should be possible to do a better job guiding you to a solution.
As an example, I put together the following complete script. It is based on yours, but adds code to fetch the stackoverflow.com home page and changes 'classname' to 'question', since there seemed to be a lot of classes with "question" in the name, so I figured I would get a good harvest. I was not disappointed.
<?php
// create curl resource
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, "http://stackoverflow.com");
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// $output contains the output string
$output = curl_exec($ch);
// close curl resource to free up system resources
curl_close($ch);
//print_r($output);
$dom = new DOMDocument();
@$dom->loadHTML($output);
$finder = new DomXPath($dom);
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), 'question')]");
print_r($nodes);
$tmp_dom = new DOMDocument();
foreach ($nodes as $node)
{
$tmp_dom->appendChild($tmp_dom->importNode($node,true));
}
$innerHTML = trim($tmp_dom->saveHTML());
$buffdom = new DOMDocument();
@$buffdom->loadHTML($innerHTML);
# Iterate over all the <a> tags
foreach($buffdom->getElementsByTagName('a') as $link) {
# Show the link text (nodeValue), not the href attribute
echo $link->nodeValue, PHP_EOL;
echo "<br />";
}
?>
Resulted in many many lines of output. Try it - the page is at http://www.floris.us/SO/scraper.php
(or paste the above code into a page of your own). You were very, very close!
NOTE - this doesn't produce all the output you want - you need to include other properties of the node, not just print out the nodeValue, to get everything. But I figure you can take it from here (again, without actual samples of your HTML it's impossible for anyone else to get much further than this in helping you...)
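For the "stack together title, description, tags and URL" part of the question, one possible approach is to select each item's container first and then run relative XPath queries inside it, which avoids the second DOMDocument. The sketch below is only an illustration: the markup and the class names item, title, desc and tag are placeholders, since the real HTML was not posted.

```php
<?php
// Hypothetical markup; the class names "item", "title", "desc" and "tag"
// are placeholders, not taken from any real site.
$html = <<<HTML
<div class="item">
  <a class="title" href="/first">First title</a>
  <p class="desc">First description</p>
  <span class="tag">php</span><span class="tag">dom</span>
</div>
<div class="item">
  <a class="title" href="/second">Second title</a>
  <p class="desc">Second description</p>
  <span class="tag">curl</span>
</div>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$finder = new DOMXPath($dom);

$items = [];
foreach ($finder->query('//div[@class="item"]') as $item) {
    // A leading "." makes each query relative to the current item node.
    $link = $finder->query('.//a[@class="title"]', $item)->item(0);
    $desc = $finder->query('.//p[@class="desc"]', $item)->item(0);

    $tags = [];
    foreach ($finder->query('.//span[@class="tag"]', $item) as $tag) {
        $tags[] = trim($tag->textContent);
    }

    $items[] = [
        'title'       => $link ? trim($link->textContent) : null,
        'url'         => $link ? $link->getAttribute('href') : null,
        'description' => $desc ? trim($desc->textContent) : null,
        'tags'        => $tags,
    ];
}
print_r($items);
```

The second argument to query() is the context node, so each lookup only sees the descendants of the current item.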

parse simplexml based on child value [duplicate]

This question already has an answer here:
Closed 10 years ago.
Possible Duplicate:
PHP xpath - find element with a value and also get elements before and after element
I have the following xml:
<Table>
<ID>100</ID>
<Name>Fridge</Name>
<Description>A cool refrigerator</Description>
</Table>
<Table>
<ID>100</ID>
<Name>Fridge</Name>
<Description>Latest Refrigerator</Description>
</Table>
<Table>
<ID>200</ID>
<Name>Fridge</Name>
<Description>Another refrigerator</Description>
</Table>
In the example above, I would like to get the child values of Name and Description for the nodes with ID=100. There are around 1000 Table nodes in the xml file.
How can I parse the entire XML and get the Name and Description values for only the nodes with ID equal to 100?
So far, I have tried the following code, which did not give me what I wanted:
$source = 'Tables.xml';
$xmlstr = file_get_contents($source);
$sitemap = new SimpleXMLElement($xmlstr);
$sitemap = new SimpleXMLElement($source,null,true);
foreach($sitemap as $url) {
if($url->ID == '100')
{
echo 'Name: '.(string)$url->Name.', Description: '.(string)$url->Description.'<br>';
}
}
This should be pretty straightforward if you get all Table tags and loop over them:
// Assuming you already loaded the XML into the SimpleXML object $xml
$names_and_descriptions = array();
foreach ($xml->Table as $t) {
if ($t->ID == 100) {
echo $t->Name . " " . $t->Description . "\n";
// Or stick them into an array or whatever...
$names_and_descriptions[] = array(
// Cast to string so the array holds plain text, not SimpleXMLElement objects
'name'=>(string)$t->Name,
'description'=>(string)$t->Description
);
}
}
var_dump($names_and_descriptions);
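An alternative to filtering inside the loop is to let XPath do the selection through SimpleXMLElement::xpath(). This is a minimal sketch that assumes the Table nodes are wrapped in a single root element, here called Tables; the question does not show the root, so that name is a guess.

```php
<?php
// The <Tables> root element is an assumption; the question shows only
// the repeated <Table> nodes.
$xmlstr = <<<XML
<Tables>
  <Table><ID>100</ID><Name>Fridge</Name><Description>A cool refrigerator</Description></Table>
  <Table><ID>100</ID><Name>Fridge</Name><Description>Latest Refrigerator</Description></Table>
  <Table><ID>200</ID><Name>Fridge</Name><Description>Another refrigerator</Description></Table>
</Tables>
XML;

$xml = new SimpleXMLElement($xmlstr);

$rows = [];
// Select only the Table nodes whose ID child equals 100.
foreach ($xml->xpath('//Table[ID=100]') as $t) {
    $rows[] = [
        'name'        => (string) $t->Name,
        'description' => (string) $t->Description,
    ];
}
print_r($rows);
```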
