Fetch data from site using php and put in an array


<div>A/C:front<span style="color:red;margin:8px">/
</span>Anti-Lock Brakes<span style="color:red;margin:8px">/
</span>Passenger Airbag<span style="color:red;margin:8px">/
</span>Power Mirrors<span style="color:red;margin:8px">/
</span>Power Steering<span style="color:red;margin:8px">/
</span>Power Windows<span style="color:red;margin:8px">/
</span>Driver Airbag<span style="color:red;margin:8px">/
</span>No Accidents<span style="color:red;margin:8px">/
</span>Power Door Locks<span style="color:red;margin:8px">/</span>
</div>
It appears like this on the website:
A/C:front/Anti-Lock Brakes/Passenger Airbag/Power Mirrors/Power Steering/Power Windows/Driver Airbag/No Accidents/Power Door Locks/
I used $content = file_get_contents('url'); and now I need to sift through the data.
I need to fetch each of the options above and put them in an array, something like:
$option = array("A/C:front", "Anti-Lock Brakes", "Passenger Airbag", ...);
Any idea how to do this using PHP?

With the source code everything is easier:
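<?php
$dom = new DOMDocument;
// @ suppresses warnings caused by malformed markup on the live page.
@$dom->loadHTMLFile('http://www.sayuri.co.jp/used-cars/B37659-Nissan-Tiida%20Latio-japanese-used-cars');
$xpath = new DOMXPath($dom);
// The options are the child nodes of the first <div> that follows an <h4>.
$nodes = iterator_to_array($xpath->query('//h4/following-sibling::div')->item(0)->childNodes);
// Drop the "/" separator spans, then keep each remaining node's text.
$items = array_map(function ($node) {
    return $node->nodeValue;
}, array_filter($nodes, function ($node) {
    return $node->nodeValue != '/';
}));
var_dump($items);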
This gave me the following:
array(9) {
[0]=>
string(9) "A/C:front"
[2]=>
string(16) "Anti-Lock Brakes"
[4]=>
string(16) "Passenger Airbag"
[6]=>
string(13) "Power Mirrors"
[8]=>
string(14) "Power Steering"
[10]=>
string(13) "Power Windows"
[12]=>
string(13) "Driver Airbag"
[14]=>
string(12) "No Accidents"
[16]=>
string(16) "Power Door Locks"
}
You might want to use array_values() on $items to reset the indexes. That's all!
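If you already have the HTML as a string (for example from file_get_contents()), a variant of the same idea is to ask XPath for the text nodes directly with text(), which skips the separator spans without a separate filter. This is a minimal sketch, assuming the options are direct text children of a <div> as in the question's markup; extract_options() is a hypothetical helper name. On a full page you would narrow the path (for example to the div after the h4, as in the answer above), since //div/text() matches text in every div.

```php
<?php
// Returns the option labels from markup where each label is a bare text node
// followed by a separator <span>, as in the question. extract_options() is hypothetical.
function extract_options(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings about imperfect HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $options = array();
    // text() matches only text nodes that are direct children of a <div>,
    // so the "/" inside the separator spans is never selected.
    foreach ($xpath->query('//div/text()') as $textNode) {
        $label = trim($textNode->nodeValue);
        if ($label !== '') { // skip whitespace-only nodes such as the newline before </div>
            $options[] = $label;
        }
    }
    return $options;
}

$html = '<div>A/C:front<span style="color:red;margin:8px">/
</span>Anti-Lock Brakes<span style="color:red;margin:8px">/
</span>Power Door Locks<span style="color:red;margin:8px">/</span>
</div>';

print_r(extract_options($html));
```

Because text() returns whole text nodes, a label that itself contains a slash, such as A/C:front, stays intact; splitting the visible text with explode('/', ...) would break it apart.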

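Sounds like you need DOMDocument, specifically its getElementsByTagName() method. Using your example, I suggest something like this (adjust it to suit your needs). Note that the red span elements only contain the "/" separators, so the option text is the text node just before each span:
// Get the contents of the URL.
$content = file_get_contents('url');
// Parse the HTML using `DOMDocument` (@ suppresses warnings about malformed markup).
$dom = new DOMDocument();
@$dom->loadHTML($content);
// Search the parsed DOM structure for `span` elements and take the text right before each one.
$option = array();
foreach ($dom->getElementsByTagName('span') as $span) {
    if ($span->previousSibling !== null) {
        $option[] = trim($span->previousSibling->nodeValue);
    }
}
// Dump the values in `$option` for review.
echo '<pre>';
print_r($option);
echo '</pre>';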

Related

Removing link from Unordered list as string in array

Using PHP, I would like to strip the links from an unordered list and put their text in an array, so the output would be: array[0]='Benefits', array[1]='Cost Savings', etc.
<ul>
<li><a href="#">Benefits</a></li>
<li><a href="#">Cost Savings</a></li>
<li><a href="#">Member listing</a></li>
</ul>
Using preg_match_all('/<a href=\"(.*?)\"[.*]?>(.*?)<\/a>/i', $content, $matches);
I get:
array(3) { [0]=> array(3) { [0]=> string(24) "<a href="#">Benefits</a>" [1]=> string(28) "<a href="#">Cost Savings</a>" [2]=> string(30) "<a href="#">Member listing</a>" } [1]=> array(3) { [0]=> string(1) "#" [1]=> string(1) "#" [2]=> string(1) "#" } [2]=> array(3) { [0]=> string(8) "Benefits" [1]=> string(12) "Cost Savings" [2]=> string(14) "Member listing" } }
But I need it all in one array.

To fetch the links, you can leverage DOMDocument and DOMXPath:
$html = '<html><body><ul>
<li><a href="#">Benefits</a></li>
<li><a href="#">Cost Savings</a></li>
<li><a href="#">Member listing</a></li>
</ul></body></html>';
$dom = new DOMDocument();
$dom->loadHTML($html); // loads the HTML into the DOMDocument
$xpath = new DOMXPath($dom);
$items = $xpath->query('//ul/li/a'); // matches every <a> that is a child of an <li> inside a <ul>
$array = array();
foreach ($items as $item)
{
    $array[] = $item->nodeValue; // the link's text, without the <a> markup
}
// Array
// (
//     [0] => Benefits
//     [1] => Cost Savings
//     [2] => Member listing
// )
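If you would rather keep the regex from the question, note that preg_match_all() already puts each capture group in its own flat array: $matches[0] holds the full matches, $matches[1] the href values and $matches[2] the link texts. A minimal sketch reusing the question's pattern on the question's markup (regexes on HTML are fragile, so the DOM approach above is usually the safer choice):

```php
<?php
$content = '<ul>
<li><a href="#">Benefits</a></li>
<li><a href="#">Cost Savings</a></li>
<li><a href="#">Member listing</a></li>
</ul>';

// Same pattern as in the question; group 2 captures the text between <a ...> and </a>.
preg_match_all('/<a href=\"(.*?)\"[.*]?>(.*?)<\/a>/i', $content, $matches);

// $matches[2] is already a single, zero-indexed array of link texts.
$links = $matches[2];
print_r($links);
```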

Cannot get html attribute using PHP Simple Html DOM

I am trying to get the "sold" info from an eBay listing: https://www.ebay.co.uk/itm/Box-With-Tail-Pipe-Rear-Back-Silencer-Fits-Citroen-C2-C3-I-C3-Pluriel-GCN499/254292997729?hash=item3b350b3661:g:clEAAOSwnhldLB4J
I want to get the "1 sold" text shown in the upper right corner of the listing page. I am using the class "vi-txt-underline" to get it, but it is not working. Does anyone know how this can be done, using another attribute or something different? Here is the code:
$sold = $html->find(".vi-text-underline", 0);
if ($sold != null) {
    $item['sold'] = $sold->find("a", 0)->plaintext;
} else {
    $item['sold'] = '';
}
["tag"]=>
string(4) "text"
["attr"]=>
array(0) {
}
["children"]=>
array(0) {
}
["nodes"]=>
array(0) {
}
["parent"]=>
*RECURSION*
["_"]=>
array(1) {
[4]=>
string(6) "1 sold"
The above is part of the debugged $sold variable.
I am using an array $item[] because I am also searching for more info before this part of the code.

Get the page contents:
$url = "https://www.ebay.co.uk/itm/Box-With-Tail-Pipe-Rear-Back-Silencer-Fits-Citroen-C2-C3-I-C3-Pluriel-GCN499/254292997729?hash=item3b350b3661:g:clEAAOSwnhldLB4J";
$content = file_get_contents($url);
Then find what you want (strpos() returns the position of the text, or false if it is not found):
echo strpos($content, '1 sold');
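strpos() only tells you where the exact text "1 sold" occurs. A minimal sketch that instead captures whatever number precedes " sold", using preg_match(); the sample markup is hypothetical and stands in for $content from the answer above, and eBay's real page structure may differ or change:

```php
<?php
// Hypothetical fragment standing in for the downloaded page ($content above).
$content = '<span><a href="#">1,234 sold</a></span>';

// Capture a number (optionally with thousands separators) followed by the word "sold".
if (preg_match('/(\d[\d,]*)\s+sold\b/i', $content, $m)) {
    // Strip the separators so the count can be used as an integer.
    $soldCount = (int) str_replace(',', '', $m[1]);
} else {
    $soldCount = 0; // no "N sold" text found on the page
}
echo $soldCount;
```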

Getting data from XML

I am struggling with reading XML file using PHP.
The XML I want to use is here:
http://www.gdacs.org/xml/rss.xml
The data I am interested in is in the "item" nodes.
I wrote the following code, which gets the data:
$rawData = simplexml_load_string($response_xml_data);
foreach($rawData->channel->item as $value) {
$title = $value->title;
....
This works fine.
The namespaced nodes ("gdacs:xxxx") were slightly more problematic, but I used the following code, which also works:
$subject = $value->children('dc', true)->subject;
The problem I have now is with the "resources" node.
A stripped-down version of it looks like this:
<channel>
<item>
<gdacs:resources>
<gdacs:resource id="xx" version="0" source="xx" url="xx" type="xx">
<gdacs:title>xxx</gdacs:title>
</gdacs:resource>
<gdacs:resource id="xx" version="0" source="xx" url="xx" type="xx">
<gdacs:title>xxx</gdacs:title>
</gdacs:resource>
<gdacs:resource id="xx" version="0" source="xx" url="xx" type="xx">
<gdacs:title>xxx</gdacs:title>
</gdacs:resource>
</gdacs:resources>
</item>
</channel>
How would I get the resources in this case? I was only ever able to get the first resource, and only its title. What I would like is to get all the resource items whose "type" has a particular value, and get their URLs.

In my experience, walking through XML node by node is slow and excruciating.
Have a look at XPath: it's a way to extract data from XML using selectors (similar to CSS selectors).
http://php.net/manual/en/simplexmlelement.xpath.php
You can select elements by their attributes, similar to CSS:
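<?php
$xmlStr = file_get_contents('some_xml.xml');
$xml = new SimpleXMLElement($xmlStr);
$items = $xml->xpath("//channel/item");
$urls_by_item = array();
foreach ($items as $x) {
    // Register the gdacs prefix on the node being queried.
    $x->registerXPathNamespace('gdacs', 'http://www.gdacs.org');
    // No leading // here: the path is relative to $x, so each item only returns its own resources.
    $urls_by_item[] = $x->xpath("gdacs:resources/gdacs:resource[@type='image']/@url");
}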
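A self-contained sketch of the same idea, run against a made-up two-item feed. It assumes the gdacs prefix is bound to http://www.gdacs.org (the namespace URI used in the next answer); resource_urls_by_type() is a hypothetical name. The two details that matter are registering the namespace on each item before querying it, and using a relative path so each item only sees its own resources.

```php
<?php
// Made-up feed fragment; only the structure mirrors the question.
$xmlStr = <<<XML
<rss xmlns:gdacs="http://www.gdacs.org">
  <channel>
    <item>
      <gdacs:resources>
        <gdacs:resource id="1" url="http://example.com/a.png" type="image"><gdacs:title>A</gdacs:title></gdacs:resource>
        <gdacs:resource id="2" url="http://example.com/b.xml" type="xml"><gdacs:title>B</gdacs:title></gdacs:resource>
      </gdacs:resources>
    </item>
    <item>
      <gdacs:resources>
        <gdacs:resource id="3" url="http://example.com/c.png" type="image"><gdacs:title>C</gdacs:title></gdacs:resource>
      </gdacs:resources>
    </item>
  </channel>
</rss>
XML;

// Returns one list of URLs per <item>, keeping only resources whose type attribute equals $type.
// $type is inserted into the XPath string as-is, so it must be a trusted value.
function resource_urls_by_type(SimpleXMLElement $xml, string $type): array
{
    $result = array();
    foreach ($xml->xpath('//channel/item') as $item) {
        // Register the prefix on the element we query (the URI must match the feed's declaration).
        $item->registerXPathNamespace('gdacs', 'http://www.gdacs.org');
        $urls = array();
        // Relative path: only this item's resources are searched.
        foreach ($item->xpath("gdacs:resources/gdacs:resource[@type='" . $type . "']") as $resource) {
            $urls[] = (string) $resource['url'];
        }
        $result[] = $urls;
    }
    return $result;
}

$xml = new SimpleXMLElement($xmlStr);
print_r(resource_urls_by_type($xml, 'image'));
```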

Consider using XPath positional predicates (the square brackets []) to align URLs with their corresponding titles. As a more involved modification of @Daniel Batkilin's answer, you can combine both pieces of data in a multidimensional associative array, which requires nested for loops.
$xml = simplexml_load_file('http://www.gdacs.org/xml/rss.xml');
$xml->registerXPathNamespace('gdacs', 'http://www.gdacs.org');
$items = $xml->xpath("//channel/item");
$i = 1; // XPath positions start at 1
$out = array();
foreach ($items as $x) {
    // Titles and URLs of the image resources of the $i-th item, in document order
    $titles = $xml->xpath("//channel/item[".$i."]/gdacs:resources/gdacs:resource[@type='image']/gdacs:title");
    $urls = $xml->xpath("//channel/item[".$i."]/gdacs:resources/gdacs:resource[@type='image']/@url");
    for ($j = 0; $j < count($urls); $j++) {
        // Append rather than keying by $j.$i, which can collide (j=1,i=11 and j=11,i=1 both give "111")
        $out[] = array(
            'title' => (string)$titles[$j],
            'url'   => (string)$urls[$j],
        );
    }
    $i++;
}
$out = array_values($out);
var_dump($out);
Array dump:
array(40) {
[0]=>
array(2) {
["title"]=>
string(21) "Storm surge animation"
["url"]=>
string(92) "http://webcritech.jrc.ec.europa.eu/ModellingCyclone/cyclonesurgeVM/1000226/final/outres1.gif"
}
[1]=>
array(2) {
["title"]=>
string(26) "Storm surge maximum height"
["url"]=>
string(101) "http://webcritech.jrc.ec.europa.eu/ModellingCyclone/cyclonesurgeVM/1000226/final/P1_MAXHEIGHT_END.jpg"
}
[2]=>
array(2) {
["title"]=>
string(12) "Overview map"
["url"]=>
string(64) "http://dma.gdacs.org/saved/gdacs/tc/1000226/clouds_1000226_2.png"
}
[3]=>
array(2) {
["title"]=>
string(41) "Map of rainfall accummulation in past 24h"
["url"]=>
string(70) "http://dma.gdacs.org/saved/gdacs/tc/1000226/current_rain_1000226_2.png"
}
[4]=>
array(2) {
["title"]=>
string(23) "Map of extreme rainfall"
["url"]=>
string(62) "http://dma.gdacs.org/saved/gdacs/tc/1000226/rain_1000226_2.png"
}
[5]=>
array(2) {
["title"]=>
string(34) "Map of extreme rainfall (original)"
["url"]=>
string(97) "http://www.ssd.noaa.gov/PS/TROP/DATA/ETRAP/2015/NorthIndian/THREE/2015THREE.pmqpf.10100000.00.GIF"
}
...
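Instead of matching titles to URLs by position, you can also select the resource elements themselves and read both values from each one, which keeps every title paired with its own URL. A minimal sketch on a made-up feed fragment, assuming the same http://www.gdacs.org namespace URI; image_resources() is a hypothetical name:

```php
<?php
// Made-up feed fragment with the same structure as the question's.
$xmlStr = <<<XML
<rss xmlns:gdacs="http://www.gdacs.org">
  <channel>
    <item>
      <gdacs:resources>
        <gdacs:resource url="http://example.com/map.png" type="image"><gdacs:title>Overview map</gdacs:title></gdacs:resource>
        <gdacs:resource url="http://example.com/report.pdf" type="document"><gdacs:title>Report</gdacs:title></gdacs:resource>
      </gdacs:resources>
    </item>
    <item>
      <gdacs:resources>
        <gdacs:resource url="http://example.com/rain.png" type="image"><gdacs:title>Rainfall</gdacs:title></gdacs:resource>
      </gdacs:resources>
    </item>
  </channel>
</rss>
XML;

// Collects a title/url pair for every image resource, reading both values from the same element.
function image_resources(SimpleXMLElement $xml): array
{
    $xml->registerXPathNamespace('gdacs', 'http://www.gdacs.org');
    $out = array();
    foreach ($xml->xpath("//channel/item/gdacs:resources/gdacs:resource[@type='image']") as $resource) {
        $out[] = array(
            // <gdacs:title> is in the gdacs namespace, so ask for children in that namespace.
            'title' => (string) $resource->children('http://www.gdacs.org')->title,
            // url is an unprefixed attribute, so plain array access works.
            'url'   => (string) $resource['url'],
        );
    }
    return $out;
}

print_r(image_resources(new SimpleXMLElement($xmlStr)));
```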

How to fetch html string of XPath results?

Consider this HTML:
<div class="a">foo</div>
<div class="a"><div id="1">bar</div></div>
If I want to fetch the values of all divs with class a, I run the following query:
$q = $xpath->query('//div[@class="a"]');
However, I get this result:
foo
bar
But I want the actual content, including the child tags, so it would look like this:
foo
<div id="1">bar</div>
How can I accomplish that with XPath and DOMDocument only?
Solved by the function provided here.

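Note that $node->nodeValue is not PHP DOM's equivalent of .innerHTML in a browser: for an element it returns only the concatenated text content, which is why the query above yields bar rather than <div id="1">bar</div>. To get the inner HTML, serialize each child node with $node->ownerDocument->saveHTML($child) and join the results.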
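A minimal sketch of an innerHTML-style helper built only on DOMDocument and DOMXPath: it serializes each child node with DOMDocument::saveHTML(), which accepts a node argument, and concatenates the results. The name inner_html() is made up for illustration.

```php
<?php
// Concatenate the HTML of every child node, i.e. the element's markup minus its own tags.
function inner_html(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$doc = new DOMDocument();
$doc->loadHTML('<div class="a">foo</div><div class="a"><div id="1">bar</div></div>');
$xpath = new DOMXPath($doc);

$results = array();
foreach ($xpath->query('//div[@class="a"]') as $div) {
    $results[] = inner_html($div);
}
print_r($results);
```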

You can try using SimpleXML:
$xml = '<?xml version=\'1.0\' encoding=\'UTF-8\' ?>
<root>
<div class="a">foo</div>
<div class="a"><div id="1">bar</div></div>
</root>';
$xml = simplexml_load_string($xml);
var_dump($xml->xpath('//div[@class="a"]'));
But in this case you will have to iterate over the returned objects.
Output:
array(2) {
[0]=>
object(SimpleXMLElement)#2 (2) {
["#attributes"]=>
array(1) {
["class"]=>
string(1) "a"
}
[0]=>
string(3) "foo"
}
[1]=>
object(SimpleXMLElement)#3 (2) {
["#attributes"]=>
array(1) {
["class"]=>
string(1) "a"
}
["div"]=>
string(3) "bar"
}
}

Try something like:
$doc = new DOMDocument;
$doc->loadHTML('<div>Your HTML here.</div>');
$xpath = new DOMXPath($doc);
$node = $xpath->query('//div[@class="a"]')->item(0);
$html = $node->ownerDocument->saveHTML($node); // Outer HTML of the DOMElement, including its own <div> tags.

xpath not return values

I can pull the necessary information with XPath, and var_dump shows it when I use the following code. But when I add a foreach loop to output all the ["href"] values, I get a blank page. Any ideas where I am going wrong?
$dom = new DOMDocument();
@$dom->loadHTML($source);
$xml = simplexml_import_dom($dom);
$rss = $xml->xpath("/html/body//a[@class='highzoom1']");
$links = $rss->href;
foreach ($links as $link) {
    echo $link;
}
Here is the array of information.
array(96) {
[0]=>
object(SimpleXMLElement)#3 (2) {
["#attributes"]=>
array(2) {
["href"]=>
string(49) "/p/18351/test1.html"
["class"]=>
string(10) "highzoom1"
}
[0]=>
string(36) ""test1"
}
[1]=>
object(SimpleXMLElement)#4 (2) {
["#attributes"]=>
array(2) {
["href"]=>
string(43) "/p/18351/test2.html"
["class"]=>
string(10) "highzoom1"
}
[0]=>
string(30) ""test2"
}
[2]=>
object(SimpleXMLElement)#5 (2) {
["#attributes"]=>
array(2) {
["href"]=>
string(48) "/p/18351/test3.html"
["class"]=>
string(10) "highzoom1"
}
[0]=>
string(35) ""test3"
}

Instead of:
$rss = $xml->xpath("/html/body//a[@class='highzoom1']");
use:
$hrefs = $xml->xpath("/html/body//a[@class='highzoom1']/@href");
The XPath expression you are using (the first one above) selects every a element whose class attribute is 'highzoom1' and that is a descendant of a body element that is a child of the top element (named html) of the document.
However, you want to select the href attributes of these a elements, not the a elements themselves.
The second XPath expression above selects exactly the href attributes of these a elements.
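With the second expression, each result is a SimpleXMLElement wrapping an href attribute, so casting it to a string gives the URL. A minimal sketch, using a made-up HTML snippet in place of $source from the question:

```php
<?php
// Made-up markup standing in for $source from the question.
$source = '<html><body>
<a class="highzoom1" href="/p/18351/test1.html">test1</a>
<a class="other" href="/skip.html">skip</a>
<a class="highzoom1" href="/p/18351/test2.html">test2</a>
</body></html>';

$dom = new DOMDocument();
@$dom->loadHTML($source); // @ hides warnings about imperfect HTML
$xml = simplexml_import_dom($dom);

// Select the href attributes themselves rather than the <a> elements.
$hrefs = $xml->xpath("/html/body//a[@class='highzoom1']/@href");

$links = array();
foreach ($hrefs as $href) {
    $links[] = (string) $href; // cast the attribute node to its value
}
echo implode("\n", $links);
```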

$links = $rss->href;
will never work, because SimpleXMLElement::xpath() returns a plain array of SimpleXMLElement objects, and an array has no href property. Instead, loop over the results and read each element's href attribute with array syntax (in SimpleXML, ->href would look for a child element named href):
$rss = $xml->xpath("/html/body//a[@class='highzoom1']");
foreach ($rss as $link) {
    echo $link['href'];
}
Or you can index $rss directly:
echo $rss[5]['href']; // echo out the href of the 6th link found.
