I am parsing scores from http://sports.in.msn.com/football-world-cup-2014/south-africa-v-brazil/1597383
I am able to parse all the attributes, but I can't parse the time.
I used:
$homepages = file_get_html("http://sports.in.msn.com/football-world-cup-2014/south-africa-v-brazil/1597383");
$teama = $homepages->find('span[id="clock"]');
Kindly help me.
Since that particular site loads the values dynamically (through an AJAX request), you can't parse the value from the initial page load.
<span id="clock"></span> // this is empty on the initial load
Normal scraping:
$homepages = file_get_contents("http://sports.in.msn.com/football-world-cup-2014/south-africa-v-brazil/1597383");
$doc = new DOMDocument();
libxml_use_internal_errors(true); // don't print warnings about malformed HTML
$doc->loadHTML($homepages);
$xpath = new DOMXPath($doc);
$query = $xpath->query("//span[@id='clock']"); // attributes are selected with @, not #
foreach($query as $value) {
echo $value->nodeValue; // the traversal is correct, but this will be empty
}
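To confirm that the query itself is fine and the value simply is not in the served HTML, here is a minimal sketch that runs the same XPath against an inline HTML string instead of the live page. The markup is a hypothetical stand-in for what the page sends before JavaScript runs, and clockText() is an illustrative helper name.

```php
<?php
// Hypothetical helper: return the text of <span id="clock">, or null if it is absent.
function clockText(string $html): ?string
{
    $doc = new DOMDocument();
    // Real-world HTML often triggers libxml warnings; collect them silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query("//span[@id='clock']"); // @id, not #id
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// What the server sends before JavaScript fills in the clock.
$initial = '<html><body><span id="clock"></span></body></html>';
var_dump(clockText($initial)); // string(0) ""
```

The span is found, but its text is empty, which is why the value has to come from the AJAX request instead.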
Instead of scraping it, my suggestion is to fetch the value through a request as well: it is a time, so it keeps changing as the match goes on until the game has ended. You can simply use the same request the page itself makes:
$url = 'http://sports.in.msn.com/liveplayajax/SOCCERMATCH/match/gsm/en-in/1597383';
$contents = file_get_contents($url);
$data = json_decode($contents, true);
echo '<pre>';
print_r($data);
echo '</pre>';
This should yield something like the following (only part of it):
[2] => Array
(
[Code] =>
[CommentId] => -1119368663
[CommentType] => manual
[Description] => FULL-TIME: South Africa 0-5 Brazil.
[Min] => 90'
[MinExtra] => (+3)
[View] =>
[ViewHint] =>
[ViewIndex] => 0
[EditKey] =>
[TrackingValues] =>
[AopValue] =>
)
You can then get the 90' with a foreach loop. Consider this example:
foreach($data['Commentary']['CommentaryItems'] as $key => $value) {
if(stripos($value['Description'], 'FULL-TIME') !== false) {
echo $value['Min'];
break;
}
}
This should print: 90'
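The same loop can be wrapped in a small function and tried offline. This sketch assumes the Commentary/CommentaryItems structure used in the loop above; the JSON string is a trimmed, hypothetical sample (not real API output), and fullTimeMinute() is an illustrative name. It also appends MinExtra when present.

```php
<?php
// Hypothetical helper: find the FULL-TIME commentary item and return its minute,
// plus any stoppage time from MinExtra. Returns null if there is no such item.
function fullTimeMinute(array $data): ?string
{
    foreach ($data['Commentary']['CommentaryItems'] ?? [] as $item) {
        if (stripos($item['Description'] ?? '', 'FULL-TIME') !== false) {
            $extra = trim($item['MinExtra'] ?? '');
            return $extra === '' ? $item['Min'] : $item['Min'] . ' ' . $extra;
        }
    }
    return null;
}

// Trimmed, hypothetical sample shaped like the endpoint's response.
$json = <<<'JSON'
{"Commentary": {"CommentaryItems": [
  {"Description": "FULL-TIME: South Africa 0-5 Brazil.", "Min": "90'", "MinExtra": "(+3)"}
]}}
JSON;

echo fullTimeMinute(json_decode($json, true)); // 90' (+3)
```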
Related
I am trying to read XML data in PHP. The XML data is coming from an API whose link is the following: https://seekingalpha.com/api/sa/combined/AAPL.xml
I just need the news headline, news link, published date and author name of the first five news items from the API. To do this, I am using the following PHP code:
$note = "https://seekingalpha.com/api/sa/combined/".$symbolValue.".xml";
$xml=simplexml_load_file($note);
$jsonArray = array();
for ($i=0; $i<5; $i++) {
$newsHeadline = $xml->channel->item[$i]->title->__toString();
$newsLink = $xml->channel->item[$i]->link->__toString();
$publishedDate = $xml->channel->item[$i]->pubDate->__toString();
$authorName = $xml->channel->item[$i]->sa:author_name->__toString();
$temp = array('Title' => $newsHeadline, 'Link' => $newsLink,'Publish'=>$publishedDate,'Author'=>$authorName);
array_push($jsonArray,$temp);
}
$jsonNews = json_encode($jsonArray);
$completeData[9] = $jsonNews;
In the above code, $note contains the link to the API, and $symbolValue is the value I am getting from the front end. My code works fine until I access the author name, i.e. the following line of code:
$authorName = $xml->channel->item[$i]->sa:author_name->__toString();
I am getting the following error:
Parse error: syntax error, unexpected ':' in /home/File/Path
It seems like I am not supposed to use the ":" for fetching the author name.
So how do I get the author name and put it in $temp under the key "Author"?
Please have a look at the API to get an idea about the XML file.
The children() method supports an argument for the namespace you want to read. This is required when you want to read elements which are not in the current/default namespace.
$xmldata = <<<'XML'
<?xml version="1.0"?>
<foobar xmlns:x="http://example.org/">
<abc>test</abc>
<x:def>content</x:def>
</foobar>
XML;
$xml = new SimpleXMLElement($xmldata);
$other = $xml->children('http://example.org/');
var_dump((string)$other->def);
This will output the value "content", but using the expression $xml->def will not because that is not in the current/default namespace.
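If your property name contains a character such as :, you can use curly braces to get past the parse error:
->{'sa:author_name'}
However, SimpleXML does not resolve namespace prefixes in property names, so this alone returns an empty value here. The values you are looking for are in the https://seekingalpha.com/api/1.0 namespace.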
You could use the children() method of the SimpleXMLElement and pass the namespace:
$authorName = (string)$xml->channel->item[$i]->children('https://seekingalpha.com/api/1.0')->{'author_name'};
Or you could use xpath.
$authorName = (string)$xml->channel->item[$i]->xpath('sa:author_name')[0];
For example:
$jsonArray = array();
for ($i = 0; $i < 5; $i++) {
$newsHeadline = $xml->channel->item[$i]->title->__toString();
$newsLink = $xml->channel->item[$i]->link->__toString();
$publishedDate = $xml->channel->item[$i]->pubDate->__toString();
$authorName = (string)$xml->channel->item[$i]->xpath('sa:author_name')[0];
// or children() with the namespace URI
// $authorName = (string)$xml->channel->item[$i]->children('https://seekingalpha.com/api/1.0')->{'author_name'};
$temp = array('Title' => $newsHeadline, 'Link' => $newsLink, 'Publish' => $publishedDate, 'Author' => $authorName);
array_push($jsonArray, $temp);
}
$jsonNews = json_encode($jsonArray);
print_r($jsonArray);
This will give you:
Array
(
[0] => Array
(
[Title] => Apple acknowledges iPhone X issue in some devices, plans fix
[Link] => https://seekingalpha.com/symbol/AAPL/news?source=feed_symbol_AAPL
[Publish] => Fri, 10 Nov 2017 12:04:42 -0500
[Author] => Brandy Betz
)
etc...
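Putting this together, here is a minimal sketch that runs offline against an inline feed. The feed is a trimmed, hypothetical stand-in for the real API response (one item, with values copied from the output above); firstNews() and SA_NS are illustrative names, and the namespace URI is the one given in this answer. It uses foreach with a counter instead of item[$i], since indexing past the last item would fail on a feed with fewer than five entries.

```php
<?php
// Namespace URI of the sa: elements (from the answer above).
const SA_NS = 'https://seekingalpha.com/api/1.0';

// Hypothetical helper: collect up to $limit items as Title/Link/Publish/Author arrays.
function firstNews(SimpleXMLElement $xml, int $limit = 5): array
{
    $news = [];
    foreach ($xml->channel->item as $item) {
        if (count($news) >= $limit) {
            break;
        }
        $news[] = [
            'Title'   => (string) $item->title,
            'Link'    => (string) $item->link,
            'Publish' => (string) $item->pubDate,
            // children() with the namespace URI switches to the sa: elements.
            'Author'  => (string) $item->children(SA_NS)->author_name,
        ];
    }
    return $news;
}

// Trimmed, hypothetical feed; the real one comes from the API URL.
$feed = <<<'XML'
<?xml version="1.0"?>
<rss xmlns:sa="https://seekingalpha.com/api/1.0">
  <channel>
    <item>
      <title>Apple acknowledges iPhone X issue in some devices, plans fix</title>
      <link>https://seekingalpha.com/symbol/AAPL/news?source=feed_symbol_AAPL</link>
      <pubDate>Fri, 10 Nov 2017 12:04:42 -0500</pubDate>
      <sa:author_name>Brandy Betz</sa:author_name>
    </item>
  </channel>
</rss>
XML;

$news = firstNews(new SimpleXMLElement($feed));
echo json_encode($news);
```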
I am trying to use PHP to read a large gzipped XML file. The file consists of repeated products (actually books), and each book has one or more contributors. Here is an example of a product:
<Product>
<ProductIdentifier>
<IDTypeName>EAN.UCC-13</IDTypeName>
<IDValue>9999999999999</IDValue>
</ProductIdentifier>
<Contributor>
<SequenceNumber>1</SequenceNumber>
<ContributorRole>A01</ContributorRole>
<PersonNameInverted>Bloggs, Joe</PersonNameInverted>
</Contributor>
<Contributor>
<SequenceNumber>2</SequenceNumber>
<ContributorRole>A01</ContributorRole>
<PersonNameInverted>Jones, John</PersonNameInverted>
</Contributor>
<Contributor>
<SequenceNumber>3</SequenceNumber>
<ContributorRole>B01</ContributorRole>
<PersonNameInverted>Other, An</PersonNameInverted>
</Contributor>
</Product>
The output I would like for this example is:
Array
(
[1] => 9999999999999
[2] => Bloggs, Joe(A01)
[3] => Jones, John(A01)
[4] => Other, An(B01)
)
My code loads the gzipped XML file and handles the repeated sequence of products with no problem, but I cannot get it to handle the repeated sequence of contributors. My code for handling the products and the first contributor is shown below. I have tried various ways of looping through the contributors but cannot achieve what I need. I'm a beginner with PHP and XML, although I have been an IT professional for many years.
$reader = new XMLReader();
//load the selected XML file to the DOM
if(!$reader->open("compress.zlib://filename.xml.gz","r")){
die('Failed to open file!');
}
while ($reader->read()):
if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'Product')
{
$xml = simplexml_load_string($reader->readOuterXML());
list($result) = $xml->xpath('//ProductIdentifier[IDTypeName = "EAN.UCC-13"]');
$line[1] = (string)$result->IDValue;
list($result) = $xml->xpath('//Contributor');
$contributorname = (string)$result->PersonNameInverted;
$role = (string)$result->ContributorRole;
$line[2] = $contributorname."(".$role.")";
echo '<pre>'; print_r($line); echo '</pre>';
}
endwhile;
Since you have several contributors, you must treat them as an array and loop over them to build your final variable:
<?php
$reader = new XMLReader();
// open the gzipped file through the compress.zlib:// stream wrapper
// (open() takes uri, encoding, flags; there is no file-mode argument like "r")
if(!$reader->open("compress.zlib://filename.xml.gz")){
die('Failed to open file!');
}
while ($reader->read()) {
if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'Product') {
$xml = simplexml_load_string($reader->readOuterXML());
// start a fresh line for each product so no contributors carry over
$line = array();
list($result) = $xml->xpath('//ProductIdentifier[IDTypeName = "EAN.UCC-13"]');
$line[1] = (string)$result->IDValue;
// get all contributors in an array
$contributors = $xml->xpath('//Contributor');
$i = 2;
// go through all contributors
foreach($contributors as $contributor) {
$contributorname = (string)$contributor->PersonNameInverted;
$role = (string)$contributor->ContributorRole;
$line[$i] = $contributorname."(".$role.")";
$i++;
}
echo '<pre>'; print_r($line); echo '</pre>';
}
}
$reader->close();
This will give you the following output:
Array
(
[1] => 9999999999999
[2] => Bloggs, Joe(A01)
[3] => Jones, John(A01)
[4] => Other, An(B01)
)
EDIT: Some explanation of what is wrong with your code: instead of taking all the contributors, you only take the first one with list()
(http://php.net/manual/en/function.list.php, which assigns the values of an array to variables). Since you don't know how many contributors each product has (I guess...), you cannot use it here.
Then you assign that first one to your $line, so you only ever get the first contributor.
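The same idea can be split into a per-product function and tested without the gzipped file. This is a sketch under a few assumptions: productLine() and readProducts() are illustrative names, the sample XML is the question's example plus a second, made-up product, and XMLReader::XML() reads from a string where the real code would call open('compress.zlib://filename.xml.gz').

```php
<?php
// Build one output line for a single <Product>: EAN first, then every contributor.
function productLine(SimpleXMLElement $product): array
{
    $line = []; // fresh array per product
    // Relative path (no leading //): search only this product's children.
    $ids = $product->xpath('ProductIdentifier[IDTypeName = "EAN.UCC-13"]');
    if ($ids) {
        $line[1] = (string) $ids[0]->IDValue;
    }
    $i = 2;
    foreach ($product->Contributor as $contributor) {
        $line[$i++] = (string) $contributor->PersonNameInverted
            . '(' . (string) $contributor->ContributorRole . ')';
    }
    return $line;
}

// Stream through the document and hand each <Product> to productLine().
function readProducts(XMLReader $reader): array
{
    $lines = [];
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'Product') {
            $lines[] = productLine(new SimpleXMLElement($reader->readOuterXML()));
        }
    }
    return $lines;
}

$sample = <<<'XML'
<Products>
  <Product>
    <ProductIdentifier><IDTypeName>EAN.UCC-13</IDTypeName><IDValue>9999999999999</IDValue></ProductIdentifier>
    <Contributor><SequenceNumber>1</SequenceNumber><ContributorRole>A01</ContributorRole><PersonNameInverted>Bloggs, Joe</PersonNameInverted></Contributor>
    <Contributor><SequenceNumber>2</SequenceNumber><ContributorRole>A01</ContributorRole><PersonNameInverted>Jones, John</PersonNameInverted></Contributor>
    <Contributor><SequenceNumber>3</SequenceNumber><ContributorRole>B01</ContributorRole><PersonNameInverted>Other, An</PersonNameInverted></Contributor>
  </Product>
  <Product>
    <ProductIdentifier><IDTypeName>EAN.UCC-13</IDTypeName><IDValue>1111111111111</IDValue></ProductIdentifier>
    <Contributor><SequenceNumber>1</SequenceNumber><ContributorRole>A01</ContributorRole><PersonNameInverted>Doe, Jane</PersonNameInverted></Contributor>
  </Product>
</Products>
XML;

$reader = new XMLReader();
$reader->XML($sample); // real file: $reader->open('compress.zlib://filename.xml.gz')
$lines = readProducts($reader);
$reader->close();
print_r($lines);
```

The second product has a single contributor, which shows that nothing is left over from the first one.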
I'm fetching top-selling items for a particular BrowseNodeId. The XML response has 10 items, but when I print/display the information it shows only one. Please help.
My request array is:
$params = array(
"Service" => "AWSECommerceService",
"Operation" => "BrowseNodeLookup",
"AWSAccessKeyId" => "",
"AssociateTag" => "",
"BrowseNodeId" => "6386372011",
"ResponseGroup" => "TopSellers"
);
(I removed my ids on purpose)
and this is how I'm parsing the XML response:
$response = simplexml_load_file($request_url);
foreach($response->BrowseNodes->BrowseNode as $item)
{
$topItem = $item->TopItemSet->TopItem->Title;
$itemURL = $item->TopItemSet->TopItem->DetailPageURL;
$itemID = $item->TopItemSet->TopItem->ASIN;
$results .= "<tr><td>$topItem</td><td>$itemID</td></tr>";
}
Later on I simply print $results with echo. This approach works with all my other requests/responses, i.e. I get and display 10 items without any problem. I can't find any error. Please help; I want to display 10 items, not just one.
Convert the XML object into an array using this:
$response = simplexml_load_file($request_url);
$json_string = json_encode($response);
$result = json_decode($json_string, TRUE);
and then access the elements with $result['key'] syntax.
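Judging by the paths in the question, the likely cause is that the outer foreach iterates over BrowseNode (probably a single element), while $item->TopItemSet->TopItem->Title only reads the first TopItem. A sketch of looping over every TopItem instead; the XML is a trimmed, hypothetical response shaped like those paths (the ASINs and titles are made up), not real Product Advertising API output.

```php
<?php
// Hypothetical, trimmed response: BrowseNodes > BrowseNode > TopItemSet > TopItem (repeated).
$xml = <<<'XML'
<BrowseNodeLookupResponse>
  <BrowseNodes>
    <BrowseNode>
      <TopItemSet>
        <TopItem><ASIN>A1</ASIN><Title>First</Title></TopItem>
        <TopItem><ASIN>A2</ASIN><Title>Second</Title></TopItem>
        <TopItem><ASIN>A3</ASIN><Title>Third</Title></TopItem>
      </TopItemSet>
    </BrowseNode>
  </BrowseNodes>
</BrowseNodeLookupResponse>
XML;

$response = new SimpleXMLElement($xml);
$results = '';
$count = 0;
foreach ($response->BrowseNodes->BrowseNode as $node) {
    // Inner loop: visit every TopItem, not just the first one.
    foreach ($node->TopItemSet->TopItem as $top) {
        $title = htmlspecialchars((string) $top->Title);
        $asin = htmlspecialchars((string) $top->ASIN);
        $results .= "<tr><td>$title</td><td>$asin</td></tr>";
        $count++;
    }
}
echo $results;
```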
I have an XML file like so:
<GenResponse>
<Detail1></Detail1>
<Detail2></Detail2>
<DataNodes>
<DataNode>
<NodeDetails1>
<node4>Parrot Musky Truck Moo</node4>
<node5>Tinker Singer Happy Fool</node5>
<node6>
<FurtherDetails>
<Node>Musky</Node>
<Node>Lorem Ipsum</Node>
</FurtherDetails>
</node6>
</NodeDetails1>
<NodeDetails2>ID</NodeDetails2>
</DataNode>
<DataNode>
<NodeDetails1>
<node4>Sky Star Panet Shoe</node4>
<node5>Rusky Husky Musky Boo</node5>
</NodeDetails1>
<NodeDetails2>ID</NodeDetails2>
</DataNode>
</DataNodes>
</GenResponse>
I would like to know how to pass a search string such as "Musky" to a PHP function and get back both <DataNode>...</DataNode> elements.
Essentially I would like to search a huge XML file for a string and return all the DataNodes that contain it.
If this is possible with SimpleXML it would be great. Else any other solution is also fine.
EDIT: Notice how "Musky" can be in different nodes under <DataNode>
Use:
$xmlStr = file_get_contents('data/your_XML_File.xml');
$xml = new SimpleXMLElement($xmlStr);
// search records by tag value:
// find nodes with text
$res = $xml->xpath("node2[contains(., 'Musky')]");
print_r($res);
For testing purposes, just copy and paste the following code into an editor; I didn't use a separate XML file.
<?php
//$xmlStr = file_get_contents('test.xml');
$xmlStr = '<node1>
<node2>
<node3>
<node4>Parrot Singer Truck Moo</node4>
<node5>Tinker Musky Happy Fool</node5>
</node3>
<node7>ID</node7>
</node2>
<node2>
<node3>
<node4>Sky Star Panet Shoe</node4>
<node5>Rusky Husky Musky Boo</node5>
</node3>
<node7>ID</node7>
</node2>
</node1>';
$xml = new SimpleXMLElement($xmlStr);
// search records by tag value:
// find nodes with text
$res = $xml->xpath("node2[contains(., 'Musky')]");
echo "<pre>";
print_r($res);
?>
It gives the proper output; I tried it:
Array
(
[0] => SimpleXMLElement Object
(
[node3] => SimpleXMLElement Object
(
[node4] => Parrot Singer Truck Moo
[node5] => Tinker Musky Happy Fool
)
[node7] => ID
)
[1] => SimpleXMLElement Object
(
[node3] => SimpleXMLElement Object
(
[node4] => Sky Star Panet Shoe
[node5] => Rusky Husky Musky Boo
)
[node7] => ID
)
)
Use this code to find your search word. I have made it a function: just pass your keyword and you will get your result.
function findWord($findVar)
{
$catalog = simplexml_load_file("xmlfile.xml");
$category = $catalog->node2;
$found = 0;
foreach($category as $c)
{
foreach($c->node3 as $node3)
{
$node4 = (string) ($node3->node4);
$node5 = (string) ($node3->node5);
// stripos() is already case-insensitive and returns 0 for a match at the
// very start of the string, so compare against false explicitly
if (stripos($node4, $findVar) !== false)
{
echo 'Found!!'.'<br/>';
$found++;
}
if (stripos($node5, $findVar) !== false)
{
echo 'Found!!'.'<br/>';
$found++;
}
}
if (stripos((string)$c->node7, $findVar) !== false)
{
echo 'Found!!'.'<br/>';
$found++;
}
}
if ($found == 0)
{
echo "No result";
}
}
$findVar = 'Musky';
findWord($findVar);
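Since "Musky" can sit at any depth inside a <DataNode> (see the EDIT in the question), an XPath contains() test on the whole DataNode avoids checking each child by hand. A sketch using the question's original element names; findDataNodes() and xpathLiteral() are illustrative helpers, and the match is case-sensitive.

```php
<?php
// Build an XPath 1.0 string literal for $s (XPath 1.0 has no escape syntax).
function xpathLiteral(string $s): string
{
    if (strpos($s, "'") === false) {
        return "'" . $s . "'";
    }
    if (strpos($s, '"') === false) {
        return '"' . $s . '"';
    }
    // Both quote kinds present: split on ' and rejoin with concat().
    return "concat('" . str_replace("'", "', \"'\", '", $s) . "')";
}

// Return every <DataNode> whose text, at any depth, contains $word.
function findDataNodes(SimpleXMLElement $xml, string $word): array
{
    // contains(., ...) tests the concatenated text of the node and all descendants.
    return $xml->xpath('//DataNode[contains(., ' . xpathLiteral($word) . ')]') ?: [];
}

$sample = <<<'XML'
<GenResponse>
  <Detail1></Detail1>
  <Detail2></Detail2>
  <DataNodes>
    <DataNode>
      <NodeDetails1>
        <node4>Parrot Musky Truck Moo</node4>
        <node5>Tinker Singer Happy Fool</node5>
        <node6>
          <FurtherDetails>
            <Node>Musky</Node>
            <Node>Lorem Ipsum</Node>
          </FurtherDetails>
        </node6>
      </NodeDetails1>
      <NodeDetails2>ID</NodeDetails2>
    </DataNode>
    <DataNode>
      <NodeDetails1>
        <node4>Sky Star Panet Shoe</node4>
        <node5>Rusky Husky Musky Boo</node5>
      </NodeDetails1>
      <NodeDetails2>ID</NodeDetails2>
    </DataNode>
  </DataNodes>
</GenResponse>
XML;

$xml = new SimpleXMLElement($sample);
foreach (findDataNodes($xml, 'Musky') as $node) {
    echo $node->asXML(), "\n"; // each matching <DataNode>...</DataNode>
}
```

For a very large file, the same test could be applied per DataNode while streaming with XMLReader instead of loading everything at once.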
This question already has answers here:
Scrape web page data generated by javascript
(2 answers)
Closed 8 years ago.
I am stuck with a scraping task in my project.
I want to grab the data from the link in $html (all the table content in the tr and td elements). Here I am trying to grab the link, but it only shows javascript: self.close()
<?php
include("simple_html_dom.php");
$html = file_get_html('http://www.areacodelocations.info/allcities.php?ac=201');
foreach($html->find('a') as $element)
echo $element->href . '<br>';
?>
Usually, this kind of page loads a bunch of JavaScript (jQuery, etc.), which then builds the interface and retrieves the data to be displayed from a data source.
So what you need to do is open that page in Firefox or a similar browser, with a tool such as Firebug (or the browser's built-in developer tools), to see what requests are actually being made. If you're lucky, you will find the data directly in the list of XHR requests, as in this case:
http://www.govliquidation.com/json/buyer_ux/salescalendar.js
Notice that this course of action may infringe on some license or terms of use. Clear this with the webmaster/data source/copyright owner before proceeding: detecting and forbidding this kind of scraping is very easy, and identifying you is probably only slightly less so.
Anyway, if you issue the same call in PHP, you can scrape the data directly (provided there is no session/authentication issue, as seems to be the case here) with very simple code:
<?php
$url = "http://www.govliquidation.com/json/buyer_ux/salescalendar.js";
$json = file_get_contents($url);
$data = json_decode($json);
?>
This yields a data object that you can inspect and convert to CSV with a simple loop.
stdClass Object
(
[result] => stdClass Object
(
[events] => Array
(
[0] => stdClass Object
(
[yahoo_dur] => 11300
[closing_today] => 0
[language_code] => en
[mixed_id] => 9297
[event_id] => 9297
[close_meridian] => PM
[commercial_sale_flag] => 0
[close_time] => 01/06/2014
[award_time_unixtime] => 1389070800
[category] => Tires, Parts & Components
[open_time_unixtime] => 1388638800
[yahoo_date] => 20140102T000000Z
[open_time] => 01/02/2014
[event_close_time] => 2014-01-06 17:00:00
[display_event_id] => 9297
[type_code] => X3
[title] => Truck Drive Axles # Killeen, TX
[special_flag] => 1
[demil_flag] => 0
[google_close] => 20140106
[event_open_time] => 2014-01-02 00:00:00
[google_open] => 20140102
[third_party_url] =>
[bid_package_flag] => 0
[is_open] => 1
[fda_count] => 0
[close_time_unixtime] => 1389045600
You retrieve $data->result->events, use fputcsv() on its items converted to array form, and Bob's your uncle.
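A minimal sketch of that loop, writing to an in-memory stream so it can be checked without the live feed. The JSON is a trimmed, hypothetical sample using three field names from the dump above; eventsToCsv() is an illustrative name, and it assumes every event is flat (no nested arrays) with the same keys as the first one. For a real file, fopen('output.csv', 'w') would replace php://memory.

```php
<?php
// Hypothetical helper: turn $data->result->events into CSV text with a header row.
function eventsToCsv(string $json): string
{
    $data = json_decode($json, true); // associative arrays, ready for fputcsv()
    $events = $data['result']['events'] ?? [];
    $fp = fopen('php://memory', 'w+');
    if ($events) {
        // Header row from the keys of the first event.
        fputcsv($fp, array_keys($events[0]), ',', '"', '\\');
        foreach ($events as $event) {
            fputcsv($fp, $event, ',', '"', '\\');
        }
    }
    rewind($fp);
    $csv = stream_get_contents($fp);
    fclose($fp);
    return $csv;
}

// Trimmed, hypothetical sample of the feed's structure.
$json = <<<'JSON'
{"result": {"events": [
  {"event_id": 9297, "title": "Truck Drive Axles # Killeen, TX", "category": "Tires, Parts & Components"}
]}}
JSON;

echo eventsToCsv($json);
```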
In the case of the second site, you have a table with several TR elements, and you want to catch the first two TD children of each TR.
By inspecting the source code you see something like this:
<tr>
<td> Allendale</td>
<td> Eastern Time
</td>
</tr>
<tr>
<td> Alpine</td>
<td> Eastern Time
</td>
So you just grab all the TRs:
<?php
include("simple_html_dom.php");
$html = file_get_html('http://www.areacodelocations.info/allcities.php?ac=201');
$fp = fopen('output.csv', 'w');
if (!$fp) die("Cannot open output CSV - permission problems maybe?");
foreach($html->find('tr') as $tr) {
$csv = array(); // Start empty. A new CSV row for each TR.
// Now find the TD children of $tr. They will make up a row.
foreach($tr->find('td') as $td) {
// Get the TD's raw innertext (it still needs cleaning, see below)
$csv[] = $td->innertext;
}
fputcsv($fp, $csv);
}
fclose($fp);
?>
You will notice that the CSV text is "dirty". That is because the actual text is:
<td> Alpine</td>
<td> Eastern Time[CARRIAGE RETURN HERE]
</td>
So to have "Alpine" and "Eastern Time", you have to replace
$csv[] = $td->innertext;
with something like
$csv[] = trim(
html_entity_decode(
$td->innertext,
ENT_COMPAT | ENT_HTML401,
'UTF-8'
)
);
Check out the PHP man page for html_entity_decode() about character set encoding and entity handling. The above ought to work -- and an ought and fifty cents will get you a cup of coffee :-)
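If you'd rather not depend on simple_html_dom, here is a sketch of the same TR/TD walk with PHP's built-in DOM extension. With DOM, textContent is already entity-decoded, so trim() alone gives "Alpine" and "Eastern Time". tableRows() is an illustrative name and the HTML is the excerpt from the page above; each returned row can then go to fputcsv().

```php
<?php
// Hypothetical helper: return the trimmed TD texts of every TR as a list of rows.
function tableRows(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//tr') as $tr) {
        $row = [];
        foreach ($xpath->query('td', $tr) as $td) {
            $row[] = trim($td->textContent); // drops the leading space and trailing newline
        }
        if ($row) {
            $rows[] = $row;
        }
    }
    return $rows;
}

$html = "<table><tr><td> Allendale</td><td> Eastern Time\n</td></tr>"
      . "<tr><td> Alpine</td><td> Eastern Time\n</td></tr></table>";
print_r(tableRows($html));
```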