Please help with XPath.
The following script scrapes the main body of a URL using XPath:
<?php
//sentimen order
if (PHP_SAPI != 'cli') {
echo "<pre>";
}
require_once __DIR__ . '/../autoload.php';
$sentiment = new \PHPInsight\Sentiment();
require_once 'Xpath.php';
$startUrl = "http://news.sky.com/story/1445575/suspect-held-over-shooting-of-ferguson-police/";
$xpath = new XPATH($startUrl);
// We start from the root element
$query = '/html/body/div[2]/div[3]/article/div/div[2]/div[2]/p[3]';
$strQuery = $xpath->query($query);
$strNode = $strQuery->item(0)->nodeValue;
$result = array($strNode);
foreach ($result as $string) {
// calculations:
$scores = $sentiment->score($string);
$class = $sentiment->categorise($string);
// output:
echo "Strings $string \n";
echo "Dominant: $class, scores: ";
print_r($scores);
echo "\n";
}
The script above runs fine except for the array loop: XPath does not scrape ALL the content, ONLY the first line of the main body.
I think the problem lies in the array and the foreach loop.
Can anyone please help fix this loop?
You only fetch one paragraph. Additionally you only put one string into the array.
You're perhaps looking for something more along these lines:
foreach ($xpath->query('
//header/h1
|//header/p
|//header//p[@class="last-updated__text"]
|//div[@class="story__content"]/p') as $p) {
echo string_normalize($p->textContent), "\n\n";
}
function string_normalize($string)
{
return preg_replace('~\s+~u', ' ', trim($string));
}
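For readers who do not have the question's Xpath.php wrapper, the same union query can be tried with PHP's built-in DOMDocument and DOMXPath. This is only a minimal sketch: the $html string is a made-up stand-in for the downloaded page, and the real markup may differ.

```php
<?php
// Hypothetical stand-in for the downloaded page; the real markup may differ.
$html = <<<'HTML'
<html><body>
<header><h1>Example  headline</h1><p>Intro
 text</p></header>
<div class="story__content"><p>First paragraph.</p><p>Second paragraph.</p></div>
<div class="other"><p>Ignored.</p></div>
</body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // HTML5 tags such as <header> make libxml emit warnings
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// The union operator (|) returns all matching nodes in document order.
$lines = [];
foreach ($xpath->query('//header/h1 | //header/p | //div[@class="story__content"]/p') as $node) {
    // Collapse runs of whitespace, as string_normalize() does above.
    $lines[] = preg_replace('~\s+~u', ' ', trim($node->textContent));
}

echo implode("\n\n", $lines), "\n";
```

Each matched paragraph becomes its own array entry, so the sentiment loop from the question could iterate over $lines instead of a one-element array.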
Output:
Shooting Of Ferguson Police: Suspect Charged
A prosecutor says the 20-year-old suspect claims he fired the shots in a dispute with other individuals and did not aim at police.
05:19, UK, Monday 16 March 2015
By Sky News US Team
A suspect has been charged in connection with the shooting and wounding last week of two police officers in Ferguson, Missouri.
St Louis County prosecutor Robert McCulloch told a news conference the accused was 20-year-old Jeffrey Williams.
He said the suspect, a local resident, was facing two counts of assault in the first degree.
Williams, who was arrested on Saturday night, is also charged with firing a handgun from a vehicle.
"He has acknowledged his participation in firing the shots," Mr McCulloch told reporters.
...
I have created a script where every other word in a paragraph is green, which is correct. However there is a problem because the original paragraph which I used appears above the new paragraph, which I do not want.
The solution to this may be simple, but I can't get my head around it.
Can anyone point me in the right direction?
Code:
<?php
$storyOfTheDay= "Once upon a time there was an old woman who loved baking gingerbread. She would bake gingerbread cookies, cakes, houses and gingerbread people, all decorated with chocolate and peppermint, caramel candies and colored frosting.
She lived with her husband on a farm at the edge of town. The sweet spicy smell of gingerbread brought children skipping and running to see what would be offered that day.
Unfortunately the children gobbled up the treats so fast that the old woman had a hard time keeping her supply of flour and spices to continue making the batches of gingerbread. Sometimes she suspected little hands of having reached through her kitchen window because gingerbread pieces and cookies would disappear.";
$storyOfTheDay = preg_split("/\s+/", $storyOfTheDay);
//Adding <span> to odd array index items
foreach (array_chunk($storyOfTheDay , 2) as $chunk) {
$storyOfTheDay[] = $chunk[0];
if(!empty( $chunk[1]))
{
$storyOfTheDay[] = $chunk[1]= "<span style='color:green'>". $chunk[1] ."</span>";
}
}
$storyOfTheDay = join(" ", $storyOfTheDay);
echo $storyOfTheDay;
Output:
Image of Output
You keep appending to the same array ($storyOfTheDay), which already holds the original words. Build a new one instead:
$storyOfTheDay = preg_split("/\s+/", $storyOfTheDay);
$newStoryOfTheDay = [];
//Adding <span> to odd array index items
foreach (array_chunk($storyOfTheDay , 2) as $chunk) {
$newStoryOfTheDay[] = $chunk[0];
if( !empty($chunk[1]) ){
$newStoryOfTheDay[] = "<span style='color:green'>". $chunk[1] ."</span>";
}
}
$newStoryOfTheDay = join(" ", $newStoryOfTheDay);
echo $newStoryOfTheDay;
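As an alternative sketch, the chunking can be skipped entirely by testing the word index. The function name highlightEveryOtherWord is made up for illustration.

```php
<?php
// Hypothetical helper: wraps every second word (odd index) in a green <span>.
function highlightEveryOtherWord(string $text): string
{
    $words = preg_split('/\s+/', trim($text));
    foreach ($words as $i => $word) {
        if ($i % 2 === 1) {
            // Overwrite the word in place; no second array is needed.
            $words[$i] = "<span style='color:green'>" . $word . "</span>";
        }
    }
    return implode(' ', $words);
}

echo highlightEveryOtherWord('Once upon a time'), "\n";
```

Because the words are modified in place by index, nothing is ever appended, so the original paragraph cannot end up duplicated in the output.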
I'm sure there's a pretty obvious solution to this problem, but it's eluding me.
I've got an XML feed that I want to pull information from, but only from items with a specific ID. Let's say we have the following XML:
<xml xmlns:p="http://www.example.com/this">
<item>
<name>John</name>
<p:id>1</p:id>
<p:eye>Blue</p:eye>
<p:hair>Black</p:hair>
</item>
<item>
<name>Jake</name>
<p:id>2</p:id>
<p:eye>Hazel</p:eye>
<p:hair>White</p:hair>
</item>
<item>
<name>Amy</name>
<p:id>3</p:id>
<p:eye>Brown</p:eye>
<p:hair>Yellow</p:hair>
</item>
<item>
<name>Tammy</name>
<p:id>4</p:id>
<p:eye>Blue</p:eye>
<p:hair>Black</p:hair>
</item>
<item>
<name>Blake</name>
<p:id>5</p:id>
<p:eye>Green</p:eye>
<p:hair>Red</p:hair>
</item>
</xml>
And I want to pull ONLY the people with IDs 3 and 1 into specific spots on a page (there will be no duplicate IDs; each item has a unique ID). Using SimpleXML and a for loop I can easily display each ITEM on a page using PHP, and with some "if ($item->{'id'} == #)" statements (where # is the ID I'm looking for) I can also display the info for each ID I'm looking for.
The problem I'm running into is how to distribute the information across the page. I'm trying to pull the information into specific spots on a page. My first attempt at distributing the specific fields across the page, shown below, isn't working:
<html>
<head><title>.</title></head>
<body>
<?php
(SimpleXML code / For Loop for each element here...)
?>
<H1>Staff Profiles</h1>
<h4>Maintenance</h4>
<p>Maintenance staff does a lot of work! Meet your super maintenance staff:</p>
<?php
if($ID == 1) {
echo "Name:".$name."<br/>";
echo "Eye Color:".$eye."<br/>";
echo "Hair Color:".$hair."<br/>";
?>
<h4>Receptionists</h4>
<p>Always a smiling face - meet them here:</p>
<?php
if($ID == 3) {
echo "Name:".$name."<br/>";
echo "Eye Color:".$eye."<br/>";
echo "Hair Color:".$hair."<br/>";
?>
<H4>The ENd</h4>
<?php (closing the for loop) ?>
</body>
</html>
But it's not working - it randomly starts repeating elements on my page (not even the XML elements). My method is probably pretty...rudimentary; so a point in the right direction is much appreciated. Any advice?
EDIT:
New (NEW) XPATH code:
$count = 0;
foreach ($sxe->xpath('//item') as $item) {
$item->registerXPathNamespace('p', 'http://www.example.com/this');
$id = $item->xpath('//p:id');
echo $id[$count] . "\n";
echo $item->name . "<br />";
$count++;
}
Use XPath to accomplish this, and write a small function to retrieve a person by ID.
function getPerson($id, $xml) {
return $xml->xpath("//item[id='$id']")[0]; // PHP >= 5.4 required
}
$xml = simplexml_load_string($x); // assume XML in $x
Now, you can (example 1):
echo getPerson(5, $xml)->name;
Output:
Blake
or (example 2):
$a = getPerson(2, $xml);
echo "$a->name has $a->eye eyes and $a->hair hair.";
Output:
Jake has Hazel eyes and White hair.
see it working: http://codepad.viper-7.com/SwLids
EDIT In your HTML, this would probably look like this:
...
<h1>Staff Profiles</h1>
<h4>Maintenance</h4>
<p>Maintenance staff does a lot of work! Meet your super maintenance staff:</p>
<?php
$p = getPerson(4, $xml);
echo "Name: $p->name <br />";
echo "Eye Color: $p->eye <br />";
echo "Hair Color: $p->hair <br />";
?>
No looping required, though.
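If the feed really uses the p: prefix, as the question's EDIT suggests, the lookup has to be namespace-aware. The following is a minimal sketch under that assumption: the namespace URI is taken from the EDIT and assumed to be declared on the root element, and getPersonNs is a made-up name.

```php
<?php
// Sample feed; the namespace URI comes from the question's EDIT and is
// assumed to be declared on the root element.
$x = <<<'XML'
<xml xmlns:p="http://www.example.com/this">
  <item><name>John</name><p:id>1</p:id><p:eye>Blue</p:eye><p:hair>Black</p:hair></item>
  <item><name>Amy</name><p:id>3</p:id><p:eye>Brown</p:eye><p:hair>Yellow</p:hair></item>
</xml>
XML;

// Hypothetical helper: returns the matching <item>, or null if the ID is unknown.
function getPersonNs(string $id, SimpleXMLElement $xml): ?SimpleXMLElement
{
    // The prefix must be registered before it can be used in an XPath query.
    $xml->registerXPathNamespace('p', 'http://www.example.com/this');
    $matches = $xml->xpath("//item[p:id='$id']");
    return $matches ? $matches[0] : null;
}

$xml = simplexml_load_string($x);
$person = getPersonNs('3', $xml);

// Namespaced child elements are reached through children().
$p = $person->children('http://www.example.com/this');
echo "{$person->name} has {$p->eye} eyes and {$p->hair} hair.\n";
```

The ID is interpolated straight into the XPath string, so this is only reasonable for trusted values such as hard-coded IDs.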
The first thing that popped into my mind is to use a numerical offset (which is zero-based in SimpleXML), as there is a strict correlation between the offset and the ID: the offset is always the ID minus one:
$items = $xml->item;
$id = 3;
$person = $items[$id - 1];
echo $person->id, "\n"; // prints "3"
But that would work only if the first element has ID 1 and each following element has an ID exactly one higher than its previous sibling.
We could assume that from the sample XML given; however, I suspect this is not the case. So the next thing that can be done is to still use the offset, but this time create a map between IDs and offsets:
$items = $xml->item;
$offset = 0;
$idMap = [];
foreach ($items as $item) {
$idMap[(string) $item->id] = $offset; // cast: a SimpleXMLElement is not a valid array key
$offset++;
}
With that new $idMap map, you then can get each item based on the ID:
$id = 3;
$person = $items[$idMap[$id]];
Such a map is useful in case you know that you need that more than once, because creating the map is somewhat extra work you need to do.
So let's see if there ain't something built-in that solves the issue already. Maybe there is some code out there that shows how to find an element in simplexml with a specific attribute value?
SimpleXML: Selecting Elements Which Have A Certain Attribute Value (Reference Question)
Read and take value of XML attributes - Especially because of the answer on how to add the functionality to SimpleXMLElement transparently.
Which leads to the point you could do it as outlined in that answer that shows how it works transparently like this:
$person = $items->attribute("id", $id);
I hope this is helpful.
I want to parse Google News rss with PHP. I managed to run this code:
<?php
$news = simplexml_load_file('http://news.google.com/news?pz=1&cf=all&ned=us&hl=en&topic=n&output=rss');
foreach($news->channel->item as $item) {
echo "<strong>" . $item->title . "</strong><br />";
echo strip_tags($item->description) ."<br /><br />";
}
?>
However, I'm unable to solve following problems. For example:
How can I get the hyperlink of the news title?
Each Google News item has many related news links in the footer (and my code above includes them too). How can I remove those from the description?
How can I also get the image for each news item? (Google displays a thumbnail image for each one.)
Thanks.
There we go, just what you need for your particular situation:
<?php
$news = simplexml_load_file('http://news.google.com/news?pz=1&cf=all&ned=us&hl=en&topic=n&output=rss');
$feeds = array();
$i = 0;
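foreach ($news->channel->item as $item)
{
// Cast once: a SimpleXMLElement is not a plain string
$description = (string) $item->description;
preg_match('#src="([^"]+)"#', $description, $match);
$parts = explode('<font size="-1">', $description);
$feeds[$i]['title'] = (string) $item->title;
$feeds[$i]['link'] = (string) $item->link;
$feeds[$i]['image'] = isset($match[1]) ? $match[1] : null; // some items have no thumbnail
$feeds[$i]['site_title'] = isset($parts[1]) ? strip_tags($parts[1]) : '';
$feeds[$i]['story'] = isset($parts[2]) ? strip_tags($parts[2]) : '';
$i++;
}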
echo '<pre>';
print_r($feeds);
echo '</pre>';
?>
And the output should look like this:
[2] => Array
(
[title] => Los Alamos Nuclear Lab Under Siege From Wildfire - ABC News
[link] => http://news.google.com/news/url?sa=t&fd=R&usg=AFQjCNGxBe4YsZArH0kSwEjq_zDm_h-N4A&url=http://abcnews.go.com/Technology/wireStory?id%3D13951623
[image] => http://nt2.ggpht.com/news/tbn/OhH43xORRwiW1M/6.jpg
[site_title] => ABC News
[story] => A wildfire burning near the desert birthplace of the atomic bomb advanced on the Los Alamos laboratory and thousands of outdoor drums of plutonium-contaminated waste Tuesday as authorities stepped up ...
)
I'd recommend checking out SimplePie. I've used it for several different projects and it works great (and abstracts away all of the headache you're currently dealing with).
Now, if you're writing this code simply because you want to learn how to do it, you should probably ignore this answer. :)
To get the URL for a news item, use $item->link.
If there's a common delimiter for the related news links, you could use regex to cut off everything after it.
Google puts the thumbnail image HTML code inside the description field of the feed. You could regex out everything between the open and close brackets for the image declaration to get the HTML for it.
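As an alternative to a regex, the description HTML can be handed to DOMDocument to read the image's src attribute. This is a minimal sketch: the $description string is invented for illustration, and firstImageSrc is a made-up helper name.

```php
<?php
// Hypothetical description HTML; the real feed markup may differ and change over time.
$description = '<table><tr><td><img src="http://example.com/thumb.jpg" alt=""></td>'
    . '<td><font size="-1">Example News</font><br><font size="-1">Story text here.</font></td></tr></table>';

// Hypothetical helper: returns the src of the first <img>, or null if there is none.
function firstImageSrc(string $html): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate sloppy feed markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $img = $doc->getElementsByTagName('img')->item(0);
    return $img ? $img->getAttribute('src') : null;
}

echo firstImageSrc($description), "\n";
```

With SimpleXML, the description would need a string cast first, for example firstImageSrc((string) $item->description).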
I want to insert some values from the first and second foreach into a database, but I ran into trouble. I described my problem in the code comments. I cannot solve the two-loop problem and am asking for help.
<?php
header('Content-type:text/html; charset=utf-8');
set_time_limit(0);
require_once ('../conn.php');
require_once ('../simple_html_dom.php');
$url = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&rsz=large&q=obama&key={api-key}";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, $url);
$body = curl_exec($ch);
curl_close($ch);
$data = json_decode($body);
foreach ($data->responseData->results as $result) {
$title = html_entity_decode($result->titleNoFormatting);
$link = html_entity_decode($result->unescapedUrl);
$html = @file_get_html($link );
foreach(@$html->find('h3') as $element) {
$table=$element;
echo $table;// here while the $table is empty, echo is null.
}
echo $table;// here while the $table is empty, echo will repeat the prev $table value.
mysql_query("SET NAMES utf8");
mysql_query("INSERT INTO ...");// I want insert all the $title and $table into database.
echo '<hr />';
}
?>
When I print the results and $table is empty, echo repeats the previous $table value:
Organizing for America | BarackObama.com
Barack Obama - Wikipedia, the free encyclopedia
President Barack Obama | The White House
President Obama Nominates William Francis Kuntz, II to the United States District Court//the prev value
Change.gov - The Official Web Site of the
President Obama Nominates William Francis Kuntz, II to the United States District Court//here the $table is empty, it will repeat the prev $table value, and it should be empty.
Barack Obama on Myspace
Idle Friends▼
ob (obama) on Twitter
Piè di pagina
Barack Obama
Advertise with the NY Daily News!
Barack Obama on the Issues
Voting Record
PHP's variable initialization and scoping rules are kind of funny.
At no point are you initializing $table. It first gets referenced two foreaches deep. PHP allows this, and won't complain about it.
The problem is that you're constantly trying to set it to a value, but you're never actually resetting it.
Initialize it to null before the inner foreach:
$html = file_get_html($link );
$table = null; // <-- New!
foreach($html->find('h3') as $element) {
$table = $element;
echo $table;
}
This ensures that, when the foreach is completed, $table will either be null, or it will be the final H3 element in the HTML document you fetched. (Incidentally, if you really did want the final H3, you can probably just grab the array that find returns and look at the last element rather than looping through.)
Also, please get rid of the @ error-silencing operators, turn error_reporting all the way up, and make sure you've turned on display_errors. You may have other errors lurking that you are intentionally ignoring, and that leads to horror stories.
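The two suggestions above (taking the last H3 without looping, and turning error reporting up) can be sketched as follows. Note that this swaps simple_html_dom for PHP's built-in DOMDocument so the example is self-contained; lastHeadingText and the sample pages are made up for illustration.

```php
<?php
// Report everything while debugging.
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Hypothetical helper: text of the last <h3> in the page, or null if there is none.
function lastHeadingText(string $html): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect HTML parser warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    $headings = $doc->getElementsByTagName('h3');
    if ($headings->length === 0) {
        return null;                    // nothing carries over from a previous page
    }
    return trim($headings->item($headings->length - 1)->textContent);
}

// Made-up pages standing in for the fetched search results.
$pages = [
    '<html><body><h3>First</h3><h3>Last</h3></body></html>',
    '<html><body><p>No headings here</p></body></html>',
];

$tables = [];
foreach ($pages as $page) {
    $tables[] = lastHeadingText($page);
}
var_dump($tables);
```

Because the value is computed fresh for every page, a page without an H3 yields null instead of repeating the previous page's heading.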
Here is the page I want to parse
(the API link I gave is just a dev test, so it's OK for it to be public):
http://api.scribd.com/api?method=docs.getList&api_key=2apz5npsqin3cjlbj0s6m
The output I'm looking for is something like this (for now):
Doc_id: 29638658
access_key: key-11fg37gwmer54ssq56l3
secret_password: 1trinfqri6cnv3gf6rnl
title: Sample
description: k
thumbnail_url: http://i6.scribdassets.com/public/images/uploaded/152418747/xTkjCwQaGf_thumbnail.jpeg
page_count: 100
I've tried everything I can find on the internet, but nothing works well. I have this one script:
<?php
$xmlDoc = new DOMDocument();
$xmlDoc->load("http://api.scribd.com/api?method=docs.getList&api_key=2apz5npsqin3cjlbj0s6m");
$x = $xmlDoc->documentElement;
foreach ($x->childNodes AS $item) {
print $item->nodeName . " = " . $item->nodeValue;
}
?>
Its output comes out like this:
#text =
resultset =
29638658
key-11fg37gwmer54ssq56l3
1trinfqri6cnv3gf6rnl
Sample
k
http://i6.scribdassets.com/public/images/uploaded/152418747/xTkjCwQaGf_thumbnail.jpeg
DONE
100
29713260
key-18a9xret4jf02129vlw8
25fjsmmvl62l4cbwd1vq
book2
description bla bla
http://i6.scribdassets.com/public/images/uploaded/153065528/oLVqPZMu3zhsOn_thumbnail.jpeg
DONE
7
#text =
I need major help. I'm really stuck and don't know what to do. Please help me. Thanks.
I recommend you load the XML data into a new SimpleXmlElement object as this will allow you to run xpath queries against the document.
You will need to do a little research on how it works, but here's a few pointers...
Execute the xpath like so:
// $xml is a SimpleXMLElement object
$xml = simplexml_load_file('/path/to/file');
$nodes = $xml->xpath('/xpathquery');
A single / represents the root node (in your case rsp). A double slash represents any matching node. For example //title would return all titles. Each result of an xpath query is an array of SimpleXMLElements. You can get data from it like so:
# Untested
$xml = simplexml_load_file('/path/to/file');
$nodes = $xml->xpath('//result');
foreach ($nodes as $node) {
// Print out the value in title
echo $node->title;
}
// Print out the amount of results
echo $xml->rsp->attributes()->totalResultsAvailable;
The final example with the results numbers may not work, but it is along those lines.
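A self-contained variant of the above can be run against an inline string. This is only a sketch: the $response XML is a guess at the response shape based on the output shown in the question, and the totalResultsAvailable attribute on resultset is an assumption.

```php
<?php
// Hypothetical response, shaped after the output shown in the question.
// The totalResultsAvailable attribute on <resultset> is an assumption.
$response = <<<'XML'
<rsp stat="ok">
  <resultset totalResultsAvailable="2">
    <result><doc_id>29638658</doc_id><title>Sample</title></result>
    <result><doc_id>29713260</doc_id><title>book2</title></result>
  </resultset>
</rsp>
XML;

$xml = simplexml_load_string($response);

// //result matches every <result> element anywhere in the document.
$titles = [];
foreach ($xml->xpath('//result') as $node) {
    $titles[] = (string) $node->title;
}
echo implode(', ', $titles), "\n";

// $xml already is the root (<rsp>), so attributes of <resultset> are read like this:
echo (string) $xml->resultset['totalResultsAvailable'], "\n";
```

Since simplexml_load_string() returns the root element itself, there is no extra ->rsp step when navigating from $xml.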
With "$x->childNodes" you get only the direct children. You might want to check the PHP documentation:
http://php.net/manual/en/class.domdocument.php
And I think that this one could be better:
http://si.php.net/manual/en/book.simplexml.php
Quick and dirty, the way things are supposed to be:
<?php
$x = simplexml_load_file('http://api.scribd.com/api?method=docs.getList&api_key=2apz5npsqin3cjlbj0s6m');
foreach($x->resultset->result as $v) {
foreach((array)$v as $kk=>$vv) {
echo $kk.": ".trim($vv)."<br>\n";
}
echo "<br><br>\n";
}
This will output in the format you requested:
doc_id: 29638658
access_key: key-11fg37gwmer54ssq56l3
secret_password: 1trinfqri6cnv3gf6rnl
title: Sample
description: k
thumbnail_url: http://i6.scribdassets.com/public/images/uploaded/152418747/xTkjCwQaGf_thumbnail.jpeg
conversion_status: DONE
page_count: 100
doc_id: 29713260
access_key: key-18a9xret4jf02129vlw8
secret_password: 25fjsmmvl62l4cbwd1vq
title: book2
description: description bla bla
thumbnail_url: http://i6.scribdassets.com/public/images/uploaded/153065528/oLVqPZMu3zhsOn_thumbnail.jpeg
conversion_status: DONE
page_count: 7