I have created a script where every other word in a paragraph is green, which is correct. However there is a problem because the original paragraph which I used appears above the new paragraph, which I do not want.
This solution to this may be simple but I can't get my head around it.
Can anyone point me in the right direction?
Code:
<?php
$storyOfTheDay= "Once upon a time there was an old woman who loved baking gingerbread. She would bake gingerbread cookies, cakes, houses and gingerbread people, all decorated with chocolate and peppermint, caramel candies and colored frosting.
She lived with her husband on a farm at the edge of town. The sweet spicy smell of gingerbread brought children skipping and running to see what would be offered that day.
Unfortunately the children gobbled up the treats so fast that the old woman had a hard time keeping her supply of flour and spices to continue making the batches of gingerbread. Sometimes she suspected little hands of having reached through her kitchen window because gingerbread pieces and cookies would disappear.";
$storyOfTheDay = preg_split("/\s+/", $storyOfTheDay);
//Adding <span> to odd array index items
foreach (array_chunk($storyOfTheDay , 2) as $chunk) {
$storyOfTheDay[] = $chunk[0];
if(!empty( $chunk[1]))
{
$storyOfTheDay[] = $chunk[1]= "<span style='color:green'>". $chunk[1] ."</span>";
}
}
$storyOfTheDay = join(" ", $storyOfTheDay);
echo $storyOfTheDay;
Output:
Image of Output
You are continuously filling the same array ($storyOfTheDay). Make the new one:
$storyOfTheDay = preg_split("/\s+/", $storyOfTheDay);
$newStoryOfTheDay = [];
//Adding <span> to odd array index items
foreach (array_chunk($storyOfTheDay , 2) as $chunk) {
$newStoryOfTheDay[] = $chunk[0];
if( !empty($chunk[1]) ){
$newStoryOfTheDay[] = "<span style='color:green'>". $chunk[1] ."</span>";
}
}
$newStoryOfTheDay = join(" ", $newStoryOfTheDay);
echo $newStoryOfTheDay;
Related
Kindly please help regarding Xpath...
Following scripts will scraping the main body of URL by using Xpath
<?php
//sentimen order
if (PHP_SAPI != 'cli') {
echo "<pre>";
}
require_once __DIR__ . '/../autoload.php';
$sentiment = new \PHPInsight\Sentiment();
require_once 'Xpath.php';
$startUrl = "http://news.sky.com/story/1445575/suspect-held-over-shooting-of-ferguson-police/";
$xpath = new XPATH($startUrl);
// We starts from the root element
$query = '/html/body/div[2]/div[3]/article/div/div[2]/div[2]/p[3]';
$strQuery = $xpath->query($query);
$strNode = $strQuery->item(0)->nodeValue;
$result = array($strNode);
foreach ($result as $string) {
// calculations:
$scores = $sentiment->score($string);
$class = $sentiment->categorise($string);
// output:
echo "Strings $string \n";
echo "Dominant: $class, scores: ";
print_r($scores);
echo "\n";
}
Above scripts run well except the array loop...Xpath does not scraping ALL content but ONLY the first line of main body..
I think the problem lies from array loop and foreach...
Anyone please help to fix this looping....
You only fetch one paragraph. Additionally you only put one string into the array.
You're perhaps looking for something more along this lines:
foreach ($xpath->query('
//header/h1
|//header/p
|//header//p[#class="last-updated__text"]
|//div[#class="story__content"]/p') as $p) {
echo string_normalize($p->textContent), "\n\n";
}
function string_normalize($string)
{
return preg_replace('~\s+~u', ' ', trim($string));
}
Output:
Shooting Of Ferguson Police: Suspect Charged
A prosecutor says the 20-year-old suspect claims he fired the shots in a dispute with other individuals and did not aim at police.
05:19, UK, Monday 16 March 2015
By Sky News US Team
A suspect has been charged in connection with the shooting and wounding last week of two police officers in Ferguson, Missouri.
St Louis County prosecutor Robert McCulloch told a news conference the accused was 20-year-old Jeffrey Williams.
He said the suspect, a local resident, was facing two counts of assault in the first degree.
Williams, who was arrested on Saturday night, is also charged with firing a handgun from a vehicle.
"He has acknowledged his participation in firing the shots," Mr McCulloch told reporters.
...
I want to parse Google News rss with PHP. I managed to run this code:
<?
$news = simplexml_load_file('http://news.google.com/news?pz=1&cf=all&ned=us&hl=en&topic=n&output=rss');
foreach($news->channel->item as $item) {
echo "<strong>" . $item->title . "</strong><br />";
echo strip_tags($item->description) ."<br /><br />";
}
?>
However, I'm unable to solve following problems. For example:
How can i get the hyperlink of the news title?
As each of the Google news has many related news links in footer, (and my code above includes them also). How can I remove those from the description?
How can i get the image of each news also? (Google displays a thumbnail image of each news)
Thanks.
There we go, just what you need for your particular situation:
<?php
$news = simplexml_load_file('http://news.google.com/news?pz=1&cf=all&ned=us&hl=en&topic=n&output=rss');
$feeds = array();
$i = 0;
foreach ($news->channel->item as $item)
{
preg_match('#src="([^"]+)"#', $item->description, $match);
$parts = explode('<font size="-1">', $item->description);
$feeds[$i]['title'] = (string) $item->title;
$feeds[$i]['link'] = (string) $item->link;
$feeds[$i]['image'] = $match[1];
$feeds[$i]['site_title'] = strip_tags($parts[1]);
$feeds[$i]['story'] = strip_tags($parts[2]);
$i++;
}
echo '<pre>';
print_r($feeds);
echo '</pre>';
?>
And the output should look like this:
[2] => Array
(
[title] => Los Alamos Nuclear Lab Under Siege From Wildfire - ABC News
[link] => http://news.google.com/news/url?sa=t&fd=R&usg=AFQjCNGxBe4YsZArH0kSwEjq_zDm_h-N4A&url=http://abcnews.go.com/Technology/wireStory?id%3D13951623
[image] => http://nt2.ggpht.com/news/tbn/OhH43xORRwiW1M/6.jpg
[site_title] => ABC News
[story] => A wildfire burning near the desert birthplace of the atomic bomb advanced on the Los Alamos laboratory and thousands of outdoor drums of plutonium-contaminated waste Tuesday as authorities stepped up ...
)
I'd recommend checking out SimplePie. I've used it for several different projects and it works great (and abstracts away all of the headache you're currently dealing with).
Now, if you're writing this code simply because you want to learn how to do it, you should probably ignore this answer. :)
To get the URL for a news item, use $item->link.
If there's a common delimiter for the related news links, you could use regex to cut off everything after it.
Google puts the thumbnail image HTML code inside the description field of the feed. You could regex out everything between the open and close brackets for the image declaration to get the HTML for it.
In a few different guises I've asked about this "filter" on here and WPSE. I'm now taking a different approach to it, and I'd like to make it solid and reliable.
My situation:
When I create a post in my WordPress CMS, I want to run a filter which searches for certain terms and replaces them with links.
I have the terms that I want to search for in two arrays: $glossary_terms and $species_terms.
$species_terms is a list of scientific names of fishes, such as Apistogramma panduro.
$glossary_terms is a list of fishkeeping glossary terms such as abdomen, caudal-fin and Gram's Method.
There are a few nuances worth noting:
Speed is not an issue, as I will be running this filter in the background rather than when a user visits the page or whan an author submits/edits a species profile or post.
Some of the post content being filtered may contain HTML with these terms in, like <img src="image.jpg" title="Apistogramma panduro male" />. Obviously these shouldn't be replaced.
Species are often referred to with an abbreviated Genus, so instead of Apistogramma panduro, you'll often see A. panduro. This means I need to search & replace all of the species terms as an abbreviation too - Apistogramma panduro, A. panduro, Satanoperca daemon, S. daemon etc.
If caudal-fin and caudal both exist in the glossary terms, caudal-fin should be replaced first.
I was contemplating simply adding a preg_replace which searched for the terms, but only with a space on the left, (i.e. ( )term) and a space, comma, exclamation, full-stop or hyphen on the right (i.e. term(, . ! - )) but that won't help me to not break the image HTML.
Example content
<br />
It looks very similar to fishes of the <i>B. foerschi</i> group/complex but its breeding strategy, adult size and observed behaviour preclude its inclusion in that assemblage.
Instead it appears to be a member of the <i>B. coccina</i> group which currently includes <i>B. brownorum</i>, <i>B. burdigala</i>, <i>B. coccina</i>, <i>B. livida</i>, <i>B. miniopinna</i>, <i>B. persephone</i>, <i>B. tussyae</i>, <i>B. rutilans</i> and <i>B. uberis</i>.
Of these it's most similar in appearance to <i>B. uberis</i> but can be distinguished by its noticeably shorter dorsal-fin base and overall blue-greenish (vs. green/reddish) colouration.
Members of this group are characterised by their small adult size (< 40 mm SL), a uniform red or black base body colour, the presence of a midlateral body blotch in some species and the fact they have 9 abdominal vertebrae compared with 10-12 in the other species groups. In addition all are obligate peat swamp dwellers (Tan and Ng, 2005).<br />
^^^ This example here has had the correct links manually inserted. The filter shouldn't break these links!
It looks very similar to fishes of the B. foerschi group/complex but its breeding strategy, adult size and observed behaviour preclude its inclusion in that assemblage.
Instead it appears to be a member of the B. coccina group which currently includes B. brownorum, B. burdigala, B. coccina, B. livida, B. miniopinna, B. persephone, B. tussyae, B. rutilans and B. uberis.
Of these it's most similar in appearance to B. uberis but can be distinguished by its noticeably shorter dorsal-fin base and overall blue-greenish (vs. green/reddish) colouration.
Members of this group are characterised by their small adult size (< 40 mm SL), a uniform red or black base body colour, the presence of a midlateral body blotch in some species and the fact they have 9 abdominal vertebrae compared with 10-12 in the other species groups. In addition all are obligate peat swamp dwellers (Tan and Ng, 2005).
^^^ Same example pre-formatting.
[caption id="attachment_542" align="alignleft" width="125" caption="Amazonas Magazine - now in English!"]<img class="size-thumbnail wp-image-542" title="Amazonas English" src="/wp-content/uploads/2011/12/Amazonas-English-1-288x381.jpg" alt="Amazonas English" width="125" height="165" />[/caption]
Edited by Hans-Georg Evers, the magazine 'Amazonas' has been widely-regarded as among the finest regular publications in the hobby since its launch in 2005, an impressive achievment considering it's only been published in German to date. The long-awaited English version is just about to launch, and we think a subscription should be top of any serious fishkeeper's Xmas list...
The magazine is published in a bi-monthly basis and the English version launches with the January/February 2012 issue with distributors already organised in the United States, Canada, the United Kingdom, South Africa, Australia, and New Zealand. There are also mobile apps availablen which allow digital subscribers to read on portable devices.
It's fair to say that there currently exists no better publication for dedicated hobbyists with each issue featuring cutting-edge articles on fishes, invertebrates, aquatic plants, field trips to tropical destinations plus the latest in husbandry and breeding breakthroughs by expert aquarists, all accompanied by excellent photography throughout.
U.S. residents can subscribe to the printed edition for just $29 USD per year, which also includes a free digital subscription, with the same offer available to Canadian readers for $41 USD or overseas subscribers for $49 USD. Please see the Amazonas website for further information and a sample digital issue!
Alternatively, subscribe directly to the print version here or digital version here.
^^^ This will likely only have a few Glossary terms in rather than any species links.
Example terms
$species_terms
339 => 'Aulonocara maylandi maylandi',
340 => 'Aulonocara maylandi kandeensis',
341 => 'Aulonocara sp. "walteri"',
342 => 'Aulonocara sp. "stuartgranti maleri"',
343 => 'Aulonocara stuartgranti',
344 => 'Benthochromis tricoti',
345 => 'Boulengerochromis microlepis',
346 => 'Buccochromis lepturus',
347 => 'Buccochromis nototaenia',
348 => 'Betta brownorum',
349 => 'Betta foerschi',
350 => 'Betta coccina',
351 => 'Betta uberis'
As you can see above, the general format for these scientific names is "Genus species", but can often include "sp." or "aff." (for species which aren't officially described) and "Genus species subspecies" formats.
$glossary_terms
1 => 'abdomen',
2 => 'caudal',
3 => 'caudal-fin',
4 => 'caudal-fin peduncle',
5 => 'Gram\'s Method'
If anyone can come up with a filter which meets all these conditions and requirements, I'd like to offer a bounty.
Thanks in advance,
I think it's better to use DOMDocument functionality than regexps. Here is a working prototype:
// Each dynamically constructed regexp will contain at most 70 subpatterns
define('GROUPS_PER_REGEXPS', 70);
$speciesTerms = array(
339 => '(?:Aulonocara|A\.) maylandi maylandi',
340 => '(?:Aulonocara|A\.) maylandi kandeensis',
344 => '(?:Benthochromis|B\.) tricoti',
345 => '(?:Boulengerochromis|B\.) microlepis',
);
function matchTerms($text) {
// Globals are not good. I left it for the simplicity
global $speciesTerms;
$result = array();
$t = 0;
$speciesCount = count($speciesTerms);
reset($speciesTerms);
while ($t < $speciesCount) {
// Maps capturing group identifiers to term ids
$termMapping = array();
// Dynamically construct regexp
$groups = '';
$c = 1;
while (list($termId, $termPattern) = each($speciesTerms)) {
if (!empty($groups)) {
$groups .= '|';
}
// Match word boundaries, so we don't capture "B. tricotisomeramblingstring"
$groups .= '(\b' . $termPattern . '\b)';
$termMapping[$c++] = $termId;
if (++$t % GROUPS_PER_REGEXPS == 0) {
break;
}
}
$regexp = "/$groups/m";
preg_match_all($regexp, $text, $matches, PREG_OFFSET_CAPTURE);
for ($i = 1; $i < $c; $i++) {
foreach ($matches[$i] as $matchData) {
// matchData[0] holds matched string, e.g. Benthochromis tricoti
// matchData[1] holds offset, e.g. 15
if (isset($matchData[0]) && !empty($matchData[0])) {
$result[] = array(
'text' => $matchData[0],
'offset' => $matchData[1],
'id' => $termMapping[$i],
);
}
}
}
}
// Sort by offset in descending order
usort($result, function($a, $b) {
return $a['offset'] > $b['offset'] ? -1 : 1;
});
return $result;
}
$doc = DOMDocument::loadHTML($html);
// Stack will be used to avoid recursive functions
$stack = new SplStack;
$stack->push($doc);
while (!$stack->isEmpty()) {
$node = $stack->pop();
if ($node->nodeType == XML_TEXT_NODE && $node->parentNode instanceof DOMElement) {
// $node represents text node
// and it's inside a tag (second condition in the statement above)
// Check that this text is not wrapped in <a> tag
// as we don't want to wrap it twice
if ($node->parentNode->tagName != 'a') {
$matches = matchTerms($node->wholeText);
foreach ($matches as $match) {
// Create new link element in the DOM
$link = $doc->createElement('a', $match['text']);
$link->setAttribute('href', 'species/' . $match['id']);
$link->setAttribute('class', 'link_species');
// Save the text after the link
$remainingText = $node->splitText($match['offset'] + strlen($match['text']));
// Save the text before the link
$linkText = $node->splitText($match['offset']);
// Replace $linkText with $link node
// i.e. 'something' becomes 'something'
$node->parentNode->replaceChild($link, $linkText);
}
}
}
if ($node->hasChildNodes()) {
foreach ($node->childNodes as $childNode) {
$stack->push($childNode);
}
}
}
$body = $doc->getElementsByTagName('body');
echo $doc->saveHTML($body->item(0));
Implementation details
I've only showed how to replace species terms, glossary terms will be same. Links are formed in form "species/$id". Abbreviations are handled correctly. DOMDocument is a very reliable parser, it can deal with broken markup and is fast.
?: in regexp allows not to count this subpattern as a capturing group (documentation on subpatterns). Without proper counting of subpatterns, we can't retrieve the termId. The idea is that we build a big regexp pattern by joining all regexps specified in $speciesTerms array and separating them with a pipe |. Final regexp for the first two species would be (spaces for clarity):
First capturing group Alternation Second capturing group
( (?:Aulonocara|A\.) maylandi maylandi ) | ( (?:Aulonocara|A\.) maylandi kandeensis )
So, the text "Examples: Aulonocara maylandi maylandi, A. maylandi kandeensis" will give following matches:
$matches[1] = array('Aulonocara maylandi maylandi') // Captured by the first group
$matches[2] = array('A. maylandi kandeensis') // Captured by the second group
We can clearly say that all elements in matches[1] are referring to the species Aulonocara maylandi maylandi or A. maylandi maylandi which has id = 339.
In short: Use (?:) if you're using subpatterns in $speciesTerms.
UPDATE
Each dynamically created regexp has a limit on maximal number of subpatterns, which is defined as a const at the top. This allows avoiding PCRE limit on number of subpatterns in regexp.
Important notes:
If you have a lot of terms you should rewrite matchTerms, because regexp has a limit on a number of subpatterns. In this case it's optimal to prebuild array of regexps out of every N terms.
matchTerms generates regexp at every call, obviously it can be done only once
It's possible to use advanced regexps in speciesTerms
strlen => mb_strlen if you're using multibyte encodings
Supplied $html will be wrapped in a <body> tag (unless it's already wrapped)
It would be better to parse the HTML rather than trying to use regular expressions. Regex is good when you have something specific you want to match, but gets quirky when you're trying to NOT match certain things.
Using http://simplehtmldom.sourceforge.net/ :
function addLinks(&$p, $species, $terms) {
// much easier to say "not in an anchor tag" with parsed content than with regex
if ($p->tag != 'a') {
// pull out existing elements so they aren't replaced
$children = array();
$x = 0;
foreach ($p->children as &$e) {
$children[] = $e->outertext;
$e->outertext = '---child-'.$x.'---';
$x++;
}
foreach($species as $s) {
$p->innertext = str_replace(
$s,
''.$s.'',
$p->innertext);
}
foreach($term as $t) {
$p->innertext = str_replace(
$t,
'<a href="glossary/'.
strtolower($t[0]).'/'.
strtolower(str_replace(' ','-',$t)).'">'.$t.'</a>',
$p->innertext);
}
// restore previous child elements
foreach($children as $x => $e) {
$p->innertext = str_replace('---child-'.$x.'---', $e, $p->innertext);
}
foreach ($p->children() as &$e) {
addLinks($e, $species, $terms);
}
}
}
$html = new simple_html_dom();
// you may have to wrap $content in a div. not exactly sure how partial content is handled
$html->load($content);
addLinks($html, $species_terms, $glossary_terms);
$content = $html->save();
I haven't used simple_html_dom a whole lot, but that should get you pointed in the right direction.
I want to parse Google News rss with PHP. I managed to run this code:
<?
$news = simplexml_load_file('http://news.google.com/news?pz=1&cf=all&ned=us&hl=en&topic=n&output=rss');
foreach($news->channel->item as $item) {
echo "<strong>" . $item->title . "</strong><br />";
echo strip_tags($item->description) ."<br /><br />";
}
?>
However, I'm unable to solve following problems. For example:
How can i get the hyperlink of the news title?
As each of the Google news has many related news links in footer, (and my code above includes them also). How can I remove those from the description?
How can i get the image of each news also? (Google displays a thumbnail image of each news)
Thanks.
There we go, just what you need for your particular situation:
<?php
$news = simplexml_load_file('http://news.google.com/news?pz=1&cf=all&ned=us&hl=en&topic=n&output=rss');
$feeds = array();
$i = 0;
foreach ($news->channel->item as $item)
{
preg_match('#src="([^"]+)"#', $item->description, $match);
$parts = explode('<font size="-1">', $item->description);
$feeds[$i]['title'] = (string) $item->title;
$feeds[$i]['link'] = (string) $item->link;
$feeds[$i]['image'] = $match[1];
$feeds[$i]['site_title'] = strip_tags($parts[1]);
$feeds[$i]['story'] = strip_tags($parts[2]);
$i++;
}
echo '<pre>';
print_r($feeds);
echo '</pre>';
?>
And the output should look like this:
[2] => Array
(
[title] => Los Alamos Nuclear Lab Under Siege From Wildfire - ABC News
[link] => http://news.google.com/news/url?sa=t&fd=R&usg=AFQjCNGxBe4YsZArH0kSwEjq_zDm_h-N4A&url=http://abcnews.go.com/Technology/wireStory?id%3D13951623
[image] => http://nt2.ggpht.com/news/tbn/OhH43xORRwiW1M/6.jpg
[site_title] => ABC News
[story] => A wildfire burning near the desert birthplace of the atomic bomb advanced on the Los Alamos laboratory and thousands of outdoor drums of plutonium-contaminated waste Tuesday as authorities stepped up ...
)
I'd recommend checking out SimplePie. I've used it for several different projects and it works great (and abstracts away all of the headache you're currently dealing with).
Now, if you're writing this code simply because you want to learn how to do it, you should probably ignore this answer. :)
To get the URL for a news item, use $item->link.
If there's a common delimiter for the related news links, you could use regex to cut off everything after it.
Google puts the thumbnail image HTML code inside the description field of the feed. You could regex out everything between the open and close brackets for the image declaration to get the HTML for it.
Hey, basically what i am trying to do is automatically assign Tags to a user input string. Now i have 5 tags to be assigned. Each tag will have around 10 keywords. A String can only be assigned one tag. In order to assign tag to string, i need to search for words matching keywords for all the five tags.
Example:
TAGS: Keywords
Drink: Beer, whiskey, drinks, drink, pint, peg.....
Fitness: gym, yoga, massage, exercise......
Apparels: men's shirt, shirt, dress......
Music: classical, western, sing, salsa.....
Food: meal, grilled, baked, delicious.......
User String: Take first step to reach your fitness goals, Pay Rs 199 for Aerobics, Yoga, Kick Boxing, Bollywood Dance and more worth Rs 1000 at The very Premium F Chisel Bounce, Koramangala.
Now i need to decide upon a tag for the above string. I need an time efficient algorithm for this problem. I don't know how to go about matching keywords for strings but i do have a thought about deciding tag. I was thinking to maintain an array count for each tag and as a keyword is matched count for respective tag is increased. if at any time count for any tag reaches 5 we can stop and decide on that tag only this will save us from searching the whole thing.
Please give any advice you have on this. I will be using php just so you know.
thanks
Interesting topic! What you are looking for is something similar to latent semantic indexing. There is questing here.
If the number of tags and keywords is small I would save me writing a complex algorithm and simply do:
$tags = array(
'drink' => array('beer', 'whiskey', ...),
...
);
$string = 'Take first step ...';
$bestTag = '';
$bestTagCount = 0;
foreach ($tags as $tag => $keywords) {
$count = 0;
foreach ($keywords as $keyword) {
$count += substr_count($string, $keyword);
}
if ($count > $bestTagCount) {
$bestTagCount = $count;
$bestTag = $tag;
}
}
var_dump($bestTag);
The algorithm is pretty obvious, but only suited for a small number of tags/keywords.
If you dont mind using an external API, you should try one of these:
http://www.zemanta.com/
http://www.opencalais.com/
Benjamin Nowack: Linked Data Entity Extraction with Zemanta and OpenCalais
To give an example, Zemanta will return the following tags (among other things) for your User String:
Bollywood, Kickboxing, Koramangala, Aerobics, Boxing, Sports, India, Asia
Open Calais will return
Sports, Hospitality Recreation, Health, Recreation, Human behavior, Kick, Yoga, Chisel
Aerobics, Meditation, Indian philosophy, Combat sports, Aerobic exercise, Exercise