Printing out an array to a file - php

I'm stuck on a particular task. As you can see, I'm extracting hrefs and titles from a webpage and I need to write this information to a file. How can this array be printed in order like this: href1 : title1 , href2 : title2 and so on?
<?php
$searched = file_get_contents('http://technologijos.lt');
$xml = new DOMDocument();
@$xml->loadHTML($searched);
foreach($xml->getElementsByTagName('a') as $lnk)
{
$links[] = array(
'href' => $lnk->getAttribute('href'),
'title' => $lnk->getAttribute('title')
);
}
echo '<pre>'; print_r($links); echo '</pre>';
?>

Why not create the array directly in a way that is usable afterwards?
<?php
$searched = file_get_contents('http://technologijos.lt');
$xml = new DOMDocument();
@$xml->loadHTML($searched);
$links = [];
foreach($xml->getElementsByTagName('a') as $lnk) {
    $links[] = sprintf(
        '%s : %s',
        $lnk->getAttribute('href'),
        $lnk->getAttribute('title')
    );
}
var_dump(implode(', ', $links));
Obviously the same can be done by using a second loop to iterate over the links array if it is created as shown in your example.
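Since the question asks for the result to end up in a file, here is a minimal sketch of that last step. It assumes the links array has the shape built in the question; the sample values and the output path (`links.txt` in the system temp directory) are made up for illustration.

```php
<?php
// Sample data in the shape built by the question's loop (illustrative values).
$links = [
    ['href' => 'http://example.com/a', 'title' => 'First'],
    ['href' => 'http://example.com/b', 'title' => 'Second'],
];

// Turn each pair into "href : title".
$lines = array_map(
    function (array $link) {
        return sprintf('%s : %s', $link['href'], $link['title']);
    },
    $links
);

// Hypothetical output path; file_put_contents() returns false on failure.
$outputFile = sys_get_temp_dir() . '/links.txt';
if (file_put_contents($outputFile, implode(' , ', $lines) . PHP_EOL) === false) {
    die('Unable to write ' . $outputFile);
}
```

To append to an existing file instead of overwriting it, pass the FILE_APPEND flag as the third argument of file_put_contents().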

Related

Array filter in PHP

I am using Simple HTML DOM to parse an HTML file.
I have a dynamic array called links2; it can be empty or have four or more elements, depending on the case.
<?php
include('simple_html_dom.php');
$url = 'http://www.example.com/';
$html = file_get_html($url);
$doc = new DOMDocument();
@$doc->loadHTML($html);
//////////////////////////////////////////////////////////////////////////////
foreach ($doc->getElementsByTagName('p') as $link)
{
$intro2 = $link->nodeValue;
$links2[] = array(
'value' => $link->textContent,
);
$su=count($links2);
}
$word = 'document.write(';
Assume that two elements of the array links2 contain $word. When I try to filter links2 by removing the elements that contain a match:
unset( $links2[array_search($word, $links2 )] );
print_r($links2);
the filter removes only one element, and array_diff doesn't solve the problem. Any suggestions?
Solved by adding a condition inside the loop:
foreach ($doc->getElementsByTagName('p') as $link)
{
    $dont = $link->textContent;
    // Skip paragraphs whose text contains the unwanted word.
    if (strpos($dont, 'document') === false) {
        $links2[] = array(
            'value' => $link->textContent,
        );
    }
    $su=count($links2);
    echo $su;
}
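If the array has already been built, it can also be filtered afterwards. array_search() compares $word against each whole element and returns the first matching key, or false when nothing matches. Each element here is an array rather than a string, so the search finds nothing, and `unset($links2[false])` then appears to remove index 0, which would explain why exactly one element disappears. The sketch below uses array_filter() with a substring check instead; the sample values are invented.

```php
<?php
// Sample data in the shape of $links2 from the question (illustrative values).
$links2 = [
    ['value' => 'First paragraph'],
    ['value' => 'document.write("ad one");'],
    ['value' => 'Second paragraph'],
    ['value' => 'document.write("ad two");'],
];
$word = 'document.write(';

// Keep only the elements whose text does not contain $word.
$filtered = array_filter($links2, function (array $item) use ($word) {
    return strpos($item['value'], $word) === false;
});

// array_filter() preserves keys; array_values() reindexes from 0.
$filtered = array_values($filtered);

print_r($filtered);
```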

extract title xpath if code xpath is present

I am trying to extract coupon codes and, if a code is present, get the corresponding title too, but I am unable to do so.
In the code below I am able to extract the coupon codes correctly, but how do I get the corresponding title extracted too? As you can see in the link, some titles don't have coupon codes...
<?php
$html = file_get_contents('http://www.grabon.in/abof-coupons/'); //get the html returned from the following url
$mydoc = new DOMDocument();
libxml_use_internal_errors(TRUE); //disable libxml errors
if(!empty($html)){ //if any html is actually returned
$mydoc->loadHTML($html);
libxml_clear_errors(); //remove errors for yucky html
$my_xpath = new DOMXPath($mydoc);
//get all the codes
$my_code = $my_xpath->query('//*[@class="coupon-click"]//a//small');
if($my_code->length > 0){
foreach($my_code as $row){
$my_row = $my_xpath->query('//*[@class="h3_click"]');
echo $row->nodeValue . "<br/>";
}
}
}
?>
Thanks fusion3k, the code works perfectly, but when I tried your code with a different URL, as below, I get the error "Notice: Trying to get property of non-object".
<?php
$html = file_get_contents('http://official.deals/ebay-coupons?coupon-id=1055981&h=ed68f1b2a5b28471ecf9584734d65742&utm_source=coupon_page&utm_medium=deal_reveal&utm_campaign=od_deal_click#ebay1055981'); //get the html returned from the following url
$mydoc = new DOMDocument();
libxml_use_internal_errors(TRUE); //disable libxml errors
if(empty($html)) die("EMPTY HTML");
$mydoc->loadHTML($html);
libxml_clear_errors(); //remove errors for yucky html
$my_xpath = new DOMXPath($mydoc);
//////////////////////////////////////////////////////
$result = array();
$nodes = $my_xpath->query( '//div[@data-rowtype="1"]' );
foreach( $nodes as $node )
{
$title = $my_xpath->query( 'div[@class="cop-head"]/h4', $node )->item(0)->nodeValue;
$found = $my_xpath->query( 'div[@class="cop-head"]/div/input/value', $node );
$coupon = ( $found->length ) ? $found->item(0)->nodeValue : '' ;
$result[] = compact( 'title', 'coupon' );
}
echo '<pre>';
print_r($result);
echo '</pre>';
?>
If you also want to retrieve boxes without a coupon, you have to proceed differently: retrieve all boxes and, for each box, check whether a coupon code exists.
Init an array to store results:
$result = array();
Search for boxes ( <li> nodes with class “coupon-list-item ” ):
$nodes = $my_xpath->query( '//li[@class="coupon-list-item "]' );
# Pay attention: the class value ends with a trailing space.
Then analyze each node through a foreach loop:
foreach( $nodes as $node )
{
Match titles:
$title = $my_xpath->query( 'div/b[@class="h3_click"]', $node )->item(0)->nodeValue;
# No leading slashes: the pattern is relative to the node.
# The optional second parameter of query() defines the search context.
Then search for the coupon, if it exists:
$found = $my_xpath->query( 'div[@class="coupon-actions"]/div/a/small', $node );
$coupon = ( $found->length ) ? $found->item(0)->nodeValue : '' ;
At the end, you can add a sub-array to $result using the fabulous, underrated compact function:
$result[] = compact( 'title', 'coupon' );
}
If you want, you can also add related coupons in a similar way:
$nodes = $my_xpath->query( '//div[@class="related-coupons"]/*/div[@class="col-sm-8"]' );
foreach( $nodes as $node )
{
$title = $my_xpath->query( 'div/div[@class="coupon-title"]', $node )->item(0)->nodeValue;
$found = $my_xpath->query( 'div/div[@class="coupon-click"]/a/small', $node );
$coupon = ( $found->length ) ? $found->item(0)->nodeValue : '' ;
$result[] = compact( 'title', 'coupon' );
}
At the end, $result looks like this:
Array
(
[0] => Array
(
[title] => Upto 80% OFF + Extra Rs. 500 OFF On Rs 1495 - All Users
[coupon] => ABOFBMF500C
)
(...)
[14] => Array
(
[title] => Fresh Arrivals on Women & Men Collection
[coupon] =>
)
(...)
)
phpFiddle demo
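The steps above can be put together as a self-contained sketch. The markup below is made up to imitate the structure described in the answer, and `firstText()` is a hypothetical helper, not part of DOMXPath. It also guards the title lookup: item(0) returns null when nothing matches, and reading nodeValue on that null is what produces a "Trying to get property of non-object" notice when the expressions do not fit a different page.

```php
<?php
// Hypothetical helper: text of the first match relative to $context, or '' if none.
function firstText(DOMXPath $xpath, $expression, DOMNode $context)
{
    $found = $xpath->query($expression, $context);
    return ($found !== false && $found->length > 0)
        ? trim($found->item(0)->nodeValue)
        : '';
}

// Made-up markup that imitates the structure described above.
$html = '<ul>'
    . '<li class="coupon-list-item "><div><b class="h3_click">Deal A</b></div>'
    . '<div class="coupon-actions"><div><a><small>CODE1</small></a></div></div></li>'
    . '<li class="coupon-list-item "><div><b class="h3_click">Deal B</b></div></li>'
    . '</ul>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$result = [];
// Every box, with or without a coupon; note the trailing space in the class value.
foreach ($xpath->query('//li[@class="coupon-list-item "]') as $node) {
    $result[] = [
        'title'  => firstText($xpath, 'div/b[@class="h3_click"]', $node),
        'coupon' => firstText($xpath, 'div[@class="coupon-actions"]/div/a/small', $node),
    ];
}
print_r($result);
```

For another site, the XPath expressions have to be adapted to that page's markup; an empty title in the result is a hint that an expression no longer matches.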

PHP Simple HTML DOM Parser: Get all posts

I'd like to get all articles from a webpage, as well as all pictures for each article.
I decided to use PHP Simple HTML DOM Parser and I used the following code:
<?php
include("simple_html_dom.php");
$sitesToCheck = array(
array(
'url' => 'http://googleblog.blogspot.ru/',
'search_element' => 'h2.title a',
'get_element' => 'div.post-content'
),
array(
// 'url' => '', // Site address with a list of of articles
// 'search_element' => '', // Link of Article on the site
// 'get_element' => '' // desired content
)
);
$s = microtime(true);
foreach($sitesToCheck as $site)
{
$html = file_get_html($site['url']);
foreach($html->find($site['search_element']) as $link)
{
$content = '';
$savePath = 'cachedPages/'.md5($site['url']).'/';
$fileName = md5($link->href);
if ( ! file_exists($savePath.$fileName))
{
$post_for_scan = file_get_html($link->href);
foreach($post_for_scan->find($site["get_element"]) as $element)
{
$content .= $element->plaintext . PHP_EOL;
}
if ( ! file_exists($savePath) && ! mkdir($savePath, 0, true))
{
die('Unable to create directory ...');
}
file_put_contents($savePath.$fileName, $content);
}
}
}
$e = microtime(true);
echo $e-$s;
For now I am trying to get only the articles, without pictures, but I get this response from the server:
"Maximum execution time of 120 seconds exceeded"
What am I doing wrong? Is there any other way to get all the articles and all pictures for each article from a specific webpage?
I had similar problems with that lib. Use PHP's DOMDocument instead:
$doc = new DOMDocument;
$doc->loadHTML($html);
$links = $doc->getElementsByTagName('a');
foreach ($links as $link) {
doSomethingWith($link->getAttribute('href'), $link->nodeValue);
}
See http://www.php.net/manual/en/domdocument.getelementsbytagname.php
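As a rough sketch of how the same idea could collect both the text and the pictures of each article, the example below runs DOMXPath over a made-up HTML string. The class name `post-content` mirrors the selector in the question; the markup and file names are invented. Swapping the parser does not by itself reduce the time spent downloading each article, so this only illustrates the extraction step.

```php
<?php
// Made-up page markup; the class name "post-content" mirrors the question's selector.
$html = '<div class="post-content"><p>First post</p><img src="a.png"><img src="b.png"></div>'
    . '<div class="post-content"><p>Second post</p></div>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$posts = [];
foreach ($xpath->query('//div[@class="post-content"]') as $post) {
    $images = [];
    // ".//img" searches only inside the current post.
    foreach ($xpath->query('.//img', $post) as $img) {
        $images[] = $img->getAttribute('src');
    }
    $posts[] = [
        'text'   => trim($post->textContent),
        'images' => $images,
    ];
}
print_r($posts);
```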

Loop is running twice while parsing xml in php

I am not sure why my inner loop's data is being added to the outer loop's data.
XML I am parsing - http://pastebin.com/vGc5NhXr
Code I am using -
<?php
$dom = new DomDocument;
$dom->preserveWhiteSpace = FALSE;
$dom->load('course/Golf/imsmanifest.xml');
// get the resources element
$organization = $dom->getElementsByTagName( "item" );
echo '<ul>';
foreach( $organization as $organizationItem )
{
$unitTitle = $organizationItem->getElementsByTagName("title");
$unitName = $unitTitle->item(0)->nodeValue;
echo '<li>',$unitName,'</li>';
echo '<ul>';
$item1 = $organizationItem->getElementsByTagName( "item" );
foreach( $item1 as $myitem ) {
$title = $myitem->getElementsByTagName("title");
$author = $title->item(0)->nodeValue;
echo '<li>',$author,'</li>';
}
echo '</ul>';
}
echo '</ul>';
Generated output - http://codepad.org/J2vP71rd
Expected Output - http://codepad.org/uzUtehgT
Let me know what I am doing wrong with the foreach loop.
Because the item elements are nested. $dom->getElementsByTagName( "item" ) gets all the item elements, including those that lie within another item. That's not what you want.
I'd suggest using XPath for this kind of job.
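A minimal sketch of the XPath approach, under assumptions: the XML below is a simplified stand-in for the manifest, not the real file, and it has no namespaces. If the real manifest declares a default namespace, the expressions would need a prefix registered with DOMXPath::registerNamespace().

```php
<?php
// Simplified stand-in for the manifest (element names assumed, no namespaces).
$xml = '<manifest><organizations><organization>'
    . '<item><title>Unit 1</title>'
    . '<item><title>Lesson 1.1</title></item>'
    . '<item><title>Lesson 1.2</title></item>'
    . '</item>'
    . '<item><title>Unit 2</title>'
    . '<item><title>Lesson 2.1</title></item>'
    . '</item>'
    . '</organization></organizations></manifest>';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

$outline = [];
// Only <item> elements that are direct children of <organization>.
foreach ($xpath->query('//organization/item') as $unit) {
    $unitName = $xpath->query('title', $unit)->item(0)->nodeValue;
    $outline[$unitName] = [];
    // "item/title" is relative to $unit, so it matches direct child items only.
    foreach ($xpath->query('item/title', $unit) as $title) {
        $outline[$unitName][] = $title->nodeValue;
    }
}
print_r($outline);
```

Because each query is limited to direct children, a nested item is listed once under its parent and never again as a top-level entry.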

creating multidimensional array with two arrays

I am indexing web pages. The code scans a given web page for its links and its title. The links and the title are stored in two different arrays. I would like to create a multidimensional array that has the word Array, followed by the links, followed by the individual titles of the links. I have the code; I just don't know how to put it together.
require_once('simplehtmldom_1_5/simple_html_dom.php');
require_once('url_to_absolute/url_to_absolute.php');
//links
$links = Array();
$URL = 'http://www.youtube.com'; // change it for urls to grab
// grabs the urls from URL
$file = file_get_html($URL);
foreach ($file->find('a') as $theelement) {
$links[] = url_to_absolute($URL, $theelement->href);
}
print_r($links);
//titles
$titles = Array();
$str = file_get_contents($URL);
$titles[] = preg_match_all( "/\<title\>(.*)\<\/title\>/", $str, $title );
print_r($title[1]);
You should be able to do this, assuming there are as many links as titles, so that each link and its title share the same array key.
$newArray = array();
foreach ($links as $key=>$val)
{
$newArray[$key]['link'] = $val;
$newArray[$key]['title'] = $titles[$key];
}
It is not clear what you want.
Anyway, here is how I would rewrite your code in a more organized way:
require_once('simplehtmldom_1_5/simple_html_dom.php');
require_once('url_to_absolute/url_to_absolute.php');
$info = array();
$urls = array(
'http://www.youtube.com',
'http://www.google.com.br'
);
foreach ($urls as $url)
{
$str = file_get_contents($url);
$html = str_get_html($str);
$title = strval($html->find('title', 0)->plaintext);
$links = array();
foreach($html->find('a') as $anchor)
{
$links[] = url_to_absolute($url, strval($anchor->href));
}
$links = array_unique($links);
$info[$url] = array(
'title' => $title,
'links' => $links
);
}
print_r($info);
