Duplicate PHP Code Block - php

I get images from a specific URL. With this script I'm able to display them on my website without any problems. The website I get the images from has more than one page (about 200) that I need the images from.
I don't want to copy the block of PHP code manually and fill in the page number every time from 1 to 200. Is it possible to do it in one block?
Like: $html = file_get_html('http://example.com/page/1...to...200');
<?php
require_once('simple_html_dom.php');
$html = file_get_html('http://example.com/page/1');
foreach($html->find('img') as $element) {
echo '<img src="'.$element->src.'"/>';
}
$html = file_get_html('http://example.com/page/2');
foreach($html->find('img') as $element) {
echo '<img src="'.$element->src.'"/>';
}
$html = file_get_html('http://example.com/page/3');
foreach($html->find('img') as $element) {
echo '<img src="'.$element->src.'"/>';
}
?>

You can use a for loop like so:
require_once('simple_html_dom.php');
for($i = 1; $i <= 200; $i++){
$html = file_get_html('http://example.com/page/'.$i);
foreach($html->find('img') as $element) {
echo '<img src="'.$element->src.'"/>';
}
}
So now you have one block of code that will execute 200 times.
It changes the page number by appending the value of $i to the URL, and every time the loop completes a round, the value of $i becomes $i + 1.
If you wish to start on a higher page number, you can change $i = 1 to $i = 2 or any other number, and you can change the 200 to whatever the maximum is in your case.
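A small variation, as a sketch only: file_get_html() returns false when a page cannot be loaded, so it can be worth checking the result before calling find() on it. The helper name build_page_urls() below is made up for this example; it only builds the list of URLs so the range is easy to change.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): build the page URLs for a range.
function build_page_urls(string $base, int $first, int $last): array
{
    $urls = [];
    for ($i = $first; $i <= $last; $i++) {
        $urls[] = $base . $i;   // e.g. http://example.com/page/7
    }
    return $urls;
}

$urls = build_page_urls('http://example.com/page/', 1, 200);
echo count($urls) . " URLs, from {$urls[0]} to {$urls[199]}\n";

// Usage with simple_html_dom (commented out: it needs the library and network access):
// foreach ($urls as $url) {
//     $html = file_get_html($url);
//     if ($html === false) { continue; }   // skip pages that failed to load
//     foreach ($html->find('img') as $element) {
//         echo '<img src="' . $element->src . '"/>';
//     }
// }
```

Skipping failed pages is a design choice here; logging the URL instead would work just as well.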

There are many good solutions. One of them: loop from 1 to 200.
for($i = 1; $i <= 200; $i++){
$html = file_get_html('http://example.com/page/'.$i);
foreach($html->find('img') as $element) {
echo '<img src="'.$element->src.'"/>';
}
}

<?php
require_once('simple_html_dom.php');
function SendHtml($httpline) {
$html = file_get_html($httpline);
foreach($html->find('img') as $element) {
echo '<img src="'.$element->src.'"/>';
}
}
for ($x = 1; $x <= 200; $x++) {
$httpline="http://example.com/page/";
$httpline.=$x;
SendHtml($httpline);
}
?>
Just loop: create a sending function and call it from the loop.
I recommend reading the PHP documentation at https://www.w3schools.com/php/default.asp

First, store them in a database. You can (and should) download the images to your own server, or at least store the URI of each image. You can use code like FMashiro's for that, or something similar, but opening 200 pages and parsing their HTML takes forever, and it would happen on every page view.
Then you simply use the LIMIT functionality in your queries to create pages yourself.
I recommend this method anyway, as it will be MUCH faster than parsing HTML every time someone opens the page. You'll also get sorting options and the other advantages a database gives you.
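To illustrate the LIMIT idea, here is a rough sketch of turning a page number into LIMIT/OFFSET values. The helper name page_window() and the table and column names (images, src, id) are assumptions made up for this example, not something from the question.

```php
<?php
// Hypothetical helper: translate a 1-based page number into LIMIT/OFFSET values.
function page_window(int $page, int $perPage): array
{
    $page = max(1, $page);   // treat anything below 1 as the first page
    return ['limit' => $perPage, 'offset' => ($page - 1) * $perPage];
}

$window = page_window(3, 20);

// Table and column names are assumptions for illustration only.
$sql = sprintf(
    'SELECT src FROM images ORDER BY id LIMIT %d OFFSET %d',
    $window['limit'],
    $window['offset']
);
echo $sql, "\n";
```

Because both values are formatted with %d, only integers can end up in the query text.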

Related

How to fix timeout on file_get_html?

I'm trying to use file_get_html on a web page to find images (and their URLs) on the page.
<?php
include('simplehtmldom_1_7/simple_html_dom.php');
$html = file_get_html('https://www.mrporter.com/en-us/mens/givenchy/jaw-neoprene--suede--leather-and-mesh-sneakers/1093998');
foreach($html->find('img') as $e)
$img_url_array[] = $e->src . '<br>';
$array_size = sizeof($img_url_array);
$x =0;
while($x <$array_size){
echo "image url is " . $img_url_array[$x] ;
$x =$x+1;
}
?>
The script keeps on loading and doesn't pause. Is there a way to set a timeout or an exception?
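One possible approach, as a sketch and not a tested fix for this particular site: fetch the page yourself with a stream context that has a timeout, then hand the string to str_get_html(). The helper name timeout_context() and the 10-second value are assumptions chosen for the example.

```php
<?php
// Hypothetical helper: build an HTTP stream context whose reads give up after $seconds.
function timeout_context(float $seconds)
{
    return stream_context_create([
        'http' => ['timeout' => $seconds],   // the "http" options also apply to https:// URLs
    ]);
}

$context = timeout_context(10);

// Usage (commented out: it needs network access and simple_html_dom.php):
// $raw = @file_get_contents($url, false, $context);
// if ($raw === false) {
//     echo 'Request failed or timed out';
// } else {
//     $html = str_get_html($raw);
// }
```

Checking for false before parsing gives the script a place to report the failure instead of loading indefinitely.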

Fetch rss with php - Conditional for Enclosured image and not Enclosured

I'm working on a project and it's something new for me. I need to fetch RSS content from websites and display the description, title and images (thumbnails). I've noticed that some feeds provide thumbnails in an enclosure tag and others don't. I have the code for both cases, but I need to understand how to create a conditional like:
If the rss returns enclosure image { Do something }
Else { get the common thumb }
Here is the code that grabs the images:
ENCLOSURE TAG IMAGE:
if ($enclosure = $block->get_enclosure())
{
echo "<img src=\"" . $enclosure->get_link() . "\">";
}
NOT ENCLOSURE:
if ($enclosure = $block->get_enclosure())
{
echo '<img src="'.$enclosure->get_thumbnail().'" title="'.$block->get_title().'" width="200" height="200">';
}
PS: The two snippets are almost the same; the only difference is get_thumbnail versus get_link.
Is there a way I can create a conditional that uses the correct code and always shows the thumbnail?
Thanks everyone in advance!
EDITED
Here is the full code i have right now:
include_once(ABSPATH . WPINC . '/feed.php');
if(function_exists('fetch_feed')) {
$feed = fetch_feed('http://feeds.bbci.co.uk/news/world/africa/rss.xml'); // this is the external website's RSS feed URL
if (!is_wp_error($feed)) : $feed->init();
$feed->set_output_encoding('UTF-8'); // this is the encoding parameter, and can be left unchanged in almost every case
$feed->handle_content_type(); // this double-checks the encoding type
$feed->set_cache_duration(21600); // 21,600 seconds is six hours
$feed->handle_content_type();
$limit = $feed->get_item_quantity(18); // fetches the 18 most recent RSS feed stories
$items = $feed->get_items(0, $limit); // this sets the limit and array for parsing the feed
endif;
}
$blocks = array_slice($items, 0, 3); // The first three items will be displayed here
foreach ($blocks as $block) {
//echo $block->get_date("m d Y");
echo '<div class="single">';
if ($enclosure = $block->get_enclosure())
{
echo '<img class="image_post" src="'.$enclosure->get_link().'" title="'.$block->get_title().'" width="150" height="100">';
}
echo '<div class="description">';
echo '<h3>'. $block->get_title() .'</h3>';
echo '<p>'.$block->get_description().'</p>';
echo '</div>';
echo '<div class="clear"></div>';
echo '</div>';
}
And here are the XML pieces with 2 different tags for images:
Using Thumbnails: view-source:http://feeds.bbci.co.uk/news/world/africa/rss.xml
Using Enclosure: http://feeds.news24.com/articles/news24/SouthAfrica/rss
Is there a way I can create a conditional that uses the correct code and always shows the thumbnail?
Sure there is. You haven't said in your question what is blocking you, so I have to guess at the reason, and I can imagine several.
Is the reason that the decision has more than two alternatives?
You already handle the scenario of a feed item having either no image or an image:
if ($enclosure = $block->get_enclosure())
{
echo '<img class="image_post" src="'.$enclosure->get_link().'" title="'.$block->get_title().'" width="150" height="100">';
}
In your current scenario there is only one additional alternative, which makes three: the enclosure provides a thumbnail rather than a link:
No image (no enclosure)
Image from link (enclosure with link)
Image from thumbnail (enclosure with thumbnail)
And you then don't know how to turn that into a decision. This is basically what else-if is for:
if (!$enclosure = $block->get_enclosure())
{
echo "no enclosure: ", "-/-", "\n";
} elseif ($enclosure->get_link()) {
echo "enclosure link: ", $enclosure->get_link(), "\n";
} elseif ($enclosure->get_thumbnail()) {
echo "enclosure thumbnail: ", $enclosure->get_thumbnail(), "\n";
}
This simply produces output for each case. However, if you assign the image URL to a variable, you can decide on the output later:
$image = NULL;
if (!$enclosure = $block->get_enclosure())
{
// nothing to do
} elseif ($enclosure->get_link()) {
$image = $enclosure->get_link();
} elseif ($enclosure->get_thumbnail()) {
$image = $enclosure->get_thumbnail();
}
if (isset($image)) {
// display image
}
And if you then move this more or less complex decision into a function of its own, it becomes even easier to read:
$image = feed_item_get_image($block);
if (isset($image)) {
// display image
}
This works quite well until the decision becomes even more complex, but that would go beyond the scope of an answer on Stack Overflow.
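The answer names feed_item_get_image() without showing it. A minimal sketch of what its body could look like follows; the body is an assumption built from the else-if chain above, and FakeItem and FakeEnclosure are made-up stand-ins so the example runs without SimplePie.

```php
<?php
// Sketch only: $item is expected to offer get_enclosure(), and the enclosure
// get_link() and get_thumbnail(), as in the snippets above.
function feed_item_get_image($item)
{
    $enclosure = $item->get_enclosure();
    if (!$enclosure) {
        return null;                          // no enclosure at all
    }
    if ($enclosure->get_link()) {
        return $enclosure->get_link();        // image given as enclosure link
    }
    if ($enclosure->get_thumbnail()) {
        return $enclosure->get_thumbnail();   // image given as thumbnail
    }
    return null;                              // enclosure without a usable image
}

// Hypothetical stand-ins for the feed objects, for demonstration only.
class FakeEnclosure
{
    private $link;
    private $thumbnail;
    public function __construct($link, $thumbnail)
    {
        $this->link = $link;
        $this->thumbnail = $thumbnail;
    }
    public function get_link() { return $this->link; }
    public function get_thumbnail() { return $this->thumbnail; }
}

class FakeItem
{
    private $enclosure;
    public function __construct($enclosure) { $this->enclosure = $enclosure; }
    public function get_enclosure() { return $this->enclosure; }
}

$block = new FakeItem(new FakeEnclosure(null, 'http://example.com/thumb.jpg'));
$image = feed_item_get_image($block);
if (isset($image)) {
    echo '<img src="' . $image . '">', "\n";
}
```

Returning null for "no image" keeps the isset($image) check from the answer working unchanged.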

PHP include() wiping rest of HTML document?

So I have a simple html page that looks like this.
<html>
<head>
<?php include("scripts/header.php"); ?>
<title>Directory</title>
</head>
<body>
<?php include("scripts/navbar.php"); ?>
<div id="phd">
<span id="ph">DIRECTORY</span>
<div id="dir">
<?php include("scripts/autodir.php"); ?>
</div>
</div>
<!--Footer Below-->
<?php include("scripts/footer.php"); ?>
<!--End Footer-->
</body>
</html>
Now, the problem is, when I load the page, it's all sorts of messed up. Viewing the page source reveals that everything after <div id="dir"> is COMPLETELY GONE. The file ends there. There is no included script, no closing </div> tags, no footer, not even </body> or </html>. But it's not spitting out any errors whatsoever. It just erases the document from the include onward, for no reason that my buddies or I can figure out. None of us has ever experienced this kind of strange behavior.
The script being called in question is a script that will fetch picture files from the server (that I've uploaded, not users) and spit out links to the appropriate page in the archive automatically upon page load because having to edit the Directory page every time I upload a new image is a real hassle.
The code in question is below:
<?php
//Define how many pages in each chapter.
//And define all the chapters like this.
//const CHAPTER_1 = 13; etc.
const CHAPTER_1 = 2; //2 for test purposes only.
//+-------------------------------------------------------+//
//| DON'T EDIT BELOW THIS LINE!!! |//
//+-------------------------------------------------------+//
//Defining this function for later. Thanks to an anon on php.net for this!
//This will allow me to get the constants with the $prefix prefix. In this
//case all the chapters will be defined with "CHAPTER_x" so using the prefix
//'CHAPTER' in the function will return all the chapter constants ONLY.
function returnConstants ($prefix) {
foreach (get_defined_constants() as $key=>$value) {
if (substr($key,0,strlen($prefix))==$prefix) {
$dump[$key] = $value;
}
}
if(empty($dump)) {
return "Error: No Constants found with prefix '" . $prefix . "'";
}
else {
return $dump;
}
}
//---------------------------------------------------------//
$archiveDir = "public_html/archive";
$files = array_diff(scandir($archiveDir), array("..", "."));
//This SHOULD populate the array in order, for example:
//$files[0]='20131125.png', $files[1]='20131126.png', etc.
//---------------------------------------------------------//
$pages = array();
foreach ($files as $file) {
//This parses through the files and takes only .png files to put in $pages.
$parts = pathinfo($file);
if ($parts['extension'] == "png") {
$pages[] = $file;
}
unset($parts);
}
//Now that we have our pages, let's assign the links to them.
$totalPages = count($pages);
$pageNums = array();
foreach ($pages as $page) {
//This will be used to populate the page numbers for the links.
//e.g. "<a href='archive.php?p=$pageNum'></a>"
for($i=1; $i<=$totalPages; $i++) {
$pageNums[] = $i;
}
//This SHOULD set the $pageNum array to be something like:
//$pageNum[0] = 1, $pageNum[1] = 2, etc.
}
$linkText = array();
$archiveLinks = array();
foreach ($pageNums as $pageNum) {
//This is going to cycle through each page number and
//check how to display them.
if ($totalPages < 10) {
$linkText[] = $pageNum;
}
elseif ($totalPages < 100) {
$linkText[] = "0" . $pageNum;
}
else {
$linkText[] = "00" . $pageNum;
}
}
//So, now we have the page numbers and the link text.
//Let's plug everything into a link array.
for ($i=0; $i<$totalPages; $i++) {
$archiveLinks[] = "<a href='archive.php?p=" . $pageNums[$i] . "'>" . $linkText[$i] . " " . "</a>";
//Should output: <a href= 'archive.php?p=1'>01 </a>
//as an example, of course.
}
//And now for the fun part. Let's take the links and display them.
//Making sure to automatically assign the pages to their respective chapters!
//I've tested the below using given values (instead of fetching stuff)
//and it worked fine. So I doubt this is causing it, but I kept it just in case.
$rawChapters = returnConstants('CHAPTER');
$chapters = array_values($rawChapters);
$totalChapters = count($chapters);
$chapterTitles = array();
for ($i=1; $i<=$totalChapters; $i++) {
$chapterTitles[] = "<h4>Chapter " . $i . ":</h4><p>";
echo $chapterTitles[($i-1)];
for ($j=1; $j<=$chapters[($i-1)]; $j++) {
echo array_shift($archiveLinks[($j-1)]);
}
echo "</p>"; //added to test if this was causing the deletion
}
?>
What is causing the remainder of the document to vanish like that? EDIT: Two silly syntax errors were causing this, and have been fixed in the above code! However, the links still aren't being displayed at all. Please note that I am pretty new to PHP and I do not expect my code to be the most efficient (I just want the darn thing to work!).
Addendum: if you deem rewriting the code (instead of simply fixing the errors) to be the preferred course of action, please do explain what the code is doing, as I do not like using code I do not understand. Thanks!
Without having access to any of the rest of the code or data-structures I can see 2 syntax errors...
Line 45:
foreach ($pages = $page) {
Should be:
foreach ($pages as $page) {
Line 88:
echo array_shift($archiveLinks[($j-1)];
Is missing a bracket:
echo array_shift($archiveLinks[($j-1)]);
Important...
In order to ensure that you can find these kinds of errors yourself, you need to ensure that the error reporting is switched on to a level that means these get shown to you, or learn where your logs are and how to read them.
See the documentation on php.net here:
http://php.net/manual/en/function.error-reporting.php
IMO all development servers should have the highest level of error reporting switched on by default so that you never miss an error, warning or notice. It just makes your job a whole lot easier.
Documentation on setting up at runtime can be found here:
http://www.php.net/manual/en/errorfunc.configuration.php#ini.display-errors
There is an error in the file scripts/autodir.php. Everything up to that point works fine, so this is where the problem starts.
Also, you most likely have errors hidden, as Chen Asraf mentioned, so turn error display on:
error_reporting(E_ALL);
ini_set('display_errors', '1');
Just put that at the top of the PHP file.
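On the follow-up about the links not showing: array_shift() expects the array itself, so array_shift($archiveLinks[($j-1)]) hands it a single string instead, which is a likely reason nothing is echoed. The following is only a sketch of one way to pair links with chapters, using made-up sample data in place of the constants and the scandir() result; it is not the original script.

```php
<?php
// Made-up sample data standing in for the CHAPTER_x constants and the scandir() result.
$chapters = [2, 3];   // pages per chapter, in order
$pages = ['20131125.png', '20131126.png', '20131127.png', '20131128.png', '20131129.png'];

// Number of digits needed so every page number is printed with the same width.
$width = strlen((string) count($pages));

// Build one link per page; page numbers start at 1.
$archiveLinks = [];
foreach ($pages as $index => $file) {
    $pageNum = $index + 1;
    $text = str_pad((string) $pageNum, $width, '0', STR_PAD_LEFT);
    $archiveLinks[] = "<a href='archive.php?p=" . $pageNum . "'>" . $text . " </a>";
}

// Hand out the links chapter by chapter.
$output = '';
foreach ($chapters as $i => $pagesInChapter) {
    $output .= '<h4>Chapter ' . ($i + 1) . ':</h4><p>';
    // array_splice() removes the first $pagesInChapter links from the array and returns them.
    $output .= implode('', array_splice($archiveLinks, 0, $pagesInChapter));
    $output .= '</p>';
}
echo $output, "\n";
```

Each pass through the chapter loop takes the next batch of links off the front of the array, so every page lands in exactly one chapter.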

PHP Simple HTML DOM Scrape External URL

I'm trying to build a personal project of mine, however I'm a bit stuck when using the Simple HTML DOM class.
What I'd like to do is scrape a website and retrieve all the content, and its inner HTML, that matches a certain class.
My code so far is:
<?php
error_reporting(E_ALL);
include_once("simple_html_dom.php");
//use curl to get html content
$url = 'http://www.peopleperhour.com/freelance-seo-jobs';
$html = file_get_html($url);
//Get all data inside the <div class="item-list">
foreach($html->find('div[class=item-list]') as $div) {
//get all div's inside "item-list"
foreach($div->find('div') as $d) {
//get the inner HTML
$data = $d->outertext;
}
}
print_r($data);
echo "END";
?>
All I get with this is a blank page with "END", nothing else outputted at all.
It seems your $data variable is being assigned a different value on each iteration. Try this instead:
$data = "";
foreach($html->find('div[class=item-list]') as $div) {
//get all divs inside "item-list"
foreach($div->find('div') as $d) {
//get the inner HTML
$data .= $d->outertext;
}
}
print_r($data);
I hope that helps.
I think, you may want something like this
$url = 'http://www.peopleperhour.com/freelance-seo-jobs';
$html = file_get_html($url);
foreach ($html->find('div.item-list div.item') as $div) {
echo $div . '<br />';
}
This will print each matching item (if you add the proper style sheet, it'll be displayed nicely).

PHP - Multi Curl - scraping for data/content

I started by building a single cURL session with curl, DOM and XPath, and it worked great.
I am now building a scraper based on curl to take data from multiple sites in one run. The script echoes the single phrase I put in, but it does not pick up the variables.
do{
$n=curl_multi_exec($mh, $active);
}while ($active);
foreach ($urls as $i => $url){
$res[$i]=curl_multi_getcontent($conn[$i]);
echo ('<br />success');
}
So this does echo the success text as many times as there are URLs, but that is not really what I want. I want to break up the HTML like I could with the single curl session.
What i did in the single curl session:
//parse the html into a DOMDocument
$dom = new DOMDocument();
@$dom->loadHTML($res);
// grab all the on the page
$xpath = new DOMXPath($dom);
$product_img = $xpath->query("//div[@id='MAIN']//a");
for ($i = 0; i < $product_img->length; $i++){
$href = $product_img->item($i);
$url = $href->getAttribute('href');
echo "<br />Link : $url";
}
This DOM parsing / XPath works for the single curl session, but not when I run the multi curl.
With multi curl I can call curl_multi_getcontent for each URL in the session, but that is not what I want.
I would like to get the same content as I picked up with DOM / XPath in the single session.
What can I do?
EDIT
It seems i am having problems with the getAttribute. It is a link for an image i am having trouble grabbing. The link is showing when scraping, but then it throws an error :
Fatal error: Call to a member function getAttribute() on a non-object in
The query:
// grab all the on the page
$xpath = new DOMXPath($dom);
$product_img = $xpath->query("//img[@class='product']");
$product_name = $xpath->query("//img[@class='product']");
This is working:
for ($i = 0; i < $product_name->length; $i++) {
$prod_name = $product_name->item($i);
$name = $prod_name->getAttribute('alt');
echo "<br />Link stored: $name";
}
This is not working:
for ($i = 0; i < $product_img->length; $i++) {
$href = $product_img->item($i);
$pic_link = $href->getAttribute('src');
echo "<br />Link stored: $pic_link";
}
Any idea of what i am doing wrong ?
Thanks in advance.
For some odd reason, it is only that one src that won't work right.
This question can be considered "solved".
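For anyone landing here with the same problem, a rough sketch of running the DOM / XPath step on each multi curl response. The helper name extract_attributes() and the sample markup are made up for the example. It iterates with foreach instead of an index (the loop conditions in the question use i without the $, which is not the loop variable) and skips nodes that lack the attribute rather than calling getAttribute() blindly.

```php
<?php
// Hypothetical helper: return the given attribute of every element matched by $query.
function extract_attributes(string $html, string $query, string $attribute): array
{
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $values = [];
    foreach ($xpath->query($query) as $node) {
        // Only elements that really carry the attribute are collected.
        if ($node instanceof DOMElement && $node->hasAttribute($attribute)) {
            $values[] = $node->getAttribute($attribute);
        }
    }
    return $values;
}

// Stand-in for the strings returned by curl_multi_getcontent() (made-up markup).
$res = [
    '<div id="MAIN"><a href="/a">A</a><img class="product" src="/a.png" alt="A"></div>',
    '<div id="MAIN"><a href="/b">B</a><img class="product" alt="B"></div>',
];

foreach ($res as $i => $body) {
    foreach (extract_attributes($body, "//img[@class='product']", 'src') as $src) {
        echo "Response $i image: $src\n";
    }
}
```

In the real script the same call would go inside the loop that fills $res[$i], once per response body.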
