I am trying to get a list of files from a directory through PHP.
I also tried glob(), but it doesn't work over HTTP. I tried a recursive approach too, and this is the latest script I managed to find. It just doesn't work: it doesn't display the files.
<?php
$url = 'removed for security purposes';
$html = file_get_contents($url);
$count = preg_match_all('/<td><a href="([^"]+)">[^<]*<\/a><\/td>/i', $html, $files);
for ($i = 0; $i < $count; ++$i) {
    echo "File: " . $files[1][$i] . "<br />\n";
}
var_dump($files);
?>
The output of var_dump($files); is:
array(2) {
  [0]=>
  array(0) {
  }
  [1]=>
  array(0) {
  }
}
So what am I doing wrong?
The entries on your page are list items, not table cells, so match <li> instead of <td>:
<?php
$url = 'http://www.seoadsem.com/opencart';
$html = file_get_contents($url);
$count = preg_match_all('/<li><a href="([^"]+)">[^<]*<\/a><\/li>/i', $html, $files);
for ($i = 0; $i < $count; ++$i) {
    echo "File: " . $files[1][$i] . "<br />\n";
}
var_dump($files);
?>
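A regex on HTML breaks as soon as the markup changes (<td> versus <li>, extra attributes, different quoting). As an alternative, here is a minimal sketch using PHP's built-in DOMDocument, assuming the listing marks each entry with an ordinary <a href> link. The function name listingLinks and the sample markup are made up for illustration:

```php
<?php
// Sketch: collect link targets from a directory-listing page with DOMDocument
// instead of a regex, so it works whether entries sit in <li>, <td> or <pre>.
function listingLinks(string $html): array
{
    $doc = new DOMDocument();
    // Real-world listings are rarely valid HTML; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        // Skip empty links, "../", sort links such as "?C=N;O=D" and
        // absolute links such as the "Parent Directory" entry ("/").
        if ($href === '' || $href === '../' || $href[0] === '?' || $href[0] === '/') {
            continue;
        }
        $links[] = $href;
    }
    return $links;
}

// Made-up sample markup shaped like an Apache-style list listing.
$sample = '<ul><li><a href="/"> Parent Directory</a></li>'
        . '<li><a href="admin/"> admin/</a></li>'
        . '<li><a href="index.php"> index.php</a></li></ul>';
print_r(listingLinks($sample));
```

Because the parser works on elements rather than text, the same function keeps working if the server switches between list and table layouts.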
file_get_contents() can only open URLs when allow_url_fopen is enabled, and many hosts disable it for security reasons, so it may work for local files only. Use cURL instead; this may save you a lot of debugging time.
See PHP cURL vs file_get_contents.
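A minimal sketch of the cURL route, assuming the curl extension is installed. fetchUrl is a hypothetical helper name and the URL in the usage comment is a placeholder:

```php
<?php
// Sketch: fetch a URL with cURL instead of file_get_contents(), which only
// handles URLs when allow_url_fopen is enabled. Requires the curl extension.
function fetchUrl(string $url, int $timeoutSeconds = 30): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow HTTP redirects
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        // curl_error() explains what went wrong (DNS, timeout, refused, ...)
        throw new RuntimeException("cURL error for $url: " . curl_error($ch));
    }
    return $body;
}

// Usage (placeholder URL):
// $html = fetchUrl('http://example.com/some/dir/');
```

Throwing on failure makes a blocked or unreachable URL show up immediately instead of as an empty var_dump().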
<?php
$url = 'removed for security purposes';
$html = file_get_contents($url);
$count = preg_match_all('/<a href="([^"]+)(png|jpg|mp4|\/)">[^<]*<\/a>/i', $html, $files);
for ($i = 0; $i < $count; ++$i) {
    echo "File: " . $files[1][$i] . $files[2][$i] . "<br />\n";
}
var_dump($files);
?>
Replace png, jpg and mp4 with whatever extensions you need; the \/ alternative additionally matches links to subdirectories.
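If the extension list changes often, the pattern can be built from an array. This is a sketch under the assumption that links look like <a href="...">...</a> with double quotes, as in the answer; unlike the answer's pattern it also requires a literal dot before the extension. linksWithExtensions is a hypothetical name:

```php
<?php
// Sketch: build the answer's regex from a list of extensions instead of
// editing it by hand. preg_quote() escapes each extension safely.
function linksWithExtensions(string $html, array $extensions): array
{
    $alternatives = implode('|', array_map(fn($e) => preg_quote($e, '/'), $extensions));
    // Capture hrefs that end in ".<extension>" or in "/" (subdirectories).
    $pattern = '/<a href="([^"]+\.(?:' . $alternatives . ')|[^"]+\/)">[^<]*<\/a>/i';
    preg_match_all($pattern, $html, $matches);
    return $matches[1];
}

$sampleHtml = '<a href="a.png">a.png</a><a href="b.txt">b.txt</a>'
            . '<a href="sub/">sub/</a><a href="C.JPG">C.JPG</a>';
print_r(linksWithExtensions($sampleHtml, ['png', 'jpg']));
```

The /i modifier keeps the match case-insensitive, so C.JPG is picked up along with a.png.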
I am trying to loop through a list of URLs and save content from a div tag to a text file.
<?php
$file = 'content.txt';
for ($i = 406; $i <= 1410; $i++) {
    $url = 'http://example.com/chapter/chapter-'.$i;
    $content = file_get_contents($url);
    $start_tag = explode('<div class="textdiv">', $content);
    $end_tag = explode("</div>", $start_tag[1]);
    $result_text = $end_tag[0]; // was $second_step[0], an undefined variable
    echo $result_text;
    // FILE_APPEND keeps earlier chapters instead of overwriting the file
    $result = file_put_contents($file, $result_text, FILE_APPEND);
}
?>
The first problem is that there are multiple divs with that class and I want to get every one of them, but the current code only outputs the first occurrence.
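The explode() idea can collect every occurrence if you loop over all the pieces after the first one rather than reading only index 1. A minimal sketch, assuming the target divs contain no nested <div> (a nested closing tag would cut the text short, which is where a real parser such as simple_html_dom is safer). extractAllTextDivs is a hypothetical name:

```php
<?php
// Sketch: collect the inner HTML of every <div class="textdiv"> using the
// same explode() idea as the question, but looping over ALL pieces.
// Assumes the divs contain no nested <div>.
function extractAllTextDivs(string $html, string $openTag = '<div class="textdiv">'): array
{
    $results = [];
    $pieces = explode($openTag, $html);
    // $pieces[0] is everything before the first match, so start at 1.
    for ($n = 1; $n < count($pieces); $n++) {
        $parts = explode('</div>', $pieces[$n], 2);
        $results[] = $parts[0];
    }
    return $results;
}

$sample = '<p>x</p><div class="textdiv">First</div><div class="other">no</div>'
        . '<div class="textdiv">Second</div>';
print_r(extractAllTextDivs($sample));
```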
[EDIT]
Thanks to The Alpha for pointing me in the right direction. This worked for me:
<?php
include_once('simple_html_dom.php');
$file = 'content.txt';
for ($i = 399; $i < 1400; $i++) {
    $url = 'http://example.com/chapter/chapter-'.$i;
    $html = file_get_html($url);
    if (!$html) {
        continue; // skip chapters that could not be fetched
    }
    foreach ($html->find('div.textdiv') as $div) {
        echo $div . '<br />';
        // FILE_APPEND so every div is kept, not just the last one written
        $result = file_put_contents($file, $div, FILE_APPEND);
    }
    $html->clear(); // free simple_html_dom's memory between pages
    echo '<hr><br /><h1>Chapter '. $i .'</h1><br /><hr>';
}
?>
One remaining issue is that the script takes a very long time to run, since it makes about a thousand HTTP requests one after another.
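Most of that time is spent waiting on the network, one request at a time. Here is a sketch of fetching several pages concurrently with PHP's curl_multi_* functions, assuming the curl extension is available. fetchConcurrently and the batch size of 10 in the usage comment are illustrative choices, not tuned values:

```php
<?php
// Sketch: download several URLs concurrently with curl_multi so the total
// time is closer to the slowest request than to the sum of all of them.
// Requires the curl extension. Feed it small batches rather than all URLs
// at once, to stay polite to the server.
function fetchConcurrently(array $urls, int $timeoutSeconds = 30): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_TIMEOUT        => $timeoutSeconds,
        ]);
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running) {
            curl_multi_select($multi, 1.0); // wait for activity instead of busy-looping
        }
    } while ($running && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        // Body as a string; curl_error($ch) tells whether this transfer failed.
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);
    return $results;
}

// Usage with the question's URL pattern (example.com is a placeholder):
// foreach (array_chunk(range(399, 1399), 10) as $batch) {
//     $urls = array_map(fn($n) => 'http://example.com/chapter/chapter-' . $n, $batch);
//     foreach (fetchConcurrently($urls) as $page) { /* parse $page with str_get_html() */ }
// }
```

Keys of the input array are preserved in the result, so each page can still be matched to its chapter number.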
I have code that builds something like breadcrumb navigation from a given path. I didn't write that code, but I think I understand it except for two things in the URL replacement.
The code is below, and I can't figure out why it replaces a space with _-_-- and a slash with _-_-_. Is there some common reason for this?
function listLinksRecursive($path){
    echo "<br>recursive : " . $path;
    $path = str_replace("X:/","",$path);
    $path = str_replace("X:","",$path);
    $header = str_replace("_"," ", $path);
    echo "<br>first header ". $header;
    $header = str_replace("-"," ", $header);
    echo "<br>second header ". $header;
    echo "<br>";
    $linksarray = explode ("/",$header);
    $linksarrayreal = explode("/",str_replace(" ","_-_--",$path));
    var_dump($linksarray);
    echo "<br>";
    var_dump($linksarrayreal);
    echo "FOLDER: ";
    echo ("<a href='http://sjabcz-vyv-bck/cae/index.php?lvl=0&idpath=' rel='nofollow'>04_Knowledge-base</a> / ");
    for ($i=0; $i < count($linksarray); $i++){
        $linkpath = "";
        for ($j = 0; $j <=$i; $j++){
            $linkpath = $linkpath . $linksarrayreal[$j];
            if ($j < $i){
                $linkpath = $linkpath . "_-_-_";
            }
        }
        if ($i < count($linksarray)-1){
            echo ("<a href = 'http://sjabcz-vyv-bck/cae/index.php?lvl=0&idpath=$linkpath' rel='nofollow'>$linksarray[$i]</a> / ");
        }
        if ($i == count($linksarray)-1){
            echo ("<b>$linksarray[$i]</b>");
        }
    }
    echo "<p>";
}
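A plausible reading (an assumption, since the receiving index.php is not shown) is that _-_-_ and _-_-- are hand-made escape sequences: they let a path containing slashes and spaces travel as a single idpath value, and index.php presumably turns them back into / and spaces. PHP's http_build_query() performs real URL encoding, and $_GET decodes it automatically. A sketch, with breadcrumbLinks as a hypothetical helper and the base URL shortened to index.php:

```php
<?php
// Sketch: build breadcrumb links with http_build_query(), which URL-encodes
// "/" and spaces itself, so no hand-made markers like "_-_-_" are needed.
// The receiving page would simply read $_GET['idpath'] (already decoded).
function breadcrumbLinks(string $path, string $baseUrl): array
{
    // Drop empty segments caused by leading, trailing or doubled slashes.
    $segments = array_values(array_filter(explode('/', $path), fn($s) => $s !== ''));
    $links = [];
    foreach ($segments as $i => $segment) {
        $partial = implode('/', array_slice($segments, 0, $i + 1));
        $links[] = [
            // Display label: underscores and dashes shown as spaces, as in the question.
            'label' => str_replace(['_', '-'], ' ', $segment),
            'href'  => $baseUrl . '?' . http_build_query(['lvl' => 0, 'idpath' => $partial]),
        ];
    }
    return $links;
}

foreach (breadcrumbLinks('04_Knowledge-base/My Folder', 'index.php') as $link) {
    // htmlspecialchars() also turns "&" into "&amp;", as required inside an HTML attribute.
    echo '<a href="' . htmlspecialchars($link['href']) . '">' . htmlspecialchars($link['label']) . '</a> / ';
}
```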
I have seen the ZipArchive class in PHP, which lets you read zip files. But I'm wondering if there is a way to iterate through its contents without extracting the file first?
As found in a comment on http://www.php.net/ziparchive:
The following code can be used to get a list of all the file names in a zip file.
<?php
$za = new ZipArchive();
$za->open('theZip.zip');
for( $i = 0; $i < $za->numFiles; $i++ ){
    $stat = $za->statIndex( $i );
    print_r( basename( $stat['name'] ) . PHP_EOL );
}
?>
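To go beyond names and read each entry's contents without extracting anything to disk, ZipArchive::getFromIndex() returns an entry's uncompressed data as a string. A minimal sketch; readZipEntries is a hypothetical name:

```php
<?php
// Sketch: iterate over a ZIP's entries and read each one's contents straight
// into memory with ZipArchive::getFromIndex(); nothing is extracted to disk.
function readZipEntries(string $zipPath): array
{
    $za = new ZipArchive();
    $result = $za->open($zipPath);
    if ($result !== true) {
        throw new RuntimeException("Cannot open $zipPath (error code $result)");
    }
    $entries = [];
    for ($i = 0; $i < $za->numFiles; $i++) {
        $name = $za->getNameIndex($i);
        if (substr($name, -1) === '/') {
            continue; // directory entry, no contents
        }
        $entries[$name] = $za->getFromIndex($i); // uncompressed contents as a string
    }
    $za->close();
    return $entries;
}

// Usage:
// foreach (readZipEntries('theZip.zip') as $name => $contents) {
//     echo $name, ': ', strlen($contents), " bytes\n";
// }
```

For very large entries, ZipArchive::getStream() returns a stream you can read piece by piece instead of loading the whole entry into memory.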
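http://www.php.net/manual/en/function.zip-entry-read.php (note that the procedural zip_* functions used below are deprecated as of PHP 8.0; ZipArchive is the recommended replacement)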
<?php
$zip = zip_open("test.zip");
if (is_resource($zip))
{
    while ($zip_entry = zip_read($zip))
    {
        echo "<p>";
        echo "Name: " . zip_entry_name($zip_entry) . "<br />";
        if (zip_entry_open($zip, $zip_entry))
        {
            echo "File Contents:<br/>";
            // Without a length argument zip_entry_read() returns only the first 1024 bytes
            $contents = zip_entry_read($zip_entry, zip_entry_filesize($zip_entry));
            echo "$contents<br />";
            zip_entry_close($zip_entry);
        }
        echo "</p>";
    }
    zip_close($zip);
}
?>
I solved the problem like this.
$zip = new \ZipArchive();
$zip->open(storage_path('app/'.$request->vrfile));
$name = '';
// Loop through the entries of the zip and read each entry's name.
// Since I only wanted the first name, which is the folder name, I break
// out of the loop right after storing it in $name.
for ($i = 0; $i < $zip->numFiles; $i++) {
    $filename = $zip->getNameIndex($i);
    var_dump($filename);
    $name = $filename;
    if ($i == 0) {
        break;
    }
}
var_dump($name);
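Reading a fixed index only works if the archive happens to store the folder there: some tools write no separate folder entry at all, and entry order depends on the tool that created the archive. A sketch that derives the top-level folder names from every entry name instead; topLevelFolders is a hypothetical name:

```php
<?php
// Sketch: find the top-level folder name(s) of a ZIP by looking at every
// entry name, instead of assuming the folder is stored at a fixed index.
function topLevelFolders(string $zipPath): array
{
    $zip = new ZipArchive();
    if ($zip->open($zipPath) !== true) {
        throw new RuntimeException("Cannot open $zipPath");
    }
    $folders = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        $slash = strpos($name, '/');
        if ($slash !== false) {
            // "photos/a.jpg" and "photos/" both give "photos"
            $folders[substr($name, 0, $slash)] = true;
        }
    }
    $zip->close();
    // Numeric folder names become int keys, so cast them back to strings.
    return array_map('strval', array_keys($folders));
}
```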
This is a repeated question; please search before posting. See:
PHP library that can list contents of zip / rar files
<?php
$rar_file = rar_open('example.rar') or die("Can't open Rar archive");
$entries = rar_list($rar_file);
foreach ($entries as $entry) {
    echo 'Filename: ' . $entry->getName() . "\n";
    echo 'Packed size: ' . $entry->getPackedSize() . "\n";
    echo 'Unpacked size: ' . $entry->getUnpackedSize() . "\n";
    $entry->extract('/dir/extract/to/');
}
rar_close($rar_file);
?>
It's gonna be hard for me to explain the whole situation, but I'll try...
I made a script for my image host that unzips a ZIP package of images to a certain location, renames the files to random file names and outputs links to the images. The last part is not working properly: it outputs only one link (to the first image). The rest of the images end up in the upload folder but are not listed as links.
The same goes for generating thumbnails for the renamed images. Only one thumbnail is generated, for the first image, and the rest of the images are ignored.
This is what my code looks like:
<?php
session_start();
include('includes/imgit.class.php');
$IMGit = new imgit();
/**
* #ignore
*/
if (!defined('IN_IMGIT'))
{
exit;
}
$IMGit->error_report(true);
$IMGit->disable(false);
$IMGit->ieNote(true);
if (isset($_POST['zipsent']) || $_POST['zipsent'] == true && isset($_FILES['archive']))
{
if ($_FILES['archive']['size'] <= MAX_ZIPSIZE)
{
// Main variables
$key = $IMGit->random_key(10);
$move_zip = move_uploaded_file($_FILES['archive']['tmp_name'], ZIP_PATH . $key . $_FILES['archive']['name']);
$zip = ZIP_PATH . $key . $_FILES['archive']['name'];
$extension = substr($zip, -3);
$filename = $IMGit->zipContent($zip); // array
$url = str_replace('www.', '', $IMGit->generate_site_url());
// ZIP limit is 100 images
if (sizeof($filename) <= 100)
{
// Only ZIP archives
if ($extension == 'zip')
{
if ($filename)
{
foreach($filename as $key => $value)
{
// Get extension
$image_extension = substr($value, -3);
$image_extension = (strtoupper($image_extension)) ? strtolower($image_extension) : $image_extension;
$image_extRule = $image_extension == JPG || $image_extension == JPEG || $image_extension == GIF || $image_extension == PNG ||
$image_extension == BMP || $image_extension == ICO;
if ($image_extRule)
{
// Set variables and do some processing
$unZip = $IMGit->unZip($zip, IMAGES_PATH);
$url = str_replace('www.', '', $IMGit->generate_site_url());
$image_name = $IMGit->random_key(7) . $value;
$image_name = (strpos($image_name, ' ') !== false) ? str_replace(' ', '', $image_name) : $image_name;
if (file_exists(IMAGES_PATH . $filename[$key]))
{
// Rename extracted files
$rename = rename(IMAGES_PATH . $filename[$key], IMAGES_PATH . $image_name);
if ($rename && file_exists($zip) && sizeof($image_name))
{
// Delete ZIP
unlink($zip);
// Set URL variables
$image_urls = $url . IMAGES_PATH . $image_name;
$image = IMAGES_PATH . $image_name;
// Generate a thumbnail
$IMGit->generate_thumbnail($image_urls, $image_name, THUMBNAIL_SIZE, THUMBNAIL_SIZE, true, 'file', false, false, THUMBS_PATH);
$thumb_urls = $url . THUMBS_PATH . $image_name;
$filename[] = array('direct' => $image_urls, 'thumb' => $thumb_urls);
}
}
}
}
}
}
}
}
}
else
{
header('Location: index.php');
}
include('includes/header.php');
{
if ($_FILES['archive']['size'] > MAX_ZIPSIZE) { echo '<span id="home-info">The ZIP archive is bigger than 100 MB.</span>'; }
else if ($extension != 'zip') { echo '<span id="home-info">Only ZIP archives are upload ready.</span>'; }
else if (sizeof($filename) > 100) { echo '<span id="home-info">The number of the images inside the archive was more than 100.</span>'; }
else if (!$image_extRule) { echo '<span id="home-info">The extensions inside the ZIP did not match our allowed extension list.</span>'; unlink($zip); } // unlink zip if failed
else { echo '<span id="home-info">Image(s) was/were successfully uploaded!</span>'; }
}
?>
</div>
<br /><br /><br />
<img src="css/images/site-logo.jpg" id="logo" />
<br /><br /><br /><br /><br />
</div>
<div id="box">
<?php
global $filename, $image_urls, $thumb_urls;
echo '<br />';
echo '<div id="links">';
echo '<table>';
echo LINKS_DIRECT;
for($i = 0; $i < sizeof($filename); $i++) { echo $filename[$i]['direct'] . "\n"; }
echo LINKS_CLOSE;
echo LINKS_THUMB;
for($i = 0; $i < sizeof($filename); $i++) { echo $filename[$i]['thumb'] . "\n"; }
echo LINKS_CLOSE;
echo LINKS_BBCODE;
for($i = 0; $i < sizeof($filename); $i++) { echo '[IMG]' . $filename[$i]['direct'] . '[/IMG]' . "\n"; }
echo LINKS_CLOSE;
echo LINKS_HTML;
for($i = 0; $i < sizeof($filename); $i++) { echo '<img src="' . $filename[$i]['thumb'] . '" />' . "\n"; }
echo LINKS_CLOSE;
echo '</table>';
echo '<br />';
echo '<input type="reset" id="resetbtn-remote" class="button-sub" value="« Upload more" />';
echo '<br />';
echo '</div>';
?>
</div>
<?php include('includes/footer.php'); ?>
</div>
</body>
</html>
I guess the problem is inside the foreach loop (it was a for loop a few days ago, but it had the same problems), but I can't seem to fix it. Here is a short re-explanation:
1. I upload a ZIP archive.
2. The script unzips the archive.
3. The script renames the extracted files.
4. A thumbnail should be generated for every image that was in the ZIP (fails).
5. A link should be output for every image that was in the ZIP (fails).
Ideas?
You are re-using a variable ($filename) for two different purposes. At the top, add a line like this:
$file_list = array();
Later in the code, where you do this:
$filename[] = array('direct' => $image_urls, 'thumb' => $thumb_urls);
... change it to this:
$file_list[] = array('direct' => $image_urls, 'thumb' => $thumb_urls);
Later in your code where you loop, use foreach instead:
echo LINKS_DIRECT;
foreach ($file_list as $this_file)
    echo $this_file['direct'] . "\n";
echo LINKS_CLOSE;
echo LINKS_THUMB;
foreach ($file_list as $this_file)
    echo $this_file['thumb'] . "\n";
echo LINKS_CLOSE;
echo LINKS_BBCODE;
foreach ($file_list as $this_file)
    echo '[IMG]' . $this_file['direct'] . '[/IMG]' . "\n";
echo LINKS_CLOSE;
echo LINKS_HTML;
foreach ($file_list as $this_file)
    echo '<img src="' . $this_file['thumb'] . '" />' . "\n";
echo LINKS_CLOSE;
You've got a lot of other odd things going on in there, like using constants for HTML fragments. I think you should take another look at your process and eliminate some of the unnecessary steps and variables. I also see a global statement declaring several variables; none of them appear to be necessary.
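One of those odd steps is the extension check: substr($value, -3) returns "peg" for photo.jpeg, so a four-letter extension can never match, and the strtoupper() ternary always takes the strtolower() branch for any non-empty extension because strtoupper() then returns a non-empty string. A sketch using pathinfo() and in_array(); the allowed list is assumed to mirror the question's JPG/JPEG/GIF/PNG/BMP/ICO constants, whose actual values aren't shown, and both function names are hypothetical:

```php
<?php
// Sketch: extension check that also handles four-letter extensions.
// ASSUMPTION: this list mirrors the question's JPG/JPEG/GIF/PNG/BMP/ICO constants.
const ALLOWED_IMAGE_EXTENSIONS = ['jpg', 'jpeg', 'gif', 'png', 'bmp', 'ico'];

function isAllowedImage(string $fileName): bool
{
    // pathinfo() returns everything after the last dot ("" if there is none).
    $extension = strtolower(pathinfo($fileName, PATHINFO_EXTENSION));
    return in_array($extension, ALLOWED_IMAGE_EXTENSIONS, true);
}

function allowedImages(array $fileNames): array
{
    // Keep only image names and reindex from 0.
    return array_values(array_filter($fileNames, 'isAllowedImage'));
}
```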
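I fixed this problem just by removing the following part of the if condition:
file_exists($zip)
The loop unlink()s the ZIP right after the first image is renamed, so for every later image file_exists($zip) is false and the thumbnail and link steps are skipped.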
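Removing the check works; another option is to move the deletion out of the loop entirely and remove the archive once, after every image has been handled. A sketch of that structure, where $processImage is a hypothetical callback standing in for the question's rename and generate_thumbnail() steps and returning the direct/thumb links for one image; processArchive is likewise a made-up name:

```php
<?php
// Sketch: process every image first and delete the archive only after the
// loop. $processImage is a HYPOTHETICAL stand-in for the question's
// rename + generate_thumbnail() steps; it returns the links for one image.
function processArchive(string $zipPath, array $imageNames, callable $processImage): array
{
    $fileList = []; // separate list for the output links, as the first answer suggests
    foreach ($imageNames as $name) {
        $fileList[] = $processImage($name);
    }
    // The archive is no longer needed once every image has been handled.
    if (file_exists($zipPath)) {
        unlink($zipPath);
    }
    return $fileList;
}
```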
Hi, I am trying to use simple_html_dom for a text (website) clustering project, but I have run into a weird problem. When I echo inside the outer loop, the URL and the snippet are what you would expect, but when I echo the array contents I have gathered outside the loop, the URLs are OK but the snippets are gone and the last snippet is repeated in their place.
<?php
// create HTML DOM
include("simple_html_dom.php");
$search_query = 'something';
$j = 1;
$k = 1;
/*************************GOOGLE***************************/
for ($i = 0; $i < 1; $i++) {
    $url = sprintf('http://www.google.com/search?q=%s&start=%d', $search_query, 10 * $i);
    $html = file_get_html($url);
    foreach ($html->find('a[class=l]') as $element) {
        $urls[$j] = $element->href;
        echo $element->href . "\n\n\n\n\n";
        $j++;
    }
    foreach ($html->find('div[class=s]') as $element) {
        $snippets[$k] = $element->innertext;
        echo $element->innertext . "\n\n\n\n\n";
        $k++;
    }
}
$j = 1;
foreach ($snippets as $elemement) {
    echo $urls[$j] . "\n" . $element . "\n\n\n\n";
    $j++;
}
?>
Are you sure you did not make a typo in your code?
foreach ($snippets as $elemement) {
    echo $urls[$j] . "\n" . $element . "\n\n\n\n";
    $j++;
}
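$element and $elemement are different variables. Your loop executes fine, but the echo inside it uses $element, which still holds the last snippet from the earlier foreach over div[class=s]; that is why the last snippet appears every time.
You made a typo: $elemement should be $element.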
foreach ($snippets as $element) {
    echo $urls[$j] . "\n" . $element . "\n\n\n\n";
    $j++;
}
This is one reason to get used to writing readable code: not because others like it, but because it makes debugging much easier.
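One way to make this kind of slip harder is to iterate with keys, so the URL for each snippet is looked up by the same key and there is no separate counter or differently named loop variable. A sketch, assuming both arrays use the same keys (starting at 1, as in the question); pairResults is a hypothetical name:

```php
<?php
// Sketch: iterate over the snippets with their keys, so the matching URL is
// looked up by the same key and no separate counter ($j) can drift out of sync.
function pairResults(array $urls, array $snippets): array
{
    $pairs = [];
    foreach ($snippets as $index => $snippet) {
        $pairs[] = ($urls[$index] ?? '(no url)') . "\n" . $snippet;
    }
    return $pairs;
}

$urls = [1 => 'http://example.com/a', 2 => 'http://example.com/b'];
$snippets = [1 => 'First snippet', 2 => 'Second snippet'];
echo implode("\n\n", pairResults($urls, $snippets)), "\n";
```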