PHP methods for implementing a pseudo cache system for files

This question is more about methodology than actual lines of code.
I would like to know how to implement pseudo caching (for lack of a better name) for FILES in PHP. I have tried to read some articles, but most of them refer to PHP's internal caching system, not to what I need, which is a FILE cache.
I have several scenarios where I need such a system:
Scenario 1:
While accessing a post and clicking a link, all the post attachments are collected and added to a zip file for download.
Scenario 2:
Accessing a post, the script scans the content, extracts all links, downloads a matching image for each link (or dynamically prepares one) and then serves those to the browser (but not after checking an expiration period??).
(These examples use "post" and "attachment" because I use WordPress and that is WordPress terminology; both currently work fine for me, except that they generate the file over and over again.)
My doubt regarding the two scenarios (especially No. 2): how do I prevent the script from doing the operation EVERY time the page is accessed? (In other words, if the file exists, just serve it without looping through the whole creation operation again.)
My first instinct was to give the file some distinctive name (but not load-unique like uniqid()) and then check whether it already exists on the server, but that presents several problems (for example, the name might already exist but belong to another post), and it also seems very resource-intensive for a server with 20,000 images.
The second thing I thought of was to somehow associate metadata with those files, but then again, how would I implement it? How would I know which link belongs to which image?
Also, in a case where I check for the file's existence on the server, how can I know whether the file SHOULD be changed (and therefore recreated)?
Since I am referring to WordPress, I thought about storing those images as base64 from binary directly in the DB with the Transients API, but that feels quite clumsy.
To sum up the question: how do I generate a file, but also know whether it already exists and serve it directly when needed? Is my only option to store the file name in the DB and associate it somehow with the post? That seems so inefficient.
EDIT I
I decided to include some example code, as it can help people understand my dilemma.
function o99_wbss_prepare_with_callback( $content, $width = '250' ) {
    $content = preg_replace_callback( '/(http[s]?:[^\s]*)/i', 'o99_wbss_prepare_cb', $content );
    return $content;
}

function o99_wbss_prepare_cb( $match ) {
    $url      = esc_url_raw( $match[1] ); // someone said this is not needed??
    $url_name = parse_url( $url );
    $url_name = $url_name['host'];        // get rid of http://..
    $param    = '660';
    $url      = 'http://somescript/' . urlencode( $url ) . '?w=' . $param;
    $uploads  = wp_upload_dir();
    //$uniqid = uniqid();
    $img      = $uploads['basedir'] . '/tmp/' . $url_name . '.jpg'; // was with $uniqid...

    // fetch once, suppress warnings, and check the result
    $file = @file_get_contents( $url );
    if ( $file === false ) {
        return 'path ' . $url . " doesn't exist or is unreachable";
    }

    // here I will need some check whether the file was already generated, and
    // if so - just serve it ..
    if ( $file ) {
        file_put_contents( $img, $file );
        // Do some other operations on the file and prepare a new one ...
        // this produces a NEW file in the wp-uploads folder with the same name...
        unlink( $img );
    }
    return $url;
}
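
For reference, the check referred to in the comment above could look roughly like this; a minimal sketch that assumes the same $img path, with the helper name and the one-day lifetime invented purely for illustration:

// Sketch only: the helper name and the 86400-second (one day) lifetime are assumptions.
function o99_wbss_get_cached( $img, $max_age = 86400 ) {
    if ( file_exists( $img ) && ( time() - filemtime( $img ) ) < $max_age ) {
        return $img; // still fresh: reuse the existing file instead of regenerating it
    }
    return false;    // missing or stale: the caller should regenerate the file
}

Inside o99_wbss_prepare_cb() you would call it right before the remote download and return early whenever it hands back a path.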

For Scenario 1:
WordPress stores all post attachments as rows in the posts table. When a post is accessed, run a function, either in a small plugin or in your theme's functions.php, on the pre_get_posts hook. Check whether you have already created the zip file with file_exists(), using a unique name for each archive you create; the post ID or permalink would be a good idea, although you would need to make sure there is no user-specific content. You can use filemtime() to check when the file was created and whether it is still relevant. If the zip file does not exist, create it: pre_get_posts passes the query object, which contains the post ID, so just grab all the post attachments using get_posts() with the parent ID set to the ID from the query object. The GUID field contains the URL for each attachment; then just generate a zip archive using ZipArchive(), as in the sketch below.
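
A minimal sketch of that flow; pre_get_posts, get_posts(), filemtime() and ZipArchive() are as named above, while the helper name, the /zips/ cache folder and the one-hour lifetime are illustrative assumptions:

// Sketch only: the hook and core functions are real; the helper name, cache
// folder and lifetime are assumptions for illustration.
add_action( 'pre_get_posts', 'o99_maybe_build_post_zip' );

function o99_maybe_build_post_zip( $query ) {
    if ( ! $query->is_main_query() || ! $query->is_single() ) {
        return;
    }

    $post_id = $query->get( 'p' ); // post ID from the query object (with pretty
                                   // permalinks you may need to resolve it differently)
    if ( ! $post_id ) {
        return;
    }

    $uploads = wp_upload_dir();
    $zip     = $uploads['basedir'] . '/zips/post-' . $post_id . '.zip'; // assumes /zips/ exists and is writable

    // Reuse the cached archive if it exists and is still fresh.
    if ( file_exists( $zip ) && ( time() - filemtime( $zip ) ) < HOUR_IN_SECONDS ) {
        return;
    }

    $attachments = get_posts( array(
        'post_type'   => 'attachment',
        'post_parent' => $post_id,
        'numberposts' => -1,
    ) );

    $archive = new ZipArchive();
    if ( $archive->open( $zip, ZipArchive::CREATE | ZipArchive::OVERWRITE ) === true ) {
        foreach ( $attachments as $attachment ) {
            $path = get_attached_file( $attachment->ID ); // local path of the attachment file
            if ( $path && file_exists( $path ) ) {
                $archive->addFile( $path, basename( $path ) );
            }
        }
        $archive->close();
    }
}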
For Scenario 2:
If your WordPress templates are set up to use the WordPress functions, then swap the attachment functions for ones that return the URL of your cached copy. For example, the_post_thumbnail() would become wp_get_attachment_thumb_url(): copy the file to your cache and use the cache URL as the output. If you want to cache the DOM for the page as well, use ob_start(). Then just run a check at the start of the template using file_exists() and filemtime(); if both are valid, read in the cached DOM instead of loading the page.
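
A minimal sketch of the ob_start() part, placed at the very top of the template; the cache directory and the one-hour lifetime are assumptions, not something the answer specifies:

// Sketch only: cache path and lifetime are illustrative assumptions.
$cache_file = WP_CONTENT_DIR . '/cache/page-' . get_queried_object_id() . '.html';

if ( file_exists( $cache_file ) && ( time() - filemtime( $cache_file ) ) < HOUR_IN_SECONDS ) {
    readfile( $cache_file ); // serve the cached DOM and skip rendering entirely
    exit;
}

ob_start();
// ... the normal template output runs here ...
$html = ob_get_contents();
file_put_contents( $cache_file, $html ); // store the freshly rendered page
ob_end_flush();                          // and still send it to the browser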

Related

Generating unique download link to download once only

I want to create a few unique download links for my users. The reason is that I want to let them download once only, so that they cannot reuse the same link to download again.
I've generated a few keys (for example qwertyasdfghzxcbn, so the download link will look like www.xxxxx.com/download.php?qwertyasdfghzxcbn) in the database, along with a flag field that is updated to 1 once the user has downloaded the file.
I did a search on the net and found this.
http://www.webvamp.co.uk/blog/coding/creating-one-time-download-links/
But that only works when you go to the page first; only then does the page generate the unique link. I've already pre-generated the links in my database, so I don't need to generate them again; in fact, if I generated the key when the user goes to the page, they would be able to download multiple times by refreshing the page.
The solution would be to make the link target itself a PHP script.
You'd hide the actual file somewhere inaccessible from the browser (i.e., somewhere you can reach the file via fopen(), but that isn't within the document root), and put a download.php script in charge of serving the files.
The download script itself would look something like this:
$fileid = $_REQUEST['file'];
$file = file_location($fileid); // you'd write this function somehow
if ($file === null) die("The file doesn't exist");

$allowed = check_permissions_for($file, $fileid); // again, write this
// the previous line lets you implement arbitrary checks on the file
if ($allowed) {
    mark_downloaded($fileid, $file); // mark it as downloaded if it's single-use
    header("Content-Type: application/octet-stream"); // downloadable file
    echo file_get_contents($file);
    return 0; // a return 0; outside any function ends the script
} else {
    die("You're not allowed to download this file");
}
Any link you hand out would simply point to download.php?fileid=712984 (whatever the file ID actually is). That is the actual download link, since that script transfers the file, but only if the user is allowed to retrieve it. You'd have to write the file_location(), check_permissions_for() and mark_downloaded() functions yourself, though.
I would suggest using the uniqid() function and storing the unique IDs with an expiration date in a database, while returning to the user a URL like ...?file_id=$id.
When the link is opened, you may delete it from the database or mark it to be deleted 'soon' (just in case the user wants to refresh the page).
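
A minimal sketch of that approach; the downloads table with token and expires_at columns, the one-day lifetime and the use of PDO are all assumptions for illustration:

// Sketch only: table/column names and the lifetime are made up.
$pdo = new PDO( 'mysql:host=localhost;dbname=test', 'user', 'pass' );

// When creating a link:
$token   = uniqid( '', true );
$expires = date( 'Y-m-d H:i:s', time() + 86400 ); // valid for one day
$stmt = $pdo->prepare( 'INSERT INTO downloads (token, expires_at) VALUES (?, ?)' );
$stmt->execute( array( $token, $expires ) );
echo 'http://www.example.com/download.php?file_id=' . $token;

// In download.php: look the token up, refuse expired ones, then delete it
// (or flag it) so the same link cannot be reused.
$stmt = $pdo->prepare( 'SELECT token FROM downloads WHERE token = ? AND expires_at > NOW()' );
$stmt->execute( array( $_GET['file_id'] ) );
if ( ! $stmt->fetch() ) {
    die( 'This link has expired or was already used.' );
}
$pdo->prepare( 'DELETE FROM downloads WHERE token = ?' )->execute( array( $_GET['file_id'] ) );
// ... then stream the file as in the download.php example above ...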

How do I find the filename of an image on a MediaWiki page using php?

How do I find the filename of an image on a MediaWiki site?
I don't want to put the filename in manually. I need PHP code which will fetch me the filename.
I can use $f = wfFindFile( $filename ); but HOW DO I GET $filename?
I've been looking at the File class but I can't figure out how to use File::getFilename(); I keep getting an error: call to undefined method.
What am I doing wrong?
Explaining in more detail:
I would like to add the Pin It button to my site, so that when you click the button it posts to a Pinterest board with the image and a description of the image. I need to use PHP to send the image information so it works on every page of my site; I can't code the image name manually each time.
So far I have the code:
<img border="0" src="//assets.pinterest.com/images/PinExt.png" title="Pin It" />
Which works great, except I need to put in a value for $f (the image name). My question is: how do I get the value of $f without putting it in manually, e.g. $f = wfFindFile( 'Sunset.jpg' );?
I would have thought this would be a really common request for anyone trying to add Pinterest to their site.
Thanks
The $filename you are looking for is basically the name the file was given in MediaWiki when it was uploaded, for example Landscape-plain.jpg. You just use the wfFindFile() helper function to get a File object, then call its methods:
$ php maintenance/eval.php
> $file = wfFindFile( 'Landscape-plain.jpg' );
> print $file->getName();
Landscape-plain.jpg
> print $file->getPath();
mwstore://local-backend/local-public/b/b0/Landscape-plain.jpg
> print $file->getFullPath();
/path/to/images/b/b0/Landscape-plain.jpg
> print $file->getTitle();
File:Landscape-plain.jpg
> exit
API documentation:
http://svn.wikimedia.org/doc/classFile.html
http://svn.wikimedia.org/doc/classLocalFile.html
EDIT BELOW
The file information is available through a File object, so you definitely need to use wfFindFile() to get such an object.
To actually find the filename for the page the user is currently browsing, use the request context and get its title:
$context = RequestContext::getMain();
$title   = $context->getTitle();
if ( $title->getNamespace() === NS_FILE ) {
    $filename = $title->getPrefixedText();
    // do your stuff.
}
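
Putting the pieces together for the Pinterest use case might look roughly like this; a sketch only: wfFindFile() accepting a Title object and File::getFullUrl() are standard MediaWiki calls, but treat the snippet as an illustration rather than tested extension code:

$context = RequestContext::getMain();
$title   = $context->getTitle();
if ( $title && $title->getNamespace() === NS_FILE ) {
    $file = wfFindFile( $title );              // wfFindFile() also accepts a Title object
    if ( $file ) {
        $imageUrl    = $file->getFullUrl();    // absolute URL of the image file
        $description = $title->getText();      // e.g. "Sunset.jpg", without the "File:" prefix
        // urlencode() these two into the pinterest.com/pin/create/button/ link
        // that wraps the PinExt.png image from the question.
    }
}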

How to download the updated version of a server file in IE (PHP)?

Internet Explorer is giving me a serious headache...
What I want to do is create a button, upon clicking which a .csv file gets downloaded to the client. This .csv file includes information stored in one of the result tables I am producing on the page.
Before I create this button, I call an internal function to create the .csv file based on the currently displayed table; I create this .csv file on the server. I'll include this function here just in case, but I don't think it is of much help. As I said, I create this file before I create the button.
/*
/ Creates a .csv file including all the data stored in $records
/ #table_headers - array storing the table headers corresponding to $records
/ #records - two-dimensional array storing the records that will be written into the .csv file
/ #filename - string storing the name of the .csv file created by this function
*/
public function export_table_csv( $table_headers, $records, $filename )
{
    // Open $filename and store the handle to it in $csv_file
    $csv_file = fopen( "temp/" . $filename, "w" );

    // Write the $table_headers into $csv_file as the first row
    fputcsv( $csv_file, $table_headers );

    // Iterate through $records and write each record as one comma-separated line to $csv_file
    foreach ( $records as $row )
        fputcsv( $csv_file, $row );

    // Close $csv_file
    fclose( $csv_file );
} // end export_table_csv()
I have that working fine. I've got the 'Export' button, and I am using its onClick() event with a one-liner:
window.open( 'temp/' . $export_filename );
Now, it works as intended in all browsers except IE. The file still gets downloaded, but when I apply some filtering to the table displayed on the page (the page reloads whenever new filters are applied) and then press the 'Export' button again, it somehow downloads an old version of the .csv file with the old filters applied, not the current ones, even though the .csv file is rewritten every time new filters are applied and the page reloads.
It is as if the .csv file I am exporting were stored in IE's cache or something... It is really annoying, as the export works fine in all other browsers. Chrome and FF always download the latest version of the file from the server; IE updates the file seemingly at random, sometimes only after I submit the page with different filters a few times...
I didn't include many lines of my code, as I suspect I am simply missing some kind of meta tag rather than having a logical bug in the lines I have already written.
I am really confused by this, and annoyed to say the least... I am really starting to dislike IE now...
I appreciate any suggestions on this matter.
You could use a 'cache buster' to prevent IE from caching the resource.
If you add a GET parameter (with a value that changes every time you load the page) to the URL, IE (or rather, any browser) will treat it as a different file to fetch, so do something like this (the echo builds the JavaScript for the onClick handler):
echo "window.open( 'temp/" . $export_filename . "?cachebuster=" . uniqid() . "' );";
If the value needs to change on every click (not just on page load), let JavaScript append it instead:
echo "window.open( 'temp/" . $export_filename . "?cachebuster=' + Math.random() );";
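
Put together, the markup generated by PHP for the button might look roughly like this; a sketch only, where the button element, the htmlspecialchars() escaping and the combined uniqid()/Math.random() buster are added for illustration (only $export_filename comes from the question):

<?php $href = 'temp/' . $export_filename . '?cachebuster=' . uniqid(); ?>
<button onclick="window.open('<?php echo htmlspecialchars( $href, ENT_QUOTES ); ?>' + '&r=' + Math.random());">
    Export
</button>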

Fetch an HTML page and store it in MySQL - how to

What's the best way to store a formatted HTML page with CSS in a MySQL database? Is it possible?
What should the column type be? How do I retrieve the stored, formatted HTML and display it correctly using PHP?
What if the page I would like to fetch has pics and videos? Should I store the page as a blob?
What's the best way to fetch a page using PHP (cURL, fopen, ...)?
Many questions, guys, but I really need your help to put me on the right track.
Thanks a lot.
Quite simple; try this code I made for you.
It covers the basics of grabbing a page and saving its source in a DB.
I didn't put in error handling or anything else; let's keep it simple for the moment...
I didn't write a function to show the result, but you can print $source to view it.
Hope this helps.
<?php
function GetPage($URL)
{
    # Get the source content of the URL
    $source = file_get_contents($URL);

    # Extract the raw URL from the current one
    $scheme  = parse_url($URL, PHP_URL_SCHEME); // Ex: http
    $host    = parse_url($URL, PHP_URL_HOST);   // Ex: www.google.com
    $raw_url = $scheme . '://' . $host;         // Ex: http://www.google.com

    # Replace the relative links with absolute ones
    $relative = array();
    $absolute = array();

    # Strings to search for
    $relative[0] = '/src="\//';
    $relative[1] = '/href="\//';

    # Strings to replace them with
    $absolute[0] = 'src="' . $raw_url . '/';
    $absolute[1] = 'href="' . $raw_url . '/';

    $source = preg_replace($relative, $absolute, $source); // Ex: src="/image/google.png" to src="http://www.google.com/image/google.png"

    return $source;
}

function SaveToDB($source)
{
    # Connect to the DB
    $db = mysql_connect('localhost', 'root', '');

    # Select the DB name
    mysql_select_db('test');

    # Ask for UTF-8 encoding
    mysql_query("SET NAMES 'utf8'");

    # Escape special chars
    $source = mysql_real_escape_string($source);

    # Set the query
    $query = "INSERT INTO website (source) VALUES ('$source')"; // Save it in a text row, that's it...

    # Run the query
    mysql_query($query);

    # Close the connection
    mysql_close($db);
}

$source = GetPage('http://www.google.com');
SaveToDB($source);
?>
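
The answer above leaves out the "show the result" part; a minimal sketch of it, sticking with the same (long-deprecated) mysql_* API and the website table from SaveToDB(). The auto-increment id column used for ordering is an assumption, since the code above never defines the table:

function ShowLastPage()
{
    # Connect and select the DB exactly as in SaveToDB()
    $db = mysql_connect('localhost', 'root', '');
    mysql_select_db('test');
    mysql_query("SET NAMES 'utf8'");

    # Fetch the most recently saved page (assumes an auto-increment `id` column)
    $result = mysql_query('SELECT source FROM website ORDER BY id DESC LIMIT 1');
    $row    = mysql_fetch_assoc($result);
    mysql_close($db);

    # Output the stored HTML as-is
    echo $row['source'];
}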
Pull down the whole page using fopen and parse out any URLs (such as images and CSS). You'll want to run a loop that grabs each of the URLs for the files that make up the page. Store those as well, and replace the URLs that pointed to the other site's files with your new links (this avoids problems if those files change or are removed in the future).
I'd recommend using a blob datatype, simply because it lets you store all the files in one table, but you could also use a table with a text datatype for the pages and another with a blob to store images and other files.
Edit:
If you are storing as a blob datatype, look into base64_encode(); it will increase the storage footprint on the server, but you'll avoid any issues with quotes and special characters.
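
For example (a two-line sketch; $stored_value stands for whatever you read back from the column):

$encoded  = base64_encode($source);       // store this in the blob/text column: no quote or charset issues
$original = base64_decode($stored_value); // after SELECTing it back, this restores the exact bytes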
Don't use a relational database to store files. Use a filesystem or a NoSQL solution.
You might want to look into the various open-source spiders that are available (htdig and httrack come to mind).
I'd store the URLs in a database and make a cron job to wget the pages regularly, storing them in their own keyed local directories. Using wget lets you cache the page and, optionally, its images, scripts, etc. as well. You can also have your wget command rewrite the embedded URLs so that you don't have to cache everything.
Here is the man page for wget; you may also consider searching for "wget backup website" or similar.
(By "keyed directories" I mean that your database table would have two fields, a 'key' and a 'url'; the [unique] 'key' would then be the path where you archive the website using wget.)
You can store the data with a text datatype in MySQL, but you have to convert the data, because the page may contain many quotes and special characters.
You can see THIS question. It's not exactly your question, but it will help when you store the data in the database.
About the images and videos: if you are storing the page content, there will only be paths to those images and videos, so no problem will come up when you store it in the database.

How to extract pictures from a website which uses a timestamp as the name

I think I already know the answer to this question, but curious as I am, I'll ask it anyway.
I'm running a webshop whose products come with a CSV file. I can import all the objects without any trouble; the only thing is that the image and thumbnail locations are not exported with the database dump. (It's never perfect, heh.) You might say: do it manually then. That's what I did in the first place, but after 200 products and RSI, I gave up and looked for a better, more efficient way to do this.
I have asked my distributor, and I can use their images for my own purposes without any copyright problems.
When I look at the location of the images, the URL looks like this:
../img/i.php?type=i&file=1250757780.jpg
Does anyone have an idea of how this problem can be tackled?
For scraping a website, I found this code:
<?php
function save_image($pageID) {
    $base = 'http://www.gistron.com';

    // Use cURL functions (or file_get_contents) to "open" the page and
    // load $page with the source code of the target page.
    // NOTE: this step was missing in the code as found; the URL pattern below
    // is only a placeholder guess at how $pageID maps to a product page.
    $page = @file_get_contents($base . '/product.php?id=' . $pageID);
    if ($page === false) {
        return false;
    }

    // Find catalog/ images on this page
    preg_match_all('~catalog/([a-z0-9\.\_\-]+(\.gif|\.png|\.jpe?g))~i', $page, $matches);
    /*
    $matches[0] => array of image paths (as in source code)
    $matches[1] => array of file names
    $matches[2] => array of extensions
    */
    $success = false;
    for ($i = 0; $i < count($matches[0]); $i++) {
        $source = $base . '/' . $matches[0][$i];
        $tgt = $pageID . $matches[2][$i]; // NEW file name: ID + extension
        if (copy($source, $tgt)) $success = true;
        else $success = false;
    }
    return $success; // Rough validation. Only reports the last image from the source
}

// Download images from each page
for ($i = 1; $i <= 6000; $i++) {
    if (!save_image($i)) echo "Error with page $i<br>";
}
?>
For some reason it reports this error for every page: Error with page 1, Error with page 2, etc.
Well, you can either get the distributor to include the image names in the CSV file, so you can construct the URLs directly, or you will have to scrape their website with a script and fetch the images (I'd ask them for permission before doing this).
That URL doesn't really tell you where the picture is located, only that a script, i.php, will be called with the file name passed in as the file parameter on the query string.
Where the i.php script then goes to actually find the image cannot be deduced from just the info you present here. You'd have to inspect the script to find out that information, I think.
