I have a project where I'm scraping data from a few sites, then outputting it onto one site. To help with load times, I'm trying to rig it so that once every 10 minutes my main website does a full data scrape, then stores everything in a folder called "cache" in the root directory. Any time I refresh the main site within that 10-minute window, it should pull from the cache, making load times fast.
Trouble is, load times haven't changed, which they really should with this method, so I'm doing something wrong. Would appreciate any help. I can confirm the data IS being stored in the cache, because I see the files automatically appearing there. So the issue has to be in the code that's supposed to grab the data from the cache after it's been stored every 10 minutes: it's not grabbing the data.
*Part of me wonders if the issue is with how the filenames are being saved in the cache; right now they seem to be random values. For example, one is named f32dd7f0b85eb4c1be0bb9a417cc29ea553d898e.html.
I'd think it needs to be saved under a specific filename, but I'm not sure how to achieve that. The code at the end of my PHP reference files seems to specify this, so I'm not sure what the issue is. The code that's supposed to be doing this is at the bottom of the post.
I'm really new to PHP, and honestly have only gotten this far through some very nice and helpful people. I'm close, but not quite there yet with this cache framework.
global.php in root folder:
<?php
$_cache_time = 600;       // 10 minutes
$_cache_dir  = "./cache"; // cache dir

function deleteBlankInArray($var)
{
    return !ctype_space($var) && !empty($var);
}

function cache_start($filename)
{
    global $_cache_dir, $_cache_time;
    $cachefile = $_cache_dir . '/' . sha1($filename) . '.html';
    ob_start();
    if (file_exists($cachefile) && (time() - $_cache_time < filemtime($cachefile))) {
        include($cachefile);
        ob_flush();
        return true;
    }
    return false;
}

function cache_end($filename)
{
    global $_cache_dir, $_cache_time;
    $cachefile = $_cache_dir . '/' . sha1($filename) . '.html';
    $fp = fopen($cachefile, 'w');
    fwrite($fp, ob_get_contents());
    fclose($fp);
    ob_flush();
}
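As a side note on how these helpers locate their files: both functions derive the cache filename by hashing the key they are given, so the key passed to cache_start() and the one later passed to cache_end() must be the exact same string. A minimal, self-contained sketch of that mechanism (using the system temp directory as a stand-in for the "./cache" folder, and two hypothetical keys):

```php
<?php
// Demonstrates how the sha1-derived cache filenames work: two different
// keys produce two different filenames, so a file written under one key
// is never found when looked up under another.
$cacheDir = sys_get_temp_dir(); // stand-in for the "./cache" folder

$writeKey = 'litecoinchange';     // hypothetical key used when writing
$readKey  = 'litecoinchange.php'; // hypothetical key used when reading

$writtenFile  = $cacheDir . '/' . sha1($writeKey) . '.html';
$lookedUpFile = $cacheDir . '/' . sha1($readKey) . '.html';

@unlink($lookedUpFile); // make sure no stale file skews the demo
file_put_contents($writtenFile, 'cached output');

// The lookup misses because the two filenames differ.
var_dump($writtenFile === $lookedUpFile); // bool(false)
var_dump(file_exists($lookedUpFile));     // bool(false)
```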
My main website is an XHTML site. It references these PHP pages like this:
<?php include 's&pcurrent.php';?>
<?php include 'news.php';?>
It includes and outputs multiple PHP files, which is why load times are slow when not pulling from the cache.
And lastly, this is an example of one of the PHP files being included. This one is called litecoinchange.php:
<?php
error_reporting(E_ALL ^ E_NOTICE ^ E_WARNING);
include_once "global.php";

// filename of the file
if (!cache_start("litecoinchange.php")) {
    $doc = new DOMDocument;
    // We don't want to bother with white space
    $doc->preserveWhiteSpace = false;
    $doc->strictErrorChecking = false;
    $doc->recover = true;
    $doc->loadHTMLFile('https://coinmarketcap.com/');

    $xpath = new DOMXPath($doc);
    $query = "//tr[@id='id-litecoin']";
    $entries = $xpath->query($query);
    foreach ($entries as $entry) {
        $result = trim($entry->textContent);
        $ret_ = explode(' ', $result);
        // make sure no element in the array starts or ends with a blank
        foreach ($ret_ as $key => $val) {
            $ret_[$key] = trim($val);
        }
        // delete empty elements and elements that are only "\n" "\r" "\t"
        // I modified this line
        $ret_ = array_values(array_filter($ret_, 'deleteBlankInArray'));
        // echo the last element
        echo $ret_[7];
        // filename of the file
        cache_end("litecoinchange");
    }
}
It works fine, but at some point the count just drops to a random number. My guess is that my code cannot process multiple visits at a time. Below is the code where the increment happens and where the count is displayed:
<?php
$args_loveteam = array('child_of' => 474);
$loveteam_children = get_categories($args_loveteam);
if (in_category('loveteams', $post->ID)) {
    foreach ($loveteam_children as $loveteam_child) {
        $post_slug = $loveteam_child->slug;
        echo "<script>console.log('" . $post_slug . "');</script>";
        if (in_category($loveteam_child->name)) {
            /* counter */
            // open file to read the saved hit number
            if ($loveteam_child->slug == "loveteam-mayward") {
                $datei = fopen($_SERVER['DOCUMENT_ROOT'] . "/wp-content/themes/inside-showbiz-Vfeb13.ph-updated/countlog-" . $post_slug . "-2.txt", "r");
            } else {
                $datei = fopen($_SERVER['DOCUMENT_ROOT'] . "/wp-content/themes/inside-showbiz-Vfeb13.ph-updated/countlog-" . $post_slug . ".txt", "r");
            }
            $count = fgets($datei, 1000);
            fclose($datei);
            $count = $count + 1;
            // open file to write the new hit number
            if ($loveteam_child->slug == "loveteam-mayward") {
                $datei = fopen($_SERVER['DOCUMENT_ROOT'] . "/wp-content/themes/inside-showbiz-Vfeb13.ph-updated/countlog-" . $post_slug . "-2.txt", "w");
            } else {
                $datei = fopen($_SERVER['DOCUMENT_ROOT'] . "/wp-content/themes/inside-showbiz-Vfeb13.ph-updated/countlog-" . $post_slug . ".txt", "w");
            }
            fwrite($datei, $count);
            fclose($datei);
        }
    }
}
?>
I would at least change your code to this:
foreach ($loveteam_children as $loveteam_child) {
    $post_slug = $loveteam_child->slug;
    echo "<script>console.log('" . $post_slug . "');</script>";
    if ($loveteam_child->slug == "loveteam-mayward") {
        $filename = "{$_SERVER['DOCUMENT_ROOT']}/wp-content/themes/inside-showbiz-Vfeb13.ph-updated/countlog-{$post_slug}-2.txt";
    } else {
        $filename = "{$_SERVER['DOCUMENT_ROOT']}/wp-content/themes/inside-showbiz-Vfeb13.ph-updated/countlog-{$post_slug}.txt";
    }
    $count = (int) file_get_contents($filename);
    file_put_contents($filename, ++$count, LOCK_EX);
}
You could also try flock() on the file to get a lock before modifying it. That way, if another process comes along, it has to wait for the first one to finish. file_put_contents() with LOCK_EX works well for things like logging where you may have many processes competing for the same file.
A database should also be OK, though even that may not be fast enough; it shouldn't mess up your data, at least.
Anyway, hope this helps. This is kind of an odd question; concurrency can be a real pain when you have a high chance of process collisions, race conditions, and so on.
However, as I mentioned in the comments, the filesystem is probably not going to provide the consistency you need. The best fit may be some kind of in-memory storage such as Redis, but that's hard to say without fully knowing what you use it for, such as whether it needs to persist across server reboots.
Hope it helps, good luck.
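To illustrate the flock() suggestion, here is a minimal sketch of an exclusively locked read-increment-write cycle. The file path and function name are hypothetical, chosen just for the demo:

```php
<?php
// Sketch of an atomic counter increment using flock(): a concurrent request
// blocks on the exclusive lock instead of reading a stale count and
// overwriting the first writer's update.
function incrementCounter(string $file): int
{
    $fp = fopen($file, 'c+'); // create if missing, do not truncate
    flock($fp, LOCK_EX);      // block until we hold an exclusive lock

    $count = (int) stream_get_contents($fp);
    $count++;

    ftruncate($fp, 0);        // replace the old value
    rewind($fp);
    fwrite($fp, (string) $count);
    fflush($fp);

    flock($fp, LOCK_UN);      // release the lock for the next process
    fclose($fp);
    return $count;
}

// Hypothetical counter file for the demo.
$file = sys_get_temp_dir() . '/countlog-demo.txt';
@unlink($file);

echo incrementCounter($file) . "\n"; // 1
echo incrementCounter($file) . "\n"; // 2
```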
I'm looping through a directory, trying to find XML files with errors.
$baddies = array();
foreach (glob("fonts/*.svg") as $filename) {
    libxml_use_internal_errors(true);
    $str = file_get_contents($filename);
    $sxe = simplexml_load_string($str);
    $errors = libxml_get_errors();
    $num_of_errors = 0;
    $num_of_errors = sizeof($errors);
    if ($num_of_errors > 0) {
        array_unshift($baddies, $filename);
    }
}
However, it seems that once the errors are put into this object, they persist through subsequent iterations of the loop, and files without errors still test positive: $num_of_errors stays high for good files. I reset it to zero, and have even tried unsetting it after each pass through the loop. I suppose libxml_get_errors continues to retain a value once set. How can I reset it?
I think you should use the libxml_clear_errors() function. As the documentation says, libxml keeps the errors stored in a buffer until it is explicitly cleared.
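Putting that advice into the original loop, a sketch might look like this. The helper name findBadXmlFiles is mine, introduced so the logic is testable against any list of files (the asker would call it with glob("fonts/*.svg")):

```php
<?php
// Returns the subset of $filenames that fail to parse as XML. The key fix
// is libxml_clear_errors() at the end of each iteration, so errors from one
// file don't carry over and make later, valid files test positive.
function findBadXmlFiles(array $filenames): array
{
    libxml_use_internal_errors(true); // collect errors instead of emitting warnings
    $baddies = array();
    foreach ($filenames as $filename) {
        simplexml_load_string(file_get_contents($filename));
        if (count(libxml_get_errors()) > 0) {
            $baddies[] = $filename;
        }
        libxml_clear_errors(); // reset the buffer before the next file
    }
    return $baddies;
}

// e.g. $baddies = findBadXmlFiles(glob("fonts/*.svg"));
```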
Some of our pages use a lot of processing power, so when users are not signed in it makes sense to cache them completely.
I'm using APC and the following code to try to accomplish this:
$key = "forum-thread-list-cache";
if (($data = apc_fetch($key)) === false) {
    ob_start();
    $forumOb = new Forum();
    $threadList = $forumOb->getThreadList();
    require "templates/forum.php";
    $data = ob_get_contents();
    ob_end_clean();

    // Debugging
    file_put_contents("/home/user/log.txt", $data);

    // 15 minutes
    apc_store($key, $data, 60 * 15);
    flush();
}
The generated HTML appears in log.txt, but I can't get it to show up in the APC user cache entries.
The generated HTML is around 18kb in size.
Am I doing something wrong here?
Here are my runtime settings; is there anything in here that would prevent 18kb of HTML from being cached?
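One way to narrow this down is to check what apc_store() actually returns rather than assuming it succeeded. A diagnostic sketch, under the assumption that a failed store, a disabled SAPI (apc.enable_cli defaults to off, so CLI tests can fail while the web server caches fine), or a missing extension are the likely culprits:

```php
<?php
// Returns a short diagnosis of why an APC store might not be visible.
// All branch labels are mine, for illustration.
function apcDiagnostic(string $key, string $data): string
{
    if (!function_exists('apc_store')) {
        return 'extension-missing';   // APC not loaded at all
    }
    if (!ini_get('apc.enabled')) {
        return 'disabled-for-sapi';   // e.g. apc.enable_cli = 0 on the CLI
    }
    if (!apc_store($key, $data, 60 * 15)) {
        return 'store-failed';        // worth checking apc.shm_size here
    }
    return apc_fetch($key) === $data ? 'ok' : 'fetch-mismatch';
}

// ~18kb payload, like the generated HTML in the question
echo apcDiagnostic('forum-thread-list-cache', str_repeat('x', 18 * 1024));
```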
With the following code I'm creating an XML file from the info obtained from my database:
<?php
//include 'config.php';
include '/var/www/html/folder/config.php';

$now = date('Y-m-d h:i:s');
echo "Date: " . $now . "<br><br>";

$sql = "SELECT * FROM awards WHERE active=3";
$result = mysql_query($sql);

// create doctype
$dom = new DOMDocument("1.0");
// create root element
$root = $dom->createElement("data");
$dom->appendChild($root);
$dom->formatOutput = true;

while ($data = mysql_fetch_array($result)) {
    echo $data['title'];
    // create ITEM
    $item = $dom->createElement("item");
    $root->appendChild($item);
    // id node
    $subitem = $dom->createElement("id");
    $item->appendChild($subitem);
    $text = $dom->createTextNode($data['id']);
    $subitem->appendChild($text);
    // title node
    $subitem = $dom->createElement("title");
    $item->appendChild($subitem);
    $text = $dom->createTextNode($data['title']);
    $subitem->appendChild($text);
}

if (unlink("api/2.xml")) {
    echo "deleted<br>";
}
if ($dom->save("api/2.xml")) {
    echo "created";
}
?>
This works with no problem when I execute it manually: file 2.xml is created.
But when I add it to the crontab, the log shows that the cron job runs (I see the date echoed at the beginning of the script and the titles echoed inside the while loop), yet 2.xml is not created.
Any clues as to why it is not created?
If you migrate a script to cron, then you always need to check two things:
File permissions: the cron job may be executed with different rights (reminder: root is not the solution to everything).
Implicit paths: the cron job will have a different working directory.
We can't check the file permissions for you, but I can tell you that you're using implicit paths which, most likely, cannot work in that form:
if (unlink("api/2.xml")) {
    echo "deleted<br>";
}
if ($dom->save("api/2.xml")) {
    echo "created";
}
With that, the folder api is resolved relative to cron's working directory, which could be anywhere in your filesystem. Use absolute paths and you're good to go.
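A common way to build those absolute paths is to anchor them to the script's own location with __DIR__, so the result is identical whether the script runs from a shell or from cron. A sketch of the delete/save step rewritten that way (the DOMDocument here is a minimal stand-in for the one built earlier in the script, and the api subfolder follows the asker's layout):

```php
<?php
// Minimal stand-in for the DOMDocument built earlier in the script.
$dom = new DOMDocument("1.0");
$dom->appendChild($dom->createElement("data"));

// Anchor the output path to this script's directory, not the process's
// working directory, so cron and manual runs behave identically.
$xmlDir  = __DIR__ . '/api';
$xmlPath = $xmlDir . '/2.xml';

if (!is_dir($xmlDir)) {
    mkdir($xmlDir);
}
if (file_exists($xmlPath)) {
    unlink($xmlPath);
    echo "deleted<br>";
}
if ($dom->save($xmlPath)) {
    echo "created";
}
```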
I'm using Smarty with my PHP code and I'd like to cache some of the website's pages, so I used the following code:
// TOP of script
ob_start(); // start the output buffer
$cachefile = "cache/cachefile.html";

// normal PHP script
$smarty->display('somefile.tpl.html');

$fp = fopen($cachefile, 'w');   // open the cache file for writing
fwrite($fp, ob_get_contents()); // save the contents of the output buffer to the file
fclose($fp);                    // close the file
ob_end_flush();                 // send the output to the browser
But when I print ob_get_contents() at the end of the PHP file, it's empty, and the created cache file is also empty. So how can I cache pages in PHP when using Smarty? I know Smarty has its own cache, but it isn't working for me for some reason.
In addition, please give me some information about the APC cache: how do I use it, and is it worth using in my case? I think it's just for caching database queries; I read the PHP manual about it but couldn't get anywhere.
Thanks.
I've mashed up some of the code from the documentation (located here) into a more complete example of the Smarty cache. Also, I'm not sure what you were doing in your example, but you should be using Smarty's own methods to manipulate the cache.
require('Smarty.class.php');
$smarty = new Smarty;

// 1 means use the cache time defined in this file,
// 2 means use the cache time defined in the cache itself
$smarty->caching = 2;

// set the cache_lifetime for index.tpl to 5 minutes
$smarty->cache_lifetime = 300;

// Check if a cache exists for this file; if one doesn't exist, assign variables etc.
if (!$smarty->is_cached('index.tpl')) {
    $contents = get_database_contents();
    $smarty->assign($contents);
}
// Display the output of index.tpl, from the cache if one exists
$smarty->display('index.tpl');

// set the cache_lifetime for home.tpl to 1 hour
$smarty->cache_lifetime = 3600;

// Check if a cache exists for this file; if one doesn't exist, assign variables etc.
if (!$smarty->is_cached('home.tpl')) {
    $contents = get_database_contents();
    $smarty->assign($contents);
}
// Display the output of home.tpl, from the cache if one exists
$smarty->display('home.tpl');
As for the APC cache, it works much the same way conceptually, except that APC stores the data in shared memory rather than in files, for a specific amount of time. Every time you access the data, it checks whether the cache entry is still valid and, if so, returns the cached value.
If you're not using Smarty, you can use APC as follows.
This example stores the result of a DB query in the cache; similarly, you could store the entire page output so you don't have to run expensive PHP code frequently.
// A class to make APC management easier. The methods are static because
// they are called statically below (CacheManager::get, etc.).
class CacheManager
{
    public static function get($key)
    {
        return apc_fetch($key);
    }

    public static function store($key, $data, $ttl)
    {
        return apc_store($key, $data, $ttl);
    }

    public static function delete($key)
    {
        return apc_delete($key);
    }
}
Combined with some logic,
function getNews()
{
    $query_string = 'SELECT * FROM news ORDER BY date_created DESC LIMIT 5';
    // see if this is cached first...
    if ($data = CacheManager::get(md5($query_string))) {
        // It was stored, use the cached value
        $result = $data;
    } else {
        // It wasn't stored, so run the query
        $result = mysql_query($query_string, $link);
        $resultsArray = array();
        while ($line = mysql_fetch_object($result)) {
            $resultsArray[] = $line;
        }
        // Save the result in the cache for 3600 seconds
        CacheManager::store(md5($query_string), $resultsArray, 3600);
    }
    // Continue on with more functions if necessary
}
This example is slightly modified from here.
Do you mean you are calling ob_get_contents() again after you called ob_end_flush()? If so, the content you wrote to the file will have been discarded from PHP's output buffer.
If you still wish to output the HTML, save the result of ob_get_contents() to a variable first, then pass that to fwrite(). You can reuse the variable later down the page.
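A sketch of that ordering, with a plain echo standing in for $smarty->display() and a temp file standing in for the cache file:

```php
<?php
// Capture the buffer into a variable *before* ob_end_flush(), which sends
// the buffered output to the browser and discards it from PHP's side.
ob_start();
echo "<p>rendered page</p>";   // stand-in for $smarty->display(...)

$html = ob_get_contents();     // grab a copy while the buffer still exists
file_put_contents(sys_get_temp_dir() . '/cachefile.html', $html);

ob_end_flush();                // now send the output to the browser

// $html remains usable later down the page
var_dump($html === "<p>rendered page</p>"); // bool(true)
```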