Optimize feed fetching - php

I'm working on a site that has to fetch users' feeds. How can I best optimize fetching if I have a database with, let's say, 300 feeds? I'm going to set up a cron job that fetches the feeds, but should I do it like 5 feeds every other minute or something?
Any ideas on how to do this the best way in PHP?

If I understand your question, you are basically working on a feed aggregator site?
You can do the following: start by refreshing every hour (for example). Once you have enough entries from a feed, calculate the average interval between entries, then use that interval as the fetch interval for that feed.
For example, if the site published 7 articles in the last 7 days, you can fetch its feed every 24 hours (1 day).
I use this algorithm with a few changes: when I calculate the average interval, I divide it by 2 (to make sure I don't fetch too rarely). If the result is less than 60 minutes, I set the interval to 1 hour; if it is more than 24 hours, I set it to 24 hours.
For example, something like this:
public function updateRefreshInterval() {
    // Number of articles this feed published in the last 7 days
    $sql = 'select count(*) _count ' .
        'from article ' .
        'where created > adddate(now(), interval -7 day) and feed_id = ' . (int) $this->getId();
    $array = Db::loadArray( $sql );
    $count = $array[ '_count' ];
    // Average seconds between articles (+1 avoids division by zero), halved to fetch more often
    $interval = 7 * 24 * 60 * 60 / ( $count + 1 );
    $interval = $interval / 2;
    // Clamp to [MIN_REFRESH_INTERVAL, MAX_REFRESH_INTERVAL], e.g. 1 h (3600 s) and 24 h (86400 s)
    if( $interval < self::MIN_REFRESH_INTERVAL ) {
        $interval = self::MIN_REFRESH_INTERVAL;
    }
    if( $interval > self::MAX_REFRESH_INTERVAL ) {
        $interval = self::MAX_REFRESH_INTERVAL;
    }
    Db::execute( 'update feed set refresh_interval = ' . (int) $interval . ' where id = ' . (int) $this->getId() );
}
The table is 'feed'; 'refreshed' is the timestamp of the feed's last refresh, and 'refresh_interval' is the desired interval between two fetches of the same feed.
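The method above only stores the interval; the cron script still has to pick which feeds are due. A minimal sketch, assuming each feed row is an array with 'id', 'refreshed' (a Unix timestamp, or null if never fetched) and 'refresh_interval' (seconds); `feedsDueForRefresh` is a hypothetical helper name, and in practice you would push the same condition into the SQL query:

```php
<?php
// Hypothetical helper: return the ids of feeds whose refresh interval has elapsed.
// Assumes 'refreshed' is a Unix timestamp (null = never fetched)
// and 'refresh_interval' is in seconds, as stored by updateRefreshInterval().
// Roughly equivalent MySQL condition, assuming 'refreshed' is a DATETIME column:
//   WHERE refreshed IS NULL OR refreshed + INTERVAL refresh_interval SECOND <= NOW()
function feedsDueForRefresh(array $feeds, int $now): array
{
    $due = [];
    foreach ($feeds as $feed) {
        if ($feed['refreshed'] === null
            || $feed['refreshed'] + $feed['refresh_interval'] <= $now) {
            $due[] = $feed['id'];
        }
    }
    return $due;
}
```

After fetching each due feed, you would set 'refreshed' to the current time and call updateRefreshInterval() for it.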

Based on the new information, I think I would do something like this:
Let the "first" client initiate the update work and store a timestamp with it.
Every other client that asks for the information gets a cached copy until that copy is too old. The next hit from a client then refreshes the cache, which is then used by all clients until it is too old again.
The client that initiates the update work should not have to wait for it to finish; just serve the old cached version until the work is done.
That way you don't have to update anything if no clients are requesting it.
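A sketch of this "serve the cache, let the first stale hit refresh it" idea, assuming a file-based cache. The names `serveFeed`, `$cacheFile` and `$fetch` are hypothetical; `$fetch` stands for whatever downloads the feed. The refresh is deferred to a shutdown function, and `fastcgi_finish_request()` (available only under PHP-FPM) lets the client receive the old copy before the refresh runs; a non-blocking `flock()` keeps concurrent stale hits from all refreshing at once.

```php
<?php
// Sketch: every client gets the cached copy; a stale hit triggers one refresh.
// Hypothetical names: $cacheFile is a local file, $fetch is any callable that
// downloads the feed (e.g. fn () => file_get_contents($feedUrl)).
function serveFeed(string $cacheFile, int $maxAgeSeconds, callable $fetch): string
{
    clearstatcache(true, $cacheFile);
    if (!is_file($cacheFile)) {
        // Nothing cached yet, so this one request has to wait for the fetch.
        $content = $fetch();
        file_put_contents($cacheFile, $content, LOCK_EX);
        return $content;
    }

    $content = (string) file_get_contents($cacheFile);
    if (time() - filemtime($cacheFile) >= $maxAgeSeconds) {
        // Too old: refresh after this script has produced its output.
        register_shutdown_function(function () use ($cacheFile, $maxAgeSeconds, $fetch) {
            if (function_exists('fastcgi_finish_request')) {
                fastcgi_finish_request(); // PHP-FPM only: the client stops waiting here
            }
            $lock = fopen($cacheFile . '.lock', 'c');
            if ($lock === false) {
                return;
            }
            // Non-blocking lock: if another request is already updating, skip.
            if (flock($lock, LOCK_EX | LOCK_NB)) {
                clearstatcache(true, $cacheFile);
                // Re-check: another request may have refreshed it in the meantime.
                if (time() - filemtime($cacheFile) >= $maxAgeSeconds) {
                    file_put_contents($cacheFile, $fetch(), LOCK_EX);
                }
                flock($lock, LOCK_UN);
            }
            fclose($lock);
        });
    }
    return $content; // the (possibly stale) cached copy is served right away
}
```

Under plain CGI or mod_php the refresh still runs after the output is generated, but the connection may stay open until it finishes.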

The best thing to do is to be 'nice' and not overload the feeds with lots of needless requests. I settled on a 1 hour update time for one of my webapps that monitors about 150 blogs for updates. I store the time they were last checked in the database and use that to decide when to update them. The feeds were added at random times so they aren't all updated at the same time.
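The question's "5 every second minute" idea can be sized from such a one-hour target. A rough sketch, assuming cron runs every 2 minutes and each feed row has hypothetical 'id' and 'last_checked' fields: with 300 feeds and 30 runs per hour, each run checks the 10 feeds that were checked longest ago.

```php
<?php
// Hypothetical helper: pick the feeds to check in this cron run.
// Assumes each feed has 'id' and 'last_checked' (Unix timestamp, 0 = never).
function feedsForThisRun(array $feeds, int $cycleSeconds = 3600, int $cronEverySeconds = 120): array
{
    // Runs per cycle, e.g. 3600 / 120 = 30 runs per hour
    $runsPerCycle = max(1, intdiv($cycleSeconds, $cronEverySeconds));
    // Feeds per run so every feed is visited once per cycle, e.g. ceil(300 / 30) = 10
    $batchSize = (int) ceil(count($feeds) / $runsPerCycle);

    // Oldest checks first (same idea as ORDER BY last_checked ASC LIMIT n in SQL)
    usort($feeds, fn (array $a, array $b): int => $a['last_checked'] <=> $b['last_checked']);

    return array_column(array_slice($feeds, 0, $batchSize), 'id');
}
```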

I wrote pfetch to do this for me. It's small, but has a couple of really important aspects:
It's written in Twisted and can handle massive concurrency even when the network is slow.
It doesn't require any cron jockeying or anything.
I actually wrote it because my cron-based fetchers were becoming a problem. Now I have it configured to fetch some random stuff I want from around the internet, and it then runs scripts whenever things change to update parts of my own web site.

Related

How to get the result with these time intervals in php

I'm using a time clock system which, by default, records only the employee's entry and exit times. I'm customizing it so that it can also record break times, but I'm having trouble subtracting the break time from the total time.
This snippet of code is used to register the time between the check-in and check-out:
$time1 = Carbon::createFromFormat("Y-m-d H:i:s", $timeIN);
$time2 = Carbon::createFromFormat("Y-m-d H:i:s", $timeOUT);
$th = $time1->diffInHours($time2);
$tm = floor(($time1->diffInMinutes($time2) - (60 * $th)));
$totalhour = ($th.".".$tm);
The variable $totalhour holds the total time between the check-in and check-out records. It is saved to the database in H.i format (hours.minutes); another page then reads this value from the database and replaces the point (.) with "hr".
Based on this code, I did the same for the break start and end timestamps, and I was able to get the time between the start and end of the break with the code below:
$breakstart = table::attendance()->where([['idno', $idno]])->value('breakstart');
$breakend = table::attendance()->where([['idno', $idno]])->value('breakend');
$one = Carbon::createFromFormat("H:i:s", $breakstart);
$two = Carbon::createFromFormat("H:i:s", $breakend);
$breakone = $one->diffInHours($two);
$breaktwo = floor(($one->diffInMinutes($two) - (60 * $breakone)));
$totalbreak = $breakone.".".$breaktwo;
The $totalbreak variable stores the time between the start and end of the break, and that part works.
Now I need the total worked time: the time between check-in and check-out minus the time between the start and end of the break.
I tried the code below and it worked only up to a point. Could you give me tips on how to get a correct result in this case?
$totalhour = ($th.".".$tm) - ($totalbreak);
I tried to get the total time by subtracting the break time, but without success.
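A likely cause: $tm is not zero-padded, so 8 h 5 min is stored as "8.5", and subtracting two such strings treats hours.minutes as a decimal number, mixing base 60 with base 10. A sketch that avoids this by doing all arithmetic in minutes and formatting only at the end. It uses plain DateTime so it runs without Laravel (Carbon extends DateTime, so the same approach applies to your Carbon objects); `workedTime` is a hypothetical helper name, and breaks that cross midnight are not handled.

```php
<?php
// Sketch: subtract the break in whole minutes, then format as H.i.
function workedTime(string $timeIn, string $timeOut, string $breakStart, string $breakEnd): string
{
    $in  = DateTime::createFromFormat('Y-m-d H:i:s', $timeIn);
    $out = DateTime::createFromFormat('Y-m-d H:i:s', $timeOut);
    // '!' resets the date part to 1970-01-01, so both break times share the same day
    $bs  = DateTime::createFromFormat('!H:i:s', $breakStart);
    $be  = DateTime::createFromFormat('!H:i:s', $breakEnd);

    // Total whole minutes between two DateTime objects
    $minutes = fn (DateTime $a, DateTime $b): int =>
        intdiv($b->getTimestamp() - $a->getTimestamp(), 60);

    $worked = $minutes($in, $out) - $minutes($bs, $be);

    // Zero-padded minutes: 4 h 5 min becomes "4.05", not "4.5"
    return sprintf('%d.%02d', intdiv($worked, 60), $worked % 60);
}
```

The returned string keeps the H.i format your other page expects, so replacing the point with "hr" still works.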

PHP working with milliseconds

I have the following value (generated as GMT Time) being retrieved from an API call
Song Started Time : 2017-09-06T16:51:02.000Z
I also have the duration (in form of milliseconds) of a specific song tied to that record in the API response. For example, it may return:
222000
Then, using a PHP GMT time function, I check what the current time is on a PHP page.
World Current Time: 2017-09-06T16:51:31.000Z
Using PHP, how would I be able to determine how far along in the song I currently am, using the start time, fixed duration of the song, and the current time. I figure this should be fairly simple, but I'm struggling to figure out how to add milliseconds in PHP. Ideally, the output I'm looking for should just say .33 to indicate the song is currently 33% completed.
So you need the elapsed play time of the song as a percentage:
<?php
$date = new DateTime("2017-09-06T16:51:02.000Z");
$date2 = new DateTime("2017-09-06T16:51:31.000Z");
// Total elapsed seconds; the ->s property of a DateInterval is only the seconds part (0-59)
$interval = $date2->getTimestamp() - $date->getTimestamp();
$duration = 222000 / 1000; // 222000 milliseconds = 222 seconds
echo $song_played = (int)(($interval / $duration) * 100) . "%";
?>
Live demo : https://eval.in/856523
Note that the example you gave works out to 13%, not 33%: 29 seconds out of 222 is about 0.13.
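If you want the fraction (like .13) and millisecond precision instead of a whole-number percentage, a sketch using DateTimeImmutable: formatting with 'U.u' gives Unix seconds including the fractional part, so the ".000" milliseconds from the API are kept. `songProgress` is a hypothetical name, and the result is clamped to [0, 1] in case the clock is off or the song has ended.

```php
<?php
// Sketch: fraction of a song played, keeping millisecond precision.
// Assumes ISO 8601 timestamps like the API's "2017-09-06T16:51:02.000Z".
function songProgress(string $startedAt, int $durationMs, string $now = 'now'): float
{
    $start = new DateTimeImmutable($startedAt);
    $current = new DateTimeImmutable($now);
    // 'U.u' = Unix seconds with microseconds, so sub-second parts are kept
    $elapsedMs = ((float) $current->format('U.u') - (float) $start->format('U.u')) * 1000;
    // Clamp to [0, 1]
    return max(0.0, min(1.0, $elapsedMs / $durationMs));
}

echo round(songProgress('2017-09-06T16:51:02.000Z', 222000, '2017-09-06T16:51:31.000Z'), 2); // prints 0.13
```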

Ignore seconds when querying created_at

I have a scheduled task that runs every 5 minutes that collects some stats on a server I run.
There is a small delay while it waits for the request to come back, so records are always saved 2 or 3 seconds later. I.e. the task runs at 2017-03-14 08:00:00, but the records are saved at 2017-03-14 08:00:03.
I am trying to pull the records out to display on a graph. The graph scales to the time period you want to look at (through hard coded buttons that refresh the graph with new data).
The first graph I am trying to do is one over the last 24 hours. Rather than bring back every 5 minute point for the last 24 hours, I just want one per hour. I have built a function to round down to the nearest hour and then get the last 24 hours based off that - it looks like this:
public function last24Hours()
{
$times = [];
$time = Carbon::now()->minute(0)->second(0);
$i = 1;
while($i <= 24)
{
array_push($times, $time->toDateTimeString());
$time->subHour();
$i++;
}
return $times;
}
Using the times returned, I am trying to query the model with whereIn() like so:
$stats = ServerTracking::whereIn('created_at', $this->last24Hours())->get();
The query runs, but nothing comes back - as the created_at time is a couple of seconds off from what I am querying.
I've hit a bit of a roadblock and cannot think of a way around this. Any ideas?
You can use selectRaw with a formatted date:
$stats = ServerTracking::selectRaw('foo,bar,DATE_FORMAT(created_at, "%Y-%m-%d %H:00:00") as hour_created')->get()->keyBy('hour_created');
All of the values in each hour will have the same hour_created, and keyBy will only keep one of them (from docs):
If multiple items have the same key, only the last one will appear in the new collection.
Just replace foo and bar with the other values you need. You'll either keep the 0:55 minute values, or the 0:00 minute values, depending on how you sort the query.
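Come to think of it, you could use whereRaw to do it your way, truncating only the seconds so each record matches the hour mark it belongs to:
$hours = $this->last24Hours();
$stats = ServerTracking::whereRaw(
    "DATE_FORMAT(created_at, '%Y-%m-%d %H:%i:00') in (" . implode(',', array_fill(0, count($hours), '?')) . ")",
    $hours // bound as parameters, so each datetime string is quoted correctly
)->get();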
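To see what the keyBy() step in the first answer does without a database, here is a plain-PHP sketch over an array of rows. `onePerHour` is a hypothetical name, and the rows stand in for the query result (each assumed to have a 'created_at' string in 'Y-m-d H:i:s' format).

```php
<?php
// Plain-PHP illustration of ->keyBy('hour_created') (hypothetical helper name).
function onePerHour(array $rows): array
{
    $byHour = [];
    foreach ($rows as $row) {
        // Same truncation as DATE_FORMAT(created_at, "%Y-%m-%d %H:00:00")
        $hour = substr($row['created_at'], 0, 13) . ':00:00';
        // A later row with the same hour overwrites the earlier one, like keyBy()
        $byHour[$hour] = $row;
    }
    return $byHour;
}
```

As with keyBy(), the order of the rows decides which reading of each hour survives.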
Not a solution per se, but I would take a different approach. Assuming you are trying to query the last 24 hours (1 day), I would do:
$now = Carbon::now();
$stats = ServerTracking::where('created_at', '<=', $now) // now
    ->where('created_at', '>=', $now->copy()->subHours(24)) // 24 hours ago; copy() so $now itself is not modified
    ->get();
Using whereBetween is similar, but a bit shorter (the lower bound goes first):
$now = Carbon::now();
$stats = ServerTracking::whereBetween('created_at', [$now->copy()->subHours(24), $now])
    ->get();

PHP server side incremental counter

Sorry, I am new to PHP and need some help/guidance on creating a counter that works server side, so I guess it would update an initial value?
For example, I need to start with a base number of 1500 and have that number increase by 1 every 2 minutes, so visitors see a higher number each time they visit.
Would the initial value need to be stored in SQL, or can a .txt file be updated?
Any help would be great,
Thanks
It can be done in SQL if you want, but a text file is OK too. Just save a value (1500), then create a cron job that executes a PHP file every 2 minutes; that file contains either the code that runs an SQL query to update the value, or the code that updates the text file.
Example:
# Every two minutes
*/2 * * * * php /your/path/to/this/file/updatecode.php
In your PHP file:
$SQL = "UPDATE table SET columnname = columnname + 1";
// etc...
// OR the text file update code
If you don't specifically need to store it, then you don't need to run cron etc. Take a time stamp of the point in time you want to start at, then calculate how many 2-minute periods have passed since then and add that to your start number (1500).
//Start Number
$n = 1500;
$cur_time = time();
$orig_time = strtotime("2013-10-21 10:00:00");
//New Number = start + number of full 2-minute periods (120 seconds) since the start time
$newn = $n + intdiv(max(0, $cur_time - $orig_time), 120);
// Output New Number
echo $newn;
And if you want it in one line for copy/paste:
echo 1500 + intdiv(max(0, time() - strtotime("2013-10-21 10:00:00")), 120);
You could do this without a database, just using dates: work out the difference in time between two dates (the current date and the starting date when you created the script), divide it by the length of 2 minutes in the same unit (120 seconds with PHP's time()), and add that to your initial 1500.
If you do need to store it, an SQL database is probably overkill for this.
Create your number, serialize it and store it in a file. Next time, load it from the file, unserialize, increment, serialize and save.
You can save a time stamp along with the number in the file to avoid having to run a job every 2 minutes, and instead calculate the correct value when you load the number back from the file.
Something like this (error checking etc. should be added, and I haven't actually tried it to make sure the calculation is correct, but the main idea should be visible):
<?php
if (file_exists('mydatafile')) {
    $data = unserialize(file_get_contents('mydatafile'));
    // Add one for every full 2-minute period (120 s) since the stored time stamp
    $steps = intdiv(time() - $data['timestamp'], 120);
    $data['number'] += $steps;
    // Move the time stamp forward by whole periods only, so partial periods aren't lost
    $data['timestamp'] += $steps * 120;
}
else {
    // Init the number and the time stamp
    $data = ['number' => 1500, 'timestamp' => time()];
}
// Print it if you need to here
// Serialize and save data
file_put_contents('mydatafile', serialize($data), LOCK_EX);

PHP Date-Dependent Pagination

I have a site that stores a bunch of records in an SQL database that I want to pull onto a page based on the date, which is stored as Y-m-d in SQL. But I want to paginate the results by day.
So for the index, I want to show all the results from the current day, for which I was going to use the PHP date() function in the WHERE clause of my query. But I'm hitting a snag with the pagination. I want buttons at the bottom that go to the next page with a GET parameter, so index.php?page=2 would be tomorrow, but I can't figure out how to select "tomorrow" reliably from the database in my WHERE.
See, I was going to use date("U") to get the Unix time in seconds of the first day on the first page and then just add 86400*$_GET['page'] (seconds per day) to increment the date on the next pages, but that seems like a sloppy way to do it that might end up tripping me up. Is this the only way, or is there a better, more practical solution? Thanks a lot, guys, I appreciate it.
If page 2 is tomorrow, then you're going to be looking at something like this:
$days_ahead = max(0, (int) $page - 1); // $page comes from the URL, so force it to an integer
$query = "... WHERE date = DATE(NOW()) + INTERVAL $days_ahead DAY ...";
Note that this would work fine on the first page too (assuming $page gets defaulted to 1), it'd add 0 days to today's date.
You could experiment with strtotime:
$sqldate = date('Y-m-d', strtotime('+2 days'));
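Calendar-day arithmetic avoids the seconds-per-day mix-up of adding raw seconds (and days around DST changes that are not 86400 seconds long). A sketch building on the strtotime idea with DateTimeImmutable::modify(); `pageToDate` is a hypothetical helper, and the resulting Y-m-d string would then be bound into the WHERE clause, ideally with a prepared statement.

```php
<?php
// Hypothetical helper: map ?page=N to a Y-m-d date, page 1 = today, page 2 = tomorrow.
function pageToDate(int $page, ?DateTimeImmutable $today = null): string
{
    $today = $today ?? new DateTimeImmutable('today');
    $daysAhead = max(0, $page - 1); // treat page < 1 as page 1
    return $today->modify("+{$daysAhead} days")->format('Y-m-d');
}

// e.g. $sqldate = pageToDate((int) ($_GET['page'] ?? 1));
```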
This is how I managed to fix it for my site
//'page' is a GET variable from url
if($page<=1) {
$datemax = time();
$datemin = time() - (1 * 2592000); //2592000 = seconds in 30 days
}
else{
$datemax = time() - (($page - 1) * 2592000);
$datemin = time() - ($page * 2592000);
}
And then obviously the query will look something like
SELECT * FROM posts WHERE dateposted >=$datemin AND dateposted <=$datemax
