How to update a database based on a variable API result? - PHP

I would like your help finding the best and cleanest way to update my video database (MySQL).
Those videos are grouped into folders, and I can only grab those folders one by one.
My server uses an external API to grab thousands of videos (each video has a unique id, folder id, title, url...).
As I want to make as few API calls as possible, I would like to run a daily cron job that browses each folder, adds the new videos, and removes the videos that no longer exist.
At the moment I can browse all the folders, looping through each video and adding it to the database.
while (true) {
    $videos = video_get($oid, $offset, $count);
    $videos = $videos['response']['items'];
    $results = count($videos);
    echo $results . ' || ' . $offset;

    if ($results > 0) {
        $offset += $results;
        foreach ($videos as $video) {
            $i++;
            echo "<br>Number : " . $i;
            echo "<br>";
            $data = array(
                "id"        => $video['id'],
                "folder_id" => $video['folder_id'],
                "title"     => $video['title'],
                "url"       => $video['url']
            );
            $db->insert('videos', $data);
        }
    } else {
        break;
    }
}
What I am asking is: considering that there are about 100,000 videos, arranged into around 20 folders, would it be better to add all the videos and then remove the duplicates, or to check whether each video already exists and only insert it if it is new?
And more importantly, what would be the best way to remove the videos that are no longer present in the API response?
I hope I have been clear enough, but just in case:
Videos are ordered by folder.
Videos can be added/removed at any time on the api side.
I want to keep all the videos in my database and update them whenever videos have been added or removed on the API side, so that I don't have to call the API too often.
Thank you for your help.
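One possible pattern for this kind of sync (a sketch only, not from the original post: it assumes plain PDO instead of the $db wrapper, that videos.id is the primary key, and an extra last_seen DATETIME column) is to upsert every video the API returns, stamp it with the current sync time, and then delete whatever was not touched during the run:

$syncTime = date('Y-m-d H:i:s');

// Upsert: insert new videos, refresh existing ones, and stamp them as seen in this run.
$stmt = $pdo->prepare(
    'INSERT INTO videos (id, folder_id, title, url, last_seen)
     VALUES (:id, :folder_id, :title, :url, :last_seen)
     ON DUPLICATE KEY UPDATE
         folder_id = VALUES(folder_id),
         title     = VALUES(title),
         url       = VALUES(url),
         last_seen = VALUES(last_seen)'
);

foreach ($videos as $video) {
    $stmt->execute([
        ':id'        => $video['id'],
        ':folder_id' => $video['folder_id'],
        ':title'     => $video['title'],
        ':url'       => $video['url'],
        ':last_seen' => $syncTime,
    ]);
}

// After all folders have been processed: anything not seen in this run
// was removed on the API side and can be deleted.
$cleanup = $pdo->prepare('DELETE FROM videos WHERE last_seen < :sync_time');
$cleanup->execute([':sync_time' => $syncTime]);

With a unique key on id, the upsert covers the duplicate question, and the last_seen comparison covers removals without a per-video existence check.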

Related

PHP - Trying to show Next and Previous file from the same directory

As the title says, I'm trying to get the next and previous file from the same directory, so I did something like this. Is there a better way of doing it? (This is from the next auto index file.php code about related files; I have changed it for my needs.)
db screenshot if you want to look - ibb.co/wzkDxd3
$title = $file->name;   // get the current file name
$in_dir = $file->indir; // current dir id
$r_file = $db->select("SELECT * FROM `" . MAI_PREFIX . "files` WHERE `indir`='$in_dir'"); // all of the files from the current dir
$rcount = count($r_file);
$related = '';

if ($rcount > 2) {
    $i = 0; // temp variable
    foreach ($r_file as $key => $r) { // foreach the array to get the key
        if ($r->name == $title) { // trying to get the current file key number
            $next = $key + 1; // getting next and prev file key numbers
            $prv = $key - 1;
            foreach ($r_file as $keyy => $e) { // loop the file list again to get the prev file
                if ($prv == $keyy) {
                    $related .= $e->name;
                }
            }
            foreach ($r_file as $keyy => $e) { // same for the next file
                if ($next == $keyy) {
                    $related .= $e->name;
                }
            }
        }
    }
}
Without knowing your DB background and use case, there should still be the possibility to use something like $r_file[$key], $r_file[$next] and $r_file[$prv] to access the specific elements directly. That way, at least two of your foreach loops could be avoided.
Please note that nesting loops is extremely inefficient. E.g., if your $r_file contains 100 elements, this would mean 10,000 iterations (100 × 100) with your original code!
Also, you should leave a loop as soon as possible once its task is done. You can use break to do this.
Example, based on the relevant part of your code and how I understand it is supposed to work:
foreach ($r_file as $key => $r) { // foreach the array to get the key
    if ($r->name == $title) { // trying to get the current file key number
        $next = $key + 1; // getting next and prev file key numbers
        $prv = $key - 1;
        $related .= $r_file[$prv]->name;  // directly access the previous file
        $related .= $r_file[$next]->name; // directly access the next file
        break; // don't go on with the rest of the elements if we're already done
    }
}
Possibly, looping through all the elements to compare $r->name == $title could also be avoided by using some numbering mechanisms, but without knowing your system better, I can't tell anything more about that.
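If $r_file is a plain zero-based array (which the direct indexing above already assumes), one such mechanism could be array_column() plus array_search(). A rough sketch, not tested against the asker's setup, assuming name is accessible as a public property:

// Sketch: locate the current file's position without the manual comparison loop.
$names = array_column($r_file, 'name');
$key = array_search($title, $names);

if ($key !== false && $key > 0 && $key < count($r_file) - 1) {
    $related .= $r_file[$key - 1]->name; // previous file
    $related .= $r_file[$key + 1]->name; // next file
}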

How can I parse, sort, and print a 90MB JSON file with 100,000 records to CSV?

Background
I'm trying to complete a code challenge where I need to refactor a simple PHP application that accepts a JSON file of people, sorts them by registration date, and outputs them to a CSV file. The provided program is already functioning and works fine with a small input but intentionally fails with a large input. In order to complete the challenge, the program should be modified to be able to parse and sort a 100,000 record, 90MB file without running out of memory, like it does now.
In its current state, the program uses file_get_contents(), followed by json_decode(), and then usort() to sort the items. This works fine with the small sample data file, but not with the large one - it runs out of memory.
The input file
The file is in JSON format and contains 100,000 objects. Each object has a registered attribute (example value 2017-12-25 04:55:33) and this is how the records in the CSV file should be sorted, in ascending order.
My attempted solution
Currently, I've used the halaxa/json-machine package, and I'm able to iterate over each object in the file. For example:
$people = \JsonMachine\JsonMachine::fromFile($fileName);
foreach ($people as $person) {
    // do something
}
Reading the whole file into memory as a PHP array is not an option, as it takes up too much memory, so the only solution I've been able to come up with so far has been iterating over each object in the file, finding the person with the earliest registration date and printing that. Then, iterating over the whole file again, finding the next person with the earliest registration date and printing that etc.
The big issue with that is the nested loops: a loop which runs 100,000 times containing a loop that runs 100,000 times. It's not a viable solution, and that's as far as I've gotten.
How can I parse, sort, and print to CSV, a JSON file with 100,000 records? Usage of packages / services is allowed.
I ended up importing the records into MongoDB in chunks and then retrieving them in the correct order to print.
Example import:
// Requires the MongoDB PHP library (MongoDB\Client) and halaxa/json-machine
$this->collection = (new Client($uri))->collection->people;
$this->collection->drop();

$people = JsonMachine::fromFile($fileName);
$chunk = [];
$chunkSize = 5000;
$personNumber = 0;

foreach ($people as $person) {
    $personNumber += 1;
    $chunk[] = $person;
    if ($personNumber % $chunkSize == 0) { // Chunk is full
        $this->collection->insertMany($chunk);
        $chunk = [];
    }
}

// The very last chunk was not filled to the max, but we still need to import it
if (count($chunk)) {
    $this->collection->insertMany($chunk);
}

// Create an index for quicker sorting
$this->collection->createIndex(['registered' => 1]);
Example retrieve:
$results = $this->collection->find(
    [],
    [
        'sort' => ['registered' => 1],
    ]
);

// For every person...
foreach ($results as $person) {
    // For every attribute...
    foreach ($person as $key => $value) {
        if ($key != '_id') { // No need to include the new MongoDB ID
            echo some_csv_encode_function($value) . ',';
        }
    }
    echo PHP_EOL;
}
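As a side note, the placeholder some_csv_encode_function() above could be replaced by PHP's built-in fputcsv(), which takes care of quoting and escaping. A small sketch writing the same rows to standard output:

// Sketch: build each row and let fputcsv() handle the CSV quoting/escaping.
$out = fopen('php://output', 'w');
foreach ($results as $person) {
    $row = [];
    foreach ($person as $key => $value) {
        if ($key != '_id') { // still skip the MongoDB ID
            $row[] = $value;
        }
    }
    fputcsv($out, $row);
}
fclose($out);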

Cache multiple pages/images from Instagram

I'm working on a small project where users can see images tagged with, in this case, "kitties". Instagram only allows 5000 requests/hour; I don't think it will reach this, but I'm choosing to cache anyway, also because I can't figure out how to get the back-link to work.
I can only get the link for the next page; the link for the recent page then becomes the current page, i.e. a link to itself.
Also, the API can return a strange number of images, sometimes 14, sometimes 20 and so on. I want it to always show 20 images per page and have only 5 pages (100 images), and then update this file every 5/10 minutes or so.
So my plan is to store about 100 images in a file. I got it working, but it's incredibly slow.
The code looks like this:
$cachefile = "instagram_cache/" . TAG . ".cache";
$num_requests = 0; // just for development, to check how many requests it does

// If the file does not exist or is older than UPDATE_CACHE_TIME seconds
if (!file_exists($cachefile) || time() - filemtime($cachefile) > UPDATE_CACHE_TIME)
{
    $images = array();
    $current_file = "https://api.instagram.com/v1/tags/" . TAG . "/media/recent?client_id=" . INSTAGRAM_CLIENT_ID;
    $current_image_index = 0;

    for ($i = 0; $i >= 0; $i++) // effectively an endless loop; we break out below
    {
        // Get data from the API
        $contents = file_get_contents($current_file);
        $num_requests++;

        // Decode it!
        $json = json_decode($contents, true);

        // Get what we want!
        foreach ($json["data"] as $x => $value)
        {
            array_push($images, array(
                'img_nr'       => $current_image_index,
                'thumb'        => $value["images"]["thumbnail"]["url"],
                'fullsize'     => $value["images"]["standard_resolution"]["url"],
                'link'         => $value["link"],
                'time'         => date("d M", $value["created_time"]),
                'nick'         => $value["user"]["username"],
                'avatar'       => $value["user"]["profile_picture"],
                'text'         => $value['caption']['text'],
                'likes'        => $value['likes']['count'],
                'comments'     => $value['comments']['data'],
                'num_comments' => $value['comments']['count'],
            ));

            // Check if the requested amount of images is reached
            if ($current_image_index > MAXIMUM_IMAGES_TO_GET)
                break;

            $current_image_index++;
        }

        // Check again in the outer loop, in case the inner loop broke out
        if ($current_image_index > MAXIMUM_IMAGES_TO_GET)
            break;

        if ($json['pagination']['next_url'])
            $current_file = $json['pagination']['next_url'];
        else
            break; // no more pages to get!
    }

    file_put_contents($cachefile, json_encode($images));
}
This feels like a very ugly hack; any ideas on how to make this work better?
Or can someone tell me how to make that "back-link" work like it should? (Yes, I could use JS and go -1 in history, but no!)
Any ideas, suggestions, help, comments etc. are appreciated.
Why not subscribe to real-time updates and store the images in the DB? Then, when they are rendered, you can check if the image URL is valid (i.e. check if the photo has been deleted). Getting the data from your own DB will be much faster than getting it from Instagram.
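Whichever way the cache gets filled (polling as above or a real-time subscription), the read side can stay simple. A minimal sketch, reusing $cachefile from the question, assuming a page request parameter, and applying the 20-per-page / 5-page limits mentioned above:

// Sketch: read the cached JSON and serve a fixed page of 20 images (5 pages max).
$images = json_decode(file_get_contents($cachefile), true);
$perPage = 20;
$page = isset($_GET['page']) ? (int) $_GET['page'] : 1;
$page = max(1, min(5, $page)); // clamp to pages 1-5

foreach (array_slice($images, ($page - 1) * $perPage, $perPage) as $image) {
    echo '<img src="' . htmlspecialchars($image['thumb']) . '" alt="">';
}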

PHP WHILE loops only find one element

I've got a problem with the following PHP code. It is supposed to list the items of an S3 bucket and find & delete files which contain a certain string in their filenames.
The problem is: only one file is deleted; the others remain in the bucket after the execution of the script.
I can't find where the issue comes from, so I'm asking you :/
$aS3Files = $s3->getBucket($bucketName); // list all elements in the bucket
$query = mysql_query("SELECT filename FROM prizes_media WHERE prize_id=" . $_POST["prizeId"]); // finds all filenames linked to the prize

while ($media = mysql_fetch_array($query)) {
    // Find relevant files
    while (list($cFilename, $rsFileData) = each($aS3Files)) { // reformat the bucket list into a table and read through it
        if (strpos($cFilename, $media['filename'])) {
            $s3->deleteObject($bucketName, $cFilename); // deletes all files that contain $media['filename'] in their filename
        }
    }
}

// 2. Delete DB entry
mysql_query("DELETE FROM prizes WHERE id=" . $_POST['prizeId']); // deletes the entry corresponding to the prize in the DB (deletes media table in cascade)
You may be getting false negatives on your if; you should be using this:
if(strpos($cFilename,$media['filename']) !== FALSE) { ...
Edit
Here is a different way to loop the bucket, based on the structure in your comment:
foreach ($aS3Files as $filename => $filedata) {
    if (strpos($filename, $media['filename']) !== FALSE) {
        $s3->deleteObject($bucketName, $filename); // deletes all files that contain $media['filename'] in their filename
    }
}
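For completeness, a sketch of how that foreach could sit inside the original outer loop. Unlike each(), foreach does not rely on the array's internal pointer, so the bucket list can be walked once per database row (each() is also deprecated as of PHP 7.2):

while ($media = mysql_fetch_array($query)) {
    foreach ($aS3Files as $filename => $filedata) {
        // Strict comparison, so a match at position 0 is not treated as "not found"
        if (strpos($filename, $media['filename']) !== FALSE) {
            $s3->deleteObject($bucketName, $filename);
        }
    }
}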

Deleting Files In A Directory Based On A Table

First of all, I would like to explain my situation.
I'm using PHP as my programming language.
I have a table named "Produk". It keeps every product's name, for example the value "TWC0001" in its id_produk column.
Every product has its own images, stored in the ./images/Produk/ directory.
The problem is that this project has been running for about a year, and when users delete a product, the product's images don't get deleted too. So they stay in the ./images/Produk/ directory, which means those files become garbage, right?
Case Example :
In the "Produk" table, column "id_produk", I have 3 rows:
"TWC0001","TWC0002","TWC0003".
Of course each of those rows have its own images that stored in ./images/Produk/
Each of those files named :
"TWC0001.jpg", "TWC0002.jpg", "TWC0003.jpg"
Case : A user logged in and deleted row "TWC0002"; of course the "TWC0002.jpg" file still exists.
Problem : I want to delete all ".jpg" files that are no longer listed in the "Produk" table.
I've been doing this :
// Listing all the ".jpg" files
$arrayfiles = scandir("../images/Produk/");

// Getting the full product list
$sql = "select * from produk";
$produk = mysql_query($sql, $conn) or die("Error : " . mysql_error());

foreach ($arrayfiles as $key => $value)
{
    while ($row = mysql_fetch_array($produk, MYSQL_ASSOC))
    {
        /// here is the part I've been confused about
    }
}
The PHP function to delete a file is unlink().
Please, anybody, help me out with this.
The following code will produce an array with all the images that have no corresponding product record. I've left out the unlink command so you can do some reviewing first.
$sql = "SELECT * FROM Produk";
$result = mysql_query($sql);

$existing_products = array();
while ($row = mysql_fetch_array($result)) {
    $existing_products[] = $row["id_produk"] . ".jpg";
}

$existing_images = array();
foreach (glob("../images/Produk/*.jpg") as $v) {
    $existing_images[] = str_replace("../images/Produk/", "", $v);
}

$images_to_delete = array_diff($existing_images, $existing_products);
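Once $images_to_delete has been reviewed, the deletion step itself could look like this (a sketch, not part of the original answer):

// Sketch: remove the orphaned images after reviewing $images_to_delete.
foreach ($images_to_delete as $image) {
    $path = "../images/Produk/" . $image;
    if (is_file($path)) {
        unlink($path);
    }
}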
try this
$it = new RecursiveIteratorIterator(new RecursiveDirectoryIterator('../images/Produk/'));
$regx = new RegexIterator($it, '/^.*\.jpg$/i', // only matched text will be returned
    RecursiveRegexIterator::GET_MATCH);

foreach ($regx as $file) {
    echo $file[0], "\n";
    unlink($file[0]);
}
This will find all JPG files in the given folder and its subfolders and will delete them.
I would recommend the following:
Make a directory listing of the "Images" directory with
dir /b > filelist.txt (Windows)
or
ls -1 > filelist.txt (Linux)
You will now have a list of the existing files, which should be imported into some temp table in MySQL.
Now write a simple SQL query to select the files that have no corresponding products (don't forget to append the .jpg suffix).
With the list of files to be deleted, you can simply read it with file_get_contents and unlink each file in a foreach loop.
The reason why I recommend this is safety: you can review what will be deleted.
Once you run the script, there is no undo (except restoring from a backup).
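The comparison query might look something like this (a sketch that assumes the file list was imported into a temp table named filelist with a single filename column; neither name is from the original answer):

-- Sketch: files that have no matching product row
SELECT f.filename
FROM filelist f
LEFT JOIN produk p ON f.filename = CONCAT(p.id_produk, '.jpg')
WHERE p.id_produk IS NULL;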
foreach (glob('../images/Produk/*.jpg') as $file) {
    if (is_file($file)) {
        #unlink($file); // unlink left commented out; remove the # to actually delete
    }
}
