PHP: sort multiple XML documents by date - php

I am pulling a list of blog pages from an .XML file and printing the 2 newest entries on a web page. However, I have no idea how to sort the .XML files by pubDate or file_edited.
The code successfully retrieves the files and prints the two newest entries.
Here is the PHP code block that retrieves the files and prints them.
<?php
date_default_timezone_set('Europe/Helsinki');
/* XML Source URL:s */
$pages=("blog/data/other/pages.xml");
/* XML Doc Conversions */
$xmlDoc = new DOMDocument();
echo "<div class='blog_article_wrapper'>";
function myFunction($x){
// Run 2 times, skip first file and stop loop.
for ($i=1; $i<=2; $i++) {
//Get "Title
$item_title=$x->item($i)->getElementsByTagName('title')
->item(0)->childNodes->item(0)->nodeValue;
//Get "Date" from .XML document.
$item_date=$x->item($i)->getElementsByTagName('pubDate')
->item(0)->childNodes->item(0)->nodeValue;
//Get "URL" from .XML document.
$item_url=$x->item($i)->getElementsByTagName('url')
->item(0)->childNodes->item(0)->nodeValue;
//Get "Author" from .XML document.
$item_author=$x->item($i)->getElementsByTagName('author')
->item(0)->childNodes->item(0)->nodeValue;
//Format date and author
$item_date = date('d.m.Y', strtotime($item_date));
$item_author = ucfirst(strtolower($item_author));
//Get content data from specifix .XML document being iterated in loop
$url=("blog/data/pages/" . $item_url . ".xml");
$xmlDoc = new DOMDocument();
$xmlDoc->load($url);
$y=$xmlDoc->getElementsByTagName('content')->item(0)->nodeValue;
//Limit content to 150 letters and first paragraph tag.
$start = strpos($y, '<p>="') + 9;
$length = strpos($y, '"</p>') - $start;
$src = substr($y, $start, $length);
$item_content = "\"" . (substr($src, 0, 150)) . "...\"";
// Page specific code for output comes here.
}
}
//Call loop and iterate data
$xmlDoc->load($pages);
$x=$xmlDoc->getElementsByTagName('item');
myFunction($x);
?>
Any advice, code or articles pointing in the right direction would be much appreciated.
Thank you!

I figured this out myself using another Stack Overflow question and php.net.
<?php
//Directory where files are stored.
$folder = "blog/data/pages/";
$array = array();
//scandir and populate array with filename as key and filemtime as value.
foreach (scandir($folder) as $node) {
$nodePath = $folder . DIRECTORY_SEPARATOR . $node;
if (is_dir($nodePath)) continue;
$array[$nodePath] = filemtime($nodePath);
}
//Sort entry and store two newest files into $newest
arsort($array);
$newest = array_slice($array, 0, 2);
// $newest is now populated with name of .XML document as key and filemtime as value
// Use built in functions array_keys() and array_values() to access data
?>
I can now modify the original code in the question to use only these two outputted files for retrieving the desired data.
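The answer above sorts by file modification time. If you would rather sort by the pubDate stored in pages.xml, as the question originally asked, something like the following sketch might work. It assumes each item element has url and pubDate children, as in the question's code, and that PHP 7 or later is available. The function name newestItems() and the sample XML are made up for illustration.

```php
<?php
// Hypothetical helper: return the $limit newest <item> entries, sorted by <pubDate>.
function newestItems(DOMDocument $doc, int $limit = 2): array
{
    $items = [];
    foreach ($doc->getElementsByTagName('item') as $item) {
        $items[] = [
            'url'  => $item->getElementsByTagName('url')->item(0)->nodeValue,
            'date' => strtotime($item->getElementsByTagName('pubDate')->item(0)->nodeValue),
        ];
    }
    // Newest first: compare the parsed timestamps in descending order.
    usort($items, function ($a, $b) {
        return $b['date'] <=> $a['date'];
    });
    return array_slice($items, 0, $limit);
}

// Made-up sample data standing in for blog/data/other/pages.xml.
$doc = new DOMDocument();
$doc->loadXML(
    '<channel>'
    . '<item><url>first</url><pubDate>Mon, 05 Jan 2015 10:00:00 +0200</pubDate></item>'
    . '<item><url>third</url><pubDate>Wed, 07 Jan 2015 10:00:00 +0200</pubDate></item>'
    . '<item><url>second</url><pubDate>Tue, 06 Jan 2015 10:00:00 +0200</pubDate></item>'
    . '</channel>'
);
$newest = newestItems($doc);
foreach ($newest as $entry) {
    echo $entry['url'] . ' ' . date('d.m.Y', $entry['date']) . PHP_EOL;
}
```

The url values returned here could then be fed into the per-page loading code from the question in place of the fixed item(1) and item(2) lookups.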

Related

Recursively search directories and list the x newest files (based on creation date on server)

Ok, I don't fully understand what I'm doing here, so I thought I'd get some feedback on my code.
Trying to recursively search through specific folders on my server, and return the 30 newest *.jpg images that were added (with full filepath).
At the moment my current code gives me (I'm assuming) timestamps (they each look like a string of 10 numbers), and actually I seem to only be getting 22 out of the full 30 I was expecting. I saw another post using directoryIteratorIterator, but I'm not able to upgrade my PHP version for my server and I can't find a lot of clear documentation on that.
Hoping someone can steer me in the right direction on this.
<?php
function get30Latest(){
$files = array();
foreach (glob("*/*.jpg") as $filename) { //I assume "*/*.jpg" would start from the root of the server and go through each directory looking for a match to *.jpg and add to $files array
$files[$filename] = filemtime($filename);
}
arsort($files); //I may not need this since I'm looking to sort by earliest to latest (among the 30 newest images)
$newest = array_slice($files, 0, 29); //This should be the first 30 I believe.
foreach ($newest as $file){ //Assuming I would loop through the array and display the full paths of these 30 images
echo $file . "</br>"; //Returns something similar to "1451186291, 1451186290, 1451186290, etc..."
}
}
?>
You are on the right track. This should work for you:
First of all we create a RecursiveDirectoryIterator, which we pass to a RecursiveIteratorIterator, so we have an iterator that walks recursively through all files under your specified path. We filter out everything except *.jpg files with a RegexIterator.
Now we can convert the iterator into an array with iterator_to_array(), so we can sort the array however we want. We do that with usort() combined with filectime(), so we compare the change times of the files and sort by that. (Note that on Unix filectime() returns the inode change time rather than a true creation time; use filemtime() if you want the last modification time.)
At the end we can just slice the 30 newest files with array_slice() and we are done. Loop through the files and display them.
Code:
<?php
$it = new RecursiveIteratorIterator(new RecursiveDirectoryIterator("your/path"));
$rgIt = new RegexIterator($it, "/^.+\.jpg$/i");
$files = iterator_to_array($rgIt);
usort($files, function($a, $b){
if(filectime($a) == filectime($b))
return 0;
return filectime($a) > filectime($b) ? -1 : 1;
});
$files = array_slice($files, 0 , 30);
foreach($files as $v)
echo $v . PHP_EOL;
?>
I think what you may want to do is keep your function more general, in case you want to reuse its functionality or simply change it later. You won't then have to create a get10Latest() or get25Latest(), etc. This is just a simple class that contains everything you need to fetch and return the files. Use what you want from it; the methods are in order of use, so you could just take out the guts of the methods to create one big function:
class FetchImages
{
private $count = 30;
private $arr = array();
private $regex = '';
public function __construct($filter = array('jpg'))
{
// This will create a simple regex from the array of file types ($filter),
// e.g. '.+\.(?:jpg|jpeg|png)'; the group keeps ^ and $ applied to every alternative
$this->regex = '.+\.(?:'.implode('|',$filter).')';
}
public function getImgs($dir = './')
{
// Borrowed from contributor notes from the RecursiveDirectoryIterator page
$regex = new RegexIterator(
new RecursiveIteratorIterator(
new RecursiveDirectoryIterator($dir)),
'/^'.$this->regex.'$/i',
RecursiveRegexIterator::GET_MATCH);
// Loop and assign datetimes as keys,
// You don't need date() but it's more readable for troubleshooting
foreach($regex as $file)
$this->arr[date('YmdHis',filemtime($file[0]))][] = $file[0];
// return the object for method chaining
return $this;
}
public function setMax($max = 30)
{
// This will allow for different returned values
$this->count = $max;
// Return for method chaining
return $this;
}
public function getResults($root = false)
{
if(empty($this->arr))
return false;
// Set default container
$new = array();
// Depending on your version, you may not have the "SORT_NATURAL"
// This is what will sort the files from newest to oldest
// I have not accounted for empty->Will draw error(s) if not array
krsort($this->arr,SORT_NATURAL);
// Loop through storage array and make a new storage
// with single paths
foreach($this->arr as $timestamp => $files) {
for($i = 0; $i < count($files); $i++)
$new[] = (!empty($root))? str_replace($root,"",$files[$i]) : $files[$i];
}
// Return the results
return (!$this->count)? $new : array_slice($new,0,$this->count);
}
}
// Create new instance. I am allowing for multiple look-up
$getImg = new FetchImages(array("jpg","jpeg","png"));
// Get the results from my core folder
$count = $getImg->getImgs(__DIR__.'/core/')
// Sets the extraction limit "false" will return all
->setMax(30)
// This will strip off the long path
->getResults(__DIR__);
print_r($count);
I don't really need a giant, flexible class of functions. This function will always output the 30 latest images. If I'm understanding correctly, you're assigning a timestamp as a key to each file in the array, and then sorting by the key using krsort? I'm trying to pull out just those pieces in order to get an array of files with timestamps, sorted from latest to oldest, and then slice the array to just the first 30. Here's a quick attempt as a talking point (not complete by any means). At the moment it's outputting only one file several hundred times:
<?php
function get30Latest(){
$directory = new RecursiveDirectoryIterator('./');
$iterator = new RecursiveIteratorIterator($directory);
$regex = new RegexIterator($iterator, '/^.+\.jpg$/i', RecursiveRegexIterator::GET_MATCH);
foreach($regex as $file){
$tmp->arr[date('YmdHis',filemtime($file[0]))][] = $file[0];
krsort($tmp->arr,SORT_NATURAL);
foreach($tmp->arr as $timestamp => $files) {
for($i = 0; $i < count($files); $i++)
$new[] = (!empty($root))? str_replace($root,"",$files[$i]) : $files[$i];
echo $new[0] . "</br>"; //this is just for debugging so I can see what files
//are showing up. Ideally this will be the array I'll
//pull the first 30 from and then send them off to a
//thumbnail creation function
}
}
}
?>
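In the attempt above, the sort and the inner loop run once per file inside the outer foreach, and the debug line always echoes $new[0], which would explain seeing one file repeated. A flat version might look like the sketch below: collect everything first, sort once, then slice. The function name latestJpgs() is made up, PHP 7 or later is assumed, and it sorts by modification time (getMTime()), which may or may not be the timestamp you want. Note also that in the original question glob("*/*.jpg") only looks one directory level below the current directory, array_slice($files, 0, 29) returns 29 items rather than 30, and echoing the array values prints the timestamps instead of the paths stored as keys.

```php
<?php
// Hypothetical helper: newest $limit *.jpg files under $dir, newest first.
function latestJpgs(string $dir, int $limit = 30): array
{
    $mtimes = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    // First pass: collect every matching file with its modification time.
    foreach ($iterator as $fileinfo) {
        if ($fileinfo->isFile() && strtolower($fileinfo->getExtension()) === 'jpg') {
            $mtimes[$fileinfo->getPathname()] = $fileinfo->getMTime();
        }
    }
    // Sort once, after the loop, by timestamp descending; the path keys are preserved.
    arsort($mtimes);
    // The paths are the keys, so return those rather than the timestamps.
    return array_keys(array_slice($mtimes, 0, $limit, true));
}
```

Calling latestJpgs('./') should then give an array of up to 30 paths that can be passed on to a thumbnail function.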

php - xml - random filter and store the order

PHP: I get an XML feed of 20 articles. I have to pick 3 articles at random and print the XML out in the same format. The randomly picked articles should change every day, not on every refresh.
So for example, given: art1, art2, art3, ... art20
it should display art4, art2, art19 (random), but keep the same articles for the entire day (10/12/12); tomorrow it should be art1, art20, art13 (another random set).
<?php
// Load our XML document
$doc = new DOMDocument();
$doc->load('feed.xml');
// Create an XPath object and register our namespaces so we can
// find the nodes that we want
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('p', 'http://purl.org/dc/elements/1.1/');
// Random generated xml should go here
// Write our updated XML back to a new file
$doc->save('feedout.xml');
?>
Since storing the article order needs server file storage, I can push that back. How can I randomize the articles?
for ($i = 0; $i < 3; $i++) {
$node = $nodes->item($i);}
Thanks
How about just saving your file with the date in its name, and then checking that a file for that date doesn't already exist?
// Write our updated XML back to a new file
if( !file_exists( $date . '_feedout.xml' ) )
$doc->save( $date . '_feedout.xml' );
Or
// Write our updated XML back to a new file
if( date( "Y/m/d", filemtime( 'feedout.xml' ) ) != $date )
$doc->save( 'feedout.xml' );
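If you would rather not store anything on the server, another option is to seed the random number generator from the date, so the same three articles come out all day. This is only a sketch of that idea, not something from the answers above: the function name dailyPick() is made up, it assumes PHP 7.1 or later, and it assumes $count is no larger than $total.

```php
<?php
// Hypothetical helper: choose $count distinct indexes out of $total,
// identically for every call made with the same day string.
function dailyPick(int $total, int $count, string $day): array
{
    // Seed the Mersenne Twister from the date, e.g. "20121210" becomes 20121210.
    mt_srand((int) $day);
    $indexes = range(0, $total - 1);
    // Partial Fisher-Yates shuffle: settle only the first $count positions.
    for ($i = 0; $i < $count; $i++) {
        $j = mt_rand($i, $total - 1);
        [$indexes[$i], $indexes[$j]] = [$indexes[$j], $indexes[$i]];
    }
    // Re-seed randomly so later random calls are not tied to the date.
    mt_srand();
    return array_slice($indexes, 0, $count);
}

$picks = dailyPick(20, 3, date('Ymd'));
print_r($picks);
```

Each index in $picks could then be passed to $nodes->item() to fetch the chosen article from the node list.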

In PHP, how can I get an XML attribute based on a variable?

I'm retrieving files like so (from the Internet Archive):
<files>
<file name="Checkmate-theHumanTouch.gif" source="derivative">
<format>Animated GIF</format>
<original>Checkmate-theHumanTouch.mp4</original>
<md5>72ec7fcf240969921e58eabfb3b9d9df</md5>
<mtime>1274063536</mtime>
<size>377534</size>
<crc32>b2df3fc1</crc32>
<sha1>211a61068db844c44e79a9f71aa9f9d13ff68f1f</sha1>
</file>
<file name="CheckmateTheHumanTouch1961.thumbs/Checkmate-theHumanTouch_000001.jpg" source="derivative">
<format>Thumbnail</format>
<original>Checkmate-theHumanTouch.mp4</original>
<md5>6f6b3f8a779ff09f24ee4cd15d4bacd6</md5>
<mtime>1274063133</mtime>
<size>1169</size>
<crc32>657dc153</crc32>
<sha1>2242516f2dd9fe15c24b86d67f734e5236b05901</sha1>
</file>
</files>
They can have any number of <file>s, and I'm solely looking for the ones that are thumbnails. When I find them, I want to increase a counter. When I've gone through the whole file, I want to find the middle Thumbnail and return the name attribute.
Here's what I've got so far:
//pop previously retrieved XML file into a variable
$elem = new SimpleXMLElement($xml_file);
//establish variable
$i = 0;
// Look through each parent element in the file
foreach ($elem as $file) {
if ($file->format == "Thumbnail"){$i++;}
}
//find the middle thumbnail.
$chosenThumb = ceil(($i/2)-1);
//Gloriously announce the name of the chosen thumbnail.
echo($elem->file[$chosenThumb]['name']);
The final echo doesn't work because it doesn't like having a variable choose the XML element. It works fine when I hardcode the index. Can you guess that I'm new to handling XML files?
Edit:
Francis Avila's answer below sorted me right out:
$sxe = simplexml_load_file($url);
$thumbs = $sxe->xpath('/files/file[format="Thumbnail"]');
$n_thumbs = count($thumbs);
$middlethumb = $thumbs[(int) ($n_thumbs/2)];
$happy_string = (string)$middlethumb['name'];
echo $happy_string;
Use XPath.
$sxe = simplexml_load_file($url);
$thumbs = $sxe->xpath('/files/file[format="Thumbnail"]');
$n_thumbs = count($thumbs);
$middlethumb = $thumbs[(int) ($n_thumbs/2)];
$middlethumbname = (string) $middlethumb['name'];
You can also accomplish this with a single XPath expression if you don't need the total count:
$thumbs = $sxe->xpath('/files/file[format="Thumbnail"][position() = floor(last() div 2) + 1]/@name');
$middlethumbname = (count($thumbs)) ? $thumbs[0]['name'] : '';
A limitation of SimpleXML's xpath method is that it can only return nodes and not simple types. This is why you need to use $thumbs[0]['name']. If you use DOMXPath::evaluate(), you can do this instead:
$doc = new DOMDocument();
$doc->load($url);
$xp = new DOMXPath($doc);
$middlethumbname = $xp->evaluate('string(/files/file[format="Thumbnail"][position() = floor(last() div 2) + 1]/@name)');
$elem->file[$chosenThumb] will give the $chosenThumb'th element from the full file[] list, not from the list filtered for thumbnails, right?
$filteredFiles = array();
foreach ($elem as $file) {
if ($file->format == "Thumbnail"){
$i++;
//add this item to a new array($filteredFiles)
$filteredFiles[] = $file;
}
}
$chosenThumb = ceil(($i/2)-1);
//echo($elem->file[$chosenThumb]['name']);
echo($filteredFiles[$chosenThumb]['name']);
Some problems:
Middle thumbnail is incorrectly calculated. You'll have to keep a separate array for those thumbs and get the middle one using count.
file might need to be {'file'}, I'm not sure how PHP sees this.
you don't have a default thumbnail
Code you should use is this one:
$files = new SimpleXMLElement($xml_file);
$thumbs = array();
foreach($files as $file)
if($file->format == "Thumbnail")
$thumbs[] = $file;
$chosenThumb = ceil((count($thumbs)/2)-1);
echo (count($thumbs)===0) ? 'default-thumbnail.png' : $thumbs[$chosenThumb]['name'];
/edit: but I recommend that guy's solution, to use XPath. Way easier.

Can't get the dom node value extracted

I have code that fetches a page from another site, grabs the data, and returns the string to a variable. I'm wondering why this isn't working, however?
<?php
$file = $DOCUMENT_ROOT . "http://www.sc2brasd.net";
$doc = new DOMDocument();
@$doc->loadHTMLFile($file);
$elements = $doc->getElementsByTagName('h1');
for ($i=1; $i<=7; $i++)
{
echo trim($elements->item($i)->nodeValue);
}
?>
There are seven "h1" tags that I would like to grab, but they won't echo out. An example of the string would be "Here is the test string i am trying to pull out".
This will not work because the path doesn't exist: prefixing the URL with $DOCUMENT_ROOT makes it point to a file on your server.
$file = $DOCUMENT_ROOT . "http://www.sc2brasd.net";
I'm not sure if loadHTMLFile() can handle URLs at all. You may need to get the document with file_get_contents() and load it with DOMDocument::loadHTML.
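A rough sketch of that approach follows, with the parsing split into a function so it can be tried on a plain string first. The function name headingTexts() and the sample markup are made up. Note also that node lists are zero-indexed, so the loop in the question, which starts at item(1) and runs to item(7), would skip the first h1 and run past the end of a seven-element list.

```php
<?php
// Hypothetical helper: return the trimmed text of every <h1> in an HTML string.
function headingTexts(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    // Iterating the node list avoids the off-by-one from starting item() at 1.
    foreach ($doc->getElementsByTagName('h1') as $heading) {
        $texts[] = trim($heading->nodeValue);
    }
    return $texts;
}

// Made-up markup; for a live page you would pass file_get_contents($url) instead,
// which requires allow_url_fopen to be enabled.
$html = '<html><body><h1> First </h1><p>text</p><h1>Second</h1></body></html>';
print_r(headingTexts($html));
```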

List directory filenames by file word count

I'd like to,
Check the word count for a folder full of text files.
Output a list of the files arranged by word count in the format - FILENAME is WORDCOUNT
I know str_word_count is used to get individual wordcounts for files but I'm not sure how to rearrange the output.
Thanks in advance.
Adapted from here.
<?php
$files = array();
$it = new DirectoryIterator("/tmp");
$it->rewind();
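while ($it->valid()) {
// Skip ".", ".." and subdirectories; read via the full path, not the bare name
if ($it->isFile()) {
$count = str_word_count(file_get_contents($it->getPathname()));
$files[sprintf("%010d", $count) . $it->getFilename()] =
array($count, $it->getFilename());
}
$it->next();
}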
ksort($files);
foreach ($files as $tup) {
echo sprintf("%s is %d\n", $tup[1], $tup[0]);
}
EDIT It would be more elegant to have the key of $files be the file name and the value be the word count, and then sort by value.
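That variant might look something like the sketch below. The function name wordCounts() is made up and PHP 7 or later is assumed.

```php
<?php
// Hypothetical helper: map each file name in $dir to its word count, fewest words first.
function wordCounts(string $dir): array
{
    $counts = [];
    foreach (new DirectoryIterator($dir) as $fileinfo) {
        if (!$fileinfo->isFile()) {
            continue; // skips ".", ".." and subdirectories
        }
        $counts[$fileinfo->getFilename()] =
            str_word_count(file_get_contents($fileinfo->getPathname()));
    }
    // asort() orders by value and keeps the file-name keys attached.
    asort($counts);
    return $counts;
}
```

Looping over wordCounts("/tmp") with foreach ($counts as $name => $count) and printing "$name is $count" would then give the requested output; swapping asort() for arsort() puts the largest files first.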
I don't use PHP, but I would:
1. create an array to hold filename and wordcount
2. read through the folder full of text files and for each save the filename and wordcount to the array
3. sort the array by wordcount
4. output the array
To store the information (#2) I would put the information into a 2D array. There is more information about 2D arrays here at Free PHP Tutorial. Thus array[0][0] would equal the name of the first file and array[0][1] would be its wordcount; array[1][0] and array[1][1] would be the same for the next file.
To sort the array (#3) you can use the tutorial at firsttube.com.
Then, to output, I would loop through the array and print the first and second element of each row.
for ($i = 0; $i < count($array); ++$i) {
// print the filename ($array[$i][0]) and wordcount ($array[$i][1])
echo $array[$i][0] . " is " . $array[$i][1] . "\n";
}
If you would like to keep the iterator-style approach (yet still do essentially the same as Artefacto's answer) then something like the following would suffice.
$dir_it = new FilesystemIterator("/tmp");
// Build array iterator with word counts
$arr_it = new ArrayIterator();
foreach ($dir_it as $fileinfo) {
// Skip non-files
if ( ! $fileinfo->isFile()) continue;
$fileinfo->word_count = str_word_count(file_get_contents($fileinfo->getPathname()));
$arr_it->append($fileinfo);
}
// Sort by word count descending
$arr_it->uasort(function($a, $b){
return $b->word_count - $a->word_count;
});
// Display sorted files and their word counts
foreach ($arr_it as $fileinfo) {
printf("%10d %s\n", $fileinfo->word_count, $fileinfo->getFilename());
}
Aside: If the files are particularly large (read: loading each one entirely into memory just to count the words is too much) then you could loop over the file line-by-line (or byte-by-byte if you really wanted to) with the SplFileObject.
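A line-by-line count along those lines might look like this sketch. The function name countWordsByLine() is made up and PHP 7 or later is assumed.

```php
<?php
// Hypothetical helper: count words without loading the whole file into memory.
function countWordsByLine(string $path): int
{
    $total = 0;
    $file = new SplFileObject($path, 'r');
    // SplFileObject is iterable and yields one line per step.
    foreach ($file as $line) {
        $total += str_word_count((string) $line);
    }
    return $total;
}
```

In the loop above, the file_get_contents() call could then be replaced with countWordsByLine($fileinfo->getPathname()).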
