Is there a way to loop through an XML file from a specific element/node?
For instance, if I want to start from <offer id="a2a9d7a3a520de69e8e06f3e53df1c49"> in this XML feed: http://pastebin.com/n5myzcz1
Is it then possible to load all offers after that ID?
I was going to suggest XPath, but since you mention that the file is 1GB, you should use a streaming parser such as XMLReader (which I now read you are already using).
You are bound to have a linear search: because you don't know where the element you want is, you have to go through all of them until you find it.
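For example, a minimal sketch of that linear scan, assuming the feed is saved as offers.xml and each <offer> carries an id attribute as in your sample:
$reader = new XMLReader();
$reader->open('offers.xml');
$found = false;
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'offer') {
        if (!$found) {
            if ($reader->getAttribute('id') !== 'a2a9d7a3a520de69e8e06f3e53df1c49') {
                continue; // still before the offer you want
            }
            $found = true; // matching offer reached; it gets processed too
        }
        // Load just this <offer> subtree for convenient access.
        $offer = simplexml_load_string($reader->readOuterXml());
        // ... handle $offer ...
    }
}
$reader->close();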
EDIT
Just an esoteric idea: you could use shell_exec to grep the file for the line where the ID is, and then cut the file from that line onwards using sed or equivalent.
EDIT 2
Ahhhh, this was fun!!
// Find the line number where the ID first appears.
$line = intval(shell_exec('grep -n a2a9d7a3a520de69e8e06f3e53df1c49 orders.xml | cut -d : -f1'));
// Count the total number of lines in the file.
$totalLines = intval(shell_exec('wc -l orders.xml'));
// Print everything from the matching line to the end of the file.
echo shell_exec("sed -n '".$line.",".$totalLines."p' orders.xml");
Maybe you can just put 9999999999 instead of calculating the total lines. Not saying you should, of course ;) ;)
I don't know if this is faster than just going through the file with XMLReader but I guess it's up to you to decide if it's worth it.
I hope this can give you further ideas to solve your problem.
I have already got lint testing and code standards checking, but I'd like to go one further and add a hotkey to change all the code to a certain standard.
I have so far got as far as...
:r ! phpcbf --standard=psr2 %
But that only pulls the document in. So how can I make it just act like a filter and replace the entire script?
As described in :help filter, the general format for filtering content through an external program is
:{range}!{filter} [arg]
The expectation is that the filter command reads input on stdin and writes it to stdout.
For your tool, this likely translates to :%!phpcbf --standard=psr2.
Assuming your command can take input from stdin you would do the following:
:%!phpcbf --standard=psr2
Please read :h filter
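If you want that behind a hotkey, a minimal sketch for your vimrc (the key choice is just an example):
" Hypothetical mapping: filter the whole buffer through phpcbf
nnoremap <leader>cb :%!phpcbf --standard=psr2<CR>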
Can anyone give me some pointers with regard to PHP command execution and best practice?
I'm currently trying to parse some NetBackup data, but I am running into issues related to the massive amount of data the system call is returning. In order to cut down the amount of data I'm retrieving, I'm doing something like this:
// Keep only the columns of interest from the comma-separated report.
$awk_command = "awk -F, '{print $1\",\"$2\",\"$3\",\"$4\",\"$5\",\"$6\",\"$7\",\"$9\",\"$11\",\"$26\",\"$32\",\"$33\",\"$34\",\"$35\",\"$36\",\"$37\",\"$38\",\"$39\",\"$40}'";
// $get_backups receives every remaining line as an array element.
exec("sudo /usr/openv/netbackup/bin/admincmd/bpdbjobs -report -M $master_name -all_columns | $awk_command", $get_backups, $null);
foreach ($get_backups as $backup_detail)
{
    process_the_data();
    write_data_to_db();
}
I'm using awk to limit the amount of data being received. Without it I end up receiving nearly 150MB of data, and with it I get a much more manageable ~800KB.
You don't need to tell me that the awk shit is nasty - I know that already... But in the interests of bettering myself (and my code), can anyone suggest an alternative?
I was thinking of something like proc_open, but I'm really not sure if that would provide any benefit.
Use exec to write the data to a file instead of reading it whole into your script.
exec("sudo /usr/openv/netbackup/bin/admincmd/bpdbjobs -report -M $master_name -all_columns | $awk_command > /tmp/output.data");
Then use any memory efficient method to read the file in parts.
Have a look here:
Least memory intensive way to read a file in PHP
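For instance, a minimal sketch that streams the dump line by line (assuming the awk output landed in /tmp/output.data as above):
// Read one line at a time so only a single record is in memory at once.
$fh = fopen('/tmp/output.data', 'r');
while (($backup_detail = fgets($fh)) !== false) {
    $fields = explode(',', rtrim($backup_detail, "\n"));
    // process_the_data() / write_data_to_db() on $fields
}
fclose($fh);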
I have about 50 files in some folder.
I want to get all the files that contain 'Name : ENAD' inside!
For example:
the PHP IMAP library!
It searches through thousands of message text files for some word!
I wrote a stupid function to open each file and search inside:
$matched_files = array();
$s = scandir('dir');
foreach ($s as $file) {
    if ($file === '.' || $file === '..') continue; // skip directory entries
    $content = file_get_contents('dir/' . $file);  // scandir() returns bare names
    if (strpos($content, 'Name : ENAD') !== false)
        $matched_files[] = $file;
}
But what if there are thousands of files!
Should I open all the files???? !!!!
Is it possible to search for something inside a file without opening it?
If NO,
what is the best and fastest way to do that?
Is it possible to search for something inside a file without opening it?
Of course not. Where is your common sense? Can you search a refrigerator without opening it?
I have about 50 files in some folder
No problem, it will be fast enough. Opening a file is not THAT heavy an operation, as you imagine.
But what if there are thousands of files!
First have a thousand, then come to ask.
What is the best and fastest way to do that?
Store your data in a database, not files.
Is there a reason you need to use PHP functions? This is exactly what grep was designed to do...
You could always just use PHP's exec() to run an appropriate grep command, such as:
grep -lr 'Name : ENAD' dir
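And if you want to drive it from PHP, a minimal sketch (the directory name is an assumption):
// Let grep return just the names of the matching files.
exec("grep -lr " . escapeshellarg('Name : ENAD') . " dir", $matched_files);
print_r($matched_files);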
But you might also want to consider (if you're the person creating thousands of files in the first place) whether that is the best way of storing your data - if you usually need the ability to search quickly, you might want to either use a database instead of plain files (e.g. MySQL, PostgreSQL, SQLite, et cetera), or keep a search index (using e.g. Sphinx, Solr, or Lucene).
I have a file that I'm using to log IP addresses for a client. They want to keep the last 500 lines of the file. It is on a Linux system with PHP4 (oh no!).
I was going to add to the file one line at a time with new IP addresses. We don't have access to cron so I would probably need to make this function do the line-limit cleanup as well.
I was thinking of either using something like exec('tail [some params]'), or reading the file in with PHP, exploding it on newlines into an array, taking the last 500 elements, and writing it back. That seems kind of memory intensive, though.
What's a better way to do this?
Update:
Per @meagar's comment below, if I wanted to use the zip functionality, how would I do that within my PHP script? (no access to cron)
if (rand(0, 10) == 10) {
    // e.g. compress the log once it grows past 1 MB
    shell_exec("find . -name logfile.txt -size +1M -exec zip logfile.zip '{}' \;");
}
Will zip enumerate the files automatically if there is an existing file or do I need to do that manually?
The fastest way is probably, as you suggested, to use tail:
passthru("tail -n 500 $filename");
(passthru does the same as exec, only it sends the entire program output straight to stdout. You can capture the output using an output buffer.)
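For example, a minimal sketch of that capture:
// Capture the tail output into a string via an output buffer.
ob_start();
passthru("tail -n 500 " . escapeshellarg($filename));
$last_lines = ob_get_clean();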
[edit]
I agree with a previous comment that a log rotate would be infinitely better... but you did state that you don't have access to cron so I'm assuming you can't do logrotate either.
logrotate
This would be the "proper" answer, and it's not difficult to set this up either.
You may get the number of lines using count(explode("\n", file_get_contents("log.txt"))), and once it reaches 500, take the substring starting from the first \n to the end, add the new IP address, and write the whole file again.
It's almost the same as writing the new IP by opening the file in a+ mode.
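A minimal sketch of that whole-file approach (assuming $new_ip holds the address to append):
// Append the new entry, then keep only the newest 500 lines.
$lines = explode("\n", rtrim(file_get_contents('log.txt'), "\n"));
$lines[] = $new_ip; // $new_ip is a hypothetical variable holding the address
$lines = array_slice($lines, -500);
file_put_contents('log.txt', implode("\n", $lines) . "\n");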
I'd like to insert
<?php include_once('google_analytics.php'); ?>
before the closing body tag of about 100 php files. Unfortunately the person who made the site didn't make a header or footer template.
What is the best way to do this? I've tried using grep/find for getting a list of files and piping the results through xargs to sed, but I've had no luck. I probably have the regex wrong. Can anyone help me out with this?
Also, are there any graphical tools for Apple OS X that you would recommend?
Thanks,
Mike
edit
find . -name "*.php" -print0 | xargs -0 perl -i.bak -pe 's/<\/body>/<?php include_once("google_analytics.php"); ?>\n<\/body>/g'
works.
Using sed:
sed -i "s/<\/body>/<?php include_once('google_analytics.php'); ?>\n<\/body>/" *.php
The -i option edits the file in place. If you say -iBAK then it will create a backup of the file before editing it.
If you're interested in GUI tools, download TextMate. Put all 100 files in a folder and open that folder with TM. This will put TM in project mode, and you'll see all the files in a sidebar. Now, do Edit>Find>Find In Project, put </body> in the "find" field, <?php include_once('google_analytics.php'); ?></body> in the "replace" field, hit replace and let it run.
This calls for an ed script:
#!/bin/sh
for i in *.html; do
    ed "$i" << \eof
?</body>?s/^/<?php include_once('google_analytics.php'); ?>&/
w
q
eof
done
It fires up one of the first (literally) programs ever written for Unix, Ken Thompson's ed(1) text editor, on each file and makes the necessary edit. If you want it to work on specific files rather than on every .html file in the directory, just change *.html to "$@".
Reading the Wikipedia link just now, I learned something interesting: Ken Thompson made the first actual application of regular expressions; apparently they were just a mathematical concept until he wrote ed(1).
You need to supply an array with filenames in $files to make the following solution work:
foreach ($files as $file)
{
    $txt = file_get_contents($file);
    $txt = str_replace('</body>', '<?php include_once(\'google_analytics.php\'); ?>'."\n".'</body>', $txt);
    file_put_contents($file, $txt);
}
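One minimal way to build that $files array, assuming the scripts all sit in one directory (the path is hypothetical):
// Hypothetical path; use RecursiveDirectoryIterator if the files are nested.
$files = glob('/path/to/site/*.php');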
Dreamweaver will do a find/replace for the entire local site; I'm sure other html editors would as well.
Some IDEs like Dreamweaver provide functionality to do find and replace across many files, entire folders, etc. You can use one of those and do a find and replace, replacing the closing body tag with the code you want.
I'm not a PHP developer, but is PHP a valid XML format? It doesn't look like it, but what do I know. Even if it is, the success of this answer may depend more on whether the files you are working with are valid PHP. But... if they are, it would be fairly simple to use an XML transform (XSLT) to match the body tag, copy its content, and append a new tag after the matched content.
What about a replace on
</body>
with
<?php include_once('google_analytics.php'); ?></body>
?