I'm looping through a directory, trying to find XML files with errors.
$baddies = array();
foreach (glob("fonts/*.svg") as $filename) {
    libxml_use_internal_errors(true);
    $str = file_get_contents($filename);
    $sxe = simplexml_load_string($str);
    $errors = libxml_get_errors();
    $num_of_errors = 0;
    $num_of_errors = sizeof($errors);
    if ($num_of_errors > 0) {
        array_unshift($baddies, $filename);
    }
}
However, it seems that once the errors are put into this object, they persist through subsequent iterations of the loop, and files without errors still test positive: $num_of_errors stays high for good files. I reset it to zero each pass, and have even tried unsetting it after each time through the loop. I suppose libxml_get_errors continues to return the errors once they are set. How can I reset it?
I think you should use the libxml_clear_errors() function. As the documentation says, libxml keeps the errors stored in a buffer until you clear it.
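A minimal sketch of the loop with the buffer cleared after each file (same variable names as the question):

$baddies = array();
libxml_use_internal_errors(true);

foreach (glob("fonts/*.svg") as $filename) {
    $str = file_get_contents($filename);
    $sxe = simplexml_load_string($str);

    // libxml accumulates errors across calls, so read this file's errors...
    $errors = libxml_get_errors();
    if (count($errors) > 0) {
        array_unshift($baddies, $filename);
    }

    // ...then empty the buffer so the next iteration starts clean
    libxml_clear_errors();
}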
I am trying to make a PHP application that searches through the files of the current directory, looks in every subdirectory for a file called email.txt, gets the contents of that file, compares the contents of email.txt with the given query, and echoes all the directories that match the query. But it does not work, and it looks like the problem is in the if-else part at the end of the script, because it doesn't give any output.
<?php
// pulling query from link
$query = $_GET["q"];
echo($query);
echo("<br>");
// listing all files in doc directory
$files = scandir(".");
// searching through array for unwanted files
$downloader = array_search("downloader.php", $files);
$viewer = array_search("viewer.php", $files);
$search = array_search("search.php", $files);
$editor = array_search("editor.php", $files);
$index = array_search("index.php", $files);
$error_log = array_search("error_log", $files);
$images = array_search("images", $files);
$parsedown = array_search("Parsedown.php", $files);
// deleting unwanted files from array
unset($files[$downloader]);
unset($files[$viewer]);
unset($files[$search]);
unset($files[$editor]);
unset($files[$index]);
unset($files[$error_log]);
unset($files[$images]);
unset($files[$parsedown]);
// counting folders
$folderamount = count($files);
// defining loop variables
$loopnum = 0;
// loop
while ($loopnum <= $folderamount + 10) {
    $loopnum = $loopnum + 1;
    // gets the emails from every folder
    $dirname = $files[$loopnum];
    $email = file_get_contents("$dirname/email.txt");
    // checks if the email matches
    if ($stremail == $query) {
        echo($dirname);
    }
}
//print_r($files);
//echo("<br><br>");
?>
Can someone explain / fix this for me? I literally have no clue what the problem is, and I have already debugged so much. Any help would be greatly appreciated.
Kind regards,
Bluppie05
There are a few problems with this code that would prevent you from getting the correct output.
The main reason you don't get any output from the if test is that the condition is (presumably) using the wrong variable name.
// variable with the file data is called $email
$email = file_get_contents("$dirname/email.txt");
// test is checking $stremail, which is never given a value
if ($stremail == $query) {
    echo($dirname);
}
There is also an issue with your scandir() and unset() combination. As you've discovered, scandir() basically gives you everything that a dir or ls would on the command line. Using unset() to remove specific files is problematic because you have to maintain a hardcoded list of files. However, unset() also leaves holes in your array: the count changes, but the original indices do not. This may be why you are using $folderamount + 10 in your loop. Take a look at this Stack Overflow question for more discussion of the problem.
Rebase array keys after unsetting elements
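For illustration, a tiny example (hypothetical data) of the holes unset() leaves and of array_values() re-indexing:

$files = ["a.php", "b.php", "c.php"];
unset($files[1]);
// count($files) is now 2, but the remaining keys are still 0 and 2,
// so $files[1] is undefined and a counter-based loop will skip entries
print_r($files);               // [0 => "a.php", 2 => "c.php"]
print_r(array_values($files)); // [0 => "a.php", 1 => "c.php"]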
I recommend you read the PHP manual page on the glob() function, as it will greatly simplify getting the contents of a directory. In particular, take a look at the GLOB_ONLYDIR flag.
https://www.php.net/manual/en/function.glob.php
Lastly, don't increment your loop counter at the beginning of the loop when you are using the counter to read elements from an array. Take a look at the PHP manual page on foreach loops for a neater way to iterate over an array; a sketch combining these suggestions follows the link below.
https://www.php.net/manual/en/control-structures.foreach.php
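Putting those pieces together, a minimal sketch of the search (variable names from the question; trimming the file contents before comparing is an assumption about how email.txt is formatted):

<?php
// pulling query from link
$query = $_GET["q"];

// GLOB_ONLYDIR returns only directories, so the hardcoded
// unset() list of files is no longer needed
foreach (glob("*", GLOB_ONLYDIR) as $dirname) {
    $emailfile = "$dirname/email.txt";
    if (!is_file($emailfile)) {
        continue; // skip folders that have no email.txt
    }
    // compare the file contents (minus any trailing newline) with the query
    $email = trim(file_get_contents($emailfile));
    if ($email == $query) {
        echo($dirname);
        echo("<br>");
    }
}
?>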
I am using league/csv to parse a CSV file and then dump that data into the database.
The structure looks like:
$csv = Reader::createFromPath($csv_file_path, 'r');
$csv->setOutputBOM(Reader::BOM_UTF8);
$csv->addStreamFilter('convert.iconv.ISO-8859-15/UTF-8');
$csv->setHeaderOffset(0);
$csv_header = $csv->getHeader();
$loop = true;
while ($loop) {
    $stmt = (new Statement())
        ->offset($offset)
        ->limit($limit);
    $records = $stmt->process($csv);
    foreach ($records as $record) {
        $rec_arr[] = array_values($record);
    }
    $records_arr = $service->trimArray($rec_arr);
    if (count($records_arr) > 0) {
        foreach ($records_arr as $ck => $cv) {
            // map data and insert into database
        }
    } else {
        $loop = false;
    }
}
Currently, I am running this logic inside a Laravel queue job. It successfully inserts the whole data set, but it never halts.
The job stays stuck at "processing". However, if I remove the while loop, it finishes with "processed".
So I think I have implemented some bad logic there.
Looking for ideas on how to tackle this.
if(count($records_arr)>0)
This line probably evaluates to true on every pass: $rec_arr is never re-initialised, so it keeps accumulating the rows from earlier iterations, and count($records_arr) stays above zero even when a batch comes back empty.
Your code therefore never reaches the $loop = false; end condition.
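A minimal sketch of the fix, assuming the insert step is also where $offset should advance ($limit's value and the mapping code are not shown in the question):

$offset = 0;
$limit = 1000; // batch size is an assumption
$loop = true;

while ($loop) {
    $rec_arr = []; // reset each pass so count() reflects only this batch

    $stmt = (new Statement())
        ->offset($offset)
        ->limit($limit);

    $records = $stmt->process($csv);
    foreach ($records as $record) {
        $rec_arr[] = array_values($record);
    }

    $records_arr = $service->trimArray($rec_arr);
    if (count($records_arr) > 0) {
        foreach ($records_arr as $ck => $cv) {
            // map data and insert into database
        }
        $offset += $limit; // advance to the next batch
    } else {
        $loop = false;
    }
}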
@stuart thanks for your comment. It was because I previously had a working loop that ran across multiple ajax requests, and when I moved to the queue I still had $records and $rec_arr initialised outside of the loop. I moved the array initialisation inside the while loop and it works perfectly now.
I'm very new to PHP, making errors and learning as I go. Please be gentle! :)
I want to access some data from Blizzard.com's API. For this particular data set, it's not a single block of JSON; rather, each object has its own URL. I estimate that there are approximately 150000 objects, but I don't know the start or end points of the number range, so I'm having to assume 1 and work past the highest number I know of (269065).
To get the data, I access each object via its JSON URL, read it, get the contents, and drop them into a text file (this could be written as an insert into a SQL db too, as I'm able to do that if the text file is the issue). But to be honest, I would love to get to the bottom of why this is happening as much as anything!
I wasn't going to try to run ~250000 iterations in a for loop, so I thought I'd try something I considered small: 2000.
The for loop starts with $a as 1, uses $a as part of the URL, loads and decodes the JSON, then checks whether the first field (id) in the object is set. If it is, it writes a few fields to data.txt; if it isn't, it just writes $a to data.txt (so I know it's a null, for other purposes not outlined here).
Simple! Or so I thought. After approximately 183 iterations, the data written to the text file goes awry, as seen in the quote below. It is out of sequence, starting again at 1, then back to 184, ad nauseam. The loop then seems to be locked in some kind of infinite loop, outputting in a random order, until I close the page 10-20 minutes later.
I have obviously made a big mistake, but I have no idea what I have done wrong to cause this. During my attempts I have rewritten the code with new variable names, so a new test does not conflict with code that could still be running in memory.
I've tried resetting variables to blank at the end of the loop in case something was being reused that was causing a problem.
If anyone could point out any errors in my code, or suggest something for me to look into to handle bigger loops, that would be brilliant. I am assuming my issue may be a timeout or memory problem, but I don't know where to start, and I was hoping I'd find some suggestions here.
If it's relevant, I am using 000webhostapp.com as my host provider for now, until I get some paid hosting.
1 ... 182 183 1 184 2 3 185 4 186 5 187 6 188 7 189 190 8 191
for ($a = 1; $a <= 2000; $a++) {
    $json = "https://eu.api.battle.net/wow/recipe/".$a."?locale=en_GB&<MYPRIVATEAPIKEY>";
    $contents = file_get_contents($json);
    $data = json_decode($contents, true);
    if (isset($data['id'])) {
        $file = fopen("data.txt", "a");
        fwrite($file, $data['id'].",'".$data['name']."'\n");
        fclose($file);
    } else {
        $file = fopen("data.txt", "a");
        fwrite($file, $a."\n");
        fclose($file);
    }
}
The content of the file I'm trying to access is
{"id":33994,"name":"Precise Strikes","profession":"Enchanting","icon":"spell_holy_greaterheal"}
I scrapped the original plan and wrote this instead. Thank you again to everyone who took time out of their day to help and offer suggestions!
$b = $mysqli->query("SELECT id FROM `static_recipes` ORDER BY id DESC LIMIT 1;")->fetch_object()->id;
if (empty($b)) { $b = 1; }
$count = $b + 101;
$write = [];
for ($a = $b + 1; $a < $count; $a++) {
    $json = "https://eu.api.battle.net/wow/recipe/".$a."?locale=en_GB&apikey=";
    $contents = @file_get_contents($json);
    $data = json_decode($contents, true);
    if (isset($data['id'])) {
        $write[] = "(".$data['id'].",'".addslashes($data['name'])."','".addslashes($data['profession'])."','".addslashes($data['icon'])."')";
    } else {
        $write[] = "(".$a.",'a','a','a')";
    }
}
$SQL = 'INSERT INTO `static_recipes` (id, name, profession, icon) VALUES '.implode(',', $write);
$mysqli->query($SQL);
$mysqli->close();
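If you do stay with the flat file, buffer the lines in an array and write them out once after the loop, instead of reopening data.txt on every iteration: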
$write = [];
for ($a = 1; $a <= 2000; $a++) {
    $json = "https://eu.api.battle.net/wow/".$a."?locale=en_GB&<MYPRIVATEAPIKEY>";
    $contents = file_get_contents($json);
    $data = json_decode($contents, true);
    if (isset($data['id'])) {
        $write[] = $data['id'].",'".$data['name']."'\n";
    } else {
        $write[] = $a."\n";
    }
}
$file = fopen("data.txt", "a");
fwrite($file, implode('', $write));
fclose($file);
Also, what makes you think that no IDs are duplicated across the several "https://eu.api.battle.net/wow/[N]" URLs' data?
Also, given your "I wasn't going to try and run ~250000", think about curl_multi_init(): http://php.net/manual/en/function.curl-multi-init.php
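A minimal sketch of that approach, fetching a small batch of recipe URLs in parallel (the batch size of 10 and the empty apikey are placeholders):

// one easy handle per URL, all attached to a multi handle
$mh = curl_multi_init();
$handles = [];
for ($a = 1; $a <= 10; $a++) {
    $ch = curl_init("https://eu.api.battle.net/wow/recipe/$a?locale=en_GB&apikey=");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$a] = $ch;
}

// run all transfers until none are still active
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);

// collect the responses and clean up
foreach ($handles as $a => $ch) {
    $data = json_decode(curl_multi_getcontent($ch), true);
    // ... use $data['id'], $data['name'] as in the original loop ...
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);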
I can't really see anything obviously wrong with your code; I can't run it, though, as I don't have the JSON.
It could be possible that there is some kind of race condition, since you're opening and closing the same file hundreds of times very quickly.
File operations might seem atomic, but they aren't necessarily so - here's an interesting SO thread:
Does PHP wait for filesystem operations (like file_put_contents) to complete before moving on?
As some others have suggested, maybe just open the file before you enter the loop, then close it when the loop finishes.
I'd try that first and see if it helps.
There's nothing in your original code that would cause that sort of behaviour; PHP will not arbitrarily change the value of a variable. You are opening this file in append mode - are you certain that you're not looking at old data? Maybe output some debug messages as you process the data. It's also likely you'd run up against some rate limiting on the API server, so putting a pause in there somewhere may improve reliability.
The only substantive change I'd suggest to your code is opening the file once and closing it when you're done.
$file = fopen("data_1_2000.txt", "w");
for ($a = 1; $a <= 2000; $a++) {
$json = "https://eu.api.battle.net/wow/recipe/$a?locale=en_GB&<MYPRIVATEAPIKEY>";
$contents = file_get_contents($json);
$data = json_decode($contents, true);
if (!empty($data['id'])) {
$data["name"] = str_replace("'", "\\'", $data["name"]);
$record = "$data[id],'$data[name]'";
} else {
$record = $a;
}
fwrite($file, "$record\n");
sleep(1);
echo "$a "; if ($a % 50 === 0) echo "\n";
}
fclose($file);
I have a project where I'm scraping a few sites for data, then outputting it all onto one site. To help with load times, I'm trying to rig it so that once every 10 minutes my main website does a full data scrape, then stores it all in a cache folder called "cache" in the root folder. Then, any time I refresh the main site within that 10 minutes, it pulls from the cache, making load times quite fast.
Trouble is, load times haven't changed, which they really should with this method, so I'm doing something wrong and would appreciate any help. I can confirm the data IS being stored in the cache, because I can see the files automatically appearing there. So the issue has to be that the code is broken where it's supposed to grab the data from the cache: after the data is stored every 10 minutes, it isn't being read back.
Part of me wonders if the issue is how the filenames are being saved in the cache; right now they seem to be random values. For example, one is named f32dd7f0b85eb4c1be0bb9a417cc29ea553d898e.html. I'd think it needs to be saved under a specific file name, but I'm not sure how to achieve that. The code at the end of my PHP reference files seems to specify this, so I'm not sure what the issue is. The code that is supposed to be doing this is at the bottom of the post.
I'm really new to PHP, and honestly have only gotten this far thanks to some very nice and helpful people. I'm close, but not quite there yet with this cache framework.
global.php in root folder:
<?php
$_cache_time = 600;      // 10 minutes
$_cache_dir = "./cache"; // cache dir

function deleteBlankInArray($var) {
    return !ctype_space($var) && !empty($var);
}

function cache_start($filename)
{
    global $_cache_dir, $_cache_time;
    $cachefile = $_cache_dir.'/'.sha1($filename).'.html';
    ob_start();
    if (file_exists($cachefile) && (time() - $_cache_time < filemtime($cachefile))) {
        include($cachefile);
        ob_flush();
        return true;
    }
    return false;
}

function cache_end($filename)
{
    global $_cache_dir, $_cache_time;
    $cachefile = $_cache_dir.'/'.sha1($filename).'.html';
    $fp = fopen($cachefile, 'w');
    fwrite($fp, ob_get_contents());
    fclose($fp);
    ob_flush();
}
My main website is an XHTML site. It references these PHP pages like this:
<?php include 's&pcurrent.php';?>
<?php include 'news.php';?>
It references/outputs multiple PHP files, which is why load times are slow when it isn't pulling from the cache.
And lastly, this is an example of one of the PHP files being included. This one is called litecoinchange.php:
<?php
error_reporting(E_ALL ^ E_NOTICE ^ E_WARNING);
include_once "global.php";
// filename of the file
if (!cache_start("litecoinchange.php")) {
    $doc = new DOMDocument;
    // We don't want to bother with white spaces
    $doc->preserveWhiteSpace = false;
    $doc->strictErrorChecking = false;
    $doc->recover = true;
    $doc->loadHTMLFile('https://coinmarketcap.com/');
    $xpath = new DOMXPath($doc);
    $query = "//tr[@id='id-litecoin']";
    $entries = $xpath->query($query);
    foreach ($entries as $entry) {
        $result = trim($entry->textContent);
        $ret_ = explode(' ', $result);
        // make sure no element in the array starts or ends with a blank
        foreach ($ret_ as $key => $val) {
            $ret_[$key] = trim($val);
        }
        // delete the empty elements and those that are blank "\n" "\r" "\t"
        // I modified this line
        $ret_ = array_values(array_filter($ret_, 'deleteBlankInArray'));
        // echo the last element
        echo $ret_[7];
        // filename of the file
        cache_end("litecoinchange");
    }
}
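Reading global.php together with this file, one mismatch stands out: cache_start() is called with "litecoinchange.php" but cache_end() with "litecoinchange", so sha1() produces two different cache file names, and the file that gets written is never the one that gets read back. cache_end() is also called inside the foreach loop rather than once at the end of the page. A minimal sketch of the matched pair (the scraping body is elided):

<?php
include_once "global.php";

// use the same key in both calls so sha1() produces the same cache file name
$key = "litecoinchange.php";

if (!cache_start($key)) {
    // ...generate the page output here (the scraping code from above)...
    echo "output to be cached";

    // write the buffered output once, after all output has been generated
    cache_end($key);
}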
So I have a problem with a script I'm working on. I have a folder full of JSON files named roster0.json, roster1.json, and so on.
$dir = "responses/";
$files = glob($dir . "roster*");
$failed = array();
$failcnt = 0;
if (isset($files)) {
$data = null;
for ($i = 0; $i < count($files); $i++) {
$data = json_decode(utf8_decode(file_get_contents($files[$i])));
if(isset($data)){
// Process stuff
When I var_dump($files) I get an array with over 100 paths "responses/roster0.json".
When I test $data I get a proper array of data.
However, once the loop goes to the next file, it never loads it, and never processes it.
Here's the crazy part: if I change the start of the for loop, e.g. to $i = 20, it will load the 21st file in the directory, parse it, and insert it into the db properly!
Ignoring the failcnt stuff at the bottom, here's the current version of the script in its entirety: http://pastebin.com/yqyKi5Ag
PS - I have full WARNING/ERROR reporting on in PHP and am not getting any error messages... HELP! Thanks!
When I was writing the insert string, the ID was being duplicated and was thus invalid. I switched to an auto-increment column and, tada, it works. Thanks for the assistance.
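For reference, a minimal sketch of that fix (the table and column names here are assumptions, since the schema isn't shown): let MySQL assign the ID instead of building it into the insert string.

// one-time schema change: make the primary key auto-increment
// (the real table and column names aren't shown in the question)
$mysqli->query("ALTER TABLE `rosters` MODIFY `id` INT NOT NULL AUTO_INCREMENT");

// then omit the id column from the INSERT entirely; MySQL generates
// a unique value per row, so duplicate IDs can't occur
$stmt = $mysqli->prepare("INSERT INTO `rosters` (`name`) VALUES (?)");
$name = "example";
$stmt->bind_param("s", $name);
$stmt->execute();
$stmt->close();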