Using fgetcsv with SplFileObject::fseek: line read issue

When reading a specific line in a csv file, I tried to use SplFileObject::fseek with fgetcsv.
To read line 2 (for example), I do a fseek(1) and read with fgetcsv, which gives line 2.
When I do a fseek(0) and read with fgetcsv, I have line 0.
So there is an issue reading line 1 this way. (I know I can read 2 lines in a row, but I don't think that is nice.)
I found this issue reported in 2008 with PHP version 5.2.6: SplFileObject: fgetcsv after seek returns wrong line.
I'm using PHP version 5.4.19.
Does anyone have any information on this?
Is this intended?

I know this is a pretty old bug, but it's still open on bugs.php.net.
So here is a snippet I want to share that achieves the same thing (it works in my case, at least):
function readBigCsv($path, $skip=1)
{
    $file = new \SplFileObject($path, 'r');
    $file->setFlags(\SplFileObject::READ_CSV);
    $file->seek($skip);
    while (!$file->eof()) {
        yield $file->current();
        $file->next();
    }
}
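For the original goal of reading one specific line, the same seek() plus current() combination can be used without a generator. This is only a minimal sketch: readCsvLine is a hypothetical helper name, the line index is assumed to be zero-based, and out-of-range indexes are not handled specially.

```php
<?php
// Hypothetical helper: return the parsed CSV row at a zero-based line index.
// Returns null when current() does not yield an array.
function readCsvLine(string $path, int $lineIndex): ?array
{
    $file = new SplFileObject($path, 'r');
    $file->setFlags(SplFileObject::READ_CSV);
    // seek() is line-based (unlike fseek(), which takes a byte offset)
    $file->seek($lineIndex);
    $row = $file->current();
    return is_array($row) ? $row : null;
}
```

Reading via current() instead of fgetcsv() after the seek is the point here, since the reported bug concerns fgetcsv() after seek().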

Related

file_get_contents(): stream does not support seeking / When was PHP behavior about this changed?

When was PHP behavior about this changed?
From which PHP version is it?
Warning: file_get_contents(): stream does not support seeking in
/simple_html_dom.php
Warning: file_get_contents(): Failed to seek to position -1 in the stream in
/simple_html_dom.php
include('parser/simple_html_dom.php');
$url = "https://en.wikipedia.org/wiki/Stack_Overflow";
$html = file_get_html($url);
if ($html !== false) {
    foreach ($html->find('div#mw-content-text') as $item) {
        $item->plaintext;
    }
}
I had the same issue on my page when I moved it from one system to another. I fixed it by editing simple_html_dom.php to remove the offset argument (this didn't cause any further problems for me).
On line 75 of simple_html_dom.php:
$contents = file_get_contents($url, $use_include_path, $context, $offset);
I removed the reference to $offset:
$contents = file_get_contents($url, $use_include_path, $context);
Now my page works fine. Not taking liability for anything else it breaks! :)
Change
function file_get_html(..., $offset = -1,...)
to
function file_get_html(..., $offset = 0,...)
in simple_html_dom.php
You don't need to edit the vendor files. Just change your requests from:
$html = HtmlDomParser::file_get_html( "https://www.google.com/");
to:
$html = HtmlDomParser::file_get_html( "https://www.google.com/", false, null, 0 );
The problem is that the default offset used by Simple HTML DOM is "-1" when you want it to be "0". Luckily it accepts it as a parameter, which means you can change it easily without needing to change the Simple HTML DOM source.
Note: This compatibility issue was fixed in v1.7+
See file_get_contents(): stream does not support seeking PHP
You are working with a remote file. Seeking is only supported for local files.
You probably need to copy the file to your local file system before using file_get_html. It should work fine on localhost.
Others have shared the solution, but no one has shared why. I don't know specifically why this is different between PHP 7.0 & 7.1, but the PHP.net docs for this function say:
Seeking (offset) is not supported with remote files. Attempting to
seek on non-local files may work with small offsets, but this is
unpredictable because it works on the buffered stream.
I can confirm that removing the offset parameter in file_get_contents on line 75 works for me and/or setting the offset to 0 in the file_get_html function on line 70 works too.
I guess that the offset parameter was never meant to be used with non local files since:
The offset where the reading starts on the original stream. Negative
offsets count from the end of the stream.
Hope this helps clear up any confusion. With external sources, it makes sense to start streaming from the beginning.
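If an offset is genuinely needed for a source that cannot seek, one workaround is to read the whole body first and apply the offset in memory. The sketch below is an illustration only: fetchFromOffset is a hypothetical helper, not part of Simple HTML DOM, and it assumes the content fits in memory.

```php
<?php
// Hypothetical helper: read everything, then apply the offset in memory,
// so no seek is ever requested on the underlying stream.
function fetchFromOffset(string $source, int $offset = 0): string
{
    // No offset argument is passed to file_get_contents()
    $contents = file_get_contents($source);
    if ($contents === false) {
        throw new RuntimeException("Could not read $source");
    }
    // Negative offsets count from the end, as substr() does
    return $offset === 0 ? $contents : (string) substr($contents, $offset);
}
```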
First, try changing simple_html_dom.php in one of these ways:
remove the offset parameter from file_get_contents(...) on line 75,
OR set the offset to 0 in the file_get_html function on line 70.
If it still doesn't work (as in my case),
then you are on a newer version of PHP and need to download the latest version of simple_html_dom.php from https://sourceforge.net/projects/simplehtmldom/
After that, it worked for me on every machine and system.
Set $offset = 0
That is working!

CSV file line count not working in PHP

I have a webpage that needs to count the number of lines in a CSV file, but the following code isn't working:
$linecount = count(file("sample.csv"));
var_dump($linecount);
When I run this code, the code returns the number 1, but there are 8 lines in sample.csv. Does anybody know why this is happening and how to fix it?
If the sample.csv file was created on Mac/Linux, you might want to consider turning auto_detect_line_endings on.
From the manual:
auto_detect_line_endings boolean
When turned on, PHP will examine the data read by fgets() and file() to see if it is using
Unix, MS-Dos or Macintosh line-ending conventions.
Another option (if you don't want to use this) is to read the file and split the lines by all new-line options (\r\n|\r|\n):
$linecount = count(preg_split("/\r\n|\r|\n/", file_get_contents("sample.csv")));
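One caveat with the split approach: if the file ends with a line terminator, the split produces one extra empty element. A minimal sketch that accounts for this follows; countLines is a hypothetical helper name, and it assumes the file is small enough to read into memory. Note also that auto_detect_line_endings is deprecated as of PHP 8.1, so the split-based route may be the safer one on current versions.

```php
<?php
// Hypothetical helper: count lines regardless of \r\n, \r or \n endings.
function countLines(string $path): int
{
    $contents = file_get_contents($path);
    if ($contents === false || $contents === '') {
        return 0;
    }
    // Drop a single trailing line terminator so it does not count as an extra line
    $contents = preg_replace('/(\r\n|\r|\n)\z/', '', $contents);
    return count(preg_split('/\r\n|\r|\n/', $contents));
}
```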

fgetcsv returns too many entries

I have the following code:
while (!feof($file)) {
$arrayOfIdToBodyPart = fgetcsv($file,0, "\t");
if (count($arrayOfIdToBodyPart)==2){
the problem is, the contents of the file look like this:
39 ankle
40 tibia
41 Vastus Intermedius
and so on
Sometimes the test in the if will show three entries, with the first being the number, the second being the name, and the third being just... empty.
This causes the if block to fail, and me to be sad. I know I can just make the if block test for >=2, but is there any way I can get it to just recognise the fact that there are two items? I don't like that fgetcsv is finding "mystery" characters at the end of the line.
Is this possibly an error caused by a Unix server reading a Windows-based file? If so, and I'm running an Ubuntu server without dos2unix, where do I get it?
You probably have tabs at the end of a line:
value<tab>value<tab><newline>
If that's the case, dos2unix won't help you. You might have to do something like read each line into a variable, trim() the variable, and then use str_getcsv() to split it.
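A minimal sketch of that trim-then-split idea is below. parseTabLine is a hypothetical helper name; it uses rtrim() rather than trim() so that only trailing tabs and line endings are removed. Be aware that this also drops trailing fields that are intentionally empty.

```php
<?php
// Hypothetical helper: strip trailing tabs and line endings, then split on tabs.
function parseTabLine(string $line): array
{
    $clean = rtrim($line, "\t\r\n");
    // Delimiter, enclosure and escape are passed explicitly
    return str_getcsv($clean, "\t", '"', '\\');
}
```

It would be called on each line returned by fgets() instead of calling fgetcsv() directly.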
Is it possible that you have a tab at the end of those lines? They are invisible and often hard to spot... you might want to double check.
Also, if you are working with CSV files while running Windows locally and Unix on the server, I found that this line:
ini_set('auto_detect_line_endings', true);
saves a lot of headaches.

Merge two large CSV files with PHP

I want to merge two large CSV files with PHP. These files are too big to even put into memory all at once. In pseudocode, I can think of something like this:
for i in file1
file3.write(file1.line(i) + ',' + file2.line(i))
end
But when I'm looping through a file using fgetcsv, it's not really clear how I would grab line n from a certain file without loading the whole thing into memory first.
Any ideas?
Edit: I forgot to mention that each of the two files has the same number of lines and they have a one-to-one relationship. That is, line 62,324 in file1 goes with line 62,324 in file2.
Not sure what operating system you're on, but if you're using Linux, using the paste command is probably a lot easier than trying to do this in PHP.
If this is a viable solution and you don't absolutely need to do it in PHP, you could try the following:
paste -d ',' file1 file2 > combined_file
Take a look at the fgets function. You could read a single line of each file, process them, and write them to your new file, then move on to the next line until you've reached the end of your file.
PHP: fgets
Specifically, look at the example titled Example #1 Reading a file line by line in the PHP manual. It's also important to note the return value of the fgets function.
Returns a string of up to length - 1
bytes read from the file pointed to by
handle. If there is no more data to
read in the file pointer, then FALSE
is returned.
So, if it doesn't return FALSE you know you still have more lines to process.
You can use fgets().
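$file1 = fopen('file1.txt', 'r');
$file2 = fopen('file2.txt', 'r');
$merged = fopen('merged.txt', 'w');
while (
    ($line1 = fgets($file1)) !== false
    && ($line2 = fgets($file2)) !== false
) {
    // fgets() keeps the line ending, so strip it from the first half
    // to keep both halves on one output line
    fwrite($merged, rtrim($line1, "\r\n") . ',' . $line2);
}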
fgets() reads one line from a file. As you can see, this code uses it on both files at the same time, writing the merged lines to a third file. The manual here:
http://php.net/fgets
http://php.net/fopen
http://php.net/fwrite
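If the fields may contain quoted commas or embedded quotes, a variant that parses and re-writes each row with fgetcsv() and fputcsv() may be more robust than concatenating raw lines. This is a sketch under assumptions: mergeCsvRows is a hypothetical helper, both files are assumed to be comma-delimited, and it stops at the end of the shorter file.

```php
<?php
// Hypothetical helper: merge two CSV files row by row, one row in memory at a time.
// Returns the number of merged rows written.
function mergeCsvRows(string $pathA, string $pathB, string $pathOut): int
{
    $a = fopen($pathA, 'r');
    $b = fopen($pathB, 'r');
    $out = fopen($pathOut, 'w');
    $rows = 0;
    while (
        ($rowA = fgetcsv($a, 0, ',', '"', '\\')) !== false
        && ($rowB = fgetcsv($b, 0, ',', '"', '\\')) !== false
    ) {
        // Append the fields of the second file's row to the first file's row
        fputcsv($out, array_merge($rowA, $rowB), ',', '"', '\\');
        $rows++;
    }
    fclose($a);
    fclose($b);
    fclose($out);
    return $rows;
}
```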
Try using fgets() to read one line from each file at a time.
I think the solution is to first build a map of the offset where each line begins (plus some kind of key, if you need one), and then build the new CSV using fread and fwrite. Since we then know where each line begins and ends, we only need to seek and read.
Another way is to put it into MySQL ( if it is possible ) and then back to new CSV
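The offset-map idea can be sketched roughly as follows. Both function names are hypothetical, and the index itself is held in memory (one integer per line), which is assumed to be acceptable even when the file contents are not.

```php
<?php
// Hypothetical helper: record the byte offset at which each line starts.
function buildLineIndex(string $path): array
{
    $handle = fopen($path, 'r');
    $offsets = [];
    while (true) {
        $pos = ftell($handle);
        if (fgets($handle) === false) {
            break; // no more lines
        }
        $offsets[] = $pos;
    }
    fclose($handle);
    return $offsets;
}

// Hypothetical helper: jump straight to line $n (zero-based) using the index.
function readLineAt(string $path, array $offsets, int $n): ?string
{
    if (!isset($offsets[$n])) {
        return null;
    }
    $handle = fopen($path, 'r');
    fseek($handle, $offsets[$n]);
    $line = fgets($handle);
    fclose($handle);
    return $line === false ? null : rtrim($line, "\r\n");
}
```

For two files with a strict one-to-one line relationship, reading both sequentially is simpler; an index like this mainly pays off when lines must be fetched out of order.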

Read in text file line by line php - newline not being detected

I have a php function I wrote that will take a text file and list each line as its own row in a table.
The problem is the classic "works fine on my machine": when I ask somebody else to generate the .txt file I am looking for, it keeps reading in the whole file as 1 line. When I open it in my text editor, it looks just as I would expect, with a new name on each line, but it's the newline character or something throwing it off.
So far I have come to the conclusion it might have something to do with whatever text editor they are using on their Mac system.
Does this make sense? and is there any easy way to just detect this character that the text editor is recognizing as a new line and replace it with a standard one that php will recognize?
UPDATE: Adding the following line solved the issue.
ini_set('auto_detect_line_endings',true);
Function:
function displayTXTList($fileName) {
    if(file_exists($fileName)) {
        $file = fopen($fileName,'r');
        while(!feof($file)) {
            $name = fgets($file);
            echo('<tr><td align="center">'.$name.'</td></tr>');
        }
        fclose($file);
    } else {
        echo('<tr><td align="center">placeholder</td></tr>');
    }
}
This doesn't work for you?
http://us2.php.net/manual/en/filesystem.configuration.php#ini.auto-detect-line-endings
What's wrong with file()?
foreach (file($fileName) as $name) {
    echo('<tr><td align="center">'.$name.'</td></tr>');
}
From the man page of fgets:
Note: If PHP is not properly recognizing the line endings when reading files either on or created by a Macintosh computer, enabling the auto_detect_line_endings run-time configuration option may help resolve the problem.
Also, have you tried the file function? It returns an array; each element in the array corresponds to a line in the file.
Edit: if you don't have access to the php.ini, what web server are you using? In Apache, you can change PHP settings using a .htaccess file. There is also the ini_set function which allows changing settings at runtime.
This is a classic case of the newline problem.
ASCII defines several different "newline" characters. The two specific ones we care about are ASCII 10 (line feed, LF) and 13 (carriage return, CR).
All Unix-based systems, including OS X, Linux, etc. will use LF as a newline. Mac OS Classic used CR just to be different, and Windows uses CR LF (that's right, two characters for a newline - see why no one likes Windows? Just kidding) as a newline.
Hence, text files from someone on a Mac (assuming it's a modern OS) would all have LF as their line ending. If you're trying to read them on Windows, and Windows expects CR LF, it won't find it. Now, it has already been mentioned that PHP has the ability to sort this mess out for you, but if you prefer, here's a memory-hogging solution:
$file = file_get_contents("filename");
$array = preg_split("/\015?\012/", $file); # split() was removed in PHP 7; won't work for Mac Classic
Of course, you can do the same thing with file() (as has already been mentioned).
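To cover the Mac Classic case as well, the line endings can be normalized to LF before splitting. A minimal sketch, with normalizeEol as a hypothetical helper name and the whole text assumed to be in memory already:

```php
<?php
// Hypothetical helper: convert CR LF and lone CR line endings to LF.
function normalizeEol(string $text): string
{
    // Replace CR LF first so the CR of a pair is not converted on its own
    return str_replace(["\r\n", "\r"], "\n", $text);
}
```

After this, explode("\n", ...) gives one element per line whichever system produced the file.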
