I'm no php expert... so forgive me if I'm overlooking something...
I've got a csv file with two columns, looks like this:
john, VSgv4lEpuGS0vHIKapJZV7o...
jane, yHKy6NW6YJZzloFhLDQUJIN...
the first with simple names, the second with a 1000+ character string. I'm trying to echo these out into a page where the simple name becomes a hyperlink with the long character string as its source. I've got the following:
<?php
if (($list = fopen("list.csv", "r")) !== FALSE) {
while (($data = fgetcsv($list, 1000, ",")) !== FALSE) {
echo "<div><a href='localhost:8888/folder/".$data[1]."'>" . $data[0] . "</a></div>";
}
fclose($list);
}
?>
What happens instead is that the name shows up as a link, but the href contains everything except the last 30+ characters of the long string. The strangest part is that those remaining 30+ characters show up below that div, within their own div and their own link (i.e. imitating the structure I set up for the linked name).
...any ideas why this might be happening && how to fix it ???? ...I'm pretty stumped.
My first guess would be that somewhere in your very long character string you have a '<' or '>' character, which breaks the HTML. You could try wrapping $data[1] in the function htmlspecialchars(), which escapes these characters.
Documentation:
http://php.net/manual/en/function.htmlspecialchars.php
Example:
echo "<div><a href='localhost:8888/folder/".htmlspecialchars( $data[1] )."'>" . htmlspecialchars( $data[0] ) . "</a></div>";
The line here:
while (($data = fgetcsv($list, 1000, ",")) !== FALSE) {
Reads at most a thousand characters per call and then processes them. If a line is longer than that, it'll process what it can and then carry on with the rest of the line on the next iteration of the while loop.
You might be better off using fgets to read everything into a variable and processing it, or possibly just increasing the number to more than 1000 as a short-term fix.
Citing the documentation concerning the second parameter (length) of fgetcsv():
Must be greater than the longest line (in characters) to be found in the CSV file (allowing for trailing line-end characters).
So check how long your longest line is and adjust the parameter accordingly. If the second entry alone is longer than a thousand characters, the parameter has to be even larger.
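A minimal sketch of both suggestions combined, assuming the same two-column list.csv as in the question. Passing 0 as the length tells fgetcsv() not to limit the line length, and htmlspecialchars() with ENT_QUOTES also escapes the single quote used around the href. The helper names read_csv_rows() and render_link() are made up for this example.

```php
<?php
// Read every row of a CSV file without a line-length limit (length = 0).
function read_csv_rows(string $path): array
{
    $rows = [];
    if (!is_readable($path)) {
        return $rows;
    }
    $handle = fopen($path, 'r');
    while (($data = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($data === [null]) {
            continue; // fgetcsv() reports a blank line as [null]
        }
        $rows[] = $data;
    }
    fclose($handle);
    return $rows;
}

// Build the link markup from the question, escaping both values.
function render_link(array $row): string
{
    $name = htmlspecialchars(trim($row[0]), ENT_QUOTES);
    $target = htmlspecialchars(trim($row[1]), ENT_QUOTES);
    return "<div><a href='localhost:8888/folder/" . $target . "'>" . $name . "</a></div>";
}

foreach (read_csv_rows('list.csv') as $row) {
    if (count($row) < 2) {
        continue; // skip rows that do not have both columns
    }
    echo render_link($row), "\n";
}
```

The trim() calls are there because the sample rows have a space after the comma, which fgetcsv() keeps as part of the second field.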
Related
Some CSV files that we import to our server cannot be parsed correctly.
We are reading the CSV file with PHP's fgetcsv():
while (($line = fgetcsv($file)) !== false) { ... }
However, when the CSV line is wrapped in quotes (and contains two double quotes inside), for example:
"first entry,"""","""",Data Chunk,2022-05-30"
The fgetcsv() function cannot handle the line correctly and sees the first entry,"""","""",Data Chunk,2022-05-30 as one entry.
How can we make sure the function regards first entry as a separate entry, and also interprets the other parts """" as empty entries?
On more research I found:
Fields containing double quotes ("), Line Break (CRLF) and Comma must be enclosed with double quotes.
If Fields enclosed by double quotes (") contain double quotes character then the double quotes inside the field must be preceded with another double quote as an escape sequence. Source
This is likely the issue that we face here.
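To see the quoting rules above in action, here is a small round-trip sketch using an in-memory stream. The sample record is invented for illustration; fputcsv() wraps fields that need it in double quotes and doubles any embedded double quote, and fgetcsv() reverses that.

```php
<?php
// Write one record with fputcsv(), then read it back with fgetcsv().
$stream = fopen('php://memory', 'w+');
$record = ['say "hi"', 'a,b', ''];
fputcsv($stream, $record, ',', '"', '\\');

rewind($stream);
$rawLine = stream_get_contents($stream); // the line as it would appear in the file

rewind($stream);
$parsed = fgetcsv($stream, 0, ',', '"', '\\');
fclose($stream);

echo $rawLine;                  // "say ""hi""","a,b",
var_dump($parsed === $record);  // bool(true)
```

A whole line wrapped in one extra pair of quotes, as in the example above, is what you get when an already-quoted line is quoted a second time, which is why fgetcsv() treats it as a single field.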
A more complete data example of the CSV:
Allgemeines
Subject,Body,Attachment,Author,Created At,Updated At
"Hello everyone, this is a sample. Kind regards,"""","""",Author name (X),2022-05-30 14:54:32 UTC,2022-05-30 14:54:37 UTC"
","""",https://padlet-uploads.storage.googleapis.com/456456456/testfile.docx,Author name (X),2022-05-15 13:53:04 UTC,2022-05-15 13:54:40 UTC"
",""Hello everyone!"
This is some fun text.
More to come.
Another sentence.
And more text.
Even more text
See you soon.
","",Author name (X),2021-07-22 09:41:06 UTC,2021-07-23 16:12:42 UTC
""
Important Things to Know in 2022
Subject,Body,Attachment,Author,Created At,Updated At
"","
01.01.2022 First day of new year
02.02.2202 Second day of new year
Please plan ahead.
","",Author name (X),2021-07-22 09:58:19 UTC,2022-03-24 14:16:50 UTC
""
Note: Line starts with double quote and ends with double quote and carriage return and new line feed.
Turns out the CSV data was corrupted.
The user messed around with the CSV in Excel and, as stated in the comments, likely overwrote the original CSV, causing the double escaping.
For anyone facing the same issue:
Do not waste your time in trying to recover corrupted CSV files with a custom parser.
Ask your user to give you access to the original CSV export site and generate the CSV yourself.
Check the CSV integrity. See code below.
$file = fopen($csvfile, 'r');
// validate if all the records have same number of fields, empty lines (count 1), full entry (count 6) - depends on your CSV structure
$length_array = array();
while (($data = fgetcsv($file, 1000, ",")) !== false)
{
// count number of entries
$length_array[] = count($data);
}
$length_array = array_unique($length_array);
// free memory by closing file
fclose($file);
// depending on your CSV structure, count($length_array) should be 1 or 2
if (count($length_array) > 2)
{
// count mismatch
return 'Invalid CSV!';
}
Basically, I want to take a long text file (source code), find a specific keyword in that file, and then print out the next 400 characters that come after that keyword. I don't want every thing after the keyword because that ends up being 20,000+ characters.
If I could, I'd like to delimit them right there (which is what I tried to do originally but failed) It's becoming very confusing very quickly. If I can just get the 400 characters, then I can save that to a text file, and then delimit that 400 character text file.
My code now is:
<?php
$website = $_GET["website"]; //I'm pulling the website from a form
$contents = file_get_contents($website);
$del = 'keyword';
$search = preg_quote($del, '/');
$search = "/^.*$search.*\$/m";
if(preg_match_all($search, $contents, $found)){
echo implode("\n", $found[0]);
}else{}
?>
The problem is the above prints out EVERYthing after the keyword, and I can't even take what I get and delimit it further. I'm at the point where the more I come up with ideas the further I'm getting from the solution.
Any help is greatly appreciated.
You can use substr($your_string, 0, 400) to get only the first 400 characters of the string.
The syntax for this function is substr(string, start, length).
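A quick illustration of how the start and length arguments behave, using a made-up sample string. Note that asking for more characters than remain simply returns what is left.

```php
<?php
$text = 'The quick brown fox';

echo substr($text, 0, 9), "\n";    // The quick
echo substr($text, 4, 5), "\n";    // quick
echo substr($text, 16, 400), "\n"; // fox (only 3 characters remain)
```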
You can do this with a combination of strpos, strlen and substr. You don't need any regex for this, and for a plain substring search the string functions are simpler and usually faster. Save regex for cases where you actually need pattern matching.
<?php
$website = $_GET["website"]; //I'm pulling the website from a form
$contents = file_get_contents($website);
$del = 'keyword';
//get the character index of the first occurrence of the keyword
//(strpos returns false when the keyword is not found)
$pos = strpos($contents, $del);
if ($pos !== false) {
    //start collecting right after the end of the keyword
    $index = $pos + strlen($del);
    //get the text you want
    $text = substr($contents, $index, 400);
}
Suppose I have textarea filled with following text
employee/company/salary
john/microsoft/12.000
michael/citrusdata/15.000
How can I align each column vertically so I get following text:
employee__________company__________salary
john______________microsoft__________12.000
michael___________citrusdata__________15.000
In this example I used underscores to represent whitespace. I thought about writing a simple function like nl2br() to replace '/' with one or more tab characters, but that won't be a consistent solution. I guess I need to read the text line by line and, considering the length of every word, replace '/' with enough whitespace, but I don't have any idea how to code it. Is there any other way?
I suppose you will output the textarea content outside the textarea itself, else you will need to use js alternative. My answer uses php :)
So, you may use the sprintf function that allows left or right padding.
Just split your content to get an array of lines
$lines = explode("\n", $content);
Take care of a possible empty last entry (if your content ends with a \n).
Then
foreach($lines as $line) {
$items = explode("/", $line) ;
echo sprintf("%-15s%-15s%-15s", $items[0], $items[1], $items[2]) . "<br/>";
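}
"%-15s" left-justifies the value and pads it on the right with spaces to a width of 15 characters.
It works on the console; in a web page you have to nl2br it before echoing, and consecutive spaces collapse unless the output sits inside a <pre> element.
This is a sample, so you have to add error checking (lines with only one / for example).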
You should specify the width of each column, e.g. 50 characters each or any desired width. Let's say $COLUMN_WIDTH = 100;
Find the length of the column value (string), then subtract it from the fixed width:
$COUNT_SPACES_TO_INSERT = $COLUMN_WIDTH - strlen($COLUMN_STR);
Then insert $COUNT_SPACES_TO_INSERT spaces after the value; that will solve your issue.
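If you don't want to hard-code the widths, one possible approach is to measure the longest value in each column first and pad to that. This is only a sketch: the function name align_columns() and the 2-space gap are my own choices, and strlen() counts bytes, so multibyte text would need mb_strlen() and manual padding instead.

```php
<?php
// Align separator-delimited text into columns sized to the longest value.
function align_columns(string $text, string $separator = '/', int $gap = 2): string
{
    // Split into rows of trimmed cells, skipping blank lines.
    $rows = [];
    foreach (preg_split('/\R/', trim($text)) as $line) {
        if ($line === '') {
            continue;
        }
        $rows[] = array_map('trim', explode($separator, $line));
    }

    // Find the widest value in each column.
    $widths = [];
    foreach ($rows as $row) {
        foreach ($row as $i => $cell) {
            $widths[$i] = max($widths[$i] ?? 0, strlen($cell));
        }
    }

    // Pad every cell to its column width plus the gap.
    $out = [];
    foreach ($rows as $row) {
        $line = '';
        foreach ($row as $i => $cell) {
            $line .= str_pad($cell, $widths[$i] + $gap);
        }
        $out[] = rtrim($line);
    }
    return implode("\n", $out);
}

$content = "employee/company/salary\njohn/microsoft/12.000\nmichael/citrusdata/15.000\n";
// <pre> keeps the spaces from collapsing in the browser.
echo "<pre>" . htmlspecialchars(align_columns($content)) . "</pre>\n";
```

Rows with fewer separators than the others are still handled; they just end up with fewer columns.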
I am grabbing input from a file with the following code
$jap= str_replace("\n","",addslashes(strtolower(trim(fgets($fh), " \t\n\r"))));
i had also previously tried these while troubleshooting
$jap= str_replace("\n","",addslashes(strtolower(trim(fgets($fh)))));
$jap= addslashes(strtolower(trim(fgets($fh), " \t\n\r")));
If I echo $jap it looks fine, and later in the code it is inserted into the DB without any other alterations. However, I noticed that a comparison test that checks whether this jap is already in the DB returned false, even though I can plainly see a seemingly identical entry in the DB. When I copy the inserted entry from phpMyAdmin (or from my site where it is displayed) and paste it into Notepad, it pastes like this (this is an exact paste into the quotes below):
"
ใในใซใฎใฃใฆใใใฟใธ่กใใพใใ"
Obviously I need it without that whitespace, line break, or whatever it is.
As far as I can tell, trim is not doing what it says it will do, or I'm missing something here. If so, what is it?
UPDATE:
With regards to Jack's answer: the preg_replace did not help, but here is what I did. I used bin2hex() to determine that the part I don't want is
efbbbf
I did this by running $jap through str_replace to remove the Japanese text I expect to find and passing what is left to bin2hex(). The result was the above "efbbbf".
echo bin2hex(str_replace("ใฉใกใใใใชใใฎๆฌใงใใ","",$jap));
output of the above was efbbbf
but what is it? can i make a str_replace to remove this somehow?
The trim function doesn't know about Unicode white spaces. You could try this:
preg_replace('/^\p{Z}+|\p{Z}+$/u', '', $str);
As taken from: Trim unicode whitespace in PHP 5.2
Otherwise, you can do a bin2hex() to find out what characters are being added at the front.
Update
Your file contains a UTF-8 BOM (byte order mark, the bytes EF BB BF); to remove it:
$f = fopen("file.txt", "r");
$s = fread($f, 3);
if ($s !== "\xef\xbb\xbf") {
// bom not found, rewind file
fseek($f, 0, SEEK_SET);
}
// continue reading here
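If you already have the line in a string (as with the fgets() call in the question), a rough alternative is to strip the BOM from the string itself. The helper name strip_utf8_bom() is made up for this sketch; it only removes the three bytes when they are at the very start.

```php
<?php
// Remove a leading UTF-8 byte order mark (EF BB BF) if present.
function strip_utf8_bom(string $text): string
{
    $bom = "\xEF\xBB\xBF";
    return strncmp($text, $bom, 3) === 0 ? substr($text, 3) : $text;
}

$line = "\xEF\xBB\xBF" . "abc\n";
echo bin2hex(strip_utf8_bom($line)), "\n"; // 6162630a
```

Since the BOM only appears once at the start of the file, it is enough to apply this to the first line you read, before trim() and the other calls.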
Edit: is there an alternative to fgetcsv?
The code below processes CSV files where each entry is encased in quotes and separated by commas, e.g. "Name","Last"... The problem I'm having is that sometimes the CSV files do not have quotes around each entry and just use commas to separate them, e.g. Name,Last. How can I handle both types?
$uploadcsv = "/temp/files/Load15.csv";
$handle = fopen($uploadcsv, 'r');
$column_headers = array();
$row_count = 0;
while (($data = fgetcsv($handle, 100000, ",")) !== FALSE) {
if ($row_count==0){
$column_headers = $data;
} else {
print_r($data);
}
++$row_count;
}
this csv works:
"Name","Last"
"Mike","Aidens"
"Mike1","Aidens1"
this csv does not work:
Name,Last
Mike,Aidens
Mike1,Aidens1
Edit: Strange error... I pasted a small snippet from the CSV file with no quotation marks into a new test.csv file and it worked. Odd. Then I tried a larger piece, then the entire CSV content, and it worked too. Both files are exactly the same size (17,151 KB), yet the original CSV file will not process. There are no trailing spaces or blank lines at the end.
Set the 4th parameter, the enclosure (which defaults to "), to an empty string.
fgetcsv($handle, 100000, ",", '');
Use this line of code before the fgetcsv function call:
ini_set('auto_detect_line_endings',TRUE);
As far as I am aware fgetcsv should work fine with or without quotes around the data.
Unless the CSV file is malformed, this will "just work".
In other words, you don't need to worry about whether or not every field has quotes around it; fgetcsv will take care of this for you.
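A quick way to convince yourself, as a sketch: parse the two sample files from the question out of an in-memory stream and compare the results. The helper parse_csv_string() is only for this demonstration.

```php
<?php
// Parse CSV text with fgetcsv() by way of an in-memory stream.
function parse_csv_string(string $csv): array
{
    $stream = fopen('php://memory', 'w+');
    fwrite($stream, $csv);
    rewind($stream);

    $rows = [];
    while (($data = fgetcsv($stream, 0, ',', '"', '\\')) !== false) {
        if ($data === [null]) {
            continue; // blank line
        }
        $rows[] = $data;
    }
    fclose($stream);
    return $rows;
}

$quoted = "\"Name\",\"Last\"\n\"Mike\",\"Aidens\"\n\"Mike1\",\"Aidens1\"\n";
$plain  = "Name,Last\nMike,Aidens\nMike1,Aidens1\n";

var_dump(parse_csv_string($quoted) === parse_csv_string($plain)); // bool(true)
```

If the real file still fails while a pasted copy works, the difference is likely in something the copy/paste changed, such as line endings or encoding, rather than in the quotes.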
Had the same problem, it couldn't read Hebrew (utf-8) letters without double quotes. It ran fine on the command line (could read Hebrew without double quotes), but in Apache it read only the header which had double quotes and returned empty strings instead of Hebrew strings in the rest of the lines which did not have double quotes at all.
Checked the locale in Apache and it returned the letter "C", but in the command line it returned "LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=C;LC_COLLATE=C;LC_MONETARY=C;LC_MESSAGES=C;LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C;LC_IDENTIFICATION=C"
Thus I've added the following line before the fgetcsv command:
setlocale(LC_CTYPE, 'en_US.UTF-8');
And it worked, and read Hebrew letters without double quotes successfully.