I am using PHP to read in a tab-delimited CSV file and a pipe-delimited TXT file. Unfortunately, I cannot get a string comparison to work even though the characters appear to be exactly the same. I used trim to clean up hidden characters, and I even tried type-casting to string.
var_dump shows they are clearly different, but I am not sure how to make them the same.
// read in CSV file
$fh = fopen($mapping_date, 'r');
$mapping_data = fread($fh, filesize($mapping_date));
...
// use str_getcsv to put each line into an array
// get values out that I want to compare
$this_strategy = (string)trim($strategy_name);
$row_strategy = (string)trim($row3[_Strategy_Name]);
if ($this_strategy == $row_strategy) {
    // do something
}
var_dump($this_strategy);
Vardump: string(16) "Low Spend ($0.2)"
var_dump($row_strategy);
Vardump: string(31) "Low Spend ($0.2)"
Can't figure out for the life of me how to make this work.
Looks like you have the database encoded in UCS2 (assuming it's MySQL). http://dev.mysql.com/doc/refman/5.1/en/charset-unicode-ucs2.html
You could possibly use iconv to convert the encoding. There's an example in the comments on the iconv manual page, although it doesn't actually use iconv (http://php.net/manual/en/function.iconv.php#49171). I've not tested it.
Alternatively, change the database field encoding to utf8 (e.g. the utf8_general_ci collation), ASCII, or whatever encoding the file uses.
Edit: Found the actual PHP function you want: mb_convert_encoding. UCS-2 is one of its supported encodings, so enable the mbstring extension in php.ini and you're good to go.
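A minimal sketch of the mb_convert_encoding approach, assuming the value read from the file is UCS-2LE (two bytes per character, little-endian) while the other value is plain ASCII/UTF-8. That assumption would fit the var_dump output: 16 characters at two bytes each is 32 bytes, and trim() strips the trailing NUL byte of the last character, leaving 31. The helper name `toUtf8` is hypothetical, and the file could just as well be big-endian or carry a byte-order mark, so check the actual bytes first.

```php
<?php
// Sketch, assuming the file value is UCS-2LE and $this_strategy is ASCII/UTF-8.
// Requires the mbstring extension.
function toUtf8(string $value, string $fromEncoding = 'UCS-2LE'): string
{
    // Convert first, then trim: trimming first strips the NUL half of the
    // last character and leaves an odd number of bytes.
    return trim(mb_convert_encoding($value, 'UTF-8', $fromEncoding));
}

$this_strategy = 'Low Spend ($0.2)';
// Simulate the value read from the file: the same text encoded as UCS-2LE.
$row_strategy_raw = mb_convert_encoding('Low Spend ($0.2)', 'UCS-2LE', 'UTF-8');
var_dump(strlen($row_strategy_raw));         // int(32)

$row_strategy = toUtf8($row_strategy_raw);
var_dump($this_strategy === $row_strategy);  // bool(true)
```

In the real script, the conversion would be applied to the raw field from the file before any trim() call.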
Related
I am using the League\Csv package in Laravel to read and manipulate a CSV file and save its data into a database, but I am running into issues with some rows only: the ones that contain special characters like "45.6 ºF".
I have searched a lot about this problem and found that we should use "UTF-8" or "utf8mb4" for the database collation and also save the CSV as UTF-8, but that only works for the special characters that are on the keyboard.
I want to support all types of special characters, including ones like "45.6 ºF" that are not on the keyboard.
Currently, when my code reads the CSV column data, the values come back as binary strings, b"column value": a "b" prefix is added, but only for strings that contain special characters.
I have spent a lot of time but could not find any better solution to this problem. So please help me, I shall be very thankful to you.
use League\Csv\Reader;

$reader = Reader::createFromPath(public_path().'/question.csv', 'r');
$reader->setHeaderOffset(0);
$records = $reader->getRecords();
foreach ($records as $offset => $record) {
    $qs = Question::first();
    $qs->question = $record['Question'];
    $qs->save();
}
After reading from the CSV, this is what I get, with the "b" prefix:
array:2 [
"ID" => "1"
"Question" => b"Fahrenheit to Celsius (ºF to ºC) conversion calculator for temperature conversions with additional tables and formulas"
]
but it should be a plain string without the "b" (binary) prefix.
If I copy that string, special characters included, and assign it to a variable in the code, it works fine and saves into the database like this:
$a="Fahrenheit to Celsius (ºF to ºC) conversion calculator for temperature conversions with additional tables and formulas";
$qs = Question::first();
$qs->question = $a;
$qs->save();
After a lot of struggle, I found the solution to this problem.
I just added this line to convert the values with utf8_encode before saving them to the database:
$r = array_map("utf8_encode", $record);
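utf8_encode() converts from ISO-8859-1 (Latin-1) to UTF-8 and is deprecated as of PHP 8.2. A sketch of the same fix without it, assuming the CSV file really is Latin-1 (º stored as the single byte 0xBA); if the file is already UTF-8, this double-encodes it and º turns into Âº. The helper name `latin1RecordToUtf8` is hypothetical.

```php
<?php
// Same effect as array_map("utf8_encode", $record), without the deprecated function.
// Assumption: the source file is ISO-8859-1. Requires the mbstring extension.
function latin1RecordToUtf8(array $record): array
{
    // array_map() with a single array keeps the string keys ('ID', 'Question').
    return array_map(
        fn (string $value): string => mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1'),
        $record
    );
}

// A record as it might come from a Latin-1 file: º is the single byte 0xBA.
$record = ['ID' => '1', 'Question' => "Fahrenheit to Celsius (\xBAF to \xBAC)"];
$r = latin1RecordToUtf8($record);
echo $r['Question'], PHP_EOL; // Fahrenheit to Celsius (ºF to ºC)
```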
Don't just copy and paste text from Google to save in the database, because copied text with special characters often doesn't come through correctly.
Thanks.
I found a solution to this problem; the line below fixed my issue: $r = array_map("utf8_encode", $record); We just need to convert the values with utf8_encode before saving them to the database.
Do not use any conversion routines; it only leads to "two wrongs accidentally making a right".
With MySQL's LOAD DATA INFILE available, do you even need fgetcsv? Simply execute the LOAD DATA statement with a CHARACTER SET clause that matches the encoding of the CSV file. If in doubt, get the hex of º from the file:
hex BA --> character set latin1
hex C2BA --> character set utf8 (or utf8mb4)
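The hex check above can be sketched in PHP. This assumes the file contains at least one º and is either latin1 or UTF-8; it only looks at that one character, so a UTF-8 file without º but with other characters ending in byte 0xBA would be misdetected. The function name and the `question.csv`/`questions` names in the statement are hypothetical.

```php
<?php
// Guess the LOAD DATA character set from how "º" is stored in the raw bytes.
function guessCsvCharset(string $bytes): ?string
{
    if (strpos($bytes, "\xC2\xBA") !== false) {
        return 'utf8mb4'; // º stored as the two bytes C2 BA
    }
    if (strpos($bytes, "\xBA") !== false) {
        return 'latin1';  // º stored as the single byte BA
    }
    return null;          // no º found: this check cannot tell
}

// In practice: $bytes = file_get_contents('question.csv');
$charset = guessCsvCharset("ID,Question\n1,\"45.6 \xC2\xBAF\"\n");
$sql = "LOAD DATA LOCAL INFILE 'question.csv' INTO TABLE questions CHARACTER SET $charset "
     . "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' IGNORE 1 LINES";
echo $sql, PHP_EOL;
```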
The column in the database table can be latin1 or utf8 or utf8mb4. The conversion, if needed, will happen during the LOAD.
The degree sign is one of the few special characters that exists in both charsets, so if you have others, latin1 may not be a viable option. (utf8/utf8mb4 has lots more special characters.)
The current use of b"..." may be making things worse by shoehorning C2BA into a latin1 column, leading to Mojibake: Âº instead of º.
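The Mojibake can be reproduced in a few lines: the UTF-8 bytes for º (C2 BA) are read as two latin1 characters, C2 = Â and BA = º, and re-encoded as UTF-8. The same happens when utf8_encode() is applied to text that is already UTF-8. This sketch uses mb_convert_encoding and requires the mbstring extension.

```php
<?php
$utf8Degree = "\xC2\xBA"; // º encoded as UTF-8

// Misread each byte as a latin1 character, then encode the result to UTF-8.
$mojibake = mb_convert_encoding($utf8Degree, 'UTF-8', 'ISO-8859-1');
echo $mojibake, PHP_EOL;          // Âº
echo bin2hex($mojibake), PHP_EOL; // c382c2ba
```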
I need to split a big DBF file using PHP functions. For example, if I have 1000 records, I have to create 2 files with 500 records each.
I do not have any dbase extension available, nor can I install one, so I have to work with basic PHP functions. Using the basic fread function I'm able to read and parse the file correctly, but when I try to write a new DBF file I run into problems.
As I understand it, a DBF file is structured as a two-line file: the first line contains the file info and header info and is binary; the second line contains the data and is plain text. So I thought I could simply write a new binary file replicating the first line, then add the first half of the records to one file and the remaining records to the other.
This is the code I use to parse the file, and it works nicely:
$fdbf = fopen($_FILES['userfile']['tmp_name'],'r');
$fields = array();
$buf = fread($fdbf,32);
$header=unpack( "VRecordCount/vFirstRecord/vRecordLength", substr($buf,4,8));
$goon = true;
$unpackString='';
while ($goon && !feof($fdbf)) { // read fields:
$buf = fread($fdbf,32);
if (substr($buf,0,1)==chr(13)) {$goon=false;} // end of field list
else {
$field=unpack( "a11fieldname/A1fieldtype/Voffset/Cfieldlen/Cfielddec", substr($buf,0,18));
$unpackString.="A$field[fieldlen]$field[fieldname]/";
array_push($fields, $field);
}
}
fseek($fdbf, 0);
$first_line = fread($fdbf, $header['FirstRecord']+1);
fseek($fdbf, $header['FirstRecord']+1); // move back to the start of the first record (after the field definitions)
$first_line is the variable that contains the header data, but when I try to write it to a new file, something goes wrong and the row isn't written exactly as it was read. This is the code I use for writing:
$handle_log = fopen($new_filename, "wb");
fwrite($handle_log, $first_line, strlen($first_line) );
fwrite($handle_log, $string );
fclose($handle_log);
I've tried adding b to the fopen mode parameter, as suggested, to open the file in binary mode. I've also followed a suggestion to pass the exact length of the string to fwrite to avoid some characters being stripped, but without success: none of the written files are valid DBF files. What can I do to achieve my goal?
"As i have understood, the DBF file is structured in a 2 line file: the first line contains file info, header info and it's in binary. The second line contains the data and it's plain text."
Well, it's a bit more complicated than that.
See here for a full description of the dbf file format.
So it would be best if you could use a library to read and write the dbf files.
If you really need to do this yourself, here are the most important parts:
Dbf is a binary file format, so you have to read and write it as binary. For example, the number of records is stored in a 32-bit integer, which can contain zero bytes.
Be careful with string functions on that binary data. A C-style strlen() scans only up to the first null byte, which that 32-bit integer may contain, and would return the wrong value. PHP's own strlen() and fwrite() are binary-safe, but the header must still be copied byte for byte, never passed through text processing such as trimming or line splitting.
If you split the file (the records), you'll have to adjust the record count in the header.
When splitting the records, keep in mind that each record is preceded by an extra byte: a space (0x20) if the record is not deleted, or an asterisk (0x2A) if it is deleted. For example, with 4 fields of 10 bytes each, each record is 41 bytes long. That value is also stored in the header at bytes 10-11, as a 16-bit number giving the number of bytes in the record (least significant byte first).
The file could end with the end-of-file marker 0x1A, so you'll have to check for that as well.
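The points above can be combined into a sketch that splits the records of a DBF file held in memory. It assumes a simple dBASE III-style layout: bytes 4-7 hold the record count, bytes 8-9 the header length (which already includes the 0x0D terminator after the field descriptors), and bytes 10-11 the record length. Deleted records are kept as-is and the "last update" date is copied unchanged. substr() and pack() work on raw bytes in PHP, so no text processing touches the header. The function name `splitDbf` and the synthetic sample file are hypothetical.

```php
<?php
// Split the records of a dBASE III-style .dbf file (as a binary string) into chunks.
function splitDbf(string $dbf, int $recordsPerFile): array
{
    // Record count (32-bit LE), header length and record length (16-bit LE).
    $info = unpack('VrecordCount/vheaderLength/vrecordLength', substr($dbf, 4, 8));
    $header = substr($dbf, 0, $info['headerLength']); // includes the 0x0D terminator

    $parts = [];
    for ($start = 0; $start < $info['recordCount']; $start += $recordsPerFile) {
        $count = min($recordsPerFile, $info['recordCount'] - $start);
        // Each record is recordLength bytes, including its leading deletion-flag byte.
        $records = substr(
            $dbf,
            $info['headerLength'] + $start * $info['recordLength'],
            $count * $info['recordLength']
        );
        // Copy the header and overwrite bytes 4-7 with this part's record count.
        $newHeader = substr_replace($header, pack('V', $count), 4, 4);
        $parts[] = $newHeader . $records . "\x1A"; // append the end-of-file marker
    }
    return $parts;
}

// Tiny synthetic file: one 5-byte character field "NAME" and 4 records of 6 bytes.
$fileHeader = "\x03\x7B\x01\x01" . pack('V', 4) . pack('v', 65) . pack('v', 6) . str_repeat("\0", 20);
$fieldDescriptor = str_pad('NAME', 11, "\0") . 'C' . str_repeat("\0", 4) . chr(5) . chr(0) . str_repeat("\0", 14);
$dbf = $fileHeader . $fieldDescriptor . "\x0D" . ' AAAAA' . ' BBBBB' . ' CCCCC' . ' DDDDD' . "\x1A";

$parts = splitDbf($dbf, 3);
// With the uploaded file, this might look like:
// $dbf = file_get_contents($_FILES['userfile']['tmp_name']);
// foreach (splitDbf($dbf, 500) as $i => $part) { file_put_contents("part$i.dbf", $part); }
```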
I am creating a site where authenticated users can write messages for the index page.
On the message-creation page I have a textbox where the user enters the title of the message, and a textbox where they write the message.
The message is exported to a .txt file, and I derive the .txt filename from the title, like this:
Title: This is a message (The filename will be: thisisamessage.txt)
The original title text is stored in a database record along with the .txt filename as the path.
For converting the title text I am using a function that looks like this:
function filenameconverter($title){
$filename=str_replace(" ","",$title);
$filename=str_replace("ű","u",$filename);
$filename=str_replace("á","a",$filename);
$filename=str_replace("ú","u",$filename);
$filename=str_replace("ö","o",$filename);
$filename=str_replace("ő","o",$filename);
$filename=str_replace("ó","o",$filename);
$filename=str_replace("é","e",$filename);
$filename=str_replace("ü","u",$filename);
$filename=str_replace("í","i",$filename);
$filename=str_replace("Ű","U",$filename);
$filename=str_replace("Á","A",$filename);
$filename=str_replace("Ú","U",$filename);
$filename=str_replace("Ö","O",$filename);
$filename=str_replace("Ő","O",$filename);
$filename=str_replace("Ó","O",$filename);
$filename=str_replace("É","E",$filename);
$filename=str_replace("Ü","U",$filename);
$filename=str_replace("Í","I",$filename);
return $filename;
}
It works fine most of the time, but sometimes it doesn't do its job.
For example: "Pamutkéztörlő adagoló és higiéniai kéztörlő adagoló".
The resulting filename should be:
pamutkeztorloadagoloeshigieniaikeztorloadagolo.txt, and most of the time it is.
But sometimes, for the same input, I get:
pamutkă©ztă¶rlĺ‘adagolăłă©shigiă©niaikă©ztă¶rlĺ‘adagolăł.txt
I'm Hungarian, so the titles will also be in Hungarian; that's why I have to replace these characters.
I'm using XAMPP with apache and phpmyadmin.
I would rather use a generated unique ID for each file as its filename and save the real name in a separate column.
This way you can prevent someone from overwriting files simply by submitting the same title several times. But if you do want filenames based on the title, you will find several approaches to cleaning filenames here on SO; a very good one that I have used is http://cubiq.org/the-perfect-php-clean-url-generator
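A minimal sketch of the unique-ID approach, using random_bytes() (PHP 7+). The function name is hypothetical, and where the title and filename are saved (e.g. two columns of one database row) is left to your schema.

```php
<?php
// Generate a random filename; the original title is stored separately.
function newMessageFilename(): string
{
    // 16 random bytes -> 32 hex characters, so collisions are practically impossible.
    return bin2hex(random_bytes(16)) . '.txt';
}

$title = 'Pamutkéztörlő adagoló és higiéniai kéztörlő adagoló';
$filename = newMessageFilename();
// Hypothetical next steps: save $title and $filename in the same database row,
// then write the message body to $filename.
echo $filename, PHP_EOL;
```

Because the filename never contains user input, accented characters in the title can no longer produce broken filenames.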
intl
I don't think it is advisable to use str_replace manually for this purpose. You can use the bundled intl extension available as of PHP 5.3.0. Make sure the extension is turned on in your XAMPP settings.
Then, use the transliterator_transliterate() function to transform the string; you can convert it to lowercase in the same step. Credit goes to simonsimcity.
<?php
$input = 'Pamutkéztörlő adagoló és higiéniai kéztörlő adagoló';
$output = transliterator_transliterate('Any-Latin; Latin-ASCII; lower()', $input);
print(str_replace(' ', '', $output)); //pamutkeztorloadagoloeshigieniaikeztorloadagolo
?>
P.S. Unfortunately, the PHP manual page for this function doesn't list the available transliterator IDs, but you can take a look at Artefacto's answer here.
iconv
Using iconv still leaves traces of some diacritics (as apostrophes and quotes), which is probably not what you want:
print(iconv("UTF-8","ASCII//TRANSLIT",$input)); //Pamutk'ezt"orl"o adagol'o 'es higi'eniai k'ezt"orl"o adagol'o
mb_convert_encoding
Meanwhile, converting from the Hungarian ISO encoding to ASCII or UTF-8 produces problems similar to the ones you mentioned:
print(mb_convert_encoding($input, "ASCII", "ISO-8859-16")); //Pamutk??zt??rl?? adagol?? ??s higi??niai k??zt??rl?? adagol??
print(mb_convert_encoding($input, "UTF-8", "ISO-8859-16")); //PamutkéztörlŠadagoló és higiéniai kéztörlŠadagoló
P.S. Similar questions can also be found here and here.
Hey guys, I've seen a lot of options using fread (which requires a file, or writing to memory), but I am trying to invalidate input based on a string that has already been accepted (in an unknown format). I have something like this:
if (FALSE !== str_getcsv($this->_contents, "\n"))
{
foreach (preg_split("/\n/", $this->_contents) AS $line)
{
$data[] = explode(',', $line);
}
print_r($data); die;
$this->_format = 'csv';
$this->_contents = $this->trimContents($data);
return true;
}
This works fine on a real CSV file or a variable containing CSV, but when I pass it garbage to invalidate, such as https://www.gravatar.com/avatar/625a713bbbbdac8bea64bb8c2a9be0a4 (garbage, since it's a PNG), it still believes it's CSV and keeps on chugging along until the program chokes. How can I fix this? I have not seen any CSV validators that are not at least several classes deep; is there a simple three- or four-line way to (in)validate?
is there a simple three or four line to (in)validate?
Nope. CSV is so loosely defined - it has no telltale signs like header bytes, and there isn't even a standard for what character is used for separating columns! - that there technically is no way to tell whether a file is CSV or not - even your PNG could technically be a gigantic one-column CSV with some esoteric field and line separator.
For validation, look at what purpose you are using the CSV files for and what input you are expecting. Are the files going to contain address data, separated into, say, 10 columns? Then look at the first line of the file, and see whether enough columns exist, and whether they contain alphanumeric data. Are you looking for a CSV file full of numbers? Then parse the first line, and look for the kinds of values you need. And so on...
If you have an idea of the kinds of CSVs likely to make it to your system, you could apply some heuristics -- at the risk of not accepting valid CSVs. For instance, you could look at line length, consistency of line length, special characters, etc...
If all you are doing is checking for the presence of commas and newlines, then any sufficiently large, random file will likely have those and thus pass such a CSV test.
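A heuristic sketch along these lines, not a real validator: it assumes your CSVs are UTF-8 text with a known number of columns. Note that str_getcsv() always returns an array, so the `FALSE !== str_getcsv(...)` check in the question can never reject anything; checking the shape of the result is what helps. The function name and parameters are hypothetical, and splitting on newlines ignores quoted fields that contain line breaks.

```php
<?php
// Reject input that is implausible for the CSV files you expect.
function looksLikeCsv(string $contents, int $expectedColumns, int $linesToCheck = 5): bool
{
    // Binary data such as a PNG usually contains NUL bytes; text CSV does not.
    if (strpos($contents, "\0") !== false) {
        return false;
    }
    // preg_match() with the u modifier fails on invalid UTF-8.
    if (!preg_match('//u', $contents)) {
        return false;
    }
    $lines = preg_split('/\r\n|\n|\r/', trim($contents));
    foreach (array_slice($lines, 0, $linesToCheck) as $line) {
        // Empty escape character: plain RFC 4180-style quoting.
        if (count(str_getcsv($line, ',', '"', '')) !== $expectedColumns) {
            return false;
        }
    }
    return true;
}
```

For example, `looksLikeCsv("name,age,city\nAnn,30,Oslo\n", 3)` passes, while the bytes of a PNG are rejected by the NUL-byte check.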
I spent almost a day on this but did not succeed.
What I want to do is this: I have a binary file, "data.dat", and I want to read its contents and output them in text format to, say, "data.txt" using PHP.
I tried PHP's unpack function, but it requires the type to be given as the first argument (maybe I am wrong; I'm new to PHP).
$data = fread($file, 4); // 4 is the byte size of an int on a 32-bit PC.
$content = unpack("C", $data); // C for unsigned char, i for int and so on...
But what if I don't know what type of data is stored at which position in the file I am reading?
This function restricts me because of the type.
I want something similar to this:
$content = unpack("s", $data); // where "s" would denote a string (in the real unpack(), "s" is a signed short)
Thanks.
PHP does not have a "binary" type. Binary data is stored in strings. If you read binary data from a file, it's already stored as a string. You do not need to convert it into a string.
If the binary data already represents text in some standard encoding, you don't need to do anything, as you already have a valid string. If the binary data is in some other format, you need to know how that format is laid out and what to do with it; we can't know that for you.
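If the layout is unknown and you only need a readable text version, one option is a hex dump: since the file contents are already a PHP string, no type information is needed. This sketch prints offset, hex bytes, and printable ASCII per line; the function name `hexDump` is hypothetical, and the file names in the comment are the ones from the question.

```php
<?php
// Render arbitrary bytes as a text hex dump: offset, hex bytes, printable ASCII.
function hexDump(string $bytes, int $width = 16): string
{
    $out = '';
    foreach (str_split($bytes, $width) as $i => $chunk) {
        $hex = implode(' ', str_split(bin2hex($chunk), 2));
        // Replace non-printable bytes with "." for the ASCII column.
        $ascii = preg_replace('/[^\x20-\x7E]/', '.', $chunk);
        $out .= sprintf("%08x  %-" . ($width * 3 - 1) . "s  %s\n", $i * $width, $hex, $ascii);
    }
    return $out;
}

// Usage: file_put_contents('data.txt', hexDump(file_get_contents('data.dat')));
$dump = hexDump("\x89PNG\r\n\x1A\n");
echo $dump;
```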