PHP Excel ( PHPExcel_IOFactory ) encoding issue - php

Thanks in advance.
I'm uploading a .csv file through a PHP 5.5 script that converts it from Unicode to UTF-8 and then saves it to a folder.
Then I read this file with PHPExcel_IOFactory and capture some data into an array. So far so good: the file uploads and saves fine, but something may be corrupting the data during the conversion, because the text strings in the array come back padded with blank spaces, like this:
array(20) {
["A"]=>
string(39) "
l : 1 7 0 4 8 0 8 8 8 6 4 9 2 3 5 9 "
["B"]=>
string(51) " 2 0 1 7 - 0 8 - 0 9 T 0 0 : 2 1 : 5 7 + 0 2 : 0 0 "
["C"]=>
string(41) " a g : 2 3 8 4 2 6 1 5 5 8 3 3 9 0 2 5 8 "
["D"]=>
bool(false)
["E"]=>
string(41) " a s: 2 3 8 4 2 6 1 5 5 8 3 3 7 0 2 5 8"
["F"]=>
string(29) " D E S A R R O L L O W E B "
Maybe it's because of the delimiters. Opening the .csv in Sublime reveals two 'white spaces' as delimiters, so if I pass something like this:
$objReader->setDelimiter(' ');
it works and reads the data, but the values are filled with empty spaces. Any tips on how to get clean data from the file?
NOTE: On WAMP it works fine when converting the file like this:
$conversion = iconv(mb_detect_encoding($conversion, mb_detect_order(),
true), "UTF-8", $conversion);
In the production environment that conversion does not work at all (the file is saved empty).

Finally sorted out the problem. For anyone using PHPExcel and having the same character encoding issue: if you really need to save the .xls file in a different character set, try something like this for the conversion:
$objReader->setDelimiter("\t");
$inputFileName = array_shift($inputFileNames);
$conversion = file_get_contents($inputFileName);
$conversion = iconv("WINDOWS-1252", "UTF-8", $conversion);
Thanks!
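For readers hitting the same symptom: byte lengths in the dump that are roughly double the visible text (string(51) for a 25-character timestamp) are consistent with UTF-16 input, where every ASCII character carries an extra NUL byte. The following is only a sketch under the assumption that the file is UTF-16LE, as produced by Excel's "Unicode Text" export. The sample value and variable names are illustrative, and the actual file in the question may have used a different encoding.

```php
<?php
// Assumption: the uploaded file is UTF-16LE. The sample value is hypothetical.
$utf8  = "DESARROLLO WEB";
$utf16 = mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8');

// Every ASCII character is now followed by a NUL byte, so the byte
// length doubles and dumps show "gaps" between the characters.
$utf8Length  = strlen($utf8);
$utf16Length = strlen($utf16);

// Converting with an explicit source encoding restores the clean text.
$restored = mb_convert_encoding($utf16, 'UTF-8', 'UTF-16LE');

echo $utf8Length, ' ', $utf16Length, ' ', $restored, PHP_EOL;
```

If the source encoding is known, passing it explicitly to the conversion call avoids relying on mb_detect_encoding(), whose result depends on the configured detection order.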

Related

PHP - Find repeated strings and group them with counting

I have many strings looking like this:
6 (39), 10 (44), 11 (45), 11½ (45.5), 12 (46)
6 (39), 7 (40.5), 8 (42), 8½ (42.5), 9 (43), 10 (44.5), 11 (46)
6 (39), 7 (40.5), 8 (42), 8½ (42.5), 9 (43), 11 (46)
I got these results with this code:
<?PHP
$rscat = mysql_query("SELECT `Sizes` FROM `products` WHERE `Category`='$Cat'");
while($rowscat = mysql_fetch_array($rscat))
{
$CatSizes = $rowscat['Sizes'];
echo "$CatSizes <br>";
}
?>
What I want: as you can see in the last example the string 6 (39) is repeated exactly three times, the string 7 (40.5) is repeated exactly two times.
So I want a result like this:
6 (39) - (3)
7 (40.5) - (2)
Of course, I do not need that just for 7 (41) and 10 (44); I need to find all strings which repeat, display each of them in just one row, and show next to it how many times it is repeated.
I hope you understand me well.
Thanks in advance!
Advance warning: OP edited their post several times, so there are multiple answers below. I'm leaving them all intact in case others find them helpful.
Original answer
You can use the array_count_values() function for exactly this: it returns a new array with each repeated value as the array key, and the number of times that value appears as the array value. Using your original example, you'd need something like this:
$input = <<<EOT
7 (41)
8 (42)
9 (43)
10 (44)
11 (45)
6 (39)
7 (41)
EOT;
$split = explode("\n", $input);
$counted = array_count_values($split);
foreach($counted as $value => $count) {
echo "$value - ($count)\n";
}
Note: I trimmed the number of strings going into $input for conciseness, but you get the point. Output from that script when run on the full, untrimmed input:
7 (41) - (8)
8 (42) - (6)
9 (43) - (6)
10 (44) - (6)
11 (45) - (7)
8½ (42.5) - (1)
12 (46) - (1)
6 (39) - (3)
You might find the PHP documentation for array_count_values() helpful reading.
Update #1
OP edited their post, rendering my original answer incorrect. Using their edited version, the correct code is this:
$input = "6 (39), 10 (44), 11 (45), 11½ (45.5), 12 (46), 6 (39), 7 (40.5), 8 (42), 8½ (42.5), 9 (43), 10 (44.5), 11 (46), 6 (39), 7 (40.5), 8 (42), 8½ (42.5), 9 (43), 11 (46)";
$split = explode(", ", $input);
$counted = array_count_values($split);
foreach($counted as $value => $count) {
echo "$value - ($count)\n";
}
WARNING: Make sure the items are all separated by commas, not by a mix of new lines and commas. OP: you should choose new lines as in your original version, or choose commas, but don't mix the two if you want this code to work.
Update #2
OP has asked to modify the solution so it works directly with their SQL query. This is tricky because I don't know exactly what data is coming out, but based on their previous edits the answer is likely to look something like this:
$rscat = mysql_query("SELECT `Sizes` FROM `products` WHERE `Category`='$Cat'");
$arrayOfSizes = [];
while($rowscat = mysql_fetch_array($rscat)) {
$arrayOfSizes[] = $rowscat['Sizes'];
}
$counted = array_count_values($arrayOfSizes);
foreach($counted as $value => $count) {
echo "$value - ($count)\n";
}
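If each Sizes value holds a whole comma-separated list, as in the edited question, counting entire rows will not count the individual sizes. Here is a minimal sketch of one way around that, assuming the separator is always a comma. The hard-coded $rows array is a hypothetical stand-in for the values fetched from the database.

```php
<?php
// Hypothetical stand-in for the `Sizes` values returned by the query.
$rows = [
    "6 (39), 10 (44), 11 (45), 11½ (45.5), 12 (46)",
    "6 (39), 7 (40.5), 8 (42), 8½ (42.5), 9 (43), 10 (44.5), 11 (46)",
    "6 (39), 7 (40.5), 8 (42), 8½ (42.5), 9 (43), 11 (46)",
];

$allSizes = [];
foreach ($rows as $row) {
    // Split each row into individual sizes and trim stray whitespace.
    foreach (explode(',', $row) as $size) {
        $allSizes[] = trim($size);
    }
}

$counted = array_count_values($allSizes);

// Keep only the sizes that appear more than once.
$repeated = array_filter($counted, function ($count) {
    return $count > 1;
});

foreach ($repeated as $value => $count) {
    echo "$value - ($count)\n";
}
```

With the three sample rows, this reports 6 (39) three times and 7 (40.5) twice, matching the result the question asks for.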

using PHP explode() of a unicode string to get the rows in an array

I am trying to read a tab delimited spreadsheet with unicode characters like this:
$content = file_get_contents($filename);
When I print this in the browser, all texts are shown correctly. There is also a header:
header('Content-Type: text/html; charset=utf-8');
Now I want to split the content into rows by using:
$rows= explode("\n",$content);
The Unicode characters now come out as gibberish when I, for instance, print one row:
echo $rows[1];
My question is: what is causing this behaviour, and what can I do to get the correct texts into the $rows array? In the end I want to insert the row values into the database, which currently receives the gibberish.
Any help is appreciated.
Example
A row before the explode() looks like this (note: tabs are not displayed below):
R002 Студия 2В 66 Богдан
дорога Санкт-Петербург 3174 45 Андрей Смирнов маркетинг 234-56790 653-23685 dummy#dummy.com 34354547
After the explode a row looks like:
R002 ! B C 4 8 O 2 66 > 3 4 0 = 4 > # > 3 0 ! 0 = : B -¬ 5 B 5
# 1 C # 3 3174 45 = 4 # 5 9 ! < 8 # = > 2 < 0 # : 5 B 8 = 3
234-56790 653-23685 dummy#dummy.com 34354547 59
Edit: substring is not working either
I also noticed another strange behaviour. When I do
echo mb_substr($content,0,50,'utf-8');
the output is only 25 characters, but the characters are displayed correctly:
R002 Студия 2В 66 Богдан
However, when I change the offset from 0 to, for instance, 5, it's a mess again.
echo mb_substr($content,5,50,'utf-8');
the output is
02 ! B C 4 8 O 2 66 > 3 4 0 = 4 >
I'm not sure what's going on here. Could it be because the file contains a UTF-8 BOM ("\xEF\xBB\xBF")?
I found the solution, which had to do with the file's encoding. It was exported from Excel, which caused the initial difficulties. Anyway, here is my code to resolve the encoding bit:
$data = file_get_contents($filename);
// Only a byte order mark (BOM) at the very start of the data counts
if (strncmp($data, "\xef\xbb\xbf", 3) === 0)
{
//do nothing, it's already utf-8
}
elseif (strncmp($data, "\xff\xfe", 2) === 0)
{
$data = iconv('UTF-16LE', 'UTF-8', substr($data, 2)); //LE UTF-16, BOM stripped
}
elseif (strncmp($data, "\xfe\xff", 2) === 0)
{
$data = iconv('UTF-16BE', 'UTF-8', substr($data, 2)); //BE UTF-16, BOM stripped
}
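To illustrate why explode() misbehaves on the raw data and works after conversion, here is a small sketch. It assumes the file is BOM-marked UTF-16LE, as Excel's Unicode text export typically is. The toUtf8() helper name and the sample rows are hypothetical.

```php
<?php
// Hypothetical helper: convert BOM-marked text to UTF-8 without a BOM.
// Assumption: the data starts with a BOM; anything else is returned as-is.
function toUtf8($data)
{
    if (strncmp($data, "\xEF\xBB\xBF", 3) === 0) {
        return substr($data, 3);                              // UTF-8 BOM
    }
    if (strncmp($data, "\xFF\xFE", 2) === 0) {
        return iconv('UTF-16LE', 'UTF-8', substr($data, 2));  // UTF-16 LE
    }
    if (strncmp($data, "\xFE\xFF", 2) === 0) {
        return iconv('UTF-16BE', 'UTF-8', substr($data, 2));  // UTF-16 BE
    }
    return $data;
}

// Simulated export: BOM, tab-separated cells, two rows.
$sample = "\xFF\xFE" . iconv('UTF-8', 'UTF-16LE', "R001\tСтудия\nR002\tБогдан");

// Splitting the raw UTF-16LE bytes on "\n" cuts the two-byte newline in
// half, so the second row starts with a stray NUL byte.
$rawRows = explode("\n", $sample);
$startsWithNul = ($rawRows[1][0] === "\0");

// Converting first gives clean UTF-8 rows that split as expected.
$rows  = explode("\n", toUtf8($sample));
$cells = explode("\t", $rows[1]);

echo $cells[1], PHP_EOL;
```

The stray NUL byte shifts every following two-byte character by one byte, which would explain why the first row displays correctly while later rows turn into gibberish.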

Undesired new lines added to text document when writing to it

I was just trying to do a simple sorting algorithm on a matrix that I read from a matrix.txt file and append the sorted matrix back to the file.
The problem is that undesired new lines are written to the text file. I also tried in parallel to echo the same things I am writing in the text file, but the echo prints everything okay.
// .. reading the file and sorting the matrix ..
// Write the sorted matrix back to the text file
$handle = @fopen("matrix.txt", "a");
if ($handle) {
fwrite($handle, PHP_EOL . PHP_EOL . "Sorted matrix:" . PHP_EOL);
for ($i = 0; $i < $n; $i++) {
for ($j = 0; $j < $m; $j++) {
echo $matrix[$i][$j] . " ";
fwrite($handle, $matrix[$i][$j] . " ");
}
fwrite($handle, PHP_EOL);
echo "<br>";
}
fclose($handle);
}
matrix.txt file contents:
1 2 5 2 5 8 12 323 1 4
8 32 2 1 3 82 2 8 4 2
1 2 5 2 5 8 12 323 1 4
8 32 2 1 3 82 2 8 4 2
In the web browser it echoes the matrix nicely sorted, each row by itself; however, in the text file, the following is appended:
Matrix sorted using selection sort:
1 1 2 2 4
5 5 8 12 323
1 2 2 2
3 4 8 8 32 82
1 1 2 2 4
5 5 8 12 323
1 2 2 2 3 4 8 8 32 82
Any clues what could cause this? Thanks in advance !
The problem isn't in the code you posted; it's in the input matrix you provided. Notice that every extra newline corresponds to the item which used to be at the end of the row, except for the last row. That's because the final newline from each row is being included when you read the line, and explode (which I imagine you're using) doesn't know to remove it. You could simply trim the lines before exploding to fix this, or specifically remove \r and \n characters.
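A minimal sketch of the trimming approach, assuming the rows are space-separated and read one line at a time. The $lines array is a hypothetical stand-in for what fgets() would return from matrix.txt.

```php
<?php
// Hypothetical input: two lines as fgets() would return them,
// each still carrying its line ending.
$lines = ["1 2 5 2 5\r\n", "8 32 2 1 3\n"];

$matrix = [];
foreach ($lines as $line) {
    // rtrim() drops the trailing "\r" and "\n" before splitting,
    // so the last number of each row stays clean.
    $matrix[] = explode(' ', rtrim($line, "\r\n"));
}

// Without the rtrim() call this would be "5\r\n" (3 bytes) instead of "5".
$lastOfFirstRow = $matrix[0][4];
echo strlen($lastOfFirstRow), PHP_EOL;
```

Because the newline would otherwise travel with the last element of each row, it ends up wherever that element lands after sorting, which matches the misplaced line breaks in the appended output.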

What is the equivalent of var_dump() in R?

I'm looking for a function to dump variables and objects, with human-readable explanations of their data types. For instance, in PHP, var_dump() does this:
$foo = array();
$foo[] = 1;
$foo['moo'] = 2;
var_dump($foo);
Yields:
array(2) {
[0]=>
int(1)
["moo"]=>
int(2)
}
A few examples:
foo <- data.frame(1:12,12:1)
foo ## What's inside?
dput(foo) ## Details on the structure, names, and class
str(foo) ## Gives you a quick look at the variable structure
Output on screen:
> foo <- data.frame(1:12,12:1)
> foo
X1.12 X12.1
1 1 12
2 2 11
3 3 10
4 4 9
5 5 8
6 6 7
7 7 6
8 8 5
9 9 4
10 10 3
11 11 2
12 12 1
> dput(foo)
structure(list(X1.12 = 1:12, X12.1 = c(12L, 11L, 10L, 9L, 8L,
7L, 6L, 5L, 4L, 3L, 2L, 1L)), .Names = c("X1.12", "X12.1"), row.names = c(NA,
-12L), class = "data.frame")
> str(foo)
'data.frame': 12 obs. of 2 variables:
$ X1.12: int 1 2 3 4 5 6 7 8 9 10 ...
$ X12.1: int 12 11 10 9 8 7 6 5 4 3 ...
Check out the dump command:
> x <- c(8,6,7,5,3,0,9)
> dump("x", "")
x <-
c(8, 6, 7, 5, 3, 0, 9)
I think you want 'str', which tells you the structure of an R object.
Try deparse, for example:
> deparse(1:3)
[1] "1:3"
> deparse(c(5,6))
[1] "c(5, 6)"
> deparse(data.frame(name=c('jack', 'mike')))
[1] "structure(list(name = structure(1:2, .Label = c(\"jack\", \"mike\""
[2] "), class = \"factor\")), .Names = \"name\", row.names = c(NA, -2L"
[3] "), class = \"data.frame\")"
It's better than dump, because dump requires a variable name, and it creates a dump file.
If you don't want to print it directly, but for example put it inside a string with sprintf(fmt, ...) or a variable to use later, then it's better than dput, because dput prints directly.
print is probably the easiest function to use out of the box; most classes provide a customised print. They might not specifically name the type, but will often provide a distinctive form.
Otherwise, you might be able to write custom code that uses functions such as class() and typeof() to retrieve the information you want.

How to get opcodes of PHP?

<?php
$show_value = 123;
echo 'sing_quote'.$show_value;
echo "double_quote{$show_value}";
?>
Its opcode is:
1: <?php
2: $show_value = 123;
0 ASSIGN !0, 123
3: echo 'sing_quote'.$show_value;
1 CONCAT 'sing_quote', !0 =>RES[~1]
2 ECHO ~1
4: echo "double_quote{$show_value}";
3 ADD_STRING 'double_quote' =>RES[~2]
4 ADD_VAR ~2, !0 =>RES[~2]
5 ECHO ~2
6 RETURN 1
Check out the Vulcan Logic Disassembler PECL extension - see author's home page for more info.
The Vulcan Logic Disassembler hooks
into the Zend Engine and dumps all the
opcodes (execution units) of a script.
It was written as a beginning of an
encoder, but I never got the time for
that. It can be used to see what is
going on in the Zend Engine.
Once installed, you can use it like this:
php -d vld.active=1 -d vld.execute=0 -f yourscript.php
See also this interesting blog post on opcode extraction, and the PHP manual page listing the available opcodes.
Parsekit has parsekit_compile_string().
sudo pecl install parsekit
var_dump(parsekit_compile_string(<<<PHP
\$show_value = 123;
echo 'sing_quote'.\$show_value;
echo "double_quote{\$show_value}";
PHP
));
The output is quite verbose, so you'd need to process it to get assembler-like format.
["opcodes"]=>
array(10) {
[0]=>
array(9) {
["address"]=>
int(44682716)
["opcode"]=>
int(101)
["opcode_name"]=>
string(13) "ZEND_EXT_STMT"
["flags"]=>
int(4294967295)
["result"]=>
array(8) {
["type"]=>
int(8)
["type_name"]=>
string(9) "IS_UNUSED"
["var"]=>
int(0)
["opline_num"]=>
string(1) "0"
["op_array"]=>
string(1) "0"
["jmp_addr"]=>
string(1) "0"
["jmp_offset"]=>
string(8) "35419039"
["EA.type"]=>
int(0)
}
["op1"]=>
array(8) {
["type"]=>
int(8)
["type_name"]=>
string(9) "IS_UNUSED"
["var"]=>
int(0)
["opline_num"]=>
string(1) "0"
["op_array"]=>
string(1) "0"
["jmp_addr"]=>
string(1) "0"
["jmp_offset"]=>
string(8) "35419039"
["EA.type"]=>
int(0)
}
You can run code and also see the opcodes if you use https://3v4l.org/
Note: It automatically shows the Vulcan Logic Disassembler (VLD) output, but only if you have "all supported versions" selected in the version dropdown.
Here's a simple example (shown below for posterity): https://3v4l.org/Gt8fd/vld
Code:
<?php
$arr = [1, 2, 3, 4];
print_r(array_map(fn(int $i): int => $i * $i, $arr));
Result:
Finding entry points
Branch analysis from position: 0
1 jumps found. (Code = 62) Position 1 = -2
filename: /in/Gt8fd
function name: (null)
number of ops: 10
compiled vars: !0 = $arr
line #* E I O op fetch ext return operands
-------------------------------------------------------------------------------------
2 0 E > ASSIGN !0, <array>
3 1 INIT_FCALL 'print_r'
2 INIT_FCALL 'array_map'
3 DECLARE_LAMBDA_FUNCTION '%00%7Bclosure%7D%2Fin%2FGt8fd%3A3%240'
4 SEND_VAL ~2
5 SEND_VAR !0
6 DO_ICALL $3
7 SEND_VAR $3
8 DO_ICALL
9 > RETURN 1
Function %00%7Bclosure%7D%2Fin%2FGt8fd%3A3%240:
Finding entry points
Branch analysis from position: 0
1 jumps found. (Code = 62) Position 1 = -2
filename: /in/Gt8fd
function name: {closure}
number of ops: 6
compiled vars: !0 = $i
line #* E I O op fetch ext return operands
-------------------------------------------------------------------------------------
0 E > RECV !0
1 MUL ~1 !0, !0
2 VERIFY_RETURN_TYPE ~1
3 > RETURN ~1
4* VERIFY_RETURN_TYPE
5* > RETURN null
End of function %00%7Bclosure%7D%2Fin%2FGt8fd%3A3%240
Generated using Vulcan Logic Dumper, using php 8.0.0
Two other options are setting the opcache.opt_debug_level INI setting, or using the phpdbg binary provided in a debug-enabled PHP environment (which may require you to either compile PHP from source or install the related package on Linux).
For more information and a full guide, refer to this php.watch article (credit also goes to that article).
