Obtaining PHP regex matches but unable to do anything with them


I have some PHP code that accepts an uploaded file from an HTML form then reads through it using regex to look for specific lines (in the case below, those with "Track Number" followed by an integer).
The file is an XML file that looks like this normally...
<key>Disc Number</key><integer>2</integer>
<key>Disc Count</key><integer>2</integer>
<key>Track Number</key><integer>1</integer>
But when PHP reads it in it gets rid of the XML tags for some reason, leaving me with just...
Disc Number2
Disc Count2
Track Number1
The file has to be XML, and I don't want to use SimpleXML because that's a whole other headache. The regex matches the integers like I want it to (I can print them out: "0", "1", "2"...), but of course they're returned as strings in $matches, and it seems I'm unable to make use of these strings. I need to check if the integer is between 0 and 9, but I am unable to do this no matter what I try.
Using intval() or (int) to first convert the matches to integers always returns 0 even though the given string contains only integers. And using in_array to compare the integer to an array of 0-9 as strings always returns false as well for some reason. Here's the trouble code...
$myFile = file($myFileTmp, FILE_IGNORE_NEW_LINES);
$numLines = count($myFile) - 1;
$matches = array();
$nums = array('0','1','2','3','4','5','6','7','8','9');
for ($i=0; $i < $numLines; $i++) {
    $line = trim($myFile[$i]);
    $numberMatch = preg_match('/Track Number(.*)/', $line, $matches); // if I try matching integers specifically it doesn't return a match at all, only if I do it like this - it gives me the track number I want but I can't do anything with it
    if ($numberMatch == 1 and ctype_space($matches[1]) == False) {
        $number = trim($matches[1]); // string containing an integer only
        echo(intval($number)); // conversion doesn't work - returns 0 regardless
        if (in_array($number,$nums)===True) { // searching in array doesn't work - returns FALSE regardless
            $number = "0" . $number;
        }
    }
}
I've tried type checking, double quotes, single quotes, trimming whitespace, UTF8 encoding, === operator, regex matching numbers specifically with (\d+) (which doesn't return a match at all)...what else could it possibly be? When I try these things with regular strings it works fine, but the regex is messing everything up here. I'm about to give up on this app entirely, please save me.

Why is SimpleXML not an option? Consider the following code:
$str = "<container><key>Disc Number</key><integer>2</integer>
<key>Disc Count</key><integer>2</integer>
<key>Track Number</key><integer>1</integer></container>";
$xml = simplexml_load_string($str);
foreach ($xml->key as $k) {
    // do something with each <key> element here
}
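To actually pull the track number out with SimpleXML, note that each <key> and its value are sibling elements, so one option is an XPath query for the first <integer> that follows the matching <key>. A minimal sketch, reusing the sample string above (the <container> root is just the made-up wrapper from that sample; a real plist file would have its own root):

```php
<?php
// Sample fragment wrapped in a made-up <container> root element.
$str = "<container><key>Disc Number</key><integer>2</integer>
<key>Disc Count</key><integer>2</integer>
<key>Track Number</key><integer>1</integer></container>";

$xml = simplexml_load_string($str);

// Select the first <integer> sibling that follows <key>Track Number</key>.
$nodes = $xml->xpath('//key[. = "Track Number"]/following-sibling::integer[1]');

if ($nodes) {
    // A SimpleXMLElement can be cast straight to int.
    $trackNumber = (int) $nodes[0];
    var_dump($trackNumber >= 0 && $trackNumber <= 9); // bool(true)
}
```

Because the value comes out of the element text rather than a loose regex capture, there is no whitespace or tag residue to strip before the range check.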

You should read "RegEx match open tags except XHTML self-contained tags". While it doesn't exactly match your use case, it gives good reasons why one should use something other than straight-up regex matching for a case like yours.
Assuming that each file contains only a single Track Number, you can simplify what you're doing a lot. See the following:
test.xml
<key>Disc Number</key><integer>2</integer>
<key>Disc Count</key><integer>2</integer>
<key>Track Number</key><integer>1</integer>
test.php
<?php
$contents = file_get_contents('test.xml');
$result = preg_match_all("/<key>Track Number<\/key><integer>(\d)<\/integer>/", $contents, $matches);
if ($result > 0) {
    print_r($matches);
    $trackNumber = (int) $matches[1][0];
    print gettype($trackNumber) . " - " . $trackNumber;
}
Result
$ php -f test.php
Array
(
[0] => Array
(
[0] => <key>Track Number</key><integer>1</integer>
)
[1] => Array
(
[0] => 1
)
)
integer - 1
As you can see, there is no need to iterate through the file line by line when using preg_match_all(). The matching here is very specific, so you don't have to do extra checks for whitespace or validate that the capture is a number, which you're currently doing against a string value.
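A likely explanation for the original symptom (an assumption, since the question doesn't show the raw bytes): the lines still contain the XML tags, and the browser simply hides them when they are echoed as HTML. With the question's loose pattern, (.*) captures those tags too, so intval() sees a string that starts with "<" and returns 0, and the same string is never equal to any of '0' to '9' for in_array(). This sketch reproduces that with a hard-coded line:

```php
<?php
// A raw line exactly as it appears in the file.
$line = '<key>Track Number</key><integer>1</integer>';

// The question's loose pattern: (.*) also captures the surrounding tags.
preg_match('/Track Number(.*)/', $line, $loose);
$captured = $loose[1];                  // '</key><integer>1</integer>'
echo htmlspecialchars($captured), "\n"; // escaped, so a browser shows the tags
echo intval($captured), "\n";           // 0: the string does not start with a digit

// Matching the tags explicitly leaves only the digits in the capture group.
preg_match('/Track Number<\/key><integer>(\d+)<\/integer>/', $line, $strict);
$track = (int) $strict[1];              // 1
```

Echoing through htmlspecialchars() while debugging is a quick way to see what a regex really captured.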

Related

How to convert object class into string

How can I convert the following object into string:
$ssh->exec('tail -1 /var/log/playlog.csv');
So I can parse the string as the first parameter in strripos():
if($idx = strripos($ssh,','))//Get the last index of ',' substring
{
    $ErrorCode = substr($ssh,$idx + 1,(strlen($ssh) - $idx) - 1); //using the found index, get the error code using substring
    echo " " .$Playlist.ReturnError($ErrorCode); //The ReturnError function just replaces the error code with a custom error
}
As currently when I run my script I get the following error message:
strpos() expects parameter 1 to be string
I've seen similar questions, including this one: "Object of class stdClass could not be converted to string"; however, I still can't seem to come up with a solution.

There are two problems with this line of code:
if($idx = strripos($ssh,','))
$ssh is an instance of some class; you use it above as $ssh->exec(...). You should store the value that exec() returns (probably a string) and call strripos() on that, not on $ssh.
strripos() returns FALSE if it cannot find the substring, or a number (which can be 0) when it finds it. But in a boolean context, 0 is the same as FALSE. This means this code cannot distinguish between the comma (,) being the first character of the string and it not being found at all.
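The 0-versus-FALSE pitfall is easy to reproduce with a hard-coded string (a made-up stand-in for the command output) whose only comma is the first character:

```php
<?php
$output = ',42'; // hypothetical line whose only comma is at offset 0

$idx = strrpos($output, ',');

// Loose check: 0 is falsy, so a comma that was found looks like "not found".
$looseFound = ($idx == false) ? 'not found' : 'found';

// Strict check: only a real FALSE means "not found".
$strictFound = ($idx !== false) ? 'found' : 'not found';

echo $looseFound, ' / ', $strictFound, "\n"; // not found / found
```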
Assuming $ssh->exec() returns the output of the remote command as string, the correct way to write this code is:
$output = $ssh->exec('tail -1 /var/log/playlog.csv');
$idx = strrpos($output, ','); //Get the last index of ',' substring
if ($idx !== FALSE) {
    // The value after the last comma is the error code
    $ErrorCode = substr($output, $idx + 1);
    echo ' ', $Playlist, ReturnError($ErrorCode);
} else {
    // Do something else when it doesn't contain a comma
}
There is no need to use strripos(). It performs a case-insensitive comparison, but you are searching for a character that is not a letter, so case-insensitivity is irrelevant here.
You can use strrpos() instead, it produces the same result and it's a little bit faster than strripos().
An alternative way
An alternative way to get the same outcome is to use explode() to split $output in pieces (separated by comma) and get the last piece (using end() or array_pop()) as the error code:
$output = $ssh->exec('tail -1 /var/log/playlog.csv');
$pieces = explode(',', $output);
if (count($pieces) > 1) {
    $ErrorCode = (int)end($pieces);
    echo ' ', $Playlist, ReturnError($ErrorCode);
} else {
    // Do something else when it doesn't contain a comma
}
This is not necessarily a better way to do it. It is, however, more readable and more idiomatic PHP (the code that uses strrpos() and substr() more closely resembles C code).
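Either version can be wrapped in a small function and tested without an SSH connection. A sketch of the explode() variant, assuming a hypothetical helper name lastField(), made-up sample lines, and that the command output may end with a newline (tail normally terminates its line with one):

```php
<?php
// Hypothetical helper: text after the last comma of one CSV line,
// or null when the line has no comma at all.
function lastField(string $line): ?string
{
    // Drop the trailing line break that tail usually prints.
    $pieces = explode(',', rtrim($line, "\r\n"));
    if (count($pieces) < 2) {
        return null; // no comma: no error code to extract
    }
    return array_pop($pieces);
}

// In the question, the argument would be $ssh->exec('tail -1 /var/log/playlog.csv').
var_dump(lastField("track.mp3,2019-01-01,404\n")); // string(3) "404"
var_dump(lastField(",42"));                          // string(2) "42"
var_dump(lastField("no comma here"));                // NULL
```

Returning null rather than an empty string keeps "no comma" distinguishable from "comma at the very end".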

Unable to make use of PHP regex matches

I have some PHP code that accepts an uploaded file from an HTML form then reads through it using regex to look for specific lines (in the case below, those with "Number" followed by an integer).
The regex matches the integers like I want it to, but of course they're returned as strings in $matches. I need to check if the integer is between 0 and 9, but I am unable to do this no matter what I try.
Using intval() or (int) to first convert the matches to integers always returns 0 even though the given string contains only integers. And using in_array to compare the integer to an array of 0-9 as strings always returns false as well for some reason. Here's the trouble code...
$myFile = file($myFileTmp, FILE_IGNORE_NEW_LINES);
$numLines = count($myFile) - 1;
$matches = array();
$nums = array('0','1','2','3','4','5','6','7','8','9');
for ($i=0; $i < $numLines; $i++) {
    $line = trim($myFile[$i]);
    $numberMatch = preg_match('/Number(.*)/', $line, $matches);
    if ($numberMatch == 1 and ctype_space($matches[1]) == False) { // works up to here
        $number = trim($matches[1]); // string containing an integer only
        echo(intval($number)); // conversion doesn't work - returns 0 regardless
        if (in_array($number,$nums)) { // searching in array doesn't work - returns FALSE regardless
            $number = "0" . $number;
        }
    }
}
I've tried type checking, double quotes, single quotes, trimming whitespace, UTF8 encoding...what else could it possibly be? I'm about to give up on this app entirely, please save me.

Use === for strict comparison. For example:
var_dump(1 == '1');   // bool(true)
var_dump(1 === '1');  // bool(false)
var_dump(1 == true);  // bool(true)
var_dump(1 === true); // bool(false)
Can you show the file?

You write in your question that you're using a regular expression to look for the term "Number" followed by a single digit (0-9).
A regular expression for it would be:
/Number(\d)/
It will contain in the matching group 1 the number (digit) you're looking for.
The pattern you use:
/Number(.*)/
can contain anything (except a line break) in the first matching group. It obviously matches too much, and you then have the problem of filtering out that excess retroactively.
It normally works best to match as precisely as possible up front rather than to fiddle with too much noise afterwards.
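Following that advice, here is a sketch that captures only digits and then prepends "0" to values 0 to 9 (what the question's in_array() branch was trying to do). It assumes hypothetical input lines in the plain-text form shown in the question ("Track Number1" and so on); if the raw lines still contain XML tags, the pattern has to include them. Using \d+ instead of \d is a deliberate change, so that a two-digit number isn't cut to its first digit:

```php
<?php
// Hypothetical input lines in the plain-text form shown in the question.
$lines = ['Disc Number2', 'Disc Count2', 'Track Number1', 'Track Number12'];

$numbers = [];
foreach ($lines as $line) {
    // (\d+) can only capture digits, so no trim()/ctype_space() clean-up is needed.
    if (preg_match('/Number(\d+)$/', $line, $m) === 1) {
        $n = (int) $m[1];
        // Prepend "0" to 0-9; larger numbers are kept unchanged.
        $numbers[] = ($n <= 9) ? '0' . $n : (string) $n;
    }
}

echo implode(', ', $numbers), "\n"; // 02, 01, 12
```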

How to get equal parts of multiple strings/array?

I have the following situation: an xls file contains one column with codes. The codes have a prefix and a unique code, like this:
- VIP-AX757
- VIP-QBHE6
- CODE-IUEF7
- CODE-QDGF3
- VIP-KJQFB
- ...
How can I get the common parts of these strings or an array? Perfect would be if I got an array like this:
- $result['VIP'] = 3;
- $result['CODE'] = 2;
An array with the found prefix and the number of cells with that prefix. But the result is not so important at the moment.
I couldn't find a solution for how to get the common parts of two strings: how do I compare "VIP-AX757" and "VIP-QBHE6" and get a result that says "VIP-" is the same prefix/part in these two strings?
Hope someone has an idea.
thx!

-drum roll- Time for a one-liner!
$result = array_count_values(array_map(function($v) {list($a) = explode("-",$v); return $a;},$input));
(Assumes $input is your array of codes)
If you are using PHP 5.4 or newer (you should be), then:
$result = array_count_values(array_map(function($v) {return explode("-",$v)[0];},$input));
Tested in PHP CLI.
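On PHP 7.4 or newer, an arrow function shortens it a little further. A sketch using the sample codes from the question as $input:

```php
<?php
$input = ['VIP-AX757', 'VIP-QBHE6', 'CODE-IUEF7', 'CODE-QDGF3', 'VIP-KJQFB'];

// Keep everything before the first "-", then count how often each prefix occurs.
$result = array_count_values(array_map(fn($v) => explode('-', $v)[0], $input));

print_r($result);
```

array_count_values() keys the result by prefix in order of first appearance, so this yields VIP => 3 followed by CODE => 2.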

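If the prefix is always followed by a '-', then you can do something like this:
$result = array();
foreach ($codes as $code) {
    $tmp = explode("-", $code);
    // initialise the counter on first sight to avoid an "undefined index" notice
    $result[$tmp[0]] = (isset($result[$tmp[0]]) ? $result[$tmp[0]] : 0) + 1;
}
print_r($result);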

Depends on the variability of the data, but something like:
preg_match_all('/^([^-]+)/m', $string, $matches);
$result = array_count_values($matches[1]);
print_r($result);
If you don't know that there is an - after the prefix but the prefix is always letters then:
preg_match_all('/^([A-Z]+)/im', $string, $matches);
$result = array_count_values($matches[1]);
Otherwise you'll have to define exactly what the prefix can contain if it's not the delimiter.

Since you stated in a comment to Niet that you don't have a reliable delimiter, we can only write a pattern that identifies your targeted substrings based on their location in each line.
I recommend preg_match_all() with no capture group, a start of the line anchor, and a multi-line pattern modifier (m).
I've written a preg_split() alternative, but the pattern is a little "clunkier" because of the way I'm handling the line returns.
Code:
$string = 'VIP-AX757
VIP-QBHE6
CODE-IUEF7
CODE-QDGF3
VIP-KJQFB';
var_export(array_count_values(preg_match_all('~^[A-Z]+~m', $string, $out) ? $out[0] : []));
echo "\n\n";
var_export(array_count_values(preg_split('~[^A-Z][^\r\n]+\R?~', $string, -1, PREG_SPLIT_NO_EMPTY)));
Output:
array (
'VIP' => 3,
'CODE' => 2,
)
array (
'VIP' => 3,
'CODE' => 2,
)

very large php string magically turns into array

I am getting an "Array to string conversion" error in PHP.
I am using the "variable" (which should be a string) as the third parameter to str_replace(). So, in summary (a very simplified version of what's going on):
$str = "very long string";
str_replace("tag", $some_other_array, $str);
$str is throwing the error, and I have been trying to fix it all day. The things I have tried are:
if(is_array($str)) die("its somehow an array");
serialize($str); //inserted this before str_replace call.
I have spent all day on it, and no, it's not something stupid like variables the wrong way around; it is something bizarre. I have even dumped it to a file, and it's a string.
My hypothesis:
The string is too long and php can't deal with it, turns into an array.
The $str value in this case is nested and called recursively, the general flow could be explained like this:
//pass by reference
function the_function ($something, &$OFFENDING_VAR, $something_else) {
    while(preg_match($something, $OFFENDING_VAR)) {
        $OFFENDING_VAR = str_replace($x, $y, $OFFENDING_VAR); // this is the error
    }
}
So it may be something strange due to str_replace, but that would mean that at some point str_replace would have to return an array.
Please help me work this out; it's very confusing and I have wasted a day on it.
---- ORIGINAL FUNCTION CODE -----
//This function gets called with multiple different "Target Variables" Target is the subject
//line, from and body of the email filled with << tags >> so the str_replace function knows
//where to replace them
function perform_replacements($replacements, &$target, $clean = TRUE,
        $start_tag = '<<', $end_tag = '>>', $max_substitutions = 5) {
    # Construct separate tag and replacement value arrays for use in the substitution loop.
    $tags = array();
    $replacement_values = array();
    foreach ($replacements as $tag_text => $replacement_value) {
        $tags[] = $start_tag . $tag_text . $end_tag;
        $replacement_values[] = $replacement_value;
    }
    # TODO: this badly needs refactoring
    # TODO: auto upgrade <<foo>> to <<foo_html>> if foo_html exists and acting on html template
    # Construct a regular expression for use in scanning for tags.
    $tag_match = '/' . preg_quote($start_tag) . '\w+' . preg_quote($end_tag) . '/';
    # Perform the substitution until all valid tags are replaced, or the maximum substitutions
    # limit is reached.
    $substitution_count = 0;
    while (preg_match ($tag_match, $target) && ($substitution_count++ < $max_substitutions)) {
        $target = serialize($target);
        $temp = str_replace($tags,
                            $replacement_values,
                            $target); //This is the line that is failing.
        unset($target);
        $target = $temp;
    }
    if ($clean) {
        # Clean up any unused search values.
        $target = preg_replace($tag_match, '', $target);
    }
}

How do you know $str is the problem and not $some_other_array?
From the manual:
If search and replace are arrays, then str_replace() takes a value
from each array and uses them to search and replace on subject. If
replace has fewer values than search, then an empty string is used for
the rest of replacement values. If search is an array and replace is a
string, then this replacement string is used for every value of
search. The converse would not make sense, though.
The second parameter can only be an array if the first one is as well.
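A minimal sketch of the array/array form that perform_replacements() relies on, plus a check for one other way to get the same message (an assumption worth ruling out, not something the question confirms): if any value in $replacements is itself an array, str_replace() has to convert that element to a string, inserts "Array", and raises an "Array to string conversion" notice (a warning in PHP 8). The template and replacement values here are made up:

```php
<?php
// Made-up replacement map in the shape perform_replacements() expects.
$replacements = ['first_name' => 'Ada', 'city' => 'London'];
$target = 'Hi <<first_name>>, greetings from <<city>>.';

// Any array-valued entry would be stringified as "Array" with a notice.
$arrayValued = array_keys(array_filter($replacements, 'is_array'));
if ($arrayValued !== []) {
    throw new InvalidArgumentException('Array values for: ' . implode(', ', $arrayValued));
}

// Parallel search/replace arrays: both arguments are arrays, which is allowed.
$tags = [];
foreach (array_keys($replacements) as $tagText) {
    $tags[] = '<<' . $tagText . '>>';
}
$result = str_replace($tags, array_values($replacements), $target);

echo $result, "\n"; // Hi Ada, greetings from London.
```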

Efficient way to parse this string into array in PHP?

Background
I have an array which I create by splitting a string based on every occurrence of 0d0a using preg_split('/(?<=0d0a)(?!$)/').
For example:
$string = "78781110d0a78782220d0a";
will be split into:
Array ( [0] => 78781110d0a [1] => 78782220d0a )
A valid array element has to start with 7878 and end with 0d0a.
The Problem
But sometimes, there's an additional 0d0a in the string which splits into an extra and invalid array element, i.e., that doesn't begin with 7878.
Take this string for example:
$string = "78781110d0a2220d0a78783330d0a";
This is split into:
Array ( [0] => 78781110d0a [1] => 2220d0a [2] => 78783330d0a )
But it should actually be:
Array ( [0] => 78781110d0a2220d0a [1] => 78783330d0a)
My Solution
I've written the following (messy) code to get around this:
$data = Array('78781110d0a','2220d0a','78783330d0a');
$i = 0; //count for $data array;
$j = 0; //count for $dataFixed array;
$dataFixed = $data;
foreach($data as $packet) {
    if (substr($packet,0,4) != "7878") { //if packet doesn't start with 7878, do some fixing
        if ($i != 0) { //if it's the first packet, can't help it!
            $j++;
            if ((substr(strtolower($packet), -4, 4) == "0d0a")) { //if the packet doesn't end with 0d0a, it's 'mostly' not valid, so discard it
                $dataFixed[$i-$j] = $dataFixed[$i-$j] . $packet;
            }
            unset($dataFixed[$i-$j+1]);
            $dataFixed = array_values($dataFixed);
        }
    }
    $i++;
}
Description
I first copy the array to another array, $dataFixed. In a foreach loop over the $data array, I check whether each element starts with 7878. If it doesn't, I join it with the previous element. I then unset the current element in $dataFixed and re-index the array with array_values().
But I'm not very confident about this solution. Is there a better, more efficient way?
UPDATE
What if the input string doesn't end in 0d0a like it's supposed to? It will stick to the previous array element.
For e.g.: in the string 78781110d0a2220d0a78783330d0a0000, 0000 should be separated as another array element.

Use another positive lookahead (?=7878) to form:
preg_split('/(?<=0d0a)(?=7878)/',$string)
Note: I removed (?!$) because I wasn't sure what that was for, based on your example data.
For example, this code:
$string = "78781110d0a2220d0a78783330d0a";
$array = preg_split('/(?<=0d0a)(?=7878)/',$string);
print_r($array);
Results in:
Array ( [0] => 78781110d0a2220d0a [1] => 78783330d0a )
UPDATE:
Based on your revised question of having possible random characters at the end of the input string, you can add three lines to make a complete program of:
$string = "78781110d0a2220d0a787830d0a330d0a0000";
$array = preg_split('/(?<=0d0a)(?=7878)/',$string);
$temp = preg_split('/(7878.*0d0a)/',$array[count($array)-1],-1,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
$array[count($array)-1] = $temp[0];
if(count($temp)>1) { $array[] = $temp[1]; }
print_r($array);
We basically do the initial splitting, then split the last element of the resulting array by the expected data format, keeping the delimiter using PREG_SPLIT_DELIM_CAPTURE. The PREG_SPLIT_NO_EMPTY ensures we won't get an empty array element if the input string doesn't end in random characters.
UPDATE 2:
Based on your comment below where it seems you're implying there might be random characters between any of the desired matches, and you want these random characters preserved, you could do this:
$string = "0078781110d0a2220d0a2220d0a0000787830d0a330d0a000078781110d0a2220d0a0000787830d0a330d0a0000";
$split1 = preg_split('/(7878.*?0d0a)/',$string,-1,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
$result = array();
foreach($split1 as $e){
    $split2 = preg_split('/(.*0d0a)/',$e,-1,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
    foreach($split2 as $el){
        // test if $el doesn't start with 7878 and ends with 0d0a
        if(strpos($el,'7878') !== 0 && substr($el,-4) == '0d0a'){
        //if(preg_match('/^(?!7878).*0d0a$/',$el) === 1){
            $result[ count($result)-1 ] = $result[ count($result)-1 ] . $el;
        } else {
            $result[] = $el;
        }
    }
}
print_r($result);
The strategy employed here is different from the one above. First we split the input string based on the delimiter that matches your desired data, using the non-greedy regex .*?. At this point we have some strings that contain the ending of a desired value and some garbage at the end, so we split again based on the last occurrence of "0d0a" with the greedy regex .*0d0a. We then append any of those resulting values that don't start with "7878" but end with "0d0a" to the previous value, as this should repair the first and second halves that got split because the value contained an extra "0d0a".
I provided two methods for the innermost if statement, one using regular expressions. The regex one is marginally slower in my testing, so I've left that one commented out.
I might still not have your full requirements, so you'll have to let me know if it works, and perhaps provide your full dataset.

I think you are using a delimiter, "0d0a", which also happens to be part of the content! It's not possible to avoid getting junk data as long as the delimiter can also be part of the content. Somehow the delimiter must be unique.
Possible solutions:
Change the delimiter to something else that doesn't occur as part of your data (000000, #!.;).
If you know for certain the length of text each array item may have, use it. Going by your examples, that's not possible.
The solutions given in other answers only consider the sample data you have shared. If you are confident about what the content of the string will be, then those solutions are pretty good to use. Otherwise, they can't guarantee correct results!
Best solution: fix the delimiter, then use regex or explode, whichever you prefer.

Why don't you use preg_match_all() instead? You don't need lookarounds (lookaheads and lookbehinds) to stop the split from removing the delimiter; you can just find the matches you're looking for:
Updated
<?php
$string = "00787817878110d0a22278780d0a78783330d0a00";
preg_match_all('/7878.*?0d0a(?=7878|(?:(?!7878).)*$)/', $string, $arr);
print_r($arr);
?>
This gives the array $arr[0] => ( [0] => 787817878110d0a22278780d0a, [1] => 78783330d0a ). It strips leading and trailing garbage characters (whatever doesn't start with 7878 or end with 7878 or 0d0a).
So $arr[0] is the array of values that you are looking for.
Works with multiple 7878 values and multiple 0d0a values (even though that's ridiculous).
Update
If splitting is more your style, why not avoid regular expressions altogether?
<?php
$string = "787817878110d0a22278780d0a78783330d0a";
$arr = explode('0d0a7878', $string);
$string = implode('0d0a,7878', $arr);
$arr = explode(',', $string);
print_r($arr);
?>
Here we split the string by the delimiter 0d0a7878, which is what CharlieGorichanaz's solution is doing, and props to him for the quick, accurate solution. We then add a comma, because who doesn't love comma-separated values? And we explode again on the commas for an array of desired values. Performance-wise, this ought to be faster than using regular expressions.
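The merge rule from the question's own loop (append a piece that doesn't start with 7878 to the previous packet) can also be written without the index bookkeeping and array_values() calls. A sketch, using a hypothetical mergePackets() helper and, per the question's UPDATE, keeping a trailing piece that doesn't end in 0d0a (like 0000) as its own element instead of merging it:

```php
<?php
// Hypothetical helper: glue continuation pieces back onto the previous packet.
function mergePackets(array $parts): array
{
    $packets = [];
    foreach ($parts as $part) {
        $isContinuation = strncmp($part, '7878', 4) !== 0      // doesn't start a packet
            && substr(strtolower($part), -4) === '0d0a';       // but does end like one
        if ($isContinuation && $packets !== []) {
            $packets[count($packets) - 1] .= $part;            // append to previous packet
        } else {
            $packets[] = $part;                                // new packet or trailing garbage
        }
    }
    return $packets;
}

// Split as in the question, then repair the pieces.
$parts = preg_split('/(?<=0d0a)(?!$)/', '78781110d0a2220d0a78783330d0a0000');
print_r(mergePackets($parts));
```

This still inherits the ambiguity pointed out in the delimiter answer: if a real packet body happened to contain 0d0a7878, no splitting rule could tell it apart from a packet boundary.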
