Let's say you have a string that looks like this:
token1 token2 tok3
And you want to get all of the tokens (specifically, the strings between the spaces) and also each token's position (offset) and length.
So I would want a result that looks something like this:
array(
    array(
        'value'=>'token1',
        'offset'=>0,
        'length'=>6
    ),
    array(
        'value'=>'token2',
        'offset'=>7,
        'length'=>6
    ),
    array(
        'value'=>'tok3',
        'offset'=>14,
        'length'=>4
    ),
)
I know that this can be done by simply looping through the characters of the string, and I can simply write a function to do this.
I am wondering, does PHP have anything built-in that will do this efficiently or at least help with part of this?
I am looking for suggestions and appreciate any help given. Thanks
You can use preg_match_all with the PREG_OFFSET_CAPTURE flag:
$str = 'token1 token2 tok3';
preg_match_all('/\S+/', $str, $matches, PREG_OFFSET_CAPTURE);
var_dump($matches);
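Then you just need to map the items in $matches[0] like this:
function update($match) {
    // $match[0] is the matched text, $match[1] is its offset in the string
    return array('value' => $match[0], 'offset' => $match[1], 'length' => strlen($match[0]));
}
// array_map returns a new array; it does not modify $matches[0] in place
$result = array_map('update', $matches[0]);
var_dump($result);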
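As a rough sketch of the same idea wrapped in a reusable function (the name tokenize_with_offsets is made up for illustration), the following also shows that PREG_OFFSET_CAPTURE and strlen() both count bytes rather than characters, so multibyte text produces larger numbers than a character count would:

```php
<?php
// Hypothetical helper, not a PHP built-in: returns value, byte offset and
// byte length for every run of non-whitespace characters.
function tokenize_with_offsets(string $str): array
{
    preg_match_all('/\S+/', $str, $matches, PREG_OFFSET_CAPTURE);
    return array_map(function (array $m): array {
        return array('value' => $m[0], 'offset' => $m[1], 'length' => strlen($m[0]));
    }, $matches[0]);
}

$ascii = tokenize_with_offsets('token1 token2 tok3');
// "\u{e9}" and "\u{f6}" are two bytes each in UTF-8, so the second token
// starts at byte 7 even though it is the 7th character (index 6).
$utf8 = tokenize_with_offsets("h\u{e9}llo w\u{f6}rld");

print_r($ascii);
print_r($utf8);
```

If character positions are needed instead, the byte offsets would have to be converted, for example with the mbstring functions, assuming that extension is available.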
There's a simpler way, in most respects. You'll have a more basic result, but with much less work put in.
Assuming you have tokena tokenb tokenc stored in $data
$tokens = explode(' ', $data);
Now you have an array of tokens separated by spaces. They will be in order, so $tokens[0] = tokena, $tokens[1] = tokenb, etc. You can very easily get the length of any given item by doing strlen($tokens[$index]); If you need to know how many tokens you were passed, use $token_count = count($tokens);
Not as sophisticated, but next to no work to get it.
You could use explode(), which will give you an array of tokens from the string, and strlen() to count the number of characters in each token. As far as I know, there is no PHP function to tell you where an element is in an array.
To get around the last problem, you could use a counter variable while looping through the exploded array (with foreach() or for()) that gives each sub-array in the new data its position.
Someone please correct me if I'm wrong.
James
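The counter idea could look roughly like this. It is only a sketch: explode_with_offsets is an invented name, and it assumes tokens are separated by single-byte spaces only.

```php
<?php
// Hypothetical helper: rebuilds each token's offset from a running counter.
function explode_with_offsets(string $data): array
{
    $result = array();
    $offset = 0; // byte position where the current piece starts
    foreach (explode(' ', $data) as $token) {
        // Consecutive spaces produce empty pieces; skip them but still count them.
        if ($token !== '') {
            $result[] = array('value' => $token, 'offset' => $offset, 'length' => strlen($token));
        }
        $offset += strlen($token) + 1; // +1 for the space that followed this piece
    }
    return $result;
}

$tokens = explode_with_offsets('tokena tokenb  tokenc');
print_r($tokens);
```

Skipping the empty pieces while still advancing the counter keeps the offsets correct when the input contains repeated spaces.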
I like the first answer the most - to use PREG_OFFSET_CAPTURE. In case anyone else is interested, I ended up writing something that does this as well, although I am going to accept the first answer.
Thank you everybody for helping!
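function get_words($string) {
    $string_chars = str_split($string);
    $words = array();
    $curr_offset = 0;
    $length = 0;
    $value_array = array();
    foreach ($string_chars as $offset => $char) {
        if ($char == ' ') {
            // A space ends the word in progress, if there is one
            if ($length) $words[] = array('offset'=>$curr_offset,'length'=>$length,'value'=>implode($value_array));
            // The next word can start no earlier than the character after this space
            $curr_offset = $offset + 1;
            $length = 0;
            $value_array = array();
        }
        else {
            $length++;
            $value_array[] = $char;
        }
    }
    // Flush the last word when the string does not end with a space
    if ($length) $words[] = array('offset'=>$curr_offset,'length'=>$length,'value'=>implode($value_array));
    return $words;
}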
I've spent my last 4 hours figuring out how to ... I got to ask for your help now.
I'm trying to extract from a text all the substrings that start with an item of my starting_words_array and end with an item of my ending_words_array.
$str = "Do you see that ? Indeed, I can see that, as well as this." ;
$starting_words_array = array('do','I');
$ending_words_array = array('?',',');
expected output : array ([0] => 'Do you see that ?' [1] => 'I can see that,')
I managed to write a first function that finds the first substring matching an item from each array, but I can't figure out how to loop it in order to get all the substrings matching my requirement.
function SearchString($str, $starting_words_array, $ending_words_array ) {
forEach($starting_words_array as $test) {
$pos = strpos($str, $test);
if ($pos===false) continue;
$found = [];
forEach($ending_words_array as $test2) {
$posStart = $pos+strlen($test);
$pos2 = strpos($str, $test2, $posStart);
$found[] = ($pos2!==false) ? $pos2 : INF;
}
$min = min($found);
if ($min !== INF)
return substr($str,$pos,$min-$pos) .$str[$min];
}
return '';
}
Do you guys have any idea how to achieve this?
I use preg_match for my solution. However, the start and end strings must be escaped with preg_quote. Without that, the solution will be wrong.
function searchString($str, $starting_words_array, $ending_words_array ) {
$resArr = [];
forEach($starting_words_array as $i => $start) {
$end = $ending_words_array[$i] ?? "";
$regEx = '~'.preg_quote($start,"~").".*".preg_quote($end,"~").'~iu';
if(preg_match($regEx,$str,$match)){
// preg_match stores the full match as a string in $match[0]
$resArr[] = $match[0];
}
}
return $resArr;
}
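For the sample input this finds "Do you see that ?" and "Indeed, I can see that,": the match is case-insensitive and greedy, so the second one starts at the I of "Indeed".
If the expressions can occur more than once, preg_match_all must be used instead, and the regex must be modified to use the lazy quantifier .*? in place of .*.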
function searchString($str, $starting_words_array, $ending_words_array ) {
$resArr = [];
forEach($starting_words_array as $i => $start) {
$end = $ending_words_array[$i] ?? "";
$regEx = '~'.preg_quote($start,"~").".*?".preg_quote($end,"~").'~iu';
if(preg_match_all($regEx,$str,$match)){
$resArr = array_merge($resArr,$match[0]);
}
}
return $resArr;
}
The result for the second variant:
array (
0 => "Do you see that ?",
1 => "Indeed,",
2 => "I can see that,",
)
I would definitely use regex and preg_match_all(). I won't give you a full working code example here but I will outline the necessary steps.
First, build a regex from your start-end-pairs like that:
$parts = array_map(
function($start, $end) {
return $start . '.+' . $end;
},
$starting_words_array,
$ending_words_array
);
$regex = '/' . join('|', $parts) . '/i';
The /i modifier makes the search case-insensitive. Some characters, like ?, have a special meaning in regex, so you need to extend the function above to escape them properly.
You can test your final regex here
Then use preg_match_all() to extract your substrings:
preg_match_all($regex, $str, $matches); // $matches is passed by reference, no need to declare it first
print_r($matches);
The exact structure of your $matches array will be slightly different from what you asked for but you will be able to extract your desired data from it
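One possible way to put these steps together is sketched below. The function name build_pair_regex is invented for the example, preg_quote() is used for the escaping step, and the quantifier is changed to the lazy .+? as an assumption so that each match stops at the first possible ending.

```php
<?php
// Hypothetical helper: turns start/end pairs into one alternation regex.
function build_pair_regex(array $starts, array $ends): string
{
    $parts = array_map(
        function ($start, $end) {
            // Escape metacharacters such as ? and use a lazy quantifier
            return preg_quote($start, '/') . '.+?' . preg_quote($end, '/');
        },
        $starts,
        $ends
    );
    return '/' . implode('|', $parts) . '/i';
}

$str = "Do you see that ? Indeed, I can see that, as well as this.";
$regex = build_pair_regex(array('do', 'I'), array('?', ','));
preg_match_all($regex, $str, $matches);
print_r($matches[0]);
```

Because the search is case-insensitive and not restricted to whole words, "Indeed," is matched as well as the two substrings from the question.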
Benni's answer is the best way to go, but let me point out the problems in your code in case you want to fix them:
strpos is case sensitive and also matches parts of words, so you need to change $starting_words_array = array('do','I'); to $starting_words_array = array('Do','I ');
When a substring is found, you use return, which exits the function, so you won't find any other substring. To fix that, define $res = []; at the beginning of the function, replace return substr($str,$pos,... with $res[] = substr($str,$pos,..., and return $res at the end.
You can see an example on 3v4l; in that example you get the output you wanted.
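A sketch of the function with both fixes applied might look like this. The name search_all_strings is made up, and it assumes, like the original code, that every ending marker is a single character.

```php
<?php
// Hypothetical rewrite of the question's function that collects every hit.
function search_all_strings(string $str, array $starts, array $ends): array
{
    $res = array();
    foreach ($starts as $start) {
        $pos = strpos($str, $start); // case sensitive
        if ($pos === false) {
            continue;
        }
        // Positions of every ending marker found after the start word
        $found = array();
        foreach ($ends as $end) {
            $pos2 = strpos($str, $end, $pos + strlen($start));
            if ($pos2 !== false) {
                $found[] = $pos2;
            }
        }
        if ($found) {
            // Cut up to and including the nearest (single-character) ending
            $res[] = substr($str, $pos, min($found) - $pos + 1);
        }
    }
    return $res;
}

$str = "Do you see that ? Indeed, I can see that, as well as this.";
$result = search_all_strings($str, array('Do', 'I '), array('?', ','));
print_r($result);
```

Note that this still finds only the first occurrence of each starting word; repeated occurrences would need an inner loop or the regex approach.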
I'm trying to get all the digits before the first space or letter in a PHP string.
Example:
<?php
//string
$firstStr = '12 Car';
$secondStr = '412 8all';
$thirdStr = '100Pen';
//result I need
firstStr = 12
SecondStr = 412
thirdStr = 100
How can I get the leading number of a string, as in the example above?
My idea is to get the position of the first letter, then take all the digits before that position.
I've successfully got the position using
preg_match('~[a-z]~i', $value, $match, PREG_OFFSET_CAPTURE);
But I haven't managed to get the digits before that position yet.
How can I do that, or does anybody know how to fix my idea?
Any help will be appreciated.
You don't need to use regex for strings like the examples you've shown, or any functions at all for that matter. You can just cast them to ints.
$number = (int) $firstStr; // etc.
The PHP rules for string conversion to number will handle it for you.
However, because of those rules, there are some other types of strings that this won't work for. For example, '-12 Car' or '412e2 8all'.
If you do use a regex, be sure to anchor it to the beginning of the string with ^ or it will match digits anywhere in the string as the other regex answers here do.
preg_match('/^\d+/', $string, $match);
$number = $match[0] ?? '';
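To compare the two approaches side by side, here is a small sketch. The helper name leading_digits and the extra sample strings are assumptions added for illustration.

```php
<?php
// Hypothetical helper: returns the digits at the very start of the string, or ''.
function leading_digits(string $s): string
{
    return preg_match('/^\d+/', $s, $m) === 1 ? $m[0] : '';
}

$samples = array('12 Car', '412 8all', '100Pen', '-12 Car', 'Car 12');
foreach ($samples as $s) {
    // The cast accepts a sign and yields 0 when there is no leading number;
    // the anchored regex accepts digits only and yields an empty string.
    printf("%-10s cast: %d  regex: '%s'\n", $s, (int) $s, leading_digits($s));
}
```

The difference shows up on '-12 Car' and 'Car 12': the cast gives -12 and 0, while the regex gives an empty string for both.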
Here's an extremely hackish approach that will work in most situations:
$s = "1001BigHairyCamels";
$n = intval($s); // 1001, the leading number
$my_number = str_replace($n, '', $s); // "BigHairyCamels", what is left once the number is removed
$input = '100Pen';
if (preg_match('~(\d+)[ a-zA-Z]~', $input, $m)) {
echo $m[1];
}
This function will do the job!
<?php
function getInt($str){
preg_match_all('!\d+!', $str, $matches);
return $matches[0][0];
}
$firstStr = '12 Car';
$secondStr = '412 8all';
$thirdStr = '100Pen';
echo 'firstStr = '.getInt($firstStr).'<br>';
echo 'secondStr = '.getInt($secondStr).'<br>';
echo 'thirdStr = '.getInt($thirdStr);
?>
I am currently working on a small script to convert data coming from an external source. Depending on the content I need to map this data to something that makes sense to my application.
A sample input could be:
$input = 'We need to buy paper towels.';
Currently I have the following approach:
// Setup an assoc_array what regexp match should be mapped to which itemId
private $itemIdMap = [ '/paper\stowels/' => '3746473294' ];
// Match the $input ($key) against the $map and return the first match
private function getValueByRegexp($key, $map) {
$match = preg_grep($key, $map);
if (count($match) > 0) {
return $match[0];
} else {
return '';
}
}
This raises the following error on execution:
Warning: preg_grep(): Delimiter must not be alphanumeric or backslash
What am I doing wrong and how could this be solved?
In the preg_grep manual, the order of arguments is:
string $pattern , array $input
In your code, $match = preg_grep($key, $map);, $key is the input string and $map is passed where the pattern belongs.
So, your call is
$match = preg_grep(
'We need to buy paper towels.',
[ '/paper\stowels/' => '3746473294' ]
);
So, are you really trying to find the string We need to buy paper towels. in the number 3746473294?
So the first fix can be to swap them and cast the second argument to an array:
$match = preg_grep($map, array($key));
But here comes the second error: $itemIdMap is an array. You can't use an array as a regexp; only scalar values (more strictly, strings) can be used. This leads you to:
$match = preg_grep($map['/paper\stowels/'], $key);
Which is definitely not what you want, right?
The solution:
$input = 'We need to buy paper towels.';
$itemIdMap = [
'/paper\stowels/' => '3746473294',
'/other\sstuff/' => '234432',
'/to\sbuy/' => '111222',
];
foreach ($itemIdMap as $k => $v) {
if (preg_match($k, $input)) {
echo $v . PHP_EOL;
}
}
Your wrong assumption is that you can find any item from an array of regexps in a single string with preg_grep. Instead, preg_grep searches an array for the elements that match one single regexp. So, you just used the wrong function.
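Folded back into the shape of the original method, the loop could be sketched as follows. This is written as a plain function rather than a private method so it runs on its own, and the name get_value_by_regexp is only borrowed from the question.

```php
<?php
// Sketch: return the value of the first pattern in $map that matches $input,
// or '' when none does. Keys of $map are assumed to be valid regexes.
function get_value_by_regexp(string $input, array $map): string
{
    foreach ($map as $pattern => $value) {
        if (preg_match($pattern, $input) === 1) {
            return $value;
        }
    }
    return '';
}

$itemIdMap = array(
    '/other\sstuff/' => '234432',
    '/paper\stowels/' => '3746473294',
);
$hit = get_value_by_regexp('We need to buy paper towels.', $itemIdMap);
$miss = get_value_by_regexp('Nothing relevant here.', $itemIdMap);
echo $hit . PHP_EOL;
```

Patterns are tried in array order, so when several could match, the order of $itemIdMap decides which ID is returned.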
I have string like this
$str="absdbsasd k=12312 sdasd l=89879 m=ken asddq casdasd";
The question is how to process the string in $str to get output like this:
k=12312
l=89879
m=ken asddq casdasd
I have tried parse_str after replacing the space characters (' ') with '&', but the output is still wrong:
k=12312
l=89879
m=ken
Could anybody help me?
Assuming the logic is that identifiers are word characters ending with =, and that a value ends where the next identifier begins, except that when a value starts with digits only that first run of digits is kept, I would go about it like this:
$str="absdbsasd k=12312 sdasd l=89879 m=ken asddq casdasd";
$parts = preg_split('/(\w+=)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
$result = array();
$prev_was_an_identifier = false;
$last_identifier = null;
foreach ($parts as $part) {
if ($prev_was_an_identifier) {
if (preg_match('/^\d+/', $part)) {
$result[$last_identifier] = preg_replace('/^(\d+).*/', '$1', $part);
} else {
$result[$last_identifier] = $part;
}
$prev_was_an_identifier = false;
} elseif (preg_match('/=$/', $part)) {
$prev_was_an_identifier = true;
$last_identifier = mb_substr($part, 0, -1);
}
}
outputs:
array (
'k' => '12312',
'l' => '89879',
'm' => 'ken asddq casdasd',
)
Well, first you need to define the structure of the string, something like this:
$str = "$first_value k=$second_value l=$third_value m=$fourth_value";
If the structure is as I have written, then it's pretty much impossible to get what you need, since there are no separators or any other way to determine where one value ends and the next begins.
Look at this example:
$str="absdbsasd k=12312 sdasd l=s l=8987 l=s 9 m=ken asddq casdasd";
There is no way to tell where the real l= starts.
If you add separators such as ' or ( and make sure they don't appear in the values, then you can get something like this:
$str="'absdbsasd' k='12312 sdasd l=s' l='8987 l=s 9' m='ken asddq casdasd'";
And then you can do a preg_match or preg_split check and get your desired values.
Or, as suggested, just make an array in the first place.
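With single quotes as the separator, the extraction could be sketched like this. The function name parse_quoted_pairs is invented, and the pattern assumes values never contain a single quote.

```php
<?php
// Hypothetical helper: collects identifier='value' pairs into an array.
function parse_quoted_pairs(string $str): array
{
    // (\w+) captures the identifier, ([^']*) everything up to the closing quote
    preg_match_all("/(\w+)='([^']*)'/", $str, $matches, PREG_SET_ORDER);
    $pairs = array();
    foreach ($matches as $m) {
        $pairs[$m[1]] = $m[2]; // a repeated identifier overwrites the earlier one
    }
    return $pairs;
}

$str = "'absdbsasd' k='12312 sdasd l=s' l='8987 l=s 9' m='ken asddq casdasd'";
$pairs = parse_quoted_pairs($str);
print_r($pairs);
```

The l=s fragments inside the quoted values are skipped because each match consumes the text up to its closing quote before the search continues.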
I have a string
&168491968426|mobile|3|100|1&185601651932|mobile|3|120|1&114192088691|mobile|3|555|5&
and I have to delete, say, this part: &185601651932|mobile|3|120|1& (starting with an ampersand and ending with an ampersand), knowing only the first number up to the vertical bar (185601651932),
so that as a result I would have
&168491968426|mobile|3|100|1&114192088691|mobile|3|555|5&
How could I do that with PHP's preg_replace function? The number of bar-separated (|) values will always be the same, but I'd still like a flexible pattern that does not depend on the number of bars between the & signs.
Thanks.
P.S. Also, I would be grateful for a link to a good, simply written resource on regular expressions in PHP. There are plenty of them on Google :) but maybe you happen to have a really great link.
preg_replace("/&185601651932\\|[^&]+&/", ...)
Generalized,
$i = 185601651932;
preg_replace("/&$i\\|[^&]+&/", ...);
If you want real flexibility, use preg_replace_callback: http://php.net/manual/en/function.preg-replace-callback.php
Important: don't forget to escape your number using preg_quote():
$string = '&168491968426|mobile|3|100|1&185601651932|mobile|3|120|1&114192088691|mobile|3|555|5&';
$number = 185601651932;
if (preg_match('/&' . preg_quote($number, '/') . '.*?&/', $string, $matches)) {
// $matches[0] contains the captured string
}
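A complete removal could be sketched as below. The helper name remove_record is invented, and the replacement string '&' is an assumption made here so that the neighbouring records stay separated after the match (which includes both ampersands) is removed.

```php
<?php
// Hypothetical helper: removes the record whose first field equals $id.
function remove_record(string $data, string $id): string
{
    // & + escaped id + | + anything up to the next &, including that &
    $pattern = '/&' . preg_quote($id, '/') . '\|[^&]*&/';
    // Put one & back so the surrounding records are still delimited
    return preg_replace($pattern, '&', $data);
}

$string = '&168491968426|mobile|3|100|1&185601651932|mobile|3|120|1&114192088691|mobile|3|555|5&';
$cleaned = remove_record($string, '185601651932');
echo $cleaned . PHP_EOL;
```

Requiring the | right after the number prevents an ID that is merely a prefix of a longer one from matching.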
It seems to me you ought to be using another data structure than a string to manipulate this data.
I'd want this data in a structure like
Array(
[id] => Array(
[field_1] => value_1
[field_2] => value_2
)
)
Your massive string can be massaged into such a structure by doing something like this:
$data_str = '168491968426|mobile|3|100|1&185601651932|mobile|3|120|1&114192088691|mobile|3|555|5&';
$remove_num = '185601651932';
/* Enter a descriptive name for each of the numbers here
- these will be field names in the data structure */
$field_names = array(
'number',
'phone_type',
'some_num1',
'some_num2',
'some_num3'
);
/* split the string into its parts, and place them into the $data array */
$data = array();
$tmp = explode('&', trim($data_str, '&'));
foreach($tmp as $record) {
$fields = explode('|', trim($record, '|'));
$data[$fields[0]] = array_combine($field_names, $fields);
}
echo "<h2>Data structure:</h2><pre>"; print_r($data); echo "</pre>\n";
/* Now to remove our number */
unset($data[$remove_num]);
echo "<h2>Data after removal:</h2><pre>"; print_r($data); echo "</pre>\n";