Regex hash and colons - php

I want to use a regular expression to extract substrings from a string such as:
hello world #level:basic #lang:java:php #...
I am trying to produce an array with a structure like this:
Array
(
[0]=> hello world
[1]=> Array
(
[0]=> level
[1]=> basic
)
[2]=> Array
(
[0]=> lang
[1]=> java
[2]=> php
)
)
I have tried preg_match("/(.*)#(.*)[:(.*)]*/", $input_line, $output_array);
and what I have got is:
Array
(
[0] => hello world #level:basic #lang:java:php
[1] => hello world #level:basic
[2] => lang:java:php
)
With this result I would have to apply the regex again to each index, and then apply another regex to split on the colons. Is it possible to write a better regex that does it all in one go? What would that regex be? Thanks.

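You can use:
$array = explode("#", "hello world #level:basic #lang:java:php");
foreach ($array as $k => &$v) {
    $v = trim($v); // drop the space left in front of each "#"
    // Split tags on ":"; leave plain text as a string
    $v = strpos($v, ":") === false ? $v : explode(":", $v);
}
unset($v); // break the reference left by the foreach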
print_r($array);

Do this (note that every element, including the leading text, ends up as an array):
$array = array() ;
$text = "hello world #level:basic #lang:java:php";
$array = explode("#", $text);
foreach($array as $i => $value){
$array[$i] = explode(":", trim($value));
}
print_r($array);

Got something for you:
Rules:
a tag begins with #
a tag may not contain whitespace/newline
a tag is preceded and followed by whitespace or line beginning/ending
a tag can have several parts divided by :
Example:
#this:tag:matches this is some text #a-tag this is no tag: \#escaped
and this one tag#does:not:match
Function:
<?php
function parseTags($string)
{
// "~" is the delimiter here, so the literal "#" in the pattern needs no escaping
static $tag_regex = '~(?<=\s|^)#([^\:\s]+)(?:\:([^\s]+))*(?=\s|$)~m';
$results = array();
preg_match_all($tag_regex, $string, $results, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);
$tags = array();
foreach($results as $result) {
$tag = array(
'offset' => $result[0][1],
'raw' => $result[0][0],
'length' => strlen($result[0][0]),
0 => $result[1][0]);
if(isset($result[2]))
$tag = array_merge($tag, explode(':', $result[2][0]));
$tag['elements'] = count($tag)-3;
$tags[] = $tag;
}
return $tags;
}
?>
Result:
array(2) {
[0]=>array(7) {
["offset"]=>int(0)
["raw"]=>string(17) "#this:tag:matches"
["length"]=>int(17)
[0]=>string(4) "this"
[1]=>string(3) "tag"
[2]=>string(7) "matches"
["elements"]=>int(3)
}
[1]=>array(5) {
["offset"]=>int(36)
["raw"]=>string(6) "#a-tag"
["length"]=>int(6)
[0]=>string(5) "a-tag"
["elements"]=>int(1)
}
}
Each matched tag contains:
the raw tag text
the tag offset and original length (e.g. to replace it in the string later with the str... functions)
the tag parts under the numeric keys 0, 1, 2, ...
the number of elements (to safely iterate with for($i = 0; $i < $tag['elements']; $i++))
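As a rough illustration of what the offset and length are useful for, here is a minimal sketch that removes every tag from a string. The helper name removeHashTags and the sample text are assumptions made for this example, and the pattern is a simplified form of the rules listed above.

```php
<?php
// Hypothetical helper (not part of the answer above): remove every matched
// tag from the text, using the byte offset and length of each match.
function removeHashTags($string)
{
    // "~" is the delimiter, so the literal "#" needs no escaping.
    $regex = '~(?<=\s|^)#[^:\s]+(?::\S+)*(?=\s|$)~m';
    preg_match_all($regex, $string, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

    // Work from the last match to the first so earlier offsets stay valid.
    foreach (array_reverse($matches) as $match) {
        list($raw, $offset) = $match[0];
        $string = substr_replace($string, '', $offset, strlen($raw));
    }
    return $string;
}

var_dump(removeHashTags('#this:tag:matches some text #a-tag more text'));
var_dump(removeHashTags('and this one tag#does:not:match'));
```

Processing the matches in reverse order is the design choice that matters here: removing a tag shifts everything after it, so the offsets of earlier matches are only reliable if later matches are removed first.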

This might work for you:
$results = array() ;
$text = "hello world #level:basic #lang:java:php" ;
$parts = explode("#", $text);
foreach($parts as $part){
$results[] = explode(":", $part);
}
var_dump($results);

Here are two ways using regex. Note that you still need explode(), since PCRE keeps only the last repetition of a repeated capture group:
$string = 'hello world #level:basic #lang:java:php';
preg_match_all('/(?<=#)[\w:]+/', $string, $m);
foreach($m[0] as $v){
$example1[] = explode(':', $v);
}
print_r($example1);
// This one needs PHP 5.3+
$example2 = array();
preg_replace_callback('/(?<=#)[\w:]+/', function($m)use(&$example2){
$example2[] = explode(':', $m[0]);
}, $string);
print_r($example2);
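To see why explode() is still needed, here is a minimal sketch showing that a repeated capture group remembers only its final repetition. The variable names and the single-tag input are chosen for illustration only.

```php
<?php
// A repeated capture group only remembers its final repetition.
$tag = '#lang:java:php';

// Group 1 is the tag name; group 2 sits inside a repeated (?:...)* group.
preg_match('/#(\w+)(?::(\w+))*/', $tag, $m);

// $m[2] holds only the last part; the earlier repetition was overwritten.
print_r($m);

// Matching the whole tag and splitting it afterwards keeps every part.
preg_match('/(?<=#)[\w:]+/', $tag, $whole);
$parts = explode(':', $whole[0]);
print_r($parts);
```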

This gives you the array structure you are looking for:
<pre><?php
$subject = 'hello world #level:basic #lang:java:php';
$array = explode('#', $subject);
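foreach($array as &$value) {
    $value = trim($value); // drop the space left in front of each "#"
    $items = explode(':', $value);
    if (sizeof($items)>1) $value = $items;
}
unset($value); // break the reference left by the foreach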
print_r($array);
But if you prefer you can use this abomination:
$subject = 'hello world #level:basic #lang:java:php';
$pattern = '~(?:^| ?+#)|(?:\G([^#:]+?)(?=:| #|$)|:)+~';
preg_match_all($pattern, $subject, $matches);
array_shift($matches[1]);
$lastKey = sizeof($matches[1])-1;
foreach ($matches[1] as $key=>$match) {
if (!empty($match)) $temp[]=$match;
if (empty($match) || $key==$lastKey) {
$result[] = (sizeof($temp)>1) ? $temp : $temp[0];
unset($temp);
}
}
print_r($result);

Related

explode string to multidimensional with regular expression

I have string like this
$string = 'title,id,user(name,email)';
and I want result to be like this
Array
(
[0] => title
[1] => id
[user] => Array
(
[0] => name
[1] => email
)
)
So far I have tried explode() with multiple for loops, but the code is getting ugly, and I think there must be a better solution using a regular expression function like preg_split.
Replace the commas inside each nested dataset with ###, then explode on the remaining commas. Then iterate over the array and split each nested dataset into its own array. Example:
$string = 'user(name,email),office(title),title,id';
$string = preg_replace_callback("|\(([a-z,]+)\)|i", function($s) {
return str_replace(",", "###", $s[0]);
}, $string);
$data = explode(',', $string);
$data = array_reduce($data, function($old, $new) {
preg_match('/(.+)\((.+)\)/', $new, $m);
if(isset($m[1], $m[2]))
{
return $old + [$m[1] => explode('###', $m[2])];
}
return array_merge($old , [$new]);
}, []);
print '<pre>';
print_r($data);
First, thanks @janie for enlightening me. I was busy for a while, but since yesterday I have learnt a bit about regular expressions and modified @janie's answer to suit my needs. Here is my code:
$string = 'user(name,email),title,id,office(title),user(name,email),title';
$commaBetweenParentheses = "|,(?=[^\(]*\))|";
$string = preg_replace($commaBetweenParentheses, '###', $string);
$array = explode(',', $string);
$stringFollowedByParentheses = '|(.+)\((.+)\)|';
$final = array();
foreach ($array as $value) {
preg_match($stringFollowedByParentheses, $value, $result);
if(!empty($result))
{
$final[$result[1]] = explode('###', $result[2]);
}
if(empty($result) && !in_array($value, $final)){
$final[] = $value;
}
}
echo "<pre>";
print_r($final);
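For comparison, the same structure can be built in a single matching pass, without the ### placeholder. This is only a sketch under two assumptions that the question does not state: names consist of word characters, and parentheses are never nested. The function name parseFields is made up for the example.

```php
<?php
// Sketch: match each top-level item and its optional "(...)" group in one pass.
// Assumes names are word characters and parentheses are never nested.
function parseFields($string)
{
    $result = array();
    preg_match_all('/(\w+)(?:\(([^()]*)\))?/', $string, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        if (isset($m[2])) {
            // "user(name,email)" becomes 'user' => ['name', 'email']
            $result[$m[1]] = explode(',', $m[2]);
        } else {
            // A plain item is appended under the next numeric key
            $result[] = $m[1];
        }
    }
    return $result;
}

print_r(parseFields('title,id,user(name,email)'));
```

Because the parenthesised part is consumed together with its name, the inner items are never matched as top-level items.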

Extracting a string starting with @ or # php

function extractConnect($str,$connect_type){
$connect_array = array();
$connect_counter = 0;
$str = trim($str).' ';
for($i =0; $i<strlen($str);$i++) {
$chr = $str[$i];
if($chr==$connect_type){ //$connect_type = '@' or '#' etc
$connectword = getConnect($i,$str);
$connect_array[$connect_counter] = $connectword;
$connect_counter++;
}
}
if(!empty($connect_array)){
return $connect_array;
}
}
function getConnect($tag_index,$str){
$str = trim($str).' ';
for($j = $tag_index; $j<strlen($str);$j++) {
$chr = $str[$j];
if($chr==' '){
$hashword = substr($str,$tag_index+1,$j-$tag_index);
return trim($hashword);
}
}
}
$at = extractConnect("#stackoverflow is great. @google.com is the best search engine","@");
$hash = extractConnect("#stackoverflow is great. @google.com is the best search engine","#");
print_r($at);
print_r($hash);
This method extracts the words prefixed with @ or # from a string and returns them as an array.
E.g. for the input #stackoverflow is great. @google.com is the best search engine it outputs this:
Array ( [0] => google.com ) Array ( [0] => stackoverflow )
But this method seems too slow. Is there an alternative?
You could use a regex to achieve this:
/<char>(\S+)\b/i
Explanation:
/ - starting delimiter
<char> - the character you're searching for (passed as a function argument)
(\S+) - any non-whitespace character, one or more times
\b - word boundary
i - case insensitivity modifier
/ - ending delimiter
Function:
function extractConnect($string, $char) {
$search = preg_quote($char, '/');
if (preg_match('/'.$search.'(\S+)\b/i', $string, $matches)) {
return [$matches[1]]; // Return an array containing the match
}
return false;
}
With your strings, this would produce the following output:
array(1) {
[0]=>
string(10) "google.com"
}
array(1) {
[0]=>
string(13) "stackoverflow"
}
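The function above stops at the first match because it uses preg_match(). If every prefixed word is wanted, a variant built on preg_match_all() could look like the following sketch. The name extractAllConnect and the sample text are assumptions for illustration, and the example assumes the two prefixes are @ and #.

```php
<?php
// Sketch: like the function above, but collects every match, not just the first.
// The name extractAllConnect is hypothetical.
function extractAllConnect($string, $char)
{
    $search = preg_quote($char, '/');
    // \S+ takes the rest of the word; \b drops trailing punctuation such as "."
    preg_match_all('/' . $search . '(\S+)\b/', $string, $matches);
    return $matches[1]; // empty array when nothing matches
}

$text = '#stackoverflow is great. @google.com and @php.net are useful. #regex';
print_r(extractAllConnect($text, '@'));
print_r(extractAllConnect($text, '#'));
```

Returning an empty array instead of false keeps the return type uniform, so callers can loop over the result without checking it first.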
You can do it like this:
<?php
function extractConnect($strSource, $tags) {
$matches = array();
$tags = str_split($tags);
$strSource = explode(' ', $strSource);
array_walk_recursive($strSource, function(&$item) {
$item = trim($item);
});
foreach ($strSource as $strPart) {
if ($strPart !== '' && in_array($strPart[0], $tags)) { // skip empty parts left by double spaces
$matches[$strPart[0]][] = substr($strPart, 1);
}
}
return $matches;
}
var_dump(extractConnect(
"#stackoverflow is great. @twitter is good. @google.com is the best search engine",
"#@"
));
Outputs:
This seemed to work for me. List the symbols you want in $symbols.
function get_stuff($str) {
$result = array();
$words = explode(' ', $str);
$symbols = array('#', '@');
foreach($words as $word) {
if ($word !== '' && in_array($word[0], $symbols)) { // skip empty words left by double spaces
$result[$word[0]][] = substr($word, 1);
}
}
return $result;
}
$str = '#stackoverflow is great. @google.com is the best search engine';
print_r(get_stuff($str));
This outputs Array ( [#] => Array ( [0] => stackoverflow ) [@] => Array ( [0] => google.com ) )

Display element of 2nd array whose suffix matched with 1st array

I have two arrays:
array('ly', 'ful', 'ay')
and
array('beautiful', 'lovely', 'power')
I want to print the elements of the second array that end with a suffix from the first array, i.e. the output should be lovely, beautiful.
How can I do this in PHP?
Try this
$suffix=array('ly','ful','ay');
$words = array('beautiful','lovely','power');
$finalarray=array();
foreach($words as $word)
{
foreach($suffix as $suff)
{
$pattern = '/'.preg_quote($suff, '/').'$/'; // escape any regex metacharacters in the suffix
if(preg_match($pattern, $word))
{
$finalarray[]=$word;
}
}
}
print_r($finalarray);
You can test online on http://writecodeonline.com/php/
Output
Array ( [0] => beautiful [1] => lovely )
This should give you what you want, assuming the order is not important in the resulting array:
$arr1 = ['ly', 'ful', 'ay'];
$arr2 = ['beautiful', 'lovely', 'power'];
$result = array_filter($arr2, function($word) use ($arr1){
$word_length = strlen($word);
return array_reduce($arr1, function($result, $suffix) use ($word, $word_length) {
if($word_length > strlen($suffix))
$result = $result || 0 === substr_compare($word, $suffix, -strlen($suffix), $word_length);
return $result;
}, false);
});
print_r($result);
/*
Array
(
[0] => beautiful
[1] => lovely
)
*/
Try array_filter() with a suitable callback. In your case I suggest regular expressions (preg_replace() or preg_match()).
<?php
header('Content-Type: text/plain');
$a = array('beautiful','lovely','power');
$b = array('ly','ful','ay');
$filters = array_map(function($filter){ return '/' . $filter . '$/'; }, $b);
$c = array_filter(
$a,
function($element)use($filters){ return $element != preg_replace($filters, '', $element); }
);
var_dump($c);
?>
Shows:
array(2) {
[0]=>
string(9) "beautiful"
[1]=>
string(6) "lovely"
}
UPDv1:
A shorter, optimized version with preg_match():
<?php
header('Content-Type: text/plain');
$a = array('beautiful','lovely','power');
$b = array('ly','ful','ay');
$filter = '/^.*(' . implode('|', $b) . ')$/';
$c = array_filter(
$a,
function($element)use($filter){ return preg_match($filter, $element); }
);
var_dump($c);
?>
Same output.
This should work:
$suffixes = array('ly','ful','ay');
$words = array('beautiful','lovely','power');
$results = array();
foreach($suffixes as $suffix){
    foreach($words as $word){
        // The word ends with the suffix if its last occurrence sits at the very end
        if(strripos($word, $suffix) === strlen($word) - strlen($suffix)){
            $results[] = $word;
        }
    }
}
print_r($results);
You could definitely optimize this and make it shorter, but it's easy to understand and a good starting point.
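One possible way to tighten the nested-loop idea, shown here only as a sketch: compare the tail of each word with substr() and stop at the first suffix that fits, so a word is never listed twice. The function name filterBySuffix is an assumption for this example.

```php
<?php
// Sketch: keep the words that end with any of the given suffixes, without regex.
function filterBySuffix(array $words, array $suffixes)
{
    $matched = array();
    foreach ($words as $word) {
        foreach ($suffixes as $suffix) {
            // Compare the tail of the word with the suffix (case-sensitive).
            if ($suffix !== '' && substr($word, -strlen($suffix)) === $suffix) {
                $matched[] = $word;
                break; // one matching suffix is enough; avoids duplicates
            }
        }
    }
    return $matched;
}

print_r(filterBySuffix(array('beautiful', 'lovely', 'power'), array('ly', 'ful', 'ay')));
```

Unlike the version above, this comparison is case-sensitive and keeps the words in their original order.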

Check if a string starts with certain words, and split it if it is

$str = 'foooo'; // <- true; how can I get 'foo' + 'oo' ?
$words = array(
'foo',
'oo'
);
What's the fastest way I could find out if $str starts with one of the words from the array, and split it if it does?
Using $words and $str from your example:
$pieces = preg_split('/^('.implode('|', $words).')/',
$str, 0, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
Result:
array(2) {
[0]=>
string(3) "foo"
[1]=>
string(2) "oo"
}
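If the words can contain characters that are special in a regex (such as . or +), each one should be escaped before it goes into the alternation. The following sketch wraps the same preg_split() call with preg_quote(); the function name splitOnPrefix and the extra sample inputs are assumptions for illustration.

```php
<?php
// Sketch: split $str into [prefix, rest] if it starts with one of $words.
function splitOnPrefix($str, array $words)
{
    // Escape each word so characters like "." or "+" are matched literally.
    $alternatives = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $words);
    $pattern = '/^(' . implode('|', $alternatives) . ')/';
    // DELIM_CAPTURE keeps the matched prefix; NO_EMPTY drops the empty piece before it.
    return preg_split($pattern, $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
}

var_dump(splitOnPrefix('foooo', array('foo', 'oo')));
var_dump(splitOnPrefix('a.b+c', array('a.b')));
var_dump(splitOnPrefix('bar', array('foo', 'oo')));
```

When no word matches, the result is a one-element array holding the original string, so counting the elements tells you whether a split happened.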
Try:
<?php
function helper($str, $words) {
foreach ($words as $word) {
if (substr($str, 0, strlen($word)) == $word) {
return array(
$word,
substr($str, strlen($word))
);
}
}
return null;
}
$words = array(
'foo',
'moo',
'whatever',
);
$str = 'foooo';
print_r(helper($str, $words));
Output:
Array
(
[0] => foo
[1] => oo
)
This solution iterates through the $words array and checks if $str starts with any words in it. If it finds a match, it reduces $str to $w and breaks.
foreach ($words as $w) {
if ($w == substr($str, 0, strlen($w))) {
$str=$w;
break;
}
}
string[] MaybeSplitString(string[] searchArray, string predicate)
{
    foreach (string str in searchArray)
    {
        if (predicate.StartsWith(str))
            // Split off only the leading occurrence of str
            return new string[] { str, predicate.Substring(str.Length) };
    }
    // No prefix matched: return the input as a single-element array
    return new string[] { predicate };
}
This is C# and needs translating into PHP, but it should point you in the right direction.
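A rough PHP translation of that idea might look like this. It is a sketch, not the answerer's code: the function and parameter names are illustrative, and strncmp() stands in for StartsWith.

```php
<?php
// Sketch of a PHP translation (the function name is illustrative).
function maybeSplitString(array $searchArray, $subject)
{
    foreach ($searchArray as $word) {
        // strncmp compares only the first strlen($word) bytes of both strings
        if ($word !== '' && strncmp($subject, $word, strlen($word)) === 0) {
            return array($word, substr($subject, strlen($word)));
        }
    }
    return array($subject); // no prefix matched: return the string unsplit
}

print_r(maybeSplitString(array('foo', 'oo'), 'foooo'));
print_r(maybeSplitString(array('moo'), 'foooo'));
```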

String manipulation/parsing in PHP

I've a string in the following format:
John Bo <jboe@gmail.com>, abracadbra@gmail.com, <asking@gmail.com>...
How can I parse the above string in PHP and just get the email addresses? Is there an easy way to parse?
=Rajesh=
You could of course just use a regex on the string, but the RFC-compliant regex is a monster of a thing.
It would also fail in the unlikely (but possible) event of a@b.com <b@a.com> (unless you really want both extracted in that case).
$str = 'John Bo <jboe@gmail.com>, abracadbra@gmail.com, <asking@gmail.com>';
$items = explode(',', $str);
$items = array_map('trim', $items);
$emails = array();
foreach($items as $item) {
preg_match_all('/<(.*?)>/', $item, $matches);
if (empty($matches[1])) {
$emails[] = $item;
continue;
}
$emails[] = $matches[1][0];
}
var_dump($emails);
Output
array(3) {
[0]=>
string(14) "jboe@gmail.com"
[1]=>
string(20) "abracadbra@gmail.com"
[2]=>
string(16) "asking@gmail.com"
}
No loops!
$str = 'John Bo <jboe@gmail.com>, abracadbra@gmail.com, <asking@gmail.com>';
$extracted_emails = array_map(function ($v) {
    $parts = explode('<', $v); // end() needs a variable, not an expression
    return trim(end($parts), '> ');
}, explode(',', $str));
print_r($extracted_emails);
Requires PHP 5.3+ (for the closure).
The most straightforward way (also, I am terrible at regex) would be plain string functions:
<?php
$emailstring = "John Bo <jboe@gmail.com>,<other@email.com>, abracadbra@gmail.com, <asking@gmail.com>";
$emails = explode(',',$emailstring);
for ($i = 0; $i < count($emails); $i++) {
if (strpos($emails[$i], '<') !== false) {
$emails[$i] = substr($emails[$i], strpos($emails[$i], '<')+1);
$emails[$i] = str_replace('>','',$emails[$i]);
}
$emails[$i] = trim($emails[$i]);
}
print_r($emails);
?>
http://codepad.org/6lKkGBRM
Use int preg_match_all(string pattern, string subject, array matches, int flags), which searches "subject" for all matches of the regex pattern (Perl format), fills the array "matches" with all matches of the regex, and returns the number of matches.
See http://www.regular-expressions.info/php.html
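A minimal sketch of that approach, assuming the addresses use the usual @ separator. The pattern here is an assumption chosen for brevity: it is deliberately simple and not RFC 5322 compliant, so treat it as a starting point only.

```php
<?php
// Sketch: pull every address-like token out of the string with preg_match_all.
// The pattern is deliberately simple and is NOT RFC 5322 compliant.
$str = 'John Bo <jboe@gmail.com>, abracadbra@gmail.com, <asking@gmail.com>';

// local part, "@", then a domain that contains at least one dot;
// whitespace, angle brackets and commas end a token.
$count = preg_match_all('/[^\s<>,@]+@[^\s<>,@]+\.[^\s<>,@]+/', $str, $matches);

echo $count, "\n";      // number of matches found
print_r($matches[0]);   // the full matches
```

With no capture groups in the pattern, $matches[0] already holds the addresses, so no flags are needed.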
