Extracting a string starting with @ or # in PHP

function extractConnect($str,$connect_type){
$connect_array = array();
$connect_counter = 0;
$str = trim($str).' ';
for($i =0; $i<strlen($str);$i++) {
$chr = $str[$i];
if($chr==$connect_type){ //$connect_type = '@' or '#' etc
$connectword = getConnect($i,$str);
$connect_array[$connect_counter] = $connectword;
$connect_counter++;
}
}
if(!empty($connect_array)){
return $connect_array;
}
}
function getConnect($tag_index,$str){
$str = trim($str).' ';
for($j = $tag_index; $j<strlen($str);$j++) {
$chr = $str[$j];
if($chr==' '){
$hashword = substr($str,$tag_index+1,$j-$tag_index);
return trim($hashword);
}
}
}
$at = extractConnect("#stackoverflow is great. @google.com is the best search engine","@");
$hash = extractConnect("#stackoverflow is great. @google.com is the best search engine","#");
print_r($at);
print_r($hash);
This method extracts the words that start with @ or # from a string and returns them as an array.
For example, the input #stackoverflow is great. @google.com is the best search engine produces this output:
Array ( [0] => google.com ) Array ( [0] => stackoverflow )
But this method seems too slow. Is there an alternative?

You could use a regex to achieve this:
/<char>(\S+)\b/i
Explanation:
/ - starting delimiter
<char> - the character you're searching for (passed as a function argument)
(\S+) - any non-whitespace character, one or more times
\b - word boundary
/ - ending delimiter
i - case insensitivity modifier
Function:
function extractConnect($string, $char) {
$search = preg_quote($char, '/');
if (preg_match('/'.$search.'(\S+)\b/i', $string, $matches)) {
return [$matches[1]]; // Return an array containing the match
}
return false;
}
With your strings, this would produce the following output:
array(1) {
[0]=>
string(10) "google.com"
}
array(1) {
[0]=>
string(13) "stackoverflow"
}
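Note that the function above uses preg_match(), so it returns only the first word it finds. If every occurrence is wanted, a minimal sketch along these lines might help. The name extractAllConnect is hypothetical, and the sample text is an assumption made up for illustration:

```php
<?php
// Hypothetical variant of extractConnect() that collects every match.
function extractAllConnect(string $string, string $char): array
{
    // Escape the symbol so characters such as "." or "+" stay literal.
    $search = preg_quote($char, '/');

    // preg_match_all() fills $matches[1] with each captured word.
    preg_match_all('/' . $search . '(\S+)\b/', $string, $matches);

    return $matches[1];
}

$text = '#stackoverflow is great. @google.com is the best search engine. #php';
print_r(extractAllConnect($text, '#')); // stackoverflow, php
print_r(extractAllConnect($text, '@')); // google.com
```

An empty array (rather than false) is returned when nothing matches, which keeps a foreach over the result safe.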

You can do it like this:
<?php
function extractConnect($strSource, $tags) {
$matches = array();
$tags = str_split($tags);
$strSource = explode(' ', $strSource);
array_walk_recursive($strSource, function(&$item) {
$item = trim($item);
});
foreach ($strSource as $strPart) {
if (in_array($strPart[0], $tags)) {
$matches[$strPart[0]][] = substr($strPart, 1);
}
}
return $matches;
}
var_dump(extractConnect(
"#stackoverflow is great. #twitter is good. @google.com is the best search engine",
"@#"
));
Outputs:

This seemed to work for me. Provide it with the symbol you want.
function get_stuff($str) {
    $result = array();
    $words = explode(' ', $str);
    $symbols = array('@', '#');
    foreach($words as $word) {
        if (in_array($word[0], $symbols)) {
            $result[$word[0]][] = substr($word, 1);
        }
    }
    return $result;
}
$str = '#stackoverflow is great. @google.com is the best search engine';
print_r(get_stuff($str));
This outputs Array ( [#] => Array ( [0] => stackoverflow ) [@] => Array ( [0] => google.com ) )
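One caveat with splitting on a single space: two consecutive spaces produce an empty word, and reading $word[0] on an empty string raises a warning. A rough sketch that splits on runs of whitespace instead could look like this. The name groupBySymbol and its default symbol list are assumptions for illustration only:

```php
<?php
// Hypothetical helper: group the tagged words by their leading symbol.
function groupBySymbol(string $str, array $symbols = array('@', '#')): array
{
    $result = array();

    // Split on runs of whitespace and drop empty pieces,
    // so every $word has at least one character.
    $words = preg_split('/\s+/', $str, -1, PREG_SPLIT_NO_EMPTY);

    foreach ($words as $word) {
        // Skip a bare symbol that has no word attached to it.
        if (strlen($word) > 1 && in_array($word[0], $symbols, true)) {
            $result[$word[0]][] = substr($word, 1);
        }
    }

    return $result;
}

print_r(groupBySymbol('#stackoverflow  is great. @google.com is the best search engine'));
```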

Related

PHP: String to multidimensional array

(Sorry for my bad English)
I have a string that I want to split into an array.
The square brackets denote nested arrays, possibly several levels deep.
Escaped characters should be preserved.
This is a sample string:
$string = '[[["Hello, \"how\" are you?","Good!",,,123]],,"ok"]';
The result structure should look like this:
array (
0 =>
array (
0 =>
array (
0 => 'Hello, \"how\" are you?',
1 => 'Good!',
2 => '',
3 => '',
4 => '123',
),
),
1 => '',
2 => 'ok',
)
I have tested it with:
$pattern = '/[^"\\]*(?:\\.[^"\\]*)*/s';
$return = preg_match_all($pattern, $string, null);
But this did not work properly. I do not understand these RegEx patterns (I found this in another example on this page).
I do not know whether preg_match_all is the correct command.
I hope someone can help me.
Many Thanks!!!

This is a tough one for a regex - but there is a hack answer to your question (apologies in advance).
The string is almost a valid array literal, except for the empty slots (,,). You can match every comma that is directly followed by another comma and replace it with ,'' using
/,(?=,)/
Then you can eval that string into the output array you are looking for.
For example:
// input
$str1 = '[[["Hello, \\"how\\" are you?","Good!",,,123]],,"ok"]';
// replace , followed by , with ,'' with a regex
$pattern = '/,(?=,)/';
$replace = ",''";
$str2 = preg_replace($pattern, $replace, $str1);
// eval updated string
$arr = eval("return $str2;");
var_dump($arr);
I get this:
array(3) {
[0]=>
array(1) {
[0]=>
array(5) {
[0]=>
string(21) "Hello, "how" are you?"
[1]=>
string(5) "Good!"
[2]=>
string(0) ""
[3]=>
string(0) ""
[4]=>
int(123)
}
}
[1]=>
string(0) ""
[2]=>
string(2) "ok"
}
Edit
Given the inherent dangers of eval, the better option is to use json_decode with the code above, e.g.:
// input
$str1 = '[[["Hello, \\"how\\" are you?","Good!",,,123]],,"ok"]';
// replace , followed by , with ,'' with a regex
$pattern = '/,(?=,)/';
$replace = ',""';
$str2 = preg_replace($pattern, $replace, $str1);
// decode updated string as JSON
$arr = json_decode($str2);
var_dump($arr);
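The json_decode approach can be wrapped in a small function that also reports malformed input. This is only a sketch: the name parseBracketString is hypothetical, and it assumes that no quoted string in the input contains two consecutive commas, because the replacement would alter those as well:

```php
<?php
// Hypothetical helper: fill the empty slots, then decode as JSON.
function parseBracketString(string $input): array
{
    // Insert "" after every comma that is directly followed by another comma.
    $json = preg_replace('/,(?=,)/', ',""', $input);

    // The second argument makes JSON objects decode to associative arrays.
    $data = json_decode($json, true);

    if (json_last_error() !== JSON_ERROR_NONE || !is_array($data)) {
        throw new InvalidArgumentException('Not valid JSON: ' . json_last_error_msg());
    }

    return $data;
}

$parsed = parseBracketString('[[["Hello, \\"how\\" are you?","Good!",,,123]],,"ok"]');
var_dump($parsed[0][0][4]); // int(123)
var_dump($parsed[2]);       // string(2) "ok"
```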

If you can edit the code that serializes the data then it's a better idea to let the serialization be handled using json_encode & json_decode. No need to reinvent the wheel on this one.
Nice cat btw.

You might want to use a lexer in combination with a recursive function that actually builds the structure.
For your purpose, the following tokens have been used:
\[ # opening bracket
\] # closing bracket
".+?(?<!\\)" # " to ", making sure it's not escaped
,(?!,) # a comma, not followed by a comma
\d+ # at least one digit
,(?=,) # a comma followed by a comma
The rest is programming logic, see a demo on ideone.com. Inspired by this post.
class Lexer {
protected static $_terminals = array(
'~^(\[)~' => "T_OPEN",
'~^(\])~' => "T_CLOSE",
'~^(".+?(?<!\\\\)")~' => "T_ITEM",
'~^(,)(?!,)~' => "T_SEPARATOR",
'~^(\d+)~' => "T_NUMBER",
'~^(,)(?=,)~' => "T_EMPTY"
);
public static function run($line) {
$tokens = array();
$offset = 0;
while($offset < strlen($line)) {
$result = static::_match($line, $offset);
if($result === false) {
throw new Exception("Unable to parse input at offset " . $offset . ".");
}
$tokens[] = $result;
$offset += strlen($result['match']);
}
return static::_generate($tokens);
}
protected static function _match($line, $offset) {
$string = substr($line, $offset);
foreach(static::$_terminals as $pattern => $name) {
if(preg_match($pattern, $string, $matches)) {
return array(
'match' => $matches[1],
'token' => $name
);
}
}
return false;
}
// a recursive function to actually build the structure
protected static function _generate($arr=array(), $idx=0) {
$output = array();
$current = 0;
for($i=$idx;$i<count($arr);$i++) {
$type = $arr[$i]["token"];
$element = $arr[$i]["match"];
switch ($type) {
case 'T_OPEN':
list($out, $index) = static::_generate($arr, $i+1);
$output[] = $out;
$i = $index;
break;
case 'T_CLOSE':
return array($output, $i);
break;
case 'T_ITEM':
case 'T_NUMBER':
$output[] = $element;
break;
case 'T_EMPTY':
$output[] = "";
break;
}
}
return $output;
}
}
$input = '[[["Hello, \"how\" are you?","Good!",,,123]],,"ok"]';
$items = Lexer::run($input);
print_r($items);
?>

Display element of 2nd array whose suffix matched with 1st array

I have two arrays, i.e.:
array('ly', 'ful', 'ay')
and
array('beautiful', 'lovely', 'power')
I want to print the elements of the second array that end with a suffix from the first array, i.e. the output should be lovely, beautiful.
How can I do this in PHP?

Try this
$suffix=array('ly','ful','ay');
$words = array('beautiful','lovely','power');
$finalarray=array();
foreach($words as $word)
{
foreach($suffix as $suff)
{
$pattern = '/'.$suff.'$/';
if(preg_match($pattern, $word))
{
$finalarray[]=$word;
}
}
}
print_r($finalarray);
You can test online on http://writecodeonline.com/php/
Output
Array ( [0] => beautiful [1] => lovely )

This should give you what you want, assuming the order is not important in the resulting array:
$arr1 = ['ly', 'ful', 'ay'];
$arr2 = ['beautiful', 'lovely', 'power'];
$result = array_filter($arr2, function($word) use ($arr1){
$word_length = strlen($word);
return array_reduce($arr1, function($result, $suffix) use ($word, $word_length) {
if($word_length > strlen($suffix))
$result = $result || 0 === substr_compare($word, $suffix, -strlen($suffix), $word_length);
return $result;
}, false);
});
print_r($result);
/*
Array
(
[0] => beautiful
[1] => lovely
)
*/

Try array_filter() with a suitable callback. In your case, I suggest looking at regular expressions (preg_replace() or preg_match()).
<?php
header('Content-Type: text/plain');
$a = array('beautiful','lovely','power');
$b = array('ly','ful','ay');
$filters = array_map(function($filter){ return '/' . $filter . '$/'; }, $b);
$c = array_filter(
$a,
function($element)use($filters){ return $element != preg_replace($filters, '', $element); }
);
var_dump($c);
?>
Shows:
array(2) {
[0]=>
string(9) "beautiful"
[1]=>
string(6) "lovely"
}
Update:
A shorter, more optimized version using preg_match():
<?php
header('Content-Type: text/plain');
$a = array('beautiful','lovely','power');
$b = array('ly','ful','ay');
$filter = '/^.*(' . implode('|', $b) . ')$/';
$c = array_filter(
$a,
function($element)use($filter){ return preg_match($filter, $element); }
);
var_dump($c);
?>
Same output.

This should work:
$suffixes = array('ly','ful','ay');
$words = array('beautiful','lovely','power');
$results = array(); // initialize so print_r() works even with no matches
foreach($suffixes as $suffix){
    foreach($words as $word){
        // compare the end of the word with the suffix
        if(substr($word, -strlen($suffix)) === $suffix){
            $results[] = $word;
        }
    }
}
print_r($results);
You could definitely optimize this and make it shorter, but it's easy to understand and a good starting point.
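As one possible refinement, the suffix check can be wrapped in a function that uses substr_compare() and stops at the first matching suffix, so a word is never listed twice. The name filterBySuffix is hypothetical; this is a sketch, not a benchmarked solution:

```php
<?php
// Hypothetical helper: keep the words that end with any of the suffixes.
function filterBySuffix(array $words, array $suffixes): array
{
    $matched = array();
    foreach ($words as $word) {
        foreach ($suffixes as $suffix) {
            $len = strlen($suffix);
            // A negative offset makes substr_compare() start $len bytes from the end.
            if ($len > 0 && $len <= strlen($word)
                && substr_compare($word, $suffix, -$len) === 0) {
                $matched[] = $word;
                break; // one matching suffix is enough; avoids duplicates
            }
        }
    }
    return $matched;
}

print_r(filterBySuffix(array('beautiful', 'lovely', 'power'), array('ly', 'ful', 'ay')));
```

The words come back in the order of the input array, so this prints beautiful and then lovely.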

Regex hash and colons

I want to use regular expression to filter substrings from this string
eg: hello world #level:basic #lang:java:php #...
I am trying to produce an array with a structure like this:
Array
(
[0]=> hello world
[1]=> Array
(
[0]=> level
[1]=> basic
)
[2]=> Array
(
[0]=> lang
[1]=> java
[2]=> php
)
)
I have tried preg_match("/(.*)#(.*)[:(.*)]*/", $input_line, $output_array);
and what I have got is:
Array
(
[0] => hello world #level:basic #lang:java:php
[1] => hello world #level:basic
[2] => lang:java:php
)
In this case I would have to apply this regex several times to the resulting elements and then apply another regex to split on the colons. My question is: is it possible to write a better regex that does it all in one go? What would that regex be? Thanks

You can use:
$array = explode("#", "hello world #level:basic #lang:java:php");
foreach($array as $k => &$v) {
$v = strpos($v, ":") === false ? $v : explode(":", $v);
}
print_r($array);

Do this:
$array = array() ;
$text = "hello world #level:basic #lang:java:php";
$array = explode("#", $text);
foreach($array as $i => $value){
$array[$i] = explode(":", trim($value));
}
print_r($array);

Got something for you:
Rules:
a tag begins with #
a tag may not contain whitespace/newline
a tag is preceded and followed by whitespace or line beginning/ending
a tag can have several parts divided by :
Example:
#this:tag:matches this is some text #a-tag this is no tag: \#escaped
and this one tag#does:not:match
Function:
<?php
function parseTags($string)
{
static $tag_regex = '@(?<=\s|^)#([^\:\s]+)(?:\:([^\s]+))*(?=\s|$)@m';
$results = array();
preg_match_all($tag_regex, $string, $results, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);
$tags = array();
foreach($results as $result) {
$tag = array(
'offset' => $result[0][1],
'raw' => $result[0][0],
'length' => strlen($result[0][0]),
0 => $result[1][0]);
if(isset($result[2]))
$tag = array_merge($tag, explode(':', $result[2][0]));
$tag['elements'] = count($tag)-3;
$tags[] = $tag;
}
return $tags;
}
?>
Result:
array(2) {
[0]=>array(7) {
["offset"]=>int(0)
["raw"]=>string(17) "#this:tag:matches"
["length"]=>int(17)
[0]=>string(4) "this"
[1]=>string(3) "tag"
[2]=>string(7) "matches"
["elements"]=>int(3)
}
[1]=>array(5) {
["offset"]=>int(36)
["raw"]=>string(6) "#a-tag"
["length"]=>int(6)
[0]=>string(5) "a-tag"
["elements"]=>int(1)
}
}
Each matched tag contains
the raw tag text
the tag offset and original length (e.g. to replace it in the string later with str... functions)
the number of elements (to safely iterate for($i = 0; $i < $tag['elements']; $i++))

This might work for you:
$results = array() ;
$text = "hello world #level:basic #lang:java:php" ;
$parts = explode("#", $text);
foreach($parts as $part){
$results[] = explode(":", $part);
}
var_dump($results);

Two ways using regex. Note that you still need explode(), since PCRE keeps only the last value captured by a repeated group:
$string = 'hello world #level:basic #lang:java:php';
preg_match_all('/(?<=#)[\w:]+/', $string, $m);
foreach($m[0] as $v){
$example1[] = explode(':', $v);
}
print_r($example1);
// This one needs PHP 5.3+
$example2 = array();
preg_replace_callback('/(?<=#)[\w:]+/', function($m)use(&$example2){
$example2[] = explode(':', $m[0]);
}, $string);
print_r($example2);
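The limitation that makes explode() necessary can be seen in a short experiment. This is a sketch with a made-up single-tag input, not part of the answer above:

```php
<?php
// The group (\w+) after the colon repeats twice, but only its last value survives.
preg_match('/#(\w+)(?::(\w+))*/', '#lang:java:php', $m);
print_r($m);
// $m[1] is "lang" and $m[2] is "php"; "java" has been overwritten.

// Matching the whole tag and splitting it afterwards keeps every part.
preg_match('/#([\w:]+)/', '#lang:java:php', $whole);
$parts = explode(':', $whole[1]);
print_r($parts); // lang, java, php
```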

This gives you the array structure you are looking for:
<pre><?php
$subject = 'hello world #level:basic #lang:java:php';
$array = explode('#', $subject);
foreach($array as &$value) {
$items = explode(':', trim($value));
if (sizeof($items)>1) $value = $items;
}
print_r($array);
But if you prefer you can use this abomination:
$subject = 'hello world #level:basic #lang:java:php';
$pattern = '~(?:^| ?+#)|(?:\G([^#:]+?)(?=:| #|$)|:)+~';
preg_match_all($pattern, $subject, $matches);
array_shift($matches[1]);
$lastKey = sizeof($matches[1])-1;
foreach ($matches[1] as $key=>$match) {
if (!empty($match)) $temp[]=$match;
if (empty($match) || $key==$lastKey) {
$result[] = (sizeof($temp)>1) ? $temp : $temp[0];
unset($temp);
}
}
print_r($result);

Check if a string starts with certain words, and split it if it is

$str = 'foooo'; // <- true; how can I get 'foo' + 'oo' ?
$words = array(
'foo',
'oo'
);
What's the fastest way I could find out if $str starts with one of the words from the array, and split it if it does?

Using $words and $str from your example:
$pieces = preg_split('/^('.implode('|', $words).')/',
$str, 0, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
Result:
array(2) {
[0]=>
string(3) "foo"
[1]=>
string(2) "oo"
}
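If the words can contain regex metacharacters, they should be escaped before being joined into the pattern. A minimal sketch, assuming a hypothetical wrapper named splitOnPrefix:

```php
<?php
// Hypothetical helper: split $str after the first matching prefix word.
function splitOnPrefix(string $str, array $words): array
{
    // Quote each word so characters such as "." or "+" are matched literally.
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $words);

    $pattern = '/^(' . implode('|', $quoted) . ')/';

    return preg_split($pattern, $str, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
}

print_r(splitOnPrefix('foooo', array('foo', 'oo'))); // foo, oo
print_r(splitOnPrefix('c++11', array('c++')));       // c++, 11
```

Alternatives are tried from left to right, so if one word is a prefix of another (say fo and foo), the order of $words decides which one is split off.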

Try:
<?php
function helper($str, $words) {
foreach ($words as $word) {
if (substr($str, 0, strlen($word)) == $word) {
return array(
$word,
substr($str, strlen($word))
);
}
}
return null;
}
$words = array(
'foo',
'moo',
'whatever',
);
$str = 'foooo';
print_r(helper($str, $words));
Output:
Array
(
[0] => foo
[1] => oo
)

This solution iterates through the $words array and checks if $str starts with any words in it. If it finds a match, it reduces $str to $w and breaks.
foreach ($words as $w) {
if ($w == substr($str, 0, strlen($w))) {
$str=$w;
break;
}
}

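string[] MaybeSplitString(string[] searchArray, string predicate)
{
    foreach (string str in searchArray)
    {
        if (predicate.StartsWith(str))
            // split off only the leading occurrence of str
            return new string[] { str, predicate.Substring(str.Length) };
    }
    // no prefix matched: return the input unchanged
    return new string[] { predicate };
}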
This will need translation from C# into PHP, but this should point you in the right direction.

A pattern-matching function in PHP

I'm looking for a function, class, or collection of functions that helps with pattern matching on strings. My project requires a fair amount of pattern matching, and I'd like something easier to read and maintain than raw preg_replace (or regex, period).
I've provided a pseudo example in hopes that it will help you understand what I'm asking.
$subject = '$2,500 + $550 on-time bonus, paid 50% upfront ($1,250), 50% on delivery ($1,250 + on-time bonus).';
$pattern = '$n,nnn';
pattern_match($subject, $pattern, 0);
would return "$2,500".
$subject = '$2,500 + $550 on-time bonus, paid 50% upfront ($1,250), 50% on delivery ($1,250 + on-time bonus).';
$pattern = '$n,nnn';
pattern_match($subject, $pattern, 1);
would return an array with the values: [$2,500], [$1,250], [$1,250]
The function — as I'm trying to write — uses 'n' for numbers, 'c' for lower-case alpha and 'C' for upper-case alpha where any non-alphanumeric character represents itself.
Any help would be appreciated.

<?php
// $match_all = false: returns string with first match
// $match_all = true: returns array of strings with all matches
function pattern_match($subject, $pattern, $match_all = false)
{
$pattern = preg_quote($pattern, '|');
$ar_pattern_replaces = array(
'n' => '[0-9]',
'c' => '[a-z]',
'C' => '[A-Z]',
);
$pattern = strtr($pattern, $ar_pattern_replaces);
$pattern = "|".$pattern."|";
if ($match_all)
{
preg_match_all($pattern, $subject, $matches);
}
else
{
preg_match($pattern, $subject, $matches);
}
return $matches[0];
}
$subject = '$2,500 + $550 on-time bonus, paid 50% upfront ($1,250), 50% on delivery ($1,250 + on-time bonus).';
$pattern = '$n,nnn';
$result = pattern_match($subject, $pattern, 0);
var_dump($result);
$result = pattern_match($subject, $pattern, 1);
var_dump($result);
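To see what the function above actually hands to PCRE, the translation step can be pulled out on its own. The helper name pattern_to_regex is hypothetical; the sketch only mirrors the preg_quote() and strtr() calls shown above:

```php
<?php
// Hypothetical helper that exposes the regex built from a pattern.
function pattern_to_regex(string $pattern): string
{
    // Escape regex metacharacters such as "$" first.
    $quoted = preg_quote($pattern, '|');

    // strtr() with an array replaces all keys in a single pass.
    $body = strtr($quoted, array('n' => '[0-9]', 'c' => '[a-z]', 'C' => '[A-Z]'));

    return '|' . $body . '|';
}

echo pattern_to_regex('$n,nnn'), "\n"; // |\$[0-9],[0-9][0-9][0-9]|

$subject = '$2,500 + $550 on-time bonus, paid 50% upfront ($1,250), 50% on delivery ($1,250 + on-time bonus).';
preg_match_all(pattern_to_regex('$n,nnn'), $subject, $m);
print_r($m[0]); // $2,500, $1,250, $1,250
```

Because n, c and C are always translated, a pattern written this way cannot contain those three letters as literal characters.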

Here is a function without regular expressions that should work ('C' and 'c' recognize only ASCII characters). Enjoy:
function pattern_match($subject, $pattern, $result_as_array) {
$pattern_len = strlen($pattern);
if ($pattern_len==0) return false; // error: empty pattern
// translate $subject into the symbols of the rule ('n', 'c' or 'C')
$translate = '';
$subject_len = strlen($subject);
for ($i=0 ; $i<$subject_len ; $i++) {
$x = $subject[$i];
$ord = ord($x);
if ( ($ord>=48) && ($ord<=57) ) { // between 0 and 9
$translate .= 'n';
} elseif ( ($ord>=65) && ($ord<=90) ) { // between A and Z
$translate .= 'C';
} elseif ( ($ord>=97) && ($ord<=122) ) { // between a and z
$translate .= 'c';
} else {
$translate .= $x; // other characters are not translated
}
}
// now search all positions in the translated string
// single result mode
if (!$result_as_array) {
$p = strpos($translate, $pattern);
if ($p===false) {
return false;
} else {
return substr($subject, $p, $pattern_len);
}
}
// array result mode
$result = array();
$p = 0;
$n = 0;
while ( ($p<$subject_len) && (($p=strpos($translate,$pattern,$p))!==false) ) {
$result[] = substr($subject, $p, $pattern_len);
$p = $p + $pattern_len;
}
return $result;
}

Update: This is an incomplete answer that doesn't hold up against several test patterns. See @Frosty Z's answer for a better solution.
<?php
function pattern_match($s, $p, $c=0) {
$tokens = array(
'$' => '\$',
'n' => '\d{1}',
'c' => '[a-z]{1}',
'C' => '[A-Z]{1}'
);
$reg = '/' . str_replace(array_keys($tokens), array_values($tokens), $p) . '/';
if ($c == 0) {
preg_match($reg, $s, $matches);
} else {
preg_match_all($reg, $s, $matches);
}
return $matches[0];
}
$subject = "$2,500 + $550 on-time bonus, paid 50% upfront ($1,250), 50% on delivery ($1,250 + on-time bonus).";
$pattern = '$n,nnn';
print_r(pattern_match($subject, $pattern, 0));
print_r(pattern_match($subject, $pattern, 1));
$pattern = 'cc-cccc';
print_r(pattern_match($subject, $pattern));
print_r(pattern_match($subject, $pattern, 1));
?>
Output:
$2,500
Array
(
[0] => $2,500
[1] => $1,250
[2] => $1,250
)
on-time
Array
(
[0] => on-time
[1] => on-time
)
Note: Make sure to use single-quotes for your $pattern when it contains $, or PHP will try to parse it as a $variable.

The function you're looking for is preg_match_all, although you'll need to write regex patterns for your pattern matching.

Sorry, but this is a problem for regex. I understand your objections, but there's just no other way as efficient or simple in this case. This is an extremely simple matching problem. You could write a custom wrapper as jnpcl demonstrated, but that would only involve more code and more potential pitfalls. Not to mention extra overhead.
