Get the difference between two strings - php

I am creating a wildcard search/replace function and need to find the difference between two strings. I have tried functions like array_diff and preg_match, and browsed my way through ~10 Google pages without finding a solution.
I have a simple solution right now, but want to add support for an unknown value directly before the wildcard.
Here's what I got:
function wildcard_search($string, $wildcard) {
$wildcards = array();
$regex = "/( |_|-|\/|-|\.|,)/";
$split_string = preg_split($regex, $string);
$split_wildcard = preg_split($regex, $wildcard);
foreach($split_wildcard as $key => $value) {
if(isset($split_string[$key]) && $split_string[$key] != $value) {
$wildcards[] = $split_string[$key];
}
}
return $wildcards;
}
Example usage:
$str1 = "I prefer Microsoft products to Apple but love Linux"; //original string
$str2 = "I prefer * products to * but love *"; //wildcard search
$value = wildcard_search($str1, $str2);
//$value should now be array([0] => "Microsoft", [1] => "Apple", [2] => "Linux");
shuffle($value);
vprintf('I prefer %s products to %s but love %s', $value);
// now we can get all kinds of outputs like:
// I prefer Microsoft products to Linux but love Apple
// I prefer Apple products to Microsoft but love Linux
// I prefer Linux products to Apple but love Microsoft
// etc..
I want to implement support for unknown value before the wildcard.
Example:
$value = wildcard_search('Stackoverflow is an awesome site', 'Stack* is an awesome site');
// $value should now be array([0] => 'overflow');
// Because the wildcard (*) represents overflow in the second string
// (We already know some parts of the string but want to find the rest)
Could this be done without too much hassle, i.e. without hundreds of loops etc.?

I'd change your function to use preg_quote and replace the escaped \* character with (.*?) instead:
function wildcard_search($string, $wildcard, $caseSensitive = false) {
// Pass the delimiter to preg_quote so a literal "/" in the wildcard is escaped too
$regex = '/^' . str_replace('\*', '(.*?)', preg_quote($wildcard, '/')) . '$/' . (!$caseSensitive ? 'i' : '');
if (preg_match($regex, $string, $matches)) {
return array_slice($matches, 1); //Cut away the full string (position 0)
}
return false; //We didn't find anything
}
Example:
<?php
$str1 = "I prefer Microsoft products to Apple but love Linux"; //original string
$str2 = "I prefer * products to * but love *"; //wildcard search
var_dump( wildcard_search($str1, $str2) );
$str1 = 'Stackoverflow is an awesome site';
$str2 = 'Stack* is an awesome site';
var_dump( wildcard_search($str1, $str2) );
$str1 = 'Foo';
$str2 = 'bar';
var_dump( wildcard_search($str1, $str2) );
?>
Output:
array(3) {
[0]=>
string(9) "Microsoft"
[1]=>
string(5) "Apple"
[2]=>
string(5) "Linux"
}
array(1) {
[0]=>
string(8) "overflow"
}
bool(false)
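The question also wants to put values back into the wildcard string. A minimal sketch of that round trip follows, assuming the same `*` wildcard convention; the helper names `wildcard_capture` and `wildcard_fill` are hypothetical and not part of the answer above.

```php
<?php
// Hypothetical helper: capture what each * stands for (null if no match).
function wildcard_capture(string $string, string $wildcard): ?array {
    $regex = '/^' . str_replace('\*', '(.*?)', preg_quote($wildcard, '/')) . '$/s';
    return preg_match($regex, $string, $m) ? array_slice($m, 1) : null;
}

// Hypothetical helper: substitute values back into the same template.
function wildcard_fill(string $wildcard, array $values): string {
    // Escape literal % first so vsprintf() does not read it as a directive,
    // then turn every * into a %s placeholder.
    $format = str_replace('*', '%s', str_replace('%', '%%', $wildcard));
    return vsprintf($format, $values);
}

$template = 'I prefer * products to * but love *';
$found = wildcard_capture('I prefer Microsoft products to Apple but love Linux', $template);
$filled = wildcard_fill($template, array_reverse($found));
echo $filled, PHP_EOL;
```

With the sample sentence, `$found` holds Microsoft, Apple and Linux, so the reversed fill reads "I prefer Linux products to Apple but love Microsoft". The number of values must equal the number of wildcards, otherwise `vsprintf()` fails.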

Related

PHP: String to multidimensional array

(Sorry for my bad English)
I have a string that I want to split into an array.
The square brackets denote nested arrays, which can be several levels deep.
Escaped characters should be preserved.
This is a sample string:
$string = '[[["Hello, \"how\" are you?","Good!",,,123]],,"ok"]';
The result structure should look like this:
array (
0 =>
array (
0 =>
array (
0 => 'Hello, \"how\" are you?',
1 => 'Good!',
2 => '',
3 => '',
4 => '123',
),
),
1 => '',
2 => 'ok',
)
I have tested it with:
$pattern = '/[^"\\]*(?:\\.[^"\\]*)*/s';
$return = preg_match_all($pattern, $string, null);
But this did not work properly. I do not understand these RegEx patterns (I found this in another example on this page).
I do not know whether preg_match_all is the correct command.
I hope someone can help me.
Many Thanks!!!
This is a tough one for a regex - but there is a hack answer to your question (apologies in advance).
The string is almost a valid PHP array literal, except for the empty elements written as consecutive commas (,,). You can match every comma that is followed by another comma and replace it with ,'' using
/,(?=,)/
Then you can eval that string into the output array you are looking for.
For example:
// input
$str1 = '[[["Hello, \\"how\\" are you?","Good!",,,123]],,"ok"]';
// replace , followed by , with ,'' with a regex
$pattern = '/,(?=,)/';
$replace = ",''";
$str2 = preg_replace($pattern, $replace, $str1);
// eval updated string
$arr = eval("return $str2;");
var_dump($arr);
I get this:
array(3) {
[0]=>
array(1) {
[0]=>
array(5) {
[0]=>
string(21) "Hello, "how" are you?"
[1]=>
string(5) "Good!"
[2]=>
string(0) ""
[3]=>
string(0) ""
[4]=>
int(123)
}
}
[1]=>
string(0) ""
[2]=>
string(2) "ok"
}
Edit
Given the inherent dangers of eval, the better option is to use json_decode with the code above, e.g.:
// input
$str1 = '[[["Hello, \\"how\\" are you?","Good!",,,123]],,"ok"]';
// replace , followed by , with ,"" with a regex (JSON strings need double quotes)
$pattern = '/,(?=,)/';
$replace = ',""';
$str2 = preg_replace($pattern, $replace, $str1);
// decode the updated string as JSON
$arr = json_decode($str2);
var_dump($arr);
If you can edit the code that serializes the data then it's a better idea to let the serialization be handled using json_encode & json_decode. No need to reinvent the wheel on this one.
Nice cat btw.
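As a sketch of the json_encode/json_decode round trip suggested above, assuming you control the code that produces the string (the sample data mirrors the question's structure):

```php
<?php
// The nested structure from the question, with empty strings for the gaps.
$data = [[['Hello, "how" are you?', 'Good!', '', '', 123]], '', 'ok'];

// Serialize: quotes inside strings are escaped automatically.
$json = json_encode($data);
echo $json, PHP_EOL;

// Deserialize: the second argument only matters for JSON objects,
// but passing true keeps everything as plain PHP arrays.
$back = json_decode($json, true);
var_dump($back === $data);
```

The encoded string is `[[["Hello, \"how\" are you?","Good!","","",123]],"","ok"]`, and decoding it gives back an identical array, so no custom parser is needed.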
You might want to use a lexer in combination with a recursive function that actually builds the structure.
For your purpose, the following tokens have been used:
\[ # opening bracket
\] # closing bracket
".+?(?<!\\)" # " to ", making sure it's not escaped
,(?!,) # a comma, not followed by a comma
\d+ # at least one digit
,(?=,) # a comma followed by a comma
The rest is programming logic, see a demo on ideone.com. Inspired by this post.
class Lexer {
protected static $_terminals = array(
'~^(\[)~' => "T_OPEN",
'~^(\])~' => "T_CLOSE",
'~^(".+?(?<!\\\\)")~' => "T_ITEM",
'~^(,)(?!,)~' => "T_SEPARATOR",
'~^(\d+)~' => "T_NUMBER",
'~^(,)(?=,)~' => "T_EMPTY"
);
public static function run($line) {
$tokens = array();
$offset = 0;
while($offset < strlen($line)) {
$result = static::_match($line, $offset);
if($result === false) {
throw new Exception("Unable to parse input at offset " . $offset . ".");
}
$tokens[] = $result;
$offset += strlen($result['match']);
}
return static::_generate($tokens);
}
protected static function _match($line, $offset) {
$string = substr($line, $offset);
foreach(static::$_terminals as $pattern => $name) {
if(preg_match($pattern, $string, $matches)) {
return array(
'match' => $matches[1],
'token' => $name
);
}
}
return false;
}
// a recursive function to actually build the structure
protected static function _generate($arr=array(), $idx=0) {
$output = array();
$current = 0;
for($i=$idx;$i<count($arr);$i++) {
$type = $arr[$i]["token"];
$element = $arr[$i]["match"];
switch ($type) {
case 'T_OPEN':
list($out, $index) = static::_generate($arr, $i+1);
$output[] = $out;
$i = $index;
break;
case 'T_CLOSE':
return array($output, $i);
break;
case 'T_ITEM':
case 'T_NUMBER':
$output[] = $element;
break;
case 'T_EMPTY':
$output[] = "";
break;
}
}
return $output;
}
}
$input = '[[["Hello, \"how\" are you?","Good!",,,123]],,"ok"]';
$items = Lexer::run($input);
print_r($items);
?>

Extracting a string starting with # or @ php

function extractConnect($str,$connect_type){
$connect_array = array();
$connect_counter = 0;
$str = trim($str).' ';
for($i =0; $i<strlen($str);$i++) {
$chr = $str[$i];
if($chr==$connect_type){ //$connect_type = '#' or '@' etc
$connectword = getConnect($i,$str);
$connect_array[$connect_counter] = $connectword;
$connect_counter++;
}
}
if(!empty($connect_array)){
return $connect_array;
}
}
function getConnect($tag_index,$str){
$str = trim($str).' ';
for($j = $tag_index; $j<strlen($str);$j++) {
$chr = $str[$j];
if($chr==' '){
$hashword = substr($str,$tag_index+1,$j-$tag_index);
return trim($hashword);
}
}
}
$at = extractConnect("#stackoverflow is great. @google.com is the best search engine","@");
$hash = extractConnect("#stackoverflow is great. @google.com is the best search engine","#");
print_r($at);
print_r($hash);
What this method does is extract the words that start with # or @ from a string and return them as an array.
E.g. the input #stackoverflow is great. @google.com is the best search engine produces this output:
Array ( [0] => google.com ) Array ( [0] => stackoverflow )
But this method seems too slow. Is there an alternative?
You could use a regex to achieve this:
/<char>(\S+)\b/i
Explanation:
/ - starting delimiter
<char> - the character you're searching for (passed as a function argument)
(\S+) - any non-whitespace character, one or more times
\b - word boundary
i - case insensitivity modifier
/ - ending delimiter
Function:
function extractConnect($string, $char) {
$search = preg_quote($char, '/');
if (preg_match('/'.$search.'(\S+)\b/i', $string, $matches)) {
return [$matches[1]]; // Return an array containing the match
}
return false;
}
With your strings, this would produce the following output:
array(1) {
[0]=>
string(10) "google.com"
}
array(1) {
[0]=>
string(13) "stackoverflow"
}
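Note that preg_match stops at the first hit. If a string can contain several tagged words, the same pattern can be used with preg_match_all. This is a sketch only: the function name `extractAllConnects` is hypothetical, and it assumes the two marker characters are # and @.

```php
<?php
// Hypothetical variant: collect every word that follows the marker character.
function extractAllConnects(string $string, string $char): array {
    // Escape the marker in case it is a regex metacharacter.
    $search = preg_quote($char, '/');
    preg_match_all('/' . $search . '(\S+)\b/', $string, $matches);
    // $matches[1] holds the captured words without the marker.
    return $matches[1];
}

$text = '#stackoverflow is great. @google.com is the best search engine. #php';
$hashes = extractAllConnects($text, '#');
$ats = extractAllConnects($text, '@');
print_r($hashes);
print_r($ats);
```

For the sample text, `$hashes` contains stackoverflow and php, and `$ats` contains google.com. When nothing matches, the function returns an empty array rather than false.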
You can do it like this:
<?php
function extractConnect($strSource, $tags) {
$matches = array();
$tags = str_split($tags);
$strSource = explode(' ', $strSource);
array_walk_recursive($strSource, function(&$item) {
$item = trim($item);
});
foreach ($strSource as $strPart) {
if (in_array($strPart[0], $tags)) {
$matches[$strPart[0]][] = substr($strPart, 1);
}
}
return $matches;
}
var_dump(extractConnect(
"#stackoverflow is great. #twitter is good. @google.com is the best search engine",
"#@"
));
Outputs:
This seemed to work for me. Provide it with the symbol you want.
function get_stuff($str) {
$result = array();
$words = explode(' ', $str);
$symbols = array('#', '@');
foreach($words as $word) {
if (in_array($word[0], $symbols)) {
$result[$word[0]][] = substr($word, 1);
}
}
return $result;
}
$str = '#stackoverflow is great. @google.com is the best search engine';
print_r(get_stuff($str));
This outputs Array ( [#] => Array ( [0] => stackoverflow ) [@] => Array ( [0] => google.com ) )

Regex hash and colons

I want to use regular expression to filter substrings from this string
eg: hello world #level:basic #lang:java:php #...
I am trying to produce an array with a structure like this:
Array
(
[0]=> hello world
[1]=> Array
(
[0]=> level
[1]=> basic
)
[2]=> Array
(
[0]=> lang
[1]=> java
[2]=> php
)
)
I have tried preg_match("/(.*)#(.*)[:(.*)]*/", $input_line, $output_array);
and what I have got is:
Array
(
[0] => hello world #level:basic #lang:java:php
[1] => hello world #level:basic
[2] => lang:java:php
)
In this case I would have to apply this regex a few more times to the individual indexes, and then apply another regex to filter out the colons. My question is: is it possible to write a better regex that does it all in one go? What would that regex be? Thanks
You can use:
$array = explode("#", "hello world #level:basic #lang:java:php");
foreach($array as $k => &$v) {
$v = strpos($v, ":") === false ? $v : explode(":", $v);
}
print_r($array);
Do this:
$array = array() ;
$text = "hello world #level:basic #lang:java:php";
$array = explode("#", $text);
foreach($array as $i => $value){
$array[$i] = explode(":", trim($value));
}
print_r($array);
Got something for you:
Rules:
a tag begins with #
a tag may not contain whitespace/newline
a tag is preceded and followed by whitespace or line beginning/ending
a tag can have several parts divided by :
Example:
#this:tag:matches this is some text #a-tag this is no tag: \#escaped
and this one tag#does:not:match
Function:
<?php
function parseTags($string)
{
// "~" is used as the delimiter because the pattern itself contains a literal "#"
static $tag_regex = '~(?<=\s|^)#([^\:\s]+)(?:\:([^\s]+))*(?=\s|$)~m';
$results = array();
preg_match_all($tag_regex, $string, $results, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);
$tags = array();
foreach($results as $result) {
$tag = array(
'offset' => $result[0][1],
'raw' => $result[0][0],
'length' => strlen($result[0][0]),
0 => $result[1][0]);
if(isset($result[2]))
$tag = array_merge($tag, explode(':', $result[2][0]));
$tag['elements'] = count($tag)-3;
$tags[] = $tag;
}
return $tags;
}
?>
Result:
array(2) {
[0]=>array(7) {
["offset"]=>int(0)
["raw"]=>string(17) "#this:tag:matches"
["length"]=>int(17)
[0]=>string(4) "this"
[1]=>string(3) "tag"
[2]=>string(7) "matches"
["elements"]=>int(3)
}
[1]=>array(5) {
["offset"]=>int(36)
["raw"]=>string(6) "#a-tag"
["length"]=>int(6)
[0]=>string(5) "a-tag"
["elements"]=>int(1)
}
}
Each matched tag contains
the raw tag text
the tag offset and original length (e.g. to replace it in the string later with str... functions)
the number of elements (to safely iterate for($i = 0; $i < $tag['elements']; $i++))
This might work for you:
$results = array() ;
$text = "hello world #level:basic #lang:java:php" ;
$parts = explode("#", $text);
foreach($parts as $part){
$results[] = explode(":", $part);
}
var_dump($results);
Two ways using regex. Note that you still need explode(), since PCRE keeps only the last match of a repeated capturing group:
$string = 'hello world #level:basic #lang:java:php';
preg_match_all('/(?<=#)[\w:]+/', $string, $m);
foreach($m[0] as $v){
$example1[] = explode(':', $v);
}
print_r($example1);
// This one needs PHP 5.3+
$example2 = array();
preg_replace_callback('/(?<=#)[\w:]+/', function($m)use(&$example2){
$example2[] = explode(':', $m[0]);
}, $string);
print_r($example2);
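To illustrate why explode() is still needed, here is a small sketch (the pattern is an assumption made up for this demonstration) showing that a repeated capturing group retains only its final iteration:

```php
<?php
// The group (\w+) inside (?: ... )* runs twice here: once for "java",
// once for "php". Only the last iteration is kept in $m[2].
preg_match('/#(\w+)(?::(\w+))*/', '#lang:java:php', $m);
print_r($m);
```

Here `$m[0]` is the whole tag, `$m[1]` is "lang" and `$m[2]` is "php"; the intermediate "java" is overwritten, which is why the answers above capture the whole tag and split it on the colon afterwards.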
This gives you the array structure you are looking for:
<pre><?php
$subject = 'hello world #level:basic #lang:java:php';
$array = explode('#', $subject);
foreach($array as &$value) {
$items = explode(':', trim($value));
if (sizeof($items)>1) $value = $items;
}
print_r($array);
But if you prefer you can use this abomination:
$subject = 'hello world #level:basic #lang:java:php';
$pattern = '~(?:^| ?+#)|(?:\G([^#:]+?)(?=:| #|$)|:)+~';
preg_match_all($pattern, $subject, $matches);
array_shift($matches[1]);
$lastKey = sizeof($matches[1])-1;
foreach ($matches[1] as $key=>$match) {
if (!empty($match)) $temp[]=$match;
if (empty($match) || $key==$lastKey) {
$result[] = (sizeof($temp)>1) ? $temp : $temp[0];
unset($temp);
}
}
print_r($result);

How to parse console command strings in PHP with quotes

For my game I'm coding a console that sends messages via AJAX and then receives output from the server.
For example, an input would be:
/testmessage Hello!
However, I would also need to parse the quotes e.g.:
/testmessage "Hello World!"
However, since I am simply exploding the string with spaces, PHP sees "Hello and World!" as separate parameters. How do I make PHP think that "Hello World!" is one parameter?
Right now I'm using the following code to parse the command:
// Suppose $inputstring = '/testmessage "Hello World!"';
$inputstring = substr($inputstring, 1);
$parameters = explode(" ", $inputstring);
$command = strtolower($parameters[0]);
switch ($command) {
case "testmessage":
ConsoleDie($parameters[1]);
break;
}
Thank you in advance.
This code will do what you want:
$params = preg_split('/(".*?")/', '/testmessage "Hello World!" 1 2 3', -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
$realParams = array();
foreach($params as $param)
{
$param = trim($param);
if ($param == '')
continue;
if (strpos($param, '"') === 0)
$realParams = array_merge($realParams, array(trim($param, '"')));
else
$realParams = array_merge($realParams, explode(' ', $param));
}
unset($params);
var_dump($realParams);
That prints:
array(5) {
[0]=>
string(12) "/testmessage"
[1]=>
string(12) "Hello World!"
[2]=>
string(1) "1"
[3]=>
string(1) "2"
[4]=>
string(1) "3"
}
Note: As you can see, the first parameter is the command.
Hope this code is more 'understandable'
$input = $inputstring = '/testmessage "Hello World!" "single phrase" level two';
// find the parameters surrounded with quotes, grab only the value (remove "s)
preg_match_all('/"(.*?)"/', $inputstring, $quotes);
// for each parameters with quotes, put a 'placeholder' like {{1}}, {{2}}
foreach ($quotes[1] as $key => $value) {
$inputstring = str_replace($value, "{{{$key}}}", $inputstring);
}
// then separate by space
$parameters = explode(" ", $inputstring);
// replace the placeholders {{1}} with the original value
foreach ($parameters as $key => $value) {
if (preg_match('{{(\d+)}}', $value, $matches)) {
$parameters[$key] = $quotes[1][$matches[1]];
}
}
// here you go
print_r($parameters);
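The same splitting can also be sketched without placeholders, using PHP's built-in str_getcsv() with a space as the separator and the double quote as the enclosure. This swaps in a different technique from the answers above, and the wrapper name `parse_command` is hypothetical.

```php
<?php
// Hypothetical wrapper; str_getcsv() itself is a PHP built-in.
function parse_command(string $inputstring): array {
    // Drop the leading "/" as in the question's code.
    $line = substr($inputstring, 1);
    // separator = space, enclosure = double quote, escape = backslash
    $parameters = str_getcsv($line, ' ', '"', '\\');
    // The first field is the command, the rest are its parameters.
    $command = strtolower((string) array_shift($parameters));
    return [$command, $parameters];
}

[$command, $parameters] = parse_command('/testmessage "Hello World!" 1 2 3');
echo $command, PHP_EOL;
print_r($parameters);
```

For this input the command is testmessage and the parameters are Hello World!, 1, 2 and 3. One caveat of this approach: consecutive spaces produce empty-string parameters, so you may want to filter those out.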
I may not have understood you fully, but if you are assuming that the first word is always a command word, and anything following is 'one parameter' you could do the following
$inputstring = substr($inputstring, 1);
$parameters = explode(" ", $inputstring);
// shift the first element off the array i.e. the command
$command = strtolower(array_shift($parameters));
// Glue the rest of the array together
$input_message = implode(" ", $parameters);
switch ($command) {
case "testmessage":
ConsoleDie($input_message);
break;
}
You can use the Symfony Console Component which offers a secure and clean way to get console inputs.
For your use case you should do:
use Symfony\Component\Console\Input\ArgvInput;
use Symfony\Component\Console\Input\InputDefinition;
use Symfony\Component\Console\Input\InputArgument;
$input = new ArgvInput(null, new InputDefinition(array(
new InputArgument('message', InputArgument::REQUIRED)
)));
$parameters = $input->getArguments(); // $parameters['message'] contains the first argument

String manipulation/parsing in PHP

I've a string in the following format:
John Bo <jboe@gmail.com>, abracadbra@gmail.com, <asking@gmail.com>...
How can I parse the above string in PHP and just get the email addresses? Is there an easy way to parse?
=Rajesh=
You could of course just use a regex on the string, but the RFC-compliant regex is a monster of a thing.
It would also fail in the unlikely (but possible) event of a@b.com <b@a.com> (unless you really would want both extracted in that case).
$str = 'John Bo <jboe@gmail.com>, abracadbra@gmail.com, <asking@gmail.com>';
$items = explode(',', $str);
$items = array_map('trim', $items);
$emails = array();
foreach($items as $item) {
preg_match_all('/<(.*?)>/', $item, $matches);
if (empty($matches[1])) {
$emails[] = $item;
continue;
}
$emails[] = $matches[1][0];
}
var_dump($emails);
Output
array(3) {
[0]=>
string(14) "jboe@gmail.com"
[1]=>
string(20) "abracadbra@gmail.com"
[2]=>
string(16) "asking@gmail.com"
}
One-liner no loops!
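$str = 'John Bo <jboe@gmail.com>, abracadbra@gmail.com, <asking@gmail.com>';
// end() takes its argument by reference, so the exploded parts go into a variable first
$extracted_emails = array_map( function($v){ $parts = explode( '<', $v ); return trim( end( $parts ), '> ' ); }, explode( ',', $str ) );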
print_r($extracted_emails);
Requires PHP 5.3+ (for the anonymous function).
The most straightforward way (I am terrible at regex) would be to use plain string functions:
<?php
$emailstring = "John Bo <jboe@gmail.com>,<other@email.com>, abracadbra@gmail.com, <asking@gmail.com>";
$emails = explode(',',$emailstring);
for ($i = 0; $i < count($emails); $i++) {
if (strpos($emails[$i], '<') !== false) {
$emails[$i] = substr($emails[$i], strpos($emails[$i], '<')+1);
$emails[$i] = str_replace('>','',$emails[$i]);
}
$emails[$i] = trim($emails[$i]);
}
print_r($emails);
?>
http://codepad.org/6lKkGBRM
Use int preg_match_all (string pattern, string subject, array matches, int flags), which searches "subject" for all matches of the regex (Perl format) pattern, fills the array "matches" with all matches of the regex and returns the number of matches.
See http://www.regular-expressions.info/php.html
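A minimal sketch of that preg_match_all approach follows. The pattern is a deliberately loose assumption (anything without whitespace, commas or angle brackets around an @), not an RFC-compliant address matcher.

```php
<?php
$str = 'John Bo <jboe@gmail.com>, abracadbra@gmail.com, <asking@gmail.com>';

// Loose pattern (an assumption for illustration): one or more characters
// that are not whitespace, "<", ">" or ",", then "@", then more of the same.
$count = preg_match_all('/[^\s<>,]+@[^\s<>,]+/', $str, $matches);

echo $count, PHP_EOL;
print_r($matches[0]);
```

For the sample string this finds three addresses: jboe@gmail.com, abracadbra@gmail.com and asking@gmail.com. Display names such as "John Bo" are skipped because they contain no @.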
