Given two equal-length strings, is there an elegant way to get the offset of the first different character?
The obvious solution would be:
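function firstDifference($str1, $str2) {
    $length = strlen($str1); // both strings have the same length
    for ($offset = 0; $offset < $length; ++$offset) {
        if ($str1[$offset] !== $str2[$offset]) {
            return $offset;
        }
    }
    return false; // no difference found
}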
But that seems like a lot of code for such a simple task.
You can use a nice property of bitwise XOR (^) to achieve this: when you XOR two strings together, the bytes that are the same become null bytes ("\0"). So after XORing the two strings, we just need the length of the leading run of null bytes, which strspn() returns, and that length is exactly the offset of the first difference:
$position = strspn($string1 ^ $string2, "\0");
That's all there is to it. So let's look at an example:
$string1 = 'foobarbaz';
$string2 = 'foobarbiz';
$pos = strspn($string1 ^ $string2, "\0");
printf(
'First difference at position %d: "%s" vs "%s"',
$pos, $string1[$pos], $string2[$pos]
);
That will output:
First difference at position 7: "a" vs "i"
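To see what strspn() is actually scanning in that example, you can dump the XOR result as hex. A small sketch reusing the same two strings:

```php
<?php
$string1 = 'foobarbaz';
$string2 = 'foobarbiz';

// Equal bytes XOR to 0x00; 'a' (0x61) ^ 'i' (0x69) gives 0x08
$xor = $string1 ^ $string2;
echo bin2hex($xor), PHP_EOL; // 000000000000000800

// strspn() counts the leading run of "\0" bytes, i.e. the offset of the first difference
echo strspn($xor, "\0"), PHP_EOL; // 7
```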
So that should do it. It's very efficient, since both the XOR and strspn() are implemented in C, and the only extra memory needed is one temporary string holding the XOR result.
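One caveat: if the strings are identical, strspn() returns the full length, so $string1[$pos] would be out of range. Also, for strings of different lengths PHP's ^ only produces as many bytes as the shorter string has. A minimal sketch of a wrapper that handles both cases; the name firstDifferenceOffset() and returning null for "no difference" are my own choices, not part of the answer above:

```php
<?php
/**
 * Hypothetical helper: byte offset of the first difference, or null if the strings are identical.
 */
function firstDifferenceOffset(string $a, string $b): ?int
{
    // Length of the leading "\0" run in the XOR = number of equal leading bytes
    $pos = strspn($a ^ $b, "\0");

    // All compared bytes matched and the lengths agree: the strings are identical
    if ($pos === min(strlen($a), strlen($b)) && strlen($a) === strlen($b)) {
        return null;
    }
    // Otherwise $pos is either a real mismatch or the end of the shorter string
    return $pos;
}

var_dump(firstDifferenceOffset('foobarbaz', 'foobarbiz')); // int(7)
var_dump(firstDifferenceOffset('foobarbaz', 'foobarbaz')); // NULL
var_dump(firstDifferenceOffset('foo', 'foobar'));          // int(3)
```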
Edit: A MultiByte Solution Along The Same Lines:
function getCharacterOffsetOfDifference($str1, $str2, $encoding = 'UTF-8') {
return mb_strlen(
mb_strcut(
$str1,
0, strspn($str1 ^ $str2, "\0"),
$encoding
),
$encoding
);
}
First the difference at the byte level is found using the above method and then the offset is mapped to the character level. This is done using the mb_strcut function, which is basically substr but honoring multibyte character boundaries.
var_dump(getCharacterOffsetOfDifference('foo', 'foa')); // 2
var_dump(getCharacterOffsetOfDifference('©oo', 'foa')); // 0
var_dump(getCharacterOffsetOfDifference('f©o', 'fªa')); // 1
It's not as elegant as the first solution, but it's still a one-liner (and a little simpler if you rely on the default internal encoding):
return mb_strlen(mb_strcut($str1, 0, strspn($str1 ^ $str2, "\0")));
If you convert each string into an array of single-byte characters, you can use the array comparison functions to compare them.
You can achieve a similar result to the XOR method with the following.
$string1 = 'foobarbaz';
$string2 = 'foobarbiz';
$array1 = str_split($string1);
$array2 = str_split($string2);
$result = array_diff_assoc($array1, $array2);
$num_diff = count($result);
$first_diff = key($result);
echo "There are " . $num_diff . " differences between the two strings. <br />";
echo "The first difference between the strings is at position " . $first_diff . ". (Zero Index) '$string1[$first_diff]' vs '$string2[$first_diff]'.";
Edit: Multibyte Solution
$string1 = 'foobarbaz';
$string2 = 'foobarbiz';
// -1 = no limit; the flags belong in the 4th argument, not the 3rd
$array1 = preg_split('((.))u', $string1, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
$array2 = preg_split('((.))u', $string2, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
$result = array_diff_assoc($array1, $array2);
$num_diff = count($result);
$first_diff = key($result);
echo "There are " . $num_diff . " differences between the two strings.\n";
// Index the character arrays, not the byte strings, so multibyte characters stay intact
echo "The first difference between the strings is at position " . $first_diff . ". (Zero Index) '$array1[$first_diff]' vs '$array2[$first_diff]'.\n";
I wanted to add this as a comment to the best answer, but I do not have enough points. This version also handles strings of different lengths and identical strings:
$string1 = 'foobarbaz';
$string2 = 'foobarbiz';
$pos = strspn($string1 ^ $string2, "\0");
if ($pos < min(strlen($string1), strlen($string2))) {
printf(
'First difference at position %d: "%s" vs "%s"',
$pos, $string1[$pos], $string2[$pos]
);
} else if ($pos < strlen($string1)) {
print 'String1 continues with: ' . substr($string1, $pos);
} else if ($pos < strlen($string2)) {
print 'String2 continues with: ' . substr($string2, $pos);
} else {
print 'String1 and String2 are equal';
}
string strpbrk ( string $haystack , string $char_list )
strpbrk() searches the haystack string for any of the characters in char_list.
The return value is the substring of $haystack which begins at the first matched character, or false if none of them occurs.
As a built-in function it should be fast. The matched character is then at offset zero of the returned string, and you can work out its offset in $haystack from the lengths of the two strings.
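A minimal sketch of recovering the offset from strpbrk()'s return value without a loop; the character list 'zr' is only an example. Note that strpbrk() looks for characters from a fixed set, so on its own it does not compare two strings against each other:

```php
<?php
$haystack = 'foobarbaz';

// strpbrk() returns the rest of $haystack starting at the first character found in the list
$rest = strpbrk($haystack, 'zr'); // "rbaz"

if ($rest !== false) {
    // The matched character is the first byte of the returned string ...
    $matched = $rest[0]; // "r"
    // ... and its offset is the length of whatever was cut off in front of it
    $offset = strlen($haystack) - strlen($rest); // 5
    echo "First match '$matched' at offset $offset", PHP_EOL;
}
```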
Is there a regular expression that can detect these similarities in PHP? The idea is to get a match while ignoring the last letters.
<?php
$word1 = 'happyness';
$word2 = 'happys';
if (substr($word1, 0, -4) == substr($word2, 0, -1))
{
echo 'same word1';
}
$word1 = 'kisses';
$word2 = 'kiss';
if (substr($word1, 0, -2) == $word2)
{
echo 'same word2';
}
$word1 = 'consonant';
$word2 = 'consonan';
if (substr($word1, 0, -1) == $word2)
{
echo 'same word3';
}
Put the words together in one string, like happyness happys, and capture as many leading word characters of word 1 as word 2 also starts with. See this demo at regex101. Use it with the i flag for caseless matching.
^(\w+)\w* \1
To use this in PHP with preg_match see this PHP demo at tio.run
preg_match('/^(\w+)\w* \1/i', $word1 . ' ' . $word2, $out);
where $out[1] holds the captures or $out would be an empty array if there wasn't a match.
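Wrapped into a function, the regex approach could look like the sketch below. The name commonPrefix() and returning '' when nothing matches are illustrative choices. The subject is built with plain concatenation, since preg_quote() is meant for text placed inside a pattern, not for the subject:

```php
<?php
/**
 * Hypothetical helper: longest prefix of $word1 that $word2 also starts with
 * (case-insensitive), or '' if there is none. Assumes both inputs are single words.
 */
function commonPrefix(string $word1, string $word2): string
{
    // Subject is "word1 word2"; the captured prefix must reappear right after the space
    if (preg_match('/^(\w+)\w* \1/i', $word1 . ' ' . $word2, $out)) {
        return $out[1];
    }
    return '';
}

echo commonPrefix('happyness', 'happys'), PHP_EOL;   // happy
echo commonPrefix('kisses', 'kiss'), PHP_EOL;        // kiss
echo commonPrefix('consonant', 'consonan'), PHP_EOL; // consonan
```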
You could use a small helper function (named matchPrefix here, because match is a reserved keyword as of PHP 8.0). The first one only compares up to the length of the second string, so it doesn't care how many characters it truncates. It works like your code, except that it uses the length of the second word as the length of the substring to take:
function matchPrefix( string $a, string $b ) {
return substr($a, 0, strlen($b)) === $b;
}
This one is slightly more complicated, as it also takes a maximum gap length into account (how many extra characters the first word may have):
function matchPrefix( string $a, string $b, int $length = 3 ) {
$len = max(strlen($a)-$length, strlen($b));
return substr($a, 0, $len) === $b;
}
So call it along the lines of the following. Note that both versions treat the second word as a prefix, so happys would not match happyness:
$word1 = 'kisses';
$word2 = 'kiss';
if (matchPrefix($word1,$word2))
{
echo 'same word2';
}
You can use preg_match() with the pattern /^word2/ against word1. The ^ anchor at the start makes the regex check whether word1 starts with word2.
It's always better to run word2 through preg_quote() first, so that any regex metacharacters are escaped and matched literally.
<?php
$tests = [
[
'happyness',
'happys'
],
[
'kisses',
'kiss'
],
[
'consonant',
'consonan'
]
];
$filtered = array_filter($tests,function($values){
$values[1] = preg_quote($values[1], '/');
return preg_match("/^$values[1]/",$values[0]) === 1;
});
print_r($filtered);
Demo: https://3v4l.org/SLf15
You could also write a small function that measures the similarity between the two words. It could look like this:
function similarity($word1, $word2)
{
$splittedWord1 = str_split($word1);
$splittedWord2 = str_split($word2);
$similarChars = array_intersect_assoc($splittedWord1, $splittedWord2);
return count($similarChars) / max(count($splittedWord1), count($splittedWord2));
}
var_dump(similarity('happyness', 'happys'));
var_dump(similarity('happyness', 'testhappys'));
var_dump(similarity('kisses', 'kiss'));
var_dump(similarity('consonant', 'consonan'));
The result would look like:
float(0.55555555555556)
int(0)
float(0.66666666666667)
float(0.88888888888889)
Based on the resulting ratio, you could decide whether the given words should be considered the same.
I'm not sure regex is the answer here.
You could try similar_text(), which returns the number of matching characters (and can optionally write a similarity percentage into a variable passed as the third argument). If you consider the last two letters unimportant, you can check whether the number of matched characters plus $skippedCharacters reaches strlen($word1). For example:
$skippedCharacters = 2;
$word1 = 'kisses';
$word2 = 'kiss';
$match = similar_text($word1, $word2);
if ($match + $skippedCharacters >= strlen($word1))
{
echo 'same word2';
}
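similar_text() can also hand back the percentage mentioned above through its optional third parameter, which is passed by reference. A quick sketch with the same pair of words:

```php
<?php
$word1 = 'kisses';
$word2 = 'kiss';

// The third argument receives the similarity as a percentage
$match = similar_text($word1, $word2, $percent);

echo $match, PHP_EOL;         // 4
printf("%.1f%%\n", $percent); // 80.0%
```

The percentage is relative to the combined length of both words, so it may be a more convenient threshold than a fixed number of skipped characters.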
You could use the PHP levenshtein function.
The levenshtein() function returns the Levenshtein distance between two strings. The Levenshtein distance is the number of characters you have to replace, insert or delete to transform string1 into string2.
$lev = levenshtein($word1, $word2);
The lower the distance, the more similar the strings are.
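Applied to the three word pairs from the question, it could look like this. The cut-off of 3 is an assumed value that you would tune for your own data:

```php
<?php
$pairs = [
    ['happyness', 'happys'],
    ['kisses', 'kiss'],
    ['consonant', 'consonan'],
];
$maxDistance = 3; // assumed cut-off, not a standard value

$distances = [];
foreach ($pairs as [$word1, $word2]) {
    // Minimum number of single-character insertions, deletions or replacements
    $lev = levenshtein($word1, $word2);
    $distances[] = $lev;
    echo "$word1 / $word2: distance $lev", $lev <= $maxDistance ? ' (same word)' : '', PHP_EOL;
}
```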
I want to count the frequency of occurrences of all the letters in a string. Say I have
$str = "cdcdcdcdeeeef";
I can use str_split and array_count_values to achieve this.
array_count_values(str_split($str));
Wondering if there is another way to do this without converting the string to an array? Thanks
You don't have to convert the string into an array; you can use substr_count() to count the occurrences of a given letter.
substr_count — Count the number of substring occurrences
<?php
$str = "cdcdcdcdeeeef";
echo substr_count($str, 'c');
?>
PHP Manual
substr_count() returns the number of times the needle substring occurs in the haystack string. Please note that needle is case sensitive.
EDIT:
Sorry for the misunderstanding. You can use count_chars() to get the count of each character in a string. An example:
<?php
$str = "cdcdcdcdeeeef";
foreach (count_chars($str, 1) as $strr => $value) {
echo chr($strr) . " occurred $value times in the string." . "<br>";
}
?>
PHP Manual: count_chars
count_chars — Return information about characters used in a string
There is a php function that returns information about characters used in a string: count_chars
Well, it might not be what you are looking for, because according to http://php.net/manual/en/function.count-chars.php it "counts the number of occurrences of every byte-value (0..255) in string and returns it in various ways", so a multibyte UTF-8 character is counted as several separate bytes.
Example from same link (http://php.net/manual/en/function.count-chars.php):
<?php
$data = "Two Ts and one F.";
foreach (count_chars($data, 1) as $i => $val) {
echo "There were $val instance(s) of \"" , chr($i) , "\" in the string.\n";
}
?>
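To illustrate the byte-versus-character caveat: count_chars() counts bytes, so a multibyte UTF-8 character shows up as several separate byte values. A sketch contrasting it with a character-level count; splitting with preg_split() and the u modifier is a swapped-in approach (and it does build an array, which the question hoped to avoid):

```php
<?php
$str = "h\u{e9}llo"; // "héllo": 5 characters, but 'é' takes 2 bytes in UTF-8

// Byte-level counts: 'é' appears as the two bytes 0xC3 (195) and 0xA9 (169)
$bytes = count_chars($str, 1);

// Character-level counts: split into UTF-8 characters first, then count them
$chars = array_count_values(preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY));

print_r($bytes);
print_r($chars); // h => 1, é => 1, l => 2, o => 1
```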
class Strings
{
    // Static, because it is called as $new_counter::count_of_each_letter() below
    public static function count_of_each_letter($string){
        if (mb_strlen($string, 'UTF-8') == 0) {
            return null;
        }
        $string_chars = array();
        // Each pass counts the first remaining character, then removes all its occurrences
        while ($string !== '') {
            $each_letter = mb_substr($string, 0, 1, 'UTF-8');
            $string_chars[$each_letter] = mb_substr_count($string, $each_letter, 'UTF-8');
            $string = str_replace($each_letter, '', $string);
        }
        $result = '';
        foreach ($string_chars as $key => $value) {
            $result .= $key.'-'.$value.'<br>';
        }
        return $result;
    }
}
$new_counter = new Strings();
echo $new_counter::count_of_each_letter('ختواجرایآهنگبهصورتتکنفرهنمود.اوازسال۱۹۷۲تا۱۹۷۵،۴آلبوماستودیوییتکنفرهمنتشرکردوحتینامزدیکجایزهاسکارهمشد.درهمینسالهاگروهاقدامبهبرگزاریتورکنسرتدراروپاونیزیکتورجهانیکردند.جکسونفایودرسال۱۹۷۵ازشرکتنشرموسیقیموتاونرکوردزبهسیبیاسرکوردزنقلمکانکردند.گروههمچنانبهاجراهایبینالمللیخودادامهمیدادواز۱۹۷۶تا۱۹۸۴(از۱۵تا۲۴سالگیمایکل)ششآلبوماستودیوییدیگرمنتشرکرد.درهمینمدت،مایکلترانهسرایاصلیگروهجکسونزبود.Cantional,oderGesangbuchAugsburgischerKonfessionin1627.ohannSebastianBachcomposedafour-partsetting,BWV285,whichiswithouttext.twaspublishedasNo.196inthecollectionofchoralesbyJohannPhilippKirnbergerundCarlPhilippEmanufread');
You can also do it this way:
$str = 'aabbbccccdddeeedfff';
$arr = str_split($str);
$result = array_count_values($arr);
$string = http_build_query($result,'','');
echo str_replace('=','',$string);
This is what I'm trying to get:
Given the text "My longest text to test", when I search for e.g. "My" I should get "My longest".
I tried it with this code: first get the position right after the input, then search for the next ' ' to cut the text there.
$length = strripos($text, $input) + strlen($input)+2;
$stringpos = strripos($text, ' ', $length);
$newstring = substr($text, 0, strpos($text, ' ', $length));
But this only works the first time; after that it cuts right after the current input. That is, "My lon" gives "My longest" and not "My longest text".
How must I change this to always get the next word? Maybe I need a break, but I cannot find the right solution.
UPDATE
Here is my workaround until I find a better solution. As I said, array functions don't work here, since partial words must work too. So I extended my previous idea a bit: the basic idea is to distinguish between the first space after the input and the next one.
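function get_title($input, $text) {
    // Position right after the (last) occurrence of the input
    $length = strripos($text, $input) + strlen($input);
    // First ' ' after the input
    $stringpos = stripos($text, ' ', $length);
    if ($stringpos === false) {
        return $text; // no more words after the input
    }
    // Find next ' '
    $stringpos2 = stripos($text, ' ', $stringpos + 1);
    if ($stringpos2 === false) {
        return $text; // the next word is the last one
    }
    return substr($text, 0, $stringpos2);
}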
Not pretty, but it seems to work. Maybe one of you has a better solution.
You can try using explode
$string = explode(" ", "My longest text to test");
$key = array_search("My", $string);
echo $string[$key] , " " , $string[$key + 1] ;
You can take it to the next level with a case-insensitive match using preg_match_all():
$string = "My longest text to test in my school that is very close to mY village" ;
var_dump(__search("My",$string));
Output
array
0 => string 'My longest' (length=10)
1 => string 'my school' (length=9)
2 => string 'mY village' (length=10)
Function used
function __search($search,$string)
{
$result = array();
preg_match_all('/' . preg_quote($search, '/') . '\s+\w+/i', $string, $result);
return $result[0];
}
There are simpler ways to do that. String functions are useful when you don't want to look for something specific but to cut out a predefined length of text. Otherwise, use a regular expression:
preg_match('/My\s+\w+/', $string, $result);
print $result[0];
Here My matches the literal first word, \s+ matches one or more whitespace characters, and \w+ matches the word characters of the next word.
This adds some new syntax to learn, but it is less brittle than workarounds and the longer string-function code needed to accomplish the same.
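The same idea extends to the partial-word case from the question (typing My lon should give My longest text): let \w* finish the typed fragment before grabbing the next word. A sketch; the function name completeWithNextWord() and returning null when nothing matches are assumptions:

```php
<?php
/**
 * Hypothetical helper: returns the text from $input through the next complete word,
 * or null if $input is not found (or no word follows it).
 */
function completeWithNextWord(string $input, string $text): ?string
{
    // \w* completes a partially typed word, \s+\w+ grabs the word after it
    $pattern = '/' . preg_quote($input, '/') . '\w*\s+\w+/i';
    return preg_match($pattern, $text, $m) ? $m[0] : null;
}

$text = 'My longest text to test';
echo completeWithNextWord('My', $text), PHP_EOL;     // My longest
echo completeWithNextWord('My lon', $text), PHP_EOL; // My longest text
```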
An easy method would be to split it on whitespace and grab the current array index plus the next one:
// Word to search for:
$findme = "text";
// Using preg_split() to split on any amount of whitespace
$words = preg_split('/\s+/', "My longest text to test");
// Find the word in the array with array_search()
// calling strtolower() with array_map() to search case-insensitively
$idx = array_search(strtolower($findme), array_map('strtolower', $words));
if ($idx !== FALSE) {
// If found, print the word and the following word from the array
// as long as the following one exists.
echo $words[$idx];
if (isset($words[$idx + 1])) {
echo " " . $words[$idx + 1];
}
}
// Prints:
// "text to"
I've got a string and I'd like to get everything after a certain value. The string always starts off with a set of numbers and then an underscore. I'd like to get the rest of the string after the underscore. So for example if I have the following strings and what I'd like returned:
"123_String" -> "String"
"233718_This_is_a_string" -> "This_is_a_string"
"83_Another Example" -> "Another Example"
How can I go about doing something like this?
strpos() finds the offset of the underscore, then substr() grabs everything from that offset plus 1 onwards.
$data = "123_String";
$whatIWant = substr($data, strpos($data, "_") + 1);
echo $whatIWant;
If you also want to check if the underscore character (_) exists in your string before trying to get it, you can use the following:
if (($pos = strpos($data, "_")) !== FALSE) {
$whatIWant = substr($data, $pos+1);
}
strtok is an overlooked function for this sort of thing. It is meant to be quite fast.
$s = '233718_This_is_a_string';
$firstPart = strtok( $s, '_' );
$allTheRest = strtok( '' );
Passing an empty string as the token list like this returns the rest of the string.
NB if there was nothing at all after the '_' you would get a FALSE value for $allTheRest which, as stated in the documentation, must be tested with ===, to distinguish from other falsy values.
Here is a method using explode():
$text = explode('_', '233718_This_is_a_string', 2)[1]; // Returns This_is_a_string
or:
$parts = explode('_', '233718_This_is_a_string', 2);
$text = end($parts); // end() takes its argument by reference, so pass a variable
By passing 2 as the limit parameter, explode() returns an array with at most 2 elements, split at the first delimiter. The second element ([1]) is therefore the rest of the string.
Here is another one-liner using strpos() (as suggested by @flu):
$needle = '233718_This_is_a_string';
$text = substr($needle, (strpos($needle, '_') ?: -1) + 1); // Returns This_is_a_string
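I use strrchr(). For instance, to find the extension of a file I use it like this:
$string = 'filename.jpg';
$extension = strrchr( $string, '.'); //returns ".jpg"
Note that strrchr() returns the part starting at the last occurrence, including the character itself, so for the strings in the question strrchr($str, '_') would return "_string".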
Another simple way, using strchr() or strstr():
$str = '233718_This_is_a_string';
echo ltrim(strstr($str, '_'), '_'); // This_is_a_string
In your case maybe ltrim() alone will suffice:
echo ltrim($str, '0..9_'); // This_is_a_string
But only if the right part of the string (after _) does not start with numbers, otherwise it will also be trimmed.
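If the right part may itself start with digits, a regular expression that strips exactly one "digits followed by an underscore" prefix avoids that problem. A minimal sketch; the function name stripNumericPrefix() is hypothetical:

```php
<?php
// Hypothetical helper: remove a leading run of digits plus ONE underscore, keep the rest as-is
function stripNumericPrefix(string $s): string
{
    return preg_replace('/^\d+_/', '', $s);
}

foreach (['123_String', '233718_This_is_a_string', '83_Another Example', '83_42_start'] as $s) {
    echo stripNumericPrefix($s), PHP_EOL; // the last one keeps its digits: "42_start"
}
```

Strings without such a prefix are returned unchanged.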
If anyone needs to extract the first part of the string instead, try:
Code:
$s = "This_is_a_string_233718";
$text = substr($s, 0, strrpos($s, "_"));
Output:
This_is_a_string
$string = "233718_This_is_a_string";
$withCharacter = strstr($string, '_'); // "_This_is_a_string"
echo substr($withCharacter, 1); // "This_is_a_string"
In a single statement:
echo substr(strstr("233718_This_is_a_string", '_'), 1); // "This_is_a_string"
If you want to get everything after certain characters and if those characters are located at the beginning of the string, you can use an easier solution like this:
$value = substr( '123_String', strlen( '123_' ) );
echo $value; // String
Use this line to return the part of the string after the last occurrence of the symbol, or the original string if the symbol does not occur:
$newString = substr($string, (strrpos($string, '_') ?: -1) +1);
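Because that line uses strrpos(), it returns the part after the last underscore, which differs from the question's expected output when there are several underscores. A quick comparison with the strpos() variant shown earlier; the sample strings are only illustrative:

```php
<?php
$string = '233718_This_is_a_string';

// After the FIRST underscore (strpos), falling back to the whole string
$afterFirst = substr($string, (strpos($string, '_') ?: -1) + 1); // "This_is_a_string"

// After the LAST underscore (strrpos)
$afterLast = substr($string, (strrpos($string, '_') ?: -1) + 1); // "string"

// No underscore at all: the fallback returns the original string
$none = substr('NoUnderscore', (strrpos('NoUnderscore', '_') ?: -1) + 1); // "NoUnderscore"
```

One caveat of the ?: shortcut in both variants: an underscore at offset 0 is treated like "not found", because 0 is falsy.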
Is there a nice way to iterate on the characters of a string? I'd like to be able to do foreach, array_map, array_walk, array_filter etc. on the characters of a string.
Type casting/juggling didn't get me anywhere (it puts the whole string into a single array element), and the best solution I've found is simply using a for loop to construct the array. It feels like there should be something better. I mean, if you can index into it, shouldn't you be able to iterate over it as well?
This is the best I've got
function stringToArray($s)
{
$r = array();
for($i=0; $i<strlen($s); $i++)
$r[$i] = $s[$i];
return $r;
}
$s1 = "textasstringwoohoo";
$arr = stringToArray($s1); //$arr now has character array
$ascval = array_map('ord', $arr); //so i can do stuff like this
foreach ($arr as $curChar) { /* ... */ }
$evenAsciiOnly = array_filter($arr, function($x) {return ord($x) % 2 === 0;});
Is there either:
A) A way to make the string iterable
B) A better way to build the character array from the string (and if so, how about the other direction?)
I feel like I'm missing something obvious here.
Use str_split to iterate ASCII strings (since PHP 5.0)
If your string contains only ASCII (i.e. "English") characters, then use str_split.
$str = 'some text';
foreach (str_split($str) as $char) {
var_dump($char);
}
Use mb_str_split to iterate Unicode strings (since PHP 7.4)
If your string might contain Unicode (i.e. "non-English") characters, then you must use mb_str_split.
$str = 'μυρτιὲς δὲν θὰ βρῶ';
foreach (mb_str_split($str) as $char) {
var_dump($char);
}
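Regarding part B of the question (the other direction), implode() turns an array of characters back into a string. A small round-trip sketch; the particular transformations are just examples:

```php
<?php
$str = 'some text';

// String -> array of characters (bytes, so this is fine for ASCII)
$chars = str_split($str);

// Any array function now works, e.g. upper-casing every other character
$mapped = array_map(
    fn (string $c, int $i): string => $i % 2 === 0 ? strtoupper($c) : $c,
    $chars,
    array_keys($chars)
);

// Keep only characters with an even ASCII code (array first, callback second)
$evenAsciiOnly = array_filter($chars, fn (string $c): bool => ord($c) % 2 === 0);

// Array of characters -> string again
$back = implode('', $mapped);
echo $back, PHP_EOL;                       // SoMe tExT
echo implode('', $evenAsciiOnly), PHP_EOL; // " txt"
```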
Iterate string:
for ($i = 0; $i < strlen($str); $i++){
echo $str[$i];
}
If your strings are in Unicode you should use preg_split with /u modifier
From comments in php documentation:
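// PHP 7.4+ ships a built-in mb_str_split(), so only define this fallback when it is missing
if (!function_exists('mb_str_split')) {
function mb_str_split( $string ) {
# Split at every position that is not after the start: ^
# and not before the end: $
return preg_split('/(?<!^)(?!$)/u', $string );
}
}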
You can also just access $s1 like an array, if you only need to access it:
$s1 = "hello world";
echo $s1[0]; // -> h
For those who are looking for the fastest way to iterate over strings in PHP, I've prepared a benchmark.
The first method, in which you access string characters directly by specifying their position in brackets, treating the string like an array:
$string = "a sample string for testing";
$char = $string[4]; // equals to m
I thought this would be the fastest method, but I was wrong.
As with the second method (which is used in the accepted answer):
$string = "a sample string for testing";
$string = str_split($string);
$char = $string[4]; // equals to m
This method is faster because we are indexing a real array rather than treating a string as one.
Calling the last line of each of the above methods 1,000,000 times led to these benchmarking results:
Using string[i]
0.24960017204285 Seconds
Using str_split
0.18720006942749 Seconds
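This means the second method is noticeably faster for repeated access; note that the timing covers only the access line, not the one-time str_split() call.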
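If you want to reproduce the measurement, a sketch along these lines times both access styles. Like the numbers above, it leaves the one-time str_split() cost out of the timing, and results will vary between PHP versions and machines:

```php
<?php
// Rough micro-benchmark sketch; absolute numbers depend on PHP version and hardware
$string = 'a sample string for testing';
$array = str_split($string); // one-time conversion, deliberately outside the timed loops
$n = 1000000;

$start = microtime(true);
for ($i = 0; $i < $n; $i++) {
    $char = $string[4]; // index the string directly
}
$stringTime = microtime(true) - $start;

$start = microtime(true);
for ($i = 0; $i < $n; $i++) {
    $char = $array[4]; // index the pre-split array
}
$arrayTime = microtime(true) - $start;

printf("string[i]: %.4f s\narray[i]:  %.4f s\n", $stringTime, $arrayTime);
```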
Most of the answers forget about non-English characters!
strlen() counts bytes, not characters. It and its sibling functions work fine with English text because English characters take 1 byte in both UTF-8 and ASCII, but for other characters you need the multibyte string functions (mb_*).
This will work with any character encoded in UTF-8
// 8 characters in 12 bytes
$string = "abcdأبتث";
$charsCount = mb_strlen($string, 'UTF-8');
for($i = 0; $i < $charsCount; $i++){
$char = mb_substr($string, $i, 1, 'UTF-8');
var_dump($char);
}
This outputs
string(1) "a"
string(1) "b"
string(1) "c"
string(1) "d"
string(2) "أ"
string(2) "ب"
string(2) "ت"
string(2) "ث"
Expanded from @SeaBrightSystems' answer, you could try this:
$s1 = "textasstringwoohoo";
$arr = str_split($s1); //$arr now has character array
Hmm... There's no need to complicate things. The basics always work great.
$string = 'abcdef';
$len = strlen( $string );
$x = 0;
Forward Direction:
while ( $len > $x ) echo $string[ $x++ ];
Outputs: abcdef
Reverse Direction:
while ( $len ) echo $string[ --$len ];
Outputs: fedcba
// Unicode Codepoint Escape Syntax in PHP 7.0
$str = "cat!\u{1F431}";
// IIFE (Immediately Invoked Function Expression) in PHP 7.0
$gen = (function(string $str) {
for ($i = 0, $len = mb_strlen($str); $i < $len; ++$i) {
yield mb_substr($str, $i, 1);
}
})($str);
var_dump(
true === $gen instanceof Traversable,
// PHP 7.1
true === is_iterable($gen)
);
foreach ($gen as $char) {
echo $char, PHP_EOL;
}