How to compare two very large strings [duplicate] - php

Closed 10 years ago.
How can I compare two large strings, about 50 KB each, using PHP? I want to highlight the parts that differ.

Differences between two strings can also be found using XOR:
$s = 'the sky is falling';
$t = 'the pie is failing';
$d = $s ^ $t;
echo $s, "\n";
for ($i = 0, $n = strlen($d); $i != $n; ++$i) {
    echo $d[$i] === "\0" ? ' ' : '#';
}
echo "\n$t\n";
Output:
the sky is falling
    ###      #
the pie is failing
The XOR operation produces a string that contains "\0" (a NUL byte) at every position where the two strings match and some other byte where they differ. Note that the result is only as long as the shorter of the two strings. It won't be faster than comparing both strings character by character, but it is useful if you only want to know the position of the first difference, which strspn() can find.
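As a rough sketch of the strspn() idea: the function name first_difference() is made up for illustration, and the extra length check is my own addition to cover strings of unequal length.

```php
<?php
// Returns the byte offset of the first difference, or -1 if the strings
// are identical. The function name is hypothetical.
function first_difference(string $s, string $t): int
{
    // XOR yields "\0" wherever the bytes match; strspn() counts the
    // leading run of "\0" bytes, i.e. the length of the common prefix.
    $common = strspn($s ^ $t, "\0");
    // XOR stops at the shorter string, so compare the lengths as well.
    if ($common === strlen($s) && $common === strlen($t)) {
        return -1;
    }
    return $common;
}

echo first_difference('the sky is falling', 'the pie is failing'), "\n"; // 4
echo first_difference('abc', 'abcd'), "\n";                              // 3
echo first_difference('same', 'same'), "\n";                             // -1
```

This works on bytes, so with multibyte text the offset is a byte position, not a character position.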

Do you want diff-style output?
Perhaps this is what you want: https://github.com/paulgb/simplediff/blob/5bfe1d2a8f967c7901ace50f04ac2d9308ed3169/simplediff.php
ADDED:
Or, if you want to report every character that is different, you can use a PHP script like this:
// Compare only up to the length of the shorter string
$n = min(strlen($string1), strlen($string2));
for ($i = 0; $i < $n; $i++) {
    if ($string1[$i] != $string2[$i]) {
        echo "Char $i is different ({$string1[$i]} != {$string2[$i]})<br />\n";
    }
}
Perhaps if you can tell us in more detail how you would like to compare the strings, or give us some examples, it would be easier for us to suggest an answer.

A small modification to @Alvin's script:
I tested it on my local server with a 50 KB lorem ipsum string: I replaced every "a" with "4" and the script highlighted them all. It runs pretty fast.
<?php
$string1 = "This is a sample text to test a script to highlight the differences between 2 strings, so the second string will be slightly different";
$string2 = "This is 2 s4mple text to test a scr1pt to highlight the differences between 2 strings, so the first string will be slightly different";
$string3 = array();
$string4 = array();
// Compare position by position, up to the length of the shorter string
$n = min(strlen($string1), strlen($string2));
for ($i = 0; $i < $n; $i++) {
    if ($string1[$i] != $string2[$i]) {
        $string3[$i] = "<mark>{$string1[$i]}</mark>";
        $string4[$i] = "<mark>{$string2[$i]}</mark>";
    } else {
        $string3[$i] = $string1[$i];
        $string4[$i] = $string2[$i];
    }
}
// The unmatched tail of the longer string is appended without highlighting
$string3 = implode("", $string3) . substr($string1, $n);
$string4 = implode("", $string4) . substr($string2, $n);
echo $string3 . "<br />" . $string4;
?>
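A possible variation on the same positional comparison, as a sketch only: wrap each run of differing characters in a single <mark> instead of one tag per character, and escape the text for HTML. The helper name highlight_diff() is made up, and the code assumes single-byte (ASCII) text, because it escapes one byte at a time.

```php
<?php
// Hypothetical helper: returns $a as HTML, with every run of characters
// that differ from $b wrapped in one <mark> element.
// Assumption: single-byte text (each byte is escaped on its own).
function highlight_diff(string $a, string $b): string
{
    $out = '';
    $inMark = false;
    $n = strlen($a);
    for ($i = 0; $i < $n; $i++) {
        // A position differs if $b is too short or the bytes are unequal.
        $differs = !isset($b[$i]) || $a[$i] !== $b[$i];
        if ($differs && !$inMark) {
            $out .= '<mark>';
            $inMark = true;
        } elseif (!$differs && $inMark) {
            $out .= '</mark>';
            $inMark = false;
        }
        $out .= htmlspecialchars($a[$i]);
    }
    // Close a run that reaches the end of the string.
    return $inMark ? $out . '</mark>' : $out;
}

echo highlight_diff('the sky is falling', 'the pie is failing'), "\n";
// the <mark>sky</mark> is fa<mark>l</mark>ling
```

Like the scripts above, this only compares characters at the same position, so a single inserted or deleted character marks everything after it as different; for that case a real diff such as the simplediff link above is the better fit.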

Related

Split a long string not using space

If I have sentences like this:
$msg = "hello how are you?are you fine?thanks..";
and I wish to separate it into 3 parts (or any other number).
So I'm doing this:
$msglen = strlen($msg);
$length = (int) ceil($msglen / 3); // length of each part
$a = 0;
$parts = array();
for ($i = 0; $i < 3; $i++) {
    $parts[] = substr($msg, $a, $length);
    $a = $a + $length;
}
So the output should be..
hello how are
[a space here->] you?are you [<-a space here]
fine?thanks..
So is it possible to split in the middle of a word instead, so that no part starts or ends with a space?
For example, "thank you" -> "than" and "k you" instead of "thank" and " you".
I'm writing a conversion function, and a space at the start or end of a part affects the conversion. The space itself is needed for the conversion, so I can't ignore or delete it.
Thanks.
I take it you can't use trim because the message formed by the joined up strings must be unchanged?
That could get complicated. You could make something that tests for a space after the split, and if a space is detected, makes the split one character earlier. Fairly easy, but what if you have two spaces together? Or a single lettered word? You can of course recursively test this way, but then you may end up with split strings of lengths that are very different from each other.
You need to properly define the constraints you want this to function within.
Please state exactly what you want to do - do you want each section to be equal? Is the splitting in between words of a higher priority than this, so that the lengths do not matter much?
EDIT:
Then, if you aren't worried about the length, you could do something like this (starting with Erik's code and then changing the lengths by shifting characters around the spaces):
$msg = "hello how are you?are you fine?thanks..";
$parts = split_without_spaces($msg, 3);
function split_without_spaces($msg, $parts) {
    // Trim first: spaces at the very start and end of the message
    // cannot be moved anywhere
    $msg = trim($msg);
    $parts = str_split($msg, (int) ceil(strlen($msg) / $parts));
    // Loop to count($parts) - 1 because the last part needs no manipulation
    for ($i = 0; $i < (count($parts) - 1); $i++) {
        $k = $i + 1;
        // Check the last character of this part and the first of the next part for a space
        if (substr($parts[$i], -1) == ' ' || $parts[$k][0] == ' ') {
            // Option 1: move characters from this part to the next.
            // Search backwards for the last two consecutive non-space characters
            $num1 = 1;
            $len1 = strlen($parts[$i]);
            while ($parts[$i][$len1 - $num1] == ' ' || $parts[$i][$len1 - $num1 - 1] == ' ') {
                $num1++;
                if ($len1 - $num1 - 2 < 0) return false;
            }
            // Option 2: move characters from the next part to this one.
            // Search forwards for the first two consecutive non-space characters
            $num2 = 1;
            $len2 = strlen($parts[$k]);
            while ($parts[$k][$num2 - 1] == ' ' || $parts[$k][$num2] == ' ') {
                $num2++;
                if ($num2 >= $len2 - 1) return false;
            }
            // Pick whichever option moves fewer characters
            if ($num1 > $num2) {
                $parts[$i] .= substr($parts[$k], 0, $num2);
                $parts[$k] = substr($parts[$k], -1 * ($len2 - $num2));
            } else {
                $parts[$k] = substr($parts[$i], -1 * ($num1)) . $parts[$k];
                $parts[$i] = substr($parts[$i], 0, $len1 - $num1);
            }
        }
    }
    return $parts;
}
This takes care of multiple spaces and single-letter words. However, if they exist, the lengths of the parts may be very uneven. It could get messed up in extreme cases: if you have a string made up mainly of spaces, it could return one part as empty, or return false if it can't manage the split at all. Please test it thoroughly.
EDIT2:
By the way, it'd be far better for you to change your approach in some way :) I seriously doubt you'd actually have to use a function like this in practice. Well.. I hope you do actually have a solid reason to, it was somewhat fun coming up with it.
If you simply want to eliminate leading and trailing spaces, consider using trim on each result of your split.
If you want to split the string into exact thirds, there is no way to know in advance where the cuts will fall: maybe inside a word, maybe between words.
Your code can be simplified to:
$msg = "hello how are you?are you fine?thanks..";
$parts = str_split($msg, ceil(strlen($msg)/3));
Note that ceil() is needed, otherwise you might get 4 elements out because of rounding.
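To see the rounding effect, here is a small sketch. It uses a 40-character string (the message from the question with one extra "!" appended, which is my own change) because the original 39 characters happen to divide evenly by 3.

```php
<?php
$msg = "hello how are you?are you fine?thanks..!"; // 40 characters

// Rounding the chunk length down (13) leaves a one-character fourth piece.
$floored = str_split($msg, (int) floor(strlen($msg) / 3));
// Rounding up (14) guarantees at most three pieces.
$ceiled = str_split($msg, (int) ceil(strlen($msg) / 3));

echo count($floored), "\n"; // 4
echo count($ceiled), "\n";  // 3
```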
You're probably looking for str_split, chunk_split or wordwrap.
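For comparison, a quick sketch of how the three functions behave; the sample strings are arbitrary.

```php
<?php
$text = "The quick brown fox sat over the lazy dog";

// str_split(): fixed-size pieces as an array, cutting through words.
$pieces = str_split("abcdefgh", 3); // ["abc", "def", "gh"]

// chunk_split(): one string with a separator appended after every piece.
$chunked = chunk_split("abcdefgh", 3, "|");
echo $chunked, "\n"; // abc|def|gh|

// wordwrap(): breaks at spaces where possible, so words stay intact.
$wrapped = wordwrap($text, 15, "\n", true);
echo $wrapped, "\n";
// The quick brown
// fox sat over
// the lazy dog
```

Note that wordwrap() does the opposite of what the question asks for: it keeps words whole and breaks at the spaces.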

Counting plagiarism in PHP

Forgive me if this isn't a programming oriented question.
Let's say we have two sentences:
[1]=This is a test idea
[2]=This is an experimental idea
If I jumble up [1]:
[1]= a This idea test is
Would this count as plagiarism? What sort of logic do I have to apply to detect plagiarism?
I'm not building a complex plagiarism service, just a rather simple one that can catch obvious plagiarism.
My logic is somewhat like this:
<?php
$str1 = "This is a test idea.";
$str2 = "This is an experimental idea.";
echo "$str1<br>$str2<br>";
$str1Array = explode(" ", $str1);
$str2Array = explode(" ", $str2);
$max = max(count($str1Array), count($str2Array));
$word_seq = array();
$word_seq_history = array();
$c = 0;
$plag_count = 0;
for ($i = 0; $i < $max; $i++) {
    // A distance of 0 means an exact match; '' stands in for a missing word
    $lev = levenshtein($str1Array[$i] ?? '', $str2Array[$i] ?? '');
    if ($lev == 0) {
        $c += 1; // exact match
        //echo "<br>$c";
        $word = $str1Array[$i];
        array_push($word_seq, $word);
    } else {
        // Only a run of two or more matching words adds to the count
        if ($c >= 2) {
            $plag_count += count($word_seq);
        }
        $current_seq = implode(" ", $word_seq);
        array_push($word_seq_history, $current_seq);
        echo $current_seq;
        $c = 0;
        $word_seq = array();
    }
}
echo "plag_count:";
echo $plag_count;
echo "max:";
echo $max;
echo "<br>";
echo ($plag_count / $max) * 100;
?>
Output:
String 1: "This is a test idea."
String 2: "This is an experimental idea."
Words_Same:2 max:5
Plagiarism: 40%
Do I need to change it or is it fine the way it is?
What I would do to detect plagiarism in a very basic way is to calibrate the system first, that is, run a lot of comparisons on files that you are sure are not plagiarized:
1) Compare a bunch of files with each other and measure the plagiarism rate with your function. Pick out the most commonly used words (enough to bring your rate down to XX%; this is trial and error), store them in your database and give them a weight of 0. Repeat without these words (you can filter them out with regular expressions) until the rate drops below the next threshold, and give that batch a weight of 1. And so on, until you reach a plagiarism rate of nearly zero.
2) Calculate the new percentage as sum(weights of the words in your database that appear in the text) / (total weight of all your words), giving words that are not yet in your database a weight of 10. That is your rate.
3) Test it with plagiarized material; if the result is not OK, change a few parameters (weights).
I think this method, if used to check longer passages, will show a high level of correlation just because of common words, especially articles, prepositions, "be" verbs, and other common or overused words. If you're writing about a variety of subjects, be it code or Shakespeare, you're likely to run across sets of jargon that are common to many genuinely unique papers. I think you may need to look at an alternative approach. Have you done any research into plagiarism and its detection?
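One way to picture both concerns (jumbled word order and common words) is an order-independent word overlap with common words filtered out. This is only a sketch of a different technique (Jaccard similarity on word sets), not what any of the answers implement; the stop-word list and the function names are made up for illustration.

```php
<?php
// Hypothetical stop-word list; a real one would be much longer.
const STOP_WORDS = ['a', 'an', 'the', 'is', 'this', 'of', 'to'];

// Lowercase a sentence, split it into words, and drop stop words.
function content_words(string $text): array
{
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_unique(array_diff($words, STOP_WORDS)));
}

// Jaccard similarity: shared words divided by all distinct words.
function word_overlap(string $a, string $b): float
{
    $wa = content_words($a);
    $wb = content_words($b);
    $union = count(array_unique(array_merge($wa, $wb)));
    return $union === 0 ? 0.0 : count(array_intersect($wa, $wb)) / $union;
}

// Jumbling the words does not change the score.
echo word_overlap('This is a test idea', 'a This idea test is'), "\n"; // 1
// Only "idea" is shared out of {test, idea, experimental}: about 0.33.
echo word_overlap('This is a test idea', 'This is an experimental idea'), "\n";
```

Ignoring word order entirely has its own cost: two unrelated texts that happen to use the same vocabulary will score high, so the thresholds would still need calibrating as described above.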

Php set value as a number

How do I output a value as a number in php? I suspect I have a php value but it is outputting as text and not as a number.
Thanks
Here is the code - Updated for David from question below
<?php
if (preg_match('/\-(\d+)\.asp$/', $pagename1, $a)) {
    $pageNumber = $a[1];
} else {
    // failed to match number from URL
}
?>
If I use it in this code, it does not seem to work:
$maxRows_rs_datareviews = 10;
$pageNum_rs_datareviews = $pagename1; // <-- This is where I want to use it.
if (isset($_GET['pageNum_rs_datareviews'])) {
$pageNum_rs_datareviews = $_GET['pageNum_rs_datareviews'];
}
If I make the page number a static number like 3, the code works; if I use $pagename1, it does not. This makes me think $pagename1 is not seen as a number?
My stupidity! I used $pagename1 instead of $pageNumber.
What kind of number? An integer, decimal, float, something else?
Probably the easiest method is to use printf(), eg
printf('The number %d is an integer', $number);
printf('The number %0.2f has two decimal places', $number);
This might be blindingly obvious but it looks like you want to use
$pageNum_rs_datareviews = $pageNumber;
and not
$pageNum_rs_datareviews = $pagename1;
echo (int)$number; // integer 123
echo (float)$number; // float 123.45
would be the easiest
I prefer to use number_format:
echo number_format(56.30124355436,2).'%'; // 56.30%
echo number_format(56.30124355436,0).'%'; // 56%
$num = 5;
echo $num;
Any output is text, since it's output. It doesn't matter what type the value you're outputting has, since the human eye will see it as text. What matters is how you treat it in the code.
Converting (casting) a string to a number is different. You can do stuff like:
$num = (int) $string;
$num = intval($string);
Googling php string to number should give you a beautiful array of choices.
Edit: To scrape a number from something, you can use preg_match('/\d+/', $string, $number). $number[0] will then contain the first number found in $string; to collect all of them, use preg_match_all() instead.
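A short sketch of both calls. The page name 'reviews-3.asp' is a made-up value, since the question never shows what $pagename1 actually contains.

```php
<?php
// Hypothetical page name; the question never shows the real value.
$pagename1 = 'reviews-3.asp';
$pageNumber = null;

// preg_match() stops at the first match. $m[0] holds the whole match and
// $m[1] the first capture group, both as strings.
if (preg_match('/-(\d+)\.asp$/', $pagename1, $m)) {
    $pageNumber = (int) $m[1]; // cast the captured digits to an integer
}
var_dump($pageNumber); // int(3)

// preg_match_all() collects every match; $all[0] lists the full matches.
preg_match_all('/\d+/', 'page 3 of 12', $all);
print_r($all[0]); // Array ( [0] => 3 [1] => 12 )
```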

How to convert some character into numeric in php?

I need help to change a character in php.
I got some code from the web:
char dest='a';
int conv=(int)dest;
Can I use this code to convert a character into numeric? Or do you have any ideas?
I just want to show the result as a decimal number:
if null == 0
if A == 1
Use ord() to return the ASCII value. Subtract 96 from the value of a lowercase letter to get a number where a=1, b=2, and so on.
Upper and lower case letters have different ASCII values, so if you want to handle them the same, you can use strtolower() to convert upper case to lower case.
To handle the NULL case, simply use if($dest). This will be true if $dest is anything other than a falsy value such as NULL, 0, '0' or the empty string.
PHP is a loosely typed language, so there is no need to declare the types. So char dest='a'; is incorrect. Variables have $ prefix in PHP and no type declaration, so it should be $dest = 'a';.
Live Example
<?php
function toNumber($dest)
{
    if ($dest)
        return ord(strtolower($dest)) - 96;
    else
        return 0;
}
// Let's test the function...
echo toNumber(NULL) . " ";
echo toNumber('a') . " ";
echo toNumber('B') . " ";
echo toNumber('c');
// Output is:
// 0 1 2 3
?>
The cast does indeed work as in your sample, except that you should be using PHP syntax. (As a side note: in the language that snippet was most probably written in, the cast does not do the same thing.)
So:
$in = "123";
$out = (int)$in;
Afterwards the following will be true:
$out === 123
This may help you:
http://www.php.net/manual/en/function.ord.php
So, if you need the ASCII code you will need to do:
$dest = 'a';
$conv = ord($dest);
If you want something like:
a == 1
b == 2
.
.
.
you should do:
$dest = 'a';
$conv = ord($dest)-96;
For more info on the ASCII codes: http://www.asciitable.com/
And for the function ord: http://www.php.net/manual/en/function.ord.php
This is very hard to answer, because it's not a complete question but just a small part of one.
But since you ask:
It seems you need some translation table that defines the links between letters and numbers:
A -> 2
B -> 3
C -> 4
S -> 1
or whatever.
You can achieve this with an array whose keys are the letters and whose values are the desired numbers:
$defects_arr = array(
    'A' => 2,
    'B' => 3,
    'C' => 4,
    'S' => 1
);
You can then convert these letters to numbers:
$letter = 'A';
$number = $defects_arr[$letter];
echo $number; // outputs 2
But it still seems this is not what you want.
Do these defect types have any verbose equivalents? If so, why not use them instead of letters?
Telling the whole story instead of a small part of it will help you avoid mistakes and will save a lot of time, both yours and that of the people answering.
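If what you are looking for is to convert RT0005 to 5:
$max = 'RT0005';
return base_convert($max, 10, 10);
// returns the string "5": base_convert() ignores characters that are not
// valid digits (since PHP 7.4 this raises a deprecation notice)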
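If relying on base_convert() skipping invalid characters feels too fragile, one alternative is to pull the digits out explicitly. A minimal sketch, where numeric_part() is a made-up helper name:

```php
<?php
// Hypothetical helper: returns the first run of digits in $code as an
// integer, or null if $code contains no digits at all.
function numeric_part(string $code): ?int
{
    if (preg_match('/\d+/', $code, $m)) {
        return (int) $m[0]; // "0005" becomes 5
    }
    return null;
}

var_dump(numeric_part('RT0005')); // int(5)
var_dump(numeric_part('RT'));     // NULL
```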

simplest, shortest way to count capital letters in a string with php?

I am looking for the shortest, simplest and most elegant way to count the number of capital letters in a given string.
function count_capitals($s) {
    return mb_strlen(preg_replace('![^A-Z]+!', '', $s));
}
$str = "AbCdE";
preg_match_all("/[A-Z]/", $str); // 3
George Garchagudashvili's solution is great, but it fails if the lowercase letters contain diacritics or accents.
So I made a small fix to his version so that it also works with accented lowercase letters:
public static function countCapitalLetters($string){
    $lowerCase = mb_strtolower($string);
    return strlen($lowerCase) - similar_text($string, $lowerCase);
}
You can find this method and many other common string operations in the turbocommons library:
https://github.com/edertone/TurboCommons/blob/70a9de1737d8c10e0f6db04f5eab0f9c4cbd454f/TurboCommons-Php/src/main/php/utils/StringUtils.php#L373
EDIT 2019
The method to count capital letters in turbocommons has evolved to a method that can count upper case and lower case characters on any string. You can check it here:
https://github.com/edertone/TurboCommons/blob/1e230446593b13a272b1d6a2903741598bb11bf2/TurboCommons-Php/src/main/php/utils/StringUtils.php#L391
Read more info here:
https://turbocommons.org/en/blog/2019-10-15/count-capital-letters-in-string-javascript-typescript-php
And it can also be tested online here:
https://turbocommons.org/en/app/stringutils/count-capital-letters
I'd give another solution, maybe not elegant, but helpful:
$mixed_case = "HelLo wOrlD";
$lower_case = strtolower($mixed_case);
$similar = similar_text($mixed_case, $lower_case);
echo strlen($mixed_case) - $similar; // 4
It's not the shortest, but it is arguably the simplest as a regex doesn't have to be executed. Normally I'd say this should be faster as the logic and checks are simple, but PHP always surprises me with how fast and slow some things are when compared to others.
function capital_letters($s) {
    $u = 0;
    $d = 0;
    $n = strlen($s);
    for ($x = 0; $x < $n; $x++) {
        $d = ord($s[$x]);
        // ASCII codes 65 to 90 are the letters A to Z
        if ($d > 64 && $d < 91) {
            $u++;
        }
    }
    return $u;
}
echo 'caps: ' . capital_letters('HelLo2') . "\n";
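The [A-Z] and ord() approaches above only see ASCII capitals. If accented capitals should count too, a Unicode-aware regex is one option. A minimal sketch, assuming the input is valid UTF-8; the function name is made up:

```php
<?php
// Counts uppercase letters in a UTF-8 string, including accented ones.
function count_unicode_capitals(string $s): int
{
    // \p{Lu} matches any Unicode uppercase letter; the u flag makes the
    // pattern and subject be treated as UTF-8.
    return (int) preg_match_all('/\p{Lu}/u', $s);
}

echo count_unicode_capitals('HelLo wOrlD'), "\n";   // 4
echo count_unicode_capitals('Élan Vital Ça'), "\n"; // 3
```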
