Get the current + the next word in a string - php

this is what I try to get:
My longest text to test When I search for e.g. My I should get My longest
I tried it with this function to get first the complete length of the input and then I search for the ' ' to cut it.
$length = strripos($text, $input) + strlen($input)+2;
$stringpos = strripos($text, ' ', $length);
$newstring = substr($text, 0, strpos($text, ' ', $length));
But this only works first time and then it cuts after the current input, means
My lon is My longest and not My longest text.
How I must change this to get the right result, always getting the next word. Maybe I need a break, but I cannot find the right solution.
UPDATE
Here is my workaround till I find a better solution. As I said working with array functions does not work, since part words should work. So I extended my previous idea a bit. Basic idea is to differ between first time and the next. I improved the code a bit.
function get_title($input, $text) {
$length = strripos($text, $input) + strlen($input);
$stringpos = stripos($text, ' ', $length);
// Find next ' '
$stringpos2 = stripos($text, ' ', $stringpos+1);
if (!$stringpos) {
$newstring = $text;
} else if ($stringpos2) {
$newstring = substr($text, 0, $stringpos2);
} }
Not pretty, but hey it seems to work ^^. Anyway maybe someone of you have a better solution.

You can try using explode
$string = explode(" ", "My longest text to test");
$key = array_search("My", $string);
echo $string[$key] , " " , $string[$key + 1] ;
You can take i to the next level using case insensitive with preg_match_all
$string = "My longest text to test in my school that is very close to mY village" ;
var_dump(__search("My",$string));
Output
array
0 => string 'My longest' (length=10)
1 => string 'my school' (length=9)
2 => string 'mY village' (length=10)
Function used
function __search($search,$string)
{
$result = array();
preg_match_all('/' . preg_quote($search) . '\s+\w+/i', $string, $result);
return $result[0];
}

There are simpler ways to do that. String functions are useful if you don't want to look for something specific, but cut out a pre-defined length of something. Else use a regular expression:
preg_match('/My\s+\w+/', $string, $result);
print $result[0];
Here the My looks for the literal first word. And \s+ for some spaces. While \w+ matches word characters.
This adds some new syntax to learn. But less brittle than workarounds and lengthier string function code to accomplish the same.

An easy method would be to split it on whitespace and grab the current array index plus the next one:
// Word to search for:
$findme = "text";
// Using preg_split() to split on any amount of whitespace
// lowercasing the words, to make the search case-insensitive
$words = preg_split('/\s+/', "My longest text to test");
// Find the word in the array with array_search()
// calling strtolower() with array_map() to search case-insensitively
$idx = array_search(strtolower($findme), array_map('strtolower', $words));
if ($idx !== FALSE) {
// If found, print the word and the following word from the array
// as long as the following one exists.
echo $words[$idx];
if (isset($words[$idx + 1])) {
echo " " . $words[$idx + 1];
}
}
// Prints:
// "text to"

Related

PHP word censor with keeping the original caps

We want to censor certain words on our site but each word has different censored output.
For example:
PHP => P*P, javascript => j*vascript
(However not always the second letter.)
So we want a simple "one star" censor system but with keeping the original caps. The datas coming from the database are uncensored so we need the fastest way that possible.
$data="Javascript and php are awesome!";
$word[]="PHP";
$censor[]="H";//the letter we want to replace
$word[]="javascript";
$censor[]="a"//but only once (j*v*script would look wierd)
//Of course if it needed we can use the full censored word in $censor variables
Expected value:
J*vascript and p*p are awesome!
Thanks for all the answers!
You can put your censored words in key-based array, and value of the array should be the position of what char is replaced with * (see $censor array example bellow).
$string = 'JavaSCRIPT and pHp are testing test-ground for TEST ŠĐČĆŽ ŠĐčćŽ!';
$censor = [
'php' => 2,
'javascript' => 2,
'test' => 3,
'šđčćž' => 4,
];
function stringCensorSlow($string, array $censor) {
foreach ($censor as $word => $position) {
while (($pos = mb_stripos($string, $word)) !== false) {
$string =
mb_substr($string, 0, $pos + $position - 1) .
'*' .
mb_substr($string, $pos + $position);
}
}
return $string;
}
function stringCensorFast($string, array $censor) {
$pattern = [];
foreach ($censor as $word => $position) {
$word = '~(' . mb_substr($word, 0, $position - 1) . ')' . mb_substr($word, $position - 1, 1) . '(' . mb_substr($word, $position) . ')~iu';
$pattern[$word] = '$1*$2';
}
return preg_replace(array_keys($pattern), array_values($pattern), $string);
}
Use example :
echo stringCensorSlow($string, $censor);
# J*vaSCRIPT and p*p are te*ting te*t-ground for TE*T ŠĐČ*Ž ŠĐč*Ž!
echo stringCensorFast($string, $censor) . "\n";
# J*vaSCRIPT and p*p are te*ting te*t-ground for TE*T ŠĐČ*Ž ŠĐč*Ž!
Speed test :
foreach (['stringCensorSlow', 'stringCensorFast'] as $func) {
$time = microtime(true);
for ($i = 0; $i < 10000; $i++) {
$func($string, $censor);
}
$time = microtime(true) - $time;
echo "{$func}() took $time\n";
}
output on my localhost was :
stringCensorSlow() took 1.9752140045166
stringCensorFast() took 0.11587309837341
Upgrade #1: added multibyte character safe.
Upgrade #2: added example for preg_replace, which is faster than mb_substr. Tnx to AbsoluteƵERØ
Upgrade #3: added speed test loop and result on my local PC machine.
Make an array of words and replacements. This should be your fastest option in terms of processing, but a little more methodical to setup. Remember when you're setting up your patterns to use the i modifier to make each pattern case insensitive. You could ultimately pull these from a database into the arrays. I've hard-coded the arrays here for the example.
<!DOCTYPE html>
<html>
<meta content="text/html; charset=UTF-8" http-equiv="content-type">
<?php
$word_to_alter = array(
'!(j)a(v)a(script)(s|ing|ed)?!i',
'!(p)h(p)!i',
'!(m)y(sql)!i',
'!(p)(yth)o(n)!i',
'!(r)u(by)!i',
'!(ВЗЛ)О(М)!iu',
);
$alteration = array(
'$1*$2*$3$4',
'$1*$2',
'$1*$2',
'$1$2*$3',
'$1*$2',
'$1*$2',
);
$string = "Welcome to the world of programming. You can learn PHP, MySQL, Python, Ruby, and Javascript all at your own pace. If you know someone who uses javascripting in their daily routine you can ask them about becoming a programmer who writes JavaScripts. взлом прохладно";
$newstring = preg_replace($word_to_alter,$alteration,$string);
echo $newstring;
?>
</html>
Output
Welcome to the world of programming. You can learn P*P, M*SQL, Pyth*n,
R*by, and J*v*script all at your own pace. If you know someone who
uses j*v*scripting in their daily routine you can ask them about
becoming a programmer who writes J*v*Scripts. взл*м прохладно
Update
It works the same with UTF-8 characters, note that you have to specify a u modifier to make the pattern treated as UTF-8.
u (PCRE_UTF8)
This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern strings are treated as UTF-8. This
modifier is available from PHP 4.1.0 or greater on Unix and from PHP
4.2.3 on win32. UTF-8 validity of the pattern is checked since PHP 4.3.5.
Why not just use a little helper function and pass it a word and the desired censor?
function censorWord($word, $censor) {
if(strpos($word, $censor)) {
return preg_replace("/$censor/",'*', $word, 1);
}
}
echo censorWord("Javascript", "a"); // returns J*avascript
echo censorWord("PHP", "H"); // returns P*P
Then you can check the word against your wordlist and if it is a word that should be censored, you can pass it to the function. Then, you also always have the original word as well as the censored one to play with or put back in your sentence.
This would also make it easy to change the number of letters censored by just changing the offset in the preg_replace. All you have to do is keep an array of words, explode the sentence on spaces or something, and then check in_array. If it is in the array, send it to censorWord().
Demo
And here's a more complete example doing exactly what you said in the OP.
function censorWord($word, $censor) {
if(strpos($word, $censor)) {
return preg_replace("/$censor/",'*', $word, 1);
}
}
$word_list = ['php','javascript'];
$data = "Javascript and php are awesome!";
$words = explode(" ", $data);
// pass each word by reference so it can be modified inside our array
foreach($words as &$word) {
if(in_array(strtolower($word), $word_list)) {
// this just passes the second letter of the word
// as the $censor argument
$word = censorWord($word, $word[1]);
}
}
echo implode(" ", $words); // returns J*vascript and p*p are awesome!
Another Demo
You could store a lowercase list of the censored words somewhere, and if you're okay with starring the second letter every time, do something like this:
if (in_array(strtolower($word), $censored_words)) {
$word = substr($word, 0, 1) . "*" . substr($word, 2);
}
If you want to change the first occurrence of a letter, you could do something like:
$censored_words = array('javascript' => 'a', 'php' => 'h', 'ruby' => 'b');
$lword = strtolower($word);
if (in_array($lword, array_keys($censored_words))) {
$ind = strpos($lword, $censored_words[$lword]);
$word = substr($word, 0, $ind) . "*" . substr($word, $ind + 1);
}
This is what I would do:
Create a simple database (text file) and make a "table" of all your censored words and expected censored results. E.G.:
PHP --- P*P
javascript --- j*vascript
HTML --- HT*L
Write PHP code to compare the database information to your simple censored file. You will have to use array explode to create an array of only words. Something like this:
/* Opening database of censored words */
$filename = "/files/censored_words.txt";
$file = fopen( $filename, "r" );
if( $file == false )
{
echo ( "Error in opening file" );
exit();
}
/* Creating an array of words from string*/
$data = explode(" ", $data); // What was "Javascript and PHP are awesome!" has
// become "Javascript", "and", "PHP", "are",
// "awesome!". This is useful.
If your script finds matching words, replace the word in your data with the censored word from your list. You would have to delimit the file first by \r\n and finally by ---. (Or whatever you choose for separating your table with.)
Hope this helped!

PHP - Turn this string: "adc 25...123.50 xyz" into 2 variables: "25" and "123.50"?

The title almost much sums what i am trying to accomplish.
I have a string that could consist of letters in the alphabet or, numbers or characters like ")" and "*". It may also include a numeric string separated by three dots "...", e.g. "25...123.50".
An example of this string could be:
peaches* 25...123.50 +("apples") or -(peaches*) apples* 25...123.50
Now, what i would like to do is capture the numbers before and after the three dots, so i end up with 2 variables, 25 and 123.50. I would then like to trim the string so that i end up with a string that excludes the number values:
peaches* +("apples") or -(peaches*) apples*
So essentially:
$string = 'peaches* 25...123.50 +("apples")';
if (preg_match("/\.\.\./", $string ))
{
# How do i get the left value (could or could not be a decimal, using .)
$from = 25;
# How do i get the right value (could or could not be a decimal, using .)
$to = 123.50;
# How do i remove the value "here...here" is this right?
$clean = preg_replace('/'.$from.'\.\.\.'.$to.'/', '', $string);
$clean = preg_replace('/ /', ' ', $string);
}
If anyone could provide me with some input on the best way to go about this complicated task it would be greatly appreciated! Any suggestions, advice, input, feedback or comments are most welcome, Thank you!
This preg_match should work:
$str = 'peaches* 25...123.50 +("apples")';
if (preg_match('~(\d+(?:\.\d+)?)\.{3}(\d+(?:\.\d+)?)~', $str, $arr))
print_r($arr);
Pseudo code
In a loop:
Perform a strpos for "..." and substr at that position. Then go back from the end of that substring (character by character), checking to see if each is_numeric or a period. On the first non-numeric/non-period occurrence, you grab a substring from the beginning of the original string to that point (store it temporarily). Then start checking for is_numeric or period in the other direction. Grab a substring and add it to the other substring you stored. Repeat.
It's not a regex, but it will accomplish the same goal nonetheless.
Some php
$my_string = "blah blah abc25.4...123.50xyz blah blah etc";
$found = 1;
while($found){
$found = $cursor = strpos($my_string , "...");
if(!empty($found)){
//Go left
$char = ".";
while(is_numeric($char) || $char == "."){
$cursor--;
$char = substr($my_string , $cursor, 1);
}
$left_substring = substr($my_string , 1, $cursor);
//Go right
$cursor = $found + 2;
$char = ".";
while(is_numeric($char) || $char == "."){
$cursor++;
$char = substr($my_string , $cursor, 1);
}
$right_substring = substr($my_string , $cursor);
//Combine the left and right
$my_string = $left_substring . $right_substring;
}
}
echo $my_string;

Check stock tickers in string against array

Consider the following array which holds all US stock tickers, ordered by length:
$tickers = array('AAPL', 'AA', 'BRK.A', 'BRK.B', 'BAE', 'BA'); // etc...
I want to check a string for all possible matches. Tickers are written with or without a "$" concatenated to the front:
$string = "Check out $AAPL and BRK.A, BA and BAE.B - all going up!";
All tickers are to be labeled like: {TICKER:XX}. The expected output would be:
Check out {TICKER:AAPL} and {TICKER:BRK.A} and BAE.B - all going up!
So tickers should be checked against the $tickers array and matched both if they are followed by a space or a comma. Until now, I have been using the following:
preg_replace('/\$([a-zA-Z.]+)/', ' {TICKER:$1} ', $string);
so I didn't have to check against the $tickers array. It was assumed that all tickers started with "$", but this only appears to be the convention in about 80% of the cases. Hence, the need for an updated filter.
My question being: is there a simple way to adjust the regex to comply with the new requirement or do I need to write a new function, as I was planning first:
function match_tickers($string) {
foreach ($tickers as $ticker) {
// preg_replace with $
// preg_replace without $
}
}
Or can this be done in one go?
Just make the leading dollar sign optional, using ? (zero or 1 matches). Then you can check for legal trailing characters using the same technique. A better way to go about it would be to explode your input string and check/replace each substring against the ticker collection, then reconstruct the input string.
function match_tickers($string) {
$aray = explode( " ", $string );
foreach ($aray as $word) {
// extract any ticker symbol
$symbol = preg_replace( '/^\$?([A-Za-z]?\.?[A-Za-z])\W*$/', '$1', $word );
if (in_array($symbol,$tickers)) { // symbol, replace it
array_push( $replacements, preg_replace( '/^\$?([A-Za-z]?\.?[A-Za-z])(\W*)$/', '{TICKER:$1}$2', $word ) );
}
else { // not a symbol, just output it normally
array_push( $replacements, $word );
}
}
return implode( " ", $replacements );
}
I think just a slight change to your regex should do the trick:
\$?([a-zA-Z.]+)
i added "?" in front of the "$", which means that it can appear 0 or 1 times
You can use a single foreach loop on your array to replace the ticker items in your string.
$tickers = array('AAPL', 'AA', 'BRK.A', 'BRK.B', 'BAE', 'BA');
$string = 'Check out $AAPL and BRK.A, BA and BAE.B - all going up!';
foreach ($tickers as $ticker) {
$string = preg_replace('/(\$?)\b('.$ticker.')\b(?!\.[A-Z])/', '{TICKER:$2}', $string);
}
echo $string;
will output
Check out {TICKER:AAPL} and {TICKER:BRK.A}, {TICKER:BA} and BAE.B -
all going up!
Adding ? after the $ sign will also accept words, i.e. 'out'
preg_replace accepts array as a pattern, so if you change your $tickers array to:
$tickers = array('/AAPL/', '/AA/', '/BRK.A/', '/BRK.B/', '/BAE/', '/BA/');
then this should do the trick:
preg_replace($tickers, ' {TICKER:$1} ', $string);
This is according to http://php.net/manual/en/function.preg-replace.php

How can we split a sentence

I have written the PHP code for getting some part of a given dynamic sentence, e.g. "this is a test sentence":
substr($sentence,0,12);
I get the output:
this is a te
But i need it stop as a full word instead of splitting a word:
this is a
How can I do that, remembering that $sentence isn't a fixed string (it could be anything)?
use wordwrap
If you're using PHP4, you can simply use split:
$resultArray = split($sentence, " ");
Every element of the array will be one word. Be careful with punctuation though.
explode would be the recommended method in PHP5:
$resultArray = explode(" ", $sentence);
first. use explode on space. Then, count each part + the total assembled string and if it doesn't go over the limit you concat it onto the string with a space.
Try using explode() function.
In your case:
$expl = explode(" ",$sentence);
You'll get your sentence in an array. First word will be $expl[0], second - $expl[1] and so on. To print it out on the screen use:
$n = 10 //words to print
for ($i=0;$i<=$n;$i++) {
print $expl[$i]." ";
}
Create a function that you can re-use at any time. This will look for the last space if the given string's length is greater than the amount of characters you want to trim.
function niceTrim($str, $trimLen) {
$strLen = strlen($str);
if ($strLen > $trimLen) {
$trimStr = substr($str, 0, $trimLen);
return substr($trimStr, 0, strrpos($trimStr, ' '));
}
return $str;
}
$sentence = "this is a test sentence";
echo niceTrim($sentence, 12);
This will print
this is a
as required.
Hope this is the solution you are looking for!
this is just psudo code not php,
char[] sentence="your_sentence";
string new_constructed_sentence="";
string word="";
for(i=0;i<your_limit;i++){
character=sentence[i];
if(character==' ') {new_constructed_sentence+=word;word="";continue}
word+=character;
}
new_constructed_sentence is what you want!!!

PHP: Best way to extract text within parenthesis?

What's the best/most efficient way to extract text set between parenthesis? Say I wanted to get the string "text" from the string "ignore everything except this (text)" in the most efficient manner possible.
So far, the best I've come up with is this:
$fullString = "ignore everything except this (text)";
$start = strpos('(', $fullString);
$end = strlen($fullString) - strpos(')', $fullString);
$shortString = substr($fullString, $start, $end);
Is there a better way to do this? I know in general using regex tends to be less efficient, but unless I can reduce the number of function calls, perhaps this would be the best approach? Thoughts?
i'd just do a regex and get it over with. unless you are doing enough iterations that it becomes a huge performance issue, it's just easier to code (and understand when you look back on it)
$text = 'ignore everything except this (text)';
preg_match('#\((.*?)\)#', $text, $match);
print $match[1];
So, actually, the code you posted doesn't work: substr()'s parameters are $string, $start and $length, and strpos()'s parameters are $haystack, $needle. Slightly modified:
$str = "ignore everything except this (text)";
$start = strpos($str, '(');
$end = strpos($str, ')', $start + 1);
$length = $end - $start;
$result = substr($str, $start + 1, $length - 1);
Some subtleties: I used $start + 1 in the offset parameter in order to help PHP out while doing the strpos() search on the second parenthesis; we increment $start one and reduce $length to exclude the parentheses from the match.
Also, there's no error checking in this code: you'll want to make sure $start and $end do not === false before performing the substr.
As for using strpos/substr versus regex; performance-wise, this code will beat a regular expression hands down. It's a little wordier though. I eat and breathe strpos/substr, so I don't mind this too much, but someone else may prefer the compactness of a regex.
Use a regular expression:
if( preg_match( '!\(([^\)]+)\)!', $text, $match ) )
$text = $match[1];
i think this is the fastest way to get the words between the first parenthesis in a string.
$string = 'ignore everything except this (text)';
$string = explode(')', (explode('(', $string)[1]))[0];
echo $string;
The already posted regex solutions - \((.*?)\) and \(([^\)]+)\) - do not return the innermost strings between an open and close brackets. If a string is Text (abc(xyz 123) they both return a (abc(xyz 123) as a whole match, and not (xyz 123).
The pattern that matches substrings (use with preg_match to fetch the first and preg_match_all to fetch all occurrences) in parentheses without other open and close parentheses in between is, if the match should include parentheses:
\([^()]*\)
Or, you want to get values without parentheses:
\(([^()]*)\) // get Group 1 values after a successful call to preg_match_all, see code below
\(\K[^()]*(?=\)) // this and the one below get the values without parentheses as whole matches
(?<=\()[^()]*(?=\)) // less efficient, not recommended
Replace * with + if there must be at least 1 char between ( and ).
Details:
\( - an opening round bracket (must be escaped to denote a literal parenthesis as it is used outside a character class)
[^()]* - zero or more characters other than ( and ) (note these ( and ) do not have to be escaped inside a character class as inside it, ( and ) cannot be used to specify a grouping and are treated as literal parentheses)
\) - a closing round bracket (must be escaped to denote a literal parenthesis as it is used outside a character class).
The \(\K part in an alternative regex matches ( and omits from the match value (with the \K match reset operator). (?<=\() is a positive lookbehind that requires a ( to appear immediately to the left of the current location, but the ( is not added to the match value since lookbehind (lookaround) patterns are not consuming. (?=\() is a positive lookahead that requires a ) char to appear immediately to the right of the current location.
PHP code:
$fullString = 'ignore everything except this (text) and (that (text here))';
if (preg_match_all('~\(([^()]*)\)~', $fullString, $matches)) {
print_r($matches[0]); // Get whole match values
print_r($matches[1]); // Get Group 1 values
}
Output:
Array ( [0] => (text) [1] => (text here) )
Array ( [0] => text [1] => text here )
This is a sample code to extract all the text between '[' and ']' and store it 2 separate arrays(ie text inside parentheses in one array and text outside parentheses in another array)
function extract_text($string)
{
$text_outside=array();
$text_inside=array();
$t="";
for($i=0;$i<strlen($string);$i++)
{
if($string[$i]=='[')
{
$text_outside[]=$t;
$t="";
$t1="";
$i++;
while($string[$i]!=']')
{
$t1.=$string[$i];
$i++;
}
$text_inside[] = $t1;
}
else {
if($string[$i]!=']')
$t.=$string[$i];
else {
continue;
}
}
}
if($t!="")
$text_outside[]=$t;
var_dump($text_outside);
echo "\n\n";
var_dump($text_inside);
}
Output:
extract_text("hello how are you?");
will produce:
array(1) {
[0]=>
string(18) "hello how are you?"
}
array(0) {
}
extract_text("hello [http://www.google.com/test.mp3] how are you?");
will produce
array(2) {
[0]=>
string(6) "hello "
[1]=>
string(13) " how are you?"
}
array(1) {
[0]=>
string(30) "http://www.google.com/test.mp3"
}
This function may be useful.
public static function getStringBetween($str,$from,$to, $withFromAndTo = false)
{
$sub = substr($str, strpos($str,$from)+strlen($from),strlen($str));
if ($withFromAndTo)
return $from . substr($sub,0, strrpos($sub,$to)) . $to;
else
return substr($sub,0, strrpos($sub,$to));
}
$inputString = "ignore everything except this (text)";
$outputString = getStringBetween($inputString, '(', ')'));
echo $outputString;
//output will be test
$outputString = getStringBetween($inputString, '(', ')', true));
echo $outputString;
//output will be (test)
strpos() => which is used to find the position of first occurance in a string.
strrpos() => which is used to find the position of first occurance in a string.
function getStringsBetween($str, $start='[', $end=']', $with_from_to=true){
$arr = [];
$last_pos = 0;
$last_pos = strpos($str, $start, $last_pos);
while ($last_pos !== false) {
$t = strpos($str, $end, $last_pos);
$arr[] = ($with_from_to ? $start : '').substr($str, $last_pos + 1, $t - $last_pos - 1).($with_from_to ? $end : '');
$last_pos = strpos($str, $start, $last_pos+1);
}
return $arr; }
this is a little improvement to the previous answer that will return all patterns in array form:
getStringsBetween('[T]his[] is [test] string [pattern]') will return:

Categories