How to scan text for multiple strings in PHP

In PHP, how do you scan text (user-submitted messages) for multiple strings (other users' names)?
For example, a user submits the message below, and I want a way to find the strings 'user-one' and 'user-two' and put them into an array.
Hello this is a test message, can you see it #user-one, #user-two?

You can try
$message = "Hello this is a test message, can you see it #user-one, #user-two?" ;
preg_match_all("/\#[a-z\-]+/", $message,$match);
var_dump($match[0]);
Output
array (size=2)
0 => string '#user-one' (length=9)
1 => string '#user-two' (length=9)

preg_match_all('/#([\w-]+)/', $userText, $match);
print_r($match[1]);
$match[1] will contain all usernames. $userText is the user's input text. Use $match[0] if you want the usernames with the #.

You can use strrpos to retrieve a position and substr + strlen to get the text.
Example:
...
$mystring = "Hello this is a test message, can you see it #user-one, #user-two?";
$pos = strrpos($mystring, "#user-one");
if ($pos !== false) { // strrpos() returns false when not found; 0 is a valid position
$str = substr($mystring, $pos, strlen("#user-one"));
}
...
Sorry if I have misunderstood the question.
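If the set of user names is known in advance (for example, loaded from a database), a plain substring check per name may be enough. The following is a minimal sketch under that assumption; the helper name findMentionedUsers and the $knownUsers sample list are hypothetical and do not come from the question.

```php
<?php
// Hypothetical helper: returns the known user names that are mentioned
// in $message with a leading '#'.
function findMentionedUsers(string $message, array $knownUsers): array
{
    $found = [];
    foreach ($knownUsers as $user) {
        // strpos() returns false when there is no match and 0 for a match
        // at the very start, so compare strictly against false.
        if (strpos($message, '#' . $user) !== false) {
            $found[] = $user;
        }
    }
    return $found;
}

$message = "Hello this is a test message, can you see it #user-one, #user-two?";
$knownUsers = ['user-one', 'user-two', 'user-three']; // assumed sample data
print_r(findMentionedUsers($message, $knownUsers));
```

Because this is a substring test, '#user-one' would also be reported for a message containing '#user-one-hundred'; a regex with a boundary check avoids that.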

No messy pattern match with this!
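function stringToUserArray($str) {
$users = array(); // always return an array, even when nothing matches
$remove = array(".",",","!","?");
$str = str_replace($remove, " ", $str);
$array = explode(" ", $str);
foreach($array as $string) {
// skip the empty strings that explode() yields for consecutive spaces
if($string !== "" && $string[0] == "#") {
$users[] = $string;
}
}
return $users;
}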
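The answers above can be combined into one small function that returns each name once and without the leading '#'. This is only a sketch: the function name extractMentions is hypothetical, and it assumes that names consist of letters, digits, underscores and hyphens.

```php
<?php
// Hypothetical helper: collects unique user names that follow a '#'.
// Assumption: names consist of letters, digits, underscores and hyphens.
function extractMentions(string $message): array
{
    preg_match_all('/#([\w-]+)/', $message, $matches);
    // $matches[1] holds the first capture group, i.e. the names without '#'.
    return array_values(array_unique($matches[1]));
}

$message = "Hi #user-one, #user-two and #user-one again, also #user3!";
print_r(extractMentions($message));
```

array_unique() keeps the first occurrence of each name, and array_values() re-indexes the result so the keys run from 0 upward.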

Related

Make bold specific part of string

I have an array like
$array[]="This is a test";
$array[]="This is a TEST";
$array[]="TeSt this";
I need to make the string 'test' as bold like
$array[]="This is a <b>test</b>";
$array[]="This is a <b>TEST</b>";
$array[]="<b>TeSt</b> this";
I have tried str_replace(), but it is case sensitive.
Note:
I need to make the given string bold and keep its original casing.
You can use the array_walk PHP function to replace the string value within an array. See the code below:
function my_str_replace(&$item){
$item = preg_replace("/test/i", '<b>$0</b>', $item);
}
$array[]="This is a test";
$array[]="This is a TEST";
$array[]="TeSt this";
array_walk($array, 'my_str_replace');
EDIT: Based on John WH Smith's comment
You can simply use $array = preg_replace("/test/i", '<b>$0</b>', $array); which would do the magic
If you're looking for patterns instead of fixed strings like "test", have a look at REGEXes and preg_replace :
$str = preg_replace("#(test|otherword)#i", "<b>$1</b>", $str);
More about REGEXes :
http://en.wikipedia.org/wiki/Regular_expression
http://www.regular-expressions.info/
http://uk.php.net/preg_replace
Edit : added "i" after the REGEX to remove case sensitivity.
You can use a function like the one I wrote below:
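function wrap_text_with_tags( $haystack, $needle , $beginning_tag, $end_tag ) {
$needle_start = stripos($haystack, $needle);
// stripos() returns false when the needle is absent; leave the text unchanged
if ($needle_start === false) {
return $haystack;
}
$needle_end = $needle_start + strlen($needle);
// take the match from the haystack so its original casing is kept
$match = substr($haystack, $needle_start, strlen($needle));
$return_string = substr($haystack, 0, $needle_start) . $beginning_tag . $match . $end_tag . substr($haystack, $needle_end);
return $return_string;
}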
So you'd be able to call it as follows:
$original_string = 'Writing PHP code can be fun!';
$return_string = wrap_text_with_tags( $original_string , 'PHP' , "<strong>" ,"</strong>");
When returned, the strings will look as follows:
Original String
Writing PHP code can be fun!
Modified Result
Writing <strong>PHP</strong> code can be fun!
This function only works on the FIRST instance of a string.
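To cover every occurrence without a regex, the same stripos() idea can be repeated with a moving offset. This is a sketch only; the name wrap_all_with_tags is hypothetical and not part of the answer above.

```php
<?php
// Hypothetical variant: wraps every case-insensitive occurrence of $needle,
// keeping the casing found in $haystack.
function wrap_all_with_tags(string $haystack, string $needle, string $open, string $close): string
{
    if ($needle === '') {
        return $haystack; // nothing to search for
    }
    $result = '';
    $offset = 0;
    $len = strlen($needle);
    while (($pos = stripos($haystack, $needle, $offset)) !== false) {
        // Copy the text before the match, then the wrapped match itself.
        $result .= substr($haystack, $offset, $pos - $offset)
            . $open . substr($haystack, $pos, $len) . $close;
        $offset = $pos + $len;
    }
    // Append whatever follows the last match.
    return $result . substr($haystack, $offset);
}

echo wrap_all_with_tags('PHP is fun, and php is everywhere', 'php', '<strong>', '</strong>');
// <strong>PHP</strong> is fun, and <strong>php</strong> is everywhere
```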
This is my solution. It also keeps all uppercase letters uppercase and all lowercase letters lowercase.
function wrapTextWithTags( $haystack, $needle , $tag ): string
{
// stripos() is already case-insensitive, so no lowercasing is needed
$start = stripos($haystack, $needle);
if ($start === false) {
return $haystack; // needle not found, nothing to wrap
}
$length = strlen($needle);
$textPart = substr($haystack, $start, $length);
$boldPart = "<" . $tag . ">" . $textPart . "</" . $tag . ">";
return str_replace($textPart, $boldPart, $haystack);
}
I find using a preg_replace() call to be the most appropriate tool for this task because:
it can affect all elements in the array without writing a loop,
it can replace more than one substring within a string,
adding a case-insensitive flag (i) is an easy and intuitive adjustment,
adding word boundaries (\b) on either side of the "needle" word will ensure that only whole words are replaced,
when replacing the fullstring match, no parentheses / capture groups are necessary.
Code: (Demo)
$array = [
"This is a test",
"This is a TEST",
"Test this testy contest protest test!",
"TeSt this",
];
var_export(
preg_replace('/\btest\b/i', '<b>$0</b>', $array)
);
Output:
array (
0 => 'This is a <b>test</b>',
1 => 'This is a <b>TEST</b>',
2 => '<b>Test</b> this testy contest protest <b>test</b>!',
3 => '<b>TeSt</b> this',
)
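When the word to highlight comes from a variable rather than being hard-coded, it should be escaped before it is placed in the pattern. A minimal sketch, assuming a hypothetical helper named boldWord:

```php
<?php
// Hypothetical helper: bolds whole-word, case-insensitive matches of $needle
// in every element of $lines.
function boldWord(array $lines, string $needle): array
{
    // preg_quote() escapes regex metacharacters; the second argument also
    // escapes the '/' delimiter used here.
    $pattern = '/\b' . preg_quote($needle, '/') . '\b/i';
    return preg_replace($pattern, '<b>$0</b>', $lines);
}

print_r(boldWord(['This is a test', 'TeSt this contest'], 'test'));
```

Note that \b only behaves as a whole-word guard when the needle starts and ends with a word character (letter, digit or underscore).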
Try str_ireplace, the case-insensitive version of str_replace. It returns the result rather than modifying $array, and it inserts the replacement literally, so every match comes back as lowercase "test":
$array = str_ireplace("test", "<b>test</b>", $array);

Get the current + the next word in a string

This is what I am trying to get:
My longest text to test
When I search for e.g. My, I should get My longest.
I tried the following: first compute the end position of the input, then search for the next ' ' and cut there.
$length = strripos($text, $input) + strlen($input)+2;
$stringpos = strripos($text, ' ', $length);
$newstring = substr($text, 0, strpos($text, ' ', $length));
But this only works the first time; after that it cuts right after the current input, meaning that
My lon gives My longest and not My longest text.
How must I change this to get the right result, always including the next word? Maybe I need a break, but I cannot find the right solution.
UPDATE
Here is my workaround until I find a better solution. As I said, array functions do not work here, since partial words should also match. So I extended my previous idea a bit. The basic idea is to distinguish between the first space after the input and the one that follows it. I improved the code a bit.
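function get_title($input, $text) {
$length = strripos($text, $input) + strlen($input);
$stringpos = stripos($text, ' ', $length);
// Find next ' '
$stringpos2 = $stringpos !== false ? stripos($text, ' ', $stringpos+1) : false;
if ($stringpos !== false && $stringpos2 !== false) {
// cut before the second space after the input
$newstring = substr($text, 0, $stringpos2);
} else {
// fewer than two spaces follow the input: keep the whole text
$newstring = $text;
}
return $newstring;
}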
Not pretty, but it seems to work. Maybe one of you has a better solution.
You can try using explode
$string = explode(" ", "My longest text to test");
$key = array_search("My", $string);
echo $string[$key] , " " , $string[$key + 1] ;
You can take it to the next level by matching case-insensitively with preg_match_all
$string = "My longest text to test in my school that is very close to mY village" ;
var_dump(__search("My",$string));
Output
array
0 => string 'My longest' (length=10)
1 => string 'my school' (length=9)
2 => string 'mY village' (length=10)
Function used
function __search($search,$string)
{
$result = array();
preg_match_all('/' . preg_quote($search, '/') . '\s+\w+/i', $string, $result);
return $result[0];
}
There are simpler ways to do that. String functions are useful if you don't want to look for something specific, but cut out a pre-defined length of something. Else use a regular expression:
preg_match('/My\s+\w+/', $string, $result);
print $result[0];
Here the My looks for the literal first word. And \s+ for some spaces. While \w+ matches word characters.
This adds some new syntax to learn, but it is less brittle than workarounds and shorter than the string-function code that accomplishes the same.
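The question also asks for partially typed words to work (My lon should give My longest text). One way to sketch that with the same regex idea is shown below; the function name currentAndNextWord is hypothetical, and unlike the asker's code it returns the text starting at the input rather than at the beginning of the string.

```php
<?php
// Hypothetical helper: returns the text from $input through the end of the
// word it falls in, plus the following word (if there is one).
function currentAndNextWord(string $input, string $text): ?string
{
    // \S* completes a partially typed word; (?:\s+\S+)? adds the next word.
    $pattern = '/' . preg_quote($input, '/') . '\S*(?:\s+\S+)?/i';
    return preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
}

$text = 'My longest text to test';
echo currentAndNextWord('My', $text), "\n";     // My longest
echo currentAndNextWord('My lon', $text), "\n"; // My longest text
```

The function returns null when the input does not occur in the text.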
An easy method would be to split it on whitespace and grab the current array index plus the next one:
// Word to search for:
$findme = "text";
// Using preg_split() to split on any amount of whitespace
$words = preg_split('/\s+/', "My longest text to test");
// Find the word in the array with array_search()
// calling strtolower() with array_map() to search case-insensitively
$idx = array_search(strtolower($findme), array_map('strtolower', $words));
if ($idx !== FALSE) {
// If found, print the word and the following word from the array
// as long as the following one exists.
echo $words[$idx];
if (isset($words[$idx + 1])) {
echo " " . $words[$idx + 1];
}
}
// Prints:
// "text to"

Count spaces within exploded quotations

In simplest terms, I'm trying to change the data string if more than 4 spaces are found within quotations. I'm able to do this on a simple string, but not within exploded quotes, because explode() returns an array, which substr_count() won't accept. Is there a regex, or something else, that does what I'm looking for in this case?
$data = 'Hello World "This is a test string! Jack and Jill went up the hill."';
$halt = 'String had more than 4 spaces.';
$arr = explode('"', $data);
if (substr_count($arr, ' ') >= 4) {
$data = implode('"', $arr);
$data = $halt;
As far as I understand your request, this will do the job
$data = 'Hello World "This is a test string! Jack and Jill went up the hill."';
$halt = 'String had more than 4 spaces.';
// split $data on " and captures them
$arr = preg_split('/(")/', $data, -1, PREG_SPLIT_DELIM_CAPTURE);
// must we count spaces ?
$countspace = 0;
foreach ($arr as $str) {
// swap $countspace when " is encountered
if ($str == '"') $countspace = !$countspace;
// we have to count spaces
if ($countspace) {
// 4 or more spaces (same threshold as the question's code)
if (substr_count($str, ' ') >= 4) {
// change data
$data = $halt;
break;
}
}
}
echo $data,"\n";
output:
String had more than 4 spaces.
If you define:
function count_spaces($str) {return substr_count($str, ' '); }
you can then use array_sum(array_map("count_spaces", $arr)) to count all of the spaces in all of the strings in $arr.
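Another way to look only at the quoted parts is to capture them directly with preg_match_all and test each one. This is a sketch under assumptions: the helper name quotedSegmentHasSpaces is hypothetical, quotes are assumed to be balanced, and the threshold of 4 or more mirrors the `>= 4` in the question's code.

```php
<?php
// Hypothetical helper: true if any double-quoted segment of $data contains
// at least $limit spaces.
function quotedSegmentHasSpaces(string $data, int $limit = 4): bool
{
    // Capture the text between each pair of double quotes.
    preg_match_all('/"([^"]*)"/', $data, $matches);
    foreach ($matches[1] as $segment) {
        if (substr_count($segment, ' ') >= $limit) {
            return true;
        }
    }
    return false;
}

$data = 'Hello World "This is a test string! Jack and Jill went up the hill."';
$halt = 'String had more than 4 spaces.';
if (quotedSegmentHasSpaces($data)) {
    $data = $halt;
}
echo $data, "\n";
```

Spaces outside the quotes are ignored, so 'a b c d e "x y"' leaves $data untouched.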

In PHP, how do I extract multiple e-mail addresses from a block of text and put them into an array?

I have a block of text from which I want to extract the valid e-mail addresses and put them into an array. So far I have...
$string = file_get_contents("example.txt"); // Load text file contents
$matches = array(); //create array
$pattern = '/[A-Za-z0-9_-]+@[A-Za-z0-9_-]+\.([A-Za-z0-9_-][A-Za-z0-9_]+)/'; //regex for pattern of e-mail address
preg_match($pattern, $string, $matches); //find matching pattern
However, I am getting an array with only one address. Therefore, I am guessing I need to cycle through this process somehow. How do I do that?
You're pretty close, but the regex wouldn't catch all email formats, and you don't need to specify A-Za-z, you can just use the "i" flag to mark the entire expression as case insensitive. There are email format cases that are missed (especially subdomains), but this catches the ones I tested.
$string = file_get_contents("example.txt"); // Load text file contents
// don't need to preassign $matches, it's created dynamically
// this regex handles more email address formats like a+b@google.com.sg, and the i makes it case insensitive
$pattern = '/[a-z0-9_\-\+]+@[a-z0-9\-]+\.([a-z]{2,3})(?:\.[a-z]{2})?/i';
// preg_match_all returns the number of matches and fills $matches with a multidimensional array
preg_match_all($pattern, $string, $matches);
// the data you want is in $matches[0], dump it with var_export() to see it
var_export($matches[0]);
output:
array (
0 => 'test1+2@gmail.com',
1 => 'test-2@yahoo.co.jp',
2 => 'test@test.com',
3 => 'test@test.co.uk',
4 => 'test@google.com.sg',
)
I know this is not the question you asked, but I noticed that your regex does not accept an address like 'myemail@office21.company.com' or any address with a subdomain. You could replace it with something like:
/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}/
which will reject fewer valid e-mail addresses (although it is not perfect).
I also suggest you read this article on e-mail validation, it is pretty good and informative.
Your code is almost perfect, you just need to replace preg_match(...) with preg_match_all(...)
http://www.php.net/manual/en/function.preg-match.php
http://www.php.net/manual/en/function.preg-match-all.php
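The difference between the two functions is easy to see side by side. The sample text and the simplified pattern below are assumptions chosen for illustration, not taken from the question:

```php
<?php
$text = 'Contact alice@example.com or bob@example.org for details.'; // assumed sample text
$pattern = '/[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}/i';

// preg_match() stops after the first match.
preg_match($pattern, $text, $first);

// preg_match_all() collects every match; $all[0] lists the full matches.
$count = preg_match_all($pattern, $text, $all);

print_r($first);   // one address
print_r($all[0]);  // both addresses
```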
This detects all mail addresses:
$sourceeee= 'Here are examples mymail@yahoo.com and my-e.mail@goog.com or something more';
preg_match_all('/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}/i', $sourceeee, $found_mails);
then you can use $found_mails[0] array.
This regex will extract all unique email addresses from a URL or file and output each on a new line. It handles subdomains and prefix/suffix issues. Feel free to use it.
<?php
$url="http://example.com/";
$text=file_get_contents($url);
$res = preg_match_all(
"/[a-z0-9]+[_a-z0-9\.-]*[a-z0-9]+@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})/i",
$text,
$matches
);
if ($res) {
foreach(array_unique($matches[0]) as $email) {
echo $email . "<br />";
}
}
else {
echo "No emails found.";
}
?>
check here for more reference : http://www.php.net/manual/en/function.preg-match-all.php
It worked better for me:
<?php
$content = "Hi my name is Joe, I can be contacted at joe@mysite.com.";
preg_match("/[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})/i", $content, $matches);
print $matches[0];
?>
Some of the others didn't accept domains like: name@example.com.sv
I found it on: http://snipplr.com/view/63938/
This function works without an e-mail regex: it splits the text into tokens and validates each one with filter_var(). That can make it lighter on resources than a complex pattern.
<?php
function extract_email_addresses($str){
$emails = array();
$str = strip_tags( $str );
$str = preg_replace('/\s+/', ' ', $str);
$str = preg_replace("/[\n\r]/", "", $str);
$remove_chars = array (',', "<", ">", ";", "'", ". ");
$str = str_replace( $remove_chars, ' ', $str );
$parts = explode(' ', $str);
if(count($parts) > 0){
foreach($parts as $part){
$part = trim($part);
if( $part != '' ) {
if( filter_var($part, FILTER_VALIDATE_EMAIL) !== false){
$emails[] = $part;
}
}
}
}
if(count($emails) > 0){
return $emails;
}
else{
return null;
}
}
$string = "Guys, please help me to extract valid sam-ple.1990@gmail.co.uk email addresses from some text content using php
example , i have below text content in mysql database ' Life is more beautiful, and i like to explore lot please email me to sample@gmail.com. Learn new things every day. 'from the above text content i want to extract email address 'sample-x@gmail.com' using php regular expressions or other method.";
$matches = extract_email_addresses( $string );
print_r($matches);
?>
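The regex and filter_var() approaches can also be combined: a deliberately loose pattern finds candidates, and filter_var() decides which ones are kept. A minimal sketch, assuming a hypothetical helper named extractValidEmails:

```php
<?php
// Hypothetical helper: finds candidate addresses with a loose regex, then
// keeps only those that pass filter_var() validation.
function extractValidEmails(string $text): array
{
    preg_match_all('/[^\s<>"\',;]+@[^\s<>"\',;]+/', $text, $matches);
    $emails = [];
    foreach ($matches[0] as $candidate) {
        // Drop trailing sentence punctuation such as a final full stop.
        $candidate = rtrim($candidate, '.!?');
        if (filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false) {
            $emails[] = $candidate;
        }
    }
    return array_values(array_unique($emails));
}

print_r(extractValidEmails('Mail joe@mysite.com. Or <ann@example.co.uk>, or joe@mysite.com; not a@b@c.com'));
```

Duplicates are removed with array_unique(), and the malformed a@b@c.com is rejected by the validation step.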

PHP str_replace

I'm currently using str_replace to remove a usrID and the 'comma' immediately after it:
For example:
$usrID = 23;
$string = "22,23,24,25";
$receivers = str_replace($usrID.",", '', $string); //Would output: "22,24,25"
However, I've noticed that if:
$usrID = 25; //or the Last Number in the $string
It does not work, because there is not a trailing 'comma' after the '25'
Is there a better way I can be removing a specific number from the string?
Thanks.
You could explode the string into an array:
$list = explode(',', $string);
var_dump($list);
Which will give you :
array
0 => string '22' (length=2)
1 => string '23' (length=2)
2 => string '24' (length=2)
3 => string '25' (length=2)
Then, do whatever you want on that array ; like remove the entry you don't want anymore :
foreach ($list as $key => $value) {
if ($value == $usrID) {
unset($list[$key]);
}
}
var_dump($list);
Which gives you :
array
0 => string '22' (length=2)
2 => string '24' (length=2)
3 => string '25' (length=2)
And, finally, put the pieces back together :
$new_string = implode(',', $list);
var_dump($new_string);
And you get what you wanted :
string '22,24,25' (length=8)
Maybe not as "simple" as a regex ; but the day you'll need to do more with your elements (or the day your elements are more complicated than just plain numbers), that'll still work :-)
EDIT: if you also want to remove "empty" values, like when there are two commas in a row, you just have to modify the condition, a bit like this:
foreach ($list as $key => $value) {
if ($value == $usrID || trim($value)==='') {
unset($list[$key]);
}
}
ie, exclude the $values that are empty. The "trim" is used so $string = "22,23, ,24,25"; can also be dealt with, btw.
Another issue is if you have a user 5 and try to remove them, you'd turn 15 into 1, 25 into 2, etc. So you'd have to check for a comma on both sides.
If you want to have a delimited string like that, I'd put a comma on both ends of both the search and the list, though it'd be inefficient if it gets very long.
An example would be:
$receivers = substr(str_replace(','.$usrID.',', ',', ','.$string.','),1,-1);
An option similar to Pascal's, although I think a bit simpler:
$usrID = 23;
$string = "22,23,24,25";
$list = explode(',', $string);
$foundKey = array_search($usrID, $list);
if ($foundKey !== false) {
// the user id has been found, so remove it and implode the string
unset($list[$foundKey]);
$receivers = implode(',', $list);
} else {
// the user id was not found, so the original string is complete
$receivers = $string;
}
Basically, convert the string into an array, find the user ID, if it exists, unset it and then implode the array again.
I would go the simple way: add commas around your list, replace ",23," with a single comma then remove extra commas. Fast and simple.
$usrID = 23;
$string = "22,23,24,25";
$receivers = trim(str_replace(",$usrID,", ',', ",$string,"), ',');
With that said, manipulating values in a comma-separated list is usually a sign of bad design. Those values should be in an array instead.
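Treating the list as an array also sidesteps the trailing-comma and partial-number problems mentioned earlier, because whole values are compared. A minimal sketch, assuming PHP 7.4 or later (for the arrow function) and a hypothetical helper named removeId:

```php
<?php
// Hypothetical helper: removes one ID from a comma-separated list by
// comparing whole values rather than substrings.
function removeId(string $list, int $id): string
{
    $ids = array_filter(
        array_map('trim', explode(',', $list)),
        // Skip empty entries and the ID to remove.
        fn (string $value): bool => $value !== '' && $value !== (string) $id
    );
    return implode(',', $ids);
}

echo removeId('22,23,24,25', 25), "\n"; // 22,23,24
echo removeId('5,15,25', 5), "\n";      // 15,25
```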
Try using preg:
<?php
$string = "22,23,24,25";
$usrID = '23';
$pattern = '/\b' . $usrID . '\b,?/i';
$replacement = '';
echo trim(preg_replace($pattern, $replacement, $string), ','); // trim() drops the comma left behind when the last ID is removed
?>
Update: changed $pattern = '/$usrID,?/i'; to $pattern = '/' . $usrID . ',?/i';
Update2: changed $pattern = '/' . $usrID . ',?/i to $pattern = '/\b' . $usrID . '\b,?/i' to address onnodb's comment...
Simple way (provided all IDs are unique 2-digit numbers):
$string = str_replace($usrID, ',', $string);
$string = str_replace(',,','', $string);
