How could I combine these regex rules? - php

I'm detecting #replies in a Twitter stream with the following PHP code using regexes.
$text = preg_replace('!^#([A-Za-z0-9_]+)!', '#$1', $text);
$text = preg_replace('! #([A-Za-z0-9_]+)!', ' #$1', $text);
How can I best combine these two rules without false flagging email#domain.com as a reply?

On second thought, not flagging whatever#email means the character before the # must be a non-word character, because anything that can appear inside a word could be part of an email address. That leads to:
!(^|\W)#([A-Za-z0-9_]+)!
but then you have to use $2 instead of $1.

Since the ^ does not have to stand at the beginning of the RE, you can use grouping and | to combine those REs.
If you don't want to re-insert the whitespace you captured, you have to use a "positive lookbehind":
$text = preg_replace('/(?<=^|\s)#(\w+)/',
'#$1', $text);
or "negative lookbehind":
$text = preg_replace('/(?<!\S)#(\w+)/',
'#$1', $text);
...whichever you find easier to understand.
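A quick way to check the lookbehind version is to run it on one string that contains a reply at the start, a reply after a space, and an email-like token. This is only a sketch: the link_replies() helper and the https://example.com/ URL are placeholders invented for illustration, not something from the question.

```php
<?php
// Hypothetical helper: wraps each reply tag in a link. The URL is a placeholder.
function link_replies(string $text): string
{
    // (?<!\S) succeeds at the start of the string or right after whitespace,
    // so nothing in front of the '#' is consumed and $1 is the name itself.
    return preg_replace(
        '/(?<!\S)#(\w+)/',
        '<a href="https://example.com/$1">#$1</a>',
        $text
    );
}

echo link_replies('#abc hi mail#domain.com and #def'), "\n";
```

Only #abc and #def should end up wrapped; mail#domain.com is left alone because the # is preceded by a non-whitespace character.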

Here's how I'd do the combination
$text = preg_replace('!(^| )#([A-Za-z0-9_]+)!', '$1#$2', $text);

$text = preg_replace('/(^|\W)#(\w+)/', '$1#$2', $text); // $1 puts back the character matched before the #

preg_replace('%(?<!\S)#([A-Za-z0-9_]+)%', '#$1', $text);
(?<!\S) is loosely translated to "no preceding non-whitespace character". Sort of a double-negation, but also works at the start of the string/line.
This won't consume any preceding character, won't use any capturing group, and won't match strings such as "foo-#host.com", which is a valid e-mail address.
Tested:
Input = 'foo bar baz-#qux.com bee #def goo#doo #woo'
Output = 'foo bar baz-#qux.com bee #def goo#doo #woo'

Hey guys, don't overcomplicate it. Here it is:
!^\s*#([A-Za-z0-9_]+)!

I think you can use alternation: look for either the beginning of the string or a whitespace character.
'!(?:^|\s)#([A-Za-z0-9_]+)!'

Related

Remove text after link

So I have a #mentions function on my site where users type the mention themselves, and they can do something like:
#foo Hello This is some mention text included.
I would like to remove just the text (everything after #foo). The content comes through streamitem_content:
$json['streamitem_content_usertagged'] =
preg_replace('/(^|\s)#(\w+)/', '\1#$1',
$json['streamitem_content']);
Give this a try
$json['streamitem_content'] = '#foo Hello This is some mention text included.';
$json['streamitem_content_usertagged'] =
preg_replace('/#(\w+)/', '#$1',
$json['streamitem_content']);
echo $json['streamitem_content_usertagged'];
Output:
#foo Hello This is some mention text included.
preg_replace() only replaces what it matches, so you don't need to match content you aren't interested in. If you do want to capture multiple parts of a string, though, capture group numbers increase by one with each opening parenthesis. So this
preg_replace('/(^|\s)#(\w+)/', '\1#$1',
$json['streamitem_content']);
echo $json['streamitem_content_usertagged'];
would actually be
preg_replace('/(^|\s)#(\w+)/', '$1#$2',
$json['streamitem_content']);
Update:
$json['streamitem_content'] = '#foo Hello This is some mention text included.';
$json['streamitem_content_usertagged'] =
preg_replace('/#(\w+).*$/', '#$1',
$json['streamitem_content']);
echo $json['streamitem_content_usertagged'];
Output:
#foo
If the content you want to remove after #foo can extend over multiple lines, use the s modifier.
Regex101 Demo: https://regex101.com/r/tX1rO0/1
So the regex pretty much says: find a #, then capture all contiguous a-zA-Z0-9_ characters. Whatever comes after those characters we don't care about; just match through to the end of the string.
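The effect of the s modifier can be seen with a two-line string. A minimal sketch, with sample text made up for illustration:

```php
<?php
// Sample input invented for illustration: the mention text spans two lines.
$content = "#foo Hello\nsecond line";

// Without /s, '.' stops at the newline and '$' cannot match there,
// so the pattern fails and the string comes back unchanged.
$withoutS = preg_replace('/#(\w+).*$/', '#$1', $content);

// With /s, '.' also matches newlines, so everything after the name is removed.
$withS = preg_replace('/#(\w+).*$/s', '#$1', $content);

var_dump($withoutS === $content); // bool(true)
echo $withS, "\n";                // #foo
```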
You can use this:
preg_replace('/^\s*#(\w+)/', '#$1',
$json['streamitem_content']);
This removes the leading white space, and includes the # in the hyperlink's text (not the link argument).
If you need to keep the leading white space intact:
preg_replace('/^(\s*)#(\w+)/', '$1#$2',
$json['streamitem_content']);
You could use explode() and str_replace(). They might have a speed advantage over the preg functions.
Assuming the line is available as a variable (e.g. $mention):
$mention = $json['streamitem_content'];
$mention_parts = explode(" ", $mention);
$the_part_you_want = str_replace('#', '', $mention_parts[0]); // the name without the #
// or you could use $the_part_you_want = ltrim($mention_parts[0], '#');
$json['streamitem_content_usertagged'] = '#' . $the_part_you_want;
or use trim($mention_parts[0]); to remove any whitespace if it is unwanted.
You could use fewer variables and reuse $mention as array but this seemed a clearer way to illustrate the principle.
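For what it's worth, the non-regex approach could be wrapped up roughly like this. It is a sketch only: first_mention() is a made-up helper name, and it assumes the mention is always the first space-separated token of the content.

```php
<?php
// Hypothetical helper: returns the leading mention (e.g. "#foo") or null.
function first_mention(string $content): ?string
{
    // A limit of 2 is enough: only the first token is needed.
    $parts = explode(' ', ltrim($content), 2);
    $first = $parts[0];

    // Assumption: a mention is '#' followed by at least one more character.
    if (strlen($first) < 2 || $first[0] !== '#') {
        return null;
    }
    return $first;
}

var_dump(first_mention('#foo Hello This is some mention text included.')); // string(4) "#foo"
var_dump(first_mention('no mention here'));                                // NULL
```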

Issue with regular expression for identifying encrypted ids from string

I want to convert certain patterns into links, and it works fine as far as normal user IDs are concerned. But now I want to do the same for encrypted IDs as well.
Below is my code (which works):
$text = "hi how are you guys???... ##[Sam Thomas:10181] ##[Jack Daniel:11074] ##[Paul Walker:11043] ";
$pattern = "/##\[([^:]*):(\d*)\]/";
$matches = array();
preg_match_all($pattern, $text, $matches);
$output = preg_replace($pattern, "$1", $text);
Now I need to link text like:
"hi how are you guys???... ##[Sam Thomas:ZGNjAmD9ac3K] ##[Jack Daniel:ZGNjAmD9ac3K] ##[Paul Walker:ZGNjAmD9ac3K] ";
But this encrypted is not identified by above regular expression...
##\[([^:]*):(.*?)\]
             ^^^
Try this (see the demo). Just change \d* to .*? to accept anything, or to \w* to accept only letters, digits, and underscores. [^\]]* or [0-9a-zA-Z]* would work as well.
https://regex101.com/r/vD5iH9/52
Change your regex to accept numbers and letters as well.
Something like this -
##\[([^:]*):([0-9a-zA-Z]*)\]
             ^^^^^^^^^^^ Replaced \d
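To check the widened character class against IDs like the ones in the question, something along these lines should do. It is a sketch, and it uses + instead of * for the ID as a small assumption, so that an empty ID is not accepted.

```php
<?php
$text = "hi ##[Sam Thomas:10181] ##[Jack Daniel:ZGNjAmD9ac3K]";

// [0-9a-zA-Z]+ accepts numeric and alphanumeric IDs; '+' rejects an empty ID.
$pattern = '/##\[([^:]*):([0-9a-zA-Z]+)\]/';

preg_match_all($pattern, $text, $matches);

print_r($matches[1]); // the names
print_r($matches[2]); // the IDs
```

Both the numeric ID and the encrypted one should show up in $matches[2].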

PHP regEx help needed with /*<##> </##>*/

I am struggling with a regex and cannot get it to work.
I have already tried an SO question, an online tool, and this:
$text = preg_replace("%/\*<##>(?:(?!\*/).)</##>*\*/%s", "new", $text);
But nothing works.
My input string is:
$input = "something /*<##>old or something else</##>*/ something other";
and expected result is:
something /*<##>new</##>*/ something other
I see two issues here: you have no capturing groups to put the delimiting markers back in your replacement, and the group containing your negative lookahead is missing a repetition operator. (The stray * after </##> in your pattern is also unnecessary, so it is dropped below.)
$text = preg_replace('%(/\*<##>)(?:(?!\*/).)*(</##>\*/)%s', '$1new$2', $text);
Alternatively, you can replace the lookahead with .*? since you are using the s (dotall) modifier.
$text = preg_replace('%(/\*<##>).*?(</##>\*/)%s', '$1new$2', $text);
Or consider using \K and a lookahead to do this without capturing groups.
$text = preg_replace('%/\*<##>\K.*?(?=</##>\*/)%s', 'new', $text);
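Here is a small sketch of the \K variant on a block whose old content spans two lines (the sample input is made up), which is the case where the s modifier matters:

```php
<?php
// Sample input invented for illustration: the old content contains a newline.
$input = "a /*<##>old\ntext</##>*/ b";

// \K drops the opening marker from the reported match; the lookahead checks the
// closing marker without consuming it, so only the text in between is replaced.
$output = preg_replace('%/\*<##>\K.*?(?=</##>\*/)%s', 'new', $input);

echo $output, "\n"; // a /*<##>new</##>*/ b
```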
Tested:
$input = "something /*<##>old or something else</##>*/ something other";
echo preg_replace('%(/\*<##>)(.*)(</##>\*/)%', '$1new$3', $input);

preg_replace() seems to remove entire word instead of part of it

I'm trying to match a certain word and replace part of the word with certain text but leave the rest of the word intact. It is my understanding that adding parentheses to part of the regex pattern means that the pattern match within the parentheses gets replaced when you use preg_replace().
For testing purposes I used:
$text = 'batman';
echo $new_text = preg_replace('#(bat)man#', 'aqua', $text);
I only want 'bat' to be replaced by 'aqua' to get 'aquaman'. Instead, $new_text echoes 'aqua', leaving out the 'man' part.
preg_replace() replaces the whole string matched by the regular expression, not just the parenthesized part:
$text = 'batman';
echo $new_text = preg_replace('#bat(man)#', 'aqua\\1', $text);
Capture man instead and append it to your aqua prefix.
Another way of doing that is to use assertions:
$text = 'batman';
echo $new_text = preg_replace('#bat(?=man)#', 'aqua', $text);
I would not use the preg_* functions for this and would just use str_replace():
echo str_replace('batman', 'aquaman', $text);
This is simpler as a regex is not really needed in this case. Otherwise it would be with a regular expression:
echo $new_text = preg_replace('#bat(man)#', 'aqua\\1', $text);
This will substitute your man back in after aqua when replacing the entire search phrase. preg_replace() replaces the entire portion of the string matched by the pattern.
The way you're trying to do it, it would be more like:
preg_replace('#bat(man)#', 'aqua$1', $text);
I'd use a positive lookahead:
preg_replace('/bat(?=man)/', 'aqua', $text)
Demo here: http://ideone.com/G9F4q
The parentheses create a capturing group, which means you can access the part matched by this group using \1.
You can either do what zerkms suggested or use a lookahead, which only checks what follows without including it in the match.
$text = 'batman';
echo $new_text = preg_replace('#bat(?=man)#', 'aqua', $text);
This will match "bat" but only if it is followed by "man", and only "bat" is replaced.
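The two approaches can be compared side by side. A minimal sketch, with a sample string made up so that one word must not be touched:

```php
<?php
$text = 'batman and batgirl';

// Capturing: the whole match "batman" is replaced, so "man" is put back via $1.
$viaGroup = preg_replace('#bat(man)#', 'aqua$1', $text);

// Lookahead: only "bat" is matched; "man" is checked but never consumed.
$viaLookahead = preg_replace('#bat(?=man)#', 'aqua', $text);

echo $viaGroup, "\n";     // aquaman and batgirl
echo $viaLookahead, "\n"; // aquaman and batgirl
```

Both give the same result here; batgirl is untouched because neither pattern matches bat unless man follows.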

Replacing the last occurrence of a character in a string

I have a little issue I'm trying to find a solution for.
Basically, imagine you have the following string:
$string = 'Hello I am a string';
And you'd like it to end up as something like the following:
$string = 'Hello I am a&nbsp;string';
In other words, replace the last occurrence of a space with a non-breaking space.
I'm doing this because I don't want the last word in a heading to be on its own. Simply because when it comes to headings:
Hello I am a
string
Doesn't look as good as
Hello I am
a string
How does one do such a thing?
Code from this example will do the trick:
// $subject is the original string
// $search is the thing you want to replace
// $replace is what you want to replace it with
substr_replace($subject, $replace, strrpos($subject, $search), strlen($search));
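One thing to watch with the line above: strrpos() returns false when the search string is not found, and passing that straight to substr_replace() would not do what is intended. A guarded version might look like this; replace_last() is a made-up helper name, and this is a sketch rather than a drop-in.

```php
<?php
// Hypothetical helper: replaces only the last occurrence of $search.
function replace_last(string $subject, string $search, string $replace): string
{
    $pos = strrpos($subject, $search);
    if ($pos === false) {
        return $subject; // nothing to replace
    }
    return substr_replace($subject, $replace, $pos, strlen($search));
}

echo replace_last('Hello I am a string', ' ', '&nbsp;'), "\n"; // Hello I am a&nbsp;string
echo replace_last('Hello', ' ', '&nbsp;'), "\n";               // Hello
```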
echo preg_replace('/\s(\S*)$/', '&nbsp;$1', 'Hello I am a string');
Output
Hello I am a&nbsp;string
\s matches whitespace characters. To match a space explicitly, put one in (and change \S to [^ ]).
This would do the trick:
$string = preg_replace('/([\s\S]+)\s(\w+)$/','$1&nbsp;$2',$string);
I went with pounndifdef's answer; however, I needed to decode the HTML entity, like so:
substr_replace($subject, html_entity_decode($replace), strrpos($subject, $search), strlen($search));
It also worked using alex's answer:
preg_replace('/\s(\S*)$/', html_entity_decode('&nbsp;').'$1', 'Hello I am a string');
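Reverse the string first with strrev(), replace only the first occurrence, then reverse it back. Note that str_replace() replaces every occurrence, so use something like preg_replace() with a limit of 1, and reverse the replacement text too.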
