I have a PHP page that is reading text stored in a MYSQL database table.
The text might look something like this
Bob: Hi blah blah
(Bob walking around)
Fred Johnson: blah blah blah
Bob: Something something: something
I want to do a preg_replace to bold everything that comes before the first colon in each line.
So in this situation only the names would be bold and on that last line "Something something" would not be bold
What I have now bolds everything on each line that comes before any colon
$reg='(.*\w:)';
$text = preg_replace("/".$reg."/", "<b>\${1}</b>", $text);
You can use:
$reg='^([^:]*:)';
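The ^ symbol anchors the match to the beginning. By default that is the beginning of the whole string; add the m (multiline) modifier, as in "/".$reg."/m", so that it also matches at the beginning of every line. The [^:]* part then stops at the first colon, so nothing after it gets bolded :-)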
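A minimal sketch of how this could look on the sample text from the question, assuming the lines are separated by "\n". The \n inside the character class is my addition, not part of the answer: it keeps a colon-less line such as "(Bob walking around)" from being merged into the following line's match.

```php
<?php
// Sample transcript from the question, stored as one multi-line string.
$text = "Bob: Hi blah blah\n(Bob walking around)\nFred Johnson: blah blah blah\nBob: Something something: something";

// m: ^ matches at the start of every line, not only at the start of the string.
// [^:\n]* stops at the first colon and cannot run past the end of the line.
$bolded = preg_replace('/^([^:\n]*:)/m', '<b>$1</b>', $text);

echo $bolded, "\n";
```

Only the part up to the first colon on each line is wrapped; lines without a colon are left untouched.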
I'm trying to make a regex statement that can get the previous sentence before the occurrence of "[bbcode]" but is flexible enough to work in different scenarios.
For example, the previous sentence may be defined as following a period. However, it may simply be on a new line. I cannot use ^$ to define start or end of line as this may not always be the case.
Whole test string:
Example 1:
Blah blah blah. THIS SENTENCE SHOULD BE SELECTED [bbcode]
Example 2:
THIS SENTENCE SHOULD BE SELECTED [bbcode]
Example 3:
A trick sentence. And another. THIS SENTENCE SHOULD BE SELECTED
[bbcode]
Expected matches:
All three instances of THIS SENTENCE SHOULD BE SELECTED should be matched.
This is the regex I tried:
'/(?:\.)(.+)(\[bbcode\])/gUs'
This fails when sentence is on a new line as in Example 2.
I have tried negative lookbehinds to no avail. The strings "THIS SENTENCE SHOULD BE SELECTED" should get picked up in all three examples.
Picking up surrounding spaces is ok because I can trim it later.
Challenges:
The entire supplied code must be tested as one string. This is how the data will be supplied and will likely contain many random spaces, new lines etc which the regex must consider.
It is likely impossible to prepare / sanitize the string first, as the string will likely be very poorly formatted without proper punctuation. Contracting the string could cause unintended run-on sentences.
This can be achieved with basic PHP functions. Something like this:
function extractSentence($string)
{
    $before = substr($string, 0, strpos($string, '[bbcode]'));
    return trim(substr($before, strrpos($before, '.')), "\n .");
}
The advantage is that it is easy to understand, doesn't take much time to develop and can more easily be changed if that need arises.
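As a rough illustration of how easily this approach can be adjusted, here is a variant with explicit handling for "tag not found" and "no period before the tag". The name extractSentenceBeforeTag and the $tag parameter are hypothetical, and the sketch only looks at the first tag in the string.

```php
<?php
// Hypothetical variant of the string-function approach: returns the text
// between the last period and the first occurrence of the tag, or null
// when the tag is missing.
function extractSentenceBeforeTag(string $string, string $tag = '[bbcode]'): ?string
{
    $tagPos = strpos($string, $tag);
    if ($tagPos === false) {
        return null; // no tag, nothing to extract
    }
    $before = substr($string, 0, $tagPos);
    $dotPos = strrpos($before, '.');
    // Start after the last period, or at the beginning when there is none.
    $sentence = $dotPos === false ? $before : substr($before, $dotPos + 1);
    return trim($sentence);
}

$examples = [
    "Blah blah blah. THIS SENTENCE SHOULD BE SELECTED [bbcode]",
    "THIS SENTENCE SHOULD BE SELECTED [bbcode]",
    "A trick sentence. And another. THIS SENTENCE SHOULD BE SELECTED\n[bbcode]",
];
foreach ($examples as $example) {
    var_dump(extractSentenceBeforeTag($example));
}
```

Each of the three examples yields "THIS SENTENCE SHOULD BE SELECTED".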
Match and release any leading spaces ( *\K) then
Lazily match one or more non-dot characters ([^.]+?) then
Lookahead for one or more whitespace characters followed by the bbcode tag ((?=\s+\[bbcode]))
Make the pattern case-insensitive if the bbcode might be uppercase (i)
Code:
$tests = [
'Blah blah blah. THIS SENTENCE SHOULD BE SELECTED [bbcode] text',
'THIS SENTENCE SHOULD BE SELECTED [bbcode] text',
'A trick sentence. And another. THIS SENTENCE SHOULD BE SELECTED
[bbcode]] text'
];
foreach ($tests as $test) {
    var_export(preg_match('/ *\K[^.]+?(?=\s+\[bbcode])/i', $test, $m) ? $m[0] : 'no match');
    echo "\n---\n";
}
I have little confidence when it comes to regular expressions. Writing this in PHP code.
I need to be able to filter out strings that follow this format, where the numbers can be 4~6 digits (numeric only):
$input = "This is my string with a weird ID added cause I'm a weirdo! (id:11223)";
I could simply remove the last word by finding the last position of a space via strrpos(); (it appears none of them have a trailing space from JSON feed), then use substr(); to cut it. But I think the more elegant way would be a substring. The intended output would be:
$output = trim(preg_replace('[regex]', '', $input));
// $output = "This is my string with a weird ID added cause I'm a weirdo!"
So this regex should match with the brackets, and the id: portion, and any contiguous numbers, such as:
(id:33585)
(id:1282)
(id:9845672)
Intending to use the preg_replace() function to remove these from a data feed. Don't ask me why they decided to include an ID in the description string... It blows my mind too why it's not a separate column in the JSON feed altogether.
Try using the pattern \(id:\d+\):
$input = "Text goes here (id:11223) and also here (id:33585) blah blah";
echo $input . "\n";
$output = preg_replace("/\(id:\d+\)/", "", $input);
echo $output;
This prints:
Text goes here (id:11223) and also here (id:33585) blah blah
Text goes here and also here blah blah
There is an edge case here, which you can see in the extra (unwanted) whitespace left behind after the replacement. We could try to get sophisticated and remove that too, but you should state what your expected output is.
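One possible way to deal with that leftover whitespace, assuming the desired result is simply the text with each ID and the whitespace in front of it removed:

```php
<?php
$input = "Text goes here (id:11223) and also here (id:33585) blah blah";

// \s* also consumes the whitespace in front of each "(id:...)" token,
// so no double spaces are left behind; trim() covers an ID at the very start.
$output = trim(preg_replace('/\s*\(id:\d+\)/', '', $input));

echo $output, "\n";
```

The same call turns the string from the question, which ends in " (id:11223)", into the text without the trailing ID.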
I have a string with multiple newlines; each new line contains a new substring.
Here is an example of the string with 2 lines:
$x="Hello
World!";
You can see Hello is on the first line and World! is on a new line.
Each substring may start and end with or without newline(s) or space(s).
I want to match both lines.
My current code matches the second line and ignores the first.
if(preg_match_all("/^\n*\s*(Hello|World!)\s*$/i",$x,$m))
{print_r($m);}
Is there any way to fix this, so that both lines can be matched?
if(preg_match_all("/(Hello|World!)/i",$x,$m))
{
print_r($m);
}
If you use ^ and $ without the m modifier, they anchor to the start and end of the whole string, not of each line.
Hello is followed by a newline and more text rather than the end of the string, so it won't match.
If you change the script like above, it will only look for the words, regardless of anything before or after them.
Hope that helps.
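If the anchors are worth keeping (so that a line containing other text is not matched), a sketch using the m modifier might look like this. It assumes "\n" line endings and that only spaces or tabs may surround the word on its line.

```php
<?php
$x = "Hello\nWorld!";

// m: ^ and $ anchor at every line, not only at the ends of the string.
// [ \t]* allows optional indentation or trailing blanks on a line.
$count = preg_match_all('/^[ \t]*(Hello|World!)[ \t]*$/im', $x, $m);

print_r($m[1]); // the captured words, one entry per matching line
```

Here $count is 2 and $m[1] holds "Hello" and "World!".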
Try this
$x="Hello
world! ";
// i: case-insensitive, so the lowercase "world!" matches too
preg_match_all("#\b(Hello|World!)#i",$x,$m);
// $m[0] holds the full matches
echo count($m[0]);
Output
2
I'm not sure what the "goal" is, but this regex will match both lines. I changed your Hello|World! to (.+?) as it matches any character and so would match variables if you are parsing code.
/^(.+?)\s{1,8}(.+?)$/
Here is your example with the new regex, so if I misinterpreted you can see what's going on.
<?php
$x="Hello
World!";
if(preg_match_all("/^(.+?)\s{1,8}(.+?)$/i",$x,$m))
{print_r($m);}
?>
You'll need to determine what and how many characters can appear between the first and second lines if you're parsing more than a single series, and then modify the \s{1,8} to suit (\s already covers spaces, tabs and newlines).
I'm trying to identify if the user has typed someone else's username if they typed an at symbol (#) before it, much like on twitter. My function can recognise the # symbol and the username after it, but it also includes the space after it (if there was one).
Here's my regex stuff
/#([A-Za-z0-9_]+)(\s|\Z)/
So let's say that a user typed
#testificate blah blah blah
My function would select the following (between the | symbols)
|#testificate |blah blah blah
When what I actually want is for it to select
|#testificate| blah blah blah
It includes the space afterwards and that's not what I want. Is there a better way to do this? I'm turning the # tags into links with a preg_replace, can anyone help me out? Thanks
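Why did you add (\s|\Z)?
\s matches whitespace
\Z matches the end of the subject, or before a newline at the end
Without that group the space is no longer part of the match.
Regex: /#([A-Za-z0-9_]+)/
Edit:
davidchambers's suggestion
Shorter Regex: /#(\w+)/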
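If the check for "whitespace or end of input" after the name is still wanted, a lookahead can test for it without making it part of the match. A rough sketch of the link replacement under that assumption; the /user/ URL scheme is made up for illustration:

```php
<?php
$text = "#testificate blah blah blah #another_user";

// (\w+) captures the whole name.
// (?=\s|\z) requires whitespace or end of input after the name but does not
// consume it, so the following space stays outside the match.
$linked = preg_replace(
    '/#(\w+)(?=\s|\z)/',
    '<a href="/user/$1">#$1</a>', // hypothetical URL scheme
    $text
);

echo $linked, "\n";
```

The space after each name ends up after the closing </a>, not inside the link.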
I have a situation in which I parse a body of text and replace certain phrases with links. I then need to re-parse the string to replace a second set of phrases with links. The problem arises at this point, where certain words or phrases in the second set can be substrings of phrases already replaced in the first pass.
Example: The string "blah blah grand canyon blah" will become "blah blah <a href="#">grand canyon</a> blah" after the first pass. The second pass might try to replace the word "canyon" with a link, so the resulting, broken, text would read: "blah blah <a href="#">grand <a href="#">canyon</a></a> blah".
So I've been trying to use preg_replace and a regular expression to prevent nested <a> tags from occurring, by only replacing text which is not already in a link. I have tried regexes that check whether there are </a> tags further on in the text, but can't get these to work.
Maybe another approach is required?
Many thanks in advance!
Dave
This might work for all passes:
$string = preg_replace('/([^>]|^)grand canyon\b/','$1<a href=#>grand canyon</a>',$string);
EDIT: assuming you can afford missing when the text contains stuff like "amazonas>grand canyon"
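A quick check of what that guard does, as a sketch: running the same replacement twice should leave the first result unchanged, because on the second run the phrase is directly preceded by ">".

```php
<?php
$pattern     = '/([^>]|^)grand canyon\b/';
$replacement = '$1<a href=#>grand canyon</a>';

$once  = preg_replace($pattern, $replacement, 'blah blah grand canyon blah');
$twice = preg_replace($pattern, $replacement, $once);

echo $once, "\n";
var_dump($once === $twice); // true: the second run finds nothing to replace
```

Note that the guard only inspects the single character in front of the phrase. A later pass for "canyon" on its own would still match inside the link, since there it is preceded by a space rather than ">".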
For the second pass, you could use a regex such as:
(<a[^>]*>.*?</a>)|grand
This regex matches either a link, or the word "grand". If the link is matched, it is captured into the first (and only) capturing group. If the group matched, simply re-insert the existing link. If the word grand matches, you know it's outside a link, and you can turn it into a link.
In PHP you can do this with preg_replace_callback:
$result = preg_replace_callback('%(<a[^>]*>.*?</a>)|grand%', 'compute_replacement', $subject);
function compute_replacement($groups) {
    // You can vary the replacement text for each match on-the-fly
    // $groups[0] holds the regex match
    // $groups[n] holds the match for capturing group n
    if (!empty($groups[1])) {
        // An existing link matched: re-insert it unchanged
        return $groups[1];
    } else {
        // The bare word matched outside a link: wrap it
        return "<a href='#'>$groups[0]</a>";
    }