PHP regular expression help needed on special characters

Here is my code:
$string="According to a report on the Times of In#dia, &#8220 Telan#gana Rashtra Samiti chief K Chandrasekhar #Rao has seen a #sinister motive behind the protests against the formation of Telangana";
preg_match_all('/(?!\b)(#\w+\b)/' ,$string, $matches);
foreach($matches[1] as $match){
$string = str_replace("$match","[h]".$match."[/h]",$string);
}
echo $string;
Output:
According to a report on the Times of In#dia, &[h]#8220[/h] Telan#gana
Rashtra Samiti chief K Chandrasekhar [h]#Rao[/h] has seen a
[h]#sinister[/h] motive behind the protests against the formation of
Telangana
I want to replace only the strings that start with #, but it is also replacing &#8220 with &[h]#8220[/h]. Please help me with this.

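Try using a positive lookbehind. Your (?!\b) check only rules out a # that directly follows a word character (as in In#dia), so the # after & still passes. Requiring whitespace or the start of the string before the # is stricter:
/(?<=\s|^)(#\w+\b)/
This makes sure there's either whitespace or the beginning of the string before the hashed word.
You can use this in a preg_replace call: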
$string="According to a report on the Times of In#dia, &#8220 Telan#gana Rashtra Samiti chief K Chandrasekhar #Rao has seen a #sinister motive behind the protests against the formation of Telangana";
$result = preg_replace('/(?<=\s|^)(#\w+\b)/', "[h]$1[/h]", $string);
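As a quick check, here is a minimal sketch that runs the same pattern on a shortened version of the sample text. The shortened string is my own; the pattern is the one above.

```php
<?php
// Shortened sample text (an assumption for illustration, not the full string).
$string = "Times of In#dia, &#8220 Telan#gana chief #Rao has seen a #sinister motive";

// (?<=\s|^) requires whitespace or the start of the string right before the #,
// so In#dia, &#8220 and Telan#gana are left alone.
$result = preg_replace('/(?<=\s|^)(#\w+\b)/', '[h]$1[/h]', $string);

echo $result, PHP_EOL;
// Times of In#dia, &#8220 Telan#gana chief [h]#Rao[/h] has seen a [h]#sinister[/h] motive
```

Doing the replacement in one preg_replace call also avoids the str_replace loop, which would rewrite every occurrence of a matched word wherever it appears in the string.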

Related

preg match text between tags excluding same tag in between

I know there are several similar questions, but I could not find one that covers this specific case.
I took some code and tweaked it to my needs, but now I've found a bug in it that I can't fix.
Code:
$tag = 'namespace';
$match = Tags::get($f, $tag);
var_dump($match);
static function get( $xml, $tag) { // http://stackoverflow.com/questions/3404433/get-content-within-a-html-tag-using-7-processing
// bug case string(56) "<namespaces>
// <namespace key="-2">Media</namespace>"
$tag_ini = "<{$tag}[^\>]*?>"; $tag_end = "<\\/{$tag}>";
$tag_regex = '/' . $tag_ini . '(.*?)' . $tag_end . '/si';
preg_match_all($tag_regex,
$xml,
$matches,
PREG_OFFSET_CAPTURE);
return $matches;
}
As you can see, there is a bug if the tag is nested:
<namespaces> <namespace key="-2">Media</namespace>
It should return 'Media', or even the outer '<namespaces>' and then the inner ones.
I tried "<{$tag}[^\>|^\r\n ]*?>", adding ^\s+, changing the * to *?, and a few other things, which at best matched only the bugged case.
I also tried "<{$tag}[^{$tag}]*?>", which returns nothing; I suppose it cancels itself out.
I'm a newbie at regex. I can tell that the fix is to not allow a new tag of the same type to open inside the match.
For my use case I could even accept a hack that excludes matches whose inner text contains a newline.
Can anyone give the right syntax for this?
You can check an extract of the text here: http://pastebin.com/f2naN2S3
After the proposed change: $tag_ini = "<{$tag}\\b[^>]*>"; $tag_end = "<\\/{$tag}>"; it does work for the example case, but not for this one:
<namespace key="0" />
<namespace key="1">Talk</namespace>
As it results in:
<namespace key="1">Talk"
I think it's because digits, letters and " are treated as being inside the word boundary. How could I address that?

The main problem is that you did not use a word boundary after the tag name, so namespace in the pattern could also match the namespaces tag, and any other tag whose name starts with namespace.
The next issue is that the <${tag}\b[^>]*>(.*?)<\/${tag}> pattern overmatches when a self-closing namespace tag is followed by a normal open/close namespace pair. To prevent that, either put a negative lookbehind (?<!\/) right before the >, or put a negative lookahead (?![^>]*\/>) right after \b.
So, you can use
$tag_ini = "<{$tag}\\b[^>]*(?<!\\/)>"; $tag_end = "<\\/{$tag}>";
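To see both fixes working together, here is a minimal sketch. The function name get_tag_contents and the sample XML are my own assumptions, loosely modelled on the question's Tags::get(); only the pattern comes from the line above.

```php
<?php
// Hypothetical stand-alone variant of the question's Tags::get().
function get_tag_contents(string $xml, string $tag): array
{
    $name = preg_quote($tag, '/');
    // \b stops "namespace" from matching "<namespaces>";
    // (?<!\/) rejects self-closing tags such as <namespace key="0" />.
    $regex = '/<' . $name . '\b[^>]*(?<!\/)>(.*?)<\/' . $name . '>/si';
    preg_match_all($regex, $xml, $matches);
    return $matches[1]; // inner text of each paired tag
}

// Sample input assembled from the cases mentioned in the question.
$xml = '<namespaces>
<namespace key="-2">Media</namespace>
<namespace key="0" />
<namespace key="1">Talk</namespace>
</namespaces>';

$contents = get_tag_contents($xml, 'namespace');
print_r($contents); // Media, Talk
```

For anything beyond a simple extraction like this, an XML parser such as SimpleXML or DOMDocument is likely to be more robust than a regex.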

This is probably not the ideal answer, but I was messing with a regex generator:
<?php
# URL that generated this code:
# http://txt2re.com/index-php.php3?s=%3Cnamespace%3E%3Cnamespace%20key=%22-2%22%3EMedia%3C/namespace%3E&12&11
$txt='arstarstarstarstarstarst<namespace key="-2">Media</namespace>arstarstarstarstarst';
$re1='.*?'; # Non-greedy match on filler
$re2='(?:[a-z][a-z]+)'; # Uninteresting: word
$re3='.*?'; # Non-greedy match on filler
$re4='(?:[a-z][a-z]+)'; # Uninteresting: word
$re5='.*?'; # Non-greedy match on filler
$re6='(?:[a-z][a-z]+)'; # Uninteresting: word
$re7='.*?'; # Non-greedy match on filler
$re8='((?:[a-z][a-z]+))'; # Word 1
if ($c=preg_match_all ("/".$re1.$re2.$re3.$re4.$re5.$re6.$re7.$re8."/is", $txt, $matches))
{
$word1=$matches[1][0];
print "($word1) \n";
}
#-----
# Paste the code into a new php file. Then in Unix:
# $ php x.php
#-----
?>

This line is what I needed
$tag_ini = "<{$tag}\\b[^>|^\\/>]*>"; $tag_end = "<\\/{$tag}>";
Thank you very much, @Alison and @Wictor, for your help and directions.

php regex to get middle of string

I parse an html page into a plain text in order to find and get a numeric value.
In the whole html mess, I need to find a string like this one:
C) Debiti33.197.431,90I - Di finanziamento
I need the number 33.197.431,90 (this number changes on every HTML parsing request).
Is there any regex to achieve this? For example:
STARTS WITH 'C) Debiti' ENDS WITH 'I - Di finanziamento' GETS the middle string that can be whatever.
Whenever I try, I get empty results. I don't know much about regex.
Can you please help me?
Thank you very much.

You could try the regex below:
^C\) Debiti\K.*?(?=I - Di finanziamento$)
The PHP code would be:
<?php
$mystring = "C) Debiti33.197.431,90I - Di finanziamento";
$regex = '~^C\) Debiti\K.*?(?=I - Di finanziamento$)~';
if (preg_match($regex, $mystring, $m)) {
$yourmatch = $m[0];
echo $yourmatch; // => 33.197.431,90
}
?>

This should work. Read the section "Want to Be Lazy? Think Twice."
(?<=\bC\) Debiti)[\d.,]+(?=I - Di finanziamento\b)
Sample code:
$re = "/(?<=\\bC\\) Debiti)[\\d.,]+(?=I - Di finanziamento\\b)/i";
$str = "C) Debiti33.197.431,90I - Di finanziamento";
preg_match($re, $str, $matches);
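If lookarounds feel heavy, a plain capture group gives the same number. The sketch below also converts it to a float, assuming Italian formatting ("." for thousands, "," for decimals); that conversion step is my addition and not part of the question.

```php
<?php
$str = "C) Debiti33.197.431,90I - Di finanziamento";

$raw = null;
$value = null;

// Group 1 holds the digits, dots and commas between the two fixed markers.
if (preg_match('/C\) Debiti([\d.,]+)I - Di finanziamento/', $str, $m)) {
    $raw = $m[1]; // "33.197.431,90"
    // Assumption: "." is the thousands separator and "," the decimal mark.
    $value = (float) str_replace(['.', ','], ['', '.'], $raw);
    echo $raw, ' => ', $value, PHP_EOL;
}
```

The raw string is in $m[1] rather than $m[0], because $m[0] always holds the whole match including the markers.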

preg_replace_callback highlight pattern not match in result

I have this code:
$string = 'The quick brown fox jumped over the lazy dog and lived to tell about it to his crazy moped.';
$text = explode("#", str_replace(" ", " #", $string)); //ugly trick to preserve space when exploding, but it works (faster than preg_split)
foreach ($text as $value) {
echo preg_replace_callback("/(.*p.*e.*d.*|.*a.*y.*)/", function ($matches) {
return " <strong>".$matches[0]."</strong> ";
}, $value);
}
The point of it is to be able to enter a sequence of characters (in the code above it's a fixed pattern), and it finds and highlights those characters in the matched word. The code I have now highlights the entire word. I'm looking for the most efficient way of highlighting the characters.
The result of the current code (the whole words jumped, lazy, crazy and moped come out bold):
The quick brown fox jumped over the lazy dog and lived to tell about it to his crazy moped.
What I would like to have (only the matched characters bold, for example the p, e and d in jumped):
The quick brown fox jumped over the lazy dog and lived to tell about it to his crazy moped.
Did I take the wrong approach? It would be awesome if someone could point me in the right direction; I've been searching for hours and haven't found what I was looking for.
EDIT 2:
Divaka's been a great help. Almost there.. I apologize if I haven't been clear enough on what my goal is. I will try to explain further.
- Part A -
One of the things I will be using this code for is a phone book. A simple example:
When following characters are entered:
Jan
I need it to match following examples:
Jan Verhoeven
Arjan Peters
Raj Naren
Jered Von Tran
The problem is that I will be iterating over the entire phone book, one person record at a time. Each person also has email addresses, a postal address, maybe a website, an extra note, etc. This means that the text I'm actually searching can contain anything: letters, numbers, special characters (&#()%_- etc.), newlines, and most importantly spaces. So an entire record (csv) might contain the following info:
Name;Address;Email address;Website;Note
Jan Verhoeven;Veldstraat 2a, 3209 Herkstad;jan#werk.be;www.janophetwerk.be,jan#telemet.be;Jan die ik ontmoet heb op de bouwbeurs.\n Zelfstandige vertegenwoordiger van bouwmaterialen.
Raj Naren;Kerklaan 334, 5873 Biep;raj#werk.be;;Rechtstreekse contactpersoon bij Werk.be (#654 intern)
The \n is meant to be an actual newline. So if I search for #werk.be, I'd like to see both these records as a result.
- Part B -
Something else I want to use this for is searching song lyrics. When I'm looking for a song and I can only remember it had something to do with ducks or docks and a circle, I would enter dckcircle and get the following result:
... and the ducks were all dancing in a great big circle, around the great big bonfire ...
To be able to fine-tune the searching I'd like to be able to limit the number of spaces (or any other character), because I would imagine it finding a simple pattern like eve in every song while I'm only looking for a song that has the exact word eve in it.
- Conclusion -
If I summarize this in pseudo-regex, for a search pattern abc with a max of 3 spaces in-between it would be something like this: (I might be totally off here)
(a)(any character, max 3 spaces)(b)(any character, max 3 spaces)(c)
Or more generic:
(a)({any character}{these characters with a limit of 3})(b)({any character}{these characters with a limit of 3})(c)
This can even be extended to this fairly easily I'm guessing:
(a)({any character}{these characters with a limit of 3}{not these characters})(b)({any character}{these characters with a limit of 3}{not these characters})(c)
(I know the ´{}´ brackets are not to be used that way in a regular expression, but I don't know how else to put it without using a character that has a meaning in regular expressions.)
If anyone wonders, I know the sql like statement would be able to do 80% (I'm guessing, might even be more) of what I'm trying to do, but I'm trying to avoid using a database to make this as portable as possible.
When the correct answer has been found, I'll clean this question (and the code) up and post the resulting php-class here (maybe I'll even put it up on github if that would be useful), so anyone looking for the same will have a fully working class to work with :).

I've come up with this. Tell me if it's what you want!
//$string = "The quick brown fox jumped over the lazy dog and lived to tell about it to his crazy moped.";
$string = "abcdefo";
// Characters must be quoted strings; bare words are undefined constants.
//$pattern_array1 = array('a', 'y');
//$pattern_array2 = array('p', 'e', 'd');
$pattern_array1 = array('e', 'f');
$pattern_array2 = array('o');
$pattern_array2 = array('a', 'f'); // overrides the previous assignment
$number_of_patterns = 2;
$regexp1 = generate_regexp($pattern_array1, 1);
$regexp2 = generate_regexp($pattern_array2, 2);
$string = preg_replace($regexp1["pattern"], $regexp1["replacement"], $string);
$string = preg_replace($regexp2["pattern"], $regexp2["replacement"], $string);
$string = transform_multimatched_chars($string);
// transforming other chars after transforming the multimatched ones
for($i = 1; $i <= $number_of_patterns; $i++) {
$string = str_replace("#{$i}", "<strong>", $string);
$string = str_replace("#/{$i}", "</strong>", $string);
}
echo $string;
function generate_regexp($pattern_array, $pattern_num) {
$regexp["pattern"] = "/";
$regexp["replacement"] = "";
$i = 0;
foreach($pattern_array as $key => $char) {
$regexp["pattern"] .= "({$char})";
$regexp["replacement"] .= "#{$pattern_num}\$". ($key + $i+1) . "#/{$pattern_num}";
if($key < count($pattern_array) - 1) {
$regexp["pattern"] .= "(?s)((?:(?!{$pattern_array[$key + 1]})(?!\s).)*)";
$regexp["replacement"] .= "\$".($key + $i+2) . "";
}
$i = $key + 1;
}
$regexp["pattern"] .= "/";
return $regexp;
}
function transform_multimatched_chars($string)
{
preg_match_all("/((#[0-9]){2,})(.*)((#\/[0-9]){2,})/", $string, $matches);
// change this for your purposes
$start_replacement = '<span style="color:red;">';
$end_replacement = '</span>';
foreach($matches[1] as $key => $match)
{
$string = str_replace($match, $start_replacement, $string);
$string = str_replace($matches[4][$key], $end_replacement, $string);
}
return $string;
}
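For the simpler goal in the original question (bold only the typed characters inside a matching word), a shorter route may be enough. The following is a sketch of a different technique from the code above, not a replacement for it: it builds one capture group per typed character and lets preg_replace_callback wrap only those groups. The helper name highlight_sequence is hypothetical.

```php
<?php
// Hypothetical helper: bold each character of $needle where the characters
// occur in order inside one word (no whitespace between them).
function highlight_sequence(string $text, string $needle): string
{
    $chars = preg_split('//u', $needle, -1, PREG_SPLIT_NO_EMPTY);
    $parts = [];
    foreach ($chars as $char) {
        $parts[] = '(' . preg_quote($char, '/') . ')';
    }
    // Between two wanted characters allow any run of non-whitespace, lazily.
    $pattern = '/' . implode('(\S*?)', $parts) . '/iu';

    return preg_replace_callback($pattern, function (array $m): string {
        $out = '';
        // Odd groups are wanted characters, even groups are the filler.
        for ($i = 1; $i < count($m); $i++) {
            $out .= ($i % 2 === 1) ? '<strong>' . $m[$i] . '</strong>' : $m[$i];
        }
        return $out;
    }, $text);
}

$highlighted = highlight_sequence('The quick brown fox jumped over the lazy dog.', 'ped');
echo $highlighted, PHP_EOL;
```

To allow a limited number of spaces between characters, as described in the question, the filler group (\S*?) could be swapped for something stricter; that part is left out here.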

wrap words in string with regex

This is the string
(code)
Pivot: 96.75<br />Our preference: Long positions above 96.75 with targets # 97.8 & 98.25 in extension.<br />Alternative scenario: Below 96.75 look for further downside with 96.35 & 95.9 as targets.<br />Comment the pair has broken above its resistance and should post further advance.<br />
(text)
"Pivot: 96.75Our preference: Long positions above 96.75 with targets # 97.8 & 98.25 in extension.Alternative scenario: Below 96.75 look for further downside with 96.35 & 95.9 as targets.Comment the pair has broken above its resistance and should post further advance."
The result should be
(code)
<b>Pivot</b>: 96.75<br /><b>Our preference</b>: Long positions above 96.75 with targets # 97.8 & 98.25 in extension.<br /><b>Alternative scenario</b>: Below 96.75 look for further downside with 96.35 & 95.9 as targets.<br />Comment the pair has broken above its resistance and should post further advance.<br />
(text)
Pivot: 96.75Our preference: Long positions above 96.75 with targets # 97.8 & 98.25 in extension.Alternative scenario: Below 96.75 look for further downside with 96.35 & 95.9 as targets.Comment the pair has broken above its resistance and should post further advance.
The purpose:
Wrap all the words before the : sign.
I've tried this regex: ((\A )|(<br />))(?P<G>[^:]*):, but it only works in a Python environment. I need this in PHP:
$pattern = '/((\A)|(<br\s\/>))(?P<G>[^:]*):/';
$description = preg_replace($pattern, '<b>$1</b>', $description);
Thanks.

This preg_replace should do the trick:
preg_replace('#(^|<br ?/>)([^:]+):#m','$1<b>$2</b>:',$input)

I should start by saying that HTML operations are better done with a proper parser such as DOMDocument. This particular problem is straightforward, so regular expressions may work without too much hocus pocus, but be warned :)
You can use look-around assertions; this frees you from having to restore the neighbouring strings during the replacement:
echo preg_replace('/(?<=^|<br \/>)[^:]+(?=:)/m', '<b>$0</b>', $str);
First, the look-behind assertion matches either the start of each line or a preceding <br />. Then, any characters except the colon are matched; the look-ahead assertion makes sure it's followed by a colon.
The /m modifier is used to make ^ match the start of each line as opposed to \A which always matches the start of the subject string.
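Here is a minimal run of that pattern, as a sketch. The input is a shortened version of the question's string (the shortening is mine).

```php
<?php
// Shortened version of the question's input (an assumption for illustration).
$str = 'Pivot: 96.75<br />Our preference: Long positions above 96.75.<br />'
     . 'Alternative scenario: Below 96.75 look for further downside.<br />'
     . 'Comment the pair has broken above its resistance.<br />';

// Bold each label that starts a line or follows "<br />" and ends right before ":".
$out = preg_replace('/(?<=^|<br \/>)[^:]+(?=:)/m', '<b>$0</b>', $str);

echo $out, PHP_EOL;
```

The last segment ("Comment the pair ...") has no colon, so it is left untouched, which matches the result the question asks for.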

The most "general" and least regex-expensive way to do this that I could come up with was this:
$parts = explode('<br', $str);//don't include space and `/`, as tags may vary
$formatted = '';
foreach($parts as $part)
{
$formatted .= preg_replace('/^\s*[\/>]{0,2}\s*([^:]+:)/', '<b>$1</b>',$part).'<br/>';
}
echo $formatted;
Or:
$formatted = array();
foreach($parts as $part)
{
$formatted[] = preg_replace('/^\s*[\/>]{0,2}\s*([^:]+:)/', '<b>$1</b>',$part);
}
echo implode('<br/>', $formatted);
Tested, and got this as output:
Pivot: 96.75Our preference: Long positions above 96.75 with targets # 97.8 & 98.25 in extension.Alternative scenario: Below 96.75 look for further downside with 96.35 & 95.9 as targets.Comment the pair has broken above its resistance and should post further advance.
That being said, I do find this bit of data weird, and, if I were you, I'd consider str_replace or preg_replace-ing all breaks with PHP_EOL:
$str = preg_replace('/\<\s*br\s*\/?\s*\>/i', PHP_EOL, $str);//allow for any form of break tag
And then, your string looks exactly like the data I had to parse, and got the regex for that here:
$str = preg_replace(...);
$formatted = preg_replace('/^([^:\n\\]++)\s{0,}:((\n(?![^\n:\\]++\s{0,}:)|.)*+)/','<b>$1:</b>$2<br/>', $str);

PHP: How to find the beginning and end of a substring in a string?

This is the content of one mysql table field:
Flash LEDs: 0.5W
LED lamps: 5mm
Low Powers: 0.06W, 0.2W
Remarks(1): this is remark1
----------
Accessories: Light Engine
Lifestyle Lights: Ambion, Crane Fun
Office Lights: OL-Deluxe Series
Street Lights: Dolphin
Retrofits: SL-10A, SL-60A
Remarks(2): this is remark2
----------
Infrared Receiver Module: High Data Rate Short Burst
Optical Sensors: Ambient Light Sensor, Proximity Sensor, RGB Color Sensor
Photo Coupler: Transistor
Remarks(3): this is remark3
----------
Display: Dot Matrix
Remarks(4): this is remark4
Now, I want to read the remarks and store them in a variable. Remarks(1), Remarks(2), etc. are fixed. 'this is remark1', etc. come from form input fields, so they are flexible.
Basically what I need is: Read everything between 'Remarks(1):' and '--------' and save it in a variable.
Thanks for your help.

You can use a regex:
preg_match_all("~Remarks\(([^)]+)\):([^\n]+)~", $str, $m);
For each line of the form Remarks(X): Y, the regex puts X in match group 1 and Y in match group 2.
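To store the remarks in a variable, the two groups can be combined into one array. This is a minimal sketch on a shortened copy of the field content; the $remarks name and the trimming step are my own additions.

```php
<?php
// Shortened copy of the field content from the question.
$str = "Flash LEDs: 0.5W\nRemarks(1): this is remark1\n----------\n"
     . "Accessories: Light Engine\nRemarks(2): this is remark2\n----------\n"
     . "Display: Dot Matrix\nRemarks(4): this is remark4";

preg_match_all('~Remarks\(([^)]+)\):([^\n]+)~', $str, $m);

// Map each remark number (group 1) to its trimmed text (group 2).
$remarks = array_combine($m[1], array_map('trim', $m[2]));

print_r($remarks);
```

Note that [^\n]+ stops at the end of the line, so a remark that spans several lines would need a pattern that reads up to the ---------- separator instead.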

This would be a job for regular expressions, which let you match on exactly the kinds of rules your requirements express.

Use a preg function for this, or use the explode and implode functions to get the correct result. Don't use substr; it may not give the correct result.
Example of explode and implode for your string:
$sdr = "Remarks(4): this is remark4";
$sdr1 = explode(":",$sdr);
$frst = $sdr1[0];
$sdr2 = array_shift($sdr1);
// Rejoin with ":" so colons inside the remark survive, then trim the leading space
$secnd = trim(implode(":", $sdr1));
echo "First String - ".$frst;
echo "<br>";
echo "Second String - ".$secnd;
echo "<br>";
Output:
First String - Remarks(4)
Second String - this is remark4
