Replace special strings in an HTML page with PHP - php

I am looking for a way to replace all similar-looking placeholder strings in an entire page with their defined values.
Please do not recommend other methods of including language constants.
The strings look like this:
[_HOME]
[_NEWS]
All of them follow the same [_*] form.
The big issue is how to scan an HTML page and replace the placeholders with their defined values.
One way to parse the HTML page is to use DOMDocument and then run preg_replace() on it,
but my main problem is writing a pattern for the replacement:
$pattern = "/[_i]/";
$replacement= custom_lang("/i/");
$doc = new DOMDocument();
$htmlPage = $doc->loadHTML($html);
preg_replace($pattern, $replacement, $htmlPage);

In a regex, [ and ] delimit a character class, so to match them literally you need to escape them.
The other problem with your expression is _*, which matches zero or more underscores. You need to replace it with something meaningful, such as _.*, which matches an underscore and any characters after it. So your full expression becomes:
/\[_.*?\]/
Why the ?, you might be tempted to ask. It makes the match non-greedy. For example,
if [_foo] [_bar] is the subject string, a greedy match returns a single match covering the whole string, because the expression is valid for all of it, whereas a non-greedy match gives you two separate matches. (More information)
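The difference between the greedy and the non-greedy form can be checked with a minimal sketch like the following. The subject string is the one from the explanation above; the variable names are only illustrative.

```php
<?php
$subject = '[_foo] [_bar]';

// Greedy: .* runs on to the last ] in the string, so there is a single match.
preg_match_all('/\[_.*\]/', $subject, $greedy);

// Non-greedy: .*? stops at the first ] it can, so each placeholder matches on its own.
preg_match_all('/\[_.*?\]/', $subject, $lazy);

print_r($greedy[0]); // one element: "[_foo] [_bar]"
print_r($lazy[0]);   // two elements: "[_foo]" and "[_bar]"
```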
You might be better off being more restrictive and requiring an _ followed by capital letters, like:
/\[_[A-Z]+\]/
Update: Using the matched strings and replacing them. To do so, we use a concept called back-referencing.
Consider modifying the above expression by enclosing the key in parentheses, like /\[_([A-Z]+)\]/
Now in the preg_replace() arguments we can use the text captured by the parentheses by back-referencing it with $1. So what you can use is:
preg_replace("/\[_([A-Z]+)\]/e", "my_wonderful_replacer('$1')", $html);
Note: The e modifier is needed to treat the second parameter as PHP code. (More information) Be aware that the e modifier was deprecated in PHP 5.5 and removed in PHP 7.0; on current PHP versions, use preg_replace_callback() with the same pattern (without e) instead.
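On PHP 7 and later, where the e modifier no longer exists, the same idea can be written with preg_replace_callback(). This is only a sketch: the $translations array is a hypothetical stand-in for whatever custom_lang() or my_wonderful_replacer() would look up, and leaving unknown placeholders untouched is a design choice made for this example.

```php
<?php
// Hypothetical translation table; the real lookup function is not shown in the question.
$translations = ['HOME' => 'Home', 'NEWS' => 'News'];

$html = '<a href="/">[_HOME]</a> <a href="/news">[_NEWS]</a> [_MISSING]';

$result = preg_replace_callback(
    '/\[_([A-Z]+)\]/',
    function (array $m) use ($translations) {
        // $m[0] is the whole placeholder, $m[1] is the captured key.
        // Unknown keys are left as they are.
        return $translations[$m[1]] ?? $m[0];
    },
    $html
);

echo $result, PHP_EOL;
// <a href="/">Home</a> <a href="/news">News</a> [_MISSING]
```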

If you know the full keyword you are trying to replace (e.g. [_HOME]), then you can just use str_replace() to replace all instances.
No need to make things like this more complex by introducing regex.
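As a rough illustration of the str_replace() route, assuming the full list of placeholders is known up front (the $map array here is made up for the example):

```php
<?php
// Hypothetical placeholder-to-text map.
$map = ['[_HOME]' => 'Home', '[_NEWS]' => 'News'];

$html = '<li>[_HOME]</li><li>[_NEWS]</li><li>[_HOME]</li>';

// str_replace() accepts arrays of search and replacement strings
// and replaces every occurrence of each search string.
$result = str_replace(array_keys($map), array_values($map), $html);

echo $result, PHP_EOL; // <li>Home</li><li>News</li><li>Home</li>
```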

Related

PHP Regex - Remove a specific string after certain pattern

I'm making column shortcodes in WordPress and it always adds a </p> after the tag.
So the raw HTML result from dumping the variable looks like this:
<column class="size-5"></p>
....
</column>
I want to delete that lone </p> with regex, so I made this:
$content = preg_replace("/(?!<column[^<]+)<\/p>/", '', $content);
I matched </p> while excluding the column tag. Here's the Regexr link.
In Regexr (which I assume uses JS syntax), it works perfectly. But in PHP, it matches every single </p> and removes it.
I have tried many variations of lookbehind (?<! and ?>!), but none of them work.
Has anyone experienced this same problem before?
Thanks
First of all, you should know that manipulating HTML with regex is fragile and may not work in 100% of cases with arbitrary HTML code. You should only use it when you know what you are doing (you generate the HTML yourself in a consistent way, or the HTML provider is known and uses a consistent approach to HTML escaping, etc.).
Next, you do not need any negative lookahead. The pattern you are using matches any </p> that is not the start of text matching the <column[^<]+ subpattern. Text that begins with </p> can never begin with <column, so the lookahead always succeeds and you effectively match every </p>.
In case you want to remove some text that appears in a specific known context, you can capture the part you need to keep and simply match the part you want to remove. Enclose the part of the pattern you need to keep in (...) and use a backreference to that group in the replacement pattern.
Use
$content = preg_replace('/(<column\b[^<]*>)<\/p>/', '$1', $content);
Alternatively, in PCRE, you may use the \K operator, which drops all the text matched so far from the reported match:
$content = preg_replace('/<column\b[^<]*>\s*\K<\/p>/', '', $content);
That way you won't have to use any backreferences in the replacement pattern.
I added the \b (word boundary) to make sure column is matched as a whole word. Since it can still match column in column-editor, you might want to replace <column\b[^<]*> with <column(?:\s[^<]*)?>.
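A small sketch comparing the variants discussed in this answer. The sample markup, including the <column-editor> tag, is invented for the demonstration.

```php
<?php
$content = '<column class="size-5"></p><p>text</p></column>';

// Capture the opening tag and put it back via $1.
$viaGroup = preg_replace('/(<column\b[^<]*>)<\/p>/', '$1', $content);

// \K discards the tag from the match, so only </p> is replaced.
$viaK = preg_replace('/<column\b[^<]*>\s*\K<\/p>/', '', $content);

echo $viaGroup, PHP_EOL; // <column class="size-5"><p>text</p></column>
echo $viaK, PHP_EOL;     // <column class="size-5"><p>text</p></column>

// The \b version also fires on a tag such as <column-editor>;
// the stricter pattern leaves that tag alone.
$other  = '<column-editor></p>';
$loose  = preg_replace('/(<column\b[^<]*>)<\/p>/', '$1', $other);
$strict = preg_replace('/(<column(?:\s[^<]*)?>)<\/p>/', '$1', $other);

echo $loose, PHP_EOL;  // <column-editor>
echo $strict, PHP_EOL; // <column-editor></p>
```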

How would I write a regular expression to check for a string of text surrounded by equal signs?

How would I use regular expressions to check for characters within the following string of text:
=== logo ===
I tried a regex tester but couldn't come up with the correct expression. I've tried this:
/^[=]{3}$/
I want to search within a string and find where the text starts with 3 equal signs.
Find a string or any other characters within the equal signs.
Find 3 more equal signs, ending the expression.
Thanks in advance.
Try using this regex:
/===[^=]+===/
If you want to capture the text, surround it in parentheses:
/===([^=]+)===/
Here's the fiddle: http://jsfiddle.net/jufXA/
If you might have equal signs in your text (but fewer than 3 in a row, obviously), you should instead match everything lazily (which is a tad slower):
/===(.+?)===/
Here's the fiddle: http://jsfiddle.net/jufXA/1/
How about as simple as...
/===(.+?)===/
For example:
$test = "here's ===something special===, like ===this=one===";
preg_match_all('/===(.+?)===/', $test, $matches);
var_dump($matches[1]);
Laziness is kinda virtue here: the regex engine won't advance past the first 'closing delimiter ==='. Without ?, however, you need to use negated character classes (but then again, what about ===something=like=this===?).
I prefer:
/([=]{3})\s*(.+?)\s*\1/.
This puts the text markup (three equal signs) in the beginning and then just uses a back reference for the end. It also trims your text of spaces, which is what you probably want.
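To see the backreference variant in action, here is a short sketch. The test string is made up; group 2 holds the text with the surrounding spaces already trimmed.

```php
<?php
$test = "=== logo === and ===this=one===";

// Group 1 captures the opening "===", \1 requires the same text at the end,
// and the \s* on both sides keeps padding spaces out of group 2.
preg_match_all('/([=]{3})\s*(.+?)\s*\1/', $test, $matches);

print_r($matches[2]); // "logo" and "this=one"
```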

Split string on non-alphanumerics in PHP? Is it possible with php's native function?

I was trying to split a string on non-alphanumeric characters, or simply put, I want to split it into words. The approach that immediately came to my mind is to use regular expressions.
Example:
$string = 'php_php-php php';
$splitArr = preg_split('/[^a-z0-9]/i', $string);
But there are two problems that I see with this approach.
It is not a native PHP function, and it is totally dependent on the PCRE library running on the server.
An equally important problem is what happens if I have punctuation in a word.
Example:
$string = "U.S.A-men's-vote";
$splitArr = preg_split('/[^a-z0-9]/i', $string);
Now this will split the string as [{U}{S}{A}{men}{s}{vote}]
But I want it as [{U.S.A}{men's}{vote}]
So my question is that:
How can we split the string according to words?
Is it possible to do this with a native PHP function, or in some other way that does not depend on PCRE?
Regards
Sounds like a case for str_word_count(), using the oft-forgotten value 1 or 2 for the second argument and a third argument that lists the extra characters to treat as part of a word (hyphens, full stops and apostrophes, or whatever other characters you wish). Follow it with an array_walk() to trim those characters from the beginning or end of the resulting array values, so you only keep them when they're actually embedded in the "word".
Either you have PHP installed (then you also have PCRE), or you don't. So your first point is a non-issue.
Then, if you want to exclude punctuation from your splitting delimiters, you need to add them to your character class:
preg_split('/[^a-z0-9.\']+/i', $string);
If you want to treat punctuation characters differently depending on context (say, make a dot only be a delimiter if followed by whitespace), you can do that, too:
preg_split('/\.\s+|[^a-z0-9.\']+/i', $string);
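The two patterns above can be tried out with a sketch like this one. The first input is the string from the question; the second sentence is invented to show the dot being treated as a delimiter only when whitespace follows it.

```php
<?php
$string = "U.S.A-men's-vote";

// Dots and apostrophes are excluded from the delimiters, so they stay inside words.
$words = preg_split('/[^a-z0-9.\']+/i', $string);
print_r($words); // "U.S.A", "men's", "vote"

// A dot followed by whitespace is a delimiter; other dots stay in the word.
$sentence = "Hello world. U.S.A-men's vote";
$words2 = preg_split('/\.\s+|[^a-z0-9.\']+/i', $sentence);
print_r($words2); // "Hello", "world", "U.S.A", "men's", "vote"
```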
As per my comment, you might want to try (add as many separators as needed)
$splitArr = preg_split('/[\s,!\?;:-]+|[\.]\s+/', $string, -1, PREG_SPLIT_NO_EMPTY);
You'd then have to handle the case of a "quoted" word, which is not easy to do in a regular expression (is 'is" "this' quoted, and how?).
So I think it's best to keep ' and " within words (so that "it's" is a single word, and "they 'll" is two words) and then deal with those cases separately. For example, a regexp would have some trouble correctly handling
they 're 'just friends'. Or that's what they say.
In that sentence, "'re" has to be recognized as a known contraction suffix ('s, 're, 'll, 'd ...), while 'just friends' is a sequence of words in which the first is left-quoted and the last is right-quoted. Telling the two apart may be handled at application level.
This is not a PHP problem, but a logical one.
Words can be joined by a -. Abbreviations can look like short sentences.
You can match your example directly by creating a solution that fits only this particular phrase, but you can't get a solution for all possible phrases. That would require neural-network-based content recognition.

Regular expression doesn't quite work

I have created a regular expression (using PHP), shown below, which must match ALL terms within the given string that contain only a-z0-9, ., _ and -.
My expression is: '~(?:\(|\s{0,},\s{0,})([a-z0-9._-]+)(?:\s{0,},\s{0,}|\))$~i'.
My target string is: ('word', word.2, a_word, another-word).
Expected terms in the results are: word.2, a_word, another-word.
I am currently getting: another-word.
My Goal
I am detecting a MySQL function from my target string, this works fine. I then want all of the fields from within that target string. It's for my own ORM.
I suppose there could be a situation whereby further parentheses are included inside this expression.
From what I can tell, you have a list of comma-separated terms and wish to find only the ones which satisfy [a-z0-9._\-]+. If so, this should be correct (it returns the correct results for your example at least):
'~(?<=[,(])\\s*([a-z0-9._-]+)\\s*(?=[,)])~i'
The main issues were:
The $ at the end, which anchored the match to the end of the string.
When matching all, each search continues from the end of the previous match. This means that if one match consumes a comma or closing parenthesis at its end, that character is no longer available at the beginning of the next match. I've solved this with a lookbehind ((?<=...)) and a lookahead ((?=...)), which check the surrounding characters without consuming them.
The backslashes are doubled for safety. In a single-quoted PHP string \s is passed through unchanged, so a single backslash also works here, but doubling becomes necessary when a backslash precedes a quote or another backslash.
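The lookaround pattern can be checked against the target string from the question with a short sketch (variable names are illustrative):

```php
<?php
$target = "('word', word.2, a_word, another-word)";

// The lookbehind and lookahead test for "," "(" or ")" without consuming them,
// so the comma that ends one term is still there to start the next.
$pattern = '~(?<=[,(])\\s*([a-z0-9._-]+)\\s*(?=[,)])~i';

preg_match_all($pattern, $target, $matches);

print_r($matches[1]); // "word.2", "a_word", "another-word"
```

The quoted 'word' is skipped because the quote character is not in the allowed class.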
EDIT: Since you said in a comment that some of the terms may be strings that contain commas you will first want to run your input through this:
$input = preg_replace('~(\'([^\']+|(?<=\\\\)\')+\'|"([^"]+|(?<=\\\\)")+")~', '"STRING"', $input);
which should replace all strings with '"STRING"', which will work fine for matching the other regex.
Maybe using a regex is overkill. With this kind of text you can just remove the parentheses and explode the string by comma.
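A regex-free sketch of that idea might look like this. The strspn() filter that keeps only terms made of the allowed characters is an addition for this example, and the approach assumes there are no commas inside quoted strings and no nested parentheses.

```php
<?php
$target = "('word', word.2, a_word, another-word)";

// Strip the surrounding parentheses, split on commas, trim whitespace.
$terms = array_map('trim', explode(',', trim($target, '()')));

// Keep only terms that consist entirely of a-z, 0-9, ".", "_" and "-".
$allowed = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._-';
$fields = array_values(array_filter($terms, function ($term) use ($allowed) {
    return $term !== '' && strspn($term, $allowed) === strlen($term);
}));

print_r($fields); // "word.2", "a_word", "another-word"
```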

Replacing a string using preg_match

I'm having trouble using preg_match to find and replace a string. The string of interest is:
<span style="font-size:0.6em">EXPIRATION DATE: 04/30/2011</span>
I need to target and replace the date, "04/30/2011", with a different date. Can someone throw me a bone and give me the regular expression to match this pattern using preg_match in PHP? I also need it to match in such a way that it only replaces up to the first closing span and not closing span tags later in the code, e.g.:
<span style="font-size:0.6em">EXPIRATION DATE: 04/30/2011</span><span class="hello"></span>
I'm not versed in regex, and although I've spent the last hour trying to learn enough to make this work, I'm utterly failing. Thanks so much!
EDIT: As you can see this has gotten me exhausted. I did mean preg_replace, not preg_match.
If you're after a replacement, consider using preg_replace(), something like
preg_replace('#(\d{2})/(\d{2})/(\d{4})#', '<new date>', $string);
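Building on that, a sketch that replaces only the date following the "EXPIRATION DATE: " label, and only its first occurrence, could look like this. The new date value is made up, and anchoring on the label text is an assumption about what the page contains.

```php
<?php
$html = '<span style="font-size:0.6em">EXPIRATION DATE: 04/30/2011</span><span class="hello"></span>';
$newDate = '05/31/2012'; // hypothetical replacement value

$result = preg_replace(
    '#(EXPIRATION DATE: )\d{2}/\d{2}/\d{4}#',
    '${1}' . $newDate, // ${1} keeps the label; the braces stop "$1" from running into the digits
    $html,
    1                  // limit: replace the first occurrence only
);

echo $result, PHP_EOL;
// <span style="font-size:0.6em">EXPIRATION DATE: 05/31/2012</span><span class="hello"></span>
```

Because the pattern never touches the tags, the closing span elements later in the markup are left alone.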
How about this:
$toBeFoundPattern = '/([0-9][0-9])\/([0-9][0-9])\/([0-9][0-9][0-9][0-9])/';
$toBeReplacedPattern = '$2.$1.$3';
$inString = '<span style="font-size:0.6em">EXPIRATION DATE: 04/30/2011</span>';
// Will convert from US date format 04/30/2011 to european format 30.04.2011
echo preg_replace( $toBeFoundPattern, $toBeReplacedPattern, $inString );
and prints the span with the reformatted date, which a browser renders as
EXPIRATION DATE: 30.04.2011
Patterns always begin and end with the same so-called delimiter character. Often the character / is used.
$1 refers to the text matched by the first ([0-9][0-9]) group, $2 to the text matched by the second (...) group, and $3 to the four digits matched by the last (...) group.
[...] matches a single character, which is one of those listed inside the brackets. E.g. [a-z] matches any lower-case letter.
Because / is the delimiter here, to use a literal slash inside the pattern you need to escape it with \.
Update: Using {..} as pointed out below is shorthand for repeated patterns.
Regex should be:
(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d
If you want to only match one instance, this is OK. For multiple instances, use preg_match_all instead. Taken from http://www.regular-expressions.info/regexbuddy/datemmddyyyy.html.
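As a sketch of how that stricter pattern behaves in PHP, the example below wraps it in # delimiters (so the / inside needs no escaping) and runs preg_match_all() over invented markup containing one valid and one invalid date.

```php
<?php
// Month 01-12, day 01-31, year 1900-2099, separated by "-", " ", "/" or ".".
$pattern = '#(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d#';

$html = '<span>EXPIRATION DATE: 04/30/2011</span> <span>RENEWED: 13/45/2011</span>';

preg_match_all($pattern, $html, $matches);

print_r($matches[0]); // only "04/30/2011"; 13/45/2011 is rejected
```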
Edit: are you looking to just search and replace inside a PHP script or do you want to do some javascript live replacement?
