Search dynamic term twice in Regex


I know that in PHP I can refer to captured parts of the pattern in the replacement string:
preg_replace('/(test1)(test2)(test3)/',"$3$2$1",$string);
(Something like this; I don't know if it is exactly right, but it's not what I am looking for.)
I want to do the same inside the regex itself, like:
preg_match_all("~<(.*)>.*</$1>~",$string,$matches);
The part between the "<" and ">" is dynamic (so that every existing HTML tag, and even custom XML tags, can be found), and I want to refer to it again within the same regex.
But it doesn't work for me. Is this even possible?
I have a server with PHP 5.3
Edit:
My final goal is this:
I have an HTML page with, for example, the following source code:
<html>
<head>
<title>Titel</title>
</head>
<body>
<div>
<p>
p-test<br />
br-test
</p>
<div>
<p>
div-p-test
</p>
</div>
</div>
</body>
</html>
And after processing it should look like this:
$htmlArr = array(
'html' => array(
'head' => array('title' => 'Titel'),
'body' => array(
'div0' => array(
'p0' => 'p-test<br />br-test',
'div1' => array(
'p1' => 'div-p-test'
)
)
)
));

Placeholders in the replacement string use the $1 syntax. Inside the regex itself they are called backreferences and use the syntax \1: a backslash followed by the group number.
http://www.regular-expressions.info/brackets.html
So in your case:
preg_match_all("~<(.*?)>.*?</\\1>~",$string,$matches);
The backslash is doubled here because a backslash escapes itself in PHP strings. This matters in particular for double-quoted strings, where "\1" would otherwise be read as an octal escape and turned into the character with code 1.
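A minimal sketch of the backreference in use. The sample string and variable names are made up for illustration; the pattern is the one from the answer, written in a single-quoted string so that a single backslash is enough.

```php
<?php
// Hypothetical sample input, not taken from the question.
$string = '<b>bold</b> and <i>italic</i>';

// \1 must match the same text that group 1 captured (the tag name).
preg_match_all('~<(.*?)>.*?</\1>~', $string, $matches);

print_r($matches[0]); // the complete elements
print_r($matches[1]); // the tag names captured by group 1
```

Note that this pairs an opening tag with the next closing tag of the same name, so nested elements of the same type (like the nested <div>s in the question) are not matched as a whole. For building a nested array from a full page, an HTML parser such as DOMDocument is probably a better fit.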

Related

Is there a way in PHP array to show line break? [duplicate]

I have an index.php file with an array that holds messages. Is there a way to display the text with a real line break instead of the <br> tag in PHP, so that I can also store it in a database?
The code:
$array = array(
array(
'id' => 1,
'message' => 'I\'m reading Harry Potter!',
),
array(
'id' => 2,
'message' => 'Ok. I just got a notification that you sent me a pin on Pinterest.<br>Will you come to school tomorrow?',
)
);
For example:
Ok. I just got a notification that you sent me a pin on Pinterest.
Will you come to school tomorrow?

The new line character is \n. Simply replace <br> with \n and you will have the results you're looking for.
PHP - how to create a newline character?
Note!
PHP does not process escape sequences such as \n within single quotes.
'\n' is not processed as a new line character, while "\n" is.
What is the difference between single-quoted and double-quoted strings in PHP?
Depending on your platform, you may want to be more specific about which new line sequence you choose.
\r\n, \r and \n what is the difference between them?
$array = array(
array(
'id' => 1,
'message' => "I'm reading Harry Potter!"
),
array(
'id' => 2,
'message' => "Ok. I just got a notification that you sent me a pin on Pinterest.\nWill you come to school tomorrow?"
)
);
echo "<pre>";
print_r($array);
exit;
This will show you the raw formatting of your string.
To then convert new lines to the <br> tag for display on a webpage, you would pass that string to nl2br()
<?php echo nl2br($array[1]['message']); ?>
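The round trip described above can be sketched as follows. The message is a shortened version of the one in the question, and the variable names are only illustrative.

```php
<?php
// Shortened version of the message from the question, still containing <br>.
$stored = 'Pin sent.<br>Will you come to school tomorrow?';

// Keep a real newline in the data (double quotes are required for "\n").
$plain = str_replace('<br>', "\n", $stored);

// Convert newlines back to <br /> only when rendering HTML.
$html = nl2br($plain);

echo $plain, "\n";
echo $html, "\n";
```

Storing the plain newline and calling nl2br() only at output time keeps the stored data free of markup.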

Is there a way that instead of <br> tag I can display the text with a new line in PHP
Yes you can easily do this with CSS
white-space: pre;
https://www.w3schools.com/cssref/pr_text_white-space.asp
Back in the day I used to do the whole "replace" thing, then I got bored of it. Now I just use CSS.
The pre value preserves whitespace much like the <pre> tag does. The only thing you have to watch for is indentation in the source code:
<p style="white-space:pre;">
<?php echo $something; ?>
</p>
The line breaks and indentation around the PHP tag become part of the output and are rendered as well. Instead, do this:
<p style="white-space:pre;"><?php echo $something; ?></p>

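You can put a literal line break inside the quoted string; PHP keeps it as a newline. (Closing the PHP tag and reopening it after a <br> does not work here: inside a string literal, ?> is just part of the text.)
For example:
'message' => 'Ok. I just got a notification that you sent me a pin on Pinterest.
Will you come to school tomorrow?',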

Find all occurrences of $ through the last capital letter

I'm having an issue trying to find all the variables in an HTML file.
The HTML file contains an email template. When the email itself is sent out, variables like "$EMAIL_FIRST_NAME" are converted into things like "John", because the email function is given the information needed to replace all occurrences of "$EMAIL_FIRST_NAME" with "John". The issue I'm having is in helping people create these email templates: I want to give them the ability to insert test data.
They can take their email template (edited in a textarea), and jQuery loads it into a new window for a preview. I already have it replacing some "stock" fields, but it would be nice if they could add their own test data.
Since each template serves a different purpose, it would be nice to have the correct fields appear. I'm looking for a way to use PHP to scan the HTML template and collect the variables into an array (to be used to create input boxes).
Some snippets for examples:
<meta http-equiv="Content-Type" content="text/html; charset=$CHARSET">
$INTRO_ORDER_NUM_TITLE $INTRO_ORDER_NUMBER
$INTRO_DATE_TITLE $INTRO_DATE_ORDERED
$WEBSITE_ADDRESSindex.php?main_page=contact_us
I'm looking for a way to give an array like this:
array('$CHARSET','$INTRO_ORDER_NUM_TITLE','$INTRO_ORDER_NUMBER','$INTRO_DATE_TITLE','$INTRO_DATE_ORDERED','$WEBSITE_ADDRESS')
For some of them I could just explode on spaces and then pick the pieces that start with $. Others, namely $WEBSITE_ADDRESS, are more challenging, because the text that follows is not part of the variable.
All the variables are supposed to start with $ and be all capital letters.
I'm looking for a way to find the substrings that start with $ and run through the last capital letter.
Ideas?

You could use an expression such as \$[A-Z_]+ to look for strings that start with a dollar sign ($) followed by one or more upper-case letters and underscores.
As pointed out by @Jan, you can use preg_replace_callback() to have your code run some logic when replacing.

In addition to npinti's regex, here's an example with a callback function. I would change the $ sign to something less troublesome in PHP (e.g. _ before and after):
<?php
$tmpl = '<meta http-equiv="Content-Type" content="text/html; charset=_CHARSET_">
_INTRO_ORDER_NUM_TITLE_ _INTRO_ORDER_NUMBER_
_INTRO_DATE_TITLE_ _INTRO_DATE_ORDERED_
_WEBSITE_ADDRESS_index.php?main_page=contact_us';
$allowed = array('_CHARSET_','_INTRO_ORDER_NUM_TITLE_','_INTRO_ORDER_NUMBER_','_INTRO_DATE_TITLE_','_INTRO_DATE_ORDERED_','_WEBSITE_ADDRESS_');
$replacements = array("_CHARSET_" => "some stupid charset");
$regex = '~(?<variable>_[A-Z_]+)~';
$tmpl = preg_replace_callback(
$regex,
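function ($match) use ($allowed, $replacements) {
$m = $match["variable"];
// replace only whitelisted placeholders that have test data
if (in_array($m, $allowed, true) && isset($replacements[$m])) {
return $replacements[$m];
// or anything else
}
// leave every other match untouched instead of deleting it
return $m;
},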
$tmpl
);
echo $tmpl;
// now you have a stupid charset ...
?>

Try this one:-
$str = '<meta http-equiv="Content-Type" content="text/html; charset=$CHARSET">
$INTRO_ORDER_NUM_TITLE $INTRO_ORDER_NUMBER
$INTRO_DATE_TITLE $INTRO_DATE_ORDERED
$WEBSITE_ADDRESSindex.php?main_page=contact_us';
$input = preg_match_all('/\$[A-Z_]+/', $str, $match);
$result = array_unique($match[0]);
echo '<pre>'; print_r($result);
Output:-
Array
(
[0] => $CHARSET
[1] => $INTRO_ORDER_NUM_TITLE
[2] => $INTRO_ORDER_NUMBER
[3] => $INTRO_DATE_TITLE
[4] => $INTRO_DATE_ORDERED
[5] => $WEBSITE_ADDRESS
)

Regex Everything Including Whitespace

I need a regular expression in PHP that matches any characters, including whitespace, without capturing them into a variable.
<li class="xyz" data-name="abc">
<span id="XXX">some words</span>
<div data-attribute="values">
<a class="klm" href="http://example.com/blabla">somethings</a>
</div>
<div class="xyz sub" data-name="abc-sub"><img src="/images/any_image.jpg" class="qqwwee"></div>
</li><!--repeating li tags-->
I wrote this pattern:
preg_match_all('#<li((?s).*?)<div((?s).*?)href="((?s).*?)"((?s).*?)</li>#', $subject, $matches);
This works well but I don't want to get four variables. I just want to get
http://example.com/blabla
Can anyone tell me why this version does not work?
preg_match_all('#<li[[?s].*?]<div[[?s].*?]href="((?s).*?)"[[?s].*?]</li>#', $subject, $matches);

Square brackets define a character class, not a group, which is why your second pattern fails. Using (?:) allows grouping without capturing those groups. For example, the following:
#<li(?:(?s).*?)<div(?:(?s).*?)href="((?s).*?)"(?:(?s).*?)</li>#
Will output:
array (
0 =>
array (
0 => '<li class="xyz" data-name="abc">
<span id="XXX">some words</span>
<div data-attribute="values">
<a class="klm" href="http://example.com/blabla">somethings</a>
</div>
<div class="xyz sub" data-name="abc-sub"><img src="/images/any_image.jpg" class="qqwwee"></div>
</li>',
),
1 =>
array (
0 => 'http://example.com/blabla',
),
)
All of your matches will be contained in $matches[1], so iterate through that.
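A minimal sketch of iterating over $matches[1]. The markup is an abbreviated, made-up version of the repeating <li> blocks, and the s modifier at the end of the pattern is used instead of repeating (?s) in every group; both are assumptions for the sake of a short example.

```php
<?php
// Abbreviated, hypothetical markup with two repeating <li> blocks.
$subject = '<li class="xyz"><div>' . "\n"
    . '<a href="http://example.com/one">one</a></div></li>' . "\n"
    . '<li class="xyz"><div><a href="http://example.com/two">two</a></div></li>';

// (?:...) groups without capturing; the trailing s lets . match newlines.
$pattern = '#<li(?:.*?)<div(?:.*?)href="(.*?)"(?:.*?)</li>#s';
preg_match_all($pattern, $subject, $matches);

// Index 0 holds the full matches, index 1 the only captured group.
foreach ($matches[1] as $url) {
    echo $url, "\n";
}
```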

Don't use RegExps to parse HTML
Read this famous answer on Stack Overflow.
HTML is not a regular language, so it cannot be reliably processed with a RegExp. Instead, use a proper (and robust) HTML parser.
Also note that data mining (analysis) != web-scraping (data collection).
If you don't want a regexp group to store the "captured" data, use a non-capturing group:
(?:some-complex-regexp-here)
In your case, the following may work:
(?s)<li.*?<div.*?href="([^"]*?)".*?</li>
But seriously, don't use regexps for this; regexps are fragile. Use an XPath expression like //li//div//a/@href instead.
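A minimal sketch of the parser-based approach using PHP's bundled DOM extension (DOMDocument and DOMXPath). The markup is a shortened version of the snippet in the question, wrapped in a <ul> for illustration, and the XPath expression //li//div//a/@href is one reasonable way to select the link, not the only one.

```php
<?php
// Shortened version of the markup from the question (the <ul> wrapper is an assumption).
$html = '<ul><li class="xyz" data-name="abc"><span id="XXX">some words</span>'
    . '<div data-attribute="values">'
    . '<a class="klm" href="http://example.com/blabla">somethings</a>'
    . '</div></li></ul>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Select the href attribute of every link inside a div inside an li.
$xpath = new DOMXPath($doc);
$hrefs = array();
foreach ($xpath->query('//li//div//a/@href') as $attr) {
    $hrefs[] = $attr->nodeValue;
}

print_r($hrefs);
```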

PHP: preg_match_all for non-XHTML attributes

I want to improve my code, but I do not know how to write the regexp.
I want to get all non-XHTML attributes (i.e. attributes written without a value) in a tag.
So after the preg_match_all call I want to get:
array(
0 => "required",
1 => "autocomplete"
);
$balise = '<input id="myId" class="myClassA myClassB myClassC" required autocomplete/>';
I currently use preg_match_all("/(?<=\s)[\w]+(?=[\s\/>])/i", $balise, $attributs);
But with this regexp I get:
array(
0 => "myClassB",
1 => "required",
2 => "autocomplete"
);
I do not want to get myClassB.
Can anyone help me write my regex?
Thanks

You can add the negative look-ahead (?![^=]*?") to make sure the next " doesn't precede the next =, that way you're getting only words that aren't within a quoted value. Single-quote the string so that the " in the regex won't terminate it.
preg_match_all('/(?<=\s)[\w]+(?=[\s\/>])(?![^=]*?")/i', $balise, $attributs);
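For illustration, here is the pattern applied to the tag from the question, assuming the tag is stored as a single-quoted string in $balise:

```php
<?php
// The tag from the question, as a single-quoted PHP string.
$balise = '<input id="myId" class="myClassA myClassB myClassC" required autocomplete/>';

// The negative look-ahead rejects words that are followed by a closing
// quote before the next "=", i.e. words inside a quoted attribute value.
preg_match_all('/(?<=\s)[\w]+(?=[\s\/>])(?![^=]*?")/i', $balise, $attributs);

print_r($attributs[0]);
```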

How to str_replace a section of PHP Code

$embedCode = <<<EOF
getApplicationContent('video','player',array('id' => $iFileId, 'user' => $this->iViewer, 'password' => clear_xss($_COOKIE['memberPassword'])),true)
EOF;
$name = str_replace($embedCode,"test",$content);
I'm trying to replace a section of code with another piece of code. I can do it with smaller strings, but once I put the larger string into $embedCode, it throws an "unexpected T_ENCAPSED_AND_WHITESPACE" error.

You should escape the $ using \$:
$embedCode = <<<EOF
getApplicationContent('video','player',array('id' => \$iFileId, 'user' => \$this->iViewer, 'password' => clear_xss(\$_COOKIE['memberPassword'])),true)
EOF;
That applies if your objective is to keep the variable names as literal text. If you want the actual values of the variables interpolated instead, the problem is the quoted key in $_COOKIE['memberPassword'].

Remove the ' around memberPassword in $_COOKIE.
Anyway, it seems you're looking for a language construct that does not interpret the variables inside it. In that case, don't use HEREDOC syntax; use a regular string delimited with ':
$sample = 'qwe $asd zxc';
or escape $ with \ as Marcx proposes.
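If the code to search for contains single quotes itself, as it does here, a nowdoc (available since PHP 5.3) may be more convenient than a single-quoted string, because nothing inside it is interpolated or needs escaping. A minimal sketch; the $content value is made up for the example:

```php
<?php
// Nowdoc: the quoted identifier 'EOF' turns off all variable interpolation.
$embedCode = <<<'EOF'
getApplicationContent('video','player',array('id' => $iFileId, 'user' => $this->iViewer, 'password' => clear_xss($_COOKIE['memberPassword'])),true)
EOF;

// Hypothetical surrounding source text that contains the snippet.
$content = 'before ' . $embedCode . ' after';

$name = str_replace($embedCode, 'test', $content);
echo $name, "\n";
```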
