Find string patterns enclosed in specific characters (PHP)

I have an HTML template (as a single string) that contains a varying number of keys enclosed in ### characters. For instance, these keys could be ###textItem1###, ###textItem2### and so on.
Now, how do I find all keys that are enclosed in ### in that HTML template/string? I want to read the keys, save them in an array and then loop through the array in order to replace each key with the corresponding text item (which is stored under the same key in another array).
I'm working with PHP.
Thanks!

You can use regular expressions with PHP's preg_match_all function:
$pattern = '/###(.+?)###/';
$string = 'This is a text with ###textItem1### and ###textItem2### in it. It also has ###textItem3### and ###textItem4### as well';
preg_match_all($pattern, $string, $matches);
print_r($matches[1]);
PHPFiddle Link: http://phpfiddle.org/main/code/psad-tq9r
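To cover the replacement step from the question as well, here is a minimal sketch that collects the keys and then substitutes them in one pass with preg_replace_callback. The template, the $texts lookup array and the choice to leave unknown keys untouched are assumptions made for illustration, not part of the original answer.

```php
<?php
// Hypothetical template and lookup table, for illustration only.
$template = '<h1>###title###</h1><p>###body###</p><p>###missing###</p>';
$texts = ['title' => 'Hello', 'body' => 'Welcome!'];

// Collect every key found between ### markers.
preg_match_all('/###(.+?)###/', $template, $matches);
$keys = $matches[1];

// Replace each placeholder in one pass; unknown keys are left untouched.
$result = preg_replace_callback(
    '/###(.+?)###/',
    function (array $m) use ($texts) {
        return array_key_exists($m[1], $texts) ? $texts[$m[1]] : $m[0];
    },
    $template
);

print_r($keys);
echo $result, PHP_EOL;
```

Using a callback avoids a separate loop over the keys and makes it easy to decide what happens when a key has no text item.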

This works too.
$string = 'hello, this is [#firstname], i am [#age] years old';
preg_match_all('~\[#(.+?)\]~', $string, $matches);
var_dump( $matches );

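You can create a custom function like this:
function getdatabetween($string, $start, $end){
// Position just after the opening delimiter
$sp = strpos($string, $start);
if ($sp === false) return '';
$sp += strlen($start);
// Search for the closing delimiter only after the opening one
$ep = strpos($string, $end, $sp);
if ($ep === false) return '';
return trim(substr($string, $sp, $ep - $sp));
}
echo getdatabetween(" ###textItem1###","###", "###"); // textItem1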

You can do this with preg_match_all
for example this is your template code
<?php
$string = '
<html>
<head>
<title>###title###</title>
</head>
<body>
###content###
</body>
</html>
';
and this is the data that you want to replace
$data = array("title" => 'hello world', 'content' => 'Page content here ....');
you can replace it like this
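function getTemplate($string, $data){
// Capture the key between the ### markers
preg_match_all('/###([a-z0-9_-]+)###/i', $string, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
// Leave placeholders without a matching entry in $data untouched
if (array_key_exists($match[1], $data)) {
$string = str_replace($match[0], $data[$match[1]], $string);
}
}
return $string;
}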
echo getTemplate($string, $data);
output
<html>
<head>
<title>hello world</title>
</head>
<body>
Page content here ....
</body>
</html>
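If the set of keys is known in advance, the same substitution can probably be done without a regular expression. The following is only a sketch under that assumption: it builds a map from '###key###' to its value and hands it to strtr. The shortened template and the variable names are made up for the example.

```php
<?php
// Hypothetical template and data, shortened for the example.
$template = "<title>###title###</title>\n<body>###content###</body>";
$data = ['title' => 'hello world', 'content' => 'Page content here ....'];

// Build a map such as '###title###' => 'hello world'.
$map = [];
foreach ($data as $key => $value) {
    $map['###' . $key . '###'] = $value;
}

// strtr() substitutes all map keys in a single pass and does not
// rescan text that it has already replaced.
$output = strtr($template, $map);
echo $output, PHP_EOL;
```

Placeholders that have no entry in the map simply stay in the output, which makes missing data easy to spot.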

Related

How to get value inside <a tag using preg match all?

I have HTML content from which I need to extract values inside hyperlink tags using preg_match_all. I tried the following, but I don't get any data. I have included sample input data. Could you help me fix this code and print all the values that follow play.asp?ID= (for example, I want to get the value 12345 from play.asp?ID=12345)?
sample input html data:
<span id="Img_1"></span></TD>
and the code
$regexp = "<A\s[^>]*HREF=\"play.asp(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/A>";
if(preg_match_all("/$regexp/siU", $input, $matches))
{
$url=str_replace('?ID=', '', $matches[2]);
$url2=str_replace('&Selected_ID=&PhaseID=123', '', $url);
print_r($url2);
}
$str = '<span id="Img_1"></span>';
preg_match_all( '/<\s*A[^>]HREF="(.*?)"\s?(.*?)>/i', $str, $match);
print_r( $match );
Try out this.
Don't! Regular expressions are a (bad) way of text processing. This is not text, but HTML sourcecode. The tools to cope with it are called HTML parsers. Although PHP's DOMDocument is also able to loadHTML, it may glitch on some rare cases. A poorly built regexp (and you are wrong to think there's any other) will glitch on almost any changes in the page.
Isn't this enough?
/<a href="(.*?)?"/I
EDIT:
This seems to work:
'/<a href="(.*?)\?/i'
this should achieve the desired result. it's a combination of an HTML parser and a contents extraction function:
function extractContents($string, $start, $end)
{
$pos = stripos($string, $start);
$str = substr($string, $pos);
$str_two = substr($str, strlen($start));
$second_pos = stripos($str_two, $end);
$str_three = substr($str_two, 0, $second_pos);
$extractedContents = trim($str_three);
return $extractedContents;
}
include('simple_html_dom.php');
$html = file_get_html('http://siteyouwantlinksfrom.com');
$links = $html->find('a');
foreach($links as $link)
{
$playIDs[] = extractContents($link->href, 'play.asp?ID=', '&');
}
print_r($playIDs);
you can download simple_html_dom.php from here
You shouldn't use Regular Expression to parse HTML.
This is a solution with DOMDocument :
<?php
$input = '<span id="Img_1"></span>';
// Escape bare "&" characters in the href as "&amp;"
$cleanInput = str_replace('&','&amp;',$input);
// Load HTML
$domDocument = new DOMDocument();
$domDocument->loadHTML($cleanInput);
// Retrieve <a /> tags
$aTags = $domDocument->getElementsByTagName('a');
foreach($aTags as $aTag)
{
$href = $aTag->getAttribute('href');
$url = parse_url($href);
$vars = array();
parse_str($url['query'], $vars);
var_dump($vars);
}
?>
Output :
array (size=3)
'ID' => string '12345' (length=5)
'Selected_ID' => string '' (length=0)
'PhaseID' => string '123' (length=3)
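For reference, here is a self-contained sketch of the same DOMDocument approach that collects only the ID values. The sample markup is an assumption modelled on the URL described in the question (the link text "Play" is invented), and the libxml error handling is one possible choice rather than a requirement.

```php
<?php
// Assumed sample markup; the href is modelled on the URL in the question.
$input = '<span id="Img_1"><a href="play.asp?ID=12345&amp;Selected_ID=&amp;PhaseID=123">Play</a></span>';

$doc = new DOMDocument();
// Collect libxml warnings about imperfect markup instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($input);
libxml_clear_errors();

$ids = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    // Extract only the query string part of the href.
    $query = parse_url($a->getAttribute('href'), PHP_URL_QUERY);
    if (!is_string($query)) {
        continue; // no query string on this link
    }
    parse_str($query, $vars);
    if (isset($vars['ID'])) {
        $ids[] = $vars['ID'];
    }
}
print_r($ids);
```

Because the parser decodes the attribute, no string surgery on '?ID=' or '&Selected_ID=' is needed.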

preg_replace string with array

Okay, so my question is pretty simple. I hope the answer is too.
Let's say I have the following php string:
<!DOCTYPE html>
<html>
<head>
<title>test file</title>
</head>
<body>
<div id="dynamicContent">
<myTag>PART_ONE</myTag>
<myTag>PART_TWO </myTag>
<myTag> PART_THREE</myTag>
<myTag> PART_FOUR </myTag>
</div>
</body>
</html>
Let's say this is $content.
Now, you can see I have 4 custom tags (myTag) with one word content. (PART_ONE, PART_TWO, etc.)
I want to replace those 4 with 4 different strings. Those latter 4 strings are in an array:
$replace = array("PartOne", "PartTwo", "PartThree", "PartFour");
I tried this, but it doesn't work:
$content = preg_replace("/<myTag>(.*?)<\/myTag>/s", $replace, $content);
So, I want to search for myTags (it finds 4) and replace it with one entry of the array. The first occurrence should be replaced by $replace[0], the second by $replace[1], etc.
Then, it will return the "new" content as a string (not as an array) so I can use it for further parsing.
How should I realize this?
Something like the following should work:
$replace = array("PartOne", "PartTwo", "PartThree", "PartFour");
if (preg_match_all("/(<myTag>)(.*?)(<\/myTag>)/s", $content, $matches)) {
for ($i = 0; $i < count($matches[0]); $i++) {
$content = str_replace($matches[0][$i], $matches[1][$i] . $replace[$i] . $matches[3][$i], $content);
}
}
One approach is to loop over the replacement array and replace one tag per iteration, renaming myTag to myDoneTag (or similar) for each tag you have finished so that the next iteration finds the next one. At the end you rename myDoneTag back to myTag, and you have your string:
for ($i = 0; $i < count($replace); $i++) {
// Non-greedy .*? stops at the first closing tag; the limit of 1 replaces one tag per pass
$content = preg_replace("/<myTag>.*?<\/myTag>/s", "<myDoneTag>".$replace[$i]."</myDoneTag>", $content, 1);
}
$content = preg_replace("/myDoneTag/s", "myTag", $content);
With regexes, you could do something like this:
$replaces = array('foo','bar','foz','bax');
$callback = function($match) use ($replaces) {
static $counter = 0;
$return = $replaces[$counter % count($replaces)];
$counter++;
return $return;
};
var_dump(preg_replace_callback('/a/',$callback, 'a a a a a '));
But really, when searching for tags in html or xml, you want a parser:
$html = '<!DOCTYPE html>
<html>
<head>
<title>test file</title>
</head>
<body>
<div id="dynamicContent">
<myTag>PART_ONE</myTag>
<myTag>PART_TWO </myTag>
<myTag> PART_THREE</myTag>
<myTag> PART_FOUR </myTag>
</div>
</body>
</html>';
$d = new DOMDocument();
$d->loadHTML($html);
$counter = 0;
foreach($d->getElementsByTagName('mytag') as $node){
$node->nodeValue = $replaces[$counter++ % count($replaces)];
}
echo $d->saveHTML();
This should be the syntax you're looking for:
$patterns = array('/PART_ONE/', '/PART_TWO/', '/PART_THREE/', '/PART_FOUR/');
$replaces = array('part one', 'part two', 'part three', 'part four');
$text = preg_replace($patterns, $replaces, $text);
But be warned: the patterns are applied sequentially, so if the replacement text for 'PART_ONE' contains the text 'PART_TWO', that text will subsequently be replaced as well.
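A small sketch of that caveat, with made-up replacement texts: preg_replace with arrays feeds the result of each pattern into the next one, whereas strtr with an array works in a single pass. The strtr comparison is an addition for illustration and is not part of the answer above.

```php
<?php
$patterns = ['/PART_ONE/', '/PART_TWO/'];
$replaces = ['see PART_TWO', 'two'];   // hypothetical replacement texts

// preg_replace() applies each pattern in turn to the result of the previous
// one, so the PART_TWO introduced by the first replacement is replaced again.
$sequential = preg_replace($patterns, $replaces, 'PART_ONE');

// strtr() with an array replaces in a single pass and never rescans its output.
$singlePass = strtr('PART_ONE', ['PART_ONE' => 'see PART_TWO', 'PART_TWO' => 'two']);

var_dump($sequential, $singlePass);
```

If the replacement texts can contain other placeholders, the single-pass variant is usually the safer choice.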

Use a regex match as an array pointer

I want to replace some numbers in a string with the content of the array in the position which this number points to.
For example, replace "Hello 1 you are great", with "Hello myarray[1] you are great"
I was doing the next: preg_replace('/(\d+)/','VALUE: ' . $array[$1],$string);
But it does not work. How could I do it?
You should use a callback.
<?php
$str = 'Hello, 1!';
$replacements = array(
1 => 'world'
);
$str = preg_replace_callback('/(\d+)/', function($matches) use($replacements) {
if (array_key_exists($matches[0], $replacements)) {
return $replacements[$matches[0]];
} else {
return $matches[0];
}
}, $str);
var_dump($str); // 'Hello, world!'
Since you are using a callback, in the event that you actually want to use a number, you might want to encode your strings as {1} or something instead of 1. You can use a modified match pattern:
<?php
// added braces to match
$str = 'Hello, {1}!';
$replacements = array(
1 => 'world'
);
// added braces to regex
$str = preg_replace_callback('/\{(\d+)\}/', function($matches) use($replacements) {
if (array_key_exists($matches[1], $replacements)) {
return $replacements[$matches[1]];
} else {
// leave string as-is, with braces
return $matches[0];
}
}, $str);
var_dump($str); // 'Hello, world!'
However, if you are always matching known strings, you may want to use @ChrisCooney's solution because it offers less opportunity to screw up the logic.
The other answer is perfectly fine. I managed it this way:
$val = "Chris is 0";
// Initialise with index.
$adj = array("Fun", "Awesome", "Stupid");
// Create array of replacements.
$pattern = '!\d+!';
// Create regular expression.
preg_match($pattern, $val, $matches);
// Get matches with the regular expression.
echo preg_replace($pattern, $adj[$matches[0]], $val);
// Replace number with first match found.
Just offering another solution to the problem :)
$string = "Hello 1 you are great";
$replacements = array(1 => 'I think');
preg_match('/\s(\d)\s/', $string, $matches);
foreach($matches as $key => $match) {
// skip full pattern match
if(!$key) {
continue;
}
$string = str_replace($match, $replacements[$match], $string);
}
<?php
$array = array( 2 => '**', 3 => '***');
$string = 'lets test for number 2 and see 3 the result';
echo preg_replace_callback('/(\d+)/', 'replaceNumber', $string);
function replaceNumber($matches){
global $array;
return $array[$matches[0]];
}
?>
output
lets test for number ** and see *** the result

PHP extract text from string - trim?

I have the following XML:
<id>tag:search.twitter.com,2005:22204349686</id>
How can I write everything after the second colon to a variable?
E.g. 22204349686
if(preg_match('#<id>.*?:.*?:(.*?)</id>#',$input,$m)) {
$num = $m[1];
}
When you already have just the tags content in a variable $str, you could use explode to get everything from the second : on:
list(,,$rest) = explode(':', $str, 3);
$var = preg_replace('/^([^:]+:){2}/', '', 'tag:search.twitter.com,2005:22204349686');
I am assuming you already have the string without the <id> bits.
Otherwise, for SimpleXML:
$var = preg_replace('/^([^:]+:){2}/', '', "{$yourXml->id}");
First, parse the XML with an XML parser. Find the text content of the node in question (tag:search.twitter.com,2005:22204349686). Then, write a relevant regex, e.g.
<?php
$str = 'tag:search.twitter.com,2005:22204349686';
preg_match('#^([^:]+):([^,]+),([0-9]+):([0-9]+)#', $str, $matches);
var_dump($matches);
I suppose you have in a variable ($str) the content of id tag.
// get last occurrence of colon
$pos = strrpos($str, ":");
if ($pos !== false) {
// get substring of $str from just after the colon to the end of $str
$result = substr($str, $pos + 1);
} else {
$result = null;
}
Regex seems to me inappropriate for such a simple matching.
If you don't have the ID tags around the string, you can simply do
echo trim(strrchr($xml, ':'), ':');
If they are around, you can use
$xml = '<id>tag:search.twitter.com,2005:22204349686</id>';
echo filter_var(strrchr($xml, ':'), FILTER_SANITIZE_NUMBER_INT);
// 22204349686
The strrchr part returns :22204349686</id> and the filter_var part strips everything that's not a number.
Use explode and strip_tags:
list(,,$id) = explode( ':', strip_tags( $input ), 3 );
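Several answers here suggest parsing the XML first and splitting the text afterwards. A minimal sketch of that combination, assuming the SimpleXML extension is available and that the fragment consists of the single id element from the question:

```php
<?php
$xml = '<id>tag:search.twitter.com,2005:22204349686</id>';

// Parse the fragment; the root element here is <id> itself.
$node = simplexml_load_string($xml);
$text = (string) $node;

// Split into at most three pieces so any later colons stay in the last piece.
$parts = explode(':', $text, 3);
$id = isset($parts[2]) ? $parts[2] : null;

var_dump($id);
```

In a real feed the id element would sit inside a larger document, so the node would be reached through its parent rather than being the root.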
function between($t1,$t2,$page) {
$p1=stripos($page,$t1);
if($p1!==false) {
$p2=stripos($page,$t2,$p1+strlen($t1));
} else {
return false;
}
return substr($page,$p1+strlen($t1),$p2-$p1-strlen($t1));
}
$x='<id>tag:search.twitter.com,2005:22204349686</id>';
$text=between(',','<',$x);
if($text!==false) {
//got some text..
}

PHP/regex: How to get the string value of HTML tag?

I need help on regex or preg_match because I am not that experienced yet with regards to those so here is my problem.
I need to get the value "get me" but I think my function has an error.
The number of html tags are dynamic. It can contain many nested html tag like a bold tag. Also, the "get me" value is dynamic.
<?php
function getTextBetweenTags($string, $tagname) {
$pattern = "/<$tagname>(.*?)<\/$tagname>/";
preg_match($pattern, $string, $matches);
return $matches[1];
}
$str = '<textformat leading="2"><p align="left"><font size="10">get me</font></p></textformat>';
$txt = getTextBetweenTags($str, "font");
echo $txt;
?>
<?php
function getTextBetweenTags($string, $tagname) {
$pattern = "/<$tagname ?.*>(.*)<\/$tagname>/";
preg_match($pattern, $string, $matches);
return $matches[1];
}
$str = '<textformat leading="2"><p align="left"><font size="10">get me</font></p></textformat>';
$txt = getTextBetweenTags($str, "font");
echo $txt;
?>
That should do the trick
Try this
$str = '<option value="123">abc</option>
<option value="123">aabbcc</option>';
preg_match_all("#<option.*?>([^<]+)</option>#", $str, $foo);
print_r($foo[1]);
In your pattern, you simply want to match all text between the two tags. Thus, you could use for example a [\w\W] to match all characters.
function getTextBetweenTags($string, $tagname) {
$pattern = "/<$tagname>([\w\W]*?)<\/$tagname>/";
preg_match($pattern, $string, $matches);
return $matches[1];
}
Since attribute values may contain a plain > character, try this regular expression:
$pattern = '/<'.preg_quote($tagname, '/').'(?:[^"\'>]*|"[^"]*"|\'[^\']*\')*>(.*?)<\/'.preg_quote($tagname, '/').'>/s';
But regular expressions are not suitable for parsing non-regular languages like HTML. You would be better off using a parser such as SimpleXML or DOMDocument.
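As a rough sketch of the parser route with DOMDocument: the sample string is the one from the question, varied slightly to include a nested bold tag, since the question mentions that nested tags can occur. The error handling shown is an assumption about how one might silence warnings for the non-HTML textformat tag.

```php
<?php
$str = '<textformat leading="2"><p align="left"><font size="10">get <b>me</b></font></p></textformat>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // <textformat> is not a known HTML tag
$doc->loadHTML($str);
libxml_clear_errors();

$texts = [];
foreach ($doc->getElementsByTagName('font') as $font) {
    // textContent concatenates the text of all nested nodes.
    $texts[] = $font->textContent;
}
print_r($texts);
```

Because textContent flattens nested elements, the result does not depend on how many tags wrap the text.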
this might be old but my answer might help someone
You can simply use
$str = '<textformat leading="2"><p align="left"><font size="10">get me</font></p></textformat>';
echo strip_tags($str);
https://www.php.net/manual/en/function.strip-tags.php
$userinput = "http://www.example.vn/";
//$url = urlencode($userinput);
$input = @file_get_contents($userinput) or die("Could not access file: $userinput");
$regexp = "<tagname\s[^>]*>(.*)<\/tagname>";
//==Example:
//$regexp = "<div\s[^>]*>(.*)<\/div>";
if(preg_match_all("/$regexp/siU", $input, $matches, PREG_SET_ORDER)) {
foreach($matches as $match) {
// $match[0] = the whole element, including its tags
// $match[1] = the content between the tags
}
}
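Try $pattern = '#<(' . preg_quote($tagname, '#') . ')\b.*?>(.*?)</\1>#s' and return $matches[2]. Single quotes keep \b and \1 intact; inside double quotes PHP would read \1 as an octal character escape.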
The following PHP snippet returns the text between two tag-like markers.
A regex of the form "/starttag(.*?)endtag/" captures the text between the markers, e.g.
$regex = '/\[start_tag_name\](.*?)\[\/end_tag_name\]/s';
$content = "[start_tag_name]SOME TEXT[/end_tag_name]";
preg_match($regex, $content, $matches);
echo $matches[1];
It will print "SOME TEXT".
Your HTML
$html='<ul id="main">
<li>
<h1>My Title</h1>
<span class="date">Date</span>
<div class="section">
[content]
</div>
</li>
</ul>';
//function call you can change the tag name
echo contentBetweenTags($html,"span");
// this function will help you to fetch the data from a specific tag
function contentBetweenTags($content, $tagname){
$pattern = "#<\s*?$tagname\b[^>]*>(.*?)</$tagname\b[^>]*>#s";
preg_match($pattern, $content, $matches);
if(empty($matches))
return;
$str = "<$tagname>".html_entity_decode($matches[1])."</$tagname>";
return $str;
}
