Find string in text and explode - php

I have a text:
$body = 'I lorem ipsum. And I have more text, I want explode this and search link: <a href="site.com/text/some">I link</a>.';
How can I find the "I link" link and explode its href?
My code is:
if (strpos($body, 'site.com/') !== false) {
$itemL = explode('/', $body);
}
But this is not working.

A regex is a reasonable choice here. Try:
$body = 'I lorem ipsum. And I have more text, I want explode this and search link: <a href="site.com/text/some">I link</a>.';
$reg_str='/<a href="(.*?)">(.*?)<\/a>/';
preg_match_all($reg_str, $body, $matches);
echo $matches[1][0]."<br>"; //site.com/text/some
echo $matches[2][0]; //I link
Update
If you have a long text with many hrefs and link texts like I link, you can use a for loop to output them, for example:
for($i=0; $i<count($matches[1]);$i++){
echo $matches[1][$i]; // echo all the hrefs
}
for($i=0; $i<count($matches[2]);$i++){
echo $matches[2][$i]; // echo all the link texts.
}
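The two loops above can also be written as a single pass. This is a minimal sketch, assuming a made-up $body with two links (the sample URLs and the $pairs variable are illustrative, not from the question). The PREG_SET_ORDER flag groups each match so the href and its link text arrive together:

```php
<?php
// Hypothetical sample input with two links.
$body = '<a href="site.com/a">First</a> and <a href="site.com/b">Second</a>';

// PREG_SET_ORDER makes $matches a list of matches,
// each holding [full match, href, link text].
preg_match_all('/<a href="(.*?)">(.*?)<\/a>/', $body, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $m) {
    $pairs[] = [$m[1], $m[2]];          // keep href and text side by side
    echo $m[1] . ' => ' . $m[2] . "\n"; // e.g. site.com/a => First
}
```

Keeping each href next to its text avoids indexing two parallel arrays by hand.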
If you want to replace the old href (e.g. site.com/text/some) with the new one (e.g. site.com/some?id=32324), you could try preg_replace like:
echo preg_replace('/<a(.*?)href="(site\.com\/text\/some)"(.*?)>/','<a$1href="site.com/some?id=32324"$3>',$body);

You could use DOM to operate through your html:
$dom = new DOMDocument;
$dom->loadHTML($body);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//a') as $link) {
var_dump($link->textContent);
var_dump($link->getAttribute('href'));
}
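To connect this back to the explode() attempt in the question, the href can be read from the DOM first and split afterwards. This is only a sketch: the anchor markup in $body is assumed from the output shown in the other answer, and $parts is an illustrative name.

```php
<?php
// Assumed sample input; the anchor markup is inferred, not taken verbatim from the question.
$body = 'I lorem ipsum. And I have more text: <a href="site.com/text/some">I link</a>.';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($body);
libxml_clear_errors();

$parts = [];
foreach ($dom->getElementsByTagName('a') as $link) {
    $href = $link->getAttribute('href');
    // Same check as in the question, applied to the href instead of the whole text.
    if (strpos($href, 'site.com/') !== false) {
        $parts = explode('/', $href);
    }
}
print_r($parts);
```

Exploding only the href, rather than the whole $body, keeps slashes elsewhere in the text from ending up in the result.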

Related

PHP: Replace DOMElement with DOMText node

I want to create some customised tags for translating, for instance
<trad>SOMETHING</trad>
I've also got a file with some $GLOBALS variable, like:
$GLOBALS['SOMETHING'] = 'Some text';
$GLOBALS['SOMETHINGELSE'] = 'Some other text';
So I've been able to show my translation in this way:
$string = "<trad>SOMETHING</trad>";
$string = preg_replace('/<trad[^>]*?>([\\s\\S]*?)<\/trad>/','\\1', $string);
echo $GLOBALS[$string];
This works perfectly, but when I have something more complex like the following code, or when the tag occurs more than once, I can't get it to work:
$string = "Lorem ipsum <trad>SOMETHING</trad> <h1>Hello</h1> <trad>SOMETHINGELSE</trad>";
Ideally I want to build a new variable $string in which the tags are replaced by the values I found, so that I can show it with a simple echo.
So I want an output like this with:
echo $string;
//output: Lorem ipsum Some text <h1>Hello</h1> Some other text
Can you guys help me?
Regex is not a reliable approach for processing an HTML string. Here we use DOMDocument instead of regex to achieve the desired output. The final strip_tags call is only needed because the input is not a complete HTML document; if a valid HTML string is supplied to loadHTML, it is not needed, and saveHTML($node) will do the job.
Try this code snippet:
<?php
ini_set('display_errors', 1);
libxml_use_internal_errors(true);
$array["SOMETHING"]="some text";
$array["SOMETHINGELSE"]="some text other";
$string = "Lorem ipsum <trad>SOMETHING</trad> <h1>Hello</h1> <trad>SOMETHINGELSE</trad>";
$domDocument = new DOMDocument();
$domDocument->loadHTML($string,LIBXML_HTML_NOIMPLIED|LIBXML_HTML_NODEFDTD);
$results=$domDocument->getElementsByTagName("trad");
do
{
foreach($results as $result)
{
$result->parentNode->replaceChild($domDocument->createTextNode($array[trim($result->nodeValue)]),$result);
}
}
while($results->length>0);
echo strip_tags($domDocument->saveHTML(),"<h1>");
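The do/while loop above is needed because getElementsByTagName returns a live list that shrinks as nodes are replaced. A possible alternative, sketched here under assumptions, is to copy the list into an array first. The wrapping div with id "root" and the $translations name are hypothetical additions, and the exact whitespace of the output may vary with the libxml version.

```php
<?php
// Hypothetical lookup table standing in for the $GLOBALS entries.
$translations = ['SOMETHING' => 'Some text', 'SOMETHINGELSE' => 'Some other text'];
$string = 'Lorem ipsum <trad>SOMETHING</trad> <h1>Hello</h1> <trad>SOMETHINGELSE</trad>';

libxml_use_internal_errors(true); // <trad> is not a known HTML tag
$dom = new DOMDocument();
// Assumption: wrap the fragment in one root element so it can be read back easily.
$dom->loadHTML('<div id="root">' . $string . '</div>');
libxml_clear_errors();

// Snapshot the live node list so replacing nodes does not disturb the iteration.
$nodes = iterator_to_array($dom->getElementsByTagName('trad'));
foreach ($nodes as $node) {
    $key = trim($node->textContent);
    $text = $translations[$key] ?? $key; // fall back to the key if no translation exists
    $node->parentNode->replaceChild($dom->createTextNode($text), $node);
}

// Serialize only the children of the wrapper, not the html/body scaffolding.
$root = $dom->getElementsByTagName('div')->item(0);
$output = '';
foreach ($root->childNodes as $child) {
    $output .= $dom->saveHTML($child);
}
echo $output;
```

Reading back only the wrapper's children avoids the strip_tags step, so other tags in the input survive untouched.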

PHP regex - replace all text between a tag

I have a link being output on my site. I want to replace the visible text that the user sees, while the link itself always stays the same.
There will be many different dynamic urls with the text being changed, so all the example regex that i have found so far only use exact tags like '/.*/'...or something similar
Edited for a better example
$link = "<a href='some-dynamic-link'>Text to replace</a>";
$pattern = '/#(<a.*?>).*?(</a>)#/';
$new_text = 'New text';
$new_link = preg_replace($pattern, $new_text, $link);
When printing the output, the following is what i am looking for, against my result.
Desired
<a href='some-dynamic-link'>New text</a>
Actual
'New text'
As you're already using the capture groups, why not actually use them?
$link = "<a href='some-dynamic-link'>Text to replace</a>";
$newText = "Replaced!";
$result = preg_replace('/(<a.*?>).*?(<\/a>)/', '$1'.$newText.'$2', $link);
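One caveat with building the replacement string by concatenation: if the new text starts with a digit, '$1' followed by that digit can be read as a different group reference, and a literal $ in the new text may also be treated as a reference. A hedged sketch of one way around this uses preg_replace_callback, where the new text is never parsed as a replacement pattern (the sample $newText value is made up for illustration):

```php
<?php
$link = "<a href='some-dynamic-link'>Text to replace</a>";
$newText = '2024 report'; // hypothetical text starting with a digit

// The callback receives the match array and returns the replacement verbatim,
// so digits or dollar signs in $newText are not interpreted as group references.
$result = preg_replace_callback(
    '/(<a\b[^>]*>).*?(<\/a>)/s',
    function (array $m) use ($newText) {
        return $m[1] . $newText . $m[2];
    },
    $link
);

echo $result;
```

With plain preg_replace, writing the reference as '${1}' instead of '$1' addresses the digit ambiguity as well.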
If you need to get everything between the tags, you can use the function below:
<?php
function getEverything_inBetween_tags(string $htmlStr, string $tagname)
{
$pattern = "#<\s*?$tagname\b[^>]*>(.*?)</$tagname\b[^>]*>#s";
preg_match_all($pattern, $htmlStr, $matches);
return $matches[1];
}
$str = 'see here for more details about test.com';
print_r(getEverything_inBetween_tags($str, 'a')); // the function returns an array, so use print_r instead of echo
//output:- see here for more details about test.com
?>
If you need to extract the whole HTML tag and get an array of its occurrences:
<?php
function extractHtmlTag_into_array(string $htmlStr, string $tagname)
{
preg_match_all("#<\s*?$tagname\b[^>]*>.*?</$tagname\b[^>]*>#s", $htmlStr , $matches);
return $matches[0];
}
$str = '<p>test</p>test.com<span>testing string</span>';
$res = extractHtmlTag_into_array($str, 'a');
print_r($res);
//output:- Array([0] => "amazon.in/xyz/abc?test=abc")
?>

Regular expression to remove links with their inner text from a string with PHP

I have the following code:
$string = 'Try to remove the link text from the content links in it Try to remove the link text from the content testme Try to remove the link text from the content';
$string = preg_replace('#(<a.*?>).*?(</a>)#', '$1$2', $string);
$result = preg_replace('/<a href="(.*?)">(.*?)<\/a>/', "\\2", $string);
echo $result; // this will output "I am a lot of text with links in it";
I am looking to merge these preg_replace lines. Please suggest.
You need to use DOM for these tasks. Here is a sample that removes links from this content of yours:
$str = 'Try to remove the link text from the content links in it Try to remove the link text from the content testme Try to remove the link text from the content';
$dom = new DOMDocument;
@$dom->loadHTML($str, LIBXML_HTML_NOIMPLIED|LIBXML_HTML_NODEFDTD);
$xp = new DOMXPath($dom);
$links = $xp->query('//a');
foreach ($links as $link) {
$link->parentNode->removeChild($link);
}
echo preg_replace('/^<p>([^<>]*)<\/p>$/', '$1', @$dom->saveHTML());
Because the text node is the only node in the document, PHP's DOM wraps the text in a dummy p node, so I use preg_replace to remove it. This probably does not apply to your case.
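For simple, well-formed input, the two preg_replace calls from the question could also be merged into one pattern that removes each link together with its inner text. This is a sketch on a hypothetical sample string, since the question's original markup is not shown. It will not cope with nested or malformed anchors, which is where the DOM approach is safer.

```php
<?php
// Hypothetical sample input with two links.
$string = 'Try to remove the <a href="/one">links in it</a> from the content <a href="/two">testme</a> here';

// One pass: match the opening tag, the lazily matched inner text, and the closing tag.
// The i flag ignores case; the s flag lets the dot span line breaks inside a link.
$result = preg_replace('#<a\b[^>]*>.*?</a>#is', '', $string);

echo $result;
```

The spaces that surrounded each link remain, so the result may contain doubled spaces that need collapsing afterwards.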

I have to find title but print value how?

My code is given below:-
$text = "<div class='title'>Title</div><div class='content'>This is title</div>";
$words = array('Title');
$words = join("|", $words);
$matches = array();
if ( preg_match('/' . $words . '/i', $text, $matches) ){
echo "Words matched: <br/>";
print_r($matches);
}
else{
echo "Not match";
}
The problem is that the code above finds the title, but I don't want to print the title; I want to print "This is title", and I don't understand how to print it by finding the title.
The title is like a keyword that will not change, but the value I want to print is dynamic and changes every time, so I cannot search for the value itself. How can I do it?
Don't use regex for parsing HTML. Use a DOM Parser instead. In this case, you can use an XPath expression to get the element by class name:
$text = "<div class='title'>Title</div>
<div class='content'>This is title</div>";
$dom = new DOMDocument;
$dom->loadHTML($text);
$xpath = new DOMXPath($dom);
$title = $xpath->query('//*[@class="content"]')->item(0)->nodeValue;
echo $title;
Output:
This is title
This should get you started. If the title is in a different position, you can modify the expression accordingly to retrieve it.
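If the lookup should start from the fixed title keyword, as the question describes, one possible expression is to select the title div by its text and then step to the content div that follows it. This is a sketch that assumes the two divs are siblings, as in the sample; $value is an illustrative name.

```php
<?php
$text = "<div class='title'>Title</div><div class='content'>This is title</div>";

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($text);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Find the div whose class is "title" and whose text is "Title",
// then take the first following sibling div with class "content".
$nodes = $xpath->query(
    '//div[@class="title" and normalize-space(.)="Title"]'
    . '/following-sibling::div[@class="content"][1]'
);

$value = $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;
echo $value;
```

Checking the length before calling item(0) avoids an error when the title is not present.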

preg_replace - How to remove contents inside a tag?

Say I have this.
$string = "<div class=\"name\">anyting</div>1234<div class=\"name\">anyting</div>abcd";
$regex = "#([<]div)(.*)([<]/div[>])#";
echo preg_replace($regex,'',$string);
The output is
abcd
But I want
1234abcd
How do I do it?
Like this:
preg_replace('/(<div[^>]*>)(.*?)(<\/div>)/i', '$1$3', $string);
If you want to remove the divs too:
preg_replace('/<div[^>]*>.*?<\/div>/i', '', $string);
To replace only the content in the divs with class name and not other classes:
preg_replace('/(<div.*?class="name"[^>]*>)(.*?)(<\/div>)/i', '$1$3', $string);
$string = "<div class=\"name\">anything</div>1234<div class=\"name\">anything</div>abcd";
echo preg_replace('%<div.*?</div>%i', '', $string); // echo's 1234abcd
Live example:
http://codepad.org/1XEC33sc
Add ? after the quantifier to make it lazy, so it stops at the first closing tag:
preg_replace('~<div .*?>(.*?)</div>~','', $string);
http://sandbox.phpcode.eu/g/c201b/3
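The difference between the greedy and the lazy quantifier can be seen side by side on the string from the question. A minimal sketch, with $greedy and $lazy as illustrative names:

```php
<?php
$string = '<div class="name">anyting</div>1234<div class="name">anyting</div>abcd';

// Greedy: .* runs to the last </div>, so everything up to it is removed.
$greedy = preg_replace('#<div.*</div>#', '', $string);

// Lazy: .*? stops at the nearest </div>, so each div is removed on its own.
$lazy = preg_replace('#<div.*?</div>#', '', $string);

echo $greedy . "\n"; // abcd
echo $lazy . "\n";   // 1234abcd
```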
This might be a simple example, but if you have a more complex one, use an HTML/XML parser. For example with DOMDocument:
$doc = new DOMDocument;
$doc->loadHTML($string);
$xpath = new DOMXPath($doc);
$query = "//body/text()";
$nodes = $xpath->query($query);
$text = "";
foreach($nodes as $node) {
$text .= $node->wholeText;
}
Which query you have to use or whether you have to process the DOM tree in some other way, depends on the particular content you have.
