PHP - get string between two HTML elements

I have the following function:
function get_string_between($string, $start, $end){
$string = " ".$string;
$ini = strpos($string,$start);
if ($ini == 0) return "";
$ini += strlen($start);
$len = strpos($string,$end,$ini) - $ini;
return substr($string,$ini,$len);
}
I am passing the following information to this function:
$result = scraped HTML page;
$name = get_string_between($result, '<div class="model ww"> ',' </div>');
$name= strtok($name, "\n");
I expect the following results:
$name = 'XM1014 | Bone Machine (Well-Worn)';
The whole section is as follows:
<div class="modal ww"> XM1014 | Bone Machine (Well-Worn) </div>

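Try the following code. Note that the markers must match the page exactly: the question searches for class "model ww", but the markup uses "modal ww".
function get_string_between($string, $start, $end){
    // Split on the start marker; the wanted text begins the second piece
    $ar = explode($start, $string, 2);
    if (count($ar) < 2) return "";
    // Cut that piece at the end marker and keep what comes before it
    $ar1 = explode($end, $ar[1], 2);
    return $ar1[0];
}
$result = '<div class="modal ww"> XM1014 | Bone Machine (Well-Worn) </div>';
$text_result = get_string_between($result, '<div class="modal ww"> ', ' </div>');
print_r($text_result);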
Alternatively, parse the HTML with Simple HTML DOM.
Here is a short example:
$html = new simple_html_dom();
// Load HTML from a string
$html->load('<html><body>
<div>test1</div>
<div>test2</div>
</body></html>');
foreach($html->find('div') as $element)
print_r($element->plaintext);
For the above code to work, you need to include the Simple HTML DOM library file.
This way you can get the content of every div.
You can read more here
Finally, your function will be:
function get_string_between($string){
$result=array();
$html = new simple_html_dom();
$html->load($string);
foreach($html->find('div') as $element)
array_push($result,$element->plaintext);
return $result;
}
$result = "<div class='modal ww'> XM1014 | Bone Machine (Well-Worn) </div>";
$text_result=get_string_between($result);
print_r($text_result);
Hope it helps :)

You can of course use a regexp, but there are libraries that will do the job more quickly and easily. Try using an HTML DOM parser, or phpQuery, which is an implementation of the jQuery library in PHP. It is also available via Composer. If you're familiar with jQuery syntax, you will find it very neat.
Get the html you need using this:
$html = pq('.modal.ww')->html();
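If you would rather not add a library, PHP's built-in DOM extension can do the same lookup. This is only a minimal sketch, not the phpQuery approach above: it assumes the markup from the question, and it uses an XPath class test in place of the '.modal.ww' CSS selector.

```php
<?php
// Sample markup copied from the question
$result = '<div class="modal ww"> XM1014 | Bone Machine (Well-Worn) </div>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($result);
$xpath = new DOMXPath($doc);

// Match div elements whose class list contains both "modal" and "ww"
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " modal ")'
       . ' and contains(concat(" ", normalize-space(@class), " "), " ww ")]';
$nodes = $xpath->query($query);

// trim() removes the spaces that sit inside the tag
$name = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : '';
echo $name;
```

Because the parser works on elements rather than raw text, the exact whitespace and quoting of the opening tag no longer matter.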

Related

PHP str_replace not working if used 2 times

So I have this practice page I made to see if I can make a template language; the code is listed below:
<?php
ini_set('display_errors', 1);
ini_set('display_startup_errors', 1);
error_reporting(E_ALL);
function get_attributes($element) {
$output = explode(" ", $element);
return $output;
}
function build_element($item, $attributes) {
switch($item) { // was $element, which is undefined in this scope
case "form":
$template = "<form {{attributes}}>";
$template = str_replace("{{attributes}}", $attributes, $template);
return $template;
}
}
function get_string_between($string, $start, $end){
$string = ' ' . $string;
$ini = strpos($string, $start);
if ($ini == 0) return '';
$ini += strlen($start);
$len = strpos($string, $end, $ini) - $ini;
return substr($string, $ini, $len);
}
$fullstring = '
<xe:form style="width:100px; height: 100px; background: #55ff55;"></xe:form>
';
$parsed = get_string_between($fullstring, '<xe:', '>');
$e = get_attributes($parsed);
$full = '<{{et}} {{attr}}></form>';
$full = str_replace('{{attr}}', $e[1], $full);
// the str_replace() below this comment is causing issues
$full = str_replace('{{et}}', $e[0], $full); // <--------------------- issue is here
echo $full;
It seems like if I add 2 str_replace functions, the echo is blank, and the $e var is working fine.
I tried echoing out both $e vars, but they are both fine.
If someone could point me in the right direction, I'd greatly appreciate it.
The result is:
<form style="width:100px;></form>
I'm not sure what exactly you want, but:
The style attribute is opened with a double quote but never closed, which makes the form invisible.
You didn't parse the attributes completely: the height and background are missing. (Splitting on spaces is not a good idea, because attribute values can contain spaces.)
HTML is fairly complex; you might want to check out https://www.php.net/manual/en/class.domdocument.php to manipulate it without weird issues. You can change a few things with search and replace, but such code breaks easily.
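As a small illustration of the splitting problem, here is a sketch that keeps the question's approach but splits only at the first space, using the limit argument of explode(). The $parsed value is what get_string_between() returns for the question's input; this is not a general attribute parser.

```php
<?php
// What get_string_between($fullstring, '<xe:', '>') yields for the question's input
$parsed = 'form style="width:100px; height: 100px; background: #55ff55;"';

// Limit 2: split only at the first space, so spaces inside the style value survive
$e = explode(' ', $parsed, 2);

$full = '<{{et}} {{attr}}></form>';
$full = str_replace('{{attr}}', $e[1], $full);
$full = str_replace('{{et}}', $e[0], $full);
echo $full;
```

With the whole attribute string kept in $e[1], the closing quote is no longer lost and both str_replace() calls produce a complete tag.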

Extract string using htmldom php

How can I extract a specific string after a specific word using HTML DOM in PHP? I have:
<div class="abc">
<script type="text/javascript">
var flashvars = { word : path } </script>
Now I want to extract path after word.
Thanks for your response.
I found the solution I was looking for.
Here is the code in case someone needs it.
Explanation:
$results is the cURL response.
Put the class name of the div you want to fetch inside the $xpath->query() call.
You will get the text content of the matching div in $tag->textContent.
$dom = new DOMDocument();
$dom->loadHTML($results);
$xpath = new DOMXPath($dom);
$tags = $xpath->query('//div[@class="e"]');
foreach ($tags as $tag)
{
echo "<br>----------<br>";
var_dump($tag->textContent);
echo "<br>----------<br>";
}
Now you have the text of the required div in $tag->textContent.
You can then fetch anything between a "start" and an "end" marker in that string using the function below.
function get_string_between($string, $start, $end){
$string = ' ' . $string;
$ini = strpos($string, $start);
if ($ini == 0) return '';
$ini += strlen($start);
$len = strpos($string, $end, $ini) - $ini;
return substr($string, $ini, $len);
}
In my case I used it like this:
$price = get_string_between($tag->textContent,'swf', '+');
echo $price;
Here "swf" is the starting point of the path and "+" is the end point.
Hope it saves somebody else time :)
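For the markup in the question itself, the same idea can be sketched end to end. This is an illustration under assumptions: the sample string is modelled on the question (with the div closed), "path" stands in for the real value, and a regular expression is used in place of fixed start and end markers.

```php
<?php
// Sample markup modelled on the question; "path" stands in for the real value
$results = '<div class="abc"><script type="text/javascript">'
         . 'var flashvars = { word : path } </script></div>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($results);
$xpath = new DOMXPath($dom);

$value = null;
foreach ($xpath->query('//div[@class="abc"]') as $tag) {
    // Capture the token that follows "word :" up to whitespace, a comma or "}"
    if (preg_match('/\bword\s*:\s*([^\s,}]+)/', $tag->textContent, $m)) {
        $value = $m[1];
        break;
    }
}
echo $value;
```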

Combining two php codes

Right now I have two PHP scripts that work exactly as they are supposed to.
The first one pulls the "src" of every "img" tag:
<?php
$url="foo";
$html = file_get_contents($url);
$doc = new DOMDocument();
@$doc->loadHTML($html);
$tags = $doc->getElementsByTagName('img');
foreach ($tags as $tag) {
echo $tag->getAttribute('src') . "<br>";
}
?>
The second one is designed to pull a string of characters from between two other strings:
<?php
function get_string_between($string, $start, $end){
$string = " ".$string;
$ini = strpos($string,$start);
if ($ini == 0) return "";
$ini += strlen($start);
$len = strpos($string,$end,$ini) - $ini;
return substr($string,$ini,$len);
}
$fullstring = "this is my [tag]dog[/tag]";
$parsed = get_string_between($fullstring, "[tag]", "[/tag]");
echo $parsed; // (result = dog)
?>
What I need is to figure out how to use the second script to pull only a piece of each "src", for as long as there are still "img" tags to process.
So if the tag comes back as "/pics/foo.jpg", I can remove the "/pics/" and the ".jpg", leaving me with just "foo".
I hope I have made some sense. Thanks.
Why do you want to use the second script? You can do it by exploding the src string on "/":
$exploded_tag = explode('/', $tag->getAttribute('src'));
Then you need the last element of the array (foo.jpg):
$last_part = end($exploded_tag);
Then you have to explode it on "." and take the first element (foo):
$exploded_lastpart = explode('.', $last_part);
$piece = $exploded_lastpart[0];
You don't need the second script. PHP has a function called pathinfo(), so you can just do:
$path_parts = pathinfo('/pics/foo.jpg');
echo $path_parts['filename'];
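Putting this together with the loop from the question might look like the sketch below. The inline $html string and the second image are made-up sample data standing in for the fetched page.

```php
<?php
// Made-up sample page; in the question this comes from file_get_contents($url)
$html = '<p><img src="/pics/foo.jpg"><img src="/pics/bar.png"></p>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);

$names = [];
foreach ($doc->getElementsByTagName('img') as $tag) {
    // PATHINFO_FILENAME returns the base name without directory or extension
    $names[] = pathinfo($tag->getAttribute('src'), PATHINFO_FILENAME);
}
print_r($names);
```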

How to parse multiple similar XML tags in PHP?

I have an XML file which is as follows. I want to parse this XML file using PHP.
<id_list>
<ids>
<id>195002349</id>
<id>374487611</id>
<id>192983648</id>
<id>168378766</id>
<id>161573001</id>
</ids>
<next_cursor>0</next_cursor>
<previous_cursor>0</previous_cursor>
</id_list>
I want the output in the form:
Id1=195002349 Id2=374487611 Id3=192983648 Id4=168378766 Id5=161573001
When reading the XML (with SimpleXMLElement), use XPath to search for the "id" elements under "ids", and loop over the results to read each element's text.
Like so:
$notes = new SimpleXMLElement('test2.xml', 0, true);
$all = array();
$results = $notes->xpath("/id_list/ids/id");
foreach($results as $to){
echo "<br>".$to;
$all[]=(string)$to;
}
print_r($all);
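To get the exact output format asked for in the question, the loop can build the "IdN=value" pairs directly. A minimal sketch, assuming the XML is held in a string rather than read from test2.xml:

```php
<?php
// The document from the question, inlined so the example is self-contained
$xml = <<<XML
<id_list>
<ids>
<id>195002349</id>
<id>374487611</id>
<id>192983648</id>
<id>168378766</id>
<id>161573001</id>
</ids>
<next_cursor>0</next_cursor>
<previous_cursor>0</previous_cursor>
</id_list>
XML;

$list = new SimpleXMLElement($xml);

$parts = [];
foreach ($list->xpath('/id_list/ids/id') as $i => $id) {
    // xpath() returns a zero-based array, so add 1 for the label
    $parts[] = 'Id' . ($i + 1) . '=' . $id;
}
$line = implode(' ', $parts);
echo $line;
```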
This is a handy little function to extract the string between two specified pieces of text. It could be used to parse XML text, BBCode, or any other delimited code/text for that matter.
function get_string_between($string, $start, $end){
$string = " ".$string;
$ini = strpos($string,$start);
if ($ini == 0) return "";
$ini += strlen($start);
$len = strpos($string,$end,$ini) - $ini;
return substr($string,$ini,$len);
}
$fullstring = "this is my [tag]dog[/tag]";
$parsed = get_string_between($fullstring, "[tag]", "[/tag]");
echo $parsed; // (result = dog)
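One caveat with this function: when the end marker is missing, strpos() returns false and the length calculation goes wrong. A stricter variant could look like the sketch below; the name get_string_between_strict is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical stricter variant: returns '' when either marker is missing
function get_string_between_strict(string $string, string $start, string $end): string
{
    $ini = strpos($string, $start);
    if ($ini === false) {
        return ''; // start marker not found
    }
    $ini += strlen($start);

    $stop = strpos($string, $end, $ini);
    if ($stop === false) {
        return ''; // end marker not found after the start marker
    }
    return substr($string, $ini, $stop - $ini);
}

echo get_string_between_strict('this is my [tag]dog[/tag]', '[tag]', '[/tag]');
```

The strict === false checks also make the leading-space trick unnecessary, since a match at position 0 is no longer confused with "not found".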

How to get the value of the href attribute?

Using XPath, how can I get the value of the href attribute in the following case (grabbing only the URL of the right link)?
a wrong one
the right one
a wrong one
That is, to get the value of the href attribute if the link has a particular text.
This will select the attributes:
"//a[text()='the right one']/@href"
I think this is the best solution; you can use each of them as an array element:
$String= '
a wrong one
the right one
a wrong one
';
$array=get_all_string_between($String,'href="','">');
print_r($array);//just to see what is inside the array
//now get each of them
foreach($array as $value){
echo $value.'<br>';
}
function get_all_string_between($string, $start, $end)
{
$result = array();
$string = " ".$string;
$offset = 0;
while(true)
{
$ini = strpos($string,$start,$offset);
if ($ini == 0)
break;
$ini += strlen($start);
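$endPos = strpos($string,$end,$ini);
if ($endPos === false)
break; // no closing marker found: stop instead of looping forever
$len = $endPos - $ini;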
$result[] = substr($string,$ini,$len);
$offset = $ini+$len;
}
return $result;
}
"//a[@href='http://example.com']"
I'd use an open-source class like simple_html_dom.php:
$oHtml = new simple_html_dom();
$oHtml->load($sBody);
foreach($oHtml->find('a') as $oElement) {
echo $oElement->href;
}
Here's a full example using SimpleXML:
$xml = '<html>a wrong one'
. 'the right one'
. 'a wrong one</html>';
$tree = simplexml_load_string($xml);
$nodes = $tree->xpath('//a[text()="the right one"]');
$href = (string) $nodes[0]['href'];
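The same lookup can also be done with DOMXPath, selecting the attribute node directly with /@href. This is a sketch only: the three links and their example.com URLs are made up for illustration, since the original markup is not shown here.

```php
<?php
// Made-up sample links; the URLs are placeholders
$html = '<a href="http://example.com/wrong1">a wrong one</a>'
      . '<a href="http://example.com/right">the right one</a>'
      . '<a href="http://example.com/wrong2">a wrong one</a>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Select the href attribute of the link whose text matches exactly
$attrs = $xpath->query("//a[text()='the right one']/@href");
$href = $attrs->length > 0 ? $attrs->item(0)->nodeValue : null;
echo $href;
```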
