How can I extract a specific string that follows a specific word using the HTML DOM in PHP? I have:
<div class="abc">
<script type="text/javascript">
var flashvars = { word : path } </script>
Now I want to extract the path that follows word.
Thanks for your responses.
I found the solution I was looking for.
Here is the code in case someone needs it.
Explanation:
$results is the cURL response.
Put the class name of the div you want to fetch inside the $xpath->query() call.
The text content of each matching div is then available in $tag->textContent.
$dom = new DOMDocument();
$dom->loadHTML($results);
$xpath = new DOMXPath($dom);
$tags = $xpath->query('//div[@class="e"]');
foreach ($tags as $tag)
{
echo "<br>----------<br>";
var_dump($tag->textContent);
echo "<br>----------<br>";
}
Now you have the text of the required div (including the text of any script inside it) in $tag->textContent.
You can then fetch anything from that string between a "start" and an "end" marker using the function below.
function get_string_between($string, $start, $end){
$string = ' ' . $string;
$ini = strpos($string, $start);
if ($ini == 0) return '';
$ini += strlen($start);
$len = strpos($string, $end, $ini) - $ini;
return substr($string, $ini, $len);
}
In my case I used it like this:
$price = get_string_between($tag->textContent,'swf', '+');
echo $price;
Here "swf" is the starting point of the path and "+" is the end point.
Hope it saves somebody else time :)
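The same idea can be sketched with a regular expression applied to the script text instead of fixed start and end markers. This is only a minimal sketch: the sample HTML, the quoted path value, and the helper name extract_flashvar are assumptions made for illustration, not part of the original code.

```php
<?php
// Hypothetical sample standing in for the cURL response ($results above).
$results = '<html><body><div class="abc"><script type="text/javascript">'
    . 'var flashvars = { word : "/media/player.swf" }</script></div></body></html>';

// Hypothetical helper: returns the value that follows "$key :" inside a
// script that sits in a div with the given class, or null if none is found.
function extract_flashvar(string $html, string $class, string $key): ?string
{
    $dom = new DOMDocument();
    // Collect parser warnings from imperfect real-world markup instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Assumes $class is a simple class name with no quote characters in it.
    foreach ($xpath->query('//div[@class="' . $class . '"]//script') as $script) {
        // Key, a colon, an optional quote, then the value up to a quote, space, comma or brace.
        $pattern = '/' . preg_quote($key, '/') . '\s*:\s*["\']?([^"\'\s,}]+)/';
        if (preg_match($pattern, $script->textContent, $m)) {
            return $m[1];
        }
    }
    return null;
}

echo extract_flashvar($results, 'abc', 'word'); // /media/player.swf
```

Returning null when nothing matches keeps "not found" distinguishable from an empty value.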
So I have this practice page I made to see if I can make a template language; the code is listed below:
<?php
ini_set('display_errors', 1);
ini_set('display_startup_errors', 1);
error_reporting(E_ALL);
function get_attributes($element) {
$output = explode(" ", $element);
return $output;
}
function build_element($item, $attributes) {
switch ($item) { // the parameter is named $item; $element is undefined here
case "form":
$template = "<form {{attributes}}>";
$template = str_replace("{{attributes}}", $attributes, $template);
return $template;
break;
}
}
function get_string_between($string, $start, $end){
$string = ' ' . $string;
$ini = strpos($string, $start);
if ($ini == 0) return '';
$ini += strlen($start);
$len = strpos($string, $end, $ini) - $ini;
return substr($string, $ini, $len);
}
$fullstring = '
<xe:form style="width:100px; height: 100px; background: #55ff55;"></xe:form>
';
$parsed = get_string_between($fullstring, '<xe:', '>');
$e = get_attributes($parsed);
$full = '<{{et}} {{attr}}></form>';
$full = str_replace('{{attr}}', $e[1], $full);
// the str_replace() below this comment is causing issues
$full = str_replace('{{et}}', $e[0], $full); // <--------------------- issue is here
echo $full;
It seems like if I add 2 str_replace functions, the echo is blank, and the $e var is working fine.
I tried echoing out both $e vars, but they are both fine.
If someone could point me in the right direction, I'd greatly appreciate it.
The result is:
<form style="width:100px;></form>
Not sure what you want exactly, but:
The style attribute is opened with a double quote but never closed, so the form is not visible.
You didn't parse the attributes completely: height and background are missing. (Splitting on spaces is not a good idea, since attribute values can contain spaces.)
HTML is fairly complex; you might want to check out https://www.php.net/manual/en/class.domdocument.php to manipulate it without weird issues. You may change a few things with search and replace, but they will break easily.
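As a rough illustration of the DOM-based route, the sketch below parses the xe: element with DOMDocument and rebuilds it as plain HTML. The wrapper element, the namespace URI urn:example:xe, and the function name render_xe_element are assumptions made for this example only.

```php
<?php
// Hypothetical helper: turns each <xe:NAME ...> element into <NAME ...></NAME>.
function render_xe_element(string $source): string
{
    $dom = new DOMDocument();
    // Wrap the fragment so the "xe" prefix is bound to a (made-up) namespace URI.
    $dom->loadXML('<root xmlns:xe="urn:example:xe">' . $source . '</root>');

    $out = '';
    foreach ($dom->documentElement->childNodes as $node) {
        if (!$node instanceof DOMElement) {
            continue; // skip whitespace text nodes between elements
        }
        $attrs = '';
        // Attribute values are read whole, so spaces inside them are preserved.
        foreach ($node->attributes as $attr) {
            $attrs .= ' ' . $attr->nodeName . '="' . htmlspecialchars($attr->nodeValue) . '"';
        }
        $out .= '<' . $node->localName . $attrs . '></' . $node->localName . '>';
    }
    return $out;
}

$fullstring = '
<xe:form style="width:100px; height: 100px; background: #55ff55;"></xe:form>
';
echo render_xe_element($fullstring);
// <form style="width:100px; height: 100px; background: #55ff55;"></form>
```

Because the parser hands back the complete attribute value, the style string is no longer cut at the first space.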
I have the following function:
function get_string_between($string, $start, $end){
$string = " ".$string;
$ini = strpos($string,$start);
if ($ini == 0) return "";
$ini += strlen($start);
$len = strpos($string,$end,$ini) - $ini;
return substr($string,$ini,$len);
}
I am passing the following to this function:
// $result holds the scraped HTML page
$name = get_string_between($result, '<div class="model ww"> ',' </div>');
$name= strtok($name, "\n");
I expect the following results:
$name = 'XM1014 | Bone Machine (Well-Worn)';
The whole section is as follows:
<div class="modal ww"> XM1014 | Bone Machine (Well-Worn) </div>
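Try the following code:
function get_string_between($string, $start, $end){
    // Split on the start marker; the wanted text begins in the second piece.
    $ar = explode($start, $string, 2);
    if (count($ar) < 2) return '';
    // Cut that piece at the end marker and keep what precedes it.
    $ar1 = explode($end, $ar[1], 2);
    return $ar1[0];
}
$result = "<div class='modal ww'> XM1014 | Bone Machine (Well-Worn) </div>";
// The start marker must match the markup exactly: "modal", not "model", and the same quote style.
$text_result = get_string_between($result, "<div class='modal ww'> ", ' </div>');
print_r($text_result);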
You can also parse the HTML with Simple HTML DOM.
Here is a short example:
$html = new simple_html_dom();
// Load HTML from a string
$html->load('<html><body>
<div>test1</div>
<div>test2</div>
</body></html>');
foreach($html->find('div') as $element)
print_r($element->plaintext);
For the code above to work you need to include the Simple HTML DOM library file (simple_html_dom.php).
That way you can get the content of every div.
You can read more in the Simple HTML DOM documentation.
Finally, your function becomes:
function get_string_between($string){
$result=array();
$html = new simple_html_dom();
$html->load($string);
foreach($html->find('div') as $element)
array_push($result,$element->plaintext);
return $result;
}
$result = "<div class='modal ww'> XM1014 | Bone Machine (Well-Worn) </div>";
$text_result=get_string_between($result);
print_r($text_result);
Hope it helps :)
You can of course use a regexp, but there are libraries that make the job quicker and easier. Try an HTML DOM parser, or phpQuery, which is a PHP implementation of the jQuery library. It is also available through Composer. If you are familiar with jQuery syntax, you will find it very neat.
Get the html you need using this:
$html = pq('.modal.ww')->html();
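If adding a library is not an option, a similar lookup can be sketched with PHP's built-in DOMDocument and DOMXPath. The helper name text_by_classes is hypothetical, and the sample markup is the one line quoted in the question.

```php
<?php
$result = '<div class="modal ww"> XM1014 | Bone Machine (Well-Worn) </div>';

// Hypothetical helper: trimmed text of every div carrying all the given classes.
function text_by_classes(string $html, array $classes): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect scraped markup
    $dom->loadHTML($html);
    libxml_clear_errors();

    // One contains() test per class, so the order of class names does not matter.
    // Assumes class names contain no quote characters.
    $tests = [];
    foreach ($classes as $class) {
        $tests[] = 'contains(concat(" ", normalize-space(@class), " "), " ' . $class . ' ")';
    }

    $xpath = new DOMXPath($dom);
    $texts = [];
    foreach ($xpath->query('//div[' . implode(' and ', $tests) . ']') as $div) {
        $texts[] = trim($div->textContent);
    }
    return $texts;
}

print_r(text_by_classes($result, ['modal', 'ww']));
```

Matching on class tokens rather than on the literal tag text means a change in whitespace or attribute quoting in the page does not break the lookup.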
So right now I have two PHP scripts that work exactly as they are supposed to.
The first one pulls the "src" of every "img" tag:
<?php
$url="foo";
$html = file_get_contents($url);
$doc = new DOMDocument();
@$doc->loadHTML($html);
$tags = $doc->getElementsByTagName('img');
foreach ($tags as $tag) {
echo $tag->getAttribute('src') . "<br>";
}
?>
The second one is designed to pull a string of characters from between two other strings:
<?php
function get_string_between($string, $start, $end){
$string = " ".$string;
$ini = strpos($string,$start);
if ($ini == 0) return "";
$ini += strlen($start);
$len = strpos($string,$end,$ini) - $ini;
return substr($string,$ini,$len);
}
$fullstring = "this is my [tag]dog[/tag]";
$parsed = get_string_between($fullstring, "[tag]", "[/tag]");
echo $parsed; // (result = dog)
?>
What I need is to figure out how to apply the second script to each "src", so that only a piece of it is kept, for as long as there are still "img" tags to process.
So if the tag comes back as "/pics/foo.jpg", I can remove the "/pics/" and the ".jpg", leaving me with just "foo".
I hope I have made some sense. Thanks.
Why do you want to use the second code? You can do it by exploding the src string on "/":
$exploded_tag = explode('/', $src); // $src is the value returned by $tag->getAttribute('src')
Then you need the last element of the array (foo.jpg):
$last_part = end($exploded_tag);
Then explode that on "." and take the first element (foo):
$exploded_lastpart = explode('.', $last_part);
$piece = $exploded_lastpart[0];
You don't need the second script. PHP has a function called pathinfo(), so you can just do:
$path_parts = pathinfo('/pics/foo.jpg');
echo $path_parts['filename'];
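Putting pathinfo() into the img loop from the question might look like the sketch below. The inline HTML and the function name image_basenames are made up for the example; in the original script the HTML comes from file_get_contents($url).

```php
<?php
// Hypothetical sample page; the question loads this from a URL instead.
$html = '<p><img src="/pics/foo.jpg"><img src="/pics/bar.png"></p>';

// Hypothetical helper: file name without directory or extension for each img src.
function image_basenames(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    $names = [];
    foreach ($doc->getElementsByTagName('img') as $tag) {
        // PATHINFO_FILENAME strips both the directory part and the extension.
        $names[] = pathinfo($tag->getAttribute('src'), PATHINFO_FILENAME);
    }
    return $names;
}

echo implode("<br>", image_basenames($html)); // foo<br>bar
```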
I have been working on a script that pulls information from a certain website. The said website pulls the information from a database and displays it in a way the user can easily read it (like always).
Imagine it looks like this:
Var1: result1
Var2: result2
Var3: result3
What my script does is that it reads the page's source code and retrieves "result1", "result2" and "result3" by obtaining the text between two strings.
Sample code:
<?php
function get_string_between($string, $start, $end) {
$string = " ".$string;
$ini = strpos($string,$start);
if ($ini == 0) return "";
$ini += strlen($start);
$len = strpos($string,$end,$ini) - $ini;
return substr($string,$ini,$len);
}
function check($url) {
// usually, $fullstring = file_get_contents($url);
$fullstring = "<string1>result1</string1><string1>result2</string1><string1>result3</string1>";
$result = get_string_between($fullstring, "<string1>", "</string1>");
echo "<b>Result: </b>".$result;
}
check("random"); // just to execute the function
?>
In case you wonder why I have the check() function there it is because this code is part of something bigger and I need a solution that works in this case scenario, so I tried to keep it immaculate.
Now, I can easily get "result1" because it's the first occurrence, but how can I get "result2" and "result3"?
Thank you :)
Use a regex to extract all of the matches, then pick the ones you want:
function get_string_between($string, $start, $end)
{
preg_match_all( '/' . preg_quote( $start, '/') . '(.*?)' . preg_quote( $end, '/') . '/', $string, $matches);
return $matches[1];
}
The regex will capture anything between the $start and $end variables.
Now the function returns an array of all the result values, from which you can pick the ones you want:
list( $first, $second, $third) = get_string_between( $string, "<string1>", "</string1>");
You can see it working in this demo.
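For comparison, the same result can be sketched without a regex by repeating the strpos() search from a moving offset. The function name get_all_between is hypothetical; the sample string is the one from the question.

```php
<?php
// Hypothetical helper: every substring found between $start and $end, in order.
function get_all_between(string $string, string $start, string $end): array
{
    $results = [];
    if ($start === '' || $end === '') {
        return $results; // empty markers would never advance the search
    }
    $offset = 0;
    while (($ini = strpos($string, $start, $offset)) !== false) {
        $ini += strlen($start);
        $stop = strpos($string, $end, $ini);
        if ($stop === false) {
            break; // an opening marker without a closing one: stop here
        }
        $results[] = substr($string, $ini, $stop - $ini);
        $offset = $stop + strlen($end); // continue after this closing marker
    }
    return $results;
}

$fullstring = "<string1>result1</string1><string1>result2</string1><string1>result3</string1>";
print_r(get_all_between($fullstring, "<string1>", "</string1>"));
```

Unlike the '(.*?)' pattern without the s modifier, this version also matches values that span several lines.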
I have an XML file which is as follows. I want to parse this XML file using PHP.
<id_list>
<ids>
<id>195002349</id>
<id>374487611</id>
<id>192983648</id>
<id>168378766</id>
<id>161573001</id>
</ids>
<next_cursor>0</next_cursor>
<previous_cursor>0</previous_cursor>
</id_list>
I want the output in the form:
Id1=195002349 Id2=374487611 Id3=192983648 Id4=168378766 Id5=161573001
When reading the XML (with SimpleXMLElement), use XPath to select the "id" elements under "ids", then loop over them with foreach to read each element's text.
Like so:
$notes = new SimpleXMLElement('test2.xml', 0, true); // third argument true: the first argument is a file path
$all = array();
$results = $notes->xpath("/id_list/ids/id");
foreach($results as $to){
echo "<br>".$to;
$all[]=(string)$to;
}
print_r($all);
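To get the output in the form the question asks for, the loop can number the ids as it goes. The sketch below inlines the XML as a string so it runs on its own; the function name format_ids is made up for the example.

```php
<?php
// The XML from the question, inlined here instead of being read from test2.xml.
$xml = '<id_list><ids><id>195002349</id><id>374487611</id><id>192983648</id>'
     . '<id>168378766</id><id>161573001</id></ids><next_cursor>0</next_cursor>'
     . '<previous_cursor>0</previous_cursor></id_list>';

// Hypothetical helper: builds "Id1=... Id2=..." from the id elements.
function format_ids(string $xml): string
{
    $list = new SimpleXMLElement($xml);
    $parts = [];
    // xpath() returns a zero-based list, so add 1 for the label.
    foreach ($list->xpath('/id_list/ids/id') as $i => $id) {
        $parts[] = 'Id' . ($i + 1) . '=' . (string) $id;
    }
    return implode(' ', $parts);
}

echo format_ids($xml);
// Id1=195002349 Id2=374487611 Id3=192983648 Id4=168378766 Id5=161573001
```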
This is a handy little function to strip out a string between two specified pieces of text. This could be used to parse XML text, bbCode, or any other delimited code/text for that matter.
function get_string_between($string, $start, $end){
$string = " ".$string;
$ini = strpos($string,$start);
if ($ini == 0) return "";
$ini += strlen($start);
$len = strpos($string,$end,$ini) - $ini;
return substr($string,$ini,$len);
}
$fullstring = "this is my [tag]dog[/tag]";
$parsed = get_string_between($fullstring, "[tag]", "[/tag]");
echo $parsed; // (result = dog)
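One caveat with the function above: when the end marker is missing, strpos() returns false and the length passed to substr() becomes negative, which can return unintended text. A stricter variant, sketched here under the hypothetical name get_string_between_strict, returns null whenever either marker is absent.

```php
<?php
// Hypothetical stricter variant: null when either marker is not found.
function get_string_between_strict(string $string, string $start, string $end): ?string
{
    $ini = strpos($string, $start);
    if ($ini === false) {
        return null; // start marker not found
    }
    $ini += strlen($start);
    $stop = strpos($string, $end, $ini);
    if ($stop === false) {
        return null; // end marker not found after the start marker
    }
    return substr($string, $ini, $stop - $ini);
}

$fullstring = "this is my [tag]dog[/tag]";
echo get_string_between_strict($fullstring, "[tag]", "[/tag]"); // dog
```

Using strict comparisons (=== false) also removes the need to prepend a space to the input so that a match at position 0 is not mistaken for "not found".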