php - parsing html doc but issue with comparing text content [closed]


Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 9 years ago.
I'm using PHP to retrieve a document and find some data within the HTML.
I've used Tidy's clean and repair because the document contains lots of bad HTML.
In the HTML document there is a tag like:
<a href="www.google.com">Link 12345</a>
I want to get the value of the attribute (www.google.com) if the text content (Link 12345) matches a certain string.
$h2 = $doc->getElementsByTagName('a');
for ($i2; $i2 < $h2->length; $i2++) {
$attr2 = $h2->item($i2)->getAttribute('href');
if ($h2->item($i2)->textContent == "Link 12345")
print "FOUND";
}
This doesn't seem to work. I know that the loop reaches 'Link 12345' at some point, because it appears when ->textContent is printed out, but the comparison always fails. I suspect there is some issue with the encoding, but I can't get it fixed.
Thanks.

You can use PHP's DOMXPath to execute an XPath query against your DOM object.
I believe that for yours it'll be
//a[text()="Link 12345"]
This will return all the <a> elements whose text is "Link 12345".
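The XPath approach can be sketched as follows. This is a minimal example, not a confirmed fix: it assumes the comparison fails because of invisible whitespace (for instance a non-breaking space left behind by Tidy), which the question does not confirm. The function name find_hrefs_by_link_text is hypothetical.

```php
<?php
// Hypothetical helper: return the href of every <a> whose visible text equals $wanted.
function find_hrefs_by_link_text(string $html, string $wanted): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them for sloppy markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $hrefs = [];
    foreach ($xpath->query('//a[@href]') as $a) {
        // Assumption: the mismatch comes from non-breaking spaces or stray whitespace.
        $text = str_replace("\u{00A0}", ' ', $a->textContent);
        $text = trim(preg_replace('/\s+/', ' ', $text));
        if ($text === $wanted) {
            $hrefs[] = $a->getAttribute('href');
        }
    }
    return $hrefs;
}
```

Calling find_hrefs_by_link_text($html, 'Link 12345') returns an array of matching href values, which is empty when nothing matches. If the text still does not compare equal, printing bin2hex($a->textContent) shows exactly which bytes are present.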

A simple bug: you are testing "$h2->item($i2)->textContent" instead of "$h2->textContent"
Isn't it?

Related

Parse HTML Table - PHP [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I have an HTML table that I would like to parse in PHP to store into a MySQL Database. The HTML looks like this:
<tr><td>DATE</td><td>LOCATION</td><td>NAME</td></tr>
I would like to create a PHP function that returns the fields shown in capital letters as an array. Does anyone know of any PHP libraries that can do this, or should I be using a different language, as this may be complex? I don't know exactly how to do this with many tables on the page, but I am trying to parse the VEX events on RobotEvents. The table that I want to parse starts at line 465.

Take a look at the PHP HTML DOM Parser library.
To use, you can do something similar to this (not my example):
require('simple_html_dom.php');
$table = array();
$html = file_get_html('http://flow935.com/playlist/flowhis.HTM');
foreach ($html->find('tr') as $row) {
    $time = $row->find('td', 0)->plaintext;
    $artist = $row->find('td', 1)->plaintext;
    $title = $row->find('td', 2)->plaintext;
    $table[$artist][$title] = true;
}
echo '<pre>';
print_r($table);
echo '</pre>';
There are some tutorials, SO questions and interesting reads about the library. It seems to be pretty popular.
http://davidwalsh.name/php-notifications
http://net.tutsplus.com/tutorials/php/html-parsing-and-screen-scraping-with-the-simple-html-dom-library/
Looping through a table with Simple HTML DOM
how to print cells of a table with simple html dom
UPDATE FOR FINDING SPECIFIC TABLE IN HTML USING ABOVE LIBRARY
To find a particular table amongst many:
1. By class:
On line 465 of your scraped HTML, the table starts with a class catalog-listing, so:
// Pass an index so find() returns a single element rather than an array
foreach ($html->find('table[class=catalog-listing]', 0)->find('tr') as $row) {
    // extract TD data
}
2. By instance (find the 2nd table in the HTML; the index is zero-based)
foreach ($html->find('table', 1)->find('tr') as $row) {
    // extract TD data
}
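For comparison, the same row extraction can be sketched with PHP's built-in DOM extension instead of Simple HTML DOM. This is a different technique from the answer above, shown only as an illustration. The function name parse_table_rows is hypothetical, and the sketch assumes the tables are not nested.

```php
<?php
// Hypothetical helper: return each table row as an array of trimmed cell texts.
function parse_table_rows(string $html): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid, so keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->getElementsByTagName('td') as $td) {
            $cells[] = trim($td->textContent);
        }
        // Skip rows without <td> cells, such as header rows built from <th>.
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}
```

Each inner array could then be bound to a prepared INSERT statement to store the row in MySQL.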

As you're prepared to look beyond PHP, Nokogiri (Ruby) and Beautiful Soup (Python) are well-established libraries that parse HTML very well.
That doesn't imply that there are no suitable PHP libraries.

Find all occurrences of text in between two strings [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions concerning problems with code you've written must describe the specific problem — and include valid code to reproduce it — in the question itself. See SSCCE.org for guidance.
Closed 9 years ago.
I'm trying to find all text between $something and $something_else and put it into an array.
I think I need preg_match to do this, but I have been trying a lot and still have no idea how.
This should work no matter what $something and $something_else are.

You need to read the documentation of preg_match and preg_match_all.
Here's a simple example that matches whatever is inside double quotes:
<?php
preg_match_all('~"(.*?)"~',
'Hey there "I will be matched", because "I am inside the double quotes"',
$out, PREG_PATTERN_ORDER);
print_r(($out[0]));
OUTPUT :
Array
(
[0] => "I will be matched"
[1] => "I am inside the double quotes"
)
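To make this work for arbitrary delimiters, as the question asks, the two strings can be escaped with preg_quote before they go into the pattern. A minimal sketch follows; the function name find_all_between is hypothetical.

```php
<?php
// Hypothetical helper: return every piece of text found between $start and $end.
function find_all_between(string $text, string $start, string $end): array
{
    // preg_quote escapes regex metacharacters and the chosen "~" delimiter.
    $pattern = '~' . preg_quote($start, '~') . '(.*?)' . preg_quote($end, '~') . '~s';
    preg_match_all($pattern, $text, $matches);
    // Group 1 holds only the text between the delimiters.
    return $matches[1];
}
```

The lazy quantifier .*? stops at the first closing delimiter, and the s modifier lets a match span line breaks. Unlike $out[0] in the example above, the result excludes the delimiters themselves.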

Correct me if I am wrong, but we can explode one string into an array, then use preg_match_all on the other string with each word of the array. That way it will work with any string.

Dynamically replacing @ and # like Twitter [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
For my Laravel-based site, I need to find @ and # within text content and replace them with a URL, such as a URL pointing at a user's Twitter page. How can I:
reliably find these strings within text portions of the HTML
replace found instances with a URL

The code for it is vast. You will have to use Ajax here. In the textarea/textbox you will have to use the "onkeyup" event: every key pressed has to be compared with "@" or "#", and then the characters right after "@" have to be searched for in the database.
So let's say the user has typed "@A" so far and aims to type "@Ankur". As soon as "A" is typed, the Ajax script starts searching for users in the database, retrieves the matching name and URL, and you just have to echo them on the screen.

This is what you are looking for: https://stackoverflow.com/a/4277114/829533
$strTweet = preg_replace('/(^|\s)#(\w*[a-zA-Z_]+\w*)/', '\1#\2', $strTweet);
And https://stackoverflow.com/a/4766219/829533
$input = preg_replace('/(?<=^|\s)#([a-z0-9_]+)/i', '#$1', $input);

Using regex it's rather simple. I would make one function that takes a prefix and replacement, like so.
function tweetReplaceObjects($input, $prefix, $replacement) {
    // Quote the prefix so it is matched literally, and match one or more name characters.
    return preg_replace('/' . preg_quote($prefix, '/') . '([A-Za-z0-9_-]+)/', $replacement, $input);
}
An example usage would be something like this.
$text = 'hey look, it\'s #stackoverflow over there';
// expected output:
// hey look, it's stackoverflow over there
echo tweetReplaceObjects($text, '#', '$1');
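Building on the regex idea, here is a hedged sketch that turns Twitter-style @mentions and #hashtags into links in one pass. It assumes the input is plain text rather than HTML. The function name and the two URL patterns are illustrative assumptions, not something the answers above specify.

```php
<?php
// Hypothetical helper: escape plain text, then wrap @mentions and #hashtags in links.
function link_mentions_and_hashtags(string $text): string
{
    // Escape first so user-supplied text cannot inject markup.
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // The lookbehind skips email-like "a@b" and entities such as "&#039;".
    return preg_replace_callback(
        '/(?<![\w&])([@#])(\w+)/',
        function (array $m): string {
            // Assumed URL layout; adjust to whatever the site links to.
            $base = $m[1] === '@'
                ? 'https://twitter.com/'
                : 'https://twitter.com/hashtag/';
            return '<a href="' . $base . rawurlencode($m[2]) . '">' . $m[1] . $m[2] . '</a>';
        },
        $escaped
    );
}
```

In a Laravel Blade template the result would have to be printed unescaped, because the function already escapes the text and returns HTML.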

Make every word in a string into classes [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I am trying to create a CMS where I can add photos to my website, and for the photo part I want to be able to enter which programs I used to create each photo.
It should be possible to add more than one program.
Right now I am saving the programs in the database like this:
In the programs row:
photoshop illustrator indesign
Then on my website, it would be nice if the icons/logos of the programs used could show up next to the photo.
So my question is: how do I create a new div with a new class for each word in the programs row?
For example:
photoshop illustrator indesign
Becomes:
<div class="photoshop"></div>
<div class="illustrator"></div>
<div class="indesign"></div>
Hope you guys can help me with this problem :)
Thanks ;)

Use the explode function and a foreach loop to perform the string manipulation.
$programs = explode(" ", $data);
foreach($programs as $value) {
//Echo out the html - $value contains the program name
}
I leave it to you to figure out how to format the program name with the HTML that you need.

$fx = "photoshop illustrator indesign";
$words = explode(" ", $fx);
array_walk($words, function(&$n) {
    echo '<div class="'.$n.'"></div>';
});
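The explode-and-loop idea above can be written as a small function that returns the markup instead of echoing it. This is only a sketch: the name programs_to_divs is hypothetical, and it assumes the program names are separated by whitespace.

```php
<?php
// Hypothetical helper: turn "photoshop illustrator" into one <div> per program name.
function programs_to_divs(string $programs): string
{
    // Split on any run of whitespace and drop empty pieces from stray spaces.
    $names = preg_split('/\s+/', trim($programs), -1, PREG_SPLIT_NO_EMPTY);

    $html = '';
    foreach ($names as $name) {
        // Escape the name because it ends up inside an HTML attribute.
        $html .= '<div class="' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '"></div>' . "\n";
    }
    return $html;
}
```

Splitting with preg_split rather than explode(" ", ...) avoids empty class names when the stored value contains double spaces.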

In PHP, how can I use the DOMDocument class to replace the src= attribute of an IMG tag? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions concerning problems with code you've written must describe the specific problem — and include valid code to reproduce it — in the question itself. See SSCCE.org for guidance.
Closed 9 years ago.
It's often useful to be able to swap out the src= attribute of an HTML IMG tag without losing any of the other attributes. What's a quick, non-regex way of doing this?
The reasons I don't want to use RegEx are:
It's not very readable. I don't want to spend 20 mins deciphering a pattern every time I need to account for a new case.
I am planning on modifying this function to add in width and height attributes when they're missing. A simple RegEx string replacement won't be easy to modify for this purpose.
Here's the context: I have a bunch of RSS feed posts that each contain one image. I would like to replace these images with blank images, but keep the HTML otherwise unaffected:
$raw_post_html = "<h2>Feed Example</h2>
<p class='feedBody'>
<img src='http://premium.mofusecdn.com/6ff7098b3c8561d70c0af16d30e57d4e/cache/other/48da8425bc54af2d5d022f28cc8b021c.200.0.0.png' alt='Feed Post Image' width='350' height='200' />
Feed Body Content
</p>";
echo replace_img_src($raw_post_html, "http://cdn.company.org/blank.gif");

This is what I've come up with. It uses the PHP DOM API to create a tiny HTML document, then returns the markup for just the IMG element.
function replace_img_src($original_img_tag, $new_src_url) {
    $doc = new DOMDocument();
    $doc->loadHTML($original_img_tag);
    $tags = $doc->getElementsByTagName('img');
    // Check ->length: count() on a DOMNodeList is only reliable on newer PHP versions.
    if ($tags->length > 0) {
        $tag = $tags->item(0);
        $tag->setAttribute('src', $new_src_url);
        return $doc->saveHTML($tag);
    }
    return false;
}
Note: In versions of PHP before 5.3.6, $doc->saveHTML($tag) can be changed to $doc->saveXML($tag).
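The question also asks to keep the surrounding HTML otherwise unaffected. A hedged variant that returns the whole fragment, with every img src swapped, might look like the sketch below. The function name replace_all_img_src and the wrapper div are assumptions for illustration. It also assumes ASCII or entity-encoded input, because loadHTML does not default to UTF-8.

```php
<?php
// Hypothetical helper: swap the src of every <img> and return the full fragment.
function replace_all_img_src(string $html, string $new_src_url): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a container so its children are easy to serialise again.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('img') as $img) {
        $img->setAttribute('src', $new_src_url);
    }

    // The wrapper is the first <div> in document order.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Other attributes such as alt, width and height are preserved, although the serialiser normalises quoting, so attribute values come back in double quotes.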
