Get contents of a changing DIV with static ID - php

I am trying to make myself a homepage, for my personal use only, and what I want to do is display pieces of information from different websites that change a few times a day, e.g. news, weather and such. I want to have my favorite information always in sight without needing to visit many pages. Since many websites won't load inside an iframe, which was the first thing I tried, I figured PHP might be able to help me.
So what I need to do is get the contents of a DIV and place it within my page with PHP.
The DIV on the source page is generated on the server, but it always has the same ID.
example:
<div id="nowbox">
<a href="http://www.seznam.cz/jsTitleExecute?id=91&h=19331020">
<img width="135" height="77" src="http://seznam.cz/favicons/title//009/91-JrAEVc.jpg" alt="" /></a>
<div class="cont"> <ul> <li>
<strong>Sledujte dnes od 20.00 koncert Tata Bojs</strong>
<p>Nenechte si ujít tradiční benefiční koncert kapely Tata Bojs. Sledujte představení na Seznam.cz</p> </li> </ul>
</div>
</div>
So the ID of the DIV is "nowbox", and I need to copy everything inside it and put it in my page.
So far I was only able to use this:
$contents = file_get_contents("http://seznam.cz");
and view the entire contents of the page, but I have no idea how to strip everything else and keep only the DIV I need.
I am not very experienced with PHP, so I would be very grateful for any help; the easier to understand, the better.
EDIT:
Thanks for the answers. Basically I just wanted to get the code I posted as an example into a variable so I could echo it somewhere on my page. The problem is that the code changes along with the rest of the website, and only a few things stay the same, e.g. the DIV ID.
Definitely NOT the most elegant solution (even I know that, but since the website is for my own use only it shouldn't matter), but the one I managed to get working is this: I fetched the whole page with
$contents = file_get_contents("http://seznam.cz");
and then used strpos() to find the offset of a unique spot in the code, plus/minus a static number of characters I could count manually. Then I split the string into arrays, discarded the parts I didn't need so that the wanted code started at the beginning of a string, and used the same method to cut the string off after the code ended.
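For what it's worth, the same counting trick can be written with strpos() and substr() instead of exploding into arrays. A rough sketch follows; the end marker is a made-up assumption (since #nowbox contains nested divs you cannot simply stop at the first </div>), and the whole approach breaks as soon as the surrounding markup shifts:
$contents = file_get_contents("http://seznam.cz");

// Start of the box we want; this marker is stable because the ID never changes.
$start = strpos($contents, '<div id="nowbox">');
// Hypothetical unique marker that appears right after the box on the real page.
$end = strpos($contents, '<div id="whatever-comes-next">', $start);

if ($start !== false && $end !== false) {
    echo substr($contents, $start, $end - $start);
}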

If you want to do this server side, I suggest you use phpQuery:
require('phpQuery/phpQuery.php');
$doc = phpQuery::newDocumentFileXHTML('http://seznam.cz');
$html = pq('#nowbox')->htmlOuter();

I'm not sure I fully understand what you want to achieve and why, but you can do this on either the client or the server. First, the client-side way:
What you're asking for is to extract part of the DOM. With JavaScript + jQuery on the client side you can do this very quickly, simply by calling something like $("#target").load("/mypage #nowbox") (note this only works for pages on your own domain, because of the browser's same-origin policy).
This can be achieved on the server side as well, using any PHP DOM manipulation library: either the one bundled with PHP (DOMDocument) or one of the easier-to-use libraries such as simplehtmldom (which is a bit leaky memory-wise).
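For instance, a minimal sketch using only the bundled DOM extension (no third-party library), assuming the page parses well enough for DOMDocument to find the element:
$doc = new DOMDocument();
// The @ suppresses warnings about imperfect real-world markup.
@$doc->loadHTMLFile('http://seznam.cz');

$xpath = new DOMXPath($doc);
$nowbox = $xpath->query('//div[@id="nowbox"]')->item(0);

if ($nowbox !== null) {
    // saveHTML() with a node argument returns that node's outer HTML (PHP 5.3.6+).
    echo $doc->saveHTML($nowbox);
}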
So there you have it: options for both the client and the server; pick whichever suits your needs best.
Please note that the source page's CSS rules will not be applied with either method, as its stylesheets won't be loaded into your DOM.
Good Luck!

Related

How can I massively delete old styles applied in Wordpress posts with REGEX?

Good day!
I have a massive magazine website that I've just migrated from Divi to X Pro. Inside every post there's a suggestion linking to another post, and there are a bit more than 10,000 posts in total, so this is not something editors can fix manually in every post. This element was added inside the post content:
<blockquote>
<h3>Te sugerimos</h3>
<p class="entry-title"><strong>POST TITLE</strong></p>
</blockquote>
It should be just an h3 tag, and then a p tag without that entry-title class and, of course, without the blockquote tag.
That code is only part of each post. Back on the old Divi website, editors wrote posts normally using the native WP WYSIWYG editor. It was Divi, for reasons I don't know, that applied all these... styles? Anyway, everything carried over to this X-Pro-based website once I did the migration.
Here I check every post in the WP WYSIWYG editor and it seems normal, yet when I view any article online it still shows that big chunk of text. It's only when I check the HTML tab in the post editor that I see all that garbage code.
To get rid of all that I'm thinking about using a regex, but honestly I have no idea how to tell a regex to delete every class="entry-title" from a p tag that sits inside a blockquote tag, and to delete the blockquote itself too, but only when it has all those elements inside.
This would be a life saver. I'm going crazy here.
Thanks in advance!
Let's define the matching regular expression (PCRE-compatible), first of all:
~<blockquote>\s*(.+?)<p class="entry-title">(.+?)<\/blockquote>~s
See live at RegExr; click "explain" to understand the expression. Then our replacement:
\1<p>\2
Then, here's a test block with added surrounding content:
<blockquote>
<h3>Te sugerimos</h3>
<p class="entry-title"><strong>POST TITLE</strong></p>
</blockquote>
<p>Other stuff</p>
<blockquote>Not matched</blockquote>
When the regex above is applied, for example as in preg_replace($pattern, $replace, $content), the above block transforms into:
<h3>Te sugerimos</h3>
<p><strong>POST TITLE</strong></p>
<p>Other stuff</p>
<blockquote>Not matched</blockquote>
Which I assume is your desired output.
Now, how to apply this to all your content? You have three basic options:
Use MySQL's REGEXP_REPLACE function (available in MySQL 8.0+ and MariaDB 10.0.5+), whether in the terminal, in phpMyAdmin, or from a PHP script. See How to do a regular expression replace in MySQL? for usage examples, then adapt them to your database structure.
Handle the cleanup in PHP (see the sketch below): run a select query for all posts containing this pattern, modify the content with preg_replace, then update the database rows.
Download a database dump, open it up in your favorite text editor (with regex support), or pipe it into your tool of choice, and do the necessary replacements; finally reload into your database. (You may want to have your site in maintenance mode while this is happening!)
Whichever way you choose to do this, be sure to backup your data first.
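If you go the PHP route, a minimal sketch could look like the one below. The table name, prefix, and credentials are assumptions for illustration (a stock WordPress install uses wp_posts); run it against a copy of the database first:
<?php
// Rough sketch of the PHP cleanup option. Assumes a stock WordPress wp_posts
// table with the default "wp_" prefix and local credentials -- adjust both.
$pdo = new PDO('mysql:host=localhost;dbname=wordpress;charset=utf8mb4', 'user', 'pass');

$pattern = '~<blockquote>\s*(.+?)<p class="entry-title">(.+?)<\/blockquote>~s';
$replace = '$1<p>$2';

// Only touch posts that actually contain the offending markup.
$select = $pdo->query(
    "SELECT ID, post_content FROM wp_posts
     WHERE post_content LIKE '%class=\"entry-title\"%'"
);
$update = $pdo->prepare('UPDATE wp_posts SET post_content = ? WHERE ID = ?');

foreach ($select as $row) {
    $cleaned = preg_replace($pattern, $replace, $row['post_content']);
    if ($cleaned !== null && $cleaned !== $row['post_content']) {
        $update->execute(array($cleaned, $row['ID']));
    }
}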

Grab Text from a URL Using HTML?

I run my own game, and I can use PHP to get an updated value of how many users are online at a current time. I want to create an updating string of text that shows how many users are online. In game it's programmed to update the value every 20 seconds.
The problem is that my website can only use HTML, and that's about as far as it goes for how much customization I have. The other option is Flash, which I have zero clue on how to use.
The HTML doesn't seem to work with PHP inside of it, so I'm really unsure of how to approach this.
I just need the html to grab the text that outputs from a PHP url from my website, basically in the same way you use html to grab an image. It's 100% readable, and it's just a single string that I need to grab to show how many users are online. : ( Is there any way to do this or am I out of luck?
You can try <iframe src="#url"></iframe> or <embed src="#url"></embed>, with #url replaced by the URL of your PHP page, if you want this done using only HTML.

Running preg_replace on html code taking too long

At the risk of getting redirected to this answer (yes, I read it and spent the last 5 minutes laughing out loud at it), allow me to explain this issue, which is just one in a list of many.
My employer asked me to review a site written in PHP, using Smarty for templates and MySQL as the DBMS. It's currently running very slowly, taking up to 2 minutes (with an entirely white screen the whole time, no less) to load completely.
Profiling the code with Xdebug, I found a single preg_replace call that takes around 30 seconds to complete; it goes through all of the HTML code and replaces each URL it finds with its SEO-friendly version. The moment it completes, it outputs all of the code to the browser. (As I said before, that's not the only issue; the code is rather old, and it shows, but I'll focus on this one for this question.)
Digging further into the code, I found that it currently runs through 1702 patterns and their corresponding replacements (matches and replacements in equally sized arrays), which would certainly account for the time it takes.
Code goes like this:
//This is just a call to a MySQL query which gets the relevant SEO-friendly URLs:
$seourls_data = $oSeoShared->getSeourls();

$url_masks = array();
$seourls = array();
foreach ($seourls_data as $seourl_data)
{
    if ($seourl_data["url"])
    {
        $url_masks[] = "/([\"'\>\s]{1})".$site.str_replace("/", "\/", $seourl_data["url"])."([\#|\"'\s]{1})/";
        $seourls[] = "$1".MAINSITE_URL.$seourl_data["seourl"]."$2";
    }
}

//After filling both $url_masks and $seourls arrays, the HTML is parsed:
$html_seo = preg_replace($url_masks, $seourls, $html);
//After it completes, $html_seo is simply echo'ed to the browser.
Now, I know the obvious answer to the problem is: don't parse HTML with a regexp. But then, how to solve this particular issue? My first attempt would probably be:
Load the (hopefully well-formed) HTML into a DOMDocument, then get the href attribute of each a tag.
Go through each node, replacing the URL found with its appropriate match (which would probably mean using the previous regexps anyway, but on much smaller strings).
???
Profit?
but I think it's most likely not the right way to solve the issue.
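For reference, a rough, untested sketch of that first attempt; it assumes the data from getSeourls() can be reshaped into a plain old-URL => SEO-URL lookup, and it only covers href attributes on a tags:
// Build a lookup table once, so each href is a single array access
// instead of 1702 regex passes.
$map = array();
foreach ($seourls_data as $seourl_data) {
    if ($seourl_data["url"]) {
        $map[$site . $seourl_data["url"]] = MAINSITE_URL . $seourl_data["seourl"];
    }
}

$doc = new DOMDocument();
@$doc->loadHTML($html); // @ hides warnings about imperfect markup

foreach ($doc->getElementsByTagName('a') as $a) {
    $href = $a->getAttribute('href');
    if (isset($map[$href])) {
        $a->setAttribute('href', $map[$href]);
    }
}

$html_seo = $doc->saveHTML();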
Any ideas or suggestions?
Thanks.
As your goal is to be SEO-friendly, using a canonical tag in the target pages would tell the search engines to use your SEO-friendly URLs, so you wouldn't need to replace them in your code at all...
Ouch, that's really tough; the strategy was bad from the beginning, though that's not your fault. I have a few suggestions:
1. Set up caching with Smarty (see the sketch below). The first request may still take 2 minutes to generate the HTML, but subsequent requests are served from a static cached copy.
2. Don't postpone what should have been done earlier: fix the system. Create a database migration that stores the SEO URLs in a sane format, or generate them from the titles or whatever. On my system I generate SEO links in this format:
www.whatever.com/jobs/722/drupal-php-developer
where 722 is the ID used (by parsing the URL) to fetch the right page content, and "drupal-php-developer" is the title of the post.
3. (Not really a suggestion.) Tell your client that the project is not well engineered (if you truly believe so) and needs restructuring to improve performance.
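A minimal sketch of the Smarty caching idea from point 1, assuming Smarty 3; the cache lifetime, the template name page.tpl, and the build_page_html() helper are placeholders:
require('libs/Smarty.class.php');

$smarty = new Smarty();
$smarty->setCaching(Smarty::CACHING_LIFETIME_CURRENT);
$smarty->setCacheLifetime(3600); // serve the cached HTML for an hour

if (!$smarty->isCached('page.tpl')) {
    // Only do the expensive work (queries, the slow preg_replace, ...)
    // when the cache is stale.
    $smarty->assign('html_seo', build_page_html()); // hypothetical helper
}

$smarty->display('page.tpl');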

Intelligently grab first paragraph/starting text

I'd like to have a script where I can input a URL and it will intelligently grab the first paragraph of the article... I'm not sure where to begin other than just pulling text from within <p> tags. Do you know of any tips/tutorials on how to do this kind of thing?
update
For further clarification, I'm building a section of my site where users can submit links like on Facebook, it'll grab an image from their site as well as text to go with the link. I'm using PHP and trying to determine the best method of doing this.
I say "intelligently" because I'd like to try to get content on that page that's important, not just the first paragraph, but the first paragraph of the most important content.
If the page you want to grab is foreign, or even if it is local but you don't know its structure in advance, I'd say the best way to achieve this is with the PHP DOM functions:
function get_first_paragraph($url)
{
    $page = file_get_contents($url);
    $doc = new DOMDocument();
    @$doc->loadHTML($page); // @ hides warnings on imperfect markup

    /* Gets all the paragraphs */
    $p = $doc->getElementsByTagName('p');

    /* Extracts the first one */
    $p = $p->item(0);

    /* Returns the paragraph's content */
    return $p ? $p->textContent : null;
}
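Used, for example, like this (the URL is just a placeholder):
echo get_first_paragraph('http://www.example.com/some-article');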
Short answer: you can't.
In order to have a PHP script "intelligently" fetch the "most important" content from a page, the script would have to understand the content on the page. PHP is no natural language processor, nor is this a trivial area of study. There might be some NLP toolkits for PHP, but I still doubt it would be easy then.
A solution that can be achieved with reasonable effort would be to fetch the entire page with an HTML parser and then look for elements with certain class names or IDs commonly used by blog engines. You could also parse for hAtom microformats, or look for meta tags within the document and other more clearly defined information.
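As a small sketch of the meta-tag idea, assuming the page exposes a standard description or Open Graph description tag, with the first <p> as a fallback:
function get_page_summary($url)
{
    $doc = new DOMDocument();
    @$doc->loadHTML(file_get_contents($url)); // @ hides warnings on messy HTML

    // Prefer an explicit description if the page provides one.
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $name = strtolower($meta->getAttribute('name') ?: $meta->getAttribute('property'));
        if ($name === 'description' || $name === 'og:description') {
            return trim($meta->getAttribute('content'));
        }
    }

    // Fall back to the first paragraph.
    $p = $doc->getElementsByTagName('p')->item(0);
    return $p ? trim($p->textContent) : null;
}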
I wrote a Python script a while ago to extract a web page's main article content. It uses a heuristic to scan all text nodes in a document and group together nodes at similar depths, and then assume the largest grouping is the main article.
Of course, this method has its limitations, and no method will work on 100% of web pages. This is just one approach, and there are many other ways you might accomplish it. You may also want to look at similar past questions on this subject.

How can I convert language of a div?

I am currently working on a project where I need to switch the text from English to Japanese on a button click. The text is in a div, like this:
<div id="sampletext"> here is the text </div>
<div id="normaltext"> here is the text </div>
The text comes from a database. How can I convert this text easily?
Assuming that you have both the English and the Japanese version in the database, you can do two things:
Use AJAX to load the correct text from the database and replace the contents of the div. There are tons and tons of tutorials on the internet about AJAX content replacement.
Put both languages on the website and hide one using CSS display:none. Then use some JavaScript to hide/display the correct div when a button is clicked.
The first is technically more complex but keeps your page size small. The second one is very easy to do, but your page size is larger because you need to send both languages.
If the div is small and there is only one or two of these on the page, I recommend number two, the CSS technique. If the div is large (i.e. a complete article) or there are many of them then use the first method.
If you mean translating the text, you cannot do it easily. To get some idea of the best attempts that software can make at translating natural languages, go to Google Translate or Babelfish. It's not that good, but it's sometimes an intelligible starting point.
If you just mean setting the language attribute on an element, then assign a new language code to the lang property of the div element object.
document.getElementById("normaltext").lang = "en-US";
The language code for Japanese is ja (or ja-JP for Japanese as used in Japan).
Assuming your literals have an ID in your database, you could put that ID as a class on your div. Then fetch the ID with jQuery, send it to your AJAX back-end, and fetch the translated text.
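On the server side, the back-end for that AJAX call could be as small as the sketch below; the file name, table, and column names are made up for illustration:
<?php
// translate.php -- hypothetical endpoint; expects ?id=<literal id>&lang=ja|en
$pdo = new PDO('mysql:host=localhost;dbname=site;charset=utf8mb4', 'user', 'pass');

$lang = ($_GET['lang'] === 'ja') ? 'ja' : 'en';
$stmt = $pdo->prepare('SELECT text FROM translations WHERE literal_id = ? AND lang = ?');
$stmt->execute(array((int) $_GET['id'], $lang));

header('Content-Type: text/plain; charset=utf-8');
echo $stmt->fetchColumn();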
First of all, if you have the texts in a database it really doesn't matter whether you render them in divs, tables or whatever.
What you need is a PHP API for some translation service. Here is just an example that might give you some ideas (both functions are placeholders):
<?php $textArray = getTextForThisPage(); ?>
...
<?php echo english_to_japanese($textArray["text1"]); ?>
...
