I have a site that gets content from other sites through some JSON and XML APIs. To prevent loading problems and problems with API limits, I do the following:
PHP - Show the cached content with PHP, if any.
PHP - If the content has never been cached, show an empty error page and return 404. (The second time the page loads it will be fine, returning 200.)
Ajax - If a date field does not exist in the database, or the current date is earlier than the stored date, load/add content from the API and add a future date to the database. (This makes the page load fast, and the Ajax caches the content AFTER the page is loaded.)
I use Ajax just to run the PHP file; I get the content with PHP.
Questions
Because I cache the content AFTER the page was loaded, the user will see the old content. What is the best way to show the NEW content to the user? I'm thinking of automatically reloading the page with JavaScript, or showing a nag message. Are there other preferred ways?
If I use many APIs, the Ajax load time will be long and there is a bigger risk that some error will occur. Is there a clever way of splitting the load?
The second question is the important one.
Because I cache the content AFTER the page was loaded, the user will see the old content. What is the best way to show the NEW content to the user? I'm thinking of automatically reloading the page with JavaScript, or showing a nag message. Are there other preferred ways?
I don't think you should reload the page via JavaScript; just use jQuery's .load(). This way the new content is inserted into the DOM without reloading the entire page. Maybe highlight the newly inserted content by adding some CSS via addClass().
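A minimal sketch of that idea, assuming the cached markup lives in a container with id="content" and that content.php returns the freshly cached version (both names are assumptions, not from the original post):

// Once the Ajax call that refreshes the cache has finished,
// pull the new markup into the page instead of reloading it
$('#content').load('content.php #content > *', function () {
    $(this).addClass('just-updated'); // highlight the refreshed block via CSS
});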
If I use many APIs, the Ajax load time will be long and there is a bigger risk that some error will occur. Is there a clever way of splitting the load?
You should not be splitting the content in the first place; you should try to minimize the number of HTTP requests. If possible, do all the API calls offline using some sort of message queue, for example beanstalkd or Redis. Also cache the data in an in-memory database such as Redis. You can get a free Redis instance thanks to http://redistogo.com. To connect to Redis To Go you should probably use Predis.
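As a rough illustration (the cache key, TTL and API URL below are placeholders, and this assumes Predis is installed via Composer):

<?php
require 'vendor/autoload.php';

// Connect to a local Redis instance (or pass your Redis To Go URL to the client)
$redis = new Predis\Client();

$key  = 'api:example-feed';
$data = $redis->get($key);

if ($data === null) {
    // Cache miss: call the external API once and keep the result for 10 minutes
    $data = file_get_contents('https://api.example.com/feed.json');
    $redis->setex($key, 600, $data);
}

echo $data;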
Why not use the following structure:
Have AJAX load content.php
And in content.php (a rough sketch follows this list):
content is cached and the stored date is still fresh > return the cached content
content is cached, but it's older > reload the content from the external source > return the content
there is no cached content > load the content from the external source > return the content
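A rough PHP sketch of that flow, with assumed table and column names (api_cache, content, expires_at) and an assumed API URL:

<?php
// content.php - sketch of the flow described in the list above
$pdo = new PDO('mysql:host=localhost;dbname=mysite', 'user', 'pass');

$row = $pdo->query('SELECT content, expires_at FROM api_cache WHERE id = 1')->fetch();

if ($row && strtotime($row['expires_at']) > time()) {
    // Cached and still fresh: return it as-is
    echo $row['content'];
    exit;
}

// Stale or missing: refresh from the external API, store it, then return it
$content = file_get_contents('https://api.example.com/feed.json');
$stmt = $pdo->prepare('REPLACE INTO api_cache (id, content, expires_at) VALUES (1, ?, ?)');
$stmt->execute([$content, date('Y-m-d H:i:s', time() + 3600)]);

echo $content;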
And for your second question: it depends on how often the content of the APIs needs to be refreshed. If it's daily, you could run a script at night (or whenever the fewest people are active) to get all the new content and then serve that content during the day. This way you minimize the calls to external resources during peak hours.
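For instance, a single cron entry could run the refresh during the quiet hours (the script path here is hypothetical):

# Refresh all cached API content at 03:00 every night
0 3 * * * php /var/www/scripts/refresh_api_cache.php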
If you have access to multiple servers, the clever way is splitting the load: have each server handle a part of the requests.
Related
There is a site that I want to scrape: https://tse.ir/MarketWatch.html
I know that I have to use:
file_get_contents("https://examplesite.html")
to get the HTML of the site, but how can I find a specific part of the site, for example this part, in the text file:
<td title="دالبر" class="txt namad">دالبر</td>
When I open the text file, I never see this part, and I think it is because the website uses a JavaScript file. How can I get all of the website's information, including every part I want?
The content is loaded by an AJAX request via JavaScript. This means you can't get this data by simply grabbing the page contents.
There are two ways of collecting the data you need:
Use a solution based on Selenium WebDriver to load the page in a real browser (which will execute the JS), and collect the data from the rendered DOM.
Research what kind of requests the website sends to get this data. You can use the network activity tab in your browser's dev tools; here is an example for Chrome, and it is the same or similar for other browsers. Then send the same request yourself and parse the response according to your needs.
In your specific case you could probably use this URL: https://tseest.ir/json/MarketWatch/data_211111.json to access the JSON object with the data you need.
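For example, a minimal PHP sketch (note that this endpoint and the shape of its response may change at any time, so treat the URL as an assumption to verify):

<?php
// Fetch the JSON feed mentioned above and decode it into a PHP array
$json = file_get_contents('https://tseest.ir/json/MarketWatch/data_211111.json');
if ($json === false) {
    die('Request failed');
}

$data = json_decode($json, true);
if ($data === null) {
    die('Response was not valid JSON');
}

// Inspect the structure first, then pick out the fields you need
print_r($data);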
You have three options for scraping the data:
There's an export to an Excel file: https://tse.ir/json/MarketWatch/MarketWatch_1.xls?1582392259131. Parse through it; just remember that the number in the query string is a Unix timestamp in milliseconds (the first 10 digits are the seconds-since-epoch value, which encodes the date and time).
Also, there's probably a refresh function for the market data somewhere in the .js files loaded on the page. Just find it and see if you can connect directly to the source (usually a .json).
Download the page at your chosen interval and scrape each table row using PHP's DOMXPath::query (a sketch follows below).
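A sketch of that last option; the XPath expressions and the saved file name are assumptions, so adjust them to the table's actual markup:

<?php
$html = file_get_contents('marketwatch.html'); // the page you downloaded at your interval

$doc = new DOMDocument();
libxml_use_internal_errors(true); // the page will not be perfectly valid markup
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Walk every table row and collect the text of its cells
foreach ($xpath->query('//table//tr') as $row) {
    $values = [];
    foreach ($xpath->query('.//td', $row) as $cell) {
        $values[] = trim($cell->textContent);
    }
    if ($values) {
        print_r($values);
    }
}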
As you know, infinite scroll makes repeated AJAX requests to get new content, and these currently hit Apache directly because the call is a POST and it carries cookies. We store the last displayed item for each visitor in the session, which is why we need the session and hence the cookies.
We would like to take advantage of Varnish caching, so we are looking to improve this, and we are wondering what our options are here, as we need to do without cookies and without POST (so there is no real user identification).
We store in the session the last displayed item for each visitor
You can pass this information as a query string in the URL of the next page. Also, don't use POST for loading the next page; use GET requests.
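A minimal sketch of that idea, using hypothetical names (an /items endpoint, a last_id parameter and an #item-list container):

// Request the next page with a plain, cache-friendly GET.
// The last displayed item travels in the query string instead of a cookie,
// so Varnish can cache each page of results by URL.
$.get('/items', { last_id: lastDisplayedId }, function (html) {
    $('#item-list').append(html);
});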
I have used caching with Infinite Scroll based on the example code provided on the GitHub page here; the part we specifically need to look at is as follows...
nextSelector: "div.navigation a:first",
navSelector: "div.navigation",
The next 'section' loaded by the infinite scroll is picked up by reading a link and getting the page contents.
As far as I know, it uses jQuery's .load() feature, whose documentation states the following...
Request Method
The POST method is used if data is provided as an object; otherwise, GET is assumed.
Therefore most standard caching techniques should work fine. I hope this helps; although I'm not familiar with Varnish, this should point you in the right direction.
Following the code above, each link picked up by nextSelector can contain GET parameters for dynamic content.
I read something a while ago (but have lost the source) that explained how it was possible (with PHP) to transfer data (content) via things such as cookies, header fields, etc...
The PHP script would write the content to, let's say, cookies, and then the browser would read from the cookie and display the content using JavaScript. In this way you could deliver content without making another request to the web server and without refreshing the page.
Is that possible? If so, what are the options and what are the limitations? (e.g. the default time-out on a PHP script is 30 seconds...)
I don't understand why you would want to do something like this, because it massively increases page load times. It is possible to do such things, though.
I wouldn't use cookies, though. If I had to build a structure like that, I would load the different pages into DIVs with IDs and set their CSS display to none by default; to "load" a different page, I would simply set the previous page's DIV to display: none and the newly loaded DIV to display: block.
But I guess there are many solutions to this problem.
I have a webpage that uses AJAX, MySQL, and a simple PHP file to grab content for the body of my site and place it in the body of the page. Basically, the entire site is one dynamic page that utilizes jQuery and the history plugin to keep all the links bookmarkable and Back/Forward capable.
I want to optimize my site to use the least amount of resources possible (server-side). Right now, any time someone clicks a link to another "page" on my site, the PHP file is called, a database connection is created, the content is grabbed from the database, and then it is placed on the page with JavaScript.
Would it be better to instead have the PHP file grab a cached file that contains the content and then send that to the browser?
I still want my pages to be as up to date as possible, though, so I was thinking of instead adding a column to my content table that stores its modification date and, if the cached file is older, loading the data from the table and replacing the cached file. However, that would make the PHP script both create a database connection AND check the file modification time of the cached file.
What's the best solution?
When you update the data in the database, also remove any cached version of the corresponding data.
This way the PHP file can check whether a cached version of the file exists. If there isn't one, connect to the DB, cache that data and return it; otherwise just return the cached version. This way you only establish a database connection if no cached version exists.
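A minimal sketch of that approach; the cache directory, table and column names here are assumptions:

<?php
$page      = preg_replace('/[^a-z0-9_-]/i', '', $_GET['page'] ?? 'home');
$cacheFile = __DIR__ . "/cache/{$page}.html";

if (is_file($cacheFile)) {
    readfile($cacheFile); // cached version exists: no database connection needed
    exit;
}

// No cached version: fetch from the database, write the cache, return the content
$pdo  = new PDO('mysql:host=localhost;dbname=mysite', 'user', 'pass');
$stmt = $pdo->prepare('SELECT body FROM pages WHERE slug = ?');
$stmt->execute([$page]);
$body = $stmt->fetchColumn();

file_put_contents($cacheFile, $body);
echo $body;

When a row is updated, unlink() the matching cache file so the next request rebuilds it.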
Can I stop or interrupt a PHP script for a particular time?
I would like to implement logic like the following:
On mydomain.com/index.php a Flash intro for a product will appear for, say, 20 seconds. Once it completes, the home page of the site will appear on the same index.php page.
I am not familiar with Flash (ActionScript).
Is this possible in PHP or JavaScript?
Usually "splash pages", as the're called, are made up of a seperate page.
In flash you can use the following code (Actionscript 3). Put it int the last frame, or use an event listener to redirecrect when the file is finished. The actual redirect looks like this:
getURL("http://www.woursecondpagehere.com", "_self")
Where you place it is up to you.
EDIT: I think this is a reliable solution because it guarantees (if implemented correctly) that the page won't move on until the Flash is done. CSS and JavaScript will work fine too.
There isn't a need to interrupt PHP in the scenario given. I think what you actually want is to load the rest of the HTML after a certain event occurs.
If that's the case, then you can use AJAX to load the additional HTML from the server, or you can use CSS to hide that content and show it after a certain point.
The META refresh tag is probably not what you want, since it will redirect the user after 20 seconds regardless of how long it took to load and play your Flash file. Since the speed of the user's connection cannot be reliably predicted, you would probably end up with a poor user experience.
What you want to do is definitely possible but it will involve some interaction between the Flash object and the rest of your page. If you could do as Moshe suggested and simply have the Flash object redirect the user's browser to your actual home page with content on it, that would be easier.
If you insist on keeping everything on the same page, one way to do it is to call a JavaScript function from the Flash object once it's finished playing. The function you call should hide the Flash object and/or its container and display the container (a DIV) with all of the content that you're ready to show.
See this Codefidelity blog post for a quick tutorial on how to call JS functions from Flash.
Also, to clarify, you won't be interrupting or changing when your PHP script runs; that runs on the server before the page is created and sent back to the user's browser. All you need to do is structure the HTML/CSS of your page to have two DIVs: one with the Flash object in it and the other with all your normal page content. Set the CSS to show only the DIV with the Flash object, then finally use JavaScript to hide that DIV and show the one with the content in it (a sketch follows).
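A sketch of that structure; the introFinished() function name is hypothetical, and the Flash movie would call it via ExternalInterface (or you could trigger it from a timer) when the intro ends:

<div id="intro">
    <!-- Flash object/embed goes here -->
</div>
<div id="content" style="display: none;">
    <!-- Normal page content goes here -->
</div>

<script>
// Called when the intro finishes: swap which DIV is visible
function introFinished() {
    document.getElementById('intro').style.display = 'none';
    document.getElementById('content').style.display = 'block';
}
</script>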
Try this,
Write your Flash (splash screen) <embed> code in index.html and simply use a JavaScript redirect together with a JavaScript timer function to redirect to index.php, where your actual content is.
something like...
window.location = "http://your.domain.name/index.php"
or
location.href = "http://your.domain.name/index.php"
Use setTimeout() to call the redirect after the specified time.
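For example (the 20-second delay comes from the question, and the URL is the placeholder used above):

// Redirect to the real home page once the intro has had time to play
setTimeout(function () {
    window.location = "http://your.domain.name/index.php";
}, 20000); // 20 seconds = 20000 milliseconds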