Wow, I hope I have written the title correctly, because I really have no idea what this is called.
Let me explain what I am looking for.
I have a simple application, and it contains the following:
- Frontpage (salespage)
- Admin area
- Member area
- Database to provide the app with data
I have hosted the basics of this application on a server; let's call it 'www.the.app'. I have written it in PHP using Laravel.
Now I want to take the functions of the app hosted on www.the.app, like the admin area, member area, and frontpage, and use them on 'www.awesome.app' with its own database.
What would be the best way to make such a thing happen?
I am not looking for direct solutions, just for information that points me in the right direction: a name I can search on, background reading, whatever is related to this. Anything would be appreciated.
And if any more information is needed, please let me know :)
Here is how you can do it:
1) Auto-login to the site:
Note: Since you will probably need to get data from a login-protected site, you can log in automatically by using cURL with the user credentials.
To learn how to log in through cURL, take a look at:
Using PHP & Curl to login to my websites form
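A minimal sketch of that login step (the login URL and the 'username'/'password' field names here are assumptions; copy the real ones from the site's login form):
<?php
$ch = curl_init('https://www.example.com/login');
curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => http_build_query(array(
        'username' => 'your_user',   // hypothetical form field names
        'password' => 'your_pass',
    )),
    // Store the session cookie and send it back on later requests
    CURLOPT_COOKIEJAR      => '/tmp/cookies.txt',
    CURLOPT_COOKIEFILE     => '/tmp/cookies.txt',
    CURLOPT_FOLLOWLOCATION => true,
));
$loggedInPage = curl_exec($ch);
curl_close($ch);
?>
The same handle (with the same cookie file) can then be reused for the requests that need the logged-in session.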
2) Extracting data after logging in through cURL:
Note: After logging in to the site, you will need a PHP DOM library to extract data from it. You can do that, for example, with the PHP Simple HTML DOM Parser library.
PHP Simple HTML DOM Parser:
LINK: http://simplehtmldom.sourceforge.net/
Sample code for downloading all images from a link:
Note: The following code will download all the images found at the URL given in the code.
<?php
// Make sure to include the library file
include('simple_html_dom.php');

// URL to download images from
$url = "http://www.facebook.com/";

// Create DOM from URL or file
$html = file_get_html($url);

// Find all images and save each one
// (note: the .png extension is assumed here; the actual images may be in another format)
$i = 1;
foreach ($html->find('img') as $element) {
    $src = $element->src;
    $img = "/my_folder/image_" . $i . ".png";
    file_put_contents($img, file_get_contents($src));
    $i++;
}
?>
I am working to finish an API for a website (https://rushwallet.com/) on GitHub.
I am using PHP and attempting to retrieve the wallet address from this URL: https://rushwallet.com/#n3GjsndjdCURphhsqJ4mQH7AjiXlGI.
Can anyone help me?
My code so far:
$url = "https://rushwallet.com/#n3GjsndjdCURphhsqJ4mQH7AjiXlGI";
$open_url = str_get_html(file_get_contents($url));
$content_url = $open_url->find('span[id=btcBalance]', 0)->innertext;
die(var_dump($content_url));
You cannot read the correct content in this case, because you are trying to access the non-rendered page content; that is why you always read an empty string. The content is loaded only after the page has fully loaded. The page source shows:
฿<span id="btcBalance"></span>
If you want to scrape the data in this case, you need a rendering engine that can execute JavaScript. One possible engine is PhantomJS, a headless browser that is able to scrape the data after rendering.
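For example, a minimal sketch that drives PhantomJS from PHP (assuming the phantomjs binary is installed and on your PATH):
<?php
// Write a small PhantomJS script that renders the page, waits for the
// page's own JavaScript to fill in #btcBalance, and prints the result
$js = <<<'JS'
var page = require('webpage').create();
page.open('https://rushwallet.com/#n3GjsndjdCURphhsqJ4mQH7AjiXlGI', function () {
    setTimeout(function () {
        console.log(page.evaluate(function () {
            return document.getElementById('btcBalance').innerText;
        }));
        phantom.exit();
    }, 3000); // give the page's scripts a few seconds to run
});
JS;
file_put_contents('/tmp/balance.js', $js);
echo shell_exec('phantomjs /tmp/balance.js');
?>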
I was once a web designer who knew HTML/CSS. Now I'm a 3D animator, but I want to get back into the web development world.
But there's so much new to learn, e.g. flat-file CMSes. Wow!
But my question for now is how to read an API, create the right PHP file to pull an XML file, and put that data onto a web page.
Specifically, I'm interested in this mobile.de API:
http://services.mobile.de/manual/search-api.html
And it seems that this is the XML schema that I need:
http://services.mobile.de/schema/ad-1.0.xsd
What are the next steps to get this beginner's project going?
I guess I need some sort of PHP file that uses GET and some sort of authentication. How can I test whether and what data comes back?
And how do I use the pulled information to put in into a new page?
Or is my thinking all wrong?
Many thanks in advance.
Ben
You can get a basic understanding from this post and the answers on it:
How to echo xml file in php
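The general pattern looks something like this minimal sketch (the endpoint, the authentication details, and the element names below are placeholders; check the mobile.de docs for the real ones):
<?php
// Fetch the XML over HTTP with basic authentication
$context = stream_context_create(array(
    'http' => array(
        'header' => 'Authorization: Basic ' . base64_encode('user:password'),
    ),
));
$xml = file_get_contents('https://services.mobile.de/search-api/search?make=BMW', false, $context);

// Parse it and put the data onto the page
$result = simplexml_load_string($xml);
foreach ($result->ad as $ad) { // 'ad' and 'title' are placeholder element names
    echo '<p>' . htmlspecialchars((string) $ad->title) . '</p>';
}
?>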
If you don't mind using an existing library, please check: PHP Curl Class
Taken from the readme:
PHP Curl Class is an object-oriented wrapper of the PHP cURL extension that makes it easy to send HTTP requests and integrate with web APIs.
And this code snippet (also taken from the readme) could be your starting point:
$curl = new Curl();
$curl->setBasicAuthentication('username', 'password');
$curl->setUserAgent('');
$curl->setReferrer('');
$curl->setHeader('X-Requested-With', 'XMLHttpRequest');
$curl->setCookie('key', 'value');
$curl->get('http://www.example.com/');
if ($curl->error) {
    echo 'Error: ' . $curl->errorCode . ': ' . $curl->errorMessage;
} else {
    echo $curl->response;
}
var_dump($curl->requestHeaders);
var_dump($curl->responseHeaders);
Oh, and it's released under the Unlicense, a public-domain-equivalent license.
I have an automated archive of several (media) websites' frontpages, written in PHP. Specifically, I am copying the HTML in the <body> tag twice a day, and I keep a copy of all their CSS and JS files, so I can recreate the frontpage from any point in the past.
Now I have run into a problem with one of those websites: they load the main slider content (the most important news) with an ajax call. I would like this ajax call to be executed before I parse the data, so I don't archive just a blank div. Looking around, I found out they use a WordPress plugin named lof-jslidernews2, but I can't find the specific ajax call to see the URL and make a cURL request. Any ideas how to achieve this?
The website: http://fokus.mk/
My code (I had to parse manually like this because of some problems with DOMDocument and invalid HTML):
// ...
if ($html = file_get_contents($row['page_url'])) {
    // Keep only the <body> part of the page
    // (str_before() is a helper that returns the substring before the needle)
    $content = strstr($html, '<body');
    $content = str_before($content, '</body>') . '</body>';
    $filename = date('YmdHis') . $row['page_name'];
    if ($success = file_put_contents('app/webroot/files/' . $filename, $content)) {
        // ....
** There is nothing illegal about my project; I am not stealing content, just freezing frontpages for later comparison. I have consulted a lawyer about this. :)
I don't know why, but the guy who actually solved my problem deleted his answer. So, here it is:
He suggested using an emulator, specifically Mink. It was easy to install (using composer) and did the job on the first try. Awesome library.
Mink is an open source browser controller/emulator for web applications, written in PHP 5.3.
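For reference, here is a minimal sketch of what that looks like (assuming behat/mink plus a JavaScript-capable driver such as mink-selenium2-driver are installed via Composer, with a Selenium server running):
<?php
use Behat\Mink\Mink;
use Behat\Mink\Session;
use Behat\Mink\Driver\Selenium2Driver;

require 'vendor/autoload.php';

$mink = new Mink(array(
    'browser' => new Session(new Selenium2Driver('firefox')),
));
$mink->setDefaultSessionName('browser');

$session = $mink->getSession();
$session->visit('http://fokus.mk/');
$session->wait(5000); // let the slider's ajax call finish before reading the markup
file_put_contents('frontpage.html', $session->getPage()->getContent());
?>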
Can someone help me? I want to extract HTML data from http://www.quranexplorer.com/Hadith/English/Index.html. I have found a service that does exactly that, http://diffbot.com/dev/docs/; they support data extraction via a simple API. The problem is that I have a large number of URLs that need to be processed. They are listed in this file: http://test.deen-ul-islam.org/html/h.js
I need to create a script that follows each URL and then, using the API, generates the JSON format of the HTML data (the APIs from the site allow batch requests; check the website docs).
Please note Diffbot only allows 10,000 free requests per month, so I need a way to save the progress and be able to pick up where I left off.
Here is an example I created using PHP.
$token = "dfoidjhku"; // example token
$url = "http://www.quranexplorer.com/Hadith/English/Hadith/bukhari/001.001.006.html";
// Build the Article API request (the page URL must be urlencoded)
$geturl = "http://www.diffbot.com/api/article?tags=1&token=" . $token . "&url=" . urlencode($url);

$json = file_get_contents($geturl);
$data = json_decode($json, true);

echo $article_title = $data['title'];
echo $article_author = $data['author'];
echo $article_date = $data['date'];
echo nl2br($article_text = $data['text']);

$article_tags = $data['tags'];
foreach ($article_tags as $result) {
    echo $result, '<br>';
}
I don't mind if the tool is in JavaScript or PHP; I just need a way to get the HTML data in JSON format.
John from Diffbot here. Note: I'm not a developer, but I know enough to write hacky code to do simple things.
You have a list of links, so it should be straightforward to iterate through those, making a call to us for each.
Here's a Python script that does just that: https://gist.github.com/johndavi/5545375
I used a quick search regex in Sublime Text to pull out the links from the JS file.
To test this out, just cut out some of the links, then run it. The full list will take a while, as I'm not using the Batch API.
If you need to improve or change this, best seek out a stronger developer directly. Diffbot is a dev-friendly tool.
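For a PHP equivalent, here is a minimal sketch (the file names urls.txt, progress.txt, and output.json are just placeholders) that records each finished URL so the script can stop and pick up where it left off within the free-request limit:
<?php
$token    = "YOUR_TOKEN"; // your Diffbot token
$urls     = file('urls.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$doneFile = 'progress.txt'; // one finished URL per line
$done     = file_exists($doneFile)
    ? array_flip(file($doneFile, FILE_IGNORE_NEW_LINES))
    : array();

foreach ($urls as $url) {
    if (isset($done[$url])) {
        continue; // already processed in an earlier run
    }
    $api  = "http://www.diffbot.com/api/article?token=" . $token . "&url=" . urlencode($url);
    $json = file_get_contents($api);
    if ($json !== false) {
        file_put_contents('output.json', $json . "\n", FILE_APPEND);
        file_put_contents($doneFile, $url . "\n", FILE_APPEND);
    }
}
?>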
I had a big PHP script written out to scrape images from this site: http://www.mcso.us/paid/, but when it didn't work I cut my code down to simply echoing the whole page.
I found that the table with the image links I want doesn't show up. I believe it's because the remote site uses ASP to generate the table. Is there a way around this? Am I wrong? Please help.
<?php
include("simple_html_dom.php");
set_time_limit(0);
$baseURL = "http://www.mcso.us/paid/";
$html = file_get_html($baseURL);
echo $html;
?>
There's no obvious reason why their use of ASP would cause this. Have you tried navigating the page with JavaScript turned off? The more likely scenario is that the tables are generated through JS.
Do note that the search results are retrieved through ajax (page http://www.mcso.us/paid/default.aspx) by making a POST request. You can use cURL (http://php.net/manual/en/book.curl.php). In Chrome, right-click → Inspect Element → Network tab, then make a search; you will see all the info there (POST variables, etc.).
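Something like this minimal sketch can replay that request (the POST field name below is a placeholder; copy the real ones from the Network tab):
<?php
$ch = curl_init('http://www.mcso.us/paid/default.aspx');
curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => http_build_query(array(
        'searchName' => 'Smith', // hypothetical field name, taken from the Network tab
    )),
));
$response = curl_exec($ch);
curl_close($ch);
echo $response; // the returned HTML should now contain the table
?>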