I am trying to scrape data from this website:
http://ntthnue.edu.vn/tracuudiem
When I enter 'TS4740' in the SBD field in the browser, I get the result successfully. However, when I send the same value with the PHP cURL code below, it does not work:
<?php
function getData($id) {
$url = 'http://ntthnue.edu.vn/tracuudiem';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, ['sbd' => $id]);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
curl_close($ch);
return $result;
}
echo getData('TS4740');
I just get the old page back. Can anybody explain why? Thank you!
Make sure you send all the necessary headers and form fields. The server processing this request can perform all kinds of checks to decide whether it is a "valid" form submission, so you need to make your request resemble a regular browser request as closely as possible.
Use a tool like Chrome DevTools to inspect both the request and response headers exchanged between your browser and the server; this shows what your cURL setup should look like. An app like Postman also makes it easy to simulate the request and see what works and what doesn't.
Working example:
<?php
function getData($id) {
$url = 'http://ntthnue.edu.vn/tracuudiem';
$ch = curl_init($url);
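// Send every field the browser form submits, not just sbd; urlencode() keeps the ID safe inside the body
$postdata = 'namhoc=2015-2016&kythi_name=Tuy%E1%BB%83n+sinh+v%C3%A0o+l%E1%BB%9Bp+10&hoten=&sbd='.urlencode($id).'&btnSearch=T%C3%ACm+ki%E1%BA%BFm';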
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Origin: http://ntthnue.edu.vn',
'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.130 Safari/537.36',
'Content-Type: application/x-www-form-urlencoded',
'Referer: http://ntthnue.edu.vn/tracuudiem',
));
$result = curl_exec($ch);
curl_close($ch);
return $result;
}
echo getData('TS4740');
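To see which fields the browser actually submits, it can help to decode the captured POST body. The following is a minimal Python sketch (not part of the original answer) that decodes the body string used in the working example and re-encodes it with a different SBD value. The helper names `form_fields` and `build_body` are made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode

# POST body as captured from the browser (copied from the working example,
# with the SBD value filled in).
CAPTURED_BODY = (
    "namhoc=2015-2016"
    "&kythi_name=Tuy%E1%BB%83n+sinh+v%C3%A0o+l%E1%BB%9Bp+10"
    "&hoten=&sbd=TS4740"
    "&btnSearch=T%C3%ACm+ki%E1%BA%BFm"
)


def form_fields(body):
    """Decode a urlencoded body into an ordered list of (name, value) pairs."""
    # keep_blank_values=True preserves empty fields such as hoten.
    return parse_qsl(body, keep_blank_values=True)


def build_body(fields, sbd):
    """Re-encode the captured fields, swapping in a different SBD value."""
    updated = [(name, sbd if name == "sbd" else value) for name, value in fields]
    return urlencode(updated)


fields = form_fields(CAPTURED_BODY)
for name, value in fields:
    print(f"{name!r}: {value!r}")
```

The decoded list makes it clear that the form sends five fields, while the code in the question sent only `sbd`.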
Related
I am trying to get video mp4 URLs with Vimeo API and its official documentation says to make an authenticated GET request to https://api.vimeo.com/videos/[video_id]
Below is some code that I found on GitHub, but it doesn't return anything.
$url = "https://api.vimeo.com/videos/$videoid";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36");
$result = curl_exec($ch);
Can someone please suggest how to make an authenticated GET request to the Vimeo API?
I have Vimeo keys and access tokens.
Thanks.
Since you said that you already have your token, you can try the code below with it.
Make sure to also enter your video ID. To find it, open the video in a browser; the ID is the number at the end of the URL.
For example, for this browser URL the video ID is 12083674:
https://vimeo.com/12083674
<?php
//$your_video_id='12083674';
$your_video_id='your video id goes here';
$access_token='your access token goes here';
$clink = "https://api.vimeo.com/videos/$your_video_id";
$curl=curl_init();
curl_setopt($curl,CURLOPT_RETURNTRANSFER,true);
curl_setopt($curl,CURLOPT_URL,$clink);
curl_setopt($curl,CURLOPT_CUSTOMREQUEST,'GET');
curl_setopt($curl, CURLOPT_HTTPHEADER, array(
'Content-Type: application/json',
"Authorization: Bearer $access_token")
);
curl_setopt($curl,CURLOPT_SSL_VERIFYPEER,0);
curl_setopt($curl,CURLOPT_SSL_VERIFYHOST,0);
$out = curl_exec($curl);
$status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
curl_close($curl);
var_dump($out);
if($status==200){
echo "video found<br>";
}else{
echo "There is an issue. Try Again..<br>";
}
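The essential part of the answer above is the `Authorization: Bearer <token>` header. As a rough Python sketch of the same request (an illustration only; the function name `build_vimeo_request` and the placeholder token are assumptions), the request can be built and inspected without sending it:

```python
import urllib.request


def build_vimeo_request(video_id, access_token):
    """Build (but do not send) an authenticated GET request for one video."""
    url = f"https://api.vimeo.com/videos/{video_id}"
    headers = {"Authorization": f"Bearer {access_token}"}
    return urllib.request.Request(url, headers=headers, method="GET")


# Placeholder values; substitute a real video ID and token before sending.
request = build_vimeo_request("12083674", "your-access-token")
print(request.get_method(), request.full_url)
print(request.get_header("Authorization"))
# To actually send it, pass the object to urllib.request.urlopen(request)
# and read the JSON body from the response.
```

Printing the request first is a cheap way to confirm the header is attached before spending API calls on debugging.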
I am implementing a scrapy spider to crawl a website that contains real estate offers. The site shows a telephone number for the real estate agent, which can be retrieved by an AJAX POST request.
To get a phone number, I have to get the ID from the URL, then get the csrfToken from the page source, and send it via POST to a special URL containing the ID. This method worked well, but since yesterday it has stopped working.
My code:
$urlSite = "https://www.otodom.pl/mazowieckie/oferta/piekne-mieszkanie-na-mokotowie-do-wynajecia-ID3ezHA.html";
$ch = curl_init();
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "GET");
curl_setopt($ch, CURLOPT_URL, $urlSite);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
curl_close($ch);
preg_match("/csrfToken = '(.+?)'/", $result, $output_array);
preg_match("/ID(.+?).html/", $urlSite, $output_array_id);
$token = $output_array[1];
$id = $output_array_id[1];
$url = "https://www.otodom.pl/ajax/mazowieckie/misc/contact/phone/" . $id . "/";
$headers = [
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Encoding: gzip, deflate, br',
'Accept-Language: pl,en-US;q=0.8,en;q=0.6,ru;q=0.4',
'Cache-Control: no-cache',
'Content-Type: application/x-www-form-urlencoded; charset=UTF-8',
'Content-Length: 74',
'Host: www.otodom.pl',
'Referer: ' . $urlSite,
'User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36'
];
$data = array(
'CSRFToken' => $token
);
$data_string = http_build_query($data);
$ch = curl_init();
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data_string);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$phone = utf8_decode(curl_exec($ch));
curl_close($ch);
echo $phone;
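Please help me; I have been working on this for a few hours with no result. The response I get is:
{"status":"error","message":"Spróbuj wykonać operację ponownie. Jeśli to nie pomoże, sprawdź czy masz włączoną obsługę JavaScript w przeglądarce."}
(In English: "Try performing the operation again. If that does not help, check whether JavaScript support is enabled in your browser.")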
As I mentioned in my comment, you need JavaScript in order to get the phone number. One way to achieve this is to use Selenium; here's a Python example:
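import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service

# Path to the geckodriver binary (adjust to your system); Selenium 4 style API
geckodriver = 'C:/path_to/geckodriver.exe'
driver = webdriver.Firefox(service=Service(executable_path=geckodriver))
driver.get("https://www.otodom.pl/mazowieckie/oferta/piekne-mieszkanie-na-mokotowie-do-wynajecia-ID3ezHA.html")
# Click the element that reveals the phone number, then give the AJAX call time to finish
driver.find_element(By.CLASS_NAME, "phone-spoiler").click()
time.sleep(2)
print(driver.find_element(By.CLASS_NAME, "phone-number").text)
# 515 174 616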
Notes:
1 - Install Selenium:
pip install selenium
2 - Download the geckodriver
3 - Replace C:/path_to with the path where you saved geckodriver.exe.
4 - Add C:/path_to to your PATH environment variable.
5 - Restart your system.
6 - Run python name_of_script.py and the phone number will be displayed.
The steps above assume that you're using a Windows machine.
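Separately from the JavaScript requirement, the ID and token extraction described in the question can be sketched in Python as follows. This is only an illustration: `sample_html` is a made-up page fragment, the real markup may differ, and the helper names are assumptions. Note that the dot before `html` is escaped here, and that the body length is computed rather than hard-coded as `Content-Length: 74` (cURL normally sets that header itself).

```python
import re
from urllib.parse import urlencode

listing_url = (
    "https://www.otodom.pl/mazowieckie/oferta/"
    "piekne-mieszkanie-na-mokotowie-do-wynajecia-ID3ezHA.html"
)
# Hypothetical page fragment; the real page source may look different.
sample_html = "<script>var csrfToken = 'abc123';</script>"


def extract_listing_id(url):
    """Return the text between 'ID' and '.html', or None if absent."""
    match = re.search(r"ID(.+?)\.html", url)
    return match.group(1) if match else None


def extract_csrf_token(html):
    """Return the quoted csrfToken value from the page source, or None."""
    match = re.search(r"csrfToken = '(.+?)'", html)
    return match.group(1) if match else None


listing_id = extract_listing_id(listing_url)
token = extract_csrf_token(sample_html)
body = urlencode({"CSRFToken": token})
# The byte length of the body is what Content-Length would have to match.
print(listing_id, token, body, len(body.encode("utf-8")))
```

Returning `None` on a failed match makes it easy to detect when the site changes its markup, which is one plausible reason a scraper stops working overnight.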
I apologize in advance for my English. I have a small problem.
I want to get the final effective URL from the page
streamuj.tv/video/00e276bf5841bf77c8de?streamuj=original&authorize=ac13bb77d3d863ca362315b9b4dcdf3e
When I open the link in a browser, it takes me to an .flv file.
But when I request it through PHP, I get s3.streamuj.tv/unauthorized.flv
When I try it through this service: getlinkinfo.com/info?link=http%3A%2F%2Fwww.streamuj.tv%2Fvideo%2F00e276bf5841bf77c8de%3Fstreamuj%3Doriginal%26authorize%3Dac13bb77d3d863ca362315b9b4dcdf3e&x=49&y=11
everything is fine, and it reports
s4.streamuj.tv:8080/vid/d0fe77e1020b6414a16aa5316c759add/58aaf1dd/00e276bf5841bf77c8de_hd.flv?start=0
My PHP code:
<?php
session_start();
include "simple_html_dom.php";
$proxy = array("189.3.93.114:8080");
$proxyNum = 0;
$proxy = explode(':', $proxy[$proxyNum]);
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, 'http://www.streamuj.tv/video/00e276bf5841bf77c8de?streamuj=original&authorize=ac13bb77d3d863ca362315b9b4dcdf3e');
curl_setopt($curl, CURLOPT_FILETIME, true);
curl_setopt($curl, CURLOPT_NOBODY, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_HEADER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($curl, CURLOPT_PROXY, $proxy[0]);
curl_setopt($curl, CURLOPT_PROXYPORT, $proxy[1]);
$header = curl_exec($curl);
$info = curl_getinfo($curl);
curl_close($curl);
$u1 = $info['url'];
echo "u1: $u1</br>";
$u2 = str_replace("flv?start=0","flv",$u1);
echo $u2;
?>
Where is the problem? Why does it return unauthorized.flv?
Solution
The server was checking client legitimacy via the User-Agent HTTP header.
Sending a custom User-Agent solved the problem.
curl_setopt($curl, CURLOPT_HTTPHEADER, array( 'user-agent:Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2950.0 Iron Safari/537.36' ));
Original post:
Most likely the generated FLV URL does not point to a static location. It probably uses a session ID plus a cookie, verifies the IP, or both.
Without knowing which headers you have to send with cURL, you probably won't get a relevant response.
I have a script that gathers a session ID, appends it to a URL, and then redirects to that URL. This works perfectly in the browser and in MX Player for Android, but in Kodi there seems to be an error: Kodi seems to treat my server as the host of the file. So instead of using streamsite.com/index.m3u8, it uses MYSERVER.com/index.m3u8. This is driving me crazy since I do not even know how to code. This is my script:
<?php
$url = link.tojson
$cURL = curl_init();
curl_setopt($cURL,CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($cURL, CURLOPT_URL, $url); curl_setopt($cURL, CURLOPT_HTTPGET, true);
curl_setopt($cURL, CURLOPT_RETURNTRANSFER, true);
curl_setopt($cURL, CURLOPT_HTTPHEADER, array( 'Content-Type: application/json', 'Accept: application/json' ));
$result = curl_exec($cURL);
curl_close($cURL);
$json=json_decode($result,true);
$pre=$json[0]['id'];
$stream='streamsite.com/index.m3u8?&sessionId='.$pre. '';
ini_set('user_agent', 'Mozilla/5.0 (Linux; Android 6.0; en-US; Nexus 5 Build/Veneno ROM) MXPlayer/1.8.3
');
header("Location:$stream");
die();
?>
Try using a proper URL:
$stream="http://streamsite.com/index.m3u8?sessionId=$pre";
header("Location:$stream");
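Also, the call to ini_set('user_agent', ...) has no effect here. That setting only changes the User-Agent PHP sends for its own HTTP stream requests (such as file_get_contents()); it affects neither cURL nor the client that follows your redirect.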
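Why the scheme matters: a Location value without `http://` is a relative reference, so a client that resolves it by the standard URL rules ends up back on the redirecting server. The Python sketch below illustrates that resolution with `urljoin`. The script address `http://myserver.example/redirect.php` and the session value `abc` are made up, and whether Kodi resolves the header exactly this way has not been verified.

```python
from urllib.parse import urljoin

# Hypothetical address of the PHP script that issues the redirect.
script_url = "http://myserver.example/redirect.php"

# Location value as built in the question (no scheme) and as suggested above.
relative_location = "streamsite.com/index.m3u8?&sessionId=abc"
absolute_location = "http://streamsite.com/index.m3u8?sessionId=abc"

# A relative reference is resolved against the URL that produced it.
resolved_relative = urljoin(script_url, relative_location)
# An absolute URL is used unchanged.
resolved_absolute = urljoin(script_url, absolute_location)

print(resolved_relative)
print(resolved_absolute)
```

Browsers and some players may be more forgiving about malformed Location values, which would explain why the same script appears to work there.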
I am using the following piece of code to get tracker data (converted from JSON to PHP) and find the sum total of the number of seeders from the BitSnoop API:
$hash = "98C5C361D0BE5F2A07EA8FA5052E5AA48097E7F6";
if(!function_exists("curl_init")) die("cURL extension is not installed");
$url = "http://bitsnoop.com/api/trackers.php?hash=" . $hash . "&json=1";
echo $url;
$ch=curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$r=curl_exec($ch);
curl_close($ch);
$myarr = json_decode($r,true);
print_r($myarr);
But the script is not able to retrieve any data from the URL.
Chrome's view-source works on the page, but other ways of retrieving the page source, such as viewsource.in or i-tools, don't seem to retrieve any data from the URL either.
Could anyone explain why this is so?
And please provide an alternative way to accomplish the retrieval.
Thanks in advance!
You should pretend to be a legit browser:
$hash = "98C5C361D0BE5F2A07EA8FA5052E5AA48097E7F6";
if(!function_exists("curl_init")) die("cURL extension is not installed");
$url = "http://bitsnoop.com/api/trackers.php?hash=" . $hash . "&json=1";
$headers = array(
'Host: bitsnoop.com',
'Connection: keep-alive',
'Cache-Control: max-age=0',
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36',
'Accept-Encoding: deflate,sdch',
'Accept-Language: ru,en-US;q=0.8,en;q=0.6');
echo $url;
$ch=curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$r=curl_exec($ch);
curl_close($ch);
$myarr = json_decode($r,true);
print_r($myarr);
It is also a good idea to test with curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); however, I did not check whether they use an HTTP redirect.