I am using the SimpleTest web browser (SimpleBrowser) to scrape data from http://www.magicbricks.com/bricks/agentSearch.html. Everything looks right, but I always get the error "City Field is required". I suspect the cause is that the options in the city field change dynamically when the value of the state field changes. Any solutions? Here is my code:
<?php
require_once('simpletest/browser.php');
$browser = new SimpleBrowser(); // "&new" was deprecated in PHP 5.3 and removed in PHP 7
$browser->addHeader('User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2');
$browser->get('http://www.magicbricks.com/bricks/agentSearch.html');
$browser->setField('source','agentSearch');
$browser->setField('_transactionType','1');
$browser->setField('_propertyType','1');
$browser->setField('resultPerPage','50');
$browser->setField('agentSearchType','B');
$browser->setField('state','520');
$browser->setField('city','4320');
$browser->setField('keyword','');
$browser->setField('country','50');
print $browser->submitFormById('searchFormBean');
print $browser->getResponseCode();
?>
Here are some errors I noticed.
Missing fields:
agentSearchType
transactionType (the form has both transactionType and _transactionType)
propertyType (the form has both propertyType and _propertyType)
There is also some header information you need to add, such as:
Referer
Cookie
A typical POST request should look like this when you view the headers:
POST http://www.magicbricks.com/bricks/agentSearch.html HTTP/1.1
Host: www.magicbricks.com
Connection: keep-alive
Content-Length: 173
Cache-Control: max-age=0
Origin: http://www.magicbricks.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.79 Safari/535.11
Content-Type: application/x-www-form-urlencoded
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Referer: http://www.magicbricks.com/bricks/agentSearch.html
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Cookie: JSESSIONID=nF1UqV3DM2tZC42zByYm6Q**.MBAPP09; __utma=163479907.1423216630.1331970312.1331970312.1331970312.1; __utmb=163479907.1.10.1331970312; __utmc=163479907; __utmz=163479907.1331970312.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); _mbRunstats=3k0ilrpcgprh4tea
source=agentSearch&agentSearchType=B&country=51&state=601&city=8417&transactionType=11951&_transactionType=1&propertyType=10001&_propertyType=1&keyword=tesy&resultPerPage=50
I hope this helps
:D
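As a rough sketch of the captured request above, the form body can be assembled with http_build_query() and sent with PHP's cURL extension instead of SimpleBrowser. The field values are copied from that captured POST body. The cookie value is a hypothetical placeholder: a real session ID would have to come from a prior GET of the search page. The network call itself is left commented out.

```php
<?php
// Form fields copied from the captured POST body, in the same order.
$fields = [
    'source'           => 'agentSearch',
    'agentSearchType'  => 'B',
    'country'          => '51',
    'state'            => '601',
    'city'             => '8417',
    'transactionType'  => '11951',
    '_transactionType' => '1',
    'propertyType'     => '10001',
    '_propertyType'    => '1',
    'keyword'          => 'tesy',
    'resultPerPage'    => '50',
];

// Encode the fields as an application/x-www-form-urlencoded body.
$body = http_build_query($fields, '', '&');

$url = 'http://www.magicbricks.com/bricks/agentSearch.html';
$headers = [
    'Referer: ' . $url,
    'Content-Type: application/x-www-form-urlencoded',
];

// Hypothetical placeholder, not a working session ID.
$sessionCookie = 'JSESSIONID=REPLACE_WITH_REAL_SESSION_ID';

if (extension_loaded('curl')) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($ch, CURLOPT_COOKIE, $sessionCookie);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // $html = curl_exec($ch); // uncomment to actually send the request
}
```

Comparing $body with the body a real browser sends is a quick way to spot a missing field such as transactionType or propertyType.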
I am trying to upload an image in Laravel. On echo $request; I am getting this:
POST /infoshore0/api/public/v1/product/addProductImg?lang=en-us HTTP/1.1
Accept: application/json, text/plain, */*
Accept-Encoding: gzip, deflate, br
Accept-Language: en-IN,en-GB;q=0.9,en-US;q=0.8,en;q=0.7
Authorization: Bearer eyJ0eXAiOiJKV1QiLNjc0MTcsIm5iZiI6MTYwMjI2MzgxNywianRpIjoiZmxhdHJ1Mk5BeGdLc0JWRCJ9.lFcZOOSZx_94eZY97L-OfR36XLX_KLqvbZFJo5l2FW8
Connection: keep-alive
Content-Length: 209141
Content-Type: text/plain;charset=UTF-8
Cookie: laravel_session=7dab0a4fdba93fbec911f
Host: localhost:8080
Origin: http://localhost:8080
Referer: http://localhost:8080/cp/
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36
{"file":"ÿØÿà\u0000\u00.......very long text","product_id":499859489}
I have tried if ($request->hasFile('file')) { echo 'File Found'; } and nothing happens. I am passing product_id and the image file using AngularJS.
I'm using the PHP Slim framework for a personal project. For some reason, Slim's PSR-7 Request implementation appears to filter out some headers. I am trying to set a custom CSRF token, but it is not available via $request->getHeaders(). Here's an example that shows the problem:
$app->get('/bar', function ($request, $response, $args) {
echo "PHP's getallheaders() <br>";
foreach (getallheaders() as $name => $value) {
echo "$name: $value <br>";
}
echo "Slim's GetHeaders() <br>";
foreach ($request->getHeaders() as $name => $values) {
foreach ($values as $value) {
echo "$name: $value <br>";
}
}
});
I get this output:
PHP's getallheaders()
Host: localhost
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: null
Accept-Encoding: gzip, deflate
csrf_name: csrf56fc038c2f6eb
csrf_value: 4e077c04dadf22377da2aebc1a8caa78
Cookie: PHPSESSID=41016nbag70gi6shq4u2tg0aq1
Connection: keep-alive
Slim's GetHeaders()
Host: localhost
HTTP_USER_AGENT: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0
HTTP_ACCEPT: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
HTTP_ACCEPT_LANGUAGE: null
HTTP_ACCEPT_ENCODING: gzip, deflate
HTTP_COOKIE: PHPSESSID=41016nbag70gi6shq4u2tg0aq1
HTTP_CONNECTION: keep-alive
I am trying to understand why the custom headers:
csrf_name: csrf56fc038c2f6eb
csrf_value: 4e077c04dadf22377da2aebc1a8caa78
are being removed by Slim.
It is not Slim, it is the web server.
Even though a header whose name contains an underscore is valid according to the HTTP spec, both Nginx and Apache silently drop such headers for security reasons. In general, you should only use header names made up of the characters a-z, A-Z and -.
With Apache you can still access headers with an underscore in their name using getallheaders(), which is an alias of apache_request_headers().
With Nginx you can allow headers with an underscore in their name with the underscores_in_headers on setting.
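One reason usually given for this behaviour is that the CGI convention for exposing headers (the HTTP_* keys seen in the Slim output) upper-cases the name and turns hyphens into underscores, so two different header names can collide on the same key. The following is a minimal sketch of that mapping; headerToServerKey() is a hypothetical helper written for illustration, not part of PHP or Slim.

```php
<?php
// Hypothetical helper mimicking the CGI naming convention:
// prefix with HTTP_, upper-case, and replace hyphens with underscores.
function headerToServerKey(string $headerName): string
{
    return 'HTTP_' . strtoupper(str_replace('-', '_', $headerName));
}

$withHyphen     = headerToServerKey('csrf-name');
$withUnderscore = headerToServerKey('csrf_name');

// Both names map to the same key, so the server cannot tell them apart.
$collide = ($withHyphen === $withUnderscore);
```

Sending the token as csrf-name (or csrfname) avoids the ambiguity, because only one spelling can then produce a given key.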
Believe it or not, the problem was that Slim does not like an underscore in a user-defined header. Once I changed csrf_name to csrfname it worked:
PHP's getallheaders()
Host: localhost
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: null
Accept-Encoding: gzip, deflate
csrfvalue: 4e077c04dadf22377da2aebc1a8caa78
csrfname: csrf56fc038c2f6eb
Cookie: PHPSESSID=5aom8b5q7ottorc9279q9sh4g1
Connection: keep-alive
Slim's GetHeaders()
Host: localhost
HTTP_USER_AGENT: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0
HTTP_ACCEPT: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
HTTP_ACCEPT_LANGUAGE: null
HTTP_ACCEPT_ENCODING: gzip, deflate
HTTP_CSRFVALUE: 4e077c04dadf22377da2aebc1a8caa78
HTTP_CSRFNAME: csrf56fc038c2f6eb
HTTP_COOKIE: PHPSESSID=5aom8b5q7ottorc9279q9sh4g1
HTTP_CONNECTION: keep-alive
So, don't forget, remove underscores!!
EDIT: As explained by Mika Tuupola, the root cause is the HTTP server and not Slim.
I want to scrape some data, but I need to be logged in. My idea is to copy the cookies from my browser session into my program. However, when I use my program I keep getting redirected to the login page. I have compared the cookies and they are the same.
Here are the request headers when I am logged in with Google Chrome (copied from the request headers):
GET /example/data/data.jsp?date=01-Jan-2001&_=1439020103330 HTTP/1.1
Host: www.example.com
Connection: keep-alive
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.130 Safari/537.36
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8,ms;q=0.6
Cookie: SESSIONID=BA4BA42C628D5C6EB959D49DB745D94A.NGXA; __utma=77920972.1013585791.1438786361.1438966138.1439020034.5; __utmc=77920972; __utmz=77920972.1438786423.2.2.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided)
My curl code:
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_VERBOSE, true);
$f = fopen('request.txt', 'w');
curl_setopt($ch, CURLOPT_STDERR , $f);
curl_setopt($ch, CURLOPT_HTTPHEADER,array('
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Accept-Encoding: gzip, deflate, sdch',
'Accept-Language: en-US,en;q=0.8,ms;q=0.6',
'Cookie: SESSIONID=BA4BA42C628D5C6EB959D49DB745D94A.NGXA; __utma=77920972.1013585791.1438786361.1438966138.1439020034.5; __utmc=77920972; __utmz=77920972.1438786423.2.2.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided)',
'Upgrade-Insecure-Requests: 1',
'User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.130 Safari/537.36',
'X-DevTools-Emulate-Network-Conditions-Client-Id: 3A45EE97-D41F-45A3-AFCD-1540014377A7
'));
Here's my request.txt, which I used to debug the headers my program sends:
* About to connect() to www.example.com port 80 (#0)
* Trying 202.43.163.203... * connected
* Connected to www.example.com (202.43.163.203) port 80 (#0)
> GET /example/data/data.jsp?date=01-Jan-2001&_=1439020103330 HTTP/1.1
Host: www.example.com
Accept: */*
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8,ms;q=0.6
Cookie: SESSIONID=BA4BA42C628D5C6EB959D49DB745D94A.NGXA; __utma=77920972.1013585791.1438786361.1438966138.1439020034.5; __utmc=77920972; __utmz=77920972.1438786423.2.2.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided)
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.130 Safari/537.36
X-DevTools-Emulate-Network-Conditions-Client-Id: 3A45EE97-D41F-45A3-AFCD-1540014377A7
< HTTP/1.1 302 Found
< Date: Sat, 08 Aug 2015 09:44:06 GMT
< Server: Apache
< Set-Cookie: SESSIONID=7C8779894A3CE29D4BCED4B4D311E07E.NGXA; Path=/example/; HttpOnly
< Location: http://www.example.com/login.jsp
< Content-Length: 0
< Content-Type: text/plain
<
* Connection #0 to host www.example.com left intact
* Closing connection #0
This is my first time scraping data as a logged-in user. Did I miss something?
Cookies can be set via the CURLOPT_COOKIE option. In your case:
curl_setopt($ch, CURLOPT_COOKIE, 'SESSIONID=BA4BA42C628D5C6EB959D49DB745D94A.NGXA');
To send more than one cookie, separate them with a semicolon followed by a space. See http://php.net/manual/en/function.curl-setopt.php for more information.
If you want to store and re-use the cookies, you can also use CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR.
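A minimal sketch of both options follows, assuming the cURL extension is available. buildCookieHeader() is a hypothetical helper, the URL is the one from the question, and the cookie jar path in the temp directory is an arbitrary choice. The request itself is left commented out.

```php
<?php
// Hypothetical helper: joins name => value pairs into the
// "name=value; name2=value2" format that CURLOPT_COOKIE expects.
function buildCookieHeader(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

$cookieString = buildCookieHeader([
    'SESSIONID' => 'BA4BA42C628D5C6EB959D49DB745D94A.NGXA',
    '__utmc'    => '77920972',
]);

if (extension_loaded('curl')) {
    $ch = curl_init('http://www.example.com/example/data/data.jsp?date=01-Jan-2001');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    // Send explicit cookies with this request.
    curl_setopt($ch, CURLOPT_COOKIE, $cookieString);

    // Read cookies from a file and write any received cookies back to it,
    // so a session set by the server is re-used on later requests.
    $jar = sys_get_temp_dir() . '/cookies.txt'; // arbitrary path (assumption)
    curl_setopt($ch, CURLOPT_COOKIEFILE, $jar);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $jar);

    // $html = curl_exec($ch); // uncomment to actually send the request
}
```

With a cookie jar, one common approach is to perform the login request with the same jar first, rather than copying a session ID out of the browser.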
I'm running PHP version 5.5 on WAMP. I have a very simple API and I want to read the custom request header called "api_key". First of all, I made a GET request and logged the headers like this:
$message = '';
foreach (getallheaders() as $name => $value) {
$message .= "$name: $value\n";
}
file_put_contents('headers.log', $message);
This resulted in:
Host: localhost
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Accept: application/json, text/javascript, */*; q=0.01
device_id: 63843
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36
api_key: hv7Vgd4jsbb
Referer: http://localhost/server/cli/beaufort/www/
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Cookie: PHPSESSID=bd3c8ce878ebc504b2128686efbe30cf; bd3c8ce878ebc504b2128686efbe30cf=DEFAULT%7C0%7C2M3TMlgUx3gTlaarYzHIdD28l8q9FTcNubt55%2BUGpAo%3D%7C7456bf61db3500c8bb7b3bc38082a470ce4a2ad3
So "api_key" is there. However, somehow, when I do:
$message = $_SERVER['HTTP_API_KEY'];
I get the error:
Fatal error: Uncaught exception 'ErrorException' with message 'Undefined index: HTTP_API_KEY'
Why can I not get this header??
$headers = getallheaders();
$message = $headers['api_key'];
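The two lines above read the header from getallheaders() rather than from $_SERVER. That presumably works because Apache 2.4 does not export request headers whose names contain underscores as HTTP_* variables, while getallheaders() still returns them. Since header names are case-insensitive, a slightly more defensive sketch looks the name up without regard to case; findHeader() is a hypothetical helper, not a PHP built-in.

```php
<?php
// Hypothetical helper: case-insensitive lookup in a header array.
// Returns null when the header is absent instead of raising a notice.
function findHeader(array $headers, string $name): ?string
{
    foreach ($headers as $key => $value) {
        if (strcasecmp((string) $key, $name) === 0) {
            return $value;
        }
    }
    return null;
}

// getallheaders() is not defined under every SAPI (for example the CLI),
// so fall back to an empty array there.
$headers = function_exists('getallheaders') ? (getallheaders() ?: []) : [];
$apiKey  = findHeader($headers, 'api_key');
```

Renaming the header to something like api-key would also make it available as $_SERVER['HTTP_API_KEY'].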
When I check apache_request_headers(), I find PHPSESSID.
$headers = apache_request_headers();
foreach ($headers as $header => $value) {
echo "$header: $value <br />\n";
}
This outputs something like:
Host: localhost.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:10.0.2) Gecko/20100101 Firefox/10.0.2
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Cookie: PHPSESSID=ltj5b4tvu9lcpvt9itt3ge4oj6
Question:
How do I turn off PHPSESSID, and why does it appear on every page by default?
If you want to turn off using cookies in sessions, you can set the PHP ini directive session.use_cookies to 0. See the manual.
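The same directive can also be set at runtime, as in the sketch below. It assumes the call happens before session_start() and before any output is sent. Note that the Cookie line in the dump above is a request header, so a browser that already holds a PHPSESSID cookie keeps sending it until that cookie expires or is cleared.

```php
<?php
// Assumption: this runs before session_start() and before any output.
if (extension_loaded('session')) {
    // Stop PHP from using cookies to store the session ID on the client.
    ini_set('session.use_cookies', '0');
}
```

Setting session.use_cookies = 0 in php.ini has the same effect for every script.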