I want to crawl the following page: https://db.aa419.org/fakebankslist.php with search word "sites".
I'm using the requests package in Python. I don't plan to try Selenium because the page has no JavaScript and I don't need to click any buttons, so I think requests should be able to crawl it.
For the website itself, I guess it sends the query words via PHP. So I created a PHP session with requests.post(), retrieved the cookies from response.cookies, and fed those cookies to the site in the following POST requests. The code structure is below:
#crawl 1st page with search word in url
url='https://db.aa419.org/fakebankslist.php?psearch=sites&Submit=GO&psearchtype='
response = requests.post(url)
cookies= response.cookies
print(cookies)
#crawl page 2-4
for i in range(2, 5):
    url = 'https://db.aa419.org/fakebankslist.php?start={}'.format(str(1+20*(i-1)))
    response = requests.post(url, cookies=cookies)
    cookies= response.cookies #update cookie for each page
    print(cookies)
However, it only works for the first 2 pages. When the loop begins to crawl page 3, the cookie jar becomes empty: <RequestsCookieJar[]>. I checked the response for page 3 and found it's some random page irrelevant to my query word "sites".
Could anyone explain what's going on here? How can I keep crawling the following pages? Thanks in advance!
I am not entirely sure what you are trying to obtain from that website, but I will try to help.
First page with results can be obtained through this url:
https://db.aa419.org/fakebankslist.php?psearch=essa&Submit=GO&start=1
The value 1 for the start key is the index of the first result that appears on the page. Since there are 20 results on each page, to view the second page you need to change '1' to '21':
https://db.aa419.org/fakebankslist.php?psearch=essa&Submit=GO&start=21
The second thing is that your requests should be made using the GET method.
I checked the response of page 3 and found it's some random page irrelevant to my query words "sites"
I believe this is related to the website's broken search engine.
I hope this code helps:
import requests

#crawl page 1-5
s = requests.Session()
for i in range(0, 5):
    url = 'https://db.aa419.org/fakebankslist.php?psearch=essa&Submit=GO&start='+str(1+i*20)
    response = s.get(url)
    cookies = s.cookies  # the Session object carries cookies across requests
    print('For page ', i+1, 'with results from', 1+i*20, 'to', i*20+20, ', cookies are:', str(cookies))
I've just designed my first form in HTML and a PHP page to display the results. In the form the user inputs some codes in response to some questions, a bit like a multiple choice, so for example, these are "ABC". The PHP page displays the code to the user as a link, which when clicked will go to a bookmark (a link within the same page) with the ID #ABC. This was achieved with simple manipulation of the PHP variable as follows:
<?php
$code = "ABC";
$part1 = '<a href="mywebpage.php#';
$part2 = '">Go to this code</a>';
$string = $part1.$code.$part2;
echo $string;
?>
(i.e. Link in the page says "go to this code" and when clicked will go to section with bookmark ABC)
This all works fine, but I need to know if there is a way of error trapping so that if a bookmark does not exist for the code entered, a message can be displayed to the user instead. Can this be done using the PHP variable, or do I need to use JavaScript? One workaround may be to search the web page for the ID "#ABC". Is it possible to do this? Another option would be to store an array of valid codes on the server and query it before setting the bookmark, but I want to keep it as simple as possible. Any help appreciated, thanks.
What you call a "bookmark" we call a hash. And when you say "go to a bookmark" you mean a hash change. Hash changes do not make an additional request to the server, it is all handled on the client-side, therefore this must be done with JavaScript and not PHP.
So let's attach a simple JavaScript handler to window.onhashchange that searches for an element with that ID and alerts if it isn't found.
window.onhashchange = function(){
    // location.hash includes the leading "#", so strip it before the lookup
    if(!document.getElementById(location.hash.slice(1))){
        alert("not found");
    }
};
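One detail worth spelling out: location.hash carries the leading "#" (and may be percent-encoded), while getElementById expects the bare ID. Here is a minimal sketch of a handler that accounts for this. The helper name hashToId and the alert wording are my own choices, not anything from the question:

```javascript
// Hypothetical helper: turn a location.hash value into a plain element ID.
function hashToId(hash) {
  // location.hash is "" when there is no fragment, otherwise it starts with "#".
  const raw = hash.startsWith("#") ? hash.slice(1) : hash;
  try {
    return decodeURIComponent(raw);
  } catch (e) {
    return raw; // malformed percent-encoding: fall back to the raw text
  }
}

// Browser-only wiring, guarded so the helper can also be loaded elsewhere.
if (typeof window !== "undefined") {
  window.addEventListener("hashchange", function () {
    const id = hashToId(window.location.hash);
    if (id && !document.getElementById(id)) {
      alert("No section found for code " + id);
    }
  });
}
```

Note that hashchange only fires when the hash actually changes, so clicking the same link twice will not trigger the check a second time.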
Purely hypothetical at this point, no code yet. Trying to figure out the best way to do this. We are company "A" and we have two partners, company "B" and company "C". On a sign up form, we collect data and then pass it on to either partner "B" or partner "C" - this part is good to go and working fine. I do this with ajax on the front end and a cURL processor on the back end, so no one leaves our site and we just post the data directly to the partner's form.
Unfortunately due to partner "B" and "C"'s required data the forms we post to are different and we have to have 2 separate html form files, one for each partner. The problem is that we need to do this all from one URL, not a separate one for each partner.
I would guess we would use a 'handler' page that has the specific url - http://www.example.com/parterForm.php
Then in the 'handler' page we would make the switch serve the correct content. I need a way to evenly split who we send data to. I'd like to do the switch on a very granular, MS level for example:
if the time = 0-500 ms - serve Partner B page;
if the time = 501-1000 ms - serve Partner C page;
all done within the 'handler' page - calling the forms as php includes?
I realize this is not a specific code question and I apologize; this is something I've never done before and I'm trying to figure out how to do it. I'm a Creative Director, by the way, who codes, with no other resource available.
thanks.
Hmm, yes, you could do that. That would work reasonably well, in fact. The important thing is to make sure the form goes to the right partner. You could use $_SESSION for that, or check which fields were sent and deduce from that which partner was chosen.
For example:
// microtime(true) returns a float; fmod(..., 1) keeps only the fractional second
if( fmod(microtime(true),1) < 0.5) include("forms/partner1.php");
else include("forms/partner2.php");
Then when submitted:
$partner1fields = array("name","email","country","dateofbirth");
$partner2fields = array("name","address","postcode","ethnicity");
// the above are examples - they should correspond to the $_POST keys you expect
// now check if they match. Array equality depends on order, so sort first
$postkeys = array_keys($_POST);
sort($postkeys);
sort($partner1fields);
sort($partner2fields);
if( $postkeys == $partner1fields) { /* submit to partner 1 */ }
elseif( $postkeys == $partner2fields) { /* submit to partner 2 */ }
else {
    echo "<p>Given keys did not match either partner</p>";
    echo "<p>POST keys: ".implode(", ",$postkeys)."</p>";
    echo "<p>Partner 1 keys: ".implode(", ",$partner1fields)."</p>";
    echo "<p>Partner 2 keys: ".implode(", ",$partner2fields)."</p>";
    echo "<p>Please report this error to the site administrator.</p>";
    exit;
}
First, by MS I assume you mean the latency between client and server?
Use JavaScript to either load a tiny image from the server or make an ajax call that fetches a single character or something, and time this. For testing you'll need to do some real pings and adjust your JS time to reflect the ping round trip. For example, if the JS time to load the image is 500ms but the ping time is only 80ms, then maybe divide by 6 for the result. This will never be very precise, as the client and the server both have processing overhead. Make sure to send no-cache headers or an expiry time in the past with the image or ajax response.
Easy, if time <= 500 redirect to form A, if time > 500 redirect to form B or use ajax to load them up.
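As a rough sketch of the timing idea above: the function names, the "/ping.gif" URL in the usage comment, and the cache-busting query parameter are all assumptions of mine, and the 500 ms threshold is simply the figure from the question.

```javascript
// Decide which form to show from a measured time in milliseconds.
// The 500 ms default threshold comes from the question; adjust after real testing.
function chooseForm(elapsedMs, thresholdMs = 500) {
  return elapsedMs <= thresholdMs ? "A" : "B";
}

// Browser-only: time how long a tiny image takes to load (or fail),
// then hand the elapsed milliseconds to the callback.
function timeImageLoad(url, onDone) {
  const started = Date.now();
  const img = new Image();
  img.onload = img.onerror = function () {
    onDone(Date.now() - started);
  };
  // A unique query string keeps the browser from serving a cached copy.
  img.src = url + "?t=" + started;
}

// Usage in a page (hypothetical image URL):
// timeImageLoad("/ping.gif", function (ms) {
//   console.log("Show form " + chooseForm(ms));
// });
```

Since latency differs from visitor to visitor, a split made this way will not necessarily be an even 50/50.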
Sorry if I'm duplicating threads here, but I wasn't able to find an answer to this anywhere else on StackOverflow.
Basically what I'm trying to do is make a list in which variables entered in a form by a user can be kept. At the moment, I have the code which makes this possible, and functional, however the variables entered in the form only appear on the list after the user hits submit... As soon as I refresh the page or go to the page from somewhere else, the variables disappear. Is there any way I can stop this from happening?
Edit: here is the code:
//Page 1
<?php
session_start();
$entries = array(
    0 => $_POST['signup_username'],
    1 => $_POST['signup_email'],
    2 => $_POST['signup_city']);
$entries_unique = array_unique($entries);
$entries_unique_values = array_values($entries_unique);
echo "<a href='Page 2'>Link</a>";
$_SESSION['entries_unique_values'] = $entries_unique_values;
?>
//Page2
<?php
session_start();
$entries_unique_values = $_SESSION['entries_unique_values'];
foreach($entries_unique_values as $key => $value) {
    $ValueReplace = $value;
    echo "<br /><a href='http://example.com/members/?s=$ValueReplace'>" . $value . "</a><br/>";
}
?>
Your question is really quite vague. The answer depends on how much data you have to store and for how long you need it to exist.
By variable I assume you mean data the user has entered and that you want to put into a variable.
I also presume that the list of variables is created by PHP when the form is submitted.
PHP will only create the variable list when the form is submitted, because PHP runs entirely on the server; you will not have or see the variables until then.
If you wanted to see the list as it is being created, you could use JavaScript; once you have your PHP variables, the JavaScript list isn't necessary.
Each time you request a PHP page, whether it is the same one or not, the server generates a totally new page. All variables from previous pages that aren't hard-coded are lost; unless you keep posting the variables from page to page, the server has no memory of them.
You have a few viable options.
1.) Keep passing the user-created variables in POST or GET requests so each page has the necessary info to work with. Depending on the situation, it might or might not be a good idea. If the data only needs to exist for one or two pages then it is OK, but it is bad if you need the data to be accessible from any page on your site.
2.) Start a session and store the variables in it. Good if the data only needs to be around while the user is connected to the site, but it will be lost if the user closes the window or after a timeout.
3.) Place a cookie. Not a good idea, but OK for simple data.
4.) Create a MySQL database and store the variable info in there. Great for permanent data. This is how I always store complex user data.
Just a few ideas for you to look into, as it is difficult to see what you really mean. Good luck.
Use a PHP session, or store the variable values in cookies via JS or PHP. It would be nice if you showed your working code :)
Your idea is fine; you just need to add a condition to your Page 1 so that it only sets your SESSION values when a POST is made. That way it will keep the values even if you refresh. Otherwise, when you visit the page without a POST, those values are overwritten by blank values, which is what you are seeing now. You can modify it like this:
<?php
session_start();
if(isset($_POST["signup_username"]))
{
    $entries = array(
        0 => $_POST['signup_username'],
        1 => $_POST['signup_email'],
        2 => $_POST['signup_city']);
    $entries_unique = array_unique($entries);
    $entries_unique_values = array_values($entries_unique);
    $_SESSION['entries_unique_values'] = $entries_unique_values;
}
echo "<a href='http://localhost/Calculator/form2.1.php'>Link</a>";
?>
You could use JavaScript and HTML5 local storage.
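A minimal sketch of what that could look like. The key name "signup_entries" and both function names are made up for illustration; in a browser you would pass window.localStorage as the storage argument:

```javascript
// Hypothetical storage key; any unique string would do.
const ENTRIES_KEY = "signup_entries";

// Save an array of entries as JSON text. `storage` is any object with the
// Web Storage interface (in a browser, pass window.localStorage).
function saveEntries(storage, entries) {
  storage.setItem(ENTRIES_KEY, JSON.stringify(entries));
}

// Load the saved entries, or an empty array when nothing has been stored yet.
function loadEntries(storage) {
  const text = storage.getItem(ENTRIES_KEY);
  return text === null ? [] : JSON.parse(text);
}
```

Unlike a PHP session, this data lives only in the visitor's browser, so the server never sees it unless you send it back yourself.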
My website relies completely on a random page generator that loads a page from a text file list. The code was kindly written by "lserni" on the forum. The script has been working perfectly the last few days, and it's happily processed over 100,000 page views in 3 days!
I noticed today however that it seems to have stopped working properly. If you are a brand new visitor to the page, or you've cleared your internet cache/cookies etc - When you load the page for the first time, it doesn't randomly generate a page.. it just shows a BLANK page. If you then refresh the page, the script works perfectly. I just can't get my head round it, but it's now resulted in a large drop in traffic! Hope you can help:
<?php
session_start();
if (!isset($_SESSION['urlist'])) // Do we know the user?
    $_SESSION['urlist'] = array(); // No, start with empty list
if (empty($_SESSION['urlist'])) // Is the list empty?
{
    $_SESSION['urlist'] = file("linklist.txt"); // Fill it.
    $safe = array_pop($_SESSION['urlist']);
    shuffle($_SESSION['urlist']); // Shuffle the list
    array_push($_SESSION['urlist'], $safe);
}
$url = trim(array_pop($_SESSION['urlist']));
header("Location: $url");
?>
It's actually the LAST item in the file that's used first if there is no session data.
{
    $safe = array_pop($_SESSION['urlist']); // gets item at the END of the array
    shuffle($_SESSION['urlist']);
    array_push($_SESSION['urlist'], $safe); // puts item at the END of the array
}
$url = trim(array_pop($_SESSION['urlist']));// gets item at the END of the array
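So if you introduced a blank line at the end of your text file, that may be your issue: the last entry would then be empty after trim(), and a brand new visitor would be redirected to an empty URL.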
I would suggest, after the header call, add some HTML that explains where the user is being redirected to. All being well nobody will ever see it, but it could help diagnose why the user gets an empty page.
I have a PHP page that uses jQuery to let a user update a particular item without needing to refresh the page. It is an availability update where they can change their availability for an event to Yes, No, or Maybe. Each time they click on the link the appropriate jQuery function is called to send data to a separate PHP file (update_avail.php) and the appropriate data is returned.
Yes
Then when clicked the params are sent to a PHP file which returns back:
No
Then, if clicked again the PHP will return:
Maybe
It all works fine and I'm loving it.
BUT--
I also have a total count at the bottom of the page that is PHP code to count the total number of users that have selected Yes as their availability by simply using:
<?php echo count($event1_accepted); ?>
How can I make it so that if a user changes their availability it will also update the count without needing to refresh the page?
My thoughts so far are:
$var = 1;
while ($var > 0) {
count($day1_accepted);
$var = 0;
exit;
}
Then add a line to my 'update_avail.php' (which gets sent data from the jQuery function) to make $var = 1
Any help would be great. I would like to stress that my main strength is PHP, not jQuery, so a PHP solution would be preferred, but if necessary I can tackle some simple jQuery.
Thanks!
In the response from update_avail.php return a JSON object with both your replacement html and your new counter value.
Or, to keep it simple: if they click "Yes", increment the counter; if they click No or Maybe and their previous action wasn't No or Maybe, decrease the counter.
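The client-side rule can be sketched as a small function. This assumes the three states are the strings "Yes", "No" and "Maybe" as in the question; the name countDelta is made up:

```javascript
// Return how much the "Yes" total should change when a user's
// availability goes from `previous` to `next`.
function countDelta(previous, next) {
  const wasYes = previous === "Yes";
  const isYes = next === "Yes";
  if (isYes && !wasYes) return 1;   // moved into "Yes"
  if (!isYes && wasYes) return -1;  // moved out of "Yes"
  return 0;                         // No <-> Maybe, or no change
}
```

Keep in mind that a count adjusted only in the browser can drift from the real total if other users change their availability in the meantime.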
Assuming your users are logged into the system, I'd recommend having a status field in the user table, perhaps as an enum with "offline", "available", "busy", "unavailable" or something similar, and querying the number of available users whilst updating the user's status.
If you were to do this, you'd need to extend your methods containing session_start() and session_destroy() to change the availability of the user to available / offline respectively.
The best way is the one suggested by Scuzzy with some improvements.
In your php, get the count from the database and return a JSON object like:
{ "count": 123, "html": "Yes" }
In your page, in the ajax response you get the values and update the elements:
...
success: function(data) {
    $("#linkPlaceholder").html(data.html);
    $("#countPlaceholder").html(data.count);
}
...
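If the response reaches your callback as plain text rather than an already-parsed object, it has to be parsed first, and strict JSON requires double-quoted keys and strings. A small sketch, assuming the two fields are named count and html as above (the function name is made up):

```javascript
// Parse the response text from update_avail.php into the two values
// the page needs. Throws a SyntaxError if the text is not valid JSON.
function parseAvailabilityResponse(text) {
  const data = JSON.parse(text);
  return { count: Number(data.count), html: String(data.html) };
}

// Usage inside the success callback, if `data` arrives as a string:
// const parsed = parseAvailabilityResponse(data);
// $("#linkPlaceholder").html(parsed.html);
// $("#countPlaceholder").html(parsed.count);
```

On the PHP side, json_encode() produces correctly quoted JSON, so it is safer than building the string by hand.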