What's up with this Google URL? - php

So I know a little bit of PHP, and while I was making custom search links I realized that today Google's URL after searching for something like "hi" isn't just
http://www.google.com/search?q=hi
Instead it's this:
http://www.google.com/#sclient=psy&hl=en&site=&source=hp&q=hi&pbx=1&oq=hi&aq=f&aqi=g5&aql=&gs_sm=e&gs_upl=1705l1911l0l2131l2l2l0l0l0l0l173l299l0.2l2l0&bav=on.2,or.r_gc.r_pw.r_cp.&fp=6c03fc000f912511&biw=1366&bih=681
Just wondering if someone has some insight into what kind of info the rest of the URL is passing along.

Some examples of the data passed along:
hl=en #Locale: english
source=hp #Source: homepage
q=hi #Query: hi
In general, it's just whatever extra data that Google wishes to capture or pass along. As you've probably noticed, it's not required data in the sense that http://www.google.com/search?q=query works fine.

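The second one most likely uses a fragment (the part after the #) to hold state. This is often used for navigating without full page loads: the browser does not send the fragment to the server, so scripts on the page can change it to record each state, and the back button then steps through those states without leaving the page.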

If you notice, outside of all the parameters passed, the real difference is:
www.google.com/#
vs:
www.google.com/search?all_the_name=values&
In the first example no page is defined; the pound symbol tells the browser to stay where it is. Google has a feature that loads the entire results page as you type (not just the auto-suggest). I can't seem to find how to trigger it, though.
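To see the difference between the two URL styles concretely, here is a minimal sketch using Python's standard urllib.parse. The second URL is a shortened version of the one in the question (an assumption made for readability); the point is only that everything after the # lands in the fragment, not in the query string.

```python
from urllib.parse import urlsplit, parse_qs

plain = "http://www.google.com/search?q=hi"
# Shortened from the URL in the question; only a few parameters are kept.
hashed = "http://www.google.com/#sclient=psy&hl=en&source=hp&q=hi"

for url in (plain, hashed):
    parts = urlsplit(url)
    print(parts.path, "| query:", parts.query, "| fragment:", parts.fragment)

# The fragment happens to be formatted like a query string,
# so it can be parsed the same way.
fragment_params = parse_qs(urlsplit(hashed).fragment)
print(fragment_params["q"])
```

For the second URL the query component is empty, which is why a server-side script would see none of those parameters.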

Related

In Python, how can I request specific data from a dynamically loaded website?

I want to load pages from PeoplePerHour.com into Python to run some data analysis, but it keeps getting data from a page I didn't ask for. I think it must go to the main page and then somehow refresh into the page I asked for.
For example:
I want to pull the prices from all users at http://www.peopleperhour.com/freelance/data+analyst, and the data spans multiple pages.
Say I want to request page 2, http://www.peopleperhour.com/freelance/data+analyst#page=2. If I go there in a browser, it works fine and pulls up page 2, but I think it pulls up page 1 first and then "refreshes" into page 2. If I access it from Python, it loads the HTML of the first page and never sees page 2.
Here's my code:
import requests
from pattern import web
import re
import pandas as pd

def list_of_prices(url):
    # Fetch the page and parse it into a DOM
    html = requests.get(url).text
    dom = web.DOM(html)
    prices = []
    # Each price tag holds a currency (<sup>) and an amount (<span>)
    for person in dom('.freelancer-list-item .medium.price-tag'):
        currency = person('sup')
        amount = person('span')
        prices.append([currency[0].content if currency else 'na',
                       amount[0].content if amount else 'na'])
    return prices

list_of_prices('http://www.peopleperhour.com/freelance/data+analyst#page=2')
No matter what, this returns the prices from page 1.
What is going on that I'm just not seeing?
If I understand correctly, you want to iterate through the pages. If that's the case, I believe the problem is with your URL.
Here's the URL you gave:
http://www.peopleperhour.com/freelance/data+analyst#page=2
The problem is that "page" is not a bookmark on that page. The #page=2 part tells the browser to scroll to a bookmark called "page=2" within the same page; it is never sent to the server.
Here's the URL for the Next button in that site:
http://www.peopleperhour.com/freelance/data+analyst?sort=most-relevant&page=2
You'll see it says "&page=2", which means something else. In their code, "page" is a variable passed via the URL, with a value of 2. You use the "&" when there is more than one of these variables. Also, you are missing the "?" symbol: to pass variables via the URL, you put a ? followed by the name=value pairs for your variables.
So, easy fix, change your url to this:
http://www.peopleperhour.com/freelance/data+analyst?page=2
That's in comparison to your old url:
http://www.peopleperhour.com/freelance/data+analyst#page=2
As a quick test, copy/paste the corrected URL into your web browser. You will see it is now on page 2.
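The fix can be sketched without touching the network. The snippet below first shows what is left of the original URL once the fragment is cut off (which is all an HTTP client requests), then builds the ?page=N form. The helper name page_url is hypothetical, and it assumes the site keeps accepting a page query parameter as described above.

```python
from urllib.parse import urlsplit, urlunsplit, urlencode, urldefrag

BASE = "http://www.peopleperhour.com/freelance/data+analyst"

def page_url(base, page):
    """Return base with ?page=N as its query string (hypothetical helper).

    Any existing query string or fragment on base is dropped.
    """
    scheme, netloc, path, _query, _fragment = urlsplit(base)
    return urlunsplit((scheme, netloc, path, urlencode({"page": page}), ""))

# The fragment never reaches the server, so "#page=2" requests page 1.
sent, fragment = urldefrag(BASE + "#page=2")
print(sent)
print(fragment)

# The query-string form is what actually selects the page.
print(page_url(BASE, 2))
```

Each generated URL can then be passed to list_of_prices. With requests, passing params={"page": 2} to requests.get should build the same query string.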
Getting dynamic content (content generated by client-side code) is always very tricky. There is no easy solution, but if you really want to dig into it, I recommend PyV8, a Python wrapper for the V8 JavaScript engine.
Error in pattern when using pattern3 in python 3.6
What is the alternative for running the same code under Python 3.6? The pattern library is not supported on Python 3.6, so I have to install pattern3.
Thanks!

Keep Dynamic URL Content in URL on Each Page

I have a PHP website that I send users to via a Dynamic URL like this:
http://mwebsitehere.com/?gw=1
The page I send them to works great with the code I am using to do certain things if the dynamic content is set in the URL. But whenever they click on a link on the page (the links are ALWAYS changing), the dynamic content in the URL is completely gone. For instance:
Let's say they are on the homepage, which looks like this: http://mwebsitehere.com/?gw=1, and then they click on a link that looks like this: http://mwebsitehere.com/new-page/. Notice that the ?gw=1 is completely gone from the URL.
Is there a way to keep the dynamic content on every page if the URL has it?
For example, if it said ?gw=2, could all the links they click on somehow keep ?gw=2 on every page? And if it said ?gw=1, do the same thing?
Any help would be appreciated! Let me know if I need to explain my question better. Thanks!
I am also using WordPress, in case you know anything WordPress-specific! Thx!
The only reason to have GET variables such as ?gw=2 in the URL is if they are needed for that page. If you want them on all pages,
have your scripts check whether the value exists in the $_GET array or the $_COOKIE array. If it is in the $_GET array but not in the $_COOKIE array, set it in the cookies. That way your script will still see it by checking the cookies.
No sense in cluttering the URL with variables that don't always need to be shown.
If you want the exact same variable passed to every page, why not use
$_SESSION['gw'];
or
$_COOKIE['gw'];
to store "gw".
Otherwise you would have to pass it on via each link as follows
For example on page http://mwebsitehere.com/?gw=1
Link
There are a few ways you can do this.
You may use $_SERVER['QUERY_STRING'] and put it in every single link on your page. That keeps your links repeating the same query string as the current request.
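The link-rewriting idea is independent of PHP, so here is a rough sketch of it in Python. The function name carry_params and the keep list are hypothetical; in the PHP site the current_query argument would come from $_SERVER['QUERY_STRING'].

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def carry_params(link, current_query, keep=("gw",)):
    """Append selected parameters from the current query string to a link.

    Parameters already present on the link are left untouched.
    """
    carried = [(k, v) for k, v in parse_qsl(current_query) if k in keep]
    scheme, netloc, path, query, fragment = urlsplit(link)
    existing = parse_qsl(query)
    existing_keys = {k for k, _ in existing}
    merged = existing + [(k, v) for k, v in carried if k not in existing_keys]
    return urlunsplit((scheme, netloc, path, urlencode(merged), fragment))

print(carry_params("http://mwebsitehere.com/new-page/", "gw=1"))
```

Restricting the carried parameters to an explicit list avoids copying unrelated parameters onto every link.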
You should try storing data in sessions! Then you can carry data from a page to another. Take a look at the PHP manual.
Good luck!

Get top-frame (address bar in browser) URL in PHP?

I'm sorry for my English, it isn't so good...
I need help with a simple PHP program: if the page's URL contains a specific word, the content is shown; if it doesn't, the entire page redirects. I would use this to protect myself from frames that other webmasters create to "steal" my website's content.
That part works. But a friend of mine needs a blacklist of specific words, not a whitelist: if the URL contains a listed word, the entire page redirects. I tried using $_SERVER['HTTP_REFERER'], but if the "thief" puts the frame on an external website that is not on the blacklist and then frames that frame, the content is shown and the page doesn't redirect.
So I think it works like this: $_SERVER['HTTP_REFERER'] can only read the first-level frame, not the main page (top frame).
I really need help with this. I can't do it with JavaScript because we want to keep the code hidden.
If it is not possible, is there a way to pass a JS variable's value to a PHP variable in my case?
Thank you in advance!
This is only possible with JavaScript, and only if the top-level frame is on the same domain as yours. If that is the case, you could make an AJAX call to the server, passing along the URL, and the server could then determine whether or not to redirect.
Instead of showing what the code would look like, I'm much more inclined to persuade you against this approach. If you have content that you don't want people to steal, you should make your site non-public (i.e. users must login). If that is not possible, the content that you are worried about sounds like it shouldn't be on the internet.

Updating URL without apparent reloading

I'm building a webpage that has a sort of catalog in it. It shows the current item and its description, with thumbnails for other items below it. If I click on the thumbnail of a different item, a script changes the description and the big image to the selected item. The problem is that I want this reflected in the URL, so the user can send the URL to others as a link to the selected item. But I haven't found a way to change the URL without reloading the page, and for aesthetic reasons I don't want to reload the page.
Any ideas how to do this?
The solution is to use location.hash. Also, to implement it correctly, you might want to read this article from Google: Making AJAX Applications Crawlable
There is no reliable (cross-browser) way to change the URL in the address bar without reloading the page. The very act of changing window.location.href (which I imagine is what you're trying to do) tells the browser to reload the page (even window.location.href = window.location.href; will do it in some browsers).
I think you would have to put a [link to this page] element on the page and change that instead - you can easily populate it with the current URL either at the server side or using a window.onload function and manipulate it in the same way as you have been doing using element.value or element.innerHTML (depending on what type of element you choose to contain the link).
You can do it with hashes (see the window.location.hash property) but this can be messy programmatically.
The usual, currently-broadly-compatible way is to use a hash, e.g.:
http://myniftystore.com/catalog#11321R-red-shirt
then
http://myniftystore.com/catalog#11321B-blue-shirt
then
http://myniftystore.com/catalog#95748B-blue-slacks
...as you navigate items. You can change the hash on the page by assigning to the location.hash property, without reloading. This requires that you use some client-side script in the first place to figure out what to show when the user first goes to the URL (by examining the location.hash).
Google has a proposal out for how to make these things crawlable. Personally, I think they've really messed it up by requiring that weird hashtag (#!xyz rather than just #xyz), but if it's me or Google, I think I know who'll win. :-)
Coming down the pike there's the whole history API, but support isn't very thick on the ground yet (particularly not — cough — from certain vendors).

Problem with AJAX (page refresh)

Hi, I'm using AJAX to load all the pages into the main page, but I'm not able to control refreshes: if somebody refreshes, the page goes back to the main page. Can anybody give me a solution? I would really appreciate the help.
You could add an anchor (#something) to your URL and, on every AJAX event, change it to something you can decode back into a particular page state.
Then in body.onload, check the anchor and decode it to restore that state.
The back button (at least in Firefox) will work fine too. If you want the back button to work in IE6, you need to add some iframe magic.
Check the various JavaScript libraries designed to support the back button or history in an AJAX environment; this is probably what you really need. One example is the jQuery history plugin.
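The encode/decode step is the same in any language. Here is a minimal Python sketch of it, assuming the page state is a flat dictionary of strings; in the browser the equivalent logic would be written in JavaScript against location.hash. The function names and the example state keys are hypothetical.

```python
from urllib.parse import urlencode, parse_qsl

def encode_state(state):
    """Turn a flat dict of page state into an anchor such as '#item=3&page=news'."""
    # Sorting keeps the anchor stable regardless of dict ordering.
    return "#" + urlencode(sorted(state.items()))

def decode_state(anchor, default=None):
    """Recover the state dict from an anchor; use default if the anchor is empty."""
    state = dict(parse_qsl(anchor.lstrip("#")))
    return state or dict(default or {})

anchor = encode_state({"page": "news", "item": "3"})
print(anchor)
print(decode_state(anchor))
# A bare URL with no anchor falls back to the main page.
print(decode_state("", default={"page": "main"}))
```

Because the state round-trips through the anchor, a refresh or a pasted link restores the same view instead of the main page.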
You can rewrite the current url so it gives pointers to where the user was - see Facebook for examples of this.
I always store the 'current' state in PHP session.
So, user can refresh at any time and page will still be the same.
if somebody refreshes the page returns back to the main page can anybody give me any solutions
This is a feature, not a bug in the browser. You need to change the URL for different pages. Nothing is worse than websites that use any kind of magic, on either the client side or the server side, that causes a bunch of completely different pages to share the same URL. Why? How the heck am I gonna link to a specific page? What if I like something and want to copy and paste the URL into an IM window?
In other words, consider the use cases. What constitutes a "page"? For example, if you have a website for stock quotes, should each stock have a unique URL? Yes. Should you have a unique URL for every variation you can make to the graph (i.e. logarithmic vs. linear, etc.)? It depends. If you don't, at least provide a "share this" link like Google Maps does, so there is some kind of URL that you can share.
That all said, I agree with the suggestion to mess with the #anchor and parse it out. Probably the most elegant solution.
