I am working on a project that needs to extract data from a website by parsing its HTML and getting the content out of the title tag and meta description. I am able to parse that data from a normal website, but in this case the website can only be accessed using an IP address as the URL. Is it possible to extract the data, and what solution can be used?
A URL doesn't need a domain name: something like http://127.0.0.1/test.php is a valid URL, and any scraper should handle it correctly.
This requires the website to respond to requests made to the IP-based URL. Sites on private servers or very big sites might do so; sites on ordinary shared hosters usually don't, as those hosters serve multiple sites from the same IP.
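For example, here is a minimal PHP sketch of the extraction itself; the IP address and path are placeholders, and the commented-out stream context shows one way to pass a Host header if the server needs one (as on shared hosting):

```php
<?php
// Fetch a page by its IP-based URL and extract the <title> tag
// and the meta description. The URL below is a placeholder.
$url = 'http://203.0.113.10/index.html';

// If the server needs a Host header (shared hosting), you could use:
// $ctx  = stream_context_create(['http' => ['header' => "Host: example.com\r\n"]]);
// $html = file_get_contents($url, false, $ctx);
$html = file_get_contents($url);
if ($html === false) {
    die('Could not fetch the page');
}

$doc = new DOMDocument();
@$doc->loadHTML($html); // suppress warnings from malformed HTML

$title = '';
$titleNodes = $doc->getElementsByTagName('title');
if ($titleNodes->length > 0) {
    $title = trim($titleNodes->item(0)->textContent);
}

$description = '';
foreach ($doc->getElementsByTagName('meta') as $meta) {
    if (strtolower($meta->getAttribute('name')) === 'description') {
        $description = $meta->getAttribute('content');
        break;
    }
}

echo "Title: $title\nDescription: $description\n";
```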
Related
I am developing a website similar to a web forum. Users post queries and others help them through their replies. On many websites like mine, the topic of the query is included in the URL, e.g. www.sample.com/topic-1.html or www.sample.com/topic-2.html, and these links can be found via search engines.
How can I dynamically generate the HTML files and configure my website so that search engines can access it?
No, such sites usually aren't putting these files on the web server manually. They are rewriting URLs using the web server (e.g. Apache or nginx). Check the response headers to get more info about what happens behind the scenes.
Please see How to create friendly URL in php?
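To illustrate, here is a minimal sketch of that rewriting, assuming Apache with mod_rewrite; the rewrite rule, the topic.php script, and the stub lookup are all hypothetical names:

```php
<?php
// topic.php - a front-controller sketch for friendly URLs.
// A rewrite rule like this one, placed in .htaccess, maps a
// request for /topic-123.html onto this script:
//
//   RewriteEngine On
//   RewriteRule ^topic-([0-9]+)\.html$ topic.php?id=$1 [L]

$id = isset($_GET['id']) ? (int) $_GET['id'] : 0;

// Stand-in for a real database lookup of the topic.
function getTopic(int $id): array
{
    return ['title' => "Topic #$id"];
}

$topic = getTopic($id);
?>
<html>
<head><title><?= htmlspecialchars($topic['title']) ?></title></head>
<body><h1><?= htmlspecialchars($topic['title']) ?></h1></body>
</html>
```

No physical topic-123.html file ever exists; the server hands every matching URL to the same script, so search engines still see stable, crawlable links.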
I'm trying to detect the CNAME used by incoming traffic so I can customize a site accordingly. I have a site that displays some info to the client (it's actually more complicated, but this will work as an example). Some of my customers send their own customers to my site to see this info. They are using a CNAME to get to my site. I would like to display certain logos etc. to the viewer based on which CNAME was used to get to my page.
What I have come up with so far is using dns_get_record. Am I on the right track with that, if it can be done at all?
Thanks for any help!
It depends on the language your web content is written in, and also on what web server you are using. What you want to look for in the web content is the "Host header". If you want to handle it at the web server level instead, what you want to look for is "virtual hosts".
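In PHP, the Host header is exposed as $_SERVER['HTTP_HOST']. A minimal sketch, where the CNAMEs and logo paths are made-up examples:

```php
<?php
// Branch on the Host header the browser sent. When a customer's
// CNAME (e.g. info.customer-a.com) points at your site, this header
// carries the name the visitor actually used.
$host = strtolower($_SERVER['HTTP_HOST'] ?? '');

switch ($host) {
    case 'info.customer-a.com':
        $logo = '/logos/customer-a.png';
        break;
    case 'info.customer-b.com':
        $logo = '/logos/customer-b.png';
        break;
    default:
        $logo = '/logos/default.png';
}

echo '<img src="' . htmlspecialchars($logo) . '" alt="logo">';
```

Note that the web server must also be configured to answer for those names (e.g. a ServerAlias or catch-all virtual host), otherwise the request never reaches your script.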
Use case: I am working on an application where a user can build his own HTML template and publish it to his domain by selecting one. He can use different components to do this.
Issue: I want to transfer the pages built by the user from my domain to his created domain. Something similar to what is done here. Currently, in the prototype, I write the content to a file (a .html file, using an AJAX request), then make an FTP connection to the user's domain (possible because the domains are created dynamically by the application) and transfer the files to his domain.
This, I believe, cannot be the right way, and I would like to build it around a REST service, which would make it flexible and also secure.
Research: I went through the web and found that some websites handle this very well (like the one mentioned above), and I believe they have built it as a service. Am I on the right track?
I would like suggestions on the possibilities, so I can move forward. I am using PHP on the server side and JavaScript on the client side.
I can see some possible features to add safety to your service (a combined sketch follows this list):
Check the source IP of the request, and only allow your own servers to make the REST calls.
Add header('Access-Control-Allow-Origin: ...'), echoing back the caller's domain when it is on your list of allowed domains, instead of sending *.
Add security tokens tied to the caller machine's IP address; these would work as a password for that machine.
Send all the data using POST instead of GET.
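Here is a minimal sketch of those checks at the top of a REST endpoint; the IPs, origin, and token value are placeholders:

```php
<?php
// Guard clauses for the REST endpoint, in the order listed above.
$allowedIps     = ['198.51.100.7'];                 // your own servers
$allowedOrigins = ['https://builder.example.com'];  // allowed callers

// 1) Only accept calls from known machines.
if (!in_array($_SERVER['REMOTE_ADDR'], $allowedIps, true)) {
    http_response_code(403);
    exit('Forbidden');
}

// 2) CORS: echo the origin back only if it is on the list.
$origin = $_SERVER['HTTP_ORIGIN'] ?? '';
if (in_array($origin, $allowedOrigins, true)) {
    header('Access-Control-Allow-Origin: ' . $origin);
}

// 3) + 4) Require POST, then check the per-machine token.
if ($_SERVER['REQUEST_METHOD'] !== 'POST') {
    http_response_code(405);
    exit('Use POST');
}
if (!hash_equals('placeholder-token-for-198.51.100.7', $_POST['token'] ?? '')) {
    http_response_code(401);
    exit('Bad token');
}

// ... handle the actual page-publishing request here ...
```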
To update the page, I would go this way (a sketch of the read step follows the list):
keep the data in the database on server 1;
add a page that reads the page content from the database, based on a domain parameter;
on the domain's page, call server 1 with the domain parameter to get the contents.
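A sketch of that middle step, assuming a pages table with domain and html columns (both names are made up):

```php
<?php
// page.php on server 1: return the stored content for one domain.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$stmt = $pdo->prepare('SELECT html FROM pages WHERE domain = ?');
$stmt->execute([$_GET['domain'] ?? '']);
$html = $stmt->fetchColumn();

if ($html === false) {
    http_response_code(404);
    exit('No page stored for this domain');
}

echo $html;
```

The published domain then pulls its content from server 1 instead of holding an FTP-copied file, so an update to the database shows up everywhere at once.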
I have a website that's essentially a people directory.
Each person has a profile page. I want to somehow enable other webmasters to take a snippet of code that they can paste on their websites, and it will pull information from my page and format it in my brand colours etc., with a link back to my website. Is this possible, or is an iframe the only way?
You can do it in an iframe, but that comes with shortcomings; or you can use jQuery/JavaScript to load the content from your site inside a div or some other container on the remote site. But you would face cross-domain issues due to the same-origin policy.
So you have to explicitly allow, in your app, the origins you prefer. You can do that using JSONP or CORS: JSONP only supports GET requests, while CORS is the more appropriate way and allows any type of request.
To understand CORS in more depth, read this.
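As an illustration, here is a sketch of the JSONP variant; profile.php, the field values, and the markup are all made up, and the embed snippet webmasters would paste is shown in the trailing comment:

```php
<?php
// profile.php - JSONP endpoint returning a ready-styled profile card.
$profile = [
    'name' => 'Jane Doe',                                // from your DB
    'url'  => 'https://directory.example.com/jane-doe',  // link back
];

$html = '<div class="directory-card">'
      . '<a href="' . htmlspecialchars($profile['url']) . '">'
      . htmlspecialchars($profile['name']) . '</a></div>';

// Sanitize the callback name, then wrap the JSON payload in it.
$callback = preg_replace('/[^A-Za-z0-9_]/', '', $_GET['callback'] ?? 'cb');
header('Content-Type: application/javascript');
echo $callback . '(' . json_encode(['html' => $html]) . ');';

// Webmasters would paste something like:
// <div id="card"></div>
// <script>
//   function cb(data) { document.getElementById('card').innerHTML = data.html; }
// </script>
// <script src="https://directory.example.com/profile.php?callback=cb"></script>
```

Because the payload is loaded as a script, this works on any site with a plain GET request; for POST or finer-grained control you would add the CORS headers instead.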
I've got a PHP RSS feed. There are a lot of domains using my RSS feed for news, and I'd like to be able to track which ones they are. I tried using $_SERVER['http_referrer'] to no avail.
Perhaps if you link an image in your feeds, the client will load it, and then you will have a referer to look for.
You could, of course, link to a script which doesn't really serve a visible image but tracks the traffic.
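A minimal sketch of such a tracking script; the file names are placeholders:

```php
<?php
// pixel.php - referenced as an <img> from the feed items.
// Log the referer, then serve a 1x1 transparent GIF.
$referer = $_SERVER['HTTP_REFERER'] ?? 'unknown';
file_put_contents(
    __DIR__ . '/feed-referers.log',
    date('c') . ' ' . $referer . "\n",
    FILE_APPEND
);

header('Content-Type: image/gif');
// The smallest valid transparent GIF, base64-encoded.
echo base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');
```

Keep in mind that many feed readers strip or never load images, so this only catches clients that render them.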
$_SERVER["REMOTE_ADDR"] is the best you can do to find out the client's IP address. That is not identical to the domain of the site that a possible bot would be working for, though, and will not tell you in what ways your content is re-used.
One thing you could do is attach a "?from=feed" flag to any links that point to your site from the feed. That way, you could at least tell how many visitors come to your site through your feed. The referer variable will then contain the site the link was published on. This is pretty accurate but of course works only if people click the links.
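On the target pages, checking for that flag could look like this sketch; the log file name is a placeholder:

```php
<?php
// At the top of the linked pages: record visits that arrived
// through links carrying the ?from=feed flag.
if (($_GET['from'] ?? '') === 'feed') {
    $site = $_SERVER['HTTP_REFERER'] ?? 'direct/unknown';
    file_put_contents(
        __DIR__ . '/feed-clicks.log',
        date('c') . ' ' . $site . "\n",
        FILE_APPEND
    );
}
```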
Have you tried your web server logs? You could parse them and filter for all lines listing access to the feed resource.
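For instance, a sketch that filters an Apache combined-format access log and tallies referring domains; the log path and feed path are assumptions, and, as noted above, many feed clients send no referer at all:

```php
<?php
// Count referring domains for requests to the feed.
$lines  = file('/var/log/apache2/access.log', FILE_IGNORE_NEW_LINES) ?: [];
$counts = [];

foreach ($lines as $line) {
    if (strpos($line, 'GET /rss.php') === false) {
        continue; // not a feed request
    }
    // Combined format: ... "request" status size "referer" "agent"
    if (preg_match('/"[^"]*" \d+ \S+ "([^"]*)"/', $line, $m)) {
        $domain = parse_url($m[1], PHP_URL_HOST) ?: '(no referer)';
        $counts[$domain] = ($counts[$domain] ?? 0) + 1;
    }
}

arsort($counts);
print_r($counts);
```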