I am trying to delete a document from Solr using deleteByQuery.
When I delete a document using only its id, it works fine.
However, when I try to delete based on more than one attribute, it deletes every document that matches even one of those attributes.
For example, if I have two documents, say:
{
    "id": "232",
    "Author": "DEFG",
    "Name": "Alphabet",
    "Description": "Franz jagt im komplett verwahrlosten Taxi quer durch Bayern",
    "_version_": 1513077984291979300
},
{
    "id": "231",
    "Author": "ABCD",
    "Name": "Alphabet",
    "Description": "Franz jagt im komplett verwahrlosten Taxi quer durch Bayern",
    "_version_": 1513077999721775000
}
and I want to delete the document where id is 231 and Author is "ABCD", this is the query I wrote to delete that particular document:
$id=231;
$author= "ABCD";
$client->deleteByQuery("id:$id, Author:$author");
$client->commit();
It deletes both documents (id 231 and id 232) rather than only the one.
Can anyone explain this behaviour or suggest a way to delete only the matching document?
Thanks.
The delete query uses the same syntax as a search query, so you can easily test it as a search and tune it until it matches only what you want. Your comma-separated query is parsed as two optional clauses (effectively an OR, since OR is the default operator), which is why documents matching either term get deleted. In your case, id:$id AND Author:$author should work.
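As a minimal sketch reusing the $client and variables from the question (quoting the author value is a defensive assumption, in case it ever contains spaces or reserved characters):
$id = 231;
$author = "ABCD";
// Explicit AND requires both clauses to match the same document
$client->deleteByQuery("id:$id AND Author:\"$author\"");
$client->commit();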
I'm using the Facebook Lead Ads API. I need to retrieve all fields from a form by ID. I know I can:
get all forms by calling /<PAGE_ID>/leadgen_forms, but it doesn't return the fields
get a form by /<FORM_ID>, but it returns only the name and a little other data, not the fields
get all leads by /<FORM_ID>/leads - this gives me the fields in each lead, but only if I have leads; there's also another problem with this solution - the order of the fields is random
Is there any dedicated way to retrieve leadgen form fields, even when there are no leads yet?
I found out that I can download the CSV, and its first row gives me all the field IDs (and some other columns). I'm not sure, though, how to read the content of this file in PHP, because I get some HTML back when I try to use file_get_contents() on it.
You can get these by requesting the non-default field questions, so the URL becomes /<form_id>?fields=id,name,questions.
The official docs don't describe the fields available for reading, but the questions field and its nested fields are described in the parameters used for creating a lead form.
Example output
{
    "id": "1234567890000",
    "name": "test form",
    "questions": [
        {
            "key": "name",
            "label": "Full Name",
            "type": "FULL_NAME",
            "id": "777888999000111"
        },
        {
            "key": "email",
            "label": "Email Address",
            "type": "EMAIL",
            "id": "111222333444555"
        }
    ]
}
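For reference, a minimal PHP sketch of that call (hypothetical: $token is assumed to hold a valid page access token, and the form ID is the one from the example output):
$formId = '1234567890000';
$url = "https://graph.facebook.com/$formId?fields=id,name,questions&access_token=$token";
$form = json_decode(file_get_contents($url), true);
foreach ($form['questions'] as $q) {
    // e.g. "email => Email Address"
    echo $q['key'] . ' => ' . $q['label'] . PHP_EOL;
}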
Just a warning, since this answer comes up first in Google searches.
As of Facebook API v5.0 the field "qualifiers" has been removed and will throw an error.
Replace it with "questions", which has similar (if not identical) syntax to qualifiers. Found out the hard way on a production server...
I'm currently trying to find all the pages where images/media from a particular category are being used on Wikimedia Commons.
Using the API, I can list all the images with no problem, but I'm struggling to make the query also return all the pages where those items are used.
Here is an example category with only two media images
https://commons.wikimedia.org/wiki/Category:Automobiles
Here is the API call I am using
https://commons.wikimedia.org/w/api.php?action=query&prop=images&format=json&generator=categorymembers&gcmtitle=Category%3AAutomobiles&gcmprop=title&gcmnamespace=6&gcmlimit=200&gcmsort=sortkey
The long-term aim is to find all the pages that images from our collections appear on, and then get all the tags from those pages about the images. We can then use this to enhance our archive of information about those images, and hopefully use linked data to find relevant images we may not know about from DBpedia.
I might have to do two queries, first get the images then request info about each page, but I was hoping to do it all in one call.
Assuming that you don't need to recurse into subcategories, you can just use a prop=globalusage query with generator=categorymembers, e.g. like this:
https://commons.wikimedia.org/w/api.php?action=query&prop=globalusage&generator=categorymembers&gcmtitle=Category:Images_from_the_German_Federal_Archive&gcmtype=file&gcmlimit=200&continue=
The output, in JSON format, will look something like this:
// ...snip...
"6197351": {
    "pageid": 6197351,
    "ns": 6,
    "title": "File:-Bundesarchiv Bild 183-1987-1225-004, Schwerin, Thronsaal-demo.jpg",
    "globalusage": [
        {
            "title": "Wikipedia:Fotowerkstatt/Archiv/2009/M\u00e4rz",
            "wiki": "de.wikipedia.org",
            "url": "https://de.wikipedia.org/wiki/Wikipedia:Fotowerkstatt/Archiv/2009/M%C3%A4rz"
        }
    ]
},
"6428927": {
    "pageid": 6428927,
    "ns": 6,
    "title": "File:-Fernsehstudio-Journalistengespraech-crop.jpg",
    "globalusage": [
        {
            "title": "Kurt_von_Gleichen-Ru\u00dfwurm",
            "wiki": "de.wikipedia.org",
            "url": "https://de.wikipedia.org/wiki/Kurt_von_Gleichen-Ru%C3%9Fwurm"
        },
        {
            "title": "Wikipedia:Fotowerkstatt/Archiv/2009/April",
            "wiki": "de.wikipedia.org",
            "url": "https://de.wikipedia.org/wiki/Wikipedia:Fotowerkstatt/Archiv/2009/April"
        }
    ]
},
// ...snip...
Note that you will very likely have to deal with query continuations, since there may easily be more results than MediaWiki will return in a single request. See the linked page for more information on handling those (or just use an MW API client that handles them for you).
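As a rough sketch of handling those continuations with plain file_get_contents (same query as above; error handling omitted):
$base = 'https://commons.wikimedia.org/w/api.php?action=query'
    . '&prop=globalusage&generator=categorymembers'
    . '&gcmtitle=Category:Images_from_the_German_Federal_Archive'
    . '&gcmtype=file&gcmlimit=200&format=json';
$params = array('continue' => '');
do {
    $result = json_decode(file_get_contents($base . '&' . http_build_query($params)), true);
    if (isset($result['query']['pages'])) {
        foreach ($result['query']['pages'] as $page) {
            // ... collect $page['globalusage'] entries here ...
        }
    }
    // The API keeps returning a 'continue' block until all results have been delivered.
    $params = isset($result['continue']) ? $result['continue'] : null;
} while ($params);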
I don't understand your use case ("our collections"?), so I don't know why you want to use the API directly, but if you want to recurse into categories you're going to end up reinventing a lot of wheels.
Most people use the tools made by Magnus Manske, creator of MediaWiki: in this case it's GLAMorous. Example with 3 levels of recursion (finds 186k images, 114k usages): https://tools.wmflabs.org/glamtools/glamorous.php?doit=1&category=Automobiles&use_globalusage=1&depth=3
Results can also be downloaded in XML format, so it's machine-readable.
I'm using the Chrome Advanced Rest Client to test the AtTask API. I'm getting a lot of stuff figured out, but also getting some unexpected results. The latest is when adding new records to the AtTask Time Off Calendar.
I am able to easily add time off to the calendar. I am using the POST method with the following URL:
https://COMPANY.attasksandbox.com/attask/api/v4.0/resvt?sessionID=SESSIONIDGOESHERE&userID=USERIDGOESHERE&startDate=2014-11-24T00:00:00&endDate=2014-11-28T23:59:59
This marks all the days from 11/24 through 11/28 as time off. Great, so far. The problem is that it removes all other time-off records for the specified user. I am not issuing a DELETE, so I don't understand why the records are being deleted. More importantly, I don't understand how to keep them from being deleted.
Once again, thanks in advance.
Time-off in AtTask is stored as a collection, and when you make an edit to a collection it replaces the collection's contents with the data provided in the update. This is why your call is removing existing data.
In order to add a new time-off entry you need to make two calls: one to get the existing time-off, and one to send the data back with the new dates added.
Note: I am using my own data, so the dates are a bit different from yours, but the concept is the same.
Your get call will be
GET /api/resvt/search?userID=[userID]&fields=endDate,startDate,ID
which returns something like
{
    "data": [
        {
            "ID": "547debb6000dea62198bd66b7c73e174",
            "objCode": "RESVT",
            "endDate": "2014-07-08T23:59:00:163-0600",
            "startDate": "2014-07-08T00:00:00:163-0600"
        },
        {
            "ID": "547debb6000dea61b8c695ba24918fe8",
            "objCode": "RESVT",
            "endDate": "2014-02-13T23:59:00:329-0700",
            "startDate": "2014-02-13T00:00:00:329-0700"
        }
    ]
}
Once you have this, you can add your new time-off to the collection using an updates command on the user object. Notice that you provide the IDs for the time-off entries already in the system, while the new time-off entry gets no ID:
PUT /attask/api/v4.0/user/[userID]?&sessionID=[sessionID]&updates={reservedTimes: [ { "ID": "547debb6000dea62198bd66b7c73e174", "objCode": "RESVT", "endDate": "2014-07-08T23:59:00:163-0600", "startDate": "2014-07-08T00:00:00:163-0600" }, { "ID": "547debb6000dea61b8c695ba24918fe8", "objCode": "RESVT", "endDate": "2014-02-13T23:59:00:329-0700", "startDate": "2014-02-13T00:00:00:329-0700" }, { "objCode": "RESVT", "endDate": "2014-02-14T23:59:00:329-0700", "startDate": "2014-02-14T00:00:00:329-0700" } ] }
This is a bit bulky and complex but is the only way to do this in the API at this time.
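Putting the two calls together with PHP curl might look roughly like this (a sketch, not tested against AtTask; $host, $sessionID and $userID are placeholders, and the dates are the ones from the example above):
// 1. Fetch the existing time-off entries.
$ch = curl_init("$host/attask/api/v4.0/resvt/search?sessionID=$sessionID&userID=$userID&fields=endDate,startDate,ID");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$existing = json_decode(curl_exec($ch), true);
curl_close($ch);

// 2. Append the new entry (note: no ID) and PUT the whole collection back.
$reserved = $existing['data'];
$reserved[] = array(
    'objCode'   => 'RESVT',
    'startDate' => '2014-02-14T00:00:00:329-0700',
    'endDate'   => '2014-02-14T23:59:00:329-0700',
);
$updates = urlencode(json_encode(array('reservedTimes' => $reserved)));
$ch = curl_init("$host/attask/api/v4.0/user/$userID?sessionID=$sessionID&updates=$updates");
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'PUT');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);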
I am currently experimenting a little with Couchbase Server.
I have tried to read a MySQL database table, build a document from each data row, and then insert each document with an id which I generate with
uniqid('table_name');
via cURL, using the POST method.
This works pretty well until the script has inserted around 7,050 documents; then a "No buffer space" exception is thrown.
So far I have not been able to fix this, so I decided to collect batches of, say, 50 rows, build a json_encode(d) string, and POST that via cURL instead.
This works as long as I don't set the id, but I can't figure out how to set the id of the inserted documents.
Currently I send my documents in a format like this:
{"docs": {
"_id": {
"geodata_de_54476f7e6adc57.14196038": {
"table": "geodata_de",
"country": "DE",
"postal_code": "01945",
"place_name": "Lindenau",
"state_name": "Brandenburg",
"state_code": "BB",
"province_name": "",
"province_code": "00",
"community_name": "Oberspreewald-Lausitz",
"community_code": "12066",
"lat": "51.4",
"lng": "13.7333",
"Xco": "3861.1",
"Yco": "943.614",
"Zco": "4979.07"
}
}, ...
}
}
but this just inserts ONE document with the above object.
Maybe there is someone here who can point me in the right direction.
I would use the Couchbase PHP SDK to insert these documents instead of using curl. http://docs.couchbase.com/developer/php-2.0/storing.html
Also, with Couchbase you do not have to set the ID in the document itself; it depends. I would instead take the ID you have in your example ("geodata_de_54476f7e6adc57.14196038") and use it as the key for the object in Couchbase. Then you do not necessarily need the _id field. A key in Couchbase can be up to 250 bytes, and you can make it meaningful to your application so you can do lookups by key extremely fast.
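A minimal sketch with the Couchbase PHP SDK 2.0 (the bucket name "geodata" and localhost are placeholder assumptions):
$cluster = new CouchbaseCluster('couchbase://127.0.0.1');
$bucket = $cluster->openBucket('geodata');

// The key carries the identity, so no _id field is needed inside the document.
$key = uniqid('geodata_de');
$doc = array(
    'table'       => 'geodata_de',
    'country'     => 'DE',
    'postal_code' => '01945',
    'place_name'  => 'Lindenau',
    // ... remaining columns from the MySQL row ...
);
$bucket->upsert($key, $doc); // insert-or-replace by key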
Another option, if you write your docs to the filesystem, is the cbdocloader utility, which is made specifically for bulk-loading documents. If you are on Linux it is in /opt/couchbase/bin/tools/cbdocloader.
I've been having trouble scraping the following website's content: http://www.qe.com.qa/wp/mw/MarketWatch.php
Using file_get_contents() never gets me the right tag. I would like to scrape the content of the following tag: td aria-describedby="grid_OfferPrice"
Is the website protected from scraping? When I try the same method with different websites it works. If so, what is a good workaround for this?
The way to see whether scraping works is to output what file_get_contents returns. If you get nothing back, or an error, then maybe your IP has been restricted by their admin.
If it returns their source code then it's working but maybe the tag you're looking for has not been found.
Eliminate failures in your process by answering these questions first, one at a time.
I viewed their source code and the aria attribute you are searching for doesn't appear to exist.
It seems they load the data on that page from another source which is at this page (http://www.qe.com.qa/wp/mw/bg/ReadData.php?Types=SELECTED&iType=SO&dummy=1401401577192&_search=false&nd=1401401577279&rows=100&page=1&sidx=&sord=asc)
If you want the data from that page then use file_get_contents on it directly.
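A minimal sketch of that (the field names match the sample below; error handling omitted):
$url = 'http://www.qe.com.qa/wp/mw/bg/ReadData.php?Types=SELECTED&iType=SO&_search=false&rows=100&page=1&sidx=&sord=asc';
$data = json_decode(file_get_contents($url), true);
foreach ($data['rows'] as $row) {
    // e.g. "QNBK: 184.00"
    echo $row['Symbol'] . ': ' . $row['OfferPrice'] . PHP_EOL;
}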
Pasting the data from that page into an online JSON editor gives you a neat way of quickly seeing whether this is a useful solution for you.
A sample of that data is listed below:
{
    "total": "140",
    "page": "1",
    "records": "140",
    "rows": [
        {
            "Topic": "QNBK/NM",
            "Symbol": "QNBK",
            "CompanyEN": "QNB",
            "CompanyAR": "QNB",
            "Trend": "-",
            "StateEN": "Tradeable",
            "StateAR": "المتداوله",
            "CatEN": "Listed Companies",
            "CatAR": "الشركات المدرجة",
            "SectorEN": "Banks & Financial Services",
            "SectorAR": "البنوك والخدمات المالية",
            "ShariahEN": "N/A",
            "ShariahAR": "N/A",
            "OfferVolume": "7503",
            "OfferPrice": "184.00",
            "BidPrice": "182.00",
            "BidVolume": "15807",
            "OpenPrice": "190.0",
            "High": "191.7",
            "Low": "181.0",
            "IMP": "182.0",
            "LastPrice": "182.0",
            "PrevClosing": "187.0",
            "Change": "-5.0",
            "PercentChange": "-2.6737",
            "Trades": "980",
            "Volume": "2588830",
            "W52High": "199.0",
            "W52Low": "145.0",
            "Value": "481813446.4"
        },
        {
            "Topic": "QIBK/NM",
            "Symbol": "QIBK",
            "CompanyEN": "Qatar Islamic Bank",
            "CompanyAR": "المصرف ",
            "Trend": "+",
            "StateEN": ...
Make sure you read this link about 'scraping' etiquette.
Link: http://simplehtmldom.sourceforge.net/
$dom = new DOMDocument();
// @ suppresses the warnings loadHTML emits for malformed markup
@$dom->loadHTML(file_get_contents("EXAMPLE.COM"));
$items = $dom->getElementsByTagName("YOUR TAG");
This class allows you to search HTML code for elements. I have used it a few times before and it is by far the best solution I have found for your issue.
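If you do end up with HTML that actually contains the table, a DOMXPath query can target the exact attribute from the question (a sketch; on this particular site the cells appear to be filled in by JavaScript, so the JSON feed above is the more reliable route):
$dom = new DOMDocument();
// @ suppresses the warnings loadHTML emits for malformed markup
@$dom->loadHTML(file_get_contents('http://www.qe.com.qa/wp/mw/MarketWatch.php'));
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//td[@aria-describedby="grid_OfferPrice"]') as $td) {
    echo trim($td->textContent) . PHP_EOL;
}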