RESTFul API

官方 api 文档地址：https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html。

本文档仅记录常用操作

查询

列出所有 index

$ curl -i http://10.158.113.158:9200/_cat/indices?v
HTTP/1.1 200 OK
Content-Type: text/plain; charset=UTF-8
Content-Length: 704

health status index          pri rep docs.count docs.deleted store.size pri.store.size 
yellow open   log-2017.01.08   5   1      23121            0        7mb            7mb 
yellow open   log-2017.01.10   5   1       4735            0      1.5mb          1.5mb

查看指定的 index

# 显示指定 index 的结构
$ curl -i http://10.158.113.158:9200/log-2017.01.08?pretty
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 1971

{
  "log-2017.01.08" : {
    "aliases" : { },
    "mappings" : {
      "message" : {
        "properties" : {
          "Hostname" : {
            "type" : "string"
          },
          ...
          "python_module" : {
            "type" : "string"
          },
          "request_id" : {
            "type" : "string"
          },
          "severity_label" : {
            "type" : "string"
          },
          "syslogfacility" : {
            "type" : "long"
          },
          "tenant_id" : {
            "type" : "string"
          },
          "user_id" : {
            "type" : "string"
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1483833625786",
        "uuid" : "0drApHDRTMWzZsIsuwdi2w",
        "number_of_replicas" : "1",
        "number_of_shards" : "5",
        "version" : {
          "created" : "2040199"
        }
      }
    },
    "warmers" : { }
  }
}

注：加上 ?pretty 会格式化 json 串

查看某文档的部分属性值：_source 的使用

# 只获取文档的 content 和 modul 属性值
$ curl '10.158.113.158:9200/dtcube-2017.01.10/ceilometer/AVmHuCtvm_RLDnunMYID?_source=content,modul' | python -m json.tool
{
    "_id": "AVmHuCtvm_RLDnunMYID",
    "_index": "dtcube-2017.01.10",
    "_source": {
        "content": "Cannot inspect data of MemoryUsagePollster for 09de795d-828e-49c8-a661-35e81db06f2e, non-fatal reason: Failed to inspect memory usage of instance <name=instance-0000008f, id=09de795d-828e-49c8-a661-35e81db06f2e>, can not get info from libvirt.",
        "modul": "ceilometer.compute.pollsters.memory"
    },
    "_type": "ceilometer",
    "_version": 1,
    "found": true
}

只查看文档存储数据，不看其他元数据：_source 的使用

$ curl '10.158.113.158:9200/dtcube-2017.01.10/ceilometer/AVmHuCtvm_RLDnunMYID/_source' | python -m json.tool
{
    "@timestamp": "2017-01-10T17:30:57+08:00",
    "content": "Cannot inspect data of MemoryUsagePollster for 09de795d-828e-49c8-a661-35e81db06f2e, non-fatal reason: Failed to inspect memory usage of instance <name=instance-0000008f, id=09de795d-828e-49c8-a661-35e81db06f2e>, can not get info from libvirt.",
    "id": "28247",
    "level": "WARNING",
    "location": "ceilometer.api",
    "modul": "ceilometer.compute.pollsters.memory",
    "occurTime": "2017-01-05 16:48:55.191",
    "requestID": "[-]"
}

获取多个文档：_mget 的使用

# docs 数组：分别从 index 为 jiaop 和 dtcube 下获取一篇符合查询条件的文档
$ curl '10.158.113.158:9200/_mget?pretty' 、
    -d '{"docs": [{"_index": "jiaop", "_type": "test", "_id": 1}, {"_index": "dtcube-2017.01.10", "_type": "ceilometer", "_id": "AVmHuCtvm_RLDnunMYID"}]}'
{
  "docs" : [ {
    "_index" : "jiaop",
    "_type" : "test",
    "_id" : "1",
    "_version" : 1,
    "found" : true,
    "_source" : {
      "user" : "jiaop",
      "post_date" : "2017-01-13",
      "message" : "Just a test"
    }
  }, {
    "_index" : "dtcube-2017.01.10",
    "_type" : "ceilometer",
    "_id" : "AVmHuCtvm_RLDnunMYID",
    "_version" : 5,
    "found" : true,
    "_source" : {
      "id" : "30001",
      "modul" : "ceilometer",
      "name" : "paul"
    }
  } ]
}

# ids 数组：从同 index，同 type 下获取 2 篇文档
$ curl '10.158.113.158:9200/jiaop/test/_mget?pretty' -d '{"ids": ["1", "2"]}'
{
  "docs" : [ {
    "_index" : "jiaop",
    "_type" : "test",
    "_id" : "1",
    "_version" : 1,
    "found" : true,
    "_source" : {
      "user" : "jiaop",
      "post_date" : "2017-01-13",
      "message" : "Just a test"
    }
  }, {
    "_index" : "jiaop",
    "_type" : "test",
    "_id" : "2",
    "_version" : 2,
    "found" : true,
    "_source" : {
      "user" : "jiaop3",
      "post_date" : "2017-01-13",
      "message" : "Just a test"
    }
  } ]
}

搜索：_search

URL	描述
/_search	全文搜索/空白搜索
/_search?q=mary	搜索所有带 mary 字符串的文档
/gb*/_search	搜索所有以gb为前缀的indices的文档
/gb,us/_search	同时搜索gb 和 us 内的文档
/gb/test/_search	搜索gb下type为test的所有文档
/gb,us/user,tweet/_search	搜索索引gb和索引us中类型user以及类型 tweet 内的所有文档
/_all/user,tweet/_search	搜索所有索引中类型为user以及tweet内的所有文档
/_search?size=5	每次返回 5 条数据。size 默认为 10
GET /_search?size=5&from=10	每次返回 5 条数据，忽略前 10 条数据。from 默认为 0

对于分页搜索，不要一次请求过多或者页码过大的结果。以搜索拥有5个主分片的索引的第1000页(第10001~10010数据)的结果为例，说明分页搜索的原理：

请求节点发送搜索请求到每个分片(总共5个)

每个分片产生前10010个结果(1000页，每页默认 10 条)，并排序

请求节点获取所有结果(50050条)

请求节点对数据(50050条数据)排序，抛弃其中 50040 条

返回搜索结果

分布式系统中，大页码请求所消耗的系统资源是呈指数式增长的，太耗费性能。
注：mysql 中的分页查询也是先获取所有数据，然后抛弃一部分数据，再返回结果的

全文检索/空白检索：不加任何查询条件的，只是返回集群中所有文档的搜索

$ curl -X GET http://10.20.0.253:9200/log-2017.01.10/_search?pretty
{
  "took" : 2,           # 搜索耗时，单位：毫秒
  "timed_out" : false,  # 搜索是否超时
  "_shards" : {         # 参与查询分片的总数
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 24552,
    "max_score" : 1.0,   # 显示所有匹配文档中的_score的最大值
    "hits" : [{          # hits 指明匹配的文档总数，默认返回前十个结果
      ....
    }, {
      "_index" : "log-2017.01.10",
      "_type" : "message",
      "_id" : "AVmFrdNym_RLDnunMVfu",
      "_score" : 1.0,    # _score：文档相关性评分，表示当前文档与查询的匹配程度。默认按照_score由高至低进行排列
      "_source" : {
        "Timestamp" : "2017-01-10T00:01:17",
        "Type" : "log",
        "Logger" : "openstack.nova",
        "Severity" : 6,
        "Payload" : "[req-69928afa-2345-48a5-8a20-a68652e9f76f - - - - -] Successfully synced instances from host 'kolla-com1'.\n",
        "Pid" : 7,
        "Hostname" : "kolla-con1",
        "python_module" : "nova.scheduler.host_manager",
        "programname" : "nova-scheduler",
        "severity_label" : "INFO",
        "request_id" : "69928afa-2345-48a5-8a20-a68652e9f76f"
      }
    } ]
  }
}

说明：

若在 _search 之前未指定索引，则默认使用最古老的 index 值。如此处就会默认查 log-2017.01.08

timeout 并不会终止查询，它只是会在你指定的时间内返回当时已经查询到的数据，然后关闭连接。在后台，其他的查询可能会依旧继续，尽管查询结果已经被返回

在索引中搜索时，Elasticsearch 会将搜索请求转发给相应索引中的所有主从分片，然后收集每个分片的结果

支持正则式的搜索

# 搜索所有以 dtcube 开头的索引内的文档，并返回
$ curl '10.158.113.158:9200/dtcube*/_search?pretty'

查询字符串搜索(query string)：检索名字 dtcube 下 employ 中 user 属性为 paul 的数据

$ curl -X GET http://10.158.113.158:9200/dtcube/employ/_search?q=username:paul | python -m json.tool
{
    "_shards": {
        "failed": 0,
        "successful": 5,
        "total": 5
    },
    "hits": {
        "hits": [
            {
                "_id": "2",
                "_index": "dtcube",
                "_score": 0.028130025,
                "_source": {
                    "age": 22,
                    "username": "Paul"
                },
                "_type": "employ"
            }
        ],
        "max_score": 0.028130025,
        "total": 1
    },
    "timed_out": false,
    "took": 3
}

query dsl 搜索：Elasticsearch 提供的查询语言，使用 JSON 作为主体进行查询

# 查询匹配 username 为 john 的数据
$ curl -X GET http://10.158.113.158:9200/dtcube/employ/_search \
                 -d '{"query": {"match": {"username": "john"}}}' | python -m json.tool
{
    "_shards": {
        "failed": 0,
        "successful": 5,
        "total": 5
    },
    "hits": {
        "hits": [
            {
                "_id": "1",
                "_index": "dtcube",
                "_score": 0.30685282,
                "_source": {
                    "age": 21,
                    "username": "John"
                },
                "_type": "employ"
            }
        ],
        "max_score": 0.30685282,
        "total": 1
    },
    "timed_out": false,
    "took": 2
}

# 添加过滤器：搜索 age > 20 且名字为 paul 的记录
$ curl -X GET http://10.158.113.158:9200/dtcube/employ/_search \
     -d '{"query": {
             "filtered": {
                 "filter": {
                     "range": {
                         "age": {"gt": 20}
                      }
                  }
              },
              "query": {"macth": {"username": "paul"}}
           }' | python -m json.tool

检索排序：使用 query dsl + sort 参数，根据文档 _source 内的某属性值，对查询结果进行排序

# 按 id 降序排序
$ curl http://10.158.113.158:9200/dtcube-2017.01.10/_search?pretty -d '{"sort": {"id": {"order": "desc"}}}'
{
  "took" : 834,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : null,
    "hits" : [ {
      "_index" : "dtcube-2017.01.10",
      "_type" : "ceilometer",
      "_id" : "AVmHuCtvm_RLDnunMYID",
      "_score" : null,
      "_source" : {
        "id" : "30001",
        "modul" : "ceilometer",
        "name" : "paul"
      },
      "sort" : [ "30001" ]
    }, {
      "_index" : "dtcube-2017.01.10",
      "_type" : "ceilometer",
      "_id" : "AVmHuCtvm_RLDnunMYIG",
      "_score" : null,
      "_source" : {
        "occurTime" : "2017-01-05 16:48:55.191",
        "id" : "28247",
        "level" : "WARNING",
        "modul" : "ceilometer.compute.pollsters.memory",
        "requestID" : "[-]",
        "content" : "Cannot inspect data of MemoryUsagePollster for 09de795d-828e-49c8-a661-35e81db06f2e, non-fatal reason: Failed to inspect memory usage of instance <name=instance-0000008f, id=09de795d-828e-49c8-a661-35e81db06f2e>, can not get info from libvirt.",
        "@timestamp" : "2017-01-10T17:31:04+08:00",
        "location" : "ceilometer.api"
      },
      "sort" : [ "28247" ]
    }, {
      "_index" : "dtcube-2017.01.10",
      "_type" : "ceilometer",
      "_id" : "AVmHuCtvm_RLDnunMYIE",
      "_score" : null,
      "_source" : {
        "occurTime" : "2017-01-05 16:48:55.191",
        "id" : "28247",
        "level" : "WARNING",
        "modul" : "ceilometer.compute.pollsters.memory",
        "requestID" : "[-]",
        "content" : "Cannot inspect data of MemoryUsagePollster for 09de795d-828e-49c8-a661-35e81db06f2e, non-fatal reason: Failed to inspect memory usage of instance <name=instance-0000008f, id=09de795d-828e-49c8-a661-35e81db06f2e>, can not get info from libvirt.",
        "@timestamp" : "2017-01-10T17:30:57+08:00",
        "location" : "ceilometer.api"
      },
      "sort" : [ "28247" ]
    }, {
      "_index" : "dtcube-2017.01.10",
      "_type" : "ceilometer",
      "_id" : "AVmHuCtvm_RLDnunMYIF",
      "_score" : null,
      "_source" : {
        "occurTime" : "2017-01-05 16:48:55.191",
        "id" : "28247",
        "level" : "WARNING",
        "modul" : "ceilometer.compute.pollsters.memory",
        "requestID" : "[-]",
        "content" : "Cannot inspect data of MemoryUsagePollster for 09de795d-828e-49c8-a661-35e81db06f2e, non-fatal reason: Failed to inspect memory usage of instance <name=instance-0000008f, id=09de795d-828e-49c8-a661-35e81db06f2e>, can not get info from libvirt.",
        "@timestamp" : "2017-01-10T17:30:58+08:00",
        "location" : "ceilometer.api"
      },
      "sort" : [ "28247" ]
    } ]
  }
}

# 按 occurTime 降序排序
$ curl http://10.158.113.158:9200/dtcube-2017.01.10/_search?pretty -d '{"sort": {"occurTime": {"order": "desc"}}}'

注：可根据多个属性值进行排序，排序结果按 sort 中指定的属性字段顺序来

段落检索：使用 query dsl，改 match 为 match_phrase

$ curl -X GET http://10.158.113.158:9200/dtcube/employ/_search \
                 -d '{"query": {"match_phrase": {"username": "john"}}}' | python -m json.tool
{
    "_shards": {
        "failed": 0,
        "successful": 5,
        "total": 5
    },
    "hits": {
        "hits": [
            {
                "_id": "1",
                "_index": "dtcube",
                "_score": 0.30685282,
                "_source": {
                    "age": 21,
                    "username": "John"
                },
                "_type": "employ"
            }
        ],
        "max_score": 0.30685282,
        "total": 1
    },
    "timed_out": false,
    "took": 2
}

注：在 DSL 查询时，可以使用 highlight 来进行结果高亮

删除：DELETE

删除指定 index

# 加上 pretty 会格式化返回的 json 串
$ curl -i -X DELETE http://10.158.113.158:9200/log-2017.01.04?pretty
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 28

{
  "acknowledged" : true
}

注：删除一个文档并不会立即删除(之后还是会删除)，只是在Elasticsearch内部标记成已删除，不能继续访问。因此仅调用DELETE删除会有数据残留

增加：PUT

增加一条新数据：创建一个名为 jiaop 的 index，其 type 为 test, 其 id 为 1。类比 mysql 就是创建数据库 jiaop，并在其创建表 test，然后插入一条数据，其 id 为 1

$ curl -X PUT http://10.158.113.158:9200/jiaop/test/1?pretty -d '{"user": "jiaop", "post_date": "2017-01-13", "message": "Just a test"}'
{
  "_index" : "jiaop",
  "_type" : "test",
  "_id" : "1",
  "_version" : 1,
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : true
}

$ curl http://10.158.113.158:9200/_cat/indices?v
yellow open   jiaop               5   1          1            0       130b           130b

# 查询
$ curl -X GET http://10.158.113.158:9200/jiaop/test/1?pretty
{
  "_index" : "jiaop",
  "_type" : "test",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "user" : "jiaop",
    "post_date" : "2017-01-13",
    "message" : "Just a test"
  }
}

# 检索：默认返回最开始的 10 条数据
$ curl http://10.158.113.158:9200/jiaop/_search?pretty
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "jiaop",
      "_type" : "test",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {
        "user" : "jiaop",
        "post_date" : "2017-01-13",
        "message" : "Just a test"
      }
    } ]
  }
}

注：若已经存在一个 id 为 2 的文档，则覆盖掉旧的文档(更新)，并更新文档的 version 值和更改 created 为 false

数据批处理：Bluk

bluk API可以帮助进行数据批处理，极大提高效率。bulk的请求主体的格式稍微有些不同，类似于一个用 "\n" 字符来连接的单行json。如下：

{ action: { metadata }}\n
{ request body        }\n
{ action: { metadata }}\n
{ request body        }\n
...

每一个子请求都会被单独执行，因此一旦有一个子请求失败了，并不会影响到其他请求的成功执行。执行完毕 Elasticsearch 会返回含有 items 的列表，其顺序和请求顺序是相同的。

执行 delete 动作时不需要指定 request body

参数说明：

action/metadata 行：指定了将要在哪个文档中执行什么操作
action：必须是 index, create, update 或者 delete
metadata：需要指明需要被操作文档的 _index, _type 以及 _id

注意事项：

每一行都结尾处都必须有换行字符"\n"，最后一行也要有
行里不能包含非转义字符，以免干扰数据的分析
bulk应该有一个最佳的限度(取决于硬件，文档大小以及复杂性，索引以及搜索的负载)，超过这个限制后，性能不但不会提升反而可能会造成宕机
- 一般比较好初始数量级是1000-5000个文档
- 一般比较好的初始批量容量是5-15MB

更新：_update + POST

局部更新：仅更新文档的某一属性值

# 原始文档
$ curl '10.158.113.158:9200/dtcube-2017.01.10/ceilometer/AVmHuCtvm_RLDnunMYID?pretty' 
{
  "_index" : "dtcube-2017.01.10",
  "_type" : "ceilometer",
  "_id" : "AVmHuCtvm_RLDnunMYID",
  "_version" : 3,
  "found" : true,
  "_source" : {
    "id" : "29248",
    "modul" : "ceilometer"
  }
}

# 更新 id: 需要传入一个 doc 键值对
$ curl -X POST '10.158.113.158:9200/dtcube-2017.01.10/ceilometer/AVmHuCtvm_RLDnunMYID/_update?pretty'  -d '{"doc": {"id": "30000"}}'
{
  "_index" : "dtcube-2017.01.10",
  "_type" : "ceilometer",
  "_id" : "AVmHuCtvm_RLDnunMYID",
  "_version" : 4,
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  }
}

# 更新后文档
$ curl '10.158.113.158:9200/dtcube-2017.01.10/ceilometer/AVmHuCtvm_RLDnunMYID?pretty' 
{
  "_index" : "dtcube-2017.01.10",
  "_type" : "ceilometer",
  "_id" : "AVmHuCtvm_RLDnunMYID",
  "_version" : 4,
  "found" : true,
  "_source" : {
    "id" : "30000",
    "modul" : "ceilometer"
  }
}

# 添加新属性
$ curl -X POST '10.158.113.158:9200/dtcube-2017.01.10/ceilometer/AVmHuCtvm_RLDnunMYID/_update?pretty'  -d '{"doc": {"id": "30001", "name": "paul"}}'
{
  "_index" : "dtcube-2017.01.10",
  "_type" : "ceilometer",
  "_id" : "AVmHuCtvm_RLDnunMYID",
  "_version" : 5,
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  }
}

注：Elasticsearch 支持使用脚本来完成使用API无法直接完成的自定义行为，默认的脚本语言为 MVEL(一个简单高效的JAVA基础动态脚本语言，它的语法类似于Javascript)，但也支持JavaScript, Groovy 以及 Python

restful api