tQL - Operation with Nested Objects
Nested objects
Elasticsearch lacks support for SQL-style joins (e.g., JOIN in relational databases). Instead, it denormalizes data, storing and indexing documents independently for optimized search performance. By default, Elasticsearch flattens JSON objects, which can cause incorrect search results when querying arrays of objects. Since Elasticsearch excels at quickly searching documents of the same type and does not generally support joins, the best solution is to use the nested type. The nested field type ensures that objects within an array remain separate entities within the same document, preserving their relationships and enabling accurate queries.
- Give me all tickets where one of the tasks is assigned to me
- Give me the sum of worklogs on tickets of project ‘X’
- Give me all tasks on tickets of project ‘X’
You need to use nested objects in ElasticSearch. Please consult the ElasticSearch documentation.
We will use nested documents of worklogs inside the ticket index in the following example:
"custom": {
"worklogs": [
{
"duration": 180,
"workerId": "0b4b7f40-4731-4ffd-b1b2-8c4ec7030ff7",
"endDate": "2022-01-14T19:16:00.000+0000",
"id": "54a21df3-eb77-4ee0-87d5-d7bbf54127ea",
"type": null,
"startDate": "2022-01-14T16:16:00.000+0000"
}
]
}
Return nested rows
Simplified Methodology for the Retrieval of Nested Objects and the Viewing of Their Data.
Retrieving and inspecting nested objects can be done using a straightforward approach. This method allows users to efficiently access nested data structures and analyze their contents.
SELECT key, tasks
from ticket
WHERE key = 'ZE_3';
This query retrieves the key and tasks fields from the ticket table where the key matches 'ZE_3'.
key | tasks |
---|---|
ZE_3 | [ { "dueDate": null, "description": null, "userGroupId": null, "skills": [], "activityId": "Activity_1ed4tpv", "routing": null, "taskType": "USER_TASK", "plannedStartTime": null, "tsmTaskDefinitionCode": null, "commitedStartTimeTo": null, "startTime": "2024-06-27T07:15:32.469+0000", "id": "0a7e5263-3455-11ef-a796-3e5d7e594a64", "_timestamp": null, "processInstanceId": "0a7d19dd-3455-11ef-a796-3e5d7e594a64", "processDefinitionId": "ticket_zkouska.exportu:2:fb335fca-3454-11ef-a796-3e5d7e594a64", "userGroupCode": null, "entityType": "TASK", "module": "ticket", "custom": null, "active": true, "history": [], "charsTyped": {}, "userId": null, "commitedStartTimeFrom": null, "taskDefinitionKey": "Activity_1ed4tpv", "executionId": "0a7d19dd-3455-11ef-a796-3e5d7e594a64", "canceled": false, "taskSpecification": null, "plannedEndTime": null, "name": "Vyřešit", "wfmUserId": null, "endTime": null, "incident": null, "chars": {}, "status": null, "followupDate": null } ] |
Advantages:
- The query returns all documents stored within the tasks field as an array.
- Users can view the complete dataset in its raw form, ensuring that every document and all associated keys are accessible.
- The structure of the nested JSON data remains intact, allowing for a comprehensive overview of the original records.
Disadvantages:
- The returned data is not well-suited for analysis in its current form.
- The nested JSON structure makes the output difficult to interpret, especially when visually inspecting large datasets.
- The lack of a tabular or structured format makes it challenging to extract meaningful insights without further processing or transformation. It is crucial to ensure that every document and all associated keys are accessible.
Instead of retrieving the entire nested JSON structure, we can extract specific fields from the nested object and return them as separate columns. This can be achieved using the following SQL query:
SELECT key
, tasks.name
, tasks.taskType
FROM ticket
WHERE key = 'LNX1';
key | tasks.name | tasks.taskType |
---|---|---|
LNX1 | [ "Řešení", "Řešení", "Řešení", "Po message", "Řešení", "Řešení", "Řešení", "Řešení" ] | [ "USER_TASK", "USER_TASK", "USER_TASK", "USER_TASK", "USER_TASK", "USER_TASK", "USER_TASK", "USER_TASK" ] |
Advantages of this approach:
- This query provides a more structured output by directly returning individual fields (name and taskType) from the nested tasks object.
- Users can immediately see the relevant values without needing to manually parse a complex JSON structure.
- This method makes it easier to work with specific fields in reporting, visualisation, or further data processing.
Disadvantages of this approach:
- The extracted data is returned in arrays, reflecting the structure of the original nested object.
- Analyzing relationships between different fields within the nested objects can be challenging because the dependencies between elements in multiple arrays may not be clearly defined.
- If the tasks field contains multiple objects, the query might return ambiguous or unclear results, making it harder to correlate related data points.
Let's look at the some issues that can arise when selecting specific fields from a nested object directly. When you query a field from a nested document, the result is usually an array that contains all the values associated with that field across all entries in the nested document. This means that instead of retrieving a single value, you may end up with an array containing multiple values from the nested field, even if you're only interested in one of them.
select key
, custom.worklogs.startDate
, custom.worklogs.workerId
from ticket
where key = 'TSMTTS-1384'
key | custom.worklogs.startDate | custom.worklogs.workerId |
TSMTTS-1384 | 2021-12-13T08:48:00.000+0000,2022-01-04T07:12:00.000+0000,2022-01-03T03:13:00.000+0000,2022-01-04T10:51:00.000+0000,2022-01-06T06:47:00.000+0000,2022-01-20T08:13:00.000+0000 | d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414,6f30f0d3-1074-45a9-bfe9-93f0c802a10a,6f30f0d3-1074-45a9-bfe9-93f0c802a10a,6c570ee6-139e-47b0-ac8f-8c9ae2d203ee,6c570ee6-139e-47b0-ac8f-8c9ae2d203ee,d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414 |
However, you can filter the results using the nested filter operator []:
select key
, custom.worklogs[workerId = 'd825efd0-9b9a-4a7c-baa5-a4e9a8d4d414'].startDate
, custom.worklogs.workerId
from ticket
where key = 'TSMTTS-1384'
Internally, this is translated to script query:
select key, script('return
params._source["custom"]["worklogs"]
?.stream()
?.filter(
__collectionItem -> __collectionItem.workerId == "d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414"
)
?.map(
__collectionItem -> __collectionItem.startDate
)
?.collect(Collectors.toList())
') as custom.worklogs.startDate
, custom.worklogs.workerId
from ticket
where key = 'TSMTTS-1384'
Both queries give the same result with filtered startDate for worklogs of one worklogId:
key | custom.worklogs.startDate | custom.worklogs.workerId |
TSMTTS-1384 | 2021-12-13T08:48:00.000+0000 | d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414,6f30f0d3-1074-45a9-bfe9-93f0c802a10a,6f30f0d3-1074-45a9-bfe9-93f0c802a10a,6c570ee6-139e-47b0-ac8f-8c9ae2d203ee,6c570ee6-139e-47b0-ac8f-8c9ae2d203ee,d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414 |
Please note, that because we did not filter on the workerId field, results from all nested records are returned.
Notice: For empty filters, a nested indicator is optional for the select query. Both columns return the same value:
select custom.worklogs.workerId, custom.worklogs[].workerId from ticket
This is not the case for where / group by / order by clauses, where brackets are mandatory even for empty filters.
In general, if you want to return nested documents, you need to use the special syntax:
FROM nested(entity, nestedPath)
select key
, custom.worklogs.workerId
, custom.worklogs.startDate
, custom.worklogs.duration
from nested(ticket, custom.worklogs)
where key = 'TSMTTS-1384'
key | custom.worklogs.workerId | custom.worklogs.startDate | custom.worklogs.duration |
TSMTTS-1384 | d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414 | 2021-12-13T08:48:00.000+0000 | 15 |
TSMTTS-1384 | 6f30f0d3-1074-45a9-bfe9-93f0c802a10a | 2022-01-04T07:12:00.000+0000 | 240 |
TSMTTS-1384 | 6f30f0d3-1074-45a9-bfe9-93f0c802a10a | 2022-01-03T03:13:00.000+0000 | 240 |
TSMTTS-1384 | 6c570ee6-139e-47b0-ac8f-8c9ae2d203ee | 2022-01-04T10:51:00.000+0000 | 30 |
TSMTTS-1384 | 6c570ee6-139e-47b0-ac8f-8c9ae2d203ee | 2022-01-06T06:47:00.000+0000 | 60 |
TSMTTS-1384 | d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414 | 2022-01-20T08:13:00.000+0000 | 15 |
Let's go throw other sample. Please note that the following two queries produce the same result, meaning they can be considered alternative approaches to retrieving data from nested structures:
select key
, tasks.name
, tasks.taskType
, tasks.id
from nested(ticket, tasks)
where key = 'LNX1'
This query explicitly the nested tasks array within the ticket document.
Alternatively, using the GROUP BY clause:
select key
, tasks.name
, tasks.taskType
, tasks.id
from ticket
where key = 'LNX1'
group by key, tasks[].name, tasks[].taskType, tasks[].id
This method achieves the same result by grouping data at the document level while implicitly handling the nested fields.
key | tasks.name | tasks.taskType | tasks.id |
---|---|---|---|
LNX1 | Řešení | USER_TASK | 5c635c86-e23e-11ef-9004-3256945c58db |
The second approach is executed as a special aggregation query, where the results are derived from a nested aggregation subquery in the background. This means that the query engine processes the nested fields internally, aggregating and extracting relevant data before returning the final structured output.
Key Differences between this two approaches:
Feature | Group by (Group by Composite) | Nested (Nested with Filters & Top Hits) |
---|---|---|
Primary Focus | Grouping and counting terms in nested fields | Retrieving documents matching filters |
Returned Data | Aggregated counts and term distributions | Actual matching documents (via top_hits ) |
Type of Aggregation | Composite + Nested aggregation | Nested aggregation with filters and top_hits |
Filters Applied | Top-level filter on the "key" field | Filters inside the nested aggregation |
Document Retrieval | No documents returned | Actual documents returned (filtered) |
Count | Counts of terms in nested fields | Count of matching tasks via filteredCount |
Efficiency | Efficient for grouping and counting | Requires retrieval of documents, potentially less efficient |
So, in conclusion, approach 1 Best for aggregating and counting data without needing to see individual document data however approach 2 Best for retrieving specific documents that match certain criteria and also counting matching tasks.
When working with deeply nested data structures that extend beyond the second level, such as a third level of nesting (tasks.history.changes and tasks.history.duration), similar principles apply for querying and accessing the data efficiently.
Below are different ways to retrieve data from a third-level nested structure and how to handle it properly.
select key
, tasks[].history[].changes
, tasks[].history[].duration
from ticket
where key='VBFeedbackPSP-7'
or
select key
, tasks.history.changes
, tasks.history.duration
from nested(ticket, tasks)
where key='VBFeedbackPSP-7'
and apply ordinary operation this data
key | tasks.history.changes | tasks.history.duration |
---|---|---|
VBFeedbackPSP-7 | [ [ "USER_GROUP" ], [ "STATUS_CODE" ], [ "USER" ] ] | [ 0, 1, 0 ] |
For a clearer and more structured representation of the data, it is highly recommended to use the GROUP BY clause when working with third-level nested fields. This approach ensures that the extracted data maintains an organized structure, preventing duplication and preserving the relationships within the nested hierarchy. group by clouse
select key
, tasks.history.changes
, tasks.history.duration
from nested(ticket, tasks)
group by key, tasks.history[].changes, tasks.history[].duration
By applying GROUP BY, the query aggregates the results in a structured format, where each row corresponds to a specific combination of values from the inner JSON documents. This method ensures that nested records are properly grouped for deeply nested data.
key | tasks.history.changes | tasks.history.duration |
---|---|---|
vbfeedbackpsp-7 | STATUS_CODE | 1 |
vbfeedbackpsp-7 | USER | 0 |
vbfeedbackpsp-7 | USER_GROUP | 0 |
Nested in the WHERE condition
tQL contains special syntax for filtering in nested documents, similar to the filter in the select clause:
select key, custom.worklogs.startDate, custom.worklogs.workerId, custom.worklogs.duration
from ticket
where custom.worklogs[startDate is not null and workerId = 'd825efd0-9b9a-4a7c-baa5-a4e9a8d4d414']
key | custom.worklogs.startDate | custom.worklogs.workerId | custom.worklogs.duration |
TSMTTS-1390 | 2021-12-15T09:24:00.000+0000,2022-02-14T14:00:00.000+0000 | d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414,d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414 | 45,60 |
TSMTTS-1372 | 2021-12-09T05:30:00.000+0000,2021-12-08T10:00:00.000+0000,2021-12-10T13:08:00.000+0000 | d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414,3255f56a-95aa-4c5f-90d9-e5925684ea55,d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414 | 45,20,15 |
It is important to understand the difference between the usage of nested filters in where condition and in select. Where condition filters “tickets”, not “worklogs”, if you want to find tickets with worklogs matching the filter and get only associated worklogs records, you need to combine both approaches:
select key,
custom.worklogs[startDate is not null and workerId = 'd825efd0-9b9a-4a7c-baa5-a4e9a8d4d414'].startDate,
custom.worklogs[startDate is not null and workerId = 'd825efd0-9b9a-4a7c-baa5-a4e9a8d4d414'].workerId,
custom.worklogs[startDate is not null and workerId = 'd825efd0-9b9a-4a7c-baa5-a4e9a8d4d414'].duration
from ticket
where custom.worklogs[startDate is not null and workerId = 'd825efd0-9b9a-4a7c-baa5-a4e9a8d4d414']
key | custom.worklogs.startDate | custom.worklogs.workerId | custom.worklogs.duration |
TSMTTS-1384 | 2021-12-13T08:48:00.000+0000,2022-01-20T08:13:00.000+0000 | d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414,d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414 | 15,15 |
TSMTTS-1390 | 2021-12-15T09:24:00.000+0000,2022-02-14T14:00:00.000+0000 | d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414,d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414 | 45,60 |
TSMTTS-1379 | 2021-12-13T14:18:00.000+0000 | d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414 | 150 |
Data aggregation in nested objects
Group by and Elasticsearch aggregation methods
Elasticsearch organizes aggregations into three categories:
- Metric aggregations that calculate metrics, such as sum or average, from field values.
- Bucket aggregations that group documents into buckets also called bins, based on field values, ranges, or other criteria.
- Pipeline aggregations that take input from other aggregations instead of documents or fields.
For metric aggregations use aggregation functions in tQL. Bucket aggregations are realized in Group by clauses.
Group by clause supported syntax:
Group by group_list,
[group_list] ::=\{\{ * | \{ key_name | prefix.key_name | nested_obj[].nested_key | const_field | expression | script(pianless_script) \}
Notice that group_list must contain all select_list keys, instead of aggregates. However, group_list may also contain additional keys if bucketing logic is necessary.
A simple group by resembles the standard SQL even with flattend structure:
select chars.project
, count(id) as "Tasks"
from ticket
group by chars.project
A nested Group by with a calc field
When nested object fields are grouped, use syntax with [ ], e.g. nestedObjectName[].fieldName in Group by clause
select custom.worklogs.workerId
, sum('doc["custom.worklogs.duration"].value / 60.0 / 8') as "Worklog MD"
from ticket
group by custom.worklogs[].workerId
It is also possible to combine nested and not nested fields:
select converted priorityId as priority
, statusId as Status
, count(custom.worklogs.workerId)
from ticket
group by priorityId, statusId, custom.worklogs[]
order by statusId desc
Filtering nested objects by Group by
There are three ways to apply filtering with the Group by clauses:
-
Where clause as usual supports all predicate to create a range of documents for bucketing
-
Having clauses for filtered aggregation result in sum, min, max, avg, and count functions.
-
Bucket filter for nested objects
The first one is simple:
select converted priorityId as priority
, statusId as Status
, count(custom.worklogs.workerId)
from ticket
where priorityId is not null and statusId is not null
group by priorityId, statusId, custom.worklogs[]
limit 5
priority | Status | Worklogs - workerId |
Major | New | 0 |
Major | Canceled | 24 |
Major | Postponed | 60 |
Major | Closed | 5288 |
Major | In progress | 5948 |
The second one should only be used for aggregation results:
select converted priorityId as priority
, statusId as status
, count(id) as worklogs
from ticket
Where priorityId is not null and statusId is not null
group by priorityId, statusId, custom.worklogs[]
Having count(id) >60
-
The bucket filter for nested objects has two different results for nested field aggregation.
If we use filters in the Where clause, tQL creates a range of objects which contains nested by conditions and calculates aggregates of all nested values in this range.
select converted priorityId as priority
, statusId as status
, custom.worklogs.workerId
, count(custom.worklogs.duration) as worklogs
from ticket
Where custom.worklogs[workerId like '0511ca9b-e20f-43c5-b59c-f864385b46d7']
group by priorityId, statusId, custom.worklogs[].workerId
priority | status | Worklogs - workerId | worklogs |
Major | Closed | Michal Jandura | 74 |
Major | Closed | Marek Schmidt | 19 |
Major | Closed | Jaroslav Lehar | 8 |
Major | Closed | Petr Kropík | 8 |
Major | Closed | Jan Tecl | 5 |
Major | Closed | Jira Synchronizace | 4 |
Major | Closed | Petr Žoček | 3 |
Major | Closed | Martin Tměj | 3 |
When we use bucket filters in a Group by clause, only nested values which support condition expressions will be aggregated
select converted priorityId as priority
, statusId as status
, custom.worklogs.workerId
, count(custom.worklogs.duration) as worklogs
from ticket
Where custom.worklogs[workerId like '0511ca9b-e20f-43c5-b59c-f864385b46d7']
group by priorityId, statusId, custom.worklogs[workerId like '0511ca9b-e20f-43c5-b59c-f864385b46d7'].workerId
Priority | status | Worklogs - workerId | worklogs |
Major | Closed | Michal Jandura | 74 |
Major | In progress | Michal Jandura | 100 |
Blocker | Closed | Michal Jandura | 30 |
Minor | Closed | Michal Jandura | 14 |
Critical | Closed | Michal Jandura | 40 |
If you want to group by a nested field, you need to always use nested “[]” syntax:
select converted chars.project
, statusName
, priorityId
, sum(custom.worklogs.duration)
, count(custom.worklogs.id)
from ticket
group by statusName, chars.project, priorityId,
custom.worklogs[workerId = '04f316cd-4897-45e3-8145-7bf9e951c9a4']
Projekt | statusName | Priority | custom.worklogs.duration | custom.worklogs.id |
RMD | closed | Major | 0 | 0 |
SPM | closed | Major | 6090 | 50 |
SPM | closed | Blocker | 480 | 7 |
select custom.worklogs.workerId
, count(custom.worklogs.id) AS "Count worklogs"
, count(id) AS "Count tickets"
from ticket
group by custom.worklogs[].workerId
custom.worklogs.workerId | Count worklogs | Count tickets |
04f316cd-4897-45e3-8145-7bf9e951c9a4 | 381 | 75 |
0511ca9b-e20f-43c5-b59c-f864385b46d7 | 258 | 76 |
055b7d5c-e40c-4565-94d7-5cf620f1abe9 | 381 | 98 |
0b4b7f40-4731-4ffd-b1b2-8c4ec7030ff7 | 684 | 255 |
0e6cfa34-dc2b-4157-b1df-a55278621aa0 | 186 | 49 |
Also notice different numbers of nested documents (worklogs) and main documents (tickets). You may also want to filter the worklogs for grouping:
select custom.worklogs.workerId
, count(custom.worklogs.id) AS "Count worklogs"
, count(id) AS "Count tickets"
from ticket
group by custom.worklogs[workerId = '04f316cd-4897-45e3-8145-7bf9e951c9a4'].workerId
custom.worklogs.workerId | Count worklogs | Count tickets |
04f316cd-4897-45e3-8145-7bf9e951c9a4 | 381 | 75 |
This is not the same as filtering the tickets followed by group by (first select all tickets containing at least one worklog with this worker, then group by all worklogs from these tickets):
select custom.worklogs.workerId
, count(custom.worklogs.id) AS "Count worklogs"
, count(id) AS "Count tickets"
from ticket
where custom.worklogs[workerId = '04f316cd-4897-45e3-8145-7bf9e951c9a4']
group by custom.worklogs[].workerId
custom.worklogs.workerId | Count worklogs | Count tickets |
04f316cd-4897-45e3-8145-7bf9e951c9a4 | 381 | 75 |
0511ca9b-e20f-43c5-b59c-f864385b46d7 | 13 | 5 |
055b7d5c-e40c-4565-94d7-5cf620f1abe9 | 10 | 2 |
More complex example:
select custom.worklogs.workerId
, round(sum(custom.worklogs.duration)/60/8, 1) AS "Sum duration MD"
from ticket
group by custom.worklogs[startDate > '2022-01-01T00:00:00.000+0000'].workerId
custom.worklogs.workerId | Sum duration MD |
6f30f0d3-1074-45a9-bfe9-93f0c802a10a | 618.9 |
d8769f98-9758-4fcd-80a2-410c9d915b52 | 92.5 |
e7f5865c-7719-4a14-beef-4bce2aae34f9 | 30.6 |
bd4cabfe-17b5-4d43-9020-7b8d04187567 | 74.5 |
Or even:
select custom.worklogs[].workerId
, format(histogram(custom.worklogs[startDate > '2022-01-01T00:00:00.000+0000'].startDate, 'month'), 'MM') as "Month"
, round(sum(custom.worklogs.duration)/60/8, 1) AS "Sum duration MD"
from ticket
group by custom.worklogs[].workerId
, histogram(custom.worklogs[startDate > '2022-01-01T00:00:00.000+0000'].startDate, 'month')
custom.worklogs.workerId | Month | Sum duration MD |
6f30f0d3-1074-45a9-bfe9-93f0c802a10a | 01 | 118 |
e7f5865c-7719-4a14-beef-4bce2aae34f9 | 01 | 11.3 |
6c570ee6-139e-47b0-ac8f-8c9ae2d203ee | 01 | 16.6 |
d8769f98-9758-4fcd-80a2-410c9d915b52 | 01 | 16.4 |
Group by and differentd in ways of filtering data
Nested in order by
You can sort by nested value in the same format as in where/group by.
select key
, format(custom.worklogs[workerId = '04f316cd-4897-45e3-8145-7bf9e951c9a4'].startDate, 'dd.MM.yyyy HH:mm') as "When"
, custom.worklogs[workerId = '04f316cd-4897-45e3-8145-7bf9e951c9a4'].duration as "Duration minutes"
from ticket
where custom.worklogs[workerId = '04f316cd-4897-45e3-8145-7bf9e951c9a4']
order by custom.worklogs[workerId = '04f316cd-4897-45e3-8145-7bf9e951c9a4'].startDate
key | When | Duration minutes |
TSMTTS-868 | 14.07.2021 07:04 | 360 |
TSMTTS-850 | 15.07.2021 05:01, 29.07.2021 13:20, 30.07.2021 10:45, 04.08.2021 08:00 | 120,180,180,330 |
TSMTTS-881 | 15.07.2021 10:00, 16.07.2021 04:43, 19.07.2021 12:46, 21.07.2021 11:51 | 90,120,120,120 |
TSMTTS-883 | 15.07.2021 12:01, 16.07.2021 06:57, 19.07.2021 10:59, 20.07.2021 05:34, 20.07.2021 09:02, 21.07.2021 06:06, 02.08.2021 10:01 | 60,150,180,210,210,240,270 |
The query may look a bit tricky. You need to specify the filtering clause on many levels:
- where - select _tickets _which contain at least one work log matching the criteria
- order by - order those tickets by the start date of the earliest work log of this ticket
- select - return start date/duration of all work logs of this ticket filtered by the criteria (internally calculated by ElasticSearch painless script)
If you think that this does not suit you, you probably want to return nested rows instead. For example:
select key
, custom.worklogs.startDate
, custom.worklogs.duration
from nested(ticket, custom.worklogs)
where custom.worklogs[workerId = '04f316cd-4897-45e3-8145-7bf9e951c9a4']
order by custom.worklogs.startDate
key | custom.worklogs.startDate | custom.worklogs.duration |
TSMTTS-868 | 2021-07-14T07:04:00.000+0000 | 360 |
TSMTTS-850 | 2021-07-15T05:01:00.000+0000 | 180 |
TSMTTS-881 | 2021-07-15T10:00:00.000+0000 | 120 |
TSMTTS-883 | 2021-07-15T12:01:00.000+0000 | 60 |
TSMTTS-888 | 2021-07-15T15:35:00.000+0000 | 120 |
For access to nesdted objects always use [ ] when calling objects in form or/and group by
It is also possible to combine nested and not nested fields:
select converted priorityId as priority
, statusId as Status
, count(custom.worklogs.workerId)
from ticket
group by priorityId, statusId, custom.worklogs[]
order by statusId desc
Filtering nested objects by Group by
There are three ways to apply filtering with the Group by clauses:
-
Where clause as usual supports all predicate to create a range of documents for bucketing
-
Having clauses for filtered aggregation result in sum, min, max, avg, and count functions.
-
Bucket filter for nested objects
The first one is simple:
select converted priorityId as priority
, statusId as Status
, count(custom.worklogs.workerId)
from ticket
where priorityId is not null and statusId is not null
group by priorityId, statusId, custom.worklogs[]
limit 5
priority | Status | Worklogs - workerId |
Major | New | 0 |
Major | Canceled | 24 |
Major | Postponed | 60 |
Major | Closed | 5288 |
Major | In progress | 5948 |
-
The bucket filter for nested objects has two different results for nested field aggregation.
If we use filters in the Where clause, tQL creates a range of objects which contains nested by conditions and calculates aggregates of all nested values in this range.
select converted priorityId as priority
, statusId as status
, custom.worklogs.workerId
, count(custom.worklogs.duration) as worklogs
from ticket
Where custom.worklogs[workerId like '0511ca9b-e20f-43c5-b59c-f864385b46d7']
group by priorityId, statusId, custom.worklogs[].workerId
priority | status | Worklogs - workerId | worklogs |
Major | Closed | Michal Jandura | 74 |
Major | Closed | Marek Schmidt | 19 |
Major | Closed | Jaroslav Lehar | 8 |
Major | Closed | Petr Kropík | 8 |
Major | Closed | Jan Tecl | 5 |
Major | Closed | Jira Synchronizace | 4 |
Major | Closed | Petr Žoček | 3 |
Major | Closed | Martin Tměj | 3 |
When we use bucket filters in a Group by clause, only nested values which support condition expressions will be aggregated
select converted priorityId as priority
, statusId as status
, custom.worklogs.workerId
, count(custom.worklogs.duration) as worklogs
from ticket
Where custom.worklogs[workerId like '0511ca9b-e20f-43c5-b59c-f864385b46d7']
group by priorityId, statusId, custom.worklogs[workerId like '0511ca9b-e20f-43c5-b59c-f864385b46d7'].workerId
Priority | status | Worklogs - workerId | worklogs |
Major | Closed | Michal Jandura | 74 |
Major | In progress | Michal Jandura | 100 |
Blocker | Closed | Michal Jandura | 30 |
Minor | Closed | Michal Jandura | 14 |
Critical | Closed | Michal Jandura | 40 |
Nested Grouping
In tQL, aggregation functions like sum(), count(), max(), min(), and avg() can be used in combination with complex painless scripts to perform sophisticated transformations and calculations on data. These aggregation functions allow you to extract meaningful insights from your dataset by summarizing values and applying conditions. This type of complex aggregation is useful when you want to group documents based on certain attributes, such as extracting specific parts of a timestamp (e.g., month) and calculating aggregates like the average or maximum month of creation for tickets in a Closed status. This example is used to perform aggregation over a complex, nested structure, allowing users to group tickets by project and task type while applying conditions (e.g., tasks that are inactive and have non-null id) to the data. This scenario would be useful when you want to group and aggregate tasks across all statuses (active or inactive), comparing worklog durations and counting task IDs and priority names for each group.
select converted chars.project
, tasks.taskType
, sum('def d ;
if (doc["custom.worklogs.duration"].size()!=0)
{d=doc["custom.worklogs.duration"].value; }
else d=0;
return d')
, count(tasks.id)
, count(priorityName)
from ticket
where tasks[active = false]
group by chars.project, tasks[active = false and id is not null].taskType, custom.worklogs[id is not null]
Nested Grouping with a Calculated Field
The sample query below applies nested grouping to ticket-related data, aggregating key task details such as task type, total worklog duration and task count, while filtering for inactive tasks. It integrates aggregation functionality with Painless scripting, demonstrating the primary capabilities of tQL. In addition, the query demonstrates flexible filtering and grouping across nested structures, enabling detailed task analysis.
select chars.project
, tasks.taskType
, sum('def d ;
if (doc["custom.worklogs.duration"].size()!=0) {
d=doc["custom.worklogs.duration"].value;
}
else d=0;
return d')
, count(tasks.id)
, count(priorityName)
from ticket
where tasks[active = false]
group by chars.project,
tasks[active = false and id is not null].taskType, custom.worklogs[id is not null]
For a standard query, we can ask elastic to return https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html, which has the original value. We use it for the standard select queries, hence 'select priorityName from ticket' would return original values. It is not possible for the Group by, 'select priorityName from ticket group by priorityName', which returns a grouped value.
select name
, sum('doc["custom.worklogs.duration"].value / 60.0 / 8') as "Worklog MD"
from ticket
where creationDate between '2022-02-01T00:00:00.000+0000' and '2022-02-28T00:00:00.000+0000'
group by name, custom.worklogs[]
When nested object fields are grouped, use syntax with [ ], e.g. nestedObjectName[].fieldName in Group by clause
select name
, custom.worklogs.workerID
, count(id)
, sum('doc["custom.worklogs.duration"].value / 60.0 / 8') as "Worklog MD"
from ticket
group by name, custom.worklogs[].workerID
Filtering in queries with data aggregation
Let's clearly examine the filtering possibilities in TQL and the key differences between using WHERE filters and GROUP BY filters.
select converted key
, tasks[].active
, tasks[].name
, tasks[].activityId
, tasks[].id
from ticket
where key in ('lnx1', 'adr_tik2')
group by key, tasks[].active, tasks[].name, tasks[].id, tasks[].activityId
A query without a filter includes all documents from the dataset. The WHERE tasks[] condition ensures that only tickets containing at least one task are considered, but it does not filter tasks based on specific attributes. As a result, all tasks associated with the selected tickets (key in ('lnx1', 'adr_tik2')) are included in the results. Therefore, predicates for basic documents work in ordinary way.
key | tasks.active | tasks.name | tasks.activityId | tasks.id |
---|---|---|---|---|
adr_tik2 | true | Řešení | Activity_1ed4tpv | 5c9a0aea-ce96-11ef-8f4c-0e9aa0b6cff9 |
adr_tik2 | true | Řešení | Activity_1ed4tpv | f1953807-cdfe-11ef-8405-7e930c7a76bb |
lnx1 | false | Řešení | Activity_1ed4tpv | 2be9eb07-e23c-11ef-9004-3256945c58db |
lnx1 | false | Řešení | Activity_1ed4tpv | 6204944a-d96b-11ef-b50e-5a12acd032e5 |
lnx1 | false | Řešení | Activity_1ed4tpv | 6fb2dd7e-e23d-11ef-9004-3256945c58db |
lnx1 | false | Řešení | Activity_1ed4tpv | 8a07ea1e-d8b1-11ef-99d9-5a12acd032e5 |
lnx1 | false | Řešení | Activity_1ed4tpv | b2f65bb4-d7ca-11ef-99d9-5a12acd032e5 |
lnx1 | false | Řešení | Activity_1ed4tpv | ef960a6a-d960-11ef-b50e-5a12acd032e5 |
lnx1 | false | Po message | Activity_08n06fv | 34bcad81-d961-11ef-b50e-5a12acd032e5 |
lnx1 | true | Řešení | Activity_1ed4tpv | 5c635c86-e23e-11ef-9004-3256945c58db |
After applying a filter for a nested object, we may observe in result some records that do not satisfy the conditions.
select converted key
, tasks[].active
, tasks[].name
, tasks[].activityId
, tasks[].id
from ticket
where tasks[active=false] and key in ('lnx1', 'adr_tik2')
group by key, tasks[].active, tasks[].name, tasks[].id, tasks[].activityId
This query retrieves all tasks for each ticket that contains at least one task that meets the condition active = false. However, both active = true and active = false tasks are included in the results. The condition WHERE tasks[] ensures that only tickets with at least one task are considered with the predicate.
key | tasks.active | tasks.name | tasks.activityId | tasks.id |
---|---|---|---|---|
lnx1 | false | Řešení | Activity_1ed4tpv | 2be9eb07-e23c-11ef-9004-3256945c58db |
lnx1 | false | Řešení | Activity_1ed4tpv | 6204944a-d96b-11ef-b50e-5a12acd032e5 |
lnx1 | false | Řešení | Activity_1ed4tpv | 6fb2dd7e-e23d-11ef-9004-3256945c58db |
lnx1 | false | Řešení | Activity_1ed4tpv | 8a07ea1e-d8b1-11ef-99d9-5a12acd032e5 |
lnx1 | false | Řešení | Activity_1ed4tpv | b2f65bb4-d7ca-11ef-99d9-5a12acd032e5 |
lnx1 | false | Řešení | Activity_1ed4tpv | ef960a6a-d960-11ef-b50e-5a12acd032e5 |
lnx1 | false | Po message | Activity_08n06fv | 34bcad81-d961-11ef-b50e-5a12acd032e5 |
lnx1 | true | Řešení | Activity_1ed4tpv | 5c635c86-e23e-11ef-9004-3256945c58db |
When filters are applied only in the GROUP BY clause, it can lead to the inclusion of unintended duplicates or records that don't satisfy the intended conditions. This is because GROUP BY aggregates results based on specified columns, and any filtering done here only affects the aggregated data, not the original dataset. As a result, it can unintentionally expand the scope of the results, leading to duplicates or over-aggregation, which is not ideal.
select converted key
, tasks[].active
, tasks[].name
, tasks[].activityId
, count(tasks[].id)
from ticket
where tasks[active=false ] and key in ('lnx1', 'adr_tik2')
group by key, tasks[active=false].active, tasks[].name, tasks[].activityId
To ensure performance, it’s crucial to apply filters in the WHERE clause before any aggregation. This way, the data is already narrowed down before performing the grouping and aggregation operations, leading to more precise and efficient results.
key | tasks.active | tasks.name | tasks.activityId | count_tasks.id_0 |
---|---|---|---|---|
lnx1 | false | Řešení | Activity_1ed4tpv | 6 |
lnx1 | false | Po message | Activity_08n06fv | 1 |
In addition to using filters in the WHERE clause, we can apply filters directly in the SELECT statement to target nested objects. This allows for more specific results based on additional conditions for nested fields. Filters in the SELECT clause are particularly useful for analytical comparisons, as they allow us to count or aggregate data based on particular criteria that can't always be expressed in the WHERE clause.
select converted key, tasks[].active, tasks[].activityId
, count(tasks[name like '%Řešen%' ].id) as "Solved task"
, count(tasks[].id) as "Total"
, round(count(tasks[name like '%Řešen%' ].id) / count(tasks[].id) *100, 1) as "% solved"
from ticket
where tasks[active=false ] and key like 'tick1'
group by key, tasks[active=false].active, tasks[].activityId
key | tasks.active | tasks.activityId | Solved task | Total | % solved |
---|---|---|---|---|---|
tick1 | false | Activity_1ed4tpv | 4 | 6 | 66.7 |
tick1 | false | Activity_0072gmu | 0 | 1 | 0 |
tick1 | false | Activity_0akgfb0 | 0 | 1 | 0 |
tick1 | false | Activity_1xcu8v0 | 0 | 1 | 0 |
Aggregate with two nested objects
tQL is a specialized query language designed to handle deeply nested data structures efficiently. It is particularly useful when working with hierarchical data stored in NoSQL engine Elasticsearch.
At this point we have two use case. First is simple and resolve by the ordinary way of tQl operators appliyng.
select count(tasks.id)
, sum(custom.worklogs.duration)
, priorityName
from ticket
group by priorityName, custom.worklogs[], tasks[]
Secon one is nested in nested. Nested-in-nested querying follows the same core rules as regular nested queries, with an additional layer of complexity due to deeper hierarchy levels. The rules remain the same, but they must be applied at each level of nesting. A sample dataset contain multiple levels of nesting, such as:
- A ticket contains multiple tasks.
- Each task contains multiple history entries.
The querying such a structure requires conditions on multiple levels of nesting, as seen in this query:
select converted key
, tasks.name
, tasks[id IS NOT NULL].history[startTime IS NOT NULL].changes
from ticket
where tasks[id IS NOT NULL].history[startTime IS NOT NULL] and key like 'ebox-51%'
key | tasks.name | tasks.history.changes |
---|---|---|
EBOX-514 | Řešení | [["USER","USER_GROUP"],["USER","USER_GROUP"],["USER","USER_GROUP"],["USER_GROUP"],["USER","USER_GROUP"]] |
EBOX-515 | Řešení | [["USER", "USER_GROUP" ], [ "USER", "USER_GROUP" ], [ "USER", "USER_GROUP" ] ] |
EBOX-516 | Řešení | [ [ "USER", "USER_GROUP" ], [ "USER", "USER_GROUP" ], [ "USER", "USER_GROUP" ] ] |
EBOX-519 | Řešení | [ [ "USER", "USER_GROUP" ] ] |
Handling NULL Values in Nested Queries
NULL Values Must Be Explicitly Filtered in tQL Filtering NULL values is crucial in tQL to prevent unwanted results, especially in GROUP BY queries where missing values can distort aggregations or lead to incorrect dataset outputs. To ensure accuracy in queries, apply explicit NULL checks at relevant levels:
-
Filter out NULL tasks before accessing subfields: tasks[id IS NOT NULL]
-
Filter out NULL history records before accessing subfields: history[startTime IS NOT NULL]
NULL filtering is not limited to changes. The filter should be adapted to the specific predicate needed for accurate results. For example, if filtering on status:
tasks[active=true].history[changes = 'USER_GROUP']
select converted() key
, tasks.name
, tasks[id IS NOT NULL].history.changes
, sum(tasks[].history[].duration) as "Duration"
from ticket
where tasks[id IS NOT NULL].history[startTime IS NOT NULL]
group by tasks[id IS NOT NULL].history[startTime IS NOT NULL].changes, tasks[id IS NOT NULL].name, key
key | tasks.name | tasks.history.changes | Duration |
---|---|---|---|
ebox-514 | Řešení | USER_GROUP | 951357 |
ebox-514 | Řešení | USER | 951352 |
ebox-515 | Řešení | USER | 1538050 |
ebox-515 | Řešení | USER_GROUP | 1538050 |
ebox-516 | Řešení | USER | 1207300 |
ebox-516 | Řešení | USER_GROUP | 1207300 |
ebox-519 | Řešení | USER | 1173 |
ebox-519 | Řešení | USER_GROUP | 1173 |