tQL - Operation with Nested Objects

Nested objects

Elasticsearch lacks support for SQL-style joins (e.g., JOIN in relational databases). Instead, it denormalizes data, storing and indexing documents independently for optimized search performance. By default, Elasticsearch flattens JSON objects, which can cause incorrect search results when querying arrays of objects. Since Elasticsearch excels at quickly searching documents of the same type and does not generally support joins, the best solution is to use the nested type. The nested field type ensures that objects within an array remain separate entities within the same document, preserving their relationships and enabling accurate queries.

Give me all tickets where one of the tasks is assigned to me
Give me the sum of worklogs on tickets of project ‘X’
Give me all tasks on tickets of project ‘X’

You need to use nested objects in ElasticSearch. Please consult the ElasticSearch documentation.

We will use nested documents of worklogs inside the ticket index in the following example:

"custom": {
    "worklogs": [
        {
            "duration": 180,
            "workerId": "0b4b7f40-4731-4ffd-b1b2-8c4ec7030ff7",
            "endDate": "2022-01-14T19:16:00.000+0000",
            "id": "54a21df3-eb77-4ee0-87d5-d7bbf54127ea",
            "type": null,
            "startDate": "2022-01-14T16:16:00.000+0000"
        }
    ]
}

Return nested rows

Simplified Methodology for the Retrieval of Nested Objects and the Viewing of Their Data.

Retrieving and inspecting nested objects can be done using a straightforward approach. This method allows users to efficiently access nested data structures and analyze their contents.

SELECT key, tasks
from ticket
WHERE key = 'ZE_3';

This query retrieves the key and tasks fields from the ticket table where the key matches 'ZE_3'.

key	tasks
ZE_3	[ { "dueDate": null, "description": null, "userGroupId": null, "skills": [], "activityId": "Activity_1ed4tpv", "routing": null, "taskType": "USER_TASK", "plannedStartTime": null, "tsmTaskDefinitionCode": null, "commitedStartTimeTo": null, "startTime": "2024-06-27T07:15:32.469+0000", "id": "0a7e5263-3455-11ef-a796-3e5d7e594a64", "_timestamp": null, "processInstanceId": "0a7d19dd-3455-11ef-a796-3e5d7e594a64", "processDefinitionId": "ticket_zkouska.exportu:2:fb335fca-3454-11ef-a796-3e5d7e594a64", "userGroupCode": null, "entityType": "TASK", "module": "ticket", "custom": null, "active": true, "history": [], "charsTyped": {}, "userId": null, "commitedStartTimeFrom": null, "taskDefinitionKey": "Activity_1ed4tpv", "executionId": "0a7d19dd-3455-11ef-a796-3e5d7e594a64", "canceled": false, "taskSpecification": null, "plannedEndTime": null, "name": "Vyřešit", "wfmUserId": null, "endTime": null, "incident": null, "chars": {}, "status": null, "followupDate": null } ]

key

tasks

ZE_3

[ { "dueDate": null, "description": null, "userGroupId": null, "skills": [], "activityId": "Activity_1ed4tpv", "routing": null, "taskType": "USER_TASK", "plannedStartTime": null, "tsmTaskDefinitionCode": null, "commitedStartTimeTo": null, "startTime": "2024-06-27T07:15:32.469+0000", "id": "0a7e5263-3455-11ef-a796-3e5d7e594a64", "_timestamp": null, "processInstanceId": "0a7d19dd-3455-11ef-a796-3e5d7e594a64", "processDefinitionId": "ticket_zkouska.exportu:2:fb335fca-3454-11ef-a796-3e5d7e594a64", "userGroupCode": null, "entityType": "TASK", "module": "ticket", "custom": null, "active": true, "history": [], "charsTyped": {}, "userId": null, "commitedStartTimeFrom": null, "taskDefinitionKey": "Activity_1ed4tpv", "executionId": "0a7d19dd-3455-11ef-a796-3e5d7e594a64", "canceled": false, "taskSpecification": null, "plannedEndTime": null, "name": "Vyřešit", "wfmUserId": null, "endTime": null, "incident": null, "chars": {}, "status": null, "followupDate": null } ]

Advantages:

The query returns all documents stored within the tasks field as an array.
Users can view the complete dataset in its raw form, ensuring that every document and all associated keys are accessible.
The structure of the nested JSON data remains intact, allowing for a comprehensive overview of the original records.

Disadvantages:

The returned data is not well-suited for analysis in its current form.
The nested JSON structure makes the output difficult to interpret, especially when visually inspecting large datasets.
The lack of a tabular or structured format makes it challenging to extract meaningful insights without further processing or transformation. It is crucial to ensure that every document and all associated keys are accessible.

Instead of retrieving the entire nested JSON structure, we can extract specific fields from the nested object and return them as separate columns. This can be achieved using the following SQL query:

SELECT key
    , tasks.name
    , tasks.taskType
 FROM ticket
WHERE key = 'LNX1';

key	tasks.name	tasks.taskType
LNX1	[ "Řešení", "Řešení", "Řešení", "Po message", "Řešení", "Řešení", "Řešení", "Řešení" ]	[ "USER_TASK", "USER_TASK", "USER_TASK", "USER_TASK", "USER_TASK", "USER_TASK", "USER_TASK", "USER_TASK" ]

Advantages of this approach:

This query provides a more structured output by directly returning individual fields (name and taskType) from the nested tasks object.
Users can immediately see the relevant values without needing to manually parse a complex JSON structure.
This method makes it easier to work with specific fields in reporting, visualisation, or further data processing.

Disadvantages of this approach:

The extracted data is returned in arrays, reflecting the structure of the original nested object.
Analyzing relationships between different fields within the nested objects can be challenging because the dependencies between elements in multiple arrays may not be clearly defined.
If the tasks field contains multiple objects, the query might return ambiguous or unclear results, making it harder to correlate related data points.

Let's look at the some issues that can arise when selecting specific fields from a nested object directly. When you query a field from a nested document, the result is usually an array that contains all the values associated with that field across all entries in the nested document. This means that instead of retrieving a single value, you may end up with an array containing multiple values from the nested field, even if you're only interested in one of them.

select key
    , custom.worklogs.startDate
    , custom.worklogs.workerId
from ticket
where key = 'TSMTTS-1384'

key	custom.worklogs.startDate	custom.worklogs.workerId
TSMTTS-1384	2021-12-13T08:48:00.000+0000,2022-01-04T07:12:00.000+0000,2022-01-03T03:13:00.000+0000,2022-01-04T10:51:00.000+0000,2022-01-06T06:47:00.000+0000,2022-01-20T08:13:00.000+0000	d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414,6f30f0d3-1074-45a9-bfe9-93f0c802a10a,6f30f0d3-1074-45a9-bfe9-93f0c802a10a,6c570ee6-139e-47b0-ac8f-8c9ae2d203ee,6c570ee6-139e-47b0-ac8f-8c9ae2d203ee,d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414

However, you can filter the results using the nested filter operator []:

select key
    , custom.worklogs[workerId = 'd825efd0-9b9a-4a7c-baa5-a4e9a8d4d414'].startDate
    , custom.worklogs.workerId
from ticket
where key = 'TSMTTS-1384'

Internally, this is translated to script query:

select key, script('return 
   params._source["custom"]["worklogs"]
      ?.stream()
      ?.filter(
         __collectionItem -> __collectionItem.workerId == "d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414" 
      )
      ?.map(
         __collectionItem -> __collectionItem.startDate
      )
      ?.collect(Collectors.toList())
   ') as custom.worklogs.startDate
   , custom.worklogs.workerId
from ticket
where key = 'TSMTTS-1384'

Both queries give the same result with filtered startDate for worklogs of one worklogId:

key	custom.worklogs.startDate	custom.worklogs.workerId
TSMTTS-1384	2021-12-13T08:48:00.000+0000	d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414,6f30f0d3-1074-45a9-bfe9-93f0c802a10a,6f30f0d3-1074-45a9-bfe9-93f0c802a10a,6c570ee6-139e-47b0-ac8f-8c9ae2d203ee,6c570ee6-139e-47b0-ac8f-8c9ae2d203ee,d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414

Please note, that because we did not filter on the workerId field, results from all nested records are returned.

Notice: For empty filters, a nested indicator is optional for the select query. Both columns return the same value:

select custom.worklogs.workerId, custom.worklogs[].workerId from ticket

This is not the case for where / group by / order by clauses, where brackets are mandatory even for empty filters.

In general, if you want to return nested documents, you need to use the special syntax:

FROM nested(entity, nestedPath)

select key
    , custom.worklogs.workerId
    , custom.worklogs.startDate
    , custom.worklogs.duration
from nested(ticket, custom.worklogs)
where key = 'TSMTTS-1384'

key	custom.worklogs.workerId	custom.worklogs.startDate	custom.worklogs.duration
TSMTTS-1384	d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414	2021-12-13T08:48:00.000+0000	15
TSMTTS-1384	6f30f0d3-1074-45a9-bfe9-93f0c802a10a	2022-01-04T07:12:00.000+0000	240
TSMTTS-1384	6f30f0d3-1074-45a9-bfe9-93f0c802a10a	2022-01-03T03:13:00.000+0000	240
TSMTTS-1384	6c570ee6-139e-47b0-ac8f-8c9ae2d203ee	2022-01-04T10:51:00.000+0000	30
TSMTTS-1384	6c570ee6-139e-47b0-ac8f-8c9ae2d203ee	2022-01-06T06:47:00.000+0000	60
TSMTTS-1384	d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414	2022-01-20T08:13:00.000+0000	15

Let's go throw other sample. Please note that the following two queries produce the same result, meaning they can be considered alternative approaches to retrieving data from nested structures:

select key
    , tasks.name
    , tasks.taskType
    , tasks.id
from nested(ticket, tasks)
where key = 'LNX1'

This query explicitly the nested tasks array within the ticket document.

Alternatively, using the GROUP BY clause:

select key
    , tasks.name
    , tasks.taskType
    , tasks.id
from ticket
where key = 'LNX1'
group by key, tasks[].name, tasks[].taskType, tasks[].id

This method achieves the same result by grouping data at the document level while implicitly handling the nested fields.

key	tasks.name	tasks.taskType	tasks.id
LNX1	Řešení	USER_TASK	5c635c86-e23e-11ef-9004-3256945c58db

The second approach is executed as a special aggregation query, where the results are derived from a nested aggregation subquery in the background. This means that the query engine processes the nested fields internally, aggregating and extracting relevant data before returning the final structured output.

Key Differences between this two approaches:

Feature	Group by (Group by Composite)	Nested (Nested with Filters & Top Hits)
Primary Focus	Grouping and counting terms in nested fields	Retrieving documents matching filters
Returned Data	Aggregated counts and term distributions	Actual matching documents (via `top_hits`)
Type of Aggregation	Composite + Nested aggregation	Nested aggregation with filters and `top_hits`
Filters Applied	Top-level filter on the `"key"` field	Filters inside the nested aggregation
Document Retrieval	No documents returned	Actual documents returned (filtered)
Count	Counts of terms in nested fields	Count of matching tasks via `filteredCount`
Efficiency	Efficient for grouping and counting	Requires retrieval of documents, potentially less efficient

So, in conclusion, approach 1 Best for aggregating and counting data without needing to see individual document data however approach 2 Best for retrieving specific documents that match certain criteria and also counting matching tasks.

When working with deeply nested data structures that extend beyond the second level, such as a third level of nesting (tasks.history.changes and tasks.history.duration), similar principles apply for querying and accessing the data efficiently.

Below are different ways to retrieve data from a third-level nested structure and how to handle it properly.

select key
    , tasks[].history[].changes
    , tasks[].history[].duration
from ticket
where   key='VBFeedbackPSP-7'

select key
    , tasks.history.changes
    , tasks.history.duration
from nested(ticket, tasks)
where   key='VBFeedbackPSP-7'

and apply ordinary operation this data

key	tasks.history.changes	tasks.history.duration
VBFeedbackPSP-7	[ [ "USER_GROUP" ], [ "STATUS_CODE" ], [ "USER" ] ]	[ 0, 1, 0 ]

For a clearer and more structured representation of the data, it is highly recommended to use the GROUP BY clause when working with third-level nested fields. This approach ensures that the extracted data maintains an organized structure, preventing duplication and preserving the relationships within the nested hierarchy. group by clouse

select key
   , tasks.history.changes
   , tasks.history.duration
from nested(ticket, tasks)
group by key, tasks.history[].changes, tasks.history[].duration

By applying GROUP BY, the query aggregates the results in a structured format, where each row corresponds to a specific combination of values from the inner JSON documents. This method ensures that nested records are properly grouped for deeply nested data.

key	tasks.history.changes	tasks.history.duration
vbfeedbackpsp-7	STATUS_CODE	1
vbfeedbackpsp-7	USER	0
vbfeedbackpsp-7	USER_GROUP	0

Nested in the WHERE condition

tQL contains special syntax for filtering in nested documents, similar to the filter in the select clause:

select key, custom.worklogs.startDate, custom.worklogs.workerId, custom.worklogs.duration
from ticket
where custom.worklogs[startDate is not null and workerId = 'd825efd0-9b9a-4a7c-baa5-a4e9a8d4d414']

key	custom.worklogs.startDate	custom.worklogs.workerId	custom.worklogs.duration
TSMTTS-1390	2021-12-15T09:24:00.000+0000,2022-02-14T14:00:00.000+0000	d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414,d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414	45,60
TSMTTS-1372	2021-12-09T05:30:00.000+0000,2021-12-08T10:00:00.000+0000,2021-12-10T13:08:00.000+0000	d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414,3255f56a-95aa-4c5f-90d9-e5925684ea55,d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414	45,20,15

It is important to understand the difference between the usage of nested filters in where condition and in select. Where condition filters “tickets”, not “worklogs”, if you want to find tickets with worklogs matching the filter and get only associated worklogs records, you need to combine both approaches:

select key,
    custom.worklogs[startDate is not null and workerId = 'd825efd0-9b9a-4a7c-baa5-a4e9a8d4d414'].startDate,
    custom.worklogs[startDate is not null and workerId = 'd825efd0-9b9a-4a7c-baa5-a4e9a8d4d414'].workerId,
    custom.worklogs[startDate is not null and workerId = 'd825efd0-9b9a-4a7c-baa5-a4e9a8d4d414'].duration
from ticket
where custom.worklogs[startDate is not null and workerId = 'd825efd0-9b9a-4a7c-baa5-a4e9a8d4d414']

key	custom.worklogs.startDate	custom.worklogs.workerId	custom.worklogs.duration
TSMTTS-1384	2021-12-13T08:48:00.000+0000,2022-01-20T08:13:00.000+0000	d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414,d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414	15,15
TSMTTS-1390	2021-12-15T09:24:00.000+0000,2022-02-14T14:00:00.000+0000	d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414,d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414	45,60
TSMTTS-1379	2021-12-13T14:18:00.000+0000	d825efd0-9b9a-4a7c-baa5-a4e9a8d4d414	150

Data aggregation in nested objects

Group by and Elasticsearch aggregation methods

Elasticsearch organizes aggregations into three categories:

Metric aggregations that calculate metrics, such as sum or average, from field values.
Bucket aggregations that group documents into buckets also called bins, based on field values, ranges, or other criteria.
Pipeline aggregations that take input from other aggregations instead of documents or fields.

For metric aggregations use aggregation functions in tQL. Bucket aggregations are realized in Group by clauses.

Group by clause supported syntax:

Group by group_list,

[group_list] ::=\{\{ * | \{ key_name | prefix.key_name | nested_obj[].nested_key | const_field | expression | script(pianless_script) \}

Notice that group_list must contain all select_list keys, instead of aggregates. However, group_list may also contain additional keys if bucketing logic is necessary.

A simple group by resembles the standard SQL even with flattend structure:

select chars.project
    , count(id) as "Tasks"
from ticket
group by chars.project

A nested Group by with a calc field

When nested object fields are grouped, use syntax with [ ], e.g. nestedObjectName[].fieldName in Group by clause

select custom.worklogs.workerId
    , sum('doc["custom.worklogs.duration"].value / 60.0 / 8') as "Worklog MD"
from ticket
group by custom.worklogs[].workerId

It is also possible to combine nested and not nested fields:

select converted priorityId as priority
    , statusId as Status
    , count(custom.worklogs.workerId)
from ticket
group by priorityId, statusId, custom.worklogs[]
order by statusId desc

Filtering nested objects by Group by

There are three ways to apply filtering with the Group by clauses:

Where clause as usual supports all predicate to create a range of documents for bucketing
Having clauses for filtered aggregation result in sum, min, max, avg, and count functions.
Bucket filter for nested objects

The first one is simple:

    select converted priorityId as priority
        , statusId as Status
        , count(custom.worklogs.workerId)
    from ticket
    where priorityId is not null and statusId is not null
    group by priorityId, statusId, custom.worklogs[]
    limit 5

priority	Status	Worklogs - workerId
Major	New	0
Major	Canceled	24
Major	Postponed	60
Major	Closed	5288
Major	In progress	5948

The second one should only be used for aggregation results:

select converted priorityId as priority
    , statusId as status
    , count(id) as worklogs
from ticket
Where priorityId is not null and statusId is not null
group by priorityId, statusId, custom.worklogs[]
Having count(id) >60

The bucket filter for nested objects has two different results for nested field aggregation.

If we use filters in the Where clause, tQL creates a range of objects which contains nested by conditions and calculates aggregates of all nested values in this range.

select converted priorityId as priority
    , statusId as status
    , custom.worklogs.workerId
    , count(custom.worklogs.duration) as worklogs
from ticket
Where custom.worklogs[workerId like '0511ca9b-e20f-43c5-b59c-f864385b46d7']
group by priorityId, statusId, custom.worklogs[].workerId

priority	status	Worklogs - workerId	worklogs
Major	Closed	Michal Jandura	74
Major	Closed	Marek Schmidt	19
Major	Closed	Jaroslav Lehar	8
Major	Closed	Petr Kropík	8
Major	Closed	Jan Tecl	5
Major	Closed	Jira Synchronizace	4
Major	Closed	Petr Žoček	3
Major	Closed	Martin Tměj	3

When we use bucket filters in a Group by clause, only nested values which support condition expressions will be aggregated

    select converted priorityId as priority
        , statusId as status
        , custom.worklogs.workerId
        , count(custom.worklogs.duration) as worklogs
    from ticket
    Where custom.worklogs[workerId like '0511ca9b-e20f-43c5-b59c-f864385b46d7']
    group by priorityId, statusId, custom.worklogs[workerId like '0511ca9b-e20f-43c5-b59c-f864385b46d7'].workerId

Priority	status	Worklogs - workerId	worklogs
Major	Closed	Michal Jandura	74
Major	In progress	Michal Jandura	100
Blocker	Closed	Michal Jandura	30
Minor	Closed	Michal Jandura	14
Critical	Closed	Michal Jandura	40

If you want to group by a nested field, you need to always use nested “[]” syntax:

select converted chars.project 
    , statusName
    ,  priorityId
    , sum(custom.worklogs.duration)
    , count(custom.worklogs.id)
from ticket
group by  statusName, chars.project,  priorityId,
custom.worklogs[workerId = '04f316cd-4897-45e3-8145-7bf9e951c9a4']

Projekt	statusName	Priority	custom.worklogs.duration	custom.worklogs.id
RMD	closed	Major	0	0
SPM	closed	Major	6090	50
SPM	closed	Blocker	480	7

select custom.worklogs.workerId
       , count(custom.worklogs.id) AS "Count worklogs"
       , count(id) AS "Count tickets"
from ticket
group by custom.worklogs[].workerId

custom.worklogs.workerId	Count worklogs	Count tickets
04f316cd-4897-45e3-8145-7bf9e951c9a4	381	75
0511ca9b-e20f-43c5-b59c-f864385b46d7	258	76
055b7d5c-e40c-4565-94d7-5cf620f1abe9	381	98
0b4b7f40-4731-4ffd-b1b2-8c4ec7030ff7	684	255
0e6cfa34-dc2b-4157-b1df-a55278621aa0	186	49

Also notice different numbers of nested documents (worklogs) and main documents (tickets). You may also want to filter the worklogs for grouping:

select custom.worklogs.workerId
    , count(custom.worklogs.id) AS "Count worklogs"
    , count(id) AS "Count tickets"
from ticket
group by custom.worklogs[workerId = '04f316cd-4897-45e3-8145-7bf9e951c9a4'].workerId

custom.worklogs.workerId	Count worklogs	Count tickets
04f316cd-4897-45e3-8145-7bf9e951c9a4	381	75

This is not the same as filtering the tickets followed by group by (first select all tickets containing at least one worklog with this worker, then group by all worklogs from these tickets):

select custom.worklogs.workerId
    , count(custom.worklogs.id) AS "Count worklogs"
    , count(id) AS "Count tickets"
from ticket
where custom.worklogs[workerId = '04f316cd-4897-45e3-8145-7bf9e951c9a4']
group by custom.worklogs[].workerId

custom.worklogs.workerId	Count worklogs	Count tickets
04f316cd-4897-45e3-8145-7bf9e951c9a4	381	75
0511ca9b-e20f-43c5-b59c-f864385b46d7	13	5
055b7d5c-e40c-4565-94d7-5cf620f1abe9	10	2

More complex example:

select custom.worklogs.workerId
    , round(sum(custom.worklogs.duration)/60/8, 1) AS "Sum duration MD"
from ticket
group by custom.worklogs[startDate > '2022-01-01T00:00:00.000+0000'].workerId

custom.worklogs.workerId	Sum duration MD
6f30f0d3-1074-45a9-bfe9-93f0c802a10a	618.9
d8769f98-9758-4fcd-80a2-410c9d915b52	92.5
e7f5865c-7719-4a14-beef-4bce2aae34f9	30.6
bd4cabfe-17b5-4d43-9020-7b8d04187567	74.5

Or even:

select custom.worklogs[].workerId 
    , format(histogram(custom.worklogs[startDate > '2022-01-01T00:00:00.000+0000'].startDate, 'month'), 'MM') as "Month" 
    , round(sum(custom.worklogs.duration)/60/8, 1) AS "Sum duration MD"
from ticket
group by custom.worklogs[].workerId
, histogram(custom.worklogs[startDate > '2022-01-01T00:00:00.000+0000'].startDate, 'month')

custom.worklogs.workerId	Month	Sum duration MD
6f30f0d3-1074-45a9-bfe9-93f0c802a10a	01	118
e7f5865c-7719-4a14-beef-4bce2aae34f9	01	11.3
6c570ee6-139e-47b0-ac8f-8c9ae2d203ee	01	16.6
d8769f98-9758-4fcd-80a2-410c9d915b52	01	16.4

Group by and differentd in ways of filtering data

Nested in order by

You can sort by nested value in the same format as in where/group by.

select key 
    , format(custom.worklogs[workerId = '04f316cd-4897-45e3-8145-7bf9e951c9a4'].startDate, 'dd.MM.yyyy HH:mm') as "When"
    , custom.worklogs[workerId = '04f316cd-4897-45e3-8145-7bf9e951c9a4'].duration as "Duration minutes"
from ticket
where custom.worklogs[workerId = '04f316cd-4897-45e3-8145-7bf9e951c9a4']
order by custom.worklogs[workerId = '04f316cd-4897-45e3-8145-7bf9e951c9a4'].startDate

key	When	Duration minutes
TSMTTS-868	14.07.2021 07:04	360
TSMTTS-850	15.07.2021 05:01, 29.07.2021 13:20, 30.07.2021 10:45, 04.08.2021 08:00	120,180,180,330
TSMTTS-881	15.07.2021 10:00, 16.07.2021 04:43, 19.07.2021 12:46, 21.07.2021 11:51	90,120,120,120
TSMTTS-883	15.07.2021 12:01, 16.07.2021 06:57, 19.07.2021 10:59, 20.07.2021 05:34, 20.07.2021 09:02, 21.07.2021 06:06, 02.08.2021 10:01	60,150,180,210,210,240,270

The query may look a bit tricky. You need to specify the filtering clause on many levels:

where - select _tickets _which contain at least one work log matching the criteria
order by - order those tickets by the start date of the earliest work log of this ticket
select - return start date/duration of all work logs of this ticket filtered by the criteria (internally calculated by ElasticSearch painless script)

If you think that this does not suit you, you probably want to return nested rows instead. For example:

select key
    , custom.worklogs.startDate
    , custom.worklogs.duration
from nested(ticket, custom.worklogs)
where custom.worklogs[workerId = '04f316cd-4897-45e3-8145-7bf9e951c9a4']
order by custom.worklogs.startDate

key	custom.worklogs.startDate	custom.worklogs.duration
TSMTTS-868	2021-07-14T07:04:00.000+0000	360
TSMTTS-850	2021-07-15T05:01:00.000+0000	180
TSMTTS-881	2021-07-15T10:00:00.000+0000	120
TSMTTS-883	2021-07-15T12:01:00.000+0000	60
TSMTTS-888	2021-07-15T15:35:00.000+0000	120

For access to nesdted objects always use [ ] when calling objects in form or/and group by

It is also possible to combine nested and not nested fields:

select converted priorityId as priority
    , statusId as Status
    , count(custom.worklogs.workerId)
from ticket
group by priorityId, statusId, custom.worklogs[]
order by statusId desc

Filtering nested objects by Group by

There are three ways to apply filtering with the Group by clauses:

Where clause as usual supports all predicate to create a range of documents for bucketing
Having clauses for filtered aggregation result in sum, min, max, avg, and count functions.
Bucket filter for nested objects

The first one is simple:

    select converted priorityId as priority
        , statusId as Status
        , count(custom.worklogs.workerId)
    from ticket
    where priorityId is not null and statusId is not null
    group by priorityId, statusId, custom.worklogs[]
    limit 5

priority	Status	Worklogs - workerId
Major	New	0
Major	Canceled	24
Major	Postponed	60
Major	Closed	5288
Major	In progress	5948

The bucket filter for nested objects has two different results for nested field aggregation.

If we use filters in the Where clause, tQL creates a range of objects which contains nested by conditions and calculates aggregates of all nested values in this range.

select converted priorityId as priority
    , statusId as status
    , custom.worklogs.workerId
    , count(custom.worklogs.duration) as worklogs
from ticket
Where custom.worklogs[workerId like '0511ca9b-e20f-43c5-b59c-f864385b46d7']
group by priorityId, statusId, custom.worklogs[].workerId

priority	status	Worklogs - workerId	worklogs
Major	Closed	Michal Jandura	74
Major	Closed	Marek Schmidt	19
Major	Closed	Jaroslav Lehar	8
Major	Closed	Petr Kropík	8
Major	Closed	Jan Tecl	5
Major	Closed	Jira Synchronizace	4
Major	Closed	Petr Žoček	3
Major	Closed	Martin Tměj	3

When we use bucket filters in a Group by clause, only nested values which support condition expressions will be aggregated

    select converted priorityId as priority
        , statusId as status
        , custom.worklogs.workerId
        , count(custom.worklogs.duration) as worklogs
    from ticket
    Where custom.worklogs[workerId like '0511ca9b-e20f-43c5-b59c-f864385b46d7']
    group by priorityId, statusId, custom.worklogs[workerId like '0511ca9b-e20f-43c5-b59c-f864385b46d7'].workerId

Priority	status	Worklogs - workerId	worklogs
Major	Closed	Michal Jandura	74
Major	In progress	Michal Jandura	100
Blocker	Closed	Michal Jandura	30
Minor	Closed	Michal Jandura	14
Critical	Closed	Michal Jandura	40

Nested Grouping

In tQL, aggregation functions like sum(), count(), max(), min(), and avg() can be used in combination with complex painless scripts to perform sophisticated transformations and calculations on data. These aggregation functions allow you to extract meaningful insights from your dataset by summarizing values and applying conditions. This type of complex aggregation is useful when you want to group documents based on certain attributes, such as extracting specific parts of a timestamp (e.g., month) and calculating aggregates like the average or maximum month of creation for tickets in a Closed status. This example is used to perform aggregation over a complex, nested structure, allowing users to group tickets by project and task type while applying conditions (e.g., tasks that are inactive and have non-null id) to the data. This scenario would be useful when you want to group and aggregate tasks across all statuses (active or inactive), comparing worklog durations and counting task IDs and priority names for each group.

select converted chars.project
        , tasks.taskType
        , sum('def d ; 
                if (doc["custom.worklogs.duration"].size()!=0) 
                {d=doc["custom.worklogs.duration"].value; }
                else d=0; 
                return d') 
        ,   count(tasks.id)
        , count(priorityName)
from ticket
where tasks[active = false]
group by chars.project,   tasks[active = false and id is not null].taskType, custom.worklogs[id is not null]

Nested Grouping with a Calculated Field

The sample query below applies nested grouping to ticket-related data, aggregating key task details such as task type, total worklog duration and task count, while filtering for inactive tasks. It integrates aggregation functionality with Painless scripting, demonstrating the primary capabilities of tQL. In addition, the query demonstrates flexible filtering and grouping across nested structures, enabling detailed task analysis.

select chars.project
        , tasks.taskType
        , sum('def d ;  
                if (doc["custom.worklogs.duration"].size()!=0) {
                        d=doc["custom.worklogs.duration"].value; 
                    } 
                    else d=0;  
                    return d') 
        , count(tasks.id)
        , count(priorityName)
from ticket
where tasks[active = false]
group by chars.project,   
    tasks[active = false and id is not null].taskType, custom.worklogs[id is not null]

For a standard query, we can ask elastic to return https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html, which has the original value. We use it for the standard select queries, hence 'select priorityName from ticket' would return original values. It is not possible for the Group by, 'select priorityName from ticket group by priorityName', which returns a grouped value.

select name 
    , sum('doc["custom.worklogs.duration"].value / 60.0 / 8') as "Worklog MD"
from ticket
where creationDate between '2022-02-01T00:00:00.000+0000' and '2022-02-28T00:00:00.000+0000'
group by name, custom.worklogs[]

When nested object fields are grouped, use syntax with [ ], e.g. nestedObjectName[].fieldName in Group by clause

select name
    , custom.worklogs.workerID
    , count(id)
    , sum('doc["custom.worklogs.duration"].value / 60.0 / 8') as "Worklog MD"
from ticket
group by name, custom.worklogs[].workerID

Filtering in queries with data aggregation

Let's clearly examine the filtering possibilities in TQL and the key differences between using WHERE filters and GROUP BY filters.

select converted key
    , tasks[].active
    , tasks[].name
    , tasks[].activityId
    , tasks[].id
from ticket
where key in ('lnx1', 'adr_tik2')
group by key, tasks[].active, tasks[].name, tasks[].id, tasks[].activityId

A query without a filter includes all documents from the dataset. The WHERE tasks[] condition ensures that only tickets containing at least one task are considered, but it does not filter tasks based on specific attributes. As a result, all tasks associated with the selected tickets (key in ('lnx1', 'adr_tik2')) are included in the results. Therefore, predicates for basic documents work in ordinary way.

key	tasks.active	tasks.name	tasks.activityId	tasks.id
adr_tik2	true	Řešení	Activity_1ed4tpv	5c9a0aea-ce96-11ef-8f4c-0e9aa0b6cff9
adr_tik2	true	Řešení	Activity_1ed4tpv	f1953807-cdfe-11ef-8405-7e930c7a76bb
lnx1	false	Řešení	Activity_1ed4tpv	2be9eb07-e23c-11ef-9004-3256945c58db
lnx1	false	Řešení	Activity_1ed4tpv	6204944a-d96b-11ef-b50e-5a12acd032e5
lnx1	false	Řešení	Activity_1ed4tpv	6fb2dd7e-e23d-11ef-9004-3256945c58db
lnx1	false	Řešení	Activity_1ed4tpv	8a07ea1e-d8b1-11ef-99d9-5a12acd032e5
lnx1	false	Řešení	Activity_1ed4tpv	b2f65bb4-d7ca-11ef-99d9-5a12acd032e5
lnx1	false	Řešení	Activity_1ed4tpv	ef960a6a-d960-11ef-b50e-5a12acd032e5
lnx1	false	Po message	Activity_08n06fv	34bcad81-d961-11ef-b50e-5a12acd032e5
lnx1	true	Řešení	Activity_1ed4tpv	5c635c86-e23e-11ef-9004-3256945c58db

After applying a filter for a nested object, we may observe in result some records that do not satisfy the conditions.

select converted key
    , tasks[].active
    , tasks[].name
    , tasks[].activityId
    , tasks[].id
from ticket
where tasks[active=false] and key in ('lnx1', 'adr_tik2')
group by key, tasks[].active, tasks[].name, tasks[].id, tasks[].activityId

This query retrieves all tasks for each ticket that contains at least one task that meets the condition active = false. However, both active = true and active = false tasks are included in the results. The condition WHERE tasks[] ensures that only tickets with at least one task are considered with the predicate.

key	tasks.active	tasks.name	tasks.activityId	tasks.id
lnx1	false	Řešení	Activity_1ed4tpv	2be9eb07-e23c-11ef-9004-3256945c58db
lnx1	false	Řešení	Activity_1ed4tpv	6204944a-d96b-11ef-b50e-5a12acd032e5
lnx1	false	Řešení	Activity_1ed4tpv	6fb2dd7e-e23d-11ef-9004-3256945c58db
lnx1	false	Řešení	Activity_1ed4tpv	8a07ea1e-d8b1-11ef-99d9-5a12acd032e5
lnx1	false	Řešení	Activity_1ed4tpv	b2f65bb4-d7ca-11ef-99d9-5a12acd032e5
lnx1	false	Řešení	Activity_1ed4tpv	ef960a6a-d960-11ef-b50e-5a12acd032e5
lnx1	false	Po message	Activity_08n06fv	34bcad81-d961-11ef-b50e-5a12acd032e5
lnx1	true	Řešení	Activity_1ed4tpv	5c635c86-e23e-11ef-9004-3256945c58db

When filters are applied only in the GROUP BY clause, it can lead to the inclusion of unintended duplicates or records that don't satisfy the intended conditions. This is because GROUP BY aggregates results based on specified columns, and any filtering done here only affects the aggregated data, not the original dataset. As a result, it can unintentionally expand the scope of the results, leading to duplicates or over-aggregation, which is not ideal.

select converted key
    , tasks[].active
    , tasks[].name
    , tasks[].activityId
    , count(tasks[].id)
from ticket
where tasks[active=false ] and key in ('lnx1', 'adr_tik2')
group by key, tasks[active=false].active, tasks[].name, tasks[].activityId

To ensure performance, it’s crucial to apply filters in the WHERE clause before any aggregation. This way, the data is already narrowed down before performing the grouping and aggregation operations, leading to more precise and efficient results.

key	tasks.active	tasks.name	tasks.activityId	count_tasks.id_0
lnx1	false	Řešení	Activity_1ed4tpv	6
lnx1	false	Po message	Activity_08n06fv	1

In addition to using filters in the WHERE clause, we can apply filters directly in the SELECT statement to target nested objects. This allows for more specific results based on additional conditions for nested fields. Filters in the SELECT clause are particularly useful for analytical comparisons, as they allow us to count or aggregate data based on particular criteria that can't always be expressed in the WHERE clause.

select converted key, tasks[].active, tasks[].activityId
  , count(tasks[name like '%Řešen%' ].id) as "Solved task"
  , count(tasks[].id) as "Total"
  , round(count(tasks[name like '%Řešen%' ].id) / count(tasks[].id) *100, 1) as "% solved"
from ticket
where tasks[active=false ]  and key like 'tick1'
group by key, tasks[active=false].active, tasks[].activityId

key	tasks.active	tasks.activityId	Solved task	Total	% solved
tick1	false	Activity_1ed4tpv	4	6	66.7
tick1	false	Activity_0072gmu	0	1	0
tick1	false	Activity_0akgfb0	0	1	0
tick1	false	Activity_1xcu8v0	0	1	0

Aggregate with two nested objects

tQL is a specialized query language designed to handle deeply nested data structures efficiently. It is particularly useful when working with hierarchical data stored in NoSQL engine Elasticsearch.

At this point we have two use case. First is simple and resolve by the ordinary way of tQl operators appliyng.

select  count(tasks.id)
    , sum(custom.worklogs.duration)
    , priorityName
from ticket
group by   priorityName, custom.worklogs[], tasks[]

Secon one is nested in nested. Nested-in-nested querying follows the same core rules as regular nested queries, with an additional layer of complexity due to deeper hierarchy levels. The rules remain the same, but they must be applied at each level of nesting. A sample dataset contain multiple levels of nesting, such as:

A ticket contains multiple tasks.
Each task contains multiple history entries.

The querying such a structure requires conditions on multiple levels of nesting, as seen in this query:

select converted key
    ,  tasks.name
    , tasks[id IS NOT NULL].history[startTime IS NOT NULL].changes
from ticket
where tasks[id IS NOT NULL].history[startTime IS NOT NULL] and key like 'ebox-51%'

key	tasks.name	tasks.history.changes
EBOX-514	Řešení	[["USER","USER_GROUP"],["USER","USER_GROUP"],["USER","USER_GROUP"],["USER_GROUP"],["USER","USER_GROUP"]]
EBOX-515	Řešení	[["USER", "USER_GROUP" ], [ "USER", "USER_GROUP" ], [ "USER", "USER_GROUP" ] ]
EBOX-516	Řešení	[ [ "USER", "USER_GROUP" ], [ "USER", "USER_GROUP" ], [ "USER", "USER_GROUP" ] ]
EBOX-519	Řešení	[ [ "USER", "USER_GROUP" ] ]

Handling NULL Values in Nested Queries

NULL Values Must Be Explicitly Filtered in tQL Filtering NULL values is crucial in tQL to prevent unwanted results, especially in GROUP BY queries where missing values can distort aggregations or lead to incorrect dataset outputs. To ensure accuracy in queries, apply explicit NULL checks at relevant levels:

Filter out NULL tasks before accessing subfields: tasks[id IS NOT NULL]
Filter out NULL history records before accessing subfields: history[startTime IS NOT NULL]

NULL filtering is not limited to changes. The filter should be adapted to the specific predicate needed for accurate results. For example, if filtering on status:

tasks[active=true].history[changes = 'USER_GROUP']

select converted() key
    ,  tasks.name
    , tasks[id IS NOT NULL].history.changes
    , sum(tasks[].history[].duration) as "Duration"
    from ticket
where tasks[id IS NOT NULL].history[startTime IS NOT NULL] 
group by tasks[id IS NOT NULL].history[startTime IS NOT NULL].changes, tasks[id IS NOT NULL].name, key

key	tasks.name	tasks.history.changes	Duration
ebox-514	Řešení	USER_GROUP	951357
ebox-514	Řešení	USER	951352
ebox-515	Řešení	USER	1538050
ebox-515	Řešení	USER_GROUP	1538050
ebox-516	Řešení	USER	1207300
ebox-516	Řešení	USER_GROUP	1207300
ebox-519	Řešení	USER	1173
ebox-519	Řešení	USER_GROUP	1173