Skip to content

Part 2

Part 2. Basic Query Structure

References: - ClickHouse 'WHERE' documentation - ClickHouse 'Functions for working with strings' documentation

You may want to run queries which include logic to return results only matching certain values. these are specified using a WHERE clause.

For this exercise, we will use = , AND, NOT, OR and LIKE.

NOTE: '=' is used to specify an exact match and is used for all data types. You MUST put string values, IP addresses, number ids inside single quotes (ie source_hostname = ‘SenseOn’). For columns that return INT32 or INT64, the quotes can be omitted. When manipulating numbers, the usual BIDMAS rule of order applies, use brackets to ensure the right operators apply in the correct order.

When chaining multiple conditions, a logic operator AND, OR must be specified at the start of each new matching condition. You can embed these in brackets eg time_observed > 1058926028 AND (column = value 2 OR column = value 3) means it must equal 2 or 3 and the time must be greater than the number given. Without brackets, it will return value = 2 when the time condition is also met, but will return all matches where its value = 3 regardless of the time.

IMPORTANT: For the sake of speed and for all queries in questions to work , include the line _time_observed >= toUnixTimestamp(subtractHours(now(), 24)) * 1000 Straight after the WHERE clause. This and other time operations will be explained further down the test.


Structuring WHERE statements

The structure for WHERE statements is:

SELECT
WHERE
<column name> <operator> <matching conditions>,
<logic operator (AND.OR)> <column name> <operator> <matching condition>
ORDER BY

NOTE: to use the LIKE operator, enter the wildcard value ‘%' before and/or after the text you want to match. This allows you to easily specify certain text within an entry you want to return. WHERE name LIKE ‘%.exe’ - will return all entries in the name column that end with .exe , regardless of the text in front of that. WHERE name LIKE ‘%edge%’ will return all entries that have ‘edge’ somewhere in the text, regardless of what is in front of or after it.


2a. Common WHERE operators

Using the = operator

= is used to return an exact match of a condition.

Write a query which returns the time observed, hostname , eventid and data from the endpoint_windows_event_log table where eventid = ‘4624’, with a limit of 20

(REMEMBER: use the _time_observed >= toUnixTimestamp(subtractHours(now(), 24)) * 1000 line. Note: If no results are returned, remove the eventid check

Answer
SELECT _time_observed, _hostname, eventid, data
FROM endpoint_windows_event_log
WHERE
  _time_observed >= toUnixTimestamp(subtractHours(now(), 24)) * 1000
  AND
  eventid = '4624'
LIMIT 20
Using the LIKE operator

LIKE can be used where a partial match is needed. Using a % on either (or both) sides of the matching condition can be used to find this within strings. For example, name LIKE 'bat%' will return batman and batmobile but not acrobat.

Write a query that returns a list of the _hostname, name,command and parent name from the endpoint_process table, for 20 entries containing ‘dll’ somewhere in the name, ordered alphabetically by the hostname in the last 24 hours.

Hint - operator The LIKE operator used on the `name` column with '%dll%' would be useful here.
Answer
SELECT _hostname, _platform, name, command, parent_name
FROM endpoint_process
WHERE
  _time_observed >= toUnixTimestamp(subtractHours(now(), 24)) * 1000
  AND
  name LIKE '%dll%'
ORDER BY _hostname asc
LIMIT 20

Next, modify the query so it returns a distinct list of hostname, process names and parent_names in the past 12 hours, ordered alphabetically by hostname, for records containing dll in the process name

Answer
SELECT distinct _hostname, _platform, name, command, parent_name -- add distinct
FROM endpoint_process
WHERE
  _time_observed >= toUnixTimestamp(subtractHours(now(), 12)) * 1000 --change to 12h
  AND
  name LIKE '%dll%'
ORDER BY _hostname asc
LIMIT 20
Using the NOT operator

NOT LIKE can be used to exclude results. We will next use this alongside other conditions to form a more complex query.

Now, we'll look through the endpoint_program table to find specific Microsoft-published programs, but not with .NET or C++ in the package name.

Adjust the query to look at the ‘endpoint_program’ table , returning a distinct list of the hostnames, and name where the publisher name contains the string ‘Micro’ but not ‘.NET’ and not ‘C++’ in the name in the past 24 hours. Limit this to the first 100 results. Do not select the 'publisher' for display - but still use this column in the where clause for filtering.

Answer
SELECT distinct _hostname, name
FROM endpoint_program
WHERE
  _time_observed >= toUnixTimestamp(subtractHours(now(), 24)) * 1000
  AND
  publisher LIKE '%Micro%'
  AND
  name NOT LIKE '%.NET%'
  AND
  name NOT LIKE '%C++%'
ORDER BY _hostname asc
LIMIT 100

2b. Case sensitivity

SQL databases are case sensitive, however we can make it return all results (regardless of case) by using LOWER or UPPER , followed in brackets by the column name EG lower(name) = ‘text in lower case'.

Write a query that returns the hostname, name, and parent_name for records that contain ‘excel’ in the name in the past 3 days from the endpoint_program table. Notice that there aren’t any (or few)

Answer
SELECT _hostname, name, parent_name
FROM endpoint_process
WHERE
  _time_observed >= toUnixTimestamp(subtractHours(now(), (24*3))) * 1000 -- changed to 3 days
  AND
  name LIKE '%excel%'
LIMIT 20

Now, use lower on the name column to ignore case sensitivity.

Hint You'll need to use `lower (name)`
Answer
SELECT _hostname, name, parent_name
FROM endpoint_process
WHERE
  _time_observed >= toUnixTimestamp(subtractHours(now(), (24*3))) * 1000
  AND
  lower(name) LIKE '%excel%' --added lower
LIMIT 20

You should notice we now get results for EXCEL as well as excel now, too.


2c. Mathematical operations in the SELECT clause

Mathematical operations can be performed in the WHERE clause, which is the most common place they will be used. However, they can be used in certain situations within the SELECT clause when dealing with data of a numerical type.

Write a query that looks back 24 hours in the network flow table , with the time , hostname, destination host, application protocols and data downloaded and uploaded. Order by the most data received, converting the bytes into megabytes (use the \ function to divide). Give all the columns friendly names, and set a limit of 50 results

Answer ```sql SELECT _time_observed AS "Time Observed", _hostname AS "Hostname", dest_hostname AS "Destination Hostname", application_protos AS "Application Protocols", source_bytes_sent/1000000 AS "Source MB sent", dest_bytes_sent/1000000 AS "Destination MB sent" FROM network_flow WHERE _time_observed >= toUnixTimestamp(subtractHours(now(), 24)) * 1000 ORDER BY dest_bytes_sent desc LIMIT 50 ```

Write a query which shows a list of distinct USB devices, also with the host, platform, hardware model, partitions, type, and the capacity (converted to MB) - order by this column, largest disk first. You'll need to find the table to use in Table Reference.

Hint - which table to use The table that would help us here is `endpoint_disk_info`
Hint - which column to use for distinct The column `_hardware_id` would be a good one to use for distinct values.
Answer ```sql SELECT DISTINCT _hardware_id, _hostname, _platform, hardware_model partitions, type, disk_size/1000000 AS "Disk Size (MB)" FROM endpoint_disk_info WHERE _time_observed >= toUnixTimestamp(subtractHours(now(), 24)) * 1000 ORDER BY disk_size desc ```

Great! Now you can move onto Part 3 of the Hunt Lab training


Part 3 - coming soon!