Configuring Data Sources
A data source represents a log source. It can be a CRM, an ERP, Active Directory, Exchange, a core banking system, or anything else that qualifies as a distinct log source.
Data sources are created and configured from the Data sources menu. Each data source has the following properties and features:
- ID - an auto-generated ID used to designate logs for a specific data source through the `dataSourceId` property in the LogSentinel Collector configuration (or the `Application-Id` header for API calls)
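As a sketch of how the `Application-Id` header is used, here is a minimal API call built with Python's standard library. The endpoint URL and the ID value are placeholders, not real ones; only the header name comes from this page:

```python
import json
import urllib.request

# Placeholder values -- substitute the auto-generated ID from the Data
# sources menu and the API URL of your installation.
APPLICATION_ID = "0a1b2c3d-4e5f-6789-abcd-ef0123456789"
URL = "https://logsentinel.example.com/api/log"

body = json.dumps({"action": "LOGIN", "actorId": "user@example.com"}).encode()
req = urllib.request.Request(
    URL,
    data=body,
    headers={
        "Application-Id": APPLICATION_ID,  # designates the target data source
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would actually send it; omitted here.
```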
- Name - a human-readable name for the data source, e.g. "ERP", "CRM", "Active Directory"
- Description - a human-readable extended description of the data source
- Source category - one of a number of predefined source categories, useful for analysis and filtering
- Data source group - one or several data source groups to which the data source belongs; groups are dynamically defined in the "Data source groups" page.
- Retention period - number of days for which the logs from this source are retained in searchable state
- Default alert destination - where default alerts for this data source are sent (e.g. in case of log-level-based alerts or threat intelligence matches)
- Risk level - risk level (from 0 to 100) for alerts originating from this data source. The number is part of the risk score of each alert (taking into account rule configurations as well)
- Paths - upon receiving a log entry, the system can extract relevant information from the body. This is useful when the Collector did not do the preprocessing, or when the user does not have full control over what gets sent but would like to have the actor, action and other parameters extracted, as well as custom tags. Supported extraction schemes are XPath, JSON Path and (Java-style) Regex. For XPath and JSON Path, multiple comma-separated paths may be entered; for Regex, only a single expression may be entered. Note, though, that a disjunction of several regular expressions is itself a regular expression, so in practice you may enter multiple |-separated ones. If the Regex contains a group named 'result' (specified by (?<result>...)), the extracted value is the text matched by that group; otherwise it is the text matched by the whole Regex.
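The 'result' group behaviour can be illustrated in Python (where the named-group syntax is (?P<result>...), corresponding to the Java-style (?<result>...) used by the platform); the log line and pattern below are made up:

```python
import re

# Java-style (?<result>...) corresponds to Python's (?P<result>...).
pattern = re.compile(r"failed login for user (?P<result>\w+)")

def extract(regex, text):
    """Return the 'result' group if present, otherwise the whole match."""
    m = regex.search(text)
    if m is None:
        return None
    return m.groupdict().get("result") or m.group(0)

print(extract(pattern, "2024-01-01 failed login for user alice from 10.0.0.1"))
# -> alice
```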
- Group identical logs - allows multiple identical logs to be bundled together into a single raw log with `params.count` set to the number of aggregated logs. By default the logs must be fully identical, but a custom identity definition can be specified by listing the params to match.
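The default (full-identity) bundling amounts to counting duplicates, roughly like this sketch (log messages are made up, and the real entry structure is richer than shown):

```python
from collections import Counter

# Identical messages collapse into one entry, with params["count"]
# holding the number of occurrences.
logs = ["disk full", "disk full", "auth failed", "disk full"]

bundled = [
    {"message": msg, "params": {"count": n}}
    for msg, n in Counter(logs).items()
]
```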
- Data masking patterns - masks all matches of the specified Regex patterns. There are built-in patterns for IBANs, SSNs and credit card numbers.
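For illustration, a masking pattern of the credit-card kind might look as follows; this regex is an example of the technique, not one of the platform's built-in patterns:

```python
import re

# Matches 16-digit card numbers in groups of four, with optional
# space or dash separators (illustrative, not the built-in pattern).
CARD = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")

def mask(text):
    return CARD.sub("****", text)

print(mask("paid with 4111 1111 1111 1111 yesterday"))
# -> paid with **** yesterday
```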
- Regex-based log levels - an error, fatal or critical log level can be set on each entry based on a matching regex.
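Conceptually, the level assignment is a first-match lookup over the configured regexes; the rules below are invented for illustration:

```python
import re

# Level rules (illustrative): the first matching pattern sets the level.
level_rules = [
    ("fatal", re.compile(r"OutOfMemoryError|kernel panic")),
    ("error", re.compile(r"\bERROR\b|exception", re.IGNORECASE)),
]

def classify(message, default="info"):
    for level, rx in level_rules:
        if rx.search(message):
            return level
    return default

print(classify("Unhandled exception in worker"))  # error
```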
- Discarded logs - a list of patterns for discarding uninteresting logs can be configured. The patterns are specified as key=value pairs, where the key is the param name and the value is a regex that must be matched for the log to be ignored.
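A minimal sketch of such discard rules, with made-up param names and regexes:

```python
import re

# Discard rules as key=value pairs: param name -> regex that, when
# matched, causes the log to be ignored (names here are illustrative).
discard_rules = {"eventType": r"HEALTH_?CHECK", "actor": r"monitoring-.*"}

def is_discarded(params):
    return any(
        key in params and re.search(rx, params[key])
        for key, rx in discard_rules.items()
    )

print(is_discarded({"eventType": "HEALTHCHECK"}))  # True
print(is_discarded({"eventType": "LOGIN"}))        # False
```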
Data is typically normalized in the collector, according to an adapted Elastic Common Schema. In the extraction screen, parameters from the ECS can be configured for extraction as well.
- Displayed details fields - a comma-separated list of fields that should be displayed on the dashboard for JSON bodies. This is useful for noisy log messages that contain less useful information. If you specify a whitelist of fields, all other fields are still stored and searchable, but not displayed by default.
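The whitelist behaviour amounts to filtering the rendered body, roughly as in this sketch (field names are made up):

```python
# Comma-separated whitelist, as entered in the configuration field.
displayed_fields = "event.action,source.ip".split(",")

body = {"event.action": "login", "source.ip": "10.0.0.1", "trace.id": "abc123"}

# Only whitelisted fields are rendered on the dashboard; the rest
# remain stored and searchable.
shown = {field: body[field] for field in displayed_fields if field in body}
```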
- Generate Merkle tree - specifies whether a Merkle tree should be regularly built from the log entries of this source. Merkle tree proofs are discussed on the verification page.
- Generate hash chain - specifies whether logs in this data source are chained to form a hash chain (timestamped blocks of incoming data are also chained). This allows for granular integrity guarantees.
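The two integrity structures can be sketched in a few lines; this is a generic illustration of the technique, not LogSentinel's actual chaining format (the genesis value, block structure and hashing details are assumptions):

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

# Hash chain: each link covers the previous hash, so altering any
# entry breaks every subsequent link.
def hash_chain(entries):
    h = b"\x00" * 32  # assumed genesis value
    for e in entries:
        h = sha256(h + e)
    return h

# Merkle root: pairwise hashing up to a single root; a proof for one
# leaf needs only log2(n) sibling hashes.
def merkle_root(leaves):
    level = [sha256(l) for l in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [sha256(a + b) for a, b in zip(level[::2], level[1::2])]
    return level[0]

entries = [b"log1", b"log2", b"log3"]
```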
- Anchor hashes externally - if the subscription plan allows it (or if an Ethereum private key is configured for on-premise installations), this turns on periodic anchoring of the latest hashes and Merkle roots to Ethereum. There's an option to manually push the latest ones in addition to the periodic triggers.
- Verification periods - how often full and partial background verifications should run. If you have a lot of data, it is recommended that full verifications run rarely, as they put significant load on the system (in case of on-premise setups).
- List of recipients of verification reports - a list of emails that receive verification reports for this data source
- Hash recipients - regularly sends hash updates to these emails to enable future verification in case of tampering attempts. These should be service inboxes rather than a real person's email address, as they will potentially be noisy.
- Public keys - a list of public keys used to verify log signatures if such are supplied by the client. Clients can sign each log entry with a private key and provide the public one(s) for verification.
- Warn log level - whether admins should be warned in case a log level above a certain threshold is received
- IP Whitelist - a whitelist of IPs that are allowed to send logs for this data source
- Disabled threat intelligence feeds - select threat intel sources to disable for this data source
- Opendata - on-premise installations have support for opendata. This is usually applicable to public sector customers that have legal transparency obligations. It allows making all data for a data source publicly accessible and automatically exported as JSON. The options here include a whitelist of regexes designating which entries should be displayed unanonymized, particular fields (JSON Path or XPath) to be anonymized in the body, and regexes for anonymizing content.