
Data collection – concurrent searches
If you are building a new Splunk deployment for your company, you and your user communities may only be able to roughly estimate how many ad hoc and scheduled searches (for reports, dashboards, and alerts) to expect, so this factor can be difficult to pin down. It helps to understand the types of searches that Splunk supports. We will cover searches more thoroughly in Chapter 6, Searching with Splunk. To estimate the number of concurrent searches, however, you only need to know the basic search types so that you can discuss them with your user community:
- Ad hoc search: These are searches run interactively by a user from Splunk Web for troubleshooting, one-time investigative reports, and so on.
- Scheduled historical search: These are searches from a scheduled report or alert, or from a dashboard that periodically refreshes its panels, and they run against data that has already been indexed. They will likely represent the vast majority of the searches on your system.
- Real-time search: These are searches that run against events as they are received for indexing, typically for time-sensitive monitoring. A real-time search can run against events before they are indexed, or against events just after they are indexed, with a slight delay. The number of real-time searches running concurrently can greatly affect indexing performance. For this reason, by default only users with the admin role can run and save them (this ability can also be assigned to specific users or roles), and it is best to limit their use.
- Summary indexing search: This is a frequently run search that extracts information of interest (for example, specific fields from each event) and saves the results into a designated summary index. You can then run searches and reports over longer periods against this much smaller summary index, rather than against all of the full-sized events for the same period, for better performance.
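To make the idea behind summary indexing concrete, here is a toy Python model of what such a search accomplishes. It is only a sketch, not how Splunk stores summaries: the raw events are hypothetical (timestamp, host, status) tuples, and the hypothetical function summarize_hourly() rolls them up into hourly counts per status code, so that a later report scans a handful of summary rows instead of every raw event.

```python
from collections import Counter
from datetime import datetime, timedelta


def summarize_hourly(events):
    """Roll raw events up into (hour, status) -> count rows.

    This mimics the kind of compact result a summary-indexing search
    might save; the event format is a hypothetical illustration.
    """
    summary = Counter()
    for ts, _host, status in events:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        summary[(hour, status)] += 1
    return summary


def total_by_status(summary, status):
    """Answer a longer-period report from the summary rows alone."""
    return sum(n for (_hour, s), n in summary.items() if s == status)


# Hypothetical sample data: one event per minute for three hours,
# where every 10th event is an HTTP 500 error.
start = datetime(2024, 1, 1, 0, 0)
raw_events = [
    (start + timedelta(minutes=i), "web01", 500 if i % 10 == 0 else 200)
    for i in range(180)
]

summary = summarize_hourly(raw_events)
print(len(raw_events), "raw events ->", len(summary), "summary rows")
print("500s:", total_by_status(summary, 500))
```

Running this prints `180 raw events -> 6 summary rows` and `500s: 18`. The report gets the same answer from 6 rows instead of 180 events, and that reduction grows with the length of the reporting period.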
Again, the number of expected concurrent searches is usually difficult to determine because there are so many variables, and there are no reliable rules of thumb for estimating it. It is a good idea to sit down with each of your user communities and discuss the types and number of reporting products they may want from the Splunk deployment. Fig 2.2 depicts an example of a Splunk reporting products document that you may want to prepare to aid these discussions and to let your user base provide useful planning feedback:
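Once you have an inventory like this, one rough way to turn it into a number is Little's law: the average number of searches in flight equals the rate at which searches start multiplied by how long each one runs. The following Python sketch applies that idea. The ScheduledReport class, the average_concurrency() function, and all of the sample schedules and run times are hypothetical. The result is an average only: peaks will be higher, because scheduled searches tend to cluster (for example, at the top of the hour).

```python
from dataclasses import dataclass


@dataclass
class ScheduledReport:
    name: str
    interval_s: float  # how often the report, alert, or panel runs
    runtime_s: float   # estimated run time of one execution (< interval_s)


def average_concurrency(reports, adhoc_users=0, adhoc_per_user_per_hour=0.0,
                        adhoc_runtime_s=0.0):
    """Estimate average concurrent searches using Little's law (L = rate * time).

    Each scheduled report contributes runtime / interval; ad hoc searches
    contribute (users * searches per user per second) * average runtime.
    """
    scheduled = sum(r.runtime_s / r.interval_s for r in reports)
    adhoc_rate_per_s = adhoc_users * adhoc_per_user_per_hour / 3600.0
    return scheduled + adhoc_rate_per_s * adhoc_runtime_s


# Hypothetical entries from a reporting products document.
reports = [
    ScheduledReport("firewall denies alert", interval_s=300, runtime_s=30),
    ScheduledReport("nightly capacity report", interval_s=86400, runtime_s=600),
    ScheduledReport("NOC dashboard panel", interval_s=60, runtime_s=15),
]

estimate = average_concurrency(reports, adhoc_users=20,
                               adhoc_per_user_per_hour=4, adhoc_runtime_s=45)
print(f"average concurrent searches: {estimate:.2f}")
```

With these made-up inputs the estimate is about 1.36 concurrent searches on average. The value of the exercise is less the number itself than seeing which reports dominate it; here the one-minute dashboard panel alone contributes 0.25.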
Don't let this part of the data collection process discourage you. Assuming you plan to implement a reasonable amount of search capability to start with, and your user community doesn't want to run an exceptionally high number of concurrent searches, this part of the process is not nearly as critical as determining the volume of data you expect to ingest. As a general rule, you will need to add more indexers before you run out of processing capability in your search heads. If you are expanding an existing Splunk environment, you can run reports that measure the number of searches by type and compare them to the number of users and data sources. This establishes a user-to-concurrent-search ratio for better capacity planning and management.
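As an illustration of the measurement step, the following Python sketch computes the peak number of concurrent searches per search type and overall, then derives a user-to-concurrent-search ratio. It assumes you have already exported one record per completed search with its type and its start and end times (for example, from Splunk's own audit data). The record layout, the function names, and the sample values are all hypothetical. Peak concurrency is found with a standard sweep-line pass over start and end points.

```python
from collections import defaultdict


def peak_concurrency(intervals):
    """Return the maximum number of overlapping [start, end) intervals."""
    points = []
    for start, end in intervals:
        points.append((start, 1))   # a search starts
        points.append((end, -1))    # a search finishes
    # Tuples sort by time, then by delta, so at equal timestamps ends (-1)
    # are processed before starts (+1); back-to-back searches don't overlap.
    points.sort()
    current = peak = 0
    for _t, delta in points:
        current += delta
        peak = max(peak, current)
    return peak


def peak_by_type(records):
    """Group (search_type, start, end) records by type and find each peak."""
    by_type = defaultdict(list)
    for search_type, start, end in records:
        by_type[search_type].append((start, end))
    return {t: peak_concurrency(iv) for t, iv in by_type.items()}


# Hypothetical search records: (type, start second, end second).
records = [
    ("adhoc", 0, 40), ("adhoc", 10, 30),
    ("scheduled", 0, 60), ("scheduled", 0, 20), ("scheduled", 60, 90),
    ("realtime", 0, 120),
]
active_users = 25  # hypothetical count of users active in the same window

overall_peak = peak_concurrency([(s, e) for _t, s, e in records])
ratio = active_users / overall_peak
print(peak_by_type(records))
print("overall peak:", overall_peak, "| users per concurrent search:", ratio)
```

For this sample, the per-type peaks are 2 ad hoc, 2 scheduled, and 1 real-time, while the overall peak is 5, giving 5.0 users per concurrent search. Note that the overall peak is not simply the sum of the per-type peaks, because those peaks need not occur at the same moment. That is why it is worth measuring both.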
Before we dive into using the information we've just collected to choose Splunk hardware options, there are a few more topics to cover. By the way, in the discussion and examples of implementing Splunk in this book, we assume the use of a Splunk Enterprise license, since that is the scenario you are most likely to work with eventually.