Oracle Exadatas have been around for a long time, but really understanding how they work can put you and your databases in a much better position. For instance, what is fundamentally different about a Smart Scan compared to a Buffered Read, and why is it considered superior? Let’s get into it.
An Exadata Smart Scan is a Full Segment Scan that is offloaded to, and executed by, the Storage Cells. The reason Smart Scans can be so much better than regular Buffered Reads is apparent when you compare the two processes:
Serial, Buffered Read (Regular Database Read / Not a Smart Scan)
The Oracle database goes through a basic process to retrieve data that, at a high level, includes the following:
- A user runs a query that executes a Full Segment Scan
- Oracle software looks in the Buffer Cache for the appropriate blocks of data
- If the requested blocks are not in the Buffer Cache, the Oracle software requests them from storage, and they are then copied into the Buffer Cache
In other words, the database only scans blocks that are in the Buffer Cache; anything not already there must first be read from storage into it. This is true most of the time. In the above description, “Serial” just means non-parallel. In this process, one CPU on the Compute Node is busy scanning blocks of data in the Buffer Cache and then waiting for more blocks to be copied into memory (the Read / Wait Cycle).
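The read path above can be sketched as a toy Python model. It is not Oracle code: `buffered_full_scan`, the dict standing in for the Buffer Cache, and the dict standing in for storage are hypothetical names chosen for illustration. The point it shows is that a single loop (one CPU) both waits for missing blocks and does all of the filtering itself.

```python
def buffered_full_scan(block_ids, buffer_cache, storage, predicate):
    """Toy serial buffered scan: check the cache, read misses from storage,
    copy them into the cache, then filter rows on the compute node."""
    rows_out = []
    physical_reads = 0
    for block_id in block_ids:
        block = buffer_cache.get(block_id)
        if block is None:
            # Not cached: fetch from "storage" (stands in for the IO wait
            # in the Read / Wait Cycle) and copy the whole block into the cache.
            block = storage[block_id]
            buffer_cache[block_id] = block
            physical_reads += 1
        # The same single process filters every row of every block.
        rows_out.extend(row for row in block if predicate(row))
    return rows_out, physical_reads
```

For example, `buffered_full_scan([0, 1], {0: [1, 2, 3]}, {0: [1, 2, 3], 1: [4, 5, 6]}, lambda r: r % 2 == 0)` returns `([2, 4, 6], 1)`: block 0 was already cached, while block 1 cost one physical read.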
The negatives of this process?
- Only one CPU will be utilized for scanning and filtering blocks of data from the Buffer Cache
- The Buffer Cache is only so big and is shared by ALL sessions on the database. If your Segment is really big and doesn’t fit in the Buffer Cache, it will be re-read from disk every time it is scanned.
- When lots of users run queries that access the same large Segments, there will be lots of waits for IO, one of the slowest things the database does.
- Lots of IO means that the network between the Compute Nodes and Storage Cells may be very busy if not saturated.
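To see why an oversized segment keeps going back to disk, here is a toy simulation assuming a plain LRU cache. Oracle’s real Buffer Cache uses its own replacement scheme rather than plain LRU, so treat this only as an illustration of the re-read effect; `physical_reads_for_repeated_scans` is a hypothetical helper.

```python
from collections import OrderedDict

def physical_reads_for_repeated_scans(segment_blocks, cache_blocks, scans):
    """Count storage reads when a segment of `segment_blocks` blocks is fully
    scanned `scans` times through a plain LRU cache of `cache_blocks` blocks."""
    cache = OrderedDict()
    reads = 0
    for _ in range(scans):
        for block_id in range(segment_blocks):
            if block_id in cache:
                cache.move_to_end(block_id)    # hit: served from memory
            else:
                reads += 1                     # miss: read from storage
                cache[block_id] = True
                if len(cache) > cache_blocks:
                    cache.popitem(last=False)  # evict least recently used block
    return reads
```

With a 100-block cache and three scans, an 80-block segment costs 80 reads (only the first scan touches disk), while a 120-block segment costs 360: each sequential scan evicts the blocks it will need next, so every block is re-read on every scan.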
CPU Count Used = 1 (because only one Compute Node CPU scans and filters the data)
Exadata Smart Scan
The same serial query on the Exadata platform looks like this:
- A user runs a query that executes a Full Segment Scan
- The Compute Node sends a request to the Storage Cells asking them to scan all the blocks of the segment
- All of the Storage Cells scan their portions of the requested segment simultaneously. This is possible because ASM spreads the data across the Storage Cells, so each Storage Cell holds a portion of the data in your Database. Because the Exadata storage software (cellsrv) is multi-threaded, it can use all of the cores on each Storage Cell at the same time.
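A rough Python sketch of the offload idea: the segment’s blocks are spread over the cells, every cell scans and filters its own portion concurrently, and the Compute Node only merges what comes back. The round-robin `stripe_blocks` placement is a simplifying assumption (ASM distributes allocation units across disks, not individual blocks round-robin), and all function names are hypothetical. Python threads stand in for the cells; because of the GIL they illustrate concurrency, not a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

def stripe_blocks(block_ids, num_cells):
    """Hypothetical round-robin placement: the i-th block goes to cell i % num_cells."""
    cells = [[] for _ in range(num_cells)]
    for i, block_id in enumerate(block_ids):
        cells[i % num_cells].append(block_id)
    return cells

def cell_scan(cell_blocks, storage, predicate):
    """Work done 'on the cell': read only local blocks, keep only matching rows."""
    return [row for block_id in cell_blocks
            for row in storage[block_id] if predicate(row)]

def smart_scan(block_ids, storage, predicate, num_cells=3):
    portions = stripe_blocks(block_ids, num_cells)
    # Each thread stands in for one Storage Cell scanning its portion concurrently.
    with ThreadPoolExecutor(max_workers=num_cells) as pool:
        per_cell_rows = pool.map(
            lambda blocks: cell_scan(blocks, storage, predicate), portions)
        # The Compute Node only merges the already-filtered rows sent back.
        return [row for rows in per_cell_rows for row in rows]
```

With six blocks and three cells, `stripe_blocks(list(range(6)), 3)` gives `[[0, 3], [1, 4], [2, 5]]`, so each cell handles a third of the segment and no unfiltered block ever reaches the Compute Node.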
The key difference between the Buffered Read and the Exadata Smart Scan is that the Storage Cells scan and filter the data themselves instead of pushing every block over the network. Since the data lives on the Storage Cell and cellsrv can scan and filter it there, the IO process becomes significantly more efficient.
Once the Storage Cell has filtered the data, only the rows and/or columns required to satisfy the query are extracted from the data blocks and sent back to the Compute Node. Hence, the IO is much faster (it happens locally on each Storage Cell), and the amount of data passed back from the Storage Cells to the Compute Node can be dramatically reduced.
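The effect of row filtering and column projection on network traffic can be estimated with a back-of-the-envelope sketch. It assumes Oracle’s 8 KB default block size and hypothetical fixed column widths; the function names are illustrative, not part of any Oracle API.

```python
BLOCK_SIZE = 8192  # bytes; 8 KB is Oracle's default DB_BLOCK_SIZE

def bytes_shipped_buffered(num_blocks):
    # Buffered Read: whole blocks cross the network; filtering happens later.
    return num_blocks * BLOCK_SIZE

def cell_filter_and_project(rows, predicate, columns):
    # Smart Scan: keep only matching rows, and only the columns the query needs.
    return [{col: row[col] for col in columns} for row in rows if predicate(row)]

def bytes_shipped_smart(projected_rows, column_widths):
    # Assumed fixed width (bytes) per column, just to size the returned payload.
    return sum(column_widths[col] for row in projected_rows for col in row)
```

For instance, take 1,000 rows where one in ten has `region == "TX"`, and a query that needs only `id` and `amount` (8 bytes each, assumed). The cells return 100 projected rows, about 1,600 bytes. If those rows occupy 25 blocks (also an assumption), a Buffered Read moves 204,800 bytes to get the same answer.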
Remember that the Exadata Compute Node is still running a serial query. Involving the Storage Cells adds an inherent level of parallelism equal to the number of Storage Cells multiplied by the number of cores per Storage Cell.
For example, on an Exadata X9-2 Quarter Rack:
CPU Count Used = Compute Node CPU + (Storage Cells * CPU per Storage Cell)
CPU Count Used = 1 + (3 * 32) = 97
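The formula above can be wrapped in a small helper so other configurations are easy to plug in. `cpu_count_used` is a hypothetical name; the 3 cells and 32 cores per cell are the X9-2 Quarter Rack figures from the example.

```python
def cpu_count_used(compute_node_cpus, storage_cells, cores_per_cell):
    """CPU Count Used = Compute Node CPU + (Storage Cells * CPU per Storage Cell)."""
    return compute_node_cpus + storage_cells * cores_per_cell
```

`cpu_count_used(1, 3, 32)` gives 97 for the Smart Scan case, while the Buffered Read case, with no cells scanning, is `cpu_count_used(1, 0, 0)`, which gives 1.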
This should be enough to convince you of the superiority of Exadata Smart Scans. Still, seeing it in action is often the best way to believe it and to understand it.
If you don’t have the tools and Exadata machines to test these theories yourself, contact INFOLOB. Our Technology Lab in Irving, TX, is equipped with two Exadata X6-2 Quarter Racks and an X5-2 ZDLRA. Please feel free to visit.