Learn how to tune Oracle 12c SQL with histograms! In this free tutorial, Oracle Certified Master DBA John Watson demonstrates what histograms do (provide correct cardinality), the difference between histogram types (Frequency and Height Balanced). You will also learn the importance of the auto sample size algorithm in 12c and the new "Hybrid" type histogram.
>> John: Today I want to talk about another major enhancement in database release 12C. That's the new options for creating histograms.
Just as a quick review, I hope we are all aware that getting decent SQL performance is all about getting accurate cardinality estimates. It's all about the cardinality. How does the cost-based optimizer generate cardinality estimates? It uses histograms. It uses histograms, getting it the distribution of data within tables. Accurate statistics are vital if the optimizer is to come up with decent execution plans. If the cardinality estimates are wrong, you are going to get inappropriate join orders, inappropriate join methods, inappropriate access methods. I strongly believe that nine out of ten of your fixable performance problems can be fixed by giving Oracle accurate statistics.
What are improvements with release 12? Firstly, you can have many more buckets. In all other releases, the maximum number of buckets for histogram is only 254. Not meant to take more than 254 distinct values in a column, the estimates are pretty imprecise. Oracle have to generate what was called the height-balanced histogram, which was a pretty crude technique.
In 12, firstly the number of buckets can go up to over 2000. That would give you an absolute and perfect estimate. Perhaps more importantly though for tables where there are more than 2000 distinct values in a column have new histogram types. But one way or another, the optimizer will now have much more accurate cardinality estimates.
How do you implement this functionality? Enable the auto stats task. Always make sure you're analyzing on a regular basis. Certainly join individual tables and columns consider raising the number of buckets from default. Most importantly, do not interfere with the sample size algorithm that the gathering statistics process uses. If you interfere with the sample size, you will not get the new histogram types.
Let's take a look at the 12c new features and see how well they work. I'll create a table.
Create table t1 as select * from all_objects
And see what statistics I have by default.
Select column_name, num_distinct, num_nulls, histogram from user_tab_col_statistics where tab_name = 'T1'
We get some useful information. Those of you who are familiar with 12 should be surprised already. In earlier release, there would've been nothing. But in release 12 - and I create a table with the syntax - some basic statistics are automatically generated during the table creation. But histograms, no. I've got no histograms created. For that I need to explicitly request them.
I'll gather statistics in a way that many DBAs will do.
I'm telling Oracle to look at the entire table and that should give it as accurate statistics as possible to get. I'm also requesting histograms on all columns that need them, for all columns size skew only. There's no point in building a histogram on a column if it has no skew.
What histograms do I have now? Frequency and height-balanced. A frequency histogram means that as long as the number of distinct values is less than the number of buckets, the histogram will be perfect. The default number of buckets and the maximum in earlier releases was 254. So if I have less than 254 values, I'm going to get perfect estimates. High-balanced histograms, that's where there are more than 254 distinct values. Oracle has to combine several values into each bucket of histogram. That's not going to be nearly as accurate.
How accurate is it? Enable autotrace facility and run a simple query. My query
Select count (*) from t1 where created > trunc(sysdate)
And that the created column has 1600 values. Therefore, a high-balanced histogram.
We get back 95 rows and here's the bad news. Oracle expected only 56 rows. Of course I just have a very small demonstration data set here. But this sort of inaccuracy of about 40 rows, [5:15 inaudible] is up to the real world and isn't 40 rows. It's 40,000 rows. This sort of mistake can totally distort your execution plans.
Let's just see what we can do with the new features. I'm going to gather the statistics again but this time I'm going to remove the estimate percent argument.
exec dbms_stats, gather_table_stats(user, 'T1', method_opt=> 'for all columns size skewonly')
Allow Oracle to use the auto sample size algorithm as it does the analysis.
Having regathered the stats, let's just see how accurate the estimates are now. Enable the autotrace and run my query.
Much better, isnt it? And I think Oracle use auto sample size, we now find the estimate is much close to reality. Why has that happened? If we look at the histogram types that had been generated, we'll see the answer.
We now have hybrid histograms rather than high-balanced histograms. The hybrid histogram algorithm is far more accurate. Notice it's imperfect but clearly, we're a lot closer to reality than we were before.
Believe me, when you have real data sets, with sensible volumes of data, the estimates you get through hybrid histograms are very nearly perfect. But what if very nearly perfect still isnt good enough for you?
There is an alternative. You can take control. As an example,
I've chosen the one column, I'm using the predicate which has about 1600 distinct values. And the size - building histogram of that with 1024 buckets. That's four times as many as possible with the previous release.
Now let's see how accurate the histogram is. Perfect. The estimate is now spot-on with reality. It doesn't get better than that.
What is the takeaway from this? 12c has new histogram types. Hybrid histograms and top frequency histograms. I havent demonstrated the top frequency histogram today. Maybe we'll do that another time. These two new histogram types are far more accurate than the previous high-balanced histograms. However, you must use auto sample size when you gather statistics. If you interfere with the auto sample size algorithm, as I demonstrated, you do not get the new histogram types.
Second major feature, the maximum number of buckets you could have, you will no longer contrain to frequency histograms of only 254 buckets. You can now go to 2048 buckets. That's going to be a huge advantage for many sites.
The conclusion? You will get when you upgrade, more accurate cardinality estimates, much better performing SQL. This should be a major driver for the 12c upgrade.