The Purpose of COLLECT STATISTICS is to gather and store demographic data for one or more columns or indices of a table or join index.In this process it collects data and stores the summary in the Data Dictionary (DD) inside USER DBC.

The optimizer uses this synopsis data to generate efficient table access and join plans.

Below are the statistics will collect

Do NOT COLLECT Statistics on all columns and indexes because it takes too much space.

COLLECT STATISTICS on COLUMN ;

COLLECT STATISTICS on INDEX ();

Examples for a better Understanding

COLLECT STATISTICS on Emp_Table ;

COLLECT STATISTICS on Emp_Table COLUMN Dept_no ;

COLLECT STATISTICS on Emp_Table COLUMN(Emp_no, Dept_no);

COLLECT STATISTICS on Emp_Table INDEX Emp_no ;

COLLECT STATISTICS on Emp_Table INDEX (First_name, Last_name);

COLLECT STATISTICS COLUMN(First_Name, Last_Name)

1) There is a

Table-level statistics known as "summary statistics" are collected whenever column or index statistics are collected.

SHOW SUMMARY STATISTICS VALUES ON Employee_Table;

One critical advantage is that the optimizer now uses summary stats to get the most up-to-date row count from the table in order to provide more accurate extrapolations

2)

SAMPLE n PERCENT allows you to specify sampling at the individual statistics collection level rather than at the system level

COLLECT STATISTICS USING Sample 20 Percent COLUMN (Last_Name) ON Employee_Table;

3) Statistics are stored in DBC.StatsTbl to reduce access contention and improve performance.

4) New views DBC.StatsV, DBC.ColumnStatsV, DBC.MultiColumnStatsV, and IndexStatsV.

5) SHOW STATISTICS statement reports detailed statistics in plain text or XML formatting.

6)

Teradata only placed the first 16 bytes in the statistics before Version 14 but now the default is 25 bytes.

COLLECT STATISTICS USING MAXVALUELENGTH 50 COLUMN (Item_Name) ON Shipping_Table

The optimizer uses this synopsis data to generate efficient table access and join plans.

Below are the statistics will collect

- The number of rows in the table
- The average row size
- Information on all Indexes in which statistics were collected
- The range of values for the column(s) in which statistics were collected
- The number of rows per value for the column(s) in which statistics were collected
- The number of NULLs for the column(s) in which statistics were collected

**What to COLLECT STATISTICS On?**Do NOT COLLECT Statistics on all columns and indexes because it takes too much space.

- Primary Index of a Join Index
- Secondary Indexes defined on any join index
- Non-indexed columns used in joins
- The Unique Primary Index of small tables (less than 1,000 rows per AMP)
- All Non-Unique Primary Indexes and All Non-Unique Secondary Indexes
- Join index columns that frequently appear on any additional join index columns that frequently appear in WHERE search conditions
- The Unique Primary Index of small tables (less than 1,000 rows per AMP)
- Columns that frequently appear in WHERE search conditions or in the WHERE clause of joins

**Syntax for COLLECT STATISTICS**COLLECT STATISTICS on

COLLECT STATISTICS on

Examples for a better Understanding

COLLECT STATISTICS on Emp_Table ;

COLLECT STATISTICS on Emp_Table COLUMN Dept_no ;

COLLECT STATISTICS on Emp_Table COLUMN(Emp_no, Dept_no);

COLLECT STATISTICS on Emp_Table INDEX Emp_no ;

COLLECT STATISTICS on Emp_Table INDEX (First_name, Last_name);

The New Teradata V14 Way to Collect StatisticsThe New Teradata V14 Way to Collect Statistics

COLLECT STATISTICS COLUMN(First_Name, Last_Name)

,COLUMN(First_Name)

,COLUMN(Dept_No) ON Employee_Table;

With the latest approach, Only a single table scan is required instead multiple table scan like old approach.

Teradata keeps all Collect Statistics information in DBC in the Data Dictionary tables. Those are

,COLUMN(Dept_No) ON Employee_Table;

With the latest approach, Only a single table scan is required instead multiple table scan like old approach.

Teradata keeps all Collect Statistics information in DBC in the Data Dictionary tables. Those are

*DBC.Indexes (for multi-column indexes only)**DBC.TVFields (for all columns and single column indexes)**DBC.StatsTbl (Teradata V14 and beyond)*

**Collect Statistics on a PPI Table on the Partition**
COLLECT STATISTICS on Order_Table_PPI COLUMN PARTITION

The main reasons to collect stats on Partiton are

The main reasons to collect stats on Partiton are

- PE will give better plan for PPI tables
- It helps on Partition Elimination on Range Queries.
- Mainly helpful when a table has a lot of empty partitions.

**Teradata V14 Statistics Enhancements**1) There is a

*option to collect table-level statistics*

**SUMMARY**Table-level statistics known as "summary statistics" are collected whenever column or index statistics are collected.

SHOW SUMMARY STATISTICS VALUES ON Employee_Table;

One critical advantage is that the optimizer now uses summary stats to get the most up-to-date row count from the table in order to provide more accurate extrapolations

2)

*option allows the system to determine the sampled system percentage.*

**SYSTEM SAMPLE**SAMPLE n PERCENT allows you to specify sampling at the individual statistics collection level rather than at the system level

COLLECT STATISTICS USING Sample 20 Percent COLUMN (Last_Name) ON Employee_Table;

3) Statistics are stored in DBC.StatsTbl to reduce access contention and improve performance.

4) New views DBC.StatsV, DBC.ColumnStatsV, DBC.MultiColumnStatsV, and IndexStatsV.

5) SHOW STATISTICS statement reports detailed statistics in plain text or XML formatting.

6)

**MaxValueLength**Teradata only placed the first 16 bytes in the statistics before Version 14 but now the default is 25 bytes.

COLLECT STATISTICS USING MAXVALUELENGTH 50 COLUMN (Item_Name) ON Shipping_Table

7)

**MaxIntervals**
Whenever you collected statistics, Teradata did a full table scan on the values, sorted them, and then placed them into 200 intervals. Now, the default is 250 intervals

8)

COLLECT STATISTICS USING SYSTEM SAMPLE COLUMN (EXTRACT(MONTH FROM Order_Date), Order_Total) AS Month_Plus_Total ON Order_Table;

Also use Substr(), MOD, CONCATENATION, Format Conversion expressions like Uppercase, Lowercase, Data type Conversions, CASE-Expressions, BEGIN and END expressions of PERIOD types.

9)

Teradata V14.10 Autostats identifies and collects missing statistics needed and detects stale statistics for refreshing. If statistics are not being used by optimizer, they will be removed.

A new repository of a system supplied database named

10) External stored procedures (XSPs) perform the stats management process.

8)

*can now be specified in COLLECT STATS statements***Expressions**COLLECT STATISTICS USING SYSTEM SAMPLE COLUMN (EXTRACT(MONTH FROM Order_Date), Order_Total) AS Month_Plus_Total ON Order_Table;

Also use Substr(), MOD, CONCATENATION, Format Conversion expressions like Uppercase, Lowercase, Data type Conversions, CASE-Expressions, BEGIN and END expressions of PERIOD types.

9)

*feature***AutoStats**Teradata V14.10 Autostats identifies and collects missing statistics needed and detects stale statistics for refreshing. If statistics are not being used by optimizer, they will be removed.

A new repository of a system supplied database named

*stores metadata for all stats collections. This is created by a new process called***TDSTATS***.***DIPSTATS**10) External stored procedures (XSPs) perform the stats management process.

These are the latest enhancements of COLLECT STATISTICS.

COLLECT STATISTICS COLUMN(First_Name, Last_Name)

ReplyDelete,COLUMN(First_Name)

,COLUMN(Dept_No) ON Employee_Table;

.........................................

In this case it scans two times, Because deptno is not selected in first scan.

COLLECT STATISTICS COLUMN(First_Name, Last_Name)

,COLUMN(First_Name)

,COLUMN(Last_Name) ON Employee_Table;

In this case it go for single scan, because first_name & last_name data already available while collecting on the combination. Please correct me if I am wrong...

Murali Udayagiri

DWH Evangelist