Re-Sync issue on Exadata X5-2 after adding Automatic Storage Management Cluster File System (ACFS)

Had the joy of working on a new X5-2 with a 12c database recently. We added an ACFS mount, and suddenly ASM showed a long-running re-sync operation: days instead of the usual minutes. We asked around, and no one had seen this behavior. After a little searching on the Oracle Support pages we found the bug and workaround in this note: “ASM RESYNC TOOK VERY LONG FOR ONE OF THE DISKGROUP IN EXADATA (Doc ID 1957331.1)”. Fixed it!
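
For anyone watching a similar re-sync, this is roughly the kind of check we used; a minimal sketch run from the ASM instance (the disk group join is just for readable output):

select g.name diskgroup, o.inst_id, o.operation, o.state,
       o.sofar, o.est_work, o.est_minutes
  from gv$asm_operation o, v$asm_diskgroup g
 where o.group_number = g.group_number;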

A few notes from the docs: you may want to make sure all of your files/DBs are mounted and active before making this change (see the Oracle documentation: 12c ASM Docs).

We’re not sure if it was related, but before making this change, startup and shutdown times on some empty default databases seemed inappropriately long. Now that seems to be cleared up as well (hmmm).

Exadata Storage Index Aging – Part 3 – Analysis on Test 2B

Continued from previous post.

Based on the observations below, it looks as if there is more to offloading than a simple maximum of 8 columns in the storage index array. It seems that the 8-column parameter (_cell_storage_index_columns = 8) is a minimum for retention rather than a cap. If columns 1-10 have all been offloaded, but a post test still shows a similar yet different set of columns retained (look closely at 1, 19, 20, and 21), then the total number of columns must have risen above the 8 in the storage index array. I’d attribute this to the fact that the array items are created per 1 MB section of each segment storage region, yet the same percentage of offload occurs on the In-Line (IL) tests as on an earlier IL test.

This is good news for end users, meaning there is some elastic capacity in the storage index array above the 8 columns specified. It seems that although more than these specific columns were offloaded, once activity died down only 8 remained active. So 8 seems to be a minimum and not a maximum.

But I still need to understand why columns 1, 21, 20, and 19 seem to persist longer. Was it previous activity or something else in the data? Other possibilities?

—————————————————————————————————————————

Observations
Looking at the data from Test 2B, here are some summary observations of the changes:

Baseline Test:
8 offloaded columns retained from previous actions. Columns (in order of testing): 1, 7, 8, 9, 10, 19, 20, 21

Test Columns 1-10:
All ten columns get storage index offloads, but column 1 was offloaded on the first try and has not been aged out of the storage index array.

IL Test – All Columns – After test on columns 1-10
Offloads not aged out in order of column testing: 10, 9, 8, 7, 1, 21, 20, 19

Test Columns 2-11:
All ten columns get storage index offloads, all aged out (have to offload after 1st query)

IL Test – All Columns – After test on columns 2-11
Offloads not aged out in order of column testing: 11, 10, 9, 8, 1, 21, 20, 19

Test Columns 3-12:
All ten columns get storage index offloads, all aged out (have to offload after 1st query)

IL Test – All Columns – After test on columns 3-12
Offloads not aged out in order of column testing: 12, 11, 10, 9, 1, 21, 20, 19

Test Columns 4-13:
All ten columns get storage index offloads, all aged out (have to offload after 1st query)

IL Test – All Columns – After test on columns 4-13
Offloads not aged out in order of column testing: 13, 12, 11, 10, 1, 21, 20, 19

Test Columns 5-14:
All ten columns get storage index offloads, all aged out (have to offload after 1st query)

IL Test – All Columns – After test on columns 5-14
Offloads not aged out in order of column testing: 14, 13, 12, 11, 1, 21, 20, 19

Test Columns 6-15:
All ten columns get storage index offloads, all aged out (have to offload after 1st query)

IL Test – All Columns – After test on columns 6-15
Offloads not aged out in order of column testing: 15, 14, 13, 12, 1, 21, 20, 19

Test Columns 7-16:
All ten columns get storage index offloads, all aged out (have to offload after 1st query)

IL Test – All Columns – After test on columns 7-16
Offloads not aged out in order of column testing: 16, 15, 14, 13, 1, 21, 20, 19

Test Columns 8-17:
All ten columns get storage index offloads, all aged out (have to offload after 1st query)

IL Test – All Columns – After test on columns 8-17
Offloads not aged out in order of column testing: 17, 16, 15, 14, 1, 21, 20, 19

Test Columns 9-18:
All ten columns get storage index offloads, all aged out (have to offload after 1st query)

IL Test – All Columns – After test on columns 9-18
Offloads not aged out in order of column testing: 18, 17, 16, 15, 1, 21, 20, 19

Test Columns 10-19:
All ten columns get storage index offloads, all aged out (have to offload after 1st query), except 19

IL Test – All Columns – After test on columns 10-19
Offloads not aged out in order of column testing: 19, 18, 17, 16, 15, 1, 21, 20

Test Columns 11-20:
All ten columns get storage index offloads, all aged out (have to offload after 1st query), except 19, and 20

IL Test – All Columns – After test on columns 11-20
Offloads not aged out in order of column testing: 20, 19, 18, 17, 16, 15, 1, 21

Test Columns 12-21:
All ten columns get storage index offloads, all aged out (have to offload after 1st query), except 19, 20, and 21

IL Test – All Columns – After test on columns 12-21
Offloads not aged out in order of column testing: 21, 20, 19, 18, 17, 16, 15, 1

Exadata Storage Index Aging – Part 2B

This is the third post in a series looking at storage cell Storage Index predicate offloads, continued from the previous post. There’s more in the analysis for Test 2B here.

For Test 2B, this corrects the columns with very little offload so that they get between 60 and 100 percent offload from the storage index. See the previous two posts for more background information.

Results:
Now we have offload for each test segment and 8 columns offloaded for each In-Line (IL) test, as expected. We can even see a pattern in when columns are released (aged out) after repeated queries on the 8 columns.

The pattern I see is that after 10 queries, columns 1 and 7-21 seemed to keep their offload status without aging out. But what is going on with column 1 that keeps it from aging out for the entire test? And what about columns 2-6, which seem to age out fairly fast after their query run with a storage index offload?

I see some patterns developing, but I may need more eyes, info, or tests to figure this out. I’ll continue when I can.

Let me hear your ideas.

Here’s the test sequence again:

Test 2 – Test Plan

Here’s the full output in Excel format, this time run to completion:
dcox_exadata_storage_index_aging_test2b

Exadata Storage Index Aging – Part 1

Just took some classes with some of the smart folks at the Accenture Enkitec Group (Enkitec). Disclaimer: I work there too.

In my last class I got inspired by Carlos Sierra’s Exadata Tuning class. Carlos was talking about how little we know about the actual aging process in the Exadata Storage Index arena. He mentioned it might be easy to write a little code to check out this aging process. So I stole (er, was inspired by) his idea.

If you are unfamiliar with Exadata Storage Indexes read this Oracle Magazine article from Arup Nanda. Or better yet, get more details in the Expert Oracle Exadata book.

Here’s what I think I know about this process: Up to 8 columns per table can be held in the storage index array.

Target Problem: How do single-table, single-predicate storage index offloads age out?

Environment Specifications:
Target Machine: X3 quarter rack
GI Version: 11.2.0.4.0
DB Version: 11.2.0.3.0
Parameters/Values:
_cell_storidx_mode EVA
_kcfis_storageidx_diag_mode 0
_kcfis_storageidx_disabled FALSE

Cell Parameters(storage index parameters):
_cell_enable_storage_index_for_loads = TRUE
_cell_enable_storage_index_for_writes = TRUE
_cell_si_max_num_diag_mode_dumps = 20
_cell_storage_index_columns = 8 (default = 0)
_cell_storage_index_partial_reads_threshold_percent = 85
_cell_storage_index_partial_rd_sectors = 512
_cell_storage_index_diag_mode = 0
_cell_storage_index_sizing_factor = 2

Test 1:
I’ve identified a table where I can easily get a storage index offload for most columns. I verify this by gathering statistics before and after a query execution and computing the difference. The two statistics used for this test are:
cell physical IO bytes eligible for predicate offload
cell physical IO bytes saved by storage index

From the two statistics above, the percentage of storage index predicate offload can be determined (see the partial output data below). This is similar to a script we used in class to capture the difference in stats on both sides of a query.
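
Here’s a minimal sketch of that capture (the statistic names are the real ones above; wrap your test query between two snapshots and diff the values):

-- Snapshot the two cell offload statistics for the current session
select sn.name, ms.value
  from v$mystat ms, v$statname sn
 where sn.statistic# = ms.statistic#
   and sn.name in ( 'cell physical IO bytes eligible for predicate offload',
                    'cell physical IO bytes saved by storage index' );

-- run the test query here, then snapshot again;
-- offload pct = 100 * delta(bytes saved) / delta(bytes eligible)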

Data:
My test table has 2.5+ billion rows. I took a lab table from class with 7 columns and copied the same 7 columns over (twice) to make 21 columns. It takes about 1-4 seconds for an offloaded query and roughly 27-30 seconds for a non-offloaded query. Remember, lots of variables here could change things, but this specific table and data on this specific machine work for my test goals. I can repeat queries fairly quickly and have enough difference in the numbers to see when I do and don’t get a storage index offload.

Test Sequence (for Test 1):
I run a baseline test, querying on a value from each column (data types: NUMBER, VARCHAR2, and DATE). Then I run 10 queries on each column, move on to the next column, and do it again, for a total of 10 columns. An “in-line” (IL) or intermediate test is performed on all 21 columns starting at the first column, and then the whole test is run again, but starting at column n+1 for each iteration.

Test 1 – Test Plan

Expectation:
The initial guess is that 8 of the columns, possibly the last 8, would still perform well on the IL test, but the others would not use the storage index offload.

The query each time would look something like this:
select count(column1) from table where column1 = 'x';

What I found:

OK, I didn’t get what I expected, but this is the first try. Only the first column of the IL test showed an offload, and a small bit on the second column. But everything was stale from there on for the IL test, apart from a strange blip of offloads on columns 18 and 19. So I need to modify my test to get more information. Still a start at least.

It will be easy to add to these test scripts. I’ve added some of the resulting output in an image below and a little more in the attached Excel spreadsheet. Here are some basic notes:

  • After running the first 10 columns through the 10 repetitions, there is still an 83.2% offload on column 1 when it gets run in the IL test.
  • Curiously, columns 18 and 19 get offloads, and they haven’t been touched since the baseline run at the start.
  • Columns 6, 7, 13, 14 (and probably 20, 21) don’t get much offload at all.

This begs for more testing. The primary interest here is still to figure out more about aging. Perhaps these columns age out more quickly than originally thought.

Modifying Test (next blog):
So I’ll test this next (see the next post – Test 2). After the main set of repetitions, on the IL test I’ll test the columns “just run” first, starting with the most recent, then test the remaining columns.

Limited/No offload on some columns:
A quick glance at columns 6, 7, 13, 14, 20, and 21 shows the types are all NUMBER(10,2). But there must be some difference in the way the data turns out, since the offload on column 6 is 8.1% and 0% on column 7. And remember that if the value exists on all of the cells there would be no storage index offload, though there may be other types of cell offload. (I’m oversimplifying here; see the docs in the intro for more information.)

After a quick look at the counts for each value, the distribution is skewed; the most common value accounts for over 109 million of the 2.5 billion rows. Here are the top 5 counts:

VALUE    COUNT
    1    109,500,000
   81     85,190,307
   21     85,190,243
   61     85,190,243
   41     84,500,018
Concentrating on the aging for now; I’ll leave “why” it didn’t offload for another post.
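
For reference, a sketch of the kind of query behind those counts (the table and column names here are placeholders, not the real lab table):

select *
  from ( select column7 val, count(*) cnt
           from test_table
          group by column7
          order by count(*) desc )
 where rownum <= 5;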

Note the colors closer to white show more offload. Darker colors show less offload (brown and blue).

Test Case 1 Exadata Storage Aging

Test Case 1 Exadata Storage Aging – Excel Version

Change from global names to non-global names

Had some requests from friends on how to change the global names setting. There are lots of other posts on this same topic; here’s another.

If you started up your database with global_names=true and want to change that, here is how to do it.

Perform the following tasks.

Example – global names uses “dbname.dbdomainname.com” as the DB_NAME and DB_DOMAIN combination; non-global names would use just “dbname”, with no DB_DOMAIN.

Open up a command tool and log in as SYSDBA.
cmd> sqlplus / as sysdba

Create the file initDB_NAMETEMP.ora (where DB_NAME is your database name; in this example the DB_NAME is THESHOW).
sql> create pfile='initTHESHOWTEMP.ora' from spfile;

Shutdown the database.
sql> shutdown immediate

Keep the command tool open, you’ll jump back to it later.

Now edit the initDB_NAMETEMP.ora file … C:\app\oracle\product\11.2.0\server\database\initDB_NAMETEMP.ora (or $ORACLE_HOME/dbs).

Remove the line:
db_domain='dbdomainname.com'

Change the global_names line to false:
*.global_names=FALSE

Save the file initDB_NAMETEMP.ora

Back in the command tool from earlier:

sql> create spfile from pfile='initTHESHOWTEMP.ora';

Start up the database.
sql> startup

sql> show parameter db_domain
Check db_domain – should be null.

sql> show parameter global_name
Check global_names – should be false.

sql> show parameter service_name
Check the service name to be sure it matches the db_name. This is just an extra check.
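
One more check: the stored global name (visible in the GLOBAL_NAME view) is separate from the global_names parameter, and it may still show the old domain. You can view it and, if needed, rename it (using the example “dbname” from above):

sql> select * from global_name;
sql> alter database rename global_name to dbname;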

You can run the listener status command from the command line to verify the database is correctly registered:
cmd> lsnrctl status

Look for the db_name without the domain name.

Change any tnsnames.ora files to match accordingly.

Test with tnsping:
tnsping db_name

Try to connect, and you’re done.

Simple Permissions Example for ACL and Network Test

Here’s a simple example of granting permissions to all hosts and all ports for a user. Below that is a function to test the network connection.

For scanning ports see: Scanning Ports Example

DECLARE
    p_user VARCHAR2 ( 30 ) := 'DCOX'; -- user to receive grants
BEGIN

    DBMS_NETWORK_ACL_ADMIN.create_acl (
                                        acl => 'everything.xml',
                                        description => 'Full Access',
                                        principal => UPPER ( p_user ),
                                        is_grant => TRUE,
                                        privilege => 'connect',
                                        start_date => SYSDATE - 1,
                                        end_date => NULL
                                       );
    DBMS_NETWORK_ACL_ADMIN.add_privilege (
                                           acl => 'everything.xml',
                                           principal => UPPER ( p_user ),
                                           is_grant => TRUE,
                                           privilege => 'resolve',
                                           start_date => SYSDATE - 1,
                                           end_date => NULL
                                          );

    DBMS_NETWORK_ACL_ADMIN.assign_acl (
                                        acl => 'everything.xml',
                                        HOST => '*', -- All hosts and ports
                                        lower_port => 1,
                                        upper_port => 65535
                                       );

    COMMIT;
END;
/

Function to test a single port:

create or replace FUNCTION fn_check_network
(p_remote_host in varchar2, -- host name
 p_port_no in integer default 1521 -- port number
 )
  RETURN  number IS
/*

    Purpose: Check Network Connection to tns port

    MODIFICATION HISTORY
    Person      Date        Comments
    ---------   ------      -------------------------------------------
    dcox        8/19/2013    Initial Build

*/
    v_connection   utl_tcp.connection; -- tcp connection
BEGIN

    v_connection := utl_tcp.open_connection ( remote_host => p_remote_host,
                                              remote_port => p_port_no,
                                              in_buffer_size => 1024,
                                              out_buffer_size => 1024,
                                              tx_timeout => 5 );
    utl_tcp.close_connection ( v_connection ); -- close the test connection so it doesn't leak
    RETURN 0;
EXCEPTION
   WHEN others THEN
       return sqlcode ;
END fn_check_network;   -- Function fn_check_network
/
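
A quick way to try it out; 0 means the connection opened, anything else is the SQLCODE of the failure (the host and port here are just examples):

select fn_check_network ( 'localhost', 1521 ) from dual;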

Don’t forget to grant execute on UTL_TCP to the user where you create the function. And if you add the following function, grant execute on DBMS_NETWORK_ACL_ADMIN to that user too.

More complicated, but easier to maintain:

CREATE OR REPLACE FUNCTION dcox.fn_grant_acl_permission ( p_user IN VARCHAR2 -- username to grant acl permissions
                                                                             )
    RETURN NUMBER
IS
BEGIN

    DBMS_NETWORK_ACL_ADMIN.create_acl (
                                        acl => p_user || '_everything.xml',
                                        description => 'Full Access',
                                        principal => UPPER ( p_user ),
                                        is_grant => TRUE,
                                        privilege => 'connect',
                                        start_date => SYSDATE - 1,
                                        end_date => NULL
                                       );
    DBMS_NETWORK_ACL_ADMIN.add_privilege (
                                           acl => p_user || '_everything.xml',
                                           principal => UPPER ( p_user ),
                                           is_grant => TRUE,
                                           privilege => 'resolve',
                                           start_date => SYSDATE - 1,
                                           end_date => NULL
                                          );

    DBMS_NETWORK_ACL_ADMIN.assign_acl (
                                        acl => p_user || '_everything.xml',
                                        HOST => '*', -- All hosts and ports
                                        lower_port => 1,
                                        upper_port => 65535
                                       );

    COMMIT;
    RETURN 0; -- success
EXCEPTION
    WHEN OTHERS
    THEN
        RETURN SQLCODE;
END fn_grant_acl_permission; -- Function fn_grant_acl_permission
/
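Since the function commits, call it from a PL/SQL block rather than from a query. A quick example for a hypothetical user named SCOTT:

set serveroutput on
DECLARE
    v_result   NUMBER;
BEGIN
    v_result := dcox.fn_grant_acl_permission ( 'SCOTT' );
    DBMS_OUTPUT.put_line ( 'Result (0 = success): ' || v_result );
END;
/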


Quick SQL Load Testin’

Here’s a quick and simple load testing script. Add your own base.sql set of scripts to run and a sub-directory called output, and you’ll be ready. More examples later.

base.sql file:

select 'x' from dual;
select count(*) from dba_objects where status != 'VALID';
select count(*) from dba_objects;

login.sql file:

set arraysize 5000
set autocommit OFF
set autoprint OFF
set autorecovery OFF
set autotrace OFF
set echo ON
set editfile "afiedt.buf"
set embedded OFF
set escape OFF
set escchar OFF
set exitcommit ON
set flagger OFF
set flush ON
set heading ON
set linesize 300
set long 300
set longchunksize 80
set newpage 1
set null ""
set numformat ""
set numwidth 40
set pagesize 0
set PAUSE OFF
set recsep WRAP
set serveroutput ON
set shiftinout INVISIBLE
set showmode OFF
set sqlblanklines OFF
set sqlcase MIXED
set sqlcontinue "> "
set sqlnumber ON
set sqlpluscompatibility 11.2.0
set sqlprompt "Default SQL> "
set suffix "sql"
set tab ON
set termout ON
set timing ON
set trimout ON
set trimspool OFF
set verify ON
set wrap off
set xmloptimizationcheck OFF
set errorlogging OFF

exit.sql file:

exit

The Korn shell script start_load.ksh expects a sub-directory called output and accounts testuser1 through testuserN (where N is NO_OF_PARALLEL_SESSIONS) with a password you specify (or you could use the same account over and over). Permissions are up to you.

#!/usr/bin/ksh

# Environmental Variables

set -x
# Mail
RUN_MAIL_TO='david.cox@yourcompany.com'; export RUN_MAIL_TO

# Oracle
ORACLE_BASE=/u01/app/oracle
# Note: ORACLE_HOME must already be set in the environment (sqlplus is run from $ORACLE_HOME/bin below)

# Logging and Working Files
LOG=/home/oracle/log/load_test.$(date '+%Y%m%d.%H%M%S').log
WORKING_DIR=/home/oracle/scripts/ksh/odaperf
WORKING_DIR_OUTPUT=/home/oracle/scripts/ksh/odaperf/output
BASE_SCRIPT=$WORKING_DIR/base.sql

export LOG WORKING_DIR WORKING_DIR_OUTPUT BASE_SCRIPT

# Process
PROCESS_NAME="11g Oracle Load Test"
NO_OF_PARALLEL_SESSIONS=10 # Minimum of 2
SQL_TIMES=3
RAMP_UP_TIME=21
((SESSION_DELAY=NO_OF_PARALLEL_SESSIONS-1))
((SESSION_DELAY=$RAMP_UP_TIME / $SESSION_DELAY))
export PROCESS_NAME NO_OF_PARALLEL_SESSIONS SQL_TIMES RAMP_UP_TIME SESSION_DELAY

echo "Starting ${PROCESS_NAME} at `date` on `hostname` \n" >>$LOG

echo "Session Delay: ${SESSION_DELAY} /n"

# Build Script File for SQL_TIMES Iterations *********************
#
# Loop through NO_OF_PARALLEL_SESSIONS and run sql SQL_TIMES
# This means that there will be NO_OF_PARALLEL_SESSIONS to simulate sessions 
# and the base sql script will be run SQL_TIMES for each session started
#
# To do this Let's create a script from the base  that is the same as the base script run SQL_TIMES
# Let's delete copies already created in case the base script has been modified 
# Note all change should be made to the base test script

nLoopIteration=1
# Remove any existing concatenated SQL script file of the form base_x<no of iterations>.sql
RUN_BASE=base_x${SQL_TIMES}
export RUN_BASE

rm -f ${RUN_BASE}.sql
# Rebuild the concatenated sql script file
while [ $nLoopIteration -le ${SQL_TIMES} ] 
do 
	cat ${BASE_SCRIPT} >> ${RUN_BASE}.sql # Append
	((nLoopIteration=nLoopIteration + 1))
done

echo "Built run script $ at `date`\n"

# Now that our special base script is created, let's start the processes with nohup
# We can add in a ramp-up delay rounded to the nearest second -- ramp_up_time
# So our delay between each session start is sess_delay = ramp_up_time / (NO_OF_PARALLEL_SESSIONS-1)
# Assumption: we start a session, then sleep between sessions, so the last session starts up roughly at the end of the ramp_up_time period
# Zero kick-off time for each session is assumed (not completely true), and no program time -
# So this means the ramp-up time is a rough estimate

nLoopIterations=1
while [ $nLoopIterations -le $NO_OF_PARALLEL_SESSIONS ]
do
        # Create spool command
	echo "spool ${WORKING_DIR_OUTPUT}/$(date '+%Y%m%d.%H%M%S')_${RUN_BASE}_p${nLoopIterations}.out" > ${WORKING_DIR}/${RUN_BASE}_p${nLoopIterations}.spooltext # overwrite spool text file
	# Create full sql file
	cat ${WORKING_DIR}/${RUN_BASE}_p${nLoopIterations}.spooltext ${WORKING_DIR}/login.sql ${WORKING_DIR}/${RUN_BASE}.sql ${WORKING_DIR}/exit.sql > ${WORKING_DIR}/${RUN_BASE}_p${nLoopIterations}.sql
	nohup $ORACLE_HOME/bin/sqlplus -s testuser${nLoopIterations}/PasswordGoesHere @${WORKING_DIR}/${RUN_BASE}_p${nLoopIterations}.sql  > ${WORKING_DIR_OUTPUT}/${RUN_BASE}_p${nLoopIterations}.nohupout 2>&1 &
        ((nLoopIterations=nLoopIterations + 1))
        sleep $SESSION_DELAY
done

echo "All nohup job started at `date`/n"


echo "Startup Initiated of ${PROCESS_NAME} at `date` on `hostname` /n" >>$LOG
mailx -s "${PROCESS_NAME} at `date` on `hostname` STARTED" ${RUN_MAIL_TO} < $LOG

exit 0
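
And a quick way to keep an eye on the run from another session (assuming the testuserN accounts described above):

select username, status, count(*) sessions
  from gv$session
 where username like 'TESTUSER%'
 group by username, status
 order by username, status;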