一本面向 Oracle 运行的好书《Oracle 24x7 Tips and Techniques》(Venkat S. Devraj, Osborne/McGraw-Hill, November 1
来源:百度文库 编辑:神马文学网 时间:2024/04/27 16:52:02
这本99年的书,作者根据自己咨询经验,总结提炼了很多 Tips(窍门),如果你生活在高可用运行环境中,你会碰到那些作者提到的问题,作者不保守,乐于分享知识
图书英文目录 (Table of contents)
Part I: Introduction.
Chapter 1. Identifying our Uptime Requirements.
Understanding What 24X7 Means for Your Organization
Working Toward an "Ideal" 24X7 System
Tips for Semi-Technical Managers and Supervisors
Summary
Chapter 2. Understanding and Dealing with Emergencies.
What is an Emergency?
Summary
Part II: Understanding Your Environment.
Chapter 3. Hardware Configuration.
Hardware Configuration
Know real well what's inside the box
choose your disk-array size with caution
Do Not Use Read-Ahead Caches for OLTP Applications
Do Not Rely on Write Caches to Eliminate I/O Hotspots
Use multi-level RAIDs
Ensure Your Stripe-Sizes in line with Your OS and Database Block-Sizes
Ensure that your disk and tape I/O sizes match
Architectural Alternatives for System Redundancy and Performance
Summary
Chapter 4. Operating Systems.
Oracle on Unix Versus Oracle on Windows NT
Kernel and Block Size
Learn to customize your kernel
Know your OS logical and physical block sizes
Raw Devices
......
Pre-create abundant raw partitions
Choose multiple standard sizes for raw partitions
Do not use cylinder 0 of the disk for creating raw partitions
Create symbolic links for all raw devices
Be aware of other file system options
System Capacity and Bottlenecks
Proactively ensure that system capacity is not reached at any time
Distribute all resource-intensive applications among multiple servers
Set up limits on CPU and memory use for non-database servers
Discourage non-production jobs during peak production hours
Avoid resource-intensive commands that compete with Oracle processes
Do not set priority levels for Oracle-related processes
Do not use processor affinity on the database server
Check frequently for memory leaks
Set swap space between two and four times larger than physical memory
Distribute swap area across your fast disks
Know whether your OS can address more than 2GB of RAM
If possible, lock your shared-memory area in physical memory
Understand logical and physical drives mapping
Enable file system journaling on all production boxes
Check the availability of free disk space on a regular basis
Keep your file systems and directories slim
Maintain a spare root file system
If available, turn on large file support
Look at important OS logs at regular intervals
Invest in automated tools to monitor system bottlenecks
Chapter 5. Networks.
Managing your network
......
Ensure that your network is not being overloaded
Configure your network to use sub-networking effectively
Customizing SQL*Net and Net8 for Your Network Configuration
Increase the network queue size
Disable the NAGLE algorithm
Set the SQL*Net or Net8 packet size to match the protocol MTU
Deploy homogeneous servers throughout your enterprise
Tips for Semi-Technical Managers and Supervisors
Chapter 6. Applications and Data.
Applications
......
Make application code independent of future schemetic changes
Evaluate using transaction splitting to clone data
Evaluate using pro*C or OCI for all mission-critical applications
Implement failover capabilities within all mission-critical applications
Be familiar with various automatic failover options
Have MAXEXTENTS unlimited for all volatile application segments
Expedite your data loads (加速数据装载过程)
Version control your application source code
Manage your indexes efficiently
Use threads,rather than processes,in your applications
Use shared libraries in your applications
Invest time in physical database design
Some quick application-realted tips
Data
Take time to understand the schema model
Categorize your data
Part III: Database Setup and Configuration.
Chapter 7. Installing, Configuring and Customizing our Database Environment.
Server Configuration
Follow the OFA standard
Always use a config.ora file
Use the crdb_SID.sql and crdb2_SID.sql creation scripts
Ensure sufficient disk space for at least two weeks' worth of trace files and alert-log
Consider OS block size and application characteristics when setting the database block size
Ensure that contiguous(相邻) blocks per write are set high
For a large number of data-files in Oracle7.x, enable CKPT
Use LOG_CHECKPOINT_TIMEOUT and/or LOG_CHECKPOINT_INTERVAL toensure instance-recovery times specified in your SLA are met
Create more than three mirrored online redo-log groups for high-actvity databases
Place redo-logs such that ARCH and LGWR do not contend with each other
Configure adequate redo latches to minimize contention
Ensure that dbwr can keep up with database loads
Partition your buffer cache into multiple buffer pools according to segment usage patterns
Use process slaves to alleviate I/O bottlenecks during high database activity
Enable vector posting
Set up dedicated temporary tablespaces
Set up sort area efficiently
Use direct writes for sorts
Use profiles to prevent runaway processes from hogging all resources
Evaluate using vectorized reads for cooked file systems
Enable direct I/O when using cooked file systems
Use intimate shared memory(ISM)for databases with high concurrent user access
Use post-wait drivers for high IPC thrughout
Pre-page and "lock" the SGA in main memory
Always install SQL*Plus help
Other Self-Explanatory(Well,Almost)Server Configuration Tips
SQL*Net/Net8 Configuration
Use Oracle names when dealing with a large number of clients
Use out-of-band breaks with SQL*Net/Net8 or set poll frequency high when out-of-band breaks are unavailable
Keep dead-connection detection to a minimum
Manage large user populations effectively via SQL*Net and Net8 features
Pre-spawn dedicated server processes
Chapter 8. Database Upgrades, Downgrades, Reorganizations, and Migrations.
Upgrades
Downgrades
Reorganizations
Migrations
When to Upgrade
Practice the migration to a test environment
Analyze each step and construct strategies to expedite it
Gotchas to watch for while upgrading from Oracle7 to Oracle8/8i
Quick tips to enhance load/unload performance
Real-Life Migration Case Study
Chapter 9. Backup, Recover, and Archiving Policies and Procedures.
Backups
Select at least two backup methods after adequate analysis
Perform hot backups during period of low DML activity
Do not place all tablespaces in hot backup mode simultaneously
If possible, back up to disk first and then to tape
Do not back up online redo-logs during OS hot backups
Evaluate the use of tripe mirrors
Make sure that hot backup commands are executed synchronously
Check the backup logs regularly
Avoid performing backups manually
Ensure that security is not compromised when using backup tools
With RMAN, periodically resynchronize the recovery catalog
Ensure that your backup strategy accounts for all Oracle file-types
Back up the control-files immediately after all database changes
Address export-related comcerns before you start performing exports
Form "export-groups" for inter-segment consistency
Plan sufficient space for export dumps
Perform exports using direct path, rather than conventional path
Ensure compatibility and usability of all backed-up export-dumps
Perform a backup immediately after any system clock changes
Check for database corruption during and after backups
Set optimal I/O size during backups
Make use of incremental backups available in Oracle8
Archiving
Evaluate whether ARCHIVELOG is necessary for your environment
Implement antomatic archival as the primary archival mode
Allocate adequate space for ARCHIVE_LOG_DEST
........
Take a complete backup as soon as ARCHIVELOG mode is enabled
Use ARCHIVE LOG CURRENT rather than a variation
Do not create holes in the archived log sequence
If possible, keep dual copies of archived logs on disk
Create more online redo-log groups when archiving to offline media
Be aware of special archiving considerations, when using OPS
Recovery
Understand the need for recovery and take proactive steps to avoid failure
Understand factors affecting recovery time
Ensure that recovery timings are consistent with your SLA MTTR specifications
Consider alternative backup/recovery approaches for point-in-time recovery
Ensure that checkpoints occur at sufficient intervals and redo-logs are sized appropriately to prevent recovery delays
Always consider the lowest unit of recovery
Always examine v$datafile prior to recovery
Maintain a special init.ora file for recovery operations
Always use RESETLOGS as the last resort
Evaluate all UNRECOVERABLE and NOLOGGING operations
Catergorize all segments and tablespaces for recovery
Keep tablespaces without write activity READONLY to expedite backup and recovery
Set AUTORECOVERY on during recovery
Tips for semi-technical managers and supervisors
.......
Document all database structural changes
Chapter 10. Startup and Shutdown Procedures.
Startup and Shutdown Procedures
Always Automate Database Startup and Shutdown
Use OEM for Starting UP/Shutting Down Multiple Databases
For Urgent Database Bounce Requirements, Consider a "Shutdown Abort"
Just Prior to a SHUTDOWN ABORT do an ALTER SYSTEM CHECKPOINT
Part IV: Database Maintenance.
Chapter 11. General Maintenance.
Proactive and Reactive Maintenance
Implement robust security at all possible levels
Understand the "ORA-1555: Snapshot too old" error and take steps to avoid it
Understand, prevent, and resolve latch and enqueue contention
Be familiar with different wait events and take steps to reduce them
Periodically watch for and resolve locking conflicts
Monitor shared pool effectiveness and tune, if necessary
Other Critical Areas That Your General Maintenance Routines Need to Address
Chapter 12. Space and Growth Management.
Understanding and managing space and growth
Understand the business drivers influencing segment growth
Understand the units of space consumption
Understand the various segments that comprise your database
Be familiar with the errors and effects that result from inadequate space management
Enable the AUTOEXTEND feature for all application tablespaces and SYSTEM
Use standardized file-sizes
Accommodate the data-file header block in your standardized file-sizes
Use standardized storage clauses for each segment-type to reduce fragmentation and use space more optimally
Set INITIAL and NEXT keeping DB_BLOCK_SIZE in mind
Set PCTINCREASE optimally to reduce fragmentation
Understand FREELISTS and set them optimally to reduce latch contention
Periodically coalesce your application tablespaces
Understand the effect of an erroneous high-water mark
Periodically deallocate unused blocks
Avoid dropping data-files explicitly
Learn the "tablespace shuffle" for emergency unanticipated space requirements
In a DSS environment, do not blindly accept unrestricted segment growth
Part V: Troubleshooting.
Chapter 13. Interpreting the Alert-Log and Trace-Files.
The Alert-Log
Whenever possible, continually tail the alert-log
Be familiar with events/messages in the alert-log
Edit repetitive error messages to be listed just once
Archive and trim the alert-log once every two weeks to three months
Trace-Files
Know what to look for in a trace-file
Know how to identify trace-file belonging to a specific event/session
Know what causes trace-files to be generated
Tracing
SQL*Net and Net8 Tracing
PL/SQL Tracing
ODBC Tracing
Core Dumps
Oracle Debugging Tools
Whenever necessary, make your trace-files readable by developers - 让开发人员拥有读到跟踪文件的权限
Chapter 14. Identifying and Repairing Database Corruption.
Understanding, Preventing, and Repairing Corruption
Understand corruption and the ways it can manifest itself
Proactively check for and detect media corruption
Understand what you can do to prevent media corruption
Be familiar with techniques to deal with memory corruption
Be familiar with techniques to deal with logical corruption
Identify and list all components susceptible to corruption
Identify and list measures to prevent, detect and repair corruption in your environment
Retain relevant information regarding the corruption
Train your administrators to handle corruption
Part VI: High Availability Solutions.
Chapter 15. Standby Options.
Standby Options from Oracle
Consider using a standby database for disaster recovery
Consider using standby instances in environments where true 24X7 uptime is not mandatory
Consider combining multiple solution to form a comprehensive one - 多方案组合
Strategies for availability During Routine Maintenance and Emergencies - 例行维护和紧急情况下的可用性策略
Evaluate using a Custom Standby Database(CSD)
Consider using standby tables to cater to segment-level failures
Be aware of caveats with the custom solutions desccribed here
Chapter 16. Oracle Parallel Server.
Understanding and managing an OPS environment
Know what OPS is and how it enhences availability
Understand, detect, and eliminate false pinging
Partitioning applications prior to deploying them on OPS
Chapter 17. Advanced Replication.
Understanding AR
Be familiar with AR operations and basic functioning
Understand your options for highly available AR implementations
Be aware of possible AR architectural alternatives
Chapter 18. Third-Party HA Solutions.
Hardware/OS Based
Consider EMC SRDF for disaster-recovery purposes
Evaluate using EMC TIMEFINDER for higher return on investment
Database-Based
Consider Quest Software's Shareplex for Oracle for higher standby flexibility and concurrent reporting access
Part VII: Building A Real-World Arsenal.
Chapter 19. The 24x7 Toolkit.
Using Scripts to Be Alerted to Disruptive Problems
Monitor for specific failures by running poll-scripts periodically
Be familiar with techniques for application failover when calling PL/SQL
Part VIII: New Features.
Chapter 20. New HA Features in Oracle8i.
Maintenance-Oriented Features
Consider transportable tablespaces for quick movement of data subsets
Consider locally managed tablespaces to prevent fragmentation
Be familiar with new change management features
Create, rebuild and defragment indexes online
Be familiar with non-partitioned table reorganization features
Drop columns, when necessary, with greater ease
Troubleshooting Features
Be familiar with new techniques to repair corruption
Backup and Recovery-Oriented Features
Utilize methods to facilitate and expedite RMAN backups
Control and expedite recovery time
Miscellaneous HA Features
Use newer listener failover and load-balancing services
Be familiar with LogMiner functionality
Be aware of an option to eliminate "fat" OS overhead
A: References.
Index