Application Systems Reading: Data Warehouse (Chinese-English Parallel Text)

Retrieved from: Baidu Wenku, 2024/04/29
Source: http://www.csai.cn, September 13, 2006
DATA WAREHOUSE
It is often said that the age of the Industrial Revolution has finally come to an end and the world has entered the age of the Information Technology revolution (see Fig. 7-5). It is our belief that the need for data warehouse applications is one of the manifestations of this Information Technology age. A data warehouse is becoming more of a necessity than an accessory for a progressive, competitive, and focused organization[1]. It provides the right foundation for building decision support and executive information system tools that are often built to measure and provide a feel for how well an organization is progressing toward its goals[2].
1. COLLECTING OPERATIONAL DATA
Advances in computer and networking technology have led to the introduction of very powerful hardware and software platforms that can collect, manage, and distribute large amounts of pertinent data. In the case of a business application, detailed transactions are often generated during product- or service-related interactions. These transactions are not limited to commercial sectors. They are also found in sectors such as government, health care, insurance, manufacturing, finance, distribution, education, and so on. Any enterprise that has some computerized record keeping systems and is interested in deducing or drawing logical conclusions from its voluminous, granular, and detailed information pool should consider building an enterprise-level data warehouse application[3]. These enterprises will then be capable of improving their insights into the trends in their operations and of eventually increasing the accuracy of their forecasts and plans. The effectiveness of the data warehouse application intensifies especially when the operational data resides in distributed, non-homogeneous systems and the application replaces manual data gathering and reconciliation procedures.
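As a rough illustration of the consolidation described above, the following Python sketch maps records from two non-homogeneous source systems into one common warehouse layout, which is the step that would otherwise be done by manual gathering and reconciliation. The two source systems (billing, plant), all of their field names, and the WarehouseRow layout are hypothetical and are not taken from the text.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class WarehouseRow:
    """Common record layout assumed for the (hypothetical) warehouse."""
    source: str
    order_id: str
    order_date: date
    amount_cents: int


def from_billing(rec: dict) -> WarehouseRow:
    # Hypothetical billing system: ISO dates, non-negative amounts as
    # decimal strings such as "12.5" or "12".
    dollars, _, cents = rec["total"].partition(".")
    return WarehouseRow(
        source="billing",
        order_id=rec["id"],
        order_date=date.fromisoformat(rec["date"]),
        # Pad or truncate the fractional part to exactly two digits.
        amount_cents=int(dollars) * 100 + int((cents + "00")[:2]),
    )


def from_plant(rec: dict) -> WarehouseRow:
    # Hypothetical plant system: DD/MM/YYYY dates, amounts already in cents.
    day, month, year = (int(part) for part in rec["ORDER_DT"].split("/"))
    return WarehouseRow(
        source="plant",
        order_id=rec["ORDER_NO"],
        order_date=date(year, month, day),
        amount_cents=rec["AMT_CENTS"],
    )


def consolidate(billing: list[dict], plant: list[dict]) -> list[WarehouseRow]:
    """Convert both feeds to the common layout and order them by date."""
    rows = [from_billing(r) for r in billing] + [from_plant(r) for r in plant]
    return sorted(rows, key=lambda r: (r.order_date, r.order_id))
```

Keeping one small adapter function per source system means a new system can be added without touching the others; a real loading process would also need validation and error handling, which this sketch omits.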
Operational data is the highly structured sets of information that support the ongoing and day-to-day operation of an organization. In the case of a decentralized organization, operational data is generated at remote locations, sometimes in non-homogeneous distributed systems (see Fig. 7-6). Distributed systems can span many different geographical locations and time zones. They are configured to provide scalability, visibility, and tracking capabilities of business processes. For instance, the order is entered by a customer representative in one site. The financial state of the order is verified at another site. Once approved, it is forwarded to manufacturing to be assembled[4]. Finally, the shipping staff is alerted to fulfill the order that was booked at the remote site[5]. Standard reports or ad hoc queries that inquire about the details of these events are typical examples of operational reports. They are generated on a regular basis. Any delay in their processing will significantly disrupt the normal operation of that business.
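The order example above can be sketched as a small event log with one ad hoc operational query over it. This is only an illustration using Python's built-in sqlite3 module; the table name, step names, and site names are assumptions made for the sketch, not part of the original text.

```python
import sqlite3

# Hypothetical processing steps, in the order described in the text.
STEPS = ("entered", "credit_verified", "assembled", "shipped")


def build_event_log(events):
    """Load (order_id, step, site) tuples into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE order_event (order_id TEXT, step TEXT, site TEXT)"
    )
    conn.executemany("INSERT INTO order_event VALUES (?, ?, ?)", events)
    return conn


def open_orders(conn):
    """Ad hoc operational query: latest step of every order not yet shipped."""
    latest = {}
    for order_id, step in conn.execute("SELECT order_id, step FROM order_event"):
        seen = latest.get(order_id)
        if seen is None or STEPS.index(step) > STEPS.index(seen):
            latest[order_id] = step
    return {o: s for o, s in sorted(latest.items()) if s != "shipped"}
```

A report like open_orders is operational in the sense used here: it describes the current state of individual orders, so the business depends on it being available without delay.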
2. CONVERGENCE OF MANY COMPUTER TECHNOLOGIES
The infrastructure that supports the data warehouse application relies on the same technologies that most other applications depend upon. The difference is in the variety and specialization at the product level, which can greatly improve the quality of the data warehouse infrastructure.
Below are some technologies that have made their mark in the data warehouse marketplace. In order to produce a data warehouse that best meets users' needs, these underlying technologies have to be evaluated as part of the periodic resource capacity planning. Depending upon the requirements and resources available, the best combination can be selected and configured.
•   Server technology.
•   Client technology.
•   Database Management System (DBMS) technology.
•   Networking technology.
•   Mass storage technology.
•   Data presentation and publication requirements.
•   Software engineering methodology and tools.
3. COMMON CHARACTERISTICS OF A DATA WAREHOUSE
(1) Data is divided into three categories.
(a) Reference and Transaction Data.
•   Includes lists, charts, and transaction data from source systems.
•   Originally generated in the source systems.
•   Can be kept in the data warehouse or an operational data store system.
•   Is loaded into the data warehouse on a regular basis.
•   Should never change once in the data warehouse (data correction and refresh are exceptions).
•   May be purged from the source.
•   Is archived in the data warehouse if purged from the source.
(b) Derived Data.
•   Is based on the reference data and certain business rules.
•   Can always be re-created.
•   Business rules must be approved by end-users.
(c) Denormalized Data.
•   Is based on the detailed reference data.
•   Is prepared periodically.
•   Is the foundation for OLAP tools.
(2) Enhancements are done in an iterative approach.
(3) Enhancements should be based on the overall architecture.
(4) One end-user tool may not be adequate for all analytical needs. Depending on the amount of data and type of queries, different end-user tools must be selected.
(5) Transaction-level database recovery is not necessary.
(6) The data warehouse platform should be tuned for performance rather than for quick recovery.
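The three data categories in item (1) can be made concrete with a minimal sketch. It uses Python's built-in sqlite3 module and assumes a tiny hypothetical schema (product, sale, monthly_revenue, sale_flat) together with one illustrative business rule, revenue summed per calendar month; none of these names or rules come from the text.

```python
import sqlite3

# Reference data (a product list) and transaction data (sales), as they
# might arrive from a source system. All values are made up.
PRODUCTS = [("p1", "hardware"), ("p2", "software")]
SALES = [
    (1, "p1", "2006-08-30", 5000),
    (2, "p2", "2006-09-01", 1200),
    (3, "p1", "2006-09-13", 2500),
]


def load(conn, products, sales):
    """Reference and transaction data: loaded once, then left unchanged."""
    conn.executescript(
        """
        CREATE TABLE product (product_id TEXT PRIMARY KEY, category TEXT);
        CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, product_id TEXT,
                           sold_on TEXT, amount_cents INTEGER);
        """
    )
    conn.executemany("INSERT INTO product VALUES (?, ?)", products)
    conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", sales)
    conn.commit()


def rebuild_derived(conn):
    """Derived data: re-created at any time from the loaded data plus a rule."""
    conn.executescript(
        """
        DROP TABLE IF EXISTS monthly_revenue;
        CREATE TABLE monthly_revenue AS
        SELECT substr(sold_on, 1, 7) AS month,
               SUM(amount_cents) AS revenue_cents
        FROM sale
        GROUP BY substr(sold_on, 1, 7);
        """
    )


def rebuild_denormalized(conn):
    """Denormalized data: a flat, pre-joined table prepared periodically."""
    conn.executescript(
        """
        DROP TABLE IF EXISTS sale_flat;
        CREATE TABLE sale_flat AS
        SELECT s.sale_id, s.sold_on, s.amount_cents, s.product_id, p.category
        FROM sale AS s
        JOIN product AS p ON p.product_id = s.product_id;
        """
    )
```

Because both rebuild functions drop and re-create their tables from the loaded data, running them repeatedly gives the same result, which is one way to read the statement that derived data can always be re-created.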
NOTES
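[1] Here "accessory" means something subordinate or nonessential, an appendage.
[2] "that" introduces an attributive (relative) clause modifying the preceding "information system tools"; the "how..." after "provide a feel for" is an object clause.
[3] A long sentence. The main clause is "Any enterprise ... should consider ..."; the subject "Any enterprise" is followed by an attributive clause introduced by "that", with the coordinate predicates "that has ... and is interested in ...".
[4] "manufacturing": the manufacturing function (or industry).
[5] "that" introduces an attributive clause modifying "order".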
KEYWORDS
data warehouse       数据仓库
information technology(IT)    信息技术
decision support       决策支持
operational data       操作数据
platform         平台
transaction        事务(处理)
distributed system       分布式系统
infrastructure        基础设施
client         客户
mass storage        大容量存储器、海量存储器
data refresh        数据刷新
information pool       信息库
iterative approach       迭代方法
database recovery       数据库恢复
Translation (Chinese):
人们常说工业革命时代已最终完成,世界进入了信息技术革命时代(参见图7-5)。我们相信数据仓库应用的需求就是信息技术时代的标志之一。数据仓库对于不断进取的、具有竞争力的、成为关注焦点的组织来说不只是一个附属物,而是不可缺少的。它提供建立决策支持和执行信息系统工具的基础,这些常常是用于衡量和评价某个单位是否正在向预定目标前进的标志。
1.操作数据的收集
计算机和网络技术的进步已经导致功能非常强大的,能收集、管理和分发大量有关数据的硬件和软件平台出现。在商业应用中,与产品和服务相关的交往经常会涉及繁琐的事务处理,这些事务不限于商业方面,它们也会在政府、保健、保险、制造、财经、分销、教育等方面出现。任何企业,如果有了某些计算机化记录保存系统,并且很希望从大量松散详细的信息池中演绎或作出逻辑结论,就应考虑建立一个企业级数据仓库应用系统。之后,这些企业就能够提高洞察运营趋势的能力,最终提高其预测和计划的精确性。尤其是操作数据分布在多个异构系统上,并可取代人工数据收集和调节过程时,数据仓库应用系统能提高效率。
操作数据是高度结构化的、支持一个单位开展日常工作以及持续发展的信息集。如果一个单位的各部门是分散的,则操作数据有时是在远程的分布式异构系统中产生的(如图7-6所示)。分布式系统可以跨不同的地域和时区。分布式系统可以配置成使商业过程具有伸缩性、可视性和跟踪的能力。例如,由客户代表在一个地方输入订单,而订单的财务状况在另一个地方进行验证。一经批准,则订单传向加工厂装配。最后,提醒装运人员完成在远端站点登记的订单。查询上述事件细节的标准报告或特别查询是操作报告的典型实例,这些报告定期生成。报告处理过程中的任何延误都会对正常的运作造成很大的损害。
2.聚集了多种计算机技术
支持数据仓库应用的基础设施与其他大多数应用所依靠的是相同的基础设施。不同的是产品层次上的种类和专用性方面,这在很大程度上提高了数据仓库基础设施的质量。
下面是在数据仓库市场上最具代表性的一些技术。为了产生能最好地满足用户需求的数据仓库,作为周期性资源能力计划的一部分须对这些重要技术进行评价。根据需要和可用的资源,可以选择和配置最佳组合。
服务器技术。
客户技术。
数据库管理系统(DBMS)技术。
联网技术。
海量存储技术。
数据表示和发布需求。
软件工程方法学和工具。
3.数据仓库的一般特性
(1)数据分为3种类型。
①基准和事务数据。
包括出自源系统的列表、图表和事务数据。
源系统产生的原始数据。
可以保存在数据仓库或在操作数据存储系统中的数据。
定期装入到数据仓库的数据。
数据仓库中从未更改的数据(数据校正或刷新除外)。
可从源系统清除的数据。
从源系统清除而又在数据仓库中存档的数据。
②导出数据。
基于基准数据和一定业务规则的数据。
总能再生成的数据。
必须由最终用户认可的业务规则。
③非正规化数据。
基于详细的基准数据的数据。
周期性准备的数据。
用于OLAP 工具的基础数据。
(2)增强以迭代方式进行。
(3)增强应基于整个体系结构。
(4)一种最终用户工具不一定适应所有的分析需求。应根据所查询的数据量和类型,选择不同的终端用户工具。
(5)不必有事务级数据库恢复。
(6)数据仓库平台应以性能而不是以快速恢复为目标进行调试。