陈老师:1415968548 郑老师:2735197625 乐老师:354331153
客服热线:
19941464235/19906632509(微信同号)

客服微信

【Oracle OCP】外部表

作者:炎燚小寶
发布时间:2023-12-19 09:14
浏览量:441

本文为云贝教育 刘峰 原创,请尊重知识产权,转发请注明出处,不接受任何抄袭、演绎和未经注明出处的转载。

外部表允许 Oracle 查询存储在数据库外部平面文件中的数据。 ORACLE_LOADER 驱动程序可用于访问以任何格式存储,主要由 SQL*Loader 和DUMP加载的数据。不能对外部表执行 DML,但它们可以用于查询、联接和排序操作。可以针对外部表创建视图和同义词。它们在数据仓库的 ETL 过程中非常有用,因为数据不需要暂存并且可以并行查询。它们不应该用于频繁查询的表。


一、ORACLE_LOADER Access Driver

1.1、基础用法

创建一个指向文件位置的目录对象。

mkdir -p /home/oracle/loader_dir

CREATE OR REPLACE DIRECTORY ext_tab_data AS '/home/oracle/loader_dir';
grant read,write on directory ext_tab_data to public;


目标数据如下

#Countries1.txt

ENG,England,English
SCO,Scotland,English
IRE,Ireland,English
WAL,Wales,Welsh

#Countries2.txt
FRA,France,French
GER,Germany,German
USA,Unites States of America,English


使用 CREATE TABLE..ORGANIZATION EXTERNAL 语法创建外部表。这定义了表的元数据,描述它应该如何显示以及如何加载数据。

CREATE TABLE countries_ext (
  country_code      VARCHAR2(5),
  country_name      VARCHAR2(50),
  country_language  VARCHAR2(50)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY ext_tab_data
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
    (
      country_code      CHAR(5),
      country_name      CHAR(50),
      country_language  CHAR(50)
    )
  )
  LOCATION ('Countries1.txt','Countries2.txt')
)
PARALLEL 4
REJECT LIMIT UNLIMITED;


外部表创建后,就可以像普通表一样进行查询。

 SELECT *
    FROM   countries_ext
    ORDER BY country_name;

COUNT COUNTRY_NAME                 COUNTRY_LANGUAGE
----- ---------------------------- -----------------------------
ENG   England                      English
FRA   France                       French
GER   Germany                      German
IRE   Ireland                      English
SCO   Scotland                     English
USA   Unites States of America     English
WAL   Wales                        Welsh

7 rows selected.


如果加载文件尚未保存在适当的目录中,将显示以下结果。

SQL> SELECT *
  2  FROM   countries_ext
  3  ORDER BY country_name;
SELECT *
*
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-04040: file Countries1.txt in EXT_TAB_DATA not found
ORA-06512: at "SYS.ORACLE_LOADER", line 14
ORA-06512: at line 1


一旦表正常运行,我们就可以针对它创建视图。

CREATE OR REPLACE VIEW english_speaking_countries AS
  SELECT *
  FROM   countries_ext
  WHERE  country_language = 'English'
  ORDER BY country_name;

SELECT *
FROM   english_speaking_countries;

COUNT COUNTRY_NAME                 LANGUAGE
----- ---------------------------- ----------------------------
ENG   England                      English
IRE   Ireland                      English
SCO   Scotland                     English
USA   Unites States of America     English

4 rows selected.


默认情况下,加载操作日志创建在与加载文件相同的目录中,但可以使用 LOGFILE 参数更改此设置。


任何无法加载的行都会写入错误文件。默认情况下,坏文件创建在与加载文件相同的目录中,但可以使用 BADFILE 参数更改。
请记住,外部表的每次查询都会导致文件被再次读取,因此请尝试尽可能少地传递外部表。

1.2、查看ALERT日志

以下示例展示了如何使用外部表来查询警报日志的内容。

CREATE OR REPLACE DIRECTORY bdump AS '/opt/oracle/diag/rdbms/orclcdb/ORCLCDB/trace/';

DROP TABLE alert_log;

CREATE TABLE alert_log (
  line  VARCHAR2(4000)
)
ORGANIZATION EXTERNAL
(
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY bdump
  ACCESS PARAMETERS
  (
    RECORDS DELIMITED BY NEWLINE
    BADFILE bdump:'read_alert_%a_%p.bad'
    LOGFILE bdump:'read_alert_%a_%p.log'
    FIELDS TERMINATED BY '~'
    MISSING FIELD VALUES ARE NULL
    (
      line  CHAR(4000)
    )
  )
  LOCATION ('alert_orcl.log ')
)
PARALLEL 1
REJECT LIMIT UNLIMITED
/

SET LINESIZE 1000
SELECT * FROM alert_log;


1.3、PREPROCESSOR


PREPROCESSOR 子句来标识用于在外部表读取文件之前处理文件的目录对象和脚本。 PREPROCESSOR 子句对于读取压缩文件特别有用,因为它们被解压缩并直接通过管道传输到外部表进程中,而无需在文件系统上解压缩。

CREATE OR REPLACE DIRECTORY exec_dir AS '/usr/bin';

drop table countries_ext;

CREATE TABLE countries_ext (
  country_code      VARCHAR2(5),
  country_name      VARCHAR2(50),
  country_language  VARCHAR2(50)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY ext_tab_data
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    PREPROCESSOR exec_dir:'zcat'
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
    (
      country_code      CHAR(5),
      country_name      CHAR(50),
      country_language  CHAR(50)
    )
  )
  LOCATION ('Countries1.txt.gz','Countries2.txt.gz')
)
PARALLEL 5
REJECT LIMIT UNLIMITED;

select * from countries_ext;


必须对包含可执行文件或脚本的目录授予 EXECUTE 权限。


如果您需要指定命令行参数,您应该创建一个脚本来执行该操作,然后在 PREPROCESSOR 子句中调用该脚本。例如,假设我们在可执行目录中有一个名为“my_unzip.sh”的可执行文件,其中包含以下内容。

#!/bin/bash
/bin/gunzip -c $1


我们可以在 PREPROCESSOR 子句中使用以下内容。

PREPROCESSOR exec_dir:'my_unzip.sh'


1.4、相关视图


[DBA|ALL|USER]_EXTERNAL_TABLES 视图显示有关外部表的信息。
SELECT COUNT(*) FROM dba_external_tables;

  COUNT(*)
----------
       194

二、ORACLE_DATAPUMP Access Driver



ORACLE_DATAPUMP 访问驱动程序能够指定将数据卸载到转储文件时使用的压缩级别。此功能需要 Oracle Advanced Compression 选项并将 COMPATIBLE 初始化参数设置为 12.0.0 或更高版本。
CONN scott/tiger@pdb1

-- Delete if it already exists.
DROP TABLE emp_ext;
HOST rm /tmp/emp_ext.dmp

-- Unload EMP table into dump file using compression.
CREATE TABLE emp_ext
  ORGANIZATION EXTERNAL
   (
     TYPE ORACLE_DATAPUMP
     DEFAULT DIRECTORY TEST_DIR
     ACCESS PARAMETERS (COMPRESSION ENABLED MEDIUM) 
     LOCATION ('emp_ext.dmp')
   )
   AS SELECT * FROM emp;

-- Create a new external table using the compressed dump file.
DROP TABLE emp2_ext;

CREATE TABLE emp2_ext (
  EMPNO     NUMBER(4),
  ENAME     VARCHAR2(10),
  JOB       VARCHAR2(9),
  MGR       NUMBER(4),
  HIREDATE  DATE,
  SAL       NUMBER(7,2),
  COMM      NUMBER(7,2),
  DEPTNO    NUMBER(2))
  ORGANIZATION EXTERNAL (
     TYPE ORACLE_DATAPUMP
     DEFAULT DIRECTORY TEST_DIR
     LOCATION ('emp_ext.dmp')
  );

SELECT * FROM emp2_ext;

     EMPNO ENAME      JOB	       MGR HIREDATE	    SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7369 SMITH      CLERK	      7902 17-DEC-80	    800 		   20
      7499 ALLEN      SALESMAN	      7698 20-FEB-81	   1600        300	   30
      7521 WARD       SALESMAN	      7698 22-FEB-81	   1250        500	   30
      7566 JONES      MANAGER	      7839 02-APR-81	   2975 		   20
      7654 MARTIN     SALESMAN	      7698 28-SEP-81	   1250       1400	   30
      7698 BLAKE      MANAGER	      7839 01-MAY-81	   2850 		   30
      7782 CLARK      MANAGER	      7839 09-JUN-81	   2450 		   10
      7788 SCOTT      ANALYST	      7566 19-APR-87	   3000 		   20
      7839 KING       PRESIDENT 	   17-NOV-81	   5000 		   10
      7844 TURNER     SALESMAN	      7698 08-SEP-81	   1500 	 0	   30
      7876 ADAMS      CLERK	      7788 23-MAY-87	   1100 		   20

     EMPNO ENAME      JOB	       MGR HIREDATE	    SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7900 JAMES      CLERK	      7698 03-DEC-81	    950 		   30
      7902 FORD       ANALYST	      7566 03-DEC-81	   3000 		   20
      7934 MILLER     CLERK	      7782 23-JAN-82	   1300 		   10

14 rows selected





三、分区外部表

Oracle Database 12c 第 2 版 (12.2) 中引入了分区外部表,允许外部表受益于分区修剪和分区明智连接。除了哈希分区之外,许多分区和子分区策略都受到一些限制的支持。
Partitioned External Tables

3.1、开始

为了演示外部表,我们需要一些平面文件中的数据。以下代码假脱机输出四个 CSV 文件,每个文件有 1000 行。

CONN / AS SYSDBA
ALTER SESSION SET CONTAINER = pdb1;

SET MARKUP CSV ON QUOTE ON
SET TRIMSPOOL ON LINESIZE 1000 FEEDBACK OFF PAGESIZE 0

SPOOL /tmp/gbr1.txt
SELECT 'GBR',
       object_id,
       owner,
       object_name
FROM   dba_objects
WHERE  object_id <= 2000 AND rownum <= 1000; SPOOL OFF SPOOL /tmp/gbr2.txt SELECT 'GBR', object_id, owner, object_name FROM dba_objects WHERE object_id BETWEEN 2000 AND 3999 AND rownum <= 1000; SPOOL OFF SPOOL /tmp/ire1.txt SELECT 'IRE', object_id, owner, object_name FROM dba_objects WHERE object_id <= 2000 AND rownum <= 1000; SPOOL OFF SPOOL /tmp/ire2.txt SELECT 'IRE', object_id, owner, object_name FROM dba_objects WHERE object_id BETWEEN 2000 AND 3999 AND rownum <= 1000; SPOOL OFF SET MARKUP CSV OFF SET TRIMSPOOL ON LINESIZE 1000 FEEDBACK OFF PAGESIZE 14


创建两个目录对象来访问这些文件。在这种情况下,两者都指向同一个目录,但它仍然允许我们看到语法。

CONN / AS SYSDBA
ALTER SESSION SET CONTAINER = pdb1;

SET MARKUP CSV ON QUOTE ON
SET TRIMSPOOL ON LINESIZE 1000 FEEDBACK OFF PAGESIZE 0

SPOOL /tmp/gbr1.txt
SELECT 'GBR',
       object_id,
       owner,
       object_name
FROM   dba_objects
WHERE  object_id <= 2000 AND rownum <= 1000; SPOOL OFF SPOOL /tmp/gbr2.txt SELECT 'GBR', object_id, owner, object_name FROM dba_objects WHERE object_id BETWEEN 2000 AND 3999 AND rownum <= 1000; SPOOL OFF SPOOL /tmp/ire1.txt SELECT 'IRE', object_id, owner, object_name FROM dba_objects WHERE object_id <= 2000 AND rownum <= 1000; SPOOL OFF SPOOL /tmp/ire2.txt SELECT 'IRE', object_id, owner, object_name FROM dba_objects WHERE object_id BETWEEN 2000 AND 3999 AND rownum <= 1000; SPOOL OFF SET MARKUP CSV OFF SET TRIMSPOOL ON LINESIZE 1000 FEEDBACK OFF PAGESIZE 14


3.2、创建分区外部表


以下示例根据我们之前创建的 CSV 文件创建一个列表分区外部表。每个分区可以有一个单独的位置定义,其中可以选择包含目录定义。如果未定义位置,则分区将被视为空。

CONN test/oracle@pdb1

DROP TABLE part_tab_ext;

CREATE TABLE part_tab_ext (
  country_code  VARCHAR2(3),
  object_id     NUMBER,
  owner         VARCHAR2(128),
  object_name   VARCHAR2(128)
)
ORGANIZATION EXTERNAL (
  TYPE oracle_loader
  DEFAULT DIRECTORY tmp_dir1
  ACCESS PARAMETERS (
    FIELDS CSV WITH EMBEDDED TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    MISSING FIELD VALUES ARE NULL (
      country_code,
      object_id,
      owner,
      object_name 
    )
  )
)
REJECT LIMIT UNLIMITED
PARTITION BY LIST (country_code) (
  PARTITION part_gbr    VALUES ('GBR') LOCATION ('gbr1.txt', 'gbr2.txt'),
  PARTITION part_usa    VALUES ('IRE') DEFAULT DIRECTORY tmp_dir2 LOCATION ('ire1.txt', 'ire2.txt'),
  PARTITION part_others VALUES ('XXX')
);


查询外部表显示文件正在被正确读取。

SELECT country_code, COUNT(*)
FROM   part_tab_ext
GROUP BY country_code
ORDER BY country_code;

COU   COUNT(*)
--- ----------
GBR	  2000
IRE	  2000


如果我们收集统计数据,我们可以看到行是如何在分区之间分布的。

EXEC DBMS_STATS.gather_table_stats(USER, 'part_tab_ext');

SET LINESIZE 120
COLUMN table_name FORMAT A20
COLUMN partition_name FORMAT A20
COLUMN high_value FORMAT A20

SELECT table_name,
       partition_name,
       high_value,
       num_rows
FROM   user_tab_partitions
WHERE  table_name = 'PART_TAB_EXT'
ORDER BY 1, 2;

TABLE_NAME	     PARTITION_NAME	  HIGH_VALUE		 NUM_ROWS
-------------------- -------------------- -------------------- ----------
PART_TAB_EXT	     PART_GBR		  'GBR' 		     2000
PART_TAB_EXT	     PART_OTHERS	  'XXX' 			0
PART_TAB_EXT	     PART_USA		  'IRE' 		     2000


3.3、子分区外部表


以下示例根据我们之前创建的 CSV 文件创建一个列表范围子分区外部表。每个子分区可以有一个单独的位置定义,其中可以选择包含分区或子分区级别的目录定义。

CONN test/test@pdb1

DROP TABLE subpart_tab_ext;

CREATE TABLE subpart_tab_ext (
  country_code  VARCHAR2(3),
  object_id     NUMBER,
  owner         VARCHAR2(128),
  object_name   VARCHAR2(128)
)
ORGANIZATION EXTERNAL (
  TYPE oracle_loader
  DEFAULT DIRECTORY tmp_dir1
  ACCESS PARAMETERS (
    FIELDS CSV WITH EMBEDDED TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    MISSING FIELD VALUES ARE NULL (
      country_code,
      object_id,
      owner,
      object_name 
    )
  )
)
REJECT LIMIT UNLIMITED
PARTITION BY LIST (country_code)
SUBPARTITION BY RANGE (object_id) (
  PARTITION part_gbr VALUES ('GBR') (
    SUBPARTITION subpart_gbr_le_2000 VALUES LESS THAN (2000) LOCATION ('gbr1.txt'),
    SUBPARTITION subpart_gbr_gt_2000 VALUES LESS THAN (MAXVALUE) DEFAULT DIRECTORY tmp_dir2 LOCATION ('gbr2.txt')
  ),
  PARTITION part_ire VALUES ('IRE') DEFAULT DIRECTORY tmp_dir2 (
    SUBPARTITION subpart_ire_le_2000 VALUES LESS THAN (2000) LOCATION ('ire1.txt'),
    SUBPARTITION subpart_ire_gt_2000 VALUES LESS THAN (MAXVALUE) LOCATION ('ire2.txt')
  )
);


查询外部表显示文件正在被正确读取。

SELECT country_code, COUNT(*)
FROM   subpart_tab_ext
GROUP BY country_code
ORDER BY country_code;

COU   COUNT(*)
--- ----------
GBR	  2000
IRE	  2000


如果我们收集统计数据,我们可以看到行是如何在分区之间分布的。