A distributed database is stored on many sites and usually uses the relational data model. It is composed by global tables. A global table of a distributed database can be horizontally fragmented. The fragments can be replicated on the sites of the distributed database. In order to execute a distributed query, data is transferred between sites. In the paper L. Tambulea, A. S. Darabant, V. Varga: "Query Evaluation Optimization in a Distributed Database using Data Reorganization" we propose two methods (mathematical models) to minimize data transfer cost based on information about database storage (the fragments of the database, the dimension of fragments, the sites where the fragments are stored) and query execution (sites where the queries are initiated, for every query the fragments used, an estimation of the queries results obtained by the evaluation of relational algebra operators). These information are generated by Oracle procedures and are stored in tables. Some views were defined in order to obtain information about fragments, sites and queries.
If the fragments which are necessary to execute a relational operator are not stored in the site where the execution of the operator is initiated, the fragments have to be transferred, which implies a data transfer cost. These data transfer costs needed for query executions are determined by some views, which are based on information about fragments, sites and queries.
The proposed methods solve the next problems:
For each method a linear programming problem is genarated. These mathematical programming problems are generated by procedures from Oracle package.
In the following there are presented the resources used, the phases of processing and the results obtained from experimental tests of the proposed methods.
Nr. | Generating data | Available space | Generated files | Data transfer final cost |
---|---|---|---|---|
1 | Commands for generating data (fragm.nr., nodes nr., queries nr., max.reps)=(50, 20, 10000, 100) Initial data transfer cost: 103 MBytes |
update sites set dimlib=dims; |
The mathematical programming problem (817 variables) The solution of the problem Insert commands generated by the problem |
Cost: 78 MB (75.65% from initial cost) |
update sites set dimlib=trunc(dims*1.5); |
The mathematical programming problem (817 variables) The solution of the problem Insert commands generated by the problem |
Cost: 66 MB (64.04% from initial cost) |
||
2 | Commands for generating data (fragm.nr., nodes nr., queries nr., max.reps)=(100, 20, 100000, 1000) Initial data transfer cost: 9612MByte |
update sites set dimlib=dims; |
The mathematical programming problem (1538 variables) The solution of the problem Insert commands generated by the problem |
Cost: 6700MB (69.70% from initial cost) |
update sites set dimlib=trunc(dims*1.5); |
The mathematical programming problem (1538 variables) The solution of the problem Insert commands generated by the problem |
Cost: 5269MB (54.81% from initial cost) |
||
3 | Commands for generating data (fragm.nr., nodes nr., queries nr., max.reps)=(200, 20, 100000, 10000) Initial data transfer cost: 103397MBytes |
update sites set dimlib=dims; |
The mathematical programming problem (3179 variables) The solution of the problem Insert commands generated by the problem |
Cost: 74157MB (71.72% from initial cost) |
update sites set dimlib=trunc(dims*1.5); |
The mathematical programming problem (3179 variables) The solution of the problem Insert commands generated by the problem |
Cost: 60133MB (58.16% from initial cost) |
||
4 | Commands for generating data (fragm.nr., nodes nr., queries nr., max.reps)=(300, 20, 100000, 10000) Initial data transfer cost: 100608MBytes |
update sites set dimlib=dims; |
The mathematical programming problem (4703 variables) The solution of the problem Insert commands generated by the problem |
Cost: 70179MB (69.75% from initial cost) |
update sites set dimlib=trunc(dims*1.5); |
The mathematical programming problem (4703 variables) The solution of the problem Insert commands generated by the problem |
Cost: 55845MB (55.51% from initial cost) |
Nr. | Generating data | Available space | Generated files | Data transfer final cost |
---|---|---|---|---|
1 | Commands for generating data (fragm.nr., nodes nr., queries nr., max.reps)=(50, 20, 10000, 100) Initial data transfer cost: 102MBytes |
update sites set dimlib=dims; |
The mathematical programming problem (1000 variables) The solution of the problem Insert commands generated by the problem 2 |
Cost: 100MB (97.72% from initial cost) |
update sites set dimlib=2*dims; |
The mathematical programming problem (1000 variables) The solution of the problem Insert commands generated by the problem 2 |
Cost: 76MB (74.26% from initial cost) |
||
update sites set dimlib=trunc(dims*2.5); |
The mathematical programming problem (1000 variables) The solution of the problem Insert commands generated by the problem 2 |
Cost: 64MB (62.62% from initial cost) |
||
2 | Commands for generating data (fragm.nr., nodes nr., queries nr., max.reps)=(100, 20, 10000, 1000) Initial data transfer cost: 9612MBytes |
update sites set dimlib=dims; |
The mathematical programming problem (2000 variables) The solution of the problem Insert commands generated by the problem 2 |
Cost: 9480MB (98.62% from initial cost) |
update sites set dimlib=2*dims; |
The mathematical programming problem (2000 variables) The solution of the problem Insert commands generated by the problem 2 |
Cost: 6621MB (68.88% from initial cost) |
||
update sites set dimlib=trunc(dims*2.5); |
The mathematical programming problem (2000 variables) The solution of the problem Insert commands generated by the problem 2 |
Cost: 5222MB (54.33% from initial cost) |
||
3 | Commands for generating data (fragm.nr., nodes nr., queries nr., max.reps)=(200, 20, 100000, 10000) Initial data transfer cost: 103397MBytes |
update sites set dimlib=dims; |
The mathematical programming problem (4000 variables) The solution of the problem Insert commands generated by the problem 2 |
Cost: 101206MB (97.88% from initial cost) |
update sites set dimlib=2*dims; |
The mathematical programming problem (4000 variables) The solution of the problem Insert commands generated by the problem 2 |
Cost: 73096MB (70.69% from initial cost) |
||
update sites set dimlib=trunc(dims*2.5); |
The mathematical programming problem (4000 variables) The solution of the problem Insert commands generated by the problem 2 |
Cost: 59434MB (57.48% from initial cost) |
||
4 | Commands for generating data (fragm.nr., nodes nr., queries nr., max.reps)=(300, 20, 100000, 10000) Initial data transfer cost: 100608MBytes |
update sites set dimlib=dims; |
The mathematical programming problem (6000 variables) The solution of the problem Insert commands generated by the problem 2 |
Cost: 97658MB (97.07% from initial cost) |
update sites set dimlib=2*dims; |
The mathematical programming problem (6000 variables) The solution of the problem Insert commands generated by the problem 2 |
Cost: 68857MB (68.44% from initial cost) |
||
update sites set dimlib=trunc(dims*2.5); |
The mathematical programming problem (6000 variables) The solution of the problem Insert commands generated by the problem 2 |
Cost: 54882MB (54.55% from initial cost) |