Serializability theory is a mathematical tool for proving whether a scheduler works correctly (Zhang & Elmagarmid, 1993). The theory represents a concurrent execution as a set of transactions using a structure known as a history, and it specifies the properties a history must satisfy to be serializable. Consider a database D = (x, y, z) on which a set of transactions T1, T2, ..., Tn must execute concurrently. We need a way of knowing whether the execution of those transactions took place in the right manner. An execution is correct if and only if it is equivalent to some serial execution of the transactions. We use logs to determine whether a concurrent execution was serializable. A log is a record of the operations performed by each transaction, and logs are central to the study of serializability.
Two logs are equivalent if the executions that produced them leave the database in the same final state. Concretely, each read operation should read from the same write operation in both logs, and both logs should contain the same final writes. Such logs are what we consider equivalent.
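To make log equivalence concrete, the following Python sketch (the log format and transaction names are illustrative assumptions, not taken from the cited sources) represents a log as an ordered list of read/write operations and compares two logs by their reads-from relationships and their final writes.

```python
# A log is an ordered list of (transaction, action, item) tuples,
# e.g. ("T1", "r", "x") means transaction T1 reads item x.

def reads_from(log):
    """For each read, record which transaction's write it reads from
    (the most recent earlier write on the same item, or None)."""
    pairs = set()
    last_writer = {}                      # item -> transaction of latest write seen
    for txn, action, item in log:
        if action == "r":
            pairs.add((txn, item, last_writer.get(item)))
        elif action == "w":
            last_writer[item] = txn
    return pairs

def final_writes(log):
    """Map each item to the transaction that wrote it last."""
    last_writer = {}
    for txn, action, item in log:
        if action == "w":
            last_writer[item] = txn
    return last_writer

def equivalent(log_a, log_b):
    """Two logs are equivalent if every read reads from the same write
    and the final write on every item is the same."""
    return (reads_from(log_a) == reads_from(log_b)
            and final_writes(log_a) == final_writes(log_b))

# Example: a concurrent log versus the serial log "T1 then T2".
concurrent = [("T1", "r", "x"), ("T1", "w", "x"),
              ("T2", "r", "x"), ("T1", "w", "y"), ("T2", "w", "x")]
serial = [("T1", "r", "x"), ("T1", "w", "x"), ("T1", "w", "y"),
          ("T2", "r", "x"), ("T2", "w", "x")]
print(equivalent(concurrent, serial))     # True for this pair
```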
out if a log is serializable, we draw a serialization graph. We construct that
graph as follows. Let transactions T1 to
Tn be nodes in the graph. There will be a directed graph between Ti to Tj if
only for some x one of the following rules hold.
- ri[x] < wj[x], or
- wi[x] < rj[x], or
- wi[x] < wj[x].
The serializability theorem states that a log H is serializable if and only if its serialization graph SG(H) is acyclic (Microsoft.com, n.d.). If we can determine a serial history Hs consistent with all edges in SG(H), we can conclude that H is equivalent to Hs. That is how the theory supports concurrency control: it covers both read and write operations, as long as the serialization graph we construct is acyclic.
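A minimal Python sketch of this construction (illustrative only; the log format and transaction names are assumptions) builds the serialization graph from the three conflict rules above, tests it for cycles, and, when the graph is acyclic, returns an equivalent serial order via a topological sort.

```python
from collections import defaultdict

def serialization_graph(log):
    """Add an edge Ti -> Tj whenever an operation of Ti on item x
    precedes a conflicting operation of Tj on x (r-w, w-r, or w-w)."""
    edges = defaultdict(set)
    txns = {txn for txn, _, _ in log}
    for i, (ti, ai, x) in enumerate(log):
        for tj, aj, y in log[i + 1:]:
            if ti != tj and x == y and (ai == "w" or aj == "w"):
                edges[ti].add(tj)
    return txns, edges

def serial_order(txns, edges):
    """Kahn's algorithm: return a topological order if the graph is
    acyclic (i.e. the log is serializable), otherwise None."""
    indegree = {t: 0 for t in txns}
    for src in edges:
        for dst in edges[src]:
            indegree[dst] += 1
    ready = [t for t in txns if indegree[t] == 0]
    order = []
    while ready:
        t = ready.pop()
        order.append(t)
        for dst in edges[t]:
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    return order if len(order) == len(txns) else None

# Non-serializable log: T1 -> T2 (r-w on x) and T2 -> T1 (w-w on y) form a cycle.
cyclic = [("T1", "r", "x"), ("T2", "w", "x"), ("T2", "w", "y"), ("T1", "w", "y")]
print(serial_order(*serialization_graph(cyclic)))        # None

# Serializable log: only the edge T1 -> T2 exists, so the graph is acyclic.
acyclic = [("T1", "w", "x"), ("T2", "r", "x"), ("T2", "w", "y")]
print(serial_order(*serialization_graph(acyclic)))       # ['T1', 'T2']
```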
Why it is desirable to integrate Hadoop with big data to support big data analytics
Organizations are using real-time big data analytics to reshape the landscape of their industries. They achieve that by capturing, storing, and analyzing volumes of data that were previously unmanageable, and from that analysis they extract insights that support real-time business processes. They use Apache Hadoop to achieve that. These businesses have realized that Hadoop helps them analyze large volumes of data without regard to the order in which that data arrives, because Hadoop provides effective means to reorganize it. Big data, Hadoop, and advanced analytics are most useful when integrated, because together they form an evolving analytics ecosystem (SAS Institute Inc., n.d.). Integrating Hadoop into big data analysis gives organizations real-time analytics and consequently maximizes their business value. For example, with Hadoop, enterprises can analyze the clickstream trails of online customers in conjunction with historical buying patterns to provide personalized information to those customers. Integrating Hadoop with big data supports deep analysis across a variety of datasets, which in turn improves outcomes in such cases. It also makes it possible to deliver results quickly, which benefits online transactions. In addition, Hadoop offers analytic algorithms, including predictive analytics, that aid big data analytics (TDWI, 2014).
Hadoop enables query performance and data capacity to scale cost-effectively across hundreds of two-socket servers based on Intel Xeon processors with direct-attached storage drives. Integrating Hadoop with big data also provides hot replication, whereby multiple replicas of frequently used data are created automatically, avoiding contention. When a company launches a popular product whose associated data is in continuous demand, hundreds of replicas can be generated and manipulated without bottlenecks with the help of Hadoop. Once the big data is in Hadoop, companies can perform the traditional ETL tasks of normalizing, aggregating, cleansing, and aligning data for their enterprise data warehouse (EDW) by employing MapReduce's massive scalability (1105 Media Inc., 2014). Hadoop also helps analytics teams avoid transformation issues in their traditional ETLT pipelines, because it off-loads the ingestion, integration, and transformation of unstructured data before it reaches their warehouses.
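As one concrete illustration of that kind of MapReduce-based ETL, the sketch below follows the Hadoop Streaming convention, in which the mapper and reducer read lines from standard input and emit tab-separated key/value pairs. The clickstream record layout is an assumption made only for this example.

```python
#!/usr/bin/env python
# Counts clicks per page from records assumed to look like
# "user_id,page,timestamp" (a hypothetical layout).
import sys

def mapper():
    """Emit "page \t 1" for each well-formed click record (cleansing step)."""
    for line in sys.stdin:
        fields = line.strip().split(",")
        if len(fields) == 3:                 # skip malformed records
            _, page, _ = fields
            print(f"{page}\t1")

def reducer():
    """Hadoop Streaming delivers mapper output sorted by key, so clicks
    for the same page arrive together; sum them per page (aggregation)."""
    current_page, count = None, 0
    for line in sys.stdin:
        page, value = line.rstrip("\n").split("\t")
        if page != current_page:
            if current_page is not None:
                print(f"{current_page}\t{count}")
            current_page, count = page, 0
        count += int(value)
    if current_page is not None:
        print(f"{current_page}\t{count}")

if __name__ == "__main__":
    # In practice the mapper and reducer would live in separate scripts
    # passed to Hadoop's streaming jar; here a flag selects which to run.
    reducer() if "--reduce" in sys.argv else mapper()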
Integrating the technology into big data to support analysis is imperative because it is a good fit for the iterative analysis that traditionally required building a data warehouse. SQL on Hadoop does not replace the data warehouse, but it offers an alternative to the more expensive software and appliances needed for particular types of analysis (Marian & Thompson, 2014). The presence of SQL on Hadoop also gives enterprises a way to depend less on costly high-end business analysts and data scientists. Intensive analysis of big data requires the data to be in the right place when it is needed; moving data across systems is costly and time consuming, and that slows down business operations. Hadoop makes it possible to perform big data analysis where the data sits, without having to move it.
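Analyzing data where it sits can be sketched, for example, with Spark SQL over files already stored in HDFS; the cited sources do not name a specific SQL-on-Hadoop engine, and the paths, table names, and columns below are assumptions made only for illustration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("clickstream-analysis").getOrCreate()

# Read the clickstream and purchase history where they sit in HDFS,
# without copying them out to another system first.
clicks = spark.read.csv("hdfs:///data/clickstream", header=True, inferSchema=True)
clicks.createOrReplaceTempView("clicks")

purchases = spark.read.parquet("hdfs:///data/purchases")
purchases.createOrReplaceTempView("purchases")

# Join recent clicks with historical purchases, e.g. to drive personalization.
top_pairs = spark.sql("""
    SELECT c.user_id, p.product_id, COUNT(*) AS views
    FROM clicks c
    JOIN purchases p ON c.user_id = p.user_id
    GROUP BY c.user_id, p.product_id
    ORDER BY views DESC
    LIMIT 10
""")
top_pairs.show()
```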
References
1105 Media Inc. (2014). TDWI checklist report: Eight considerations for utilizing big data analytics with Hadoop.
Marian, G., & Thompson, W. (2014). Big data analytics and Hadoop.
Microsoft.com (n.d.). Serializability theory.
SAS Institute Inc. (n.d.). Hadoop.
TDWI (2014). Eight considerations for utilizing big data analytics with Hadoop (on-demand webinar).
Zhang, A., & Elmagarmid, K. (1993). A theory of global concurrency control in multidatabase systems. The VLDB Journal, 2, 331-360.