Introduction
A distributed file system (DFS) is a client/server-based application that lets clients access and process data stored on a server as if the data resided on their own machines. When a user accesses a file, the server sends a copy of the file to the user, and the copy is cached on the user's computer; the data is processed locally and later returned to the server. Ideally, a distributed file system organizes the file and directory services of the individual servers into a global directory that allows remote data access. Such a system gives every user of the global file system access to all files, which makes the organization hierarchical and directory-based. Because the same data may be accessed by many clients simultaneously, the server must provide mechanisms for maintaining the data so that it stays up to date and can be accessed at any time and from anywhere, and several algorithms are commonly adopted for this purpose. Sun Microsystems' Network File System (NFS), Microsoft's Distributed File System, Novell NetWare, and IBM/Transarc's DFS are examples of distributed file systems.
Distributed file systems are an area in which any organization should wish to engage, given the benefits that distributed systems offer. Primarily, the DFS makes it easy to distribute documents and files to multiple clients, who access the files from a centralized place where they also store their information. The DFS uses a unified namespace that links the shared folders on different servers into a hierarchical structure, which acts as a single high-capacity hard disk from which users can access files. The DFS implements location transparency: data can be migrated from one server to another without users needing to know the physical location of the data source. Distributed file systems also provide storage scalability, since additional high-performance servers can be deployed as new folders in which files are stored. The DFS supports offline folders, whereby an end user can automatically cache programs and run them on the local machine instead of running the applications from the server. Finally, it allows security integration: no additional security needs to be configured for the DFS namespaces, because file and folder access can be enforced by the existing NTFS permissions on each established link target (TechNet, 2003).
Description of algorithms
Remote Differential Compression (RDC) algorithm
The Remote Differential Compression (RDC) algorithm allows data to be synchronized with a remote source with the aid of compression techniques, with the aim of significantly minimizing the amount of data sent across the network. The algorithm is designed to be used with different data transmission protocols, such as RPC and HTTP. The RDC application is responsible for choosing the appropriate transport for moving data between the servers and the clients' machines; it is also responsible for performing any client or server authentication required to support the transport's security model (Microsoft, 2007).
The RDC algorithm divides file data into chunks by computing the local maxima of a fingerprinting function that is evaluated at every byte position in the file. A fingerprinting function is a hash function that can be computed incrementally: if the function F has been computed over a range of bytes Ai...Aj of a file, then F over Ai+1...Aj+1 can be computed incrementally by simply adding the byte Aj+1 and subtracting the byte Ai. In a typical RDC scenario, the server and client hold different versions of a file. The client's copy is called the seed file, while the server's copy is known as the source file. The core principle of an RDC application is to download only the updates to the client; these updates are used to construct a target file that combines the updated data from the source file with the unchanged contents of the seed file.
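The sketch below illustrates the two ideas just described, as a rough approximation rather than Microsoft's actual implementation: a rolling fingerprint that is updated by adding the incoming byte and subtracting the outgoing one, and chunk boundaries placed wherever the fingerprint is a strict local maximum within a fixed horizon. The simple additive hash, window size, and horizon are illustrative assumptions; real RDC uses a stronger fingerprinting function.

    import os

    def rolling_fingerprints(data: bytes, window: int = 16):
        """Yield a fingerprint for every window position, updated incrementally."""
        if len(data) < window:
            return
        f = sum(data[:window])                  # F(A_i ... A_j) for the first window
        yield f
        for i in range(len(data) - window):
            f = f - data[i] + data[i + window]  # subtract A_i, add A_{j+1}
            yield f

    def chunk_boundaries(data: bytes, window: int = 16, horizon: int = 64):
        """Cut a chunk wherever the fingerprint is a strict local maximum."""
        fp = list(rolling_fingerprints(data, window))
        cuts = []
        for i in range(horizon, len(fp) - horizon):
            around = fp[i - horizon:i + horizon + 1]
            if fp[i] == max(around) and around.count(fp[i]) == 1:
                cuts.append(i)
        return cuts

    if __name__ == "__main__":
        print(chunk_boundaries(os.urandom(8192))[:10])

Because the cut points depend only on the surrounding bytes and not on absolute offsets, an insertion or deletion in the source file shifts only the nearby chunk boundaries, which is what lets the client reuse unchanged chunks from the seed file.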
Load Balancing Algorithm in Distributed File System
The Distributed File System is regarded as a key component of cloud computing, serving as a building block for MapReduce programming. DFS nodes simultaneously perform both a computing function and a storage function. Each file is subdivided into chunks that are allocated to distinct DFS nodes. The load balancing algorithm makes it possible to bring thousands of nodes together in a large cloud. The primary goal of file allocation is to ensure that no significant load is placed on any single node, since the files in the cloud are partitioned into chunks spread across different nodes. A further objective of the load balancing algorithm in the distributed file system is to accommodate the network traffic characteristics and network inconsistencies that can unbalance hundreds of nodes. A large number of applications can run within the cloud as a result of the reduction in network bandwidth consumed. The algorithm has a scalability property that allows nodes to be added, deleted, and updated so as to support the heterogeneity of the system (Kalahasti & Velvizhi, 2014).
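As a minimal sketch of the chunk-placement idea described above (not the specific algorithm surveyed by Kalahasti & Velvizhi), each file can be split into fixed-size chunks and every chunk assigned to whichever node currently stores the least data, so that no single node accumulates a disproportionate load. The node names, chunk size, and greedy least-loaded policy are illustrative assumptions.

    import heapq

    def place_chunks(file_sizes, nodes, chunk_size=64 * 2**20):
        """Greedily assign each chunk to the least-loaded node, tracked in a min-heap."""
        heap = [(0, n) for n in nodes]            # (bytes stored so far, node name)
        heapq.heapify(heap)
        placement = {}                            # (file name, chunk index) -> node
        for name, size in file_sizes.items():
            n_chunks = -(-size // chunk_size)     # ceiling division
            for c in range(n_chunks):
                load, node = heapq.heappop(heap)
                placement[(name, c)] = node
                this_chunk = min(chunk_size, size - c * chunk_size)
                heapq.heappush(heap, (load + this_chunk, node))
        return placement

    if __name__ == "__main__":
        files = {"a.log": 200 * 2**20, "b.dat": 90 * 2**20}
        print(place_chunks(files, ["node1", "node2", "node3"]))

A real DFS balancer would also weigh the compute load on each node and rebalance when nodes join or leave, but the same principle of spreading chunks evenly applies.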
The rsync algorithm
The rsync algorithm is used to update a file on one machine so that it becomes identical to a corresponding file on another machine. The algorithm assumes that the two machines are connected by a low-bandwidth, high-latency, bi-directional communications link. The rsync algorithm identifies the parts of the source file on the server that are identical to sections of the destination file on the client machine, and sends only those file sections that cannot be matched in this way. In doing so, the rsync algorithm efficiently computes a set of differences without requiring both files to be present on the same machine. The algorithm works best when the files are similar, but it will also function correctly and reasonably efficiently when the files are quite different (Tridgell & Mackerras, 1996).
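The following simplified sketch conveys the idea: the client hashes fixed-size blocks of its copy, the server scans its own file for blocks whose hash matches, and only unmatched bytes are sent as literals. The real algorithm of Tridgell and Mackerras uses a fast rolling weak checksum plus a strong checksum; this sketch recomputes a strong hash at every offset for brevity, which is slower but produces the same kind of delta. Block size and token encoding here are illustrative assumptions.

    import hashlib

    def block_signatures(seed: bytes, block: int = 512):
        """Map strong hash -> block index for every block of the client's file."""
        return {hashlib.md5(seed[i:i + block]).hexdigest(): i // block
                for i in range(0, len(seed), block)}

    def delta(source: bytes, sigs: dict, block: int = 512):
        """Encode the server's file as ("copy", block_index) and ("literal", bytes) tokens."""
        out, lit, i = [], bytearray(), 0
        while i < len(source):
            h = hashlib.md5(source[i:i + block]).hexdigest()
            if len(source) - i >= block and h in sigs:
                if lit:
                    out.append(("literal", bytes(lit)))
                    lit = bytearray()
                out.append(("copy", sigs[h]))
                i += block
            else:
                lit.append(source[i])
                i += 1
        if lit:
            out.append(("literal", bytes(lit)))
        return out

    if __name__ == "__main__":
        seed = b"a" * 2048
        source = b"a" * 1024 + b"CHANGED" + b"a" * 1024
        print(delta(source, block_signatures(seed)))

The client can then rebuild the server's version by copying the referenced blocks from its own file and splicing in the literal bytes, so only the changed data ever crosses the link.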
Comparison and analysis of algorithms
The comparison and analysis of the algorithms is framed by the aspects that allow distributed file systems to deliver their anticipated objectives. In this case, the comparison and analysis are based on each algorithm's contribution to scalability, transparency, availability, reliability, and network traffic. Scalability describes the ability of the distributed system to grow without disturbing or interrupting the system's performance. Transparency relates to integrity in the sense that end users should not need to understand or locate the physical source of a file; to them, the files appear to reside on the client machine. Availability means that the system should keep functioning when a technical fault occurs, and it goes hand in hand with reliability, the system's ability to deliver its services under difficult circumstances (Depardon, Mahec, & Seguin, 2013).
The table below compares the three described algorithms against the highlighted aspects of distributed systems. Each algorithm is rated as high, medium, or low for each aspect.
Algorithm      | Scalability | Transparency | Availability | Reliability | Network traffic
RDC            | High        | Low          | Medium       | Medium      | Medium
Load balancing | High        | Low          | High         | High        | High
rsync          | High        | Medium       | High         | Low         | Low
Conclusion
Each of the described algorithms makes a positive contribution to ensuring that file sharing is well achieved, and the distributed file system is a technique that many organizations are adopting. The RDC algorithm is strong on scalability and rated average on the other elements. The load balancing algorithm tries to balance everything so that the service is delivered accordingly. On the other side, the rsync algorithm finds good application especially when it comes to the scalability of distributed file systems. However, the algorithms should be advanced further so that they score highly on all of the critical elements mentioned.
References
Depardon, B., Mahec, G. L., & Seguin, C. (2013). Analysis of Six Distributed File Systems. [Research Report], 44 pp. <hal-00789086>
Kalahasti, K. P., & Velvizhi, N. (2014). Load Balancing Algorithm in Distributed File System: A Survey.
Microsoft. (2007). About Remote Differential Compression.
TechNet. (2003). Reviewing the Benefits of Using DFS. Retrieved from https://technet.microsoft.com/en-us/library/cc739590%28v=ws.10%29.aspx
Tridgell, A., & Mackerras, P. (1996). The rsync algorithm.