Thursday, October 18, 2018

Distributed File Systems


Introduction
A distributed file system (DFS) is a client/server-based application that lets clients access and process data stored on a server as if the data were on their own machines. When a user accesses a file, the server sends a copy of the file, which is cached on the user's computer; the data is processed locally and later written back to the server. A distributed file system organizes the file and directory services of the individual servers into a global directory so that remote data can be accessed. All users of the global file system see the same hierarchical, directory-based organization and can reach all files in it. Because the same data may be accessed by many clients simultaneously, the server must provide mechanisms to keep that data up to date and accessible at any time and from anywhere; several algorithms are therefore commonly adopted. Sun Microsystems' Network File System (NFS), Microsoft's Distributed File System, Novell NetWare, and IBM/Transarc's DFS are examples of distributed file systems.
Distributed file systems are an area in which any organization should wish to get involved, given the benefits they offer. Primarily, a DFS makes it easy to distribute documents and files to multiple clients, who access them from a centralized place where the information is also stored. A DFS provides a unified namespace that links shared folders on different servers into a hierarchical structure, which behaves like a single high-capacity hard disk from which users can retrieve files. It implements location transparency by simplifying data migration from one server to another without users having to know the physical location of the data source. Distributed file systems also offer storage scalability, since additional high-performing servers can be deployed as new folders in which files are stored. A DFS supports offline folders, whereby an end user can automatically cache programs and run them on the local machine instead of running the application from the server. Finally, it integrates with existing security: no additional security needs to be configured for the DFS namespaces, because file and folder access can be enforced by the existing NTFS permissions on each established link target (TechNet, 2003).
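As a rough illustration of the unified namespace and location transparency described above, the following minimal Python sketch maps a logical folder path to whichever server share currently holds the data. The class, paths, and share names are hypothetical and do not correspond to Microsoft's DFS APIs; it only shows the idea that a caller never needs to know the physical location.

# dfs_namespace.py - hypothetical sketch of a unified DFS namespace
class DfsNamespace:
    def __init__(self):
        # logical path -> list of physical link targets (server shares)
        self.links = {}

    def add_link(self, logical_path, server_share):
        """Register a shared folder on some server under one logical path."""
        self.links.setdefault(logical_path, []).append(server_share)

    def resolve(self, logical_path):
        """Return a physical target for a logical path; the caller never sees
        which server actually stores the data (location transparency)."""
        targets = self.links.get(logical_path)
        if not targets:
            raise FileNotFoundError(logical_path)
        return targets[0]  # a real system might load-balance or fail over here

ns = DfsNamespace()
ns.add_link("/corp/reports", r"\\server01\reports")
ns.add_link("/corp/reports", r"\\server02\reports_replica")
print(ns.resolve("/corp/reports"))  # the caller only ever uses /corp/reports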
Description of algorithms
Remote Differential Compression (RDC) algorithm
The Remote Differential Compression (RDC) algorithm synchronizes data with a remote source with the aid of compression techniques, aiming to significantly reduce the amount of data sent across the network. The algorithm is designed to be used with different transport protocols, such as RPC and HTTP. The RDC application is responsible for choosing the appropriate transport for moving data between servers and client machines, and for performing any client or server authentication required to support the transport's security model (Microsoft, 2007).
The RDC algorithm divides file data into chunks by computing the local maxima of a fingerprinting function evaluated at every byte position in the file. The fingerprinting function is a hash function that can be computed incrementally: if the function F has been computed over a range of bytes Ai...Aj of the file, then F over the range Ai+1...Aj+1 can be obtained incrementally by adding the contribution of byte Aj+1 and subtracting that of byte Ai. In a typical RDC scenario, the server and the client hold different versions of a file. The client's copy is called the seed file, while the server's copy is known as the source file. The core principle of an RDC application is to download only the updates to the client; these updates are used to construct the target file from the changed contents of the source file and the unchanged contents already present in the seed file.
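The chunking step can be illustrated with a short Python sketch. This is not Microsoft's actual RDC implementation; the additive fingerprint, window size, and horizon are assumptions chosen only to show how an incremental fingerprint and local-maxima cut points work.

# rdc_chunking.py - illustrative sketch of local-maxima chunking (simplified,
# not Microsoft's RDC; hash, window and horizon values are made up)

def rolling_fingerprints(data, window=16):
    """Incremental fingerprint: add the entering byte, subtract the leaving one."""
    fp = sum(data[:window])
    fps = [fp]
    for j in range(window, len(data)):
        fp += data[j] - data[j - window]  # F(Ai+1..Aj+1) = F(Ai..Aj) + Aj+1 - Ai
        fps.append(fp)
    return fps

def cut_points(data, window=16, horizon=32):
    """A position is a chunk boundary if its fingerprint is strictly larger than
    those of the `horizon` positions on either side (a local maximum)."""
    fps = rolling_fingerprints(data, window)
    cuts = []
    for p in range(horizon, len(fps) - horizon):
        neighbourhood = fps[p - horizon:p] + fps[p + 1:p + horizon + 1]
        if fps[p] > max(neighbourhood):
            cuts.append(p + window)  # boundary placed at the end of the window
    return cuts

data = bytes(range(256)) * 8           # toy data; real input is file content
print(cut_points(data)[:5])            # first few chunk boundaries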
Load Balancing Algorithm in Distributed File System
The distributed file system is regarded as a key component of cloud computing, serving as the building block of MapReduce programming. DFS nodes simultaneously perform both computing and storage functions. Each file is subdivided into chunks that are allocated to distinct DFS nodes. A load balancing algorithm makes it possible to add thousands of nodes together in a large cloud. The primary goal when allocating files is to ensure that no significant load is placed on any single node, since the files in the cloud are partitioned into chunks spread across the different nodes. A further objective of load balancing in a distributed file system is to cope with the network traffic characteristics and network inconsistencies that would otherwise unbalance hundreds of nodes. Because the network bandwidth consumed is reduced, a large number of applications can run within the cloud. The algorithm has a scalability property that allows nodes to be added, deleted, and updated so as to support the heterogeneity of the system (Kalahasti & Velvizhi, 2014).
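As a rough illustration of the chunk-placement idea, the following Python sketch assigns each new chunk to the node that currently holds the fewest chunks and lets new nodes join at any time. The node and chunk names and the least-loaded placement policy are assumptions for illustration, not the specific algorithm from the cited survey; real systems also weigh replication, capacity, and network topology.

# chunk_balancer.py - simplified sketch of balancing file chunks across DFS nodes
class ChunkBalancer:
    def __init__(self, nodes):
        # current number of chunks held by each node
        self.loads = {n: 0 for n in nodes}

    def place_chunk(self, chunk_id):
        """Assign a chunk to the node currently holding the fewest chunks."""
        node = min(self.loads, key=self.loads.get)
        self.loads[node] += 1
        return node

    def add_node(self, node):
        """Scale out: new nodes start empty and naturally attract new chunks."""
        self.loads[node] = 0

balancer = ChunkBalancer(["node-1", "node-2", "node-3"])
placement = {f"chunk-{i}": balancer.place_chunk(f"chunk-{i}") for i in range(9)}
balancer.add_node("node-4")   # heterogeneity: nodes can join (or leave) at any time
print(placement)
print(balancer.loads)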
The rsync algorithm
The rsync algorithm is widely applied for updating a file on one machine so that it becomes identical to a file on another machine. It assumes that the two machines are connected by a low-bandwidth, high-latency, bi-directional communications link. The rsync algorithm identifies the parts of the source file on the server that are identical to sections of the destination file on the client machine, and sends only those sections that cannot be matched in this way. It thus efficiently computes a set of differences without both files ever being present on the same machine. The algorithm works best when the files are similar, but it will also function correctly and reasonably efficiently when the files are quite different (Tridgell & Mackerras, 1996).
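The following simplified Python sketch illustrates the idea behind rsync under the assumption of fixed-size blocks and a single MD5 checksum per block. The real algorithm additionally uses a weak rolling checksum so that matches can be found at every byte offset, and it pairs that with a stronger hash; both refinements are omitted here for brevity, so this is only a sketch of the principle.

# rsync_sketch.py - simplified illustration of the rsync idea: checksum blocks of
# the old file, then scan the new file and reuse blocks the other side already has
import hashlib

BLOCK = 8

def block_signatures(old_data):
    """Receiver side: checksum every fixed-size block of the old file."""
    sigs = {}
    for i in range(0, len(old_data), BLOCK):
        digest = hashlib.md5(old_data[i:i + BLOCK]).digest()
        sigs[digest] = i
    return sigs

def delta(new_data, sigs):
    """Sender side: emit ('copy', offset) for blocks the receiver already has,
    and ('literal', bytes) for data that actually has to be transmitted."""
    ops, pos, literal = [], 0, b""
    while pos < len(new_data):
        digest = hashlib.md5(new_data[pos:pos + BLOCK]).digest()
        if digest in sigs:
            if literal:
                ops.append(("literal", literal))
                literal = b""
            ops.append(("copy", sigs[digest]))
            pos += BLOCK
        else:
            literal += new_data[pos:pos + 1]
            pos += 1
    if literal:
        ops.append(("literal", literal))
    return ops

old = b"the quick brown fox jumps over the lazy dog " * 2
new = old[:24] + b"NEW DATA" + old[24:]
print(delta(new, block_signatures(old)))   # mostly 'copy' ops, one small 'literal'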
 Comparison and analysis of algorithms
The comparison and analysis of the algorithms is framed by the aspects that allow a distributed file system to deliver its intended objectives. In this case, the comparison was based on each algorithm's impact on scalability, transparency, availability, reliability, and network traffic. Scalability describes the ability of the distributed system to grow without disturbing or interrupting the system's performance. Transparency is related to integrity in the sense that the end user should not need to know where the source files physically reside; the files should simply appear to be on the client machine. Availability means that the system keeps functioning when technical problems occur, and it goes hand in hand with reliability, the ability of the system to deliver its services under difficult circumstances (Depardon, Mahec, & Seguin, 2013).
The table below compares the three described algorithms against the highlighted aspects of distributed systems. Each algorithm is rated as high, medium, or low for every aspect.


Algorithm        Scalability   Transparency   Availability   Reliability   Network traffic
RDC              High          Low            Medium         Medium        Medium
Load balancing   High          Low            High           High          High
rsync            High          Medium         High           Low           Low
Conclusion
The described algorithms each contribute positively to ensuring that file sharing is well achieved, and the distributed file system is a technique that many organizations are adopting. The RDC algorithm is good at ensuring scalability and is rated around average for the other elements. The load balancing algorithm attempts to balance all aspects so that the service is delivered as expected. The rsync algorithm, for its part, is well suited especially where the scalability of the distributed file system is concerned. However, the algorithms should be further developed so that they score highly on all of the critical elements mentioned.
References
Depardon, B., Mahec, G. L., & Seguin, C. (2013). Analysis of Six Distributed File Systems. [Research report], 44 pp. <hal-00789086>

Kalahasti, K. P., & Velvizhi, N. (2014). Load Balancing Algorithm in Distributed File System: A Survey.

Microsoft. (2007). About Remote Differential Compression.

TechNet. (2003). Reviewing the Benefits of Using DFS. Retrieved from https://technet.microsoft.com/en-us/library/cc739590%28v=ws.10%29.aspx

Tridgell, A., & Mackerras, P. (1996). The rsync algorithm.

