Sunday, October 27, 2019
Characteristics of Java Language
Chapter 2 Literature review
About Java:- The Java language was initially named Oak in 1991 and was designed for consumer electronic appliances. It was developed by James Gosling, a development lead at Sun Microsystems. In 1995 Oak was redesigned for developing applications that run over the Internet, and its name was changed to Java. Java programs can be embedded in HTML pages, but Java is not limited to web applications: it is also used to develop standalone applications. Java supports object-oriented programming (OOP), which makes it familiar to many programmers. Object-oriented programming has largely replaced older procedural programming techniques.
Characteristics of Java:-
Simple:- Java is simpler than earlier languages such as C and C++. It eliminates the pointers that are present in C and C++. Java also allocates memory automatically and reclaims it through garbage collection, whereas in C/C++ allocating and freeing memory is the programmer's job, which is a complex task.
Object oriented:- Earlier mainstream languages such as C are procedural, organizing programs around procedures. Java is object oriented: a program is built by creating objects and making them work together, and the overall functionality of the program depends on those objects. Because of this, Java offers a high degree of reusability, modularity and flexibility.
Distributed:- Java's libraries support Internet protocols such as HTTP and FTP, so Java programs can easily access and transfer files over a network connected to the Internet.
Interpreted:- An interpreter is needed to run Java programs.
When a Java program is compiled, the compiler produces bytecode. Bytecode is an intermediate code that the Java Virtual Machine understands, not the native instructions of any particular processor. Because bytecode is machine independent, it can run on any system that has a Java interpreter.

Most compilers translate high-level language instructions into low-level machine code, because a machine cannot understand high-level instructions directly. Such machine code can only be executed on the kind of native machine it was compiled for. For example, an executable produced by compiling source code on Windows cannot be executed on other platforms. Java is different: the source code is compiled once, and the resulting bytecode can run on any platform using a Java interpreter. The interpreter's main job is to convert the bytecode into the machine language of the target machine.
Robust and secure:- Java programs are more reliable because Java detects many errors early, at compile time, and checks for others at run time. Error-prone language constructs have been eliminated. For example, Java has no pointers, so a program cannot corrupt data by overwriting arbitrary memory locations. Java also supports exception handling, which makes it more reliable and robust. For checked exceptions, the compiler forces the programmer to write code that handles exceptions that may occur during execution. As a result, the program can deal with the error and end cleanly instead of stopping abruptly. Java also provides strong security. Security is important on a network, because a computer can be attacked by external programs. Java addresses this by restricting what applets from untrusted sources are allowed to do.
Architecture-neutral:- Java is an interpreted language, which makes it architecture-neutral, i.e. platform independent.
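Architecture neutrality has a concrete consequence, which also underlies the fixed numeric ranges mentioned in this section. The Java Language Specification fixes the size and overflow behaviour of the primitive types instead of leaving them to the host platform. The class and method names below are illustrative only; this is a minimal sketch, not part of any library.

```java
// Illustrative class: primitive sizes and int overflow are defined by the
// Java language itself, so these results are the same on every JVM.
class PrimitiveSizes {
    // int arithmetic wraps around in two's complement on every platform,
    // so MAX_VALUE + 1 is always MIN_VALUE.
    static int wrapAround() {
        int max = Integer.MAX_VALUE;
        return max + 1;
    }

    public static void main(String[] args) {
        System.out.println("byte  = " + Byte.SIZE + " bits");
        System.out.println("short = " + Short.SIZE + " bits");
        System.out.println("char  = " + Character.SIZE + " bits");
        System.out.println("int   = " + Integer.SIZE + " bits, range "
                + Integer.MIN_VALUE + " .. " + Integer.MAX_VALUE);
        System.out.println("long  = " + Long.SIZE + " bits");
        System.out.println("Integer.MAX_VALUE + 1 = " + wrapAround());
    }
}
```

Because these values come from the language definition rather than from the processor or operating system, the same class file computes the same results wherever it runs.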
A program can be written once and executed on any platform with the help of the Java Virtual Machine (JVM). The JVM can be embedded in the operating system or in a web browser. When Java code is loaded into the machine, it is verified. Bytecode verification plays a major role, because it checks that the code generated by the compiler cannot corrupt the machine on which it is loaded. The JVM performs this verification when classes are loaded, to make sure the code is valid and correct. Bytecode verification is therefore integral to compiling and executing Java programs.

Because Java is architecture-neutral, it is portable: a program, once written, can run on any platform without recompilation. Java does not provide platform-specific features. In other languages, such as Ada, the size of large integers varies with the platform the program runs on. In Java, the ranges of the numeric types are fixed. The Java environment is portable to every operating system and hardware platform.
Multi-threaded:- Multithreading is a program's ability to perform several tasks (or functions) simultaneously. Multithreading is built into the Java language. Java programs can therefore perform several tasks simultaneously without calling operating system procedures, as other programming languages must do to use multithreading.
Constant Pool:- Every class in Java has an array of constants, called the constant pool, which is available to that class. It is created by the Java compiler. The constants encode all the names (of methods, variables and constants) used by any method of the class. Each class stored in heap memory records how many constants it has. It also records an offset that specifies how far into the class description itself the array of constants begins (Laura Lemay, Charles L. Perkins, and Michael Morrison, n.d.). When the constants appear in the .class file of a Java class, they are stored in specially coded bytes with a well-defined format. JVM instructions refer to this symbolic information rather than relying on the run-time layout of classes, methods and fields.
Sun Java Wireless Toolkit:- The Sun Java Wireless Toolkit for CLDC (Connected Limited Device Configuration) is a set of tools for developing applications for mobile phones and other wireless devices. The toolkit is based on MIDP (Mobile Information Device Profile), but it also supports many optional packages, which makes it a good tool for developing many kinds of applications. It runs on Windows and Linux. All users who have an account on the host machine can use it, either individually or simultaneously. It also supports many standard Application Programming Interfaces (APIs) defined through the Java Community Process (JCP) program. The toolkit lets you use a bytecode obfuscator to reduce the size of your MIDlet suite JAR file. It does not ship with an obfuscator, but it is configured to support ProGuard: you simply download ProGuard and place it where the toolkit can find it. Because the tool is flexible, other obfuscators can be used as well.
BCEL:- BCEL stands for Byte Code Engineering Library. BCEL lets you dig into the bytecode of Java classes.
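To make the constant pool concrete, the following stdlib-only sketch reads the first fields of a compiled class file: the magic number, the version and `constant_pool_count`. It reads them in the order the JVM specification lays them out. BCEL performs this kind of parsing, in far more detail, before it hands you a JavaClass. The class name `ClassFileHeader` is illustrative. `java.lang.String` is used only because its class file is always available.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Reads the fixed-size header at the start of a class file, following the
// JVM specification layout:
//   u4 magic, u2 minor_version, u2 major_version, u2 constant_pool_count
class ClassFileHeader {
    final int magic;
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount; // number of pool entries plus one (index 0 is unused)

    private ClassFileHeader(int magic, int minorVersion, int majorVersion, int constantPoolCount) {
        this.magic = magic;
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    // Opens the .class resource of cls (e.g. "/java/lang/String.class") and reads its header.
    static ClassFileHeader read(Class<?> cls) throws IOException {
        String resource = "/" + cls.getName().replace('.', '/') + ".class";
        try (InputStream raw = cls.getResourceAsStream(resource)) {
            if (raw == null) {
                throw new IOException("class file not found: " + resource);
            }
            DataInputStream in = new DataInputStream(raw); // class files are big-endian
            int magic = in.readInt();
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            int poolCount = in.readUnsignedShort();
            return new ClassFileHeader(magic, minor, major, poolCount);
        }
    }

    public static void main(String[] args) throws IOException {
        ClassFileHeader h = read(String.class);
        System.out.println("magic:               0x" + Integer.toHexString(h.magic));
        System.out.println("version:             " + h.majorVersion + "." + h.minorVersion);
        System.out.println("constant_pool_count: " + h.constantPoolCount);
    }
}
```

The magic number is always 0xCAFEBABE. The version and pool count depend on the JDK that produced the class file.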
BCEL gives the utmost control over the code, because it works at the level of individual JVM instructions, although this power comes at a cost in complexity. With BCEL we can transform existing classes or construct new ones. The main difference between BCEL and Javassist is that Javassist provides a source-code-level interface, whereas BCEL is designed to work at the level of JVM assembly language. This low-level approach is useful when the program has to be controlled at the instruction level, but it also makes BCEL more complex to work with than Javassist. BCEL can inspect, edit and create binary Java classes.

BCEL has two hierarchies of components: one is used to inspect existing classes, and the other to create new code or edit (update) existing code. The inspection part of BCEL largely duplicates what the Java platform already offers through the Reflection API. This duplication is necessary in classworking, because we generally do not want to load the classes we are working on until they have been fully modified. The org.apache.bcel.classfile package provides all the definitions related to inspection, and the org.apache.bcel package provides the basic constant definitions. The starting point of the classfile package is the JavaClass class. JavaClass plays the same role in accessing class information with BCEL that java.lang.Class plays in regular Java reflection. JavaClass has methods to get structural information about the superclass and interfaces, and information about the fields and methods of the class. JavaClass also provides access to some internal information about the class, including the constant pool and identifiers.
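BCEL is a separate library. As a dependency-free analogue, the sketch below gathers the same kind of structural information (superclass, interfaces, fields, methods) through the standard Reflection API, the java.lang.Class route that JavaClass is compared to. The key difference still applies: reflection requires the class to be loaded into the JVM, which is exactly what BCEL avoids. `ClassInspector` and `describe` are hypothetical names, not BCEL API.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: summarises a loaded class's structure via reflection,
// roughly the information BCEL's JavaClass offers without loading the class.
class ClassInspector {
    static List<String> describe(Class<?> cls) {
        List<String> lines = new ArrayList<>();
        lines.add("class " + cls.getName());
        Class<?> superclass = cls.getSuperclass(); // null for Object and interfaces
        lines.add("superclass " + (superclass == null ? "(none)" : superclass.getName()));
        for (Class<?> iface : cls.getInterfaces()) { // directly implemented interfaces
            lines.add("interface " + iface.getName());
        }
        for (Field field : cls.getDeclaredFields()) {
            lines.add("field " + field.getType().getSimpleName() + " " + field.getName());
        }
        for (Method method : cls.getDeclaredMethods()) {
            lines.add("method " + method.getName() + "/" + method.getParameterCount());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describe(java.util.ArrayList.class)) {
            System.out.println(line);
        }
    }
}
```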
JavaClass also holds the complete binary class representation as a byte stream. Once the binary class has been parsed, we can create a JavaClass instance for it. To handle parsing, BCEL provides the class org.apache.bcel.Repository. By default, BCEL parses and caches the representations of classes found on the JVM class path. It obtains the actual binary class representations from an org.apache.bcel.util.Repository instance; org.apache.bcel.util.Repository is an interface that acts as a source of binary class representations.
Changing the classes:- Besides giving access to the components of a class, org.apache.bcel.classfile.JavaClass also provides methods for changing (altering) the class: they set class components to new values. These methods are of little direct use, however, because the other classes in the package do not support constructing new versions of the components. Instead, the classes in the org.apache.bcel.generic package provide editable versions of the components represented by the org.apache.bcel.classfile classes.

org.apache.bcel.generic.ClassGen is the starting point for creating new classes. It is also useful for modifying existing classes: a ClassGen constructor takes a JavaClass instance and initializes the ClassGen with that class's information. Once you have made your changes, you obtain a usable class representation from the ClassGen instance by calling its getJavaClass() method. The resulting JavaClass can then be converted into binary class data. This is a little confusing, so it can help to write a wrapper class that hides some of these differences.
To manage the construction of the various class components, org.apache.bcel.generic provides many other classes besides ClassGen. ConstantPoolGen handles the constant pool, and FieldGen and MethodGen handle the fields and methods of a class. InstructionList is used to work with sequences of JVM instructions. org.apache.bcel.generic also provides a class for every type of instruction executed by the JVM. Instances of these instruction classes can sometimes be created directly. At other times they are created through the helper class org.apache.bcel.generic.InstructionFactory. The main advantage of this helper class is that it handles all the bookkeeping details of constructing each instruction for us (for example, adding items to the constant pool as the instructions require).
SandMark:- SandMark is a tool developed to measure two things: the performance of software protection algorithms, and the effectiveness of methods that protect software against piracy, tampering and reverse engineering. SandMark can also determine which algorithm is most resilient to attacks while imposing the least performance overhead. Many software protections have been proposed, in both software and hardware, ranging from hardware dongles to tamper-proof software. SandMark was developed for implementing and evaluating software-based techniques such as code obfuscation (making code difficult to understand) and software watermarking.
History of reverse engineering:- Reverse engineering most probably started with DOS (Disk Operating System) based computer games. The aim was to give the player full life and weapons so that the player could finish the final stage of the game.
This is how reverse engineering came into the picture. Players located the memory locations where the number of lives and weapons were stored and modified the values at those locations, so that they could get through the final stage and win the game. That is why memory cheating tools such as Game Hack came into existence.
Reverse Engineering:- Reverse engineering is the process of understanding particular aspects of a program. Its purpose is to identify the components of the system and the interrelationships between them, and then to enhance those components and improve the performance and scalability of the system (or subsystem).

Software reverse engineering is a technique that converts the machine code of a program (the strings of 0s and 1s executed by the processor) back into programming language statements, i.e. source code. It is done to learn how particular parts of the program perform particular operations. The aim may be to improve the program's functionality, to fix bugs, or to find malicious blocks of statements in the software, if any. Reverse engineering was traditionally applied to machines in older industries, but it is now frequently used on computer hardware and software. It can reveal important content to third parties, such as data formats, the algorithms used to implement the software, and the ideas of the programmer or company, violating security and privacy. Reverse engineering is evolving as a major link in the software life cycle, but its growth is hampered by confusion (Elliot J. Chikofsky and James H. Cross II, Jan 1990). Reverse engineering is generally used to improve the quality of a product and to study competitors' products.
Forward engineering is the process of moving from high-level abstractions and the initial requirements (objectives, constraints and a proper solution to the problem) to the final product. It passes through logical, implementation-independent designs (the specification of the solution) and ends in the implementation (coding and testing). Reverse engineering is the process of moving from the final product back towards the initial requirements. Its aim is to understand the system logically and why a particular function or action is performed. Once the system is understood logically, its flaws and errors can be rectified and its functionality improved, even when the source code of the application is not available. This is why reverse engineering techniques evolved.
Fig 1: Reverse engineering and related processes are transformations between or within abstraction levels, represented here in terms of life-cycle phases (Elliot J. Chikofsky and James H. Cross II, Jan 1990).
Reverse engineering in and of itself does not mean changing the subject system or developing a new system based on the existing one. It is a process of examination (understanding) of the program or software, not of replication or change. Reverse engineering covers a very broad range of activities: starting from the existing implementation, recapturing or recreating the design ideas, and extracting the actual requirements of the existing system. Design recovery is the most vital subset of reverse engineering. In design recovery, domain knowledge, external information and deduction or fuzzy reasoning are added to the observations of the subject system. The goal is to identify meaningful higher-level abstractions of the system, beyond those obtained directly by examining the system itself.
According to Ted Biggerstaff, design recovery recreates design abstractions from a combination of code, existing design documentation (if available), personal experience, and general knowledge about problem and application domains. Design recovery must reproduce
Re-engineering, also known as renovation or reclamation, is the examination and alteration of a subject system to reconstitute it in a new form, followed by the implementation of that new form. Re-engineering generally involves some form of reverse engineering (to obtain a more abstract description of the existing system) followed by forward engineering. The forward engineering step may include changes to meet new requirements that were not previously implemented in the system. Re-engineering is not a supertype of forward and reverse engineering, but it uses both of them.
Objectives:- The primary goal of reverse engineering is to increase the overall comprehensibility of the system for both maintenance and new development. Its objectives include the following.
Cope with complexity. To deal with the complexity and sheer volume of systems, better methods, i.e. automated support, must be developed. Reverse engineering methods and tools should be combined with CASE environments to extract the relevant information, so that decision makers can control the process and the product during system evolution.
Generate alternative views. Graphical representations have long been accepted as comprehension aids, but creating and maintaining them is becoming difficult. Reverse engineering facilitates the generation or regeneration of graphical representations in other forms.
Many designers work with a single kind of diagram, such as data flow diagrams. Reverse engineering tools can generate other graphical representations, such as control flow diagrams, entity-relationship diagrams and structure charts, to aid review and verification.
Detect side effects. Both haphazard initial design and intentional modifications to a system can lead to unintended ramifications and side effects that affect system performance. Reverse engineering can provide observations beyond those available from a forward engineering perspective. This helps to detect such ramifications and anomalies before users report them as bugs.
Facilitate reuse. Software reusability is becoming an essential part of developing new software products. Reverse engineering can help detect candidates for reusable components in existing systems.
Recover lost information. The continuing evolution of long-lived systems leads to the loss of information about the system design. The design recovery techniques of reverse engineering are used to preserve and recover this information. Many reverse engineering tools try to extract the structure of legacy systems and pass this information to software engineers, who can then re-engineer or reverse engineer the existing components.
Code reverse engineering:- During the evolution of software, many changes are applied to the code: to add new functionality, to correct defects and to enhance the system's performance or quality. For systems with poor documentation, the code is the only reliable source of information about the system. As a result, the process of reverse engineering focuses on understanding the code. Reverse engineering can therefore serve both good and bad ends.
Obfuscation:- Java gives programs platform independence, so that they run on any platform. Java programs are compiled into an intermediate code format, the class file. A class file contains a large amount of information about the program's methods, variables and constants, enough to allow reverse engineering.

Suppose a company develops a program in Java and sells it to another organization in this intermediate format, without providing the original source code. The buying organization could modify the software simply by applying reverse engineering, violating the developer's security and privacy. Such reverse engineering is carried out by developers with the help of automated tools and decompilers. Java bytecode can be decompiled easily, which makes reverse engineering especially easy in Java.

In a programming context, obfuscation means making program code more difficult to read and understand, for the security and privacy of the software. Because decompilers can easily extract source code from compiled code, keeping the code secret is otherwise practically impossible. As a result, obfuscators have grown rapidly in number as a way of keeping a smoke screen around the code. Code obfuscation is one of the most prominent and effective methods for protecting Java code: it makes the program difficult to understand, so the code becomes more resistant to reverse engineering. There are two kinds of obfuscation: source code obfuscation and bytecode obfuscation. Source code obfuscation changes the source code of the program. Bytecode obfuscation changes the class files of the program while keeping the same functionality as the original. There are several obfuscation techniques for protecting Java bytecode against decompilation.
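The claim that a class file carries enough information for reverse engineering is easy to check. The sketch below walks the constant pool of a class file and collects the CONSTANT_Utf8 entries, which is where class, method and field names are kept. It uses the entry tags and sizes defined in the JVM specification. The example reads `java.lang.String` only so that it needs no external files; the same walk works for any class whose .class file can be found as a resource. The names `ConstantPoolStrings` and `utf8Entries` are illustrative.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Illustrative parser: walks a class file's constant pool (layout and tag
// numbers from the JVM specification, chapter 4) and keeps the CONSTANT_Utf8
// entries, which hold class, method and field names and descriptors.
class ConstantPoolStrings {
    static List<String> utf8Entries(Class<?> cls) throws IOException {
        String resource = "/" + cls.getName().replace('.', '/') + ".class";
        try (InputStream raw = cls.getResourceAsStream(resource)) {
            if (raw == null) {
                throw new IOException("class file not found: " + resource);
            }
            DataInputStream in = new DataInputStream(raw);
            in.readInt();                          // magic (0xCAFEBABE)
            in.readUnsignedShort();                // minor_version
            in.readUnsignedShort();                // major_version
            int count = in.readUnsignedShort();    // constant_pool_count
            List<String> result = new ArrayList<>();
            for (int i = 1; i < count; i++) {      // valid indices are 1..count-1
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1:                        // Utf8: u2 length + modified UTF-8,
                        result.add(in.readUTF());  // the same encoding readUTF expects
                        break;
                    case 7: case 8: case 16: case 19: case 20:
                        skip(in, 2);               // Class, String, MethodType, Module, Package
                        break;
                    case 15:
                        skip(in, 3);               // MethodHandle
                        break;
                    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                        skip(in, 4);               // Integer, Float, refs, NameAndType, Dynamic, InvokeDynamic
                        break;
                    case 5: case 6:
                        skip(in, 8);               // Long and Double occupy two pool slots
                        i++;
                        break;
                    default:
                        throw new IOException("unknown constant pool tag " + tag);
                }
            }
            return result;
        }
    }

    private static void skip(DataInputStream in, int n) throws IOException {
        in.readFully(new byte[n]); // readFully never skips short, unlike skipBytes
    }

    public static void main(String[] args) throws IOException {
        List<String> names = utf8Entries(String.class);
        System.out.println(names.size() + " UTF-8 constants in java/lang/String.class, for example:");
        names.stream().limit(15).forEach(System.out::println);
        System.out.println("contains method name \"length\": " + names.contains("length"));
    }
}
```

Every method and field name of the class shows up in this list as plain text, which is the raw material a decompiler uses to rebuild readable source.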
For example, consider a set of class files S that an obfuscator turns into another set of class files S'. The class files in S and S' are different, but they produce the same output.
Example:-
    class OHello {
        public OHello() {
            int num = 1;
        }
        public String gHello(String hname) {
            return hname;
        }
    }
When the above code is passed through a simple obfuscator (such as KlassMaster), the following code is generated:
    class aa {
        public static boolean aa;
        public aa() {
            int aa = 1;
        }
        public String aa(String ba) {
            return ba;
        }
    }
In the obfuscated code, the class name OHello has been changed to aa, and the method name gHello has also been changed to aa. A program that uses names like aa is much harder to read than one that uses OHello, so less information is available to reverse engineers. This simple example only renames the class, its variables and its methods.
Categories of obfuscation techniques:-
Description of obfuscation techniques:- One way obfuscators obscure a program is to replace a symbol in the class file with a string that is illegal in Java source. The replacement might be a keyword such as private, or even worse, ***. Another technique, aimed at specific decompilers (Mocha and Jode), is to insert a bad instruction into the code. For example, take the following original code (shown disassembled):
    Method void main(java.lang.String[])
    0 new #4
    3 invokespecial #10
    6 return
After obfuscation the code is as follows (names are left unchanged to keep the example simple):
    Method void main(java.lang.String[])
    0 new #4
    3 invokespecial #10
    6 return
    7 pop
A pop instruction has been added after the return instruction. A decompiler expects the return instruction to be the last statement of the method. The inserted pop can never be reached, because it follows the return, but decompilers that do not expect code after the final return fail on it.
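The renaming in the OHello example can be sketched as a table that maps each original identifier to a short, meaningless name. This is a simplified, hypothetical scheme. The obfuscator output shown above reuses aa for the class, a field and a method, whereas this sketch gives every identifier a distinct name. It also ignores the scoping and overloading rules that a real obfuscator has to respect.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical renaming table: assigns "a", "b", ..., "z", "aa", "ab", ...
// to identifiers in the order they are first seen.
class NameScrambler {
    // Converts 0 -> "a", 25 -> "z", 26 -> "aa", 27 -> "ab", ... (bijective base 26).
    // Note: it does not skip Java keywords such as "do" or "if". A source-level
    // renamer would have to; bytecode-level renaming may use them, because the
    // JVM does not reserve Java keywords.
    static String shortName(int index) {
        StringBuilder sb = new StringBuilder();
        int n = index + 1;
        while (n > 0) {
            n--;
            sb.append((char) ('a' + n % 26));
            n /= 26;
        }
        return sb.reverse().toString();
    }

    // Maps each distinct original identifier to the next unused short name.
    static Map<String, String> buildMapping(List<String> identifiers) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String id : identifiers) {
            if (!mapping.containsKey(id)) {
                mapping.put(id, shortName(mapping.size()));
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = buildMapping(Arrays.asList("OHello", "gHello", "hname", "num"));
        mapping.forEach((original, scrambled) -> System.out.println(original + " -> " + scrambled));
        // OHello -> a, gHello -> b, hname -> c, num -> d
    }
}
```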
Lexical obfuscation:- Lexical obfuscation changes the lexical structure of a program by scrambling its identifiers. The names of classes, fields and methods carry the meaningful symbolic information of a Java program; lexical obfuscation renames them all to meaningless names. An example of a lexical obfuscator is Crema. An obfuscator is a program that automatically transforms class files to obfuscate them. Its purpose is to defeat reverse engineering techniques that reproduce source code from class files.
Layout obfuscation:- Layout obfuscation changes the layout structure of the program, mainly by two basic methods: renaming identifiers and removing debugging information. Both make the program code less informative to reverse engineers. Layout obfuscation techniques use one-way transformations, such as renaming identifiers to random symbols and removing comments, unused methods and debugging information. Reverse engineers can still understand code obfuscated in this way, but it increases the cost of reverse engineering. Layout obfuscation techniques are the most commonly used form of code obfuscation, and almost all Java obfuscators use them.
Control obfuscation:- Control obfuscation changes the control flow of the program. It is an easy technique to apply, and it makes it harder for a reverse engineer to find out what the code actually does. For example, consider code that contains a method A(). A new method called A_Dummy() will be created, and in the program
Data obfuscation:- Data obfuscation mainly deals with breaking up the data structures used in the program and encrypting literals. This includes changing the inheritance hierarchy, restructuring arrays, making the variable names constant, etc. In this way data obfuscation affects the data structures of the program, which makes it very difficult to recover the original source code.
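Literal encryption, one of the data obfuscation steps listed above, can be sketched as follows. Instead of storing a string constant directly, the obfuscated program stores an encoded form and decodes it at run time, so the text no longer appears verbatim among the class file's string constants. The XOR scheme and the key are hypothetical and deliberately simple. Anyone who runs the decoder recovers the string, so this only raises the effort needed, which is the stated aim of obfuscation.

```java
import java.util.Arrays;

// Illustrative literal encryption: encode() is what an obfuscator would run at
// build time; decode() is what the obfuscated program runs in place of the literal.
class LiteralEncoding {
    private static final int KEY = 0x5A; // hypothetical key, combined with the position below

    static int[] encode(String plain) {
        int[] encoded = new int[plain.length()];
        for (int i = 0; i < plain.length(); i++) {
            encoded[i] = plain.charAt(i) ^ (KEY + i);
        }
        return encoded;
    }

    static String decode(int[] encoded) {
        char[] chars = new char[encoded.length];
        for (int i = 0; i < encoded.length; i++) {
            chars[i] = (char) (encoded[i] ^ (KEY + i)); // XOR with the same value restores the char
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        int[] hidden = encode("licence check failed");
        System.out.println(Arrays.toString(hidden)); // the numbers embedded instead of the string
        System.out.println(decode(hidden));          // prints: licence check failed
    }
}
```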
More viable source code obfuscation methods are based on composite functions: Array Index Transformation, Method Argument Transformation and Hiding Constants. Obfuscation techniques based on composite functions make the computation complex, and extensive use of them makes the software respond slowly. Some source code obfuscation methods target object-oriented concepts: Class Coalescing, Class Splitting and Type Hiding. Other source code obfuscation techniques include false refactoring, restructuring arrays, inlining and outlining methods, cloning methods, splitting variables, converting static data to procedural data, and merging scalar variables. Some of these techniques may distort the logic of the software, so they must be used carefully. This applies to the techniques that work on object-oriented concepts, and to restructuring arrays, splitting variables and merging scalar variables. Techniques such as outlining methods, cloning methods and converting static data to procedural data increase the size of the class file without providing any significant advantage. Inlining a method (and removing the original) results in an unresolved method call when some other class still calls the inlined method.
Advanced obfuscation techniques for bytecode:- There are several obfuscation techniques for protecting Java bytecode from decompilation. Many of these tools simply change the identifier names stored in the bytecode to meaningless names. Many crackers can still understand the actual source code after the identifier names have been changed, but it takes them more time. Traditionally, when a program is compiled to machine code, most of the symbolic information is stripped off. In the compiled program, variables and functions are referred to by their addresses rather than by their identifiers. Decompiling such code is difficult, but it is still possible.
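Array Index Transformation, listed above among the composite-function methods, can be illustrated with a hypothetical wrapper. Logical element i is stored at physical slot f(i) = (a*i + b) mod n. Choosing a coprime to n makes f a permutation of 0..n-1, so elements never collide, yet the stored order no longer matches the order a reader of decompiled code would expect. A real obfuscator would inline the index arithmetic instead of using a class. The extra arithmetic on every access is the kind of slowdown the text mentions.

```java
import java.util.Arrays;

// Hypothetical illustration of array index transformation.
class ScrambledArray {
    private final int[] slots;
    private final int a; // multiplier, must be coprime to slots.length
    private final int b; // offset

    ScrambledArray(int n, int a, int b) {
        if (n <= 0 || a <= 0 || b < 0) {
            throw new IllegalArgumentException("need n > 0, a > 0, b >= 0");
        }
        if (gcd(a, n) != 1) {
            throw new IllegalArgumentException("a must be coprime to n");
        }
        this.slots = new int[n];
        this.a = a;
        this.b = b;
    }

    private static int gcd(int x, int y) {
        while (y != 0) {
            int t = x % y;
            x = y;
            y = t;
        }
        return x;
    }

    // Maps a logical index to its physical slot: (a * i + b) mod n.
    private int physical(int i) {
        if (i < 0 || i >= slots.length) {
            throw new ArrayIndexOutOfBoundsException(i);
        }
        return (int) (((long) a * i + b) % slots.length);
    }

    void set(int i, int value) { slots[physical(i)] = value; }

    int get(int i) { return slots[physical(i)]; }

    // Copy of the underlying storage, to show the scrambled layout.
    int[] rawLayout() { return slots.clone(); }

    public static void main(String[] args) {
        ScrambledArray arr = new ScrambledArray(5, 3, 1);
        for (int i = 0; i < 5; i++) {
            arr.set(i, i * 10);
        }
        System.out.println(arr.get(2));                       // 20
        System.out.println(Arrays.toString(arr.rawLayout())); // [30, 0, 20, 40, 10]
    }
}
```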
A protection technique is considered strong only if cracking it costs the cracker a great deal of time and effort. If cracking a piece of software takes longer than rewriting the program, cracking it is of no use. Java became popular because of the benefits it provides. One of the major benefits is portability: a compiled program can run on any platform, because compilation produces platform-independent bytecode. Java uses symbolic references rather than traditional memory addresses. Therefore, the names of methods, variables and types are stored in the constant pool within the bytecode file. There are many commercial decompilers (P C, 2001, Vliot 1996, hoeniche 2001 etc.). Decompiling a program yields code almost identical to the original source code, which makes decompilers a lethal weapon for intellectual property piracy. Obfuscation techniques are used to counter decompilation of bytecode. The main aim of obfuscation is to make the decompiled program harder to understand, i.e. to require more time and effort to understand the obfuscated code.
Obfuscation scope:- A Java application consists of one or more packages. A programmer may divide the program into packages, and may also use packages from the standard library and from proprietary libraries. Only the part of the program written by the developer is distributed by the developer; the proprietary libraries are not, because of copyright restrictions. The obfuscation scope is the part of the program that is obfuscated. Only the part of the software written by the developer is protected, not the entire software. The packages of the standard library and of proprietary libraries, which serve as utilities, are not obfuscated.
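The scope rule just described can be expressed as a simple filter over fully qualified class names. In the sketch, `com.example.app` stands in for the developer's own package and is purely hypothetical. Everything outside it (java.*, javax.*, third-party libraries) keeps its name, because the JVM and the unobfuscated libraries look those classes up by their original names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical scope filter: only classes in the developer's own packages
// are candidates for renaming.
class ObfuscationScope {
    private final List<String> ownPackages;

    ObfuscationScope(List<String> ownPackages) {
        this.ownPackages = ownPackages;
    }

    boolean inScope(String className) {
        for (String pkg : ownPackages) {
            // Compare against "pkg." so that com.example.app does not also
            // match an unrelated package such as com.example.application.
            if (className.startsWith(pkg + ".")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ObfuscationScope scope = new ObfuscationScope(Arrays.asList("com.example.app"));
        String[] classes = {
            "com.example.app.billing.Invoice",
            "com.example.application.Helper",
            "java.util.ArrayList",
            "org.apache.bcel.classfile.JavaClass"
        };
        for (String c : classes) {
            System.out.println((scope.inScope(c) ? "obfuscate  " : "keep name  ") + c);
        }
    }
}
```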
Candidates considered for identifier scrambling:- An identifier can denote the following in Java http://www.cis.nctu.edu.tw/~wuuyang/papers/Obfuscation20011123.doc the bytecode file. Parameter and local variable names are normally not kept in the bytecode. They are stored in the LocalVariableTable attribute only when local variable debugging information is enabled. By default, the javac compiler generates only line number and source file debugging information; local variable information must be requested with the -g option. If no local variable information is found, decompilers create their own names for local variables and parameters, which makes the reverse engineered program somewhat understandable. Even if we rename the variables and parameters in the LocalVariableTable, a good decompiler will simply
When a Java program is compiled, the compiler produces bytecode, an intermediate code that is understood by the Java Virtual Machine rather than by any particular processor. Because this bytecode is machine independent, it can run on any system that has a Java interpreter. Most compilers convert high-level language instructions into low-level machine code, since a machine cannot understand high-level instructions directly, and that machine code can only be executed on the kind of native machine it was compiled for. For example, if source code is compiled on the Windows platform, the executable file produced cannot be executed on any platform other than Windows. Java is different: the source code is compiled once, and the resulting bytecode can be run on any platform using the Java interpreter, whose main job is to convert the bytecode into the machine language of the target machine.
Robust and secure:- Java programs are more reliable than programs written in many earlier languages. Many errors are caught at compile time by Java's strict type checking, and errors that can only be detected during execution, such as an array index out of bounds, are reported at run time instead of silently corrupting memory. Bad and error-prone language constructs have been eliminated from Java.
Java eliminates constructs such as pointers, so a program cannot corrupt data by overwriting arbitrary memory locations. Java also supports exception handling, which makes it more reliable and robust: for checked exceptions, the compiler forces the programmer to write code that handles the exceptions that may occur during execution, so that errors can be dealt with gracefully instead of abruptly stopping the program. Java also provides strong security, which is important on a network, where a computer can be attacked by external programs. For example, applets from untrusted sources are run with restricted permissions.
Architecture-neutral:- Because Java is an interpreted language, it is architecture neutral, i.e. platform independent. A program can be written once and executed on any platform with the help of the Java Virtual Machine (JVM). The JVM can be embedded in the operating system or in a web browser. When Java code is loaded into the machine, it is verified. Bytecode verification plays a major role, as it checks that the code produced by the compiler cannot corrupt the machine on which it is loaded. Verification is performed before the code runs, to make sure the code is well formed and correct, so it is an integral part of loading and executing Java code.
Because Java is architecture neutral, it is portable: a program, once written, can run on any platform without recompilation. The Java language has no platform-specific features. In other languages, such as Ada, the size of large integer types varies with the platform the program runs on, but in Java the ranges of the numeric types are fixed (for example, an int is always 32 bits). The Java environment is portable across operating systems and hardware.
Multi-threaded:- Multithreading is a program's ability to perform several tasks (or functions) simultaneously.
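The definition above can be illustrated with a minimal sketch that uses the standard java.util.concurrent classes (ExecutorService, CompletableFuture) instead of calling the operating system directly. The class name ThreadsDemo and the helper sumRange are hypothetical, chosen only for this example: the range 0..n-1 is split in half and each half is summed on its own thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Minimal sketch: two independent tasks run at the same time on JDK-managed threads.
public class ThreadsDemo {

    // Sums the integers in [from, to); stands in for any independent piece of work.
    static long sumRange(long from, long to) {
        long total = 0;
        for (long i = from; i < to; i++) {
            total += i;
        }
        return total;
    }

    // Splits 0..n-1 into two halves and sums each half on its own thread.
    static long parallelSum(long n) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            long mid = n / 2;
            CompletableFuture<Long> low =
                    CompletableFuture.supplyAsync(() -> sumRange(0, mid), pool);
            CompletableFuture<Long> high =
                    CompletableFuture.supplyAsync(() -> sumRange(mid, n), pool);
            return low.join() + high.join();   // join() waits for each task to finish
        } finally {
            pool.shutdown();                   // let the worker threads exit
        }
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(1_000_000)); // 499999500000
    }
}
```

Because join() waits for both tasks, the result is the same as a sequential loop; only the work is spread over two threads.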
Multithreading support is built into the Java language itself. A Java program can perform several tasks simultaneously without directly calling operating-system procedures, which is how many other programming languages implement multithreading.
Constant Pool:- Every class in Java has an array of constants, called the constant pool, that is available to that class. It is created by the Java compiler. The constants encode the names of all the methods, variables and constants used by the methods of the class. Each class held in memory records how many constants there are, and an offset that specifies how far into the class description the array of constants begins (Laura Lemay, Charles L. Perkins, and Michael Morrison, n.d.). When the constants appear in a Java .class file, they are encoded as specially coded bytes with a well-defined format. JVM instructions refer to this symbolic information rather than relying on the run-time layout of classes, methods and fields.
Sun Java Wireless Toolkit:- The Sun Java Wireless Toolkit for CLDC (Connected Limited Device Configuration) is a set of tools for developing applications for mobile phones and other wireless devices. Although it is based on MIDP (Mobile Information Device Profile), it also supports many optional packages, which makes it a good tool for developing a wide range of applications. It runs on Windows and Linux, and all users who have an account on the host machine can use it, either one at a time or simultaneously. It allows you to use a bytecode obfuscator to reduce the size of your MIDlet suite JAR file.
It also supports many other standard Application Programming Interfaces (APIs) defined through the Java Community Process (JCP) program. Although the Sun Java Wireless Toolkit does not come with an obfuscator, it is configured to support ProGuard: all you need to do is download ProGuard and place it where the toolkit can find it. Because the toolkit is flexible, other obfuscators can be used as well.
BCEL:- BCEL stands for Byte Code Engineering Library. BCEL lets you dig into the bytecode of Java classes. It gives the utmost power over the code because it works at the level of individual JVM instructions, although this power comes at the cost of complexity. Using BCEL, we can transform existing classes or construct new ones. The main difference between BCEL and Javassist is that Javassist provides a source-code-level interface, whereas BCEL is designed to work at the level of JVM assembly language. BCEL's low-level approach is very helpful for controlling a program at the instruction level, but it is more complex to work with than Javassist. BCEL can inspect, edit and create binary Java classes. It has two hierarchies of components: one for inspecting existing classes (org.apache.bcel.classfile) and one for creating new classes or modifying existing ones (org.apache.bcel.generic). The class-inspection side of BCEL largely duplicates what the Java platform already offers through the Reflection API. This duplication is necessary in classworking, because we generally do not want to load the classes we are working on until they have been fully modified. The org.apache.bcel.classfile package provides all the definitions related to inspection, and the org.apache.bcel package provides the basic constant definitions.
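For comparison with BCEL's inspection classes, the following minimal sketch uses the standard Reflection API that BCEL's inspection side duplicates. It is plain JDK code, not BCEL: java.lang.Class answers the same kinds of questions (superclass, interfaces, fields, methods), but only for a class that has already been loaded, which is exactly what classworking tools try to avoid. The nested class Greeter and the helper method names are hypothetical; the BCEL equivalents are not shown because they need the BCEL library on the class path.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Minimal sketch using the JDK Reflection API (java.lang.Class), not BCEL.
public class ReflectionInspect {

    // Hypothetical sample class to inspect.
    static class Greeter implements Comparable<Greeter> {
        private String name = "world";

        public String greet() {
            return "Hello, " + name;
        }

        @Override
        public int compareTo(Greeter other) {
            return name.compareTo(other.name);
        }
    }

    // Sorted, de-duplicated names of the methods declared directly in c
    // (the compiler-generated bridge compareTo(Object) collapses into one entry).
    static List<String> declaredMethodNames(Class<?> c) {
        return Arrays.stream(c.getDeclaredMethods())
                .map(Method::getName)
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    // Sorted names of the fields declared directly in c.
    static List<String> declaredFieldNames(Class<?> c) {
        return Arrays.stream(c.getDeclaredFields())
                .map(Field::getName)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Class<?> c = Greeter.class; // reflection needs the class to be loaded first
        System.out.println("superclass: " + c.getSuperclass().getName());
        System.out.println("interfaces: " + Arrays.toString(c.getInterfaces()));
        System.out.println("fields:     " + declaredFieldNames(c));
        System.out.println("methods:    " + declaredMethodNames(c));
    }
}
```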
JavaClass is the starting point of this package. It plays the same role in accessing class information with BCEL as java.lang.Class does with regular Java reflection. JavaClass has methods to get structural information about the superclass and interfaces, and information about the fields and methods of the class. It also gives access to some internal information about the class, including the constant pool and identifiers, and it can return the complete binary representation of the class as a stream of bytes. A JavaClass instance is created by parsing an actual binary class. To handle the parsing, BCEL provides the class org.apache.bcel.Repository. By default, BCEL parses and caches the representations of the classes found on the JVM class path, obtaining the actual binary class representations from an org.apache.bcel.util.Repository instance; org.apache.bcel.util.Repository is an interface that acts as a source of binary class representations.
Changing the classes:- Besides giving access to the components of a class, org.apache.bcel.classfile.JavaClass also provides methods that let you change the class: each component can be set to a new value. These methods are of little direct use, however, because the other classes in the package do not support constructing new versions of the components. Instead, the org.apache.bcel.generic package provides editable versions of the same components represented by the org.apache.bcel.classfile classes. org.apache.bcel.generic.ClassGen is the starting point for creating new classes.
ClassGen is also useful for modifying existing classes: it has a constructor that takes a JavaClass instance and initializes the ClassGen's class information from it. Once you have made your changes, you get a usable class representation from the ClassGen instance by calling its method that returns a JavaClass, which can then be converted into binary class data. This is a little confusing, so it helps to write a wrapper class that hides some of the differences.
To manage the construction of the various class components, org.apache.bcel.generic provides many other classes besides ClassGen. ConstantPoolGen handles the constant pool, and FieldGen and MethodGen handle the fields and methods of a class. For working with sequences of JVM instructions there is InstructionList. org.apache.bcel.generic also provides a class for every type of instruction executed by the JVM. Sometimes instances of these classes can be created directly; at other times it is easier to use the helper class org.apache.bcel.generic.InstructionFactory. The main advantage of this helper class is that it handles the bookkeeping details of constructing each instruction for us (for example, adding items to the constant pool as the instructions require).
SandMark:- SandMark is a tool developed to measure the performance of software protection algorithms and the effectiveness of methods that protect software against piracy, tampering and reverse engineering. SandMark can also help find which algorithm is most resilient to attacks while imposing the least performance overhead.
Many software protections have been proposed, both in software and in hardware. Protections range from hardware dongles to, more recently, tamper-proof software. SandMark was developed to implement and evaluate software-based techniques such as code obfuscation (making code hard to understand), software watermarking and tamper-proofing.
History of reverse engineering:- Software reverse engineering most probably started with DOS (Disk Operating System) based computer games. The aim was to give the player unlimited lives and weapons to finish the final stage of the game. This is how reverse engineering came into the picture: the cheater simply found the memory locations where the number of lives and weapons were stored and modified their values, so that the player could get through the final stage and win the game. That is why memory cheating tools such as Game Hack came into existence.
Reverse Engineering:- Reverse engineering is the process of understanding particular aspects of a program, namely identifying the components of the system and the interrelationships between them, and enhancing those components to improve the performance and scalability of the system (or subsystem). Software reverse engineering converts the machine code of a program (the string of 0s and 1s sent to the processor) back into programming language statements, i.e. source code. It is done to learn how particular parts of the program perform particular operations, in order to improve the program's functionality, fix bugs, or find malicious blocks of statements in the software, if any. Reverse engineering was traditionally applied to machines in older industries, but it is now frequently used on computer hardware and software.
Using reverse engineering, important content such as data formats, the algorithms the programmer used to implement the software, and the ideas of the programmer or company can be revealed to a third party, violating security and privacy. Reverse engineering is evolving as a major link in the software life cycle, but its growth is hampered by confusion over terminology (Elliot J. Chikofsky and James H. Cross II, Jan 1990). Reverse engineering is generally used to improve the quality of a product and to study competitors' products.
Forward engineering is the process of moving from high-level abstractions, i.e. the initial requirements (objectives, constraints and the intended solution to the problem) and logical, implementation-independent designs (the specification of the solution), to the final product, i.e. the implementation (coding and testing). Reverse engineering moves in the opposite direction, from the final product back towards the requirements, in order to understand the system logically: why a particular function or action is performed. By understanding the system logically, its flaws and errors can be corrected and its functionality improved even when the source code of the application is not available. Reverse engineering techniques evolved for this purpose.
Fig 1: Reverse engineering and related processes are transformations between or within abstraction levels, represented here in terms of life-cycle phases (Elliot J. Chikofsky and James H. Cross II, Jan 1990).
Reverse engineering in and of itself does not mean changing the subject system or developing a new system based on it. It is a process of examination (understanding the program or software), not of replication or change. Reverse engineering covers a very broad range of activities: starting from the existing implementation, recapturing or recreating the design ideas, and extracting the actual requirements of the existing system.
Design recovery is the most vital subset of reverse engineering, because in it domain knowledge, external information, and deduction or fuzzy reasoning are added to the observations of the subject system in order to identify higher-level abstractions that cannot normally be obtained by observing the system directly. According to Ted Biggerstaff, design recovery recreates design abstractions from a combination of code, existing design documentation (if available), personal experience, and general knowledge about problem and application domains. Design recovery must reproduce
Re-engineering, also termed renovation or reclamation, is the examination and alteration of a subject system to reconstitute it in a new form, followed by the implementation of that new form. Re-engineering generally involves some form of reverse engineering, to obtain a more abstract description of the existing system, followed by forward engineering, which may add new requirements that were not previously implemented in the system. Re-engineering is not a supertype of forward engineering and reverse engineering, but it uses both.
Objectives:- The primary goal of reverse engineering is to increase the overall comprehensibility of the system for both maintenance and new development.
Cope with complexity:- To deal with the complexity and sheer volume of systems, we have to develop better methods, i.e. automated support. Reverse engineering methods and tools should be combined with CASE environments to extract the relevant information, so that decision makers can control the process and the product during system evolution.
Generate alternative views:- Comprehension aids such as graphical representations have long been accepted, but creating and maintaining them is becoming difficult.
Reverse engineering facilitates the generation or regeneration of graphical representations in other forms. While many designers work from a single diagram, such as a data flow diagram, reverse engineering tools can generate other graphical representations, such as control flow diagrams, entity-relationship diagrams and structure charts, to aid the review and verification process.
Identify side effects:- Both a haphazard initial design and intentional modifications to the system can lead to unintended ramifications and side effects that affect system performance. Reverse engineering can reveal these better than a forward engineering perspective can, so the anomalies can be fixed before users report them as bugs.
Component reuse:- Software reusability is becoming an increasingly essential part of developing new software products. Reverse engineering can help detect candidates for reusable components in existing systems.
Recover lost information:- The continuing evolution of long-lived systems leads to a loss of information about their design. Design recovery techniques from reverse engineering are used to recover this information. Many reverse engineering tools try to extract the structure of legacy systems in order to pass this information to software engineers who need to re-engineer or reverse engineer existing components.
Code reverse engineering:- As software evolves, many changes are applied to its code, to add new functionality, to correct defects, and to enhance the system's performance or quality. For poorly documented systems, the code is the only reliable source of information about the system. As a result, reverse engineering focuses on understanding the code. Thus reverse engineering can serve both good and bad ends.
Obfuscation:- Java gives software platform independence, so programs run unchanged on any platform. All programs are compiled into an intermediate code format: a class file, which contains a large amount of information about the program's methods, variables and constants, enough to make reverse engineering possible. When a company develops software in Java, it sells the product to other organizations in this intermediate format rather than as the original source code. An organization that buys the software can nevertheless modify it, violating the security and privacy of the authoring company, simply by applying reverse engineering. Reverse engineering can be done by software developers, automated tools and decompilers. Java bytecode is easily decompiled, which makes reverse engineering especially easy in Java.
In a programming context, obfuscation means making program code more difficult to read and understand in order to protect the security and privacy of the software. Since decompilers can easily extract source code from compiled code, it is almost impossible to keep the code secret, so obfuscators have grown rapidly as a way to put an effective smoke screen around the code. Code obfuscation is one of the most prominent and effective methods of protecting Java code: it makes the program difficult to understand, so the code is more resistant to reverse engineering. There are two kinds of obfuscation: source code obfuscation and bytecode obfuscation. Source code obfuscation changes the source code of the program, whereas bytecode obfuscation changes the class file of the program, while keeping the same functionality as the source code. There are several obfuscation techniques to prevent Java bytecode from being decompiled.
For example, consider a set of class files S that an obfuscator turns into another set of class files S'. The class files in S and S' are different, but they produce the same output.
Example:-
    class OHello {
        public OHello() {
            int num = 1;
        }

        public String gHello(String hname) {
            return hname;
        }
    }
When the code above is passed through a simple obfuscator (such as Zelix KlassMaster), the following code is generated:
    class aa {
        public static boolean aa;

        public aa() {
            int aa = 1;
        }

        public String aa(String ba) {
            return ba;
        }
    }
Comparing the two versions, the class name OHello has been changed to aa and the method name gHello has also been changed to aa (the parameter hname has become ba, and the local variable num has become aa). A program that uses aa is much harder to read than one that uses OHello, so reverse engineers can interpret and understand less of it. This is just a simple example that renames the class, its variables and its methods.
Categories of obfuscation techniques:-
Description of Obfuscation techniques:- One way obfuscators obfuscate a program is to replace the symbols in a class file with strings that are illegal as Java identifiers, such as the keyword private or even ***. The JVM accepts such names in bytecode, but the source code produced by a decompiler cannot be recompiled. Another technique, targeted at specific decompilers (Mocha and JODE), is to insert a bad instruction into the code. As an example, take the original (disassembled) code:
    Method void main(java.lang.String[])
       0 new #4
       3 invokespecial #10
       6 return
After obfuscation, the code is as follows (names are left unchanged to keep the example simple):
    Method void main(java.lang.String[])
       0 new #4
       3 invokespecial #10
       6 return
       7 pop
Here a pop instruction has been added after the return instruction. Normally return is the last instruction of such a method. The pop after it can never be reached, so it is never executed and the program behaves exactly as before, but a decompiler that expects the method to end with return can fail on the modified code.
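The renaming in the OHello example can be approximated with a small name generator. This is a minimal sketch under stated assumptions, not KlassMaster's actual algorithm: each distinct identifier is mapped, in order of first appearance, to the next name in the sequence a, b, ..., z, aa, ab, ..., and Java keywords such as do and if are skipped (using javax.lang.model.SourceVersion.isKeyword) so that renamed source code still compiles. The class name NameScrambler and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.lang.model.SourceVersion;

// Minimal sketch of identifier renaming; not any real obfuscator's algorithm.
public class NameScrambler {

    // Converts 0, 1, 2, ... into a, b, ..., z, aa, ab, ... (bijective base-26).
    static String shortName(int index) {
        StringBuilder sb = new StringBuilder();
        int n = index + 1;               // switch to 1-based numbering
        while (n > 0) {
            n--;                         // map 1..26 onto 0..25
            sb.append((char) ('a' + n % 26));
            n /= 26;
        }
        return sb.reverse().toString();
    }

    // Gives each distinct identifier a short meaningless name, skipping Java keywords
    // so that renamed source code still compiles.
    static Map<String, String> buildMapping(List<String> identifiers) {
        Map<String, String> mapping = new LinkedHashMap<>();
        int next = 0;
        for (String id : identifiers) {
            if (mapping.containsKey(id)) {
                continue;                // same identifier, same new name
            }
            String candidate;
            do {
                candidate = shortName(next++);
            } while (SourceVersion.isKeyword(candidate));
            mapping.put(id, candidate);
        }
        return mapping;
    }

    public static void main(String[] args) {
        System.out.println(buildMapping(List.of("OHello", "gHello", "hname", "num")));
        // {OHello=a, gHello=b, hname=c, num=d}
    }
}
```

A bytecode obfuscator does not need the keyword check, and it can even reuse one short name for a class, a field and a method, as aa is reused in the example above.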
Lexical obfuscation:- Lexical obfuscation changes the lexical structure of a program by scrambling its identifiers. The names of classes, fields and methods, which are the meaningful symbolic information of a Java program, are replaced with meaningless names. An example of a lexical obfuscator is Crema. An obfuscator is a program that automatically transforms class files to obfuscate them, in order to defeat reverse engineering that tries to recover the source code from the class files.
Layout obfuscation:- Layout obfuscation changes the layout structure of the program, using two basic methods: renaming identifiers and removing debugging information. Both make the program code less informative to reverse engineers. Layout obfuscation uses one-way transformations such as renaming identifiers to random symbols and removing comments, unused methods and debugging information. Although reverse engineers can still understand code obfuscated in this way, it increases the cost of reverse engineering. Layout obfuscation techniques are the most commonly used in code obfuscation; almost all Java obfuscators use them.
Control obfuscation:- Control obfuscation changes the control flow of the program. It is an easy technique to apply, and it makes it harder for a reverse engineer to find out what the code actually does. For example, consider code containing a method A(). Here another new method called A_Dummy() will be created and in the program
Data obfuscation:- Data obfuscation mainly deals with breaking up the data structures used in the program and encrypting literals. This includes changing inheritance relationships, restructuring arrays, making variable names constant, and so on. In this way data obfuscation affects the data structures of the program, which makes it very difficult to obtain the original source code of the program.
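Literal encryption, mentioned above as part of data obfuscation, can be sketched as follows. This is only an illustration under stated assumptions: the XOR key and the Base64 wrapping are hypothetical choices, and a fixed XOR key is trivially reversible, so the point is only that the readable string need not appear in the class file's constant pool; the obfuscated program rebuilds it at run time with decode(). The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Minimal sketch of literal encryption; a fixed XOR key is trivially reversible
// and only illustrates the idea.
public class LiteralObfuscation {

    private static final byte KEY = 0x5A; // hypothetical key for this example

    // Run by the obfuscator: replaces a readable literal with an encoded one.
    static String encode(String plain) {
        return Base64.getEncoder().encodeToString(xor(plain.getBytes(StandardCharsets.UTF_8)));
    }

    // Emitted into the obfuscated program: rebuilds the literal only at run time.
    static String decode(String encoded) {
        return new String(xor(Base64.getDecoder().decode(encoded)), StandardCharsets.UTF_8);
    }

    // XORs every byte with KEY; applying it twice restores the input.
    private static byte[] xor(byte[] bytes) {
        byte[] out = bytes.clone();
        for (int i = 0; i < out.length; i++) {
            out[i] ^= KEY;
        }
        return out;
    }

    public static void main(String[] args) {
        // In a real obfuscated class only the encoded string would be stored as a literal;
        // encode() is called here just to produce it.
        String hidden = encode("licence-check-failed");
        System.out.println(hidden);
        System.out.println(decode(hidden)); // licence-check-failed
    }
}
```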
More practical source code obfuscation methods are based on composite functions: Array Index Transformation, Method Argument Transformation, and Hiding Constants. Techniques based on composite functions make the computation more complex, and extensive use of them makes the software respond slowly. Some source code obfuscation methods target object-oriented concepts: Class Coalescing, Class Splitting, and Type Hiding. Other source code obfuscation techniques include false refactoring, restructuring arrays, inlining and outlining methods, cloning methods, splitting variables, converting static data to procedural data, and merging scalar variables. The techniques that work on object-oriented concepts, and others such as restructuring arrays, splitting variables and merging scalar variables, may distort the logic of the software, so they must be used carefully. Techniques such as outlining methods, cloning methods and converting static data to procedural data increase the size of a class file without providing any significant advantage. Inlining a method and removing the original results in an unresolved method call if some other class still calls the inlined method.
Advanced obfuscation techniques for byte code:- There are several obfuscation techniques to prevent Java bytecode from being decompiled. Many of these tools simply replace the identifier names stored in the bytecode with meaningless names. Many crackers can still understand the program even though the identifier names have changed, but it takes them more time. Traditionally, when a program is compiled to machine code, most of the symbolic information is stripped off during compilation, and the variables and functions of the program are referred to by addresses instead of identifiers. Decompiling such compiled code is difficult, but it is still possible.
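The text names Array Index Transformation without defining it; the following minimal sketch assumes one common form of it. Element i of the logical array is stored at physical position (K*i + C) mod n, which is a permutation of the positions whenever gcd(K, n) = 1, so every access goes through a composite index function instead of the plain index. The constants K = 7 and C = 3 and the class name IndexTransform are hypothetical. The extra multiplication and modulo on every access also show why heavy use of composite-function techniques slows software down.

```java
import java.util.Arrays;

// Minimal sketch of one possible array index transformation: logical element i is
// stored at physical position (K * i + C) % n, a permutation when gcd(K, n) == 1.
public class IndexTransform {

    private static final int K = 7; // hypothetical multiplier
    private static final int C = 3; // hypothetical offset

    private final int[] storage;

    IndexTransform(int[] logical) {
        if (gcd(K, logical.length) != 1) {
            throw new IllegalArgumentException("K must be coprime with the array length");
        }
        storage = new int[logical.length];
        for (int i = 0; i < logical.length; i++) {
            storage[physical(i)] = logical[i];
        }
    }

    // The composite index function that replaces the plain index i.
    private int physical(int i) {
        return (K * i + C) % storage.length;
    }

    // Reads logical element i through the transformed index.
    int get(int i) {
        return storage[physical(i)];
    }

    // Copy of the scrambled physical layout, for inspection only.
    int[] rawStorage() {
        return storage.clone();
    }

    private static int gcd(int a, int b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30, 40, 50};
        IndexTransform t = new IndexTransform(data);
        System.out.println(Arrays.toString(t.rawStorage())); // [20, 50, 30, 10, 40]
        System.out.println(t.get(2));                          // 30, same as data[2]
    }
}
```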
We say a protection technique is strong if cracking the software costs the cracker a great deal of time and effort. If cracking a piece of software takes longer than rewriting the program, the crack is pointless, a waste of time with no value. Java became very popular because of the benefits it provides. One of its major benefits is portability: a compiled program can run on any platform, because compiling a program produces platform-independent bytecode. Java uses symbolic references rather than traditional memory addresses, so the names of methods, variables and types are stored in a constant pool within the bytecode file. There are many commercial decompilers (P C, 2001; Vliet 1996; Hoenicke 2001, etc.). When a program is decompiled, the result is almost identical to the original source code, which makes decompilers a lethal weapon for intellectual property piracy. Obfuscation is used to counter decompilation of the bytecode: its main aim is to make the decompiled program harder to understand, i.e. to increase the time and effort needed to understand the obfuscated code.
Obfuscation scope:- A Java application consists of one or more packages. A programmer may divide the program into packages, and may also use packages from the standard library and from proprietary libraries. Only the part of the program written by the developer is distributed; proprietary libraries are not distributed because of copyright restrictions. The obfuscation scope is the part of the program that is obfuscated, i.e. only the part of the software written by the developer is protected, not the entire software. Packages that serve as utilities, from the standard library and proprietary libraries, are not obfuscated.
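Because the names of methods, variables and types are kept as symbolic references in the constant pool, they can be read straight out of a class file. The following minimal sketch walks the constant pool using the class file layout from the Java Virtual Machine Specification (magic number 0xCAFEBABE, version numbers, then constant_pool_count entries numbered from 1, with long and double constants taking two slots) and collects the CONSTANT_Utf8 entries, which hold those names. The class and method names are hypothetical; it reads java.lang.Object only because that class is always available.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: lists the CONSTANT_Utf8 entries (names and descriptors) of a class file.
public class ConstantPoolNames {

    static List<String> utf8Entries(InputStream classFile) throws IOException {
        DataInputStream in = new DataInputStream(classFile);
        if (in.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        in.readUnsignedShort();                    // minor_version
        in.readUnsignedShort();                    // major_version
        int count = in.readUnsignedShort();        // constant_pool_count; valid indexes are 1..count-1
        List<String> names = new ArrayList<>();
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1:                            // Utf8: u2 length + modified UTF-8, as readUTF expects
                    names.add(in.readUTF());
                    break;
                case 7: case 8: case 16: case 19: case 20:
                    skip(in, 2);                   // Class, String, MethodType, Module, Package
                    break;
                case 15:
                    skip(in, 3);                   // MethodHandle: u1 kind + u2 index
                    break;
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                    skip(in, 4);                   // Integer, Float, member refs, NameAndType, Dynamic, InvokeDynamic
                    break;
                case 5: case 6:
                    skip(in, 8);                   // Long and Double occupy two constant pool slots
                    i++;
                    break;
                default:
                    throw new IOException("unknown constant pool tag " + tag + " at index " + i);
            }
        }
        return names;
    }

    private static void skip(DataInputStream in, int n) throws IOException {
        in.readFully(new byte[n]);                 // unlike skipBytes, fails loudly on a short read
    }

    // Convenience wrapper: loads a class file resource such as "java/lang/Object.class".
    static List<String> namesOf(String resourcePath) {
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new IOException("resource not found: " + resourcePath);
            }
            return utf8Entries(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Prints names such as java/lang/Object, hashCode and toString.
        System.out.println(namesOf("java/lang/Object.class"));
    }
}
```

These readable entries are exactly what lexical and layout obfuscators replace with meaningless names.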
Candidates considered for identifier scrambling:- In Java, an identifier denotes the following items in the bytecode file (http://www.cis.nctu.edu.tw/~wuuyang/papers/Obfuscation20011123.doc). By default, the names of parameters and local variables are stripped from the bytecode. These names are stored in the LocalVariableTable of the bytecode only if debugging information is enabled, and by default the Java compiler generates only line-number and source-file debugging information (local variable names are kept only when compiling with the -g option). If local variable names are not found, decompilers create their own names for local variables and parameters, which still makes the reversed program somewhat understandable. Even if we rename the variables and parameters in the LocalVariableTable, a good decompiler will simply