
Information Retrieval and its Legal Impact on the Society

Written by: Meenakshi Sinha - 5th year, Hidayatullah National Law University, Raipur
  • Law is an unusually information-rich discipline. Not only is there an enormous text-based literature about the law, but even the raw materials of law are textual in nature. Lawyers, both practitioners and academics, can all be thought of as wordsmiths or, rather less kindly, as wordmongers. It is not surprising that such an information-oriented discipline should have embraced information technology from an early stage. Our accustomed systems of retrieving particular pieces of information no longer meet the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but it still usually requires time-consuming serial searching of one database after another, before moving on to other methods of searching for Internet sources.
    The present paper is a study of information retrieval and its legal effect. To make it comprehensive, it sets out the basic concept of information retrieval, its impact on society, and its legal enticement.

    Since the 1940s the problem of information storage and retrieval has attracted increasing attention. It is simply stated: we have vast amounts of information to which accurate and speedy access is becoming ever more difficult. One effect of this is that relevant information gets ignored since it is never uncovered, which in turn leads to much duplication of work and effort. With the advent of computers, a great deal of thought has been given to using them to provide rapid and intelligent retrieval systems. In libraries, many of which certainly have an information storage and retrieval problem, some of the more mundane tasks, such as cataloguing and general administration, have successfully been taken over by computers. However, the problem of effective retrieval remains largely unsolved.
    Historically, one of the first applications of the technology to law was the legal information system. The earliest viable system was produced back in 1960. From such primitive origins developed the first wave of legal information systems: private sector systems such as Lexis and Westlaw. The coming of the Internet and, in particular, its publishing arm, the World Wide Web, triggered a new era of development.

    Concept of Information Retrieval

    With the enormous increase in recent years in the number of text databases available online, and the consequent need for better techniques to access this information, there has been a strong resurgence of interest in the research done in the area of information retrieval (hereinafter referred to as IR). For many years, IR research was done by a small community that had little impact on industry. Most applications of text retrieval focused on bibliographic databases, and the large information services such as Dialog or Westlaw were based on standard Boolean logic approaches to text matching and paid little attention to the results of research on topics such as retrieval models, query processing, term weighting and relevance feedback.

    In this context a working definition of information is useful. In everyday use the term may be a synonym for “data”, “news”, or “reports”. In information science the word is used with a more specific meaning, but here too definitions vary according to the pragmatics of use. The concept of information retrieval is broad and is often employed imprecisely. It is used to denote systems designed to provide a user or a group of users with information. Salton and McGill (1983) gave this definition: “An information retrieval system is an information system, that is, a system used to store items of information that need to be processed, searched, retrieved, and disseminated to various user populations.”

    An information system in this sense forms part of the legal communication process.

    In the literature, there are a number of different definitions of information retrieval. The concept is generally used in a more precise sense than the one above, making retrieval systems one type of information system. A further limitation is generally made, often implicitly, to computerized retrieval systems or bibliographical retrieval systems. Perhaps one might say that the term information retrieval system has become accepted as denoting the type of system discussed in the works of Salton, Sparck Jones, and others. Lancaster (1978) gives the following definition of an information retrieval system:
    As it is most commonly used, the term information retrieval is really synonymous with literature searching. Information retrieval is the process of searching some collection of documents, using the term document in its widest sense, in order to identify those documents which deal with a particular subject. Any system that is designed to facilitate this literature searching may legitimately be called an information retrieval system.

    Information retrieval is the science of searching for information in documents, searching for documents themselves, searching for metadata which describe documents, or searching within databases, whether relational stand-alone databases or hypertext networked databases such as the Internet, the World Wide Web or intranets, for text, sound, images or data. There is a common confusion, however, between data retrieval, document retrieval, information retrieval, and text retrieval, and each of these has its own body of literature, theory, praxis and technologies. IR is, like most embryonic fields, interdisciplinary: it draws on computer science, mathematics, library science, information science, cognitive psychology, linguistics, statistics, and physics.

    According to the Science and Technology Dictionary, information retrieval is the technique and process of searching, recovering, and interpreting information from large amounts of stored data. The Britannica Concise Encyclopedia describes information retrieval as the recovery of information, especially in a database stored in a computer. Two main approaches are matching words in the query against the database index (keyword searching) and traversing the database using hypertext or hypermedia links. Keyword searching has been the dominant approach to text retrieval since the early 1960s; hypertext has so far been confined largely to personal or corporate information-retrieval applications. Evolving information-retrieval techniques, exemplified by developments with modern Internet search engines, combine natural language, hyperlinks, and keyword searching. Other techniques that seek higher levels of retrieval precision are studied by researchers involved with artificial intelligence.
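
    Keyword searching against a database index can be illustrated with a small inverted index. The following is a minimal sketch, not a description of any particular legal database: the function names (tokenize, build_index, keyword_search) and the three sample documents are assumptions made for illustration, and the query is treated as a Boolean AND of its terms.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def keyword_search(index, query):
    """Return ids of documents containing every query term (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample collection, invented for this example.
documents = {
    1: "The statute regulates access to public information.",
    2: "The court interpreted the statute narrowly.",
    3: "Public libraries catalogue legal information.",
}
index = build_index(documents)
print(keyword_search(index, "statute information"))
```

    Because the index is built once and each query only intersects a few sets, the search does not need to scan every document, which is the practical appeal of keyword indexing.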

    Legal information systems are online collections of legal information in full-text form. Such systems may also contain significant quantities of secondary sources, i.e. description, analysis, and evaluation of the law. In IR terms, an object is an entity which keeps or stores information in a database. User queries are matched to objects stored in the database. A document is, therefore, a data object. Often the documents themselves are not kept or stored directly in the IR system, but are instead represented in the system by document surrogates. Web search engines such as Google or Yahoo Search are the most visible IR applications.

    In 1992 the US Department of Defense, along with the National Institute of Standards and Technology (NIST), cosponsored the Text Retrieval Conference (TREC) as part of the TIPSTER text program. Its aim was to support the information retrieval community by supplying the infrastructure needed for large-scale evaluation of text retrieval methodologies.

    There are various ways to measure how well the retrieved information matches (i.e., how well it is relevant to) the intended information:
    (a) Precision: The proportion of retrieved documents that are relevant, out of all the documents retrieved. In binary classification, precision is analogous to positive predictive value. Precision can also be evaluated at a given cut-off rank, instead of over all retrieved documents. The meaning and usage of precision in the field of information retrieval differs from the definition of accuracy and precision within other branches of science and technology.

    (b) Recall: The proportion of relevant documents that are retrieved, out of all relevant documents available. In binary classification, recall is called sensitivity.

    (c) Fall-Out: The proportion of irrelevant documents that are retrieved, out of all irrelevant documents available.

    (d) F-measure: The weighted harmonic mean of precision and recall. The traditional F-measure, or balanced F-score, is known as the F1 measure because recall and precision are evenly weighted.

    (e) Mean average precision: Over a set of queries, find the mean of the average precisions, where Average Precision is the average of the precision after each relevant document is retrieved.
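
    The five measures above can be computed directly from a ranked result list and a set of relevance judgements. The sketch below is illustrative only: the function names and the ten-document sample collection are assumptions, and average precision is divided by the total number of relevant documents (a common convention, assumed here), so relevant documents that are never retrieved count as zero.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved, relevant):
    """Fraction of relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


def fall_out(retrieved, relevant, collection):
    """Fraction of irrelevant documents that were retrieved."""
    irrelevant = set(collection) - set(relevant)
    return len(set(retrieved) & irrelevant) / len(irrelevant) if irrelevant else 0.0


def f_measure(p, r, beta=1.0):
    """Weighted harmonic mean of precision and recall (beta=1 gives F1)."""
    b2 = beta * beta
    denominator = b2 * p + r
    return (1 + b2) * p * r / denominator if denominator else 0.0


def average_precision(ranked, relevant):
    """Average of the precision values at each rank where a relevant document appears."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Mean of average precision over (ranked list, relevant set) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


# Hypothetical collection of ten documents, three of which are relevant.
collection = [f"d{i}" for i in range(1, 11)]
relevant = {"d1", "d3", "d5"}
ranked = ["d1", "d2", "d3", "d4"]  # the system's ranked answer

p = precision(ranked, relevant)
r = recall(ranked, relevant)
print(p, r, fall_out(ranked, relevant, collection), f_measure(p, r))
print(average_precision(ranked, relevant))
```

    In this example two of the four retrieved documents are relevant, so precision is 2/4, recall is 2/3, and fall-out is 2/7 because two of the seven irrelevant documents were returned.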

    Software agents generally offer a wide range of approaches to information retrieval, including user profiling, information filtering, privacy, recommender systems, community ware, negotiation mechanisms and coordination. Peter Norvig, in a presentation at the 2005 O’Reilly Emerging Technology Conference, said that researchers in computational linguistics and information retrieval now have a million times more data than was available 30 years ago. In that presentation, Norvig explored what this data can do for problems in language understanding, translation, information extraction, and inference, and extrapolated to what more data may bring in the future.

    Impact on The Society
    Law exists to regulate society for its development and establishment. The rise and fall of any society depends upon its legal system. Computers have been used in legal information systems for the past 20 years, but the pace of development has become hectic and the resulting systems are difficult to relate to one another. There is a need to monitor, where possible, the development of the use of computers for legal purposes and to avoid haphazard development and the proliferation of conflicting systems of information storage and retrieval, and of other uses of computers for lawyers. More broadly, since the beginning of civilization, man has been motivated by the need to make progress and to better existing technologies. This has led to tremendous development and progress, which has been a launching pad for further developments. Of all the significant advances made by mankind to date, probably the most important is the development of the Internet.

    Given the speed with which industry has adopted the results of IR research from the 1970s and 1980s, the IR community is faced with identifying major new directions. The emergence of new applications such as digital libraries is both an opportunity and a challenge. These applications provide unique opportunities as test beds for evaluating and stimulating research, but the challenge for IR researchers is to define and pursue research programs that maintain their relevance in a rapidly changing environment. One problem is that the priorities that IR researchers place on research issues are not necessarily the same as those of companies and government agencies that use and sell IR systems. Understanding those priorities and the operational experience behind them will be part of the process of deciding which issues are of fundamental importance and which are more transient.

    Today, however, the situation is considerably different. Retrieval techniques based on IR research have found their way into major information services (for example, West Publishing’s system, Individual’s clipping service) and the World Wide Web (for example, InfoSeek and Lycos). Many of the features once considered too esoteric for the typical user, such as natural language queries, ranked retrieval results, term weighting, query-by-example, and query formulation assistance, have become common and, indeed, necessary in most IR products.

    Private Information Retrieval:

    A private information retrieval (hereinafter referred to as PIR) protocol allows a user to retrieve an item from a server in possession of a database without revealing which item she is retrieving.

    In other words, we can say that PIR protocols allow users to retrieve information from a database while keeping their query private. Motivating examples for this problem include databases with sensitive information, such as stocks, patents or medical databases, in which users are likely to be highly motivated to hide which record they are trying to retrieve. PIR protocols aim at achieving this goal efficiently, where the main cost measure is communication complexity. PIR is a strong primitive, which may also be useful as a building block within other protocols.

    One trivial, but very inefficient, way to achieve PIR is for the server to send an entire copy of the database to the user. In fact, with a single server this is the only possible protocol that gives the user information-theoretic privacy for her query. There are two ways to get around this problem: one is to make the server computationally bounded, and the other is to assume that there are multiple non-cooperating servers, each having a copy of the database. The problem was introduced in 1995 by Chor et al. Since then, very efficient solutions have been discovered. Single-database (computationally private) PIR can be achieved with constant communication, and k-database (information-theoretic) PIR can be done with communication that is sub-linear in the size of the database.

    Advances in computational PIR:

    1. In 1999, Cachin, Micali and Stadler achieved poly-logarithmic communication complexity. The security of their system is based on the Phi-hiding assumption.
    2. In 2004 Chang achieved logarithmic (server-side) communication complexity. The security of his system reduces to the semantic security of the Paillier cryptosystem.
    3. In 2005 C. Gentry and Z. Ramzan achieved constant communication complexity. The security of their scheme is also based on a variant of the Phi-hiding assumption.

    Advances in information theoretic PIR:

    Achieving information-theoretic security requires the assumption that there are multiple non-cooperating servers, each having a copy of the database. Without this assumption, any information-theoretically secure PIR protocol requires an amount of communication that is at least the size of the database. Single-database PIR with sub-linear communication complexity therefore cannot be achieved in the information-theoretic model, and some computational assumption must be made for it. Work in the information-theoretic setting seeks to protect the user's privacy unconditionally against collusions of servers, and includes unified general constructions whose abstract components can be instantiated to yield both old and new families of PIR protocols.
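
    The role of the non-cooperating servers can be seen in a simple two-server scheme based on XOR, sketched below for a database of single bits. This is an illustration of the privacy idea only, not an efficient protocol: each query here is as long as the database, whereas the schemes in the literature reduce communication to sub-linear size. The function names and the sample database are assumptions made for this example.

```python
import secrets


def make_queries(n, i):
    """Build two index sets whose symmetric difference is exactly {i}.

    Each set, seen on its own, is a uniformly random subset of range(n),
    so neither server alone learns anything about i.
    """
    first = {j for j in range(n) if secrets.randbits(1)}
    second = first ^ {i}  # add i if it is absent, remove it if present
    return first, second


def server_answer(database, subset):
    """A server returns the XOR of the database bits at the requested positions."""
    answer = 0
    for j in subset:
        answer ^= database[j]
    return answer


def retrieve_bit(database, i):
    """Recover bit i by XOR-ing the answers of two servers holding the same database."""
    query_one, query_two = make_queries(len(database), i)
    answer_one = server_answer(database, query_one)  # computed by server 1
    answer_two = server_answer(database, query_two)  # computed by server 2
    # Every position except i appears in both sets or in neither, so it cancels out.
    return answer_one ^ answer_two


database = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical replicated database of bits
print([retrieve_bit(database, i) for i in range(len(database))])
```

    The privacy guarantee fails if the two servers compare their queries, since the symmetric difference of the two sets reveals the index; this is why the non-cooperation assumption is essential.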

    Information Extraction:
    Information extraction techniques, primarily developed in the context of the Advanced Research Projects Agency (hereinafter referred to as “ARPA”) Message Understanding Conferences (hereinafter referred to as “MUCs”), are designed to identify database entities, attributes and relationships in full text. For example, for people interested in new joint ventures, an information extraction system could identify the names of the companies involved, the new company, the products, and the location, all from articles coming over a news feed. Companies and government agencies have considerable interest in these techniques, and see them as contributing significant added value to the text databases they and others generate. The current state of information extraction tools is such that it requires a considerable investment to build a new extraction application, and certain types of information are very difficult to identify.

    Multimedia Retrieval:
    Multimedia indexing and retrieval refers to techniques being developed to access image, video and sound databases without text descriptions. The perceived value of multimedia information systems is very high and, consequently, industry has a considerable interest in the development of these techniques. General solutions to multimedia indexing are very difficult and, where they currently exist, tend to be of limited utility. An example of this is indexing images by their color distribution. This technique can be effectively used in some applications, such as retrieving pictures of fabric in specified color shades, but in many other applications simply cannot be used. Some progress has been made in multimedia indexing for specific applications (for example, retrieval of photographs of faces), and in processing language-related multimedia.
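
    Indexing images by colour distribution can be sketched with a coarse colour histogram and histogram intersection as the similarity score. The code below is a simplified illustration under stated assumptions: images are plain lists of (red, green, blue) tuples with values from 0 to 255, each channel is quantised into four bins, and the fabric samples are invented. Real systems work on decoded image files and usually use more refined colour spaces.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Return a normalised histogram over quantised (r, g, b) colours."""
    size = bins_per_channel
    counts = [0] * (size ** 3)
    total = 0
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin number in 0..size-1.
        r_bin, g_bin, b_bin = r * size // 256, g * size // 256, b * size // 256
        counts[r_bin * size * size + g_bin * size + b_bin] += 1
        total += 1
    if total == 0:
        return [0.0] * len(counts)
    return [count / total for count in counts]


def histogram_intersection(first, second):
    """Similarity in [0, 1]: the colour mass shared by two normalised histograms."""
    return sum(min(a, b) for a, b in zip(first, second))


# Hypothetical fabric images, each reduced to 100 pixels.
red_fabric = [(200, 10, 10)] * 90 + [(250, 250, 250)] * 10
blue_fabric = [(10, 10, 200)] * 100
query = [(210, 20, 20)] * 100  # the user asks for a red shade

query_hist = color_histogram(query)
scores = {
    "red_fabric": histogram_intersection(query_hist, color_histogram(red_fabric)),
    "blue_fabric": histogram_intersection(query_hist, color_histogram(blue_fabric)),
}
print(sorted(scores, key=scores.get, reverse=True))
```

    The sketch also shows the limitation noted above: the histogram records which colours occur but nothing about shape or content, so two unrelated pictures with similar colours would score as a match.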

    Effective Retrieval:

    The development of effective retrieval techniques has been the core of IR research for more than 30 years. A number of measures of effectiveness have been proposed, but the most frequently mentioned are recall and precision. Finding text that satisfies a user’s information need is not simple, and considerable progress has been made in developing ranking techniques that are significantly more effective than Boolean logic.
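
    The difference between Boolean matching and ranked retrieval can be illustrated with a simple term-weighting scheme. The sketch below scores each document by summing, for every query term, its frequency in the document multiplied by an inverse document frequency weight. The particular weighting formula, the function names and the sample documents are assumptions chosen for illustration; production systems use more refined models.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(documents, query):
    """Return (doc_id, score) pairs sorted from best to worst match."""
    n_docs = len(documents)
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in documents.items()}

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    query_terms = set(tokenize(query))
    scores = {}
    for doc_id, counts in term_counts.items():
        score = 0.0
        for term in query_terms:
            if counts[term]:
                # Rare terms weigh more than terms found in most documents.
                score += counts[term] * math.log(1 + n_docs / doc_freq[term])
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample collection, invented for this example.
documents = {
    1: "Copyright infringement and damages for copyright infringement",
    2: "Copyright registration procedure",
    3: "Divorce by mutual consent",
}
for doc_id, score in rank(documents, "copyright infringement"):
    print(doc_id, round(score, 3))
```

    A strict Boolean AND query for both terms would return only document 1. The ranked approach also returns document 2 as a weaker partial match, which is the kind of graded answer that Boolean logic cannot express.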

    Having a more effective retrieval engine is a major selling point. Companies are particularly interested in techniques that produce significant improvements (rather than a few percent average precision) and that avoid occasional major mistakes.

    Global computer-based communications and information technology cut across territorial borders, creating a new realm of human activity and undermining the feasibility and legitimacy of applying laws based on geographic boundaries. While these electronic communications play havoc with geographic boundaries, a new boundary, made up of the screens and passwords that separate the virtual world from the real world of atoms, emerges. Explicit definitions of complicated terms play a very important role in making the subject understandable to the layman; with them it is easier to demonstrate the basic concept and its application in society. The result is an interconnected network that ‘resonates’ with society: the law both influences and is influenced by the society in which it is constructed.
    Information technology today has an impact on virtually every aspect of society and every corner of the world in the information or digital age, fostering commerce and business, improving education and health care, and facilitating communication among all stakeholders. With a system of information retrieval, society is able to retrieve the data relevant to a particular search efficiently.

    Legal Enticement

    Legislative databases are a primary source of legal information. Legislative texts are currently accessible through specifically designed portal sites owned by governments or private institutions. The search engines of these portals usually offer a full-text search (i.e., every word of the text can be searched). They also allow for an extra selection of the content through filling out specific fields that represent certain structured content of the statute (e.g., statute title, number of an article, etc.). A full-text search is popular because it provides flexible access to information: the user can build any search query. The answers resulting from a full-text search are ranked according to their relevance to the query. Statutes are often long documents that are hierarchically divided into chapters, sections, articles, etc. It is important to return as an answer the parts of a statute that are most relevant to the information query.
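
    Combining a field filter with a full-text search that returns individual parts of a statute, rather than whole statutes, might look like the sketch below. Everything in it is hypothetical: the two example Acts, their articles, the field names (title, articles, number, text) and the scoring rule, which simply counts occurrences of query terms in each article.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search_articles(statutes, query, title_contains=None, top_k=3):
    """Return (statute title, article number) pairs, most relevant first.

    title_contains is an optional structured-field filter on the statute title;
    the query itself is matched against the full text of each article.
    """
    query_terms = set(tokenize(query))
    hits = []
    for statute in statutes:
        if title_contains and title_contains.lower() not in statute["title"].lower():
            continue
        for article in statute["articles"]:
            score = sum(1 for token in tokenize(article["text"]) if token in query_terms)
            if score > 0:
                hits.append((score, statute["title"], article["number"]))
    # Highest score first; ties are broken by title and article number.
    hits.sort(key=lambda hit: (-hit[0], hit[1], hit[2]))
    return [(title, number) for _, title, number in hits[:top_k]]


# Hypothetical statutes, invented for this example.
statutes = [
    {
        "title": "Example Information Act",
        "articles": [
            {"number": 1, "text": "Every citizen has the right to request information from a public authority."},
            {"number": 2, "text": "A public authority shall reply to a request within thirty days."},
        ],
    },
    {
        "title": "Example Copyright Act",
        "articles": [
            {"number": 1, "text": "Registration of copyright requires a written request."},
        ],
    },
]

print(search_articles(statutes, "request information", title_contains="information"))
print(search_articles(statutes, "request information"))
```

    Scoring at the level of the article means that a long statute is not returned as a single undifferentiated answer; the user is taken to the provision that best matches the query.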

    Several key issues are of current relevance to legal information systems:

    # the issue of accessibility of legal information, in particular questions of promulgation, cost, and availability.

    # the issue of reliability of systems, in particular stability, accuracy, and authenticity.

    # the issue of usability, especially ease of use, functionality, and customization.

    Recently in a vision of the Civil Justice System in the Information Age, the Lord Chancellor’s Department identifies four key programmes for immediate implementation. The third of these programmes is stated to be ‘The provision of primary legal source materials online’.

    One of the first applications of that technology to law was the legal information system. Indeed, the earliest viable system was produced back in 1960. From such primitive origins developed the first wave of legal information systems: private sector systems such as Lexis and Westlaw. The coming of the Internet and, in particular, its publishing arm, the World Wide Web (the Web), triggered a new era of development. Interestingly, lawyers were involved in producing one of the earliest successful Web browsers. Based on this involvement, we can now find impressive examples of the second wave of legal information systems: Web-based, public sector systems, starting with the prototype Cornell Legal Information Institute (hereinafter referred to as “Cornell LII”), including the hugely successful Australasian Legal Information Institute (hereinafter referred to as “AUSTLII”), and now adding the British and Irish Legal Information Institute (hereinafter referred to as “BAILII”). At present, first wave private sector systems such as Lexis and Westlaw are migrating their collections onto the Web.

    Promulgation: The first issue of accessibility to be addressed is that of promulgation. It has been forcefully argued that, for a long time, there has been a ‘Catch 22’ within our legal system. While on the one hand everybody is presumed to know the law, on the other hand totally inadequate promulgation of that law has meant that virtually no one other than lawyers actually does know it. This change of position has now resulted in a substantial relaxation of restrictions on access to primary sources, opening the door to wholesale promulgation of law online.

    Availability: The third aspect of accessibility to be considered is that of availability. Clearly, the advent of legal information systems has greatly increased ease of access to primary sources. Online access via the Web means that sources are available very fast, at any time, and from anywhere. Even more strikingly, many users can access a single document simultaneously.

    Breadth of Coverage: There are linguistic, cultural, and practical difficulties in the way of achieving the fullest possible coverage of primary sources in legal information systems. At issue are both the breadth and the depth of coverage of sources, and the key question of selectivity, i.e. adopting an appropriate policy on which sources to include in, and which to exclude from, the present systems.

    In an era of rapid globalization, a worldwide legal information system must surely be highly desirable. One day it may become a reality. No such system exists at present, although there have been some preliminary efforts in that direction. In the private sector, systems have gradually broadened their coverage. Lexis, for example, has steadily extended its coverage beyond US and major Commonwealth law jurisdictions to include a significant number of civil law jurisdictions, as well as some European Union and International sources.

    One interesting development is the Global Legal Information Network established by the Law Library of Congress and based in the USA. The Network is a centralised system, which, it is planned, will one day hold all the world's primary legal sources together with selected secondary sources. Linguistic and cultural difficulties are simply finessed both by the decision that the system will use English as the lingua franca and by the imposition of a range of standards relating to the content and to the main search tool, which is built around a giant legal thesaurus. Political and practical difficulties are to be reduced, it is hoped, by operating the system on a cooperative basis. All nations are invited to join the project. The ‘fee’ for joining is simply providing authentic versions of their own laws and being responsible for maintaining and updating those versions. In return for such a contribution, each participating nation obtains access via the Network to the laws of all other participating nations.

    Depth of Coverage: The successful application of information technology to the storage and retrieval of legal information has brought with it some major disadvantages. Perhaps the most widely accepted disadvantage is that of information overload. In common law jurisdictions, a striking manifestation of this problem is the steadily increasing amount of new case law of which practitioners and academics need to be aware. Before online information systems, the limited capacity of paper law reports dictated that firm policy decisions had to be made about which cases to publish and which not. Once those decisions had been made, however, unreported cases became difficult to track down and so were, in effect, forgotten. Increasingly sophisticated search and retrieval facilities that are associated with legal information systems guarantee that anything put into a system can be found again.

    Stability: There are three facets of reliability: stability, accuracy, and authenticity. As for stability, the Web has tended to be a highly unstable place. It was all too common to find that sites containing significant information had changed their URLs, sometimes leaving a forwarding address.

    Accuracy: Users need to feel confident that the contents of information systems can be relied upon. The most obvious problem is that crucial words or phrases may be missing. Needless to say, omission of a word like ‘not’ from a body of text can have potentially catastrophic results. So too, digits missing from large numbers can utterly change meaning. More menacing is the mistyped word that produces another word which is passed by superficial spell checking, e.g. ‘to’ instead of ‘too’ or ‘two’. Such mistyping can be surprisingly common, particularly where data capture (the digitisation of paper information) is undertaken by personnel who are not used to legal terminology or are not very familiar with the English language. The occasional mistake in paper documents will be no more than a minor annoyance. Such a loss can disrupt or even destroy the meaning of the section or regulation. As with content, so with layout: user confidence has to be ensured by applying rigorous quality control.

    Authenticity: Trust that the contents of legal information systems have not been deliberately sabotaged is just as important as confidence that no inaccuracies have accidentally crept in. Despite criminalization (perhaps because of it), both computer hacking and the launch of viruses onto the Web are becoming global mass-participation activities. While some hackers operate from the developed world, others are based in the developing world, where there may be less rigorous policing of their activities.

    Over the last forty years, the development of legal information systems has been seen primarily as a process of automation. The technology has been viewed as enabling legislatures, courts, the professions and law schools to continue to function as they did in the world before computers existed, albeit with greater speed, increased efficiency, and reduced cost. However, the paradigm is shifting from that of automation towards innovation. Massive accessibility of information online coupled with the early fruits of research into artificial intelligence and the law, are combining to create not only hugely impressive new informational services, but also the possibility of an entirely new wave of legal knowledge systems. The difference between such knowledge systems and legal information systems is the difference between knowledge and information. Jonscher distinguishes these two key concepts by stating that, while information comprises the facts that are distilled from raw data, knowledge is a further distillation of ideas, thought and beliefs from that information. An information system is simply an enormous collection of facts. By contrast, a knowledge system comprises a subset of those facts structured, processed, and presented in such a way that it can provide advice and assistance to users.

    We are now moving beyond the information age into an era where machines will play a key role in helping us extract, understand and apply knowledge. In this coming era, the manner in which we learn, work and do business will be changed in ways that are unimaginable to us today.

    The author can be reached at: [email protected]
