|
Introduction
Law is an unusually information-rich discipline. Not only is there
an enormous text-based literature about the law, but even the raw
materials of law are textual in nature. Lawyers, both practitioners
and academics, can all be thought of as wordsmiths or, rather less
kindly, as wordmongers. It is not surprising that such an
information-orientated discipline as law should have embraced
information technology from an early stage. Our accustomed systems
of retrieving particular bits of information no longer meet the
needs of many people. Searching traditional indexes of print
publications has been aided by computerized databases, but it still
usually requires time-consuming serial searching of one database
after another, followed by other methods of searching for
internet sources.
The present paper is a study of
information retrieval and its legal effect. To make it
comprehensive, it covers the basic concept of information
retrieval, its impact on society, and legal enticement.
Since the 1940s the problem of information storage and retrieval
has attracted increasing attention. It is simply stated: we have
vast amounts of information to which accurate and speedy access is
becoming ever more difficult. One effect of this is that relevant
information gets ignored since it is never uncovered, which in
turn leads to much duplication of work and effort. With the advent
of computers, a great deal of thought has been given to using them
to provide rapid and intelligent retrieval systems. In libraries,
many of which certainly have an information storage and retrieval
problem, some of the more mundane tasks, such as cataloguing and
general administration, have successfully been taken over by
computers. However, the problem of effective retrieval remains
largely unsolved.
Historically,
one of the first applications of information technology to law was
the legal information system. The earliest viable system was
produced back in 1960. From such primitive origins developed the
first wave of legal information systems, private sector systems
such as Lexis and Westlaw. The coming of the Internet and, in
particular, its publishing arm, the World Wide Web, triggered a new
era of development.
Concept Of Information Retrieval
With the enormous increase in recent years in the number of text
databases available on-line, and the consequent need for better
techniques to access this information, there has been a strong
resurgence of interest in the research done in the area of
information retrieval (hereinafter referred to as IR). For many
years, IR research was done by a small community that had little
impact on industry. Most applications of text retrieval focused on
bibliographic databases, and the large information services such
as Dialog or Westlaw were based on standard Boolean logic
approaches to text matching and paid little attention to the
results of research on topics such as retrieval models, query
processing, term weighting and relevance feedback.
In this context a technical
definition of information is useful. In everyday usage the term
may be a synonym for “data”, “news” or “reports”. In information
science, the word is used with a more specific meaning, but here
also definitions vary according to the pragmatics of use. The
concept of information retrieval is broad and often employed
imprecisely. It is used to denote systems designed to provide a
user or a group of users with information. Salton and McGill
(1983) gave this definition: an information retrieval system is an
information system, that is, a system used to store items of
information that need to be processed, searched, retrieved, and
disseminated to various user populations. An information system so
defined is a part of the legal communication process.
In the literature, there are a
number of different definitions of information retrieval. The
concept is generally used in a more precise meaning than the one
above, making retrieval systems one type of information system.
But a further limitation is generally made, often implicitly, to
computerized retrieval systems or bibliographical retrieval
systems. Perhaps one might say that “information retrieval system”
has become accepted as denoting the type of system discussed in
the works of Salton, Sparck Jones, and others. Lancaster (1978)
gives the following definition of an information retrieval system:
As it is most commonly used, the term information retrieval is
really synonymous with literature searching. Information retrieval
is the process of searching some collection of documents, using
the term document in its widest sense, in order to identify those
documents which deal with a particular subject. Any system that is
designed to facilitate this literature searching may legitimately
be called an information retrieval system.
Information retrieval is the
science of searching for information in documents, searching for
documents themselves, searching for metadata which describe
documents, or searching within databases, whether relational
stand-alone databases or hypertext networked databases such as the
Internet, the World Wide Web or intranets, for text, sound, images
or data. There is a common confusion, however, between data
retrieval, document retrieval, information retrieval, and text
retrieval, and each of these has its own body of literature,
theory, praxis and technologies. IR is, like most embryonic
fields, interdisciplinary, based on computer science, mathematics,
library science, information science, cognitive psychology,
linguistics, statistics, and physics.
According to the Science and
Technology Dictionary, information retrieval is the technique and
process of searching, recovering, and interpreting information
from large amounts of stored data. The Britannica Concise
Encyclopedia says information retrieval is the recovery of
information, especially in a database stored in a computer. Two
main approaches are
matching words in the query against the database index (keyword
searching) and traversing the database using hypertext or
hypermedia links. Keyword searching has been the dominant approach
to text retrieval since the early 1960s; hypertext has so far been
confined largely to personal or corporate information-retrieval
applications. Evolving information-retrieval techniques,
exemplified by developments with modern Internet search engines,
combine natural language, hyperlinks, and keyword searching. Other
techniques that seek higher levels of retrieval precision are
studied by researchers involved with artificial intelligence.
Legal information systems are
online collections of legal information in full-text form. Such
systems may also contain significant quantities of secondary
sources, i.e. description, analysis, and evaluation of the law. In
IR terms, an object is an entity which keeps or stores information
in a database. User queries are matched to objects stored in the
database. A document is, therefore, a data object. Often the
documents themselves are not kept or stored directly in the IR
system, but are instead represented in the system by document
surrogates. Web search engines such as Google, Live.com, or Yahoo
Search are the most visible IR applications.
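The matching of user queries to document surrogates can be
sketched with a small inverted index. This is a minimal
illustration and not the design of any system named in this paper:
the surrogate records, the names SURROGATES, build_index and
search_all, and the choice of Boolean AND matching are all
assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical surrogates: short keyword records standing in for full documents.
SURROGATES = {
    "doc1": "contract law damages breach",
    "doc2": "criminal law evidence",
    "doc3": "contract formation offer acceptance",
}


def build_index(surrogates):
    """Map each term to the set of document ids whose surrogate contains it."""
    index = defaultdict(set)
    for doc_id, text in surrogates.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Return the ids of documents that match every query term (Boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


INDEX = build_index(SURROGATES)
print(sorted(search_all(INDEX, "contract law")))
```

The query is answered from the index alone; the full documents
would only be fetched once their identifiers have been found.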
In 1992 the US Department of
Defense, along with the National Institute of Standards and
Technology (NIST), cosponsored the Text Retrieval Conference (TREC)
as part of the TIPSTER text program. Its aim was to support the
information retrieval community by supplying the infrastructure
needed for large-scale evaluation of text retrieval methodologies.
There are various ways to
measure how well the retrieved information matches (i.e., how
relevant it is to) the intended information:
(a) Precision: The proportion of retrieved documents that are
relevant, out of all the documents retrieved. In binary
classification, precision is analogous to positive predictive
value. Precision can also be evaluated at a given cut-off rank,
instead of over all retrieved documents. The meaning and usage of
precision in the field of Information Retrieval differs from the
definition of accuracy and precision within other branches of
science and technology.
(b) Recall: The proportion of relevant documents that are
retrieved, out of all relevant documents available. In binary
classification, recall is called sensitivity.
(c) Fall-Out: The proportion of irrelevant documents that are
retrieved, out of all irrelevant documents available.
(d) F-measure: The weighted harmonic mean of precision and recall.
The traditional F-measure, or balanced F-score, is known as the F1
measure, because recall and precision are evenly weighted.
(e) Mean average precision: Over a set of queries, the mean of the
average precisions, where average precision is the average of the
precision values obtained after each relevant document is
retrieved.
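The measures listed above can be made concrete with a short
sketch. The function names and the sample data are hypothetical,
and one assumption should be stated plainly: average precision is
divided by the total number of relevant documents, so relevant
documents that are never retrieved count as zero. Other
conventions exist.

```python
def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


def fall_out(retrieved, relevant, collection):
    """Fraction of the irrelevant documents that were retrieved."""
    irrelevant = set(collection) - set(relevant)
    return len(set(retrieved) & irrelevant) / len(irrelevant) if irrelevant else 0.0


def f_measure(p, r, beta=1.0):
    """Weighted harmonic mean of precision and recall (beta=1 gives F1)."""
    if p == 0 and r == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


def average_precision(ranked, relevant):
    """Average of the precision values at each rank where a relevant document appears."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Mean of average precision over (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical run: four documents returned in ranked order from a collection of eight.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
collection = list("abcdefgh")
p, r = precision(ranked, relevant), recall(ranked, relevant)
print(p, r, fall_out(ranked, relevant, collection), f_measure(p, r))
```

In this run two of the four retrieved documents are relevant, and
one relevant document ("e") is missed, which lowers recall and
average precision but not precision.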
Software agents groups
generally take a wide-ranging approach to information retrieval,
which includes user profiling, information filtering, privacy,
recommender systems, community ware, negotiation mechanisms and
coordination. In a presentation at the 2005 O’Reilly Emerging
Technology Conference in the United States, Peter Norvig said that
researchers in computational linguistics and information retrieval
now have a million times more data than was available 30 years
ago. In that presentation, Norvig explored what this data can do
for problems in language understanding, translation, information
extraction, and inference, and extrapolated to what more data may
bring in the future.
Impact On The Society
Law is a means of regulating society for its development and
establishment. The rise and fall of any society depends upon its
legal system. For the past 20 years, computers have been used in
legal information systems, but the rate of development has become
hectic and the resulting systems are difficult to relate to one
another. There is a need to monitor, where possible, the
development of the use of computers for legal purposes and to
avoid haphazard development and proliferation of conflicting
systems of information storage and retrieval, and of other uses of
computers by lawyers. More broadly, since the beginning of
civilization, man has always been motivated by the need to make
progress and better the existing technologies. This has led to
tremendous development and progress, which has been a launching
pad for further developments. Of all the significant advances made
by mankind from the beginning to date, probably the most important
is the development of the Internet.
Given the speed with which industry has adopted the results of IR
research from the 1970s and 1980s, the IR community is faced with
identifying major new directions. The emergence of new
applications such as digital libraries is both an opportunity
and a challenge. These applications provide unique opportunities
as test beds for evaluating and stimulating research, but the
challenge for IR researchers is to define and pursue research
programs that maintain their relevance in a rapidly changing
environment. One problem is that the priorities that IR
researchers place on research issues are not necessarily the same
as those of companies and government agencies that use and sell IR
systems. Understanding those priorities and the operational
experience behind them will be part of the process of deciding
which issues are of fundamental importance and which are more
transient.
Today, however, the situation
is considerably different. Retrieval techniques based on IR
research have found their way into major information services (for
example, West Publishing’s system, Individual’s clipping service)
and the World Wide Web (for example, InfoSeek and Lycos). Many of
the features once considered too esoteric for the typical user,
such as natural language queries, ranked retrieval results, term
weighting, query-by-example, and query formulation assistance,
have become common and, indeed, necessary in most IR products.
Private Information Retrieval:
A private information retrieval (hereinafter referred to as PIR)
protocol allows a user to retrieve an item from a server in
possession of a database without revealing which item she is
retrieving.
In other words, we can say that PIR protocols allow users to
retrieve information from a database while keeping their query
private. Motivating examples for this problem include databases
with sensitive information, such as stocks, patents or medical
databases, in which users are likely to be highly motivated to
hide which record they are trying to retrieve. PIR protocols aim
at achieving this goal efficiently, where the main cost measure is
communication complexity. PIR is a strong primitive, which may
also be useful as a building block within other protocols.
One trivial, but very
inefficient, way to achieve PIR is for the server to send an
entire copy of the database to the user. In fact, with a single
server this is the only possible protocol that gives the user
information-theoretic privacy for her query. There are two ways to
get around this problem: one is to make the server computationally
bounded, and the other is to assume that there are multiple
non-cooperating servers, each having a copy of the database. The
problem was introduced in 1995 by Chor et al. Since then, very
efficient solutions have been discovered. Single-database
(computationally private) PIR can be achieved with constant
communication, and k-database (information-theoretic) PIR can be
done with communication that is sub-linear in the size of the
database.
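The multiple-server idea can be illustrated with a simple
two-server scheme based on XOR. This is a sketch under stated
assumptions, not one of the efficient protocols referred to above:
the database is a list of bits, the two servers hold identical
copies and do not collude, and the function names are
hypothetical. Each query is as long as the database, so the sketch
shows only how privacy is obtained, not how communication is
reduced.

```python
import secrets


def make_queries(n, i):
    """Client: build two bit vectors that differ only at position i."""
    q1 = [secrets.randbelow(2) for _ in range(n)]  # uniformly random selection
    q2 = list(q1)
    q2[i] ^= 1  # flip the bit at the wanted position
    return q1, q2


def answer(database, query):
    """Server: XOR together the database bits selected by the query."""
    result = 0
    for bit, selected in zip(database, query):
        if selected:
            result ^= bit
    return result


def retrieve(database, i):
    """Client: recover database[i] from the two servers' one-bit answers."""
    q1, q2 = make_queries(len(database), i)
    a1 = answer(database, q1)  # computed by server 1
    a2 = answer(database, q2)  # computed by server 2
    # Every position selected in both queries cancels out; only position i remains.
    return a1 ^ a2


db = [1, 0, 1, 1, 0, 0, 1, 0]
print([retrieve(db, i) for i in range(len(db))])
```

Each server sees only a uniformly random bit vector, which on its
own carries no information about the position the user wants.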
Advances in computational PIR:
1. In 1999, Cachin, Micali and Stadler achieved poly-logarithmic
communication complexity. The security of their system is based on
the Phi-hiding assumption.
2. In 2004, Chang achieved logarithmic (server-side) communication
complexity. The security of his system reduces to the semantic
security of the Paillier cryptosystem.
3. In 2005, C. Gentry and Z. Ramzan achieved constant
communication complexity. The security of their scheme is also
based on a variant of the Phi-hiding assumption.
Advances in information theoretic PIR:
Achieving information theoretic security requires the assumption
that there are multiple non-cooperating servers, each having a
copy of the database. Without this assumption, any
information-theoretically secure PIR protocol requires an amount
of communication that is at least the size of the database. Single
database PIR with sub-linear communication complexity cannot be
achieved in the information theoretic model, so some computational
assumptions must be made for this. In the information-theoretic
setting for PIR, the user's privacy should be unconditionally
protected from collusions of servers. A unified general
construction has been presented, whose abstract components can be
instantiated to yield both old and new families of PIR protocols.
Information Extraction:
Information extraction techniques, primarily developed in the
context of the Advanced Research Projects Agency (hereinafter
referred to as “ARPA”) Message Understanding Conferences
(hereinafter referred to as “MUCs”), are designed to identify
database entities, attributes and relationships in full text. For
example,
for people interested in new joint ventures, an information
extraction system could identify the names of the companies
involved, the new company, the products, and the location, all
from articles coming over a news feed. Companies and government
agencies have considerable interest in these techniques, and see
them as contributing significant added-value to the text
databases they and others generate. The current state of
information extraction tools is such that it requires a
considerable investment to build a new extraction application, and
certain types of information are very difficult to identify.
Multimedia Retrieval:
Multimedia indexing and retrieval refers to techniques being
developed to access image, video and sound databases without text
descriptions. The perceived value of multimedia information
systems is very high and, consequently, industry has a
considerable interest in the development of these techniques.
General solutions to multimedia indexing are very difficult and,
where they currently exist, tend to be of limited utility. An
example of this is indexing images by their color distribution.
This technique can be effectively used in some applications, such
as retrieving pictures of fabric in specified color shades, but in
many other applications simply cannot be used. Some progress has
been made in multimedia indexing for specific applications (for
example, retrieval of photographs of faces), and in processing
language-related multimedia.
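Indexing images by their color distribution, mentioned above, can
be sketched as follows. The sketch assumes that an image is simply
a list of (red, green, blue) pixel values in the range 0 to 255;
the function names, the choice of four bins per channel, and the
use of histogram intersection as the similarity measure are
illustrative assumptions rather than a description of any
particular system.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Return a normalised histogram of quantised (r, g, b) colors."""
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Combine the three quantised channel values into one bin number.
        idx = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[idx] += 1
    total = len(pixels)
    return [count / total for count in hist] if total else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: the overlap between two normalised histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def most_similar(query_pixels, catalogue):
    """Return the name of the catalogue image whose colors best match the query."""
    query_hist = color_histogram(query_pixels)
    return max(
        catalogue,
        key=lambda name: histogram_intersection(query_hist, color_histogram(catalogue[name])),
    )


# Hypothetical fabric swatches, each reduced to a handful of pixels.
catalogue = {
    "red_fabric": [(200, 10, 10)] * 10,
    "blue_fabric": [(10, 10, 200)] * 10,
}
print(most_similar([(220, 30, 20)] * 5, catalogue))
```

The histogram records only which colors occur and how often, not
where they occur, which is why the technique suits tasks such as
matching fabric shades but fails where shape or layout matters.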
Effective Retrieval:
The development of effective retrieval techniques has been the
core of IR research for more than 30 years. A number of measures
of effectiveness have been proposed, but the most frequently
mentioned are recall and precision. Finding text that satisfies a
user’s information need is not simple, and considerable progress
has been made in developing ranking techniques that are
significantly more effective than Boolean logic.
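The difference between ranked retrieval and Boolean matching can be
shown with a small term-weighting sketch. It uses a simple TF-IDF
score, which is only one of many possible weighting schemes; the
documents, the function names, and the particular formula
log(1 + N/df) are assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical document collection.
DOCUMENTS = {
    "d1": "negligence duty of care negligence",
    "d2": "contract breach damages",
    "d3": "duty to mitigate damages",
}


def tokenize(text):
    """Split text into lower-case terms."""
    return text.lower().split()


def rank(query, documents):
    """Return (doc_id, score) pairs, best first, using a simple TF-IDF score."""
    n = len(documents)
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in documents.items()}
    # Document frequency: the number of documents in which each term occurs.
    df = Counter()
    for term_counts in counts.values():
        df.update(term_counts.keys())
    scores = {}
    for doc_id, term_counts in counts.items():
        score = 0.0
        for term in tokenize(query):
            if term_counts[term]:
                # Rare terms weigh more; repeated terms add to the score.
                score += term_counts[term] * math.log(1 + n / df[term])
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


for doc_id, score in rank("negligence duty", DOCUMENTS):
    print(doc_id, round(score, 3))
```

A strict Boolean AND query for both terms would return only d1.
The ranked list also returns d3, which matches one term, but
places it below d1.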
Having a more effective
retrieval engine is a major selling point. Companies are
particularly interested in techniques that produce significant
improvements (rather than a few percent average precision) and
that avoid occasional major mistakes.
Global computer-based communications and information technology
cut across territorial borders, creating a new realm of human
activity and undermining the feasibility and legitimacy of
applying laws based on geographic boundaries. While these
electronic communications play havoc with geographic boundaries, a
new boundary emerges, made up of the screens and passwords that
separate the virtual world from the real world of atoms. Explicit
definitions of complicated terms play a very important role in
making the subject understandable to any layman. With them it is
easy to demonstrate the basic concept and its application in
society. This has become an interconnected network that
‘resonates’ with society; the law both influences and is
influenced by the society in which it is constructed.
Information technology today
has an impact on virtually every aspect of society and every
corner of the world in the information or digital age, fostering
commerce and business, improving education and health care, and
facilitating communications among all stakeholders. With a system
of information retrieval, society is able to retrieve relevant
data concerning a particular search efficiently.
Legal Enticement
Legislative databases are a primary source of legal information.
Legislative texts are currently accessible through specifically
designed portal sites owned by governments or private
institutions. The search engines of these portals usually offer a
full-text search (i.e., every word of the text can be searched).
They also allow for an extra selection of the content through
filling out specific fields that represent certain structured
content of the statute (e.g., statute title, number of an article,
etc.). A full-text search is popular because it provides flexible
information access: the user can build any search query. The
answers resulting from a full-text search are ranked according to
their relevance to the query. Statutes are often long documents
that are hierarchically divided into chapters, sections, articles,
etc. It is important to return as an answer the parts of a statute
that are most relevant to the information query.
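Combining a structured field with a full-text search, and
returning individual articles rather than whole statutes, can be
sketched as follows. The Article record, the statute names, the
article texts and the scoring by simple term counts are all
hypothetical; real portals use their own data models and ranking
functions.

```python
from dataclasses import dataclass


@dataclass
class Article:
    statute_title: str  # structured field
    number: str         # structured field
    text: str           # full text of the article


def search(articles, query, statute_title=None):
    """Rank articles by query-term occurrences, optionally filtered by statute title."""
    terms = query.lower().split()
    hits = []
    for article in articles:
        # Structured-field filter: skip articles from other statutes.
        if statute_title is not None and article.statute_title != statute_title:
            continue
        words = article.text.lower().split()
        score = sum(words.count(term) for term in terms)
        if score > 0:
            hits.append((score, article))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return [article for _, article in hits]


# Hypothetical statutes, already split into their articles.
ARTICLES = [
    Article("Statute A", "12", "the seller must deliver the goods"),
    Article("Statute A", "14", "goods must be of satisfactory quality and goods must fit their purpose"),
    Article("Statute B", "1", "dishonest appropriation of goods"),
]

for article in search(ARTICLES, "goods quality", statute_title="Statute A"):
    print(article.statute_title, article.number)
```

Because each article is indexed as its own unit, the answer points
the user to the most relevant part of the statute instead of to
the statute as a whole.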
Several key issues are of
current relevance to legal information systems:
# the issue of accessibility of legal information, in particular
questions of promulgation, cost, and availability.
# the importance of reliability of systems, looking in particular
at stability, accuracy, and authenticity.
# the issue of usability, especially ease of use, functionality,
and customization.
Recently, in its vision of the
Civil Justice System in the Information Age, the Lord Chancellor’s
Department identified four key programmes for immediate
implementation. The third of these programmes is stated to be ‘The
provision of primary legal source materials online’.
One of the first applications
of information technology to law was the legal information system.
Indeed, the earliest viable system was produced back in 1960. From
such primitive origins developed the first wave of legal
information systems, private sector systems such as Lexis and
Westlaw. The coming of the Internet and, in particular, its
publishing arm, the World Wide Web (the Web), triggered a new era
of development. Interestingly, lawyers were involved in producing
one of the earliest successful Web browsers. Based on this
involvement, we can now find impressive examples of the second
wave of legal information systems: Web-based, public sector
systems starting with the prototype Cornell Legal Information
Institute (hereinafter referred to as “Cornell LII”), including
the hugely successful Australasian Legal Information Institute
(hereinafter referred to as “AUSTLII”), and now adding the British
and Irish Legal Information Institute (hereinafter referred to as
“BAILII”). At present, first wave private sector systems such as
Lexis and Westlaw are migrating their collections onto the Web.
Accessibility:
Promulgation: The first issue of accessibility to be addressed is
that of promulgation. It has been forcefully argued that, for a
long time, there has been a ‘Catch 22’ within our legal system.
While on the one hand everybody is presumed to know the law, on
the other hand totally inadequate promulgation of that law has
meant that virtually no one other than lawyers actually does know
it. This change of position has now resulted in a substantial
relaxation of restrictions on access to primary sources, opening
the door to wholesale promulgation of law online.
Availability:
The third aspect of accessibility to be considered is that of
availability.
Clearly, the advent of legal information systems has greatly
increased ease of access to primary sources. Online access via the
Web means that sources are available very fast, at any time, and
from anywhere. Even more strikingly, many users can access a
single document simultaneously.
Coverage:
Breadth of Coverage: There are linguistic and cultural as well as
practical difficulties in the way of achieving the fullest
possible coverage of primary sources in legal information systems.
Both the breadth and the depth of coverage of sources must be
considered, together with the key issue of selectivity, i.e.
adopting an appropriate policy on which sources to include in, and
which to exclude from, the present systems.
In an era of rapid
globalization, a worldwide legal information system must surely be
highly desirable. One day it may become a reality. No such system
exists at present, although there have been some preliminary
efforts in that direction. In the private sector, systems have
gradually broadened their coverage. Lexis, for example, has
steadily extended its coverage beyond US and major Commonwealth
law jurisdictions to include a significant number of civil law
jurisdictions, as well as some European Union and International
sources.
One interesting development is
the Global Legal Information Network established by the
Law Library of Congress and based in the USA. The Network is a
centralised system, which, it is planned, will one day hold all
the world's primary legal sources together with selected secondary
sources. Linguistic and cultural difficulties are simply finessed
by the decision that the system will use English as the lingua
franca, and by the imposition of a range of standards relating
both to the content and to the main search tool, which is built
around a giant legal thesaurus. Political and practical
difficulties are to
be reduced, it is hoped, by operating the system on a cooperative
basis. All nations are invited to join the project. The ‘fee’ for
joining is simply providing authentic versions of their own laws
and being responsible for maintaining and updating those versions.
In return for such a contribution, each participating nation
obtains access via the Network to the laws of all other
participating nations.
Depth of Coverage:
The successful application of information technology to the
storage and retrieval of legal information has brought with it
some major disadvantages. Perhaps the most widely accepted
disadvantage is that of information overload. In common law
jurisdictions, a striking manifestation of this problem is the
steadily increasing amount of new case law of which practitioners
and academics need to be aware. Before online information systems,
the limited capacity of paper law reports dictated that firm
policy decisions had to be made about which cases to publish and
which not. Once those decisions had been made, however, unreported
cases became difficult to track down and so were, in effect,
forgotten. The increasingly sophisticated search and retrieval
facilities associated with legal information systems now guarantee
that anything put into a system can be found again.
Reliability:
Stability: There are three facets of reliability: stability,
accuracy, and authenticity. As to stability, the Web tended to be
a highly unstable place. It was all too common to find that sites
containing significant information had changed their URLs,
sometimes leaving a forwarding address.
Accuracy: Users need to feel
confident that the contents of information systems can be relied
upon. The most obvious problem is that crucial words or phrases
may be missing. Needless to say, omission of a word like ‘not’
from a body of text can have potentially catastrophic results. So
too, missing words from large numbers can utterly change their
meaning. More menacing is the mistyped word that produces another
word which is passed by superficial spell checking, e.g. ‘to’
instead of ‘too’ or ‘two’. Such mistyping can be surprisingly
common, particularly where data capture (the digitisation of paper
information) is undertaken by personnel who are not used to legal
terminology, or are not very familiar with the English language.
The occasional mistake in paper documents will be no more than a
minor annoyance. Such a loss can disrupt or even destroy the
meaning of the section or regulation. As with content, so with
layout: user confidence has to be ensured by applying rigorous
quality control.
Authenticity:
Trust that the contents of legal information systems have not been
deliberately sabotaged is just as important as confidence that no
inaccuracies have accidentally crept in. Despite criminalization,
or perhaps because of it, both computer hacking and the launch of
viruses onto the Web are becoming global mass-participation
activities. While some hackers operate from the developed world,
others are based in the developing world, where there may be less
rigorous policing of their activities.
Conclusion
Over the last forty years, the development of legal information
systems has been seen primarily as a process of automation. The
technology has been viewed as enabling legislatures, courts, the
professions and law schools to continue to function as they did in
the world before computers existed, albeit with greater speed,
increased efficiency, and reduced cost. However, the paradigm is
shifting from that of automation towards innovation. Massive
accessibility of information online and the early fruits of
research into artificial intelligence and the law are combining to
create not only hugely impressive new informational services, but
also the possibility of an entirely new wave of legal knowledge
systems. The difference between such knowledge
systems and legal information systems is the difference between
knowledge and information. Jonscher distinguishes these two key
concepts by stating that, while information comprises the facts
that are distilled from raw data, knowledge is a further
distillation of ideas, thought and beliefs from that information.
An information system is simply an enormous collection of facts.
By contrast, a knowledge system comprises a subset of those facts
structured, processed, and presented in such a way that it can
provide advice and assistance to users.
We are now moving beyond the
information age into an era where machines will play a key role in
helping us extract, understand and apply knowledge. In this coming
era, the manner in which we learn, work and do business will be
changed in ways that are unimaginable to us today.
|