The French National Audiovisual Institute (INA) provides the scientific and technological community with a corpus of audiovisual documents from its collections, together with the document sheets and metadata associated with these documents.
This corpus is intended for the development, testing and evaluation of search and analysis tools for multimedia content, strictly as part of scientific research. To access the Corpus, you must first subscribe and have an FTP client with which to download it.
The Corpus is made available, under the conditions specified in the General Conditions of Use (GCU), to any legal entity that has subscribed and accepted all of those conditions (hereafter called the 'User').
Subscription is restricted to research laboratories, innovative SMEs as well as all other legal entities having a scientific research department or activity.
Before submitting your subscription request, check that all the required fields (marked with an asterisk) have been completed.
Your request will be sent to INA for consideration. After your request has been approved, you will receive an e-mail confirming the address for the FTP server as well as the confidential login name and password assigned to your Organisation, allowing you to access the Corpus.
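Once the confirmation e-mail has arrived, access can be scripted as well as done through a graphical FTP client. The following is a minimal sketch using Python's standard `ftplib`, not an INA-provided tool. It assumes the server accepts plain FTP logins (the page does not say whether FTPS is required), and the function names are hypothetical. The host, login name and password are the ones supplied in the e-mail; since they are confidential, the sketch takes them as arguments rather than hard-coding them.

```python
from ftplib import FTP


def open_corpus_session(host, user, password, timeout=30):
    """Connect and log in to the FTP Server.

    All three values come from INA's confirmation e-mail; none is known here.
    Assumption: plain FTP is accepted (use ftplib.FTP_TLS if the server needs TLS).
    """
    ftp = FTP(host, timeout=timeout)  # passing a host connects immediately
    ftp.login(user=user, passwd=password)
    return ftp


def list_corpus_folders(ftp):
    """Return the entry names in the server's current directory, sorted."""
    return sorted(ftp.nlst())
```

A typical session would call `open_corpus_session(...)`, pass the result to `list_corpus_folders(...)`, and finish with `ftp.quit()`. Reading the credentials from environment variables or a local file outside version control keeps them out of shared scripts.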
Last updated: 01/04/2015
The Corpus is located on the FTP server accessible using a login name and password supplied to the User by INA (hereafter 'the FTP Server').
The Corpus is made available, under the conditions specified in these General Conditions of Use (GCU), to any legal entity that has subscribed and accepted all of these Conditions of Use (hereafter called the 'User').
The User acknowledges and accepts that the Corpus is supplied 'as is'.
The only people authorised to access the Corpus are individuals working under the User's control, authority and responsibility as part of scientific research ('Authorised Persons').
The User undertakes to ensure that the login name and password providing access to the FTP Server are only communicated to Authorised Persons and remain strictly confidential.
The Corpus is made available by INA to the User free of charge, non-exclusively, and is non-transferable, strictly for scientific research purposes.
Strictly as part of scientific research, only Authorised Persons may for a period of two years:
Any use of the Corpus for other purposes or under other conditions requires prior written permission from INA.
In particular, the User and Authorised Persons undertake:
The User undertakes to ensure that Authorised Persons accept and comply with the stipulations of these Conditions of Use.
The User shall ensure that any Authorised Person who is no longer under its responsibility immediately ceases to access and use the Corpus.
Research Publications and Results including elements of the Corpus (such as images or extracts from document sheets) may not be disclosed without prior written authorisation from INA.
The User undertakes to inform INA of any scientific Publications relating to the research work conducted from the Corpus.
The User undertakes to make Results generated from the Corpus available to INA, under the following conditions:
These Results will be sent to the following address: dataset@ina.fr.
The User supplying these Results authorises:
The Corpus is accessible with effect from 01/06/2015 until 31/12/2016. This period may be extended by INA for an unspecified duration.
The Corpus may be used under the User's responsibility for a period of two (2) years after INA has sent the User their login name and password (based on the date the e-mail was sent).
At the end of this period, or upon early termination, the User undertakes to:
INA may terminate access and use of the Corpus before the end of the aforementioned periods, without compensation or notice, under the following circumstances:
The Corpus is protected by the law and particularly by the provisions of the French Intellectual Property Code.
All rights over the Corpus are therefore strictly reserved.
The User and Authorised Persons acquire no intellectual property rights over the Corpus or its component parts.
The User undertakes to ensure that the Corpus and its component parts are not used under conditions other than those expressly permitted by these Conditions of Use.
Any use of the Corpus or of Research Results generated from the Corpus, as well as any Publication, authorised under the conditions specified in these Conditions of Use, must always mention the origin of the Corpus, making reference to INA.
The connection to the FTP Server and its use are entirely at the User's responsibility and at their sole risk.
The User acknowledges and accepts the specifications, technical performance, limits and risks of the Internet network.
The User is responsible for taking all appropriate measures to protect their own data, software and hardware against damage, hijacking, hacking, viruses and malicious or intrusive programs.
The User is solely responsible for the use made of the FTP Server and/or the Corpus and misuse made from or through the FTP Server and/or Corpus, particularly their illicit, non-compliant and/or unauthorised use. The User guarantees INA against any recourse or action taken by any third party in this respect.
INA does not guarantee the FTP Server will be regularly accessible. Access to the FTP Server may be interrupted by INA at any time, for maintenance or through force majeure; INA declines any liability in this respect.
It is understood that the Corpus is supplied 'as is'.
INA does not guarantee the accuracy, precision or completeness of documents made available in the Corpus. As a result, INA declines any liability for any inaccuracy, lack of precision or omission relating to these documents.
In general, INA cannot be held liable for any direct or indirect loss that could result from:
INA reserves the right to amend the Conditions of Use at any time and without notice, particularly in order to take account of any legal, regulatory, editorial and/or technical change.
Amendments to the Conditions of Use will take effect and will apply to the User as soon as they are published on the FTP Server.
The date of the last update will be indicated at the top of the document.
These Conditions of Use are subject to French law. Any dispute regarding the application, interpretation or execution of these Conditions of Use will be subject to the legally competent French jurisdictions.
Last updated: 22/02/2018
The description of the sub-corpora is given below. The figures and formats are provided for information purposes.
All editions of the France 2 evening news programme “20 heures de France 2” from the 1st of January to the 30th of June 2007, together with the corresponding archivists’ notes.
Name: 2007 F2, 6 mois de 20 heures
Number of video documents: 181
Media format: MPEG-1
Channel: France2
Total duration: ~100 hours
Time span: 1st of January 2007 – 30th of June 2007
Number of archivists’ notes: 181 summary notes and ~4500 topic notes
Format of archivists’ notes: XML/MS-Word
Folder: /f2jt2007
This corpus has been used in the “Person Discovery” task of the MediaEval 2015 and MediaEval 2016 evaluation campaigns (see https://github.com/MediaevalPersonDiscoveryTask/).
Corpus consisting of various TV documents collected for the Mex-Culture project (Indexing of multimedia collections for the preservation and dissemination of Mexican culture).
Name: MEXaction
Number of video documents: 114
Media format: MPEG-1
Channel: Les Actualités Françaises, ORTF, TF1, FR2, FR3
Total duration: ~77 hours
Time span: 1942 – 2011
Number of archivists’ notes: 114
Format of archivists’ notes: XML/MS-Word
Folder: /mexaction
This corpus is also part of the MEXAction2 dataset (see http://mexculture.cnam.fr/xwiki/bin/view/Datasets/Mex+action+dataset).
Thirty years of weekly news reports shown in cinemas from 1940 to 1969.
Name: Actualités Françaises
Number of video documents: ~22500
Media format: MPEG-4 AVC (H.264)
Channel: Les Actualités Françaises
Total duration: ~300 hours
Time span: 1940 – 1969
Number of archivists’ notes: ~22500
Format of archivists’ notes: XML/MS-Word
Folder: /AF
Six TV versions of Molière’s play “Le Misanthrope”.
Name: Misanthrope
Number of video documents: 6
Media format: MPEG-4 AVC (H.264)
Channel: ORTF, TF1, A2, FR3
Total duration: ~12 hours
Time span: 1959 – 1980
Number of archivists’ notes: 6
Format of archivists’ notes: XML/MS-Word
Folder: /misanthrope
Fifty years of broadcasts of the radio show “Le Masque et la plume”, dedicated to literature, theatre and cinema.
Name: Le Masque et la plume
Number of video documents: ~2500
Media format: MPEG-1/2 Audio Layer 3 (MP3)
Channel: ORTF, France Inter
Total duration: ~1700 hours
Time span: 1955 – 2005
Number of archivists’ notes: ~2500
Format of archivists’ notes: XML/MS-Word
Folder: /lemasqueetlaplume
A full week of broadcasts, centred on the Edward Snowden revelations, from three TV channels (France2, France5, France24) and three radio channels (France Inter, France Info, France Culture).
Name: Affaire Snowden
Number of video documents: 1008
Media format: MPEG-4 AVC (H.264) and MPEG-1/2 Audio Layer 3 (MP3)
Channel: France2, France5, France24, France Inter, France Info, France Culture
Total duration: 1008 hours
Time span: 7th of June 2013 – 14th of June 2013
Number of archivists’ notes: ~1000 per channel
Format of archivists’ notes: XML/MS-Word
Folder: /Snowden
This corpus has been used in the “Person Discovery” task of the MediaEval 2016 evaluation campaign (see https://github.com/MediaevalPersonDiscoveryTask/).
A full week of broadcasts, centred on the film “The Artist” winning the Oscar for Best Picture, from three TV channels (France2, France5, France24) and three radio channels (France Inter, France Info, France Culture).
Name: The Artist
Number of video documents: 1008
Media format: MPEG-4 AVC (H.264) and MPEG-1/2 Audio Layer 3 (MP3)
Channel: France2, France5, France24, France Inter, France Info, France Culture
Total duration: 1008 hours
Time span: 26th of February 2012 – 4th of March 2012
Number of archivists’ notes: ~1000 per channel
Format of archivists’ notes: XML/MS-Word
Folder: /theartist
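The figure of 1008 hours quoted for both the Affaire Snowden and The Artist sub-corpora is consistent with six channels recorded around the clock for seven days. The short Python check below illustrates this. It rests on two assumptions that the descriptions do not state explicitly: that recording was continuous (24 hours a day) and that the end date of each time span is exclusive. The function name `expected_hours` is hypothetical.

```python
from datetime import date

HOURS_PER_DAY = 24


def expected_hours(start, end, channels):
    """Hours of continuous broadcast on `channels` channels from `start` up to,
    but not including, `end` (both assumptions, see the text above)."""
    days = (end - start).days
    return days * HOURS_PER_DAY * channels


snowden = expected_hours(date(2013, 6, 7), date(2013, 6, 14), channels=6)
the_artist = expected_hours(date(2012, 2, 26), date(2012, 3, 4), channels=6)
print(snowden, the_artist)  # 1008 1008
```

The second span crosses the end of February in a leap year, so it also covers seven days. That the document count is likewise 1008 suggests one file per channel and per hour, but the descriptions do not confirm this.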
Corpus consisting of various TV and radio documents dealing with the terrorist attack of the 11th of September 2001.
Name: 11 septembre 2001
Number of video documents: to be defined
Media format: to be defined
Channel: to be defined
Total duration: to be defined
Time span: to be defined
Number of archivists’ notes: to be defined
Format of archivists’ notes: XML/MS-Word
Folder: /11septembre2001
Corpus of 10M frames from TV broadcasts (2010-2019) for learning a visual context. All faces have been blurred. The dataset contains a training set, a validation set, a test set and a verification test. The frames are organised as pairs (each pair consisting of frames that contain at least one common face) and/or triplets so that they can be used for training or evaluation.
Name: Visual context for TV Programs
Number of video documents: 10000000
Media format: JPG
Channel: N/A
Total duration: N/A
Time span: 1st of January 2010 – 31st of December 2019
Number of archivists’ notes: N/A
Format of archivists’ notes: N/A
Folder: /vctp
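For scripts that work across several sub-corpora, the folder names and durations listed above can be gathered in a small lookup table. The following Python sketch is only one possible layout. The `SubCorpus` class, its field names and `total_known_hours` are hypothetical. The figures are copied from the descriptions above, are mostly approximate (marked "~" in the text), and are given for information only. Durations listed as "to be defined" or "N/A" are stored as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SubCorpus:
    name: str
    folder: str            # folder on the FTP Server, as listed above
    hours: Optional[int]   # approximate total duration; None if not given


CATALOGUE = [
    SubCorpus("2007 F2, 6 mois de 20 heures", "/f2jt2007", 100),
    SubCorpus("MEXaction", "/mexaction", 77),
    SubCorpus("Actualités Françaises", "/AF", 300),
    SubCorpus("Misanthrope", "/misanthrope", 12),
    SubCorpus("Le Masque et la plume", "/lemasqueetlaplume", 1700),
    SubCorpus("Affaire Snowden", "/Snowden", 1008),
    SubCorpus("The Artist", "/theartist", 1008),
    SubCorpus("11 septembre 2001", "/11septembre2001", None),  # to be defined
    SubCorpus("Visual context for TV Programs", "/vctp", None),  # N/A (frames)
]


def total_known_hours(catalogue):
    """Sum the durations of the sub-corpora whose duration is stated."""
    return sum(c.hours for c in catalogue if c.hours is not None)


print(total_known_hours(CATALOGUE))  # 4205
```

Because most of the source figures are rounded, the total is an order-of-magnitude estimate rather than an exact count.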