GetLostFiles


Retrieve lost files from datasets



Introduction

Some files in datasets are not copied correctly to the T2 sites. This usually affects at most a few files per dataset. A few scripts are in place to put these files back where they belong.

Identify

First, go to the corrupted files page and check whether it lists any files from our site. If so, click on the "Get source JSON" tab.

Create the necessary files

Put the downloaded file in /user/odevroed/Get_Lost_Files.
Edit the file and remove all the HTML tags, so that only the JSON part remains. In the rest of this document, the resulting file is called cms-popularity.cern.ch.json.
Run the script a first time, with option 1:

python /user/odevroed/bin/download_missing_dataset_files.py 1 ./cms-popularity.cern.ch.json

This generates the file "files_to_retrieve", which contains only the files from our site that are really not present (in case a reported file was just a coincidence).
Go through this file and use DAS to find out at which site each file resides. Usually many files from the list are present at one single site, so they can be grouped.
Put the grouped list in a new file, say "files_per_site", and give it as an argument to the second pass of the script (replace T2_CH_CSCS with the appropriate site name):

python /user/odevroed/bin/download_missing_dataset_files.py 2 T2_CH_CSCS ./files_per_site

The script then prints output like the following:

fetching from site:
srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat

The files are ready to be transferred
First, copy the files to us:
     srmcp -debug -streams_num=1 -2 -copyjobfile=transfer_to_us
Then, put them on storage:
     srmcp -debug -streams_num=1 -2 -copyjobfile=put_on_storage
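
The two commands printed above have to run in that order: the files are first copied to us, and only then put on storage. A minimal sketch of a wrapper that runs them in sequence and stops at the first failure is given here. It assumes that srmcp is on the PATH and that the two copy-job files are in the current directory. The function names are hypothetical and not part of the existing scripts.

```python
import subprocess

# Copy-job files named in the script output, in the order they must be used.
COPY_JOB_FILES = ("transfer_to_us", "put_on_storage")


def build_srmcp_command(copy_job_file):
    """Return the srmcp command line, as printed by the script, for one copy-job file."""
    return [
        "srmcp",
        "-debug",
        "-streams_num=1",
        "-2",
        "-copyjobfile=" + copy_job_file,
    ]


def run_transfers(copy_job_files=COPY_JOB_FILES, dry_run=False):
    """Run the srmcp steps in order; return the commands that were (or would be) run."""
    commands = [build_srmcp_command(name) for name in copy_job_files]
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so the second step never starts if the first one failed.
            subprocess.run(command, check=True)
    return commands
```

Calling run_transfers(dry_run=True) only prints the two commands, which is a cheap way to check them before starting the real transfer.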

Then run the two srmcp commands that the script printed, in that order.
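
The manual clean-up step described earlier, removing the HTML tags from the downloaded file so that only the JSON part remains, can also be scripted. The sketch below is one possible way to do it and is not part of the existing scripts. It assumes that the JSON itself contains no text of the form <...>, and that any HTML escaping (such as &quot;) wraps the whole payload. The function names and the input file name in the usage note are hypothetical.

```python
import html
import json
import re

# Matches anything that looks like an HTML tag, e.g. <pre> or </body>.
TAG_PATTERN = re.compile(r"<[^>]+>")


def extract_json(page_text):
    """Strip HTML tags from the downloaded page and parse what is left as JSON."""
    without_tags = TAG_PATTERN.sub("", page_text)
    # Assumption: escaped entities such as &quot; come from the HTML wrapping,
    # so they are converted back before parsing.
    cleaned = html.unescape(without_tags).strip()
    # json.loads raises ValueError if anything other than JSON remains,
    # which catches an incomplete clean-up early.
    return json.loads(cleaned)


def clean_file(source_path, target_path):
    """Write the JSON part of source_path to target_path."""
    with open(source_path, encoding="utf-8") as source:
        data = extract_json(source.read())
    with open(target_path, "w", encoding="utf-8") as target:
        json.dump(data, target, indent=2)
```

For example, clean_file("downloaded_page.html", "cms-popularity.cern.ch.json") would produce the file that is passed to the first run of the script. If the page does not reduce to valid JSON, the call fails instead of writing a broken file.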


