Updated Readme + Added compiled jar + Modified code to support overri… #82

Merged on Apr 6, 2022 (3 commits).

1 change: 0 additions & 1 deletion data_labeling_examples/.gitignore
@@ -2,4 +2,3 @@
.classpath
/target/
.settings/
src/main/java/com/oracle/.DS_Store
42 changes: 37 additions & 5 deletions data_labeling_examples/README.md
@@ -56,9 +56,44 @@ Result of CUSTOM_LABELS_MATCH algorithm:

For more information, see [SDK for Java](https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/javasdk.htm).

### Running the Utility
1. Open a terminal.
2. Verify that Java 8 or later is installed. If Java is not installed, download it from https://www.oracle.com/java/technologies/downloads/

```
java -version
```
3. Clone the repository.

```
git clone https://github.com/oracle-samples/oci-data-science-ai-samples.git
```
4. Change to the data_labeling_examples directory.

```
cd data_labeling_examples
```
5. Run the following command to bulk label using the "FIRST_LETTER_MATCH" labeling algorithm.

```
java -DCONFIG_FILE_PATH='~/.oci/config' -DCONFIG_PROFILE=DEFAULT -DDLS_DP_URL=https://dlsprod-dp.us-ashburn-1.oci.oraclecloud.com -DTHREAD_COUNT=20 -DDATASET_ID=ocid1.compartment.oc1..aaaaaaaawob4faujxaqxqzrb555b44wxxrfkcpapjxwp4s4hwjthu46idr5a -DLABELING_ALGORITHM=FIRST_LETTER_MATCH -DLABELS=cat,dog -cp libs/bulklabelutility-v1.jar com.oracle.datalabelingservicesamples.scripts.SingleLabelDatasetBulkLabelingScript
```
6. Run the following command to bulk label using the "FIRST_REGEX_MATCH" labeling algorithm.

```
java -DCONFIG_FILE_PATH='~/.oci/config' -DCONFIG_PROFILE=DEFAULT -DDLS_DP_URL=https://dlsprod-dp.us-ashburn-1.oci.oraclecloud.com -DTHREAD_COUNT=20 -DDATASET_ID=ocid1.compartment.oc1..aaaaaaaawob4faujxaqxqzrb555b44wxxrfkcpapjxwp4s4hwjthu46idr5a -DLABELING_ALGORITHM=FIRST_REGEX_MATCH -DFIRST_MATCH_REGEX_PATTERN='^abc*' -DLABELS=cat,dog -cp libs/bulklabelutility-v1.jar com.oracle.datalabelingservicesamples.scripts.SingleLabelDatasetBulkLabelingScript
```
7. Run the following command to bulk label using the "CUSTOM_LABELS_MATCH" labeling algorithm.

```
java -DCONFIG_FILE_PATH='~/.oci/config' -DCONFIG_PROFILE=DEFAULT -DDLS_DP_URL=https://dlsprod-dp.us-ashburn-1.oci.oraclecloud.com -DTHREAD_COUNT=20 -DDATASET_ID=ocid1.compartment.oc1..aaaaaaaawob4faujxaqxqzrb555b44wxxrfkcpapjxwp4s4hwjthu46idr5a -DLABELING_ALGORITHM=CUSTOM_LABELS_MATCH -DCUSTOM_LABELS='{"dog/": ["dog"], "cat/": ["cat"] }' -cp libs/bulklabelutility-v1.jar com.oracle.datalabelingservicesamples.scripts.CustomBulkLabelingScript
```

Note: You can override any configuration by passing -D followed by the configuration name (for example, -DTHREAD_COUNT=20). All supported configurations are listed in the following section.

### Configurations

Add the following configurations in config.properties file in the project to run the scripts:
The following configurations (in src/main/resources/config.properties) are supported by the bulk labeling scripts:

```
#Path of Config File
@@ -68,10 +103,7 @@ CONFIG_FILE_PATH=~/.oci/config
CONFIG_PROFILE=DEFAULT

#DLS DP URL
DLS_DP_URL=https://dlstest-dp.${REGION}.oci.oraclecloud.com

#Region where dataset is created
REGION=uk-london-1
DLS_DP_URL=https://dlsprod-dp.uk-london-1.oci.oraclecloud.com

#Dataset Id whose record you want to bulk label
DATASET_ID=ocid1.compartment.oc1..aaaaaaaawob4faujxaqxqzrb555b44wxxrfkcpapjxwp4s4hwjthu46idr5a
Binary file not shown.
47 changes: 40 additions & 7 deletions data_labeling_examples/pom.xml
@@ -7,7 +7,8 @@
<version>0.0.1-SNAPSHOT</version>
<name>OCI Data Labeling Service Examples</name>
<description>This repository contains code samples for OCI Data
Labeling Service</description>
Labeling Service
</description>
<dependencies>
<dependency>
<groupId>com.oracle.oci.sdk</groupId>
@@ -19,12 +20,6 @@
<artifactId>oci-java-sdk-datalabelingservicedataplane</artifactId>
<version>2.19.0</version>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<version>1.18.22</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
@@ -40,5 +35,43 @@
<artifactId>slf4j-api</artifactId>
<version>1.7.32</version>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<version>1.18.22</version>
<scope>provided</scope>
</dependency>
</dependencies>
<properties>
<maven.compiler.source>8</maven.compiler.source>
<maven.compiler.target>8</maven.compiler.target>
</properties>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
<configuration>
<archive>
<manifest>
<mainClass>
com.oracle.datalabelingservicesamples.scripts.SingleLabelDatasetBulkLabelingScript
</mainClass>
</manifest>
</archive>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
@@ -2,5 +2,17 @@

public class DataLabelingConstants {

public static final int MAX_LIST_RECORDS_LIMITS = 1000;
public static final int MAX_LIST_RECORDS_LIMITS = 1000;
public static final int DEFAULT_THREAD_COUNT = 30;

public static final String CONFIG_FILE_PATH = "CONFIG_FILE_PATH";
public static final String CONFIG_PROFILE = "CONFIG_PROFILE";
public static final String DLS_DP_URL = "DLS_DP_URL";
public static final String DATASET_ID = "DATASET_ID";
public static final String REGION = "REGION";
public static final String LABELING_ALGORITHM = "LABELING_ALGORITHM";
public static final String THREAD_COUNT = "THREAD_COUNT";
public static final String LABELS = "LABELS";
public static final String CUSTOM_LABELS="CUSTOM_LABELS";
public static final String FIRST_MATCH_REGEX_PATTERN = "FIRST_MATCH_REGEX_PATTERN";
}
@@ -3,18 +3,15 @@
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import com.oracle.bmc.datalabelingservicedataplane.model.RecordSummary;
import com.oracle.datalabelingservicesamples.requests.Config;

public class FirstRegexMatch implements LabelingStrategy {

private static final Pattern pattern = Pattern.compile(Config.INSTANCE.getRegexPattern());

@Override
public List<String> getLabel(RecordSummary record) {
Matcher m = pattern.matcher(record.getName());
Matcher m = Config.INSTANCE.getPattern().matcher(record.getName());
if (m.find()) {
String firstGroup = m.group(0);
for (String label : Config.INSTANCE.getLabels()) {
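In the FIRST_REGEX_MATCH change, the pattern is no longer compiled in a static field of `FirstRegexMatch`; it is compiled in `Config` from the resolved `FIRST_MATCH_REGEX_PATTERN` value (so a `-D` override applies) and reused through `Config.INSTANCE.getPattern()`. The sketch below isolates only the matching step visible in the hunk (`find()` followed by `group(0)`). The class name `FirstRegexMatchSketch` is hypothetical, and the way the matched text is compared with `LABELS` is cut off in this diff, so it is deliberately left out. It also shows how the sample pattern `^abc*` behaves: it matches "ab" followed by zero or more "c" characters at the start of the name, not "names starting with abc".

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class illustrating the find()/group(0) step of FIRST_REGEX_MATCH.
public class FirstRegexMatchSketch {

    // Compiled once and reused for every record, like the Pattern now held by Config.
    private final Pattern pattern;

    public FirstRegexMatchSketch(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    /** Returns the first substring of the record name that matches the pattern, if any. */
    public Optional<String> firstMatch(String recordName) {
        Matcher m = pattern.matcher(recordName);
        return m.find() ? Optional.of(m.group(0)) : Optional.empty();
    }

    public static void main(String[] args) {
        FirstRegexMatchSketch sketch = new FirstRegexMatchSketch("^abc*");
        System.out.println(sketch.firstMatch("abccat.jpg")); // Optional[abcc]
        System.out.println(sketch.firstMatch("ab_01.png"));  // Optional[ab]
        System.out.println(sketch.firstMatch("cat.jpg"));    // Optional.empty
    }
}
```

If the intent is "names that start with abc", a pattern such as `^abc` (or `^abc.*` to capture the whole name) expresses that more directly.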
@@ -5,7 +5,9 @@
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Pattern;

import org.apache.commons.lang3.StringUtils;
import org.apache.commons.lang3.exception.ExceptionUtils;

import com.fasterxml.jackson.core.JsonProcessingException;
@@ -14,6 +16,7 @@
import com.oracle.bmc.auth.AuthenticationDetailsProvider;
import com.oracle.bmc.auth.ConfigFileAuthenticationDetailsProvider;
import com.oracle.bmc.datalabelingservicedataplane.DataLabelingClient;
import com.oracle.datalabelingservicesamples.constants.DataLabelingConstants;
import com.oracle.datalabelingservicesamples.labelingstrategies.CustomLabelMatch;
import com.oracle.datalabelingservicesamples.labelingstrategies.FirstLetterMatch;
import com.oracle.datalabelingservicesamples.labelingstrategies.FirstRegexMatch;
@@ -33,77 +36,95 @@ public enum Config {
private String configProfile;
private String dpEndpoint;
private String datasetId;
private String region;

private List<String> labels;
private Map<String, List<String>> customLabels;
private String labelingAlgorithm;
private LabelingStrategy labelingStrategy;
private String regexPattern;
private Pattern pattern;
private int threadCount;

private Config() {
try {
Properties config = new Properties();
config.load(getClass().getClassLoader().getResourceAsStream("config.properties"));
configFilePath = config.getProperty("CONFIG_FILE_PATH");
configProfile = config.getProperty("CONFIG_PROFILE");
dpEndpoint = config.getProperty("DLS_DP_URL");
datasetId = config.getProperty("DATASET_ID");
region = config.getProperty("REGION");
labelingAlgorithm = config.getProperty("LABELING_ALGORITHM");
String threadConfig = config.getProperty("THREAD_COUNT");
if (!threadConfig.isEmpty()) {
threadCount = Integer.parseInt(threadConfig);
} else {
threadCount = 20;
}
configFilePath = StringUtils.isEmpty(System.getProperty(DataLabelingConstants.CONFIG_FILE_PATH))
? config.getProperty(DataLabelingConstants.CONFIG_FILE_PATH)
: System.getProperty(DataLabelingConstants.CONFIG_FILE_PATH);
configProfile = StringUtils.isEmpty(System.getProperty(DataLabelingConstants.CONFIG_PROFILE))
? config.getProperty(DataLabelingConstants.CONFIG_PROFILE)
: System.getProperty(DataLabelingConstants.CONFIG_PROFILE);
dpEndpoint = StringUtils.isEmpty(System.getProperty(DataLabelingConstants.DLS_DP_URL))
? config.getProperty(DataLabelingConstants.DLS_DP_URL)
: System.getProperty(DataLabelingConstants.DLS_DP_URL);
datasetId = StringUtils.isEmpty(System.getProperty(DataLabelingConstants.DATASET_ID))
? config.getProperty(DataLabelingConstants.DATASET_ID)
: System.getProperty(DataLabelingConstants.DATASET_ID);
labelingAlgorithm = StringUtils.isEmpty(System.getProperty(DataLabelingConstants.LABELING_ALGORITHM))
? config.getProperty(DataLabelingConstants.LABELING_ALGORITHM)
: System.getProperty(DataLabelingConstants.LABELING_ALGORITHM);
String threadConfig = StringUtils.isEmpty(System.getProperty(DataLabelingConstants.THREAD_COUNT))
? config.getProperty(DataLabelingConstants.THREAD_COUNT)
: System.getProperty(DataLabelingConstants.THREAD_COUNT);
threadCount = (!threadConfig.isEmpty()) ? Integer.parseInt(threadConfig)
: DataLabelingConstants.DEFAULT_THREAD_COUNT;
performAssertionOninput();
initializeLabelingStrategy();
validateAndInitializeLabels(config);
dpEndpoint = dpEndpoint.replace("${REGION}", region);
dlsDpClient = initializeDpClient();
} catch (IOException ex) {
ExceptionUtils.wrapAndThrow(ex);
}
}

private void initializeLabelingStrategy() {
switch (labelingAlgorithm) {
case "FIRST_LETTER_MATCH":
labelingStrategy = new FirstLetterMatch();
break;

case "FIRST_REGEX_MATCH":
labelingStrategy = new FirstRegexMatch();
break;

case "CUSTOM_LABELS_MATCH":
labelingStrategy = new CustomLabelMatch();
break;
}
}

@SuppressWarnings("unchecked")
private void validateAndInitializeLabels(Properties config) {
switch (labelingAlgorithm) {
case "FIRST_LETTER_MATCH":
case "FIRST_REGEX_MATCH":
labels = Arrays.asList(config.getProperty("LABELS").split(","));
String inputlLabels = StringUtils.isEmpty(System.getProperty(DataLabelingConstants.LABELS))
? config.getProperty(DataLabelingConstants.LABELS)
: System.getProperty(DataLabelingConstants.LABELS);
labels = Arrays.asList(inputlLabels.split(","));
assert null != labels && labels.isEmpty() == false : "Labels Cannot be empty";
break;

case "CUSTOM_LABELS_MATCH":
try {
String customLabel = StringUtils.isEmpty(System.getProperty(DataLabelingConstants.CUSTOM_LABELS))
? config.getProperty(DataLabelingConstants.CUSTOM_LABELS)
: System.getProperty(DataLabelingConstants.CUSTOM_LABELS);
ObjectMapper mapper = new ObjectMapper();
customLabels = mapper.readValue(config.getProperty("CUSTOM_LABELS"), Map.class);
customLabels = mapper.readValue(customLabel, Map.class);
} catch (JsonProcessingException e) {
log.error("Invalid Custom Labels Provided as Input");
ExceptionUtils.wrapAndThrow(e);
}

}
if (labelingAlgorithm.equals("FIRST_REGEX_MATCH")) {
regexPattern = config.getProperty("FIRST_MATCH_REGEX_PATTERN");
}
}

private void initializeLabelingStrategy() {
switch (labelingAlgorithm) {
case "FIRST_LETTER_MATCH":
labelingStrategy = new FirstLetterMatch();
break;
}

case "FIRST_REGEX_MATCH":
labelingStrategy = new FirstRegexMatch();
break;

case "CUSTOM_LABELS_MATCH":
labelingStrategy = new CustomLabelMatch();
break;
if (labelingAlgorithm.equals("FIRST_REGEX_MATCH")) {
regexPattern = StringUtils.isEmpty(System.getProperty(DataLabelingConstants.FIRST_MATCH_REGEX_PATTERN))
? config.getProperty(DataLabelingConstants.FIRST_MATCH_REGEX_PATTERN)
: System.getProperty(DataLabelingConstants.FIRST_MATCH_REGEX_PATTERN);
pattern = Pattern.compile(regexPattern);
}
}
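The core change in `Config` applies one precedence rule to every key: a non-empty JVM system property (`-DKEY=value`) wins, otherwise the value from `config.properties` is used, and `THREAD_COUNT` falls back to `DEFAULT_THREAD_COUNT` (30). A minimal sketch of that rule, using hypothetical names (`ConfigOverrideSketch`, `resolve`, `resolveThreadCount`, `resolveLabels`) and plain `String` checks in place of `StringUtils.isEmpty`:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical helper mirroring the override logic added to Config in this PR.
public class ConfigOverrideSketch {

    /** Mirrors DataLabelingConstants.DEFAULT_THREAD_COUNT from this PR. */
    public static final int DEFAULT_THREAD_COUNT = 30;

    /** A non-empty -DKEY=value system property wins; otherwise use the properties file. */
    public static String resolve(Properties fileConfig, String key) {
        String override = System.getProperty(key);
        return (override == null || override.isEmpty()) ? fileConfig.getProperty(key) : override;
    }

    /** Treats a THREAD_COUNT missing from both sources like an empty one (the PR would throw). */
    public static int resolveThreadCount(Properties fileConfig) {
        String value = resolve(fileConfig, "THREAD_COUNT");
        return (value == null || value.trim().isEmpty())
                ? DEFAULT_THREAD_COUNT
                : Integer.parseInt(value.trim());
    }

    /** LABELS is a comma-separated list such as "cat,dog". */
    public static List<String> resolveLabels(Properties fileConfig) {
        return Arrays.asList(resolve(fileConfig, "LABELS").split(","));
    }

    public static void main(String[] args) {
        Properties fileConfig = new Properties();
        fileConfig.setProperty("CONFIG_PROFILE", "DEFAULT");
        fileConfig.setProperty("THREAD_COUNT", "30");
        fileConfig.setProperty("LABELS", "cat,dog");

        System.setProperty("THREAD_COUNT", "20"); // same effect as passing -DTHREAD_COUNT=20
        System.out.println(resolve(fileConfig, "CONFIG_PROFILE")); // DEFAULT
        System.out.println(resolveThreadCount(fileConfig));        // 20
        System.out.println(resolveLabels(fileConfig));             // [cat, dog]
    }
}
```

One design difference: the PR calls `threadConfig.isEmpty()` directly, which throws a `NullPointerException` if `THREAD_COUNT` is absent from both the system properties and `config.properties`; the sketch treats that case like an empty value and uses the default.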

@@ -118,7 +139,6 @@ private DataLabelingClient initializeDpClient() {
final AuthenticationDetailsProvider configFileProvider = new ConfigFileAuthenticationDetailsProvider(
configFile);
dlsDpClient = new DataLabelingClient(configFileProvider);
dlsDpClient.setRegion(region);
dlsDpClient.setEndpoint(dpEndpoint);
return dlsDpClient;
}
@@ -128,7 +148,6 @@ private void performAssertionOninput() {
assert configProfile != null : "Config Profile cannot be empty";
assert dpEndpoint != null : "DLS DP URL cannot be empty";
assert datasetId != null : "Dataset Id cannot be empty";
assert region != null : "Region Cannot be empty";
assert labelingAlgorithm != null : "Labeling Strategy cannot be empty";
assert threadCount >= 1 : "Invalid Thread Count Passed";
}
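`performAssertionOninput()` and `validateAndInitializeLabels()` check their input with Java `assert` statements. Assertions are disabled at run time unless the JVM is started with `-ea` (`-enableassertions`), and the `java` commands in the README do not pass that flag, so these checks are skipped as written. A sketch of the same checks as explicit exceptions, using a hypothetical `InputValidationSketch` class; the messages are copied from the asserts in the diff:

```java
// Hypothetical validator: the same checks as performAssertionOninput(), without relying on -ea.
public class InputValidationSketch {

    // Fails fast whether or not assertions are enabled.
    static void require(boolean condition, String message) {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    public static void validate(String configProfile, String dpEndpoint, String datasetId,
                                String labelingAlgorithm, int threadCount) {
        require(configProfile != null, "Config Profile cannot be empty");
        require(dpEndpoint != null, "DLS DP URL cannot be empty");
        require(datasetId != null, "Dataset Id cannot be empty");
        require(labelingAlgorithm != null, "Labeling Strategy cannot be empty");
        require(threadCount >= 1, "Invalid Thread Count Passed");
    }

    public static void main(String[] args) {
        String endpoint = "https://dlsprod-dp.us-ashburn-1.oci.oraclecloud.com";
        // "example-dataset-id" is a placeholder, not a real OCID.
        validate("DEFAULT", endpoint, "example-dataset-id", "FIRST_LETTER_MATCH", 20); // passes
        try {
            validate("DEFAULT", endpoint, "example-dataset-id", "FIRST_LETTER_MATCH", 0);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Invalid Thread Count Passed
        }
    }
}
```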
@@ -88,6 +88,7 @@ public static void main(String[] args) throws InterruptedException, ExecutionExc
.runAsync(() -> processAnnotationForRecord(record, label), executorService);
completableFutures.add(future);
} else {
log.error("Label is null for record {}",record);
failedRecordIds.add(record.getId());
}
}
@@ -78,6 +78,7 @@ public static void main(String[] args) throws InterruptedException, ExecutionExc
.runAsync(() -> processAnnotationForRecord(record, label), executorService);
completableFutures.add(future);
} else {
log.error("Label is null for record {}",record);
failedRecordIds.add(record.getId());
}
}
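Both bulk labeling scripts now log a record whose computed label is null before adding its id to `failedRecordIds`, while records that do get a label are annotated asynchronously with `CompletableFuture.runAsync(..., executorService)`. The sketch below reproduces that fan-out in a self-contained form. It assumes a fixed thread pool sized by `THREAD_COUNT` (the executor's construction is not shown in this diff), and `Rec`, `labeler`, and `annotate` are hypothetical stand-ins for `RecordSummary`, the configured `LabelingStrategy`, and `processAnnotationForRecord`:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiConsumer;
import java.util.function.Function;

public class BulkLabelFanOutSketch {

    /** Hypothetical stand-in for the SDK's RecordSummary. */
    public static final class Rec {
        public final String id;
        public final String name;

        public Rec(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Submits one async annotation per labeled record; returns ids of records with no label. */
    public static List<String> run(List<Rec> records,
                                   Function<Rec, List<String>> labeler,
                                   BiConsumer<Rec, List<String>> annotate,
                                   int threadCount) {
        ExecutorService executorService = Executors.newFixedThreadPool(threadCount);
        List<CompletableFuture<Void>> completableFutures = new ArrayList<>();
        List<String> failedRecordIds = new ArrayList<>(); // only touched by the calling thread
        try {
            for (Rec record : records) {
                List<String> label = labeler.apply(record);
                if (label != null) {
                    completableFutures.add(CompletableFuture.runAsync(
                            () -> annotate.accept(record, label), executorService));
                } else {
                    System.err.println("Label is null for record " + record.id);
                    failedRecordIds.add(record.id);
                }
            }
            // Wait for every submitted annotation before reporting results.
            CompletableFuture.allOf(completableFutures.toArray(new CompletableFuture[0])).join();
        } finally {
            executorService.shutdown();
        }
        return failedRecordIds;
    }

    public static void main(String[] args) {
        List<Rec> records = Arrays.asList(
                new Rec("r1", "cat_01.jpg"), new Rec("r2", "dog_01.jpg"), new Rec("r3", "bird.jpg"));
        List<String> annotated = Collections.synchronizedList(new ArrayList<>());
        // Illustrative labeler only; this is not the script's FirstLetterMatch.
        Function<Rec, List<String>> labeler = r -> r.name.startsWith("cat") ? Collections.singletonList("cat")
                : r.name.startsWith("dog") ? Collections.singletonList("dog") : null;
        List<String> failed = run(records, labeler, (r, l) -> annotated.add(r.id + "=" + l), 4);
        System.out.println("failed: " + failed);              // failed: [r3]
        System.out.println("annotated: " + annotated.size()); // annotated: 2
    }
}
```

The list passed to `annotate` is written from pool threads, which is why the example wraps it with `Collections.synchronizedList`; `failedRecordIds` needs no synchronization here because only the submitting thread modifies it.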
10 changes: 2 additions & 8 deletions data_labeling_examples/src/main/resources/config.properties
@@ -1,7 +1,6 @@
CONFIG_FILE_PATH=~/.oci/config
CONFIG_PROFILE=DEFAULT
DLS_DP_URL=https://dlstest-dp.${REGION}.oci.oraclecloud.com
REGION=uk-london-1
DLS_DP_URL=https://dlsprod-dp.uk-london-1.oci.oraclecloud.com
THREAD_COUNT=30

DATASET_ID=ocid1.datalabelingdatasetint.oc1.uk-london-1.amaaaaaaniob46iarr2zttq7c5th3jfqwab7d3vrq4daa52tcnnwhkgrowca
@@ -16,9 +15,4 @@ LABELS=cat,dog
FIRST_MATCH_REGEX_PATTERN=^abc*

#Used for CUSTOM_LABELS_MATCH labeling algorithm
CUSTOM_LABELS={ "dog/": ["dog","pup"], "cat/": ["cat","kitten"] }





CUSTOM_LABELS={ "dog/": ["dog","pup"], "cat/": ["cat","kitten"] }